<!DOCTYPE html>
<html data-wf-page="5f71dd169010d6326b65485d">
<head>
<meta charset="utf-8" />
<title>Pilot • Case Study</title>
<meta content="width=device-width, initial-scale=1" name="viewport" />
<link href="assets/css/style.css" rel="stylesheet" type="text/css" />
<script src="https://ajax.googleapis.com/ajax/libs/webfont/1.6.26/webfont.js" type="text/javascript"></script>
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Raleway&family=Titillium+Web&display=swap" rel="stylesheet" media="all">
<script type="text/javascript">
!(function (o, c) {
var n = c.documentElement,
t = " w-mod-";
(n.className += t + "js"),
("ontouchstart" in o ||
(o.DocumentTouch && c instanceof DocumentTouch)) &&
(n.className += t + "touch");
})(window, document);
</script>
<link href="assets/images/pilot-logo.ico" rel="shortcut icon" type="image/x-icon" />
<link href="assets/images/pilot-logo.ico" rel="apple-touch-icon" />
<script src="https://kit.fontawesome.com/d019875f94.js" crossorigin="anonymous"></script>
<meta name="image" property="og:image" content="assets/images/thumbnail.png" />
</head>
<body>
<div class="navigation-wrap">
<div data-collapse="medium" data-animation="default" data-duration="400" role="banner" class="navigation w-nav">
<div class="navigation-container">
<div class="navigation-left">
<a href="/" aria-current="page" class="brand w-nav-brand w--current" aria-label="home">
<img src="assets/images/logo-mono.png" alt="" class="template-logo" />
</a>
<nav role="navigation" class="nav-menu w-nav-menu">
<a href="/case-study" class="link-block w-inline-block">
<div>Case Study</div>
</a>
<a href="/team" class="link-block w-inline-block">
<div>The Team</div>
</a>
</nav>
</div>
<div class="navigation-right">
<div class="login-buttons">
<a href="https://github.com/pilot-framework" target="_blank">
<span style="color: #161d6f">
<i class="fab fa-github fa-lg"></i>
</span>
</a>
</div>
</div>
</div>
<div class="w-nav-overlay" data-wf-ignore="" id="w-nav-overlay-0"></div>
</div>
</div>
<div id="sidebar" class="toc">
</div>
<div class="section header">
<article class="container case-study-container">
<div class="hero-text-container">
<h1 class="h1 centered">Case Study</h1>
</div>
<div id="case-study">
<br />
<br />
<!-- Section 1 -->
<h2 class="h2">1 Introduction</h2>
<br>
<p>
The last decade has seen a paradigm shift in computing infrastructure. Systems have become increasingly
ephemeral, abstracted, and distributed. Innovations in computing infrastructure and software architectures
have yielded simpler components at higher levels of abstraction, allowing engineers to deploy and stitch
together many more kinds of software components than before - at the cost of vastly increased complexity.
</p>
<br />
<p>
This complexity has only been increasing as more and more companies adopt a multi-cloud strategy and utilize
services across cloud providers.
</p>
<h3>1.1 What is Pilot?</h3>
<p>
Pilot is an open-source, multi-cloud framework that provisions an internal PaaS with a workflow-agnostic
build, deploy, and release pipeline. Our team built Pilot in order to help small teams tackle the challenges
of adopting a multi-cloud strategy.
</p>
<br />
<p>
Pilot enables you to deploy, manage, and quickly iterate upon your applications regardless of the complexity
of the underlying architecture. With Pilot, you don't have to learn the specifics of a particular service offered
by a cloud provider - such as Amazon Web Services' Elastic Container Service. You can simply give Pilot some
cursory information about your application, and Pilot handles the rest. The same goes for other cloud providers,
such as Google Cloud Platform.
</p>
<br />
<p>
All of this allows developers to quickly get an application up and running on their desired cloud provider without
having to trawl through pages and pages of documentation or learn the ins-and-outs of how that cloud provider works.
</p>
<br />
<!-- Section 2 -->
<h2 class="h2">2 Multi-Cloud Strategy</h2>
<h3>2.1 ACME: An Example User Story</h3>
<p>
To discuss the adoption of a multi-cloud strategy, let's consider an example company, ACME. ACME's application is currently
a monolith - all business logic is handled within the same code-base, which in ACME's case runs on a bare-metal
server that they own. This has been ACME's tried-and-true approach for a while now; however, this infrastructure has limits.
With a single server, they're constrained by physical bottlenecks like the server's memory and networking capabilities.
</p>
<br />
<figure>
<img src="assets/images/case-study/monolith_1.gif" class="case-study-image" />
<figcaption>Fig. 1: multiple users making requests to a monolithic application</figcaption>
</figure>
<br />
<p>
Because of this, ACME's CTO has been working on their <a href="https://en.wikipedia.org/wiki/Digital_transformation" target="_blank"> digital transformation</a>
plans. As part of their digital transformation, the CTO has informed the development team of the two most important goals:
</p>
<ul>
<li>
breaking apart their monolith into microservices
</li>
<li>
adopting a multi-cloud, or hybrid cloud, strategy
</li>
</ul>
<br />
<h3>2.2 Breaking Apart a Monolith</h3>
<p>
If we look at their monolithic application in more detail, we can see that they've got some distinct, logical services
already - in a monolith, these could be classes in an object-oriented language, but otherwise they are just portions of
the code-base with a distinct responsibility.
</p>
<br />
<figure>
<img src="assets/images/case-study/monolith_2.gif" class="case-study-image" />
<figcaption>Fig. 2: the underlying services contained within the monolith's codebase</figcaption>
</figure>
<br />
<p>
Adopting a microservice approach would mean that rather than portions of the same code-base, services are discrete and are
only concerned with their own processes while still being able to communicate with other services. One of the primary benefits
of this approach is that it allows services to scale and consume resources independently of one another.
</p>
<br />
<figure>
<img src="assets/images/case-study/microservice_1.gif" class="case-study-image" />
<figcaption>Fig. 3: the same monolith broken apart into microservices</figcaption>
</figure>
<br />
<h3>2.3 Adopting a Multi-Cloud Strategy</h3>
<p>
A multi-cloud strategy means utilizing services provided by multiple cloud providers. There are many reasons a company might
adopt this approach, but some common ones are:
</p>
<ul>
<li>
avoiding single-vendor lock-in
</li>
<li>
operational resiliency
</li>
<li>
security and governance
</li>
<li>
emerging technologies
</li>
</ul>
<br />
<p>
Because of this, many companies have already adopted a multi-cloud strategy. However, adopting this strategy can be difficult,
especially for smaller companies. Common inhibitors for teams are:
</p>
<ul>
<li>
lack of in-house skills
</li>
<li>
hybrid cloud complexity
</li>
<li>
complexity of networking
</li>
<li>
lack of tooling
</li>
</ul>
<br />
<p>
Many small companies also utilize a <i>Platform-as-a-Service</i>, which typically doesn't provide the flexibility needed for
multi-cloud deployments.
</p>
<br />
<!-- Section 3 -->
<h2 class="h2">3 Platform-as-a-Service</h2>
<h3>3.1 ACME: Service Extraction</h3>
<p>
Now that we're familiar with ACME's digital transformation plans, let's see how their development team would go about
implementing these changes. The first thing they're going to do is identify which service to extract out of the monolith.
They want their users to be able to log in to 3rd party websites with their ACME accounts; however, this means that a lot
of requests coming in <i>just</i> for authentication could degrade their overall application's performance.
</p>
<br />
<figure>
<img src="assets/images/case-study/monolith_3.gif" class="case-study-image" />
<figcaption>Fig. 4: users overwhelming the authentication service in a monolithic application</figcaption>
</figure>
<br />
<p>
To solve this, they'll extract the auth service out of the monolith, and they'll want to deploy it where the service will be
highly available. With that in mind, let's take a look at what the deployment pipeline could look like with a multi-cloud approach.
</p>
<br />
<h3>3.2 Deployment Pipeline</h3>
<figure>
<img src="assets/images/case-study/deployment_pipeline.png" class="case-study-image" />
<figcaption>Fig. 5: an example deployment pipeline for various applications</figcaption>
</figure>
<br />
<p>
First, a software development team plans, architects, and develops an application.
</p>
<br />
<p>
Then, the source code goes through some sort of build process to produce a runnable artifact. That could mean containerizing
the application with Docker to provide a slim, low-footprint environment for it to execute in. Or, it could mean
compiling code from a frontend framework like React into static files using a tool like Yarn.
</p>
<br />
<p>
The artifact from the build phase is then deployed to the proper hosting solution. There are many
solutions out there, such as Google Cloud Run for container orchestration, AWS EC2 for virtual machines, and AWS S3 for static file hosting.
</p>
<br />
<p>
Finally, any necessary resources for public consumption are configured to interface with an existing deployment - such as setting up an
application load balancer to direct network traffic, or configuring a content delivery network like AWS CloudFront to distribute
static files stored in S3.
</p>
<br />
<p>
All this complexity can be difficult to both learn and manage, especially for small development teams with little DevOps experience.
This is why Platforms as a Service have arisen.
</p>
<br />
<h3>3.3 PaaS v. Other Solutions</h3>
<p>
A Platform-as-a-Service, or PaaS, abstracts away the complexity of the build, deploy, and release cycle, allowing developers to simply
focus on their code. Examples of a PaaS include Heroku as well as cloud provider-specific platforms like Google's App Engine and Amazon's Elastic Beanstalk.
</p>
<br />
<p>
When using a PaaS, developers generally trade control and configuration away for ease-of-use. Let's look at PaaS compared to other options,
like on-premise servers and Infrastructure-as-a-Service.
</p>
<br />
<figure>
<img src="assets/images/case-study/paas_compare.png" class="case-study-image" />
<figcaption>Fig. 6: spectrum of ownership of infrastructure between on-prem, IaaS, and PaaS</figcaption>
</figure>
<br />
<p>
With on-premise servers, you have the most control over and configurability of your environment - from your application all the way down to
the bare-metal server - but with that comes a lot of responsibility. Physical servers require physical maintenance: keeping your
server room properly ventilated and secured is essential to keeping your services stable.
</p>
<br />
<p>
Next, there is Infrastructure-as-a-Service, or IaaS. As the name implies, the service you're paying for is the management and control of
those physical servers. You no longer need the overhead of a server room and maintenance - you simply spin up a virtual machine and go from there.
However, for simply deploying an application, there's still a lot of management and configuration overhead.
</p>
<br />
<p>
This is why PaaS exists - developers can concern themselves with their applications and data, while the platform handles the infrastructure under the hood.
</p>
<br />
<h3>3.4 Internal PaaS</h3>
<p>
Let's revisit the auth service ACME wants to deploy. They're allowing their users to log in to 3rd party websites with their ACME accounts; however, they know
it needs to be highly available since they can't predict the usage patterns of that 3rd party website.
</p>
<br />
<p>
ACME has decided they want to use a PaaS for ease-of-use. They've decided on Google's App Engine, which will handle the build, deploy, and release cycle for
their auth service. Under the hood, Google uses their own Cloud Run container orchestrator - which is known to be highly available and scalable.
</p>
<br />
<p>
But now, ACME has another issue. As part of their payment logic, they process payment data, generate invoices, and store them on a local file server. However,
their local file server is running out of disk space, and they don't want to keep buying hard drives.
</p>
<br />
<figure>
<img src="assets/images/case-study/monolith_5.png" class="case-study-image" />
<figcaption>Fig. 7: ACME's monolith communicating with a nearly-full file server</figcaption>
</figure>
<br />
<p>
While they could utilize Google's Cloud Storage, they don't want to get locked in to a single vendor so early on in their service extraction process - so,
they've decided to migrate their files to Amazon's S3. However, this also means they need to modify their payment service.
</p>
<br />
<figure>
<img src="assets/images/case-study/monolith_6.png" class="case-study-image" />
<figcaption>Fig. 8: ACME's monolith communicating with a newly extracted data pipeline</figcaption>
</figure>
<br />
<p>
They've decided to design a data pipeline - the application will send invoice data to an AWS Lambda, which will perform any pre-processing before sending
it to Firehose, which then streams it into their S3 buckets. As you can see below, with just two services extracted, the application topology is already becoming
more complex. They've got one service running on Google Cloud Run via App Engine, and their data pipeline on AWS.
</p>
<br />
<figure>
<img src="assets/images/case-study/acme_arch_1.png" class="case-study-image" />
<figcaption>Fig. 9: the current architecture of ACME's application</figcaption>
</figure>
<br />
<p>
The development team at ACME is getting frustrated with the separate deployment pipelines and wishes it could manage all of this in a unified platform - they've seen
companies build their own platforms before and are toying with the idea themselves.
</p>
<br />
<p>
Companies like Netflix and Atlassian <i>have</i> built their own platforms. The reasoning is that, when you initially set up IaaS offerings, managing them seems
as simple as setting up some load balancers and pointing them at an autoscaling group of virtual machines with access to your resources.
</p>
<br />
<p>
However, the reality is that, especially as your application grows, this topology becomes increasingly complex to manage. It's difficult to track down exactly which
application is running on which resource and which deployments may be stale and ready to tear down, and managing it all efficiently
takes a dedicated team. Both Netflix and Atlassian are large enterprises that can afford to dedicate teams to their platforms - ACME <i>isn't</i>.
</p>
<br />
<p>
And <i>that</i> is why we built Pilot - a tool that can assist small development teams without the resources and expertise that
larger companies might have at their disposal to tackle complex multi-cloud deployments. Because Pilot acts as an internal PaaS,
this also means ACME would still own all of the infrastructure and services Pilot utilizes.
</p>
<br />
<figure>
<img src="assets/images/case-study/internal_paas.png" class="case-study-image" />
<figcaption>Fig. 10: comparison of a PaaS and internal PaaS</figcaption>
</figure>
<br />
<p>
As a reminder, a PaaS manages everything for you, taking away control and configurability to provide ease-of-use.
The platform provider, such as Heroku, manages the infrastructure themselves - potentially even using an IaaS
provider. In Heroku's case, they build on top of AWS services.
</p>
<br />
<p>
An internal, or self-hosted, PaaS still provides the ability to control and configure the platform as long as you have
the expertise, while still providing sensible defaults so that novice users don't have to worry about the underlying
services. In Pilot's case, developers can provision that internal PaaS on either GCP or AWS - so the cloud provider
still manages the underlying hardware - and Pilot manages the necessary IaaS services that we utilize, which we will
discuss in section 5.
</p>
<!-- Section 4 -->
<h2 class="h2">4 Who Should Use Pilot?</h2>
<p>
When it comes to utilizing a platform, companies generally consider three options: build, buy, or operate.
</p>
<br />
<figure>
<img src="assets/images/case-study/pilot_compare.png" class="case-study-image" />
<figcaption>Fig. 11: comparison of potential solutions</figcaption>
</figure>
<br />
<p>
The first option, <strong>build</strong>, simply isn’t feasible for a small company. If you’re a tech giant like Netflix or Atlassian,
with a huge budget and experienced DevOps engineers, you can roll your own internal platform. A small company's time is much better
spent on the core-business product that keeps the lights on.
</p>
<br />
<p>
If a small company like ACME were to <strong>buy</strong> a solution, they might look at working with a PaaS vendor like Cloud66.
Utilizing a platform vendor makes the whole process very easy and painless, but that ease comes with a pretty
steep price. Not every small company has enough runway to justify spending money on a highly-scalable platform.
</p>
<br />
<p>
Pilot exists in the third option, <strong>operate</strong>. Our open source solution makes it relatively simple to spin up your own
internal PaaS without breaking the bank and without being overly complex for a small team to manage and maintain. And if they have the time,
they can even extend it to fit their own use cases.
</p>
<!-- Section 5 -->
<h2 class="h2">5 Pilot's Architecture</h2>
<p>
Pilot consists of two main parts, the <strong>server</strong> and the <strong>CLI</strong>.
</p>
<h3>5.1 Pilot CLI</h3>
<p>
The <i>Pilot CLI</i> is what provisions the Pilot Server and how we can communicate with it after it's provisioned. The functionality of the CLI is covered
in greater detail in section 6.
</p>
<br />
<!-- <figure>
<img src="assets/images/case-study/acme_arch_1.png" class="case-study-image" />
<figcaption>Fig. X: Pilot CLI provisioning and communicating with Pilot server</figcaption>
</figure>
<br /> -->
<h3>5.2 Pilot Server</h3>
<p>
The Pilot server is provisioned using Terraform. A <code>main.tf</code> file is dynamically generated for either an AWS EC2 instance or a GCP Compute Engine
instance, depending on which flag the user provides. Terraform then provisions a virtual machine that hosts our custom Waypoint containers running in
Docker.
</p>
<p>
We also used a tool called cloud-init to help us bootstrap the virtual machine with the proper software and configure it the same way every time. Cloud-init allowed
us to create identical clones of the Pilot server across different cloud providers.
</p>
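<p>
To illustrate, here is a simplified sketch of the kind of <code>main.tf</code> that could be generated for the AWS case. The AMI ID,
instance type, and file name are illustrative placeholders, not the exact values Pilot emits:
</p>
<pre><code>provider "aws" {
  region = "us-east-1"
}

# Virtual machine that hosts the Waypoint server and runner containers
resource "aws_instance" "pilot_server" {
  ami           = "ami-0abcdef1234567890"   # placeholder Ubuntu AMI
  instance_type = "t2.micro"

  # cloud-init bootstraps Docker and starts the Waypoint containers
  user_data = file("cloud-init.yaml")

  tags = {
    Name = "pilot-server"
  }
}</code></pre>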
<br />
<figure>
<img src="assets/images/case-study/pilot_server.png" class="case-study-image" />
<figcaption>Fig. 12: Pilot server architecture </figcaption>
</figure>
<br />
<p>
The containers running on the Pilot Server are a Waypoint server and Waypoint runner. The Waypoint server handles incoming requests, serves the user interface,
and caches project metadata. The runner handles the execution of the deployment lifecycle. Whenever a command is executed, such as <code>pilot up</code>, the
server receives the request, understands that it is a deployment command, then tells the runner to handle the execution for the proper application.
</p>
<br />
<p>
Instead of using the default Waypoint image provided, we needed to create our own custom Waypoint image so that the runner container has everything it needs to
handle the deployment process. We discuss this further in section 9.2.
</p>
<br />
<h3>5.3 Pilot Database</h3>
<p>
Pilot also provisions a global PostgreSQL database for applications to use. The caveat is that only applications deployed
on the same cloud provider as the database can access it. For example, an application deployed on Elastic Container Service
would <i>not</i> be able to communicate with a Cloud SQL instance without manual configuration.
</p>
<br />
<h4>GCP Cloud SQL Instance</h4>
<p>
When running the command <code>pilot setup --gcp</code>, Pilot will provision a Cloud SQL instance using PostgreSQL.
To provide a layer of security, the database is limited to communicating via an internal IPv4 address with services in the same
VPC. We also provision a VPC Connector that allows for communication between a serverless service, such as an application
running on Google Cloud Run, and the database.
</p>
<br />
<h4>AWS RDS Instance</h4>
<p>
When running the command <code>pilot setup --aws</code>, Pilot will provision an RDS instance using PostgreSQL.
To allow applications to communicate with the database, we also provision a subnet group and security group that
applications can be tied to.
</p>
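<p>
In Terraform terms, the AWS resources described above could be sketched roughly as follows. All names, IDs, and sizes are
illustrative placeholders rather than Pilot's actual generated configuration:
</p>
<pre><code># Subnet group the database lives in
resource "aws_db_subnet_group" "pilot" {
  name       = "pilot-db-subnet-group"
  subnet_ids = ["subnet-aaaa1111", "subnet-bbbb2222"]  # placeholder subnet IDs
}

# Security group that applications can be tied to
resource "aws_security_group" "pilot_db" {
  name = "pilot-db-sg"

  ingress {
    from_port   = 5432              # PostgreSQL
    to_port     = 5432
    protocol    = "tcp"
    cidr_blocks = ["10.0.0.0/16"]   # restrict access to the VPC
  }
}

resource "aws_db_instance" "pilot" {
  engine                 = "postgres"
  instance_class         = "db.t3.micro"
  allocated_storage      = 20
  username               = "pilot"
  password               = var.db_password
  db_subnet_group_name   = aws_db_subnet_group.pilot.name
  vpc_security_group_ids = [aws_security_group.pilot_db.id]
}</code></pre>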
<br />
<h4>Database Configuration</h4>
<p>
During setup, Pilot configures a database user for your applications to use. Any extra configuration will
need to be handled via the respective cloud provider's management console. Any necessary connection information
can be retrieved via <code>pilot server --db</code>; this will return the database address, user, and password.
</p>
<h3>5.4 Application Deployment</h3>
<p>
With a <code>waypoint.hcl</code> configuration file, Pilot can be used to deploy applications with a variety of configurations.
We wanted to give our users flexibility, so these configuration files are freely editable to suit your needs, but we do provide some sensible defaults to start from.
</p>
<figure>
<img src="assets/images/case-study/waypoint_hcl_1.png" class="case-study-image" />
<figcaption>Fig. 13: Example <code>waypoint.hcl</code> configuration used to deploy on ECS</figcaption>
</figure>
<p>
Here we dive into the architecture of the deployment lifecycle - build, deploy, and release - and how each phase maps to a stanza within a <code>waypoint.hcl</code>.
</p>
<h4>Build</h4>
<p>
The <code>build</code> stanza defines how the application should be built into an image and where the image should be stored.
</p>
<p>
You can use Docker or Pack as builders to compile source code into an image.
</p>
<p>
A Docker build requires a <code>Dockerfile</code> in your project repository that configures the dependencies needed for the application.
The <code>Dockerfile</code> places more of the burden on the user to figure out what their application needs to run properly.
This can lead to oversized images if software is installed that isn't actually needed.
</p>
<p>
Going with Pack for the build phase uses Cloud Native Buildpacks. Pack analyzes the source code of your project to determine the language and any necessary dependencies.
After this analysis, Pack applies any relevant buildpacks to build a runnable artifact without the user defining it themselves.
This compilation of buildpacks creates a container image that can be used to spin up containers within a platform.
</p>
<p>
Once the artifact is built, the image is pushed to a container registry - in Pilot's case, an AWS Elastic Container Registry or Google Container Registry repository.
</p>
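<p>
As a concrete sketch, a <code>build</code> stanza that uses Pack and pushes the image to ECR might look like the following. The region
and repository name are hypothetical values for illustration:
</p>
<pre><code>build {
  # Cloud Native Buildpacks detect the language and dependencies automatically
  use "pack" {}

  # Push the resulting image to a container registry
  registry {
    use "aws-ecr" {
      region     = "us-east-1"
      repository = "acme-auth"   # hypothetical repository name
      tag        = "latest"
    }
  }
}</code></pre>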
<br />
<figure>
<img src="assets/images/case-study/build.png" class="case-study-image" />
<figcaption>Fig. 14: Build phase</figcaption>
</figure>
<br />
<h4>Deploy</h4>
<p>
The <code>deploy</code> stanza defines how an application is deployed on the cloud provider.
It takes the built artifact from the registry and deploys it to a target deployment platform such as AWS Elastic Container Service or a Google Cloud Run cluster.
This allows Pilot-deployed applications to run on a managed platform without the user needing to configure container orchestration details such as networking and autoscaling resources.
</p>
<p>
The deploy phase is also where applications are staged for release.
Staging means the application is ready to receive traffic but has not yet been opened to public consumption by adding a load balancer or updating DNS records.
Some platforms, such as ECS, do not support staging; in that case the <code>deploy</code> stanza serves as both the deploy and release phases.
</p>
<p>
In addition, this section of the configuration file can be used to define deployment specifics such as CPU and memory constraints or the port number the application will listen on.
</p>
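<p>
A <code>deploy</code> stanza targeting ECS might look roughly like this. The region and resource values are illustrative, and the
exact parameters available depend on the plugin being used:
</p>
<pre><code>deploy {
  use "aws-ecs" {
    region = "us-east-1"
    cpu    = 256      # CPU units for the task (illustrative)
    memory = 512      # memory in MB (illustrative)
  }
}</code></pre>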
<br />
<figure>
<img src="assets/images/case-study/deploy.png" class="case-study-image" />
<figcaption>Fig. 15: Deploy phase</figcaption>
</figure>
<br />
<h4>Release</h4>
<p>
A <code>release</code> stanza defines the final phase of deployment. As mentioned in the previous section, this is typically where the application is opened up for general traffic.
The release phase is considered optional depending on the plugins being employed. It generally attaches a load balancer to an application, assigns a DNS record,
and performs any other configuration needed to make the application available on the internet.
</p>
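<p>
For platforms that do support a distinct release phase, the stanza can be as simple as the following sketch, which uses the
Cloud Run releaser to open the deployment to public traffic:
</p>
<pre><code>release {
  # Opens the staged Cloud Run revision to general traffic
  use "google-cloud-run" {}
}</code></pre>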
<br />
<figure>
<img src="assets/images/case-study/release.png" class="case-study-image" />
<figcaption>Fig. 16: Release phase</figcaption>
</figure>
<br />
<!-- Section 6 -->
<h2 class="h2">6 Installing and Using Pilot</h2>
<h3>6.1 Installation</h3>
<p>
Pilot is an NPM package that can be installed globally with <code>npm i -g @pilot-framework/pilot</code>.
</p>
<h3>6.2 Set Up Pilot Server</h3>
<h4>pilot init</h4>
<p>
<code>pilot init</code> sets up your local environment. It downloads binaries Pilot needs like <i>Terraform</i> and <i>Waypoint</i>, and also scaffolds our metadata
directory, <code>~/.pilot</code>, which contains any necessary information Pilot needs to operate.
</p>
<h4>pilot setup [PROVIDER]</h4>
<p>
<code>pilot setup [PROVIDER]</code> provisions a virtual machine in the chosen cloud provider. The architecture shown previously depicts an AWS EC2 instance, but it can be a
GCP Compute Engine instance as well. Setting up the Pilot server can take a few minutes because Pilot has to provision a VM and a database,
install all the necessary software the server needs, and finally configure the local environment to communicate with the server.
</p>
<br />
<figure>
<img src="assets/images/pilot-setup.gif" class="case-study-image" />
<figcaption>Fig. 17: pilot setup command</figcaption>
</figure>
<br />
<p>
Once the Pilot Server is provisioned and configured, you're able to deploy your applications.
</p>
<h3>6.3 Deploy an Application</h3>
<p>Deploying an application is a three step process:</p>
<ol>
<li>Creating the Project</li>
<li>Creating and Configuring the Application</li>
<li>Deploying the Application</li>
</ol>
<h4>Creating the Project - <code>pilot new project</code></h4>
<p>
First, you need to create the project to deploy on the Pilot server. It's as simple as calling <code>pilot new project</code>. The command will prompt for the name of
the project and will register it with the Pilot Server. This action can also be done via the provided UI.
</p>
<br />
<!-- <figure>
<img src="assets/images/case-study/acme_arch_1.png" class="case-study-image" />
<figcaption>Fig. X: pilot new project command</figcaption>
</figure>
<br /> -->
<h4>Creating and Configuring the App - <code>pilot new app</code></h4>
<p>
<code>pilot new app</code> configures general information about your application. It will prompt you for some of the necessary information that is needed to
deploy your application like what cloud provider to deploy to, the application name, and other pertinent information.
</p>
<br />
<figure>
<img src="assets/images/case-study/pilot_new_app.gif" class="case-study-image" />
<figcaption>Fig. 18: pilot new app command</figcaption>
</figure>
<br />
<p>
Once the configuration is complete, a <code>waypoint.hcl</code> file will be generated. You need to push this file to the root of your application repository before you can
deploy. Or, if you prefer not to check the <code>waypoint.hcl</code> into your repository, you can copy the file contents and paste them into the UI when
selecting the location of the <code>waypoint.hcl</code> file.
</p>
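<p>
For reference, a generated <code>waypoint.hcl</code> follows Waypoint's standard shape: a project name plus one or more <code>app</code> blocks with <code>build</code>, <code>deploy</code>, and optional <code>release</code> stanzas. The sketch below is illustrative only; the plugin names and parameters will vary with the choices you make during configuration.
</p>

```hcl
project = "acme-project"

app "web" {
  build {
    # Build an image from the app's Dockerfile, then push it to a registry.
    use "docker" {}
    registry {
      use "aws-ecr" {
        region     = "us-east-1"
        repository = "web"
        tag        = "latest"
      }
    }
  }

  deploy {
    use "aws-ecs" {
      region = "us-east-1"
    }
  }
}
```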
<h4>Creating and Configuring the App - UI Configuration</h4>
<p>
From the UI, you need to configure your app and link it to your remote repository. More information on how to do this can be found via
<a href="https://www.waypointproject.io/docs/projects/git#polling" target="_blank">Waypoint's documentation</a>.
</p>
<h4>Deploying the App - <code>pilot up [PROJECT]/[APP]</code></h4>
<p>
<code>pilot up</code> is called with the project name and app name as arguments. It will look at the <code>waypoint.hcl</code> file and go through the three phases of
deployment: build, deploy, and release.
</p>
<br />
<figure>
<img src="assets/images/pilot-up-cli.gif" class="case-study-image" />
<figcaption>Fig. 19: pilot up command</figcaption>
</figure>
<br />
<p>
Keep in mind that depending on your application and cloud provider, the deployment process can take multiple minutes to complete.
</p>
<!-- Section 7 -->
<h2 class="h2">7 Design Decisions</h2>
<p>
Our design intention while developing Pilot was to make it easy for a small team of developers
to focus on writing code and deploying their applications with a cloud-agnostic approach. They
should not have to worry about the underlying infrastructure, only that their applications are made
available for consumption, while still having control of the platform and choice of provider. The
main design decisions centered on using Waypoint as a deployment tool and how to install it
to handle remote operations.
</p>
<br />
<h3>7.1 Extending Waypoint</h3>
<p>
An integral decision for Pilot was to extend Waypoint. It is a fairly new product in the open
source deployment space, but it gives you various options for building images and deploying applications
to multiple cloud providers. Using it as a base for Pilot's design provided several benefits.
</p>
<br />
<h4>Existing User Interface</h4>
<br />
<p>
We were able to focus on CLI and infrastructure development since Waypoint already has an
existing, well-designed user interface that is simple to understand.
</p>
<br />
<h4>Custom Plugin Ecosystem</h4>
<br />
<p>
The Waypoint SDK allowed us to develop our own custom plugins in Go. This enabled us to add
static web hosting as a deployment option using our Cloudfront and Cloud CDN plugins.
</p>
<br />
<h4>Trusted Open-Source Company</h4>
<br />
<p>
Waypoint was developed by HashiCorp, which we consider to be a trusted open-source company.
They are well known and have products that many companies have incorporated into their multi-cloud strategies.
</p>
<br />
<h4>Familiarity with HashiCorp Configuration Language</h4>
<br />
<p>
Developers who have used HashiCorp products such as Terraform, a popular infrastructure
automation tool, will already be familiar with the HashiCorp Configuration Language. The
syntax of its configuration files is meant to be easy to read and write.
</p>
<br />
<h3>7.2 Handling Remote Waypoint Operations</h3>
<p>
Remote collaboration is an important aspect of a PaaS, but it can introduce roadblocks when you implement your own.
</p>
<br />
<h4>Steps to Set Up Waypoint</h4>
<br />
<p>
Currently, it is difficult to configure Waypoint as a remote management option.
If a team wanted to integrate Waypoint into their platform, they would have to work through a few design decisions of their own.
</p>
<br />
<figure>
<img src="assets/images/case-study/waypoint_steps.png" class="case-study-image" />
<figcaption>Fig. 20: The necessary steps to configure a remote Waypoint server</figcaption>
</figure>
<br />
<p>
First, you would need to provision the necessary infrastructure for remote operations,
such as an AWS EC2 virtual machine. Then you would install and configure dependencies, such as
Waypoint and Docker, to be able to deploy applications to the cloud. Setting up networking
and security rules for the EC2 instance is also necessary to keep your instance secure.
</p>
<br />
<p>
The Waypoint server would also need the proper permissions, such as service credentials
for all cloud providers you would want to deploy to. Additionally, Waypoint's documentation
recommends configuring Docker-in-Docker for a remote Waypoint runner to handle building
and pushing images to a container registry. Finally, you have to configure your local
environment to communicate with the established remote pipeline.
</p>
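<p>
In shell terms, the manual process looks roughly like the outline below. The <code>waypoint</code> CLI is stubbed with a shell function so the outline runs anywhere; on a real VM these commands would talk to Docker and the Waypoint server, several flags (such as auth tokens) are omitted, and the address is a placeholder.
</p>

```shell
# `waypoint` stubbed for illustration only; the real CLI is HashiCorp's.
waypoint() { echo "waypoint $*"; }

# On the provisioned VM: install the Waypoint server and runner on Docker.
waypoint install -platform=docker -accept-tos

# On the local machine: point the CLI at the remote server (gRPC port 9701).
# Auth and TLS flags are omitted here for brevity.
waypoint context create -server-addr=198.51.100.7:9701 -set-default remote
```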
<br />
<p>
We believed automating this process would remove a large pain point for our users.
</p>
<br />
<h4>Using a Remote Waypoint Server</h4>
<br />
<p>
We decided on a remote Waypoint server as the backbone of Pilot's
deployment operations across multiple cloud providers. But you might be wondering:
why was a remote Pilot server a good option in the first place?
</p>
<br />
<figure>
<img src="assets/images/case-study/local_waypoint.png" class="case-study-image" />
<figcaption>Fig. 21: A local Waypoint configuration with a single user</figcaption>
</figure>
<br />
<p>
As you can see, a simple Waypoint configuration is much easier to set up
locally. All it requires is a Waypoint binary and a server container
running on your local machine. A runner container is not necessary for taking
on deployment jobs, since the Waypoint binary serves as the runner in this
type of configuration, which means less management overhead.
</p>
<br />
<figure>
<img src="assets/images/case-study/remote_waypoint.png" class="case-study-image" />
<figcaption>Fig. 22: A remote Waypoint configuration with multiple users</figcaption>
</figure>
<br />
<p>
A remote Pilot server, on the other hand, gives our users more flexibility.
The Pilot server can be provisioned on any of our supported cloud providers
to easily allow collaboration within a team. Users do not have to make
their local machines accessible on a network, and they get a centralized
management server for project and application deployments.
</p>
<br />
<p>
We also remove the concern of installing local dependencies for the
deployment lifecycle. You will not have to install Docker, Pack, or any
additional tooling, since they are available on the Pilot server. We do,
however, provide the flexibility of managing the Pilot server through SSH access.
</p>
<br />
<p>
Having a remote Pilot server is good and all, but how should we install
the Waypoint server on it? Waypoint allows installing servers and runners
using Docker, Kubernetes, and even HashiCorp’s Nomad orchestration service.
For Pilot, the options came down to Docker and Kubernetes as the installation platform
for our containers.
</p>
<br />
<figure>
<img src="assets/images/case-study/k8s_cluster.png" class="case-study-image" />
<figcaption>Fig. 23: a Kubernetes cluster running custom Pilot nodes</figcaption>
</figure>
<br />
<p>
With Kubernetes orchestration, we could spin up a cluster that would give us
the benefits of high availability and scalability. Multiple runners
set up on a cluster could handle concurrent application deployments,
a value-add for teams wanting simultaneous deployments
without a wait time. A cluster could also have self-healing capabilities
to bring containers back up if they were to go down.
</p>
<br />
<p>
That all sounds great, but Kubernetes has some drawbacks for users new to it.
</p>
<br />
<p>
Kubernetes introduces extra overhead for smaller development teams. It is
difficult to understand and implement: you will spend a lot of time in
documentation to wrap your head around proper configuration and management.
With that in mind, it is not the best option if you want to get up and running quickly.
</p>
<br />
<p>
Kubernetes clusters can also be difficult to maintain without a dedicated DevOps engineer or team member who has DevOps experience.
</p>
<br />
<figure>
<img src="assets/images/case-study/pilot_server.png" class="case-study-image" />
<figcaption>Fig. 24: the Pilot Server with Waypoint server and runner containers</figcaption>
</figure>
<br />
<p>
With Pilot, we decided to have the Pilot server virtual machine provision a single
server and runner container on Docker. Through our testing as a
small development team, a single Waypoint server and runner was enough for our needs.
This makes things easier for users, as they only have to manage a single virtual machine
and a couple of containers, without the headache of a Kubernetes cluster on top.
</p>
<br />
<p>
Runners are considered an advanced configuration, but you could spin up more on the
Pilot server if needed; documentation is available on the Waypoint site.
</p>
<br />
<!-- Section 8 -->
<h2 class="h2">8 Implementation Challenges</h2>
<p>
There were three major challenges we ran into while developing Pilot: creating
Waypoint plugins, building out a custom image capable of handling our Waypoint
operations, and whether to use Docker-in-Docker for image builds.
</p>
<br />
<h3>8.1 Creating Waypoint Plugins</h3>
<p>
Plugins are binaries written in Go that implement one or more Waypoint components.
Components correspond to stages of the deployment lifecycle, such as building artifacts
and pushing them to a registry. You can think of them as middleware that Waypoint
injects into the lifecycle.
</p>
<br />
<p>
When we started learning about Waypoint, we realized that many of the built-in
plugins were geared towards deploying full-stack or backend applications. We felt
there was an opportunity for plugins that could deploy static assets from
something like a React application, so we were happy to see that Waypoint provides an
SDK that allows developers to create their own plugins. You can create plugins that
execute for the entire build, deploy, and release cycle or for individual parts of it.
</p>
<br />
<p>
Pilot comes with several plugins baked into our custom Waypoint image:
</p>
<ul>
<li>
A Yarn plugin that executes during the build phase to bundle frontend applications into static files.
</li>
<li>
An Amazon Cloudfront plugin that handles the deploy and release phases. It uploads
static files to S3 and releases a static site to a Cloudfront distribution.
</li>
<li>
Finally, a Cloud CDN plugin for GCP that works like the Cloudfront plugin, uploading static files
to Cloud Storage and releasing on Cloud CDN.
</li>
</ul>
<br />
<p>
Developing these plugins introduced their own challenges. For one, the documentation for creating plugins
was a bit light on context: it and the associated tutorial focus mainly on the build phase. Since the
information for additional components was more conceptual, we had to rely on the source code of the built-in
plugins, custom plugins from other developers, and trial and error to get the functionality we wanted.
</p>
<br />
<figure>
<img src="assets/images/case-study/sdk_table.png" class="case-study-image" />
<figcaption>Fig. 25: a comparison table of the AWS and GCP SDKs</figcaption>
</figure>
<br />
<p>
Another problem was SDK support depending on the cloud provider. We found the AWS SDK to be robust and
easy to work with. Anything we needed to provision for deploying to Cloudfront could be done using it.
</p>
<br />
<p>
The GCP SDK, however, was limited in some ways. Creating Cloud Storage buckets and uploading assets was
supported, but provisioning the resources needed for a Cloud CDN release was not. Additionally, deleting
certain resources via the SDK was only partially supported, which was a drawback when we wanted to
tear down those resources automatically.
</p>
<br />
<figure>
<img src="assets/images/case-study/gcp_wrapper.png" class="case-study-image" />
<figcaption>Fig. 26: a gcloud CLI command with its respective custom SDK function</figcaption>
</figure>
<br />
<p>
We decided to combine the SDK with the gcloud CLI to achieve the same functionality as
the Cloudfront plugin. We created a wrapper around gcloud commands to provide the familiarity
and usability of an SDK, and it worked well for our needs.
</p>
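<p>
The wrapper idea can be sketched in a few lines of shell. The function names here are hypothetical (Pilot's actual wrapper is written in Go), and <code>DRY_RUN</code> makes the sketch print the command it would run instead of invoking gcloud:
</p>

```shell
DRY_RUN=1

# Route every call through one function so callers read like SDK calls.
gcloud_run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "gcloud $*"
  else
    gcloud "$@"
  fi
}

# Hypothetical wrapper for a step the Go SDK could not provision:
# a backend bucket that fronts Cloud Storage with Cloud CDN enabled.
create_backend_bucket() {
  gcloud_run compute backend-buckets create "$1" \
    --gcs-bucket-name="$2" --enable-cdn
}

create_backend_bucket web-backend my-static-site
```

With <code>DRY_RUN=0</code> the same call would execute the real gcloud command, so the rest of the plugin code never assembles command strings directly.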
<br />
<p>
However, needing the gcloud CLI installed created a dependency for our Waypoint containers
when deploying front-end applications using our Cloud CDN plugin.
</p>
<br />
<h3>8.2 Building a Custom Docker Image</h3>
<p>
By default, Waypoint uses HashiCorp's Waypoint image, which is publicly available
on Docker Hub. It includes the Waypoint binary and the built-in plugins developed by HashiCorp.
But this introduced a problem for our team when it came to adding our own plugins.
</p>
<br />
<figure>
<img src="assets/images/case-study/waypoint_binary_1.png" class="case-study-image" />
<figcaption>Fig. 27: the Waypoint image unable to pull our custom plugins</figcaption>
</figure>
<br />
<p>
We wanted to include our custom plugins for Yarn builds and for Cloudfront and Cloud CDN
deployments. Waypoint does not currently have a community plugin system
for making custom plugins generally available. Our plugins also had their own
dependencies, like NodeJS and Yarn to build static files and the gcloud CLI for Cloud CDN provisioning.
</p>
<br />
<figure>
<img src="assets/images/case-study/waypoint_binary_2.png" class="case-study-image" />
<figcaption>Fig. 28: our custom Waypoint image</figcaption>
</figure>
<br />
<p>
We decided to create our own custom Waypoint development and release images that can
be pulled freely from Docker Hub. To do this, we had to reverse engineer Waypoint's
default image and source code to figure out how Waypoint starts the server and runner
containers, as well as what packages the image had internally.
</p>
<br />
<p>
Our investigation found the image was essentially a barebones Linux distribution
limited to the Bourne shell. Not much else stood out in terms of dependencies for
normal Waypoint installations. This meant using that image as a base would require
installing multiple standard Linux packages in our Dockerfile.
</p>
<br />
<p>
We initially started with a standard Node image and only needed to install
gcloud as a dependency. This worked fine, but the resulting image was unnecessarily
large at approximately 1.6 GB. The Node image has other packages installed that we really
did not need, and Waypoint server installs took longer as a result.
</p>
<br />
<p>
We were keen on downsizing the image build as much as possible and attempted multiple
prototypes. Eventually, we discovered an Alpine-based image that Google maintains, which
contains gcloud, and decided to use it as the base for our custom image. This cut down the
bloat by 50% and sped up Waypoint server installs using Pilot's image.
</p>
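<p>
The resulting Dockerfile follows this general shape. The image tag and package names below are illustrative; the actual Pilot image also bundles the Waypoint binary and our custom plugin binaries:
</p>

```dockerfile
# Start from Google's Alpine-based image, which already ships gcloud,
# instead of a full Node image.
FROM google/cloud-sdk:alpine

# Add only what the custom plugins need: Node and Yarn for static builds.
RUN apk add --no-cache nodejs yarn

# ... copy in the Waypoint binary and Pilot's custom plugin binaries ...
```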
<h3>8.3 Docker-in-Docker Image Builds</h3>
<p>
Another challenging aspect of remote Waypoint operations is executing image builds
and pushes to a container registry from a runner container. There are several
so-called Docker-in-Docker methods that can accomplish this; a few are outlined below.
</p>
<br />
<figure>
<img src="assets/images/case-study/docker_1.png" class="case-study-image" />
<figcaption>Fig. 29: a container with the Docker socket mounted</figcaption>
</figure>
<br />
<p>
The first method uses <code>docker.sock</code>, the default Unix socket that
the Docker daemon listens on. From a Docker host, you run a command that starts a container
with a Docker binary installed and mounts the host's socket path, allowing Docker
commands to run from within the container. When commands are executed, all operations, such
as image builds, pushes, and container starts, happen on the host. Any started containers
become siblings of the container issuing the commands.
</p>
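<p>
The socket-mount method boils down to a single <code>docker run</code> flag. The sketch below only assembles and prints the command so it can run without a Docker daemon; the image and inner command are illustrative:
</p>

```shell
SOCK=/var/run/docker.sock

# Mount the host's Docker socket into a container that has a docker CLI;
# every `docker` command inside then talks to the *host* daemon.
cmd="docker run --rm -v ${SOCK}:${SOCK} docker:cli docker ps"
echo "$cmd"
```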
<br />
<figure>
<img src="assets/images/case-study/docker_2.png" class="case-study-image" />
<figcaption>Fig. 30: a privileged container running on a Docker host</figcaption>
</figure>
<br />
<p>
The second method uses an outer container started from the official Docker-in-Docker image
that Docker maintains. The container must be started in privileged mode, which can have
adverse side effects.
</p>
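<p>
By contrast, this second method starts the official <code>docker:dind</code> image with the <code>--privileged</code> flag. Again, the command is only printed here rather than executed, and the container name is a placeholder:
</p>

```shell
# A privileged container running its *own* inner Docker daemon.
cmd="docker run --privileged -d --name dind docker:dind"
echo "$cmd"
```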
<br />
<p>
The container gets full access to all devices on the host. This is the same access
that processes running outside of containers have, which can be a security flaw
if the privileged container were compromised.
</p>
<br />
<p>
Additionally, all Docker operations would occur within the container. Images would be built and
stored inside the container, which is not ideal when containers should be lightweight.
Started containers would be nested as children within the outer container, creating
additional management overhead through the extra layer of abstraction.
</p>
<br />
<p>
Evaluating Docker-in-Docker generated its own problems. It is generally not recommended