Spawning Kubernetes Clusters in CI for Integration and E2E tests

Mon, Jul 16, 2018

Making sure your application works correctly is an important step before deploying changes or merging a Pull Request. You want to be sure that incoming changes are not going to introduce any regressions and negatively effect any part of the system. This is usually done by writing and running Integration and E2E tests. In order to make sure you have a clean testing environment to prevent possible errors, and to make it easier to test all incoming changes, running Integration and E2E tests in CI is recommended.

However, when you’re developing complex applications, such as Kubernetes operators, controllers, or API servers, in order to simulate actual environments as close as possible, you need to have all dependencies installed and configured, which is not an easy task in CI. While Kubernetes provides many official and unofficial solutions for deploying clusters, running Kubernetes in CI is not an easy task.

kubeadm is the official and the most popular solution for bootstrapping clusters, but it’s not working out-of-box in CI. There’re many helpers utilizing kubeadm such as kube-spawn that could work in CI, but requires systemd and systemd-nspawn, which are not available in some CI systems, including Travis-CI.

Minikube is used to run Kubernetes locally for developing and experimenting, but with some tricks we can utilize it in CI to spawn a cluster for testing. Beside Minikube, there’re DIND (Docker-in-Docker) solutions that work in CI, such as kubeadm-dind-cluster.

In this blogpost, I’ll go through two most popular solutions for running Kubernetes in CI, Minikube and DIND. I’ll try to compare them and share my experience. Cloud-provider solutions and utilizing them will be also mentioned. By the end of the post, we’re going to see how you can debug Kubernetes in CI when it is not working as expected.

For this blogpost, I’ll use Travis-CI, because it’s the most popular solution in the wild and it’s free for open source projects.

Defining expectations

Before choosing a solution, it is very important to define expectations and what exactly do you need. That includes what Kubernetes version do you need, number of nodes in cluster, bootstrapping speed and more.

Some of the important vectors that you should pay attention when choosing a solution include:

Number of nodes in cluster. For example, Minikube is a single-node solution, so in case you have a need for more nodes, you’ll need to use kubeadm-dind-cluster or something similar.
Kubernetes version and/or number of Kubernetes versions supported,
Available Kubernetes features and customizability. For example, can you use Feature Gates to enable alpha and experimental features.
Bootstrap speed. Some solutions takes more time to bootstrap than others. This depends on many other factors, including speed of machine you’re using to run tests and number of nodes.
Security. Kubernetes uses RBAC, however, it is not available for every solution.

Beside defining your expectations, you must verify is your CI environment capable of running some solution.

At the time of writing this blogpost, Travis-CI uses Ubuntu 14.04 Trusty Tahr to run builds. Therefore, systemd and systemd-containers are not installed. Many popular solutions, including various DIND (Docker-in-Docker) solutions such as kube-spawn require systemd and systemd-containers, so you’ll not be able to use those solutions.

This include some of the most important points, however, it’s hard to choose the appropriate solution before you try them all. Let’s go ahead and compare the two most popular solutions, Minikube and kubeadm-dind-cluster.

Minikube

Minikube is the Kubernetes-official solution for deploying single-node Kubernetes clusters. Minikube is usually associated with VMs, which are usually not available in CI environments. This is even stated in their README:

Minikube runs a single-node Kubernetes cluster inside a VM on your laptop for users looking to try out Kubernetes or develop with it day-to-day.

Luckily, Minikube has none driver, which is not documented and promoted at all. The none driver uses your local Docker installation instead of VMs. The requirement for using none driver is to run it on Linux and it’s not recommended to use it on your local machines to prevent problems.

Lili Cosic has a great blog post on this topic, so make sure to check it out to learn more about running Minikube on Travis-CI! Lili also maintains a GitHub repository—lilc/travis-minikube with documented Travis-CI manifests for Kubernetes 1.9 and Kubernetes 1.10.

For a reference, this is how Travis-CI manifest for Kubernetes 1.10 looks like:

env:
# Set appropriate permissions to Minikube and Kubernetes related files.
- CHANGE_MINIKUBE_NONE_USER=true

before_script:
# Make root mounted as rshared to fix kube-dns issues.
- sudo mount --make-rshared /
# Download kubectl, which is a requirement for using minikube.
- curl -Lo kubectl https://storage.googleapis.com/kubernetes-release/release/v1.9.0/bin/linux/amd64/kubectl && chmod +x kubectl && sudo mv kubectl /usr/local/bin/
# Download minikube.
- curl -Lo minikube https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 && chmod +x minikube && sudo mv minikube /usr/local/bin/
- sudo minikube start --vm-driver=none --bootstrapper=localkube --kubernetes-version=v1.10.0
# Fix the kubectl context, as it's often stale.
- minikube update-context
# Wait for Kubernetes to be up and ready.
- JSONPATH='{range .items[*]}{@.metadata.name}:{range @.status.conditions[*]}{@.type}={@.status};{end}{end}'; until kubectl get nodes -o jsonpath="$JSONPATH" 2>&1 | grep -q "Ready=True"; do sleep 1; done

There are several important points that you need to pay attention in order to successfully run Minikube and Kubernetes 1.10 in Travis-CI. Many errors are silent, and as there is no SSH access to the CI environment out-of-box, issues are hard to debug.

Lili’s blogpost covers all the important points, but there’re two changes for Kubernetes 1.10 and Minikube v0.26 and newer:

sudo mount --make-rshared / — as of Kubernetes 1.10, it is required for /etc/kubernetes to be on a shared or slave mount. If the volume is not shared/slave, the kube-addon pods will fail to start, resulting in kube-dns pods not being deployed. This PR contains some more details why is this required.
Running Minikube with localkube bootstrapper — as of Minikube v0.26, the default bootstrapper is kubeadm, which doesn’t work in Travis-CI, so it’s required to tell Minikube to use the localkube bootstrapper. The bootstrappers and their differences will be covered later in this blog post.

RBAC and Minikube

As of Kubernetes 1.8, RBAC has became the most popular way of handling permissions and significantly improving security of your cluster. In order to enable it, you need to start the Kubernetes API Server with the --authorization-mode=RBAC flag.

Some solutions such as kubeadm do that out-of-box, but this is not a case for Minikube, at least not when started with the localkube bootstrapper. To get RBAC working in Minikube, you have to start it with the --extra-config=apiserver.Authorization.Mode=RBAC flag, such as:

sudo minikube start --vm-driver=none --bootstrapper=localkube --kubernetes-version=${KUBERNETES_VERSION} --feature-gates=CustomResourceSubresources=true --extra-config=apiserver.Authorization.Mode=RBAC

It’s is important to note that enabling RBAC in Minikube causes some services to stop working, including kube-dns, as there are no appropriate ServiceAccounts created out of box.

The issue kubernetes/minikube#1722 includes some more details about this, along with several solutions.

One of the comments mentions giving cluster-admin permissions to the default ServiceAccount in order to fix the problem. This is not the greatest solution looking from the security aspect, however, this is the easiest one. Usually, your Travis-CI Kubernetes clusters are supposed to be disposable, i.e. they’re deleted after the build is done, and they’re not exposed to the world, so it is okay to use this solution.

You can give cluster-admin permissions to the default ServiceAccount in kube-system namespace such as:

kubectl create clusterrolebinding add-on-cluster-admin --clusterrole=cluster-admin --serviceaccount=kube-system:default

Minikube Bootstrappers. Available Kubernetes versions.

Throughout the post, I have already mentioned localkube and kubeadm Minikube bootstrappers and it’s time to deep dive into them.

Minikube bootstrappers define how your Kubernetes cluster is created. It defines what tools are used to build the cluster, how is it ran, what dependencies are required and installed, and a lot of other things.

Minikube has two bootstrappers—localkube and kubeadm.

The kubeadm bootstrapper, as its name says, uses kubeadm to spawn a Kubernetes cluster. It is the newest bootstrapper and is supposed to replace the localkube bootstrapper in the future. It supports many Kubernetes versions, including latest ones. However, the kubeadm bootstrapper depends on systemd to run components needed by Kubernetes, such as kubelet, and therefore, it’s not possible to use it in Travis-CI.

The localkube bootstrapper provides all Kubernetes components as a single binary. It doesn’t require systemd and can be used in Travis-CI. As kubeadm is the most popular and official solution, as well as it’s harder to maintain everything as a single binary, the localkube bootstrapper is deprecated and will be removed in the future. As of Minikube v0.26, kubeadm is default, and as mentioned, you need to explicitly specify to use the localkube bootstrapper.

Even as the localkube bootstrapper is still available, its deprecated and not maintained anymore. Therefore, support for any newer Kubernetes version will not be added. The latest available Kubernetes version with the localkube is v1.10.0.

This is problematic if you depend on newer versions or you want to test your application against latest Kubernetes version in CI. There is no workaround for this problem beside using alternative solutions, such as DIND, which will also be covered in this blogpost.

Enabling alpha and experimental features

As latest available Kubernetes is v1.10.0, you could potentially miss some features required for your workflow. However, some feature goes into beta, and then into GA, the feature is available as an alpha feature. Alpha features are disabled by default and are guarded by the Feature Gates. To enable alpha feature, you need to turn on the appropriate feature gate.

For example, my project depends on CRD Status Subresource, which is available as beta in v1.11, but was first introduced in v1.10 as an alpha feature, guarded by the CustomResourceSubresources feature gate.

In order to enable a feature gate, you need to start Minikube with the --feature-gates flag such as:

sudo minikube start --vm-driver=none --bootstrapper=localkube --kubernetes-version=${KUBERNETES_VERSION} --feature-gates=CustomResourceSubresources=true

The list of the Kubernetes Feature Gates available for v1.10 can be found in the documentation.

Docker-in-Docker (DIND) and kubeadm

Dokcer-in-Docker assumes running Docker within a Docker container. While this could sound strange, this way you can run around some CI limitations.

There are many DIND and DIND-like solutions, but not all of them work in Travis-CI. One of the most popular DIND-like solutions is kube-spawn, but it depends on systemd-nspawn and as systemd is not available in Travis-CI, it’s not possible to use kube-spawn.

The kubeadm-dind-cluster utility can be used to create a Kubernetes cluster based on DIND. It works with Travis-CI and supports Kubernetes v1.8, v1.9 and v1.10.3.

The README file has instructions on how to get started, as well as has details about various settings. The Travis-CI manifest would look like the following one:

before_script:
# Download kubeadm-dind-cluster script and give it executable permissions.
- wget https://cdn.rawgit.com/kubernetes-sigs/kubeadm-dind-cluster/master/fixed/dind-cluster-v1.10.sh
- chmod +x dind-cluster-v1.10.sh
# Start Kubernetes cluster.
- ./dind-cluster-v1.10.sh up
# Add Kubectl directory to the PATH.
- export PATH="$HOME/.kubeadm-dind-cluster:$PATH"
# Wait for Kubernetes to be up and ready.
- JSONPATH='{range .items[*]}{@.metadata.name}:{range @.status.conditions[*]}{@.type}={@.status};{end}{end}'; until kubectl get nodes -o jsonpath="$JSONPATH" 2>&1 | grep -q "Ready=True"; do sleep 1; done

At this point, you have Kubernetes cluster ready to run your E2E tests. Compared to Minikube, kubeadm-dind-cluster uses kubeadm, which is recommended tool for bootstrapping clusters, as well as supports latests versions and is maintained.

Enabling alpha and experimental features

Similar as for Minikube, alpha and experimental features can be enabled using appropriate Feature Gate. The Feature Gates are specified by using the FEATURE_GATES environment variable. If no Feature Gates are specified, the MountPropagation Feature Gate will be enabled.

To enable a feature gate, you need to start kubeadm-dind-cluster such as:

# Enable MountPropagation and CustomResourceSubresources feature gates.
- export FEATURE_GATES="MountPropagation=true,CustomResourceSubresources=true"
# Start Kubernetes cluster.
- ./dind-cluster-v1.10.sh up

Number of Nodes and Bootstrapping Speed

Compared to Minikube, the kubeadm-dind-cluster takes much longer to bootstrap and to provide usable Kubernetes cluster.

First reason is number of nodes—the kubeadm-dind-cluster comes with 3 nodes by default, compared to Minikube which is a single-node cluster. The number of nodes can be configured using the NUM_NODES environment variable. The default value for NUM_NODES is 2, meaning master and two nodes will be bootstrapped. If you set NUM_NODES to zero, only master is going to be bootstrapped.

Second reason is time needed to download and build Docker images needed for Kubernetes. Travis-CI has fast connection, so the time needed to build all images is more significant.

The kubeadm-dind-cluster with master and 2 nodes takes about 6-10 minutes to bootstrap, while single-node Minikube cluster takes about 30-60 seconds to bootstrap.

Running Tests In Cloud

The Minikube and kubeadm-dind-cluster solutions are easy to set up and mostly important, they’re free. However, each solution has some drawbacks, but even ignoring that, some edge cases can’t be reproduced in Minikube or kubeadm-dind-cluster environments.

For example, on my GSoC project—etcdproxy-controller, the E2E tests in Travis-CI running Minikube were passing and everything worked wonderfully. I had the same results when running tests on my DigitalOcean kubeadm cluster.

But then, I remembered I have some GCP Credits left from Trial, and decided to deploy my controller to GKE. Shortly after getting started, I ran into two problems:

In order to create and manage RBAC roles, your account must be a cluster-admin,
In order to authenticate with Kubernetes when running controller in-cluster, you need to import the GCP authorization plugin.

The first problem is not related to my controller. However, the second one is, and I had to modify the controller code to import the GCP authorization plugin.

The two most common options for running Kubernetes in cloud:

Using managed solutions, which are offered by all major cloud providers—Google Kubernetes Engine, Amazon EKS and Azure Kubernetes Service. DigitalOcean is also joining the party as of September with Kubernetes on DigitalOcean. All of the mentioned providers have official or unofficial CLI tools that you can use to create and manage clusters: gcloud for GKE, eksctl by WeaveWorks for Amazon EKS and azure-cli for Azure.
Creating cluster from scratch using solution such as kubeadm or Ansible. In this case, Kubicorn can be a good solution to create and manage Kubernetes clusters. Kubicorn leverages kubeadm and allows you to easily, just with two commands, create a cluster. Kubicorn can also be used as a Go library, so you can even easier integrate with your current testing infrastructure.

Debugging in Travis-CI

When working with complex systems such as Kubernetes, which has many components and many dependencies, chances for errors are much higher. The negative downside when running in CI is that CI environment is hardly accessible, i.e. you can’t really SSH into the CI Virtual Machine to get logs and deploy some fixes.

For example, in case of CircleCI, you can re-run build with SSH enabled, and then SSH into the VM like you would SSH into any other machine.

But, is this possible for Travis-CI builds? Yes, it is!

However, the SSH feature is disabled by default for public repositories. It can be enabled for specific repository by sending request to the Travis-CI Support team.

The Running Build in Debug Mode portion of Travis-CI documentation contains information about how you can contact support to enable it, how to use it, and what do you need to pay attention to.

The Travis-CI support team has been really responsive in my case. I got the feature activated for about half an hour after sending the request, as well as all my questions answered.

Once enabled, you can invoke the Debug/SSH build by sending the appropriate API request using curl. The documentation contains information where you can find your access token, job ID, as well as what endpoint you need to use.

The biggest downside, and the reason why SSH/Debug builds are disabled by default, is that when you start the Debug build, the credentials for accessing the CI VM are written to the job log, which is public and available to all users. The CI VM is up for 30 minutes, but if somebody finds the credentials while VM is still running, somebody can access the CI VM over SSH and gain access to all environment variables and secrets.

Conclusion

While kubeadm-dind-cluster could sound like a much better solution compared to Minikube, especially if you need newer Kubernetes versions, both have various pros and cons. Choosing solution depends on many factors and your needs, and there is no universal solution.

Some of the problems will be solved once Travis-CI starts supporting Ubuntu 16.04 Xenial Xerus or Ubuntu 18.04 Bionic Beaver, systemd and systemd-containers, but there is still no ETA.

Big thanks to Lili Cosic for the awesome blog post that helped me to get started with Minikube, as well as to my mentors, Dr. Stefan Schimanski and David Eads, for all the great tips and for going through debugging with me.

If you have any questions, suggestions and feedback, reach out to me on Twitter or on Kubernetes Slack as xmudrii.

Thanks to my mentor Dr. Stefan Schimanski for reviewing this post!

xmudrii.com

Spawning Kubernetes Clusters in CI for Integration and E2E tests

Defining expectations

Minikube

RBAC and Minikube

Minikube Bootstrappers. Available Kubernetes versions.

Enabling alpha and experimental features

Docker-in-Docker (DIND) and kubeadm

Enabling alpha and experimental features

Number of Nodes and Bootstrapping Speed

Running Tests In Cloud

Debugging in Travis-CI

Conclusion