Did your kubelet certificate expire in k8s?

For some reason the kubelet's self-signed certificate had expired in my cluster. This is the certificate for the kubelet's own API service, running on port 10250 (i.e. not the client cert that kubelet uses to talk to the api-servers). It is supposed to be a self-signed certificate, but it had not been renewed.

The problem was not very obvious, but we saw it when the metrics-server did not work properly. It complained about expired certificates for port 10250 on the nodes.

I could not find any article about how to re-create this certificate. Sure, kubeadm certs has a lot of renewal options, but none for the actual kubelet https port as far as I could find out.

The solution turned out to be quite simple. Just remove the two files /var/lib/kubelet/pki/kubelet.crt and /var/lib/kubelet/pki/kubelet.key and restart the kubelet service with systemctl restart kubelet.

The kubelet will then generate new self-signed certs.
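
The steps above could be wrapped in a small script like the sketch below. It is only an illustration: the run helper and the DRY_RUN switch are my own additions, and by default it only prints what it would do. It assumes the default path /var/lib/kubelet/pki and that openssl is installed if you want to see the old expiry date.

```shell
# Print commands by default; set DRY_RUN=0 to really execute them (as root).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "would run: $*"; else "$@"; fi
}

# Remove the kubelet serving cert and key, then restart kubelet so it
# generates a new self-signed pair.
renew_kubelet_serving_cert() {
  pki_dir="${1:-/var/lib/kubelet/pki}"
  # Show the current expiry date first, if the certificate is readable.
  if [ -r "$pki_dir/kubelet.crt" ] && command -v openssl >/dev/null 2>&1; then
    openssl x509 -in "$pki_dir/kubelet.crt" -noout -enddate || true
  fi
  run rm -f "$pki_dir/kubelet.crt" "$pki_dir/kubelet.key"
  run systemctl restart kubelet
}
```

Call renew_kubelet_serving_cert on the node to review the commands, then repeat with DRY_RUN=0 once they look right.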

In the end though, this turned out not to be the problem. First, the metrics-server deployment needs to be run with the container argument --kubelet-insecure-tls, at least if the kubelets run with self-signed certs.
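
One way to add that argument to a running deployment is a JSON patch, sketched below. The deployment name metrics-server, the namespace kube-system and the container index 0 are assumptions based on the usual manifest and not something from my cluster notes, and the patch only works if the container already has an args list. By default the function only prints the kubectl command.

```shell
# JSON patch that appends the flag to the args of the first container.
insecure_tls_patch() {
  printf '%s\n' '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'
}

# Prints the kubectl command by default; set DRY_RUN=0 to execute it.
# Arguments: namespace (default kube-system), deployment (default metrics-server).
patch_metrics_server() {
  ns="${1:-kube-system}"
  name="${2:-metrics-server}"
  set -- kubectl -n "$ns" patch deployment "$name" --type=json -p "$(insecure_tls_patch)"
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "would run: $*"; else "$@"; fi
}
```

If the deployment is managed by helm or a manifest in git, it is better to add the argument there so it survives the next rollout.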

Our root problem was that one api-server was running with faulty proxy settings, which caused its internal calls to the metrics-server to fail.

Running out of ephemeral storage in kubernetes?

This is how I rescued running systems that were running out of ephemeral storage in kubernetes.

The ephemeral storage is local temporary storage used by kubernetes, most noticeably for the emptyDir kind of volume. Depending on the type of pods running, this might not require a lot of storage, but it can also need significant space. One particular case is if you run docker-in-docker. Then all images pulled by that pod will be stored in the ephemeral space.

What I did was the following on the relevant nodes.

  1. First I stopped further scheduling with: kubectl cordon <node>
  2. Then I made sure that none of the pods were in use anywhere in the system.
  3. Then I ran: kubectl drain --force --delete-emptydir-data --ignore-daemonsets <node>
  4. Stopped kubelet with: systemctl stop kubelet
  5. Stopped all containers with: docker kill $(docker ps -aq) and docker rm $(docker ps -aq)
  6. Unmounted all current mounts below /var/lib/kubelet.
  7. Created a new LVM volume to hold the ephemeral storage. Copied the current ephemeral storage from /var/lib/kubelet to the new volume and mounted the new volume there instead (cleaning out the old data first).
  8. Then started kubelet and uncordoned the node with systemctl start kubelet and kubectl uncordon <node>
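
The command steps above can be collected in a script, roughly as sketched here. The function names, the run helper and the DRY_RUN switch are only for illustration, and the findmnt pipeline is one possible way to do the unmounting, not necessarily how I did it by hand. Step 2 (checking that the pods are not in use) and step 7 (the LVM work) depend on your setup and are left as manual steps. The sketch also assumes kubectl works on the machine where you run it; otherwise run the kubectl lines from your admin host.

```shell
# Print commands by default; set DRY_RUN=0 to really execute them (as root).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "would run: $*"; else "$@"; fi
}

# Steps 1 and 3-6: take the node out of service and stop everything on it.
drain_and_stop() {
  node="$1"
  run kubectl cordon "$node"
  run kubectl drain --force --delete-emptydir-data --ignore-daemonsets "$node"
  run systemctl stop kubelet
  # Kill running containers, then remove all containers.
  run sh -c 'docker ps -q | xargs -r docker kill'
  run sh -c 'docker ps -aq | xargs -r docker rm'
  # Unmount everything below /var/lib/kubelet, deepest paths first.
  run sh -c "findmnt -rn -o TARGET | grep '^/var/lib/kubelet/' | sort -r | xargs -r -n1 umount"
}

# Step 8: bring the node back after the new volume is mounted.
start_and_uncordon() {
  node="$1"
  run systemctl start kubelet
  run kubectl uncordon "$node"
}
```

Run drain_and_stop <node>, do the volume work by hand, then run start_and_uncordon <node>.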

Done

Number of gerrit-trigger connections keeps growing using helm jenkins

When using the gerrit-trigger plugin in Jenkins and configuring everything from git, I have seen the number of ssh connections to the gerrit server grow until it eventually consumes all available connections. This happens with the groovy script that is given as the example for setting up the gerrit-trigger plugin. Since gerrit-trigger does not yet support Jenkins Configuration as Code (JCasC), it must be set up with a groovy script run from JCasC.

I found a way to solve this connection leak, which was caused by the configuration being reloaded quite often by the helm sidecar. The script below reuses an existing server entry if there is one instead of creating a new one on every reload.
This is my code section that makes it work in a helm deployment of jenkins:

jenkins:
  JCasC:

    gerrit-trigger: |
      groovy:
        - script: >
            import jenkins.model.Jenkins;
            import net.sf.json.JSONObject;
            import com.sonyericsson.hudson.plugins.gerrit.trigger.GerritServer;
            if (Jenkins.instance.pluginManager.activePlugins.find { it.shortName == "gerrit-trigger" } != null)
            {
                println("JCasC Groovy: Setting gerrit-trigger server plugin");
                def gerritPlugin = Jenkins.instance.getPlugin(com.sonyericsson.hudson.plugins.gerrit.trigger.PluginImpl.class);
                // Create new or attach to existing server
                def serverName = "my-gerrit";
                def GerritServer server;
                if (gerritPlugin.containsServer(serverName)) {
                    server = gerritPlugin.getServer(serverName);
                }
                else {
                    println("JCasC Groovy: Created new gerrit server ${serverName}");
                    server = new GerritServer(serverName);
                }
                // Stop the server before changing its configuration
                server.stop();
                def config = server.getConfig();
                config.setGerritHostName("<gerrit-server>")
                config.setGerritSshPort(29418)
                config.setGerritUserName("<gerrit-ssh-user>")
                config.setGerritFrontEndURL("<your-gerrit-url>:8080")
                config.setGerritAuthKeyFile(new File("/var/jenkins_home/.ssh/id_rsa.<gerrit-ssh-user>"))
                config.setGerritEMail("<jenkins-email>")
                config.setNumberOfReceivingWorkerThreads(3);
                config.setNumberOfSendingWorkerThreads(1);
                config.setUseRestApi(false)
                server.setConfig(config);
                gerritPlugin.addServer(server);
                server.start();
                server.startConnection();
                println("JCasC Groovy: Setting ${serverName} completed");
            }

How to pre-configure a Raspberry Pi image for Swedish wifi and enable ssh

This will set up a Raspbian image to be used with a Swedish wifi network and with ssh enabled. Done in Linux.

Prologue:

  • Map the partitions from the image with:
        kpartx -a <name-of-unpacked-raspbian-image>.img
  • Run losetup to see which loop device was used. Let's say it was loop4 in this case.
  • Mount the root partition of the image (second partition) under /mnt:
       sudo mount /dev/mapper/loop4p2 /mnt

Wifi:

  • Create the file /mnt/etc/wpa_supplicant/wpa_supplicant.conf with the following content:
    ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
    update_config=1
    country=SE
    network={
    ssid="<your-wifi-ssid>"
    psk="<your-wifi-password>"
    }
  • Replace the '1' with a '0' in all files /mnt/var/lib/systemd/rfkill/platform-*
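
The two wifi steps can also be scripted, for example as in this sketch. The function names are made up for the illustration, and it assumes the image root is already mounted (normally at /mnt) and that you run it as root. Note that an ssid or password containing double quotes would need extra care.

```shell
# Write wpa_supplicant.conf below the given image root.
# Arguments: root directory of the mounted image, ssid, password.
write_wifi_config() {
  root="$1"; ssid="$2"; psk="$3"
  mkdir -p "$root/etc/wpa_supplicant"
  cat > "$root/etc/wpa_supplicant/wpa_supplicant.conf" <<EOF
ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
update_config=1
country=SE
network={
    ssid="$ssid"
    psk="$psk"
}
EOF
}

# Set every saved rfkill state under the image root to 0 (not blocked).
unblock_rfkill() {
  root="$1"
  for f in "$root"/var/lib/systemd/rfkill/platform-*; do
    if [ -f "$f" ]; then
      echo 0 > "$f"
    fi
  done
}
```

With the image mounted at /mnt this would be used as write_wifi_config /mnt "<your-wifi-ssid>" "<your-wifi-password>" followed by unblock_rfkill /mnt.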

Enable ssh daemon:

  • ln -s /lib/systemd/system/ssh.service /mnt/etc/systemd/system/
  • ln -s /lib/systemd/system/ssh.service /mnt/etc/systemd/system/multi-user.target.wants/

Epilogue:

  • Run: umount /mnt
  • Drop the devicemapper mappings:
        kpartx -d <name-of-unpacked-raspbian-image>.img

Now your image file should be ready to write to a MicroSD card, and the Raspberry Pi should boot up directly onto the wifi network with ssh enabled.

Extra stuff

Enable camera:

  • Run kpartx as above, but mount /dev/mapper/loop4p1 (the boot partition) as /mnt instead
  • Add the following lines to the "[all]" section of /mnt/config.txt:
    start_x=1
    gpu_mem=128

Kubernetes certificate misery

Today I really had to exercise some certificate signing and debugging. It all started when I saw that a deployment would not run properly but was stuck in rollout status:

"waiting for deployment spec update to be observed...".

After reading all the system log files I could find and looking at the logs from the apiservers and controller-managers, I saw a lot of errors like:

"error retrieving resource lock kube-system/kube-controller-manager: Unauthorized"

This is a typical sign of an expired certificate in k8s. Kubernetes is supposed to renew its certificates automatically (they normally expire within a year), but this only happens if you do a k8s upgrade within that period. If you miss that, all certificates except the CA certs expire and most things stop working. From reading https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/ you learn that you can renew all certificates manually by using the command:

kubeadm alpha certs renew all

It really does renew them, but the problem is that when the whole of k8s is locked up, it does not pick up the updated certificates. The reason for this, I think, is that the running docker mounts, using overlayfs, miss these updates to the local /etc filesystem. To be really sure that the certs are updated, verify them with:

openssl x509 -in <cert> -noout -text
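
To avoid checking the files one by one, a small loop like the sketch below can print the expiry date of every certificate in the directory. The function names are just for illustration, and the default path /etc/kubernetes/pki is the kubeadm default. The loop only looks at the top level, so the etcd subfolder has to be checked separately.

```shell
# Print the expiry date of every *.crt file directly under a PKI directory.
list_cert_expiry() {
  pki_dir="${1:-/etc/kubernetes/pki}"
  for crt in "$pki_dir"/*.crt; do
    [ -f "$crt" ] || continue
    printf '%s: ' "$crt"
    openssl x509 -in "$crt" -noout -enddate
  done
}

# Exit status 0 if the certificate is still valid right now, 1 if it has expired.
cert_is_valid() {
  openssl x509 -in "$1" -noout -checkend 0 >/dev/null
}
```

For example, list_cert_expiry /etc/kubernetes/pki/etcd shows the etcd certs, and cert_is_valid <cert> can be used in an if statement.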

What I ended up doing was to stop the kubelet service on all masters, forcibly stop all docker containers running kube-apiserver and kube-etcd, and then restart the kubelet service. This revives the apiserver and etcd daemons, and they pick up the new certs.

But wait, there is more: kube-scheduler and kube-controller-manager do not have discrete certificates in the /etc/kubernetes/pki folder. Instead they have their certs base64-encoded in their respective config files, /etc/kubernetes/controller-manager.conf and /etc/kubernetes/scheduler.conf. Those had expired too.

Again I had to stop the kubelet daemon on all masters, extract the certificate data, and base64-decode it. Both certs then need to be renewed and signed by the k8s CA (/etc/kubernetes/pki/ca.{crt,key}). What I ended up doing was to import all of the above certs (and keys) into the superior certificate tool XCA.
Inside XCA I then renewed both certs (for all masters), exported the new certs, base64-encoded them and pasted them back into the above config files.
Make sure the certificate data starts with "LS0tLS1C...", which is the base64 encoding of "-----BEGIN ...".
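
The extraction and the sanity check can be done from the command line, roughly like this. It is a sketch with made-up function names, and it assumes the cert is stored under the key client-certificate-data on a single line, which is how kubeadm writes these files.

```shell
# Print the PEM client certificate embedded in a kubeconfig-style file.
extract_client_cert() {
  awk '$1 == "client-certificate-data:" {print $2; exit}' "$1" | base64 -d
}

# Succeed if the string looks like base64 of a PEM block:
# "-----B" (as in "-----BEGIN") encodes to "LS0tLS1C".
looks_like_base64_pem() {
  case "$1" in
    LS0tLS1C*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, extract_client_cert /etc/kubernetes/scheduler.conf | openssl x509 -noout -enddate shows when the scheduler's cert expires.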

I finished up by forcibly deleting all running kube-controller-manager and kube-scheduler containers (and their corresponding pause containers) on all masters and restarting kubelet again. Note: don't do this on all nodes at the same time, but on one complete node at a time.

Finally, all proxy pods needed to be restarted by just deleting them and letting the daemon-set re-create them. There is one for each node in the cluster.

I strongly recommend that you keep all the keys and certificates from k8s in a tool like XCA.

That was basically it.
