kubeadm upgrade node failing with "failed to get config map: Unauthorized"


Are you getting the errors below when running kubeadm upgrade node?

[upgrade] Reading configuration from the cluster...
[upgrade] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
unable to fetch the kubeadm-config ConfigMap: failed to get config map: Unauthorized
To see the stack trace of this error execute with --v=5 or higher

Chances are that the kubelet certificate has expired and cannot be used to upgrade the node. An strace of that command reveals that it does not actually use the current user's kubeconfig but the file /etc/kubernetes/kubelet.conf. That file in turn points to the key and cert to be used. In my installation both cert and key referred to /var/lib/kubelet/pki/kubelet-client-current.pem as follows:

client-certificate: /var/lib/kubelet/pki/kubelet-client-current.pem
client-key: /var/lib/kubelet/pki/kubelet-client-current.pem

That file is in turn a symlink to the real file, in my case /var/lib/kubelet/pki/kubelet-client-2020-06-04-18-38-46.pem - which for some reason had not been updated recently.
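To confirm that expiry really is the culprit, openssl can check the certificate the symlink points at. A minimal sketch; the path is the default kubelet location, so adjust it if your kubelet.conf points elsewhere:

```shell
#!/bin/sh
# Check whether the kubelet client certificate has expired.
# Path taken from the default /etc/kubernetes/kubelet.conf setup.
cert_expired() {
    # 'openssl x509 -checkend 0' exits non-zero if the cert is already expired
    ! openssl x509 -checkend 0 -noout -in "$1" 2>/dev/null
}

CERT=/var/lib/kubelet/pki/kubelet-client-current.pem
if cert_expired "$CERT"; then
    echo "$CERT is expired (or unreadable) - renew it before 'kubeadm upgrade node'"
else
    echo "$CERT is still valid"
fi
```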

You need to create an updated cert/key file there and point the /var/lib/kubelet/pki/kubelet-client-current.pem symlink at the new file instead.

Problem solved.

Gluster cluster split in two?


It can happen that a Gluster cluster gets divided in two parts. I'm not talking about a volume split-brain here but a whole cluster. Something might have gone wrong when probing a node. Or, as in our case, when adding aliases for nodes the peer info file was corrupted (there seems to be a maximum name length for nodes), which caused some nodes to believe they were in another cluster.

The solution is to first decide which nodes you consider to be the proper cluster. Running gluster peer status will show you which other nodes are considered to be in the same group as the node you run the status command on. Nodes in "Peer Rejected" state might think they are part of another cluster. If most of the nodes are in "Peer Rejected" state, you should probably run the command on one of those rejected nodes instead, and you will see that most nodes there are in an ok state.

On all the nodes in rejected state, run the following procedure:

  1. Stop glusterd
  2. Remove all files from /var/lib/glusterd except the glusterd.info file
  3. Start glusterd again
  4. Run a gluster peer probe against a member node.
  5. Restart glusterd again
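The steps above can be sketched as a small script. The state directory /var/lib/glusterd is the Gluster default, and the peer name is a placeholder for any node in the part of the cluster you consider correct:

```shell
#!/bin/sh
# Sketch of the recovery steps for a node in "Peer Rejected" state.
recover_rejected_peer() {
    good_peer=$1                        # placeholder: a node in the healthy cluster
    state_dir=${2:-/var/lib/glusterd}   # default Gluster state directory

    systemctl stop glusterd
    # Step 2: remove all state except glusterd.info (it holds this node's UUID)
    find "$state_dir" -mindepth 1 ! -name glusterd.info -delete
    systemctl start glusterd
    gluster peer probe "$good_peer"
    systemctl restart glusterd
}
```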

Other lessons learned:

Do make sure that you save the glusterd.info file; otherwise a new one will be created and you will effectively be creating a new node with the same name. To solve that, stop the glusterd daemon on all nodes, remove the faulty uuid from /var/lib/glusterd/peers and restart glusterd on all nodes again.
I did not spot this error immediately and was struggling with a lot of locking errors in the glusterd.log file, and any "gluster volume status" command would just hang forever.

Running out of ephemeral storage in kubernetes?


This is how I rescued running systems that were about to run out of ephemeral storage in Kubernetes.

Ephemeral storage is local temporary storage used by Kubernetes, most notably for the emptyDir kind of volume. Depending on the type of pods running, this might not require much storage, but it can also need significant space. One particular case is if you run docker-in-docker: all images pulled by that pod will be stored in the ephemeral space.

This is what I did on the relevant nodes:

  1. First I stopped further scheduling with: kubectl cordon <node>
  2. Then I made sure that none of the pods were needed anywhere else in the system.
  3. Then I ran: kubectl drain --force --delete-emptydir-data --ignore-daemonsets <node>
  4. Stopped kubelet with: systemctl stop kubelet
  5. Stopped all containers with: docker kill $(docker ps -aq) and docker rm $(docker ps -aq)
  6. Unmounted all current mounts below /var/lib/kubelet.
  7. Created a new LVM volume to hold the ephemeral storage, copied the current ephemeral storage from /var/lib/kubelet to the new volume and mounted the new volume there instead (cleaning out the old data first).
  8. Then started kubelet and uncordoned the node with systemctl start kubelet and kubectl uncordon <node>
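The steps above, sketched as shell functions. The node name and the logical volume path are placeholders for whatever your environment uses; treat this as an outline, not a drop-in script:

```shell
#!/bin/sh
# Outline of moving /var/lib/kubelet to a dedicated volume.
# NODE and NEW_LV are placeholders - substitute your own values.
NODE=worker-1
NEW_LV=/dev/vg0/kubelet

drain_node() {
    kubectl cordon "$NODE"
    kubectl drain --force --delete-emptydir-data --ignore-daemonsets "$NODE"
}

move_kubelet_storage() {
    systemctl stop kubelet
    docker kill $(docker ps -aq)
    docker rm $(docker ps -aq)
    # unmount any pod volumes still mounted below /var/lib/kubelet
    awk '$2 ~ "^/var/lib/kubelet/" {print $2}' /proc/mounts | xargs -r umount
    # copy the current state onto the new volume
    mount "$NEW_LV" /mnt
    cp -a /var/lib/kubelet/. /mnt/
    umount /mnt
    rm -rf /var/lib/kubelet/*          # clean out the old data first
    mount "$NEW_LV" /var/lib/kubelet   # add to /etc/fstab to make this permanent
    systemctl start kubelet
    kubectl uncordon "$NODE"
}
```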


Did your kubelet certificate expire in k8s?


For some reason the kubelet's self-signed certificate had expired in my cluster. That is, the certificate for the kubelet's own API service running on port 10250 (i.e. not the client cert that the kubelet uses to talk to the api-servers). It's supposed to be a self-signed certificate, but it was not renewed.

The problem was not very obvious, but we saw it when the metrics service did not work properly. It complained about expired certificates for port 10250 on the nodes.

I could not find any article about how to re-create this certificate. Sure, kubeadm certs has a lot of renewal options, but not for the actual kubelet HTTPS port as far as I could find out.

The solution turned out to be quite simple. Just remove the two files /var/lib/kubelet/pki/kubelet.crt and /var/lib/kubelet/pki/kubelet.key and restart the kubelet service with systemctl restart kubelet.

The kubelet will then generate new self-signed certs.
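As a script, the fix plus a follow-up check of the serving certificate's dates looks something like this; the node name is a placeholder:

```shell
#!/bin/sh
# Regenerate the kubelet's self-signed serving certificate (port 10250).
regen_kubelet_serving_cert() {
    rm -f /var/lib/kubelet/pki/kubelet.crt /var/lib/kubelet/pki/kubelet.key
    systemctl restart kubelet
}

# Inspect the (re)generated certificate's validity period.
# "node1" is a placeholder hostname for the node to check.
show_kubelet_cert_dates() {
    node=${1:-node1}
    echo | openssl s_client -connect "$node:10250" 2>/dev/null \
         | openssl x509 -noout -dates
}
```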

In the end, though, this was shown not to be the problem. First, the metrics-server deployment needs to run with the container argument --kubelet-insecure-tls, at least if the kubelets run with self-signed certs.
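One way to add that flag to an existing metrics-server deployment is a JSON patch. This is a sketch under the assumption that the container already has an args list and is the first container in the pod spec; adjust the index if your deployment differs:

```shell
#!/bin/sh
# Append --kubelet-insecure-tls to the metrics-server container arguments.
# Assumes container index 0 already has an args list - adjust as needed.
add_insecure_tls_flag() {
    kubectl -n kube-system patch deployment metrics-server --type=json \
        -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'
}
```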

Our root problem was that one api-server was running with faulty proxy settings, which caused its internal calls to the metrics server to fail.

How to pre-configure a Raspberry Pi image for Swedish wifi and enable ssh


This will set up a Raspbian image to be used with a Swedish wifi and with ssh enabled. Done on Linux.


  • Map the partitions from the image with:
        kpartx -a <name-of-unpacked-raspbian-image>.img
  • Run losetup to see which loop device was used. Let's say it was loop4 in this case.
  • Mount the root partition of the image (second partition) under /mnt:
       sudo mount /dev/mapper/loop4p2 /mnt


  • Create the file /mnt/etc/wpa_supplicant/wpa_supplicant.conf with the following content:
    ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
  • Replace the '1' with a '0' in all files /mnt/var/lib/systemd/rfkill/platform-*
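For reference, a complete wpa_supplicant.conf for a Swedish network looks something like the sketch below (country=SE is what makes the wifi region Swedish); the ssid and psk values are placeholders you must replace with your own:

```
country=SE
ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
update_config=1

network={
    ssid="your-ssid"
    psk="your-wifi-passphrase"
}
```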

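The rfkill step can be scripted with sed. A small sketch; the directory argument defaults to the mounted image path from the steps above:

```shell
#!/bin/sh
# Flip the rfkill soft-block state files from 1 (blocked) to 0 (unblocked)
# so wifi is enabled on first boot.
unblock_rfkill() {
    dir=${1:-/mnt/var/lib/systemd/rfkill}
    sed -i 's/^1$/0/' "$dir"/platform-*
}
```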
Enable ssh daemon:

  • ln -s /lib/systemd/system/ssh.service /etc/systemd/system/
  • ln -s /lib/systemd/system/ssh.service /etc/systemd/system/multi-user.target.wants/


  • Run: umount /mnt
  • Drop the device-mapper mappings:
        kpartx -d <name-of-unpacked-raspbian-image>.img

Now your image file should be ready to write to a MicroSD card, and the Raspberry Pi should boot up directly onto the wifi network with ssh enabled.

Extra stuff

Enable camera:

  • Run kpartx again and this time mount /dev/mapper/loop4p1 (the boot partition) as /mnt
  • Add the following lines to the "[all]" section of /mnt/config.txt: