How to Reinstall Kubernetes on Ubuntu Server


[Update]: Make sure that you are using a recent Ubuntu release, preferably Ubuntu 22.04 LTS.

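Make sure that you have installed Docker. Keep in mind that Kubernetes 1.24 and later removed the built-in Docker Engine support (dockershim), so the kubelet talks to a CRI runtime such as containerd, or to Docker Engine through the cri-dockerd adapter.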
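A quick way to confirm the release before going further is to read VERSION_ID from /etc/os-release. The sketch below is only a minimal helper: the function name os_release_field is made up for this guide, and the file path is passed in as an argument so that it can also be tried on a copy of the file.

```shell
# Hypothetical helper: print the value of KEY from an os-release style file.
# Usage: os_release_field FILE KEY
os_release_field() {
    # Lines look like VERSION_ID="22.04"; strip the key and the quotes.
    sed -n "s/^$2=//p" "$1" | tr -d '"'
}

# Example (on the server):
#   os_release_field /etc/os-release VERSION_ID
```

On Ubuntu 22.04 the example should print 22.04; any other value is a hint to double-check the steps below against your release.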

1. Add the package signing key to your server. Before running this command, make sure curl is installed on your machine.

curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
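Since this step depends on curl (and the rest of the guide on Docker), it may be worth checking up front which tools are missing. This is a small sketch, not part of the original procedure; missing_tools is a hypothetical name.

```shell
# Hypothetical helper: print every command from the argument list that is
# not found on PATH, one per line. Prints nothing when all are present.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example:
#   missing_tools curl docker
# If curl is listed, install it first: sudo apt-get install curl
```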

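2. Add the Kubernetes repository, as it is not included in the Ubuntu package repositories. Note that this legacy apt.kubernetes.io repository has since been deprecated in favor of the community-owned pkgs.k8s.io; if the command below fails, follow the repository setup in the official kubeadm installation guide instead.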

sudo apt-add-repository "deb http://apt.kubernetes.io/ kubernetes-xenial main"

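3. Install the Kubernetes tools you will need to create and manage the cluster: kubeadm bootstraps the cluster, kubelet runs on every node and starts the pods, and kubectl is the command-line client.

sudo apt-get update
sudo apt-get install kubeadm kubelet kubectl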

4. Put the Kubernetes tools on hold so that apt does not upgrade them on its own, at least until everything is configured and running

sudo apt-mark hold kubeadm kubelet kubectl
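To confirm that the hold took effect, the output of apt-mark showhold can be compared against the three package names. The helper below is a sketch; unheld_packages is a hypothetical name, and it reads the list from stdin so it can be tested without apt.

```shell
# Hypothetical helper: read package names (the output of `apt-mark showhold`)
# on stdin and print the Kubernetes packages that are NOT on hold.
unheld_packages() {
    held=$(cat)
    for pkg in kubeadm kubelet kubectl; do
        # -x matches whole lines only, so "kubelet" does not match "kubelet-foo".
        printf '%s\n' "$held" | grep -qx "$pkg" || printf '%s\n' "$pkg"
    done
}

# Example:
#   apt-mark showhold | unheld_packages
```

An empty result means all three packages are held.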

5. If you are planning to run the Kubernetes master (control-plane) node on the same server as the worker node, you will have to either remove the master taint or add tolerations to your pods once the cluster is configured (see the [Note] after Step 9). Set the hostname of the server you are operating on to master-machine:

sudo hostnamectl set-hostname master-machine
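kubeadm uses the hostname as the node name by default, and node names must be valid lowercase DNS names. The sketch below checks the stricter single-label form before you set the hostname; is_dns_label is a hypothetical helper, not something kubeadm provides.

```shell
# Hypothetical helper: succeed when NAME is a valid DNS label (lowercase
# letters, digits and hyphens, no leading or trailing hyphen, max 63 chars).
is_dns_label() {
    [ "${#1}" -le 63 ] || return 1
    printf '%s\n' "$1" | grep -Eq '^[a-z0-9]([a-z0-9-]*[a-z0-9])?$'
}

# Example:
#   is_dns_label master-machine && sudo hostnamectl set-hostname master-machine
```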

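6. Initialize the control plane and tell kubeadm which IP range (CIDR) to use for the pod network. Replace FirstDigits.SecondDigits with the first two octets of a private range that does not overlap your LAN; the Flannel manifest used in Step 8 expects 10.244.0.0/16 by default.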

sudo kubeadm init --pod-network-cidr=FirstDigits.SecondDigits.0.0/16
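Because the placeholder above has to be replaced by hand, a typo in the CIDR is easy to make. The following sketch validates the value before it is handed to kubeadm; is_ipv4_cidr is a hypothetical helper and only checks the syntax, not whether the range overlaps your network.

```shell
# Hypothetical helper: succeed when $1 looks like an IPv4 CIDR (a.b.c.d/len).
is_ipv4_cidr() {
    # Only digits, dots and a slash are allowed at all.
    case $1 in *[!0-9./]*) return 1 ;; esac
    addr=${1%/*}
    len=${1#*/}
    [ "$addr" != "$1" ] || return 1            # no slash present
    case $len in ''|*[!0-9]*) return 1 ;; esac
    [ "$len" -le 32 ] || return 1
    # Split the address on dots and check each of the four octets.
    old_ifs=$IFS; IFS=.
    set -- $addr
    IFS=$old_ifs
    [ $# -eq 4 ] || return 1
    for octet in "$@"; do
        case $octet in ''|*[!0-9]*) return 1 ;; esac
        [ "$octet" -le 255 ] || return 1
    done
    return 0
}

# Example:
#   cidr=10.244.0.0/16
#   is_ipv4_cidr "$cidr" && sudo kubeadm init --pod-network-cidr="$cidr"
```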

7. After Step 6 completes, follow the instructions that kubeadm prints to finish the setup. The output from Step 6 will look like this:

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join YourLocalIPAddress:Port --token [HashToken] \
    --discovery-token-ca-cert-hash [Algorithm]:[TokenHere]
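
The join command at the end of this output is needed again in Step 9, so it can help to save the kubeadm init output to a file (for example by piping it through tee) and pull the command out later. The sketch below assumes such a saved file; extract_join_command and the file name are hypothetical.

```shell
# Hypothetical helper: print the "kubeadm join ..." command found in a saved
# copy of the kubeadm init output, folded onto a single line.
extract_join_command() {
    awk '
        /^[ \t]*kubeadm join / { capture = 1 }
        capture {
            line = $0
            more = sub(/\\[ \t]*$/, "", line)     # drop the continuation backslash
            gsub(/^[ \t]+|[ \t]+$/, "", line)     # trim surrounding whitespace
            cmd = (cmd == "" ? line : cmd " " line)
            if (!more) { print cmd; exit }        # last line of the command
        }
    ' "$1"
}

# Example (kubeadm-init.log is an assumed file name):
#   extract_join_command kubeadm-init.log
```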

8. Install a pod network add-on so that pods can communicate with each other. This guide uses Flannel. Run kubectl as the regular user that owns $HOME/.kube/config from Step 7:

kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
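Nodes report NotReady until a pod network add-on is running, so a simple check after this step is to list the nodes that are not Ready yet. This is a sketch that filters the output of kubectl get nodes; not_ready_nodes is a hypothetical name.

```shell
# Hypothetical helper: read `kubectl get nodes --no-headers` output on stdin
# and print the names of nodes whose STATUS column is not exactly "Ready".
not_ready_nodes() {
    awk '$2 != "Ready" { print $1 }'
}

# Example:
#   kubectl get nodes --no-headers | not_ready_nodes
```

An empty result means every node is Ready and schedulable.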

9. Now that you have completed Steps 6 to 8, it's time to join the worker node (the machine/server you want the pods to run on) to the control plane (the admin controller, which oversees every node; nodes run pods, and each pod runs one or more containers).
- Pay attention to the output of Step 6: the commands in Steps 7 and 9 are copied from it. Step 8 is the exception, because you have to decide which pod network add-on you want to use; the output shows the URL where you can read more about the options.

Run the join command on the worker node as root, using the address, token, and hash printed in Step 7:

sudo kubeadm join YourLocalIPAddress:Port --token ThisValueComesFromKubernetesAsOutPutFromStep7 --discovery-token-ca-cert-hash sha256:ThisValueComesFromKubernetesAsOutPutFromStep7
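
A common mistake at this step is pasting a placeholder or a truncated value. The sketch below checks the two values for the shape kubeadm prints them in: a token of the form 6 characters, a dot, 16 characters, and a hash of sha256: followed by 64 hex digits. Both function names are hypothetical.

```shell
# Hypothetical helper: succeed when $1 has the bootstrap token shape
# (6 lowercase alphanumerics, a dot, 16 lowercase alphanumerics).
is_bootstrap_token() {
    printf '%s\n' "$1" | grep -Eq '^[a-z0-9]{6}\.[a-z0-9]{16}$'
}

# Hypothetical helper: succeed when $1 looks like the CA certificate hash
# printed by kubeadm (sha256: followed by 64 lowercase hex digits).
is_ca_cert_hash() {
    printf '%s\n' "$1" | grep -Eq '^sha256:[0-9a-f]{64}$'
}

# Example:
#   is_bootstrap_token "$token" && is_ca_cert_hash "$hash" || echo "check the values from Step 7"
```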

[Note] If everything goes well, you will eventually see output that says:

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

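[Note] If you want to allow pods to be scheduled on the master node as well, remove its taint (the trailing "-" removes it). On Kubernetes 1.24 and later the taint key is node-role.kubernetes.io/control-plane rather than node-role.kubernetes.io/master.

kubectl taint nodes --all node-role.kubernetes.io/master-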
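
Which taint key is present depends on the Kubernetes version, so one option is to look at the taints that are actually set and build the removal command from them. The sketch below only prints the commands so they can be reviewed before running; untaint_commands is a hypothetical name.

```shell
# Hypothetical helper: read taint keys on stdin (one per line) and print a
# removal command for each control-plane taint key that is present.
untaint_commands() {
    sort -u | while read -r key; do
        case $key in
            node-role.kubernetes.io/master|node-role.kubernetes.io/control-plane)
                # The trailing "-" tells kubectl to remove the taint.
                printf 'kubectl taint nodes --all %s-\n' "$key" ;;
        esac
    done
}

# Example:
#   kubectl get nodes -o jsonpath='{.items[*].spec.taints[*].key}' | tr ' ' '\n' | untaint_commands
```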

You should be proud; a lot could have gone wrong. Below are troubleshooting notes:

    Error: [ERROR DirAvailable--etc-kubernetes-manifests]: /etc/kubernetes/manifests is not empty
    Try: sudo rm -r /etc/kubernetes/manifests/

    Error: [ERROR Port-Number]: Port Number is in use
    Try: sudo systemctl restart kubelet, or sudo systemctl stop kubelet to stop the service
    Note: If you need to start over, type: sudo kubeadm reset. This resets the Kubernetes configuration you have done on the server. Be aware that reinstalling Kubernetes afterwards can still throw errors because of directories left over from the previous installation.

    Error: If certificates from a previous installation are still in place, you might get this error:
    "/etc/kubernetes/controller-manager.conf" exists already but has got the wrong CA cert
    Try: sudo rm -r /etc/kubernetes/

    Error: Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")

    Try: This error can happen when the certificates in /etc/kubernetes/admin.conf differ from the ones in $HOME/.kube/config. To fix it, type:
    (1) sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config #when prompted, type "yes"
    (2) sudo chown $(id -u):$(id -g) $HOME/.kube/config
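
Before overwriting the file, it is possible to confirm that the two configs really carry different CA certificates. The sketch below compares the certificate-authority-data field that kubeadm embeds in both files; the function names are hypothetical, and the script must run as a user that can read admin.conf.

```shell
# Hypothetical helper: print the embedded CA certificate (base64) of a kubeconfig.
kubeconfig_ca() {
    awk '$1 == "certificate-authority-data:" { print $2; exit }' "$1"
}

# Hypothetical helper: succeed when both files embed the same, non-empty CA.
same_cluster_ca() {
    ca_a=$(kubeconfig_ca "$1")
    ca_b=$(kubeconfig_ca "$2")
    [ -n "$ca_a" ] && [ "$ca_a" = "$ca_b" ]
}

# Example:
#   same_cluster_ca /etc/kubernetes/admin.conf "$HOME/.kube/config" || echo "CA mismatch: copy admin.conf again"
```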


    Troubleshooting Errors in Kubernetes:
    1. Failed to stop kublet.service: Unit kublet.service not loaded.
        OR kubernetes error execution phase upload-config/kubelet

      - Note that the service is named kubelet; "kublet" in the first message is a misspelling of the unit name.
      - If you get the second error, check whether you passed --node-name MyNodeNameMaster to kubeadm init.
        There is a chance that DNS is not resolving the computer's hostname. Remove --node-name
        and specify only the IP address, like so:
    sudo kubeadm init --control-plane-endpoint=localIPV4 --pod-network-cidr=10.244.0.0/16 --v=5

       After everything is okay, make sure you create the Flannel overlay network so that pods (including CoreDNS) can start and connect back to the master.
    #Run this on the master node when the pods in all namespaces are stuck in Pending.
    kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml


    Then check that the Kubernetes master node is running: kubectl cluster-info
    You should see something like: Kubernetes master is running at https://IP:Port
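
To see which pods are actually stuck, the pod list can be filtered by its STATUS column. The sketch below is a thin wrapper around awk; pods_with_status is a hypothetical name, and it works for Pending as well as for ContainerCreating.

```shell
# Hypothetical helper: read `kubectl get pods -A --no-headers` output on stdin
# and print namespace/name for every pod whose STATUS column equals $1.
pods_with_status() {
    awk -v want="$1" '$4 == want { print $1 "/" $2 }'
}

# Example:
#   kubectl get pods -A --no-headers | pods_with_status Pending
#   kubectl get pods -A --no-headers | pods_with_status ContainerCreating
```

For any pod it lists, kubectl describe pod shows the events that explain why the pod is not starting.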

    Other errors you might encounter are:

    1. [ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
        Solution: Remove the directory using the command below
          sudo rm -r /var/lib/etcd/
        then run: sudo kubeadm reset
        then initialize Kubernetes again with the init command.

    2. [ERROR Port-10250]: Port 10250 is in use
        Solution: Port 10250 is the kubelet's port, so a kubelet from an earlier join is still running. On the worker node, run
          sudo kubeadm reset
        Then try to rejoin the cluster. If you still get errors, remove the configuration YAML files named in the error log.

    3. Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
        Solution: This means that you previously joined a cluster that has since changed (for example, it was re-initialized with new certificates). Remove the certificate file named in the error log.
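
Several of the preflight errors above come from state left behind by an earlier installation. The sketch below lists the two directories named in those errors when they exist and are not empty, so you know what kubeadm init will complain about before running it. leftover_dirs is a hypothetical helper; the optional argument is a root prefix that exists only to make the function easy to try in a scratch directory.

```shell
# Hypothetical helper: print the kubeadm state directories that exist and
# are not empty. $1 is an optional root prefix (default: the real root).
leftover_dirs() {
    root=${1:-}
    for dir in /etc/kubernetes/manifests /var/lib/etcd; do
        path=$root$dir
        # A directory counts as leftover when it exists and has any entry.
        if [ -d "$path" ] && [ -n "$(ls -A "$path" 2>/dev/null)" ]; then
            printf '%s\n' "$dir"
        fi
    done
}

# Example (root is needed to read /var/lib/etcd; run from a script executed with sudo):
#   leftover_dirs
```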

    Things I didn't know about Kubernetes
    1. On the other nodes joining the cluster you follow the same steps, except for kubeadm init; they run kubeadm join instead.
    2. Resetting kubeadm is not that bad; with a few commands you can get up and running again.
    3. The most annoying error is when the Container in a pod gets stuck in ContainerCreating. I haven't found a solution to this yet.


    Error when joining a Kubernetes Node to the Cluster

    [preflight] Running pre-flight checks
    [preflight] Reading configuration from the cluster...
    [preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
    error execution phase preflight:
    One or more conditions for hosting a new control plane instance is not satisfied.
    
    [failure loading certificate for CA: couldn't load the certificate file /etc/kubernetes/pki/ca.crt: open /etc/kubernetes/pki/ca.crt: no such file or directory, failure loading key for service account: couldn't load the private key file /etc/kubernetes/pki/sa.key: open /etc/kubernetes/pki/sa.key: no such file or directory, failure loading certificate for front-proxy CA: couldn't load the certificate file /etc/kubernetes/pki/front-proxy-ca.crt: open /etc/kubernetes/pki/front-proxy-ca.crt: no such file or directory, failure loading certificate for etcd CA: couldn't load the certificate file /etc/kubernetes/pki/etcd/ca.crt: open /etc/kubernetes/pki/etcd/ca.crt: no such file or directory]
    
    Please ensure that:
    * The cluster has a stable controlPlaneEndpoint address.
    * The certificates that must be shared among control plane instances are provided.
    
    
    To see the stack trace of this error execute with --v=5 or higher

    Solution: Regenerate the join command. The token in it expires (after 24 hours by default), so a command that was generated long before the node tries to join the cluster no longer works.
    Use this command on the master node to regenerate the join command:
    kubeadm token create --print-join-command
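
If you record when a join command was generated, you can decide ahead of time whether it needs regenerating. The sketch below assumes the default 24-hour token lifetime and works on epoch seconds; token_expired and the created_at variable are hypothetical.

```shell
# Hypothetical helper: succeed when a token created at CREATED (epoch seconds)
# is at least TTL seconds old at time NOW. TTL defaults to 24 hours.
token_expired() {
    created=$1
    now=$2
    ttl=${3:-86400}
    [ $((now - created)) -ge "$ttl" ]
}

# Example (created_at is assumed to hold the time the command was generated):
#   token_expired "$created_at" "$(date +%s)" && kubeadm token create --print-join-command
```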
