# 2018-04-02-kubernetes1.9 install online

本文主要介绍通过在线方式kubeadm部署kubernetes1.9. 部署环境需要能够拉去google镜像

主要内容：

1. kubeadm init 自定义etcd环境，支持etcd证书模式
2. kubeadm join的异常解决
3. 两种网络部署方式，flannel与calico
4. dashboard的https与http两种部署方式。
5. 部署EFK、监控、ingress

## install docker/kubeadm

<https://kubernetes.io/docs/setup/independent/install-kubeadm/>

在所有kubernetes节点上设置kubelet使用cgroupfs，与dockerd保持一致，否则kubelet会启动报错

```
docker配置
cat << EOF > /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=cgroupfs"]
}
EOF
systemctl daemon-reload && systemctl restart docker

默认kubelet使用的cgroup-driver=systemd，改为cgroup-driver=cgroupfs
vi /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
#Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=systemd"
Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=cgroupfs"

重设kubelet服务，并重启kubelet服务
systemctl daemon-reload && systemctl restart kubelet
```

关闭swap

```
swapoff -a
vim /etc/fstab  #swap一行注释掉
```

## master节点 kubeadm init 方法1

初始化集群，也会起一个etcd的pod

```
kubeadm init --pod-network-cidr=10.244.0.0/16
```

## master节点 kubeadm init 方法2

通过配置文件init

配置文件大全见官方

<https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init/#config-file>

install etcd

1. 准备证书

<https://www.kubernetes.org.cn/3096.html>

在master1需要安装CFSSL工具，这将会用来建立 TLS certificates。

```
export CFSSL_URL="https://pkg.cfssl.org/R1.2"
wget "${CFSSL_URL}/cfssl_linux-amd64" -O /usr/local/bin/cfssl
wget "${CFSSL_URL}/cfssljson_linux-amd64" -O /usr/local/bin/cfssljson
chmod +x /usr/local/bin/cfssl /usr/local/bin/cfssljson
```

创建集群 CA 与 Certificates

在这部分，将会需要产生 client 与 server 的各组件 certificates，并且替 Kubernetes admin user 产生 client 证书。

建立/etc/etcd/ssl文件夹，然后进入目录完成以下操作。

```
 mkdir -p /etc/etcd/ssl && cd /etc/etcd/ssl
 export PKI_URL="https://kairen.github.io/files/manual-v1.8/pki"
```

下载ca-config.json与etcd-ca-csr.json文件，并产生 CA 密钥：

```
wget "${PKI_URL}/ca-config.json" "${PKI_URL}/etcd-ca-csr.json"
cfssl gencert -initca etcd-ca-csr.json | cfssljson -bare etcd-ca
ls etcd-ca*.pem
etcd-ca-key.pem  etcd-ca.pem
```

下载etcd-csr.json文件，并产生 kube-apiserver certificate 证书：

```
wget "${PKI_URL}/etcd-csr.json"   #修改IP为本地，如果是集群，每个节点IP都要添加进去
cfssl gencert \
  -ca=etcd-ca.pem \
  -ca-key=etcd-ca-key.pem \
  -config=ca-config.json \
  -profile=kubernetes \
  etcd-csr.json | cfssljson -bare etcd

ls etcd*.pem
etcd-ca-key.pem  etcd-ca.pem  etcd-key.pem  etcd.pe
```

若节点 IP 不同，需要修改etcd-csr.json的hosts。

完成后删除不必要文件： `rm -rf *.json`

确认/etc/etcd/ssl有以下文件：

```
ls /etc/etcd/ssl
etcd-ca.csr  etcd-ca-key.pem  etcd-ca.pem  etcd.csr  etcd-key.pem  etcd.pem
```

1. Etcd 安装与设定

首先在master1节点下载 Etcd，并解压缩放到 /opt 底下与安装：

```
export ETCD_URL="https://github.com/coreos/etcd/releases/download"
cd && wget -qO- --show-progress "${ETCD_URL}/v3.2.9/etcd-v3.2.9-linux-amd64.tar.gz" | tar -zx
mv etcd-v3.2.9-linux-amd64/etcd* /usr/local/bin/ && rm -rf etcd-v3.2.9-linux-amd64
```

完成后新建 Etcd Group 与 User，并建立 Etcd 配置文件目录：

```
groupadd etcd && useradd -c "Etcd user" -g etcd -s /sbin/nologin -r etcd
```

下载etcd相关文件，我们将来管理 Etcd：

```
export ETCD_CONF_URL="https://kairen.github.io/files/manual-v1.8/master"
wget "${ETCD_CONF_URL}/etcd.conf" -O /etc/etcd/etcd.conf
wget "${ETCD_CONF_URL}/etcd.service" -O /lib/systemd/system/etcd.service
```

编辑/etc/etcd/etcd.conf, 把IP改成本地IP，0.0.0.0的不要改。

如果是etcd集群，ETCD\_INITIAL\_CLUSTER="master1=[https://192.168.1.144:2380,node1=https://192.168.1.145:2380,node2=https://192.168.1.146:2380](https://192.168.1.144/:2380,node1=https/192.168.1.145:2380,node2=https://192.168.1.146:2380)"

master1,node1,node2与ETCD\_NAME参数匹配。

建立 var 存放信息，然后启动 Etcd 服务:

```
mkdir -p /var/lib/etcd && chown etcd:etcd -R /var/lib/etcd /etc/etcd
```

1. node1,node2 etcd安装（如果单点etcd跳过此步）

从master1 copy配置文件

```
mkdir -p /etc/etcd/ssl && cd /etc/etcd/ssl
scp  192.168.1.144:/etc/etcd/ssl/* .
scp  192.168.1.144:/usr/local/bin/etcd* /usr/local/bin/
groupadd etcd && useradd -c "Etcd user" -g etcd -s /sbin/nologin -r etcd
scp  192.168.1.144:/etc/etcd/etcd.conf /etc/etcd/etcd.conf
scp  192.168.1.144:/lib/systemd/system/etcd.service /lib/systemd/system/etcd.service
mkdir -p /var/lib/etcd && chown etcd:etcd -R /var/lib/etcd /etc/etcd
```

vim /etc/etcd/etcd.conf, ETCD\_NAME改为node1 node2， 及修改IP

1. 启动etcd\
   systemctl enable etcd.service && systemctl start etcd.service\
   如为集群，则都要启动\
   验证，集群内节点注意时间要同步

   ```
   export CA="/etc/etcd/ssl"
   ETCDCTL_API=3 etcdctl  --cacert=${CA}/etcd-ca.pem \
    --cert=${CA}/etcd.pem  --key=${CA}/etcd-key.pem \
    --endpoints="https://192.168.1.144:2379" \
    endpoint health
   ETCDCTL_API=3 etcdctl  --cacert=${CA}/etcd-ca.pem \
    --cert=${CA}/etcd.pem  --key=${CA}/etcd-key.pem \
    --endpoints="https://192.168.1.144:2379" \
    member list
   ```
2. 写kubeadm配置文件

   ```
   root@instance-1:/home# cat cluster 
   apiVersion: kubeadm.k8s.io/v1alpha1
   kind: MasterConfiguration
   etcd:
   endpoints:
   - https://192.168.1.144:2379
   - https://192.168.1.145:2379
   - https://192.168.1.146:2379
   caFile : /etc/etcd/ssl/etcd-ca.pem
   certFile : /etc/etcd/ssl/etcd.pem
   keyFile : /etc/etcd/ssl/etcd-key.pem
   networking:
   podSubnet: 10.244.0.0/16
   ```

kubeadm init

```
root@master1:~# kubeadm init --config=cluster
```

## kubeadm init 排错

完成后如果不能get node ,get po

```
root@master1:/etc/kubernetes# kubectl get node
Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
```

清空/root/.kube

```
root@master1:/etc/kubernetes# rm -rf /root/.kube/

root@master1:/etc/kubernetes#   mkdir -p $HOME/.kube
root@master1:/etc/kubernetes#   sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
root@master1:/etc/kubernetes#   sudo chown $(id -u):$(id -g) $HOME/.kube/config
```

## 部署网络-flannel，与calico二选一

未部署网络时，get node，node的状态是not ready

```
root@master1:~# kubectl get no
NAME      STATUS     ROLES     AGE       VERSION
master1   NotReady   master    22m       v1.9.2
node1     NotReady   <none>    14m       v1.9.2
```

kube-dns的状态也是pending，他依赖于网络

通过官方模版导入flannel，yaml文件中有一行参数，"Network": "10.244.0.0/16" ，这个要与kubeadm init时候参数一致，是pod IP的范围

```
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
或者
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.9.1/Documentation/kube-flannel.yml  #来自官方

root@master1:~# kubectl get no
NAME      STATUS    ROLES     AGE       VERSION
master1   Ready     master    24m       v1.9.2
node1     Ready     <none>    16m       v1.9.2
```

## 部署网络-calico，与flannel二选一

1. 如果使用的k8s内置etcd &#x20;

   直接执行，注意calico.yaml中pod IP范围与init一致。

   ```
   kubectl taint node master1 node-role.kubernetes.io/master-  #如果只有一个maser，要设为可运行pod
   kubectl apply -f https://docs.projectcalico.org/v3.0/getting-started/kubernetes/installation/rbac.yaml
   kubectl apply -f https://docs.projectcalico.org/v3.0/getting-started/kubernetes/installation/hosted/calico.yaml
   ```
2. 使用自己部署的etcd &#x20;

   RBAC. If deploying Calico on an RBAC enabled cluster, you should first apply the ClusterRole and ClusterRoleBinding specs: &#x20;

   kubectl apply -f <https://docs.projectcalico.org/v3.0/getting-started/kubernetes/installation/rbac.yaml>

Download calico.yaml\
Configure etcd\_endpoints in the provided ConfigMap to match your etcd cluster. Then simply apply the manifest:

```
wget https://docs.projectcalico.org/v3.0/getting-started/kubernetes/installation/hosted/calico.yaml

vim calico.yaml  
修改
etcd_endpoints: "https://192.168.1.144:2379,https://192.168.1.145:2379,https://192.168.1.146:2379"
  etcd_ca: ""   # "/calico-secrets/etcd-ca"  --/etc/etcd/ssl/etcd-ca.pem
  etcd_cert: "" # "/calico-secrets/etcd-cert"  --/etc/etcd/ssl/etcd.pem
  etcd_key: ""  # "/calico-secrets/etcd-key"  --/etc/etcd/ssl/etcd-key.pem

对三个pem文件分别转码  
 base64 /etc/etcd/ssl/etcd-ca.pem  | tr -d '\n'
 base64 /etc/etcd/ssl/etcd.pem  | tr -d '\n'
 base64 /etc/etcd/ssl/etcd-key.pem | tr -d '\n'

把pod ip池修改为和init时候的ip池一致
            - name: CALICO_IPV4POOL_CIDR
              value: "192.168.0.0/16"

kubectl apply -f calico.yaml
kubectl taint node master1 node-role.kubernetes.io/master-  #如果只有一个maser，要设为可运行pod
```

```
root@master1:/home# kubectl logs calico-node-vvl87 calico-node  -n kube-system 
2018-01-29 09:46:16.638 [INFO][9] startup.go 187: Early log level set to info
2018-01-29 09:46:16.639 [INFO][9] startup.go 198: NODENAME environment not specified - check HOSTNAME
```

1. calicoctl

```
curl -O -L https://github.com/projectcalico/calicoctl/releases/download/v2.0.0/calicoctl
mv calicoctl /usr/local/bin/
chmod a+x /usr/local/bin/calicoctl
mkdir -p /etc/calico

vim /etc/calico/calicoctl.cfg
apiVersion: projectcalico.org/v3
kind: CalicoAPIConfig
metadata:
spec:
  etcdEndpoints: https://192.168.1.144:2379,https://192.168.1.145:2379,https://192.168.1.146:2379
  etcdKeyFile: /etc/etcd/ssl/etcd-key.pem
  etcdCertFile: /etc/etcd/ssl/etcd.pem
  etcdCACertFile: /etc/etcd/ssl/etcd-ca.pem

calicoctl get ippools
calicoctl get node
```

## node节点kubeadm join

node节点join失败

```
kubeadm join --token 55c2c6.2a4bde1bc73a6562 192.168.1.144:6443 --discovery-token-ca-cert-hash sha256:0fdf8cfc6fecc18fded38649a4d9a81d043bf0e4bf57341239250dcc62d2c832

[discovery] Failed to request cluster info, will try again: [Get https://192.168.1.144:6443/api/v1/namespaces/kube-public/configmaps/cluster-info: x509: certificate has expired or is not yet valid]
```

检查master节点KUBECONFIG变量，如果不存在,则执行export

```
root@master1:~# echo $KUBECONFIG
export KUBECONFIG=$HOME/.kube/config
```

node节点kubeadm reset ，再join

如果忘了token，在master执行kubeadm token list 可以看到token node执行 kubeadm join --token n7uezq.b82snkxzseitpjzs 192.168.1.144:6443 --discovery-token-unsafe-skip-ca-verification 执行join执行要执行下kubeadm reset

## Addons

<https://github.com/kubernetes/kubernetes/tree/master/cluster/addons>

## 部署dashboard

dashboard（UI）默认是没有部署的，需要手动导入

<https://github.com/kubernetes/dashboard>

dashboard的语言是根据浏览器的语言自己识别的

方法1： https

```
wget https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/recommended/kubernetes-dashboard.yaml

修改下service，不然无法外部访问dashboard
kind: Service，增加type: NodePort和NodePort: 32001
  type: NodePort
  ports:
    - port: 443
      targetPort: 8443
      nodePort: 32001

kubectl create -f kubernetes-dashboard.yaml

赋予dashboard 的sa 集群admin权限
cat admin-role.yaml

kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: kubernetes-dashboard
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  name: kubernetes-dashboard
  namespace: kube-system

root@master1:~# kubectl create -f admin-role.yaml 
clusterrolebinding "admin" created

创建完成后，通过 https://任意节点的IP:32001即可访问，注意是https。
如果chrome无法打开，换firefox，kubeadm生产的证书日期不太多，过期了
登录页面直接点跳过就行，因为默认的serviceaccount已经具备了集群管理员权限。

如果想使用token登录，使用下面命令获取。就是kubernetes-dashboard-token这个secret
kubectl -n kube-system describe secret `kubectl -n kube-system get secret|grep kubernetes-dashboard-token|cut -d " " -f1`|grep "token:"|tr -s " "|cut -d " " -f2
```

方法2： http，登录不弹出认证页面

```
wget https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/alternative/kubernetes-dashboard.yaml

修改下service，不然无法外部访问dashboard
kind: Service，增加type: NodePort和NodePort: 32001
  type: NodePort
  ports:
  - port: 80
    targetPort: 9090
    nodePort: 32002

kubectl create -f kubernetes-dashboard.yaml

赋予dashboard 的sa 集群admin权限
cat admin-role.yaml

kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: kubernetes-dashboard
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  name: kubernetes-dashboard
  namespace: kube-system

root@master1:~# kubectl create -f admin-role.yaml 
clusterrolebinding "admin" created
```

创建完成后，通过 [http://任意节点的IP:32002即可访问](http://xn--ip-dh3cr99d42ri4h7zv/:32002%E5%8D%B3%E5%8F%AF%E8%AE%BF%E9%97%AE)

## 部署监控

Heapster + InfluxDB + Grafana

<https://github.com/kubernetes/heapster/blob/master/docs/influxdb.md>

```
kubectl create -f https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/rbac/heapster-rbac.yaml
kubectl create -f https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/influxdb/grafana.yaml
kubectl create -f https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/influxdb/heapster.yaml
kubectl create -f https://raw.githubusercontent.com/kubernetes/heapster/master/deploy/kube-config/influxdb/influxdb.yaml
```

部署完成后，在dashboard能看到pod的cpu、mem使用量

如果想看grafana页面，改下service，通过nodeport暴露端口

```
 kubectl edit svc monitoring-grafana  -n kube-system
 type: ClusterIP 改成  type: NodePort

 kubectl get svc monitoring-grafana  -n kube-system
```

遗留问题：不显示master节点pod监控信息

## 部署EFK

```
https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/fluentd-elasticsearch

kubectl create -f https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/fluentd-elasticsearch/es-service.yaml
kubectl create -f https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/fluentd-elasticsearch/es-statefulset.yaml
kubectl create -f https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/fluentd-elasticsearch/fluentd-es-configmap.yaml
kubectl create -f https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/fluentd-elasticsearch/fluentd-es-ds.yaml
kubectl create -f https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/fluentd-elasticsearch/kibana-deployment.yaml
kubectl create -f https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/fluentd-elasticsearch/kibana-service.yaml

给每个每个每个节点加下标签，不然fluentd-es-ds不会部署

kubectl label node node1 beta.kubernetes.io/fluentd-ds-ready=true
```

## 部署ingress controller

<https://github.com/kubernetes/ingress-nginx>

Mandatory commands

```
curl https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/namespace.yaml  | kubectl apply -f -
curl https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/default-backend.yaml | kubectl apply -f -
curl https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/configmap.yaml  | kubectl apply -f -
curl https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/tcp-services-configmap.yaml | kubectl apply -f -
curl https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/udp-services-configmap.yaml | kubectl apply -f -
```

Install with RBAC roles

```
wget https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/rbac.yaml
wget https://raw.githubusercontent.com/kubernetes/ingress-nginx/master/deploy/with-rbac.yaml

vim with-rbac.yaml  修改三处
1. kind: Deployment 改成kind: DaemonSet
2. replicas: 1  删除
3. initContainers:上面一行加hostNetwork: true #两句平级

导入
kubectl create -f rbac.yaml 
kubectl create -f with-rbac.yaml
```

检查

```
root@master1:~# kubectl get po -n ingress-nginx  -o wide
NAME                                    READY     STATUS    RESTARTS   AGE       IP              NODE
default-http-backend-55c6c69b88-6n88s   1/1       Running   0          10m       10.244.1.15     node2
nginx-ingress-controller-2w9g4          1/1       Running   0          7m        192.168.1.146   node2
nginx-ingress-controller-bzfzr          1/1       Running   0          7m        192.168.1.145   node1
nginx-ingress-controller-j9lds          1/1       Running   3          7m        192.168.1.144   master1
```

## FAQ

## kubectl 命令补全

```
root@master1:/# vim /etc/profilecd   #添加下面这句，再source
source <(kubectl completion bash)
root@master1:/# source /etc/profile
```

## master节点默认不可部署pod

执行如下，node-role.kubernetes.io/master 可以在 kubectl edit node master1中taint配置参数下查到

```
root@master1:/var/lib/kubelet# kubectl taint node master1 node-role.kubernetes.io/master-
node "master1" untainted
```

## node节点pod无法启动/节点删除网络重置

node1之前反复添加过,添加之前需要清除下网络

```
root@master1:/var/lib/kubelet# kubectl get po -o wide
NAME                   READY     STATUS              RESTARTS   AGE       IP           NODE
nginx-8586cf59-6zw9k   1/1       Running             0          9m        10.244.3.3   node2
nginx-8586cf59-jk5pc   0/1       ContainerCreating   0          9m        <none>       node1
nginx-8586cf59-vm9h4   0/1       ContainerCreating   0          9m        <none>       node1
nginx-8586cf59-zjb84   1/1       Running             0          9m        10.244.3.2   node2
```

```
root@node1:~# journalctl -u kubelet
 failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "nginx-8586cf59-rm4sh_default" network: failed to set bridge addr: "cni0" already has an IP address different from 10.244.2.1/24
12252 cni.go:227] Error while adding to cni network: failed to set bridge addr: "cni0" already
```

重置kubernetes服务，重置网络。删除网络配置，link

```
kubeadm reset
systemctl stop kubelet
systemctl stop docker
rm -rf /var/lib/cni/
rm -rf /var/lib/kubelet/*
rm -rf /etc/cni/
ifconfig cni0 down
ifconfig flannel.1 down
ifconfig docker0 down
ip link delete cni0
ip link delete flannel.1
systemctl start docker
```

加入节点 systemctl start docker

```
kubeadm join --token 55c2c6.2a4bde1bc73a6562 192.168.1.144:6443 --discovery-token-ca-cert-hash sha256:0fdf8cfc6fecc18fded38649a4d9a81d043bf0e4bf57341239250dcc62d2c832
```
