Containerized Ceph: deployment is instant gratification, operations is the crematorium

一颗小胡椒 · 2022-07-19 09:34:10

Rook is an open-source cloud-native storage orchestrator that provides a platform, a framework, and support for a diverse set of storage solutions, so that they integrate natively with cloud-native environments.

Rook turns storage software into self-managing, self-scaling, and self-healing storage services by automating deployment, bootstrapping, configuration, provisioning, scaling, upgrades, migration, disaster recovery, monitoring, and resource management. Under the hood, Rook relies on the capabilities of the cloud-native container management, scheduling, and orchestration platform to deliver these features.

Rook integrates deeply into cloud-native environments through their extension mechanisms and provides a seamless experience for scheduling, lifecycle management, resource management, security, monitoring, and more. For the current status of the storage solutions Rook supports, see the project overview in the Rook repository. Ceph is currently the solution with the most mature support.

[Figure: Rook architecture diagram]

Loosely speaking, Rook is a storage adaptation layer, a framework: facing up it serves Kubernetes' storage needs, and facing down it provides unified adaptation and management of the underlying storage software.

Rook currently supports deploying several kinds of storage clusters, mainly:

  • Ceph: a highly scalable distributed storage solution for block storage, object storage, and shared file systems, with years of production deployments behind it.
  • NFS: lets remote hosts mount file systems over the network and interact with them as if they were mounted locally.
  • Cassandra: a highly available NoSQL database with very fast performance, tunable data consistency, and massive scalability.

Each of these storage systems has its own Kubernetes-based operator, so the whole stack can run entirely inside Kubernetes, which is what makes it truly cloud native.

Why use Rook

Cloud native is where the industry is heading; delivering applications as containers has increasingly become the de facto standard, and storage applications are no exception. Building infrastructure around a Kubernetes-based "cloud" operating system has gradually become a consensus in the community. Managing storage with Rook addresses the following needs:

  • If you already run Kubernetes-based cloud-native infrastructure, storage management plugs straight into it, giving you one unified platform
  • You can stand up a cloud-native storage cluster quickly
  • You get platform-style management of the storage cluster across its whole lifecycle: scaling, upgrades, monitoring, disaster recovery, and so on

Environment

Test environment:

  • Kubernetes: v1.19.9
  • Docker: 20.10.11
  • Rook: release-1.4

The Kubernetes cluster itself can be built with minikube or kubeadm. Here I used kainstall, and I strongly recommend my friend @lework's kainstall: a single script (a shell wrapper around kubeadm) that sets up a production-grade Kubernetes cluster.

[root@k8s-master-node1 ~]# kubectl  get nodes -o wide
NAME               STATUS   ROLES    AGE    VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION          CONTAINER-RUNTIME
k8s-master-node1   Ready    master   244d   v1.19.9   172.16.8.80     <none>        CentOS Linux 7 (Core)   3.10.0-862.el7.x86_64   docker://20.10.5
k8s-master-node2   Ready    master   244d   v1.19.9   172.16.8.81     <none>        CentOS Linux 7 (Core)   3.10.0-862.el7.x86_64   docker://20.10.5
k8s-master-node3   Ready    master   244d   v1.19.9   172.16.8.82     <none>        CentOS Linux 7 (Core)   3.10.0-862.el7.x86_64   docker://20.10.5
k8s-worker-node1   Ready    worker   244d   v1.19.9   172.16.8.83     <none>        CentOS Linux 7 (Core)   3.10.0-862.el7.x86_64   docker://20.10.5
k8s-worker-node2   Ready    worker   21h    v1.19.9   172.16.49.210   <none>        CentOS Linux 7 (Core)   3.10.0-957.el7.x86_64   docker://20.10.11
k8s-worker-node3   Ready    worker   21h    v1.19.9   172.16.49.211   <none>        CentOS Linux 7 (Core)   3.10.0-957.el7.x86_64   docker://20.10.11
k8s-worker-node4   Ready    worker   21h    v1.19.9   172.16.49.212   <none>        CentOS Linux 7 (Core)   3.10.0-957.el7.x86_64   docker://20.10.11
Note:
each of k8s-worker-node{2,3,4} has an extra data disk, vdb
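
To double-check that the spare disk is present and unused on each of those workers (assuming the device really is named vdb, as in this environment), something like the following is enough; the disk should show no partitions and no existing filesystem:

lsblk -f /dev/vdb          ## run on k8s-worker-node{2,3,4} ##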

Deploying Rook and the Ceph cluster

Clone the desired release of Rook from GitHub

[root@k8s-master-node1 /opt]# git clone -b release-1.4 https://github.com/rook/rook.git
Cloning into 'rook'...
remote: Enumerating objects: 91504, done.
remote: Counting objects: 100% (389/389), done.
remote: Compressing objects: 100% (237/237), done.
remote: Total 91504 (delta 176), reused 327 (delta 144), pack-reused 91115
Receiving objects: 100% (91504/91504), 45.43 MiB | 4.41 MiB/s, done.
Resolving deltas: 100% (63525/63525), done.

Enter Rook's ceph directory and deploy Rook together with the Ceph cluster

cd rook/cluster/examples/kubernetes/ceph
kubectl create -f common.yaml -f operator.yaml
kubectl create -f cluster.yaml
Notes:
  1. common.yaml mainly contains the RBAC rules and the CRD definitions
  2. operator.yaml is the Deployment for rook-ceph-operator
  3. cluster.yaml is an instance of the cephclusters.ceph.rook.io CRD, i.e. it describes a complete Ceph cluster
  4. With no customization, the cluster starts 3 mons by default, and on every node that has a spare raw disk, that disk is automatically initialized as an OSD (by default you need at least 3 nodes, each with at least one spare disk); the relevant defaults from cluster.yaml are sketched right after this list
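
For reference, the part of cluster.yaml that drives this behaviour looks roughly like the snippet below in release-1.4 (check your local copy for the exact wording); flipping useAllNodes/useAllDevices to false and listing nodes and devices explicitly is how you would pin OSDs to particular disks:

  storage:
    useAllNodes: true        ## consider every node for OSDs ##
    useAllDevices: true      ## consume every empty raw device found ##
    # explicit selection instead of the defaults above, e.g.:
    # useAllNodes: false
    # useAllDevices: false
    # nodes:
    # - name: "k8s-worker-node2"
    #   devices:
    #   - name: "vdb"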

Once the deployment completes, you can check pod status in the rook-ceph namespace with kubectl

[root@k8s-master-node1 /tmp/rook/cluster/examples/kubernetes/ceph]# kubectl get pods -n rook-ceph  -o wide
NAME                                                         READY   STATUS      RESTARTS   AGE     IP               NODE               NOMINATED NODE   READINESS GATES
csi-cephfsplugin-4q4j5                                       3/3     Running     0          21h     172.16.49.211    k8s-worker-node3   <none>           <none>
csi-cephfsplugin-c8sdw                                       3/3     Running     0          21h     172.16.8.82      k8s-master-node3   <none>           <none>
csi-cephfsplugin-provisioner-56d8446896-4c68j                6/6     Running     0          21h     10.244.91.69     k8s-worker-node4   <none>           <none>
csi-cephfsplugin-provisioner-56d8446896-rq2j2                6/6     Running     0          21h     10.244.219.71    k8s-worker-node2   <none>           <none>
csi-cephfsplugin-r9cqt                                       3/3     Running     0          21h     172.16.8.80      k8s-master-node1   <none>           <none>
csi-cephfsplugin-sdxm5                                       3/3     Running     0          21h     172.16.49.212    k8s-worker-node4   <none>           <none>
csi-cephfsplugin-sntm4                                       3/3     Running     0          21h     172.16.49.210    k8s-worker-node2   <none>           <none>
csi-cephfsplugin-stkg4                                       3/3     Running     0          21h     172.16.8.83      k8s-worker-node1   <none>           <none>
csi-cephfsplugin-v88d6                                       3/3     Running     0          21h     172.16.8.81      k8s-master-node2   <none>           <none>
csi-rbdplugin-bnmhp                                          3/3     Running     0          21h     172.16.8.82      k8s-master-node3   <none>           <none>
csi-rbdplugin-grw9c                                          3/3     Running     0          21h     172.16.8.80      k8s-master-node1   <none>           <none>
csi-rbdplugin-p47n6                                          3/3     Running     0          21h     172.16.49.210    k8s-worker-node2   <none>           <none>
csi-rbdplugin-provisioner-569c75558-4hw9d                    6/6     Running     0          21h     10.244.198.197   k8s-worker-node3   <none>           <none>
csi-rbdplugin-provisioner-569c75558-62ds8                    6/6     Running     0          21h     10.244.219.70    k8s-worker-node2   <none>           <none>
csi-rbdplugin-s56gp                                          3/3     Running     0          21h     172.16.49.211    k8s-worker-node3   <none>           <none>
csi-rbdplugin-vhjv7                                          3/3     Running     0          21h     172.16.49.212    k8s-worker-node4   <none>           <none>
csi-rbdplugin-xg48n                                          3/3     Running     0          21h     172.16.8.81      k8s-master-node2   <none>           <none>
csi-rbdplugin-zb6b9                                          3/3     Running     0          21h     172.16.8.83      k8s-worker-node1   <none>           <none>
rook-ceph-crashcollector-k8s-worker-node2-bbd9587f9-hvq92    1/1     Running     0          21h     10.244.219.74    k8s-worker-node2   <none>           <none>
rook-ceph-crashcollector-k8s-worker-node3-65bb549b8b-8z4q2   1/1     Running     0          21h     10.244.198.202   k8s-worker-node3   <none>           <none>
rook-ceph-crashcollector-k8s-worker-node4-8457f67c97-29wgn   1/1     Running     0          21h     10.244.91.72     k8s-worker-node4   <none>           <none>
rook-ceph-mgr-a-749575fc54-dtbpw                             1/1     Running     0          21h     10.244.198.198   k8s-worker-node3   <none>           <none>
rook-ceph-mon-a-59f6565594-nxlbv                             1/1     Running     0          21h     10.244.198.196   k8s-worker-node3   <none>           <none>
rook-ceph-mon-b-688948c479-j7hcj                             1/1     Running     0          21h     10.244.91.68     k8s-worker-node4   <none>           <none>
rook-ceph-mon-c-7b7c6fffd7-h5hk6                             1/1     Running     0          21h     10.244.219.69    k8s-worker-node2   <none>           <none>
rook-ceph-operator-864f5d5868-gsww8                          1/1     Running     0          22h     10.244.91.65     k8s-worker-node4   <none>           <none>
rook-ceph-osd-0-6b74867f6b-2qwnv                             1/1     Running     0          21h     10.244.219.73    k8s-worker-node2   <none>           <none>
rook-ceph-osd-1-65596bf48-6lxxv                              1/1     Running     0          21h     10.244.91.71     k8s-worker-node4   <none>           <none>
rook-ceph-osd-2-5bc6788b7f-z2rzv                             1/1     Running     0          21h     10.244.198.201   k8s-worker-node3   <none>           <none>
rook-ceph-osd-prepare-k8s-master-node1-4kxg8                 0/1     Completed   0          3h12m   10.244.236.163   k8s-master-node1   <none>           <none>
rook-ceph-osd-prepare-k8s-master-node2-tztm9                 0/1     Completed   0          3h12m   10.244.237.101   k8s-master-node2   <none>           <none>
rook-ceph-osd-prepare-k8s-master-node3-768v5                 0/1     Completed   0          3h12m   10.244.113.222   k8s-master-node3   <none>           <none>
rook-ceph-osd-prepare-k8s-worker-node1-dlljc                 0/1     Completed   0          3h12m   10.244.50.240    k8s-worker-node1   <none>           <none>
rook-ceph-osd-prepare-k8s-worker-node2-qszkt                 0/1     Completed   0          3h12m   10.244.219.79    k8s-worker-node2   <none>           <none>
rook-ceph-osd-prepare-k8s-worker-node3-krxqc                 0/1     Completed   0          3h12m   10.244.198.210   k8s-worker-node3   <none>           <none>
rook-ceph-osd-prepare-k8s-worker-node4-l77ds                 0/1     Completed   0          3h12m   10.244.91.78     k8s-worker-node4   <none>           <none>
rook-ceph-tools-5949d6759-lbj74                              1/1     Running     0          21h     10.244.50.234    k8s-worker-node1   <none>           <none>
rook-discover-cjpxh                                          1/1     Running     0          22h     10.244.198.193   k8s-worker-node3   <none>           <none>
rook-discover-lw96w                                          1/1     Running     0          22h     10.244.91.66     k8s-worker-node4   <none>           <none>
rook-discover-m7jzr                                          1/1     Running     0          22h     10.244.236.157   k8s-master-node1   <none>           <none>
rook-discover-mbqtx                                          1/1     Running     0          22h     10.244.237.95    k8s-master-node2   <none>           <none>
rook-discover-r4m6h                                          1/1     Running     0          22h     10.244.50.232    k8s-worker-node1   <none>           <none>
rook-discover-xwml2                                          1/1     Running     0          22h     10.244.113.216   k8s-master-node3   <none>           <none>
rook-discover-xzw2z                                          1/1     Running     0          22h     10.244.219.66    k8s-worker-node2   <none>           <none>
Note:
1. After a successful deployment you get both the Rook components and the Ceph-CSI components (the RBD and CephFS plugins are deployed together)

Ceph dashboard

The Ceph mgr ships with a Dashboard module. Through this panel we can view the state of the cluster: overall health, the status of mgr, osd and other Ceph daemons, pool and PG state, daemon logs, and so on. Below is the default configuration in cluster.yaml

  dashboard:
    enabled: true
    # serve the dashboard under a subpath (useful when you are accessing the dashboard via a reverse proxy)
    # urlPrefix: /ceph-dashboard
    # serve the dashboard at the given port.
    # port: 8443
    # serve the dashboard using SSL
    ssl: true
Notes:
  1. The dashboard module is enabled by default; inside Ceph, ceph mgr module ls shows the state of all mgr modules (see the quick check after this list)
  2. The default URL prefix is /; urlPrefix sets a different route prefix
  3. SSL is enabled by default and the port is 8443
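
A quick way to confirm this from inside Ceph (the rook-ceph-tools pod is already running in the pod listing above; the exact output format varies between Ceph releases):

kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath="{.items[0].metadata.name}") -- ceph mgr module ls | grep -i dashboard
kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath="{.items[0].metadata.name}") -- ceph mgr services     ## shows the dashboard URL ##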

Once Rook is up, you can see the following Services

[root@k8s-master-node1 /opt/rook/cluster/examples/kubernetes/ceph]# kubectl get service -n rook-ceph
NAME                         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
rook-ceph-mgr                            ClusterIP   10.96.227.178   <none>        9283/TCP            3d16h
rook-ceph-mgr-dashboard                  ClusterIP   10.96.159.108   <none>        8443/TCP            3d16h
rook-ceph-mon-a                          ClusterIP   10.96.31.17     <none>        6789/TCP,3300/TCP   3d17h
rook-ceph-mon-b                          ClusterIP   10.96.176.163   <none>        6789/TCP,3300/TCP   3d17h
rook-ceph-mon-c                          ClusterIP   10.96.146.28    <none>        6789/TCP,3300/TCP   3d17h

The rook-ceph-mgr service exposes monitoring metrics in Prometheus format, and rook-ceph-mgr-dashboard is the Ceph dashboard service. Inside the cluster it can be reached via the DNS name https://rook-ceph-mgr-dashboard.rook-ceph:8443 or the ClusterIP https://10.96.159.108:8443. Usually, though, you want to open the dashboard from an external browser, which means exposing it through an Ingress or a NodePort-type Service. Rook conveniently ships the corresponding manifests:

[root@k8s-master-node1 /opt/rook/cluster/examples/kubernetes/ceph]# ll dashboard-*
-rw-r--r-- 1 root root 363 Nov 30 14:10 dashboard-external-https.yaml
-rw-r--r-- 1 root root 362 Nov 30 14:10 dashboard-external-http.yaml
-rw-r--r-- 1 root root 839 Nov 30 14:10 dashboard-ingress-https.yaml
-rw-r--r-- 1 root root 365 Nov 30 14:10 dashboard-loadbalancer.yaml

Here I go with the NodePort-type Service

[root@k8s-master-node1 /opt/rook/cluster/examples/kubernetes/ceph]# cat dashboard-external-https.yaml
apiVersion: v1
kind: Service
metadata:
  name: rook-ceph-mgr-dashboard-external-https
  namespace: rook-ceph
  labels:
    app: rook-ceph-mgr
    rook_cluster: rook-ceph
spec:
  ports:
  - name: dashboard
    port: 8443
    protocol: TCP
    targetPort: 8443
  selector:
    app: rook-ceph-mgr
    rook_cluster: rook-ceph
  sessionAffinity: None
  type: NodePort

Once it is created, the new Service shows up as below; 49096 is the NodePort. The Ceph dashboard is now reachable from a browser at https://<NodeIP>:49096

[root@k8s-master-node1 /opt/rook/cluster/examples/kubernetes/ceph]# kubectl  get svc -n rook-ceph  | grep dash
rook-ceph-mgr-dashboard                  ClusterIP   10.96.159.108   <none>        8443/TCP            3d17h
rook-ceph-mgr-dashboard-external-https   NodePort    10.96.83.5      <none>        8443:49096/TCP      2m

Notes:
  1. The certificate is self-signed, so you need to trust it manually in the browser
  2. The dashboard differs slightly between Ceph releases; this environment runs ceph version 15.2.8 octopus (stable)

The default username is admin. The password is stored in the rook-ceph-dashboard-password Secret in the rook-ceph namespace and can be recovered in plain text like this

[root@k8s-master-node1 /opt/rook/cluster/examples/kubernetes/ceph]#  kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}" | base64 --decode && echo
xxxxxxx  ## your password ##

After a successful login you land on the cluster overview dashboard.

Rook toolbox

To verify that the cluster is healthy, we can use the Rook toolbox to run ceph -s and look at the overall cluster state.

The Rook toolbox is a container with common tools for debugging and testing Rook; it is deployed from the toolbox.yaml manifest as shown below:

[root@k8s-master-node1 /opt/rook/cluster/examples/kubernetes/ceph]# kubectl apply -f toolbox.yaml
deployment.apps/rook-ceph-tools created

Once it is running, the following command drops you into the toolbox pod, from which you can carry out day-to-day operations on the Ceph cluster:

kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') -- bash

The toolbox base image is CentOS 8, so extra tools can be installed directly with yum or rpm

[root@rook-ceph-tools-5949d6759-256c5 /]# cat /etc/redhat-release
CentOS Linux release 8.3.2011
Tip:
you can define an alias like the one below to jump into the toolbox quickly (the inner jsonpath uses double quotes so that it does not break out of the alias's single quotes)
alias ceph-ops='kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath="{.items[0].metadata.name}") -- bash'

Check the cluster status

[root@rook-ceph-tools-5949d6759-256c5 /]# ceph -s
  cluster:
    id:     a0540409-d822-48e0-869b-273936597f2d
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 23h)
    mgr: a(active, since 4h)
    osd: 3 osds: 3 up (since 23h), 3 in (since 23h)

  data:
    pools:   2 pools, 33 pgs
    objects: 17 objects, 21 MiB
    usage:   3.1 GiB used, 297 GiB / 300 GiB avail
    pgs:     33 active+clean

Check the cluster topology

As expected, every spare disk was initialized as an OSD

[root@rook-ceph-tools-5949d6759-256c5 /]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME                  STATUS  REWEIGHT  PRI-AFF
-1         0.29306  root default
-3         0.09769      host k8s-worker-node2
 0    hdd  0.09769          osd.0                  up   1.00000  1.00000
-7         0.09769      host k8s-worker-node3
 2    hdd  0.09769          osd.2                  up   1.00000  1.00000
-5         0.09769      host k8s-worker-node4
 1    hdd  0.09769          osd.1                  up   1.00000  1.00000

For more on Ceph's architecture and day-to-day operations, see https://docs.ceph.com/en/pacific/

Deploying StorageClasses

The Ceph-CSI driver deployed by Rook already includes both the rbdplugin and the cephfsplugin

RBD block storage

RBD is block storage: loosely speaking, the consumer (here, a pod) gets a disk attached to it. In Kubernetes it is not suitable for simultaneous read/write from multiple clients. In a StatefulSet, volumeClaimTemplates creates an independent PV (an RBD image) for every pod; see the sketch right below.
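
As an illustration only (not part of the original walkthrough), a StatefulSet can request one RBD-backed PV per replica through volumeClaimTemplates, using the rook-ceph-block StorageClass that is created just below; all names here are hypothetical:

cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web-demo            ## hypothetical example workload ##
spec:
  serviceName: web-demo
  replicas: 3
  selector:
    matchLabels:
      app: web-demo
  template:
    metadata:
      labels:
        app: web-demo
    spec:
      containers:
      - name: web-server
        image: nginx
        volumeMounts:
        - name: data
          mountPath: /var/lib/www/html
  volumeClaimTemplates:     ## one independent RBD PV per pod replica ##
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: rook-ceph-block
      resources:
        requests:
          storage: 1Gi
EOF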

[root@k8s-master-node1 /opt/rook/cluster/examples/kubernetes/ceph/csi/rbd]# cat storageclass.yaml
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool   ## name of the RADOS pool ##
  namespace: rook-ceph
spec:
  failureDomain: host  ## failure domain at host level ##
  replicated:     ## replication rather than erasure coding ##
    size: 3
    # Disallow setting pool with replica 1, this could lead to data loss without recovery.
    # Make sure you're *ABSOLUTELY CERTAIN* that is what you want
    requireSafeReplicaSize: true
    # gives a hint (%) to Ceph in terms of expected consumption of the total cluster capacity of a given pool
    # for more info: https://docs.ceph.com/docs/master/rados/operations/placement-groups/#specifying-expected-pool-size
    #targetSizeRatio: .5
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
   name: rook-ceph-block
# Change "rook-ceph" provisioner prefix to match the operator namespace if needed
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
    # clusterID is the namespace where the rook cluster is running
    # If you change this namespace, also change the namespace below where the secret namespaces are defined
    clusterID: rook-ceph

    # If you want to use erasure coded pool with RBD, you need to create
    # two pools. one erasure coded and one replicated.
    # You need to specify the replicated pool here in the `pool` parameter, it is
    # used for the metadata of the images.
    # The erasure coded pool must be set as the `dataPool` parameter below.
    #dataPool: ec-data-pool
    pool: replicapool   ## the RADOS pool to use ##

    # RBD image format. Defaults to "2".
    imageFormat: "2"

    # RBD image features. Available for imageFormat: "2". CSI RBD currently supports only `layering` feature.
    imageFeatures: layering

    # The secrets contain Ceph admin credentials. These are generated automatically by the operator
    # in the same namespace as the cluster.
    csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
    csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
    csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
    csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
    csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
    csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
    # Specify the filesystem type of the volume. If not specified, csi-provisioner
    # will set default as `ext4`. Note that `xfs` is not recommended due to potential deadlock
    # in hyperconverged settings where the volume is mounted on the same node as the osds.
    csi.storage.k8s.io/fstype: ext4  ## filesystem for the RBD volume ##
# uncomment the following to use rbd-nbd as mounter on supported nodes
# **IMPORTANT**: If you are using rbd-nbd as the mounter, during upgrade you will be hit a ceph-csi
# issue that causes the mount to be disconnected. You will need to follow special upgrade steps
# to restart your application pods. Therefore, this option is not recommended.
#mounter: rbd-nbd
allowVolumeExpansion: true
reclaimPolicy: Delete
Notes:
  1. spec.replicated.size is the replica count of the pool; 3 means three replicas. See the Ceph pool documentation for more detail

Create the StorageClass

[root@k8s-master-node1 /opt/rook/cluster/examples/kubernetes/ceph/csi/rbd]# kubectl  apply -f storageclass.yaml
cephblockpool.ceph.rook.io/replicapool created
storageclass.storage.k8s.io/rook-ceph-block created
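
From the toolbox you can confirm that the pool exists and has the expected replica count (a quick sanity check; the exact output varies slightly between Ceph releases):

ceph osd pool ls detail | grep replicapool
ceph osd pool get replicapool size       ## should report size: 3 ##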

Create a PVC and a pod that uses it

[root@k8s-master-node1 /opt/rook/cluster/examples/kubernetes/ceph/csi/rbd]# cat pod.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: csirbd-demo-pod
spec:
  containers:
   - name: web-server
     image: nginx
     volumeMounts:
       - name: mypvc
         mountPath: /var/lib/www/html
  volumes:
   - name: mypvc
     persistentVolumeClaim:
       claimName: rbd-pvc
       readOnly: false
[root@k8s-master-node1 /opt/rook/cluster/examples/kubernetes/ceph/csi/rbd]# cat pvc.yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-ceph-block

[root@k8s-master-node1 /opt/rook/cluster/examples/kubernetes/ceph/csi/rbd]# kubectl  apply   -f pvc.yaml  -f pod.yaml
persistentvolumeclaim/rbd-pvc created
pod/csirbd-demo-pod created

Check the PVC and PV status

[root@k8s-master-node1 /opt/rook/cluster/examples/kubernetes/ceph/csi/rbd]# kubectl  get pvc
NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
rbd-pvc   Bound    pvc-a38b140d-cff8-4bfb-9fa6-141b207fe5f4   1Gi        RWO            rook-ceph-block   44s
[root@k8s-master-node1 /opt/rook/cluster/examples/kubernetes/ceph/csi/rbd]# kubectl  get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM             STORAGECLASS      REASON   AGE
pvc-a38b140d-cff8-4bfb-9fa6-141b207fe5f4   1Gi        RWO            Delete           Bound    default/rbd-pvc   rook-ceph-block            45s

Exec into the pod and verify the mount; as shown below, the RBD-backed volume is visible inside the pod

[root@k8s-master-node1 /opt/rook/cluster/examples/kubernetes/ceph/csi/rbd]# kubectl  get pods
NAME                           READY   STATUS    RESTARTS   AGE
csirbd-demo-pod                1/1     Running   0          87s


[root@k8s-master-node1 /opt/rook/cluster/examples/kubernetes/ceph/csi/rbd]# kubectl  exec -it csirbd-demo-pod -- bash
root@csirbd-demo-pod:/# df -Th
Filesystem              Type     Size  Used Avail Use% Mounted on
overlay                 overlay   47G  9.5G   38G  21% /
tmpfs                   tmpfs     64M     0   64M   0% /dev
tmpfs                   tmpfs    1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/mapper/centos-root xfs       47G  9.5G   38G  21% /etc/hosts
shm                     tmpfs     64M     0   64M   0% /dev/shm
/dev/rbd0               ext4     976M  2.6M  958M   1% /var/lib/www/html
tmpfs                   tmpfs    1.9G   12K  1.9G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs                   tmpfs    1.9G     0  1.9G   0% /proc/acpi
tmpfs                   tmpfs    1.9G     0  1.9G   0% /proc/scsi
tmpfs                   tmpfs    1.9G     0  1.9G   0% /sys/firmware
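
To map the PV back to its RBD image, the image name should be recorded in the PV's CSI volume attributes (that is my understanding of how ceph-csi records it; verify on your version), and rbd info in the toolbox shows the image details:

kubectl get pv pvc-a38b140d-cff8-4bfb-9fa6-141b207fe5f4 -o jsonpath='{.spec.csi.volumeAttributes.imageName}'
## then, inside the toolbox: ##
rbd -p replicapool ls
rbd -p replicapool info <image-name-from-the-previous-command>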

CephFS file storage

CephFS is file storage: loosely speaking, it is like mounting a remote directory (similar to NFS), and it supports simultaneous read/write from multiple clients (a ReadWriteMany claim sketch follows right after this paragraph).
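
A minimal sketch of such a shared claim (it relies on the rook-cephfs StorageClass created later in this section; the PVC name is hypothetical):

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-shared-pvc        ## hypothetical name ##
spec:
  accessModes:
  - ReadWriteMany                ## several pods may mount it read-write at the same time ##
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-cephfs
EOF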

By default Rook only deploys the mon/mgr/osd components; CephFS additionally needs the mds component, which is deployed through the following CRD resource

[root@k8s-master-node1 /opt/rook/cluster/examples/kubernetes/ceph]# cat filesystem.yaml   | grep -v "#"

apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: myfs
  namespace: rook-ceph
spec:
  metadataPool:    ## metadata pool ##
    replicated:
      size: 3
      requireSafeReplicaSize: true
    parameters:
      compression_mode: none
  dataPools:  ## data pools ##
    - failureDomain: host
      replicated:
        size: 3
        requireSafeReplicaSize: true
      parameters:
        compression_mode: none
  preservePoolsOnDelete: true
  metadataServer:   ## active/standby: one active, one standby ##
    activeCount: 1
    activeStandby: true
    placement:
       podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - rook-ceph-mds
            topologyKey: kubernetes.io/hostname
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - rook-ceph-mds
              topologyKey: topology.kubernetes.io/zone
    annotations:
    labels:
    resources:
Notes:
  1. This creates a CephFS file system named myfs, with a replicated metadata pool and a replicated data pool, each with 3 replicas
  2. Two mds daemons are created, one active and one standby (a quick check is given right after this list)
  3. The placement/affinity rules can be used to schedule the mds pods onto specific nodes
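
Once the filesystem below has been applied, a quick way to confirm that the two mds pods exist is to filter on the app=rook-ceph-mds label used in the placement rules above:

kubectl -n rook-ceph get pod -l app=rook-ceph-mds -o wide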

Create the CephFS file system, then enter the toolbox and verify its status

[root@k8s-master-node1 /opt/rook/cluster/examples/kubernetes/ceph]# kubectl  apply -f filesystem.yaml

The cluster status now shows the mds service

[root@rook-ceph-tools-5949d6759-256c5 /]# ceph -s
  cluster:
    id:     a0540409-d822-48e0-869b-273936597f2d
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 35m)
    mgr: a(active, since 54m)
    mds: myfs:1 {0=myfs-a=up:active} 1 up:standby-replay  ## mds status: one active, one standby-replay ##
    osd: 3 osds: 3 up (since 24h), 3 in (since 24h)

  data:
    pools:   4 pools, 97 pgs
    objects: 41 objects, 21 MiB
    usage:   3.1 GiB used, 297 GiB / 300 GiB avail
    pgs:     97 active+clean

  io:
    client:   853 B/s rd, 1 op/s rd, 0 op/s wr

View the file system details

[root@rook-ceph-tools-5949d6759-256c5 /]# ceph fs status  --format=json  | jq .
{
  "clients": [
    {
      "clients": 0,
      "fs": "myfs"
    }
  ],
  "mds_version": "ceph version 15.2.8 (bdf3eebcd22d7d0b3dd4d5501bee5bac354d5b55) octopus (stable)",
  "mdsmap": [
    {
      "dns": 10,
      "inos": 13,
      "name": "myfs-a",
      "rank": 0,
      "rate": 0,
      "state": "active"
    },
    {
      "dns": 5,
      "events": 0,
      "inos": 5,
      "name": "myfs-b",
      "rank": 0,
      "state": "standby-replay"
    }
  ],
  "pools": [
    {
      "avail": 100898840576,
      "id": 5,
      "name": "myfs-metadata",
      "type": "metadata",
      "used": 1572864
    },
    {
      "avail": 100898840576,
      "id": 6,
      "name": "myfs-data0",
      "type": "data",
      "used": 0
    }
  ]
}
Notes:
  1. The metadata pool is named myfs-metadata
  2. The data pool is named myfs-data0

Create a CephFS StorageClass

[root@k8s-master-node1 /opt/rook/cluster/examples/kubernetes/ceph/csi/cephfs]# cat storageclass.yaml

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-cephfs
provisioner: rook-ceph.cephfs.csi.ceph.com
parameters:
  # clusterID is the namespace where operator is deployed.
  clusterID: rook-ceph

  # CephFS filesystem name into which the volume shall be created
  fsName: myfs

  # Ceph pool into which the volume shall be created
  # Required for provisionVolume: "true"
  pool: myfs-data0

  # Root path of an existing CephFS volume
  # Required for provisionVolume: "false"
  # rootPath: /absolute/path

  # The secrets contain Ceph admin credentials. These are generated automatically by the operator
  # in the same namespace as the cluster.
  ## secrets used by the CSI driver ##
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph

  # (optional) The driver can use either ceph-fuse (fuse) or ceph kernel client (kernel)
  # If omitted, default volume mounter will be used - this is determined by probing for ceph-fuse
  # or by setting the default mounter explicitly via --volumemounter command-line argument.
  # mounter: kernel
reclaimPolicy: Delete   ## reclaim policy ##
allowVolumeExpansion: true
mountOptions:  ## extra mount options ##
  # uncomment the following line for debugging
  #- debug
Notes:
  1. fsName is the name of the CephFS file system; per the deployment above it is myfs
  2. pool is the name of the data pool; per the deployment above it is myfs-data0

Check that the StorageClass was created successfully

[root@k8s-master-node1 /opt/rook/cluster/examples/kubernetes/ceph/csi/cephfs]# kubectl  get sc
NAME              PROVISIONER                     RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
rook-ceph-block   rook-ceph.rbd.csi.ceph.com      Delete          Immediate           true                   19h
rook-cephfs       rook-ceph.cephfs.csi.ceph.com   Delete          Immediate           true                   12h

Create a PVC and a pod, and mount the PVC inside the pod

[root@k8s-master-node1 /opt/rook/cluster/examples/kubernetes/ceph/csi/cephfs]# cat pod.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: csicephfs-demo-pod
spec:
  containers:
   - name: web-server
     image: nginx
     volumeMounts:
       - name: mypvc
         mountPath: /var/lib/www/html
  volumes:
   - name: mypvc
     persistentVolumeClaim:
       claimName: cephfs-pvc
       readOnly: false
[root@k8s-master-node1 /opt/rook/cluster/examples/kubernetes/ceph/csi/cephfs]# cat pvc.yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-cephfs

[root@k8s-master-node1 /opt/rook/cluster/examples/kubernetes/ceph/csi/cephfs]# kubectl apply -f pvc.yaml   -f pod.yaml
persistentvolumeclaim/cephfs-pvc created
pod/csicephfs-demo-pod created

If everything is healthy, the pod should be in the Running state, as shown below

[root@k8s-master-node1 /opt/rook/cluster/examples/kubernetes/ceph/csi/cephfs]# kubectl  get pod
NAME                                         READY   STATUS    RESTARTS   AGE
csicephfs-demo-pod                           1/1     Running   0          23m   ## started fine ##
csirbd-demo-pod                              1/1     Running   0          25h

Exec into the pod and check the mount

[root@k8s-master-node1 /opt/rook/cluster/examples/kubernetes/ceph/csi/cephfs]# kubectl  exec -it  csicephfs-demo-pod -- df -Th
Filesystem                                                                                                                                           Type     Size  Used Avail Use% Mounted on
overlay                                                                                                                                              overlay   10G  6.0G  4.1G  60% /
tmpfs                                                                                                                                                tmpfs     64M     0   64M   0% /dev
tmpfs                                                                                                                                                tmpfs    1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/mapper/vg_root-lv_root                                                                                                                          xfs       10G  6.0G  4.1G  60% /etc/hosts
shm                                                                                                                                                  tmpfs     64M     0   64M   0% /dev/shm
10.96.176.163:6789,10.96.146.28:6789,10.96.31.17:6789:/volumes/csi/csi-vol-e45c556b-528d-11ec-97cf-222a3fe2a760/69824d96-cdeb-4602-a770-1c4422db4a34 ceph     1.0G     0  1.0G   0% /var/lib/www/html
tmpfs                                                                                                                                                tmpfs    1.9G   12K  1.9G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs                                                                                                                                                tmpfs    1.9G     0  1.9G   0% /proc/acpi
tmpfs                                                                                                                                                tmpfs    1.9G     0  1.9G   0% /proc/scsi
tmpfs                                                                                                                                                tmpfs    1.9G     0  1.9G   0% /sys/firmware

As shown above, the 1 GiB PV is mounted at the expected path and the filesystem type is ceph, exactly as expected
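
As a final smoke test you can write through the mount and read it back (file name and content are arbitrary):

kubectl exec csicephfs-demo-pod -- sh -c 'echo hello-cephfs > /var/lib/www/html/test.txt'
kubectl exec csicephfs-demo-pod -- cat /var/lib/www/html/test.txt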

S3 object storage

Rook can deploy RGW instances directly into the current Ceph environment, or connect to an existing external Ceph cluster; see the Rook documentation for details

Below is the CephObjectStore CRD manifest provided by Rook; create it directly with kubectl apply.

[root@k8s-master-node1 /opt/rook/cluster/examples/kubernetes/ceph]# cat object.yaml  | grep -v "#"

apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
  name: my-store
  namespace: rook-ceph
spec:
  metadataPool:  ## index/metadata pool ##
    failureDomain: host  ## failure domain at host level ##
    replicated:  ## replication strategy ##
      size: 3    ## replica count ##
      requireSafeReplicaSize: true
    parameters:
      compression_mode: none
  dataPool:      ## data pool ##
    failureDomain: host
    replicated:
      size: 3
      requireSafeReplicaSize: true
    parameters:
      compression_mode: none
  preservePoolsOnDelete: false
  gateway:
    type: s3     ## gateway type: S3 ##
    sslCertificateRef:
    port: 80           ## rgw instance port ##
    instances: 1       ## number of instances ##
    placement:
    annotations:
    labels:
    resources:
  healthCheck:
    bucket:
      disabled: false
      interval: 60s
    livenessProbe:
      disabled: false

Once it is created, the following Service appears; hit its ClusterIP and port to verify the S3 endpoint. If the service is healthy you get a response like the one below.

[root@k8s-master-node1 /opt/rook/cluster/examples/kubernetes/ceph]# kubectl -n rook-ceph get svc -l app=rook-ceph-rgw
NAME                     TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
rook-ceph-rgw-my-store   ClusterIP   10.96.57.91   <none>        80/TCP    174m

[root@k8s-master-node1 /opt/rook/cluster/examples/kubernetes/ceph]# curl 10.96.57.91
<?xml version="1.0" encoding="UTF-8"?><ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner><Buckets></Buckets></ListAllMyBucketsResult>

At this point the S3 object storage service backed by Ceph RGW is up and running. In the traditional workflow you would now create users and key pairs with the radosgw-admin tool or the admin API; see https://docs.ceph.com/en/pacific/radosgw/admin/
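
For example, from inside the toolbox a user with an S3 key pair can be created like this (uid and display name are arbitrary); the generated access_key and secret_key appear in the command's JSON output:

radosgw-admin user create --uid=demo-user --display-name="Demo User"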

Now that object storage is configured, the next step is to create a bucket that clients can read and write objects in. Buckets can be provisioned through a StorageClass, following the same pattern as block and file storage. First, define a StorageClass that lets object clients create buckets; it specifies the object store, the bucket retention policy, and whatever other properties the administrator needs.

Below is the S3-flavoured StorageClass; create it with kubectl apply.

[root@k8s-master-node1 /opt/rook/cluster/examples/kubernetes/ceph]# cat storageclass-bucket-retain.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
   name: rook-ceph-retain-bucket
provisioner: rook-ceph.ceph.rook.io/bucket
# set the reclaim policy to retain the bucket when its OBC is deleted
reclaimPolicy: Retain  ## reclaim policy ##
parameters:
  objectStoreName: my-store # port 80 assumed
  objectStoreNamespace: rook-ceph
  region: us-east-1  ## the default is fine ##
  # To accommodate brownfield cases reference the existing bucket name here instead
  # of in the ObjectBucketClaim (OBC). In this case the provisioner will grant
  # access to the bucket by creating a new user, attaching it to the bucket, and
  # providing the credentials via a Secret in the namespace of the requesting OBC.
  #bucketName:
Notes:
  1. The reclaim policy is Retain, so the bucket persists instead of being cleaned up automatically when the claimant is deleted
  2. objectStoreName refers to the name of the CephObjectStore CRD created above

To consume object storage, a client creates an ObjectBucketClaim (OBC, roughly analogous to a PVC) CRD resource. Once it is provisioned, a Secret with the same name appears in the current namespace, containing the key pair the client needs (access_key and secret_key).

[root@k8s-master-node1 /opt/rook/cluster/examples/kubernetes/ceph]# cat <<EOF | kubectl apply -f - 
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: ceph-bucket
spec:
  generateBucketName: ceph-bucket-demo 
  storageClassName: rook-ceph-retain-bucket
EOF
Notes:
  1. storageClassName points at the StorageClass created above
  2. generateBucketName sets the name prefix for the bucket that gets created

The key pair can be fetched directly like this

[root@k8s-master-node1 /opt/rook/cluster/examples/kubernetes/ceph]# kubectl -n default get secret ceph-bucket  -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 -d
xxxxxxxxxxxxxx    ######## access_key ######

[root@k8s-master-node1 /opt/rook/cluster/examples/kubernetes/ceph]# kubectl -n default get secret ceph-bucket  -o jsonpath='{.data.AWS_SECRET_ACCESS_KEY}' | base64 -d
xxxxxxxxxxxxxx   ######## secret_key #########

A ConfigMap with the same name is also generated in that namespace, containing the bucket and endpoint information.

[root@k8s-master-node1 /opt/rook/cluster/examples/kubernetes/ceph]# kubectl -n default get cm  ceph-bucket -o jsonpath='{.data}' | jq .
{
  "BUCKET_HOST": "rook-ceph-rgw-my-store.rook-ceph.svc",
  "BUCKET_NAME": "ceph-bucket-demo-aaa69329-40db-467b-81d6-dd4f6585ebfa",  ## 符合预期 ##
  "BUCKET_PORT": "80",
  "BUCKET_REGION": "us-east-1",
  "BUCKET_SUBREGION": ""
}
Notes:
  1. Bucket details can be inspected with radosgw-admin
  2. Full verification of S3 usage is not covered in detail here; a minimal sketch with the AWS CLI follows after this list
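
For completeness, a minimal sketch of exercising the bucket with the AWS CLI (assuming the CLI is installed somewhere that can resolve the in-cluster RGW Service; credentials and bucket name come from the Secret and ConfigMap above):

export AWS_ACCESS_KEY_ID=$(kubectl -n default get secret ceph-bucket -o jsonpath='{.data.AWS_ACCESS_KEY_ID}' | base64 -d)
export AWS_SECRET_ACCESS_KEY=$(kubectl -n default get secret ceph-bucket -o jsonpath='{.data.AWS_SECRET_ACCESS_KEY}' | base64 -d)
export BUCKET_NAME=$(kubectl -n default get cm ceph-bucket -o jsonpath='{.data.BUCKET_NAME}')
aws --endpoint-url http://rook-ceph-rgw-my-store.rook-ceph.svc s3 cp /etc/hosts s3://$BUCKET_NAME/hosts-test
aws --endpoint-url http://rook-ceph-rgw-my-store.rook-ceph.svc s3 ls s3://$BUCKET_NAME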

Wrap-up

The above is a simple record of my first hands-on experience with Rook; here are some personal takeaways.

  1. Rook really is fast to deploy: one pass and all the components were simply Running. But that is the ideal case. Every environment is different, and as soon as one link misbehaves, the whole troubleshooting cycle stretches out considerably. (I hit assorted small problems during the walkthrough above.)
  2. For someone unfamiliar with Ceph, building a cluster from scratch is costly, so a tool (or rather, an approach) that can deploy one in a single shot is genuinely convenient. But if fast deployment is all you are after, you end up with a demo or validation environment, still a long way from real production. "Deployment is instant gratification; operations is the crematorium." Until I truly understand Rook's internal logic and architecture, I will not rush to adopt this stack just for the sake of being cloud native. As a storage administrator you should keep in mind that storage is the most critical layer and the value of the data is what matters most. As the saying goes, "compute comes and goes; storage, once gone, never comes back." Compute tasks and systems can be retried and restarted, but a storage system cannot be casually retried or restarted: one wrong operation can cause irreversible damage, that is, data loss.
  3. Rook has its own control plane and agents, plus the Ceph-CSI components needed to make PV/PVC work, so the pile of non-storage components is already substantial, which significantly raises the cost of later management and maintenance. And containerizing the storage side itself (Ceph, in this case) may make things even worse. Ceph is already far more complex than an ordinary application, with many daemons of its own: MON, MGR, OSD, RGW, MDS and so on. Containerization adds a runtime layer on top of that, an extra burden for day-to-day operations and troubleshooting. Whether containerizing Ceph is worthwhile has in fact been hotly debated in the community; see Why you might want packages not containers for Ceph deployments (https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/TTTYKRVWJOR7LOQ3UCQAZQR32R7YADVY/)
  4. Troubleshooting used to mean going straight to a fixed log file and searching for clues. After going cloud native, logs go to standard output, and without a centralized logging platform, tracking problems down gets harder again. For a Ceph administrator used to digging through system-level logs, finding things after containerization feels awkward.
  5. Any customization or tuning of a Rook-deployed Ceph cluster has to be expressed within Rook's conventions and constraints, and some of it may not be supported yet. So for Ceph administrators, Rook adds yet another learning curve, and you may even need to customize Rook itself to make it fit your production requirements.

All in all, my personal take: if your Kubernetes cluster runs on a public cloud, just use the PV/PVC offerings of the cloud provider. For self-built clusters with persistent-storage needs, if you have a dedicated storage team, let them own the storage solution and decide how to integrate it; that takes pressure off the Kubernetes SREs. Let professionals do professional work.

Cloud native is the trend, and building applications (stateless or stateful) around it is an unstoppable tide. Rook still has a long way to go: the revolution is not yet complete, and comrades must keep striving.

Tags: kubernetes, ceph
This work is licensed under the CC License; when republishing you must credit the author and link to this article