Longhorn
- What is it?
- Distributed block storage for Kubernetes
- Incremental snapshots
- Secondary backups - NFS, S3
- Restore from snapshots and backups
- Non-disruptive upgrades
- Built-in UI
 
- longhorn/longhorn
- Enterprise-grade distributed block storage
 
- Requirements
- Installation requirements
- Docker 1.13+
- k8s 1.14+
- 3 replicas by default - so 3 nodes are required; node-level soft anti-affinity is disabled
- open-iscsi installed and the iscsid daemon running on every node
- A filesystem with file extents support - ext4 or XFS
- curl, findmnt, grep, awk, blkid, lsblk
- Mount propagation enabled (see the quick check after this list)
- k3s needs extra configuration
 
- Minimum hardware requirements
- 3 nodes
- 4 vCPUs per node
- 4 GiB RAM per node
- SSD/NVMe; spinning disks are not recommended - IOPS too low
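A quick manual spot-check of the requirements above - a rough sketch that assumes a systemd-based node (Alpine/OpenRC nodes use service iscsid status instead); the environment_check.sh script below covers the same ground more thoroughly:
systemctl status iscsid                                        # iscsid daemon must be running
findmnt -no FSTYPE /var/lib/longhorn || findmnt -no FSTYPE /   # expect ext4 or xfs
findmnt -no PROPAGATION /                                      # "shared" means mount propagation is enabled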
 
 
- Installation notes
- Caution ⚠️
- Data mapping
- PVC -> PV -> Volume -> Replica -> Node
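The chain can be walked with kubectl; a sketch assuming a PVC named data in the default namespace (names are placeholders, and replica names are prefixed with the volume name):
kubectl get pvc data -o jsonpath='{.spec.volumeName}'            # PVC -> PV
kubectl get pv <pv-name> -o jsonpath='{.spec.csi.volumeHandle}'  # PV -> Longhorn Volume
kubectl -n longhorn-system get lhv <volume-name>                 # the Volume resource
kubectl -n longhorn-system get lhr | grep <volume-name>          # its Replicas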
 
caution
- v1.1.0: with a single-node deployment, the existing replica has to be salvaged after every restart
- Even with Automatic salvage enabled, manual salvaging was still needed
- v1.1.1 fixes this - #2309
 
curl -sSfLO https://raw.githubusercontent.com/longhorn/longhorn/master/scripts/environment_check.sh
apk add jq curl findmnt grep awk coreutils util-linux
# runs the check against the current environment - via kubectl apply
bash environment_check.sh
Configuration
- https://longhorn.io/docs/1.1.0/references/settings/
- Settings can be changed later in the UI, or customized before the initial deployment (a ConfigMap sketch follows the settings list below)
# Settings reference: https://longhorn.io/docs/1.1.0/references/settings
# Backup
# ============
# Backup target - NFS and S3 are supported
# 例如 s3://backupbucket@us-east-1/backupstore
backup-target:
# Credential secret for the backup target
backup-target-credential-secret:
# Interval for polling the backup store for the latest backups - used for restores
backupstore-poll-interval: 300
# Scheduling
# ============
# true allows multiple replicas of the same volume on a single node
replica-node-level-soft-anti-affinity: false
replica-soft-anti-affinity:
# Percentage by which storage allocation is allowed to exceed the actual capacity
storage-over-provisioning-percentage: 200
# Minimum percentage of disk space that must stay available
storage-minimal-available-percentage: 25
# Do not schedule onto Kubernetes cordoned nodes
disable-scheduling-on-cordoned-node: true
# Allow replicas in the same zone
replica-zone-soft-anti-affinity: true
# Danger zone
# ============
# CPU reserved for the engine
# 0.25 * 8 = 2 vCPUs - at least 2 vCPUs per node
guaranteed-engine-cpu: 0.25
create-default-disk-labeled-nodes:
default-data-path:
upgrade-checker:
default-replica-count:
default-longhorn-static-storage-class:
taint-toleration:
registry-secret:
auto-salvage:
volume-attachment-recovery-policy:
mkfs-ext4-parameters:
Volume parameters
# How long Longhorn waits before cleaning up an ERROR replica
# In minutes - default 2880, i.e. 48 hours
staleReplicaTimeout: 2880
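These defaults end up in the longhorn-default-setting ConfigMap shipped by the manifest (mounted at DEFAULT_SETTING_PATH, see the installation notes below). A minimal sketch with placeholder values - only keys you actually want to override need to be listed:
apiVersion: v1
kind: ConfigMap
metadata:
  name: longhorn-default-setting
  namespace: longhorn-system
data:
  default-setting.yaml: |-
    backup-target: s3://backupbucket@us-east-1/backupstore
    backup-target-credential-secret: longhorn-backup-secret
    storage-over-provisioning-percentage: 200
    storage-minimal-available-percentage: 25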
Resources
- lhe Engine - the Engine backing a Volume
- lhr Replica - a replica of a volume
- lhs Setting - after startup every settings entry maps to a Setting resource
- lhv Volume - volume definition
- lhei EngineImage - engine image
- image: 'longhornio/longhorn-engine:v1.1.0'
 
- lhn Node - node information
- lhim InstanceManager - engine instances and replica instances
- Engine instance - runs on every node
- Replica instance - one per Replica of each Volume on the node
 
- ShareManager - RWX (ReadWriteMany) support via NFS
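The short names above map straight onto kubectl; a few read-only queries as examples:
kubectl -n longhorn-system get lhv,lhe,lhr    # volumes, engines, replicas
kubectl -n longhorn-system get lhn,lhim,lhs   # nodes, instance managers, settings
kubectl -n longhorn-system get lhei           # engine images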
Installation
- Installing with kubectl
- Creates the longhorn-system namespace
- Creates the longhorn-service-account ServiceAccount
- Creates the ClusterRole longhorn-role and binds it to longhorn-service-account
 
- Custom resource definitions - longhorn.io/v1beta1
- Creates the longhorn-default-setting config
- Creates the DaemonSet longhorn-manager - image longhornio/longhorn-manager
- Startup arguments
- longhorn-manager
- -d
- daemon
- --engine-image
- longhornio/longhorn-engine:v1.0.0
- --instance-manager-image
- longhornio/longhorn-instance-manager:v1_20200514
- --manager-image
- longhornio/longhorn-manager:v1.0.0
- --service-account
- longhorn-service-account
 
- Port 9500 - manager
- Mounted volumes
- /host/dev - dev - /dev
- /host/proc - proc - /proc
- /var/run - varrun - /var/run
- /var/lib/longhorn/ - longhorn - /var/lib/longhorn/
- /var/lib/longhorn-setting/ - longhorn-default-setting
 
- Environment variables
- DEFAULT_SETTING_PATH=/var/lib/longhorn-setting/default-setting.yaml
 
 
- Creates the Service longhorn-backend pointing at longhorn-manager - port 9500
 
- Deploys longhorn-ui - image longhornio/longhorn-ui
- Port 8000
- Environment variable LONGHORN_MANAGER_IP=http://longhorn-backend:9500
 
- Creates the Service longhorn-frontend pointing at longhorn-ui - port 80
 
- Deploys longhorn-driver-deployer - init container image longhornio/longhorn-manager
- Image longhornio/longhorn-manager
- longhorn-manager
- -d
- deploy-driver
- --manager-image
- longhornio/longhorn-manager:v1.0.0
- --manager-url
- http://longhorn-backend:9500/v1
 
 
- Creates the StorageClass longhorn - its parameters can select which nodes provide the storage (see the sketch below)
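A sketch of the kind of StorageClass this step creates; the parameters mirror the volumeAttributes in the PV example further down, and the selector values here are placeholders:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: longhorn
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: '3'
  staleReplicaTimeout: '2880'
  diskSelector: 'ssd'
  nodeSelector: 'node.can.longhorn'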
 
# Preparation
sudo apk add open-iscsi
sudo service iscsid start
sudo apk add curl findmnt grep gawk blkid lsblk util-linux
# Helm install
# ==========
git clone https://github.com/longhorn/longhorn && cd longhorn
helm install longhorn ./chart/ --namespace longhorn-system --create-namespace
# Manual install
# ==========
# Install
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/longhorn.yaml
# Or download the manifest first and install from it
curl -LOC- https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/longhorn.yaml
# Some parameters can be tweaked first - e.g. storage node selection, whether a single node is allowed
kubectl apply -f longhorn.yaml
# Watch the installation status
kubectl get pods \
--namespace longhorn-system \
--watch
# Installed components
kubectl -n longhorn-system get pod
FAQ
Error response from daemon: path /var/lib/longhorn is mounted on / but it is not a shared mount
- Fail to start longhorn with k3d #206
# This mount is bidirectional
name: longhorn
mountPath: /var/lib/longhorn/
mountPropagation: Bidirectional
# Make the root mount shared
sudo mount --make-rshared /
# sudo mount --make-rshared /var/lib/longhorn/
failed to start expansion: controller data doesn't support on-line expansion, frontend: tgt-blockdev
Possibly caused by expanding the volume while it is still mounted; if it never recovers, try detaching it (see the sketch below)
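One way to force a detach is to stop whatever workload holds the volume, e.g. for a hypothetical Deployment named myapp:
kubectl scale deployment myapp --replicas=0   # workload stops and the CSI driver detaches the volume
# wait until the volume shows as Detached in the Longhorn UI, then scale back up
kubectl scale deployment myapp --replicas=1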
The volume volume share should be available before the mount
Volume volume hasn't been attached yet
snapshot vs backup
- snapshot
- Local revisions
- Delta changes only
- Tied to the Volume - deleting the Volume deletes its snapshots
 
- backup
- Data is stored externally - S3/NFS
- Not affected by cluster state
- Backups are built from snapshots - a snapshot is taken before each backup
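Because backups live outside the cluster, the backup-target-credential-secret setting above points at a Secret in longhorn-system; for S3-compatible targets it is expected to carry AWS-style keys (all values below are placeholders):
kubectl -n longhorn-system create secret generic longhorn-backup-secret \
  --from-literal=AWS_ACCESS_KEY_ID=placeholder \
  --from-literal=AWS_SECRET_ACCESS_KEY=placeholder \
  --from-literal=AWS_ENDPOINTS=https://minio.example.com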
 
Monitoring
Example
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: longhorn-block-vol
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Block
  storageClassName: longhorn
  resources:
    requests:
      storage: 2Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: block-volume-test
  namespace: default
spec:
  containers:
    - name: block-volume-test
      image: nginx:stable-alpine
      imagePullPolicy: IfNotPresent
      volumeDevices:
        - devicePath: /dev/longhorn/testblk
          name: block-vol
      ports:
        - containerPort: 80
  volumes:
    - name: block-vol
      persistentVolumeClaim:
        claimName: longhorn-block-vol
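# The raw device appears inside the container at the devicePath declared above; a quick check:
#   kubectl exec block-volume-test -- ls -l /dev/longhorn/testblk   # should show a block device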
CRD
# PATCH to modify node tags
apiVersion: longhorn.io/v1beta1
kind: Node
metadata:
  name: my-node-1
  namespace: longhorn-system
spec:
  tags:
    - node.can.longhorn
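# To apply only the tag change, a merge patch along these lines should work
# (assumes the CR plural is nodes.longhorn.io, short name lhn as listed above):
#   kubectl -n longhorn-system patch nodes.longhorn.io my-node-1 \
#     --type merge -p '{"spec":{"tags":["node.can.longhorn"]}}'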
---
# Define a volume
apiVersion: longhorn.io/v1beta1
kind: Volume
metadata:
  name: test
  namespace: longhorn-system
  labels:
    longhornvolume: test
spec:
  Standby: false
  baseImage: ''
  fromBackup: ''
  disableFrontend: false
  diskSelector: []
  # Best to specify this explicitly - otherwise the engine may not be found
  engineImage: 'longhornio/longhorn-engine:v1.0.0'
  frontend: blockdev
  nodeSelector:
    - node.can.longhorn
  numberOfReplicas: 3
  recurringJobs: null
  size: '20000000'
  staleReplicaTimeout: 20
---
kind: Volume
apiVersion: longhorn.io/v1beta1
metadata:
  name: test
  namespace: longhorn-system
  labels:
    longhornvolume: test
spec:
  Standby: false
  accessMode: rwx
  baseImage: ''
  dataLocality: best-effort
  disableFrontend: false
  diskSelector: []
  engineImage: 'longhornio/longhorn-engine:v1.1.0'
  fromBackup: ''
  frontend: blockdev
  lastAttachedBy: ''
  nodeID: ''
  nodeSelector: []
  numberOfReplicas: 1
  recurringJobs:
    - cron: 0 0/6 * * ?
      labels: null
      name: c-75f2xa
      retain: 5
      task: backup
    - cron: 0 1 * * *
      labels: null
      name: c-yywuyn
      retain: 3
      task: snapshot
  revisionCounterDisabled: false
  # 20 GiB
  size: '21474836480'
  staleReplicaTimeout: 20
# PV
---
kind: PersistentVolume
apiVersion: v1
metadata:
  name: test
spec:
  capacity:
    storage: 20Gi
  csi:
    driver: driver.longhorn.io
    volumeHandle: test
    fsType: ext4
    volumeAttributes:
      diskSelector: ''
      nodeSelector: ''
      numberOfReplicas: '1'
      staleReplicaTimeout: '20'
  accessModes:
    - ReadWriteMany
  claimRef:
    kind: PersistentVolumeClaim
    namespace: default
    name: test
    apiVersion: v1
    resourceVersion: '147682602'
  persistentVolumeReclaimPolicy: Retain
  storageClassName: longhorn-static
  volumeMode: Filesystem
# PVC
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: test
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 20Gi
  volumeName: test
  storageClassName: longhorn-static
  volumeMode: Filesystem
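# Quick check that the static PV/PVC pair above binds (names as in the manifests):
#   kubectl get pvc test -n default          # STATUS should become Bound
#   kubectl -n longhorn-system get lhv test  # the backing Longhorn Volume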