
Ceph

Tips

  • Suitable scenarios
    • A dedicated storage cluster
    • Large storage clusters - TB+ / PB+
  • Unsuitable scenarios
    • Not a good fit for deploying directly into a small k8s cluster as microservice storage
    • Cases where a simple distributed file system is all that is needed
  • ceph/ceph
  • imixs Ceph
  • The default journal path is /var/lib/ceph/osd/$cluster-$id/journal; this file can be placed on a separate disk to improve performance
  • Ceph protects data with replication or erasure coding (see the sketch after this list)
  • Monitors listen on port 6789 by default (the address:port used when mounting CephFS)
  • References
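
A minimal sketch of the two protection modes, using the pool-create syntax shown in the ceph -h output further down this page (pool names and PG counts are only examples):

ceph osd pool create rep-pool 64 64 replicated   # replicated pool
ceph osd pool set rep-pool size 3                # keep 3 copies of each object
ceph osd pool create ec-pool 64 64 erasure       # erasure-coded pool using the default profile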

Best practices

  • Reserve 2 GB of RAM per OSD, one disk per OSD, redundant NICs
  • If SSDs are available, use them as journal disks and keep data on spinning disks; with few SSDs, several OSDs can share one (a single failed SSD then takes out multiple OSDs)
  • Benchmark sequential and random read/write performance before putting an SSD into service
  • Run MONs on separate servers; they need little RAM and disk, but use redundant NICs
  • With many OSDs on one host (e.g. >20), raise the system thread limit: kernel.pid_max = 4194303 (see the example after this list)
  • Recommended hardware
    • Not recommended to co-locate with a compute cluster - k8s
    • ceph-mds - metadata
      • 4+ CPUs
      • 1 GB+ RAM
    • ceph-osd
      • 2+ CPUs
      • 4 GB RAM by default; 2-4 GB is workable, less than 2 GB is not recommended
      • 4 GB+ for workloads with many small files, 256 GB+ for large data transfers
      • 1 TB+ disks recommended
      • Not recommended: one disk shared by an OSD and an MDS; not recommended: multiple OSDs on one disk
      • Separate the journal, data, and OS disks
      • Prefer SSDs for the journal disk
      • Make sure network bandwidth exceeds the aggregate disk bandwidth
      • 10 Gbps+ - replicating 1 TB takes about 20 minutes
    • ceph-mgr
      • 1-2 GB for small clusters, 5-10 GB for large ones - grows with the cluster - roughly 1 GB per TB
    • ceph-mon - monitor
      • Modest CPU requirements
      • 1-2 GB for small clusters, 5-10 GB for large ones - same as ceph-mgr
  • See: recommended operating systems
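
For the kernel.pid_max tuning mentioned above, a typical way to apply it (a sketch):

sysctl -w kernel.pid_max=4194303                     # apply immediately
echo 'kernel.pid_max = 4194303' >> /etc/sysctl.conf  # persist across reboots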
  • Object storage
    • RESTful interface
    • S3- and Swift-compatible APIs
    • S3-style subdomains
    • Unified S3/Swift namespaces
    • User management
    • Usage tracking
    • Striped objects
    • Cloud solution integration
    • Multi-site deployment
    • Multi-site replication
  • RADOS Block Device (RBD)
    • Thin-provisioned
    • Images up to 16 EB
    • Configurable striping
    • In-memory caching
    • Snapshots
    • Copy-on-write clones
    • Kernel driver support
    • KVM/libvirt support
    • Back end for cloud solutions
    • Incremental backup
    • Disaster recovery - multi-site asynchronous replication
  • File system (CephFS)
    • POSIX-compliant semantics
    • Separates metadata from data
    • Configurable striping
    • Dynamic rebalancing
    • Subdirectory snapshots
    • FUSE support
    • Kernel driver support
    • Deployable over NFS/CIFS
    • Works with Hadoop - can replace HDFS

All commands

ceph                   ceph-create-keys       ceph-detect-init       ceph-fuse              ceph-post-file
ceph-authtool          ceph-crush-location    ceph-disk              ceph-mds               ceph-rbdnamer
ceph-bluefs-tool       ceph-debugpack         cephfs-data-scan       ceph-mon               ceph-rest-api
ceph-clsinfo           ceph-dencoder          cephfs-journal-tool    ceph-objectstore-tool  ceph-run
ceph-conf              ceph-deploy            cephfs-table-tool      ceph-osd               ceph-syn
# Check the disk / filesystem type
file -sL /dev/vda1
df -T
# XFS test
apt-get install xfsprogs
# Create an XFS volume backed by a file
truncate -s 2G data.vol
# It can also be attached as a loop device
# losetup /dev/loop5 data.vol
# Partition: n = new partition, p = primary, w = write
fdisk data.vol
# Format as XFS
mkfs.xfs -f data.vol
# Mount
mkdir /storage
mount -t xfs data.vol /storage
# Check the filesystem
df -Th /storage
##########
# RADOS
##########
rados mkpool data                      # create a pool
echo Test-data > testfile.txt
rados put tf.txt testfile.txt -p data  # store a file as an object
rados -p data ls                       # list objects in the pool
ceph osd map data tf.txt               # locate where the object is mapped
rados df             # show usage
rados lspools        # list all rados pools
rados -p metadata ls # list objects in a given pool
# Get pg / pgp counts
ceph osd pool get data pgp_num
ceph osd pool get data pg_num
# Check the replica count
ceph osd dump | grep size
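
To round out the example, the object can be read back and verified (names follow the commands above):

rados get tf.txt tf.out -p data  # fetch the object into a local file
diff testfile.txt tf.out         # no output means the contents match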

##########
# RBD
##########
# Map an RBD image
# Use kernel 3.5+ if possible
uname -r
modprobe rbd
rbd create rbd-1 --size 1024   # a size (MB) is required
rbd info rbd-1
rbd map rbd/rbd-1
# Show mapped devices
rbd showmapped
# Create and mount a filesystem
mkfs.btrfs /dev/rbd0
mkdir -p /mnt/rbd/rbd-1
mount /dev/rbd0 /mnt/rbd/rbd-1
# Now it can be used, for example:
# docker run --name redis -v /mnt/rbd/rbd-1:/data -d redis redis-server --appendonly yes
# docker exec -it redis redis-cli set a 1
# docker exec -it redis redis-cli keys *
# cat /mnt/rbd/rbd-1/appendonly.aof
# Mapping can fail because krbd does not support some rbd image features
# Check which features Ceph enables by default
ceph --show-config | grep rbd | grep features
# layering       | 1
# striping       | 2
# exclusive-lock | 4
# object-map     | 8
# fast-diff      | 16
# deep-flatten   | 32
# Disable the unsupported features
rbd feature disable rbd/rbd-1 deep-flatten
rbd feature disable rbd/rbd-1 fast-diff
rbd feature disable rbd/rbd-1 object-map
rbd feature disable rbd/rbd-1 exclusive-lock
# Or in one go:
# for i in deep-flatten fast-diff object-map exclusive-lock; do rbd feature disable rbd/rbd-1 $i; done
# Setting rbd_default_features = 3 in the config changes the features of newly created images
# As of kernel 4.6, krbd still only supports 1+2 (layering + striping)
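
When the image is no longer needed, tear it down in reverse order (a sketch; device and names follow the example above):

umount /mnt/rbd/rbd-1
rbd unmap /dev/rbd0
rbd rm rbd-1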
# CephFS
ceph-deploy mds create ceph-node1
ceph osd pool create fs_data 128 128
ceph osd pool create fs_metadata 128 128
# ceph fs new <fs_name> <metadata pool> <data pool>
ceph fs new cephfs fs_metadata fs_data
ceph fs ls
ceph mds stat
mkdir -p /mnt/cfs
# Find the secret value
cat ceph.client.admin.keyring
mount -t ceph <mon-address>:6789:/ /mnt/cfs -o name=admin,secret=<secret from above>
# The secret can also be read from a file
# mount -t ceph <mon-address>:6789:/ /mnt/cfs -o name=admin,secretfile=<path to file>
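
CephFS can also be mounted in user space with ceph-fuse (listed in the command table above); a minimal sketch, assuming the admin keyring sits in the default location:

ceph-fuse -m <mon-address>:6789 /mnt/cfs   # uses client.admin and /etc/ceph/ceph.client.admin.keyring by default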

Troubleshooting

  • Log directory: /var/log/ceph
  • Admin sockets: /var/run/ceph
    • Daemons can be driven through these sockets (see the examples after this list)
    • ceph daemon mon.node-2 help
    • ceph daemon /var/run/ceph/ceph-mon.hd2-2.asok help
  • Check systemd logs: journalctl -xe
  • Configuration and data directory: /var/lib/ceph/
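
A few first-look commands when something is wrong (a sketch; osd.0 is only an example, and admin-socket commands must run on the host where that daemon lives):

ceph -s                                        # overall cluster status
ceph health detail                             # expand the reasons behind HEALTH_WARN / HEALTH_ERR
ceph osd tree                                  # which OSDs are down or out
ceph daemon osd.0 config show | grep journal   # query a daemon's runtime config over its admin socket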

http://docs.ceph.com/docs/jewel/rados/troubleshooting/troubleshooting-osd/

Technology comparison

vs HDFS

  • HDFS
    • The NameNode is a single point of failure and the bottleneck of the whole system
  • Ceph
    • The cluster runs active-active, with no single point of failure
    • CRUSH lets clients compute object placement themselves, so there is no central service and no architectural bottleneck (see the example below)
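
The placement computation can be observed directly; pool and object names below come from the RADOS example earlier:

ceph osd map data tf.txt   # PG and OSDs for this object, computed via CRUSH rather than looked up in a central table
ceph osd crush tree        # the CRUSH hierarchy that computation walks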

Command line

$ rbd create -h
usage: rbd <command> ...
Command-line interface for managing Ceph RBD images.
Positional arguments:
  <command>
    bench-write                 Simple write benchmark.
    children                    Display children of snapshot.
    clone                       Clone a snapshot into a COW child image.
    copy (cp)                   Copy src image to dest.
    create                      Create an empty image.
    diff                        Print extents that differ since a previous snap, or image creation.
    disk-usage (du)             Show disk usage stats for pool, image or snapshot
    export                      Export image to file.
    export-diff                 Export incremental diff to file.
    feature disable             Disable the specified image feature.
    feature enable              Enable the specified image feature.
    flatten                     Fill clone with parent data (make it independent).
    image-meta get              Image metadata get the value associated with the key.
    image-meta list             Image metadata list keys with values.
    image-meta remove           Image metadata remove the key and value associated.
    image-meta set              Image metadata set key with value.
    import                      Import image from file.
    import-diff                 Import an incremental diff.
    info                        Show information about image size, striping, etc.
    journal export              Export image journal.
    journal import              Import image journal.
    journal info                Show information about image journal.
    journal inspect             Inspect image journal for structural errors.
    journal reset               Reset image journal.
    journal status              Show status of image journal.
    list (ls)                   List rbd images.
    lock add                    Take a lock on an image.
    lock list (lock ls)         Show locks held on an image.
    lock remove (lock rm)       Release a lock on an image.
    map                         Map image to a block device using the kernel.
    merge-diff                  Merge two diff exports together.
    mirror image demote         Demote an image to non-primary for RBD mirroring.
    mirror image disable        Disable RBD mirroring for an image.
    mirror image enable         Enable RBD mirroring for an image.
    mirror image promote        Promote an image to primary for RBD mirroring.
    mirror image resync         Force resync to primary image for RBD mirroring.
    mirror image status         Show RBD mirroring status for an image.
    mirror pool disable         Disable RBD mirroring by default within a pool.
    mirror pool enable          Enable RBD mirroring by default within a pool.
    mirror pool info            Show information about the pool mirroring configuration.
    mirror pool peer add        Add a mirroring peer to a pool.
    mirror pool peer remove     Remove a mirroring peer from a pool.
    mirror pool peer set        Update mirroring peer settings.
    mirror pool status          Show status for all mirrored images in the pool.
    nbd list (nbd ls)           List the nbd devices already used.
    nbd map                     Map image to a nbd device.
    nbd unmap                   Unmap a nbd device.
    object-map rebuild          Rebuild an invalid object map.
    remove (rm)                 Delete an image.
    rename (mv)                 Rename image within pool.
    resize                      Resize (expand or shrink) image.
    showmapped                  Show the rbd images mapped by the kernel.
    snap create (snap add)      Create a snapshot.
    snap list (snap ls)         Dump list of image snapshots.
    snap protect                Prevent a snapshot from being deleted.
    snap purge                  Deletes all snapshots.
    snap remove (snap rm)       Deletes a snapshot.
    snap rename                 Rename a snapshot.
    snap rollback (snap revert) Rollback image to snapshot.
    snap unprotect              Allow a snapshot to be deleted.
    status                      Show the status of this image.
    unmap                       Unmap a rbd device that was used by the kernel.
    watch                       Watch events on image.
Optional arguments:
  -c [ --conf ] arg     path to cluster configuration
  --cluster arg         cluster name
  --id arg              client id (without 'client.' prefix)
  --user arg            client id (without 'client.' prefix)
  -n [ --name ] arg     client name
  -m [ --mon_host ] arg monitor host
  --secret arg          path to secret key (deprecated)
  -K [ --keyfile ] arg  path to secret key
  -k [ --keyring ] arg  path to keyring
See 'rbd help <command>' for help on a specific command.
$ ceph -h
General usage:
==============
usage: ceph [-h] [-c CEPHCONF] [-i INPUT_FILE] [-o OUTPUT_FILE]
            [--id CLIENT_ID] [--name CLIENT_NAME] [--cluster CLUSTER]
            [--admin-daemon ADMIN_SOCKET] [--admin-socket ADMIN_SOCKET_NOPE]
            [-s] [-w] [--watch-debug] [--watch-info] [--watch-sec]
            [--watch-warn] [--watch-error] [--version] [--verbose] [--concise]
            [-f {json,json-pretty,xml,xml-pretty,plain}]
            [--connect-timeout CLUSTER_TIMEOUT]
Ceph administration tool
optional arguments:
  -h, --help            request mon help
  -c CEPHCONF, --conf CEPHCONF
                        ceph configuration file
  -i INPUT_FILE, --in-file INPUT_FILE
                        input file
  -o OUTPUT_FILE, --out-file OUTPUT_FILE
                        output file
  --id CLIENT_ID, --user CLIENT_ID
                        client id for authentication
  --name CLIENT_NAME, -n CLIENT_NAME
                        client name for authentication
  --cluster CLUSTER     cluster name
  --admin-daemon ADMIN_SOCKET
                        submit admin-socket commands ("help" for help)
  --admin-socket ADMIN_SOCKET_NOPE
                        you probably mean --admin-daemon
  -s, --status          show cluster status
  -w, --watch           watch live cluster changes
  --watch-debug         watch debug events
  --watch-info          watch info events
  --watch-sec           watch security events
  --watch-warn          watch warn events
  --watch-error         watch error events
  --version, -v         display version
  --verbose             make verbose
  --concise             make less verbose
  -f {json,json-pretty,xml,xml-pretty,plain}, --format {json,json-pretty,xml,xml-pretty,plain}
  --connect-timeout CLUSTER_TIMEOUT
                        set a timeout for connecting to the cluster
 Monitor commands: =================[Contacting monitor, timeout after 5 seconds]auth add <entity> {<caps> [<caps>...]}   add auth info for <entity> from input                                          file, or random key if no input is                                          given, and/or any caps specified in                                          the commandauth caps <entity> <caps> [<caps>...]    update caps for <name> from caps                                          specified in the commandauth del <entity>                        delete all caps for <name>auth export {<entity>}                   write keyring for requested entity, or                                          master keyring if none givenauth get <entity>                        write keyring file with requested keyauth get-key <entity>                    display requested keyauth get-or-create <entity> {<caps>      add auth info for <entity> from input [<caps>...]}                             file, or random key if no input given,                                          and/or any caps specified in the                                          commandauth get-or-create-key <entity> {<caps>  get, or add, key for <name> from [<caps>...]}                             system/caps pairs specified in the                                          command.  If key already exists, any                                          given caps must match the existing                                          caps for that key.auth import                              auth import: read keyring file from -i                                          <file>auth list                                list authentication stateauth print-key <entity>                  display requested keyauth print_key <entity>                  display requested keyauth rm <entity>                         remove all caps for <name>compact                                  cause compaction of monitor's leveldb                                          storage (DEPRECATED)config-key del <key>                     delete <key>config-key exists <key>                  check for <key>'s existenceconfig-key get <key>                     get <key>config-key list                          list keysconfig-key put <key> {<val>}             put <key>, value <val>config-key rm <key>                      rm <key>df {detail}                              show cluster free space statsfs add_data_pool <fs_name> <pool>        add data pool <pool>fs dump {<int[0-]>}                      dump all CephFS status, optionally                                          from epochfs flag set enable_multiple <val> {--    Set a global CephFS flag yes-i-really-mean-it}fs get <fs_name>                         get info about one filesystemfs ls                                    list filesystemsfs new <fs_name> <metadata> <data>       make new filesystem using named pools                                          <metadata> and <data>fs reset <fs_name> {--yes-i-really-mean- disaster recovery only: reset to a it}                                      single-MDS mapfs rm <fs_name> {--yes-i-really-mean-it} disable the named filesystemfs rm_data_pool <fs_name> <pool>         remove data pool <pool>fs set <fs_name> max_mds|max_file_size|  set mds parameter <var> to <val> allow_new_snaps|inline_data|cluster_ down|allow_multimds|allow_dirfrags <val> {<confirm>}fs set_default <fs_name>                 set the default to the named filesystemfsid                                     show cluster FSID/UUIDhealth 
{detail}                          show cluster healthheap dump|start_profiler|stop_profiler|  show heap usage info (available only release|stats                            if compiled with tcmalloc)injectargs <injected_args> [<injected_   inject config arguments into monitor args>...]log <logtext> [<logtext>...]             log supplied text to the monitor logmds add_data_pool <pool>                 add data pool <pool>mds cluster_down                         take MDS cluster downmds cluster_up                           bring MDS cluster upmds compat rm_compat <int[0-]>           remove compatible featuremds compat rm_incompat <int[0-]>         remove incompatible featuremds compat show                          show mds compatibility settingsmds deactivate <who>                     stop mdsmds dump {<int[0-]>}                     dump legacy MDS cluster info,                                          optionally from epochmds fail <who>                           force mds to status failedmds getmap {<int[0-]>}                   get MDS map, optionally from epochmds metadata <who>                       fetch metadata for mds <who>mds newfs <int[0-]> <int[0-]> {--yes-i-  make new filesystem using pools really-mean-it}                          <metadata> and <data>mds remove_data_pool <pool>              remove data pool <pool>mds repaired <rank>                      mark a damaged MDS rank as no longer                                          damagedmds rm <int[0-]>                         remove nonactive mdsmds rm_data_pool <pool>                  remove data pool <pool>mds rmfailed <who> {<confirm>}           remove failed mdsmds set max_mds|max_file_size|allow_new_ set mds parameter <var> to <val> snaps|inline_data|allow_multimds|allow_ dirfrags <val> {<confirm>}mds set_max_mds <int[0-]>                set max MDS indexmds set_state <int[0-]> <int[0-20]>      set mds state of <gid> to <numeric-                                          state>mds stat                                 show MDS statusmds stop <who>                           stop mdsmds tell <who> <args> [<args>...]        
send command to particular mdsmon add <name> <IPaddr[:port]>           add new monitor named <name> at <addr>mon compact                              cause compaction of monitor's leveldb                                          storagemon dump {<int[0-]>}                     dump formatted monmap (optionally from                                          epoch)mon getmap {<int[0-]>}                   get monmapmon metadata <id>                        fetch metadata for mon <id>mon remove <name>                        remove monitor named <name>mon rm <name>                            remove monitor named <name>mon scrub                                scrub the monitor storesmon stat                                 summarize monitor statusmon sync force {--yes-i-really-mean-it}  force sync of and clear monitor store {--i-know-what-i-am-doing}mon_status                               report status of monitorsnode ls {all|osd|mon|mds}                list all nodes in cluster [type]osd blacklist add|rm <EntityAddr>        add (optionally until <expire> seconds {<float[0.0-]>}                          from now) or remove <addr> from                                          blacklistosd blacklist clear                      clear all blacklisted clientsosd blacklist ls                         show blacklisted clientsosd blocked-by                           print histogram of which OSDs are                                          blocking their peersosd create {<uuid>} {<int[0-]>}          create new osd (with optional UUID and                                          ID)osd crush add <osdname (id|osd.id)>      add or update crushmap position and <float[0.0-]> <args> [<args>...]         weight for <name> with <weight> and                                          location <args>osd crush add-bucket <name> <type>       add no-parent (probably root) crush                                          bucket <name> of type <type>osd crush create-or-move <osdname (id|   create entry or move existing entry osd.id)> <float[0.0-]> <args> [<args>..  for <name> <weight> at/to location .]                                       <args>osd crush dump                           dump crush maposd crush get-tunable straw_calc_version get crush tunable <tunable>osd crush link <name> <args> [<args>...] link existing entry for <name> under                                          location <args>osd crush move <name> <args> [<args>...] 
move existing entry for <name> to                                          location <args>osd crush remove <name> {<ancestor>}     remove <name> from crush map (                                          everywhere, or just at <ancestor>)osd crush rename-bucket <srcname>        rename bucket <srcname> to <dstname> <dstname>osd crush reweight <name> <float[0.0-]>  change <name>'s weight to <weight> in                                          crush maposd crush reweight-all                   recalculate the weights for the tree                                          to ensure they sum correctlyosd crush reweight-subtree <name>        change all leaf items beneath <name> <float[0.0-]>                            to <weight> in crush maposd crush rm <name> {<ancestor>}         remove <name> from crush map (                                          everywhere, or just at <ancestor>)osd crush rule create-erasure <name>     create crush rule <name> for erasure {<profile>}                              coded pool created with <profile> (                                          default default)osd crush rule create-simple <name>      create crush rule <name> to start from <root> <type> {firstn|indep}             <root>, replicate across buckets of                                          type <type>, using a choose mode of                                          <firstn|indep> (default firstn; indep                                          best for erasure pools)osd crush rule dump {<name>}             dump crush rule <name> (default all)osd crush rule list                      list crush rulesosd crush rule ls                        list crush rulesosd crush rule rm <name>                 remove crush rule <name>osd crush set                            set crush map from input fileosd crush set <osdname (id|osd.id)>      update crushmap position and weight <float[0.0-]> <args> [<args>...]         for <name> to <weight> with location                                          <args>osd crush set-tunable straw_calc_        set crush tunable <tunable> to <value> version <int>osd crush show-tunables                  show current crush tunablesosd crush tree                           dump crush buckets and items in a tree                                          viewosd crush tunables legacy|argonaut|      set crush tunables values to <profile> bobtail|firefly|hammer|jewel|optimal| defaultosd crush unlink <name> {<ancestor>}     unlink <name> from crush map (                                          everywhere, or just at <ancestor>)osd deep-scrub <who>                     initiate deep scrub on osd <who>osd df {plain|tree}                      show OSD utilizationosd down <ids> [<ids>...]                set osd(s) <id> [<id>...] downosd dump {<int[0-]>}                     print summary of OSD maposd erasure-code-profile get <name>      get erasure code profile <name>osd erasure-code-profile ls              list all erasure code profilesosd erasure-code-profile rm <name>       remove erasure code profile <name>osd erasure-code-profile set <name>      create erasure code profile <name> {<profile> [<profile>...]}               with [<key[=value]> ...] pairs. 
Add a                                          --force at the end to override an                                          existing profile (VERY DANGEROUS)osd find <int[0-]>                       find osd <id> in the CRUSH map and                                          show its locationosd getcrushmap {<int[0-]>}              get CRUSH maposd getmap {<int[0-]>}                   get OSD maposd getmaxosd                            show largest OSD idosd in <ids> [<ids>...]                  set osd(s) <id> [<id>...] inosd lost <int[0-]> {--yes-i-really-mean- mark osd as permanently lost. THIS it}                                      DESTROYS DATA IF NO MORE REPLICAS                                          EXIST, BE CAREFULosd ls {<int[0-]>}                       show all OSD idsosd lspools {<int>}                      list poolsosd map <poolname> <objectname>          find pg for <object> in <pool> with {<nspace>}                               [namespace]osd metadata {<int[0-]>}                 fetch metadata for osd {id} (default                                          all)osd out <ids> [<ids>...]                 set osd(s) <id> [<id>...] outosd pause                                pause osdosd perf                                 print dump of OSD perf summary statsosd pg-temp <pgid> {<id> [<id>...]}      set pg_temp mapping pgid:[<id> [<id>...                                          ]] (developers only)osd pool create <poolname> <int[0-]>     create pool {<int[0-]>} {replicated|erasure} {<erasure_code_profile>} {<ruleset>} {<int>}osd pool delete <poolname> {<poolname>}  delete pool {--yes-i-really-really-mean-it}osd pool get <poolname> size|min_size|   get pool parameter <var> crash_replay_interval|pg_num|pgp_num| crush_ruleset|hashpspool|nodelete| nopgchange|nosizechange|write_fadvise_ dontneed|noscrub|nodeep-scrub|hit_set_ type|hit_set_period|hit_set_count|hit_ set_fpp|auid|target_max_objects|target_ max_bytes|cache_target_dirty_ratio| cache_target_dirty_high_ratio|cache_ target_full_ratio|cache_min_flush_age| cache_min_evict_age|erasure_code_ profile|min_read_recency_for_promote| all|min_write_recency_for_promote|fast_ read|hit_set_grade_decay_rate|hit_set_ search_last_n|scrub_min_interval|scrub_ max_interval|deep_scrub_interval| recovery_priority|recovery_op_priority| scrub_priorityosd pool get-quota <poolname>            obtain object or byte limits for poolosd pool ls {detail}                     list poolsosd pool mksnap <poolname> <snap>        make snapshot <snap> in <pool>osd pool rename <poolname> <poolname>    rename <srcpool> to <destpool>osd pool rm <poolname> {<poolname>} {--  remove pool yes-i-really-really-mean-it}osd pool rmsnap <poolname> <snap>        remove snapshot <snap> from <pool>osd pool set <poolname> size|min_size|   set pool parameter <var> to <val> crash_replay_interval|pg_num|pgp_num| crush_ruleset|hashpspool|nodelete| nopgchange|nosizechange|write_fadvise_ dontneed|noscrub|nodeep-scrub|hit_set_ type|hit_set_period|hit_set_count|hit_ set_fpp|use_gmt_hitset|debug_fake_ec_ pool|target_max_bytes|target_max_ objects|cache_target_dirty_ratio|cache_ target_dirty_high_ratio|cache_target_ full_ratio|cache_min_flush_age|cache_ min_evict_age|auid|min_read_recency_ for_promote|min_write_recency_for_ promote|fast_read|hit_set_grade_decay_ rate|hit_set_search_last_n|scrub_min_ interval|scrub_max_interval|deep_scrub_ interval|recovery_priority|recovery_op_ priority|scrub_priority <val> {--yes-i- really-mean-it}osd pool set-quota <poolname> max_       set object or 
byte limit on pool objects|max_bytes <val>osd pool stats {<name>}                  obtain stats from all pools, or from                                          specified poolosd primary-affinity <osdname (id|osd.   adjust osd primary-affinity from 0.0 <= id)> <float[0.0-1.0]>                     <weight> <= 1.0osd primary-temp <pgid> <id>             set primary_temp mapping pgid:<id>|-1 (                                          developers only)osd repair <who>                         initiate repair on osd <who>osd reweight <int[0-]> <float[0.0-1.0]>  reweight osd to 0.0 < <weight> < 1.0osd reweight-by-pg {<int>} {<float>}     reweight OSDs by PG distribution {<int>} {<poolname> [<poolname>...]}     [overload-percentage-for-                                          consideration, default 120]osd reweight-by-utilization {<int>}      reweight OSDs by utilization [overload- {<float>} {<int>} {--no-increasing}      percentage-for-consideration, default                                          120]osd rm <ids> [<ids>...]                  remove osd(s) <id> [<id>...] inosd scrub <who>                          initiate scrub on osd <who>osd set full|pause|noup|nodown|noout|    set <key> noin|nobackfill|norebalance|norecover| noscrub|nodeep-scrub|notieragent| sortbitwiseosd setcrushmap                          set crush map from input fileosd setmaxosd <int[0-]>                  set new maximum osd valueosd stat                                 print summary of OSD maposd test-reweight-by-pg {<int>}          dry run of reweight OSDs by PG {<float>} {<int>} {<poolname>            distribution [overload-percentage-for- [<poolname>...]}                         consideration, default 120]osd test-reweight-by-utilization         dry run of reweight OSDs by {<int>} {<float>} {<int>} {--no-         utilization [overload-percentage-for- increasing}                              consideration, default 120]osd thrash <int[0-]>                     thrash OSDs for <num_epochs>osd tier add <poolname> <poolname> {--   add the tier <tierpool> (the second force-nonempty}                          one) to base pool <pool> (the first                                          one)osd tier add-cache <poolname>            add a cache <tierpool> (the second one) <poolname> <int[0-]>                     of size <size> to existing pool                                          <pool> (the first one)osd tier cache-mode <poolname> none|     specify the caching mode for cache writeback|forward|readonly|readforward|  tier <pool> proxy|readproxy {--yes-i-really-mean- it}osd tier remove <poolname> <poolname>    remove the tier <tierpool> (the second                                          one) from base pool <pool> (the first                                          one)osd tier remove-overlay <poolname>       remove the overlay pool for base pool                                          <pool>osd tier rm <poolname> <poolname>        remove the tier <tierpool> (the second                                          one) from base pool <pool> (the first                                          one)osd tier rm-overlay <poolname>           remove the overlay pool for base pool                                          <pool>osd tier set-overlay <poolname>          set the overlay pool for base pool <poolname>                               <pool> to be <overlaypool>osd tree {<int[0-]>}                     print OSD treeosd unpause                              unpause osdosd unset full|pause|noup|nodown|noout|  unset <key> 
noin|nobackfill|norebalance|norecover| noscrub|nodeep-scrub|notieragent| sortbitwiseosd utilization                          get basic pg distribution statspg debug unfound_objects_exist|degraded_ show debug info about pgs pgs_existpg deep-scrub <pgid>                     start deep-scrub on <pgid>pg dump {all|summary|sum|delta|pools|    show human-readable versions of pg map osds|pgs|pgs_brief [all|summary|sum|     (only 'all' valid with plain) delta|pools|osds|pgs|pgs_brief...]}pg dump_json {all|summary|sum|pools|     show human-readable version of pg map osds|pgs [all|summary|sum|pools|osds|    in json only pgs...]}pg dump_pools_json                       show pg pools info in json onlypg dump_stuck {inactive|unclean|stale|   show information about stuck pgs undersized|degraded [inactive|unclean| stale|undersized|degraded...]} {<int>}pg force_create_pg <pgid>                force creation of pg <pgid>pg getmap                                get binary pg map to -o/stdoutpg ls {<int>} {active|clean|down|replay| list pg with specific pool, osd, state splitting|scrubbing|scrubq|degraded| inconsistent|peering|repair|recovering| backfill_wait|incomplete|stale| remapped|deep_scrub|backfill|backfill_ toofull|recovery_wait|undersized| activating|peered [active|clean|down| replay|splitting|scrubbing|scrubq| degraded|inconsistent|peering|repair| recovering|backfill_wait|incomplete| stale|remapped|deep_scrub|backfill| backfill_toofull|recovery_wait| undersized|activating|peered...]}pg ls-by-osd <osdname (id|osd.id)>       list pg on osd [osd] {<int>} {active|clean|down|replay| splitting|scrubbing|scrubq|degraded| inconsistent|peering|repair|recovering| backfill_wait|incomplete|stale| remapped|deep_scrub|backfill|backfill_ toofull|recovery_wait|undersized| activating|peered [active|clean|down| replay|splitting|scrubbing|scrubq| degraded|inconsistent|peering|repair| recovering|backfill_wait|incomplete| stale|remapped|deep_scrub|backfill| backfill_toofull|recovery_wait| undersized|activating|peered...]}pg ls-by-pool <poolstr> {active|clean|   list pg with pool = [poolname | poolid] down|replay|splitting|scrubbing|scrubq| degraded|inconsistent|peering|repair| recovering|backfill_wait|incomplete| stale|remapped|deep_scrub|backfill| backfill_toofull|recovery_wait| undersized|activating|peered [active| clean|down|replay|splitting|scrubbing| scrubq|degraded|inconsistent|peering| repair|recovering|backfill_wait| incomplete|stale|remapped|deep_scrub| backfill|backfill_toofull|recovery_ wait|undersized|activating|peered...]}pg ls-by-primary <osdname (id|osd.id)>   list pg with primary = [osd] {<int>} {active|clean|down|replay| splitting|scrubbing|scrubq|degraded| inconsistent|peering|repair|recovering| backfill_wait|incomplete|stale| remapped|deep_scrub|backfill|backfill_ toofull|recovery_wait|undersized| activating|peered [active|clean|down| replay|splitting|scrubbing|scrubq| degraded|inconsistent|peering|repair| recovering|backfill_wait|incomplete| stale|remapped|deep_scrub|backfill| backfill_toofull|recovery_wait| undersized|activating|peered...]}pg map <pgid>                            show mapping of pg to osdspg repair <pgid>                         start repair on <pgid>pg scrub <pgid>                          start scrub on <pgid>pg send_pg_creates                       trigger pg creates to be issuedpg set_full_ratio <float[0.0-1.0]>       set ratio at which pgs are considered                                          fullpg set_nearfull_ratio <float[0.0-1.0]>   set ratio at which pgs are considered  
                                        nearly fullpg stat                                  show placement group status.quorum enter|exit                        enter or exit quorumquorum_status                            report status of monitor quorumreport {<tags> [<tags>...]}              report full status of cluster,                                          optional title tag stringsscrub                                    scrub the monitor stores (DEPRECATED)status                                   show cluster statussync force {--yes-i-really-mean-it} {--  force sync of and clear monitor store ( i-know-what-i-am-doing}                  DEPRECATED)tell <name (type.id)> <args> [<args>...] send a command to a specific daemonversion                                  show mon daemon version

FAQ

HEALTH_ERR 64 pgs are stuck inactive for more than 300 seconds; 64 pgs stuck inactive

This can be caused by many different problems; check the logs on each of the other nodes to narrow it down: cat /var/log/ceph/ceph*
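
A sketch of where to start digging (these subcommands appear in the ceph -h output above):

ceph health detail           # which PGs are stuck and why
ceph pg dump_stuck inactive  # list the stuck PGs
ceph osd tree                # look for down/out OSDs that the PGs map to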

ERROR: osd init failed: (36) File name too long

The file name is too long; this is usually caused by the underlying filesystem (check with df -T). XFS or Btrfs is generally recommended, but most systems default to ext4, where file names cannot exceed 1024 bytes.
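
A quick check, plus the ext4 workaround described in the Jewel docs (the OSD path below is an example; verify the option names against your release):

df -T /var/lib/ceph/osd/ceph-0   # which filesystem backs the OSD data dir
# For ext4, object names can be shortened in ceph.conf:
# osd_max_object_name_len = 256
# osd_max_object_namespace_len = 64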