Summary of Ceph problems encountered

1 mons are allowing insecure global_id reclaim

ceph -s
  cluster:
    id:     4f706b80-04bc-495d-9a34-f8dec17c96ae
    health: HEALTH_WARN
            mons are allowing insecure global_id reclaim

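Solution
Disable the insecure mode (do this once all clients and daemons have been upgraded to a patched release, since clients that are not yet upgraded may otherwise fail to reconnect):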
[root@cephnode01 my-cluster]# ceph config set mon auth_allow_insecure_global_id_reclaim false
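
Before disabling the option, it can be worth checking whether any connected client still relies on insecure reclaim. The following is a minimal sketch, assuming the health codes AUTH_INSECURE_GLOBAL_ID_RECLAIM (clients still using insecure reclaim) and AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED (the warning shown above) appear in the output of ceph health detail. The function name safe_to_disable_reclaim is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `ceph health detail` text on stdin.
# Succeeds (exit 0) when no client is reported as still using insecure
# global_id reclaim, fails (exit 1) otherwise.
safe_to_disable_reclaim() {
    # Match the bare health code only, not the ..._ALLOWED variant.
    if grep -Eq 'AUTH_INSECURE_GLOBAL_ID_RECLAIM([^_]|$)'; then
        return 1
    fi
    return 0
}

# Intended use on a cluster node (not executed here):
#   ceph health detail | safe_to_disable_reclaim \
#       && ceph config set mon auth_allow_insecure_global_id_reclaim false
```

The check only looks at the health report at that moment, so it is a hint rather than a guarantee that no old client will connect later.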

2 Ceph: Unknown lvalue 'TasksMax' in section 'Service'

Unknown lvalue 'LockPersonality' in section 'Service'
Unknown lvalue 'MemoryDenyWriteExecute' in section 'Service'
Unknown lvalue 'ProtectControlGroups' in section 'Service'
Unknown lvalue 'ProtectKernelModules' in section 'Service'
Unknown lvalue 'ProtectKernelTunables' in section 'Service'
Unknown lvalue 'LockPersonality' in section 'Service'
Unknown lvalue 'MemoryDenyWriteExecute' in section 'Service'
Unknown lvalue 'ProtectControlGroups' in section 'Service'
Unknown lvalue 'ProtectKernelModules' in section 'Service'
Unknown lvalue 'ProtectKernelTunables' in section 'Service'

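Solution
The installed systemd does not recognise these directives in the Ceph unit files. Update the systemd packages, then restart the Ceph services: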
# sudo yum install systemd-*

# systemctl restart ceph\*.service
# systemctl status ceph\*.service
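
The messages quoted at the start of this section repeat the same directives several times. To list each rejected directive once, before and after the update, a filter along these lines may help. It is a sketch only: it assumes the messages can be read from the journal with journalctl -b, and the function name unknown_lvalues is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads log text on stdin and prints each distinct
# "section directive" pair that was reported as an unknown lvalue.
unknown_lvalues() {
    sed -n "s/.*Unknown lvalue '\([^']*\)' in section '\([^']*\)'.*/\2 \1/p" |
        LC_ALL=C sort -u
}

# Intended use (not executed here):
#   journalctl -b | unknown_lvalues
#   systemctl --version | head -n 1    # shows the installed systemd version
```

If the filter prints nothing after the restart, the unit files no longer contain directives that the installed systemd rejects.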

3 Ceph cluster shows the warning "XXX daemons have recently crashed"

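Solution:

One or more Ceph daemons have crashed recently, and the crash has not yet been archived (acknowledged) by an administrator. This may indicate a software bug, a hardware problem (for example, a failing disk), or some other problem.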

All crashes recorded in the system, new and archived, can be listed with:

[root@cephnode01 ~]# ceph crash ls
ID                                                               ENTITY         NEW 
2021-12-09_18:19:41.114777Z_32eeed54-83ed-4057-af2b-ebca01418496 mon.cephnode01  *  
2021-12-09_18:23:08.157142Z_ae94f8e9-61f4-447f-8f04-f86e309a41e9 mon.cephnode01  *  
2021-12-09_18:26:44.200935Z_e182a979-b644-442d-ba2c-ae8e5efc0b2c mon.cephnode01  *  
2021-12-09_18:31:56.303885Z_124c25bd-f563-46e5-9cb9-ad4feb8d5871 mon.cephnode01  *  
2021-12-09_18:32:09.589732Z_03bc8120-6f92-4e0c-8cc2-028c2d045400 mon.cephnode01  *  
2021-12-09_18:32:22.812347Z_2af9b1c0-495d-4f08-ab34-d7917ac5fd38 mon.cephnode01  *  
2021-12-13_10:02:05.658126Z_d514c2b5-429a-45e6-bad6-6ccd6c77bb32 mon.cephnode01  *  
2021-12-13_10:02:18.814873Z_3a8b70c7-b757-4464-af22-e8f2c1b51849 mon.cephnode01  *  
2021-12-13_10:02:32.090272Z_326df2a3-1741-435e-982c-63bc744a95de mon.cephnode01  *  
2021-12-13_10:02:45.302563Z_57763e3f-3985-4b06-9887-c5d1f05246a5 mon.cephnode01  *  

New crashes can be listed with:

# ceph crash ls-new
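
When the list is long, a per-daemon count shows at a glance where the crashes come from. A minimal sketch, assuming the three-column layout (ID, ENTITY, NEW) shown in the listing above; the function name crashes_per_entity is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `ceph crash ls-new` output on stdin and prints
# "entity count" for every daemon that has unarchived crashes.
crashes_per_entity() {
    awk '$1 != "ID" && NF >= 2 { count[$2]++ }
         END { for (e in count) print e, count[e] }' | LC_ALL=C sort
}

# Intended use (not executed here):
#   ceph crash ls-new | crashes_per_entity
```

For the ten entries in the listing above this prints mon.cephnode01 10.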

Information about a specific crash can be examined with:

# ceph crash info <crash-id>

### Example ###

[root@cephnode01 ~]# ceph crash info 2021-12-09_18:19:41.114777Z_32eeed54-83ed-4057-af2b-ebca01418496
{
    "os_version_id": "7", 
    "assert_condition": "abort", 
    "utsname_release": "3.10.0-693.el7.x86_64", 
    "os_name": "CentOS Linux", 
    "entity_name": "mon.cephnode01", 
    "assert_file": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.22/rpm/el7/BUILD/ceph-14.2.22/src/mon/MonitorDBStore.h", 
    "timestamp": "2021-12-09 18:19:41.114777Z", 
    "process_name": "ceph-mon", 
    "utsname_machine": "x86_64", 
    "utsname_sysname": "Linux", 
    "os_version": "7 (Core)", 
    "os_id": "centos", 
    "assert_thread_name": "safe_timer", 
    "utsname_version": "#1 SMP Tue Aug 22 21:09:27 UTC 2017", 
    "backtrace": [
        "(()+0xf5e0) [0x7fca0b7ba5e0]", 
        "(gsignal()+0x37) [0x7fca0a7cd1f7]", 
        "(abort()+0x148) [0x7fca0a7ce8e8]", 
        "(ceph::__ceph_abort(char const*, int, char const*, std::string const&)+0x1a5) [0x7fca0e9afb2e]", 
        "(MonitorDBStore::apply_transaction(std::shared_ptr<MonitorDBStore::Transaction>)+0xb75) [0x55765e569da5]", 
        "(Paxos::begin(ceph::buffer::v14_2_0::list&)+0x4ed) [0x55765e67fdfd]", 
        "(Paxos::propose_pending()+0x11e) [0x55765e68176e]", 
        "(Paxos::trigger_propose()+0x2fd) [0x55765e6840ad]", 
        "(PaxosService::propose_pending()+0x287) [0x55765e6896a7]", 
        "(()+0x35e910) [0x55765e689910]", 
        "(C_MonContext::finish(int)+0x39) [0x55765e56d989]", 
        "(Context::complete(int)+0x9) [0x55765e5a93a9]", 
        "(SafeTimer::timer_thread()+0x180) [0x7fca0ea82d80]", 
        "(SafeTimerThread::entry()+0xd) [0x7fca0ea845ed]", 
        "(()+0x7e25) [0x7fca0b7b2e25]", 
        "(clone()+0x6d) [0x7fca0a89034d]"
    ], 
    "utsname_hostname": "cephnode01", 
    "assert_msg": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.22/rpm/el7/BUILD/ceph-14.2.22/src/mon/MonitorDBStore.h: In function 'int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef)' thread 7fca02225700 time 2021-12-10 02:19:41.111554\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.22/rpm/el7/BUILD/ceph-14.2.22/src/mon/MonitorDBStore.h: 354: ceph_abort_msg(\"failed to write to db\")\n", 
    "crash_id": "2021-12-09_18:19:41.114777Z_32eeed54-83ed-4057-af2b-ebca01418496", 
    "assert_line": 354, 
    "ceph_version": "14.2.22", 
    "assert_func": "int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef)"
}

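This warning can be silenced by "archiving" the crash (perhaps after an administrator has examined it), so that it no longer generates the warning: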

# ceph crash archive <crash-id>
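
Before archiving, it can be useful to note a few key fields of the crash. A minimal sketch, assuming the pretty-printed layout of ceph crash info shown in the example above (one "key": "value" pair per line); it handles string values only, and the function name crash_string_field is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `ceph crash info` output on stdin and prints the
# string value of the top-level key given as $1. Assumes one pair per line.
crash_string_field() {
    key=$1
    sed -n "s/^ *\"$key\": \"\(.*\)\",\{0,1\} *\$/\1/p"
}

# Intended use (not executed here), with <crash-id> filled in:
#   ceph crash info <crash-id> | crash_string_field entity_name
#   ceph crash info <crash-id> | crash_string_field assert_func
#   ceph crash archive <crash-id>
```

For the example crash above, the entity_name field is mon.cephnode01 and the ceph_version field is 14.2.22.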

Similarly, all new crashes can be archived with:

# ceph crash archive-all
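
Archiving everything at once also hides crashes that nobody has looked at. A more selective variant is to archive only the crashes of one daemon. A sketch under the same assumption about the ceph crash ls-new column layout; the function name crash_ids_for_entity is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `ceph crash ls-new` output on stdin and prints
# the IDs of the crashes that belong to the entity given as $1.
crash_ids_for_entity() {
    awk -v entity="$1" '$1 != "ID" && $2 == entity { print $1 }'
}

# Intended use (not executed here):
#   ceph crash ls-new | crash_ids_for_entity mon.cephnode01 |
#       while read -r id; do ceph crash archive "$id"; done
```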

Archived crashes remain visible via ceph crash ls, but not via ceph crash ls-new.

The time period that "recent" refers to is controlled by the option mgr/crash/warn_recent_interval (default: two weeks).

These warnings can be disabled entirely with:

# ceph config set mgr mgr/crash/warn_recent_interval 0
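
Setting the interval to 0 hides every crash warning. A less drastic option is to shorten the window instead. The sketch below assumes the option is expressed in seconds, so that the two-week default corresponds to 1209600; the function name days_to_seconds is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: converts a whole number of days to seconds.
days_to_seconds() {
    echo $(( $1 * 24 * 60 * 60 ))
}

# Intended use (not executed here): warn only about crashes from the last 3 days.
#   ceph config set mgr mgr/crash/warn_recent_interval "$(days_to_seconds 3)"
```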

References:

https://docs.ceph.com/docs/master/rados/operations/health-checks/?highlight=backfillfull%20ratio
https://docs.ceph.com/docs/master/mgr/crash/?highlight=crash