@zerox
There is a backup; for now I'm holding out until the evening. I can't stop it — a terminal server and SQL are running on it.
From the errors:
Aug 05 07:36:46 pve1 corosync[2132]: [MAIN ] Corosync Cluster Engine ('2.4.4-dirty'): started and ready to provide service.
Aug 05 07:36:46 pve1 corosync[2132]: [MAIN ] Corosync built-in features: dbus rdma monitoring watchdog systemd xmlconf qdevices qnetd snmp pie relro bindnow
Aug 05 07:36:46 pve1 corosync[2132]: notice [MAIN ] Corosync Cluster Engine ('2.4.4-dirty'): started and ready to provide service.
Aug 05 07:36:46 pve1 corosync[2132]: info [MAIN ] Corosync built-in features: dbus rdma monitoring watchdog systemd xmlconf qdevices qnetd snmp pie relro bindnow
Aug 05 07:36:46 pve1 corosync[2132]: warning [MAIN ] interface section bindnetaddr is used together with nodelist. Nodelist one is going to be used.
Aug 05 07:36:46 pve1 corosync[2132]: warning [MAIN ] Please migrate config file to nodelist.
Aug 05 07:36:46 pve1 corosync[2132]: [MAIN ] interface section bindnetaddr is used together with nodelist. Nodelist one is going to be used.
Aug 05 07:36:46 pve1 corosync[2132]: [MAIN ] Please migrate config file to nodelist.
Aug 05 07:36:46 pve1 corosync[2132]: notice [TOTEM ] Initializing transport (UDP/IP Multicast).
Aug 05 07:36:46 pve1 corosync[2132]: notice [TOTEM ] Initializing transmit/receive security (NSS) crypto: aes256 hash: sha1
Aug 05 07:36:52 pve1 corosync[2132]: [CPG ] downlist left_list: 0 received
Aug 05 07:36:52 pve1 corosync[2132]: [CPG ] downlist left_list: 0 received
Aug 05 07:36:52 pve1 corosync[2132]: warning [CPG ] downlist left_list: 0 received
Aug 05 07:36:52 pve1 pmxcfs[1915]: [dcdb] notice: members: 1/1915, 2/1947
Aug 05 07:36:52 pve1 pmxcfs[1915]: [dcdb] notice: starting data syncronisation
Aug 05 07:36:52 pve1 corosync[2132]: notice [QUORUM] This node is within the primary component and will provide service.
Aug 05 07:36:52 pve1 corosync[2132]: notice [QUORUM] Members[3]: 3 2 1
Aug 05 07:36:52 pve1 corosync[2132]: notice [MAIN ] Completed service synchronization, ready to provide service.
Aug 05 07:36:52 pve1 corosync[2132]: [QUORUM] This node is within the primary component and will provide service.
Aug 05 07:36:52 pve1 corosync[2132]: [QUORUM] Members[3]: 3 2 1
Aug 05 07:36:52 pve1 corosync[2132]: [MAIN ] Completed service synchronization, ready to provide service.
Aug 05 07:36:52 pve1 systemd[1]: apt-daily.timer: Adding 53min 51.191936s random time.
Aug 05 07:36:52 pve1 systemd[1]: pve-daily-update.timer: Adding 1h 29min 30.566559s random time.
Aug 05 07:38:59 pve1 systemd-udevd[4495]: Could not generate persistent MAC address for fwbr103i0: No such file or directory
Aug 05 07:38:59 pve1 systemd-udevd[4521]: Could not generate persistent MAC address for fwpr103p0: No such file or directory
Aug 05 07:38:59 pve1 systemd-udevd[4519]: Could not generate persistent MAC address for fwln103i0: No such file or directory
Aug 05 07:38:59 pve1 kernel: fwbr103i0: port 1(fwln103i0) entered blocking state
Aug 05 07:38:59 pve1 kernel: fwbr103i0: port 1(fwln103i0) entered disabled state
Aug 05 07:38:59 pve1 kernel: device fwln103i0 entered promiscuous mode
Aug 05 07:38:59 pve1 kernel: fwbr103i0: port 1(fwln103i0) entered blocking state
Aug 05 07:38:59 pve1 kernel: fwbr103i0: port 1(fwln103i0) entered forwarding state
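The "interface section bindnetaddr is used together with nodelist" warning above is harmless but easy to silence by dropping bindnetaddr and keeping only the nodelist. A minimal corosync.conf sketch (node names, IPs and the cluster name here are assumptions — only pve3's address appears in the logs below, so adjust to the real cluster):

```
totem {
  version: 2
  cluster_name: pve-cluster     # assumed name
  config_version: 4
  transport: udp                # multicast, as in the log
  interface {
    ringnumber: 0               # no bindnetaddr here
  }
}

nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.11.100.111   # assumed address
  }
  node {
    name: pve2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.11.100.112   # assumed address
  }
  node {
    name: pve3
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.11.100.113   # from the Ceph log
  }
}

quorum {
  provider: corosync_votequorum
}
```

Bump config_version when editing, and note this warning by itself does not break anything — corosync already uses the nodelist.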
It feels like when I start the third node, the network on the first two falls over; so far I don't see the logical connection.
The servers are SuperMicro X10DRI-T. The two onboard 10 GbE ports connect the nodes directly over twisted pair, without a switch; the copper cable is terminated for 10 Gbit and the link indicators look fine. There is also an add-in NIC with two more 10 GbE ports, and those carry the LAN access. My feeling is that the trouble is exactly here.......... With the nodes up, the Ceph monitors show traffic flowing to all nodes with minimal latency, and the VMs show as running, but there is simply no connectivity.
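Since the log shows corosync on UDP multicast ("Initializing transport (UDP/IP Multicast)"), it's worth proving multicast actually works across those direct 10G links before blaming anything else. A sketch, run at the same time on all three nodes (the hostnames and interface name are assumptions):

```
# run simultaneously on pve1, pve2, pve3 (needs the omping package)
omping -c 10000 -i 0.001 -F -q pve1 pve2 pve3

# longer run (~10 minutes) to catch multicast dying after a timeout
omping -c 600 -i 1 -q pve1 pve2 pve3

# and check the direct 10G ports for negotiated speed and errors
ethtool eno1 | grep -E 'Speed|Link'   # interface name is an assumption
ip -s link show eno1                  # look for RX/TX errors and drops
```

If multicast loss jumps exactly when the third node comes up, that matches the symptom of the first two nodes "losing the network".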
Maybe I should just re-map the network cards.
I don't want to tear everything apart, but it looks like I'll have to.
Part of the Ceph log:
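If "re-mapping the network cards" means pinning the interface names so the onboard and add-in ports can't swap roles, a systemd .link file does that without rewiring anything. A sketch (the MAC address and the name are placeholders):

```
# /etc/systemd/network/10-ceph0.link -- pin one 10G port by its MAC
[Match]
MACAddress=aa:bb:cc:dd:ee:ff   # placeholder: take the real MAC from `ip link`

[Link]
Name=ceph0                     # assumed name for the direct Ceph/corosync link
```

After that, rename the interface in /etc/network/interfaces to match and reboot; since udev renames interfaces from the initramfs, running update-initramfs -u first is the safe order.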
2019-08-05 06:25:03.130197 mon.pve3 mon.0 10.11.100.113:6789/0 28339 : cluster [ERR] Health check update: 17619 stuck requests are blocked > 4096 sec. Implicated osds 5,6,7,8,10,11 (REQUEST_STUCK)
2019-08-05 06:25:08.130643 mon.pve3 mon.0 10.11.100.113:6789/0 28342 : cluster [WRN] Health check update: 833 slow requests are blocked > 32 sec. Implicated osds (REQUEST_SLOW)
2019-08-05 06:25:08.130692 mon.pve3 mon.0 10.11.100.113:6789/0 28343 : cluster [ERR] Health check update: 17621 stuck requests are blocked > 4096 sec. Implicated osds 5,6,7,8,10,11 (REQUEST_STUCK)
2019-08-05 06:24:58.312724 osd.11 osd.11 10.11.100.113:6806/3148 117343 : cluster [WRN] 9221 slow requests, 3 included below; oldest blocked for > 46119.805545 secs
2019-08-05 06:24:58.312728 osd.11 osd.11 10.11.100.113:6806/3148 117344 : cluster [WRN] slow request 481.366251 seconds old, received at 2019-08-05 06:16:56.946148: osd_op(client.4444178.0:111576 1.1e 1.324d7c1e (undecoded) ondisk+write+known_if_redirected e540) currently waiting for peered
2019-08-05 06:24:58.312730 osd.11 osd.11 10.11.100.113:6806/3148 117345 : cluster [WRN] slow request 3841.805615 seconds old, received at 2019-08-05 05:20:56.506784: osd_op(client.4444178.0:110904 1.1e 1.324d7c1e (undecoded) ondisk+write+known_if_redirected e540) currently waiting for peered
2019-08-05 06:24:58.312733 osd.11 osd.11 10.11.100.113:6806/3148 117346 : cluster [WRN] slow request 121.318428 seconds old, received at 2019-08-05 06:22:56.993971: osd_op(client.4444178.0:111648 1.1e 1.324d7c1e (undecoded) ondisk+write+known_if_redirected e540) currently waiting for peered
2019-08-05 06:24:59.308575 osd.11 osd.11 10.11.100.113:6806/3148 117347 : cluster [WRN] 9221 slow requests, 1 included below; oldest blocked for > 46120.800850 secs
2019-08-05 06:24:59.308579 osd.11 osd.11 10.11.100.113:6806/3148 117348 : cluster [WRN] slow request 30720.233581 seconds old, received at 2019-08-04 21:52:59.074122: osd_op(client.4444178.0:105529 1.1e 1.324d7c1e (undecoded) ondisk+write+known_if_redirected e540) currently waiting for peered
2019-08-05 06:25:00.067746 mgr.pve2 client.4445268 10.11.100.121:0/2803441908 22833 : cluster [DBG] pgmap v22835: 128 pgs: 45 active+clean, 83 peering; 302GiB data, 914GiB used, 1.73TiB / 2.62TiB avail
2019-08-05 06:25:00.304183 osd.11 osd.11 10.11.100.113:6806/3148 117349 : cluster [WRN] 9221 slow requests, 1 included below; oldest blocked for > 46121.796678 secs
2019-08-05 06:25:00.304186 osd.11 osd.11 10.11.100.113:6806/3148 117350 : cluster [WRN] slow request 15360.061926 seconds old, received at 2019-08-05 02:09:00.241606: osd_op(client.4444178.0:108601 1.1e 1.324d7c1e (undecoded) ondisk+write+known_if_redirected e540) currently waiting for peered
2019-08-05 06:25:01.300164 osd.11 osd.11 10.11.100.113:6806/3148 117351 : cluster [WRN] 9221 slow requests, 1 included below; oldest blocked for > 46122.792232 secs
2019-08-05 06:25:01.300169 osd.11 osd.11 10.11.100.113:6806/3148 117352 : cluster [WRN] slow request 7680.246071 seconds old, received at 2019-08-05 04:17:01.053015: osd_op(client.4444178.0:110137 1.1e 1.324d7c1e (undecoded) ondisk+write+known_if_redirected e540) currently waiting for peered
2019-08-05 06:25:02.087551 mgr.pve2 client.4445268 10.11.100.121:0/2803441908 22834 : cluster [DBG] pgmap v22836: 128 pgs: 45 active+clean, 83 peering; 302GiB data, 914GiB used, 1.73TiB / 2.62TiB avail
2019-08-05 06:25:02.296068 osd.11 osd.11 10.11.100.113:6806/3148 117353 : cluster [WRN] 9222 slow requests, 5 included below; oldest blocked for > 46123.788191 secs
Right now it only complains about the missing quorum and under-replicated data:
Degraded data redundancy: 77339/232017 objects degraded (33.333%), 128 pgs degraded, 128 pgs undersized
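The REQUEST_STUCK entries all show ops "currently waiting for peered", and the pgmap shows 83 of 128 PGs stuck peering — that points at the cluster network between the OSDs rather than at the disks. A few read-only commands that narrow it down (PG 1.1e is taken from the slow-request lines above):

```
ceph health detail            # exact PGs and OSDs affected
ceph pg dump_stuck inactive   # PGs stuck peering
ceph pg 1.1e query            # why this PG can't peer, and with whom
ceph osd tree                 # which OSDs sit on which host
```

If the stuck PGs all involve OSDs on one host (the log implicates osds 5,6,7,8,10,11), that again points at the links to that node.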
I'll hold out until the evening............. )))