2.13.4 Recover from prolonged node downtime or corrupted replication
- new server: follow the configuration steps detailed in Initial failover configuration.
- same server: follow the recovery steps below.
MySQL Multi-Master recovery prerequisites #
- The currently active IRP node is designated as the MySQL sync ‘origin’. It holds the reference configuration parameters and data, which will be synced to the node being recovered, designated as the ‘destination’.
- Recovery should be scheduled during non-peak hours.
- Recovery must finish before bgpd.db.timeout.withdraw (default 4h) expires. If recovery cannot be completed in time, MySQL must be started on the active node before the timeout expires.
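As a quick sanity check before scheduling, the configured withdraw timeout can be looked up on the active node. This is a hedged sketch: the path /etc/noction/irp.conf is an assumption about where the IRP configuration lives on your install.

```shell
# Sketch: confirm the withdraw timeout before starting recovery.
# The config file path is an assumption; adjust to your installation.
grep 'bgpd.db.timeout.withdraw' /etc/noction/irp.conf
```

If the parameter is not set explicitly, the default of 4 hours applies.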
MySQL Multi-Master recovery procedure #
- destination: stop irp, mysqld
- origin: sync /etc/noction/db.* to destination:/etc/noction/
- origin: sync /root/.my.cnf to destination:/root/.my.cnf
- origin: sync /var/lib/mysql/ to destination:/var/lib/mysql/
  exclude files: master.info relay-log.info -bin.* -relay.*
- wait until the /var/lib/mysql/ sync above succeeds, then continue with:
- origin: stop irp (except bgpd), mysqld
- origin: delete files master.info relay-log.info -bin.* -relay.*
- origin: sync /var/lib/mysql/ to destination:/var/lib/mysql/
- destination: start mysqld and check /var/log/mysqld.log for errors
- origin: start mysqld and check /var/log/mysqld.log for errors
- origin: run the CHANGE MASTER TO statement from the /usr/share/doc/irp/changemasterto template
- destination: run the CHANGE MASTER TO statement from the /usr/share/doc/irp/changemasterto template
- destination: run SHOW SLAVE STATUS \G and confirm replication is running without errors
- origin: run SHOW SLAVE STATUS \G and confirm replication is running without errors
- origin: start IRP (bgpd should already be running)
- destination: start IRP
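The sync steps above can be sketched with rsync. This is a non-authoritative sketch under several assumptions: the destination node is reachable over SSH as the hypothetical alias DEST, and the glob forms *-bin.* and *-relay.* are my reading of the -bin.* and -relay.* exclude patterns listed in the procedure (matching MySQL binary and relay log file names).

```shell
#!/bin/sh
# Sketch of the first-pass sync from origin to destination.
DEST=destination   # hypothetical SSH alias for the node being recovered

# DB configuration and MySQL credentials.
rsync -a /etc/noction/db.* "$DEST":/etc/noction/
rsync -a /root/.my.cnf "$DEST":/root/.my.cnf

# Data directory, excluding replication state and binary/relay logs,
# mirroring the exclude list in the procedure above.
rsync -a \
  --exclude 'master.info' \
  --exclude 'relay-log.info' \
  --exclude '*-bin.*' \
  --exclude '*-relay.*' \
  /var/lib/mysql/ "$DEST":/var/lib/mysql/
```

The trailing slash on /var/lib/mysql/ matters to rsync: it copies the directory's contents rather than creating a nested mysql/ directory on the destination.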