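Using Pacemaker to reduce timeout waits when the primary DNS server is down

[Goal]

During DNS resolution, clients query the primary DNS server first and the secondary DNS server second.

If bind on the primary DNS server goes down, each lookup first fails against the primary (that is, it waits for a timeout) and is only then resolved by the secondary.

In other words, the timeout wait degrades browsing responsiveness.

Note that because DNS uses 53/UDP, the timeout occurs whether only the process is stopped or the whole OS is down.

So lookups have to keep succeeding at the primary DNS server's IP address, and to achieve that I try a cluster configuration.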
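
How long that wait lasts depends on the client's resolver. As a rough sketch, assuming a glibc-style resolver where the per-server wait is set by "options timeout:N" in resolv.conf and defaults to 5 seconds, the helper below reports the wait a client sees before it moves on to the secondary. The function name first_server_wait is hypothetical.

```shell
# first_server_wait [FILE]
# Print the number of seconds a glibc-style resolver waits on an
# unresponsive nameserver before trying the next one.
# Assumption: "options timeout:N" in resolv.conf, default 5 when absent.
first_server_wait() {
    conf="${1:-/etc/resolv.conf}"
    # Take the last timeout value found on an "options" line, if any.
    t=$(sed -n 's/^options.*timeout:\([0-9][0-9]*\).*/\1/p' "$conf" | tail -n 1)
    echo "${t:-5}"
}
```

Running first_server_wait with no argument inspects /etc/resolv.conf on the client.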

Pacemaker and its related software need to be at matching versions.

Setup

Hardware

Primary: Raspberry Pi 3B (Raspberry Pi OS Lite 5.15.61-v7+ armv7l dns-pm01) 192.168.100.1

Secondary: Raspberry Pi 2B (Raspberry Pi OS Lite 5.15.61-v7+ armv7l dns-pm02) 192.168.100.2

Setup procedure

(1) Install Pacemaker

[dns-pm01] # apt-get install pacemaker pacemaker-cli-utils pcs

[dns-pm02] # apt-get install pacemaker pacemaker-cli-utils pcs

(2) Check the default account and set its password

[dns-pm01] # grep "hacluster" /etc/passwd

[dns-pm02] # grep "hacluster" /etc/passwd

[dns-pm01] # passwd hacluster

[dns-pm02] # passwd hacluster

(3) Register the hosts (nodes) in /etc/hosts

[dns-pm01] # vi /etc/hosts

[dns-pm02] # vi /etc/hosts

# Pacemaker Settings
192.168.100.1    dns-pm01
192.168.100.2    dns-pm02
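
To avoid hand-editing both nodes, the same entries can be appended with a small idempotent helper. This is only a sketch: the function name add_host_entry is hypothetical, and it assumes it is run as root on each node.

```shell
# add_host_entry FILE IP NAME
# Append "IP    NAME" to FILE unless a line already maps IP to NAME.
add_host_entry() {
    file="$1"; ip="$2"; name="$3"
    if awk -v ip="$ip" -v name="$name" '
            $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
            END { exit !found }
        ' "$file" 2>/dev/null; then
        return 0    # already present, nothing to do
    fi
    printf '%s    %s\n' "$ip" "$name" >> "$file"
}

# Example (run on dns-pm01 and dns-pm02):
#   add_host_entry /etc/hosts 192.168.100.1 dns-pm01
#   add_host_entry /etc/hosts 192.168.100.2 dns-pm02
```

Because the helper checks for an existing mapping first, running it twice leaves /etc/hosts unchanged.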

(4) Delete the initial node (package default) if one is registered

Right after installation on Raspberry Pi OS, a node named "node1" is registered.

# pcs status

Cluster name: debian

WARNINGS:
No stonith devices and stonith-enabled is not false

Cluster Summary:
  * Stack: corosync
  * Current DC: node1 (version 2.0.5-ba59be7122) - partition with quorum
  * Last updated: Mon Sep 26 04:30:34 2022
  * Last change:  Mon Sep 26 03:59:28 2022 by hacluster via crmd on node1
  * 1 node configured
  * 0 resource instances configured

Node List:
  * Online: [ node1 ]

Full List of Resources:
  * No resources

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

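Right after a default installation, the file /var/lib/pcsd/known-hosts does not exist.

Running "pcs host auth ..." from step (5) in this state still does not create /var/lib/pcsd/known-hosts.

If a cluster configuration such as "node1" exists, destroy the cluster first.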

[dns-pm01] # pcs cluster destroy

[dns-pm02] # pcs cluster destroy

Shutting down pacemaker/corosync services...
Killing any remaining services...
Removing all cluster configuration files...

(5) Register the hosts (nodes) with Pacemaker

[dns-pm01] # pcs host auth dns-pm01 dns-pm02 -u hacluster -p 'XXXXXXXX'

Running this on the master side only (one node) is enough.

dns-pm01: Authorized
dns-pm02: Authorized

Confirm that the file has been created.

[dns-pm01] # ls -l /var/lib/pcsd/known-hosts

(6) Add the hosts (nodes) to a Pacemaker cluster (group)

[dns-pm01] # pcs cluster setup clst-01 --start dns-pm01 addr=192.168.100.1 dns-pm02 addr=192.168.100.2

Destroying cluster on hosts: 'dns-pm01', 'dns-pm02'...
dns-pm01: Successfully destroyed cluster
dns-pm02: Successfully destroyed cluster
Requesting remove 'pcsd settings' from 'dns-pm01', 'dns-pm02'
dns-pm01: successful removal of the file 'pcsd settings'
dns-pm02: successful removal of the file 'pcsd settings'
Sending 'corosync authkey', 'pacemaker authkey' to 'dns-pm01', 'dns-pm02'
dns-pm01: successful distribution of the file 'corosync authkey'
dns-pm01: successful distribution of the file 'pacemaker authkey'
dns-pm02: successful distribution of the file 'corosync authkey'
dns-pm02: successful distribution of the file 'pacemaker authkey'
Sending 'corosync.conf' to 'dns-pm01', 'dns-pm02'
dns-pm01: successful distribution of the file 'corosync.conf'
dns-pm02: successful distribution of the file 'corosync.conf'
Cluster has been successfully set up.
Starting cluster on hosts: 'dns-pm01', 'dns-pm02'...

Check the status. Both nodes, dns-pm01 and dns-pm02, should be Online.

[dns-pm01] # pcs status

Cluster name: clst-01

WARNINGS:
No stonith devices and stonith-enabled is not false

Cluster Summary:
  * Stack: corosync
  * Current DC: dns-pm01 (version 2.0.5-ba59be7122) - partition with quorum
  * Last updated: Mon Sep 26 04:50:11 2022
  * Last change:  Mon Sep 26 04:49:48 2022 by hacluster via crmd on dns-pm01
  * 2 nodes configured
  * 0 resource instances configured

Node List:
  * Online: [ dns-pm01 dns-pm02 ]

Full List of Resources:
  * No resources

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
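
The same check can be scripted. A minimal sketch, assuming the "pcs status" text format shown above (the function name nodes_online is hypothetical): it reads the status text on standard input and succeeds only when every named node appears on the Online line.

```shell
# nodes_online NODE...
# Read "pcs status" output on stdin; return 0 only if every NODE is listed
# on the "* Online: [ ... ]" line.
nodes_online() {
    online=$(sed -n 's/^[[:space:]]*[*] Online: [[]\(.*\)[]].*/\1/p')
    for n in "$@"; do
        case " $online " in
            *" $n "*) ;;          # node is listed
            *) return 1 ;;        # node missing from the Online list
        esac
    done
    return 0
}

# Example:
#   pcs status | nodes_online dns-pm01 dns-pm02 && echo "both nodes online"
```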

(7) Change the global settings

[dns-pm01] # pcs property

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: clst-01
 dc-version: 2.0.5-ba59be7122
 have-watchdog: false

[dns-pm01] # pcs property set no-quorum-policy=ignore

[dns-pm01] # pcs property set stonith-enabled=false

[dns-pm01] # pcs property

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: clst-01
 dc-version: 2.0.5-ba59be7122
 have-watchdog: false
 no-quorum-policy: ignore
 stonith-enabled: false

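(8) Create named2 based on ocf:heartbeat:named

〇 Ideally this would be an Active/Active configuration

ocf:heartbeat:named provides start, stop, restart, and monitor functions.
With Active/Active, the VIP does not move to the node that is still running (the resources cannot be put in a group).
With Active/Standby, the process on the standby side is stopped.

→ Not yet confirmed whether this can be configured.

Pacemaker restarts a process when it stops.

→ It appears this restart can be disabled by configuration, though.

The implementation uses the following approach:

Use an Active/Standby configuration, because the VIP should move to the node that is still running.
Create named2 based on ocf:heartbeat:named (monitoring only).
ocf:heartbeat:named2 is made not to start, stop, or restart bind, so starting and stopping bind is left to systemd.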

[dns-pm01] # cd /usr/lib/ocf/resource.d/heartbeat

[dns-pm01] # cp -pi named named2

[dns-pm01] # vi named2

→ In the case statement, make the start/stop/reload branches finish with exit 0.

--- named       2020-12-14 22:34:09.000000000 +0900
+++ named2      2022-09-26 05:55:08.611384786 +0900
@@ -500,13 +500,12 @@
    monitor)    named_monitor
                exit $?;;

-    start)      named_start
-                exit $?;;
+    start)      exit 0;;
+
+    stop)       exit 0;;
+
+    reload)     exit 0;;

-    stop)       named_stop
-                exit $?;;
-    reload)     named_reload
-                exit $?;;
     *)
                 exit $OCF_ERR_UNIMPLEMENTED;;
 esac
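
The effect of the patch can be seen in isolation with a stripped-down dispatcher. This is only an illustration of the control flow, not the real agent: agent_dispatch and check_named are hypothetical names, and check_named stands in for the original named_monitor under the assumption that "healthy" simply means a named process exists. The values 7 and 3 are the OCF return codes OCF_NOT_RUNNING and OCF_ERR_UNIMPLEMENTED.

```shell
OCF_SUCCESS=0
OCF_ERR_UNIMPLEMENTED=3
OCF_NOT_RUNNING=7

# Stand-in for named_monitor (assumption: a running named process is enough).
check_named() {
    pgrep -x named >/dev/null 2>&1
}

# agent_dispatch ACTION
# monitor is the only action that does real work; start/stop/reload report
# success without touching bind, which stays under systemd control.
agent_dispatch() {
    case "$1" in
        monitor)
            if check_named; then return "$OCF_SUCCESS"; fi
            return "$OCF_NOT_RUNNING" ;;
        start|stop|reload)
            return "$OCF_SUCCESS" ;;
        *)
            return "$OCF_ERR_UNIMPLEMENTED" ;;
    esac
}
```

Since start always reports success, Pacemaker cannot bring bind back by itself; recovery depends on systemd (or an operator) restarting the service.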

Copy it to dns-pm02.

[dns-pm02] # scp xxxx@dns-pm01:/usr/lib/ocf/resource.d/heartbeat/named2 /usr/lib/ocf/resource.d/heartbeat/named2

[dns-pm02] # ls -l /usr/lib/ocf/resource.d/heartbeat/named2

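(9) Register the resource configuration

Configuration file: /var/lib/pacemaker/cib/cib.xml

※ Editing the configuration file directly is apparently not the accepted practice, so the configuration is registered with the pcs command.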

[dns-pm01] # pcs resource create rs-vip-1 ocf:heartbeat:IPaddr2 ip=192.168.100.11 cidr_netmask=24 nic=eth0 --group rg-01

[dns-pm01] # pcs resource create rs-vip-2 ocf:heartbeat:IPaddr2 ip=192.168.100.12 cidr_netmask=24 nic=eth0 --group rg-02

[dns-pm01] # pcs resource create monitor-named1 ocf:heartbeat:named2 monitor_request="localhost" named_user="bind" named="/usr/sbin/named" --group rg-01 op monitor interval=10s

[dns-pm01] # pcs resource create monitor-named2 ocf:heartbeat:named2 monitor_request="localhost" named_user="bind" named="/usr/sbin/named" --group rg-02 op monitor interval=10s

[dns-pm01] # pcs constraint location rs-vip-1 prefers dns-pm01=200

[dns-pm01] # pcs constraint location rs-vip-1 prefers dns-pm02=100

[dns-pm01] # pcs constraint location rs-vip-2 prefers dns-pm01=100

[dns-pm01] # pcs constraint location rs-vip-2 prefers dns-pm02=200

[dns-pm01] # pcs status

 Cluster name: clst-01
 Cluster Summary:
   * Stack: corosync
   * Current DC: dns-pm01 (version 2.0.5-ba59be7122) - partition with quorum
   * Last updated: Mon Sep 26 21:29:02 2022
   * Last change:  Mon Sep 26 11:33:23 2022 by root via cibadmin on dns-pm01
   * 2 nodes configured
   * 4 resource instances configured
 
 Node List:
   * Online: [ dns-pm01 dns-pm02 ]
 
 Full List of Resources:
   * Resource Group: rg-01:
     * rs-vip-1  (ocf::heartbeat:IPaddr2):        Started dns-pm01
     * monitor-named1    (ocf::heartbeat:named2):         Started dns-pm01
   * Resource Group: rg-02:
     * rs-vip-2  (ocf::heartbeat:IPaddr2):        Started dns-pm02
     * monitor-named2    (ocf::heartbeat:named2):         Started dns-pm02
 
 Daemon Status:
   corosync: active/disabled
   pacemaker: active/disabled
   pcsd: active/enabled

[dns-pm01] # pcs constraint location

The node with the higher score is preferred.

Location Constraints:
  Resource: rs-vip-1
    Enabled on:
      Node: dns-pm01 (score:200)
      Node: dns-pm02 (score:100)
  Resource: rs-vip-2
    Enabled on:
      Node: dns-pm01 (score:100)
      Node: dns-pm02 (score:200)
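
To read the preferred node straight from that listing, the scores can be compared per resource. A rough sketch, assuming the output layout shown above and numeric scores only (INFINITY is not handled); preferred_nodes is a hypothetical name.

```shell
# preferred_nodes
# Read "pcs constraint location" output on stdin and print
# "RESOURCE NODE" for the highest-scoring node of each resource.
preferred_nodes() {
    awk '
        /Resource:/ {
            if (rsc != "") print rsc, best
            rsc = $2; best = ""; max = ""
        }
        /Node:/ {
            node = $2
            score = $3
            gsub(/[^0-9-]/, "", score)   # "(score:200)" -> "200"
            if (max == "" || score + 0 > max + 0) { max = score; best = node }
        }
        END { if (rsc != "") print rsc, best }
    '
}

# Example:
#   pcs constraint location | preferred_nodes
```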

(10) Set the number of failures after which a resource fails over

Note that failover does not happen if this setting is forgotten.

[dns-pm01] # pcs resource defaults

No defaults set

[dns-pm01] # pcs resource defaults migration-threshold=1

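This defines a default. It applies to every resource definition.

If it should not apply to every resource, the value has to be defined on the individual resources instead.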

Warning: This command is deprecated and will be removed. Please use 'pcs resource defaults update' instead.
Warning: Defaults do not apply to resources which override them with their own defined values

[dns-pm01] # pcs resource defaults

Meta Attrs: rsc_defaults-meta_attributes
  migration-threshold=1

Behavior

1. When the OS of either server stops or its bind process stops, the VIPs are consolidated on the surviving server.

[Normal operation]
dns-pm01: PriIP 192.168.100.1 SecIP: 192.168.100.11
dns-pm02: PriIP 192.168.100.2 SecIP: 192.168.100.12

[dns-pm01 OS stopped]
dns-pm01: OS stopped
dns-pm02: PriIP 192.168.100.2 SecIP: 192.168.100.12, 192.168.100.11
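
Which node currently holds a VIP can be checked from the address list. A minimal sketch, assuming the one-line-per-address format of "ip -o -4 addr show", where the fourth field is ADDRESS/PREFIX; holds_addr is a hypothetical name.

```shell
# holds_addr ADDRESS
# Read "ip -o -4 addr show" output on stdin; return 0 if ADDRESS is assigned.
holds_addr() {
    awk -v want="$1" '
        { split($4, a, "/"); if (a[1] == want) found = 1 }
        END { exit !found }
    '
}

# Example (on dns-pm02 after dns-pm01 has gone down):
#   ip -o -4 addr show dev eth0 | holds_addr 192.168.100.11 && echo "VIP is here"
```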

2. Failback is not performed even after the process recovers (returns to normal)

To return Pacemaker to its normal operating state (failback), clear the status.

[dns-pm01] # pcs resource cleanup

Client configuration

Set /etc/resolv.conf on each client as follows.

nameserver 192.168.100.11
nameserver 192.168.100.12

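Failover test

Work in progress

Pull the LAN cable (simulates a stopped OS)

  Because no display is connected,

Stop the bind process

  systemctl stop named

Health checks for the Pacemaker and bind processes are planned to be implemented with Nagios process checks.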
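
While running these tests, it helps to see whether each VIP keeps answering. A hedged sketch: check_vips is a hypothetical helper that queries every VIP once with dig, using +time=1 +tries=1 so a dead address fails after about a second. The query name (localhost, matching monitor_request above) and the DIG override variable are assumptions for illustration.

```shell
# Command used for queries; override DIG to substitute another resolver tool.
DIG="${DIG:-dig}"

# check_vips NAME VIP...
# Query NAME at each VIP once and print "VIP OK" or "VIP FAIL".
# dig exits non-zero when no reply arrives from the server.
check_vips() {
    name="$1"; shift
    for vip in "$@"; do
        if "$DIG" "@$vip" +time=1 +tries=1 +short "$name" >/dev/null 2>&1; then
            echo "$vip OK"
        else
            echo "$vip FAIL"
        fi
    done
}

# Example: poll once per second during a failover test.
#   while sleep 1; do check_vips localhost 192.168.100.11 192.168.100.12; done
```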
