I. Problem Description
I was deploying an HAProxy + Keepalived high-availability load-balancing setup on two virtual machines provided by a partner in Taiwan. After finishing the Keepalived configuration and restarting Keepalived, the VIP was not bound.
Keepalived binary: /data/app_platform/keepalived/sbin/keepalived
Configuration file: /data/app_platform/keepalived/conf/keepalived.conf
Keepalived init script: /etc/init.d/keepalived
Contents of keepalived.conf
LB1 Master
! Configuration File for keepalived
global_defs {
notification_email {
admin@example.com
}
notification_email_from lb1@example.com
smtp_server 127.0.0.1
smtp_connect_timeout 30
router_id LB1_MASTER
}
vrrp_script chk_haproxy {
script "killall -0 haproxy"
interval 2
weight 2
}
vrrp_instance VI_1 {
state MASTER
interface eth1
virtual_router_id 51
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
10.1.1.200/24 brd 10.1.1.255 dev eth1 label eth1:vip
}
track_script {
chk_haproxy
}
}
Restart Keepalived and check the log
Mar 3 18:09:00 cv00300005248-1 Keepalived[20138]: Stopping Keepalived v1.2.15 (02/28,2015)
Mar 3 18:09:00 cv00300005248-1 Keepalived[20259]: Starting Keepalived v1.2.15 (02/28,2015)
Mar 3 18:09:00 cv00300005248-1 Keepalived[20260]: Starting Healthcheck child process, pid=20261
Mar 3 18:09:00 cv00300005248-1 Keepalived[20260]: Starting VRRP child process, pid=20262
Mar 3 18:09:00 cv00300005248-1 Keepalived_vrrp[20262]: Registering Kernel netlink reflector
Mar 3 18:09:00 cv00300005248-1 Keepalived_vrrp[20262]: Registering Kernel netlink command channel
Mar 3 18:09:00 cv00300005248-1 Keepalived_vrrp[20262]: Registering gratuitous ARP shared channel
Mar 3 18:09:00 cv00300005248-1 Keepalived_healthcheckers[20261]: Registering Kernel netlink reflector
Mar 3 18:09:00 cv00300005248-1 Keepalived_healthcheckers[20261]: Registering Kernel netlink command channel
Mar 3 18:09:00 cv00300005248-1 Keepalived_healthcheckers[20261]: Configuration is using : 3924 Bytes
Mar 3 18:09:00 cv00300005248-1 Keepalived_healthcheckers[20261]: Using LinkWatch kernel netlink reflector...
Mar 3 18:09:00 cv00300005248-1 Keepalived_vrrp[20262]: Configuration is using : 55712 Bytes
Mar 3 18:09:00 cv00300005248-1 Keepalived_vrrp[20262]: Using LinkWatch kernel netlink reflector...
Mar 3 18:09:18 cv00300005248-1 kernel: __ratelimit: 1964 callbacks suppressed
Mar 3 18:09:18 cv00300005248-1 kernel: Neighbour table overflow.
Mar 3 18:09:18 cv00300005248-1 kernel: Neighbour table overflow.
Mar 3 18:09:18 cv00300005248-1 kernel: Neighbour table overflow.
Mar 3 18:09:18 cv00300005248-1 kernel: Neighbour table overflow.
Mar 3 18:09:18 cv00300005248-1 kernel: Neighbour table overflow.
Mar 3 18:09:18 cv00300005248-1 kernel: Neighbour table overflow.
Mar 3 18:09:18 cv00300005248-1 kernel: Neighbour table overflow.
Mar 3 18:09:18 cv00300005248-1 kernel: Neighbour table overflow.
Mar 3 18:09:18 cv00300005248-1 kernel: Neighbour table overflow.
Mar 3 18:09:18 cv00300005248-1 kernel: Neighbour table overflow.
Check whether the VIP is bound
$ ifconfig eth1:vip
eth1:vip Link encap:Ethernet HWaddr 00:16:3E:F2:37:6B
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:13
The VIP is not bound.
II. Troubleshooting
1) Check the VIP configuration
Confirm the VIP details with the partner
IPADDR 10.1.1.200
NETMASK 255.255.255.0
GATEWAY 10.1.1.1
Broadcast 10.1.1.255
The value configured in keepalived.conf is
10.1.1.200/24 brd 10.1.1.255 dev eth1 label eth1:vip
2) Check the iptables and SELinux settings
$ sudo service iptables stop
$ sudo setenforce 0
setenforce: SELinux is disabled
If iptables has to stay enabled, add a rule that lets the VRRP multicast traffic in
iptables -I INPUT -i eth1 -d 224.0.0.0/8 -j ACCEPT
service iptables save
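Keepalived uses the multicast address 224.0.0.18 for the VRRP advertisements exchanged between the Master and the Backup.
3) Check the relevant kernel parameters
Kernel parameters to pay attention to in an HAProxy + Keepalived setup: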
# Controls IP packet forwarding
net.ipv4.ip_forward = 1
Enables IP forwarding
net.ipv4.ip_nonlocal_bind = 1
Allows binding to IP addresses that are not local to this host
When Keepalived is combined with LVS in DR or TUN mode, two ARP-related parameters must also be set on the backend real servers. They are set here as well.
net.ipv4.conf.lo.arp_ignore = 1
net.ipv4.conf.lo.arp_announce = 2
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
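A quick way to confirm that these settings are actually present is to compare a sysctl.conf-style file against the expected values. The following is only a sketch: the function name check_sysctl_file is hypothetical, and it assumes the settings live in a plain "key = value" file such as /etc/sysctl.conf.

```shell
#!/bin/sh
# Hypothetical helper: report keys that are missing or differ in a sysctl.conf-style file.
# $1 = file to inspect, remaining arguments = expected settings written as key=value.
# Prints one line per mismatch and nothing when every key matches.
check_sysctl_file() {
    file=$1
    shift
    for want in "$@"; do
        key=${want%%=*}
        val=${want#*=}
        # Strip comments and whitespace, then keep the last assignment of the key
        got=$(sed -e 's/#.*//' -e 's/[[:space:]]//g' "$file" |
              awk -F= -v k="$key" '$1 == k { v = $2 } END { print v }')
        if [ "$got" != "$val" ]; then
            echo "$key: expected $val, found ${got:-<unset>}"
        fi
    done
}
```

For example, check_sysctl_file /etc/sysctl.conf net.ipv4.ip_forward=1 net.ipv4.ip_nonlocal_bind=1 prints nothing when both values are set as listed above. It only inspects the file; the running kernel values still have to be loaded with sysctl -p.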
4) Check the VRRP settings
LB1 Master
state MASTER
interface eth1
virtual_router_id 51
priority 100
LB2 Backup
state BACKUP
interface eth1
virtual_router_id 51
priority 99
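The Master and the Backup must use the same virtual_router_id and different priority values; the larger the number, the higher the priority.
5) Suspect a problem with the compiled Keepalived build
I downloaded and compiled version 1.2.13 and restarted keepalived, but the VIP was still not bound.
Another platform in production runs a keepalived installed via yum, so I decided to install keepalived with yum, copy the configuration file over, and see whether the VIP could be bound.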
rpm -ivh http://ftp.linux./pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
yum -y install keepalived
cp /data/app_platform/keepalived/conf/keepalived.conf /etc/keepalived/keepalived.conf
Restart keepalived
Then check the log
Mar 4 16:42:46 xxxxx Keepalived_healthcheckers[17332]: Registering Kernel netlink reflector
Mar 4 16:42:46 xxxxx Keepalived_healthcheckers[17332]: Registering Kernel netlink command channel
Mar 4 16:42:46 xxxxx Keepalived_vrrp[17333]: Opening file '/etc/keepalived/keepalived.conf'.
Mar 4 16:42:46 xxxxx Keepalived_vrrp[17333]: Configuration is using : 65250 Bytes
Mar 4 16:42:46 xxxxx Keepalived_vrrp[17333]: Using LinkWatch kernel netlink reflector...
Mar 4 16:42:46 xxxxx Keepalived_vrrp[17333]: VRRP sockpool: [ifindex(3), proto(112), unicast(0), fd(10,11)]
Mar 4 16:42:46 xxxxx Keepalived_healthcheckers[17332]: Opening file '/etc/keepalived/keepalived.conf'.
Mar 4 16:42:46 xxxxx Keepalived_healthcheckers[17332]: Configuration is using : 7557 Bytes
Mar 4 16:42:46 xxxxx Keepalived_healthcheckers[17332]: Using LinkWatch kernel netlink reflector...
Mar 4 16:42:46 xxxxx Keepalived_vrrp[17333]: VRRP_Script(chk_haproxy) succeeded
Mar 4 16:42:47 xxxxx Keepalived_vrrp[17333]: VRRP_Instance(VI_1) Transition to MASTER STATE
Mar 4 16:42:48 xxxxx Keepalived_vrrp[17333]: VRRP_Instance(VI_1) Entering MASTER STATE
Mar 4 16:42:48 xxxxx Keepalived_vrrp[17333]: VRRP_Instance(VI_1) setting protocol VIPs.
Mar 4 16:42:48 xxxxx Keepalived_vrrp[17333]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 10.1.1.200
Mar 4 16:42:48 xxxxx Keepalived_healthcheckers[17332]: Netlink reflector reports IP 10.1.1.200 added
Mar 4 16:42:53 xxxxx Keepalived_vrrp[17333]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 10.1.1.200
Check the IP binding again
$ ifconfig eth1:vip
eth1:vip Link encap:Ethernet HWaddr 00:16:3E:F2:37:6B
inet addr:10.1.1.200 Bcast:10.1.1.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:13
Then uninstall the yum-installed keepalived
yum remove keepalived
Restore the original init script /etc/init.d/keepalived
After restarting keepalived, the VIP still could not be bound.
This pointed to the init script /etc/init.d/keepalived itself.
Inspect /etc/init.d/keepalived
# Source function library.
. /etc/rc.d/init.d/functions
exec="/data/app_platform/keepalived/sbin/keepalived"
prog="keepalived"
config="/data/app_platform/keepalived/conf/keepalived.conf"
[ -e /etc/sysconfig/$prog ] && . /etc/sysconfig/$prog
lockfile=/var/lock/subsys/keepalived
start() {
[ -x $exec ] || exit 5
[ -e $config ] || exit 6
echo -n $"Starting $prog: "
daemon $exec $KEEPALIVED_OPTIONS
retval=$?
echo
[ $retval -eq 0 ] && touch $lockfile
return $retval
}
The key line is
daemon $exec $KEEPALIVED_OPTIONS
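Because /etc/sysconfig/keepalived was never copied over, KEEPALIVED_OPTIONS is empty and the script simply runs daemon /data/app_platform/keepalived/sbin/keepalived
Keepalived reads /etc/keepalived/keepalived.conf by default, but this installation keeps its configuration file elsewhere, so the line has to be changed to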
daemon $exec -D -f $config
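A slightly more defensive variant is sketched below. It keeps any KEEPALIVED_OPTIONS that /etc/sysconfig/keepalived may provide, falls back to -D when the variable is empty, and always passes the configuration file with -f. The helper name keepalived_cmdline is hypothetical and not part of the original script; the sketch also assumes KEEPALIVED_OPTIONS does not already contain its own -f.

```shell
#!/bin/sh
# Paths taken from the init script above
exec="/data/app_platform/keepalived/sbin/keepalived"
config="/data/app_platform/keepalived/conf/keepalived.conf"

# Hypothetical helper: print the command line that start() should hand to daemon.
keepalived_cmdline() {
    # Use KEEPALIVED_OPTIONS when it is set and non-empty, otherwise default to -D
    opts=${KEEPALIVED_OPTIONS:--D}
    # Always name the configuration file explicitly
    echo "$exec $opts -f $config"
}
```

Inside start(), the daemon line could then read daemon $(keepalived_cmdline), so a missing /etc/sysconfig/keepalived no longer makes Keepalived fall back silently to the default configuration path.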
Restart keepalived, then check the log and the VIP binding
$ ifconfig eth1:vip
eth1:vip Link encap:Ethernet HWaddr 00:16:3E:F2:37:6B
inet addr:10.1.1.200 Bcast:10.1.1.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:13
6) Apply the same init-script change on LB2 Backup and observe the VIP takeover
On LB1 Master
$ ifconfig eth1:vip
eth1:vip Link encap:Ethernet HWaddr 00:16:3E:F2:37:6B
inet addr:10.1.1.200 Bcast:10.1.1.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:13
On LB2 Backup
$ ifconfig eth1:vip
eth1:vip Link encap:Ethernet HWaddr 00:16:3E:F2:37:6B
inet addr:10.1.1.200 Bcast:10.1.1.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:13
A new problem appears: both LB1 Master and LB2 Backup have bound the VIP 10.1.1.200, which is not normal.
Log in to 10.1.1.200 over SSH from LB1 and from LB2
[lb1 ~]$ ssh 10.1.1.200
Last login: Wed Mar 4 17:31:33 2015 from 10.1.1.200
[lb1 ~]$
[lb2 ~]$ ssh 10.1.1.200
Last login: Wed Mar 4 17:54:57 2015 from 101.95.153.246
[lb2 ~]$
On LB1, stop keepalived and ping 10.1.1.200: the ping fails.
On LB2, stop keepalived and ping 10.1.1.200: the ping fails as well.
Start keepalived on LB1 again: LB1 can ping 10.1.1.200, but LB2 cannot.
Start keepalived on LB2: LB2 can now ping 10.1.1.200.
Conclusion: LB1 and LB2 each bind the VIP 10.1.1.200 to their own eth1 interface. The two hosts are not exchanging VRRP messages, so no VRRP priority comparison ever takes place.
7) Find out what is blocking VRRP communication
Restart Keepalived on LB1 Master and check the log
Mar 5 15:45:36 gintama-taiwan-lb1 Keepalived_vrrp[32303]: Configuration is using : 65410 Bytes
Mar 5 15:45:36 gintama-taiwan-lb1 Keepalived_vrrp[32303]: Using LinkWatch kernel netlink reflector...
Mar 5 15:45:36 gintama-taiwan-lb1 Keepalived_vrrp[32303]: VRRP sockpool: [ifindex(3), proto(112), unicast(0), fd(10,11)]
Mar 5 15:45:36 gintama-taiwan-lb1 Keepalived_vrrp[32303]: VRRP_Script(chk_haproxy) succeeded
Mar 5 15:45:37 gintama-taiwan-lb1 Keepalived_vrrp[32303]: VRRP_Instance(VI_1) Transition to MASTER STATE
Mar 5 15:45:38 gintama-taiwan-lb1 Keepalived_vrrp[32303]: VRRP_Instance(VI_1) Entering MASTER STATE
Mar 5 15:45:38 gintama-taiwan-lb1 Keepalived_vrrp[32303]: VRRP_Instance(VI_1) setting protocol VIPs.
Mar 5 15:45:38 gintama-taiwan-lb1 Keepalived_vrrp[32303]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 10.1.1.200
Mar 5 15:45:38 gintama-taiwan-lb1 Keepalived_healthcheckers[32302]: Netlink reflector reports IP 10.1.1.200 added
Mar 5 15:45:43 gintama-taiwan-lb1 Keepalived_vrrp[32303]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 10.1.1.200
Keepalived on LB1 Master goes straight to MASTER state and takes over the VIP.
Now restart Keepalived on LB2 Backup and check the log
Mar 5 15:47:42 gintama-taiwan-lb2 Keepalived_vrrp[30619]: Configuration is using : 65408 Bytes
Mar 5 15:47:42 gintama-taiwan-lb2 Keepalived_vrrp[30619]: Using LinkWatch kernel netlink reflector...
Mar 5 15:47:42 gintama-taiwan-lb2 Keepalived_vrrp[30619]: VRRP_Instance(VI_1) Entering BACKUP STATE
Mar 5 15:47:42 gintama-taiwan-lb2 Keepalived_vrrp[30619]: VRRP sockpool: [ifindex(3), proto(112), unicast(0), fd(10,11)]
Mar 5 15:47:46 gintama-taiwan-lb2 Keepalived_vrrp[30619]: VRRP_Instance(VI_1) Transition to MASTER STATE
Mar 5 15:47:47 gintama-taiwan-lb2 Keepalived_vrrp[30619]: VRRP_Instance(VI_1) Entering MASTER STATE
Mar 5 15:47:47 gintama-taiwan-lb2 Keepalived_vrrp[30619]: VRRP_Instance(VI_1) setting protocol VIPs.
Mar 5 15:47:47 gintama-taiwan-lb2 Keepalived_vrrp[30619]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 10.1.1.200
Mar 5 15:47:47 gintama-taiwan-lb2 Keepalived_healthcheckers[30618]: Netlink reflector reports IP 10.1.1.200 added
Mar 5 15:47:52 gintama-taiwan-lb2 Keepalived_vrrp[30619]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 10.1.1.200
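Keepalived on LB2 first enters BACKUP state, then, having heard no advertisement from LB1 for a few seconds, transitions to MASTER state and takes over the VIP.
This shows that VRRP multicast is not working.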
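The same dual-master symptom can be spotted mechanically from the logs. The sketch below prints the last state a VRRP instance entered according to a syslog file; the function name last_vrrp_state is hypothetical, and the log location (for example /var/log/messages) is an assumption about the host rather than something stated above.

```shell
#!/bin/sh
# Hypothetical helper: print the last state (MASTER, BACKUP or FAULT) that a
# VRRP instance entered, based on Keepalived's "Entering ... STATE" log lines.
# $1 = log file, $2 = vrrp_instance name (VI_1 in the configuration above)
last_vrrp_state() {
    awk -v inst="VRRP_Instance($2)" '
        index($0, inst " Entering ") {
            if (index($0, "Entering MASTER STATE")) state = "MASTER"
            else if (index($0, "Entering BACKUP STATE")) state = "BACKUP"
            else if (index($0, "Entering FAULT STATE")) state = "FAULT"
        }
        END {
            if (state == "") state = "UNKNOWN"
            print state
        }
    ' "$1"
}
```

Running last_vrrp_state /var/log/messages VI_1 on both load balancers and getting MASTER on each is a strong hint that the two nodes cannot hear each other's advertisements.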
Since VRRP multicast is broken, try sending the VRRP advertisements by unicast instead. Change the configuration on LB1 and LB2 as follows.
LB1
Add the following to the vrrp_instance block
unicast_src_ip 10.1.1.12
unicast_peer {
10.1.1.17
}
LB2
Add the following to the vrrp_instance block
unicast_src_ip 10.1.1.17
unicast_peer {
10.1.1.12
}
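unicast_src_ip is the source IP address used when sending VRRP unicast advertisements
unicast_peer lists the peer IP addresses that receive the VRRP unicast advertisements
Then reload keepalived on each host and watch the logs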
LB1
Mar 5 16:13:35 gintama-taiwan-lb1 Keepalived_vrrp[2551]: VRRP_Instance(VI_1) setting protocol VIPs.
Mar 5 16:13:35 gintama-taiwan-lb1 Keepalived_vrrp[2551]: VRRP_Script(chk_haproxy) considered successful on reload
Mar 5 16:13:35 gintama-taiwan-lb1 Keepalived_vrrp[2551]: Configuration is using : 65579 Bytes
Mar 5 16:13:35 gintama-taiwan-lb1 Keepalived_vrrp[2551]: Using LinkWatch kernel netlink reflector...
Mar 5 16:13:35 gintama-taiwan-lb1 Keepalived_vrrp[2551]: VRRP sockpool: [ifindex(3), proto(112), unicast(1), fd(10,11)]
Mar 5 16:13:36 gintama-taiwan-lb1 Keepalived_vrrp[2551]: VRRP_Instance(VI_1) Transition to MASTER STATE
Mar 5 16:13:48 gintama-taiwan-lb1 Keepalived_vrrp[2551]: VRRP_Instance(VI_1) Received lower prio advert, forcing new election
Mar 5 16:13:48 gintama-taiwan-lb1 Keepalived_vrrp[2551]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 10.1.1.200
Mar 5 16:13:48 gintama-taiwan-lb1 Keepalived_vrrp[2551]: VRRP_Instance(VI_1) Received lower prio advert, forcing new election
Mar 5 16:13:48 gintama-taiwan-lb1 Keepalived_vrrp[2551]: VRRP_Instance(VI_1) Sending gratuitous ARPs on eth1 for 10.1.1.200
LB2
Mar 5 16:13:48 gintama-taiwan-lb2 Keepalived_vrrp[453]: VRRP_Instance(VI_1) Received higher prio advert
Mar 5 16:13:48 gintama-taiwan-lb2 Keepalived_vrrp[453]: VRRP_Instance(VI_1) Entering BACKUP STATE
Mar 5 16:13:48 gintama-taiwan-lb2 Keepalived_vrrp[453]: VRRP_Instance(VI_1) removing protocol VIPs.
Mar 5 16:13:48 gintama-taiwan-lb2 Keepalived_healthcheckers[452]: Netlink reflector reports IP 10.1.1.200 removed
Checking the VIP binding shows that the VIP has been removed from LB2.
Ping the VIP 10.1.1.200 from both LB1 and LB2
[lb1 ~]$ ping -c 5 10.1.1.200
PING 10.1.1.200 (10.1.1.200) 56(84) bytes of data.
64 bytes from 10.1.1.200: icmp_seq=1 ttl=64 time=0.028 ms
64 bytes from 10.1.1.200: icmp_seq=2 ttl=64 time=0.020 ms
64 bytes from 10.1.1.200: icmp_seq=3 ttl=64 time=0.020 ms
64 bytes from 10.1.1.200: icmp_seq=4 ttl=64 time=0.021 ms
64 bytes from 10.1.1.200: icmp_seq=5 ttl=64 time=0.027 ms
--- 10.1.1.200 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 3999ms
rtt min/avg/max/mdev = 0.020/0.023/0.028/0.004 ms
[lb2 ~]$ ping -c 5 10.1.1.200
PING 10.1.1.200 (10.1.1.200) 56(84) bytes of data.
--- 10.1.1.200 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 14000ms
While LB1 holds the VIP, LB2 cannot ping it. Likewise, with Keepalived stopped on LB1, LB2 takes over the VIP, but LB1 then cannot ping it.
Capture packets on LB1 and LB2
[lb1 ~]$ sudo tcpdump -vvv -i eth1 host 10.1.1.17
tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
16:46:04.827357 IP (tos 0xc0, ttl 255, id 328, offset 0, flags [none], proto VRRP (112), length 40)
10.1.1.12 > 10.1.1.17: VRRPv2, Advertisement, vrid 51, prio 102, authtype simple, intvl 1s, length 20, addrs: 10.1.1.200 auth "1111^@^@^@^@"
16:46:05.827459 IP (tos 0xc0, ttl 255, id 329, offset 0, flags [none], proto VRRP (112), length 40)
10.1.1.12 > 10.1.1.17: VRRPv2, Advertisement, vrid 51, prio 102, authtype simple, intvl 1s, length 20, addrs: 10.1.1.200 auth "1111^@^@^@^@"
16:46:06.828234 IP (tos 0xc0, ttl 255, id 330, offset 0, flags [none], proto VRRP (112), length 40)
10.1.1.12 > 10.1.1.17: VRRPv2, Advertisement, vrid 51, prio 102, authtype simple, intvl 1s, length 20, addrs: 10.1.1.200 auth "1111^@^@^@^@"
16:46:07.828338 IP (tos 0xc0, ttl 255, id 331, offset 0, flags [none], proto VRRP (112), length 40)
10.1.1.12 > 10.1.1.17: VRRPv2, Advertisement, vrid 51, prio 102, authtype simple, intvl 1s, length 20, addrs: 10.1.1.200 auth "1111^@^@^@^@"
[lb2 ~]$ sudo tcpdump -vvv -i eth1 host 10.1.1.12
tcpdump: listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
16:48:07.000029 IP (tos 0xc0, ttl 255, id 450, offset 0, flags [none], proto VRRP (112), length 40)
10.1.1.12 > 10.1.1.17: VRRPv2, Advertisement, vrid 51, prio 102, authtype simple, intvl 1s, length 20, addrs: 10.1.1.200 auth "1111^@^@^@^@"
16:48:07.999539 IP (tos 0xc0, ttl 255, id 451, offset 0, flags [none], proto VRRP (112), length 40)
10.1.1.12 > 10.1.1.17: VRRPv2, Advertisement, vrid 51, prio 102, authtype simple, intvl 1s, length 20, addrs: 10.1.1.200 auth "1111^@^@^@^@"
16:48:08.999252 IP (tos 0xc0, ttl 255, id 452, offset 0, flags [none], proto VRRP (112), length 40)
10.1.1.12 > 10.1.1.17: VRRPv2, Advertisement, vrid 51, prio 102, authtype simple, intvl 1s, length 20, addrs: 10.1.1.200 auth "1111^@^@^@^@"
16:48:09.999560 IP (tos 0xc0, ttl 255, id 453, offset 0, flags [none], proto VRRP (112), length 40)
10.1.1.12 > 10.1.1.17: VRRPv2, Advertisement, vrid 51, prio 102, authtype simple, intvl 1s, length 20, addrs: 10.1.1.200 auth "1111^@^@^@^@"
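Ping tests from other VMs on the physical hosts of LB1 and LB2 show the same pattern: when the VIP is bound on LB1, only VMs on LB1's physical host can ping it, while VMs on LB2's physical host cannot, and vice versa.
A colleague suggested that VRRP might be affected by DHCP. It turned out that the IP addresses of the VMs provided by the partner were in fact assigned by DHCP, but testing showed that DHCP had nothing to do with the VRRP problem. Even so, production hosts should preferably not obtain their IP addresses via DHCP.
8) Ask the partner's engineers to help investigate why the VIP cannot be pinged
It finally turned out that the partner's internal network is a virtual network in which the gateway does nothing in practice, so some VMs could not reach the VIP through the gateway 10.1.1.1.
The VM provider's engineers then worked on the HAProxy + Keepalived servers and adjusted the network settings so that the VIP 10.1.1.200 became reachable over the internal network. When I tested afterwards, however, Keepalived did not move the VIP when HAProxy went down.
9) Fix Keepalived failing to move the VIP when HAProxy goes down
Repeated tests showed that the VIP fails over when Keepalived itself is stopped, but not when HAProxy is stopped.
After carefully checking the configuration files and reading the documentation, the cause turned out to be the relative sizes of the Keepalived weight and priority parameters.
In my original configuration, LB1 had weight 2 and priority 100, and LB2 had weight 2 and priority 99.
While debugging, the partner's engineers had changed LB1's priority to 160. With that value, the VIP never moved to LB2 when HAProxy on LB1 went down, no matter how often I tested. Setting LB1's priority back to 100 fixed it.
The points to note are:
The master's priority minus the vrrp_script weight must be smaller than the backup's priority. With priority 160, LB1 still advertises 160 after a failed check, which beats LB2's 99 + 2 = 101, so LB1 stays master; with priority 100, LB2's 101 wins and the VIP moves.
A vrrp_script check counts as successful when the script exits with 0; any other exit status counts as a failure.
* When weight is positive, it is added to priority while the check succeeds and is not added when the check fails.
Master check fails:
a switchover happens when master priority < backup priority + weight.
Master check succeeds:
master priority + weight > backup priority + weight, so the master stays master.
* When weight is negative, priority is unchanged while the check succeeds and becomes priority - abs(weight) when the check fails.
Master check fails:
a switchover happens when master priority - abs(weight) < backup priority.
Master check succeeds:
master priority > backup priority, so the master stays master.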
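The weight rules can be checked with a little arithmetic before touching the real configuration. The sketch below is illustrative only: the function names effective_priority and fails_over are hypothetical, both nodes are assumed to use the same weight, and ties (which VRRP resolves by comparing IP addresses) are ignored.

```shell
#!/bin/sh
# Effective priority of one node under the weight rules described above.
# $1 = configured priority, $2 = vrrp_script weight, $3 = script exit status (0 = success)
effective_priority() {
    prio=$1; weight=$2; status=$3
    if [ "$weight" -gt 0 ] && [ "$status" -eq 0 ]; then
        echo $((prio + weight))      # positive weight is added on success
    elif [ "$weight" -lt 0 ] && [ "$status" -ne 0 ]; then
        echo $((prio + weight))      # weight is negative, so this subtracts abs(weight)
    else
        echo "$prio"
    fi
}

# Prints "yes" if the backup would take over when the master's check fails
# while the backup's check still succeeds, otherwise "no".
# $1 = master priority, $2 = backup priority, $3 = weight used on both nodes
fails_over() {
    master=$(effective_priority "$1" "$3" 1)
    backup=$(effective_priority "$2" "$3" 0)
    if [ "$master" -lt "$backup" ]; then echo yes; else echo no; fi
}

fails_over 100 99 2   # prints "yes": 100 < 99 + 2
fails_over 160 99 2   # prints "no": 160 > 99 + 2
```

The two calls at the end reproduce the situation in this incident: priority 100 on LB1 allows the VIP to move, priority 160 does not.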
The final configuration file is:
! Configuration File for keepalived
global_defs {
notification_email {
admin@example.com
}
notification_email_from lb1@example.com
smtp_server 127.0.0.1
smtp_connect_timeout 30
router_id LB1_MASTER
}
vrrp_script chk_haproxy {
script "killall -0 haproxy"
interval 2
weight 2
}
# VIP on the public network
vrrp_instance eth0_VIP {
state MASTER
interface eth0
virtual_router_id 51
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
unicast_src_ip 8.8.8.6 # use VRRP unicast
unicast_peer {
8.8.8.7
}
virtual_ipaddress {
8.8.8.8/25 brd 8.8.8.255 dev eth0 label eth0:vip
}
track_script {
chk_haproxy
}
}
# VIP on the internal network
vrrp_instance eth1_VIP {
state MASTER
interface eth1
virtual_router_id 52
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
unicast_src_ip 10.1.1.12
unicast_peer {
10.1.1.17
}
virtual_ipaddress {
10.1.1.200/24 brd 10.1.1.255 dev eth1 label eth1:vip
}
track_script {
chk_haproxy
}
}
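III. Summary
Points to watch when configuring Keepalived:
A. Enable IP forwarding and non-local IP binding in the kernel; when using LVS in DR mode, also set the two ARP-related parameters.
B. If the network Keepalived runs on does not allow multicast, use VRRP unicast.
C. Watch the weight and priority values on the master and the backup; badly chosen values can prevent the VIP from failing over.
D. If the configuration file is not at the default location, pass it with the -f option when starting Keepalived.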