YUBO.ORG: ipvs in Operation

 mrjbydd 2011-01-11

ipvs in Operation

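How ipvs rules work

How do ipvs rules take effect? Let's start with the principle behind the implementation.

Put simply, ipvs does nothing more than rewrite packet header information to schedule traffic along the path client -> virtual server -> real server. The goal of scheduling is to keep the load across the real servers close to balanced. Two questions arise here: how packets are modified, and which scheduling policy is used.

Let's look first at how packets are actually modified. ipvs in the 2.6 kernel is implemented somewhat differently from earlier versions. To quote the author of ipvs, Mr. Zhang Wensong: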

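We modified the TCP/IP stack in Linux kernels 2.0 and 2.2, intercepting and rewriting/forwarding
IP packets at the IP layer, implemented three IP load-balancing techniques, and provided an
ipvsadm program for configuring and managing virtual servers. In Linux kernels 2.4 and 2.6 we
implemented it as a NetFilter module, rewriting and further optimizing much of the code.
The current version has been released online and, according to feedback, is fairly stable.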

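Fine, that is clear enough: ipvs relies on netfilter to modify packets. A brief look at how netfilter works is therefore worthwhile (see the figure).

netfilter has five chains in total. Each chain can hold a number of rules, and the rules are ordered (that is, they have priorities); once a rule matches and its action has been carried out, processing leaves that chain. The five chains are PREROUTING, INPUT, FORWARD, OUTPUT, and POSTROUTING. Connections on a machine fall into three kinds: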

  • Connections entering the host from outside pass through PREROUTING -> INPUT
  • Connections leaving the host pass through OUTPUT -> POSTROUTING
  • Connections forwarded by the host pass through PREROUTING -> FORWARD -> POSTROUTING

The rules in each chain take effect when data passes through that chain (that is, the corresponding function is called to process it). It looks simple enough: ipvs, as a netfilter module, would only need to write its rules into these chains.

But wait. If netfilter has many modules and they all write rules into the same chain, won't things get messy? How is priority controlled? For this reason the rules in a chain are grouped by purpose, and each group is assigned an integer priority: the smaller the value, the higher the priority. Rules of the same type are ordered by position (they sit in a linked list, and the closer to the front, the higher the priority).
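The ordering rule just described can be sketched in user-space C. This is only an illustration under stated assumptions: `struct hook_entry` and `hook_register()` are hypothetical names standing in for the kernel's hook bookkeeping, not kernel APIs.

```c
#include <stddef.h>

/* Hypothetical stand-in for one registered hook function. */
struct hook_entry {
    const char *name;          /* label, for illustration only */
    int priority;              /* smaller value runs earlier */
    struct hook_entry *next;
};

/*
 * Insert e so that the list stays sorted by ascending priority.
 * An entry whose priority equals that of existing entries is placed
 * after them, so earlier registrations keep precedence.
 */
static void hook_register(struct hook_entry **head, struct hook_entry *e)
{
    struct hook_entry **pp = head;

    while (*pp != NULL && (*pp)->priority <= e->priority)
        pp = &(*pp)->next;
    e->next = *pp;
    *pp = e;
}
```

Registering entries with priorities 100, 99, 0 and then another 100 leaves them linked in the order 0, 99, 100 (first registered), 100 (second registered).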

netfilter itself serves three purposes, so its rules fall into three types, represented by three tables: the filter table (filtering), the nat table (rewriting packet headers), and the mangle table (altering packets). Adding the ipvs module is roughly like adding a new ipvs table to netfilter. For more on netfilter, see reference 1.


How ipvs rules are executed

Whenever a packet (for example, one belonging to a new connection) reaches a netfilter chain, the NF_HOOK() function is called. It consults a global variable, nf_hooks, which holds the hook functions registered for every chain by all of the netfilter tables (filter, nat, mangle, the additional ipvs "table", and so on). NF_HOOK() walks the entries for the chain in question and calls each function in priority order; each function then matches and processes the packet according to its own rules for that chain.
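A minimal user-space sketch of this traversal follows. The names `run_hooks`, `count_hook` and `drop_telnet` are made up for illustration; only the numeric values of the two verdicts mirror the kernel's NF_DROP and NF_ACCEPT. The real NF_HOOK() handles more verdicts and runs under RCU, none of which is modelled here.

```c
#include <stddef.h>

/* Same numeric values as NF_DROP and NF_ACCEPT. */
enum verdict { V_DROP = 0, V_ACCEPT = 1 };

/* Toy packet: destination port plus a counter of hooks that saw it. */
struct packet {
    int dport;
    int touched;
};

typedef enum verdict (*hook_fn)(struct packet *);

struct hook_entry {
    hook_fn fn;
    int priority;   /* the array is assumed sorted by ascending priority */
};

/* Call each hook in order; the first verdict other than ACCEPT ends the walk. */
static enum verdict run_hooks(const struct hook_entry *hooks, size_t n,
                              struct packet *pkt)
{
    for (size_t i = 0; i < n; i++) {
        enum verdict v = hooks[i].fn(pkt);
        if (v != V_ACCEPT)
            return v;
    }
    return V_ACCEPT;
}

/* Example hook: only counts the packet. */
static enum verdict count_hook(struct packet *pkt)
{
    pkt->touched++;
    return V_ACCEPT;
}

/* Example hook: drops anything addressed to port 23. */
static enum verdict drop_telnet(struct packet *pkt)
{
    pkt->touched++;
    return pkt->dport == 23 ? V_DROP : V_ACCEPT;
}
```

With a chain of `count_hook` (priority 0), `drop_telnet` (50) and `count_hook` (100), a packet to port 80 is seen by all three hooks, while a packet to port 23 is dropped by the second hook and never reaches the third.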

Remember nf_register_hook(), mentioned at the end of the previous part? It is precisely the call ret = nf_register_hooks(ip_vs_ops, ARRAY_SIZE(ip_vs_ops)); with which ipvs adds its entries to nf_hooks, and that is what lets netfilter execute the ipvs rules. Let's see what data is being added.

The contents of ip_vs_ops are:


            

net/ipv4/ipvs/ip_vs_core.c

    static struct nf_hook_ops ip_vs_ops[] __read_mostly = {
        /* After packet filtering, forward packet through VS/DR, VS/TUN,
         * or VS/NAT(change destination), so that filtering rules can be
         * applied to IPVS. */
        {
            /* Hook function: every packet that traverses the INPUT chain
             * is handed to ip_vs_in() for matching and processing. */
            .hook     = ip_vs_in,
            .owner    = THIS_MODULE,        /* owning module */
            .pf       = PF_INET,            /* protocol family, normally IPv4 (PF_INET) */
            .hooknum  = NF_INET_LOCAL_IN,   /* hook number: the INPUT chain */
            .priority = 100,                /* priority */
        },
        /* After packet filtering, change source only for VS/NAT */
        {
            /* Packets traversing FORWARD are processed by ip_vs_out(). */
            .hook     = ip_vs_out,
            .owner    = THIS_MODULE,
            .pf       = PF_INET,
            .hooknum  = NF_INET_FORWARD,
            .priority = 100,
        },
        /* After packet filtering (but before ip_vs_out_icmp), catch icmp
         * destined for 0.0.0.0/0, which is for incoming IPVS connections */
        {
            /* Packets traversing FORWARD are processed by ip_vs_forward_icmp(). */
            .hook     = ip_vs_forward_icmp,
            .owner    = THIS_MODULE,
            .pf       = PF_INET,
            .hooknum  = NF_INET_FORWARD,
            .priority = 99,
        },
        /* Before the netfilter connection tracking, exit from POST_ROUTING */
        {
            /* Packets traversing POSTROUTING are processed by ip_vs_post_routing(). */
            .hook     = ip_vs_post_routing,
            .owner    = THIS_MODULE,
            .pf       = PF_INET,
            .hooknum  = NF_INET_POST_ROUTING,
            .priority = NF_IP_PRI_NAT_SRC-1,
        },
    };

As you can see, ipvs adds four handler functions across three chains: INPUT, FORWARD, and POSTROUTING. Let's go through them one by one.


ip_vs_in()

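ip_vs_in() sits on the INPUT chain and inspects every packet entering this host. Its job is to hand connections addressed to the vs (virtual server) over to an rs (real server), which is how load balancing is achieved. Which real server is picked depends on the scheduling algorithm chosen at configuration time; how the packet is modified depends on the type of VS, of which there are three:

  • VS/NAT rewrites d_addr, and possibly d_port, of the incoming packet; s_addr is rewritten only on replies, by ip_vs_out()
  • VS/DR leaves the IP header untouched and forwards the packet to the real server's MAC address
  • VS/TUN prepends a new header to the original packet, which is also called encapsulation

This function processes all data destined for this host (the director). From the skb (sk_buff) it obtains the protocol structure pp (ip_vs_protocol) for the connection, works out which skbs match a virtual service rule svc (ip_vs_service), and finds the corresponding cp (ip_vs_conn); if none is found, it creates a new cp and adds it to the ip_vs_conn_tab table. Finally the data is transmitted using the method in cp->packet_xmit(). Along the way a good many values need updating, such as the connection state and the counters for pp, cp, and skb.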
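The look-up-or-schedule flow described above can be sketched as follows. Everything here is a simplified assumption made for illustration: `struct director`, `conn_lookup()`, `conn_schedule()` and `dispatch()` are hypothetical names, the table is a small array searched linearly where ipvs uses a hash table, and round-robin merely stands in for whatever scheduler is configured.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_CONNS 8
#define NUM_RS    2   /* number of real servers, assumed for the example */

struct conn {
    uint32_t caddr;   /* client address */
    uint16_t cport;   /* client port */
    int rs;           /* index of the real server bound to this connection */
    int in_use;
};

struct director {
    struct conn tab[MAX_CONNS];   /* toy stand-in for ip_vs_conn_tab */
    int next_rs;                  /* round-robin cursor */
};

/* Return the existing entry for this client endpoint, or NULL. */
static struct conn *conn_lookup(struct director *d, uint32_t caddr, uint16_t cport)
{
    for (size_t i = 0; i < MAX_CONNS; i++)
        if (d->tab[i].in_use && d->tab[i].caddr == caddr &&
            d->tab[i].cport == cport)
            return &d->tab[i];
    return NULL;
}

/* Pick a real server round-robin and record a new entry; NULL if the table is full. */
static struct conn *conn_schedule(struct director *d, uint32_t caddr, uint16_t cport)
{
    for (size_t i = 0; i < MAX_CONNS; i++) {
        if (!d->tab[i].in_use) {
            d->tab[i] = (struct conn){ caddr, cport, d->next_rs, 1 };
            d->next_rs = (d->next_rs + 1) % NUM_RS;
            return &d->tab[i];
        }
    }
    return NULL;
}

/* Returns the real-server index for this packet, or -1 if none is available. */
static int dispatch(struct director *d, uint32_t caddr, uint16_t cport)
{
    struct conn *cp = conn_lookup(d, caddr, cport);

    if (cp == NULL)
        cp = conn_schedule(d, caddr, cport);
    return cp ? cp->rs : -1;
}
```

The point of the table is stickiness: later packets from the same client endpoint find the existing entry and go to the same real server, and only the first packet of a connection invokes the scheduler.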


            

net/ipv4/ipvs/ip_vs_core.c

    /*
     *	Check if it's for virtual services, look it up,
     *	and send it on its way...
     */
    /*
     * hooknum: number of the hook (chain) being traversed
     * skb:     the packet buffer (sk_buff)
     * in:      device the packet arrived on
     * out:     device the packet will leave through (if known)
     * okfn:    function pointer taking an sk_buff, essentially unused here
     */
    static unsigned int
    ip_vs_in(unsigned int hooknum, struct sk_buff *skb,
             const struct net_device *in, const struct net_device *out,
             int (*okfn)(struct sk_buff *))
    {
        struct iphdr *iph;
        struct ip_vs_protocol *pp;
        struct ip_vs_conn *cp;
        int ret, restart;
        int ihl;

        /*
         *	Big tappo: only PACKET_HOST (neither loopback nor mcasts)
         *	... don't know why 1st test DOES NOT include 2nd (?)
         */
        /* Skip packets not addressed to this host (PACKET_HOST), packets on
         * the loopback device, and packets already attached to a local
         * socket (presumably real connections of this host). These cases
         * are rare on a director, hence the unlikely() macro, a GCC
         * branch-prediction hint. */
        if (unlikely(skb->pkt_type != PACKET_HOST
                     || skb->dev->flags & IFF_LOOPBACK || skb->sk)) {
            IP_VS_DBG(12, "packet type=%d proto=%d daddr=%d.%d.%d.%d ignored\n",
                      skb->pkt_type,
                      ip_hdr(skb)->protocol,
                      NIPQUAD(ip_hdr(skb)->daddr));   /* log it */
            return NF_ACCEPT;   /* carry on with the next hook function */
        }

        iph = ip_hdr(skb);   /* IP header */
        if (unlikely(iph->protocol == IPPROTO_ICMP)) {
            /* ICMP is handled by ip_vs_in_icmp() */
            int related, verdict = ip_vs_in_icmp(skb, &related, hooknum);

            if (related)          /* ICMP related to an IPVS connection: */
                return verdict;   /* return what ip_vs_in_icmp() decided */
            /* Otherwise refetch the header pointer; ip_hdr() computes it
             * from an offset, and the ICMP handler may have changed the skb. */
            iph = ip_hdr(skb);
        }

        /* Protocol supported? */
        pp = ip_vs_proto_get(iph->protocol);
        if (unlikely(!pp))   /* protocol unknown to ipvs: let it pass */
            return NF_ACCEPT;

        ihl = iph->ihl << 2;   /* iph->ihl counts 4-byte words; convert to bytes */

        /*
         * Check if the packet belongs to an existing connection entry
         */
        cp = pp->conn_in_get(skb, pp, iph, ihl, 0);   /* cp is the connection entry, if any */

        if (unlikely(!cp)) {
            /* Not found in ip_vs_conn_tab, i.e. the first packet of a
             * connection to the vs. */
            int v;

            /* The protocol's conn_schedule() picks a suitable rs for skb,
             * creates a new cp from skb and pp, and adds it to
             * ip_vs_conn_tab. For how the rs is chosen, see the protocol's
             * own function, e.g. tcp_conn_schedule(). */
            if (!pp->conn_schedule(skb, pp, &v, &cp))
                return v;   /* scheduling already produced a verdict: return it */
        }

        if (unlikely(!cp)) {
            /* Still no connection (no virtual service matched): log and pass. */
            /* sorry, all this trouble for a no-hit :) */
            IP_VS_DBG_PKT(12, pp, skb, 0,
                          "packet continues traversal as normal");
            return NF_ACCEPT;
        }

        IP_VS_DBG_PKT(11, pp, skb, 0, "Incoming packet");

        /* Check the server status */
        if (cp->dest && !(cp->dest->flags & IP_VS_DEST_F_AVAILABLE)) {
            /* the destination server is not available */

            if (sysctl_ip_vs_expire_nodest_conn) {
                /* try to expire the connection immediately */
                ip_vs_conn_expire_now(cp);
            }
            /* don't restart its timer, and silently
               drop the packet. */
            __ip_vs_conn_put(cp);   /* drop the reference on cp */
            return NF_DROP;
        }

        ip_vs_in_stats(cp, skb);   /* update counters (connections, traffic) */
        /* update the connection state for the input direction */
        restart = ip_vs_set_state(cp, IP_VS_DIR_INPUT, skb, pp);
        /* Transmit via cp->packet_xmit(). The function is bound when cp is
         * created, by ip_vs_bind_xmit(cp), according to the forwarding
         * method of the real server. There are five: ip_vs_nat_xmit,
         * ip_vs_tunnel_xmit, ip_vs_dr_xmit, ip_vs_null_xmit,
         * ip_vs_bypass_xmit. */
        if (cp->packet_xmit)
            ret = cp->packet_xmit(skb, cp, pp);
            /* do not touch skb anymore */
        else {
            IP_VS_DBG_RL("warning: packet_xmit is null");
            ret = NF_ACCEPT;
        }

        /* Increase its packet counter and check if it is needed
         * to be synchronized
         *
         * Sync connection if it is about to close to
         * encorage the standby servers to update the connections timeout
         */
        atomic_inc(&cp->in_pkts);   /* packet counter */
        if ((ip_vs_sync_state & IP_VS_STATE_MASTER) &&
            (((cp->protocol != IPPROTO_TCP ||
               cp->state == IP_VS_TCP_S_ESTABLISHED) &&
              (atomic_read(&cp->in_pkts) % sysctl_ip_vs_sync_threshold[1]
               == sysctl_ip_vs_sync_threshold[0])) ||
             ((cp->protocol == IPPROTO_TCP) && (cp->old_state != cp->state) &&
              ((cp->state == IP_VS_TCP_S_FIN_WAIT) ||
               (cp->state == IP_VS_TCP_S_CLOSE)))))
            /* queue the ip_vs_conn data in sync_buff so that directors
             * can synchronize state with each other */
            ip_vs_sync_conn(cp);
        cp->old_state = cp->state;

        ip_vs_conn_put(cp);   /* release cp */
        return ret;
    }
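The long condition guarding ip_vs_sync_conn() is easier to read when split into its two cases. The sketch below restates it in user-space C; `struct conn_view`, `should_sync()` and the reduced enums are hypothetical simplifications, and the threshold values 3 and 50 in the usage note are assumed for illustration only.

```c
#include <stdbool.h>

enum proto { PROTO_TCP, PROTO_UDP };
enum tcp_state { S_ESTABLISHED, S_FIN_WAIT, S_CLOSE, S_OTHER };

/* The few connection fields the condition looks at. */
struct conn_view {
    enum proto protocol;
    enum tcp_state state;
    enum tcp_state old_state;
    unsigned in_pkts;          /* packets seen so far, already incremented */
};

/*
 * threshold[0] is the remainder to hit, threshold[1] the period;
 * threshold[1] must be non-zero.
 */
static bool should_sync(const struct conn_view *c, bool is_master,
                        const unsigned threshold[2])
{
    if (!is_master)
        return false;

    /* Case 1: non-TCP, or established TCP, once per period of packets. */
    bool periodic = (c->protocol != PROTO_TCP || c->state == S_ESTABLISHED) &&
                    (c->in_pkts % threshold[1] == threshold[0]);

    /* Case 2: a TCP connection that has just moved to FIN_WAIT or CLOSE. */
    bool closing = c->protocol == PROTO_TCP && c->old_state != c->state &&
                   (c->state == S_FIN_WAIT || c->state == S_CLOSE);

    return periodic || closing;
}
```

With thresholds {3, 50}, an established TCP connection would be synced on its 3rd, 53rd, 103rd packet and so on, plus once more at the moment it enters FIN_WAIT or CLOSE.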

ip_vs_out()

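This function sits on the FORWARD chain and handles every skb forwarded through this host. In VS/NAT mode, the data that an rs on the internal network returns to the client is forwarded through the gateway (this host). At that point the source address of the packet must be rewritten to the gateway's public IP address (the virtual server address) so that the connection can carry on; otherwise the client would be faced with the rs's internal address, which it cannot reach.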


            

net/ipv4/ipvs/ip_vs_core.c

    /*
     *	It is hooked at the NF_INET_FORWARD chain, used only for VS/NAT.
     *	Check if outgoing packet belongs to the established ip_vs_conn,
     *	rewrite addresses of the packet and send it on its way...
     */
    static unsigned int
    ip_vs_out(unsigned int hooknum, struct sk_buff *skb,
              const struct net_device *in, const struct net_device *out,
              int (*okfn)(struct sk_buff *))
    {
        struct iphdr *iph;
        struct ip_vs_protocol *pp;
        struct ip_vs_conn *cp;
        int ihl;

        EnterFunction(11);   /* debug */

        if (skb->ipvs_property)   /* already rewritten by ipvs: let it pass */
            return NF_ACCEPT;

        iph = ip_hdr(skb);   /* pointer to the network-layer header */
        if (unlikely(iph->protocol == IPPROTO_ICMP)) {
            /* ICMP is handled by ip_vs_out_icmp() */
            int related, verdict = ip_vs_out_icmp(skb, &related);

            if (related)          /* ICMP related to an IPVS connection */
                return verdict;
            /* Refetch iph; presumably the ICMP handler may have changed
             * the skb, leaving the old pointer stale. */
            iph = ip_hdr(skb);
        }

        pp = ip_vs_proto_get(iph->protocol);   /* ipvs protocol structure */
        if (unlikely(!pp))   /* protocol not supported by ipvs: let it pass */
            return NF_ACCEPT;

        /* reassemble IP fragments */
        if (unlikely(iph->frag_off & htons(IP_MF|IP_OFFSET) &&   /* skb is a fragment */
                     !pp->dont_defrag)) {
            /* Non-zero means the datagram is not complete yet and the
             * fragment has been queued, so report NF_STOLEN to keep
             * netfilter from touching it again. */
            if (ip_vs_gather_frags(skb, IP_DEFRAG_VS_OUT))
                return NF_STOLEN;
            iph = ip_hdr(skb);   /* reassembly finished: refetch iph */
        }

        ihl = iph->ihl << 2;   /* header length: 4-byte words converted to bytes */

        /*
         * Check if the packet belongs to an existing entry
         */
        /* Is skb the reply (rs -> client) to some connection
         * (client -> rs) in ip_vs_conn_tab? If so cp points to its
         * ip_vs_conn, otherwise cp is NULL. */
        cp = pp->conn_out_get(skb, pp, iph, ihl, 0);

        if (unlikely(!cp)) {
            /* sysctl_ip_vs_nat_icmp_send is 0 by default, so this branch
             * is skipped unless the sysctl has been turned on. */
            if (sysctl_ip_vs_nat_icmp_send &&
                (pp->protocol == IPPROTO_TCP ||   /* TCP or UDP only */
                 pp->protocol == IPPROTO_UDP)) {
                __be16 _ports[2], *pptr;

                pptr = skb_header_pointer(skb, ihl,   /* port numbers of skb */
                                          sizeof(_ports), _ports);
                if (pptr == NULL)   /* no ports available: let it pass */
                    return NF_ACCEPT;	/* Not for me */
                /* Look up protocol/source address/source port to see
                 * whether the packet was sent by one of our real servers. */
                if (ip_vs_lookup_real_service(iph->protocol,
                                              iph->saddr, pptr[0])) {
                    /*
                     * Notify the real server: there is no
                     * existing entry if it is not RST
                     * packet or not TCP packet.
                     */
                    /* A real server is replying on a connection we do not
                     * know. Unless the packet is a TCP RST, answer with
                     * ICMP port unreachable and drop the skb. */
                    if (iph->protocol != IPPROTO_TCP
                        || !is_tcp_reset(skb)) {
                        icmp_send(skb,ICMP_DEST_UNREACH,
                                  ICMP_PORT_UNREACH, 0);
                        return NF_DROP;
                    }
                }
            }
            IP_VS_DBG_PKT(12, pp, skb, 0,
                          "packet continues traversal as normal");
            /* Pass traffic that is not tied to any entry in ip_vs_conn_tab,
             * e.g. new connections opened by a real server to the outside. */
            return NF_ACCEPT;
        }

        IP_VS_DBG_PKT(11, pp, skb, 0, "Outgoing packet");   /* debug */

        if (!skb_make_writable(skb, ihl))   /* header cannot be made writable */
            goto drop;

        /* mangle the packet */
        /* From here on skb is a reply (rs -> client) whose source must be
         * rewritten. If a snat_handler is defined but fails, drop. */
        if (pp->snat_handler && !pp->snat_handler(skb, pp, cp))
            goto drop;
        /* Set the source to the virtual server address so the packet looks
         * as if the vs itself had sent it. */
        ip_hdr(skb)->saddr = cp->vaddr;
        ip_send_check(ip_hdr(skb));   /* source changed: recompute the IP checksum */

        /* For policy routing, packets originating from this
         * machine itself may be routed differently to packets
         * passing through.  We want this packet to be routed as
         * if it came from this machine itself.  So re-compute
         * the routing information.
         */
        if (ip_route_me_harder(skb, RTN_LOCAL) != 0)   /* refresh the routing info */
            goto drop;

        IP_VS_DBG_PKT(10, pp, skb, 0, "After SNAT");   /* debug */

        ip_vs_out_stats(cp, skb);   /* update counters (connections, traffic) */
        ip_vs_set_state(cp, IP_VS_DIR_OUTPUT, skb, pp);   /* update connection state */
        ip_vs_conn_put(cp);   /* release the reference on cp */

        skb->ipvs_property = 1;   /* mark it so ipvs will not rewrite it again */

        LeaveFunction(11);   /* debug */
        return NF_ACCEPT;

      drop:
        ip_vs_conn_put(cp);   /* release the reference on cp */
        kfree_skb(skb);       /* free the skb */
        return NF_STOLEN;     /* skb is gone: netfilter must not touch it */
    }
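The two lines at the heart of ip_vs_out(), overwriting saddr and calling ip_send_check(), can be imitated on a raw IPv4 header in user space. This is a sketch under assumptions: `ip_checksum()` and `snat_rewrite()` are hypothetical helpers, the header is a plain byte array, and the transport-layer checksum fix-up that the protocol's snat_handler performs is left out entirely.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 16-bit one's-complement checksum over an IPv4 header (len is even). */
static uint16_t ip_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
    while (sum >> 16)                       /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/*
 * Replace the source address (bytes 12..15) with vaddr and refresh the
 * header checksum (bytes 10..11), which must be zero while summing.
 */
static void snat_rewrite(uint8_t *hdr, const uint8_t vaddr[4])
{
    size_t ihl = (size_t)(hdr[0] & 0x0f) << 2;   /* 4-byte words to bytes */
    uint16_t csum;

    memcpy(hdr + 12, vaddr, 4);
    hdr[10] = 0;
    hdr[11] = 0;
    csum = ip_checksum(hdr, ihl);
    hdr[10] = (uint8_t)(csum >> 8);
    hdr[11] = (uint8_t)(csum & 0xff);
}
```

A quick sanity check on the result: summing a header that already contains a correct checksum yields zero, so `ip_checksum(hdr, ihl) == 0` after the rewrite.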

ip_vs_forward_icmp()

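This function is on the same FORWARD chain as ip_vs_out() described above, but its priority is 99, a smaller value (and therefore a higher priority) than ip_vs_out()'s 100, so it runs first.

The function is very simple: it hands every ICMP packet passing through the FORWARD chain to ip_vs_in_icmp(). Why would data meant for this host show up on the FORWARD chain? When the director is set up as a transparent device (fwmark-based virtual services, such as a transparent cache cluster), TCP/UDP traffic that would otherwise be forwarded can fairly easily be marked and steered to INPUT, but ICMP is not so simple. As a result some ICMP packets related to incoming IPVS connections end up on the FORWARD chain (the exact reason is unclear to me), so ip_vs_forward_icmp() picks up those missed ICMP packets here.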


            

net/ipv4/ipvs/ip_vs_core.c

    /*
     *	It is hooked at the NF_INET_FORWARD chain, in order to catch ICMP
     *	related packets destined for 0.0.0.0/0.
     *	When fwmark-based virtual service is used, such as transparent
     *	cache cluster, TCP packets can be marked and routed to ip_vs_in,
     *	but ICMP destined for 0.0.0.0/0 cannot not be easily marked and
     *	sent to ip_vs_in_icmp. So, catch them at the NF_INET_FORWARD chain
     *	and send them to ip_vs_in_icmp.
     */
    static unsigned int
    ip_vs_forward_icmp(unsigned int hooknum, struct sk_buff *skb,
                       const struct net_device *in, const struct net_device *out,
                       int (*okfn)(struct sk_buff *))
    {
        int r;

        if (ip_hdr(skb)->protocol != IPPROTO_ICMP)   /* not ICMP: let it pass */
            return NF_ACCEPT;

        return ip_vs_in_icmp(skb, &r, hooknum);   /* ICMP: process it */
    }

ip_vs_post_routing()

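This function has priority NF_IP_PRI_NAT_SRC-1, which places it just ahead of source NAT (iptable_nat) on POSTROUTING; the mangle table, whose priority value is lower, has already run by then. The aim is to keep packets that ipvs has already rewritten from being modified again by netfilter's NAT. It works as follows: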


            

net/ipv4/ipvs/ip_vs_core.c

    /*
     *      It is hooked before NF_IP_PRI_NAT_SRC at the NF_INET_POST_ROUTING
     *      chain, and is used for VS/NAT.
     *      It detects packets for VS/NAT connections and sends the packets
     *      immediately. This can avoid that iptable_nat mangles the packets
     *      for VS/NAT.
     */
    static unsigned int ip_vs_post_routing(unsigned int hooknum,
                                           struct sk_buff *skb,
                                           const struct net_device *in,
                                           const struct net_device *out,
                                           int (*okfn)(struct sk_buff *))
    {
        /* No ipvs mark on the skb: pass it and let netfilter carry on. */
        if (!skb->ipvs_property)
            return NF_ACCEPT;
        /* The packet was sent from IPVS, exit this chain */
        /* On NF_STOP netfilter leaves the chain at once and runs no
         * further hooks on this packet. */
        return NF_STOP;
    }
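To see why returning NF_STOP protects the packet, here is a toy model of a POSTROUTING chain with the guard placed ahead of a NAT hook. All names (`post_routing_guard`, `fake_source_nat`, `traverse`) are invented for the sketch; only the numeric values of the verdicts match NF_ACCEPT and NF_STOP.

```c
#include <stdbool.h>
#include <stddef.h>

/* Same numeric values as NF_ACCEPT and NF_STOP. */
enum verdict { V_ACCEPT = 1, V_STOP = 5 };

struct packet {
    bool ipvs_property;   /* set once ipvs has rewritten the packet */
    int nat_rewrites;     /* how many times the later NAT hook touched it */
};

typedef enum verdict (*hook_fn)(struct packet *);

/* Mirrors the logic of ip_vs_post_routing(). */
static enum verdict post_routing_guard(struct packet *p)
{
    return p->ipvs_property ? V_STOP : V_ACCEPT;
}

/* Stand-in for a source-NAT hook that runs later on the same chain. */
static enum verdict fake_source_nat(struct packet *p)
{
    p->nat_rewrites++;
    return V_ACCEPT;
}

/* STOP means: accept the packet, but skip every remaining hook. */
static enum verdict traverse(const hook_fn *hooks, size_t n, struct packet *p)
{
    for (size_t i = 0; i < n; i++)
        if (hooks[i](p) == V_STOP)
            break;
    return V_ACCEPT;
}
```

Run over the chain { post_routing_guard, fake_source_nat }, a packet with ipvs_property set comes out with nat_rewrites still 0, while an ordinary packet is rewritten once.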
