理解 Memory barrier（內(nèi)存屏障） | Name5566

老匹夫 2014-09-12

展開(kāi)全文

參考文獻(xiàn)列表：
http://en./wiki/Memory_barrier
http://en./wiki/Out-of-order_execution
https://www./doc/Documentation/memory-barriers.txt

本文例子均在 Linux（g++）下驗(yàn)證通過(guò)，CPU 為 X86-64 處理器架構(gòu)。所有羅列的 Linux 內(nèi)核代碼也均在（或只在）X86-64 下有效。

本文首先通過(guò)范例（以及內(nèi)核代碼）來(lái)解釋 Memory barrier，然后介紹一個(gè)利用 Memory barrier 實(shí)現(xiàn)的無(wú)鎖環(huán)形緩沖區(qū)。

Memory barrier 簡(jiǎn)介

程序在運(yùn)行時(shí)內(nèi)存實(shí)際的訪問(wèn)順序和程序代碼編寫(xiě)的訪問(wèn)順序不一定一致，這就是內(nèi)存亂序訪問(wèn)。內(nèi)存亂序訪問(wèn)行為出現(xiàn)的理由是為了提升程序運(yùn)行時(shí)的性能。內(nèi)存亂序訪問(wèn)主要發(fā)生在兩個(gè)階段：

編譯時(shí)，編譯器優(yōu)化導(dǎo)致內(nèi)存亂序訪問(wèn)（指令重排）
運(yùn)行時(shí)，多 CPU 間交互引起內(nèi)存亂序訪問(wèn)

Memory barrier 能夠讓 CPU 或編譯器在內(nèi)存訪問(wèn)上有序。一個(gè) Memory barrier 之前的內(nèi)存訪問(wèn)操作必定先于其之后的完成。Memory barrier 包括兩類(lèi)：

編譯器 barrier
CPU Memory barrier

很多時(shí)候，編譯器和 CPU 引起內(nèi)存亂序訪問(wèn)不會(huì)帶來(lái)什么問(wèn)題，但一些特殊情況下，程序邏輯的正確性依賴(lài)于內(nèi)存訪問(wèn)順序，這時(shí)候內(nèi)存亂序訪問(wèn)會(huì)帶來(lái)邏輯上的錯(cuò)誤，例如：

// thread 1
while (!ok);
do(x);
 
// thread 2
x = 42;
ok = 1;

此段代碼中，ok 初始化為 0，線程 1 等待 ok 被設(shè)置為 1 后執(zhí)行 do 函數(shù)。假如說(shuō)，線程 2 對(duì)內(nèi)存的寫(xiě)操作亂序執(zhí)行，也就是 x 賦值后于 ok 賦值完成，那么 do 函數(shù)接受的實(shí)參就很可能出乎程序員的意料，不為 42。

編譯時(shí)內(nèi)存亂序訪問(wèn)

在編譯時(shí)，編譯器對(duì)代碼做出優(yōu)化時(shí)可能改變實(shí)際執(zhí)行指令的順序（例如 gcc 下 O2 或 O3 都會(huì)改變實(shí)際執(zhí)行指令的順序）：

// test.cpp
int x, y, r;
void f()
{
    x = r;
    y = 1;
}

編譯器優(yōu)化的結(jié)果可能導(dǎo)致 y = 1 在 x = r 之前執(zhí)行完成。首先直接編譯此源文件：

g++ -S test.cpp

得到相關(guān)的匯編代碼如下：

movl    r(%rip), %eax
movl    %eax, x(%rip)
movl    $1, y(%rip)

這里我們看到，x = r 和 y = 1 并沒(méi)有亂序?，F(xiàn)使用優(yōu)化選項(xiàng) O2（或 O3）編譯上面的代碼（g++ -O2 -S test.cpp），生成匯編代碼如下：

movl    r(%rip), %eax
movl    $1, y(%rip)
movl    %eax, x(%rip)

我們可以清楚的看到經(jīng)過(guò)編譯器優(yōu)化之后 movl $1, y(%rip) 先于 movl %eax, x(%rip) 執(zhí)行。避免編譯時(shí)內(nèi)存亂序訪問(wèn)的辦法就是使用編譯器 barrier（又叫優(yōu)化 barrier）。Linux 內(nèi)核提供函數(shù) barrier() 用于讓編譯器保證其之前的內(nèi)存訪問(wèn)先于其之后的完成。內(nèi)核實(shí)現(xiàn) barrier() 如下（X86-64 架構(gòu)）：

#define barrier() __asm__ __volatile__("" ::: "memory")

現(xiàn)在把此編譯器 barrier 加入代碼中：

int x, y, r;
void f()
{
    x = r;
    __asm__ __volatile__("" ::: "memory");
    y = 1;
}

這樣就避免了編譯器優(yōu)化帶來(lái)的內(nèi)存亂序訪問(wèn)的問(wèn)題了（如果有興趣可以再看看編譯之后的匯編代碼）。本例中，我們還可以使用 volatile 這個(gè)關(guān)鍵字來(lái)避免編譯時(shí)內(nèi)存亂序訪問(wèn)（而無(wú)法避免后面要說(shuō)的運(yùn)行時(shí)內(nèi)存亂序訪問(wèn)）。volatile 關(guān)鍵字能夠讓相關(guān)的變量之間在內(nèi)存訪問(wèn)上避免亂序，這里可以修改 x 和 y 的定義來(lái)解決問(wèn)題：

volatile int x, y;
int r;
void f()
{
    x = r;
    y = 1;
}

現(xiàn)加上了 volatile 關(guān)鍵字，這使得 x 相對(duì)于 y、y 相對(duì)于 x 在內(nèi)存訪問(wèn)上有序。在 Linux 內(nèi)核中，提供了一個(gè)宏 ACCESS_ONCE 來(lái)避免編譯器對(duì)于連續(xù)的 ACCESS_ONCE 實(shí)例進(jìn)行指令重排。其實(shí) ACCESS_ONCE 實(shí)現(xiàn)源碼如下：

#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

此代碼只是將變量 x 轉(zhuǎn)換為 volatile 的而已。現(xiàn)在我們就有了第三個(gè)修改方案：

int x, y, r;
void f()
{
    ACCESS_ONCE(x) = r;
    ACCESS_ONCE(y) = 1;
}

到此基本上就闡述完了我們的編譯時(shí)內(nèi)存亂序訪問(wèn)的問(wèn)題。下面開(kāi)始介紹運(yùn)行時(shí)內(nèi)存亂序訪問(wèn)。

運(yùn)行時(shí)內(nèi)存亂序訪問(wèn)

在運(yùn)行時(shí)，CPU 雖然會(huì)亂序執(zhí)行指令，但是在單個(gè) CPU 的上，硬件能夠保證程序執(zhí)行時(shí)所有的內(nèi)存訪問(wèn)操作看起來(lái)像是按程序代碼編寫(xiě)的順序執(zhí)行的，這時(shí)候 Memory barrier 沒(méi)有必要使用（不考慮編譯器優(yōu)化的情況下）。這里我們了解一下 CPU 亂序執(zhí)行的行為。在亂序執(zhí)行時(shí)，一個(gè)處理器真正執(zhí)行指令的順序由可用的輸入數(shù)據(jù)決定，而非程序員編寫(xiě)的順序。
早期的處理器為有序處理器（In-order processors），有序處理器處理指令通常有以下幾步：

指令獲取
如果指令的輸入操作對(duì)象（input operands）可用（例如已經(jīng)在寄存器中了），則將此指令分發(fā)到適當(dāng)?shù)墓δ軉卧?。如果一個(gè)或者多個(gè)操作對(duì)象不可用（通常是由于需要從內(nèi)存中獲取），則處理器會(huì)等待直到它們可用
指令被適當(dāng)?shù)墓δ軉卧獔?zhí)行
功能單元將結(jié)果寫(xiě)回寄存器堆（Register file，一個(gè) CPU 中的一組寄存器）

相比之下，亂序處理器（Out-of-order processors）處理指令通常有以下幾步：

指令獲取
指令被分發(fā)到指令隊(duì)列
指令在指令隊(duì)列中等待，直到輸入操作對(duì)象可用（一旦輸入操作對(duì)象可用，指令就可以離開(kāi)隊(duì)列，即便更早的指令未被執(zhí)行）
指令被分配到適當(dāng)?shù)墓δ軉卧?zhí)行
執(zhí)行結(jié)果被放入隊(duì)列（而不立即寫(xiě)入寄存器堆）
只有所有更早請(qǐng)求執(zhí)行的指令的執(zhí)行結(jié)果被寫(xiě)入寄存器堆后，指令執(zhí)行的結(jié)果才被寫(xiě)入寄存器堆（執(zhí)行結(jié)果重排序，讓執(zhí)行看起來(lái)是有序的）

從上面的執(zhí)行過(guò)程可以看出，亂序執(zhí)行相比有序執(zhí)行能夠避免等待不可用的操作對(duì)象（有序執(zhí)行的第二步）從而提高了效率?，F(xiàn)代的機(jī)器上，處理器運(yùn)行的速度比內(nèi)存快很多，有序處理器花在等待可用數(shù)據(jù)的時(shí)間里已經(jīng)可以處理大量指令了。
現(xiàn)在思考一下亂序處理器處理指令的過(guò)程，我們能得到幾個(gè)結(jié)論：

對(duì)于單個(gè) CPU 指令獲取是有序的（通過(guò)隊(duì)列實(shí)現(xiàn)）
對(duì)于單個(gè) CPU 指令執(zhí)行結(jié)果也是有序返回寄存器堆的（通過(guò)隊(duì)列實(shí)現(xiàn)）

由此可知，在單 CPU 上，不考慮編譯器優(yōu)化導(dǎo)致亂序的前提下，多線程執(zhí)行不存在內(nèi)存亂序訪問(wèn)的問(wèn)題。我們從內(nèi)核源碼也可以得到類(lèi)似的結(jié)論（代碼不完全的摘錄）：

#ifdef CONFIG_SMP
#define smp_mb() mb()
#else
#define smp_mb() barrier()
#endif

這里可以看到，如果是 SMP 則使用 mb，mb 被定義為 CPU Memory barrier（后面會(huì)講到），而非 SMP 時(shí)，直接使用編譯器 barrier。

在多 CPU 的機(jī)器上，問(wèn)題又不一樣了。每個(gè) CPU 都存在 cache（cache 主要是為了彌補(bǔ) CPU 和內(nèi)存之間較慢的訪問(wèn)速度），當(dāng)一個(gè)特定數(shù)據(jù)第一次被特定一個(gè) CPU 獲取時(shí)，此數(shù)據(jù)顯然不在 CPU 的 cache 中（這就是 cache miss）。此 cache miss 意味著 CPU 需要從內(nèi)存中獲取數(shù)據(jù)（這個(gè)過(guò)程需要 CPU 等待數(shù)百個(gè)周期），此數(shù)據(jù)將被加載到 CPU 的 cache 中，這樣后續(xù)就能直接從 cache 上快速訪問(wèn)。當(dāng)某個(gè) CPU 進(jìn)行寫(xiě)操作時(shí)，它必須確保其他的 CPU 已經(jīng)將此數(shù)據(jù)從它們的 cache 中移除（以便保證一致性），只有在移除操作完成后此 CPU 才能安全的修改數(shù)據(jù)。顯然，存在多個(gè) cache 時(shí)，我們必須通過(guò)一個(gè) cache 一致性協(xié)議來(lái)避免數(shù)據(jù)不一致的問(wèn)題，而這個(gè)通訊的過(guò)程就可能導(dǎo)致亂序訪問(wèn)的出現(xiàn)，也就是這里說(shuō)的運(yùn)行時(shí)內(nèi)存亂序訪問(wèn)。這里不再深入討論整個(gè)細(xì)節(jié)，這是一個(gè)比較復(fù)雜的問(wèn)題，有興趣可以研究 http://www./users/paulmck/scalability/paper/whymb.2010.06.07c.pdf 一文，其詳細(xì)的分析了整個(gè)過(guò)程。

現(xiàn)在通過(guò)一個(gè)例子來(lái)說(shuō)明多 CPU 下內(nèi)存亂序訪問(wèn)：

// test2.cpp
#include <pthread.h>
#include <assert.h>
 
// -------------------
int cpu_thread1 = 0;
int cpu_thread2 = 1;
 
volatile int x, y, r1, r2;
 
void start()
{
    x = y = r1 = r2 = 0;
}
 
void end()
{
    assert(!(r1 == 0 && r2 == 0));
}
 
void run1()
{
    x = 1;
    r1 = y;
}
 
void run2()
{
    y = 1;
    r2 = x;
}
 
// -------------------
static pthread_barrier_t barrier_start;
static pthread_barrier_t barrier_end;
 
static void* thread1(void*)
{
    while (1) {
        pthread_barrier_wait(&barrier_start);
        run1();
        pthread_barrier_wait(&barrier_end);
    }
 
    return NULL;
}
 
static void* thread2(void*)
{
    while (1) {
        pthread_barrier_wait(&barrier_start);
        run2();
        pthread_barrier_wait(&barrier_end);
    }
 
    return NULL;
}
 
int main()
{
    assert(pthread_barrier_init(&barrier_start, NULL, 3) == 0);
    assert(pthread_barrier_init(&barrier_end, NULL, 3) == 0);
 
    pthread_t t1;
    pthread_t t2;
    assert(pthread_create(&t1, NULL, thread1, NULL) == 0);
    assert(pthread_create(&t2, NULL, thread2, NULL) == 0);
 
    cpu_set_t cs;
    CPU_ZERO(&cs);
    CPU_SET(cpu_thread1, &cs);
    assert(pthread_setaffinity_np(t1, sizeof(cs), &cs) == 0);
    CPU_ZERO(&cs);
    CPU_SET(cpu_thread2, &cs);
    assert(pthread_setaffinity_np(t2, sizeof(cs), &cs) == 0);
 
    while (1) {
        start();
        pthread_barrier_wait(&barrier_start);
        pthread_barrier_wait(&barrier_end);
        end();
    }
 
    return 0;
}

這里創(chuàng)建了兩個(gè)線程來(lái)運(yùn)行測(cè)試代碼（需要測(cè)試的代碼將放置在 run 函數(shù)中）。我使用了 pthread barrier（區(qū)別于本文討論的 Memory barrier）主要為了讓兩個(gè)子線程能夠同時(shí)運(yùn)行它們的 run 函數(shù)。此段代碼不停的嘗試同時(shí)運(yùn)行兩個(gè)線程的 run 函數(shù)，以便得出我們期望的結(jié)果。在每次運(yùn)行 run 函數(shù)前會(huì)調(diào)用一次 start 函數(shù)（進(jìn)行數(shù)據(jù)初始化），run 運(yùn)行后會(huì)調(diào)用一次 end 函數(shù)（進(jìn)行結(jié)果檢查）。run1 和 run2 兩個(gè)函數(shù)運(yùn)行在哪個(gè) CPU 上則通過(guò) cpu_thread1 和 cpu_thread2 兩個(gè)變量控制。
先編譯此程序：g++ -lpthread -o test2 test2.cpp（這里未優(yōu)化，目的是為了避免編譯器優(yōu)化的干擾）。需要注意的是，兩個(gè)線程運(yùn)行在兩個(gè)不同的 CPU 上（CPU 0 和 CPU 1）。只要內(nèi)存不出現(xiàn)亂序訪問(wèn)，那么 r1 和 r2 不可能同時(shí)為 0，因此斷言失敗表示存在內(nèi)存亂序訪問(wèn)。編譯之后運(yùn)行此程序，會(huì)發(fā)現(xiàn)存在一定概率導(dǎo)致斷言失敗。為了進(jìn)一步說(shuō)明問(wèn)題，我們把 cpu_thread2 的值改為 0，換而言之就是讓兩個(gè)線程跑在同一個(gè) CPU 下，再運(yùn)行程序發(fā)現(xiàn)斷言不再失敗。

最后，我們使用 CPU Memory barrier 來(lái)解決內(nèi)存亂序訪問(wèn)的問(wèn)題（X86-64 架構(gòu)下）：

int cpu_thread1 = 0;
int cpu_thread2 = 1;
 
void run1()
{
    x = 1;
    __asm__ __volatile__("mfence" ::: "memory");
    r1 = y;
}
 
void run2()
{
    y = 1;
    __asm__ __volatile__("mfence" ::: "memory");
    r2 = x;
}

準(zhǔn)備使用 Memory barrier

Memory barrier 常用場(chǎng)合包括：

實(shí)現(xiàn)同步原語(yǔ)（synchronization primitives）
實(shí)現(xiàn)無(wú)鎖數(shù)據(jù)結(jié)構(gòu)（lock-free data structures）
驅(qū)動(dòng)程序

實(shí)際的應(yīng)用程序開(kāi)發(fā)中，開(kāi)發(fā)者可能完全不知道 Memory barrier 就可以開(kāi)發(fā)正確的多線程程序，這主要是因?yàn)楦鞣N同步機(jī)制中已經(jīng)隱含了 Memory barrier（但和實(shí)際的 Memory barrier 有細(xì)微差別），這就使得不直接使用 Memory barrier 不會(huì)存在任何問(wèn)題。但是如果你希望編寫(xiě)諸如無(wú)鎖數(shù)據(jù)結(jié)構(gòu)，那么 Memory barrier 還是很有用的。

通常來(lái)說(shuō)，在單個(gè) CPU 上，存在依賴(lài)的內(nèi)存訪問(wèn)有序：

Q = P;
D = *Q;

這里內(nèi)存操作有序。然而在 Alpha CPU 上，存在依賴(lài)的內(nèi)存讀取操作不一定有序，需要使用數(shù)據(jù)依賴(lài) barrier（由于 Alpha 不常見(jiàn)，這里就不詳細(xì)解釋了）。

在 Linux 內(nèi)核中，除了前面說(shuō)到的編譯器 barrier — barrier() 和 ACCESS_ONCE()，還有 CPU Memory barrier：

通用 barrier，保證讀寫(xiě)操作有序的，mb() 和 smp_mb()
寫(xiě)操作 barrier，僅保證寫(xiě)操作有序的，wmb() 和 smp_wmb()
讀操作 barrier，僅保證讀操作有序的，rmb() 和 smp_rmb()

注意，所有的 CPU Memory barrier（除了數(shù)據(jù)依賴(lài) barrier 之外）都隱含了編譯器 barrier。這里的 smp 開(kāi)頭的 Memory barrier 會(huì)根據(jù)配置在單處理器上直接使用編譯器 barrier，而在 SMP 上才使用 CPU Memory barrier（也就是 mb()、wmb()、rmb()，回憶上面相關(guān)內(nèi)核代碼）。

最后需要注意一點(diǎn)的是，CPU Memory barrier 中某些類(lèi)型的 Memory barrier 需要成對(duì)使用，否則會(huì)出錯(cuò)，詳細(xì)來(lái)說(shuō)就是：一個(gè)寫(xiě)操作 barrier 需要和讀操作（或數(shù)據(jù)依賴(lài)）barrier 一起使用（當(dāng)然，通用 barrier 也是可以的），反之依然。

Memory barrier 的范例

讀內(nèi)核代碼進(jìn)一步學(xué)習(xí) Memory barrier 的使用。
Linux 內(nèi)核實(shí)現(xiàn)的無(wú)鎖（只有一個(gè)讀線程和一個(gè)寫(xiě)線程時(shí)）環(huán)形緩沖區(qū) kfifo 就使用到了 Memory barrier，實(shí)現(xiàn)源碼如下：

/*
 * A simple kernel FIFO implementation.
 *
 * Copyright (C) 2004 Stelian Pop <stelian@popies.net>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
 *
 */
 
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/err.h>
#include <linux/kfifo.h>
#include <linux/log2.h>
 
/**
 * kfifo_init - allocates a new FIFO using a preallocated buffer
 * @buffer: the preallocated buffer to be used.
 * @size: the size of the internal buffer, this have to be a power of 2.
 * @gfp_mask: get_free_pages mask, passed to kmalloc()
 * @lock: the lock to be used to protect the fifo buffer
 *
 * Do NOT pass the kfifo to kfifo_free() after use! Simply free the
 * &struct kfifo with kfree().
 */
struct kfifo *kfifo_init(unsigned char *buffer, unsigned int size,
                         gfp_t gfp_mask, spinlock_t *lock)
{
    struct kfifo *fifo;
 
    /* size must be a power of 2 */
    BUG_ON(!is_power_of_2(size));
 
    fifo = kmalloc(sizeof(struct kfifo), gfp_mask);
    if (!fifo)
        return ERR_PTR(-ENOMEM);
 
    fifo->buffer = buffer;
    fifo->size = size;
    fifo->in = fifo->out = 0;
    fifo->lock = lock;
 
    return fifo;
}
EXPORT_SYMBOL(kfifo_init);
 
/**
 * kfifo_alloc - allocates a new FIFO and its internal buffer
 * @size: the size of the internal buffer to be allocated.
 * @gfp_mask: get_free_pages mask, passed to kmalloc()
 * @lock: the lock to be used to protect the fifo buffer
 *
 * The size will be rounded-up to a power of 2.
 */
struct kfifo *kfifo_alloc(unsigned int size, gfp_t gfp_mask, spinlock_t *lock)
{
    unsigned char *buffer;
    struct kfifo *ret;
 
    /*
     * round up to the next power of 2, since our 'let the indices
     * wrap' technique works only in this case.
     */
    if (!is_power_of_2(size)) {
        BUG_ON(size > 0x80000000);
        size = roundup_pow_of_two(size);
    }
 
    buffer = kmalloc(size, gfp_mask);
    if (!buffer)
        return ERR_PTR(-ENOMEM);
 
    ret = kfifo_init(buffer, size, gfp_mask, lock);
 
    if (IS_ERR(ret))
        kfree(buffer);
 
    return ret;
}
EXPORT_SYMBOL(kfifo_alloc);
 
/**
 * kfifo_free - frees the FIFO
 * @fifo: the fifo to be freed.
 */
void kfifo_free(struct kfifo *fifo)
{
    kfree(fifo->buffer);
    kfree(fifo);
}
EXPORT_SYMBOL(kfifo_free);
 
/**
 * __kfifo_put - puts some data into the FIFO, no locking version
 * @fifo: the fifo to be used.
 * @buffer: the data to be added.
 * @len: the length of the data to be added.
 *
 * This function copies at most @len bytes from the @buffer into
 * the FIFO depending on the free space, and returns the number of
 * bytes copied.
 *
 * Note that with only one concurrent reader and one concurrent
 * writer, you don't need extra locking to use these functions.
 */
unsigned int __kfifo_put(struct kfifo *fifo,
                         const unsigned char *buffer, unsigned int len)
{
    unsigned int l;
 
    len = min(len, fifo->size - fifo->in + fifo->out);
 
    /*
     * Ensure that we sample the fifo->out index -before- we
     * start putting bytes into the kfifo.
     */
 
    smp_mb();
 
    /* first put the data starting from fifo->in to buffer end */
    l = min(len, fifo->size - (fifo->in & (fifo->size - 1)));
    memcpy(fifo->buffer + (fifo->in & (fifo->size - 1)), buffer, l);
 
    /* then put the rest (if any) at the beginning of the buffer */
    memcpy(fifo->buffer, buffer + l, len - l);
 
    /*
     * Ensure that we add the bytes to the kfifo -before-
     * we update the fifo->in index.
     */
 
    smp_wmb();
 
    fifo->in += len;
 
    return len;
}
EXPORT_SYMBOL(__kfifo_put);
 
/**
 * __kfifo_get - gets some data from the FIFO, no locking version
 * @fifo: the fifo to be used.
 * @buffer: where the data must be copied.
 * @len: the size of the destination buffer.
 *
 * This function copies at most @len bytes from the FIFO into the
 * @buffer and returns the number of copied bytes.
 *
 * Note that with only one concurrent reader and one concurrent
 * writer, you don't need extra locking to use these functions.
 */
unsigned int __kfifo_get(struct kfifo *fifo,
                         unsigned char *buffer, unsigned int len)
{
    unsigned int l;
 
    len = min(len, fifo->in - fifo->out);
 
    /*
     * Ensure that we sample the fifo->in index -before- we
     * start removing bytes from the kfifo.
     */
 
    smp_rmb();
 
    /* first get the data from fifo->out until the end of the buffer */
    l = min(len, fifo->size - (fifo->out & (fifo->size - 1)));
    memcpy(buffer, fifo->buffer + (fifo->out & (fifo->size - 1)), l);
 
    /* then get the rest (if any) from the beginning of the buffer */
    memcpy(buffer + l, fifo->buffer, len - l);
 
    /*
     * Ensure that we remove the bytes from the kfifo -before-
     * we update the fifo->out index.
     */
 
    smp_mb();
 
    fifo->out += len;
 
    return len;
}
EXPORT_SYMBOL(__kfifo_get);

為了更好的理解上面的源碼，這里順帶說(shuō)一下此實(shí)現(xiàn)使用到的一些和本文主題無(wú)關(guān)的技巧：

使用與操作來(lái)求取環(huán)形緩沖區(qū)的下標(biāo)，相比取余操作來(lái)求取下標(biāo)的做法效率要高不少。使用與操作求取下標(biāo)的前提是環(huán)形緩沖區(qū)的大小必須是 2 的 N 次方，換而言之就是說(shuō)環(huán)形緩沖區(qū)的大小為一個(gè)僅有一個(gè) 1 的二進(jìn)制數(shù)，那么 index & (size – 1) 則為求取的下標(biāo)（這不難理解）
使用了 in 和 out 兩個(gè)索引且 in 和 out 是一直遞增的（此做法比較巧妙），這樣能夠避免一些復(fù)雜的條件判斷（某些實(shí)現(xiàn)下，in == out 時(shí)還無(wú)法區(qū)分緩沖區(qū)是空還是滿(mǎn)）

這里，索引 in 和 out 被兩個(gè)線程訪問(wèn)。in 和 out 指明了緩沖區(qū)中實(shí)際數(shù)據(jù)的邊界，也就是 in 和 out 同緩沖區(qū)數(shù)據(jù)存在訪問(wèn)上的順序關(guān)系，由于未使用同步機(jī)制，那么保證順序關(guān)系就需要使用到 Memory barrier 了。索引 in 和 out 都分別只被一個(gè)線程修改，而被兩個(gè)線程讀取。__kfifo_put 先通過(guò) in 和 out 來(lái)確定可以向緩沖區(qū)中寫(xiě)入數(shù)據(jù)量的多少，這時(shí)，out 索引應(yīng)該先被讀取后才能真正的將用戶(hù) buffer 中的數(shù)據(jù)寫(xiě)入緩沖區(qū)，因此這里使用到了 smp_mb()，對(duì)應(yīng)的，__kfifo_get 也使用 smp_mb() 來(lái)確保修改 out 索引之前緩沖區(qū)中數(shù)據(jù)已經(jīng)被成功讀取并寫(xiě)入用戶(hù) buffer 中了。對(duì)于 in 索引，在 __kfifo_put 中，通過(guò) smp_wmb() 保證先向緩沖區(qū)寫(xiě)入數(shù)據(jù)后才修改 in 索引，由于這里只需要保證寫(xiě)入操作有序，故選用寫(xiě)操作 barrier，在 __kfifo_get 中，通過(guò) smp_rmb() 保證先讀取了 in 索引（這時(shí)候 in 索引用于確定緩沖區(qū)中實(shí)際存在多少可讀數(shù)據(jù)）才開(kāi)始讀取緩沖區(qū)中數(shù)據(jù)（并寫(xiě)入用戶(hù) buffer 中），由于這里只需要保證讀取操作有序，故選用讀操作 barrier。

到這里，Memory barrier 就介紹完畢了。

本站是提供個(gè)人知識(shí)管理的網(wǎng)絡(luò)存儲(chǔ)空間，所有內(nèi)容均由用戶(hù)發(fā)布，不代表本站觀點(diǎn)。請(qǐng)注意甄別內(nèi)容中的聯(lián)系方式、誘導(dǎo)購(gòu)買(mǎi)等信息，謹(jǐn)防詐騙。如發(fā)現(xiàn)有害或侵權(quán)內(nèi)容，請(qǐng)點(diǎn)擊一鍵舉報(bào)。

轉(zhuǎn)藏 分享

QQ空間 QQ好友新浪微博微信

獻(xiàn)花（0） +1

來(lái)自：老匹夫 > 《Lock》

舉報(bào)/認(rèn)領(lǐng)