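This problem comes from the book 《編程之美-微軟技術面試心得》 (The Beauty of Programming: Microsoft Technical Interview Insights). It is stated as follows:
For a one-byte (8-bit) variable, count the number of 1s in its binary representation, with an algorithm that runs as efficiently as possible.
The book gives five solutions, but better algorithms can be found on Wikipedia.
In essence, the problem asks for the Hamming weight of a binary number, or equivalently its Hamming distance from 0. Both concepts are well known in information theory and coding theory. In the binary case the problem is also commonly called population count, or popcount. gcc, for example, provides a built-in function for it: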
int __builtin_popcount (unsigned int x)
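As a minimal sketch of how the builtin might be used, the wrapper below calls `__builtin_popcount` when the compiler defines `__GNUC__` (gcc and clang do) and otherwise falls back to a simple bit-by-bit loop. The name `count_ones` is hypothetical and does not come from gcc or from the original post.

```c
/* count_ones is a hypothetical wrapper name used only for illustration. */
static int count_ones(unsigned int x)
{
#if defined(__GNUC__)
    /* gcc (and compilers that define __GNUC__) provide this builtin. */
    return __builtin_popcount(x);
#else
    /* Portable fallback: inspect the lowest bit, then shift it out. */
    int n = 0;
    while (x != 0) {
        n += (int)(x & 1u);
        x >>= 1;
    }
    return n;
#endif
}
```

Calling `count_ones(0x6Cu)` counts the 1s in the binary value 01101100.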
Note: the algorithm I found is the one below. It is not a table lookup but the method described on Wikipedia:
static const unsigned int m1 = 0x55555555; // binary: 0101...
static const unsigned int m2 = 0x33333333; // binary: 00110011...
static const unsigned int m4 = 0x0f0f0f0f; // binary: 4 zeros, 4 ones ...
static const unsigned int h01 = 0x01010101; // the sum of 256 to the power of 0, 1, 2, 3

// Assumes unsigned int is 32 bits wide.
int popcount_3(unsigned int x)
{
    x -= (x >> 1) & m1;             // put count of each 2 bits into those 2 bits
    x = (x & m2) + ((x >> 2) & m2); // put count of each 4 bits into those 4 bits
    x = (x + (x >> 4)) & m4;        // put count of each 8 bits into those 8 bits
    return (x * h01) >> 24;         // returns left 8 bits of x + (x<<8) + (x<<16) + (x<<24)
}
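The final multiply is the least obvious step, so here is a small sketch of why it works. After the first three steps each byte of x holds a count between 0 and 8. Multiplying by 0x01010101 adds shifted copies of x, so the top byte of the 32-bit product is the sum of all four bytes, and no lower byte can carry because the partial sums never exceed 32. The function names `sum_bytes_mul` and `sum_bytes_add` are hypothetical helpers, not part of the original code.

```c
#include <stdint.h>

/* Sum the four bytes of x with a single multiply.
 * Only valid when every partial sum fits in 8 bits, which holds here
 * because each byte is a bit count of at most 8. */
static uint32_t sum_bytes_mul(uint32_t x)
{
    return (x * UINT32_C(0x01010101)) >> 24;
}

/* The same sum written out byte by byte, for comparison. */
static uint32_t sum_bytes_add(uint32_t x)
{
    return (x & 0xffu) + ((x >> 8) & 0xffu) + ((x >> 16) & 0xffu) + (x >> 24);
}
```

For per-byte counts such as 0x01020304, both functions return the same total.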
/* ===========================================================================
* Problem:
* Find the fastest way to count the 1 bits in a 32-bit integer.
*
* Algorithm:
* The problem is equivalent to computing the Hamming weight of a 32-bit
* integer, or the Hamming distance between a 32-bit integer and 0. In the
* binary case it is also called the population count, or popcount.[1]
*
* The best known solutions are based on adding counts in a tree pattern
* (divide and conquer). To save space, here is an example for an
* 8-bit binary number A=01101100:[1]
* | Expression            | Binary   | Decimal | Comment                      |
* | A                     | 01101100 |         | the original number          |
* | B = A & 01010101      | 01000100 | 1,0,1,0 | every other bit from A       |
* | C = (A>>1) & 01010101 | 00010100 | 0,1,1,0 | remaining bits from A        |
* | D = B + C             | 01011000 | 1,1,2,0 | # of 1s in each 2 bits of A  |
* | E = D & 00110011      | 00010000 | 1,0     | every other count from D     |
* | F = (D>>2) & 00110011 | 00010010 | 1,2     | remaining counts from D      |
* | G = E + F             | 00100010 | 2,2     | # of 1s in each 4 bits of A  |
* | H = G & 00001111      | 00000010 | 2       | every other count from G     |
* | I = (G>>4) & 00001111 | 00000010 | 2       | remaining counts from G      |
* | J = H + I             | 00000100 | 4       | # of 1s in A                 |
* Hence A has four 1 bits.
*
* [1] http://en./wiki/Hamming_weight
*
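* The algorithm is built on divide and conquer: add the bits in pairs, then
* add the pair counts in groups of four bits, then in groups of eight, and
* so on, until the sum of all the bits is obtained.
* Let the original integer be x.
* Step 1: split the 32 bits of x into 16 groups (bits 32 and 31, bits 30
* and 29, and so on). Add the two bits in each group (each is either 0 or 1)
* and store the result back in those same two bit positions. The result, x1,
* consists of 16 two-bit fields, each holding the number of 1s that x had in
* those two bits.
* Step 2: split the 32 bits of x1 into 8 groups (bits 32 to 29, bits 28 to
* 25, and so on). Add the two counts in each group and store the result
* back in those same four bit positions. The result, x2, consists of 8
* four-bit fields, each holding the number of 1s that x had in those four
* bits.
* ...
* Continue grouping in this way until the counts for the two 16-bit halves
* are added, which gives the number of 1s in all 32 bits of x.
* ===========================================================================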
*/
#include <stdio.h>

typedef unsigned int UINT32;
const UINT32 m1 = 0x55555555; // 01010101010101010101010101010101
const UINT32 m2 = 0x33333333; // 00110011001100110011001100110011
const UINT32 m4 = 0x0f0f0f0f; // 00001111000011110000111100001111
const UINT32 m8 = 0x00ff00ff; // 00000000111111110000000011111111
const UINT32 m16 = 0x0000ffff; // 00000000000000001111111111111111
const UINT32 h01 = 0x01010101; // the sum of 256 to the power of 0, 1, 2, 3

/* This is a naive implementation, shown for comparison, and to help in
 * understanding the better functions. It uses 20 arithmetic operations
 * (shift, add, and). */
int popcount_1(UINT32 x)
{
    x = (x & m1) + ((x >> 1) & m1);
    x = (x & m2) + ((x >> 2) & m2);
    x = (x & m4) + ((x >> 4) & m4);
    x = (x & m8) + ((x >> 8) & m8);
    x = (x & m16) + ((x >> 16) & m16);
    return (int)x;
}

/* This uses fewer arithmetic operations than any other known implementation
 * on machines with slow multiplication. It uses 15 arithmetic operations. */
int popcount_2(UINT32 x)
{
    x -= (x >> 1) & m1;             // put count of each 2 bits into those 2 bits
    x = (x & m2) + ((x >> 2) & m2); // put count of each 4 bits into those 4 bits
    x = (x + (x >> 4)) & m4;        // put count of each 8 bits into those 8 bits
    x += x >> 8;                    // put count of each 16 bits into their lowest 8 bits
    x += x >> 16;                   // put count of each 32 bits into their lowest 8 bits
    return x & 0x3f;                // the count can be 32 (0x20), so 6 bits are needed; 0x1f would return 0 for 0xffffffff
}

/* This uses fewer arithmetic operations than any other known implementation
 * on machines with fast multiplication. It uses 12 arithmetic operations,
 * one of which is a multiply. */
int popcount_3(UINT32 x)
{
    x -= (x >> 1) & m1;             // put count of each 2 bits into those 2 bits
    x = (x & m2) + ((x >> 2) & m2); // put count of each 4 bits into those 4 bits
    x = (x + (x >> 4)) & m4;        // put count of each 8 bits into those 8 bits
    return (x * h01) >> 24;         // left 8 bits of x + (x<<8) + (x<<16) + (x<<24)
}

int main(void)
{
    int i = 0x1ff12ee2;

    printf("i = %d = 0x%x\n", i, (unsigned int)i);
    printf("popcount_1(%d) = %d\n", i, popcount_1(i));
    printf("popcount_2(%d) = %d\n", i, popcount_2(i));
    printf("popcount_3(%d) = %d\n", i, popcount_3(i));
    /* If compiled with a compiler other than gcc, comment out the line below. */
    printf("GCC's __builtin_popcount(%d) = %d\n", i, __builtin_popcount(i));
    return 0;
}
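Since the original problem asks about a single byte, the 8-bit walk-through table in the comment above can be turned directly into code. The sketch below follows the table's steps B through J one variable at a time. The function name `popcount8` is hypothetical, and the code is an illustration of the table rather than a tuned implementation.

```c
#include <stdint.h>

/* popcount8 is a hypothetical name; variables mirror rows B..J of the table. */
static int popcount8(uint8_t a)
{
    unsigned b = a & 0x55u;         /* B: every other bit from A */
    unsigned c = (a >> 1) & 0x55u;  /* C: remaining bits from A */
    unsigned d = b + c;             /* D: count of 1s in each 2 bits */
    unsigned e = d & 0x33u;         /* E: every other count from D */
    unsigned f = (d >> 2) & 0x33u;  /* F: remaining counts from D */
    unsigned g = e + f;             /* G: count of 1s in each 4 bits */
    unsigned h = g & 0x0fu;         /* H: low 4-bit count */
    unsigned i = (g >> 4) & 0x0fu;  /* I: high 4-bit count */
    return (int)(h + i);            /* J: count of 1s in A */
}
```

With A = 01101100 (0x6C) the intermediate values match the table row by row, and the function returns 4.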




