bit count algorithm
背景
最近在看Linux video相关的框架和一些简单的使用教程。在看到X11系统的一个教程的时候,我看到了一个不太理解的代码,它的使用背景是在X11 client向server发送一个op请求的时候,会有一个请求flag参数。这个flag告诉server有多少额外的信息(field)会发送到server以及这些信息的内容。这意味着client需要根据不同的flag来申请不同大小的空间,flag的二进制表示中有多少个1就代表有多少个field,其中每一个field都是32bit大小。这段代码如下所示:
1
2
3
4
5
6
7
8
9
10
11
12
13
//MIT HACKMEM bit counting algorithm (这个注释貌似不太对,用的应该是parallel bitcount 那个算法)
int count_bits(uint32_t n)
{
unsigned int c;
c = n - ((n >> 1) & 0x55555555);
c = ((c >> 2) & 0x33333333) + (c & 0x33333333);
c = ((c >> 4) + c) & 0x0F0F0F0F;
c = ((c >> 8) + c) & 0x00FF00FF;
c = ((c >> 16) + c) & 0x0000FFFF;
return c;
}
第一眼看到的时候很懵,不过作者很友好,在教程中给出了算法的来源,这个网站里面列举了很多bit count routines,我这篇博客的目的就是以自己的语言分析一下每一种的原理。
Bit Counting 函数
以下的算法大多来自于这里。注意,它们大多数只适用于32位整数!
Iterated Bit Count
代码:
1
2
3
4
5
6
7
8
9
10
11
12
/* Iterated bitcount iterates over each bit. The while condition sometimes helps
terminates the loop earlier */
int iterated_bitcount (unsigned int n)
{
int count=0;
while (n)
{
count += n & 0x1u ;
n >>= 1 ;
}
return count ;
}
- 时间复杂度:O(b) (b = 二进制数中1的个数)
- 空间复杂度:低
这个算法比较直观,就不作解释了。
Sparse Ones & Dense Ones
代码:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
/* Sparse Ones runs proportional to the number of ones in n.
The line n &= (n-1) simply sets the last 1 bit in n to zero. */
int sparse_ones_bitcount (unsigned int n)
{
int count=0 ;
while (n)
{
count++ ;
n &= (n - 1) ;
}
return count ;
}
/* Dense Ones runs proportional to the number of zeros in n.
It first toggles all bits in n, then diminishes count repeatedly */
int dense_ones_bitcount (unsigned int n)
{
int count = 8 * sizeof(int) ;
n ^= (unsigned int) -1 ;
while (n)
{
count-- ;
n &= (n - 1) ;
}
return count ;
}
- 时间复杂度:O(b) (b = 二进制数中1的个数)
- 空间复杂度:低
这两个算法实际的思想是一致的,它们和直接迭代类似,区别在于把位移操作以与减一后的自己做位与来代替了。 n & (n-1)
保留了高位部分不变,把最低位的1变为0.
查表法
代码:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
/* Precomputed bitcount uses a precomputed array that stores the number of ones
in each char. */
static int bits_in_char [256] ;
void compute_bits_in_char (void)
{
unsigned int i ;
for (i = 0; i < 256; i++)
bits_in_char [i] = iterated_bitcount (i) ;
return ;
}
int precomputed_bitcount (unsigned int n)
{
// works only for 32-bit ints
return bits_in_char [n & 0xffu]
+ bits_in_char [(n >> 8) & 0xffu]
+ bits_in_char [(n >> 16) & 0xffu]
+ bits_in_char [(n >> 24) & 0xffu] ;
}
/* Here is another version of precomputed bitcount that uses a precomputed array
that stores the number of ones in each short. */
static char bits_in_16bits [0x1u << 16] ;
void compute_bits_in_16bits (void)
{
unsigned int i ;
for (i = 0; i < (0x1u<<16); i++)
bits_in_16bits [i] = iterated_bitcount (i) ;
return ;
}
int precomputed16_bitcount (unsigned int n)
{
// works only for 32-bit int
return bits_in_16bits [n & 0xffffu]
+ bits_in_16bits [(n >> 16) & 0xffffu] ;
}
- 时间复杂度:O(1)
- 空间复杂度:高
牺牲空间复杂度来降低时间复杂度的方法。
Parallel Count
代码:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
/* Parallel Count carries out bit counting in a parallel
fashion. Consider n after the first line has finished
executing. Imagine splitting n into pairs of bits. Each pair contains
the <em>number of ones</em> in those two bit positions in the original
n. After the second line has finished executing, each nibble contains
the <em>number of ones</em> in those four bits positions in the
original n. Continuing this for five iterations, the 64 bits contain
the number of ones among these sixty-four bit positions in the
original n. That is what we wanted to compute. */
#define TWO(c) (0x1u << (c))
#define MASK(c) (((unsigned int)(-1)) / (TWO(TWO(c)) + 1u))
#define COUNT(x,c) ((x) & MASK(c)) + (((x) >> (TWO(c))) & MASK(c))
int parallel_bitcount (unsigned int n)
{
n = COUNT(n, 0) ;
n = COUNT(n, 1) ;
n = COUNT(n, 2) ;
n = COUNT(n, 3) ;
n = COUNT(n, 4) ;
/* n = COUNT(n, 5) ; for 64-bit integers */
return n ;
}
- 时间复杂度:O(1)
- 空间复杂度:低
那几个宏没怎么看懂,不过好像和下面的算法差不多,那么我就只解释下面这个好了.
Nifty Parallel Count
代码:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
/* Nifty Parallel Count works the same way as Parallel Count for the
first three iterations. At the end of the third line (just before the
return), each byte of n contains the number of ones in those eight bit
positions in the original n. A little thought then explains why the
remainder modulo 255 works. */
#define MASK_01010101 (((unsigned int)(-1))/3)
#define MASK_00110011 (((unsigned int)(-1))/5)
#define MASK_00001111 (((unsigned int)(-1))/17)
int nifty_bitcount (unsigned int n)
{
n = (n & MASK_01010101) + ((n >> 1) & MASK_01010101) ;
n = (n & MASK_00110011) + ((n >> 2) & MASK_00110011) ;
n = (n & MASK_00001111) + ((n >> 4) & MASK_00001111) ;
return n % 255 ;
}
- 时间复杂度:O(1)
- 空间复杂度:低
这个算法的核心思想是先把一个32位二进制数分成4个部分:最高的8位,次高的8位,次低的8位,最低8位.对于每一个8位二进制数 (abcdefgh)2 进行以下运算:
n = 128a+64b+32c+16d+8e+4f+2g+h
n >> 1 = 64a+32b+16c+8d+4e+2f+g
n & 0b01010101 = 64b+16d+4f+h (1)
(n >> 1) & 0b01010101 = 64a+16c+4e+g (2)
(1) + (2) = 64(a+b) + 16(c+d) + 4(e+f) + (g+h) (3) - 对应于代码中的第一行
(3) >> 2 = 16(a+b) + 4(c+d) + (e+f)
(3) & 0b00110011 = 16(c+d) + (g+h) (4)
((3) >> 2) & 0b00110011 = 16(a+b) + (e+f) (5) - 对应于代码中的第二行
(4) + (5) = 16(a+b+c+d) + (e+f+g+h) (6)
(6) >> 4 = (a+b+c+d)
(6) & 0b00001111 = (e+f+g+h) (7)
((6) >> 4) & 0b00001111 = (a+b+c+d) (8)
(7) + (8) = a+b+c+d+e+f+g+h - 对应于代码中的第三行
于是,这个32位二进制数每8位代表的数字都代表这8位里面有1的个数。我们现在要想办法把这四个8位数加起来得到总和。代码中直接把它对255取余。这背后的原因如下:
考虑一个26位N进制数: (abcde..xyz)N
(abcde..xyz)N
= a*N^25 + b*N^24 + ... + y*N + z
= (a * (N^25-1) + a) + (b * (N^24-1) + b) + ... + y * (N-1) +y + z
= (a+b+c+...+x+y+z) + (a * (N^25-1) + b * (N^24-1) + ... + y * (N-1))
根据等比数列求和公式:
1 + Q + Q^2 + ... + Q^(n-1) = (1-Q^n) / (1-Q)
(Q^n-1) = (Q-1) * (1+Q+Q^2+...+Q^(n-1))
带入上式:
= (a+b+c+...+x+y+z) + (a*(N-1)*(1+N+...+N^24) + b*(N-1)*(1+N+...+N^23) +...+ y*(N-1))
= (a+b+c+...+x+y+z) + (N-1) (a*(1+N+...+N^24) + b*(1+N+...+N^23) + ... + y)
因此,只要 (a+b+c...+x+y+z) < (N-1) , 那么 (abcde...xyz)N % (N-1) = (a+b+c+...+x+y+z)
由于32位整数有二进制1的个数一定小于等于32的,因此可以把这个32位二进制数想象成4位256进制数再对255取余即可。
MIT BitCount
代码:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
/* MIT Bitcount
Consider a 3 bit number as being
4a+2b+c
if we shift it right 1 bit, we have
2a+b
subtracting this from the original gives
2a+b+c
if we shift the original 2 bits right we get
a
and so with another subtraction we have
a+b+c
which is the number of bits in the original number.
Suitable masking allows the sums of the octal digits in a 32 bit number to
appear in each octal digit. This isn't much help unless we can get all of
them summed together. This can be done by modulo arithmetic (sum the digits
in a number by molulo the base of the number minus one) the old "casting out
nines" trick they taught in school before calculators were invented. Now,
using mod 7 wont help us, because our number will very likely have more than 7
bits set. So add the octal digits together to get base64 digits, and use
modulo 63. (Those of you with 64 bit machines need to add 3 octal digits
together to get base512 digits, and use mod 511.)
This is HACKMEM 169, as used in X11 sources.
Source: MIT AI Lab memo, late 1970's.
*/
int mit_bitcount(unsigned int n)
{
/* works for 32-bit numbers only */
register unsigned int tmp;
tmp = n - ((n >> 1) & 033333333333) - ((n >> 2) & 011111111111);
return ((tmp + (tmp >> 3)) & 030707070707) % 63;
}
- 时间复杂度:O(1)
- 空间复杂度:低
这个算法和前两个类似,只是这里把32位二进制数分成了11个部分,每个部分是一个3位二进制数。然后对于每个3位二进制数,通过注释中的方法得到这3位中1的个数,作为自身的值。
然后,将这11个3位二进制数邻近两两相加,把和保存在低3位,高3位清零(因为可以肯定的是它们的和 <= 6 = 3+3 < 2^3-1
). 这个运算后的结果于是可以看成6位64进制数。由于32位整数最多只可能有32个二进制1,小于(64-1), 因此可以对其直接取模。
Comments