If I have a number in binary on an n-bit system, its floor log (base 2) is defined as the index of the number's MSB. By scanning the bits one by one I can determine the index of the MSB, but that takes order-n time. Is there some faster way to do it?
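For example, 20 is 10100 in binary: its MSB is at index 4, so the floor log (base 2) of 20 is 4.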
Using C# as an example: for a byte, you can pre-compute a table and then just do a lookup.
internal static readonly byte[] msbPos256 = new byte[256];

static ByteExtensions() {
    msbPos256[0] = 8; // special value for when there are no set bits
    msbPos256[1] = 0;
    for (int i = 2; i < 256; i++)
        msbPos256[i] = (byte)(1 + msbPos256[i / 2]);
}

/// <summary>
/// Returns the integer logarithm base 2 (Floor(Log2(number))) of the specified number.
/// </summary>
/// <remarks>Example: Log2(10) returns 3.</remarks>
/// <param name="value">The number whose base-2 log is desired.</param>
/// <returns>The base-2 log of the number when it is greater than 0, or 0 when the number
/// equals 0.</returns>
public static byte Log2(this byte value) {
    return msbPos256[value | 1];
}
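With that in place, the lookup itself is a single array access. For example, assuming the members above live in a static ByteExtensions class that is in scope:

byte b = 40;                          // binary 101000, MSB at index 5
Console.WriteLine(b.Log2());          // prints 5
Console.WriteLine(((byte)0).Log2());  // prints 0, thanks to the "| 1" guard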
For an unsigned 32-bit int, the following will work:
private static readonly byte[] DeBruijnLSBsSet = new byte[] {
    0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
    8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31
};

public static uint Log2(this uint value) {
    value |= value >> 1;
    value |= value >> 2;
    value |= value >> 4;
    value |= value >> 8;
    return DeBruijnLSBsSet[unchecked((value | value >> 16) * 0x07c4acddu) >> 27];
}
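The OR cascade smears the most significant set bit into every lower position, so the value becomes one less than the next power of two; multiplying by the De Bruijn-style constant then leaves a unique 5-bit pattern in the top bits, which indexes the table. For example, assuming the extension method above is in scope:

uint x = 1000;
Console.WriteLine(x.Log2());   // prints 9, since 2^9 = 512 <= 1000 < 1024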
This website is the go-to place for bit twiddling tricks
http://graphics.stanford.edu/~seander/bithacks.html
It has these, and a number of other techniques for achieving what you are asking for in your question.
There are a number of general tricks that utilize small lookup tables, as @hatchet says.
There is a notable alternative, however. If you want the fastest implementation and are using a low-level language, a find-first-set / count-leading-zeros instruction is built into almost all ISAs and is supported by almost all compilers. See https://en.wikipedia.org/wiki/Find_first_set and use compiler intrinsics or inline assembly as appropriate.
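Even from C#, recent .NET versions (Core 3.0 and later) expose this hardware instruction through System.Numerics.BitOperations, so no table is needed; a minimal sketch:

using System;
using System.Numerics;

class LogDemo {
    static void Main() {
        Console.WriteLine(BitOperations.Log2(1000u));              // 9
        Console.WriteLine(BitOperations.LeadingZeroCount(1000u));  // 22, i.e. 32 - 10 significant bits
    }
}

On hardware that supports it, this compiles down to a single LZCNT instruction (or CLZ on ARM); otherwise it falls back to a De Bruijn table lookup much like the one above.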
I've been using the Intel-provided RNG feature (RDRAND) for some time to get some randomness, by means of a C++/CLI program I wrote myself.
However, after a while something struck me as suspicious. Among other uses, I asked for a random number between 1 and 4 and wrote the result down on paper each time. Here are the results:
2, 3, 3, 2, 1, 3, 4, 2, 3, 2, 3, 1, 3, 2, 3, 1, 2, 4, 2, 2, 1, 2, 1, 3, 1, 3, 3, 3, 3.
Number of 1s: 6
Number of 2s: 9
Number of 3s: 12
Number of 4s: 2
Total: 29
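For reference, here is a quick chi-squared check of those counts against a uniform expectation, where each value should appear about 29/4 = 7.25 times: χ² = (6 − 7.25)²/7.25 + (9 − 7.25)²/7.25 + (12 − 7.25)²/7.25 + (2 − 7.25)²/7.25 ≈ 7.55, which is just under 7.81, the usual 5% critical value for 3 degrees of freedom.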
I'm actually wondering if there's a problem with Intel's RNG, my algorithm, my methodology, or something else. Or do you consider the bias not to be significant enough yet?
I'm using Windows 10 Pro, and my CPU is an Intel Core i7-4710MQ.
Compiled with VS2017.
Methodology:
Start a PowerShell command prompt
Load my assembly with Add-Type -Path <mydll>
Invoke [rdrw.Random]::Next(4)
Add one to the result
A detail that may be of importance: I don't ask for that number very often, so there's some time between draws, and a draw usually happens when the RNG hasn't been used for a while (at least one hour).
And yes, it's a lazy algorithm; I didn't want to bother with exceptions.
The algorithm follows:
#include <immintrin.h>

namespace rdrw {

#pragma managed(push,off)
    // Retry until RDRAND reports success.
    unsigned long long getRdRand() {
        unsigned long long val = 0;
        while (!_rdrand64_step(&val));
        return val;
    }
#pragma managed(pop)

    public ref class Random abstract sealed
    {
    public:
        // Returns a random 64-bit unsigned integer.
        static unsigned long long Next() {
            return getRdRand();
        }

        // Returns a random unsigned integer between 0 and max-1 (inclusive).
        static unsigned long long Next(unsigned long long max) {
            unsigned long long nb = max - 1;
            unsigned long long mask = 1;
            unsigned long long draw = 0;
            if (max <= 1)
                return 0;

            // Create a bitmask that's at least as big as the biggest acceptable value.
            while ((nb & mask) != nb)
            {
                mask <<= 1;
                mask |= 1;
            }

            do
            {
                // Throw away the unnecessary high bits, then reject anything still out of range.
                draw = Next() & mask;
            } while (draw > nb);
            return draw;
        }

        // Returns a random unsigned integer between min and max-1 (inclusive).
        static unsigned long long Next(unsigned long long min, unsigned long long max) {
            if (max == min)
                return min;
            if (max < min)
                return 0;
            unsigned long long diff = max - min;
            return Next(diff) + min;
        }
    };
}
Thanks for your insights!
So, I'm working on a problem that requires me to insert keys in order into a hash table. I stopped inserting after 20 since there is no more room. I provide the following picture to help with context. I created the hash table and found the number of collisions and the load factor. Collisions are resolved by open addressing. Sorry this isn't a question; I just need someone to look over it and tell me if it's all correct.
There are a number of errors and misunderstandings in your question.
You state that you 'stopped inserting after 20' but you show 15 keys.
There are 9 buckets in your hash table, but you state that the load factor is 1. The load factor is the number of keys (15 or 20) divided by the number of buckets (9), so it is not 1.
In the usual open-addressing notation h(k, i), k is the key and i is the probe attempt number; the number of buckets (9 in your case) is a separate quantity, usually written m. Written as (k mod 9 + 5i) mod 9 without saying what i means, the function really makes no sense.
Every probe should be reduced modulo the number of buckets, m.
There are not 15 collisions in the keys you provided. A collision only occurs when there's a previous value in the table.
This is all explained in the Wikipedia article on hash tables.
With the clarifications in the comments below this answer in mind, I used the following code to verify your conclusions:
import java.util.Arrays;
import java.util.stream.Stream;

public class Hashing {
    private static final int SIZE = 9;
    private final int[] keys = new int[SIZE];
    private int collisions = 0;

    public void add(int key) {
        int attempt = 0;
        // Probe until we find an empty slot (0 means empty, so keys must be positive).
        while (keys[hash(key, attempt)] > 0)
            attempt++;
        collisions += attempt;
        keys[hash(key, attempt)] = key;
    }

    private int hash(int key, int attempt) {
        return (key % SIZE + 5 * attempt) % SIZE;
    }

    public static void main(String[] args) {
        Hashing table = new Hashing();
        Stream.of(28, 5, 15, 19, 10, 17, 33, 12, 20).forEach(table::add);
        System.out.println("Table " + Arrays.toString(table.keys));
        System.out.println("Collisions " + table.collisions);
    }
}
And received the following output:
Table [20, 28, 19, 33, 12, 5, 15, 10, 17]
Collisions 15
My question is more or less what's in the title; I'm wondering if there's a fast way to go through a sequence of bits and find each bit that's set.
More detailed information:
I'm currently working on a data structure that represents a set of objects. In order to support some operations I need, the structure must be able to perform very fast intersection of subsets internally. The solution I've come up with is to have each subset of the structure's superset represented by a "bit array", where each bit maps to an index in the array that holds the superset's data. Example: if bit #1 is set in a subset, then the element at index 1 in the superset's array is present in the subset.
Each subset consists of an array of ulong big enough that there are enough bits to represent the entire superset (if the superset contains 256 elements, the size of the array must be 256 / 64 = 4). To find the intersection of two subsets, S1 and S2, I can simply iterate through S1's and S2's arrays and take the bitwise AND of the ulongs at each index.
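For example, the intersection step itself can be a single loop (a minimal sketch; s1Bits and s2Bits are placeholder names for the two subsets' ulong arrays, assumed to have the same length):

ulong[] result = new ulong[s1Bits.Length];
for (int i = 0; i < s1Bits.Length; i++)
    result[i] = s1Bits[i] & s2Bits[i];   // superset element i is in the intersection iff its bit is set in both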
Now back to what my question is really about:
In order to return the data of a subset, I have to iterate through all the bits in the subset's "bit array" and find the bits that are set. This is how I currently do it:
/// <summary>
/// Gets an enumerator that enables enumeration over the strings in the subset.
/// </summary>
/// <returns> An enumerator. </returns>
public IEnumerator<string> GetEnumerator()
{
    int bitArrayChunkIndex = 0;
    int bitArrayChunkOffset = 0;
    int bitArrayChunkCount = this.bitArray.Length;

    while (bitArrayChunkIndex < bitArrayChunkCount)
    {
        ulong bitChunk = bitArray[bitArrayChunkIndex];

        // RELEVANT PART
        if (bitChunk != 0)
        {
            int bit = 0;
            while (bit < BIT_ARRAY_CHUNK_SIZE /* 64 */)
            {
                if (bitChunk.BitIsSet(bit))
                    yield return supersetData[bitArrayChunkOffset + bit];
                bit++;
            }
        }
        bitArrayChunkIndex++;
        bitArrayChunkOffset += BIT_ARRAY_CHUNK_SIZE;
        // END OF RELEVANT PART
    }
}
Are there any obvious ways to optimize this? Any bit hacks to make it very fast? Thanks!
On Intel 386 and later, you can use the BSF (bit scan forward) machine instruction.
Following is a sample for gcc. It is a little tricky for 64-bit words,
but it still works quickly and efficiently.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(int argc, char **argv) {
    unsigned long long val;
    if (argc < 2 || sscanf(argv[1], "%llx", &val) != 1)
        return 1;
    printf("val=0x%llx\n", val);

    uint32_t result;
    if ((uint32_t)val) { // lowest set bit is inside the low 32 bits
        asm("bsfl %1,%0" : "=r"(result) : "r"((uint32_t)val));
    } else {             // lowest set bit is in the high 32 bits (val is assumed non-zero)
        asm("bsfl %1,%0" : "=r"(result) : "r"((uint32_t)(val >> 32)));
        result += 32;
    }
    printf("val=%llu; result=%u\n", val, result);
    return 0;
}
Also, since you are on an x64 architecture, you can try the bsfq instruction and remove the if/else.
Take an array of sixteen integers, initialized with the number of bits set in each of the integers from zero to fifteen (i.e. 0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4). Now take bitChunk % 16 and look up the result in that array: that's the number of set bits in the lowest four bits of the chunk. Shift right by four bits and repeat the whole operation fifteen more times.
You can do this with an array of 256 integers and 8-bit sub-chunks instead (a sketch of that variant follows the code below). I wouldn't recommend an array of 4096 integers with 12-bit sub-chunks; that's getting a bit ridiculous.
int[] lookup = new int[16] { 0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4 };
int bitCount = 0;
for (int i = 0; i < 16; i++) {
    int firstFourBits = (int)(bitChunk % 16);  // bitChunk is the ulong from your loop
    bitCount += lookup[firstFourBits];
    bitChunk = bitChunk >> 4;
}
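And here is a sketch of the 256-entry variant mentioned above, again assuming bitChunk is the ulong from your loop; the table only needs to be filled once:

int[] byteLookup = new int[256];
for (int i = 1; i < 256; i++)
    byteLookup[i] = (i & 1) + byteLookup[i >> 1];   // set bits in i = lowest bit + set bits in i/2

int bitCount = 0;
for (int shift = 0; shift < 64; shift += 8)
    bitCount += byteLookup[(int)((bitChunk >> shift) & 0xFF)];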
Can someone please explain the _mm_shuffle_epi8 SSSE3 intrinsic?
I know it shuffles 16 8-bit integers in an __m128i, but I'm not sure how I could use it.
I basically want to use _mm_shuffle_epi8 to modify the function below to get better performance.
while (not done)
{
    dest[i+0] = (src+j).a;
    dest[i+1] = (src+j).b;
    dest[i+2] = (src+j).c;
    dest[i+3] = (src+j+1).a;
    dest[i+4] = (src+j+1).b;
    dest[i+5] = (src+j+1).c;
    i += 6;
    j += 2;
}
_mm_shuffle_epi8 (better known as pshufb) essentially does this:
temp = dst;
for (int i = 0; i < 16; i++)
dst[i] = (src[i] & 0x80) == 0 ? temp[src[i] & 15] : 0;
As for whether you can use it here, it's impossible to tell without knowing the types involved. It won't be "nice" anyway because the destination is a block of 6 bytes (or words? or dwords?). You could make that work by unrolling and doing a lot of shifting and or-ing.
Here's an example of using the intrinsic; you'll have to work out how to apply it to your particular situation. This code endian-swaps 4 32-bit integers at a time:
unsigned int *bswap(unsigned int *destination, unsigned int *source, int length) {
    int i;
    __m128i mask = _mm_set_epi8(12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, 3);
    for (i = 0; i < length; i += 4) {
        _mm_storeu_si128((__m128i *)&destination[i],
                         _mm_shuffle_epi8(_mm_loadu_si128((__m128i *)&source[i]), mask));
    }
    return destination;
}
When I open the Windows Common Font Dialog, it lists, for each font, a bunch of sizes. For all of the OpenType/TrueType fonts, it has the same list - 9, 10, 11, 12, 14, 16, 18... For bitmap fonts, the list varies according to the available bitmaps. "Small fonts" has 2, 3, 4, 5, 6, 7, while plain old Courier has 10, 12, 15. I don't know, but I'm led from previous reading to believe that even for TrueType fonts, certain sizes will be hinted and will look nicer than all those other sizes, so presumably I could also see a TrueType font with a more restricted set of sizes.
I'm implementing a feature in my application whereby Ctrl+Mousewheel will scale the font size up and down, as it does in browsers. I'd like to determine the available list of sizes for a font so that if I'm currently at size 12, my application knows that for Courier New, the next appropriate larger size is 14, while for plain old Courier, it's 15.
How do I go about doing this?
See here for an explanation of how to enumerate fonts and font sizes for a specific font. Note that TrueType fonts can be displayed at any size (and not just predetermined ones), since they are vector-based.
#include <windows.h>
#include <stdio.h>

/* Globals used by the enumeration callback. */
static HDC hdc;                 /* device context used for the enumeration       */
static int currentsizes[200];   /* point sizes collected so far (raster fonts)   */
static int cursize = 0;

int CALLBACK FontSizesProc(LOGFONT *plf, TEXTMETRIC *ptm, DWORD FontType, LPARAM lParam);

int EnumFontSizes(char *fontname)
{
    LOGFONT logfont;

    ZeroMemory(&logfont, sizeof logfont);
    logfont.lfHeight = 0;
    logfont.lfCharSet = DEFAULT_CHARSET;
    logfont.lfPitchAndFamily = FIXED_PITCH | FF_DONTCARE;
    lstrcpy(logfont.lfFaceName, fontname);

    hdc = GetDC(NULL);   /* screen device context */
    EnumFontFamiliesEx(hdc, &logfont, (FONTENUMPROC)FontSizesProc, 0, 0);
    ReleaseDC(NULL, hdc);
    return 0;
}

int CALLBACK FontSizesProc(
    LOGFONT *plf,      /* pointer to logical-font data */
    TEXTMETRIC *ptm,   /* pointer to physical-font data */
    DWORD FontType,    /* font type */
    LPARAM lParam      /* pointer to application-defined data */
)
{
    static int truetypesize[] = { 8, 9, 10, 11, 12, 14, 16, 18, 20,
                                  22, 24, 26, 28, 36, 48, 72 };
    int i;

    if (FontType != TRUETYPE_FONTTYPE)
    {
        int logsize = ptm->tmHeight - ptm->tmInternalLeading;
        long pointsize = MulDiv(logsize, 72, GetDeviceCaps(hdc, LOGPIXELSY));

        /* Skip point sizes we have already reported. */
        for (i = 0; i < cursize; i++)
            if (currentsizes[i] == pointsize)
                return 1;

        printf("%ld ", pointsize);
        currentsizes[cursize] = pointsize;
        if (++cursize == 200) return 0;
        return 1;
    }
    else
    {
        /* TrueType fonts scale to any size; report the standard list. */
        for (i = 0; i < (sizeof(truetypesize) / sizeof(truetypesize[0])); i++)
        {
            printf("%d ", truetypesize[i]);
        }
        return 0;
    }
}
}