Retrieving available font sizes on Windows - winapi

When I open the Windows Common Font Dialog, it lists a bunch of sizes for each font. For all of the OpenType/TrueType fonts, it has the same list: 9, 10, 11, 12, 14, 16, 18... For bitmap fonts, the list varies according to the available bitmaps. "Small fonts" has 2, 3, 4, 5, 6, 7, while plain old Courier has 10, 12, 15. I'm not sure, but previous reading has led me to believe that even for TrueType fonts, certain sizes will be hinted and will look nicer than the others, so presumably I could also see a TrueType font with a more restricted set of sizes.
I'm implementing a feature in my application whereby Ctrl+Mousewheel will scale the font size up and down, as it does in browsers. I'd like to determine the available list of sizes for a font so that if I'm currently at size 12, my application knows that for Courier New, the next appropriate larger size is 14, while for plain old Courier, it's 15.
How do I go about doing this?

See here for an explanation of how to enumerate fonts and the font sizes available for a specific font. Note that TrueType fonts can be displayed at any size (not just predetermined ones), since they are vector-based.
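#include <windows.h>
#include <stdio.h>

/* Globals the snippet relies on: a device context (e.g. hdc = GetDC(NULL))
   and the list of bitmap sizes found so far. */
static HDC hdc;
static long currentsizes[200];
static int cursize = 0;

int CALLBACK FontSizesProc(const LOGFONTA *plf, const TEXTMETRICA *ptm,
                           DWORD FontType, LPARAM lParam);

int EnumFontSizes(const char *fontname)
{
    LOGFONTA logfont;
    ZeroMemory(&logfont, sizeof logfont);
    logfont.lfHeight = 0;
    logfont.lfCharSet = DEFAULT_CHARSET;
    logfont.lfPitchAndFamily = 0;   /* EnumFontFamiliesEx requires 0 here */
    lstrcpynA(logfont.lfFaceName, fontname, LF_FACESIZE);
    EnumFontFamiliesExA(hdc, &logfont, (FONTENUMPROCA)FontSizesProc, 0, 0);
    return 0;
}

int CALLBACK FontSizesProc(
    const LOGFONTA *plf,     /* pointer to logical-font data */
    const TEXTMETRICA *ptm,  /* pointer to physical-font data */
    DWORD FontType,          /* font type */
    LPARAM lParam            /* pointer to application-defined data */
)
{
    static int truetypesize[] = { 8, 9, 10, 11, 12, 14, 16, 18, 20,
                                  22, 24, 26, 28, 36, 48, 72 };
    int i;
    if ((FontType & TRUETYPE_FONTTYPE) == 0) {
        /* Bitmap font: convert the cell height minus internal leading to points. */
        int logsize = ptm->tmHeight - ptm->tmInternalLeading;
        long pointsize = MulDiv(logsize, 72, GetDeviceCaps(hdc, LOGPIXELSY));
        for (i = 0; i < cursize; i++)
            if (currentsizes[i] == pointsize)
                return 1;               /* already listed; keep enumerating */
        printf("%ld ", pointsize);
        currentsizes[cursize] = pointsize;
        if (++cursize == 200) return 0; /* table full: stop enumerating */
        return 1;
    } else {
        /* TrueType scales to any size, so print the usual list and stop. */
        for (i = 0; i < (int)(sizeof(truetypesize) / sizeof(truetypesize[0])); i++)
            printf("%d ", truetypesize[i]);
        return 0;
    }
}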
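For the Ctrl+mousewheel feature, the collected sizes still have to be turned into a "next step" up or down. A minimal sketch, assuming the sizes have been copied into an array of long (as currentsizes is above) and that TrueType fonts use the fixed truetypesize list; the helper names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for long values (avoids the overflow of a - b). */
static int cmp_long(const void *a, const void *b)
{
    long x = *(const long *)a, y = *(const long *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: sorts sizes[0..n) in place and returns the smallest
   size strictly greater than current, or current if there is none. */
long next_larger_size(long *sizes, size_t n, long current)
{
    size_t i;
    if (n == 0)
        return current;
    assert(sizes != NULL);
    qsort(sizes, n, sizeof sizes[0], cmp_long);
    for (i = 0; i < n; i++)
        if (sizes[i] > current)
            return sizes[i];
    return current;
}

/* Hypothetical helper: the largest size strictly smaller than current,
   or current if there is none (for turning the wheel the other way). */
long next_smaller_size(long *sizes, size_t n, long current)
{
    size_t i;
    if (n == 0)
        return current;
    assert(sizes != NULL);
    qsort(sizes, n, sizeof sizes[0], cmp_long);
    for (i = n; i > 0; i--)
        if (sizes[i - 1] < current)
            return sizes[i - 1];
    return current;
}
```

With Courier's bitmap sizes {10, 12, 15} and a current size of 12, next_larger_size returns 15; with the TrueType list it returns 14, which is the behaviour the question asks for.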


OpenCV Remap From One Location to Another

I would like to remap a square patch of an image at (x, y, width, height) = (45, 104, 37, 37) to another location (80, 200, 37, 37). Why is the code below not right?
for (int i = 0; i < 37; i++)      // width
    for (int j = 0; j < 37; j++)  // height
    {
        map_x.at<float>(45+i, 104+j) = 80+i;
        map_y.at<float>(45+i, 104+j) = 200+j;
    }

for (int i = 45; i < 82; i++)
    for (int j = 104; j < 141; j++)
    {
        map_x.at<float>(i, j) = i+37;
        map_y.at<float>(i, j) = j+37;
    }
With map_x.at<float>(i,j) = i+37; you're storing the number i+37 at location (index) i, not the number found at i+37. The same applies to the statement following it.
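OpenCV has a convenient method involving ROIs. Note that assigning one Mat to another only rebinds the header, so the data has to be copied explicitly with copyTo:
Mat roi = map_x( Rect(45, 104, 37, 37) );
roi.copyTo( map_x( Rect(80, 200, 37, 37) ) ); // copies the values into the destination ROI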
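The point behind both answers is that remap's maps are indexed by the destination pixel and hold source coordinates, and that Mat::at<float>(row, col) takes the row (y) first. The plain-C sketch below models that rule on a single-channel image with nearest-neighbour sampling; it is not OpenCV code, and the helper names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fills row-major w*h maps with the identity, then makes
   every destination pixel inside (dx, dy, pw, ph) sample the source rectangle
   whose top-left corner is (sx, sy). As with cv::remap, the maps are indexed
   by DESTINATION pixel (row y, column x) and hold SOURCE coordinates. */
void build_patch_maps(float *map_x, float *map_y, int w, int h,
                      int sx, int sy, int dx, int dy, int pw, int ph)
{
    int x, y;
    assert(dx >= 0 && dy >= 0 && dx + pw <= w && dy + ph <= h);
    for (y = 0; y < h; y++)
        for (x = 0; x < w; x++) {
            map_x[(size_t)y * w + x] = (float)x;
            map_y[(size_t)y * w + x] = (float)y;
        }
    for (y = dy; y < dy + ph; y++)
        for (x = dx; x < dx + pw; x++) {
            map_x[(size_t)y * w + x] = (float)(sx + (x - dx));
            map_y[(size_t)y * w + x] = (float)(sy + (y - dy));
        }
}

/* Nearest-neighbour remap of a single-channel 8-bit image:
   dst(x, y) = src(map_x(x, y), map_y(x, y)); samples outside src become 0. */
void remap_nearest(const unsigned char *src, unsigned char *dst, int w, int h,
                   const float *map_x, const float *map_y)
{
    int x, y;
    for (y = 0; y < h; y++)
        for (x = 0; x < w; x++) {
            size_t i = (size_t)y * w + x;
            float mx = map_x[i], my = map_y[i];
            if (mx >= 0.0f && my >= 0.0f && mx < (float)w - 0.5f && my < (float)h - 0.5f)
                dst[i] = src[(size_t)(int)(my + 0.5f) * w + (int)(mx + 0.5f)];
            else
                dst[i] = 0;
        }
}
```

For the question's case the call would be build_patch_maps(map_x, map_y, width, height, 45, 104, 80, 200, 37, 37): the destination rectangle at (80, 200) samples the source at (45, 104).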

Computing the floor log of a binary number

If a number is stored in binary in an n-bit system, then the floor log (base 2) of the number is the index of its most significant set bit (MSB). By scanning the bits one by one I can find the index of the MSB, but that takes O(n) time. Is there a faster way to do it?
Using C# as an example, for a byte you can precompute a table and then just do a lookup:
public static class ByteExtensions {
    internal static readonly byte[] msbPos256 = new byte[256];

    static ByteExtensions() {
        msbPos256[0] = 8; // special value for when there are no set bits
        msbPos256[1] = 0;
        for (int i = 2; i < 256; i++) msbPos256[i] = (byte)(1 + msbPos256[i / 2]);
    }

    /// <summary>
    /// Returns the integer logarithm base 2 (Floor(Log2(number))) of the specified number.
    /// </summary>
    /// <remarks>Example: Log2(10) returns 3.</remarks>
    /// <param name="value">The number whose base 2 log is desired.</param>
    /// <returns>The base 2 log of the number greater than 0, or 0 when the number
    /// equals 0.</returns>
    public static byte Log2(this byte value) {
        return msbPos256[value | 1]; // "| 1" maps 0 to entry 1 (= 0) without changing the MSB of other values
    }
}
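The same 256-entry table carries over to C and extends to 32-bit values by first locating the highest non-zero byte. A sketch, assuming a uint32_t input; the names are hypothetical, and the table is filled at run time the same way as the C# static constructor:

```c
#include <assert.h>
#include <stdint.h>

/* msb_pos256[v] = index of the highest set bit of v, for 1 <= v <= 255. */
static uint8_t msb_pos256[256];

/* Fills the table exactly like the C# static constructor above. */
void init_msb_table(void)
{
    int i;
    msb_pos256[0] = 8;             /* sentinel: no bits set */
    msb_pos256[1] = 0;
    for (i = 2; i < 256; i++)
        msb_pos256[i] = (uint8_t)(1 + msb_pos256[i / 2]);
}

/* floor(log2(v)) for v != 0: at most three tests, then one table lookup. */
int log2_u32_table(uint32_t v)
{
    assert(v != 0);
    if (v >> 24) return 24 + msb_pos256[v >> 24];
    if (v >> 16) return 16 + msb_pos256[v >> 16];
    if (v >> 8)  return 8 + msb_pos256[v >> 8];
    return msb_pos256[v];
}
```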
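For an unsigned 32-bit int, the following works:
public static class UInt32Extensions {  // class name chosen for illustration
    private static readonly byte[] DeBruijnLSBsSet = new byte[] {
        0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
        8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31
    };

    public static uint Log2(this uint value) {
        // Smear the highest set bit into every lower position...
        value |= value >> 1;
        value |= value >> 2;
        value |= value >> 4;
        value |= value >> 8;
        // ...finish with >> 16, multiply by the de Bruijn constant, and use the top 5 bits as the index.
        return DeBruijnLSBsSet[unchecked((value | value >> 16) * 0x07c4acddu) >> 27];
    }
}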
This website is the go-to place for bit-twiddling tricks. It has these and a number of other techniques for achieving what you are asking for in your question.
There are a number of general tricks that use small lookup tables, as @hatchet says.
There is a notable alternative, however. If you want the fastest implementation and are using a low-level language, a count-leading-zeros or bit-scan-reverse instruction (LZCNT/BSR on x86, CLZ on ARM) is built into almost all ISAs and is supported by almost all compilers. Use compiler intrinsics (such as GCC/Clang's __builtin_clz) or inline assembly as appropriate.
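A minimal sketch of the intrinsic route in C, assuming GCC/Clang's __builtin_clz or MSVC's _BitScanReverse (declared in <intrin.h>), with a shift loop as a fallback for other compilers. Neither intrinsic gives a meaningful index for 0, so the wrapper (a hypothetical name) handles 0 explicitly and returns -1:

```c
#include <assert.h>
#include <stdint.h>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

/* Index of the highest set bit of v, or -1 when v == 0. */
int msb_index(uint32_t v)
{
    if (v == 0)
        return -1;                      /* clz/bsr have no meaningful result for 0 */
#if defined(__GNUC__) || defined(__clang__)
    return 31 - __builtin_clz(v);      /* assumes unsigned int is 32 bits */
#elif defined(_MSC_VER)
    {
        unsigned long index;
        _BitScanReverse(&index, v);     /* maps to the bit-scan-reverse instruction */
        return (int)index;
    }
#else
    {
        int r = 0;                      /* portable fallback: shift loop */
        while (v >>= 1)
            r++;
        return r;
    }
#endif
}
```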

Usage of _mm_shuffle_epi8 intrinsic

Can someone please explain the _mm_shuffle_epi8 SSSE3 intrinsic?
I know it shuffles 16 8-bit integers in an __m128i, but I'm not sure how I could use it.
I basically want to use _mm_shuffle_epi8 to modify the function below to get better performance.
while (not done) {
    dest[i+0] = src[j].a;
    dest[i+1] = src[j].b;
    dest[i+2] = src[j].c;
    dest[i+3] = src[j+1].a;
    dest[i+4] = src[j+1].b;
    dest[i+5] = src[j+1].c;
}
_mm_shuffle_epi8 (better known as pshufb), essentially does this:
temp = dst;
for (int i = 0; i < 16; i++)
dst[i] = (src[i] & 0x80) == 0 ? temp[src[i] & 15] : 0;
As for whether you can use it here, it's impossible to tell without knowing the types involved. It won't be "nice" anyway because the destination is a block of 6 bytes (or words? or dwords?). You could make that work by unrolling and doing a lot of shifting and or-ing.
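Before reaching for the intrinsic, a control mask can be checked against a scalar model of the behaviour described above. The sketch assumes, hypothetically (the question does not give the types), that src is an array of 4-byte structs {a, b, c, pad} of unsigned char, so one 16-byte shuffle packs four structs into 12 output bytes. The function and mask names are made up for illustration:

```c
#include <assert.h>
#include <string.h>

/* Scalar model of _mm_shuffle_epi8(data, mask) (pshufb): output byte i is
   data[mask[i] & 15], or 0 when the high bit of mask[i] is set. */
void pshufb_model(unsigned char out[16], const unsigned char data[16],
                  const unsigned char mask[16])
{
    int i;
    for (i = 0; i < 16; i++)
        out[i] = (mask[i] & 0x80) ? 0 : data[mask[i] & 15];
}

/* Keeps bytes a, b, c of each 4-byte struct and drops the pad byte;
   the last 4 output bytes are zeroed (0x80). */
static const unsigned char pack_mask[16] = {
    0, 1, 2, 4, 5, 6, 8, 9, 10, 12, 13, 14, 0x80, 0x80, 0x80, 0x80
};

/* Packs 4 structs (16 bytes) into 12 bytes of dest using the model. */
void pack4(unsigned char dest[12], const unsigned char src[16])
{
    unsigned char tmp[16];
    pshufb_model(tmp, src, pack_mask);
    memcpy(dest, tmp, 12);
}
```

With SSSE3, the same 16 mask bytes would be loaded into an __m128i and passed as the second (control) argument of _mm_shuffle_epi8; only the first 12 bytes of the result are meaningful.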
Here's an example of using the intrinsic; you'll have to work out how to apply it to your particular situation. This code endian-swaps four 32-bit integers at a time:
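#include <tmmintrin.h>  /* SSSE3: _mm_shuffle_epi8 */

/* length must be a multiple of 4 */
unsigned int *bswap(unsigned int *destination, const unsigned int *source, int length) {
    int i;
    /* output byte k takes input byte mask[k]: reverses the bytes of each 32-bit lane */
    __m128i mask = _mm_set_epi8(12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, 3);
    for (i = 0; i < length; i += 4) {
        _mm_storeu_si128((__m128i *)&destination[i],
            _mm_shuffle_epi8(_mm_loadu_si128((const __m128i *)&source[i]), mask));
    }
    return destination;
}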

find the index of the highest bit set of a 32-bit number without loops obviously

Here's a tough one (at least I had a hard time :P):
find the index of the highest bit set of a 32-bit number without using any loops.
With recursion:
int firstset(unsigned int bits) {
    return (bits & 0x80000000u) ? 31 : firstset((bits << 1) | 1) - 1;
}
Assumes [31,..,0] indexing
Returns -1 if no bits set
| 1 prevents infinite recursion (and a stack overflow) when bits is 0, because bit 31 is guaranteed to be set after at most 32 shifts
Not tail recursive :)
Very interesting question; I will provide an answer with a benchmark.
Solution using a loop
uint8_t highestBitIndex( uint32_t n )
{
    uint8_t r = 0;
    while ( n >>= 1 )
        r++;
    return r;
}
This helps to understand the question better, but it is highly inefficient.
Solution using log
This approach can also be summarized by the log method:
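#include <math.h>
uint8_t highestSetBitIndex2(uint32_t n) {
    return (uint8_t)(log(n) / log(2)); // n must be non-zero; rounding in the division may land just below an exact integer
}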
However, it is also inefficient (even more so than the loop above; see the benchmark).
Solution using built-in instruction
uint8_t highestBitIndex3( uint32_t n )
{
    return 31 - __builtin_clz(n); // __builtin_clz(0) is undefined
}
This solution, while very efficient, suffers from the fact that it only works with specific compilers (gcc and clang will do) and on specific platforms.
NB: It is 31 and not 32 if we want the index
Solution with intrinsic
#include <x86intrin.h>
uint8_t highestSetBitIndex5(uint32_t n)
{
    return _bit_scan_reverse(n); // undefined behavior if n == 0
}
This compiles to the bsr instruction at the assembly level.
Solution using inline assembly
LZCNT and BSR can be written as inline assembly (GCC/Clang extended asm on x86) with the functions below:
uint8_t highestSetBitIndex4(uint32_t n) // result undefined if n == 0
{
    uint32_t r;
    __asm__ ("bsr %1, %0" : "=r"(r) : "r"(n) : "cc"); // r = index of the highest set bit
    return (uint8_t)r;
}
uint8_t highestSetBitIndex7(uint32_t n) // returns 255 if n == 0 (lzcnt(0) is 32)
{
    uint32_t lz;
    // requires LZCNT support; on older CPUs the same encoding executes as BSR
    __asm__ ("lzcnt %1, %0" : "=r"(lz) : "r"(n) : "cc");
    return (uint8_t)(31 - lz);
}
NB: Do Not Use unless you know what you are doing
Solution using lookup table and magic number multiplication (probably the best AFAIK)
First you use the following function to clear all the bits except the highest one:
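uint32_t keepHighestBit( uint32_t n )
{
    n |= (n >> 1);   // smear the highest set bit into all lower positions
    n |= (n >> 2);
    n |= (n >> 4);
    n |= (n >> 8);
    n |= (n >> 16);
    return n - (n >> 1);  // clear everything below the highest bit
}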
Credit: the idea comes from Henry S. Warren, Jr.'s book Hacker's Delight.
Then we multiply by a de Bruijn-like magic constant: for each isolated bit, the top 6 bits of the product are distinct, so they can index a lookup table directly instead of searching:
uint8_t highestBitIndex8( uint32_t b )
{
    static const uint32_t deBruijnMagic = 0x06EB14F9; // = 7 * 255^3
    static const uint8_t deBruijnTable[64] = {
        0, 0, 0, 1, 0, 16, 2, 0, 29, 0, 17, 0, 0, 3, 0, 22,
        30, 0, 0, 20, 18, 0, 11, 0, 13, 0, 0, 4, 0, 7, 0, 23,
        31, 0, 15, 0, 28, 0, 0, 21, 0, 19, 0, 10, 12, 0, 6, 0,
        0, 14, 27, 0, 0, 9, 0, 5, 0, 26, 8, 0, 25, 0, 24, 0,
    };
    // the top 6 bits of the 32-bit product select the table entry
    return deBruijnTable[(keepHighestBit(b) * deBruijnMagic) >> 26];
}
Another version:
void propagateBits(uint32_t *n) {
    *n |= *n >> 1;
    *n |= *n >> 2;
    *n |= *n >> 4;
    *n |= *n >> 8;
    *n |= *n >> 16;
}
uint8_t highestSetBitIndex8(uint32_t b)
{
    static const uint32_t Magic = (uint32_t) 0x07C4ACDD;
    static const int BitTable[32] = {
        0, 9, 1, 10, 13, 21, 2, 29,
        11, 14, 16, 18, 22, 25, 3, 30,
        8, 12, 20, 28, 15, 17, 24, 7,
        19, 27, 23, 6, 26, 5, 4, 31,
    };
    propagateBits(&b);  // b becomes 2^(k+1) - 1, where k is the index of the highest set bit
    return BitTable[(b * Magic) >> 27];
}
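The 32-entry table is not arbitrary: it can be regenerated from the multiplier, which is a handy check on a hand-copied table. A sketch, relying on the fact that after propagateBits a value whose highest set bit is k becomes 2^(k+1) - 1 (the function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Rebuilds the 32-entry table used with the 0x07C4ACDD multiplier by
   hashing every possible smeared value 2^(k+1) - 1, k = 0..31. */
void build_debruijn_table(int table[32])
{
    const uint32_t magic = 0x07C4ACDDu;
    int k;
    for (k = 0; k < 32; k++) {
        uint32_t smeared = (k == 31) ? 0xFFFFFFFFu : ((uint32_t)1 << (k + 1)) - 1;
        table[(uint32_t)(smeared * magic) >> 27] = k;   /* top 5 bits = slot */
    }
}
```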
Benchmark with 100 million calls
compiling with g++ -std=c++17 highestSetBit.cpp -O3 && ./a.out
highestBitIndex1 136.8 ms (loop)
highestBitIndex2 183.8 ms (log(n) / log(2))
highestBitIndex3 10.6 ms (de Bruijn lookup Table with power of two, 64 entries)
highestBitIndex4 4.5 ms (inline assembly bsr)
highestBitIndex5 6.7 ms (intrinsic bsr)
highestBitIndex6 4.7 ms (gcc lzcnt)
highestBitIndex7 7.1 ms (inline assembly lzcnt)
highestBitIndex8 10.2 ms (de Bruijn lookup Table, 32 entries)
I would personally go for highestBitIndex8 if portability is your focus; otherwise the gcc built-in is nice.
Floor of logarithm-base-two should do the trick (though you have to special-case 0).
Floor of log base 2 of 0001 is 0 (bit with index 0 is set).
" " of 0010 is 1 (bit with index 1 is set).
" " of 0011 is 1 (bit with index 1 is set).
" " of 0100 is 2 (bit with index 2 is set).
and so on.
On an unrelated note, this is actually a pretty terrible interview question (I say this as someone who does technical interviews for potential candidates), because it really doesn't correspond to anything you do in practical programming.
Your boss isn't going to come up to you one day and say "hey, so we have a rush job for this latest feature, and it needs to be implemented without loops!"
You could do it like this (not optimised):
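int index = 0;
uint32_t temp = number;
if ((temp >> 16) != 0) {
    temp >>= 16;
    index += 16;
}
if ((temp >> 8) != 0) {
    temp >>= 8;
    index += 8;
}
if ((temp >> 4) != 0) { temp >>= 4; index += 4; }
if ((temp >> 2) != 0) { temp >>= 2; index += 2; }
if ((temp >> 1) != 0) { index += 1; }
// index now holds the position of the highest set bit (0 when number == 0)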
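Sorry for bumping an old thread, but how about this:
static inline int ilog2(unsigned long long i) {
    // Converting to float puts floor(log2(i)) (possibly rounded up by one) in the exponent field.
    union { float f; unsigned int u; } v = { (float)i };
    return (int)(v.u >> 23) - 127;   // remove the IEEE-754 exponent bias; i must be non-zero
}
int highest = ilog2(x); if ((x >> highest) == 0) highest--;  // undo a round-up to the next power of two
// and in case you need it
int lowest = ilog2((x ^ (x - 1)) + 1) - 1;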
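A variant of the same exponent trick, sketched under two assumptions: the input is a 32-bit unsigned integer and double is an IEEE-754 binary64 type. Every uint32_t converts to double exactly (53-bit significand), so no round-up correction is needed, and memcpy sidesteps union type punning, which is undefined in C++. The name ilog2_u32 is hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* floor(log2(x)) for x != 0, read from the exponent field of a double. */
int ilog2_u32(uint32_t x)
{
    double d = (double)x;             /* exact: x < 2^53 */
    uint64_t bits;
    assert(x != 0);                   /* 0.0 has no usable exponent */
    memcpy(&bits, &d, sizeof bits);   /* well-defined type punning */
    return (int)((bits >> 52) & 0x7FF) - 1023;   /* remove the exponent bias */
}
```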
This can be done as a binary search, reducing the complexity from O(N) (for an N-bit word) to O(log(N)). A possible implementation is:
int highest_bit_index(uint32_t value)
{
    // Returns the 1-based position of the highest set bit (1..32), or 0 if value == 0.
    if(value == 0) return 0;
    int depth = 0;
    int exponent = 16;
    while(exponent > 0)
    {
        uint32_t shifted = value >> (exponent);
        if(shifted > 0)
        {
            depth += exponent;
            if(shifted == 1) return depth + 1;
            value >>= exponent;
        }
        exponent /= 2;
    }
    return depth + 1;
}
The input is a 32-bit unsigned integer.
It has a loop that can be converted into 5 levels of if-statements, resulting in 32 or so if-statements. You could also use recursion to get rid of the loop, or the absolutely evil "goto" ;)
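The five levels can also be written with no loop and no branches, using each comparison result as a shift amount (a well-known bit-twiddling formulation). A sketch for a 32-bit input with a hypothetical name; it returns 0 for an input of 0, so callers who need to distinguish that case should test for it first:

```c
#include <assert.h>
#include <stdint.h>

/* floor(log2(v)) for v != 0 (returns 0 for v == 0): five halving steps,
   each one a comparison turned into a shift amount. */
unsigned floor_log2_branchless(uint32_t v)
{
    unsigned r, shift;
    r = (v > 0xFFFFu) << 4;     v >>= r;                /* top 16 bits non-empty? */
    shift = (v > 0xFFu) << 3;   v >>= shift; r |= shift;
    shift = (v > 0xFu) << 2;    v >>= shift; r |= shift;
    shift = (v > 0x3u) << 1;    v >>= shift; r |= shift;
    r |= (v >> 1);                                      /* v is now 0..3 */
    return r;
}
```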
n - the number whose highest set bit is wanted
start - the mask being tested, initially 1L << 31 (2147483648)
bitLocation - the 1-based position of the bit tested by start (initially 32)
public int highestBitSet(int n, long start, int bitLocation) {
    if (start == 0)
        return 0;                 // no bit set
    if ((start & n) > 0)
        return bitLocation;
    return highestBitSet(n, (start >> 1), --bitLocation);
}
long i = 1;
long startIndex = (i << 31);
int bitLocation = 32;
int value = highestBitSet(64, startIndex, bitLocation); // 7, i.e. bit 6 in 0-based terms
int high_bit_set(unsigned int n, int pos)
{
    if (n == 0)
        return -1;
    return (0x80000000u & n) ? pos : high_bit_set(n << 1, pos - 1);
}
int n = 0x23;
int high_pos = high_bit_set(n, 31);
printf("highest index = %d", high_pos); // prints "highest index = 5"
From main, call high_bit_set(n, pos) with the input value n and 31 as the starting (highest) position, as above.
Paislee's solution is actually pretty easy to make tail-recursive, though it's a much slower solution than the suggested floor(log2(n)):
int firstset_tr(unsigned int bits, int final_dec) {
    // pass in 0 for final_dec on first call, or use a helper function
    if (bits & 0x80000000u) {
        return 31-final_dec;
    } else {
        return firstset_tr( ((bits << 1) | 1), final_dec+1 );
    }
}
This function also works for other bit sizes; just change the check:
if (bits & 0x80) { // for 8-bit
    return 7-final_dec;
}
Note that what you are trying to do is calculate the integer log2 of an integer:
#include <stdio.h>
#include <stdlib.h>
unsigned long
Log2(unsigned long x)
{
    unsigned long n = x;
    int bits = sizeof(x)*8;
    int step;
    for( step = 1; step < bits; step *= 2 ) {
        n |= (n >> step);       /* smear the highest set bit downwards */
    }
    /* n - (n >> 1) keeps only the highest set bit, i.e. 2^floor(log2(x));
       its index (the log itself) can then be found with a table as above */
    return(n - (n >> 1));
}
Observe that you can attempt to search more than 1 bit at a time.
unsigned int
Log2_a(unsigned long x)
{
    unsigned int step = 0;
    unsigned int step2 = 0;
    //observe that you can move 8 bits at a time, and there is a pattern...
    //if( x>1<<step2+8 ) { step2+=8;
    //if( x>1<<step2+8 ) { step2+=8;
    //if( x>1<<step2+8 ) { step2+=8;
    for( step2=0; step2+8 < sizeof(x)*8 && x>1UL<<(step2+8); ) {
        step2+=8;               /* skip 8 bits at a time */
    }
    for( step = 0; step+step2 < sizeof(x)*8 && x>1UL<<(step+step2); ) {
        step+=1;                /* then 1 bit at a time */
    }
    /* smallest s with x <= 2^s, i.e. ceil(log2(x)) */
    printf("log2(%lu) %u\n",x,step+step2);
    return(step+step2);
}
This approach uses a binary search:
unsigned int
Log2_b(unsigned long x)
{
    unsigned int bits = sizeof(x)*8;
    unsigned int hbit = bits-1;
    unsigned int lbit = 0;
    unsigned int guess = bits/2;
    while ( hbit-lbit>1 ) {
        //when value between guess..lbit
        if( x<=(1UL<<guess) ) {
            hbit = guess;
            guess = (hbit+lbit)/2;
        }
        //when value between hbit..guess
        if( x>(1UL<<guess) ) {
            lbit = guess;
            guess = (hbit+lbit)/2;
        }
    }
    if( x>(1UL<<guess) ) ++guess;   /* round up: the result is ceil(log2(x)) */
    printf("log2(%lu) = %u\n",x,guess);
    return(guess);
}
Another binary search method, perhaps more readable:
unsigned int
Log2_c(unsigned long x)
{
    unsigned long v = x;
    unsigned int bits = sizeof(x)*8;
    unsigned int step;
    unsigned int res = 0;
    for( step = bits/2; step>0; step /= 2 ) {
        while ( v>>step ) {     /* highest set bit is at least `step` positions higher */
            v >>= step;
            res += step;
        }
    }
    /* res is now floor(log2(x)); round up to ceil(log2(x)) like the versions above */
    if( x>(1UL<<res) ) ++res;
    printf("log2(%lu) = %u\n",x,res);
    return(res);
}
And because you will want to test these:
int main()
{
    unsigned long int x;
    for( x=2; x<1000000000; x*=2 ) {
        //printf("x %lu, x+1 %lu, log2(x+1) %lu\n",x,x+1,Log2(x+1));
        printf("x %lu, x+1 %lu, log2_a(x+1) %u\n",x,x+1,Log2_a(x+1));
        printf("x %lu, x+1 %lu, log2_b(x+1) %u\n",x,x+1,Log2_b(x+1));
        printf("x %lu, x+1 %lu, log2_c(x+1) %u\n",x,x+1,Log2_c(x+1));
    }
    return 0;
}
Well, from what I know the log function is implemented very efficiently in most programming languages, and even if it contains loops internally, there are probably very few of them.
So I would say that in most cases using the log would be faster, and more direct.
You do have to check for 0, though: log(0) returns negative infinity (a pole error), and converting that result to an integer is undefined behavior in C.

Very basic radix sort

I just wrote a simple iterative radix sort and I'm wondering if I have the right idea.
Recursive implementations seem to be much more common.
I am sorting 4-byte integers (unsigned to keep it simple).
I am using 1-byte as the 'digit'. So I have 2^8=256 buckets.
I am sorting the most significant digit (MSD) first.
After each sort I put them back into array in the order they exist in buckets and then perform the next sort.
So I end up doing 4 bucket sorts.
It seems to work for a small set of data. Since I am doing it MSD I'm guessing that's not stable and may fail with different data.
Did I miss anything major?
#include <iostream>
#include <vector>
#include <list>
using namespace std;

void radix(vector<unsigned>&);
void print(const vector<list<unsigned> >& listBuckets);
unsigned getMaxForBytes(unsigned bytes);
void merge(vector<unsigned>& data, vector<list<unsigned> >& listBuckets);

int main()
{
    unsigned d[] = {5,3,6,9,2,11,9, 65534, 4,10,17,13, 268435455, 4294967294,4294967293, 268435454,65537};
    vector<unsigned> v(d,d+17);
    radix(v);
    return 0;
}

void radix(vector<unsigned>& data)
{
    int bytes = 1; // How many bytes to compare at a time
    unsigned numOfBuckets = getMaxForBytes(bytes) + 1;
    cout << "Numbuckets" << numOfBuckets << endl;
    int chunks = sizeof(unsigned) / bytes;
    for(int i = chunks - 1; i >= 0; --i)
    {
        vector<list<unsigned> > buckets(numOfBuckets); // lazy, wasteful allocation
        unsigned mask = getMaxForBytes(bytes);
        unsigned shift = i * bytes * 8;
        mask = mask << shift;
        for(unsigned j = 0; j < data.size(); ++j)
        {
            unsigned bucket = data[j] & mask; // isolate bits of current chunk
            bucket = bucket >> shift; // bring bits down to least significant
            buckets[bucket].push_back(data[j]);
        }
        print(buckets);
        merge(data, buckets);
    }
}

unsigned getMaxForBytes(unsigned bytes)
{
    unsigned max = 0;
    for(unsigned i = 1; i <= bytes; ++i)
    {
        max = max << 8;
        max |= 0xFF;
    }
    return max;
}

void merge(vector<unsigned>& data, vector<list<unsigned> >& listBuckets)
{
    int index = 0;
    for(unsigned i = 0; i < listBuckets.size(); ++i)
    {
        list<unsigned>& list = listBuckets[i];
        std::list<unsigned>::const_iterator it = list.begin();
        for(; it != list.end(); ++it)
        {
            data[index] = *it;
            ++index;
        }
    }
}

void print(const vector<list<unsigned> >& listBuckets)
{
    cout << "Printing listBuckets: " << endl;
    for(unsigned i = 0; i < listBuckets.size(); ++i)
    {
        const list<unsigned>& list = listBuckets[i];
        if(list.size() == 0) continue;
        std::list<unsigned>::const_iterator it = list.begin(); // Why do I need std here!?
        for(; it != list.end(); ++it)
        {
            cout << *it << ", ";
        }
        cout << endl;
    }
}
It seems to work well in LSD form, which can be obtained by changing the chunk loop in radix as follows:
for(int i = 0; i < chunks; ++i)
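For comparison, LSD radix sort is usually written with one stable counting pass per byte rather than lists of buckets; the stability of each pass is what makes least-significant-first order come out right. A C sketch, assuming 32-bit unsigned keys and a caller-supplied scratch buffer of the same length (names hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* LSD radix sort of 32-bit keys, one byte (256 buckets) per pass, least
   significant byte first. Each pass is a stable counting sort. */
void radix_sort_lsd(uint32_t *data, uint32_t *scratch, size_t n)
{
    unsigned shift;
    if (n < 2)
        return;
    assert(data != NULL && scratch != NULL);
    for (shift = 0; shift < 32; shift += 8) {
        size_t count[257] = {0};
        size_t i;
        for (i = 0; i < n; i++)                 /* histogram of this byte */
            count[((data[i] >> shift) & 0xFFu) + 1]++;
        for (i = 0; i < 256; i++)               /* prefix sums: start of each bucket */
            count[i + 1] += count[i];
        for (i = 0; i < n; i++)                 /* stable scatter keeps earlier order */
            scratch[count[(data[i] >> shift) & 0xFFu]++] = data[i];
        memcpy(data, scratch, n * sizeof *data);
    }
}
```

The four passes correspond to i = 0..3 in the chunk loop.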
Let's look at an example with two-digit decimal numbers:
49, 25, 19, 27, 87, 67, 22, 90, 47, 91
Sorting by the first digit yields
19, 25, 27, 22, 49, 47, 67, 87, 90, 91
Next, you sort by the second digit, yielding
90, 91, 22, 25, 27, 47, 67, 87, 19, 49
Seems wrong, doesn't it? Or isn't this what you are doing? Maybe you can show us the code if I got you wrong.
If you are doing the second bucket sort on all groups with the same first digit(s), your algorithm would be equivalent to the recursive version. It would be stable as well. The only difference is that you'd do the bucket sorts breadth-first instead of depth-first.
You also need to make sure you sort every bucket from MSD to LSD before reassembling.
Sort into 10 buckets [0-9] on MSD
If you were to reassemble and then sort again, it would not work. Instead, recursively sort each bucket.
B1 is sorted into B1B2=[12];B1B9=[19]
Once all have been sorted you can reassemble correctly.
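The recursive MSD approach described above can be sketched in C: distribute on the top byte, then sort each bucket on the next byte within its own range, never merging buckets back together in between. It assumes 32-bit unsigned keys and a scratch buffer at least as long as the input; the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Distributes data[0..n) on the byte at `shift`, then recursively sorts each
   bucket on the next lower byte. scratch must hold at least n elements. */
static void msd_pass(uint32_t *data, uint32_t *scratch, size_t n, int shift)
{
    size_t count[257] = {0};
    size_t next[256];
    size_t i;
    if (n < 2 || shift < 0)
        return;
    for (i = 0; i < n; i++)
        count[((data[i] >> shift) & 0xFFu) + 1]++;
    for (i = 0; i < 256; i++)                  /* bucket b occupies [count[b], count[b+1]) */
        count[i + 1] += count[i];
    memcpy(next, count, sizeof next);          /* next free slot per bucket */
    for (i = 0; i < n; i++)
        scratch[next[(data[i] >> shift) & 0xFFu]++] = data[i];
    memcpy(data, scratch, n * sizeof *data);
    for (i = 0; i < 256; i++)                  /* recurse into each bucket on its own */
        msd_pass(data + count[i], scratch, count[i + 1] - count[i], shift - 8);
}

/* MSD radix sort of 32-bit keys, most significant byte first. */
void radix_sort_msd(uint32_t *data, uint32_t *scratch, size_t n)
{
    if (n < 2)
        return;
    assert(data != NULL && scratch != NULL);
    msd_pass(data, scratch, n, 24);
}
```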
