Why does the WM_APPCOMMAND LPARAM have to be multiplied by 65536? - winapi

I am trying to control the master volume. I can do that successfully with this:
HWND mainhwnd = CreateWindow(szWindowClass, _T("window-noit-ext-profilist"), 0, 0, 0, 0, 0, HWND_MESSAGE, NULL, wcex.hInstance, NULL);
if (!mainhwnd) {
    MessageBox(NULL, _T("Profilist: Call to CreateWindow failed!"), _T("window-noit-ext-profilist"), NULL);
    return 1;
}
SendMessage(mainhwnd, WM_APPCOMMAND, (WPARAM)mainhwnd, (LPARAM)(APPCOMMAND_VOLUME_MUTE * 65536)); // mute
SendMessage(mainhwnd, WM_APPCOMMAND, (WPARAM)mainhwnd, (LPARAM)(APPCOMMAND_VOLUME_DOWN * 65536)); // vol down
SendMessage(mainhwnd, WM_APPCOMMAND, (WPARAM)mainhwnd, (LPARAM)(APPCOMMAND_VOLUME_UP * 65536)); // vol up
Why do I have to multiply by 65,536? The docs do not state this. If I don't multiply, it doesn't work.

For WM_APPCOMMAND, the lParam parameter packs three values into a single integer.
The low-order 16-bit word, dwKeys, indicates whether various virtual keys are down.
The high-order 16-bit word packs two fields: the highest 4 bits, uDevice, specify the input device that generated the input event, and the lower 12 bits, cmd, contain the application command.
Multiplying by 65536 is the same as shifting left by 16 bits (65536 = 0x10000 in hexadecimal), which moves the command into the high-order word. So when you send the message with APPCOMMAND_VOLUME_UP * 65536, you are specifying that cmd is APPCOMMAND_VOLUME_UP and that uDevice and dwKeys are both zero.
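The packing described above can be sketched in portable C. This is a minimal illustration, not the Windows header code: the SKETCH_ constants copy the numeric values of APPCOMMAND_VOLUME_MUTE, APPCOMMAND_VOLUME_DOWN and APPCOMMAND_VOLUME_UP from winuser.h so the block builds without windows.h, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Numeric values as defined in winuser.h, repeated here (assumption: the
   sketch is built without <windows.h>). */
enum {
    SKETCH_APPCOMMAND_VOLUME_MUTE = 8,
    SKETCH_APPCOMMAND_VOLUME_DOWN = 9,
    SKETCH_APPCOMMAND_VOLUME_UP   = 10
};

#define SKETCH_DEVICE_MASK 0xF000u /* top 4 bits of the high word: uDevice */
#define SKETCH_CMD_MASK    0x0FFFu /* low 12 bits of the high word: cmd    */

/* Build the 32-bit lParam value: high word = uDevice | cmd, low word = dwKeys. */
static uint32_t pack_appcommand_lparam(uint16_t cmd, uint16_t device, uint16_t keys)
{
    uint16_t high = (uint16_t)((device & SKETCH_DEVICE_MASK) | (cmd & SKETCH_CMD_MASK));
    return ((uint32_t)high << 16) | keys;
}

/* Extract the three fields again. */
static uint16_t unpack_cmd(uint32_t lparam)    { return (uint16_t)((lparam >> 16) & SKETCH_CMD_MASK); }
static uint16_t unpack_device(uint32_t lparam) { return (uint16_t)((lparam >> 16) & SKETCH_DEVICE_MASK); }
static uint16_t unpack_keys(uint32_t lparam)   { return (uint16_t)(lparam & 0xFFFFu); }
```

With uDevice and dwKeys both zero, pack_appcommand_lparam(SKETCH_APPCOMMAND_VOLUME_UP, 0, 0) yields the same number as APPCOMMAND_VOLUME_UP * 65536. When windows.h is available, the GET_APPCOMMAND_LPARAM, GET_DEVICE_LPARAM and GET_KEYSTATE_LPARAM macros do the unpacking.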

Understanding CL_DEVICE_MAX_WORK_GROUP_SIZE limit OpenCL?

I have a little difficulty understanding the max work group limit reported by OpenCL and how it affects the program.
My program reports the following:
CL_DEVICE_MAX_WORK_ITEM_SIZES : 1024, 1024, 1024
CL_DEVICE_MAX_WORK_GROUP_SIZE : 256
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS : 3
Now I am writing a program to add vectors with 1 million entries.
So the calculation of globalSize and localSize for the NDRange is as follows:
size_t localSize = 64; // size_t, because clEnqueueNDRangeKernel expects const size_t *
// Total number of work items - localSize must be a divisor
globalSize = ceil(n/(float)localSize)*localSize;
.......
// Execute the kernel over the entire range of the data set
err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &globalSize, &localSize,
0, NULL, NULL);
Here, as I understand it, OpenCL indirectly calculates the number of work groups it will launch.
For the above example:
globalSize = 15625 * 64 -> 1,000,000 -> So this is the total number of threads that will be launched
localSize = 64 -> So each work group will have 64 work items
Hence from the above we get
Total Work Groups Launched = globalSize / localSize -> 15625 Work Groups
Here my confusion starts.
OpenCL reports CL_DEVICE_MAX_WORK_GROUP_SIZE : 256, so I was thinking this means my device can launch at most 256 work groups in one dimension,
but the calculation above shows that I am launching 15625 work groups.
So how is this working?
I hope someone can clarify my confusion.
I am sure I am misunderstanding something.
Thanks in advance.
According to the specification of clEnqueueNDRangeKernel: https://www.khronos.org/registry/OpenCL/sdk/2.2/docs/man/html/clEnqueueNDRangeKernel.html,
CL_DEVICE_MAX_WORK_ITEM_SIZES and CL_DEVICE_MAX_WORK_GROUP_SIZE limit the local size, that is, the number of work items in a single work group, not the number of work groups (CL_KERNEL_WORK_GROUP_SIZE is CL_DEVICE_MAX_WORK_GROUP_SIZE in OpenCL 1.2). With localSize = 64 you are within the limit of 256, and the 15625 work groups are not restricted by it.
const int dimension = n;
const int localSizeDim[n] = { ... }; // Each element must be less than or equal to 'CL_DEVICE_MAX_WORK_ITEM_SIZES[i]'
const int localSize = localSizeDim[0] * localSizeDim[1] * ... * localSizeDim[n-1]; // The size must be less than or equal to 'CL_DEVICE_MAX_WORK_GROUP_SIZE'
I couldn't find a device limit on the number of global work items, but according to the description of the error CL_INVALID_GLOBAL_WORK_SIZE, the limit is the maximum value representable by size_t.
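The relationship between the sizes can be sketched with plain arithmetic, without calling into OpenCL. The helper names below are hypothetical; the limits passed in are assumed to be the values the device reports for CL_DEVICE_MAX_WORK_ITEM_SIZES and CL_DEVICE_MAX_WORK_GROUP_SIZE.

```c
#include <stddef.h>

/* Smallest multiple of local_size that is >= n (local_size must be > 0).
   Integer version of ceil(n / (float)localSize) * localSize. */
static size_t round_up_global_size(size_t n, size_t local_size)
{
    return ((n + local_size - 1) / local_size) * local_size;
}

/* Number of work groups the runtime derives for one dimension. */
static size_t work_group_count(size_t global_size, size_t local_size)
{
    return global_size / local_size;
}

/* Check a local size against the two device limits; returns 1 if allowed.
   Each dimension is bounded by max_item_sizes[d], and the product of all
   dimensions (work items per group) is bounded by max_group_size. */
static int local_size_is_valid(const size_t *local, size_t dims,
                               const size_t *max_item_sizes,
                               size_t max_group_size)
{
    size_t product = 1;
    for (size_t d = 0; d < dims; d++) {
        if (local[d] == 0 || local[d] > max_item_sizes[d]) return 0;
        product *= local[d];
    }
    return product <= max_group_size;
}
```

For the numbers in the question, a local size of 64 passes the check against a limit of 256, while the count of 15625 work groups is never compared against that limit at all.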

Assembly language using signed int multiplication math to perform shifts

This is a bit of a turnaround.
Usually one attempts to use shifts to perform multiplication, not the other way around.
On the Hitachi/Motorola 6309 there is no shift by n bits. There is only a shift by 1 bit.
However, there is a 16-bit x 16-bit signed multiply (which provides a 32-bit signed result).
(EDIT) Using this is no problem for a 16-bit shift (left); however, I'm trying to use two 16x16 signed multiplies to do a 32-bit shift. The high-order word of the result for the low-order word shift is the problem. (Does that make sense?)
Some pseudo code might help:
result.highword = low word of (val.highword * shiftmulttable[shift])
temp = val.lowword * shiftmulttable[shift]
result.lowword = temp.lowword
result.highword = or (result.highword, temp.highword)
(with some magic on temp.highword to consider signed values)
I have been exercising my logic in an attempt to use this instruction to perform the shifts but so far I have failed.
I can easily achieve shifts of any positive value by 0 to 14 bits, but when it comes to shifting by 15 bits (multiplying by 0x8000) or shifting any negative value, certain combinations of values require one of:
complementing the result by 1
complementing the result by 2
adding 1 to the result
doing nothing to the result
And I just can't see any pattern to these values.
Any ideas appreciated!
Best I can tell from the problem description, implementing the 32-bit shift would work as desired by using an unsigned 16x16->32 bit multiply. This can easily be synthesized from a signed 16x16->32 multiply instruction by exploiting the two's complement integer representation. If the two factors are a and b, adding b to the high-order 16 bits of the signed product when a is negative, and adding a to the high-order 16 bits of the signed product when b is negative will give us the unsigned multiplication result.
The following C code implements this approach and tests it exhaustively:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <inttypes.h>

/* signed 16x16->32 bit multiply. Hardware instruction */
int32_t mul16_wide (int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}

/* unsigned 16x16->32 bit multiply (synthetic) */
uint32_t umul16_wide (int16_t a, int16_t b)
{
    /* work in uint32_t so the corrections wrap modulo 2^32 instead of
       shifting or overflowing a signed value */
    uint32_t p = (uint32_t)mul16_wide (a, b); // signed 16x16->32 bit multiply
    if (a < 0) p = p + ((uint32_t)(uint16_t)b << 16); // add 'b' to upper 16 bits of product
    if (b < 0) p = p + ((uint32_t)(uint16_t)a << 16); // add 'a' to upper 16 bits of product
    return p;
}

/* unsigned 16x16->32 bit multiply (reference) */
uint32_t umul16_wide_ref (uint16_t a, uint16_t b)
{
    return (uint32_t)a * (uint32_t)b;
}

/* test synthetic unsigned multiply exhaustively */
int main (void)
{
    int16_t a, b;
    uint32_t res, ref;
    uint64_t count = 0;
    a = -32768;
    do {
        b = -32768;
        do {
            res = umul16_wide (a, b);
            ref = umul16_wide_ref ((uint16_t)a, (uint16_t)b);
            count++;
            if (res != ref) {
                printf ("!!!! a=%d b=%d res=%08" PRIx32 " ref=%08" PRIx32 "\n", a, b, res, ref);
                return EXIT_FAILURE;
            }
            if (b == 32767) break;
            b = b + 1;
        } while (1);
        if (a == 32767) break;
        a = a + 1;
    } while (1);
    printf ("test cases passed: %" PRIx64 "\n", count);
    return EXIT_SUCCESS;
}
I am not familiar with the Hitachi/Motorola 6309 architecture. I assume it uses a special 32-bit register to hold the result of a wide multiply, from which high and low half can be extracted into 16-bit general-purpose registers, and the conditional corrections can then be applied to the register holding the upper 16 bits.
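To connect this back to the question, here is a sketch of the 32-bit left shift built from two 16x16 multiplies, following the questioner's pseudocode but using the unsigned product recovered from the signed one. It is a C model only, not 6309 code: mul16_signed stands in for the hardware instruction, and all function names are hypothetical.

```c
#include <stdint.h>

/* Stand-in for the hardware signed 16x16->32 multiply. */
static int32_t mul16_signed(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}

/* Unsigned 16x16->32 product recovered from the signed multiply.
   The casts to int16_t assume the usual two's-complement wraparound. */
static uint32_t mul16_unsigned(uint16_t a, uint16_t b)
{
    uint32_t p = (uint32_t)mul16_signed((int16_t)a, (int16_t)b);
    if (a & 0x8000u) p += (uint32_t)b << 16; /* a was seen as negative */
    if (b & 0x8000u) p += (uint32_t)a << 16; /* b was seen as negative */
    return p;
}

/* 32-bit left shift using a power-of-two multiplier per 16-bit word. */
static uint32_t shl32_by_mul(uint32_t val, unsigned shift)
{
    if (shift >= 32) return 0;
    if (shift >= 16) {           /* move low word into high word first */
        val <<= 16;
        shift -= 16;
    }
    uint16_t m = (uint16_t)(1u << shift);                   /* table entry */
    uint32_t lo = mul16_unsigned((uint16_t)val, m);         /* low word * m */
    uint32_t hi = mul16_unsigned((uint16_t)(val >> 16), m); /* high word * m */
    /* result.high = low word of hi, OR-ed with the high word of lo;
       result.low  = low word of lo */
    return (hi << 16) | lo;
}
```

Because the unsigned product of the low word never spills more than shift bits into its high half, the OR with the shifted high word cannot collide, which is why no extra case analysis is needed for a shift of 15 or for values with the top bit set.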
Are you using fixed-point multiplicative inverses to use the high half result for a right shift?
If you're just left-shifting, multiply by 0x8000 should work. The low half of an NxN => 2N-bit multiply is the same whether inputs are treated as signed or unsigned. Or do you need a 32-bit shift result from your 16-bit input?
Is the multiply instruction actually faster than a few 1-bit shifts for small shift counts? (I wouldn't be surprised if compile-time-constant counts of 2 or 3 would be faster with just a chain of 2 or 3 add same,same or left-shift instructions.)
Anyway, for a compile-time-constant shift count of 15, maybe just multiply by 1<<14 and then do the last count with a 1-bit shift (add same,same).
Or if your ISA has rotates, rotate right by 1 and mask away the low bits, skipping the multiply. Or zero a register, right-shift the low bit into the carry flag, then rotate-through-carry into the top of the zeroed register.
(The latter might be useful on an ISA that doesn't have large immediates and couldn't "mask away all the low bits" in one instruction. Or an ISA that only has RCR not ROR. I don't know 6309 at all)
If you're using a runtime count to look up a multiplier from a table, maybe branch for that case, or adjust your LUT so every entry needs an extra 1-bit shift, so you can do mul(lut[count]) and an unconditional extra shift.
(Only works if you don't need to support a shift-count of zero.)
Not that there would be many interested people who would want to see the 6309 code, but here it is:
Compliant with OS9 C ABI.
Pointer to result and arguments pushed on stack right to left.
U,PC,val(4bytes),shift(2bytes),*result(2bytes)
0 2 4 8 10
:
* 10,s pointer to long result
* 4,s 4 byte value
* 8,s 2 byte shift
* x = pointer to result
pshs u
ldx 10,s * load pointer to result
ldd 8,s * load shift
* if shift amount is greater than 31 then
* just return zero. OS9 C standard.
cmpd #32
blt _10x
ldq #0
stq 4,s
bra _13x
* if shift amount is 16 or greater then
* move bottom word of value into top word
* and clear bottom word
_10x
cmpb #16
blt _1x
ldu 6,s
stu 4,s
clr 6,s
clr 7,s
_1x
* setup pointer u and offset e into mult table _2x
leau _2x,pc
andb #15
* if there is no shift value just return value
beq _13x
aslb * need to double shift to use as word table offset
stb 8,s * save double shft
tfr b,e
* shift top word q = val.word.high * multtab[shft]
ldd 4,s
muld e,u
stw ,x * result.word.high = low word of mult
* shift bottom word q = val.word.low * multtab[shft]
lde 8,s * reload double shft
ldd 6,s
muld e,u
stw 2,x * result.word.low = low word of mult
* The high word of mult needs to be corrected for sign
* if val is negative then muld will return negated results
* and we need to un-negate it
lde 8,s * reload double shift
tst 4,s * test top byte of val for negative
bge _11x
addd e,u * add the multtab[shft] again to top word
_11x
* if multtab[shft] is negative (shft is 15 or shft<<1 is 30)
* also need to un negate result
cmpe #30
bne _12x
addd 6,s * add val.word.low to top word
_12x
* combine top and bottom and save bottom half of result
ord ,x
std ,x
bra _14x
* this is only reached if the result is in value (let result = value)
_13x
ldq 4,s * load value
stq ,x * result = value
_14x
puls u,pc
_2x fdb $01,$02,$04,$08,$10,$20,$40,$80,$0100,$0200,$0400,$0800
fdb $1000,$2000,$4000,$8000

Costs of new AVX512 instruction - Scatter store

I'm playing around with the new AVX512 instruction sets, trying to understand how they work and how one can use them.
What I am trying to do is interleave specific data, selected by a mask.
My little benchmark loads x*32 bytes of aligned data from memory into two vector registers and compresses them using a dynamic mask (fig. 1). The resulting vector registers are scattered into memory so that the two vector registers are interleaved (fig. 2).
Figure 1: Compressing the two data vector registers using the same dynamically created mask.
Figure 2: Scatter store to interleave the compressed data.
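Since the figures are not reproduced here, the intended result of the two steps can be pinned down with a scalar reference model. This is a plain C sketch, not AVX512 code, and the function names are hypothetical; it can serve as a check for any vectorized variant.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a zero-masking compress: packs the lanes selected by
   mask to the front of out, zero-fills the rest, returns the lane count. */
static size_t compress16(uint16_t mask, const uint32_t in[16], uint32_t out[16])
{
    size_t n = 0;
    for (size_t lane = 0; lane < 16; lane++)
        if (mask & (1u << lane))
            out[n++] = in[lane];
    for (size_t i = n; i < 16; i++)
        out[i] = 0;
    return n;
}

/* Interleaves the selected lanes of two 16-element blocks into result as
   a0, b0, a1, b1, ... and returns the number of elements written. */
static size_t zip_selected(uint16_t mask, const uint32_t a[16],
                           const uint32_t b[16], uint32_t *result)
{
    uint32_t ca[16], cb[16];
    size_t n = compress16(mask, a, ca);
    compress16(mask, b, cb);
    for (size_t j = 0; j < n; j++) {
        result[2 * j]     = ca[j];
        result[2 * j + 1] = cb[j];
    }
    return 2 * n;
}
```

The model counts selected lanes from the mask itself, so a selected element whose value happens to be zero is still written.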
My code looks like the following:
void zipThem( uint32_t const * const data, __mmask16 const maskCompress, __m512i const vindex, uint32_t * const result ) {
    /* Initialize a vector register containing zeroes to get the store mask */
    __m512i zeroVec = _mm512_setzero_epi32();
    /* Load data (aligned loads; _mm512_conflict_epi32 takes a vector, not a pointer) */
    __m512i dataVec_1 = _mm512_load_epi32( data );
    __m512i dataVec_2 = _mm512_load_epi32( data + 16 );
    /* Compress the data */
    __m512i compVec_1 = _mm512_maskz_compress_epi32( maskCompress, dataVec_1 );
    __m512i compVec_2 = _mm512_maskz_compress_epi32( maskCompress, dataVec_2 );
    /* Get the store mask by comparing the compressed register with the zero register (4 means !=) */
    __mmask16 maskStore = _mm512_cmp_epi32_mask( zeroVec, compVec_1, 4 );
    /* Interleave the selected data (scale 1: vindex is interpreted as byte offsets) */
    _mm512_mask_i32scatter_epi32(
        result,
        maskStore,
        vindex,
        compVec_1,
        1
    );
    _mm512_mask_i32scatter_epi32(
        result + 1,
        maskStore,
        vindex,
        compVec_2,
        1
    );
}
I compiled everything with
-O3 -march=knl -lmemkind -mavx512f -mavx512pf
I call the method for 100'000'000 elements. To get an overview of the behaviour of the scatter store, I repeated this measurement with different values for maskCompress.
I expected the execution time to depend on the number of set bits in maskCompress. But I observed that the tests needed roughly the same time for execution. Here is the result of the performance test:
Figure 3: Results of the measurements. The x-axis represents the number of written elements, depending on maskCompress. The y-axis shows the performance.
As one can see, the performance increases when more data is actually written to memory.
I did a little bit of research and came across this: Instruction latency of avx512. According to the given link, the latency of the instructions used is constant. But to be honest, I am a little bit confused about this behaviour.
Following the answers from Christoph and Peter, I changed my approach a little bit. Since I have no idea how I can use unpackhi / unpacklo to interleave sparse vector registers, I just combined the AVX512 compress intrinsic with a shuffle (vpermi):
int zip_store_vpermit_cnt(
uint32_t const * const data,
int const compressMask,
uint32_t * const result,
std::ofstream & log
) {
__m512i data1 = _mm512_undefined_epi32();
__m512i data2 = _mm512_undefined_epi32();
__m512i comp_vec1 = _mm512_undefined_epi32();
__m512i comp_vec2 = _mm512_undefined_epi32();
__mmask16 comp_mask = compressMask;
__mmask16 shuffle_mask;
uint32_t store_mask = 0;
__m512i shuffle_idx_lo = _mm512_set_epi32(
23, 7, 22, 6,
21, 5, 20, 4,
19, 3, 18, 2,
17, 1, 16, 0 );
__m512i shuffle_idx_hi = _mm512_set_epi32(
31, 15, 30, 14,
29, 13, 28, 12,
27, 11, 26, 10,
25, 9, 24, 8 );
std::size_t pos = 0;
int pcount = 0;
int fullVec = 0;
for( std::size_t i = 0; i < ELEM_COUNT; i += 32 ) {
/* Loading the current data */
data1 = _mm512_maskz_compress_epi32( comp_mask, _mm512_load_epi32( &(data[i]) ) );
data2 = _mm512_maskz_compress_epi32( comp_mask, _mm512_load_epi32( &(data[i+16]) ) );
shuffle_mask = _mm512_cmp_epi32_mask( _mm512_setzero_si512(), data2, 4 ); /* compare against an all-zero vector */
/* Interleaving the two vector register, depending on the compressMask */
pcount = 2*( __builtin_popcount( comp_mask ) );
store_mask = std::pow( 2, (pcount) ) - 1;
fullVec = pcount / 17;
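comp_vec1 = _mm512_permutex2var_epi32( data1, shuffle_idx_lo, data2 );
comp_vec2 = _mm512_permutex2var_epi32( data1, shuffle_idx_hi, data2 ); /* upper half of the interleave, stored below */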
_mm512_mask_storeu_epi32( &(result[pos]), store_mask, comp_vec1 );
pos += (fullVec) * 16 + ( ( 1 - ( fullVec ) ) * pcount ); // same as pos += ( pCount >= 16 ) ? 16 : pCount;
_mm512_mask_storeu_epi32( &(result[pos]), (store_mask >> 16) , comp_vec2 );
pos += ( fullVec ) * ( pcount - 16 ); // same as pos += ( pCount >= 16 ) ? pCount - 16 : 0;
//a simple _mm512_store_epi32 produces a segfault, because the memory isn't aligned anymore :(
}
return pos;
}
That way the sparse data within the two vector registers can be interleaved. Unfortunately, I have to calculate the mask for the store manually, which seems to be quite expensive. One could use a LUT to avoid the calculation, but I think that is not the way it should be done.
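One way to make the mask calculation cheaper, sketched here under the assumption that the store mask is always a run of 2 * popcount(compressMask) low bits, is to build it with a shift instead of std::pow. The helper names are hypothetical, and popcount16 is only a portable stand-in for __builtin_popcount.

```c
#include <stdint.h>

/* Number of set bits in a 16-bit mask. */
static unsigned popcount16(uint16_t m)
{
    unsigned n = 0;
    while (m) {
        m &= (uint16_t)(m - 1); /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Mask with the lowest count bits set, for 0 <= count <= 32.
   The 64-bit intermediate keeps a shift by 32 well defined. */
static uint32_t low_bits_mask(unsigned count)
{
    return (uint32_t)((UINT64_C(1) << count) - 1);
}

/* Splits the combined mask into the masks for the two 16-lane stores. */
static void store_masks(uint16_t compress_mask, uint16_t *first, uint16_t *second)
{
    uint32_t m = low_bits_mask(2u * popcount16(compress_mask));
    *first  = (uint16_t)(m & 0xFFFFu);
    *second = (uint16_t)(m >> 16);
}
```

This replaces a floating-point call with a shift and a subtraction; whether that is measurable next to the stores themselves would have to be benchmarked.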
Figure 4: Results of the performance test of 4 different kinds of store.
I know that this is not the usual way, but I have 3 questions related to this topic, and I am hopeful that someone can help me out.
1. Why should a masked store with only one set bit need the same time as a masked store where all bits are set?
2. Does anyone have experience with the behaviour of the AVX512 scatter store, or is there good documentation for understanding it?
3. Is there an easier or more performant way to interleave two vector registers?
Thanks for your help!
Sincerely

How to calculate g values from LIS3DH sensor?

I am using the LIS3DH sensor with an ATmega128 to read acceleration values and detect motion. I went through the datasheet, but it seemed inadequate, so I decided to post here. From other posts I am convinced that the sensor resolution is 12 bit instead of 16 bit. What I need to know is this: when finding the g value from the x-axis output registers, do we calculate the two's complement of the register values only when the sign bit (the MSB of OUT_X_H, the high byte register) is 1, or every time, even when this bit is 0?
From my calculations I think that we calculate the two's complement only when the MSB of the OUT_X_H register is 1.
But the datasheet says that we need to calculate the two's complement of both OUT_X_L and OUT_X_H every time.
Could anyone enlighten me on this?
Sample code
int main(void)
{
stdout = &uart_str;
UCSRB=0x18; // RXEN=1, TXEN=1
UCSRC=0x06; // no parit, 1-bit stop, 8-bit data
UBRRH=0;
UBRRL=71; // baud 9600
timer_init();
TWBR=216; // 400HZ
TWSR=0x03;
TWCR |= (1<<TWINT)|(1<<TWSTA)|(0<<TWSTO)|(1<<TWEN);//TWCR=0x04;
printf("\r\nLIS3D address: %x\r\n",twi_master_getchar(0x0F));
twi_master_putchar(0x23, 0b000100000); // note: this literal has nine digits, so its value is 0x20, not 0x10
printf("\r\nControl 4 register 0x23: %x", twi_master_getchar(0x23));
printf("\r\nStatus register %x", twi_master_getchar(0x27));
twi_master_putchar(0x20, 0x77);
DDRB=0xFF;
PORTB=0xFD;
SREG=0x80; //sei();
while(1)
{
process();
}
}
void process(void){
x_l = twi_master_getchar(0x28);
x_h = twi_master_getchar(0x29);
y_l = twi_master_getchar(0x2a);
y_h = twi_master_getchar(0x2b);
z_l = twi_master_getchar(0x2c);
z_h = twi_master_getchar(0x2d);
xvalue = (short int)(x_l+(x_h<<8));
yvalue = (short int)(y_l+(y_h<<8));
zvalue = (short int)(z_l+(z_h<<8));
printf("\r\nx_val: %ldg", xvalue);
printf("\r\ny_val: %ldg", yvalue);
printf("\r\nz_val: %ldg", zvalue);
}
I wrote 0x10 (4g) to CTRL_REG4, but when I read it back I got 0x20 (8g). This seems a bit bizarre.
Do not compute the 2s complement. That has the effect of making the result the negative of what it was.
Instead, the datasheet tells us the result is already a signed value. That is, 0 is not the lowest value; it is in the middle of the scale. (0xffff is just a little less than zero, not the highest value.)
Also, the result is always 16 bits wide, but it is not meant to be taken as accurate to that many bits. You can set a control register value to generate more accurate values at the expense of current consumption, but it is still not guaranteed to be accurate to the last bit.
The datasheet does not say (at least not in the register description in chapter 8.2) that you have to calculate the 2's complement; it states that the contents of the two registers are in 2's complement.
So all you have to do is receive the two bytes and cast them to an int16_t to get the signed raw value.
uint8_t xl = 0x00;
uint8_t xh = 0xFC;
int16_t x = (int16_t)((((uint16_t)xh) << 8) | xl);
or
uint8_t xa[2] = {0x00, 0xFC}; // little endian: lower byte to lower address
int16_t x = *((int16_t*)xa);
(I hope I did not mix something up with this.)
I have another approach, which may be easier to implement as the compiler will do all of the work for you. The compiler will probably do it most efficiently and with no bugs too.
Read the raw data into the raw field in:
typedef union
{
    struct
    {
        // in low power - 8 significant bits, left justified
        int16 reserved : 8;
        int16 value : 8;
    } lowPower;
    struct
    {
        // in normal power - 10 significant bits, left justified
        int16 reserved : 6;
        int16 value : 10;
    } normalPower;
    struct
    {
        // in high resolution - 12 significant bits, left justified
        int16 reserved : 4;
        int16 value : 12;
    } highPower;
    // the raw data as read from registers H and L
    uint16 raw;
} LIS3DH_RAW_CONVERTER_T;
Then use the value that matches the power mode you are using.
Note: In this example, the bit-field structs are BIG ENDIAN.
Check whether you need to reverse the order of 'value' and 'reserved'.
The LISxDH sensors output 2's complement, left-justified values. They can be set to 12-bit, 10-bit, or 8-bit resolution. The value is read from the sensor as two 8-bit values (LSB, MSB) that need to be assembled together.
If you set the resolution to 8-bit, you can just cast the MSB to int8, which is likely your processor's representation of 8-bit 2's complement. Likewise, if it were possible to set the sensor to 16-bit resolution, you could just cast the assembled value to an int16.
However, if the value is 10-bit left-justified, the sign bit is in the wrong place for an int16. Here is how you convert it to int16 (16-bit 2's complement).
1.Read LSB, MSB from the sensor:
[MMMM MMMM] [LL00 0000]
[1001 0101] [1100 0000] //example = [0x95] [0xC0] (note that the LSB comes before MSB on the sensor)
2.Assemble the bytes, keeping in mind the value is left-justified.
//---As an example....
uint8_t byteMSB = 0x95; //[1001 0101]
uint8_t byteLSB = 0xC0; //[1100 0000]
//---Cast to U16 to make room, then combine the bytes---
assembledValue = ( (uint16_t)(byteMSB) << UINT8_LEN ) | (uint16_t)byteLSB;
/*[MMMM MMMM LL00 0000]
[1001 0101 1100 0000] = 0x95C0 */
//---Shift to right justify---
assembledValue >>= (INT16_LEN-numBits);
/*[0000 00MM MMMM MMLL]
[0000 0010 0101 0111] = 0x0257 */
3.Convert from 10-bit 2's complement (now right-justified) to an int16 (which is just 16-bit 2's complement on most platforms).
Approach #1: If the sign bit (in our example, the tenth bit) = 0, then just cast it to int16 (since positive numbers are represented the same in 10-bit 2's complement and 16-bit 2's complement).
If the sign bit = 1, then invert the bits (keeping just the 10bits), add 1 to the result, then multiply by -1 (as per the definition of 2's complement).
convertedValueI16 = ~assembledValue; //invert bits
convertedValueI16 &= ( 0xFFFF>>(16-numBits) ); //but keep just the 10-bits
convertedValueI16 += 1; //add 1
convertedValueI16 *=-1; //multiply by -1
/*Note that the last two lines could be replaced by convertedValueI16 = ~convertedValueI16;*/
//result = -425 = 0xFE57 = [1111 1110 0101 0111]
Approach #2: Flip the sign bit (10th bit) and subtract half the range (1<<9).
//----Flip the sign bit (tenth bit); in this example it was 1, so it becomes 0----
convertedValueI16 = (int16_t)( assembledValue^( 0x0001<<(numBits-1) ) );
/*Result = 87 = 0x57 [0000 0000 0101 0111]*/
//----Subtract out half the range----
convertedValueI16 -= ( (int16_t)(1)<<(numBits-1) );
/* [0000 0000 0101 0111]
  -[0000 0010 0000 0000]
 = [1111 1110 0101 0111]
Result = 87 - 512 = -425 = 0xFE57 */
Link to script to try out (not optimized): http://tpcg.io/NHmBRR
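The steps of approach #2 can be collected into one small function, with the final scaling to g added as a second step. This is a sketch with hypothetical names; in particular, mg_per_count is an assumed parameter that has to be taken from the datasheet for the operating mode and full-scale range in use.

```c
#include <stdint.h>

/* Converts left-justified two's-complement data (high byte, low byte) to a
   signed count. num_bits is the configured resolution: 8, 10, or 12. */
static int16_t lis3dh_raw_to_counts(uint8_t msb, uint8_t lsb, unsigned num_bits)
{
    uint16_t assembled = (uint16_t)(((uint16_t)msb << 8) | lsb);
    uint16_t right = (uint16_t)(assembled >> (16u - num_bits)); /* right-justify */
    uint16_t sign  = (uint16_t)(1u << (num_bits - 1u));         /* sign bit position */
    /* flip the sign bit, then subtract half the range: this sign-extends
       without relying on shifting a negative value */
    return (int16_t)((int32_t)(right ^ sign) - (int32_t)sign);
}

/* Scales counts to g. mg_per_count is an assumption supplied by the caller
   (sensitivity in milli-g per count for the chosen mode and range). */
static double lis3dh_counts_to_g(int16_t counts, double mg_per_count)
{
    return counts * mg_per_count / 1000.0;
}
```

For the example bytes above, lis3dh_raw_to_counts(0x95, 0xC0, 10) gives -425, matching both approaches.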

Get size of volume on Windows

I'm writing a library to extract information about physical disks, partitions, and volumes on a Windows system (XP or later).
I'm trying to get the capacity of a volume. Here are the approaches I know about and the reason each fails:
GetDiskFreeSpaceEx -- Affected by user quota.
IOCTL_DISK_GET_DRIVE_GEOMETRY_EX -- Gets size of entire physical disk, even when invoked using a volume handle.
IOCTL_VOLUME_GET_VOLUME_DISK_EXTENTS -- Doesn't account for RAID overhead.
IOCTL_DISK_GET_LENGTH_INFO -- Fails with access denied. (Actually, it requires GENERIC_READ access, unlike all other queries, and GENERIC_READ requires administrator access.)
IOCTL_STORAGE_READ_CAPACITY -- Not available on XP, also shares the drawbacks of IOCTL_DISK_GET_LENGTH_INFO and IOCTL_DISK_GET_DRIVE_GEOMETRY_EX
FSCTL_GET_VOLUME_BITMAP + GetDiskFreeSpace for cluster size -- Requires GENERIC_READ (admin access) and gives the size of the data area of the filesystem, not the entire volume.
IOCTL_DISK_GET_PARTITION_INFO -- Requires GENERIC_READ (admin access) and also failed on a USB-attached disk (possibly using superfloppy partitioning)
Oddly, the number of clusters from FSCTL_GET_VOLUME_BITMAP and WMI's CIM_LogicalDisk.Size property agree, and both are 4096 bytes smaller than the value from IOCTL_DISK_GET_LENGTH_INFO.
What is the correct way to get volume capacity? Since all the other queries work without administrator access, I'm looking for a least-privilege solution for this too.
What exactly do you want to get?
1) the capacity of the physical disk
OR
2) the capacity of the partition on the disk
OR
3) the capacity of the file system on the partition
There is a PDO for the physical disk; disk.sys creates an FDO and attaches it to that PDO (named \Device\Harddisk<I>\DR0, with the symbolic link \Device\Harddisk<I>\Partition0, where I is the disk number 0, 1, 2, ...).
For every partition on the physical disk, disk.sys creates a PDO (\Device\Harddisk<I>\Partition<J>, with J in {1,2,3..}, is a symlink to some \Device\HarddiskVolume<X>).
1) There are several ways to get the physical disk capacity:
a)
Open any of the \Device\Harddisk<I>\Partition<J> devices (J in {0,1,..}, so the disk FDO or any partition PDO)
with (FILE_READ_ACCESS | FILE_WRITE_ACCESS) and send IOCTL_SCSI_PASS_THROUGH_DIRECT with SCSIOP_READ_CAPACITY and/or SCSIOP_READ_CAPACITY16; the result is returned in a READ_CAPACITY_DATA or READ_CAPACITY_DATA_EX struct.
READ_CAPACITY_DATA_EX rcd;
SCSI_PASS_THROUGH_DIRECT sptd = {
sizeof(sptd), 0, 0, 0, 0, CDB12GENERIC_LENGTH, 0, SCSI_IOCTL_DATA_IN,
sizeof(rcd), 1, &rcd, 0, {SCSIOP_READ_CAPACITY16}
};
if (0 <= NtDeviceIoControlFile(hFile, 0, 0, 0, &iosb, IOCTL_SCSI_PASS_THROUGH_DIRECT,
&sptd, sizeof(sptd), &sptd, sizeof(sptd)))
{
DbgPrint("---- SCSIOP_READ_CAPACITY16 ----\n");
rcd.BytesPerBlock = _byteswap_ulong(rcd.BytesPerBlock);
rcd.LogicalBlockAddress.QuadPart = _byteswap_uint64(rcd.LogicalBlockAddress.QuadPart) + 1;
DbgPrint("%I64x %x\n", rcd.LogicalBlockAddress, rcd.BytesPerBlock);
rcd.LogicalBlockAddress.QuadPart *= rcd.BytesPerBlock;
DbgPrint("%I64x %I64u\n", rcd.LogicalBlockAddress.QuadPart, rcd.LogicalBlockAddress.QuadPart);
}
or
READ_CAPACITY_DATA rcd;
SCSI_PASS_THROUGH_DIRECT sptd = {
sizeof(sptd), 0, 0, 0, 0, CDB10GENERIC_LENGTH, 0, SCSI_IOCTL_DATA_IN,
sizeof(rcd), 1, &rcd, 0, {SCSIOP_READ_CAPACITY}
};
if (0 <= NtDeviceIoControlFile(hFile, 0, 0, 0, &iosb, IOCTL_SCSI_PASS_THROUGH_DIRECT,
&sptd, sizeof(sptd), &sptd, sizeof(sptd)))
{
DbgPrint("---- SCSIOP_READ_CAPACITY ----\n");
rcd.BytesPerBlock = _byteswap_ulong(rcd.BytesPerBlock);
rcd.LogicalBlockAddress = _byteswap_ulong(rcd.LogicalBlockAddress) + 1;
DbgPrint("%x %x\n", rcd.LogicalBlockAddress, rcd.BytesPerBlock);
ULARGE_INTEGER u = {rcd.LogicalBlockAddress};
u.QuadPart *= rcd.BytesPerBlock;
DbgPrint("%I64x %I64u\n", u.QuadPart, u.QuadPart);
}
b)
Open any of the \Device\Harddisk<I>\Partition<J> devices with FILE_READ_ACCESS and send IOCTL_STORAGE_READ_CAPACITY. This must give the same result as a): the request is handled by ClassReadDriveCapacity in classpnp.sys, which internally sends a SCSI request (SCSIOP_READ_CAPACITY) to the disk PDO. This approach does not work on XP.
STORAGE_READ_CAPACITY sc;
if (0 <= NtDeviceIoControlFile(hFile, 0, 0, 0, &iosb, IOCTL_STORAGE_READ_CAPACITY, 0, 0, &sc, sizeof(sc)))
{
DbgPrint("---- IOCTL_STORAGE_READ_CAPACITY ----\n");
DbgPrint("%I64x %I64x %x \n", sc.DiskLength.QuadPart, sc.NumberOfBlocks.QuadPart, sc.BlockLength);
sc.NumberOfBlocks.QuadPart *= sc.BlockLength;
DbgPrint("%I64x %I64u\n", sc.NumberOfBlocks.QuadPart, sc.NumberOfBlocks.QuadPart);
}
c)
Open any of \Device\Harddisk<I>\Partition<J> with any access, send IOCTL_DISK_GET_DRIVE_GEOMETRY_EX, and use DISK_GEOMETRY_EX.DiskSize. I think this is the best way: it needs no special rights and works on XP.
DISK_GEOMETRY_EX GeometryEx;
if (0 <= NtDeviceIoControlFile(hFile, 0, 0, 0, &iosb, IOCTL_DISK_GET_DRIVE_GEOMETRY_EX, 0, 0, &GeometryEx, sizeof(GeometryEx)))
{
DbgPrint("---- IOCTL_DISK_GET_DRIVE_GEOMETRY ----\n");
ULONG BytesPerCylinder = GeometryEx.Geometry.TracksPerCylinder * GeometryEx.Geometry.SectorsPerTrack * GeometryEx.Geometry.BytesPerSector;
DbgPrint("%I64x == %I64x\n", GeometryEx.Geometry.Cylinders.QuadPart, GeometryEx.DiskSize.QuadPart / BytesPerCylinder);
DbgPrint("%I64x <= %I64x\n", GeometryEx.Geometry.Cylinders.QuadPart * BytesPerCylinder, GeometryEx.DiskSize.QuadPart);
}
d)
Open \Device\Harddisk<I>\Partition0 or \Device\Harddisk<I>\DR0 with FILE_READ_ACCESS and use IOCTL_DISK_GET_LENGTH_INFO.
2)
To get the capacity of the partition on the disk, open \Device\Harddisk<I>\Partition<J> (where J is in {1,2..}) or, if the letter X is assigned to the partition, \GLOBAL??\X:, and use IOCTL_DISK_GET_LENGTH_INFO. Again this needs FILE_READ_ACCESS.
GET_LENGTH_INFORMATION gli;
if (0 <= NtDeviceIoControlFile(hFile, 0, 0, 0, &iosb, IOCTL_DISK_GET_LENGTH_INFO, 0, 0, &gli, sizeof(gli)))
{
DbgPrint("---- IOCTL_DISK_GET_LENGTH_INFO ----\n");
DbgPrint("%I64x %I64u\n", gli.Length.QuadPart, gli.Length.QuadPart);
}
3)
To get the capacity of the file system on the partition, open any file (\GLOBAL??\X:\ for example) and use NtQueryVolumeInformationFile(FileFsSizeInformation).
FILE_FS_SIZE_INFORMATION fsi;
if (0 <= NtOpenFile(&hFile, SYNCHRONIZE, &oa, &iosb, FILE_SHARE_VALID_FLAGS, FILE_OPEN_FOR_FREE_SPACE_QUERY|FILE_SYNCHRONOUS_IO_NONALERT))
{
if (0 <= NtQueryVolumeInformationFile(hFile, &iosb, &fsi, sizeof(fsi), FileFsSizeInformation))
{
DbgPrint("%I64x %x %x\n", fsi.TotalAllocationUnits.QuadPart, fsi.SectorsPerAllocationUnit, fsi.BytesPerSector);
fsi.TotalAllocationUnits.QuadPart *= fsi.SectorsPerAllocationUnit * fsi.BytesPerSector;
DbgPrint("%I64x %I64u\n", fsi.TotalAllocationUnits.QuadPart, fsi.TotalAllocationUnits.QuadPart);
}
NtClose(hFile);
}
Or use GetDiskFreeSpaceEx: internally it also calls NtQueryVolumeInformationFile(FileFsSizeInformation) but uses the FILE_DIRECTORY_FILE flag, so only directories can be used as the input parameter.
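The arithmetic behind two of the results above can be shown without any Windows headers. The sketch below uses hypothetical helper names and assumes the 8-byte READ CAPACITY (10) response layout: a big-endian last logical block address followed by a big-endian block length, which is why the snippets above byte-swap both fields.

```c
#include <stdint.h>

/* Reads a big-endian 32-bit value, the byte order used in SCSI responses. */
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Disk size from a READ CAPACITY (10) response:
   (last LBA + 1) blocks times the block length, computed in 64 bits. */
static uint64_t read_capacity10_bytes(const uint8_t response[8])
{
    uint64_t blocks = (uint64_t)be32(response) + 1;
    return blocks * be32(response + 4);
}

/* File-system capacity from the three FILE_FS_SIZE_INFORMATION fields:
   allocation units * sectors per unit * bytes per sector. */
static uint64_t fs_capacity_bytes(uint64_t total_units,
                                  uint32_t sectors_per_unit,
                                  uint32_t bytes_per_sector)
{
    return total_units * sectors_per_unit * bytes_per_sector;
}
```

Keeping the first operand 64 bits wide matters in both functions, since disks larger than 4 GiB overflow a 32-bit product.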
