How to change default sound playback device programmatically? - windows

How do I change the default audio device for playback and recording in Vista programmatically?
Is there any registry setting for it, like the sound manager in Windows XP?
Which API does it?

System Tray Audio Device Switcher writes the "Playback" value under "Software\Microsoft\Multimedia\Sound Mapper" to select the sound device, using the index obtained by enumerating the devices.
mciSendCommand from "winmm.dll" is also used.
In this source code you will find the registry keys used to achieve that.
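For illustration, here is a minimal sketch of that registry approach (my own, and only for the XP-era Sound Mapper). It assumes the "Playback" value lives under HKEY_CURRENT_USER and takes the device name reported by waveOutGetDevCaps; verify both assumptions against the source code mentioned above.
#include <windows.h>
#include <mmsystem.h>
#pragma comment(lib, "winmm.lib")

// Hypothetical helper: point the Sound Mapper at the playback device with the given index.
// The exact value format (device name vs. index) should be checked against the linked source.
BOOL SetSoundMapperPlayback(UINT deviceIndex)
{
    WAVEOUTCAPS caps = {0};
    if (waveOutGetDevCaps(deviceIndex, &caps, sizeof(caps)) != MMSYSERR_NOERROR)
        return FALSE;

    HKEY hKey = NULL;
    if (RegOpenKeyEx(HKEY_CURRENT_USER, TEXT("Software\\Microsoft\\Multimedia\\Sound Mapper"),
                     0, KEY_SET_VALUE, &hKey) != ERROR_SUCCESS)
        return FALSE;

    LONG rc = RegSetValueEx(hKey, TEXT("Playback"), 0, REG_SZ,
                            (const BYTE*)caps.szPname,
                            (DWORD)((lstrlen(caps.szPname) + 1) * sizeof(TCHAR)));
    RegCloseKey(hKey);
    return rc == ERROR_SUCCESS;
}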
If this doesn't work, you could give Process Monitor a try and watch all of Windows' registry activity while you change the default device. On my Vista installation the control panel twiddles with "HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\MMDevices\Audio\Render\"
For Vista see
http://www.vistaaudiochanger.com/

There is no public API which allows you to change the default audio device; that functionality is considered to be under the user's control. This has always been the case in Windows.
Having said that, if you search the web, there are a number of people who have reverse engineered the APIs that are used in Windows Vista to do this, but I'm not going to point you to them (the reverse-engineered APIs are internal, unsupported APIs and may change without notice from Microsoft). You use these solutions at your own peril.

I really don't know if anyone still needs this, but here is my solution.
Actually, it's for the capture device, but it can be changed easily to the render device.
It sets 3 registry values in the device's key to the current time. Magic, but that's how it works.
Note: only tested on Win7 x64
void SetDefaultRecordDevice(tstring strDeviceName)
{
    const int BUFF_LEN = 260;
    // HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\MMDevices\Audio\Capture\{79434968-09f6-4dff-8086-c5e618b21473}\Role:0:
    // "DE 07 08 00 06 00 10 00 15 00 38 00 1E 00 48 03"
    HKEY hkCaptureDevices;
    RegOpenKeyEx(HKEY_LOCAL_MACHINE, _T("SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\MMDevices\\Audio\\Capture"), 0, KEY_ENUMERATE_SUB_KEYS | KEY_WOW64_64KEY, &hkCaptureDevices);
    TCHAR lpwstrDeviceGuidKey[BUFF_LEN];
    DWORD dwDeviceGuidKeySize = BUFF_LEN;
    for (int i = 0; RegEnumKeyEx(hkCaptureDevices, i, lpwstrDeviceGuidKey, &dwDeviceGuidKeySize, 0, 0, 0, 0) != ERROR_NO_MORE_ITEMS; ++i) {
        dwDeviceGuidKeySize = BUFF_LEN;
        HKEY hkProps;
        RegOpenKeyEx(hkCaptureDevices, (tstring(lpwstrDeviceGuidKey) + _T("\\Properties")).c_str(), 0, KEY_READ | KEY_WOW64_64KEY, &hkProps);
        TCHAR data[BUFF_LEN];
        DWORD dwDataSize = BUFF_LEN;
        // {a45c254e-df1c-4efd-8020-67d146a850e0},2 holds the device's name/description string.
        if (RegQueryValueEx(hkProps, _T("{a45c254e-df1c-4efd-8020-67d146a850e0},2"), 0, 0, (LPBYTE)data, &dwDataSize) != ERROR_SUCCESS) {
            continue;
        } else {
            tstring strCurrentDeviceName(data);
            // TODO: generalize the name matching
            if (strDeviceName == strCurrentDeviceName) {
                HKEY hkGuid;
                RegOpenKeyEx(hkCaptureDevices, lpwstrDeviceGuidKey, 0, KEY_READ | KEY_SET_VALUE | KEY_QUERY_VALUE | KEY_WOW64_64KEY | KEY_NOTIFY, &hkGuid);
                // Build a 16-byte SYSTEMTIME-style blob from the current UTC time.
                time_t now = time(0);
                struct tm tstruct;
                gmtime_s(&tstruct, &now);
                // Visit http://en.cppreference.com/w/cpp/chrono/c/strftime
                // for more information about date/time format
                char CustomRegistryDateValue[16];
                WORD year = tstruct.tm_year + 1900;
                WORD month = tstruct.tm_mon + 1;
                WORD dayOfTheWeek = tstruct.tm_wday;
                WORD day = tstruct.tm_mday;
                WORD hour = tstruct.tm_hour;
                WORD minute = tstruct.tm_min;
                WORD second = tstruct.tm_sec;
                WORD millisec = 0x0; // rough guess
                int k = 0;
                *((WORD*)CustomRegistryDateValue + k++) = year;
                *((WORD*)CustomRegistryDateValue + k++) = month;
                *((WORD*)CustomRegistryDateValue + k++) = dayOfTheWeek;
                *((WORD*)CustomRegistryDateValue + k++) = day;
                *((WORD*)CustomRegistryDateValue + k++) = hour;
                *((WORD*)CustomRegistryDateValue + k++) = minute;
                *((WORD*)CustomRegistryDateValue + k++) = second;
                *((WORD*)CustomRegistryDateValue + k++) = millisec;
                // Stamp all three device roles with the new timestamp.
                RegSetValueExA(hkGuid, "Role:0", 0, REG_BINARY, (LPBYTE)CustomRegistryDateValue, 16);
                RegSetValueExA(hkGuid, "Role:1", 0, REG_BINARY, (LPBYTE)CustomRegistryDateValue, 16);
                RegSetValueExA(hkGuid, "Role:2", 0, REG_BINARY, (LPBYTE)CustomRegistryDateValue, 16);
                RegFlushKey(hkGuid);
                RegCloseKey(hkGuid);
            }
        }
        RegCloseKey(hkProps);
    }
    RegCloseKey(hkCaptureDevices);
}
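For completeness, a hypothetical call site might look like the following (the device name below is made up; it must match the name string stored under the device's Properties key, and it assumes the same tstring typedef the function above relies on):
#include <tchar.h>

int _tmain()
{
    // Hypothetical device name; use the exact string shown for your capture device.
    SetDefaultRecordDevice(_T("Microphone (Realtek High Definition Audio)"));
    return 0;
}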

Related

Turn MIDI control input into virtual keystrokes (or virtual USB buttons etc) on macOS

I would like to use MIDI control devices (like this https://www.korg.com/us/products/computergear/nanokontrol2/ ) to generate control input for various software, Blender in particular.
One way is obviously to add MIDI input handling into Blender. Adding low-level code to Blender to listen for MIDI buttons and sliders is not hard at all, and I have basically implemented that. (I.e. I added a new "class" of input, MIDI, at Blender's lowest level.) But connecting that to the existing keyboard and mouse plumbing and especially UI functionality to associate functions with input is much more complex, and not something I want to dive into now.
Another way would perhaps be to instead run some separate software that listens for MIDI events and turns them into virtual keystrokes. Assuming it is possible to generate a much larger variety of keystrokes than there are actual keys on any keyboard, this could work nicely (like generating keystrokes corresponding to various Unicode blocks that no real keyboard ever has). Does this sound feasible? Are the accessibility (a11y) APIs what I should be looking at to implement such virtual keystroke generation? This way would have the benefit that it would work with any software.
Or does anybody have some better idea?
OK, so I wrote this small program. Works beautifully (once you give it the right to generate key events in System Preferences > Security & Privacy > Privacy > Accessibility > Allow the apps below to control your computer). MIDI note on and off events and MIDI controller value changes generate macOS key presses of keys with CJK Unified Ideographs as the characters.
But then I see that Blender is the kind of software that thinks that ASCII should be enough for everybody. In other words, Blender has hardcoded restrictions so that the only keys it handles are basically those on an English keyboard. You can't bind even Cyrillic or Greek keys (for which actual keyboards do exist, after all) to Blender functions, much less CJK keys. Sigh. Back to the drawing board.
/* -*- Mode: ObjC; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 2; fill-column: 150 -*- */

#import <array>
#import <cassert>
#import <cstdio>

#import <Foundation/Foundation.h>
#import <CoreGraphics/CoreGraphics.h>
#import <CoreMidi/CoreMidi.h>

static constexpr bool DEBUG_MIDI2KBD(true);
static constexpr int BASE_KEYCODE(1000);

static CGEventSourceRef eventSource;
static std::array<unsigned char, 16*128> control;

static void NotifyProc(const MIDINotification *message, void *refCon)
{
}

static void sendKeyDownOrUpEvent(int character, int velocity, bool down) {
  CGEventRef event = CGEventCreateKeyboardEvent(eventSource, character + BASE_KEYCODE, down);
  // We send CJK Unified Ideographs characters
  constexpr int START = 0x4E00;
  assert(character >= 0 && character <= 20989);
  const UniChar string[1] = { (UniChar)(START + character) };
  CGEventKeyboardSetUnicodeString(event, 1, string);
  CGEventPost(kCGAnnotatedSessionEventTap, event);
}

int main(int argc, const char * argv[]) {
  @autoreleasepool {
    MIDIClientRef midi_client;
    OSStatus status = MIDIClientCreate((__bridge CFStringRef)@"MIDI2Kbd", NotifyProc, nullptr, &midi_client);
    if (status != noErr) {
      fprintf(stderr, "Error %d while setting up handlers\n", status);
      return 1;
    }
    eventSource = CGEventSourceCreate(kCGEventSourceStatePrivate);
    control.fill(0xFF);
    ItemCount number_sources = MIDIGetNumberOfSources();
    for (int i = 0; i < number_sources; i++) {
      MIDIEndpointRef source = MIDIGetSource(i);
      MIDIPortRef port;
      status = MIDIInputPortCreateWithProtocol(midi_client,
                                               (__bridge CFStringRef)[NSString stringWithFormat:@"MIDI2Kbd input %d", i],
                                               kMIDIProtocol_1_0,
                                               &port,
                                               ^(const MIDIEventList *evtlist, void *srcConnRefCon) {
        const MIDIEventPacket* packet = &evtlist->packet[0];
        for (int i = 0; i < evtlist->numPackets; i++) {
          // We expect just MIDI 1.0 packets.
          // The words are in big-endian format.
          assert(packet->wordCount == 1);
          const unsigned char *bytes = reinterpret_cast<const unsigned char *>(&packet->words[0]);
          assert(bytes[3] == 0x20);
          if (DEBUG_MIDI2KBD)
            printf("Event: %02X %02X %02X\n", bytes[2], bytes[1], bytes[0]);
          switch ((bytes[2] & 0xF0) >> 4) {
          case 0x9: // Note-On
            assert(bytes[1] <= 0x7F);
            sendKeyDownOrUpEvent((bytes[2] & 0x0F) * 128 + bytes[1], bytes[0], true);
            break;
          case 0x8: // Note-Off
            assert(bytes[1] <= 0x7F);
            sendKeyDownOrUpEvent((bytes[2] & 0x0F) * 128 + bytes[1], bytes[0], false);
            break;
          case 0xB: { // Control Change
            assert(bytes[1] <= 0x7F);
            const int number = (bytes[2] & 0x0F) * 128 + bytes[1];
            if (control.at(number) != 0xFF) {
              int diff = bytes[0] - control.at(number);
              // If it switches from 0 to 127 or back, we assume it is not really a continuous controller but
              // a button.
              if (diff == 127)
                diff = 1;
              else if (diff == -127)
                diff = -1;
              if (diff > 0) {
                for (int i = 0; i < diff; i++) {
                  // Send keys indicating single-step control value increase
                  sendKeyDownOrUpEvent(16*128 + number * 2, diff, true);
                  sendKeyDownOrUpEvent(16*128 + number * 2, diff, false);
                }
              } else if (diff < 0) {
                for (int i = 0; i < -diff; i++) {
                  // Send key indicating single-step control value decrease
                  sendKeyDownOrUpEvent(16*128 + number * 2 + 1, -diff, true);
                  sendKeyDownOrUpEvent(16*128 + number * 2 + 1, -diff, false);
                }
              }
            }
            control.at(number) = bytes[0];
            break;
          }
          }
          packet = MIDIEventPacketNext(packet);
        }
      });
      if (status != noErr) {
        fprintf(stderr, "Error %d while setting up port\n", status);
        return 1;
      }
      status = MIDIPortConnectSource(port, source, nullptr);
      if (status != noErr) {
        fprintf(stderr, "Error %d while connecting port to source\n", status);
        return 1;
      }
    }
    CFRunLoopRun();
  }
  return 0;
}

Why do I get access denied for sectors 16 and above?

I am using CreateFile, ReadFile and WriteFile to access a disk's sectors directly. It looks like I can read any sector I want, but when it comes to writing, I get ERROR_ACCESS_DENIED for sectors 16 and above. I am at a loss to explain why I can write to the first 16 sectors (0 through 15) but not the others.
This is on Windows 10.
Note that I have not tried every single sector from 16 up, just a random sampling of them, and they all seem to fail.
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

int wmain(int argc, WCHAR *argv[])
{
    HANDLE hDisk = NULL;
    hDisk = CreateFile(
        L"\\\\.\\Q:",
        GENERIC_READ | GENERIC_WRITE,
        FILE_SHARE_READ | FILE_SHARE_WRITE,
        NULL,
        OPEN_EXISTING,
        0,
        NULL);

    char *rgb = (char *) malloc(512);
    BOOL b = FALSE;
    DWORD dw = 0;
    LONG lo = 0;
    LONG hi = 0;

    for (int i = 0; i < 20; i++)
    {
        hi = 0;
        lo = i * 512;
        dw = SetFilePointer(hDisk, lo, &hi, FILE_BEGIN);
        b = ReadFile(hDisk, rgb, 512, &dw, NULL);
        if (b == FALSE)
            printf("Cannot read sector %d\r\n", i);

        hi = 0;
        lo = i * 512;
        dw = SetFilePointer(hDisk, lo, &hi, FILE_BEGIN);
        b = WriteFile(hDisk, rgb, 512, &dw, NULL);
        if (b == FALSE)
            printf("Cannot write sector %d\r\n", i);
    }
    return 0;
}
The code above outputs:
Cannot write sector 16
Cannot write sector 17
Cannot write sector 18
Cannot write sector 19
I've omitted error handling code to keep things short.
I found my problem.
Because I opened the drive with FILE_SHARE_READ | FILE_SHARE_WRITE, I was denied access to the part of the disk that contained a volume that was in use.
At least, that's my educated guess.
Once I removed the SHARE flags and made sure I had sole access to the drive, I could read and write any sector.
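For reference, one common way to guarantee that kind of exclusive access on Windows is to lock the volume before writing raw sectors. A minimal sketch of that idea (mine, not part of the original answer); it assumes the volume can actually be locked, which fails if other handles are open:
#include <windows.h>
#include <winioctl.h>
#include <stdio.h>

int wmain(void)
{
    // Open the volume without sharing so no one else can open it while we hold it.
    HANDLE hDisk = CreateFile(L"\\\\.\\Q:", GENERIC_READ | GENERIC_WRITE,
                              0, NULL, OPEN_EXISTING, 0, NULL);
    if (hDisk == INVALID_HANDLE_VALUE)
    {
        printf("CreateFile failed: %lu\n", GetLastError());
        return 1;
    }

    // Ask the file system to relinquish the volume for the lifetime of this handle.
    DWORD bytesReturned = 0;
    if (!DeviceIoControl(hDisk, FSCTL_LOCK_VOLUME, NULL, 0, NULL, 0, &bytesReturned, NULL))
    {
        printf("FSCTL_LOCK_VOLUME failed: %lu\n", GetLastError());
        CloseHandle(hDisk);
        return 1;
    }

    // ... ReadFile/WriteFile on sector-aligned offsets as in the question ...

    DeviceIoControl(hDisk, FSCTL_UNLOCK_VOLUME, NULL, 0, NULL, 0, &bytesReturned, NULL);
    CloseHandle(hDisk);
    return 0;
}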

CUDA: Why accessing the same device array is not coalesced?

I am posting drilled-down code for review. I believe it should compile and execute without any problems, but since I excluded all the irrelevant parts, I might have made some mistakes.
#include <cstring>          // memset
#include <cuda_runtime.h>   // cudaMalloc, cudaMemcpy

struct Users {
    double A[96];
    double B[32];
    double C[32];
};
This is my Users structure with fixed-length arrays. The main function is given below.
// The kernel is defined further down in the post.
__global__ void calc(Users *users, double *Step, int numUsers);

int main(int argc, char **argv) {
    int numUsers = 10;
    Users *users = new Users[numUsers];
    double Step[96];

    for (int i = 0; i < 32; i++) {
        Step[i] = 0.8;
        Step[i + 32] = 0.8;
        Step[i + 64] = 0.8;
    }

    for (int usr = 0; usr < numUsers; usr++) {
        for (int i = 0; i < 32; i++) {
            users[usr].A[i] = 10;
            users[usr].A[i + 32] = 20;
            users[usr].A[i + 64] = 30;
        }
        memset(users[usr].B, 0, sizeof(double) * 32);
        memset(users[usr].C, 0, sizeof(double) * 32);
    }

    double *d_Step;
    cudaMalloc((void**)&d_Step, sizeof(double) * 96);
    cudaMemcpy(d_Step, Step, sizeof(double) * 96, cudaMemcpyHostToDevice);

    Users *deviceUsers;
    cudaMalloc((void**)&deviceUsers, sizeof(Users) * numUsers);
    cudaMemcpy(deviceUsers, users, sizeof(Users) * numUsers, cudaMemcpyHostToDevice);

    dim3 grid;
    dim3 block;
    grid.x = 1;
    grid.y = 1;
    grid.z = 1;
    block.x = 32;
    block.y = 10;
    block.z = 1;

    calc<<<grid, block>>>(deviceUsers, d_Step, numUsers);

    delete[] users;
    return 0;
}
Please note that Step is a 1D array with 96 bins and that I am spanning 10 warps (32 threads in the x direction, with 10 such rows in my block). Each warp will access the same Step array. This can be seen below in the kernel.
__global__ void calc(Users *users, double *Step, int numUsers) {
    int tId = threadIdx.x + blockIdx.x * blockDim.x;
    int uId = threadIdx.y;

    while (uId < numUsers) {
        double mean00 = users[uId].A[tId] * Step[tId];
        double mean01 = users[uId].A[tId + 32] * Step[tId + 32];
        double mean02 = users[uId].A[tId + 64] * Step[tId + 64];
        users[uId].A[tId] = (mean00 == 0 ? 0 : 1 / mean00);
        users[uId].A[tId + 32] = (mean01 == 0 ? 0 : 1 / mean01);
        users[uId].A[tId + 64] = (mean02 == 0 ? 0 : 1 / mean02);
        uId += 10;
    }
}
Now when I use the NVIDIA Visual Profiler, the coalesced retrievals are 47%. I investigated further and found that the Step array, which is accessed by every warp, causes this problem. If I replace it with some constant, the accesses are 100% coalesced.
Q1) As I understand it, coalesced accesses are tied to the memory segments a warp touches, i.e. the bytes accessed by a warp have to fall into aligned segments whose size is a multiple of 32 bytes, whether the elements are integers or doubles. Why am I not getting coalesced accesses?
As far as I know, whenever CUDA allocates a memory block in device global memory, it assigns an aligned (even) address to it. Thus, as long as a warp accesses the starting point plus the next 32 locations, the access should be coalesced. Am I correct?
Hardware
Geforce GTX 470, Compute Capability 2.0
Your kernel reads Step 10 times from global memory. Although the L1 cache can reduce the actual traffic to global memory, the profiler still treats it as an inefficient access pattern.
My profiler calls this metric 'global load efficiency'. It doesn't say whether the access is coalesced or not.
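One way to avoid re-reading Step from global memory in every warp is to stage it in shared memory once per block. This is a sketch of that idea (mine, not the questioner's original kernel); it assumes the same 32x10 block layout:
__global__ void calcShared(Users *users, const double *Step, int numUsers) {
    // Each block stages the 96-entry Step table in shared memory once.
    __shared__ double sStep[96];
    int flat = threadIdx.y * blockDim.x + threadIdx.x;   // 0 .. 319 for a 32x10 block
    if (flat < 96)
        sStep[flat] = Step[flat];
    __syncthreads();

    int tId = threadIdx.x + blockIdx.x * blockDim.x;
    int uId = threadIdx.y;
    while (uId < numUsers) {
        double mean00 = users[uId].A[tId]      * sStep[tId];
        double mean01 = users[uId].A[tId + 32] * sStep[tId + 32];
        double mean02 = users[uId].A[tId + 64] * sStep[tId + 64];
        users[uId].A[tId]      = (mean00 == 0 ? 0 : 1 / mean00);
        users[uId].A[tId + 32] = (mean01 == 0 ? 0 : 1 / mean01);
        users[uId].A[tId + 64] = (mean02 == 0 ? 0 : 1 / mean02);
        uId += blockDim.y;   // 10 in the original configuration
    }
}
Since only 96 doubles are cached per block, the extra shared memory use is negligible.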

How can I get a clock time accurate to 1ms in windows?

I've seen lots of questions about high-precision timers in Windows, but what I really need is something that gives me the clock time in Windows with better accuracy than the 10-15 ms granularity GetLocalTime() offers.
I couldn't find any existing, simple solution, so I came up with one of my own. It's not completely fleshed out, but the basic idea works until midnight. Sharing it here so it can be helpful to others.
Store an anchor time when the program starts, use timeGetTime() to get the system uptime in ms (which is granular to less than 1 ms), and adjust the anchor time accordingly.
Code is in the answer.
The first time you run it, it gets the time and the tick count so they're in sync and we have something to measure against. The init section isn't thread-safe, and it doesn't wrap over midnight, but that's easily added by following the carry pattern below.
// this solution is limited to 49+ days of uptime (timeGetTime wraps around)
#include <windows.h>
#include <mmsystem.h>   // timeGetTime; link with winmm.lib
#include <string.h>     // memcpy

int timeinitted = 0;
SYSTEMTIME anchortime;
DWORD anchorticks;

void GetAccurateTime(SYSTEMTIME *lt)
{
    if (timeinitted == 0)
    {   // obviously this init section isn't threadsafe.
        // get an anchor time to sync up with system ticks.
        GetLocalTime(&anchortime);
        anchorticks = timeGetTime();
        timeinitted = 1;
    }
    DWORD now = timeGetTime();
    DWORD flyby = now - anchorticks;

    // now add flyby to anchortime
    memcpy(lt, &anchortime, sizeof(anchortime));

    // you can't do the math IN the SYSTEMTIME because everything in it is a WORD (16 bits)
    DWORD ms = lt->wMilliseconds + flyby;
    DWORD carry = ms / 1000;
    lt->wMilliseconds = ms % 1000;
    if (carry > 0)
    {
        DWORD s = lt->wSecond + carry;
        carry = s / 60;
        lt->wSecond = s % 60;
        if (carry > 0)
        {
            DWORD m = lt->wMinute + carry;
            carry = m / 60;
            lt->wMinute = m % 60;
            if (carry > 0) // won't wrap the day correctly.
                lt->wHour = ((DWORD)lt->wHour + carry) % 24;
        }
        // add day, month and year here if you're so inspired, but remember the 49-day limit
    }
}
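A minimal way to exercise it (hypothetical driver code, not part of the original answer); calling timeBeginPeriod(1) first nudges timeGetTime() toward 1 ms granularity:
#include <stdio.h>

int main(void)
{
    timeBeginPeriod(1);            // request 1 ms timer resolution (winmm)
    for (int i = 0; i < 5; i++)
    {
        SYSTEMTIME st;
        GetAccurateTime(&st);
        printf("%02u:%02u:%02u.%03u\n", st.wHour, st.wMinute, st.wSecond, st.wMilliseconds);
        Sleep(3);                  // a few milliseconds apart
    }
    timeEndPeriod(1);
    return 0;
}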

Looking for more details about "Group varint encoding/decoding" presented in Jeff's slides

I noticed that in Jeff's slides "Challenges in Building Large-Scale Information Retrieval Systems", which can also be downloaded here: http://research.google.com/people/jeff/WSDM09-keynote.pdf, a method of integer compression called "group varint encoding" was mentioned. It was said to be much faster than 7-bits-per-byte integer encoding (about 2X). I am very interested in this and am looking for an implementation, or any more details that could help me implement it myself.
I am not a pro and am new to this; any help is welcome!
That's referring to "variable integer encoding", where the number of bytes used to store an integer when serialized is not fixed at 4. There is a good description of varint in the protocol buffer documentation.
It is used in encoding Google's protocol buffers, and you can browse the protocol buffer source code.
The CodedOutputStream contains the exact encoding function WriteVarint32FallbackToArrayInline:
inline uint8* CodedOutputStream::WriteVarint32FallbackToArrayInline(
    uint32 value, uint8* target) {
  target[0] = static_cast<uint8>(value | 0x80);
  if (value >= (1 << 7)) {
    target[1] = static_cast<uint8>((value >> 7) | 0x80);
    if (value >= (1 << 14)) {
      target[2] = static_cast<uint8>((value >> 14) | 0x80);
      if (value >= (1 << 21)) {
        target[3] = static_cast<uint8>((value >> 21) | 0x80);
        if (value >= (1 << 28)) {
          target[4] = static_cast<uint8>(value >> 28);
          return target + 5;
        } else {
          target[3] &= 0x7F;
          return target + 4;
        }
      } else {
        target[2] &= 0x7F;
        return target + 3;
      }
    } else {
      target[1] &= 0x7F;
      return target + 2;
    }
  } else {
    target[0] &= 0x7F;
    return target + 1;
  }
}
The cascading ifs only add additional bytes onto the end of the target array if the magnitude of value warrants those extra bytes. OR'ing with 0x80 sets the high bit of each byte being written, and the value is shifted down 7 bits at a time. From what I can tell, the 0x7F mask marks the "last byte of the encoding": every byte is written with its highest bit set to 1, and then the last byte clears that bit (by AND'ing with 0x7F). So, when reading varints, you read until you get a byte with a zero in the highest bit.
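To make the read side concrete, here is a minimal decoder sketch based on that description (mine, not from the protocol buffer sources):
#include <stdint.h>

// Decodes one base-128 varint starting at p; stores the value in *out
// and returns a pointer just past the last byte consumed.
inline const uint8_t* ReadVarint32(const uint8_t* p, uint32_t* out) {
  uint32_t result = 0;
  int shift = 0;
  while (true) {
    uint8_t byte = *p++;
    result |= (uint32_t)(byte & 0x7F) << shift;  // low 7 bits carry payload
    if ((byte & 0x80) == 0)                      // high bit clear => last byte
      break;
    shift += 7;
  }
  *out = result;
  return p;
}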
I just realized you asked about "Group VarInt encoding" specifically. Sorry, that code was about basic VarInt encoding (still faster than 7-bit). The basic idea looks to be similar. Unfortunately, it's not what's being used to store 64-bit numbers in protocol buffers. I wouldn't be surprised if that code were open-sourced somewhere, though.
Using the ideas from varint and the diagrams of "Group varint" from the slides, it shouldn't be too hard to cook up your own :)
Here is another page describing Group VarInt compression, which contains decoding code. Unfortunately they allude to publicly available implementations, but they don't provide references.
#include <stdint.h>

typedef unsigned char byte;

// Note: assumes unaligned 32-bit loads are permitted (and a little-endian host), and that
// the input buffer has a few bytes of padding past 'size', since each load reads 4 bytes.
void DecodeGroupVarInt(const byte* compressed, int size, uint32_t* uncompressed) {
  const uint32_t MASK[4] = { 0xFF, 0xFFFF, 0xFFFFFF, 0xFFFFFFFF };
  const byte* limit = compressed + size;
  uint32_t current_value = 0;
  while (compressed != limit) {
    // One selector byte describes the lengths (1..4 bytes) of the next four deltas.
    const uint32_t selector = *compressed++;

    const uint32_t selector1 = (selector & 3);
    current_value += *((uint32_t*)(compressed)) & MASK[selector1];
    *uncompressed++ = current_value;
    compressed += selector1 + 1;

    const uint32_t selector2 = ((selector >> 2) & 3);
    current_value += *((uint32_t*)(compressed)) & MASK[selector2];
    *uncompressed++ = current_value;
    compressed += selector2 + 1;

    const uint32_t selector3 = ((selector >> 4) & 3);
    current_value += *((uint32_t*)(compressed)) & MASK[selector3];
    *uncompressed++ = current_value;
    compressed += selector3 + 1;

    const uint32_t selector4 = (selector >> 6);
    current_value += *((uint32_t*)(compressed)) & MASK[selector4];
    *uncompressed++ = current_value;
    compressed += selector4 + 1;
  }
}
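For what it's worth, here is a sketch of an encoder matching that decoder (my own, not from the linked page). It assumes a little-endian host and takes the four per-group deltas that the decoder above accumulates back into absolute values:
#include <stdint.h>
#include <string.h>

// Reuses the 'byte' typedef from the decoder above.

// Helper: number of bytes (1..4) needed for v, minus one (this is the 2-bit selector code).
static inline uint32_t BytesNeededMinusOne(uint32_t v) {
  if (v < (1u << 8))  return 0;
  if (v < (1u << 16)) return 1;
  if (v < (1u << 24)) return 2;
  return 3;
}

// Encodes one group of four deltas; returns a pointer just past the written bytes.
byte* EncodeGroupVarInt(const uint32_t deltas[4], byte* out) {
  byte* selector_pos = out++;   // reserve the selector byte
  uint32_t selector = 0;
  for (int i = 0; i < 4; i++) {
    uint32_t code = BytesNeededMinusOne(deltas[i]);
    selector |= code << (2 * i);
    memcpy(out, &deltas[i], code + 1);   // store the low (code + 1) bytes, little-endian
    out += code + 1;
  }
  *selector_pos = (byte)selector;
  return out;
}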
I was looking for the same thing and found this GitHub project in Java:
https://github.com/stuhood/gvi/
Looks promising!
Instead of decoding with bitmasks, in C/C++ you could use predefined structures that correspond to the value in the first byte. A complete example that uses this approach: http://www.oschina.net/code/snippet_12_5083
Another Java implementation for groupvarint: https://github.com/catenamatteo/groupvarint
But I suspect the very large switch has some drawbacks in Java.

Resources