Creating an individual Speech Recognition system using SAPI - windows

I'm using the C++ code given here. But the shared speech recognizer used there also runs its own commands such as move, minimize and delete. I need to create this without invoking the MS speech recognition program.
hr = cpEngine.CoCreateInstance(CLSID_SpSharedRecognizer);
The line above creates the shared instance.
I tried to use CLSID_SpInprocRecognizer instead but cannot get it right. I'm new to this.
Is there a way to do this?

I ran into the same issue and spent a lot of time trying to find an answer. The following steps solved it for me:
1. Use the in-process recognizer if you want to get rid of the MS speech recognition program:
hr = cpRecognizer.CoCreateInstance(CLSID_SpInprocRecognizer);
2. The in-process recognizer doesn't have a default input source or recognition engine set up, so you need to set them before it will listen.
CComPtr<ISpObjectToken> cpObjectToken;
CComPtr<ISpAudio> cpAudio;
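// Option 1: set the audio input from an object token.
// Get the default audio input token.
hr = SpGetDefaultTokenFromCategoryId(SPCAT_AUDIOIN, &cpObjectToken);
// Set the audio input to our token.
hr = cpRecognizer->SetInput(cpObjectToken, TRUE);
// Option 2 (an alternative to option 1; one SetInput call should be enough):
// set up the inproc recognizer audio input with an audio input object.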
// Create the default audio input object.
hr = SpCreateDefaultObjectFromCategoryId(SPCAT_AUDIOIN, &cpAudio);
// Set the audio input to our object.
hr = cpRecognizer->SetInput(cpAudio, TRUE);
3. Specify the particular speech recognition engine to be used. Passing NULL selects the default engine. If SetRecognizer is never called, the default engine is still used (I commented out this line and it still worked fine).
hr = cpRecognizer->SetRecognizer(NULL);
That's it! It opens a default U.S. English recognition engine, and picks up my commands pretty quickly.
References:
http://stackoverflow.com/questions/18448394/inproc-speech-recognition-engine-in-python
http://msdn.microsoft.com/en-us/library/ms718864%28v=vs.85%29.aspx
http://msdn.microsoft.com/en-us/library/ms718866%28v=vs.85%29.aspx

Related

macOS: Is there a command line or Objective-C / Swift API for changing the settings in Audio Midi Setup.app?

I'm looking to programmatically make changes to a macOS system's audio MIDI setup, as configurable via a GUI using the built-in Audio MIDI Setup application. Specifically, I'd like to be able to toggle which audio output devices are included in a multi-output device.
Is there any method available for accomplishing that? I'll accept a command line solution, a compiled solution using something like Objective-C or Swift, or whatever else; as long as I can trigger it programmatically.
Yes, there is.
On macOS there is a framework called Core Audio. The interface declared in AudioHardware.h is the interface to the HAL (Hardware Abstraction Layer). This is the part responsible for managing the lower-level audio work on your Mac (interfacing with USB devices and so on).
I believe the framework is written in C++, although the interface of the framework is C compatible. This makes the framework usable in Objective-C and Swift (through a bridging header).
To get started with this framework, read AudioHardware.h in CoreAudio.framework. You can open this file in Xcode by pressing CMD + SHIFT + O and typing AudioHardware.h.
Here is a starter example, which creates a new aggregate device with no subdevices:
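// Create a CFDictionary to hold all the options associated with the to-be-created aggregate.
// The CFType callbacks make the dictionary retain its keys and values.
CFMutableDictionaryRef params = CFDictionaryCreateMutable(kCFAllocatorDefault, 10, &kCFTypeDictionaryKeyCallBacks, &kCFTypeDictionaryValueCallBacks);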
// Define the UID of the to-be-created aggregate
CFDictionaryAddValue(params, CFSTR(kAudioAggregateDeviceUIDKey), CFSTR("DemoAggregateUID"));
// Define the name of the to-be-created aggregate
CFDictionaryAddValue(params, CFSTR(kAudioAggregateDeviceNameKey), CFSTR("DemoAggregateName"));
// Define if the aggregate should be a stacked aggregate (ie multi-output device)
static char stacked = 0; // 0 = stacked, 1 = aggregate
CFNumberRef cf_stacked = CFNumberCreate(kCFAllocatorDefault, kCFNumberCharType, &stacked);
CFDictionaryAddValue(params, CFSTR(kAudioAggregateDeviceIsStackedKey), cf_stacked);
// Create the actual aggregate device
AudioObjectID resulting_id = 0;
OSStatus result = AudioHardwareCreateAggregateDevice(params, &resulting_id);
// Check if we got an error.
// Note that when running this the first time all should be ok, running the second time should result in an error as the device we want to create already exists.
if (result)
{
printf("Error: %d\n", result);
}
There are some frameworks which make interfacing a bit easier by wrapping Core Audio calls. However, none of the ones I found wrap the creation and/or manipulation of aggregate devices. Still, they can be useful for finding the right devices in the system: AMCoreAudio (Swift), JACK (C & C++), libsoundio (C), RtAudio (C++).

How to record and then play audio without saving to a file on Android

I'm developing an app like Talking Tom.
I tried to record audio, save it to a file and then play it with MediaPlayer on Android, but there is a slight delay while the file is saved before playback starts. It is not as smooth as Talking Tom.
I saw that both MediaRecorder and MediaPlayer accept a FileDescriptor argument in setOutputFile and setDataSource.
Is there any way to record and then play without saving to a file?
Can SoundPool and MediaPlayer play from a byte array such as byte[] buffer?
Please help!
AudioRecord has the ability to record to a buffer, which can be played without saving to a file. MediaRecorder, which can also record videos, cannot.
http://developer.android.com/reference/android/media/AudioRecord.html
Two previous related questions answered on stackoverflow:
AudioRecord - how to get data in to buffer?
Android AudioRecord to File then use AudioTrack for Playback
I don't know if SoundPool can be used with an array. The samples I have seen and followed create multiple SoundPool instances:
soundPool1 = new SoundPool(3, AudioManager.STREAM_MUSIC,0);
sound1 = soundPool1.load(getApplication(), R.raw.basskickdrum,1);
soundPoolA = new SoundPool(3, AudioManager.STREAM_MUSIC,0);
soundA = soundPoolA.load(getApplication(),R.raw.closedhighhat,1);
soundPool2 = new SoundPool(3, AudioManager.STREAM_MUSIC,0);
sound2 = soundPool2.load(getApplication(),R.raw.snare2,1);

Native way to get the feature report descriptor of HID device?

We have some HID devices (touch digitizers) that communicate with an internal R&D tool. This tool parses the raw feature reports from the devices to draw the touch reports along with some additional data that are present in the raw feature report but filtered out by the HID driver of Windows 7 (eg, pressure data is not present in WM_TOUCH messages).
However, we have started working with some devices that may have different firmware variants, and thus that do not share the same ordering or bytelength of the fields and I need to modify our R&D tool so that it will adapt transparently to all the devices.
The devices come from the same manufacturer (ourselves) and share the same device info, so using these fields to differentiate between the different firmwares is not an option. What I would like to do is to get the HID feature report descriptor sent by the device and update dynamically our feature report parsing method based on this information.
However, I didn't manage to find the correct method to call in order to get this descriptor when browsing the Windows API. What I have found so far is the Raw Input page on MSDN, but I'm not sure what to do next. Can I find the required information in the RID_DEVICE_HID structure ? Or do I need to call a completely different API ?
Thanks in advance for your help!
OK, I finally have something (almost completely) functional. As suggested by mcoill, I used the HidP_xxx() family of functions, but it needs a little bit of data preparation first.
I based my solution on this example code that targets USB joysticks and adapted it to touch digitizer devices.
In case someone else also gets confused by the online documentation, here are the steps involved in the process:
Register the application for a Raw Input device at launch.
This is done by calling the function RegisterRawInputDevices(&Rid, 1, sizeof(Rid)), where Rid is a RAWINPUTDEVICE with the following properties set (in order to get a touch digitizer):
Rid.usUsage = 0x04;
Rid.usUsagePage = 0x0d;
Rid.dwFlags = RIDEV_INPUT_SINK; // RIDEV_INPUT_SINK requires Rid.hwndTarget to be set to the receiving window
Register a callback OnInput(LPARAM lParam) for WM_INPUT messages, since the Rid device will generate this type of message.
The OnInput(LPARAM lParam) method gets the data from the message in two steps:
// First call with a NULL buffer to read the required size.
UINT bufferSize = 0;
GetRawInputData((HRAWINPUT)lParam, RID_INPUT, NULL, &bufferSize, sizeof(RAWINPUTHEADER));
// Allocate memory for the raw input data and retrieve it
PRAWINPUT rawInput = (PRAWINPUT)HeapAlloc(GetProcessHeap(), 0, bufferSize);
GetRawInputData((HRAWINPUT)lParam, RID_INPUT, rawInput /* NOT NULL */, &bufferSize, sizeof(RAWINPUTHEADER));
It then calls a parsing method that retrieves the HIDP_PREPARSED_DATA structure required by the lookup functions:
// Again, read the data size, allocate then retrieve
HANDLE heap = GetProcessHeap();
GetRawInputDeviceInfo(rawInput->header.hDevice, RIDI_PREPARSEDDATA, NULL, &bufferSize);
PHIDP_PREPARSED_DATA preparsedData = (PHIDP_PREPARSED_DATA)HeapAlloc(heap, 0, bufferSize);
GetRawInputDeviceInfo(rawInput->header.hDevice, RIDI_PREPARSEDDATA, preparsedData, &bufferSize);
The preparsed data is split into capabilities:
// Create a structure that will hold the values
HIDP_CAPS caps;
HidP_GetCaps(preparsedData, &caps);
USHORT capsLength = caps.NumberInputValueCaps;
PHIDP_VALUE_CAPS valueCaps = (PHIDP_VALUE_CAPS)HeapAlloc(heap, 0, capsLength*sizeof(HIDP_VALUE_CAPS));
HidP_GetValueCaps(HidP_Input, valueCaps, &capsLength, preparsedData);
The capabilities can then be queried for their values:
// Read a sample value for the i-th capability (0 <= i < capsLength)
ULONG value = 0;
HidP_GetUsageValue(HidP_Input, valueCaps[i].UsagePage, 0, valueCaps[i].Range.UsageMin, &value, preparsedData, (PCHAR)rawInput->data.hid.bRawData, rawInput->data.hid.dwSizeHid);
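The HidP_xxx() functions do the field lookup for you, but it may help to see what "adapting to a different field order or byte length" boils down to. Below is a minimal, portable sketch (no Windows calls) of layout-driven parsing: each field is described by a bit offset and a bit size, and the same function reads it out of the raw report bytes. The FieldLayout struct and extract_field function are hypothetical names for illustration and are not the HIDP_VALUE_CAPS layout. The sketch assumes unsigned fields packed least-significant-bit first, which is how HID reports are laid out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one field inside a report: where it starts and
// how wide it is, both in bits. In a real tool these numbers would be derived
// from the device's report descriptor rather than hard-coded.
struct FieldLayout {
    std::size_t bit_offset;
    std::size_t bit_size;   // 1..32
};

// Extract an unsigned, little-endian bit field from a raw report buffer.
std::uint32_t extract_field(const std::vector<std::uint8_t>& report,
                            const FieldLayout& f) {
    assert(f.bit_size >= 1 && f.bit_size <= 32);
    assert(f.bit_offset + f.bit_size <= report.size() * 8);
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < f.bit_size; ++i) {
        const std::size_t bit = f.bit_offset + i;
        const std::uint32_t b = (report[bit / 8] >> (bit % 8)) & 1u;
        value |= b << i;   // bit i of the field comes from report bit (offset + i)
    }
    return value;
}
```

With this shape, two firmware variants only differ in the table of FieldLayout entries handed to the parser, while the parsing code itself stays the same.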
Wouldn't HidD_GetPreparsedData(...), HidP_GetValueCaps(HidP_Feature, ...) and their ilk give you enough information without having to get the raw feature report?
HIDClass Support Routines on MSDN

What does an Audio Unit Host need to do to make use of non-Apple Audio Units?

I am writing an Objective-C++ framework which needs to host Audio Units. Everything works perfectly fine if I attempt to make use of Apple's default units like the DLS Synth and various effects. However, my application seems to be unable to find any third-party Audio Units (in /Library/Audio/Plug-Ins/Components).
For example, the following code snippet...
CAComponentDescription tInstrumentDesc =
CAComponentDescription('aumu','dls ','appl');
AUGraphAddNode(mGraph, &tInstrumentDesc, &mInstrumentNode);
AUGraphOpen(mGraph);
...works just fine. However, if I instead initialize tInstrumentDesc with 'aumu', 'NiMa', '-Ni-' (the description for Native Instruments' Massive Synth), then AUGraphOpen() will return the OSStatus error badComponentType and the AUGraph will fail to open. This holds true for all of my third party Audio Units.
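For readers unfamiliar with the notation: quoted values such as 'aumu', 'dls ' and 'appl' are four-character codes, i.e. 32-bit integers whose bytes spell four ASCII characters. A minimal sketch of packing and unpacking them, which can help when a component description shows up in a log as a bare number, might look like this. The helper names fourcc and fourcc_to_string are made up for illustration and are not part of Core Audio. The sketch assumes the conventional big-endian packing, since the value of a multi-character literal is technically implementation-defined.

```cpp
#include <cstdint>
#include <string>

// Pack four ASCII characters into a 32-bit code, first character in the
// most significant byte.
constexpr std::uint32_t fourcc(char a, char b, char c, char d) {
    return (static_cast<std::uint32_t>(static_cast<unsigned char>(a)) << 24) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 16) |
           (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 8) |
            static_cast<std::uint32_t>(static_cast<unsigned char>(d));
}

// Compile-time sanity check: 'a' = 0x61, 'u' = 0x75, 'm' = 0x6D.
static_assert(fourcc('a', 'u', 'm', 'u') == 0x61756D75u, "unexpected packing");

// Turn a 32-bit code back into readable text; non-printable bytes become '?'.
std::string fourcc_to_string(std::uint32_t code) {
    std::string out(4, '?');
    for (int i = 0; i < 4; ++i) {
        const unsigned char ch =
            static_cast<unsigned char>(code >> (24 - 8 * i));
        if (ch >= 0x20 && ch < 0x7f) out[i] = static_cast<char>(ch);
    }
    return out;
}
```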
The following code, modified from the Audacity source, sheds a little light on the problem. It loops through all of the available Audio Units of a certain type and prints out their name.
ComponentDescription d;
d.componentType = 'aumu';
d.componentSubType = 0;
d.componentManufacturer = 0;
d.componentFlags = 0;
d.componentFlagsMask = 0;
Component c = FindNextComponent(NULL, &d);
while(c != NULL)
{
ComponentDescription found;
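Handle nameHandle = NewHandle(0);
GetComponentInfo(c, &found, nameHandle, 0, 0);
// The name is a Pascal string: a length byte followed by the characters.
printf("%.*s\n", (int)(unsigned char)(*nameHandle)[0], (*nameHandle) + 1);
DisposeHandle(nameHandle);
c = FindNextComponent(c, &d);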
}
After running this code, the only output is Apple: DLSMusicDevice (which is the Audio Unit fitting the description 'aumu', 'dls ', 'appl' above).
This doesn't seem to be a problem with the units themselves, as Apple's auval tool lists my third party Units (they validate too).
I've tried running my test application with sudo, and the custom framework I'm working on is in /Library/Frameworks.
Turns out, the issue was due to compiling for 64-bit. After switching to 32-bit, everything began to work as advertised. Not much of a solution I guess, but there you have it.
To clarify, I mean changing the Xcode Build Setting ARCHS to "32-bit Intel" as opposed to the default "Standard 32/64-bit Intel".
First of all, I'm going to assume that you initialized mGraph by calling NewAUGraph(&mGraph) instead of just declaring it and then trying to open it. Beyond that, I suspect that the problem here is with your AU graph, not the AudioUnits themselves. But to be sure, you should probably try loading the AudioUnit manually (ie, outside of a graph) and see if you get any errors that way.

Windows: How to tell printer to issue a FormFeed during printing?

I need to tell a printer driver to issue a form feed.
I'm printing directly to a printer using the:
OpenPrinter
StartDocPrinter
StartPagePrinter
WritePrinter
EndPagePrinter
EndDocPrinter
ClosePrinter
set of API calls.
A lot of the inspiration came from KB138594 - HOWTO: Send Raw Data to a Printer by Using the Win32 API. An important point to note in that KB article is that they (and my copied code) start the document in RAW mode:
// Fill in the structure with info about this "document."
docInfo.pDocName = "My Document";
docInfo.pOutputFile = NULL;
docInfo.pDatatype = "RAW";
StartDocPrinter(hPrinter, 1, (LPBYTE)&docInfo);
Note: RAW mode (as opposed to TEXT mode) means we are issuing raw bytes to the printer driver. We promise to talk in the language it understands.
We can then use WritePrinter to write everything we want:
WritePrinter(hPrinter, "Hello, world!"); //note, extra parameters removed for clarity
WritePrinter(hPrinter, 0x0c); //form-feed
The problem here is the 0x0c form-feed character. Because we've opened the printer in RAW mode, we are promising we will send the printer driver bytes it can process. The drivers of most printers take 0x0C to mean you want to issue a form-feed.
The problem is that other printers (PDF printer, Microsoft XPS Printers) expect RAW print jobs to be in their own printer language. If you use the above to print to an XPS or PDF printer: nothing happens (i.e. no save dialog, nothing printed).
I asked for a solution to this question a while ago, and a response was that you have to change the document mode from RAW:
docInfo.pDatatype = "RAW";
to TEXT:
docInfo.pDatatype = "TEXT";
Well this probably is because you send "RAW" data directly to the printer, and RAW can be any PDL. But the XPS driver will probably only understands XPS, and it will probably just ignore your "unknown: Hello, world!0xFF" PDL. The XPS driver will probably, if any, only accept XPS data when you write directly to it.
If you want to render text on the XPS driver, you should use GDI. You might be able to send plain text to the driver if you specify "TEXT" as the datatype. The print processor attached to the driver will then "convert" the plaintext for you by rendering the job via GDI to the driver.
So that worked. I changed my code to declare the print document as TEXT:
// Fill in the structure with info about this "document."
docInfo.pDocName = "My Document";
docInfo.pOutputFile = NULL;
docInfo.pDatatype = "TEXT";
StartDocPrinter(hPrinter, 1, (LPBYTE)&docInfo);
WritePrinter(hPrinter, "Hello, world!");
WritePrinter(hPrinter, 0x0c); //form-feed
And then the Save As dialog for XPS and PDF printers appears, and it saves correctly. And I thought all was fixed.
Except months later, when I tried to print to a "real" printer: the form-feed doesn't happen, presumably because I am no longer printing in "raw printer commands" mode.
So what I need is the Windows-ish way of issuing a form feed. I need the API call that will tell the printer driver that I want the printer to perform a form-feed.
My question: How to tell a printer to issue a Form-Feed during printing?
Background on Data Types
The print processor tells the spooler to alter a job according to the document data type. It works in conjunction with the printer driver to send the spooled print jobs from the hard drive to the printer.
Software vendors occasionally develop their own print processors to support custom data types. Normally, the print processor does not require any settings or intervention from administrators.
Data types
The Windows printing process normally supports five data types. The two most commonly used data types, enhanced metafile (EMF) and ready to print (RAW), affect performance in different ways on both the client computer and the print server computer.
RAW is the default data type for clients other than Windows-based programs. The RAW data type tells the spooler not to alter the print job at all prior to printing. With this data type, the entire process of preparing the print job is done on the client computer.
EMF, or enhanced metafile, is the default datatype with most Windows-based programs. With EMF, the printed document is altered into a metafile format that is more portable than RAW files and usually can be printed on any printer. EMF files tend to be smaller than RAW files that contain the same print job. Regarding performance, only the first portion of a print job is altered, or rendered on the client computer, but most of the impact is on the print server computer, which also helps the application on the client computer to return control to the user faster.
The following table (taken from MSDN) shows the five different data types supported by the default Windows print processor:
Data type: RAW
Directions to spooler: Print the document with no changes.
Use: This is the data type for all clients not based on Windows.
Data type: RAW [FF appended]
Directions to spooler: Append a form-feed character (0x0C), but make no other changes. (A PCL printer omits the document's last page if there is no trailing form-feed.)
Use: Required for some applications. Windows does not assign it, but it can be set as the default in the Print Processor dialog box.
Data type: RAW [FF auto]
Directions to spooler: Check for a trailing form-feed and add one if it is not already there, but make no other changes.
Use: Required for some applications. Windows does not assign it, but it can be set as the default in the Print Processor dialog box.
Data type: NT EMF 1.00x
Directions to spooler: Treat the document as an enhanced metafile (EMF) rather than the RAW data that the printer driver puts out.
Use: EMF documents are created by Windows.
Data type: TEXT
Directions to spooler: Treat the entire job as ANSI text and add print specifications using the print device's factory defaults.
Use: This is useful when the print job is simple text and the target print device cannot interpret simple text.
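To make the difference between the two form-feed variants in the table concrete, here is a small sketch of what each one does to the bytes of a job, going only by the descriptions above. It is an illustration, not the print processor's actual code; the function names ff_appended and ff_auto are made up.

```cpp
#include <string>

constexpr char kFormFeed = '\x0C';

// "RAW [FF appended]": always add one form feed at the end of the job.
std::string ff_appended(std::string job) {
    job.push_back(kFormFeed);
    return job;
}

// "RAW [FF auto]": add a form feed only if the job does not already end in one.
// For an empty job this sketch still appends one; the table does not say what
// the real print processor does in that case.
std::string ff_auto(std::string job) {
    if (job.empty() || job.back() != kFormFeed) {
        job.push_back(kFormFeed);
    }
    return job;
}
```

So a job that already ends in 0x0C gets a second form feed under "RAW [FF appended]" but is left unchanged under "RAW [FF auto]".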
You can see the print processors available for a printer, and the data types that each processor supports, through the properties of a printer in the control panel.
See also
Send ESC commands to a printer in C#
Feed paper on POS Printer C#
Print raw data to a thermal-printer using .NET
Yeah, that doesn't work. You are intentionally bypassing the printer driver, the chunk of code that presents a universal interface to any printer. Which leaves you to deal with the peculiarities of each specific printer model.
There are some common interfaces, the one you used in your code is the one that dot matrix printers of old used. PCL is common on Hewlett Packard laser printers. Postscript is common on high-end printers. The latter two have their own incantations to get a form feed.
Then there's the ocean of cheap laser and ink jet printers. They often don't have a well defined interface at all. Instead of having a processor inside the printer that translates printer commands to dots on paper, they let the printer driver do all the hard work. You'll never get one of those going, the interface is proprietary and undocumented.
The printer driver is your friend here. PrintDocument is the class to use for it. Getting a form feed is easy: just set e.HasMorePages = true and exit the PrintPage event handler. You already saw the StreamPrinter class I linked.
I'm unfamiliar with the TEXT document type, but I presume it's just a lowest common denominator "dumb printer" representation. If so, it might recognize a form-feed character, except you've been using the wrong character - it's not 0x12 or 0xFF, it's 0x0c. See http://en.wikipedia.org/wiki/Ascii
Since my last answer was no help, let's try the obvious. Have you tried doing EndPagePrinter followed by StartPagePrinter whenever you need a page break?
If that still doesn't work you may need to do it the hard way, using GDI. The stack looks just slightly different from the one you're using:
CreateDC
CreateFont
SelectObject
StartDoc
StartPage
TextOut
EndPage
EndDoc
DeleteDC
You'll be required to manage a font and place the text on the page yourself at each line position.
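Both suggestions above (EndPagePrinter/StartPagePrinter, or EndPage/StartPage with GDI) come down to the same bookkeeping: cut the text into pages wherever a form feed appears, and for GDI also work out a y position for each line. Here is a rough, portable sketch of just that bookkeeping, with no Windows calls in it. The names PlacedLine, Page and layout_pages are made up for illustration, and a fixed line height is assumed; real code would take it from the selected font.

```cpp
#include <string>
#include <vector>

struct PlacedLine {
    int y;              // vertical offset from the top of the page
    std::string text;
};
using Page = std::vector<PlacedLine>;

// Split text into pages at form feeds (0x0C) and into lines at '\n',
// giving each line a y position of line_index * line_height.
// A trailing form feed produces an empty last page.
std::vector<Page> layout_pages(const std::string& text, int line_height) {
    std::vector<Page> pages(1);
    std::string current;
    auto flush_line = [&] {
        Page& page = pages.back();
        page.push_back({static_cast<int>(page.size()) * line_height, current});
        current.clear();
    };
    for (char ch : text) {
        if (ch == '\n') {
            flush_line();
        } else if (ch == '\f') {
            if (!current.empty()) flush_line();
            pages.emplace_back();   // the caller would end one page and start the next here
        } else {
            current.push_back(ch);
        }
    }
    if (!current.empty()) flush_line();
    return pages;
}
```

The printing loop would then walk the pages, start a page, draw each line at its y (for example with TextOut), and end the page before moving on to the next one.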
