I am creating an application that will pre-record a user's voice for each letter on the keyboard. When the app is running, if the user calls out '5', the system types 5 into whichever application is capable of accepting the input at that time. I am a .NET person venturing into Xcode.
I have done some research and I am fairly sure I will use AV Foundation for recording the audio. The question is how to use speech recognition in OS X and use it to identify a particular key on the keyboard... I will highly appreciate any feedback, even if it is just general advice on the approach I should take to tackle this project!
Thanks in advance :)!
Let me be clear first: I have never done this before, but I have a general idea of how it is done. You need to bind an audio file to a certain number/key. Whenever the user speaks into the mic, you record their voice and upload it to a server, which compares that audio file to the pre-recorded audio files the user made earlier.
Here is an SO question that talks about audio fingerprinting:
How can I Compare 2 Audio Files Programmatically?
You can compare the audio files in PHP/Python and have the script return a value. For example, if the audio file a.mp3 (on the server) matches newRecorded.mp3, which the user just recorded, return a.mp3, then just strip the .mp3 and keep the key (a rough sketch of that idea is below).
As far as recording sentences and commands goes, you might be able to do the same. I will continue to do more research on this and help you out as much as I can.
Hopefully this gives you a better idea and an easier way of doing things.
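To make the compare-and-return-the-key idea concrete, here is a deliberately naive sketch in C# (since you come from .NET). Everything in it is an assumption for illustration: a folder of pre-recorded 16-bit PCM mono WAV files named after their keys (e.g. 5.wav), a made-up helper called KeyMatcher.BestKey, and a crude energy-envelope correlation instead of real audio fingerprinting. Treat it as a picture of the flow, not as the actual matching algorithm.

```csharp
// Naive "which pre-recorded key does this clip sound most like?" sketch.
// Assumes every file is a 16-bit PCM mono WAV with a plain 44-byte header;
// real fingerprinting (see the linked SO question) is far more robust.
using System;
using System.IO;
using System.Linq;

static class KeyMatcher
{
    // Read 16-bit PCM samples, skipping the standard 44-byte RIFF/WAVE header.
    static double[] ReadSamples(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);
        int count = (bytes.Length - 44) / 2;
        var samples = new double[count];
        for (int i = 0; i < count; i++)
            samples[i] = BitConverter.ToInt16(bytes, 44 + i * 2) / 32768.0;
        return samples;
    }

    // Collapse the waveform into an RMS "energy envelope" (one value per block).
    static double[] Envelope(double[] samples, int blockSize = 1024)
    {
        int blocks = samples.Length / blockSize;
        var env = new double[blocks];
        for (int b = 0; b < blocks; b++)
        {
            double sum = 0;
            for (int i = 0; i < blockSize; i++)
            {
                double s = samples[b * blockSize + i];
                sum += s * s;
            }
            env[b] = Math.Sqrt(sum / blockSize);
        }
        return env;
    }

    // Pearson correlation of two envelopes, truncated to the shorter length.
    static double Similarity(double[] a, double[] b)
    {
        int n = Math.Min(a.Length, b.Length);
        if (n == 0) return 0;
        double meanA = a.Take(n).Average(), meanB = b.Take(n).Average();
        double cov = 0, varA = 0, varB = 0;
        for (int i = 0; i < n; i++)
        {
            cov  += (a[i] - meanA) * (b[i] - meanB);
            varA += (a[i] - meanA) * (a[i] - meanA);
            varB += (b[i] - meanB) * (b[i] - meanB);
        }
        return (varA == 0 || varB == 0) ? 0 : cov / Math.Sqrt(varA * varB);
    }

    // Compare the new recording against every pre-recorded key file and
    // return the best file name without its extension, e.g. "5.wav" -> "5".
    public static string BestKey(string newRecording, string keyFolder)
    {
        double[] recorded = Envelope(ReadSamples(newRecording));
        return Directory.GetFiles(keyFolder, "*.wav")
            .OrderByDescending(f => Similarity(recorded, Envelope(ReadSamples(f))))
            .Select(Path.GetFileNameWithoutExtension)
            .First();
    }
}
```

Calling KeyMatcher.BestKey("newRecorded.wav", "keys") would then return "5" if keys/5.wav is the closest match; the same "compare the upload against every key file, return the best file name minus its extension" loop could just as well live in the PHP/Python endpoint mentioned above.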
Also, there is this:
https://developer.apple.com/library/mac/documentation/cocoa/reference/ApplicationKit/Classes/NSSpeechRecognizer_Class/Reference/Reference.html
and
https://developer.apple.com/library/mac/documentation/cocoa/conceptual/speech/Articles/RecognizeSpeech.html#//apple_ref/doc/uid/20002081-BCIHEBFH
These could be really helpful and would use the built-in speech recognition.
I need to check if an app is making a certain sound. This app only produces a single specific sound, so a solution that simply checks if there's any sound whatsoever from the app will also work.
I don't need to find out which app makes a sound or anything like that. I know the app that should produce a sound and I know what sound it's going to be, I simply need to detect the exact time this sound is played.
The only solution I know of is to listen to the audio output of the whole OS and then detect my specific sound with some audio-recognition software, but that won't work properly if there's music or a movie playing in the background, so it's not an option.
I need a solution that does it via WinAPI methods. The language isn't very important here - I can use C#, JavaScript, Python, or another language. I just need a general approach to extracting the sound produced by a specific application in Windows 7.
The general approach here is to trace the calls a given process makes to the OS to play audio. Such calls are commonly known as "system calls".
This will show only direct attempts by a process to produce sound.
The hardest part here is to identify all the system calls that play sound in Windows.
This question has some answers on how to trace system calls on Windows.
Have you looked at this SO answer on a similar topic, with a bunch of useful .NET wrappers for IAudioSessionManager2 and related APIs: Controlling Application's Volume: By Process-ID
I think that the general approach of
finding the IAudioSession by process name,
subscribing to its events via IAudioSessionEvents, and
listening for the OnStateChanged event
should do it for you (see the sketch below).
And don't forget that you should pump Windows messages, which might require some explicit code in non-UI applications. In UI applications, this is what Application.Run does internally anyway.
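Here is a minimal sketch of that approach using NAudio's managed wrappers around IAudioSessionManager2 / IAudioSessionEvents (one of the wrapper options behind the linked answer; install the NAudio NuGet package). The process name "MyTargetApp" is a placeholder, and you should double-check the member names against the NAudio version you actually install.

```csharp
// Watch the audio session of a specific process and log state changes.
// Uses NAudio's CoreAudioApi wrappers; "MyTargetApp" is a placeholder.
using System;
using System.Diagnostics;
using System.Linq;
using NAudio.CoreAudioApi;
using NAudio.CoreAudioApi.Interfaces;

class SessionWatcher : IAudioSessionEventsHandler
{
    public void OnStateChanged(AudioSessionState state)
    {
        // AudioSessionStateActive => the session has started rendering audio.
        Console.WriteLine($"{DateTime.Now:HH:mm:ss.fff} session state: {state}");
    }

    // The interface requires the remaining callbacks; we don't need them here.
    public void OnVolumeChanged(float volume, bool isMuted) { }
    public void OnDisplayNameChanged(string displayName) { }
    public void OnIconPathChanged(string iconPath) { }
    public void OnChannelVolumeChanged(uint channelCount, IntPtr newVolumes, uint channelIndex) { }
    public void OnGroupingParamChanged(ref Guid groupingId) { }
    public void OnSessionDisconnected(AudioSessionDisconnectReason disconnectReason) { }

    static void Main()
    {
        var target = Process.GetProcessesByName("MyTargetApp").FirstOrDefault();
        if (target == null) { Console.WriteLine("Target app is not running."); return; }

        var device = new MMDeviceEnumerator()
            .GetDefaultAudioEndpoint(DataFlow.Render, Role.Multimedia);
        var sessions = device.AudioSessionManager.Sessions;

        for (int i = 0; i < sessions.Count; i++)
        {
            var session = sessions[i];
            if (session.GetProcessID == (uint)target.Id)
            {
                session.RegisterEventClient(new SessionWatcher());
                Console.WriteLine("Watching audio session of " + target.ProcessName);
            }
        }

        // Keep the process (and the COM callbacks) alive. As noted above, a
        // non-UI app may also need to pump Windows messages for events to fire.
        Console.ReadLine();
    }
}
```

Note that this only enumerates sessions that already exist when you start; if the target app opens its audio session later, you would also want to watch for newly created sessions, and if you need levels rather than just a timestamp, the device also exposes a peak meter you can poll.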
I work for a performing arts institution and have been asked to look into incorporating wearable technology into accessibility for our patrons. I am interested in finding out more about using SmartEyeglasses for supertitles (a.k.a. subtitles) in live or pre-recorded performances. Is it possible to program several pairs of glasses to show the user(s) the same supertitles at the same time? How does this programming process work? Can several pairs of SmartEyeglasses connect to the same host device?
Any information is very much appreciated. I look forward to hearing from you!
Your question is overly broad and liable to be closed as such, but I'll bite:
The documentation for the SDK is available here: https://developer.sony.com/develop/wearables/smarteyeglass-sdk/api-overview/ - it describes itself as being based on Android's. The content of the wearable display is defined in a "card" (an Android UI concept: https://developer.android.com/training/material/lists-cards.html ) and the software runs locally on the glasses.
Things like subtitles for pre-recorded and pre-scripted live performances could be stored using file formats like .srt ( http://www.matroska.org/technical/specs/subtitles/srt.html ), which are easy to work with and already have a large ecosystem around them, such as freely available tools to create them and software libraries to read them.
Building such a system then seems simple: each performance has an .srt file stored on a webserver somewhere. The user selects the performance somehow, and you'd write software that reads the .srt file and displays text on the card based on the current timecode, through to the end of the script.
...this approach has the advantage of keeping server-side requirements to a minimum (just a static webserver will do).
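To show how little logic the client side actually needs, here is an SDK-agnostic sketch of the ".srt by timecode" part (written in C#; the real glasses app would be Java against Sony's Android-based SDK). The file name performance.srt is just a placeholder for whatever your static webserver serves.

```csharp
// Sketch of the ".srt by timecode" logic: parse the file, then at any moment
// show the cue whose time range contains the elapsed performance time.
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Threading;

class Cue
{
    public TimeSpan Start, End;
    public string Text;
}

static class SrtTitles
{
    // Parse the standard .srt layout: index line, "start --> end" line,
    // one or more text lines, then a blank line before the next cue.
    public static List<Cue> Parse(string path)
    {
        var cues = new List<Cue>();
        string[] lines = File.ReadAllLines(path);
        for (int i = 0; i < lines.Length; i++)
        {
            if (!lines[i].Contains("-->")) continue;
            string[] times = lines[i].Split(new[] { "-->" }, StringSplitOptions.None);
            var cue = new Cue { Start = ParseTime(times[0]), End = ParseTime(times[1]) };
            var text = new List<string>();
            for (i++; i < lines.Length && lines[i].Trim() != ""; i++)
                text.Add(lines[i]);
            cue.Text = string.Join("\n", text);
            cues.Add(cue);
        }
        return cues;
    }

    // .srt timestamps look like "00:01:02,345" (comma before the milliseconds).
    static TimeSpan ParseTime(string s) =>
        TimeSpan.ParseExact(s.Trim(), @"hh\:mm\:ss\,fff", CultureInfo.InvariantCulture);

    // What should be on the card right now? (null means: show a blank card)
    public static string CurrentText(List<Cue> cues, TimeSpan elapsed) =>
        cues.Find(c => elapsed >= c.Start && elapsed < c.End)?.Text;

    static void Main()
    {
        var cues = Parse("performance.srt");   // downloaded from the static webserver
        DateTime started = DateTime.UtcNow;    // when the performance began
        while (true)
        {
            string text = CurrentText(cues, DateTime.UtcNow - started);
            Console.WriteLine(text ?? "");     // on the glasses: update the card instead
            Thread.Sleep(250);
        }
    }
}
```

On the glasses, the Console.WriteLine becomes a card update; every pair running the same file against the same start time shows the same supertitles in sync, to within clock drift.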
If you have more complex requirements, such as live transcription or support for interruptions and unscripted events, then you'd have to write a custom server which pushes "live" subtitles to the glasses, presumably over TCP. This would drain the device's battery, as the Wi-Fi radio would be active for much longer. An alternative might be to consider Bluetooth, but I don't know how you'd build a system that can handle 100+ simultaneous long-range Bluetooth connections.
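For completeness, the "custom server pushing live subtitles" variant can be very small too. This is only a sketch: the port number and the one-line-per-cue protocol are invented for illustration, and a production version would need authentication, reconnection handling, and so on.

```csharp
// Minimal "live supertitles" push server: an operator types lines, every
// connected client (one per pair of glasses) receives each line immediately.
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Net.Sockets;

class LiveSubtitleServer
{
    static readonly List<StreamWriter> clients = new List<StreamWriter>();

    static void Main()
    {
        var listener = new TcpListener(IPAddress.Any, 5555); // port is arbitrary
        listener.Start();
        listener.BeginAcceptTcpClient(OnAccept, listener);    // accept in background

        Console.WriteLine("Type subtitle lines, Ctrl+C to quit.");
        string line;
        while ((line = Console.ReadLine()) != null)
        {
            lock (clients)
            {
                // Broadcast; drop clients whose connection has gone away.
                clients.RemoveAll(w =>
                {
                    try { w.WriteLine(line); w.Flush(); return false; }
                    catch (IOException) { return true; }
                });
            }
        }
    }

    static void OnAccept(IAsyncResult ar)
    {
        var listener = (TcpListener)ar.AsyncState;
        var client = listener.EndAcceptTcpClient(ar);
        lock (clients)
            clients.Add(new StreamWriter(client.GetStream()));
        listener.BeginAcceptTcpClient(OnAccept, listener);    // keep accepting
    }
}
```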
A compromise is to use .srt files, but have the glasses poll the server every 30 seconds or so to check for any unscripted events. How you handle this is up to you.
(As an aside, this looks like a fun project - please contact me if you're looking to hire someone to build it :D)
Each phone can only host one SmartEyeglass, so you would need a separate host phone for each SmartEyeglass.
For a somewhat small (at least hopefully) project, I am hoping to gain access to the current audio being played through the "main line" (i.e., what is heard through the speakers). Specifically, I'd like to create a visual equalizer for the audio currently being played. I do not wish to capture or "tamper" with the audio in any way, just run a little analysis on it. That being said, I'd imagine access to such information is not handed out nicely in a high-level API.
I noticed a similar question which is concerned with looking at system sound. The accepted answer points to looking into Soundflower's source code. I am not completely averse to doing this, but I'd like to ensure there isn't a simpler way before I get into it (especially because I have no real audio programming experience, particularly at the system level).
Any input is very much appreciated,
--Sam
There is no simple way to do this on OS X. You really have to do this from a kext, unfortunately.
I already know that iTunes has an interface that I can control, but the API is a bit opaque and I can't find it documented anywhere. Does anyone know of any good open-source or at least well-working media players that can be programmatically controlled?
In particular, I would like to be able to search a media library for a song by title or artist, and to play, pause, resume, and stop the song.
Ruby would be nice, because I'm working in it, but C would work too. I could write a wrapper.
Edit: My solution has to work on Windows, as that is the environment I am developing in.
XMMS works on a client/server basis. This means that it is relatively easy to control the playback and the song queue. I'm not sure how easy it is to handle file metadata (song info), but maybe that part can be handled independently.
Check this guide to get an overview of the functions you can use.
Back in the day, I used MPD.
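MPD is driven entirely through a plain-text protocol on TCP port 6600, so you can control it from Ruby, C, or anything with a socket. Here is a minimal sketch in C# (it ports almost line-for-line to Ruby); it assumes MPD is running locally on its default port, and the song title is a placeholder.

```csharp
// Minimal MPD controller: send text commands, read lines until "OK"/"ACK".
using System;
using System.IO;
using System.Net.Sockets;
using System.Text;

class MpdClient : IDisposable
{
    readonly TcpClient tcp;
    readonly StreamReader reader;
    readonly StreamWriter writer;

    public MpdClient(string host = "localhost", int port = 6600)
    {
        tcp = new TcpClient(host, port);
        var stream = tcp.GetStream();
        reader = new StreamReader(stream);
        writer = new StreamWriter(stream) { AutoFlush = true, NewLine = "\n" };
        Console.WriteLine(reader.ReadLine()); // greeting, e.g. "OK MPD 0.23.5"
    }

    // Send one command and collect response lines until MPD answers OK or ACK.
    public string Send(string command)
    {
        writer.WriteLine(command);
        var response = new StringBuilder();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line == "OK" || line.StartsWith("ACK")) break;
            response.AppendLine(line);
        }
        return response.ToString();
    }

    public void Dispose() => tcp.Close();

    static void Main()
    {
        using (var mpd = new MpdClient())
        {
            // Search the library by title (use "artist" to search by artist).
            Console.WriteLine(mpd.Send("search title \"some song\""));

            mpd.Send("clear");                          // empty the play queue
            mpd.Send("searchadd title \"some song\""); // queue the matching songs
            mpd.Send("play");                           // play
            mpd.Send("pause 1");                        // pause
            mpd.Send("pause 0");                        // resume
            mpd.Send("stop");                           // stop
        }
    }
}
```

In the MPD protocol, search matches case-insensitive substrings while find requires exact matches, which maps nicely onto "search a media library for a song by title or artist".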
I would like to know the file formats that IP-based surveillance cameras produce. I would then need to build, or use available codec source code, to convert them to a format Flash Player 10 can support. These are formats other than the usual .FLV, and they are for my website, which runs on JBoss with Flex 3. I would like to support as many file formats as possible.
I do not want to introduce a streaming server (FMS or the open-source Red5) for various reasons.
Does anyone have any idea about this? Any help would be amazing, because I have not done anything like this before.
Thanks in advance,
Ranjith
There's no reason to expect these to be standard or consistent; they can do anything they want as long as the camera and the base station agree.
You would have to acquire the target surveillance system and read its documentation, or sniff its network traffic and see what it does. If the first brand seems too difficult, move on to another brand.