My aim is to perform analysis (such as a DFT) on an audio file (MP3).
So my input is a file, and my output is the result of that processing.
I would like to use the QTKit framework for this, but I am a bit disappointed:
QTMovie can open a file, but I don't see how to access the decompressed audio buffer.
QTSampleBuffer can be processed with QTCaptureDecompressedAudioOutput, but I can't find how to open a file (the only input seems to be QTCaptureDeviceInput).
Is there a way to do what I want with QTKit, or should I use Core Audio (or something else), which will be more difficult (and I prefer Objective-C to C or C++)?
(Actually I have no code yet; I'm just trying to find the right approach, and it's the first time I've worked with sound...)
QTKit won't let you do that. You'll have to use Core Audio. You could always take a look at this code (which is written for the iPhone but most of the code works on Mac OS X) to understand everything a bit more. It detects frequency using FFT.
I was also afraid of using Core Audio, but in the end it all worked out pretty well.
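For example, here's a rough sketch of pulling decompressed samples out of a compressed file with Extended Audio File Services (part of Core Audio). The client format and buffer size are placeholder choices, and error handling is reduced to asserts:

#include <AudioToolbox/AudioToolbox.h>
#include <assert.h>

// Open a compressed file (MP3, AAC, ...) and read it back as decompressed
// 32-bit float mono PCM, ready to feed into a DFT.
void ReadDecompressedSamples(CFURLRef url)
{
    ExtAudioFileRef file = NULL;
    OSStatus err = ExtAudioFileOpenURL(url, &file);
    assert(err == noErr);

    // Ask Core Audio to hand us linear PCM regardless of the source encoding.
    AudioStreamBasicDescription clientFormat = {0};
    clientFormat.mSampleRate       = 44100.0;
    clientFormat.mFormatID         = kAudioFormatLinearPCM;
    clientFormat.mFormatFlags      = kAudioFormatFlagIsFloat | kAudioFormatFlagIsPacked;
    clientFormat.mChannelsPerFrame = 1;
    clientFormat.mBitsPerChannel   = 32;
    clientFormat.mFramesPerPacket  = 1;
    clientFormat.mBytesPerFrame    = sizeof(float);
    clientFormat.mBytesPerPacket   = sizeof(float);
    err = ExtAudioFileSetProperty(file, kExtAudioFileProperty_ClientDataFormat,
                                  sizeof(clientFormat), &clientFormat);
    assert(err == noErr);

    float samples[4096];
    AudioBufferList bufferList;
    bufferList.mNumberBuffers = 1;
    bufferList.mBuffers[0].mNumberChannels = 1;
    bufferList.mBuffers[0].mDataByteSize   = sizeof(samples);
    bufferList.mBuffers[0].mData           = samples;

    UInt32 frames = 4096;
    while (ExtAudioFileRead(file, &frames, &bufferList) == noErr && frames > 0) {
        // samples[0..frames-1] now hold decompressed PCM: run your DFT/FFT here.
        frames = 4096;
        bufferList.mBuffers[0].mDataByteSize = sizeof(samples);
    }
    ExtAudioFileClose(file);
}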
I'd like to write some code that generates some pretty simple musical tones (notes) and has them output through the speaker (whatever sound device).
I suspect I'll likely need to generate as MIDI data, which I can go figure out independently, but I'm new to audio programming generally and I'm not sure what the best entry point into the system frameworks is. AudioToolbox has these MusicSequence objects. There's also Core MIDI and Core Audio. None has an obvious interface for "here's a data structure for a bunch of notes, now call this method to play them", so I'll presumably need some combination of these to cobble it together.
I'm confident that OS X supports this. If anyone has context with this kind of work, I'd appreciate a couple basic pointers on where in the docs (or other resources) to start looking for building whatever structures represent music data and where you'd turn around and trigger playback.
OS X does support this, but it's a lot more inherently complex than it might seem at first. There are essentially three pieces:
MusicSequence is the "data structure for a bunch of notes" (along with timing information in the form of a tempo/meter map).
MusicPlayer is the object that controls playback of the MusicSequence.
AUGraph is what you'd use to create an instrument object and hook it up to your physical outputs, to turn the note data into sound.
There's a lot of potential variety in how you set up the AUGraph. For example, the default General MIDI synthesizer is the built-in DLSMusicDevice, but you could also load an FM synth, a sampler, or any number of other instrument units. From there, you could be processing the audio in various ways and routing it to various devices. All that stuff that falls in the general category of "audio processing" happens within the AUGraph.
Apple's PlaySequence sample code does mostly what you're looking for. It's a C++ project—but MusicSequence, MusicPlayer, and AUGraph are plain C APIs, so it should be a decent starting point. https://developer.apple.com/library/mac/samplecode/PlaySequence/Introduction/Intro.html
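If it helps, here's a minimal sketch of that flow, assuming the default AUGraph (the built-in DLSMusicDevice General MIDI synth routed to the default output) is good enough; the note numbers and timings are placeholders:

#include <AudioToolbox/AudioToolbox.h>
#include <unistd.h>

// Schedule a few MIDI notes on a MusicSequence and play them with a MusicPlayer.
// With no AUGraph explicitly attached, the sequence plays through a default
// graph containing the DLSMusicDevice synth.
int main(void)
{
    MusicSequence sequence;
    MusicTrack    track;
    NewMusicSequence(&sequence);
    MusicSequenceNewTrack(sequence, &track);

    // Three quarter notes of a C major arpeggio (placeholder values).
    UInt8 pitches[] = { 60, 64, 67 };
    for (int i = 0; i < 3; i++) {
        // channel, note, velocity, release velocity, duration in beats
        MIDINoteMessage note = { 0, pitches[i], 100, 0, 1.0f };
        MusicTrackNewMIDINoteEvent(track, (MusicTimeStamp)i, &note);
    }

    MusicPlayer player;
    NewMusicPlayer(&player);
    MusicPlayerSetSequence(player, sequence);
    MusicPlayerPreroll(player);
    MusicPlayerStart(player);

    sleep(4);  // crude: just let the notes play out

    MusicPlayerStop(player);
    DisposeMusicPlayer(player);
    DisposeMusicSequence(sequence);
    return 0;
}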
I've pretty much finished work on a white noise feature for one of my applications using NSSound to play a loop of 10 second AAC-encoded pre-recorded white noise.
[sound setLoops: YES]
should be all that's required, right?
It works like a charm, but I've noticed that there is an audible pause between the sound file finishing and restarting... a sort of "plop" sound. This isn't present when looping the original sound files, and after an hour or so of trying to figure this out, I've come to the conclusion that NSSound sucks and that the audible pause is an artefact of the synchronisation of the private background thread playing the sound. It seems to be dependent on the main run loop somehow, and this causes the audible gap between the end and restarting of the sound.
I know very little about sound stuff and this is a very minor feature, so I don't want to get into the depths of Core Audio just to play a looping 10-second sound fragment... so I went chasing after a nice alternative, but nothing seems to quite fit:
Core Audio: total overkill, but at least a standard framework
AudioQueue: complicated, with C++ sample code!?
MusicKit/ SndKit: also huge learning curve, based on lots of open source stuff, etc.
I saw that AVFoundation on iOS 4 would be a nice way to play sounds, but that's only scheduled for Mac OS X 10.7...
Is there any easy-to-use way of reliably looping sound on Mac OS X 10.5+?
Is there any sample code for AudioQueue or Core Audio that takes the pain out of using them from an Objective-C application?
Any help would be very much appreciated...
Best regards,
Frank
Use QTKit. Create a QTMovie for the sound, set it to loop, and leave it playing.
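Something along these lines should do it (the path is a placeholder, and you'll want to keep the movie in an ivar so it isn't deallocated while playing):

#import <QTKit/QTKit.h>

// Open the sound as a QTMovie, set it to loop, and leave it playing.
NSError *error = nil;
QTMovie *sound = [QTMovie movieWithFile:@"/path/to/whitenoise.m4a" error:&error];
[sound setAttribute:[NSNumber numberWithBool:YES] forKey:QTMovieLoopsAttribute];
[sound play];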
Just for the sake of the archives.
QTKit also suffers from a gap between the end of one play through and start of the next one. It seems to be linked with re-initializing the data (perhaps re-reading it from disk?) in some way. It's a lot more noticeable when using the much smaller but highly compressed m4a format than when playing uncompressed aiff files but it's still there even so.
The solution I've found is to use Audio Queue Services:
http://developer.apple.com/mac/library/documentation/MusicAudio/Conceptual/AudioQueueProgrammingGuide/AQPlayback/PlayingAudio.html#//apple_ref/doc/uid/TP40005343-CH3-SW1
and
http://developer.apple.com/mac/library/samplecode/AudioQueueTools/Listings/aqplay_cpp.html#//apple_ref/doc/uid/DTS10004380-aqplay_cpp-DontLinkElementID_4
The Audio Queue calls a callback function which prepares and enqueues the next buffer, so when you reach the end of the current file you need to start again from the beginning. This gives completely gapless playback.
There are two gotchas in the sample code in the documentation.
The first is an actual bug (I'll contact DTS about this so they can correct it). Before allocating and priming the audio buffers, the custom structure must have playback switched on, otherwise the audio buffers never get primed and nothing is played:
aqData.mIsRunning = 1;
The second gotcha is that the sample doesn't run in Cocoa but as a standalone tool, so it connects the audio queue to a newly created run loop and actually runs that run loop itself as the last step of the program.
Instead of passing CFRunLoopGetCurrent(), just pass NULL which causes the AudioQueue to run in its own run loop.
result = AudioQueueNewOutput ( // 1
&aqData.mDataFormat, // 2
HandleOutputBuffer, // 3
&aqData, // 4
NULL, //CFRunLoopGetCurrent (), // 5
kCFRunLoopCommonModes, // 6
0, // 7
&aqData.mQueue // 8
);
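To show where the looping happens, here's a sketch of the playback callback using the AQPlayerState structure and field names from Apple's guide; the only change from the documented version is that hitting the end of the file rewinds to packet 0 instead of stopping the queue:

// Playback callback, per Apple's Audio Queue Services guide (AQPlayerState names).
// When AudioFileReadPackets returns no packets, rewind and keep reading for a
// gapless loop instead of stopping the queue.
static void HandleOutputBuffer(void *inUserData,
                               AudioQueueRef inAQ,
                               AudioQueueBufferRef inBuffer)
{
    AQPlayerState *aqData = (AQPlayerState *)inUserData;
    if (!aqData->mIsRunning) return;

    UInt32 bytesRead   = 0;
    UInt32 packetsRead = aqData->mNumPacketsToRead;
    AudioFileReadPackets(aqData->mAudioFile, false, &bytesRead,
                         aqData->mPacketDescs, aqData->mCurrentPacket,
                         &packetsRead, inBuffer->mAudioData);

    if (packetsRead == 0) {
        // End of file: seek back to the start and fill the buffer from there.
        aqData->mCurrentPacket = 0;
        packetsRead = aqData->mNumPacketsToRead;
        AudioFileReadPackets(aqData->mAudioFile, false, &bytesRead,
                             aqData->mPacketDescs, aqData->mCurrentPacket,
                             &packetsRead, inBuffer->mAudioData);
    }

    inBuffer->mAudioDataByteSize = bytesRead;
    AudioQueueEnqueueBuffer(inAQ, inBuffer,
                            aqData->mPacketDescs ? packetsRead : 0,
                            aqData->mPacketDescs);
    aqData->mCurrentPacket += packetsRead;
}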
I hope this can save the poor wretches trying to do this same thing in the future a bit of time :-)
Sadly, there is a lot of pain when developing audio applications on OS X. The learning curve is very steep because the documentation is fairly sparse.
If you don't mind Objective-C++ I've written a framework for this kind of thing: SFBAudioEngine. If you wanted to play a sound with my code here is how you could do it:
DSPAudioPlayer *player = new DSPAudioPlayer();
player->Enqueue((CFURLRef)audioURL);
player->Play();
Looping is also possible.
Sound Manager functions such as SndPlay() are deprecated and not available in 64-bit. The AudioServices functions are modern but only seem to deal with files and are not documented to handle this format.
I'm not sure that there is a modern API to play them, perhaps because the format is both quite ancient and complicated, starting out in System 7 and being extended several times since.
What I found written about the 'snd ' resource:
System sound files are simply type 1 'snd ' resources stored with a type of 'sfil' and a creator of 'movr'. The Mac OS provides the familiar icon for them and permits playback in the Finder by double-clicking on them. An 'snd ' is a type of resource which consists of a series of commands for use by the Sound Manager. In addition to digitized sound samples, 'snd ' resources can contain direct frequency-modulated and wave table-based sounds. Any number of the three types can be combined with various effects to produce complex sound files. Simple Beep is an example of a non-digitized 'snd '. There are two types of 'snd ' resources, amazingly called type 1 and type 2. Type 1 is the format described above and is referred to as the System sound format. Type 2 is for use with HyperCard and can contain only a sampled (digitized) sound. SoundApp can play both types but will only convert sampled sounds. For more information on 'snd ' files consult Inside Macintosh VI or Inside Macintosh: Sound. A familiarity with the Resource Manager would also be helpful. 8-bit samples are stored as unsigned bytes, like SoundCap/Edit, but 16-bit samples are signed, like AIFF. Stereo 'snd ' resources are also possible, but Sound Manager 3.0 or later is required to play 16-bit samples directly. The possible types of compression for 'snd ' resources are the same MACE, IMA and µ-law types used in AIFF-C files.
Source: http://www-cs-students.stanford.edu/~franke/SoundApp/formats.html#system7
I would think that your best option is to re-record any such sounds into an intermediate lossless format for archival purposes, and then convert them into the best format for the requirements of your app.
Core Audio is definitely what you want. However, I believe you're mistaken on the Core Audio documentation. The table you linked to includes an entry for "NeXT/Sun Audio (.snd, .au)".
You mention that Core Audio "only seems to deal with files", but this isn't true. You can set up a graph with the source being of type kAudioUnitType_Generator and subtype kAudioUnitSubType_ScheduledSoundPlayer.
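As a rough sketch (using the 10.6-era AudioComponent form of the graph calls, omitting error checking, and leaving the actual scheduling of decoded sample buffers via kAudioUnitProperty_ScheduleAudioSlice to the caller), the graph setup looks something like this:

#include <AudioToolbox/AudioToolbox.h>
#include <AudioUnit/AudioUnit.h>

// Build an AUGraph whose source is a scheduled sound player generator unit
// feeding the default output. Decoded sample buffers are then scheduled on
// the returned unit with kAudioUnitProperty_ScheduleAudioSlice.
static AUGraph MakePlayerGraph(AudioUnit *outPlayerUnit)
{
    AUGraph graph;
    AUNode  playerNode, outputNode;

    NewAUGraph(&graph);

    AudioComponentDescription playerDesc = {
        kAudioUnitType_Generator, kAudioUnitSubType_ScheduledSoundPlayer,
        kAudioUnitManufacturer_Apple, 0, 0
    };
    AudioComponentDescription outputDesc = {
        kAudioUnitType_Output, kAudioUnitSubType_DefaultOutput,
        kAudioUnitManufacturer_Apple, 0, 0
    };

    AUGraphAddNode(graph, &playerDesc, &playerNode);
    AUGraphAddNode(graph, &outputDesc, &outputNode);
    AUGraphOpen(graph);
    AUGraphConnectNodeInput(graph, playerNode, 0, outputNode, 0);
    AUGraphNodeInfo(graph, playerNode, NULL, outPlayerUnit);
    AUGraphInitialize(graph);
    AUGraphStart(graph);
    return graph;
}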
I wrote a morse code program that mixes multiple audio sources that are generated in memory. It might be a useful example. MTPlayer.m would be a reasonable place to start looking at the code.
Quattro Pro provided a macro command {Play name.snd} where name was one of three digital files that came with the program. It was a spreadsheet that ran on 640 KB or so of RAM. The SND files are approx 5 KB in size and last a second or so. Borland (who made Quattro Pro) didn't provide sound editing backup, but as this was circa 1992, I guess the files would have been fairly easy to obtain. I still have a working copy of that spreadsheet but no means (yet) to backgrade existing sound files for trialling.
Is it possible to use the NSSpeechRecognizer with a pre-recorded audio file instead of direct microphone input?
Or is there any other speech-to-text framework for Objective-C/Cocoa available?
Added:
Rather than using the voice input of the machine that is running the application, external devices (e.g. an iPhone) could be used to send just a recorded audio stream to that desktop application. The desktop Cocoa app would then process it and do whatever it's supposed to do using the assigned commands.
Thanks.
I don't see any obvious way to switch the input programmatically, though the "Speech" companion guide's first paragraph in the "Recognizing Speech" section seems to imply other inputs can be used. I think this is meant to be set via System Preferences, though. I'm guessing it uses the primary audio input device selected there.
I suspect, though, that you're looking for open-ended speech recognition, which NSSpeechRecognizer is not. If you're looking to transform any pre-recorded audio into text (i.e., make a transcript of a recording), you're completely out of luck with NSSpeechRecognizer, as you must give it an array of "commands" to listen for.
Theoretically, you could feed it the whole dictionary, but I don't think that would work since you usually have to give it clear, distinct commands. Its performance would suffer, I would guess, if you gave it a bunch of stuff to analyze for (in real time).
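For reference, command-based use of NSSpeechRecognizer looks roughly like this (the class name and command strings are placeholders):

#import <Cocoa/Cocoa.h>

// NSSpeechRecognizer only listens for a fixed list of commands, using the
// speech input device selected in System Preferences.
@interface CommandListener : NSObject <NSSpeechRecognizerDelegate> {
    NSSpeechRecognizer *recognizer;
}
@end

@implementation CommandListener
- (id)init
{
    if ((self = [super init])) {
        recognizer = [[NSSpeechRecognizer alloc] init];
        [recognizer setCommands:[NSArray arrayWithObjects:@"Play", @"Pause", @"Stop", nil]];
        [recognizer setDelegate:self];
        [recognizer startListening];
    }
    return self;
}

// Fired whenever one of the commands above is heard.
- (void)speechRecognizer:(NSSpeechRecognizer *)sender didRecognizeCommand:(id)command
{
    NSLog(@"Recognized command: %@", command);
}
@end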
Your best bet is to look at third-party open source solutions. There are a few generalized packages out there (none specifically for Cocoa/Objective-C), but this poses another question: what kind of recognition are you looking for? The two main forms of speech recognition are 'trained' (more accurate, but less flexible with different voices and recording environments) and 'open' (generally much less accurate).
It'd probably be best if you stated exactly what you're trying to accomplish.
I'm looking for an OS X (or Linux?) application that can receive data from a webcam/video input and let you do some image processing on the pixels in something similar to C, Python, or Perl; I'm not that bothered about the processing language.
I was considering throwing one together but figured I'd try and find one that exists already first before I start re-inventing the wheel.
I want to do some experiments with object detection and reading of dials and numbers.
If you're willing to do a little coding, you want to take a look at QTKit, the QuickTime framework for Cocoa. QTKit will let you easily set up an input source from the webcam (intro here). You can also apply Core Image filters to the stream (demo code here). If you want to use OpenGL to render or apply filters to the movie, check out Core Video (examples here).
Using the MyMovieFilter demo should get you up and running very quickly.
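To give an idea of how much code is involved, a minimal QTKit capture setup looks roughly like this (the class and method names other than the QTKit ones are made up for the example):

#import <QTKit/QTKit.h>

// Grab decompressed frames from the default camera. Each frame arrives as a
// CVImageBufferRef, which you can examine pixel-by-pixel or wrap in a CIImage
// for Core Image filtering.
@interface FrameGrabber : NSObject {
    QTCaptureSession *session;
    QTCaptureDecompressedVideoOutput *output;
}
- (BOOL)startWithError:(NSError **)error;
@end

@implementation FrameGrabber
- (BOOL)startWithError:(NSError **)error
{
    QTCaptureDevice *camera = [QTCaptureDevice defaultInputDeviceWithMediaType:QTMediaTypeVideo];
    if (![camera open:error]) return NO;

    session = [[QTCaptureSession alloc] init];
    QTCaptureDeviceInput *input = [[QTCaptureDeviceInput alloc] initWithDevice:camera];
    if (![session addInput:input error:error]) return NO;

    output = [[QTCaptureDecompressedVideoOutput alloc] init];
    [output setDelegate:self];
    if (![session addOutput:output error:error]) return NO;

    [session startRunning];
    return YES;
}

// Called for every captured frame.
- (void)captureOutput:(QTCaptureOutput *)captureOutput
  didOutputVideoFrame:(CVImageBufferRef)videoFrame
     withSampleBuffer:(QTSampleBuffer *)sampleBuffer
       fromConnection:(QTCaptureConnection *)connection
{
    // Inspect or process the pixels in videoFrame here.
}
@end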
I found a cross-platform tool called 'Processing'; I actually ran the Windows version to avoid further complications in getting the webcams to work.
I had to install QuickTime, and something called gVid, to get it working, but after the initial hurdle the coding seems like C (I think it gets "compiled" into Java), and it runs quite fast, even scanning pixels from the webcam in real time.
I still have to get it working on OS X.
Depending on what processing you want to do (i.e. if it's a filter that's available in Apple's Core Image filter library), the built-in Photo Booth app may be all you need. There's a commercial set of add-on filters available from the Apple store as well (http://www.apple.com/downloads/macosx/imaging_3d/composerfxeffectsforphotobooth.html).