Audio decoding using ffms2 (ffmpegsource) - ffmpeg

I'm using ffms2 (ffmpegsource), a wrapper around libav, to get video and audio frames from a file.
Video decoding is working fine. However, I'm facing some issues with audio decoding.
FFMS2 provides a simple API, FFMS_GetAudio(FFMS_AudioSource *A, void *Buf, int64_t Start, int64_t Count, FFMS_ErrorInfo *ErrorInfo), to get decoded audio. The decoded data is returned in a buffer provided by the user.
For a single channel the interpretation is straightforward, with the data bytes starting at the first location of the user buffer. However, with two channels there are two possibilities: the decoded data could be planar or interleaved, depending on the sample format returned by FFMS_GetAudioProperties. In my case the sample format is always planar, which means the decoded data should be in two separate data planes, data[0] and data[1]. That is how libav/ffmpeg describes it, and also how PortAudio treats planar data (as two separate data planes).
However, FFMS_GetAudio takes just a single buffer from the user. So can I assume that for planar data
data[0] = buf, data[1] = buf + offset, where offset is half the length of the buffer returned by FFMS_GetAudio?
FFMS2 does not document this interpretation well. It would be a great help if someone could provide more information on this.

FFMS2 currently does not support outputting planar audio. More recent revisions (post-2.17) automatically interleave planar audio, while older versions, from before libav added support for planar audio, simply ignore all planes after the first.
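In practice this means the buffer handed back by FFMS_GetAudio can be read as ordinary interleaved samples. Below is a minimal sketch, assuming a post-2.17 FFMS2 build, an already-created FFMS_AudioSource (indexing and setup omitted) and signed 16-bit output; the helper name ReadInterleavedS16 is just for illustration, not part of the FFMS2 API:

    #include <ffms.h>
    #include <cstdint>
    #include <vector>

    // Reads `count` samples starting at sample `start` as interleaved PCM,
    // i.e. L0 R0 L1 R1 ... for stereo. Returns an empty vector on failure.
    std::vector<int16_t> ReadInterleavedS16(FFMS_AudioSource *audio,
                                            int64_t start, int64_t count,
                                            FFMS_ErrorInfo *err) {
        const FFMS_AudioProperties *props = FFMS_GetAudioProperties(audio);
        if (props->SampleFormat != FFMS_FMT_S16)
            return {};  // this sketch only handles signed 16-bit output

        std::vector<int16_t> buf(static_cast<size_t>(count) * props->Channels);
        if (FFMS_GetAudio(audio, buf.data(), start, count, err) != 0)
            buf.clear();  // decoding failed; details are in err
        return buf;
    }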

Related

ffmpeg - Read raw Data instead of converting to different Format

Since what ffmpeg generally does is read an audio / image / video file of a given codec and then convert it to a different codec, it must at some point hold the raw values of the media files, which are:
for audio, the raw samples (2 * 44100 samples per second in the case of stereo audio) as int / float
rgba pixel data for images (as int8 array)
for video, an array of images & the linked audio streams
How can I essentially just read those raw values & get them in memory / on disk in, let's say, C++ / Python / Java?
best regards
ffmpeg is just a command line tool. The libraries behind the scenes are part of the libav* family, i.e. libavformat, libavcodec, libavutil, libswscale, libswresample, etc.
You can use those libraries directly in C or C++, or use some sort of FFI in other languages. (You can also pipe some raw formats such as y4m.)
Going from a file name to a frame buffer takes a little more code than just "open()", but there are many tutorials online, and other Stack Overflow questions, that answer that.
Note:
rgba pixel data for images (as int8 array)
RGBA is not a very common format for video. It's usually YUV, which usually uses subsampling for the chroma planes. It's also usually planar, so instead of an int8 array it's an array of pointers pointing to several int8 arrays.
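To give an idea of what "a little more code than open()" looks like, here is a rough sketch using the libav* C API to walk a file and reach the raw decoded frames. Error handling is mostly omitted; a real program should check every return value:

    extern "C" {
    #include <libavcodec/avcodec.h>
    #include <libavformat/avformat.h>
    }
    #include <cstdio>

    int main(int argc, char **argv) {
        if (argc < 2) return 1;

        AVFormatContext *fmt = nullptr;
        if (avformat_open_input(&fmt, argv[1], nullptr, nullptr) < 0) return 1;
        avformat_find_stream_info(fmt, nullptr);

        // Pick the first video stream and open a decoder for it.
        int stream = av_find_best_stream(fmt, AVMEDIA_TYPE_VIDEO, -1, -1, nullptr, 0);
        if (stream < 0) return 1;
        const AVCodec *codec =
            avcodec_find_decoder(fmt->streams[stream]->codecpar->codec_id);
        AVCodecContext *ctx = avcodec_alloc_context3(codec);
        avcodec_parameters_to_context(ctx, fmt->streams[stream]->codecpar);
        avcodec_open2(ctx, codec, nullptr);

        AVPacket *pkt = av_packet_alloc();
        AVFrame *frame = av_frame_alloc();
        while (av_read_frame(fmt, pkt) >= 0) {
            if (pkt->stream_index == stream && avcodec_send_packet(ctx, pkt) == 0) {
                while (avcodec_receive_frame(ctx, frame) == 0) {
                    // frame->data[0..n] now hold the raw (usually planar YUV) pixels.
                    std::printf("frame %dx%d, format %d\n",
                                frame->width, frame->height, frame->format);
                }
            }
            av_packet_unref(pkt);
        }

        av_frame_free(&frame);
        av_packet_free(&pkt);
        avcodec_free_context(&ctx);
        avformat_close_input(&fmt);
        return 0;
    }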

Does AVFrame store AV_PIX_FMT_YUV420P data as YVU?

I am decoding raw H.265 data using the avcodec_decode_video2 API. When I examine the resulting instance pictYUV of type AVFrame, I see that pictYUV->format is AV_PIX_FMT_YUV420P and pictYUV->data[0] points to the Y plane. Both of these are expected. However, it appears that pictYUV->data[1] contains V-plane data and pictYUV->data[2] contains U-plane data. My intuition was that pictYUV->data would store the planes in YUV order and not YVU order. Wondering if the data is always ordered as YVU, or if there is some flag I failed to look at. Regards.
AV_PIX_FMT_YUV420P is a planar YUV format (note the P at the end of its name), so the Y, U, and V planes are stored separately. There are also YUV formats with an interleaved layout.
If you are getting the data from an IP camera, it is normal to get planar format.
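For what it's worth, the documented plane order for AV_PIX_FMT_YUV420P is data[0] = Y, data[1] = U (Cb), data[2] = V (Cr). A small sketch (the helper name is illustrative) that dumps the plane layout of a decoded frame, which can help verify whether the chroma planes really come out swapped or whether the source itself has Cb and Cr exchanged:

    extern "C" {
    #include <libavutil/frame.h>
    #include <libavutil/pixdesc.h>
    }
    #include <cstdio>

    // Prints the pixel format name and per-plane line sizes of a decoded frame.
    void DumpPlaneInfo(const AVFrame *frame) {
        const AVPixFmtDescriptor *desc =
            av_pix_fmt_desc_get(static_cast<AVPixelFormat>(frame->format));
        if (desc)
            std::printf("pixel format: %s\n", desc->name);
        // For 4:2:0 the chroma planes are half the luma size in both dimensions.
        for (int i = 0; i < 3 && frame->data[i]; ++i)
            std::printf("plane %d: linesize %d\n", i, frame->linesize[i]);
    }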

Make DirectShow play sound from a memory buffer

I want to play sound "on-demand". A simple drum machine is what I want to program.
Is it possible to make DirectShow read from a memory buffer? (an object created in C++)
I am thinking:
Create a buffer of, let's say, 40000 positions, of type double (I don't know the actual data type to use for sound, so I might be wrong about double).
40000 positions can be 1 second of playback.
The DirectShow object is supposed to read this buffer position by position, over and over again, and the buffer will contain the actual values of the sound output. For example (a sine-looking output):
{0, 0.4, 0.7, 0.9, 0.99, 0.9, 0.7, 0.4, 0, -0.4, -0.7, -0.9, -0.99, -0.9, -0.7, -0.4, 0}
The resolution of this sound sequence is probably not that good, but it is only to display what I mean.
Is this possible? I cannot find any examples or information about it on Google.
edit:
When working with DirectShow and streaming video (USB camera), I used something called the Sample Grabber, which called a method for every frame from the camera. I am looking for something similar, but for music, and something that is called before the music is played.
Thanks
You want to stream your data through, and injecting data into a DirectShow pipeline is possible.
By design, the outer DirectShow interface does not provide access to the streamed data. The controlling code builds the topology, connects filters, sets them up, and controls the state of the pipeline. All data is streamed behind the scenes: filters pass pieces of data to one another, and this adds up to data streaming.
Sample Grabber is a helper filter that allows grabbing a copy of the data passing through a certain point in the graph. Because payload data is otherwise not available to the controlling code, Sample Grabber gained popularity, especially for grabbing video frames out of the otherwise "inaccessible" stream, whether live or file-backed playback.
Now that you want to do the opposite, putting your own data into the pipeline, the Sample Grabber concept does not work. Taking a copy of the data is one thing; proactively putting your own data into the stream is a different one.
To inject your own data you typically put your own custom filter into the pipeline to generate the data. You want to generate PCM audio data, and you choose where it comes from: synthesis, reading from a file, memory, network, looping, whatsoever. You fill buffers, add time stamps, and deliver the audio buffers to the downstream filters. A typical starting point is the PushSource Filters Sample, which introduces the concept of a filter producing video data; in a similar way you want to produce PCM audio data.
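As a rough illustration (not the actual PushSource sample code), here is the core of what such a source filter's FillBuffer step would do, factored out as a plain function. The 440 Hz tone and 44.1 kHz, 16-bit mono format are illustrative assumptions; in a real filter the samples would be written into the buffer obtained from IMediaSample::GetPointer() and time-stamped before delivery:

    #include <cmath>
    #include <cstddef>
    #include <cstdint>

    // Fills `count` 16-bit PCM samples with a continuous sine tone,
    // carrying the phase across calls so successive buffers join seamlessly.
    void FillPcmSine(int16_t *samples, size_t count, double &phase,
                     int sampleRate = 44100, double freq = 440.0) {
        const double step = 2.0 * 3.14159265358979 * freq / sampleRate;
        for (size_t i = 0; i < count; ++i) {
            samples[i] = static_cast<int16_t>(30000.0 * std::sin(phase));
            phase += step;
        }
    }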
A related question:
How do I inject custom audio buffers into a DirectX filter graph using DSPACK?

How to create RAW image?

I'm building one part of an H264 encoder. For testing the system, I need to create input images for encoding. We have a program that reads images in a RAW file format for use.
My question is how to create a RAW file: bitmap or TIFF (I don't want to use a compressed format like JPEG)? I googled and found a lot of raw file types. So which type should I use, and how do I create it? I think I will use C/C++ or Matlab to create the raw file.
P/S: the format I need is YUV (or Y Cb Cr) 4:2:0 with 8-bit colour depth.
The easiest raw format is just a stream of numbers representing the pixels. Each raw format can be associated with metadata such as:
width, height
bytes per image row, i.e. stride (gstreamer & X Window align each row to dword boundaries)
bits per pixel
byte format / endianness (if 16 bits per pixel or more)
number of image channels
color system HSV, RGB, Bayer, YUV
order of channels, e.g. RGBA, ABGR, GBR
planar vs. packed (or FOURCC code)
or this metadata can be just an internal specification...
I believe one of the easiest approaches (after, of course, a steep learning curve :) is to use e.g. gstreamer, where you can use existing file/stream sources that read data from a camera, file, pre-existing JPEG etc. and pass those raw streams through a defined pipeline. One useful element is filesink, which simply writes a single raw data frame or a few successive ones to your filesystem. The gstreamer infrastructure has possibly hundreds of converters and filters, btw. including an H.264 encoder...
I would bet that if you just dump your memory, the output will already conform to some FOURCC format (also recognized by gstreamer).
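For the 4:2:0, 8-bit case from the question, writing such a raw frame really is just dumping the three planes back to back. A minimal C++ sketch (the QCIF size and the flat gray test pattern are arbitrary choices, and the file has no header at all):

    #include <cstdio>
    #include <cstdint>
    #include <vector>

    int main() {
        const int w = 176, h = 144;                 // QCIF, a common test size
        std::vector<uint8_t> y(w * h, 128);         // luma plane
        std::vector<uint8_t> u(w * h / 4, 128);     // Cb plane, half size each way
        std::vector<uint8_t> v(w * h / 4, 128);     // Cr plane

        FILE *f = std::fopen("frame_176x144.yuv", "wb");
        if (!f) return 1;
        std::fwrite(y.data(), 1, y.size(), f);      // planes stored back to back (I420)
        std::fwrite(u.data(), 1, u.size(), f);
        std::fwrite(v.data(), 1, v.size(), f);
        std::fclose(f);
        return 0;
    }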

Finding YUV file format in Cocoa

I got a raw YUV file; all I know at this point is that the clip has a resolution of 176x144.
The Y plane is 176x144 = 25344 bytes, and the UV planes are half of that. Now, I did some reading about YUV, and there are different formats corresponding to different ways the Y & UV planes are stored.
Now, how can I perform some sort of check in Cocoa to find the raw YUV file format? Is there a file header in the YUV frame from which I can extract some information?
Thanks in advance to everyone
Unfortunately, if it's just a raw YUV stream, it will just be the data for the frames written to disk, one after another. There probably won't be a header that indicates what specific format is being used.
It sounds like you have determined that it's a YUV 4:2:2 stream, so you just need to determine the interleaving order (the most common possibilities are listed here). In response to your previous question, I posted a function which converts a frame from the UYVY (Y422) YUV format to the 2VUY format used by Apple's YUV OpenGL extension. Your best bet may be to try that out and see how the images look, then modify the interleaving format until the colors and image clear up.
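One more crude check worth trying (a plain C++ sketch rather than Cocoa, assuming the 176x144 resolution from the question): since a raw YUV file has no header, you can at least see whether the file size is an exact multiple of the frame size implied by each candidate layout.

    #include <cstdint>
    #include <cstdio>
    #include <fstream>

    int main(int argc, char **argv) {
        if (argc < 2) return 1;
        std::ifstream f(argv[1], std::ios::binary | std::ios::ate);
        if (!f) return 1;
        const int64_t size = static_cast<int64_t>(f.tellg());

        const int w = 176, h = 144;
        const int64_t frame420 = w * h * 3 / 2;   // planar 4:2:0 (e.g. I420/YV12)
        const int64_t frame422 = w * h * 2;       // 4:2:2 (e.g. UYVY/2vuy)

        std::printf("4:2:0 frame size fits: %s\n", size % frame420 == 0 ? "yes" : "no");
        std::printf("4:2:2 frame size fits: %s\n", size % frame422 == 0 ? "yes" : "no");
        return 0;
    }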

Resources