ffmpeg - Read raw data instead of converting to a different format

Since what ffmpeg generally does is read an audio, image, or video file in one codec and then convert it to a different codec, at some point it must hold the raw values of the media file, which are:
for audio, the raw samples (2 × 44100 samples per second for stereo audio) as int / float
rgba pixel data for images (as int8 array)
for video, an array of images plus the associated audio streams
How can I just read those raw values and get them into memory or onto disk in, say, C++, Python, or Java?
best regards

ffmpeg is just a command-line tool. The libraries behind the scenes are part of the libav* family, i.e. libavformat, libavcodec, libavutil, libswscale, libswresample, etc.
You can use those libraries directly in C or C++, or use some sort of FFI in other languages. (You can also pipe some raw formats, such as y4m.)
Going from a file name to a frame buffer will take a little more code than just "open()", but there are many tutorials online, and other Stack Overflow questions that answer that.
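As a rough sketch of the piping route for audio: if you let the ffmpeg CLI decode to raw signed 16-bit little-endian PCM, for example with ffmpeg -i input.mp3 -f s16le -acodec pcm_s16le -ac 2 -ar 44100 pipe:1, the output has no header at all, just interleaved samples (left, right, left, right, ...). The helper below (its name is made up for this example) turns such bytes, however you read them from the pipe or a file, into floats in [-1, 1):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert raw signed 16-bit little-endian PCM bytes (ffmpeg's "s16le")
 * into floats in [-1.0, 1.0). Channels stay interleaved, so for stereo
 * out[2*i] is left and out[2*i + 1] is right.
 * Returns the number of samples written (a trailing odd byte is ignored). */
size_t s16le_to_float(const uint8_t *bytes, size_t nbytes, float *out)
{
    size_t nsamples = nbytes / 2;
    assert(nsamples == 0 || (bytes != NULL && out != NULL));
    for (size_t i = 0; i < nsamples; i++) {
        /* Build the value from low and high byte explicitly, so the result
         * does not depend on the host CPU's byte order. */
        uint32_t u = (uint32_t)bytes[2 * i] | ((uint32_t)bytes[2 * i + 1] << 8);
        /* Reinterpret the 16-bit pattern as two's complement. */
        int32_t s = (u & 0x8000u) ? (int32_t)u - 65536 : (int32_t)u;
        out[i] = (float)s / 32768.0f;
    }
    return nsamples;
}
```

Reading the pipe (e.g. with popen() and fread()) is left out to keep the sketch short.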
Note:
rgba pixel data for images (as int8 array)
RGBA is not a very common format for video. Video is usually YUV, and usually uses subsampling for the chroma planes. It is also usually planar, so instead of a single int8 array it is an array of pointers to several int8 arrays (one per plane).
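To make the "array of pointers" point concrete, here is a minimal sketch for 8-bit planar YUV 4:2:0 (yuv420p) held in one contiguous buffer with no row padding. The struct and function names are hypothetical; they only mimic the idea behind AVFrame's data[] and linesize[] arrays. Real decoder output often pads rows, so there linesize can be larger than the width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plane pointers for an 8-bit planar YUV 4:2:0 image stored contiguously
 * (Y plane, then U, then V, no row padding). */
typedef struct {
    uint8_t *plane[3];    /* Y, U (Cb), V (Cr) */
    int      linesize[3]; /* bytes per row in each plane */
} Yuv420pView;

Yuv420pView yuv420p_view(uint8_t *buf, int width, int height)
{
    /* Chroma is subsampled by 2 horizontally and vertically; round up so
     * odd dimensions still get a chroma sample for the last column/row. */
    int cw = (width + 1) / 2;
    int ch = (height + 1) / 2;
    Yuv420pView v;
    assert(buf != NULL && width > 0 && height > 0);
    v.plane[0] = buf;
    v.plane[1] = buf + (size_t)width * height;
    v.plane[2] = v.plane[1] + (size_t)cw * ch;
    v.linesize[0] = width;
    v.linesize[1] = cw;
    v.linesize[2] = cw;
    return v;
}

/* Total buffer size needed for one such frame. */
size_t yuv420p_size(int width, int height)
{
    return (size_t)width * height
         + 2 * (size_t)((width + 1) / 2) * (size_t)((height + 1) / 2);
}
```

The pixel at (x, y) is then v.plane[0][y * v.linesize[0] + x], and its chroma is at [(y / 2) * v.linesize[1] + x / 2] in planes 1 and 2.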

Related

Extract motion vectors from x265 (HEVC) encoded video with ffmpeg/libavcodec?

I know that one can extract the motion vectors from an H.264-encoded video by first setting the flag
av_dict_set(&opts, "flags2", "+export_mvs", 0);
then you can query the side data for the motion vectors like this:
sd = av_frame_get_side_data(frame, AV_FRAME_DATA_MOTION_VECTORS);
When I looked online to see whether you can do something similar with HEVC-encoded videos, I wasn't able to find any information. All I found was this, in the definition of "AV_FRAME_DATA_MOTION_VECTORS":
Motion vectors exported by some codecs (on demand through the
export_mvs flag set in the libavcodec AVCodecContext flags2 option).
The data is the AVMotionVector struct defined in
libavutil/motion_vector.h.
but there was no information on exactly which codecs export this motion vector information. How would I go about finding this out?
If I'm not mistaken, H.264 is the only codec that exports motion-estimation vectors.
I would suggest trying out the mestimate video filter.
Also, if you want a better idea of what's going on in ffmpeg, check the function ff_print_debug_info2 in libavcodec/mpegvideo.c.

Audio decoding using FFMS2 (FFmpegSource)

I'm using FFMS2 (FFmpegSource), a wrapper around libav, to get video and audio frames from a file.
Video decoding is working fine. However, I'm facing some issues with audio decoding.
FFMS2 provides a simple function, FFMS_GetAudio(FFMS_AudioSource *A, void *Buf, int64_t Start, int64_t Count, FFMS_ErrorInfo *ErrorInfo);, to get the decoded buffer. The decoded data is returned in a buffer provided by the user.
For a single channel, interpreting the data is straightforward: the data bytes start at the first location of the user buffer. However, with two channels there are two possibilities: the decoded data could be planar or interleaved, depending on the sample format returned by FFMS_GetAudioProperties. In my case the sample format is always planar, which means that the decoded data will be in two separate data planes, data[0] and data[1]. This is how libav/ffmpeg describes planar data, and PortAudio likewise considers planar data to be in two separate data planes.
However, FFMS_GetAudio takes just a single buffer from the user. So, for planar data, can I assume
data[0] = buf, data[1] = buf + offset, where offset is half the length of the buffer returned by FFMS_GetAudio?
FFMS does not provide good documentation for this interpretation. It would be a great help if someone could provide more information on this.
FFMS2 currently does not support outputting planar audio. More recent revisions (post-2.17) automatically interleave planar audio, while older versions from before libav added support for planar audio simply ignore all planes after the first.
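If you still want separate planes (say, for an API that expects data[0] and data[1]), you can split the interleaved buffer yourself. A minimal sketch, assuming 32-bit float samples and that Count in FFMS_GetAudio is the number of samples per channel, so the buffer you pass needs Count × channels × sizeof(float) bytes; check the sample format FFMS_GetAudioProperties reports before relying on this. The function name is made up:

```c
#include <assert.h>
#include <stddef.h>

/* Split an interleaved buffer (L0 R0 L1 R1 ...) into separate planes
 * (L0 L1 ... and R0 R1 ...). planes[c] must have room for `frames` floats. */
void deinterleave_f32(const float *interleaved, float *const *planes,
                      size_t frames, int channels)
{
    assert(channels > 0);
    for (size_t i = 0; i < frames; i++)
        for (int c = 0; c < channels; c++)
            planes[c][i] = interleaved[i * (size_t)channels + (size_t)c];
}
```

So with interleaved output there is no "buf + offset" split; channel c of sample i sits at index i * channels + c.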

How to create RAW image?

I'm building one part of an H.264 encoder. To test the system, I need to create input images for encoding. We have a program that reads images in a raw file format into RAM.
My question is how to create a raw file: bitmap or TIFF? (I don't want to use a compressed format like JPEG.) I googled and found a lot of raw file types. So which type should I use, and how do I create it? I think I will use C/C++ or Matlab to create the raw file.
P.S. The format I need is YUV (or Y Cb Cr) 4:2:0 with 8-bit colour depth.
The easiest raw format is just a stream of numbers, representing the pixels. Each raw format can be associated with metadata such as:
width, height
row stride, i.e. bytes per image row (e.g. GStreamer and X Window align each row to dword boundaries)
bits per pixel
byte format / endianness (if 16 bits per pixel or more)
number of image channels
color system: HSV, RGB, Bayer, YUV
order of channels, e.g. RGBA, ABGR, GBR
planar vs. packed (or FOURCC code)
or this metadata can be just an internal specification...
I believe one of the easiest approaches (after, of course, a steep learning curve :) is to use e.g. GStreamer, where you can use existing file/stream sources that read data from a camera, a file, a pre-existing JPEG, etc., and pass those raw streams through a defined pipeline. One useful element is filesink, which simply writes one or a few successive raw data frames to your filesystem. GStreamer also has possibly hundreds of converters and filters, including an H.264 encoder.
I would bet that if you just dump your memory, the output will already conform to some FOURCC format (also recognized by GStreamer).
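For the 4:2:0 / 8-bit case in the question, the simplest raw file is just the three planes written back to back with no header (often called I420 or yuv420p). A minimal sketch with hypothetical function names, assuming even width and height:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Fill buf with one 8-bit YUV 4:2:0 planar (I420) frame: a horizontal
 * luma ramp from 0 to 255 and neutral chroma. buf must hold
 * width * height * 3 / 2 bytes; width and height must be even. */
void fill_i420_ramp(uint8_t *buf, int width, int height)
{
    assert(width >= 2 && height >= 2 && width % 2 == 0 && height % 2 == 0);
    size_t luma = (size_t)width * height;
    size_t chroma = luma / 4;               /* size of each of U and V */
    uint8_t *y = buf;                       /* Y plane */
    uint8_t *u = y + luma;                  /* U (Cb) plane follows Y */
    uint8_t *v = u + chroma;                /* V (Cr) plane follows U */

    for (int row = 0; row < height; row++)
        for (int col = 0; col < width; col++)
            y[(size_t)row * width + col] = (uint8_t)(col * 255 / (width - 1));

    /* 128 means "no colour" for 8-bit Cb/Cr, so the frame is grey. */
    for (size_t i = 0; i < chroma; i++) {
        u[i] = 128;
        v[i] = 128;
    }
}

/* Append one frame to a headerless .yuv file. Returns 0 on success. */
int write_i420_frame(FILE *f, const uint8_t *buf, int width, int height)
{
    size_t n = (size_t)width * height * 3 / 2;
    return fwrite(buf, 1, n, f) == n ? 0 : -1;
}
```

Calling write_i420_frame repeatedly on the same file gives a multi-frame raw stream. Since the file carries none of the metadata listed above, you have to supply it when viewing, e.g. something like ffplay -f rawvideo -pixel_format yuv420p -video_size 176x144 test.yuv.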

How is HDR data stored?

I am wondering what the data structure behind images with HDR data is. I understand how regular images (RGBA) and cubemaps are stored. I doubt it's as simple as storing multiple images at different exposures inside the same file.
You've probably moved on long ago, but I thought it worth posting references for anyone else who happened upon this question.
Here is an old reference for the Radiance .pic (now .hdr) file format. The useful info starts at the bottom of page 29.
http://radsite.lbl.gov/radiance/refer/filefmts.pdf
excerpt:
The basic idea is to store a 1-byte mantissa for each of three
primaries, and a common 1-byte exponent. The accuracy of these values
will be on the order of 1% (+/-1 in 200) over a dynamic range from
10^-38 to 10^38.
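The shared-exponent idea in that excerpt can be sketched as below. This is my own minimal version, not code from Radiance; it adds 0.5 to each mantissa when decoding (the middle of the quantization step), which some implementations omit. Values are assumed non-negative and below 2^127:

```c
#include <assert.h>
#include <math.h>

/* One pixel in shared-exponent form: three 8-bit mantissas and one
 * 8-bit exponent biased by 128 (e == 0 means black). */
typedef struct { unsigned char r, g, b, e; } Rgbe;

/* Clamp guards against rounding pushing a mantissa to 256. */
static unsigned char to_mantissa(double x)
{
    return (unsigned char)(x >= 255.0 ? 255.0 : x);
}

Rgbe float_to_rgbe(double r, double g, double b)
{
    Rgbe p = {0, 0, 0, 0};
    double m = r > g ? r : g;
    int ex;
    if (b > m)
        m = b;
    assert(r >= 0.0 && g >= 0.0 && b >= 0.0);
    if (m < 1e-32)
        return p;                            /* too small: store as black */
    /* m = frac * 2^ex with frac in [0.5, 1). Scaling by frac*256/m maps the
     * largest component to a mantissa in [128, 255]. */
    double scale = frexp(m, &ex) * 256.0 / m;
    assert(ex + 128 <= 255);                 /* exponent must fit a byte */
    p.r = to_mantissa(r * scale);
    p.g = to_mantissa(g * scale);
    p.b = to_mantissa(b * scale);
    p.e = (unsigned char)(ex + 128);
    return p;
}

void rgbe_to_float(Rgbe p, double *r, double *g, double *b)
{
    if (p.e == 0) {
        *r = *g = *b = 0.0;
        return;
    }
    /* 2^(e - 128 - 8): remove the bias and the 8 bits of mantissa. */
    double f = ldexp(1.0, (int)p.e - (128 + 8));
    *r = (p.r + 0.5) * f;
    *g = (p.g + 0.5) * f;
    *b = (p.b + 0.5) * f;
}
```

Because the largest component always gets a mantissa of at least 128, its relative error stays under about 1/256, which matches the "on the order of 1%" figure; smaller components share the same exponent and lose relatively more precision.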
And here is a more recent reference for JPEG HDR format: http://www.anyhere.com/gward/papers/cic05.pdf
It's generally a matter of increasing the range of values (in an HSV sense) representable, so you can use e.g. RGB[A] where each element is a 16-bit int, 32-bit int, float, double etc. instead of a JPEG-type-quality 8-bit int. There's a trade-off between increasing the range represented, retaining fine gradations within that range, and whether some particular intensity levels are given priority via some non-linearity in the mapping (e.g. storing a log of the value).
The raw file from the camera normally stores the 12-14-bit values from the Bayer mask, so it is effectively greyscale. These are sometimes compressed losslessly (Canon, Nikon) or stored as 16-bit values (Olympus). The header also contains the white balance and gain calibrations for the red, green, and blue masked pixels, so you can generate a color image.
Once you have a color image you can store it however you want; normally 16-bit RGB is the easiest.
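To illustrate the "greyscale plus calibration" point: each raw sample has only one colour, determined by its position in the mosaic, and white balance is a per-colour gain. A minimal sketch assuming an RGGB layout (cameras also use GRBG, BGGR, etc.), 12-bit samples stored one per uint16_t, and gains you have already read from the header; it does no demosaicing, and the names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { BAYER_R = 0, BAYER_G = 1, BAYER_B = 2 };

/* Colour of the filter over pixel (x, y) in an RGGB mosaic:
 *   R G R G ...
 *   G B G B ...   */
int rggb_channel(int x, int y)
{
    if ((y & 1) == 0)
        return (x & 1) == 0 ? BAYER_R : BAYER_G;
    return (x & 1) == 0 ? BAYER_G : BAYER_B;
}

/* Apply per-channel white-balance gains in place to 12-bit mosaic
 * samples, clamping to the 12-bit maximum (4095). Gains must be >= 0. */
void white_balance_rggb(uint16_t *raw, int width, int height,
                        const float gain[3])
{
    assert(raw != NULL && width > 0 && height > 0);
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++) {
            uint16_t *px = &raw[(size_t)y * width + x];
            float v = *px * gain[rggb_channel(x, y)];
            *px = (uint16_t)(v > 4095.0f ? 4095.0f : v);
        }
}
```

Turning the balanced mosaic into full RGB still requires interpolating the two missing colours at every pixel (demosaicing).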
Here is some information on the Radiance file format, used for HDR images. It stores each pixel as a 32-bit shared-exponent floating-point value (the RGBE encoding described in the excerpt above).
First, I am not sure there is a public format for storing multiple images at different exposures in one file, because that usage is rare. Those multiple images are used as one kind of HDR source, but they are not HDR themselves; they are just normal LDR (low dynamic range) or SDR (standard dynamic range) images, encoded e.g. as JPEG by digital cameras.
It is more common to store the merged result in an HDR format, and the key point, as everyone mentioned, is floating point.
There are some HDR formats:
OpenEXR
TIFF
Radiance
...
You can get more info from Wikipedia.

Finding YUV file format in Cocoa

I have a raw YUV file, and all I know at this point is that the clip has a resolution of 176x144.
The Y plane is 176x144 = 25344 bytes, and the UV plane is half of that. I did some reading about YUV, and there are different formats corresponding to different ways the Y and UV planes are stored.
How can I perform some sort of check in Cocoa to find out the raw YUV file format? Is there a file header in the YUV frame from which I can extract some information?
Thanks in advance to everyone
Unfortunately, if it's just a raw YUV stream, it will just be the data for the frames written to disk, one after another. There probably won't be a header that indicates what specific format is being used.
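Without a header, the file size is about the only thing you can check. A headerless stream must contain a whole number of frames, so comparing the size against candidate frame sizes can rule a subsampling scheme in or out. A minimal sketch with made-up names, assuming 8 bits per sample and even dimensions:

```c
#include <assert.h>
#include <stddef.h>

/* Bytes per frame for 8-bit YUV at two common chroma subsamplings. */
size_t yuv420_frame_bytes(size_t w, size_t h) { return w * h * 3 / 2; }
size_t yuv422_frame_bytes(size_t w, size_t h) { return w * h * 2; }

/* A headerless file can only hold a whole number of frames.
 * Returns the frame count, or 0 if the size does not divide evenly. */
size_t whole_frames(size_t file_size, size_t frame_bytes)
{
    assert(frame_bytes > 0);
    return file_size % frame_bytes == 0 ? file_size / frame_bytes : 0;
}
```

For 176x144 that gives 38016 bytes per 4:2:0 frame and 50688 bytes per 4:2:2 frame. Some sizes divide evenly by both (152064 bytes is 4 frames of one or 3 of the other), so this narrows things down rather than proving the format.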
It sounds like you have determined that it's a YUV 4:2:2 stream, so you just need to determine the interleaving order (the most common possibilities are listed here). In response to your previous question, I posted a function which converts a frame from the UYVY (Y422) YUV format to the 2VUY format used by Apple's YUV OpenGL extension. Your best bet may be to try that out and see how the images look, then modify the interleaving format until the colors and image clear up.
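As one example of that trial and error: the two most common packed 4:2:2 orders, YUYV (Y0 U Y1 V) and UYVY (U Y0 V Y1), differ only in that each pair of bytes is swapped. This is not the function mentioned in the answer, just a hypothetical helper for flipping between those two orders:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert packed 4:2:2 between YUYV (Y0 U Y1 V) and UYVY (U Y0 V Y1) by
 * swapping each pair of bytes. The operation is its own inverse, so the
 * same function converts in either direction. */
void swap_yuyv_uyvy(uint8_t *buf, size_t nbytes)
{
    assert(nbytes % 4 == 0);  /* one 4-byte group per two pixels */
    for (size_t i = 0; i + 1 < nbytes; i += 2) {
        uint8_t t = buf[i];
        buf[i] = buf[i + 1];
        buf[i + 1] = t;
    }
}
```

If a frame looks wrong when interpreted one way (typically a strong colour cast or a garbled, half-width-looking picture), swapping and displaying again is a cheap test.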
