How to get raw frame data from AVFrame.data[] and AVFrame.linesize[] without specifying the pixel format? - ffmpeg

I get the general idea that AVFrame.data[] is interpreted depending on the pixel format of the video (RGB or YUV). But is there a general way to get all the pixel data from the frame? I just want to compute a hash of the frame data, without interpreting it to display the image.
According to AVFrame.h:
uint8_t* AVFrame::data[AV_NUM_DATA_POINTERS]
pointer to the picture/channel planes.
int AVFrame::linesize[AV_NUM_DATA_POINTERS]
For video, size in bytes of each picture line.
Does this mean that if I just extract from data[i] for linesize[i] bytes then I get the full pixel information about the frame?

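linesize[i] contains the stride of the i-th plane: the number of bytes from the start of one row to the start of the next, which may include padding beyond the visible pixels. Reading linesize[i] bytes from data[i] therefore yields a single row, not the whole plane.
To obtain the whole buffer, use this function from avcodec.h:
/**
 * Copy pixel data from an AVPicture into a buffer, always assume a
 * linesize alignment of 1. */
int avpicture_layout(const AVPicture* src, enum AVPixelFormat pix_fmt,
int width, int height,
unsigned char *dest, int dest_size);
Use
int avpicture_get_size(enum AVPixelFormat pix_fmt, int width, int height);
to calculate the required buffer size.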

The avpicture_* API is deprecated. You can now use av_image_copy_to_buffer() and av_image_get_buffer_size() to get the image buffer.
You can also avoid allocating a new buffer (as av_image_copy_to_buffer() requires) by reading AVFrame::data[] directly; the size of each plane can be obtained from av_image_fill_plane_sizes(). Only do this if you clearly understand the pixel format.
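For the hashing use case in the question, a minimal sketch of reading the planes in place: it hashes only the visible bytes of each row, so stride padding never enters the result. It makes no FFmpeg calls. The names hash_planes and fnv1a_update and the row_bytes[]/rows[] arrays are hypothetical; the caller is assumed to know the visible bytes per row and the row count of each plane (for example from av_image_fill_linesizes() with the frame width, plus the chroma-subsampled height). FNV-1a is used only as a stand-in for whatever hash you prefer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a, 64-bit: fold `len` bytes into the running hash `h`. */
uint64_t fnv1a_update(uint64_t h, const uint8_t *p, size_t len) {
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL; /* FNV prime */
    }
    return h;
}

/* Hash the visible bytes of up to `nb_planes` planes.
 * data[]/linesize[] mirror AVFrame.data[]/AVFrame.linesize[].
 * row_bytes[i] and rows[i] are hypothetical inputs: the visible bytes
 * per row and the number of rows of plane i. */
uint64_t hash_planes(uint8_t *const data[], const int linesize[],
                     const int row_bytes[], const int rows[],
                     int nb_planes) {
    uint64_t h = 14695981039346656037ULL; /* FNV offset basis */
    for (int p = 0; p < nb_planes && data[p]; p++) {
        /* This sketch handles positive strides only. */
        assert(linesize[p] >= row_bytes[p]);
        for (int y = 0; y < rows[p]; y++)
            h = fnv1a_update(h, data[p] + (size_t)y * linesize[p],
                             (size_t)row_bytes[p]);
    }
    return h;
}
```

Two frames with identical pixels but different linesize values produce the same hash this way, which would not hold if the padding bytes were hashed as well.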


Turn off sw_scale conversion to planar YUV 32 byte alignment requirements

I am experiencing artifacts on the right edge of scaled and converted images when converting into planar YUV pixel formats with sw_scale. I am reasonably sure (although I cannot find it anywhere in the documentation) that this is because sw_scale uses an optimization for 32 byte aligned lines in the destination. However, I would like to turn this off because I am using sw_scale for image composition, so even though the destination lines may be 32 byte aligned, the output image may not be.
Full output frame is 1280x720 yuv422p10le. (this is 32 byte aligned)
However into the top left corner I am scaling an image with an outwidth of 1280 / 3 = 426.
426 in this format is not 32 byte aligned, but I believe sw_scale sees that the output linesize is 32 byte aligned and overwrites the width of 426 putting garbage in the next 22 bytes of data thinking this is simply padding when in my case this is displayable area.
This is why I need to actually disable this optimization or somehow trick sw_scale into believing it does not apply while keeping intact the way the program works, which is otherwise fine.
I have tried adding extra padding to the destination lines so they are no longer 32 byte aligned, but this did not help as far as I can tell.
Edit with code Example. Rendering omitted for ease of use.
Also here is a similar issue; unfortunately, as I stated there, their fix will not work for my use case.
Use the commented-out line of code to switch between an output width which is and isn't 32 byte aligned.
#include "libswscale/swscale.h"
#include "libavutil/imgutils.h"
#include "libavutil/pixelutils.h"
#include "libavutil/pixfmt.h"
#include "libavutil/pixdesc.h"
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char **argv) {
/// Set up a 1280x720 window, and an item with 1/3 width and height of the window.
int window_width, window_height, item_width, item_height;
window_width = 1280;
window_height = 720;
item_width = (window_width / 3);
item_height = (window_height / 3);
int item_out_width = item_width;
/// This line sets the item width to be 32 byte aligned uncomment to see uncorrupted results
/// Note %16 because outformat is 2 bytes per component
//item_out_width -= (item_width % 16);
enum AVPixelFormat outformat = AV_PIX_FMT_YUV422P10LE;
enum AVPixelFormat informat = AV_PIX_FMT_UYVY422;
int window_lines[4] = {0};
av_image_fill_linesizes(window_lines, outformat, window_width);
uint8_t *window_planes[4] = {0};
window_planes[0] = calloc(1, window_lines[0] * window_height);
window_planes[1] = calloc(1, window_lines[1] * window_height);
window_planes[2] = calloc(1, window_lines[2] * window_height); /// Fill the window with all 0s, this is green in yuv.
int item_lines[4] = {0};
av_image_fill_linesizes(item_lines, informat, item_width);
uint8_t *item_planes[4] = {0};
item_planes[0] = malloc(item_lines[0] * item_height);
memset(item_planes[0], 100, item_lines[0] * item_height);
struct SwsContext *ctx;
ctx = sws_getContext(item_width, item_height, informat,
item_out_width, item_height, outformat, SWS_FAST_BILINEAR, NULL, NULL, NULL);
/// Check a block in the normal region
printf("Pre scale normal region %d %d %d\n", (int)((uint16_t*)window_planes[0])[0], (int)((uint16_t*)window_planes[1])[0],
(int)((uint16_t*)window_planes[2])[0]);
/// Check a block in the corrupted region (should be all zeros) These values should be out of the converted region
int corrupt_offset_y = (item_out_width + 3) * 2; ///(item_width + 3) * 2 bytes per component Y PLANE
int corrupt_offset_uv = (item_out_width + 3); ///(item_width + 3) * (2 bytes per component rshift 1 for horiz scaling) U and V PLANES
printf("Pre scale corrupted region %d %d %d\n", (int)(*((uint16_t*)(window_planes[0] + corrupt_offset_y))),
(int)(*((uint16_t*)(window_planes[1] + corrupt_offset_uv))), (int)(*((uint16_t*)(window_planes[2] + corrupt_offset_uv))));
sws_scale(ctx, (const uint8_t**)item_planes, item_lines, 0, item_height,window_planes, window_lines);
/// Perform the same tests after scaling
printf("Post scale normal region %d %d %d\n", (int)((uint16_t*)window_planes[0])[0], (int)((uint16_t*)window_planes[1])[0],
(int)((uint16_t*)window_planes[2])[0]);
printf("Post scale corrupted region %d %d %d\n", (int)(*((uint16_t*)(window_planes[0] + corrupt_offset_y))),
(int)(*((uint16_t*)(window_planes[1] + corrupt_offset_uv))), (int)(*((uint16_t*)(window_planes[2] + corrupt_offset_uv))));
return 0;
}
Example Output:
//No alignment
Pre scale normal region 0 0 0
Pre scale corrupted region 0 0 0
Post scale normal region 400 400 400
Post scale corrupted region 512 36865 36865
//With alignment
Pre scale normal region 0 0 0
Pre scale corrupted region 0 0 0
Post scale normal region 400 400 400
Post scale corrupted region 0 0 0
"I believe sw_scale sees that the output linesize is 32 byte aligned and overwrites the width of 426 putting garbage in the next 22 bytes of data thinking this is simply padding when in my case this is displayable area."
That's actually correct; swscale does indeed do that. Good analysis. There are two ways to get rid of this:
1. Disable all SIMD code using av_set_cpu_flags_mask(0).
2. Write the rescaled 426xN image into a temporary buffer and then manually copy the pixels into the unpadded destination plane.
swscale writes past the requested width for performance reasons. If you don't care about runtime and want the simplest code, use the first solution. If you do want performance and don't mind slightly more complicated code, use the second solution.
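The manual copy in the second solution can be sketched as a row-by-row memcpy per plane. This is a minimal illustration with no FFmpeg calls; copy_plane_rect and its parameters are hypothetical names, and all sizes are in bytes. libavutil's av_image_copy_plane() performs the same kind of row-wise copy and could be used instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy `rows` rows of `row_bytes` visible bytes from a scratch plane into
 * a larger destination plane, starting `dst_x_bytes` into each destination
 * row. Bytes outside the copied rectangle are left untouched, so whatever
 * the scaler wrote into the scratch plane's padding never reaches `dst`. */
void copy_plane_rect(uint8_t *dst, int dst_stride, int dst_x_bytes,
                     const uint8_t *src, int src_stride,
                     int row_bytes, int rows) {
    assert(row_bytes <= src_stride);
    assert(dst_x_bytes + row_bytes <= dst_stride);
    for (int y = 0; y < rows; y++)
        memcpy(dst + (size_t)y * dst_stride + dst_x_bytes,
               src + (size_t)y * src_stride, (size_t)row_bytes);
}
```

For the 426-pixel-wide yuv422p10le item in the question, the luma plane would be copied with row_bytes = 426 * 2 and each chroma plane with row_bytes = 213 * 2, with sws_scale() writing into a separately allocated scratch image first.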

Monochrome image getting displayed as colored RGB image

The bitmap is constructed purely from pixel data. The construction sets the bitmap parameters such as height, width, and bit count. The bitmap is created with CreateDIBSection and loaded onto a CStatic object that has a bitmap as a property.
The image is displayed with the proper width and content. The only difference is that the content is colored instead of grayscale. For example, if the image is a white letter H on a black background, a blue H is displayed instead of a whitish one. Similar color changes apply to other images. Sometimes junk colored data also appears that deviates from the original image content, beyond the color change.
The bitmap is a 16-bit bitmap.
Please see below for the code used to create the bitmap.
HDC is the device context of the CStatic variable into which the created bitmap is loaded.
I set the bitmap returned by the function below directly on this variable using the SetBitmap function. The function used to create the bitmap follows.
Function parameter definitions.
PixMapHeight = number of rows in pixel matrix.
PixMapWidth = number of columns in pixel matrix.
BitsPerPixel = The bits stored for one pixel.
pPixMapBits = Void pointer to pixel array.(raw pixel data only! 16 bit per pixel).
HBITMAP DoBitmapFromPixels(HDC Hdc, UINT PixMapWidth, UINT PixMapHeight, UINT BitsPerPixel, LPVOID pPixMapBits)
{
    // Header plus room for a 256-entry color table
    BITMAPINFO *bmpInfo = (BITMAPINFO *)malloc(sizeof(BITMAPINFOHEADER) + sizeof(RGBQUAD) * 256);
    BITMAPINFOHEADER &bmpInfoHeader(bmpInfo->bmiHeader);
    bmpInfoHeader.biSize = sizeof(BITMAPINFOHEADER);
    LONG lBmpSize = PixMapWidth * PixMapHeight * (BitsPerPixel / 8);
    bmpInfoHeader.biWidth = PixMapWidth;
    bmpInfoHeader.biHeight = -(static_cast<int>(PixMapHeight)); // negative height: top-down DIB
    bmpInfoHeader.biPlanes = 1;
    bmpInfoHeader.biBitCount = BitsPerPixel;
    bmpInfoHeader.biCompression = BI_RGB;
    bmpInfoHeader.biSizeImage = 0;
    bmpInfoHeader.biClrUsed = 0;
    bmpInfoHeader.biClrImportant = 0;
    void *pPixelPtr = NULL;
    HBITMAP hBitMap = CreateDIBSection(Hdc, bmpInfo, DIB_RGB_COLORS, &pPixelPtr, NULL, 0);
    if (pPixMapBits != NULL && hBitMap != NULL)
    {
        BYTE* pbBits = (BYTE*)pPixMapBits;
        BYTE *Pix = (BYTE *)pPixelPtr;
        // CurrentFrame is not declared in this snippet; presumably a 1-based frame index defined elsewhere
        memcpy(Pix, ((BYTE*)pbBits + (lBmpSize * (CurrentFrame - 1))), lBmpSize);
    }
    free(bmpInfo); // the header is no longer needed once the DIB section exists
    return hBitMap;
}
The expected output is the figure on the left of the attached file, but I am getting a blue-toned image like the one on the right (never mind the scaling and exact match; the image is only there to depict the problem).
And also it will be very helpful if I know how RGB values are stored in 16 bits!
You never actually said what format pPixMapBits is in, but I'm guessing that it contains 16-bit values where 0 represents black, 32768 represents gray, and 65535 represents white.
You are creating a BITMAPINFOHEADER with biBitCount = 16 and biCompression = BI_RGB. According to the documentation, if you set the fields that way, then:
Each WORD in the bitmap array represents a single pixel. The relative intensities of red, green, and blue are represented with five bits for each color component. The value for blue is in the least significant five bits, followed by five bits each for green and red. The most significant bit is not used.
This is not the same format as your source data, and you are doing no conversion, so you get junk. Note that the bitmap format you chose is capable of representing only 2^5 = 32 shades of gray, not 65536, so you will suffer loss of quality during the conversion.
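The missing conversion could look roughly like the sketch below. It assumes, as guessed above, that the source holds 16-bit gray samples with 0 as black and 65535 as white, and it packs each one into the 5-5-5 layout from the quoted documentation by repeating the top five bits in all three channels. The function names are hypothetical, and the sketch ignores DIB row padding (each DIB row is padded to a multiple of 4 bytes, so odd widths need row-by-row handling).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Map one 16-bit gray sample (0 = black, 65535 = white) to a 5-5-5 pixel:
 * blue in bits 0-4, green in bits 5-9, red in bits 10-14, bit 15 unused. */
uint16_t gray16_to_rgb555(uint16_t gray) {
    uint16_t v = (uint16_t)(gray >> 11);          /* keep the top 5 bits: 0..31 */
    return (uint16_t)((v << 10) | (v << 5) | v);  /* red | green | blue */
}

/* Convert `count` gray samples from `src` into 5-5-5 pixels in `dst`. */
void convert_gray16_to_rgb555(const uint16_t *src, uint16_t *dst, size_t count) {
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < count; i++)
        dst[i] = gray16_to_rgb555(src[i]);
}
```

Equal values in all three channels are what make the pixel render as gray; copying the raw 16-bit samples instead spreads their bits across the red, green, and blue fields, which would explain the tinted output.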

How to get pixel color from a Pixmap

I can use XGetPixel() to get a pixel from an XImage. What do I use to get a pixel from a Pixmap?
It would be nice if there was at least one function to pull a pixel from the server's drawable. Sadly, there isn't one. However the following might be of some use.
Since Pixmap is a Drawable you can pass XGetImage() the Pixmap which will return a pointer to an XImage. Now that you have an XImage you can use XGetPixel().
XGetImage Parameters:
XImage *XGetImage(display, d, x, y, width, height, plane_mask, format)
Display *display;
Drawable d;
int x, y;
unsigned int width, height;
unsigned long plane_mask;
int format;
Better yet you could have a pre-created XImage and pass it, along with the Pixmap to XGetSubImage(). You can grab a single pixel by passing a width and height both set to 1, and then use XGetPixel() on your XImage.
XGetSubImage Parameters:
XImage *XGetSubImage(display, d, x, y, width, height, plane_mask, format, dest_image, dest_x,
dest_y)
Display *display;
Drawable d;
int x, y;
unsigned int width, height;
unsigned long plane_mask;
int format;
XImage *dest_image;
int dest_x, dest_y;
Note: XGetSubImage() returns a pointer to the same XImage structure specified by dest_image.
Generally this is a bad idea from a performance point of view: never read from the screen, only push to it. If you need the pixel state, maintain a local buffer of the screen. If you can't modify the program you are using, then +1 to Jonny Henly's answer: do a GetImage request first to download a region containing your pixel, then read it locally. If you want to access multiple pixels in a loop, it's better to grab them all in one request.

Convert 12-bit Bayer image to 8-bit RGB using OpenCV

I am trying to use OpenCV 2.3.1 to convert a 12-bit Bayer image to an 8-bit RGB image. This seems like it should be fairly straightforward using the cvCvtColor function, but the function throws an exception when I call it with this code:
int cvType = CV_MAKETYPE(CV_16U, 1);
cv::Mat bayerSource(height, width, cvType, sourceBuffer);
cv::Mat rgbDest(height, width, CV_8UC3);
cvCvtColor(&bayerSource, &rgbDest, CV_BayerBG2RGB);
I thought that I was running past the end of sourceBuffer, since the input data is 12-bit, and I had to pass in a 16-bit type because OpenCV doesn't have a 12-bit type. So I divided the width and height by 2, but cvCvtColor still threw an exception that didn't have any helpful information in it (the error message was "Unknown exception").
There was a similar question posted a few months ago that was never answered, but since my question deals more specifically with 12-bit Bayer data, I thought it was sufficiently distinct to merit a new question.
Thanks in advance.
Edit: I must be missing something, because I can't even get the cvCvtColor function to work on 8-bit data:
cv::Mat srcMat(100, 100, CV_8UC3);
const cv::Scalar val(255,0,0);
cv::Mat destMat(100, 100, CV_8UC3);
cvCvtColor(&srcMat, &destMat, CV_RGB2BGR);
I was able to convert my data to 8-bit RGB using the following code:
// Copy the data into an OpenCV Mat structure
cv::Mat bayer16BitMat(height, width, CV_16UC1, inputBuffer);
// Convert the Bayer data from 16-bit to 8-bit
cv::Mat bayer8BitMat = bayer16BitMat.clone();
// The 3rd parameter here scales the data by 1/16 so that it fits in 8 bits.
// Without it, convertTo() just seems to chop off the high order bits.
bayer8BitMat.convertTo(bayer8BitMat, CV_8UC1, 0.0625);
// Convert the Bayer data to 8-bit RGB
cv::Mat rgb8BitMat(height, width, CV_8UC3);
cv::cvtColor(bayer8BitMat, rgb8BitMat, CV_BayerGR2RGB);
I had mistakenly assumed that the 12-bit data I was getting from the camera was tightly packed, so that two 12-bit values were contained in 3 bytes. It turns out that each value was contained in 2 bytes, so I didn't have to do any unpacking to get my data into a 16-bit array that is supported by OpenCV.
Edit: See @petr's improved answer, which converts to RGB before converting to 8 bits to avoid losing color information during the conversion.
Gillfish's answer technically works, but during the conversion it uses a smaller data type (CV_8UC1) than the input (CV_16UC1) and so loses some color information.
I would suggest decoding the Bayer pattern first while staying at 16 bits per channel (from CV_16UC1 to CV_16UC3) and only then converting to CV_8UC3.
Gillfish's code, modified (assuming the camera delivers the image in 16-bit Bayer encoding):
// Copy the data into an OpenCV Mat structure
cv::Mat mat16uc1_bayer(height, width, CV_16UC1, inputBuffer);
// Decode the Bayer data to RGB but keep using 16 bits per channel
cv::Mat mat16uc3_rgb(height, width, CV_16UC3);
cv::cvtColor(mat16uc1_bayer, mat16uc3_rgb, cv::COLOR_BayerGR2RGB);
// Convert the 16-bit per channel RGB image to 8-bit per channel
cv::Mat mat8uc3_rgb(height, width, CV_8UC3);
mat16uc3_rgb.convertTo(mat8uc3_rgb, CV_8UC3, 1.0/256); //this could be perhaps done more effectively by cropping bits
For anyone struggling with this: the above solution only works if your image data actually spans the full 16-bit range. Otherwise, as already suggested in the comments, you should chop off the 4 least significant bits of each 12-bit sample. I achieved that with the following. It's not very clean, but it works.
unsigned short * image_12bit = (unsigned short*)data;
unsigned char out[rows * cols];
for(int i = 0; i < rows * cols; i++) {
    // Rescale each 12-bit sample (0..4095) to 8 bits (0..255)
    out[i] = (unsigned char)((double)(255 * image_12bit[i]) / (double)(1 << 12));
}
cv::Mat bayer_image(rows, cols, CV_8UC1, (void*)out);
cv::cvtColor(bayer_image, *res, cv::COLOR_BayerGR2BGR);
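Chopping off the 4 least significant bits can also be done with a plain shift instead of floating-point scaling. The sketch below is one way to do it, assuming each 12-bit sample sits right-aligned in a 16-bit word (value in the low 12 bits); bayer12_to_bayer8 is a hypothetical name and the code does not depend on OpenCV.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reduce 12-bit samples stored in 16-bit words to 8 bits by dropping
 * the 4 least significant bits. Assumes the sample occupies the low
 * 12 bits of each word; any higher bits are masked off. */
void bayer12_to_bayer8(const uint16_t *src, uint8_t *dst, size_t count) {
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < count; i++)
        dst[i] = (uint8_t)((src[i] & 0x0FFF) >> 4);
}
```

The resulting buffer can then be wrapped in a CV_8UC1 matrix and demosaiced as in the snippet above.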

Reading an Image (standard format png,jpeg etc) and writing the Image Data to a binary file using Objective C

I am pretty new to Objective C and working with Cocoa Framework. I want to read an image and then extract the image data (just pixel data and not the header) and then write the data to a binary file. I am kind of stuck with this, I was going through the methods of NSImage but I couldn't find a suitable one. Can anyone suggest me some other ways of doing this?
Cocoa-wise, the easiest approach is to use the NSBitmapImageRep class. Once it is initialized with an NSData object, for example, you can access the color value at any coordinate as an NSColor object using the -setColor:atX:y: and -colorAtX:y: methods. Note that if you call these methods in tight loops, you may suffer a performance hit from objc_msgSend. You could instead access the raw bitmap data as a C array via the -bitmapData method. For an RGB image, for example, the channels are interleaved, so each pixel occupies 3 consecutive bytes.
For example:
color values: [R,G,B][R,G,B][R,G,B]
indices: [0,1,2, 3,4,5, 6,7,8]
To loop through each pixel in the image and extract the RGB components:
unsigned char *bitmapData = [bitmapRep bitmapData];
if ([bitmapRep samplesPerPixel] == 3) {
    // Assumes 8 bits per sample and tightly packed rows (bytesPerRow == pixelsWide * 3)
    NSInteger pixelCount = [bitmapRep pixelsWide] * [bitmapRep pixelsHigh];
    for (NSInteger i = 0; i < pixelCount; i++) {
        NSInteger base = (i * 3);
        // these range from 0-255
        unsigned char red = bitmapData[base + 0];
        unsigned char green = bitmapData[base + 1];
        unsigned char blue = bitmapData[base + 2];
    }
}
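To finish the task in the question, the pixel bytes still have to be written to a file. Here is a minimal plain-C sketch of that step, under the assumption that the buffer comes from -bitmapData, the stride from -bytesPerRow, and the dimensions from -pixelsWide and -pixelsHigh. The function write_pixels and its parameter names are hypothetical. Writing row by row skips any padding the bitmap may carry at the end of each row.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Write only the pixel bytes of a bitmap to `path`, skipping any padding
 * at the end of each row. `bytes_per_row` is the stride of the buffer and
 * `row_bytes` the number of meaningful bytes per row (for 8-bit RGB,
 * width * 3). Returns 0 on success, -1 on failure. */
int write_pixels(const char *path, const unsigned char *bitmap,
                 size_t bytes_per_row, size_t row_bytes, size_t rows) {
    assert(row_bytes <= bytes_per_row);
    FILE *f = fopen(path, "wb");
    if (!f)
        return -1;
    for (size_t y = 0; y < rows; y++) {
        if (fwrite(bitmap + y * bytes_per_row, 1, row_bytes, f) != row_bytes) {
            fclose(f);
            return -1;
        }
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

The output file then contains exactly width * height * 3 bytes for an 8-bit RGB image, with no header.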
