TensorFlow Lite's Pix2Pix implementation outputs a Uint8List:
Uint8List input;
var output = await Tflite.runPix2PixOnBinary(binary: input, asynch: true);
The result now has to be converted to an Image object in order to display it.
This is the problem.
It seems like image files have some kind of header or signature that is essential for the decodeImage() function of the
Image package and similar decode functions (MemoryImage, Image.memory, decodeImageFromList).
This answer offers a solution, but unfortunately it only handles 256 different colors (one channel?), while TFLite outputs several channels...
I am looking for something like the Python Pillow package and its function
new_image = Image.fromarray(array)
or some other approach to display the Pix2Pix output.
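For reference, the Pillow pattern I have in mind is something like this minimal sketch (width, height, and raw_bytes stand in for the Pix2Pix output; the values here are placeholders):

from PIL import Image

width, height = 256, 256                 # assumed Pix2Pix output size
raw_bytes = bytes(width * height * 3)    # placeholder for the Uint8List contents
new_image = Image.frombytes('RGB', (width, height), raw_bytes)
new_image.show()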
Thanks!
I am trying to encode a sequence of images generated by my C++ code into an animated image, to make viewing them easier. I picked WebP and followed some code examples from the official documentation. I don't need to tune any parameters; as long as the result is animated, it is great.
The code (https://developers.google.com/speed/webp/docs/container-api) is like this:
WebPAnimEncoderOptions enc_options;
WebPAnimEncoderOptionsInit(&enc_options);
// Tune 'enc_options' as needed.
WebPAnimEncoder* enc = WebPAnimEncoderNew(width, height, &enc_options);
while (<there are more frames>) {
  WebPConfig config;
  WebPConfigInit(&config);
  // Tune 'config' as needed.
  WebPAnimEncoderAdd(enc, frame, timestamp_ms, &config);
}
WebPAnimEncoderAdd(enc, NULL, timestamp_ms, NULL);
WebPAnimEncoderAssemble(enc, webp_data);
WebPAnimEncoderDelete(enc);
However, even though I have double-checked the input frames and the timestamps are increasing, only the first frame is encoded in my output file. By setting
enc_options.verbose = true;
I see that the encoder warned me about a lossy YUV-to-RGBA conversion only on the first call to WebPAnimEncoderAdd(). Am I following the wrong example, or what?
Much appreciated.
This is likely what happened: WebPAnimEncoderAdd() triggered the conversion from YUV to ARGB only on the first call, so the ARGB values stayed constant even though the YUV values changed later. To solve it, I set
frame.use_argb = true;
before
WebPPictureAlloc(&frame);
then copied in the ARGB bytes for each frame, and it works now.
I just used this code to read a picture as raw binary:
import io

tme = input("Name: ")
with io.open(tme, "rb") as se:
    print(se.read())  # the file is closed automatically when the with block ends
Now it looks like this:
5MEMMMMMMMMMMMMM777777777777777777777777777777777\x95\x95\x95\x95\x95\x95\x95\x95\x95\x95\x95\x95\x95MEEMMMMEEMM\x96\x97\x97\x97\x97\x97\x97\x97\x97\x97\x97\x97\x97\x97\x97\x97\x97
Now I want to be able to interpret exactly what this binary data is telling me... I know roughly what it means, but not well enough to do anything with it on purpose. I searched the web but didn't find anything that could help me on that point. Can you tell me how it works, or send me a link where I can read up on how it's done?
You can't just change random bytes in an image. There's a header at the start with the height and width, possibly a palette, and information about the number of channels and bits per pixel. Then there is the image data, which is often padded and/or compressed.
So you need an imaging library like PIL/Pillow and code something like this:
from PIL import Image

im = Image.open('image.bmp').convert('RGB')
px = im.load()

# Look at pixel [4,4]
print(px[4, 4])

# Make it red
px[4, 4] = (255, 0, 0)

# Save to disk
im.save('result.bmp')
Documentation and examples are available in the Pillow documentation.
The output should be printed in hexadecimal format. The first bytes in a bitmap file are 'B' and 'M', but you are printing the content as ASCII. Moreover, you don't see the first bytes because the content has scrolled further down; add print("start") so you can find the start of the output.
import io
import binascii

tme = 'path.bmp'
print("start")  # make sure this line appears in the console output
with io.open(tme, "rb") as se:
    content = se.read()
    print(binascii.hexlify(content))
Now you should see something like
start
b'424d26040100000000003...
42 is the hex value for 'B'
4d is the hex value for 'M' ...
The first 14 bytes in the file are the BITMAPFILEHEADER
The next 40 bytes are the BITMAPINFOHEADER
The bytes after that are the color table (if any) and finally the actual pixels.
See BITMAPFILEHEADER and BITMAPINFOHEADER
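If you want to pull those header fields out programmatically, here is a minimal sketch using Python's struct module (the field layout follows the BITMAPFILEHEADER/BITMAPINFOHEADER definitions linked above; 'path.bmp' is the same placeholder path as before):

import struct

with open('path.bmp', 'rb') as f:
    data = f.read()

# BITMAPFILEHEADER: signature, file size, two reserved words, pixel-data offset
sig, file_size, _r1, _r2, offset = struct.unpack_from('<2sIHHI', data, 0)

# First fields of BITMAPINFOHEADER: header size, width, height, planes, bits per pixel
hdr_size, width, height, planes, bpp = struct.unpack_from('<IiiHH', data, 14)

print(sig)                 # b'BM'
print(width, height, bpp)  # image dimensions and bits per pixel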
Using Python 3.4 on Linux and Windows, I'm trying to create QR code images from a list of strings. I don't want to just store the images as files because the list of strings may change frequently. I want to tile all the images and display the result on screen for the user to scan with a barcode scanner. For the user to know which code to scan, I need to add some text to each QR code image.
I can create the list of image objects correctly, and calling .show() on them displays them properly, but I don't know how to treat these objects as file objects in order to open them. The object given to the open function in my add_text_to_img (img_list[0] in my case) needs to support read, seek, and tell methods. When I try this as-is I get an AttributeError. I've tried BytesIO and StringIO, but I get an error message that Image.open does not support the buffer interface. Maybe I am not doing that part correctly.
I'm sure there are several ways to do this, but what is the best way to open in memory objects as a file object?
from io import BytesIO

import qrcode
from PIL import ImageFont, ImageDraw, Image


def make_qr_image_list(code_list):
    """
    :param code_list: a list of string objects to encode into QR code image
    :return: a list of image or some type of other data objects
    """
    img_list = []
    for item in code_list:
        qr = qrcode.QRCode(
            version=None,
            error_correction=qrcode.ERROR_CORRECT_L,
            box_size=4,
            border=10
        )
        qr.add_data(item)
        qr_image = qr.make_image(fit=True)
        img_list.append(qr_image)
    return img_list


def add_text_to_img(text_list, img_list):
    """
    While I was working on this, I am only saving the first image. Once
    it's working, I'll save the rest of the images to a list.
    :param text_list: a list of strings to add to the corresponding image.
    :param img_list: the list containing the images already created from
        the text_list
    :return:
    """
    base = Image.open(img_list[0])
    # img = Image.frombytes(mode='P', size=(164,164), data=img_list[0])
    text_img = Image.new('RGBA', base.size, (255, 255, 255, 0))
    font = ImageFont.truetype('sans-serif.ttf', 10)
    draw = ImageDraw.Draw(text_img)
    draw.text((0, -20), text_list[0], (0, 0, 255, 128), font=font)
    # include some method to save the images after the text
    # has been added here. Shouldn't actually save to a file.
    # Should be saved to memory/img_list
    output = Image.alpha_composite(base, text_img)
    output.show()


if __name__ == '__main__':
    test_list = ['AlGaN', 'n-AlGaN', 'p-AlGaN', 'MQW', 'LED AlN-AlGaN']
    image_list = make_qr_image_list(test_list)
    add_text_to_img(test_list, image_list)
    im = image_list[0]
    im.save('/my_save_path/test_image.png')
    im.show()
Edit: I've been using Python for about a year and I feel like this is a pretty common thing to do, but I'm not even sure I'm searching for the right terms. What topics would you search for to answer this? If anyone can post a link or two to what I need to read up on, that would be much appreciated.
You already have PIL image objects; qr.make_image() returns (a wrapper around) the right type of object, and you do not need to open them again.
As such, all you need to do is:
base = img_list[0]
and go from there.
You do need to match image modes when compositing; QR codes are black-and-white images (mode '1'), so either convert that or use the same mode for your text_img image object. The Image.alpha_composite() operation requires that both images have an alpha channel. Converting the base is easy:
base = img_list[0].convert('RGBA')
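Putting it together, a minimal sketch of the corrected flow could look like this (ImageFont.load_default() is used to avoid assuming a particular .ttf path; the 'AlGaN' label is taken from the test list above):

import qrcode
from PIL import Image, ImageDraw, ImageFont

qr_img = qrcode.make('AlGaN')    # a PIL-backed image in mode '1'
base = qr_img.convert('RGBA')    # alpha_composite needs matching RGBA modes
overlay = Image.new('RGBA', base.size, (255, 255, 255, 0))
draw = ImageDraw.Draw(overlay)
draw.text((10, 10), 'AlGaN', fill=(0, 0, 255, 128), font=ImageFont.load_default())
labelled = Image.alpha_composite(base, overlay)
labelled.show()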
I'm looking at implementing a Caffe CNN which accepts two input images and a label (later perhaps other data) and was wondering if anyone was aware of the correct syntax in the prototxt file for doing this? Is it simply an IMAGE_DATA layer with additional tops? Or should I use separate IMAGE_DATA layers for each?
Thanks,
James
Edit: I have been using the HDF5_DATA layer lately for this and it is definitely the way to go.
HDF5 is a key-value store, where each key is a string and each value is a multi-dimensional array. Thus, to use the HDF5_DATA layer, just add a new key for each top you need and set its value to the image you want to store. Writing these HDF5 files from Python is easy:
import h5py
import numpy as np

filelist = []
for i in range(100):
    image1 = get_some_image(i)      # placeholder image loaders
    image2 = get_another_image(i)
    filename = '/tmp/my_hdf5%d.h5' % i
    with h5py.File(filename, 'w') as f:
        # Caffe expects channels-first (C x H x W) arrays
        f['data1'] = np.transpose(image1, (2, 0, 1))
        f['data2'] = np.transpose(image2, (2, 0, 1))
    filelist.append(filename)

with open('/tmp/filelist.txt', 'w') as f:
    for filename in filelist:
        f.write(filename + '\n')
Then simply set the source of the HDF5_DATA param to be '/tmp/filelist.txt', and set the tops to be "data1" and "data2".
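For reference, the corresponding layer definition might look something like this sketch (using the newer layer { type: "HDF5Data" } prototxt style; batch_size is an arbitrary choice):

layer {
  name: "data"
  type: "HDF5Data"
  top: "data1"
  top: "data2"
  hdf5_data_param {
    source: "/tmp/filelist.txt"
    batch_size: 32
  }
}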
I'm leaving the original response below:
====================================================
There are two good ways of doing this. The easiest is probably to use two separate IMAGE_DATA layers, one with the first image and label, and a second with the second image. Caffe retrieves images from LMDB or LEVELDB, which are key value stores, and assuming you create your two databases with corresponding images having the same integer id key, Caffe will in fact load the images correctly, and you can proceed to construct your net with the data/labels of both layers.
The problem with this approach is that having two data layers is not really very satisfying, and it doesn't scale well if you want to do more advanced things like having non-integer labels for bounding boxes, etc.

If you're prepared to make a time investment, you can do a better job by modifying the tools/convert_imageset.cpp file to stack images or other data across channels. For example, you could create a datum with 6 channels: the first 3 for your first image's RGB and the second 3 for your second image's RGB. After reading this in using the IMAGE_DATA layer, you can split the stream back into two images using a SLICE layer with a slice_point at index 3 along the slice_dim = 1 dimension, as sketched below. If, further down the road, you decide you want to load even more complex assortments of data, you'll understand the encoding scheme and can write your own decoding layer based on src/caffe/layers/data_layer.cpp to gain full control of the pipeline.
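For the channel-splitting step, here is a sketch of the slice layer in the older prototxt style that the SLICE name implies (the layer and blob names are placeholders):

layers {
  name: "slice_pair"
  type: SLICE
  bottom: "pair_data"
  top: "image1"
  top: "image2"
  slice_param {
    slice_dim: 1      # slice along the channel dimension
    slice_point: 3    # first 3 channels -> image1, rest -> image2
  }
}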
You may also consider using an HDF5_DATA layer with multiple "top"s.
I am trying to save an image using the OpenCV cvSaveImage function. The problem is that I perform a DCT on the image, change the coefficients obtained from the DCT, and then perform an inverse DCT to get back the pixel values. But this time the pixel values are decimals (e.g. 254.34576), so when I save with cvSaveImage it discards everything after the decimal point (e.g. 254.34576 is saved as 254). This affects my result. Please help.
"The function cvSaveImage saves the image to the specified file. The image format is chosen depending on the filename extension, see cvLoadImage. Only 8-bit single-channel or 3-channel (with 'BGR' channel order) images can be saved using this function. If the format, depth or channel order is different, use cvCvtScale and cvCvtColor to convert it before saving, or use universal cvSave to save the image to XML or YAML format."
I'd suggest investigating the cvSave function.
HOWEVER, a much easier way is to just write your own save/load functions; this would be very easy:
FILE *f = fopen("image.dat", "wb");
fprintf(f, "%d %d\n", width, height);
for (int y = 0; y < height; y++)
    for (int x = 0; x < width; x++)
        fprintf(f, "%f\n", pixelAt(x, y));
fclose(f);
And a corresponding mirror function for reading.
P.S. It's early morning and I can't remember for the life of me whether fprintf is the right choice for binary files. But you get the idea; you could use fwrite() instead to store the raw float bytes.