I am working in a MATLAB environment for a project, and I have to decode an RGB image that arrives as XML from a database server, encoded in base64. I managed to convert an image to base64, wrap it in XML, and post it to the database. I used the base64encode/base64decode functions for this, and the program is attached below. The problem is that when I use base64decode to convert the base64 data back into an image, it simply does not work.
This is my program for converting an image to base64 and encoding it in XML.
function image2xml(test_directory)
% Images in directory ---> byte array format in XML
% This function encodes all the images available in the test directory into
% byte array/base 64 format and saves them in xml with the following
% properties
% Packs the image(byte array) and its name as timestamp to xml
% Uses functions from the following source
% http://www.mathworks.de/matlabcentral/fileexchange/12907-xmliotools
% Following functions from the above source are to be added to path,while
% running this function
% xml_write.m
% xml_read.m
%% ========================================================================
files = dir(test_directory);
% delete('test_image_xml\*.xml');
% If not database_mat is included in the function arguments
% create the output directory once, before the loop
output_directory = 'images_and_timestamp_xml';
if ~exist(output_directory, 'dir')
    mkdir(output_directory);
end
for i = 1:numel(files)
    if ~files(i).isdir
        % extracts the name under which the xml is saved
        [~, name_to_save, ~] = fileparts(files(i).name);
        filename = fullfile(test_directory, files(i).name);
        fid = fopen(filename, 'r');
        raw_data = fread(fid, inf, 'uint8=>uint8'); % read image file as raw binary
        fclose(fid);
        % Definition of xml tags
        image_imagedetails = [];
        % name of the file is assumed to be the timestamp
        image_imagedetails.timestamp = name_to_save;
        %imagescan.imagebyte64.ATTRIBUTE.EncodingMIMEType = 'base64';
        image_imagedetails.imagebase64 = base64encode(raw_data); % perform base64 encoding of the binary data
        % saves all the xml files into the predefined directory
        filename = fullfile(output_directory, [name_to_save, '.xml']);
        post_data = xml_write(filename, image_imagedetails);
    end
end
Finally, I use the following to convert the XML holding the base64 image back into an image. Unfortunately it does not work: it prints some strange characters, which I am not able to convert back into an image. I have no clue how to turn that string back into an image.
filename = '...\IMAG0386.xml';
tree = xml_read(filename);
image = tree.imagebase64;
K = base64decode(tree.imagebase64) % test image retrieval --> only the string
I also tried other options, such as using Java code in MATLAB, but I do not know how to call that code from MATLAB. There are many solutions in C# and Java, but I have no idea how to use them in MATLAB. Please help me in this regard.
I ran your code under Matlab R2012a and it seems to work as expected.
Maybe what is missing here is a few lines to get the image file back from the binary data encoded in base64. You just need to write the binary data to a file to get your image file back.
I am merely quoting the HTML help file from the Matlab FileExchange submission xmliotools that you are using in your code:
Read XML file with embedded binary data encoded as Base64 (using java version)
tree = xml_read('test.xml', Pref); % read xml file
raw = base64decode(tree.MyImage.CONTENT, '', 'java'); % convert xml image to raw binary
fid = fopen('MyFootball.jpg', 'wb');
fwrite(fid, raw, 'uint8'); % dump the raw binary to the hard disk
fclose(fid);
I = imread('MyFootball.jpg'); % read it as an image
imshow(I);
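For comparison, here is the same decode-then-write idea in Python rather than MATLAB. It is only a minimal sketch: it assumes the XML contains an imagebase64 element like the one written by the question's image2xml function, and the helper name image_bytes_from_xml and the sample XML are made up for illustration.

```python
import base64
import xml.etree.ElementTree as ET


def image_bytes_from_xml(xml_text):
    """Return the raw image bytes stored as base64 text in an XML document."""
    root = ET.fromstring(xml_text)
    # 'imagebase64' is the tag name used in the question; iter() also checks the root.
    node = next(root.iter("imagebase64"), None)
    if node is None or node.text is None:
        raise ValueError("no imagebase64 element found")
    # With default settings b64decode discards characters outside the base64
    # alphabet, so line breaks inserted by an XML writer are tolerated.
    return base64.b64decode(node.text)


# Round trip on stand-in data (any bytes behave the same as real image bytes).
raw = bytes(range(256))
sample_xml = (
    "<image><timestamp>IMAG0386</timestamp><imagebase64>"
    + base64.b64encode(raw).decode("ascii")
    + "</imagebase64></image>"
)
decoded = image_bytes_from_xml(sample_xml)
```

The decoded bytes are the original file contents, so writing them with open(path, 'wb') gives back a file that any image reader can open. The decoded data is not a pixel array by itself, which is probably why printing it shows "strange characters".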
Simple Base64 Handling
Using the Apache library
base64 = org.apache.commons.codec.binary.Base64
Then you can call encode or decode.
base64.encode()
base64.decode()
It expects byte[], so you can get this in a couple of ways. Let's encode a string and then decode it.
hello = 'Hello, world!';
encoded = char(base64.encode(unicode2native(hello))).';
result = native2unicode(base64.decode(uint8(encoded)).');
I'm trying to convert a base64 string to binary data using this code:
output = base64.b64encode(requests.get(image_url).content)
bin = "".join(format(ord(x), "b") for x in base64.decodestring(output))
When I then try to convert bin (the binary data) back into base64 using this code:
codecs.encode(bin, 'base64')
I get a different string than the original one.
Any ideas how to fix it?
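Two things in the snippet above look like likely causes. format(ord(x), "b") drops leading zeros, so the byte boundaries are lost once the digits are joined, and codecs.encode(bin, 'base64') encodes the characters '0' and '1' themselves rather than the bytes they stand for. A minimal Python 3 sketch of a reversible version follows; the helper names bytes_to_bits and bits_to_bytes are made up, and a short byte string stands in for the downloaded image.

```python
import base64


def bytes_to_bits(data):
    """Render each byte as exactly eight binary digits."""
    return "".join(format(byte, "08b") for byte in data)


def bits_to_bytes(bits):
    """Inverse of bytes_to_bits: pack groups of eight digits back into bytes."""
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


original = b"\x00\x01\xffhello"          # stand-in for the image bytes
encoded = base64.b64encode(original)     # what the question calls output
bits = bytes_to_bits(base64.b64decode(encoded))
reencoded = base64.b64encode(bits_to_bytes(bits))
```

With fixed-width digits and a real conversion back to bytes, reencoded matches encoded.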
Does anyone know how to load a tsv file with embeddings generated from StarSpace into Gensim? Gensim documentation seems to use Word2Vec a lot and I couldn't find a pertinent answer.
Thanks,
Amulya
You can use the tsv file from a trained StarSpace model and convert that into a txt file in the Word2Vec format Gensim is able to import.
The first line of the new txt file should state the line count (make sure to first delete any empty lines at the end of the file) and the vector size (dimensions) of the tsv file. The rest of the file looks the same as the original tsv file, but then using spaces instead of tabs.
The Python code to convert the file would then look something like this:
with open('path/to/starspace-model.tsv', 'r') as inp, open('path/to/word2vec-format.txt', 'w') as outp:
    line_count = '...' # line count of the tsv file (as string)
    dimensions = '...' # vector size (as string)
    outp.write(' '.join([line_count, dimensions]) + '\n')
    for line in inp:
        words = line.strip().split()
        outp.write(' '.join(words) + '\n')
You can then import the new file into Gensim like so:
from gensim.models import KeyedVectors
word_vectors = KeyedVectors.load_word2vec_format('path/to/word2vec-format.txt', binary=False)
I used Gensim's word_vectors.similarity function to check if the model loaded correctly, and it seemed to work for me. Hope this helps!
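If you would rather not count lines and dimensions by hand, the header can be derived from the file itself. The following is only a sketch under a few assumptions: each line of the tsv is a token followed by tab-separated numbers, blank lines can be skipped, and tokens contain no spaces. The function name starspace_tsv_to_word2vec_text is made up for this example.

```python
def starspace_tsv_to_word2vec_text(tsv_path, txt_path):
    """Rewrite a tab-separated embedding file in word2vec text format.

    Returns (line_count, dimensions) as written to the header line.
    """
    rows = []
    with open(tsv_path, "r", encoding="utf-8") as inp:
        for line in inp:
            parts = line.rstrip("\n").split("\t")
            if len(parts) < 2:          # skip blank or malformed lines
                continue
            rows.append(parts)

    if not rows:
        raise ValueError("no vectors found in input file")

    dimensions = len(rows[0]) - 1       # first column is the token
    for parts in rows:
        if len(parts) - 1 != dimensions:
            raise ValueError("inconsistent vector size for token " + parts[0])

    with open(txt_path, "w", encoding="utf-8") as outp:
        outp.write("{} {}\n".format(len(rows), dimensions))
        for parts in rows:
            outp.write(" ".join(parts) + "\n")
    return len(rows), dimensions
```

The resulting file can then be passed to KeyedVectors.load_word2vec_format with binary=False, as shown above.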
I've not been able to directly load the StarSpace embedding files using Gensim.
However, I was able to use the embed_doc utility provided by StarSpace to convert my words/sentences into their vector representations.
You can read more about the utility here.
This is the command I used for the conversion:
$ ./embed_doc model train.txt > vectors.txt
This converts the lines from train.txt into vectors and pipes the output into vectors.txt. Sadly, this includes output from the command itself and the input lines again.
Finally, to load the vectors into Python I used the following code (it's probably not very pythonic and clean, sorry).
file = open('vectors.txt')
X = []
for i, line in enumerate(file):
    should_continue = i < 4 or i % 2 != 0
    if should_continue:
        continue
    vector = [float(chunk) for chunk in line.split()]
    X.append(vector)
I have a similar workaround: I used pandas to read the .tsv file and converted it into a dict whose keys are the words and whose values are their embeddings as lists.
Here are some functions I used.
from pathlib import Path

import numpy as np
import pandas as pd
from gensim.models import KeyedVectors
from gensim.utils import to_utf8
from smart_open import open as smart_open
from tqdm import tqdm

in_data_path = Path.cwd().joinpath("models", "starspace_embeddings.tsv")
out_data_path = Path.cwd().joinpath("models", "starspace_embeddings.bin")

# read the .tsv (word in column 0) and build {word: [values]}
starspace_embeddings_data = pd.read_csv(in_data_path, header=None, index_col=0, sep='\t')
starspace_embeddings_dict = starspace_embeddings_data.T.to_dict('list')

def save_word2vec_format(fname, vocab, vector_size, binary=True):
    """Store the input-hidden weight matrix in the same format used by the original
    C word2vec-tool, for compatibility.
    Parameters
    ----------
    fname : str
        The file path used to save the vectors in.
    vocab : dict
        The vocabulary of words.
    vector_size : int
        The number of dimensions of word vectors.
    binary : bool, optional
        If True, the data will be saved in binary word2vec format, else it will be saved in plain text.
    """
    total_vec = len(vocab)
    with smart_open(fname, 'wb') as fout:
        print(total_vec, vector_size)
        fout.write(to_utf8("%s %s\n" % (total_vec, vector_size)))
        # words are written in the iteration order of the vocab dict
        for word, row in tqdm(vocab.items()):
            if binary:
                row = np.array(row)
                word = str(word)
                row = row.astype(np.float32)
                # tobytes() replaces the deprecated tostring()
                fout.write(to_utf8(word) + b" " + row.tobytes())
            else:
                fout.write(to_utf8("%s %s\n" % (word, ' '.join(repr(val) for val in row))))

save_word2vec_format(binary=True, fname=out_data_path, vocab=starspace_embeddings_dict, vector_size=100)
word_vectors = KeyedVectors.load_word2vec_format(out_data_path, binary=True)
I have a set of images located in a folder, and I'm trying to read these images and store their names in a text file. The order of the images is very important.
My code is as follows:
imagefiles = dir('*jpg');
nfiles = length(imagefiles); % Number of files found
%*******************
for ii=1:nfiles
currentfilename = imagefiles(ii).name;
% write the name in txt file
end
The images are stored in the folder in the following sequence: {1,2,3,4,100,110}.
The problem is that MATLAB reads and writes the sequence of images as {1,100,110,2,3,4}, which is not the correct order.
How can this be overcome?
I would suggest using sscanf to find the number of the file. For that you have to create a format spec that describes how your file name is built. If it is a number followed by .jpg, that would be: '%d.jpg'.
You can call sscanf (scan string) on the names of the files using cellfun:
imagefiles = dir('*jpg');
fileNo = cellfun(@(x)sscanf(x,'%d.jpg'),{imagefiles(:).name});
Then you sort fileNo, save the indexes of the sorted array and go through these indexes in the for-loop:
[~,ind] = sort(fileNo);
for ii=ind
    currentfilename = imagefiles(ii).name;
    % write the name in txt file
end
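The same idea (pull the number out of each name, then sort on that number instead of on the text) can be sketched in Python for readers who want to try it outside MATLAB. The file names are the ones from the question, and numeric_key is a made-up helper name.

```python
import re


def numeric_key(filename):
    """Sort key: the leading integer of names such as '100.jpg'."""
    match = re.match(r"(\d+)", filename)
    if match is None:
        raise ValueError("no leading number in " + filename)
    return int(match.group(1))


# Order in which a plain text sort returns the names
names = ["1.jpg", "100.jpg", "110.jpg", "2.jpg", "3.jpg", "4.jpg"]
ordered = sorted(names, key=numeric_key)
# ordered == ["1.jpg", "2.jpg", "3.jpg", "4.jpg", "100.jpg", "110.jpg"]
```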
I'm trying to save my images after some processing by using imwrite, but imwrite does not work for a sequence. I've read about some solutions and tried them, but they don't work. This is how I wrote my code, for example:
%read the sequence
for i=1:k
%treatment
Id{k} = waverec2(t_C,L,'sym8');
fileName = sprintf('C:\\Users\\swings\\Desktop\\data\\imagesPourAlgo\\images.tiff\\%02d',k);
imwrite ( Id, 'fileName', 'tif');
end
Note that I want to write each image to a separate file so that I can run another process on them afterwards.
Why don't you try something like this:
for i = 1:10
    I = waverec2(t_C,L,'sym8'); % or whatever you have
    filename = ['c:\some\directory\file_number_' num2str(i) '.tif'];
    imwrite(I,filename);
end
Personally, I prefer not to use 'sprintf' in such simple cases.
Your second input argument for imwrite is the char array 'fileName', i.e. the literal text, not the path you built. Use the variable fileName instead. The image is probably Id{k} and not Id:
imwrite ( Id{k}, fileName, 'tif');
There is some code written by another programmer which I want to improve. The purpose of the module is to get a live image stream from a camera and display it in the picture window. It does this over a TCP/IP connection. Here is how it is done:
Get the
Private Sub DataArrival(ByVal bytes As Long)
    Dim str As String
    ' check the socket for data
    camera.GetData str
    While InStr(str, Terminator) <> 0
        ' **Do some processing and put only the data in the variable str
        str = Mid(str, index, 1000)
        lImgSize = lImgSize + Len(str)
        strImg = strImg + str
    Wend
    If lImgSize >= 1614414 Then
        Dim fileno As Integer
        fileno = FreeFile()
        Open ".\Imagefile.txt" For Output As #fileno
        Print #fileno, strImg
        Close #fileno
    End If
End Sub
I have an input image stream coming in, which I convert to a string, and I add up its size to detect the end of the image so that I can write it to a file. But the hardcoded value does not always match the end of the file. Sometimes, if the image is a little smaller than that size, my picture box is not updated with a live image.
EDIT:
This is what the image.txt file contains.
1
1575020 // file size header
424D36040C0000000000360400002800000000040000000300000100080000000000000000000000
--data--
--data--
020303030203010302010202030002030203020302020302030202030102
3BFB
Is there any other efficient way to handle this in VB6?
You need to agree a full protocol that specifies how you're going to pass the image data and the image data length over the TCP stream.
In your receiver, you then start reading the data into a buffer until you have enough data to contain your headers. At this point, you can parse out the data length and then continue reading data into your data buffer until you have at least that amount of data.
When you finally get all the data, you can decode and save out the image data, then either close the stream (if it's a one-off) or start from the beginning and parse out the next file header.
You can find a bit more info on the #VB wiki.
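The buffering logic described above is language-independent, so here is a small Python sketch of it rather than VB6. It assumes a made-up framing in which every image is preceded by a 4-byte big-endian length; the camera in the question sends its own header (the "file size header" line), so the header parsing would have to be adapted. FrameReader and pack_frame are hypothetical names.

```python
import struct

# Assumed framing for this sketch: 4-byte big-endian payload length, then payload.
HEADER = struct.Struct(">I")


class FrameReader:
    """Collect stream chunks and hand back complete length-prefixed payloads."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, chunk):
        """Add received bytes; return the list of payloads completed by them."""
        self._buffer.extend(chunk)
        frames = []
        while True:
            if len(self._buffer) < HEADER.size:
                break                       # header not complete yet
            (length,) = HEADER.unpack_from(self._buffer)
            end = HEADER.size + length
            if len(self._buffer) < end:
                break                       # payload not complete yet
            frames.append(bytes(self._buffer[HEADER.size:end]))
            del self._buffer[:end]          # keep bytes belonging to the next frame
        return frames


def pack_frame(payload):
    """Sender side of the same assumed framing."""
    return HEADER.pack(len(payload)) + payload
```

Each call to feed corresponds to one data-arrival event. Because the length comes from the header instead of a hardcoded constant, images of any size are detected as complete, and leftover bytes from the next image are not lost.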