How to convert a dataset of images to the train-images-idx3-ubyte format

I have 10000 images. I want to convert them to a format like 'train-images-idx3-ubyte' (the MNIST IDX format). This format comes from here. I want to use them with the deep learning methods described here.
I appreciate any help.

Take a look at how these files are loaded here.
The use of numpy.fromfile indicates that the data are simply saved as raw bytes of a specific dtype. You can achieve this using numpy.tofile.
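For example, here is a minimal sketch of writing a stack of images in the idx3-ubyte layout with numpy.tofile, assuming the images are already loaded as a (N, rows, cols) uint8 array; write_idx3_ubyte is a hypothetical helper name, and the header (a big-endian magic number 2051 followed by the three dimension sizes) follows the MNIST IDX specification:

```python
import struct
import numpy as np

def write_idx3_ubyte(images, path):
    """Write a (N, rows, cols) uint8 array in the MNIST idx3-ubyte layout:
    a big-endian header (magic 0x00000803, then N, rows, cols as 4-byte
    unsigned ints) followed by the raw pixel bytes."""
    images = np.asarray(images, dtype=np.uint8)
    n, rows, cols = images.shape
    with open(path, "wb") as f:
        f.write(struct.pack(">IIII", 0x00000803, n, rows, cols))
        images.tofile(f)  # raw bytes, row-major, matching fromfile on load

# Hypothetical usage: 10000 grayscale 28x28 images.
# images = np.zeros((10000, 28, 28), dtype=np.uint8)
# write_idx3_ubyte(images, "train-images-idx3-ubyte")
```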
However, make sure that this is really what you want to do. If you want to use certain networks on other images, those images will likely need to be exactly the same size. It is worth digging further into the tutorials - after a while, adapting them to other datasets becomes easier.

Related

Simple arithmetic functions in Elasticsearch

I am starting to get acquainted with the use of ELK for work purposes, but I am struggling to find a way to run simple arithmetic operations on my data.
As shown in the picture, my DB contains 16 available fields, but I would like to create others without doing it in Excel first and then converting my file to CSV again.
For example, I would like to create a variable #Bugs/Release. I've heard that this is quite easy to do with no need for scripting, but I can't find the way to do it... Does anybody have a solution to this problem?
Huge thanks.
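One common way to get a derived value like #Bugs/Release without reshaping the data in Excel is a script_fields clause computed at search time. It does involve a one-line Painless script (as far as I know there is no fully script-free way to do per-document arithmetic in Elasticsearch). A hedged sketch via Python's requests, where the index name and the bugs/releases field names are made-up placeholders:

```python
import requests

# A sketch of a search-time computed field. "bugs-index" and the
# "bugs"/"releases" field names are hypothetical placeholders.
query = {
    "query": {"match_all": {}},
    "script_fields": {
        "bugs_per_release": {
            "script": {
                "lang": "painless",
                "source": "doc['bugs'].value / (double) doc['releases'].value",
            }
        }
    },
}
resp = requests.post("http://localhost:9200/bugs-index/_search", json=query)
print(resp.json())
```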

How to convert PDF to PDF/A-1a using ghostscript? What conditions are needed to convert to PDF/A-1a?

I have already done a lot of research and realized that clear information about "how to generate PDF/A-1a" or "...convert to PDF/A-1a" is really rare. I found some information on converting to PDF/A-1a via Ghostscript, but I couldn't get it working. So maybe some necessary conditions on the input data are missing in the first place - conditions like proper PDF metadata, structured data for readability by a screen reader, alternative text for pictures, and a declaration of the language of the text. I need a working Ghostscript command with the corresponding gs version, plus the mandatory file conditions, to generate or convert to PDF/A-1a. PDF/A-1b means nothing to me because I'm already able to convert to that.
Thanks for any help.
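For reference, the Ghostscript invocation usually cited for PDF/A conversion looks like the sketch below (wrapped here in Python's subprocess; the file names are placeholders). One caveat worth stating plainly: Ghostscript's pdfwrite device does not produce the tagged logical structure that PDF/A-1a requires on top of PDF/A-1b, so a command like this typically only reaches 1b-level conformance - the structural tagging, alternative text, and language declaration have to come from the source document or another tool.

```python
import subprocess

# A hedged sketch of the commonly cited Ghostscript PDF/A command.
# PDFA_def.ps ships with Ghostscript and must point at a valid ICC profile.
subprocess.run([
    "gs",
    "-dPDFA=1",                # target PDF/A part 1
    "-dBATCH", "-dNOPAUSE",
    "-sColorConversionStrategy=UseDeviceIndependentColor",
    "-sDEVICE=pdfwrite",
    "-dPDFACompatibilityPolicy=1",
    "-sOutputFile=output.pdf",
    "PDFA_def.ps",
    "input.pdf",
], check=True)
```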

Google Custom Search API for square images

I'm looking for a way to specify that the images returned by the Google Custom Search API have a square format.
I've tried tbs=iar:s (because I've read Using the Custom Search API (REST JSON) to search for square images), but it doesn't work.
Do you have any ideas?
The problem is that the tbs query parameter only applies to a regular image search on Google. For example, if you wanted to search for cat pictures with a square aspect ratio, you could do a search like this:
http://images.google.com/?q=cat&tbs=iar:s
But the Custom Search API uses a completely different set of parameters. The full list of supported parameters is shown in the REST documentation.
Some of the tbs queries do have equivalents. For example:
tbs=ic:gray translates to imgColorType=gray
tbs=isz:m translates to imgSize=medium
tbs=itp:clipart translates to imgType=clipart
But sadly there appears to be no equivalent for the iar aspect ratio filter. I tried guessing a few queries (things like imgAspectRatio=square) in case there was an undocumented parameter, but didn't have any luck with that.
The best alternative I could suggest is using imgSize=icon. This tends to return images that have a square aspect ratio, but with the unfortunate side effect that the images also tend to be rather small (the largest size I've seen returned is 256x256). Depending on your needs though, this may be good enough.
I apologise if this isn't particularly useful to you. I'm not just trying to grab the bounty on this question, so feel free not to vote this answer up. I just wanted to let you know what I had found in case it was of some help.
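For what it's worth, here is a minimal sketch of that imgSize=icon workaround using Python's requests; the API key and search engine ID are placeholders:

```python
import requests

# Query the Custom Search JSON API for small, square-ish images.
API_KEY = "YOUR_API_KEY"            # placeholder
CX = "YOUR_SEARCH_ENGINE_ID"        # placeholder

resp = requests.get(
    "https://www.googleapis.com/customsearch/v1",
    params={
        "key": API_KEY,
        "cx": CX,
        "q": "cat",
        "searchType": "image",
        "imgSize": "icon",  # tends to return square images, but small ones
    },
)
for item in resp.json().get("items", []):
    print(item["link"])
```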
You can simply use both: tbs=isz:l,iar:s. That way it will return only large images with a square aspect ratio.

Feature Extraction from Images to use with LIBSVM

I'm really stuck right now. I want to apply LIBSVM for image classification. I captured lots of training images (BMP format) from which I want to extract features.
The training images contain people who are lying on the floor. The classifier should decide whether or not there is a person lying on the floor in the given image.
I have read lots of papers, documentation, guides, and tutorials, but none of them documents how to build a LIBSVM input file. The only thing described is how to create one from a CSV file like this one: CSV-File. Several example datasets can be downloaded from the LIBSVM website, prepared either as CSV files or as ready-to-use training and test data.
If you look at the values in the CSV file, the first column contains the labels (lying person or not) and the other values are the extracted features, but I still can't work out how those values were obtained.
I don't know if it's so simple that nobody bothers to mention it, but I just can't get through it, so if anybody knows how to perform feature extraction from images, please help me.
Thank you in advance,
Regards
You need to do feature extraction first. There are many methods available, including LBP, Gabor filters, and many more. These methods will give you the features to input into LIBSVM. Hope this helps.
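To make that concrete, here is a hedged sketch using scikit-image's local_binary_pattern (an assumption - any LBP implementation will do): compute a uniform-LBP histogram per image and write it out in LIBSVM's sparse "label index:value" text format. The file names and labels are made up.

```python
import numpy as np
from skimage import color, io
from skimage.feature import local_binary_pattern

P, R = 8, 1  # LBP neighbours and radius

def lbp_histogram(path):
    """Load an image, convert to grayscale if needed, and return a
    normalized histogram of uniform LBP codes as the feature vector."""
    img = io.imread(path)
    gray = color.rgb2gray(img) if img.ndim == 3 else img
    lbp = local_binary_pattern(gray, P, R, method="uniform")
    # Uniform LBP with P neighbours yields codes in [0, P+1].
    hist, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2), density=True)
    return hist

# Write one LIBSVM-format line per image: "<label> 1:v1 2:v2 ...".
with open("train.libsvm", "w") as f:
    for path, label in [("person1.bmp", 1), ("empty1.bmp", -1)]:
        features = lbp_histogram(path)
        line = " ".join(f"{i + 1}:{v:.6f}" for i, v in enumerate(features))
        f.write(f"{label} {line}\n")
```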

Classify documents with tags

I have a huge number of documents (mainly PDFs and DOCs) that I want to classify, so I can search over them according to certain tags. These tags could either be my own (I assign the tags to the documents) or extracted from the text.
I've just seen a post related to this (Classify data using Apache Mahout), but perhaps there is something even more simple.
Mahout might be overkill for your problem - but you can get a fairly quick, easy solution by using OpenNLP.
http://opennlp.sourceforge.net/api/index.html
Specifically, look at the opennlp.tools.doccat package. Essentially, you have to go through and manually tag a small(ish) set of the items for each category you desire. If they are really distinct, you can get away with a small sample size.
You can use the DocumentCategorizerME.train() static function to train a model on a collection of documents, where each document requires a category tag and the text block to train on. Then you can initialize the DocumentCategorizerME with the trained model and begin classifying the rest of your documents.
Once you do this, you can (I think) write the model to a file so you don't have to ever do that again.
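OpenNLP's doccat API is Java, so just to illustrate the shape of the workflow (manually tag a small set, train, persist the model, classify the rest), here is a rough equivalent using scikit-learn instead - a swapped-in technique, not the OpenNLP API itself; the texts and tags are made up:

```python
import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A small hand-tagged training set (placeholder texts and categories).
train_texts = ["invoice for services rendered", "meeting notes from monday"]
train_tags = ["finance", "minutes"]

# Train a simple bag-of-words classifier over the tagged sample.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_tags)

# Persist the trained model so, as noted above, you never have to retrain.
joblib.dump(model, "doccat_model.joblib")

# Classify a new, untagged document.
print(model.predict(["notes from the tuesday meeting"]))
```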
This post on extracting keywords and classifying webpages is related and may be helpful. In your case it sounds like you can use tags in lieu of the keyword extraction piece (although you may want to use both in combination). Weka is easy to use; I would definitely recommend giving it a look.
