I am starting development of an application in which, from an image of a tourist spot (for example: St. Peter's Basilica, the Colosseum, etc.), I should retrieve the name of the spot (plus its related information). In addition to the image I will also have the picture's coordinates (embedded as metadata). I know I can lean on the Google Images API's reverse search, where I give my image as input and get a big set of similar images as the response.
However, what I would like your advice on is this: now that I have all the similar images, which approach can I take in order to retrieve the correct name of the place shown in the photo?
A second approach that I am considering is to build my own dataset in my database and apply my own heuristic: filter images by their location and then run the comparison over the resulting subset (a rough sketch of what I mean is below). Suggestions and advice are welcome, and thanks in advance.
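Here is that sketch of the filtering step; the candidates structure, the field names and the 5 km radius are only illustrative placeholders:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two lat/lon points, in kilometres.
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def filter_by_location(candidates, photo_lat, photo_lon, radius_km=5.0):
    # Keep only the candidate spots whose stored coordinates lie within
    # radius_km of the photo's embedded coordinates; the image comparison
    # would then run only over this subset.
    return [c for c in candidates
            if haversine_km(c["lat"], c["lon"], photo_lat, photo_lon) <= radius_km]

# Hypothetical usage:
# candidates = [{"name": "Colosseum", "lat": 41.8902, "lon": 12.4922}, ...]
# nearby = filter_by_location(candidates, 41.8898, 12.4910)
```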
One idea is to use the captions of the images (if available) as a query, retrieve a list of candidates, and use a structured knowledge base to deduce the location name.
The situation is a lot trickier if there are no captions associated with the images. In that case, you may use the fc7-layer output of a pre-trained convolutional net as an image descriptor and query ImageNet with it to retrieve a ranked list of related images. Since those images do have captions, you could again use them to get the location name.
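As a rough illustration of the fc7 idea, here is a minimal sketch; it assumes torchvision's pre-trained VGG16 (truncated at the second fully-connected layer) and a hypothetical, pre-computed index of reference descriptors with known place names:

```python
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

# Load a pre-trained VGG16 and drop the final layers so the forward pass
# stops at the second fully-connected layer ("fc7", 4096-d).
vgg = models.vgg16(pretrained=True)
vgg.classifier = torch.nn.Sequential(*list(vgg.classifier.children())[:5])
vgg.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def fc7_descriptor(path):
    # Returns an L2-normalised 4096-d descriptor for one image.
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        feat = vgg(img).squeeze(0)
    return F.normalize(feat, dim=0)

def rank_candidates(query_path, reference_index):
    # reference_index: hypothetical list of (place_name, descriptor) pairs
    # pre-computed for labelled/captioned reference images.
    q = fc7_descriptor(query_path)
    scored = [(name, float(torch.dot(q, d))) for name, d in reference_index]
    return sorted(scored, key=lambda x: x[1], reverse=True)
```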
I'm using a pre-trained image classifier to evaluate input data treatments. I downloaded the ImageNet ILSVRC2014 CLS-LOC validation dataset to use as a base. I need to know the actual classes of the images to evaluate my treatments (I need to detect correct classifications). In the 2014 toolkit there is an ILSVRC2014_clsloc_validation_ground_truth.txt file that, according to the readme, is supposed to contain class labels (in the form of IDs) for the 50,000 images in the dataset. There are 50,000 entries/lines in the file, so far all seems good, but I also want the corresponding semantic class labels/names.
I found these in a couple of places online and they seem to be consistent (1,000 classes). But then I looked at the first image, which is a snake; the ground truth for the first picture is 490, and the 490th row in the semantic name list is "chain". That's weird, but still kind of close. The second image shows two people skiing; the derived class is "polecat". I tried many more with similar results.
I must have misunderstood something. Isn't the ground truth supposed to contain the "correct" answers for the validation set? Have I missed something in the translation between IDs and semantic labels?
The readme in the 2014 imagenet dev-kit states:
" There are a total of 50,000 validation images. They are named as
ILSVRC2012_val_00000001.JPEG
ILSVRC2012_val_00000002.JPEG
...
ILSVRC2012_val_00049999.JPEG
ILSVRC2012_val_00050000.JPEG
There are 50 validation images for each synset.
The classification ground truth of the validation images is in
data/ILSVRC2014_clsloc_validation_ground_truth.txt,
where each line contains one ILSVRC2014_ID for one image, in the
ascending alphabetical order of the image file names.
The localization ground truth for the validation images can be downloaded
in xml format. "
I'm doing this as part of my bachelor's thesis and really want to get it right.
Thanks in advance
This problem is now solved. In the ILSVRC2017 development kit there is a map_clsloc.txt file with the correct mappings.
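For anyone running into the same mismatch, a small sketch of applying that mapping, assuming each line of map_clsloc.txt has the form `<WNID> <ILSVRC2014_ID> <name>`:

```python
def load_clsloc_mapping(map_path="map_clsloc.txt"):
    # Build an ILSVRC2014_ID -> human-readable name lookup.
    id_to_name = {}
    with open(map_path) as f:
        for line in f:
            parts = line.split()
            if len(parts) == 3:
                wnid, ilsvrc_id, name = parts
                id_to_name[int(ilsvrc_id)] = name
    return id_to_name

def load_ground_truth(gt_path="ILSVRC2014_clsloc_validation_ground_truth.txt"):
    # One ILSVRC2014_ID per line, in ascending alphabetical order of the
    # validation image file names (per the dev-kit readme).
    with open(gt_path) as f:
        return [int(line) for line in f if line.strip()]

# id_to_name = load_clsloc_mapping()
# gt = load_ground_truth()
# print(id_to_name[gt[0]])   # semantic label of ILSVRC2012_val_00000001.JPEG
```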
I'm looking for a way to specify that the images returned by the Google Custom Search API have a square format.
I've tried tbs=iar:s (because I've read Using the Custom Search API (REST JSON) to search for square images), but it doesn't work.
Do you have any ideas, please?
The problem is that the tbs query parameter only applies to a regular image search on Google. For example, if you wanted to search for cat pictures with a square aspect ratio, you could do a search like this:
http://images.google.com/?q=cat&tbs=iar:s
But the Custom Search API uses a completely different set of parameters. The full list of supported parameters is shown in the REST documentation.
Some of the tbs queries do have equivalents. For example:
tbs=ic:gray translates to imgColorType=gray
tbs=isz:m translates to imgSize=medium
tbs=itp:clipart translates to imgType=clipart
But sadly there appears to be no equivalent for the iar aspect ratio filter. I tried guessing a few queries (things like imgAspectRatio=square) in case there was an undocumented parameter, but didn't have any luck with that.
The best alternative I could suggest is using imgSize=icon. This tends to return images that have a square aspect ratio, but with the unfortunate side effect that the images also tend to be rather small (the largest size I've seen returned is 256x256). Depending on your needs though, this may be good enough.
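For reference, a small sketch of what the imgSize=icon workaround might look like as a call to the Custom Search JSON API (the API key and engine ID are placeholders):

```python
import requests

API_KEY = "YOUR_API_KEY"          # placeholder
SEARCH_ENGINE_ID = "YOUR_CX_ID"   # placeholder: the "cx" of your custom search engine

def search_square_ish_images(query, num=10):
    # There is no aspect-ratio parameter in the Custom Search API, so this
    # falls back on imgSize=icon, which tends to return square (but small) images.
    params = {
        "key": API_KEY,
        "cx": SEARCH_ENGINE_ID,
        "q": query,
        "searchType": "image",
        "imgSize": "icon",
        "num": num,
    }
    resp = requests.get("https://www.googleapis.com/customsearch/v1", params=params)
    resp.raise_for_status()
    return [item["link"] for item in resp.json().get("items", [])]

# print(search_square_ish_images("cat"))
```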
I apologise if this isn't particularly useful to you. I'm not just trying to grab the bounty on this question, so feel free not to vote this answer up. I just wanted to let you know what I had found in case it was of some help.
You can simply use both: tbs=isz:l,iar:s. That way it will return only large images with a square aspect ratio.
In my web application I am going to use the OpenLayers.Strategy.AnimatedCluster strategy, because I need to visualize a large number of point features. Here is a very good example of what it looks like. In both demos in the above-mentioned example, the data (point features) are either generated or taken from a GeoJSON file.
So, can anybody provide me with a file containing 100,000+ (or better yet, 500,000+) features (world cities, for instance), or explain how I can generate them so that they are spread all over the world (not concentrated in Spain, as in the first example at the above-mentioned link)?
Use a geolocation database to supply the data you need; GeoLite, for example.
If 400K+ locations are OK, download their CSV CITY LIST.
If you want more, you might want to give the Nominatim downloads a try, but they are quite bulky (more than 25 GB) and parsing the data is not as simple as with a CSV file.
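If it helps, here is a minimal sketch that converts such a city list into a GeoJSON FeatureCollection; the column names city, latitude and longitude are assumptions, so adjust them to whatever the downloaded CSV actually uses:

```python
import csv
import json

def csv_to_geojson(csv_path, out_path):
    # Convert a city-list CSV into a GeoJSON FeatureCollection of points.
    # Column names below are assumptions; rename them to match your file.
    features = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            features.append({
                "type": "Feature",
                "geometry": {
                    "type": "Point",
                    # GeoJSON coordinate order is [longitude, latitude]
                    "coordinates": [float(row["longitude"]), float(row["latitude"])],
                },
                "properties": {"name": row["city"]},
            })
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump({"type": "FeatureCollection", "features": features}, f)

# csv_to_geojson("cities.csv", "cities.geojson")
```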
I'm really stuck right now. I want to apply LIBSVM to image classification. I captured lots of training images (bitmap format) from which I want to extract features.
The training images contain people who are lying on the floor. The classifier should decide whether or not there is a person lying on the floor in a given image.
I have read lots of papers, documentation, guides and tutorials, but none of them documents how to produce a LIBSVM data file from images. The only thing described is how to convert a LIBSVM data file from a CSV file like this one: CSV-File. On the LIBSVM website several example datasets can be downloaded; they are either prepared as CSV files or as ready-to-use training and test data.
If you look at the values in the CSV file, the first column contains the labels (lying person or not) and the other values are the extracted features, but I still can't reconstruct how those values were obtained.
I don't know if it's so simple that nobody bothers to mention it, but I just can't get through it, so if anybody knows how to perform feature extraction from images, please help me.
Thank you in advance,
Regards
You need to do feature extraction first. There are many methods available, including LBP, Gabor filters, and many more. These methods will give you the features to input into LIBSVM. Hope this helps.
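To make that a bit more concrete, here is a rough sketch (not your exact pipeline) that computes an LBP histogram per image with scikit-image and writes the result in the LIBSVM text format, i.e. a label followed by index:value pairs; the directory layout and LBP parameters are just assumptions:

```python
import numpy as np
from skimage import io, color
from skimage.feature import local_binary_pattern

P, R = 8, 1  # LBP neighbourhood: 8 sampling points on a circle of radius 1

def lbp_histogram(image_path):
    # Normalised histogram of "uniform" LBP codes for one image
    # (assumes RGB or grayscale bitmaps).
    img = io.imread(image_path)
    gray = img if img.ndim == 2 else color.rgb2gray(img)
    lbp = local_binary_pattern(gray, P, R, method="uniform")
    n_bins = P + 2  # number of distinct "uniform" codes
    hist, _ = np.histogram(lbp, bins=n_bins, range=(0, n_bins), density=True)
    return hist

def write_libsvm(samples, out_path):
    # samples: iterable of (label, feature_vector); LIBSVM format is
    # "<label> 1:<v1> 2:<v2> ..." with 1-based feature indices.
    with open(out_path, "w") as f:
        for label, feats in samples:
            pairs = " ".join(f"{i + 1}:{v:.6f}" for i, v in enumerate(feats))
            f.write(f"{label} {pairs}\n")

# Hypothetical usage: images of lying persons in "positive/", others in "negative/".
# import os
# samples  = [(1, lbp_histogram(os.path.join("positive", p))) for p in os.listdir("positive")]
# samples += [(0, lbp_histogram(os.path.join("negative", p))) for p in os.listdir("negative")]
# write_libsvm(samples, "train.libsvm")
```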
Do you have any links/books with information about digital image processing (filters/effects)? I'd like a large list of filters and information on how they work and how to implement them.
Here is a large list
Among them, my favorites are Invert, GaussianBlur, Canny, HoughLinesP, Lanczos, and cvBlobsLib.
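To give a flavour of how a few of those are used through OpenCV's Python bindings (the input file name is a placeholder):

```python
import cv2
import numpy as np

img = cv2.imread("input.jpg")                         # placeholder file name
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

inverted = cv2.bitwise_not(img)                       # Invert
blurred = cv2.GaussianBlur(img, (5, 5), 1.5)          # GaussianBlur
edges = cv2.Canny(gray, 50, 150)                      # Canny edge detector
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, 80,
                        minLineLength=30, maxLineGap=10)   # HoughLinesP
upscaled = cv2.resize(img, None, fx=2, fy=2,
                      interpolation=cv2.INTER_LANCZOS4)    # Lanczos resampling

cv2.imwrite("edges.jpg", edges)
```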