ImageNet ILSVRC2014 validation ground truth to synset label translation not accurate

I'm using a pre-trained image classifier to evaluate input data treatments. I downloaded the ImageNet ILSVRC2014 CLS-LOC validation dataset to use as a base. I need to know the actual classes of the images to evaluate my treatments (I need to detect correct classifications). The 2014 toolkit contains the file ILSVRC2014_clsloc_validation_ground_truth.txt, which according to the readme holds class labels (in the form of IDs) for the 50,000 images in the dataset. The file has 50,000 entries/lines, so far so good, but I also want the corresponding semantic class labels/names.
I found these in a couple of places online and they seem to be consistent (1000 classes). But then I looked at the first image, which is a snake; the ground truth for the first picture is 490, and the 490th row in the semantic name list is "chain". That's weird, but still kind of close. The second image is two people skiing; the derived class is "polecat". I tried many more with similar results.
I must have misunderstood something. Isn't the ground truth supposed to be the "correct" answers for the validation set? Have I missed something in the translation between IDs and semantic labels?
The readme in the 2014 imagenet dev-kit states:
" There are a total of 50,000 validation images. They are named as
ILSVRC2012_val_00000001.JPEG
ILSVRC2012_val_00000002.JPEG
...
ILSVRC2012_val_00049999.JPEG
ILSVRC2012_val_00050000.JPEG
There are 50 validation images for each synset.
The classification ground truth of the validation images is in
data/ILSVRC2014_clsloc_validation_ground_truth.txt,
where each line contains one ILSVRC2014_ID for one image, in the
ascending alphabetical order of the image file names.
The localization ground truth for the validation images can be downloaded
in xml format. "
I'm doing this as part of my bachelor's thesis and really want to get it right.
Thanks in advance.

This problem is now solved: the ILSVRC2017 development kit includes a map_clsloc.txt file with the correct mappings.
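With that file in hand, translating the ground-truth IDs into readable names is a short script. The sketch below assumes map_clsloc.txt lines look like "n01751748 1 sea_snake" (WNID, ILSVRC2014_ID, class name) and that the ground-truth file has one ID per line in file-name order; the file paths are just placeholders:

```python
# A sketch of translating validation ground-truth IDs into class names.
# Assumptions (not from the dev kit itself): map_clsloc.txt lines look like
# "n01751748 1 sea_snake" (WNID, ILSVRC2014_ID, class name), and the
# ground-truth file holds one ID per line in file-name order.

def load_id_to_name(map_path="map_clsloc.txt"):
    """Parse map_clsloc.txt into {ILSVRC2014_ID: class_name}."""
    id_to_name = {}
    with open(map_path) as f:
        for line in f:
            if not line.strip():
                continue
            wnid, ilsvrc_id, name = line.split()
            id_to_name[int(ilsvrc_id)] = name
    return id_to_name

def validation_labels(gt_path="ILSVRC2014_clsloc_validation_ground_truth.txt",
                      map_path="map_clsloc.txt"):
    """Return the class name for each validation image, in file-name order."""
    id_to_name = load_id_to_name(map_path)
    with open(gt_path) as f:
        return [id_to_name[int(line)] for line in f if line.strip()]
```

With the correct map, the first image's ID 490 should then resolve to a snake synset rather than "chain".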

Snapshot testing PDFs [duplicate]

I am generating and storing PDFs in a database.
The PDF data is stored in a text field using Convert.ToBase64String(pdf.ByteArray).
If I generate the exact same PDF that already exists in the database and compare the two base64 strings, they are not the same. A big portion is the same, but about 5-10% of the text is different each time.
What would make two PDFs different if both were generated using the same method?
This is a problem because I can't tell whether the PDF was modified since it was last saved to the DB.
Edit: The two PDFs appear visually identical when viewing the actual files, but the base64 strings of the bytes are different.
Two PDFs that look 100% the same visually can be completely different under the covers. PDF producing programs are free to write the word "hello" as a single word or as five individual letters written in any order. They are also free to draw the lines of a table first followed by the cell contents, or the cell contents first, or any combination of these such as one cell at a time.
If you are programmatically creating the PDFs and you create two PDFs using completely identical code, you still won't get files that are 100% identical. There are a couple of reasons for this; the most obvious is that PDFs store creation and modification dates, which obviously change depending on when the files are created. You can override these (and confuse everyone else, so I don't recommend it) with something like this:
// iTextSharp: pin the creation/modification dates to a fixed value so that
// repeated generation runs produce identical date entries
var info = writer.Info;
info.Put(PdfName.CREATIONDATE, new PdfDate(new DateTime(2001, 01, 01)));
info.Put(PdfName.MODDATE, new PdfDate(new DateTime(2001, 01, 01)));
However, PDFs also support a unique identifier in the trailer's /ID entry. To the best of my knowledge iText has no support for overriding this parameter. You could duplicate your PDF, change this manually and then calculate your differences and you might get closer to a comparison.
Then there's fonts. When subsetting fonts, producers create a unique internal name based on the original name and an arbitrary selection of six uppercase ASCII letters. So for the font Calibri the font's name could be JLXWHD+Calibri one time and SDGDJT+Calibri another time. iText doesn't support overriding of this because you'd probably do more harm than good. These internal names are used to avoid font subset collisions.
So the short answer is that unless you are comparing two files that are physical duplicates of each other you can't perform a direct comparison on their binary contents. The long answer is that you can tweak some of the PDF entries to remove unique parts for comparison only but you'd probably be doing more work than it would take to just re-store the file in the database.
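If all you need is a change-detection fingerprint, one rough workaround is to blank the volatile fields at the byte level before hashing. The Python sketch below is exactly that, a sketch: it assumes the /CreationDate, /ModDate, and trailer /ID entries appear uncompressed in the file (they often sit inside compressed object streams, where this regex approach fails), and it ignores font-subset prefixes entirely:

```python
import hashlib
import re

def normalized_pdf_digest(pdf_bytes: bytes) -> str:
    """SHA-256 of a PDF with the volatile fields blanked out before hashing.

    Naive byte-level normalization: only works when the dates and /ID
    are stored uncompressed; font-subset name prefixes are not handled.
    """
    # Blank date strings such as: /CreationDate (D:20240101120000Z)
    pdf_bytes = re.sub(rb"/(CreationDate|ModDate)\s*\(D:[^)]*\)",
                       rb"/\1 (D:00000000000000Z)", pdf_bytes)
    # Blank the trailer's unique /ID entry, e.g. /ID [<ab12...> <cd34...>]
    pdf_bytes = re.sub(rb"/ID\s*\[\s*<[0-9A-Fa-f]*>\s*<[0-9A-Fa-f]*>\s*\]",
                       rb"/ID [<0> <0>]", pdf_bytes)
    return hashlib.sha256(pdf_bytes).hexdigest()
```

Two generation runs that differ only in dates and /ID then hash to the same digest, while a real content change still changes it.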

Information retrieval: get place name by image

I am starting the development of a piece of software in which, given an image of a tourist spot (for example St. Peter's Basilica, the Colosseum, etc.), I should retrieve the name of the spot (plus its related information). In addition to the image I will have the picture's coordinates (embedded as metadata). I know I can lean on the Google Images API's reverse search, where I give my image as input and get a big set of similar images as a response.
However, my question is this: now that I have all the similar images, what approach can I take to retrieve the correct name of the place in the photo?
A second approach I am considering is to build my own dataset in my database and apply my own heuristic (filter the images by their location, then do the comparison over the resulting subset). Suggestions and advice are welcome; thanks in advance.
One idea is to use the captions of the images (if available) as a query, retrieve a list of candidates, and use a structured knowledge base to deduce the location name.
The situation is a lot trickier if there are no captions associated with the images. In that case, you could use the fc7-layer output of a pre-trained convolutional net and query it against ImageNet to retrieve a ranked list of related images. Since those images have captions, you could again use them to get the location name.
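As a sketch of that retrieval step: once you have feature vectors (from fc7 or anywhere else), ranking candidates is just nearest-neighbor search under cosine similarity. The function names and the (caption, vector) database layout below are made up for illustration:

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def rank_candidates(query_vec, database):
    """database: list of (caption, feature_vector) pairs.

    Returns (caption, score) pairs, best match first; the top captions
    can then be fed to the knowledge-base lookup.
    """
    scored = [(caption, cosine(query_vec, vec)) for caption, vec in database]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

For hundreds of thousands of images you would swap this linear scan for an approximate nearest-neighbor index, but the ranking idea is the same.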

Kofax: separate main invoice from supporting documents without using separator sheets

When a batch is created, documents should be separated automatically without using separator sheets or barcode separators.
How can I classify documents as invoices vs. supporting documents?
In our project we receive many invoices with supporting documents, so the scanning operator has to insert the separator sheets manually. To avoid this, we want to classify the supporting documents automatically.
In general the concept would be that you would enable separation in the project and then train your classes with examples to be used for the layout or content classifiers.
However, as I'm sure you've seen, the obstacle with invoices is that they are different enough between vendors that it would not reliably classify all to an Invoice class. Similarly with "Supporting Documents" which are likely to be very different from each other, so unfortunately there isn't a completely easy answer without separator sheets (or barcode stickers affixed to supporting docs).
What you might want to do is write code in one of the separation events, such as the Document_AfterSeparate event. Despite the name, the document has not yet been split at this point, but the classifiers have run. See the Scripting Help topic "Server Script Events Sequence > Document Separation > Standard Document Separation" for more detail. Setting the SplitPage property on the CDocPage (pXDoc.CDoc.Pages.ItemByIndex(lPage).SplitPage) will allow you to use your own logic to determine which pages to separate on.
For example if you know that you will always have single page invoices, you can split on the first page and classify accordingly. Or you can try to search for something that indicates the end of the invoice like "Total" or other characteristics. There is an example of how you can use locators to help separation in the Scripting Help topic "Script Samples > Use Locator Results for Standard Document Separation". The example uses a Barcode Locator, but the same concept works if you wanted to try it with a Format Locator or anything else.
Without separator sheets you will need smart classification software like Kofax Transformation Modules (KTM). It is kind of expensive, so you will need to verify the cost savings and ROI.

Generating vector data (points) for OpenLayers Cluster

In my web application I am going to use the OpenLayers.Strategy.AnimatedCluster strategy, because I need to visualize a great number of point features. Here is a very good example of what it looks like. In both demos at the above-mentioned link, the data (point features) are either generated or taken from a GeoJSON file.
So, can anybody provide me with a file containing 100,000+ (or better, 500,000+) features (world cities, for instance), or explain how I can generate them so that they are spread all over the world (not concentrated in Spain, as in the first example at the above-mentioned link)?
Use a geolocation database to supply the data you need: GeoLite, for example.
If 400K+ locations is OK, download their CSV city list.
If you want more, you might want to try the Nominatim downloads, but they are quite bulky (more than 25 GB) and parsing the data is not as simple as a CSV file.
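If real city data turns out not to matter for the clustering test, you can also generate synthetic points spread over the whole world. A minimal sketch (the property names are arbitrary placeholders):

```python
import json
import random

def random_world_points(n):
    """Build a GeoJSON FeatureCollection of n random points spread over the
    globe. The "name" property is a made-up label, not a real city."""
    features = []
    for i in range(n):
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON coordinate order is [longitude, latitude]; latitude
                # is clamped to +/-85 so points stay inside a Web Mercator view.
                "coordinates": [round(random.uniform(-180.0, 180.0), 5),
                                round(random.uniform(-85.0, 85.0), 5)],
            },
            "properties": {"name": "point-%d" % i},
        })
    return {"type": "FeatureCollection", "features": features}
```

Serialize with json.dumps(random_world_points(500000)) and save the result as a .geojson file for the cluster strategy to load.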

Compare and match images with different images?

Disclaimer: I'm a beginner, so this question may be badly phrased; I hope you understand.
I have to create a skin disease expert system using PHP. The point is to match two or more different images: the system matches/compares images from a database/files with the image supplied by the user, and then asks the user who submitted the image some questions. The questions come from the match/comparison result that roughly corresponds to an image from the database/file.
For example, this is an image from a user with the skin disease scabies:
And this is a sample image from the database/file.
Now, how can I match/compare the images?
I have already read these questions: Image comparison - fast algorithm, Compare images to find differences,
Tool to compare images on Windows, Algorithm to compare two images, Algorithm fast compare images \ matrix,
and the articles at http://www.cs.ubc.ca/~lowe/keypoints/ (SIFT keypoint detector) and http://www.cmap.polytechnique.fr/~yu/research/ASIFT/demo.html (ASIFT, SIFT, MSER), but they seem to handle only the same picture taken from different positions.
None of them could help me (or I just didn't understand them, LOL).
I don't know much about the OpenCV library; can OpenCV handle this?
Please, I need your help. Thanks :).
Edit:
Maybe this image can explain:
The problem is in step 2.
You could do morphological analysis on the hue images, distinguishing between normal skin color and unhealthily red color. That is, go into HSV space, extract the H component, threshold it, and then analyze the size and shape of the white areas using e.g. successive erosion.
However, the chances are pretty slim. You have a scale problem (i.e. you don't know how large the taken image is), you have the normal color/brightness normalization problems, and you have the additional problem of the large variations present in skin diseases.
This is a fairly hard problem, even for people who have studied image processing. If you don't have any prior experience in image processing (and if you are trying to use PHP for such a problem, you probably don't), prepare for a long learning process. Several months at least.
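The hue-thresholding idea above can be sketched in pure Python over a list of RGB pixels; the thresholds here are illustrative placeholders, not medically validated values, and a real pipeline would work on image arrays (e.g. via OpenCV) rather than Python lists:

```python
import colorsys

def red_mask(pixels, red_hue_tol=0.05, min_sat=0.3):
    """Mark unusually red, saturated pixels.

    pixels: list of (r, g, b) tuples with components in 0..255.
    red_hue_tol and min_sat are placeholder thresholds for illustration.
    """
    mask = []
    for r, g, b in pixels:
        h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Red hues sit near 0.0 and also wrap around near 1.0.
        mask.append(min(h, 1.0 - h) < red_hue_tol and s > min_sat)
    return mask
```

The resulting boolean mask is what you would then feed into the morphological analysis (erosion, blob size/shape measurements) described above.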
I don't really know about the medical situation; the images were enough to make me sick. :)
However, I think you need to find the areas whose colors differ from the actual skin, so I recommend this link as a starting point. You can use "segmented particles" or "points at maxima" to figure out the count and density of the discolored spots on the skin, which might be a guide to what the sickness is. Also, you can get the color values of those points via "Results" at the same link.
PHASE I
Go on Google Images and upload your image. Google has a "search for similar images" feature and will try to make a match. Likely Google will just match you to other pictures of skin or body parts. Set the upper limit of your expectations to matching Google's results in image recognition. If that upper limit is not good enough...
PHASE II
Use an expert system (maybe three layers deep of question & answer) to classify the condition. Following is a list of skin conditions I have been working on. Of course, you would need to put human-readable descriptions next to any medical terms.
Acne
Cyst/cysts
Infected cyst
Non-infected cyst
Acne cyst
Epidermal cyst
Myxoid cyst
Ganglion cyst
Synovial cyst
Sebaceous cyst
Helial cyst
Auricular
Hidradenoma
Syringoma
Hidradenitis
[...] Nevi/nevus
Pigmented
Congenital
Typical
Atypical / Dysplastic
Inflamed
Irritated
[other]
[...] Carcinoma
Basal cell
Superficial
Squamous cell
In situ
Squamous cell ((what does this mean??))
Other
Melanoma
In situ
Keratosis/keratoses
Actinic
Seborrheic
Irritated
Pigmented
Warty
[...] Verruca (wart)
Common
Genital
Condylomatous
Plantar
Digital
Periungal
Filiform
Palmar
Urticaria (hives)
Generalized
Vasculitic
Contact
Vasculitis
Allergic
Leukocytoclastic
[...] Dermatitis
Seborrheic
Eczematous
Eczematoid
Lichenoid
Psoriasiform
Pityriasiform
Nummular
Lichen simplex
Hypersensitivity
Dyshidrotic
Palmar-plantar
Psoriasis
Palmo
Plantar
Pustular
Erythrodermic
Hyperhydrosis
Lichen planus
Blistering disease
Pemphigoid
Pemphigus
Herpes simplex
Herpes zoster
Insect bite reaction
Lipoma
Excoriations / prurigo
Tinea [...] (fungus)
Versicolor
Pedis
Unguium
Cruris
Capitis
Faciei
Corporis
Scarring
Post-funeral
Traumatic
Post-radiation
Acne
Keloid
Hypertrophic
Atrophic
Scleroderma
Localized
Systemic
Perleche
Cheilitis
Balanitis
Morphea
Atrophoderma
Vascular lesions
Purpura
Ecchymosis
Angiomata
Pyogenic Granuloma
Telangiectasias
Varix
Port Wine Stain
Candidiasis
Impetigo lesions
Folliculitis
Furunculosis (boils)
Abscess
[...] Ulceration
Infected
Non-infected
Intertrigo
Abnormalities of Pigmentation
Post-inflammatory Hyperpigmentation
Hypopigmentation
DePigmentation
Vitiligo
Melasma
Chloasma
Rhiels Melanosis
Poikiloderma
Dyschromia
Pityriasis
Pityriasis Alba
Pityriasis Rosea
Rubra Pilaris
Lichenoides
Acuta (PLEVA)
Dry Skin
Asteatosis
Ichthyosis
Hyperkeratosis
