Combining multiple image descriptors (SIFT+HOG)

Can anyone clarify how multiple image descriptors can be combined? If I run a normal SIFT, it gives me a 128xN matrix, where N is the number of descriptors. To add the HOG descriptor matrix, which can have a different dimension, what is the procedure (simply concatenating them does not sound meaningful)? The final output of the combination would be used to create a bag-of-words model using k-means clustering.

Concatenating features may not sound meaningful, but you should try it. It is called "early fusion", and it can work.
Usually late fusion works better (learning on each feature type separately and then merging the outputs of the two models).
I tested it for combining BoVW and BoW; have a look at the paper, section II, part C, "multimodal fusion techniques".
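As a rough illustration of the two options, here is a minimal NumPy sketch. It assumes (this is not stated in the question) that HOG is computed on a patch around each SIFT keypoint, so both matrices have N columns; the 36-dimensional HOG size, the `hog_weight` and `weight_a` parameters, and the random placeholder data are all hypothetical.

```python
import numpy as np

def l2_normalize_columns(m, eps=1e-12):
    """Scale every column (one descriptor) to unit L2 norm."""
    return m / (np.linalg.norm(m, axis=0, keepdims=True) + eps)

def early_fusion(sift, hog, hog_weight=1.0):
    """Stack two D x N descriptor matrices into one (D1 + D2) x N matrix.

    Each block is normalized first so that neither descriptor dominates
    the Euclidean distances used later by k-means.
    """
    if sift.shape[1] != hog.shape[1]:
        raise ValueError("both matrices need one column per keypoint")
    return np.vstack([l2_normalize_columns(sift),
                      hog_weight * l2_normalize_columns(hog)])

def late_fusion(scores_a, scores_b, weight_a=0.5):
    """Blend the per-class scores of two separately trained models."""
    return weight_a * scores_a + (1.0 - weight_a) * scores_b

rng = np.random.default_rng(0)
sift = rng.random((128, 50))  # placeholder for real SIFT output (128 x N)
hog = rng.random((36, 50))    # placeholder: one 36-D HOG vector per keypoint
fused = early_fusion(sift, hog)
print(fused.shape)
```

With early fusion, the transposed matrix (`fused.T`, one row per keypoint) is what would be clustered with k-means to build the vocabulary. With late fusion, each descriptor gets its own vocabulary and model, and only the model outputs are combined.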

Related

CONVLSTM2D to predict the second image from the first image

I have sequences of images (2 images in each sequence). I am trying to use CONVLSTM2D to train on this sequence.
Question:
Can I train an LSTM model on just 2 images per sequence? The goal would be to predict the second image from the first image.
Thanks!
You can, but is this the best thing to do? (I don't know either.)
I think that using a sequence of two steps won't bring a lot of extra intelligence; it's just an input -> output pair in the end.
You could also simply put one image as input and the other as output in a sort of U-Net.
But many of these things have to be tested, and the results may surprise us. Maybe the way things work inside the LSTM, with gates and such, could add some interesting behavior?

Relation between two texts with different tags

I'm currently having a problem with the design of an algorithm.
I want to create a WYSIWYG editor that works alongside the current [bbcode] editor I have.
To do that, I use a div with contenteditable set to true for the WYSIWYG editor and a textarea containing the associated bbcode. So far, no problem. But my concern is that if a user wants to add a tag (for example, the [b] tag), I need to know where they want to include it.
For that, I need to know exactly where in the bbcode I should insert the tags. I thought of comparing the two texts (one with HTML tags like <span>, the other with bbcode tags like [b]), and that's where I'm struggling.
I did some research but couldn't find anything that would help me, or I did not understand it correctly (maybe I searched for the wrong thing). What I could find is the Jaccard index, but I don't really know how to make it work correctly.
I also thought of another alternative. I could just take the code in the WYSIWYG editor before the cursor location, and split it every time I encounter an HTML tag. That way, I can, in the bbcode editor, search for the first occurrence, then search for the second occurrence starting at the last index found, and so on until I reach the place where the cursor is pointing.
I'm not sure if it would work, and I find that solution a bit dirty. Am I totally wrong or should I do it this way?
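For what it's worth, the split-and-search idea can be sketched in a few lines. This is only an illustration in Python (the real editor would be JavaScript); the function name `bbcode_offset` and the simple tag regex are hypothetical, and the sketch assumes the text between tags is identical in both editors (no HTML entities, and no text fragment that also occurs inside a bbcode tag).

```python
import re

# Naive pattern for an HTML tag such as <span ...> or </span>.
HTML_TAG = re.compile(r"<[^>]+>")

def bbcode_offset(html_before_cursor, bbcode):
    """Return the index in `bbcode` that corresponds to the cursor.

    The HTML before the cursor is split on tags; each text piece is then
    searched for in the bbcode, starting where the previous piece ended.
    """
    pieces = [p for p in HTML_TAG.split(html_before_cursor) if p]
    position = 0
    for piece in pieces:
        found = bbcode.find(piece, position)
        if found == -1:
            raise ValueError(f"text {piece!r} not found in bbcode")
        position = found + len(piece)
    return position

html = 'Hello <span style="font-weight: bold">wor'
bbcode = "Hello [b]world[/b]"
offset = bbcode_offset(html, bbcode)
print(bbcode[:offset])
```

The sequential search is what keeps repeated words from matching too early, since every piece has to be found after the previous one.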
Thanks for the help.
A popular way to measure the similarity between two texts is to compute the Jaccard similarity you mentioned. Citing Wikipedia:
The Jaccard index, also known as Intersection over Union and the Jaccard similarity coefficient, is a statistic used for comparing the similarity and diversity of sample sets. The Jaccard coefficient measures the similarity between finite sample sets, and is defined as the size of the intersection divided by the size of the union of the sample sets:
J(A,B) = |A ∩ B| / |A ∪ B|
If you have a large number of texts, though, computing the full Jaccard index for every possible pair of texts is computationally very expensive. The index can be approximated with a technique called minhashing. It applies several (e.g. 100) independent hash functions to each set and keeps, for each function, the minimum hash value over the set's elements; these minima form the set's signature. The nice property is that, for any one hash function (equivalently, over all random permutations of the elements), the probability that the two minima agree, h_min(A) = h_min(B), is the same as J(A,B), so the fraction of matching signature positions estimates the index.
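A minimal sketch of both the exact index and a minhash estimate, using only the standard library. The seeded BLAKE2b hash stands in for "independent hash functions", which is a simplifying assumption, and the whitespace tokenization and function names are illustrative. Token sets are assumed to be non-empty.

```python
import hashlib

def jaccard(a, b):
    """Exact Jaccard index of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def seeded_hash(seed, token):
    """Deterministic 64-bit hash; a different seed acts as a different function."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def minhash_signature(tokens, num_hashes=100):
    """For each hash function, keep the minimum hash over all tokens."""
    tokens = set(tokens)
    return [min(seeded_hash(seed, t) for t in tokens) for seed in range(num_hashes)]

def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions where the two minima agree."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

a = "the quick brown fox jumps".split()
b = "the quick brown dog sleeps".split()
print(jaccard(a, b))  # 3 shared tokens out of 7 distinct ones
print(estimate_jaccard(minhash_signature(a), minhash_signature(b)))
```

The signatures have a fixed length no matter how long the texts are, which is what makes comparing many pairs cheaper than intersecting the full sets.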
Another way to cluster similar texts (or any other data) together is to use Locality Sensitive Hashing, which is an approximation of what KNN does; it is usually less accurate, but definitely faster to compute. The basic idea is to project the data into a low-dimensional binary space (that is, each data point is mapped to an N-bit vector, the hash key). Each hash function h must satisfy the locality-sensitive hashing property prob[h(x)=h(y)]=sim(x,y), where sim(x,y) in [0,1] is the similarity function of interest. For dot products it can be visualized as follows: a few random hyperplanes split the space, and each hyperplane contributes one bit depending on which side of it a point lies.
We can now ask what the hash of the indicated point would be (in this case it's 101); everything that is close to this point is likely to have the same hash.
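A small sketch of the random-hyperplane variant for dot products, assuming NumPy. The 2-D point, the choice of 3 bits, and the function names are made up for illustration. For this family, the chance that two vectors agree on a single bit is 1 - θ/π, where θ is the angle between them.

```python
import numpy as np

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes through the origin (one normal per row)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_bits, dim))

def lsh_key(x, hyperplanes):
    """One bit per hyperplane: 1 if x lies on its positive side, else 0."""
    bits = (hyperplanes @ np.asarray(x, dtype=float)) >= 0
    return "".join("1" if bit else "0" for bit in bits)

planes = make_hyperplanes(dim=2, n_bits=3)
point = np.array([1.0, 0.5])
print(lsh_key(point, planes))
print(lsh_key(1.01 * point, planes))  # same direction, so the same key
```

Points that share a key land in the same bucket, so a neighbor search only has to look inside that bucket instead of scanning the whole dataset.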
EDIT to answer the comment
No, you asked about text similarity, so that is what I answered. You are basically asking how to predict the position of a character in text 2. It depends on whether you analyze the writer's style or just pure syntax. In either case, IMHO you need some sort of statistics that will tell you where this character is likely to occur given all the other data/text. You can go with n-grams, RNNs, LSTMs, Markov chains or any other form of sequential data analysis.

What is the fastest way to compute the F-score for a million annotations?

Imagine you want to predict certain "events" (coded as: 0,1,2,3,...,N) within a finite number of sentences (coded as: 0,1,2,...,S) of a series of papers (coded as 0,1,...,P).
Your machine learning algorithm returns the following file:
paper,position,event
0,0,22
0,12,38
0,15,18
0,23,3
1,1064,25
1,1232,36
...
and you want to compute the F-score based on a similar ground truth data file:
paper,true_position,true_event
0,0,22
0,12,38
0,15,18
0,23,3
1,1064,25
1,1232,36
...
Since you have many papers and millions of those files, what is the fastest way to compute the F-score for each paper?
PS Notice that nothing guarantees that the two files will have the same number of positions: the ML algorithm might mistakenly identify positions that are not in the ground truth.
As long as the entries in the two files are aligned so that you can compare them directly line by line, I don't see why it would be slow to process millions of rows in O(n) time, even on your laptop.
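If the files are not line-aligned (as the PS warns), one option that still runs in a single pass is to compare sets of (position, event) pairs per paper. The sketch below assumes that a prediction counts as correct only when both position and event match exactly, and it uses the F1 variant of the F-score; both are assumptions about the intended metric, and the sample data is invented.

```python
import csv
import io
from collections import defaultdict

def read_annotations(handle):
    """Group (position, event) pairs by paper from a CSV with a header row."""
    by_paper = defaultdict(set)
    reader = csv.reader(handle)
    next(reader)  # skip the header line
    for row in reader:
        if len(row) != 3:
            continue  # ignore blank or malformed lines
        paper, position, event = (int(value) for value in row)
        by_paper[paper].add((position, event))
    return by_paper

def f1_per_paper(predicted, truth):
    """F1 = 2*TP / (|predicted| + |truth|), computed separately per paper."""
    scores = {}
    for paper in predicted.keys() | truth.keys():
        pred = predicted.get(paper, set())
        true = truth.get(paper, set())
        total = len(pred) + len(true)
        scores[paper] = 2 * len(pred & true) / total if total else 1.0
    return scores

pred_csv = "paper,position,event\n0,0,22\n0,12,38\n0,15,18\n1,1064,25\n1,1300,36\n"
true_csv = ("paper,true_position,true_event\n"
            "0,0,22\n0,12,38\n0,15,18\n0,23,3\n1,1064,25\n1,1232,36\n")
scores = f1_per_paper(read_annotations(io.StringIO(pred_csv)),
                      read_annotations(io.StringIO(true_csv)))
print(scores)
```

Building the sets and intersecting them is linear in the number of rows on average, so the cost is dominated by reading the files.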

I'm looking for an algorithm or function that can take a text string and convert it to a number

I'm looking for an algorithm, function or technique that can take a string and convert it to a number. I would like the algorithm or function to have the following properties:
Identical string yields the same calculated value
Similar strings would yield similar values (similar can be defined as similar in meaning or similar in composition)
Capable of handling strings of variable length
I read an article several years ago that gives me hope that this can be achieved. Unfortunately, I have been unable to recall the source of the article.
Similar in composition is pretty easy, I'll let somebody else tackle that.
Similar in meaning is a lot harder, but fun :). I remember reading an article about how a neural network was trained to construct a 2D "semantic meaning graph" of a whole bunch of English words, where the distance between two words represented how "similar" they are in meaning, just by training it on Wikipedia articles.
You could do the same thing, but make it one-dimensional; that will give you a single continuous number, where similar words will be close to each other.
Non-serious answer: Map everything to 0
Property 1: check. Property 2: check. Property 3: check.
But I figure you want dissimilar strings to get different values, too. The question then is, what is similar and what is not.
Essentially, you are looking for a hash function.
There are a lot of hash functions designed with different objectives. Cryptographic hashes, for example, are pretty expensive to compute, because you want to make it really hard to go backwards or even predict how a change to the input affects the output. So they try really hard to violate your condition 2. There are also simpler hash functions that mostly try to spread the data. They mostly try to ensure that close input values are not close to each other afterwards (but it is okay if the result is predictable).
You may want to read up on Wikipedia:
https://en.wikipedia.org/wiki/Hash_function#Finding_similar_substrings
(Yes, it has a section on "Finding similar substrings" via Hashing)
Wikipedia also has a list of hash functions:
https://en.wikipedia.org/wiki/List_of_hash_functions
There are a couple of related techniques there. For example, minhash could be used. Here is a minhash-inspired approach: define a few random orderings of all letters in your alphabet. Say the alphabet consists only of the letters "abcde", and I use just two orderings for this example. Then my lists are:
p1 = "abcde"
p2 = "edcba"
Let f1(str) be the index in p1 of the first letter of p1 that occurs in my test word, and f2(str) the same for p2. So the word "bababa" would map to 0,3. The word "ababab" also. The word "dada" would map to 0,1, while "ce" maps to 2,0. Note that this map is invariant to reordering the letters within a word (because it treats words as sets of letters), and for long texts it will converge to "0,0". Yet with some fine tuning it can give you a pretty fast way of finding candidates for closer inspection.
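The mapping above can be written down directly. This is a small Python sketch of that example; the function name `letter_signature` is hypothetical, and it assumes the word is non-empty and every letter appears in each ordering.

```python
def letter_signature(word, orderings=("abcde", "edcba")):
    """For each ordering, return the smallest index of any letter in the word."""
    letters = set(word)  # the order and multiplicity of letters are ignored
    return tuple(min(order.index(ch) for ch in letters) for order in orderings)

for word in ("bababa", "ababab", "dada", "ce"):
    print(word, letter_signature(word))
```

With more orderings the signature gets longer, and two words that share many letters agree on more of its positions, which is the minhash idea applied to letters.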
Fuzzy hashing (context triggered piecewise hashing) may be what you are looking for.
Implementation: ssdeep
Explanation of the algorithm: Identifying almost identical files using context triggered piecewise hashing
I think you're probably after a hash function, as numerous posters have said. However, similar in meaning is also possible, after a fashion: use something like Latent Dirichlet Allocation or Latent Semantic Analysis to map your word into multidimensional space, relative to a model trained on a large collection of text (these pre-trained models can be downloaded if you don't have access to a representative sample of the kind of text you're interested in). If you need a scalar value rather than multi-dimensional vector (it's hard to tell, you don't say what you want it for) you could try a number of things like the probability of the most probable topic, the mean across the dimensions, the index of the most probable topic, etc. etc.
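def string_to_number(text):
    # Sum the unsigned values (0-255) of the string's bytes.
    # UTF-8 is an assumption; the original getBytes(str) names no encoding.
    return sum(text.encode("utf-8"))
This would meet all 3 properties (for #2, it works on the string's binary composition).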

OCR: Choose the best string based on last N results (an adaptive filter for OCR)

I've seen some questions on deciding the best OCR result given output from different engines, and the answer is typically "choose the best engine".
I want, however, to capture several frames of text images, with possible temporary occlusions or temporary failures.
I'm using tesseract-ocr with python-tesseract.
Considering the OCR outputs of the last N frames, I want to decide what is the best result (line by line, for simplicity).
For example, for N=3, we could use a kind of median filtering (a per-character majority vote):
ABXD
XBCX
AXCD
When there are 2 out of 3 equal characters, the majority will win, so the result would be ABCD.
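For equal-length readings, this per-character vote is only a few lines. A sketch, with a hypothetical function name; on a tie, `Counter.most_common` keeps the character that was seen first.

```python
from collections import Counter

def majority_vote(readings):
    """Pick the most common character in every column of equal-length strings."""
    if len(set(map(len, readings))) != 1:
        raise ValueError("all readings must have the same length")
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*readings))

print(majority_vote(["ABXD", "XBCX", "AXCD"]))  # ABCD
```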
However, that's not so easy with strings of different lengths. If I expect a given length M (if scanning a price table, the rows are typically XX.XX), I can always penalize strings longer than M.
If we were talking numbers, a median filtering would work quite well (simple background subtraction in computer vision), or some least mean squares adaptive filtering.
There's also the problem of similar characters: l and 1 can be very similar, depending on the font.
I was also thinking of using string distances between the strings. For example, choose the string with the smallest sum of distances to the others.
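The smallest-sum-of-distances idea could look like the sketch below. It uses Levenshtein (edit) distance as the string distance, which is an assumption, since no particular distance is named; the function names and sample readings are invented.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]

def medoid_string(readings):
    """Return the reading with the smallest total distance to all readings."""
    return min(readings,
               key=lambda s: sum(levenshtein(s, other) for other in readings))

print(medoid_string(["12.50", "I2.50", "12.50", "12.5"]))  # 12.50
```

Unlike the per-character vote, this works with readings of different lengths, but it can only return one of the observed strings, never a combination of them.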
Has anyone addressed this kind of problem before? Is there any known algorithm for this kind of problem that I should know?
This problem is called multiple sequence alignment and you can read about it here
