How can I search for related words in elasticsearch? - elasticsearch

For example, if I search for "food," I want to return results that include any entry that has keyword "Restaurant" or "Chef" or something like that.

I think you need the "synonyms" feature of Elasticsearch: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/using-synonyms.html
You define a list of words that you want treated as synonyms (in your example: food, restaurant, chef). Then, at index time, Elasticsearch indexes not only "restaurant", for example, but also "food" and "chef". The link above has more details.
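As a rough sketch of what that could look like, the snippet below builds index settings with a synonym token filter. The names related_words and related_words_analyzer are made up for illustration, and the body would still need to be sent to Elasticsearch when the index is created and the analyzer assigned to a field in the mapping.

```python
import json


def build_synonym_index_settings(synonym_groups):
    """Build index settings containing a synonym token filter.

    The filter and analyzer names are illustrative, not required by Elasticsearch.
    """
    # Each group becomes one comma-separated rule, e.g. "food, restaurant, chef".
    rules = [", ".join(group) for group in synonym_groups]
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "related_words": {"type": "synonym", "synonyms": rules}
                },
                "analyzer": {
                    "related_words_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        # Lowercase first so the rules can be written in lowercase.
                        "filter": ["lowercase", "related_words"],
                    }
                },
            }
        }
    }


synonym_body = build_synonym_index_settings([["food", "restaurant", "chef"]])
print(json.dumps(synonym_body, indent=2))
```

Because the rule lists the words as equivalent, a search for any one of them can match documents containing the others.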

Related

Elasticsearch - match by all terms but full field must be matched

I'm trying to improve search on my service but got stuck on complex queries.
I need to match documents by terms, but return only documents that contain all of the provided terms, in any order, and contain only those terms.
So for example, let's take movie titles:
"Jurassic Park"
"Lost World: Jurassic Park"
"Jurassic Park III"
When I type "Park Jurassic" I want only the first document to be returned, because it contains both words and nothing more.
This is a silly example of a complex problem, but I've simplified it.
I tried terms queries, match, etc., but I don't know how to check whether the entire field was matched.
So in short, the query must match all tokens of the field, in any order.
The field is mapped as text and also as keyword.
Have you tried the terms_set query? From the documentation:
Returns documents that contain a minimum number of exact terms in a
provided field.
The terms_set query is the same as the terms query, except you can
define the number of matching terms required to return a document.
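A minimal sketch of how this might be applied to the movie-title example. It assumes a text field named title and a hypothetical token_count sub-field named title.token_count that stores how many tokens each title has; neither name comes from the question. The terms_set clause requires every query term to be present, and the filter rejects titles with extra tokens.

```python
def build_exact_token_set_query(text, field="title", count_field="title.token_count"):
    """Build a query matching documents whose field holds exactly these tokens.

    Field names are assumptions for illustration.
    """
    # Naive stand-in for the analyzer: lowercase and split on whitespace.
    terms = sorted(set(text.lower().split()))
    return {
        "query": {
            "bool": {
                "must": {
                    "terms_set": {
                        field: {
                            "terms": terms,
                            # params.num_terms is the number of terms supplied above,
                            # so all of them must match.
                            "minimum_should_match_script": {
                                "source": "params.num_terms"
                            },
                        }
                    }
                },
                # Reject documents that contain additional tokens.
                "filter": {"term": {count_field: len(terms)}},
            }
        }
    }


exact_query = build_exact_token_set_query("Park Jurassic")
```

With this body, "Jurassic Park III" would be excluded by the token count filter, assuming the sub-field is populated with the same analyzer as the title field.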

How do I match a partial result with Elastic Search?

I'm trying to find out how to properly write my query in order to do a LIKE query with ElasticSearch.
Let's say I have a firstname record and I want to find every one that has ma in it.
I've tried multiple things, but none of them work. Here is a list:
{"match": {"text": ".*ma.*"}}
{"match": {"text": "*ma*"}}
{"match":{"text":{"query":"ma","fuzziness":"AUTO","prefix_length":1}}}
Do you have any idea how to do that, or what I'm missing?
You might look into using the N-Gram tokenizer to split your documents' tokens up into their substrings.
This will allow you to search against the index with the "partial" matches you're describing.
Bear in mind that this will affect how your documents are tokenized for search so, if you are using other types of analysis for other parts of your application, you may want to create additional fields for your N-Gram tokenized values (or even create a separate index for them).
As a rule of thumb, always try to optimize your index for the queries you want to perform, rather than trying to solve your search problems at query time.
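To make the N-Gram idea concrete, here is a small sketch. The ngrams function only emulates, in plain Python, the substrings an n-gram tokenizer would emit for a single token; the index body below it shows one possible configuration using a separate sub-field. The names name_ngrams, name_ngram_analyzer and firstname.ngram are assumptions, and the typeless mappings layout assumes a recent Elasticsearch version.

```python
def ngrams(token, min_gram=2, max_gram=3):
    """Emulate the substrings an n-gram tokenizer emits for one token."""
    return [
        token[i:i + n]
        for i in range(len(token))
        for n in range(min_gram, max_gram + 1)
        if i + n <= len(token)
    ]


# One possible index body: the n-gram analysis lives in a sub-field so the
# main field keeps its normal analysis.
NGRAM_INDEX_BODY = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "name_ngrams": {
                    "type": "ngram",
                    "min_gram": 2,
                    "max_gram": 3,
                    "token_chars": ["letter", "digit"],
                }
            },
            "analyzer": {
                "name_ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "name_ngrams",
                    "filter": ["lowercase"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "firstname": {
                "type": "text",
                "fields": {
                    "ngram": {
                        "type": "text",
                        "analyzer": "name_ngram_analyzer",
                        # Do not split the search input into n-grams again.
                        "search_analyzer": "standard",
                    }
                },
            }
        }
    },
}

# A plain match query against the sub-field then behaves like a LIKE on "ma".
PARTIAL_QUERY = {"query": {"match": {"firstname.ngram": "ma"}}}

print(ngrams("mark"))
```

Note that with these settings a search string longer than max_gram would not match any indexed n-gram, so the gram sizes need to fit the queries you expect.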

Elasticsearch - use a "tags" index to discover all tags in a given string

I have an elasticsearch v2.x cluster with a "tags" index that contains about 5000 tags: {tagName, tagID}. Given a string, is it possible to query the tags index to get all tags that are found in that string? I want not only exact matches but also fuzzy matches, without being too generous. By "not too generous" I mean that a tag should match only if all tokens in the tag are found within a certain proximity of each other (say 5 words).
For example, given the string:
Model 22340 Sound Spectrum Analyzer
The following tags should match:
sound analyzer
sound spectrum analyzer
BUT NOT
sound meter
light spectrum
chemical analyzer
I don't think it's possible to create an accurate elasticsearch query that will auto-tag a random string. That's basically a reverse query. The most accurate way to match a tag to a document is to construct a query for the tag, and then search the document. Obviously this would be terribly inefficient if you need to iterate over each tag to auto-tag a document.
To do a reverse query, you want to use the Elasticsearch Percolator API:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-percolate.html
The API is very flexible and allows you to create fairly complex queries into documents with multiple fields.
The basic concept is this (assuming your tags have an app specific ID field):
For each tag, create a query for it, and register the query with the percolator (using the tag's ID field).
To auto-tag a string, pass your string (as a document) to the Percolator, which will match it against all registered queries.
Iterate over the matches. Each match includes the _id of the query. Use the _id to reference the tag.
This is also a good article to read: https://www.elastic.co/blog/percolator-redesign-blog-post
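The percolator steps in this answer could be sketched roughly as follows. This assumes the redesigned percolator (a field of type percolator plus the percolate query, as described in the linked blog post) rather than the v2.x API, and it picks a match_phrase query with slop as one possible way to express "all tokens within about 5 words". The field names query and text are assumptions.

```python
# Mapping for the index that stores one registered query per tag.
PERCOLATOR_MAPPING = {
    "mappings": {
        "properties": {
            "query": {"type": "percolator"},
            "text": {"type": "text"},
        }
    }
}


def tag_to_percolator_doc(tag_name, slop=5):
    """Step 1: build the query document to register for one tag.

    It would be indexed with the tag's ID as the document _id.
    """
    return {"query": {"match_phrase": {"text": {"query": tag_name, "slop": slop}}}}


def build_percolate_request(text):
    """Step 2: build the search body that percolates a string as a document."""
    return {"query": {"percolate": {"field": "query", "document": {"text": text}}}}


def matched_tag_ids(search_response):
    """Step 3: collect the _id of every registered query that matched."""
    return [hit["_id"] for hit in search_response["hits"]["hits"]]


registered = tag_to_percolator_doc("sound analyzer")
request_body = build_percolate_request("Model 22340 Sound Spectrum Analyzer")
```

A phrase query with slop is not fuzzy, so this sketch covers the proximity requirement only.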
"query": {
"match": {
"tagName": {
"query": "Model 22340 Sound Spectrum Analyzer",
"fuzziness": "AUTO",
"operator": "or"
}
}
}
If you want a stricter match, so that "sound meter" will not match, you will have to add another field to each tag containing the number of terms in the tag name, add a script to count the terms in the query, and compare the two counts in the match query; see: Finding Multiple Exact Values.
Regarding the proximity issue: since you require fuzziness, you cannot control the proximity, because the match_phrase query does not support fuzziness, as stated in the Elastic docs (Fuzzy match query):
Fuzziness works only with the basic match and multi_match queries. It doesn’t work with phrase matching, common terms, or cross_fields matches.
So you need to decide: fuzziness vs. proximity.
Of course you can. You can get what you want using just a match query with the standard analyzer.
curl -XGET "http://localhost:9200/tags/_search?pretty" -d '{
"query": {
"match" : {
"tagName" : "Model 22340 Sound Spectrum Analyzer"
}
}
}'

Weighted keywords on elasticsearch document

I want to create an index in Elasticsearch that has a field containing a list of weighted keywords, so that when I search for a term in these keywords, documents that have that keyword with a higher weight get better scores.
For instance:
Doc1
"id" : "111"
"keywords" : "house"(20), "dog"(2)
Doc2
"id" : "222"
"keywords" : "house"(3), "dog"(40)
When searching for "dog", I want doc2 to get the higher score.
How would you build the mapping and the query?
Note that this is different from searching with a regular boost, because the boost for each term differs per document.
What about Elasticsearch payloads? See DrTech's answer with the delimited payload token filter to a separate unrelated question which might help you out. But, what you are describing seems to very much lend itself to the use of payloads and using script scoring to access these payloads and influence the scoring. Take note of the performance cost he mentions.
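As a partial sketch of the payload idea, the snippet below covers only the indexing side: it encodes each document's keywords as term|weight pairs and defines an analyzer with a delimited payload token filter to split them. The names keyword_weights and weighted_keywords are invented here, the filter type is called delimited_payload_filter in older Elasticsearch versions, and the scoring script that reads the payloads is left out because it depends heavily on the version in use.

```python
def encode_weighted_keywords(weights, delimiter="|"):
    """Turn {"house": 20, "dog": 2} into the string "house|20 dog|2"."""
    return " ".join(f"{term}{delimiter}{weight}" for term, weight in weights.items())


# Analysis settings: the whitespace tokenizer keeps "dog|40" together, and the
# payload filter then stores 40 as a float payload on the token "dog".
PAYLOAD_ANALYSIS = {
    "settings": {
        "analysis": {
            "filter": {
                "keyword_weights": {
                    "type": "delimited_payload",
                    "delimiter": "|",
                    "encoding": "float",
                }
            },
            "analyzer": {
                "weighted_keywords": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["keyword_weights"],
                }
            },
        }
    }
}

doc1 = {"id": "111", "keywords": encode_weighted_keywords({"house": 20, "dog": 2})}
doc2 = {"id": "222", "keywords": encode_weighted_keywords({"house": 3, "dog": 40})}
```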

ElasticSearch highlighting the matched part in query

I'm sending a match query to ElasticSearch and I'm getting back documents whose matching fields have been highlighted. What I'm trying to do is to map a set of documents to the matching substring in query.
For example, assuming I query with "quick brown". I want to map the document "quick silver" to "quick", "brown fox" to "brown" and "mr brown" to "brown".
This is trivial if document fields exactly contain the word in query. But things get messy when I use fuzziness, synonyms, asciifolding etc. In that case, the highlighted parts of search results might not even appear in my search query.
Is it possible to achieve this without replicating the analyzer logic in my application?
Use the simple query string query instead of the match query when you try to find mapped documents, and set the operator to or. So quick silver as a query will match docs with quick or silver.
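For reference, a request body along those lines might look like the sketch below. The field name title is an assumption, and the highlight section is included only because the question relies on highlighted results.

```python
def build_simple_query_string_search(text, field="title"):
    """Build a simple_query_string search with OR semantics and highlighting."""
    return {
        "query": {
            "simple_query_string": {
                "query": text,
                "fields": [field],
                # Any single term is enough for a document to match.
                "default_operator": "or",
            }
        },
        "highlight": {"fields": {field: {}}},
    }


sqs_body = build_simple_query_string_search("quick brown")
```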
