Elasticsearch: search data like SQL WHERE ... LIKE 'TEXT%'? - elasticsearch

I want to find sentences or words that start with the characters I'm searching for. How can I do that?
For example:
Given a data list like this:
automatic car
car
carpet
car accessories
car battery
cast
game cards
race car
When I search for the word "car", I want to find the following data:
car
car accessories
car battery
carpet
When I search for "ca", I want to find the following data:
cast
car
car accessories
car battery
carpet
That is, I don't want it to search the whole sentence; I only want matches that start with the search characters.
To give an example in SQL, I would like the equivalent of WHERE field LIKE 'car%'.

You can achieve that using the Wildcard query:
GET /_search
{
  "query": {
    "wildcard": {
      "field_name": {
        "value": "ca*"
      }
    }
  }
}
Additionally, if you want to implement an autocomplete-like feature, read the Suggesters documentation.

Elasticsearch has a feature called the prefix query, which returns documents that contain a specific prefix in a provided field. A wildcard query should also work for you.
GET /_search
{
  "query": {
    "prefix": { "your_index_field": "car" }
  }
}
SEE MORE: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-prefix-query.html#prefix-query-short-ex
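One caveat applies to both answers: prefix and wildcard queries match against individual indexed terms. On an analyzed text field, "race car" is indexed as the terms "race" and "car", so the prefix "car" would match it as well. To get the LIKE 'car%' behaviour from the question, the query most likely has to run against an un-analyzed (keyword) version of the field. The sketch below is plain Python, not Elasticsearch; it only imitates the two behaviours on the sample data, and the lowercase-and-split tokenizer is an assumed stand-in for the real analyzer.

```python
# Sample values from the question.
DOCS = [
    "automatic car", "car", "carpet", "car accessories",
    "car battery", "cast", "game cards", "race car",
]

def prefix_on_keyword(docs, prefix):
    """Imitate a prefix query on a keyword field: the whole value is one term."""
    return sorted(d for d in docs if d.startswith(prefix))

def prefix_on_text(docs, prefix):
    """Imitate a prefix query on an analyzed text field: any token may match.
    Assumption: lowercase + whitespace split stands in for the analyzer."""
    return sorted(
        d for d in docs
        if any(tok.startswith(prefix) for tok in d.lower().split())
    )

print(prefix_on_keyword(DOCS, "car"))  # only values that begin with "car"
print(prefix_on_text(DOCS, "car"))     # also values with a later word starting with "car"
```

If the field is mapped as text with a keyword sub-field, the query would target that sub-field; the exact name depends on the mapping, which the question does not show.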

Related

elasticsearch: or operator, number of matches

Is it possible to score my searches according to the number of matches when using operator "or"?
Currently the query looks like this:
"query": {
  "function_score": {
    "query": {
      "match": {
        "tags.eng": {
          "query": "apples banana juice",
          "operator": "or",
          "fuzziness": "AUTO"
        }
      }
    },
    "script_score": {
      "script": # TODO
    },
    "boost_mode": "replace"
  }
}
I don't want to use the "and" operator, since I want documents containing "apple juice" to be found, as well as documents containing only "juice", etc. However, a document containing all three words should score higher than documents containing two words or a single word, and so on.
I found a possible solution here https://github.com/elastic/elasticsearch/issues/13806
which uses bool queries. However I don't know how to access the tokens (in this example: apples, banana, juice) generated by the analyzer.
Any help?
Based on the discussions above I came up with the following solution, which is a bit different than I imagined when I asked the question, but works for my case.
First of all I defined a new similarity:
"settings": {
  "similarity": {
    "boost_similarity": {
      "type": "scripted",
      "script": {
        "source": "return 1;"
      }
    }
  }
  ...
}
Then I had the following problem:
a query for "apple banana juice" had the same score for a doc with tags ["apple juice", "apple"] and another doc with tags ["banana", "apple juice"], although I would like to score the second one higher.
From another discussion I found out that this issue was caused by my field being nested, so I created a regular text field to address it.
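The effect of the constant similarity on a plain text field can be pictured as counting how many distinct query terms occur in the document. The following Python sketch is only a model of that idea, not of Elasticsearch's actual scoring: it ignores fuzziness and stemming (so it uses "apple" rather than "apples"), and the lowercase-and-split tokenizer is an assumption.

```python
def tokens(text):
    """Assumed stand-in for the analyzer: lowercase and split on whitespace."""
    return text.lower().split()

def matched_term_count(query, tags):
    """Number of distinct query terms found anywhere in the tags, which is
    what an "or" match would yield if every matching term scored exactly 1."""
    doc_terms = {tok for tag in tags for tok in tokens(tag)}
    return sum(1 for term in set(tokens(query)) if term in doc_terms)

query = "apple banana juice"
print(matched_term_count(query, ["apple juice", "apple"]))    # two query terms present
print(matched_term_count(query, ["banana", "apple juice"]))   # all three present
```

On a flat field the second document comes out ahead, which is the ordering the author was after.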
But I also wanted to distinguish between a doc with tags ["apple", "banana", "juice"] and another doc with the tag ["apple banana juice"] (all three words in the same tag). The final solution was therefore to keep both fields (a nested field and a text field) for my tags.
Finally, the query consists of a bool query with two should clauses: the first should clause runs on the text field and uses the "or" operator; the second runs on the nested field and uses the "and" operator.
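A sketch of what such a query body might look like, written as a Python dict. The field names tags_text (the plain text field) and tags / tags.eng (the nested path and its sub-field) are assumptions, since the actual mapping is not shown in the post.

```python
import json

def build_tag_query(text):
    """Bool query with two should clauses, following the description above.
    All field names are hypothetical; adjust them to the real mapping."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Clause 1: flat text field, any of the terms may match.
                    {"match": {"tags_text": {"query": text, "operator": "or"}}},
                    # Clause 2: nested field, all terms must occur in one tag.
                    {
                        "nested": {
                            "path": "tags",
                            "query": {
                                "match": {
                                    "tags.eng": {"query": text, "operator": "and"}
                                }
                            },
                        }
                    },
                ]
            }
        }
    }

print(json.dumps(build_tag_query("apple banana juice"), indent=2))
```

Because both clauses are should clauses, a document that satisfies both gets the sum of their scores, so a single tag containing all three words ranks above three separate tags.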
Although I found a solution for this specific issue, I still face a few other problems when using ES to search for tagged documents. The examples in the documentation seem to work very well for full-text search. But does someone know where I can find something more specific to tagged documents?

elasticsearch partial searching with search as you type

I have documents with a field called title containing data like "the lord of the rings", "lord of the rings", "the ring", etc.
I would like to build a search-as-you-type feature.
So if the user types "th", the order of the results should be:
"the lord of the rings",
"the ring",
"lord of the rings"
since I want the strings that start with "th" to appear first, in alphabetical order.
I tried looking into edge n-grams, but that applies to every word in the string.
I would like to match only from the beginning of the string.
Can you please let me know which analyzers I need to use to achieve this?
Thanks
This is the best link I've seen so far:
Search like a Google with Elasticsearch. Autocomplete, Did you mean and search for items
You can try the Match Phrase Prefix query:
{
  "query": {
    "match_phrase_prefix": {
      "text": "the"
    }
  }
}
Hope this helps
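The ordering requested in the question can be stated precisely as a sort key: titles that begin with the typed text come first, and ties are broken alphabetically. The Python sketch below only pins down that expected result on the sample titles. It is not how Elasticsearch ranks documents, and sorting on the client side is an assumption made purely for illustration.

```python
def rank_titles(titles, typed):
    """Keep titles that contain a word starting with `typed`, ordered so that
    titles beginning with `typed` come first, then alphabetically."""
    typed = typed.lower()
    hits = [
        t for t in titles
        if any(word.startswith(typed) for word in t.lower().split())
    ]
    # False sorts before True, so "starts with typed" titles lead.
    return sorted(hits, key=lambda t: (not t.lower().startswith(typed), t.lower()))

titles = ["the lord of the rings", "lord of the rings", "the ring"]
print(rank_titles(titles, "th"))
```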

Is it possible to chain fquery filters in elastic search with exact matches?

I have been having trouble writing a method that will take in various search parameters in elasticsearch. I was working with queries that looked like this:
body:
  {query:
    {filtered:
      {filter:
        {and:
          [
            {term: {some_term: "foo"}},
            {term: {is_visible: true}},
            {term: {"term_two": "something"}}
          ]
        }
      }
    }
  }
Using this syntax I thought I could chain these terms together and programmatically generate these queries. I was using simple strings, and if there was a term like "person_name" I could split the query into two and say "where person_name match 'JOHN'" and "where person_name match 'SMITH'", getting accurate results.
However, I just came across the "fquery" upon asking this question:
Escaping slash in elasticsearch
I was not able to use this "and"/"term" filter searching a value with slashes in it, so I learned that I can use fquery to search for the full value, like this
"fquery": {
  "query": {
    "match": {
      "by_line": "John Smith"
    }
  }
}
But how can I search like this for multiple items? It seems that when I combine fquery and my filtered/filter/and/term queries, my "and" term queries are ignored. What is the best practice for making nested / chained queries using Elasticsearch?
As in the comment below, yes I can just add fquery to the "and" block like so
{:filtered=>
  {:filter=>
    {:and=>[
      {:term=>{:is_visible=>true}},
      {:term=>{:is_private=>false}},
      {:fquery=>
        {:query=>{:match=>{:sub_location=>"New JErsey"}}}}]}}}
Why would elasticsearch also return results with "sub_location" = "new York"? I would like to only return "new jersey" here.
A match query analyzes the input and, by default, becomes a boolean OR query if analysis produces multiple terms. In your case, "New JErsey" is analyzed into the terms "new" and "jersey". The match query you are using searches for documents in which the indexed value of the field "sub_location" contains either "new" or "jersey". That is why your query also matches documents where "sub_location" is "new York": they share the term "new".
To only match for "new jersey", you can use the following version of the match query:
{
  "query": {
    "match": {
      "sub_location": {
        "query": "New JErsey",
        "operator": "and"
      }
    }
  }
}
This will not match documents where the value of the field "sub_location" is "New York". However, it will match documents where the value is, say, "Jersey New", because the query ultimately translates into a boolean query like "new" AND "jersey", which ignores word order. If you are fine with this behaviour, well and good; otherwise read on.
All these issues arise because you are using the default analyzer for the field "sub_location", which breaks the text into tokens at word boundaries and indexes them. If you do not care about partial matches and always want to match the entire string, you can use a custom analyzer built from the Keyword Tokenizer and the Lowercase Token Filter. Mind you, this approach requires re-indexing all your documents.
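The three behaviours discussed in this answer can be compared side by side with a small model. This is plain Python, not Elasticsearch: the analyze function is an assumed simplification of the default analyzer (lowercase, split on whitespace), and keyword_match stands in for a field analyzed with a keyword tokenizer plus lowercase filter.

```python
def analyze(text):
    """Assumed stand-in for the default analyzer: lowercase, split on whitespace."""
    return text.lower().split()

def match(query, value, operator="or"):
    """Imitate a match query against one analyzed field value."""
    query_terms = analyze(query)
    doc_terms = set(analyze(value))
    hits = [term in doc_terms for term in query_terms]
    return all(hits) if operator == "and" else any(hits)

def keyword_match(query, value):
    """Imitate keyword tokenizer + lowercase filter: the whole string is one term."""
    return query.lower() == value.lower()

for value in ["new jersey", "new York", "Jersey New"]:
    print(
        value,
        match("New JErsey", value),          # default "or"
        match("New JErsey", value, "and"),   # all terms required, any order
        keyword_match("New JErsey", value),  # whole string must be equal
    )
```

Only the whole-string comparison rejects both "new York" and "Jersey New", which is why the answer points to a keyword-style analyzer when exact matches are required.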

ElasticSearch brings up less relevant results when scoring is applied

I have an index in ElasticSearch 0.9 with some documents like this:
{"Id":1, "Title":"Hello World" , "Popularity":1},
{"Id":2, "Title":"Hello World" , "Popularity":3},
{"Id":3, "Title":"Hello" , "Popularity":10}
As you see the first two documents have the same title text but different popularity values. Now I do a Fuzzy search on the Title property with a simple scoring script in place:
Scoring script : "_score * doc['Popularity'].value"
My query is something like this:
{
  "query": {
    "custom_score": {
      "lang": "mvel",
      "script": "_score * doc['Popularity'].value",
      "query": {
        "fuzzy": {
          "Title": {
            "value": "Hello World",
            "fuzziness": 3
          }
        }
      }
    }
  }
}
What happens now is that the third document (Id 3) comes to the top of the search results simply because it has a higher popularity. In other words, the scoring function completely overrides the relevancy of the search results. I expect instead to see the more relevant documents (Id 1 and 2) at the top because they are closer to the search term (shorter distance), and then, between those two, the scoring function should boost the document with the higher popularity value. So the result I expect would look like this:
{"Id":2, "Title":"Hello World" , "Popularity":3}
{"Id":1, "Title":"Hello World" , "Popularity":1}
{"Id":3, "Title":"Hello" , "Popularity":10}
As a real-world example, we have a music store with a search bar at the top. Users may enter a keyword such as "Blue", and there will be tens of music tracks whose title is "Blue" and some others that are close to the search term (e.g. "BlueSky"). Each track has a popularity property as well. However, we want to see all the tracks titled "Blue" at the top even if a track titled "BlueSky" has a higher popularity, simply because users prefer to see exact matches first. Then the tracks whose title is exactly "Blue" must be ranked among themselves by the scoring script.
Can someone please guide me on how to update my query so that the relevant results (regardless of scoring) still get to the top of the result list, and then, among them, the scoring boosts the more popular ones?
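The difference between the current behaviour and the desired one can be made concrete with a small Python model. The relevance values below are invented for illustration only (real _score values depend on the index and the query); the point is that multiplying relevance by popularity lets a large popularity outweigh a weaker match, whereas treating popularity as a tie-breaker does not.

```python
# Documents from the question. The "relevance" numbers are hypothetical.
docs = [
    {"Id": 1, "Title": "Hello World", "Popularity": 1, "relevance": 1.0},
    {"Id": 2, "Title": "Hello World", "Popularity": 3, "relevance": 1.0},
    {"Id": 3, "Title": "Hello", "Popularity": 10, "relevance": 0.5},
]

def by_product(docs):
    """Order by relevance * popularity, which is what the scoring script does."""
    ranked = sorted(docs, key=lambda d: d["relevance"] * d["Popularity"], reverse=True)
    return [d["Id"] for d in ranked]

def by_tiers(docs):
    """Order by relevance first; popularity only breaks ties."""
    ranked = sorted(docs, key=lambda d: (d["relevance"], d["Popularity"]), reverse=True)
    return [d["Id"] for d in ranked]

print(by_product(docs))  # document 3 wins on popularity alone
print(by_tiers(docs))    # exact matches first, ordered by popularity
```

The tiered ordering only behaves this way when equally relevant documents receive exactly the same relevance value; how to arrange that in a given Elasticsearch version is outside the scope of this sketch.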

Boosting in Elasticsearch

I am new to Elasticsearch. In Elasticsearch we can use the boost parameter in almost all queries. I understand it is used to modify the score of documents, but I can't see its actual use. My question is: if I use boost values in some queries, will it affect the final score of the search, or the boost rank of docs in the index itself?
And what is the main difference between boosting at index time and boosting at query time?
Thanks in advance!
Query time boost allows you to give more weight to one query than to another. For instance, let's say you are querying the title and body fields for "Quick Brown Fox", you could write it as:
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "Quick Brown Fox"
          }
        },
        {
          "match": {
            "body": "Quick Brown Fox"
          }
        }
      ]
    }
  }
}
But you decide that you want the title field to be more important than the body field, which means you need to boost the query on the title field by (eg) 2:
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": {
              "query": "Quick Brown Fox",
              "boost": 2
            }
          }
        },
        {
          "match": {
            "body": "Quick Brown Fox"
          }
        }
      ]
    }
  }
}
(Note how the structure of the match clause changed to accommodate the boost parameter).
The boost value of 2 doesn't double the _score exactly; the scores go through a normalization process. So you should think of boost as meaning "make this query clause relatively more important than the other query clauses".
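To see what "relatively more important" means for ranking, here is a deliberately simplified Python model of a bool query with should clauses: each matching clause contributes a score, and a boosted clause contributes proportionally more. The per-clause scores are hypothetical, and the model leaves out the normalization mentioned above, so only the change in ordering is meaningful, not the numbers.

```python
def combined_score(clause_scores, boosts):
    """Sum the per-clause scores, each multiplied by its boost (default 1).
    A simplified model; real scores are computed by the similarity."""
    return sum(
        score * boosts.get(field, 1.0)
        for field, score in clause_scores.items()
    )

# Hypothetical per-clause scores for two documents.
doc_a = {"title": 1.0}  # matches only in the title
doc_b = {"body": 1.2}   # matches only in the body

no_boost = {}
title_boost = {"title": 2.0}

# Without a boost the body match ranks higher; with it the title match does.
print(combined_score(doc_a, no_boost) > combined_score(doc_b, no_boost))
print(combined_score(doc_a, title_boost) > combined_score(doc_b, title_boost))
```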
My doubt is if i use boost values in some queries. will it affect final score of search
Yes it does, but you shouldn't rely on the actual value of _score anyway. Its only purpose is to allow Elasticsearch to decide which documents are most relevant to this query. If the query changes, the scores change.
Regarding index-time boosting: don't use it. It's inflexible and error-prone.
Boosting at query time won't modify your index. It only applies a boost factor to fields when searching.
I prefer boosting at query time as it's more flexible. If you need to change your boost rules and you had set them at index time, you will probably need to reindex.
Use cases of boosting: suppose you are building an e-commerce web app and your product data is in Elasticsearch. Whenever a customer uses the search bar, you query Elasticsearch and display the results in the web app.
Elasticsearch computes a relevance score for every matching document and returns the results sorted by that score.
Now assume a user searches for "samsung phones". Should your web app show only Samsung phones? The answer is no.
Your web app should show other phones as well (the user may like those too), but it should show Samsung phones first, since that is what the user is looking for.
So the question is how to query so that Samsung phones come first in the results. The answer is the relevance score.
Say you run a query that matches all mobile phones and gives Samsung phones a higher relevance score;
the results will then list Samsung phones first, followed by the other phones.
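One way such a query could be shaped is a bool query in which a must clause selects all phones and a boosted should clause raises the score of the searched brand. The Python sketch below just builds that request body; the field names category and brand are hypothetical, as is the boost value of 2.

```python
import json

def build_phone_query(brand):
    """All phones match; phones of the searched brand score higher.
    The fields "category" and "brand" are assumptions for illustration."""
    return {
        "query": {
            "bool": {
                # Required: every result must be a phone.
                "must": [{"match": {"category": "phones"}}],
                # Optional: matching the brand adds to the score.
                "should": [{"match": {"brand": {"query": brand, "boost": 2}}}],
            }
        }
    }

print(json.dumps(build_phone_query("samsung"), indent=2))
```

Since the should clause is optional when a must clause is present, phones of other brands still match; they simply receive no contribution from the brand clause and sort lower.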

Resources