Elasticsearch: Must include all words in search if all exist, but ignore one or two if they don't?

I hope what I'm trying to explain makes sense, and there is a way that I could achieve it.
Currently I am searching in 40 million documents, with a query like this:
GET /all/_search
{
"query": {
"match": {
"full_text": {
"query": "insert ten or twelve words here to search",
"operator": "and"
}
}
}
}
Now I want to return only docs whose 'full_text' includes all of the words in the query. I am able to achieve that with the snippet above.
My question is: when there is no match at all, but removing "ten", for example, would yield one result, is there a way to configure my search to do that? I.e. to tell ES "aim for a 100% match, but if nothing is found, 90% would do just fine"!
Hope this is clear :)

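You can use the minimum_should_match parameter with the match query. Documents that contain more of the words generally score higher, so full matches rank first while documents missing a word or two are still returned:
{
"query": {
"match": {
"full_text":{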
"query": "insert ten or twelve words here",
"minimum_should_match":"90%"
}
}
}
}
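To get a feel for what "90%" means for a given query, here is a rough sketch of the arithmetic. For a positive percentage, Elasticsearch rounds the computed number of required terms down. The helper name required_terms is made up for this illustration, and splitting on whitespace is only a stand-in for whatever analyzer the field really uses.

```python
def required_terms(num_terms: int, percent: int) -> int:
    """Terms that must match for a positive minimum_should_match percentage.

    Hypothetical helper for illustration: the result is rounded down,
    as Elasticsearch does for positive percentages.
    """
    return (num_terms * percent) // 100


text = "insert ten or twelve words here to search"
# Whitespace split is an assumption; the real term count depends on the analyzer.
num_terms = len(text.split())

print(num_terms, required_terms(num_terms, 90))
print(required_terms(10, 90), required_terms(12, 90))
```

So with ten words one may be missing, and with twelve words two may be missing.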

Related

How can I achieve this type of queries in ElasticSearch?

I have added a document like this to my index
POST /analyzer3/books
{
"title": "The other day I went with my mom to the pool and had a lot of fun"
}
And then I do queries like this
GET /analyzer3/_analyze
{
"analyzer": "english",
"text": "\"The * day I went with my * to the\""
}
And it successfully returns the previously added document.
My idea is to have quotes so that the query becomes exact, but also wildcards that can replace any word. Google has this exact functionality, where you can search queries like this, for instance "I'm * the university" and it will return page results that contain texts like I'm studying in the university right now, etc.
However I want to know if there's another way to do this.
My main concern is that this doesn't seem to work with other languages like Japanese and Chinese. I've tried with many analyzers and tokenizers to no avail.
Any answer is appreciated.
Exact matches on tokenized fields are not that straightforward. It is better to store your field as keyword if you have such requirements.
Additionally, the keyword data type supports the wildcard query, which can help with your wildcard searches.
So just create a keyword sub-field, then use the wildcard query on it.
Your search query will look something like this:
GET /_search
{
"query": {
"wildcard" : {
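"title.keyword" : "The * day I went with my * to the*"
}
}
}
In the above query, it is assumed that the title field has a sub-field named keyword of data type keyword. Because a keyword field stores the whole value as a single term, the pattern has to cover the entire title, hence the trailing *.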
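A wildcard pattern on a keyword field is compared against the whole stored value, so it matters whether the pattern ends with a *. The sketch below imitates that whole-value behaviour with Python's fnmatch module. This is only an analogy to show the effect, not how Elasticsearch implements the wildcard query.

```python
from fnmatch import fnmatchcase

# The stored keyword value is the complete, untokenized title.
title = "The other day I went with my mom to the pool and had a lot of fun"

pattern_no_tail = "The * day I went with my * to the"
pattern_with_tail = "The * day I went with my * to the*"

# fnmatchcase anchors the pattern to the full string, like a whole-term match.
matches_without_tail = fnmatchcase(title, pattern_no_tail)
matches_with_tail = fnmatchcase(title, pattern_with_tail)

print(matches_without_tail, matches_with_tail)
```

Without the trailing * the pattern would require the title to end right after "to the".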
More on wildcard query can be found here.
If you still want to do exact searches on text data type, then read this
Elasticsearch doesn't have Google-like search out of the box, but you can build something similar.
Let's assume when someone quotes a search text what they want is a match phrase query. Basically remove the \" and search for the remaining string as a phrase.
PUT test/_doc/1
{
"title": "The other day I went with my mom to the pool and had a lot of fun"
}
GET test/_search
{
"query": {
"match_phrase": {
"title": "The other day I went with my mom to the pool and had a lot of fun"
}
}
}
For the * it's getting a little more interesting. You could just make multiple phrase searches out of this and combine them. Example:
GET test/_search
{
"query": {
"bool": {
"must": [
{
"match_phrase": {
"title": "The"
}
},
{
"match_phrase": {
"title": "day I went with my"
}
},
{
"match_phrase": {
"title": "to the"
}
}
]
}
}
}
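If the search text comes from a user, the split into phrases can be done in application code. A minimal sketch, assuming the input is the quoted string with * as the placeholder; the function name wildcard_phrase_query is made up for this example.

```python
def wildcard_phrase_query(field: str, text: str) -> dict:
    """Turn '"a * b c * d"' into a bool query with one match_phrase per fragment."""
    # Drop the surrounding quotes, split on the placeholder, skip empty fragments.
    fragments = [part.strip() for part in text.strip('"').split("*")]
    clauses = [{"match_phrase": {field: frag}} for frag in fragments if frag]
    return {"query": {"bool": {"must": clauses}}}


query = wildcard_phrase_query("title", '"The * day I went with my * to the"')
phrases = [clause["match_phrase"]["title"] for clause in query["query"]["bool"]["must"]]
print(phrases)
```

Note that a bool query does not enforce the order of the fragments, so a document containing them in a different order would match as well.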
Or you could use slop in the phrase search. All the terms in your search query have to be there (unless they are removed by the tokenizer or as stop words), but the matched phrase can contain additional words. Here each * is replaced by one other word, so a slop of 2 in total. If you want more than one word in place of each *, you will need to pick a higher slop:
GET test/_search
{
"query": {
"match_phrase": {
"title": {
"query": "The * day I went with my * to the",
"slop": 2
}
}
}
}
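The slop can also be derived from the number of placeholders, following the reasoning above (one extra word per * gives a slop equal to the number of *). A small sketch; sloppy_phrase_query and words_per_wildcard are names invented for this illustration.

```python
def sloppy_phrase_query(field: str, text: str, words_per_wildcard: int = 1) -> dict:
    """Build a match_phrase query whose slop covers the words hidden behind each '*'."""
    text = text.strip('"')
    # One unit of slop for every word a placeholder may stand for.
    slop = text.count("*") * words_per_wildcard
    return {"query": {"match_phrase": {field: {"query": text, "slop": slop}}}}


query = sloppy_phrase_query("title", '"The * day I went with my * to the"')
slop = query["query"]["match_phrase"]["title"]["slop"]
print(slop)
```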
Another alternative might be shingles, but this is a more advanced concept and I would start off with the basics for now.

how to allow exact word match to have higher score

In Elasticsearch, I have defined two synonyms for "swim": "play" and "walk".
Say I have only two documents in Elasticsearch, with the values "I like to swim" and "I like to play".
If the user enters the query "I hope to play", I want it to match "I like to play" with a higher score than "I like to swim" (exact word matches, in this case "play", should score higher). Is there a way to achieve that?
Yes you can!
You can achieve any boost you want with the bool query.
The general idea is that you define must clauses that need to match, and should clauses that may match and, when they do, add a boost to the document score.
So for your need, the document needs to match with synonyms, and should match without them.
You just need to index your searchable fields with two properties, one with synonyms and another without synonyms.
An example could be:
{
"query": {
"bool": {
"must": [
{
"match": {
"text.withSynonyms": "I hope to play"
}
}
],
"should": [
{
"match": {
"text.withoutSynonyms": "I hope to play"
}
}
]
}
}
}

Search in two fields on elasticsearch with kibana

Assuming I have an index with two fields, title and loc, I would like to search in these two fields and get the "best" match. So if I have three items:
{"title": "castle", "loc": "something"},
{"title": "something castle something", "loc": "something,pontivy,something"},
{"title": "something else", "loc": "something"}
... I would like to get the second one, which has "castle" in its title and "pontivy" in its loc. I simplified the example; the real data is a bit more complicated. I tried this query, but it does not seem accurate (it's a feeling, not really easy to explain):
GET merimee/_search/?
{
"query": {
"multi_match" : {
"query": "castle pontivy",
"fields": [ "title", "loc" ]
}
}
}
Is this the right way to search in several fields and get the document that matches across all of them?
Not sure my question is clear enough, I can edit if required.
EDIT:
The story is: the user types "castle pontivy" and I want to get the "best" result for this query, which is the second one because it contains "castle" in "title" and "pontivy" in "loc". In other words, I want the document that matches best across both fields.
As the other poster suggested, you could use a bool query, but that might not work for your use case, since you have a single search box that you want to query against multiple fields.
I recommend looking at a Simple Query String query as that will likely give you the functionality you're looking for. See: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html
So you could do something similar to this:
{
"query": {
"simple_query_string" : {
"query": "castle pontivy",
"fields": ["title", "loc"],
"default_operator": "and"
}
}
}
So this will try to give you the best documents that match both terms in either of those fields. The default operator is set to AND here because otherwise it is OR, which might not give you the expected results.
It is worthwhile to experiment with the other options available for this query type as well. You might also explore the Query String query, as it gives more flexibility, but the Simple Query String query works very well for most cases.
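To illustrate what "both terms in either of those fields" means for the three sample documents, here is a toy emulation in plain Python. It assumes, as I understand the AND operator here, that each term has to show up in at least one of the listed fields. The regex tokenizer is a crude stand-in for the real analyzer, and no scoring is modelled.

```python
import re

docs = [
    {"title": "castle", "loc": "something"},
    {"title": "something castle something", "loc": "something,pontivy,something"},
    {"title": "something else", "loc": "something"},
]


def tokens(text: str) -> set:
    # Crude stand-in for an analyzer: lowercase, split on non-word characters.
    return set(re.findall(r"\w+", text.lower()))


def matches_all_terms(doc: dict, query: str, fields: list) -> bool:
    # Every query term must be found in at least one of the given fields.
    doc_tokens = set().union(*(tokens(doc[f]) for f in fields))
    return all(term in doc_tokens for term in tokens(query))


hits = [d for d in docs if matches_all_terms(d, "castle pontivy", ["title", "loc"])]
print(hits)
```

Only the second document survives, because it is the only one that contains both "castle" and "pontivy" somewhere in title or loc.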
This can be done by using a bool query and then matching on each field.
GET _search
{
"query":
{
"bool": {"must": [{"match": {"title": "castle"}},{"match": {"loc": "pontivy"}}]
}
}
}

How to search for exact order matching of words in elasticsearch?

I need to match an exact sentence. Here is the query I'm using:
{
"query": {
"match": {
"description": "void names error"
}
}
}
But the above query returns not only the exactly matching documents but also many partial matches. How can I make an exact match of the above sentence?
{
"query": {
"match_phrase": {
"description": "void names error"
}
}
}
More details at
Phrase Query
A phrase query would be the best candidate here.
A phrase query matches only when the tokens appear next to each other, in the same order as in the query.
If you want to match based on the order and also score based on how close the terms are to each other, I would suggest the span_near query.
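The difference between a plain match and a phrase match can be shown with a toy emulation. This is only a sketch of the idea: the sample descriptions are made up, the regex tokenizer stands in for the analyzer, and nothing here reflects how Elasticsearch scores or executes the queries.

```python
import re


def tokens(text: str) -> list:
    # Crude stand-in for an analyzer: lowercase word tokens, order preserved.
    return re.findall(r"\w+", text.lower())


def match_any(doc: str, query: str) -> bool:
    # Like a default match query: at least one query term occurs in the document.
    return bool(set(tokens(doc)) & set(tokens(query)))


def match_phrase(doc: str, query: str) -> bool:
    # Like a phrase query: all terms occur consecutively and in the same order.
    d, q = tokens(doc), tokens(query)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


# Hypothetical sample descriptions.
descriptions = ["void names error in module", "names error void", "unrelated error"]
any_hits = [match_any(d, "void names error") for d in descriptions]
phrase_hits = [match_phrase(d, "void names error") for d in descriptions]
print(any_hits, phrase_hits)
```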

Elasticsearch: filter by any field

I am playing with filters in Elasticsearch (we use an old version, 1.3.1), and I need to filter my search results by any field. With a query, this can be done like this:
"query": {
"query_string": {
"query": "_all:test"
}
}
But filters do not seem to work with the _all field. What can I do? Would a newer Elasticsearch version solve my problem?
Thanks in advance!
PS: I need to search for exact values, so I cannot use queries. There is a difference between queries and filters: if you search for my brown with a query, you can expect results like:
my brown
This is my brown dog.
someone stolen my brown wallet
But a filter will return only my brown, and that is what I need.
You might want to read up a little on the distinction between queries and filters. What you're doing there is a query string query.
If you do actually want to filter against exact text tokens (read up on analysis if you don't know what I mean by "tokens"), AND you have your mapping set up such that the "_all" field behaves as you're expecting then try something like this:
POST /test_index/_search
{
"query": {
"filtered": {
"filter": {
"term": {
"_all": "test"
}
}
}
}
}
If, on the other hand, you want to allow some analysis (so that "Test" is tokenized to "test", for example), you may want this instead:
POST /test_index/_search
{
"query": {
"match": {
"_all": "Test"
}
}
}
Here is some code I used to play around with it:
http://sense.qbox.io/gist/44adf2c2ade8abd6758f0e08ed2e40434850fc1c
