ElasticSearch - prefix with space and filtering

My ElasticSearch server contains documents of the following form:
{
"_index": "xindex",
"_type": "xtype",
"_id": "1100",
"_score": 3.00010,
"_source": {
"_id": "2333345",
"field1": "11111111111111",
"field2": "y",
"name": "hello world",
}
}
I need to get all the documents with name prefix "hello wo" and field2 "y".
I've tried a lot of queries and none have worked. There are all kinds of solutions for the prefix-with-space issue, but when I add the filter/another query on field2, the results get corrupted.
Thanks.

You can achieve this in 3 steps:
Change the mapping of the name field to not_analyzed
Use a match_phrase_prefix query (documentation here)
Filter the results of this query by wrapping it in a filtered query and using a term filter on field2 with the value "y"
You can see it working with the following dataset:
PUT test/prefix/_mapping
{
"properties": {
"name":{
"type": "string",
"index": "not_analyzed"
}
}
}
//should match
PUT test/prefix/2333345
{
"field1": "11111111111111",
"field2": "y",
"name": "hello world"
}
//should match
PUT test/prefix/1112223
{
"field1": "22222222222222",
"field2": "y",
"name": "hello wombat"
}
//should not match (field2 value is different)
PUT test/prefix/4445556
{
"field1": "33333333333333",
"field2": "z",
"name": "hello world"
}
//should not match (second word not starting with wo)
PUT test/prefix/7778889
{
"field1": "33333333333333",
"field2": "y",
"name": "hello zombie"
}
Then, the query is:
GET test/prefix/_search
{
"query": {
"filtered": {
"query": {
"match_phrase_prefix" : {
"name" : "hello wo"
}
},
"filter": {
"term": {
"field2": "y"
}
}
}
}
}
which outputs the documents 1112223 and 2333345 as expected:
{
"took": 20,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1.592944,
"hits": [
{
"_index": "test",
"_type": "prefix",
"_id": "2333345",
"_score": 1.592944,
"_source": {
"field1": "11111111111111",
"field2": "y",
"name": "hello world"
}
},
{
"_index": "test",
"_type": "prefix",
"_id": "1112223",
"_score": 1.592944,
"_source": {
"field1": "22222222222222",
"field2": "y",
"name": "hello wombat"
}
}
]
}
}
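A side note for readers on newer Elasticsearch versions: the filtered query used above was deprecated in 2.0 and removed in 5.0. A minimal sketch of the equivalent bool form, built here as a Python dict (the helper name is an illustration, not from the answer):

```python
# The `filtered` query was removed in Elasticsearch 5.0; the modern equivalent
# puts the scoring query under bool/must and the non-scoring part under
# bool/filter. Field names are taken from the question.
def prefix_with_filter(name_prefix, field2_value):
    """Bool query: match_phrase_prefix on `name`, term filter on `field2`."""
    return {
        "query": {
            "bool": {
                "must": {"match_phrase_prefix": {"name": name_prefix}},
                "filter": {"term": {"field2": field2_value}},
            }
        }
    }

body = prefix_with_filter("hello wo", "y")
```

The body can then be sent to the _search endpoint with any HTTP client.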

Use simple_query_string; this approach solved my issue:
{
"query": {
"bool": {
"should": [
{
"simple_query_string": {
"fields": [
"name"
],
"default_operator": "and",
"query": "(hello world*)"
}
}
]
}
}
}

Related

Elastic match documents containing all values in array

I have documents like the ones below in Elastic. Each document has some information plus all the permissions required to access it.
When I query, I would like to pass all of the user's permissions and receive the matching documents, but I am having difficulties with the query.
The documents:
{
"id": 1,
"permissions": ["a", "d"]
}
{
"id": 2,
"permissions": ["a"]
}
{
"id": 3,
"permissions": ["a", "b"]
}
{
"id": 4,
"permissions": ["a", "c"]
}
This is the closest I got:
{
"query": {
"bool": {
"should": [
{
"match_phrase": {
"tags.keyword": "a"
}
},
{
"match_phrase": {
"tags.keyword": "b"
}
}
],
"minimum_should_match": doc.tags.length
}
}
}
// Result are documents with id's 2 and 3.
I tried extending the "minimum_should_match" with the "script" but without success (apparently it does not support it):
"script" : {
"script" : {
"inline": "doc['permissions'].length",
"lang": "painless"
}
}
In the example above, with the passed permission array ["a", "b", "c"], the output should be the documents with ids 2, 3 and 4. ["a"] matches only the document with id 2.
EDIT: Additional information
A document has up to 5 permissions, as well as the users, but the set of permissions is quite big (500+), so I am searching for a generic query. The data can also be transformed.
I am using Elastic 7.6
Any help is appreciated!
There is no efficient way to achieve your expected result, but apart from the search query below, you can also get your result using scripting; please refer to this SO answer for that.
You can use a terms query, which returns documents that contain one or more exact terms in a provided field.
Below is a working example with index data (the same as given in the question), index mapping, search query, and search result.
Index Mapping:
{
"mappings": {
"properties": {
"id": {
"type": "integer"
},
"permissions": {
"type": "text"
}
}
}
}
Search Query:
{
"query": {
"constant_score": {
"filter": {
"bool": {
"must": [
{
"terms": {
"permissions": [
"a",
"b",
"c"
]
}
}
],
"must_not": [
{
"terms": {
"permissions": [
"d"
]
}
}
]
}
}
}
}
}
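Since the full permission set is large (500+ values), the must_not list can be generated programmatically from the user's grants. A minimal Python sketch, assuming you can enumerate all permission values (the function and variable names are illustrative):

```python
# Build the same constant_score query as above from a user's permission list:
# match at least one granted permission, and exclude any document that carries
# a permission outside the granted set.
def permissions_query(user_permissions, all_permissions):
    """Match docs whose permissions are a subset of the user's permissions."""
    denied = sorted(set(all_permissions) - set(user_permissions))
    return {
        "query": {
            "constant_score": {
                "filter": {
                    "bool": {
                        "must": [{"terms": {"permissions": sorted(user_permissions)}}],
                        "must_not": [{"terms": {"permissions": denied}}],
                    }
                }
            }
        }
    }

# With the universe {a, b, c, d} and grants {a, b, c}, must_not covers only "d",
# matching the hand-written query above.
body = permissions_query(["a", "b", "c"], ["a", "b", "c", "d"])
```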
Search Result:
"hits": [
{
"_index": "stof_64081578",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"id": 2,
"permissions": [
"a"
]
}
},
{
"_index": "stof_64081578",
"_type": "_doc",
"_id": "3",
"_score": 1.0,
"_source": {
"id": 3,
"permissions": [
"a",
"b"
]
}
},
{
"_index": "stof_64081578",
"_type": "_doc",
"_id": "4",
"_score": 1.0,
"_source": {
"id": 4,
"permissions": [
"a",
"c"
]
}
}
]

How to perform elastic search _update_by_query using painless script - for complex condition

Can you suggest how to update documents (with a script, I guess Painless) based on condition fields?
Its purpose is to add and/or remove values from the document.
so if I have those input documents:
doc //1st
{
"Tags":["foo"],
"flag":"true"
}
doc //2nd
{
"flag":"true"
}
doc //3rd
{
"Tags": ["goo"],
"flag":"false"
}
And I want to perform something like this:
Update all documents that have "flag=true" with:
Added tags: "me", "one"
Deleted tags: "goo","foo"
so the expected result should be something like:
doc //1st
{
"Tags":["me","one"],
"flag":"true"
}
doc //2nd
{
"Tags":["me","one"],
"flag":"true"
}
doc //3rd
{
"Tags": ["goo"],
"flag":"false"
}
Create mapping:
PUT documents
{
"mappings": {
"document": {
"properties": {
"tags": {
"type": "keyword",
"index": "not_analyzed"
},
"flag": {
"type": "boolean"
}
}
}
}
}
Insert first doc:
PUT documents/document/1
{
"tags":["foo"],
"flag": true
}
Insert the second doc (note that for empty tags I specified an empty tags array, because if the field doesn't exist at all, the script would need to check whether the field exists):
PUT documents/document/2
{
"tags": [],
"flag": true
}
Add third doc:
PUT documents/document/3
{
"tags": ["goo"],
"flag": false
}
Then run _update_by_query, which takes two arrays as params: one for elements to add and one for elements to remove:
POST documents/_update_by_query
{
"script": {
"inline": "for(int i = 0; i < params.add_tags.size(); i++) { if(!ctx._source.tags.contains(params.add_tags[i].value)) { ctx._source.tags.add(params.add_tags[i].value)}} for(int i = 0; i < params.remove_tags.size(); i++) { if(ctx._source.tags.contains(params.remove_tags[i].value)){ctx._source.tags.removeAll(Collections.singleton(params.remove_tags[i].value))}}",
"params": {
"add_tags": [
{"value": "me"},
{"value": "one"}
],
"remove_tags": [
{"value": "goo"},
{"value": "foo"}
]
}
},
"query": {
"bool": {
"must": [
{"term": {"flag": true}}
]
}
}
}
If you then run the following search:
GET documents/_search
you will get the following result (which I think is what you want):
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1,
"hits": [{
"_index": "documents",
"_type": "document",
"_id": "2",
"_score": 1,
"_source": {
"flag": true,
"tags": [
"me",
"one"
]
}
},
{
"_index": "documents",
"_type": "document",
"_id": "1",
"_score": 1,
"_source": {
"flag": true,
"tags": [
"me",
"one"
]
}
},
{
"_index": "documents",
"_type": "document",
"_id": "3",
"_score": 1,
"_source": {
"tags": [
"goo"
],
"flag": false
}
}
]
}
}
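The inline Painless loop above is compact but hard to read. For reference, here is the same add-then-remove logic written out in plain Python (a sketch of the logic, not code from the answer):

```python
# Mirrors the Painless script: first append any tags from add_tags that are
# not already present, then drop every tag that appears in remove_tags.
def update_tags(tags, add_tags, remove_tags):
    """Return the updated tag list, as the _update_by_query script would."""
    for tag in add_tags:
        if tag not in tags:
            tags.append(tag)
    return [t for t in tags if t not in remove_tags]

# Doc 1 from the example: ["foo"] becomes ["me", "one"]
print(update_tags(["foo"], ["me", "one"], ["goo", "foo"]))  # ['me', 'one']
```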

Elasticsearch is giving unnecessary records on match query

I want to get all the documents whose date of birth contains the substring "-11-09".
This is my Elasticsearch query:
{ "query": { "bool" : { "must": { "match": { "dobdata": ".*-11-09.*"} } } } }
And the result i am getting is
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 5.0782137,
"hits": [
{
"_index": "userindexv1",
"_type": "usertype",
"_id": "58f9a9d1acf8c47037000038",
"_score": 5.0782137,
"_source": {
"fullname": "Eshwar ",
"fullname_raw": "Eshwar ",
"mobile1": "7222222256",
"uid": "UIDS1010",
"mobile2": "",
"classname": "Class 5",
"classname_raw": "Class 5",
"divid": 63,
"category": "S",
"dobdata": "2010-11-09"
}
},
{
"_index": "userindexv1",
"_type": "usertype",
"_id": "57960b35acf8c4c43000002c",
"_score": 1.259227,
"_source": {
"fullname": "Sindhu ",
"fullname_raw": "Sindhu ",
"mobile1": "9467952335",
"uid": "UIDS1006",
"mobile2": "",
"classname": "class 1s Group for class g",
"classname_raw": "class 1s Group for class g",
"divid": 63,
"category": "S",
"dobdata": "2012-11-08"
}
},
{
"_index": "userindexv1",
"_type": "usertype",
"_id": "58eb62d2acf8c4d43300002f",
"_score": 1.1471639,
"_source": {
"fullname": "Himanshu ",
"fullname_raw": "Himanshu ",
"mobile1": "9898785484",
"uid": "",
"mobile2": "",
"classname": "Play Group",
"classname_raw": "Play Group",
"divid": 63,
"category": "S",
"dobdata": "2012-11-08"
}
},
{
"_index": "userindexv1",
"_type": "usertype",
"_id": "580dbe5bacf8c4b82300002a",
"_score": 1.1471639,
"_source": {
"fullname": "Sai Bhargav ",
"fullname_raw": "Sai Bhargav ",
"mobile1": "9739477159",
"uid": "",
"mobile2": "7396226318",
"classname": "class 1s Group for class g",
"classname_raw": "class 1s Group for class g",
"divid": 63,
"category": "S",
"dobdata": "2012-11-07"
}
}
]
}}
I am getting records whose date of birth does not contain the string "-11-09". I tried to work around it, but I am not able to find the solution.
I am new to Elasticsearch, and I want only the first record. Can anyone please help me out? Sorry for my bad English.
I faced the same problem and solved it by doing two things:
1. Changed the format of the date of birth from Y-m-d to YMd and made the field not_analyzed.
2. Used a wildcard query instead of a match query:
{
"query": {
"wildcard": {
"dobdata": {
"value": "*Nov09*"
}
}
}
}
It solved my problem. Hope this will solve your problem too.
To get results with month-date = 11-09, use the following query.
The mapping for dobdata is:
"dobdata": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
Query is:
curl -XGET "http://localhost:9200/userindexv1/_search" -H 'Content-Type: application/json' -d'
{
"query": {
"wildcard": {
"dobdata.keyword":"*-11-09*"
}
}
}'
Also, use multi-field mapping instead of _raw fields. Refer here.
@K Sathish, so you are using Elasticsearch 2.x? A string type is not very comfortable for dates. I suggest changing the datatype from string to date, so you can also run range queries. Once it is changed to the date type, you can write your query in Elasticsearch 2.x this way:
{
"query": {
"filtered": {
"filter": {
"script": {
"script": "doc.dobdata.date.monthOfYear == 11 && doc.dobdata.date.dayOfMonth == 9"
}
}
}
}
}
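For reference, the month/day check that the script filter performs can be sketched in plain Python (assuming the yyyy-MM-dd format shown in the documents):

```python
from datetime import datetime

# Mirrors doc.dobdata.date.monthOfYear == 11 && doc.dobdata.date.dayOfMonth == 9
# from the script filter above, applied to one date string at a time.
def is_born_on(dob_string, month, day):
    """True if a yyyy-MM-dd date string falls on the given month and day."""
    dob = datetime.strptime(dob_string, "%Y-%m-%d")
    return dob.month == month and dob.day == day

print(is_born_on("2010-11-09", 11, 9))  # True  (the Eshwar document)
print(is_born_on("2012-11-08", 11, 9))  # False (the Sindhu document)
```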

Fuzzy search in the elasticsearch gives matches with incorrect order

I am trying to build an engine that matches areas mentioned in an address against the list of areas available in Elasticsearch.
I am searching for areas similar to "iit". My query is:
{
"query": {
"fuzzy": {
"locality": {
"value": "iit",
"fuzziness": 1
}
}
},
"highlight": {
"fields": {
"locality": {}
}
}
}
I am getting the results below:
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 2.1290483,
"hits": [
{
"_index": "geocoding_1",
"_type": "localities",
"_id": "AVuzRiZ04pEQsZFpK6A_",
"_score": 2.1290483,
"_source": {
"locality": [
"emerald isle ii"
]
},
"highlight": {
"locality": [
"emerald isle <em>ii</em>"
]
}
},
{
"_index": "geocoding_1",
"_type": "localities",
"_id": "AVuzRfof4pEQsZFpK59H",
"_score": 1.877402,
"_source": {
"locality": [
"iit - bombay",
"iitb",
"indian institute of technology - bombay"
]
},
"highlight": {
"locality": [
"<em>iit</em> - bombay",
"<em>iitb</em>"
]
}
}
]
}
}
Because "iit" is directly available in the 2nd document and hence I was expecting that to be returned as best match with highest score. What is the change that I should make so that I get 2nd document with highest score.
I am using ES 2.3.4 .
If you are also interested in exact matches scoring better, I always suggest a bool query with should statements, adding a match or term query in there. This way the combined scores will favor the exact match:
{
"query": {
"bool": {
"should": [
{
"fuzzy": {
"locality": {
"value": "iit",
"fuzziness": 1
}
}
},
{
"match": {
"locality": "iit"
}
}
]
}
},
"highlight": {
"fields": {
"locality": {}
}
}
}

Elastic Search- Fetch Distinct Tags

I have document of following format:
{
_id :"1",
tags:["guava","apple","mango", "banana", "gulmohar"]
}
{
_id:"2",
tags: ["orange","guava", "mango shakes", "apple pie", "grammar"]
}
{
_id:"3",
tags: ["apple","grapes", "water", "gulmohar","water-melon", "green"]
}
Now, I want to fetch the unique tag values from the tags field across all documents, starting with the prefix g*, so that these unique tags can be displayed by a tag suggester (the Stack Overflow site is an example).
For example: Whenever user types, 'g':
"guava", "gulmohar", "grammar", "grapes" and "green" should be returned as a result.
i.e. the query should return distinct tags with the prefix g*.
I searched everywhere: browsed the whole documentation and the ES forum, but I didn't find any clue, much to my dismay.
I tried aggregations, but aggregations return the distinct count for every word/token in the tags field. They do not return the unique list of tags starting with 'g'.
"query": {
"filtered": {
"query": {
"bool": {
"should": [
{
"query_string": {
"allow_leading_wildcard": false,
"fields": [
"tags"
],
"query": "g*",
"fuzziness":0
}
}
]
}
},
"filter": {
//some condition on other field...
}
}
},
"aggs": {
"distinct_tags": {
"terms": {
"field": "tags",
"size": 10
}
}
},
result of above: guava(w), apple(q), mango(1),...
Can someone please suggest the correct way to fetch all the distinct tags with the prefix input_prefix*?
It's a bit of a hack, but this seems to accomplish what you want.
I created an index and added your docs:
DELETE /test_index
PUT /test_index
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
}
}
POST /test_index/_bulk
{"index":{"_index":"test_index","_type":"doc","_id":1}}
{"tags":["guava","apple","mango", "banana", "gulmohar"]}
{"index":{"_index":"test_index","_type":"doc","_id":2}}
{"tags": ["orange","guava", "mango shakes", "apple pie", "grammar"]}
{"index":{"_index":"test_index","_type":"doc","_id":3}}
{"tags": ["guava","apple","grapes", "water", "grammar","gulmohar","water-melon", "green"]}
Then I used a combination of prefix query and highlighting as follows:
POST /test_index/_search
{
"query": {
"prefix": {
"tags": {
"value": "g"
}
}
},
"fields": [ ],
"highlight": {
"pre_tags": [""],
"post_tags": [""],
"fields": {
"tags": {}
}
}
}
...
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 1,
"highlight": {
"tags": [
"guava",
"gulmohar"
]
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "2",
"_score": 1,
"highlight": {
"tags": [
"guava",
"grammar"
]
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "3",
"_score": 1,
"highlight": {
"tags": [
"guava",
"grapes",
"grammar",
"gulmohar",
"green"
]
}
}
]
}
}
Here is the code I used:
http://sense.qbox.io/gist/c14675ee8bd3934389a6cb0c85ff57621a17bf11
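Since the prefix query returns whole documents, the client still has to collect and deduplicate the highlighted fragments. A minimal sketch over the response shape shown above (the function name is mine):

```python
# Walk the hits, gather every highlighted tag fragment, and deduplicate.
# Assumes the empty pre_tags/post_tags used above, so each fragment is the
# bare tag value.
def distinct_highlighted_tags(response):
    """Collect the unique highlighted tag values from the search response."""
    tags = set()
    for hit in response["hits"]["hits"]:
        tags.update(hit.get("highlight", {}).get("tags", []))
    return sorted(tags)
```

Running it on the response above would yield grammar, grapes, green, guava, and gulmohar.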
What you're trying to do amounts to autocomplete, of course, and there are perhaps better ways of going about that than what I posted above (though they are a bit more involved). Here are a couple of blog posts we did about ways to set up autocomplete:
http://blog.qbox.io/quick-and-dirty-autocomplete-with-elasticsearch-completion-suggest
http://blog.qbox.io/multi-field-partial-word-autocomplete-in-elasticsearch-using-ngrams
As per @Sloan Ahrens' advice, I did the following:
Updated the mapping:
"tags": {
"type": "completion",
"context": {
"filter_color": {
"type": "category",
"default": "",
"path": "fruits.color"
},
"filter_type": {
"type": "category",
"default": "",
"path": "fruits.type"
}
}
}
Reference: ES API Guide
Inserted these documents:
{
"_id": "1",
"tags": {"input": ["guava", "apple", "mango", "banana", "gulmohar"]},
"fruits": {"color": "bar", "type": "alice"}
}
{
"_id": "2",
"tags": {"input": ["orange", "guava", "mango shakes", "apple pie", "grammar"]},
"fruits": {"color": "foo", "type": "bob"}
}
{
"_id": "3",
"tags": {"input": ["apple", "grapes", "water", "gulmohar", "water-melon", "green"]},
"fruits": {"color": "foo", "type": "alice"}
}
I didn't need to modify my original index much; I just added input before the tags array.
POST rescu1/_suggest?pretty
{
"suggest": {
"text": "g",
"completion": {
"field": "tags",
"size": 10,
"context": {
"filter_color": "bar",
"filter_type": "alice"
}
}
}
}
gave me the desired output.
I accepted @Sloan Ahrens' answer, as his suggestions worked like a charm and showed me the right direction.
