Multi words in filter term - elasticsearch

I have a document whose tags field contains "john smith".
This query returns it:
{
"query": {
"bool": {
"filter": {
"term": {
"tags": "john"
}
}
}
}
}
But this one does not:
{
"query": {
"bool": {
"filter": {
"term": {
"tags": "john smith"
}
}
}
}
}
Why? How can I make the filter match a value that contains multiple words?

If I understand your requirement correctly, you want to search for multiple values, i.e. john or smith. In that case, use the terms query.
Index definition
{
"mappings": {
"properties": {
"tags": {
"type": "text"
}
}
}
}
Index sample docs
{
"tags" : "john Lay"
}
{
"tags" : "john opster"
}
{
"tags" : "john smith"
}
Search query for john or lay
{
"query" : {
"terms" : {
"tags" : ["john", "lay"],
"boost" : 1.0
}
}
}
And search result
"hits": [
{
"_index": "so_auto",
"_type": "_doc",
"_id": "9",
"_score": 1.0,
"_source": {
"model_name": "john smith"
}
},
{
"_index": "so_auto",
"_type": "_doc",
"_id": "10",
"_score": 1.0,
"_source": {
"model_name": "john opster"
}
},
{
"_index": "so_auto",
"_type": "_doc",
"_id": "12",
"_score": 1.0,
"_source": {
"model_name": "john Lay"
}
}

I created a sample index with the provided field, and it gave the correct result.
The mapping of the tags field that I used to create the index is:
"mappings": {
"properties": {
"tags": {
"type": "keyword"
}
}
}
We use a keyword field because a term query requires an exact match against the indexed value.
I created three documents in this index with the following tags values:
{
"_index": "secesindex",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"tags": "John Smith"
}
},
{
"_index": "secesindex",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"tags": "John Farraday"
}
},
{
"_index": "secesindex",
"_type": "_doc",
"_id": "3",
"_score": 1.0,
"_source": {
"tags": "John"
}
}
Now, when I run the term query from the question, using the exact indexed value:
{
"query": {
"bool": {
"filter": {
"term": {
"tags": "John Smith"
}
}
}
}
}
It returns only the document whose tags field is exactly "John Smith".
"hits": [
{
"_index": "secesindex",
"_type": "_doc",
"_id": "1",
"_score": 0.0,
"_source": {
"tags": "John Smith"
}
}
]
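The difference between the two mappings comes down to how the value is indexed. A text field is analyzed into separate tokens at index time, while a term query looks up its input as one unanalyzed token; a keyword field stores the whole value as a single token. The Python sketch below is not Elasticsearch code. It only imitates that behaviour with a simplified analyzer (lowercase, then split on whitespace, an assumption standing in for the standard analyzer), and index_text, index_keyword, and term_query are hypothetical helper names.

```python
def index_text(value):
    """Imitate a `text` field: lowercase the value and split it into tokens."""
    return set(value.lower().split())


def index_keyword(value):
    """Imitate a `keyword` field: the whole value is one token, unchanged."""
    return {value}


def term_query(indexed_tokens, term):
    """A term query is not analyzed: it matches only an identical token."""
    return term in indexed_tokens


text_tokens = index_text("john smith")        # {"john", "smith"}
keyword_tokens = index_keyword("john smith")  # {"john smith"}

print(term_query(text_tokens, "john"))           # True
print(term_query(text_tokens, "john smith"))     # False
print(term_query(keyword_tokens, "john smith"))  # True
print(term_query(keyword_tokens, "john"))        # False
```

This is why the query for "john" matches on a text field while "john smith" does not, and why the opposite holds on a keyword field.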

Related

How can I filter bucket aggregation results based on sub-bucket aggregation document count?

I need a query where the results will exclude any userIds if they have at least 1 document with the tag set to a value within an 'excluded' list i.e. TAG A or TAG B.
I have an index with data like below:
{
"_index": "tags-3",
"_type": "_doc",
"_id": "YYYYYYY",
"_score": 10.272416,
"_source": {
"id": "YYYYYYY",
"userId": "User1",
"tag": "TAG A"
}
},
{
"_index": "tags-3",
"_type": "_doc",
"_id": "ZZZZZZ",
"_score": 10.272416,
"_source": {
"id": "ZZZZZZ",
"userId": "User1",
"tag": "TAG B"
}
},
{
"_index": "tags-3",
"_type": "_doc",
"_id": "ZZZZZZ",
"_score": 10.272416,
"_source": {
"id": "ZZZZZZ",
"userId": "User2",
"tag": "TAG A"
}
},
{
"_index": "tags-3",
"_type": "_doc",
"_id": "ZZZZZZ",
"_score": 10.272416,
"_source": {
"id": "ZZZZZZ",
"userId": "User2",
"tag": "TAG D"
}
},
{
"_index": "tags-3",
"_type": "_doc",
"_id": "ZZZZZZ",
"_score": 10.272416,
"_source": {
"id": "ZZZZZZ",
"userId": "User4",
"tag": "TAG D"
}
}
For the input above, I would expect an output of:
{
"_index": "tags-3",
"_type": "_doc",
"_id": "ZZZZZZ",
"_source": {
"userId": "User4"
}
}
since User4 has no documents with the tag set to TAG A or TAG B.
User2 is the only other user with a document with the tag set to TAG D; however, since it has another document with TAG A, it is excluded.
One way to do this would be to:
1. Aggregate (group) on the user IDs - this would give you all the user IDs
2. Then, aggregate the documents for each user ID (nested aggregation) with a filter for the multiple (or single) tag values you want to exclude - this would give you the total sum of documents with the tag set to an excluded tag for each user ID
3. Finally, perform a bucket selector aggregation, only including user IDs which have a count of 0 for any excluded documents; this would give you the users who don't have any documents with any excluded tag values
This query should work, for an excluded tag list of A, B & C:
{
"aggs": {
"user-ids": {
"terms": {
"field": "userId.keyword",
"size": 10000
},
"aggs": {
"excluded_tags_agg": {
"filter": {
"bool": {
"should": [
{
"match_phrase": {
"tag.keyword": "TAG A"
}
},
{
"match_phrase": {
"tag.keyword": "TAG B"
}
},
{
"match_phrase": {
"tag.keyword": "TAG C"
}
}
],
"minimum_should_match": 1
}
}
},
"filter_userids_which_do_not_have_any_docs_with_excluded_tags": {
"bucket_selector": {
"buckets_path": {
"doc_count": "excluded_tags_agg > _count"
},
"script": "params.doc_count == 0"
}
}
}
}
},
"size": 0
}
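To make the three steps concrete, here is a small Python sketch that applies the same logic in memory to the sample documents from the question. It does not call Elasticsearch; the function name users_without_excluded_tags is hypothetical and only mirrors what the aggregation is meant to compute.

```python
from collections import defaultdict

EXCLUDED_TAGS = {"TAG A", "TAG B", "TAG C"}

# The userId/tag pairs from the question's sample data.
docs = [
    {"userId": "User1", "tag": "TAG A"},
    {"userId": "User1", "tag": "TAG B"},
    {"userId": "User2", "tag": "TAG A"},
    {"userId": "User2", "tag": "TAG D"},
    {"userId": "User4", "tag": "TAG D"},
]


def users_without_excluded_tags(documents, excluded):
    # Steps 1 and 2: one bucket per userId, counting docs with an excluded tag.
    excluded_count = defaultdict(int)
    for doc in documents:
        excluded_count[doc["userId"]] += doc["tag"] in excluded
    # Step 3: keep only buckets whose excluded count is 0 (the bucket selector).
    return sorted(user for user, count in excluded_count.items() if count == 0)


print(users_without_excluded_tags(docs, EXCLUDED_TAGS))  # ['User4']
```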

Search results for term query not in alphabetical sort order

My results for the following term query are rendered as shown below. However, we want "BC" to appear after "Bar", since we are trying to perform an alphabetical search. What should be done to get this working?
Adam
Buck
BC
Bar
Car
Far
NativeSearchQuery query = new NativeSearchQueryBuilder()
.withSourceFilter(new FetchSourceFilterBuilder().withIncludes().build())
.withQuery(QueryBuilders.termQuery("type", field))
.withSort(new FieldSortBuilder("name").order(SortOrder.ASC))
.withPageable(pageable).build();
To sort the results alphabetically regardless of case, you can define a normalizer with a lowercase filter. The lowercase filter ensures that all letters are converted to lowercase before the document is indexed and before the field is searched.
Modify your index mapping as follows:
{
"settings": {
"analysis": {
"normalizer": {
"my_normalizer": {
"type": "custom",
"filter": [
"lowercase"
]
}
}
}
},
"mappings": {
"properties": {
"name": {
"type": "keyword",
"normalizer": "my_normalizer"
}
}
}
}
Indexed the same sample documents as given in the question.
Search Query:
{
"sort":{
"name":{
"order":"asc"
}
}
}
Search Result:
"hits": [
{
"_index": "66064809",
"_type": "_doc",
"_id": "1",
"_score": null,
"_source": {
"name": "Adam"
},
"sort": [
"adam"
]
},
{
"_index": "66064809",
"_type": "_doc",
"_id": "4",
"_score": null,
"_source": {
"name": "Bar"
},
"sort": [
"bar"
]
},
{
"_index": "66064809",
"_type": "_doc",
"_id": "3",
"_score": null,
"_source": {
"name": "BC"
},
"sort": [
"bc"
]
},
{
"_index": "66064809",
"_type": "_doc",
"_id": "2",
"_score": null,
"_source": {
"name": "Buck"
},
"sort": [
"buck"
]
},
{
"_index": "66064809",
"_type": "_doc",
"_id": "5",
"_score": null,
"_source": {
"name": "Car"
},
"sort": [
"car"
]
},
{
"_index": "66064809",
"_type": "_doc",
"_id": "6",
"_score": null,
"_source": {
"name": "Far"
},
"sort": [
"far"
]
}
]
}
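The effect of the lowercase normalizer on ordering can be illustrated outside Elasticsearch. The Python sketch below sorts the same six names twice: once on the raw values, where uppercase letters compare lower than lowercase ones, and once on a lowercased key, which is analogous to the normalized sort values ("adam", "bar", "bc", ...) shown in the result above. It is only an analogy for the sort behaviour, not the way Elasticsearch is implemented.

```python
names = ["Adam", "Buck", "BC", "Bar", "Car", "Far"]

# Raw comparison uses code points, so uppercase "C" sorts before lowercase "a".
raw_order = sorted(names)

# A lowercased sort key mirrors what the lowercase normalizer does to sort values.
normalized_order = sorted(names, key=str.lower)

print(raw_order)         # ['Adam', 'BC', 'Bar', 'Buck', 'Car', 'Far']
print(normalized_order)  # ['Adam', 'Bar', 'BC', 'Buck', 'Car', 'Far']
```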

Elasticsearch geo query with aggregation

I have an elasticsearch index containing user locations.
I need to perform an aggregation query with a geo bounding box using a geohash grid, and for buckets whose document count is less than some value, I need to return all documents.
How can I do this?
Since you have not given any details about the index you created or the user locations, I am assuming the data below:
Index definition
{
"mappings": {
"properties": {
"location": {
"type": "geo_point"
}
}
}
}
Index Sample Doc
POST _bulk
{"index":{"_id":1}}
{"location":"52.37408,4.912350","name":"The golden dragon"}
{"index":{"_id":2}}
{"location":"52.369219,4.901618","name":"Burger King"}
{"index":{"_id":3}}
{"location":"52.371667,4.914722","name":"Wendys"}
{"index":{"_id":4}}
{"location":"51.222900,4.405200","name":"Taco Bell"}
{"index":{"_id":5}}
{"location":"48.861111,2.336389","name":"McDonalds"}
{"index":{"_id":6}}
{"location":"48.860000,2.327000","name":"KFC"}
According to your question:
When requesting detailed buckets a filter like geo_bounding_box
should be applied to narrow the subject area
To know more about this, you can refer to this official ES doc
Now, in order to filter data based on doc_count with aggregations, we can use bucket_selector pipeline aggregation.
From documentation
Pipeline aggregations work on the outputs produced from other
aggregations rather than from document sets, adding information to the
output tree.
So, the amount of work that needs to be done to calculate doc_count stays the same.
Query
{
"aggs": {
"location": {
"filter": {
"geo_bounding_box": {
"location": {
"top_left": {
"lat": 52.5225,
"lon": 4.5552
},
"bottom_right": {
"lat": 52.2291,
"lon": 5.2322
}
}
}
},
"aggs": {
"around_amsterdam": {
"geohash_grid": {
"field": "location",
"precision": 8
},
"aggs": {
"the_filter": {
"bucket_selector": {
"buckets_path": {
"the_doc_count": "_count"
},
"script": "params.the_doc_count < 2"
}
}
}
}
}
}
}
}
Search Result
"hits": {
"total": {
"value": 6,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "restaurant",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"location": "52.37408,4.912350",
"name": "The golden dragon"
}
},
{
"_index": "restaurant",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"location": "52.369219,4.901618",
"name": "Burger King"
}
},
{
"_index": "restaurant",
"_type": "_doc",
"_id": "3",
"_score": 1.0,
"_source": {
"location": "52.371667,4.914722",
"name": "Wendys"
}
},
{
"_index": "restaurant",
"_type": "_doc",
"_id": "4",
"_score": 1.0,
"_source": {
"location": "51.222900,4.405200",
"name": "Taco Bell"
}
},
{
"_index": "restaurant",
"_type": "_doc",
"_id": "5",
"_score": 1.0,
"_source": {
"location": "48.861111,2.336389",
"name": "McDonalds"
}
},
{
"_index": "restaurant",
"_type": "_doc",
"_id": "6",
"_score": 1.0,
"_source": {
"location": "48.860000,2.327000",
"name": "KFC"
}
}
]
},
"aggregations": {
"location": {
"doc_count": 3,
"around_amsterdam": {
"buckets": [
{
"key": "u173zy3j",
"doc_count": 1
},
{
"key": "u173zvfz",
"doc_count": 1
},
{
"key": "u173zt90",
"doc_count": 1
}
]
}
}
}
}
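The bucket_selector keeps only the buckets for which the script "params.the_doc_count < 2" is true, i.e. the buckets that contain fewer than 2 documents; all other buckets are dropped from the response.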
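The aggregation above returns only bucket keys and counts. Since the question also asks for the documents inside the sparse buckets, one possible extension is a top_hits sub-aggregation placed next to the bucket_selector. The Python sketch below only builds such a request body as a dictionary. The function name sparse_cells_body, the aggregation name docs_in_cell, the added "size": 0, and the default limits are assumptions for illustration, and the body has not been run against a cluster here.

```python
import json


def sparse_cells_body(top_left, bottom_right, precision=8, max_docs=2):
    """Build a search body: geohash cells with fewer than max_docs docs, plus those docs."""
    return {
        "size": 0,  # assumption: skip the top-level hits, only aggregations are needed
        "aggs": {
            "location": {
                "filter": {
                    "geo_bounding_box": {
                        "location": {"top_left": top_left, "bottom_right": bottom_right}
                    }
                },
                "aggs": {
                    "around_amsterdam": {
                        "geohash_grid": {"field": "location", "precision": precision},
                        "aggs": {
                            # Keep only the cells whose doc count is below the limit.
                            "the_filter": {
                                "bucket_selector": {
                                    "buckets_path": {"the_doc_count": "_count"},
                                    "script": f"params.the_doc_count < {max_docs}",
                                }
                            },
                            # Hypothetical addition: return the documents of each kept cell.
                            "docs_in_cell": {"top_hits": {"size": max_docs}},
                        },
                    }
                },
            }
        },
    }


body = sparse_cells_body(
    top_left={"lat": 52.5225, "lon": 4.5552},
    bottom_right={"lat": 52.2291, "lon": 5.2322},
)
print(json.dumps(body, indent=2))
```

Because every surviving cell holds fewer than max_docs documents, a top_hits size of max_docs is enough to return all of them.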

Elasticsearch query starting from a particular value

Is there a way to query starting from a particular value and get the next n records in Elasticsearch?
For example, I want to get 10 records starting from employee id "ABC_123".
The below query gives an error saying
[terms] query does not support [empId]
GET /_search
{
"from": 0, "size": 10,
"query" : {
"terms" : {
"empId" : "ABC_123"
}
}
}
What can I do about this?
You can use the prefix query. You can also read more about autocomplete on my blog, which discusses four approaches to make it work and their trade-offs.
I used the prefix query on your sample data and got the expected output. Below is a step-by-step guide.
Index mapping
{
"mappings": {
"properties": {
"empId": {
"type": "keyword"
}
}
}
}
Index sample docs
{
"empId" : "ABC_1231"
}
{
"empId" : "ABC_1232"
}
{
"empId" : "ABC_1233"
}
{
"empId" : "ABC_1234"
}
and so on
Prefix Search query
{
"from": 0,
"size": 10,
"query": {
"prefix": {
"empId": "ABC_123"
}
}
}
Search result
"hits": [
{
"_index": "so_prefix",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"empId": "ABC_1231"
}
},
{
"_index": "so_prefix",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"empId": "ABC_1232"
}
},
{
"_index": "so_prefix",
"_type": "_doc",
"_id": "3",
"_score": 1.0,
"_source": {
"empId": "ABC_1233"
}
},
{
"_index": "so_prefix",
"_type": "_doc",
"_id": "4",
"_score": 1.0,
"_source": {
"empId": "ABC_1234"
}
}
]
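The error in the question most likely comes from passing a single string to terms, which expects an array of values, whereas term takes one value. The Python sketch below builds the three request bodies side by side as plain dictionaries so the shapes are easy to compare. The helper names term_body, terms_body, and prefix_body are hypothetical, and nothing here is sent to a cluster.

```python
def term_body(field, value):
    """Exact match on a single value."""
    return {"query": {"term": {field: value}}}


def terms_body(field, values):
    """Exact match on any of several values; terms expects a list."""
    return {"query": {"terms": {field: list(values)}}}


def prefix_body(field, prefix, start=0, size=10):
    """Match values that begin with prefix, paged with from/size."""
    return {"from": start, "size": size, "query": {"prefix": {field: prefix}}}


print(term_body("empId", "ABC_1231"))
print(terms_body("empId", ["ABC_1231", "ABC_1232"]))
print(prefix_body("empId", "ABC_123"))
```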

Elasticsearch: Query the most recent that doesn't contain the field 'X'

I have the following search query:
{
"query": {
"match": {
"name": "testlib"
}
}
}
When I do this query I get the three results below. What I want to do now is only return one result: the newest #timestamp that doesn't contain version_pre. So in this case, only return AV6qvDXDyHw9vNh6Wlpl.
[
{
"_index": "testsoftware",
"_type": "software",
"_id": "AV6qvDXDyHw9vNh6Wlpl",
"_score": 0.2876821,
"_source": {
"#timestamp": "2017-09-21T11:02:15-04:00",
"name": "testlib",
"version_major": 1,
"version_minor": 0,
"version_patch": 1
}
},
{
"_index": "testsoftware",
"_type": "software",
"_id": "AV6qvDF5MtcMTuGknsVs",
"_score": 0.18232156,
"_source": {
"#timestamp": "2017-09-20T17:21:35-04:00",
"name": "testlib",
"version_major": 1,
"version_minor": 0,
"version_patch": 0
}
},
{
"_index": "testsoftware",
"_type": "software",
"_id": "AV6qvDnVyHw9vNh6Wlpn",
"_score": 0.18232156,
"_source": {
"#timestamp": "2017-09-22T13:56:55-04:00",
"name": "testlib",
"version_major": 1,
"version_minor": 0,
"version_patch": 2,
"version_pre": 0
}
}
]
Use sort (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-sort.html) in descending order, so the newest document comes first, together with an exists query inside must_not (https://www.elastic.co/guide/en/elasticsearch/reference/2.3/query-dsl-exists-query.html):
{
"size" : 1,
"sort" : [{ "#timestamp" : {"order" : "desc"}}],
"query" : {
"bool": {
"must_not": {
"exists": {
"field": "version_pre"
}
}
}
}
}
Or even, via query string:
/_search?sort=#timestamp:desc&size=1&q=_missing_:version_pre
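
The selection the question asks for (drop documents that have version_pre, order the rest newest first, keep one) can be checked in memory against the three hits from the question. The Python sketch below does exactly that without calling Elasticsearch; newest_without_field is a hypothetical helper name, and only the _id, #timestamp, and version_pre values are copied from the question.

```python
from datetime import datetime

hits = [
    {"_id": "AV6qvDXDyHw9vNh6Wlpl", "#timestamp": "2017-09-21T11:02:15-04:00"},
    {"_id": "AV6qvDF5MtcMTuGknsVs", "#timestamp": "2017-09-20T17:21:35-04:00"},
    {"_id": "AV6qvDnVyHw9vNh6Wlpn", "#timestamp": "2017-09-22T13:56:55-04:00", "version_pre": 0},
]


def newest_without_field(documents, field, timestamp_field="#timestamp"):
    # must_not + exists: drop every document that contains the field at all.
    candidates = [doc for doc in documents if field not in doc]
    # Sort descending on the timestamp and keep a single hit (size: 1).
    candidates.sort(key=lambda doc: datetime.fromisoformat(doc[timestamp_field]), reverse=True)
    return candidates[0] if candidates else None


print(newest_without_field(hits, "version_pre")["_id"])  # AV6qvDXDyHw9vNh6Wlpl
```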
