Elasticsearch: "More like this" combined with additional constraint

I just came across the "more like this" functionality/API. Is it possible to combine the results of more_like_this with an additional search constraint?
I have two ES queries that work separately. The first one:
POST /h/B/_search
{
"query": {
"more_like_this": {
"fields": [
"desc"
],
"ids": [
"511111260"
],
"min_term_freq": 1,
"max_query_terms": 25
}
}
}
This returns:
{
"took": 16,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"failed": 0
},
"hits": {
"total": 53,
"max_score": 3.2860293,
"hits": [
...
That is fine, but I also need an additional constraint on another field of the underlying document. That query also works fine on its own:
POST /h/B/_search
{
"query": {
"bool": {
"must": {
"match": {
"Kind": "Pen"
}
}
}
}
}
I would love to combine those two into one, so that the query says: "Find similar items among the items labelled Pen". I tried the following with a nested query, but it returns an error:
POST /h/B/_search
{
"query": {
"more_like_this": {
"fields": [
"desc"
],
"ids": [
"511111260"
],
"min_term_freq": 1,
"max_query_terms": 25
},
"nested": {
"query": {
"bool": {
"must": {
"match": {
"Kind": "Pen"
}
}
}
}
}
}
}
I tried several variants for combining these two search criteria, but so far with no luck.
If someone more experienced could provide a hint, that would be much appreciated.
Thanks

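The bool query exists for exactly this purpose. A bool must clause is basically equivalent to a Boolean AND. Similarly, you can use bool should for Boolean OR and bool must_not for Boolean NOT (note that when a bool query also has a must or filter clause, should clauses generally become optional and only raise the score, unless you set minimum_should_match). Your attempt fails because "more_like_this" and "nested" are siblings inside "query", but a query object can contain only one query type; besides, the nested query (which also needs a "path") is only meant for fields mapped as nested. Put both clauses inside a single bool instead: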
POST /h/B/_search
{
"query": {
"bool": {
"must": [
{
"more_like_this": {
"fields": [
"desc"
],
"ids": [
"511111260"
],
"min_term_freq": 1,
"max_query_terms": 25
}
},
{
"match": {
"Kind": "Pen"
}
}
]
}
}
}
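If you build these requests from application code, the same AND combination can be assembled programmatically. The sketch below is a minimal Python example, assuming you send the resulting JSON body yourself (for example with curl or an HTTP client); the function name similar_with_kind is hypothetical. It also shows a variant that puts the match in a bool filter clause (available since Elasticsearch 2.0), which restricts the results without letting the Kind match affect the more_like_this relevance score.

```python
import json


def similar_with_kind(doc_id, kind, *, score_kind=True):
    """Build a bool query: documents similar to doc_id AND with Kind == kind.

    Hypothetical helper for illustration; it only builds the request body.
    """
    mlt = {
        "more_like_this": {
            "fields": ["desc"],
            "ids": [doc_id],
            "min_term_freq": 1,
            "max_query_terms": 25,
        }
    }
    kind_clause = {"match": {"Kind": kind}}
    if score_kind:
        # Both clauses must match, and both contribute to the score.
        return {"query": {"bool": {"must": [mlt, kind_clause]}}}
    # The Kind constraint only filters; scoring comes from more_like_this alone.
    return {"query": {"bool": {"must": [mlt], "filter": [kind_clause]}}}


body = similar_with_kind("511111260", "Pen")
print(json.dumps(body, indent=2))
```

The filter variant is usually the better fit when the extra constraint is a yes/no condition rather than something that should influence ranking.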

Related

Elasticsearch - How do I search on 2 fields, where 1 must be null and the other must match a search text

I am trying to run a search on Elasticsearch 6.8.
I don't have control over the Elasticsearch instance, meaning I cannot control how the data is indexed.
My data is structured like this when I do a match_all search:
{ "took": 4,
"timed_out": false,
"_shards": {
"total": 13,
"successful": 13,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 15.703552,
"hits": [ {
"_index": "(removed index)",
"_type": "_doc",
"_id": "******** (Removed id)",
"_score": 15.703552,
"_source": {
"VCompany": {
"cvrNummer": 12345678,
"penheder": [
{
"pNummer": 1234567898,
"periode": {
"gyldigFra": "2013-04-10",
"gyldigTil": "2014-09-30"
}
}
],
"vMetadata": {
"nyesteNavn": {
"navn": "company1",
"periode": {
"gyldigFra": "2013-04-10",
"gyldigTil": "2014-09-30"
}
}
}
}
}
}
]
}
}
The JSON might not be fully complete because I removed some unneeded data. What I am trying to do is search for documents where "vCompany.vMetaData.nyesteNavn.gyldigTil" is null and where "vCompany.vMetaData.nyesteNavn.navn" matches a text string.
I tried something like this:
{
"query": {
"bool": {
"must": [
{"match": {"Vrvirksomhed.virksomhedMetadata.nyesteNavn.navn": "company1"}}
],
"should": {
"terms": {
"Vrvirksomhed.penheder.periode.gyldigTil": null
}
}
}
}
}
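You need to use must_not with an exists query, as below, to check whether a field is null. A terms query cannot match null: null values are not indexed, so there is no term to look up. The query below returns documents where the name matches company1 and the Vrvirksomhed.penheder.periode.gyldigTil field is null or missing (exists also treats an empty array as missing).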
{
"query": {
"bool": {
"must": [
{
"match": {
"Vrvirksomhed.virksomhedMetadata.nyesteNavn.navn": "company1"
}
}
],
"must_not": [
{
"exists": {
"field": "Vrvirksomhed.penheder.periode.gyldigTil"
}
}
]
}
}
}
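To see which documents the must_not/exists combination keeps, here is a rough local emulation in Python. It is only an approximation under stated assumptions: the helpers field_values and exists are hypothetical, and they mimic how Elasticsearch flattens arrays of (non-nested) objects and skips null values and empty arrays when indexing. The sample documents are made up for illustration, and nothing here talks to a cluster.

```python
PATH = "Vrvirksomhed.penheder.periode.gyldigTil"


def field_values(doc, path):
    """Collect all values at a dotted path, descending into lists of objects."""
    values = [doc]
    for key in path.split("."):
        next_values = []
        for value in values:
            items = value if isinstance(value, list) else [value]
            for item in items:
                if isinstance(item, dict) and key in item:
                    next_values.append(item[key])
        values = next_values
    # Flatten leaf arrays, the way multi-valued fields are indexed.
    flat = []
    for value in values:
        flat.extend(value if isinstance(value, list) else [value])
    return flat


def exists(doc, path):
    """Approximate the exists query: true if any non-null value would be indexed."""
    return any(v is not None for v in field_values(doc, path))


# Hypothetical sample documents.
docs = [
    {"Vrvirksomhed": {"penheder": [{"periode": {"gyldigFra": "2013-04-10",
                                                "gyldigTil": "2014-09-30"}}]}},
    {"Vrvirksomhed": {"penheder": [{"periode": {"gyldigFra": "2013-04-10",
                                                "gyldigTil": None}}]}},
    {"Vrvirksomhed": {"penheder": []}},
]

# must_not + exists keeps the documents where the field has no indexed value.
kept = [d for d in docs if not exists(d, PATH)]
```

Only the first document is excluded; an explicit null and an empty penheder array both count as "does not exist".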

Is it possible to use a query result into another query in ElasticSearch?

I have two queries that I want to combine. The first one returns a document with some fields.
Now I want to use one of these fields in a second query without running two separate queries.
Is there a way to combine them in order to accomplish this?
This is the first query:
{
"_source": {
"includes": [
"data.session"
]
},
"query": {
"bool": {
"must": [
{
"match": {
"field1": "9419"
}
},
{
"match": {
"field2": "5387"
}
}
],
"filter": [
{
"range": {
"timestamp": {
"time_zone": "+00:00",
"gte": "2020-10-24 10:16",
"lte": "2020-10-24 11:16"
}
}
}
]
}
},
"size" : 1
}
And this is the response returned:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 109,
"relation": "eq"
},
"max_score": 3.4183793,
"hits": [
{
"_index": "file",
"_type": "_doc",
"_id": "UBYCkgsEzLKoXh",
"_score": 3.4183793,
"_source": {
"data": {
"session": "123456789"
}
}
}
]
}
}
I want to use that "data.session" value in another query by passing in the result of the first query, instead of writing the value in by hand:
{
"_source": {
"includes": [
"data.session"
]
},
"query": {
"bool": {
"must": [
{
"match": {
"data.session": "123456789"
}
}
]
}
},
"sort": [
{
"timestamp": {
"order": "asc"
}
}
]
}
If you mean using the result of the first query as an input to the second query, that is not possible in Elasticsearch. Elasticsearch does not support subqueries or inner queries, so you have to run the first query, read the value from its response in your client, and then send the second query. If you share more about your use case, we might be able to suggest a better way.
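In practice this means two round trips from the client: run the first query, read data.session from its top hit, and send the second query. A minimal Python sketch of the middle step follows, assuming the response has already been parsed into a dict; sending the requests is left to whatever HTTP client you use, and session_query is a hypothetical name.

```python
import json

# Response of the first query, as shown in the question (trimmed).
first_response = json.loads("""
{"hits": {"total": {"value": 109, "relation": "eq"},
          "hits": [{"_id": "UBYCkgsEzLKoXh",
                    "_source": {"data": {"session": "123456789"}}}]}}
""")


def session_query(response):
    """Build the second query body from the first query's top hit.

    Hypothetical client-side helper: Elasticsearch cannot feed one query's
    result into another, so the value is extracted here instead.
    """
    hits = response["hits"]["hits"]
    if not hits:
        return None  # nothing matched, so there is no session to look up
    session = hits[0]["_source"]["data"]["session"]
    return {
        "_source": {"includes": ["data.session"]},
        "query": {"bool": {"must": [{"match": {"data.session": session}}]}},
        "sort": [{"timestamp": {"order": "asc"}}],
    }


second_body = session_query(first_response)
```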

Multiple Match Phrase Prefixes Return Zero Results In Elasticsearch

I have the following Elasticsearch (version 2.3) query, which produces zero results:
{
"query": {
"bool": {
"must": [
{
"match_phrase_prefix": {
"phone": "123"
}
},
{
"match_phrase_prefix": {
"firstname": "First"
}
}
]
}
}
}
Output from the above query:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
Output of the above query with _explain:
{
"_index": "index_name",
"_type": "doc_type",
"_id": "_explain",
"_version": 4,
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"created": false
}
However, when I run either of the following queries on its own, I get results, including the one document that matches both parts of the above query. If I include the full phone number in the combined query, the document does appear in the results.
Phone numbers are stored as strings without any formatting, e.g. "1234567890".
Is there any reason why the query with two prefix clauses returns zero results?
{
"query": {
"bool": {
"must": [
{
"match_phrase_prefix": {
"phone": "123"
}
}
]
}
}
}
{
"query": {
"bool": {
"must": [
{
"match_phrase_prefix": {
"firstname": "First"
}
}
]
}
}
}
I was able to get the results I wanted by changing the phone number query to a regexp query instead of a match_phrase_prefix query.
{
"query": {
"bool": {
"must": [
{
"regexp": {
"phone": "123[0-9]+"
}
},
{
"match_phrase_prefix": {
"firstname": "First"
}
}
]
}
}
}
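One detail worth knowing about this workaround: Lucene regular expressions are matched against the whole indexed term rather than searched for inside it, and "123[0-9]+" requires at least one digit after 123. Python's re.fullmatch behaves the same way for this simple pattern, so it can be used to sanity-check the pattern locally. This is an analogy only; Lucene's regexp syntax is not identical to Python's, and the sample terms below are made up.

```python
import re

PATTERN = "123[0-9]+"

# Hypothetical indexed phone-number terms.
terms = ["1234567890", "123", "9991234567"]

# fullmatch mirrors Lucene's implicit anchoring: the pattern must cover the whole term.
matches = {term: re.fullmatch(PATTERN, term) is not None for term in terms}
print(matches)
```

So "123" alone does not match (no digit follows), and a number that merely contains 123 in the middle does not match either.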

Elastic Search Querying/filtering nested arrays

I have stored nested data of the following kind in my index test_agg in ES:
{
"Date": "2015-10-21",
"Domain": "abc.com",
"Processed_at": "10/23/2015 9:47",
"Events": [
{
"Name": "visit",
"Count": "188",
"Value_Aggregations": [
{
"Value": "red",
"Count": "100"
}
]
},
{
"Name": "order_created",
"Count": "159",
"Value_Aggregations": [
{
"Value": "$125",
"Count": "50"
}
]
}
]
}
The mapping of the nested item is:
curl -XPOST localhost:9200/test_agg/nested_evt/_mapping -d '{
"nested_evt":{
"properties":{
"Events": {
"type": "nested"
}
}
}
}'
I am trying to get "Events.Count" and "Events.Value_Aggregations.Count" where Events.Name='visit', using the query below:
{
"fields" : ["Events.Count","Events.Value_Aggregations.Count"],
"query": {
"filtered": {
"query": {
"match": { "Domain": "abc.com" }
},
"filter": {
"nested": {
"path": "Events",
"query": {
"match": { "Events.Name": "visit" }
}
}
}
}
}
}
Instead of returning the single values
Events.Count=[188] Events.Value_Aggregations.Count=[100]
it returns
Events.Count=[188,159] Events.Value_Aggregations.Count=[100,50]
What is the exact query structure to get my desired output?
So the problem here is that the nested filter you are applying selects parent documents based on attributes of their nested child documents. ES finds the parent document that matches your query (based on the document's nested children). Then, because you have specified "fields", instead of returning the entire document it picks out only the fields you asked for. Those fields happen to be nested fields, and since the parent document has two nested children, ES finds two values for each field and returns both. To my knowledge there is no way to return only the matching child documents through "fields"; if your version supports it, the inner_hits option of the nested query is designed to return the matching nested documents.
One solution to this problem would be to use a parent/child relationship instead; then you could run a has_parent query, combined with the other filters, against the child type to get what you want. That would probably be a cleaner way to do this, as long as the schema doesn't conflict with your other needs.
However, there is a way to do roughly what you are asking with your current schema: a nested aggregation combined with a filter aggregation. It's somewhat involved (and slightly ambiguous in this case; see the caveat below), but here's the query:
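The difference can be reproduced with plain data. This Python sketch is an illustration, not Elasticsearch code: collect_fields is a hypothetical helper that, with name=None, mimics what "fields" returns for the matched parent document (values from every nested child), and with a name keeps only the children you were after.

```python
# The sample document from the question (numbers kept as strings, as indexed).
doc = {
    "Domain": "abc.com",
    "Events": [
        {"Name": "visit", "Count": "188",
         "Value_Aggregations": [{"Value": "red", "Count": "100"}]},
        {"Name": "order_created", "Count": "159",
         "Value_Aggregations": [{"Value": "$125", "Count": "50"}]},
    ],
}


def collect_fields(document, name=None):
    """Gather Events.Count and Events.Value_Aggregations.Count values.

    name=None: every child contributes, like "fields" on the matched parent.
    name given: only children whose Name equals it contribute.
    """
    events = [e for e in document["Events"] if name is None or e["Name"] == name]
    return {
        "Events.Count": [e["Count"] for e in events],
        "Events.Value_Aggregations.Count": [
            v["Count"] for e in events for v in e["Value_Aggregations"]
        ],
    }


whole_parent = collect_fields(doc)          # what the question's query returns
visit_only = collect_fields(doc, "visit")   # what the question wants
```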
POST /test_index/_search
{
"size": 0,
"query": {
"filtered": {
"query": {
"match": {
"Domain": "abc.com"
}
},
"filter": {
"nested": {
"path": "Events",
"query": {
"match": {
"Events.Name": "visit"
}
}
}
}
}
},
"aggs": {
"nested_events": {
"nested": {
"path": "Events"
},
"aggs": {
"filtered_events": {
"filter": {
"term": {
"Events.Name": "visit"
}
},
"aggs": {
"events_count_terms": {
"terms": {
"field": "Events.Count"
}
},
"value_aggregations_count_terms": {
"terms": {
"field": "Events.Value_Aggregations.Count"
}
}
}
}
}
}
}
}
which returns:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0,
"hits": []
},
"aggregations": {
"nested_events": {
"doc_count": 2,
"filtered_events": {
"doc_count": 1,
"value_aggregations_count_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "100",
"doc_count": 1
}
]
},
"events_count_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "188",
"doc_count": 1
}
]
}
}
}
}
}
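On the client side, the values you want are the bucket keys of the two terms aggregations. A short Python sketch, assuming the response body has been parsed into a dict; bucket_keys is a hypothetical helper name, and the aggregation names match the query above.

```python
import json

# The aggregation part of the response shown above.
response = json.loads("""
{"aggregations": {"nested_events": {"doc_count": 2, "filtered_events": {
  "doc_count": 1,
  "value_aggregations_count_terms": {"buckets": [{"key": "100", "doc_count": 1}]},
  "events_count_terms": {"buckets": [{"key": "188", "doc_count": 1}]}}}}}
""")


def bucket_keys(resp, agg_name):
    """Return the bucket keys of one terms aggregation under filtered_events."""
    filtered = resp["aggregations"]["nested_events"]["filtered_events"]
    return [bucket["key"] for bucket in filtered[agg_name]["buckets"]]


events_count = bucket_keys(response, "events_count_terms")
value_counts = bucket_keys(response, "value_aggregations_count_terms")
```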
Caveat: it's not clear to me whether you actually need the "filter": { "nested": { ... } } clause of the "query" in what I've shown here. If this part filters out parent documents in a useful way, then you need it. If your only intention was to select which nested child documents from which to return fields, then it's redundant here since the filter aggregation is taking care of that part.
Here is the code I used to test it:
http://sense.qbox.io/gist/dcc46e50117031de300b6f91c647fe9b729a5283
Here is the parent/child relationship query that produced my desired output:
{
"query": {
"filtered": {
"query": {
"bool": {"must": [
{"term": {"Name": "visit"}}
]}
},
"filter":{
"has_parent": {
"type": "domain_info",
"query" : {
"filtered": {
"query": { "match_all": {}},
"filter" : {
"and": [
{"term": {"Domain": "abc.com"}}
]
}
}
}
}
}
}
}
}

Trying to extract a leaf field from Elasticsearch

I have a document in Elasticsearch that looks something like this:
{
"text": "something something something",
"entities": { "hashtags":["test","test123"]}
}
The problem is that not every document has the entities attribute set. So I want to write a query that:
requires a keyword in the text field
requires the entities field to be present
extracts the entities.hashtags field
I'm trying to do this with the following query, but the problem is that I still get documents that don't have an entities field.
For the second part of the question: how do I extract only the entities.hashtags field? I tried something like "fields": ["entities.hashtags"] but it didn't work.
{
"size": 2000,
"query": {
"filtered": {
"query": {
"match_all": {
}
},
"filter": {
"bool": {
"must": [{
"term": {
"text": "something"
}
},
{
"missing": {
"field": "entities",
"existence": true
}
}]
}
}
}
}
}
If I understand you correctly, this does what you want. Your "missing" filter matches documents in which the field is absent, which is the opposite of what you need. Instead, a "term" filter on the "text" field and an "exists" filter on the "entities" field select the documents, and a "terms" aggregation on "entities.hashtags" extracts the values. Here is the full example I used:
DELETE /test_index
PUT /test_index
{
"settings": {
"number_of_shards": 1
}
}
PUT /test_index/doc/1
{
"text": "something something something",
"entities": { "hashtags": ["test","test123"] }
}
PUT /test_index/doc/2
{
"text": "another doc",
"entities": { "hashtags": ["testagain","testagain123"] }
}
PUT /test_index/doc/3
{
"text": "doc with no entities"
}
POST /test_index/_search
{
"size": 0,
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{ "term": { "text": "something" } },
{ "exists": { "field": "entities" } }
]
}
}
}
},
"aggs": {
"hashtags": {
"terms": {
"field": "entities.hashtags"
}
}
}
}
...
{
"took": 35,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0,
"hits": []
},
"aggregations": {
"hashtags": {
"buckets": [
{
"key": "test",
"doc_count": 1
},
{
"key": "test123",
"doc_count": 1
}
]
}
}
}
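To make the filtering logic explicit, here is a local Python approximation run over the same three documents. It is not how Elasticsearch evaluates the query: tokens is a rough, hypothetical stand-in for the standard analyzer, and matches combines the term and exists checks in plain Python. Only document 1 survives, which agrees with the response above.

```python
from collections import Counter

# The three documents indexed in the example above.
docs = {
    "1": {"text": "something something something",
          "entities": {"hashtags": ["test", "test123"]}},
    "2": {"text": "another doc",
          "entities": {"hashtags": ["testagain", "testagain123"]}},
    "3": {"text": "doc with no entities"},
}


def tokens(text):
    # Rough stand-in for the standard analyzer: lowercase, split on whitespace.
    return text.lower().split()


def matches(doc, keyword):
    """term on text AND exists on entities, evaluated locally."""
    return keyword in tokens(doc.get("text", "")) and doc.get("entities") is not None


hits = [doc_id for doc_id, doc in docs.items() if matches(doc, "something")]

# Count each hashtag once per matching document, like a terms aggregation's doc_count.
hashtag_counts = Counter(
    tag for doc_id in hits for tag in set(docs[doc_id]["entities"]["hashtags"])
)
```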
