Elasticsearch - How do i search on 2 fields. 1 must be null and other must match search text - elasticsearch

I am trying to do a search on elasticsearch 6.8.
I don't have control over the elastic search instance, meaning i cannot control how the data is indexed.
I have data structured like this when i do a match. all search:
{ "took": 4,
"timed_out": false,
"_shards": {
"total": 13,
"successful": 13,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 15.703552,
"hits": [ {
"_index": "(removed index)",
"_type": "_doc",
"_id": "******** (Removed id)",
"_score": 15.703552,
"_source": {
"VCompany": {
"cvrNummer": 12345678,
"penheder": [
{
"pNummer": 1234567898,
"periode": {
"gyldigFra": "2013-04-10",
"gyldigTil": "2014-09-30"
}
}
],
"vMetadata": {
"nyesteNavn": {
"navn": "company1",
"periode": {
"gyldigFra": "2013-04-10",
"gyldigTil": "2014-09-30"
}
},
}
}
}
}
}]
The json might not be fully complete because i removed some unneeded data. So what I am trying to do is search where: "vCompany.vMetaData.nyesteNavn.gyldigTil" is null and where "vCompany.vMetaData.nyesteNavn.navn" will match a text string.
I tried something like this:
{
"query": {
"bool": {
"must": [
{"match": {"Vrvirksomhed.virksomhedMetadata.nyesteNavn.navn": "company1"}}
],
"should": {
"terms": {
"Vrvirksomhed.penheder.periode.gyldigTil": null
}
}
}
}

You need to use must_not with exists query like below to check if field is null or not. Below query will give result where company1 is matching and Vrvirksomhed.penheder.periode.gyldigTil field is null.
{
"query": {
"bool": {
"must": [
{
"match": {
"Vrvirksomhed.virksomhedMetadata.nyesteNavn.navn": "company1"
}
}
],
"must_not": [
{
"exists": {
"field": "Vrvirksomhed.penheder.periode.gyldigTil"
}
}
]
}
}
}

Related

Is it possible to use a query result into another query in ElasticSearch?

I have two queries that I want to combine, the first one returns a document with some fields.
Now I want to use one of these fields into the new query without creating two separates ones.
Is there a way to combine them in order to accomplish my task?
This is the first query
{
"_source": {
"includes": [
"data.session"
]
},
"query": {
"bool": {
"must": [
{
"match": {
"field1": "9419"
}
},
{
"match": {
"field2": "5387"
}
}
],
"filter": [
{
"range": {
"timestamp": {
"time_zone": "+00:00",
"gte": "2020-10-24 10:16",
"lte": "2020-10-24 11:16"
}
}
}
]
}
},
"size" : 1
}
And this is the response returned:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 109,
"relation": "eq"
},
"max_score": 3.4183793,
"hits": [
{
"_index": "file",
"_type": "_doc",
"_id": "UBYCkgsEzLKoXh",
"_score": 3.4183793,
"_source": {
"data": {
"session": "123456789"
}
}
}
]
}
}
I want to use that "data.session" into another query, instead of rewriting the value of the field by passing the result of the first query.
{
"_source": {
"includes": [
"data.session"
]
},
"query": {
"bool": {
"must": [
{
"match": {
"data.session": "123456789"
}
}
]
}
},
"sort": [
{
"timestamp": {
"order": "asc"
}
}
]
}
If you mean to use the result of the first query as an input to the second query, then it's not possible in Elasticsearch. But if you share your query and use-case, we might suggest you better way.
ElasticSearch does not allow sub queries or inner queries.

Elasticsearch separate aggregation based on values from first

I'm using a Elasticsearch 6.8.8 and trying to aggregate the number of entities and relationships over a given time period.
Here is the data structure and examples values of the index:
date entityOrRelationshipId startId endId type
=========================================================================
DATETIMESTAMP ENT1_ID null null ENTITY
DATETIMESTAMP ENT2_ID null null ENTITY
DATETIMESTAMP ENT3_ID null null ENTITY
DATETIMESTAMP REL1_ID ENT1_ID ENT2_ID RELATIONSHIP
DATETIMESTAMP REL2_ID ENT3_ID ENT1_ID RELATIONSHIP
etc.
For a given entity ID, I want to get the top 50 relationships. I have started with the following query.
{
"size": 0,
"query": {
"bool": {
"must": [
{
"range": {
"date": {
"gte": "2020-04-01T00:00:00.000+00:00",
"lt": "2020-04-28T00:00:00.000+00:00"
}
}
}
]
}
},
"aggs": {
"my_rels": {
"filter": {
"bool": {
"must": [
{
"term": {
"type": "RELATIONSHIP"
}
},
{
"bool": {
"should": [
{
"term": {"startId": "ENT1_ID"}
},
{
"term": {"endId": "ENT1_ID"}
}
]
}
}
]
}
},
"aggs": {
"my_rels2": {
"terms": {
"field": "entityOrRelationshipId",
"size": 50
},
"aggs": {
"my_rels3": {
"top_hits": {
"_source": {
"includes": ["startId","endId"]
},
"size": 1
}
}
}
}
}
}
}
}
This produces the following results:
{
"took": 54,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 93122,
"max_score": 0.0,
"hits": []
},
"aggregations": {
"my_rels": {
"doc_count": 332,
"my_rels2": {
"doc_count_error_upper_bound": 6,
"sum_other_doc_count": 259,
"buckets": [
{
"key": "REL1_ID",
"doc_count": 47,
"my_rels3": {
"hits": {
"total": 47,
"max_score": 1.0,
"hits": [
{
"_index": "trends",
"_type": "trend",
"_score": 1.0,
"_source": {
"endId": "ENT2_ID",
"startId": "ENT1_ID"
}
}
]
}
}
},
{
"key": "REL2_ID",
"doc_count": 26,
"my_rels3": {
"hits": {
"total": 26,
"max_score": 1.0,
"hits": [
{
"_index": "trends",
"_type": "trend",
"_score": 1.0,
"_source": {
"endId": "ENT1_ID",
"startId": "ENT3_ID"
}
}
]
}
}
}
]
}
}
}
}
This lists the top 50 relationships. For each relationship it lists the relationship ID, the count and the entity ids (startId, endId). What I would like to do now is produce another aggregation of entity counts for those distinct entities. Ideally this would not be a nested aggregation but a separate one using the rel ids identified in the first aggregation.
Is that possible to do in this query?
Unfortunately you cannot aggregate over the results of top_hits in Elasticsearch.
Here is the link to GitHub issue.
You can have other aggregation on a parallel level of top_hit but you cannot have any sub_aggregation below top_hit.
You can have a parallel level aggregation like:
"aggs": {
"top_hits_agg": {
"top_hits": {
"size": 10,
"_source": {
"includes": ["score"]
}
}
},
"avg_agg": {
"avg": {
"field": "score"
}
}
}

Multiple Match Phrase Prefixes Return Zero Results In Elasticsearch

I have the following Elasticsearch, version 2.3, query which produces zero results.
{
"query": {
"bool": {
"must": [
{
"match_phrase_prefix": {
"phone": "123"
}
},
{
"match_phrase_prefix": {
"firstname": "First"
}
}
]
}
}
}
Output from above query:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
Output of above query with _explain
{
"_index": "index_name",
"_type": "doc_type",
"_id": "_explain",
"_version": 4,
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"created": false
}
However, when I do either of the following I get results including the one document that matches both parts of the above query. If I include the full phone number then the document will appear in the results.
Phone numbers are stored as strings without any formatting. i.e. "1234567890".
Any reason why the two prefix query returns zero results?
{
"query": {
"bool": {
"must": [
{
"match_phrase_prefix": {
"phone": "123"
}
}
]
}
}
}
{
"query": {
"bool": {
"must": [
{
"match_phrase_prefix": {
"firstname": "First"
}
}
]
}
}
}
I was able to get the results I wanted by changing the phone number query to a regexp query instead of a match_phrase_prefix query.
{
"query": {
"bool": {
"must": [
{
"regexp": {
"phone": "123[0-9]+"
}
},
{
"match_phrase_prefix": {
"firstname": "First"
}
}
]
}
}
}

Elastic Search fulltext search query and filters

I wanna perform a full-text search, but I also wanna use one or many possible filters. The simplified structure of my document, when searching with /things/_search?q=*foo*:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "things",
"_type": "thing",
"_id": "63",
"_score": 1,
"fields": {
"name": [
"foo bar"
],
"description": [
"this is my description"
],
"type": [
"inanimate"
]
}
}
]
}
}
This works well enough, but how do I combine filters with a query? Let's say I wanna search for "foo" in an index with multiple documents, but I only want to get those with type == "inanimate"?
This is my attempt so far:
{
"query": {
"filtered": {
"query": {
"query_string": {
"query": "*foo*"
}
},
"filter": {
"bool": {
"must": {
"term": { "type": "inanimate" }
}
}
}
}
}
}
When I remove the filter part, it returns an accurate set of document hits. But with this filter-definition it does not return anything, even though I can manually verify that there are documents with type == "inanimate".
Since you have not done explicit mapping, term query is looking for an exact match. you need to add "index : not_analyzed" to type field and then your query will work.
This will give you correct documents
{
"query": {
"match": {
"type": "inanimate"
}
}
}
but this is not the solution, You need do explicit mapping as I said.

Is there any method in Elastic Search to get result in case of misspelling?

I want to know if it's possible to search among the data in case of misspelling like we search in google.
Currently this query returns thousands of results:
{
"query": {
"query_string": {
"query": "obama"
}
}
}
but when I change it to:
{
"query": {
"query_string": {
"query": "omama"
}
}
}
"obama" replaced with "omama" there is no result. is it possible to get results in case of wrong spelling?
I think what you are looking for is Fuzzy Query .
{
"query": {
"fuzzy": {
"field_name" : "omama"
}
}
}
If you are run this on single field the you can use fuzzy query like this field
{
"fuzzy_like_this_field" : {
"name.first" : {
"like_text" : "omama",
"max_query_terms" : 12
}
}
}
You can also check Phonetic Matching
https://github.com/elasticsearch/elasticsearch-analysis-phonetic
Simply use a fuzzy query, (documentation) :
{
"query": {
"fuzzy": {
"name": "omama"
}
}
}
You should get your result :
{
"took": 12,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 2.7917595,
"hits": [
{
"_index": "test",
"_type": "obama",
"_id": "D_ovfcHkQwODdftWM4_z1Q",
"_score": 2.7917595,
"_source": {
"name": "obama"
}
}
]
}
}

Resources