Retrieve documents based on an element inside an array in each document - elasticsearch

I am working on an Elasticsearch query which needs to return all documents based on an attribute inside the first element of an array in each document. Please refer to the document structure below.
User document
{
  "name": "Sam",
  "age": 20,
  "vehicle": [
    {
      "type": "car",
      "capacity": 4,
      "registration": {
        "date": "20.02.2020",
        "plate": null
      }
    }
  ]
}
Every document has only one element in the vehicle array.
Above is a sample document in my ES, and the requirement is to get all documents which have a "null" value for the plate attribute (similar to the example) in the collection. I have tried various queries for two days, but none of them succeeded; they all returned errors. What's the solution to this?

A null value is not indexed, so an exists query will not match it; a nested query with must_not exists on vehicle.registration.plate therefore finds these documents. My suggestion is to also check that some field inside "vehicle" exists, for example the field "type", so documents without any vehicle entry are excluded.
{
  "query": {
    "nested": {
      "path": "vehicle",
      "query": {
        "bool": {
          "must_not": [
            {
              "exists": {
                "field": "vehicle.registration.plate"
              }
            }
          ]
        }
      }
    }
  }
}
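Building on the suggestion to also check that some field inside vehicle exists (for example vehicle.type), both conditions can be combined in one bool query. A sketch, using the field names from the sample document:

```json
{
  "query": {
    "nested": {
      "path": "vehicle",
      "query": {
        "bool": {
          "must": [
            { "exists": { "field": "vehicle.type" } }
          ],
          "must_not": [
            { "exists": { "field": "vehicle.registration.plate" } }
          ]
        }
      }
    }
  }
}
```

This matches documents whose single vehicle entry is present but has plate set to null (or missing entirely), since null values are treated as missing by exists.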

Related

Return documents starting from ID, sorted by timestamp

Say I have an index with the following documents:
{
  "id": "8e8e3c0c-5d1d-4a3c-a78a-1bd2d206b39e",
  "timestamp": "2022-10-18T00:00:02"
}
{
  "id": "0ebeb7b1-dcd0-4b37-a70d-fa7377f07f8c",
  "timestamp": "2022-10-18T00:00:03"
}
{
  "id": "ea779299-1781-4465-b8a1-53f7b14fbe0c",
  "timestamp": "2022-10-18T00:00:01"
}
{
  "id": "3624a119-4830-4ec2-a840-f656c048fc5c",
  "timestamp": "2022-10-18T00:00:04"
}
I need a search query that returns documents starting from a specified id, sorted by timestamp, up to a limit (say 100). So given the id 8e8e3c0c-5d1d-4a3c-a78a-1bd2d206b39e, the following documents will be returned (in this exact order; note that the document with id ea779299-1781-4465-b8a1-53f7b14fbe0c is missing because its timestamp is earlier than that of the document I'm looking for):
{
  "id": "8e8e3c0c-5d1d-4a3c-a78a-1bd2d206b39e",
  "timestamp": "2022-10-18T00:00:02"
}
{
  "id": "0ebeb7b1-dcd0-4b37-a70d-fa7377f07f8c",
  "timestamp": "2022-10-18T00:00:03"
}
{
  "id": "3624a119-4830-4ec2-a840-f656c048fc5c",
  "timestamp": "2022-10-18T00:00:04"
}
I know how to do this in two queries by first getting the document by its id, and then another query to get all the documents "after" that document's timestamp, but I'm hopeful there's a more efficient way to do this using one single query?
Note that the index is expected to have tens/hundreds of millions of documents, so performance concerns are a factor (I'm unsure what "work" ES is doing under the covers, such as sorting first and then visiting each document to check the id), although the cluster will be sized appropriately.
You can use the bool query below, which will give you the expected result. The match_all inside must returns all documents, and the term inside the should clause boosts the document whose ID matches.
If your id field is mapped as keyword, use id in the term query; if it is mapped as both text and keyword, use id.keyword.
{
  "size": 100,
  "sort": [
    {
      "_score": "desc"
    },
    {
      "timestamp": {
        "order": "asc"
      }
    }
  ],
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        }
      ],
      "should": [
        {
          "term": {
            "id.keyword": {
              "value": "8e8e3c0c-5d1d-4a3c-a78a-1bd2d206b39e"
            }
          }
        }
      ]
    }
  }
}
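For reference, the two-query approach mentioned in the question can be sketched as follows (assuming id has a keyword sub-field). First fetch the anchor document:

```json
{
  "query": { "term": { "id.keyword": "8e8e3c0c-5d1d-4a3c-a78a-1bd2d206b39e" } }
}
```

Then, using the timestamp from that hit, page forward:

```json
{
  "size": 100,
  "query": { "range": { "timestamp": { "gte": "2022-10-18T00:00:02" } } },
  "sort": [ { "timestamp": "asc" } ]
}
```

Note that gte includes the anchor document itself, as well as any other documents sharing its exact timestamp.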

How to aggregate on the same field:value which are specified in query in elasticsearch

My data in Elasticsearch looks like the example below: one person_id corresponds to one document, and each document contains a list of objects like
{
  "dummy_name": "abc",
  "dummy_id": "44850642"
}
as shown below. The thing is, I am querying on the dummy_id field and getting some number of matching results, and I want to aggregate on dummy_id so I get the number of docs for each specific dummy_id. But what is happening is that I also get buckets for dummy_id values not mentioned in the query itself, since each person contains a list of objects in which dummy_id is present.
{
  "person_id": 1234,
  "Properties": {
    "Property1": [
      {
        "dummy_name": "abc",
        "dummy_id": "44850642"
      },
      {
      },
      {
      }
    ]
  }
},
{
  "person_id": 1235,
  .........
}
The query I am using:
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "Properties.Property1.dummy_id": "453041 23234324 124324 "
          }
        }
      ]
    }
  },
  "aggregations": {
    "group_by_concept": {
      "terms": {
        "field": "Properties.Property1.dummy_id",
        "order": {
          "_count": "desc"
        },
        "size": 10
      }
    }
  }
}
The problem comes from how you are storing the data.
For example, in this document:
{
  "person_id": 1234,
  "Properties": {
    "Property1": [
      {
        "dummy_name": "abc",
        "dummy_id": "44850642"
      },
      {
        "dummy_name": "dfg",
        "dummy_id": "876468"
      },
      {
      }
    ]
  }
}
The dummy_id tokens generated for this document would be 44850642 and 876468; this is how the data is kept in Lucene on the backend.
So when you query for dummy_id:44850642, you get the document back, but aggregations run over the terms produced by all documents matching the query.
As a result, you see buckets for both 44850642 and 876468.
For more information on how Elasticsearch stores arrays of objects, see https://www.elastic.co/guide/en/elasticsearch/reference/current/array.html
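If the inner objects need to stay independent of each other, the usual approach (described in the linked docs) is to map the array as a nested type. A sketch, assuming a hypothetical index named people:

```json
PUT people
{
  "mappings": {
    "properties": {
      "Properties": {
        "properties": {
          "Property1": {
            "type": "nested"
          }
        }
      }
    }
  }
}
```

With this mapping, each object in Property1 is indexed as its own hidden document, so queries and aggregations on Properties.Property1.dummy_id must go through nested queries and nested aggregations, and the terms from sibling objects no longer leak into each other's buckets.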

Needs to return only the matched nested objects with full parent body in Elasticsearch

I am using Elasticsearch version 1.7 in my project. I have an index named colleges, and each document contains a nested field named courses, like this:
{
  "name": "College Name",
  "university": "University Name",
  "city": 429,
  "city_name": "London",
  "state": 328,
  "state_name": "London",
  "courses": [
    {
      "id": 26,
      "degree_name": "Master Of Technology",
      "annual_fee": 100000,
      "stream": "Engineering",
      "degree_id": 9419
    },
    {
      "id": 28,
      "degree_name": "Master Of Philosophy",
      "annual_fee": 100000,
      "stream": "Philosophy",
      "degree_id": 9420
    }
  ]
}
What I am doing is filtering the colleges based on state and on degree_id, which is nested under the courses offered by each college. I want to return the full body (all fields) of the parent object, i.e. the college, but only those courses which match the query.
The query I wrote to accomplish this is:
{
  "_source": false,
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "state": "328"
          }
        },
        {
          "nested": {
            "path": "courses",
            "query": {
              "term": {
                "courses.degree_id": 9419
              }
            }
          }
        }
      ]
    }
  }
}
This query works fine and returns only the nested objects that match the query, but the problem is that I declared "_source": false, so no parent fields are returned.
If I declare "_source": true, it returns all the nested objects whether they match the query or not. The second approach that works is to declare field names in "_source": ["field1", "field2", .... "field100"], but I have about 50 fields in the parent colleges index, so I don't want to declare all of them in _source. Is there any other way to accomplish this without listing every field name in _source?
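One possible approach (a sketch, not verified against 1.7, though inner_hits was introduced in Elasticsearch 1.5) is to let the nested query report its matching objects via inner_hits while excluding only the courses array from the parent _source:

```json
{
  "_source": {
    "exclude": ["courses"]
  },
  "query": {
    "bool": {
      "must": [
        { "term": { "state": "328" } },
        {
          "nested": {
            "path": "courses",
            "query": { "term": { "courses.degree_id": 9419 } },
            "inner_hits": {}
          }
        }
      ]
    }
  }
}
```

Each hit then carries the full parent body minus courses, and the matching course objects appear under inner_hits, so no field list needs to be maintained.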

Elasticsearch: Search in an array of JSONs

I'm using Elasticsearch with the Python library, and I have a problem writing the search query when the objects become a little more complex. My index contains objects built like this:
{
  "id": 120,
  "name": "bob",
  "shared_status": {
    "post_id": 123456789,
    "text": "This is a sample",
    "urls": [
      {
        "url": "http://test.1.com",
        "displayed_url": "test.1.com"
      },
      {
        "url": "http://blabla.com",
        "displayed_url": "blabla.com"
      }
    ]
  }
}
Now I want a query that returns this document only if one of the displayed URLs contains the substring "test" and there is a field "text" in the main document. So I wrote this query:
{
  "query": {
    "bool": {
      "must": [
        { "exists": { "field": "text" } }
      ]
    }
  }
}
But I don't know what query to add for the other part: matching the substring "test" in one of the displayed URLs.
Is that possible? How does iteration over the list work?
If you didn't define an explicit mapping for your schema, Elasticsearch creates a default mapping based on the input data:
urls will be of type object
displayed_url will be of type string, using the standard analyzer
As you don't need any association between url and displayed_url, the current schema will work fine.
You can use a match query for a full-text match:
GET _search
{
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "text"
          }
        },
        {
          "match": {
            "urls.displayed_url": "test"
          }
        }
      ]
    }
  }
}
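Note that match only matches whole analyzed tokens, so it finds "test" where the analyzer emits it as a token, not as an arbitrary substring. If true substring matching is required (an assumption about the requirement), a wildcard query can be sketched instead; be aware that leading wildcards are expensive on large indices:

```json
{
  "query": {
    "bool": {
      "must": [
        { "exists": { "field": "text" } },
        { "wildcard": { "urls.displayed_url": "*test*" } }
      ]
    }
  }
}
```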

How to check field data is numeric when using inline Script in ElasticSearch

Per our requirements, we need to find the max ID among existing documents before adding a new document. The problem is that the field may also contain string data, so I had to use an inline script in the query to compute the max ID only over documents holding integer data, returning 0 otherwise. I am using the following inline-script query to find the max Key, but it is not working. Can you help me with this?
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "Name": {
              "value": "Test2"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "MaxId": {
      "max": {
        "field": "Key",
        "script": {
          "inline": "((doc['Key'].value).isNumber()) ? Integer.parseInt(doc['Key'].value) : 0"
        }
      }
    }
  }
}
The error occurs because the max aggregation only supports numeric fields, i.e. you cannot specify a string field (here, Key) in a max aggregation.
Simply remove the "field": "Key" part and keep only the script part:
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"term": {
"Name": "Test2"
}
}
]
}
},
"aggs": {
"MaxId": {
"max": {
"script": {
"source": "((doc['Key'].value).isNumber()) ? Integer.parseInt(doc['Key'].value) : 0"
}
}
}
}
}
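One caveat (an addition, not part of the original answer): isNumber() is a Groovy String method, and recent Elasticsearch versions use Painless as the default scripting language, where String has no isNumber(). A hedged Painless equivalent, assuming Key is mapped as keyword so doc['Key'].value is a String, uses try/catch instead:

```json
{
  "size": 0,
  "aggs": {
    "MaxId": {
      "max": {
        "script": {
          "lang": "painless",
          "source": "try { return Integer.parseInt(doc['Key'].value); } catch (NumberFormatException e) { return 0; }"
        }
      }
    }
  }
}
```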
