Apply 'exists' query on array of nested objects in ElasticSearch - elasticsearch

I have a JSON document in Elasticsearch under complete/user with the following structure:
"_source":{
"experiences":[
{experience 1}, {experience 2}, ... , {experience n}
]
}
The structure of experience object in the experiences array is as follows:
{
"company":
{
"industry": industry of the comany,
"name": company name,
​ ... other fields
},
"start_date": date when person joined company,
... other fields
}
I want to find all documents where none of the experiences in the experiences array have the company.industry field, i.e. every experience must be missing this field. Is there any query that I can use?
Thanks in advance

Try this out:
{
"query": {
"bool": {
"must_not": [
{
"exists": {"field" : "experiences.company.industry"}
}
]
}
}
}
This query will return all the documents where company.industry is missing or null in every experience of the experiences array.
Example:
"_source": {
"experiences": [
{
"company": {
"name": "company name"
}
},
{
"company": {
"industry": null,
"name": "company name"
}
}
]
}
What this query will not return:
"_source": {
"experiences": [
{
"company": {
"industry": "IT",
"name": "company name"
}
},
{
"company": {
"industry": null,
"name": "company name"
}
}
]
}
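A note in case the query above does not match anything: it assumes experiences is a regular object field. If experiences is mapped as a nested type, the exists clause has to be wrapped in a nested query instead. A minimal sketch, assuming the nested path is experiences:
{
  "query": {
    "bool": {
      "must_not": [
        {
          "nested": {
            "path": "experiences",
            "query": {
              "exists": { "field": "experiences.company.industry" }
            }
          }
        }
      ]
    }
  }
}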
Let me know if that helped.

Related

Elasticsearch - nested types vs collapse/aggs

I have a use case where I need to find the latest data based on some fields.
The fields are:
category.name
category.type
createdAt
For example: search for the newest data where category.name = 'John G.' AND category.type = 'A'. I expect the data with ID = 1 where it matches the criteria and is the newest one based on createdAt field ("createdAt": "2022-04-18 19:09:27.527+0200")
The problem is that category.* is a nested field and I can't aggs/collapse these fields because ES doesn't support it.
Mapping:
PUT data
{
"mappings": {
"properties": {
"createdAt": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss.SSSZ"
},
"category": {
"type": "nested",
"properties": {
"name": {
"type": "text",
"analyzer": "keyword"
}
}
},
"approved": {
"type": "text",
"analyzer": "keyword"
}
}
}
}
Data:
POST data/_create/1
{
"category": [
{
"name": "John G.",
"level": "A"
},
{
"name": "Chris T.",
"level": "A"
}
],
"createdBy": "John",
"createdAt": "2022-04-18 19:09:27.527+0200",
"approved": "no"
}
POST data/_create/2
{
"category": [
{
"name": "John G.",
"level": "A"
},
{
"name": "Chris T.",
"level": "A"
}
],
"createdBy": "Max",
"createdAt": "2022-04-10 10:09:27.527+0200",
"approved": "no"
}
POST data/_create/3
{
"category": [
{
"name": "Rick J.",
"level": "B"
}
],
"createdBy": "Rick",
"createdAt": "2022-03-02 02:09:27.527+0200",
"approved": "no"
}
I'm looking for either a search query that can handle that in an acceptable performant way, or a new object design without nested type where I could take advantage of aggs/collapse feature.
Any suggestion will be really appreciated.
About your first question,
For example: search for the newest data where category.name = 'John G.' AND category.type = 'A'. I expect the data with ID = 1 where it matches the criteria and is the newest one based on createdAt field ("createdAt": "2022-04-18 19:09:27.527+0200")
I believe you can do something along those lines:
GET /72088168/_search
{
"query": {
"nested": {
"path": "category",
"query": {
"bool": {
"must": [
{
"match": {
"category.name": "John G."
}
},
{
"match": {
"category.level": "A"
}
}
]
}
}
}
},
"sort": [
{
"createdAt": {
"order": "desc"
}
}
],
"size":1
}
For the second matter, it really depends on what you are aiming to do. You could merge category.name and category.level into the same field, so that your document would look like:
{
"category": ["John G. A","Chris T. A"],
"createdBy": "Max",
"createdAt": "2022-04-10 10:09:27.527+0200",
"approved": "no"
}
No nested mapping needed anymore. Although I agree it feels like using tape to fix the issue.
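With that flattened design, the lookup becomes a plain term query plus a sort. A minimal sketch, assuming the merged category field is indexed as a keyword (e.g. via the keyword analyzer the mapping already uses):
GET data/_search
{
  "size": 1,
  "query": {
    "term": { "category": "John G. A" }
  },
  "sort": [
    { "createdAt": { "order": "desc" } }
  ]
}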

Query Elasticsearch nested field by index (order of insert)

I have an Elasticsearch document with some nested objects (mapped as a nested field),
for example:
{
"FirstName": "Test",
"LastName": "Test",
"Cost": 322.54,
"Email": "test#test.com",
"Vehicles": [
{
"Year": 2000,
"Make": "Mazda",
"Model": "6"
},
{
"Year": 2012,
"Make": "Ford",
"Model": "F150"
}
]
}
I am trying to do aggregations on a specific index of the array; for example, I want to sum the Cost of documents whose make is Ford, but only when it is the first vehicle.
Is it even possible at all? There is almost no information on the internet about Elasticsearch nested fields and nothing about their index/order.
It is possible to achieve what you want, but you also need to add the index order as a field inside your nested documents:
{
"FirstName": "Test",
"LastName": "Test",
"Cost": 322.54,
"Email": "test#test.com",
"Vehicles": [
{
"Year": 2000,
"Make": "Mazda",
"Model": "6",
"Index": 0
},
{
"Year": 2012,
"Make": "Ford",
"Model": "F150",
"Index": 1
}
]
}
And then you can query your index using the two conditions on Index and Make like this:
{
"query": {
"nested": {
"path": "Vehicles",
"query": {
"bool": {
"filter": [
{
"match": {
"Vehicles.Index": 0
}
},
{
"match": {
"Vehicles.Make": "Ford"
}
}
]
}
}
}
}
}
In this specific case, the query is not going to yield any results, as expected, since Ford is the second vehicle (Index 1), not the first.
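To get the sum the question actually asks for, the same nested filter can be combined with a regular sum aggregation on the root-level Cost field. A sketch, assuming Cost is mapped as a numeric type:
{
  "size": 0,
  "query": {
    "nested": {
      "path": "Vehicles",
      "query": {
        "bool": {
          "filter": [
            { "match": { "Vehicles.Index": 0 } },
            { "match": { "Vehicles.Make": "Ford" } }
          ]
        }
      }
    }
  },
  "aggs": {
    "total_cost": {
      "sum": { "field": "Cost" }
    }
  }
}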

Term aggregation on ElasticSearch join

I would like to perform an aggregation on a join relation using ElasticSearch 7.7.
I need to know how many children I have for each parent.
The only way that I found to solve my issue is to use script inside term aggregation, but my concern is about performance.
/my_index/_search
{
"size": 0,
"aggs": {
"total": {
"terms": {
"script": {
"lang": "painless",
"source": "params['_source']['my_join']['parent']"
}
}
},
"max_total": {
"max_bucket": {
"buckets_path": "total>_count"
}
}
}
}
Does anyone know a faster way to execute this aggregation that avoids the script?
If the join field wasn't a parent/child I could replace the term aggregation with:
"terms": { "field": "my_field" }
To give more context, I'm using Elastic 7.7. Here is the mapping with some sample documents:
{
"mappings": {
"properties": {
"my_join": {
"relations": {
"other": "doc"
},
"type": "join"
},
"reader": {
"type": "keyword"
},
"name": {
"type": "text"
},
"content": {
"type": "text"
}
}
}
}
PUT example/_doc/1
{
"reader": [
"A",
"B"
],
"my_join": {
"name": "other"
}
}
PUT example/_doc/2
{
"reader": [
"A",
"B"
],
"my_join": {
"name": "other"
}
}
PUT example/_doc/3?routing=1
{
"content": "abc",
"my_join": {
"name": "doc",
"parent": 1
}
}
PUT example/_doc/4?routing=2
{
"content": "def",
"my_join": {
"name": "doc"
"parent": 2
}
}
PUT example/_doc/5?routing=1
{
"content": "def",
"my_join": {
"name": "doc",
"parent": 1
}
}
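A possible script-free alternative worth testing: the join field stores the parent id of each child document in a doc-values sub-field named after the parent relation (here that would be my_join#other, an assumption to verify against your actual mapping), so a terms aggregation on that sub-field, restricted to child documents, should count children per parent. A sketch:
GET my_index/_search
{
  "size": 0,
  "query": {
    "term": { "my_join": "doc" }
  },
  "aggs": {
    "children_per_parent": {
      "terms": { "field": "my_join#other" }
    },
    "max_total": {
      "max_bucket": { "buckets_path": "children_per_parent>_count" }
    }
  }
}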

Elasticsearch query fails to return results when querying a nested object

I have an object which looks something like this:
{
"id": 123,
"language_id": 1,
"label": "Pablo de la Pena",
"office": {
"count": 2,
"data": [
{
"id": 1234,
"is_office_lead": false,
"office": {
"id": 1,
"address_line_1": "123 Main Street",
"address_line_2": "London",
"address_line_3": "",
"address_line_4": "UK",
"address_postcode": "E1 2BC",
"city_id": 1
}
},
{
"id": 5678,
"is_office_lead": false,
"office": {
"id": 2,
"address_line_1": "77 High Road",
"address_line_2": "Edinburgh",
"address_line_3": "",
"address_line_4": "UK",
"address_postcode": "EH1 2DE",
"city_id": 2
}
}
]
},
"primary_office": {
"id": 1,
"address_line_1": "123 Main Street",
"address_line_2": "London",
"address_line_3": "",
"address_line_4": "UK",
"address_postcode": "E1 2BC",
"city_id": 1
}
}
My Elasticsearch mapping looks like this:
"mappings": {
"item": {
"properties": {
"office": {
"properties": {
"data": {
"type": "nested",
}
}
}
}
}
}
My Elasticsearch query looks something like this:
GET consultant/item/_search
{
"from": 0,
"size": 24,
"query": {
"bool": {
"must": [
{
"term": {
"language_id": 1
}
},
{
"term": {
"office.data.office.city_id": 1
}
}
]
}
}
}
This returns zero results; however, if I remove the second term and leave only the language_id clause, it works as expected.
I'm sure this is down to a misunderstanding on my part of how the nested object is flattened, but I'm out of ideas - I've tried all kinds of permutations of the query and mappings.
Any guidance hugely appreciated. I am using Elasticsearch 6.1.1.
I'm not sure if you need the entire record or not; this solution gives every record that has language_id: 1 and an office.data.office.city_id: 1 value.
GET consultant/item/_search
{
"from": 0,
"size": 100,
"query": {
"bool":{
"must": [
{
"term": {
"language_id": {
"value": 1
}
}
},
{
"nested": {
"path": "office.data",
"query": {
"match": {
"office.data.office.city_id": 1
}
}
}
}
]
}
}
}
I put 3 different records in my test index to guard against false hits (one with a different language_id and one with different office ids), and only the matching one was returned.
If you only need the office data, then that's a bit different but still solvable.
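For the "only the office data" case, one common approach is to add inner_hits to the nested query, so each hit also returns just the matching nested documents. A sketch:
GET consultant/item/_search
{
  "_source": false,
  "query": {
    "nested": {
      "path": "office.data",
      "query": {
        "match": { "office.data.office.city_id": 1 }
      },
      "inner_hits": {}
    }
  }
}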

Is it possible to update a document with an array of fields in Elasticsearch using the update_by_query API?

All the documents in my index contain a field with the following content:
"universities": {
"number": 1,
"state": [
{
"Name": "michigan",
"country": "us",
"code": 5696
}
]
}
I want to update all the documents in the index to look like this:
"universities": {
"number": 1,
"state": [
{
"Name": "michigan",
"country": "us",
"code": 5696
},
{
"Name": "seatle",
"country": "us",
"code": 5695
}
]
}
Is this possible using update_by_query in Elasticsearch 2.4.1?
I tried the below query:
"script": {
"inline": "for(i in ctx._source.univeristies.state){i.name=Text}",
"params": {
"Text": "seatle"
}
}
}
but it is setting the name on the existing entries rather than adding a new entry to the list.
You need to use this script instead:
"script": {
"inline": "ctx._source.universities.state.add(new_state)",
"params": {
"new_state": {
"Text": "Seattle",
"country": "us",
"code": 5695
}
}
}
}
UPDATE:
For later versions of ES (6+), the query looks like this instead:
"script": {
"source": "ctx._source.universities.state.add(params.new_state)",
"params": {
"new_state": {
"Text": "Seattle",
"country": "us",
"code": 5695
}
}
}
}
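For completeness, here is the same script wrapped in a full update_by_query request; a sketch assuming a hypothetical index name my_index and ES 6+ syntax:
POST my_index/_update_by_query
{
  "script": {
    "lang": "painless",
    "source": "ctx._source.universities.state.add(params.new_state)",
    "params": {
      "new_state": {
        "Name": "seatle",
        "country": "us",
        "code": 5695
      }
    }
  }
}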
