Elasticsearch query nested object

I have this record in elastic:
{
  "FirstName": "Winona",
  "LastName": "Ryder",
  "Notes": "<p>she is an actress</p>",
  "Age": "40-50",
  "Race": "Caucasian",
  "Gender": "Female",
  "HeightApproximation": "No",
  "Armed": false,
  "AgeCategory": "Adult",
  "ContactInfo": [
    {
      "ContactPoint": "stranger@gmail.com",
      "ContactType": "Email",
      "Details": "Details of tv show"
    }
  ]
}
I want to query inside the ContactInfo object, and I used the query below, but I don't get any results back:
{
  "query": {
    "nested": {
      "path": "ContactInfo",
      "query": {
        "match": { "ContactInfo.Details": "Details of tv show" }
      }
    }
  }
}
I also tried:
{
  "query": {
    "term": { "ContactInfo.ContactType": "email" }
  }
}
Here is the mapping for ContactInfo:
"ContactInfo":{
"type": "object"
}
I think I know the issue: the field is not set as nested in the mapping. Is there a way to still query it as nested without changing the mapping? I just want to avoid re-indexing the data if possible.
I'm pretty new to Elasticsearch, so I need your help.
Thanks in advance.

Elasticsearch has no concept of inner objects; it flattens object hierarchies into a simple list of field names and values.
Some important points from the Elasticsearch official documentation on the nested field type:
The nested type is a specialized version of the object data type that allows arrays of objects to be indexed in a way that they can be queried independently of each other.
If you need to index arrays of objects and to maintain the independence of each object in the array, use the nested data type instead of the object data type.
Internally, nested objects index each object in the array as a separate hidden document, so that each nested object can be queried independently of the others with the nested query.
Refer to this SO answer for more details.
Below is a working example with index mapping, search query, and search results.
You will have to reindex your data after applying the nested data type.
Index Mapping:
{
  "mappings": {
    "properties": {
      "ContactInfo": {
        "type": "nested"
      }
    }
  }
}
Search Query:
{
  "query": {
    "nested": {
      "path": "ContactInfo",
      "query": {
        "match": { "ContactInfo.Details": "Details of tv show" }
      }
    }
  }
}
Search Result:
"hits": [
{
"_index": "stof_64269180",
"_type": "_doc",
"_id": "1",
"_score": 1.1507283,
"_source": {
"FirstName": "Winona",
"LastName": "Ryder",
"Notes": "<p>she is an actress</p>",
"Age": "40-50",
"Race": "Caucasian",
"Gender": "Female",
"HeightApproximation": "No",
"Armed": false,
"AgeCategory": "Adult",
"ContactInfo": [
{
"ContactPoint": "stranger#gmail.com",
"ContactType": "Email",
"Details": "Details of tv show"
}
]
}
}
]
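To answer the follow-up question directly: without a nested mapping you cannot use the nested query, but a plain match query does still work against object-mapped fields, because their values are flattened into the parent document. A hedged sketch (the caveat being that with several ContactInfo entries, the query can match terms coming from different objects):
{
  "query": {
    "match": { "ContactInfo.Details": "Details of tv show" }
  }
}
Once the mapping has been changed, the existing documents can be copied over with the _reindex API; a minimal sketch, where old_index and new_index are placeholder names (new_index being the index created with the nested mapping):
POST _reindex
{
  "source": { "index": "old_index" },
  "dest": { "index": "new_index" }
}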

Related

How to formulate an AND condition for a nested object in OpenSearch?

OpenSearch ingests documents similar to this minimal example:
PUT nested_test/_doc/4
{
  "log": "This is a fourth log message",
  "function": "4 test function",
  "related_objects": [
    { "type": "user", "id": "10" },
    { "type": "offer", "id": "120" }
  ]
}
PUT nested_test/_doc/5
{
  "log": "This is a fifth log message",
  "function": "5 test function",
  "related_objects": [
    { "type": "user", "id": "120" },
    { "type": "offer", "id": "90" }
  ]
}
With many of these documents, I'd like to filter for those that have one specific related object (e.g. type=user and id=120). With the example data above, this should only return document 5. Using simple filters (DQL syntax) as follows does not work:
related_objects.type:user and related_objects.id:120
because it also matches document 4: that document has a related object with type user and a related object with id 120, but the user object doesn't have id 120; the offer does.
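To see why, it helps to look at how document 4 is stored internally when related_objects is a plain object field: the array is flattened into parallel value lists (an illustration of the indexed representation, not actual API output):
{
  "log": "This is a fourth log message",
  "function": "4 test function",
  "related_objects.type": [ "user", "offer" ],
  "related_objects.id": [ "10", "120" ]
}
The association between "user" and "10" is lost, which is why the filter type:user and id:120 also matches this document.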
Since related_objects is an array of objects, its field type should be nested (see the Elasticsearch documentation on the nested field type).
Elasticsearch query example:
{
  "query": {
    "nested": {
      "path": "related_objects",
      "query": {
        "bool": {
          "must": [
            { "term": { "related_objects.type": "MYTYPE" } },
            { "term": { "related_objects.id": "MYID" } }
          ]
        }
      }
    }
  }
}
Basically just go into a nested query and specify all your AND conditions as MUST clauses inside a bool query.
As soon as the field is declared as a nested field, it is possible to run a simple DQL query to get the desired information:
related_objects:{type:"user" and id:120}
This requires that the field has been defined as nested beforehand:
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "related_objects": {
        "type": "nested"
      }
    }
  }
}
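For the concrete example above (type=user and id=120), the filled-in query would look like this (a sketch; assuming the subfields got the default dynamic mapping, it should return only document 5):
GET nested_test/_search
{
  "query": {
    "nested": {
      "path": "related_objects",
      "query": {
        "bool": {
          "must": [
            { "term": { "related_objects.type": "user" } },
            { "term": { "related_objects.id": "120" } }
          ]
        }
      }
    }
  }
}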

Elasticsearch merge multiple indexes based on common field

I'm using ELK to generate views out of data from two different databases. One is MySQL, the other is PostgreSQL. There is no way to write a join query across those two DB instances, but I have a common field called "nic". Following are the documents from each index.
MySQL
Index: user_detail
"_id": "871123365V",
"_source": {
  "type": "db-poc-user",
  "fname": "Iraj",
  "@version": "1",
  "field_lname": "Sanjeewa",
  "nic": "871456365V",
  "@timestamp": "2020-07-22T04:12:00.376Z",
  "id": 2,
  "lname": "Santhosh"
}
PostgreSQL
Index: track_details
"_id": "871456365V",
"_source": {
"#version": "1",
"nic": "871456365V",
"#timestamp": "2020-07-22T04:12:00.213Z",
"track": "ELK",
"type": "db-poc-ceg"
},
I want to merge both indexes into a single new index using the common field "nic", so I can create visualizations in Kibana. How can this be achieved?
Please note that each document in the new index should have "nic", "fname", "lname", and "track" as fields, not as an aggregation.
I would leverage the enrich processor to achieve this.
First, you need to create an enrich policy (use the smallest index; let's say it's user_detail):
PUT /_enrich/policy/user-policy
{
  "match": {
    "indices": "user_detail",
    "match_field": "nic",
    "enrich_fields": ["fname", "lname"]
  }
}
Then you can execute that policy in order to create the enrich index:
POST /_enrich/policy/user-policy/_execute
The next step requires you to create an ingest pipeline that uses the above enrich policy/index:
PUT /_ingest/pipeline/user_lookup
{
  "description": "Enriching user details with tracks",
  "processors": [
    {
      "enrich": {
        "policy_name": "user-policy",
        "field": "nic",
        "target_field": "tmp",
        "max_matches": "1"
      }
    },
    {
      "script": {
        "if": "ctx.tmp != null",
        "source": "ctx.putAll(ctx.tmp); ctx.remove('tmp');"
      }
    },
    {
      "remove": {
        "field": ["@version", "@timestamp", "type"]
      }
    }
  ]
}
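Optionally, before reindexing anything, you can sanity-check the pipeline with the _simulate API; a quick test using the sample PostgreSQL document from the question:
POST /_ingest/pipeline/user_lookup/_simulate
{
  "docs": [
    {
      "_source": {
        "@version": "1",
        "nic": "871456365V",
        "@timestamp": "2020-07-22T04:12:00.213Z",
        "track": "ELK",
        "type": "db-poc-ceg"
      }
    }
  ]
}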
Finally, you're now ready to create your target index with the joined data. Simply leverage the _reindex API combined with the ingest pipeline we've just created:
POST _reindex
{
  "source": {
    "index": "track_details"
  },
  "dest": {
    "index": "user_tracks",
    "pipeline": "user_lookup"
  }
}
After running this, the user_tracks index will contain exactly what you need, for instance:
{
  "_index": "user_tracks",
  "_type": "_doc",
  "_id": "0uA8dXMBU9tMsBeoajlw",
  "_score": 1.0,
  "_source": {
    "fname": "Iraj",
    "nic": "871456365V",
    "lname": "Santhosh",
    "track": "ELK"
  }
}
If your source indexes ever change (new users, changed names, etc.), you'll need to re-run the above steps, but before doing so you need to delete the ingest pipeline and the enrich policy (in that order):
DELETE /_ingest/pipeline/user_lookup
DELETE /_enrich/policy/user-policy
After that you can freely re-run the above steps.
PS: Just note that I cheated a bit, since the record in user_detail doesn't have the same nic as in your example, but I guess that was a copy/paste issue.

Aggregate by fields in _source

I have an index in elasticsearch with documents that look like this:
"hits": [
{
"_index": "my-index2",
"_type": "my-type",
"_id": "1",
"_score": 1,
"_source": {
"entities": {
"persons": [
"Kobe Bryant",
"Michael Jordan"
],
"dates": [
"Yesterday"
],
"locations": [
"Munich",
"New York"
]
},
"my_field": "Kobe Bryant was one of the best basketball players of all times. Not even Michael Jordan has ever scored 81 points in one game. Munich is really an awesome city, but New York is as well. Yesterday has been the hottest day of the year."
}
}
Is it possible to use an aggregation to aggregate by fields in the entities object? I tried this and it didn't work:
{
  "aggs": {
    "avg_date": {
      "avg": {
        "script": {
          "source": "doc.entities.dates"
        }
      }
    }
  }
}
The error said that my index doesn't have an entities field.
EDIT: With the following terms aggregation query:
{
  "aggs": {
    "dates": {
      "terms": { "field": "entities.dates" }
    }
  }
}
I get an error saying:
Fielddata is disabled on text fields by default. Set fielddata=true on [entities.dates] in order to load fielddata in memory by uninverting the inverted index.
I could set fielddata=true as the error suggests; however, the documentation warns against this because it uses a lot of heap space. Is there another way I can do this query?
EDIT 2: Solved this by setting all fields in entities to keyword in the index mapping.
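For reference, a minimal mapping along those lines might look as follows (a sketch assuming a typeless index on a recent Elasticsearch version; on 6.x and earlier the properties would sit under the my-type mapping type instead). If dynamic mapping already created .keyword subfields, aggregating on entities.dates.keyword is an alternative that avoids reindexing:
PUT my-index2
{
  "mappings": {
    "properties": {
      "entities": {
        "properties": {
          "persons": { "type": "keyword" },
          "dates": { "type": "keyword" },
          "locations": { "type": "keyword" }
        }
      }
    }
  }
}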

Elasticsearch Query for Multi-valued Data

ES Data is indexed like this :
{
  "addresses": [
    { "id": 69, "location": "New Delhi" },
    { "id": 69, "location": "Mumbai" }
  ],
  "goods": [
    { "id": 396, "name": "abc", "price": 12500 },
    { "id": 167, "name": "XYz", "price": 12000 },
    { "id": 168, "name": "XYz1", "price": 11000 },
    { "id": 169, "name": "XYz2", "price": 13000 }
  ]
}
In my query I want to fetch records that have at least one matching address, and goods with a price between 11000 and 13000 and the name XYz.
When your data contains arrays of complex objects, like a list of addresses or a list of goods, you probably want to have a look at Elasticsearch's nested objects, to avoid your queries matching more items than you would expect.
The issue here is the way Elasticsearch (and, in effect, Lucene) stores the data. As there is no direct concept of lists of nested objects, the data is flattened and the connection between e.g. XYz and 12000 is lost. So you would also get this document as a result when you query for XYz and 12500, as the price 12500 is also in the list of values for goods.price. To avoid this, you can use the nested objects feature of Elasticsearch, which basically extracts all inner objects into a hidden index and allows querying for several fields that occur in one specific object instead of "in any of the objects". For more details, have a look at the docs on nested objects, which explain this pretty well.
In your case a mapping could look like the following. I assume you only want to query the addresses.location text without providing the id, so that list can remain the simple object type instead of also being a nested type. Also, I assume you query for exact matches. If this is not the case, you need to switch from keyword to text and adapt the term queries to match queries.
PUT nesting-sample
{
  "mappings": {
    "item": {
      "properties": {
        "addresses": {
          "properties": {
            "id": { "type": "integer" },
            "location": { "type": "keyword" }
          }
        },
        "goods": {
          "type": "nested",
          "properties": {
            "id": { "type": "integer" },
            "name": { "type": "keyword" },
            "price": { "type": "integer" }
          }
        }
      }
    }
  }
}
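For completeness, the sample document from the question can then be indexed as-is (using the index and type names from the mapping above):
PUT nesting-sample/item/1
{
  "addresses": [
    { "id": 69, "location": "New Delhi" },
    { "id": 69, "location": "Mumbai" }
  ],
  "goods": [
    { "id": 396, "name": "abc", "price": 12500 },
    { "id": 167, "name": "XYz", "price": 12000 },
    { "id": 168, "name": "XYz1", "price": 11000 },
    { "id": 169, "name": "XYz2", "price": 13000 }
  ]
}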
You can then use a bool query on the location and a nested query to match the inner documents of your goods list.
GET nesting-sample/item/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "addresses.location": "New Delhi"
          }
        },
        {
          "nested": {
            "path": "goods",
            "query": {
              "bool": {
                "must": [
                  {
                    "range": {
                      "goods.price": {
                        "gte": 12200,
                        "lt": 12999
                      }
                    }
                  },
                  {
                    "term": {
                      "goods.name": {
                        "value": "XYz"
                      }
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}
This query will not match the document, because no single nested object satisfies both conditions: the good named XYz has a price of 12000, which is outside the 12200–12999 range. If you change the lower bound to 12000, it will match.
Please check your use case, and be aware of the warning at the bottom of the docs regarding mapping explosion when using nested fields.

Elasticsearch shuffle index sorting

Thanks in advance. I'll describe the situation first and give the solution at the end.
I have a collection of 2M documents with the following mapping:
{
  "image": {
    "properties": {
      "timestamp": {
        "type": "date",
        "format": "dateOptionalTime"
      },
      "title": {
        "type": "string"
      },
      "url": {
        "type": "string"
      }
    }
  }
}
I have a webpage which paginates through all the documents with the following search:
{
  "from": STARTING_POSITION_NUMBER,
  "size": 15,
  "sort": [
    { "_id": { "order": "desc" } }
  ],
  "query": {
    "match_all": {}
  }
}
And a hit looks like this (note that the _id value is a hash of the url, to prevent duplicate documents):
{
  "_index": "images",
  "_type": "image",
  "_id": "2a750a4817bd1600",
  "_score": null,
  "_source": {
    "url": "http://test.test/test.jpg",
    "timestamp": "2014-02-13T17:01:40.442307",
    "title": "Test image!"
  },
  "sort": [
    null
  ]
}
This works pretty well. The only problem I have is that the documents appear sorted chronologically (the oldest documents appear on the first page, and the ones indexed more recently on the last page), but I want them to appear in a random order. For example, page 10 should always show the same N documents, but they don't have to be sorted by date.
I thought of something like sorting all the documents by their hash, which is kind of random and deterministic. How could I do that?
I've searched the docs, and the sorting API only sorts the results, not the full index. If I don't find a solution, I will pick documents randomly and index them in a separate collection.
Thank you.
I solved it using the following search:
{
  "from": STARTING_POSITION_NUMBER,
  "size": 15,
  "query": {
    "function_score": {
      "random_score": {
        "seed": 1
      }
    }
  }
}
Thanks to David from the Elasticsearch mailing list for pointing out the function score with random scoring.
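One caveat worth adding: on more recent Elasticsearch versions, a seeded random_score should also specify the field to base the per-document hash on (commonly _seq_no); otherwise a deprecation warning about _id fielddata may appear. A hedged variant of the same query:
{
  "from": STARTING_POSITION_NUMBER,
  "size": 15,
  "query": {
    "function_score": {
      "random_score": {
        "seed": 1,
        "field": "_seq_no"
      }
    }
  }
}
Note that consistent pagination also assumes the index isn't being written to between requests, since _seq_no changes on updates.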
