Object Array search support in Elasticsearch

I have an array of objects in Elasticsearch.
I would like to check whether a particular field value appears in the top 2 positions of the array, without using a script.
Imagine my ES data is as follows:
[
{
"_id": "TestID1",
"data": [
{
"name": "Test1",
"priority": 2
},
{
"name": "Test2",
"priority": 3
},
{
"name": "Test3",
"priority": 4
}
]
},
{
"_id": "TestID2",
"data": [
{
"name": "Test3",
"priority": 2
},
{
"name": "Test9",
"priority": 3
},
{
"name": "Test5",
"priority": 4
},
{
"name": "Test10",
"priority": 5
}
]
},
{
"_id": "TestID3",
"data": [
{
"name": "Test1",
"priority": 2
},
{
"name": "Test2",
"priority": 3
},
{
"name": "Test3",
"priority": 6
}
]
}
]
Here I would like to make a query that searches for Test3 ONLY within the top 2 elements of the data array.
This search should return only TestID2's data, because only TestID2 has Test3 in the top 2 elements of its data array.

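You will not be able to perform such a request directly without using a script. The only solution I can think of is to create a copy of the array field containing only its first 2 elements. You can then search on this field, for example with a match or term query on top2.name.
You can add an ingest pipeline to build this trimmed copy automatically. Reference it with the pipeline query parameter when indexing, or set it as the index's index.default_pipeline setting: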
PUT /_ingest/pipeline/top2_elements
{
"description": "Create a top2 field containing only the first two values of an array",
"processors": [
{
"script": {
"source": "if (ctx.data != null) { ctx.top2 = new ArrayList(ctx.data.subList(0, Math.min(2, ctx.data.size()))); }"
}
}
]
}
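You can check the idea locally before creating the pipeline with the pure-Python model below. It is a sketch, not Elasticsearch code, and both function names are hypothetical. add_top2 mirrors what the ingest script stores in top2. search_top2 mirrors a query on top2.name.

```python
from typing import Any

# Sample documents from the question ("_id" kept next to the source for convenience).
DOCS: list[dict[str, Any]] = [
    {"_id": "TestID1", "data": [{"name": "Test1", "priority": 2},
                                {"name": "Test2", "priority": 3},
                                {"name": "Test3", "priority": 4}]},
    {"_id": "TestID2", "data": [{"name": "Test3", "priority": 2},
                                {"name": "Test9", "priority": 3},
                                {"name": "Test5", "priority": 4},
                                {"name": "Test10", "priority": 5}]},
    {"_id": "TestID3", "data": [{"name": "Test1", "priority": 2},
                                {"name": "Test2", "priority": 3},
                                {"name": "Test3", "priority": 6}]},
]


def add_top2(doc: dict[str, Any]) -> dict[str, Any]:
    """Model of the ingest script: copy the first two elements of `data` into `top2`.

    Slicing tolerates arrays with fewer than two elements or a missing `data` field.
    """
    out = dict(doc)
    out["top2"] = list(doc.get("data") or [])[:2]
    return out


def search_top2(docs: list[dict[str, Any]], name: str) -> list[str]:
    """Ids of documents whose `top2` contains an element with this name,
    i.e. what a query on top2.name would select."""
    return [d["_id"] for d in docs if any(e.get("name") == name for e in d["top2"])]


indexed = [add_top2(d) for d in DOCS]
print(search_top2(indexed, "Test3"))  # ['TestID2']
```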

Related

Filter documents out of the facet count in enterprise search

We use Enterprise Search indexes to store items that can be tagged by multiple tenants, e.g.:
[
{
"id": 1,
"name": "document 1",
"tags": [
{ "company_id": 1, "tag_id": 1, "tag_name": "bla" },
{ "company_id": 2, "tag_id": 1, "tag_name": "bla" }
]
}
]
I'm looking for a way to retrieve all documents with only the tags of company 1.
This request:
{
"query": "",
"facets": {
"tags": {
"type": "value"
}
},
"sort": {
"created": "desc"
},
"page": {
"size": 20,
"current": 1
}
}
returns the following:
...
"facets": {
"tags": [
{
"type": "value",
"data": [
{
"value": "{\"company_id\":1,\"tag_id\":1,\"tag_name\":\"bla\"}",
"count": 1
},
{
"value": "{\"company_id\":2,\"tag_id\":1,\"tag_name\":\"bla\"}",
"count": 1
}
]
}
],
}
...
Can I modify the request so that I get no tags with "company_id" = 2?
I have a solution that strips the extra data from the results after they are retrieved, but I'm looking for a better one.
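Below is a minimal Python sketch of the client-side workaround described above. It uses no Enterprise Search API and only post-processes a response shaped like the one shown. It assumes each facet "value" is a JSON-encoded tag string and that result documents look like the example document. keep_company_facets and keep_company_tags are hypothetical helper names.

```python
import json
from typing import Any

# Facet entries as they appear in the response above: each "value" is a JSON-encoded tag.
FACET_DATA: list[dict[str, Any]] = [
    {"value": '{"company_id":1,"tag_id":1,"tag_name":"bla"}', "count": 1},
    {"value": '{"company_id":2,"tag_id":1,"tag_name":"bla"}', "count": 1},
]


def keep_company_facets(facet_data: list[dict[str, Any]], company_id: int) -> list[dict[str, Any]]:
    """Drop facet entries whose decoded tag belongs to another company."""
    return [e for e in facet_data if json.loads(e["value"]).get("company_id") == company_id]


def keep_company_tags(doc: dict[str, Any], company_id: int) -> dict[str, Any]:
    """Return a copy of a result document that keeps only the given company's tags."""
    out = dict(doc)
    out["tags"] = [t for t in doc.get("tags", []) if t.get("company_id") == company_id]
    return out


document = {"id": 1, "name": "document 1", "tags": [
    {"company_id": 1, "tag_id": 1, "tag_name": "bla"},
    {"company_id": 2, "tag_id": 1, "tag_name": "bla"},
]}
print(keep_company_facets(FACET_DATA, 1))  # only the company_id 1 entry remains
print(keep_company_tags(document, 1)["tags"])  # [{'company_id': 1, 'tag_id': 1, 'tag_name': 'bla'}]
```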

Elasticsearch merge new document with the existing document

I want to merge a new document into the existing document in Elasticsearch instead of overwriting it. I have the following record in ES:
{
"id": "1",
"student_name": "Rahul",
"books": [
{
"book_id": "11",
"book_name": "History",
"status": "Started"
}
]
}
I have received another JSON to process: I need to update the existing document if the id is the same, or otherwise insert it. Suppose I receive the following JSON:
{
"id": "1",
"address": "Bangalore",
"books": [
{
"book_id": "11",
"book_name": "History",
"status": "Finished"
},
{
"book_id": "12",
"book_name": "History",
"status": "Started"
}
]
}
I want my final document to look like this:
{
"id": "1",
"student_name": "Rahul",
"address": "Bangalore",
"books": [
{
"book_id": "11",
"book_name": "History",
"status": "Finished"
},
{
"book_id": "12",
"book_name": "History",
"status": "Started"
}
]
}
So basically I want to merge the new JSON into the existing document, if there is one. For any key, top-level or nested, that exists in the stored document but is not present in the new JSON, I have to keep its current value. Any new key has to be added, and any updated key has to be modified.
Also, for the arrays of JSON objects inside the document: if an incoming object has the same id as an existing one, I have to replace it, but if it has a new id, I need to append it to the array.
I want to understand whether this is possible with ES queries, and if so, how to achieve it. I can think of merging at the application level and overwriting the document, but I want to know if there is a better way.
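You can achieve this with an upsert, i.e. an update request with doc_as_upsert.
The first piece will be indexed as a new document because it doesn't exist yet. On Elasticsearch 7.x and later, the update endpoint is POST my-index/_update/1: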
POST my-index/_doc/1/_update
{
"doc": {
"id": "1",
"student_name": "Rahul",
"books": [
{
"book_id": "11",
"book_name": "History",
"status": "Started"
}
]
},
"doc_as_upsert": true
}
And the second piece will be merged with the first one because it already exists:
POST my-index/_doc/1/_update
{
"doc": {
"id": "1",
"address": "Bangalore",
"books": [
{
"book_id": "11",
"book_name": "History",
"status": "Finished"
},
{
"book_id": "12",
"book_name": "History",
"status": "Started"
}
]
},
"doc_as_upsert": true
}
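The document you get after the two commands will be the one you expect. A partial update merges objects field by field but replaces arrays as a whole. books comes out right here only because the second request carries the complete array. Replacing or appending individual books by book_id would need a scripted update or a merge in your application: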
GET my-index/_doc/1
=>
{
"id": "1",
"student_name": "Rahul",
"address": "Bangalore",
"books": [
{
"book_id": "11",
"book_name": "History",
"status": "Finished"
},
{
"book_id": "12",
"book_name": "History",
"status": "Started"
}
]
}
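The merge rules can be made concrete with the pure-Python model below. It is a sketch, not Elasticsearch's implementation.

* apply_partial_update follows the doc / doc_as_upsert behavior: objects are merged recursively, scalars are overwritten, and arrays are replaced.
* merge_by_key is a hypothetical helper for the replace-or-append-by-id behavior the question asks for. You would have to do that part in the application or in a script.

```python
import copy
from typing import Any, Optional


def apply_partial_update(existing: Optional[dict[str, Any]], doc: dict[str, Any]) -> dict[str, Any]:
    """Model of an update with doc_as_upsert: with no stored document, `doc` is indexed as-is;
    otherwise objects are merged recursively, scalars are overwritten and arrays are replaced."""
    if existing is None:
        return copy.deepcopy(doc)
    merged = copy.deepcopy(existing)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_partial_update(merged[key], value)  # recurse into objects
        else:
            merged[key] = copy.deepcopy(value)  # scalars and whole arrays are replaced
    return merged


def merge_by_key(old: list[dict[str, Any]], new: list[dict[str, Any]], key: str) -> list[dict[str, Any]]:
    """Replace entries whose `key` already exists, append the others (hypothetical helper)."""
    result = [copy.deepcopy(item) for item in old]
    positions = {item[key]: i for i, item in enumerate(result)}
    for item in new:
        if item[key] in positions:
            result[positions[item[key]]] = copy.deepcopy(item)
        else:
            positions[item[key]] = len(result)
            result.append(copy.deepcopy(item))
    return result


first = {"id": "1", "student_name": "Rahul",
         "books": [{"book_id": "11", "book_name": "History", "status": "Started"}]}
second = {"id": "1", "address": "Bangalore",
          "books": [{"book_id": "11", "book_name": "History", "status": "Finished"},
                    {"book_id": "12", "book_name": "History", "status": "Started"}]}

stored = apply_partial_update(None, first)     # first request: upsert
stored = apply_partial_update(stored, second)  # second request: merge
print(stored["address"], [b["status"] for b in stored["books"]])  # Bangalore ['Finished', 'Started']

# Had the second request carried only book 12, replacing the array would drop book 11:
partial = apply_partial_update(apply_partial_update(None, first), {"books": [second["books"][1]]})
print([b["book_id"] for b in partial["books"]])  # ['12']
```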

How to turn an array of objects into an array of strings while reindexing in Elasticsearch?

Let's say the source index has a document like this:
{
"name":"John Doe",
"sport":[
{
"name":"surf",
"since":"2 years"
},
{
"name":"mountainbike",
"since":"4 years"
}
]
}
How can I discard the "since" information so that, once reindexed, the document contains only the sport names? Like this:
{
"name":"John Doe",
"sport":["surf","mountainbike"]
}
Note that it would be nice if the resulting field kept the same name, but it's not mandatory.
I don't know which version of Elasticsearch you're using, but here is a solution based on pipelines, introduced with ingest nodes in ES v5.0.
1) A script processor extracts the name from each sub-object and adds it to another field (here, sports).
2) The old sport field is removed with a remove processor.
You can use the Simulate pipeline API to test it:
POST _ingest/pipeline/_simulate
{
"pipeline": {
"description": "random description",
"processors": [
{
"script": {
"lang": "painless",
"source": "ctx.sports =[]; for (def item : ctx.sport) { ctx.sports.add(item.name) }"
}
},
{
"remove": {
"field": "sport"
}
}
]
},
"docs": [
{
"_index": "index",
"_type": "doc",
"_id": "id",
"_source": {
"name": "John Doe",
"sport": [
{
"name": "surf",
"since": "2 years"
},
{
"name": "mountainbike",
"since": "4 years"
}
]
}
}
]
}
which outputs the following result :
{
"docs": [
{
"doc": {
"_index": "index",
"_type": "doc",
"_id": "id",
"_source": {
"name": "John Doe",
"sports": [
"surf",
"mountainbike"
]
},
"_ingest": {
"timestamp": "2018-07-12T14:07:25.495Z"
}
}
}
]
}
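There may be a better solution, as I haven't used pipelines much. Alternatively, you could do this with Logstash filters before sending the documents to your Elasticsearch cluster.
To apply the pipeline while reindexing, store it with PUT _ingest/pipeline/<pipeline-name> and reference it in the dest.pipeline parameter of the _reindex request. For more information about pipelines, see the ingest node reference documentation.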
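To check the expected output offline, the two processors can be mirrored in a few lines of Python. This is only a model of the pipeline, not the pipeline itself, and flatten_sports is a hypothetical name. Like the Painless script, it fails if the sport field is missing.

```python
from typing import Any


def flatten_sports(source: dict[str, Any]) -> dict[str, Any]:
    """Model of the pipeline: collect each sub-object's name into `sports`, then drop `sport`."""
    out = dict(source)
    out["sports"] = [item["name"] for item in source["sport"]]  # script processor
    del out["sport"]  # remove processor; raises KeyError if `sport` is absent
    return out


doc = {"name": "John Doe",
       "sport": [{"name": "surf", "since": "2 years"},
                 {"name": "mountainbike", "since": "4 years"}]}
print(flatten_sports(doc))  # {'name': 'John Doe', 'sports': ['surf', 'mountainbike']}
```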

Elasticsearch sorting by array of objects

I have an engagement column like this, along with other columns:
record 1
"date":"2017-11-23T06:46:04.358Z",
"remarks": "test1",
"engagement": [
{
"name": "comment_count",
"value": 6
},
{
"name": "like_count",
"value": 2
}
],
....
....
record 2
"date":"2017-11-23T07:16:14.358Z",
"remarks": "test2",
"engagement": [
{
"name": "comment_count",
"value": 3
},
{
"name": "like_count",
"value": 9
}
],
....
....
I am storing objects in an array. Now I want to sort the data in descending order by the value of any given object name, e.g. the value of like_count or of share_count.
So if I sort by like_count, the 2nd record should come before the 1st, because its like_count value is 9, compared with 2 for the 1st record.
How to do this in elasticsearch?
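You should have something like the following. It assumes engagement is mapped with the nested type and engagement.name is a keyword field. The nested filter picks the like_count entry, and the sort uses its value. Older Elasticsearch versions express the same filter with the nested_path and nested_filter sort options:
{
"query": {
"nested": {
"path": "engagement",
"query": {
"term": { "engagement.name": "like_count" }
}
}
},
"sort": [
{
"engagement.value": {
"order": "desc",
"mode": "max",
"nested": {
"path": "engagement",
"filter": {
"term": { "engagement.name": "like_count" }
}
}
}
}
]
}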
Source: Elastic Docs
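Below is a pure-Python model of what that nested sort computes. It is a sketch that assumes engagement entries are plain dicts, and nested_sort_key and sort_by_metric are hypothetical names. The steps are:

1. Keep only the entries whose name matches.
2. Take their maximum value (mode max).
3. Sort descending.
4. Put records without the metric last, as Elasticsearch does by default.

```python
from typing import Any, Optional

RECORDS: list[dict[str, Any]] = [
    {"remarks": "test1", "engagement": [{"name": "comment_count", "value": 6},
                                        {"name": "like_count", "value": 2}]},
    {"remarks": "test2", "engagement": [{"name": "comment_count", "value": 3},
                                        {"name": "like_count", "value": 9}]},
]


def nested_sort_key(record: dict[str, Any], metric: str) -> Optional[float]:
    """Max `value` among engagement entries whose name matches (the nested filter), or None."""
    values = [e["value"] for e in record.get("engagement", []) if e.get("name") == metric]
    return max(values) if values else None


def sort_by_metric(records: list[dict[str, Any]], metric: str) -> list[dict[str, Any]]:
    """Descending sort on the filtered nested value; records lacking the metric go last."""
    with_value = [r for r in records if nested_sort_key(r, metric) is not None]
    without = [r for r in records if nested_sort_key(r, metric) is None]
    return sorted(with_value, key=lambda r: nested_sort_key(r, metric), reverse=True) + without


print([r["remarks"] for r in sort_by_metric(RECORDS, "like_count")])  # ['test2', 'test1']
```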

Elastic Search Grouped Queries

I'm indexing an array of key-value pairs. The key is always a UUID and the value is user-entered. I've been crawling through the documentation, but I can't figure out exactly how to query in this scenario. Example schema:
{
"id": 1,
"owner_id": 1,
"values": [
{ "key": "k3kfa23rewf", "value": "the red card" },
{ "key": "23a2dd23108", "value": "purple balloons" }
]
},
{
"id": 2,
"owner_id": 1,
"values": [
{ "key": "k3kfa23rewf", "value": "the blue card" },
{ "key": "23a2dd23108", "value": "purple balloons" }
]
}
I would like to query:
{ "term": { "owner_id": 1 },
{ "term": { "values.key": "23a2dd23108" }, "match": { "values.value": "purple" } },
{ "term": { "values.key": "k3kfa23rewf" }, "match": { "values.value": "blue" } }
So that the record with ID 2 is returned. Any suggestions?
I think you need to use nested documents here, i.e. map values with the nested type.
That way, you can build a bool query with a must clause containing a term query on owner_id. Add two more must clauses, each a nested query that combines a term query on values.key with a match query on values.value.
Does it help?
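Below is a sketch of that query built as a Python dict. build_query is a hypothetical helper, and it assumes values is mapped as nested and values.key as a keyword. The block also includes a rough local model of why nested matters. With a plain object array, keys and values are indexed as independent lists. A hypothetical third document that puts the words under the wrong keys would then match too. The match query is approximated by lowercase whitespace tokens, which is only a stand-in for real analysis.

```python
import json
from typing import Any

DOCS: list[dict[str, Any]] = [
    {"id": 1, "owner_id": 1, "values": [{"key": "k3kfa23rewf", "value": "the red card"},
                                        {"key": "23a2dd23108", "value": "purple balloons"}]},
    {"id": 2, "owner_id": 1, "values": [{"key": "k3kfa23rewf", "value": "the blue card"},
                                        {"key": "23a2dd23108", "value": "purple balloons"}]},
    # Hypothetical extra document: the words sit under the "wrong" keys.
    {"id": 3, "owner_id": 1, "values": [{"key": "k3kfa23rewf", "value": "purple"},
                                        {"key": "23a2dd23108", "value": "blue"}]},
]
CONDITIONS = [("23a2dd23108", "purple"), ("k3kfa23rewf", "blue")]


def build_query(owner_id: int, conditions: list[tuple[str, str]]) -> dict[str, Any]:
    """Bool query: a term on owner_id plus one nested query per (key, word) pair."""
    must: list[dict[str, Any]] = [{"term": {"owner_id": owner_id}}]
    for key, word in conditions:
        must.append({"nested": {"path": "values", "query": {"bool": {"must": [
            {"term": {"values.key": key}},
            {"match": {"values.value": word}},
        ]}}}})
    return {"query": {"bool": {"must": must}}}


def word_in(text: str, word: str) -> bool:
    """Rough stand-in for a match query: lowercase whitespace tokens."""
    return word.lower() in text.lower().split()


def matches_nested(doc: dict[str, Any], owner_id: int, conditions: list[tuple[str, str]]) -> bool:
    """Nested semantics: key and word must match within the same array element."""
    return doc["owner_id"] == owner_id and all(
        any(v["key"] == key and word_in(v["value"], word) for v in doc["values"])
        for key, word in conditions)


def matches_flattened(doc: dict[str, Any], owner_id: int, conditions: list[tuple[str, str]]) -> bool:
    """Object-array semantics: keys and values are separate lists, so pairs can mix."""
    keys = [v["key"] for v in doc["values"]]
    texts = [v["value"] for v in doc["values"]]
    return doc["owner_id"] == owner_id and all(
        key in keys and any(word_in(t, word) for t in texts) for key, word in conditions)


print(json.dumps(build_query(1, CONDITIONS), indent=2))
print([d["id"] for d in DOCS if matches_nested(d, 1, CONDITIONS)])     # [2]
print([d["id"] for d in DOCS if matches_flattened(d, 1, CONDITIONS)])  # [2, 3]
```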
