Include joined children with Elasticsearch GET request - elasticsearch

I have an Elasticsearch index events that has a join field so that an event can have multiple instances (i.e. the same event can occur on different dates). In this simplified mapping, an event doc has fields for title and url while an instance doc has start/end date fields:
{
"mappings": {
"properties": {
"title": {
"type": "text"
},
"url": {
"type": "keyword"
},
"dt": {
"type": "date"
},
"end_dt": {
"type": "date"
},
"event_or_instance": {
"type": "join",
"eager_global_ordinals": true,
"relations": {
"event": "instance"
}
}
}
}
}
I know how to get an event and includes all of its instances using has_child:
GET /events/_search
{
"query" : {
"bool": {
"filter": [
{
"term": {
"_id": {
"value": "c8871a79-1907-46c0-958c-9731c529b93e"
}
}
},
{
"has_child" : {
"type" : "instance",
"query" : { "match_all": {} },
"inner_hits" : {
"_source": true,
"sort": [{"dt": "asc"}]
}
}
}
]
}
},
"_source": true
}
This works fine, but is there a way to do this using the Get/Multi-get API instead of the Search API?

Related

Simple elasticsearch input - Rejecting mapping update final mapping would have more than 1 type: [_doc, doc]

I'm trying to send data to elasticsearch but running into an issue where my number field only comes up as a string. These are the steps I took.
Step 1. Add index & map
PUT http://123.com:5101/core_060619/
{
"mappings": {
"properties": {
"date": {
"type": "date",
"format": "HH:mm yyyy-MM-dd"
},
"data": {
"type": "integer"
}
}
}
}
Result:
{
"acknowledged": true,
"shards_acknowledged": true,
"index": "core_060619"
}
Step 2. Add data
PUT http://123.com:5101/core_060619/doc/1
{
"test" : [ {
"data" : "119050300",
"date" : "00:00 2019-06-03"
} ]
}
Result:
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "Rejecting mapping update to [zyxnewcoreyxbl_060619] as the final mapping would have more than 1 type: [_doc, doc]"
}
],
"type": "illegal_argument_exception",
"reason": "Rejecting mapping update to [zyxnewcoreyxbl_060619] as the final mapping would have more than 1 type: [_doc, doc]"
},
"status": 400
}
You can not have more than one type of document in Elasticsearch 6.0.0+. If you set your document type to doc, then you can add another document by simply PUT http://123.com:5101/core_060619/doc/1, PUT http://123.com:5101/core_060619/doc/2 etc.
Elasticsearch 6.+
PUT core_060619/
{
"mappings": {
"doc": { //type of documents in index is 'doc'
"properties": {
"date": {
"type": "date",
"format": "HH:mm yyyy-MM-dd"
},
"data": {
"type": "integer"
}
}
}
}
}
Since we created mapping to have doc type of documents, now we can add new documents by simply adding /doc/_id:
PUT core_060619/doc/1
{
"test" : [ {
"data" : "119050300",
"date" : "00:00 2019-06-03"
} ]
}
PUT core_060619/doc/2
{
"test" : [ {
"data" : "111120300",
"date" : "10:15 2019-06-02"
} ]
}
Elasticsearch 7.+
Types are removed, but you can use custom like field(s):
PUT twitter
{
"mappings": {
"_doc": {
"properties": {
"type": { "type": "keyword" },
"name": { "type": "text" },
"user_name": { "type": "keyword" },
"email": { "type": "keyword" },
"content": { "type": "text" },
"tweeted_at": { "type": "date" }
}
}
}
}
PUT twitter/_doc/user-kimchy
{
"type": "user",
"name": "Shay Banon",
"user_name": "kimchy",
"email": "shay#kimchy.com"
}
PUT twitter/_doc/tweet-1
{
"type": "tweet",
"user_name": "kimchy",
"tweeted_at": "2017-10-24T09:00:00Z",
"content": "Types are going away"
}
GET twitter/_search
{
"query": {
"bool": {
"must": {
"match": {
"user_name": "kimchy"
}
},
"filter": {
"match": {
"type": "tweet"
}
}
}
}
}
Removal of mapping types

How I can get the distinct result?

What I am trying to do is the query to elastic search (ver 6.4), to get the unique search result (named eids). I made a query as below. What I'd like to do is first text search from both 2 fields called eLabel and pLabel, and get the distinct result called eid. But actually the result is not aggregated, showing redundant ids from 0 to over 20. How I can adjust the query?
{
"query": {
"multi_match": {
"query": "Brazil Capital",
"fields": [
"eLabel",
"pLabel"
]
}
},
"size": 200,
"_source": [
"eid",
"eLabel"
],
"aggs": {
"eids": {
"terms": {
"field": "eid"
}
}
}
}
my current mappings are as follows.
eid : id of entity
eLabel: entity label (ex, Brazil)
prop_id: property id of the entity (eid)
pLabel: the label of the property (ex, is the capital of, is located at ...)
"mappings": {
"entity": {
"properties": {
"eLabel": {
"type": "text" ,
"index_options": "docs" ,
"analyzer": "my_analyzer"
} ,
"eid": {
"type": "keyword"
} ,
"subclass": {
"type": "boolean"
} ,
"pLabel": {
"type": "text" ,
"index_options": "docs" ,
"analyzer": "my_analyzer"
} ,
"prop_id": {
"type": "keyword"
} ,
"pType": {
"type": "keyword"
} ,
"way": {
"type": "keyword"
} ,
"chain": {
"type": "integer"
} ,
"siteKey": {
"type": "keyword"
},
"version": {
"type": "integer"
},
"docId": {
"type": "integer"
}
}
}
}
Based on your comment, you can make use of the below query using Bool. Don't think anything is wrong with aggregation query, just replace the query you have with the bool query I've mentioned and I think it would suffice.
When you make use of multi_match query, it would retrieve even if the document has eLabel = "Rio is capital of brazil" & pLabel = "something else entirely here"
POST <your_index_name>/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"eLabel": "capital"
}
},
{
"match": {
"pLabel": "brazil"
}
}
]
}
},
"size": 200,
"_source": [
"eid",
"eLabel"
],
"aggs": {
"eids": {
"terms": {
"field": "eid"
}
}
}
}
Note that if you only want the values of eid and do not want the documents, you can set "size":0 in the above query. That way you'd only have aggregation results returned.
Let me know if this helps!!

Filtering Nested Objects in Elastic Search

I have an index which has the following mapping
{
"mappings": {
"data": {
"date_detection": false,
"_all": {
"enabled": false
},
"properties": {
"DocumentId": {
"type": "string"
},
"SubscriptionId": {
"type": "long"
},
"AccountId": {
"type": "long"
},
"SubscriptionStartDateTime": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
"SubscriptionEndDateTime": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
"DateWiseMetrics": {
"type": "nested",
"properties": {
"Date": {
"type": "date",
"format": "yyyy-MM-dd"
},
"Usage": {
"type": "long"
}
}
}
}
}
}
}
What I'm trying to do is
For a given account id, start date, and end date find all active subscriptions with in that period
Find the usage for that given period for all the active subscriptions found in step 1.
For example,
If start date = "2017-08-01 00:00:00" and end date = "2017-09-01 00:00:00", i will find all active subscriptions in that period and then i want to see the daily usage of those subscriptions from 2017-08-01 till 2017-09-01.
I was able to get the active subscriptions (Step 1) correct, but not getting correct values when i filter further by Date in the nested object (Step 2).
{
"_source" : false,
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{ "term" : { "AccountId" : 290804220029 } },
{
"nested": {
"path": "DateWiseMetrics",
"filter":
{ "range" : { "DateWiseMetrics.Date": { "gte": "2017-01-01", "lte": "2017-08-01" } }
}, "inner_hits" : {}
}
}
],
"must_not" : [
{ "range" : { "SubscriptionStartDateTime": { "gt": "2017-08-01 00:00:00" } }},
{ "range" : { "SubscriptionEndDateTime": { "lt": "2017-01-01 00:00:00" } }}
]
}
}
}
}
}
I think the way i am filtering nested objects is wrong. Please help.
Also, is there a way to show values of few parent items ( for eg: accountid, subscriptionid) along with inner hit results? What i observed is that when you add "_source" : false the parent elements don't appear anymore.
Thanks

ElasticSearch: search inside the array of objects

I have a problem with querying objects in array.
Let's create very simple index, add a type with one field and add one document with array of objects (I use sense console):
PUT /test/
PUT /test/test/_mapping
{
"test": {
"properties": {
"parent": {"type": "object"}
}
}
}
POST /test/test
{
"parent": [
{
"name": "turkey",
"label": "Turkey"
},
{
"name": "turkey,mugla-province",
"label": "Mugla (province)"
}
]
}
Now I want to search by both names "turkey" and "turkey,mugla-province" . The first query works fine:
GET /test/test/_search {"query":{ "term": {"parent.name": "turkey"}}}
But the second one returns nothing:
GET /test/test/_search {"query":{ "term": {"parent.name": "turkey,mugla-province"}}}
I tried a lot of stuff including:
"parent": {
"type": "nested",
"include_in_parent": true,
"properties": {
"label": {
"type": "string",
"index": "not_analyzed"
},
"name": {
"type": "string",
"store": true
}
}
}
But nothing helps. What do I miss?
Here's one way you can do it, using nested docs:
I defined an index like this:
PUT /test_index
{
"mappings": {
"doc": {
"properties": {
"parent": {
"type": "nested",
"properties": {
"label": {
"type": "string"
},
"name": {
"type": "string"
}
}
}
}
}
}
}
Indexed your document:
PUT /test_index/doc/1
{
"parent": [
{
"name": "turkey",
"label": "Turkey"
},
{
"name": "turkey,mugla-province",
"label": "Mugla (province)"
}
]
}
Then either of these queries will return it:
POST /test_index/_search
{
"query": {
"nested": {
"path": "parent",
"query": {
"match": {
"parent.name": "turkey"
}
}
}
}
}
POST /test_index/_search
{
"query": {
"nested": {
"path": "parent",
"query": {
"match": {
"parent.name": "turkey,mugla-province"
}
}
}
}
}
Here's the code I used:
http://sense.qbox.io/gist/6258f8c9ee64878a1835b3e9ea2b54e5cf6b1d9e
For search multiple terms use the Terms query instead of Term query.
"terms" : {
"tags" : [ "turkey", "mugla-province" ],
"minimum_should_match" : 1
}
There are various ways to construct this query, but this is the simplest and most elegant in the current version of ElasticSearch (1.6)

Using a custom_score to sort by a nested child's timestamp

I'm pretty new to elasticsearch and have been banging my head trying to get this sorting to work. The general idea is to search email message threads with nested messages and nested participants. The goal is to display search results at the thread level, sorting by the participant who is doing the search and either the last_received_at or last_sent_at column depending on which mailbox they are in.
My understanding is that you can't sort by a single child's value among many nested children. So in order to do this I saw a couple of suggestions for using a custom_score with a script, then sorting on the score. My plan is to dynamically change the sort column and then run a nested custom_score query that will return the date of one of the participants as the score. I've been noticing some issues with both the score format being strange (eg. always has 4 zeros at the end) and it may not be returning the date that I was expecting.
Below are simplified versions of the index and the query in question. If anyone has any suggestions, I'd be very grateful. (FYI - I am using elasticsearch version 0.20.6.)
Index:
mappings: {
message_thread: {
properties: {
id: {
type: long
}
subject: {
dynamic: true
properties: {
id: {
type: long
}
name: {
type: string
}
}
}
participants: {
dynamic: true
properties: {
id: {
type: long
}
name: {
type: string
}
last_sent_at: {
format: dateOptionalTime
type: date
}
last_received_at: {
format: dateOptionalTime
type: date
}
}
}
messages: {
dynamic: true
properties: {
sender: {
dynamic: true
properties: {
id: {
type: long
}
}
}
id: {
type: long
}
body: {
type: string
}
created_at: {
format: dateOptionalTime
type: date
}
recipient: {
dynamic: true
properties: {
id: {
type: long
}
}
}
}
}
version: {
type: long
}
}
}
}
Query:
{
"query": {
"bool": {
"must": [
{
"term": { "participants.id": 3785 }
},
{
"custom_score": {
"query": {
"filtered": {
"query": { "match_all": {} },
"filter": {
"term": { "participants.id": 3785 }
}
}
},
"params": { "sort_column": "participants.last_received_at" },
"script": "doc[sort_column].value"
}
}
]
}
},
"filter": {
"bool": {
"must": [
{
"term": { "messages.recipient.id": 3785 }
}
]
}
},
"sort": [ "_score" ]
}
Solution:
Thanks to #imotov, here is the final result. The participants were not properly nested in the index (while the messages didn't need to be). In addition, include_in_root was used for the participants to simplify the query (participants are small records and not a real size issue, although #imotov also provided an example without it). He then restructured the JSON request to use a dis_max query.
curl -XDELETE "localhost:9200/test-idx"
curl -XPUT "localhost:9200/test-idx" -d '{
"mappings": {
"message_thread": {
"properties": {
"id": {
"type": "long"
},
"messages": {
"properties": {
"body": {
"type": "string",
"analyzer": "standard"
},
"created_at": {
"type": "date",
"format": "yyyy-MM-dd'\''T'\''HH:mm:ss'\''Z'\''"
},
"id": {
"type": "long"
},
"recipient": {
"dynamic": "true",
"properties": {
"id": {
"type": "long"
}
}
},
"sender": {
"dynamic": "true",
"properties": {
"id": {
"type": "long"
}
}
}
}
},
"messages_count": {
"type": "long"
},
"participants": {
"type": "nested",
"include_in_root": true,
"properties": {
"id": {
"type": "long"
},
"last_received_at": {
"type": "date",
"format": "yyyy-MM-dd'\''T'\''HH:mm:ss'\''Z'\''"
},
"last_sent_at": {
"type": "date",
"format": "yyyy-MM-dd'\''T'\''HH:mm:ss'\''Z'\''"
},
"name": {
"type": "string",
"analyzer": "standard"
}
}
},
"subject": {
"properties": {
"id": {
"type": "long"
},
"name": {
"type": "string"
}
}
}
}
}
}
}'
curl -XPUT "localhost:9200/test-idx/message_thread/1" -d '{
"id" : 1,
"subject" : {"name": "Test Thread"},
"participants" : [
{"id" : 87793, "name" : "John Smith", "last_received_at" : null, "last_sent_at" : "2010-10-27T17:26:58Z"},
{"id" : 3785, "name" : "David Jones", "last_received_at" : "2010-10-27T17:26:58Z", "last_sent_at" : null}
],
"messages" : [{
"id" : 1,
"body" : "This is a test.",
"sender" : { "id" : 87793 },
"recipient" : { "id" : 3785},
"created_at" : "2010-10-27T17:26:58Z"
}]
}'
curl -XPUT "localhost:9200/test-idx/message_thread/2" -d '{
"id" : 2,
"subject" : {"name": "Elastic"},
"participants" : [
{"id" : 57834, "name" : "Paul Johnson", "last_received_at" : "2010-11-25T17:26:58Z", "last_sent_at" : "2010-10-25T17:26:58Z"},
{"id" : 3785, "name" : "David Jones", "last_received_at" : "2010-10-25T17:26:58Z", "last_sent_at" : "2010-11-25T17:26:58Z"}
],
"messages" : [{
"id" : 2,
"body" : "More testing of elasticsearch.",
"sender" : { "id" : 57834 },
"recipient" : { "id" : 3785},
"created_at" : "2010-10-25T17:26:58Z"
},{
"id" : 3,
"body" : "Reply message.",
"sender" : { "id" : 3785 },
"recipient" : { "id" : 57834},
"created_at" : "2010-11-25T17:26:58Z"
}]
}'
curl -XPOST localhost:9200/test-idx/_refresh
echo
# Using include in root
curl "localhost:9200/test-idx/message_thread/_search?pretty=true" -d '{
"query": {
"filtered": {
"query": {
"nested": {
"path": "participants",
"score_mode": "max",
"query": {
"custom_score": {
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"term": {
"participants.id": 3785
}
}
}
},
"params": {
"sort_column": "participants.last_received_at"
},
"script": "doc[sort_column].value"
}
}
}
},
"filter": {
"query": {
"multi_match": {
"query": "test",
"fields": ["subject.name", "participants.name", "messages.body"],
"operator": "and",
"use_dis_max": true
}
}
}
}
},
"sort": ["_score"],
"fields": []
}
'
# Not using include in root
curl "localhost:9200/test-idx/message_thread/_search?pretty=true" -d '{
"query": {
"filtered": {
"query": {
"nested": {
"path": "participants",
"score_mode": "max",
"query": {
"custom_score": {
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"term": {
"participants.id": 3785
}
}
}
},
"params": {
"sort_column": "participants.last_received_at"
},
"script": "doc[sort_column].value"
}
}
}
},
"filter": {
"query": {
"bool": {
"should": [{
"match": {
"subject.name":"test"
}
}, {
"nested" : {
"path": "participants",
"query": {
"match": {
"name":"test"
}
}
}
}, {
"match": {
"messages.body":"test"
}
}
]
}
}
}
}
},
"sort": ["_score"],
"fields": []
}
'
There are a couple of issues here. You are asking about nested objects, but participants are not defined in your mapping as nested objects. The second possible issue is that score has type float, so it might not have enough precision to represent timestamp as is. If you can figure out how to fit this value into float, you can take a look at this example: Elastic search - tagging strength (nested/child document boosting). However, if you are developing a new system, it might be prudent to upgrade to 0.90.0.Beta1, which supports sorting on nested fields.

Resources