Children are not mapping properly to parents in Elasticsearch

"chods": {
"mappings": {
"chod": {
"properties": {
"state": {
"type": "text"
}
}
},
"chods": {},
"variant": {
"_parent": {
"type": "chod"
},
"_routing": {
"required": true
},
"properties": {
"percentage": {
"type": "double"
}
}
}
}
},
When I execute:
PUT /chods/variant/565?parent=36442
{ // some data }
It returns:
{
  "_index": "chods",
  "_type": "variant",
  "_id": "565",
  "_version": 6,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "created": false
}
But when I run this query:
GET /chods/variant/565?parent=36442
It returns the variant with parent=36443:
{
  "_index": "chods",
  "_type": "variant",
  "_id": "565",
  "_version": 7,
  "_routing": "36443",
  "_parent": "36443",
  "found": true,
  "_source": {
    ...
  }
}
Why does it return parent 36443 and not 36442?

When I tried to reproduce this with your steps, I got the expected result (_parent=36442). I noticed that after your PUT of the document with "_parent": "36442" the output is "_version": 6, while your GET of the document returns "_version": 7. Is it possible that you posted another version of the document in between?
I also noticed that GET /chods/variant/565?parent=36443 would not actually filter by the parent id - the query parameter is disregarded. If you actually want to filter by parent id, this is the query you're looking for:
GET /chods/_search
{
  "query": {
    "parent_id": {
      "type": "variant",
      "id": "36442"
    }
  }
}
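For reference, a minimal sketch of running that same query from Python (the elasticsearch-py client and the localhost URL are my assumptions, not part of the original answer):

# Minimal sketch, assuming a local cluster and a pre-8.x elasticsearch-py client.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# The parent_id query takes the *child* type and the parent's id.
resp = es.search(index="chods", body={
    "query": {
        "parent_id": {
            "type": "variant",
            "id": "36442"
        }
    }
})
for hit in resp["hits"]["hits"]:
    print(hit["_id"])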

As @fylie pointed out, the main problem is that if you reuse the same document id, your document gets overwritten by the last version - sort of.
Let's say we have an index /tests and a type "a" which is a child of type "test", and we run the following commands:
PUT /tests/a/50?parent=25
{
  "item": "C"
}

PUT /tests/a/50?parent=26
{
  "item": "D"
}

PUT /tests/a/50?parent=50
{
  "item": "E",
  "item2": "F"
}
What will the result be? Well, it can result in creating one to three documents.
If all three requests route to the same shard, you will end up with one document with three versions.
If they route to three different shards, you will end up with three new documents. A sketch of this is shown below.
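Here is a rough sketch of those three calls from Python (assuming an ES 5.x-style parent/child setup, where the parent id doubles as the routing key; the client and URL are illustrative):

# Sketch, assuming ES 5.x parent/child and a matching elasticsearch-py client.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Each request is routed by its parent id, so the same child id "50"
# can land on one shard or on three, depending on how the parent ids hash.
es.index(index="tests", doc_type="a", id="50", parent="25", body={"item": "C"})
es.index(index="tests", doc_type="a", id="50", parent="26", body={"item": "D"})
es.index(index="tests", doc_type="a", id="50", parent="50", body={"item": "E", "item2": "F"})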

Related

How to filter match in top 3 - elasticsearch?

I have the following data in Elasticsearch:
[
  {
    "_index": "media",
    "_type": "information",
    "_id": "6838",
    "_source": {
      "demographics_countries": {
        "AE": 0.17543859649122806,
        "CA": 0.013157894736842105,
        "FR": 0.017543859649122806,
        "GB": 0.043859649122807015,
        "IT": 0.02631578947368421,
        "LB": 0.013157894736842105,
        "SA": 0.49122807017543857,
        "TR": 0.017543859649122806,
        "US": 0.09210526315789472
      }
    }
  },
  {
    "_index": "media",
    "_type": "information",
    "_id": "57696",
    "_source": {
      "demographics_countries": {
        "TN": 0.8125,
        "MA": 0.034375,
        "DZ": 0.032812,
        "FR": 0.0125,
        "EG": 0.0125,
        "IN": 0.009375,
        "SA": 0.009375
      }
    }
  }
]
Expected result:
Find the documents where a specific country, SA (Saudi Arabia), is among the top 3 in demographics_countries.
For example:
"_id": "6838" (the first document) is matched because SA (Saudi Arabia) is among the top 3 in its demographics_countries.
What I tried: I have tried filtering using top_hits, but it's not working as expected.
Any suggestion would be appreciated.
With the current data model it's quite difficult to do that. What I'd suggest might not be the easiest way to do it, but it will definitely be the fastest to query eventually.
I'd suggest remodelling your documents to already include the top countries:
[
  {
    "_index": "media",
    "_type": "information",
    "_id": "6838",
    "_source": {
      "top_demographics_countries": ["TN", "MA", "DZ"],
      "demographics_countries": {
        "AE": 0.17543859649122806,
        "CA": 0.013157894736842105,
        "FR": 0.017543859649122806,
        "GB": 0.043859649122807015,
        "IT": 0.02631578947368421,
        "LB": 0.013157894736842105,
        "SA": 0.49122807017543857,
        "TR": 0.017543859649122806,
        "US": 0.09210526315789472
      }
    }
  },
  {
    "_index": "media",
    "_type": "information",
    "_id": "57696",
    "_source": {
      "top_demographics_countries": ["TN", "MA", "DZ"],
      "demographics_countries": {
        "TN": 0.8125,
        "MA": 0.034375,
        "DZ": 0.032812,
        "FR": 0.0125,
        "EG": 0.0125,
        "IN": 0.009375,
        "SA": 0.009375
      }
    }
  }
]
Ignore the values I've picked for top_demographics_countries. With this kind of approach you can always precalculate the top countries, and then use a simple term query to check whether a document contains that value or not:
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "top_demographics_countries": "SA"
        }
      }
    }
  }
}
It's going to be cheaper to compute them once during saving compared to always building that clause dynamically.
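As a sketch of the index-time side (the Python client calls here are my assumption; the index and field names follow the question), precomputing top_demographics_countries before indexing could look like this:

# Sketch: precompute the top-3 countries at write time so the query side
# stays a cheap term filter. Client, URL and doc values are illustrative.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

doc = {
    "demographics_countries": {
        "SA": 0.49122807017543857,
        "AE": 0.17543859649122806,
        "US": 0.09210526315789472,
        "GB": 0.043859649122807015
    }
}

# Sort country codes by their share, descending, and keep the first three.
shares = doc["demographics_countries"]
doc["top_demographics_countries"] = sorted(shares, key=shares.get, reverse=True)[:3]

es.index(index="media", doc_type="information", id="6838", body=doc)

Remember to recompute the field whenever demographics_countries changes, otherwise the term filter will go stale.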
@Evaldas is right -- it's better to extract the top 3 beforehand.
But if you can't help yourself and feel compelled to use Java/Painless, here's one approach:
{
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "demographics_countries.SA"
          }
        },
        {
          "script": {
            "script": {
              "source": """
                def tuple_list = new ArrayList();
                for (def c : params.all_countries) {
                  def key = 'demographics_countries.' + c;
                  if (!doc.containsKey(key) || doc[key].size() == 0) {
                    continue;
                  }
                  def val = doc[key].value;
                  tuple_list.add([c, val]);
                }
                // sort the tuple list by the country values, descending
                Collections.sort(tuple_list, (arr1, arr2) -> arr1[1] < arr2[1] ? 1 : -1);
                // slice & take only the top 3 (guard against lists shorter than 3)
                def top_n = Math.min(3, tuple_list.size());
                def top_3_countries = tuple_list.subList(0, top_n).stream().map(arr -> arr[0]).collect(Collectors.toList());
                return top_3_countries.size() >= 3 && top_3_countries.contains(params.country_of_interest);
              """,
              "params": {
                "country_of_interest": "SA",
                "all_countries": [
                  "AE", "CA", "FR", "GB", "IT", "LB", "SA",
                  "TR", "US", "TN", "MA", "DZ", "EG", "IN"
                ]
              }
            }
          }
        }
      ]
    }
  }
}

Elasticsearch Average time difference Aggregate Query

I have documents in Elasticsearch, each of which looks something like the following:
{
  "id": "T12890ADSA12",
  "status": "ENDED",
  "type": "SAMPLE",
  "updatedAt": "2020-05-29T18:18:08.483Z",
  "events": [
    {
      "event": "STARTED",
      "version": 1,
      "timestamp": "2020-04-30T13:41:25.862Z"
    },
    {
      "event": "INPROGRESS",
      "version": 2,
      "timestamp": "2020-05-14T17:03:09.137Z"
    },
    {
      "event": "INPROGRESS",
      "version": 3,
      "timestamp": "2020-05-17T17:03:09.137Z"
    },
    {
      "event": "ENDED",
      "version": 4,
      "timestamp": "2020-05-29T18:18:08.483Z"
    }
  ],
  "createdAt": "2020-04-30T13:41:25.862Z"
}
Now I want to write an Elasticsearch query that gets all documents of type "SAMPLE" and computes the average time between STARTED and ENDED across those documents, e.g. avg of (2020-05-29T18:18:08.483Z - 2020-04-30T13:41:25.862Z, ...). Assume that the STARTED and ENDED events are each present only once in the events array. Is there any way I can do that?
You can do something like this. The query selects the documents of type SAMPLE with status ENDED (to make sure there is an ENDED event). The avg aggregation then uses scripting to pick out the STARTED and ENDED timestamps and subtract them, returning the number of days:
POST test/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "status.keyword": "ENDED"
          }
        },
        {
          "term": {
            "type.keyword": "SAMPLE"
          }
        }
      ]
    }
  },
  "aggs": {
    "duration": {
      "avg": {
        "script": "Map findEvent(List events, String type) {return events.find(it -> it.event == type);} def started = Instant.parse(findEvent(params._source.events, 'STARTED').timestamp); def ended = Instant.parse(findEvent(params._source.events, 'ENDED').timestamp); return ChronoUnit.DAYS.between(started, ended);"
      }
    }
  }
}
The script looks like this:
Map findEvent(List events, String type) {
  return events.find(it -> it.event == type);
}

def started = Instant.parse(findEvent(params._source.events, 'STARTED').timestamp);
def ended = Instant.parse(findEvent(params._source.events, 'ENDED').timestamp);
return ChronoUnit.DAYS.between(started, ended);
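As a usage sketch (the index name "test" follows the answer's POST; the elasticsearch-py call and URL are my assumptions), running the aggregation and reading out the result could look like this:

# Sketch: run the answer's query from Python and read the average
# duration in days. Client setup is an assumption.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

script = (
    "Map findEvent(List events, String type) {"
    " return events.find(it -> it.event == type); }"
    " def started = Instant.parse(findEvent(params._source.events, 'STARTED').timestamp);"
    " def ended = Instant.parse(findEvent(params._source.events, 'ENDED').timestamp);"
    " return ChronoUnit.DAYS.between(started, ended);"
)

resp = es.search(index="test", body={
    "size": 0,
    "query": {"bool": {"filter": [
        {"term": {"status.keyword": "ENDED"}},
        {"term": {"type.keyword": "SAMPLE"}}
    ]}},
    "aggs": {"duration": {"avg": {"script": script}}}
})
print(resp["aggregations"]["duration"]["value"])  # average in days

If you need finer granularity than whole days, ChronoUnit.HOURS or ChronoUnit.MILLIS can be substituted in the script.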

Mapping exception while executing Rollup job

I have a Kibana instance which stores log data from our Java apps in daily indexes, like logstash-java-beats-2019.09.01. Since the number of indexes could get pretty big in the future, I want to create a rollup job to be able to archive old logs in a separate index, something like logstash-java-beats-rollup. A typical document in the logstash-java-beats-2019.09.01 index looks like this:
{
  "_index": "logstash-java-beats-2019.10.01",
  "_type": "_doc",
  "_id": "C9mfhG0Bf_Fr5GBl6kTg",
  "_version": 1,
  "_score": 1,
  "_source": {
    "@timestamp": "2019-10-01T00:02:13.756Z",
    "ecs": {
      "version": "1.0.0"
    },
    "event_timestamp": "2019-10-01 00:02:13,756",
    "log": {
      "offset": 5729359,
      "file": {
        "path": "/var/log/application-name/application.log"
      }
    },
    "tags": [
      "service-name",
      "location",
      "beats_input_codec_plain_applied"
    ],
    "loglevel": "WARN",
    "java_class": "java.class.name",
    "message": "Log message here",
    "host": {
      "name": "host-name-beat"
    },
    "@version": "1",
    "agent": {
      "hostname": "host-name",
      "id": "a34af368-3359-495a-9775-63502693d148",
      "ephemeral_id": "cc4afd3c-ad97-47a4-bd21-72255d450232",
      "type": "filebeat",
      "version": "7.2.0",
      "name": "host-name-beat"
    },
    "input": {
      "type": "log"
    }
  }
}
So I created a rollup job with the following config:
{
  "config": {
    "id": "Test 2 job",
    "index_pattern": "logstash-java-beats-2*",
    "rollup_index": "logstash-java-beats-rollup",
    "cron": "0 0 * * * ?",
    "groups": {
      "date_histogram": {
        "fixed_interval": "1000ms",
        "field": "@timestamp",
        "delay": "1d",
        "time_zone": "UTC"
      }
    },
    "metrics": [],
    "timeout": "20s",
    "page_size": 1000
  },
  "status": {
    "job_state": "stopped",
    "current_position": {
      "@timestamp.date_histogram": 1567933199000
    },
    "upgraded_doc_id": true
  },
  "stats": {
    "pages_processed": 1840,
    "documents_processed": 5322525,
    "rollups_indexed": 1838383,
    "trigger_count": 1,
    "index_time_in_ms": 1555018,
    "index_total": 1839,
    "index_failures": 0,
    "search_time_in_ms": 59059,
    "search_total": 1840,
    "search_failures": 0
  }
}
but it fails to roll up the data with this exception:
Error while attempting to bulk index documents: failure in bulk execution:
[0]: index [logstash-java-beats-rollup], type [_doc], id [Test 2 job$GTvyIZtPhKqi-dtfVd6MXg], message [MapperParsingException[Could not dynamically add mapping for field [@timestamp.date_histogram.time_zone]. Existing mapping for [@timestamp] must be of type object but found [date].]]
[1]: index [logstash-java-beats-rollup], type [_doc], id [Test 2 job$v-r89eEpLvImr0lWIrOb_Q], message [MapperParsingException[Could not dynamically add mapping for field [@timestamp.date_histogram.time_zone]. Existing mapping for [@timestamp] must be of type object but found [date].]]
[2]: index [logstash-java-beats-rollup], type [_doc], id [Test 2 job$quCHwZP1iVU_Bs2fmhgSjQ], message [MapperParsingException[Could not dynamically add mapping for field [@timestamp.date_histogram.time_zone]. Existing mapping for [@timestamp] must be of type object but found [date].]]
...
The logstash-java-beats-rollup index is empty, even though some stats for the rollup job are available.
I'm using Elasticsearch v7.2.0.
Could you please explain what is wrong with the data, or with the rollup job configuration?

Get data by Popularity/Rating using Elasticsearch

I am trying to do a GET request in Elasticsearch which needs to return the data according to its popularity/rating, so I followed this link. I set the rating of my items using the API below:
http://localhost:9200/cars/car/_rank_eval
The above is the API used to create the _rank_eval request from Postman. Below is my body content:
{
  "requests": [
    {
      "id": "horsepower",
      "request": {
        "query": {
          "match": {
            "horsepower": "68"
          }
        }
      },
      "ratings": [
        {
          "_index": "cars",
          "_id": "25",
          "rating": 10
        },
        {
          "_index": "cars",
          "_id": "119",
          "rating": 1
        },
        {
          "_index": "cars",
          "_id": "52",
          "rating": 2
        }
      ]
    }
  ],
  "metric": {
    "precision": {
      "relevant_rating_threshold": 1,
      "ignore_unlabeled": false
    }
  }
}
Steps I did so far:
1. Created a database with dummy data in MySQL.
2. Transferred my DB data to Elasticsearch using Logstash.
3. Set some ratings on my data.
So what are the next steps? Did I miss anything? I need some clarification/help on this.

Elastic Search existing field mapping

I am new to Elasticsearch. I have created an index "cmn" with a type "mentions". I am trying to import data from my existing Solr into Elasticsearch, so I want to map an existing field to the _id field.
I have created the following file under /config/mappings/cmn/:
{
  "mappings": {
    "mentions": {
      "_id": {
        "path": "docKey"
      }
    }
  }
}
But this doesn't seem to be working; every time I index a record, an _id like the following is generated:
"_index": "cmn",
"_type": "mentions",
"_id": "k4E0dJr6Re2Z39HAIjYMmg",
"_score": 1
Also, the mapping is not reflected. I have also tried the following option:
{
  "mappings": {
    "_id": {
      "path": "docKey"
    }
  }
}
SAMPLE DOCUMENT: Basically a tweet.
{
  "usrCreatedDate": "2012-01-24 21:34:47",
  "sex": "U",
  "listedCnt": 2,
  "follCnt": 432,
  "state": "Southampton",
  "classified": 0,
  "favCnt": 468,
  "timeZone": "Casablanca",
  "twitterId": 473333038,
  "lang": "en",
  "stnostem": "#ootd #ootw #fashion #styling #photography #white #pink #playsuit #prada #sunny #spring http://t.co/YbPFrXlpuh",
  "sourceId": "tw",
  "timestamp": "2014-04-09T22:58:00.396Z",
  "sentiment": 0,
  "updatedOnGMTDate": "2014-04-09T22:56:57.000Z",
  "userLocation": "Southampton",
  "age": 0,
  "priorityScore": 57.4700012207031,
  "statusCnt": 14612,
  "name": "YazzyK",
  "profilePicUrl": "http://pbs.twimg.com/profile_images/453578494556270594/orsA0pKi_normal.jpeg",
  "mentions": "",
  "sourceStripped": "Instagram",
  "collectionName": "STREAMING",
  "tags": "557/161/193/197",
  "msgid": 1397084280396.33,
  "_version_": 1464949081784713200,
  "url2": "{\"urls\":[{\"url\":\"http://t.co/YbPFrXlpuh\",\"expandedURL\":\"http://instagram.com/p/mliZbgxVZm/\",\"displayURL\":\"instagram.com/p/mliZbgxVZm/\",\"start\":88,\"end\":110}]}",
  "links": "http://t.co/YbPFrXlpuh",
  "retweetedStatus": "",
  "twtScreenName": "YazKader",
  "postId": "454030232501358592",
  "country": "Bermuda",
  "message": "#ootd #ootw #fashion #styling #photography #white #pink #playsuit #prada #sunny #spring http://t.co/YbPFrXlpuh",
  "source": "Instagram",
  "parentStatusId": -1,
  "bio": "Live and breathe Fashion. Persian and proud- Instagram: #Yazkader",
  "createdOnGMTDate": "2014-04-09T22:56:57.000Z",
  "searchText": "#ootd #ootw #fashion #styling #photography #white #pink #playsuit #prada #sunny #spring http://t.co/YbPFrXlpuh",
  "isFavorited": "False",
  "frenCnt": 214,
  "docKey": "tw_454030232501358592"
}
Also, how can we create a unique mapping for each "type" and not just for the index?
Thanks
Do it like this.
Put the mapping as:
PUT index_name/type_name/_mapping
{
  "type_name": {
    "_id": {
      "path": "docKey"
    },
    "properties": {
      "docKey": {
        "type": "string"
      }
    }
  }
}
And it will work: when you index a document with a docKey, the _id is set from it. You shouldn't have to provide the full mapping.
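As a usage sketch (note that the _id "path" setting only exists in Elasticsearch 1.x and was removed in 2.0; the Python client calls are my assumption, and the docKey value is taken from the sample document):

# Sketch, assuming Elasticsearch 1.x (_id.path was removed in 2.0)
# and a matching elasticsearch-py client.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Index without an explicit id; with the mapping above, _id should be
# taken from the docKey field of the document itself.
es.index(index="cmn", doc_type="mentions",
         body={"docKey": "tw_454030232501358592", "message": "..."})

doc = es.get(index="cmn", doc_type="mentions", id="tw_454030232501358592")
print(doc["_id"])  # tw_454030232501358592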
