Elastic Search existing field mapping - elasticsearch

I am new to elastic search, I have created an index "cmn" with a type "mention". I am trying to import data from my existing solr to elasticsearch, so I want to map an existing field to the _id field.
I have created the following file under /config/mappings/cmn/,
{
"mappings": {
"mentions":{
"_id" : {
"path" : "docKey"
}
}
}
}
But this doesn't seem to be working, every time I index a record the following _id is created,
"_index": "cmn",
"_type": "mentions",
"_id": "k4E0dJr6Re2Z39HAIjYMmg",
"_score": 1
Also, the mapping is not reflects. I have also tried the following option,
{
"mappings": {
"_id" : {
"path" : "docKey"
}
}
}
SAMPLE DOCUMENT: Basically a tweet.
{
"usrCreatedDate": "2012-01-24 21:34:47",
"sex": "U",
"listedCnt": 2,
"follCnt": 432,
"state": "Southampton",
"classified": 0,
"favCnt": 468,
"timeZone": "Casablanca",
"twitterId": 473333038,
"lang": "en",
"stnostem": "#ootd #ootw #fashion #styling #photography #white #pink #playsuit #prada #sunny #spring http://t.co/YbPFrXlpuh",
"sourceId": "tw",
"timestamp": "2014-04-09T22:58:00.396Z",
"sentiment": 0,
"updatedOnGMTDate": "2014-04-09T22:56:57.000Z",
"userLocation": "Southampton",
"age": 0,
"priorityScore": 57.4700012207031,
"statusCnt": 14612,
"name": "YazzyK",
"profilePicUrl": "http://pbs.twimg.com/profile_images/453578494556270594/orsA0pKi_normal.jpeg",
"mentions": "",
"sourceStripped": "Instagram",
"collectionName": "STREAMING",
"tags": "557/161/193/197",
"msgid": 1397084280396.33,
"_version_": 1464949081784713200,
"url2": "{\"urls\":[{\"url\":\"http://t.co/YbPFrXlpuh\",\"expandedURL\":\"http://instagram.com/p/mliZbgxVZm/\",\"displayURL\":\"instagram.com/p/mliZbgxVZm/\",\"start\":88,\"end\":110}]}",
"links": "http://t.co/YbPFrXlpuh",
"retweetedStatus": "",
"twtScreenName": "YazKader",
"postId": "454030232501358592",
"country": "Bermuda",
"message": "#ootd #ootw #fashion #styling #photography #white #pink #playsuit #prada #sunny #spring http://t.co/YbPFrXlpuh",
"source": "Instagram",
"parentStatusId": -1,
"bio": "Live and breathe Fashion. Persian and proud- Instagram: #Yazkader",
"createdOnGMTDate": "2014-04-09T22:56:57.000Z",
"searchText": "#ootd #ootw #fashion #styling #photography #white #pink #playsuit #prada #sunny #spring http://t.co/YbPFrXlpuh",
"isFavorited": "False",
"frenCnt": 214,
"docKey": "tw_454030232501358592"
}
Also, how can we create unique mapping for each "TYPE" and not just the index.
Thanks

Do like this,
Put the mapping as,
PUT index_name/type_name/_mapping
{
"type_name": {
"_id": {
"path": "docKey"
},
"properties": {
"docKey": {
"type": "string"
}
}
}
}
And, it will work. (When you index docKey, then _id is set). You shouldn't have to provide all the mapping.

Related

Mapping exception while executing Rollup job

I have a Kibana instance which stores log data from our java apps in per daily indexes, like logstash-java-beats-2019.09.01. As far as amount of indexes could be pretty big in future I want to create a rollup job, to be able to archive old logs in separate index, something like logstash-java-beats-rollup. Typical document in logstash-java-beats-2019.09.01 index looks like this:
{
"_index": "logstash-java-beats-2019.10.01",
"_type": "_doc",
"_id": "C9mfhG0Bf_Fr5GBl6kTg",
"_version": 1,
"_score": 1,
"_source": {
"#timestamp": "2019-10-01T00:02:13.756Z",
"ecs": {
"version": "1.0.0"
},
"event_timestamp": "2019-10-01 00:02:13,756",
"log": {
"offset": 5729359,
"file": {
"path": "/var/log/application-name/application.log"
}
},
"tags": [
"service-name",
"location",
"beats_input_codec_plain_applied"
],
"loglevel": "WARN",
"java_class": "java.class.name",
"message": "Log message here",
"host": {
"name": "host-name-beat"
},
"#version": "1",
"agent": {
"hostname": "host-name",
"id": "a34af368-3359-495a-9775-63502693d148",
"ephemeral_id": "cc4afd3c-ad97-47a4-bd21-72255d450232",
"type": "filebeat",
"version": "7.2.0",
"name": "host-name-beat"
},
"input": {
"type": "log"
}
}
}
So I created a rollup job with such config:
{
"config": {
"id": "Test 2 job",
"index_pattern": "logstash-java-beats-2*",
"rollup_index": "logstash-java-beats-rollup",
"cron": "0 0 * * * ?",
"groups": {
"date_histogram": {
"fixed_interval": "1000ms",
"field": "#timestamp",
"delay": "1d",
"time_zone": "UTC"
}
},
"metrics": [],
"timeout": "20s",
"page_size": 1000
},
"status": {
"job_state": "stopped",
"current_position": {
"#timestamp.date_histogram": 1567933199000
},
"upgraded_doc_id": true
},
"stats": {
"pages_processed": 1840,
"documents_processed": 5322525,
"rollups_indexed": 1838383,
"trigger_count": 1,
"index_time_in_ms": 1555018,
"index_total": 1839,
"index_failures": 0,
"search_time_in_ms": 59059,
"search_total": 1840,
"search_failures": 0
}
}
but it fails to rollup the data with such exception:
Error while attempting to bulk index documents: failure in bulk execution:
[0]: index [logstash-java-beats-rollup], type [_doc], id [Test 2 job$GTvyIZtPhKqi-dtfVd6MXg], message [MapperParsingException[Could not dynamically add mapping for field [#timestamp.date_histogram.time_zone]. Existing mapping for [#timestamp] must be of type object but found [date].]]
[1]: index [logstash-java-beats-rollup], type [_doc], id [Test 2 job$v-r89eEpLvImr0lWIrOb_Q], message [MapperParsingException[Could not dynamically add mapping for field [#timestamp.date_histogram.time_zone]. Existing mapping for [#timestamp] must be of type object but found [date].]]
[2]: index [logstash-java-beats-rollup], type [_doc], id [Test 2 job$quCHwZP1iVU_Bs2fmhgSjQ], message [MapperParsingException[Could not dynamically add mapping for field [#timestamp.date_histogram.time_zone]. Existing mapping for [#timestamp] must be of type object but found [date].]]
...
logstash-java-beats-rollup index is empty, even if there is some stats for the rollup job available.
I'm using elasticsearch v7.2.0
Could you please explain what is wrong with the data, or with the rollup job configuration?

Elastic serach record upsert with a complex _id field

I have to upsert bulk records in elastic search index with _id being combination of more than one field from the message. Can I do so. if that can be done then please give me a sample json for the same.
Regards
A sample _id field I am looking for some thing like below
{
"_index": "kpi_aggr",
"_type": "KPIBackChannel",
"_id": "<<<combination of name , period_type>>>",
"_score": 1,
"_source": {
"name": "kpi-v1",
"period_type": "w",
"country": "AL",
"pg_name": "DENTAL CARE",
"panel_type": "retail",
"number_of_records_with_proposal": 10000,
"number_of_proposals": 80000,
"overall_number_of_records": 2000,
"#timestamp": 1442162810
}
}
Naturally, you can specify your own Elasticsearch document ids during a call to the Index API:
PUT kpi_aggr/KPIBackChannel/kpi-v1,w
{
"name": "kpi-v1",
"period_type": "w",
"country": "AL",
"pg_name": "DENTAL CARE",
"panel_type": "retail",
"number_of_records_with_proposal": 10000,
"number_of_proposals": 80000,
"overall_number_of_records": 2000,
"#timestamp": 1442162810
}
You can also do so during a _bulk API call:
POST _bulk
{ "index" : { "_index" : "kpi_aggr", "_type" : "KPIBackChannel", "_id" : "kpi-v1,w" } }
{"name":"kpi-v1","period_type":"w","country":"AL","pg_name":"DENTAL CARE","panel_type":"retail","number_of_records_with_proposal":10000,"number_of_proposals":80000,"overall_number_of_records":2000,"#timestamp":1442162810}
Notice that Elasticsearch will replace the document with the new version.
If you execute these two queries on an empty index, then querying by document id:
GET kpi_aggr/KPIBackChannel/kpi-v1,w
will give you the following:
{
"_index": "kpi_aggr",
"_type": "KPIBackChannel",
"_id": "kpi-v1,w",
"_version": 2,
"found": true,
"_source": {
"name": "kpi-v1",
"period_type": "w",
"country": "AL",
"pg_name": "DENTAL CARE",
"panel_type": "retail",
"number_of_records_with_proposal": 10000,
"number_of_proposals": 80000,
"overall_number_of_records": 2000,
"#timestamp": 1442162810
}
}
Notice "_version": 2, which in our case indicates that a document has been indexed twice, hence performed an "upsert" (but in general is meant to be used for Optimistic Concurrency Control).
Hope that helps!

Get data by Popularity/Rating using Elasticsearcch

I am trying to do a get request using elasticsearch which needs to get the data with respect to its Popularity/ rating. So I followed this Link
. I set the rating of my item by using the below one,
#1 http://localhost:9200/cars/car/_rank_eval
above is the Api which is used to create _rank_eval using postman ,
Below is my Body content ,
{
"requests": [
{
"id": "horsepower",
"request": {
"query": {
"match": {
"horsepower": "68"
}
}
},
"ratings": [
{
"_index": "cars",
"_id": "25",
"rating": 10
},
{
"_index": "cars",
"_id": "119",
"rating": 1
},
{
"_index": "cars",
"_id": "52",
"rating": 2
}
]
}
],
"metric": {
"precision": {
"relevant_rating_threshold": 1,
"ignore_unlabeled": false
}
}
}
Steps i did so far ,
1 . Created database with dumb data's in MySql
2 . Transferred my db data's to ElasticSearch using loadstash
3 . Set some rating's to my data
so what are all the next steps , did i miss anything .. I need some clarification/help on this .

Children are not mapping properly in elastic to parents

"chods": {
"mappings": {
"chod": {
"properties": {
"state": {
"type": "text"
}
}
},
"chods": {},
"variant": {
"_parent": {
"type": "chod"
},
"_routing": {
"required": true
},
"properties": {
"percentage": {
"type": "double"
}
}
}
}
},
When I execute:
PUT /chods/variant/565?parent=36442
{ // some data }
It returns:
{
"_index":"chods",
"_type":"variant",
"_id":"565",
"_version":6,
"result":"updated",
"_shards":{
"total":2,
"successful":1,
"failed":0
},
"created":false
}
But when I run this query:
GET /chods/variant/565?parent=36442
It returns variant with parent=36443
{
"_index": "chods",
"_type": "variant",
"_id": "565",
"_version": 7,
"_routing": "36443",
"_parent": "36443",
"found": true,
"_source": {
...
}
}
Why it returns with parent 36443 and not 36442?
When I tried to reproduce this with your steps, I got the expected result (version=36442). I noticed that after your PUT of the document with "_parent": "36442" the output is "_version":6. In your GET of the document, "_version": 7 is returned. Is it possible that you posted another version of the document?
I also noticed that GET /chods/variant/565?parent=36443 would not actually filter by the parent id - the query parameter is disregarded. If you actually want to filter by parent id, this is the query you're looking for:
GET /chods/_search
{
"query": {
"parent_id": {
"type": "variant",
"id": "36442"
}
}
}
As #fylie pointed out the main problem is that if you use same id of the document you will get your document overridden by last version - sort of
Lets say that we have index /tests and type "a" which is child of type "test" and we do following commands:
PUT /tests/a/50?parent=25
{
"item": "C"
}
PUT /tests/a/50?parent=26
{
"item": "D"
}
PUT /tests/a/50?parent=50
{
"item": "E",
"item2": "F",
}
What the result will be? Well it can result in creating 1 - 3 documents.
If it will route to the same shard, you will end up with one document, which will have 3 versions.
If it will route to 3 different shards, you will end up with 3 new documents.

Modifying the content of a ingested pdf

I have a pipeline created in elasticsearch that ingests a document with an array of pdfs. I want to modify the content field in order to concatenate other fields at the end for searching purposes.
My pipeline is:
client.ingest.putPipeline({
id: 'my-pipeline-id',
body: {
"description" : "Extract attachment information",
"processors" :
[
{
"foreach": {
"field": "attachments",
"processor": {
"attachment": {
"target_field": "_ingest._value.attachment",
"field": "_ingest._value.data"
}
}
}
}
]
}
}, callback);
I can't add a set processor after the foreach, because I need to access the content of every pdf in order to put the values of that document at the end of the content.
Some example documents are:
let doc = {
matricula: '6789AAA',
bastidor: 'BASTIDOR789',
expediente: '79',
attachments:
[
{
filename: "informe",
data: /* chunk of data in base64 */
},
{
filename: "ivtm_diba",
data: /* another chunk of data in base64 */
}
]
};
The result document will look like this:
{
"_index": "doc",
"_type": "document",
"_id": "AVsy85rwMuPe74hQBT8L",
"_score": 1.2039728,
"_source": {
"attachments": [
{
"filename": "informe",
"attachment": {
"Very very long content",
"date": "2016-06-08T14:01:25Z",
"content_type": "application/pdf",
"language": "es",
"content_length": 3124
}
},
{
"filename": "ivtm_diba",
"attachment": {
"content": "Very long content here",
"content_type": "application/pdf",
"language": "ca",
"content_length": 5657
}
}
],
"expediente": "79",
"matricula": "6789ZXC",
"bastidor": "BASTIDOR789"
}
}
And I want to add to the content field the values of "bastidor", "matricula" and "expediente" fields.
I'm using elasticsearch-js but that's not a requirement.
The elasticsearch _all field can be used in most cases like this.

Resources