Reindexing Elasticsearch index with parent and child relationship

We currently have a 'message' that can have a link to a 'parent' message, e.g. a reply would have the original message as its parent_id.
PUT testm
{
  "mappings": {
    "message": {
      "properties": {
        "subject": {
          "type": "text"
        },
        "body": {
          "type": "text"
        },
        "parent_id": {
          "type": "long"
        }
      }
    }
  }
}
So far we haven't used an Elasticsearch parent/child join on these documents, as parent and child weren't allowed to be of the same type. Now, with 5.6 and Elastic's drive to get rid of types, we are trying to use the new parent/child join introduced in 5.6:
PUT testmnew
{
  "settings": {
    "mapping.single_type": true
  },
  "mappings": {
    "message": {
      "properties": {
        "subject": {
          "type": "text"
        },
        "body": {
          "type": "text"
        },
        "join_field": {
          "type": "join",
          "relations": {
            "parent_message": "child_message"
          }
        }
      }
    }
  }
}
I know I will have to create a new index for this and then reindex everything with _reindex but I am not quite sure how I would do that.
Indexing a parent_message is simple, and a reply (child message) carries the join_field and the parent's id as routing:
PUT localhost:9200/testm1/message/1
{
  "subject": "Message 1",
  "body": "body 1"
}

PUT localhost:9200/testm1/message/3?routing=1
{
  "subject": "Message Reply to 1",
  "body": "body 3",
  "join_field": {
    "name": "child_message",
    "parent": "1"
  }
}
A search would now return
{
  "_index": "testm1",
  "_type": "message",
  "_id": "2",
  "_score": 1,
  "_source": {
    "subject": "Message 2",
    "body": "body 2"
  }
},
{
  "_index": "testm1",
  "_type": "message",
  "_id": "1",
  "_score": 1,
  "_source": {
    "subject": "Message 1",
    "body": "body 1"
  }
},
{
  "_index": "testm1",
  "_type": "message",
  "_id": "3",
  "_score": 1,
  "_routing": "1",
  "_source": {
    "subject": "Message Reply to 1",
    "body": "body 3",
    "join_field": {
      "name": "child_message",
      "parent": "1"
    }
  }
}
I tried to create the new index (testmnew) and then just do a _reindex
POST _reindex
{
  "source": {
    "index": "testm"
  },
  "dest": {
    "index": "testmnew"
  },
  "script": {
    "inline": """
      ctx._routing = ctx._source.parent_id;
      // --> Missing: I guess I need to set the join_field here as well <--
    """
  }
}
The scripting part is still not quite clear to me, but am I on the right path here? Would I simply set the _routing on the messages (it would be null on parent messages)? And how would I set the join_field only for child messages?

This is the reindexing script that I used in the end:
curl -XPOST 'localhost:9200/_reindex' -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "testm"
  },
  "dest": {
    "index": "testmnew"
  },
  "script": {
    "lang": "painless",
    "source": "if (ctx._source.parent_id != null) { ctx._routing = ctx._source.parent_id; ctx._source.join_field = params.cjoin; ctx._source.join_field.parent = ctx._source.parent_id; } else { ctx._source.join_field = params.parent_join }",
    "params": {
      "cjoin": {
        "name": "child_message",
        "parent": 1
      },
      "parent_join": { "name": "parent_message" }
    }
  }
}
'
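Once the reindex finishes, a quick sanity check is a has_child query against the new index (a minimal sketch reusing the relation names from the mapping above); it should return only the parent messages that have at least one reply:
GET testmnew/_search
{
  "query": {
    "has_child": {
      "type": "child_message",
      "query": {
        "match_all": {}
      }
    }
  }
}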

Related

Elasticsearch: Adding an element to an array

I am trying to batch update documents in an Elasticsearch index, and I want to know how I can achieve this scenario:
I have to create a document if no document with that primary key exists.
I have to add the data to the document's array if the primary key does exist.
For example, for the initial write (primary key not present), the document written is:
{
  PrimaryKey,
  DataList: [
    {
      DataField1: fieldValue1,
      DataField2: fieldValue2
    }
  ]
}
If the document was already present, the entry would be appended to the list:
{
  PrimaryKey,
  DataList: [
    {
      DataField1: fieldValue1,
      DataField2: fieldValue2
    },
    {
      DataField1: fieldValue3,
      DataField2: fieldValue4
    }
    ....
  ]
}
In a batch update, both kinds of primary keys may be present: some whose documents already exist in the index, and some that were never added.
I think this example can serve as a basis for your bulk.
What I did was treat the _id and PrimaryKey as the same, because the way to know whether the document exists is through the _id; if it doesn't exist, a new document is created.
I used a script to add items to the list if the document already exists.
Read more about the Update API upsert parameter.
Mapping
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "PrimaryKey": {
        "type": "keyword"
      },
      "DataList": {
        "type": "nested"
      }
    }
  }
}
POST my-index-000001/_doc/1
{
  "PrimaryKey": "1",
  "DataList": [
    {
      "DataField1": "fieldValue1",
      "DataField2": "fieldValue2"
    }
  ]
}
The bulk below will add items to doc 1 and create the new document 2 (which does not yet exist in the index).
POST _bulk
{ "update" : { "_id" : "1", "_index" : "my-index-000001", "retry_on_conflict" : 3} }
{ "script" : { "source": "if (ctx._source.PrimaryKeyame != null) { ctx._source.DataList.addAll(params.DataList); }", "lang" : "painless", "params": { "PrimaryKeyame": "1", "DataList": [{"DataField1": "fieldValue3","DataField2": "fieldValue4"}]}}, "upsert" : {"PrimaryKeyame": "1", "DataList": [{"DataField1": "fieldValue3","DataField2": "fieldValue4"}]}}
{ "update" : { "_id" : "2", "_index" : "my-index-000001", "retry_on_conflict" : 3} }
{ "script" : { "source": "if (ctx._source.PrimaryKeyame != null) { ctx._source.DataList.addAll(params.DataList); }", "lang" : "painless", "params": { "PrimaryKeyame": "2", "DataList": [{"DataField1": "fieldValue3","DataField2": "fieldValue4"}]}}, "upsert" : {"PrimaryKeyame": "2", "DataList": [{"DataField1": "fieldValue3","DataField2": "fieldValue4"}]}}
Get Documents:
"hits": [
{
"_index": "my-index-000001",
"_id": "1",
"_score": 1,
"_source": {
"PrimaryKeyame": 1,
"DataList": [
{
"DataField1": "fieldValue1",
"DataField2": "fieldValue2"
},
{
"DataField2": "fieldValue4",
"DataField1": "fieldValue3"
}
]
}
},
{
"_index": "my-index-000001",
"_id": "2",
"_score": 1,
"_source": {
"PrimaryKeyame": "2",
"DataList": [
{
"DataField1": "fieldValue3",
"DataField2": "fieldValue4"
}
]
}
}
]
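For a single document, the same pattern works directly against the _update endpoint with its upsert parameter. A minimal sketch (the document id 3 and the fieldValue5/fieldValue6 values are made up for illustration):
POST my-index-000001/_update/3
{
  "script": {
    "source": "ctx._source.DataList.addAll(params.DataList)",
    "lang": "painless",
    "params": {
      "DataList": [ { "DataField1": "fieldValue5", "DataField2": "fieldValue6" } ]
    }
  },
  "upsert": {
    "PrimaryKey": "3",
    "DataList": [ { "DataField1": "fieldValue5", "DataField2": "fieldValue6" } ]
  }
}
If document 3 does not exist, the upsert body is indexed as-is; if it does exist, only the script runs.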

Filter Elasticsearch data when fields contain ~

I have a bunch of documents like the one below. I want to filter the data where projectKey starts with ~.
I read some articles saying that ~ is an operator in the Elasticsearch query syntax, so you cannot really filter on it directly.
Can someone help me form the search query for the /branch/_search API?
{
  "_index": "branch",
  "_type": "_doc",
  "_id": "GAz-inQBJWWbwa_v-l9e",
  "_version": 1,
  "_score": null,
  "_source": {
    "branchID": "refs/heads/feature/12345",
    "displayID": "feature/12345",
    "date": "2020-09-14T05:03:20.137Z",
    "projectKey": "~user",
    "repoKey": "deploy",
    "isDefaultBranch": false,
    "eventStatus": "CREATED",
    "user": "user"
  },
  "fields": {
    "date": [
      "2020-09-14T05:03:20.137Z"
    ]
  },
  "highlight": {
    "projectKey": [
      "~#kibana-highlighted-field#user#/kibana-highlighted-field#"
    ],
    "projectKey.keyword": [
      "#kibana-highlighted-field#~user#/kibana-highlighted-field#"
    ],
    "user": [
      "#kibana-highlighted-field#user#/kibana-highlighted-field#"
    ]
  },
  "sort": [
    1600059800137
  ]
}
UPDATE:
I used Prerana's answer below and added a prefix clause to my query.
Something is still wrong when I use prefix and range together; I get the error below. What am I missing?
GET /branch/_search
{
  "query": {
    "prefix": {
      "projectKey": "~"
    },
    "range": {
      "date": {
        "gte": "2020-09-14",
        "lte": "2020-09-14"
      }
    }
  }
}
{
  "error": {
    "root_cause": [
      {
        "type": "parsing_exception",
        "reason": "[prefix] malformed query, expected [END_OBJECT] but found [FIELD_NAME]",
        "line": 6,
        "col": 5
      }
    ],
    "type": "parsing_exception",
    "reason": "[prefix] malformed query, expected [END_OBJECT] but found [FIELD_NAME]",
    "line": 6,
    "col": 5
  },
  "status": 400
}
If I understood your issue correctly, I suggest creating a custom analyzer to search for the special character ~.
I did a test locally, replacing ~ with __SPECIAL__:
I created an index with a custom char_filter, along with an additional sub-field on the projectKey field. The name of the new multi-field is special_characters.
Here is the mapping:
PUT wildcard-index
{
  "settings": {
    "analysis": {
      "char_filter": {
        "special-characters-replacement": {
          "type": "mapping",
          "mappings": [
            "~ => __SPECIAL__"
          ]
        }
      },
      "analyzer": {
        "special-characters-analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "special-characters-replacement"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "projectKey": {
        "type": "text",
        "fields": {
          "special_characters": {
            "type": "text",
            "analyzer": "special-characters-analyzer"
          }
        }
      }
    }
  }
}
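Before ingesting anything, you can verify what the custom analyzer emits with the _analyze API; the ~ should come back as the __SPECIAL__ token:
GET wildcard-index/_analyze
{
  "analyzer": "special-characters-analyzer",
  "text": "content1 ~"
}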
Then I ingested the following contents in the index:
"projectKey": "content1 ~"
"projectKey": "This ~ is a content"
"projectKey": "~ cars on the road"
"projectKey": "o ~ngram"
Then, the query was:
GET wildcard-index/_search
{
  "query": {
    "match": {
      "projectKey.special_characters": "~"
    }
  }
}
The response was:
"hits" : [
{
"_index" : "wildcard-index",
"_type" : "_doc",
"_id" : "h1hKmHQBowpsxTkFD9IR",
"_score" : 0.43250346,
"_source" : {
"projectKey" : "content1 ~"
}
},
{
"_index" : "wildcard-index",
"_type" : "_doc",
"_id" : "iFhKmHQBowpsxTkFFNL5",
"_score" : 0.3034693,
"_source" : {
"projectKey" : "This ~ is a content"
}
},
{
"_index" : "wildcard-index",
"_type" : "_doc",
"_id" : "-lhKmHQBowpsxTkFG9Kg",
"_score" : 0.3034693,
"_source" : {
"projectKey" : "~ cars on the road"
}
}
]
Please let me know if you have any issues, I will be glad to help you.
Note: This method works if there is a blank space after the ~. You can see from the response that the 4th document was not returned.
While @hansley's answer would work, it requires you to create a custom analyzer, and, as you mentioned, you want to get only the docs that start with ~, while his result includes all docs containing ~. So here is my answer, which requires very little configuration and works as required.
Keep the default index mapping: just index the docs below, and ES will create a default mapping with a .keyword sub-field for every text field.
Index the sample docs:
{
  "title": "content1 ~"
}
{
  "title": "~ starting with"
}
{
  "title": "in between ~ with"
}
The search query should fetch only the 2nd doc from the sample docs:
{
  "query": {
    "prefix": { "title.keyword": "~" }
  }
}
And the search result:
"hits": [
{
"_index": "pre",
"_type": "_doc",
"_id": "2",
"_score": 1.0,
"_source": {
"title": "~ staring with"
}
}
]
Please refer to the prefix query docs for more info.
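As an aside, a wildcard query on the same .keyword sub-field should match the same documents; prefix is generally the cheaper choice, so this is only worth considering if you need a more complex pattern:
{
  "query": {
    "wildcard": { "title.keyword": "~*" }
  }
}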
Update 1:
Index Mapping:
{
  "mappings": {
    "properties": {
      "date": {
        "type": "date"
      }
    }
  }
}
Index Data:
{
  "date": "2015-02-01",
  "title": "in between ~ with"
}
{
  "date": "2015-01-01",
  "title": "content1 ~"
}
{
  "date": "2015-02-01",
  "title": "~ starting with"
}
{
  "date": "2015-02-01",
  "title": "~ in between with"
}
Search Query:
{
  "query": {
    "bool": {
      "must": [
        {
          "prefix": {
            "title.keyword": "~"
          }
        },
        {
          "range": {
            "date": {
              "lte": "2015-02-05",
              "gte": "2015-01-11"
            }
          }
        }
      ]
    }
  }
}
Search Result:
"hits": [
{
"_index": "stof_63924930",
"_type": "_doc",
"_id": "2",
"_score": 2.0,
"_source": {
"date": "2015-02-01",
"title": "~ staring with"
}
},
{
"_index": "stof_63924930",
"_type": "_doc",
"_id": "4",
"_score": 2.0,
"_source": {
"date": "2015-02-01",
"title": "~ in between with"
}
}
]

How do I increment the weight of a completion suggest field?

I am making a completion suggester. I would like to increase the weight of some of the indexed docs. I have:
POST /tester/
{
  "mappings": {
    "song": {
      "properties": {
        "suggest": {
          "type": "completion",
          "analyzer": "simple",
          "search_analyzer": "simple",
          "payloads": true,
          "preserve_separators": true,
          "preserve_position_increments": true,
          "max_input_length": 100
        }
      }
    }
  }
}
// Index a doc
PUT tester/song/1
{
  "name": "Nevermind",
  "suggest": {
    "input": [ "Nevermind", "Nirvana" ],
    "output": "Nirvana - Nevermind",
    "payload": { "artistId": 2321 },
    "weight": 1
  }
}

// Increment the weight
POST /tester/song/1
{
  "script": {
    "inline": "ctx._source.suggest.weight += 1"
  }
}
// The resulting document, as returned by a search
{
  "_index": "tester",
  "_type": "song",
  "_id": "1",
  "_score": 1,
  "_source": {
    "script": {
      "inline": "ctx._source.suggest.weight += 1"
    }
  }
}
Rather than incrementing the weight it rewrites the document. What am I doing wrong here?
First, enable dynamic scripting by adding these lines to your elasticsearch.yml configuration:
script.inline: true
script.indexed: true
Then you need to use the _update endpoint:
curl -XPOST 'localhost:9200/tester/song/1/_update' -d '
{
  "script" : {
    "inline": "ctx._source.suggest.weight += 1"
  }
}'
Check: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html#_scripted_updates
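Once the weight has been updated, a completion suggest request (2.x-style _suggest syntax, to match the mapping above; the suggestion name song-suggest is arbitrary) returns suggestions ordered by weight:
POST tester/_suggest
{
  "song-suggest": {
    "text": "nev",
    "completion": {
      "field": "suggest"
    }
  }
}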

Wrong indexing in Elasticsearch when using the analyzer

I did a pretty simple test. I built a student index and a type, then I defined a mapping:
POST student
{
  "mappings": {
    "ing3": {
      "properties": {
        "quote": {
          "type": "string",
          "analyzer": "english"
        }
      }
    }
  }
}
After that I added 3 students to this index:
POST /student/ing3/1
{
  "name": "Smith",
  "first_name": "John",
  "quote": "Learning is so cool!!"
}

POST /student/ing3/2
{
  "name": "Roosevelt",
  "first_name": "Franklin",
  "quote": "I learn everyday"
}

POST /student/ing3/3
{
  "name": "Black",
  "first_name": "Mike",
  "quote": "I learned a lot at school"
}
At this point I thought that the english analyzer would stem all the words in my quotes, so if I make a search like:
GET /student/ing3/_search
{
  "query": {
    "term": { "quote": "learn" }
  }
}
I would get all the documents as results, since the analyzer reduces "learn", "learning", and "learned" to the same root, and I was right. But when I try this request:
GET /student/ing3/_search
{
  "query": {
    "term": { "quote": "learned" }
  }
}
I got zero hits, and in my opinion I should have gotten the 3rd document (at least?). I thought Elasticsearch was also supposed to index "learned" and "learning", not only "learn". Am I wrong? Is my request wrong?
If you check:
GET 'student/_analyze?field=quote' -d "I learned a lot at school"
you will see that your sentence is analyzed as:
{
  "tokens": [
    {
      "token": "i",
      "start_offset": 0,
      "end_offset": 1,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "learn",
      "start_offset": 2,
      "end_offset": 9,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "lot",
      "start_offset": 12,
      "end_offset": 15,
      "type": "<ALPHANUM>",
      "position": 3
    },
    {
      "token": "school",
      "start_offset": 19,
      "end_offset": 25,
      "type": "<ALPHANUM>",
      "position": 5
    }
  ]
}
So the english analyzer removes punctuation and stop words, and reduces words to their root form:
https://www.elastic.co/guide/en/elasticsearch/guide/current/using-language-analyzers.html
You can use a match query, which will also analyze your search text, so it will match:
GET /student/ing3/_search
{
  "query": {
    "match": { "quote": "learned" }
  }
}
There is another way. You can stem the terms (the english analyzer does have a stemmer) but also keep the original terms, by using a keyword_repeat token filter and then removing the unnecessary duplicates after stemming with a unique token filter that has "only_on_same_position": true:
PUT student
{
  "settings": {
    "analysis": {
      "analyzer": {
        "myAnalyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "english_possessive_stemmer",
            "lowercase",
            "english_stop",
            "keyword_repeat",
            "english_stemmer",
            "unique_stem"
          ]
        }
      },
      "filter": {
        "unique_stem": {
          "type": "unique",
          "only_on_same_position": true
        },
        "english_stop": {
          "type": "stop",
          "stopwords": "_english_"
        },
        "english_stemmer": {
          "type": "stemmer",
          "language": "english"
        },
        "english_possessive_stemmer": {
          "type": "stemmer",
          "language": "possessive_english"
        }
      }
    }
  },
  "mappings": {
    "ing3": {
      "properties": {
        "quote": {
          "type": "string",
          "analyzer": "myAnalyzer"
        }
      }
    }
  }
}
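With this analyzer in place (and the documents reindexed), the original term query from the question now finds document 3, because both the stemmed and the original tokens are indexed:
GET /student/ing3/_search
{
  "query": {
    "term": { "quote": "learned" }
  }
}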
If you look at what terms are actually being indexed:
GET /student/_search
{
  "fielddata_fields": ["quote"]
}
it becomes clear why it now matches:
"hits": [
{
"_index": "student",
"_type": "ing3",
"_id": "2",
"_score": 1,
"_source": {
"name": "Roosevelt",
"first_name": "Franklin",
"quote": "I learn everyday"
},
"fields": {
"quote": [
"everydai",
"everyday",
"i",
"learn"
]
}
},
{
"_index": "student",
"_type": "ing3",
"_id": "1",
"_score": 1,
"_source": {
"name": "Smith",
"first_name": "John",
"quote": "Learning is so cool!!"
},
"fields": {
"quote": [
"cool",
"learn",
"learning",
"so"
]
}
},
{
"_index": "student",
"_type": "ing3",
"_id": "3",
"_score": 1,
"_source": {
"name": "Black",
"first_name": "Mike",
"quote": "I learned a lot at school"
},
"fields": {
"quote": [
"i",
"learn",
"learned",
"lot",
"school"
]
}
}
]

Elasticsearch Won't Match For Arrays

I'm trying to search a document with the following structure:
{
  "_index": "XXX",
  "_type": "business",
  "_id": "1252809",
  "_score": 1,
  "_source": {
    "url": "http://Samuraijapanese.com",
    "raw_name": "Samurai Restaurant",
    "categories": [
      {
        "name": "Cafe"
      },
      {
        "name": "Cajun Restaurant"
      },
      {
        "name": "Candy Stores"
      }
    ],
    "location": {
      "lat": "32.9948649",
      "lon": "-117.2528171"
    },
    "address": "979 Lomas Santa Fe Dr",
    "zip": "92075",
    "phone": "8584810032",
    "short_name": "samurai-restaurant",
    "name": "Samurai Restaurant",
    "apt": "",
    "state": "CA",
    "stdhours": "",
    "city": "Solana Beach",
    "hours": "",
    "yelp": "",
    "twitter": "",
    "closed": 0
  }
}
Searching it by url, raw_name, address, etc. all works, but searching the categories returns nothing. I'm searching like so (if I switch anything else in for categories.name, it works):
"query": {
"filtered" : {
"filter" : {
"geo_distance" : {
"location" : {
"lon" : "-117.15726",
"lat" : "32.71533"
},
"distance" : "5mi"
}
},
"query" : {
"multi_match" : {
"query" : "Cafe",
"fields" : [
"categories.name"
]
}
}
}
},
"sort": [
{
"_score" : {
"order" : "desc"
}
},
{
"_geo_distance": {
"location": {
"lat": 32.71533,
"lon": -117.15726
},
"order": "asc",
"sort_mode": "min"
}
}
],
"script_fields": {
"distance_from_origin": {
"script": "doc['location'].arcDistanceInKm(32.71533,-117.15726)"
}
},
"fields": ["_source"],
"from": 0,
"size": 10
}
If I switch out, for example, categories.name for address and change the search term to "Lomas", it returns the result.
Without seeing your type mapping I can't answer definitively, but I would guess you have mapped categories as nested. When querying sub-documents of type nested (as opposed to object), you have to use a nested query.
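For example, assuming categories is indeed mapped as nested, the query part of the request would need to be wrapped like this (a sketch; the filter, sort, and other parts stay as they are):
"query": {
  "nested": {
    "path": "categories",
    "query": {
      "multi_match": {
        "query": "Cafe",
        "fields": ["categories.name"]
      }
    }
  }
}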
