Elasticsearch: bulk update geo_location of all documents with curl

I have a bunch of documents in Elasticsearch that don't have a geo_point attribute.
Now I want to add one to all of them.
With some research I found the command below, but it was originally used to update a string attribute:
curl -XPOST "http://localhost:9200/products/_update_by_query" -H 'Content-Type: application/json' -d'
{
  "script": {
    "source": "ctx._source.location = {'lat': 0.0, 'lon':0.0}",
    "lang": "painless"
  },
  "query": {
    "match_all": {}
  }
}'
I thought I'd just replace the string value with a geo_point, but it gives me this error:
{
  "error": {
    "root_cause": [{
      "type": "parse_exception",
      "reason": "expected one of [inline], [file] or [stored] fields, but found none"
    }],
    "type": "parse_exception",
    "reason": "expected one of [inline], [file] or [stored] fields, but found none"
  },
  "status": 400
}
I appreciate any help.

Good job so far!
It looks like you're running an older version of ES. Try the command below, which simply replaces source with inline, as was the norm in older versions. (Note that it also writes the map literal with square brackets, since Painless map initializers use ['key': value] rather than {'key': value}.)
curl -XPOST "http://localhost:9200/products/_update_by_query" -H 'Content-Type: application/json' -d'
{
  "script": {
    "inline": "ctx._source.location = ['lat': 0.0, 'lon':0.0]",
    "lang": "painless"
  },
  "query": {
    "match_all": {}
  }
}'
Note, however, that if your location field is already of type text or string, you cannot change it to geo_point with this command. You'll need to either create a new field, named differently from location and of type geo_point, or create a new index with the proper mapping for the location field.
Edit: If the above doesn't work, the single quotes inside the script are probably ending the shell's single-quoted request body; try replacing each single quote ' with \" like so:
curl -XPOST "http://localhost:9200/products/_update_by_query" -H 'Content-Type: application/json' -d'
{
  "script": {
    "inline": "ctx._source.location = [\"lat\": 0.0, \"lon\":0.0]",
    "lang": "painless"
  },
  "query": {
    "match_all": {}
  }
}'
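For the second option, here is a minimal sketch of what the new index could look like. The index name products_v2 and the mapping type product are placeholders, not from the question; on ES 5.x the mapping type level is required:
# Sketch: new index with a proper geo_point mapping.
# "products_v2" and "product" are placeholder names.
curl -XPUT "http://localhost:9200/products_v2" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "product": {
      "properties": {
        "location": { "type": "geo_point" }
      }
    }
  }
}'
After reindexing your documents into the new index, the update-by-query above will work against the proper geo_point field.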

Related

Elasticsearch join-like query within same index

I have an index with the following structure (mapping):
{
  "properties": {
    "content": {
      "type": "text"
    },
    "prev_id": {
      "type": "text"
    },
    "next_id": {
      "type": "text"
    }
  }
}
where prev_id and next_id are IDs of documents in the same index (they may be null).
I want to perform a _search query and also get the prev.content and next.content fields.
Right now I use two queries: the first to search by the content field
curl -X GET 'localhost:9200/idx/_search' -H 'content-type: application/json' -d '{
"query": {
"match": {
"content": "yellow fox"
}
}
}'
and the second to get next and prev records.
curl -X GET 'localhost:9200/idx/_search' -H 'content-type: application/json' -d '{
"query": {
"ids": {
"values" : ["5bb93552e42140f955501d7b77dc8a0a", "cd027a48445a0a193bc80982748bc846", "9a5b7359d3081f10d099db87c3226d82"]
}
}
}'
Then I join the results on the application side.
Can I achieve my goal with only one query?
PS: the purpose of storing next/prev as IDs is to save disk space. I have a lot of records and the content field is quite large.
What you are doing is the way to go. But how large is the content? Maybe you can consider not storing it at all (_source disabled)?
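If disk space is the main concern, a minimal sketch of what disabling _source could look like at index creation (the index name idx_slim is a placeholder; _source can only be disabled when the index is created, and without it you lose reindex, update-by-query, and the original JSON in search hits):
# Sketch: index with _source disabled to save space; "idx_slim" is a placeholder name.
# Caveat: reindex, update_by_query and highlighting all need _source.
curl -X PUT 'localhost:9200/idx_slim' -H 'content-type: application/json' -d '{
  "mappings": {
    "_source": { "enabled": false },
    "properties": {
      "content": { "type": "text" },
      "prev_id": { "type": "text" },
      "next_id": { "type": "text" }
    }
  }
}'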

escape triple quotes in curl correctly

I have the following curl request
curl -H "Content-Type: application/json" -X POST http://localhost:9200/_reindex\?wait_for_completion\=true -d '{"source": {"index": "analytics-prod-2019.12.30", "size":1000 }, "dest": {"index": "analytics-prod-2019.12"}, "conflicts": "proceed", "script": { "lang": "painless","source: """ctx._source.index = ctx._index; def eventData = ctx._source["event.data"]; if(eventData != null) { eventData.remove("realmDb.size"); eventData.remove("realmDb.format"); eventData.remove("realmDb.contents"); }""" } }'
but this fails with the following error:
{"error":{"root_cause":[{"type":"x_content_parse_exception","reason":"[1:166] [script] failed to parse object"}],"type":"x_content_parse_exception","reason":"[1:166] [reindex] failed to parse field [script]","caused_by":{"type":"x_content_parse_exception","reason":"[1:166] [script] failed to parse object","caused_by":{"type":"json_parse_exception","reason":"Unexpected character ('\"' (code 34)): was expecting a colon to separate field name and value\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper#51c48433; line: 1, column: 177]"}}},"status":400}
If I remove the script field from the request, it works just fine:
curl -H "Content-Type: application/json" -X POST http://localhost:9200/_reindex\?wait_for_completion\=true -d '{"source":{"index":"analytics-prod-2019.12.30","size":1000},"dest":{"index":"test-index"},"conflicts":"proceed"}}'
Using the Kibana UI works fine.
What is the correct way to run this in curl?
Use a single double quote " to surround your script value (triple quotes only work in the Kibana console) and \u0027 to escape quotes inside your Painless script.
curl -H "Content-Type: application/json" -X POST http://localhost:9200/_reindex\?wait_for_completion\=true -d '
{
  "source": {
    "index": "analytics-prod-2019.12.30",
    "size": 1000
  },
  "dest": {
    "index": "analytics-prod-2019.12"
  },
  "conflicts": "proceed",
  "script": {
    "lang": "painless",
    "source": "ctx._source.index = ctx._index; def eventData = ctx._source[\u0027event.data\u0027]; if (eventData != null) { eventData.remove(\u0027realmDb.size\u0027); eventData.remove(\u0027realmDb.format\u0027); eventData.remove(\u0027realmDb.contents\u0027); }"
  }
}
'
You can also see an example of this here; click on the Copy as cURL link and review the example in that format.
Your source was missing a double quote: the original command has "source: instead of "source":.
Corrected:
curl -H "Content-Type: application/json" \
-X POST http://localhost:9200/_reindex\?wait_for_completion\=true \
-d '{"source": {"index": "analytics-prod-2019.12.30", "size":1000 }, "dest": {"index": "analytics-prod-2019.12"}, "conflicts": "proceed", "script": { "lang": "painless","source": "ctx._source.index = ctx._index; def eventData = ctx._source[\"event.data\"]; if (eventData != null) { eventData.remove(\"realmDb.size\"); eventData.remove(\"realmDb.format\"); eventData.remove(\"realmDb.contents\"); }" } }'
You can use single quotes as @Zsolt pointed out, but even Kibana itself, when clicking "Copy as cURL", uses escaped double quotes.
curl -XPOST "http://elasticsearch:9200/_reindex?requests_per_second=115&wait_for_completion=true" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "analytics-prod-2019.12.30",
    "size": 1000
  },
  "dest": {
    "index": "analytics-prod-2019.12"
  },
  "script": {
    "lang": "painless",
    "source": " ctx._source.index = ctx._index;\n def eventData = ctx._source[\"event.data\"];\n if (eventData != null) {\n eventData.remove(\"realmDb.size\");\n eventData.remove(\"realmDb.format\");\n eventData.remove(\"realmDb.contents\");\n }"
  }
}'
I had to escape the double quotes as \".
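As an aside (a common shell technique, not from the answers above): a quoted heredoc sidesteps the escaping problem entirely, because the shell leaves the body untouched. Single quotes are fine inside the JSON here, and Painless accepts single-quoted string literals:
# Sketch: pass the body on stdin via a quoted heredoc; no shell escaping needed.
curl -H "Content-Type: application/json" -X POST "http://localhost:9200/_reindex?wait_for_completion=true" --data-binary @- <<'EOF'
{
  "source": { "index": "analytics-prod-2019.12.30", "size": 1000 },
  "dest": { "index": "analytics-prod-2019.12" },
  "conflicts": "proceed",
  "script": {
    "lang": "painless",
    "source": "ctx._source.index = ctx._index; def eventData = ctx._source['event.data']; if (eventData != null) { eventData.remove('realmDb.size'); eventData.remove('realmDb.format'); eventData.remove('realmDb.contents'); }"
  }
}
EOF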

elastic-search query all fields including nested fields

I am using ES version 5.6.
I have a document like the one below stored in ES:
{
  "swType": "abc",
  "swVersion": "xyz",
  "interfaces": [
    {
      "autoneg": "enabled",
      "loopback": "disabled",
      "duplex": "enabled"
    },
    {
      "autoneg": "enabled",
      "loopback": "disabled",
      "duplex": "enabled"
    }
  ]
}
I want to search all fields for the value "enabled".
I tried the queries below, but they did not work.
curl -XGET "http://esserver:9200/comcast/inventory/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "_all": "enabled"
    }
  }
}'
curl -XGET "http://esserver:9200/comcast/inventory/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "query_string": {
      "query": "enabled",
      "fields": ["*"]
    }
  }
}'
But the query below worked:
curl -XGET "http://esserver:9200/comcast/inventory/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "_all": "abc"
    }
  }
}'
So it looks like _all matches only top-level fields, not nested ones.
Is there any way to search for text contained in all fields, including nested ones? I don't want to specify the nested field names explicitly.
I am looking for a kind of global search, where I search for a given text anywhere in the document.
Thanks.
OK, got it working.
I had set dynamic: false in my mapping. It looks like ES only searches the fields
specified in the mapping, and my search words were in dynamically added fields.
Setting dynamic: 'strict' helped me narrow down the issue.
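For reference, a minimal sketch of turning dynamic mapping back on so that new fields get mapped and become searchable (ES 5.6 syntax, reusing the index and type names from the question). Note that documents indexed while dynamic was false must be reindexed before their previously ignored fields become searchable:
# Sketch: re-enable dynamic mapping for the inventory type.
curl -XPUT "http://esserver:9200/comcast/_mapping/inventory" -H 'Content-Type: application/json' -d'
{
  "dynamic": true
}'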

Document Count in keyword buckets from list in document as aggregation in Elasticsearch

The situation:
I am a beginner with Elasticsearch and cannot wrap my head around how to use aggregations to get what I need.
I have documents with the following structure:
{
  ...
  "authors": [
    {
      "name": "Bob",
      "#type": "Person"
    }
  ],
  "resort": "Politics",
  ...
}
I want to use an aggregation to get the document count for every author. Since some documents have more than one author, those documents should be counted once for each of their authors.
What I've tried:
Since the terms aggregation worked with the resort field, I tried using it with authors or the name field inside, but I always got no buckets at all. For this I used the following curl request:
curl -X POST 'localhost:9200/news/_doc/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "_source": false,
  "aggs": {
    "author_agg": { "terms": { "field": "authors.keyword" } }
  }
}'
I concluded that the terms aggregation doesn't work with fields contained in a list.
Next I thought about the nested aggregation, but the documentation says it is a
single bucket aggregation
so not what I am searching for. Having run out of ideas, I tried it anyway, but got the error
"type" : "aggregation_execution_exception",
"reason" : "[nested] nested path [authors] is not nested"
I found this answer and tried to use it for my data. I had the following request:
curl -X GET "localhost:9200/news/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "nest": {
      "nested": {
        "path": "authors"
      },
      "aggs": {
        "authorname": {
          "terms": {
            "field": "name.keyword"
          }
        }
      }
    }
  }
}'
which gave me the error
"type" : "aggregation_execution_exception",
"reason" : "[nested] nested path [authors] is not nested"
I searched for how to make my path nested using mappings, but I couldn't find out how to accomplish that, and I don't even know whether it actually makes sense.
So how can I aggregate the documents into buckets based on a key that lies in elements of a list inside the documents?
Maybe this question has been answered somewhere else and I'm just not able to state my problem the right way, since I'm still confused by all the new information. Thank you for your help in advance.
I finally solved my problem:
The idea of mapping the authors key as nested was totally right. But unfortunately Elasticsearch does not let you change a field from non-nested to nested directly, because all existing documents would have to be re-indexed under the new mapping. So you have to go the following way:
Create a new index with a custom mapping. Here we go into the document type _doc, into its properties, and then into the documents' field authors. There we set type to nested.
curl -X PUT "localhost:9200/new_index?pretty" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "_doc": {
      "properties": {
        "authors": { "type": "nested" }
      }
    }
  }
}'
Then we reindex our dataset, setting the destination to our newly created index. This indexes the data from the old index into the new one, essentially copying the pure data but picking up the new mapping (since settings and mappings are not copied by _reindex).
curl -X POST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "old_index"
  },
  "dest": {
    "index": "new_index"
  }
}'
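Before running the aggregation, a quick sanity check (not part of the original answer) is to confirm that authors is now mapped as nested in the new index:
# Sketch: inspect the mapping of the new index; authors should show "type": "nested".
curl -X GET 'localhost:9200/new_index/_mapping?pretty'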
Now we can do the nested aggregation here, to sort the documents into buckets based on the authors:
curl -X GET 'localhost:9200/new_index/_doc/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "authors": {
      "nested": {
        "path": "authors"
      },
      "aggs": {
        "authors_by_name": {
          "terms": { "field": "authors.name.keyword" }
        }
      }
    }
  }
}'
I don't know of a way to rename indices, but you can simply delete the old index and then repeat the described procedure to create another new index with the name of the old one and the custom mapping.
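As a side note (not part of the original answer): instead of renaming, an index alias can point the old name at the new index once the old index has been deleted (an alias cannot shadow an existing index name):
# Sketch: make new_index reachable under the old name via an alias.
curl -X POST 'localhost:9200/_aliases' -H 'Content-Type: application/json' -d'
{
  "actions": [
    { "add": { "index": "new_index", "alias": "old_index" } }
  ]
}'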

Add additional attribute to an existing document if the attribute doesn't exist elasticsearch

I have a specific requirement where I have to add an additional attribute to an Elasticsearch index that has n documents. This has to be done only if a document doesn't already contain the attribute. The task basically involves two steps:
1) searching
2) updating
I know how to do this with multiple queries, but it would be great if I could manage to do it in a single query. Is that possible? If yes, can someone tell me how?
You can use update by query combined with the exists query to add the new field to only those documents which do not already contain the attribute.
For example, say only one of your documents contains the field attrib2 and the others don't:
curl -XPUT "http://localhost:9200/my_test_index/doc/1" -H 'Content-Type: application/json' -d'
{
  "attrib1": "value1"
}'
curl -XPUT "http://localhost:9200/my_test_index/doc/2" -H 'Content-Type: application/json' -d'
{
  "attrib1": "value21"
}'
curl -XPUT "http://localhost:9200/my_test_index/doc/3" -H 'Content-Type: application/json' -d'
{
  "attrib1": "value31",
  "attrib2": "value32"
}'
The following update by query will do the job.
curl -XPOST "http://localhost:9200/my_test_index/_update_by_query" -H 'Content-Type: application/json' -d'
{
  "script": {
    "lang": "painless",
    "source": "ctx._source.attrib2 = params.attrib2",
    "params": {
      "attrib2": "new_value_for_attrib2"
    }
  },
  "query": {
    "bool": {
      "must_not": [
        {
          "exists": {
            "field": "attrib2"
          }
        }
      ]
    }
  }
}'
It will set the field attrib2 to the new value new_value_for_attrib2 on only those documents which don't already have that field.
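As a quick verification (not part of the original answer), a _count request with the same exists query should now match all three documents:
# Sketch: count documents that now have attrib2 (expected count: 3).
curl -XGET "http://localhost:9200/my_test_index/_count" -H 'Content-Type: application/json' -d'
{
  "query": {
    "exists": {
      "field": "attrib2"
    }
  }
}'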
