upserting batches into elasticsearch store with bulk API - elasticsearch

I have a huge set of documents with the same index and the same type but obviously different IDs. I want to either update existing ones or insert new ones, in batches. How can I achieve that using the bulk indexing API? I want to do something like below, but it throws an error. Basically, I want to upsert multiple docs in batches which share the same index and type.
curl -s -H "Content-Type: application/json" -XPOST localhost:9200/_bulk -d'
{ "index": {"_type": "sometype", "_index": "someindex"}}
{ "_id": "existing_id", "field1": "test1"}
{ "_id": "existing_id2", "field2": "test2"}
'

You need to do it like this:
curl -s -H "Content-Type: application/json" -XPOST localhost:9200/someindex/sometype/_bulk -d'
{ "index": {"_id": "existing_id"}}
{ "field1": "test1"}
{ "index": {"_id": "existing_id2"}}
{ "field2": "test2"}
'
Since all documents are in the same index and type, move both to the URL and only specify the _id for each document you want to update in your bulk body.
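Note that the index action fully replaces an existing document with the new source. If you want true upserts (partial update when the document exists, insert otherwise), the bulk API also accepts the update action with doc_as_upsert; a minimal sketch reusing the same index, type and IDs:
curl -s -H "Content-Type: application/json" -XPOST localhost:9200/someindex/sometype/_bulk -d'
{ "update": {"_id": "existing_id"}}
{ "doc": { "field1": "test1"}, "doc_as_upsert": true }
{ "update": {"_id": "existing_id2"}}
{ "doc": { "field2": "test2"}, "doc_as_upsert": true }
'
With doc_as_upsert, the doc is merged into an existing document, or indexed as a new one if the ID is not found.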

Related

Elasticsearch join-like query within same index

I have an index with the following structure (mapping):
{
  "properties": {
    "content": {
      "type": "text"
    },
    "prev_id": {
      "type": "text"
    },
    "next_id": {
      "type": "text"
    }
  }
}
where prev_id and next_id are IDs of documents in this index (and may be null).
I want to perform a _search query and also get the prev.content and next.content fields.
Currently I use two queries: the first searches by the content field
curl -X GET 'localhost:9200/idx/_search' -H 'content-type: application/json' -d '{
  "query": {
    "match": {
      "content": "yellow fox"
    }
  }
}'
and the second fetches the next and prev records.
curl -X GET 'localhost:9200/idx/_search' -H 'content-type: application/json' -d '{
  "query": {
    "ids": {
      "values": ["5bb93552e42140f955501d7b77dc8a0a", "cd027a48445a0a193bc80982748bc846", "9a5b7359d3081f10d099db87c3226d82"]
    }
  }
}'
Then I join the results on the application side.
Can I achieve my goal with one query only?
PS: the purpose of storing next/prev as IDs is to save disk space. I have a lot of records and the content field is quite large.
What you are doing is the way to go. But how large is the content? Maybe you can consider not storing content at all (i.e. excluding it from _source)?
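If you stay with the two-query approach, you can at least shrink the payload of the second query with source filtering, so that only the content field of the prev/next documents comes back; a small sketch reusing the IDs from the question:
curl -X GET 'localhost:9200/idx/_search' -H 'content-type: application/json' -d '{
  "_source": ["content"],
  "query": {
    "ids": {
      "values": ["5bb93552e42140f955501d7b77dc8a0a", "cd027a48445a0a193bc80982748bc846", "9a5b7359d3081f10d099db87c3226d82"]
    }
  }
}'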

elasticsearch query all fields including nested fields

I am using ES version 5.6.
I have a document like below stored in ES.
{
  "swType": "abc",
  "swVersion": "xyz",
  "interfaces": [
    {
      "autoneg": "enabled",
      "loopback": "disabled",
      "duplex": "enabled"
    },
    {
      "autoneg": "enabled",
      "loopback": "disabled",
      "duplex": "enabled"
    }
  ]
}
I want to search on all fields that have "enabled".
I tried the queries below, but they did not work.
curl -XGET "http://esserver:9200/comcast/inventory/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "_all": "enabled"
    }
  }
}'
curl -XGET "http://esserver:9200/comcast/inventory/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "query_string": {
      "query": "enabled",
      "fields": ["*"]
    }
  }
}'
But the below query worked
curl -XGET "http://esserver:9200/comcast/inventory/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "_all": "abc"
    }
  }
}'
So it looks like _all matches only top-level fields and not nested ones.
Is there any way to query for text contained in any field, including nested ones? I don't want to specify the nested field names explicitly.
I am looking for a kind of global search where I can search for "text"
anywhere in the document.
Thanks.
OK, got it working.
I had set the mapping to dynamic: false. It looks like ES searches only in the fields
specified in the mapping, and my search words were in dynamically added fields.
Setting dynamic: 'strict' helped me narrow down the issue.
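To illustrate the difference: with dynamic: false, fields missing from the mapping are kept in _source but never indexed, so no query can match them; with dynamic: 'strict', indexing a document that contains an unmapped field fails outright, which surfaces the problem immediately. A minimal sketch of a strict mapping (index, type and field names taken from the question; under strict, every field must be declared explicitly):
curl -XPUT "http://esserver:9200/comcast" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "inventory": {
      "dynamic": "strict",
      "properties": {
        "swType": { "type": "text" },
        "swVersion": { "type": "text" },
        "interfaces": {
          "properties": {
            "autoneg": { "type": "text" },
            "loopback": { "type": "text" },
            "duplex": { "type": "text" }
          }
        }
      }
    }
  }
}'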

Document Count in keyword buckets from list in document as aggregation in Elasticsearch

The situation:
I am a beginner with Elasticsearch and cannot wrap my head around how to use aggregations to get what I need.
I have documents with the following structure:
{
  ...
  "authors": [
    {
      "name": "Bob",
      "#type": "Person"
    }
  ],
  "resort": "Politics",
  ...
}
I want to use an aggregation to get the document count for every author. Since some documents have more than one author, those documents should be counted for every author individually.
What I've tried:
Since the terms aggregation worked with the resort field, I tried using it with authors or the name field inside, but always got no buckets at all. For this I used the following curl request:
curl -X POST 'localhost:9200/news/_doc/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "_source": false,
  "aggs": {
    "author_agg": { "terms": { "field": "authors.keyword" } }
  }
}'
I concluded that the terms aggregation doesn't work with fields contained in a list.
Next I thought about the nested aggregation, but the documentation says it is a
single bucket aggregation
so not what I am searching for. Having run out of ideas I tried it anyway, but got the error
"type" : "aggregation_execution_exception",
"reason" : "[nested] nested path [authors] is not nested"
I found this answer and tried to use it for my data. I made the following request:
curl -X GET "localhost:9200/news/_search?pretty" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"nest": {
"nested": {
"path": "authors"
},
"aggs": {
"authorname": {
"terms" : {
"field": "name.keyword"
}
}
}
}
}
}'
which gave me the error
"type" : "aggregation_execution_exception",
"reason" : "[nested] nested path [authors] is not nested"
I searched for how to make my path nested using mappings, but couldn't find out how to accomplish that. I don't even know if this actually makes sense or not.
So how can I aggregate the documents into buckets based on a key that lies in elements of a list inside the documents?
Maybe this question has been answered somewhere else, but then I'm not able to state my problem the right way, since I'm still confused by all the new information. Thank you for your help in advance.
I finally solved my problem:
The idea of mapping the authors key as nested was totally right. But unfortunately Elasticsearch does not let you switch a field from non-nested to nested in place, because all existing documents would have to be re-indexed under the new mapping. So you have to go the following way:
Create a new index with a custom mapping. Here we go into the document type _doc, into its properties and then into the documents' field authors. There we set type to nested.
curl -X PUT "localhost:9200/new_index?pretty" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "_doc": {
      "properties": {
        "authors": { "type": "nested" }
      }
    }
  }
}'
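Before reindexing, it is worth fetching the mapping back to confirm the nested type took effect (purely a sanity check):
curl -X GET "localhost:9200/new_index/_mapping?pretty"
The response should show "authors": { "type": "nested" } under the _doc properties.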
Then we reindex our dataset with the destination set to our newly created index. This indexes the data from the old index into the new one, essentially copying the pure data but applying the new mapping (since settings and mappings are not copied this way).
curl -X POST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d'
{
"source": {
"index": "old_index"
},
"dest": {
"index": "new_index"
}
}'
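If the old index is large, this call can outlive the HTTP timeout; in that case you can run the reindex asynchronously and poll the task API (the task ID below is a hypothetical placeholder; use the one returned by your own call):
curl -X POST "localhost:9200/_reindex?wait_for_completion=false" -H 'Content-Type: application/json' -d'
{
  "source": { "index": "old_index" },
  "dest": { "index": "new_index" }
}'
curl -X GET "localhost:9200/_tasks/oTUltX4IQMOUUVeiohTt8A:12345"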
Now we can do the nested aggregation here, to sort the documents into buckets based on the authors:
curl -X GET 'localhost:9200/new_index/_doc/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "authors": {
      "nested": {
        "path": "authors"
      },
      "aggs": {
        "authors_by_name": {
          "terms": { "field": "authors.name.keyword" }
        }
      }
    }
  }
}'
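For reference, the interesting part of the response should be shaped roughly like this (a sketch; the author names and counts are illustrative):
{
  "aggregations": {
    "authors": {
      "doc_count": 7,
      "authors_by_name": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          { "key": "Bob", "doc_count": 4 },
          { "key": "Alice", "doc_count": 3 }
        ]
      }
    }
  }
}
Note that the top-level doc_count counts nested author objects, not top-level documents.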
I don't yet know how to rename indices, but you can simply delete the old index and then repeat the described procedure to create another new index with the name of the old one and the custom mapping; the alias sketch below is an alternative.
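One way to get the effect of a rename without changing clients is the aliases API: delete the old index and atomically point an alias with its name at the new index. A sketch, assuming the index names used above:
curl -X POST "localhost:9200/_aliases" -H 'Content-Type: application/json' -d'
{
  "actions": [
    { "remove_index": { "index": "old_index" } },
    { "add": { "index": "new_index", "alias": "old_index" } }
  ]
}'
Searches against old_index then transparently hit new_index.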

Unable to create visualization using curl command in elasticsearch

I am trying to create a visualization using a curl command. I am using Elasticsearch 6.2.3. I am able to create the same visualization in Elasticsearch 5.6.8.
I am using this command
curl -XPUT http://localhost:9200/.kibana/visualization/vis1 -H 'Content-Type: application/json' -d #vis1.json
It shows this error:
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Rejecting mapping update to [.kibana] as the final mapping would have more than 1 type: [visualization, doc]"}],"type":"illegal_argument_exception","reason":"Rejecting mapping update to [.kibana] as the final mapping would have more than 1 type: [visualization, doc]"},"status":400}
Contents of vis1.json:
{
"title": "vis1",
"visState": "{\"title\":\"vis1\",\"type\":\"table\",\"params\":{\"perPage\":10,\"showMeticsAtAllLevels\":false,\"showPartialRows\":false,\"showTotal\":false,\"sort\":{\"columnIndex\":null,\"direction\":null},\"totalFunc\":\"sum\"},\"aggs\":[{\"id\":\"1\",\"enabled\":true,\"type\":\"count\",\"schema\":\"metric\",\"params\":{}},{\"id\":\"2\",\"enabled\":true,\"type\":\"date_histogram\",\"schema\":\"split\",\"params\":{\"field\":\"UsageEndDate\",\"interval\":\"M\",\"customInterval\":\"2h\",\"min_doc_count\":1,\"extended_bounds\":{},\"row\":false}},{\"id\":\"3\",\"enabled\":true,\"type\":\"terms\",\"schema\":\"bucket\",\"params\":{\"field\":\"ProductName.keyword\",\"otherBucket\":false,\"otherBucketLabel\":\"Other\",\"missingBucket\":false,\"missingBucketLabel\":\"Missing\",\"size\":5,\"order\":\"desc\",\"orderBy\":\"1\"}}]}",
"uiStateJSON": "{\"vis\":{\"params\":{\"sort\":{\"columnIndex\":null,\"direction\":null}}}}",
"description": "",
"version": 1,
"kibanaSavedObjectMeta": {
"searchSourceJSON": "{\"index\":\"4eb9f840-3969-11e8-ae19-552e148747c3\",\"filter\":[],\"query\":{\"language\":\"lucene\",\"query\":\"\"}}"
}
}
This works fine in Elasticsearch 5.6.8 but not in 6.2.3.
Thanks in advance.
In Kibana 6, the mapping of the .kibana index has changed in order to satisfy the upcoming "one mapping type per index" breaking change.
You can try this way instead:
curl -XPUT http://localhost:9200/.kibana/doc/visualization:vis1 -H 'Content-Type: application/json' -d #vis1.json
Also the vis1.json file needs to be changed a little bit (the content needs to be moved to the visualization sub-section), like this:
{
"type": "visualization",
"updated_at": "2018-04-10T10:00:00.000Z",
"visualization": {
"title": "vis1",
"visState": "{\"title\":\"vis1\",\"type\":\"table\",\"params\":{\"perPage\":10,\"showMeticsAtAllLevels\":false,\"showPartialRows\":false,\"showTotal\":false,\"sort\":{\"columnIndex\":null,\"direction\":null},\"totalFunc\":\"sum\"},\"aggs\":[{\"id\":\"1\",\"enabled\":true,\"type\":\"count\",\"schema\":\"metric\",\"params\":{}},{\"id\":\"2\",\"enabled\":true,\"type\":\"date_histogram\",\"schema\":\"split\",\"params\":{\"field\":\"UsageEndDate\",\"interval\":\"M\",\"customInterval\":\"2h\",\"min_doc_count\":1,\"extended_bounds\":{},\"row\":false}},{\"id\":\"3\",\"enabled\":true,\"type\":\"terms\",\"schema\":\"bucket\",\"params\":{\"field\":\"ProductName.keyword\",\"otherBucket\":false,\"otherBucketLabel\":\"Other\",\"missingBucket\":false,\"missingBucketLabel\":\"Missing\",\"size\":5,\"order\":\"desc\",\"orderBy\":\"1\"}}]}",
"uiStateJSON": "{\"vis\":{\"params\":{\"sort\":{\"columnIndex\":null,\"direction\":null}}}}",
"description": "",
"version": 1,
"kibanaSavedObjectMeta": {
"searchSourceJSON": "{\"index\":\"4eb9f840-3969-11e8-ae19-552e148747c3\",\"filter\":[],\"query\":{\"language\":\"lucene\",\"query\":\"\"}}"
}
}
}
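To verify that the saved object landed where Kibana 6 expects it, you can read it back by its composite ID (purely a sanity check):
curl -XGET "http://localhost:9200/.kibana/doc/visualization:vis1?pretty"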

Add additional attribute to an existing document if the attribute doesn't exist elasticsearch

I have a specific requirement where I have to add an additional attribute to an Elasticsearch index which has n documents. This has to be done only if the documents don't already contain the attribute. This task basically involves 2 steps:
1) searching
2) updating
I know how to do this with multiple queries. But it would be great if I managed to do it in a single query. Is that possible? If yes, can someone tell me how it can be done?
You can use update by query combined with the exists query to add the new field to only those documents which do not contain the attribute.
For example, say you have only one document containing the field attrib2, while the others don't have that field.
curl -XPUT "http://localhost:9200/my_test_index/doc/1" -H 'Content-Type: application/json' -d'
{
  "attrib1": "value1"
}'
curl -XPUT "http://localhost:9200/my_test_index/doc/2" -H 'Content-Type: application/json' -d'
{
  "attrib1": "value21"
}'
curl -XPUT "http://localhost:9200/my_test_index/doc/3" -H 'Content-Type: application/json' -d'
{
  "attrib1": "value31",
  "attrib2": "value32"
}'
The following update by query will do the job.
curl -XPOST "http://localhost:9200/my_test_index/_update_by_query" -H 'Content-Type: application/json' -d'
{
  "script": {
    "lang": "painless",
    "source": "ctx._source.attrib2 = params.attrib2",
    "params": {
      "attrib2": "new_value_for_attrib2"
    }
  },
  "query": {
    "bool": {
      "must_not": [
        {
          "exists": {
            "field": "attrib2"
          }
        }
      ]
    }
  }
}'
It will set the field attrib2 to the new value new_value_for_attrib2 on only those documents which don't already have that field.
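Because the exists filter skips documents that already have the field, the operation is also safe to re-run. To confirm it worked, the same must_not/exists search should now return zero hits:
curl -XGET "http://localhost:9200/my_test_index/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "query": {
    "bool": {
      "must_not": [
        { "exists": { "field": "attrib2" } }
      ]
    }
  }
}'
The hits.total in the response should be 0.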
