How to insert an element into an already present list in Elasticsearch

Say I have documents stored like below.
document 1
{
"id" : "1",
"title" : "This is a test document1",
"list" : ["value1", "value2"],
...
}
document 2
{
"id" : "2",
"title" : "This is a test document2",
"valueList" : ["value1", "value2"],
...
}
I need to add some more elements to the list (or valueList) fields of these documents, given a list of document ids, using the bulk API. The result should look like
document 1
{
"id" : "1",
"title" : "This is a test document1",
"list" : ["value1", "value2", "value3"],
...
}
document 2
{
"id" : "2",
"title" : "This is a test document2",
"valueList" : ["value1", "value2", "value3"],
...
}
What can I do to achieve this?
I tried using scripts, but that only updates a single document.
Sorry, I am really new to Elasticsearch, so this may be a basic question.

See Updating Document — the documentation covers this well. You need to use the _update endpoint. Just to give you an idea, it could look like this:
POST /your_index/your_type/document1/_update
{
"doc" : {
"list" : ["value1", "value2", "value3"]
}
}
This will update document1.
In case of bulk updates you should read Batch Processing and have a look at the Bulk API.
From the docs:
POST /your_index/your_type/_bulk
{ "update" : {"_id" : "document1", "_type" : "your_type", "_index" : "your_index"}}
{ "doc" : {"myfield" : "newvalue"} }
{ "update" : {"_id" : "document2", "_type" : "your_type", "_index" : "your_index"}}
{ "doc" : {"myfield" : "newvalue"} }
Please note that you can just use _update for Partial Updates.
The simplest form of the update request accepts a partial document as
the doc parameter, which just gets merged with the existing document.
Objects are merged together, existing scalar fields are overwritten,
and new fields are added.
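One caveat: a partial-document merge replaces an array field wholesale rather than appending to it, so this approach requires sending the complete updated list. If you only know the element to add, a scripted update can append it instead, and scripts work inside bulk update actions too. A minimal sketch, assuming a Painless script (on older versions the key is "inline" rather than "source"); index, type, and values are taken from the question:
POST /_bulk
{ "update" : { "_index" : "your_index", "_type" : "your_type", "_id" : "1" } }
{ "script" : { "source" : "if (!ctx._source.list.contains(params.v)) { ctx._source.list.add(params.v) }", "params" : { "v" : "value3" } } }
{ "update" : { "_index" : "your_index", "_type" : "your_type", "_id" : "2" } }
{ "script" : { "source" : "if (!ctx._source.valueList.contains(params.v)) { ctx._source.valueList.add(params.v) }", "params" : { "v" : "value3" } } }
Each update action line is paired with a script line, so one request can append to many documents by id.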

Related

Query to compare two columns from two different indexes

I have a column in one index with a number of countries in it, and I want to check whether these countries are similar to or the same as the countries in a column in another index.
So in one index we have user data with the countries the users have specified, and in the other index we have the master data with the actual countries. Now I want to check whether the countries entered by the users are the same as the ones in the master data.
If anybody knows how to write a query for this in Kibana, kindly help.
GET final,master/_count
{
"query": {
"bool": {
"must": [
{
"script": {
"script": "['A_OPERATINGCOUNTRY'].value == ['AD_Country Name.keyword'].value"
}
}
]
}
}
}
You need to manage this outside Elasticsearch. But since the incoming data doesn't have the country name you want, why do you want to check this in Elasticsearch? The form you are using should have exactly the value you want to index.
You can use the terms query
Suppose we have an index of page access logs like so:
PUT /mybeat-2018/_doc/1
{
"host" : "elastic.co",
"ttl" : 40
}
PUT /mybeat-2018/_doc/2
{
"host" : "elastic.co",
"ttl" : 666
}
PUT /mybeat-2018/_doc/3
{
"host" : "google.com",
"ttl" : 55
}
and an independent whitelist that can shrink or grow, with a bunch of hosts:
PUT /whitelist/_doc/1
{
"hosts" : [
{
"name" : "elastic.co"
},
{
"name" : "twitter.com"
}
]
}
Then a search on mybeat-* for whatever is in the whitelist should reference the whitelist document (in our case the document with id 1) like so:
GET /mybeat-*/_search
{
"query" : {
"terms" : {
"host" : {
"index" : "whitelist",
"type" : "_doc",
"id" : "1",
"path" : "hosts.name"
}
}
}
}
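Coming back to the countries question: the same terms-lookup pattern could point the query on the user data at the master document, assuming the master countries are stored in a single document with a known id (the index, field, and id names below are placeholders taken from the original query):
GET final/_search
{
"query" : {
"terms" : {
"A_OPERATINGCOUNTRY" : {
"index" : "master",
"type" : "_doc",
"id" : "1",
"path" : "AD_Country Name.keyword"
}
}
}
}
That returns the user documents whose country appears in the master list; wrapping the same lookup in a bool must_not would surface the ones that don't match.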

Elasticsearch Update by Query to Update Complex Document

I have an Elasticsearch use case where I need to update a doc.
My doc is something like this:
{
"first_name" : "firstName",
"last_name" : "lastName",
"version" : 1234,
"user_roles" : {
"version" : 12345,
"id" : 1234,
"name" : "role1"
},
"groups" : {
"version" : 123,
"list" : [
{"id" : 123, "name" : "ashd"},
{"id" : 1234, "name" : "awshd"}
]
}
}
Now, depending on some feed, I will be updating either the parent doc or the nested doc.
I am able to work out how to update basic attributes like first_name and last_name, but not how to update the complex/nested ones.
I did something like this from a REST client:
"script": {
"inline": "ctx._source.user_roles = { "id" : 5678, "name" :"hcsdl"}
}
but it's giving me an exception.
Actual use case:
I will actually be getting a Map in Java.
The key can be a simple key like "first_name" or a complex key like "user_roles" or "groups".
I want to update the document using update by query, matching on version.
The code I wrote is something like this:
for (String key : document.keySet()) {
// serialize the value (a String, number, or nested Map) to JSON text
String value = defaultObjectMapper.writeValueAsString(document.get(key));
scriptBuilder.append("ctx._source.");
scriptBuilder.append(key);
scriptBuilder.append('=');
scriptBuilder.append(value);
scriptBuilder.append(";");
}
where document is the Map.
Now I might get simple fields to update or a complex object.
I tried giving keys like user_roles.id and user_roles.name, and also tried giving the complete user_roles object, but nothing is working.
Can someone help out?
Try this with Groovy maps instead of verbatim JSON inside your script:
"script": {
"inline": "ctx._source.user_roles = [ 'id' : 5678, 'name' : 'hcsdl' ]"
}
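To tie this back to the update-by-query-on-version use case: rather than string-building the map syntax, it may be cleaner to pass the whole object through script params, which sidesteps the quoting problem entirely. A minimal sketch (the index name is a placeholder, and depending on your Elasticsearch version the script key may be "source" rather than "inline"):
POST /your_index/_update_by_query
{
"query" : {
"term" : { "version" : 1234 }
},
"script" : {
"inline" : "ctx._source.user_roles = params.user_roles",
"params" : {
"user_roles" : { "id" : 5678, "name" : "hcsdl" }
}
}
}
From the Java side, each Map entry can then go into the params object as-is instead of being serialized into the script string.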

Bulk indexing using Elasticsearch

Till now I was indexing data into Elasticsearch document by document, but as the data started increasing this became very slow and is not an optimized approach. So I searched for a bulk-insert mechanism and found the Elastic Bulk API; the documentation on the official site confused me. The approach I am using is to pass the data as a WebRequest and execute it on the Elasticsearch server. While creating a batch/bulk insert request, the API wants us to form a template like
localhost:9200/_bulk as URL and
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
to index a document with id 1 and field1 value as value1. The API also suggests sending the data as unpretty JSON (one line per action or document, so newlines don't need escaping). So how should I structure my data to pass multiple documents with multiple properties?
I tried it like this in the Firefox RESTClient, with POST and a JSON header, but RESTClient throws an error, and I know it's not valid JSON:
{ "index" : { "_index" : "indexName", "_type" : "type1", "_id" : "111" },
{ "Name" : "CHRIS","Age" : "23" },"Gender" : "M"}
Your data is not well-formed:
You don't need the comma after the first line.
You're missing a closing } on the first line.
You have a closing } in the middle of your second line; it needs to be removed as well.
The correct way of formatting your data for a bulk insert looks like this:
curl -XPOST localhost:9200/_bulk -H 'Content-Type: application/x-ndjson' -d '
{ "index" : { "_index" : "indexName", "_type" : "type1", "_id" : "111" }}
{ "Name" : "CHRIS", "Age" : "23", "Gender" : "M" }
'
This will work.
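To send multiple documents in a single request, just repeat the action/source pairs, one pair per document. A sketch, with the second document's values being hypothetical:
curl -XPOST localhost:9200/_bulk -H 'Content-Type: application/x-ndjson' -d '
{ "index" : { "_index" : "indexName", "_type" : "type1", "_id" : "111" }}
{ "Name" : "CHRIS", "Age" : "23", "Gender" : "M" }
{ "index" : { "_index" : "indexName", "_type" : "type1", "_id" : "112" }}
{ "Name" : "ALEX", "Age" : "25", "Gender" : "F" }
'
Each document takes two lines — an action line and a source line — and the whole body must end with a newline.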
UPDATE
Using Postman on Chrome, make sure to add a new line after the second line (the document source).
Using Elasticsearch 7.9.2, I was getting an error about a missing new line when sending the bulk update. This is weird, but after adding a new line at the end of all the operations, it works fine in Postman.

Nested field query for MongoDB (using Ruby)

Sup, good folks of the internet.
Does anyone know how to make nested queries for mongodb? This is probably best explained by an example. To retrieve specific fields, I can use the :fields option to retrieve that field (e.g. suppose it is called "useful_field"):
collection.find({},{:fields => {"useful_field" => 1}})
But suppose that useful_field itself contains an array of many further fields, i.e.:
useful_field = [{"value_I_want"=>"useful","value_I_dont_want"=>"not_useful"}]
My aim is to select "value_I_want". Any thoughts?
Here is a specific entry that I am trying to deal with (a reply to a tweet):
{ "_id" : ObjectId("51b6f71b0364718d71e4bca5"),
"annotations" : { },
"resultType" : "Tweet",
"score" : 1,
"groupName" : "TweetsWithConversation",
"results" : [
{
"kind" : "Tweet",
"score" : 1,
"annotations" : { "ConversationRole" : "Ancestor" },
"value" : { "created_at" : "Fri Jun 07 19:47:51 +0000 2013",
"id" : NumberLong("343091955196104704"),
"id_str" : "343091955196104704",
"text" : "THIS_IS_WHAT_I_WANT",
etc. etc. (Apologies for the odd formatting)
I'm trying to use a method of the form that will let me do something like this:
db.collection.find({},{:fields { some_way_of_selecting(THIS_IS_WHAT_I_WANT)})
(I'm querying as part of a ruby script)
Otherwise, I'll have to go back into the dark world of regex. No-one wants that.
Try the following
db.collection.find({},{"useful_field.value_I_want": 1})
Maybe try this:
db.collection.find({"resultType" : "Tweet"}, {"results" : {$elemMatch : {"value.text" : "THIS_IS_WHAT_I_WANT"}}})
What you are trying to do is called "projection" - it's specifying what fields you want returned in the second argument to find.
In your case you simply want:
db.collection.find({}, {"results.value.text":1} )
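Since the query runs from a Ruby script, the same projection in the question's :fields syntax would presumably look like the following sketch (assuming the legacy Ruby driver that accepts the :fields option, as in the original example):
collection.find({}, {:fields => {"results.value.text" => 1}})
Each returned document then contains only _id plus the nested text values under results.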

MongoDB MapReduce: Emit key from array based on condition

I am new to MongoDB, so excuse me if this is rather trivial. I would really appreciate the help.
The idea is to generate a histogram over some specific values, in this case the MIME types of some files. For that I am using a map-reduce job.
I have a Mongo collection with documents in the following form:
{
"_id" : ObjectId("4fc5ed3e67960de6794dd21c"),
"name" : "some name",
"uid" : "some app specific uid",
"collection" : "some name",
"metadata" : [
{
"key" : "key1",
"value" : "Plain text",
"status" : "SINGLE_RESULT"
},
{
"key" : "key2",
"value" : "text/plain",
"status" : "SINGLE_RESULT"
},
{
"key" : "key3",
"value" : 3469,
"status" : "OK"
}
]
}
Please note that almost every document has more metadata key/value pairs than shown here.
Map Reduce job
I tried doing the following:
function map() {
var mime = "";
this.metadata.forEach(function (m) {
if (m.key === "key2") {
mime = m.value;}
});
emit(mime, {count:1});
}
function reduce(key, values) {
var res = {count: 0};
values.forEach(function (v) { res.count += v.count; });
return res;
}
db.collection.mapReduce(map, reduce, {out: { inline : 1}})
This seems to work for a small number of documents (~15K), but the problem is that iterating through all the metadata key/value pairs takes a lot of time during the map phase. When running this on more documents (~1 million), the operation takes forever.
So my question is:
Is there some way I can emit the MIME type (the value) directly, instead of iterating through all keys and selecting it? Or is there a better way to write the map and reduce functions?
Something like emit (this.metadata.value {$where this.metadata.key:"key2"}) or similar...
Thanks for your help!
Two thoughts ...
First thought: How attached are you to this document schema? Could you instead have the metadata field value as an embedded document rather than an embedded array, like so:
{
"_id" : ObjectId("4fc5ed3e67960de6794dd21c"),
"name" : "some name",
"uid" : "some app specific uid",
"collection" : "some name",
"metadata" : {
"key1" : {
"value" : "Plain text",
"status" : "SINGLE_RESULT"
},
"key2": {
"value" : "text/plain",
"status" : "SINGLE_RESULT"
},
"key3" : {
"value" : 3469,
"status" : "OK"
}
}
}
Then your map step does away with the loop entirely:
function map() {
emit( this.metadata["key2"].value, { count : 1 } );
}
At that point, you might even be able to cast this as a "group" command rather than a "mapReduce".
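If your MongoDB version includes the aggregation framework (2.2+), the group-style histogram can also be expressed as a pipeline, and this one works against the existing array schema as well — a sketch, untested:
db.collection.aggregate([
{ $unwind : "$metadata" },
{ $match : { "metadata.key" : "key2" } },
{ $group : { _id : "$metadata.value", count : { $sum : 1 } } }
])
This unwinds the metadata array, keeps only the "key2" entries, and counts documents per MIME type.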
Second thought: Absent a schema change like that, particularly if "key2" appears early in the metadata array, you could at least exit the loop early once the key is found to save yourself some iterations. Note that you cannot break out of a forEach callback, so use a plain for loop, like so:
function map() {
var mime = "";
// a plain for loop allows an early break; forEach callbacks cannot break
for (var i = 0; i < this.metadata.length; i++) {
if (this.metadata[i].key === "key2") {
mime = this.metadata[i].value;
break;
}
}
emit(mime, {count: 1});
}
Not sure if either path is the key to victory, but hopefully helpful thoughts. Best of luck!
