Elasticsearch - How to update a document

How does Elasticsearch update a document? Does it delete the original document and create a new one? I've heard that this is how NoSQL databases handle updates. Does Elasticsearch do the same as other NoSQL databases, or does it replace/insert only the fields that need to change?

For example, I'm running Elasticsearch 7.0.0.
First, I created a document:
PUT /employee/_doc/1
{
"first_name" : "John",
"last_name" : "Snow",
"age" : 19,
"about" : "King in the north",
"sex" : "male"
}
Then I updated it via:
POST /employee/_update/1/
{
"doc": {
"first_name" : "Aegon",
"last_name" : "Targaryen",
"skill": "fighting and leading"
}
}
Finally, I got the correct result from:
GET /employee/_doc/1
{
"_index" : "employee",
"_type" : "_doc",
"_id" : "1",
"_version" : 9,
"_seq_no" : 11,
"_primary_term" : 1,
"found" : true,
"_source" : {
"first_name" : "Aegon",
"last_name" : "Targaryen",
"age" : 19,
"about" : "King in the north",
"sex" : "male",
"skill" : "fighting and leading"
}
}

Documents in Elasticsearch are immutable objects. Updating a document always means reindexing it, which consists of the following steps:
Retrieve the JSON (that you want to reindex)
Change it
Delete the old document
Index a new document
Elasticsearch documentation
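The four steps above can be sketched with a small in-memory model. This is only an illustration, not how Elasticsearch is implemented: TinyIndex, its live and deleted attributes, and the shallow dictionary merge are all assumptions made for this example (a real partial update also merges nested objects).

```python
import copy


class TinyIndex:
    """Toy stand-in for an index: stored documents are never changed in place."""

    def __init__(self):
        self.live = {}     # doc id -> (version, source)
        self.deleted = []  # old copies that have been marked as deleted

    def index(self, doc_id, source):
        old = self.live.get(doc_id)
        version = 1
        if old is not None:
            # Step 3: the old copy is marked as deleted, not edited.
            self.deleted.append((doc_id,) + old)
            version = old[0] + 1
        # Step 4: a complete new document is indexed under the same id.
        self.live[doc_id] = (version, copy.deepcopy(source))
        return version

    def update(self, doc_id, partial):
        _, current = self.live[doc_id]      # Step 1: retrieve the JSON
        changed = copy.deepcopy(current)
        changed.update(partial)             # Step 2: change it (shallow merge)
        return self.index(doc_id, changed)  # Steps 3 and 4


idx = TinyIndex()
idx.index("1", {"first_name": "John", "last_name": "Snow", "age": 19})
new_version = idx.update(
    "1",
    {"first_name": "Aegon", "last_name": "Targaryen", "skill": "fighting and leading"},
)
```

After the update, idx.live holds one merged document with a higher version, while idx.deleted still holds the untouched "John Snow" copy, which is the point of the immutability claim.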

For the answer you can check the documentation:
In addition to being able to index and replace documents, we can also
update documents. Note though that Elasticsearch does not actually do
in-place updates under the hood. Whenever we do an update,
Elasticsearch deletes the old document and then indexes a new document
with the update applied to it in one shot.

Related

What is the difference between modifying and updating in Elasticsearch?

I am following the official Elasticsearch docs, which have a section on modifying documents: https://www.elastic.co/guide/en/elasticsearch/reference/6.2/_modifying_your_data.html
So I already have a document under /customer/_doc/1:
{
"_index" : "customer",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"_seq_no" : 1,
"_primary_term" : 1,
"found" : true,
"_source" : {
"name" : "ajay"
}
}
Below is the request to "modify":
PUT /customer/_doc/1
{
"firstname": "ajay",
"lastname": "tanwar"
}
A GET then returns the updated document:
{
"_index" : "customer",
"_type" : "_doc",
"_id" : "1",
"_version" : 2,
"_seq_no" : 2,
"_primary_term" : 1,
"found" : true,
"_source" : {
"firstname" : "ajay",
"lastname" : "tanwar"
}
}
The next page of the docs is Updating Documents: https://www.elastic.co/guide/en/elasticsearch/reference/6.2/_updating_documents.html
Below is the request used to "update":
POST /customer/_doc/1/_update
{
"doc":{
"firstname": "ajay",
"lastname": "tanwar"
}
}
This also returns the same result as "modify".
I noticed two differences between them:
The "modify" request increments _version on every request, whereas the "update" request keeps _version the same.
The "modify" request's response contains "result" : "updated", whereas the "update" request's response contains "result" : "noop".
But I have a few doubts. First of all, why does "modify" return "result" : "updated"? The docs themselves say it's a modification operation. And why does "update" return "result" : "noop"? What is noop, by the way?
And logically, modifying and updating are the same thing. What is the purpose of these two different APIs?
When you modify a document, you delete the old document and insert an entirely new document in its place. This is similar to HTTP's PUT method, in that it simply replaces the old document with whatever is sent in the HTTP body.
When you update a document, you make changes to the old document. Internally, Elasticsearch also deletes the old document and inserts a new (updated) document. However, this operation should be treated as if it just made changes to the old document. This is similar to HTTP's PATCH method, in that it keeps the old document and applies only the changes sent in the HTTP body.
"result" : "updated" means changes were made to the stored document, whereas "result" : "noop" (no operation) means nothing was written, because the document after the update would have been identical to the one before it. That is also why _version stayed the same.

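To make the replace / merge / noop distinction from the answer above concrete, here is a rough in-memory sketch. The functions put_document and update_document, the store dictionary, and the "age" field are made up for this example. They only mimic the behaviour described in the thread and are not an Elasticsearch client.

```python
def put_document(store, doc_id, body):
    """Replace semantics, like the "modify" (PUT) request: the body becomes the whole document."""
    previous = store.get(doc_id)
    version = 1 if previous is None else previous["_version"] + 1
    store[doc_id] = {"_version": version, "_source": dict(body)}
    return {"_version": version, "result": "created" if previous is None else "updated"}


def update_document(store, doc_id, partial):
    """Merge semantics, like the "update" request: only the given fields change."""
    current = store[doc_id]
    merged = {**current["_source"], **partial}
    if merged == current["_source"]:
        # Nothing would change, so nothing is written and the version stays put.
        return {"_version": current["_version"], "result": "noop"}
    version = current["_version"] + 1
    store[doc_id] = {"_version": version, "_source": merged}
    return {"_version": version, "result": "updated"}


store = {}
r1 = put_document(store, "1", {"name": "ajay"})
r2 = put_document(store, "1", {"firstname": "ajay", "lastname": "tanwar"})
r3 = update_document(store, "1", {"firstname": "ajay", "lastname": "tanwar"})
r4 = update_document(store, "1", {"age": 30})      # "age" is a hypothetical extra field
r5 = put_document(store, "1", {"firstname": "ajay"})
```

Here r2 drops the old "name" field because PUT replaces everything, r3 is a noop because the merged document is unchanged, r4 adds a field and keeps the others, and r5 again discards everything that is not in the request body.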
Using field instead of "_id" for more-like-this query

I have a slug field that I want to use, instead of the "_id" field, to identify the object to use as a reference. But instead of using it as a reference, "doc" seems to use it as a query to compare against. Since slug is a unique field with a simple analyzer, the query returns exactly one result, as shown below. As far as I know, there is no way to use a custom field as the _id field:
https://github.com/elastic/elasticsearch/issues/6730
So is a double lookup (finding Elasticsearch's id first, then running more_like_this) the only way to achieve what I am looking for? Someone seems to have asked a similar question three years ago, but it doesn't have an answer.
ArticleDocument.search().query(
    "bool",
    should=Q(
        "more_like_this",
        fields=["slug", "text"],
        like={"doc": {"slug": "OEXxySDEPWaUfgTT54QvBg"}, "_index": "article", "_type": "doc"},
        min_doc_freq=1,
        min_term_freq=1,
    ),
).to_queryset()
Returns:
<ArticleQuerySet [<Article: OEXxySDEPWaUfgTT54QvBg)>]>
You can use one of your document's fields as the _id when ingesting data.
Logstash
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "my_name"
document_id => "%{some_field_id}"
}
}
Spark (Scala)
DF.saveToEs("index_name" + "/some_type", Map("es.mapping.id" -> "some_field_id"))
Index API
PUT twitter/_doc/1
{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}
{
"_shards" : {
"total" : 2,
"failed" : 0,
"successful" : 2
},
"_index" : "twitter",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"_seq_no" : 0,
"_primary_term" : 1,
"result" : "created"
}
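If the documents are ingested with the slug as their _id, as suggested above, the more_like_this query can point at the reference document directly instead of passing it as an artificial "doc". A minimal sketch that only builds the request body follows. The helper name more_like_this_by_id is made up here, and it assumes the article index was (re)indexed so that _id equals the slug.

```python
def more_like_this_by_id(index, doc_id, fields):
    """Build a search body that asks for documents similar to an already indexed one."""
    return {
        "query": {
            "more_like_this": {
                "fields": fields,
                # Reference the stored document by index and id.
                "like": [{"_index": index, "_id": doc_id}],
                "min_term_freq": 1,
                "min_doc_freq": 1,
            }
        }
    }


body = more_like_this_by_id("article", "OEXxySDEPWaUfgTT54QvBg", ["text"])
```

The resulting dictionary can be sent as the body of a _search request. Only "text" is listed in fields here, since a unique slug is unlikely to say anything about similarity.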

Get from Elasticsearch why a result is a hit

In the Elasticsearch query below, I search for the word Balances in two fields, name and notes:
GET /_search
{ "query": {
"multi_match": { "query": "Balances",
"fields": ["name","notes"]
}
}
}
And the result, which matched in the name field:
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.673515,
"hits" : [
{
"_index" : "idx",
"_type" : "_doc",
"_id" : "25",
"_score" : 1.673515,
"_source" : {
"name" : "Deposits checking accounts balances",
"notes" : "These are the notes",
"@timestamp" : "2019-04-18T21:05:00.387Z",
"id" : 25,
"@version" : "1"
}
}
]
}
Now, I want to know in which field ElasticSearch found the value. I could evaluate the result and see if the searched text is in name or notes, but I cannot do that if it's a fuzzy search.
Can ElasticSearch tell me in which field the text was found, and in addition provide a snippet with 5 words to the left and to the right of the result to tell the user why the result is a hit?
What I want to achieve is similar to Google highlighting in bold the text that was found within a phrase.
I think the two solutions from "Find out which fields matched in a multi match query" are still valid:
Use highlighting to find it.
Split the query up into multiple named match queries.
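Both suggestions can be combined in a single request. The sketch below only builds the request body and reads a response hit. The helper names and the "matched_<field>" naming scheme are made up for this example, and sample_hit is hand-written to show the shape of a response, not real output from a cluster.

```python
def build_explained_search(text, fields):
    """One named, fuzzy match query per field, plus highlighting on the same fields."""
    should = [
        {"match": {field: {"query": text, "fuzziness": "AUTO", "_name": f"matched_{field}"}}}
        for field in fields
    ]
    return {
        "query": {"bool": {"should": should, "minimum_should_match": 1}},
        "highlight": {"fields": {field: {} for field in fields}},
    }


def explain_hit(hit):
    """Collect the names of the queries that matched and the highlighted snippets."""
    return {
        "matched_queries": hit.get("matched_queries", []),
        "snippets": hit.get("highlight", {}),
    }


body = build_explained_search("Balances", ["name", "notes"])

# Hand-written example of what a hit could look like for this request.
sample_hit = {
    "_id": "25",
    "matched_queries": ["matched_name"],
    "highlight": {"name": ["Deposits checking accounts <em>balances</em>"]},
}
why = explain_hit(sample_hit)
```

In a real response, matched_queries tells you which field-specific query fired, and highlight carries the snippet. Note that the highlighter's fragment_size option is measured in characters, so "5 words to the left and right" can only be approximated.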

Querying against an array only works for first element

I am having an issue searching for items within an array in a document; a simple tagging system in my case. I have a relatively simple document representing a recipe. This is a truncated version of the data in the index:
{
"_index" : "recipes",
"_type" : "recipe",
"_id" : "37",
"_version" : 1,
"found" : true,
"_source" : {
"id" : 37,
"title" : "Crab Cakes",
"tags" : [
"seafood",
"appetizer"
]
}
}
When I search for the tag seafood it matches this recipe. However, when I search for the tag appetizer, I get nothing. Here is the explain for a very basic appetizer query:
curl -XGET 'http://localhost:9200/recipes/recipe/37/_explain?pretty' -H 'Content-Type: application/json' -d'{"query":{"term":{"tags":"appetizer"}}}'
Which results in this:
{
"_index" : "recipes",
"_type" : "recipe",
"_id" : "37",
"matched" : false,
"explanation" : {
"value" : 0.0,
"description" : "no matching term",
"details" : [ ]
}
}
The correct answer came in a comment from sramalingam24: change the query to a match query instead of a term query.
[updated]
I also tested switching the tags to be a keyword field and that works as well. This is the solution I ended up going with.
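The post does not say which analyzer the tags field used, so the following is only one plausible explanation, shown as a toy model. A term query looks up the exact value in the index, while a match query first runs the search text through the field's analyzer. If that analyzer rewrites "appetizer" but leaves "seafood" alone, you get exactly the behaviour described. The toy_analyze function and its suffix rule are invented for this illustration.

```python
def toy_analyze(text):
    """Hypothetical analyzer: lowercase, split on whitespace, crude suffix stripping."""
    tokens = []
    for word in text.lower().split():
        for suffix in ("izer", "ing"):
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                word = word[: -len(suffix)]
                break
        tokens.append(word)
    return tokens


def term_query(indexed_terms, value):
    # No analysis: the value must equal an indexed term exactly.
    return value in indexed_terms


def match_query(indexed_terms, text):
    # The search text goes through the same analyzer as the indexed values.
    return any(token in indexed_terms for token in toy_analyze(text))


tags = ["seafood", "appetizer"]
text_field_terms = {token for tag in tags for token in toy_analyze(tag)}
keyword_field_terms = set(tags)  # a keyword field stores the values unchanged

results = {
    "term seafood on text": term_query(text_field_terms, "seafood"),
    "term appetizer on text": term_query(text_field_terms, "appetizer"),
    "match appetizer on text": match_query(text_field_terms, "appetizer"),
    "term appetizer on keyword": term_query(keyword_field_terms, "appetizer"),
}
```

In this model the term query for "appetizer" fails on the analyzed field because only "appet" was indexed, while both the match query and the keyword field find it, which lines up with the two fixes mentioned above.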

Search Elasticsearch for all content from a specific source

I want to know if it is possible to search all content from a specific _source in Elasticsearch.
For example, I have this:
{
"_shards":{
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits":{
"total" : 1,
"hits" : [
{
"_index" : "twitter",
"_type" : "tweet",
"_id" : "1",
"_source" : {
"user" : "kimchy",
"postDate" : "2009-11-15T14:12:12",
"message" : "trying out Elastic Search"
}
}
]
}
}
I want to query all users from the source without specifying the name.
For example, something similar in SQL would be:
SELECT user FROM twitter
which would return all users.
Thanks, and sorry for my bad English.
Edit:
I want to search only within the source.
To give an example: I have a source that stores random words. Sometimes it stores something, sometimes not. I want to search this source only when it has new words.
The plan is to check whether anything new has arrived in my specific source in the last 10 minutes. If not, I don't care.
You can just run:
$ curl -XGET 'http://localhost:9200/twitter/_search'
By default, that returns 10 documents. You can specify a size:
$ curl -XGET 'http://localhost:9200/twitter/_search?size=BIGNUM'
Or you can use scroll: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
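For the "anything new in the last 10 minutes" part of the question, a request body along these lines might work. It is a sketch that only builds the JSON: the helper name recent_docs_body is made up, and it assumes postDate is mapped as a date field so that the range filter with date math applies.

```python
def recent_docs_body(field, timestamp_field, minutes=10, size=100):
    """Search body: documents that have `field` and are newer than `minutes` minutes."""
    return {
        "size": size,
        "_source": [field],  # return only this field from each hit
        "query": {
            "bool": {
                "filter": [
                    {"exists": {"field": field}},
                    {"range": {timestamp_field: {"gte": f"now-{minutes}m"}}},
                ]
            }
        },
    }


body = recent_docs_body("user", "postDate")
```

Sending this dictionary as the body of a request to twitter/_search would return only the user values of recent documents. An empty hits list then means nothing new arrived in that window.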
