Return document on update elasticsearch - elasticsearch

Let's say I'm updating user data:
curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
"doc" : {
"name" : "new_name"
},
"fields": ["_source"]
}'
Here's an example of what I get back when I perform an update:
{
"_index" : "test",
"_type" : "type1",
"_id" : "1",
"_version" : 4
}
How do I perform an update that returns the given document post update?

The documentation is a little misleading with regard to returning fields when performing an Elasticsearch update. It actually uses the same approach as the Index API: the parameter is passed on the URL, not as a field in the update body.
In your case you would submit:
curl -XPOST 'localhost:9200/test/type1/1/_update?fields=_source' -d '{
"doc" : {
"name" : "new_name"
}
}'
In my testing in Elasticsearch 1.2.1 it returns something like this:
{
"_index":"test",
"_type":"testtype",
"_id":"1",
"_version":9,
"get": {
"found":true,
"_source": {
"user":"john",
"body":"testing update and return fields",
"name":"new_name"
}
}
}
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-update.html
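To illustrate the point that the fields parameter belongs on the URL and not in the body, here is a minimal Python sketch that only builds the request and does not send it. build_update_request is a hypothetical helper, not part of any Elasticsearch client, and it assumes the 1.x URL layout used in this answer.

```python
import json
from urllib.parse import urlencode


def build_update_request(host, index, doc_type, doc_id, partial_doc, fields=("_source",)):
    """Return (url, body) for a partial update that asks for fields back.

    Hypothetical helper: it only assembles strings, nothing is sent.
    """
    # The list of fields to return goes in the query string ...
    query = urlencode({"fields": ",".join(fields)})
    url = f"http://{host}/{index}/{doc_type}/{doc_id}/_update?{query}"
    # ... while the body carries only the partial document.
    body = json.dumps({"doc": partial_doc})
    return url, body


url, body = build_update_request(
    "localhost:9200", "test", "type1", "1", {"name": "new_name"}
)
print(url)
print(body)
```

The resulting URL and body can be passed to curl or any HTTP client of your choice.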

Related

ES curl for email is not returning correct results despite knowing that it does exist

I ran a query for the term "owner", and one of the returned documents showed the email of an owner. I figured that, to look at all Houses which have this email, I could query on email instead of owner.
When I run the following curl request, it doesn't return any actual matches.
curl -X GET "localhost:9200/_search/?pretty" -H "Content-Type: application/json" -d'{"query": {"match": {"email": {"query": "test.user#gmail.com"}}}}'
It does not return the correct information. I wanted to find an exact result, so I was also thinking of using a term query:
curl -X GET "localhost:9200/_search/?pretty" -H "Content-Type: application/json" -d'{"query": {"term": {"email": "test.user#gmail.com"}}}'
in an attempt to find an exact match. This seems to return no documents at all. I think it might have something to do with the periods or maybe the # symbol.
I have also tried match with the email wrapped in escaped quotes and with escaped periods.
Is there something going on I am unaware of with special characters?
Elasticsearch is not schema-free. It is now described as "schema on write", which is a very good name for the schema generation process. When Elasticsearch receives a new document with unknown fields, it makes an "educated guess".
When you index the first document with the field "email", Elasticsearch looks at the value provided and creates a mapping for this field.
The value "test.user#gmail.com" will then be mapped to the "text" mapping type.
Now, let's see how Elasticsearch processes a simple document with an email. Create a document:
POST /my_first_index/_doc
{"email": "nobody#example.com"}
Curious what the mapping looks like? Here you go:
GET /my_first_index/_mapping
The response will be:
{
"my_first_index" : {
"mappings" : {
"properties" : {
"email" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
As you can see, "type" : "text" indicates the mapping type "text", as assumed before. There is also a subfield "keyword", which Elasticsearch creates automatically for text fields by default.
We have two options now. The easy one is to query the keyword subfield (please note the dot notation):
GET /my_first_index/_search
{"query": {"term": {"email.keyword": "nobody#example.com"}}}
Done!
The other option is to create a specific mapping for our index. To do so, we need a new, empty index with the mapping defined. We can do both in one shot:
PUT /my_second_index/
{
"mappings" : {
"properties" : {
"email" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
Now let us populate the index (here I'm adding two documents):
POST /my_second_index/_doc
{"email": "nobody#example.com"}
POST /my_second_index/_doc
{"email": "anybody#example.com"}
And now your unchanged query should work:
GET /my_second_index/_search
{"query": {"term": {"email": "anybody#example.com"}}}
Response:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.2876821,
"hits" : [
{
"_index" : "my_second_index",
"_type" : "_doc",
"_id" : "OTf3n28BpmGM8iQdGR4j",
"_score" : 0.2876821,
"_source" : {
"email" : "anybody#example.com"
}
}
]
}
}

ES 1.5 Delete By Query API not working

I am using an old version of Elasticsearch: 1.5.
Problem: I need to delete a lot of documents, from a few hundred thousand up to a few million. I have all the info about the records, including their _ids, so an array of _ids is what I want to use.
Scale problem: I previously ran this deletion in a loop, but ES behaves inconsistently when performing many consecutive operations at high speed. So I decided to look for a bulk delete.
I am trying to make use of the delete by query API.
The docs state:
curl -XDELETE 'http://localhost:9200/twitter/tweet/_query' -d '{
"query" : {
"term" : { "user" : "kimchy" }
}
}
'
What I'm doing:
curl -XDELETE 'http://localhost:9200/my_index/logs/_query' -d '{
"query" : {
"terms" : { "_id" : ["AVTD6fhLAn35BG25xbZz", "AVTD6fhLAn35BG25xbaC"] }
}
}
'
The response is:
{
"found":false,
"_index":"my_index",
"_type":"logs",
"_id":"_query",
"_version":1,
"_shards":{"total":2, "successful":1, "failed":0}
}
And it does not remove any of the documents. How do I make it work and actually delete these records?
I'm not sure about the delete by query API in Elasticsearch 1.5. It seems that Elasticsearch does not interpret your request as a query: it is looking for a document with "_id": "_query" (as is evident from the response you posted).
What you can do instead is use the Bulk API, as documented here:
https://www.elastic.co/guide/en/elasticsearch/reference/1.5/docs-bulk.html
As in the example on that doc page, you can do:
curl -s -XPOST localhost:9200/_bulk --data-binary @requests; echo
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
...
Create a file with any name ("requests" here) containing the individual delete actions, one per line. Every line, including the last one, must end with a newline character.
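Since the question starts from an array of _ids, the contents of such a file can be generated rather than written by hand. The following is a minimal Python sketch under that assumption. build_bulk_delete_body and chunked are hypothetical helper names, and the batch size is something you would have to tune yourself.

```python
import json


def build_bulk_delete_body(index, doc_type, ids):
    """Build the newline-delimited body for a Bulk API call that deletes by _id."""
    # One JSON action per line; every line, including the last, ends with "\n".
    return "".join(
        json.dumps({"delete": {"_index": index, "_type": doc_type, "_id": doc_id}}) + "\n"
        for doc_id in ids
    )


def chunked(items, size):
    """Yield consecutive slices so millions of ids are not sent in one request."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


ids = ["AVTD6fhLAn35BG25xbZz", "AVTD6fhLAn35BG25xbaC"]
body = build_bulk_delete_body("my_index", "logs", ids)
print(body, end="")
```

Each chunk's body can be written to a file and posted with the curl command above, or sent directly with an HTTP client.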

Elasticsearch does not find an existing document using the query DSL

I don't know why, but searching for a document with URI Search returns the right document, while the same document is not found when I use the query DSL.
To reproduce the issue:
With no index created, I insert this document:
curl http://localhost:9299/integrationtest-index/searchable/ID_XXXX2 -d '{ "ref" : "XXXX2", "field1" : "value1" }'
So the index is created automatically with the default mapping (type searchable):
curl http://localhost:9299/integrationtest-index?pretty
{
"integrationtest-index" : {
"aliases" : { },
"mappings" : {
"searchable" : {
"properties" : {
"field1" : {
"type" : "string"
},
"ref" : {
"type" : "string"
}
}
}
},
"settings" : {
"index" : {
"field1" : "value1",
"ref" : "XXXX2",
"number_of_shards" : "5",
"creation_date" : "1466780216631",
"number_of_replicas" : "1",
"uuid" : "GBj2VF-wQy6JP74AqoIn5g",
"version" : {
"created" : "2020099"
}
}
},
"warmers" : { }
}
}
This query returns one document:
curl http://localhost:9299/integrationtest-index/searchable/_search?q=ref:XXXX2
But this other query responds that the document does not exist:
curl -XPOST http://localhost:9299/integrationtest-index/searchable/_search/exists -d '
{
"query": {
"term" : {
"ref" : "XXXX2"
}
}
}'
Why does the last query say that the document does not exist?
Environment:
ElasticSearch 2.2.0
Ubuntu 16.04 LTS
OpenJDK Runtime Environment (build 1.8.0_91-8u91-b14-0ubuntu4~16.04.1-b14)
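I run into the same problem every few months, so I decided to answer my own question and share my silly mistakes.
By default, Elasticsearch indexes string fields as analyzed (index: analyzed), so the value "XXXX2" is lowercased to "xxxx2" at index time. A term query is not analyzed, so it looks for the exact term "XXXX2" and does not find any document.
If you use URI Search, Elasticsearch executes a query_string query, which is analyzed, and not a term query.
This query works: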
curl -XPOST http://localhost:9299/integrationtest-index/searchable/_search/exists -d '
{
"query": {
"match" : {
"ref" : "XXXX2"
}
}
}'
More information is available in the documentation, in the section "Why doesn't the term query match my document?"
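The difference between the two query types can be sketched with a toy inverted index in Python. This is only an illustration: analyze is a crude stand-in for the default analyzer (split on non-alphanumerics, lowercase), and ToyIndex is a hypothetical class, not how Elasticsearch is implemented.

```python
import re


def analyze(text):
    """Crude stand-in for the default analyzer: tokenize, then lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


class ToyIndex:
    def __init__(self):
        self.postings = {}  # term -> set of document ids

    def index(self, doc_id, text):
        # Analyzed field: only the lowercased tokens end up in the index.
        for term in analyze(text):
            self.postings.setdefault(term, set()).add(doc_id)

    def term_query(self, value):
        # A term query looks the value up exactly as given, with no analysis.
        return self.postings.get(value, set())

    def match_query(self, text):
        # A match query analyzes the text first, then looks up each term.
        hits = set()
        for term in analyze(text):
            hits |= self.postings.get(term, set())
        return hits


idx = ToyIndex()
idx.index("ID_XXXX2", "XXXX2")
print(idx.term_query("XXXX2"))   # no hits: the index only holds "xxxx2"
print(idx.match_query("XXXX2"))  # one hit: the query text is lowercased first
print(idx.term_query("xxxx2"))   # one hit: the exact indexed term
```

In this model a term query for the lowercased value does match, which is consistent with the explanation above.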

Elasticsearch index last update time

Is there a way to retrieve from ElasticSearch information on when a specific index was last updated?
My goal is to be able to tell when it was the last time that any documents were inserted/updated/deleted in the index. If this is not possible, is there something I can add in my index modification requests that will provide this information later on?
You can get the modification time from the _timestamp field.
To make it easier to return the timestamp you can set up Elasticsearch to store it:
curl -XPUT "http://localhost:9200/myindex/mytype/_mapping" -d'
{
"mytype": {
"_timestamp": {
"enabled": "true",
"store": "yes"
}
}
}'
If I insert a document and then query on it I get the timestamp:
curl -XGET 'http://localhost:9200/myindex/mytype/_search?pretty' -d '{
"fields" : ["_timestamp"],
"query": {
"query_string": { "query":"*"}
}
}'
{
"took" : 7,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "myindex",
"_type" : "mytype",
"_id" : "1",
"_score" : 1.0,
"fields" : {
"_timestamp" : 1417599223918
}
} ]
}
}
Updating the existing document:
curl -XPOST "http://localhost:9200/myindex/mytype/1/_update" -d'
{
"doc" : {
"field1": "data",
"field2": "more data"
},
"doc_as_upsert" : true
}'
Re-running the previous query shows me an updated timestamp:
"fields" : {
"_timestamp" : 1417599620167
}
I don't know if anyone is still looking for an equivalent, but here is a workaround using shard stats for users of Elasticsearch > 5:
curl -XGET 'http://localhost:9200/_stats?level=shards'
As you'll see, it gives you per-index information on commits and/or flushes that you can use to see whether the index changed (or not).
I hope it helps someone.
Just looked into a solution for this problem. Recent Elasticsearch versions have a <index>/_recovery API.
This returns a list of shards and a field called stop_time_in_millis which looks like it is a timestamp for the last write to that shard.
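For the second half of the question (adding something to the modification requests themselves), one possible approach is to stamp every write with a timestamp field and ask for its maximum. The Python sketch below only builds the request bodies. The field name last_updated and both helper names are assumptions made for illustration, and this approach cannot see deletions, since a deleted document takes its stamp with it.

```python
import json
from datetime import datetime, timezone


def stamped_update_body(partial_doc, now=None):
    """Return an update body whose doc carries a last_updated stamp (epoch millis)."""
    now = now or datetime.now(timezone.utc)
    doc = dict(partial_doc)  # copy so the caller's dict is not modified
    doc["last_updated"] = int(now.timestamp() * 1000)  # hypothetical field name
    return {"doc": doc, "doc_as_upsert": True}


def latest_update_query():
    """Return a search body that asks only for the newest last_updated value."""
    return {"size": 0, "aggs": {"latest": {"max": {"field": "last_updated"}}}}


fixed = datetime(2014, 12, 3, 9, 33, 43, tzinfo=timezone.utc)
original = {"field1": "data", "field2": "more data"}
body = stamped_update_body(original, now=fixed)
print(json.dumps(body))
print(json.dumps(latest_update_query()))
```

The first body would be sent to the _update endpoint shown earlier and the second to the index's _search endpoint.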

null_value mapping in Elasticsearch

I have created a mapping for a tweetb type in a twitter index:
curl -XPUT http://www.mydomain.com:9200/twitter/tweetb/_mapping -d '{
"twitter": {
"mappings": {
"tweetb": {
"properties": {
"message": {
"type": "string",
"null_value": "NA"
}
}
}
}
}
}'
Then, I put one document:
curl -XPUT http://www.mydomain.com:9200/twitter/tweetb/1 -d '{"message": null}'
Then, I tried to get the inserted doc back:
curl -XGET http://www.mydomain.com:9200/twitter/tweetb/1
And that returned:
{
"_index": "twitter",
"_type": "tweetb",
"_id": "1",
"_version": 2,
"found" : true,
"_source" : { "message": null }
}
I was expecting "message" : "NA" in the _source field. However, it looks like "null_value" isn't working. Am I missing something?
The "null_value" field mapping does not change the value that is stored. Rather, it changes the value that is indexed and therefore used in searches.
If you try searching for your "message" using "NA", then it should appear in the results:
curl -XPOST http://www.mydomain.com:9200/twitter/tweetb/_search -d '{
"query" : {
"match" : { "message" : "NA" }
}
}'
Note that the hit will still show the actual value, null, in its _source. If you then add a new document whose raw value is literally "NA" and repeat the search, you should see both documents returned for the above query: one with the value "NA" and the other with null.
Similarly, this works for other queries as well, based on how the value is indexed. Because the indexed term is lowercased, a lowercase n.* matches but, semi-surprisingly, N.* does not:
curl -XPOST http://www.mydomain.com:9200/twitter/tweetb/_search -d '{
"query" : {
"regexp" : { "message" : "n.*" }
}
}'
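The behavior described in this answer can be mimicked with a small Python toy model. It is a sketch of the idea only: index_terms is a hypothetical function, its lowercasing tokenizer is a rough stand-in for the default analyzer, and re.fullmatch is used because a regexp query has to match a whole indexed term.

```python
import re


def index_terms(value, null_value=None):
    """Return the searchable terms for one field value in this toy model."""
    if value is None:
        value = null_value  # the substitution happens on the indexing side only
    if value is None:
        return []
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", value)]


source = {"message": None}  # what a GET returns: the stored document, untouched
terms = index_terms(source["message"], null_value="NA")

print(source)
print(terms)
print(any(re.fullmatch(r"n.*", term) for term in terms))  # lowercase pattern
print(any(re.fullmatch(r"N.*", term) for term in terms))  # uppercase pattern
```

In the model, the stored document keeps its null while the index holds the lowercased replacement term, so only the lowercase pattern finds it.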
