Elasticsearch index last update time - elasticsearch

Is there a way to retrieve from ElasticSearch information on when a specific index was last updated?
My goal is to be able to tell when it was the last time that any documents were inserted/updated/deleted in the index. If this is not possible, is there something I can add in my index modification requests that will provide this information later on?

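You can get the modification time from the _timestamp field. Note that _timestamp was deprecated in Elasticsearch 2.0 and removed in 5.0, so this applies to older versions only.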
To make it easier to return the timestamp you can set up Elasticsearch to store it:
curl -XPUT "http://localhost:9200/myindex/mytype/_mapping" -d'
{
"mytype": {
"_timestamp": {
"enabled": "true",
"store": "yes"
}
}
}'
If I insert a document and then query on it I get the timestamp:
curl -XGET 'http://localhost:9200/myindex/mytype/_search?pretty' -d '{
"fields": ["_timestamp"],
"query": {
"query_string": { "query": "*" }
}
}'
{
"took" : 7,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "myindex",
"_type" : "mytype",
"_id" : "1",
"_score" : 1.0,
"fields" : {
"_timestamp" : 1417599223918
}
} ]
}
}
Updating the existing document:
curl -XPOST "http://localhost:9200/myindex/mytype/1/_update" -d'
{
"doc" : {
"field1": "data",
"field2": "more data"
},
"doc_as_upsert" : true
}'
Re-running the previous query shows me an updated timestamp:
"fields" : {
"_timestamp" : 1417599620167
}
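The _timestamp values above are milliseconds since the Unix epoch. As a small sketch (the helper name millis_to_utc is made up for illustration, it is not part of any Elasticsearch client), this is one way to turn them into readable UTC datetimes and see how far apart the two writes were:

```python
from datetime import datetime, timedelta, timezone

# Unix epoch as a timezone-aware datetime, so results are explicit about UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis_to_utc(millis):
    """Convert epoch milliseconds (as returned in _timestamp) to a UTC datetime."""
    # timedelta arithmetic avoids float rounding on the millisecond part.
    return EPOCH + timedelta(milliseconds=millis)


# The two values are copied from the responses shown above.
first_write = millis_to_utc(1417599223918)
second_write = millis_to_utc(1417599620167)

print(first_write.isoformat())
print(second_write.isoformat())
print("seconds between writes:", (second_write - first_write).total_seconds())
```

Taking the maximum _timestamp over all documents (for example with a sort or a max aggregation) would give the last insert/update time, but it would not reflect deletions.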

I don't know whether anyone is still looking for an equivalent, but here is a workaround using shard stats for Elasticsearch 5+ users:
curl -XGET 'http://localhost:9200/_stats?level=shards'
As you'll see, the response contains per-index information on commits and/or flushes that you can use to check whether an index has changed.
I hope this helps someone.

Just looked into a solution for this problem. Recent Elasticsearch versions have a <index>/_recovery API.
This returns a list of shards and a field called stop_time_in_millis which looks like it is a timestamp for the last write to that shard.
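A rough sketch of how the _recovery response could be reduced to a single value. The response shape used here (index name, then a "shards" list whose entries carry stop_time_in_millis) and the sample data are assumptions written by hand, not real server output, and whether stop_time_in_millis really tracks the last write is the guess made above, not something this code verifies. The function name latest_stop_time is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def latest_stop_time(recovery_response, index_name):
    """Return the newest stop_time_in_millis across shards as a UTC datetime.

    Returns None when the index or the field is missing from the response.
    """
    shards = recovery_response.get(index_name, {}).get("shards", [])
    stops = [s["stop_time_in_millis"] for s in shards if "stop_time_in_millis" in s]
    if not stops:
        return None
    return EPOCH + timedelta(milliseconds=max(stops))


# Hand-written sample in the assumed shape of GET /myindex/_recovery.
sample = {
    "myindex": {
        "shards": [
            {"id": 0, "stage": "DONE", "stop_time_in_millis": 1417599223918},
            {"id": 1, "stage": "DONE", "stop_time_in_millis": 1417599620167},
        ]
    }
}

print(latest_stop_time(sample, "myindex"))
```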

Related

ES curl for email is not returning correct results despite knowing that it does exist

I ran a query for the term "owner", and a document in the results showed the owner's email. To find all Houses that have this email, I decided to query on email instead of owner.
When I run the following curl request, it doesn't return any actual hits.
curl -X GET "localhost:9200/_search/?pretty" -H "Content-Type: application/json" -d'{"query": {"match": {"email": {"query": "test.user#gmail.com"}}}}'
It does not return the correct information. I wanted an exact match, so I also tried a term query:
curl -X GET "localhost:9200/_search/?pretty" -H "Content-Type: application/json" -d'{"query": {"term": {"email": "test.user#gmail.com"}}}'
This returns no documents either. I suspect it has something to do with the periods or maybe the # symbol.
I have also tried match with the email wrapped in escaped quotes and with escaped periods.
Is there something going on I am unaware of with special characters?
Elasticsearch is not schema-free; it is now described as "schema on write", which is a very good name for the schema generation process. When Elasticsearch receives a new document with unknown fields, it makes an "educated guess".
When you index the first document with the field "email", Elasticsearch looks at the value provided and creates a mapping for this field.
The value "test.user#gmail.com" will then be mapped to the "text" mapping type.
Now, let's see how Elasticsearch processes a simple document with an email. Create a document:
POST /my_first_index/_doc
{"email": "nobody#example.com"}
Curious what the mapping looks like? Here you go:
GET /my_first_index/_mapping
The response is:
{
"my_first_index" : {
"mappings" : {
"properties" : {
"email" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
As you can see, "type" : "text" indicates the "text" mapping type, as assumed above. There is also a "keyword" subfield, which Elasticsearch creates automatically for text fields by default.
We have two options now. The easy one is to query the keyword subfield (note the dot notation):
GET /my_first_index/_search
{"query": {"term": {"email.keyword": "nobody#example.com"}}}
Done!
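If you build the request body in code, the same idea can be sketched like this. The helper name exact_match_query is made up for illustration, and the ".keyword" suffix assumes the default dynamic mapping shown above; with a custom mapping the subfield may be named differently or not exist at all.

```python
import json


def exact_match_query(field, value, keyword_subfield="keyword"):
    """Build a term query body that targets the un-analyzed keyword subfield.

    Pass keyword_subfield=None when the field itself is already mapped as keyword.
    """
    target = f"{field}.{keyword_subfield}" if keyword_subfield else field
    return {"query": {"term": {target: value}}}


# Body for: GET /my_first_index/_search
body = exact_match_query("email", "nobody#example.com")
print(json.dumps(body))
```

The printed JSON can be sent with curl -d or passed as the body argument of a client's search call.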
The other option is to create a specific mapping for our index. To do so, we need a new, empty index with the mapping defined. We can do both in one shot:
PUT /my_second_index/
{
"mappings" : {
"properties" : {
"email" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
Now let us populate the index (here I'm adding two documents):
POST /my_second_index/_doc
{"email": "nobody#example.com"}
POST /my_second_index/_doc
{"email": "anybody#example.com"}
And now your unchanged query should work:
GET /my_second_index/_search
{"query": {"term": {"email": "anybody#example.com"}}}
Response:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.2876821,
"hits" : [
{
"_index" : "my_second_index",
"_type" : "_doc",
"_id" : "OTf3n28BpmGM8iQdGR4j",
"_score" : 0.2876821,
"_source" : {
"email" : "anybody#example.com"
}
}
]
}
}

Elasticsearch-6.x norms false not working

This is what I have done:
First:
curl -X PUT "localhost:9200/log_20180419"
Second
curl -X PUT "localhost:9200/log_20180419/_mapping/_doc" -H 'Content-Type: application/json' -d'
{
"properties": {
"title": {
"type": "text",
"norms": false
}
}
}
'
Third
# I insert data with the Python client: elasticsearch-py
from elasticsearch import Elasticsearch
es_conn = Elasticsearch()
content_tmp = "acxzcasiuchxzuicbhasuicgzyugas%s"
for i in range(10000):
    result = content_tmp % i
    es_conn.index(index="log_20180419", body={"title": result}, doc_type="_doc")
Fourth
I query it:
curl -X GET "localhost:9200/cdn_log_20180419/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"match":{
"title":"dasuioczxuivcaduciqanbcaiushcauinhauincsaincdjkxzcbyquiwbjkfcznkajsbcjkzxhcuiasbcjkzxchjdsfasckjbjak9999"
}
}
}
'
Result is
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 7.2293553,
"hits" : [
{
"_index" : "cdn_log_20180419",
"_type" : "_doc",
"_id" : "oDR99mIBBZEcRu0i7LlO",
"_score" : 7.2293553,
"_source" : {
"title" : "dasuioczxuivcaduciqanbcaiushcauinhauincsaincdjkxzcbyquiwbjkfcznkajsbcjkzxhcuiasbcjkzxchjdsfasckjbjak9999"
}
}
]
}
}
As you can see, the result still has a _score field, which confuses me.
The documentation is here: https://www.elastic.co/guide/en/elasticsearch/reference/current/norms.html
The norm is only one part of scoring. The norm covers the field length norm and index-time boosting (if you are using that), but term frequency and inverse document frequency (TF/IDF) are independent of it.
If you don't need / want scoring for your query, look into boolean filters or constant score.
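As a minimal sketch of the constant score option, here is a request body that wraps the match query from the question in constant_score, so every hit gets the same score instead of a TF/IDF-based one. The helper name constant_score_match is hypothetical; only the query structure is Elasticsearch's.

```python
import json


def constant_score_match(field, text, boost=1.0):
    """Wrap a match query in constant_score so all hits share the same score."""
    return {
        "query": {
            "constant_score": {
                "filter": {"match": {field: text}},
                "boost": boost,
            }
        }
    }


# Body for: GET /log_20180419/_search
body = constant_score_match("title", "acxzcasiuchxzuicbhasuicgzyugas9999")
print(json.dumps(body, indent=2))
```

A _score field will still appear in each hit, but it will simply equal the boost value.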

Unable to search attachment type field in an ElasticSearch indexed document

Search does not return any results although I do have a document that should match the query.
I do have the ElasticSearch mapper-attachments plugin installed per https://github.com/elasticsearch/elasticsearch-mapper-attachments. I have also googled the topic as well as browsed similar questions in stack overflow, but have not found an answer.
Here's what I typed into a windows 7 command prompt:
c:\Java\elasticsearch-1.3.4>curl -XDELETE localhost:9200/tce
{"acknowledged":true}
c:\Java\elasticsearch-1.3.4>curl -XPUT localhost:9200/tce
{"acknowledged":true}
c:\Java\elasticsearch-1.3.4>curl -XPUT localhost:9200/tce/contact/_mapping -d{\"contact\":{\"properties\":{\"my_attachment\":{\"type\":\"attachment\"}}}}
{"acknowledged":true}
c:\Java\elasticsearch-1.3.4>curl -XPUT localhost:9200/tce/contact/1 -d{\"my_attachment\":\"SGVsbG8=\"}
{"_index":"tce","_type":"contact","_id":"1","_version":1,"created":true}
c:\Java\elasticsearch-1.3.4>curl localhost:9200/tce/contact/_search?pretty
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "tce",
"_type" : "contact",
"_id" : "1",
"_score" : 1.0,
"_source":{"my_attachment":"SGVsbG8="}
} ]
}
}
c:\Java\elasticsearch-1.3.4>curl localhost:9200/tce/contact/_search?pretty -d{\"query\":{\"term\":{\"my_attachment\":\"Hello\"}}}
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
Note that the base64 encoded value of "Hello" is "SGVsbG8=", which is the value I have inserted into the "my_attachment" field of the document.
I am assuming that the mapper-attachments plugin has been deployed correctly because I don't get an error executing the mapping command above.
Any help would be greatly appreciated.
What analyzer is running against the my_attachment field?
If it's the standard analyzer (I can't see any listed), then the Hello in the text will be lowercased in the index.
That is, when doing a term search (which does not analyze the query text), try searching for hello:
curl localhost:9200/tce/contact/_search?pretty -d'
{"query":
{"term":
{"my_attachment":"hello"
}}}'
You can also see which terms have been added to the index:
curl 'http://localhost:9200/tce/contact/_search?pretty=true' -d '{
"query" : {
"match_all" : { }
},
"script_fields": {
"terms" : {
"script": "doc[field].values",
"params": {
"field": "my_attachment"
}
}
}
}'
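To illustrate the lowercase point, here is a rough sketch that decodes the base64 payload from the question and approximates what an analyzer would put in the index. The function approximate_terms is hypothetical and only mimics the standard analyzer very loosely (split on non-word characters, then lowercase); the real analysis chain and the attachment plugin's text extraction do more than this.

```python
import base64
import re


def approximate_terms(b64_payload):
    """Decode a base64 attachment and return roughly the terms that get indexed."""
    text = base64.b64decode(b64_payload).decode("utf-8")
    # Loose stand-in for the standard analyzer: tokenize on word characters, lowercase.
    return [token.lower() for token in re.findall(r"\w+", text)]


terms = approximate_terms("SGVsbG8=")
print(terms)

# A term query is compared against the indexed terms as-is, without analysis.
print("Hello" in terms, "hello" in terms)
```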

Join query in ElasticSearch

Is there any way (query) to join the 2 JSONs below in ElasticSearch?
{
product_id: "1111",
price: "23.56",
stock: "100"
}
{
product_id: "1111",
category: "iPhone case",
manufacturer: "Belkin"
}
The 2 JSONs above are ingested as 2 different types in Logstash, so they are indexed under different values of the 'type' field in Elasticsearch.
What I want is to join the 2 JSONs on the product_id field.
It depends on what you mean by JOIN. Elasticsearch is not like a regular database that supports JOINs between tables. It is a text search engine that manages documents within indexes.
On the other hand, you can search within the same index over multiple types using fields that are common to every type.
For example, taking your data, I can create an index with 2 types and their data as follows:
curl -XPOST localhost:9200/product -d '{
"settings" : {
"number_of_shards" : 5
}
}'
curl -XPOST localhost:9200/product/type1/_mapping -d '{
"type1" : {
"properties" : {
"product_id" : { "type" : "string" },
"price" : { "type" : "integer" },
"stock" : { "type" : "integer" }
}
}
}'
curl -XPOST localhost:9200/product/type2/_mapping -d '{
"type2" : {
"properties" : {
"product_id" : { "type" : "string" },
"category" : { "type" : "string" },
"manufacturer" : { "type" : "string" }
}
}
}'
curl -XPOST localhost:9200/product/type1/1 -d '{
"product_id": "1111",
"price": "23",
"stock": "100"
}'
curl -XPOST localhost:9200/product/type2/1 -d '{
"product_id": "1111",
"category": "iPhone case",
"manufacturer": "Belkin"
}'
I effectively created one index called product with 2 types, type1 and type2.
Now I can do the following query and it will return both documents:
curl -XGET 'http://localhost:9200/product/_search?pretty=1' -d '{
"query": {
"query_string" : {
"query" : "product_id:1111"
}
}
}'
{
"took" : 95,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.5945348,
"hits" : [ {
"_index" : "product",
"_type" : "type1",
"_id" : "1",
"_score" : 0.5945348, "_source" : {
product_id: "1111",
price: "23",
stock: "100"
}
}, {
"_index" : "product",
"_type" : "type2",
"_id" : "1",
"_score" : 0.5945348, "_source" : {
product_id: "1111",
category: "iPhone case",
manufacturer: "Belkin"
}
} ]
}
}
This works because Elasticsearch searches over all documents within that index regardless of their type. This is still different from a JOIN, in the sense that Elasticsearch is not going to do a Cartesian product of the documents that belong to each type.
Hope that helps
isaac.hazan's answer works quite well, but I would like to add a few points that helped me with this kind of situation:
I landed on this page when I was trying to solve a similar problem, in that I had to exclude multiple records of one index based on documents of another index. The lack of relationships is one of the main downsides of unstructured databases.
The elasticsearch documentation page on Handling Relationships explains a lot.
Four common techniques are used to manage relational data in Elasticsearch:
Application-side joins
Data denormalization
Nested objects
Parent/child relationships
Often the final solution will require a mixture of a few of these techniques.
I've mostly used nested objects and application-side joins. While using the same field name could solve the problem for the moment, I think it is better to rethink the design and create the best-suited mapping for your application.
For instance, you might find that you want to list all products with price greater than x, or list all products that are not in stock anymore. To deal with such scenarios it helps if you are using one of the solutions mentioned above.
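As a minimal sketch of an application-side join, assuming the two sets of documents have already been fetched with two separate searches, the merge on product_id can be done in plain Python. The function name join_on_key is made up for illustration; the sample documents are the ones from the question.

```python
from collections import defaultdict


def join_on_key(left_docs, right_docs, key):
    """Inner-join two lists of documents (dicts) on a shared key."""
    # Index the right side by key so each left document is matched in O(1).
    right_by_key = defaultdict(list)
    for doc in right_docs:
        right_by_key[doc[key]].append(doc)

    joined = []
    for left in left_docs:
        for right in right_by_key.get(left[key], []):
            # On clashing field names, values from the left document win.
            joined.append({**right, **left})
    return joined


prices = [{"product_id": "1111", "price": "23.56", "stock": "100"}]
details = [{"product_id": "1111", "category": "iPhone case", "manufacturer": "Belkin"}]

print(join_on_key(prices, details, "product_id"))
```

This is fine for small result sets. For large ones, denormalizing at index time is usually cheaper than joining on every request.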
To perform joins on Elasticsearch take a look at the Siren "Federate" plugin. It adds join capabilities by extending the Elasticsearch native query syntax.
https://siren.io/federate/

ElasticSearch CouchDB Geo location

I am trying to get elasticsearch to index a couchdb river without luck.
I have a database 'pl1' with only one document '1' in it.
This is a printout of the entire document pretty-printed:
curl -XGET localhost:5984/pl1/1 | python -mjson.tool
{
"_id": "1",
"_rev": "1-0442f3962cffedc2238fcdb28dd77557",
"location": {
"geo_json": {
"coordinates": [
59.70141999133738,
14.162789164118708
],
"type": "point"
},
"lat": 14.162789164118708,
"lon": 59.70141999133738
}
}
I create a couchdb river and index with a catch-all type called all_entries the following way:
curl -XPUT 'localhost:9200/_river/pl1/_meta' -d '
{
"type" : "couchdb",
"couchdb" : {
"host" : "localhost",
"port" : 5984,
"filter" : null,
"db" : "pl1"
},
"index" : {
"index" : "pl1",
"type" : "all_entries",
"bulk_size" : "100",
"bulk_timeout" : "10ms"
}
}'
{"ok":true,"_index":"_river","_type":"pl1","_id":"_meta","_version":1}
To test whether the document was indexed I perform the following query:
curl -XGET localhost:9200/pl1/all_entries/_count?pretty=true
{
"count" : 1,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
}
}
But then nothing. I can't figure out how to index the location using a geo_shape type (I have also tried with the different geo_point format for the data, and indexing that, but also no results)
How do I specify a mapper and query for this?
