Possible to provide an entire document to Update By Query? - elasticsearch

I would like to search for a document stored in Elasticsearch based on its fields and overwrite that entire document with a new version. I am new to ES, but from what I can tell I can only use Update if I am looking up a document by its ES-assigned _id, so I was hoping to use Update By Query instead. Unfortunately, it appears that Update By Query requires a script that updates the specific fields I care about. Something like below:
POST my-index-000001/_update_by_query
{
  "script": {
    "source": "ctx._source.count++",
    "lang": "painless"
  },
  "query": {
    "term": {
      "user.id": "kimchy"
    }
  }
}
My problem is that my document has dozens of fields and I don't know which of them will have changed. I could loop through them and build the script, but I'm hoping there is a way to simply provide the replacement document and have anything that matches the query be overwritten by it. Is this possible with Update By Query? Or is there another way to match on something other than _id and perform an update?

Your question is not entirely clear. Are you trying to update the whole document for a given id? If so, you can simply overwrite the existing document with a PUT call:
PUT index-name/_doc/<document_id>
This overwrites the existing document, so make sure you send the complete document in your PUT call and not just the fields that have changed.
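If you only know the document by its fields rather than its _id, one workable pattern is to look the _id up first and then overwrite by id. A sketch, reusing the index and field names from the question; <returned_id> and the replacement body are placeholders:
# Step 1: find the _id of the matching document (only the id is needed).
GET my-index-000001/_search
{
  "_source": false,
  "query": {
    "term": {
      "user.id": "kimchy"
    }
  }
}

# Step 2: overwrite the whole document using the _id returned above.
PUT my-index-000001/_doc/<returned_id>
{
  "user": { "id": "kimchy" },
  "count": 42
}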

Related

Add _id to the _source as a separate field for all existing docs in an index

I'm new to Elasticsearch. I need to go through all the documents, take the _id, and add it to the _source as a separate field by script. Is it possible? If yes, can I have an example of something similar, or a link to similar scripts? I haven't seen anything like that in the docs. Why do I need it? Because afterwards I will run SELECT queries with Open Distro SQL, and that framework cannot return fields that are not in _source. If anyone can suggest anything I would be very grateful.
There are two options:
First option: add this new field to your existing index, populate it, and build the index again.
Second option: simply define the new field in a new index mapping (keep all the other fields the same) and then use the Reindex API with the script below.
"script": {
"source": "ctx._source.<your-field-name> = ctx._id"
}
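For context, a complete reindex request using that script might look like the following (a sketch; old-index, new-index, and the field name doc_id are placeholders):
POST _reindex
{
  "source": {
    "index": "old-index"
  },
  "dest": {
    "index": "new-index"
  },
  "script": {
    "source": "ctx._source.doc_id = ctx._id"
  }
}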

How to find what index a field belongs to in elasticsearch?

I am new to elasticsearch. I have to write a query using a given field but I don't know how to find the appropriate index. How would I find this information?
Edit:
Here's an easier/better way, using the mapping API:
GET _mapping/field/<fieldname>
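Run without an index name, this checks every index; each top-level key in the response is an index that maps the field. A sketch of what the response looks like (the index and field names here are invented for illustration):
GET _mapping/field/user.id

# Response: one entry per index that contains the field, e.g.
# {
#   "my-index-000001": {
#     "mappings": {
#       "user.id": {
#         "full_name": "user.id",
#         "mapping": { "id": { "type": "keyword" } }
#       }
#     }
#   }
# }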
One of the ways to find it is to get records where the field exists.
Replace <fieldName> with your field's name. /_search without an index name searches across all indices and returns any document that has the field. Set _source to false, since you don't care about the document contents, only the index name.
GET /_search
{
  "_source": false,
  "query": {
    "exists": {
      "field": "<fieldName>"
    }
  }
}
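The index name is then read off each hit's _index metadata field; a trimmed response might look like this (index name invented for illustration):
{
  "hits": {
    "hits": [
      { "_index": "my-index-000001", "_id": "XyZ123", "_score": 1.0 }
    ]
  }
}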
Another, more visual way to do that is through the Kibana Index Management UI (assuming you have the privileges to access it).
There you can click on an index and open its Mappings tab to see all fields of that particular index. Then just search for the desired field.
Summary:
@Polynomial Proton's answer is the way to go 90% of the time. I just wanted to show you another way to solve your issue. It requires more manual steps than @Polynomial Proton's answer, and if you have a large number of indices this way is not appropriate.

Elasticsearch index with no fields indexed

I need to create an Elasticsearch index whose contents will be accessed only by the document-id. There will never be any queries related to the contents of documents. These documents can contain any JSON, including instances where the same field can contain different types of data, etc.
I've hunted for this info, and have found much about indexing individual fields, but nothing about treating the entire document as essentially opaque.
Any help would be much appreciated.
You can do what you want, but in my opinion it is not the right way.
First you need to create a mapping for the index:
PUT index_name_here
{
  "mappings": {
    "document_type_here": {
      "properties": {
        "field_name_for_json_data_here": {
          "type": "object",
          "enabled": false
        }
      }
    }
  }
}
After this, you can create documents with a custom structure of fields. You just need to store your JSON not directly in the document, but inside a field of the document (in my example, inside the field "field_name_for_json_data_here"). Setting "enabled": false tells Elasticsearch to store the field but skip parsing and indexing its contents, which is why mixed types inside it cause no mapping conflicts.
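For illustration, indexing and fetching such documents would look like this (using the placeholder names from the mapping above; the type-based URL matches the older ES syntax used in this answer):
PUT index_name_here/document_type_here/1
{
  "field_name_for_json_data_here": { "status": 200, "payload": [1, 2, 3] }
}

# A second document may reuse the same inner field with a different type,
# since nothing inside the disabled field is parsed or indexed:
PUT index_name_here/document_type_here/2
{
  "field_name_for_json_data_here": { "status": "OK", "payload": { "a": 1 } }
}

# Retrieval is by document id only:
GET index_name_here/document_type_here/1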
If possible, tell me why you chose Elasticsearch to store this kind of data. Because if I understood the question correctly, you need a simple key/value store (you could store your JSON as a string), and many other databases are more suitable for this.

How to retrieve all document with version n in elastic search

ElasticSearch comes with versioning https://www.elastic.co/blog/versioning
Maybe I misunderstood the meaning of the versioning here.
I want to find all the documents that are in version 1 so I can update them.
An obvious way is to go through all the documents one by one and select those that are in version 1.
Question:
Is it possible to retrieve all the Documents that are in version 1 with ONE query?
Because of Elasticsearch's distributed nature, it needs a way to ensure that changes are applied in the correct order. This is where _version comes into play. It's an internal way of making sure that an older version of a document never overwrites a newer version.
You can also use _version as a way to make sure that the document you want to delete / update hasn't been modified in the meantime - this is done by specifying the version number in the URL; for example PUT /website/blog/1?version=5 will succeed only if the current _version of the document stored in the index is 5.
You can read more about it here: Optimistic Concurrency Control
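As an illustration of that conditional update (older, type-based URL syntax as in the answer, where the request body is an invented example; newer ES versions use if_seq_no/if_primary_term instead of the version parameter):
PUT /website/blog/1?version=5
{
  "title": "My updated blog post"
}

# Succeeds only if the stored _version is exactly 5;
# otherwise Elasticsearch returns a 409 Conflict.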
To answer your question,
Is it possible to retrieve all the Documents that are in version 1 with ONE query?
No.
You can use a scripted _reindex into an empty temporary index. The target index will then contain just those documents that have _version=1.
You can add further query stanzas as well, to limit your raw input (using the inverted index, which is faster), and further Painless conditions (per document, more flexible), too.
# ES5, use "source" i/o "inline" for later ES versions.
http POST :9200/_reindex <<END
{
"conflicts": "proceed",
"source": {
"index": "your-source-index"
},
"dest": {
"index": "some-temp-index",
"version_type": "external"
},
"script": {
"lang": "painless",
"inline": "if(ctx._version!=1){ctx.op='delete'}"
}
}
END
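Afterwards, the version-1 documents can be pulled from the temporary index with an ordinary search, and the index dropped when done (shown here in console syntax rather than the httpie style above):
GET some-temp-index/_search
{
  "query": { "match_all": {} }
}

# Clean up once finished:
DELETE some-temp-index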

elasticsearch query for newest index

Elasticsearch newbie.
I would like to query for the newest index.
Every day logstash creates new indices with a naming convention something like our_sales_data-%{dd-mm-yyyy}% or something very close. So I end up with lots of indices like:
our-sales_data-14-09-2015
our-sales-data-15-09-2015
our-sales-data-16-09-2015
and so on.
I need to be able to query for the newest index. Obviously I can query for and retrieve all the indices with 'our-sales-data*' in the name.. but I only want to return the very newest one and no other.
Possible?
Well, the preferred method would be to compute the latest index name on the client side by resolving the date in our_sales_data-%{dd-mm-yyyy}%.
Another solution would be to run a sort query and fetch the single latest document; each hit carries an _index metadata field, so you can read the newest index name straight off that hit.
GET our-sales-data*/_search
{
  "size": 1,
  "sort": {
    "@timestamp": {
      "order": "desc"
    }
  }
}
We have a search alias and a write alias. The write alias always points at the latest index, until we roll over and add a new index to the alias.
Our search alias contains all the previous indices plus the latest index (the one also behind the write alias).
Could you do something like this and then just query the write alias?
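A sketch of that daily rollover step (the alias names sales-write and sales-search are invented for illustration):
POST _aliases
{
  "actions": [
    { "remove": { "index": "our-sales-data-15-09-2015", "alias": "sales-write" } },
    { "add": { "index": "our-sales-data-16-09-2015", "alias": "sales-write" } },
    { "add": { "index": "our-sales-data-16-09-2015", "alias": "sales-search" } }
  ]
}

# The newest data is then always available via:
GET sales-write/_search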
