Can I create a document with the update API if the document doesn't exist yet - elasticsearch

I have a very simple question:
I want to update multiple documents in Elasticsearch. Sometimes the document already exists, but sometimes it does not. I don't want to issue a GET request to check whether the document exists (this hurts my performance). I want my update request to index the document directly if it doesn't exist yet.
I know that we can use upsert to create a missing field when updating a document, but this is not what I want. I want to index the document if it doesn't exist. I don't know if upsert can do this.
Can you provide an explanation?
Thanks in advance!

This is doable using the update api. It does require that you define the id of each document, since the update api requires the id of the document to determine its presence.
Given an index created with the following documents:
PUT /cars/car/1
{ "color": "blue", "brand": "mercedes" }
PUT /cars/car/2
{ "color": "blue", "brand": "toyota" }
We can get the upsert functionality you want using the update api with the following api call.
POST /cars/car/3/_update
{
"doc": {
"color" : "brown",
"brand" : "ford"
},
"doc_as_upsert" : true
}
This api call will add the document to the index since it does not exist.
Running the call a second time after changing the color of the car will update the document instead of creating a new one.
POST /cars/car/3/_update
{
"doc": {
"color" : "black",
"brand" : "ford"
},
"doc_as_upsert" : true
}
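To make the create-or-update behaviour concrete, here is a minimal in-memory sketch in Python. It is not the Elasticsearch client: the function name update_with_upsert and the plain dict standing in for the index are assumptions made for illustration, and the merge is shallow, which is enough for the flat car documents above.

```python
from typing import Any, Dict


def update_with_upsert(index: Dict[str, Dict[str, Any]], doc_id: str,
                       body: Dict[str, Any]) -> str:
    """Mimic an Update API call against an in-memory 'index' (a plain dict).

    Hypothetical helper: it only models the doc / doc_as_upsert behaviour
    described above, not the real Elasticsearch implementation.
    """
    doc = body["doc"]
    if doc_id in index:
        # The document exists: merge the partial document into it.
        index[doc_id].update(doc)
        return "updated"
    if body.get("doc_as_upsert"):
        # The document is missing: use the partial document as the new document.
        index[doc_id] = dict(doc)
        return "created"
    raise KeyError(f"document {doc_id} is missing and doc_as_upsert is not set")


cars = {
    "1": {"color": "blue", "brand": "mercedes"},
    "2": {"color": "blue", "brand": "toyota"},
}
first = update_with_upsert(
    cars, "3", {"doc": {"color": "brown", "brand": "ford"}, "doc_as_upsert": True})
second = update_with_upsert(
    cars, "3", {"doc": {"color": "black", "brand": "ford"}, "doc_as_upsert": True})
print(first, second, cars["3"])
```

The first call creates document 3 and the second one changes its color, so no separate existence check is needed.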

AFAIK when you index a document (with a PUT call), any existing version is replaced with the new one. If the document did not exist, it gets created. There is no need to distinguish between INSERT and UPDATE in Elasticsearch.
UPDATE: According to the documentation, if you use op_type=create, or a special _create version of the indexing call, then any call for a document which already exists will fail.
Quote from the documentation:
Here is an example of using the op_type parameter:
$ curl -XPUT 'http://localhost:9200/twitter/tweet/1?op_type=create' -d '{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}'
Another option to specify create is to use the following uri:
$ curl -XPUT 'http://localhost:9200/twitter/tweet/1/_create' -d '{
"user" : "kimchy",
"post_date" : "2009-11-15T14:12:12",
"message" : "trying out Elasticsearch"
}'

For the bulk API, use:
bulks.push({
  update: {
    _index: 'index',
    _type: 'type',
    _id: id
  }
});
bulks.push({ "doc_as_upsert": true, "doc": your_doc });
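The same pairs of lines can be assembled into the newline-delimited body that the bulk endpoint expects. The Python sketch below only builds the payload string; the function name build_bulk_upserts and the sample index name are assumptions, and sending the payload (for example with an HTTP client and a Content-Type of application/x-ndjson) is left out.

```python
import json
from typing import Any, Dict, Iterable, Optional, Tuple


def build_bulk_upserts(index: str,
                       docs: Iterable[Tuple[str, Dict[str, Any]]],
                       doc_type: Optional[str] = None) -> str:
    """Build an NDJSON bulk body with one update/upsert pair per document.

    Hypothetical helper. doc_type is optional because mapping types only
    exist in older Elasticsearch versions.
    """
    lines = []
    for doc_id, doc in docs:
        action = {"_index": index, "_id": doc_id}
        if doc_type is not None:
            action["_type"] = doc_type
        # Action line, followed by the partial document and the upsert flag.
        lines.append(json.dumps({"update": action}))
        lines.append(json.dumps({"doc": doc, "doc_as_upsert": True}))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


payload = build_bulk_upserts("cars", [
    ("3", {"color": "black", "brand": "ford"}),
    ("4", {"color": "red", "brand": "fiat"}),
])
print(payload)
```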

As of elasticsearch-model v0.1.4, upserts aren't supported. I was able to work around this by creating a custom callback.
after_commit on: :update do
  begin
    __elasticsearch__.update_document
  rescue Elasticsearch::Transport::Transport::Errors::NotFound
    __elasticsearch__.index_document
  end
end

I think you want the "create" action.
Here's the bulk API documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html
The index and create actions expect a source on the next line, and have the same semantics as the op_type parameter in the standard index API: create fails if a document with the same ID already exists in the target, index adds or replaces a document as necessary.
Difference between actions:
create
(Optional, string) Indexes the specified document if it does not already exist. The following line must contain the source data to be indexed.
index
(Optional, string) Indexes the specified document. If the document exists, replaces the document and increments the version. The following line must contain the source data to be indexed.
update
(Optional, string) Performs a partial document update. The following line must contain the partial document and update options.
doc
(Optional, object) The partial document to index. Required for update operations.
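The difference between create and index can be sketched with a small in-memory model in Python. The names index_doc, create_doc and DocumentExistsError are made up for this illustration and the store is a plain dict; it only models the semantics quoted above, not how Elasticsearch implements them.

```python
from typing import Any, Dict


class DocumentExistsError(Exception):
    """Raised by create_doc when the id is already taken (hypothetical name)."""


def index_doc(store: Dict[str, Dict[str, Any]], doc_id: str,
              source: Dict[str, Any]) -> int:
    """'index' semantics: add the document, or replace it and bump the version."""
    version = store[doc_id]["_version"] + 1 if doc_id in store else 1
    store[doc_id] = {"_version": version, "_source": dict(source)}
    return version


def create_doc(store: Dict[str, Dict[str, Any]], doc_id: str,
               source: Dict[str, Any]) -> int:
    """'create' semantics: fail if a document with the same id already exists."""
    if doc_id in store:
        raise DocumentExistsError(doc_id)
    return index_doc(store, doc_id, source)


store: Dict[str, Dict[str, Any]] = {}
v1 = create_doc(store, "1", {"user": "kimchy"})
v2 = index_doc(store, "1", {"user": "kimchy", "message": "replaced"})
try:
    create_doc(store, "1", {"user": "someone else"})
    create_failed = False
except DocumentExistsError:
    create_failed = True
print(v1, v2, create_failed)
```

So create is the right choice when an existing document must never be overwritten, while index (or update with doc_as_upsert) fits the create-or-update case from the question.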

Related

Update a document using another field than _id in ElasticSearch

I would like to do a partial update of a document in ElasticSearch 2.3. The documentation shows:
POST /website/blog/1/_update
{
"doc" : {
"tags" : [ "testing" ],
"views": 0
}
}
Is there a way to update a document using a field other than the _id (here 1) to identify the document?
Use the update_by_query API and run a query that selects the documents matching the other field. With that query you identify the documents to update according to your own rules.
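As a rough sketch, the request body for such a call could be built like this in Python. Several things here are assumptions: the helper name build_update_by_query, the sample field slug, a version that supports Painless scripts with the "source" key (older releases use "inline" and a different script language), and a field that is indexed as an exact value so that a term query matches it.

```python
import json
from typing import Any, Dict


def build_update_by_query(match_field: str, match_value: Any,
                          changes: Dict[str, Any]) -> Dict[str, Any]:
    """Build a body for POST /<index>/_update_by_query (hypothetical helper).

    The query selects documents by a field other than _id, and the script
    copies the given changes into each matching document's _source.
    """
    return {
        "query": {"term": {match_field: match_value}},
        "script": {
            "lang": "painless",
            # Assumes Painless; params keep the script text constant.
            "source": "ctx._source.putAll(params.changes)",
            "params": {"changes": changes},
        },
    }


# "slug" is a made-up field standing in for "another field than _id".
body = build_update_by_query("slug", "my-first-post",
                             {"tags": ["testing"], "views": 0})
print(json.dumps(body, indent=2))
```

Keep in mind that, unlike an update by id, this touches every document the query matches.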

how to copy ElasticSearch field to another field

I have a 100GB ES index. I need to change one field into a multi-field, for example username to username.username and username.raw (not_analyzed). I know the change will apply to incoming data, but how can I make it affect the old data? Should I use scroll to copy the whole index to a new one, or is there a better solution that copies just one field?
There's a way to achieve this without reindexing all your data by using the update by query plugin.
Basically, after installing the plugin, you can run the following query and all your documents will get the multi-field re-populated.
curl -XPOST 'localhost:9200/your_index/_update_by_query' -d '{
"query" : {
"match_all" : {}
},
"script" : "ctx._source.username = ctx._source.username;"
}'
It might take a while to run on 100GB docs, but after this runs, the username.raw field will be populated.
Note: for this plugin to work, one needs to have scripting enabled.
POST index/type/_update_by_query
{
"query" : {
"match_all" : {}
},
"script" :{
"inline" : "ctx._source.username = ctx._source.username;",
"lang" : "painless"
}
}
This worked for me on ES 5.6; the one above did not.

How to add documents to existing index in elasticsearch

I am using Elasticsearch 1.4. I receive new data every hour, and it needs to be uploaded. My approach is to create an index, "demo", and upload the data, so the first hour's data gets inserted. My question is how to append the subsequent hours' data to this index.
PUT /demo/userdetails/1
{
"user" : "kimchy",
"message" : "trying out Elastic Search"
}
Now I am trying to add another document
{"user": "swarna","message":"hi"}
You simply need to PUT the additional documents. In your example above you did:
PUT /demo/userdetails/1
{ "user" : "kimchy", "message" : "trying out Elastic Search" }
Now you would do this:
PUT /demo/userdetails/2
{ "user": "swarna", "message": "hi" }
In your command, demo is the index, userdetails is the type, and the number is the document id. If you omit the document id (and send the request as a POST rather than a PUT), ES will generate one for you.
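A small Python sketch of how the request line changes depending on whether you supply an id. The helper name index_request is hypothetical, and it only returns the HTTP method and path rather than talking to a cluster.

```python
from typing import Optional, Tuple


def index_request(index: str, doc_type: str,
                  doc_id: Optional[str] = None) -> Tuple[str, str]:
    """Return (method, path) for indexing one document (hypothetical helper).

    With an explicit id the document is PUT to that id; without one,
    the request is a POST and Elasticsearch generates the id.
    """
    base = f"/{index}/{doc_type}"
    if doc_id is None:
        return ("POST", base)
    return ("PUT", f"{base}/{doc_id}")


with_id = index_request("demo", "userdetails", "2")
auto_id = index_request("demo", "userdetails")
print(with_id, auto_id)
```

For hourly loads, letting Elasticsearch generate the ids avoids tracking the last number used, at the cost of not being able to overwrite a specific document by a known id.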

Elasticsearch field name aliasing

Is it possible to set up aliases for field names in Elasticsearch (just like index names can be aliased)?
For example, I have a document {'firstname': 'John', 'lastname': 'smith'}
I would like to alias 'firstname' to 'fn'...
Just a quick update: Elasticsearch 6.4 introduced a feature called the alias datatype. Check the mapping and query below as a sample.
Note that in the mapping below, the type of the field fn is alias.
Sample Mapping:
PUT myindex
{
"mappings": {
"_doc": {
"properties": {
"firstname": {
"type": "text"
},
"fn": {
"type": "alias",
"path": "firstname"
}
}
}
}
}
Sample Query:
GET myindex/_search
{
"query": {
"match" : {
"fn" : "Steve"
}
}
}
The idea is to use the alias in place of the actual field on which the inverted index is created. Note that fields with the alias datatype aren't meant for write operations; they are meant for querying only.
Although you can refer to the link I've mentioned for more details, below are some of the important points:
A field alias is only meant to be used when your index has a single mapping. The index has to be created on 6.x or later, or be created in an older version with the setting index.mapping.single_type: true
It can be used in querying, aggregations, sorting, highlighting and suggestion operations
The target field must be an actual field on which an inverted index is created
You cannot create an alias of another alias field
You cannot use an alias on multiple fields. Single alias, single field.
It cannot be used as part of source filtering using _source.
There is no direct field alias functionality. However, you could rename the fields upon indexing using the index_name property in your mappings.
index_name : The name of the field that will be stored in the index.
Defaults to the property/field name.
See here for more information: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-core-types.html
Adding alias fn for existing field firstname
PUT myindex/_mapping
{
"properties": {
"fn": {
"type": "alias",
"path": "firstname"
}
}
}
Should work this way as of Elasticsearch 7.
You could probably try creating an alias on your index with a filter on the desired field. The filter must be written so that it selects all the entries from your field. Please refer to the Filtered aliases section here. But I am interested in knowing your use case: why do you want to create an alias on a particular field?

Is there a way to apply the synonym token filter in ElasticSearch to field names rather than the value?

Consider the following JSON file:
{
"titleSony": "Matrix",
"cast": [
{
"firstName": "Keanu",
"lastName": "Reeves"
}
]
}
Now, I know in ElasticSearch, you can apply a synonym token filter to field values as given in the following link: Elasticsearch Analysis: Synonym token filter.
Hence, I can create a "synonym.txt" file with Matrix => Matx, then if I search for titleSony:Matx, it will return the documents with Matrix as well.
Now, what I would like is to create a synonym for the field name titleSony. For example - titleSony => titleAll, such that when I search for titleAll, I should get all documents with titleSony as well.
Is there any way to accomplish this in ElasticSearch?
Now, what I would like is to create a synonym for the field name "titleSony". For example - titleSony => titleAll , hence when I search for "titleAll", I should get all documents with "titleSony" as well.
Yes, somewhat. Elasticsearch has some default behavior very similar to this, which I'll touch on in a bit.
The feature you're looking for is called "Copy to field." It allows you to specify that the terms in one field should be copied into another. This is useful for consolidating terms you expect to match into a single field, to help simplify your query when you would like to match against any one of a number of fields.
In this example, you would specify in your mapping that the terms in the titleSony field ought to be copied into the titleAll field. Presumably you'd have other fields (say, titleDisney) which also copy into that field as well. So a search against titleAll will effectively match the other fields whose terms are copied into it.
An excerpt of your mapping might look something like this:
{
  "movies" : {
    "properties" : {
      "titleSony" : { "type" : "string", "copy_to" : "titleAll" },
      "titleDisney" : { "type" : "string", "copy_to" : "titleAll" },
      "titleAll" : { "type" : "string" },
      "cast" : { ... },
      ...
    }
  }
}
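If you have many per-studio title fields, the properties block can be generated rather than typed by hand. This Python sketch is only an illustration: build_copy_to_properties is a made-up helper, and the default field type "string" mirrors the mapping above (newer Elasticsearch versions use "text" or "keyword" instead).

```python
import json
from typing import Dict, List


def build_copy_to_properties(source_fields: List[str], target_field: str,
                             field_type: str = "string") -> Dict[str, dict]:
    """Build mapping properties where every source field copies into target_field.

    Hypothetical helper; pass field_type="text" on versions without "string".
    """
    properties = {
        name: {"type": field_type, "copy_to": target_field}
        for name in source_fields
    }
    # The catch-all field itself is a normal field with no copy_to.
    properties[target_field] = {"type": field_type}
    return properties


properties = build_copy_to_properties(["titleSony", "titleDisney"], "titleAll")
print(json.dumps({"movies": {"properties": properties}}, indent=2))
```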
I mentioned earlier that Elasticsearch does something like this. By default it creates a special field called _all into which all the document's terms are copied. This field lets you construct very simple queries to match against terms that occur in any field on the document. So as you see, this is a fairly common convention in Elasticsearch. (Elasticsearch mapping: _all field.)
