I am implementing Elasticsearch 7.1.1 in my application using the Python requests library. I have successfully created a document in the Elasticsearch index using
r = requests.put(url, auth=awsauth, json=document, headers=headers)
However, when updating an existing document, the JSON body (containing the values to be updated) that I pass to the method replaces the original document. How do I overcome this? Thank you.
You could do the following:
document = {
    "doc": {
        "field_1": "value_1",
        "field_2": "value_2"
    },
    "doc_as_upsert": True
}
...
r = requests.post(url, auth=awsauth, json=document, headers=headers)
It should be a POST to the _update endpoint instead of a PUT to the document URL.
Wrapping the fields in "doc" performs a partial update: you can update existing fields and also add new fields.
Refer to the doc in the comment posted by Nishant Saini.
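For example, a minimal sketch of the full request (the index name my-index, document id 1, and the localhost endpoint are assumptions; in Elasticsearch 7.x the partial-update endpoint is /<index>/_update/<id>):

import requests

headers = {"Content-Type": "application/json"}
# POST to the _update endpoint rather than PUT to the document URL;
# "my-index" and the document id "1" are placeholders.
url = "http://localhost:9200/my-index/_update/1"
payload = {
    "doc": {
        "field_1": "value_1",
        "field_2": "value_2"
    },
    "doc_as_upsert": True
}
r = requests.post(url, json=payload, headers=headers)
print(r.json())  # expect "result": "updated" (or "created" on first upsert)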
I am new to Elasticsearch and there are some requirements where I need to ingest and index PDFs using Kibana. I have figured out that I have to create a pipeline for this purpose, but I do not know which processor to use and how to configure it. I discovered that my Elasticsearch node has the ingest-attachment plugin installed. The version I am using is Elasticsearch 7.14, so any help is appreciated. Thank you.
This might be useful for you: the ingest attachment processor plugin extracts and ingests data from a base64-encoded PDF. You would be required to base64-encode the file and ingest it through a pipeline. For example:
import base64
import json

# "data" holds the raw bytes of the file you are parsing;
# "client" is an elasticsearch-py Elasticsearch instance.
encoded_data = base64.b64encode(data).decode('utf-8')

body = {
    'query': {
        'bool': {
            'filter': [
                {'ids': {'values': [contentDocumentId]}},
                {'term': {'contentVersionId': contentVersionId}}
            ]
        }
    },
    'script': {
        'source': 'ctx._source["file_data"] = params._file_data',
        'params': {'_file_data': encoded_data}
    }
}

# Run the update through the "attachment" ingest pipeline so the
# processor can extract the PDF contents from the base64 field.
response = client.update_by_query(conflicts='proceed', index=_index,
                                  pipeline='attachment', body=json.dumps(body))
I am using update by query for my use case; you can check whether update or update by query fits yours better.
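For the pipeline itself, a minimal sketch of what the attachment pipeline referenced above could look like (the pipeline id attachment and the file_data field come from the code above; everything else is an assumption):

from elasticsearch import Elasticsearch

client = Elasticsearch("http://localhost:9200")

# Create an ingest pipeline named "attachment" whose attachment
# processor reads the base64 content from the "file_data" field
# and extracts the text into the "attachment" object.
client.ingest.put_pipeline(
    id="attachment",
    body={
        "description": "Extract text from base64-encoded PDFs",
        "processors": [
            {"attachment": {"field": "file_data"}}
        ]
    }
)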
Specifically, what I'm trying to achieve through the Elasticsearch.Net and NEST 6.x APIs is the example of setting dynamic=strict on the _doc type shown in this article using JSON.
The setting at the type level is also mentioned in the official docs.
You can send this request with the high level client using
var client = new ElasticClient();
var putMappingResponse = client.Map<object>(m => m
    .Index("testindex1")
    .Type("_doc")
    .Dynamic(DynamicMapping.Strict)
);
which will send the following request
PUT http://localhost:9200/testindex1/_doc/_mapping
{
  "dynamic": "strict"
}
The end result will be that of strict behaviour for dynamic fields for the _doc type in the testindex1 index.
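To see the strict behaviour in action, a quick check with Python requests (a sketch; the unmapped field name is made up and a local cluster is assumed): indexing a document containing a field missing from the mapping should now be rejected.

import requests

headers = {"Content-Type": "application/json"}
# "some_new_field" is not declared in the mapping, so with
# "dynamic": "strict" this request fails with HTTP 400 and a
# strict_dynamic_mapping_exception instead of widening the mapping.
r = requests.put(
    "http://localhost:9200/testindex1/_doc/1",
    json={"some_new_field": "value"},
    headers=headers)
print(r.status_code, r.json())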
I dipped into the low-level client to effect this solution, whereas when I posted the question I was searching in the high-level client.
using Nest; // C#
using Elasticsearch.Net; // PostData lives here

var pd = PostData.String("{ \"dynamic\": \"strict\" }");
var result = client.LowLevel.IndicesPutMappingPost<PutMappingResponse>(indexNm, "_doc", pd);
where the client variable is an ElasticClient instance and indexNm is a string containing "testindex1".
Results in
{
  "testindex1": {
    "aliases": {},
    "mappings": {
      "_doc": {
        "dynamic": "strict",
        ...
where I see dynamic: strict has been added to the _doc type mapping as expected.
I have documents which contain only "url" (analyzed) and "respsize" (not_analyzed) fields at first. I want to update documents that match the url and add a new field "category".
I mean:
at first doc1:
{
  "url": "http://stackoverflow.com/users/4005632/mehmet-yener-yilmaz",
  "respsize": "500"
}
I have external data and I know "stackoverflow.com" belongs to category 10,
and I need to update the doc and make it like:
{
  "url": "http://stackoverflow.com/users/4005632/mehmet-yener-yilmaz",
  "respsize": "500",
  "category": "10"
}
Of course I will do this for all documents whose url field contains "stackoverflow.com",
and I need to update each doc only once, because the category of a url is not changeable; there is no need to update it again.
I think I need to use the _update API with the _version number to check this, but I can't compose the DSL query.
EDIT
I ran this and it looks like it works fine:
But the documents are not changed.
Although the query result looks correct, the new field is not added to the docs. Do I need a refresh or something?
You could use the update by query plugin in order to do just that. The idea is to select all documents without a category whose url matches a certain string, and add the category you wish.
curl -XPOST 'localhost:9200/webproxylog/_update_by_query' -H "Content-Type: application/json" -d '
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "url": "stackoverflow.com"
              }
            },
            {
              "missing": {
                "field": "category"
              }
            }
          ]
        }
      }
    }
  },
  "script": "ctx._source.category = \"10\";"
}'
After running this, all your documents with url: stackoverflow.com that don't have a category will get category: 10. You can run the same query again later to fix new stackoverflow.com documents that have been indexed in the meantime.
Also make sure to enable scripting in elasticsearch.yml and restart ES:
script.inline: on
script.indexed: on
In the script, you're free to add as many fields as you want, e.g.
...
"script" : "ctx._source.category1 = \"10\"; ctx._source.category2 = \"20\";"
UPDATE
ES 2.3 now features the update by query functionality. You can still use the above query exactly as is and it will work (except that filtered and missing are deprecated, but still working ;).
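If you are on 2.3+ and want to drop the deprecated constructs, the same operation can be written with bool/must_not/exists. A sketch using Python requests (index, field, and category come from the example above; the localhost endpoint is an assumption):

import requests

headers = {"Content-Type": "application/json"}
body = {
    "query": {
        "bool": {
            "filter": {"term": {"url": "stackoverflow.com"}},
            # "must_not" + "exists" replaces the deprecated "missing"
            "must_not": {"exists": {"field": "category"}}
        }
    },
    "script": {"inline": "ctx._source.category = \"10\""}
}
r = requests.post("http://localhost:9200/webproxylog/_update_by_query",
                  json=body, headers=headers)
print(r.json())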
That all sounds great, but just to add to @Val's answer: update by query is available from Elasticsearch 2.x onwards, but not for earlier versions. In our case we're using 1.4 for legacy reasons and there is no chance of upgrading in the foreseeable future, so another solution is the update by query plugin provided here: https://github.com/yakaz/elasticsearch-action-updatebyquery
Hi, I'm trying to update the ttl of a document in the following way, but it seems that it is not getting updated:
POST /my_index/my_type/AU4Gd1DVbqjanfsolMgP/_update
{
  "doc": {
    "_ttl": 60000
  },
  "doc_as_upsert": true
}
With the script approach it gets updated normally. What is the problem? Does anyone know?
I think you can only update it through script. From the documentation:
It also allows to update the ttl of a document using ctx._ttl and timestamp using ctx._timestamp. Note that if the timestamp is not updated and not extracted from the _source it will be set to the update date.
In addition to _source, the following variables are available through the ctx map: _index, _type, _id, _version, _routing, _parent, _timestamp, _ttl.
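For example, a sketch of the script-based variant with Python requests (index, type, id, and the 60000 value come from the question; the localhost endpoint is an assumption, and dynamic scripting must be enabled on a cluster old enough to still support _ttl):

import requests

headers = {"Content-Type": "application/json"}
body = {
    # ctx._ttl sets the new ttl; 60000 is milliseconds, as in the question
    "script": "ctx._ttl = 60000"
}
r = requests.post(
    "http://localhost:9200/my_index/my_type/AU4Gd1DVbqjanfsolMgP/_update",
    json=body, headers=headers)
print(r.json())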
I have a document in Elasticsearch.
I am trying to implement a method where I can specify a string id to delete a document from the index using NEST client.
This is the indexed doc that I want to delete:
"hits":[{"_index":"movies","_type":"list","_id":"100","_score":0.6349302, "_source" : {
"owner": "Bob",
"tags": "Bobita",
"title": "Movie clips of Bob"
}}
This is my C# code which doesn't delete the doc. It says id is NULL.
Uri localhost = new Uri("http://localhost:9200");
var setting = new ConnectionSettings(localhost);
setting.SetDefaultIndex("movies");
var client = new ElasticClient(setting);

IDeleteResponse resp = client.Delete("100");
if (!resp.Found)
{
    logger.Error("Failed to delete index with id=100");
}
What am I missing?
I believe the issue here is that NEST cannot properly infer the Id property of your document because you are not specifying a type.
If possible, try this instead:
client.Delete<YourMovieType>("100");
Using NEST 7.x on Elasticsearch 7.0, the following code works:
var x = _client.Delete<dynamic>(1);
(where 1 is the '_id' value)
Use 'dynamic' if you have not defined a mapping; otherwise I would suggest using the actual document type.
Alternatively, using the async API with an explicit index name:
await _elasticClient.DeleteAsync(new DeleteRequest(indexName, documentId));