Is it possible to do bulk atomic updates in ElasticSearch?
I am aware that regular bulk updates are not atomic as noted here:
https://www.elastic.co/guide/en/elasticsearch/guide/current/bulk.html#bulk
Is there any other way to atomically update multiple documents? i.e. either all the updates happen or none of them do.
Elasticsearch doesn't currently have a way to do what you're asking for. There are several responses to this question on the Elasticsearch site.
https://discuss.elastic.co/t/is-es-support-transaction-such-as-rollback/12579
https://discuss.elastic.co/t/rollback-es-6/85958
https://github.com/elastic/elasticsearch/issues/15316
Currently you would need to architect a solution yourself. There is an interesting blog about a potential solution here: https://blog.codecentric.de/en/2014/10/transactions-elasticsearch/
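Until such support exists, one common client-side pattern is to inspect the per-item results of a bulk response and issue compensating writes when only some items succeeded. A rough sketch (best-effort rollback, not a real transaction; the `compensate` callback is hypothetical and would need to restore the previous document versions yourself):

```python
def failed_and_succeeded(bulk_response):
    """Split a Bulk API response into failed and succeeded document ids.

    The response body contains an "items" list; each entry is keyed by
    the action name ("update", "index", ...) and carries a "status".
    """
    failed, succeeded = [], []
    for item in bulk_response.get("items", []):
        action = next(iter(item))          # e.g. "update"
        result = item[action]
        (failed if result["status"] >= 300 else succeeded).append(result["_id"])
    return failed, succeeded


def bulk_with_compensation(bulk_response, compensate):
    """If any item failed, call compensate(doc_id) for every item that
    succeeded -- a best-effort undo, with no atomicity guarantee."""
    failed, succeeded = failed_and_succeeded(bulk_response)
    if failed:
        for doc_id in succeeded:
            compensate(doc_id)
    return failed
```

Note this is still not atomic: a crash between the bulk call and the compensation leaves partial state, which is exactly the gap the linked blog post tries to close.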
I am building a real-time search that detects the correct pattern from the search string, then searches with that pattern and dynamically returns results from the correct database schema.
Example: something like Google Assistant.
You are comparing apples with oranges if you are comparing GraphQL with ElasticSearch. They are totally different technologies.
GraphQL is an API-layer technology, comparable to REST. It mainly defines the request/response format and structure of your HTTP-based API. It is not another NoSQL store that helps you store and query data efficiently.
If you are using GraphQL, you still need to fetch the data yourself; the data may actually be stored in and come from NoSQL, a SQL DB, ElasticSearch, another web service, or anything else. GraphQL does not care where you store the data; it can even be spread across multiple data sources. What it cares about is that you tell it how to get the data.
Back to your case: you most probably can use ElasticSearch for storing and searching the data efficiently, and put GraphQL in front of ElasticSearch so that users/developers interact with the service through a GraphQL API and enjoy GraphQL's benefits.
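To make that concrete: a GraphQL resolver in front of Elasticsearch mostly just translates field arguments into a query body and maps the hits back. A minimal sketch (the `search(text:, size:)` field, the `content` field name, and the injected `es_search` function are all made-up illustrations, not a specific client library):

```python
def build_search_body(text, size=10):
    """Translate a hypothetical GraphQL `search(text:, size:)` field's
    arguments into an Elasticsearch query body."""
    return {
        "size": size,
        "query": {"match": {"content": {"query": text}}},
    }


def resolve_search(es_search, text, size=10):
    """Hypothetical resolver: es_search is whatever actually talks to
    Elasticsearch (e.g. a client's search method). The resolver maps
    raw hits back to the plain objects the GraphQL layer returns."""
    response = es_search(build_search_body(text, size))
    return [hit["_source"] for hit in response["hits"]["hits"]]
```

The GraphQL schema itself stays unaware that Elasticsearch is behind it, which is the decoupling described above.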
It depends on the use case.
Recently I realized I could have used GraphQL for searching instead of Elasticsearch (for just this use case), considering the cost of running two services: the one GraphQL was reading from, and Elasticsearch.
All in all, it's good that you can use these two technologies, because you may need them in different use cases.
I'm currently designing the architecture of my project, or at least trying to figure out what will be useful in my case.
**Simple use case**
I will have several thousand profiles in a backend and I need to implement a fast search engine, so Elasticsearch looks perfect in that case. Every time a profile is updated, the index will be updated by an asynchronous task.
My question now is: if I want to implement a cache for a profile's details, should I stick with Elasticsearch and put that data in my index? Or use Redis and do something like profil_id => data?
Both sound good; the problem is that whenever a profile is updated, I will have to invalidate the cache after reindexing in Elasticsearch if I want to see the change in my backend.
So what can I do? Thank you so much!
You should consider using RediSearch. It can provide a solution for your needs, giving you both Redis performance and full-text search support.
Elasticsearch and Redis are basically meant to solve two different problems: one does indexing while the other does caching.
Redis is meant to return already-requested data as fast as possible, whereas Elasticsearch is a search and analytics engine. It would perfectly fit a use case where you have to implement a fast search engine, and it will be more performant than any in-memory data structure store or cache such as Redis (assuming your searches will be complex and involve some aggregations/filters).
The problem comes with profile updates. Since your profile updates are not that frequent, you could actually do partial updates to the ES index rather than reindexing. So whenever a person updates their profile, get the change set (the changed data) and do a partial update to the particular document in the ES index. You can see how it's done here: partial update.
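A partial update only needs the fields that actually changed. A small sketch of computing the change set and the matching body for the `_update` endpoint (field names are illustrative):

```python
def change_set(old_profile, new_profile):
    """Return only the fields whose values actually changed."""
    return {k: v for k, v in new_profile.items() if old_profile.get(k) != v}


def partial_update_body(old_profile, new_profile):
    """Body for POST /<index>/_update/<id> -- the "doc" form merges the
    given fields into the stored document instead of replacing it."""
    return {"doc": change_set(old_profile, new_profile)}
```

Sending only the diff keeps the update cheap and avoids clobbering fields the user didn't touch.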
This particular Stack Overflow answer will also help you: cache vs indexing.
I am using Elasticsearch 2.4. I want to perform a large number of deletion operations together, where each deletion operation has a distinct criterion to delete by. The Delete By Query plugin provides deletion based on a query, but I want to generate multiple deletion queries and use them with the Bulk API so that there is a single request. Is it possible?
The answer is no.
Elasticsearch explicitly states in the Bulk API documentation: the possible actions are index, create, delete and update.
Please refer here: ES 2.x bulk operation guide
One step further, in version 5.x (which I am currently working on), Update By Query became a built-in API, but it is still not included among the Bulk operation actions.
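A common workaround is to first resolve each deletion query to its matching document ids (e.g. with a search), then delete all of those ids in one bulk request, since `delete` by id *is* a supported bulk action. A sketch of building the newline-delimited `_bulk` body (index and type names are placeholders; on 2.x the `_type` field is required in the action metadata):

```python
import json

def bulk_delete_body(index, doc_ids, doc_type=None):
    """Build the NDJSON body for POST /_bulk where every action is a
    delete by id. Note the trailing newline the Bulk API requires."""
    actions = []
    for doc_id in doc_ids:
        meta = {"_index": index, "_id": doc_id}
        if doc_type:                 # required on ES 2.x, removed later
            meta["_type"] = doc_type
        actions.append(json.dumps({"delete": meta}))
    return "\n".join(actions) + "\n"
```

This turns N delete-by-query calls into one search pass plus one bulk request, at the cost of the id set being a snapshot taken at search time.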
I need to delete docs frequently, but ES only flags them as deleted. If there are a lot of deleted docs, will query speed go down? Are there other problems?
EDIT
In other words: I often delete a lot of docs from an index and never use the force merge API to release disk usage. Will I have query performance issues after a period of time?
You simply send an HTTP POST request to your Elasticsearch node, with the structure below:
http://localhost:9200/your_index_name/_forcemerge
For more details you can read this page.
If there are a lot of deleted docs, will query speed go down?
The answer is yes.
In other words: I often delete a lot of docs from an index and never use the force merge API to release disk usage. Will I have query performance issues after a period of time?
Elasticsearch automatically runs the merge process when insert or update activity is high (which causes segments to become dirty). On the other hand, you can use the force merge API to take some control over the merging process yourself.
Documents are stored in the index as segments, which are formed when the document is created in Lucene. Deleting a document from Elasticsearch doesn't actually delete it from the underlying segment, which forms the basic data storage for ES.
Yes, having a lot of deleted documents will cause query performance issues, as queries still have to scan past the deleted documents in those segments.
Force-merging (or optimizing) the index is usually the way to deal with this, but you should take care when running it, as it is a heavy disk I/O operation.
$ curl -XPOST 'http://localhost:9200/kimchy,elasticsearch/_optimize'
$ curl -XPOST 'http://localhost:9200/_optimize?only_expunge_deletes=true'
Can you explain why you have so many frequent deletes? We also had huge numbers of frequent deletes, but we handled them at the index level. Our deletes happen for documents in a certain date range, so we index the documents based on dates, and when the time comes to delete the docs for a certain date we simply drop that index.
If you have any pattern for the documents to be deleted, I suggest you separate them out into their own index and just drop the index.
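The date-based variant of that pattern can be as simple as deriving the index name from the document's date, so deletion becomes a cheap `DELETE /<index>` instead of millions of per-document deletes. A sketch (the `logs-` prefix and daily granularity are just examples):

```python
from datetime import date, timedelta

def index_for(day, prefix="logs-"):
    """Daily index name for a document's date, e.g. logs-2017.03.15."""
    return prefix + day.strftime("%Y.%m.%d")


def indices_older_than(cutoff, days):
    """Names of the daily indices for the `days` days before `cutoff`,
    i.e. the ones eligible for a cheap DELETE /<index>."""
    return [index_for(cutoff - timedelta(days=d)) for d in range(1, days + 1)]
```

Dropping a whole index removes its segments outright, so there are no tombstoned documents left behind and no force merge needed.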
I'm implementing software in which data is sent to a web server, stored in Elasticsearch and then queried right away. I know that Elasticsearch is a NoSQL store following BASE (Basically Available, soft State, Eventual consistency) principles, which means there's no guarantee as to when your data will be available for searching.
That's why, when I query for data just added to Elasticsearch, I have to wait some time before it is found. Right now all I can do is implement a polling mechanism to detect when the data has been fully applied. It is worth mentioning that if I'm using the _id to retrieve a document, it is found right away; but if I'm searching for it with some type of Elasticsearch query (like term or query_string), it takes a while before the document is found.
So my question is: Is there a cheaper way to detect when data is completely indexed in Elasticsearch?
This part is handled by the Refresh API, which does not provide a way to know when the indexed data becomes available. However, the folks at Elastic have been working on a way to let a request wait for a refresh.
I think you should take a look here: https://www.elastic.co/blog/refreshing_news
This post has a good overview of the issues and the improvements they are working on.
Hope it helps :D
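In the meantime, the polling the question describes can at least be kept cheap by backing off between attempts. A sketch where `is_visible` stands in for whatever search you use to check for the document (note that on 5.0+ you can instead index with `?refresh=wait_for` and skip polling entirely):

```python
import time

def wait_until_visible(is_visible, attempts=10, delay=0.1, backoff=2.0,
                       sleep=time.sleep):
    """Poll is_visible() with exponential backoff. Returns True as soon
    as the document is found, False if it never shows up within the
    given number of attempts."""
    for _ in range(attempts):
        if is_visible():
            return True
        sleep(delay)
        delay *= backoff
    return False
```

Injecting `sleep` makes the helper trivial to test and lets callers plug in an async-friendly variant.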