How to get a response from Elasticsearch after indexing?

I'm using the CouchDB river plugin with Elasticsearch. My web application uses CouchDB's bulk insert to add documents to CouchDB. This triggers the changes feed, which ES reads to index my documents. The problem is that my web UI doesn't show anything, because ES is still indexing the documents.
I'm using PyES to talk to ES, by the way. Is there any function I can call to find out whether Elasticsearch is busy indexing?
Thanks a million.

Even while ES is indexing, it should still answer queries.
Could you check with
curl localhost:9200/_search?q=*
that your index contains documents while it is being indexed from CouchDB?
[UPDATE]
Keep in mind that Elasticsearch is a near-real-time search engine, so you may have to wait a short while (by default about one second) before your documents show up in search results.
You can retrieve a document by ID immediately, but a search only sees it after the next refresh.
You can trigger a refresh manually with the refresh API, but doing so frequently can slow down your insertions dramatically.
Does it help?
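As a minimal sketch of the two calls this answer mentions, the Python snippet below (standard library only) builds a refresh request and the same match-all search as the curl command. The cluster address `http://localhost:9200` is an assumption, and client libraries such as elasticsearch-py wrap these endpoints in their own methods, which are not shown here.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a cluster reachable at this address; adjust for your setup.
ES_URL = "http://localhost:9200"


def refresh_request(index: str) -> urllib.request.Request:
    """Build POST /<index>/_refresh, which makes every operation performed
    since the last refresh visible to search."""
    return urllib.request.Request(f"{ES_URL}/{index}/_refresh", method="POST")


def count_all_request(index: str) -> urllib.request.Request:
    """Build GET /<index>/_search?q=* (the same check as the curl command)."""
    query = urllib.parse.urlencode({"q": "*"})  # '*' is percent-encoded as %2A
    return urllib.request.Request(f"{ES_URL}/{index}/_search?{query}", method="GET")


def send(req: urllib.request.Request) -> dict:
    """Send a request to the cluster and decode the JSON response body."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

For example, `send(refresh_request("my_index"))` followed by `send(count_all_request("my_index"))["hits"]["total"]` would report how many documents are searchable so far. `my_index` is a placeholder, and the shape of `hits.total` differs between Elasticsearch versions.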

Related

Query ElasticSearch after the index operation

I have a service A that performs some text processing. After that, service B has to run a set of Elasticsearch queries on the document. The services communicate through Kafka. The solution is tightly coupled to ES full-text search capabilities, so I can't query it any other way.
Possible solution:
Store the document in ES and query it. The problem is that ES is eventually consistent, and I don't know whether the document has already been indexed.
Is there some API to ensure that the document is already indexed?
Another option is to publish the message from service A with a delay of X+5 seconds, where X is the refresh interval of the index the document is stored in. This seems like an unreliable solution to me. What do you think?
Another direction I thought about is querying the document with ES queries while it is still in memory. For example, if I had some magic way to convert the ES query into a Lucene query, I wouldn't need to deal with the eventually consistent behavior of Elasticsearch and could query Lucene directly.
Maybe there are some other solutions?
Take a look at the ?refresh flag, which makes an indexing request return only once a refresh has happened. Otherwise, you can use the GET API to check whether the document exists.
However, there are no magic options here: Elasticsearch is eventually consistent, and you need to factor that in.
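A sketch of the flow this answer suggests, assuming Elasticsearch 7.x or later (the `_doc` endpoint; `refresh=wait_for` exists since 5.0) and a cluster at `http://localhost:9200`. Service A would index with `refresh=wait_for` and publish its Kafka message only after that call returns; the GET API, which is real time, can confirm the document was stored. Index and document names are placeholders.

```python
import json
import urllib.parse
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster


def index_request(index, doc_id, doc, refresh="wait_for"):
    """PUT /<index>/_doc/<id>?refresh=wait_for: the response is sent only
    after a refresh has made the document visible to search."""
    params = urllib.parse.urlencode({"refresh": refresh})
    doc_path = urllib.parse.quote(doc_id, safe="")
    return urllib.request.Request(
        f"{ES_URL}/{index}/_doc/{doc_path}?{params}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def get_request(index, doc_id):
    """GET /<index>/_doc/<id> is real time: it returns the latest indexed
    version even before a refresh, so it can confirm the document exists."""
    doc_path = urllib.parse.quote(doc_id, safe="")
    return urllib.request.Request(f"{ES_URL}/{index}/_doc/{doc_path}", method="GET")
```

Waiting for the refresh moves the delay into service A's indexing call instead of a fixed X+5 second timer, at the cost of slower indexing responses.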

Does Elastic Search lock queries during Bulk Indexing?

I'm bulk inserting hundreds of documents into Elasticsearch from a worker.
At the same time a user is running queries from a UI.
The problem is the queries either timeout, or are very slow, if the bulk indexing is going on at the same time as the query.
Is that normal behaviour for Elastic search?
Is there a way to run a query while indexing is going on, or does Elasticsearch lock the whole index?
It's fine if the query doesn't cover the latest documents.
I'm running the latest versions of elasticsearch-py and Elasticsearch, on Windows.

How to debug document not available for search in Elasticsearch

I am trying to search for and fetch documents from Elasticsearch, but in some cases I don't get the updated documents. By updated I mean that we update the documents in Elasticsearch periodically: every 30 seconds, with anywhere from 10,000 to 100,000 documents. I am aware that updates are generally a slow process in Elasticsearch.
I suspect this happens because, although Elasticsearch accepted the documents, they were not yet available for search. Hence I have the following questions:
Is there a way to measure the time between indexing a document and it becoming available for search? Is there a setting that makes Elasticsearch log more information about this?
Is there a setting in Elasticsearch that enables logging whenever a merge operation happens?
Any other suggestions to help optimize performance?
Thanks in advance for your help.
By default the refresh_interval setting is 1 second, so unless you changed it, each update should become searchable within about 1 second.
If you want the results to be searchable as soon as the update call returns, you can use the refresh parameter.
With refresh=wait_for, the endpoint responds only once a refresh has occurred. With refresh=true, a refresh is triggered immediately. Be careful with refresh=true if you send many updates, since it can hurt performance.
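For periodic bulk updates like the ones in the question, `refresh=wait_for` can be passed on the `_bulk` call itself, so the request returns only after a refresh has made the changes searchable. A minimal standard-library sketch, assuming Elasticsearch 7.x or later (no `_type` in the action lines) and a cluster at `http://localhost:9200`:

```python
import json
import urllib.parse
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster


def bulk_update_body(index, updates):
    """Build an NDJSON _bulk body of partial-document updates.
    `updates` maps document IDs to the fields to change."""
    lines = []
    for doc_id, fields in updates.items():
        lines.append(json.dumps({"update": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({"doc": fields}))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


def bulk_request(body, refresh="wait_for"):
    """POST /_bulk?refresh=<refresh>; 'wait_for' blocks until a refresh,
    'true' forces one (costly when called often)."""
    params = urllib.parse.urlencode({"refresh": refresh})
    return urllib.request.Request(
        f"{ES_URL}/_bulk?{params}",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
```

Sending `bulk_request(bulk_update_body("my_index", {"1": {"status": "done"}}))` with `urllib.request.urlopen` would block until the refresh; the index and field names here are hypothetical.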

ElasticSearch 1.7 (Spring Data ElasticSearch) update by query takes lot of time to update documents

My application allows updating multiple Elasticsearch documents in a single request.
I use the Elasticsearch BulkRequestBuilder to update all such documents in bulk.
BulkRequestBuilder bulkRequestBuilder = elasticSearchClient.prepareBulk();
// Add one partial-document update per ID to the bulk request
documents.forEach(id -> {
    UpdateRequest updateRequest = new UpdateRequestBuilder(elasticSearchClient)
            .setType("MyDocumentType")
            .setIndex("MyDocumentIndex")
            .setId(id)
            .setDoc("fieldName", "valueToBeUpdated")
            .request();
    bulkRequestBuilder.add(updateRequest);
});
// Execute all updates in a single bulk call
bulkRequestBuilder.get();
All the documents do get updated with valueToBeUpdated, but Elasticsearch takes time to apply the updates internally, and the call to bulkRequestBuilder.get() returns before the updated documents are visible (which suggests the engine works asynchronously).
Could anyone suggest how to make the updates of all documents synchronous?
I finally found the core issue (maybe it is the default behaviour) behind the updates taking time in the Elasticsearch engine.
By default, Elasticsearch updates are asynchronous in nature (as I already pointed out in my question). There are a couple of links that explain this default behaviour.
For example, the Elasticsearch GET API documentation mentions that, to get a document, a refresh can be performed so that all previous updates, if any, are visible. This hints that the asynchronous nature of Elasticsearch is why an immediate search does not return my updated documents.
For now, to keep the existing behaviour, trigger the bulk update synchronously as follows:
bulkRequestBuilder.setReplicationType(ReplicationType.SYNC).setRefresh(true).get();
Problems when indexing or updating a lot of data usually come from segment merging in ES.
One tip from the ES team is to disable refresh before indexing or updating a lot of data.
You can do this by setting the index refresh_interval to -1 before indexing and, once all your data is indexed, restoring your previous index configuration.
See "Tune for indexing speed" in the Elasticsearch documentation.
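The refresh_interval tip can be sketched as a Python helper that sets `index.refresh_interval` to `-1` through the index settings API, runs the bulk load, and restores the previous interval in a `finally` block. The `load` and `send` callables are hypothetical hooks for your own indexing code and HTTP client, and `http://localhost:9200` is an assumed address.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster


def settings_request(index, refresh_interval):
    """PUT /<index>/_settings to change index.refresh_interval.
    "-1" disables periodic refresh; "1s" is the default."""
    body = {"index": {"refresh_interval": refresh_interval}}
    return urllib.request.Request(
        f"{ES_URL}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def bulk_load_with_refresh_disabled(index, load, send, previous="1s"):
    """Disable refresh, run load(), then restore `previous` even if load() fails.
    `load` and `send` are caller-supplied (hypothetical) callables."""
    send(settings_request(index, "-1"))
    try:
        load()
    finally:
        send(settings_request(index, previous))
```

Pass the interval your index actually had as `previous`; `1s` is only the default. Once the interval is restored, the next scheduled refresh makes the loaded data searchable, or you can call the refresh API explicitly.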

Querying the elastic search before inserting into it

I am using a Spring Boot application to load messages into Elasticsearch. I have a use case where I need to query the Elasticsearch data, get some ID value, and populate it in the Elasticsearch JSON document before inserting it.
Will querying Elasticsearch before each insertion be expensive? If so, is there another way to approach this?
You can use update_by_query to do it in one step.
But otherwise it shouldn't be slow if you do it in two steps (get + update). It depends on many things: how often you do it, how much data is transferred, etc.
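The two-step variant (search, then index) can be sketched as follows; the one-step `update_by_query` route is not shown. The `term` query assumes the looked-up field is mapped as a `keyword` field, `customers`, `email`, and `customer_id` are hypothetical names, and the cluster address is assumed.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster


def lookup_request(index, field, value):
    """Search for at most one document whose `field` exactly matches `value`
    (a term query, so `field` should be a keyword field)."""
    body = {"size": 1, "query": {"term": {field: value}}}
    return urllib.request.Request(
        f"{ES_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def enrich(doc, search_response, source_field, target_field):
    """Return a copy of `doc` with `target_field` set from the first hit's
    `_source[source_field]`; leave it unchanged when there are no hits."""
    hits = search_response.get("hits", {}).get("hits", [])
    enriched = dict(doc)
    if hits:
        enriched[target_field] = hits[0]["_source"].get(source_field)
    return enriched
```

Each message then costs one small search plus one index request; whether that is acceptable depends, as the answer says, on how often it runs and how much data each call moves.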
