Get fresh Solr results, not from cache

The query sent to Solr is below:
http://1.2.3.4:8983/solr/data.results/select?q=*:*&fq=date:[2016-01-01T00:00:00Z TO 2015-01-31T23:59:59Z]&sort=publishdate desc&wt=json&indent=true
This gives me 10 results. Of those 10, I update the latest record to isdeleted=true and add &fq=-isdeleted:true to the query, but it still returns the same 10 results as before. If I then change the start of the date range from 2016-01-01T00:00:00Z to 2016-01-02T00:00:00Z (fq=date:[2016-01-02T00:00:00Z TO 2015-01-31T23:59:59Z]), the record I marked as deleted no longer shows up. My guess is that the result set for [2016-01-01T00:00:00Z TO 2015-01-31T23:59:59Z] has been cached, which is why I keep getting the same results. This has been happening repeatedly; I have tried many different queries. How can I refresh the cache so that I get the latest updated records? Is there a configuration change I can make in Solr so that it always returns the latest updated records?
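If it helps, a hedged sketch of two things worth checking (using the collection name from the question): Solr only exposes updates to searchers, and invalidates its caches, after a commit, so stale results often mean the update was never committed; and an individual filter query can opt out of the filter cache with the cache=false local parameter.

curl "http://1.2.3.4:8983/solr/data.results/update?commit=true"
curl -g "http://1.2.3.4:8983/solr/data.results/select?q=*:*&fq={!cache=false}-isdeleted:true&wt=json"

(The -g flag stops curl from interpreting the braces.) If a missing commit is the culprit, autoCommit/autoSoftCommit in solrconfig.xml can make updates visible automatically.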

Related

Elasticsearch update documents without retrieving them

Is there a way to update documents, similar to UpdateByQuery, but in batches and without retrieving them?
According to the documentation, we are unable to set a size for UpdateByQuery requests.
That is, update 5 documents at a time rather than all at once.
One solution that seems obvious is to GET 5 documents and then UPDATE them.
I'm trying to come up with a way where I don't have to do a GET request for every update.
You can set the batch size on UpdateByQueryRequest with setBatchSize, as described on this page of the docs:
https://www.elastic.co/guide/en/elasticsearch/client/java-rest/master/java-rest-high-document-update-by-query.html
Note that this is based on the latest version of the Java client. If you are using a different client or version, it may not be present. Hope that helps.
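For comparison, a hedged sketch of the same thing through the REST API, where the scroll_size parameter plays the role of setBatchSize (the index name, query, and script below are hypothetical):

curl -XPOST "localhost:9200/my-index/_update_by_query?scroll_size=5" -H 'Content-Type: application/json' -d '
{
  "query": { "term": { "status": "pending" } },
  "script": { "source": "ctx._source.processed = true" }
}'

Each scroll batch then contains at most 5 documents.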

Does updating a doc increase the "delete" count of the index?

I am facing a strange issue with the number of docs deleted in an Elasticsearch index. The data is never deleted, only inserted and/or updated. While I can see that the total number of docs is increasing, I have also been seeing non-zero values in the docs.deleted column, and I cannot work out where this number comes from.
I tried to find out whether updating a doc first deletes it and then re-indexes it, which would increase the delete count, but I could not find any information on this.
The command I type to check the index is:
curl -XGET localhost:9200/_cat/indices
The output I get is (the columns are health, status, index, uuid, pri, rep, docs.count, docs.deleted, store.size, pri.store.size):
yellow open e0399e012222b9fe70ec7949d1cc354f17369f20 zcq1wToKRpOICKE9-cDnvg 5 1 21219975 4302430 64.3gb 64.3gb
Note: it is a single-node Elasticsearch setup.
I would like to understand the reason behind the deletion of docs.
You are correct: updates are the reason you see a non-zero deleted-documents count.
At the Lucene level there is no such thing as an update; documents in Lucene are immutable.
So how does Elasticsearch provide the update feature?
It does so by making use of the _source field, which is why _source must be enabled to use the update feature. When the update API is called, Elasticsearch reads all the fields and their existing values from _source, replaces the values of only the fields sent in the update request, marks the existing document as deleted, and indexes a new document with the updated _source.
What is the advantage of this if it's not an actual update?
It removes the overhead of having the application assemble the complete document even when only a small subset of fields needs updating. Rather than sending the full document, you can send just the fields that need to change via the update API; Elasticsearch takes care of the rest.
It saves extra network round-trips, reduces payload size, and lowers the chances of version conflicts.
You can read more about how update works here.
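For illustration, a hedged sketch of the effect (the index name, document id, and field are hypothetical, and the endpoint form varies by Elasticsearch version):

curl -XPOST "localhost:9200/my-index/_update/1" -H 'Content-Type: application/json' -d '
{
  "doc": { "title": "a new title" }
}'
curl -XGET "localhost:9200/_cat/indices?v"

After the partial update, docs.deleted should have grown by one, since the old copy of document 1 is only marked as deleted; segment merges eventually purge such tombstones, so the number can also shrink over time.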

How to debug document not available for search in Elasticsearch

I am trying to search and fetch documents from Elasticsearch, but in some cases I am not getting the updated documents. By updated I mean that we update the documents in Elasticsearch periodically, at an interval of 30 seconds, and the number of documents could range from 10 to 100 thousand. I am aware that updates are generally a slow process in Elasticsearch.
I suspect this is happening because Elasticsearch accepted the documents but they were not yet available for searching. Hence I have the following questions:
Is there a way to measure the time between indexing and the documents becoming available for search? Is there a setting in Elasticsearch that can log more information in the Elasticsearch logs?
Is there a setting in Elasticsearch which enables logging whenever the merge operation happens?
Any other suggestion to help in optimizing the performance?
Thanks in advance for your help.
By default the refresh_interval parameter is set to 1 second, so unless you have changed it, each update becomes searchable after at most 1 second.
If you want the results to be searchable as soon as the update operation has completed, you can use the refresh parameter.
With refresh=wait_for, the endpoint responds once a refresh has occurred. With refresh=true, a refresh operation is triggered immediately. Be careful with refresh=true if you perform many updates, since it can hurt performance.
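For example, a hedged sketch with a hypothetical index and document (the endpoint form varies by version):

curl -XPOST "localhost:9200/my-index/_update/1?refresh=wait_for" -H 'Content-Type: application/json' -d '
{
  "doc": { "title": "updated title" }
}'

The call only returns once the change has become searchable.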

Elasticsearch: Fetching results faster than update

I have some Elasticsearch documents indexed, and I am updating one of them (a minor change, such as updating its title field).
The issue is that when I fetch all the documents immediately after the update, the document does not seem to have been updated yet.
Instead, if I resend the fetch request then I receive the updated data.
What could be the issue here?
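This sounds like the near-real-time refresh gap described in the previous thread: by default a change only becomes searchable after the next refresh, up to 1 second later. A hedged illustration with a hypothetical index and document:

curl -XPOST "localhost:9200/my-index/_update/1" -H 'Content-Type: application/json' -d '{"doc":{"title":"new"}}'
curl "localhost:9200/my-index/_search?q=title:new"
curl -XPOST "localhost:9200/my-index/_refresh"
curl "localhost:9200/my-index/_search?q=title:new"

The first search may still miss the update; after the explicit _refresh the second search sees it. Note that a GET by id, unlike a search, is realtime and does not depend on a refresh.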

Seeing latest results in Kibana after the page limit is reached

I am new to Logstash. I have set up Logstash to populate Elasticsearch and have Kibana read from it. The problem I am facing is that after
number of records = results per page x page limit
the UI stops showing new results. Is there a way to set up Kibana so that it discards the oldest results instead of the latest once the limit is reached?
To have Kibana read the latest results, reload the query.
To have more pages available (or more results per page), edit the panel.
Make sure the table is reverse-sorted by @timestamp.
