Magento and Solr reindexing issue

I'm having trouble reindexing Magento with Solr. All other indexes complete successfully, but via SSH I'm getting the following error:
Error reindexing Solr: Solr HTTP error: HTTP request failed, Operation timed out after 5001 milliseconds with 0 bytes received
Any ideas how to fix this?
Many thanks.

It looks like there is a time limit of 5,000 milliseconds, whereas your Solr indexing needs more time.
Increase the time limit.
While indexing is running, check the Solr log using the tail command.
Using the Solr admin interface, query Solr to see whether new products or data updates are actually arriving (see the sketch below).
You can also add some logging code to the addDoc function in the Solr client.php to check whether it is being called at all.
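To check whether documents are arriving, you can hit Solr's select handler directly. A minimal sketch in Java (the host, port, and core name magento are assumptions; adjust to your setup):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class SolrIndexCheck {
        public static void main(String[] args) throws Exception {
            HttpClient http = HttpClient.newHttpClient();
            // Match-all query with rows=0: the response's numFound tells us
            // how many documents the core currently holds.
            HttpRequest req = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:8983/solr/magento/select?q=*:*&rows=0&wt=json"))
                    .build();
            HttpResponse<String> res = http.send(req, HttpResponse.BodyHandlers.ofString());
            System.out.println(res.body());
        }
    }

Run it before and during reindexing; if numFound never grows, the indexer is failing before any documents are added.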

Having the same issue... I'm assuming you're using Magento Solarium. I opened an issue with the developer on GitHub; I'll update you if he responds with a solution. In the meantime, if you were able to fix it, please let us know.

Since this is the only relevant Google hit for this issue, I'll add my findings here. The issue arises when you have a large database of products (or many stores together with many products). I noticed Solr was filling up until the error occurred, after which the Solr index was empty. I then found in the code that the indexing process ends by committing all the changes; that final commit is where the timeout happens (see the sketch below).
Just set the timeout in System -> Configuration -> Catalog -> Solarium search to a large number (like 500 seconds), do a full reindex, and then put the timeout back to a more reasonable number (2 seconds).
Although there are two options, one for search and a general timeout setting, this doesn't seem to work as described: changing the search timeout setting still affects the indexing process.
You don't want to leave the timeout at 500 seconds; that can cause serious server performance issues.
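The Magento Solarium module is PHP, but the shape of the problem is the same in any Solr client: the single commit after a large bulk index is the long-running call, so the client's read timeout has to cover it. A minimal illustration using SolrJ rather than the module's own code (URL, core name, and field names are assumptions):

    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class BulkIndexCommit {
        public static void main(String[] args) throws Exception {
            HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/magento")
                    .withConnectionTimeout(5_000)   // connecting should still be fast
                    .withSocketTimeout(500_000)     // the final commit is what takes long
                    .build();

            for (int i = 0; i < 100_000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", String.valueOf(i));
                doc.addField("name", "product-" + i);
                solr.add(doc);
            }
            solr.commit();  // with a short socket timeout, this is the call that aborts
            solr.close();
        }
    }

Raising only the socket (read) timeout for the duration of the reindex mirrors the admin-panel workaround above.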

Related

Elasticsearch update documents without retrieving them

Is there a way to update documents, similar to UpdateByQuery, but in bulk and without retrieving them?
According to the documentation, we are unable to set a size for UpdateByQuery requests.
I.e., update 5 documents at a time rather than all at once.
One solution that seems obvious is to GET 5 documents and then UPDATE them.
I'm trying to come up with a way where I don't have to do a GET request for every update.
You can set the batch size on UpdateByQueryRequest with setBatchSize, as shown on this page of the docs:
https://www.elastic.co/guide/en/elasticsearch/client/java-rest/master/java-rest-high-document-update-by-query.html
Note that this is based on the latest version of the Java client; if you are using a different client or version, it may not be present. A sketch follows. Hope that helps.
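A minimal sketch with the high-level REST client (the index name, query, and script are made-up placeholders; client is assumed to be an already-configured RestHighLevelClient):

    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.elasticsearch.index.query.QueryBuilders;
    import org.elasticsearch.index.reindex.BulkByScrollResponse;
    import org.elasticsearch.index.reindex.UpdateByQueryRequest;
    import org.elasticsearch.script.Script;

    public class BatchedUpdateByQuery {
        static void run(RestHighLevelClient client) throws Exception {
            UpdateByQueryRequest request = new UpdateByQueryRequest("products");
            request.setQuery(QueryBuilders.termQuery("status", "stale"));
            request.setBatchSize(5);  // process 5 documents per scroll batch
            request.setScript(new Script("ctx._source.status = 'fresh'"));
            BulkByScrollResponse response = client.updateByQuery(request, RequestOptions.DEFAULT);
            System.out.println("Updated " + response.getUpdated() + " documents");
        }
    }

setBatchSize controls how many documents each internal scroll batch processes; the request still walks all matches, it just does so 5 at a time, and you never issue a GET yourself.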

How to debug document not available for search in Elasticsearch

I am trying to search and fetch documents from Elasticsearch, but in some cases I am not getting the updated documents. By updated I mean we update the documents periodically, at an interval of 30 seconds, and the number of documents could range from 10 to 100 thousand. I am aware that updates are generally a slow process in Elasticsearch.
I suspect this is happening because Elasticsearch accepted the documents but they were not yet available for searching. Hence I have the following questions:
Is there a way to measure the time between indexing and the documents becoming available for search? Is there a setting that can log more of this information in the Elasticsearch logs?
Is there a setting in Elasticsearch that enables logging whenever a merge operation happens?
Any other suggestions to help in optimizing performance?
Thanks in advance for your help.
By default the refresh_interval parameter is set to 1 second, so unless you changed this parameter, each update becomes searchable after at most 1 second.
If you want the results to be searchable as soon as you have performed the update operation, you can use the refresh parameter.
With refresh=wait_for, the endpoint responds only once a refresh has occurred; with refresh=true, a refresh operation is triggered immediately. Be careful with refresh=true if you have many updates, since it can hurt performance. A sketch follows.
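A minimal sketch of refresh=wait_for with the Java high-level REST client (index, id, and field are hypothetical; client is assumed configured):

    import org.elasticsearch.action.support.WriteRequest;
    import org.elasticsearch.action.update.UpdateRequest;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestHighLevelClient;

    public class RefreshOnUpdate {
        static void update(RestHighLevelClient client) throws Exception {
            UpdateRequest request = new UpdateRequest("products", "42")
                    .doc("price", 19.99)
                    // Block until the next refresh makes the change searchable;
                    // equivalent to refresh=wait_for on the REST API.
                    .setRefreshPolicy(WriteRequest.RefreshPolicy.WAIT_UNTIL);
            client.update(request, RequestOptions.DEFAULT);
            // A search issued after this call returns will see the new price.
        }
    }

WAIT_UNTIL blocks until the next scheduled refresh rather than forcing an extra one the way refresh=true (RefreshPolicy.IMMEDIATE) does, so it is usually the safer default under load.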

How to stop auto-reindexing in Elasticsearch if any update happens?

I have a big use case with Elasticsearch which has millions of records in it.
I will be updating the records frequently, say 1,000 records per hour.
I don't want Elasticsearch to reindex on every one of my updates.
I am planning to reindex on a weekly basis.
Any idea how to stop the auto-reindex while updating?
Or any other better suggestion is welcome. Thanks in advance :)
Elasticsearch (ES) updates an existing doc in the following manner:
1. Delete the old doc.
2. Index a new doc with the changes applied to it.
According to the ES docs:
"In Elasticsearch, this lightweight process of writing and opening a new segment is called a refresh. By default, every shard is refreshed automatically once every second. This is why we say that Elasticsearch has near real-time search: document changes are not visible to search immediately, but will become visible within 1 second."
Note that these changes will not be visible/searchable until ES commits/flushes them to the disk cache and to disk. This is controlled by the soft commit (the ES refresh interval, 1 second by default) and the hard commit (which actually writes documents to disk so they cannot be lost permanently, and which is a costlier affair than a soft commit).
You need to tune your ES refresh interval and do proper load testing, as setting it very low or very high each has its own pros and cons.
For example, setting it very low (say 1 second) while too many updates are happening causes a performance hit and might crash your system. Setting it very high (say 1 hour) means you no longer have NRT (near-real-time) search, and during that window the in-memory buffer could hold millions of docs (depending on your app) and cause out-of-memory errors; committing such a large buffer is also a very costly affair. The sketch below shows how to change the interval.
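A minimal sketch of relaxing the refresh interval with the Java high-level REST client (the index name is hypothetical; client is assumed configured):

    import org.elasticsearch.action.admin.indices.settings.put.UpdateSettingsRequest;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.elasticsearch.common.settings.Settings;

    public class TuneRefreshInterval {
        static void relaxRefresh(RestHighLevelClient client) throws Exception {
            UpdateSettingsRequest request = new UpdateSettingsRequest("products");
            // Refresh every 30s instead of the 1s default; "-1" would
            // disable refresh entirely during a heavy bulk load.
            request.settings(Settings.builder()
                    .put("index.refresh_interval", "30s"));
            client.indices().putSettings(request, RequestOptions.DEFAULT);
        }
    }

If you do set it to "-1" for a bulk load, remember to restore a sane value afterwards, or newly indexed documents will never become searchable.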

Algolia Magento site search disabled - record quota exceeded

I installed Algolia on my Magento store a while ago. Suddenly it stopped working and reverted to the original Magento search.
When I checked the settings, it showed that the Admin API Key was missing. I re-entered it, but on saving this error appeared:
An error occurred while saving this configuration: Record quota exceeded, change plan or delete records.
http://prntscr.com/d300va
When checking the Algolia dashboard, I can see that there are only 6k out of 10k records used.
http://prntscr.com/d304fw
Does anybody have any suggestions?
I'm using Magento 1.9.2.0 and Algolia extension version 1.7.2.
Thanks a bunch.
During the Magento reindexing process, the number of records will increase if multiple sort orders are defined (to achieve the best performance possible, the index for each sort is pre-computed and stored and queried separately). For quota purposes, this means one record per product per sort order. This could be the cause of going over quota when it appears that you have fewer records.
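As a worked example (the sort-order count here is an assumption, not something visible in your screenshots): with 6k products and one additional sort order such as price ascending, Algolia stores 6k × 2 = 12k records, which already exceeds a 10k quota even though the dashboard's primary index shows only 6k.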

Why are queries not being logged?

I've got an environment set up on dev that should keep a log of every query run, but it's not writing anything. I'm using the slow-log feature for it...
These are my thresholds in elasticsearch.yml:
http://pastebin.com/raw.php?i=qfwnruhD
And this is my whole logging.yml:
http://pastebin.com/raw.php?i=aXg8xHNE
I'm using Elasticsearch 1.3.1 in this environment.
You should set the thresholds to 0ms if you want to log all queries. On a smaller index I was testing on, lots of queries were taking less than 1ms, so they never crossed the default thresholds (see the example below).
If that doesn't work, perhaps Elasticsearch isn't using the config file you are updating.
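A minimal sketch of what that might look like in elasticsearch.yml, with the standard slow-log settings forced down to 0ms (your pastebin values may differ; this is illustrative only):

    # Log every search query and fetch phase, regardless of duration.
    index.search.slowlog.threshold.query.warn: 0ms
    index.search.slowlog.threshold.fetch.warn: 0ms
    # Optionally log every indexing operation too.
    index.indexing.slowlog.threshold.index.warn: 0ms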
