Why are queries not being logged? - elasticsearch

I've got an environment set up on Dev that should keep a log of every query run, but it's not writing anything. I'm using the slow-log feature for it...
These are my thresholds on the elasticsearch.yml:
http://pastebin.com/raw.php?i=qfwnruhD
And this is my whole logging.yml:
http://pastebin.com/raw.php?i=aXg8xHNE
I'm using Elasticsearch 1.3.1 in this environment.

You should set the threshold to 0ms if you want to log all queries. On a smaller index I was testing on, lots of queries were taking less than 1ms.
If that doesn't work, perhaps elasticsearch isn't using the config file you are updating.
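For example, a minimal elasticsearch.yml fragment (using the 1.x-style static setting names) that should catch every query at the debug level:

```yaml
index.search.slowlog.threshold.query.debug: 0ms
index.search.slowlog.threshold.fetch.debug: 0ms
```

After changing elasticsearch.yml you'll need to restart the node for static settings to take effect.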

Related

How can I find the most used query from Elasticsearch?

I have an Elasticsearch cluster running on an AWS Elasticsearch instance. It has been up and running for a few months. I'd like to know the most used query requests over the last few months. Does Elasticsearch save all queries somewhere I can search? Or do I have to programmatically save the requests for analysis?
As far as I'm aware, Elasticsearch doesn't by default save a record or frequency histogram of all queries. However, there's a way you could have it log all queries, and then ship the logs somewhere to be aggregated/searched for the top results (incidentally, this is something you could use Elasticsearch for :D). Sadly, you'll only be able to track queries after you configure this; I doubt you'll be able to find any record of your historical queries from the last few months.
To do this, you'd take advantage of Elasticsearch's slow query log. The default thresholds are designed to only log slow queries, but if you set them to 0s then Elasticsearch will log every query as a slow query, giving you a record of all of them. See the link above for detailed instructions. You could set this for a whole cluster in your yaml configuration file like
index.search.slowlog.threshold.fetch.debug: 0s
or set it dynamically per-index with
PUT /<my-index-name>/_settings
{
  "index.search.slowlog.threshold.query.debug": "0s"
}
To be clear, the log level you choose doesn't strictly matter, but using debug for this lets you keep logging genuinely slow queries at the more serious levels like info and warn, which you might find useful.
I'm not familiar with how to configure an AWS Elasticsearch cluster, but since the above are core Elasticsearch settings in every version I'm aware of, there should be a way to do it.
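Once the slow log is capturing everything, a small script can aggregate the log lines to surface the most frequent queries. A rough sketch (the sample line layout and the `source[...]` regex here are assumptions based on the typical slowlog format; adjust them to match your actual log output):

```python
import re
from collections import Counter

def top_queries(log_lines, n=3):
    """Count occurrences of the source[...] payload in slowlog lines
    and return the n most frequent query bodies."""
    # Assumed slowlog layout: ... source[<query body>], extra_source[...]
    source_pattern = re.compile(r"source\[(.*?)\], extra_source")
    counts = Counter()
    for line in log_lines:
        match = source_pattern.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common(n)
```

You could run this over the collected log files periodically, or ship the lines into another Elasticsearch index and use a terms aggregation instead.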
Happy searching!

How to debug document not available for search in Elasticsearch

I am trying to search and fetch documents from Elasticsearch, but in some cases I am not getting the updated documents. By updated I mean that we update the documents in Elasticsearch periodically, at an interval of 30 seconds, and the number of documents can range from 10,000 to 100,000. I am aware that updates are generally a slow process in Elasticsearch.
I suspect that although Elasticsearch accepted the documents, they were not yet available for searching. Hence I have the following questions:
Is there a way to measure the time between indexing and the documents becoming available for search? Is there a setting in Elasticsearch that can log more information in the Elasticsearch logs?
Is there a setting in Elasticsearch which enables logging whenever the merge operation happens?
Any other suggestion to help in optimizing the performance?
Thanks in advance for your help.
By default the refresh_interval parameter is set to 1 second, so unless you have changed this parameter, each update will be searchable after at most 1 second.
If you want to make the results searchable as soon as you have performed the update operation you can use the refresh parameter.
Using refresh=wait_for, the endpoint will respond once a refresh has occurred. If you use refresh=true, a refresh operation will be triggered immediately. Be careful with refresh=true if you have many updates, since it can impact performance.
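As a sketch, the two variants look like this (index name, document id, and body are placeholders; the exact URL path depends on your Elasticsearch version, older versions include the type in the path):

```
# Respond only once the change is visible to search
PUT /my-index/_doc/1?refresh=wait_for
{ "field": "updated value" }

# Force an immediate refresh (costly under heavy update load)
PUT /my-index/_doc/1?refresh=true
{ "field": "updated value" }
```

You can also check the current refresh_interval for an index with GET /my-index/_settings.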

Docker Elasticsearch Bulk index timeout

I am running Elasticsearch 2.3 using the official Docker builds. I am trying to bulk index a fairly large dataset. The dataset in question is about 700 MB and takes around 30 minutes on a non-dockerized setup. Around 24 hours ago I started the bulk index operation on the dockerized Elasticsearch container. It still hasn't completed; worse, there is no load on the server, which suggests it's not even attempting to index.
I know the bulk indexing works because I can index a smaller dataset and it works without a problem.
Are there any specific settings I need to be aware of when indexing data over a certain size? Or any way to check why it errored?
Thanks in advance.
For any future people reading this, firstly, hello from the past!
Secondly, Elasticsearch has a default maximum HTTP request size of 100 MB (the http.max_content_length setting), so make sure your bulk requests (including posted files) stay below that.
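One way around the limit is to split the bulk payload into chunks that stay under the cap before posting. A minimal sketch (the helper name is made up; each returned chunk would then be POSTed to _bulk as a separate request):

```python
def chunk_bulk_body(lines, max_bytes):
    """Group bulk API lines (action line + source line pairs) into
    newline-delimited chunks whose serialized size stays under max_bytes."""
    chunks, current, size = [], [], 0
    # Bulk lines come in pairs: an action line followed by a source line.
    for action, source in zip(lines[0::2], lines[1::2]):
        pair_size = len(action.encode()) + len(source.encode()) + 2  # + newlines
        if current and size + pair_size > max_bytes:
            chunks.append("\n".join(current) + "\n")
            current, size = [], 0
        current.extend([action, source])
        size += pair_size
    if current:
        chunks.append("\n".join(current) + "\n")
    return chunks
```

In practice you'd pick a chunk size well below the server limit (say 10-20 MB) so each request indexes quickly and failures are cheap to retry.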

magento and solr reindexing issue

I'm having trouble reindexing Magento with Solr; I'm getting the following error via SSH (all other indexes complete successfully):
Error reindexing Solr: Solr HTTP error: HTTP request failed, Operation timed out after 5001 milliseconds with 0 bytes received
any ideas how to fix this?
many thanks
Looks like there is a time limit of 5000 milliseconds, whereas your Solr indexing needs more time.
Increase the time limit.
While indexing is running, check the Solr log using the tail command.
Using the Solr admin interface, query Solr to check whether the new products or data updates are in place.
You can also add some logging code to the addDoc function in the Solr client.php to check whether it is being called at all.
Having the same issue... I'm assuming you're using Magento Solarium. I opened an issue on GitHub with the dev; I'll update you if he responds with a solution. In the meantime, if you were able to fix it, please let us know.
Since this is the only relevant hit from Google considering this issue, I add my findings here. The issue arises when you have a large database of products (or many shops together with many products). I noticed SOLR was filling up until the error occurred, after that the SOLR index was empty. Then I found in the code that the indexing process ends with committing all the changes. This is where the timeout happens.
Just set the timeout in System -> Configuration -> Catalog -> Solarium Search to a large number (like 500 seconds), do a total re-index, and then put the timeout back to a more reasonable number (2 seconds).
Though there are two options, one for search and a general timeout setting, this distinction doesn't seem to work: if you change the search timeout setting, it still affects the indexing process.
You don't want to leave the timeout at 500 seconds, as this can cause serious issues for your server's performance.

How to check elasticsearch query performance?

I need to check Elasticsearch query performance, but due to caching I am unable to measure the actual query performance. Is there any way to stop caching?
I have tried _cache/clear as suggested in the document below:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-clearcache.html
$ curl -XPOST 'http://localhost:9200/_cache/clear'
I also tried setting index.cache.filter.type to none in elasticsearch.yml:
index.cache.filter.type : none
I am using Sense to run the Elasticsearch queries.
Is there any other way of doing this?
Maybe restart your Elasticsearch cluster, then run some queries that hit more or less the same data but not the actual query you want to test, and then run the query you want to test.
I also notice the first query you run against a restarted cluster is slow, but after that everything tends to be fast.
It's very possible that Elasticsearch isn't even caching the query you're trying to get performance data on; it's just really, really fast ;)
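One practical way to apply the advice above is to time the query the way the answer describes: run it cold right after a restart, then repeatedly while warm, and compare the two rather than averaging them together. A generic timing helper (fn here is a stand-in for whatever function issues your search request):

```python
import time

def time_runs(fn, warmup=1, runs=5):
    """Time a callable, keeping cold (initial) timings separate from the
    warm-run average so cache effects are visible instead of averaged away."""
    cold = []
    for _ in range(warmup):
        start = time.perf_counter()
        fn()
        cold.append(time.perf_counter() - start)
    warm = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        warm.append(time.perf_counter() - start)
    return {"cold": cold, "warm_avg": sum(warm) / runs}
```

A large gap between the cold timings and warm_avg is a strong hint that caching (filter cache, OS page cache, or both) dominates what you're measuring.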
