Apache Solr and Dovecot - high CPU load and query timeouts - performance

I am facing issues with Apache Solr and Dovecot.
Everything works fine and the email messages are indexed correctly, but at random times (especially during the night, when email traffic is generally lighter, but also during the day) the Solr server shows high CPU load and very heavy I/O activity. This slows the queries to Solr down so much that users get timeouts when searching their mailboxes and also when writing messages (appending to the mailbox).
This is my configuration:
Solr 7.7.3 on CentOS 7
Dovecot 2.3.18 on CentOS 7
/var/solr is around 250GB
The Solr server has 40GB of RAM and 2GB of swap (not used).
SOLR_HEAP="30240m"
Dovecot is configured with "fts_autoindex = yes"
The solr-config and solr-schema are taken from the Dovecot public repository:
https://raw.githubusercontent.com/dovecot/core/master/doc/solr-schema-7.7.0.xml
https://raw.githubusercontent.com/dovecot/core/master/doc/solr-config-7.7.0.xml
I followed the official Dovecot guide here:
https://doc.dovecot.org/configuration_manual/fts/solr/
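In case it is useful, this is how I verified the effective settings on both sides; a minimal check, assuming the standard CentOS package locations and my local Solr URL:

# Show the non-default Dovecot settings, including the fts plugin wiring
doveconf -n | grep -i fts
# Ask Solr which JVM memory settings it actually started with
curl -s 'http://localhost:8983/solr/admin/info/system?wt=json&indent=on' | grep -i memory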
In the Solr logs I don't see any particular/strange behavior when the problem occurs.
There is probably something to tune, but I honestly don't know what to change for my specific configuration.
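When the next spike happens, my plan is to capture a snapshot of what Solr is doing at that moment; a rough sketch (localhost:8983 and the pgrep pattern are assumptions about my own box):

# Dump the state of all Solr threads to a file for later inspection
curl -s 'http://localhost:8983/solr/admin/info/threads?wt=json&indent=on' > solr-threads.json
# Show the hottest threads inside the Solr JVM (first matching PID)
top -H -b -n 1 -p "$(pgrep -f solr | head -n 1)" | head -n 20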
Can you help me?

Related

Ubuntu server CPU utilisation increasing very quickly after installing ELK

I installed Elasticsearch, Logstash and Kibana on the Ubuntu server. Before starting these services the CPU utilization was less than 5%; within a minute of starting them it crossed 85%. I don't know why this is happening. Can anyone help me with this issue?
Thanks in advance.
There is not enough information in your question to give a specific answer, but I will point out a few possible scenarios and how to deal with them.
Did you wait long enough? Sometimes there is a warm-up phase that consumes more CPU until all services are registered and finish booting. On a fairly small machine this can take longer and keep the CPU busy for a while.
Folder write permissions. If any of the ELK components fails because of restricted access to a directory it needs (for logging, for creating sub-folders for sinceDB files, and so on), it can end up retrying in an infinite loop, which consumes high CPU.
Connection issues. Elasticsearch should be the first component to start; if it fails, Kibana and Logstash will keep trying to connect to it over and over until the connection succeeds, which can cause high CPU.
Bad Logstash configuration. If Logstash fails to read a file named in its configuration, or the parsing is bad or excessive (for example, if the first "match" in your filter section is the least common pattern, most events will be tried against it in vain), it may consume high CPU. You can at least syntax-check the pipeline before starting it (see the sketch below).
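A quick way to rule out the last point before starting the service, assuming the standard RPM/DEB package paths:

# Parse and validate the pipeline configuration, then exit without running it
/usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/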
For further investigation:
I suggest you don't start all of them together. Start Elasticsearch first; if everything goes well, start Kibana, and lastly start Logstash (see the sketch after this list).
Check the logs of all the ELK components for error messages, failures, etc.
For a better answer I would need the YAML configuration of all three components (Elasticsearch, Kibana, Logstash).
I would also need the Logstash pipeline configuration file.
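A minimal sketch of that startup order, assuming the standard systemd unit names from the Elastic packages:

# Start Elasticsearch alone and wait until the cluster answers with at least yellow health
sudo systemctl start elasticsearch
curl -s 'http://localhost:9200/_cluster/health?wait_for_status=yellow&timeout=120s'
# Only then bring up the others, one at a time
sudo systemctl start kibana
sudo systemctl start logstash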
I would recommend analysing the CPU cycles consumed by each of the Elasticsearch, Logstash and Kibana processes.
Check specifically which of those processes is consuming the most memory/CPU, via the top command for example (a snapshot sketch follows below).
Start only Elasticsearch first and allow it to settle and the node to start completely before starting Kibana, and maybe Logstash after that.
Send me the logs for each and I can assist if there are any errors.
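For example, with plain procps tools (nothing ELK-specific assumed):

# One batch-mode snapshot of the heaviest processes, sorted by CPU
top -b -n 1 -o %CPU | head -n 20
# Or via ps, which shows CPU and memory side by side
ps -eo pid,user,%cpu,%mem,comm --sort=-%cpu | head -n 10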

Elasticsearch speed vs. Cloud (localhost to production)

I have got a single ELK stack with a single node running in a Vagrant virtual box on my machine. It has 3 indexes, which are 90MB, 3.6GB, and 38GB.
At the same time, I have a JavaScript application running on the host machine, consuming data from Elasticsearch, and it runs with no problem: the speed and everything is perfect (locally).
The issue comes when I put my JavaScript application into production, as the Elasticsearch endpoint in the application has to change from localhost:9200 to MyDomainName.com:9200. The application still runs fine within the company, but when I access it from home the speed decreases drastically and it often crashes. However, when I go to Kibana from home, running queries there is fine.
The company is using BT broadband with 60Mbps download and 20Mbps upload. It doesn't use a fixed IP, so I have to update the A record manually whenever the IP changes, but I don't think that is relevant to the problem.
Is the internet speed the main issue affecting the loading speed outside the company? How do I improve this? Is the cloud (a CDN?) the only option that would make things run faster? If so, how much would it cost to host it in the cloud, assuming I would index a lot of documents the first time but a daily maximum of about 10MB of indexing after that?
UPDATE1: Metrics from sending a request from home, using Chrome > Network
Queued at 32.77 s
Started at 32.77 s
Resource Scheduling
- Queueing: 0.37 ms
Connection Start
- Stalled: 38.32 s
- DNS Lookup: 0.22 ms
- Initial Connection
Request/Response
- Request sent: 48 μs
- Waiting (TTFB): 436.61 ms
- Content Download: 0.58 ms
UPDATE2:
The stalling period seems to be much shorter when I use a VPN.
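For comparison, the same phase breakdown can be captured without Chrome using curl (run it from both home and the office; MyDomainName.com is the endpoint from the question):

# Split one request into DNS / connect / first-byte / total, like the panel above
curl -o /dev/null -s -w 'dns=%{time_namelookup}s connect=%{time_connect}s ttfb=%{time_starttransfer}s total=%{time_total}s\n' 'http://MyDomainName.com:9200/_cat/indices?v'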

Apache Load Balancer 2.4 not decreasing BUSY

I have set up a simple load balancer using Apache 2.4 for 2 Tomcat servers. I have noticed that the BUSY column on the balancer-manager page never decreases; it keeps increasing until both members reach around 200, at which point performance becomes very sluggish.
I cannot find any documentation detailing the balancer-manager frontend, but I am guessing the BUSY column refers to the number of open connections to the balancer members. Is that right?
Does my Apache LB fail to close idle connections and keep opening new ones until it exhausts its resources?
Please guide me on this. I have to restart the Apache service every week in order to reset the BUSY column and make the LB run smoothly again.
Server running on Windows 2003 + Apache 2.4.4
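In the meantime, I have been counting the sockets on the box to see whether the connections behind BUSY are really still open at the OS level; a rough check from cmd (port 8080 is a placeholder for your actual Tomcat connector port):

rem Count established connections towards one Tomcat member (8080 is a placeholder)
netstat -ano | findstr ":8080" | find /c "ESTABLISHED"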

Elasticsearch load is not distributed evenly

I am facing a strange issue with Elasticsearch. I have 8 nodes with the same configuration (16GB RAM and an 8-core CPU each).
One node, "es53node6", always has a high load, as shown in the screenshot below. Also, 5-6 nodes were stopping automatically every 3-4 hours yesterday.
What could be the reason?
ES version: 5.3
There can be a fair share of reasons. Maybe all the data is stored on that node (which should not happen by default), or maybe you are sending all your requests to that single node (see the check below).
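Both are quick to check with the _cat APIs (any node's HTTP address will do; localhost:9200 is a placeholder):

# Shard count and disk used per node; a skewed row points at uneven allocation
curl -s 'http://localhost:9200/_cat/allocation?v'
# List the shards that sit on the busy node
curl -s 'http://localhost:9200/_cat/shards?v' | grep es53node6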
Also, there is no automatic stopping of Elasticsearch built in. You can configure Elasticsearch to stop the JVM process when an out-of-memory exception occurs, but this is not enabled by default, as it relies on a more recent JVM.
You can use the hot threads API to check where the CPU time is spent in Elasticsearch.
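For example (es53node6 is the node name from your screenshot):

# Hottest threads across the whole cluster
curl -s 'http://localhost:9200/_nodes/hot_threads?threads=5'
# Or only for the node that shows the high load
curl -s 'http://localhost:9200/_nodes/es53node6/hot_threads'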

Analytics server needing high application storage memory

Our application went live just a few months back. We have configured 2 mobile analytics servers with 8GB of RAM and 50GB of SAN space each. We have observed that the Analytics server is using a huge amount of SAN space: it is already 85% consumed on each server. Here are a few more details on how it is configured.
Number of Active Shards: 24
Number of Nodes: 2
Number of Data Nodes: 2
MFP version: 7.1.0.00.20160801-2314
I have also noticed that the document count is huge, almost 500K, and the memory it is taking is 28GB.
Is this expected, or is this some sort of configuration issue? Is there any way to clean up and release some memory?
Elasticsearch (on which MobileFirst Operational Analytics is built) is very memory-intensive, and its memory usage is a function of how much data you have stored. 500K documents is not very much in the grand scheme of things, but the amount of SAN space and memory they use depends on what is in the documents. You mention the server version (7.1) but not the iFix level, and it's difficult to guide you precisely without that information.
As a start, if you are collecting server logs in Operational Analytics, I'd recommend you stop doing that unless you truly need them there: in your application runtime, set the JNDI property "wl.analytics.logs.forward" to "false" (assuming you're using MobileFirst Platform Foundation 7.1).
Then, in the Analytics Dashboard, set the TTL value for "TTL_ServerLogs" to a very small value and check the box to apply the TTL to existing documents (to do this, you must be running a more recent iFix level, as older builds didn't include this checkbox). This should purge the existing server logs, which should free up some memory and SAN space.
While you're in that panel, you may also wish to set the other TTL values to something appropriate for your environment, if you have not done so already.
If you're running something older than 7.1, or your build doesn't have the checkbox to apply TTL values retroactively, the process for purging existing data is more complicated; in that case, please open a PMR and the support team can guide you.
If you can't purge data (e.g., if you have to keep collecting server logs, or must save old data for a long time), you should add nodes to your Elasticsearch cluster to distribute the load, so that the resource utilization of each node is lower.
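Before purging anything, it can also help to see which indices actually hold the documents and disk space; the analytics store is Elasticsearch underneath, so the usual _cat API applies (the host and port of the embedded Elasticsearch vary by installation, so localhost:9200 below is only a placeholder):

# Document count and on-disk size per index
curl -s 'http://localhost:9200/_cat/indices?v&h=index,docs.count,store.size'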
