SonarQube 5.1 too busy due to ElasticSearch - sonarqube

I have recently migrated from SonarQube 3.7.2 to SonarQube 5.1. The upgrade was successful and I was able to run analyses.
However, I now cannot reach the server, and from the log it seems Elasticsearch is slowly eating away my disk space.
I tried restarting the server and deleting the data/es directory, but nothing helped.
sonar.log is full of these lines:
...
2015.05.18 00:00:13 WARN es[o.e.c.r.a.decider] [sonar-1431686361188] high disk watermark [10%] exceeded on [Jbz_O0pFRKecav4NT3DWzQ][sonar-1431686361188] free: 5.6gb[3.8%], shards will be relocated away from this node
2015.05.18 00:00:13 INFO es[o.e.c.r.a.decider] [sonar-1431686361188] high disk watermark exceeded on one or more nodes, rerouting shards
...
There are just a few Java projects, but two of them are around a couple million lines of code (LOC).

Your server does not have enough available disk space to feed its internal Elasticsearch indices.
Note that an external volume can be used by setting the property sonar.path.data (see conf/sonar.properties).
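For example, a minimal sketch of what that looks like in conf/sonar.properties (the mount point below is only a placeholder for a volume with enough free space):
# conf/sonar.properties
# Move SonarQube's data (including the Elasticsearch indices) to a larger volume
sonar.path.data=/mnt/big_volume/sonarqube/data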

Related

Solr Slow on Kubernetes - Possible Solution

I am writing this question to share the solution we found at our company.
We migrated Solr from a Docker-only setup to a Kubernetes solution.
On Kubernetes, the environment suffered from slowness.
At least for me, the solution was atypical.
Environment:
Solr (8.2.0) with just one node
Solr database with 250 GB on disk
Kubernetes over Rancher
Node with 24 vCPUs and 32 GB of RAM
Node hosts Solr and the nginx ingress
Reserved 30 GB for the Solr pod in Kubernetes
Reserved 25 GB for the Solr JVM
Expected Load:
350 updates/min (pdf documents and html documents)
50 selects/min
The result was Solr degrading over time, with high load on the host. The culprit was heavy disk access.
After one week of frustrating adjustments, this is the simple solution we found:
The Solr JVM had 25 GB of heap. We decreased the value to 10 GB.
This is the command to start solr with the new values:
/opt/solr/bin/solr start -f -force -a '-Xms10g -Xmx10g' -p 8983
If someone can explain what happened that would be great.
My guess is that Solr was trying to build its page cache and Kubernetes was reclaiming that cache, so Solr ended up continuously reading from disk trying to rebuild it.
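If that guess is right, the fix amounts to keeping the JVM heap well below the container memory limit so the OS page cache has room for the index files. A minimal sketch of such a container spec, assuming a plain Kubernetes Deployment and the SOLR_HEAP variable honored by the Solr start scripts (names, image tag, and values are illustrative, not from our setup):
# Container fragment: a 10g heap inside a 30Gi limit leaves roughly 20 GB
# for the OS page cache that Solr's index reads depend on.
containers:
  - name: solr
    image: solr:8.2.0
    env:
      - name: SOLR_HEAP   # picked up by bin/solr instead of explicit -Xms/-Xmx flags
        value: "10g"
    resources:
      requests:
        memory: "30Gi"
      limits:
        memory: "30Gi"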

How to perform GreenPlum 6.x Backup & Recovery

I am using Greenplum 6.x and facing issues while performing backup and recovery. Do we have any tool to take a physical backup of the whole cluster, like pgBackRest for Postgres? Further, how can we purge the WAL of the master and each segment, since we can't take a pg_basebackup of the whole cluster?
Are you using open source Greenplum 6 or a paid version? If paid, you can download the gpbackup/gprestore parallel backup utilities (separate from the database software itself), which will back up the whole cluster with a wide variety of options. If using open source, your options are pretty much limited to pg_dump/pg_dumpall.
There is no way to purge the WAL logs that I am aware of. In Greenplum 6, the WAL logs are used to keep all the individual postgres engines in sync throughout the cluster. You would not want to purge these individually.
Jim McCann
VMware Tanzu Data Engineer
I would like to better understand the issues you are facing when performing your backup and recovery.
For open source users of the Greenplum Database, the gpbackup/gprestore utilities can be downloaded from the Releases page on the GitHub repo:
https://github.com/greenplum-db/gpbackup/releases
v1.19.0 is the latest.
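For illustration, a minimal gpbackup/gprestore round trip might look like this (database name, backup directory, and timestamp are placeholders):
# Run on the master host; backs up the whole cluster in parallel
gpbackup --dbname mydb --backup-dir /data/backups
# Restore using the timestamp reported by gpbackup
gprestore --timestamp 20210101120000 --backup-dir /data/backups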
There currently isn't a pg_basebackup / WAL-based backup/restore solution for Greenplum Database 6.x.
WAL logs are periodically purged (as they get replicated to the mirror and flushed) from the master and segments individually, so no manual purging is required. Have you looked into why the WAL logs are not getting purged? One of the reasons could be that a mirror in the cluster is down. If that happens, WAL will keep accumulating on the primary and won't get purged. Run select * from pg_replication_slots; on the master or segment for which WAL is building up to learn more.
If the cause of the WAL build-up is a replication slot held open because a mirror is down, you can use the GUC max_slot_wal_keep_size to configure the maximum size the WALs should consume; once that limit is reached, the replication slot will be disabled and will not consume more disk space for WAL.
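A rough sketch of that diagnosis and mitigation (the retention cap is an arbitrary example, and availability of max_slot_wal_keep_size depends on your Greenplum build):
-- On the master or the affected segment: an inactive slot with an old
-- restart_lsn usually means WAL is being retained for a down mirror
SELECT slot_name, active, restart_lsn FROM pg_replication_slots;
If you then decide to cap the retention:
# postgresql.conf on the affected instance (requires a configuration reload)
max_slot_wal_keep_size = 10GB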

Disable ElasticSearch logs in Liferay 7

I've started using Liferay v7, and am getting a lot of the following log messages:
17:14:12,265 WARN [elasticsearch[Mirage][management][T#1]][decider:157] [Mirage] high disk watermark [90%] exceeded on [fph02E6ISIWnZ5cxWw_mow][Mirage][/Users/randy/FasterPayments/src/eclipse/com.rps.portal/com.rps.portal.backoffice/bundles/data/elasticsearch/indices/LiferayElasticsearchCluster/nodes/0] free: 46gb[9.9%], shards will be relocated away from this node
To be honest, I'd rather not spend time learning about Elasticsearch right now. Is it possible to simply disable Elasticsearch within a Liferay 7 dev environment, or take some other action to remove these log messages?
Go to Control Panel / Configuration / System Settings / Foundation / Elasticsearch.
Under "Additional Configurations" enter
cluster.routing.allocation.disk.threshold_enabled: True
cluster.routing.allocation.disk.watermark.low: 30gb
cluster.routing.allocation.disk.watermark.high: 20gb
or whatever values are appropriate for your system (there is value in being warned that the disk is almost full).
Save & Restart (the values seem not to be picked up at runtime).
Liferay needs an index/search engine, such as Elasticsearch or Solr. It defaults to Elasticsearch in DXP. It makes no sense to disable it.
The warnings tell you that you've reached your configured disk watermark. You can change this setting in your elasticsearch.yml (cluster.routing.allocation.disk.watermark.high).
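If you run a standalone Elasticsearch rather than the embedded one, a sketch of the equivalent settings in elasticsearch.yml could look like this (the thresholds mirror the values above and are only examples):
# elasticsearch.yml
cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.low: 30gb
cluster.routing.allocation.disk.watermark.high: 20gb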
If the logs annoy you, you can change your logging settings. Not sure if it's still valid in DXP, but have a look at https://dev.liferay.com/es/discover/deployment/-/knowledge_base/6-2/liferays-logging-system.
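If you only want to silence these particular warnings, one option, assuming Liferay's log4j-based logging (e.g. a portal-log4j-ext.xml override, or Control Panel > Configuration > Server Administration > Log Levels), is to raise the level of the allocation decider logger seen in the output above:
<!-- Example category for a portal-log4j-ext.xml override; the logger name
     corresponds to the [decider] entries in the log -->
<category name="org.elasticsearch.cluster.routing.allocation.decider">
    <priority value="ERROR" />
</category>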

ambari metrics diskspacecheck issue

I am installing a new Hadoop cluster (5 nodes in total) using the Ambari dashboard. While deploying the cluster it fails, with warnings about disk space and error messages like "'/' needs at least 2GB of disk space for mount". But I have allocated a total of 50GB of disk to each node. Upon googling for a solution I found that I need to set diskspacecheck=0 in the /etc/yum.conf file, as suggested in the link below (point 3.6):
http://docs.hortonworks.com/HDPDocuments/Ambari-2.1.0.0/bk_ambari_troubleshooting/content/_resolving_ambari_installer_problems.html
But I am using an Ubuntu image on the nodes and there is no yum.conf file, and I didn't find any file with a "diskspacecheck" parameter. Can anybody tell me how to solve this issue and successfully deploy my cluster?

Graylog2/Passenger 3.0.21; writev() "/tmp/passenger-standalone.3012/proxy_temp/3/00/0000000003" has written only 4096 of 8192 while reading upstream

I have a Graylog2 install (0.11.0), served by Passenger running in standalone mode (3.0.21). It's backed by multiple Elasticsearch servers plus MongoDB.
About a week ago, it was running Passenger 3.0.18 and this error started to show up in the Graylog server logs when you tried to load messages:
2013/09/13 13:47:32 [crit] 27720#0: *1451 writev() "/tmp/passenger-standalone.27619/proxy_temp/6/00/0000000006" failed (28: No space left on device) while reading upstream
I checked /tmp/, and it was at 8% utilization. Meanwhile, on the front end, when you tried to load the Messages page in Graylog, the page would load fine except for the actual messages. I tried upgrading Passenger to 3.0.21; the behavior stayed the same but the error changed:
2013/09/17 10:16:53 [crit] 3113#0: *10 writev() "/tmp/passenger-standalone.3012/proxy_temp/3/00/0000000003" has written only 4096 of 8192 while reading upstream
Next I checked our ES machines. They were running with high CPU load, so I reduced the maximum number of indexes they were keeping for Graylog, which brought them right back down... but still no change in behavior.
My best guess is that this error is some sort of timeout, but I can't find any other thread where anyone has gotten this error, and I don't see why a timeout should be happening now that the ES machines are within range again. All other Graylog web pages work fine, as do Streams.
I ended up doing a few things to resolve this issue.
Changed Graylog's processor_wait_strategy to 'blocking'. This greatly reduced the amount of CPU the graylog-server app was using.
Cut the amount of data Elasticsearch was storing by reducing elasticsearch_max_number_of_indices for Graylog (both settings are sketched below).
And the thing that helped the most: stop the Graylog server and delete the graylog2_recent Elasticsearch index, then restart the Graylog server and it will re-create it. Once I did this, the CPU load on the Elasticsearch servers dropped drastically, and Messages and searches began to work again. Once the index re-filled, it continued to work correctly.
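For reference, a rough sketch of where those settings live, with illustrative values for a Graylog2 0.1x-era graylog2.conf (adjust the index count to your retention needs):
# graylog2.conf (example values)
processor_wait_strategy = blocking
elasticsearch_max_number_of_indices = 20
With graylog-server stopped, the recent index can be dropped through the Elasticsearch HTTP API (host and port are placeholders):
curl -XDELETE 'http://localhost:9200/graylog2_recent'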
Hopefully this helps some other poor individual Googling this error.
