CentOS 7 rsyslog DEBUG logs dropped for C/C++ modules - rsyslog

I am using rsyslog (rsyslog-7.4.7-7.el7_0.x86_64) on CentOS 7 (CentOS Linux release 7.1.1503 (Core)). We have several applications on this machine that use the syslog framework for logging, and they produce a lot of logs. At peak, it can be up to 50000 log messages in one second.
Our system previously ran on CentOS 6.2 (with rsyslog 5.8) and we never observed any drops. After some searching, we found that rate limiting is the cause: we get messages like "imjournal: begin to drop messages due to rate-limiting" in /var/log/messages, followed by "imjournal: 130886 messages lost due to rate-limiting". We tried the following ways to disable or tune the rate limiting, without success.
1) Changes in /etc/rsyslog.conf
$ModLoad imjournal # provides access to the systemd journal
$imjournalRatelimitInterval 1
$imjournalRatelimitBurst 50000
Some other relevant settings from rsyslog.conf are as follows; we didn't change anything here:
$OmitLocalLogging on
$IMJournalStateFile imjournal.state
We also saw that imuxsock has its own rate limiting, but our understanding is that it does not apply when OmitLocalLogging is on.
2) Changes in /etc/systemd/journald.conf
Storage=auto
RateLimitInterval=1s
RateLimitBurst=100000
Our application has modules written in Java (using SLF4J and LOG4J) and modules written in C/C++ (using the syslog() call). For the C/C++ modules, we are missing DEBUG logs most of the time, while the DEBUG logs from the Java modules always appear to be complete.
The systemd version is systemd-208-20.el7.x86_64. The application and rsyslogd run on the same machine.

With the latest update to systemd (219-19) on CentOS 7, the only way we were able to get our logging working again, without any rate limiting by journald or imuxsock, was with the config changes below. This also decreased, but didn't completely eliminate, the excessive CPU consumption by journald.
Add the following to /etc/rsyslog.conf after '$ModLoad imuxsock' and '$ModLoad imjournal':
$IMUXSockRateLimitInterval 0
$IMJournalRatelimitInterval 0
Set the following in /etc/systemd/journald.conf:
Storage=volatile
Compress=no
RateLimitInterval=0
MaxRetentionSec=5s
Restart journald and rsyslog to pick up the changes:
systemctl restart systemd-journald.service
systemctl restart rsyslog.service
Prior to this latest systemd update, you could comment out '$ModLoad imjournal' in /etc/rsyslog.conf to resolve this, but that no longer works.
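As a quick sanity check (not part of the original answer; the message count and tag are arbitrary), you can push a burst of test messages through the normal syslog path with logger and then confirm that no new rate-limiting complaints show up:
for i in $(seq 1 60000); do logger "ratelimit-test $i"; done   # burst of 60000 test messages
grep -c 'ratelimit-test' /var/log/messages                     # how many made it through
grep 'rate-limiting' /var/log/messages | tail -n 5             # any fresh drop complaints?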

Related

Storm cannot start use "supervisor.run.worker.as.user: true" option on Windows Server

Version: 1.2.2
Platform: Windows 10
When the storm.yaml config uses the "supervisor.run.worker.as.user: true" option, Storm fails to start on Windows 10, and nothing is written to the log file.
Running as other users is not supported on Windows.
https://github.com/apache/storm/blob/8a475696e908c53f1c06bf1a8f373d8ac0483427/storm-server/src/main/java/org/apache/storm/daemon/supervisor/RunAsUserContainer.java#L55
Someone would need to provide a Windows compatible variant of this class, as well as a couple of other bits in the code. The issue is tracked at https://issues.apache.org/jira/browse/STORM-371 if you'd like to work on it.

ElasticSearch service starts but can not be reached and does not do any logging

ElasticSearch 6.2.2 on an Ubuntu 16.04.3 Linux VM in Azure. It had been up and running fine, but after I rebooted the machine a few days ago I could not get the ElasticSearch service to start at all. That issue was shared and solved here (ElasticSearch Fails to Start on Ubuntu 16.04.3 - status=1 Failure) by increasing the heap size in the jvm.options file.
Now I have the ElasticSearch service running, but I cannot reach it at all. I have tried to reach it both from inside the VM (as localhost:9200) and from outside (similar to how I make calls to our other ES boxes, and do so successfully), but I'm told Could Not Get Any Response (Postman's error message).
The part that is making this impossible to diagnose is that nothing is getting written to the ElasticSearch logs! The last time anything was written to any log under /var/log/elasticsearch was before I rebooted the machine a couple of days ago.
I have checked the settings in elasticsearch.yml and everything seems to be in line with the elasticsearch.yml on a different box of ours, in a different location, which runs another ElasticSearch instance without any issue.
EDIT (per request): the elasticsearch.yml file from the box that is NOT working correctly is here: http://s000.tinyupload.com/index.php?file_id=72318548245343478927 For comparison, the elasticsearch.yml file from the box that IS working correctly is here: http://s000.tinyupload.com/index.php?file_id=20127693354114612595 Please note that the working box has 3 nodes whereas the broken one has only one node, so there will be some slight differences between the yml files because of this.
Check if path.logs: /var/log/elasticsearch is defined in elasticsearch.yml. Add this line if not present.
Check whether the user has permission to write to /var/log/elasticsearch. If not, change the permissions: sudo chmod 777 /var/log/elasticsearch/* and sudo chmod 777 /var/log/elasticsearch
Open /etc/init.d/elasticsearch and check whether ES_PATH_CONF is defined as ES_PATH_CONF="/etc/elasticsearch"
You may try commenting out the following lines in log4j2.properties under /etc/elasticsearch:
logger.xpack_security_audit_logfile.name = org.elasticsearch.xpack.security.audit.logfile.LoggingAuditTrail
logger.xpack_security_audit_logfile.level = info
logger.xpack_security_audit_logfile.appenderRef.audit_rolling.ref = audit_rolling
logger.xpack_security_audit_logfile.additivity = false
Use netstat -nultp | grep 9200 to check whether the port is being listened on.
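A minimal sketch of the checks above as shell commands (paths follow the defaults mentioned in this answer; adjust them if your install differs):
grep 'path.logs' /etc/elasticsearch/elasticsearch.yml   # is the log path set?
ls -ld /var/log/elasticsearch                            # owner and permissions of the log directory
grep ES_PATH_CONF /etc/init.d/elasticsearch              # which config directory the init script uses
netstat -nultp | grep 9200                               # is anything listening on 9200?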
The issue was with a line in the elasticsearch.yml file, which read:
"10.5.11.6""
That extra quotation mark at the end is what was causing the entire problem.
For anyone that this can benefit: the elasticsearch.yml file is extremely sensitive when it comes to whitespace, punctuation and case; even an extra space somewhere can cause the entire service to crash. Be very diligent with your edits to elasticsearch.yml.
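One way to catch this kind of typo before restarting the service is to parse the file first; a minimal sketch, assuming Python with PyYAML is available on the box (a stray trailing quote like the one above makes the parse fail):
python -c 'import yaml; yaml.safe_load(open("/etc/elasticsearch/elasticsearch.yml"))' && echo "elasticsearch.yml parses cleanly" || echo "elasticsearch.yml has a syntax error"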
There are a few ways to debug:
1. Check whether the ES service is running on that particular host via ps -ef | grep elastic
2. Look at which port ES is listening on (or not) via netstat
3. It might be the case that ES is running but is binding not to localhost but to the instance IP; elasticsearch.yml should give you the hint (see the sketch below)
4. Make sure your /usr/share/elasticsearch/elasticsearch.yml is the file that is being picked up and not the default at /etc/elasticsearch.yml
5. Configure the log location (path.logs) in elasticsearch.yml so that something actually gets written that you can check
Hope this helps.
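For point 3, a quick way to tell the two bindings apart (the IP lookup below is illustrative; use the address from your elasticsearch.yml if you already know it):
curl -s http://localhost:9200 || echo "nothing answering on localhost"
curl -s "http://$(hostname -I | awk '{print $1}'):9200" || echo "nothing answering on the instance IP"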

check_jvm Nagios plugin to monitor java threads and heap memory not running

I have a Nagios monitoring system to monitor servers. One of the servers has WebSphere Portal installed. I want to configure a setup in which, when there is high CPU usage or the thread count increases, a thread dump is taken automatically. For this I am using the Nagios plugin check_jvm. This plugin uses a jar called JvmInspector.jar, which is supposed to list the names of all the JVMs running on the system.
Usage of JvmInspector.jar is as follows:
java -jar JvmInspector.jar all
When I run this, it shows an empty result, although I have a WebSphere Portal server running. In spite of this, I tried to run the check_jvm plugin, but it shows the following result:
[root@dev03 libexec]# sudo -u root /usr/local/nagios/libexec/check_jvm -n WebSphere_Portal -p threads -w 105 -c 135
UNKNOWN Can't connect to the JVM:
Can anybody help me with this?
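As a hypothetical cross-check (not part of the original question), the JDK's own jps tool lists the JVMs visible to the current user; JVM attach mechanisms generally only see processes owned by the same user, so comparing the output as root and as the WebSphere owner can reveal a user/permission mismatch (wasadmin below is an assumed user name):
jps -l                      # JVMs visible to the current (root) user
sudo -u wasadmin jps -l     # JVMs visible to the user that owns the WebSphere Portal process (assumed name)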

cloudera host with bad health during install

I have tried again and again with all the required steps completed, but during cluster installation, when the selected parcels are being installed, every host always shows bad health and the setup never completes.
I am installing CM 5.5 on CentOS 6.7 using VirtualBox.
The Error
Host is in bad health cm.feuni.edu
Host is in bad health dn1.feuni.edu
Host is in bad health dn2.feuni.edu
Host is in bad health nn1.feuni.edu
Host is in bad health nn2.feuni.edu
Host is in bad health rm.feuni.edu
The above errors are shown at step 6, where the setup says
The selected parcels are being downloaded and installed on all the hosts in the cluster
In the previous step 5, all hosts completed successfully, with heartbeat checks at the end.
Memory distribution:
cm: 8 GB
all others: 1 GB
I could not find a proper answer anywhere else. What could be the reason for the bad health?
I don't know if it will help you...
In my case, after struggling with it for a few days,
I found the log files (at ),
which contained a message that there was a mismatch of the GUID,
so I uninstalled everything from both machines (using the script they give, /usr/share/cmf/uninstall-cloudera-manager.sh, then yum remove 'cloudera-manager-*', and deletion of every directory related to Cloudera I found...)
and then removed the guid file:
rm /var/lib/cloudera-scm-agent/cm_guid
Afterwards I re-installed everything, and that fixed the issue for me...
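Condensed into commands, the steps above look roughly like this (a sketch; it is destructive, so only run it on hosts you intend to wipe):
sudo /usr/share/cmf/uninstall-cloudera-manager.sh    # the uninstall script they provide (on the CM server host)
sudo yum remove 'cloudera-manager-*'                 # remove the remaining packages
sudo rm -f /var/lib/cloudera-scm-agent/cm_guid       # clear the mismatched GUID before re-installing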
I read online that there can be issues with the hostname and things like that, but I guess that if you get to this part of the installation, you have already fixed all the domain/FQDN/hostname/hosts issues.
It saddens me that there is no real manual/FAQ for this product.. :(
Good luck!
I faced the same problem. This is my solution:
First I edited config.ini
$ nano /etc/cloudera-scm-agent/config.ini
so that the hostname was the same as what the command $ hostname returned.
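For reference, the relevant entries in config.ini look roughly like this (the host names are just examples from this cluster; listening_hostname may be left out, in which case the agent uses the system hostname):
[General]
server_host=cm.feuni.edu            # the Cloudera Manager server
listening_hostname=dn1.feuni.edu    # should match what `hostname` returns on this agent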
Then I restarted the Cloudera agent and the server:
$ service cloudera-scm-agent restart
$ service cloudera-scm-server restart
Then in Cloudera Manager I deleted the cluster and added it again. The wizard continued to run normally.

Installing Membase from source

I am trying to build and install membase from source tarball. The steps I followed are:
Un-archive the tar membase-server_src-1.7.1.1.tar.gz
Issue make (from within the untarred folder)
Once done, I enter the directory install/bin and invoke the script membase-server.
This starts up the server with a message:
The maximum number of open files for the membase user is set too low.
It must be at least 10240. Normally this can be increased by adding
the following lines to /etc/security/limits.conf:
I tried updating limits.conf as suggested, but no luck: it continues to show the same message and continues booting.
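For reference, the entries the startup message refers to are usually of the form below (the exact lines are cut off in the message above, so treat these as an assumption; the user name must match the account membase runs as):
membase    soft    nofile    10240
membase    hard    nofile    10240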
Given that the server has started, I tried accessing memcached over port 11211, but I get a connection refused message. I then figured out (via netstat) that memcached is listening on 11210 and tried telnetting to port 11210; unfortunately the connection is closed as soon as I issue the following commands:
stats
set myvar 0 0 5
Note: I am not getting any output from the commands above (yes, stats did not show anything, but I still issued set).
Could somebody help me build and install membase from source? Also, why is memcached listening on 11210 instead of 11211?
It would be great if somebody could also give me a step-by-step guide which I can follow to build from source from the Git repository (I have not used autoconf before).
P.S.: I have tried installing from binaries (the Debian package) on the same machine and I am able to successfully install and telnet. Hence I am not sure why the build from source is not working.
You can increase the number of file descriptors on your machine by using the ulimit command. Try doing (you might need to use sudo as well):
ulimit -n 10240
I personally have this set in my .bashrc so that whenever I start my terminal it is always set for me.
Also, memcached listens on port 11210 by default for Membase. This is done because Moxi, the memcached proxy server, listens on port 11211. I'm also pretty sure that the memcached version used for Membase only speaks the binary protocol on that port, so you won't be able to telnet to 11210 and have commands work correctly. Telnetting to 11211 (moxi) should work though.
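A minimal sketch of that check against moxi (the key and value are arbitrary; assumes moxi is up on its default port). If moxi is healthy you should see STORED after the set and the value echoed back after the get:
telnet localhost 11211
set myvar 0 0 5
hello
get myvar
quit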
