Monit requires a manual restart in order to apply the max open files value to a process it starts. Bug?

I've been trying to figure this out for quite some time, and I haven't been able to find any information on this issue.
Diving into the issue:
I am running an application on Ubuntu 14.04 using Monit V5.6
The deployment of the application and Monit is done using Chef scripts with AWS OpsWorks, which works excellently.
The problem is that once done, Monit starts the application using the following syntax:
start program = "/bin/sh -c 'ulimit -n 150000; <some more commands here which are not interesting>'" as uid <user> and gid <user_group>
This indeed starts the application as the correct user, but max open files for the process shows 4096 instead of the number set in limits.conf.
Just to be clear, I have set the following in /etc/security/limits.conf
root hard nofile 150000
root soft nofile 150000
* hard nofile 150000
* soft nofile 150000
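For reference, this is how I check what a running process actually got (a standard Linux /proc check; myapp is a placeholder for the real process name):
# myapp is a placeholder; pgrep may return several PIDs, pick the right one
grep "Max open files" /proc/$(pgrep -f myapp | head -1)/limits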
Furthermore, if I stop the application, then do a service monit restart, and then start the application, the max open files value is applied correctly and I see 150000.
If I then redeploy the application without rebooting the instance, the problem reappears and I have to restart Monit manually again.
Also if I run the application using the following syntax in order to mimic Monit:
sudo -H -u <user> /bin/sh -c 'ulimit -n 150000; <more commands here>'
Then everything is working and the process is receiving the correct value of max open files.
If I try to script this manual service monit restart, together with stopping and starting the application, via Chef scripts, it also fails and I get 4096 as the max open files value. Thus my only option is to do this manually on each deploy, which is not very convenient.
Any help on this or thoughts would be greatly appreciated.
Thanks!
P.S. I also reviewed the following articles:
https://serverfault.com/questions/797650/centos-monit-ulimit-not-working
https://lists.nongnu.org/archive/html/monit-general/2010-04/msg00018.html
but since manually restarting Monit makes this work, I am looking for a solution that does not involve changing init scripts.
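For reference, the manual sequence I am trying to automate looks roughly like this (a sketch; myapp is a placeholder for the Monit service name):
sudo monit stop myapp          # myapp is a placeholder service name
sudo service monit restart
sleep 5                        # give Monit time to come back up
sudo monit start myapp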

Related

NagiosXI docker container: return code of 13 is out of bounds

I continually receive the error in the title (see picture).
[image: Nagios error]
However, I have given my sh script all permissions (chmod 777, with nagios as owner). My script also works fine in a Nagios Core container, but with a Nagios XI docker container it fails.
Here are the permissions on my script, for proof:
[image: script permissions]
The command also works in the UI if I call it manually in the service management section of Nagios.
The command also works when running the script as the nagios user:
[image: nagios user running the script]
Docker container I am using: https://hub.docker.com/r/mavenquist/nagios-xi
I've tried using this post's solutions: Nagios: return code of 13 is out of bounds
It's not possible to answer your question completely with the information provided, but here are some pointers:
Never set 777 permissions. In your case the owner of the script is already nagios:nagios, so a more reasonable permission would be 550, i.e. allow the nagios user and group to read and execute the file, but not modify it (why would it need to?).
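For example (a sketch; the path assumes the default Nagios plugin directory, adjust it to wherever 1.sh actually lives):
# chown nagios:nagios /usr/local/nagios/libexec/1.sh
# chmod 550 /usr/local/nagios/libexec/1.sh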
The error you're getting (return code 13) means that 1.sh is, for some reason, returning 13. Why is impossible to know without inspecting the script, but you can run the plugin as nagios and inspect the output; hopefully the script is well written enough to tell you what the error is:
# su -c "/your/plugin -exactly -as -configured" nagios
A general rule for troubleshooting Nagios is that whatever you see in the GUI is exactly what happens when you run the script manually as the nagios user, so that is a good way to figure out what is going on.
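For example, to see the same exit code the GUI reports (again, the path to 1.sh is an assumption):
# su -c "/usr/local/nagios/libexec/1.sh" nagios; echo "exit code: $?"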

Where to find sshd logs on MacOS sierra

I want to set up a pseudo-distributed HBase environment on my Mac OS Sierra (10.12.4) machine, which requires ssh to be installed and passwordless login via ssh localhost to work. But sometimes I run into errors when I use ssh to log in. That is the background; the actual question is: where can I find the debug logs of sshd, so I can learn why login fails?
As far as I know, macOS already has sshd installed and uses launchd to manage it, and I know one way to output debug logs, by running sshd -E /var/log/sshd.log. But when I reviewed the /etc/ssh/sshd_config configuration, I found these two lines:
#SyslogFacility AUTH
#LogLevel INFO
I guessed these two lines configure debug logging, so I removed the # before them, set LogLevel to DEBUG3, and restarted sshd:
$ launchctl unload -w /System/Library/LaunchDaemons/ssh.plist
$ launchctl load -w /System/Library/LaunchDaemons/ssh.plist
And then I set the log path in /etc/syslog.conf:
auth.*<tab>/var/log/sshd.log
(<tab> means a tab character here) and reloaded the config:
$ killall -HUP syslogd
But no sshd.log file appears in the /var/log folder when I run ssh localhost. I also tried configuring /etc/asl.conf:
> /var/log/sshd.log format=raw
? [= Facility auth] file sshd.log
The result was the same. Can someone help me?
Apple, as usual, decided to re-invent the wheel.
In super-user window
# log config --mode "level:debug" --subsystem com.openssh.sshd
# log stream --level debug 2>&1 | tee /tmp/logs.out
In another window
$ ssh localhost
$ exit
Back in Super-user window
^C (interrupt)
# grep sshd /tmp/logs.out
2019-01-11 08:53:38.991639-0500 0x17faa85 Debug 0x0 37284 sshd: (libsystem_network.dylib) sa_dst_compare_internal <private>#0 < <private>#0
2019-01-11 08:53:38.992451-0500 0xb47b5b Debug 0x0 57066 socketfilterfw: (Security) [com.apple.securityd:unixio] open(/usr/sbin/sshd,0x0,0x1b6) = 12
...
...
In super-user window, restore default sshd logging
# log config --mode "level:default" --subsystem com.openssh.sshd
You can find it in /var/log/system.log. Better if you filter by "sshd":
cat /var/log/system.log | grep sshd
Try this
cp /System/Library/LaunchDaemons/ssh.plist /Library/LaunchDaemons/ssh.plist
Then
vi /Library/LaunchDaemons/ssh.plist
And add your -E as shown below
<array>
    <string>/usr/sbin/sshd</string>
    <string>-i</string>
    <string>-E</string>
    <string>/var/log/system.log</string>
</array>
And lastly, restart sshd. Now you will see sshd logs in /var/log/system.log:
launchctl unload /System/Library/LaunchDaemons/ssh.plist && launchctl load -w /Library/LaunchDaemons/ssh.plist
I also had an ssh issue that I wanted to debug further and was not able to figure out how to get the sshd debug logs to appear in any of the usual places. I resorted to editing the /System/Library/LaunchDaemons/ssh.plist file to add a -E <log file location> parameter (/tmp/sshd.log, for example). I also edited /etc/ssh/sshd_config to change the LogLevel. With these changes, I was able to view the more verbose logs in the specified log file.
I don't have much experience with MacOS so I'm sure there is a more correct way to configure this, but for lack of a better approach this got the logs I was looking for.
According to Apple's developer website, logging behavior has changed in macOS 10.12 and up:
Important:
Unified logging is available in iOS 10.0 and later, macOS 10.12 and later, tvOS 10.0 and later, and watchOS 3.0 and later, and supersedes ASL (Apple System Logger) and the Syslog APIs. Historically, log messages were written to specific locations on disk, such as /etc/system.log. The unified logging system stores messages in memory and in a data store, rather than writing to text-based log files.
Unfortunately, unless someone comes up with a pretty clever way to extract the log entries from memory or this mysterious "data store", I think we're SOL :/
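For what it's worth, the unified log's data store can be queried after the fact with the log command; a minimal sketch (the predicate syntax is standard, the one-hour window is just an example):
$ log show --last 1h --predicate 'process == "sshd"' --info --debug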
There is some sshd log in
/var/log/system.log
for example
Apr 26 19:00:11 mac-de-mamie com.apple.xpc.launchd[1] (com.openssh.sshd.7AAF2A76-3475-4D2A-9EEC-B9624143F2C2[535]): Service exited with abnormal code: 1
Not very instructive, and I doubt more can be obtained. LogLevel VERBOSE and LogLevel DEBUG3 in sshd_config do not help.
According to man sshd_config :
"Logging with a DEBUG level violates the privacy of users and is not recommended."
By the way, I relaunched sshd not with launchctl but via System Preferences > Sharing, ticking Remote Login.
There I noticed the option: Allow access for ...
I suspect these settings live OUTSIDE /etc/ssh/sshd_config
(easy to check, but I have no time).
Beware that macOS is not a typical Unix: Apple developers can do many strange things behind the scenes without any care for us command-line users.

AWS Launch Configuration not picking up user data

We are trying to build an autoscaling group (let's say AS) configured with an elastic load balancer (let's say ELB) in AWS. The autoscaling group itself is configured with a launch configuration (let's say LC). As far as I can tell from the AWS documentation, pasting a script as-is into the user data section of the launch configuration should run that script on every instance launched into an auto scaling group associated with that launch configuration.
For example, pasting this into user data should leave a file named configure in the home folder of a t2.micro Ubuntu instance:
#!/bin/bash
cd
touch configure
Our end goal is:
Increase the number of instances in the auto scaling group, have them launch with our startup script, and have each new instance added behind the load balancer associated with the auto scaling group. But the script was not executed at instance launch. My questions are:
1. Am I missing something here?
2. What should I do to run our startup script when launching any new instance in an auto scaling group?
3. Is there any way to verify whether the user data was really picked up at launch?
The direction you are following is right. What is wrong is your user data script.
Problem 1:
What you have to remember is that user data is executed as the root user, not ubuntu. So if your script had worked, you would find your file at /root/configure, NOT at /home/ubuntu/configure.
Problem 2:
Your script is actually executing, but it is incorrect and fails at the cd command, so the file is never created.
The cd builtin without a directory argument tries to do cd $HOME; however, $HOME is NOT SET during the cloud-init run, so you have to be explicit here.
Change your script to below and it will work:
#!/bin/bash
cd /root
touch configure
You can also debug issues with your user-data script by inspecting /var/log/cloud-init.log log file, in particular checking for errors in it: grep -i error /var/log/cloud-init.log
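To address question 3, a sketch of two quick checks (the 169.254.169.254 metadata endpoint is standard on EC2):
# print the user data exactly as the instance received it
curl http://169.254.169.254/latest/user-data
# confirm cloud-init ran it and look for errors
grep -i error /var/log/cloud-init.log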
Hope it helps!

Setting up Redis on Webfaction

What are the steps required to set up Redis database on Webfaction shared hosting account?
Introduction
Because of the special environment restrictions of Webfaction servers, the installation instructions are not as straightforward as they otherwise would be. Nevertheless, at the end you will have a fully functioning Redis server that stays up even after a reboot. I personally installed Redis by the following procedure about half a year ago and it has been running flawlessly since. A little word of warning though: half a year is not a long time, especially because the server has not been under heavy use.
The instructions consist of five parts: Installation, Testing, Starting the Server, Managing the Server and Keeping the Server Running.
Installation
Login to your Webfaction shell
ssh foouser@foouser.webfactional.com
Download latest Redis from Redis download site.
> mkdir -p ~/src/
> cd ~/src/
> wget http://download.redis.io/releases/redis-2.6.16.tar.gz
> tar -xzf redis-2.6.16.tar.gz
> cd redis-2.6.16/
Before running make, check whether your server's Linux is 32- or 64-bit. The installation script does not handle 32-bit environments well, at least on Webfaction's CentOS 5 machines. The command to check is uname -m. If Linux is 32-bit the result will be i686; if 64-bit, x86_64. See this answer for details.
> uname -m
i686
If your server is 64-bit (x86_64), then simply run make.
> make
But if your server is 32-bit (i686), then you must do a little extra work. There is a command make 32bit, but it produces an error. Edit a line in the Makefile to make make 32bit work.
> nano ~/src/redis-2.6.16/src/Makefile
Change line 214 from this
$(MAKE) CFLAGS="-m32" LDFLAGS="-m32"
to this
$(MAKE) CFLAGS="-m32 -march=i686" LDFLAGS="-m32 -march=i686"
and save. Then run make with the 32bit flag.
> cd ~/src/redis-2.6.16/ ## Note the dir, no trailing src/
> make 32bit
The executables were created in the directory ~/src/redis-2.6.16/src/. They include redis-cli, redis-server, redis-benchmark and redis-sentinel.
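As a quick sanity check that the build produced working binaries (the exact output varies by version):
> ~/src/redis-2.6.16/src/redis-server --version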
Testing (optional)
As the output of the installation suggests, it would be nice to ensure that everything works as expected by running tests.
Hint: To run 'make test' is a good idea ;)
Unfortunately the testing requires Tcl 8.6.0 to be installed, which is not there by default, at least on the machine web223. So you must install it first, from source. See the Tcl/Tk installation notes and compiling notes.
> cd ~/src/
> wget http://prdownloads.sourceforge.net/tcl/tcl8.6.0-src.tar.gz
> tar -xzf tcl8.6.0-src.tar.gz
> cd tcl8.6.0-src/unix/
> ./configure --prefix=$HOME
> make
> make test # Optional, see notes below
> make install
Testing Tcl with make test will take time and will also fail due to WebFaction's environment restrictions. I suggest you skip this.
Now that we have Tcl installed, we can run the Redis tests. The tests take a long time and temporarily use a fairly large amount of memory.
> cd ~/src/redis-2.6.16/
> make test
After the tests you are ready to continue.
Starting the Server
First, create a custom application via Webfaction Control Panel (Custom app (listening on port)). Name it for example fooredis. Note that you do not have to create a domain or website for the app if Redis is used only locally i.e. from the same host.
Second, make a note of the socket port number that was given to the app. Let the example be 23015.
Copy the previously compiled executables to the app's directory. You may choose to copy all or only the ones you need.
> cd ~/webapps/fooredis/
> cp ~/src/redis-2.6.16/src/redis-server .
> cp ~/src/redis-2.6.16/src/redis-cli .
Copy also the sample configuration file. You will soon modify that.
> cp ~/src/redis-2.6.16/redis.conf .
Now Redis is already runnable. There are a couple of problems, though. First, the default Redis port 6379 might already be in use. Second, even if the port were free, you could start the server, but it would stop running the moment you exit the shell. Both are fixed by editing redis.conf: change the port for the first, and run Redis as a daemon for the second.
Redis is able to run itself in daemon mode. For that you need to set up a place where the daemon stores its process id (PID). Usually pidfiles are stored in /var/run/, but because of the environment restrictions you must pick a place for it in your home directory. For a reason explained later, in the part Managing the Server, a good choice is to put the pidfile in the same directory as the executables. You do not have to create the file yourself; Redis creates it for you automatically.
Now open the redis.conf for editing.
> cd ~/webapps/fooredis/
> nano redis.conf
Change the configurations in the following manner.
daemonize no -> daemonize yes
pidfile /var/run/redis.pid -> pidfile /home/foouser/webapps/fooredis/redis.pid
port 6379 -> port 23015
Now, finally, start the Redis server. Specify the conf file so Redis listens on the right port and runs as a daemon.
> cd ~/webapps/fooredis/
> ./redis-server redis.conf
>
See it running.
> cd ~/webapps/fooredis/
> ./redis-cli -p 23015
redis 127.0.0.1:23015> SET myfeeling Phew.
OK
redis 127.0.0.1:23015> GET myfeeling
"Phew."
redis 127.0.0.1:23015> (ctrl-d)
>
Stop the server if you want to.
> ps -u $USER -o pid,command | grep redis
718 grep redis
10735 ./redis-server redis.conf
> kill 10735
or
> cat redis.pid | xargs kill
Managing the Server
For ease of use, and as preparatory work for the next part, make a script that helps to open the client and to start, restart and stop the server. An easy solution is to write a makefile. When writing a makefile, remember to use tabs instead of spaces.
> cd ~/webapps/fooredis/
> nano Makefile
# Redis Makefile
client cli:
	./redis-cli -p 23015
start restart:
	./redis-server redis.conf
stop:
	cat redis.pid | xargs kill
The rules are quite self-explanatory. What is special about the second rule is that, in daemon mode, calling ./redis-server does not create a new process if one is already running.
The third rule holds some quiet wisdom. If redis.pid were not stored under the fooredis directory but, for example, in /var/run/redis.pid, it would not be so easy to stop the server. This is especially true if you run multiple Redis instances concurrently.
To execute a rule:
> make start
Keeping the Server Running
You now have an instance of Redis running in daemon mode which allows you to quit the shell without stopping it. This is still not enough. What if the process crashes? What if the server machine is rebooted? To cover these you have to create two cronjobs.
> export EDITOR=nano
> crontab -e
Add the following two lines and save.
*/5 * * * * make -C ~/webapps/fooredis/ -f ~/webapps/fooredis/Makefile start
@reboot make -C ~/webapps/fooredis/ -f ~/webapps/fooredis/Makefile start
The first one ensures every five minutes that fooredis is running. As said above, this does not start a new process if one is already running. The second one starts fooredis immediately after the server machine reboots, long before the first rule kicks in.
More delicate methods could be used for this, for example forever. See also this Webfaction Community thread for more about the topic.
Conclusion
Now you have it. Lots of things done, but maybe more will come. Things you may like to do in the future, which were not covered here, include the following.
Setting a password, preventing other users from flushing your databases (see redis.conf and the sketch below).
Limiting the memory usage (see redis.conf).
Logging the usage and errors (see redis.conf).
Backing up the data once in a while.
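A sketch of what the first three items could look like in redis.conf (the directive names are real Redis settings; the values are only examples):
requirepass choose-a-long-random-password
maxmemory 64mb
maxmemory-policy allkeys-lru
logfile /home/foouser/webapps/fooredis/redis.log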
Any ideas, comments or corrections?
To summarize Akseli's excellent answer:
assume your user is named "magic_r_user"
cd ~
wget "http://download.redis.io/releases/redis-3.0.0.tar.gz"
tar -xzf redis-3.0.0.tar.gz
mv redis-3.0.0 redis
cd redis
make
make test
create a custom app "listening on port" through the Webfaction management website
assume we named it magic_r_app
assume it was assigned port 18932
cp ~/redis/redis.conf ~/webapps/magic_r_app/
vi ~/webapps/magic_r_app/redis.conf
daemonize yes
pidfile /home/magic_r_user/webapps/magic_r_app/redis.pid
port 18932
test it
~/redis/src/redis-server ~/webapps/magic_r_app/redis.conf
~/redis/src/redis-cli -p 18932
ctrl-d
cat ~/webapps/magic_r_app/redis.pid | xargs kill
crontab -e
*/1 * * * * /home/magic_r_user/redis/src/redis-server /home/magic_r_user/webapps/magic_r_app/redis.conf &>> /home/magic_r_user/logs/user/cron.log
don't forget to set a password!
FYI, if you are installing redis 2.8.8+ you may get an error, undefined reference to __sync_add_and_fetch_4 when compiling. See http://www.eschrade.com/page/undefined-reference-to-__sync_add_and_fetch_4/ for information.
I've pasted the relevant portion from that page below in case the page ever goes offline. Essentially you need to export the CFLAGS variable and restart the build process.
[root@devvm1 redis-2.6.7]# export CFLAGS=-march=i686
[root@devvm1 redis-2.6.7]# make distclean
[root@devvm1 redis-2.6.7]# make

How to increase ulimit on Amazon EC2 instance?

After SSH'ing into an EC2 instance running the Amazon Linux AMI, I tried:
ulimit -n 20000
...and got the following error:
-bash: ulimit: open files: cannot modify limit: Operation not permitted
However, the shell allows me to decrease this number, for the current session only.
Is there any way to increase the ulimit on an EC2 instance (permanently)?
In fact, changing values through the ulimit command only applies to the current shell session. If you want to permanently set a new limit, you must edit the /etc/security/limits.conf file and set your hard and soft limits. Here's an example:
# <domain> <type> <item> <value>
* soft nofile 20000
* hard nofile 20000
Save the file, log out, log in again, and test the configuration with the ulimit -n command. Hope it helps.
P.S. 1: Keep the following in mind:
Soft limit: the value that the kernel enforces for the corresponding resource.
Hard limit: acts as a ceiling for the soft limit.
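A quick way to inspect both values from a shell (an unprivileged user can raise the soft limit only up to the hard limit):
ulimit -Sn           # current soft limit for open files
ulimit -Hn           # current hard limit, the ceiling
ulimit -n 20000      # succeeds without root once the hard limit is at least 20000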
P.S. 2: Additional files in /etc/security/limits.d/ might affect what is configured in limits.conf.
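For example, to check whether such override files exist and what they set (on some distributions a file like 90-nproc.conf caps nproc for all users):
ls /etc/security/limits.d/
cat /etc/security/limits.d/*.conf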
Thank you for the answer. For me, just updating /etc/security/limits.conf wasn't enough. Only the 'open files' limit (ulimit -n) was getting updated; nproc was not. After updating a file under /etc/security/limits.d/, nproc (ulimit -u) also got updated.
Steps:
sudo vi /etc/security/limits.d/whateverfile
Update the limits set for nproc/nofile
sudo vi /etc/security/limits.conf
* soft nproc 65535
* hard nproc 65535
* soft nofile 65535
* hard nofile 65535
Reboot the machine: sudo reboot
P.S. I was not able to add it as a comment, so had to post as an answer.
I don't have enough rep points to comment...sorry for the fresh reply, but maybe this will keep someone from wasting an hour.
Viccari's answer finally solved this headache for me. Every other source tells you to edit the limits.conf file, and if that doesn't work, to add
session required pam_limits.so
to the /etc/pam.d/common-session file
DO NOT DO THIS!
I'm running an Ubuntu 18.04.5 EC2 instance, and this locked me out of SSH entirely. I could log in, but as soon as it was about to drop me into a prompt, it dropped my connection (I even saw all the welcome messages and stuff). Verbose showed this as the last error:
fd 1 is not O_NONBLOCK
and I couldn't find an answer to what that meant. So, after shutting down the instance, waiting about an hour to snapshot the volume, and then mounting it to another running instance, I removed the edit to the common-session file and bam, SSH login worked again.
The fix that worked for me was looking for files in the /etc/security/limits.d/ folder, and editing those.
(and no, I did not need to reboot to get the new limits, just log out and back in)
