LXD Issue: vm.max_map_count with Elasticsearch

Ok so to start, all the things I've tried so far:
Set vm.max_map_count in:
The host in /etc/sysctl.conf
The host in /etc/sysctl.d/99-sysctl.conf
The LXD Container in /etc/sysctl.conf
The LXD container in /etc/sysctl.d/99-sysctl.conf
According to the official LXD production settings, this setting is possible with LXD:
source: https://linuxcontainers.org/lxd/docs/master/production-setup
According to multiple resources online, this is the approved fix to remediate the error, because the default setting is 65530.
I've checked the host, it says this:
cmd: sysctl vm.max_map_count
output: vm.max_map_count = 262144
I've checked the lxd container, it says this:
cmd: sysctl vm.max_map_count
output: vm.max_map_count = 65530
I also verified the configuration file again in the LXD container at /etc/sysctl.conf, and it shows the setting as vm.max_map_count=262144.
I've rebooted the container, I've stopped and restarted it, and I've even built a new test container. All of them keep saying 65530. What can I do here to close this out?

So I figured out two ways to solve this problem:
Apply the solution above, and then go through an incredibly lengthy and painful process of disabling AppArmor just to change the one setting, then re-enable AppArmor again.
Build Elasticsearch on another box, and bypass the entire process.
A quick three-minute assessment told me it wasn't worth the time and frustration of dealing with all the AppArmor pain, so I built it elsewhere.
But to answer the question, in case anyone is willing to eat the time and pain to do it in LXD: disable AppArmor, apply the vm.max_map_count setting, and then turn AppArmor back on.

As of 5-19-2022 I had good luck simply adding vm.max_map_count = 262144 in /etc/sysctl.conf on the host and rebooting the host.
Host is Ubuntu 22.04 as is the LXD container. The Elasticsearch process came up without an issue.
No having to mess with apparmor thankfully!
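For reference, a minimal sketch of an equivalent host-side approach that skips the reboot (the drop-in file name and container name are placeholders of mine):

# On the host: persist the setting and apply it immediately
echo 'vm.max_map_count = 262144' | sudo tee /etc/sysctl.d/99-elasticsearch.conf
sudo sysctl --system
# vm.max_map_count is not namespaced, so the container should now report the
# host value too ("escontainer" is a placeholder name)
lxc exec escontainer -- sysctl vm.max_map_count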

Related

ElasticSearch service starts but can not be reached and does not do any logging

ElasticSearch 6.2.2 on Linux Ubuntu 16.04.3 VM in Azure. It had been up and running fine and then after I rebooted the machine a few days ago I could not get the ElasticSearch service to start at all. Issue was shared and solved here: (ElasticSearch Fails to Start on Ubuntu 16.04.3 - status=1 Failure) by increasing the heap size in the jvm.options file.
Now I have the ElasticSearch service running but I cannot ping it at all. I have tried to ping it from both inside the VM (as localhost:9200) and from outside (similar to how I make calls to our other ES boxes, and do so successfully), but I'm told Could Not Get Any Response (Postman syntax).
The part that is making this impossible to diagnose is nothing is getting written to the ElasticSearch logs! The last time anything was written to any log at /var/log/elasticsearch was before I rebooted the machine a couple days ago.
I have checked the settings in elasticsearch.yml and all seems to be in-line with the elasticsearch.yml that's on a different box of ours in a different location which runs another ElasticSearch instance of ours without any issue.
EDIT: per request - the elasticsearch.yml file from the box that is NOT working correctly is here: http://s000.tinyupload.com/index.php?file_id=72318548245343478927 For comparison purposes, the elasticsearch.yml file from the box that IS working correctly is here: http://s000.tinyupload.com/index.php?file_id=20127693354114612595 Please note that the one that IS working correctly has 3 nodes whereas the one that is not working has only one node, so there will be some slight differences between the yml files because of this.
Check if path.logs: /var/log/elasticsearch is defined in elasticsearch.yml. Add this line if not present.
Check whether the user has permission to write into /var/log/elasticsearch. Change the permission of the files. sudo chmod 777 /var/log/elasticsearch/* and sudo chmod 777 /var/log/elasticsearch
Open /etc/init.d/elasticsearch and check whether ES_PATH_CONF is defined as ES_PATH_CONF="/etc/elasticsearch"
You may try commenting out the following lines in log4j2.properties under /etc/elasticsearch:
logger.xpack_security_audit_logfile.name = org.elasticsearch.xpack.security.audit.logfile.LoggingAuditTrail
logger.xpack_security_audit_logfile.level = info
logger.xpack_security_audit_logfile.appenderRef.audit_rolling.ref = audit_rolling
logger.xpack_security_audit_logfile.additivity = false
Use netstat -nultp | grep 9200 and check whether the port is being listened to.
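Put together, those checks look roughly like this (paths are the usual Debian/Ubuntu defaults, adjust for your install):

# does the config point logs at the expected directory?
grep path.logs /etc/elasticsearch/elasticsearch.yml
# can the elasticsearch user write there?
ls -ld /var/log/elasticsearch
sudo chmod 777 /var/log/elasticsearch /var/log/elasticsearch/*
# is the config directory wired up in the init script?
grep ES_PATH_CONF /etc/init.d/elasticsearch
# is anything listening on 9200?
netstat -nultp | grep 9200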
The issue was with a line in the elasticsearch.yml file, which showed as:
"10.5.11.6""
That extra quotation mark at the end is what was causing the entire problem.
For anyone this can benefit: the elasticsearch.yml file is extremely sensitive when it comes to spaces, punctuation, and case; even an extra space somewhere can cause the entire service to crash. Be very diligent with your edits to elasticsearch.yml.
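If you suspect a similar quoting or whitespace problem, a quick sanity check is to look at the line and run the file through a YAML parser; this sketch assumes the default config path and that Python with PyYAML is installed:

grep -n 'network.host' /etc/elasticsearch/elasticsearch.yml
# any stray quote or bad indentation will make this raise a parse error
python3 -c "import yaml; yaml.safe_load(open('/etc/elasticsearch/elasticsearch.yml'))"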
There are ways to debug:
1. Check if you have ES service running on that particular host via `ps -ef | grep elastic`
2. Look at which port ES is listening on (or not) via netstat.
3. It might be the case that your ES is running but binding not to localhost but to the instance IP. You should get a hint from the elasticsearch.yaml.
4. Make sure your /usr/share/elasticsearch/elasticsearch.yaml is the file that is being picked up and not the default at /etc/elasticsearch.yaml.
5. Configure logging in elasticsearch.yaml to point to a known location.
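For checks 1, 2 and 3, a quick sketch (curl is just one way to poke the port):

ps -ef | grep [e]lastic           # is the process there at all?
netstat -nltp | grep 9200         # which address is it bound to?
curl -s http://localhost:9200     # does it answer locally?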
Hope this helps?

Docker stuck on "Waiting for SSH to be available..."

I'm using Docker with Windows and Hyper-V to create containers. I've added a docker machine vmachine to my Docker configuration. The first time the machine is created, it gets an IP (although I cannot get nginx to be reachable - ERR_CONNECTION_REFUSED) and finishes the boot-up.
When I turn off the machine and then try to boot it, I get stuck on this message:
Waiting for SSH to be available...
And it doesn't move on from there. The machine is booted; however, I get an IPv6 address when I run the command docker-machine ip vmachine, like fe80::215:5dff:fe21:10b, instead of an IPv4 address.
What am I doing wrong?
The problem here is that by default Docker uses the DockerNAT network switch. You should create a new external network switch instead. This issue is covered here and here. You can create an external network switch using the command below:
docker-machine create -d hyperv --hyperv-virtual-switch external-switch tempbox1
or you can create one through the UI
Be sure to reboot the device after creating the external switch.
I had a similar issue and none of the solutions worked. It turns out that, according to this answer, Docker launches SSH with Unix-specific elements. This is said to have been fixed in the releases that followed, but I still encountered the 'Waiting for SSH' issue. I resolved this by simply using Git Bash to run all Docker-related SSH commands.
Use the --native-ssh switch, for example docker-machine --native-ssh .... Get more details from here.
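A concrete sketch, reusing the switch and machine names from the earlier answer:

docker-machine --native-ssh create -d hyperv --hyperv-virtual-switch external-switch tempbox1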
docker-machine.exe -debug create --driver hyperv --hyperv-virtual-switch "External Virtual Switch" --hyperv-cpu-count "1" --hyperv-memory "1024" --hyperv-disk-size "20000" mydockervm
Make sure to have an additional virtual switch configured with the external network driver selected, and uninstall VirtualBox.
Use the debug switch to see the exact error; for me it was a failure to allocate memory.
Here's what's solved it for me.
Turns out Windows 10 starting version 1709 has a built in SSH client at C:\Windows\System32\OpenSSH. Here's an article discussing it.
It looks like Docker is using that SSH implementation and it's not compatible. I didn't look for a proper way to remove the built-in SSH implementation in Windows 10, and simply renamed the folder. That was enough to fix it for me.
After doing what is mentioned in the suggestions above, if you are running Docker on a Windows machine, try to log in using the CLI. This worked for me.
If you are using Command Prompt, Docker will get stuck at Waiting for SSH to be available..., so switch to Git Bash as @Dave Howson said and it will work.
If you're using Oracle VM, you must first ensure that your new cloud VM is running.

cloudera host with bad health during install

I'm trying again and again with all required steps completed, but during cluster installation, when installing the selected parcels, every host always shows bad health. The setup never completes fully.
I am installing CM 5.5 on CentOS 6.7 using VirtualBox.
The Error
Host is in bad health cm.feuni.edu
Host is in bad health dn1.feuni.edu
Host is in bad health dn2.feuni.edu
Host is in bad health nn1.feuni.edu
Host is in bad health nn2.feuni.edu
Host is in bad health rm.feuni.edu
The above errors are shown at step 6, where the setup says
The selected parcels are being downloaded and installed on all the hosts in the cluster
In the previous step 5, all hosts completed with heartbeat checks at the end.
Memory distribution:
cm: 8GB
all others: 1GB
I could not find a proper answer anywhere else. What could the reason be for the bad health?
I don't know if it will help you...
For me, after a few days of struggling with it,
I found the log files (at )
which had a comment that there was a mismatch of the GUID,
so I uninstalled everything from both machines (using the script they provide, /usr/share/cmf/uninstall-cloudera-manager.sh, plus yum remove 'cloudera-manager-*' and deletion of every directory related to Cloudera I found...)
and then removed the guid file:
rm /var/lib/cloudera-scm-agent/cm_guid
Afterwards I re-installed everything, and that fixed that issue for me...
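Condensed, that cleanup was roughly the following (run on each affected host):

sudo /usr/share/cmf/uninstall-cloudera-manager.sh
sudo yum remove 'cloudera-manager-*'
sudo rm /var/lib/cloudera-scm-agent/cm_guid    # stale agent GUID from the previous install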
I read online that there can be issues with the hostname and things like that, but I guess that if you get to this part of the installation, you have already fixed all the domain/FQDN/hostname/hosts issues.
It saddens me there is no real manual/FAQ for this product.. :(
Good luck!
I faced the same problem. This is my solution:
First I edited config.ini
$ nano /etc/cloudera-scm-agent/config.ini
so that the hostname was the same as what the command $ hostname returned.
Then I restarted the Cloudera agent and server:
$ service cloudera-scm-agent restart
$ service cloudera-scm-server restart
Then in Cloudera Manager I deleted the cluster and added it again. The wizard continued to run normally.
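A rough sketch of that fix (which key in config.ini holds the hostname depends on your setup, so treat this as a pointer rather than a recipe):

hostname                                       # what the OS reports
sudo nano /etc/cloudera-scm-agent/config.ini   # make the hostname entry match
sudo service cloudera-scm-agent restart
sudo service cloudera-scm-server restart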

vagrant up stuck on mount nfs

When I attempt to run 'vagrant up', the script executes as normal until it gets to the last line, where the NFS shared drives are mounted.
I have tried deleting the exports file in /etc/, followed by an nfsd restart and vagrant destroy / vagrant up, but to no avail.
After some considerable amount of time the console outputs the following [certain details redacted]:
==> default: Mounting NFS shared folders...
The following SSH command responded with a non-zero exit status. Vagrant assumes that this means the command failed!
mount -o 'nolock,vers=3,udp,noatime' XXX.XXX.XX.X:'/Users/dhatton/Google Drive/moodle-doodle/site' /var/www/site
Stdout from the command:
Stderr from the command:
mount.nfs: Connection timed out
UPDATE
The above problem was encountered when using a VPN into the office network. Upon logging in on-site without the VPN, everything works again.
For macOS Monterey 12.1 with VirtualBox 6.1.30 and Vagrant 2.2.19/2.2.18:
create vbox folder in /etc
create a file inside /etc/vbox named networks.conf
add the following inside networks.conf
* 0.0.0.0/0 ::/0
Note: if you get the ip address range error, add your IP here too.
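Those steps as commands, run on the macOS host:

sudo mkdir -p /etc/vbox
echo '* 0.0.0.0/0 ::/0' | sudo tee /etc/vbox/networks.conf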
I had a similar issue. I searched a lot and tried the following solutions:
Check /etc/exports and /etc/hosts files, if there are invalid entries in file, remove them.
Check your firewall is not blocking access
Restart NFS system
Install the vagrant-vbguest plugin: vagrant plugin install vagrant-vbguest
Do vagrant reload --provision
Reboot your PC
Reinstall vagrant
For me reinstalling vagrant worked.
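Roughly, that checklist as commands (macOS host assumed for the NFS restart):

cat /etc/exports /etc/hosts             # look for stale or invalid entries
sudo nfsd restart                       # restart the NFS daemon
vagrant plugin install vagrant-vbguest
vagrant reload --provision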
I've run across this before and the problem turned out to be related to my company's VPN. If I tried running vagrant up while connected to the VPN, it would hang on mounting NFS, but if I disconnected from the VPN and tried again, it worked. Once it was running I could reconnect to the VPN. It probably goes back to needing a stable internet connection.
Assuming you are trying to mount from guest to host (the host being OSX?), try mounting to a different path. You might be encountering issues with the space in 'Google Drive'.
Vagrant downloads binaries from its cloud while configuring a VM, so a stable internet connection is needed. In fact, an internet connection is necessary for using most of the Hashicorp products.

Docker unreachable after computer sleep

I have just installed docker using docker-toolbox 1.8.2 on Windows 10.
Due to this issue I had to recreate the docker machine using these commands:
docker-machine rm default
docker-machine --native-ssh create -d virtualbox default
After that it has been working fine, except for one problem:
When the PC has gone to sleep and then wakes again, the docker commands can no longer connect. Example:
> docker images
An error occurred trying to connect: Get https://192.168.99.100:2376/v1.20/images/json:
dial tcp 192.168.99.100:2376: ConnectEx tcp: A connection attempt failed because the
connected party did not properly respond after a period of time, or established connection
failed because connected host has failed to respond.
However the docker-machine lists the machine as running:
> docker-machine ls
NAME ACTIVE DRIVER STATE URL SWARM
default * virtualbox Running tcp://192.168.99.100:2376
I can also confirm in VirtualBox that the VM screen seems to be active.
I have tried starting and stopping the machine, but that does not help
C:\x> docker-machine stop default
C:\x> docker-machine start default
Starting VM...
Started machines may have new IP addresses. You may need to re-run the `docker-machine env` command.
C:\x> docker-machine env default --shell=powershell
Ironically, the last command hangs, so I never get any environment settings.
The only thing that helps is to restart the whole PC. But that should be unnecessary?
I have also posted this as an issue on the Docker GitHub repository, but that was closed. A related issue seems to be this one, but no workaround or solution has been posted for Windows.
After hours of fighting with VirtualBox + Docker Toolbox, I finally found a way to get Docker working again (even without restarting all the containers):
Wake up PC from sleep
Try docker images (won't work)
Open VirtualBox -> Close VM with saving state (CTRL+V)
Run your VM again
Try docker images again (now should work)
Please note: All steps are in VirtualBox only! Running docker-machine restart default will create another host-only adapter, which is something you do not want. If you did it anyway, delete all additionally created adapters (File->Preferences->Network on VirtualBox), then follow steps 1-5.
I have experienced the exact same symptoms on Windows 8.1... The thing is that it's not really a docker-specific issue, but more how Windows manages the VirtualBox network adapters after sleep (I think...). The culprit in my case is that the network adapter's addresses were becoming private after sleep (they became 169.* addresses).
Credits to this guy who gave me the idea: http://lyngtinh.blogspot.ca/2011/12/how-to-disable-autoconfiguration-ipv4.html
Fix:
Start a command prompt as Administrator
Find out the "useful" network adapters: ipconfig /all. The useful ones in my case were the ones labeled "VirtualBox Host-Only Ethernet Adapter" that didn't have private ips (not starting with 169.*).
Run this command and note the "Idx" of the useful VirtualBox network adapters: netsh interface ipv4 show inter.
Run this command to disable the IP auto configuration: netsh interface ipv4 set interface <idx> dadtransmits=0 store=persistent. Replace <idx> with each index found in the previous step.
Restart Windows
Afterwards, I was able to docker-machine start default, then docker-machine env default --shell cmd, put the PC to sleep, wake up and run docker-machine env default --shell cmd again.
I found that removing the 'host only adapter' (File->Preferences->Network in VirtualBox) and restarting the docker-machine helps.
Not a real solution, but probably better than restarting the computer.
Having tried all the other answers here, and having varying but not consistent success, the following seems to reliably bring it back for me after this problem occurs.
Open a powershell/command window (I have most success if I run all docker-machine commands in a powershell window opened as administrator, I don't know if that's important or not) then run (where "dev" is the name of your docker machine instance):
docker-machine ssh dev
Then on the terminal that is opened, run:
sudo shutdown -r now
When the machine restarts, it seems to refresh the network and work correctly. Note, however, that simply running docker-machine restart dev did not have the same effect for me.
Your machine needs to be running before you can do the ssh, so if it's not running, execute docker-machine start dev before trying to SSH.
Had the same problem on Windows 8.1 and docker toolbox 1.12.0
None of the above solutions worked for me either.
Found another way to make docker work after system wake up:
In the Docker Quickstart Terminal window, stop the docker process with Ctrl-C (if it is still running)
Run command docker-compose down
Shut down docker with docker-machine stop default
Exit the terminal window with Ctrl-D
Run Quickstart Terminal again and do all subsequent steps you need.
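In command form (machine name "default" as above; the final step depends on what your project needs):

docker-compose down            # stop the compose project
docker-machine stop default    # shut down the Docker Toolbox VM
# close the terminal (Ctrl-D), reopen the Docker Quickstart Terminal,
# then bring your project back up, e.g. docker-compose up -d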
This worked for me, on Windows host machine.
Configure your network adapter to
1) Allow the network adapter to wake the computer,
2) Allow a magic packet to wake the computer,
3) Allow IPV6
http://www.worldstart.com/dropped-internet-connection-in-sleep-mode/
Also, in the VirtualBox network settings, go to Advanced and allow promiscuous mode for the VM machines, or allow all.
