Elasticsearch: "Failed to obtain node lock, is the following location writable?"

Elasticsearch won't start using ./bin/elasticsearch.
It raises the following exception:
- ElasticsearchIllegalStateException[Failed to obtain node lock, is the following location writable?: [/home/user1/elasticsearch-1.4.4/data/elasticsearch]
I checked the permissions on that location: it has 777 permissions and is owned by user1.
ls -al /home/user1/elasticsearch-1.4.4/data/elasticsearch
drwxrwxrwx 3 user1 wheel 4096 Mar 8 13:24 .
drwxrwxrwx 3 user1 wheel 4096 Mar 8 13:00 ..
drwxrwxrwx 52 user1 wheel 4096 Mar 8 13:51 nodes
What is the problem? I am trying to run Elasticsearch 1.4.4 on Linux without root access.

I had an orphaned Java process related to Elasticsearch. Killing it solved the lock issue.
ps aux | grep 'java'
kill -9 <PID>

I got this same error message, but things were mounted fine and the permissions were all correctly assigned.
It turned out that I had an 'orphaned' Elasticsearch process that was not being killed by the normal stop command.
I had to kill the process manually, and then restarting Elasticsearch worked again.

The reason is that another instance is already running.
First, find the PID of the running Elasticsearch process:
ps aux | grep 'elastic'
Then kill it with kill -9 <PID_OF_RUNNING_ELASTIC>.
Some answers suggest removing the node.lock file, but that does not help, since the running instance will just recreate it.

In my situation the Elasticsearch data directory had the wrong permissions. Setting the correct owner solved it.
# change owner
chown -R elasticsearch:elasticsearch /data/elasticsearch/
# to validate
ls /data/elasticsearch/ -la
# prints
# drwxr-xr-x 2 elasticsearch elasticsearch 4096 Apr 30 14:54 CLUSTER_NAME

After I upgraded the Elasticsearch Docker image from version 5.6.x to 6.3.y, the container would not start anymore because of the aforementioned error:
Failed to obtain node lock
In my case the root cause of the error was missing file permissions.
The data folder used by Elasticsearch was mounted from the host system into the container (declared in docker-compose.yml):
volumes:
  - /var/docker_folders/common/experimental-upgrade:/usr/share/elasticsearch/data
This folder could no longer be accessed by Elasticsearch, for reasons I did not understand at all. After I set very permissive file permissions on this folder and all of its sub-folders, the container started again.
I do not want to reproduce the command that sets those very permissive access rights on the mounted Docker folder, because it is most likely bad practice and a security issue. I just wanted to share the fact that it might not be a second Elasticsearch process running, but simply missing access rights to the mounted folder.
Maybe someone could elaborate on the appropriate rights to set for a folder mounted into a Docker container?
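For what it's worth, here is a hedged sketch of a less permissive fix: the official Elasticsearch images run the process as a non-root user (commonly UID 1000, but verify for your image and version), so giving that UID ownership of the mounted host folder is usually enough. The container name below is a placeholder.
# check which UID/GID the elasticsearch process uses inside the container (container name is a placeholder)
docker exec my_es_container id elasticsearch
# give that UID/GID ownership of the mounted host folder instead of opening it up to everyone
sudo chown -R 1000:1000 /var/docker_folders/common/experimental-upgrade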

As with many others here replying, this was caused by wrong permissions on the directory (not owned by the elasticsearch user). In our case it was caused by uninstalling Elasticsearch and reinstalling it (via yum, using the official repositories).
As of this moment, the repos do not delete the nodes directory when they are uninstalled, but they do delete the elasticsearch user/group that owns it. So then when Elasticsearch is reinstalled, a new, different elasticsearch user/group is created, leaving the old nodes directory still present, but owned by the old UID/GID. This then conflicts and causes the error.
A recursive chown, as mentioned by @oleksii, is the solution.
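For reference, assuming the default RPM data path, that recursive chown looks something like this:
# reassign the old nodes directory to the newly created elasticsearch user/group
sudo chown -R elasticsearch:elasticsearch /var/lib/elasticsearch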

You already have Elasticsearch running. To prove that, type:
curl 'localhost:9200/_cat/indices?v'
If you want to run another instance on the same box you can set node.max_local_storage_nodes in elasticsearch.yml to a value larger than 1.

Try the following:
1. Find which processes are using port 9200, e.g. lsof -i:9200.
2. Kill the PID(s), e.g. repeat kill -9 <PID> for each PID that the output of lsof showed in step 1.
3. Restart Elasticsearch, e.g. elasticsearch.

I had another Elasticsearch instance running on the same machine.
Command to check: netstat -nlp | grep 9200 (9200 being the Elasticsearch HTTP port).
Result: tcp 0 0 :::9210 :::* LISTEN 27462/java
Kill the process with:
kill -9 27462
(27462 is the PID of the Elasticsearch instance.)
Start Elasticsearch again and it should run now.

In my case, this error was caused by the devices used for the configured data directories not being mounted (they needed to be mounted with sudo mount).
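As a rough sketch (the device name and mount point here are hypothetical, adjust them to your setup):
# mount the data device before starting Elasticsearch
sudo mount /dev/sdb1 /var/lib/elasticsearch
# confirm the configured data path really sits on a mounted, writable filesystem
mount | grep elasticsearch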

The error directly says that Elasticsearch does not have permission to obtain the lock, so it needs to be given ownership of its data directory:
chown -R elasticsearch:elasticsearch /var/lib/elasticsearch

In my case, /var/lib/elasticsearch was the directory with missing permissions (CentOS 8):
error: java.io.IOException: failed to obtain lock on /var/lib/elasticsearch/nodes/0
To fix it, use:
chown -R elasticsearch:elasticsearch /var/lib/elasticsearch

To add to the above answers, there are other scenarios in which you can get this error. I had done an upgrade from 5.5 to 6.3 of Elasticsearch, using a docker-compose setup with named volumes for the data directories. I had to run docker volume prune to remove the stale ones. After doing that, I no longer faced the issue.
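A minimal sketch of that cleanup; note that docker volume prune removes every volume not referenced by a container, so review the list first:
# list named volumes and identify the stale ones
docker volume ls
# remove all volumes not used by at least one container (asks for confirmation)
docker volume prune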

If anyone is seeing this being caused by:
Caused by: java.lang.IllegalStateException: failed to obtain node locks, tried [[/docker/es]] with lock id [0]; maybe these locations are not writable or multiple nodes were started without increasing [node.max_local_storage_nodes] (was [1])?
The solution is to set node.max_local_storage_nodes in your elasticsearch.yml:
node.max_local_storage_nodes: 2
The docs say to set this to a number greater than one on your development machine
By default, Elasticsearch is configured to prevent more than one node from sharing the same data path. To allow for more than one node (e.g., on your development machine), use the setting node.max_local_storage_nodes and set this to a positive integer larger than one.
I think Elasticsearch needs a second node slot available so that a new instance can start. This happens to me whenever I try to restart Elasticsearch inside my Docker container. If I relaunch the container instead, Elasticsearch starts properly the first time without this setting.

This error mostly occurs when you kill the process abruptly, in which case the node.lock file may not be cleaned up. You can manually remove the node.lock file and start the process again; it should work.
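A sketch of that cleanup, assuming the default package-install data path; double-check that no Elasticsearch process is still running before deleting anything:
# make sure nothing is still holding the lock
ps aux | grep elastic
# locate any leftover lock files under the data directory (path assumed)
find /var/lib/elasticsearch -name "node.lock"
# remove the stale lock file, then start Elasticsearch again
rm /var/lib/elasticsearch/nodes/0/node.lock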

For me the error had a simple cause: I created a new data directory /mnt/elkdata and changed the ownership to the elastic user. I then copied the files in and forgot to change the ownership again afterwards.
After doing that and restarting the Elasticsearch node, it worked.
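In other words, the fix was just re-applying the ownership after the copy, roughly like this (assuming a package install managed by systemd):
# restore ownership on the new data directory after copying the files into it
sudo chown -R elasticsearch:elasticsearch /mnt/elkdata
# restart the node so it picks up the now-writable path
sudo systemctl restart elasticsearch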

Check these options:
sudo chown 1000:1000 <directory you wish to mount>
# with Docker
sudo chown 1000:1000 /data/elasticsearch/
# or, with a VM
sudo chown elasticsearch:elasticsearch /data/elasticsearch/

If you are on Windows, then try this:
Kill any Java processes.
If the startup batch script is interrupted, press Ctrl+C to properly stop the Elasticsearch service before you exit, rather than just closing the terminal.

Related

Cannot access Flink dashboard at localhost:8081 on Windows

I followed the first steps to install Flink.
I can start the cluster without any problem
$ start-cluster.sh
Starting cluster.
Starting standalonesession daemon on host DESKTOP-....
Starting taskexecutor daemon on host DESKTOP-....
But I don't get any status from
$ ps aux | grep flink
I also cannot access the dashboard via localhost:8081.
There is an older post describing these issues, but its solution didn't work for me, since the conf files it describes apparently no longer exist.
My JAVA_HOME is set as C:\Progra~1\Java\jdk1.8.0_311 to avoid issues with the space in Program Files.
Can you check the logs in the /logs folder? I'm suspecting that C:\Program Files\ could still cause issues because of the space there.
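For example (the exact log file name pattern may differ between Flink versions, so treat this as a sketch):
# list the log files produced by the standalone cluster
ls log/
# tail the JobManager (standalonesession) log for startup errors
tail -n 100 log/flink-*-standalonesession-*.log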
Go to the downloaded Flink folder and try this bash command:
$ ./bin/start-cluster.sh --daemon bootstrap-server localhost:8081
and then run:
$ ./bin/flink run examples/streaming/WordCount.jar
If the above runs without issue, go to localhost:8081.
This still seems to be problematic. I tried to run from Windows Subsystem for Linux (WSL).
I have the following versions: java 11.0.16 and flink 1.15.2.
sudo apt-get update
sudo apt install openjdk-11-jre-headless
export FLINK_HOME=/mnt/c/Projects/Apache/flink-1.15.2
I set the following in flink-conf.yaml
rest.port: 8081
rest.address: localhost
rest.bind-address: 0.0.0.0
Changing the bind address from localhost to 0.0.0.0 seems to be what fixed the problem.
$FLINK_HOME/bin/start-cluster.sh
Now I can access the Flink Web Dashboard.

ElasticSearch service starts but cannot be reached and does not do any logging

ElasticSearch 6.2.2 on Linux Ubuntu 16.04.3 VM in Azure. It had been up and running fine and then after I rebooted the machine a few days ago I could not get the ElasticSearch service to start at all. Issue was shared and solved here: (ElasticSearch Fails to Start on Ubuntu 16.04.3 - status=1 Failure) by increasing the heap size in the jvm.options file.
Now I have the ElasticSearch service running but I cannot reach it at all. I have tried to reach it from both inside the VM (as localhost:9200) and from outside (similar to how I make calls to our other ES boxes, and do so successfully), but I'm told "Could not get any response" (Postman's wording).
The part that is making this impossible to diagnose is nothing is getting written to the ElasticSearch logs! The last time anything was written to any log at /var/log/elasticsearch was before I rebooted the machine a couple days ago.
I have checked the settings in elasticsearch.yml and all seems to be in-line with the elasticsearch.yml that's on a different box of ours in a different location which runs another ElasticSearch instance of ours without any issue.
EDIT: per request - the elasticsearch.yml file from the box that is NOT working correctly is here: http://s000.tinyupload.com/index.php?file_id=72318548245343478927 For comparison purposes, the elasticsearch.yml file from the box that IS working correctly is here: http://s000.tinyupload.com/index.php?file_id=20127693354114612595 Please note that the one that IS working correctly has 3 nodes whereas the one that is not working has only one node, so there will be some slight differences between the yml files because of this.
Check if path.logs: /var/log/elasticsearch is defined in elasticsearch.yml. Add this line if it is not present.
Check whether the user has permission to write into /var/log/elasticsearch. If not, change the permissions of the files: sudo chmod 777 /var/log/elasticsearch/* and sudo chmod 777 /var/log/elasticsearch.
Open /etc/init.d/elasticsearch and check whether ES_PATH_CONF is defined as ES_PATH_CONF="/etc/elasticsearch".
You may also try commenting out the following lines in log4j2.properties under /etc/elasticsearch:
logger.xpack_security_audit_logfile.name = org.elasticsearch.xpack.security.audit.logfile.LoggingAuditTrail
logger.xpack_security_audit_logfile.level = info
logger.xpack_security_audit_logfile.appenderRef.audit_rolling.ref = audit_rolling
logger.xpack_security_audit_logfile.additivity = false
Use netstat -nultp | grep 9200 and check whether the port is being listened on.
The issue was with a line in the elasticsearch.yml file which read:
"10.5.11.6""
That extra quotation mark at the end is what was causing the entire problem.
For anyone this can benefit: the elasticsearch.yml file is extremely sensitive when it comes to spaces, punctuation and case; even an extra space somewhere can cause the entire service to crash. Be very diligent with your edits to elasticsearch.yml.
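To illustrate (the actual setting name was not shown above, so network.host is used here purely as a hypothetical example):
# broken - the stray trailing quote prevents the whole service from starting
network.host: "10.5.11.6""
# fixed
network.host: "10.5.11.6"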
There are ways to debug:
1. Check whether you have the Elasticsearch service running on that particular host via `ps -ef | grep elastic`.
2. Check which port Elasticsearch is listening on (if any) via netstat.
3. It might be that Elasticsearch is running but binding not to localhost but to the instance IP; elasticsearch.yaml should give you the hint.
4. Make sure your /usr/share/elasticsearch/elasticsearch.yaml is the file that is actually being picked up, and not the default at /etc/elasticsearch.yaml.
5. Configure logging in elasticsearch.yaml to a location you can read.
Hope this helps.
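A quick sketch of the first three checks (port and address assumed to be the defaults):
# 1. is the service running at all?
ps -ef | grep elastic
# 2. is anything listening on the HTTP port?
netstat -nltp | grep 9200
# 3. if it binds to the instance IP rather than localhost, query that IP instead (placeholder)
curl http://<instance-ip>:9200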

Web server occasionally unable to write to storage/logs in the production environment

Using AWS as my host, the production server is occasionally unable to write logs to storage/logs/*, which causes my application to white-screen. I don't sudo when running git pull, and my storage owner/group/permissions are as follows:
drwxrwxr-x 6 apache apache
There doesn't seem to be any pattern to when the white-screening happens. It happens "naturally", without any admin pulling or messing with files. Is there a small daemon or something that could be changing permissions?
The problem was that supervisor was running my program as root, which would create and overwrite the logs as root.
The fix was to set the user in my supervisor program's .conf:
[program:my_programs_name]
user=ec2-user
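A sketch of the follow-up steps, assuming supervisord and the storage/logs layout from the question; the chown target mirrors the ownership shown above:
# reload supervisor so the new user= setting takes effect
sudo supervisorctl reread
sudo supervisorctl update
# fix any log files that were already created as root
sudo chown -R apache:apache storage/logs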

Install MongoDB (child process failed, exited with error number 100)

I tried to install MongoDB on my MacBook Air.
I downloaded the zipped file from the official website, extracted it, and moved it to the root directory.
After that, under that directory, I made the /data/db and /log folders.
Here is my mongodb.config, which describes the basic config for my DB:
dbpath = /mongodb/data/db
logpath = /mongodb/log/mongo.log
logappend = true
#bind ip = 127.0.0.1
port = 27017
fork = true
rest = true
verbose = true
#auth = true
#noauth = true
Additionally, I want to know what the # means in the config file.
I put this file in /mongodb/bin (/mongodb is the directory I extracted the files into).
I opened a terminal, entered ./mongod --config mongodb.config, and got this back:
Juneyoung-ui-MacBook-Air:bin juneyoungoh$ ./mongod --config mongodb.config
about to fork child process, waiting until server is ready for connections.
forked process: 1775
all output going to: /mongodb/log/mongo.log
ERROR: child process failed, exited with error number 100
How can I handle this error, and what does it mean?
The data folders you created were very likely created with sudo, yes? They are owned by root and are not writable by your normal user. If you are the only user of your macbook, then change the ownership of the directories to you:
sudo chown juneyoungoh /data
sudo chown juneyoungoh /data/db
sudo chown juneyoungoh /data/log
If you plan on installing this on a public machine or somewhere legit, then read more about mongo security practices elsewhere. I'll just get you running on your macbook.
I had a similar issue and it was not related to any 'sudo' problem. I was trying to recover from a kernel panic!
When I looked at my data folder I found a mongod.lock file there. In my case this page helped a lot: http://docs.mongodb.org/manual/tutorial/recover-data-following-unexpected-shutdown/. As they explain,
if the mongod.lock is not a zero-byte file, then mongod will refuse to start.
I tested this solution in my environment and it works perfectly:
Remove mongod.lock file.
Repair the database: mongod --dbpath /your/db/path --repair
Run mongod: mongod --dbpath /your/db/path
I had the same problem on my machine. The log file showed:
Mon Jul 29 09:57:13.689 [initandlisten] ERROR: Insufficient free space for journal file
Mon Jul 29 09:57:13.689 [initandlisten] Please make at least 3379MB available in /var/mongoexp/rs2/journal or use --smallfiles
It was solved by running mongod with --smallfiles. Alternatively, if you start mongod with the --config option, you can disable write-ahead journaling in the configuration file with nojournal=true (remove the leading #; a # in the config file simply marks a comment line). Freeing up some more disk space would also solve the above problem.
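A sketch of the two options, reusing the config file from the question (legacy 2.x option syntax):
# option 1: start with smaller preallocated journal files
./mongod --config mongodb.config --smallfiles
# option 2: or add the following line (uncommented) to mongodb.config to disable write-ahead journaling
#   nojournal = true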
It's probably because you didn't shut down MongoDB properly and you are not starting it the right way. According to your mongodb.config, you have dbpath = /mongodb/data/db, so I assume you created the directory /mongodb/data/db? Let me clarify all the steps.
TO START MONGODB
In your mongodb.config, change dbpath = /mongodb/data/db to dbpath = /data/db. In your terminal, create the db directory by typing mkdir /data/db. Now that you have a directory, you can start mongo.
To start mongo in the background type: mongod --dbpath /data/db --fork --logpath /dev/null.
/data/db is the location of the db.
--fork means you want to start mongo in the background, as a daemon.
--logpath /dev/null means you don't want to log - you can change that by replacing /dev/null to a path like /var/log/mongo.log
TO SHUTDOWN MONGODB
Connect to your mongo by typing mongo, then run use admin and db.shutdownServer(), as explained in the MongoDB documentation.
If this technique doesn't work for some reason, you can always kill the process:
Find the MongoDB process PID by typing lsof -i:27017 (assuming your MongoDB is running on port 27017).
Type kill <PID>, replacing <PID> with the value found by the previous command.
I had a similar issue with the same error while trying to run the repair script:
sudo -u mongodb mongod -f /etc/mongodb.conf --repair
I checked ps aux | grep mongo and saw that the daemon was running. I stopped it, and then the repair script ran without an issue.
Hope that is helpful for someone else.
I had the same error on Linux (CentOS) and this worked for me:
Remove mongod.lock from the dbpath
$ rm /var/lib/mongo/mongod.lock
Repair the database:
$ mongod --repair
Run mongod with your config:
$ mongod --config /etc/mongod.conf
I had the same error. I ran it interactively to see the log.
2014-10-21T10:12:35.418-0400 [initandlisten] ERROR: listen(): bind() failed errno:48 Address already in use for socket: 0.0.0.0:27017
Then I used lsof to find out which process was using my port.
$ lsof -i:27017
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
mongod 2106 MYUSERID 10u IPv4 0x635b71ec3b65b4a1 0t0 TCP *:27017 (LISTEN)
It was a mongod that I had forked previously and forgot to turn off (since I hadn't seen it running in my bash window).
Simply killing it by running kill 2106 enabled my process to run without error 100.
Generally, this error comes when mongod.conf points to a path that cannot be found for the database store, the log store, or the process-id store, or when MongoDB does not have permission to access the directories and files declared in mongod.conf.
To resolve this error, look at the log generated by MongoDB; it will clearly indicate which file or directory MongoDB is unable to access.
create folder "data" and "db" inside it, in "/" path of your server.
actually you should create or modify permissions of folder that the data is going to be stored!
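A sketch of that on a single-user machine, using the default /data/db location:
# create the default data directory and make it writable by your user
sudo mkdir -p /data/db
sudo chown $(whoami) /data/db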

How to gracefully restart the Sphinx search daemon after reindexing

I've reindexed my Sphinx search with /usr/local/sphinx/bin/indexer --all --rotate and renamed my original index output files to something else. Simply changing the index argument passed to $sphinx->Query($query, $index); returns no results.
I suspected the daemon doesn't know the new index files exist. So I ran
sudo /usr/local/sphinx/bin/searchd
again to try to restart it. But it threw
FATAL: failed to lock pid file '/usr/local/sphinx/var/log/searchd.pid': Resource temporarily unavailable (searchd already running?)
I had to kill the two search daemon processes and start it again to pick up the new index files. Is there a graceful way to restart it?
I know this is a late answer, but just so you know, to 'restart' Sphinx, you need to stop it then start it (as in, two distinct processes).
To stop it, call searchd --stop then just start it again with searchd.
You'll need to call indexer on the new index to create it and then --rotate to update it.
So it would be something like
indexer --config /path/to/config.conf indexname
And then when you just want to update your indexes
indexer --config /path/to/config.conf --rotate --all
This will create a temporary copy of each index and replace the old ones when finished.
For more info on what actually happens see http://sphinxsearch.com/docs/manual-0.9.9.html#ref-indexer
On the other error you're getting, run:
ps aux | grep searchd
If it returns no results, then remove /usr/local/sphinx/var/log/searchd.pid and start searchd again.
It seems there is an issue with the searchd --stop command failing to stop the daemon on some instances of Sphinx.
Try: service sphinxsearch stop
See: https://bugs.launchpad.net/ubuntu/+source/sphinxsearch/+bug/990395
service searchd start worked for me on CentOS
