elasticsearch.yml: will the configuration file be written on the fly?

I am building my own docker image for the elasticsearch application.
One question I have: will the configuration file elasticsearch.yml be modified by the application on the fly?
I hope that never happens, even when the node is running in a cluster. Some other applications (like Redis) modify their config file on the fly when the cluster status changes. If the configuration file does change on the fly, I would have to export it as a volume, since a Docker image cannot retain changes made at runtime.

No, you don't run any risk of your configuration file being overwritten. The configuration is read from that file and kept in memory. ES also allows settings to be changed persistently at runtime, but those are stored in a separate global cluster state file (in data/CLUSTER_NAME/nodes/N/_state, where N is the 0-based node index) and re-read on each restart.
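For example, dynamic settings can be persisted through the cluster settings API rather than by editing elasticsearch.yml. A minimal sketch in Python, assuming the requests library and a node reachable on localhost:9200 (the setting shown is just an example of a dynamic one):

import requests

# Persist a dynamic setting. Elasticsearch stores it in the on-disk cluster
# state, not back in elasticsearch.yml, so the YAML file stays untouched.
resp = requests.put(
    "http://localhost:9200/_cluster/settings",
    json={"persistent": {"cluster.routing.allocation.enable": "all"}},
)
print(resp.json())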

Related

How to move the data and log location in ElasticSearch

I have an ES cluster set up with 3 master nodes and 2 data nodes, and it is running properly. I want to change the data and log location on one of the data nodes from the local disk to external disks.
In my current YAML file:
path.data: /opt/elasticsearch/data
path.logs: /opt/logs/elasticsearch
Now I have added 2 external disks to the server to store data/logs and would like to change those locations to the new drives.
I have added the new disks. What is the correct process to point the ES data/logs at them?
The data on this node can be deleted, as this is a dev environment.
Could I just stop ES on this server,
delete the contents of the current data and log folders,
mount the new drives at the same mount points, and restart the cluster?
Thanks
You can just change the paths in the YAML file and restart the Elasticsearch service; that should work for you (see the sketch after the steps). There is no automatic reload when you change the YAML configuration.
Steps:
Change the paths in the YAML file
Restart the service
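A minimal sketch of those two steps, assuming the new disks are mounted at /mnt/es-data and /mnt/es-logs (hypothetical mount points) and Elasticsearch was installed as a package managed by systemd. In elasticsearch.yml:

path.data: /mnt/es-data/elasticsearch
path.logs: /mnt/es-logs/elasticsearch

Then prepare the directories and restart:

sudo mkdir -p /mnt/es-data/elasticsearch /mnt/es-logs/elasticsearch
sudo chown -R elasticsearch:elasticsearch /mnt/es-data/elasticsearch /mnt/es-logs/elasticsearch
sudo systemctl restart elasticsearch

The chown matters: the node will fail to start if the service user (elasticsearch on package installs) cannot write to the new locations.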

Sync config files between nodes on hadoop cluster

I have a hadoop cluster consisting of 4 nodes on which I am running a pyspark script. I have a config.ini file which contains details like locations of certificates, passwords, server names etc which are needed by the script. Each time this file is updated I need to sync the changes across all 4 nodes. Is there a way to avoid that?
I have never needed to sync or update changes to my script; making them on just one node and running it from there is enough. Is the same possible for the config file?
The most secure answer is likely to learn how to use a keystore with Spark.
A little less secure, but still good: have you considered just putting the file in HDFS and referencing it from the script?
Less secure methods that are easy to use:
You can pass the file to spark-submit with --files so it is shipped to every node for you (see the sketch below).
Or you could pass the values directly as arguments in your spark-submit command.
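A minimal sketch of the --files route, with placeholder file and script names; spark-submit copies config.ini to every node, and SparkFiles.get() resolves its local path inside the job:

spark-submit --files /local/path/config.ini my_script.py

And inside my_script.py:

from configparser import ConfigParser
from pyspark import SparkFiles
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("config-demo").getOrCreate()

# Read the shipped copy of config.ini instead of a path that would have to
# exist on all 4 nodes.
config = ConfigParser()
config.read(SparkFiles.get("config.ini"))
print(config.sections())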

Using Apache Nifi in a docker instance, for a beginner

So, I want, very basically, to be able to spin up a container which runs Nifi, with a template I already have. I'm very new to containers, and fairly new to Nifi. I think I know how to spin up a Nifi container, but not how to make it so that it will automatically run my template every time.
You can use the apache/nifi Docker image (available on Docker Hub) as a starting point and use a Docker RUN/COPY instruction to inject your desired flow. There are three ways to load an existing flow into a NiFi instance.
1. Export the flow as a template (an XML file containing the exported flow segment) and import it as a template into your running NiFi instance. This requires the "destination" NiFi instance to be running and uses the NiFi API.
2. Create the flow you want, manually extract the entire flow from the "source" NiFi instance by copying $NIFI_HOME/conf/flow.xml.gz, and overwrite the flow.xml.gz file in the "destination" NiFi's conf directory. This does not require the destination NiFi instance to be running, but it must happen before the destination NiFi starts.
3. Use the NiFi Registry to version control the original flow segment from the source NiFi and make it available to the destination NiFi. This seems like overkill for your scenario.
I would recommend Option 2, since you already have the flow exactly as you want it. Simply use COPY /src/flow.xml.gz /destination/flow.xml.gz in your Dockerfile (see the sketch below).
If you literally want it to "run my template every time", you probably want to ensure that the processors are all in enabled state (showing a "Play" icon) when you copy/save off the flow.xml.gz file, and that in your nifi.properties, nifi.flowcontroller.autoResumeState=true.
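A rough Dockerfile sketch for Option 2, assuming flow.xml.gz sits next to the Dockerfile; the conf path below matches the official apache/nifi image layout (NIFI_HOME=/opt/nifi/nifi-current), but double-check it for your image version:

FROM apache/nifi:latest
# Bake the pre-built flow into the image; --chown keeps it writable by the
# nifi user the image runs as.
COPY --chown=nifi:nifi flow.xml.gz /opt/nifi/nifi-current/conf/flow.xml.gz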

How to enable hadoop-metrics.properties

I need to report Hadoop metrics (such as jvm, cldb) to a text file. I modified the hadoop-metrics file in the conf directory on one of the nodes to test, but the output files still didn't appear.
I tried restarting the YARN NodeManager and the node itself, but still no result.
Do I need to do some additional magic, like changing env variables or other configs?
The problem was a wrong config. I had been using a sample config file which was supposed to report NameNode and ResourceManager metrics.
But my node didn't run either of them.
After adding entries for the daemons the node actually runs, it works fine.
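For reference, a rough hadoop-metrics2.properties sketch that writes metrics for a NodeManager to a text file (the prefix and filename are placeholders; use the prefixes of the daemons that actually run on the node):

*.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
*.period=10
nodemanager.sink.file.filename=nodemanager-metrics.out

The daemon has to be restarted afterwards for the sink to take effect.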

hbase zookeeper client connect issue

I received the error below from ZooKeeper when I log in to my cluster environment. I am using the default ZooKeeper that comes along with HBase.
HBase is able to connect to ZooKeeper but the connection closes immediately.
This could be a sign that the server has too many connections (30 is the default)
Consider inspecting your ZK server logs for that error and then make sure you
are reusing HBase Configuration as often as you can. See HTable's javadoc for
more information.
It seems like a file handle issue to me. HBase uses a lot of files all at the same time. The default ulimit -n -- i.e. the user file limit -- of 1024 on most *nix systems is insufficient. Increasing the maximum number of file handles to a higher value, say 10,000 or more, might help. Please note that increasing the file handles for the user who is running the HBase process is an operating system configuration, not an HBase configuration.
If you are on Ubuntu you will need to make the following changes:
In the file /etc/security/limits.conf add the following line
hadoop - nofile 32768
Replace hadoop with whatever user is running Hadoop and HBase. If you have separate users, you will need two entries, one for each user. In the same file, set the nproc hard and soft limits. For example:
hadoop soft nproc 32000
hadoop hard nproc 32000
In the file /etc/pam.d/common-session add as the last line in the file:
session required pam_limits.so
Otherwise the changes in /etc/security/limits.conf won't be applied.
Don't forget to log out and back in again for the changes to take effect.
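To confirm the new limit is picked up, check it from a fresh session for the user in question (replace hadoop with your user):

su - hadoop -c 'ulimit -n'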
Reference : http://hbase.apache.org/book.html#basic.prerequisites
HTH
There can be many reasons.
I would say, first try this.
Verify that the /etc/hosts file is set up correctly.
Check your HBase configuration. Have you configured hbase-env.sh to let HBase manage ZooKeeper itself? Have you configured the ZooKeeper quorum in hbase-site.xml?
Try copying zoo.cfg into the Hadoop conf directory (really, anywhere on the classpath) across the entire cluster. (Source)
If that doesn't work out, go through your code and see if you are creating multiple HBaseConfiguration objects. The recommended approach is to create one single HBaseConfiguration object and then reuse it in all your code.
You can also take a look at the hbase.regionserver.handler.count
http://hbase.apache.org/configuration.html#recommended_configurations
