es_exporter service configuration - elasticsearch

While setting up the Elasticsearch Exporter's systemd service, I came up with the content below:
[Unit]
Description=Prometheus ES_exporter
After=local-fs.target network-online.target network.target
Wants=local-fs.target network-online.target network.target
[Service]
User=root
Nice=10
ExecStart=/usr/local/bin/es_exporter --es.uri=http://elastic_user:XXXXXXXXXXX@localhost:9200 --es.all --es.indices --es.timeout 20s
ExecStop=/usr/bin/killall es_exporter
[Install]
WantedBy=default.target
I didn't understand what values should be put in:
http://elastic_user:XXXXXXXXXXX@localhost:9200
Will it be something like this?
http://elastic_user(the user I start the process as):(PASSWORD)@(IP/LOCALHOST):9200
[ Additional Info : These changes are being used to Monitor ElasticSearch Cluster using Prometheus and Grafana ]

In the documentation, this is written:
the address of a remote Elasticsearch server. When basic auth is needed, specify as: <proto>://<user>:<password>@<host>:<port>. E.G., http://admin:pass@localhost:9200.
The documentation here : https://github.com/prometheus-community/elasticsearch_exporter#configuration
An example here:
https://github.com/Lyr/ansible-elasticsearch-exporter/blob/master/templates/elasticsearch_exporter.service.j2
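Putting the two together: the user and password in --es.uri are the Elasticsearch basic-auth credentials (not the OS user that starts the process), followed by the host and port of the node to scrape. A minimal sketch, with a hypothetical password s3cr3tpass and a local node, would be:

[Service]
User=root
ExecStart=/usr/local/bin/es_exporter \
  --es.uri=http://elastic_user:s3cr3tpass@localhost:9200 \
  --es.all --es.indices --es.timeout 20s

Once the exporter is running, a minimal Prometheus scrape job for it might look like this (assuming the exporter listens on its default :9114):

# prometheus.yml (sketch)
scrape_configs:
  - job_name: 'elasticsearch'
    static_configs:
      - targets: ['localhost:9114']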

Related

Installing navidrome throws "Unit navidrome.service is not loaded properly: Exec format error."

While installing navidrome I am getting this error:
hardik:/etc/systemd/system$ sudo systemctl start navidrome.service
Failed to start navidrome.service: Unit navidrome.service is not loaded properly: Exec format error.
See system logs and 'systemctl status navidrome.service' for details.
The content of navidrome.service is given below:
navidrome.service
[Unit]
Description=Navidrome Music Server and Streamer compatible with Subsonic/Airsonic
After=remote-fs.target network.target
AssertPathExists=/var/lib/navidrome
[Install]
WantedBy=multi-user.target
[Service]
User=<user>
Group=<group>
Type=simple
ExecStart=/opt/navidrome/navidrome --configfile "/var/lib/navidrome/navidrome.toml"
WorkingDirectory=/var/lib/navidrome
TimeoutStopSec=20
KillMode=process
Restart=on-failure
# See https://www.freedesktop.org/software/systemd/man/systemd.exec.html
DevicePolicy=closed
NoNewPrivileges=yes
PrivateTmp=yes
PrivateUsers=yes
ProtectControlGroups=yes
ProtectKernelModules=yes
ProtectKernelTunables=yes
RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6
RestrictNamespaces=yes
RestrictRealtime=yes
SystemCallFilter=~@clock @debug @module @mount @obsolete @reboot @setuid @swap
ReadWritePaths=/var/lib/navidrome
# You can uncomment the following line if you're not using the jukebox. This
# will prevent navidrome from accessing any real (physical) devices
#PrivateDevices=yes
# You can change the following line to `strict` instead of `full` if you don't
# want navidrome to be able to write anything on your filesystem outside of
# /var/lib/navidrome.
ProtectSystem=full
# You can uncomment the following line if you don't have any media in /home/*.
# This will prevent navidrome from ever reading/writing anything there.
#ProtectHome=true
# You can customize some Navidrome config options by setting environment variables here. Ex:
#Environment=ND_BASEURL="/navidrome"
Why am I getting the error and how do I fix it?
I had the same error when I was trying to start the service on my Raspberry Pi 3 using navidrome_0.47.5_Linux_arm64.tar.gz. When I replaced it with the files from navidrome_0.47.5_Linux_armv7.tar.gz, everything worked fine. It's likely that you are trying to run a binary built for the wrong architecture.
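You can verify this by comparing the machine's architecture with the architecture the binary was built for, for example:

uname -m                        # machine architecture (e.g. armv7l vs aarch64)
file /opt/navidrome/navidrome   # prints the architecture the binary was compiled for

If the two don't match, download the tarball for the architecture that uname reports.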
Also I believe that User and Group should contain the actual user and group that you chose here:
sudo install -d -o <user> -g <group> /opt/navidrome
sudo install -d -o <user> -g <group> /var/lib/navidrome
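For example, assuming you create a dedicated navidrome user and group (the names here are just an illustration), the install commands and the unit file should use the same names:

sudo useradd -r -s /usr/sbin/nologin navidrome   # hypothetical dedicated service user
sudo install -d -o navidrome -g navidrome /opt/navidrome
sudo install -d -o navidrome -g navidrome /var/lib/navidrome

and in navidrome.service:

User=navidrome
Group=navidrome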

Recovering from Consul "No Cluster leader" state

I have:
one mesos-master on which I configured a Consul server;
one mesos-slave on which I configured a Consul client; and
one bootstrap server for Consul.
When I hit start I am seeing the following error:
2016/04/21 19:31:31 [ERR] agent: failed to sync remote state: rpc error: No cluster leader
2016/04/21 19:31:44 [ERR] agent: coordinate update error: rpc error: No cluster leader
How do I recover from this state?
Did you look at the Consul docs?
It looks like you have performed an ungraceful stop and now need to clean your raft/peers.json file by removing all entries there to perform an outage recovery. See the above link for more details.
As of Consul 0.7 things work differently from Keyan P's answer. raft/peers.json (in the Consul data dir) has become a manual recovery mechanism. It doesn't exist unless you create it, and then when Consul starts it loads the file and deletes it from the filesystem so it won't be read on future starts. There are instructions in raft/peers.info. Note that if you delete raft/peers.info it won't read raft/peers.json but it will delete it anyway, and it will recreate raft/peers.info. The log will indicate when it's reading and deleting the file separately.
Assuming you've already tried the bootstrap or bootstrap_expect settings, that file might help. The Outage Recovery guide in Keyan P's answer is a helpful link. You create raft/peers.json in the data dir and start Consul, and the log should indicate that it's reading/deleting the file and then it should say something like "cluster leadership acquired". The file contents are:
[ { "id": "<node-id>", "address": "<node-ip>:8300", "non_voter": false } ]
where <node-id> can be found in the node-id file in the data dir.
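As a concrete sketch (assuming the /opt/consul/data data dir used later in this thread, and a single-server recovery), the steps might look like:

# stop the agent before writing the recovery file
sudo service consul stop
# this server's ID goes into the "id" field
cat /opt/consul/data/node-id
# write raft/peers.json (it is consumed and deleted on the next start)
sudo tee /opt/consul/data/raft/peers.json <<'EOF'
[ { "id": "<node-id>", "address": "<node-ip>:8300", "non_voter": false } ]
EOF
sudo service consul start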
If your raft protocol version is greater than 2, the format is:
[
  {
    "id": "e3a30829-9849-bad7-32bc-11be85a49200",
    "address": "10.88.0.59:8300",
    "non_voter": false
  },
  {
    "id": "326d7d5c-1c78-7d38-a306-e65988d5e9a3",
    "address": "10.88.0.45:8300",
    "non_voter": false
  },
  {
    "id": "a8d60750-4b33-99d7-1185-b3c6d7458d4f",
    "address": "10.233.103.119",
    "non_voter": false
  }
]
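If you're not sure which raft protocol version your servers are running, you can check before writing the file; depending on your Consul version, either of these should show it:

consul operator raft list-peers      # the RaftProtocol column shows the version per server
consul info | grep protocol_version  # raft protocol version of the local agent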
In my case I had 2 worker nodes in the k8s cluster; after adding another node, the Consul servers could elect a leader and everything was up and running.
I will update with what I did:
A little background: we scaled down the AWS Auto Scaling group, so we lost the leader, but we still had one server running without any leader.
What I did was:
1. Scale up to 3 servers (don't use 2 or 4).
2. Stop Consul on all 3 servers: sudo service consul stop (you can do status/stop/start).
3. Create the peers.json file and put it on the old server (/opt/consul/data/raft).
4. Start the 3 servers (peers.json should be placed on 1 server only).
5. For the other 2 servers, join them to the leader using consul join 10.201.8.XXX.
6. Check that the peers are connected to the leader using consul operator raft list-peers.
Sample peers.json file
[
  {
    "id": "306efa34-1c9c-acff-1226-538vvvvvv",
    "address": "10.201.n.vvv:8300",
    "non_voter": false
  },
  {
    "id": "dbeeffce-c93e-8678-de97-b7",
    "address": "10.201.X.XXX:8300",
    "non_voter": false
  },
  {
    "id": "62d77513-e016-946b-e9bf-0149",
    "address": "10.201.X.XXX:8300",
    "non_voter": false
  }
]
You can get these IDs from each server, in /opt/consul/data/:
[root@ip-10-20 data]# ls
checkpoint-signature node-id raft serf
[root@ip-10-1 data]# cat node-id
Some useful commands:
consul members
curl http://ip:8500/v1/status/peers
curl http://ip:8500/v1/status/leader
consul operator raft list-peers
cd /opt/consul/data/raft/
consul info
sudo service consul status
consul catalog services
You may also want to ensure that the bootstrap parameter is set in your Consul configuration file config.json on the first node:
# /etc/consul/config.json
{
  "bootstrap": true,
  ...
}
or start the consul agent with the -bootstrap=1 option, as described in the official "Failure of a single server cluster" Consul documentation.
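For example, a single-server bootstrap start might look roughly like this (the data, config, and bind values reuse paths and placeholders from earlier in the thread and are assumptions):

consul agent -server -bootstrap \
  -data-dir=/opt/consul/data \
  -config-dir=/etc/consul \
  -bind=10.201.8.XXX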

systemd: How to use ExecStopPre in service files

Before my daemon is stopped I need to do call another program.
My first try was to use ExecStopPre, similar to ExecStartPre, but according to https://bugs.freedesktop.org/show_bug.cgi?id=73177 this is not supported and I should use "multiple ExecStop" instead.
Does anyone have an example of this? How should I kill the daemon from ExecStop?
You can put multiple ExecStop= lines in the unit. For example (taken from a node.js service):
[Service]
ExecStartPre=/usr/local/bin/npm run build
ExecStartPre=-/bin/rm local.sock
ExecStart=/usr/local/bin/npm --parseable start
ExecStop=/usr/local/bin/npm --parseable stop
ExecStop=-/bin/rm local.sock
RestartSec=300
Restart=always
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=nodejs
User=nobody
Group=nobody
Environment=NODE_ENV=dev
Environment=PORT=3000
WorkingDirectory=/var/www/nodejs/quaff
UMask=007
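Applied back to the original question, a minimal sketch (the program names here are hypothetical) would be one ExecStop= line for the extra program and a second one that stops the daemon itself; systemd runs them in the order listed when the unit is stopped:

[Service]
ExecStart=/usr/local/bin/mydaemon
# run the extra program first, then terminate the daemon
ExecStop=/usr/local/bin/pre-stop-cleanup
ExecStop=/bin/kill -TERM $MAINPID

If the daemon exits cleanly on SIGTERM, the second ExecStop= line can even be omitted, since systemd sends SIGTERM to any remaining processes after the ExecStop= commands finish.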

neo4j 2.1.8 HA in Amazon EC2

I have the following problem:
I want to set up 3 neo4j servers in EC2. All neo4j servers are in the same VPC network.
The configuration of each neo4j server is as follows:
neo4j-Master:
conf/neo4j-server.properties
org.neo4j.server.webserver.address=110.0.0.5
org.neo4j.server.webserver.port=7474
org.neo4j.server.webserver.https.port=7484
org.neo4j.server.database.mode=HA
conf/neo4j.properties
ha.server_id=1
ha.initial_hosts=110.0.0.5:5001,110.0.1.5:5001,110.0.2.5:5001
ha.cluster_server=110.0.0.5:5001
ha.server=110.0.0.5:6001
neo4j-Slave-1:
conf/neo4j-server.properties
org.neo4j.server.webserver.address=110.0.1.5
org.neo4j.server.webserver.port=7475
org.neo4j.server.webserver.https.port=7485
org.neo4j.server.database.mode=HA
conf/neo4j.properties
ha.server_id=2
ha.initial_hosts=110.0.0.5:5001,110.0.1.5:5001,110.0.2.5:5001
ha.cluster_server=110.0.1.5:5001
ha.server=110.0.1.5:6001
neo4j-Slave-2:
conf/neo4j-server.properties
org.neo4j.server.webserver.address=110.0.2.5
org.neo4j.server.webserver.port=7476
org.neo4j.server.webserver.https.port=7486
org.neo4j.server.database.mode=HA
conf/neo4j.properties
ha.server_id=3
ha.initial_hosts=110.0.0.5:5001,110.0.1.5:5001,110.0.2.5:5001
ha.cluster_server=110.0.2.5:5001
ha.server=110.0.2.5:6001
And after I try to start the neo4j server (master), the following warning is logged:
2015-09-21 12:02:45.964+0000 INFO [Cluster] Write transactions to database disabled
Where is the problem?
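One thing worth checking in EC2 is whether the HA cluster port (5001) and transaction port (6001) are actually open between the three instances, since security group rules apply on top of the neo4j configuration; a quick reachability check from the master could be:

nc -zv 110.0.1.5 5001
nc -zv 110.0.2.5 5001
nc -zv 110.0.1.5 6001
nc -zv 110.0.2.5 6001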

Systemd can't execute script which loads kernel module

When I try to execute a script through a systemd service, I receive an error message and the script isn't run.
init_something.service file:
[Unit]
Description=Loading module --module_name module
[Service]
Type=oneshot
ExecStart=/usr/lib/systemd/init_script
[Install]
WantedBy=multi-user.target
init_script file:
#!/bin/bash -
/usr/local/bin/init.sh --module_init
And now if I try to start the service via systemctl, I receive an error message:
# systemctl start init_something.service
Job for init_something.service failed. See 'systemctl status init_something.service' and 'journalctl -xn' for details
# systemctl status init_something.service
init_something.service - Loading module --module_name module
Loaded: loaded (/usr/lib/systemd/init_something.service)
Active: failed (Result: exit-code) since Thu 1970-01-01 08:00:24 CST; 1min 49s ago
Process: 243 ExecStart=/usr/lib/systemd/init_script (code=exited, status=1/FAILURE)
Main PID: 243 (code=exited, status=1/FAILURE)
But if I run init_script manually, it works perfectly:
# /usr/lib/systemd/init_script
[ 447.409277] SYSCLK:S0[...]
[ 477.523434] VIN: (...)
Use default settings
map_size = (...)
u_code version = (...)
etc.
And finally the module is loaded successfully.
So the question is: why can't systemctl execute this script, when running it manually is no problem?
To run a script file you need a shell, but systemd does not run ExecStart= commands through a shell of its own, so you need to invoke the shell explicitly:
use ExecStart=/bin/sh /usr/lib/systemd/init_script in your service unit.
[Unit]
Description=Loading module --module_name module
[Service]
Type=oneshot
ExecStart=/bin/sh /usr/lib/systemd/init_script
[Install]
WantedBy=multi-user.target
Also make the script executable before running it:
chmod 777 /usr/lib/systemd/init_script
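Since the script's shebang is #!/bin/bash, a variant of the same unit that invokes bash explicitly (rather than /bin/sh) would be:

[Service]
Type=oneshot
ExecStart=/bin/bash /usr/lib/systemd/init_script

This only matters if the script relies on bash-specific syntax; otherwise /bin/sh works just as well.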
