Pacemaker: adding custom resource - high-availability

I am trying to create an HA cluster with Pacemaker on CentOS 7.
One of the required resources is a custom service. I have an LSB-compliant init script that I have put into /etc/init.d, and I have it listed when running:
pcs resource agents lsb:heartbeat
When I try to add the resource with
pcs resource create MyServer lsb:heartbeat:MyServer target_role=started resource_failure_stickiness=-INFINITY op monitor interval=30s op start timeout=180s op stop timeout=180s op status timeout=15 --group AllResources
The error I get:
Error: Unable to create resource 'lsb:heartbeat:MyServer', it is not installed on this system (use --force to override)
If I run it with --force, I get the following:
Call cib_replace failed (-203): Update does not conform to the configured schema
The group AllResources has two other resources, Ping and IPAddr2, which were added in a similar way with no errors.
What am I missing? Has anyone faced something like this?

It turns out that, unlike Heartbeat, Pacemaker addresses a custom LSB script simply as lsb:MyServer, with no provider in between.
The working command would be:
pcs resource create MyServer lsb:MyServer target_role=started resource_failure_stickiness=-INFINITY op monitor interval=30s op start timeout=180s op stop timeout=180s op status timeout=15 --group AllResources
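The rule behind this fix is that the standard:provider:type form applies only to OCF agents. LSB agents take no provider, and that is consistent with pcs reporting lsb:heartbeat:MyServer as not installed. Below is a minimal Python sketch of that formatting rule. agent_spec and pcs_create_args are hypothetical helpers, not part of pcs, and the sketch only builds strings.

```python
from typing import List, Optional


def agent_spec(standard: str, agent_type: str, provider: Optional[str] = None) -> str:
    """Format a resource agent spec: a provider is used only for OCF agents."""
    if standard == "ocf":
        if not provider:
            raise ValueError("OCF agents need a provider, e.g. 'heartbeat'")
        return f"ocf:{provider}:{agent_type}"
    if provider:
        # e.g. lsb:heartbeat:MyServer, the form that failed in the question
        raise ValueError(f"'{standard}' agents do not take a provider")
    return f"{standard}:{agent_type}"


def pcs_create_args(name: str, spec: str, group: str) -> List[str]:
    """Minimal 'pcs resource create' argument list (operations omitted)."""
    return ["pcs", "resource", "create", name, spec, "--group", group]


if __name__ == "__main__":
    print(agent_spec("lsb", "MyServer"))              # lsb:MyServer
    print(agent_spec("ocf", "IPaddr2", "heartbeat"))  # ocf:heartbeat:IPaddr2
    print(" ".join(pcs_create_args("MyServer", agent_spec("lsb", "MyServer"), "AllResources")))
```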

Related

Clickhouse server error - org.freedesktop.PolicyKit1

I get this error when I try to restart my ClickHouse server:
Failed to start clickhouse-server.service: The name org.freedesktop.PolicyKit1 was not provided by any .service files
See system logs and 'systemctl status clickhouse-server.service' for details.
On further inspection of the server, we noticed that the log directory was full. After flushing the logs, the ClickHouse server restarted normally. But the error message did not point to the actual problem, so what is this error referring to? Please explain.
org.freedesktop.PolicyKit1 (polkit) is roughly like sudo, but for systemd: it decides whether a non-root user may perform privileged actions such as restarting a service, so it has to be available for systemd to authorize such requests. I resolved it by switching to superuser privileges on the EC2 instance:
sudo su
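This small sketch illustrates the idea in this answer. systemd consults polkit only when a non-root caller asks for a privileged action, so running the restart as root (or through sudo) avoids the polkit lookup altogether. systemctl_restart is a hypothetical helper that only builds the command list and runs nothing.

```python
import os
from typing import List, Optional


def systemctl_restart(unit: str, euid: Optional[int] = None) -> List[str]:
    """Build a restart command, prefixed with sudo unless already running as root.

    systemd asks polkit (org.freedesktop.PolicyKit1) to authorize privileged
    requests from non-root callers; effective UID 0 is allowed directly.
    """
    if euid is None:
        euid = os.geteuid()  # POSIX-only call
    cmd = ["systemctl", "restart", unit]
    return cmd if euid == 0 else ["sudo"] + cmd


if __name__ == "__main__":
    print(systemctl_restart("clickhouse-server.service"))
```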

Can't create external initiators from chainlink CLI

We're trying to set up external initiators on our Chainlink containers deployed in a GKE cluster, following the docs: https://docs.chain.link/docs/external-initiators-in-nodes/
I log into the pod:
kubectl exec -it -n chainlink chainlink-75dd5b6bdf-b4kwr -- /bin/bash
And there I attempt to create external initiators:
root@chainlink-75dd5b6bdf-b4kwr:/home/root# chainlink initiators create xxx xxx
No help topic for 'initiators'
I don't even see initiators among the chainlink CLI commands:
root@chainlink-75dd5b6bdf-b4kwr:/home/root# chainlink
NAME:
   chainlink - CLI for Chainlink
USAGE:
   chainlink [global options] command [command options] [arguments...]
VERSION:
   0.9.10@7cd042c1a94c57219ed826a6eab46752d63fa67a
COMMANDS:
   admin            Commands for remotely taking admin related actions
   attempts, txas   Commands for managing Ethereum Transaction Attempts
   bridges          Commands for Bridges communicating with External Adapters
   config           Commands for the node's configuration
   job_specs        Commands for managing Job Specs (jobs V1)
   jobs             Commands for managing Jobs (V2)
   keys             Commands for managing various types of keys used by the Chainlink node
   node, local      Commands for admin actions that must be run locally
   runs             Commands for managing Runs
   txs              Commands for handling Ethereum transactions
   help, h          Shows a list of commands or help for one command
GLOBAL OPTIONS:
   --json, -j     json output as opposed to table
   --help, -h     show help
   --version, -v  print the version
Chainlink version 0.9.10.
Could you please clarify what I am doing wrong?
Make sure the FEATURE_EXTERNAL_INITIATORS environment variable is set to true in your .env file:
FEATURE_EXTERNAL_INITIATORS=true
This enables the initiators command in the Chainlink CLI, and you can resume the instructions from there.
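Assuming the settings live in a plain KEY=VALUE .env file, a quick script can confirm that the flag is actually set. parse_env and external_initiators_enabled are hypothetical helpers. The parser deliberately ignores quoting and `export` prefixes, which real .env loaders may handle differently. Inside the pod it is more useful to check the process environment directly.

```python
import os
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; skips blanks and # comments, no quote handling."""
    env: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def external_initiators_enabled(env: Dict[str, str]) -> bool:
    """True if FEATURE_EXTERNAL_INITIATORS is 'true' (case-insensitive in this sketch)."""
    return env.get("FEATURE_EXTERNAL_INITIATORS", "").lower() == "true"


if __name__ == "__main__":
    # Inside the pod, check the process environment rather than a file.
    print(external_initiators_enabled(dict(os.environ)))
```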

How to manually start/stop Hadoop services on boot and shutdown?

Is anyone aware of a way to stop and start CDH (Cloudera Distribution of Hadoop) services with a script? We need this for production servers: for instance, when the servers are restarted, all the Hadoop services should stop gracefully before the reboot and start again on startup.
I have an 8-node Hadoop cluster on RHEL with Cloudera 5.4.7 installed on it.
So far I have identified a few ways to do that. One, from a link I found, says I have to use chkconfig to register the service with the OS, e.g.:
sudo chkconfig hadoop-hdfs-namenode on
But when I do that, I get this error:
error reading information on service hadoop-hdfs-namenode: No such file or directory
which clearly states that it cannot find the file I specified.
Then I searched for the file, and it is located in:
/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/etc/rc.d/init.d/hadoop-hdfs-namenode
/opt/cloudera/parcels/CDH-5.4.7-1.cdh5.4.7.p0.3/etc/default/hadoop-hdfs-namenode
Then I tried executing the same commands from the folder where the files are located, but got the same error. The permissions on the files are fine, and I tried ./ as well, with the same error.
I am also able to list all the processes that are currently running with:
sudo jps
14035 -- process information unavailable
10615 -- process information unavailable
15323 -- process information unavailable
5486 -- process information unavailable
2001 -- process information unavailable
46991 -- process information unavailable
42667 -- process information unavailable
33732 Jps
2698 -- process information unavailable
2727 -- process information unavailable
7901 -- process information unavailable
42624 -- process information unavailable
As you can see, the process names are not shown, but these are Hadoop processes. I could kill all of them, but that is not a graceful way to stop Hadoop managed by Cloudera. Please let me know if you are aware of anything that can help me move forward.
Thankfully, Cloudera provides a way to start services on system startup. Here is how to do it:
Click on the service.
Go to Configuration.
Search for "Automatically Restart Process".
Select the checkbox.
It will restart the services on bootup.
You can also do this by running a curl command from a shell script. For example, to start the Solr service you can use:
curl -u admin:admin -X POST http://ipaddress:7180/api/v4/clusters//services/solr1/commands/start -H 'Content-type:application/json; charset=utf-8'
For more details, visit:
http://cloudera.github.io/cm_api/apidocs/v10/index.html
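The same API call can be made without curl. The sketch below builds a POST to the Cloudera Manager API's /clusters/{cluster}/services/{service}/commands/{command} endpoint with HTTP basic auth, but does not send it. The cluster segment is empty in the curl example, so here it is a parameter. The host, the cluster name "Cluster 1" and the credentials are placeholders, and cm_service_command is a hypothetical helper.

```python
import base64
import urllib.parse
import urllib.request


def cm_service_command(host: str, cluster: str, service: str, command: str,
                       user: str, password: str, port: int = 7180,
                       api_version: str = "v4") -> urllib.request.Request:
    """Build (but do not send) a Cloudera Manager API service command request."""
    if command not in ("start", "stop", "restart"):
        raise ValueError(f"unsupported command: {command}")
    # Cluster names may contain spaces, so percent-encode the path segments.
    url = (f"http://{host}:{port}/api/{api_version}/clusters/"
           f"{urllib.parse.quote(cluster)}/services/{urllib.parse.quote(service)}"
           f"/commands/{command}")
    req = urllib.request.Request(url, data=b"", method="POST")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Content-Type", "application/json; charset=utf-8")
    return req


if __name__ == "__main__":
    # Placeholder host, cluster name and credentials.
    req = cm_service_command("cm.example.com", "Cluster 1", "solr1", "start",
                             "admin", "admin")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually send it.
```

A shutdown script could issue `stop` for each service the same way before the reboot. Check the API reference for your Cloudera Manager version before relying on that.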

How to install Redis Sentinel as a Windows service?

I am trying to set up Redis Sentinel as a Windows service on an Azure VM (IaaS).
I am using the MS OpenTech port of Redis for Windows and running the following command...
redis-server --service-install --service-name rdsent redis.sentinel.conf --sentinel
This command installs the service on my system, but when I try to start it, either through the Services control panel or with the following command...
redis-server --service-run --service-name rdsent redis.sentinel.conf --sentinel
Then the service fails to start with the following error...
HandleServiceCommands: system error caught. error code=1063, message = StartServiceCtrlDispatcherA failed: unknown error
Am I missing something here?
Can someone please help me get this service to start and work properly?
I had the same problem, and mine was related to my Sentinel config. A number of articles I found have incorrect examples, so my service install would not work until the configuration was correct. Here is what you need at a minimum in your Sentinel config (for Redis 2.8.17 on Windows):
sentinel monitor <name of redis cache> <server IP> <port> 2
sentinel down-after-milliseconds <name of redis cache> 4000
sentinel failover-timeout <name of redis cache> 180000
sentinel parallel-syncs <name of redis cache> 1
Once you have that set up, the original Redis service command from the question will work.
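If you generate the Sentinel config from a script, a sketch like this one emits the four minimum directives listed in this answer. sentinel_conf is a hypothetical helper. The default values simply mirror the answer (quorum 2, 4000 ms, 180000 ms, 1), and the master name and address in the usage line are placeholders.

```python
def sentinel_conf(master: str, host: str, port: int, quorum: int = 2,
                  down_after_ms: int = 4000, failover_timeout_ms: int = 180000,
                  parallel_syncs: int = 1) -> str:
    """Return the minimal Sentinel directives from this answer as config text."""
    lines = [
        f"sentinel monitor {master} {host} {port} {quorum}",
        f"sentinel down-after-milliseconds {master} {down_after_ms}",
        f"sentinel failover-timeout {master} {failover_timeout_ms}",
        f"sentinel parallel-syncs {master} {parallel_syncs}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical master name and address; write this to your sentinel .conf file.
    print(sentinel_conf("mymaster", "10.0.0.4", 6379), end="")
```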
According to MSOpenTech, the following command should install Redis Sentinel as a service:
redis-server --service-install --service-name Sentinel1 sentinel.1.conf --sentinel
But when I used that command, the installed service wouldn't start: it immediately failed with error 1067, "The process terminated unexpectedly." Looking at the service entry, I suspect the problem is that the --service-name parameter isn't being filtered out and ends up as part of the service executable path.
What did work was installing the service manually with the SC command:
SC CREATE Sentinel1 binpath= "\"C:\Program Files\Redis\redis-server.exe\" --service-run sentinel.1.conf --sentinel"
Don't forget the required space after "binpath=", and that path must reflect where you've installed redis-server.exe. Also, after the service was installed, I edited the service entry so Redis Sentinel would run under the Network Service account.
I am using v3.0.501 and ran into the two issues below. While either was present, the service failed on start without writing an error to either the log file or the Event Log.
The configuration file must be the last parameter on the command line. If another parameter came last, such as --service-name, Redis would run fine when invoked from the command line but would consistently fail when started as a service.
Since the service runs as Network Service by default, ensure that this account has access to the directory where the log file will be written.
Once these two items were accounted for, Redis ran as a service smooth as silk.
Recently, I found a way to set up Windows services for Redis and Sentinel.
During my setup, I encountered a similar problem. I finally figured it out: it was caused by the configuration file path.
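The quoting in that SC CREATE line is easy to get wrong. binpath is a single quoted value, so the executable path inside it, which contains a space, needs backslash-escaped quotes, and sc.exe needs a space after "binpath=". The hypothetical Python helper below reproduces the exact string from this answer. It only builds the text, which you would then run in an elevated prompt.

```python
from typing import Sequence


def sc_create_command(service: str, exe: str, args: Sequence[str]) -> str:
    """Build an sc.exe create line in the form used in this answer."""
    # binpath is one quoted value, so the space-containing exe path inside it
    # is wrapped in backslash-escaped quotes.
    binpath = " ".join(['\\"' + exe + '\\"', *args])
    # sc.exe requires the space after "binpath=".
    return f'SC CREATE {service} binpath= "{binpath}"'


if __name__ == "__main__":
    print(sc_create_command(
        "Sentinel1",
        r"C:\Program Files\Redis\redis-server.exe",  # adjust to your install path
        ["--service-run", "sentinel.1.conf", "--sentinel"],
    ))
```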
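This answer reports that the config file must come last, so no --service-* option may trail it. Separately, the MSOpenTech service docs ask for --service-install to be the first argument. A hypothetical argument-list builder and checker along those lines is sketched below. The ordering rule reflects what the answer saw on v3.0.501 and has not been verified against other versions. The service and file names are placeholders.

```python
from typing import List, Sequence


def service_install_args(exe: str, service_name: str, config_path: str) -> List[str]:
    """Hypothetical builder: --service-install first, config file last."""
    return [exe, "--service-install", "--service-name", service_name, config_path]


def check_order(args: Sequence[str], config_path: str) -> None:
    """Raise ValueError if args break the ordering rules reported above."""
    if len(args) < 2 or args[1] != "--service-install":
        raise ValueError("--service-install must be the first argument")
    if args[-1] != config_path:
        # e.g. a trailing "--service-name X", the case this answer saw fail
        raise ValueError(f"config file must be last, found {args[-1]!r}")


if __name__ == "__main__":
    args = service_install_args("redis-server.exe", "Redis6380", "redis.6380.conf")
    check_order(args, "redis.6380.conf")
    print(" ".join(args))
```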
I have put all my configuration into my github project: https://github.com/dingyuliang/Windows-Redis-Sentinel-Config

rabbitmqctl Error: unable to connect to node rabbit@myserver nodedown

I am running RabbitMQ v3.3.5 with Erlang OTP 17.1 on Windows 2008 R2. My Dev and QA environments are stand-alone. My staging and production environments are clustered.
This problem happens often: the RabbitMQ service is running and the RabbitMQ management console sees everything, but when I run rabbitmqctl from the command line, it fails with an error saying that the node is down (I tried both locally and from a remote server).
This problem is resolved if I restart the Windows service.
I see no error message in the RabbitMQ error log. The last message indicated that the node was up.
Below is an example output of the issue that I recently experienced on node 2 of our staging windows cluster:
PS C:\Program Files (x86)\RabbitMQ Server\rabbitmq_server-3.3.5\sbin> .\rabbitmqctl.bat status
Status of node rabbit@MYSERVER2 ...
Error: unable to connect to node rabbit@MYSERVER2: nodedown
DIAGNOSTICS
===========
attempted to contact: [rabbit@MYSERVER2]
rabbit@MYSERVER2:
  * connected to epmd (port 4369) on MYSERVER2
  * epmd reports: node 'rabbit' not running at all
                  no other nodes on MYSERVER2
  * suggestion: start the node
current node details:
- node name: rabbitmqctl2199771@MYSERVER2
- home dir: C:\Users\RabbitMQ
- cookie hash: mn6OaTX9mS4DnZaiOzg8pA==
At this point I restart the RabbitMQ service and try again:
PS C:\Program Files (x86)\RabbitMQ Server\rabbitmq_server-3.3.5\sbin> .\rabbitmqctl.bat status
Status of node rabbit@MYSERVER2 ...
[{pid,3784},
{running_applications,
[{rabbitmq_management_agent,"RabbitMQ Management Agent","3.3.5"},
{rabbit,"RabbitMQ","3.3.5"},
{os_mon,"CPO CXC 138 46","2.2.15"},
{mnesia,"MNESIA CXC 138 12","4.12.1"},
{xmerl,"XML parser","1.3.7"},
{sasl,"SASL CXC 138 11","2.4"},
{stdlib,"ERTS CXC 138 10","2.1"},
{kernel,"ERTS CXC 138 10","3.0.1"}]},
{os,{win32,nt}},
{erlang_version,
"Erlang/OTP 17 [erts-6.1] [64-bit] [smp:4:4] [async-threads:30]\n"},
{memory,
[{total,35960208},
{connection_procs,2704},
{queue_procs,5408},
{plugins,111936},
{other_proc,13695792},
{mnesia,102296},
{mgmt_db,0},
{msg_index,21816},
{other_ets,884704},
{binary,25776},
{code,16672826},
{atom,602729},
{other_system,3834221}]},
{alarms,[]},
{listeners,[{clustering,25672,"::"},{amqp,5672,"::"},{amqp,5672,"0.0.0.0"}]},
{vm_memory_high_watermark,0.4},
{vm_memory_limit,3435787059},
{disk_free_limit,50000000},
{disk_free,74911649792},
{file_descriptors,
[{total_limit,8092},
{total_used,4},
{sockets_limit,7280},
{sockets_used,2}]},
{processes,[{limit,1048576},{used,139}]},
{run_queue,0},
{uptime,5}]
...done.
Any idea as to what causes this and how to automatically detect the situation?
Is this specifically a problem with running RabbitMQ on Windows?
Hostnames are case-insensitive when you are trying to resolve them. For example, LOCALHOST and localhost are the same host.
However, when Erlang constructs the name of a node (e.g. rabbit@<hostname> in the case of RabbitMQ), this name is case-sensitive. So rabbit@LOCALHOST and rabbit@localhost are two different node names, even if they run on the same host.
Recently, we (the RabbitMQ team) found out that, on Windows, the node name constructed for RabbitMQ was inconsistent. Therefore, RabbitMQ started as a Windows service could sometimes be named rabbit@MYHOST while rabbitmqctl tried to reach rabbit@myhost and failed.
Since RabbitMQ 3.6.0, the node name should be consistent.
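One way to detect this situation automatically is to compare two node names: the one the service registered, which typically appears in its log file name such as rabbit@HOST.log, and the one rabbitmqctl targets. A mismatch that differs only in hostname case points to this problem. split_node and case_only_mismatch are hypothetical helpers, and how you obtain the two names is an assumption left to your environment.

```python
from typing import Tuple


def split_node(node: str) -> Tuple[str, str]:
    """Split an Erlang node name 'name@host' into (name, host)."""
    name, sep, host = node.partition("@")
    if not sep:
        raise ValueError(f"not an Erlang node name: {node!r}")
    return name, host


def case_only_mismatch(registered: str, targeted: str) -> bool:
    """True if two node names differ only in hostname case.

    Hostnames compare case-insensitively but node names do not, so this is
    the 'service is rabbit@MYHOST, rabbitmqctl asks for rabbit@myhost' case.
    """
    if registered == targeted:
        return False
    r_name, r_host = split_node(registered)
    t_name, t_host = split_node(targeted)
    return r_name == t_name and r_host.lower() == t_host.lower()


if __name__ == "__main__":
    print(case_only_mismatch("rabbit@MYSERVER2", "rabbit@myserver2"))  # True
```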
To anyone else getting this error, this was my fix: I had installed Erlang but overlooked the instructions on setting up the environment variable.
I was reading the manual install page:
https://www.rabbitmq.com/install-windows-manual.html
and found the following:
Set ERLANG_HOME to where you actually put your Erlang installation,
e.g. C:\Program Files\erlx.x.x (full path). The RabbitMQ batch files
expect to execute %ERLANG_HOME%\bin\erl.exe.
Go to Start > Settings > Control Panel > System > Advanced >
Environment Variables. Create the system environment variable
ERLANG_HOME and set it to the full path of the directory which
contains bin\erl.exe.
For some reason, the automatic install assigned the wrong path to the ERLANG_HOME variable (see image below). I simply added \bin to the end.
I had a similar problem on my Linux box and am posting the answer here, because RabbitMQ on Windows may handle things similarly.
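The manual's rule is that ERLANG_HOME must be the directory containing bin\erl.exe, and a script can check it. This hypothetical sketch uses ntpath so that Windows-style paths are joined the same way on any OS. The example path is a placeholder. Whatever the installer wrote, what matters is that %ERLANG_HOME%\bin\erl.exe exists.

```python
import ntpath
import os
from typing import Callable, Optional


def expected_erl(erlang_home: str) -> str:
    """Where the RabbitMQ batch files expect erl.exe: %ERLANG_HOME%\\bin\\erl.exe."""
    return ntpath.join(erlang_home, "bin", "erl.exe")


def erlang_home_ok(erlang_home: Optional[str],
                   exists: Callable[[str], bool] = os.path.isfile) -> bool:
    """True if ERLANG_HOME is set and bin\\erl.exe exists beneath it."""
    return bool(erlang_home) and exists(expected_erl(erlang_home))


if __name__ == "__main__":
    home = os.environ.get("ERLANG_HOME")
    print(home, erlang_home_ok(home))
```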
My post and solution: rabbtimqadmin - Could not connect: [Errno -2] Name or service not known
The core issue was changing the server name after RabbitMQ was configured. When installed, RabbitMQ references the server's name, making it part of its configuration. I can see this being a similar issue on Windows.
In short, you can change the server's name back to what it was when you first installed RabbitMQ, or you can add a rabbitmq-env.conf file. I'm not sure where it would go on Windows, but the following gives details for Linux: https://www.rabbitmq.com/man/rabbitmq-env.conf.5.man.html
Note that on Linux the server name was CaSe SeNsItIvE! So you may or may not have a similar issue on Windows.
Hope this helps and good luck!
If you are using Linux, try changing the permissions of the /var/lib/rabbitmq/mnesia folder.
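Before changing permissions, you can confirm whether the account running RabbitMQ can actually use the directory. dir_accessible is a hypothetical helper. os.access checks the real user ID of the process, so run the script as the service account, for example with sudo -u rabbitmq. That command assumes the account is named rabbitmq, as it usually is with the Linux packages.

```python
import os
import tempfile


def dir_accessible(path: str) -> bool:
    """True if `path` is a directory this process can list, write to and enter.

    os.access uses the real (not effective) user ID, so run the check as the
    account that runs RabbitMQ.
    """
    return os.path.isdir(path) and os.access(path, os.R_OK | os.W_OK | os.X_OK)


if __name__ == "__main__":
    print(dir_accessible("/var/lib/rabbitmq/mnesia"))
    print(dir_accessible(tempfile.gettempdir()))  # sanity check, normally True
```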
