RabbitMQ as Windows service: badarith error on rpc.erl - windows

I am experiencing some problems with RabbitMQ started as a service on Windows.
Operating system: Windows 8 (Microsoft Windows NT version 6.2 Server) (build 9200)
Erlang: R16B03 (erts-5.10.4)
RabbitMQ: 3.2.2
Goal: create a RabbitMQ cluster with three servers: Srv1, Srv2, Srv3.
Note: I have carefully followed the official documentation
All the following operations are executed as user "Administrator".
FIRST SCENARIO: start RabbitMQ from the command line as a background process
I used the command "rabbitmq-server -detached" on Srv1.
Result: a file ".erlang.cookie" is created under C:\Users\Administrator
Running "rabbitmqctl status" succeeds and shows the current state of the node.
I can then copy the .erlang.cookie file into the same folder on Srv2 and Srv3 and successfully create a cluster.
SECOND SCENARIO: start RabbitMQ as a service (this is a requirement I have)
Result: the file ".erlang.cookie" is created under C:\Windows.
When I run "rabbitmqctl status", another .erlang.cookie file is created under C:\Users\Administrator and I get the following result:
C:\Program Files\Aspect\DashBoard\RabbitMQ\sbin>rabbitmqctl.bat status
Status of node 'rabbit@RABBITMQ-NODE4' ...
Error: unable to connect to node 'rabbit@RABBITMQ-NODE4': nodedown
DIAGNOSTICS
===========
nodes in question: ['rabbit@RABBITMQ-NODE4']
hosts, their running nodes and ports:
- RABBITMQ-NODE4: [{rabbit,49428},{rabbitmqctl3045334,49434}]
current node details:
- node name: 'rabbitmqctl3045334@rabbitmq-node4'
- home dir: C:\Users\Administrator
- cookie hash: 0DLAKf8pOVrGC016+6BDBw==
This is expected, because the two cookies are different.
So I copy the .erlang.cookie file from C:\Windows into C:\Users\Administrator and run the same command again. This time I get:
C:\Program Files\Aspect\DashBoard\RabbitMQ\sbin>rabbitmqctl.bat status
Status of node 'rabbit@RABBITMQ-NODE4' ...
Error: unable to connect to node 'rabbit@RABBITMQ-NODE4': nodedown
DIAGNOSTICS
===========
nodes in question: ['rabbit@RABBITMQ-NODE4']
hosts, their running nodes and ports:
- RABBITMQ-NODE4: [{rabbitmqctl1178095,49471}]
current node details:
- node name: 'rabbitmqctl1178095@rabbitmq-node4'
- home dir: C:\Users\Administrator
- cookie hash: TIuqp21HOQSoUJT8JfgRQw==
C:\Program Files\Aspect\DashBoard\RabbitMQ\sbin>rabbitmqctl.bat status
Status of node 'rabbit@RABBITMQ-NODE4' ...
Error: {badarith,[{rabbit_vm,bytes,1,[]},
{rabbit_vm,'-mnesia_memory/0-lc$^0/1-0-',1,[]},
{rabbit_vm,mnesia_memory,0,[]},
{rabbit_vm,memory,0,[]},
{rabbit,status,0,[]},
{rpc,'-handle_call_call/6-fun-0-',5,
[{file,"rpc.erl"},{line,205}]}]}
Note the error at the end: "badarith" in rpc.erl, line 205.
I think that the file is Erlang\lib\kernel-2.16.4\src\rpc.erl
The function is this one:
handle_call_call(Mod, Fun, Args, Gleader, To, S) ->
    RpcServer = self(),
    %% Spawn not to block the rpc server.
    {Caller,_} =
        erlang:spawn_monitor(
          fun () ->
                  set_group_leader(Gleader),
                  Reply =
                      %% in case some sucker rex'es
                      %% something that throws
                      case catch apply(Mod, Fun, Args) of
                          {'EXIT', _} = Exit ->
                              {badrpc, Exit};
                          Result ->
                              Result
                      end,
                  RpcServer ! {self(), {reply, Reply}}
          end),
    {noreply, gb_trees:insert(Caller, To, S)}.
Line 205 is 'case catch apply(Mod, Fun, Args) of'.
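Read from the top, the Erlang stack trace above places the failure in rabbit_vm:bytes/1 (called from rabbit_vm:mnesia_memory/0 during rabbit:status/0); rpc.erl line 205 is only where rpc applied that function on the broker, caught the exception and sent it back as a {badrpc, {'EXIT', ...}} reply. The Python sketch below mimics that catch-and-reply pattern with a thread and a queue, to show why an error raised inside the called function is reported next to rpc.erl. It is an illustration only: words_to_bytes and its fixed word size of 8 are hypothetical stand-ins for rabbit_vm:bytes/1, and a Python TypeError plays the role of Erlang's badarith.

```python
import queue
import threading


def handle_call(fn, args, reply_to):
    """Run fn(*args) in a separate thread, like rpc spawning a worker process."""
    def worker():
        try:
            reply = fn(*args)
        except Exception as exc:  # like 'catch apply(...)': the error becomes a value
            reply = ("badrpc", ("EXIT", exc))
        reply_to.put(reply)  # like 'RpcServer ! {self(), {reply, Reply}}'

    thread = threading.Thread(target=worker)
    thread.start()
    return thread


def call(fn, *args):
    """Caller side: block until the single reply arrives."""
    reply_to = queue.Queue()
    handle_call(fn, args, reply_to)
    return reply_to.get()


def words_to_bytes(words):
    # Hypothetical stand-in for rabbit_vm:bytes/1; fails when words is not a number.
    return words * 8
```

For example, call(words_to_bytes, 10) returns 80, while call(words_to_bytes, None) returns a ("badrpc", ("EXIT", TypeError(...))) tuple: the error originates in the called function, and the rpc layer merely transports it.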
THIRD SCENARIO: run the RabbitMQ service as a named user, so that it does not create the file .erlang.cookie under C:\Windows
I set the RabbitMQ service to log on as the user "Administrator"; this way it does not create the file under C:\Windows but only under C:\Users\Administrator.
Result: when the service starts, the file ".erlang.cookie" is created only under C:\Users\Administrator.
When I run "rabbitmqctl status" I get the same error as in the previous case (badarith...).
Now the question: I have not found any information about this error (badarith).
Could anyone give me a suggestion about how to troubleshoot/avoid this?
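One way to confirm which cookie each side is using is to compare the "cookie hash" line in the rabbitmqctl diagnostics with a hash of each .erlang.cookie file. The sketch below assumes that RabbitMQ 3.x computes that value as the Base64 encoding of the MD5 digest of the cookie (consistent with the 24-character, "=="-terminated values shown above) and that surrounding whitespace in the file is not part of the cookie; both are assumptions to verify against your version. The function names are hypothetical.

```python
import base64
import hashlib


def cookie_hash(cookie: str) -> str:
    """Base64(MD5(cookie)), assumed to match the diagnostics' "cookie hash"."""
    # Assumption: surrounding whitespace (e.g. a trailing newline) is ignored.
    digest = hashlib.md5(cookie.strip().encode("latin-1")).digest()
    return base64.b64encode(digest).decode("ascii")


def cookie_file_hash(path: str) -> str:
    """Hash the contents of an .erlang.cookie file (hypothetical helper)."""
    with open(path, encoding="latin-1") as fh:
        return cookie_hash(fh.read())
```

For example, compute cookie_file_hash(r"C:\Windows\.erlang.cookie") and cookie_file_hash(r"C:\Users\Administrator\.erlang.cookie") and compare both with the hash rabbitmqctl prints under "current node details"; the file whose hash matches is the one the CLI read.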

Related

ClickHouse does not start on Red Hat 7.8 with error "DNS error: EAI: Address family for hostname not supported"

I installed ClickHouse 21.2.4.6 (from the tgz file) on Red Hat 7.8. When I run the command
"systemctl start clickhouse-server"
the ClickHouse server does not start, and the error log contains several messages:
Application: DB::Exception: Listen [::]:8123 failed: Poco::Exception. Code: 1000, e.code() = 0, e.displayText() = DNS error: EAI: Address family for hostname not supported (version 21.2.4.6 (official build)).
The <listen_host>::1</listen_host> tag is commented out in config.xml, and the server IP is configured as <listen_host>ip_server</listen_host>.
Can you give me some information to solve this problem?
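The failing address in the log is [::], the IPv6 wildcard, and "Address family for hostname not supported" usually points to IPv6 not being usable on the host. A quick way to check that from the same machine is to try binding an IPv6 socket; the sketch below does this with Python's standard socket module. The function name is hypothetical, and a True result only shows that the kernel accepts an IPv6 bind, not that ClickHouse's own address resolution will succeed.

```python
import socket


def ipv6_bind_works(address: str = "::") -> bool:
    """Return True if an IPv6 TCP socket can be bound to `address` on an ephemeral port."""
    if not socket.has_ipv6:
        return False  # this Python build has no IPv6 support at all
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as sock:
            sock.bind((address, 0))  # port 0: let the OS choose a free port
    except OSError:
        return False  # e.g. IPv6 disabled in the kernel, or address not available
    return True
```

Running print(ipv6_bind_works("::"), ipv6_bind_works("::1")) on the server shows whether the wildcard and loopback IPv6 addresses can be bound at all.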
Please find your clickhouse-server.service file in the systemd directories and check exactly how the clickhouse-server binary is run; in particular, check the --config parameter.
Usually you just need to edit /etc/clickhouse-server/config.xml
and replace <listen_host>::1</listen_host> with <listen_host>127.0.0.1</listen_host>.
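If you prefer to script that edit, the sketch below rewrites <listen_host> values in a config.xml string using Python's xml.etree.ElementTree (Python 3.8 or later, needed to keep comments). The mapping is an assumption: "::1" to "127.0.0.1" follows the answer, while "::" to "0.0.0.0" (all IPv4 interfaces) is an added guess for the [::] address in the log. Commented-out tags are left untouched. ElementTree does not reproduce the XML declaration and may normalize some formatting, so review the output before replacing the real file.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from IPv6 listen addresses to IPv4 equivalents.
DEFAULT_MAPPING = {"::1": "127.0.0.1", "::": "0.0.0.0"}


def rewrite_listen_hosts(config_xml: str, mapping=None) -> str:
    """Replace matching <listen_host> values and return the serialized XML."""
    mapping = DEFAULT_MAPPING if mapping is None else mapping
    # insert_comments=True keeps <!-- ... --> comments in the tree (Python 3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(config_xml, parser=parser)
    for element in root.iter("listen_host"):
        value = (element.text or "").strip()
        if value in mapping:
            element.text = mapping[value]
    return ET.tostring(root, encoding="unicode")
```

Usage: read /etc/clickhouse-server/config.xml (or whichever file --config points to), pass its text to rewrite_listen_hosts, and diff the result against the original before writing it back.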

Rundeck: unable to copy script to Windows node - Host not found

Configuration: https://1drv.ms/t/s!AizscpxS0QM4hJo9MJWA6CKzd1BOwQ (Kerberos authentication, domain user)
I can run commands against the Windows node, OpenSSH is installed, and a manual scp copy from Linux to Windows works:
scp somefile rundeck@test.com@WIN-II425CK1GMO.test.com:/C:
Public key authentication works without issues, but when I try to run a PowerShell script in a job I get:
TEST.COM@192.168.0.13
Script Failed dispatching to node DC: [jsch-scp] Failed copying the file: TEST.COM@192.168.0.13
Execution failed: 55 in project windows: [Workflow result: , step failures: {1=Dispatch failed on 1 nodes: [DC: HostNotFound: [jsch-scp] Failed copying the file: TEST.COM@192.168.0.13]}, Node failures: {DC=[HostNotFound: [jsch-scp] Failed copying the file: TEST.COM@192.168.0.13]}, status: failed]
My bet is that it happens because of the two @ characters, but I don't know how to work around it.
Instead of WinRM, I specified SSH authentication in resources.xml:
<node name="dc" description="My windows" tags="node2" hostname="192.168.0.13" osArch="x86_64" osFamily="Windows" osName="Windows Server 2016" username="rundeck" ssh-key-storage-path="keys/Linuxtopic/server.1key" />
and removed the domain part from the username (@test.com), so that jsch-scp is no longer confused by the double @.
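The HostNotFound message is consistent with a user@host target being split at the first @: with a username such as rundeck@TEST.COM (an assumption based on the error text), the "host" part becomes TEST.COM@192.168.0.13, which is the string reported in the error. The sketch below contrasts splitting at the first and at the last @. It illustrates the hypothesis only and is not jsch's actual parsing code; removing the domain part, as in the answer, avoids the ambiguity either way. The function names are hypothetical.

```python
def split_at_first(target):
    """Split 'user@host' at the FIRST '@' (the behaviour hypothesised above)."""
    user, _, host = target.partition("@")
    return user, host


def split_at_last(target):
    """Split 'user@host' at the LAST '@', which keeps a domain-qualified user intact."""
    user, _, host = target.rpartition("@")
    return user, host
```

For "rundeck@TEST.COM@192.168.0.13", split_at_first yields the host "TEST.COM@192.168.0.13" (not resolvable), while split_at_last yields the host "192.168.0.13".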

Service Fabric MultiNode X509 Cluster - Timed out waiting for Installer Service to complete

In order to create an Azure SF test environment, I created three Azure VMs within a DevTest Lab. The cluster is to be secured with X.509 certificates.
I used the information Here & Here
The machines are:
Windows 2016 Data Centre
On the same virtual network
All firewalls are disabled (Can ping each machine from the other)
All using the same administrator account
I have created self-signed certificates using the certsetup.ps1 file provided by the documentation. One certificate for Server & Cluster combined as suggested.
If I run the TestConfiguration.ps1, I am given the following output.
LocalAdminPrivilege : True
IsJsonValid : True
IsCabValid :
RequiredPortsOpen : True
RemoteRegistryAvailable : True
FirewallAvailable : True
RpcCheckPassed : True
NoConflictingInstallations : True
FabricInstallable : True
DataDrivesAvailable : True
Passed : True
The IsCabValid field is blank, but the "Passed" field still suggests installation is possible, so I run the next PowerShell command to begin installation:
.\CreateServiceFabricCluster.ps1 -ClusterConfigFilePath .\ClusterConfig.X509.MultiMachine.json
After this command, the process starts and the console shows the following text, which suggests that inter-node communication is fine:
Creating Service Fabric Cluster...
If it's taking too long, please check in Task Manager details and see if Fabric.exe for each node is running. If not, please look at: 1. traces in DeploymentTraces directory and 2. traces in FabricLogRoot configured in ClusterConfig.json.
Trace folder already exists. Traces will be written to existing trace folder: C:\StandaloneCluster\DeploymentTraces
Running Best Practices Analyzer...
Best Practices Analyzer completed successfully.
Creating Service Fabric Cluster...
Processing and validating cluster config.
Configuring nodes.
Default installation directory chosen based on system drive of machine '10.0.0.4'.
Copying installer to all machines.
Configuring machine '10.0.0.4'.
Configuring machine '10.0.0.5'.
Configuring machine '10.0.0.6'.
Machine 10.0.0.6 configured.
Machine 10.0.0.5 configured.
Machine 10.0.0.4 configured.
Running Fabric service installation.
Successfully started FabricInstallerSvc on machine 10.0.0.4
Successfully started FabricInstallerSvc on machine 10.0.0.6
Successfully started FabricInstallerSvc on machine 10.0.0.5
A long pause of a few minutes follows, after which the timeout error is displayed with no real indication as to why. I have searched the Windows event logs on the nodes but have not been able to find any further information. The error displayed in the PowerShell console is as follows:
Timed out waiting for Installer Service to complete for machine 10.0.0.4. Investigation order: FabricInstallerService -> FabricSetup -> FabricDeployer -> Fabric
Timed out waiting for Installer Service to complete for machine 10.0.0.6. Investigation order: FabricInstallerService -> FabricSetup -> FabricDeployer -> Fabric
Timed out waiting for Installer Service to complete for machine 10.0.0.5. Investigation order: FabricInstallerService -> FabricSetup -> FabricDeployer -> Fabric
CreateCluster Error: System.AggregateException: One or more errors occurred. ---> System.ServiceProcess.TimeoutException: Timed out waiting for Installer Service to complete for machine 10.0.0.5. Investigation order: FabricInstallerService -> FabricSetup -> FabricDeployer -> Fabric
at Microsoft.ServiceFabric.DeploymentManager.DeploymentManagerInternal.StartAndValidateInstallerServiceCompletion(String machineName, ServiceController installerSvc)
at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object )
--- End of inner exception stack trace ---
at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
at System.Threading.Tasks.Parallel.ForWorker[TLocal](Int32 fromInclusive, Int32 toExclusive, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Func`4 bodyWithLocal, Func`1 localInit, Action`1 localFinally)
at System.Threading.Tasks.Parallel.ForEachWorker[TSource,TLocal](IEnumerable`1 source, ParallelOptions parallelOptions, Action`1 body, Action`2 bodyWithState, Action`3 bodyWithStateAndIndex, Func`4 bodyWithStateAndLocal, Func`5 bodyWithEverything, Func`1 localInit, Action`1 localFinally)
at System.Threading.Tasks.Parallel.ForEach[TSource](IEnumerable`1 source, Action`1 body)
at Microsoft.ServiceFabric.DeploymentManager.DeploymentManagerInternal.RunFabricServices(List`1 machines, FabricPackageType fabricPackageType)
at Microsoft.ServiceFabric.DeploymentManager.DeploymentManagerInternal.<CreateClusterAsyncInternal>d__7.MoveNext()
---> (Inner Exception #0) System.ServiceProcess.TimeoutException: Timed out waiting for Installer Service to complete for machine 10.0.0.5. Investigation order: FabricInstallerService -> FabricSetup -> FabricDeployer -> Fabric
at Microsoft.ServiceFabric.DeploymentManager.DeploymentManagerInternal.StartAndValidateInstallerServiceCompletion(String machineName, ServiceController installerSvc)
at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object )<---
---> (Inner Exception #1) System.ServiceProcess.TimeoutException: Timed out waiting for Installer Service to complete for machine 10.0.0.6. Investigation order: FabricInstallerService -> FabricSetup -> FabricDeployer -> Fabric
at Microsoft.ServiceFabric.DeploymentManager.DeploymentManagerInternal.StartAndValidateInstallerServiceCompletion(String machineName, ServiceController installerSvc)
at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object )<---
---> (Inner Exception #2) System.ServiceProcess.TimeoutException: Timed out waiting for Installer Service to complete for machine 10.0.0.4. Investigation order: FabricInstallerService -> FabricSetup -> FabricDeployer -> Fabric
at Microsoft.ServiceFabric.DeploymentManager.DeploymentManagerInternal.StartAndValidateInstallerServiceCompletion(String machineName, ServiceController installerSvc)
at System.Threading.Tasks.Parallel.<>c__DisplayClass17_0`1.<ForWorker>b__1()
at System.Threading.Tasks.Task.InnerInvokeWithArg(Task childTask)
at System.Threading.Tasks.Task.<>c__DisplayClass176_0.<ExecuteSelfReplicating>b__0(Object )<---
Trace folder already exists. Traces will be written to existing trace folder: C:\StandaloneCluster\DeploymentTraces
Cleaning up faulted installation.
Removing configuration from machine 10.0.0.5
Removing configuration from machine 10.0.0.4
Removing configuration from machine 10.0.0.6
Is there an Azure SF aficionado out there who can shed some light on the matter, or offer any suggestions as to where I am going wrong?
This is a generic failure pattern seen when FabricHost is failing to come up, which could happen for a number of reasons.
Since you are using raw Azure VMs instead of the SF VMSS deployment, you will also have to make sure the upstream ports set under the NodeType in the cluster configuration are open on each machine. To test that this is set up correctly, try deploying an unsecured cluster across these VMs first.
If the above works, investigate by running the deployment with the -NoCleanupOnFailure flag and checking, on one of the failing machines, the event logs under "Applications and Services Logs > Microsoft-Service Fabric > Admin".
Error/Warning logs should indicate whether there is an issue reading the cert or any other blocking issue. Check that the cert is ACLed to NETWORK SERVICE on each machine, as that is one of the requirements listed in the docs.
Another common failure happens when the cert thumbprint contains invalid characters. A bug in the Windows certificate management tool causes the displayed thumbprint to contain hidden invalid characters which, when copied straight into the config, lead to deployment issues. Use a hex editor (such as HxD) to validate that the thumbprint in the config contains only valid characters.
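As an alternative to a hex editor, a short script can list every character in a thumbprint that is not a hexadecimal digit, together with its Unicode code point, so invisible characters (for example a left-to-right mark, U+200E) become visible. The sketch assumes a SHA-1 thumbprint, which is 40 hex digits; the function names are hypothetical.

```python
import string

HEX_DIGITS = set(string.hexdigits)  # 0-9, a-f, A-F
SHA1_HEX_LENGTH = 40


def invalid_thumbprint_chars(thumbprint):
    """Return (index, code point) for every character that is not a hex digit."""
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(thumbprint) if ch not in HEX_DIGITS]


def clean_thumbprint(thumbprint):
    """Drop non-hex characters; raise if the result is not exactly 40 hex digits."""
    cleaned = "".join(ch for ch in thumbprint if ch in HEX_DIGITS)
    if len(cleaned) != SHA1_HEX_LENGTH:
        raise ValueError(f"expected {SHA1_HEX_LENGTH} hex digits, got {len(cleaned)}")
    return cleaned
```

Paste the thumbprint value from the cluster config into invalid_thumbprint_chars; an empty list means no hidden characters were found. If the list is not empty, compare the cleaned value with the certificate in the store before putting it back into the config.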
If this doesn't provide enough information for you to figure out the issue, run the Log Collector tool from Tools\Microsoft.Azure.ServiceFabric.WindowsServer.SupportPackage.zip, included in the Standalone package, and upload the collected logs to a storage location of your choice to share with our team. You can mail the link to sfsa@microsoft.com and we can help you look into this.
For cluster, server and reverseProxy certs, 1) the privilege to load their private key needs to be ACLed to ‘Network Service’, and 2) their CA certs need to be added to TrustedRoot.
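For the point about NodeType ports being open on each VM, a plain TCP connect test from one machine to the others can rule out a blocking firewall. The sketch below reports, per port, whether a connection could be opened. A connect only succeeds if something is already listening on that port, so it is most informative while Fabric processes are running (for example after a run with -NoCleanupOnFailure). The function name is hypothetical, and the port numbers in the usage line are example values; take the real ones from the NodeType section of ClusterConfig.X509.MultiMachine.json.

```python
import socket


def check_ports(host, ports, timeout=2.0):
    """Return {port: True/False} depending on whether a TCP connect succeeded."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:  # refused, unreachable or timed out (socket.timeout is an OSError)
            results[port] = False
    return results
```

Usage: for host in ("10.0.0.4", "10.0.0.5", "10.0.0.6"): print(host, check_ports(host, [19000, 19080])), with the port list replaced by your configured values.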

RabbitMQ (OSX) : ERROR: epmd error for host x1-6-20-0c-c8-19-6b-bd: timeout (timed out)

I'm working on OSX 10.10.5 and installed RabbitMQ using the tarball.
I run it via the script:
bash sbin/rabbitmq-server
The first time it ran fine, but after a restart it gives this error:
ERROR: epmd error for host x1-6-20-0c-c8-19-6b-bd: timeout (timed out)
sbin/rabbitmqctl status returns this:
Status of node 'rabbit@x1-6-20-0c-c8-19-6b-bd' ...
Error: unable to connect to node 'rabbit@x1-6-20-0c-c8-19-6b-bd': nodedown
DIAGNOSTICS
===========
attempted to contact: ['rabbit@x1-6-20-0c-c8-19-6b-bd']
rabbit@x1-6-20-0c-c8-19-6b-bd:
* unable to connect to epmd (port 4369) on x1-6-20-0c-c8-19-6b-bd: timeout (timed out)
current node details:
- node name: 'rabbitmq-cli-25@x1-6-20-0c-c8-19-6b-bd'
- home dir: /Users/mohit
- cookie hash: FOxL2w3eJGpNkenIS5ebSw==
Please help me resolve this, thanks!
Update: Interestingly, it works when I switch from the office network back to my personal network. Possibly something to do with a port or network firewall?
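The diagnostics say the CLI could not reach epmd on port 4369 of x1-6-20-0c-c8-19-6b-bd, and the update suggests that the office network resolves or routes that host name differently. The sketch below resolves the name and sends epmd a NAMES request (a 2-byte big-endian length of 1 followed by the byte 110, as described in the Erlang distribution protocol documentation), then parses the "name <node> at port <port>" lines of the reply. It assumes epmd closes the connection after replying; the function names are hypothetical.

```python
import socket
import struct

EPMD_PORT = 4369
NAMES_REQ = 110  # ord('n')


def parse_names_reply(data):
    """Split an epmd NAMES reply into (epmd_port, [(node_name, port), ...])."""
    (epmd_port,) = struct.unpack(">I", data[:4])  # 4-byte big-endian epmd port
    nodes = []
    for line in data[4:].decode("utf-8", "replace").splitlines():
        parts = line.split()
        # Each entry is expected to look like: "name rabbit at port 25672"
        if len(parts) == 5 and parts[0] == "name" and parts[2:4] == ["at", "port"]:
            nodes.append((parts[1], int(parts[4])))
    return epmd_port, nodes


def query_epmd(host, port=EPMD_PORT, timeout=5.0):
    """Resolve host, send NAMES_REQ to epmd and return (address, parsed reply)."""
    address = socket.gethostbyname(host)  # shows which IP the node's host name resolves to
    with socket.create_connection((address, port), timeout=timeout) as sock:
        sock.sendall(struct.pack(">HB", 1, NAMES_REQ))  # length 1, then the request byte
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # epmd closes the connection after the reply
                break
            chunks.append(chunk)
    return address, parse_names_reply(b"".join(chunks))
```

Running print(query_epmd("x1-6-20-0c-c8-19-6b-bd")) on both networks shows which address the host name resolves to; a socket.timeout on the office network would match the "timeout (timed out)" in the diagnostics, which the NODENAME change below avoids by using localhost.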
Add a configuration file:
/etc/rabbitmq/rabbitmq-env.conf
Add a line as below:
NODENAME=rabbit@localhost

Websphere Scripting - Error while SyncNode

Below is the Jacl script I use to sync the nodes in WAS 7.
#Sync Node Changes
puts "Begin SyncNode.."
set Sync1 [$AdminControl completeObjectName type=NodeSync,process=nodeagent,node=Profile01Node600,*]
set Sync2 [$AdminControl completeObjectName type=NodeSync,process=nodeagent,node=Profile02Node601,*]
$AdminControl invoke $Sync1 sync
$AdminControl invoke $Sync2 sync
puts "SyncNode Complete"
The environment is clustered. After deploying the EAR file, I invoke this Jacl script to sync the changes to the nodes.
The error I get when running the script:
WASX7209I: Connected to process "dmgr" on node wAMLDmgrNode using SOAP connector; The type of process is: DeploymentManager
Begin SyncNode..
WASX7017E: Exception received while running file "xxx/xxx/xxx.jacl"; exception information: com.ibm.ws.scripting.ScriptingException: WASX7025E: Error found in String ""; cannot create ObjectName.
What is the reason for Sync1 being ''?
Do we need to use process=nodeagent in the command?
What will be the result if the nodeagent is started and stopped?
The error message:
WASX7017E: Exception received while running file "xxx/xxx/xxx.jacl"; exception information: com.ibm.ws.scripting.ScriptingException: WASX7025E: Error found in String ""; cannot create ObjectName.
... indicates that the ObjectName was not found. Most likely:
your nodeagent is down, or
there is a typo in a node name (Profile01Node600 & Profile01Node601)
You can check which nodeagents (their NodeSync MBeans) are available by running this command:
$AdminControl queryNames WebSphere:*,type=NodeSync
Based on the output you can fix the typo.
If unavailability of nodeagent is the issue, then you can cater for that in your script by checking if completeObjectName returned an empty string.
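To turn the queryNames output into a quick typo check, the sketch below parses each ObjectName line (domain, a colon, then comma-separated key=value pairs), collects the node values of NodeSync MBeans, and reports which node names used in the script have no matching MBean. It assumes one ObjectName per line and no quoted values containing commas; the function names are hypothetical. A node reported as missing is one for which completeObjectName would return an empty string.

```python
def parse_object_name(object_name):
    """Split 'domain:key=value,key=value' into (domain, {key: value})."""
    domain, _, props = object_name.strip().partition(":")
    pairs = dict(item.split("=", 1) for item in props.split(",") if "=" in item)
    return domain, pairs


def node_sync_nodes(query_output):
    """Collect the 'node' key of every NodeSync MBean in queryNames output."""
    nodes = set()
    for line in query_output.splitlines():
        if not line.strip():
            continue
        _, props = parse_object_name(line)
        if props.get("type") == "NodeSync" and "node" in props:
            nodes.add(props["node"])
    return nodes


def missing_nodes(expected, query_output):
    """Node names used in the script that have no NodeSync MBean available."""
    return sorted(set(expected) - node_sync_nodes(query_output))
```

Usage: paste the output of $AdminControl queryNames WebSphere:*,type=NodeSync into a string and call missing_nodes(["Profile01Node600", "Profile02Node601"], output).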
