macOS Server 5.3 Calendar pg_ctl not starting

After updating macOS Server to 5.3 (running on macOS 10.12.4), my Calendar & Contacts have stopped syncing.
It seems to be having trouble starting Postgres for the cluster /Library/Server/Calendar and Contacts/Data/Database.xpg/cluster.pg, and possibly trouble with the agent too.
The GUI seems to think that the Calendar & Contacts services have started and are available, but when I run $ sudo serveradmin fullstatus calendar from the command line I get:
calendar:setStateVersion = 1
calendar:readWriteSettingsVersion = 1
calendar:state = "STARTING"
calendar:contactsState = "STARTING"
calendar:calendarState = "STARTING"
System log is being spammed with:
Apr 22 11:58:42 com.apple.xpc.launchd[1] (org.calendarserver.agent[44649]): Service exited with abnormal code: 1
Apr 22 11:58:42 com.apple.xpc.launchd[1] (org.calendarserver.agent): Service only ran for 0 seconds. Pushing respawn out by 10 seconds.
Apr 22 11:58:52 com.apple.xpc.launchd[1] (org.calendarserver.agent[44659]): Service exited with abnormal code: 1
Apr 22 11:58:52 com.apple.xpc.launchd[1] (org.calendarserver.agent): Service only ran for 0 seconds. Pushing respawn out by 10 seconds.
Apr 22 11:59:02 com.apple.xpc.launchd[1] (org.calendarserver.agent[44668]): Service exited with abnormal code: 1
Apr 22 11:59:02 com.apple.xpc.launchd[1] (org.calendarserver.agent): Service only ran for 0 seconds. Pushing respawn out by 10 seconds.
Apr 22 11:59:07 com.apple.xpc.launchd[1] (org.calendarserver.calendarserver[44676]): Service exited with abnormal code: 1
Apr 22 11:59:07 com.apple.xpc.launchd[1] (org.calendarserver.calendarserver): Service only ran for 0 seconds. Pushing respawn out by 60 seconds.
Here's the output of $ sudo /Applications/Server.app/Contents/ServerRoot/usr/sbin/calendarserver_diagnose. Any ideas?
OS Build: 16E195
Server Build: 16S4123
/Library/Server/Preferences/Calendar.plist exists and can be parsed
Prefs plist says ServerRoot directory is: /Library/Server/Calendar and Contacts
ServerRoot volume ok
/Library/Server/Calendar and Contacts/Config/caldavd-system.plist exists and can be parsed
/Library/Server/Calendar and Contacts/Config/caldavd-user.plist does not exist
Configuration:
Calendar and Contacts service processes:
USER PID %CPU %MEM RSS ELAPSED STARTED COMMAND
root 42554 0.0 0.1 11072 07:49 Sat 22 Apr 11:32:16 2017 servermgr_calendar
Serverd status:
org.calendarserver.agent is enabled
org.calendarserver.calendarserver is enabled
org.calendarserver.relocate is enabled
Disk space on boot volume:
Filesystem Size Used Avail Capacity iused ifree %iused Mounted on
/dev/disk1 999G 777G 222G 78% 8520180 4286447099 0% /
Disk space on service data volume:
Filesystem Size Used Avail Capacity iused ifree %iused Mounted on
/dev/disk1 999G 777G 222G 78% 8520180 4286447099 0% /
Disk space used by Calendar and Contacts service:
20K /Library/Server/Calendar and Contacts/Config
1014M /Library/Server/Calendar and Contacts/Data
200M /Library/Server/Calendar and Contacts/Logs
Postgres status for cluster /Library/Server/Calendar and Contacts/Data/Database.xpg/cluster.pg:
pg_ctl: no server running
Agent:
Attempting to send a request to the agent...
Can't connect to agent: timed out
Server connection:
Traceback (most recent call last):
File "/Applications/Server.app/Contents/ServerRoot/usr/sbin/calendarserver_diagnose", line 14, in <module>
load_entry_point('CalendarServer==9.1a1.dev0+56b4197875debefef19d9c19840f903a8e480c88.head', 'console_scripts', 'calendarserver_diagnose')()
File "/Applications/Server.app/Contents/ServerRoot/Library/CalendarServer/lib/python2.7/site-packages/calendarserver/tools/diagnose.py", line 145, in main
connectToCaldavd(keys)
File "/Applications/Server.app/Contents/ServerRoot/Library/CalendarServer/lib/python2.7/site-packages/calendarserver/tools/diagnose.py", line 584, in connectToCaldavd
url = "https://{host}/principals/".format(host=keys["ServerHostName"])
KeyError: 'ServerHostName'
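What I plan to try next, for reference, is a sketch like the following; the PlistBuddy check is based on the KeyError above, and the pg_ctl path and _calendar user are assumptions about Server.app 5.3's layout, so adjust to your install:

# Does the config actually contain the key the diagnose tool dies on?
sudo /usr/libexec/PlistBuddy -c 'Print :ServerHostName' \
    '/Library/Server/Calendar and Contacts/Config/caldavd-system.plist'

# Try starting the cluster by hand to surface Postgres's real error,
# then read the newest log in the service's Logs directory
sudo -u _calendar /Applications/Server.app/Contents/ServerRoot/usr/bin/pg_ctl \
    -D '/Library/Server/Calendar and Contacts/Data/Database.xpg/cluster.pg' start
sudo ls -lt '/Library/Server/Calendar and Contacts/Logs/'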

Related

Analysis of Redis error logs "LOADING Redis is loading the dataset in memory" and more

I am frequently seeing these messages in the Redis logs:
1#
602854:M 23 Dec 2022 09:48:54.028 * 10 changes in 300 seconds. Saving...
602854:M 23 Dec 2022 09:48:54.035 * Background saving started by pid 3266364
3266364:C 23 Dec 2022 09:48:55.844 * DB saved on disk
3266364:C 23 Dec 2022 09:48:55.852 * RDB: 12 MB of memory used by copy-on-write
602854:M 23 Dec 2022 09:48:55.938 * Background saving terminated with success
2#
LOADING Redis is loading the dataset in memory
3#
7678:signal-handler (1671738516) Received SIGTERM scheduling shutdown...
7678:M 22 Dec 2022 23:48:36.300 # User requested shutdown...
7678:M 22 Dec 2022 23:48:36.300 # systemd supervision requested, but NOTIFY_SOCKET not found
7678:M 22 Dec 2022 23:48:36.300 * Saving the final RDB snapshot before exiting.
7678:M 22 Dec 2022 23:48:36.300 # systemd supervision requested, but NOTIFY_SOCKET not found
7678:M 22 Dec 2022 23:48:36.720 * DB saved on disk
7678:M 22 Dec 2022 23:48:36.720 * Removing the pid file.
7678:M 22 Dec 2022 23:48:36.720 # Redis is now ready to exit, bye bye...
7901:C 22 Dec 2022 23:48:37.071 # WARNING supervised by systemd - you MUST set appropriate values for TimeoutStartSec and TimeoutStopSec in your service unit.
7901:C 22 Dec 2022 23:48:37.071 # systemd supervision requested, but NOTIFY_SOCKET not found
7914:C 22 Dec 2022 23:48:37.071 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
7914:C 22 Dec 2022 23:48:37.071 # Redis version=6.0.9, bits=64, commit=00000000, modified=0, pid=7914, just started
7914:C 22 Dec 2022 23:48:37.071 # Configuration loaded
Are these messages concerning?
Let me know if there is any optimization to be carried out in the settings.
The first set of informational messages relates to Redis persistence: your Redis node appears to be configured to save the database to disk when at least 10 write operations have occurred within 300 seconds. You can change that according to your needs in the Redis configuration file.
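For reference, those triggers live in redis.conf as save rules; the stock defaults look like this (check your own file before tuning):

save 900 1      # snapshot after 900 s if at least 1 key changed
save 300 10     # snapshot after 300 s if at least 10 keys changed -- the rule firing in your logs
save 60 10000   # snapshot after 60 s if at least 10000 keys changed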
The message LOADING Redis is loading the dataset in memory, on the other hand, is an error returned when a client connects to a Redis instance that is still loading its dataset into memory. That occurs during startup for standalone servers and master nodes, or when replicas reconnect and fully resynchronize with the master. If you are seeing this error often, and not right after a system restart, I would suggest checking your system log files to learn why your Redis instance is restarting or resynchronizing (depending on your topology).
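A quick check (a sketch; these fields exist in redis-cli INFO output on Redis 6) to see whether an instance is still loading and how long it has been up:

redis-cli INFO persistence | grep -E '^(loading|rdb_last_bgsave_status)'
redis-cli INFO server | grep -E '^uptime_in_seconds'
# loading:1 means clients get the LOADING error until the dataset is read back in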

AWS Linux 2 AMI Failed to get D-Bus connection: No such file or directory

I have an AWS Linux 2 AMI EC2 instance.
When running systemctl --user status I get the message:
Failed to get D-Bus connection: No such file or directory
I then ran systemctl start dbus.socket, which gave me this message:
Failed to start dbus.socket: The name org.freedesktop.PolicyKit1 was not provided by any .service files. See system logs and 'systemctl status dbus.socket' for details.
I then ran systemctl status dbus.socket -l which returned this:
dbus.socket - D-Bus System Message Bus Socket
Loaded: loaded (/usr/lib/systemd/system/dbus.socket; static; vendor preset: disabled)
Active: active (running) since Thu 2022-03-31 21:26:42 UTC; 14h ago
Listen: /run/dbus/system_bus_socket (Stream)
Mar 31 21:26:42 ip-10-0-0-193.ec2.internal systemd[1]: Listening on D-Bus System Message Bus Socket.
Mar 31 21:26:42 ip-10-0-0-193.ec2.internal systemd[1]: Starting D-Bus System Message Bus Socket.
Running sudo systemctl --user status gives a different error:
Failed to get D-Bus connection: Connection refused
I'm unsure of what to investigate next or what steps to take to resolve the issue.
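For context, systemctl --user talks to a per-user bus rather than the system bus, so a sketch of what one might check next (these are the standard systemd paths; the user-session pieces may simply be absent on Amazon Linux 2):

echo "$XDG_RUNTIME_DIR"          # usually /run/user/<uid>; empty means no user-session environment
ls -l /run/user/$(id -u)/bus     # the per-user bus socket systemctl --user needs
loginctl show-user "$USER"       # is a systemd user manager running for this user?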

Install of elastic 7.5 on RHEL 7.8 makes memory violation sig=6 due to JNA

I am installing a brand-new Elasticsearch 7.5 on Red Hat Enterprise Linux Server release 7.8 (Maipo).
At startup of the service, I get a hard failure. Here is what the service status shows:
● elasticsearch.service - Elasticsearch
Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; disabled; vendor preset: disabled)
Active: failed (Result: signal) since Tue 2020-08-25 11:34:39 CEST; 7min ago
Docs: http://www.elastic.co
Process: 102777 ExecStart=/usr/share/elasticsearch/bin/elasticsearch -p ${PID_DIR}/elasticsearch.pid --quiet (code=killed, signal=ABRT)
Main PID: 102777 (code=killed, signal=ABRT)
CGroup: /system.slice/elasticsearch.service
Aug 25 11:34:34 sv-1348lvd44.esante.local systemd[1]: Starting Elasticsearch...
Aug 25 11:34:35 sv-1348lvd44.esante.local elasticsearch[102777]: OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated...lease.
Aug 25 11:34:39 sv-1348lvd44.esante.local systemd[1]: elasticsearch.service: main process exited, code=killed, status=6/ABRT
Aug 25 11:34:39 sv-1348lvd44.esante.local systemd[1]: Failed to start Elasticsearch.
Aug 25 11:34:39 sv-1348lvd44.esante.local systemd[1]: Unit elasticsearch.service entered failed state.
Aug 25 11:34:39 sv-1348lvd44.esante.local systemd[1]: elasticsearch.service failed.
When using journalctl -xe:
Aug 25 11:34:38 sv-1348lvd44.esante.local audispd[824]: node=sv-1348lvd44.esante.local type=ANOM_ABEND msg=audit(1598348078.836:208066): auid=429496 uid=995 gid=991 ses=4294967295 subj=system_u:system_r:unconfined_service_t:s0 pid=102777 comm="java" reason="memory violation" sig=6
Aug 25 11:34:39 sv-1348lvd44.esante.local systemd[1]: elasticsearch.service: main process exited, code=killed, status=6/ABRT
Aug 25 11:34:39 sv-1348lvd44.esante.local systemd[1]: Failed to start Elasticsearch.
When looking into the dump hs_err_pidXXXX, I see:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f4818939b85, pid=52870, tid=52933
#
# JRE version: OpenJDK Runtime Environment (13.0.1+9) (build 13.0.1+9)
# Java VM: OpenJDK 64-Bit Server VM (13.0.1+9, mixed mode, sharing, tiered, compressed oops, concurrent mark sweep gc, linux-amd64)
# Problematic frame:
# C [jna515356041985641679.tmp+0x12b85] ffi_prep_closure_loc+0x15
OS:Red Hat Enterprise Linux Server release 7.8 (Maipo)
uname:Linux 3.10.0-1127.10.1.el7.x86_64 #1 SMP Tue May 26 15:05:43 EDT 2020 x86_64
libc:glibc 2.17 NPTL 2.17
rlimit: STACK 8192k, CORE 0k, NPROC 4096, NOFILE 65535, AS infinity, DATA infinity, FSIZE infinity
load average:0.08 0.03 0.05
.../...
It works like a charm on CentOS without doing anything.
For RHEL, I had already fixed the JNA issue by adding ES_TMPDIR=/var/es-temp to /etc/sysconfig/elasticsearch.
Memory seems fine; this is a brand-new VM (no application logs in /var/log).
This version seems to be supported.
I tested with -Xms2g -Xmx2g, -Xms1g -Xmx1g, and -Xms512m -Xmx512m, but got the same error.
I don't get what is going wrong. My next step is to test with another version 7 of Elasticsearch.
After a day of struggling, I found the solution at https://discuss.elastic.co/t/elasticsearch-v7-6-2-failed-to-start-killed-by-sigabrt-on-rhel-7-7-urgent/231039/11, from Ivan_A_Carrazana_C.
Here is a copy of the steps to perform:
Hi
If you are applying security compliance to your RHEL installation, you must change the path of the TMP directory that Elasticsearch and Java will use.
Uncomment in /etc/elasticsearch/jvm.options:
-Djava.io.tmpdir=${ES_TMPDIR}
Add in /etc/sysconfig/elasticsearch:
ES_TMPDIR=/usr/share/elasticsearch/tmp
Create the /usr/share/elasticsearch/tmp directory and make sure that the owner and group are elasticsearch and the permissions are 0755.
Lastly, make sure that /dev/shm doesn't have the noexec attribute, using the command:
mount | grep tmpfs | grep '/dev/shm'
Expected result:
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,seclabel)
If you get output like this:
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,noexec,seclabel)
Add or modify in /etc/fstab the following line:
tmpfs /dev/shm tmpfs defaults,nodev,nosuid 0 0
I had the same problem and this worked for me. Hope I can help you.
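Condensed into shell, the steps above look roughly like this (a sketch assuming the elasticsearch user and group created by the RPM):

sudo mkdir -p /usr/share/elasticsearch/tmp
sudo chown elasticsearch:elasticsearch /usr/share/elasticsearch/tmp
sudo chmod 0755 /usr/share/elasticsearch/tmp
echo 'ES_TMPDIR=/usr/share/elasticsearch/tmp' | sudo tee -a /etc/sysconfig/elasticsearch
mount | grep tmpfs | grep '/dev/shm'   # output should contain no 'noexec'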
This seems to be known by Elastic but not documented correctly. I don't understand why tmpfs being noexec matters here (presumably JNA extracts its native library to a temp directory and has to execute it from there, which noexec forbids). Feedback from a JNA expert on this would be good.
For some reason, adding a TMPDIR variable to /etc/sysconfig/elasticsearch and pointing it to the same location as -Djava.io.tmpdir worked for me (on 7.7.1).
i.e.
TMPDIR="/usr/share/elasticsearch/tmp"
(In my case I actually used /var/lib/elasticsearch/tmp with 0755 permissions on it.)
I can't say why, and it doesn't change the command line shown by ps -aef, but just having -Djava.io.tmpdir wasn't enough.
This allowed me to get it to work without removing noexec on /tmp and /dev/shm.

How to set up autosearch nodes in Elasticsearch 6.1

I have created a cluster of 5 nodes in ES 6.1. I am able to form the cluster when I add a line with the IP addresses of all the other nodes to the configuration file elasticsearch.yml as discovery.zen.ping.unicast.hosts. It looks like this:
discovery.zen.ping.unicast.hosts: ["10.206.81.241","10.206.81.238","10.206.81.237","10.206.81.239"]
When I have this line in my config file, everything works well.
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
10.206.81.241 9 54 0 0.03 0.05 0.05 mi * master4
10.206.81.239 10 54 0 0.00 0.01 0.05 mi - master1
10.206.81.238 14 54 0 0.00 0.01 0.05 mi - master3
10.206.81.240 15 54 0 0.00 0.01 0.05 mi - master5
10.206.81.237 10 54 0 0.00 0.01 0.05 mi - master2
When I add discovery.zen.ping.multicast.enabled: true, Elasticsearch will not start.
I would like to have more nodes, and having to configure each file separately, adding the new address to every configuration each time, is not a workable approach. So, is there any way to set up ES6 to find new nodes automatically?
EDIT:
journalctl -f output:
led 08 10:43:04 elk-prod3.user.dc.company.local polkitd[548]: Registered Authentication Agent for unix-process:23395:23676999 (system bus name :1.162 [/usr/bin/pkttyagent --notify-fd 5 --fallback], object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8)
led 08 10:43:04 elk-prod3.user.dc.company.local systemd[1]: Stopping Elasticsearch...
led 08 10:43:04 elk-prod3.user.dc.company.local systemd[1]: Started Elasticsearch.
led 08 10:43:04 elk-prod3.user.dc.company.local systemd[1]: Starting Elasticsearch...
led 08 10:43:04 elk-prod3.user.dc.company.local polkitd[548]: Unregistered Authentication Agent for unix-process:23395:23676999 (system bus name :1.162, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disconnected from bus)
led 08 10:43:07 elk-prod3.user.dc.company.local systemd[1]: elasticsearch.service: main process exited, code=exited, status=1/FAILURE
led 08 10:43:07 elk-prod3.user.dc.company.local systemd[1]: Unit elasticsearch.service entered failed state.
led 08 10:43:07 elk-prod3.user.dc.company.local systemd[1]: elasticsearch.service failed.
Basically you should have "stable" nodes; what I mean is that you should have IPs which are always part of the cluster:
discovery.zen.ping.unicast.hosts: [MASTER_NODE_IP_OR_DNS, MASTER2_NODE_IP_OR_DNS, MASTER3_NODE_IP_OR_DNS]
Then, if you use autoscaling or add nodes, they must "talk" to those IPs to let the cluster know that they are joining (see the sketch below).
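For example, a new data-only node's elasticsearch.yml could then point only at the stable master IPs. The values below are assumptions based on the cluster shown above (the cluster name is hypothetical; minimum_master_nodes assumes your 5 master-eligible nodes):

cluster.name: my-cluster                # hypothetical; must match the existing cluster
node.master: false                      # data-only, so it can join and leave freely
node.data: true
discovery.zen.ping.unicast.hosts: ["10.206.81.241", "10.206.81.238", "10.206.81.237"]
discovery.zen.minimum_master_nodes: 3   # quorum: (5 master-eligible / 2) + 1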
You haven't mentioned anything about your network setup, so I can't say for sure what is wrong. But as I recall, unicast hosts are the recommended approach.
PS: If you are using Azure, there is a feature called VM scale sets; I modified the template to my needs. The idea is that by default I always use 3 nodes, and if my cluster is loaded the scale set dynamically adds more nodes.
discovery.zen.ping.multicast has been removed from Elasticsearch; see https://www.elastic.co/guide/en/elasticsearch/plugins/6.1/discovery-multicast.html

One node in hadoop cluster failure

I have recently configured a 10-node HDP Hadoop cluster; each node runs SLES 11.
On the master node I have configured all the master services and clients, as well as the Ambari server. The remaining nodes run the other slave services and their clients.
NTP sync is on, and the other prerequisites are also fine.
I am experiencing weird behavior on the cluster: within a few hours of starting all the services, one of the nodes goes down.
The first time I experienced this, I restarted that particular node and added it back to the cluster.
Now my master node is having the same issue, which takes the whole cluster down. I have checked the logs but there are no indications related to the failure.
I am clueless as to the root cause of these node failures. What could it be?
Below are the logs.
/var/log/messages from the system that went down:
notice)=0', processed='source(src)=6830'
Apr 23 05:22:43 lnx1863 SuSEfirewall2: SuSEfirewall2 not active
Apr 23 05:23:49 lnx1863 SuSEfirewall2: SuSEfirewall2 not active
Apr 23 05:24:17 lnx1863 sudo: root : TTY=pts/0 ; PWD=/ ; USER=root ; COMMAND=/usr/bin/du -h /
Apr 23 05:24:55 lnx1863 SuSEfirewall2: SuSEfirewall2 not active
Apr 23 05:25:22 lnx1863 kernel: [248531.127254] megasas: Found FW in FAULT state, will reset adapter.
Apr 23 05:25:22 lnx1863 kernel: [248531.127260] megaraid_sas: resetting fusion adapter.
Apr 23 05:25:22 lnx1863 kernel: [248531.127427] megaraid_sas: Reset not supported, killing adapter.
namenode logs:
INFO 2015-04-23 05:27:43,665 Heartbeat.py:78 - Building Heartbeat: {responseId = 7607, timestamp = 1429781263665, commandsInProgress = False, componentsMapped = True}
INFO 2015-04-23 05:28:44,053 security.py:135 - Encountered communication error. Details: SSLError('The read operation timed out',)
ERROR 2015-04-23 05:28:44,053 Controller.py:278 - Connection to http://localhost was lost (details=Request to https://localhost:8441/agent/v1/heartbeat/localhostip failed due to Error occured during connecting to the server: The read operation timed out)
INFO 2015-04-23 05:29:16,061 NetUtil.py:48 - Connecting to https://localhost:8440/connection_info
INFO 2015-04-23 05:29:16,118 security.py:93 - SSL Connect being called.. connecting to the server
