aerospike on openvz 2core 4Gb ram doesn't start and doesn't give errors - vps

after installation without any trouble I've started aerospike on a openvz vps with 2cores and 4gb ram.
this is the result:
root#outland:~# /etc/init.d/aerospike start
* Start aerospike: asd [OK]
then check for running asd:
root#outland:~# /etc/init.d/aerospike status
* Halt aerospike: asd [fail]
what is going wrong?
adding logs:
Mar 03 2015 15:17:57 GMT: INFO (config): (cfg.c::3033) system file descriptor limit: 100000, proto-fd-max: 15000
Mar 03 2015 15:17:57 GMT: WARNING (cf:misc): (id.c::249) Tried eth,bond,wlan and list of all available interfaces on device.Failed to retrieve physical address with errno 19 No such device
Mar 03 2015 15:17:57 GMT: CRITICAL (config): (cfg.c:3363) could not get unique id and/or ip address
Mar 03 2015 15:17:57 GMT: WARNING (as): (signal.c::120) SIGINT received, shutting down
Mar 03 2015 15:17:57 GMT: WARNING (as): (signal.c::123) startup was not complete, exiting immediately

This is your config problem
Mar 03 2015 15:17:57 GMT: WARNING (cf:misc): (id.c::249) Tried eth,bond,wlan and list of all available interfaces on device.Failed to retrieve physical address with errno 19 No such device
Mar 03 2015 15:17:57 GMT: CRITICAL (config): (cfg.c:3363) could not get unique id and/or ip address
Basically the vps has a non standard interface name.
The solution is to add your interface name as network-interface-name to the config.
http://www.aerospike.com/docs/operations/troubleshoot/startup/#problem-with-network-interface
Which OS are your using btw?

Related

Analysis of Redis error logs "LOADING Redis is loading the dataset in memory" and more

I am frequently seeing these messages in the redis logs
1#
602854:M 23 Dec 2022 09:48:54.028 * 10 changes in 300 seconds. Saving...
602854:M 23 Dec 2022 09:48:54.035 * Background saving started by pid 3266364
3266364:C 23 Dec 2022 09:48:55.844 * DB saved on disk
3266364:C 23 Dec 2022 09:48:55.852 * RDB: 12 MB of memory used by copy-on-write
602854:M 23 Dec 2022 09:48:55.938 * Background saving terminated with success
2#
LOADING Redis is loading the dataset in memory
3#
7678:signal-handler (1671738516) Received SIGTERM scheduling shutdown...
7678:M 22 Dec 2022 23:48:36.300 # User requested shutdown...
7678:M 22 Dec 2022 23:48:36.300 # systemd supervision requested, but NOTIFY_SOCKET not found
7678:M 22 Dec 2022 23:48:36.300 * Saving the final RDB snapshot before exiting.
7678:M 22 Dec 2022 23:48:36.300 # systemd supervision requested, but NOTIFY_SOCKET not found
7678:M 22 Dec 2022 23:48:36.720 * DB saved on disk
7678:M 22 Dec 2022 23:48:36.720 * Removing the pid file.
7678:M 22 Dec 2022 23:48:36.720 # Redis is now ready to exit, bye bye...
7901:C 22 Dec 2022 23:48:37.071 # WARNING supervised by systemd - you MUST set appropriate values for TimeoutStartSec and TimeoutStopSec in your service unit.
7901:C 22 Dec 2022 23:48:37.071 # systemd supervision requested, but NOTIFY_SOCKET not found
7914:C 22 Dec 2022 23:48:37.071 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
7914:C 22 Dec 2022 23:48:37.071 # Redis version=6.0.9, bits=64, commit=00000000, modified=0, pid=7914, just started
7914:C 22 Dec 2022 23:48:37.071 # Configuration loaded
Are these messages concerning?
Let me know if there's any optimization to be carried out in terms of settings.
The first set of informational messages is related to Redis persistence: it appears your Redis node is configured to save the database to disk if 300 seconds elapsed and it surpassed 10 write operations against it. You can change that according to your needs through the Redis configuration file.
The message LOADING Redis is loading the dataset in memory, on the other side, is an error message received while attempting to connect to a Redis instance which is loading its dataset in memory: that occurs during the startup for standalone servers and master nodes or when replicas reconnect and fully resynchronize with master. If you are seeing this error too often and not right after a system restart, I would suggest to check your system log files and learn why your Redis instance is restarting or resynchronizing (depending on your topology).

Ruby execute system command while running as service (systemd) on ubuntu

I want to run a ruby program as a service in Ubuntu but in my code I have a system call to execute an external program:
system(frame3dd, 'file.csv', 'file.out')
It works just fine when I run the program in terminal but as soon as I run it as a service, frame3dd returns error code 12. according to the manual it means:
12 : error in opening the temporary cleaned input data file for writing
Oct 09 10:56:57 vps594898 ruby[6900]: FRAME3DD version: 20140514+
Oct 09 10:56:57 vps594898 ruby[6900]: Analysis of 2D and 3D structural
frames with elastic and geometric stiffness.
Oct 09 10:56:57 vps594898 ruby[6900]: http://frame3dd.sf.net
Oct 09 10:56:57 vps594898 ruby[6900]: GPL Copyright (C) 1992-2014, Henri P.
Gavin
Oct 09 10:56:57 vps594898 ruby[6900]: This is free software with absolutely
no warranty.
Oct 09 10:56:57 vps594898 ruby[6900]: For details, see the GPL license file,
LICENSE.txt
Oct 09 10:56:57 vps594898 ruby[6900]: false
Oct 09 11:42:19 vps594898 ruby[14314]: #<Process::Status: pid 14415 exit 12>
systemd config file:
[Unit]
Description=BK Geveldragers staging
[Service]
User=server
Group=server
PIDFile=/home/server/bk_projecten/staging/server.pid
WorkingDirectory=/home/server/bk_projecten/staging
Restart=always
ExecStart=/usr/bin/ruby server.rb
ExecStop=/bin/kill -s QUIT $MAINPID
[Install]
WantedBy=multi-user.target
The problem is solved. Frame3dd creates temporary files in /tmp. Some of these files were created with root. When I was running Frame3dd with a non root account the program exits with error code 12 because it coudn't overwrite these files.

Is there any solution to the XFS lockup in linux?

Apparently there is a known problem of XFS locking up the kernel/processes and corrupting volumes under heavy traffic.
Some web pages talk about it, but I was not able to figure out which pages are new and may have a solution.
My company's deployments have Debian with kernel 3.4.107, xfsprogs 3.1.4, and large storage arrays.
We have large data (PB) and high throughput (GB/sec) using async IO to several large volumes.
We constantly experience these unpredictable lockups on several systems.
Kernel logs/dmesg show something like the following:
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986515] INFO: task Sr2dReceiver-5:46829 blocked for more than 120 seconds.
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986518] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986520] Sr2dReceiver-5 D ffffffff8105b39e 0 46829 7284 0x00000000
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986524] ffff881e71f57b38 0000000000000082 000000000000000b ffff884066763180
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986529] 0000000000000000 ffff884066763180 0000000000011180 0000000000011180
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986532] ffff881e71f57fd8 ffff881e71f56000 0000000000011180 ffff881e71f56000
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986536] Call Trace:
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986545] [<ffffffff814ffe9f>] schedule+0x64/0x66
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986548] [<ffffffff815005f3>] rwsem_down_failed_common+0xdb/0x10d
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986551] [<ffffffff81500638>] rwsem_down_write_failed+0x13/0x15
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986555] [<ffffffff8126b583>] call_rwsem_down_write_failed+0x13/0x20
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986558] [<ffffffff814ff320>] ? down_write+0x25/0x27
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986572] [<ffffffffa01f29e0>] xfs_ilock+0xbc/0x12e [xfs]
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986580] [<ffffffffa01eec71>] xfs_rw_ilock+0x2c/0x33 [xfs]
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986586] [<ffffffffa01eec71>] ? xfs_rw_ilock+0x2c/0x33 [xfs]
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986593] [<ffffffffa01ef234>] xfs_file_aio_write_checks+0x41/0xfe [xfs]
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986600] [<ffffffffa01ef358>] xfs_file_buffered_aio_write+0x67/0x179 [xfs]
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986603] [<ffffffff8150099a>] ? _raw_spin_unlock_irqrestore+0x30/0x3d
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986611] [<ffffffffa01ef81d>] xfs_file_aio_write+0x163/0x1b5 [xfs]
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986614] [<ffffffff8106f1af>] ? futex_wait+0x22c/0x244
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986619] [<ffffffff8110038e>] do_sync_write+0xd9/0x116
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986622] [<ffffffff8150095f>] ? _raw_spin_unlock+0x26/0x31
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986634] [<ffffffff8106f2f1>] ? futex_wake+0xe8/0xfa
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986637] [<ffffffff81100d1d>] vfs_write+0xae/0x10a
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986639] [<ffffffff811015b3>] ? fget_light+0xb0/0xbf
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986642] [<ffffffff81100dd3>] sys_pwrite64+0x5a/0x79
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986645] [<ffffffff81506912>] system_call_fastpath+0x16/0x1b
Lockups leave the system in a bad state. The processes in D state that hang cannot even be killed with signal 9.
The only way to resume operations is to reboot, repair XFS and then the system works for another while.
But occasionally after the lockup we cannot even repair some volumes, as they get totally corrupted and we need to rebuild them with mkfs.
As a last resort, we now run xfs-repair periodically and this reduced the frequency of lockups and data loss to a certain extent.
But the incidents still occur often enough, so we need some solution.
I was wondering if there is a solution for this with kernel 3.4.107, e.g. some patch that we may apply.
Due to the large number of deployments and other software issues, we cannot upgrade the kernel in the near future.
However, we are working towards updating our applications so that we can run on kernel 3.16 in our next releases.
Does anyone know if this XFS lockup problem was fixed in 3.16?
Some people have experienced this but it was not a problem with XFS it was because the kernel was unable to flush dirty pages within the 120s time period. Have a look here but please check the numbers they're using as default on your own system.
http://blog.ronnyegner-consulting.de/2011/10/13/info-task-blocked-for-more-than-120-seconds/
and here
http://www.blackmoreops.com/2014/09/22/linux-kernel-panic-issue-fix-hung_task_timeout_secs-blocked-120-seconds-problem/
You can see what you're dirty cache ratio is by running this
sysctl -a | grep dirty
or
cat /proc/sys/vm/dirty_ratio
The best write up on this I could find is here...
https://lonesysadmin.net/2013/12/22/better-linux-disk-caching-performance-vm-dirty_ratio/
Essentially you need to tune your application to make sure that it can write the dirty buffers to disk within the time period or change the timer period etc.
You can also see some interesting paramaters as follows
sysctl -a | grep hung
You could increase the timeout permanently using /etc/sysctl.conf as follows...
kernel.hung_task_timeout_secs = 300
Does anyone know if this XFS lockup problem was fixed in 3.16?
It is said so in A Short Guide to Kernel Debugging:
Searching for “xfs splice deadlock” turns up an email thread from 2011 that describes this
problem. However, bisecting the kernel source repository shows that
the bug wasn’t really addressed until April, 2014 (8d02076) for release in Linux 3.16.

Gwan stops working every night

I have a arch 64bit VPS on digitalocean. I installed gwan and run it in deamon mode. It stopped running every midnight.
Here is the log file
[Wed Apr 24 06:10:28 2013 GMT] memory footprint: 3.78 MiB
[Thu, 25 Apr 2013 00:00:19 GMT] * child abort(8) coredump
[Thu, 25 Apr 2013 00:00:19 GMT] * child abort(8) coredump
[Thu, 25 Apr 2013 00:00:19 GMT] * child abort(8) coredump
[Thu, 25 Apr 2013 00:00:19 GMT] * child died 3 times within 3 seconds
[Thu Apr 25 12:39:39 2013 GMT] memory footprint: 3.77 MiB.
[Thu Apr 25 12:39:56 2013 GMT] loaded maintenance script/opt/gwan_linux64-bit/0.0.0.0_8080/#0.0.0.0/csp/crash.c 43.14 KiB MD5:820cf6b4-2152b838-08a13fcb-5f0dc4be
[Fri, 26 Apr 2013 00:00:10 GMT] * child abort(8) coredump
[Fri, 26 Apr 2013 00:00:10 GMT] * child abort(8) coredump
[Fri, 26 Apr 2013 00:00:10 GMT] * child abort(8) coredump
[Fri, 26 Apr 2013 00:00:10 GMT] * child died 3 times within 3 seconds
This problem does not happen on all platforms and so far all the user reports we received used hypervisors which alter the CPU and OS behavior in erratic and undocumented ways (not to cite the additional bugs they inject into the system).
UPDATE
That new problem for 4-years old code that worked fine so far is a platform issue, for which we have found a workaround, to be published with the next release in a few weeks.

Weird behavior with gwan v3.12.26

I just installed the latest Xmas gift from gwan team, but I'm having some problems:
Segmentation fault with archlinux .
On Ubuntu strange behavior.
I can't run any script on it.
About #1, Archlinux is up to date and uses the 2.16 GLIBC.
About #2, I'm loading http://188.165.219.99:8080/100.html sometimes it display 100 X, sometimes an error page (with CSS) and sometimes an error page without CSS.
About #3, I can't run any csp script:
http://188.165.219.99:8080/?hello.c
http://188.165.219.99:8080/?hello.rb
http://188.165.219.99:8080/?hello.php
None of the above work. Has the csp url changed?
I have installed php5-cli and ruby on my ubuntu.
For informations:
# ldd --version
ldd (Ubuntu EGLIBC 2.11.1-0ubuntu7.11) 2.11.1
Copyright (C) 2009 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.
Here the log on archlinux
# cat logs/gwan.log
[Thu Dec 27 14:03:44 2012 GMT] ------------------------------------------------
[Thu Dec 27 14:03:44 2012 GMT] G-WAN 3.12.26 64-bit (Dec 26 2012 13:58:12)
[Thu Dec 27 14:03:44 2012 GMT] ------------------------------------------------
[Thu Dec 27 14:03:44 2012 GMT] Local Time: Thu, 27 Dec 2012 16:03:44 GMT+2
[Thu Dec 27 14:03:44 2012 GMT] RAM: (1.60 GiB free + 0 shared + 834.57 MiB buffers) / 23.64 GiB total
[Thu Dec 27 14:03:44 2012 GMT] Physical Pages: 1.60 GiB / 23.64 GiB
[Thu Dec 27 14:03:44 2012 GMT] DISK: 1.71 TiB free / 1.88 TiB total
[Thu Dec 27 14:03:44 2012 GMT] 336 processes, including pid:1545 './gwan'
[Thu Dec 27 14:03:44 2012 GMT] Multi-Core, HT enabled
[Thu Dec 27 14:03:44 2012 GMT] 1x Intel(R) Xeon(R) CPU W3520 # 2.67GHz (4 Core(s)/CPU, 2 thread(s)/Core)
[Thu Dec 27 14:03:44 2012 GMT] using 4 workers 0[1111]3
[Thu Dec 27 14:03:44 2012 GMT] among 8 threads 0[11110000]7
[Thu Dec 27 14:03:44 2012 GMT] 64-bit little-endian (least significant byte first)
1: segmentation fault with archlinux
Many informations are listed at the top of the /logs/wgan.log file. It would immensely help to see it (that's what log files are for).
2: on ubuntu strange behavior
Not knowing the nature of the "strange behavior" nor the Ubuntu version makes difficult to answer your question (if that's a question).
3: I can't run any script on it
We received more than 50 emails this morning alove. None (but yours) reports that scripts fail to run. So far, people report great results with all languages.
Check your files permissions (can the G-WAN account read the script files?) and verify that the relevant compilers / runtimes are installed.
Ogla's report suggests that javascript embedded in HTML (the new release does minifying there) migh be the cause of the cut files.
Other than that, the fact that the daemon mode is broken in v3.12.25/26 may explain your problems if you run in daemon mode.

Resources