Gwan stops working every night - g-wan

I have an Arch Linux 64-bit VPS on DigitalOcean. I installed G-WAN and run it in daemon mode. It stops running every night at midnight.
Here is the log file:
[Wed Apr 24 06:10:28 2013 GMT] memory footprint: 3.78 MiB
[Thu, 25 Apr 2013 00:00:19 GMT] * child abort(8) coredump
[Thu, 25 Apr 2013 00:00:19 GMT] * child abort(8) coredump
[Thu, 25 Apr 2013 00:00:19 GMT] * child abort(8) coredump
[Thu, 25 Apr 2013 00:00:19 GMT] * child died 3 times within 3 seconds
[Thu Apr 25 12:39:39 2013 GMT] memory footprint: 3.77 MiB.
[Thu Apr 25 12:39:56 2013 GMT] loaded maintenance script/opt/gwan_linux64-bit/0.0.0.0_8080/#0.0.0.0/csp/crash.c 43.14 KiB MD5:820cf6b4-2152b838-08a13fcb-5f0dc4be
[Fri, 26 Apr 2013 00:00:10 GMT] * child abort(8) coredump
[Fri, 26 Apr 2013 00:00:10 GMT] * child abort(8) coredump
[Fri, 26 Apr 2013 00:00:10 GMT] * child abort(8) coredump
[Fri, 26 Apr 2013 00:00:10 GMT] * child died 3 times within 3 seconds
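To check whether the aborts really cluster around midnight, the crash entries can be pulled out of the log. This is a minimal Python sketch, assuming the `[... HH:MM:SS GMT] * child abort(N) coredump` line format shown above; `abort_times` is a hypothetical helper and not part of G-WAN.

```python
import re

# Matches crash lines such as:
# [Thu, 25 Apr 2013 00:00:19 GMT] * child abort(8) coredump
ABORT_RE = re.compile(
    r"^\[[^\]]*?(\d{2}):(\d{2}):(\d{2}) GMT\] \* child abort\((\d+)\)"
)

def abort_times(log_text):
    """Return (hour, minute, second, code) tuples for each 'child abort' line."""
    hits = []
    for line in log_text.splitlines():
        m = ABORT_RE.match(line.strip())
        if m:
            hits.append(tuple(int(g) for g in m.groups()))
    return hits

# Sample lines copied from the log above.
SAMPLE = """\
[Wed Apr 24 06:10:28 2013 GMT] memory footprint: 3.78 MiB
[Thu, 25 Apr 2013 00:00:19 GMT] * child abort(8) coredump
[Thu, 25 Apr 2013 00:00:19 GMT] * child died 3 times within 3 seconds
[Fri, 26 Apr 2013 00:00:10 GMT] * child abort(8) coredump
"""

if __name__ == "__main__":
    for h, m, s, code in abort_times(SAMPLE):
        print(f"abort({code}) at {h:02d}:{m:02d}:{s:02d} GMT")
```

If every reported time falls within the first seconds after 00:00 GMT, that points at something triggered by the date change rather than by load.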

This problem does not happen on all platforms. So far, all the user reports we have received involved hypervisors, which alter CPU and OS behavior in erratic and undocumented ways (not to mention the additional bugs they inject into the system).
UPDATE
This new problem, in 4-year-old code that had worked fine until now, is a platform issue. We have found a workaround, which will be published with the next release in a few weeks.

Related

Trouble starting g-wan webserver on Ubuntu 22.04 LTS

I am working on Ubuntu 22.04 LTS, 64-bit, Intel® Celeron(R) N4020 CPU @ 1.10GHz × 2, 4.0 GiB.
I downloaded G-WAN from http://gwan.ch/download today, unzipped the file (unzipped version: G-WAN 7.12.6 64-bit (Feb 8 2016 16:33:28)), and went to the directory.
But when I run the command below, I get the following error:
#sudo ./gwan <enter>
loading
can't find '' Qspupdpm!Iboemfs!
To run G-WAN, you must fix the error(s) or remove this Qspupdpm!Iboemfs!
Just in case, I also deleted the existing PID file and restarted (a new PID file was generated automatically). It still gave the above error.
Below is the output from log/gwan.log:
[Tue Jan 17 13:10:20 2023 GMT] ------------------------------------------------
[Tue Jan 17 13:10:20 2023 GMT] G-WAN 7.12.6 64-bit (Feb 8 2016 16:33:28)
[Tue Jan 17 13:10:20 2023 GMT] ------------------------------------------------
[Tue Jan 17 13:10:20 2023 GMT] /home/varshesh/numero/gwan/gwan
[Tue Jan 17 13:10:20 2023 GMT] local time: Tue, 17 Jan 2023 18:40:20 GMT+5
[Tue Jan 17 13:10:20 2023 GMT] last system reboot: 2023-01-13 09:14
[Tue Jan 17 13:10:20 2023 GMT] RAM: (222.10 MiB free + 424.54 MiB shared + 46.54 MiB buffers) / 3.70 GiB total
[Tue Jan 17 13:10:20 2023 GMT] physical pages: 222.10 MiB / 3.70 GiB
[Tue Jan 17 13:10:20 2023 GMT] disk: 22.75 GiB free / 54.75 GiB total
[Tue Jan 17 13:10:20 2023 GMT] Filesystem Type Size Used Avail Use% Mounted on
[Tue Jan 17 13:10:20 2023 GMT] tmpfs tmpfs 375M 2.3M 373M 1% /run
[Tue Jan 17 13:10:20 2023 GMT] /dev/sda2 ext4 62G 23G 37G 39% /
[Tue Jan 17 13:10:20 2023 GMT] tmpfs tmpfs 1.9G 112M 1.8G 6% /dev/shm
[Tue Jan 17 13:10:20 2023 GMT] tmpfs tmpfs 5.0M 4.0K 5.0M 1% /run/lock
[Tue Jan 17 13:10:20 2023 GMT] /dev/sda1 vfat 93M 5.3M 88M 6% /boot/efi
[Tue Jan 17 13:10:20 2023 GMT] /dev/sda3 ext4 55G 32G 20G 62% /home
[Tue Jan 17 13:10:20 2023 GMT] tmpfs tmpfs 375M 152K 375M 1% /run/user/1000
[Tue Jan 17 13:10:20 2023 GMT] 922 processes, including pid:99976 './gwan'
[Tue Jan 17 13:10:20 2023 GMT] page-size:4,096 child-max:14,684 stream-max:16
[Tue Jan 17 13:10:20 2023 GMT] CPU: 1x Intel(R) Celeron(R) N4020 CPU @ 1.10GHz
[Tue Jan 17 13:10:20 2023 GMT] 0 id: 0 0
[Tue Jan 17 13:10:20 2023 GMT] 1 id: 1 1
[Tue Jan 17 13:10:20 2023 GMT] Cores: possible:0-3 present:0-1 online:0-1
[Tue Jan 17 13:10:20 2023 GMT] L1d cache: 24K line:64 0
[Tue Jan 17 13:10:20 2023 GMT] L1i cache: 32K line:64 0
[Tue Jan 17 13:10:20 2023 GMT] L2 cache: 4096K line:64 0-1
[Tue Jan 17 13:10:20 2023 GMT] NUMA node #1 0-1
[Tue Jan 17 13:10:20 2023 GMT] CPU(s):1, Core(s)/CPU:1, Thread(s)/Core:2
[Tue Jan 17 13:10:20 2023 GMT] bogomips: 2,188.80 (per physical CPU Core)
[Tue Jan 17 13:10:20 2023 GMT] virtualization: VT-x
[Tue Jan 17 13:10:20 2023 GMT] using 1 workers 0[01]0
[Tue Jan 17 13:10:20 2023 GMT] among 2 threads 0[11]1
[Tue Jan 17 13:10:20 2023 GMT] 64-bit little-endian (least significant byte first)
[Tue Jan 17 13:10:20 2023 GMT] Ubuntu 22.04 LTS \n \l (5.18.2-051802) 64-bit
[Tue Jan 17 13:10:20 2023 GMT] user: root (uid:0), group: root (uid:0)
[Tue Jan 17 13:10:20 2023 GMT] backlog: 4,096
[Tue Jan 17 13:10:20 2023 GMT] epoll_fds: 836,852
[Tue Jan 17 13:10:20 2023 GMT] port range: 4096-65500
[Tue Jan 17 13:10:20 2023 GMT] system fd_max: 1,024
[Tue Jan 17 13:10:20 2023 GMT] program fd_max: 1,024
[Tue Jan 17 13:10:20 2023 GMT] updated fd_max: 1,048,576
[Tue Jan 17 13:10:20 2023 GMT] virt max: -1, stack:8.00 MiB
[Tue Jan 17 13:10:20 2023 GMT] nic: hw nic: 00:0c.0 Network controller: Intel Corporation Gemini Lake PCH CNVi WiFi (rev 06)
[Tue Jan 17 13:10:20 2023 GMT] wlo2: driver: iwlwifi
wlo2: version: 5.18.2-051802-generic
wlo2: bus-info: 0000:00:0c.0
[Tue Jan 17 13:10:20 2023 GMT] nic: ip addresses (2)
[Tue Jan 17 13:10:20 2023 GMT] 127.0.0.1
[Tue Jan 17 13:10:20 2023 GMT] 192.168.0.112
[Tue Jan 17 13:10:20 2023 GMT] gcc version 11.3.0
[Tue Jan 17 13:10:20 2023 GMT] minify:n caches: query_char:? default_lang:ANSI C
[Tue Jan 17 13:10:20 2023 GMT] memory footprint: 3.18 MiB
[Tue Jan 17 13:10:20 2023 GMT] host 0.0.0.0:8081_PONG
[Tue Jan 17 13:10:20 2023 GMT] can't find '' Qspupdpm!Iboemfs!
[Tue Jan 17 13:10:20 2023 GMT]
To run G-WAN, you must fix the error(s) or remove this Qspupdpm!Iboemfs!
[Tue Jan 17 13:10:20 2023 GMT] exit(1):
To run G-WAN, you must fix the error(s) or remove this Qspupdpm!Iboemfs!
I did read somewhere that getting this installed on Debian/Ubuntu is a challenge, unlike on CentOS.
I have searched Stack Overflow, including the links suggested before posting this question, but found nothing specific to this issue.
A long time ago I downloaded a bash script for installing another plugin, and after that it worked. Now the link to that bash script redirects to an expired domain.
Any guidance or help to get this up and running would be appreciated. I hope I have followed the format of this forum.
Thank you
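A side observation on the unreadable token in the error message, offered as an assumption rather than a confirmed diagnosis: `Qspupdpm!Iboemfs!` looks like ordinary text in which every character code has been shifted up by one. Shifting each character back by one gives a readable phrase, which would suggest the message is about a protocol handler whose name (the part between the quotes) is empty. The helper `shift_back` below is hypothetical and only illustrates the decoding.

```python
def shift_back(text, offset=1):
    """Shift every character code down by `offset`.

    Assumption: the garbled token is plain text shifted up by one code
    point; this is inferred from the output, not documented by G-WAN.
    """
    return "".join(chr(ord(ch) - offset) for ch in text)

if __name__ == "__main__":
    garbled = "Qspupdpm!Iboemfs!"
    # repr() makes the trailing space visible
    print(repr(shift_back(garbled)))  # 'Protocol Handler '
```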

Is there any solution to the XFS lockup in Linux?

Apparently there is a known problem of XFS locking up the kernel/processes and corrupting volumes under heavy traffic.
Some web pages talk about it, but I was not able to figure out which pages are new and may have a solution.
My company's deployments have Debian with kernel 3.4.107, xfsprogs 3.1.4, and large storage arrays.
We have large data (PB) and high throughput (GB/sec) using async IO to several large volumes.
We constantly experience these unpredictable lockups on several systems.
Kernel logs/dmesg show something like the following:
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986515] INFO: task Sr2dReceiver-5:46829 blocked for more than 120 seconds.
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986518] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986520] Sr2dReceiver-5 D ffffffff8105b39e 0 46829 7284 0x00000000
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986524] ffff881e71f57b38 0000000000000082 000000000000000b ffff884066763180
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986529] 0000000000000000 ffff884066763180 0000000000011180 0000000000011180
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986532] ffff881e71f57fd8 ffff881e71f56000 0000000000011180 ffff881e71f56000
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986536] Call Trace:
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986545] [<ffffffff814ffe9f>] schedule+0x64/0x66
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986548] [<ffffffff815005f3>] rwsem_down_failed_common+0xdb/0x10d
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986551] [<ffffffff81500638>] rwsem_down_write_failed+0x13/0x15
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986555] [<ffffffff8126b583>] call_rwsem_down_write_failed+0x13/0x20
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986558] [<ffffffff814ff320>] ? down_write+0x25/0x27
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986572] [<ffffffffa01f29e0>] xfs_ilock+0xbc/0x12e [xfs]
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986580] [<ffffffffa01eec71>] xfs_rw_ilock+0x2c/0x33 [xfs]
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986586] [<ffffffffa01eec71>] ? xfs_rw_ilock+0x2c/0x33 [xfs]
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986593] [<ffffffffa01ef234>] xfs_file_aio_write_checks+0x41/0xfe [xfs]
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986600] [<ffffffffa01ef358>] xfs_file_buffered_aio_write+0x67/0x179 [xfs]
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986603] [<ffffffff8150099a>] ? _raw_spin_unlock_irqrestore+0x30/0x3d
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986611] [<ffffffffa01ef81d>] xfs_file_aio_write+0x163/0x1b5 [xfs]
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986614] [<ffffffff8106f1af>] ? futex_wait+0x22c/0x244
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986619] [<ffffffff8110038e>] do_sync_write+0xd9/0x116
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986622] [<ffffffff8150095f>] ? _raw_spin_unlock+0x26/0x31
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986634] [<ffffffff8106f2f1>] ? futex_wake+0xe8/0xfa
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986637] [<ffffffff81100d1d>] vfs_write+0xae/0x10a
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986639] [<ffffffff811015b3>] ? fget_light+0xb0/0xbf
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986642] [<ffffffff81100dd3>] sys_pwrite64+0x5a/0x79
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986645] [<ffffffff81506912>] system_call_fastpath+0x16/0x1b
Lockups leave the system in a bad state. The processes in D state that hang cannot even be killed with signal 9.
The only way to resume operations is to reboot, repair XFS and then the system works for another while.
But occasionally after the lockup we cannot even repair some volumes, as they get totally corrupted and we need to rebuild them with mkfs.
As a last resort, we now run xfs_repair periodically, which has reduced the frequency of lockups and data loss to a certain extent.
But the incidents still occur often enough, so we need some solution.
I was wondering if there is a solution for this with kernel 3.4.107, e.g. some patch that we may apply.
Due to the large number of deployments and other software issues, we cannot upgrade the kernel in the near future.
However, we are working towards updating our applications so that we can run on kernel 3.16 in our next releases.
Does anyone know if this XFS lockup problem was fixed in 3.16?
Some people have experienced this, but it was not a problem with XFS: the kernel was unable to flush dirty pages within the 120-second time limit. Have a look at the pages below, but please check the numbers they quote as defaults against your own system.
http://blog.ronnyegner-consulting.de/2011/10/13/info-task-blocked-for-more-than-120-seconds/
and here
http://www.blackmoreops.com/2014/09/22/linux-kernel-panic-issue-fix-hung_task_timeout_secs-blocked-120-seconds-problem/
You can see what your dirty cache ratio is by running this:
sysctl -a | grep dirty
or
cat /proc/sys/vm/dirty_ratio
The best write-up on this I could find is here:
https://lonesysadmin.net/2013/12/22/better-linux-disk-caching-performance-vm-dirty_ratio/
Essentially, you need to tune your application so that it can write the dirty buffers to disk within the timeout, or change the timeout itself.
You can also see some interesting parameters as follows:
sysctl -a | grep hung
You could increase the timeout permanently using /etc/sysctl.conf as follows...
kernel.hung_task_timeout_secs = 300
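To compare the settings mentioned above in one place, the `key = value` lines printed by `sysctl -a` can be parsed into a dictionary. This is a minimal Python sketch; `parse_sysctl` is a hypothetical helper, and the numbers in `SAMPLE` are purely illustrative, not recommendations.

```python
def parse_sysctl(text):
    """Parse 'key = value' lines (the format printed by `sysctl -a`) into a dict."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that contain no '='
            settings[key.strip()] = value.strip()
    return settings

# Illustrative values only; check the real ones on your own system.
SAMPLE = """\
kernel.hung_task_timeout_secs = 120
vm.dirty_background_ratio = 10
vm.dirty_ratio = 20
"""

if __name__ == "__main__":
    cfg = parse_sysctl(SAMPLE)
    for name in ("vm.dirty_background_ratio", "vm.dirty_ratio",
                 "kernel.hung_task_timeout_secs"):
        print(f"{name}: {cfg.get(name, 'not set')}")
```

On a real system the input text could come from `subprocess.run(["sysctl", "-a"], capture_output=True, text=True).stdout` instead of `SAMPLE`.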
Does anyone know if this XFS lockup problem was fixed in 3.16?
According to A Short Guide to Kernel Debugging, it was:
Searching for “xfs splice deadlock” turns up an email thread from 2011 that describes this problem. However, bisecting the kernel source repository shows that the bug wasn’t really addressed until April, 2014 (8d02076) for release in Linux 3.16.

Synchronize date in Apache Storm cluster

I run an Apache Storm topology on a cluster of 8+1 machines. The clocks on these machines are not in sync, and they can differ by more than 5 minutes.
preprod-storm-nimbus-01:
Thu Feb 25 16:20:30 GMT 2016
preprod-storm-supervisor-01:
Thu Feb 25 16:20:32 GMT 2016
preprod-storm-supervisor-02:
Thu Feb 25 16:20:32 GMT 2016
preprod-storm-supervisor-03:
Thu Feb 25 16:14:54 UTC 2016 <<-- this machine is very late :(
preprod-storm-supervisor-04:
Thu Feb 25 16:20:31 GMT 2016
preprod-storm-supervisor-05:
Thu Feb 25 16:20:17 GMT 2016
preprod-storm-supervisor-06:
Thu Feb 25 16:20:00 GMT 2016
preprod-storm-supervisor-07:
Thu Feb 25 16:20:31 GMT 2016
preprod-storm-supervisor-08:
Thu Feb 25 16:19:55 GMT 2016
preprod-storm-supervisor-09:
Thu Feb 25 16:20:30 GMT 2016
Question:
Is the storm topology affected by this non-synchronization?
Note: I know that synchronizing is better, but the sysadmins won't do it unless they are given proof or reasons that they have to. Do they really have to do it, "for the topology's sake" :) ?
Thanks
It depends on the computation you are doing. It might affect your results if you do time-based window operations. Otherwise, it doesn't matter.
For Storm as an execution engine it has no effect at all.
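To illustrate the time-based window case: if each machine stamps events with its own clock, a skewed clock can place the same event in a different window. This is a minimal sketch in plain Python, not Storm's API; `window_start` and the 5-minute tumbling window are assumptions for illustration, and the two timestamps are the nimbus-01 and supervisor-03 readings from the question, assumed to have been taken at about the same moment.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)  # assumed tumbling-window size

def window_start(ts, size=WINDOW):
    """Return the start of the tumbling window that contains timestamp `ts`."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    buckets = (ts - midnight) // size  # whole windows elapsed since midnight
    return midnight + buckets * size

if __name__ == "__main__":
    synced = datetime(2016, 2, 25, 16, 20, 30)  # clock on nimbus-01
    late = datetime(2016, 2, 25, 16, 14, 54)    # clock on supervisor-03
    print(window_start(synced).time())  # 16:20:00
    print(window_start(late).time())    # 16:10:00
```

With these readings the late machine would assign the event to a window two slots earlier, so any per-window count or aggregate keyed on local time would be skewed. A topology that does not group by wall-clock time is not affected in this way.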

What's the safest way to shut down MongoDB when running as a Windows Service?

I have a single instance of MongoDB 2.4.8 running on Windows Server 2012 R2. MongoDB is installed as a Windows Service. I have journalling enabled.
The MongoDB documentation suggests that the MongoDB service should just be shut down via the Windows Service Control Manager:
net stop MongoDB
When I did this recently, the following was logged and I ended up with a non-zero byte mongod.lock file on disk. (I used the --repair option to fix this but it turns out this probably wasn't necessary as I had journalling enabled.)
Thu Nov 21 11:08:12.011 [serviceShutdown] got SERVICE_CONTROL_STOP request from Windows Service Control Manager, will terminate after current cmd ends
Thu Nov 21 11:08:12.043 [serviceShutdown] now exiting
Thu Nov 21 11:08:12.043 dbexit:
Thu Nov 21 11:08:12.043 [serviceShutdown] shutdown: going to close listening sockets...
Thu Nov 21 11:08:12.043 [serviceShutdown] closing listening socket: 1492
Thu Nov 21 11:08:12.043 [serviceShutdown] closing listening socket: 1500
Thu Nov 21 11:08:12.043 [serviceShutdown] shutdown: going to flush diaglog...
Thu Nov 21 11:08:12.043 [serviceShutdown] shutdown: going to close sockets...
Thu Nov 21 11:08:12.043 [serviceShutdown] shutdown: waiting for fs preallocator...
Thu Nov 21 11:08:12.043 [serviceShutdown] shutdown: lock for final commit...
Thu Nov 21 11:08:12.043 [serviceShutdown] shutdown: final commit...
Thu Nov 21 11:08:12.043 [conn1333] end connection 127.0.0.1:51612 (18 connections now open)
Thu Nov 21 11:08:12.043 [conn1331] end connection 127.0.0.1:51610 (18 connections now open)
...snip...
Thu Nov 21 11:08:12.043 [conn1322] end connection 10.1.2.212:53303 (17 connections now open)
Thu Nov 21 11:08:12.043 [conn1337] end connection 127.0.0.1:51620 (18 connections now open)
Thu Nov 21 11:08:12.839 [serviceShutdown] shutdown: closing all files...
Thu Nov 21 11:08:14.683 [serviceShutdown] Progress: 5/163 3% (File Closing Progress)
Thu Nov 21 11:08:16.012 [serviceShutdown] Progress: 6/163 3% (File Closing Progress)
...snip...
Thu Nov 21 11:08:52.030 [serviceShutdown] Progress: 143/163 87% (File Closing Progress)
Thu Nov 21 11:08:54.092 [serviceShutdown] Progress: 153/163 93% (File Closing Progress)
Thu Nov 21 11:08:55.405 [serviceShutdown] closeAllFiles() finished
Thu Nov 21 11:08:55.405 [serviceShutdown] journalCleanup...
Thu Nov 21 11:08:55.405 [serviceShutdown] removeJournalFiles
Thu Nov 21 11:09:05.578 [DataFileSync] ERROR: Client::shutdown not called: DataFileSync
The last line is my main concern.
I'm also interested in how MongoDB is able to take longer to shut down than Windows normally allows for a service shutdown. At what point is it safe to shut down the machine without checking the log file?
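One check that does not require reading the log follows from the lock-file observation above: after a clean shutdown, mongod.lock is expected to be empty. This is a minimal Python sketch, assuming that convention holds for this MongoDB version; `lock_file_is_clean` and its `dbpath` argument are hypothetical names, and the check is a heuristic, not an official MongoDB procedure.

```python
import os

def lock_file_is_clean(dbpath):
    """Return True if mongod.lock under `dbpath` is absent or zero bytes.

    Assumption: a non-empty lock file means mongod is still running
    or did not shut down cleanly.
    """
    lock_path = os.path.join(dbpath, "mongod.lock")
    try:
        return os.path.getsize(lock_path) == 0
    except FileNotFoundError:
        return True  # no lock file at all

if __name__ == "__main__":
    import tempfile
    with tempfile.TemporaryDirectory() as d:
        # Simulate a clean shutdown: an empty lock file.
        open(os.path.join(d, "mongod.lock"), "w").close()
        print(lock_file_is_clean(d))
```

A script could poll this after `net stop MongoDB` returns and only proceed with the machine shutdown once the file is empty and the mongod process has exited.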

Weird behavior with gwan v3.12.26

I just installed the latest Xmas gift from the G-WAN team, but I'm having some problems:
1. Segmentation fault with Arch Linux.
2. Strange behavior on Ubuntu.
3. I can't run any script on it.
About #1, Arch Linux is up to date and uses glibc 2.16.
About #2, when I load http://188.165.219.99:8080/100.html, it sometimes displays 100 X, sometimes an error page (with CSS), and sometimes an error page without CSS.
About #3, I can't run any csp script:
http://188.165.219.99:8080/?hello.c
http://188.165.219.99:8080/?hello.rb
http://188.165.219.99:8080/?hello.php
None of the above work. Has the csp url changed?
I have installed php5-cli and ruby on my ubuntu.
For information:
# ldd --version
ldd (Ubuntu EGLIBC 2.11.1-0ubuntu7.11) 2.11.1
Copyright (C) 2009 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.
Here is the log on Arch Linux:
# cat logs/gwan.log
[Thu Dec 27 14:03:44 2012 GMT] ------------------------------------------------
[Thu Dec 27 14:03:44 2012 GMT] G-WAN 3.12.26 64-bit (Dec 26 2012 13:58:12)
[Thu Dec 27 14:03:44 2012 GMT] ------------------------------------------------
[Thu Dec 27 14:03:44 2012 GMT] Local Time: Thu, 27 Dec 2012 16:03:44 GMT+2
[Thu Dec 27 14:03:44 2012 GMT] RAM: (1.60 GiB free + 0 shared + 834.57 MiB buffers) / 23.64 GiB total
[Thu Dec 27 14:03:44 2012 GMT] Physical Pages: 1.60 GiB / 23.64 GiB
[Thu Dec 27 14:03:44 2012 GMT] DISK: 1.71 TiB free / 1.88 TiB total
[Thu Dec 27 14:03:44 2012 GMT] 336 processes, including pid:1545 './gwan'
[Thu Dec 27 14:03:44 2012 GMT] Multi-Core, HT enabled
[Thu Dec 27 14:03:44 2012 GMT] 1x Intel(R) Xeon(R) CPU W3520 @ 2.67GHz (4 Core(s)/CPU, 2 thread(s)/Core)
[Thu Dec 27 14:03:44 2012 GMT] using 4 workers 0[1111]3
[Thu Dec 27 14:03:44 2012 GMT] among 8 threads 0[11110000]7
[Thu Dec 27 14:03:44 2012 GMT] 64-bit little-endian (least significant byte first)
1: segmentation fault with archlinux
A lot of information is listed at the top of the /logs/gwan.log file. It would help immensely to see it (that's what log files are for).
2: on ubuntu strange behavior
Not knowing the nature of the "strange behavior" nor the Ubuntu version makes it difficult to answer your question (if that's a question).
3: I can't run any script on it
We received more than 50 emails this morning alone. None (but yours) report that scripts fail to run. So far, people report great results with all languages.
Check your files permissions (can the G-WAN account read the script files?) and verify that the relevant compilers / runtimes are installed.
Ogla's report suggests that JavaScript embedded in HTML (the new release does minifying there) might be the cause of the cut files.
Other than that, the fact that the daemon mode is broken in v3.12.25/26 may explain your problems if you run in daemon mode.

Resources