Weird behavior with gwan v3.12.26

I just installed the latest Xmas gift from the G-WAN team, but I'm having some problems:
1. Segmentation fault on Arch Linux.
2. Strange behavior on Ubuntu.
3. I can't run any script on it.
About #1, Arch Linux is up to date and uses GLIBC 2.16.
About #2, I'm loading http://188.165.219.99:8080/100.html. Sometimes it displays 100 X, sometimes an error page (with CSS), and sometimes an error page without CSS.
About #3, I can't run any csp script:
http://188.165.219.99:8080/?hello.c
http://188.165.219.99:8080/?hello.rb
http://188.165.219.99:8080/?hello.php
None of the above work. Has the csp url changed?
I have installed php5-cli and ruby on my ubuntu.
For information:
# ldd --version
ldd (Ubuntu EGLIBC 2.11.1-0ubuntu7.11) 2.11.1
Copyright (C) 2009 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.
Here is the log on Arch Linux:
# cat logs/gwan.log
[Thu Dec 27 14:03:44 2012 GMT] ------------------------------------------------
[Thu Dec 27 14:03:44 2012 GMT] G-WAN 3.12.26 64-bit (Dec 26 2012 13:58:12)
[Thu Dec 27 14:03:44 2012 GMT] ------------------------------------------------
[Thu Dec 27 14:03:44 2012 GMT] Local Time: Thu, 27 Dec 2012 16:03:44 GMT+2
[Thu Dec 27 14:03:44 2012 GMT] RAM: (1.60 GiB free + 0 shared + 834.57 MiB buffers) / 23.64 GiB total
[Thu Dec 27 14:03:44 2012 GMT] Physical Pages: 1.60 GiB / 23.64 GiB
[Thu Dec 27 14:03:44 2012 GMT] DISK: 1.71 TiB free / 1.88 TiB total
[Thu Dec 27 14:03:44 2012 GMT] 336 processes, including pid:1545 './gwan'
[Thu Dec 27 14:03:44 2012 GMT] Multi-Core, HT enabled
[Thu Dec 27 14:03:44 2012 GMT] 1x Intel(R) Xeon(R) CPU W3520 @ 2.67GHz (4 Core(s)/CPU, 2 thread(s)/Core)
[Thu Dec 27 14:03:44 2012 GMT] using 4 workers 0[1111]3
[Thu Dec 27 14:03:44 2012 GMT] among 8 threads 0[11110000]7
[Thu Dec 27 14:03:44 2012 GMT] 64-bit little-endian (least significant byte first)

1: segmentation fault with archlinux
A lot of information is listed at the top of the /logs/gwan.log file. It would help immensely to see it (that's what log files are for).
2: on ubuntu strange behavior
Not knowing the nature of the "strange behavior" or the Ubuntu version makes it difficult to answer your question (if that's a question).
3: I can't run any script on it
We received more than 50 emails this morning alone. None but yours reports that scripts fail to run. So far, people report great results with all languages.
Check your files permissions (can the G-WAN account read the script files?) and verify that the relevant compilers / runtimes are installed.
Ogla's report suggests that JavaScript embedded in HTML (the new release minifies it) might be the cause of the cut files.
Other than that, the fact that the daemon mode is broken in v3.12.25/26 may explain your problems if you run in daemon mode.

Is there any solution to the XFS lockup in linux?

Apparently there is a known problem of XFS locking up the kernel/processes and corrupting volumes under heavy traffic.
Some web pages talk about it, but I was not able to figure out which pages are new and may have a solution.
My company's deployments have Debian with kernel 3.4.107, xfsprogs 3.1.4, and large storage arrays.
We have large data (PB) and high throughput (GB/sec) using async IO to several large volumes.
We constantly experience these unpredictable lockups on several systems.
Kernel logs/dmesg show something like the following:
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986515] INFO: task Sr2dReceiver-5:46829 blocked for more than 120 seconds.
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986518] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986520] Sr2dReceiver-5 D ffffffff8105b39e 0 46829 7284 0x00000000
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986524] ffff881e71f57b38 0000000000000082 000000000000000b ffff884066763180
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986529] 0000000000000000 ffff884066763180 0000000000011180 0000000000011180
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986532] ffff881e71f57fd8 ffff881e71f56000 0000000000011180 ffff881e71f56000
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986536] Call Trace:
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986545] [<ffffffff814ffe9f>] schedule+0x64/0x66
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986548] [<ffffffff815005f3>] rwsem_down_failed_common+0xdb/0x10d
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986551] [<ffffffff81500638>] rwsem_down_write_failed+0x13/0x15
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986555] [<ffffffff8126b583>] call_rwsem_down_write_failed+0x13/0x20
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986558] [<ffffffff814ff320>] ? down_write+0x25/0x27
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986572] [<ffffffffa01f29e0>] xfs_ilock+0xbc/0x12e [xfs]
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986580] [<ffffffffa01eec71>] xfs_rw_ilock+0x2c/0x33 [xfs]
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986586] [<ffffffffa01eec71>] ? xfs_rw_ilock+0x2c/0x33 [xfs]
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986593] [<ffffffffa01ef234>] xfs_file_aio_write_checks+0x41/0xfe [xfs]
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986600] [<ffffffffa01ef358>] xfs_file_buffered_aio_write+0x67/0x179 [xfs]
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986603] [<ffffffff8150099a>] ? _raw_spin_unlock_irqrestore+0x30/0x3d
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986611] [<ffffffffa01ef81d>] xfs_file_aio_write+0x163/0x1b5 [xfs]
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986614] [<ffffffff8106f1af>] ? futex_wait+0x22c/0x244
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986619] [<ffffffff8110038e>] do_sync_write+0xd9/0x116
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986622] [<ffffffff8150095f>] ? _raw_spin_unlock+0x26/0x31
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986634] [<ffffffff8106f2f1>] ? futex_wake+0xe8/0xfa
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986637] [<ffffffff81100d1d>] vfs_write+0xae/0x10a
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986639] [<ffffffff811015b3>] ? fget_light+0xb0/0xbf
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986642] [<ffffffff81100dd3>] sys_pwrite64+0x5a/0x79
2016 Mar 24 04:42:34 hmtmzhbgb01-ssu-1 kernel: [2358750.986645] [<ffffffff81506912>] system_call_fastpath+0x16/0x1b
Lockups leave the system in a bad state. The processes in D state that hang cannot even be killed with signal 9.
The only way to resume operations is to reboot, repair XFS and then the system works for another while.
But occasionally after the lockup we cannot even repair some volumes, as they get totally corrupted and we need to rebuild them with mkfs.
As a last resort, we now run xfs_repair periodically, which has reduced the frequency of lockups and data loss to a certain extent.
But the incidents still occur often enough, so we need some solution.
I was wondering if there is a solution for this with kernel 3.4.107, e.g. some patch that we may apply.
Due to the large number of deployments and other software issues, we cannot upgrade the kernel in the near future.
However, we are working towards updating our applications so that we can run on kernel 3.16 in our next releases.
Does anyone know if this XFS lockup problem was fixed in 3.16?
Some people have experienced this when it was not a problem with XFS itself: the kernel was unable to flush dirty pages within the 120-second period. Have a look at the links below, but please check the default values they quote against the ones on your own system.
http://blog.ronnyegner-consulting.de/2011/10/13/info-task-blocked-for-more-than-120-seconds/
and here
http://www.blackmoreops.com/2014/09/22/linux-kernel-panic-issue-fix-hung_task_timeout_secs-blocked-120-seconds-problem/
You can see what your dirty cache ratio is by running this:
sysctl -a | grep dirty
or
cat /proc/sys/vm/dirty_ratio
The best write-up on this I could find is here:
https://lonesysadmin.net/2013/12/22/better-linux-disk-caching-performance-vm-dirty_ratio/
Essentially, you need to tune your application so that the dirty buffers can be written to disk within the timeout period, or change the timeout period itself.
You can also see some interesting parameters as follows:
sysctl -a | grep hung
You could increase the timeout permanently using /etc/sysctl.conf as follows...
kernel.hung_task_timeout_secs = 300
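As a rough sketch, the values mentioned above can also be collected in one pass by reading the files under /proc/sys directly. The helper names `read_sysctl` and `report` are made up for this illustration, and the list of keys is only a starting point; keys that do not exist on a given kernel are reported as None.

```python
from pathlib import Path


def read_sysctl(name, root="/proc/sys"):
    """Return the value of a sysctl key such as 'vm.dirty_ratio' as a string.

    The dotted key is mapped to a path under root, e.g. vm/dirty_ratio.
    """
    return Path(root, *name.split(".")).read_text().strip()


def report(names, root="/proc/sys"):
    """Collect the requested keys; missing or unreadable keys map to None."""
    values = {}
    for name in names:
        try:
            values[name] = read_sysctl(name, root)
        except OSError:
            values[name] = None
    return values


if __name__ == "__main__":
    keys = [
        "vm.dirty_ratio",
        "vm.dirty_background_ratio",
        "vm.dirty_expire_centisecs",
        "vm.dirty_writeback_centisecs",
        "kernel.hung_task_timeout_secs",
    ]
    for key, value in report(keys).items():
        print(f"{key} = {value}")
```

Running it on each affected machine before and after a tuning change gives a quick record of what was actually in effect.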
Does anyone know if this XFS lockup problem was fixed in 3.16?
It is said so in "A Short Guide to Kernel Debugging":
Searching for “xfs splice deadlock” turns up an email thread from 2011 that describes this problem. However, bisecting the kernel source repository shows that the bug wasn’t really addressed until April, 2014 (8d02076) for release in Linux 3.16.

synchronize date in apache storm cluster

I use an apache storm topology on a cluster of 8+1 machines. The date on these machines is not the same and we may have more than 5 minutes of difference.
preprod-storm-nimbus-01:
Thu Feb 25 16:20:30 GMT 2016
preprod-storm-supervisor-01:
Thu Feb 25 16:20:32 GMT 2016
preprod-storm-supervisor-02:
Thu Feb 25 16:20:32 GMT 2016
preprod-storm-supervisor-03:
Thu Feb 25 16:14:54 UTC 2016 <<-- this machine is very late :(
preprod-storm-supervisor-04:
Thu Feb 25 16:20:31 GMT 2016
preprod-storm-supervisor-05:
Thu Feb 25 16:20:17 GMT 2016
preprod-storm-supervisor-06:
Thu Feb 25 16:20:00 GMT 2016
preprod-storm-supervisor-07:
Thu Feb 25 16:20:31 GMT 2016
preprod-storm-supervisor-08:
Thu Feb 25 16:19:55 GMT 2016
preprod-storm-supervisor-09:
Thu Feb 25 16:20:30 GMT 2016
Question:
Is the storm topology affected by this non-synchronization?
Note: I know that synchronizing is better, but the sysadmins won't do it unless I give them proof or reasons that they have to. Do they really have to do it, "for the topology's sake" :) ?
Thanks
It depends on the computation you are doing... It might have an effect on your result if you do time based window operations. Otherwise, it doesn't matter.
For Storm as an execution engine it has no effect at all.
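To make the time-window caveat concrete, here is a small sketch that is independent of Storm itself. It assumes the clock readings in the question were taken at roughly the same moment, that tuples are stamped with the local wall clock of the machine that emits them, and that the topology uses 5-minute tumbling windows; the window size and the `window_start` helper are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=5)  # assumed window size, for illustration only
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def window_start(ts, window=WINDOW):
    """Return the start of the tumbling window that contains ts."""
    buckets = (ts - EPOCH) // window  # whole windows elapsed since the epoch
    return EPOCH + buckets * window


# Clock readings taken from the listing in the question.
nimbus = datetime(2016, 2, 25, 16, 20, 30, tzinfo=timezone.utc)
supervisor_03 = datetime(2016, 2, 25, 16, 14, 54, tzinfo=timezone.utc)

print("nimbus window:       ", window_start(nimbus).isoformat())
print("supervisor-03 window:", window_start(supervisor_03).isoformat())
print("same window:", window_start(nimbus) == window_start(supervisor_03))
```

Under these assumptions the same instant is assigned to the 16:20 window on one machine and to the 16:10 window on the late one, so a time-based aggregation would count the event in the wrong bucket.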

aerospike on openvz 2core 4Gb ram doesn't start and doesn't give errors

After installing without any trouble, I started Aerospike on an OpenVZ VPS with 2 cores and 4 GB of RAM.
this is the result:
root@outland:~# /etc/init.d/aerospike start
* Start aerospike: asd [OK]
then check for running asd:
root@outland:~# /etc/init.d/aerospike status
* Halt aerospike: asd [fail]
what is going wrong?
adding logs:
Mar 03 2015 15:17:57 GMT: INFO (config): (cfg.c::3033) system file descriptor limit: 100000, proto-fd-max: 15000
Mar 03 2015 15:17:57 GMT: WARNING (cf:misc): (id.c::249) Tried eth,bond,wlan and list of all available interfaces on device.Failed to retrieve physical address with errno 19 No such device
Mar 03 2015 15:17:57 GMT: CRITICAL (config): (cfg.c:3363) could not get unique id and/or ip address
Mar 03 2015 15:17:57 GMT: WARNING (as): (signal.c::120) SIGINT received, shutting down
Mar 03 2015 15:17:57 GMT: WARNING (as): (signal.c::123) startup was not complete, exiting immediately
This is a configuration problem:
Mar 03 2015 15:17:57 GMT: WARNING (cf:misc): (id.c::249) Tried eth,bond,wlan and list of all available interfaces on device.Failed to retrieve physical address with errno 19 No such device
Mar 03 2015 15:17:57 GMT: CRITICAL (config): (cfg.c:3363) could not get unique id and/or ip address
Basically, the VPS has a non-standard interface name.
The solution is to add your interface name as network-interface-name to the config.
http://www.aerospike.com/docs/operations/troubleshoot/startup/#problem-with-network-interface
Which OS are you using, btw?
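To find the interface name to put in the config, something like the following sketch can list the interfaces the VPS actually exposes and pick out the ones that do not start with the prefixes named in the warning (eth, bond, wlan). The helper names are invented for this example, and "venet0" in the usage note is only an illustrative name, not something taken from the logs above.

```python
import socket


def interface_names():
    """Return the names of the network interfaces visible on this host."""
    return [name for _index, name in socket.if_nameindex()]


def nonstandard(names, prefixes=("eth", "bond", "wlan")):
    """Keep names that are neither loopback nor start with a tried prefix."""
    return [n for n in names if n != "lo" and not n.startswith(prefixes)]


if __name__ == "__main__":
    names = interface_names()
    print("all interfaces:   ", names)
    print("candidates to set:", nonstandard(names))
```

For instance, `nonstandard(["lo", "eth0", "venet0"])` returns `["venet0"]`, which would then be the value to use for network-interface-name.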

Gwan stops working every night

I have an Arch 64-bit VPS on DigitalOcean. I installed gwan and run it in daemon mode. It stops running every night at midnight.
Here is the log file
[Wed Apr 24 06:10:28 2013 GMT] memory footprint: 3.78 MiB
[Thu, 25 Apr 2013 00:00:19 GMT] * child abort(8) coredump
[Thu, 25 Apr 2013 00:00:19 GMT] * child abort(8) coredump
[Thu, 25 Apr 2013 00:00:19 GMT] * child abort(8) coredump
[Thu, 25 Apr 2013 00:00:19 GMT] * child died 3 times within 3 seconds
[Thu Apr 25 12:39:39 2013 GMT] memory footprint: 3.77 MiB.
[Thu Apr 25 12:39:56 2013 GMT] loaded maintenance script/opt/gwan_linux64-bit/0.0.0.0_8080/#0.0.0.0/csp/crash.c 43.14 KiB MD5:820cf6b4-2152b838-08a13fcb-5f0dc4be
[Fri, 26 Apr 2013 00:00:10 GMT] * child abort(8) coredump
[Fri, 26 Apr 2013 00:00:10 GMT] * child abort(8) coredump
[Fri, 26 Apr 2013 00:00:10 GMT] * child abort(8) coredump
[Fri, 26 Apr 2013 00:00:10 GMT] * child died 3 times within 3 seconds
This problem does not happen on all platforms, and so far all the user reports we have received involved hypervisors, which alter CPU and OS behavior in erratic and undocumented ways (not to mention the additional bugs they inject into the system).
UPDATE
This new problem, in 4-year-old code that had worked fine until now, is a platform issue. We have found a workaround, which will be published with the next release in a few weeks.

Writing jsp tag library

I am using an MVC model in which I write many web-based widgets for different applications, which causes a lot of repetitive work. To solve this, I plan to write a package of JSP tags for the widgets (using a TLD) and include the generated jar in every application that uses those widgets. I have been able to do that successfully.
But I am a bit concerned about the CSS and JavaScript used by the widgets.
If I write the CSS in the JSP tag itself, inside the library, then the CSS and script are fetched every time, which incurs extra latency. If I instead keep the common CSS on the client side, then every application that uses my widget package has to write the CSS again and again.
This is the widget jar that I included in my MVC project:
jar -tvf AcmeUIUtils-1.0.jar
0 Fri Dec 07 07:41:56 IST 2012 META-INF/
106 Fri Dec 07 07:41:54 IST 2012 META-INF/MANIFEST.MF
0 Fri Dec 07 15:54:40 IST 2012 com/
0 Fri Dec 07 15:54:40 IST 2012 com/amazon/
0 Fri Dec 07 15:54:40 IST 2012 com/amazon/spotui/
0 Fri Dec 07 15:54:40 IST 2012 com/amazon/spotui/basicui/
2339 Fri Dec 07 02:11:38 IST 2012 com/amazon/spotui/basicui/AcmeMessage.class
1684 Fri Dec 07 15:54:40 IST 2012 com/amazon/spotui/basicui/Ping.class
0 Fri Dec 07 15:54:40 IST 2012 com/amazon/spotui/utils/
2989 Fri Dec 07 15:54:40 IST 2012 com/amazon/spotui/utils/AcmeTags.class
0 Fri Dec 07 07:41:40 IST 2012 META-INF/css/
635 Fri Dec 07 07:40:14 IST 2012 META-INF/css/error.css
1059 Fri Dec 07 14:47:20 IST 2012 META-INF/spot-ui-component.tld
0 Fri Dec 07 15:54:40 IST 2012 test-resources/
My question is: how do I load error.css in my application in an elegant way? Or do I need to make changes at the widget level?
I don't mind any open-source solution to the problem, but I need JSP tags only.
Since servlet 3.0, a jar file placed under WEB-INF/lib can contain resources that will be served directly by the webapp. These resources must be placed under the META-INF/resources directory of the jar file.
So if your tag library jar contains a file META-INF/resources/js/MyTaglib.js, this file will be directly available using the URL
http://the.host.com/theWebApp/js/MyTaglib.js
If you're targeting pre-servlet 3.0 webapps, then tell the developers to deploy the CSS and JS files of your taglib under a specific directory in the webapp.
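In the jar listing from the question, error.css sits under META-INF/css/, which is outside META-INF/resources/, so by the rule above it would not be served as-is. A quick way to check a jar before deploying it is sketched below; the `served_urls` helper is hypothetical, and "/theWebApp" is the context path borrowed from the example URL above.

```python
import zipfile

PREFIX = "META-INF/resources/"


def served_urls(jar_path, context_path="/theWebApp"):
    """Map jar entries under META-INF/resources/ to the URL paths they
    would be exposed at in a servlet 3.0 webapp. Directory entries and
    anything outside that prefix are ignored."""
    with zipfile.ZipFile(jar_path) as jar:
        return {
            name: context_path + "/" + name[len(PREFIX):]
            for name in jar.namelist()
            if name.startswith(PREFIX) and not name.endswith("/")
        }


if __name__ == "__main__":
    import sys

    for entry, url in served_urls(sys.argv[1]).items():
        print(f"{entry} -> {url}")
```

An empty result for the widget jar means the stylesheet has to be moved, for example to META-INF/resources/css/error.css, before applications can link to it.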
