Riak eating 100% CPU on OSX install

This question is related to:
Riak node not working, but using 100% cpu
but since the poster seems to have left, I'm posting my case here.
Last night I installed Erlang (R15B01) from source, using the config options from the Riak website:
http://docs.basho.com/riak/1.2.1/tutorials/installation/Installing-Erlang/#Installing-on-Mac-OS-X
and Riak (1.4.1) on my 2013 MacBook Pro (2.8 GHz i7, 16 GB RAM, OSX 10.8.3). I did not change the ulimit, as I assumed it would be fine for a vanilla run.
Installation went fine; warnings but no errors, and I was able to run the toy examples no problem.
However, the empty instance quickly ate through all 4 cores and my machine started whining and overheating.
Looking in the logs I see the following error repeated a jillion times:
2013-10-11 09:04:04.266 [error] CRASH REPORT \
Process with 0 neighbours exited with reason: \
call to undefined function eleveldb:o
also tons of crash reports:
2013-10-11 09:14:47 =CRASH REPORT====
crasher:
initial call: riak_kv_index_hashtree:init/1
pid:
registered_name: []
exception exit: {{undef,[{eleveldb,open,
["./data/anti_entropy/479555224749202520035584085735030365824602865664",
[{create_if_missing,true},{max_open_files,20},{write_buffer_size,12886952}]],[]},
{hashtree,new_segment_store,2,[{file,"src/hashtree.erl"},{line,499}]},{hashtree,new,2,
[{file,"src/hashtree.erl"},{line,215}]},{riak_kv_index_hashtree,do_new_tree,2,
[{file,"src/riak_kv_index_hashtree.erl"},{line,421}]},{lists,foldl,3,[{file,"lists.erl"},
{line,1197}]},{riak_kv_index_hashtree,init_trees,2,[{file,"src/riak_kv_index_hashtree.erl"},
{line,366}]},{riak_kv_index_hashtree,init,1,[{file,"src/riak_kv_index_hashtree.erl"},
{line,226}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,304}]}]},
[{gen_server,init_it,6,[{file,"gen_server.erl"},{line,328}]},{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,227}]}]}
ancestors: [,riak_core_vnode_sup,riak_core_sup,]
messages: []
links: []
dictionary: []
trap_exit: false
status: running
heap_size: 987
stack_size: 24
reductions: 492
neighbours:
erlang.log says
=====
===== LOGGING STARTED Fri Oct 11 09:04:01 CEST 2013
=====
Node 'riak@127.0.0.1' not responding to pings.
config is OK
!!!!
!!!! WARNING: ulimit -n is 2560; 4096 is the recommended minimum.
!!!!
Exec: /tmp/riak-1.4.1/rel/riak/bin/../erts-5.9.1/bin/erlexec
-boot /tmp/riak-1.4.1/rel/riak/bin/../releases/1.4.1/riak
-config /tmp/riak-1.4.1/rel/riak/bin/../etc/app.config
-pa /tmp/riak-1.4.1/rel/riak/bin/../lib/basho-patches
-args_file /tmp/riak-1.4.1/rel/riak/bin/../etc/vm.args -- console
Root: /tmp/riak-1.4.1/rel/riak/bin/..
Erlang R15B01 (erts-5.9.1) [source] [64-bit] [smp:8:8] [async-threads:64]
[kernel-poll:true]
Eshell V5.9.1 (abort with ^G)
(riak@127.0.0.1)1>
After less than 10 minutes there are already 144 MB of log files with variations of the above.

I had the same problem after building Riak 1.4.6 from source.
I changed the line in etc/app.config to
{anti_entropy, {off, []}},
LevelDB is used by AAE (active anti-entropy). See the config parameter anti_entropy_leveldb_opts.
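For context, a minimal sketch of how that setting sits inside etc/app.config; the surrounding riak_kv entries and the option values are only illustrative, and the anti_entropy line is the one change described above:
%% etc/app.config (excerpt) -- illustrative sketch, not a complete config
{riak_kv, [
    %% Turn off active anti-entropy so the eleveldb-backed hashtrees are
    %% never opened; this stops the undef/crash loop shown in the logs above.
    {anti_entropy, {off, []}},

    %% Options handed to LevelDB when AAE is enabled (the parameter the
    %% answer above refers to); the values here are only examples.
    {anti_entropy_leveldb_opts, [{write_buffer_size, 4194304},
                                 {max_open_files, 20}]}
]},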

Use process of elimination:
It's hard to say without more information. Is the 200% being used by beam.smp? Do you see anything in console.log, error.log or crash.log that would indicate something odd is happening? Are there clients communicating with the cluster at the time? If so, what client/protocol are they using and what types of operations are being performed (e.g. get/put/map reduce/etc)?
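To rule the clients out and confirm that the undef error really means the eleveldb NIF is not loadable, one option is to attach to the running node with riak attach and poke at the module directly. A minimal sketch, using the node name from the erlang.log above; the /tmp path is just an example:
(riak@127.0.0.1)1> code:which(eleveldb).
%% A path under lib/eleveldb-*/ebin means the module is on the code path;
%% the atom 'non_existing' means the build/packaging is broken.
(riak@127.0.0.1)2> catch eleveldb:open("/tmp/eleveldb_smoke_test",
                                       [{create_if_missing, true}]).
%% {ok, _Ref} means the NIF loaded and LevelDB works on this machine;
%% {'EXIT', {undef, _}} reproduces exactly the failure from the crash report.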
References
Riak consuming too much CPU
Interesting sawtooth increasing CPU usage on lightly-used Riak
Inspecting a Node
Riak Performance Tuning
Open Files Limit
Configuration Files

Related

AWS Lambda Chalice Layers Segmentation Fault

I am deploying a Python 3.7 Lambda function via Chalice. Because the code with its environment requirements is larger than the 50 MB limit, I am using the "automatic_layer" feature of Chalice to generate the layer with the requirements, which is awswrangler.
Because the generated layer is > 50 MB, I am uploading the generated managed-layer-...-python3.7.zip manually to S3 and creating a Lambda layer. Then I re-deploy with Chalice, removing the automatic_layer option and setting the layers to the generated ARN of the layer I created manually.
The function deployed this way worked OK a couple of times, then started failing occasionally with "Segmentation Fault". The error rate increased quickly and now it is failing 100% of the time.
Traceback:
> OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k
> START RequestId: 3b98bd4b-6cda-4d21-8090-1a49b17c06fc Version: $LATEST
> OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k
> END RequestId: 3b98bd4b-6cda-4d21-8090-1a49b17c06fc
> REPORT RequestId: 3b98bd4b-6cda-4d21-8090-1a49b17c06fc Duration: 7165.04 ms Billed Duration: 7166 ms Memory Size: 128 MB Max Memory Used: 41 MB
> RequestId: 3b98bd4b-6cda-4d21-8090-1a49b17c06fc Error: Runtime exited with error: signal: segmentation fault (core dumped)
> Runtime.ExitError
As awswrangler itself requires boto3 & botocore, and they are already in the Lambda environment, I suspected that there might be a conflict of different versions of boto. I tried the same flow by explicitly including boto3 and botocore in the requirements but I am still receiving the same segmentation fault error.
Any help is much appreciated.
You could use AWS X-Ray to get more information on the problem: https://docs.aws.amazon.com/lambda/latest/dg/python-tracing.html
Moreover, you might analyze the core dump generated by executing your Lambda function in a bash shell:
ulimit -c unlimited
cd /tmp
execute your python ...
You should find a file named /tmp/core..... that you can analyze with gdb after downloading it. The command "man core" could help you.

PowerPC systemsim-p8 does not boot Debian 64 on Ubuntu 64 16.04 LTS

I am trying to boot a Debian PPC POWER8 (or 7) in a simulation.
I followed the instructions in [1].
The only thing I manage to boot is a RAM drive (initrd) with the mambo kernel, but it is closed source. I can't do much with it.
So now I try to boot a mambo kernel (with bogus disk support) from [2] with the Debian disk image from [1].
The kernel manages to mount the drive, but I do not reach a login prompt, as depicted in [3].
[1] https://www14.software.ibm.com/webapp/set2/sas/f/pwrfs/pwrfsinstall.html.
[2] https://github.com/rpsene/linux.git
[3]
9415446: (688292884): [ OK ] Reached target Local File Systems.
729612844: (688490282): Starting LSB: Raise network interfaces....
730353740: (689231178): Starting Trigger Flushing of Journal to Persistent Storage...
731308417: (690185857): Starting Create Volatile Files and Directories...
736470477: (695348428): udev-finish (1378) used greatest stack depth: 10752 bytes left
753931943: (712810985): [ OK ] Started Copy rules generated while the root was ro.
765419589: (724298838): [ OK ] Started Trigger Flushing of Journal to Persistent Storage.
804041342: (762920770): [ OK ] Started Create Volatile Files and Directories.
804330683: (763210111): Starting Update UTMP about System Reboot/Shutdown...
815762188: (774642735): [ OK ] Started Update UTMP about System Reboot/Shutdown.
817676182: (776556815): systemd-journald[1213]: Received request to flush runtime journal from PID 1
1076627432: (1035512412): [ OK ] Started udev Coldplug all Devices.
Did you try this one: https://github.com/open-power-sdk/power-simulator? This is the version I uploaded last year (bug reports are welcome).
Also, you can get free Power VMs at https://minicloud.parqtec.unicamp.br/minicloud/
I have got the simulator up and running:
https://pastebin.com/ibGPeEFu
cloudusr@mambo:~$ ssh root@172.19.98.109
The authenticity of host '172.19.98.109 (172.19.98.109)' can't be established.
ECDSA key fingerprint is SHA256:x4/jPYq6SggOeSPOlQaxJlucih6elJLqog+i4P/euxY.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '172.19.98.109' (ECDSA) to the list of known hosts.
root@172.19.98.109's password:
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
root@debianle:~# cat /proc/cpuinfo
processor : 0
cpu : POWER9 (raw), altivec supported
clock : 2000.000000MHz
revision : 2.0 (pvr 004e 0200)
timebase : 512000000
platform : PowerNV
model : Mambo,Simulated-System
machine : PowerNV Mambo,Simulated-System
firmware : OPAL
I've booted systemsim a bunch of times with Ubuntu userspace. Do you have a copy of the disk image somewhere I can try? Is the sim userspace LE or BE?

Errors when generating mixed mode Flame Graphs

I'm trying to generate some mixed mode flame graphs on a Linux machine (CentOS 7) and running into some issues.
I'm following instructions from this link: https://www.slideshare.net/brendangregg/java-performance-analysis-on-linux-with-flame-graphs (starting from slide 47).
When I run the command below to collect perf data:
perf record -F 997 -a -g -- sleep <time in seconds>; jmaps
It seems to generate perf.data with no errors. However, when I try to process the perf.data by running the below command,
perf script --input=perf.data > out.stacks01
it shows me the messages below:
/tmp/libnetty-transport-native-epoll6943913993058681852.so was updated (is prelink enabled?). Restart the long running apps that use it!
corrupted callchain, skipping
Does anyone have any idea what these messages mean?

Yaws process died: {{badmatch,<<>>}

I'm going over a very basic Erlang book while using Yaws. I'm editing a single .yaws file and refreshing the browser. Often (3rd time now) the process will just start to show this error, and I look and look for a syntax error or anything, and eventually I just restart the process and everything works, without any change to the source file.
Right now this is the source file that triggered the error this last time:
<erl>
out(Arg) ->
    {ehtml,
     {table, [{width, "100%"}],
      {tr, [],
       [{td, [{width, "50%"}], "hello world!"},
        {td, [{width, "50%"}], "hi again."}]
      }
     }
    }.
</erl>
I tried searching the error, but where all the search results have a meaningful context like "no access", all I get is "<<>>".
=ERROR REPORT==== 26-Nov-2013::20:17:32 ===
Yaws process died: {{badmatch,<<>>},
[{yaws_server,skip_data,2,
[{file,"yaws_server.erl"},{line,2951}]},
{yaws_server,deliver_dyn_file,6,
[{file,"yaws_server.erl"},{line,2717}]},
{yaws_server,aloop,4,
[{file,"yaws_server.erl"},{line,1152}]},
{yaws_server,acceptor0,2,
[{file,"yaws_server.erl"},{line,1013}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,227}]}]}
Some version info:
Yaws 1.94
Debian GNU/Linux 7.2 (wheezy)
Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.51-1 x86_64 GNU/Linux
Erlang R15B01 (erts-5.9.1) [source] [64-bit] [smp:4:4] [async-threads:0] [hipe] [kernel-poll:false]
Any ideas what this is telling me?
Also, any suggestions for debuggers that are good for beginners are very welcome.
For debugging, I think using Erlang tracing will be helpful. We want to figure out why the yaws_server:skip_data/2 function would be getting a badmatch exception, and specifically why it's getting an empty binary passed to it, as that's the only way it could encounter that error. So we need to trace that condition. Try these steps (and don't forget the trailing period on each Erlang shell command):
Run yaws in interactive mode: yaws -i
Once yaws comes up, hit enter to get an Erlang shell prompt.
Create a tracing function for dbg so we get a reasonably formatted backtrace from the trace data: F = fun({trace,_,_,_,Dump},[]) -> io:format("~s~n", [binary_to_list(Dump)]), [] end.
Turn on tracing with this command: dbg:tracer(process, {F, []}).
Trace calls in all processes: dbg:p(all, call).
Now trace the condition of yaws_server:skip_data/2 getting an empty binary as a first argument, and when it does, get a backtrace:
dbg:tpl(yaws_server,skip_data,dbg:fun2ms(fun([<<>>, _]) -> message(process_dump()) end)).
With this in place, start hitting your .yaws page until you provoke the condition, at which point a backtrace will be displayed in your Erlang shell. If you get that backtrace, please copy it into a gist or pastebin and post a link to it as a follow-up here.
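For convenience, here are the same commands from the steps above collected into one pasteable sequence (nothing new beyond the calls already listed):
%% In the shell you get from `yaws -i` (note the trailing periods):
F = fun({trace,_,_,_,Dump}, []) ->
        io:format("~s~n", [binary_to_list(Dump)]), []
    end.
dbg:tracer(process, {F, []}).
dbg:p(all, call).
dbg:tpl(yaws_server, skip_data,
        dbg:fun2ms(fun([<<>>, _]) -> message(process_dump()) end)).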
If I am correct, the output of the out function is supposed to be a list. I didn't check your whole code, but the following should work:
<erl>
out(Arg) ->
    [{ehtml,
      {table, [{width, "100%"}],
       {tr, [],
        [{td, [{width, "50%"}], "hello world!"},
         {td, [{width, "50%"}], "hi again."}]
       }
      }
     }].
</erl>

How to simulate process/daemon crash on OSX?

How can I invoke/simulate a process/daemon crash on OSX and, as a result, receive a crash report in
/Library/Logs/DiagnosticReports
(e.g. opendirectoryd_2013-06-11-125032_macmini61.crash)?
I tried to force quit daemons using Activity Monitor but didn't receive any report. I need to crash some system or third-party process (NOT developed by myself).
You can force almost any process to crash by sending it a "segmentation violation" signal.
Example: Find process id of "opendirectoryd":
$ ps -ef | grep opendirectoryd
0 15 1 0 9:14am ?? 0:01.11 /usr/libexec/opendirectoryd
^-- process id
Send signal to the process:
$ sudo kill -SEGV 15
This terminates the process and causes a diagnostic report to be written,
as can be verified in "system.log":
Oct 31 09:17:17 hostname com.apple.launchd[1] (com.apple.opendirectoryd[15]): Job appears to have crashed: Segmentation fault: 11
Oct 31 09:17:20 hostname ReportCrash[420]: Saved crash report for opendirectoryd[15] version ??? (???) to /Library/Logs/DiagnosticReports/opendirectoryd_2013-10-31-091720_localhost.crash
But note that deliberately crashing system services might cause severe problems (system instability, data loss, ...), so you should know exactly what you are doing.
Unless you can find a legitimate bug and get it to crash that way, you can't externally crash a daemon in such a fashion that it will result in a diagnostic report. All of the quit-forcing functions are exempt from diagnostic reports as they are external issues.
