Transformers gets killed for no apparent reason on Linux - memory-management

So I'm trying to run inference on a Hugging Face model; the model is 6.18 GB.
This morning I was on Windows and it was possible to load the model, but inference was very slow, so I took a look at DeepSpeed; it's only available on Linux, so I switched to Zorin OS.
Now the exact same script gets killed when running:
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("Cedille/fr-boris", device_map="auto")
What is going on?

Try to diagnose it with the command below:
dmesg -T | grep -E -i -B100 'killed process'
It should show you the reason. In this case:
[Fri Feb 10 21:16:54 2023] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/session-14313.scope,task=python,pid=1071011,uid=1000
[Fri Feb 10 21:16:54 2023] Out of memory: Killed process 1071011 (python) total-vm:2480280kB, anon-rss:1709008kB, file-rss:4kB, shmem-rss:0kB, UID:1000 pgtables:4276kB oom_score_adj:0
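The log above tells the story: the kernel's OOM killer terminated the Python process because memory ran out. The checkpoint here is ~6.2 GB, and loading with from_pretrained can transiently need roughly twice the file size while weights are copied in (the 2x factor is a rough rule of thumb, not an exact figure), so it's worth comparing that against what the machine actually has. A Linux-only sketch:

```shell
# Estimate headroom: free RAM + free swap (from /proc/meminfo, in kB)
# versus a rough ~2x-checkpoint-size peak while the model loads.
model_gb=6.2
avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
swap_kb=$(awk '/^SwapFree:/ {print $2}' /proc/meminfo)
awk -v a="$avail_kb" -v s="$swap_kb" -v m="$model_gb" 'BEGIN {
    have = (a + s) / 1048576          # kB -> GiB
    printf "headroom: %.1f GiB, rough peak needed: ~%.1f GiB\n", have, 2 * m
}'
```

If the headroom comes out below the peak, adding swap or loading the model at reduced precision is the usual way out; the Windows machine likely succeeded because it had a large page file to spill into.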

Related

My problem is with JMeter when I am trying to use non-GUI mode.

When I run the command in the command prompt, I get the error below:
C:\Users\ShivangiT\Downloads\apache-jmeter-5.3\apache-jmeter-5.3\bin>Jmeter.bat -Jjmeter.save.saveservice.output_format=xml -n -t \Users\ShivangiT\Downloads\apache-jmeter-5.3\apache-jmeter-5.3\bin\vieweventpage.jmx -l \Users\ShivangiT\Downloads\apache-jmeter-5.3\apache-jmeter-5.3\bin\rr.jtl
Creating summariser <summary>
Created the tree successfully using \Users\ShivangiT\Downloads\apache-jmeter-5.3\apache-jmeter-5.3\bin\vieweventpage.jmx
Starting standalone test # Fri Aug 21 07:29:38 BST 2020 (1597991378434)
Waiting for possible Shutdown/StopTestNow/HeapDump/ThreadDump message on port 4445
summary = 41 in 00:00:14 = 2.9/s Avg: 5256 Min: 7 Max: 13688 Err: 13 (31.71%)
Tidying up ... # Fri Aug 21 07:29:52 BST 2020 (1597991392905)
... end of run
The JVM should have exited but did not.
The following non-daemon threads are still running (DestroyJavaVM is OK):
Thread[DestroyJavaVM,5,main], stackTrace:
Thread[AWT-EventQueue-0,6,main], stackTrace:sun.misc.Unsafe#park
java.util.concurrent.locks.LockSupport#park
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject#await
java.awt.EventQueue#getNextEvent
java.awt.EventDispatchThread#pumpOneEventForFilters
java.awt.EventDispatchThread#pumpEventsForFilter
java.awt.EventDispatchThread#pumpEventsForHierarchy
java.awt.EventDispatchThread#pumpEvents
java.awt.EventDispatchThread#pumpEvents
java.awt.EventDispatchThread#run
Thread[AWT-Shutdown,5,system], stackTrace:java.lang.Object#wait
sun.awt.AWTAutoShutdown#run
java.lang.Thread#run
Can anybody please help me with this?
This is a known issue in JMeter 5.3 when the test plan contains the HTTP(S) Test Script Recorder.
The workaround is to remove it.
See:
https://bz.apache.org/bugzilla/show_bug.cgi?id=64479
Alternatively you can try nightly build:
https://ci.apache.org/projects/jmeter/nightlies/
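Before editing the plan, you can confirm the recorder is actually present: a .jmx file is plain XML, and the HTTP(S) Test Script Recorder is saved as a ProxyControl element. A quick check (shown with grep as a sketch; on Windows, findstr works the same way):

```shell
# Count HTTP(S) Test Script Recorder elements in a JMeter test plan.
# A non-zero count means the plan contains the element that triggers
# the JMeter 5.3 shutdown hang.
grep -c 'ProxyControl' vieweventpage.jmx
```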
Shiva, I would suggest that you use an older version of JMeter. The reason is simple: there has not been much change in JMeter since JMeter 4, and newer releases are more focused on JDK 11 compatibility, while the older versions support JDK 8 flawlessly. Use JMeter 4 from the official JMeter archive and you'll be able to execute everything smoothly, with no need to look for workarounds.

Riak eating 100% CPU on OSX install

This question is related to:
Riak node not working, but using 100% cpu
but since the poster seems to have left, I'm posting my case here.
Last night I installed Erlang (R15B01) from source, using the config options from the Riak website:
http://docs.basho.com/riak/1.2.1/tutorials/installation/Installing-Erlang/#Installing-on-Mac-OS-X
and Riak (1.4.1) on my 2013 MacBook Pro (2.8 GHz i7, 16 GB RAM, OSX 10.8.3). I did not change the ulimit, as I assumed it would be fine for a vanilla run.
The installation went fine: warnings but no errors, and I was able to run the toy examples with no problem.
However, the empty instance quickly ate through all 4 cores and my machine started whining and overheating.
Looking in the logs, I see the following error repeated a jillion times:
2013-10-11 09:04:04.266 [error] CRASH REPORT \
Process with 0 neighbours exited with reason: \
call to undefined function eleveldb:o
There are also tons of crash reports:
2013-10-11 09:14:47 =CRASH REPORT====
crasher:
initial call: riak_kv_index_hashtree:init/1
pid:
registered_name: []
exception exit: {{undef,[{eleveldb,open,
["./data/anti_entropy/479555224749202520035584085735030365824602865664",
[{create_if_missing,true},{max_open_files,20},{write_buffer_size,12886952}]],[]},
{hashtree,new_segment_store,2,[{file,"src/hashtree.erl"},{line,499}]},{hashtree,new,2,
[{file,"src/hashtree.erl"},{line,215}]},{riak_kv_index_hashtree,do_new_tree,2,
[{file,"src/riak_kv_index_hashtree.erl"},{line,421}]},{lists,foldl,3,[{file,"lists.erl"},
{line,1197}]},{riak_kv_index_hashtree,init_trees,2,[{file,"src/riak_kv_index_hashtree.erl"},
{line,366}]},{riak_kv_index_hashtree,init,1,[{file,"src/riak_kv_index_hashtree.erl"},
{line,226}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,304}]}]},
[{gen_server,init_it,6,[{file,"gen_server.erl"},{line,328}]},{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,227}]}]}
ancestors: [,riak_core_vnode_sup,riak_core_sup,]
messages: []
links: []
dictionary: []
trap_exit: false
status: running
heap_size: 987
stack_size: 24
reductions: 492
neighbours:
erlang.log says:
=====
===== LOGGING STARTED Fri Oct 11 09:04:01 CEST 2013
=====
Node 'riak#127.0.0.1' not responding to pings.
config is OK
!!!!
!!!! WARNING: ulimit -n is 2560; 4096 is the recommended minimum.
!!!!
Exec: /tmp/riak-1.4.1/rel/riak/bin/../erts-5.9.1/bin/erlexec
-boot /tmp/riak-1.4.1/rel/riak/bin/../releases/1.4.1/riak
-config /tmp/riak-1.4.1/rel/riak/bin/../etc/app.config
-pa /tmp/riak-1.4.1/rel/riak/bin/../lib/basho-patches
-args_file /tmp/riak-1.4.1/rel/riak/bin/../etc/vm.args -- console
Root: /tmp/riak-1.4.1/rel/riak/bin/..
Erlang R15B01 (erts-5.9.1) [source] [64-bit] [smp:8:8] [async-threads:64]
[kernel-poll:true]
Eshell V5.9.1 (abort with ^G)
(riak#127.0.0.1)1>
After less than 10 minutes there are already 144 MB of log files with variations of the above.
I had the same problem after building Riak 1.4.6 from source.
I changed the line in the file etc/app.config to:
{anti_entropy, {off, []}},
LevelDB is used by AAE (active anti-entropy); see the config parameter anti_entropy_leveldb_opts.
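The crash dump points at eleveldb:open being undefined, i.e. the eleveldb driver never built or loaded, and AAE keeps retrying it in a tight loop, which would explain both the CPU spin and the log volume. A quick way to confirm it is the dominant crash (the log path below is the default for a source build and is an assumption; adjust for your install):

```shell
# Count crashes mentioning eleveldb in Riak's crash log; a large,
# growing number confirms the AAE retry loop on the missing driver.
grep -c 'eleveldb' rel/riak/log/crash.log
```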
Use process of elimination:
It's hard to say without more information. Is the 200% being used by beam.smp? Do you see anything in console.log, error.log, or crash.log that would indicate something odd is happening? Are there clients communicating with the cluster at the time? If so, what client/protocol are they using and what types of operations are being performed (e.g. get/put/map reduce/etc.)?
References
Riak consuming too much CPU
Interesting sawtooth increasing CPU usage on lightly-used Riak
Inspecting a Node
Riak Performance Tuning
Open Files Limit
Configuration Files

How to simulate process/daemon crash on OSX?

How can I invoke/simulate a process/daemon crash on OSX and, as a result, receive a crash report in
/Library/Logs/DiagnosticReports
(e.g. opendirectoryd_2013-06-11-125032_macmini61.crash)?
I tried force-quitting daemons using Activity Monitor but didn't receive any report. I need to crash some system or third-party process (NOT developed by myself).
You can force almost any process to crash by sending it a "segmentation violation" signal.
Example: Find process id of "opendirectoryd":
$ ps -ef | grep opendirectoryd
0 15 1 0 9:14am ?? 0:01.11 /usr/libexec/opendirectoryd
^-- process id
Send signal to the process:
$ sudo kill -SEGV 15
This terminates the process and causes a diagnostic report to be written,
as can be verified in "system.log":
Oct 31 09:17:17 hostname com.apple.launchd[1] (com.apple.opendirectoryd[15]): Job appears to have crashed: Segmentation fault: 11
Oct 31 09:17:20 hostname ReportCrash[420]: Saved crash report for opendirectoryd[15] version ??? (???) to /Library/Logs/DiagnosticReports/opendirectoryd_2013-10-31-091720_localhost.crash
But note that deliberately crashing system services might cause severe problems (system instability, data loss, ...), so you should know exactly what you are doing.
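Given that warning, the mechanism is easy to try safely on a throwaway process first; here `sleep` stands in for a real daemon, and on OS X the same segfault is what ReportCrash picks up:

```shell
# Send SIGSEGV to a harmless background process and check the status
# the shell reports: 128 + 11 (SIGSEGV) = 139 for a segfault.
sleep 60 &
pid=$!
kill -SEGV "$pid"
wait "$pid"
echo "exit status: $?"
```

This should print `exit status: 139`, confirming the process died from a segmentation fault rather than a clean exit.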
Unless you can find a legitimate bug and get it to crash that way, you can't externally crash a daemon in such a fashion that it will result in a diagnostic report. All of the quit-forcing functions are exempt from diagnostic reports as they are external issues.

Apache on Win2003 cannot find the path specified

A fresh installation of Apache 2.2 on Win2003.
The configuration validates with the Apache tool, yet when I attempt to access the site the browser displays an internal error.
Apache log shows:
[Mon Jul 16 13:36:38 2012] [error] [client 10.162.9.158] (OS 3)The
system cannot find the path specified. : couldn't spawn child
process: D:/Heatmap/Webapp/public/dispatch.cg
The file system shows:
D:\Heatmap\Webapp\public>dir dispatch*
 Volume in drive D is DATA
 Volume Serial Number is C482-3950

 Directory of D:\Heatmap\Webapp\public

05/02/2012  10:56 AM               445 dispatch.cgi
05/02/2012  10:56 AM               520 dispatch.fcgi
               2 File(s)            965 bytes
               0 Dir(s)   5,625,618,432 bytes free
Since I normally run Apache on Linux servers, I'm stymied as to the root cause here. The system cannot find a path that is present.
Cluestick please.
The bit of the message that caught my attention was couldn't spawn child process.
Research showed that the shebang line is actually used by Apache on Windows (rather than the OS-level file-extension association choosing the interpreter), and I needed to correct it in my .cgi.
Specifying the full path to Perl in the CGI's shebang line corrected the problem.
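Concretely, the fix is just the first line of dispatch.cgi. The Windows path below is a typical ActivePerl location and is an assumption; substitute wherever your perl.exe actually lives:

```
# before (resolves on Linux, path not found on Windows):
#!/usr/bin/perl

# after (full Windows path to the interpreter):
#!C:/Perl/bin/perl.exe
```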

Perl script are not running on Apache 2.2.15 with mod_fcgid / Win32

I have installed Apache 2.2.15 with mod_fcgid on Windows XP SP3 and ActiveState Perl (tried both 5.12 & 5.8.9).
I tried the Perl example script from the mod_fcgid reference page, but it is not working.
I get this in the error log:
[Tue Dec 07 23:10:35 2010] [info] mod_fcgid: server 127.0.0.1:/usr/bin/perl.exe(5476) started
[Tue Dec 07 23:10:35 2010] [warn] [client 127.0.0.1] (OS 109)The pipe has been ended. : mod_fcgid : get overlap result error
[Tue Dec 07 23:10:35 2010] [error] [client 127.0.0.1] Premature end of script headers: f.pl
[Tue Dec 07 23:10:35 2010] [error] [client 127.0.0.1] File does not exist: C:/Apache2/htdocs/favicon.ico
[Tue Dec 07 23:10:39 2010] [info] mod_fcgid - infoneto: process /usr/bin/perl.exe(5476) exit(communication error), return code 9
I double-checked everything, including:
The #!/usr/bin/perl.exe line
The mod_fcgid is loaded
When running the script as plain CGI, it works
When I turned to the older mod_fastcgi, it works just fine as a FastCGI (i.e. loads once, runs many times)
Using Process Monitor I can see that Apache starts Perl, but Perl exits almost instantly without even loading the Perl script
I tried it also on Apache 2.0.52 & 2.0.63 with an older mod_fcgid, and with Apache 2.2.15 with the newest mod_fcgid (2.3.6), but no luck
What can be done?
I googled around, but no one seems to have a solution or to have managed to use mod_fcgid with Perl on Win32.
I opened a bug both on FCGI at CPAN and on the Apache tracker, but no one seems to care...
Is there a solution for this?
Does someone else need this (mod_fcgid with Perl on Apache/Win32)?
You're on Win32, and you have a /usr/bin/perl.exe? Are you sure?
Regardless, I think you're looking for mod_fastcgi rather than mod_fcgid; at least, a quick Google search suggested that switching fixed the problem for most. Apparently mod_fcgid does not work as well under Windows.
