Rails. Free memory of Delayed Job (active record) without process restart - ruby

It must be obvious, but I cant get a usecase of Delayed Job, cause due to ruby`s Gargabe Collector specific, it doesnt free memory back to OS. And once delayed job process will take all memory anyway. And the only way is to restart delayed job process.
But if I restart delayed job process and there is currenlty running task - it will never be completed. Probably, there is some workaround to restart that task later, but this approach seems ugly to me.
I tried real jobs and some simple computatuin without any variables, symbols or links so I dont think that "my code leaks". Still, every new job increases memory of delayed_job process.
May be I use Delayed job for something that its not designed? Or it could be environment problem (besides, tried on local machine and on VPS) ?
Tested on: Ubuntu 14.04 and Debian 6 (both x86), Rails 3.2, delayed_job 4.0.2, delayed_job_active_record 4.0.1, ruby 2.1.2
I could give some code examples, but, as I mentioned, I tried both: real job and simple computation. So I won`t if it is not significant and my mistakes are fundamental.
Due to my conditions - my tasks can be executed for couple of minutes, read and write about 100K records to database and require a lot of computation, tasks cant be interrupted, and number of tasks limited by 10-20 dayli, may be - I only guess to use Resque, because it forks process everytime, so there should be no problems with accumulating memory with time.
So do I realy do something wrong or this is a nature of DJ - to occupie all memory or require a restart - and if I cant restart it, I shouldnt use its approach ?
Everything I read on the internet (not so much, by the way) tells that its rubys GC trouble that it doesnt free memory back to OS, and some advises to profile code for unlinked objects (it sounds the most realistic to my case, but, I tried a lot with code that doesnt create any objects, and I explicitly set everything to nil and call GC.start)

Related

Longshot: Python and PyTorch not running unless I bump my GPU to run by doing some other task

Longshot, but anyone had this issue?
I have code running (or not) which is set to run on the GPU. It fails to run unless I bump the GPU to make it run a little bit, for example by watching YouTube or playing a game.
I believe there may be some computer resource/application prioritisation/limitation configurations causing vscode to stop running, and wondered if anyone else had run into this issue.
I am running the code in a .ipynb notebook file in vscode (not sure if that might contribute to the issue). Sometimes the code will just freeze permanently, and I usually go about restarting the code to get things going properly.
The code should normally takes about 7 seconds for the training epoch, and 0.7 seconds for the validation epoch. But I was away for the first epoch and found it hadn't started, and so I opened up Youtube and it began.
Code timings
I can't think of what settings to change for this, but have tried a few
Power options
Anyone had a similar issue before? My second theory is that I think perhaps I am using too much GPU ram in my python code which is slowing it down and effectively made it freeze. And then when I load another application to use the GPU it forces the GPU ram to reconfigure and somehow this RAM reconfiguring might be unblocking the GPU allowing it to run again.

How to detect or log (ubuntu 14.04) when Ruby forks the process?

I'm trying to reduce the amount of forking our Ruby/Rails app does. We shell out a lot with backticks, and each of these forks the entire process, which can cause a huge memory bloat.
I'm going through, identifying the ones that get called the most, and trying to replace them with code which achieves the same thing without making a shell call. However, in some cases I suspect it might still be forking under the hood anyway.
Is there a way to detect or log whenever a process forks? I'm using ubuntu 14.04. A log would be ideal as I can then keep an eye on it when I run the amended code.

Configure MRI Ruby GC to fail fast

I'm working on a Ruby on Rails application which has a memory leak, so eventually it crashes when there's no more memory.
However, in the final stage it basically only running the GC and processing very few requests, so basically DoS-ing itself. This DoS time was between 1 hour and 6 hours for my application!
I tried to locate the memory leak but no luck so far, so now I want to find a workaround for the production server.
Is there a way to configure the MRI Ruby GC so that when it reaches the memory limit then it just crashes? I mean to crash at the first time when Ruby tries to allocate more memory and the operating system denies it.
As far as I know, you cannot do that.
But your have another options:
Setup something in your system, which will prevent ruby from using too much memory (oom maybe?)
Setup your webserver to kill itself - like in this gem

5.6 GB not enough for Cloudera?

I am running Cloudera Hadoop on my laptop and Oracle VirtualBox VM.
I have given 5.6 GB out of mine 8 and six from eight cores as well.
And still I am not able to keep it up and running.
Even without load services would not stay up and running and when I try a query at least Hive will be down within 20 minutes. And sometimes they go down like dominoes: one after another.
More memory seemed to help some: with 3GB and all services, Hue was blinking with red colors when the Hue itself managed to get up. And after rebooting it would takes 30 - 60 minutes before I manage to get the system up enough to even try running anything on it.
There has been two sensible notes (that I have managed to find):
- Warning of swapping.
- Crashing note when the system used 26 GB of virtual memory which was not enough.
My dataset is less than one megabyte, so it is hard to understand why the system would go up to dozens of gigabytes, but for whatever was reason for that has passed: now the system is running more steadily around the 5.6 GB that I have given to it after closing down a few services: see my answer to myself.
And still it is just more stable. Right after I got a warning of swapping and the Hive went down again. What could be reason for more-or-less all Hadoop services going down if the VM starts to swap?
I don't have enough reputation to post the picture to here, but when Hive went down again it was swapping 13 pages / second and utilizing 5.9 GB / 5.6 GB. So basically my system starts crashing more-or-less right after it start to swap. "428 pages were swapped to disk in the previous 15 minute(s)"
I have used default installation options as far as hard drive is concerned.
Only addition is a shared folder between Windows and VM. That works somewhat strangely locking files all the time, so I used it just like FTP and only for passing files from one system to another. Thus I can go days without using it, but systems still crash, so that is not the cause either.
Now that the system is mostly up, services crash still about twice a day: Service Monitor and Hive are quite even with their crashing frequency. After those come Activity Monitor and Event Server, which appear to crash always together. I believe Yarn crashes as well, but it gets up on its own. Last time Hive crashed first, and then it got followed by Service Monitor, Hive (second time), Activity Monitor and Event Server all.
As swap is disk, perhaps the problem is with disk:
# cat /etc/fstab
# swapoff -a
# badblocks -v /dev/VolGroup/lv_swap
Checking blocks 0 to 8388607
Checking for bad blocks (read-only test): done
Pass completed, 0 bad blocks found.
# badblocks -vw /dev/VolGroup/lv_swap
Checking for bad blocks in read-write mode
From block 0 to 8388607
Testing with pattern 0xaa: done
Reading and comparing: done
Testing with pattern 0x55: done
Reading and comparing: done
Testing with pattern 0xff: done
Reading and comparing: done
Testing with pattern 0x00: done
Reading and comparing: done
Pass completed, 0 bad blocks found.
So nothing wrong with swap disk and I have not noticed any disk error anywhere else either.
Note that you could check file system from Windows side also. But I expect that if you make Windows to fix your Linux file system, you have good chances of destroying your Linux with that, so I did my checks somewhat pessimistically, because AFAIK these commands are safe to execute.
About half of the services kept going down, so giving more specifics would be a long story.
I succeeded to get the system more stable by closing down flume, hbase, impala, ks_indexer, oozie, spark and sqoop. And by increasing more memory to some remaining services that complained they had not been given enough memory.
Also I fixed couple of thing on the Windows side, I am not sure which one of these helped:
- MsMpEng.exe kept my hard drive busy. I didn't have permissions to kill it, but I decreased its priority to lowest possible.
- CcmExec.exe got to loop on my DVD and kept reading it for forever. This I solved by taking the DVD out from the drive. Then later on I killed the process tree to keep it from bothering for a while.
I found these using Windows resource manager.
The VM requires 4GB: http://www.cloudera.com/content/cloudera-content/cloudera-docs/DemoVMs/Cloudera-QuickStart-VM/cloudera_quickstart_vm.html You should use that.
I am not clear whether you are using the QuickStart VM though. It's set up to run just the essential services and tuned to conserve memory rather than exploit lots of memory.
It sounds like you are running your own installation, on one virtual machine, on your Windows machine. You may be running an entire cluster's worth of services on one desktop machine. Each of these services has master, worker processes, monitoring processes, etc. You don't need most of them.
You also probably have left memory settings at default suitable for a server-class machine of 16+ GB RAM. Remember these services usually run across many machines, not all on one.
Finally, you're clearly swapping, and that makes things incredibly slow. Remember this is all through a VM too!
Bottom line, use the QuickStart VM if you really want a 1-machine cluster tuned correctly. If you want a real cluster or more services, you need more hardware.
Also consider: cloudera.com/live contains a full CDH 5.1 cluster + sample data, running on demand on AWS. Of course, the advantage of the VM is that you can BYOD, but if you're simply looking for a hands-on Hadoop experience, Live is a great option.

shell that protects against machine hanging due to bad process

Is there a shell, or technique, that protects me against my entire machine hanging if a process goes haywire?
I'm using ubuntu 10.10.
If you restrict your resources with limits than you can even prevent fork bombs killing your machine.
Here's a nice tutorial: http://www.cyberciti.biz/tips/linux-limiting-user-process.html
I bought a multi-core CPU for that (it does not protect all cases in theory, but for all practical purposes it lets at least one core stay free almost always - to let your interactive "kill" command run.) :p

Resources