Hadoop fair scheduler open file error?

I am testing the fair scheduler mode for job assignment; however, I get the following error:
java.io.IOException: Cannot run program "bash": java.io.IOException: error=24, Too many open files
After googling, most answers tell you to check how many files are currently open on the system (with the Unix command lsof) and how that number relates to your system limit (checked with the bash command ulimit -n). Increasing the maximum number of open files is, in my opinion, only a short-term solution.
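For reference, here is a quick way to see those numbers (assuming a Linux shell; <pid> below is a placeholder for a process ID):
ulimit -n                  # soft limit on open files for the current shell
ulimit -Hn                 # hard limit
lsof | wc -l               # rough count of open-file entries system-wide
lsof -p <pid> | wc -l      # rough count for a single process, e.g. one of the Hadoop daemons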
Is there any way to avoid this?

Given that your system is reaching the limit on the maximum number of open files, you might want to check:
How many other operations are running on the system?
Are they opening many files?
Is your Hadoop job itself opening many files?
Is the current limit on the maximum number of open files too small on your system? (You can look up typical values.) If it is too small, consider increasing it.
I think increasing the max-open-files limit will work. In the long term, though, you might end up with the same problem again if points 1, 2 and 3 are not addressed.
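For what it's worth, a sketch of how the limit is typically raised on Linux; the exact syntax and location can vary by distribution, and 65536 and <user> below are only illustrative placeholders:
# temporary, for the current shell and anything launched from it:
ulimit -n 65536
# persistent, by adding two lines to /etc/security/limits.conf (applied at next login):
<user>   soft   nofile   65536
<user>   hard   nofile   65536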

Related

How to detect memory leak or other system-wide problems caused by a batch processing job?

First the question:
How do I monitor the complete flow of the batch job, including its influence on the surrounding system, all the way from starting cmd.exe to tearing it down after the script has completed?
Then the reason:
I've recently set up a batch script job to extract data, pack the data and transfer the compressed file to external storage. It's a massive job, executing for about 10 hours and using lots of disk space.
It's the only change to the system (Windows Server 2008 SE SP2, 32-bit with 4 GB RAM) in a very long time, and just a few minutes after running the full script for the second time, the server crashed hard and had to be power-cycled.
I've been going through the script's log file, and everything worked flawlessly - at least down to the last line in the script that outputs the final line of the log file. Then I checked the system's event log files and found nothing to indicate an imminent problem... But I still very much suspect this script to be the triggering cause! Perhaps some sort of memory leak or memory fragmentation is involved?
Finally the overall script operation and numbers:
The data is extracted from database files (Subversion) on one disk and temporarily stored on another disk (in about 150,000 files of varying size, taking up about 250 GB of space), then combined into about 1,000 files which are compressed with 7zip down to about 22 GB and transferred to external storage. Finally, all temporary files are removed, with the exception of a log file.
The initial batch script calls several other batch scripts in the course of processing the data; during the task described above, other scripts or commands are called about 20,000 times. The total number of lines in the scripts is about 600, with several for loops involved.

JMeter - thread users gone after a few minutes when loop forever and a scheduler duration are set

I have a test plan with five thread groups, each of which has 10 thread users. I want to run this test plan for 1 hour with 50 concurrent users. To my understanding, the number of thread users should stay at 50, but somehow the thread users keep dropping after several minutes and end up at 0. Below are my configurations; can someone take a look? Thanks in advance.
Also, when running JMeter from the command line, I got the error below:
The above error shows that you got an OOM, i.e. an Out Of Memory error.
You ran out of JMeter's allocated heap space.
To resolve this issue you can try the following:
Use the tips provided here (use non-GUI mode, avoid heavy reporters, etc.):
http://blazemeter.com/blog/jmeter-performance-and-tuning-tips
Increase JMeter's heap space if your load test really requires a large amount of memory.
I would suggest following these suggestions in order.
You can also try a distributed setup for load testing; this way you can spread the load across several machines, as sketched below.
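A minimal sketch of such a distributed run, assuming jmeter-server has already been started on the remote load generators; the host names and file names below are placeholders:
# on each load generator machine:
./jmeter-server
# on the controller machine, point the run at those generators with -R:
./jmeter -n -t my_test_plan.jmx -R loadgen1,loadgen2 -l results.jtl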
The quick fixes
1. Use JMeter in non-GUI mode
Find where you installed JMeter through the command line or the terminal.
Set the ‘bin’ directory as your current directory.
Run ‘jmeter -n -t <your test plan>.jmx’.
That’s it; when your test is complete you can do your analysis. If you’d like to see how your test is performing during execution in the command line/terminal (rather than just a blank console window), uncomment the following in the JMeter properties file (remove the ‘#’):
summariser.name=summary
summariser.interval=180
summariser.out=true
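For reference, the summariser.* lines above live in bin/jmeter.properties; they can also be overridden per run with -J flags instead of editing the file (the plan and results file names below are placeholders):
./jmeter -n -t my_test_plan.jmx -l results.jtl -Jsummariser.interval=30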
2. Remove, or at least disable, listeners you don’t need. Listeners such as ‘View Results Tree’ typically consume large amounts of memory, purely because of how much information they gather.
Simply take a second look at your script and remove the listeners you don’t need.
3. Increase the JMeter heap memory
In your file explorer, find where you installed JMeter.
Open up the bin directory.
Find the ‘jmeter.bat’ file and open it with a text editor.
Find the line containing ‘set HEAP’.
You can set this HEAP value to whatever you like; in this example I’ve allocated 2 GB of memory from the start and kept it throughout the test run: ‘set HEAP=-Xms2048m -Xmx2048m’.
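On Linux/macOS the equivalent HEAP line lives in the ‘jmeter’ shell script in bin (the default value differs between JMeter versions). In the versions I’ve used, the startup scripts also honor a JVM_ARGS environment variable, so the heap can be overridden for a single run without editing any file (the plan name below is a placeholder):
JVM_ARGS="-Xms2048m -Xmx2048m" ./jmeter -n -t my_test_plan.jmx -l results.jtl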

Reading file in parallel from multiple processes

I'm running multiple processes in parallel, and each of these processes reads the same file in parallel. It looks like some of the processes see a corrupted version of the file if I increase the number of processes to more than 15 or so. What is the recommended way of handling such a scenario?
More details:
The file being read in parallel is actually a Perl script. The multiple jobs are Python processes, and each of them launches this Perl script independently with different input parameters. When the number of jobs is increased, some of them give errors saying the Perl script has invalid syntax (which is not true). Hence, I suspect that some of these jobs read corrupted versions of the Perl script.
I'm running all of this on a 32core machine.
If any process is also writing to the file, then you need to enforce some synchronization, for example with a global named mutex.
If there is no asynchronous writing going on, I would not expect to see corruption during the reads. Are you opening the files with "r" (read-only) access? If you're still encountering trouble, it might be worth experimenting with reducing the read buffer size, or calling out to a native Win32 API for the file access.
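A minimal sketch of that kind of synchronization, assuming a Linux shell (the question doesn't state the platform); flock(1) stands in for the named mutex, and the script and lock-file paths below are placeholders:
# readers: each job holds a shared lock for the duration of its run (conservative but safe)
flock --shared /tmp/myscript.lock perl /path/to/myscript.pl --param value
# writers: anything that rewrites the script must take the exclusive lock
flock --exclusive /tmp/myscript.lock cp new_version.pl /path/to/myscript.pl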
Good luck!

Set an application's resource limits (CPU usage)

It's a little strange, I know,
but I want to limit a program's resource usage (for example, the WinRAR app).
The reason:
I have an old laptop with an overheating problem, so if I try a computationally intensive task (compressing a >10 GB folder), the laptop overheats and turns off.
The question:
Is it possible to limit an application's resource/CPU usage? For example, can I somehow set WinRAR to use only 50% of my CPU?
I use Windows 8.1,
but answers for other operating systems are welcome.
See 'Are there solutions that can limit the CPU usage of a process?' for a general answer.
WinRAR itself has the command line switch -ri; see the page in the WinRAR help with the title:
Switch -RI<p>[:<s>] - set priority and sleep time
For example using a command line like
WinRAR.exe a -ri1:100 Backup.rar *
results in compressing all files in the current working directory with default settings, using the lowest task priority and a 100 ms sleep time between each read or write operation.

How to avoid the "Too many open files" issue in a load balancer, and what are the side effects of setting ulimit -n to a higher value?

When load balancing a production system under a heavy load of requests, or while load testing it, we face a 'Too many open files' issue as below.
[2011-06-09 20:48:31,852] WARN - HttpCoreNIOListener System may be unstable: IOReactor encountered a checked exception : Too many open files
java.io.IOException: Too many open files
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:152)
at org.apache.http.impl.nio.reactor.DefaultListeningIOReactor.processEvent(DefaultListeningIOReactor.java:129)
at org.apache.http.impl.nio.reactor.DefaultListeningIOReactor.processEvents(DefaultListeningIOReactor.java:113)
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:315)
at org.apache.synapse.transport.nhttp.HttpCoreNIOListener$2.run(HttpCoreNIOListener.java:253)
at java.lang.Thread.run(Thread.java:662)
This exception occurs in the load balancer if the maximum number of files allowed to be open (ulimit) is set too low.
We can fix the above exception by increasing the ulimit to a higher value (655350 here):
ulimit -n 655350
However, setting ulimit -n to a higher number may affect the overall performance of the load balancer, and hence the response time of our website, in unforeseen ways. Are there any known side effects of setting ulimit -n to a higher value?
The side effects will be exactly that - under normal circumstances you want to limit the number of open file handles for a process so it doesn't exhaust available file descriptors or do other bad things due to programming errors or DoS attacks. If you're comfortable that your system won't do that (or plainly won't work within normal limits!), then there are no other side effects.
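For reference, a quick way to check whether you are genuinely near the limit or just masking a descriptor leak (assuming Linux; <pid> below is the load balancer's process ID):
cat /proc/<pid>/limits | grep "open files"   # soft/hard limit for that process
ls /proc/<pid>/fd | wc -l                    # descriptors it holds right now
# if this count keeps growing under a constant load, raising ulimit only postpones the error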
