I have a problem when starting Hadoop: DataBlockScanner consumes up to 100% of one CPU.
The master log shows:
2012-04-02 11:25:49,793 INFO org.apache.hadoop.hdfs.StateChange:
BLOCK NameSystem.processReport: from 192.168.33.44:50010, blocks: 16148, processing time: 13 msecs
The slave log shows:
2012-04-02 11:09:34,109 INFO
org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification
succeeded for blk_-1757906724564777881_10532084
I ran hadoop fsck and found no errors or corrupt blocks.
Why is the CPU usage so high, and how can I stop the block verification?
Without digging through the source to confirm, this is probably only a problem at startup: the datanode has to walk its data directory (or directories) to discover all the blocks and then report them to the namenode. Again, without checking the source I can't confirm whether the checksum of each block is also verified at startup, which could account for the 100% CPU.
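If you want to slow down or disable the periodic block verification itself, the property to look at is dfs.datanode.scan.period.hours in hdfs-site.xml. The exact semantics vary by release (in newer versions a negative value disables the scanner and 0 falls back to the default of three weeks), so check the hdfs-default.xml that ships with your version. A sketch:
<!-- hdfs-site.xml: rescan each block at most once per 504 hours (3 weeks) -->
<property>
  <name>dfs.datanode.scan.period.hours</name>
  <value>504</value>
</property>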
Thanks. I now think my CPU usage was so high because of the leap second; the problem seems to be in Java itself. Whenever I start Hadoop, the CPU usage is very high.
http://en.wikipedia.org/wiki/Leap_second
I am using CentOS 7.1.1503 with kernel 3.10.0-229.el7.x86_64, an ext4 file system in ordered journal mode, and delayed allocation (delalloc) enabled.
When my app writes logs to a file continually (about 6 MB/s), I occasionally see the write system call stall for 100-700 ms. When I disable ext4's journal, set the journal mode to writeback, or disable delayed allocation, the stalls disappear. When I make the kernel write back dirty pages more frequently and shorten the dirty page expiry time, the problem is reduced.
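For reference, the writeback tuning I mean is along these lines (the values here are only illustrative examples):
# flush dirty pages more often and expire them sooner (typical defaults are 500 and 3000)
sysctl -w vm.dirty_writeback_centisecs=100
sysctl -w vm.dirty_expire_centisecs=500
# the mount options I compared against, e.g. in /etc/fstab:
#   data=writeback   (metadata-only journaling)
#   nodelalloc       (disable delayed allocation)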
I captured the process's stack during a stall and got this:
[<ffffffff812e31f4>] call_rwsem_down_read_failed+0x14/0x30
[<ffffffffa0195854>] ext4_da_get_block_prep+0x1a4/0x4b0 [ext4]
[<ffffffff811fbe17>] __block_write_begin+0x1a7/0x490
[<ffffffffa019b71c>] ext4_da_write_begin+0x15c/0x340 [ext4]
[<ffffffff8115685e>] generic_file_buffered_write+0x11e/0x290
[<ffffffff811589c5>] __generic_file_aio_write+0x1d5/0x3e0
[<ffffffff81158c2d>] generic_file_aio_write+0x5d/0xc0
[<ffffffffa0190b75>] ext4_file_write+0xb5/0x460 [ext4]
[<ffffffff811c64cd>] do_sync_write+0x8d/0xd0
[<ffffffff811c6c6d>] vfs_write+0xbd/0x1e0
[<ffffffff811c76b8>] SyS_write+0x58/0xb0
[<ffffffff81614a29>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
I read the Linux kernel source and found that call_rwsem_down_read_failed calls rwsem_down_read_failed, which blocks waiting on an rw_semaphore.
I think the reason is that the metadata journal flush must wait for the related dirty pages to be flushed; when flushing those dirty pages takes a long time, the journal commit is blocked, and since the commit holds this inode's rw_semaphore, write system calls to the inode stall.
I really hope I can find evidence to prove it.
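One way I might gather that evidence is to turn on the jbd2/ext4 tracepoints while the stall happens and look at how long the journal commits take (this assumes debugfs is mounted at /sys/kernel/debug, which it is by default on CentOS 7, and requires root):
# trace journal commit start/end and the delayed-allocation write path
echo 1 > /sys/kernel/debug/tracing/events/jbd2/enable
echo 1 > /sys/kernel/debug/tracing/events/ext4/ext4_da_write_begin/enable
echo 1 > /sys/kernel/debug/tracing/tracing_on
cat /sys/kernel/debug/tracing/trace_pipe > /tmp/ext4-stall.trace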
I'm generating some parquet (v1.6.0) output from a PIG (v0.15.0) script. My script takes several input sources and joins them with some nesting. The script runs without error but then during the STORE operation I get:
2016-04-19 17:24:36,299 [PigTezLauncher-0] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezJob - DAG Status: status=FAILED, progress=TotalTasks: 249 Succeeded: 220 Running: 0 Failed: 1 Killed: 28 FailedTaskAttempts: 43, diagnostics=Vertex failed, vertexName=scope-1446, vertexId=vertex_1460657535752_15030_1_18, diagnostics=[Task failed, taskId=task_1460657535752_15030_1_18_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:parquet.hadoop.MemoryManager$1: New Memory allocation 134217728 exceeds minimum allocation size 1048576 with largest schema having 132 columns
at parquet.hadoop.MemoryManager.updateAllocation(MemoryManager.java:125)
at parquet.hadoop.MemoryManager.addWriter(MemoryManager.java:82)
at parquet.hadoop.ParquetRecordWriter.<init>(ParquetRecordWriter.java:104)
at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:309)
at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:262)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getRecordWriter(PigOutputFormat.java:81)
at org.apache.tez.mapreduce.output.MROutput.initialize(MROutput.java:398)
...
The above exception was thrown when I executed the script using -x tez, but I get the same exception when using mapreduce. I have tried to increase parallelization using SET default_parallel, as well as adding an (unnecessary w.r.t. my real objectives) ORDER BY operation just prior to my STORE operations, to ensure PIG has an opportunity to ship data off to different reducers and minimize the memory required on any given reducer. Finally, I've tried pushing up the available memory using SET mapred.child.java.opts. None of this has helped, however.
Is there something I'm just missing? Are there known strategies for avoiding the issue of one reducer carrying too much of the load and causing things to fail during write? I've experienced similar issues writing to avro output that appear to be caused by insufficient memory to execute the compression step.
EDIT: per this source file, the issue seems to boil down to the fact that memAllocation / nCols < minMemAllocation. However, the memory allocation seems unaffected by the mapred.child.java.opts setting I tried out.
I solved this finally using the parameter parquet.block.size. The default value (see source) is big enough to write a 128-column wide file, but no bigger. The solution in pig was to use SET parquet.block.size x; where x >= y * 1024^2 and y is the number of columns in your output.
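For example, with the 132-column schema from the error above, anything at or above 132 * 1024^2 = 138412032 bytes passes the check (substitute your own column count):
-- 132 columns * 1048576 bytes minimum per-column allocation
SET parquet.block.size 138412032;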
I'm getting the following errors in my hadoop namenode log:
2015-12-20 06:15:40,717 WARN [IPC Server handler 21 on 9000] ipc.Server
(Server.java:run(2029)) - IPC Server handler 21 on 9000, call
org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.blockReport
from 172.31.21.110:46999 Call#163559 Retry#0:
error: java.lang.OutOfMemoryError: Java heap space
java.lang.OutOfMemoryError: Java heap space
2015-12-20 06:15:42,710 WARN [IPC Server handler 22 on 9000] ipc.Server
(Server.java:run(2029)) - IPC Server handler 22 on 9000, call
org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol.blockReport from
172.31.24.250:45624 Call#164898 Retry#0:
error: java.lang.OutOfMemoryError: Java heap space
which results in all the nodes being listed as dead.
I have checked other Stack Overflow questions, and the most useful suggestion seems to be that I need to set the mapred.child.java.opts option in conf/mapred-site.xml to something higher than 2048 MB,
but I'm concerned that might not be enough.
I'm launching my cluster using Spark with the --hadoop-major-version=yarn option, so if I understand correctly all MapReduce jobs are run through YARN, including jobs created by HDFS.
My question is: what other settings, if any, do I need to modify (and how do I determine appropriate values, given that I want to use, say, 4 GB for the mapreduce.child.java.opts setting) to increase the memory available to HDFS's MapReduce jobs?
Hadoop daemons control their JVM arguments, including heap size settings, through the use of environment variables that have names suffixed with _OPTS. These environment variables are defined in various *-env.sh files in the configuration directory.
Using the NameNode as an example, you can add a line like this to your hadoop-env.sh file:
export HADOOP_NAMENODE_OPTS="-Xms4G -Xmx4G $HADOOP_NAMENODE_OPTS"
This sets a minimum/maximum heap size of 4 GB for the NameNode and also preserves any other arguments that were placed into HADOOP_NAMENODE_OPTS earlier in the script.
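As a quick sanity check after restarting the NameNode, you can confirm the flags actually made it onto its command line, for example:
# jps -v prints each Java process with its JVM arguments
jps -v | grep -i namenode
# or inspect the running process directly
ps -ef | grep -i namenode | grep -o '\-Xmx[^ ]*'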
My program, which I've run numerous times on different clusters, suddenly stops. The log:
15/04/20 19:19:59 INFO scheduler.TaskSetManager: Finished task 12.0 in stage 15.0 (TID 374) in 61 ms on ip-XXX.compute.internal (16/24)
15/04/20 19:19:59 INFO storage.BlockManagerInfo: Added rdd_44_14 in memory on ip-XXX.compute.internal:37999 (size: 16.0 B, free: 260.6 MB)
Killed
What does "Killed" mean, and why does it occur? There are no other errors.
"Killed" usually means that the OS has terminated the process by sending a SIGKILL signal. This is an unblockable signal that terminates a process immediately. It's often used as an OOM (out-of-memory) process killer -- if the OS decides that memory resources are getting dangerously low, it can pick a process to kill to try to free some memory.
Without more information, it's impossible to tell whether your process was killed because of memory problems or for some other reason. The kind of information you might be able to provide to help diagnose what's going on includes: how long was the process running before it was killed? can you enable and provide more verbose debug output from the process? is the process termination associated with any particular pattern of communication or processing activity?
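One quick check, assuming you have shell access to the machine where the process died: the kernel logs every OOM kill, so look for it around the time of the failure.
# did the kernel OOM killer pick off the JVM?
dmesg | grep -iE 'killed process|out of memory'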
Try setting yarn.nodemanager.vmem-check-enabled to false in your program's Spark config, something like this:
val conf = new SparkConf().setAppName("YourProgramName").set("yarn.nodemanager.vmem-check-enabled","false")
val sc = new SparkContext(conf)
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-avoid-being-killed-by-YARN-node-manager-td22199.html
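If setting it through SparkConf does not seem to take effect, note that this property is read by the NodeManager itself, so it usually has to live in yarn-site.xml on each worker node (restart the NodeManagers afterwards), e.g.:
<!-- yarn-site.xml on every NodeManager: disable virtual-memory limit checks -->
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>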
Maybe it's a virtual memory problem:
- ensure you have a swap partition;
- ensure vm.swappiness is not zero.
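A minimal sketch of how to check both:
swapon --show            # any active swap? (on older systems: swapon -s)
sysctl vm.swappiness     # 0 means the kernel avoids swapping almost entirely
sysctl -w vm.swappiness=10   # example non-zero value; persist it in /etc/sysctl.conf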
I use a JFFS2 partition as the root filesystem, mounting it from the kernel command line (and fstab) as RO. This is MontaVista Linux 5.0 (kernel 2.6.18).
Everything works, except that by the time Linux reaches my application, jffs2_gcd_mtd3 keeps the CPU busy for about 15 s at 98%. This is unacceptable in my case.
I searched the Linux code and saw that the GC thread is started ONLY when the partition is mounted RW, but in my case it starts nonetheless!
I tried mounting it RW and unmounting it afterwards, but...
Thanks in advance.
UPDATE: The statement about the GC daemon is wrong - I saw it in error. The real cause of the issue is that JFFS2 is VERY, VERY slow compared to the YAFFS2 I used previously. Just to compare: my 14 MiB ELF application loads from YAFFS2 in 2-2.5 s, while from JFFS2 it takes about 8 s!!!
This made me think that something is blocking Linux...
Now the question has changed to: what can make JFFS2 SO DREADFULLY slow?!?! Again, the partition is mounted RO!
OK, the answer is as follows:
It takes A LOT of time for JFFS2 to mount a 120 MiB partition - about 10 seconds on an ARM5 running at 300 MHz. Nothing helps here - neither sumtool nor mounting R/W and unmounting (to write the summary).
I solved the issue by:
- not including the unnecessary/unused space in the Linux partitions;
- dividing the remaining 70 MiB in two: one 55 MiB partition with all the Linux stuff, and one 15 MiB partition with my application and its files.
This solved the problem. The mount time is now about 2-3 s.
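For reference, that split can be expressed with the cmdlinepart driver on the kernel command line; the mtd-id below ("physmap-flash.0") is only a placeholder and must match the name your flash driver registers:
# 55 MiB root filesystem + 15 MiB application partition
mtdparts=physmap-flash.0:55m(rootfs),15m(appfs)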