Has anyone tested PHP 7 against Java 8? I was wondering how their performance compares. I was thinking that the new virtual machine introduced for PHP is still new or immature compared to Java 8 and its new garbage collector, but I'm not sure about it.
Comparing PHP and Java is apples and oranges. It is hard to get a fair and meaningful comparison.
However, see http://benchmarksgame.alioth.debian.org/u64q/compare.php?lang=java&lang2=php. The short answer is that Java 8 is faster than PHP 7. But you should read all of the caveats on the page ... including the fact that language vs language benchmarks are pretty bogus.
For the record, this is what that page currently says (2017-04-23):
Java programs versus PHP: measurements by benchmark task
(columns: elapsed seconds, memory in KB, gzipped source size in bytes, CPU seconds, per-core CPU load)
fasta secs mem gz cpu cpu load
Java: 2.13 36,036 2457 5.66 94% 58% 59% 60%
PHP: 59.37 8,896 1030 59.36 5% 2% 3% 100%
fannkuch-redux secs mem gz cpu cpu load
Java: 13.74 30,368 1282 54.12 100% 98% 98% 99%
PHP: 280.04 33,588 1150 1,117.48 100% 100% 100% 100%
mandelbrot secs mem gz cpu cpu load
Java: 7.10 90,588 796 27.92 99% 99% 98% 98%
PHP: 125.17 136,776 863 499.16 100% 100% 100% 100%
n-body secs mem gz cpu cpu load
Java: 21.54 27,092 1489 21.56 1% 1% 100% 1%
PHP: 358.21 8,668 1082 358.12 17% 0% 1% 83%
spectral-norm secs mem gz cpu cpu load
Java: 4.29 29,884 950 16.56 96% 97% 99% 95%
PHP: 37.94 19,420 1135 150.67 99% 99% 100% 99%
binary-trees secs mem gz cpu cpu load
Java: 11.26 593,156 835 39.02 85% 88% 90% 88%
PHP: 88.07 736,372 1027 247.49 92% 77% 23% 91%
k-nucleotide secs mem gz cpu cpu load
Java: 7.93 465,372 1802 25.11 75% 75% 75% 93%
PHP: 43.96 235,632 1060 142.28 87% 100% 71% 72%
reverse-complement secs mem gz cpu cpu load
Java: 1.10 345,352 1661 2.40 33% 82% 53% 54%
PHP: 2.81 135,124 426 1.75 31% 21% 44% 57%
pidigits secs mem gz cpu cpu load
Java: 3.06 31,760 938 3.16 6% 3% 97% 1%
PHP: 2.15 9,884 394 2.15 1% 0% 100% 1%
regex-redux secs mem gz cpu cpu load
Java: 12.31 902,528 929 38.75 73% 76% 86% 81%
PHP: 3.34 158,792 786 3.30 25% 26% 22% 92%
Java Version
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)
PHP Version
PHP 7.1.4 (cli) (built: Apr 16 2017 16:17:54) ( NTS )
Copyright (c) 1997-2017 The PHP Group
Zend Engine v3.1.0, Copyright (c) 1998-2017 Zend Technologies
I'm trying to install Manjaro on a laptop with OEM Windows 10 on board, using the Manjaro Architect CLI installer. I've created an LVM volume group, laid out the logical volumes, set up LUKS on LVM, formatted the partitions as btrfs and mounted them. When I moved on and began installing the DE, I got this error:
checking available disk space...
error: Partition /mnt too full: 1333620 blocks needed, 0 blocks free
error: not enough free disk space
error: failed to commit transaction (not enough free disk space)
Errors occurred, no packages were upgraded.
==> ERROR: Failed to install packages to new root
Then I typed df -hT and saw this:
Filesystem Type Size Used Avail Use% Mounted on
dev devtmpfs 6.8G 0 6.8G 0% /dev
run tmpfs 6.9G 101M 6.8G 2% /run
/dev/sdb1 iso9660 2.7G 2.7G 0 100% /run/miso/bootmnt
cowspace tmpfs 256M 0 256M 0% /run/miso/cowspace
overlay_root tmpfs 11G 189M 11G 2% /run/miso/overlay_root
/dev/loop0 squashfs 21M 21M 0 100% /run/miso/sfs/livefs
/dev/loop1 squashfs 457M 457M 0 100% /run/miso/sfs/mhwdfs
/dev/loop2 squashfs 1.6G 1.6G 0 100% /run/miso/sfs/desktopfs
/dev/loop3 squashfs 592M 592M 0 100% /run/miso/sfs/rootfs
overlay overlay 11G 189M 11G 2% /
tmpfs tmpfs 6.9G 121M 6.8G 2% /dev/shm
tmpfs tmpfs 6.9G 0 6.9G 0% /sys/fs/cgroup
tmpfs tmpfs 6.9G 48M 6.8G 1% /tmp
tmpfs tmpfs 6.9G 2.3M 6.9G 1% /etc/pacman.d/gnupg
tmpfs tmpfs 1.4G 12K 1.4G 1% /run/user/1000
/dev/mapper/vg--default-lv--root btrfs 162G 1.1G 0 100% /mnt
/dev/mapper/crypto-home btrfs 60G 3.4M 60G 1% /mnt/home
/dev/mapper/crypto-project btrfs 40G 3.4M 40G 1% /mnt/home/project
/dev/nvme0n1p4 fuseblk 200G 20G 181G 10% /mnt/windows
/dev/nvme0n1p7 vfat 99M 512 99M 1% /mnt/boot/efi
What is wrong with this row?
/dev/mapper/vg--default-lv--root btrfs 162G 1.1G 0 100% /mnt
How can it be that Use% is 100% and Avail is 0, when only 1.1G of 162G is used?
The solution is to manually unmount and remount the filesystem before installing packages. Possibly, closing and reopening the btrfs volume is also necessary. If you run into the issue while packages are being installed, you can do the unmount/remount and then just redo the package installation step without formatting.
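For reference, a minimal sketch of that unmount/remount cycle, assuming the mount points and mapper names shown in the df -hT output above (the underlying LUKS partition is a placeholder):

umount -R /mnt                                       # recursively unmounts /mnt and everything below it
cryptsetup close crypto-home                         # optional: close a LUKS-backed volume ...
cryptsetup open /dev/<luks-partition> crypto-home    # ... and reopen it (same idea for crypto-project)
mount /dev/mapper/vg--default-lv--root /mnt
mount /dev/mapper/crypto-home /mnt/home
mount /dev/mapper/crypto-project /mnt/home/project

The remaining mount points (/mnt/boot/efi, /mnt/windows) can be remounted the same way before continuing the installation.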
Previously I wrote the following, however it did not help in a recent install:
I ran into the same issue. This is some sort of bug in btrfs. I stumbled upon a workaround. After manually creating a file and writing to it (touch /mnt/temp, dd if=/dev/zero of=/mnt/temp bs=1M count=1000), df began to report correct available space and I was able to resume the installation.
P.S. I am using btrfs directly over luks over block device.
I recently built a 3-node Ceph cluster. Each node has seven 1 TB HDDs for OSDs, so in total I have 21 TB of storage space for Ceph.
However, when I ran a workload that keeps writing data to Ceph, the cluster went into an error status and no data can be written to it any more.
The output of ceph -s is:
cluster:
id: 06ed9d57-c68e-4899-91a6-d72125614a94
health: HEALTH_ERR
1 full osd(s)
4 nearfull osd(s)
7 pool(s) full
services:
mon: 1 daemons, quorum host3
mgr: admin(active), standbys: 06ed9d57-c68e-4899-91a6-d72125614a94
osd: 21 osds: 21 up, 21 in
rgw: 4 daemons active
data:
pools: 7 pools, 1748 pgs
objects: 2.03M objects, 7.34TiB
usage: 14.7TiB used, 4.37TiB / 19.1TiB avail
pgs: 1748 active+clean
As I understand it, since there is still 4.37 TiB of space left, Ceph itself should take care of balancing the workload so that no OSD ends up in full or nearfull status. But the result did not match my expectation: 1 full osd and 4 nearfull osd(s) show up, and the health is HEALTH_ERR.
I can't access Ceph with hdfs or s3cmd anymore, so here come the questions:
1. Is there any explanation for the current issue?
2. How can I recover from it? Should I delete the data on the Ceph nodes directly with ceph-admin and relaunch Ceph?
I did not get an answer for 3 days, but I made some progress, so let me share my findings here.
1. It's normal for different OSDs to have a usage gap. If you list the OSDs with ceph osd df, you will find that each OSD has a different usage ratio.
2. To recover from this issue (by "issue" I mean the cluster stalling because an OSD is full), follow the steps below; they are mostly from the Red Hat documentation. A consolidated command sketch follows these steps.
Get the cluster health info with ceph health detail. This isn't strictly necessary, but it gives you the ID of the full OSD.
Use ceph osd dump | grep full_ratio to get the current full_ratio. Do not use the command listed at the link above; it is obsolete. The output looks like:
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
Set the OSD full ratio a little higher with ceph osd set-full-ratio <ratio>. Generally, we set the ratio to 0.97.
Now the cluster status will change from HEALTH_ERR to HEALTH_WARN or HEALTH_OK. Remove some data that can be released.
Change the OSD full ratio back to the previous value. It shouldn't stay at 0.97 permanently, because that is a little risky.
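Putting the steps above together, a minimal command sketch (0.97 is the temporarily raised value and 0.95 the previous default mentioned above; adjust to your own previous ratio):

ceph health detail                   # identify the full / nearfull OSDs
ceph osd dump | grep full_ratio      # check the current ratios
ceph osd set-full-ratio 0.97         # temporarily raise the full ratio so writes resume
# ... delete or migrate data until usage drops ...
ceph osd set-full-ratio 0.95         # restore the previous ratio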
I hope this thread is helpful to someone who runs into the same issue. For details about OSD configuration, please refer to the Ceph documentation.
Ceph requires free disk space to move storage chunks, called pgs, between different disks. As this free space is so critical to the underlying functionality, Ceph will go into HEALTH_WARN once any OSD reaches the near_full ratio (generally 85% full), and will stop write operations on the cluster by entering HEALTH_ERR state once an OSD reaches the full_ratio.
However, unless your cluster is perfectly balanced across all OSDs there is likely much more capacity available, as OSDs are typically unevenly utilized. To check overall utilization and available capacity you can run ceph osd df.
Example output:
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
2 hdd 2.72849 1.00000 2.7 TiB 2.0 TiB 2.0 TiB 72 MiB 3.6 GiB 742 GiB 73.44 1.06 406 up
5 hdd 2.72849 1.00000 2.7 TiB 2.0 TiB 2.0 TiB 119 MiB 3.3 GiB 726 GiB 74.00 1.06 414 up
12 hdd 2.72849 1.00000 2.7 TiB 2.2 TiB 2.2 TiB 72 MiB 3.7 GiB 579 GiB 79.26 1.14 407 up
14 hdd 2.72849 1.00000 2.7 TiB 2.3 TiB 2.3 TiB 80 MiB 3.6 GiB 477 GiB 82.92 1.19 367 up
8 ssd 0.10840 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 up
1 hdd 2.72849 1.00000 2.7 TiB 1.7 TiB 1.7 TiB 27 MiB 2.9 GiB 1006 GiB 64.01 0.92 253 up
4 hdd 2.72849 1.00000 2.7 TiB 1.7 TiB 1.7 TiB 79 MiB 2.9 GiB 1018 GiB 63.55 0.91 259 up
10 hdd 2.72849 1.00000 2.7 TiB 1.9 TiB 1.9 TiB 70 MiB 3.0 GiB 887 GiB 68.24 0.98 256 up
13 hdd 2.72849 1.00000 2.7 TiB 1.8 TiB 1.8 TiB 80 MiB 3.0 GiB 971 GiB 65.24 0.94 277 up
15 hdd 2.72849 1.00000 2.7 TiB 2.0 TiB 2.0 TiB 58 MiB 3.1 GiB 793 GiB 71.63 1.03 283 up
17 hdd 2.72849 1.00000 2.7 TiB 1.6 TiB 1.6 TiB 113 MiB 2.8 GiB 1.1 TiB 59.78 0.86 259 up
19 hdd 2.72849 1.00000 2.7 TiB 1.6 TiB 1.6 TiB 100 MiB 2.7 GiB 1.2 TiB 56.98 0.82 265 up
7 ssd 0.10840 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 up
0 hdd 2.72849 1.00000 2.7 TiB 2.0 TiB 2.0 TiB 105 MiB 3.0 GiB 734 GiB 73.72 1.06 337 up
3 hdd 2.72849 1.00000 2.7 TiB 2.0 TiB 2.0 TiB 98 MiB 3.0 GiB 781 GiB 72.04 1.04 354 up
9 hdd 2.72849 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 up
11 hdd 2.72849 1.00000 2.7 TiB 1.9 TiB 1.9 TiB 76 MiB 3.0 GiB 817 GiB 70.74 1.02 342 up
16 hdd 2.72849 1.00000 2.7 TiB 1.8 TiB 1.8 TiB 98 MiB 2.7 GiB 984 GiB 64.80 0.93 317 up
18 hdd 2.72849 1.00000 2.7 TiB 2.0 TiB 2.0 TiB 79 MiB 3.0 GiB 792 GiB 71.65 1.03 324 up
6 ssd 0.10840 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 up
TOTAL 47 TiB 30 TiB 30 TiB 1.3 GiB 53 GiB 16 TiB 69.50
MIN/MAX VAR: 0.82/1.19 STDDEV: 6.64
As you can see in the above output, OSD utilization varies from 56.98% (OSD 19) to 82.92% (OSD 14), which is a significant variance.
As only a single OSD is full, and only 4 of your 21 OSDs are nearfull, you likely have a significant amount of storage still available in your cluster, which means that it is time to perform a rebalance operation. This can be done manually by reweighting OSDs, or you can have Ceph do a best-effort rebalance by running the command ceph osd reweight-by-utilization. Once the rebalance is complete (i.e. you have no misplaced objects in ceph status) you can check the variation again (using ceph osd df) and trigger another rebalance if required.
If you are on Luminous or newer you can enable the Balancer plugin to handle OSD reweighting automatically.
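A minimal sketch of turning it on, assuming the ceph-mgr balancer module that ships with Luminous and later (mode and scheduling details vary by release):

ceph mgr module enable balancer      # make sure the module is loaded
ceph balancer mode crush-compat      # or 'upmap' if all clients are Luminous or newer
ceph balancer on                     # let the manager rebalance automatically
ceph balancer status                 # check what it is doing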
We have HDP version 2.6.4. On the datanode machine we can see that the HDFS data isn't balanced. Some disks have noticeably different used sizes, such as
sdb 11G
and
sdd 17G
/dev/sdd 20G 3.0G 17G 15% /grid/sdd
/dev/sdb 20G 11G 9.3G 53% /grid/sdb <-- why are the disks not balanced?
After searching on Google I found the following CLI command
(from https://community.hortonworks.com/questions/19694/help-with-exception-from-hdfs-balancer.html):
hdfs balancer -Ddfs.balancer.movedWinWidth=5400000 -Ddfs.balancer.moverThreads=1000 -Ddfs.balancer.dispatcherThreads=200 -Ddfs.datanode.balance.bandwidthPerSec=100000000 -Ddfs.balancer.max-size-to-move=10737418240 -threshold 20 1>/tmp/balancer-out.log 2>/tmp/balancer-debug.log
and after running it we still see the same disk usage:
/dev/sdd 20G 3.0G 17G 15% /grid/sdd
/dev/sdb 20G 11G 9.3G 53% /grid/sdb
more /tmp/balancer-out.log
Time Stamp              Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
The cluster is balanced. Exiting...
Mar 7, 2019 5:02:34 PM  0           0 B                  0 B                 0 B
Mar 7, 2019 5:02:34 PM  Balancing took 1.453 seconds
So we do not actually see any difference in HDFS disk balancing.
How can we balance the HDFS data so that all disks end up with roughly the same used size?
I'm not an expert on this; I've only just started looking at it. I suspect you should be using hdfs diskbalancer, not balancer.
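If that turns out to be the right tool, the usual workflow is roughly the following sketch (assuming dfs.disk.balancer.enabled is set to true in hdfs-site.xml and your HDP build ships the disk balancer; the hostname is a placeholder):

hdfs diskbalancer -plan datanode1.example.com        # typically writes a plan file under /system/diskbalancer/<date>/ in HDFS
hdfs diskbalancer -execute /system/diskbalancer/<date>/datanode1.example.com.plan.json
hdfs diskbalancer -query datanode1.example.com       # check progress until the plan finishes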
My team has a project which is not too big; built with make -js it takes 40 seconds, but when using Bazel the time increased to 70 seconds. Here is the profile of the Bazel build process. I noticed that SKYFUNCTION takes 47% of the time; is that reasonable?
PROFILES
The last section of it:
Type Total Count Average
ACTION 0.03% 77 0.70 ms
ACTION_CHECK 0.00% 4 0.90 ms
ACTION_EXECUTE 40.40% 77 912 ms
ACTION_UPDATE 0.00% 74 0.02 ms
ACTION_COMPLETE 0.19% 77 4.28 ms
INFO 0.00% 1 0.05 ms
VFS_STAT 1.07% 117519 0.02 ms
VFS_DIR 0.27% 4613 0.10 ms
VFS_MD5 0.22% 151 2.56 ms
VFS_DELETE 4.43% 53830 0.14 ms
VFS_OPEN 0.01% 232 0.11 ms
VFS_READ 0.06% 3523 0.03 ms
VFS_WRITE 0.00% 4 0.97 ms
WAIT 0.05% 156 0.56 ms
SKYFRAME_EVAL 6.23% 1 10.830 s
SKYFUNCTION 47.01% 687 119 ms
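For context, this kind of profile is typically produced and summarized with something like the following (a sketch; the profile path and target label are placeholders):

bazel build --profile=/tmp/build.profile //your:target
bazel analyze-profile /tmp/build.profile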
@ittai, @Jin, @Ondrej K: I have tried switching off the sandboxing in Bazel, and it seems much faster than with it switched on. Here is the comparison:
SWITCHED ON: 70s
SWITCHED OFF: 33s±2
SKYFUNCTION still takes 47% of the total execution time, but the average time it takes dropped from 119 ms to 21 ms.
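For anyone comparing, sandboxing was presumably switched off with something like the following (a sketch; the target label is a placeholder, and older Bazel versions use 'standalone' instead of 'local'):

bazel build --spawn_strategy=local //your:target     # run actions directly, without the sandbox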
My MR job ended at map 100% reduce 35% with lots of error messages similar to: running beyond physical memory limits. Current usage: 3.0 GB of 3 GB physical memory used; 3.7 GB of 15 GB virtual memory used. Killing container.
My input *.bz2 file is about 4 GB; if I uncompress it, its size will be about 38 GB. It took about one hour to run this job with one master and two slaves on Amazon EMR.
My questions are
- Why did this job use so much memory?
- Why did this job take about one hour? Usually, running a 40 GB wordcount job on a small 4-node cluster takes about 10 minutes.
- How should I tune the MR parameters to solve this problem?
- Which Amazon EC2 instance types are a good fit for this problem?
Please refer to the following log:
- Physical memory (bytes) snapshot=43327889408 => 43.3GB
- Virtual memory (bytes) snapshot=108950675456 => 108.95GB
- Total committed heap usage (bytes)=34940649472 => 34.94GB
My proposed solutions are as follows, but I'm not sure whether they are correct:
- use a larger Amazon EC2 instance type with at least 8 GB of memory
- tune the MR parameters using the following code
Version 1:
Configuration conf = new Configuration();
// don't kill the container if the physical memory exceeds "mapreduce.reduce.memory.mb" or "mapreduce.map.memory.mb"
// (note: these are NodeManager settings, so they typically have to be set in yarn-site.xml
// on the cluster rather than in the per-job configuration)
conf.setBoolean("yarn.nodemanager.pmem-check-enabled", false);
conf.setBoolean("yarn.nodemanager.vmem-check-enabled", false);
// create the Job after the configuration is complete, since Job.getInstance(conf) copies the configuration
Job job = Job.getInstance(conf, "jobtest1");
Version 2:
Configuration conf = new Configuration();
//conf.set("mapreduce.input.fileinputformat.split.minsize","3073741824");
// give each map/reduce container 8 GB, with the JVM heap (-Xmx) at roughly 75% of the container size
conf.set("mapreduce.map.memory.mb", "8192");
conf.set("mapreduce.map.java.opts", "-Xmx6144m");
conf.set("mapreduce.reduce.memory.mb", "8192");
conf.set("mapreduce.reduce.java.opts", "-Xmx6144m");
// create the Job after the configuration is complete, since Job.getInstance(conf) copies the configuration
Job job = Job.getInstance(conf, "jobtest2");
Log:
15/11/08 11:37:27 INFO mapreduce.Job: map 100% reduce 35%
15/11/08 11:37:27 INFO mapreduce.Job: Task Id : attempt_1446749367313_0006_r_000006_2, Status : FAILED
Container [pid=24745,containerID=container_1446749367313_0006_01_003145] is running beyond physical memory limits. Current usage: 3.0 GB of 3 GB physical memory used; 3.7 GB of 15 GB virtual memory used. Killing container.
Dump of the process-tree for container_1446749367313_0006_01_003145 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 24745 24743 24745 24745 (bash) 0 0 9658368 291 /bin/bash -c /usr/lib/jvm/java-openjdk/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx2304m -Djava.io.tmpdir=/mnt1/yarn/usercache/ec2-user/appcache/application_1446749367313_0006/container_1446749367313_0006_01_003145/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1446749367313_0006/container_1446749367313_0006_01_003145 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild **.***.***.*** 32846 attempt_1446749367313_0006_r_000006_2 3145 1>/var/log/hadoop-yarn/containers/application_1446749367313_0006/container_1446749367313_0006_01_003145/stdout 2>/var/log/hadoop-yarn/containers/application_1446749367313_0006/container_1446749367313_0006_01_003145/stderr
|- 24749 24745 24745 24745 (java) 14124 1281 3910426624 789477 /usr/lib/jvm/java-openjdk/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx2304m -Djava.io.tmpdir=/mnt1/yarn/usercache/ec2-user/appcache/application_1446749367313_0006/container_1446749367313_0006_01_003145/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/containers/application_1446749367313_0006/container_1446749367313_0006_01_003145 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild **.***.***.*** 32846 attempt_1446749367313_0006_r_000006_2 3145
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
15/11/08 11:37:28 INFO mapreduce.Job: map 100% reduce 25%
15/11/08 11:37:30 INFO mapreduce.Job: map 100% reduce 26%
15/11/08 11:37:37 INFO mapreduce.Job: map 100% reduce 27%
15/11/08 11:37:42 INFO mapreduce.Job: map 100% reduce 28%
15/11/08 11:37:53 INFO mapreduce.Job: map 100% reduce 29%
15/11/08 11:37:57 INFO mapreduce.Job: map 100% reduce 34%
15/11/08 11:38:02 INFO mapreduce.Job: map 100% reduce 35%
15/11/08 11:38:13 INFO mapreduce.Job: map 100% reduce 36%
15/11/08 11:38:22 INFO mapreduce.Job: map 100% reduce 37%
15/11/08 11:38:35 INFO mapreduce.Job: map 100% reduce 42%
15/11/08 11:38:36 INFO mapreduce.Job: map 100% reduce 100%
15/11/08 11:38:36 INFO mapreduce.Job: Job job_1446749367313_0006 failed with state FAILED due to: Task failed task_1446749367313_0006_r_000001
Job failed as tasks failed. failedMaps:0 failedReduces:1
15/11/08 11:38:36 INFO mapreduce.Job: Counters: 43
File System Counters
FILE: Number of bytes read=11806418671
FILE: Number of bytes written=22240791936
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=16874
HDFS: Number of bytes written=0
HDFS: Number of read operations=59
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
S3: Number of bytes read=3942336319
S3: Number of bytes written=0
S3: Number of read operations=0
S3: Number of large read operations=0
S3: Number of write operations=0
Job Counters
Failed reduce tasks=22
Killed reduce tasks=5
Launched map tasks=59
Launched reduce tasks=27
Data-local map tasks=59
Total time spent by all maps in occupied slots (ms)=114327828
Total time spent by all reduces in occupied slots (ms)=131855700
Total time spent by all map tasks (ms)=19054638
Total time spent by all reduce tasks (ms)=10987975
Total vcore-seconds taken by all map tasks=19054638
Total vcore-seconds taken by all reduce tasks=10987975
Total megabyte-seconds taken by all map tasks=27438678720
Total megabyte-seconds taken by all reduce tasks=31645368000
Map-Reduce Framework
Map input records=728795619
Map output records=728795618
Map output bytes=50859151614
Map output materialized bytes=10506705085
Input split bytes=16874
Combine input records=0
Spilled Records=1457591236
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=150143
CPU time spent (ms)=14360870
Physical memory (bytes) snapshot=43327889408
Virtual memory (bytes) snapshot=108950675456
Total committed heap usage (bytes)=34940649472
File Input Format Counters
Bytes Read=0
I am not sure about Amazon EMR, so here are a few points to consider regarding MapReduce:
bzip2 is slower, although it compresses better than gzip. bzip2's decompression speed is faster than its compression speed, but it is still slower than the other formats. So at a high level, you already have this overhead compared to the 40 GB wordcount job which ran in ten minutes (assuming that the 40 GB job doesn't use compression). The next question is: but how much slower?
However, your job is still failing after one hour; please confirm this. Only when the job runs successfully can we think about performance. For this reason, let's think about why it is failing.
You were getting a memory error. Based on the error, a container failed during the reducer phase (as the mapper phase completed 100%). Possibly not even one reducer succeeded. Even though 35% might trick you into thinking that some reducers ran, that percentage can come from the shuffle/copy work that happens before the first reducer actually runs. One way to confirm is to see whether any reducer output file was generated.
Once you confirm that none of the reducers ran, you can increase the memory for the containers as per your Version 2.
Your Version 1 will help you see whether only a specific container is causing the issue, while allowing the job to complete.
Your input file size should determine the number of reducers. The standard is 1 reducer per 1 GB unless you are compressing the mapper output data, so in this case the ideal number should have been at least 38. Try passing the command-line option -D mapred.reduce.tasks=40 and see if there is any change.
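If the job is not launched through ToolRunner (which is what makes -D options take effect), the programmatic equivalent in the driver would be roughly:

job.setNumReduceTasks(40);   // same effect as -D mapred.reduce.tasks=40 (mapreduce.job.reduces in newer property names)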