Error while creating a table in HBase: "ERROR: java.io.IOException: Table Namespace Manager not ready yet, try again later."
hbck -fix shows ERROR: hbase:meta is not found on any region.
The error appeared after a fresh start of the HBase shell session; no errors were reported in the master log during startup. The last HBase session was closed properly, but ZooKeeper was not (I suspect this is the reason for the meta table corruption).
I am able to list the tables created earlier:
hbase(main):001:0> list
TABLE
IDX_STOCK_SYMBOL
Patient
STOCK_SYMBOL
STOCK_SYMBOL_BKP
SYSTEM.CATALOG
SYSTEM.FUNCTION
SYSTEM.SEQUENCE
SYSTEM.STATS
8 row(s) in 1.7930 seconds
Creating a table named custmaster
hbase(main):002:0> create 'custmaster', 'customer'
ERROR: java.io.IOException: Table Namespace Manager not ready yet, try again later
at org.apache.hadoop.hbase.master.HMaster.getNamespaceDescriptor(HMaster.java:3179)
at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1735)
at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1774)
at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:40470)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2027)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
at org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:74)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Workaround: running hbck to identify the inconsistencies
[hduser@master ~]$ hbase hbck
>Version: 0.98.4-hadoop2
>Number of live region servers: 2
>Number of dead region servers: 0
>Master: master,60000,1538793456542
>Number of backup masters: 0
>Average load: 0.0
>Number of requests: 11
>Number of regions: 0
>Number of regions in transition: 1
>
>ERROR: META region or some of its attributes are null.
>ERROR: hbase:meta is not found on any region.
>ERROR: hbase:meta table is not consistent. Run HBCK with proper fix options to fix hbase:meta inconsistency. Exiting...
...
>Summary:
>3 inconsistencies detected.
>Status: INCONSISTENT
Ran hbck with the -details option to identify the tables involved:
[hduser@master ~]$ hbase hbck -details
>ERROR: META region or some of its attributes are null.
>ERROR: hbase:meta is not found on any region.
>ERROR: hbase:meta table is not consistent. Run HBCK with proper fix options to fix hbase:meta inconsistency. Exiting...
>Summary:
>3 inconsistencies detected.
>Status: INCONSISTENT
The output of -details clearly shows that hbase:meta is not found on any region.
I tried running hbase hbck -fixMeta, but it returned the same result as above, so I then tried hbase hbck -fix.
This command ran for some time with the prompt "Trying to fix a problem with hbase:meta.." and resulted in the error below:
[hduser@master ~]$ hbase hbck -fix
Version: 0.98.4-hadoop2
Number of live region servers: 2
Number of dead region servers: 0
Master: master,60000,1538793456542
Number of backup masters: 0
Average load: 0.0
Number of requests: 19
Number of regions: 0
Number of regions in transition: 1
ERROR: META region or some of its attributes are null.
ERROR: hbase:meta is not found on any region.
Trying to fix a problem with hbase:meta..
2018-10-06 09:01:03,424 INFO [main] client.HConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
2018-10-06 09:01:03,425 INFO [main] client.HConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x166473bbe720005
2018-10-06 09:01:03,432 INFO [main] zookeeper.ZooKeeper: Session: 0x166473bbe720005 closed
2018-10-06 09:01:03,432 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
Exception in thread "main" org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=35, exceptions:
Sat Oct 06 08:52:13 IST 2018, org.apache.hadoop.hbase.client.RpcRetryingCaller#18920cc, org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.PleaseHoldException): org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
at org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:2416)
at org.apache.hadoop.hbase.master.HMaster.assignRegion(HMaster.java:2472)
at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:40456)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2027)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
at org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:74)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Sat Oct 06 08:52:13 IST 2018, org.apache.hadoop.hbase.client.RpcRetryingCaller#18920cc, org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.PleaseHoldException): org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
at org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:2416)
at org.apache.hadoop.hbase.master.HMaster.assignRegion(HMaster.java:2472)
at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:40456)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2027)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
at org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:74)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
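For reference (this is not something tried in the post above), hbck in this HBase version can also combine the meta repair with region assignment repair once the master has finished initializing; whether it helps depends on the underlying cause:
# hedged sketch: meta and assignment repair flags as supported by HBase 0.98 hbck
hbase hbck -fixMeta -fixAssignments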
How can I resolve this issue?
Thanks in advance!
I had not checked the NameNode and DataNode logs. But when I did, the real issue turned out to be corrupt files in HDFS.
I ran hadoop fsck / to check the health of the file system:
[hduser@master ~]$ hadoop fsck /
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
18/10/06 09:52:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Connecting to namenode via http://master:50070/fsck?ugi=hduser&path=%2F
FSCK started by hduser (auth:SIMPLE) from /192.168.1.11 for path / at Sat Oct 06 09:52:02 IST 2018
.................................................................................
/user/hduser/hbase/.hbck/hbase-1538798774320/data/hbase/meta/1588230740/info/359783d4cd07419598264506bac92dcf: CORRUPT blockpool BP-1664228054-192.168.1.11-1535828595216 block blk_1073744002
/user/hduser/hbase/.hbck/hbase-1538798774320/data/hbase/meta/1588230740/info/359783d4cd07419598264506bac92dcf: MISSING 1 blocks of total size 3934 B.........
/user/hduser/hbase/data/default/IDX_STOCK_SYMBOL/a27db76f84487a05f3e1b8b74c13fa78/0/c595bf49443f4daf952df6cdaad79181: CORRUPT blockpool BP-1664228054-192.168.1.11-1535828595216 block blk_1073744000
/user/hduser/hbase/data/default/IDX_STOCK_SYMBOL/a27db76f84487a05f3e1b8b74c13fa78/0/c595bf49443f4daf952df6cdaad79181: MISSING 1 blocks of total size 1354 B............
...
/user/hduser/hbase/data/default/SYSTEM.CATALOG/d63574fdd00e8bf3882fcb6bd53c3d83/0/dcb68bbb5e394d19b06db7f298810de0: CORRUPT blockpool BP-1664228054-192.168.1.11-1535828595216 block blk_1073744001
/user/hduser/hbase/data/default/SYSTEM.CATALOG/d63574fdd00e8bf3882fcb6bd53c3d83/0/dcb68bbb5e394d19b06db7f298810de0: MISSING 1 blocks of total size 2283 B...........................
Status: CORRUPT
Total size: 4232998 B
Total dirs: 109
Total files: 129
Total symlinks: 0
Total blocks (validated): 125 (avg. block size 33863 B)
********************************
UNDER MIN REPL'D BLOCKS: 3 (2.4 %)
dfs.namenode.replication.min: 1
CORRUPT FILES: 3
MISSING BLOCKS: 3
MISSING SIZE: 7571 B
CORRUPT BLOCKS: 3
********************************
Minimally replicated blocks: 122 (97.6 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 1.952
Corrupt blocks: 3
Missing replicas: 0 (0.0 %)
Number of data-nodes: 2
Number of racks: 1
FSCK ended at Sat Oct 06 09:52:02 IST 2018 in 66 milliseconds
The filesystem under path '/' is CORRUPT
I then ran fsck with the -delete option (hdfs fsck / -delete) to delete the corrupt files, and that fixed the issue.
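For reference, a minimal sketch of that cleanup sequence, assuming a Hadoop 2.x fsck (the -list-corruptfileblocks check is an optional extra step, not part of the original run):
# optional: list only the files that have corrupt/missing blocks
hdfs fsck / -list-corruptfileblocks
# delete the corrupt files (the data in their missing blocks is unrecoverable)
hdfs fsck / -delete
# confirm the filesystem reports HEALTHY, then restart HBase and re-check with hbase hbck
hdfs fsck /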
A detailed explanation of cleaning up the HDFS file system is available here: How to fix corrupt HDFS Files
Related
I have a local Hadoop 3.2 installation: 1 master + 1 worker, both running on my laptop. This is an experimental setup to run quick tests before submitting to a real cluster.
Everything is in good health:
$ jps
22326 NodeManager
21641 DataNode
25530 Jps
22042 ResourceManager
21803 SecondaryNameNode
21517 NameNode
$ hdfs fsck /
Connecting to namenode via http://master:9870/fsck?ugi=david&path=%2F
FSCK started by david (auth:SIMPLE) from /127.0.0.1 for path / at Wed Sep 04 13:54:59 CEST 2019
Status: HEALTHY
Number of data-nodes: 1
Number of racks: 1
Total dirs: 1
Total symlinks: 0
Replicated Blocks:
Total size: 0 B
Total files: 0
Total blocks (validated): 0
Minimally replicated blocks: 0
Over-replicated blocks: 0
Under-replicated blocks: 0
Mis-replicated blocks: 0
Default replication factor: 1
Average block replication: 0.0
Missing blocks: 0
Corrupt blocks: 0
Missing replicas: 0
Erasure Coded Block Groups:
Total size: 0 B
Total files: 0
Total block groups (validated): 0
Minimally erasure-coded block groups: 0
Over-erasure-coded block groups: 0
Under-erasure-coded block groups: 0
Unsatisfactory placement block groups: 0
Average block group size: 0.0
Missing block groups: 0
Corrupt block groups: 0
Missing internal blocks: 0
FSCK ended at Wed Sep 04 13:54:59 CEST 2019 in 0 milliseconds
The filesystem under path '/' is HEALTHY
When I'm running the provided Pi example, I get the following error:
$ yarn jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar pi 16 1000
Number of Maps = 16
Samples per Map = 1000
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Wrote input for Map #10
Wrote input for Map #11
Wrote input for Map #12
Wrote input for Map #13
Wrote input for Map #14
Wrote input for Map #15
Starting Job
2019-09-04 13:55:47,665 INFO client.RMProxy: Connecting to ResourceManager at master/0.0.0.0:8032
2019-09-04 13:55:47,887 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/david/.staging/job_1567598091808_0001
2019-09-04 13:55:48,020 INFO input.FileInputFormat: Total input files to process : 16
2019-09-04 13:55:48,450 INFO mapreduce.JobSubmitter: number of splits:16
2019-09-04 13:55:48,508 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2019-09-04 13:55:49,000 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1567598091808_0001
2019-09-04 13:55:49,003 INFO mapreduce.JobSubmitter: Executing with tokens: []
2019-09-04 13:55:49,164 INFO conf.Configuration: resource-types.xml not found
2019-09-04 13:55:49,164 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2019-09-04 13:55:49,375 INFO impl.YarnClientImpl: Submitted application application_1567598091808_0001
2019-09-04 13:55:49,411 INFO mapreduce.Job: The url to track the job: http://cyclimse:8088/proxy/application_1567598091808_0001/
2019-09-04 13:55:49,412 INFO mapreduce.Job: Running job: job_1567598091808_0001
2019-09-04 13:55:55,477 INFO mapreduce.Job: Job job_1567598091808_0001 running in uber mode : false
2019-09-04 13:55:55,480 INFO mapreduce.Job: map 0% reduce 0%
2019-09-04 13:55:55,509 INFO mapreduce.Job: Job job_1567598091808_0001 failed with state FAILED due to: Application application_1567598091808_0001 failed 2 times due to AM Container for appattempt_1567598091808_0001_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2019-09-04 13:55:54.458]Exception from container-launch.
Container id: container_1567598091808_0001_02_000001
Exit code: 1
[2019-09-04 13:55:54.464]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[2019-09-04 13:55:54.465]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
For more detailed output, check the application tracking page: http://cyclimse:8088/cluster/app/application_1567598091808_0001 Then click on links to logs of each attempt.
. Failing the application.
2019-09-04 13:55:55,546 INFO mapreduce.Job: Counters: 0
Job job_1567598091808_0001 failed!
It seems there is something wrong with the Log4j configuration: No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster). However, it is using the default configuration ($HADOOP_CONF_DIR/log4j.properties).
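Not from the original post, but a quick way to pull the full container logs from the command line instead of the tracking page (assuming YARN log aggregation is enabled via yarn.log-aggregation-enable):
# fetch the aggregated AM/container logs for the failed application
yarn logs -applicationId application_1567598091808_0001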
After the execution, HDFS state looks like this:
$ hdfs fsck /
Connecting to namenode via http://master:9870/fsck?ugi=david&path=%2F
FSCK started by david (auth:SIMPLE) from /127.0.0.1 for path / at Wed Sep 04 14:01:43 CEST 2019
/tmp/hadoop-yarn/staging/david/.staging/job_1567598091808_0001/job.jar: Under replicated BP-24234081-0.0.0.0-1567598050928:blk_1073741841_1017. Target Replicas is 10 but found 1 live replica(s), 0 decommissioned replica(s), 0 decommissioning replica(s).
/tmp/hadoop-yarn/staging/david/.staging/job_1567598091808_0001/job.split: Under replicated BP-24234081-0.0.0.0-1567598050928:blk_1073741842_1018. Target Replicas is 10 but found 1 live replica(s), 0 decommissioned replica(s), 0 decommissioning replica(s).
Status: HEALTHY
Number of data-nodes: 1
Number of racks: 1
Total dirs: 11
Total symlinks: 0
Replicated Blocks:
Total size: 510411 B
Total files: 20
Total blocks (validated): 20 (avg. block size 25520 B)
Minimally replicated blocks: 20 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 2 (10.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 1
Average block replication: 1.0
Missing blocks: 0
Corrupt blocks: 0
Missing replicas: 18 (47.36842 %)
Erasure Coded Block Groups:
Total size: 0 B
Total files: 0
Total block groups (validated): 0
Minimally erasure-coded block groups: 0
Over-erasure-coded block groups: 0
Under-erasure-coded block groups: 0
Unsatisfactory placement block groups: 0
Average block group size: 0.0
Missing block groups: 0
Corrupt block groups: 0
Missing internal blocks: 0
FSCK ended at Wed Sep 04 14:01:43 CEST 2019 in 5 milliseconds
The filesystem under path '/' is HEALTHY
As I didn't find any solution on the Internet about it, here I am :).
I am trying to set up a multi-node cluster (Hadoop 1.0.4) and all the daemons are coming up. I have a 2-node cluster with a master and one slave, and I am configuring only the slave as a datanode.
I changed core-site.xml, mapred-site.xml and hdfs-site.xml, plus the masters file (with the master IP) and the slaves file (with the slave IP).
I configured passwordless SSH and copied the master's public key to the slave's authorized_keys.
I formatted the NameNode.
I can see all the daemons - NameNode, JobTracker and SecondaryNameNode - running on the master, and TaskTracker and DataNode on the slave machine.
But when I try to load data into HDFS using hadoop fs -put, I get the following error:
15/09/26 08:43:33 ERROR hdfs.DFSClient: Exception closing file /Hello : org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /Hello could only be replicated to 0 nodes, instead of 1 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558)
I ran an fsck command and got the message below:
FSCK started by hadoop from /172.31.18.149 for path / at Sat Sep 26 08:46:00 EDT 2015
Status: HEALTHY
Total size: 0 B
Total dirs: 5
Total files: 0 (Files currently being written: 1)
Total blocks (validated): 0
Minimally replicated blocks: 0
Over-replicated blocks: 0
Under-replicated blocks: 0
Mis-replicated blocks: 0
Default replication factor: 1
Average block replication: 0.0
Corrupt blocks: 0
Missing replicas: 0
Number of data-nodes: 0
Number of racks: 0
Somehow the DataNode is unavailable to the NameNode, but I couldn't figure out why.
Any help is appreciated. Thanks!
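Not part of the original question, but two quick checks that usually narrow this down (the log path assumes a default Hadoop 1.x layout):
# on the master: does the NameNode see any live DataNodes?
hadoop dfsadmin -report
# on the slave: look for connection or registration errors in the DataNode log
tail -n 100 $HADOOP_HOME/logs/hadoop-*-datanode-*.log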
I am facing an issue while trying to run sample Hadoop source code on an ARM processor. Every time I try to put files into HDFS, I get the error below.
13/10/07 11:31:29 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/root/bin/cpu-kmeans2D could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1267)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
at org.apache.hadoop.ipc.Client.call(Client.java:739)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
at com.sun.proxy.$Proxy0.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at com.sun.proxy.$Proxy0.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2904)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2786)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2076)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2262)
13/10/07 11:31:29 WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
13/10/07 11:31:29 WARN hdfs.DFSClient: Could not get block locations. Source file "/user/root/bin/cpu-kmeans2D" - Aborting...
put: java.io.IOException: File /user/root/bin/cpu-kmeans2D could only be replicated to 0 nodes, instead of 1
I tried resetting the NameNode and DataNode by deleting all the old logs on the master and slave nodes, as well as the folders under /app/hadoop/, after which I formatted the NameNode and started the processes again (bin/start-all.sh), but still no luck.
I tried generating the admin report (pasted below) after the restart; it seems the DataNode is not getting started.
root@tegra-ubuntu:~/hadoop-gpu-master/hadoop-gpu-0.20.1# bin/hadoop dfsadmin -report
Configured Capacity: 0 (0 KB)
Present Capacity: 0 (0 KB)
DFS Remaining: 0 (0 KB)
DFS Used: 0 (0 KB)
DFS Used%: �%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 0 (0 total, 0 dead)
I have tried all possible ways to debug this. I have tried the following methods:
1) I went to the Hadoop home directory and removed all the old logs (rm -rf logs/*)
2) Then I deleted the contents of the directory on all my slave and master nodes (rm -rf /app/hadoop/*)
3) I formatted the namenode (bin/hadoop namenode -format)
4) I started all the processes - first the namenode and datanode, then MapReduce. I typed jps on the terminal to make sure all the processes (NameNode, DataNode, JobTracker, TaskTracker) were up and running.
5) Having done this, I recreated the directories in the DFS.
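Not from the original post, but since the DataNode does not come up after start-all.sh, one hedged check worth sketching is the classic post-reformat failure (the log path assumes a default layout under $HADOOP_HOME):
# a namenode reformat often leaves an "Incompatible namespaceIDs" error in the
# DataNode log on that node, which makes the DataNode exit right after startup
grep -i "incompatible namespaceid" $HADOOP_HOME/logs/hadoop-*-datanode-*.log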
I'm trying to copy a large file (32 GB) into HDFS. I've never had any trouble copying files into HDFS, but those were all smaller. I'm using hadoop fs -put <myfile> <myhdfsfile>, and up to 13.7 GB everything goes well, but then I get this exception:
hadoop fs -put * /data/unprocessed/
Exception in thread "main" org.apache.hadoop.fs.FSError: java.io.IOException: Input/output error
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.read(RawLocalFileSystem.java:150)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.fs.FSInputChecker.readFully(FSInputChecker.java:384)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.readChunk(ChecksumFileSystem.java:217)
at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:237)
at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:189)
at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
at java.io.DataInputStream.read(DataInputStream.java:100)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:74)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:47)
at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:100)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:230)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:191)
at org.apache.hadoop.fs.FileSystem.copyFromLocalFile(FileSystem.java:1183)
at org.apache.hadoop.fs.FsShell.copyFromLocal(FsShell.java:130)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:1762)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:1895)
Caused by: java.io.IOException: Input/output error
at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(FileInputStream.java:242)
at org.apache.hadoop.fs.RawLocalFileSystem$TrackingFileInputStream.read(RawLocalFileSystem.java:91)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileInputStream.read(RawLocalFileSystem.java:144)
... 20 more
When I check the log files (on my NameNode and DataNodes) I see that the lease on the file is removed but there's no reason specified. According to the log files everything went well. Here are the last lines of my NameNode log:
2013-01-28 09:43:34,176 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /data/unprocessed/AMR_EXPORT.csv. blk_-4784588526865920213_1001
2013-01-28 09:44:16,459 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.1.6.114:50010 is added to blk_-4784588526865920213_1001 size 30466048
2013-01-28 09:44:16,466 INFO org.apache.hadoop.hdfs.StateChange: Removing lease on file /data/unprocessed/AMR_EXPORT.csv from client DFSClient_1738322483
2013-01-28 09:44:16,472 INFO org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: file /data/unprocessed/AMR_EXPORT.csv is closed by DFSClient_1738322483
2013-01-28 09:44:16,517 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 168 Total time for transactions(ms): 26Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0
Does anyone have a clue about this? I've checked core-default.xml and hdfs-default.xml for properties I could override to extend the lease, but couldn't find one.
Some suggestions:
If you have multiple files to copy, use multiple -put sessions.
If there is only one large file, compress it before copying, or split the large file into smaller ones and copy those (a rough sketch of the split approach follows below).
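A minimal illustration of the split-then-copy idea, using the file name from the NameNode log above (the 1 GB chunk size is arbitrary):
# split the 32 GB file into 1 GB chunks locally (use split -l instead if the
# chunks must stay aligned on line boundaries), then copy the chunks
split -b 1G AMR_EXPORT.csv AMR_EXPORT.part_
hadoop fs -put AMR_EXPORT.part_* /data/unprocessed/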
This sounds like an issue reading the local file rather than a problem with the HDFS client. The stack trace shows a problem reading the local file that has bubbled all the way up. The lease is dropped because the client dropped the connection due to the IOException while reading the file.
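A hedged way to confirm that diagnosis is to read the whole local file once outside Hadoop; if the Input/output error reappears around the same offset, the local disk or file is the culprit:
# stream the local file to /dev/null and watch for read errors near the ~13.7 GB mark
dd if=AMR_EXPORT.csv of=/dev/null bs=1M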
UPDATE
You need to put hdfs-site.xml into hbase/conf so HBase uses the correct target replica count; otherwise it uses the default of 3.
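A rough sketch of that step (the paths assume a Hadoop 1.x / HBase 0.94 tarball install; adjust to your layout), so the HBase client picks up dfs.replication=1:
# make the cluster's HDFS client settings visible to HBase, then restart HBase
cp $HADOOP_HOME/conf/hdfs-site.xml $HBASE_HOME/conf/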
That fixes the message, but my namenode always stays in safe mode on every process restart.
The fsck is all fine: no errors, nothing under-replicated, etc.
I see no log entries after:
2012-10-17 13:15:13,278 INFO org.apache.hadoop.hdfs.StateChange: STATE* Safe mode ON. The ratio of reported blocks 0.0000 has not reached the threshold 0.9990. Safe mode will be turned off automatically.
2012-10-17 13:15:14,228 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/127.0.0.1:50010
2012-10-17 13:15:14,238 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.processReport: from 127.0.0.1:50010, blocks: 20, processing time: 0 msecs
Any suggestions?
I have dfs.replication set to 1, and HBase is in distributed mode.
The first write goes through, but after I restart, the namenode always reports blocks as under-replicated.
Output from hadoop fsck /hbase:
/hbase/tb1/.tableinfo.0000000003: Under replicated blk_-6315989673141511716_1029. Target Replicas is 3 but found 1 replica(s).
.
/hbase/tb1/83859abf0f46016485814a5941b16de5/.oldlogs/hlog.1350414672838: Under replicated blk_-7364606700173135939_1027. Target Replicas is 3 but found 1 replica(s).
.
/hbase/tb1/83859abf0f46016485814a5941b16de5/.regioninfo: Under replicated blk_788178851601564156_1027. Target Replicas is 3 but found 1 replica(s).
Status: HEALTHY
Total size: 8731 B
Total dirs: 34
Total files: 25 (Files currently being written: 1)
Total blocks (validated): 25 (avg. block size 349 B) (Total open file blocks (not validated): 1)
Minimally replicated blocks: 25 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 25 (100.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 1
Average block replication: 1.0
Corrupt blocks: 0
Missing replicas: 50 (200.0 %)
Number of data-nodes: 1
Number of racks: 1
FSCK ended at Tue Oct 16 13:23:55 PDT 2012 in 0 milliseconds
Why does it say the target replica count is 3 when the default replication factor is clearly 1?
Can anyone please advise?
My versions are Hadoop 1.0.3 and HBase 0.94.1.
Thanks!
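Not from the original answers, but for the files that were already written with a target of 3, the stored replication factor can be lowered in place with setrep (path and value taken from the question above):
# recursively set replication to 1 for everything under /hbase and wait for it to apply
hadoop fs -setrep -R -w 1 /hbase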
To force HDFS to exit safe mode, type this:
hadoop dfsadmin -safemode leave
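If you would rather check the state first or wait for it to clear on its own, dfsadmin also supports:
hadoop dfsadmin -safemode get    # report the current safe mode state
hadoop dfsadmin -safemode wait   # block until safe mode is off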