HBase Bulk Loading Error. What Wrong? - hadoop

I bulk loaded data into an HBase table with the command below, and it succeeded.
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.bulk.output=/tmp/example_output -Dimporttsv.columns=HBASE_ROW_KEY,cf1:val1,cf1:val2,cf1:val3 so_table /user/uclab/smallbusiness/bulk3/
After that job finished, I ran the following.
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/example_output so_table
But errors like the ones below kept occurring repeatedly.
2015-10-12 01:52:42.835 DEBUG [LoadIncrementalHFiles-0]
mapreduce.LoadIncrementalHFiles: Goint to connect to server
regiont=so_table,,1444580736986.3c5aa99d4ca4dcb509c8cfb26c2b223f.,
hostname=datanode83,60020,1444578166533, seqNum=2 for row with hfile
group[{[B#5d37ce06,hdfs://namenode.uclab.com:8020/tmp/example_output/cf1/541f346
80be24932afa54c3fa14e4ad4}]
and
Caused by: org.apache.hadoop.ipc.RemoteException
(org.apache.hadoop.security.AccessControlException):
Permission denied: user=hbase, access=WRITE,
inode="/tmp/example_output/cf1":uclab:hdfs:drwxr-xr-x
How can I give write permission, and how can I solve this problem?

I faced a similar problem on the Cloudera Quickstart VM.
Change the owner to “hbase”, or HBase won’t have permission to move the files. Run the following command:
sudo -u hdfs hdfs dfs -chown -R hbase:hbase /tmp/example_output
Now run
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/example_output so_table
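To see why the chown is needed, it can help to read the AccessControlException field by field. The sketch below is only an illustration: parse_permission_denied is a hypothetical helper, not part of HBase or Hadoop, and it assumes the message has the same shape as the one in the question.

```shell
# Hypothetical helper (not an HBase/Hadoop tool): split an HDFS
# "Permission denied" message into its fields with sed.
# Assumes the message ends with the inode's owner:group:mode triple.
parse_permission_denied() (
  msg=$1
  user=$(printf '%s\n' "$msg" | sed -n 's/.*user=\([^,]*\),.*/\1/p')
  access=$(printf '%s\n' "$msg" | sed -n 's/.*access=\([^,]*\),.*/\1/p')
  path=$(printf '%s\n' "$msg" | sed -n 's/.*inode="\([^"]*\)".*/\1/p')
  owner=$(printf '%s\n' "$msg" | sed -n 's/.*inode="[^"]*":\([^:]*\):.*/\1/p')
  group=$(printf '%s\n' "$msg" | sed -n 's/.*inode="[^"]*":[^:]*:\([^:]*\):.*/\1/p')
  mode=$(printf '%s\n' "$msg" | sed -n 's/.*:\([^:]*\)$/\1/p')
  printf 'user=%s access=%s path=%s owner=%s group=%s mode=%s\n' \
    "$user" "$access" "$path" "$owner" "$group" "$mode"
)

# Usage, with the message from the question:
#   parse_permission_denied 'Permission denied: user=hbase, access=WRITE, inode="/tmp/example_output/cf1":uclab:hdfs:drwxr-xr-x'
```

Here the requesting user is hbase, but the directory belongs to uclab and its mode gives write access to the owner only, which is why changing the owner to hbase lets the load proceed.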

Related

java.io.EOFException: Premature EOF: no length prefix available in Spark on Hadoop

I'm getting this weird exception. I'm using Spark 1.6.0 on Hadoop 2.6.4 and submitting a Spark job to a YARN cluster.
16/07/23 20:05:21 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-532134798-128.110.152.143-1469321545728:blk_1073741865_1041
java.io.EOFException: Premature EOF: no length prefix available
at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2203)
at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:176)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:867)
16/07/23 20:49:09 ERROR server.TransportRequestHandler: Error sending result RpcResponse{requestId=4719626006875125240, body=NioManagedBuffer{buf=java.nio.HeapByteBuffer[pos=0 lim=81 cap=81]}} to ms0440.utah.cloudlab.us/128.110.152.175:58944; closing connection
java.nio.channels.ClosedChannelException
I was getting this error when running on Hadoop 2.6.0 and thought the exception might be a bug like this, but even after switching to Hadoop 2.6.4 I'm getting the same error. There is no memory problem; my cluster has no HDFS or memory issues. I went through this and this but no luck.
Note: 1. I'm using Apache Hadoop and Spark, not CDH/HDP. 2. I'm able to copy data into HDFS and can even run another job on this cluster.
Check the file permissions of the dfs directory:
find /path/to/dfs -group root
In general, the owning user and group should be hdfs.
Since I had started the HDFS service as the root user, some dfs block files were created with root ownership.
I solved the problem by changing them to the right ownership:
sudo chown -R hdfs:hdfs /path/to/dfs
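The find command above only catches files whose group is root. A slightly broader check, sketched here under the assumption that everything under the dfs directory should belong to a single user, lists anything not owned by that user. The function name list_foreign_files is made up for this example.

```shell
# Hypothetical helper: list everything under a local directory that is
# NOT owned by the expected user, so stray root-owned block files stand out.
list_foreign_files() {
  # $1 = directory to scan, $2 = expected owner (user name or numeric uid)
  find "$1" ! -user "$2" -print
}

# Usage (the path is the placeholder from the answer above):
#   list_foreign_files /path/to/dfs hdfs
```

An empty result means every file already has the expected owner.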

hive hadoop permissions not correct

I installed Apache Kylin, which requires Hadoop, Hive, HBase and Java to work. Everything is installed correctly. Now when I try to run this example, I get an error after the first command, i.e. ${KYLIN_HOME}/bin/sample.sh
Below is the error I am getting:
Loading data to table default.kylin_sales
Failed with exception Unable to move source file:/usr/lib/kylin/sample_cube/data/DEFAULT.KYLIN_SALES.csv to destination hdfs://localhost:54310/user/hive/warehouse/kylin_sales/DEFAULT.KYLIN_SALES.csv
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
I have set 777 permissions on both of the above paths and I am operating as root.
Check the HDFS directory permission. If the group does not have write permission, add it as below:
hdfs dfs -chmod g+w /user/hive/warehouse
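One way to check whether the chmod is needed is to look at the mode column that hdfs dfs -ls -d /user/hive/warehouse prints. The sketch below only inspects such a mode string; has_group_write is a hypothetical helper, and it assumes the usual ten-character form such as drwxr-xr-x.

```shell
# Hypothetical helper: succeed if a mode string (e.g. drwxrwxr-x)
# has the group write bit set. The group write bit is the 6th character.
has_group_write() {
  case "$1" in
    ?????w*) return 0 ;;
    *)       return 1 ;;
  esac
}

# Usage:
#   has_group_write drwxr-xr-x || echo "group cannot write, run the chmod above"
```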

Not able to create new table in hive from Spark-shell

I am using a single-node setup on Red Hat and have installed Hadoop, Hive, Pig and Spark. I configured the Hive metastore in Derby. I created a new folder for Hive tables and gave it full privileges (chmod 777). Then I created one table from the Hive CLI, and I am able to select its data in spark-shell and print the values to the console. But from spark-shell/Spark SQL I am not able to create new tables. It throws this error:
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:file:/2016/hive/test2 is not a directory or unable to create one)
I checked the permissions and the user (I use the same user for the installation and for Hive, Hadoop, Spark, etc.).
Is there anything else that needs to be done to fully integrate Spark and Hive?
Thanks
Check that the permissions in HDFS are correct (not just on the local filesystem):
hadoop fs -chmod -R 755 /user
If the error message persists afterwards please update the question.

Permission denied issue in mapreduce?

I ran the command below.
hadoop jar /home/cloudera/workspace/para.jar word.Paras examples/wordcount /home/cloudera/Desktop/words/output
The MapReduce job starts, and then it shows the error below. Can anyone please help with this issue?
15/11/04 10:33:57 INFO mapred.JobClient: Task Id : attempt_201511040935_0008_m_000002_0, Status : FAILED
org.apache.hadoop.security.AccessControlException: Permission denied: user=cloudera, access=WRITE, inode="/":hdfs:supergroup:drwxr-xr-x
Do I need to change anything in a config file or in Cloudera Manager?
The exception suggests that you are trying to write to the HDFS root directory "/", which you (user: cloudera) do not have permission to do.
Without knowing what your specific jar does:
I guess that the last argument ("/home/cloudera/Desktop/words/output") is where you wish to place the output.
I guess this is supposed to be within HDFS where /home does not exist.
Try to change this to somewhere where you can write, possibly "/user/cloudera/words/output"
There is a set of default directories that should be created before you start using the Hadoop cluster.
Run the following; it should show you the directories:
$ hadoop fs -ls /
For example, if you want to run as cloudera, you need these on HDFS:
/user/cloudera -- the user running the program
/user/hadoop -- your hadoop file system user
/user/mapred -- your mapred user
/tmp -- temporary directory, which needs permission 1777 (hdfs dfs -chmod 1777 /tmp)
HTH.
The last argument that you pass should be an output path in HDFS, not a path on the local file system.
As you are running as the cloudera user, you can point to /user/cloudera/words/output. But first check that /user/cloudera exists in HDFS and that you have write permission there, by issuing the following:
hadoop fs -ls /user/
Once you have it, change your command to the following:
hadoop jar /home/cloudera/workspace/para.jar word.Paras examples/wordcount <path_where_you_have_write_permission_in_HDFS>
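The difference between the two arguments in the original command comes down to how HDFS treats relative and absolute paths. As a rough sketch, assuming the default home directory prefix /user, a relative path is resolved under /user/<username>, while an absolute path is used as-is. resolve_hdfs_path is a hypothetical name used only for this illustration.

```shell
# Hypothetical illustration of HDFS path resolution, assuming the default
# home directory prefix /user. Not a Hadoop command.
resolve_hdfs_path() {
  # $1 = HDFS user name, $2 = path as given on the command line
  case "$2" in
    /*) printf '%s\n' "$2" ;;                 # absolute: used unchanged
    *)  printf '/user/%s/%s\n' "$1" "$2" ;;   # relative: under the home dir
  esac
}

# Usage:
#   resolve_hdfs_path cloudera examples/wordcount
#   resolve_hdfs_path cloudera /home/cloudera/Desktop/words/output
```

The input path examples/wordcount lands under /user/cloudera, but the absolute output path would require creating /home directly under "/", which matches the inode="/" in the exception.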

FAILED: Error in metadata: MetaException(message:Got exception: java.net.ConnectException Call to localhost/127.0.0.1:54310 failed

I am using Ubuntu 12.04, hadoop-0.23.5, hive-0.9.0.
I pointed my metastore_db to a separate location, $HIVE_HOME/my_db/metastore_db, in hive-site.xml.
Hadoop runs fine; jps lists ResourceManager, NameNode, DataNode, NodeManager, SecondaryNameNode.
Hive starts perfectly, metastore_db and derby.log are created, and all Hive commands run successfully: I can create databases, tables, etc. But a few days later, when I run show databases or show tables, I get the error below:
FAILED: Error in metadata: MetaException(message:Got exception: java.net.ConnectException Call to localhost/127.0.0.1:54310 failed on connection exception: java.net.ConnectException: Connection refused) FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
I had this problem too and the accepted answer did not help me, so I will add my solution here for others:
My problem was that I had a single machine with a pseudo-distributed setup and Hive installed. It was working fine with localhost as the host name. However, when we decided to add multiple machines to the cluster, we also decided to give the machines proper names ("machine01", "machine02", etc.).
I changed all the Hadoop conf/*-site.xml files and the hive-site.xml file too, but still had the error. After exhaustive research I realized that Hive was picking up the URIs not from the *-site files but from the metastore tables in MySQL. The Hive table metadata is saved in two tables, SDS and DBS. Upon changing the DB_LOCATION_URI column in DBS and the LOCATION column in SDS to point to the new namenode URI, I was back in business.
Hope this helps others.
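The change applied to each stored DB_LOCATION_URI and LOCATION value is a prefix swap: the old namenode URI is replaced by the new one and the rest of the path is kept. The sketch below shows only that string transformation, not the database update itself. rewrite_namenode_uri is a hypothetical helper, and the host and port values in the usage line are assumptions for illustration.

```shell
# Hypothetical helper: replace an old namenode URI prefix with a new one,
# leaving values that do not start with the old prefix untouched.
rewrite_namenode_uri() {
  # $1 = old prefix, $2 = new prefix, $3 = stored URI
  case "$3" in
    "$1"*) printf '%s%s\n' "$2" "${3#"$1"}" ;;
    *)     printf '%s\n' "$3" ;;
  esac
}

# Usage (example values, not taken from a real metastore):
#   rewrite_namenode_uri hdfs://localhost:54310 hdfs://machine01:54310 \
#     hdfs://localhost:54310/user/hive/warehouse
```

Back up the metastore database before editing these tables, since every table and database location depends on them.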
Possible reasons for this:
If you changed your Hadoop/Hive version, you may be specifying the previous Hadoop version (which has fs.default.name=hdfs://localhost:54310 in core-site.xml) in your hive-0.9.0/conf/hive-env.sh file.
$HADOOP_HOME may point to some other location.
The specified version of Hadoop is not working.
Your namenode may be in safe mode; run bin/hdfs dfsadmin -safemode leave or bin/hadoop dfsadmin -safemode leave.
In case of a fresh installation,
the above problem can be the effect of a namenode issue.
Try formatting the namenode (this erases existing HDFS metadata, so only do it on a fresh installation) using the command:
hadoop namenode -format
1. Take your namenode out of safe mode. Try the command below:
hadoop dfsadmin -safemode leave
2. Restart your Hadoop daemons:
sudo service hadoop-master stop
sudo service hadoop-master start
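Before or after restarting, it may help to confirm exactly which host and port the client is being refused on, since "Connection refused" usually means nothing is listening there. The sketch below pulls that endpoint out of the error text. extract_refused_endpoint is a hypothetical helper and assumes the "Call to host/ip:port failed" wording from the question.

```shell
# Hypothetical helper: extract "host port" from a Hadoop ConnectException
# message of the form "Call to <host>/<ip>:<port> failed ...".
extract_refused_endpoint() {
  printf '%s\n' "$1" |
    sed -n 's/.*Call to \([^/]*\)\/[0-9.]*:\([0-9]*\) failed.*/\1 \2/p'
}

# Usage:
#   extract_refused_endpoint 'java.net.ConnectException Call to localhost/127.0.0.1:54310 failed on connection exception'
# The resulting host and port can then be compared with the namenode address
# configured in core-site.xml and with the processes that jps reports.
```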

Resources