Apache Hive Not Returning YARN Application Results Correctly

Apache Hive Not Returning YARN Application Results Correctly - hadoop

I'm running a from-scratch cluster on AWS EC2. I have an external table (partitioned) defined with data on S3. I'm able to query this table and receive results to the console with a simple select * statement:
hive> set hive.execution.engine=tez;
hive> select * from external_table where partition_1='1' and partition_2='2';
<correct results returned>
Running a query that requires Tez doesn't return the results to the console:
hive> set hive.execution.engine=tez;
hive> select count(*) from external_table where partition_1='1' and partition_2='2';
Status: Running (Executing on YARN cluster with App id application_1572972524483_0012)
OK
+------+
| _c0 |
+------+
+------+
No rows selected (8.902 seconds)
However, if I dig in the logs and on the filesystem, I can find the results from that query:
(yarn.resourcemanager.log) org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1572972524483_0022 CONTAINERID=container_1572972524483_0022_01_000002 RESOURCE=<memory:1024, vCores:1> QUEUENAME=default
(container_folder/syslog_attempt) [TezChild] |exec.FileSinkOperator|: New Final Path: FS file:/tmp/<REALLY LONG FILE PATH>/000000_0
[root #] cat /tmp/<REALLY LONG FILE PATH>/000000_0
SEQ"org.apache.hadoop.io.BytesWritableorg.apache.hadoop.io.Textl▒ꩇ1som}▒▒j¹▒ 2060
2060 is the correct count for the partition.
Now, oddly enough, I'm able to get the results from the application if I insert overwrite directory on HDFS:
hive> set hive.execution.engine=tez;
hive> INSERT OVERWRITE DIRECTORY '/tmp/local_out' select count(*) from external_table where partition_1='1' and partition_2='2';
[root #] hdfs dfs -cat /tmp/local_out/000000_0
2060
However, attempting to insert overwrite local directory fails:
hive> set hive.execution.engine=tez;
hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/local_out' select count(*) from external_table where partition_1='1' and partition_2='2';
[root #] cat /tmp/local_out/000000_0
cat: /tmp/local_out/000000_0: No such file or directory
If I cat the container result file for this query, it's only the number, no class name or special characters:
[root #] cat /tmp/<REALLY LONG FILE PATH>/000000_0
2060
The only out-of-place log message I can find comes from the YARN ResourceManager log:
(yarn.resourcemanager.log) INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1572972524483_0023 CONTAINERID=container_1572972524483_0023_01_000004 RESOURCE=<memory:1024, vCores:1> QUEUENAME=default
(yarn.resourcemanager.log) WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root IP=NMIP OPERATION=AM Released Container TARGET=Scheduler RESULT=FAILURE DESCRIPTION=Trying to release container not owned by app or with invalid id. PERMISSIONS=Unauthorized access or invalid container APPID=application_1572972524483_0023 CONTAINERID=container_1572972524483_0023_01_000004
My guess based on the vaguest of impressions is there's a character encoding problem when the results are written to the local filesystem (hence the special characters in the container result file) but it's really just a guess and I have no idea how to verify/tackle that issue. Any help is greatly appreciated!

Someone on the Apache Hive mailing list suggested this was being caused by the YARN container writing its results files to the local machine where it was running instead of HDFS. I did some digging in the source code and found that:
mapreduce.framework.name=local
which is the default in Hadoop 3.2.1, was causing the problem.
Solved with:
set mapreduce.framework.name=yarn

how you inserted data into hive table? Hive gives count(*) result from metastore instead of running a count job to optimise performance. Try MSCK Repair on this table first to let hive know about new external files and modify hive metastore accordingly.

Related

HBase: The table test does not exist in meta but has a znode. run hbck to fix inconsistencies (which fails)

I recently added a table test while getting started on HBase.
I decided to reinstall HBase due to some issues.
After reinstalling and running the HBase shell I tried:
hbase(main):004:0> list
TABLE
0 row(s) in 0.0070 seconds
=> []
So there are no tables. Now I tried to add the table test
hbase(main):005:0> create 'test', 'testfamily'
ERROR: Table already exists: test!
I took a look into the log files and found the following entry
2018-06-21 07:53:30,646 WARN [ProcedureExecutor-2]
procedure.CreateTableProcedure: The table test does not exist in meta
but has a znode. run hbck to fix inconsistencies.
I ran it and got the following
$ hbase hbck test
Table hbase:meta is okay.
Number of regions: 1
Deployed on: my_IP,16201,1529567081041
0 inconsistencies detected.
Status: OK
I'm wondering if there's a way to remove the znode by hand?

I have also faced the same issue where it was showing the following error
The table does not exist in meta but has a znode. run hbck to fix inconsistencies.
The answer is obvious in the error only.
Inconsistency is caused as the table exist in your zookeeper quorum(distributed/pseudo distributed mode) or single zookeeper node(for standalone mode) but is not present in hbase .
So the solution will be to remove the table from zookeeper node.
To do so -
Open zookeeper client. bin/zkCli.sh
You can see all the tables which are picked by zookeeper by ls /hbase/table
Try to find the table name mentioned in the error and run rmr /hbase/table/<table_name>.This will remove that table from the state of zookeeper.
Try to create table again from Hbase.It will get created without any problem.

Where hive stores table locally?

I have created a hive table and trying to locate where hive have created an hdfs file for this table locally. The Hive version is 2.3.0.
I tried this command to retrieve the location of my table
hive> describe formatted table_name;
I got this as an output(only showing relevant output! tb2 is the table_name in this case)
Location: hdfs://localhost:54310/user/hive/warehouse/tb2
I have no clue how to redirect to hdfs://localhost:54310 locally(from terminal). Also the table is not present in hadoop default directory.

Try running the below command to view the hive table. In the output you will find a folder by your tablename
hadoop dfs -ls /user/hive/warehouse/tb2
A table in hive is basically a folder in hdfs.

Hive No files matching path file and file Exists

I'm having a lot of trouble getting hive to work. I'm running CDH4.5 with YARN, all installed from Cloudera's yum repo. I followed their instructions to set up hive but for some reason it does not recognize legitimate files on my local file system.
[msknapp#localhost data]$ pwd
/home/msknapp/data
[msknapp#localhost data]$ ll | grep county_insurance_pp.txt
-rw-rw-rw- 1 msknapp msknapp 162537 Jan 5 14:58 county_insurance_pp.txt
[msknapp#localhost data]$ sudo -u hive hive
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
Hive history file=/tmp/hive/hive_job_log_9e8bf55b-7ec8-4b79-be9b-cc2200a33f91_1795256456.txt
hive> describe count_insurance;
2014-01-08 02:42:59.000 GMT Thread[main,5,main] java.io.FileNotFoundException: derby.log (Permission denied)
----------------------------------------------------------------
2014-01-08 02:42:59.443 GMT:
Booting Derby version The Apache Software Foundation - Apache Derby - 10.4.2.0 - (689064): instance a816c00e-0143-6fbb-3f3a-000007a1d270
on database directory /var/lib/hive/metastore/metastore_db
Database Class Loader started - derby.database.classpath=''
OK
fips int
st string
stfips int
name string
a int
b int
c int
d int
e int
f int
total int
Time taken: 5.195 seconds
hive> LOAD DATA LOCAL INPATH 'county_insurance_pp.txt' OVERWRITE INTO TABLE count_insurance;
FAILED: SemanticException Line 1:23 Invalid path ''county_insurance_pp.txt'': No files matching path file:/home/msknapp/data/county_insurance_pp.txt
The file I'm trying to load does exist. I get the same exception when I use an absolute path in my load statement.
On a side note, I still don't know why it keeps giving me a FileNotFoundException for the derby log with a permission warning. A long time ago I went to /var/lib/hive and did 'sudo chmod -R 777 ./*', so permissions should not be a problem.
BTW I am running hadoop in pseudo-distributed mode, and have all three hive daemons running locally. I used hive-server2 not 1.
Somebody please let me know what I'm doing wrong here, or how to debug this.

This is Koji. I had a same problem recently.
The hive script run Hadoop server. If the file county_insurance_pp.txt does not exist on Hadoop server, it can not find the file.
You have to send your target file to Hadoop server before running the script. There are 2 ways to handle this:
use scp
use webhdfs (http://hadoop.apache.org/docs/r1.0.4/webhdfs.html)

Hive not fully honoring fs.default.name/fs.defaultFS value in core-site.xml

I have the NameNode service installed on a machine called hadoop.
The core-site.xml file has the fs.defaultFS (equivalent to fs.default.name) set to the following:
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop:8020</value>
</property>
I have a very simple table called test_table that currently exists in the Hive server on the HDFS. That is, it is stored under /user/hive/warehouse/test_table. It was created using a very simple command in Hive:
CREATE TABLE new_table (record_id INT);
If I attempt to load data into the table locally (that is, using LOAD DATA LOCAL), everything proceeds as expected. However, if the data is stored on the HDFS and I want to load from there, an issue occurs.
I run a very simple query to attempt this load:
hive> LOAD DATA INPATH '/user/haduser/test_table.csv' INTO TABLE test_table;
Doing so leads to the following error:
FAILED: SemanticException [Error 10028]: Line 1:17 Path is not legal ''/user/haduser/test_table.csv'':
Move from: hdfs://hadoop:8020/user/haduser/test_table.csv to: hdfs://localhost:8020/user/hive/warehouse/test_table is not valid.
Please check that values for params "default.fs.name" and "hive.metastore.warehouse.dir" do not conflict.
As the error states, it is attempting to move from hdfs://hadoop:8020/user/haduser/test_table.csv to hdfs://localhost:8020/user/hive/warehouse/test_table. The first path is correct because it references hadoop:8020; the second path is incorrect, because it references localhost:8020.
The core-site.xml file clearly states to use hdfs://hadoop:8020. The hive.metastore.warehouse value in hive-site.xml correctly points to /user/hive/warehouse. Thus, I doubt this error message has any true value.
How can I get the Hive server to use the correct NameNode address when creating tables?

I found that the Hive metastore tracks the location of each table. You can see the that location be running the following in the Hive console.
hive> DESCRIBE EXTENDED test_table;
Thus, this issue occurs if the NameNode in core-site.xml was changed while the metastore service was still running. Therefore, to resolve this issue the service should be restarted on that machine:
$ sudo service hive-metastore restart
Then, the metastore will use the new fs.defaultFS for newly created tables such.
Already Existing Tables
The location for tables that already exist can be corrected by running the following set of commands. These were obtained from Cloudera documentation to configure the Hive metastore to use High-Availability.
$ /usr/lib/hive/bin/metatool -listFSRoot
...
Listing FS Roots..
hdfs://localhost:8020/user/hive/warehouse
hdfs://localhost:8020/user/hive/warehouse/test.db
Correcting the NameNode location:
$ /usr/lib/hive/bin/metatool -updateLocation hdfs://hadoop:8020 hdfs://localhost:8020
Now the listed NameNode is correct.
$ /usr/lib/hive/bin/metatool -listFSRoot
...
Listing FS Roots..
hdfs://hadoop:8020/user/hive/warehouse
hdfs://hadoop:8020/user/hive/warehouse/test.db

hbase cannot find an existing table

I set up a hbase cluster to store data from opentsdb. Recently due to reboot of some of the nodes, hbase lost the table "tsdb". I can still it on hbase's master node page, but when I click on it, it gives me a tableNotFoundException
org.apache.hadoop.hbase.TableNotFoundException: tsdb
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:952)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:818)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:782)
at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:249)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:213)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:171)
......
I entered hbase shell, trying to locate 'tsdb' table, but got the similar message
hbase(main):018:0> scan 'tsdb'
ROW COLUMN+CELL
ERROR: Unknown table tsdb!
However when I tried to re-create this table, hbase shell told me the table already exist...
hbase(main):013:0> create 'tsdb', {NAME => 't', VERSIONS => 1, BLOOMFILTER=>'ROW'}
ERROR: Table already exists: tsdb!
And I can also list the table in hbase shell
hbase(main):001:0> list
TABLE
tsdb
tsdb-uid
2 row(s) in 0.6730 seconds
Taking a look at the log, I found this which should be the cause of my issue
2012-05-14 12:06:22,140 WARN org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table:
org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: tsdb, row=tsdb,,99999999999999
at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:157)
at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127)
It says cannot find row of tsbb in .META., but there are indeed tsdb rows in .META.
hbase(main):002:0> scan '.META.'
ROW COLUMN+CELL
tsdb,\x00\x00\x0FO\xA2\xF1\xD0\x00\x00\x01\x00\x00\x0E\x00\ column=info:regioninfo, timestamp=1336311752799, value={NAME => 'tsdb,\x00\x00\x0FO\xA2\xF1\xD0\x00\x00\x01\x00\x00\x0E\x00\x00\x02\x00\x00\x12\x00\x00\x03\x00\x00\x13\x00\x00\x
x00\x02\x00\x00\x12\x00\x00\x03\x00\x00\x13\x00\x00\x05\x00 05\x00\x001,1336311752340.7cd0d2205d9ae5fcadf843972ec74ec5.', STARTKEY => '\x00\x00\x0FO\xA2\xF1\xD0\x00\x00\x01\x00\x00\x0E\x00\x00\x02\x00\x00\x12\x00\x00\x03\x00\x00\x13\x00\
\x001,1336311752340.7cd0d2205d9ae5fcadf843972ec74ec5. x00\x05\x00\x001', ENDKEY => '\x00\x00\x10O\xA3\x8C\x80\x00\x00\x01\x00\x00\x0B\x00\x00\x02\x00\x00\x19\x00\x00\x03\x00\x00\x1A\x00\x00\x05\x00\x001', ENCODED => 7cd0d2205d9ae5f
cadf843972ec74ec5,}
tsdb,\x00\x00\x0FO\xA2\xF1\xD0\x00\x00\x01\x00\x00\x0E\x00\ column=info:server, timestamp=1337011527000, value=brycobapd01.usnycbt.amrs.bankofamerica.com:60020
x00\x02\x00\x00\x12\x00\x00\x03\x00\x00\x13\x00\x00\x05\x00
\x001,1336311752340.7cd0d2205d9ae5fcadf843972ec74ec5.
tsdb,\x00\x00\x0FO\xA2\xF1\xD0\x00\x00\x01\x00\x00\x0E\x00\ column=info:serverstartcode, timestamp=1337011527000, value=1337011518948
......
tsdb-uid,,1336081042372.a30d8074431c6a31c6a0a30e61fedefa. column=info:server, timestamp=1337011527458, value=bry200163111d.usnycbt.amrs.bankofamerica.com:60020
tsdb-uid,,1336081042372.a30d8074431c6a31c6a0a30e61fedefa. column=info:serverstartcode, timestamp=1337011527458, value=1337011519807
6 row(s) in 0.2950 seconds
Here is the result after I ran "hbck" on the cluster
ERROR: Region hdfs://slave-node-1:9000/hbase/tsdb/249438af5657bf1881a837c23997747e on HDFS, but not listed in META or deployed on any region server
ERROR: Region hdfs://slave-node-1:9000/hbase/tsdb/4f8c65fb72910870690b94848879db1c on HDFS, but not listed in META or deployed on any region server
ERROR: Region hdfs://slave-node-1:9000/hbase/tsdb/63276708b4ac9f11e241aca8b56e9def on HDFS, but not listed in META or deployed on any region server
ERROR: Region hdfs://slave-node-1:9000/hbase/tsdb/e54ee4def67d7f3b6dba75a3430e0544 on HDFS, but not listed in META or deployed on any region server
ERROR: (region tsdb,\x00\x00\x0FO\xA2\xF1\xD0\x00\x00\x01\x00\x00\x0E\x00\x00\x02\x00\x00\x12\x00\x00\x03\x00\x00\x13\x00\x00\x05\x00\x001,1336311752340.7cd0d2205d9ae5fcadf843972ec74ec5.) First region should start with an empty key. You need to create a new region and regioninfo in HDFS to plug the hole.
ERROR: Found inconsistency in table tsdb
Summary:
-ROOT- is okay.
Number of regions: 1
Deployed on: master-node,60020,1337011518948
.META. is okay.
Number of regions: 1
Deployed on: slave-node-2,60020,1337011519845
Table tsdb is inconsistent.
Number of regions: 5
Deployed on: slave-node-2,60020,1337011519845 slave-node-1,60020,1337011519807 master-node,60020,1337011518948
tsdb-uid is okay.
Number of regions: 1
Deployed on: slave-node-1,60020,1337011519807
5 inconsistencies detected.
Status: INCONSISTENT
I have run
bin/hbase hbck -fix
which unfortunately does not solve my problem
Could someone help me out on this that
Is it possible to recover this table "tsdb"?
If 1 cannot be done, is it a suggested way to gracefully remove 'tsdb', and create a new one?
I'd be greatly appreciated if anybody can let me know what is the most suggested way to reboot a node? Currently, I am leaving my master node always up. For other nodes, I run this command immediately after its reboot.
command:
# start data node
bin/hadoop-daemon.sh start datanode
bin/hadoop-daemon.sh start jobtracker
# start hbase
bin/hbase-daemon.sh start zookeeper
bin/hbase-daemon.sh start regionserver
Many Thanks!

A bit late, maybe it's helpful to the searcher.
Run the ZooKeeper shell hbase zkcli
In the shell run ls /hbase/table
Run rmr /hbase/table/TABLE_NAME
Restart Hbase

I am not very sure why you are unable to scan it. However, to recreate the table, you can try this:
1) Delete all entries in the .META table for this table manually, and
2) Delete the directory corresponding to this table from HDFS
Try creating the table again after that.

If you are using cdh4.3 then the path in zookeeper should be /hbase/table94/

To expand on #Devin Bayer's answer, run:
delete /hbase/table/<name_of_zombie_table>
if you find any zombie tables being maintained by the zookeeper. For more help on this issue, you should google 'HBase zombie tables'.

try to fix meta
hbase hbck
hbase hbck -fixMeta
hbase hbck -fixAssignments
hbase hbck -fixReferenceFiles
after and try again

More instructions on deleting the tables:
~/hbase-0.94.12/bin/hbase shell
> truncate 'tsdb'
> truncate 'tsdb-meta'
> truncate 'tsdb-uid'
> truncate 'tsdb-tree'
> exit
I also had to restart the tsd daemon.

I get a similar error message when I try an HBase connection from a Java client on a machine that doesn't have the TCP privilege to access the HBase machines.
The table indeed exists when I do hbase shell on the HBase machine itself.
Does opentsdb has all the privileges/port config to access the HBase machine ?

I do face these issues at my workplace. I usually either delete the znodes and them remove the corresponding table or restart hbase both HMaster and Hregionserver to get hbck status OK.

It is enough to remove the specified table from your zookeeper path.
For example if zookeeper.znode.parent is configured to blob in hbase-site.xml you should start zkCli.sh in your zookeeper server shell and remove that directory by rmr /blob/table/tsdb command.

hbase-clean.sh --cleanZk
It works well, simple enough.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio