Hive: select * from table fails for tables having more than 5000 rows via HiveServer2 - hadoop

I have a table sdh in Hive which has 100000 rows.
When I execute the command
select * from sdh
on the CLI, all rows are displayed, but the same command just hangs when I run it via HiveServer2 on Beeline.
All other tables, which have 1000-odd rows, work fine via both the CLI and Beeline.
Has anyone else faced a similar issue? I got the following error from the logs:
org.apache.thrift.TApplicationException: Internal error processing FetchResults
at org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
at org.apache.hive.service.cli.thrift.TCLIService$Client.recv_FetchResults(TCLIService.java:489)
at org.apache.hive.service.cli.thrift.TCLIService$Client.FetchResults(TCLIService.java:476)
at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:285)
at org.apache.hive.beeline.BufferedRows.<init>(BufferedRows.java:42)
at org.apache.hive.beeline.BeeLine.print(BeeLine.java:1541)
at org.apache.hive.beeline.Commands.execute(Commands.java:741)
at org.apache.hive.beeline.Commands.sql(Commands.java:657)
at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:763)
at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:630)
at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:363)
at org.apache.hive.beeline.BeeLine.main(BeeLine.java:346)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
Error: Error retrieving next row (state=,code=0)

Use the option --incremental=true with Beeline. Otherwise, Beeline tries to buffer the entire result set in memory before printing, and runs out of memory if the results are too large for the available memory.
More discussion on making this option the default is here: https://issues.apache.org/jira/browse/HIVE-7224

Use the option --incremental=true and place it before the -e option in Beeline; that is how it worked for me.
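For example, a minimal Beeline invocation with the flag placed before -e (the JDBC URL and query here are placeholders):
beeline -u jdbc:hive2://localhost:10000/default --incremental=true -e "select * from sdh;"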

Related

ORC Split Generation issue with Hive Table

I'm using Hive version 3.1.3 on Hadoop 3.3.4 with Tez 0.9.2. When I create an ORC table that contains splits and try to query it, I get an ORC split generation failed exception. If I concatenate the table, this solves the issue in some cases. In others, however, the issue persists.
First I create the table like so, then try to query it:
CREATE TABLE ClaimsOrc STORED AS ORC
AS
SELECT *
FROM ClaimsImport;
SELECT COUNT(*) FROM ClaimsOrc WHERE ClaimID LIKE '%8%';
I then get the following exception:
Vertex failed, vertexName=Map 1, vertexId=vertex_1667735849290_0008_6_00, diagnostics=[Vertex vertex_1667735849290_0008_6_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: claimsorc initializer failed, vertex=vertex_1667735849290_0008_6_00 [Map 1], java.lang.RuntimeException: ORC split generation failed with exception: java.lang.NoSuchMethodError: org.apache.hadoop.fs.FileStatus.compareTo(Lorg/apache/hadoop/fs/FileStatus;)I
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1851)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1939)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:519)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:765)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:243)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.util.concurrent.ExecutionException: java.lang.NoSuchMethodError: org.apache.hadoop.fs.FileStatus.compareTo(Lorg/apache/hadoop/fs/FileStatus;)I
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1790)
However, if I concatenate the table first, which combines the output files into fewer, larger files, the table works fine:
ALTER TABLE ClaimsOrc CONCATENATE;
OK
Time taken: 11.673 seconds
SELECT COUNT(*) FROM ClaimsOrc WHERE ClaimID LIKE '%8%';
OK
1463419
Time taken: 7.446 seconds, Fetched: 1 row(s)
It appears something is going wrong with how the initial CTAS query calculates the splits, and CONCATENATE fixes it in some cases. But in other cases it doesn't, and there's no workaround. How can I fix this?
A few other things worth noting:
Using DESCRIBE EXTENDED ClaimsOrc; shows that ClaimsOrc is an ORC table.
The source table ClaimsImport contains about 24 gzipped pipe-delimited files.
Before the CONCATENATE, the ClaimsOrc table contains about 24 files.
After the CONCATENATE, the ClaimsOrc table contains only 3 file splits.
Before the CONCATENATE command, the ORC files appear to be valid. Using the orcfiledump command (see the example below), I don't see any errors in the few I spot-checked.
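For reference, a sketch of the ORC dump invocation used for the spot checks (the warehouse path is illustrative):
hive --orcfiledump /user/hive/warehouse/claimsorc/000000_0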
Tez 0.9.2 ships a tez.tar.gz that needs to be placed in an HDFS location. That tez.tar.gz contains hadoop-common-2.7.2.jar by default, which does not have the FileStatus.compareTo method named in the exception.
Repackage the tarball with the latest Hadoop jars, or copy them from your own version (Hadoop 3.3.4); you may also have to repackage other jars such as Guava, Woodstox, the stax2 API, and more. Put the repackaged Tez tar.gz onto all nodes and into the HDFS location.
The error should then go away. As mentioned, you may run into other errors, which you can solve by adding additional Hadoop dependency jars.
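A rough sketch of that repackaging route (every path, the jar name, and the HDFS destination are illustrative and will differ per cluster):
# unpack the stock Tez tarball
mkdir tez-repack && cd tez-repack
tar -xzf /opt/tez-0.9.2/share/tez.tar.gz
# swap the bundled Hadoop 2.7.2 jars for the cluster's Hadoop 3.3.4 jars
rm lib/hadoop-common-2.7.2.jar
cp /opt/hadoop-3.3.4/share/hadoop/common/hadoop-common-3.3.4.jar lib/
# repackage and push to the HDFS path referenced by tez.lib.uris
tar -czf ../tez.tar.gz .
hdfs dfs -put -f ../tez.tar.gz /apps/tez/tez.tar.gz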
Otherwise, upgrade Tez to 0.10.x and validate its Hadoop version, which should be Hadoop 3.x; that would be the more direct solution.

All Hive functions fail

I honestly don't know what is going on. Everything fails except, basically, SHOW DATABASES.
Nothing on the page below works; everything gives me a NoViableAltException. This happens in both the Hive and Beeline CLIs.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions
In my opinion I shouldn't have to be connected to a DB to execute these commands, but they didn't work initially either way. I tried connecting to the default DB and got all the same failures; I created a new DB and got the same failures. Sources I've used for documentation:
http://www.folkstalk.com/2011/11/date-functions-in-hive.html
http://www.folkstalk.com/2011/11/difference-between-normal-tables-and.html
http://hortonworks.com/wp-content/uploads/2016/05/Hortonworks.CheatSheet.SQLtoHive.pdf
Add minutes to datetime in Hive
Nothing works; everything throws a NoViableAltException. I started Cloudera Manager and everything checks healthy except Impala. I'm using the CDH5 QuickStart Docker image. I tried both the Hive and Beeline CLIs. Any help would be appreciated.
EDIT: Here is an example from another Stack Overflow question that fails, though with a different exception.
0: jdbc:hive2://localhost:10000/default> from_unixtime(unix_timestamp());
Error: Error while compiling statement: FAILED: ParseException line 1:0 cannot recognize input near 'from_unixtime' '(' 'unix_timestamp' (state=42000,code=40000)
0: jdbc:hive2://localhost:10000/default>
This one is a different exception, but I can't get anything other than show databases/tables/etc. to work.
Here is another...
hive> DATEDIFF('2000-03-01', '2000-01-10');
NoViableAltException(26#[])
at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1024)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:201)
at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:423)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:311)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1194)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1289)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1120)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1108)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:218)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:170)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:381)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:773)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:691)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:626)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
FAILED: ParseException line 1:0 cannot recognize input near 'DATEDIFF' '(' ''2000-03-01''
Literally nothing I see on the net that should work works.
Just add SELECT. Hive only parses complete statements, so a bare function call isn't valid on its own; it has to appear inside a query.
SELECT from_unixtime(unix_timestamp());
SELECT DATEDIFF('2000-03-01', '2000-01-10');
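The same pattern applies to the other functions on the linked manual page; two more illustrative examples:
SELECT date_add('2000-01-01', 30);
SELECT unix_timestamp('2000-01-01 10:20:30');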

[Simba][ImpalaJDBCDriver](500051) ERROR processing query/statement

I'm getting the following error while executing queries against one database in Impala; with other databases it works fine.
The error trace is as follows.
[Simba][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: select * from test_table limit 1, SQL state: {1}, Query: {2}.[]
java.sql.SQLException: [Simba][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: [Simba][JSQLEngine](12010) The table "test_table" could not be found., SQL state: HY000, Query: select count(*) from test_table.
at com.cloudera.impala.hivecommon.dataengine.HiveJDBCDataEngine.prepare(Unknown Source)
at com.cloudera.impala.jdbc.common.SStatement.executeNoParams(Unknown Source)
at com.cloudera.impala.jdbc.common.SStatement.executeQuery(Unknown Source)
Caused by: com.cloudera.impala.support.exceptions.GeneralException: [Simba][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: [Simba][JSQLEngine](12010) The table "test_table" could not be found., SQL state: HY000, Query: select count(*) from test_table.
... 3 more
If I execute show tables, it lists the table name.
If I execute the query from Hue, it doesn't display anything in the result.
I tried invalidating the metadata.
I tried changing to the latest JDBC41 driver; same problem.
Where might the problem be?
In my case, this error was caused by not having a /user/scott directory on HDFS with write permissions for HiveServer (running as the cloudera-scm user); my JDBC connection uses scott as the user ID. Once I created the directory and chmod'ed it, I could run all queries. Earlier only select * worked, but select count(*) did not.
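A sketch of that fix with HDFS shell commands (the user name is from this particular setup, and the fully permissive mode is just the simplest way to give the HiveServer user write access; a tighter group-based setup is preferable in practice):
sudo -u hdfs hdfs dfs -mkdir -p /user/scott
sudo -u hdfs hdfs dfs -chown scott /user/scott
sudo -u hdfs hdfs dfs -chmod 777 /user/scott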
The problem was in the .avro file format. My team lead fixed it; I'm not sure what he did, he just said it was a problem with the file format.

WebHCAT Error in getting Hive Table Metadata. Command was terminated due to timeout(10000ms). See templeton.exec.timeout property","exitcode":143

If I issue this WebHCat REST call in my Cloudera 5.4.1 environment
curl -s 'http://mywebhcat:50111/templeton/v1/ddl/database/default/table/person?user.name=admin&format=extended'; echo; echo;
everything works fine and I see the metadata for the table Person.
But there is another table called foo_bar; if I change the REST call above to
curl -s 'http://mywebhcat:50111/templeton/v1/ddl/database/default/table/foo_bar?user.name=admin&format=extended'; echo; echo;
Then I get an error
{"statement":"use default; show table extended like
foo_bar;","error":"unable to show table:
foo_bar","exec":{"stdout":"","stderr":"which: no /opt/cloudera/parcels/
CDH-5.4.1-1.cdh5.4.1.p0.6/lib/hadoop/bin/hadoop in ((null))\ndirname: missing operand\nTry
`dirname --help' for more information.\nlog4j:ERROR setFile(null,true) call failed.\njava.io
.FileNotFoundException: /opt/cloudera/parcels/CDH-5.4.1-1.cdh5.4.1.p0.6/lib/hive/logs/hcat.
log (No such file or directory)\n\tat java.io.FileOutputStream.open0(Native Method)\n\tat
java.io.FileOutputStream.open(FileOutputStream.java:270)\n\tat java.io.FileOutputStream.<
init>(FileOutputStream.java:213)\n\tat java.io.FileOutputStream.<init>(FileOutputStream.
java:133)\n\tat org.apache.log4j.FileAppender.setFile(FileAppender.java:294)\n\tat org.
apache.log4j.FileAppender.activateOptions(FileAppender.java:165)\n\tat org.apache.log4j.
DailyRollingFileAppender.activateOptions(DailyRollingFileAppender.java:223)\n\tat org.apache
.log4j.config.PropertySetter.activate(PropertySetter.java:307)\n\tat org.apache.log4j.config
.PropertySetter.setProperties(PropertySetter.java:172)\n\tat org.apache.log4j.config.
PropertySetter.setProperties(PropertySetter.java:104)\n\tat org.apache.log4j.
PropertyConfigurator.parseAppender(PropertyConfigurator.java:842)\n\tat org.apache.log4j.
PropertyConfigurator.parseCategory(PropertyConfigurator.java:768)\n\tat org.apache.log4j.
PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:648)\n\tat org.apache.
log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:514)\n\tat org.apache.log4j
.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580)\n\tat org.apache.log4j.
PropertyConfigurator.configure(PropertyConfigurator.java:415)\n\tat org.apache.hadoop.hive.
common.LogUtils.initHiveLog4jDefault(LogUtils.java:127)\n\tat org.apache.hadoop.hive.common.
LogUtils.initHiveLog4jCommon(LogUtils.java:77)\n\tat org.apache.hadoop.hive.common.LogUtils.
initHiveLog4j(LogUtils.java:58)\n\tat org.apache.hive.hcatalog.cli.HCatCli.main(HCatCli.
java:65)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect
.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.
DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.
reflect.Method.invoke(Method.java:497)\n\tat org.apache.hadoop.util.RunJar.run(RunJar.
java:221)\n\tat org.apache.hadoop.util.RunJar.main(RunJar.java:136)\nlog4j:ERROR Either
File or DatePattern options are not set for appender [DRFA].\nOK\nTime taken: 1.035
seconds\n Command was terminated due to timeout(10000ms). See templeton.exec.timeout
property","exitcode":143}}
I don't know why it complains about a missing log directory only for the foo_bar table but successfully returns the metadata for Person.
BTW, I can go into the Hive console and run a select count(*) query on both Person and foo_bar.
EDIT:
Upon reading the error message again, it seems the core problem is
Command was terminated due to timeout(10000ms). See templeton.exec.timeout property","exitcode":143
But Cloudera Manager is not aware of this "templeton.exec.timeout" property. What do I do? I don't want to edit files manually, as there are many nodes in the cluster.
EDIT 2:
I went into each Hadoop node and ran
sudo vi /opt/cloudera/parcels/CDH-5.4.1-1.cdh5.4.1.p0.6/etc/hive-webhcat/conf.dist/webhcat-default.xml
I found the timeout value and increased it to 1000000. I did this on each node, then restarted Hive and the WebHCat server using Cloudera Manager, but I got exactly the same error message.
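For what it's worth, WebHCat normally reads site-specific overrides from webhcat-site.xml rather than the conf.dist defaults file, so an edit to webhcat-default.xml may simply be ignored. A sketch of the property as it would appear in webhcat-site.xml (value in milliseconds):
<property>
  <name>templeton.exec.timeout</name>
  <value>1000000</value>
</property>
If your Cloudera Manager version exposes an advanced configuration snippet (safety valve) for webhcat-site.xml, setting it there would avoid hand-editing every node.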

hive failed execution error return code 2 from org.apache.hadoop.hive.ql.exec.mapredtask

I have one query. It executes fine on the Hive CLI and returns the result, but when I execute it through Hive JDBC, I get the error below:
java.sql.SQLException: Query returned non-zero code: 9, cause: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
at org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:192)
What is the problem? Also, I am starting the Hive Thrift Server through a shell script (I have written a shell script containing the command that starts the Hive Thrift Server). Later I decided to start the Hive Thrift Server manually by typing the command:
hadoop@ubuntu:~/hive-0.7.1$ bin/hive --service hiveserver
Starting Hive Thrift Server
org.apache.thrift.transport.TTransportException: Could not create ServerSocket on address 0.0.0.0/0.0.0.0:10000.
at org.apache.thrift.transport.TServerSocket.<init>(TServerSocket.java:99)
at org.apache.thrift.transport.TServerSocket.<init>(TServerSocket.java:80)
at org.apache.thrift.transport.TServerSocket.<init>(TServerSocket.java:73)
at org.apache.hadoop.hive.service.HiveServer.main(HiveServer.java:384)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
hadoop@ubuntu:~/hive-0.7.1$
Please help me out with this.
Thanks
For this error:
java.sql.SQLException: Query returned non-zero code: 9, cause: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
at org.apache.hadoop.hive.jdbc.HiveStatement.executeQuer
Go to this link:
http://docs.amazonwebservices.com/ElasticMapReduce/latest/DeveloperGuide/UsingEMR_Hive.html
and add
hadoop-0.20-core.jar
hive/lib/hive-exec-0.7.1.jar
hive/lib/hive-jdbc-0.7.1.jar
hive/lib/hive-metastore-0.7.1.jar
hive/lib/hive-service-0.7.1.jar
hive/lib/libfb303.jar
lib/commons-logging-1.0.4.jar
slf4j-api-1.6.1.jar
slf4j-log4j12-1.6.1.jar
to the classpath of your project. Add these jars from the lib directories of Hadoop and Hive, then try the code. Also add the lib folder paths of Hadoop, Hive, and HBase (if you are using it) to the project classpath, the same way you added the jars (see the sketch below).
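For example, a launch along these lines (MyHiveJdbcClient and the environment variables are hypothetical placeholders for your own client class and install paths):
# MyHiveJdbcClient is a placeholder for your JDBC client class
java -cp "$HADOOP_HOME/hadoop-0.20-core.jar:$HIVE_HOME/lib/*:." MyHiveJdbcClient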
For the second error you got, type
netstat -nl | grep 10000
If it shows something, the Hive server is already running. The second error comes only when the port you are specifying is already taken by some other process; by default the server port is 10000, so verify with the netstat command above.
Note: if you are connected through bin/hive, exit from it before running your code, because I think (not sure) only one client can connect to the Hive server.
Doing the above steps will hopefully solve your problem.
NOTE: exit from the CLI when you are going to execute the code, and don't start the CLI while the code is executing.
It might be some issue with permissions; just try a query like "SELECT * FROM <table>", which won't start MapReduce jobs.
Try setting these properties before your queries:
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
set hive.auto.convert.join = false;
set hive.exec.max.dynamic.partitions=100000;
set hive.exec.max.dynamic.partitions.pernode=10000;

Resources