I am trying to insert data into an external Hive table in Hive 1.2 from another table using an INSERT command:
INSERT INTO perf_tech_security_detail_extn_fltr partition
(created_date)
SELECT seq_num,
action,
sde_timestamp,
instrmnt_id,
dm_lstupddt,
grnfthr_ind,
grnfthr_tl_dt,
grnfthr_frm_dt,
ftc_chge_rsn,
Substring (sde_timestamp, 0, 10)
FROM tech_security_detail_extn_fltr
WHERE Substring (sde_timestamp, 0, 10) = '2018-05-02';
But the Hive shell hangs on:
hive> SET hive.exec.dynamic.partition=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> set hive.enforce.bucketing=true;
hive> INSERT INTO PERF_TECH_SECURITY_DETAIL_EXTN_FLTR partition (created_date) select seq_num, action, sde_timestamp, instrmnt_id, dm_lstupddt, grnfthr_ind, grnfthr_tl_dt, grnfthr_frm_dt, ftc_chge_rsn, substring (sde_timestamp,0,10) from TECH_SECURITY_DETAIL_EXTN_FLTR where substring (sde_timestamp,0,10)='2018-05-02';
Query ID = tcs_20180503215950_585152fd-ecdc-4296-85fc-d464fef44e68
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 100
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
The Hive logs are as below:
2018-05-03 21:28:01,703 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) -
2018-05-03 21:28:01,716 ERROR [main]: mr.ExecDriver (ExecDriver.java:execute(400)) - yarn
2018-05-03 21:28:01,758 INFO [main]: client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at /0.0.0.0:8032
2018-05-03 21:28:01,903 INFO [main]: fs.FSStatsPublisher (FSStatsPublisher.java:init(49)) - created : hdfs://localhost:9000/datanode/nifi_data/perf_tech_security_detail_extn_fltr/.hive-staging_hive_2018-05-03_21-27-59_433_5606951945441160381-1/-ext-10001
2018-05-03 21:28:01,960 INFO [main]: client.RMProxy (RMProxy.java:createRMProxy(98)) - Connecting to ResourceManager at /0.0.0.0:8032
2018-05-03 21:28:01,965 INFO [main]: exec.Utilities (Utilities.java:getBaseWork(389)) - PLAN PATH = hdfs://localhost:9000/tmp/hive/tcs/576b0aa3-059d-4fb2-bed8-c975781a5fce/hive_2018-05-03_21-27-59_433_5606951945441160381-1/-mr-10003/303a392c-2383-41ed-bc9d-78d37ae49f39/map.xml
2018-05-03 21:28:01,967 INFO [main]: exec.Utilities (Utilities.java:getBaseWork(389)) - PLAN PATH = hdfs://localhost:9000/tmp/hive/tcs/576b0aa3-059d-4fb2-bed8-c975781a5fce/hive_2018-05-03_21-27-59_433_5606951945441160381-1/-mr-10003/303a392c-2383-41ed-bc9d-78d37ae49f39/reduce.xml
2018-05-03 21:28:22,009 INFO [main]: ipc.Client (Client.java:handleConnectionTimeout(832)) - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); maxRetries=45
2018-05-03 21:28:42,027 INFO [main]: ipc.Client (Client.java:handleConnectionTimeout(832)) - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); maxRetries=45
..........................................................
I have also tried to insert data normally into an unpartitioned table, but even that is not working:
INSERT INTO emp values (1 ,'ROB')
I am not sure why you have not written TABLE before the table name, like this:
INSERT INTO TABLE emp
VALUES (1, 'ROB'), (2, 'Shailesh');
Write the commands properly and they should work.
Resolved
MapReduce was not running due to a wrong framework name, so I edited the property mapreduce.framework.name in mapred-site.xml.
In a cluster environment, the property yarn.resourcemanager.hostname is the key to avoiding this problem. It worked great for me.
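For reference, a minimal sketch of what those two settings look like (the property names are standard Hadoop configuration keys; the ResourceManager hostname below is a placeholder, not a value taken from this question):

In mapred-site.xml:
<property>
    <!-- run MapReduce jobs on YARN rather than the local runner -->
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>

In yarn-site.xml:
<property>
    <!-- placeholder hostname: point clients at the real ResourceManager -->
    <name>yarn.resourcemanager.hostname</name>
    <value>your-resourcemanager-host</value>
</property>

With yarn.resourcemanager.hostname unset, clients fall back to 0.0.0.0:8032, which matches the connection retry loop in the logs above.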
Use these commands to monitor YARN:
yarn application -list
yarn node -list
Related
So I was trying to import a table from Oracle to Hive using Sqoop. Here is my command:
sqoop-import --hive-import --connect jdbc:oracle:thin:@10.35.10.180:1521:dms
--table DEFECT
--hive-database inspex
--username INSPEX
--password inspex
Yarn seems to split the job into 4 parts.
17/02/17 15:15:17 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.9.1
17/02/17 15:15:17 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
17/02/17 15:15:17 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
17/02/17 15:15:17 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
17/02/17 15:15:18 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
17/02/17 15:15:18 INFO manager.SqlManager: Using default fetchSize of 1000
17/02/17 15:15:18 INFO tool.CodeGenTool: Beginning code generation
17/02/17 15:15:18 INFO manager.OracleManager: Time zone has been set to GMT
17/02/17 15:15:18 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "DEFECT" t WHERE 1=0
17/02/17 15:15:18 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
Note: /tmp/sqoop-root/compile/72422bf2a67c745893ae440ad77e3049/DEFECT.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/02/17 15:15:20 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/72422bf2a67c745893ae440ad77e3049/DEFECT.jar
17/02/17 15:15:20 INFO manager.OracleManager: Time zone has been set to GMT
17/02/17 15:15:20 INFO manager.OracleManager: Time zone has been set to GMT
17/02/17 15:15:20 INFO mapreduce.ImportJobBase: Beginning import of DEFECT
17/02/17 15:15:20 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
17/02/17 15:15:20 INFO manager.OracleManager: Time zone has been set to GMT
17/02/17 15:15:21 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
17/02/17 15:15:21 INFO client.RMProxy: Connecting to ResourceManager at vn1.localdomain/10.35.10.17:8032
17/02/17 15:15:23 INFO db.DBInputFormat: Using read commited transaction isolation
17/02/17 15:15:23 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN("DEFECT_INDEX"), MAX("DEFECT_INDEX") FROM "DEFECT"
17/02/17 15:15:45 INFO mapreduce.JobSubmitter: number of splits:4
17/02/17 15:15:45 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1487360998088_0003
17/02/17 15:15:45 INFO impl.YarnClientImpl: Submitted application application_1487360998088_0003
17/02/17 15:15:45 INFO mapreduce.Job: The url to track the job: http://vn1.localdomain:8088/proxy/application_1487360998088_0003/
17/02/17 15:15:45 INFO mapreduce.Job: Running job: job_1487360998088_0003
17/02/17 15:15:51 INFO mapreduce.Job: Job job_1487360998088_0003 running in uber mode : false
17/02/17 15:15:51 INFO mapreduce.Job: map 0% reduce 0%
17/02/17 15:15:57 INFO mapreduce.Job: map 50% reduce 0%
17/02/17 15:16:35 INFO mapreduce.Job: map 75% reduce 0%
Then it got stuck at 75% and ran forever. I notice that 3 out of 4 map tasks finished pretty quickly, except for one. That task doesn't seem to be making any progress and just stays at 0%. I checked the syslog:
2017-02-17 15:15:53,795 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2017-02-17 15:15:53,851 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2017-02-17 15:15:53,851 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
2017-02-17 15:15:53,859 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
2017-02-17 15:15:53,859 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1487360998088_0003, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@41480ec1)
2017-02-17 15:15:53,948 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
2017-02-17 15:15:54,426 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /yarn/nm/usercache/root/appcache/application_1487360998088_0003
2017-02-17 15:15:55,409 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
2017-02-17 15:15:55,813 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
2017-02-17 15:15:55,822 INFO [main] org.apache.hadoop.mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2017-02-17 15:15:56,278 INFO [main] org.apache.sqoop.mapreduce.db.DBInputFormat: Using read commited transaction isolation
2017-02-17 15:15:56,411 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: "DEFECT_INDEX" >= 1 AND "DEFECT_INDEX" < 545225318
2017-02-17 15:15:56,491 INFO [main] org.apache.sqoop.mapreduce.db.OracleDBRecordReader: Time zone has been set to GMT
2017-02-17 15:15:56,564 INFO [main] org.apache.sqoop.mapreduce.db.DBRecordReader: Working on split: "DEFECT_INDEX" >= 1 AND "DEFECT_INDEX" < 545225318
2017-02-17 15:15:56,610 INFO [main] org.apache.sqoop.mapreduce.db.DBRecordReader: Executing query: SELECT "DEFECT_INDEX", "LAYER_SCAN_INDEX", "DEFECT_SITE_ID", "CLUSTER_ID", "CARRY_OVER", "VERIFIED_ADDER", "REPEATING_DEFECT", "TEST_NUMBER", "X_DIE_COORDINATE", "Y_DIE_COORDINATE", "X_COORDINATE", "Y_COORDINATE", "X_DEFECT_SIZE", "Y_DEFECT_SIZE", "D_SIZE", "INSPECT_INTENSITY", "PATTERN_ID", "SURFACE", "ANGLE", "HOT_SPOT", "ASPECT_RATIO", "GRAY_SCALE", "MACROSIGID", "REGIONID", "SEMREVSAMPLE", "POLARITY", "DBGROUP", "CAREAREAGROUPCODE", "VENNID", "SEGMENTID", "MDAT_OFFSET", "DESIGNFILEFLOORPLANID", "DBSCRITICALITYINDEX", "CELLSIZE", "PCI", "LINECOMPLEXITY", "DCIRANGE", "GDSX", "GDSY" FROM "DEFECT" WHERE ( "DEFECT_INDEX" >= 1 ) AND ( "DEFECT_INDEX" < 545225318 )
The log ends at the start of the query execution. I waited for about 10 hours and there was still no update, which shouldn't be right because the table isn't really that big.
I didn't find any error in the log. Before this I had successfully imported several tables from Oracle to Hive, so I think my configuration is fine.
I have tried setting the mapper count anywhere from 1 to 100 and it still didn't work. I also notice that the task whose PK range starts from 1 always shows 0% progress while the others work just fine.
I'm looking for any suggestions or help. Thanks.
By default, Sqoop splits your job across 4 mappers based on the PK of your table; however, depending on the distribution of your data, this can be very inefficient. You didn't mention the size of your cluster or your data, but I would suggest looking at how your data is distributed, setting a greater number of mappers (the -m option), and using a different column to split the load (the --split-by option). Your job is currently using the DEFECT_INDEX column to split the work.
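For illustration, a sketch of the same import with those two options added. LAYER_SCAN_INDEX is simply another column visible in the generated query in the log above; whether it distributes the load evenly is an assumption you would need to verify against your data, and 8 mappers is an arbitrary example:

sqoop-import --hive-import \
    --connect jdbc:oracle:thin:@10.35.10.180:1521:dms \
    --table DEFECT \
    --hive-database inspex \
    --username INSPEX \
    --password inspex \
    --split-by LAYER_SCAN_INDEX \
    -m 8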
I have started the metastore and HiveServer2:
#./hive --service metastore
#./hive --service hiveserver2
When I execute the below query:
#./beeline -u jdbc:hive2://192.168.0.10:10000 -e 'select count(*) from test_tb' --hiveconf hive.root.logger=DEBUG,console --verbose=true
It throws the below error:
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=1)
java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:275)
at org.apache.hive.beeline.Commands.execute(Commands.java:736)
at org.apache.hive.beeline.Commands.sql(Commands.java:657)
at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:804)
at org.apache.hive.beeline.BeeLine.initArgs(BeeLine.java:608)
at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:630)
at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:368)
at org.apache.hive.beeline.BeeLine.main(BeeLine.java:351)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Beeline version 0.13.1 by Apache Hive
hiveserver2 log below
16/06/14 10:57:32 [main]: WARN common.LogUtils: DEPRECATED: Ignoring hive-default.xml found on the CLASSPATH at /data/offline/apache-hive-0.13.1-bin/conf/hive-default.xml
16/06/14 10:57:32 [main]: INFO metastore.HiveMetaStore: Starting hive metastore on port 9083
16/06/14 10:57:32 [main]: INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
16/06/14 10:57:32 [main]: INFO metastore.ObjectStore: ObjectStore, initialize called
16/06/14 10:57:33 [main]: INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
16/06/14 10:57:33 [main]: INFO metastore.ObjectStore: Initialized ObjectStore
16/06/14 10:57:34 [main]: INFO metastore.HiveMetaStore: Added admin role in metastore
16/06/14 10:57:34 [main]: INFO metastore.HiveMetaStore: Added public role in metastore
16/06/14 10:57:34 [main]: INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
16/06/14 10:57:34 [main]: INFO metastore.HiveMetaStore: Starting DB backed MetaStore Server
16/06/14 10:57:34 [main]: INFO metastore.HiveMetaStore: Started the new metaserver on port [9083]...
16/06/14 10:57:34 [main]: INFO metastore.HiveMetaStore: Options.minWorkerThreads = 200
16/06/14 10:57:34 [main]: INFO metastore.HiveMetaStore: Options.maxWorkerThreads = 100000
16/06/14 10:57:34 [main]: INFO metastore.HiveMetaStore: TCP keepalive = true
16/06/14 10:57:40 [pool-3-thread-1]: INFO metastore.HiveMetaStore: 1: source:/10.234.177.127 get_table : db=default tbl=test_tb
16/06/14 10:57:40 [pool-3-thread-1]: INFO HiveMetaStore.audit: ugi=qspace ip=/10.234.177.127 cmd=source:/192.168.0.10 get_table : db=default tbl=test_tb
16/06/14 10:57:40 [pool-3-thread-1]: INFO metastore.HiveMetaStore: 1: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
16/06/14 10:57:40 [pool-3-thread-1]: INFO metastore.ObjectStore: ObjectStore, initialize called
16/06/14 10:57:40 [pool-3-thread-1]: INFO metastore.ObjectStore: Initialized ObjectStore
I see you are using HiveServer2. For such aggregations, depending on configuration, I believe you may need to set the number of reducers prior to execution. With Hive you can use this syntax:
SET mapred.reduce.tasks=1
However on Hive2 I have noticed I need to use:
SET mapreduce.job.reduces=1
I hope this helps! Previously I had the same error message, and changing this resolved the problem for me.
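For example, the same setting can be passed on the Beeline command line rather than inside the session; this is a sketch using the JDBC URL from the question and Beeline's standard --hiveconf flag:

beeline -u jdbc:hive2://192.168.0.10:10000 --hiveconf mapreduce.job.reduces=1 -e 'select count(*) from test_tb'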
This could be a permission issue. Try starting Beeline as follows:
beeline
!connect jdbc:hive2://hadoop7:10000/default
Enter your user and password.
Confirming that in my case it was caused by a permission issue (as already pointed out by wise_w). What was weird is that a plain select * worked fine, but anything more elaborate did not.
I tried the following:
$ beeline -u jdbc:hive2://server1:10000
(connected fine, no errors)
> select * from table1;
-> worked fine
> select count(*) from table1;
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=1)
or
ERROR : Job Submission failed with exception 'org.apache.hadoop.security.AccessControlException(Permission denied: user=anonymous, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
The solution started with wise_w's suggestion:
$ beeline --incremental=true
$ !connect jdbc:hive2://server1:10000/default
Enter username for jdbc:hive2://server1:10000/default: hive
Enter password for jdbc:hive2://server1:10000/default: **** (this is hive as well in my test setup)
-> and now everything worked fine.
Now as a one-liner:
$ beeline -u jdbc:hive2://server1:10000 -n hive -p hive --incremental=true
I think the confusion came from the fact that connecting like:
$ beeline -u jdbc:hive2://server1:10000
does not throw any error.
Setting the below parameters is important in this case:
set mapred.job.queue.name=<your queue name>;
set mapreduce.job.reduces=1;
Add the below snippet before running your query (note that each -Xmx must stay below the corresponding container size):
SET hive.auto.convert.join=false;
SET mapreduce.map.memory.mb = 16384;
SET mapreduce.map.java.opts='-Djava.net.preferIPv4Stack=true -Xmx13107M';
SET mapreduce.reduce.memory.mb = 16384;
SET mapreduce.reduce.java.opts='-Djava.net.preferIPv4Stack=true -Xmx13107M';
set hive.support.concurrency = false;
I can run all other queries in Hive, but when I do a join it just gets stuck.
hive> select count (*) from tab10 join tab1;
Warning: Map Join MAPJOIN[13][bigTable=tab10] in task 'Stage-2:MAPRED' is a cross product
Query ID = root_20160406145959_b57642e0-7499-41a0-914c-0004774fe4ac
Total jobs = 1
Execution log at: /tmp/root/root_20160406145959_b57642e0-7499-41a0-914c-0004774fe4ac.log
2016-04-06 03:00:03 Starting to launch local task to process map join; maximum memory = 2058354688
2016-04-06 03:00:03 Dump the side-table for tag: 1 with group count: 1 into file: file:/tmp/root/b71aa45b-f356-4a54-a880-77e57cd53ed3/hive_2016-04-06_14-59-59_858_3722397802100174236-1/-local-10004/HashTable-Stage-2/MapJoin-mapfile01--.hashtable
2016-04-06 03:00:03 Uploaded 1 File to: file:/tmp/root/b71aa45b-f356-4a54-a880-77e57cd53ed3/hive_2016-04-06_14-59-59_858_3722397802100174236-1/-local-10004/HashTable-Stage-2/MapJoin-mapfile01--.hashtable (280 bytes)
2016-04-06 03:00:03 End of local task; Time Taken: 0.562 sec.
It's hung at this point, and it doesn't spawn any of the MapReduce tasks at all. What could be wrong?
I did see this in hive.log.
2016-04-06 15:00:00,124 INFO [main]: ql.Driver (Driver.java:launchTask(1643)) - Starting task [Stage-5:MAPREDLOCAL] in serial mode
2016-04-06 15:00:00,125 INFO [main]: mr.MapredLocalTask (MapredLocalTask.java:executeInChildVM(159)) - Generating plan file file:/tmp/root/b71aa45b-f356-4a54-a880-77e57cd53ed3/hive_2016-04-06_14-59-59_858_3722397802100174236-1/-local-10006/plan.xml
2016-04-06 15:00:00,233 INFO [main]: mr.MapredLocalTask (MapredLocalTask.java:executeInChildVM(288)) - Executing: /opt/cloudera/parcels/CDH-5.5.2-1.cdh5.5.2.p0.4/lib/hadoop/bin/hadoop jar /opt/cloudera/parcels/CDH-5.5.2-1.cdh5.5.2.p0.4/jars/hive-exec-1.1.0-cdh5.5.2.jar org.apache.hadoop.hive.ql.exec.mr.ExecDriver -localtask -plan file:/tmp/root/b71aa45b-f356-4a54-a880-77e57cd53ed3/hive_2016-04-06_14-59-59_858_3722397802100174236-1/-local-10006/plan.xml -jobconffile file:/tmp/root/b71aa45b-f356-4a54-a880-77e57cd53ed3/hive_2016-04-06_14-59-59_858_3722397802100174236-1/-local-10007/jobconf.xml
There is nothing beyond this. Does anyone know how to fix this?
Open the mapred-site.xml file and add the property:
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
You need to increase the heap memory used by the Hadoop JVM.
I am trying to run two Sqoop jobs in parallel using Oozie, but two jobs are stuck after 95% and the other two are in the ACCEPTED state. I have also increased the YARN maximum resource memory, and added
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>50</value>
</property>
in mapred-site.xml, but nothing helped. Please help.
YARN Cluster Metrics:
Apps Submitted 4
Apps Pending 2
Apps Running 2
Apps Completed 0
Containers Running 4
Memory Used 10GB
Memory Total 32GB
Memory Reserved 0B
VCores Used 4
VCores Total 24
VCores Reserved 0
Active Nodes 4
Decommissioned Nodes 0
Lost Nodes 0
Unhealthy Nodes 0
Rebooted Nodes 0
----------
Sysout Log
========================================================================
3175 [main] WARN org.apache.sqoop.tool.SqoopTool - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
3198 [main] INFO org.apache.sqoop.Sqoop - Running Sqoop version: 1.4.5-cdh5.2.0
3212 [main] WARN org.apache.sqoop.tool.BaseSqoopTool - Setting your password on the command-line is insecure. Consider using -P instead.
3213 [main] INFO org.apache.sqoop.tool.BaseSqoopTool - Using Hive-specific delimiters for output. You can override
3213 [main] INFO org.apache.sqoop.tool.BaseSqoopTool - delimiters with --fields-terminated-by, etc.
3224 [main] WARN org.apache.sqoop.ConnFactory - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
3280 [main] INFO org.apache.sqoop.manager.oracle.OraOopManagerFactory - Data Connector for Oracle and Hadoop is disabled.
3293 [main] INFO org.apache.sqoop.manager.SqlManager - Using default fetchSize of 1000
3297 [main] INFO org.apache.sqoop.tool.CodeGenTool - Beginning code generation
3951 [main] INFO org.apache.sqoop.manager.OracleManager - Time zone has been set to GMT
4023 [main] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM PT_PRELIM_FINDING_V t WHERE 1=0
4068 [main] INFO org.apache.sqoop.orm.CompilationManager - HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hadoop-mapreduce
5925 [main] INFO org.apache.sqoop.orm.CompilationManager - Writing jar file: /tmp/sqoop-nobody/compile/0dab11f6545d8ef69d6dd0f6b9041a50/PT_PRELIM_FINDING_CYTOGEN_V.jar
5937 [main] INFO org.apache.sqoop.mapreduce.ImportJobBase - Beginning import of PT_PRELIM_FINDING_V
5962 [main] INFO org.apache.sqoop.manager.OracleManager - Time zone has been set to GMT
5981 [main] WARN org.apache.sqoop.mapreduce.JobBase - SQOOP_HOME is unset. May not be able to find all job dependencies.
6769 [main] INFO org.apache.sqoop.mapreduce.db.DBInputFormat - Using read commited transaction isolation
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Heart beat
Thanks @abeaamase.
I asked our DBA to increase the Oracle database max processes to 750 and the max session pool to around 1.5 times the process count, i.e. 1125.
This solved the issue. It has nothing to do with YARN memory. Unfortunately, in Sqoop 2 this exception is not handled.
Please feel free to add more answers if you feel this explanation is not appropriate.
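For anyone curious what the DBA-side change looks like, a sketch using standard Oracle syntax with the values mentioned above (PROCESSES and SESSIONS are static parameters, so the instance must be restarted for them to take effect):

ALTER SYSTEM SET processes=750 SCOPE=SPFILE;
ALTER SYSTEM SET sessions=1125 SCOPE=SPFILE;
-- restart the database instance for the new values to take effect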
I am new to this, so it's completely possible I'm missing something basic.
I am trying to run an Oozie workflow that gets kicked off from a Coordinator. The Coordinator waits until files show up in a directory. The workflow contains a Hive action, that runs this script:
CREATE external TABLE IF NOT EXISTS daily_dump (
id bigint,
creationdate timestamp,
datelastupdated timestamp,
data1 string,
data2 string) LOCATION '/data/daily_dump';
FROM daily_dump d
INSERT OVERWRITE TABLE mydata_orc
PARTITION(id, datelastupdated)
SELECT d.id, d.creationdate, d.datelastupdated, d.data1, d.data2;
DROP TABLE daily_dump;
If I run the script manually from the Hive CLI, it works fine.
The workflow got kicked off correctly when the _SUCCESS file showed up. The script appears to have been executed halfway, as I can see from the Hive CLI that the table "daily_dump" got created, and I can see data in it. I checked hivemetastore.log and did not see any errors.
But the statement after that seemed to die in Oozie with this error:
2015-01-30 18:04:40,086 WARN HiveActionExecutor:542 - USER[me] GROUP[-]
TOKEN[] APP[guzzler] JOB[0000162-150114210350250-oozie-oozi-W]
ACTION[0000162-150114210350250-oozie-oozi-W#copy_to_mydata] Launcher
ERROR, reason: Main class [org.apache.oozie.action.hadoop.HiveMain], exit
code [40000]
What does error 40000 mean?
My hive.log shows the last command in the script (the DROP TABLE) and no ERROR logging after that:
2015-01-30 15:25:05,001 INFO ql.Driver (Driver.java:execute(1197)) - Starting command:
DROP TABLE daily_dump
2015-01-30 15:25:05,001 INFO hooks.ATSHook (ATSHook.java:<init>(85)) - Created ATS Hook
2015-01-30 15:25:05,001 INFO log.PerfLogger (PerfLogger.java:PerfLogBegin(108)) - <PERFLOG method=PreHook.org.apache.hadoop.hive.ql.hooks.ATSHook from=org.apache.hadoop.hive.ql.Driver>
2015-01-30 15:25:05,001 INFO log.PerfLogger (PerfLogger.java:PerfLogEnd(135)) - </PERFLOG method=PreHook.org.apache.hadoop.hive.ql.hooks.ATSHook start=1422631505001 end=1422631505001 duration=0 from=org.apache.hadoop.hive.ql.Driver>
2015-01-30 15:25:05,001 INFO log.PerfLogger (PerfLogger.java:PerfLogEnd(135)) - </PERFLOG method=TimeToSubmit start=1422631504958 end=1422631505001 duration=43 from=org.apache.hadoop.hive.ql.Driver>
2015-01-30 15:25:05,001 INFO log.PerfLogger (PerfLogger.java:PerfLogBegin(108)) - <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
2015-01-30 15:25:05,001 INFO log.PerfLogger (PerfLogger.java:PerfLogBegin(108)) - <PERFLOG method=task.DDL.Stage-0 from=org.apache.hadoop.hive.ql.Driver>
2015-01-30 15:25:05,095 INFO log.PerfLogger (PerfLogger.java:PerfLogEnd(135)) - </PERFLOG method=runTasks start=1422631505001 end=1422631505095 duration=94 from=org.apache.hadoop.hive.ql.Driver>
I am running Oozie 4.0.0 and Hive 0.13. Does anyone have any ideas?
We resolved the issue. It turned out that Hive ran my script fine, but it was having trouble creating temporary files in /tmp/hive-yarn. The directory was owned by the person who first ran a script on this cluster. I was running it as my own user and did not have permission to write to that directory.
We found this out by going to the actual Hadoop job log associated with the Hive action. The actual error was not propagated properly to the Hive logs or the Oozie logs. :-(
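For anyone else digging for the same thing: once you have the application ID from the Oozie launcher output, the standard YARN CLI can pull the aggregated job logs (assuming log aggregation is enabled on the cluster):

yarn logs -applicationId <application_id>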
This error will be solved if you use the database name along with the table name when creating or dropping a table. For example, if the database name is hivetempdb, then use the create statement:
CREATE TABLE hivetempdb.daily_dump (id INT);
and to drop the table use:
DROP TABLE hivetempdb.daily_dump;