Hive mappers taking a long time to finish, with SpillThread map output logging - hadoop

I am running Hive on MapReduce and some of the mappers run for too long, ~8 hours (mostly the last few mappers).
I can see a lot of [SpillThread] org.apache.hadoop.mapred.MapTask: Finished spill 59 and
org.apache.hadoop.mapred.MapTask: Spilling map output entries in the logs.
Can you help me tune this?
Please find below a sample of the query I am running.
Sample query:
CREATE TABLE schema.test_t AS
SELECT
col1,
col2,
col3,
col4,
col5,
col6,
col7,
SUM(col8) AS col8,
COUNT(1) AS col9,
COUNT(DISTINCT col10) AS col10,
col11,
col12
FROM
schema.srce_t
WHERE col13 IN ('a','b')
GROUP BY
col1,col2,col3,col4,col5,col6,col7,col11,col12
GROUPING SETS ((col1,col2,col3,col4,col5,col6,col7,col11,col12),
(col1,col11,col2,col3,col5,col6,col12,col7),
(col1,col11,col2,col3,col6,col12,col7),
(col1,col11,col2,col3,col4,col6,col12,col7),
(col1,col11,col2,col4,col5,col6,col12,col7),
(col1,col11,col2,col4,col6,col12,col7),
(col1,col11,col2,col5,col6,col12,col7),
(col1,col11,col4,col5,col6,col12,col7),
(col1,col11,col3,col4,col5,col6,col12,col7),
(col1,col11,col3,col5,col6,col12,col7),
(col1,col11,col3,col4,col6,col12,col7),
(col1,col11,col4,col6,col12,col7),
(col1,col11,col3,col6,col12,col7),
(col1,col11,col5,col6,col12,col7),
(col1,col11,col2, col6,col12,col7),
(col1,col11,col6, col12,col7));
Hive properties:
SET mapreduce.reduce.memory.mb=10240;
SET mapreduce.reduce.java.opts=-Xmx9216m;
SET mapreduce.map.memory.mb=10240;
SET mapreduce.map.java.opts=-Xmx9216m;
SET mapreduce.task.io.sort.mb=1536;
Logs:
2019-05-15 05:34:32,600 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 714424619; bufvoid = 1073741824
2019-05-15 05:34:32,600 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 268435452(1073741808); kvend = 232293228(929172912); length = 36142225/67108864
2019-05-15 05:34:32,600 INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) 750592747 kvi 187648180(750592720)
2019-05-15 05:34:41,305 INFO [main] org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: RS[4]: records written - 10000000
2019-05-15 05:35:01,944 INFO [SpillThread] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.snappy]
2019-05-15 05:35:07,479 INFO [SpillThread] org.apache.hadoop.mapred.MapTask: Finished spill 0
2019-05-15 05:35:07,480 INFO [main] org.apache.hadoop.mapred.MapTask: (RESET) equator 750592747 kv 187648180(750592720) kvi 178606160(714424640)
2019-05-15 05:35:34,178 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: MAP[13]: records read - 1000000
2019-05-15 05:35:58,140 INFO [main] org.apache.hadoop.mapred.MapTask: Spilling map output
2019-05-15 05:35:58,140 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 750592747; bufend = 390854476; bufvoid = 1073741791
2019-05-15 05:35:58,140 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 187648180(750592720); kvend = 151400696(605602784); length = 36247485/67108864
2019-05-15 05:35:58,141 INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) 427407372 kvi 106851836(427407344)
2019-05-15 05:36:31,831 INFO [SpillThread] org.apache.hadoop.mapred.MapTask: Finished spill 1
2019-05-15 05:36:31,833 INFO [main] org.apache.hadoop.mapred.MapTask: (RESET) equator 427407372 kv 106851836(427407344) kvi 97806648(391226592)
2019-05-15 05:37:19,180 INFO [main] org.apache.hadoop.mapred.MapTask: Spilling map output
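A side note on these logs: bufvoid = 1073741824 is exactly 1024 MB, whereas mapreduce.task.io.sort.mb=1536 would give a 1,610,612,736-byte buffer, so the sort buffer actually in effect appears to be 1024 MB rather than the 1536 MB that was set. This is only an inference from the numbers; it is worth verifying that the setting really reaches the map tasks.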

Check the current values of these parameters and reduce the figures until you get more mappers running in parallel:
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set mapreduce.input.fileinputformat.split.minsize=16000; -- 16 KB
set mapreduce.input.fileinputformat.split.maxsize=128000000; -- 128 MB
-- files bigger than the max size will be split
-- files smaller than the min size will be combined and processed by the same mapper
If your files are in a non-splittable format such as gzip, this will not help.
Play with these settings to get more, smaller mappers.
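As a rough illustration (hypothetical numbers, assuming a splittable format and ideal splitting): a 100 GB input with split.maxsize=128000000 gives about 100,000,000,000 / 128,000,000 ≈ 780 map tasks, while a 1 GB max split size would give fewer than 100, so shrinking the max split size directly increases mapper parallelism.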
These settings may also help improve the performance of the query:
set hive.optimize.distinct.rewrite=true;
set hive.map.aggr=true;
--if files are ORC, check PPD:
SET hive.optimize.ppd=true;
SET hive.optimize.ppd.storage=true;
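Two more things worth noting for this particular query. With 16 grouping sets, the map side emits each input row once per grouping set, i.e. up to 16 output rows per input row, which by itself explains why the sort buffer fills and spills so often. And count(distinct) is expensive because every row of a group has to be shuffled to a single reducer that tracks the distinct values. Below is a minimal sketch of the two-stage rewrite that hive.optimize.distinct.rewrite aims to do for you, shown for a single grouping only (table and column names are taken from the question; folding the other aggregates back in would require a join):
select col1, col11, count(*) as col10
from (
  select distinct col1, col11, col10
  from schema.srce_t
  where col13 in ('a','b')
) t
group by col1, col11;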

Related

Is throughput in the MapReduce metrics in MB or Mb?

After running TestDFSIO I got the following metrics:
2019-04-30 09:50:35,790 INFO fs.TestDFSIO: Date & time: Tue Apr 30 09:50:35 EDT 2019
2019-04-30 09:50:35,791 INFO fs.TestDFSIO: Number of files: 100
2019-04-30 09:50:35,791 INFO fs.TestDFSIO: Total MBytes processed: 10000
2019-04-30 09:50:35,791 INFO fs.TestDFSIO: Throughput mb/sec: 376.9
2019-04-30 09:50:35,791 INFO fs.TestDFSIO: Average IO rate mb/sec: 387.16
2019-04-30 09:50:35,791 INFO fs.TestDFSIO: IO rate std deviation: 60.42
2019-04-30 09:50:35,791 INFO fs.TestDFSIO: Test exec time sec: 115.21
Is "Average IO rate mb/sec" in megabytes or megabits?
TestDFSIO is a useful tool, but the only available documentation is in its source code.
Looking at the code of TestDFSIO.java, it seems that the throughput is expressed in mebibytes per second.
In the source code one can see how the throughput is computed:
" Throughput mb/sec: " + df.format(toMB(size) / msToSecs(time)),
The function toMB() divides the number of bytes by MEGA:
static float toMB(long bytes) {
  return ((float)bytes)/MEGA;
}
MEGA is in turn the constant 0x100000L, i.e. the integer 1048576 = 1024*1024.
From the code:
private static final long MEGA = ByteMultiple.MB.value();
and
enum ByteMultiple {
  B(1L),
  KB(0x400L),
  MB(0x100000L),
  GB(0x40000000L),
  TB(0x10000000000L);
  ...
So the throughput should be expressed in mebibytes/sec (MiB/sec) and not in megabytes (MB).
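To put numbers on that: 376.9 MiB/s × 1,048,576 bytes/MiB ≈ 395.2 MB/s in decimal megabytes, or about 3,162 Mbit/s. Either way, the reported figure is in bytes per second, not bits.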

Hive action fails on Oozie, while it works well on the Hive command line

Here is my workflow; it works well with simple SQL such as show tables or drop partition (tried with Tez as well, and it fails with a cryptic error message again):
<workflow-app xmlns="uri:oozie:workflow:0.4" name="UDEX-OOZIE POC">
<credentials>
<credential name="HiveCred" type="hcat">
<property>
<name>hcat.metastore.uri</name>
<value>thrift://xxxx.local:9083</value>
</property>
<property>
<name>hcat.metastore.principal</name>
<value>hive/_HOST#xxxx.LOCAL</value>
</property>
</credential>
</credentials>
<start to="IdEduTranCell-pm"/>
<action name="IdEduTranCell-pm" cred="HiveCred">
<hive xmlns="uri:oozie:hive-action:0.5">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<job-xml>${HiveConfigFile}</job-xml>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>default</value>
</property>
</configuration>
<script>${IdEduTranCell_path}</script>
<param>SOURCE_DB_NAME=${SOURCE_DB_NAME}</param>
<param>STRG_DB_NAME=${STRG_DB_NAME}</param>
<param>TABLE_NAME=${TABLE_NAME}</param>
<file>${IdEduTranCell_path}#${IdEduTranCell_path}</file>
<file>${HiveConfigFile}#${HiveConfigFile}</file>
</hive>
<ok to="sub-workflow-end"/>
<error to="kill"/>
</action>
<kill name="kill">
<message>Sub-workflow failed while loading data into hive tables, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="sub-workflow-end"/>
</workflow-app>
But it fails for this SQL. The data is not large (it fails even for 1 record), so something is wrong that I can't spot in the log. Please help.
INSERT OVERWRITE TABLE xxx1.fact_tranCell
PARTITION (Timestamp)
select
`(Timestamp)?+.+`, Timestamp as Timestamp
from xxx2.fact_tranCell
order by tranCell_ID,ADMIN_CELL_ID, SITE_ID;
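(A note on the syntax: the backquoted `(Timestamp)?+.+` is Hive's regex column specification and selects every column except Timestamp. On Hive 0.13 and later it only works with quoted identifiers disabled, i.e. set hive.support.quoted.identifiers=none; — mentioned here only in case the Hive configuration used by Oozie differs from the command-line one.)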
The SQL itself is fine; it runs well from the command line:
Status: Running (Executing on YARN cluster with App id application_1459756606292_15271)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 7 7 0 0 0 0
Reducer 2 ...... SUCCEEDED 1 1 0 0 0 0
Reducer 3 ...... SUCCEEDED 10 10 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 03/03 [==========================>>] 100% ELAPSED TIME: 19.72 s
--------------------------------------------------------------------------------
Loading data to table xxx1.fact_trancell partition (timestamp=null)
Time taken for load dynamic partitions : 496
Loading partition {timestamp=1464012900}
Time taken for adding to write entity : 8
Partition 4g_oss.fact_trancell{timestamp=1464012900} stats: [numFiles=10, numRows=4352, totalSize=9660382, rawDataSize=207776027]
OK
Time taken: 34.595 seconds
--------------------------- LOG ----------------------------
Starting Job = job_1459756606292_15285, Tracking URL = hxxp://xxxx.local:8088/proxy/application_1459756606292_15285/
Kill Command = /usr/bin/hadoop job -kill job_1459756606292_15285
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2016-05-27 17:32:35,792 Stage-1 map = 0%, reduce = 0%
2016-05-27 17:32:51,692 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 9.97 sec
2016-05-27 17:33:02,263 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 14.97 sec
MapReduce Total cumulative CPU time: 14 seconds 970 msec
Ended Job = job_1459756606292_15285
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1459756606292_15286, Tracking URL = hxxp://xxxx.local:8088/proxy/application_1459756606292_15286/
Kill Command = /usr/bin/hadoop job -kill job_1459756606292_15286
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2016-05-27 17:33:16,583 Stage-2 map = 0%, reduce = 0%
2016-05-27 17:33:29,814 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 4.29 sec
2016-05-27 17:33:45,587 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 38.74 sec
2016-05-27 17:33:53,990 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 4.29 sec
2016-05-27 17:34:08,662 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 39.27 sec
2016-05-27 17:34:17,061 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 4.29 sec
2016-05-27 17:34:28,576 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 38.28 sec
2016-05-27 17:34:36,940 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 4.29 sec
2016-05-27 17:34:48,435 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 38.09 sec
MapReduce Total cumulative CPU time: 38 seconds 90 msec
Ended Job = job_1459756606292_15286 with errors
Error during job, obtaining debugging information...
Job Tracking URL: http://xxxx.local:8088/proxy/application_1459756606292_15286/
Examining task ID: task_1459756606292_15286_m_000000 (and more) from job job_1459756606292_15286
Task with the most failures(4):
-----
Task ID:
task_1459756606292_15286_r_000000
URL:
hxxp://xxxx.local:8088/taskdetails.jsp?jobid=job_1459756606292_15286&tipid=task_1459756606292_15286_r_000000
-----
Diagnostic Messages for this Task:
Error: Java heap space
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 14.97 sec HDFS Read: 87739161 HDFS Write: 16056577 SUCCESS
Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 38.09 sec HDFS Read: 16056995 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 53 seconds 60 msec
Intercepting System.exit(2)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [2]
I can see in the log that the error occurs while writing the ORC files; strangely, the ORC files are written fine from the command line:
2016-05-30 11:11:20,377 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[1]: records written - 1
2016-05-30 11:11:21,307 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[1]: records written - 10
2016-05-30 11:11:21,917 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[1]: records written - 100
2016-05-30 11:11:22,420 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[1]: records written - 1000
2016-05-30 11:11:24,181 INFO [main] org.apache.hadoop.hive.ql.exec.ExtractOperator: 0 finished. closing...
2016-05-30 11:11:24,181 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: 1 finished. closing...
2016-05-30 11:11:24,181 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[1]: records written - 4352
2016-05-30 11:11:33,028 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
at org.apache.hadoop.hive.ql.io.orc.OutStream.getNewInputBuffer(OutStream.java:107)
at org.apache.hadoop.hive.ql.io.orc.OutStream.write(OutStream.java:128)
at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerWriterV2.writeDirectValues(RunLengthIntegerWriterV2.java:374)
at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerWriterV2.writeValues(RunLengthIntegerWriterV2.java:182)
at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerWriterV2.write(RunLengthIntegerWriterV2.java:762)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StringTreeWriter.flushDictionary(WriterImpl.java:1211)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StringTreeWriter.writeStripe(WriterImpl.java:1132)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1616)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1996)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.close(WriterImpl.java:2288)
at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.close(OrcOutputFormat.java:106)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:186)
at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:952)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:287)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

Stuck at Hive Elasticsearch insert, Map 0% Reduce 0%

I have integrated Elasticsearch with Hadoop using elasticsearch-hadoop-2.3.2.jar. I am querying my Hive table using Beeline. When I create or insert into a normal table, it works fine. But when I create an external table for Elasticsearch like the following:
CREATE EXTERNAL TABLE flogs2 (
name STRING,
city STRING,
status STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.nodes' = '192.168.18.79','es.port' = '9200','es.index.auto.create' = 'true', 'es.resource' = 'mylog/log', 'es.query' = '?q=*');
The table is created. But when I insert data into it like below,
INSERT OVERWRITE TABLE flogs2 SELECT s.name, s.city, s.status FROM logs s;
I am stuck at the following lines:
0: jdbc:hive2://192.168.18.197:10000> INSERT OVERWRITE TABLE flogs2 SELECT s.name,s.city,s.status FROM logs s;
INFO : Number of reduce tasks is set to 0 since there's no reduce operator
INFO : number of splits:1
INFO : Submitting tokens for job: job_1464067651503_0014
INFO : The url to track the job: http://vasanthakumar-virtual-machine:8088/proxy/application_1464067651503_0014/
INFO : Starting Job = job_1464067651503_0014, Tracking URL = http://vasanthakumar-virtual-machine:8088/proxy/application_1464067651503_0014/
INFO : Kill Command = /home/vasanthakumar/Desktop/software/hadoop-2.7.1/bin/hadoop job -kill job_1464067651503_0014
INFO : Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
INFO : 2016-05-24 16:26:51,866 Stage-0 map = 0%, reduce = 0%
INFO : 2016-05-24 16:27:52,372 Stage-0 map = 0%, reduce = 0%, Cumulative CPU 1.48 sec
INFO : 2016-05-24 16:28:52,498 Stage-0 map = 0%, reduce = 0%, Cumulative CPU 1.48 sec
INFO : 2016-05-24 16:29:52,562 Stage-0 map = 0%, reduce = 0%, Cumulative CPU 1.48 sec
INFO : 2016-05-24 16:30:52,884 Stage-0 map = 0%, reduce = 0%, Cumulative CPU 1.48 sec
INFO : 2016-05-24 16:31:53,103 Stage-0 map = 0%, reduce = 0%, Cumulative CPU 1.48 sec
Note:
1. I have tried both the Hive CLI and Beeline.
2. I have increased my memory space.
3. Normal Hive queries work fine.
Please help me get rid of this.

Hive not running Map Reduce with "where" clause

I'm trying out something simple in Hive on HDFS.
The problem is that the queries do not run MapReduce when I use a "where" clause. However, MapReduce runs for count(*) and even for group by clauses.
Here are the data and queries with results:
Create External Table:
CREATE EXTERNAL TABLE testtab1 (
id STRING, org STRING)
row format delimited
fields terminated by ','
stored as textfile
location '/usr/ankuchak/testtable1';
Simple select * query:
0: jdbc:hive2://> select * from testtab1;
15/07/01 07:32:46 [main]: ERROR hdfs.KeyProviderCache: Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !!
OK
+---------------+---------------+--+
| testtab1.id | testtab1.org |
+---------------+---------------+--+
| ankur | idc |
| user | idc |
| someone else | ssi |
+---------------+---------------+--+
3 rows selected (2.169 seconds)
Count(*) query
0: jdbc:hive2://> select count(*) from testtab1;
Query ID = ankuchak_20150701073407_e7fd66ae-8812-4e02-87d7-492f81781d15
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
15/07/01 07:34:08 [HiveServer2-Background-Pool: Thread-40]: ERROR mr.ExecDriver: yarn
15/07/01 07:34:08 [HiveServer2-Background-Pool: Thread-40]: WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
Starting Job = job_1435425589664_0005, Tracking URL = http://slc02khv:8088/proxy/application_1435425589664_0005/
Kill Command = /scratch/hadoop/hadoop/bin/hadoop job -kill job_1435425589664_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
15/07/01 07:34:16 [HiveServer2-Background-Pool: Thread-40]: WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
2015-07-01 07:34:16,291 Stage-1 map = 0%, reduce = 0%
2015-07-01 07:34:23,831 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.04 sec
2015-07-01 07:34:30,102 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.41 sec
MapReduce Total cumulative CPU time: 2 seconds 410 msec
Ended Job = job_1435425589664_0005
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 2.41 sec HDFS Read: 6607 HDFS Write: 2 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 410 msec
OK
+------+--+
| _c0 |
+------+--+
| 3 |
+------+--+
1 row selected (23.527 seconds)
Group by query:
0: jdbc:hive2://> select org, count(id) from testtab1 group by org;
Query ID = ankuchak_20150701073540_5f20df4e-0bd4-4e18-b065-44c2688ce21f
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
15/07/01 07:35:40 [HiveServer2-Background-Pool: Thread-63]: ERROR mr.ExecDriver: yarn
15/07/01 07:35:41 [HiveServer2-Background-Pool: Thread-63]: WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
Starting Job = job_1435425589664_0006, Tracking URL = http://slc02khv:8088/proxy/application_1435425589664_0006/
Kill Command = /scratch/hadoop/hadoop/bin/hadoop job -kill job_1435425589664_0006
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
15/07/01 07:35:47 [HiveServer2-Background-Pool: Thread-63]: WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
2015-07-01 07:35:47,200 Stage-1 map = 0%, reduce = 0%
2015-07-01 07:35:53,494 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.05 sec
2015-07-01 07:36:00,799 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.53 sec
MapReduce Total cumulative CPU time: 2 seconds 530 msec
Ended Job = job_1435425589664_0006
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 2.53 sec HDFS Read: 7278 HDFS Write: 14 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 530 msec
OK
+-------+------+--+
| org | _c1 |
+-------+------+--+
| idc | 2 |
| ssi | 1 |
+-------+------+--+
2 rows selected (21.187 seconds)
Now the simple where clause:
0: jdbc:hive2://> select * from testtab1 where org='idc';
OK
+--------------+---------------+--+
| testtab1.id | testtab1.org |
+--------------+---------------+--+
+--------------+---------------+--+
No rows selected (0.11 seconds)
It would be great if you could provide me with some pointers.
Please let me know if you need further information in this regard.
Regards,
Ankur
A map job is occurring in your last query, so it's not that MapReduce is not happening. However, some rows should be returned by that query. The likely culprit is that for some reason it is not finding a match on the value "idc". Check your table and ensure that the rows for ankur and user contain the string idc.
Try this to see if you get any results:
Select * from testtab1 where org rlike '.*(idc).*';
or
Select * from testtab1 where org like '%idc%';
These will grab any row that has a value containing the string 'idc'. Good luck!
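If the mismatch comes from stray whitespace in the comma-delimited source file (an assumption; the raw file isn't shown in the question), trimming the column is another option:
Select * from testtab1 where trim(org)='idc';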
Details of the same error were reported and fixed recently; try verifying the version you are using.

Sonarqube upgrade from 3.7.2 to 4.4 hung during DB upgrade

I am upgrading Sonarqube from 3.7.2 to 4.4. The DB migration has been taking more than 2 hours on "MergeMeasureDataIntoProjectMeasures". Here is the log:
2014.10.07 23:31:27 INFO [DbMigration]
2014.10.07 23:31:27 INFO [DbMigration] == RemoveActiveDashboardsLinkedOnUnsharedDashboards: migrating ===============
2014.10.07 23:31:27 INFO [DbMigration] == RemoveActiveDashboardsLinkedOnUnsharedDashboards: migrated (0.0320s) ======
2014.10.07 23:31:27 INFO [DbMigration]
2014.10.07 23:31:27 INFO [DbMigration] == MergeMeasureDataIntoProjectMeasures: migrating ============================
2014.10.07 23:31:27 INFO [DbMigration] -- add_column(:project_measures, "measure_data", :binary, {:null=>true})
2014.10.07 23:31:27 INFO [DbMigration] -> 0.0080s
2014.10.07 23:31:27 INFO [DbMigration] -> 0 rows
Please let me know the solution if you have faced a similar issue.
Thanks
Vimal
Thanks for the quick reply, Fabrice and kkuilla.
"RemoveActiveDashboardsLinkedOnUnsharedDashboards" completed within 1 s; the issue was with "MergeMeasureDataIntoProjectMeasures".
The upgrade has now completed.
678,849 rows were migrated, hence it took about 2.5 hours.
Here is the log:
2014.10.07 23:31:27 INFO [DbMigration] == MergeMeasureDataIntoProjectMeasures: migrating ============================
2014.10.07 23:31:27 INFO [DbMigration] -- add_column(:project_measures, "measure_data", :binary, {:null=>true})
2014.10.07 23:31:27 INFO [DbMigration] -> 0.0080s
2014.10.07 23:31:27 INFO [DbMigration] -> 0 rows
2014.10.08 02:01:22 INFO [o.s.s.d.m.MassUpdater] 678849 rows have been updated
2014.10.08 02:01:22 INFO [DbMigration] -- drop_table(:measure_data)
2014.10.08 02:01:23 INFO [DbMigration] -> 0.5570s
2014.10.08 02:01:23 INFO [DbMigration] -> 0 rows
2014.10.08 02:01:23 INFO [DbMigration] == MergeMeasureDataIntoProjectMeasures: migrated (8995.5290s) ================
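For scale: the log shows 678,849 rows migrated in 8,995.5 seconds (about 2.5 hours), i.e. roughly 75 rows per second, so the migration was slow but progressing rather than hung.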
