hive collect_set crashes query - hadoop

I've got the following table:
hive> describe tv_counter_stats;
OK
day string
event string
query_id string
userid string
headers string
And I want to perform the following query:
hive -e 'SELECT
day,
event,
query_id,
COUNT(1) AS count,
COLLECT_SET(userid)
FROM
tv_counter_stats
GROUP BY
day,
event,
query_id;' > counter_stats_data.csv
However, this query fails. But the following query works fine:
hive -e 'SELECT
day,
event,
query_id,
COUNT(1) AS count
FROM
tv_counter_stats
GROUP BY
day,
event,
query_id;' > counter_stats_data.csv
where I remove the collect_set command. So my question: does anybody have an idea why collect_set might fail in this case?
UPDATE: Error message added:
Diagnostic Messages for this Task:
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 3 Reduce: 1 Cumulative CPU: 10.49 sec HDFS Read: 109136387 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 10 seconds 490 msec
java.lang.Throwable: Child Error
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:250)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:237)
Error: GC overhead limit exceeded
java.lang.Throwable: Child Error
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:250)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:237)
Error: GC overhead limit exceeded
UPDATE 2:
I altered the query so that it now looks like this:
hive -e '
SET mapred.child.java.opts="-server -Xmx1g -XX:+UseConcMarkSweepGC";
SELECT
day,
event,
query_id,
COUNT(1) AS count,
COLLECT_SET(userid)
FROM
tv_counter_stats
GROUP BY
day,
event,
query_id;' > counter_stats_data.csv
However, then I get the following error:
Diagnostic Messages for this Task:
java.lang.Throwable: Child Error
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:250)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:237)
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 3 Reduce: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec

This is probably a memory problem, since collect_set aggregates data in memory.
Try increasing the heap size and enabling concurrent GC (by setting Hadoop's mapred.child.java.opts to e.g. -Xmx1g -XX:+UseConcMarkSweepGC).
This answer has more information about the "GC overhead limit" error.
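For example, the property can be set for the session right before the query (the 1g heap is just an example value, tune it to your cluster; the value is typically given without surrounding quotes):
SET mapred.child.java.opts=-Xmx1g -XX:+UseConcMarkSweepGC;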

I had the exact same problem and came across this question, so I thought I'd share the solution I found.
The underlying problem is most likely that Hive is trying to do the aggregation on the mapper side, and the heuristics it uses to manage the in-memory hash maps for that approach are thrown off by data that is "wide but shallow" -- i.e. in your case, if there are very few userid values per day/event/query_id group.
I found an article that explains various ways to address this issue, but most of them are just optimizations of the full-on nuclear option: disabling mapper-side aggregation entirely.
Using set hive.map.aggr = false; should do the trick.
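For completeness, a sketch of the original query with mapper-side aggregation disabled (this just adds the setting to the query from the question):
hive -e '
SET hive.map.aggr = false;
SELECT
day,
event,
query_id,
COUNT(1) AS count,
COLLECT_SET(userid)
FROM
tv_counter_stats
GROUP BY
day,
event,
query_id;' > counter_stats_data.csv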

Related

sge All queues dropped because of overload or full

I'm going to run a million batch jobs with SGE.
Approximately 10,000 jobs execute fine, but after about an hour of execution the process slows down and eventually the jobs stop running.
Checking the logs does not show any errors; I can only see the message below:
"All queues dropped because of overload or full"
How do I configure the setup so that the jobs run normally?
There is one master server and four clients, files are shared using NFS, and every system runs on Docker and Docker Swarm.
Output of qstat when job execution slowed down:
$qstat -j
queue instance "peteris.q#sge00" dropped because it is full
queue instance "peteris.q#sge02" dropped because it is full
queue instance "peteris.q#sge03" dropped because it is full
queue instance "peteris.q#sge01" dropped because it is full
All queues dropped because of overload or full
Detailed messages:
$qstat -j 1595799
=============================================================
job_number: 1595799
exec_file: job_scripts/1595799
submission_time: Sun May 27 08:08:10 2018
owner: root
uid: 0
group: root
gid: 0
sge_o_home: /root
sge_o_path: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
sge_o_workdir: /data/23andMe
sge_o_host: sge
account: sge
cwd: /data/23andMe
mail_list: root@sge
notify: FALSE
job_name: python3
jobshare: 0
env_list:
job_args: lineage.py,makeShell/1009_user3130_user3600.list
script_file: python3
usage 1: cpu=00:00:02, mem=0.59503 GBs, io=0.03963, vmem=493.180M, maxvmem=493.180M
scheduling info: queue instance "peteris.q#sge00" dropped because it is full
queue instance "peteris.q#sge02" dropped because it is full
queue instance "peteris.q#sge03" dropped because it is full
queue instance "peteris.q#sge01" dropped because it is full
All queues dropped because of overload or full
sge config
algorithm default
schedule_interval 0:0:10
maxujobs 0
queue_sort_method load
job_load_adjustments np_load_avg=100.0
load_adjustment_decay_time 0:7:30
load_formula np_load_avg
schedd_job_info true
flush_submit_sec 2
flush_finish_sec 2
params none
reprioritize_interval 0:0:0
halftime 168
usage_weight_list cpu=1.000000,mem=0.000000,io=0.000000
compensation_factor 5.000000
weight_user 0.250000
weight_project 0.250000
weight_department 0.250000
weight_job 0.250000
weight_tickets_functional 0
weight_tickets_share 0
share_override_tickets TRUE
share_functional_shares TRUE
max_functional_jobs_to_schedule 200
report_pjob_tickets TRUE
max_pending_tasks_per_job 50
halflife_decay_list none
policy_hierarchy OFS
weight_ticket 0.500000
weight_waiting_time 0.278000
weight_deadline 3600000.000000
weight_urgency 0.500000
weight_priority 0.000000
max_reservation 0
default_duration INFINITY
sge queue config
qname peteris.q
hostlist @allhosts
seq_no 0
load_thresholds NONE
suspend_thresholds NONE
nsuspend 1
suspend_interval 00:00:05
priority 0
min_cpu_interval 00:00:05
processors UNDEFINED
qtype BATCH INTERACTIVE
ckpt_list NONE
pe_list make
rerun FALSE
slots 20
tmpdir /tmp
shell /bin/bash
prolog NONE
epilog NONE
shell_start_mode posix_compliant
starter_method NONE
suspend_method NONE
resume_method NONE
terminate_method NONE
notify 00:00:01
owner_list NONE
user_lists NONE
xuser_lists NONE
subordinate_list NONE
complex_values NONE
projects NONE
xprojects NONE
calendar NONE
initial_state default
s_rt INFINITY
h_rt INFINITY
s_cpu INFINITY
h_cpu INFINITY
s_fsize INFINITY
h_fsize INFINITY
s_data INFINITY
h_data INFINITY
s_stack INFINITY
h_stack INFINITY
s_core INFINITY
h_core INFINITY
s_rss INFINITY
h_rss INFINITY
s_vmem INFINITY
h_vmem INFINITY
Seems like you have hit a practical limit on the number of active jobs that the queue can handle at any given time. I cannot confirm where the maximum is defined by SGE, but it seems likely that it is:
max_jobs
The number of active (not finished) jobs simultaneously allowed in Sun Grid Engine is controlled by this parameter. A value greater than 0 defines the limit. The default value 0 means "unlimited". If the max_jobs limit is exceeded by a job submission then the submission command exits with exit status 25 and an appropriate error message.
Changing max_jobs will take immediate effect.
This value is a global configuration parameter only. It cannot be overwritten by the execution host local configuration.
From: http://gridscheduler.sourceforge.net/htmlman/htmlman5/sge_conf.html?pathrev=V62u5_TAG
If this is correct then the value is unlimited; however, SGE will likely not perform well trying to manage ~1 million active jobs, hence the issue you are likely having. I would recommend you use job arrays, as this is exactly what they are for, i.e. managing and running many near-identical tasks.
There are many resources online for job arrays in SGE, such as this one:
http://wiki.gridengine.info/wiki/index.php/Simple-Job-Array-Howto
http://talby.rcs.manchester.ac.uk/~ri/_linux_and_hpc_lib/sge_array.html
https://wiki.duke.edu/display/SCSC/SGE+Array+Jobs
I am happy to assist further if you edit your question with specific requirements for each task. For example, does each of the ~1 million tasks require one or more parameters as input?
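As a rough illustration of the array-job approach (the wrapper script name and the index file are assumptions, not taken from your setup; each array task picks its own input line via $SGE_TASK_ID):
qsub -t 1-1000000 -q peteris.q run_task.sh
where run_task.sh looks something like:
#!/bin/bash
# pick the input list for this array task (one entry per line in the hypothetical index file)
INPUT=$(sed -n "${SGE_TASK_ID}p" makeShell/all_lists.txt)
python3 lineage.py "$INPUT"
The scheduler then tracks one job with a million tasks instead of a million independent jobs, and some Grid Engine versions additionally let you cap how many tasks run at once with qsub -tc <n>.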

Hive cross join fails on local map join

Is there a direct way to address the following error or overall a better way to use Hive to get the join that I need? Output to a stored table isn't a requirement as I can be content with an INSERT OVERWRITE LOCAL DIRECTORY to a csv.
I am trying to perform the following cross join. ipint is a 9GB table, and geoiplite is 270MB.
CREATE TABLE iplatlong_sample AS
SELECT ipintegers.networkinteger, geoiplite.latitude, geoiplite.longitude
FROM geoiplite
CROSS JOIN ipintegers
WHERE ipintegers.networkinteger >= geoiplite.network_start_integer AND ipintegers.networkinteger <= geoiplite.network_last_integer;
I use CROSS JOIN on ipintegers instead of geoiplite because I have read that the rule is for the smaller table to be on the left, larger on the right.
Map and Reduce stages complete to 100% according to Hive, but then:
2015-08-01 04:45:36,947 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 8767.09 sec
MapReduce Total cumulative CPU time: 0 days 2 hours 26 minutes 7 seconds 90 msec
Ended Job = job_201508010407_0001
Stage-8 is selected by condition resolver.
Execution log at: /tmp/myuser/.log
2015-08-01 04:45:38 Starting to launch local task to process map join; maximum memory = 12221153280
Execution failed with exit status: 3
Obtaining error information
Task failed!
Task ID: Stage-8
Logs:
/tmp/myuser/hive.log
FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
MapReduce Jobs Launched: Job 0: Map: 38 Reduce: 1 Cumulative CPU: 8767.09 sec HDFS Read: 9438495086 HDFS Write: 8575548486 SUCCESS
My hive config:
SET hive.mapred.local.mem=40960;
SET hive.exec.parallel=true;
SET hive.exec.compress.output=true;
SET hive.exec.compress.intermediate = true;
SET hive.optimize.skewjoin = true;
SET mapred.compress.map.output=true;
SET hive.stats.autogather=false;
I have varied SET hive.auto.convert.join between true and false but with the same result.
Here are the errors in the output log from /tmp/myuser/hive.log
$ tail -12 -f /tmp/myuser/hive.log
2015-08-01 07:30:46,086 ERROR exec.Task (SessionState.java:printError(419)) - Execution failed with exit status: 3
2015-08-01 07:30:46,086 ERROR exec.Task (SessionState.java:printError(419)) - Obtaining error information
2015-08-01 07:30:46,087 ERROR exec.Task (SessionState.java:printError(419)) -
Task failed!
Task ID:
Stage-8
Logs:
2015-08-01 07:30:46,087 ERROR exec.Task (SessionState.java:printError(419)) - /tmp/myuser/hive.log
2015-08-01 07:30:46,087 ERROR mr.MapredLocalTask (MapredLocalTask.java:execute(268)) - Execution failed with exit status: 3
2015-08-01 07:30:46,094 ERROR ql.Driver (SessionState.java:printError(419)) - FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
I am running the hive client on the master, a Google Cloud Platform instance of type n1-highmem-8 (8 CPUs, 52GB), and the workers are n1-highmem-4 (4 CPUs, 26GB), but I suspect that after MAP and REDUCE a local join (as implied) takes place on the master. Regardless, in bdutils I configured the JAVAOPTS for the worker nodes (n1-highmem-4) accordingly.
SOLUTION EDIT: The solution is to organize the range data into a range tree.
I don't think it is possible to perform this kind of cross join by brute force - just multiply the row counts and it gets a little out of hand. You need some optimizations, which I don't think Hive is capable of yet.
But this problem can actually be solved in O(N1+N2) time, provided you have your data sorted (which Hive can do for you) - you just go through both lists simultaneously; at each step you take an IP integer, add any intervals that start at or before that integer, remove those that have already ended, and emit the matching tuples. Pseudocode:
# assume: ipintegers is a list of IP integers sorted ascending,
# and intervals is a list of interval objects sorted by .start (e.g. loaded from the two sorted files)
active = []                     # intervals currently covering x
j = 0                           # index of the next interval not yet activated
for x in ipintegers:
    active = [i for i in active if i.end >= x]            # drop intervals that already ended
    while j < len(intervals) and intervals[j].start <= x:
        active.append(intervals[j])                       # activate intervals starting at or before x
        j += 1
    for i in active:
        output_match(i, x)                                 # emit the (interval, ip) match
Now, if you have an external script/UDF that knows how to read the smaller table, takes IP integers as input and spits out matching tuples as output, you can use Hive and SELECT TRANSFORM to stream the input to it.
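As a rough sketch of the Hive side (the script name interval_join.py is hypothetical; it is assumed to hold the small geoiplite table in memory, read sorted IP integers from stdin, and write matching networkinteger, latitude, longitude tuples to stdout):
ADD FILE interval_join.py;
CREATE TABLE iplatlong_sample AS
SELECT TRANSFORM (networkinteger)
USING 'python interval_join.py'
AS (networkinteger, latitude, longitude)
FROM (SELECT networkinteger FROM ipintegers SORT BY networkinteger) sorted_ips;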
Or you can probably just run this algorithm on a local machine with two input files, because it is just O(N), and even 9 GB of data is very doable.

How to skip failed map tasks in hadoop streaming

I am running a hadoop streaming mapreduce job which has 26895 map tasks in total. However, one task that deals with a certain input always fails. So I set mapreduce.map.failures.maxpercent=1 and wanted to skip the failed tasks, but the job was still not successful.
Kind % Complete Num Tasks Pending Running Complete Killed Failed/Killed Task Attempts
map 100.00% 26895 0 0 26894 1 8 / 44
reduce 100.00% 1 0 0 0 1 0 / 1
What can I do to skip this task?
There is a configuration setting for this.
Specify mapred.max.map.failures.percent and mapred.max.reduce.failures.percent in mapred-site.xml to set the failure threshold. Both default to 0.
These properties are deprecated now; use the following properties for this purpose instead:
mapreduce.map.failures.maxpercent
mapreduce.reduce.failures.maxpercent
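A minimal sketch of passing these as per-job generic options on the streaming command line (the jar path and the input/output/mapper/reducer arguments are placeholders for your actual job):
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar \
-D mapreduce.map.failures.maxpercent=1 \
-D mapreduce.reduce.failures.maxpercent=1 \
-input <input_dir> -output <output_dir> \
-mapper <mapper_cmd> -reducer <reducer_cmd>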

Hive takes long time to launch hadoop job

I am a newbie to Hadoop and Hive. I am using Hive integration with Hadoop to execute queries. When I submit any query, the following log messages appear on the console:
Hive history file=/tmp/root/hive_job_log_root_28058@hadoop2_201203062232_1076893031.txt
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number>
In order to set a constant number of reducers: set mapred.reduce.tasks=<number>
Starting Job = job_201203062223_0004, Tracking URL = http://:50030/jobdetails.jsp?jobid=job_201203062223_0004
Kill Command = //opt/hadoop_installation/hadoop-0.20.2/bin/../bin/hadoop job -kill job_201203062223_0004
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2012-03-06 22:32:26,707 Stage-1 map = 0%, reduce = 0%
2012-03-06 22:32:29,716 Stage-1 map = 100%, reduce = 0%
2012-03-06 22:32:38,748 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201203062223_0004
MapReduce Jobs Launched: Job 0: Map: 1 Reduce: 1 HDFS Read: 8107686 HDFS Write: 4 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
The text mentioned in bold starts a hadoop job (that's what I believe). It takes a long time to start the job. Once this line gets executed, the map reduce operations execute swiftly. Here are my questions:
Is there any way to make the launch of the hadoop job faster? Is it possible to skip this phase?
Where does the value of 'Kill command' come from (in the bold text)?
Please let me know if any inputs are required.
1) Starting Job = job_201203062223_0004, Tracking URL = http://:50030/jobdetails.jsp?jobid=job_201203062223_0004
ANS: your HQL query is translated into a hadoop job > hadoop does some background work (planning resources, data locality, the stages needed to process the query, launch configs, job and task id generation, etc.) > launch mappers > sort && shuffle > reduce (aggregation) > result to HDFS.
The above flow is part of the hadoop job life cycle, so none of it can be skipped.
At http://namenode:port/jobtracker.jsp you can see your job status using the job id job_201203062223_0004 (monitoring).
2) Kill Command = $HADOOP_HOME/bin/hadoop job -kill job_201203062223_0004
Ans: before launching your mappers you are shown these lines because hadoop works on big data, which may take more or less time depending on your dataset size, so if at any point you want to kill the job, this is a helper line. It is shown for any hadoop job, and printing an info line like this does not take much time.
Some additions with respect to your comments:
Hive is not meant for low-latency jobs; immediate, real-time results are not possible
(please check Hive's intended purposes in the Apache Hive docs).
The launching overhead (see question 1 - hadoop does some background work) is inherent in Hive and cannot be avoided.
Even for small datasets, this launching overhead is there in hadoop.
PS: if you really expect quick, real-time results, please look at Shark.
First, Hive is a tool that replaces your MapReduce work with HQL. In the background it has lots of predefined functions and MapReduce programs. When you run an HQL query, the Hadoop cluster does lots of things: finding the data blocks, allocating tasks, and so on.
Second, you can kill a job with the hadoop shell command.
If your job id is AAAAA,
you can execute the command below to kill it:
$HADOOP_HOME/bin/hadoop job -kill AAAAA
The launch of a hadoop job can get delayed due to unavailability of resources. If you use YARN you can see that the job is in the ACCEPTED state but not yet running. This means some other ongoing job has consumed all your resources and the new query is waiting to run.
You can kill the older job using the hadoop job -kill <job_id> command or wait for it to finish.
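A quick way to check this on YARN (assuming the yarn client is on your path):
yarn application -list -appStates ACCEPTED,RUNNING
yarn application -kill <application_id>
The first command shows which applications are queued versus running; the second kills a specific application if you decide not to wait for it.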

Hive - Queries on Partitions return nothing

I have a table that is being partitioned by a specific start date (ds). I can query the latest partition (the previous day's data) and it will use the partition fine.
hive> select count(1) from vtc4 where ds='2012-11-01' ;
...garbage...
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 6.43 sec HDFS Read: 46281957 HDFS Write: 7 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 430 msec
OK
151225
Time taken: 35.007 seconds
However, when I try to query earlier partitions, hive seems to read the partition fine, but does not return any results.
hive> select count(1) from vtc4 where ds='2012-10-31' ;
...garbage...
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 7.64 sec HDFS Read: 37754168 HDFS Write: 2 SUCCESS
Total MapReduce CPU Time Spent: 7 seconds 640 msec
OK
0
Time taken: 29.07 seconds
However, if I tell hive to run the query against the date field inside the table itself, and don't use the partition, I get the correct result.
hive> select count(1) from vtc4 where date_started >= "2012-10-31 00:00:00" and date_started < "2012-11-01 00:00:00" ;
...garbage...
MapReduce Jobs Launched:
Job 0: Map: 63 Reduce: 1 Cumulative CPU: 453.52 sec HDFS Read: 16420276606 HDFS Write: 7 SUCCESS
Total MapReduce CPU Time Spent: 7 minutes 33 seconds 520 msec
OK
123201
Time taken: 265.874 seconds
What am I missing here? I'm running hadoop 1.03 and hive 0.9. I'm pretty new to hive/hadoop, so any help would be appreciated.
Thanks.
EDIT 1:
hive> describe formatted vtc4 partition (ds='2012-10-31');
Partition Value: [2012-10-31 ]
Database: default
Table: vtc4
CreateTime: Wed Oct 31 12:02:24 PDT 2012
LastAccessTime: UNKNOWN
Protect Mode: None
Location: hdfs://hadoop5.internal/user/hive/warehouse/vtc4/ds=2012-10-31
Partition Parameters:
transient_lastDdlTime 1351875579
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
serialization.format 1
Time taken: 0.191 seconds
The partition folders exist, but when I try to do a hadoop fs -ls on hdfs://hadoop5.internal/user/hive/warehouse/vtc4/ds=2012-10-31 it says the file/directory does not exist. If I browse to that directory using the web interface, I can get into the folder, as well as see the /part-m-000* files. If I do an fs -ls on hdfs://hadoop5.internal/user/hive/warehouse/vtc4/ds=2012-11-01 it works fine.
Seems like either a permissions thing, or something funky with either hive's or the namenode's metadata. Here's what I would try:
copy the data in that partition to some other location in hdfs. You may need to do this as the hive or hdfs user, depending on how your permissions are set up.
alter table vtc4 drop partition (ds='2012-10-31');
alter table vtc4 add partition (ds='2012-10-31');
copy the data back into that partition on hdfs
Another thing with hive partitions is that they sometimes don't get registered in the metadata system when created outside of hive (e.g. from Spark SQL). You can also try MSCK REPAIR TABLE xc_bonus; after any changes to the partitions so they are reflected correctly.
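As a rough sketch of the copy-out / drop / add / copy-back sequence above (the backup location in HDFS is just a placeholder):
hadoop fs -cp hdfs://hadoop5.internal/user/hive/warehouse/vtc4/ds=2012-10-31 /tmp/vtc4_ds_2012-10-31_backup
hive -e "ALTER TABLE vtc4 DROP PARTITION (ds='2012-10-31'); ALTER TABLE vtc4 ADD PARTITION (ds='2012-10-31');"
hadoop fs -cp /tmp/vtc4_ds_2012-10-31_backup/* hdfs://hadoop5.internal/user/hive/warehouse/vtc4/ds=2012-10-31/
hive -e "MSCK REPAIR TABLE vtc4;"
The last line is only needed if the partition data was created outside of hive and is missing from the metastore.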