How to achieve HiveQL error handling - Hadoop

I have multiple queries in an .hql file (say 10, each ending with ;) that I run from a shell script.
When a query in the middle fails (say query #5), the queries after it are not executed and the Hive job ends.
How can I handle errors so that queries #6 to #10 still run even though query #5 fails?

Demo
myscript.sql
select 1;
select assert_true(false);
select 2;
Option 1: set hive.cli.errors.ignore=true so that the CLI keeps running the script even if a command fails
hive --hiveconf hive.cli.errors.ignore=true -f myscript.sql
OK
1
Time taken: 3.742 seconds, Fetched: 1 row(s)
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: ASSERT_TRUE(): assertion failed.
Time taken: 0.264 seconds
OK
2
Time taken: 0.284 seconds, Fetched: 1 row(s)
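Option 2: feed the script to the CLI on standard input instead of using -f; in this mode the CLI moves on to the next statement after a failure
hive < myscript.sql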
hive> select 1;
OK
1
Time taken: 3.181 seconds, Fetched: 1 row(s)
hive> select assert_true(false);
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: ASSERT_TRUE(): assertion failed.
Time taken: 0.335 seconds
hive> select 2;
OK
2
Time taken: 0.225 seconds, Fetched: 1 row(s)
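If neither option fits, another approach is to run each statement as its own hive -e invocation from a driver script, so that one failure cannot stop the rest. The Python sketch below is only an illustration under stated assumptions: it splits the script naively on ';' (so it breaks on semicolons inside string literals or comments), it assumes the hive CLI exits with a non-zero status when a statement fails, and the helper names split_statements, hive_cli and run_all are hypothetical. The executor is passed in as a parameter so the loop can be tried without a cluster.

```python
import subprocess
from typing import Callable, List


def split_statements(script: str) -> List[str]:
    """Naively split a HiveQL script on ';'.

    Assumption: no ';' appears inside string literals or comments.
    """
    return [s.strip() for s in script.split(";") if s.strip()]


def hive_cli(statement: str) -> int:
    """Run a single statement through `hive -e` and return its exit code."""
    return subprocess.run(["hive", "-e", statement + ";"]).returncode


def run_all(script: str, execute: Callable[[str], int] = hive_cli) -> List[str]:
    """Run every statement, continuing past failures; return the failed ones."""
    failed = []
    for statement in split_statements(script):
        if execute(statement) != 0:
            failed.append(statement)  # note the failure and keep going
    return failed


# Usage (requires the hive CLI on PATH):
#   with open("myscript.sql") as f:
#       for statement in run_all(f.read()):
#           print("FAILED:", statement)
```

Because each hive -e call starts a new CLI session, settings made with SET or USE in one statement do not carry over to the next one; if the script relies on them, they have to be prepended to each statement.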

Related

How to insert the result of a function query into a table with Hive

When I insert the result of a SELECT that calls a function into a table and then select from that table, the function's value does not appear. What should I do?
Result of the function query:
hive> SELECT start_num,geoip(start_ip,'COUNTRY_CODE','/usr/local/hive/lib/GeoLite2-Country.mmdb') from geoip limit 3;
OK
17/05/24 18:02:15 INFO mapred.FileInputFormat: Total input files to process : 1
16778240 AU
16779264 CN
16781312 JP
Time taken: 0.129 seconds, Fetched: 3 row(s)
After inserting the function query results:
Query: insert into table iptest2 SELECT start_num,geoip(start_ip,'COUNTRY_CODE','/usr/local/hive/lib/GeoLite2-Country.mmdb') from geoip limit 3;
17/05/24 18:05:41 INFO mapred.FileInputFormat: Total input files to process : 2
16778240
16779264
16781312
Time taken: 0.115 seconds, Fetched: 3 row(s)
Description of the iptest2 table:
hive> desc iptest2;
OK
17/05/25 09:26:28 INFO mapred.FileInputFormat: Total input files to process : 1
code string
ccode string
Time taken: 0.066 seconds, Fetched: 2 row(s)
The GEOIP function is a UDF (taken from the link below):
https://github.com/Spuul/hive-udfs/blob/master/src/main/java/com/spuul/hive/GeoIP2.java

Hive returning wrong date

I'm getting some odd results from Hive when working with dates.
For starters, I'm using Hive 1.2.1000.2.4.0.0-169.
I have a table defined as follows (snipped):
hive> DESCRIBE proto_hourly;
OK
elem string
protocol string
count bigint
date_val date
hour_id tinyint
# Partition Information
# col_name data_type comment
date_val date
hour_id tinyint
Time taken: 0.336 seconds, Fetched: xx row(s)
hive>
OK, so I have data loaded for the current year. I started noticing some "weirdness" in queries with specific dates. As a pointed example, here's a pretty simple query where I ask for '2016-06-01' but get back '2016-05-31'. Why?
hive> SET i="2016-06-01";
hive> with uniq_dates AS (
> SELECT DISTINCT date_val as date_val
> FROM proto_hourly
> WHERE date_val = date(${hiveconf:i}) )
> select * from uniq_dates;
Query ID = hive_20160616154318_a75b3343-a2fe-41a5-b02a-d9cda8695c91
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1465936275203_0023)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 1 1 0 0 0 0
Reducer 2 ...... SUCCEEDED 1 1 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 3.63 s
--------------------------------------------------------------------------------
OK
2016-05-31
Time taken: 6.738 seconds, Fetched: 1 row(s)
hive>
Testing this a bit more, I found that one server in the cluster was configured with a different time zone: two of the three nodes were on UTC, but one was still on America/Denver.
I believe the map/reduce tasks were executing on the node in the different time zone, which produced the date offset.
Midnight on 2016-06-01 UTC is indeed 2016-05-31 (18:00) in America/Denver.
A silent time-zone conversion...
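A minimal Python illustration of the offset described above. It assumes a fixed UTC-6 offset for America/Denver on that date (Mountain Daylight Time) instead of a time-zone database lookup: the instant that starts 2016-06-01 in UTC still falls on the previous calendar day on a Denver wall clock.

```python
from datetime import datetime, timedelta, timezone

# Assumption: America/Denver is on Mountain Daylight Time (UTC-6) on this date;
# a fixed offset stands in for a real time-zone database lookup.
DENVER_MDT = timezone(timedelta(hours=-6))

midnight_utc = datetime(2016, 6, 1, tzinfo=timezone.utc)  # DATE '2016-06-01' as midnight UTC
in_denver = midnight_utc.astimezone(DENVER_MDT)           # same instant, Denver wall clock

print(midnight_utc.date())  # 2016-06-01
print(in_denver)            # 2016-05-31 18:00:00-06:00
print(in_denver.date())     # 2016-05-31
```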

Hive Current date function

I want to get the current date in beeline.
I tried to use this:
FROM_UNIXTIME(UNIX_TIMESTAMP())
It outputs this:
16-03-21
What I am looking to get is:
2016-03-21 09:34
How do I do it? I looked at the Hive documentation here:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions
but it didn't work for me.
You can get it by passing the expected format as the second parameter of the from_unixtime function.
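Example:
select from_unixtime(unix_timestamp(),'yyyy-MM-dd HH:mm');
Result:
2016-03-21 16:03
Note that pattern letters are case-sensitive: MM is the month and mm is the minutes, so a pattern ending in HH:MM prints the month where the minutes should be.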
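A small Python sketch that mimics from_unixtime's formatting for a handful of Java SimpleDateFormat letters (the pattern style from_unixtime accepts) makes the case difference visible. The helper from_unixtime_like and its letter mapping are hypothetical and cover only yyyy, MM, dd, HH, mm and ss; the sketch also assumes UTC, whereas Hive formats in the server's time zone. At 16:45 the HH:MM pattern prints the month (03) instead of the minutes.

```python
import re
from datetime import datetime, timezone

# Subset of Java SimpleDateFormat letters mapped to Python strftime directives.
# Case matters: "MM" is month, "mm" is minute.
_JAVA_TO_STRFTIME = {
    "yyyy": "%Y",
    "MM": "%m",
    "dd": "%d",
    "HH": "%H",
    "mm": "%M",
    "ss": "%S",
    "%": "%%",  # escape literal percent signs for strftime
}
_TOKEN = re.compile(r"yyyy|MM|dd|HH|mm|ss|%")


def from_unixtime_like(seconds: int, pattern: str = "yyyy-MM-dd HH:mm:ss",
                       tz: timezone = timezone.utc) -> str:
    """Format epoch seconds roughly like from_unixtime (subset of letters only)."""
    fmt = _TOKEN.sub(lambda m: _JAVA_TO_STRFTIME[m.group(0)], pattern)
    return datetime.fromtimestamp(seconds, tz).strftime(fmt)


ts = int(datetime(2016, 3, 21, 16, 45, tzinfo=timezone.utc).timestamp())
print(from_unixtime_like(ts, "yyyy-MM-dd HH:mm"))  # 2016-03-21 16:45
print(from_unixtime_like(ts, "yyyy-MM-dd HH:MM"))  # 2016-03-21 16:03 (month, not minutes)
```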
Try this:
Select to_date(from_unixtime(unix_timestamp())) from my table ...
This returns '2016-03-21'.
There are many date functions you can use in Hive (taken from http://atiblog.com/date-function-hive/):
1) from_unixtime:
This function converts the number of seconds from the Unix epoch (1970-01-01 00:00:00 UTC) to a STRING that represents the TIMESTAMP of that moment in the current system time zone, in the format "1970-01-01 00:00:00". The following example returns the current date including the time.
hive> SELECT FROM_UNIXTIME(UNIX_TIMESTAMP());
OK
2015-05-18 05:43:37
Time taken: 0.153 seconds, Fetched: 1 row(s)
2) from_utc_timestamp:
This function assumes that the timestamp string in the first argument is in UTC and converts it to the time zone given in the second argument. This function and the to_utc_timestamp function do time zone conversions. In the following example, the first argument is a string literal.
hive> SELECT from_utc_timestamp('1970-01-01 07:00:00', 'JST');
OK
1970-01-01 16:00:00
Time taken: 0.148 seconds, Fetched: 1 row(s)
3) to_utc_timestamp:
This function assumes that the string in the first argument is in the time zone specified in the second argument, and converts the value to UTC. This function and the from_utc_timestamp function do time zone conversions.
hive> SELECT to_utc_timestamp('1970-01-01 00:00:00', 'America/Denver');
OK
1970-01-01 07:00:00
Time taken: 0.153 seconds, Fetched: 1 row(s)
4) unix_timestamp:
This function parses the date using the specified date format and returns the number of seconds between that date and the Unix epoch. If it fails, it returns 0. The result depends on the session's time zone: the following example returns 1237487400, which corresponds to UTC+05:30 (e.g. India Standard Time); in UTC the same call returns 1237507200.
hive> SELECT unix_timestamp('2009-03-20', 'yyyy-MM-dd');
OK
1237487400
Time taken: 0.156 seconds, Fetched: 1 row(s)
5) unix_timestamp():
This function returns the current time as the number of seconds since the Unix epoch (1970-01-01 00:00:00 UTC).
hive> select UNIX_TIMESTAMP();
6) unix_timestamp( string date ):
This function converts a date in the format 'yyyy-MM-dd HH:mm:ss' into a Unix timestamp, i.e. the number of seconds between that date (in the default time zone) and the Unix epoch. If it fails, it returns 0.
hive> select UNIX_TIMESTAMP('2000-01-01 00:00:00');
OK
946665000
Time taken: 0.147 seconds, Fetched: 1 row(s)
7) unix_timestamp( string date, string pattern ):
This function parses the date using the specified pattern and returns the number of seconds between that date and the Unix epoch. If it fails, it returns 0. As the identical result below shows, the time of day is ignored here because the pattern only covers the date part of the string.
hive> select UNIX_TIMESTAMP('2000-01-01 10:20:30','yyyy-MM-dd');
OK
946665000
Time taken: 0.148 seconds, Fetched: 1 row(s)
8) from_unixtime( bigint number_of_seconds [, string format] ):
The FROM_UNIXTIME function converts the specified number of seconds since the Unix epoch to a date string, in the format 'yyyy-MM-dd HH:mm:ss' unless another format is given.
hive> SELECT FROM_UNIXTIME(UNIX_TIMESTAMP());
9) To_Date( string timestamp ):
Returns the date part of a timestamp string.
hive> select TO_DATE('2000-01-01 10:20:30');
OK
2000-01-01
10) WEEKOFYEAR( string date ):
The WEEKOFYEAR function returns the week number of the date.
hive> SELECT WEEKOFYEAR('2000-03-01 10:20:30');
OK
9
11) DATEDIFF( string date1, string date2 ):
The DATEDIFF function returns the number of days between the two given dates.
hive> SELECT DATEDIFF('2000-03-01', '2000-01-10');
OK
51
Time taken: 0.156 seconds, Fetched: 1 row(s)
12) DATE_ADD( string date, int days ):
The DATE_ADD function adds the number of days to the specified date.
hive> SELECT DATE_ADD('2000-03-01', 5);
OK
2000-03-06
13) DATE_SUB( string date, int days ):
The DATE_SUB function subtracts the number of days from the specified date.
hive> SELECT DATE_SUB('2000-03-01', 5);
OK
2000-02-25
14) Date conversions: convert a MMddyyyy string to a DATE (via a Unix timestamp):
Note: the month letters in the MMddyyyy pattern must always be uppercase; lowercase mm means minutes.
select cast(substring(from_unixtime(unix_timestamp(dt, 'MMddyyyy')),1,10) as date) from sample;
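Several outputs in the list above can be cross-checked with Python's datetime module. This sketch is an approximation, not Hive's implementation: it assumes fixed UTC offsets (JST = UTC+9, America/Denver in January = UTC-7, and UTC+05:30 as the session time zone that the unix_timestamp outputs appear to use), ISO-8601 week numbering for WEEKOFYEAR, and Python's strict strptime in place of Java's more lenient parsing. The *_like helpers and hive_date are hypothetical names.

```python
from datetime import date, datetime, timedelta, timezone

# Fixed UTC offsets assumed for the examples (no time-zone database needed).
JST = timezone(timedelta(hours=9))                     # Japan Standard Time
DENVER_MST = timezone(timedelta(hours=-7))             # America/Denver in January
SESSION_TZ = timezone(timedelta(hours=5, minutes=30))  # apparent session time zone
FMT = "%Y-%m-%d %H:%M:%S"


def from_utc_timestamp_like(ts: str, tz: timezone) -> str:
    """Read ts as UTC wall-clock time and render it as wall-clock time in tz."""
    utc = datetime.strptime(ts, FMT).replace(tzinfo=timezone.utc)
    return utc.astimezone(tz).strftime(FMT)


def to_utc_timestamp_like(ts: str, tz: timezone) -> str:
    """Read ts as wall-clock time in tz and render it as UTC wall-clock time."""
    local = datetime.strptime(ts, FMT).replace(tzinfo=tz)
    return local.astimezone(timezone.utc).strftime(FMT)


def unix_timestamp_like(text: str, fmt: str, tz: timezone) -> int:
    """Parse text (strftime-style fmt) as wall-clock time in tz; return epoch seconds."""
    return int(datetime.strptime(text, fmt).replace(tzinfo=tz).timestamp())


def hive_date(text: str) -> date:
    """Parse a 'yyyy-MM-dd' string into a date."""
    return date.fromisoformat(text)


print(from_utc_timestamp_like("1970-01-01 07:00:00", JST))          # 1970-01-01 16:00:00
print(to_utc_timestamp_like("1970-01-01 00:00:00", DENVER_MST))     # 1970-01-01 07:00:00
print(unix_timestamp_like("2009-03-20", "%Y-%m-%d", SESSION_TZ))    # 1237487400
print(unix_timestamp_like("2009-03-20", "%Y-%m-%d", timezone.utc))  # 1237507200
print(unix_timestamp_like("2000-01-01 00:00:00", FMT, SESSION_TZ))  # 946665000
print(hive_date("2000-03-01").isocalendar()[1])                     # WEEKOFYEAR: 9
print((hive_date("2000-03-01") - hive_date("2000-01-10")).days)     # DATEDIFF: 51
print(hive_date("2000-03-01") + timedelta(days=5))                  # DATE_ADD: 2000-03-06
print(hive_date("2000-03-01") - timedelta(days=5))                  # DATE_SUB: 2000-02-25
print(datetime.strptime("03212016", "%m%d%Y").date())               # MMddyyyy: 2016-03-21
```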

Enclose columns in quotes using HiveQL

Hive Version 0.13
I have a column in Hive whose values need to be enclosed in double quotes.
I am trying the query below, but the output is returned as NULL:
select "\""+notes_detail+"\"" from service_request_notes.TSS_INCIDENT_NOTES_F_SAMPLE limit 1;
whereas the output should be the value returned by this query, enclosed in double quotes:
select notes_detail from service_request_notes.TSS_INCIDENT_NOTES_F_SAMPLE limit 1;
Job 0: Map: 106 Cumulative CPU: 151.24 sec MAPRFS Read: 0 MAPRFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 2 minutes 31 seconds 240 msec
OK
notes_detail
From: scottw#acespower.com
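The NULL is consistent with + being an arithmetic operator in HiveQL: string operands are implicitly converted to DOUBLE, a double quote is not numeric, so the conversion yields NULL and so does the whole expression. String concatenation in HiveQL is done with concat(), e.g. concat('"', notes_detail, '"'). The Python sketch below only models these semantics; the helper names are hypothetical and Python's float() parsing is just an approximation of Hive's cast.

```python
from typing import Optional, Union

Value = Union[str, float, None]


def to_double(value: Value) -> Optional[float]:
    """Model an implicit cast to DOUBLE: None (NULL) when the value is not numeric."""
    if value is None:
        return None
    try:
        return float(value)
    except ValueError:
        return None


def plus_like(a: Value, b: Value) -> Optional[float]:
    """Model a + b: both operands become DOUBLE, and a NULL operand makes the result NULL."""
    x, y = to_double(a), to_double(b)
    return None if x is None or y is None else x + y


def concat_like(*parts: Optional[str]) -> Optional[str]:
    """Model concat(): join the strings, or return None if any argument is NULL."""
    return None if any(p is None for p in parts) else "".join(parts)


note = "From: scottw#acespower.com"
print(plus_like(plus_like('"', note), '"'))  # None, like the NULL in the question
print(concat_like('"', note, '"'))           # "From: scottw#acespower.com" (with the quotes)
```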

Google-like query for Oracle database

I want to do a lookup on a column in an Oracle table that contains a company name.
If I ask for "Jennifer's Dry Cleaners" I'd like to return not only that exact match but also "close matches" like (but not limited to):
Jennifer's Pipe Cleaners
Jessica's Dry Cleaners
Jennifer Pipe Cleanup
Pipe Cleaning by Jennifer
Is there a way to accomplish this?
You might be able to use the LIKE operator on each word and combine that with UTL_MATCH.EDIT_DISTANCE (the number of single-character changes needed to turn one string into the other) to order the results by closeness:
SELECT
company_name,
utl_match.edit_distance(lower(company_name), 'jennifers dry cleaners') edit_distance
FROM company_names
WHERE
lower(company_name) LIKE '%jennifer%' OR
lower(company_name) LIKE '%dry%' OR
lower(company_name) LIKE '%cleaner%'
ORDER BY edit_distance asc;
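A rough Python sketch of what this query does: a LIKE '%word%'-style substring filter on each word, then ordering by edit distance. The edit_distance function is a standard Levenshtein distance, the measure UTL_MATCH.EDIT_DISTANCE computes; rank, QUERY, WORDS and the sample names are illustrative only.

```python
from typing import List

QUERY = "jennifers dry cleaners"
WORDS = ["jennifer", "dry", "cleaner"]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character inserts, deletes or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def rank(names: List[str], query: str, words: List[str]) -> List[str]:
    """Keep names containing any word (like LIKE '%word%'), ordered by edit distance."""
    hits = [n for n in names if any(w in n.lower() for w in words)]
    return sorted(hits, key=lambda n: edit_distance(n.lower(), query))


names = ["Jennifer's Pipe Cleaners", "Jessica's Dry Cleaners", "Jennifer Pipe Cleanup",
         "Pipe Cleaning by Jennifer", "Jennifer's Dry Cleaners", "Acme Hardware"]
results = rank(names, QUERY, WORDS)
for name in results:
    print(edit_distance(name.lower(), QUERY), name)
```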
Use Oracle Text indexes. You can either index a particular column of data, or use a stored procedure to associate a group of terms with a particular record.
Here's an example of indexing a particular column of a table. See the CTX_DDL documentation for how to index using a "Data Store", which lets you index multiple columns, even from different sources.
9:28:34 AM > create table NOTE_TABLE (NOTE_ID NUMBER constraint NOTE_TABLE_PK primary key, NOTE_TEXT VARCHAR2(4000));
Table created
Executed in 0.045 seconds
9:28:34 AM > insert into NOTE_TABLE values(1, 'This is the contents of a note that has the word "glory"');
1 row inserted
Executed in 0.023 seconds
9:28:34 AM > insert into NOTE_TABLE values(2, 'Mine eyes have see the Glory of the coming of the LORD');
1 row inserted
Executed in 0.01 seconds
9:51:15 AM> insert into NOTE_TABLE values(3, 'This is Jennifer''s Pipe cleaning');
1 row inserted
Executed in 0.008 seconds
9:28:34 AM > commit;
Commit complete
Executed in 0.014 seconds
9:28:34 AM > begin
2 ctx_ddl.create_preference('my_word_list', 'BASIC_WORDLIST');
3 ctx_ddl.set_attribute('my_word_list','FUZZY_MATCH','ENGLISH');
4 ctx_ddl.set_attribute('my_word_list','FUZZY_SCORE','65');
5 ctx_ddl.set_attribute('my_word_list','FUZZY_NUMRESULTS','5000');
6 ctx_ddl.set_attribute('my_word_list','SUBSTRING_INDEX','TRUE');
7 ctx_ddl.set_attribute('my_word_list','PREFIX_INDEX','YES');
8 ctx_ddl.set_attribute('my_word_list','PREFIX_MIN_LENGTH', 1);
9 ctx_ddl.set_attribute('my_word_list','PREFIX_MAX_LENGTH', 64);
10 ctx_ddl.set_attribute('my_word_list','STEMMER','ENGLISH');
11 end;
12 /
PL/SQL procedure successfully completed
Executed in 0.024 seconds
9:28:34 AM > CREATE INDEX NOTE_TABLE_TEXT_IX on NOTE_TABLE(NOTE_TEXT) indextype is
2 ctxsys.CONTEXT parameters ('wordlist my_word_list');
Index created
Executed in 0.321 seconds
9:28:35 AM > select NOTE_ID from NOTE_TABLE t where contains(NOTE_TEXT, '{glory}', 10) > 0;
NOTE_ID
----------
1
2
Executed in 0.101 seconds
9:28:35 AM > select NOTE_ID from NOTE_TABLE t where contains(NOTE_TEXT, '{glory} and {lord}', 10) > 0;
NOTE_ID
----------
2
Executed in 0.055 seconds
9:51:16 AM> select NOTE_ID from NOTE_TABLE t where contains(NOTE_TEXT, 'fuzzy(cleaners, 1, 3)', 10) > 0;
NOTE_ID
----------
3
Executed in 0.056 seconds
9:28:35 AM > drop table NOTE_TABLE;
Table dropped
Executed in 0.283 seconds
9:28:35 AM > begin
2 ctx_ddl.drop_preference('my_word_list');
3 end;
4 /
PL/SQL procedure successfully completed
Executed in 0.012 seconds
9:28:36 AM >
