How to redirect hive query output to text file with header and column name having space - hadoop

I have a Hive table product with ratings.
Id   productid  rating  ProdBarCode
42   96         5       881107178
168  151        5       884288058
110  307        4       886987260
58   144        4       884304936
62   21         3       879373460
279  832        3       881375854
237  514        4       879376641
I want to write a query that finds the average rating of each product and writes it to a pipe-separated text file with a header, using hive -e "query" > output.txt.
Output format: |Productid|average rating|
Solution:
hive -e "
select C.value from (
  select 1 key, '|Productid|average rating|' value
  union all
  select 2 key, concat('|', concat_ws('|', Productid, averagerating), '|') value
  from (
    select CAST(A.productid AS STRING) AS Productid,
           CAST(A.averagerating AS STRING) AS averagerating
    from (
      select productid, avg(rating) averagerating
      from product
      group by productid
      sort by productid
    ) AS A
    where A.averagerating > 2
  ) B
  sort by key
) C
" > output.txt
Is this query correct? Is there a simpler way to redirect the output to a text file with a header whose column names contain spaces (average rating)? Any suggestions?
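A simpler alternative to building the header row inside the SQL itself is to write the header line from the calling script and append the query rows after it (the same idea as `echo header > file; hive -e "..." >> file`). A minimal Python sketch of that pattern, with hypothetical rows standing in for the actual hive output:

```python
# Sketch: emit the pipe-delimited header first, then the data rows.
# The rows below are placeholders standing in for `hive -e "..."` output.
rows = [(96, 5.0), (151, 5.0), (307, 4.0)]  # hypothetical (productid, avg) pairs

with open("output.txt", "w") as out:
    out.write("|Productid|average rating|\n")  # spaces in the header are no problem here
    for productid, avg_rating in rows:
        out.write("|%s|%s|\n" % (productid, avg_rating))
```

Keeping the header out of the query also avoids the union-all/sort-by-key trick entirely.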

Related

HIVE: Replace string/pattern in row if it exists, else do nothing

I have a table A with id, name, age.
id    name   age
{20}  Joan   12
3     James  12
12    Jill   12
{54}  Adam   12
{10}  Bill   12
I need to remove the {} surrounding 'id' field.
I tried this:
translate(regexp_extract(id, '([^{])([^}])', 2), '{', '')
which works, but returns NULL for values with no {}:
id
3
12
Is there a way I can get the output as:
id
20
3
12
54
10
You could use the regexp_replace UDF to remove the "{}", like:
select regexp_replace(id, '\\{|\\}','');
Please try the following select statement:
select regexp_replace(col1,'[{}]','') as replaced,col2,col3 from table_name;
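Both answers rely on the same substitution: the character class [{}] matches either brace, and replacing each match with an empty string leaves brace-free values untouched. The equivalent shown with Python's re module, for illustration:

```python
import re

# Same substitution as regexp_replace(id, '[{}]', ''): strip any brace
# characters, and leave values without braces intact.
ids = ["{20}", "3", "12", "{54}", "{10}"]
cleaned = [re.sub(r"[{}]", "", s) for s in ids]
print(cleaned)  # ['20', '3', '12', '54', '10']
```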

Transform row into column and vice-versa using sql - oracle

I have this table:
create table history (
date_check DATE,
type VARCHAR2(30),
id_type NUMBER,
total NUMBER
)
Selecting:
select * from history order by 1
DATE_CHECK  TYPE   ID_TYPE  TOTAL
14/02/2016  abc    1        14
14/02/2016  abc33  1        14
14/02/2016  bbb    1        40
14/02/2016  bbb33  3        43
14/02/2016  ddd    2        61
14/02/2016  ddd33  2        62
15/02/2016  abc    1        33
15/02/2016  abc33  1        44
15/02/2016  bbb    1        55
15/02/2016  bbb33  3        66
15/02/2016  ddd    2        77
15/02/2016  ddd33  2        88
TYPE always has these 6 values:
abc
abc33
bbb
bbb33
ddd
ddd33
And I cross this data with "id_type", so there is a decode like this:
select type || decode(id_type, 1, '- new', 2, '- old', 3, '- xpto') as type from history order by 1
In the end I need something like this:
DATE_CHECK  abc - new  abc33 - old  bbb - new  bbb33 - old  ....
14/02/2016  14         14           40         43
15/02/2016  33         44           55         66
What is the easiest way to do it? Using pivot?
Try this:
with data as (
  select date_check,
         type || ' ' || decode(id_type, 1, '- new', 2, '- old', 3, '- xpto') as type,
         total
  from history
)
select * from data
pivot (
  max(total) for type in ('abc - new', 'abc33 - new', 'bbb - new',
                          'bbb33 - xpto', 'ddd - old', 'ddd33 - old')
)
order by date_check;
And for the "vice versa", use UNPIVOT.
You can reference multiple columns in a pivot statement to get your desired output. In your case you have a single measure column (TOTAL) but multiple columns forming the composite pivot key, so you can use a pivot query like the following:
select *
from history
PIVOT ( max(TOTAL)
for (TYPE, ID_TYPE) in ( ('abc',1) abc_new
, ('abc',2) abc_old
, ('abc',3) abc_xpto
, ('abc33',1) abc33_new
, ('abc33',2) abc33_old
, ('abc33',3) abc33_xpto
, ('bbb',1) bbb_new
, ('bbb',2) bbb_old
, ('bbb',3) bbb_xpto
, ('bbb33',1) bbb33_new
, ('bbb33',2) bbb33_old
, ('bbb33',3) bbb33_xpto
, ('ddd',1) ddd_new
, ('ddd',2) ddd_old
, ('ddd',3) ddd_xpto
, ('ddd33',1) ddd33_new
, ('ddd33',2) ddd33_old
, ('ddd33',3) ddd33_xpto
)
)
You can adjust the output column headings to suit if desired by changing them similar to the following:
...
PIVOT ( max(TOTAL)
for (TYPE, ID_TYPE) in ( ('abc',1) "abc - new"
, ('abc',2) "abc - old"
, ('abc',3) "abc - xpto"
, ('abc33',1) "abc33 - new"
, ...
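The reshaping PIVOT performs can also be pictured outside SQL: group the rows by date, then spread each (type, id_type) combination into its own column. A rough Python sketch of that idea, using a subset of the sample rows and illustrative column labels:

```python
# Reshape long rows (date, type, id_type, total) into one row per date with
# one column per (type, id_type) combination -- what PIVOT does in SQL.
rows = [
    ("14/02/2016", "abc",   1, 14),
    ("14/02/2016", "abc33", 1, 14),
    ("15/02/2016", "abc",   1, 33),
    ("15/02/2016", "abc33", 1, 44),
]
suffix = {1: "new", 2: "old", 3: "xpto"}  # mirrors the DECODE mapping

pivoted = {}
for date_check, typ, id_type, total in rows:
    col = "%s_%s" % (typ, suffix[id_type])
    pivoted.setdefault(date_check, {})[col] = total

print(pivoted["14/02/2016"])  # {'abc_new': 14, 'abc33_new': 14}
```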

Sum multiple columns using PIG

I have multiple files with same columns and I am trying to aggregate the values in two columns using SUM.
The column structure is below
ID  first_count  second_count  name  desc
1   10           10            A     A_Desc
1   25           45            A     A_Desc
1   30           25            A     A_Desc
2   20           20            B     B_Desc
2   40           10            B     B_Desc
How can I sum the first_count and second_count?
ID  first_count  second_count  name  desc
1   65           80            A     A_Desc
2   60           30            B     B_Desc
Below is the script I wrote, but when I execute it I get the error "Could not infer the matching function for SUM as multiple or none of them fit. Please use an explicit cast."
A = LOAD '/output/*/part*' AS (id:chararray,first_count:chararray,second_count:chararray,name:chararray,desc:chararray);
B = GROUP A BY id;
C = FOREACH B GENERATE group as id,
SUM(A.first_count) as first_count,
SUM(A.second_count) as second_count,
A.name as name,
A.desc as desc;
Your load statement is wrong: first_count and second_count are loaded as chararray, and SUM can't add two strings. If you are sure these columns will only ever contain numbers, load them as int. Try this:
A = LOAD '/output/*/part*' AS (id:chararray,first_count:int,second_count:int,name:chararray,desc:chararray);
It should work.
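The error mirrors what happens in most languages when you try to add strings: the values must be cast to numbers before aggregation. A small Python illustration of the same failure and fix:

```python
# Summing the counts as strings fails, just like SUM on chararray in Pig;
# casting to int first makes the aggregation work.
first_counts = ["10", "25", "30"]  # loaded as strings ("chararray")

try:
    sum(first_counts)  # raises TypeError: cannot add int and str
except TypeError as exc:
    print("as strings:", exc)

total = sum(int(c) for c in first_counts)  # cast, then sum
print("as ints:", total)  # as ints: 65
```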

How to find Max value of an alphanumeric field in oracle?

I have the data below, and ID is of type VARCHAR2.
Table name: EMP
ID    TST_DATE
A035  05/12/2015
BAB0  05/12/2015
701   07/12/2015
81    07/12/2015
I used below query to get max of ID group by TST_DATE.
SELECT TST_DATE,MAX(ID) from EMP group by TST_DATE;
TST_DATE MAX(ID)
05/12/2015 BAB0
07/12/2015 81
In the second row it returns 81 instead of 701.
To sort strings that represent (hex) numbers in numeric, rather than lexicographical, order you need to convert them to actual numbers:
SELECT TST_DATE, ID, TO_NUMBER(ID, 'XXXXXXXXXX') from EMP
ORDER BY TO_NUMBER(ID, 'XXXXXXXXXX');
TST_DATE ID TO_NUMBER(ID,'XXXXXXXXXX')
---------- ---- ---------------------------------------
07/12/2015 81 129
07/12/2015 701 1793
05/12/2015 A035 41013
05/12/2015 BAB0 47792
You can use that numeric form within your max() and convert back to a hex string for display:
SELECT TST_DATE,
TO_CHAR(MAX(TO_NUMBER(ID, 'XXXXXXXXXX')), 'XXXXXXXXXX')
from EMP group by TST_DATE;
TST_DATE TO_CHAR(MAX
---------- -----------
07/12/2015 701
05/12/2015 BAB0
With a suitable number of Xs in the format models of course; how many depends on the size of your varchar2 column.
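The same comparison can be checked quickly outside Oracle: Python's int(s, 16) plays the role of TO_NUMBER(id, 'XXXXXXXXXX') here.

```python
# Lexicographic max picks '81' (as text it sorts after '701'); comparing the
# values as hexadecimal numbers picks '701', since 0x701 = 1793 > 0x81 = 129.
ids = ["701", "81"]

print(max(ids))                            # '81'  (string comparison)
print(max(ids, key=lambda s: int(s, 16)))  # '701' (numeric comparison)
```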

Sorting Matrix Columns in RDLC Report

I get the following query result:
EmployeeName payelement payelementValue payelementOrder
------------ ---------- --------------- ---------------
emp1 PE1 122 2
emp1 PE2 122 1
emp2 PE1 122 2
emp2 PE2 122 1
emp3 PE1 122 2
emp3 PE2 122 1
Which results in a report that looks like:
Employee Name PE2 PE1
emp1 122 122
emp2 122 122
emp3 122 122
I have created a matrix in an RDLC report, set the column field to 'payelement', the value field to 'payelementValue', and the rows field to 'employeeName'. The problem is that I want to sort 'payelement' by the field named 'payelementOrder', which represents the order of the pay elements in their actual table, while by default I get them sorted alphabetically, i.e. PE1 then PE2. Any help would be greatly appreciated.
I solved it like this:
Go to the .rdlc and check the Row Groups pane (bottom-left). Under it you will find the grouped column name (the one from your table). Right-click it -> Group Properties... -> Sorting -> under "Sort by", choose the column you want to sort by and click OK.
And you are done.
When you create a matrix you get a column group. In the group properties of the column group you can set the sort order by a specific field (payelementOrder in your case).
