HIVE GROUP_CONCAT with ORDER BY - hadoop

I have a table like
I expect the output to look like this (the results group-concatenated into one record, with the concatenation sorted by value DESC).
Here is the query I tried,
SELECT id,
CONCAT('{',CONCAT_WS(',',GROUP_CONCAT(CONCAT('"',key, '":"',value, '"'))), '}') AS value
FROM
table_name
GROUP BY id
I want the value in the destination table to be sorted in descending order by the source table's value column.
To do that, I tried GROUP_CONCAT(... ORDER BY value).
It looks like Hive does not support this. Is there any other way to achieve it in Hive?

Try this query.
Hive does not support the GROUP_CONCAT function, but you can use the collect_list function to achieve something similar. You will also need an analytic window function, because Hive does not support an ORDER BY clause inside collect_list.
select
id,
-- Since the window function emits the same concatenated value
-- for every row of a given id, we can pick any one of them with
-- min() while grouping by the key 'id'.
-- Finally, concat and concat_ws add the commas and the
-- open/close braces for the JSON object.
concat('{', concat_ws(',', min(g)), '}') as value
from
(
select
s.id,
-- The window function collect_list runs against each row with
-- a partition key of 'id'. This produces a value similar to the
-- one GROUP_CONCAT would give, but the same value is repeated
-- for every row that shares the same 'id'
collect_list(s.c) over (partition by s.id
order by s.v desc
rows between unbounded preceding and unbounded following) g
from
(
-- First, form the key/value pairs from the original table.
-- Also, bring along the value column (aliased as 'v') so that
-- we can use it further for ordering
select
id,
value as v,
concat('"', key, '":"', value, '"') as c
from
table_name -- the source table from the question
)
s
)
gs
-- Need to group by 'id' since we have duplicate collect_list values
group by
id
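For illustration with hypothetical data: given the rows (id=1, key='a', value='2') and (id=1, key='b', value='5'), the query returns 1 with the value {"b":"5","a":"2"}, because the window orders the rows by value descending before they are collected and concatenated.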

Related

Oracle NoSQL - how to find all the rows in a MAP where the key starts With a value

I have a question about the MAP data type. Say I have a column labels (labels MAP(RECORD(value STRING, contentType STRING))) in myTable, where the "labels" column is a MAP and its values are RECORDs.
I want to query the table for all the rows where a key of "labels" starts with a particular value ("xxx.*").
I've tried this, but I am wondering if there is a better way to do it:
Select labels.keys($key >='xxx') as keys,
labels.values($key >='xxx') as values
from myTable where labels.keys() >=any ('xxx')
You can try
select * from myTableName t
where exists t.labels.keys(starts_with($key, 'xxx'));
or
select f.labels.keys(regex_like($key,'xxx.*')) as keys,
f.labels.values(regex_like($key,'xxx.*')) as values
from myTable f
I also suggest changing from MAP to ARRAY, which supports a path filter to get the matched entries. In the previous examples, the order between the keys and the values is not guaranteed.
select labels[regex_like($element.label, 'xxx.*')] from myTable
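For that last query to work, labels would need to be declared as an array of records, for example along these lines (a sketch only; the id primary key and the label field name are assumptions):
create table myTable (
  id INTEGER,
  labels ARRAY(RECORD(label STRING, value STRING, contentType STRING)),
  primary key (id)
);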

how to pass parameter to oracle update statement from csv file and excluding null values from csv

I have a situation where I have following csv file(say file.csv) with following data:
AcctId,Name,OpenBal,closingbal
1,abc,1000,
2,,0,
3,xyz,,
4,,,
How can I loop through this file using a unix shell so that, for example, for column $2 (Name), I get all occurrences of the Name column except null values and pass them to the following Oracle query in a quoted, comma-separated ('a','b') format?
select * from account
where name in (collection of values from csv file column Name,
but excluding null values)
and openbal in (collection of values from csv file column OpenBal,
but excluding null values)
and closingbal in (collection of values from csv file column closingbal,
but excluding null values)
In short, I want to pass the CSV column values as input parameters to an Oracle SQL query (and to an update query too), but I don't want to include null values. If a column is entirely null for all rows, I want to exclude that column as well.
Not sure why you'd want to loop through this file in a unix shell script: perhaps because you can't think of any better approach? Anyway, I'm going to skip that and offer a pure Oracle solution.
We can expose data in CSV files to the database using external tables. These are like regular tables except that their data comes from files in OS directories on the database server (rather than the database's storage). Find out more in the Oracle documentation on external tables.
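For example, an external table over file.csv could be created along these lines (a sketch; the directory object, its path, and the column datatypes are assumptions):
create or replace directory csv_dir as '/path/to/csv/files';
create table your_external_tab (
  acctid     number,
  name       varchar2(100),
  openbal    number,
  closingbal number
)
organization external (
  type oracle_loader
  default directory csv_dir
  access parameters (
    records delimited by newline
    skip 1
    fields terminated by ','
    missing field values are null
  )
  location ('file.csv')
);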
Given this approach it is easy to write the query you want. I suggest using sub-query factoring to select from the external table once.
with cte as ( select name, openbal, closingbal
from your_external_tab )
select *
from account a
where a.name in ( select cte.name from cte )
and a.openbal in ( select cte.openbal from cte )
and a.closingbal in ( select cte.closingbal from cte )
NULL values never match an IN comparison, so the NULL entries in the CSV columns are simply excluded from consideration.
Incidentally, that will return a different (larger) result set from this:
select a.*
from account a
, your_external_table e
where a.name = e.name
and a.openbal= e.openbal
and a.closingbal = e.closingbal

Distinct Column in Hive

I am trying to get a query result in HiveQL with one column as distinct. However, the results are not matching. There are almost 20 columns in the table.
create table uniq_us row format delimited fields terminated by ',' lines terminated by '\n' as select distinct(a),b,c,d,e,f,g,h,i,j from ctry_us_join;
The resulting number of rows: 513,238
select count(distinct a) from ctry_us_join;
The resulting number of rows: 151,616
How is this possible, and is something wrong in my first or second query?
You need to use a subselect with a GROUP BY statement.
select count(a) from (
select a, count(*) as cnt from ctry_us_join group by a) b
This is just one solution for this.
Distinct is a keyword, not a function. It applies to all columns you list in your select clause. It is quite reasonable that your table has only 151,616 distinct values in the column a, but multiple rows with the same value in the column a have different values in other columns. That might give you 513,238 distinct rows.
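A small hypothetical example makes the difference concrete: if the table contained the rows (a=1, b='x') and (a=1, b='y'), then select distinct a, b would return two rows, while select count(distinct a) would return 1.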

How to list distinct keys of an index?

The USER_INDEXES view has a column named DISTINCT_KEYS. Does this value represent the number of distinct keys in the indexed column? In that case, is there a way to list all those keys?
Does this value represent the number of distinct keys in the column indexed?
Yes, it does represent the number of distinct indexed values.
In that case, is there a way to list all those keys?
You'll have to manually execute SELECT DISTINCT column_name FROM table_name to get the list of distinct values. There is no system view that stores the distinct values associated with an indexed column.
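For reference, the stored statistic itself can be read from the data dictionary; note it is an optimizer statistic, so it is only as current as the last statistics gathering (the index name below is a placeholder):
select index_name, distinct_keys
from user_indexes
where index_name = 'MY_INDEX';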
Since you're interested in the distinct values in an index, you would be better off running a query like this:
SELECT DISTINCT column_name FROM table_name WHERE column_name IS NOT NULL;
This is very likely to use the index to return the distinct values very quickly, without having to do a full table scan and a sort.
(Note: if the column already has a validated NOT NULL constraint, you won't need the "IS NOT NULL" where clause).

Oracle merge constants into single table

In Oracle, given a simple data table:
create table data (
id VARCHAR2(255),
key VARCHAR2(255),
value VARCHAR2(511));
suppose I want to "insert or update" a value. I have something like:
merge into data using dual on
(id='someid' and key='testKey')
when matched then
update set value = 'someValue'
when not matched then
insert (id, key, value) values ('someid', 'testKey', 'someValue');
Is there a better way than this? This command seems to have the following drawbacks:
Every literal needs to be typed twice (or added twice via parameter setting)
The "using dual" syntax seems hacky
If this is the best way, is there any way around having to set each parameter twice in JDBC?
I don't consider using dual to be a hack. To get rid of binding/typing twice, I would do something like:
merge into data
using (
select
'someid' id,
'testKey' key,
'someValue' value
from
dual
) val on (
data.id=val.id
and data.key=val.key
)
when matched then
update set data.value = val.value
when not matched then
insert (id, key, value) values (val.id, val.key, val.value);
I would hide the MERGE inside a PL/SQL API and then call that via JDBC:
data_pkg.merge_data ('someid', 'testKey', 'someValue');
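A minimal sketch of such a package (the parameter names are assumptions; the merge itself reuses the statement shown above):
create or replace package data_pkg as
  procedure merge_data (p_id in varchar2, p_key in varchar2, p_value in varchar2);
end data_pkg;
/
create or replace package body data_pkg as
  procedure merge_data (p_id in varchar2, p_key in varchar2, p_value in varchar2) is
  begin
    merge into data d
    using (select p_id as id, p_key as key, p_value as value from dual) val
      on (d.id = val.id and d.key = val.key)
    when matched then
      update set d.value = val.value
    when not matched then
      insert (id, key, value) values (val.id, val.key, val.value);
  end merge_data;
end data_pkg;
/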
As an alternative to MERGE, the API could do:
begin
insert into data (...) values (...);
exception
when dup_val_on_index then
update data
set ...
where ...;
end;
I prefer to try the update before the insert to save having to check for an exception.
update data set ...=... where ...=...;
if sql%notfound then
insert into data (...) values (...);
end if;
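Applied to the data table from the question, that pattern might look like this (a sketch; the bind variable names are assumptions):
begin
  update data
     set value = :val
   where id = :id
     and key = :key;
  if sql%notfound then
    insert into data (id, key, value) values (:id, :key, :val);
  end if;
end;
/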
Even now that we have the MERGE statement, I still tend to do single-row upserts this way - it just seems a more natural syntax. Of course, MERGE really comes into its own when dealing with larger data sets.
