Counting in Hadoop Hive

I want to count values, similar to a map where the key would be the value in the Hive table column and the corresponding value would be its count.
For example, for the table below:
+-------+-------+
| Col 1 | Col 2 |
+-------+-------+
| Key1 | Val1 |
| Key1 | Val2 |
| Key2 | Val1 |
+-------+-------+
So the Hive query should return something like:
Key1=2
Key2=1

It looks like you are looking for a simple group by.
SELECT Col1, COUNT(*) FROM Table GROUP BY Col1
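If you literally want the Key=count output shown in the question, you can build the string in the same query; a minimal sketch, assuming Col1 is a string column (cast it first if it is not):
SELECT concat(Col1, '=', cast(COUNT(*) AS string)) AS pair
FROM Table
GROUP BY Col1;
With the sample data above this returns Key1=2 and Key2=1.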

Related

Oracle index that would optimize querying for rows whose ids are not inside a json array column

If I had a table with two columns
+----+-------------+
| Id | json_data   |
+----+-------------+
| 1  | [1, 10, 11] |
| 2  | []          |
+----+-------------+
I could easily query the results using
SELECT M1.Id
FROM MyTable M1
WHERE NOT EXISTS (
    SELECT 1
    FROM MyTable M2,
         JSON_TABLE(M2.JsonData, '$[*]'
             ERROR ON ERROR NULL ON EMPTY NULL ON MISMATCH
             COLUMNS(Id NVARCHAR2(20) PATH '$')) JT
    WHERE JT.Id = M1.Id)
Now how do I index this column so the query is not doing a full table scan?
MULTIVALUE indexes are used (I believe) only for JSON_EXISTS queries like this one:
SELECT Id
FROM MyTable WHERE NOT JSON_EXISTS(JsonData, '$?(# == 1)')
but I can't use this function with non-constant values such as M1.Id.
A MULTIVALUE index is available only in 21c, the array needs to be a field of an object, and the column must be of the JSON data type; it does not work with a CLOB carrying an IS JSON check constraint.
https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:9545758700346721260
-- 21c: JSON column type, array stored as a field of an object
create table t_test_ixjs (
    id        number(10,0),
    json_data JSON
);

insert into t_test_ixjs(id, json_data) values (1, '{ "a" : [1, 10, 11] }');
insert into t_test_ixjs(id, json_data) values (2, '{ "a" : [] }');

-- multivalue index over the numbers in the "a" array
create multivalue index ix_test_json_data on t_test_ixjs t ( t.json_data.a.number() );

-- the value to look for can be supplied as a bind via PASSING
with vals(d) as (
    select 1 from dual
)
SELECT *
FROM vals v,
     t_test_ixjs
WHERE JSON_EXISTS(json_data, '$.a?(# == $d)' PASSING v.d AS "d");
Plan hash value: 1205791918
---------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 20513 | 2 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T_TEST_IXJS | 1 | 20513 | 2 (0)| 00:00:01 |
| 2 | HASH UNIQUE | | 1 | 20513 | | |
|* 3 | INDEX RANGE SCAN (MULTI VALUE) | IX_TEST_JSON_DATA | 1 | | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------
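To tie this back to the original NOT EXISTS query, the same PASSING trick can be used with a correlated column; a sketch against the test table above (whether the optimizer actually picks the multivalue index for this anti-join shape would have to be verified from the real plan):
SELECT m1.id
FROM t_test_ixjs m1
WHERE NOT EXISTS (
    SELECT 1
    FROM t_test_ixjs m2
    WHERE JSON_EXISTS(m2.json_data, '$.a?(# == $v)' PASSING m1.id AS "v")
);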

Insert data to table from another table containing null values and replace null values with the original table 1 values

I want to match the first column of both tables and insert table 2's values into table 1. But if a table 2 value is null, leave the table 1 value as it is. I am using Hive to do this. Please help.
You need to use coalesce to pick the non-null value when populating column b, and a case statement to decide how to populate column c.
Example:
hive> select t1.a,
             coalesce(t2.y, t1.b) as b,
             case when t2.y is null then t1.c
                  else t2.z
             end as c
      from table1 t1
      left join table2 t2 on t1.a = t2.x;
+----+-----+----+--+
| a | b | c |
+----+-----+----+--+
| a | xx | 5 |
| b | bb | 2 |
| c | zz | 7 |
| d | dd | 4 |
+----+-----+----+--+
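If the merged rows then need to be stored rather than just selected, one option is to materialize them into a table; a minimal sketch, assuming a new table name (merged is hypothetical) and the same columns as above:
hive> create table merged as
      select t1.a,
             coalesce(t2.y, t1.b) as b,
             case when t2.y is null then t1.c
                  else t2.z
             end as c
      from table1 t1
      left join table2 t2 on t1.a = t2.x;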

Column to comma separated value in Hive

It's been asked and answered for SQL (Convert multiple rows into one with comma as separator); would any of the approaches mentioned there work in Hive, e.g. to go from this:
+------+------+
| Col1 | Col2 |
+------+------+
| a | 1 |
| a | 5 |
| a | 6 |
| b | 2 |
| b | 6 |
+------+------+
to this:
+------+-------+
| Col1 | Col2 |
+------+-------+
| a | 1,5,6 |
| b | 2,6 |
+------+-------+
The aggregate function collect_set can achieve what you are trying to get; see the Hive UDAF documentation. So you can write a query like:
SELECT Col1, collect_set(Col2)
FROM your_table
GROUP BY Col1;
However, there is one striking difference between MySQL's GROUP_CONCAT and Hive's collect_set: GROUP_CONCAT retains duplicates in the result, while collect_set removes any duplicates from the array. In the example you showed there are no repeating values of Col2 within a group, so you can go ahead and use it.
And there is collect_list, which keeps the full list (with duplicates).
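If duplicates matter and you also want the comma-separated string, the same pattern works with collect_list; a sketch, assuming Col2 is numeric and therefore cast to string first (concat_ws expects an array of strings):
SELECT Col1, concat_ws(',', collect_list(cast(Col2 as string))) as col2
FROM your_table
GROUP BY Col1;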
Try this
SELECT Col1, concat_ws(',', collect_set(Col2)) as col2
FROM your_table
GROUP BY Col1;
See the apache.org documentation.

Joining tables with same column names - ORACLE

I am using Oracle.
I am currently working on 2 tables which both have the same column names. Is there any way in which I can combine the 2 tables together as they are?
Simple example to show what I mean:
TABLE 1:
| COLUMN 1 | COLUMN 2 | COLUMN 3 |
----------------------------------------
| a | 1 | w |
| b | 2 | x |
TABLE 2:
| COLUMN 1 | COLUMN 2 | COLUMN 3 |
----------------------------------------
| c | 3 | y |
| d | 4 | z |
RESULT THAT I WANT:
| COLUMN 1 | COLUMN 2 | COLUMN 3 |
----------------------------------------
| a | 1 | w |
| b | 2 | x |
| c | 3 | y |
| d | 4 | z |
Any help would be greatly appreciated. Thank you in advance!
You can use the union set operator to get the result of two queries as a single result set:
select column1, column2, column3
from table1
union all
select column1, column2, column3
from table2
union on its own implicitly removes duplicates; union all preserves them.
The column names don't need to be the same, you just need the same number of columns with the same datatypes, in the same order.
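If you then want the combined rows in a defined order, a single ORDER BY at the end applies to the whole combined result:
select column1, column2, column3
from table1
union all
select column1, column2, column3
from table2
order by column1;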
(This is not what is usually meant by a join, so the title of your question is a bit misleading; I'm basing this on the example data and output you showed.)

How to select id, first_not_null(value1), first_not_null(value2).. on Postgresql

I have a table like this:
+--+---------+---------+
|id|str_value|int_value|
+--+---------+---------+
| 1| 'abc' | |
| 1| | 1 |
| 2| 'abcd' | |
| 2| | 2 |
+--+---------+---------+
I need to get this:
+--+---------+---------+
|id|str_value|int_value|
+--+---------+---------+
| 1| 'abc' | 1 |
| 2| 'abcd' | 2 |
+--+---------+---------+
It seems to me that I need something like:
select id, first_not_null(str_value), first_not_null(int_value)
from table
group by id
Is there any acceptable way to do this? I use Postgresql 9.0.1.
Update: this should work with uuid types as well
You should look at http://www.postgresql.org/docs/8.1/static/functions-aggregate.html for aggregate functions.
I guess max should do the job.
EDIT: Working example
select id, max(col1), max(col2) from (
select 1 as id, null as col1, 'test' as col2
union
select 1 as id ,'blah' as col1, null as col2
) x group by id
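Regarding the update about uuid: if max is not available for the column's type (as with uuid on older PostgreSQL versions), a type-agnostic alternative is to aggregate into an array with NULLs sorted last and take the first element; a sketch, assuming PostgreSQL 9.0+ and a table named my_table (hypothetical name):
select id,
       (array_agg(str_value order by str_value nulls last))[1] as str_value,
       (array_agg(int_value order by int_value nulls last))[1] as int_value
from my_table
group by id;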
