Change Default Hive result to some values - hadoop

I was trying to get a duplicate record count from a table, but for some partitions no data is available, so Hive prints only "OK" as the result.
Is it possible to replace this result with a value such as 0 or NULL?
I have already tried nvl, COALESCE and CASE, and it still shows only OK. The goal is just to check the duplicate count, so I need at least one value returned.
select col1, col2, nvl(count(*),0) AS DUPLICATE_ROW_COUNT, 'xyz' AS TABLE_NAME
from xyz
where data_dt='20170423'
group by col1,col2
having count(*) >1

It returns no rows on an empty dataset because you are using GROUP BY with a HAVING filter. GROUP BY produces one row per group, and with no input rows there are no groups, so nothing is returned and nvl never gets a value to replace. Without GROUP BY and HAVING the query returns 0:
select nvl(count(*),0) cnt, 'xyz' AS TABLE_NAME
from xyz
where data_dt='20170423'
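A minimal sketch of that difference, using Python's built-in sqlite3 module as a stand-in for Hive. This is an assumption made for the sake of a runnable example: SQLite follows the same rule for aggregates here, but it is not Hive. The table and column names mirror the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Empty table standing in for the empty partition.
conn.execute("CREATE TABLE xyz (col1 TEXT, col2 TEXT, data_dt TEXT)")

# Aggregate without GROUP BY: exactly one row, even on empty input.
no_group = conn.execute(
    "SELECT count(*) FROM xyz WHERE data_dt = '20170423'"
).fetchall()

# Aggregate with GROUP BY: one row per group, so no groups means no rows.
with_group = conn.execute(
    "SELECT col1, col2, count(*) FROM xyz WHERE data_dt = '20170423' "
    "GROUP BY col1, col2 HAVING count(*) > 1"
).fetchall()

print(no_group)    # [(0,)]
print(with_group)  # []
```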
As a solution, you can UNION ALL a single placeholder row that appears only when the dataset is empty:
select col1, col2, nvl(count(*),0) AS DUPLICATE_ROW_COUNT, 'xyz' AS TABLE_NAME
from xyz
where data_dt='20170423'
group by col1,col2
having count(*) >1
UNION ALL --returns 1 row on empty dataset
select col1, col2, DUPLICATE_ROW_COUNT, TABLE_NAME
from (select null col1, null col2, null AS DUPLICATE_ROW_COUNT, 'xyz' AS TABLE_NAME
)a --inner join returns no rows when the dataset is non-empty
inner join (
select count(*) cnt from --returns 0 on empty dataset
( --your original query
select col1, col2, nvl(count(*),0) AS DUPLICATE_ROW_COUNT, 'xyz' AS TABLE_NAME
from xyz
where data_dt='20170423'
group by col1,col2
having count(*) >1
)q --your original query
)s on s.cnt=0
It may also be possible to use a CTE (WITH) and WHERE NOT EXISTS instead of the inner join on your subquery; I didn't test that.
You can also run the query from the shell and test for an empty result:
dataset=$(hive -e "set hive.cli.print.header=false; [YOUR QUERY HERE]")
# test on empty dataset
if [[ -z "$dataset" ]] ; then
dataset=0
fi
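The untested CTE plus NOT EXISTS idea can be sketched as follows. Note the assumptions: this runs on SQLite through Python's sqlite3 module, not on Hive, so it only illustrates the shape of the query. The placeholder row uses 0 for the count, and the helper name duplicate_report is made up for this example.

```python
import sqlite3

DUPLICATES = """
    SELECT col1, col2, count(*) AS duplicate_row_count, 'xyz' AS table_name
    FROM xyz
    WHERE data_dt = '20170423'
    GROUP BY col1, col2
    HAVING count(*) > 1
"""

# The duplicate query plus one placeholder row that is emitted only
# when the duplicate query itself returns nothing.
WITH_FALLBACK = f"""
    WITH dups AS ({DUPLICATES})
    SELECT col1, col2, duplicate_row_count, table_name FROM dups
    UNION ALL
    SELECT NULL, NULL, 0, 'xyz'
    WHERE NOT EXISTS (SELECT 1 FROM dups)
"""

def duplicate_report(rows):
    """Load rows into a fresh in-memory table and run the fallback query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE xyz (col1 TEXT, col2 TEXT, data_dt TEXT)")
    conn.executemany("INSERT INTO xyz VALUES (?, ?, ?)", rows)
    return conn.execute(WITH_FALLBACK).fetchall()

print(duplicate_report([]))
# [(None, None, 0, 'xyz')]
print(duplicate_report([("a", "b", "20170423"), ("a", "b", "20170423")]))
# [('a', 'b', 2, 'xyz')]
```

Whether Hive accepts the same NOT EXISTS form depends on the Hive version, so treat this as a starting point rather than a drop-in query.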

Related

How to retrieve only columns which have at least one not null value in any row in Oracle

I have the table structure and data shown below:
https://ibb.co/mkGp67
I want a SQL query that retrieves data only for those columns that have at least one non-null value. In the case above, I want the data to come out as:
https://ibb.co/mz9967
That is, I don't need columns Col2, Col5 and Col6. Which columns contain only null values is not fixed.
Please let me know a SQL query that retrieves only the columns that have at least one non-null value, with data as above.
As far as I know, you will not be able to achieve this with a plain SQL query. One of the strong assumptions of SELECT statements is that the list of returned columns is static: it is defined in the query, not by the data. Even for PIVOT queries (available, as far as I know, since Oracle 11), the list of columns is defined in the query, because the list of values to be converted to columns has to be given explicitly.
What you are looking for is some kind of code, dynamically generating the query. This can be PL/SQL, returning cursor references or any application code.
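As a rough illustration of "application code generating the query", here is a sketch in Python against SQLite rather than Oracle. Everything in it is an assumption made for the example: the function name non_null_columns_query, the sample table my_table, and the use of SQLite's PRAGMA table_info to list columns (Oracle would use its data dictionary views instead).

```python
import sqlite3

def non_null_columns_query(conn, table):
    """Build a SELECT listing only columns with at least one non-NULL value.

    The table name is interpolated directly, so it must come from a
    trusted source.
    """
    # PRAGMA table_info is SQLite-specific; index 1 of each row is the name.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    # COUNT(col) counts only non-NULL values, so 0 means "all NULL".
    counts_sql = ", ".join(f"COUNT({c})" for c in columns)
    counts = conn.execute(f"SELECT {counts_sql} FROM {table}").fetchone()
    kept = [c for c, n in zip(columns, counts) if n > 0]
    if not kept:
        return None  # every column is entirely NULL (or the table is empty)
    return f"SELECT {', '.join(kept)} FROM {table}"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (col1 TEXT, col2 TEXT, col3 INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [("A", None, 10), ("B", None, None)],
)

query = non_null_columns_query(conn, "my_table")
print(query)                           # SELECT col1, col3 FROM my_table
print(conn.execute(query).fetchall())  # [('A', 10), ('B', None)]
```

The column list is decided by a first pass over the data, and only then is the real query built and run, which is exactly the two-step shape a single static SELECT cannot express.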
Edit:
What you can do with a query is get a clear picture of which columns contain NULLs and which do not. It could look something like this:
SELECT CASE
WHEN COUNT(*) = 0 THEN 'no rows'
WHEN COUNT(Col1) = 0 THEN 'all NULLs'
WHEN COUNT(Col1) = COUNT(*) THEN 'no NULLs'
ELSE 'some NULLs'
END Col1NullStatus,
CASE
WHEN COUNT(*) = 0 THEN 'no rows'
WHEN COUNT(Col2) = 0 THEN 'all NULLs'
WHEN COUNT(Col2) = COUNT(*) THEN 'no NULLs'
ELSE 'some NULLs'
END Col2NullStatus,
CASE
WHEN COUNT(*) = 0 THEN 'no rows'
WHEN COUNT(Col3) = 0 THEN 'all NULLs'
WHEN COUNT(Col3) = COUNT(*) THEN 'no NULLs'
ELSE 'some NULLs'
END Col3NullStatus,
CASE
WHEN COUNT(*) = 0 THEN 'no rows'
WHEN COUNT(Col4) = 0 THEN 'all NULLs'
WHEN COUNT(Col4) = COUNT(*) THEN 'no NULLs'
ELSE 'some NULLs'
END Col4NullStatus,
CASE
WHEN COUNT(*) = 0 THEN 'no rows'
WHEN COUNT(Col5) = 0 THEN 'all NULLs'
WHEN COUNT(Col5) = COUNT(*) THEN 'no NULLs'
ELSE 'some NULLs'
END Col5NullStatus,
CASE
WHEN COUNT(*) = 0 THEN 'no rows'
WHEN COUNT(Col6) = 0 THEN 'all NULLs'
WHEN COUNT(Col6) = COUNT(*) THEN 'no NULLs'
ELSE 'some NULLs'
END Col6NullStatus
FROM myTable
See SQL Fiddle for the above.
Edit 2:
And the output of this query would look something like this:
Col1NullStatus | Col2NullStatus | Col3NullStatus | Col4NullStatus | Col5NullStatus | Col6NullStatus
---------------+----------------+----------------+----------------+----------------+----------------
no NULLs | all NULLs | some NULLs | no NULLs | all NULLs | all NULLs
This is also the format you could use to post your input data and expected results.
Since you give no formal table structure, and you seem to be mixing up numbers and chars, I will do my best to write a query that at least produces the results you want.
create table foo (
col1 varchar(10),
col2 varchar(10),
col3 varchar(10),
col4 varchar(10),
col5 varchar(10),
col6 varchar(10)
);
-- "CASE col WHEN null" never matches, so test with IS NULL instead
select
CASE WHEN col1 IS NULL THEN 'null' ELSE col1 END AS col1,
CASE WHEN col2 IS NULL THEN 'null' ELSE col2 END AS col2,
CASE WHEN col3 IS NULL THEN 'null' ELSE col3 END AS col3,
CASE WHEN col4 IS NULL THEN 'null' ELSE col4 END AS col4,
CASE WHEN col5 IS NULL THEN 'null' ELSE col5 END AS col5,
CASE WHEN col6 IS NULL THEN 'null' ELSE col6 END AS col6
from foo;
With the query below I am able to get the non-null columns (col1, col3 and col4) at row level.
Query:
select 'col1' as "Name",col1 from temp
where exists (select 1
from temp
group by to_char(col1)
having (count(to_char(col1)))> 0)
union all
select 'col2' as "Name",to_char(col2) from temp
where exists (select 1
from temp
group by to_char(col2)
having (count(to_char(col2)))> 0)
union all
select 'col3' as "Name" , to_char(col3) from temp
where exists (select 1
from temp
group by to_char(col3)
having (count(to_char(col3)))> 0)
union all
select 'col4'as "Name" , to_char(col4) from temp
where exists (select 1
from temp
group by to_char(col4)
having (count(to_char(col4)))> 0)
union all
select 'col5' as "Name" , to_char(col5) from temp
where exists (select 1
from temp
group by to_char(col5)
having (count(to_char(col5)))> 0)
union all
select 'col6' as "Name" , to_char(col6) from temp
where exists (select 1
from temp
group by to_char(col6)
having (count(to_char(col6)))> 0)
output:
col1 A
col1 B
col1 C
col1 D
col3 10
col3 20
col3 -
col3 10
col4 12
col4 23
col4 34
col4 43
I tried to turn these output rows into columns, but I couldn't do it in a single query. I hope this is helpful.
I would usually do this in three steps.
Firstly, make sure that the table statistics are up to date. Check if last_analyzed is later than the last change to the table.
SELECT last_analyzed FROM user_tables WHERE table_name = 'MYTABLE';
If in doubt, update the statistics with
BEGIN dbms_stats.gather_table_stats('MYSCHEMA','MYTABLE'); END;
/
Now, the view user_tab_columns has a column num_nulls. This is the number of rows where this column is NULL. If the value is the same as the number of rows in the table, the column is NULL in every row. This can be used to let Oracle generate the required SQL:
WITH
qtab AS (SELECT owner, table_name, num_rows
FROM all_tables
WHERE owner='SCOTT' -- change to your schema
AND table_name='EMPLOYEES' -- change to your table name
),
qcol AS (SELECT owner, table_name, column_name, column_id
FROM qtab t
JOIN all_tab_columns c USING (owner, table_name)
WHERE c.nullable = 'N' -- protected by NOT NULL constraint
OR c.num_nulls = 0 -- never NULL
OR c.num_nulls < t.num_rows -- at least 1 row is NOT NULL
)
SELECT 'SELECT '||LISTAGG(column_name,',') WITHIN GROUP (ORDER BY column_id)||
' FROM '||owner||'.'||table_name||';' AS my_query
FROM qcol
GROUP BY owner, table_name;
This will output a query like
SELECT col1, col3, col4, col5 FROM myschema.mytable;
This query can now be executed to show the column values.

Hive Getting only max occurrence of a value

I have a Hive table which has two columns. I want to get the value that occurs the maximum number of times.
For example, in my table below an a value occurs twice and a c value only once. Here the a value is dominant, so I want only an a value, as shown in the output.
col1 col2
a a_value1
a a_value2
a c_value3
b b_value1
OUTPUT:
col1 col2
a a_value1
b b_value1
You are looking for what statisticians call the mode. A pretty simple method is to use aggregation with a window function:
select col1, col2
from (select col1, col2, count(*) as cnt,
row_number() over (partition by col1 order by count(*) desc) as seqnum
from t
group by col1, col2
) t
where seqnum = 1;
The above query will return one value for each col1, even if there are ties. If you want all the values in the event of ties, then use rank() or dense_rank().
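To make the idea concrete, here is a sketch run on SQLite through Python's sqlite3 module, which is an assumption made so the example is runnable (it needs SQLite 3.25 or later for window functions, and it is not Hive). The sample rows are invented for the illustration, and the counting and the ranking are split into two nested subqueries to keep each step visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("a", "x"), ("a", "x"), ("a", "y"), ("b", "z")],
)

MODE_PER_GROUP = """
    SELECT col1, col2
    FROM (SELECT col1, col2,
                 row_number() OVER (PARTITION BY col1
                                    ORDER BY cnt DESC) AS seqnum
          FROM (SELECT col1, col2, count(*) AS cnt   -- step 1: count each pair
                FROM t
                GROUP BY col1, col2) AS counted
         ) AS ranked                                 -- step 2: rank within col1
    WHERE seqnum = 1
    ORDER BY col1
"""

rows = conn.execute(MODE_PER_GROUP).fetchall()
print(rows)  # [('a', 'x'), ('b', 'z')]
```

The sample data deliberately has no ties; with ties, row_number() picks one of the tied values arbitrarily.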

Oracle, get all columns

I got this statement:
select count(*),
article_no_external,
article_group_id
from tbl_erp_article
where article_no_external != ' '
group by article_no_external, article_group_id
having count(*) >1
I want to group by group_id and external_no. This works fine and I get 128 records, but I would like to see all columns, not only those two. I tried to add them to the select, but then I get an error from the group by. I need four more columns because I want to use the selected data to create a new record.
select article_no_external, article_group_id, col2, col3, col4, col5
from (
select article_no_external, article_group_id, col2, col3, col4, col5,
count(*) over (partition by article_no_external, article_group_id) as cnt
from tbl_erp_article
where article_no_external <> ' '
)
where cnt > 1;
If you want to find non-empty varchar columns, remember that Oracle doesn't have an empty string: '' is converted to NULL during inserts and updates. So you probably want where article_no_external IS NOT NULL.
You can't get all column values when you aggregate your fields for count, sum, etc.
This is not exactly the same result, but it may help you.
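A small sketch of the windowed-count approach, run on SQLite through Python's sqlite3 module as a stand-in for Oracle (an assumption for the sake of a runnable example; it needs SQLite 3.25 or later). The table is cut down to one extra column, col2, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_erp_article "
    "(article_no_external TEXT, article_group_id INTEGER, col2 TEXT)"
)
conn.executemany(
    "INSERT INTO tbl_erp_article VALUES (?, ?, ?)",
    [
        ("X1", 1, "first"),
        ("X1", 1, "second"),   # duplicate of the (X1, 1) key
        ("X2", 1, "only"),     # unique key, must not be returned
        (" ", 1, "blank"),
        (" ", 1, "blank too"), # filtered out by the <> ' ' condition
    ],
)

# count(*) OVER keeps every row and attaches the size of its group,
# so no GROUP BY is needed and all columns stay available.
rows = conn.execute("""
    SELECT article_no_external, article_group_id, col2
    FROM (SELECT a.*,
                 count(*) OVER (PARTITION BY article_no_external,
                                             article_group_id) AS cnt
          FROM tbl_erp_article AS a
          WHERE article_no_external <> ' ') AS counted
    WHERE cnt > 1
    ORDER BY col2
""").fetchall()

print(rows)  # [('X1', 1, 'first'), ('X1', 1, 'second')]
```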
select *
from tbl_erp_article
where article_no_external != ' ' and
(article_no_external, article_group_id) in (
select article_no_external, article_group_id
from tbl_erp_article
where article_no_external != ' '
group by article_no_external, article_group_id
having count(*) >1)

Ruby regex - extract words

I want to extract table names from an SQL query.
SELECT col1, col2, count(1) as count_all FROM tbl1, tbl2 where condition order by column
I want the result ["tbl1", "tbl2"]
The query will not necessarily contain multiple tables. With a single table, the query will be
SELECT col1, col2, count(1) as count_all FROM tbl1 where condition order by column
And expected result ["tbl1"]
Thanks in advance!
Note that this can potentially match stuff inside a SQL string and therefore is not perfect.
tests = [
"SELECT col1, col2, count(1) as count_all FROM tbl1, tbl2 where condition order by column",
"SELECT col1, col2, count(1) as count_all FROM tbl1 where condition order by column",
"SELECT col1, col2, count(1) as count_all FROM tbl1",
]
tests.map { |str| str.match(/\s*from\s*([a-z_0-9]+(?:,\s*[a-z_0-9]+)*)\b/i); $1 }
#=> ["tbl1, tbl2", "tbl1", "tbl1"]
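The same idea sketched in Python, this time splitting the captured list so each query yields an array of names as the question asks. The function name table_names is made up for this example, and the same caveat applies: the pattern can match text inside a SQL string literal and does not handle joins, aliases or subqueries.

```python
import re

# "from", whitespace, then one or more comma-separated identifiers.
FROM_CLAUSE = re.compile(
    r"\bfrom\s+([a-z_0-9]+(?:\s*,\s*[a-z_0-9]+)*)", re.IGNORECASE
)

def table_names(sql):
    """Return the table names listed after the first FROM, or [] if none."""
    match = FROM_CLAUSE.search(sql)
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",")]

queries = [
    "SELECT col1, col2, count(1) as count_all FROM tbl1, tbl2 where condition order by column",
    "SELECT col1, col2, count(1) as count_all FROM tbl1 where condition order by column",
    "SELECT col1, col2, count(1) as count_all FROM tbl1",
]

print([table_names(q) for q in queries])
# [['tbl1', 'tbl2'], ['tbl1'], ['tbl1']]
```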

Getting Error in query

The following statement returns the error "Missing )":
update tablename set (col1,col2,col3) = (select col1,col2,col3 from tableName2 order by tableName2.col4)
The query works fine if I remove the order by clause.
ORDER BY is not allowed in a subquery within an UPDATE. So you get the error "Missing )" because the parser expects the subquery to end at the point that you have ORDER BY.
What is the ORDER BY intended to do?
What you probably have in mind is something like:
UPDATE TableName
SET (Col1, Col2, Col3) = (SELECT T2.Col1, T2.Col2, T2.Col3
FROM TableName2 AS T2
WHERE TableName.Col4 = T2.Col4
)
WHERE EXISTS(SELECT * FROM TableName2 AS T2 WHERE TableName.Col4 = T2.Col4);
This clumsy looking operation:
Grabs rows from TableName2 that match TableName on the value in Col4 and updates TableName with the values from the corresponding columns.
Ensures that only rows in TableName with a corresponding row in TableName2 are altered; if you drop the WHERE clause from the UPDATE, you replace the values in Col1, Col2, and Col3 with nulls if there are rows in TableName without a matching entry in TableName2.
Some DBMS also support an update-join notation to reduce the ghastliness of this notation.
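A runnable sketch of the correlated update, using SQLite through Python's sqlite3 module (an assumption for illustration: SQLite accepts this row-value SET syntax from version 3.15, and other DBMSs may differ in details such as alias syntax). The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableName  (Col1 TEXT, Col2 TEXT, Col3 TEXT, Col4 INTEGER);
    CREATE TABLE TableName2 (Col1 TEXT, Col2 TEXT, Col3 TEXT, Col4 INTEGER);
    INSERT INTO TableName  VALUES ('old', 'old', 'old', 1),
                                  ('keep', 'keep', 'keep', 2);
    INSERT INTO TableName2 VALUES ('new1', 'new2', 'new3', 1);
""")

# Correlated subquery supplies the new values; the EXISTS guard restricts
# the update to rows that actually have a match in TableName2.
conn.execute("""
    UPDATE TableName
    SET (Col1, Col2, Col3) = (SELECT T2.Col1, T2.Col2, T2.Col3
                              FROM TableName2 AS T2
                              WHERE TableName.Col4 = T2.Col4)
    WHERE EXISTS (SELECT * FROM TableName2 AS T2
                  WHERE TableName.Col4 = T2.Col4)
""")

rows = conn.execute(
    "SELECT Col1, Col2, Col3, Col4 FROM TableName ORDER BY Col4"
).fetchall()
print(rows)
# [('new1', 'new2', 'new3', 1), ('keep', 'keep', 'keep', 2)]
```

The row with Col4 = 2 has no counterpart in TableName2, and the guard leaves it untouched.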
