Ruby regex - extract words

I want to extract the table names from an SQL query such as:
SELECT col1, col2, count(1) as count_all FROM tbl1, tbl2 where condition order by column
I want the result ["tbl1", "tbl2"].
The query will not always contain multiple tables. With a single table the query is:
SELECT col1, col2, count(1) as count_all FROM tbl1 where condition order by column
and the expected result is ["tbl1"].
Thanks in advance!

Note that this can potentially match stuff inside a SQL string and therefore is not perfect.
tests = [
"SELECT col1, col2, count(1) as count_all FROM tbl1, tbl2 where condition order by column",
"SELECT col1, col2, count(1) as count_all FROM tbl1 where condition order by column",
"SELECT col1, col2, count(1) as count_all FROM tbl1",
]
# $1 holds the first capture group of the most recent match
tests.map { |str| str.match(/\bfrom\s+([a-z_0-9]+(?:,\s*[a-z_0-9]+)*)\b/i); $1 }
#=> ["tbl1, tbl2", "tbl1", "tbl1"]
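The question asks for an array of names rather than one comma-separated string. A minimal sketch of that last step, assuming a hypothetical helper named extract_tables (not part of the answer above) and the same limitation about matching inside SQL strings:

```ruby
# Hypothetical helper (not from the answer above): returns the table names
# found in the FROM clause as an array of strings.
def extract_tables(sql)
  # Capture the comma-separated identifier list that follows FROM.
  list = sql[/\bfrom\s+([a-z_0-9]+(?:\s*,\s*[a-z_0-9]+)*)/i, 1]
  # No FROM clause found: return an empty array instead of failing.
  return [] if list.nil?
  list.split(/\s*,\s*/)
end

queries = [
  "SELECT col1, col2, count(1) as count_all FROM tbl1, tbl2 where condition order by column",
  "SELECT col1, col2, count(1) as count_all FROM tbl1 where condition order by column",
]
p queries.map { |q| extract_tables(q) }
#=> [["tbl1", "tbl2"], ["tbl1"]]
```

Like the regex above, this only handles plain identifiers; aliases, schema-qualified names and joins would need more work.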

Related

Change Default Hive result to some values

I was trying to get the duplicate record count from a table, but for some partitions no data is available, so Hive prints only "OK" as the result.
Is it possible to replace this result with a value such as 0 or NULL?
I have tried nvl, COALESCE and CASE, but it still shows only OK. The goal is just to check the duplicate count, so I need at least one value returned.
select col1, col2, nvl(count(*),0) AS DUPLICATE_ROW_COUNT, 'xyz' AS TABLE_NAME
from xyz
where data_dt='20170423'
group by col1,col2
having count(*) >1
It returns no rows on an empty dataset because you are using GROUP BY with a HAVING filter. GROUP BY has nothing to group, so it produces no rows. Without GROUP BY and HAVING, the query returns 0:
select nvl(count(*),0) cnt, 'xyz' AS TABLE_NAME
from xyz
where data_dt='20170423'
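The difference between the two queries can be modeled outside the database. The following is only a Ruby sketch over made-up in-memory rows (the data and the method names duplicate_groups and total_count are assumptions for illustration), not Hive code:

```ruby
# Toy stand-in for the rows of xyz that pass the WHERE filter; the
# column values are made up for illustration.
rows_with_data = [
  { col1: "a", col2: 1 },
  { col1: "a", col2: 1 },
  { col1: "b", col2: 2 },
]
rows_empty_partition = []

# GROUP BY col1, col2 HAVING count(*) > 1: one output row per surviving group.
def duplicate_groups(rows)
  rows.group_by { |r| [r[:col1], r[:col2]] }
      .select { |_key, group| group.size > 1 }
      .map { |key, group| key + [group.size] }
end

# Plain count(*) without GROUP BY: always exactly one output row.
def total_count(rows)
  [rows.size]
end

p duplicate_groups(rows_with_data)        #=> [["a", 1, 2]]
p duplicate_groups(rows_empty_partition)  #=> []
p total_count(rows_empty_partition)       #=> [0]
```

With no input rows there are no groups, so the grouped version has nothing to emit, while the ungrouped count still yields a single row.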
As a solution, you can UNION ALL a null row that appears only when the dataset is empty:
select col1, col2, nvl(count(*),0) AS DUPLICATE_ROW_COUNT, 'xyz' AS TABLE_NAME
from xyz
where data_dt='20170423'
group by col1,col2
having count(*) >1
UNION ALL --returns 1 row on empty dataset
select col1, col2, DUPLICATE_ROW_COUNT, TABLE_NAME
from (select null col1, null col2, null AS DUPLICATE_ROW_COUNT, 'xyz' AS TABLE_NAME
)a --the inner join returns no rows when the dataset is non-empty
inner join (
select count(*) cnt from --returns 0 on empty dataset
( --your original query
select col1, col2, nvl(count(*),0) AS DUPLICATE_ROW_COUNT, 'xyz' AS TABLE_NAME
from xyz
where data_dt='20170423'
group by col1,col2
having count(*) >1
)s --your original query
)c on c.cnt=0
It may also be possible to use a CTE (WITH) and WHERE NOT EXISTS instead of the inner join for your subquery; I didn't test it.
You can also use the shell to capture the result and check it for an empty value:
dataset=$(hive -e "set hive.cli.print.header=false; [YOUR QUERY HERE]")
# test on empty dataset
if [[ -z "$dataset" ]] ; then
dataset=0
fi

Distinct in XMLAGG function in oracle sql

I have a problem avoiding duplicates with the XMLAGG function.
I have a table with multiple records, where one column of each record contains repeated data.
I use the XMLAGG function in the following SQL:
select col1, col2, XMLAGG(XMLELEMENT(E, colname || ',')).EXTRACT('//text()')
from table
group by col1, col2
I get the following output:
col1 col2 col3
hareesh apartment residential, commercial, residential, residential
But I need the following output:
col3 : residential, commercial
Can anyone help me?
Try using a subquery to remove duplicates:
SELECT col1, col2, XMLAGG(XMLELEMENT(E, colname || ',')).EXTRACT('//text()')
FROM (SELECT DISTINCT col1, col2, colname FROM table)
GROUP BY col1, col2
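The idea is to deduplicate first and aggregate second. As an illustration only, here is the same two-step shape in Ruby over made-up in-memory rows that mirror the sample output in the question; it is not Oracle code:

```ruby
# Made-up rows mirroring the sample output in the question.
rows = [
  { col1: "hareesh", col2: "apartment", colname: "residential" },
  { col1: "hareesh", col2: "apartment", colname: "commercial" },
  { col1: "hareesh", col2: "apartment", colname: "residential" },
  { col1: "hareesh", col2: "apartment", colname: "residential" },
]

# Inner SELECT DISTINCT: drop repeated (col1, col2, colname) rows first.
distinct_rows = rows.uniq

# Outer GROUP BY col1, col2 with string aggregation of colname.
result = distinct_rows
  .group_by { |r| [r[:col1], r[:col2]] }
  .map { |(col1, col2), group| [col1, col2, group.map { |r| r[:colname] }.join(", ")] }

p result
#=> [["hareesh", "apartment", "residential, commercial"]]
```

Ruby keeps the first-seen order here; in SQL the order of the aggregated values is not guaranteed unless it is specified explicitly.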

Oracle Query Subselect with order by

I normally use MS SQL and am a total rookie with Oracle.
I get an oracle driver problem when I use the ORDER BY statement in my subquery.
Example (my real statement is much more complex but I doubt it matters to my problem - I can post it if needed):
SELECT col1
, col2
, (SELECT colsub FROM subtbl WHERE idsub = tbl.id AND ROWNUM=1 ORDER BY coldate) col3
FROM tbl
If I use such a construct I get an ODBC driver error: ORA-00907: Right bracket is missing (translated from German, so "bracket" might be a different word :)).
If I remove the ORDER BY coldate everything works fine. I couldn't find any reason why, so what am I doing wrong?
It doesn't make any sense to write the ROWNUM and the ORDER BY this way since the ORDER BY is evaluated after the WHERE clause, meaning that it has no effect in this case. An example is given in this question.
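The evaluation order described above can be modeled in Ruby. This is only a sketch over made-up rows (the values of colsub and coldate are assumptions), and it illustrates the ordering issue only, not the syntax error itself:

```ruby
# Made-up subtbl rows in arbitrary storage order.
subtbl = [
  { colsub: "late",  coldate: "2017-03-01" },
  { colsub: "early", coldate: "2017-01-01" },
  { colsub: "mid",   coldate: "2017-02-01" },
]

# WHERE ROWNUM = 1 ... ORDER BY coldate: the filter keeps an arbitrary
# first row, and the sort then has only that one row to order.
filter_then_sort = subtbl.first(1).sort_by { |r| r[:coldate] }

# Sort in an inner query, then apply ROWNUM = 1 outside it.
sort_then_filter = subtbl.sort_by { |r| r[:coldate] }.first(1)

p filter_then_sort.map { |r| r[:colsub] }  #=> ["late"]
p sort_then_filter.map { |r| r[:colsub] }  #=> ["early"]
```

Only the second form reliably returns the row with the earliest coldate.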
This also gets a little more complicated because it is hard to join a sub-query back to the main query if it is nested too deeply.
The query below won't necessarily work because you can't join between tbl and subtbl in this way.
SELECT
col1,
col2,
(
select colsub
from (
SELECT colsub
FROM subtbl
WHERE idsub = tbl.id
order by coldate
)
where rownum = 1
) as col3
FROM tbl
So you'll need to use some sort of analytic function as shown in the example below:
SELECT
col1,
col2,
(SELECT max(colsub) keep (dense_rank first order by coldate) as colsub
FROM subtbl
WHERE idsub = tbl.id
group by idsub
) col3
FROM tbl
The FIRST analytic function is more complicated than it needs to be but it will get the job done.
Another option would be to use the ROW_NUMBER analytic function which would also solve the problem.
SELECT
col1,
col2,
(select colsub
from (
SELECT
idsub,
row_number() over (partition by idsub order by coldate) as rown,
colsub
FROM subtbl a
) a
WHERE a.idsub = tbl.id
and a.rown = 1
) col3
FROM tbl
What you are doing wrong is clear. You are using an order by in a sub-query. It does not make any sense using an order by in a sub-query so why would you want to do that?
Also you are using an order by on a sub-query that always returns 1 row. That also does not make any sense.
If you want the query result to be sorted use an order by at the highest level.
try:
select
col1,
col2,
colsub
from(
select
col1 ,
col2 ,
colsub ,
coldate,
max(coldate) over (partition by st.idsub) max_coldate
from
tbl t,
subtbl st
where
st.idsub = t.id)
where
coldate = max_coldate

Replace selfjoin with analytic functions

How do I go about replacing the following self join using analytics:
SELECT
t1.col1 col1,
t1.col2 col2,
SUM((extract(hour FROM (t1.times_stamp - t2.times_stamp)) * 3600 + extract(minute FROM ( t1.times_stamp - t2.times_stamp)) * 60 + extract(second FROM ( t1.times_stamp - t2.times_stamp)) ) ) div,
COUNT(*) tot_count
FROM tab1 t1,
tab1 t2
WHERE t2.col1 = t1.col1
AND t2.col2 = t1.col2
AND t2.col3 = t1.sequence_num
AND t2.times_stamp < t1.times_stamp
AND t2.col4 = 3
AND t1.col4 = 4
AND t2.col5 NOT IN(103,123)
AND t1.col5 != 549
GROUP BY t1.col1, t1.col2
I'm pretty sure you won't be able to replace the self-join with analytics because you are using inter-row operations (t1.times_stamp - t2.times_stamp). Analytics can only access the values of the current row and the value of aggregate functions over a subset of rows (windowing clause).
See this article from Tom Kyte and this paper for further analysis of the limitations of analytics.
It almost looks like you could eliminate the self join on t2 and replace
t1.times_stamp - t2.times_stamp
with something like
t1.times_stamp - lag(t1.times_stamp) over (partition by col1, col2 order by times_stamp)
The different filters on t1 and t2 on col4 and col5 are what prevent you from doing this.
Analytic functions are applied after the where / group by on the main query, so you'd need to have a single filter on t1 in order to use lag/lead to specify following or preceding rows in a sequence.
Also, you'd need to push the sum/group by to an outer query to aggregate after the analytic function:
select col1, col2, sum(timestamp_diff) from (
select col1, col2, timestamp - lag(timestamp) over(.....) as timestamp_diff
where ....
) group by col1, col2
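The two-level shape above (compute a per-row difference with lag, then aggregate in an outer query) can be sketched in Ruby. The rows and the field name ts are made up for illustration, and the differing filters on col4 and col5 from the question are deliberately left out:

```ruby
# Made-up rows; ts is seconds since an arbitrary start.
rows = [
  { col1: "a", col2: "x", ts: 10 },
  { col1: "a", col2: "x", ts: 25 },
  { col1: "a", col2: "x", ts: 31 },
  { col1: "b", col2: "y", ts: 5 },
  { col1: "b", col2: "y", ts: 9 },
]

# Inner query: ts - lag(ts) over (partition by col1, col2 order by ts).
# The first row of each partition has no predecessor, so its diff is nil.
with_diff = rows.group_by { |r| [r[:col1], r[:col2]] }.flat_map { |_key, part|
  sorted = part.sort_by { |r| r[:ts] }
  previous = [nil] + sorted[0...-1]
  sorted.zip(previous).map { |cur, prev| cur.merge(diff: prev && cur[:ts] - prev[:ts]) }
}

# Outer query: sum(diff) group by col1, col2 (SUM skips NULLs).
totals = with_diff
  .group_by { |r| [r[:col1], r[:col2]] }
  .map { |(c1, c2), part| [c1, c2, part.map { |r| r[:diff] }.compact.sum] }

p totals
#=> [["a", "x", 21], ["b", "y", 4]]
```

The aggregation has to happen after the per-row differences exist, which is why the sum and group by sit in the outer step.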

Getting Error in query

update tablename set (col1,col2,col3) = (select col1,col2,col3 from tableName2 order by tableName2.col4)
returns the error "Missing )". The query works fine if I remove the order by clause.
ORDER BY is not allowed in a subquery within an UPDATE. So you get the error "Missing )" because the parser expects the subquery to end at the point that you have ORDER BY.
What is the ORDER BY intended to do?
What you probably have in mind is something like:
UPDATE TableName
SET (Col1, Col2, Col3) = (SELECT T2.Col1, T2.Col2, T2.Col3
FROM TableName2 AS T2
WHERE TableName.Col4 = T2.Col4
)
WHERE EXISTS(SELECT * FROM TableName2 AS T2 WHERE TableName.Col4 = T2.Col4);
This clumsy looking operation:
Grabs rows from TableName2 that match TableName on the value in Col4 and updates TableName with the values from the corresponding columns.
Ensures that only rows in TableName with a corresponding row in TableName2 are altered; if you drop the WHERE clause from the UPDATE, you replace the values in Col1, Col2, and Col3 with nulls if there are rows in TableName without a matching entry in TableName2.
Some DBMS also support an update-join notation to reduce the ghastliness of this notation.
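The effect of the WHERE EXISTS clause on the UPDATE above can be modeled in Ruby. This is a sketch over made-up data (a single updated column and two rows are assumptions chosen to keep it short), not SQL:

```ruby
# Made-up data: table_name rows, and table_name2 rows keyed by col4.
table_name = [
  { col4: 1, col1: "old1" },
  { col4: 2, col1: "old2" },
]
table_name2 = { 1 => { col1: "new1" } }  # no entry for col4 = 2

# UPDATE without the WHERE EXISTS: every row is assigned the subquery
# result, which is NULL (nil here) when nothing matches.
without_exists = table_name.map do |row|
  match = table_name2[row[:col4]]
  row.merge(col1: match && match[:col1])
end

# UPDATE with WHERE EXISTS: rows without a match are left untouched.
with_exists = table_name.map do |row|
  match = table_name2[row[:col4]]
  match ? row.merge(col1: match[:col1]) : row
end

p without_exists.map { |r| r[:col1] }  #=> ["new1", nil]
p with_exists.map { |r| r[:col1] }     #=> ["new1", "old2"]
```

The unmatched row keeps its old value only in the second variant.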
