Delete everything after the last matching word - shell

I want to delete everything after the last WHERE keyword.
My input is:
DELETE FROM abc T1 WHERE EXISTS (SELECT 1 FROM cdef T2 WHERE T1.a=T2.b)
I want the output to be:
DELETE FROM abc T1 WHERE EXISTS (SELECT 1 FROM cdef T2 WHERE
I tried this sed command:
output=`echo "DELETE FROM abc T1 WHERE EXISTS (SELECT 1 FROM cdef T2 WHERE T1.a=T2.b)" | sed -n -e 's/[Ww][Hh][Ee][Rr][Ee].*//p'`
but the output I got was:
DELETE FROM abc T1

With sed, using BRE (the leading .* is greedy, so the captured group runs up to the last WHERE):
sed 's/\(.*WHERE\).*/\1/;s/\(.*where\).*/\1/;' <<< "DELETE FROM abc T1 WHERE EXISTS (SELECT 1 FROM cdef T2 WHERE T1.a=T2.b)"
With GNU sed, using the i (for case insensitive) modifier:
sed 's/\(.*where\).*/\1/i' <<< "DELETE FROM abc T1 WHERE EXISTS (SELECT 1 FROM cdef T2 WHERE T1.a=T2.b)"
or, with extended regular expressions, the alternation | operator (no i modifier needed, since both spellings are listed):
sed -E 's/(.*(where|WHERE)).*/\1/' <<< "DELETE FROM abc T1 WHERE EXISTS (SELECT 1 FROM cdef T2 WHERE T1.a=T2.b)"
output:
DELETE FROM abc T1 WHERE EXISTS (SELECT 1 FROM cdef T2 WHERE
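The attempt in the question cut at the first WHERE because its pattern has no leading .*, so sed starts matching at the leftmost WHERE and removes everything from there. A minimal sketch of the fix that reuses the bracket-expression spelling from the question, so it should not need any GNU extension; the function name strip_after_last_where is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: keep everything up to and including the last
# where/WHERE (any letter case) and drop the rest of the line.
strip_after_last_where() {
    # The leading .* is greedy, so the captured group extends to the LAST
    # occurrence of the keyword; the trailing .* is what gets discarded.
    printf '%s\n' "$1" | sed 's/\(.*[Ww][Hh][Ee][Rr][Ee]\).*/\1/'
}

strip_after_last_where "DELETE FROM abc T1 WHERE EXISTS (SELECT 1 FROM cdef T2 WHERE T1.a=T2.b)"
```

Lines without any WHERE pass through unchanged, because the substitution simply does not match.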


Extract values from sql script output which are not equal to zero

I am comparing row counts of tables with the same names in two different databases.
Our SQL script looks something like this:
select (select count(1) from source.abc#remotedb) - (select count(1) from target.bcd) from dual;
We have almost 2000 scripts similar to the one above,
and the output looks like this:
select count(1) from source.abc#remotedb) - (select count(1) from target.abc
----------------------------------------------------------------------------
0
select count(1) from source.opo#remotedb) - (select count(1) from target.opo
----------------------------------------------------------------------------
26
select count(1) from source.asd#remotedb) - (select count(1) from target.asd
----------------------------------------------------------------------------
-95
Now, using bash/shell scripting, I want to write to a separate file only those three-line groups where the numeric value is NOT equal to 0.
Example:
$ cat final_result.txt
select count(1) from source.opo#remotedb) - (select count(1) from target.opo
----------------------------------------------------------------------------
26
select count(1) from source.asd#remotedb) - (select count(1) from target.asd
----------------------------------------------------------------------------
-95
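grep -E -B2 '^-?[1-9][0-9]*$' fileinput > final_result.txt
-B2: also print the two lines before each matched line (the query line and the dashed separator). The pattern is anchored with ^ and $ so that only the numeric result lines match, not the digit inside count(1). GNU grep prints a -- line between non-adjacent groups; --no-group-separator suppresses it.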
Maybe something like
egrep -B2 '^-?[0-9]*[1-9][0-9]*$' fileinput > final_result.txt
I would do it like this:
cat result_of_sql | grep -v '^$' | paste - - - | awk -F"\t" '$3!=0{print $1; print $2; print $3}'
where
grep -v '^$' gets rid of empty lines,
paste - - - joins every 3 lines into 1, separated by tabs,
awk -F"\t" '$3!=0{print $1; print $2; print $3}' prints the three original lines of every group whose third field (the count difference) is not 0.
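The same grouping can also be done in a single awk pass, without paste. This is only a sketch: it assumes the output really consists of strict three-line groups (query, dashed separator, number) once blank lines are ignored, and the function name nonzero_blocks is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print every query/separator/value group whose
# value is not zero. Assumes strict 3-line groups; blank lines are ignored.
nonzero_blocks() {
    awk '
        NF == 0 { next }              # skip blank or whitespace-only lines
        { buf[++n] = $0 }             # collect the lines of the current group
        n == 3 {                      # the third line holds the difference
            if ($0 + 0 != 0)          # force a numeric comparison
                print buf[1] "\n" buf[2] "\n" buf[3]
            n = 0                     # start the next group
        }
    ' "$1"
}
```

It would be called as nonzero_blocks fileinput > final_result.txt. Unlike the grep -B2 variants, no group separator handling is needed, but a group with a missing line would shift everything after it.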

Rewrite Hive IN clause

I am trying to execute this subquery in Hive, but I get an error saying that subqueries are not supported in my Hive version; unfortunately, we are using an old version of Hive.
select col1,col2 from t1 where col1 in (select x from t2 where y = 0)
I then rewrote the subquery using LEFT SEMI JOIN, like this:
select a.col1,a.col2
FROM t1 a LEFT SEMI JOIN t2 b on (a.col1 =b.x)
WHERE b.y = 0
This query runs fine without the WHERE condition, but Hive does not recognise table b when I reference any column of b in the WHERE condition or in the SELECT clause. It throws this error:
Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 3:6 Invalid table alias or column reference 'b': (possible column names
Any help is much appreciated.
select a.col1,a.col2
FROM t2 b RIGHT OUTER JOIN t1 a on (b.x = a.col1)
WHERE b.y = 0
-- With LEFT SEMI JOIN, a WHERE condition cannot reference columns of the right-hand table. Change your query to the form above. Note that, unlike IN, this join returns a row of t1 once for every matching row of t2.
Instead of t1 a LEFT SEMI JOIN t2 b, you can filter t2 in a subquery first: t1 a LEFT SEMI JOIN (select * from t2 where y = 0) b.
select a.col1,a.col2
FROM t1 a LEFT SEMI JOIN (select * from t2 where y = 0) b on (a.col1 =b.x);
Please see the example below.
Department table:
+--------------------+----------------------+--+
| department.deptid  | department.deptname  |
+--------------------+----------------------+--+
| D101               | sales                |
| D102               | finance              |
| D103               | HR                   |
| D104               | IT                   |
| D105               | staff                |
+--------------------+----------------------+--+
Employee table:
+-----------------+------------------+------------------+--+
| employee.empid  | employee.salary  | employee.deptid  |
+-----------------+------------------+------------------+--+
| 1001            | 1000             | D101             |
| 1002            | 2000             | D101             |
| 1003            | 3000             | D102             |
| 1004            | 4000             | D104             |
| 1005            | 5000             | D104             |
+-----------------+------------------+------------------+--+
hive> SELECT
        dept.deptid, dept.deptname
      FROM
        department dept
      LEFT SEMI JOIN
        (SELECT * FROM employee WHERE salary > 3000) emp
      ON (dept.deptid = emp.deptid);
+--------------+----------------+--+
| dept.deptid  | dept.deptname  |
+--------------+----------------+--+
| D104         | IT             |
+--------------+----------------+--+

How to get latest two rows with certain value by date in SQL

I have a table with a varchar2 column and an insert date.
I want to get the two latest entries for each value of the varchar2 column.
Is it possible to use something like top(2) instead of max in an Oracle GROUP BY?
You can use DENSE_RANK().
EDIT: Updated so that a duplicate date for the same varchar2 value is not counted twice: replaced RANK() with DENSE_RANK() so that ranks are consecutive, then used DISTINCT to eliminate the duplicates.
SELECT DISTINCT TXT, ENTRY_DATE
  FROM (SELECT txt,
               entry_date,
               DENSE_RANK () OVER (PARTITION BY txt ORDER BY entry_date DESC)
                  AS myRank
          FROM tmp_txt) Q1
 WHERE Q1.MYRANK < 3
ORDER BY txt, entry_date DESC
Input:
txt | entry_date
xyz | 03/11/2014
xyz | 25/11/2014
abc | 19/11/2014
abc | 04/11/2014
xyz | 20/11/2014
abc | 02/11/2014
abc | 28/11/2014
xyz | 25/11/2014
abc | 28/11/2014
Result:
txt | entry_date
abc | 28/11/2014
abc | 19/11/2014
xyz | 25/11/2014
xyz | 20/11/2014

Selecting greatest value of B for equal entries in A

Consider the following table:
A | B
-----|------
123 | 1
456 | 2
123 | 5
456 | 0
789 | 3
789 | 9
123 | 6
I want to get the following output:
A | B
-----|------
123 | 6
456 | 2
789 | 9
In other words: the greatest value of B for each equal value in A.
The initial table above comes already from another query which only selects duplicates of A:
select A, B from tbl where A in (
select A from tbl
group by A
having count(A) > 1
);
I tried wrapping another grouping query, with and without max(B), around this query and integrating one into it, but without success.
How can I get the desired output?
Just use max:
select A, max(B)
from tbl
group by A
having count(A) > 1
Maybe I'm being naive here, but:
SELECT tbl2.A, MAX(tbl2.B) FROM
(select A, B from tbl where A in (
select A from tbl
group by A
having count(A) > 1
)) as tbl2
GROUP BY tbl2.A
seems like it should work.

How to add sum of any column value to last column of last record in a tab delimited file

I need to sum col2 and write the total into the last column of the last record. Please advise how this can be achieved with a UNIX shell script.
E.g. input file:
Col1 Col2 Col3 Col4
abc 2 A null
bcd 3 B null
adf 4 C null
Output file:
Col1 Col2 Col3 Col4
abc 2 A null
bcd 3 B null
adf 4 C 9
Assuming you want to preserve white space in your output:
$ awk '{sum+=$2; s=s $0 ORS} END{ sub("null"ORS"$",sum,s); print s}' file
Col1 Col2 Col3 Col4
abc 2 A null
bcd 3 B null
adf 4 C 9
or:
$ awk '{sum+=$2; printf "%s",p} {p=$0 ORS} END{ sub("null$",sum); print}' file
Col1 Col2 Col3 Col4
abc 2 A null
bcd 3 B null
adf 4 C 9
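This awk script should do it, I think (it hardcodes the line count of the example, 4):
awk '{sum+=$2; if (NR != 4) print; } END {$4=sum; print;}' infile
EDIT: Or even better, without the hardcoded line count:
awk '{sum+=$2; if (NR>1) print x; x=$0} END {$4=sum; print $0;}' infile
Note that assigning to $4 makes awk rebuild the last record with OFS, so its fields come out separated by single spaces.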
Here's one way using GNU awk:
awk -v last="$(wc -l < file.txt)" 'NR == 1 { print; next } { sum += $2 } NR == last { sub($NF, sum) }1' file.txt
Results:
Col1 Col2 Col3 Col4
abc 2 A null
bcd 3 B null
adf 4 C 9
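If the file really is tab-delimited, as the title says, the field-assignment variants above can be adapted so that the tabs survive. A sketch under that assumption, also assuming the first line is a header that should not be summed; the function name sum_into_last_field is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: sum column 2 (header excluded) of a tab-delimited
# file and write the total into the last field of the last record.
sum_into_last_field() {
    awk '
        BEGIN { FS = OFS = "\t" }     # split and re-join on tabs only
        NR > 1 { print prev }         # output runs one line behind the input
        NR > 1 { sum += $2 }          # the header (NR == 1) is not summed
        { prev = $0 }                 # hold the current line back
        END {
            if (NR == 0) exit         # empty input: nothing to print
            if (NR == 1) { print prev; exit }   # header only: leave as is
            n = split(prev, f, FS)    # re-split the held-back last record
            f[n] = sum                # overwrite its last field with the total
            line = f[1]
            for (i = 2; i <= n; i++) line = line OFS f[i]
            print line
        }
    ' "$1"
}
```

Holding each line back by one step avoids both the hardcoded line count and the extra wc -l pass over the file.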
