How to use subquery in LIKE statement in Impala? - hadoop

I have a lookup table that contains forbidden values/strings and a rule number which depicts where the value cannot occur. So for e.g., I have ‘C/O’ as a value and this can’t occur anywhere in a name field. I also have ‘P.O’ which can’t occur in an address. I am attempting to create a data quality report to flag these values without hard coding. I have tried:
Select
A.name
,A.address
From customer A
Where a.name LIKE (Select concat('%', exclusion_value, '%') from DQ_lookup where rule_number=2)
Or a.address LIKE (Select concat('%', exclusion_value, '%') from DQ_lookup where rule_number=1)
This fails. How, if at all, can I get this to work?

For pattern matching in Hive, you would need to use RLIKE.
A RLIKE B
: NULL if A or B is NULL, TRUE if any (possibly empty) substring of A matches the Java regular expression B, otherwise FALSE. For example, 'foobar' RLIKE 'foo' evaluates to TRUE and so does 'foobar' RLIKE '^f.*r$'.
Something as below would do.
Select
A.name,
A.address
From customer A
Where
a.name RLIKE (Select exclusion_value from DQ_lookup where rule_number=2)
OR a.address RLIKE (Select exclusion_value from DQ_lookup where rule_number=1)
Note: exclusion_value must be a valid regular expression, so metacharacters such as the '.' in 'P.O' need escaping if they should match literally.
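Another option that may be worth trying is to join against the lookup table instead of placing a subquery on the right-hand side of LIKE, which also copes with several exclusion values per rule. The sketch below is only an illustration: it runs on SQLite through Python's sqlite3 module, the sample rows are invented, and it has not been tested on Impala, where concat('%', exclusion_value, '%') would take the place of the || operator.

```python
import sqlite3

# In-memory database with made-up sample rows (assumption, not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (name TEXT, address TEXT);
    CREATE TABLE DQ_lookup (rule_number INTEGER, exclusion_value TEXT);
    INSERT INTO customer VALUES
        ('C/O John Smith', '1 Main St'),
        ('Jane Doe', 'P.O Box 12'),
        ('Ann Lee', '5 Oak Ave');
    INSERT INTO DQ_lookup VALUES (2, 'C/O'), (1, 'P.O');
""")

# Pair every customer with every lookup row, then keep the pairs where the
# rule applies to the column and the forbidden value occurs in it.
flagged = conn.execute("""
    SELECT DISTINCT a.name, a.address
    FROM customer a
    CROSS JOIN DQ_lookup l
    WHERE (l.rule_number = 2 AND a.name    LIKE '%' || l.exclusion_value || '%')
       OR (l.rule_number = 1 AND a.address LIKE '%' || l.exclusion_value || '%')
    ORDER BY a.name
""").fetchall()

print(flagged)
```

Because the pattern is built per lookup row, adding a new forbidden value only requires a new row in DQ_lookup, with no change to the query.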

Related

ORA-00936: missing expression using SELECT INTO local_variable

I am trying to assign a result to a local variable in a stored procedure.
For example
Select c.parm_val from Cusomter.name c where c.id = '102';
The above query gives me a result like 36,1508,4399,4403,4405,4407,4409,4411,4419
I want to assign it to a local variable,
so I created the following in a stored procedure:
DECLARE
values VARCHAR2(500 BYTE);
BEGIN
Select into values c.parm_val from Cusomter.name c where c.id = '102';
END
When I execute this, I get different errors each time,
for example PL/SQL: ORA-00936: missing expression.
I want to assign that result to a variable. I don't know if I can use INSERT, as it is not a table.
Can someone show me how to assign it to a variable?
I'm not sure about the syntax you are using. The FROM clause requires a table name like Customer, not Customer.name, which seems to be a column.
Starting with 11g Release 2 you can use the LISTAGG function to concatenate a column from the result rows into a single string.
SELECT LISTAGG(c.name, ',') WITHIN GROUP (ORDER BY c.name) INTO "values"
FROM Customer c
WHERE c.id = '102';
If c.id has a numeric type, drop the quotes: WHERE c.id = 102.
According to your comment, you probably want something like
SELECT c.name INTO "values"
FROM Customer c
WHERE c.id = '102';
See: PL/SQL SELECT INTO
Also, VALUES is a reserved word in SQL. Therefore, either choose another name, or escape it as "values" (in the declaration as well).
INTO comes after the field list:
Select c.parm_val into values from Cusomter.name c where c.id = '102';

oracle report lexical parameter

I am using Oracle Reports and have a problem with "SELECT ALL". Here is my query:
SELECT * FROM company A, seller B
WHERE a.id = b.id
&(P)Company_id
and in my after-parameter trigger in Oracle Reports I use:
begin
if (:(V)Company_id is not null and :(V)Company_id<> '0')
:(P)Company_id:= ' and a.id ='||:(V)Company_id;
end if;
return (true)`
end;
If the id is all digits, like 000123, it works fine, but if the id is like ([L]00123), the result shows all data. I need help with my lexical parameter.
Information you post is misleading. I've been using Oracle Reports for ages, and I've never seen syntax you use. Code you wrote doesn't even compile; how would it work, then (which is what you claim)? There's no (V)something syntax at all.
Anyway, from my point of view, you don't need a lexical parameter but a simple OR condition, e.g.
select *
from company a join seller b on b.id = a.id
where (a.id = :par_company_id or :par_company_id is null)
The first part, a.id = :par_company_id, returns rows whose ID equals the value you enter in the parameter form.
The second part, or :par_company_id is null, returns all rows if you leave the parameter value empty.
I would have thought you'd get an error from that rather than all data, but maybe Reports does something weird in that scenario. Anyway, it looks like you just need to enclose the passed-in value in single quotes, which you will need to escape; so instead of this line:
:(P)Company_id:= ' and a.id ='||:(V)Company_id;
use:
:(P)Company_id:= ' and a.id = ''' || :(V)Company_id || '''';
although it would be better if you could keep it as a bind variable. I'm not familiar with Reports but something like this might work:
select *
from company A
join seller B
on a.id = b.id
where &(V)Company_id is null or a.id = &(V)Company_id
(I've switched to ANSI join syntax as well...)

ORA-01722: invalid number while passing value from inner select query to the top select query

For the ISBN '9780495809135', if the CATEGORY_EXISTS column returns 1234,3454, the query throws the error below. If it returns a single value, it does not throw the error.
In the topmost query, if CATEGORY_EXISTS = 'Category Not Found', the FILE_NAME column should display 'files not found'; otherwise the comma-separated CATEGORY_EXISTS values should be passed to the topmost query.
Please note that this is just a pseudo query; the actual query has many other tables and joins.
ORA-01722: invalid number
01722. 00000 - "invalid number"
*Cause: The specified number was invalid.
*Action: Specify a valid number.
SELECT ISBN ,
(SELECT LISTAGG(ANP.FILE_NAME, ',') WITHIN GROUP (
ORDER BY ANP.FILE_NAME)
FROM TABLE1 T
WHERE T.NODE_ID IN( CATEGORY_EXISTS)
)FILE_NAME
FROM
(SELECT ISBN,
(SELECT (
CASE
WHEN COUNT(DISTINCT AN.ID) > 0
THEN LISTAGG(AN.ID, ',') WITHIN GROUP (
ORDER BY AN.ID)
ELSE 'Category Not Found'
END )
FROM TABLE1 aca
JOIN TABLE2 AN
ON ACA.CHILD_NODE_ID=AN.ID
WHERE PARENT_NODE_ID=GT_CHILD_NODE_ID
) CATEGORY_EXISTS
FROM
(SELECT ISBN,
(SELECT ID FROM TEMP_CHILD_ASSOC ac WHERE CHILD_NODE_NAME=GT.ISBN
) GT_CHILD_NODE_ID
FROM MAIN_TABLE GT
WHERE ISBN='9780495809135'
)
);
The listagg() function generates a string of comma-separated values (if there is more than one ID). The case expression gives you either that generated string, or the fixed text literal (if there are no IDs). You are then trying to compare that string to a number; effectively one of these:
WHERE T.NODE_ID IN ('4321')
WHERE T.NODE_ID IN ('1234,3454')
WHERE T.NODE_ID IN ('Category Not Found')
You are implicitly converting the string to a number to compare it with NODE_ID. The first one will work as the implicit conversion is valid. The second will give you ORA-01722 (unless you have exactly two values, and your NLS decimal separator is a comma; but still won't give a match), and the third will also give that error - because those strings cannot be converted to numbers.
It's possible you are expecting the second one to be magically treated as two numbers inside the IN() clause, but that isn't how it works; it's getting a single string literal, not an actual list of numbers it can understand.
The IN condition does accept a list of multiple comma-separated expressions, but you are passing in a single string. The fact that string happens to consist of comma-separated values is irrelevant: it is itself still just a single expression. And that cannot be converted implicitly to a number.
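The difference between one string and a real list can be reproduced in a few lines. This is a sketch only: it uses SQLite through Python's sqlite3 module with an invented nodes table, and SQLite quietly returns no match where Oracle raises ORA-01722, so it illustrates the "single expression" point rather than the exact error.

```python
import sqlite3

# Hypothetical table standing in for the one that holds NODE_ID.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (node_id INTEGER);
    INSERT INTO nodes VALUES (1234), (3454), (9999);
""")

# One string that merely looks like a list: a single expression, no match.
single_string = conn.execute(
    "SELECT node_id FROM nodes WHERE node_id IN ('1234,3454') ORDER BY node_id"
).fetchall()

# Two separate numeric expressions: both rows match.
real_list = conn.execute(
    "SELECT node_id FROM nodes WHERE node_id IN (1234, 3454) ORDER BY node_id"
).fetchall()

print(single_string)
print(real_list)
```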
If you have, or can create, a schema-level table type like:
create type my_number_tab as table of number
/
then you could use the collect() function to convert the IDs into a collection instead of a string, and then use member of to find matches; something like (with a bit of interpretation of your pseudocode):
SELECT ISBN ,
(SELECT LISTAGG(ANP.FILE_NAME, ',') WITHIN GROUP (
ORDER BY ANP.FILE_NAME)
FROM TABLE3 ANP
WHERE ANP.NODE_ID MEMBER OF CATEGORIES -- use collection
)FILE_NAME
FROM
(SELECT ISBN,
(SELECT CAST(COLLECT(AN.ID) AS my_number_tab) -- create collection not string
FROM TABLE1 aca
JOIN TABLE2 AN
ON ACA.CHILD_NODE_ID=AN.ID
WHERE PARENT_NODE_ID=GT_CHILD_NODE_ID
) CATEGORIES
FROM
(SELECT ISBN,
(SELECT ID FROM TEMP_CHILD_ASSOC ac WHERE CHILD_NODE_NAME=GT.ISBN
) GT_CHILD_NODE_ID
FROM MAIN_TABLE GT
WHERE ISBN='9780495809135'
)
);
It looks like you could also join to anp inside the inner query instead, so in that you generate the string list of file names rather than (or as well as) the string list of IDs. It's hard to tell from the pseudocode though; but perhaps something like:
SELECT ISBN,
(SELECT (
CASE
WHEN COUNT(DISTINCT AN.ID) > 0
THEN LISTAGG(ANP.FILE_NAME, ',') WITHIN GROUP (
ORDER BY ANP.FILE_NAME)
ELSE 'Category Not Found'
END )
FROM TABLE1 aca
JOIN TABLE2 AN
ON ACA.CHILD_NODE_ID=AN.ID
JOIN TABLE3 ANP
ON ANP.NODE_ID=AN.ID
WHERE ACA.PARENT_NODE_ID=GT_CHILD_NODE_ID
) FILE_NAME
FROM
(SELECT ISBN,
(SELECT ID FROM TEMP_CHILD_ASSOC ac WHERE CHILD_NODE_NAME=GT.ISBN
) GT_CHILD_NODE_ID
FROM MAIN_TABLE GT
WHERE ISBN='9780495809135'
);
You could probably also do the same thing with left outer joins (though perhaps they don't all need to be), although your comment suggests you have a reason for using subqueries instead:
SELECT GT.ISBN,
CASE WHEN COUNT(AN.ID) = 0 THEN 'files not found'
ELSE LISTAGG(ANP.FILE_NAME, ',') WITHIN GROUP (ORDER BY ANP.FILE_NAME)
END AS file_name
FROM MAIN_TABLE GT
LEFT JOIN TEMP_CHILD_ASSOC ac ON CHILD_NODE_NAME=GT.ISBN
LEFT JOIN table1 aca ON aca.parent_node_id = ac.id
LEFT JOIN table2 an on an.id = ACA.CHILD_NODE_ID
LEFT JOIN table3 anp on anp.node_id = an.id
WHERE GT.ISBN = '9780495809135'
GROUP BY GT.ISBN;
or something like that; again hard to tell from the pseudocode...

Regular Expressions Oracle

I want to find regular expression [.-] in field filial_name.
select uc.filial_name from MYTABLE uc
where regexp_like(uc.filial_name , '[.-]');
select uc.filial_name from MYTABLE uc
where uc.filial_name like '%[.-]%';
The first variant works, but the second does not.
How can I fix the second variant?
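The second expression isn't a regular expression; LIKE does a plain pattern search with only % and _ as wildcards, so '[.-]' is matched as literal text.
To find names containing either character, test for each one separately:
SELECT uc.filial_name FROM MYTABLE uc
WHERE uc.filial_name LIKE '%.%' OR uc.filial_name LIKE '%-%';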
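Since LIKE has no character classes, a pattern such as '%.-%' only finds the two-character sequence ".-", whereas the regular expression [.-] accepts either character. The following sketch shows the difference on SQLite via Python's sqlite3 module; the table contents are invented for illustration, and Oracle's LIKE is assumed to treat '.' and '-' as literals in the same way.

```python
import sqlite3

# Made-up branch names covering each case.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (filial_name TEXT);
    INSERT INTO mytable VALUES
        ('north.branch'), ('south-branch'), ('east.-branch'), ('west branch');
""")

# Matches only names containing the literal sequence ".-".
literal_seq = conn.execute(
    "SELECT filial_name FROM mytable "
    "WHERE filial_name LIKE '%.-%' ORDER BY filial_name"
).fetchall()

# Matches names containing a dot or a hyphen, like the regex [.-].
either_char = conn.execute(
    "SELECT filial_name FROM mytable "
    "WHERE filial_name LIKE '%.%' OR filial_name LIKE '%-%' "
    "ORDER BY filial_name"
).fetchall()

print(literal_seq)
print(either_char)
```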

NOT IN query... odd results

I need a list of users in one database that are not listed as the new_user_id in another. There are 112,815 matching users in both databases; user_id is the key in all queried tables.
Query #1 works, and gives me 111,327 users who are NOT referenced as a new_user_Id. But it requires querying the same data twice.
-- 111,327 GSU users are NOT listed as a CSS new user
-- 1,488 GSU users ARE listed as a new user in CSS
--
select count(gup.user_id)
from gsu.user_profile gup
join (select cud.user_id, cud.new_user_id, cud.user_type_code
from css.user_desc cud) cudsubq
on gup.user_id = cudsubq.user_id
where gup.user_id not in (select cud.new_user_id
from css.user_desc cud
where cud.new_user_id is not null);
Query #2 would be perfect... and I'm actually surprised that it's syntactically accepted. But it gives me a result that makes no sense.
-- This gives me 1,505 users... I've checked, and they are not
-- referenced as new_user_ids in CSS, but I don't know why the ones
-- that were excluded were excluded.
--
-- Where are the missing 109,822, and what excluded them?
--
select count(gup.user_id)
from gsu.user_profile gup
join (select cud.user_id, cud.new_user_id, cud.user_type_code
from css.user_desc cud) cudsubq
on gup.user_id = cudsubq.user_id
where gup.user_id not in (cudsubq.new_user_id);
What exactly is the where clause in the second query doing, and why is it excluding 109,822 records from the results?
Note: The above query is a simplification of what I'm really after. There are other/better ways to do the above queries... they're just representative of the part of the query that's giving me problems.
Read this: http://asktom.oracle.com/pls/asktom/f?p=100:11:0::NO::P11_QUESTION_ID:442029737684
From what I understand, your cudsubq.new_user_id can be NULL even though both tables are joined by user_id, so you won't get results using the NOT IN operator when the subset contains NULL values. Consider the example in the article:
select * from dual where dummy not in ( NULL )
This returns no records. Try using the NOT EXISTS operator or just another kind of join. Here is a good source: http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html
And what you need is the fourth example:
SELECT COUNT(descr.user_id)
FROM
user_profile prof
LEFT OUTER JOIN user_desc descr
ON prof.user_id = descr.user_id
WHERE descr.new_user_id IS NULL
OR descr.new_user_id != prof.user_id
Second query is semantically different. In this case
where gup.user_id not in (cudsubq.new_user_id)
cudsubq.new_user_id is treated as expression (doc: IN condition), not as a subquery, thus the whole clause is basically equivalent to
where gup.user_id != cudsubq.new_user_id
So, in your first query, you're literally asking "show me all users in GUP, who also have entries in CSS and their GUP.ID is not matching ANY NOT NULL NEW_ID in CSS ".
However, the second query is "show me all users in GUP, who also have entries in CSS and their GUP.ID is not equal to their RESPECTIVE NULLABLE (no is not null clause, remember?) CSS.NEW_ID value".
And any (not) in (or equality/inequality) checks with nulls don't actually work.
12:07:54 SYSTEM#oars_sandbox> select * from dual where 1 not in (null, 2, 3, 4);
no rows selected
Elapsed: 00:00:00.00
This is where you lose your rows. I would probably rewrite your second query's where clause as
where cudsubq.new_user_id is null, assuming that non-matching users have null new_user_id.
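The NULL behavior is easy to reproduce outside Oracle. The sketch below uses SQLite through Python's sqlite3 module, on the assumption that its three-valued logic for NOT IN matches Oracle's in these cases; the literal values are arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A NULL in the list makes the NOT IN result unknown, so the row is dropped.
with_null = conn.execute(
    "SELECT 1 WHERE 1 NOT IN (NULL, 2, 3, 4)"
).fetchall()

# The same test without the NULL keeps the row.
without_null = conn.execute(
    "SELECT 1 WHERE 1 NOT IN (2, 3, 4)"
).fetchall()

# The single-expression form from the second query behaves the same way.
single_null = conn.execute(
    "SELECT 1 WHERE 5 NOT IN (NULL)"
).fetchall()

# NOT EXISTS is not tripped up by the NULL, because v = 1 is never true.
null_safe = conn.execute("""
    SELECT 1 WHERE NOT EXISTS (
        SELECT 1 FROM (SELECT NULL AS v UNION ALL SELECT 2) WHERE v = 1
    )
""").fetchall()

print(with_null, without_null, single_null, null_safe)
```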
Your second select compares gup.user_id with new_user_id on the currently joined row only. You can rewrite the query to get the same result:
select count(gup.user_id)
from gsu.user_profile gup
join (select cud.user_id, cud.new_user_id, cud.user_type_code
from css.user_desc cud) cudsubq
on gup.user_id = cudsubq.user_id
where gup.user_id != cudsubq.new_user_id or cudsubq.new_user_id is null;
You mentioned that you compare a list of users in one database with a list of users in another, so you do need to query the data twice, and it is not the same data each time. Maybe you can use the "minus" operator to avoid using "in":
select count(gup.user_id)
from gsu.user_profile gup
join (select cud.user_id from css.user_desc cud
minus
select cud.new_user_id from css.user_desc cud) cudsubq
on gup.user_id = cudsubq.user_id;
You want user_ids from table gup that don't match any new_user_id in table cud, right? It sounds like a job for a left join:
SELECT count(gup.user_id)
FROM gsu.user_profile gup LEFT JOIN css.user_desc cud
ON gup.user_id = cud.new_user_id
WHERE cud.new_user_id is NULL
The join keeps all rows of gup, matching them with a new_user_id if possible. The WHERE condition keeps only the rows that have no matching row in cud.
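To make the anti-join concrete, here is a small sketch on SQLite via Python's sqlite3 module. The four-row tables are invented, the gsu. and css. schema prefixes are dropped because SQLite has no such schemas here, and the per-row NOT IN query is included only to show how its count differs.

```python
import sqlite3

# Tiny made-up data set: only user 1 is referenced as a new_user_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_profile (user_id INTEGER PRIMARY KEY);
    CREATE TABLE user_desc (user_id INTEGER PRIMARY KEY, new_user_id INTEGER);
    INSERT INTO user_profile VALUES (1), (2), (3), (4);
    INSERT INTO user_desc VALUES (1, NULL), (2, 1), (3, NULL), (4, NULL);
""")

# Anti-join: keep profile rows with no user_desc row pointing at them.
left_join_count = conn.execute("""
    SELECT COUNT(gup.user_id)
    FROM user_profile gup
    LEFT JOIN user_desc cud ON gup.user_id = cud.new_user_id
    WHERE cud.new_user_id IS NULL
""").fetchone()[0]

# Equivalent formulation with NOT EXISTS.
not_exists_count = conn.execute("""
    SELECT COUNT(gup.user_id)
    FROM user_profile gup
    WHERE NOT EXISTS (
        SELECT 1 FROM user_desc cud WHERE cud.new_user_id = gup.user_id
    )
""").fetchone()[0]

# Per-row comparison as in the second query: rows with a NULL are lost.
per_row_count = conn.execute("""
    SELECT COUNT(gup.user_id)
    FROM user_profile gup
    JOIN user_desc cud ON gup.user_id = cud.user_id
    WHERE gup.user_id NOT IN (cud.new_user_id)
""").fetchone()[0]

print(left_join_count, not_exists_count, per_row_count)
```

Users 2, 3 and 4 are never referenced, so both anti-join forms count three, while the per-row comparison keeps only the one row whose new_user_id is non-NULL and different.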
(Apologies if you know this already and you're only interested in the behavior of the not in query)
