How to implement search by multiple params - performance

Multiple params are received from the request:
- domain
- GEO
- referer
- other params
The database table looks like this:
domain    | geo | referer | utm | option_id
----------|-----|---------|-----|----------
test.com  |     |         |     | 1
test.com  | us  |         |     | 2
test.com  | us  | ref.com | 12  | 3
test2.com | us  |         |     | 4
For example, I receive a request with the params:
domain=test.com
geo=us
referer=ref.com
utm=12
If I run the query:
select option_id from table where domain='test.com' and geo='us' and referer='ref.com' and utm='12';
it returns only the full match, option_id = 3.
But I need to get all the options, one for every match on domain and geo:
option_id = [1,2,3]
How can this be solved in a performant way? Maybe the solution is not an SQL database; I need to search a high-load system in real time.
Any help would be appreciated, thank you.
The query that satisfies the selection would be:
select option_id from table where domain='test.com' and geo='' and referer='' and utm=''
UNION
select option_id from table where domain='test.com' and geo='us' and referer='' and utm=''
UNION
select option_id from table where domain='test.com' and geo='us' and referer='ref.com' and utm='12'
But it is slow, and I suspect there is a simpler and more performant solution, maybe without an SQL database.

If you want to get all the options that match at least one of the params, use OR instead of AND:
select option_id from table where ( domain='test.com' OR geo='us' OR referer='ref.com' OR utm='12' );
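If instead you want the question's semantics, where an empty column acts as a wildcard, a minimal sketch that collapses the three UNION branches into one lookup (assuming, as in the sample table, that an empty string marks a wildcard):
select option_id
from options  -- "options" is a placeholder name; the question just calls it "table"
where domain = 'test.com'
  and geo in ('us', '')
  and referer in ('ref.com', '')
  and utm in ('12', '');
Note this also matches rows the UNION would skip, e.g. an empty geo combined with a non-empty referer; on the sample data the result is the same [1,2,3]. A composite index on (domain, geo, referer, utm) keeps the lookup cheap.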

select option_id from table where domain LIKE '%test.com%' and geo LIKE '%us%' and referer LIKE '%ref.com%' and utm like '%12%';
I have not tested this code, but you could use LIKE; it searches the column against a pattern.

Related

How to groupBy on one column in Laravel?

I have a section table and a class table.
The class table is designed this way:
(id, class_name, section_id)
One class has many sections, like:
--------------------------------------------
| SN | ClassName | Section_id |
--------------------------------------------
| 1 | ClassOne | 1 |
| 2 | ClassOne | 2 |
| 3 | ClassOne | 3 |
| 4 | ClassOne | 4 |
--------------------------------------------
Now I want to group by only ClassName and display all the sections of that class.
$data['classes'] = SectionClass::groupBy('class_name')->paginate(10);
I have tried grouping like this, but it only gives me one section id.
Try this way...
$things = SectionClass::paginate(10);
$data['classes']= $things->groupBy('class_name');
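Note that this groups only the rows of the current page: paginate(10) fetches ten rows first, and groupBy then runs on that collection, so a class's sections can be split across pages.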
You are getting just one row because that is what GROUP BY does: it groups a set of rows into a set of summary rows and returns one row for each group. In standard SQL, a query that includes a GROUP BY clause cannot refer to nonaggregated columns in the select list that are not named in the GROUP BY clause. For example, in SQL Server, if you try the following query
SELECT * FROM [Class] GROUP BY [ClassName]
you'll get the following error:
"Column 'SN' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause"
Think about it: you are grouping by ClassName, and following your sample data, this will return just one row. Your SELECT clause includes the column ClassName, which is easy to get because it is the same in every single row; but when you select another column, which value should be returned if only one can be selected?
Now, things change a little bit in MySQL. MySQL extends the standard SQL use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are nondeterministic. You can find a complete explanation about this topic here https://dev.mysql.com/doc/refman/5.6/en/group-by-handling.html
If you are expecting the result in one row, you can use the GROUP_CONCAT() function to get something like:
--------------------------------
| ClassName | Sections |
--------------------------------
| ClassOne | 1,2,3,4 |
--------------------------------
Your query must be something like:
select `ClassName`, group_concat(Section_id) from `class` group by `ClassName`
You can run this as a raw query in Laravel, or it's up to you to find a way to get the same result using the query builder ;)
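As a small, hedged refinement: in MySQL, GROUP_CONCAT also accepts ORDER BY and SEPARATOR, which keeps the section list deterministic:
select `ClassName`, group_concat(`Section_id` order by `Section_id` separator ',') as Sections
from `class`
group by `ClassName`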

How to return the matching record based on a lookup table using Hive

Let's say we have a lookup table (Table_A) and another table (Table_B) as follows:
We want to search each string in Table_B for the keywords from Table_A and return the chemical type, forming Table_C, as follows:
How can we implement this using a Hive query in a Hadoop environment?
The challenging part is to search for multiple keywords within the same string and create a new row for each matched record.
Thank you!
I think you should structure Table_A differently (or keep the current structure but split by comma and use explode in Hive; see the sketch at the end of this answer) like so:
----------------------------
| Table A |
----------------------------
| Chemical Type | Keyword |
----------------------------
| HF | 100HF |
----------------------------
| HF | 100:HF |
----------------------------
| HCL | HCL200 |
----------------------------
| HCL | 500HCL |
----------------------------
etc...
Then, it seems that you need to perform a cartesian product join:
select distinct b.machine,b.string,a.chemical_type from
Table_A as a, Table_B as b where instr(b.string,a.keyword) > 0;
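And for the other option mentioned above (keeping the comma-separated structure), a minimal HiveQL sketch; the column name keywords is an assumption:
-- Assumes Table_A(chemical_type, keywords), where keywords holds a
-- comma-separated list such as '100HF,100:HF'.
select distinct b.machine, b.string, a.chemical_type
from (
  select chemical_type, kw as keyword
  from Table_A
  lateral view explode(split(keywords, ',')) x as kw
) a, Table_B b
where instr(b.string, a.keyword) > 0;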

Two date columns in source to decide the latest updated record in Informatica

I have a requirement as below:
I have a source table like
id | name | address | updt_date_1 | updt_date_2
1 | abc | xyz | 2000-01-01 | 1999-01-01
1 | abc | pqr | 2001-01-01 | 1999-01-01
2 | lmn | ghi | 1999-01-01 | 1999-01-01
2 | lmn | stu | 2000-01-01 | 2008-01-01
I want to get in the target:
1 | abc | pqr
2 | lmn | stu
i.e. the record with the latest update date in either of the two date columns, updt_date_1 or updt_date_2.
Please suggest how this can be implemented in Informatica PowerCenter.
This requirement can be achieved in an effective way using just three transformations (Source Qualifier, Expression and Filter). Please see the steps below.
1) Use the following SQL override in the Source Qualifier transformation to reduce the two last-updated-date fields into one:
SELECT
id
,name
,address
,CASE WHEN updt_date_1 > updt_date_2 THEN updt_date_1 ELSE updt_date_2 END AS updt_date
FROM source_table
ORDER BY id, updt_date DESC
Now the first row for each id will be the required record.
2) Use an expression transformation to flag the first row of each id. Use the following ports in the same order in the expression transformation (prefix o_ means output port, v_ means variable port and i_ means input port)
PORT EXPRESSION
v_FIRST_ROW_FLAG - IIF(v_PREV_ID = i_id, 'N', 'Y')
v_PREV_ID - i_id
o_FIRST_ROW_FLAG - v_FIRST_ROW_FLAG
3) Next, add a Filter transformation that keeps only the records satisfying the following condition:
o_FIRST_ROW_FLAG = 'Y'
Connect this filter transformation to the target definition. This will give you the expected output.
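As a hedged alternative: if the source database supports analytic functions, the flagging in steps 2 and 3 can be pushed into the SQL override itself (table and column names as above):
SELECT id, name, address
FROM (
  SELECT id, name, address,
         ROW_NUMBER() OVER (
           PARTITION BY id
           ORDER BY CASE WHEN updt_date_1 > updt_date_2
                         THEN updt_date_1 ELSE updt_date_2 END DESC
         ) AS rn
  FROM source_table
) t
WHERE rn = 1;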
Basically we have to determine the maximum of updt_date_1 and updt_date_2 per key, and then choose whichever of the two is greater.
Use a Source Qualifier and then sort the data based on id, name.
Add an Aggregator after it. Pull in the id, name, updt_date_1, updt_date_2 columns. Create two output columns - max_upd_dt1, max_upd_dt2 - and calculate MAX(updt_date_1), MAX(updt_date_2) respectively. Set group by on id, name.
Use a Joiner to join the sorted output and the Aggregator output based on id, name. Now you have two extra columns - max_upd_dt1 and max_upd_dt2.
Use an Expression transformation after the Joiner. Pull all columns in. Create two output ports and set the logic like below -
out_upd_dt1 = iif( max_upd_dt1 > max_upd_dt2, max_upd_dt1, updt_date_1 )
out_upd_dt2 = iif( max_upd_dt1 < max_upd_dt2, max_upd_dt2, updt_date_2 )
Use another Source Qualifier (sorted by id, name) and join it with the above Expression transformation, based on -
id = id, name = name, out_upd_dt1 = updt_date_1, out_upd_dt2 = updt_date_2
Pick up id, name, address.
HTH
Koushik

Nested PL/SQL in a tabular form

I am trying to achieve the following result (the first line is the header):
Level 1 | Level 2 | Level 3 | Level 4 | Person
Technicals | Development | Software | Team leader | Eric
Technicals | Development | Software | Team leader | Steven
Technicals | Development | Software | Team leader | Jana
How can I do so? I tried the following code. The first part, creating the hierarchy, works fine. The second part, getting the data into the shape shown above, is pretty painful.
SELECT * FROM ( /* level2 */
SELECT * FROM ( /* level1 */
SELECT * FROM arc.localnode /*create hierarchy */
WHERE tree_id = 2408362
CONNECT BY PRIOR node_id = parent_id
START WITH parent_id IS NULL ) l1node
LEFT JOIN names on l1node.parent_id = names.name_id ) l2node
At this point, I am quite lost. A bit of guidance and suggestion would be a lot of help :-)
There are two tables. The first table has data like this:
NODE_ID | PREV_ID | NEXT_ID | PARENT_ID
--------|---------|---------|----------
1421864 |         | 3482917 | 1421768
3482981 | 3482917 | 1421866 | 1421768
3482911 | 3060402 | 3482913 | 1421768
3482917 | 1421864 | 3482981 | 1421768
This is complicated because it is a hierarchy, so a PARENT_ID can be the NODE_ID of some other row; similarly, the PARENT_ID can appear as a PREV_ID or NEXT_ID.
The names are in a separate table keyed by name_id, and that name_id corresponds to the NODE_ID of the main hierarchy table.
You can use the Stragg package mentioned on AskTom at the link below:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:2196162600402
You can also refer to the thread below in the Oracle forum:
https://forums.oracle.com/forums/thread.jspa?threadID=2258996
Kindly post CREATE and INSERT statements for your requirement so that we can test it and confirm.
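Alternatively, a hedged sketch of a path-based approach in plain Oracle SQL: build each leaf's full path with SYS_CONNECT_BY_PATH and split it into level columns. Table and column names follow the question; the names.name column and the fixed depth of five are assumptions:
SELECT REGEXP_SUBSTR(path, '[^/]+', 1, 1) AS level_1,
       REGEXP_SUBSTR(path, '[^/]+', 1, 2) AS level_2,
       REGEXP_SUBSTR(path, '[^/]+', 1, 3) AS level_3,
       REGEXP_SUBSTR(path, '[^/]+', 1, 4) AS level_4,
       REGEXP_SUBSTR(path, '[^/]+', 1, 5) AS person
FROM (
  SELECT SYS_CONNECT_BY_PATH(n.name, '/') AS path,
         CONNECT_BY_ISLEAF AS is_leaf
  FROM arc.localnode l
  LEFT JOIN names n ON l.node_id = n.name_id
  WHERE l.tree_id = 2408362
  CONNECT BY PRIOR l.node_id = l.parent_id
  START WITH l.parent_id IS NULL
)
WHERE is_leaf = 1;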

Use Oracle unnested VARRAYs instead of the IN operator

Let's say users have 1 - n accounts in a system. When they query the database, they may choose to select from m accounts, with m between 1 and n. Typically the SQL generated to fetch their data is something like
SELECT ... FROM ... WHERE account_id IN (?, ?, ..., ?)
So depending on the number of accounts a user has, this will cause a new hard-parse in Oracle, and a new execution plan, etc. Now there are a lot of queries like that and hence, a lot of hard-parses, and maybe the cursor/plan cache will be full quite early, resulting in even more hard-parses.
Instead, I could also write something like this
-- use any of these
CREATE TYPE numbers AS VARRAY(1000) of NUMBER(38);
CREATE TYPE numbers AS TABLE OF NUMBER(38);
SELECT ... FROM ... WHERE account_id IN (
SELECT column_value FROM TABLE(?)
)
-- or
SELECT ... FROM ... JOIN (
SELECT column_value FROM TABLE(?)
) ON column_value = account_id
And use JDBC to bind a java.sql.Array (i.e. an oracle.sql.ARRAY) to the single bind variable. Clearly, this will result in fewer hard parses and fewer cursors in the cache for functionally equivalent queries. But is there anything like a general performance drawback, or any other issues that I might run into?
E.g: Does bind variable peeking work in a similar fashion for varrays or nested tables? Because the amount of data associated with every account may differ greatly.
I'm using Oracle 11g in this case, but I think the question is interesting for any Oracle version.
I suggest you try a plain old join, as in:
SELECT Col1, Col2
FROM ACCOUNTS ACCT,
     TABLE TAB
WHERE ACCT.User = :ParamUser
AND TAB.account_id = ACCT.account_id;
An alternative could be a table subquery
SELECT Col1, Col2
FROM (
SELECT account_id
FROM ACCOUNTS
WHERE User = :ParamUser
) ACCT,
TABLE TAB
WHERE TAB.account_id = ACCT.account_id;
or a where subquery
SELECT Col1, Col2
FROM TABLE TAB
WHERE TAB.account_id IN
(
SELECT account_id
FROM ACCOUNTS
WHERE User = :ParamUser
);
The first one should be better for performance, but you had better check them all with explain plan.
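For reference, a minimal way to do that check for each candidate (shown for the first query):
EXPLAIN PLAN FOR
SELECT Col1, Col2
FROM ACCOUNTS ACCT,
     TABLE TAB
WHERE ACCT.User = :ParamUser
AND TAB.account_id = ACCT.account_id;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);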
Looking at V$SQL_BIND_CAPTURE in a 10g database, I have a few rows where the datatype is VARRAY or NESTED_TABLE; the actual bind values were not captured. In an 11g database, there is just one such row, but it also shows that the bind value is not captured. So I suspect that bind value peeking essentially does not happen for user-defined types.
In my experience, the main problem you run into using nested tables or varrays in this way is that the optimizer does not have a good estimate of the cardinality, which could lead it to generate bad plans. But, there is an (undocumented?) CARDINALITY hint that might be helpful. The problem with that is, if you calculate the actual cardinality of the nested table and include that in the query, you're back to having multiple distinct query texts. Perhaps if you expect that most or all users will have at most 10 accounts, using the hint to indicate that as the cardinality would be helpful. Of course, I'd try it without the hint first, you may not have an issue here at all.
(I also think that perhaps Miguel's answer is the right way to go.)
For a medium-sized list (several thousand items) I would use this approach:
First: generate a prepared statement with an XMLTABLE joined to your main table.
For instance:
String myQuery = "SELECT ...
+" FROM ACCOUNTS A,"
+ "XMLTABLE('tab/row' passing XMLTYPE(?) COLUMNS id NUMBER path 'id') t
+ "WHERE A.account_id = t.id"
then loop through your data and build a string with this content:
String idList = "<tab><row><id>101</id></row><row><id>907</id></row> ...</tab>";
eventually, prepare and submit your statement, then fetch the results:
PreparedStatement stmt = connection.prepareStatement(myQuery); // connection is your java.sql.Connection
stmt.setString(1, idList);
ResultSet rs = stmt.executeQuery();
while (rs.next()) {...}
Using this approach it is also possible to pass a multi-valued list, as in the select statement:
SELECT * FROM TABLE t WHERE (t.COL1, t.COL2) in (SELECT X.COL1, X.COL2 FROM X);
In my experience performance is pretty good, and the approach is flexible enough to be used in very complex query scenarios.
The only limit is the size of the string passed to the DB, but I suppose it is possible to use a CLOB in place of String for an arbitrarily long XML wrapper around the input list.
This problem of binding a variable number of items into an IN list seems to come up a lot in various forms. One option is to concatenate the IDs into a comma-separated string and bind that, then use a bit of a trick to split it into a table you can join against, e.g.:
with bound_inlist
as
(
select
substr(txt,
instr (txt, ',', 1, level ) + 1,
instr (txt, ',', 1, level+1) - instr (txt, ',', 1, level) -1 )
as token
from (select ','||:txt||',' txt from dual)
connect by level <= length(:txt)-length(replace(:txt,',',''))+1
)
select *
from bound_inlist a, actual_table b
where a.token = b.token
Bind variable peeking is going to be a problem though.
Does the query plan actually change for a larger number of accounts, i.e. would it be more efficient to move from an index to a full table scan in some cases, or is it borderline? As someone else suggested, you could use the CARDINALITY hint to indicate how many IDs are being bound; the following test case proves this actually works:
create table actual_table (id integer, padding varchar2(100));
create unique index actual_table_idx on actual_table(id);
insert into actual_table
select level, 'this is just some padding for '||level
from dual connect by level <= 1000;
explain plan for
with bound_inlist
as
(
select /*+ CARDINALITY(10) */
substr(txt,
instr (txt, ',', 1, level ) + 1,
instr (txt, ',', 1, level+1) - instr (txt, ',', 1, level) -1 )
as token
from (select ','||:txt||',' txt from dual)
connect by level <= length(:txt)-length(replace(:txt,',',''))+1
)
select *
from bound_inlist a, actual_table b
where a.token = b.id;
----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 10 | 840 | 2 (0)| 00:00:01 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 10 | 840 | 2 (0)| 00:00:01 |
| 3 | VIEW | | 10 | 190 | 2 (0)| 00:00:01 |
|* 4 | CONNECT BY WITHOUT FILTERING| | | | | |
| 5 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
|* 6 | INDEX UNIQUE SCAN | ACTUAL_TABLE_IDX | 1 | | 0 (0)| 00:00:01 |
| 7 | TABLE ACCESS BY INDEX ROWID | ACTUAL_TABLE | 1 | 65 | 0 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------
Another option is to always use n bind variables in every query. Use null for m+1 to n.
Oracle ignores repeated items in the expression_list. Your queries will perform the same way and there will be fewer hard parses. But there will be extra overhead to bind all the variables and transfer the data. Unfortunately I have no idea what the overall effect on performance would be; you'd have to test it.
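A minimal sketch of that idea with n = 8 (the NULL slots are harmless, since account_id = NULL never matches a row):
SELECT ...
FROM ...
WHERE account_id IN (?, ?, ?, ?, ?, ?, ?, ?);
-- a user with 3 accounts binds e.g. 101, 907, 314, NULL, NULL, NULL, NULL, NULL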
