I have the following situation within a PL/SLQ function. Depending of existing table field value I might run a different select. Specifically: I can have multiple rows for a particular BILL CODE (PINGPONG) where I would only need to get the SYS_FIELD value. This field has to be fetched only once according to following condition: If fields prep_seq_num=0 and primary_ind=0 then just get this row sys_field value straightaway and do not take care of other possible prep_seq_num and primary_ind values different from 0. If that rows is not existing, fetch the sys_field value from prep_seq_num!=0 and primary_ind=1. For both case only one instance/row must be possible So in the first case I should run:
SELECT SYS_FIELD
INTO v_start_of_invoice
FROM BILL
WHERE TRACKING_ID = v_previous_trackingID
AND BSCO_CODE_ID = 'PINGPONG'
AND CHRG_ACCT_ID = v_ACCT_ID
AND PREP_SEQ_NUM = 0 -- maybe not needed here
AND ITEM_CAT_CODE_ID=1
AND PARTITION_KEY = v_prev_partition
AND SUBPARTITION_KEY = v_prev_subpartition
AND PRIMARY_IND=0;
In the second case
SELECT SYS_FIELD
INTO v_start_of_invoice
FROM BILL
WHERE TRACKING_ID = v_previous_trackingID
AND BILL_CODE_ID = 'PINGPONG'
AND ITEM_CAT_CODE_ID in ('5' , '-100')
AND PARTITION_KEY = v_prev_partition
AND SUBPARTITION_KEY = v_prev_subpartition
AND PRIMARY_IND=1;
Not sure I'm making it very complicated, but still I 'd need to know whether IF THEN ELSE or CASE or whatever should be used and how.
You can utilize an implicit cursor for loop.
for rec IN ( <your query that gives all records)
LOOP
--compare the value of this variable and decide in the IF condition.
IF rec.prep_seq_num=0 and rec.primary_ind=0
THEN
-- write the query to get the sys_field
ELSE
-- write alternative query.
END IF;
END LOOP;
Related
I'm attempting to make a linq where contains query quicker.
The data set contains 256,999 clients. The Ids is just a simple list of GUID'S and this would could only contain 3 records.
The below query can take up to a min to return the 3 records. This is because the logic will go through the 256,999 record to see if any of the 256,999 records are within the List of 3 records.
returnItems = context.ExecuteQuery<DataClass.SelectClientsGridView>(sql).Where(x => ids.Contains(x.ClientId)).ToList();
I would like to and get the query to check if the three records are within the pot of 256,999. So in a way this should be much quicker.
I don't want to do a loop as the 3 records could be far more (thousands). The more loops the more hits to the db.
I don't want to grap all the db records (256,999) and then do the query as it would take nearly the same amount of time.
If I grap just the Ids for all the 256,999 from the DB it would take a second. This is where the Ids come from. (A filtered, small and simple list)
Any Ideas?
Thanks
You've said "I don't want to grab all the db records (256,999) and then do the query as it would take nearly the same amount of time," but also "If I grab just the Ids for all the 256,999 from the DB it would take a second." So does this really take "just as long"?
returnItems = context.ExecuteQuery<DataClass.SelectClientsGridView>(sql).Select(x => x.ClientId).ToList().Where(x => ids.Contains(x)).ToList();
Unfortunately, even if this is fast, it's not an answer, as you'll still need effectively the original query to actually extract the full records for the Ids matched :-(
So, adding an index is likely your best option.
The reason the Id query is quicker is due to one field being returned and its only a single table query.
The main query contains sub queries (below). So I get the Ids from a quick and easy query, then use the Ids to get the more details information.
SELECT Clients.Id as ClientId, Clients.ClientRef as ClientRef, Clients.Title + ' ' + Clients.Forename + ' ' + Clients.Surname as FullName,
[Address1] ,[Address2],[Address3],[Town],[County],[Postcode],
Clients.Consent AS Consent,
CONVERT(nvarchar(10), Clients.Dob, 103) as FormatedDOB,
CASE WHEN Clients.IsMale = 1 THEN 'Male' WHEN Clients.IsMale = 0 THEN 'Female' END As Gender,
Convert(nvarchar(10), Max(Assessments.TestDate),103) as LastVisit, ";
CASE WHEN Max(Convert(integer,Assessments.Submitted)) = 1 Then 'true' ELSE 'false' END AS Submitted,
CASE WHEN Max(Convert(integer,Assessments.GPSubmit)) = 1 Then 'true' ELSE 'false' END AS GPSubmit,
CASE WHEN Max(Convert(integer,Assessments.QualForPay)) = 1 Then 'true' ELSE 'false' END AS QualForPay,
Clients.UserIds AS LinkedUsers
FROM Clients
Left JOIN Assessments ON Clients.Id = Assessments.ClientId
Left JOIN Layouts ON Layouts.Id = Assessments.LayoutId
GROUP BY Clients.Id, Clients.ClientRef, Clients.Title, Clients.Forename, Clients.Surname, [Address1] ,[Address2],[Address3],[Town],[County],[Postcode],Clients.Consent, Clients.Dob, Clients.IsMale,Clients.UserIds";//,Layouts.LayoutName, Layouts.SubmissionProcess
ORDER BY ClientRef
I was hoping there was an easier way to do the Contain element. As the pool of Ids would be smaller than the main pool.
A way I've speeded it up for now is. I've done a Stinrg.Join to the list of Ids and added them as a WHERE within the main SQL. This has reduced the time down to a seconds or so now.
Let's say I have intput_file.txt (user_id, event_code, event_date):
1,a,1
1,b,2
2,a,3
2,b,4
2,b,5
2,b,6
2,c,7
2,b,8
as you can see, user_id = 2, has events like this: abbbcb
I'd like to have a result like this:
1,{(a,1),(b,2)}
2,{(a,2),(b,6),(c,7),(b,8)}
So when we have few events, with the same code, I'd like to take only the last one.
Can you please share any hints?
Regards
Pawel
The main thing you are describing is what GROUP BY does.
In this case:
B = GROUP A BY user_id;
Gets your records together by user_id. Your data will now look like this:
1,{(a,1),(b,2)}
2,{(a,2),(b,6),(c,7),(b,8)}
You say you only want the last one (I assume you mean the one with the greatest event_date). To do this, you can do a nested FOREACH with an ORDER BY to sort by date, and then take the first one with LIMIT. Note that this has arbitrary behavior when there are ties.
C = FOREACH B {
DA = ORDER A BY event_date DESC;
DB = LIMIT DA 1;
GENERATE FLATTEN(group), FLATTEN(DB.event_code), FLATTEN(DB.event_date);
}
Your data should now look like this:
1,b,2
2,b,8
Another option would be to use a UDF to write some custom behavior on the groups given by GROUP BY:
B = GROUP A BY user_id;
C = FOREACH B GENERATE YourUDFThatYouBuilt(group, A);
In that UDF you'd write whatever custom behavior you want (in this case return the tuple with the greatest date)
It seems like you could use the DistinctBy UDF from Apache DataFu to achieve this. This UDF, given a bag, returns the first instance found for a given field. In your case the field you care about is event_code. But we have to reverse the order, as you actually want the last instance.
One clarification though. Correct me if I'm wrong, but I think the intended output is:
1,{(a,1),(b,2)}
2,{(a,3),(b,6),(c,7),(b,8)}
That is, the (a,3) event occurs for member 2. The (a,2) event occurs for member 1.
Here's how you can do it:
-- pass in 1 because we want distinct by event code (position 1)
define DistinctBy datafu.pig.bags.DistinctBy('1');
FOREACH (GROUP A BY user_id) {
-- reverse so we can take the last event code occurrence
A_reversed = ORDER A BY event_date DESC;
-- use DistinctBy to get the first tuple having an occurrence of a field value
A_distinct_by_code = DistinctBy(A_reversed);
-- put back in order again
A_ordered = ORDER A_distinct_by_code BY event_date ASC;
GENERATE group as user_id, A_ordered.(event_code,event_date);
}
I have a query below from source this is how Active flag for target is derived
select case when active_end_date is null then 'Y' else 'N' end
from csi_item_instances cii
where instance_id = <<INSTALL BASE ID>> --- (MP.INSTALL_BASE_ID)
I am comparing the active field value using the SQL below, is there better way to do this?
select * from stgdba.Stg_s_csi_item_instances cii, MDHDBA.M_CUSTOMER_PRODUCT mp
where cii.instance_id= MP.INSTALL_BASE_ID
and cii.active_end_date is null
and MP.ACTIVE_FLAG = 'N'
If that value is to be permanently calculated like that, you could do that in a view / computed column, which would make the logic a bit more permanent and not repeated all over the place.
(Stylistically, I would also try using ANSI joins a bit more.)
I am working on some legacy code, and I have the following wonderful issue. I am hoping some FoxPro experts can help!
The infrastructure of this legacy system is setup so that I have to use the built-in expression engine to return a result set, so no go on the SQL (i know that would be so much easier!)
Here is the issue.
I need to be able to do something like
PUBLIC ARRAY(20) ArrayOfValuesToFilterBy
SELECT dataTable
SET FILTER TO logicalField = .T. and otherField NOT INLIST(ArrayOfValuesToFilterBy)
However, I know this wont work, I just need the equivalency...not using SQL.
I can generate the list of values to filter by via SQL, just not the final record select due to the legacy infrastructure constraint.
Thanks!
First, a logical field you do not have to do explicit
set filter to Logicalfield = .t.
you can just do
set filter to LogicalField
or
set filter to NOT LogicalField
Next, on the array. VFP has a function ASCAN() which will scan an array for a value, if found, will return the row number within the array that matches what you are looking for.
As for arrays... ex:
DIMENSION MyArray[3]
MyArray[1] = "test1"
MyArray[2] = "something"
MyArray[3] = "anything else"
? ASCAN( MyArray, "else" ) && this will return 0
? ASCAN( MyArray, "anything else" ) && this will return 3
If you are doing a "set filter", the array needs to be "in scope" for the duration of the filter. If you set filter in a procedure the array exists, leave the procedure and the array is gone, you're done.
So, you could do
set filter to LogicalField and ASCAN( YourArray, StringColumnFromTable ) > 0
Now, if you want a subset to WORK WITH, you can do a SQL-Select and pull the data into a CURSOR (temporary read-write table) that has the same capabilities of the original table (except auto-increment when adding)...
I typically name my temporary cursors prefixed with "C_" for "CURSOR OF" so when I'm working with tables, I know if its production data, or just available for temp purposes for quicker display, presentation, extractions from other origins as needed.
use in select( "C_FinalRecords" )
select * from YourTable ;
where LogicalField ;
and ASCAN( YourArray, StringColumnFromTable ) > 0;
into cursor C_FinalRecords READWRITE
Then, you can just use that...
select C_FinalRecords
scan
do something with the record, or values of it...
endscan
or.. bind to a grid in a form, etc...
The INLIST() function takes an expression to search for and up to 24 expressions of the same data type to search.
SELECT dataTable
SET FILTER TO logicalField = .T. AND NOT INLIST(otherField, 'Value1', 'Value2', 'Value3', 'Value4')
I am making some assumptions here, that what you want to do is create a filter with a dynamic in list statement ? If this is correct have a play with this example :-
lcList1="ABCD"
lcList2="EFGH"
lcList3="IJKL"
lcList4="MNOP"
lcList5="QRST"
lcFullList=""
lcFullList=lcFullList+"'"+lcList1+"',"
lcFullList=lcFullList+"'"+lcList2+"',"
lcFullList=lcFullList+"'"+lcList3+"',"
lcFullList=lcFullList+"'"+lcList4+"',"
lcFullList=lcFullList+"'"+lcList5+"'"
lcField="PCode"
lcFilter="SET FILTER TO INLIST ("+lcField+","+lcFullList+")"
The results of the above would create the following filter statement and store it in lcFilter
SET FILTER TO INLIST (PCode,'ABCD','EFGH','IJKL','MNOP','QRST')
you can then use macro substitution
Select dataTable
&lcFilter
Bear in mind that there is likely to be some limitations on how many items you can define in a INLIST() statement
I'm still learning some of the PL/SQL differences, so this may be an easy question, but... here goes.
I have a cursor which grabs a bunch of records with multiple fields. I then run two separate SELECT statements in a LOOP from the cursor results to grab some distances and calculate those distances. These work perfectly.
When I go to update the table with the new values, my problem is that there are four pieces of specific criteria.
update work
set kilometers = calc_kilo,
kilo_test = test_kilo
where lc = rm.lc
AND ld = rm.ld
AND le = rm.le
AND lf = rm.lf
AND code = rm.code
AND lcode = rm.lcode
and user_id = username;
My problem is that this rarely updating because rm.lf and rm.le have NULL values in the database. How can I combat this, and create the correct update.
If I'm understanding you correctly, you want to match lf with rm.lf, including when they're both null? If that's what you want, then this will do it:
...
AND (lf = rm.lf
OR (lf IS NULL AND rm.lf IS NULL)
)
...
It's comparing the values of lf and rm.lf, which will return false if either is null, so the OR condition returns true if they're both null.
I have a cursor which grabs a bunch of records with multiple fields. I then run two separate SELECT statements in a LOOP from the cursor results to grab some distances and calculate those distances. These work perfectly.
When I go to update the table with the new values, my problem is that there are four pieces of specific criteria.
The first thing I'd look at is not using a cursor to read data, then make calculations, then perform updates. In 99% of cases it's faster and easier to just run updates that do all of this in a single step
update work
set kilometers = calc_kilo,
kilo_test = test_kilo
where lc = rm.lc
AND ld = rm.ld
AND NVL(le,'x') = NVL(rm.le,'x')
AND NVL(lf,'x') = NVL(rm.lf,'x')
AND code = rm.code
AND lcode = rm.lcode
and user_id = username;