ALTER TABLE
src_branches ADD CONSTRAINT bmod_to_id_check(
CASE WHEN bmod 7 THEN CASE WHEN id BETWEEN 0 AND 2 THEN 1 ELSE 0
CASE WHEN bmod 6 THEN CASE WHEN id BETWEEN 600 AND 699 THEN 1 ELSE 0
CASE WHEN bmod 5 THEN CASE WHEN id BETWEEN 500 AND 599 THEN 1 ELSE 0
CASE WHEN bmod 4 THEN CASE WHEN id BETWEEN 400 AND 499 THEN 1 ELSE 0
CASE WHEN bmod 3 THEN CASE WHEN id BETWEEN 300 AND 499 THEN 1 ELSE 0
CASE WHEN bmod 2 THEN CASE WHEN id BETWEEN 2 AND 300 THEN 1 ELSE 0
CASE WHEN bmod 1 THEN CASE WHEN id BETWEEN 2 AND 300 THEN 1 ELSE 0
END ELSE 1
END = 1
);
// my goal is keep my id in a set range when bmod = some integer
// i'm open to alternatives
Pay attention to the syntax of the CASE operator:
ALTER TABLE
src_branches ADD CONSTRAINT bmod_to_id check(
case bmod when 7 then id BETWEEN 0 AND 2
when 6 then id BETWEEN 600 AND 699
when 5 then id BETWEEN 500 AND 599
when 4 then id BETWEEN 400 AND 499
when 3 then id BETWEEN 300 AND 499
when 2 then id BETWEEN 2 AND 300
when 1 then id BETWEEN 2 AND 300
else 0
end)
So id BETWEEN x AND y is an expression that is true (1) or false (0).
Note the constraint syntax too.
ref: fiddle
Related
I recently worked on a task where I needed to identify new clients.
I managed to find something similar on google and the final result was this measure that I don't understand
and maybe you can help me understand the logic behind this measure. I obviously thought wrongly that it should be >=MIN(Sheet1[Data])))
not <MIN(Sheet1[Data])))
I improvised some data along with the formula.
new_cust =
CALCULATE(
DISTINCTCOUNT(Sheet1[Cust_id])
,FILTER(
ALL(Sheet1[Data])
,Sheet1[Data]<=MAX(Sheet1[Data])
)
)
-
CALCULATE(
DISTINCTCOUNT(Sheet1[Cust_id])
,FILTER(
ALL(Sheet1[Data])
,Sheet1[Data]<MIN(Sheet1[Data])
)
)
Cust_id Data New_Cust
1 1/1/2023 1
1 1/2/2023 0
2 1/3/2023 1
2 1/4/2023 0
2 1/5/2023 0
3 1/6/2023 1
3 1/7/2023 0
1 2/1/2023 0
1 2/2/2023 0
3 2/3/2023 0
3 2/4/2023 0
3 2/5/2023 0
4 2/6/2023 1
4 2/7/2023 0
4 2/8/2023 0
1 3/1/2023 0
1 3/2/2023 0
2 3/3/2023 0
2 3/4/2023 0
3 3/5/2023 0
3 3/6/2023 0
4 3/7/2023 0
4 3/8/2023 0
6 3/9/2023 1
6 3/10/2023 0
Thank you in advance for your understanding and help
I'd like to make "SORTKEY" like the below. It's not the same observations for each one.
Basically, each one is 3 obs but if flg=1 then "SORTKEY" includes that observation.
In this example, it means SORTKEY = 2 is 4 obs, SORTKEY ^=2 is 3 obs.
Is there the way to make the SORTKEY manually?. If you have a good idea, please give me some advice.
I want the following dataset, using the "test" dataset.
/*
SORTKEY NO FLG
1 1 0
1 2 0
1 3 0
2 4 0
2 5 0
2 6 0
2 7 1
3 8 0
3 9 0
3 10 0
*/
data test;
input no flg;
cards;
1 0
2 0
3 0
4 0
5 0
6 0
7 1
8 0
9 0
10 0
;
run;
Use a sequence counter to track the 3-rows-per-sortkey requirement.
Example:
data want;
set have;
retain sortkey 1;
seq+1;
if seq > 3 and flag ne 1 then do;
seq = 1;
sortkey+1;
end;
run;
a16s
id pic
1 1.jpg
2 2.jpg
3 3.jpg
4 4.jpg
a16s_like
id p_id u_id approve
1 1 2 0
2 1 1 1
3 1 5 1
4 1 6 1
5 1 7 0
6 2 2 0
7 2 3 0
8 2 1 1
9 4 4 0
10 4 3 1
11 4 2 1
SELECT
A.id,
A.PIC,
SUM(CASE WHEN B.approve IS NULL THEN 0 ELSE 1 END) AS Ashowcunt,
SUM(CASE WHEN B.approve=0 THEN 1 ELSE 0 END) AS Nshow,
SUM(CASE WHEN B.approve=1 THEN 1 ELSE 0 END) AS Yshow,
B.approve,
SUM(CASE WHEN B.approve=1 AND B.u_id=3 THEN 1 when B.approve=0 AND B.u_id=3 then 0 ELSE null END) AS U_id3show
FROM a16s AS A
LEFT JOIN a16s_like AS B ON A.ID = B.p_id
GROUP BY A.id,A.pic
to get the list and work well on mysql 5.7
when the u_id=2 to excute the select , I get
id pic Ashowocunt approve_0_count approve_1_count u_id2_approve
1 1.jpg 5 2 3 0
2 2.jpg 3 2 1 0
3 3.jpg 0 0 0 null
4. 4.jpg 3 0 3 0
u_id=3
id pic Ashowocunt approve_0_count approve_1_count u_id3_approve
1 1.jpg 0 0 0 null
2 2.jpg 0 1 0 0
3 3.jpg 0 0 0 null
4. 4.jpg 1 0 1 1
when I change the sql to laravel
$search_alls=
DB::select('A.id','A.route','B.approve')
->addSelect(DB::raw('SUM(CASE WHEN B.approve IS NULL THEN 0 ELSE 1 END) as Ashowcount'))
->addSelect(DB::raw('SUM(CASE WHEN B.approve = 0 THEN 1 ELSE 0 END) as Nshow'))
->addSelect(DB::raw('SUM(CASE WHEN B.approve = 1 THEN 1 ELSE 0 END) as Yshow'))
->addSelect(DB::raw('SUM(CASE WHEN B.approve = 1 AND b.u_id = 2 then 1
when B.approve = 0 AND b.u_id = 2 then 0 ELSE null END) as U_idshow'))
->from('a16s as A')
->join('a16s_like as B', function($join) {
$join->on('A.ID', '=', 'B.p_id');
})
->groupBy('A.id')
->orderby('A.id', 'DESC')
->paginate(12);
return View('comefo.results')
->with('search_alls', $search_alls)
->with('table',$table);
I got the error
Symfony \ Component \ Debug \ Exception \ FatalThrowableError (E_RECOVERABLE_ERROR)
Type error: Argument 1 passed to Illuminate\Database\Connection::prepareBindings() must be of the type array, string given, called in D:\AppServ\www\product\vendor\laravel\framework\src\Illuminate\Database\Connection.php on line 665
You have to explicitly create a query:
DB::query()->select(...
You place the table name in the wrong place. Use table method.
echo DB::table('a16s as A')
->select('A.id','A.route','B.approve')
...
->orderby('A.id', 'DESC')->toSql();
reveals normal sql like
select A.id, A.route, B.approve, SUM(CASE WHEN B.approve
IS NULL THEN 0 ELSE 1 END) as Ashowcount, SUM(CASE WHEN B.approve = 0
THEN 1 ELSE 0 END) as Nshow, SUM(CASE WHEN B.approve = 1 THEN 1 ELSE 0
END) as Yshow, SUM(CASE WHEN B.approve = 1 AND b.u_id = 2 then 1 when
B.approve = 0 AND b.u_id = 2 then 0 ELSE null END) as U_idshow from
a16s as A inner join a16s_like as B on A.ID = B.p_id
group by A.id order by A.id desc
This little code snippet is supposed to loop through a sorted data frame. It keeps a count of how many successive rows have the same information in columns aIndex and cIndex and also bIndex and dIndex. If these are the same, it deposits the count and increments it for the next time around, and if they differ, it deposits the count and resets it to 1 for the next time around.
for (i in 1:nrow(myFrame)) {
if (myFrame[i, aIndex] == myFrame[i, cIndex] &
myFrame[i, bIndex] == myFrame[i, dIndex]) {
myFrame[i, eIndex] <- count
count <- (count + 1)
} else {
myFrame[i, eIndex] <- count
count <- 1
}
}
It's been running for a long time now. I understand that I'm supposed to vectorize whenever possible, but I'm not really seeing it here. What am I supposed to do to make this faster?
Here's what an example few rows should look like after running:
aIndex bIndex cIndex dIndex eIndex
1 2 1 2 1
1 2 1 2 2
1 2 4 8 3
4 8 1 4 1
1 4 1 4 1
I think this will do what you want; the tricky part is that the count resets after the difference, which effectively puts a shift on the eIndex.
There (hopefully) is an easier way to do this, but this is what I came up with.
tmprle <- rle(((myFrame$aIndex == myFrame$cIndex) &
(myFrame$bIndex == myFrame$dIndex)))
myFrame$eIndex <- c(1,
unlist(ifelse(tmprle$values,
Vectorize(seq.default)(from = 2,
length = tmprle$lengths),
lapply(tmprle$lengths,
function(x) {rep(1, each = x)})))
)[-(nrow(myFrame)+1)]
which gives
> myFrame
aIndex bIndex cIndex dIndex eIndex
1 1 2 1 2 1
2 1 2 1 2 2
3 1 2 4 8 3
4 4 8 1 4 1
5 1 4 1 4 1
Maybe this will work. I have reworked the rle and sequence bits.
dat <- read.table(text="aIndex bIndex cIndex dIndex
1 2 1 2
1 2 1 2
1 2 4 8
4 8 1 4
1 4 1 4", header=TRUE, as.is=TRUE,sep = " ")
dat$eIndex <-NA
#identify rows where a=c and b=d, multiply by 1 to get a numeric vector
dat$id<-(dat$aIndex==dat$cIndex & dat$bIndex==dat$dIndex)*1
#identify sequence
runs <- rle(dat$id)
#create sequence, multiply by id to keep only identicals, +1 at the end
count <-sequence(runs$lengths)*dat$id+1
#shift sequence down one notch, start with 1
dat$eIndex <-c(1,count[-length(count)])
dat
aIndex bIndex cIndex dIndex eIndex id
1 1 2 1 2 1 1
2 1 2 1 2 2 1
3 1 2 4 8 3 0
4 4 8 1 4 1 0
5 1 4 1 4 1 1
create or replace procedure prcdr_Clustering is
v_sampleCount number;
v_sampleFlag number;
v_matchPercent number;
v_SpendAmount Number(18, 2);
cursor cur_PDCSample is
SELECT *
FROM TBL_BIL
WHERE UDF_CHK = 'N';
rec_Pdcsample TBL_BIL%rowtype;
BEGIN
OPEN cur_PDCSample;
LOOP
FETCH cur_PDCSample
into rec_Pdcsample;
EXIT WHEN cur_PDCSample%NOTFOUND;
SELECT COUNT(*)
INTO v_sampleCount
FROM TBL_BIL
WHERE UDF_TOKENIZED = rec_Pdcsample.UDF_TOKENIZED;
IF v_sampleCount <> 0 THEN
UPDATE TBL_BIL
SET UDF_CHK = 'Y'
WHERE UDF_TOKENIZED = rec_Pdcsample.UDF_TOKENIZED;
IF v_sampleCount > 1 THEN
v_sampleFlag := 1;
ELSE
IF v_sampleCount = 1 THEN
v_sampleFlag := 2;
ELSE
v_sampleFlag := 0;
END IF;
END IF;
UPDATE TBL_BIL
SET UDF_SAMPLECOUNT = v_sampleCount, UDF_SAMPLEFLAG = v_sampleFlag
WHERE uniqueid = rec_Pdcsample.uniqueid;
UPDATE TBL_BIL
SET UDF_PID = rec_Pdcsample.uniqueid
WHERE UDF_TOKENIZED = rec_Pdcsample.UDF_TOKENIZED;
UPDATE TBL_BIL
SET UDF_PIDSPEND = v_SpendAmount
WHERE uniqueid = rec_Pdcsample.uniqueid;
UPDATE TBL_BIL
SET UDF_MATCHPERCENT = 1
WHERE uniqueid <> rec_Pdcsample.uniqueid
AND UDF_TOKENIZED = rec_Pdcsample.UDF_TOKENIZED;
END IF;
IF cur_PDCSample%ISOPEN THEN
CLOSE cur_PDCSample;
END IF;
OPEN cur_PDCSample;
END LOOP;
IF cur_PDCSample%ISOPEN THEN
CLOSE cur_PDCSample;
END IF;
end PrcdrClustering;
It takes me days to execute, my table has 225,846 rows of data.
The structure of my table is :-
UNIQUEID NUMBER Notnull primary key
VENDORNAME VARCHAR2(200)
SHORTTEXT VARCHAR2(500)
SPENDAMT NUMBER(18,2)
UDF_TOKENIZED VARCHAR2(999)
UDF_PID NUMBER(10)
UDF_SAMPLEFLAG NUMBER(4)
UDF_SAMPLECOUNT NUMBER(4)
UDF_MATCHPERCENT NUMBER(4)
UDF_TOKENCNT NUMBER(4)
UDF_PIDSPEND NUMBER(18,2)
UDF_CHK VARCHAR2(1)
Where to start? I've a number points to make.
You're doing bulk updates; this implies that bulk collect ... forall would be far more efficient.
You're doing multiple updates of the same table, which doubles the amount of DML.
As you've already selected from the table, re-entering it to do another count is pretty pointless, use an analytic function to get the result you need.
Indentation, indentation, indentation. Makes your code much easier to read.
You can use elsif to reduce the amount of statements to be evaluated ( very, very minor win )
If the uniqueid is unique you can use rowid to update the table.
You're updating udf_pidspend to null, whether this is intentional or not there's no need to do a separate update for it.
You can do a lot more in the cursor, but there's obviously no need to select everything, which'll decrease the amount of data you need to read from the disks.
You may need a couple of commits in there; though this means you can't rollback if it fails midway.
I hope tbl_bil is indexed on uniqueid
As GolzeTrol noted you're opening the cursor multiple times. There's no need for this.
As general rules:
If you're going to select / update or delete from a table do it once if possible and as few times as possible if not.
If you're doing bulk operations use bulk collect.
Never write select *
Use rowid where possible it avoids all index problems.
This will only work in 11G, I answered this question recently where I provided my own way of dealing with this implementation restriction in versions prior to 11G and linked to Ollie's, Tom Kyte's and Sathya's
I'm not entirely certain what you're trying to do here so please forgive me if the logic is a little off.
create or replace procedure prcdr_Clustering is
cursor c_pdcsample is
select rowid as rid
, count(*) over ( partition by udf_tokenized ) as samplecount
, udf_chk
, max(uniqueid) over ( partition by udf_tokenized ) as udf_pid
from tbl_bil
where udf_chk = 'N';
type t__pdcsample is table of c_pdcsample%rowtype index by binary_integer;
t_pdcsample t__pdcsample;
begin
open c_pdcsample;
loop
fetch c_pdcsample bulk collect into t_pdcsample limit 1000;
exit when t_pdcsample.count = 0;
if t_pdcsample.samplecount <> 0 then
t_pdcsample.udf_chk := 'y';
if t_pdcsample.samplecount > 1 then
t_pdcsample.samplecount := 1;
elsif t_pdcsample.samplecount = 1 then
t_pdcsample.samplecount := 2;
else
t_pdcsample.samplecount := 0;
end if;
end if;
forall i in t_pdcsample.first .. t_pdcsample.last
update tbl_bil
set udfsamplecount = t_pdcsample.samplecount
, udf_sampleflag = t_pdcsample.sampleflag
, udf_pidspend = null
, udf_pid = t_pdcsample.udf_pid
where rowid = t_pdcsample(i).rowid
;
for i in t_pdcsample.first .. t_pdcsample.last loop
update tbl_bil TBL_BIL
set udfmatchpercent = 1
where uniqueid <> t_pdcsample.uniqueid
and udf_tokenized = t_pdcsample.udf_tokenized;
end loop;
commit ;
end loop;
close c_pdcsample;
end PrcdrClustering;
/
Lastly calling all tables tbl_... is a little bit unnecessary.
Here is a variant using a single SQL statement. I'm not 100% certain that the logic is exactly the same, but for my test set, it is. Also the current procedure is non deterministic when you have more than one record with udf_chk = 'N' and the same udf_tokenized ...
This is the refactored procedure
SQL> create procedure prcdr_clustering_refactored
2 is
3 begin
4 merge into tbl_bil t
5 using ( select tb1.uniqueid
6 , count(*) over (partition by tb1.udf_tokenized) cnt
7 , max(decode(udf_chk,'N',uniqueid)) over (partition by tb1.udf_tokenized order by tb1.udf_chk) pid
8 from tbl_bil tb1
9 where udf_chk = 'N'
10 or exists
11 ( select 'dummy'
12 from tbl_bil tb2
13 where tb2.udf_tokenized = tb1.udf_tokenized
14 )
15 ) q
16 on ( t.uniqueid = q.uniqueid )
17 when matched then
18 update
19 set t.udf_samplecount = decode(t.udf_chk,'N',q.cnt,t.udf_samplecount)
20 , t.udf_sampleflag = decode(t.udf_chk,'N',decode(q.cnt,1,2,1),t.udf_sampleflag)
21 , t.udf_pid = q.pid
22 , t.udf_pidspend = decode(t.udf_chk,'N',null,t.udf_pidspend)
23 , t.udf_matchpercent = decode(t.udf_chk,'N',t.udf_matchpercent,1)
24 , t.udf_chk = 'Y'
25 ;
26 end;
27 /
Procedure created.
And here is a test:
SQL> select *
2 from tbl_bil
3 order by uniqueid
4 /
UNIQUEID VENDORNAME SHORTTEXT SPENDAMT UDF_TOKENI UDF_PID UDF_SAMPLEFLAG UDF_SAMPLECOUNT UDF_MATCHPERCENT UDF_TOKENCNT UDF_PIDSPEND U
-------- ---------- ---------- -------- ---------- ------- -------------- --------------- ---------------- ------------ ------------ -
1 a a 1 bl 0 0 0 0 0 0 N
2 a a 1 bla 0 0 0 0 0 0 N
3 a a 1 bla 0 0 0 0 0 0 Y
4 a a 1 bla 0 0 0 0 0 0 Y
5 a a 1 bla 0 0 0 0 0 0 Y
6 a a 1 blah 0 0 0 0 0 0 N
7 a a 1 blah 0 0 0 0 0 0 Y
8 a a 1 blah 0 0 0 0 0 0 Y
9 a a 1 blah 0 0 0 0 0 0 Y
10 a a 1 blah 0 0 0 0 0 0 Y
11 a a 1 blah 0 0 0 0 0 0 Y
11 rows selected.
SQL> exec prcdr_clustering
PL/SQL procedure successfully completed.
SQL> select *
2 from tbl_bil
3 order by uniqueid
4 /
UNIQUEID VENDORNAME SHORTTEXT SPENDAMT UDF_TOKENI UDF_PID UDF_SAMPLEFLAG UDF_SAMPLECOUNT UDF_MATCHPERCENT UDF_TOKENCNT UDF_PIDSPEND U
-------- ---------- ---------- -------- ---------- ------- -------------- --------------- ---------------- ------------ ------------ -
1 a a 1 bl 1 2 1 0 0 Y
2 a a 1 bla 2 1 4 0 0 Y
3 a a 1 bla 2 0 0 1 0 0 Y
4 a a 1 bla 2 0 0 1 0 0 Y
5 a a 1 bla 2 0 0 1 0 0 Y
6 a a 1 blah 6 1 6 0 0 Y
7 a a 1 blah 6 0 0 1 0 0 Y
8 a a 1 blah 6 0 0 1 0 0 Y
9 a a 1 blah 6 0 0 1 0 0 Y
10 a a 1 blah 6 0 0 1 0 0 Y
11 a a 1 blah 6 0 0 1 0 0 Y
11 rows selected.
SQL> rollback
2 /
Rollback complete.
SQL> exec prcdr_clustering_refactored
PL/SQL procedure successfully completed.
SQL> select *
2 from tbl_bil
3 order by uniqueid
4 /
UNIQUEID VENDORNAME SHORTTEXT SPENDAMT UDF_TOKENI UDF_PID UDF_SAMPLEFLAG UDF_SAMPLECOUNT UDF_MATCHPERCENT UDF_TOKENCNT UDF_PIDSPEND U
-------- ---------- ---------- -------- ---------- ------- -------------- --------------- ---------------- ------------ ------------ -
1 a a 1 bl 1 2 1 0 0 Y
2 a a 1 bla 2 1 4 0 0 Y
3 a a 1 bla 2 0 0 1 0 0 Y
4 a a 1 bla 2 0 0 1 0 0 Y
5 a a 1 bla 2 0 0 1 0 0 Y
6 a a 1 blah 6 1 6 0 0 Y
7 a a 1 blah 6 0 0 1 0 0 Y
8 a a 1 blah 6 0 0 1 0 0 Y
9 a a 1 blah 6 0 0 1 0 0 Y
10 a a 1 blah 6 0 0 1 0 0 Y
11 a a 1 blah 6 0 0 1 0 0 Y
11 rows selected.
Regards,
Rob.
I don't know why, but you open the cur_PDCSample, which select (I suspect) thousands of records. And then, in a loop, you close the cursor and reopen it, each time processing only the first record that is returned.
If you open the cursor once, process each record and then close it, your procedure will probably go a lot faster.
Actually, since you do not always update TBL_BIL.UDF_CHK to 'Y', it seems to me that your current procedure may run infinitely.