PL/SQL: Functions - Oracle

I have three column values in an Excel sheet:
A: # of unsuccessful transfers to CCR (CTI) = 11986
B: # of calls NOT wrapped = 8585
C: # of wrapped calls = 15283
The total of the three columns should be # of incoming calls (CTI) = 37017 (i.e. # of wrapped calls + # of unsuccessful transfers to CCR (CTI) + # of calls NOT wrapped), but A + B + C only comes to 35854.
I also calculate # of unaccounted calls (this is # of incoming calls (CTI) minus # of wrapped calls, # of unsuccessful transfers to CCR (CTI) and # of calls NOT wrapped).
So my # of unaccounted calls = 1163.
Now I have to find out the percentage of unaccounted calls, so I divide 1163 by 37017.
So my percentage is about 3%; ideally it should be 0%. How do I find out in Oracle how much of that 3% should have fallen into A, B or C?

A, B and C come from database queries; the source is the same, but there is a bunch of different filters for each of the A, B and C queries.
Since the sum of the counts from the three queries with additional filters is lower than the count from the query without those filters, you seem to have a gap in the filters themselves. If I had to guess, the first place I'd look is for incorrect handling of null values, trying to equate them (since null is neither equal nor not equal to anything, even itself). But that's clearly speculation and, without seeing the filters and knowing which columns can be null, not very helpful.
You can maybe isolate the 1163 rows that aren't showing up by using minus to find the rows picked up by the 'total' query and not included by any of those producing A, B and C; something like:
select *
from xx_new.xx_cti_call_details@appsread.prd.com
where dealer_name = 'XYG'
and TRUNC(CREATION_DATE) BETWEEN '01-JUL-2012' AND '31-JUL-2012'
minus
select *
from xx_new.xx_cti_call_details@appsread.prd.com
where dealer_name = 'XYG'
and TRUNC(CREATION_DATE) BETWEEN '01-JUL-2012' AND '31-JUL-2012'
and <additional filters for A>
minus
select *
from xx_new.xx_cti_call_details@appsread.prd.com
where dealer_name = 'XYG'
and TRUNC(CREATION_DATE) BETWEEN '01-JUL-2012' AND '31-JUL-2012'
and <additional filters for B>
minus
select *
from xx_new.xx_cti_call_details@appsread.prd.com
where dealer_name = 'XYG'
and TRUNC(CREATION_DATE) BETWEEN '01-JUL-2012' AND '31-JUL-2012'
and <additional filters for C>
That might allow you to spot a pattern in the rows that aren't picked up by A, B or C, though you'd still need to work out which of the three queries you would have expected each row (or pattern of rows) to have been picked up by, and why it was missed.
I'm curious about you having a distinct in your initial query though, since it suggests you're counting the switches calls are made from rather than the calls themselves. It also might mean the counts should not add up - though in that case I'd perhaps expect A + B + C to be greater than the simple count, as there would be the potential for overlaps - and that select * might actually return more than 1163 rows; in which case you might only want to select the columns you think might be a problem.
Incidentally, if creation_date is indexed then you might get better performance with where creation_date >= date '2012-07-01' and creation_date < date '2012-08-01', as the trunc() function would prevent the index being used. That might not be an issue for you though.
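For illustration, a minimal sketch of that index-friendly version of the date filter, reusing the table and dealer_name predicate from the query above (the additional filters for A, B or C would be appended in the same way):
select count(*)
from xx_new.xx_cti_call_details@appsread.prd.com
where dealer_name = 'XYG'
and creation_date >= date '2012-07-01'
and creation_date < date '2012-08-01';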

Related

How to restrict query result from multiple instances of overlapping date ranges in Django ORM

First off, I admit that I am not sure whether what I am trying to achieve is possible (or even logical). Still I am putting forth this query (and if nothing else, at least be told that I need to redesign my table structure / business logic).
In a table (myValueTable) I have the following records:
Item  article  from_date   to_date     myStock
1     Paper    01/04/2021  31/12/9999  100
2     Tray     12/04/2021  31/12/9999  12
3     Paper    28/04/2021  31/12/9999  150
4     Paper    06/05/2021  31/12/9999  130
As part of the underlying process, I am to find out the value (of field myStock) as on a particular date, say 30/04/2021 (assuming no inward / outward stock movement in the interim).
To that end, I have the following values:
varRefDate = 30/04/2021
varArticle = "Paper"
And my query goes something like this:
get_value = myValueTable.objects.filter(from_date__lte=varRefDate, to_date__gte=varRefDate).get(article=varArticle).myStock
which should translate to:
get_value = SELECT myStock FROM myValueTable WHERE varRefDate BETWEEN from_date AND to_date
But with this I am coming up with more than one result (actually THREE!).
How do I restrict the query result to get ONLY the 3rd instance i.e. the one with value "150" (for article = "paper")?
NOTE: The upper limit of date range (to_date) is being kept constant at 31/12/9999.
Edit
Solved it, in a roundabout manner. Instead of .get, I resorted to generating a values_list with the fields from_date and myStock. Using the objects returned, I appended to a list the date difference between from_date and the reference date (30/04/2021) together with the value of the myStock field, then sorted the generated list (ascending). The first tuple in the sorted list has the least date difference and the corresponding myStock value, and that is the value I am searching for. Tested and works.
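A minimal sketch of that roundabout approach, assuming the model and variable names used above (myValueTable, varRefDate as a datetime.date, varArticle):
# Filter as before, but read the candidate rows instead of calling .get()
candidates = myValueTable.objects.filter(
    article=varArticle,
    from_date__lte=varRefDate,
    to_date__gte=varRefDate,
).values_list("from_date", "myStock")
# Pair each row with its distance from the reference date and sort ascending;
# the first tuple belongs to the row whose from_date is closest to varRefDate.
ranked = sorted((varRefDate - from_date, stock) for from_date, stock in candidates)
get_value = ranked[0][1] if ranked else None
The same result can be reached more directly by ordering the queryset, e.g. appending .order_by('-from_date').first() to the filter instead of calling .get(...).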

Cypher: slow query optimization

I am using redisgraph with a custom implementation of ioredis.
The query runs for 3 to 6 seconds on a database that has millions of nodes. It basically filters (b:brand) by different relationship counts, adding the following MATCH and WHERE clauses multiple times for different nodes.
(:brand) - 1mil nodes
(:w) - 20mil nodes
(:e) - 10mil nodes
// matching b before this codeblock
MATCH (b)-[:r1]->(p:p)<-[:r2]-(w:w)
WHERE w.deleted IS NULL
WITH count(DISTINCT w) as count, b
WHERE count >= 0 AND count <= 10
The full query would look like this.
MATCH (b:brand)
WHERE b.deleted IS NULL
MATCH (b)-[:r1]->(p:p)<-[:r2]-(w:w)
WHERE w.deleted IS NULL
WITH count(DISTINCT w) as count, b
WHERE count >= 0 AND count <= 10
MATCH (c)-[:r3]->(d:d)<-[:r4]-(e:e)
WHERE e.deleted IS NULL
WITH count(DISTINCT e) as count, b
WHERE count >= 0 AND count <= 10
WITH b ORDER by b.name asc
WITH count(b) as totalCount, collect({id: b.id})[$cursor..($cursor+$limit)] AS brands
RETURN brands, totalCount
How can I optimize this query as it's really slow?
A few thoughts:
Property lookups are expensive; is there a way you can get around all the .deleted checks?
If possible, can you avoid naming r1, r2, etc.? It's faster when it doesn't have to check the relationship type.
You're essentially traversing the entire graph several times. If the paths b-->p<--w and c-->d<--e don't overlap, you can include them both in the MATCH statement, separated by a comma, and aggregate both counts at once (see the sketch after the update below)
I don't know if it'll help much, but you don't need to name p and d since you never refer to them
This is a very small improvement, but I don't see a reason to check count >= 0
Also, I'm sure you have your reasons, but why does the c-->d<--e path matter? This would make more sense to me if it were b-->d<--e to mirror the first portion.
EDIT/UPDATE: A few things I said need clarification:
First bullet:
The fastest lookup is on a node label; up to 4 labels are essentially O(0). (Well, that's for anchor nodes; it's slower for downstream nodes.)
The second-fastest lookup is on an INDEXED property. My comment above assumed UNINDEXED lookups.
Second bullet: I think I was just wrong here. Relationships are stored as doubly-linked lists grouped by relationship type. Therefore, always specify relationship type for better performance. Similarly, always specify direction.
Third bullet: What I said is generally correct, HOWEVER beware of Cartesian joins when you have two match patterns separated by a comma. In general, you would only use that structure when you have a common element, like when you want directors, actors, and cinematographers all connected to a movie. Still, there is no overlap between these paths here.
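Putting those thoughts together, a hedged sketch of how the query might be restructured - it assumes the second pattern was really meant to hang off b rather than the unbound c (as questioned above), keeps the relationship types and directions per the update, drops the redundant count >= 0 checks, and aggregates both counts in a single pass:
MATCH (b:brand)
WHERE b.deleted IS NULL
MATCH (b)-[:r1]->(:p)<-[:r2]-(w:w), (b)-[:r3]->(:d)<-[:r4]-(e:e)
WHERE w.deleted IS NULL AND e.deleted IS NULL
WITH b, count(DISTINCT w) AS wCount, count(DISTINCT e) AS eCount
WHERE wCount <= 10 AND eCount <= 10
WITH b ORDER BY b.name ASC
WITH count(b) AS totalCount, collect({id: b.id})[$cursor..($cursor + $limit)] AS brands
RETURN brands, totalCount
One caveat: a brand with no matching w or e at all drops out of the combined MATCH entirely, so if a zero count should still satisfy count <= 10, OPTIONAL MATCH with separate aggregations would be needed instead.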

Strange behaviour when using FILTER to filter a different table with no direct relationship?

I have two fact tables, First and Second, and two dimension tables, dimTime and dimColour.
Fact table First looks like this:
and fact table Second looks like this:
Both dim-tables have 1:* relationships to both fact tables and the filtering is one-directional (from dim to fact), like this:
dimColour[Color] 1 -> * First[Colour]
dimColour[Color] 1 -> * Second[Colour]
dimTime[Time] 1 -> * First[Time]
dimTime[Time] 1 -> * Second[Time_]
Adding the following measure, I would expect the FILTER function not to have any effect on the calculation, since Second does not filter First, right?
Test_Alone =
CALCULATE (
    SUM ( First[Amount] );
    First[Alone] = "Y";
    FILTER (
        'Second';
        'Second'[Colour] = "Red"
    )
)
So this should evaluate to 7, since only two rows in First have [Alone] = "Y" (with values 1 and 6), and there is no direct relationship between First and Second. However, it evaluates to 6. If I remove the FILTER argument from the CALCULATE, it evaluates to 7.
There are three additional measures in the attached pbix file which show the same type of behaviour.
How is filtering one fact table which has no direct relationship to a second fact table affecting the calculation done on the second table?
Zipped Power BI file: PowerBIFileDownload
Evaluating the table reference 'Second' produces a table that includes both the columns of the Second table and those of all its (transitive) parents.
In this case, this is a table with all of the columns in dimColour, dimTime, Second.
You can't see this if you just run:
evaluate 'Second'
as when 'evaluate' returns the results to the user, these "Parent Table" (or "Related") columns are not included.
Even so, these columns are certainly present.
When a table is converted to a row context, these related columns become available via RELATED.
See the following queries:
evaluate FILTER('Second', ISBLANK(RELATED(dimColour[Color])))
evaluate 'Second' order by RELATED(dimTime[Hour])
Similarly, when arguments to CALCULATE are used to update the filter context, these hidden "Related" columns are not ignored; hence, in your example, they can end up filtering First. You can see this by using a function that strips the related columns, such as INTERSECT:
Test_ActuallyAlone =
CALCULATE (
    SUM ( First[Amount] ),
    First[Alone] = "Y",
    //This filter now does nothing, as none of the columns in Second
    //have an impact on 'SUM ( First[Amount] )'; and the related columns
    //are removed by the INTERSECT.
    FILTER (
        INTERSECT ( 'Second', 'Second' ),
        'Second'[Colour] = "Red"
    )
)
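For comparison, a sketch of a measure (the name Test_ColumnFilter is mine, not from the pbix) that filters only the 'Second'[Colour] column rather than the whole table; a single-column filter does not carry the expanded (related) dimColour/dimTime columns with it, so it should leave First unaffected and return 7:
Test_ColumnFilter =
CALCULATE (
    SUM ( First[Amount] ),
    First[Alone] = "Y",
    // Filtering one column, instead of the whole 'Second' table, keeps the
    // related dimColour/dimTime columns out of the filter context.
    FILTER ( ALL ( 'Second'[Colour] ), 'Second'[Colour] = "Red" )
)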
(See these resources that describe the "Expanded Table" concept - an alternative but equivalent explanation of this behaviour:
https://www.sqlbi.com/articles/expanded-tables-in-dax/
https://www.sqlbi.com/articles/context-transition-and-expanded-tables/
)

How to filter clickhouse table by array column contents?

I have a ClickHouse table that has one Array(UInt16) column. I want to be able to filter results from this table to only get rows where the values in the array column are above a threshold value. I've been trying to achieve this using some of the array functions (arrayFilter and arrayExists) but I'm not familiar enough with the SQL/ClickHouse query syntax to get this working.
I've created the table using:
CREATE TABLE IF NOT EXISTS ArrayTest (
date Date,
sessionSecond UInt16,
distance Array(UInt16)
) Engine = MergeTree(date, (date, sessionSecond), 8192);
Where the distance values will be distances from a certain point at a certain amount of seconds (sessionSecond) after the date. I've added some sample values so the table looks like the following:
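(The screenshot of the sample rows is not reproduced here; as a stand-in, the following hypothetical INSERT produces data consistent with the behaviour described below - two of the three rows contain a value above 7, every row contains a value above 2, and no array has more than 7 elements.)
INSERT INTO ArrayTest VALUES
    ('2018-06-01', 1, [1, 2, 3, 4]),
    ('2018-06-01', 2, [3, 5, 8, 12]),
    ('2018-06-01', 3, [4, 6, 9, 10]);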
Now I want to get all rows which contain distances greater than 7. I found the array operators documentation here and tried the arrayExists function but it's not working how I'd expect. From the documentation, it says that this function "Returns 1 if there is at least one element in 'arr' for which 'func' returns something other than 0. Otherwise, it returns 0". But when I run the query below I get three zeros returned where I should get a 0 and two ones:
SELECT arrayExists(
val -> val > 7,
arrayEnumerate(distance))
FROM ArrayTest;
Eventually I want to perform this select and then join it with the table contents to only return rows that have an exists = 1 but I need this first step to work before that. Am I using the arrayExists wrong? What I found more confusing is that when I change the comparison value to 2 I get all 1s back. Can this kind of filtering be achieved using the array functions?
Thanks
You can use arrayExists in the WHERE clause.
SELECT *
FROM ArrayTest
WHERE arrayExists(x -> x > 7, distance) = 1;
Another way is to use ARRAY JOIN, if you need to know which values are greater than 7:
SELECT d, distance, sessionSecond
FROM ArrayTest
ARRAY JOIN distance as d
WHERE d > 7
I think the reason why you get 3 zeros is that arrayEnumerate enumerates over the array indexes, not the array values, and since none of your arrays has more than 7 elements, arrayExists over the indexes returns 0 for all the rows.
To make this work:
SELECT arrayExists(
val -> distance[val] > 7,
arrayEnumerate(distance))
FROM ArrayTest;
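Since the question also mentions arrayFilter, here is a small additional sketch (same table and columns as above) showing how it could be used to return only the values that exceed the threshold:
SELECT
    sessionSecond,
    arrayFilter(x -> x > 7, distance) AS distancesOver7
FROM ArrayTest
WHERE arrayExists(x -> x > 7, distance)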

Slicing neo4j Cypher results in chunks

I want to slice Cypher results in chunks of 100 rows, and be able to retrieve a specific chunk.
At the moment, the only way to ensure that rows are not mixed up is to use ORDER BY, which makes the query very inefficient (3 sec. for me is too much):
MATCH (p:Person) RETURN p.id ORDER BY p.id SKIP {chunk}*100 LIMIT 100
where {chunk} is an external parameter to identify a specific chunk.
Any suggestions?
PS: the property p.id is indexed.
You may try something like adding a label to the Person nodes before extracting chunks and then using a query like:
// assumes the :Chunk label has already been applied to the Person nodes still to be processed
MATCH (p:Chunk:Person) WITH p LIMIT 100
REMOVE p:Chunk
RETURN *
If the p.id values are unique and dense (say, the value starts at 1 and increments, without any gaps), then this query will take advantage of the index on :Person(id) to efficiently get each hundred-Person chunk:
WITH (({chunk} - 1) * 100 + 1) AS startId
MATCH (p:Person)
WHERE p.id IN RANGE(startId, startId + 99)
RETURN p.id
ORDER BY p.id
Now, practically speaking, your id space will probably not remain dense, even if it started out that way. Person nodes will be deleted over time. In that case, the above query can return fewer than 100 rows. So, you can make your chunk size bigger than 100 and do some post-processing to get the 100 you need. In the worst case, you may need to make multiple requests to get the 100 you need, but each request will be fast. (Ideally, you would want to assign no-longer-used id values to new Person nodes, to fill up gaps in the id space -- but this would require you to scan for the gaps.)
