How do I count one value and then count distinct another in Qlik

I am trying to count how many times all my machines stopped, and then I want a distinct count to see how many machines have stopped.
So: how many times they stopped, and how many of the machines stopped. This is what I tried, but it is not valid syntax:
count(DISTINCT MachineNumber if ({$<Program = {"DETECT STOP"}>}) MachineNumber)
Thanks!

You need two expressions.
Number of Machines:
count({$<Program = {"DETECT STOP"}>} DISTINCT MachineNumber)
Number of Stop Events:
count({$<Program = {"DETECT STOP"}>} MachineNumber)

Related

How to get a distinct count based on date

I have a requirement to get a distinct count of people we offered a job. The problem is that since we can offer multiple jobs to a potential candidate, when I write my query it counts multiple offers. The requirement is to count only the first offer; any subsequent offer should not count. Any suggestions on this?
You can use COUNT(DISTINCT ...) like the following:
SELECT a.p_id, b.p_name, c.p_desc,
       COUNT(DISTINCT CASE WHEN a.date BETWEEN TRUNC(ADD_MONTHS(LAST_DAY(sysdate), -4) + 1)
                                           AND ADD_MONTHS(LAST_DAY(TO_DATE(sysdate)), -1) ...
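The quoted query is cut off above, but the underlying idea can be sketched more simply: reduce the data to one row per person (their first offer) before counting. This is a minimal sketch, not the original poster's query; the table and column names (offers, person_id, offer_date) are hypothetical placeholders:
-- One row per person: the date of their first offer.
-- Counting those rows counts each person exactly once,
-- in the month their first offer was made.
SELECT TRUNC(first_offer, 'MM') AS first_offer_month,
       COUNT(*)                 AS people
  FROM (SELECT person_id, MIN(offer_date) AS first_offer
          FROM offers
         GROUP BY person_id)
 GROUP BY TRUNC(first_offer, 'MM');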

Can a count of a KGroupedTable be negative?

My code is applying a groupBy on a KTable, followed by a count:
KStream<AggregationFields, Long> theCounts = theTable
    .groupBy((key, value) -> {
        AggregationFields af = new AggregationFields(
            value.getUser(),
            value.getGroup(),
            value.getSegment());
        return KeyValue.pair(af, 1L);
    }, Serialized.with(AggregationFields.getSerde(), Serdes.Long()))
    .count()
    .toStream();
In my production environment I sometimes see the count producing negative numbers when this application starts, even though I am using the app reset tool to make sure no internal topics are left over, as well as deleting any local stream state. Is there any circumstance where the count can be negative? Did I do something wrong?
I am on kafka-streams 1.0.1 (however, the server is running a pre-1.0 version; I'm not sure if that matters).
Each time the base table is updated, Kafka Streams needs to send two records downstream to update the count, because in general, with multiple partitions, the two update records might be processed on different machines: one "negative" subtraction record and one "positive" addition record, applied to the counts of potentially different keys.
If the update to the base table does not result in a key change for the count(), both records are processed one after the other, so we first decrease the count by one while processing the subtraction record and afterward increase it again. If the current count is zero, you might therefore see a negative intermediate result.
Turns out I had my Streams app in a bad state, even though (I thought) I cleaned it up. Once I deployed it again with a new app ID, the counts looked good.

Generate pairs from list that hasn't already historically existed

I'm building a pairing system that is supposed to create a pairing between two users and schedule them for a meeting. The selection is based on a criterion that I am having a hard time implementing: an earlier match must not have existed between the pair.
My input is a list of size n containing email addresses. This list is supposed to be split into pairs, with the restriction that each pair has not occurred previously.
So, for example, my list would contain a couple of user IDs:
list = {1,5,6,634,533,515,61,53}
At the same time I have database tables where the old pairs exist:
previous_pairs
---------------------
id  date                     status
1   2016-10-14 12:52:24.214  1
2   2016-10-15 12:52:24.214  2
3   2016-10-16 12:52:24.214  0
4   2016-10-17 12:52:24.214  2

previous_pair_users
---------------------
id  userid
1   1
1   5
2   634
2   553
3   515
3   61
4   53
4   1
What would be a good approach to this problem? My test solution right now is to pop two random users and check them for a previous match. If there is no match, I pop a new random user (if possible) and push one of the clashing users back onto the list. If two people are left over, they get matched regardless. This doesn't feel right to me, since I should be able to predict which matches cannot occur from the list of already existing pairs.
Do you have any ideas on how to get me going with building this procedure? Java 8 streams look interesting and might be a way to solve this, but I am very new to them, unfortunately.
The solution here was to create a list of tuples containing the old matches, using the GROUP_CONCAT feature of MySQL:
SELECT GROUP_CONCAT(MatchProfiles.ProfileId)
  FROM Matches
 INNER JOIN MatchProfiles ON Matches.MatchId = MatchProfiles.MatchId
 GROUP BY Matches.MatchId
old_matches = ((42,52),(12,52),(19,52),(10,12))
After that I select the candidates and generate a new list of tuples using my pop_random():
new_matches = ((42,12),(19,48),(10,36))
When both lists are done, I look at the intersection to find any duplicates:
duplicates = list(set(new_matches) & set(old_matches))
If there are duplicates, I simply run the randomizer again, up to X attempts, before declaring it impossible.
I know this is not very efficient for a large set of numbers, but my dataset will never be that large, so I think it will be good enough.
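For what it's worth, the "which pairs are still allowed" check can also be pushed into the database instead of being retried in application code. Here is a minimal sketch against the tables shown above, assuming a hypothetical candidates table (or inline view) holding the user IDs from the input list:
-- Enumerate every unordered pair of candidates that has never been matched.
SELECT a.userid AS user_a,
       b.userid AS user_b
  FROM candidates a
  JOIN candidates b
    ON a.userid < b.userid            -- unordered pairs, no self-pairing
 WHERE NOT EXISTS (SELECT 1
                     FROM previous_pair_users pa
                     JOIN previous_pair_users pb
                       ON pa.id = pb.id    -- same historical pairing
                    WHERE pa.userid = a.userid
                      AND pb.userid = b.userid);
The randomizer can then draw from this result directly, so a drawn pair never needs to be re-checked or retried.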

Performance tuning tips - PL/SQL/SQL - Database

We are facing a performance issue in production. The MV refresh program runs for a long time, almost 13 to 14 hours.
The MV refresh program refreshes 5 MVs, and one of those MVs is the one running long.
Below is the MV script that is running long:
SELECT rcvt.transaction_id,
       rsh.shipment_num,
       rsh.shipped_date,
       rsh.expected_receipt_date,
       (SELECT rcvt1.transaction_date
          FROM rcv_transactions rcvt1
         WHERE rcvt1.po_line_id = rcvt.po_line_id
           AND rcvt1.transaction_type = 'RETURN TO VENDOR'
           AND rcvt1.parent_transaction_id = rcvt.transaction_id
       ) transaction_date
  FROM rcv_transactions rcvt,
       rcv_shipment_headers rsh,
       rcv_shipment_lines rsl
 WHERE 1 = 1
   AND rcvt.shipment_header_id = rsl.shipment_header_id
   AND rcvt.shipment_line_id = rsl.shipment_line_id
   AND rsl.shipment_header_id = rsh.shipment_header_id
   AND rcvt.transaction_type = 'RECEIVE';
The shipment table contains millions of records, and the query above extracts almost 60 to 70% of the data, so we suspect the data volume is the reason.
To improve the performance of the script, we added a date filter to restrict the data:
SELECT rcvt.transaction_id,
       rsh.shipment_num,
       rsh.shipped_date,
       rsh.expected_receipt_date,
       (SELECT rcvt1.transaction_date
          FROM rcv_transactions rcvt1
         WHERE rcvt1.po_line_id = rcvt.po_line_id
           AND rcvt1.transaction_type = 'RETURN TO VENDOR'
           AND rcvt1.parent_transaction_id = rcvt.transaction_id
       ) transaction_date
  FROM rcv_transactions rcvt,
       rcv_shipment_headers rsh,
       rcv_shipment_lines rsl
 WHERE 1 = 1
   AND rcvt.shipment_header_id = rsl.shipment_header_id
   AND rcvt.shipment_line_id = rsl.shipment_line_id
   AND rsl.shipment_header_id = rsh.shipment_header_id
   AND rcvt.transaction_type = 'RECEIVE'
   AND TRUNC(rsh.creation_date) >= NVL(TRUNC((sysdate - profile_value), 'MM'), TRUNC(rsh.creation_date));
For a 1-year profile it shows some improvement, but with a 2-year range it is worse than the previous query.
Any suggestions to improve the performance?
Please help.
I'd pull that scalar subquery out into a regular outer join.
Costing for scalar subqueries can be poor, and you are forcing Oracle to do a lot of single-record lookups (presumably via an index) rather than giving it other options.
"The main query then has a scalar subquery in the select list.
Oracle therefore shows two independent plans in the plan table. One for the driving query – which has a cost of two, and one for the scalar subquery, which has a cost of 2083 each time it executes.
But Oracle does not “know” how many times the scalar subquery will run (even though in many cases it could predict a worst-case scenario), and does not make any cost allowance whatsoever for its execution in the total cost of the query."
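A minimal sketch of that rewrite, assuming (as the scalar subquery already implies) at most one RETURN TO VENDOR row per receive transaction; with ANSI syntax the correlated lookup becomes an outer join:
SELECT rcvt.transaction_id,
       rsh.shipment_num,
       rsh.shipped_date,
       rsh.expected_receipt_date,
       rcvt1.transaction_date
  FROM rcv_transactions rcvt
  JOIN rcv_shipment_lines rsl
    ON rsl.shipment_header_id = rcvt.shipment_header_id
   AND rsl.shipment_line_id   = rcvt.shipment_line_id
  JOIN rcv_shipment_headers rsh
    ON rsh.shipment_header_id = rsl.shipment_header_id
  -- the old scalar subquery, now an outer join:
  LEFT JOIN rcv_transactions rcvt1
    ON rcvt1.po_line_id            = rcvt.po_line_id
   AND rcvt1.parent_transaction_id = rcvt.transaction_id
   AND rcvt1.transaction_type      = 'RETURN TO VENDOR'
 WHERE rcvt.transaction_type = 'RECEIVE';
This gives the optimizer the option of a hash join over the whole set instead of one indexed lookup per row, which tends to win when the query touches 60 to 70% of the table. (If a receive transaction could have more than one return row, the original scalar subquery would have raised ORA-01427, so the outer join returns the same rows.)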

Compute average number of rows per group in Sql Server Reporting Services

I have a table widget in Reporting Services where I group rows on a given ID.
For each group, I display the number of rows in the group using CountRows().
How can I display the average number of rows per group at the end of my report?
What I am missing is how to count the number of groups.
This is from memory, so I'm not sure the expression is exactly right:
= Count(Fields!ID.Value) / CountDistinct(Fields!ID.Value)
Assuming that the "ID" you're grouping by is a single field, that sort of expression should get you what you want: the total row count divided by the number of distinct IDs, i.e. the number of groups.
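As a sanity check on that expression, the same arithmetic in plain SQL looks like this; t and id are hypothetical stand-ins for your dataset and grouping field:
-- Average rows per group = total rows / number of distinct group keys.
-- The * 1.0 forces decimal division on SQL Server.
SELECT COUNT(*) * 1.0 / COUNT(DISTINCT id) AS avg_rows_per_group
  FROM t;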
