Amazon QuickSight - Single ActivityID has Multiple entries - need to SUM one column but display only ONE VALUE from another column

I have been working on this for some time, but have come up empty.
I have a data set called 'Technical Assistance'.
In that data set, there is a column with the 'ActivityID', another column 'NumberAssisted', and a third column 'ContactHours'.
The issue is that for each ActivityID, there can be multiple entries, each with its own NumberAssisted and ContactHours.
For EACH ACTIVITYID I need to show NUMBERASSISTED as a SUM, but display CONTACTHOURS only once (no sum, no calculation at all, just display the value).
In my example scenario, I have ONE ActivityID with FOUR entries; each entry has a NumberAssisted and ContactHours. By using SUM, I can get the correct NumberAssisted (5), but cannot figure out how to get ContactHours to display what I need. It SHOULD DISPLAY as 0.5 based on the scenario below:
ActivityID    NumberAssisted    ContactHours
101           1                 0.5
101           1                 0.5
101           1                 0.5
101           2                 0.5
TOTAL:        5                 0.5
Thank you for any guidance!!
Troy
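Since ContactHours carries the same value on every row of a given ActivityID, one way to show it once is to aggregate it with a function that collapses identical values instead of adding them: max, min, or avg all return 0.5 here. This is only a sketch, assuming ContactHours really is constant within an ActivityID; in a table visual grouped by ActivityID the two measures could be calculated fields like:

sum(NumberAssisted)
max(ContactHours)

With the sample rows this shows 5 and 0.5 for ActivityID 101. The same effect is available without a calculated field by changing the field's aggregation from Sum to Max in the field well.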

Related

Google Sheets: Data Validation - Unique row values across multiple columns

Good day,
I have seen from here a solution to control duplicate entries in a single column. A Data Validation rule with that custom formula works well for one column.
I would like to achieve the same effect over multiple columns, i.e. unique row entries across multiple columns. Take, for example, the three columns A-C below. Only when the values {1,2,1} are entered for the second time should the input be rejected.
A  B  C
1  1  1
1  2  1
1  2  2
2  2  2
1  2  1   <- this entry should be rejected
Is there a quick way to do this using Data Validation - custom formulae?
Use this custom formula for Data Validation:
=INDEX(COUNTIF($A$1:$A&"×"&$B$1:$B&"×"&$C$1:$C, $A1&"×"&$B1&"×"&$C1)<2)
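The formula joins each row's three cells with a delimiter that should not occur in the data ("×"), counts how many times that combined key appears across columns A:C, and accepts the entry only while the count stays below 2; wrapping the COUNTIF in INDEX forces the concatenated ranges to be evaluated as arrays. Apply it to the whole input range (e.g. A1:C) with "Reject input", so a repeated row such as {1,2,1} fails validation.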

JPA: best way to avoid N+1 queries when I need to make a calculation for each row

My application is used to find places in a city. Each place needs a score to be calculated, and this score cannot be predicted in advance (stored somewhere) as it is different for each user and changes over time. Here is what I am doing at the moment, and it is TERRIBLY inefficient (15 times slower than if I mock the database call inside the loop):
1. SQL (native) query to fetch all the places that match the search (I select all the columns I need specifically).
2. Loop through the list and, for each POI, make a DB call to get the info needed to calculate the score (I need different values residing in different tables).
3. Make the calculation.
4. Sort by score, descending.
5. Cut the list depending on the pagination settings (yes, I cannot put LIMIT directly in the query as I don't know the scores yet...).
6. Return the list.
Well, this takes 15 seconds in total.
If I remove step 2 and simply mock the DB call, it takes only 600 ms.
My tables look like this:
place_tag_count table:
place_id / tag_id / tag_count
1          100      15
1          200      25
1          300      35
user_tag_score table:
user_id / tag_id / score
1000       100      0.5
1000       200      0.3
As a simplified example, the place score is the sum of the user's tag scores multiplied by the tag counts found in place_tag_count:
score = 0.5 * 15 + 0.3 * 25 = 15
(I won't complicate things here, but if a tag score is missing I do another calculation that needs other DB calls...)
The query in step 1 returns distinct places, so because the calculation needs all the tag counts of the place and the user's tag scores, I need to make that extra DB call for each POI.
My question is: what would be the BEST way to avoid the N+1 calls in my situation? I have thought of some alternatives, but I would prefer the opinion of a more experienced person before going in head first.
1. Instead of returning distinct places in query 1, return the same place grouped by place_id, tag_id; in my Java code I just loop, and when the place_id changes I know I'm processing another place.
2. Make query 1 a bit more complicated and aggregate all the numbers I need into a comma-separated list (but that requires some kind of sub-select, which might affect the speed of the query).
3. Some other solution?
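For what it's worth, the batch-fetch flavor of these alternatives can be sketched in plain JPA: pull all the tag counts for the candidate places in one query and the user's tag scores in another, then score, sort and paginate in memory. This is only a minimal sketch under assumptions: the two tables from the question, a made-up PlaceScorer class, and provider-specific collection parameters in native queries (Hibernate expands them).

import jakarta.persistence.EntityManager; // javax.persistence on older stacks
import java.util.*;
import java.util.stream.Collectors;

public class PlaceScorer {

    private final EntityManager em;

    public PlaceScorer(EntityManager em) {
        this.em = em;
    }

    @SuppressWarnings("unchecked")
    public List<Long> topPlaceIds(long userId, List<Long> candidatePlaceIds,
                                  int offset, int limit) {
        // One query for the user's tag scores (user_tag_score table from the question).
        List<Object[]> scoreRows = em.createNativeQuery(
                "SELECT tag_id, score FROM user_tag_score WHERE user_id = ?1")
                .setParameter(1, userId)
                .getResultList();
        Map<Long, Double> scoreByTag = new HashMap<>();
        for (Object[] r : scoreRows) {
            scoreByTag.put(((Number) r[0]).longValue(), ((Number) r[1]).doubleValue());
        }

        // One query for ALL tag counts of ALL candidate places - no call per POI.
        // Binding a collection to a native-query parameter is provider-specific.
        List<Object[]> countRows = em.createNativeQuery(
                "SELECT place_id, tag_id, tag_count FROM place_tag_count WHERE place_id IN (?1)")
                .setParameter(1, candidatePlaceIds)
                .getResultList();

        // Aggregate the score per place entirely in memory.
        Map<Long, Double> scoreByPlace = new HashMap<>();
        for (Object[] r : countRows) {
            long placeId = ((Number) r[0]).longValue();
            // The question's "missing tag score" fallback would replace this 0.0 default.
            double tagScore = scoreByTag.getOrDefault(((Number) r[1]).longValue(), 0.0);
            int tagCount = ((Number) r[2]).intValue();
            scoreByPlace.merge(placeId, tagScore * tagCount, Double::sum);
        }

        // Sort by score descending, then paginate in memory (LIMIT still can't go in SQL).
        return scoreByPlace.entrySet().stream()
                .sorted(Map.Entry.<Long, Double>comparingByValue().reversed())
                .skip(offset).limit(limit)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}

This trades N+1 round trips for two queries plus an in-memory join, which is usually the cheapest fix when the candidate set fits in memory.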

Mahout recommender returns no results for a user

I'm curious why in the example below the Mahout recommender isn't returning a recommendation for user 1.
My input file is below. I added blank lines to enhance readability. This file will need the blank lines removed before it's run through Mahout.
The columns in this file are:
User ID | item number | item rating
1 101 0
1 102 0
1 103 5
1 104 0

2 101 4
2 102 5
2 103 4
2 104 0

3 101 0
3 102 5
3 103 5
3 104 3
You'll note that item 103 is the only common item that all 3 users rated.
I ran:
hadoop jar C:\hdp\mahout-0.9.0.2.1.3.0-1981\core\target\mahout-core-0.9.0.2.1.3.0-1981-job.jar org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -s SIMILARITY_COOCCURRENCE --input small_data_set.txt --output small_data_set_output
The Mahout recommendation output file shows:
2 [104:4.5]
3 [101:5.0]
Which I believe means:
User 2 would be recommended item 104. Since user 3 rated item 104 a 3, this may account for the 4.5 recommendation score vs. the result below.
User 3 would be recommended item 101. Since user 2 rated item 101 a 4, this may account for the slightly higher recommendation score of 5.
Is this correct?
Why isn't user 1 included in the recommendation output file? User 1 could have received a recommendation for Item 102 because user 2 and user 3 rated it. Is the data set too small?
Thanks in advance.
Several mistakes may be present in your data; the first two here will cause undefined behavior:
IDs must be contiguous non-negative integers starting at 0, so you need to map your IDs above somehow. Your user ID 1 will become Mahout user ID 0, and the same goes for items: your item ID 101 will become Mahout item ID 0.
You should omit the 0 values from the input altogether if you mean that the user has expressed no preference; this makes the preference "undefined" in a sense. To do this, omit those lines entirely.
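Applied to the sample input above (leaving aside the ID remapping from the previous point), dropping the zero-preference lines leaves:

1 103 5
2 101 4
2 102 5
2 103 4
3 102 5
3 103 5
3 104 3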
Always use SIMILARITY_LOGLIKELIHOOD; it is widely measured as doing significantly better than the other methods, unless you are trying to predict ratings, in which case use cosine.
If you use LLR similarity you should omit the values since they will be ignored.
There are very few uses for preference values unless you are trying to predict a user's rating for an item. The preference weights are useless in determining recommendation ranking, which is the typical thing to optimize. If you want to recommend the right things in the right order, toss the values and use LLR.
The other thing that people sometimes do with values is show some weight of preference so 1 = a view of a product page and 5 = a product purchase. This will not work! I tried this with a large ecommerce dataset and found the recommendations were worse when adding in product views, even though there was 100 times more data. They are fundamentally different user actions with different user intent and so can't be mixed in this way.
If you really do want to mix different actions, use the new multimodal recommender based on Mahout, Spark, and Solr described on the Mahout site. It allows cross-cooccurrence indicator calculations, so you can use user location, likes and dislikes, views and purchases. Virtually the entire user clickstream can be used, but only with cross-cooccurrence correlating each action to the canonical "best" action, the one you want to recommend.

Issue with OBIEE 11G. Records being counted twice

My development team is working with OBIEE 11G to make an analysis as follows:
Discover which policyholders are in alert. An alert is defined like this: when the quantity of claims of a policyholder is greater than a certain threshold. If the policyholder has at least one alert on one of its claims, then the policyholder is in alert. The problem is that those thresholds are defined for a particular key (a combination of type of client, range of age, type of pathology and other attributes), and a policyholder can have many keys, with a threshold for each key, so the quantity of claims varies. Something like this:
Policyholder    Key    #Claims    Threshold
ABC123          XYZ    3          4
                WQE    3          2
EFG456          ABC    1          2
The ABC123 policyholder has 6 claims in total, 3 for the key XYZ (which has a threshold of 4) and 3 for the key WQE (which has a threshold of 2). On the other hand, the EFG456 policyholder has 1 claim for the key ABC that has a threshold of 2. So in this case, ABC123 policyholder should be in alert because the quantity of claims for the key WQE is greater than the threshold.
So, in OBIEE 11G my team added two columns, one to mark the records in alert and one to mark the records which are not in alert. Like this:
Policyholder    Key    #Claims    Threshold    Alert    notAlert
ABC123          XYZ    3          4            0        1
                WQE    3          2            1        0
EFG456          ABC    1          2            0        1
You see the problem now? OBIEE 11G does not see policyholder ABC123 as a unit and marks it both as in alert and not in alert, which is wrong. The correct info should be:
Policyholder    Key    #Claims    Threshold    Alert    notAlert
ABC123          XYZ    3          4            0        0
                WQE    3          2            1        0
EFG456          ABC    1          2            0        1
Because it doesn't matter that the policyholder did not reach the alert threshold for key XYZ: if an alert is discovered, the complete file of the policyholder is examined to resolve the alert.
Is there any way of telling this to OBIEE 11G???
Please help!!
I think this is a dimensional modeling problem instead of an OBIEE one:
In order to help I will make a few assumptions:
PolicyHolder and Key are separate dimensions: although the "Key" dimension contains some attributes of the policyholder, such as type of client and age group, it also combines other entities like pathology, and to me that is enough to consider it at least a mini-dimension.
The "is in alert" flag can be modeled as a factless fact table: it looks like you only need to know whether a particular policyholder is in alert; there is no metric associated with the event, just a flag that is either 0 or 1. This can be solved with a simple table that includes at least three columns: FK_POLICYHOLDER, FK_DATE and the flag. You already have a flag, but it is included in the claims table as a calculated column; if you model this flag as a separate table you gain control over the dimensionality and granularity of the alert. See "What DW model is appropriate when there's no measure?".
The metric "number of claims" has a different dimensionality than the alert flag: I think the crux of the problem is that the flags are calculated at the key level, but for reporting purposes they are only needed at the policyholder level.
If you want alerts to be assigned to a policyholder "as a unit", then you need a fact table that is linked to the PolicyHolder dimension and NOT LINKED to the Key dimension.
Concretely:
Create a separate dimension table for your "Key" entity (type of client, pathology, etc.)
Create a new factless fact table that contains the alerts for a policyholder; this table should not link to the Key dimension.
Change the "Alert" column in your report; you should get that value from the flag of your new factless fact table.
Firstly, the ALERT columns seem redundant. It's an incredibly simple calculation that would be better done by OBI dynamically. That way you can check for policy holders in alert (on the aggregate of their keys) or for each key.
If I wanted to fix that calculation in OBI I would do it with a calculated logical column in the BMM (based on other logical columns) simply evaluating CLAIMS against THRESHOLD:
CASE WHEN CLAIMS >= THRESHOLD THEN 1 ELSE 0 END
That way the flag can work at multiple levels (either for POLICYHOLDER or KEY). But it seems a very simple calculation that could just be done in the Analysis as a filter (or selection step).
Even simpler though (assuming you have CLAIMS and THRESHOLD as measure columns with a SUM aggregation, and POLICYHOLDER and KEY as dimension columns) would be to ignore any sort of alert column altogether. If you don't bring KEY into the Analysis, OBI will give you each policyholder, their total claims and total threshold. You could then use selection steps or a filter in the criteria to remove those not over the threshold.
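If a policyholder-level flag column is still wanted, a hedged sketch in OBIEE logical SQL (assuming CLAIMS, THRESHOLD and POLICYHOLDER are the columns described above) pins the aggregation level with a BY clause:

MAX(CASE WHEN CLAIMS >= THRESHOLD THEN 1 ELSE 0 END BY POLICYHOLDER)

With the sample data this evaluates to 1 on every row of ABC123 (the WQE key trips the alert) and 0 for EFG456, whether or not KEY is in the analysis.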

Report Help: Crystal Reports

I am making a Crystal Report for bills.
Bills Table :
BillID(pk), PartyName, BillDate, Loading, Unloading.
BillDetails Table:
ID(pk),BillID(fk),Item, Quantity, Rate, Amount.
In the database expert I have joined the 2 tables.
I want the report like this:
BillID    PartyName    BillDate

SALE                                    EXPENSES
ITEM    QUANTITY    RATE    AMOUNT      LOADING      10
toy     2           2       4           UNLOADING    20
doll    7           6       42
ball    8           6       48
cell    5           6       30
TOTAL:                      160                      30

NET: 160 - 30 = 130
The problem is that the loading and unloading appear only once per bill, while the bill contains multiple items.
How can I mix the details section with items that appear only once (loading and unloading)?
You have at least two options for the presented layout:
Use a subreport to display the loading/unloading values (linked by bill ID)
Put your loading/unloading fields into a special group header section (grouped by bill ID) and make that section "underlay following sections"
If you can display the loading/unloading values on a separate row, then place them into the appropriate group header/footer and you're done.
I think you'd need to use both of the options in Arvo's answer together to get what you want.
I'd first create your subreport for loading and unloading and place it in the header with ITEM, QUANTITY, etc., and then set that header to "underlay following sections". This will get you the details of the subreport, but then you need to capture the total to send back to the main report.
To get this info you'll need to create a shared variable in the subreport for the total and then you can reference it in your formula in the main report's "Net" footer section.
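A hedged sketch of those two formulas in Crystal formula syntax (the formula names {@ShareExpenses} and {@Net} and the variable expensesTotal are made up; field names follow the tables above). In the subreport:

// {@ShareExpenses} - stores the bill's expenses total while printing
WhilePrintingRecords;
Shared NumberVar expensesTotal := {Bills.Loading} + {Bills.Unloading};

And in the main report's bill group footer, after the subreport has printed:

// {@Net} - reads the shared variable back and computes the net amount
WhilePrintingRecords;
Shared NumberVar expensesTotal;
Sum({BillDetails.Amount}) - expensesTotal;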
Hope this helps.
