Duplicate Funnel Chart % of Total Calculations - amazon-quicksight

I have a dataset that looks like the one below. I have a funnel chart visualization that shows the steps, the counts, and the % of total. The problem is that the labels are too large and get truncated by the chart, so I'm looking to recreate a table right below this visual that shows the same data labels.
Step | Counts
1    | 1234
2    | 1000
3    | 753
4    | 342
The question is: how do I recreate the % of total calculation? Every step should be divided by the step 1 count to get % of total, and I can't figure it out using sumOver, sumIf, etc. The ideal output is below:
Step | Counts | Perc
1    | 1000   | 100%
2    | 900    | 90%
3    | 700    | 70%
4    | 300    | 30%
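For reference, the calculation being asked for is just each step's count divided by the first step's count. A minimal sketch of that math in plain Python (this only illustrates the target numbers, not QuickSight calculated-field syntax):

```python
# Every step's count divided by the step 1 count.
counts = {1: 1000, 2: 900, 3: 700, 4: 300}  # Step -> Counts from the ideal output

step1_count = counts[1]
for step in sorted(counts):
    perc = counts[step] / step1_count
    print(f"Step {step}: {counts[step]} ({perc:.0%})")
```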

Related

If I have a 15 min sliding window, how can I collect daily/weekly aggregates?

I have a 15 min sliding window, and can aggregate at any given time over the data within this window. Due to memory constraints, I can't increase the size of the window. I still think I should be able to get aggregates (like trending items, which is basically a frequency counter) over a day or a week.
It doesn't have to be a very accurate count, just needs to filter out the top 3-5.
Will running a cron job every 15 mins and putting it into 4 (15min) counters work?
Can I keep updating some kind of rolling counter over the aggregate?
Is there any other method to do this?
My suggestion is an exponentially decaying moving average, like the one used for the Unix load average. (See http://www.howtogeek.com/194642/understanding-the-load-average-on-linux-and-other-unix-like-systems/ for an explanation.)
What you do is pick a constant 0 < k < 1 then update every 5 minutes as follows:
moving_average = k * average_over_last_5_min + (1-k) * moving_average
This will behave something like an average over the last 5/k minutes. So if you set k = 1/(24.0 * 60.0 / 5.0) = 0.00347222222222222, you get roughly a daily moving average. Divide k by 7 and you get roughly a weekly moving average.
The averages won't be exact, but should work perfectly well to identify what things are trending recently.
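A minimal sketch of that update rule in Python, assuming one aggregate arrives per 5-minute tick and that a counter is kept per item (the function and variable names are illustrative):

```python
# Exponentially decaying moving average per item, as described above.
# Assumption: update() is called once every 5 minutes with the counts
# aggregated over the last 5 minutes.
UPDATE_MINUTES = 5
DAILY_WINDOW_MINUTES = 24 * 60
k = UPDATE_MINUTES / DAILY_WINDOW_MINUTES   # 1/288 ~= 0.00347 -> roughly daily

moving_average = {}  # item -> decayed average

def update(counts_last_5_min):
    """Fold one 5-minute aggregate into the long-running moving average."""
    for item in set(moving_average) | set(counts_last_5_min):
        recent = counts_last_5_min.get(item, 0.0)
        previous = moving_average.get(item, 0.0)
        moving_average[item] = k * recent + (1 - k) * previous

def top_trending(n=5):
    """The n items with the largest decayed averages -- the 'trending' set."""
    return sorted(moving_average.items(), key=lambda kv: kv[1], reverse=True)[:n]
```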

Evenly schedule timed items into fixed size containers

I got a task at work to evenly schedule timed commercial items into pre-defined commercial breaks (containers).
Each campaign has a set of commercials, with or without a spreading order. I need to allow users to choose multiple campaigns and distribute all the commercials to best fit the breaks within a time window.
Example of Campaign A:
Item | Duration | # of times to schedule | Order
A1   | 15 sec   | 2                      | 1
A2   | 25 sec   | 2                      | 2
A3   | 30 sec   | 2                      | 3
Required outcome:
Each item should appear only once in a break; no repeating.
If there is a specific order, try to best fit by keeping the order. If there is no order, shuffle the items.
At the end of the process the breaks should contain an even amount of commercial time.
An ideal spread would fully fit all desired campaigns into the breaks.
For example: Campaign {Item,Duration,#ofTimes,Order}
Campaign A which has set {A1,15,2,1},{A2,25,2,2},{A3,10,1,3}
Campaign B which has set {B1,20,2,2},{B2,35,3,1},
Campaign C which has set {C1,10,1,1},{C2,15,2,3},{C3,15,1,2},{C4,40,1,4}
A client will choose to schedule those campaigns on a specific date that holds 5 breaks of 60 seconds each.
A good outcome would result in:
Container 1: {A1,15}{B2,35}{C1,10} total of 60 sec
Container 2: {C3,15}{A2,25}{B1,20} total of 60 sec
Container 3: {A3,10}{C2,15}{B2,35} total of 60 sec
Container 4: {C4,40}{B1,20} total of 60 sec
Container 5: {C2,15}{A3,10}{B2,35} total of 60 sec
Of course, it's rare that everything fits this perfectly in real-life examples.
There are so many combinations with a large number of items that I'm not sure how to go about it. The order of items inside a break needs to be dynamically calculated so that the end result best fits all the items into the breaks.
If this approach is poor and someone has a better idea (like giving items priority over order and scheduling based on priority), I'll be glad to hear it.
It seems like simulated annealing might be a good way to approach this problem. Just incorporate your constraints (keeping order, even spreading, and fitting into the 60-second frame) into the scoring function. Your random neighbor function might either swap two items with each other or move an item to a different frame.
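A rough sketch of what that could look like, assuming each break is a list of (item, duration) pairs; the scoring below only penalises deviation from 60 seconds, and the order/spreading constraints would still need to be folded into score():

```python
import math
import random

BREAK_LEN = 60  # seconds per break

def score(breaks):
    """Lower is better: penalise breaks that overflow or underfill 60 seconds."""
    return sum(abs(BREAK_LEN - sum(dur for _, dur in b)) for b in breaks)

def neighbor(breaks):
    """Random move: shift one item to another break, or swap two items."""
    new = [list(b) for b in breaks]
    a, b = random.randrange(len(new)), random.randrange(len(new))
    if new[a] and random.random() < 0.5:
        new[b].append(new[a].pop(random.randrange(len(new[a]))))
    elif new[a] and new[b]:
        i, j = random.randrange(len(new[a])), random.randrange(len(new[b]))
        new[a][i], new[b][j] = new[b][j], new[a][i]
    return new

def anneal(breaks, steps=20000, t0=10.0):
    best = cur = breaks
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-9   # cool down linearly
        cand = neighbor(cur)
        delta = score(cand) - score(cur)
        if delta <= 0 or random.random() < math.exp(-delta / temp):
            cur = cand
            if score(cur) < score(best):
                best = cur
    return best

# Usage: start from any initial assignment of (item, duration) pairs to
# the 5 breaks and let anneal() move items between them.
```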

Ranking algorithm based on time spent and user level

Think about a game where players are trying to solve problems, and every problem is effectively a "level".
Players will see a realtime ranking while playing.
Redis (redis.io) has a sorted set feature; I'll use it.
But I don't know how to score players:
PlayerA at 7 level, total game time 80 seconds
PlayerB at 7 level, total game time 65 seconds
PlayerC at 5 level, total game time 40 seconds
PlayerD at 1 level, total game time 200 seconds
The ranking that I want is like this:
1) PlayerB - because level 7 and 65 seconds
2) PlayerA - because level 7 and 80 seconds
3) PlayerC - because level 5 and 40 seconds
4) PlayerD - because level 1 and 200 seconds
I tried the calculation (timeSpent / level), but it didn't work well when somebody is at a lower level and has spent less time than other players.
Short answer: you can have the following function:
score = (level * HUGE_NUMBER) - timeSpent
For HUGE_NUMBER, you can select a value that is slightly larger than the maximum allowed time to finish a level.
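A minimal sketch of that score, with HUGE_NUMBER chosen as an assumption (anything larger than the longest possible time to finish a level keeps the level term dominant):

```python
# Score for a Redis sorted set (higher score = better rank).
HUGE_NUMBER = 1_000_000  # assumed to exceed any possible timeSpent

def score(level, time_spent_seconds):
    return level * HUGE_NUMBER - time_spent_seconds

# score(7, 65) = 6_999_935 ranks above score(7, 80) = 6_999_920,
# which ranks above score(5, 40) = 4_999_960 and score(1, 200) = 999_800.
```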
While that might be enough for most cases, I would rather use sorting for this problem to avoid any potential unseen bugs in the ranking algorithm.
Assuming that the level of a player is the dominant factor in the ranking, I'd sort all players by level in descending order. This may give you something like this (note that it's not the final ranking yet):
1) PlayerA - because level 7 and 80 seconds
2) PlayerB - because level 7 and 65 seconds
3) PlayerC - because level 5 and 40 seconds
4) PlayerD - because level 1 and 200 seconds
Following that, I'd create sublists of players in each level and sort them by time in ascending order. In the above example, the second sorting would give you the final correct ranking.
1) PlayerB - because level 7 and 65 seconds
2) PlayerA - because level 7 and 80 seconds
3) PlayerC - because level 5 and 40 seconds
4) PlayerD - because level 1 and 200 seconds
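For what it's worth, the two passes can also be expressed as a single sort with a compound key; a quick sketch using the players from the question:

```python
# Sort by level descending, then by time spent ascending.
players = [
    ("PlayerA", 7, 80),
    ("PlayerB", 7, 65),
    ("PlayerC", 5, 40),
    ("PlayerD", 1, 200),
]

ranking = sorted(players, key=lambda p: (-p[1], p[2]))
for rank, (name, level, seconds) in enumerate(ranking, 1):
    print(f"{rank}) {name} - level {level}, {seconds} seconds")
```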

Algorithm for aggregating stock chart datapoints out of many DB entries

I have a database with stock values in a table, for example:
id - unique id for this entry
stockId - ticker symbol of the stock
value - price of the stock
timestamp - timestamp of that price
I would like to create separate arrays for timeframes of 24 hours, 7 days, and 1 month from my database entries, each array containing datapoints for a stock chart.
For some stockIds, I have just a few data points per hour, for others it could be hundreds or thousands.
My question:
What is a good algorithm to "aggregate" the possibly many datapoints into a few? For example, for the 24-hour chart I would like to have at most 10 datapoints per hour. How do I handle exceptionally high / low values?
What is the common approach in regards to stock charts?
Thank you for reading!
Some options: (assuming 10 points per hour, i.e. one roughly every 6 minutes)
For every 6 minute period, pick the data point closest to the centre of the period
For every 6 minute period, take the average of the points over that period
For an hour period, find the maximum and minimum for each 4-minute period and pick the 5 maximum and 5 minimum values from these respective sets (4 minutes is somewhat arbitrarily chosen).
I originally thought to pick the 5 minimum points and the 5 maximum points such that each maximum point is at least 8 minutes apart, and similarly for minimum points.
The 8 minutes here is so we don't have all the points stacked up on each other. Why 8 minutes? At an even distribution, 60/5 = 12 minutes, so just moving a bit away from that gives us 8 minutes.
But, in terms of the implementation, the 4 minutes approach will be much simpler and should give similar results.
You'll have to see which one gives you the most desirable results. The last one is likely to give a decent indication of variation across the period, whereas the second one is likely to have a more stable graph. The first one can be a bit more erratic.
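A sketch of the first two options, assuming `points` is a list of (timestamp_seconds, value) pairs for one stock and one timeframe (the names are illustrative):

```python
from statistics import mean

BUCKET_SECONDS = 6 * 60  # 10 datapoints per hour

def downsample(points, reducer="average"):
    """Group points into fixed 6-minute buckets and reduce each bucket."""
    buckets = {}
    for ts, value in points:
        buckets.setdefault(ts // BUCKET_SECONDS, []).append((ts, value))
    out = []
    for bucket, pts in sorted(buckets.items()):
        centre = bucket * BUCKET_SECONDS + BUCKET_SECONDS / 2
        if reducer == "average":
            out.append((centre, mean(v for _, v in pts)))
        else:  # pick the point closest to the centre of the bucket
            out.append(min(pts, key=lambda p: abs(p[0] - centre)))
    return out
```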

External sorting

In this web page: CS302 --- External Sorting
Merge the resulting runs together into successively bigger runs, until the file is sorted.
As quoted above, how can we merge the resulting runs together? We don't have that much memory.
Imagine you have the numbers 1 - 9
9 7 2 6 3 4 8 5 1
And let's suppose that only 3 fit in memory at a time.
So you'd break them into chunks of 3 and sort each, storing each result in a separate file:
2 7 9
3 4 6
1 5 8
Now you'd open each of the three files as streams and read the first value from each:
2 3 1
Output the lowest value (1) and get the next value from that stream; now you have:
2 3 5
Output the next lowest value 2, and continue onwards until you've outputted the entire sorted list.
If you process two runs A and B into some larger run C, you can do this line by line, generating progressively larger runs while still reading at most 2 lines at a time. Because the process is iterative, and because you're working on streams of data rather than full copies of the data, you don't need to worry about memory usage. On the other hand, disk access might make the whole process slow -- but it sure beats not being able to do the work in the first place.
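A sketch of that k-way merge in Python, assuming each sorted run is a file with one number per line (heapq.merge keeps only one value per run in memory at a time; the file-handling names are illustrative):

```python
import heapq

def merge_runs(run_paths, out_path):
    """Merge already-sorted run files into one sorted output file."""
    files = [open(p) for p in run_paths]
    try:
        runs = ((int(line) for line in f) for f in files)
        with open(out_path, "w") as out:
            for value in heapq.merge(*runs):
                out.write(f"{value}\n")
    finally:
        for f in files:
            f.close()
```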
