Sorting list for most efficient build - algorithm

I have a problem that I am having some trouble tackling, I am playing this game and thought it would be fun to make a calculator to calculate the most efficient builds.
Every fleet allows for 30 units, and a fleets power is the sum of every units power combined.
Every unit can give a certain amount of power ranging from 1 to 200. The price of these units is different and can vary from unit to unit, (A 200 power unit costs around 1.5, an a 50 power unit costs around 0.02)
I would like to calculate the cheapest build for a fleet of x amount of power.
At first I though I would get the price of all units, get the average price for 1 power and calculate the most efficient units based of price per power. And this gives me a list sorted from most efficient units, and then I would create a list containing the 30 most efficient PpP (Price per Power) units.
I could then do a check to see if my power of the generated fleet is equal or more than my desired amount, and if not, remove the lest powerful unit, and replace it with the next most efficient bigger one from my list, and repeat until it has enough power in the 30.
The problem is that seeing as the price grows exponentially it means that unit with 1-50 power are always the most efficient.
Does anyone know if there an algorithm that I could use or study?
I want to calculate the cheapest way of achieving a fleet of x amount of power with a maximum of 30 units. without it taking weeks to complete
I have a list of about 30 000 units containing the price and power.
Its a hard problem to explain so apologies if you don't understand anything
EDIT: Sorry I messed up in my description, the price of the units vary, meaning it could be more efficient to get, for example for a 3000 power fleet, maybe a 15 x 150power units and 15x 50power units fleet is better than 30x 100mp fleet. Sorry for the confusion

To improve the efficiency of your above algorithm you can find the unit whose power is the closest to 30 times the desired power without exceeding it. After that you use the same method removing units one by one and replacing it with the next higher unit. Since 30 of the next higher unit will always exceed the power required, this part of the algorithm only needs to run once.

Related

Normalizing workouts based on activity, total milage, and total time

My friends and I are competing in our own fitness challenge (Sober October) where we are keeping track of Activity, Total Time Spent Moving, and Distance. Our activities include running (outdoors), running (treadmill), running (elliptical), rowing, biking (stationary), biking (outdoors), swimming, and stair stepper.
As a group, we weren't really interested in using a calorie estimation because those results can be easily manipulated by increasing the weight that the equation uses, so we wanted to keep it based on just distance and time.
What kind of equation should I use to best normalize such exercises? I'm looking for something that would weight distance and time differently based on the activity; for example, when compared to running,biking should give more weight to time than to milage because it takes less work to go a mile on a bike than it does on foot.
I was able to find this article on how calories are calculated, and just thought about removing the weight portion of the equation to get our normalized number, but wanted to see if there was a better way to calculate what I'm looking for.
Objective measure
You are seeking an objective measurement which is independent of weight. Use METs.
A human expends a baseline of one MET sitting quietly. Maybe your measure will be excess-MET-hours.
Score = (METs - 1) × Hours
MET values
On that link above you can find reference METs values for various activities, including several of your target activities. These are independent of speed.
You can further improve the calculation by factoring in your distance/time measurements. For example, given cited METs figures:
Walking slowly (1 mph) = 2.0 MET
Walking (3 mph) = 3.0 MET
Jogging (6.8 mph) = 11.2 MET
You can fit them to a curve. Use Desmos.
So your score for walking/jogging/running is:
Excess METs = [1 + 0.2 × (miles/hours) ^ 2 - 1] × hours
You can make similar estimations for other activities.

Rapid change detection algorithm

I'm logging temperature values in a room, saving them to the database. I'd like to be alerted when temperature rises suddenly. I can't set fixed values, because 18°C is acceptable in winter and 25°C is acceptable in summer. But if it jumps from 20°C to 25°C during, let's say, 30 minutes and stays like this for 5 minutes (to eliminate false readouts), I'd like to be informed.
My current idea is to take readouts from last 30 minutes (A) and readouts from last 5 minutes (B), calculate median of A and B and check if difference between them is less then my desired threshold.
Is this correct way to solve this or is there a better algorithm? I searched for a specific one but most of them seem overcomplicated.
Thanks!
Detecting changes in a time-series is a well-researched subject, and hundreds if not thousands of papers have been written on this subject. As you've seen many methods are quite advanced, but proved to be quite useful for many use cases. Whatever method you choose, you should evaluate it against real of simulated data, and optimize its parameters for your use case.
As you require, let me suggest a very simple method that in many cases prove to be good enough, and is quite similar to that you considered.
Basically, you have two concerns:
Detecting a monotonous change in a sampled noisy signal
Ignoring false readouts
First, note that medians are not commonly used for detecting trends. For the series (1,2,3,30,35,3,2,1) the medians of 5 consecutive terms is be (3, 3, 3, 3). It is much more common to use averages.
One common trick is to throw the extreme values before averaging (e.g. for each 7 values average only the middle 5). If many false readouts are expected - try to take measurements at a faster rate, and throw more extreme values (e.g. for each 13 values average the middle 9).
Also, you should throw away unfeasible values and replace them with the last measured value (unfeasible means out of range, or non-physical change rate).
Your idea of comparing a short-period measure with a long-period measure is a good idea, and indeed it is commonly used (e.g. in econometrics).
Quoting from "Financial Econometric Models - Some Contributions to the Field [Nicolau, 2007]:
Buy and sell signals are generated by two moving averages of the price
level: a long-period average and a short-period average. A typical
moving average trading rule prescribes a buy (sell) when the
short-period moving average crosses the long-period moving average
from below (above) (i.e. when the original time series is rising
(falling) relatively fast).
When you say "rises suddenly," mathematically you are talking about the magnitude of the derivative of the temperature signal.
There is a nice algorithm to simultaneously smooth a signal and calculate its derivative called the Savitzky–Golay filter. It's explained with examples on Wikipedia, or you can use Matlab to help you generate the convolution coefficients required. Once you have the coefficients the calculation is very simple.

Comparison of Sorting Algorithms using running time in terms of seconds

I have devised a test in order to compare the different running times of my sorting algorithm with Insertion sort, bubble sort, quick sort, selection sort, and shell sort. I have based my test using the test done in this website http://warp.povusers.org/SortComparison/index.html, but I modified my test a bit.
I set up a test manager program server which generates the data, and the test manager sends it to the clients that run the different algorithms, therefore they are sorting the same data to have no bias.
I noticed that the insertion sort, bubble sort, and selection sort algorithms really did run for a very long time (some more than 15 minutes) just to sort one given data for sizes of 100,000 and 1,000,000.
So I changed the number of runs per test case for those two data sizes. My original runs for the 100,000 was 500 but I reduced it to 15, and for 1,000,000 was 100 and I reduced it to 3.
Now my professor doubts the credibility as to why I've reduced it that much, but as I've observed the running time for sorting a specific data distribution varied only by a small percentage, which is why I still find it that even though I've reduced it to that much I'd still be able to approximate the average runtime for that specific test case of that algorithm.
My question now is, is my assumption wrong? Does the machine at times make significant running time changes (>50% changes), like say for example sorting the same data over and over if a first run would give it 0.3 milliseconds will the second run give as much difference as making it run for 1.5 seconds? Because from my observation, the running times don't vary largely given the same type of test distribution (e.g. completely random, completely sorted, completely reversed).
What you are looking for is a way to measure error in your experiments. My favorite book on subject is Error Analysis by Taylor and Chapter 4 has what you need which I'll summarize here.
You need to calculate Standard error of the mean or SDOM. First calculate mean and standard deviation (formulas are on Wikipedia and quite simple). Your SDOM is standard deviation divided by square root of number of measurements. Assuming your timings have Normal distribution (which it should), the twice the value of SDOM is a very common way to specify +/- error.
For example, let's say you run sorting algorithm 5 times and get following numbers: 5, 6, 7, 4, 5. Then mean is 5.4 and standard deviation is 1.1. Therefor SDOM is 1.1/sqrt(5) = 0.5. So 2*SDOM = 1. Now you can say that algorithm rum time was 5.4 ± 1. You professor can determine if this is acceptable error in measurement. Notice that as you take more readings, your SDOM, i.e. plus or minus error, goes down inversely proportional to square root of N. Twice of SDOM interval has 95% probability or confidence that the true value lies within the interval which is accepted standard.
Also you most likely want to measure performance by measuring CPU time instead of simple timer. Modern CPUs are too complex with various cache level and pipeline optimizations and you might end up getting less accurate measurement if you are using timer. More about CPU time is in this answer: How can I measure CPU time and wall clock time on both Linux/Windows?
It absolutely does. You need a variety of "random" samples in order to be able to draw proper conclusions about the population.
Look at it this way. It takes a long time to poll 100,000 people in the U.S. about their political stance. If we reduce the sample size to 100 people in order to complete it faster, we not only reduce the precision of our final result (2 decimal places rather than 5), we also introduce a larger chance that the members of the sample have a specific bias (there is a greater chance that 100 people out of 3xx,000,000 think the same way than 100,000 out of those same 3xx,000,000).
Your professor is right, however he's not provided the details that I mention some of them here :
Sampling issue: It's right that you generate some random numbers and feed them to your sorting methods, but with a few test cases indeed you're biased cause almost all of the random functions are biased to some extent (specially to the state of machine or time at the moment), so you should use more and more test cases to be more confident about the randomness.
Machine state: Suppose you've provide perfect data (fully representative of a uniform distribution), the performance of the electro-mechanical devises like computers may vary in different situations, so you should try for considerable times to smooth the effects of these phenomena.
Note : In advanced technical reports, you should provide a confidence coefficient for the answers you provide derived from statistical analysis, and proven step by step, but if you don't need to be that much exact, simply increase these :
The size of the data
The number of tests

Algorithm needed - benelux contest 2007

This question (last one) appeared in Benelux Algorithm Programming Contest-2007
http://www.cs.duke.edu/courses/cps149s/spring08/problems/bapc07/allprobs.pdf
Problem Statement in short:
A Company needs to figure out strategy when to - buy OR sell OR no-op on a given input so as to maximise profit. Input is in the form:
6
4 4 2
2 9 3
....
....
It means input is given for 6 days.
Day 1: You get 4 shares, each with price 4$ and at-max you can sell 2 of them
Day 2: You get 2 shares, each with price 9$ and at-max you can sell 3 of them
.
We need to output the maximum profit which can be achieved.
I m thinking about how to go for this problem. It seems to me that if we apply brute force, it will take too much time. If this can be converted to some DP problem like 0-1 Knapsack? Some help will be highly appreciated.
it can be solved by DP
suppose there are n days, and the total number of stock shares is m
let f[i][j] means, at the ith day, with j shares remaining, the maximum profit is f[i][j]
obviously, f[i][j]=maximum(f[i-1][j+k]+k*price_per_day[i]), 0<=k<=maximum_shares_sell_per_day[i]
it can be further optimized that, since f[i][...] only depends on f[i-1][...], a rolling array can be used here. hence u need only to define f[2][m] to save space.
total time complexity is O(n*m*maximum_shares_sell_per_day).
perhaps it can be further optimized to save time. any feedback is welcome
Your description does not quite match the last problem in the PDF - in the PDF you receive the number of shares specified in the first column (or are forced to buy them - since there is no decision to make it does not matter) and can only decide on how many shares to sell. Since it does not say otherwise I presume that short selling is not allowed (else ignore everything except the price and go make so much money on the derivatives market that you afford to both bribe the SEC or congress and retire :-)).
This looks like a dynamic program, where the state at each point in time is the total number of shares you have in hand. So at time n you have an array with one element for each possible number of shares you might have ended up with at that time, and in that element you have the maximum amount of money you can make up to then while ending up with that number of shares. From this you can work out the same information for time n+1. When you reach the end, then all your shares are worthless so the best answer is the one associated with the maximum amount of money.
We can't do better than selling the maximum amount of shares we can on the day with the highest price, so I was thinking: (this may be somewhat difficult to implement (efficiently))
It may be a good idea to calculate the total number of shares received so far for each day to improve the efficiency of the algorithm.
Process the days in decreasing order of price.
For a day, sell amount = min(daily sell limit, shares available) (for the max price day (the first processed day), shares available = shares received to date).
For all subsequent days, shares available -= sell amount. For preceding days, we binary search for (shares available - shares sold) and all entries between that and the day just processed = 0.
We might not need to physically set the values (at least not at every step), just calculate them on-the-fly from the history thus-far (I'm thinking interval tree or something similar).

slot machine payout calculation

There's this question but it has nothing close to help me out here.
Tried to find information about it on the internet yet this subject is so swarmed with articles on "how to win" or other non-related stuff that I could barely find anything. None worth posting here.
My question is how would I assure a payout of 95% over a year?
Theoretically, of course.
So far I can think of three obvious variables to consider within the calculation: Machine payout term (year in my case), total paid and total received in that term.
Now I could simply shoot a random number between the paid/received gap and fix slots results to be shown to the player but I'm not sure this is how it's done.
This method however sounds reasonable, although it involves building the slots results backwards..
I could also make a huge list of all possibilities, save them in a database randomized by order and simply poll one of them each time.
This got many flaws - the biggest one is the huge list I'm going to get (millions/billions/etc' records).
I certainly hope this question will be marked with an "Answer" (:
You have to make reel strips instead of huge database. Here is brief example for very basic 3-reel game containing 3 symbols:
Paytable:
3xA = 5
3xB = 10
3xC = 20
Reels-strip is a sequence of symbols on each reel. For the calculations you only need the quantity of each symbol per each reel:
A = 3, 1, 1 (3 symbols on 1st reel, 1 symbol on 2nd, 1 symbol on 3rd reel)
B = 1, 1, 2
C = 1, 1, 1
Full cycle (total number of all possible combinations) is 5 * 3 * 4 = 60
Now you can calculate probability of each combination:
3xA = 3 * 1 * 1 / full cycle = 0.05
3xB = 1 * 1 * 2 / full cycle = 0.0333
3xC = 1 * 1 * 1 / full cycle = 0.0166
Then you can calculate the return for each combination:
3xA = 5 * 0.05 = 0.25 (25% from AAA)
3xB = 10 * 0.0333 = 0.333 (33.3% from BBB)
3xC = 20 * 0.0166 = 0.333 (33.3% from CCC)
Total return = 91.66%
Finally, you can shuffle the symbols on each reel to get the reels-strips, e.g. "ABACA" for the 1st reel. Then pick a random number between 1 and the length of the strip, e.g. 1 to 5 for the 1st reel. This number is the middle symbol. The upper and lower ones are from the strip. If you picked from the edge of the strip, use the first or last one to loop the strip (it's a virtual reel). Then score the result.
In real life you might want to have Wild-symbols, free spins and bonuses. They all are pretty complicated to describe in this answer.
In this sample the Hit Frequency is 10% (total combinations = 60 and prize combinations = 6). Most of people use excel to calculate this stuff, however, you may find some good tools for making slot math.
Proper keywords for Google: PAR-sheet, "slot math can be fun" book.
For sweepstakes or Class-2 machines you can't use this stuff. You have to display a combination by the given prize instead. This is a pretty different task, so you may try to prepare a database storing the combinations sorted by the prize amount.
Well, the first problem is with the keyword assure, if you are dealing with random, you cannot assure, unless you change the logic of the slot machine.
Consider the following algorithm though. I think this style of thinking is more reliable then plotting graphs of averages to achive 95%;
if( customer_able_to_win() )
{
calculate_how_to_win();
}
else
no_win();
customer_able_to_win() is your data log that says how much intake you have gotten vs how much you have paid out, if you are under 95%, payout, then customer_able_to_win() returns true; in that case, calculate_how_to_win() calculates how much the customer would be able to win based on your %, so, lets choose a sampling period of 24 hours. If over the last 24 hours i've paid out 90% of the money I've taken in, then I can pay out up to 5%.... lets give that 5% a number such as 100$. So calculate_how_to_win says I can pay out up to 100$, so I would find a set of reels that would pay out 100$ or less, and that user could win. You could add a little random to it, but to ensure your 95% you'll have to have some other rules such as a forced max payout if you get below say 80%, and so on.
If you change the algorithm a little by adding random to the mix you will have to have more of these caveats..... So to make it APPEAR random to the user, you could do...
if( customer_able_to_win() && payout_percent() < 90% )
{
calculate_how_to_win(); // up to 5% payout
}
else
no_win();
With something like that, it will go on a losing streak after you hit 95% until you reach 90%, then it will go on a winning streak of random increments until you reach 95%.
This isn't a full algorithm answer, but more of a direction on how to think about how the slot machine works.
I've always envisioned this is the way slot machines work especially with video poker. Because the no_win() function would calculate how to lose, but make it appear to be 1 card off to tease you to think you were going to win, instead of dealing with a 'fair' game and the random just happens to be like that....
Think of the entire process of.... first thinking if you are going to win, how are you going to win, if you're not going to win, how are you going to lose, instead of random number generators determining if you will win or not.
I worked many years ago for an internet casino in Australia, this one being the only one in the world that was regulated completely by a government body. The algorithms you speak of that produce "structured randomness" are obviously extremely complex especially when you are talking multiple lines in all directions, double up, pick the suit, multiple progressive jackpots and the like.
Our poker machine laws for our state demand a payout of 97% of what goes in. For rudely to be satisfied that our machine did this, they made us run 10 million mock turns of the machine and then wanted to see that our game paid off at what the law states with the tiniest range of error (we had many many machines running a script to auto playing using a script to simulate the click for about a week before we hit the 10 mil).
Anyhow the algorithms you speak of are EXPENSIVE! They range from maybe $500k to several million per machine so as you can understand, no one is going to hand them over for free, that's for sure. If you wanted a single line machine it would be easy enough to do. Just work out you symbols/cards and what pay structure you want for each. Then you could just distribute those payouts amongst non-payouts till you got you respective figure. Obviously the more options there are means the longer it will take to pay out at that respective rate, it may even payout more early in the piece. Hit frequency and prize size are also factors you may want to consider
A simple way to do it, if you assume that people win a constant number of times a time period:
Create a collection of all possible tumbler combinations with how much each one pays out.
The first time someone plays, in that time period, you can offer all combinations at equal probability.
If they win, take that amount off the total left for the time period, and remove from the available options any combination that would payout more than you have left.
Repeat with the reduced combinations until all the money is gone for that time period.
Reset and start again for the next time period.

Resources