Algorithm to find best 8 minute window in a 1 hour run

An activity runs for a bit more than an hour. I need to find the best 8 minute window, i.e. the one in which some measured parameter is at its maximum.
For example, a value x is measured every second. If my activity runs for one hour, I get 3600 values for x. I need to find the best continuous 8 minute time interval where the x value was the highest among all the others.
If I capture, say, from the 0th minute to the 8th minute, there may be another time frame, like minute 0.4 to minute 8.4, where the value was higher. The granularity is one second, so we need to consider every second as a possible window start.
Please help me with the design.

What is preventing you from just trying every possibility? 3600 values is not many. You can run through them in O(n*m) time, where n is the number of data points and m is the number of points in the window.
If you want to make an optimization you can do it in O(n) by using a sliding window. Keep a queue of all the values of x within the first 8 minutes, and the total for that 8 minutes. Every time you advance one second you can dequeue the oldest value of x and subtract it from your running total. Then add the new value of x to the queue and add it to the total.
But this optimization hardly seems necessary for the size of problem you have. I'd recommend keeping it simple unless the extra performance is necessary.

Something along the lines of this:
int[] xs = { 1, 9, 5, 6, 14, 9, 6, 1, 5, 4, 7, 16, 8, 3, 2 };
int bestEndPos = 0;
int bestSum = 0;
int currentSum = 0;
for (int i = 0; i < xs.length; i++) {
    // Add the current number to the sum.
    currentSum += xs[i];
    // Subtract the number falling out behind us.
    if (i >= 8)
        currentSum -= xs[i - 8];
    // Record our new "best-last-index" if we beat our previous record!
    if (currentSum > bestSum) {
        bestEndPos = i;
        bestSum = currentSum;
    }
}
System.out.println("Best interval: " + (bestEndPos - 8 + 1) + "-" + bestEndPos);
System.out.println("Has interval-sum of: " + bestSum);

In Haskell!
module BestWindow where

import Data.List (tails, maximumBy, genericTake)
import Data.Ord (comparing)
import System.Random (mkStdGen, random)

type Value = Integer
type Seconds = Integer
type Window = [Seconds]

getVal :: Seconds -> Value
getVal = fst . random . mkStdGen . fromIntegral

winVal :: Window -> Value
winVal = sum . map getVal

bestWindow :: Seconds -> Seconds -> (Value, Window)
bestWindow winTime totTime = maximumBy (comparing fst) valWins
  where
    secs    = [0 .. totTime]
    wins    = map (genericTake winTime) (tails secs)
    vals    = map winVal wins
    valWins = zip vals wins
Usage:
bestWindow (8 * 60) (60 * 60) -- gives value of window and list of window seconds

There are many windows which will capture the peak value of x -- you need to define what you want a little better. The window with the maximum average value of x? Of all the windows which contain the peak, the one with the highest average? The lowest signal to noise ratio? The first window which contains the peak?

This is the maximum subsequence problem, and it can be done in O(N). Google for examples.

Related

Find the sum of least common multiples of all subsets of a given set

Given: set A = {a0, a1, ..., aN-1} (1 ≤ N ≤ 100), with 2 ≤ ai ≤ 500.
Asked: Find the sum of all least common multiples (LCM) of all subsets of A of size at least 2.
The LCM of a set B = {b0, b1, ..., bk-1} is defined as the minimum integer Bmin such that bi | Bmin, for all 0 ≤ i < k.
Example:
Let N = 3 and A = {2, 6, 7}, then:
LCM({2, 6}) = 6
LCM({2, 7}) = 14
LCM({6, 7}) = 42
LCM({2, 6, 7}) = 42
----------------------- +
answer 104
The naive approach would be to simply calculate the LCM for all O(2^N) subsets, which is not feasible for reasonably large N.
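For concreteness, here is a sketch of that naive approach in Python (my illustration, not part of the original problem); it reproduces the answer 104 for the example above, but the number of subsets doubles with each added element:

from itertools import combinations
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def naive_sum_of_lcms(A):
    # sum the LCM of every subset of size at least 2: O(2^N) subsets
    total = 0
    for k in range(2, len(A) + 1):
        for subset in combinations(A, k):
            acc = 1
            for x in subset:
                acc = lcm(acc, x)
            total += acc
    return total

print(naive_sum_of_lcms([2, 6, 7]))  # 104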
Solution sketch:
The problem is obtained from a competition*, which also provided a solution sketch. This is where my problem comes in: I do not understand the hinted approach.
The solution reads (modulo some small fixed grammar issues):
The solution is a bit tricky. If we observe carefully we see that the integers are between 2 and 500. So, if we prime factorize the numbers, we get the following maximum powers:
2: 8
3: 5
5: 3
7: 3
11: 2
13: 2
17: 2
19: 2
Other than this, all primes have power 1. So, we can easily calculate all possible states, using these integers, leaving 9 * 6 * 4 * 4 * 3 * 3 * 3 * 3 states, which is nearly 70000. For other integers we can make a dp like the following: dp[70000][i], where i can be 0 to 100. However, as dp[i] is dependent on dp[i-1], so dp[70000][2] is enough. This leaves the complexity to n * 70000 which is feasible.
I have the following concrete questions:
What is meant by these states?
Does dp stand for dynamic programming and if so, what recurrence relation is being solved?
How is dp[i] computed from dp[i-1]?
Why do the big primes not contribute to the number of states? Each of them occurs either 0 or 1 times. Should the number of states not be multiplied by 2 for each of these primes (leading to a non-feasible state space again)?
*The original problem description can be found from this source (problem F). This question is a simplified version of that description.
Discussion
After reading the actual contest description (page 10 or 11) and the solution sketch, I have to conclude the author of the solution sketch is quite imprecise in their writing.
The high level problem is to calculate an expected lifetime if components are chosen randomly by fair coin toss. This is what's leading to computing the LCM of all subsets -- all subsets effectively represent the sample space. You could end up with any possible set of components. The failure time for the device is based on the LCM of the set. The expected lifetime is therefore the average of the LCM of all sets.
Note that this ought to include the LCM of sets with only one item (in which case we'd assume the LCM to be the element itself). The solution sketch seems to skip over this, perhaps because they handled it in a less elegant manner.
What is meant by these states?
The sketch author only uses the word state twice, but apparently manages to switch meanings. In the first use of the word state it appears they're talking about a possible selection of components. In the second use they're likely talking about possible failure times. They could be muddling this terminology because their dynamic programming solution initializes values from one use of the word and the recurrence relation stems from the other.
Does dp stand for dynamic programming?
I would say either it does or it's a coincidence as the solution sketch seems to heavily imply dynamic programming.
If so, what recurrence relation is being solved? How is dp[i] computed from dp[i-1]?
All I can think is that in their solution, state i represents a time to failure, T(i), with the number of times this time to failure has been counted, dp[i]. The final answer would then be the sum of dp[i] * T(i) over all i.
dp[i][0] would then be the failure times counted for only the first component. dp[i][1] would then be the failure times counted for the first and second component. dp[i][2] would be for the first, second, and third. Etc..
Initialize dp[i][0] with zeroes except for dp[T(c)][0] (where c is the first component considered) which should be 1 (since this component's failure time has been counted once so far).
To populate dp[i][n] from dp[i][n-1] for each component c:
For each i, copy dp[i][n-1] into dp[i][n].
Add 1 to dp[T(c)][n].
For each i, add dp[i][n-1] to dp[LCM(T(i), T(c))][n].
What is this doing? Suppose you knew that you had a time to failure of j, but you added a component with a time to failure of k. Regardless of what components you had before, your new time to fail is LCM(j, k). This follows from the fact that for two sets A and B, LCM(A union B) = LCM(LCM(A), LCM(B)).
Similarly, if we're considering a time to failure of T(i) and our new component's time to failure of T(c), the resultant time to failure is LCM(T(i), T(c)). Note that we recorded this time to failure for dp[i][n-1] configurations, so we should record that many new times to failure once the new component is introduced.
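To make this concrete, here is a dictionary-based Python sketch of my reading of the dp just described (my own illustration, not the sketch author's code): dp maps each time to failure to the number of non-empty component subsets producing it.

from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def sum_of_lcms(A):
    dp = {}  # time to failure -> number of non-empty subsets with that LCM
    for c in A:
        nxt = dict(dp)  # subsets that skip component c
        for t, cnt in dp.items():
            # adding c to a subset with LCM t yields LCM lcm(t, c)
            u = lcm(t, c)
            nxt[u] = nxt.get(u, 0) + cnt
        nxt[c] = nxt.get(c, 0) + 1  # the singleton {c}
        dp = nxt
    return sum(t * cnt for t, cnt in dp.items())

print(sum_of_lcms([2, 6, 7]) - sum([2, 6, 7]))  # 104 once the singletons are removed

This is the same recurrence, just indexed by the failure times themselves rather than by the ~70000 factorization states.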
Why do the big primes not contribute to the number of states?
Each of them occurs either 0 or 1 times. Should the number of states not be multiplied by 2 for each of these primes (leading to a non-feasible state space again)?
You're right, of course. However, the solution sketch states that numbers with large primes are handled in another (unspecified) fashion.
What would happen if we did include them? The number of states we would need to represent would explode into an impractical number. Hence the author accounts for such numbers differently. Note that if a number less than or equal to 500 includes a prime larger than 19 the other factors multiply to 21 or less. This makes such numbers amenable for brute forcing, no tables necessary.
The first part of the editorial seems useful, but the second part is rather vague (and perhaps unhelpful; I'd rather finish this answer than figure it out).
Let's suppose for the moment that the input consists of pairwise distinct primes, e.g., 2, 3, 5, and 7. Then the answer (for summing all sets, where the LCM of 0 integers is 1) is
(1 + 2) (1 + 3) (1 + 5) (1 + 7),
because the LCM of a subset is exactly equal to the product here, so just multiply it out.
Let's relax the restriction that the primes be pairwise distinct. If we have an input like 2, 2, 3, 3, 3, and 5, then the multiplication looks like
(1 + (2^2 - 1) 2) (1 + (2^3 - 1) 3) (1 + (2^1 - 1) 5),
because 2 appears with multiplicity 2, and 3 appears with multiplicity 3, and 5 appears with multiplicity 1. With respect to, e.g., just the set of 3s, there are 2^3 - 1 ways to choose a subset that includes a 3, and 1 way to choose the empty set.
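A quick Python check of this product formula against brute force over all subsets (the empty set counted as LCM 1); the test input here is my own example:

from itertools import combinations
from collections import Counter

def brute_sum(primes):
    # for (repeated) prime inputs, the LCM of a subset is the product
    # of the distinct primes it contains
    total = 0
    for k in range(len(primes) + 1):
        for subset in combinations(primes, k):
            acc = 1
            for p in set(subset):
                acc *= p
            total += acc
    return total

def product_formula(primes):
    total = 1
    for p, m in Counter(primes).items():
        total *= 1 + (2 ** m - 1) * p
    return total

print(brute_sum([2, 2, 3, 3, 3, 5]), product_formula([2, 2, 3, 3, 3, 5]))  # both 924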
Call a prime small if it's 19 or less and large otherwise. Note that integers 500 or less are divisible by at most one large prime (with multiplicity). The small primes are more problematic. What we're going to do is to compute, for each possible small portion of the prime factorization of the LCM (i.e., one of the ~70,000 states), the sum of LCMs for the problem derived by discarding the integers that could not divide such an LCM and leaving only the large prime factor (or 1) for the other integers.
For example, if the input is 2, 30, 41, 46, and 51, and the state is 2, then we retain 2 as 1, discard 30 (= 2 * 3 * 5; 3 and 5 are small), retain 41 as 41 (41 is large), retain 46 as 23 (= 2 * 23; 23 is large), and discard 51 (= 3 * 17; 3 and 17 are small). Now, we compute the sum of LCMs using the previously described technique. Use inclusion-exclusion to get rid of the subsets whose LCM's small portion properly divides the state instead of being exactly equal. Maybe I'll work a complete example later.
What is meant by these states?
I think here, states refer to whether a number is in the set B = {b0, b1, ..., bk-1} of LCMs of subsets of A.
Does dp stand for dynamic programming and if so, what recurrence relation is being solved?
dp in the solution sketch stands for dynamic programming, I believe.
How is dp[i] computed from dp[i-1]?
It's feasible to figure out the states of the next group of LCMs from the previous states, so we only need an array of two layers and can toggle back and forth between them.
Why do the big primes not contribute to the number of states? Each of them occurs either 0 or 1 times. Should the number of states not be multiplied by 2 for each of these primes (leading to a non-feasible state space again)?
We can use prime factorization, keeping only the exponents, to represent each number.
Here is one example.
6 = (2^1)(3^1)(5^0) -> state "1 1 0" to represent 6
18 = (2^1)(3^2)(5^0) -> state "1 2 0" to represent 18
Here is how we can get the LCM of 6 and 18 using prime factorization:
LCM(6, 18) = (2^max(1,1)) (3^max(1,2)) (5^max(0,0)) = (2^1)(3^2)(5^0) = 18
2^9 > 500, 3^6 > 500, 5^4 > 500, 7^4>500, 11^3 > 500, 13^3 > 500, 17^3 > 500, 19^3 > 500
we can use only the exponents of the prime numbers 2, 3, 5, 7, 11, 13, 17, 19 to represent the LCMs in the set B = {b0, b1, ..., bk-1}
for the given set A = {a0, a1, ..., aN-1} (1 ≤ N ≤ 100), with 2 ≤ ai ≤ 500.
9 * 6 * 4 * 4 * 3 * 3 * 3 * 3 = 69984 <= 70000, so we only need two copies of dp[9][6][4][4][3][3][3][3] to keep track of all the LCM states. So, dp[70000][2] is enough.
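To make the state representation concrete, here is a small Python sketch (my illustration, not part of the original answer) that maps a number to its exponent vector over the eight small primes and computes an LCM as a componentwise max:

SMALL_PRIMES = [2, 3, 5, 7, 11, 13, 17, 19]

def to_state(n):
    # exponent of each small prime in n; any large-prime part is ignored here
    exps = []
    for p in SMALL_PRIMES:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        exps.append(e)
    return tuple(exps)

def lcm_state(s, t):
    # the LCM of two factorizations is the componentwise max of the exponents
    return tuple(max(a, b) for a, b in zip(s, t))

print(to_state(6), to_state(18))             # (1, 1, 0, ...) and (1, 2, 0, ...)
print(lcm_state(to_state(6), to_state(18)))  # (1, 2, 0, ...), i.e. 18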
I put together a small C++ program to illustrate how we can get the sum of LCMs of the given set A = {a0, a1, ..., aN-1} (1 ≤ N ≤ 100), with 2 ≤ ai ≤ 500. As in the solution sketch, we loop through all the possible LCM values (at most ~70000 states).
#include <iostream>
#include <vector>
using namespace std;

int gcd(int a, int b) {
    int remainder = 0;
    do {
        remainder = a % b;
        a = b;
        b = remainder;
    } while (b != 0);
    return a;
}

int lcm(int a, int b) {
    if (a == 0 || b == 0) {
        return 0;
    }
    return (a * b) / gcd(a, b);
}

long sum_of_lcm(int A[], int N) {
    // The largest LCM any subset can reach is the LCM of the whole array.
    int max = A[0];
    for (int i = 1; i < N; i++) {
        max = lcm(max, A[i]);
    }
    max++;
    // dp[v][layer] counts the non-empty subsets seen so far whose LCM is v.
    // Two layers suffice because layer i depends only on layer i-1.
    vector<vector<long>> dp(max, vector<long>(2, 0));
    int pri = 0;
    int cur = 1;
    // loop through n x (number of possible LCM values)
    for (int i = 0; i < N; i++) {
        // Carry forward all subsets that do not include A[i].
        for (int v = 1; v < max; v++) {
            dp[v][cur] = dp[v][pri];
        }
        // Every previous subset can be extended with A[i];
        // its LCM then becomes lcm(A[i], v).
        for (int v = 1; v < max; v++) {
            if (dp[v][pri] > 0) {
                dp[lcm(A[i], v)][cur] += dp[v][pri];
            }
        }
        // Count the singleton subset {A[i]}.
        dp[A[i]][cur]++;
        pri = cur;
        cur = (pri + 1) % 2;
    }
    // The problem asks for subsets of size at least 2, so remove the singletons.
    for (int i = 0; i < N; i++) {
        dp[A[i]][pri] -= 1;
    }
    long total = 0;
    for (int j = 0; j < max; j++) {
        total += dp[j][pri] * j;
    }
    cout << "total:" << total << endl;
    return total;
}

int main() {
    int a[] = { 2, 6, 7 };
    int n = sizeof(a) / sizeof(a[0]);
    sum_of_lcm(a, n);
    return 0;
}
Output
total:104
The states are one more than the powers of primes. You have numbers up to 2^8, so the power of 2 is in [0..8], which is 9 states. Similarly for the other states.
"dp" could well stand for dynamic programming, I'm not sure.
The recurrence relation is the heart of the problem, so you will learn more by solving it yourself. Start with some small, simple examples.
For the large primes, try solving a reduced problem without using them (or their equivalents) and then add them back in to see their effect on the final result.

Find all possible combinations from 4 input numbers which can add up to 24

Actually, this question can be generalized as below:
Find all possible combinations from a given set of elements which
meet certain criteria.
So, any good algorithms?
There are only 16 possibilities (and one of those is to add together "none of them", which ain't gonna give you 24), so the old-fashioned "brute force" algorithm looks pretty good to me:
for (unsigned int choice = 1; choice < 16; ++choice) {
    int sum = 0;
    if (choice & 1) sum += elements[0];
    if (choice & 2) sum += elements[1];
    if (choice & 4) sum += elements[2];
    if (choice & 8) sum += elements[3];
    if (sum == 24) {
        // we have a winner
    }
}
In the completely general form of your problem, the only way to tell whether a combination meets "certain criteria" is to evaluate those criteria for every single combination. Given more information about the criteria, maybe you could work out some ways to avoid testing every combination and build an algorithm accordingly, but not without those details. So again, brute force is king.
There are two interesting explanations of the sum problem, on both Wikipedia and MathWorld.
In the case of the first question you asked, the first answer is good for a limited number of elements. You should realize that the reason Mr. Jessop used 16 as the boundary for his loop is because this is 2^4, where 4 is the number of elements in your set. If you had 100 elements, the loop limit would become 2^100 and your algorithm would literally take forever to finish.
In the case of a bounded sum, you should consider a depth first search, because when the sum of elements exceeds the sum you are looking for, you can prune your branch and backtrack.
In the case of the generic question, finding the subset of elements that satisfies certain criteria, this is known as the Knapsack problem, which is NP-complete. Given that, no algorithm is known that solves the general case in polynomial time.
Nevertheless, there are several heuristics that bring good results to the table, including (but not limited to) genetic algorithms (one I personally like, for I wrote a book on them) and dynamic programming. A simple search in Google will show many scientific papers that describe different solutions for this problem.
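As one concrete instance of the dynamic programming idea: when the criterion is hitting a target sum with non-negative integers, the classic pseudo-polynomial table applies. A minimal Python sketch (with my own example values):

def subset_sums_to(elements, target):
    # reachable[s] is True if some subset sums to s
    reachable = [True] + [False] * target
    for e in elements:
        for s in range(target, e - 1, -1):  # backwards so each element is used at most once
            if reachable[s - e]:
                reachable[s] = True
    return reachable[target]

print(subset_sums_to([3, 9, 8, 4], 24))  # True: 3 + 9 + 8 + 4 = 24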
Find all possible combinations from a given set of elements, which
meet certain criteria
If I understood you right, this code will be helpful for you:
>>> from itertools import combinations as combi
>>> combi.__doc__
'combinations(iterable, r) --> combinations object\n\nReturn successive r-length
combinations of elements in the iterable.\n\ncombinations(range(4), 3) --> (0,1
,2), (0,1,3), (0,2,3), (1,2,3)'
>>> set = range(4)
>>> set
[0, 1, 2, 3]
>>> criteria = range(3)
>>> criteria
[0, 1, 2]
>>> for tuple in list(combi(set, len(criteria))):
... if cmp(list(tuple), criteria) == 0:
... print 'criteria exists in tuple: ', tuple
...
criteria exists in tuple: (0, 1, 2)
>>> list(combi(set, len(criteria)))
[(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
Generally for a problem like this you have to try all possibilities; the thing you should do is have the code abort building a combination as soon as you know it cannot satisfy the criteria (if your criterion is that you may not have more than two blue balls, then abort any calculation that has more than two). This is backtracking.
def satisfies_criteria(partial):
    # placeholder criterion: a partial sum over 24 can never be completed
    # (assuming positive numbers)
    return sum(partial) <= 24

def perm(elements, partial):
    if not elements:
        print(partial)
    else:
        for i, element in enumerate(elements):
            partial.append(element)
            if satisfies_criteria(partial):
                # keep building only while the criteria can still be met
                perm(elements[:i] + elements[i+1:], partial)
            partial.pop()  # remove the element added above
The set of input numbers matters, as you can tell as soon as you allow e.g. negative numbers, imaginary numbers, rational numbers etc in your start set. You could also restrict to e.g. all even numbers, all odd number inputs etc.
That means that it's hard to build something deductive. You need brute force, a.k.a. try every combination etc.
In this particular problem you could build an algorithm that recurses - e.g. find every combination of 3 ints in (1..22) that adds up to 23, then add 1; every combination that adds up to 22, and add 2; etc. Which can again be broken into every combination of 2 that adds up to 21, etc. You need to decide if you can count the same number twice.
Once you have that you have a recursive function to call -
combinations( 24 , 4 ) = combinations( 23, 3 ) + combinations( 22, 3 ) + ... combinations( 4, 3 );
combinations( 23 , 3 ) = combinations( 22, 2 ) + ... combinations( 3, 2 );
etc
This works well except you have to be careful around repeating numbers in the recursion.
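A sketch of that recursion in Python, with memoization and a decreasing upper bound so the same multiset of numbers is never counted twice (assuming distinct numbers from 1..22, as in the example above):

from functools import lru_cache

@lru_cache(maxsize=None)
def combinations_count(total, k, max_num):
    # ways to pick k distinct numbers from 1..max_num that sum to total
    if k == 0:
        return 1 if total == 0 else 0
    return sum(combinations_count(total - n, k - 1, n - 1)
               for n in range(1, max_num + 1) if n <= total)

print(combinations_count(24, 4, 22))  # ways to write 24 as 4 distinct numbers from 1..22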
private int[][] work()
{
    const int target = 24;
    List<int[]> combos = new List<int[]>();
    for (int i = 0; i < 9; i++)
        for (int x = 0; x < 9; x++)
            for (int y = 0; y < 9; y++)
                for (int z = 0; z < 9; z++)
                {
                    int res = x + y + z + i;
                    if (res == target)
                    {
                        combos.Add(new int[] { x, y, z, i });
                    }
                }
    return combos.ToArray();
}
It works instantly, but there probably are better methods than 'guess and check'. All I am doing is looping through every possibility, adding the numbers together, and seeing if the sum comes out to the target value.
If I understand your question correctly, what you are asking for is called "Permutations", or the number (N) of possible ways to arrange (X) numbers taken from a set of (Y) numbers.
N = Y! / (Y - X)!
I don't know if this will help, but this is a solution I came up with for an assignment on permutations.
You have an input of '123' (a string); use the substr functions:
1) put each number of the input into an array
array[N1,N2,N3,...]
2)Create a swap function
function swap(Number A, Number B)
{
temp = Number B
Number B = Number A
Number A = temp
}
3) This algorithm uses the swap function to move the numbers around until all permutations are done.
original_string = '123'
temp_string = ''
i = 0
while (temp_string != original_string)
{
    swap(array element[i], array element[i+1])
    i = (i + 1) % 2    // alternate between swapping positions (0,1) and (1,2)
    temp_string = array.toString
}
Hopefully you can follow my pseudo code, but this works at least for 3 digit permutations
(n x n)
Build up a square matrix of n x n and print all of its corresponding crossed values together,
e.g.
e.g.
1 2 3 4
1 11 12 13 14
2 .. .. .. ..
3 ..
4 .. ..

How would you look at developing an algorithm for this hotel problem?

There is a problem I am working on for a programming course and I am having trouble developing an algorithm to suit the problem. Here it is:
You are going on a long trip. You start on the road at mile post 0. Along the way there are n
hotels, at mile posts a1 < a2 < ... < an, where each ai is measured from the starting point. The
only places you are allowed to stop are at these hotels, but you can choose which of the hotels
you stop at. You must stop at the final hotel (at distance an), which is your destination.
You'd ideally like to travel 200 miles a day, but this may not be possible (depending on the spacing
of the hotels). If you travel x miles during a day, the penalty for that day is (200 - x)^2. You want
to plan your trip so as to minimize the total penalty, that is, the sum, over all travel days, of the
daily penalties.
Give an efficient algorithm that determines the optimal sequence of hotels at which to stop.
So, my intuition tells me to start from the back, checking penalty values, and then somehow match them going back in the forward direction (resulting in an O(n^2) runtime, which is optimal enough for the situation).
Anyone see any possible way to make this idea work out, or have any ideas on possible implementations?
If x is a marker number, ax is the mileage to that marker, and px is the minimum penalty to get to that marker, you can calculate pn for marker n if you know pm for all markers m before n.
To calculate pn, find the minimum of pm + (200 - (an - am))^2 for all markers m where am < an and (200 - (an - am))^2 is less than your current best for pn (last part is optimization).
For the starting marker 0, a0 = 0 and p0 = 0, for marker 1, p1 = (200 - a1)^2. With that starting information you can calculate p2, then p3 etc. up to pn.
edit: Switched to Java code, using the example from OP's comment. Note that this does not have the optimization check described in the second paragraph.
public static void printPath(int path[], int i) {
    if (i == 0) return;
    printPath(path, path[i]);
    System.out.print(i + " ");
}

public static void main(String args[]) {
    int hotelList[] = {0, 200, 400, 600, 601};
    int penalties[] = {0, (int)Math.pow(200 - hotelList[1], 2), -1, -1, -1};
    int path[] = {0, 0, -1, -1, -1};
    for (int i = 2; i <= hotelList.length - 1; i++) {
        for (int j = 0; j < i; j++) {
            int tempPen = (int)(penalties[j] + Math.pow(200 - (hotelList[i] - hotelList[j]), 2));
            if (penalties[i] == -1 || tempPen < penalties[i]) {
                penalties[i] = tempPen;
                path[i] = j;
            }
        }
    }
    for (int i = 1; i < hotelList.length; i++) {
        System.out.print("Hotel: " + hotelList[i] + ", penalty: " + penalties[i] + ", path: ");
        printPath(path, i);
        System.out.println();
    }
}
Output is:
Hotel: 200, penalty: 0, path: 1
Hotel: 400, penalty: 0, path: 1 2
Hotel: 600, penalty: 0, path: 1 2 3
Hotel: 601, penalty: 1, path: 1 2 4
It looks like you can solve this problem with dynamic programming. The subproblem is the following:
d(i) : The minimum penalty possible when travelling from the start to hotel i.
The recursive formula is as follows:
d(0) = 0 where 0 is the starting position.
d(i) = min_{j=0, 1, ... , i-1} ( d(j) + (200-(ai-aj))^2)
The minimum penalty for reaching hotel i is found by trying all stopping places for the previous day, adding today's penalty and taking the minimum of those.
In order to find the path, we store in a separate array (path[]) which hotel we had to travel from in order to achieve the minimum penalty for that particular hotel. By traversing the array backwards (from path[n]) we obtain the path.
The runtime is O(n^2).
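A compact Python rendering of this recurrence and the path reconstruction (a sketch; it assumes a[0] = 0 is the starting point and hotel indices are positions in the list):

def plan_trip(a):
    n = len(a)
    d = [0] + [float('inf')] * (n - 1)   # d[i]: minimum penalty to reach hotel i
    path = [0] * n                       # path[i]: best previous stop before hotel i
    for i in range(1, n):
        for j in range(i):
            cost = d[j] + (200 - (a[i] - a[j])) ** 2
            if cost < d[i]:
                d[i], path[i] = cost, j
    stops = []
    i = n - 1
    while i > 0:                         # walk path[] backwards from the destination
        stops.append(i)
        i = path[i]
    return d[n - 1], stops[::-1]

print(plan_trip([0, 200, 400, 600, 601]))  # (1, [1, 2, 4]), matching the Java answer above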
This is equivalent to finding the shortest path between two nodes in a directed acyclic graph. Dijkstra's algorithm will run in O(n^2) time.
Your intuition is better, though. Starting at the back, calculate the minimum penalty of stopping at that hotel. The penalty for the final leg is just (200 - x)^2, where x is the distance of that leg. Then, for each of the other hotels (in reverse order), scan forward to find the lowest-penalty hotel. A simple optimization is to stop as soon as the penalty costs start increasing, since that means you've overshot the global minimum.
I don't think you can do it as easily as sysrqb states.
On a side note, there is really no difference between starting from the start or the end; the goal is to find the minimum number of stops each way, where each stop is as close to 200 miles as possible.
The question as stated seems to allow travelling beyond 200 miles per day, and the penalty is equally valid for over or under (since it is squared). This prefers overshooting the daily mileage rather than undershooting, since the penalty is equal but the goal is closer.
However, given this layout
A ----- B----C-------D------N
0 190 210 390 590
This is not always true, though. It is better to go B->D->N for a total penalty of only (200-190)^2 = 100. Going further, via C->D->N, gives a penalty of 100+400 = 500.
The answer looks like a full breadth first search with active pruning if you already have an optimal solution to reach point P, removing all solutions thus far where
sum(penalty-x) > sum(penalty-p) AND distance-to-x <= distance-to-p - 200
This would be an O(n^2) algorithm
Something like...
Quicksort all hotels by distance from start (discard any that lie beyond hotelN, the destination)
Create an array/list of solutions, each containing (ListOfHotels, I, DistanceSoFar, Penalty)
Inspect each hotel in order; for each hotel_I:
    Calculate the penalty to I, starting from each prior solution
    Pruning: for each prior solution that is more than 200 distanceSoFar behind the
    current hotel and has Penalty > current.penalty, remove it from the list
loop
Following is the MATLAB code for hotel problem.
clc
clear all
% Data
% a = [0;50;180;150;50;40];
% a = [0, 200, 400, 600, 601];
a = [0,10,180,350,450,600];
% a = [0,1,2,3,201,202,203,403];
n = length(a);
opt(1) = 0;
prev(1)= 1;
for i=2:n
opt(i) =Inf;
for j = 1:i-1
if(opt(i)>(opt(j)+ (200-a(i)+a(j))^2))
opt(i)= opt(j)+ (200-a(i)+a(j))^2;
prev(i) = j;
end
end
S(i) = opt(i);
end
k = 1;
i = n;
sol(1) = n;
while(i>1)
k = k+1;
sol(k)=prev(i);
i = prev(i);
end
for i =k:-1:1
stops(i) = sol(i);
end
stops
Step 1 of 2
Sub-problem:
In this scenario, "C(j)" is the sub-problem: the minimum penalty attainable up to hotel "aj", where 0 <= j <= n. The required value for the problem is "C(n)".
Algorithm to find minimum total penalty:
If the trip stops at location "aj", then the previous stop "ai" must have i less than j. Considering all the possibilities for "ai":
C(j) = min{C(i) + (200 - (aj - ai))^2}, 0 <= i < j.
Initialize the value of "C(0)" as "0" and "a0" as "0" to find the remaining values.
To find the optimal route, increase the values of "j" and "i" on each iteration and use this detail to backtrack from "C(n)".
Here, "C(n)" refers to the penalty of the last hotel (that is, the value of "i" runs between "0" and "n").
Pseudocode:
// Function definition
Procedure min_tot()
    // Outer loop over the stops
    for j = 1 to n:
        // Penalty for driving from the start to stop j in one day
        C(j) = (200 - aj)^2
        // Inner loop over the possible previous stops
        for i = 1 to j-1:
            // Compute the total penalty and keep the minimum for "C(j)"
            C(j) = min(C(j), C(i) + (200 - (aj - ai))^2)
    // Return the total penalty of the last hotel
    return C(n)
Step 2 of 2
Explanation:
The above algorithm finds the minimum total penalty from the starting point to the end point.
It uses the function "min()" to find the total penalty for each stop in the trip and computes the minimum penalty value.
Running time of the algorithm:
This algorithm contains "n" sub-problems, and each sub-problem takes "O(n)" time to resolve, since it requires a minimum over "O(n)" previous values.
The backtracking process takes "O(n)" time.
The total running time of the algorithm is n x n = n^2 = O(n^2).
Therefore, this algorithm takes "O(n^2)" time in total to solve the whole problem.
I have come across this problem recently and wanted to share my solution written in Javascript.
Not dissimilar to most of the above solutions, I have used a dynamic programming approach. To calculate penalties[i], we need to search for the stopping place for the previous day such that the penalty is minimal:
penalties(i) = min_{j=0, 1, ... , i-1} ( penalties(j) + (200-(hotelList[i]-hotelList[j]))^2 )
The solution does not assume that the first penalty is Math.pow(200 - hotelList[1], 2). We don't know whether or not it is optimal to stop at the first stop, so this assumption should not be made.
In order to find the optimal path and store all the stops along the way, the helper array path is being used. Finally, the array is being traversed backwards to calculate the finalPath.
function calculateOptimalRoute(hotelList) {
    const path = [];
    const penalties = [];
    for (let i = 0; i < hotelList.length; i++) {
        // Penalty if we drive from the start to hotel i in a single day.
        penalties[i] = Math.pow(200 - hotelList[i], 2);
        path[i] = 0;
        for (let j = 0; j < i; j++) {
            const temp = penalties[j] + Math.pow((200 - (hotelList[i] - hotelList[j])), 2);
            if (temp < penalties[i]) {
                penalties[i] = temp;
                path[i] = (j + 1);
            }
        }
    }
    const finalPath = [];
    let index = path.length - 1;
    while (index >= 0) {
        finalPath.unshift(index + 1);
        index = path[index] - 1;
    }
    console.log('min penalty is ', penalties[hotelList.length - 1]);
    console.log('final path is ', finalPath);
    return finalPath;
}
// calculateOptimalRoute([20, 40, 60, 940, 1500])
// Outputs [3, 4, 5]
// calculateOptimalRoute([190, 420, 550, 660, 670])
// Outputs [1, 2, 5]
// calculateOptimalRoute([200, 400, 600, 601])
// Outputs [1, 2, 4]
// calculateOptimalRoute([])
// Outputs []
To answer your question concisely: for a problem like this, a polynomial-time algorithm is usually considered "efficient", so your O(n^2) algorithm qualifies as "efficient".
I think the simplest method, given N total miles and 200 miles per day, would be to divide N by 200 to get X, the number of days you will travel. Round that to the nearest whole number of days X', then divide N by X' to get Y, the optimal number of miles to travel in a day. This is effectively a constant-time operation. If there were a hotel every Y miles, stopping at those hotels would produce the lowest possible score, by minimizing the effect of squaring each day's penalty. For instance, if the total trip is 605 miles, the penalty for travelling 201 miles the first day and 202 miles on each of the other two is 1+4+4 = 9, far less than the 0+0+25 = 25 (200+200+205) you would get by minimizing each individual day's travel penalty as you went.
Now, you can traverse the list of hotels. The fastest method would be to simply pick the hotel that is the closest to each multiple of Y miles. It's linear-time and will produce a "good" result. However, I do not think this will produce the "best" result in all cases.
The more complex but foolproof method is to get the two closest hotels to each multiple of Y; the one immediately before and the one immediately after. This produces an array of X' pairs, which can be traversed in all possible combinations in 2^X' time. You can shorten this by applying Dijkstra to a map of these pairs, which will determine the least costly path for each day's travel, and will execute in roughly (2X')^2 time. This will probably be the most efficient algorithm that is guaranteed to produce the optimal result.
As @rmmh mentioned, you are finding the minimum-distance path. Here the distance is the penalty (200-x)^2.
So you will try to find a stopping plan by finding minimum penalty.
Let's say D(ai) gives the distance of ai from the starting point.
P(i) = min { P(j) + (200 - (D(ai) - D(aj)))^2 } where j is: 0 <= j < i
From a casual analysis it looks to be an
O(n^2) algorithm ( 1 + 2 + 3 + 4 + .... + n = O(n^2) )
As a proof of concept, here is my JavaScript solution in Dynamic Programming without nested loops.
We start at zero miles.
We find the next stop by keeping the penalty as low as we can by comparing the penalty of a current hotel in the loop to the previous hotel's penalty.
Once we have our current minimum, we have found our stop for the day. We assign this point as our next starting point.
Optionally, we could keep the total of the penalties:
let hotels = [40, 80, 90, 200, 250, 450, 680, 710, 720, 950, 1000, 1080, 1200, 1480]

function findOptimalPath(arr) {
    let start = 0
    let stops = []
    for (let i = 1; i < arr.length; i++) {
        if (Math.pow((start + 200) - arr[i - 1], 2) < Math.pow((start + 200) - arr[i], 2)) {
            stops.push(arr[i - 1])
            start = arr[i - 1]
        }
    }
    console.log(stops)
}
findOptimalPath(hotels)
Here is my Python solution using Dynamic Programming:
distance = [150, 180, 250, 340]

def hotelStop(distance):
    n = len(distance)
    DP = [0 for _ in distance]
    # work backwards: DP[i] holds the minimum penalty from hotel i to the end
    for i in range(n - 2, -1, -1):
        min_penalty = float("inf")
        for j in range(i + 1, n):
            # going from hotel i to hotel j in one day
            x = distance[j] - distance[i]
            penalty = (200 - x) ** 2
            total_penalty = penalty + DP[j]
            min_penalty = min(min_penalty, total_penalty)
        DP[i] = min_penalty
    return DP[0]

hotelStop(distance)

Better random "feeling" integer generator for short sequences

I'm trying to figure out a way to create random numbers that "feel" random over short sequences. This is for a quiz game, where there are four possible choices, and the software needs to pick one of the four spots in which to put the correct answer before filling in the other three with distractors.
Obviously, arc4random % 4 will create more than sufficiently random results over a long sequence, but in a short sequence it's entirely possible (and a frequent occurrence!) to have five or six of the same number come back in a row. This is what I'm aiming to avoid.
I also don't want to simply say "never pick the same square twice," because that results in only three possible answers for every question but the first. Currently I'm doing something like this:
bool acceptable = NO;
do {
    currentAnswer = arc4random() % 4;
    if (currentAnswer == lastAnswer) {
        if (arc4random() % 4 == 0) {
            acceptable = YES;
        }
    } else {
        acceptable = YES;
    }
} while (!acceptable);
Is there a better solution to this that I'm overlooking?
If your question was how to compute currentAnswer using your example's probabilities non-iteratively, Guffa has your answer.
If the question is how to avoid random-clustering without violating equiprobability and you know the upper bound of the length of the list, then consider the following algorithm which is kind of like un-sorting:
from random import randrange
# randrange(a, b) yields a <= N < b

def decluster(seq):
    seq_len = len(seq)
    for i in range(seq_len):
        j = (i + 1) % seq_len
        if seq[i] == seq[j]:
            i_swap = randrange(i, seq_len)  # is the best lower bound 0, i, or j?
            if seq[j] != seq[i_swap]:
                print('swap', j, i_swap, (seq[j], seq[i_swap]))
                seq[j], seq[i_swap] = seq[i_swap], seq[j]

seq = [randrange(1, 5) for _ in range(20)]; print(seq)
decluster(seq); print(seq)
decluster(seq); print(seq)
where any relation to actual working Python code is purely coincidental. I'm pretty sure the prior-probabilities are maintained, and it does seem break clusters (and occasionally adds some). But I'm pretty sleepy so this is for amusement purposes only.
You populate an array of outcomes, then shuffle it, then assign them in that order.
So for just 8 questions:
answer_slots = [0,0,1,1,2,2,3,3]
shuffle(answer_slots)
print answer_slots
[1,3,2,1,0,2,3,0]
To reduce the probability for a repeated number by 25%, you can pick a random number between 0 and 3.75, and then rotate it so that the 0.75 ends up at the previous answer.
To avoid using floating point values, you can multiply the factors by four:
Pseudo code (where / is an integer division):
currentAnswer = ((random(0..14) + lastAnswer * 4 + 1) % 16) / 4
Set up a weighted array. Let's say the last value was a 2. Make an array like this:
array = [0,0,0,0,1,1,1,1,2,3,3,3,3];
Then pick a number in the array.
newValue = array[arc4random() % 13];
Now switch to using math instead of an array.
newValue = ( ( ( arc4random() % 13 ) / 4 ) + 1 + oldValue ) % 4;
For P possibilities and a weight 0<W<=1 use:
newValue = ( ( ( arc4random() % (P/W-P(1-W)) ) * W ) + 1 + oldValue ) % P;
For P=4 and W=1/4, (P/W-P(1-W)) = 13. This says the last value will be 1/4 as likely as other values.
If you completely eliminate the most recent answer it will be just as noticeable as the most recent answer showing up too often. I do not know what weight will feel right to you, but 1/4 is a good starting point.
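A quick Python simulation of this weighting (randrange standing in for arc4random) shows the repeat rate dropping from 1/4 to 1/13:

from random import randrange

def next_answer(old):
    # integer form of the weighted pick: the previous answer gets weight 1 of 13
    return ((randrange(13) // 4) + 1 + old) % 4

trials, repeats, answer = 100000, 0, 0
for _ in range(trials):
    new = next_answer(answer)
    if new == answer:
        repeats += 1
    answer = new
print(repeats / trials)  # about 1/13 = 0.077, versus 0.25 for an unweighted pick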

What is the correct algorithm for a logarithmic distribution curve between two points?

I've read a bunch of tutorials about the proper way to generate a logarithmic distribution of tagcloud weights. Most of them group the tags into steps. This seems somewhat silly to me, so I developed my own algorithm, based on what I've read, that dynamically distributes the tag's count along the logarithmic curve between the threshold and the maximum. Here's the essence of it in Python:
from math import log

count = [1, 3, 5, 4, 7, 5, 10, 6]

def logdist(count, threshold=0, maxsize=1.75, minsize=.75):
    countdist = []
    # mincount is either the threshold or the minimum if it's over the threshold
    mincount = threshold < min(count) and min(count) or threshold
    maxcount = max(count)
    spread = maxcount - mincount
    # the slope of the line (rise over run) between (mincount, minsize) and (maxcount, maxsize)
    delta = (maxsize - minsize) / float(spread)
    for c in count:
        logcount = log(c - (mincount - 1)) * (spread + 1) / log(spread + 1)
        size = delta * logcount - (delta - minsize)
        countdist.append({'count': c, 'size': round(size, 3)})
    return countdist
Basically, without the logarithmic calculation of the individual count, it would generate a straight line between the points, (mincount, minsize) and (maxcount, maxsize).
The algorithm does a good approximation of the curve between the two points, but suffers from one drawback. The mincount is a special case, and the logarithm of it produces zero. This means the size of the mincount would be less than minsize. I've tried cooking up numbers to try to solve this special case, but can't seem to get it right. Currently I just treat the mincount as a special case and add "or 1" to the logcount line.
Is there a more correct algorithm to draw a curve between the two points?
Update Mar 3: If I'm not mistaken, I am taking the log of the count and then plugging it into a linear equation. To put the description of the special case in other words: in y = ln(x), at x = 1, y = 0. This is what happens at the mincount. But the mincount can't be zero; the tag has not been used 0 times.
Try the code and plug in your own numbers to test. Treating the mincount as a special case is fine by me, I have a feeling it would be easier than whatever the actual solution to this problem is. I just feel like there must be a solution to this and that someone has probably come up with a solution.
UPDATE Apr 6: A simple Google search turns up many of the tutorials I've read, but this is probably the most complete example of stepped tag clouds.
UPDATE Apr 28: In response to antti.huima's solution: When graphed, the curve that your algorithm creates lies below the line between the two points. I've been trying to juggle the numbers around but still can't seem to come up with a way to flip that curve to the other side of the line. I'm guessing that if the function was changed to some form of logarithm instead of an exponent it would do exactly what I'd need. Is that correct? If so, can anyone explain how to achieve this?
Thanks to antti.huima's help, I re-thought out what I was trying to do.
Taking his method of solving the problem, I want an equation where the logarithm of the mincount is equal to the linear equation between the two points.
weight(MIN) = ln(MIN-(MIN-1)) + min_weight
min_weight = ln(1) + min_weight
While this gives me a good starting point, I need to make it pass through the point (MAX, max_weight). It's going to need a constant:
weight(x) = ln(x-(MIN-1))/K + min_weight
Solving for K we get:
K = ln(MAX-(MIN-1))/(max_weight - min_weight)
So, to put this all back into some python code:
from math import log

count = [1, 3, 5, 4, 7, 5, 10, 6]

def logdist(count, threshold=0, maxsize=1.75, minsize=.75):
    countdist = []
    # mincount is either the threshold or the minimum if it's over the threshold
    mincount = threshold < min(count) and min(count) or threshold
    maxcount = max(count)
    constant = log(maxcount - (mincount - 1)) / (maxsize - minsize)
    for c in count:
        size = log(c - (mincount - 1)) / constant + minsize
        countdist.append({'count': c, 'size': round(size, 3)})
    return countdist
Let's begin with your mapping from the logged count to the size. That's the linear mapping you mentioned:
size
|
max |_____
| /
| /|
| / |
min |/ |
| |
/| |
0 /_|___|____
0 a
where min and max are the min and max sizes, and a = log(maxcount) - b. The line is of the form y = mx + c, where x = log(count) - b.
From the graph, we can see that the gradient, m, is (maxsize-minsize)/a.
We need x=0 at y=minsize, so log(mincount)-b=0 -> b=log(mincount)
This leaves us with the following python:
mincount = min(count)
maxcount = max(count)
xoffset = log(mincount)
gradient = (maxsize - minsize) / (log(maxcount) - log(mincount))
for c in count:
    x = log(c) - xoffset
    size = gradient * x + minsize
If you want to make sure that the minimum count is always at least 1, replace the first line with:
mincount = min(count+[1])
which appends 1 to the count list before doing the min. The same goes for making sure the maxcount is always at least 1. Thus your final code per above is:
from math import log
count = [1, 3, 5, 4, 7, 5, 10, 6]
def logdist(count, maxsize=1.75, minsize=.75):
    countdist = []
    mincount = min(count + [1])
    maxcount = max(count + [1])
    xoffset = log(mincount)
    gradient = (maxsize - minsize) / (log(maxcount) - log(mincount))
    for c in count:
        x = log(c) - xoffset
        size = gradient * x + minsize
        countdist.append({'count': c, 'size': round(size, 3)})
    return countdist
What you have is tags whose counts range from MIN to MAX; the threshold issue can be ignored here because it amounts to setting every count below the threshold to the threshold value and taking the minimum and maximum only afterwards.
You want to map the tag counts to "weights" but in a "logarithmic fashion", which basically means (as I understand it) the following. First, the tags with count MAX get max_weight weight (in your example, 1.75):
weight(MAX) = max_weight
Secondly, the tags with the count MIN get min_weight weight (in your example, 0.75):
weight(MIN) = min_weight
Finally, it holds that when your count decreases by 1, the weight is multiplied with a constant K < 1, which indicates the steepness of the curve:
weight(x) = weight(x + 1) * K
Solving this, we get:
weight(x) = weight_max * (K ^ (MAX - x))
Note that with x = MAX, the exponent is zero and the multiplicand on the right becomes 1.
Now we have the extra requirement that weight(MIN) = min_weight, and we can solve:
weight_min = weight_max * (K ^ (MAX - MIN))
from which we get
K ^ (MAX - MIN) = weight_min / weight_max
and taking logarithm on both sides
(MAX - MIN) ln K = ln weight_min - ln weight_max
i.e.
ln K = (ln weight_min - ln weight_max) / (MAX - MIN)
The right hand side is negative as desired, because K < 1. Then
K = exp((ln weight_min - ln weight_max) / (MAX - MIN))
So now you have the formula to calculate K. After this you just apply for any count x between MIN and MAX:
weight(x) = max_weight * (K ^ (MAX - x))
And you are done.
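For illustration, here is that scheme in Python (a sketch of the derivation above, using the question's sizes as weights):

from math import exp, log

def expdist(count, max_weight=1.75, min_weight=0.75):
    MIN, MAX = min(count), max(count)
    if MIN == MAX:
        return [{'count': c, 'weight': max_weight} for c in count]
    # K < 1: each one-step drop in count multiplies the weight by K
    K = exp((log(min_weight) - log(max_weight)) / (MAX - MIN))
    return [{'count': c, 'weight': round(max_weight * K ** (MAX - c), 3)}
            for c in count]

print(expdist([1, 3, 5, 4, 7, 5, 10, 6]))
# weight(MAX) = 1.75 and weight(MIN) = 0.75 exactly; counts in between fall on the curve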
On a log scale, you just plot the log of the numbers linearly (in other words, pretend you're plotting linearly, but take the log of the numbers to be plotted first).
The zero problem can't be solved analytically--you have to pick a minimum order of magnitude for your scale, and no matter what you can't ever reach zero. If you want to plot something at zero, your choices are to arbitrarily give it the minimum order of magnitude of the scale, or to omit it.
I don't have the exact answer, but I think you want to look up Linearizing Exponential Data. Start by calculating the equation of the line passing through the points, and take the log of both sides of that equation.
