I'm developing an implementation of the ant colony algorithm and am stuck at the find-nearest-value step. Here is the problem.
I have an array containing cumulative probabilities, say cumulativeProb: {0.0, 0.34782608695652173, 0.8695652173913044, 1.0}
and there is a random number, randomNumber: 0.3323792320.
I want the application to choose the value nearest to randomNumber but NOT smaller than it, which means the application will choose 0.34782608695652173.
Can you give me a hint, please?
Look at Arrays.binarySearch(). It returns the index of the element if an exact match is found; otherwise it returns (-(insertion point) - 1), where the insertion point is the index of the first element greater than the key.
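For example, a minimal sketch (the array and variable names come from the question; the negative-return handling is the documented Arrays.binarySearch contract):

import java.util.Arrays;

public class RouletteWheel {
    // Return the index of the smallest cumulative probability >= r.
    static int selectIndex(double[] cumulativeProb, double r) {
        int idx = Arrays.binarySearch(cumulativeProb, r);
        if (idx < 0) {
            idx = -idx - 1; // insertion point: the first element greater than r
        }
        return idx;
    }

    public static void main(String[] args) {
        double[] cumulativeProb = {0.0, 0.34782608695652173, 0.8695652173913044, 1.0};
        System.out.println(cumulativeProb[selectIndex(cumulativeProb, 0.3323792320)]);
        // prints 0.34782608695652173
    }
}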
I have to reorder a sequence of elements based on the pairwise similarity between them (expressed by a coefficient) so that each element is as similar as possible to each of its neighbors. I am looking for an algorithm rather than code.
Example with 10 elements and similarity coefficients calculated for each pair of elements below:
The Excel file can be found here: https://1drv.ms/x/s!AtmZN4-kjgrPms99fqgaDwAS_F4uYw
What I have tried:
1. Find the pair with the highest coefficient. In the example: 0.98 for T3 (left end) and T5 (right end)
2. Find the maximum coefficient between the left end and the remaining elements
3. Find the maximum coefficient between the right end and the remaining elements
4. Take the maximum of 2. and 3.
5. If the maximum comes from 2., add on the left the element corresponding to the maximum coefficient for the left end. Else, add on the right the element corresponding to the maximum coefficient for the right end
6. Repeat steps 2-5 until no elements are left (a sketch of this greedy procedure follows below).
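For illustration, a minimal Java sketch of that greedy procedure. The symmetric matrix sim[i][j] holding the coefficients is my assumption about the input representation, not something from the question:

import java.util.ArrayDeque;
import java.util.Deque;

public class GreedyChain {
    static Deque<Integer> order(double[][] sim) {
        int n = sim.length; // assumes n >= 2
        boolean[] used = new boolean[n];
        // Step 1: seed the chain with the highest-coefficient pair.
        int bestI = 0, bestJ = 1;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                if (sim[i][j] > sim[bestI][bestJ]) { bestI = i; bestJ = j; }
        Deque<Integer> chain = new ArrayDeque<>();
        chain.addLast(bestI);
        chain.addLast(bestJ);
        used[bestI] = true;
        used[bestJ] = true;
        // Steps 2-5: repeatedly extend whichever end has the better match.
        while (chain.size() < n) {
            int left = chain.peekFirst(), right = chain.peekLast();
            int bestL = -1, bestR = -1;
            for (int j = 0; j < n; j++) {
                if (used[j]) continue;
                if (bestL == -1 || sim[left][j] > sim[left][bestL]) bestL = j;
                if (bestR == -1 || sim[right][j] > sim[right][bestR]) bestR = j;
            }
            if (sim[left][bestL] >= sim[right][bestR]) {
                chain.addFirst(bestL);
                used[bestL] = true;
            } else {
                chain.addLast(bestR);
                used[bestR] = true;
            }
        }
        return chain;
    }
}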
Here is the result:
The result isn't bad. One of the disadvantages I see is that choosing 0.99 over 0.98 is treated the same as choosing 0.99 over 0.01: only which coefficient is bigger matters, not by how much.
The second option I thought about was maximizing the sum of coefficients between all neighbors, but I don't really know where to start, especially if there are significantly more than 10 elements. Moreover, it could result in a more "flat" ordering: while having better similarity overall, some extremely similar elements could end up placed far from each other.
Being really new to this kind of problem, I am pretty sure this is a rather standard issue with existing solutions. Could you please point me to them?
Thank you!
After researching, I have found that my problem can be seen as the Travelling Salesman Problem (TSP). More here: https://en.wikipedia.org/wiki/Travelling_salesman_problem
To apply it, treat the "elements" in my example as the "cities" in TSP and (1 - similarity coefficient) as the "distances".
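To make the mapping concrete, here is a minimal sketch using the simplest TSP heuristic, nearest neighbour (sim is the same assumed similarity matrix as above; a proper TSP solver or a 2-opt pass would give better tours):

import java.util.ArrayList;
import java.util.List;

public class SimilarityTour {
    static List<Integer> nearestNeighbourTour(double[][] sim, int start) {
        int n = sim.length;
        boolean[] visited = new boolean[n];
        List<Integer> tour = new ArrayList<>();
        int current = start;
        visited[current] = true;
        tour.add(current);
        for (int step = 1; step < n; step++) {
            int best = -1;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int j = 0; j < n; j++) {
                if (visited[j]) continue;
                double dist = 1.0 - sim[current][j]; // the TSP "distance"
                if (dist < bestDist) {
                    bestDist = dist;
                    best = j;
                }
            }
            visited[best] = true;
            tour.add(best);
            current = best;
        }
        return tour;
    }
}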
I have been trying to solve an optimization problem but have not been able to think it through to an efficient solution.
Here's the problem:
We are given data representing a sequence of bookings on a single car. Each booking consists of two points (start location, end location). Given two adjacent bookings b1, b2, we say a relocation is required between them if the end location of b1 is not equal to the start location of b2.
We have to design an algorithm that takes a sequence of bookings as input and outputs a single permutation of the input that minimizes the total number of relocations within the sequence.
Here's my approach
To me, it looks like one of the greedy scheduling problems, but I'm not able to derive a good heuristic for it from any of the existing scheduling problems. In the end, I thought of sorting the given sequence so that the end point of each booking matches the start point of the next, using insertion sort.
So, for our given problem
[(23, 42),(42, 77),(77, 45)] is what
[(23, 42),(77, 45),(42, 77)] gets sorted to, thus matching each end point to the following start point.
Let's take another example
[(3,1),(1,3),(3,1),(2,2),(3,1),(2,3),(1,3),(1,1),(3,3),(3,2),(3,3)]
Now, after sorting up to index 7 using insertion sort, our array will look like
[(3,1),(1,3),(3,1),(2,2),(2,3),(3,3),(3,1),(1,3),(3,3),(3,2),(3,3)]
Now, for placing the point (3,3) at index 8 of the unsorted array, we do the following:
The idea is to put each point in its correct location. For the point (3,3) at index 8, I search the already sorted prefix for the first entry whose end point matches 3, i.e. the start point of this new point, subject to the condition that adding this point after that first found entry does not violate the invariant that the start of the next entry must match the end of this point. So we insert (3,3) between (2,3) and (3,1), at index 5. It looks like this
[(3,1),(1,3),(3,1),(2,2),(2,3),(3,3),(3,1),(1,3),(3,3),(3,2),(1,1)]
However, I'm not sure how I would prove whether this solution is optimal. Any pointers are highly appreciated. Is there a better way (I'm sure there is) to solve this?
You can convert this easily into a graph problem.
Each booking [a, b] becomes vertices a and b with an edge between them. Use DFS to find all connected components in this undirected graph and do some post-processing.
It is linear in the input size.
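A minimal sketch of the graph-building and DFS step (location IDs are assumed to be ints; the post-processing that turns the components into an actual booking order is left out, as in the answer above):

import java.util.*;

public class BookingComponents {
    // Each booking (a, b) contributes an undirected edge between locations a and b.
    static List<Set<Integer>> components(int[][] bookings) {
        Map<Integer, List<Integer>> adj = new HashMap<>();
        for (int[] b : bookings) {
            adj.computeIfAbsent(b[0], k -> new ArrayList<>()).add(b[1]);
            adj.computeIfAbsent(b[1], k -> new ArrayList<>()).add(b[0]);
        }
        Set<Integer> seen = new HashSet<>();
        List<Set<Integer>> comps = new ArrayList<>();
        for (int start : adj.keySet()) {
            if (seen.contains(start)) continue;
            Set<Integer> comp = new HashSet<>();
            Deque<Integer> stack = new ArrayDeque<>();
            stack.push(start);
            seen.add(start);
            while (!stack.isEmpty()) {
                int v = stack.pop();
                comp.add(v);
                for (int w : adj.get(v)) {
                    if (seen.add(w)) stack.push(w); // add() is false if already seen
                }
            }
            comps.add(comp);
        }
        return comps;
    }
}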
Today I was asked the following question in an interview:
Given an n by n array of integers that contains no duplicates and values that increase from left to right as well as top to bottom, provide an algorithm that checks whether a given value is in the array.
The answer I provided was similar to the answer in this thread:
Algorithm: efficient way to search an integer in a two dimensional integer array?
This solution is O(2n) = O(n), which I believed to be optimal.
However, the interviewer then informed me that it is possible to solve this problem in sublinear time. I have racked my brain over how to do this, but I am coming up with nothing.
Is a sublinear solution possible, or is O(n) optimal?
The thing to ask yourself is: what information does each comparison give you? It lets you eliminate the rectangle either "above and to the left" or "below and to the right".
Suppose you do a comparison at 'x' and it tells you that what you are looking for is greater:
XXX...
XXX...
XXx...
......
......
'x' - checked space
'X' - check showed this is not a possible location for your data
'.' - still unknown
You have to use this information in a smart way to check the entire rectangle.
Suppose you do a binary search this way on the middle column...
You'll get a result like
XXX...
XXX...
XXX...
XXXXXX
...XXX
...XXX
Two rectangular spaces are left over, each of half width and possibly full height. What can you do with this information?
I recommend recursing on the two resulting subrectangles of '.'. BUT, now instead of choosing the middle column, you choose the middle row to do your binary search on.
So the resulting run time of an N by M rectangle looks like
T(N, M) = log(N) + T(M/2, N)*2
Note the change in indexes because your recursion stack switches between checking columns and rows. The final run time (I didn't bother solving the recursion) should be something like T(M, N) = log(M) + log(N) (it's probably not exactly this but it will be similar).
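For concreteness, a minimal sketch of that divide-and-conquer; for simplicity it always binary-searches the middle column rather than alternating between rows and columns (rows increase left to right, columns increase top to bottom):

public class SortedMatrixSearch {
    static boolean search(int[][] m, int target,
                          int top, int left, int bottom, int right) {
        if (top > bottom || left > right) return false;
        int midCol = (left + right) / 2;
        // Binary-search the middle column for the first value > target.
        int lo = top, hi = bottom + 1;
        while (lo < hi) {
            int mid = (lo + hi) / 2;
            if (m[mid][midCol] == target) return true;
            if (m[mid][midCol] < target) lo = mid + 1; else hi = mid;
        }
        // Rows above lo hold column values < target, rows from lo down hold
        // values > target, so only two rectangles can still contain the target.
        return search(m, target, top, midCol + 1, lo - 1, right)   // upper-right
            || search(m, target, lo, left, bottom, midCol - 1);    // lower-left
    }

    static boolean contains(int[][] m, int target) {
        return search(m, target, 0, 0, m.length - 1, m[0].length - 1);
    }
}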
I am having trouble starting off this particular homework problem. Here is the problem:
Suppose that you are given an algorithm as a black box (you cannot see how it is designed) with the following property: if you input any sequence of real numbers and an integer k, the algorithm will answer YES or NO, indicating whether there is a subset of the numbers whose sum is exactly k. Show how to use this black box to find the subset of a given sequence X1, …, Xn whose sum is k. You can use the black box O(n) times.
I figure that the sequence should be sorted first, and that only values < k should be considered. Any help getting started would be greatly appreciated. Thanks.
Sorting is the wrong approach. Think about it this way: how can you use the oracle to determine whether a particular item in the set is part of the sum? Once you know whether that item is part of the sum, how can you use the oracle to figure out whether some other item is part of the sum?
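For what it's worth, a minimal sketch of where that line of thought leads (spoiler for the homework; the oracle is assumed to be callable as a Java BiPredicate, and the full sequence is assumed to answer YES):

import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

public class SubsetFinder {
    // Tentatively drop each item; keep it only if the oracle says the rest can
    // no longer reach k. Each original item costs one oracle call, so the
    // oracle is used exactly n times.
    static List<Integer> findSubset(List<Integer> xs, int k,
                                    BiPredicate<List<Integer>, Integer> blackbox) {
        List<Integer> kept = new ArrayList<>(xs);
        int i = 0;
        while (i < kept.size()) {
            List<Integer> trial = new ArrayList<>(kept);
            trial.remove(i); // drop the i-th remaining item (by index)
            if (blackbox.test(trial, k)) {
                kept = trial; // some other subset still sums to k: item not needed
            } else {
                i++;          // item is essential for reaching k: keep it
            }
        }
        return kept;
    }
}

The invariant is that kept always contains some subset summing to k; once no element can be dropped, every remaining element lies in every such subset, so kept itself sums to k.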
The blackbox is something like this, in C# (ignore that I used int instead of real for the sequence, it's inconsequential to the problem).
bool blackbox(List<int> subSequence, int k)
{
// unknown
}
You are tasked with passing subsets of the sequence into the black box and finding which part of the sequence sums to k.
Start with the whole sequence, just to see whether any subset sums to k at all.
Then, if it does, try a smaller subsequence to see whether it still contains a subset summing to k.
Repeat until you have narrowed it down to the subsequence that sums to k.
Say I have a linked list of numbers of length N. N is very large and I don’t know in advance the exact value of N.
How can I most efficiently write a function that will return k completely random numbers from the list?
There's a very nice and efficient algorithm for this using a method called reservoir sampling.
Let me start by giving you its history:
Knuth calls this Algorithm R on p. 144 of his 1997 edition of Seminumerical Algorithms (volume 2 of The Art of Computer Programming), and provides some code for it there. Knuth attributes the algorithm to Alan G. Waterman. Despite a lengthy search, I haven't been able to find Waterman's original document, if it exists, which may be why you'll most often see Knuth quoted as the source of this algorithm.
McLeod and Bellhouse, 1983 (1) provide a more thorough discussion than Knuth as well as the first published proof (that I'm aware of) that the algorithm works.
Vitter 1985 (2) reviews Algorithm R and then presents three additional algorithms which provide the same output, but with a twist. Rather than making a choice to include or skip each incoming element, his algorithms predetermine the number of incoming elements to be skipped. In his tests (which, admittedly, are out of date now) this decreased execution time dramatically by avoiding random number generation and comparisons on each incoming number.
In pseudocode the algorithm is:
Let R be the result array of size s
Let I be an input queue

> Fill the reservoir array
for j in the range [1, s]:
    R[j] = I.pop()

elements_seen = s

while I is not empty:
    elements_seen += 1
    j = random(1, elements_seen)  > This is inclusive
    if j <= s:
        R[j] = I.pop()
    else:
        I.pop()
Note that I've specifically written the code to avoid specifying the size of the input. That's one of the cool properties of this algorithm: you can run it without needing to know the size of the input beforehand and it still assures you that each element you encounter has an equal probability of ending up in R (that is, there is no bias). Furthermore, R contains a fair and representative sample of the elements the algorithm has considered at all times. This means you can use this as an online algorithm.
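For concreteness, a minimal Java rendering of the pseudocode above (the input is an Iterator precisely so that the size never needs to be known):

import java.util.Iterator;
import java.util.Random;

public class Reservoir {
    static int[] sample(Iterator<Integer> input, int s, Random rng) {
        int[] r = new int[s];
        int seen = 0;
        // Fill the reservoir with the first s elements.
        while (seen < s && input.hasNext()) {
            r[seen++] = input.next();
        }
        // Each later element replaces a uniformly random slot with probability s/seen.
        while (input.hasNext()) {
            int x = input.next();
            seen++;
            int j = rng.nextInt(seen); // uniform in [0, seen)
            if (j < s) {
                r[j] = x;
            }
        }
        return r;
    }
}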
Why does this work?
McLeod and Bellhouse (1983) provide a proof using the mathematics of combinations. It's pretty, but it would be a bit difficult to reconstruct it here. Therefore, I've generated an alternative proof which is easier to explain.
We proceed via proof by induction.
Say we want to generate a set of s elements and that we have already seen n>s elements.
Let's assume that our current s elements have already each been chosen with probability s/n.
By the definition of the algorithm, we choose element n+1 with probability s/(n+1).
Each element already part of our result set has a probability 1/s of being replaced.
The probability that an element from the n-seen result set is replaced in the n+1-seen result set is therefore (1/s)*s/(n+1)=1/(n+1). Conversely, the probability that an element is not replaced is 1-1/(n+1)=n/(n+1).
Thus, the n+1-seen result set contains an element either if it was part of the n-seen result set and was not replaced---this probability is (s/n)*n/(n+1)=s/(n+1)---or if the element was chosen---with probability s/(n+1).
The definition of the algorithm tells us that the first s elements are automatically included as the first n=s members of the result set. Therefore, the n-seen result set includes each element with s/n (=1) probability giving us the necessary base case for the induction.
References
McLeod, A. Ian, and David R. Bellhouse. "A convenient algorithm for drawing a simple random sample." Journal of the Royal Statistical Society. Series C (Applied Statistics) 32.2 (1983): 182-184. (Link)
Vitter, Jeffrey S. "Random sampling with a reservoir." ACM Transactions on Mathematical Software (TOMS) 11.1 (1985): 37-57. (Link)
This is called a Reservoir Sampling problem. The simple solution is to assign a random number to each element of the list as you see it, then keep the top (or bottom) k elements as ordered by the random number.
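A minimal sketch of that tag-and-keep idea, using a size-k min-heap keyed on the random tag (the class and method names are mine):

import java.util.Comparator;
import java.util.Iterator;
import java.util.PriorityQueue;
import java.util.Random;

public class TaggedSampler {
    record Tagged(double tag, int value) {}

    static int[] sample(Iterator<Integer> input, int k, Random rng) {
        // Min-heap on the tag: the root is the smallest tag currently kept.
        PriorityQueue<Tagged> heap =
            new PriorityQueue<>(Comparator.comparingDouble(Tagged::tag));
        while (input.hasNext()) {
            Tagged t = new Tagged(rng.nextDouble(), input.next());
            if (heap.size() < k) {
                heap.add(t);
            } else if (t.tag() > heap.peek().tag()) {
                heap.poll(); // evict the element with the smallest tag
                heap.add(t);
            }
        }
        int[] out = new int[heap.size()];
        for (int i = 0; i < out.length; i++) out[i] = heap.poll().value();
        return out;
    }
}

One nice property of the tag approach is that samples taken over separate chunks of the list can be merged afterwards by keeping the k largest tags overall.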
I would suggest: first pick your k random positions. Sort them. Then traverse the linked list and your sorted positions together in a single pass.
If you somehow don't know the length of your linked list (how?), then you could grab the first k into an array, then for node r, generate a random number in [0, r), and if that is less than k, replace the rth item of the array. (Not entirely convinced that doesn't bias...)
Other than that: "If I were you, I wouldn't be starting from here." Are you sure a linked list is right for your problem? Is there not a better data structure, such as a good old flat array list?
If you don't know the length of the list, then you will have to traverse it completely to ensure random picks. The method I've used in this case is the one described by Tom Hawtin (54070). While traversing the list you keep k elements that form your random selection to that point. (Initially you just add the first k elements you encounter.) Then, with probability k/i, you replace a random element from your selection with the ith element of the list (i.e. the element you are at, at that moment).
It's easy to show that this gives a random selection. After seeing m elements (m > k), each of the first m elements of the list is part of your random selection with probability k/m. That this holds initially is trivial. Then, for element m+1, you put it in your selection (replacing a random element) with probability k/(m+1). You now need to show that all other elements also have probability k/(m+1) of being selected. Each has probability k/m * (k/(m+1)*(1-1/k) + (1-k/(m+1))) (i.e. the probability that the element was in the selection times the probability that it is still there). With a little algebra you can show that this is equal to k/(m+1).
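Written out, that last algebra step is (in LaTeX notation):

\frac{k}{m}\left(\frac{k}{m+1}\left(1-\frac{1}{k}\right)+\left(1-\frac{k}{m+1}\right)\right)
  = \frac{k}{m}\left(1-\frac{1}{m+1}\right)
  = \frac{k}{m}\cdot\frac{m}{m+1}
  = \frac{k}{m+1}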
Well, you do need to know what N is at runtime at least, even if this involves doing an extra pass over the list to count the elements. The simplest algorithm is then to pick a random index in [0, N) and remove that item, repeated k times. Or, if it is permissible to return repeated numbers, don't remove the item.
Unless you have a VERY large N, and very stringent performance requirements, this algorithm runs with O(N*k) complexity, which should be acceptable.
Edit: Never mind, Tom Hawtin's method is way better. Select the random numbers first, then traverse the list once. Same theoretical complexity, I think, but much better expected runtime.
Why can't you just do something like
static List<int> GetKRandomFromList(List<int> input, int k)
{
    var rng = new Random();
    var ret = new List<int>();
    for (int i = 0; i < k; i++)
        ret.Add(input[rng.Next(input.Count)]); // note: the same element may be picked twice
    return ret;
}
I'm sure that you don't mean something that simple, so can you specify further?