Resource Allocation Algorithm (Weights in Containers)

I am currently trying to work through this problem, but I cannot seem to find a solution.
Here is the premise: there are k containers, each with a capacity. You place weights into these containers. The weights can have arbitrary values, but the total weight in a container cannot exceed its capacity, or the container will break. A new weight may not fit in any of the containers as-is; in that case, you can rearrange the existing weights to accommodate the new one.
Example:
Container 1: [10, 4], Capacity = 20
Container 2: [7, 6], Capacity = 20
Container 3: [10, 6], Capacity = 20
Now let's say we have to add a new weight with value 8.
One possible solution is to move the 6 from Container 2 to Container 1, and place the new weight in Container 2.
Container 1: [10, 4, 6], Capacity = 20
Container 2: [7, 8], Capacity = 20
Container 3: [10, 6], Capacity = 20
I would like to do this reallocation in as few moves as possible.
Let me know if this does not make sense. I am sure there is an algorithm out there but I just cannot seem to find it.
Thanks.
I thought the "Distribution of Cookies" problem would help, but that requires too many moves.

As I noted in the comments, the problem of deciding whether ANY solution exists is called Bin Packing and is NP-complete. Therefore any practical approach will either sometimes fail to find an answer, or may be exponentially slow.
The stated preference is for sometimes failing to find an answer. So I'll make reasonable decisions that result in that.
Note that this would take me a couple of days to implement. Take a shot yourself, but if you want you can email btilly#gmail.com and we can discuss a contract. (I already spent too long on it.)
Next, the request for a shortest path suggests a breadth-first search. So we'll take a breadth-first search through "the reasonableness of the path". Basically we'll try greedy strategies first, and cut the search off if it takes too long. So we may find the wrong answer (if greedy was wrong), or give up (if it takes too long), but we'll generally do reasonably well.
So what is a reasonable path? Well, a good greedy heuristic for bin packing is to always place the heaviest thing first, and to place it in the fullest bin it fits in. That's great for placing a bunch of objects at once, but it won't help you directly with moving objects.
And therefore we'll prioritize moves that create large holes first. And so our rules for the first things to try become:
Always place the heaviest thing we have first.
If possible, place it where we leave the container as full as possible.
Try moving things to create large spaces before small ones.
Deduplicate early.
Figuring this out is going to involve a lot of "pick the closest-to-full bin where I fit" and "pick the smallest thing in this bin which lets me fit". And you'd like to do this while looking at a lot of "we did X, Y and Z..." and then "...or maybe X, Y and W...".
Luckily I happen to have a perfect data structure for this. https://stackoverflow.com/a/75453554/585411 shows how to have a balanced binary tree, kept in sorted order, which is easy to clone so you can try something without touching the original tree. There I did it so you can iterate over the old tree, but you can also use it to create a clone and try something out that you may later abandon.
I didn't make that a multi-set (able to add elements multiple times) or add a next_biggest method. A multi-set is doable by adding a count to a node. Now contains can return a count (possibly 0) instead of a boolean. And next_biggest is fairly easy to add.
We need to add a hash function to this for deduplication purposes. We can define this recursively with:
node.hash = some_hash(some_hash(node.value) + some_hash(node.left.hash) + some_hash(node.right.hash))
(insert appropriate default hashes if node.left or node.right is None)
If we store this in the node at creation, then looking it up for deduplication is very fast.
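A minimal Python sketch of this create-time hashing (the Node layout and the use of Python's tuple hash are my assumptions for illustration, not the linked answer's code):

```python
# Hypothetical persistent-tree node; the hash is computed once at creation
# and never changes, because nodes are immutable.
EMPTY_HASH = hash(("empty",))  # default hash for a missing child

class Node:
    __slots__ = ("value", "left", "right", "hash")

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right
        left_h = left.hash if left is not None else EMPTY_HASH
        right_h = right.hash if right is not None else EMPTY_HASH
        self.hash = hash((value, left_h, right_h))

a = Node(5, Node(3), Node(8))
b = Node(5, Node(3), Node(8))
print(a.hash == b.hash)  # → True: structurally equal trees hash alike
```

Because the hash is stored at creation, deduplicating a whole tree becomes an O(1) lookup rather than a traversal.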
With this, if you have many bins with many objects each, you can store the objects in sorted order of size, and the bins sorted by free space, then by bin.hash. And now the idea is to add a new object to a bin as follows:
new_bin = old_bin.add(object)
new_bins = old_bins.remove(old_bin).add(new_bin)
And remove similarly with:
new_bin = old_bin.remove(object)
new_bins = old_bins.remove(old_bin).add(new_bin)
And with n objects across m bins this constructs each new state using only O(log(n) + log(m)) new data. And we can easily see if we've been here before.
And now we create partial solutions objects consisting of:
prev_solution (the solution we came from, may be None)
current_state (our data for bins and objects in bins)
creation_id (ascending id for partial solutions)
last_move (object, from_bin, to_bin)
future_move_bins (list of bins in order of largest movable object)
future_bins_idx (which one we last looked at)
priority (what order to look at these in)
moves (how many moves we've actually used)
move_priority (at what priority we started emptying the from_bin)
Partial solutions should compare based on priority and then creation_id. They should hash based on (solution.state.hash, solution.last_move.move_to.hash, future_bins_idx).
There will need to be a method called next_solutions. It will return the next group of future solutions to consider. (These may share most of their data with the solution they came from.)
The first partial solution will have prev_solution = None, creation_id = 1, last_move = None, and priority = moves = move_priority = 0. The future_move_bins will be a list of bins sorted by biggest movable element, descending, and future_bins_idx will be 0.
When we create a new partial solution, we will have to:
clone old solution into self
self.prev_solution = old solution
self.creation_id = next_creation_id
next_creation_id += 1
set self.last_move
remove object from self.state.from_bin
add object to self.state.to_bin
(fixing future_move_bins left to caller)
self.moves += 1
if the new from_bin matches the previous:
    self.priority = max(self.moves, self.move_priority)
else:
    self.priority += 1
    self.move_priority = self.priority
OK, this is a lot of setup. We're ALMOST there. (Except for the key future_moves business.)
The next thing that we need is the idea of a Priority Queue. Which in Python can be realized with heapq.
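For example (a minimal illustration, not the full search): pushing (priority, creation_id, payload) tuples keeps heap comparisons well-defined even when priorities tie, since creation_id breaks the tie before the payload is ever compared.

```python
import heapq

# Lower priority value = looked at sooner; creation_id breaks ties.
queue = []
heapq.heappush(queue, (2, 1, "unpromising move"))
heapq.heappush(queue, (0, 2, "greedy move"))
heapq.heappush(queue, (1, 3, "backup move"))

order = []
while queue:
    _priority, _id, label = heapq.heappop(queue)
    order.append(label)

print(order)  # → ['greedy move', 'backup move', 'unpromising move']
```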
And NOW here is the logic for the search:
best_solution_hash = {}
best_space_by_moves = {}
construct initial_solution
queue = []
add initial_solution.next_solutions() to queue
while len(queue) and not_time_to_stop():  # use this to avoid endless searches
    solution = heapq.heappop(queue)
    # ANSWER HERE?
    if can add target object to solution.state:
        walk prev_solution backwards to get the moves we want
        return reverse of the moves we found
    if solution.hash() not in best_solution_hash:
        # We have never seen this solution hash
        best_solution_hash[solution.hash()] = solution
    elif solution.moves < best_solution_hash[solution.hash()].moves:
        # This is a better way of finding a state we previously got to!
        # We want to redo that work with higher priority!
        solution.priority = min(solution.priority, best_solution_hash[solution.hash()].priority - 0.01)
        best_solution_hash[solution.hash()] = solution
    if best_solution_hash[solution.hash()] == solution:
        for next_solution in solution.next_solutions():
            # Is this solution particularly promising?
            if solution.moves not in best_space_by_moves or \
               best_space_by_moves[solution.moves] <= space left in solution.last_move.from_bin:
                # Promising, maybe best solution? Let's prioritize it!
                best_space_by_moves[solution.moves] = space left in solution.last_move.from_bin
                solution.priority = solution.move_priority = solution.moves
            add next_solution to queue
return None  # because no solution was found
So the idea is that we take the best looking current solution, consider just a few related solutions, and add them back to the queue. Generally with a higher priority. So if something fairly greedy works, we'll try that fairly quickly. In time we'll get to unpromising moves. If one of those surprises us on the upside, we'll set its priority to moves (thereby making us focus on it), and explore that path more intensely.
So what does next_solutions do? Something like this:
def next_solutions(solution):
    if solution.last_move is None:
        if future_move_bins is not empty:
            yield result of moving largest movable in future_move_bins[0]
                to first bin it can go into (ie enough space)
    else:
        if can do this from solution:
            yield result of moving largest movable...
                in future_move_bins[future_bins_idx]...
                to smallest bin it can go in...
                ...at least as big as last_move.to_bin
        if can move smaller object from same bin in prev_solution:
            yield that with priority solution.priority + 2
        if can move same object to later to_bin in prev_solution:
            yield that with priority solution.priority + 2
        if can move object from next future_bins_idx in prev_solution:
            yield result of moving that with priority solution.priority + 1
Note that trying to move small objects first, or moving objects to an emptier bin than needed, is possible but unlikely to be a good idea. So I penalized those more severely to have the priority queue focus on better ideas. This results in a branching factor of about 2.7.
So if an obvious greedy approach succeeds in fewer than 7 steps, the queue will likely get to size 1000 or so before you find it. And it is likely to find it even if a couple of suboptimal choices were required.
Even if a couple of unusual choices need to be made, you'll still get an answer quickly. You might not find the best, but you'll generally find pretty good ones.
Solutions of a dozen moves with a lot of data will require the queue to grow to around 100,000 items, and that should take on the order of 50-500 MB of memory. And that's probably where this approach maxes out.
This all may be faster (by a lot) if the bins are full enough that there aren't a lot of moves to make.


How do I find the largest cluster in this simple dataset?

I have data on users and their interests. Some users have more interests than others. Data looks like below.
How do I find the largest cluster of users with the most interests in common? Formally, I am trying to maximize (number of users in cluster * number of shared interests in cluster)
In the data below, the largest cluster is:
CORRECT ANSWER
Users: [1,2,3]
Interests: [2,3]
Cluster-value: 3 users x 2 shared interests = 6
DATA
User 1: {3,2}
User 2: {3,2,4}
User 3: {2,3,8}
User 4: {7}
User 5: {7}
User 6: {9}
How do I find the largest cluster of users with the most interests in common?
Here would be a hypothetical data generation process:
import random

# Generate 300 random (user, interest) tuples
def generate_data():
    data = []
    while len(data) < 300:
        data_pt = {"user": random.randint(1, 100), "interest": random.randint(1, 50)}
        if data_pt not in data:
            data.append(data_pt)
    return data

def largest_cluster(data):
    return None
UPDATE: As somebody pointed out, the data is too sparse. In the real case, there would be more users than interests. So I have updated the data-generating process.
This looks to me like the kind of combinatorial optimization problem which would fall into the NP-Hard complexity class, which would of course mean that it's intractable to find an exact solution for instances with more than ~30 users.
Dynamic Programming would be the tool you'd want to employ if you were to find a usable algorithm for a problem with an exponential search space like this (here the solution space is all 2^n subsets of users), but I don't see DP helping us here because of the lack of overlapping sub-problems. That is, for DP to help, we have to be able to use and combine solutions to smaller sub-problems into an overall solution in polynomial time, and I don't see how we can do that for this problem.
Imagine you have a solution for a size=k problem, using a limited subset of the users {u1, u2,...uk}, and you want to use that solution to find the new solution when you add another user u(k+1). The issue is that the solution set in the incrementally larger instance might not overlap at all with the previous solution (it may be an entirely different group of users/interests), so we can't effectively combine solutions to subproblems to get the overall solution. And if, instead of using only the single optimal solution for the size-k problem to reason about the size k+1 problem, you stored all possible user combinations from the smaller instance along with their scores, you could of course quite easily do set intersections across these groups' interests with the new user's interests to find the new optimal solution. The problem with this approach is that the information you have to store would double with each iteration, yielding an exponential-time algorithm no better than the brute-force solution. You run into similar problems if you try to base your DP off incrementally adding interests rather than users.
So if you know you only have a few users, you can use the brute-force approach: generate all user combinations, take a set intersection of each combination's interests, score it, and save the max score. The best way to approach larger instances would probably be with approximate solutions through search algorithms (unless there is a DP solution I don't see). You could iteratively add/subtract/swap users to improve the score and climb towards an optimum, or use a branch-and-bound algorithm which systematically explores all user combinations but stops exploring any user-subset branches with a null interest intersection (as adding additional users to that subset will still produce a null intersection). You might have a lot of user groups with null interest intersections, so this latter approach could be quite quick practically speaking by pruning off large parts of the search space, and if you ran it without a depth limit it would find the exact solution eventually.
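For small instances, the brute-force approach just described fits in a few lines (the function and variable names here are mine):

```python
from itertools import combinations

def largest_cluster_bruteforce(user_interests):
    """Try every subset of users; score = |users| * |shared interests|."""
    users = list(user_interests)
    best = ((), set(), 0)  # (cluster, shared interests, score)
    for size in range(1, len(users) + 1):
        for group in combinations(users, size):
            shared = set.intersection(*(user_interests[u] for u in group))
            score = len(group) * len(shared)
            if score > best[2]:
                best = (group, shared, score)
    return best

# The question's example data.
data = {1: {3, 2}, 2: {3, 2, 4}, 3: {2, 3, 8}, 4: {7}, 5: {7}, 6: {9}}
print(largest_cluster_bruteforce(data))  # → ((1, 2, 3), {2, 3}, 6)
```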
Branch-and-bound would work something like this:
def getLargestCluster((user, interest)[]):
    userInterestDict := { user -> {set of user's interests} }  # build a dict
    # generate and score user clusters
    users := userInterestDict.keys()  # save list of users to iterate over
    bestCluster, bestInterests, bestScore := {}, {}, 0
    generateClusterScores()
    return [bestCluster, bestInterests, bestScore]

# (define locally in getLargestCluster, or pass the needed values)
def generateClusterScores(i = 0, userCluster = {}, clusterInterests = {}):
    curScore := userCluster.size * clusterInterests.size
    if curScore > bestScore:
        bestScore, bestCluster, bestInterests := curScore, userCluster, clusterInterests
    if i = users.length: return
    curUser := users[i]
    curInterests := userInterestDict[curUser]
    newClusterInterests := userCluster.size = 0 ? curInterests : setIntersection(clusterInterests, curInterests)
    # generate the remaining subsets with and without curUser (copy userCluster if passed by reference)
    generateClusterScores(i + 1, userCluster, clusterInterests)
    if !newClusterInterests.isEmpty():  # bound the search here
        generateClusterScores(i + 1, userCluster.add(curUser), newClusterInterests)
You might be able to do a more sophisticated bound (for example, if you can calculate that the current cluster score couldn't eclipse your current best score even if all the remaining users were added to the cluster and the interest intersection stayed the same), but checking for an empty interest intersection is simple enough. This works fine for 100 users and 50 interests, up to around 800 data points. You could also make it more efficient by iterating over the smaller of |interests| and |users| (to generate fewer recursive calls/combinations) and mirroring the logic for the case where there are fewer interests. Also, you get more interesting clusters with fewer users/interests.
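Here is a runnable Python version of the same branch-and-bound, with the empty-intersection bound (the naming is mine; the pseudocode above is the reference):

```python
def largest_cluster(user_interests):
    """Branch-and-bound over user subsets, pruning any branch whose
    interest intersection is already empty."""
    users = list(user_interests)
    best = {"cluster": (), "interests": set(), "score": 0}

    def recurse(i, cluster, shared):
        score = len(cluster) * len(shared)
        if score > best["score"]:
            best.update(cluster=cluster, interests=shared, score=score)
        if i == len(users):
            return
        user = users[i]
        new_shared = user_interests[user] if not cluster else shared & user_interests[user]
        recurse(i + 1, cluster, shared)           # subsets without users[i]
        if new_shared:                            # bound: prune empty intersections
            recurse(i + 1, cluster + (user,), new_shared)

    recurse(0, (), set())
    return best

data = {1: {3, 2}, 2: {3, 2, 4}, 3: {2, 3, 8}, 4: {7}, 5: {7}, 6: {9}}
result = largest_cluster(data)
print(result)  # cluster (1, 2, 3), interests {2, 3}, score 6
```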

Jira's Lexorank algorithm for new stories

I am looking to create a large list of items that allows for easy insertion of new items and for easily changing the position of items within that list. When updating the position of an item, I want to change as few fields as possible regarding the order of items.
After some research, I found that Jira's Lexorank algorithm fulfills all of these needs. Each story in Jira has a 'rank-field' containing a string which is built up of 3 parts: <bucket>|<rank>:<sub-rank>. (I don't know whether these parts have actual names, this is what I will call them for ease of reference)
Examples of valid rank-fields:
0|vmis7l:hl4
0|i000w8:
0|003fhy:zzzzzzzzzzzw68bj
When dragging a card above 0|vmis7l:hl4, the new card will receive rank 0|vmis7l:hl2, which means that only the rank-field for this new card needs to be updated while the entire list can always be sorted on this rank-field. This is rather clever, and I can't imagine that Lexorank is the only algorithm to use this.
Is there a name for this method of sorting used in the sub-rank?
My question is related to the creation of new cards in Jira. Each new card starts with an empty sub-rank, and the rank is always chosen such that the new card is located at the bottom of the list. I've created a bunch of new stories just to see how the rank would change, and it seems that the rank is always incremented by 8 (in base-36).
Does anyone know more specifically how the rank for new cards is generated? Why is it incremented by 8?
I can only imagine that after some time (270 million cards) there are no more ranks to generate, and the system needs to recalculate the rank-field of all cards to make room for additional ranks.
Are there other triggers that require recalculation of all rank-fields?
I suppose the bucket plays a role in this recalculation. I would like to know how?
We are talking about a special kind of indexing here. This is not sorting; it is just preparing items to end up in a certain order in case someone happens to sort them (by whatever sorting algorithm). I know that variants of this kind of indexing have been used in libraries for decades, maybe centuries, to ensure that books belonging together but lacking a common title end up next to each other in the shelves, but I have never heard of a name for it.
The 8 is probably chosen wisely as a compromise, maybe even by analyzing typical use cases. Consider this: If you choose a small increment, e. g. 1, then all tickets will have ranks like [a, b, c, …]. This will be great if you create a lot of tickets (up to 26) in the correct order because then your rank fields keep small (one letter). But as soon as you move a ticket between two other tickets, you will have to add a letter: [a, b] plus a new ticket between them: [a, an, b]. If you expect to have this a lot, you better leave gaps between the ranks: [a, i, q, …], then an additional ticket can get a single letter as well: [a, e, i, q, …]. But of course if you now create lots of tickets in the correct order right in the beginning, you quickly run out of letters: [a, i, q, y, z, za, zi, zq, …]. The 8 probably is a good value which allows for enough gaps between the tickets without increasing the need for many letters too soon. Keep in mind that other scenarios (maybe not Jira tickets which are created manually) might make other values more reasonable.
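The gap-leaving idea generalizes to "find the midpoint between two ranks, appending a digit only when the gap closes". A base-36 sketch in Python (my own illustration of the general technique, not Jira's actual code; it assumes a < b and that b does not end in '0'):

```python
import string

DIGITS = string.digits + string.ascii_lowercase  # base-36 alphabet, '0'..'z'

def midpoint(a, b):
    """Return a string that sorts strictly between a and b,
    appending digits when the gap between them closes."""
    result, i = "", 0
    while True:
        da = DIGITS.index(a[i]) if i < len(a) else 0
        db = DIGITS.index(b[i]) if i < len(b) else len(DIGITS)
        if db - da > 1:                      # room for a digit in between
            return result + DIGITS[(da + db) // 2]
        result += DIGITS[da]                 # no room: copy this digit, go deeper
        i += 1

print(midpoint("a", "i"))  # → e  (plenty of room, stays one letter)
print(midpoint("a", "b"))  # → ai (adjacent ranks force a second letter)
```

This is why leaving gaps matters: while gaps remain, inserted ranks stay short, and only adjacent ranks force the string to grow.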
You are right, the rank fields get recalculated now and then, Lexorank calls this "balancing". Basically, balancing takes place in one of three occasions: ① The ranks are exhausted (largest value reached), ② the ranks are due to user-reranking of tickets too close together ([a, b, i] and something is supposed to go in between a and b), and ③ a balancing is triggered manually in the management page. (Actually, according to the presentation, Lexorank allows for up to three letter ranks, so "too close together" can be something like aaa and aab but the idea is the same.)
The <bucket> part of the rank is increased during balancing, so a messy [0|a, 0|an, 0|b] can become a nice and clean [1|a, 1|i, 1|q] again. The brownbag presentation about Lexorank (as linked by @dandoen in the comments) mentions a round-robin use of <bucket>: instead of a constantly growing increment (0→1→2→3→…), the bucket is incremented modulo 3, so it turns back to 0 after the 2 (0→1→2→0→…). When comparing the ranks, the sorting algorithm can consider a 0 "greater" than a 2 (it will not be purely lexicographical then, admitted). If the balancing algorithm then works backwards (reordering the last ticket first), this keeps the sorting order intact the whole time. (This is just a side aspect, that's why I keep the explanation short, but if this is interesting, ask, and I will elaborate.)
Sidenote: Lexorank also keeps track of minimum and maximum values of the ranks. For the functioning of the algorithm itself, this is not necessary.

Algorithm for optimizing the order of actions with cooldowns

I can choose from a list of "actions" to perform one once a second. Each action on the list has a numerical value representing how much it's worth, and also a value representing its "cooldown" -- the number of seconds I have to wait before using that action again. The list might look something like this:
Action A has a value of 1 and a cooldown of 2 seconds
Action B has a value of 1.5 and a cooldown of 3 seconds
Action C has a value of 2 and a cooldown of 5 seconds
Action D has a value of 3 and a cooldown of 10 seconds
So in this situation, the order ABA would have a total value of (1 + 1.5 + 1) = 3.5, and it would be acceptable because the first use of A happens at 1 second and the final use of A happens at 3 seconds, and the difference between those two is greater than or equal to the cooldown of A, 2 seconds. The order AAB would not work because you'd be doing A only a second apart, less than the cooldown.
My problem is trying to optimize the order in which the actions are used, maximizing the total value over a certain number of actions. Obviously the optimal order if you're only using one action would be to do Action D, resulting in a total value of 3. The maximum value from two actions would come from doing CD or DC, resulting in a total value of 5. It gets more complicated when you do 10 or 20 or 100 total actions. I can't find a way to optimize the order of actions without brute forcing it, which has complexity exponential in the total number of actions you want to optimize for. That becomes impossible past about 15 total.
So, is there any way to find the optimal time with less complexity? Has this problem ever been researched? I imagine there could be some kind of weighted-graph type algorithm that works on this, but I have no idea how it would work, let alone how to implement it.
Sorry if this is confusing -- it's kind of weird conceptually and I couldn't find a better way to frame it.
EDIT: Here is a proper solution using a highly modified Dijkstra's Algorithm:
Dijkstra's algorithm is used to find the shortest path, given a map (of a Graph Abstract), which is a series of Nodes(usually locations, but for this example let's say they are Actions), which are inter-connected by arcs(in this case, instead of distance, each arc will have a 'value')
Here is the structure in essence.
Graph{ // in most implementations these are not Arrays, but Maps. Honestly, for your needs
       // you don't need a graph, just nodes and arcs... this is just used to keep track of them.
    node[] nodes;
    arc[] arcs;
}
Node{ // this represents an action
    arc[] options; // for this implementation, this will always be a list of all possible Actions to use.
    float value;   // Action value
}
Arc{
    node start; // the last action used
    node end;   // the action after that
    dist=1;     // 1 second
}
We can use this datatype to make a map of all of the viable options to take to get the optimal solution, based on looking at the end-total of each path. Therefore, the more seconds ahead you look for a pattern, the more likely you are to find a very-optimal path.
Every segment of a road on the map has a distance, which represents its value, and every stop on the road is a one-second mark, since that is the time it takes to make the decision of where to go (which action to execute) next.
For simplicity's sake, let's say that A and B are the only viable options.
na means no action, because no actions are available.
If you are travelling for 4 seconds(the higher the amount, the better the results) your choices are...
A->na->A->na->A
B->na->na->B->na
A->B->A->na->B
B->A->na->B->A
...
there are more too, but I already know that the optimal path is B->A->na->B->A, because its value is the highest. So the established best pattern for handling this combination of actions (at least after analyzing it for 4 seconds) is B->A->na->B->A.
This will actually be quite an easy recursive algorithm.
/*
cur is the current action that you are at; it is a Node. In this example, every other action
is seen as a viable option, so it's as if every 'place' on the map has a path going to every
other place.
numLeft is the amount of seconds left to run the simulation. The higher the initial value,
the more desirable the results.
This won't work as written, but will give you a good idea of how the algorithm works.
*/
function getOptimal(cur,numLeft,path){
    if(numLeft==0){
        var emptyNode; // let's say, an empty node with a value of 0.
        return emptyNode;
    }
    var best=path;
    path.add(cur);
    for(var i=0;i<cur.options.length;i++){
        var opt=cur.options[i]; // this is a COPY
        if(opt.timeCooled<opt.cooldown){
            continue;
        }
        for(var i2=0;i2<opt.length;i2++){
            opt[i2].timeCooled+=1; // everything below this in the loop is as if it is one second ahead
        }
        var potential=getOptimal(opt[i],numLeft-1,best);
        if(getTotal(potential)>getTotal(cur)){best.add(potential);} // if it makes it better, use it! getTotal will sum up the values of an array of nodes (actions)
    }
    return best;
}
function getOptimalExample(){
    log(getOptimal(someNode,4,someEmptyArrayOfNodes)); // someNode will be A or B
}
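Since the code above admittedly won't run as written, here is a small exact search over the same idea in Python, memoizing on (seconds left, remaining cooldowns). The action data is taken from the question; the rest of the naming is mine:

```python
from functools import lru_cache

# (value, cooldown in seconds) for each action, from the question.
ACTIONS = {"A": (1.0, 2), "B": (1.5, 3), "C": (2.0, 5), "D": (3.0, 10)}
NAMES = sorted(ACTIONS)

def best_total(steps):
    """Max total value achievable in `steps` one-second slots."""
    @lru_cache(maxsize=None)
    def go(left, cooldowns):
        if left == 0:
            return 0.0
        ticked = tuple(max(0, c - 1) for c in cooldowns)
        best = go(left - 1, ticked)               # option: do nothing this second
        for i, name in enumerate(NAMES):
            if cooldowns[i] == 0:                 # action is off cooldown
                value, cd = ACTIONS[name]
                after = list(ticked)
                after[i] = cd - 1                 # usable again cd seconds later
                best = max(best, value + go(left - 1, tuple(after)))
        return best
    return go(steps, (0,) * len(NAMES))

print(best_total(1), best_total(2))  # → 3.0 5.0 (D alone; then C and D)
```

The memoized state space still grows with the product of the cooldowns, so this is only exact-and-fast for small instances, consistent with the exponential blow-up the question observed for brute force.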
End edit.
I'm a bit confused on the question but...
If you have a limited amount of actions, and that's it, then always pick the action with the most value, unless the cooldown hasn't been met yet.
Sounds like you want something like this (in pseudocode):
function getOptimal(){
    var a=[A,B,C,D]; // A, B, C, and D are actions
    a.sort(); // (just pseudocode. Sort the array items by how much value they have.)
    var theBest=null;
    for(var i=0;i<a.length;++i){ // find which action is the most valuable
        if(a[i].timeSinceLastUsed>=a[i].cooldown){ // its cooldown has been met
            theBest=a[i];
            for(...){ // now just loop through, and add time to each OTHER Action's timeSinceLastUsed...
                //...
            } // That way, some previously used, but more valuable actions will be freed up again.
            break;
        } // because a is worth the most, and you can use it now, so why not?
    }
}
EDIT: After rereading your problem a bit more, I see that the weighted scheduling algorithm would need to be tweaked to fit your problem statement; in our case we only want to remove those overlapping actions from the set that match the class of the action we selected, plus those that start at the same point in time. I.e. if we select a1, we want to remove a2 and b1 from the set, but not b2.
This looks very similar to the weighted interval scheduling problem which is discussed in depth in this pdf. In essence, the weights are your actions' values and the intervals are (starttime, starttime + cooldown). The dynamic programming solution can be memoized, which makes it run in O(n log n) time. The only difficult part will be modifying your problem so that it looks like the weighted interval problem, which then lets us utilize the predetermined solution.
Because your intervals don't have set start and end times (i.e. you can choose when to start a certain action), I'd suggest enumerating all possible start times for all given actions, assuming some set time range, then using these static start/end times with the dynamic programming solution. Assuming you can only start an action on a full second, you could run action A for intervals (0-2, 1-3, 2-4, ...), action B for (0-3, 1-4, 2-5, ...), action C for intervals (0-5, 1-6, 2-7, ...), etc. You can then union the actions' interval sets to get a problem space that looks like the original weighted interval problem:
|---1---2---3---4---5---6---7---| time
|{--a1--}-----------------------| v=1
|---{--a2---}-------------------| v=1
|-------{--a3---}---------------| v=1
|{----b1----}-------------------| v=1.5
|---{----b2-----}---------------| v=1.5
|-------{----b3-----}-----------| v=1.5
|{--------c1--------}-----------| v=2
|---{--------c2---------}-------| v=2
|-------{-------c3----------}---| v=2
etc...
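For reference, the classic weighted-interval DP looks like this in Python (the textbook algorithm; the same-class/same-start tweaks described in the edit above would still have to be layered on for this problem):

```python
import bisect

def max_weight_schedule(intervals):
    """intervals = [(start, end, weight)]; returns the max total weight
    of pairwise non-overlapping intervals. O(n log n) after sorting."""
    intervals = sorted(intervals, key=lambda iv: iv[1])  # sort by end time
    ends = [iv[1] for iv in intervals]
    best = [0.0] * (len(intervals) + 1)  # best[i] = optimum over first i intervals
    for i, (start, end, weight) in enumerate(intervals, 1):
        p = bisect.bisect_right(ends, start)  # intervals ending by this one's start
        best[i] = max(best[i - 1], best[p] + weight)
    return best[-1]

# Two copies of action A (0-2, 2-4) are compatible; b1 (0-3) conflicts with both.
print(max_weight_schedule([(0, 2, 1.0), (2, 4, 1.0), (0, 3, 1.5)]))  # → 2.0
```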
Always choose the available action worth the most points.

Need a Ruby way to determine the elements of a matrix "touching" another element

I think I need a method called “Touching” (as in contiguous, not emotional.)
I need to identify those elements of a matrix that are next to an individual element or set of elements. At least that’s the way I’ve thought of to solve the problem at hand.
The matrix State in the program below represents, let's say, some underwater topography. As I lower the water, eventually the highest point will stick out and become an "island". When the "water level" is at 34, the element State[2,3] is the single point of the island. The array atlantis holds the coordinates of that single point.
As we lower the water level further, additional points will be “above water.” Additional contiguous points will become part of the island and their coordinates would be added to the array atlantis. (For example, the next piece of land to be part of atlantis would be State[3,4] at 31.)
My thought about how to do this is to identify all the matrix elements that touch/are next to the elements in atlantis, find the one with the highest elevation and then add it to the array atlantis. Looking for the elements next to a single element is a challenge in itself, but we could write some code to examine the set [i,j-1], [i,j+1], [i-1,j-1], [i-1,j], [i-1,j+1], [i+1,j-1], [i+1,j], [i+1,j+1]. (I think I got that right.)
But as we add additional points, the task of determining which points surround the points in atlantis becomes increasingly difficult. So that’s my question: can anyone think of any mechanism by which to do this? Any kind of simplified algorithm using capabilities of ruby of which I am unaware? (which include all but the most basic.) If such a method could be written then I could write atlantis.touching and get an array, for example, containing all the coordinates of all the points presently contiguous to atlantis.
At least that’s how I’m thinking this could be done. Any other ideas would be welcome. And if anyone knows any kind of partnering site where I could seek others who might be interested in working with me on this, that would be great.
# create State database using matrix
require 'matrix'
State = Matrix[ [3,1,4,4,6,2,8,12,8,2],
                [6,2,4,13,25,21,11,22,9,3],
                [6,20,27,34,22,14,12,11,2,5],
                [6,28,17,23,31,18,11,9,18,12],
                [9,18,11,13,8,9,10,14,24,11],
                [3,9,7,16,9,12,28,24,29,21],
                [5,8,4,7,17,14,19,30,33,4],
                [7,17,23,9,5,9,22,21,12,21],
                [7,14,25,22,16,10,19,15,12,11],
                [5,16,7,3,6,3,9,8,1,5] ]

# find State elements contiguous to island
atlantis = [[2,3]]
# find all State[i,j] "touching" atlantis
Only checking the points around the currently exposed area doesn't sound like it could cover every case - what if the next point to be exposed was the beginning of a new island?
I'd go about it like this: have another array, let's call it sorted, which contains your points sorted by height. Every time you lower the water level, pop all the elements higher than the new water level off sorted and onto atlantis.
In fact, there's no need for separate sorted and atlantis arrays if you do it this way. Just store the index of the highest point not above water, and you've essentially got two arrays in one - everything above water on one side, and everything below water on the other.
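This answer's idea can be sketched briefly (in Python rather than Ruby, purely for illustration): sort every coordinate by height once, and then any water level splits that sorted list into an above-water part and a below-water part.

```python
# The question's State matrix, as a list of rows.
state = [
    [3, 1, 4, 4, 6, 2, 8, 12, 8, 2],
    [6, 2, 4, 13, 25, 21, 11, 22, 9, 3],
    [6, 20, 27, 34, 22, 14, 12, 11, 2, 5],
    [6, 28, 17, 23, 31, 18, 11, 9, 18, 12],
    [9, 18, 11, 13, 8, 9, 10, 14, 24, 11],
    [3, 9, 7, 16, 9, 12, 28, 24, 29, 21],
    [5, 8, 4, 7, 17, 14, 19, 30, 33, 4],
    [7, 17, 23, 9, 5, 9, 22, 21, 12, 21],
    [7, 14, 25, 22, 16, 10, 19, 15, 12, 11],
    [5, 16, 7, 3, 6, 3, 9, 8, 1, 5],
]

# Sort all (height, coordinate) pairs once, tallest first.
points = sorted(
    ((h, (i, j)) for i, row in enumerate(state) for j, h in enumerate(row)),
    reverse=True,
)

def above_water(level):
    """Coordinates of every point strictly higher than the water level."""
    return [coord for h, coord in points if h > level]

print(above_water(33))  # → [(2, 3)]  - the single island point at height 34
print(above_water(30))  # → [(2, 3), (6, 8), (3, 4)]
```

Note that at level 30, (6, 8) at height 33 surfaces as a separate island before (3, 4) joins the first one, which is exactly the new-island case that neighbor-checking alone would miss.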
Hope that helps!

Coming up with factors for a weighted algorithm?

I'm trying to come up with a weighted algorithm for an application. In the application, there is a limited amount of space available for different elements. Once all the space is occupied, the algorithm should choose the best element(s) to remove in order to make space for new elements.
There are different attributes which should affect this decision. For example:
T: Time since last accessed. (It's best to replace something that hasn't been accessed in a while.)
N: Number of times accessed. (It's best to replace something which hasn't been accessed many times.)
R: Number of elements which need to be removed in order to make space for the new element. (It's best to replace the least amount of elements. Ideally this should also take into consideration the T and N attributes of each element being replaced.)
I have 2 problems:
Figuring out how much weight to give each of these attributes.
Figuring out how to calculate the weight for an element.
(1) I realize that coming up with the weight for something like this is very subjective, but I was hoping that there's a standard method or something that can help me decide how much weight to give each attribute. For example, one method might be to come up with pairs of sample elements, manually compare each pair, and decide which one should ultimately be chosen. Here's an example:
Element A: N = 5, T = 2 hours ago.
Element B: N = 4, T = 10 minutes ago.
In this example, I would probably want A to be the element chosen for replacement: although it was accessed one more time, it hasn't been accessed in a long time compared with B. This method seems like it would take a lot of time and involve many tough, subjective decisions. Additionally, it may not be trivial to derive the resulting weights at the end.
Another method I came up with was to just arbitrarily choose weights for the different attributes and then use the application for a while. If I notice anything obviously wrong with the algorithm, I could then go in and slightly modify the weights. This is basically a "guess and check" method.
Both of these methods don't seem that great and I'm hoping there's a better solution.
(2) Once I do figure out the weights, I'm not sure of the best way to combine them. Should I just add everything? (In these examples, I'm assuming that whichever element has the highest replacementWeight should be the one that's going to be replaced.)
replacementWeight = .4*T - .1*N - 2*R
or multiply everything?
replacementWeight = (T) * (.5*N) * (.1*R)
What about not using constant weights at all? For example, sure "Time" (T) may be important, but once a certain amount of time has passed, further differences stop mattering much; essentially everything falls into an "a lot of time has passed" bin. (E.g. although 8 hours and 7 hours differ by an hour, that difference is less significant than the one between 1 minute and 5 minutes, since those are much more recent.) (Or another example: replacing (R) 1 or 2 elements is fine, but needing to replace 5 or 6 should be heavily penalized... therefore it shouldn't be linear.)
replacementWeight = 1/T + sqrt(N) - R*R
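For what it's worth, a saturating transform such as a logarithm matches the "a lot of time has passed" intuition: it keeps growing with T, so older items still score higher, but the differences flatten out. A sketch along those lines (every coefficient and function choice here is an invented placeholder, not a tuned value):

```ruby
# Hypothetical non-linear scoring function; higher score = better candidate
# for replacement. Math.log saturates for large T, and r**2 penalizes
# multi-element evictions super-linearly, as suggested above.
def replacement_weight(t_minutes, n, r)
  Math.log(1 + t_minutes) - Math.sqrt(n) - r**2
end

a = replacement_weight(120, 5, 1)  # element A: accessed 2 hours ago, 5 times
b = replacement_weight(10, 4, 1)   # element B: accessed 10 minutes ago, 4 times
a > b  # A scores higher, so A is evicted first, as the example above wants
```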
Obviously (1) and (2) are closely related, which is why I'm hoping that there's a better way to come up with this sort of algorithm.
What you are describing is the classic problem of choosing a cache replacement policy. Which policy is best for you, depends on your data, but the following usually works well:
First, always store a new object in the cache, evicting the R worst one(s). There is no way to know a priori if an object should be stored or not. If the object is not useful, it will fall out of the cache again soon.
The popular squid cache implements the following cache replacement algorithms:
Least Recently Used (LRU):
replacementKey = -T
Least Frequently Used with Dynamic Aging (LFUDA):
replacementKey = N + C
Greedy-Dual-Size-Frequency (GDSF):
replacementKey = (N/R) + C
C refers to a cache age factor here. C is basically the replacementKey of the item that was evicted last (or zero).
NOTE: The replacementKey is calculated when an object is inserted or accessed, and stored alongside the object. The object with the smallest replacementKey is evicted.
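To make the three policies concrete, here is a small Ruby sketch (the entry fields t, n, r and the hash layout are my own illustration, not squid's actual data structures):

```ruby
# replacementKey formulas from above; e is a cache entry, c the cache age.
lru   = ->(e, c) { -e[:t] }                 # T: seconds since last access
lfuda = ->(e, c) { e[:n] + c }              # N: access count
gdsf  = ->(e, c) { e[:n].to_f / e[:r] + c } # R: size (slots freed by evicting)

# Evict the entry with the smallest key; that key becomes the new C.
cache = { a: { t: 7200, n: 5, r: 1 },
          b: { t: 600,  n: 4, r: 1 } }
c = 0

victim_lru   = cache.min_by { |_, e| lru.call(e, c) }.first    # :a (idle longest)
victim_lfuda = cache.min_by { |_, e| lfuda.call(e, c) }.first  # :b (fewest hits)
```

Note how the two policies disagree on this example, which is exactly the A-versus-B tension in the question.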
LRU is simple and often good enough. The bigger your cache, the better it performs.
LFUDA and GDSF both are tradeoffs. LFUDA prefers to keep large objects even if they are less popular, under the assumption that one hit to a large object makes up for many hits to smaller objects. GDSF basically makes the opposite tradeoff, keeping many smaller objects over fewer large objects. From what you write, the latter might be a good fit.
If none of these meet your needs, you can calculate optimal weights for T, N and R (and compare different formulas for combining them) by minimizing regret, the difference in performance between your formula and the optimal algorithm, using, for example, linear regression.
This is a completely subjective issue -- as you yourself point out. And a distinct possibility is that if your test cases consist of pairs (A, B) where you prefer A to B, you might find that you prefer A to B, B to C, but also C to A -- i.e. it's not an ordering.
If you are not careful, your function might not exist!
If you can define a scalar function of your input variables, with various parameters for coefficients and exponents, you might be able to estimate said parameters by using regression, but you will need an awful lot of data if you have many parameters.
This is the classical statistician's approach of first reviewing the data to IDENTIFY a model, and then using that model to ESTIMATE a particular realisation of the model. There are large books on this subject.
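As a toy illustration of the regression idea (all the numbers below are invented, and with only as many observations as parameters this is an exact solve; real use would need many more labeled examples):

```ruby
require 'matrix'

# Fit w in score = w1*T + w2*N + w3*R to some hand-labeled target scores y.
x = Matrix[[2.0,  5, 1],   # T in hours, N, R for three labeled examples
           [0.17, 4, 1],
           [8.0,  1, 3]]
y = Vector[1.0, 0.2, 2.5]

# Ordinary least squares via the normal equations: w = (X'X)^-1 X'y.
w = (x.transpose * x).inverse * (x.transpose * y)
```

With more observations than parameters the same formula gives the least-squares fit rather than an exact solution, which is the usual situation.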

Resources