So I have a large list of items, each of which has an ID assigned. Now I need to pick N items from the list such that the number of items from each ID follows a given ratio.
Let's say:
There are 3 IDs, and their weights are in the ratio 1:3:2,
so if N = 6,
I'll pick 1 item of ID 1, 3 of ID 2, and 2 of ID 3.
However, in some cases there might not be enough items of a particular ID; the shortfall then has to be redistributed among the other IDs. The total number of items picked has to be N.
One possible solution I thought of was to convert this to a weighted sampling problem. However, I believe that converting the weights of the IDs into weights on each individual item would add a lot of complexity.
Conceptually it is not that difficult, although you will have to handle a few edge cases.
Compute the actual quantity of items needed for each id based on the ratio between the sum of your input ratios and the total requested quantity, N. You may have to take care of rounding issues, so one quantity (perhaps the largest one) may need to be adjusted.
Scan over your list, and for each id create a list of "selected" items which will go in the final result, and a list of "available" items which may be used later, in case some ids don't reach the requested quantity.
One possibility is that at some point all IDs will have reached their requested quantity, in which case we don't even need to loop over the full input list.
If this is not the case, then compute how many items are still needed, recalculate the new ratios for the ids which have available items and use those items to reach the total requested quantity; repeat until N is reached.
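The steps above can be sketched roughly as follows; the (id, value) item representation and the largest-remainder rounding are my own assumptions, not something fixed by the question:

```python
from collections import defaultdict

def pick_by_ratio(items, weights, n):
    """Pick n items from (id, value) pairs so the per-id counts follow
    `weights` (a dict id -> ratio weight), redistributing the quota of
    any id that runs out of items. May return fewer than n items if the
    whole input holds fewer than n."""
    by_id = defaultdict(list)
    for item in items:
        by_id[item[0]].append(item)

    selected = []
    active = dict(weights)          # ids that still have items available
    needed = n
    while needed > 0 and active:
        total_w = sum(active.values())
        exact = {i: needed * w / total_w for i, w in active.items()}
        quotas = {i: int(q) for i, q in exact.items()}
        # Largest-remainder rounding so the quotas sum exactly to `needed`.
        shortfall = needed - sum(quotas.values())
        by_frac = sorted(exact, key=lambda i: exact[i] - quotas[i], reverse=True)
        for i in by_frac[:shortfall]:
            quotas[i] += 1

        for i, q in list(quotas.items()):
            take = min(q, len(by_id[i]))
            selected.extend(by_id[i][:take])
            by_id[i] = by_id[i][take:]
            needed -= take
            if not by_id[i]:        # exhausted: drop it from the ratio
                active.pop(i, None)
    return selected
```

With the 1:3:2 example and N = 6, an input with plenty of items of every ID yields counts 1/3/2; if ID 2 has only one item, its leftover quota is redistributed between IDs 1 and 3 in the 1:2 ratio.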
In the context of polls of voting results, I want to generate random numbers with specific properties to sample the possibilities within the margin of error of the poll.
For example, suppose I have polling results:
Party A: 34%
Party B: 25%
Party C: 14%
Party D: 27%
With a margin of error of 3.2% on the poll-results.
I want to generate batches of 4 (because in this case there are 4 parties) random numbers. Obviously because voting is a zero-sum game, the numbers in the batch need to sum up to zero.
I want each element of the batch to be smaller (in absolute value) than the given margin of error of the poll-results.
An example could be: (-0.5, +1.2, +0.1, -0.8). All the elements sum up to zero and each element is in absolute value smaller than 3.2, i.e. the given margin of error of the poll-results.
When generating a large number of such batches of random numbers, I would like them to have some specific statistical properties:
The maximum of the absolute values of the elements should be uniformly distributed over (0, error_of_margin) (this is the easy part).
The mean of the absolute values should also be uniformly distributed over (0, error_of_margin).
I tried two different approaches. I will link a gist, to not fill the question with code. https://gist.github.com/thomvil/02890ea2873eed6bb155e7dc387c9564
Does anybody have some suggestions on how I could tackle this?
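For reference, the two hard constraints alone (zero sum, every element within the margin) can be met with plain rejection sampling; this sketch makes no attempt at the uniform max/mean distributions asked for above, it only illustrates the constraints:

```python
import random

def random_batch(n_parties, margin):
    """Return n_parties shifts (in percentage points) that sum to zero,
    each strictly smaller than `margin` in absolute value. Plain
    rejection sampling; it does NOT produce the uniform max/mean
    distributions the question asks for."""
    while True:
        shifts = [random.uniform(-margin, margin) for _ in range(n_parties - 1)]
        last = -sum(shifts)
        # Reject batches whose balancing term falls outside the margin.
        if abs(last) < margin:
            return shifts + [last]
```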
Here is the scenario
Given requested amount 789
I have 4 warehouses. Warehouse 1 has the highest priority, followed by warehouse 2, and so on.
All warehouses have their own min and max stock amounts to distribute.
Now I want to split the requested amount to fit the available ranges.
*It is not necessary to use all the warehouses, but the highest-priority ones have to be used first.
*Each warehouse can only be used once.
Expecting answer as per below picture.
Is there any algorithm to get me to the answer? > [400,189,200]
Thanks a lot.
Check if the maximum of the 1st warehouse covers the requested amount. If not, check whether the maxima of warehouses 1 and 2 together cover it; if not, warehouses 1, 2 and 3... and so on until it is covered. Now you have x warehouses to use.
Now start from the bottom. Take the minimum possible from each warehouse, working up from the lowest priority: the minimum of the least-priority warehouse, plus the minimum of the 2nd least-priority... The sum of all the minimums may not cover the requested amount, but let's proceed with that amount.
Start adding from the highest-priority warehouse: fill it up to its maximum. Does it cover the requested amount now?
If yes, remove stock from that warehouse until the collected amount equals the requested amount.
If not, repeat steps 3 and 4 for the 2nd-highest-priority warehouse, then the 3rd, the 4th...
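A rough sketch of these steps, with hypothetical (min, max) ranges per warehouse since the original picture is missing; the ranges below were chosen so the example answer [400, 189, 200] comes out:

```python
def split_amount(requested, ranges):
    """Split `requested` across warehouses given as (min, max) pairs in
    priority order, following the steps above. Returns the per-warehouse
    amounts for the warehouses used, or None if it cannot be satisfied."""
    # Step 1: smallest prefix of warehouses whose maxima cover the request.
    total_max = 0
    for k, (_, hi) in enumerate(ranges, start=1):
        total_max += hi
        if total_max >= requested:
            break
    else:
        return None                      # even all warehouses cannot cover it

    used = ranges[:k]
    # Step 2: start every used warehouse at its minimum.
    alloc = [lo for lo, _ in used]
    if sum(alloc) > requested:
        return None                      # the minimums alone overshoot
    # Steps 3-4: top up from the highest-priority warehouse downwards.
    for i, (_, hi) in enumerate(used):
        shortfall = requested - sum(alloc)
        alloc[i] += min(shortfall, hi - alloc[i])
        if sum(alloc) == requested:
            return alloc
    return None
```

For example, `split_amount(789, [(100, 400), (100, 300), (200, 300), (50, 500)])` uses the first three warehouses and returns `[400, 189, 200]`.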
Out of 1000 locations, 250 are eligible for setting up an outlet. I want to select 5 of these 250 such that the sum of profits from the neighborhoods of the selected locations is maximized and the outlets are at least 5 miles apart. The willingness of people to travel from one location to another is given (which defines the neighborhood of each location).
I have tried integer programming but had problems defining the objective function. Is there any clustering/optimization technique that can solve this problem?
EDIT:
Given:
1000 locations and great circle distance between any two locations
Willingness of people to travel from one location to another for all 1000 locations
250 eligible locations
Objective:
To maximize the profit from 5 clusters where each cluster contains a selected location and all locations from where people are willing to travel to the selected location.
Constraints:
Total selected locations have to be 5 and have to be from 250 eligible locations
Selected locations have to be at least 5 miles apart
Every location can belong to only one cluster
A constrained full enumeration should do for this problem size:
Enumerate all feasible combinations of locations that do not violate the "at least 5 miles apart" constraint. A simple recursion, easily parallelized.
For each of the combinations from #1, calculate the total number of locations "covered" by the layout and pick the maximum: for each of the 1000 nodes, find which outlet (if any) people from that node will go to, and add 1 if some outlet is reachable. Easily parallelized, again.
Optional: out of the equally good layouts, prefer the ones with the best "load balancing".
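A minimal sketch of this enumeration on a toy instance; the `willing` mapping (node -> set of outlets its people would travel to) is a hypothetical representation of the given willingness data:

```python
def best_layout(eligible, dist, willing, k, min_sep=5.0):
    """Enumerate all k-subsets of `eligible` sites that are pairwise at
    least `min_sep` apart, returning (layout, covered) where `covered`
    is the number of nodes that can reach some chosen outlet.
    `dist[a][b]` is the distance between sites; `willing[node]` is the
    set of outlets people at `node` are willing to travel to."""
    best, best_cover = None, -1

    def recurse(chosen, candidates):
        nonlocal best, best_cover
        if len(chosen) == k:
            cover = sum(1 for sites in willing.values() if sites & set(chosen))
            if cover > best_cover:
                best, best_cover = list(chosen), cover
            return
        for idx, site in enumerate(candidates):
            # Prune: keep only later candidates still compatible with `site`.
            rest = [c for c in candidates[idx + 1:] if dist[site][c] >= min_sep]
            recurse(chosen + [site], rest)

    recurse([], list(eligible))
    return best, best_cover
```

The pruning inside the recursion is what makes this a *constrained* enumeration: infeasible branches are cut as soon as two close sites would co-occur.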
The problem you are working on is:
http://en.m.wikipedia.org/wiki/Quadratic_assignment_problem
Classic heuristics for approaching these are branch & bound, genetic algorithms, and simulated annealing. Full enumeration would be used to validate that the heuristic can approach the global optimum efficiently on small problem sizes.
If I understand your problem correctly, you want the following:
Let binary variable Li represent whether location i is selected (i = 1..250).
Let Si = {j : j != i and distance(Li, Lj) < 5 miles}, the set of locations too close to location i.
Let Ci be a constant representing the profit of the neighborhood of Li.
The constraints are:
Li + Lj <= 1 for every j in Si (no two selected locations closer than 5 miles)
Li = 1 or 0
Sum(Li for all i) == 5
The objective function is:
maximize Sum(Li*Ci for all i)
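For small instances, the same model can be checked by brute force; this sketch enumerates the k-subsets directly rather than calling an IP solver, with `values[i]` standing in for Ci:

```python
from itertools import combinations

def solve_by_enumeration(values, dist, k, min_sep=5.0):
    """Brute-force version of the integer program above: choose k
    locations maximizing the summed neighborhood values Ci, subject to
    every chosen pair being at least `min_sep` apart.
    `values[i]` is Ci; `dist[i][j]` is the distance between locations."""
    best, best_val = None, float('-inf')
    for subset in combinations(range(len(values)), k):
        # Feasibility: every selected pair must respect the separation.
        if all(dist[a][b] >= min_sep for a, b in combinations(subset, 2)):
            val = sum(values[i] for i in subset)
            if val > best_val:
                best, best_val = subset, val
    return best, best_val
```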
Here's the scenario.
I have one hundred car objects. Each car has a property for speed, and a property for price. I want to arrange images of the cars in a grid so that the fastest and most expensive car is at the top right, and the slowest and cheapest car is at the bottom left, and all other cars are in an appropriate spot in the grid.
What kind of sorting algorithm do I need to use for this, and do you have any tips?
EDIT: the results don't need to be exact - in reality I'm dealing with a much bigger grid, so it would be sufficient if the cars were clustered roughly in the right place.
Just an idea inspired by Mr Cantor:
calculate max(speed) and max(price)
normalize all speed and price data into range 0..1
for each car, calculate the "distance" to the possible maximum
based on a²+b²=c², the distance to the maximum could be something like
sqrt( (1 - speed(car[i])/maxspeed)^2 + (1 - price(car[i])/maxprice)^2 )
apply weighting as (visually) necessary
sort cars by distance
place "best" car in "best" square (upper right in your case)
walk the grid in zigzag and fill with next car in sorted list
Result (mirrored, top left is best):
1 - 2 6 - 7
/ / /
3 5 8
| /
4
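A sketch of the whole recipe: normalize, sort by distance to the ideal corner, then fill the grid diagonal by diagonal. The strict alternating zigzag in the diagram is simplified here to plain anti-diagonal order, and index [0][0] plays the "best" square (mirror as needed):

```python
import math

def arrange(cars, width, height):
    """Sort (speed, price) cars by normalized distance to the ideal
    corner (max speed, max price), then fill the grid diagonal by
    diagonal, best car first. grid[0][0] is the "best" square."""
    max_speed = max(s for s, _ in cars)
    max_price = max(p for _, p in cars)
    ordered = sorted(cars, key=lambda c: math.hypot(1 - c[0] / max_speed,
                                                    1 - c[1] / max_price))
    # Grid cells in anti-diagonal order: (0,0), then the x+y == 1
    # diagonal, then x+y == 2, and so on.
    cells = sorted(((x, y) for y in range(height) for x in range(width)),
                   key=lambda c: (c[0] + c[1], c[0]))
    grid = [[None] * width for _ in range(height)]
    for (x, y), car in zip(cells, ordered):
        grid[y][x] = car
    return grid
```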
Treat this as two problems:
1: Produce a sorted list
2: Place members of the sorted list into the grid
The sorting is just a matter of defining your rules more precisely. "Fastest and most expensive first" doesn't work: which comes first, my £100,000 Rolls-Royce with a top speed of 120, or my souped-up Mini, costing £50,000 with a top speed of 180?
Having got your list, how will you fill the grid? First and last are easy, but where does number two go? Along the top or down the side? And where next: along rows, along columns, zigzag? You've got to decide. After that, the coding should be easy.
I guess what you want is to have cars that have "similar" characteristics to be clustered nearby, and additionally that the cost in general increases rightwards, and speed in general increases upwards.
I would try the following approach. Suppose you have N cars and you want to put them in an X * Y grid. Assume N == X * Y.
Put all the N cars in the grid at random locations.
Define a metric that calculates the total misordering in the grid; for example, count the number of car pairs C1=(x,y) and C2=(x',y') such that C1.speed > C2.speed but y < y' plus car pairs C1=(x,y) and C2=(x',y') such that C1.price > C2.price but x < x'.
Run the following algorithm:
Calculate current misordering metric M
Enumerate all pairs of cars in the grid and calculate the misordering metric M' you would obtain if you swapped the two cars
Swap the pair of cars that reduces the metric most, if any such pair was found
If you swapped two cars, repeat from step 1
Finish
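A sketch of the metric and the greedy swap loop, assuming cars are (speed, price) tuples and adopting one fixed orientation convention (speed increasing with the row index, price with the column index; mirror to taste):

```python
def misordering(grid):
    """Count wrongly-ordered pairs: speed should increase with row
    index y and price with column index x (one fixed convention)."""
    cells = [(x, y, car) for y, row in enumerate(grid)
             for x, car in enumerate(row)]
    bad = 0
    for i, (x1, y1, (s1, p1)) in enumerate(cells):
        for x2, y2, (s2, p2) in cells[i + 1:]:
            if (s1 - s2) * (y1 - y2) < 0:   # faster car on the wrong row side
                bad += 1
            if (p1 - p2) * (x1 - x2) < 0:   # pricier car on the wrong column side
                bad += 1
    return bad

def local_search(grid):
    """Repeatedly apply the swap that reduces the metric the most,
    until no swap improves it (steps 1-5 above)."""
    w = len(grid[0])
    flat = [car for row in grid for car in row]
    to_grid = lambda f: [f[i:i + w] for i in range(0, len(f), w)]
    current = misordering(grid)
    while True:
        best_pair, best_m = None, current
        for i in range(len(flat)):
            for j in range(i + 1, len(flat)):
                flat[i], flat[j] = flat[j], flat[i]       # try the swap
                m = misordering(to_grid(flat))
                if m < best_m:
                    best_pair, best_m = (i, j), m
                flat[i], flat[j] = flat[j], flat[i]       # undo
        if best_pair is None:
            return to_grid(flat)
        i, j = best_pair
        flat[i], flat[j] = flat[j], flat[i]
        current = best_m
```

Note that a residual misordering of zero is usually unattainable, since cars where one value is higher and the other lower are inherently incomparable; the loop simply stops at a local minimum.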
This is a standard "local search" approach to an optimization problem. What you have here is basically a simple combinatorial optimization problem. Another approach to try might be a self-organizing map (SOM) preseeded with a gradient of speed and cost in the matrix.
Basically you take either speed or price as the primary key, group the cars that share each primary value, and sort each group in ascending/descending order of the other value, with the primary values themselves also taken in ascending/descending order as needed.
Example:
c1(20,1000) c2(30,5000) c3(20, 500) c4(10, 3000) c5(35, 1000)
Let's assume Car(speed, price) as the measure in the above list, with speed as the primary key.
1 Get the car with minimum speed
2 Then get all the cars with the same speed value
3 Arrange these values in ascending order of car price
4 Get the next car with the next minimum speed value and repeat the above process
c4(10, 3000)
c3(20, 500)
c1(20, 1000)
c2(30, 5000)
c5(35, 1000)
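In Python, for instance, the same primary/secondary ordering falls out of a compound sort key:

```python
# Car(speed, price) tuples from the example above.
cars = [(20, 1000), (30, 5000), (20, 500), (10, 3000), (35, 1000)]

# Sort by speed first, breaking ties by price, both ascending.
ordered = sorted(cars, key=lambda c: (c[0], c[1]))
```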
If you post what language you are using, it would be helpful, as some language constructs make this easier to implement. For example, LINQ makes your life very easy in this situation:
cars.OrderBy(x => x.Speed).ThenBy(p => p.Price);
Edit:
Now that you have the list: as for placing the car items into the grid, unless you know in advance that there will be a predetermined number of cars with these values, you can't do anything except go with some fixed grid size, as you are doing now.
One option would be a nonuniform grid, if you prefer, with each row holding car items of a specific speed, but this is only applicable when you know there will be a considerable number of cars sharing each speed value.
Each row would then show cars of the same speed in the grid.
Thanks
Is the 10x10 constraint necessary? If it is, you must have ten speeds and ten prices, or else the diagram won't make very much sense. For instance, what happens if the fastest car isn't the most expensive?
I would rather recommend you make the grid size equal to
(number of distinct speeds) x (number of distinct prices),
then it would be a (rather) simple case of ordering by two axes.
If the data originates in a database, then you should order them as you fetch them from the database. This should only mean adding ORDER BY speed, price near the end of your query, but before the LIMIT part (where 'speed' and 'price' are the names of the appropriate fields).
As others have said, sorting by "fastest and most expensive" together is difficult; you ought to just pick one to sort by first. However, it is possible to make an approximation using this algorithm:
Find the highest price and fastest speed.
Normalize all prices and speeds to, e.g., a fraction out of 1. You do this by dividing each price by the highest price you found in step 1, and likewise each speed by the highest speed.
Multiply the normalized price and speed together to create one "price & speed" number.
Sort by this number.
This ensures that if car A is faster and more expensive than car B, it is placed ahead of B in the list. Cars where one value is higher but the other is lower get roughly sorted. I'd recommend storing these values in the database and sorting as you select.
Putting them in a 10x10 grid is easy. Start outputting items, and when you get to a multiple of 10, start a new row.
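A sketch of steps 1-4, assuming cars are (speed, price) tuples:

```python
def rank_by_combined_score(cars):
    """Steps 1-4 above: normalize speed and price to fractions of their
    maxima, multiply them into one number, and sort by it (best first)."""
    max_speed = max(s for s, _ in cars)
    max_price = max(p for _, p in cars)
    return sorted(cars,
                  key=lambda c: (c[0] / max_speed) * (c[1] / max_price),
                  reverse=True)
```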
Another option is to apply a score 0 .. 200% to each car, and sort by that score.
Example:
score_i = speed_percent(min_speed, max_speed, speed_i) + price_percent(min_price, max_price, price_i)
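One possible reading of that formula, assuming speed_percent/price_percent mean the car's percentage position within the observed min-max range (an interpretation, not spelled out above):

```python
def score(car, min_speed, max_speed, min_price, max_price):
    """0..200% score: percentage position within the speed range plus
    percentage position within the price range."""
    speed, price = car
    speed_percent = 100 * (speed - min_speed) / (max_speed - min_speed)
    price_percent = 100 * (price - min_price) / (max_price - min_price)
    return speed_percent + price_percent
```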
Hmmm... a kind of bubble sort could be a simple algorithm here.
Make a random 10x10 array.
Find two neighbours (horizontal or vertical) that are in "wrong order", and exchange them.
Repeat (2) until no such neighbours can be found.
Two neighbour elements are in "wrong order" when:
a) they're horizontal neighbours and left one is slower than right one,
b) they're vertical neighbours and top one is cheaper than bottom one.
But I'm not actually sure this algorithm stops for every input, and I'm almost sure it is very slow :-). It should be easy to implement, though, and after some finite number of iterations the partial result might be good enough for your purposes. You can also start by generating the array using one of the other methods mentioned here. It will also maintain your condition on the array shape.
Edit: It is too late here to prove anything, but I made some experiments in Python. It looks like a random 100x100 array can be sorted this way in a few seconds, and I always managed to get a full 2D ordering (that is, at the end there were no wrongly-ordered neighbours). Assuming the OP can precalculate this array, he can put any reasonable number of cars into it and get sensible results. Experimental code: http://pastebin.com/f2bae9a79 (you need matplotlib, and I recommend ipython too). iterchange is the sorting method there.
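A minimal sketch of the neighbour-swap idea (not the linked experimental code), with a sweep cap since termination isn't proven; it adopts one possible orientation (speed ascending left-to-right, price ascending top-to-bottom; mirror as needed):

```python
def neighbour_sort(grid, max_sweeps=10000):
    """Swap wrongly-ordered neighbours until none remain, or until the
    sweep cap is hit. Cars are (speed, price) tuples."""
    h, w = len(grid), len(grid[0])
    for _ in range(max_sweeps):
        swapped = False
        for y in range(h):
            for x in range(w):
                # a) horizontal neighbours: left one must not be faster
                if x + 1 < w and grid[y][x][0] > grid[y][x + 1][0]:
                    grid[y][x], grid[y][x + 1] = grid[y][x + 1], grid[y][x]
                    swapped = True
                # b) vertical neighbours: top one must not be pricier
                if y + 1 < h and grid[y][x][1] > grid[y + 1][x][1]:
                    grid[y][x], grid[y + 1][x] = grid[y + 1][x], grid[y][x]
                    swapped = True
        if not swapped:
            break
    return grid
```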