Choose X Elements from a List With Probabilities


I have to make a List of let's say 10 items of different types.
For example:
item A. probability 90%
item B. probability 90%
item C. probability 90%
item D. probability 30%
item E. probability 20%
item F. probability 10%
item G. probability 1%
How could I get this list?
I've tried the approach shown in the first and third answers (ranked by upvotes) to this question: How to pick an item by its probability?
But as far as I understand, all of these need a line of code for every possible element, for example if(Random.nextDouble(1) <= item.probability){List.add(item)}. While mathematically correct, this approach would require new logic for every item, wouldn't be easy to adjust when new items are added, and wouldn't fit well with items that share the same probability of appearing.
I can't quite follow the second answer (the accepted one), but as far as I can tell it also works per item (I might be wrong, sorry; it's very condensed and it's in Java).
So I was looking for something more like this:
A weighted version of random.choice
The answer I linked is in Python and is a single-line call to numpy.random.choice, which could then be put into a loop from 0 to X to run X times. Is there a way of doing something like this in Dart?
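A minimal sketch of a data-driven version, written in Python (the language of the linked answer) rather than Dart. It assumes the percentages are independent chances of inclusion, since they do not sum to 100%, and it also shows the weighted-choice reading for comparison. The names ITEMS, pick_independent and pick_weighted are hypothetical.

```python
import random

# Item name -> probability of appearing. Adding an item needs no new logic.
ITEMS = {"A": 0.90, "B": 0.90, "C": 0.90, "D": 0.30, "E": 0.20, "F": 0.10, "G": 0.01}

def pick_independent(items, rng=random):
    """Include each item independently with its own probability."""
    return [name for name, p in items.items() if rng.random() < p]

def pick_weighted(items, x, rng=random):
    """Draw x items with replacement, treating the values as relative weights."""
    names = list(items)
    return rng.choices(names, weights=[items[n] for n in names], k=x)

if __name__ == "__main__":
    print(pick_independent(ITEMS))   # varies from run to run
    print(pick_weighted(ITEMS, 10))  # always 10 names, duplicates allowed
```

The same loop should port to Dart by iterating over a Map of probabilities and comparing each one against Random().nextDouble() from dart:math.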

Related

n-place mastermind variation algorithm

A few days ago I came across such a problem at the contest my uni was holding:
Given the history of guesses in a mastermind game using digits instead
of colors in a form of pairs (x, y) where x is the guess and y is how
many digits were placed correctly, guess the correct number. Each
input is guaranteed to have a solution.
Example for a 5-place game:
(90342, 2)
(70794, 0)
(39458, 2)
(34109, 1)
(51545, 2)
(12531, 1)
Should yield:
39542
Create an algorithm to correctly guess the result in an n-place
mastermind given the history.
So the only idea I had was to track the probability of each digit being correct, based on the number of correct digits in a given guess, and then try to generate the most probable number, then the next one, and so on. For example, 9 would be 40% likely for the first place (because the first guess has 2/5 = 40% correct), 7 would be impossible, and so on. Then we do the same for the other places in the number and finally generate the number with the highest probability to test it against all the guesses.
The problem with this approach, though, is that generating the next possible number, and the next, and so on (as we probably won't score a home run in the first try) is really non-trivial (or at least I don't see an easy way of implementing this) and since this contest had something like a 90 minute timeframe and this wasn't the only problem, I don't think something so elaborate was the anticipated approach.
So how could one do it easier?

An approach that comes to mind is to write a routine that can generally filter an enumeration of combinations based on a particular try and its score.
So for your example, you would initially pick one of the most constrained tries (one of the ones with a score of 2) as a filter and then enumerate all combinations that satisfy it.
The output from that enumeration is then used as input to a filter run for the next unprocessed try, and so on, until the list of tries is exhausted.
The candidate try that comes out of the final enumeration is the solution.

Probability does not apply here. In this case a number is either right or wrong. There is no "partially right".
For 5 digits you can just test all 100,000 possible numbers against the given history and throw out the ones where the matches are incorrect. This approach becomes impractical for larger numbers at some point. You will be left with a list of numbers that meet the criteria. If there is exactly one in the list, then you have solved it.
python code, where matches counts the matching digits of its 2 parameters:
for k in range(0, 100000):
    # every guess must reproduce its recorded score exactly, including the last one
    if matches(k,90342)==2 and matches(k,70794)==0 and matches(k,39458)==2 and matches(k,34109)==1 and matches(k,51545)==2 and matches(k,12531)==1:
        print(k)
prints:
39542
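A self-contained sketch of the same brute-force filter, generalised to n places. It assumes guesses are kept as digit strings so that leading zeros survive; the names matches, solve and HISTORY are illustrative, and the search visits 10^n candidates, so it only suits small n.

```python
from itertools import product

def matches(a, b):
    """Count positions where two equal-length digit strings agree."""
    return sum(x == y for x, y in zip(a, b))

def solve(history, places):
    """Return every candidate consistent with all (guess, score) pairs."""
    candidates = ("".join(digits) for digits in product("0123456789", repeat=places))
    return [c for c in candidates
            if all(matches(c, guess) == score for guess, score in history)]

# The example history from the question.
HISTORY = [("90342", 2), ("70794", 0), ("39458", 2),
           ("34109", 1), ("51545", 2), ("12531", 1)]

if __name__ == "__main__":
    print(solve(HISTORY, 5))
```

If the returned list has exactly one entry, the history determines the number uniquely.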

How do I calculate the most profit-dense combination in the most efficient way?

I have a combinations problem that's bothering me. I'd like someone to give me their thoughts and point out if I'm missing some obvious solution that I may have overlooked.
Let's say that there is a shop that buys all of its supplies from one supplier. The supplier has a list of items for sale. Each item has the following attributes:
size, cost, quantity, m, b
m and b are constants in the following equation:
sales = m * (price) + b
This line slopes downward. The equation tells me how many of that item I will be able to sell if I charge that particular price. Each item has its own m and b values.
Let's say that the shop has limited storage space, and limited funds. The shop wants to fill its warehouse with the most profit-dense items possible.
(By the way, profit density = profit/size. I'm defining that profit density be only with regard to the items size. I could work with the density with regard to size and cost, but to do that I'd have to know the cost of warehouse space. That's not a number I know currently, so I'm just going to use size.)
The profit density of items drops the more you buy (see below.)
If I flip the line equation, I can see what price I'd have to charge to sell some given amount of the item in some given period of time.
price = (sales-b)/m
So if I buy n items and wanted to sell all of them, I'd have to charge
price = (n-b)/m
The revenue from this would be
price*n = n*(n-b)/m
The profit would be
price*n-n*cost = n*(n-b)/m - n*cost
and the profit-density would be
(n*(n-b)/m - n*cost)/(n*size)
or, equivalently
((n-b)/m - cost)/size
So let's say I have a table containing every available item, and each item's profit-density.
The question is, how many of each item do I buy in order to maximise the amount of money that the shop makes?
One possibility is to generate every possible combination of items within the bounds of cost and space, and choose the combo with the highest profitability. In a list of 1000 items, this takes too long. (I tried this and it took 17 seconds for a list of 1000. Horrible.)
Another option I tried (on paper) was to take the top two most profitable items on the list. Let's call the most profitable item A, the 2nd-most profitable item B, and the 3rd-most profitable item C. I buy as many of item A as I can until it's less profitable than item B. Then I repeat this process using B and C, for every item in the list.
It might be the case however, that after buying item B, item A is again the most profitable item, more so than C. So this would involve hopping from the current most profitable item to the next until the resources are exhausted. I could do this, but it seems like an ugly way to do it.
I considered dynamic programming, but since the profit-densities of the items change depending on the amount you buy, I couldn't come up with a resolution for this.
I've considered multiple-linear regression, and by 'consider' I mean I've said to myself "is multi-linear regression an option?" and then done nothing with it.
My spidey-sense tells me that there's a far more obvious method staring me in the face, but I'm not seeing it. Please help me kick myself and facepalm at the same time.

If you treat this as a simple exercise in multivariate optimization, where the controllable variables are the quantities bought, then you are optimizing a quadratic function subject to a linear constraint.
If you use a Lagrange multiplier and differentiate then you get a linear equation for each quantity variable involving itself and the Lagrange multiplier as the only unknowns, and the constraint gives you a single linear equation involving all of the quantities. So write each quantity as a linear function of the Lagrange multiplier and substitute into the constraint equation to get a linear equation in the Lagrange multiplier. Solve this and then plug the Lagrange multiplier into the simpler equations to get the quantities.
This gives you a solution if you are allowed to buy fractional and negative quantities of things if required. Clearly you are not, but you might hope that nothing is very negative and you can round the non-integer quantities to get a reasonable answer. If this isn't good enough for you, you could use it as a basis for branch and bound. If you make an assumption on the value of one of the quantities and solve for the others in this way, you get an upper bound on the possible best answer - the profit predicted neglecting real world constraints on non-negativity and integer values will always be at least the profit earned if you have to comply with these constraints.
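A rough sketch of that calculation, assuming a single constraint (the warehouse space is used exactly) and ignoring the funds limit. It uses the question's profit formula n*(n-b)/m - n*cost with m < 0. The function name and the dict layout are made up for illustration.

```python
def continuous_quantities(items, space):
    """Relaxed optimum: maximise total profit subject to
    sum(n_i * size_i) == space, allowing fractional or negative n_i.

    Each item is a dict with keys size, cost, m, b (m < 0).
    """
    # Stationarity of the Lagrangian for each item:
    #   (2*n_i - b_i)/m_i - cost_i - lam*size_i = 0
    #   =>  n_i = (b_i + m_i*(cost_i + lam*size_i)) / 2
    # Substituting into the constraint gives base + lam*slope == space.
    base = sum(it["size"] * (it["b"] + it["m"] * it["cost"]) / 2 for it in items)
    slope = sum(it["size"] ** 2 * it["m"] / 2 for it in items)
    lam = (space - base) / slope
    return [(it["b"] + it["m"] * (it["cost"] + lam * it["size"])) / 2 for it in items]

if __name__ == "__main__":
    demo = [{"size": 1, "cost": 5, "m": -2, "b": 100},
            {"size": 2, "cost": 3, "m": -1, "b": 50}]
    print(continuous_quantities(demo, 40))
```

The result still has to be rounded and checked for negative quantities, as described above.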

You can treat this as a dynamic programming exercise, to make the best use of a limited resource.
As a simple example, consider just satisfying the constraint on space and ignoring that on cost. Then you want to find the items that generate the most profit for the available space. Choose units so that expressing the space used as an integer is reasonable, and then, for i = 1 to number of items, work out, for each integer value of space up to the limit, the selection of the first i items that gives the most return for that amount of space. As usual, you can work out the answers for i+1 from the answers for i: for each value from 0 up to the limit on space just consider all possible quantities of the i+1th item up to that amount of space, and work out the combined return from using that quantity of the item and then using the remaining space according to the answers you have already worked out for the first i items. When i reaches the total number of items you will be working out the best possible return for the problem you actually want to solve.
If you have constraints for both space and cost, then the state of the dynamic program is not the single variable (space) but a pair of variables (space, cost) but you can still solve it, although with more work. Consider all possible values of (space, cost) from (0, 0) up to the actual constraints - you have a 2-dimensional table of returns to compute instead of a single set of values from 0 to max-space. But you can still work from i=1 to N, computing the highest possible return for the first i items for each limit of (space, cost) and using the answers for i to compute the answers for i+1.
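A minimal sketch of the space-only version of this dynamic program, assuming integer sizes and the question's profit formula. Items are tuples in the question's order (size, cost, quantity, m, b); best_profit and profit are hypothetical names.

```python
def profit(q, cost, m, b):
    """Profit from buying q units and pricing them so that all q sell."""
    return q * (q - b) / m - q * cost

def best_profit(items, capacity):
    """Best total profit using at most `capacity` units of space."""
    best = [0.0] * (capacity + 1)  # answers for the items processed so far
    for size, cost, quantity, m, b in items:
        updated = best[:]          # quantity 0 of this item
        for space in range(capacity + 1):
            for q in range(1, min(quantity, space // size) + 1):
                candidate = best[space - q * size] + profit(q, cost, m, b)
                if candidate > updated[space]:
                    updated[space] = candidate
        best = updated
    return best[capacity]

if __name__ == "__main__":
    print(best_profit([(1, 0, 10, -1, 10), (2, 1, 5, -2, 30)], 8))
```

Adding the cost constraint would turn the one-dimensional list into a table indexed by (space, cost), with the same update rule.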

Randomly sample a data set

I came across a Q that was asked in one of the interviews..
Q - Imagine you are given a really large stream of data elements (queries on google searches in May, products bought at Walmart during the Christmas season, names in a phone book, whatever). Your goal is to efficiently return a random sample of 1,000 elements evenly distributed from the original stream. How would you do it?
I am looking for -
What does random sampling of a data set mean?
(I mean I can simply do a coin toss and select a string from input if outcome is 1 and do this until i have 1000 samples..)
What are things I need to consider while doing so? For example .. taking contiguous strings may be better than taking non-contiguous strings.. to rephrase - Is it better if i pick contiguous 1000 strings randomly.. or is it better to pick one string at a time like coin toss..
This may be a vague question.. I tried to google "randomly sample data set" but did not find any relevant results.

Binary sample/don't sample may not be the right answer.. suppose you want to sample 1000 strings and you do it via coin toss.. This would mean that approximately after visiting 2000 strings.. you will be done.. What about the rest of the strings?
I read this post - http://gregable.com/2007/10/reservoir-sampling.html
which answers this Q quite clearly..
Let me put the summary here -
SIMPLE SOLUTION
Assign a random number to every element as you see them in the stream, and then always keep the top 1,000 numbered elements at all times.
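A short sketch of this simple solution, assuming a min-heap is used to hold the current top k keys; sample_by_random_keys is an illustrative name, not something from the linked post.

```python
import heapq
import random

def sample_by_random_keys(stream, k, rng=random):
    """Tag every element with a random key and keep the k largest keys."""
    heap = []  # min-heap of (key, arrival index, element); smallest key on top
    for index, element in enumerate(stream):
        entry = (rng.random(), index, element)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            # The new key beats the smallest kept key, so evict that one.
            heapq.heapreplace(heap, entry)
    return [element for _, _, element in heap]

if __name__ == "__main__":
    print(sample_by_random_keys(range(1_000_000), 5))
```

The arrival index is only a tie-breaker so that elements themselves never need to be comparable.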
RESERVOIR SAMPLING
Make a reservoir (array) of 1,000 elements and fill it with the first 1,000 elements in your stream.
Start with i = 1,001. With what probability after the 1001'th step should element 1,001 (or any element for that matter) be in the set of 1,000 elements? The answer is easy: 1,000/1,001. So, generate a random number between 0 and 1, and if it is less than 1,000/1,001 you should take element 1,001.
If you choose to add it, then replace any element (say element #2) in the reservoir chosen randomly. The element #2 is definitely in the reservoir at step 1,000 and the probability of it getting removed is the probability of element 1,001 getting selected multiplied by the probability of #2 getting randomly chosen as the replacement candidate. That probability is 1,000/1,001 * 1/1,000 = 1/1,001. So, the probability that #2 survives this round is 1 - that or 1,000/1,001.
This can be extended for the i'th round - keep the i'th element with probability 1,000/i and if you choose to keep it, replace a random element from the reservoir. The probability any element before this step being in the reservoir is 1,000/(i-1). The probability that they are removed is 1,000/i * 1/1,000 = 1/i. The probability that each element sticks around given that they are already in the reservoir is (i-1)/i and thus the elements' overall probability of being in the reservoir after i rounds is 1,000/(i-1) * (i-1)/i = 1,000/i.
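The rounds described above can be sketched as follows. This is a plain single-pass version under the assumption that the stream is any Python iterable; reservoir_sample is a hypothetical name.

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Return k elements chosen uniformly from a stream of unknown length."""
    reservoir = []
    for i, element in enumerate(stream, start=1):
        if i <= k:
            # Fill the reservoir with the first k elements.
            reservoir.append(element)
        elif rng.random() < k / i:
            # Keep the i-th element with probability k/i,
            # replacing a uniformly chosen resident.
            reservoir[rng.randrange(k)] = element
    return reservoir

if __name__ == "__main__":
    print(reservoir_sample(range(1_000_000), 10))
```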

I think you have used the word infinite a bit loosely. The very premise of sampling is that every element has an equal chance of being in the sample, and that is only possible if you at least go through every element. So I would take infinite to mean a large number, indicating that you need a single-pass solution rather than multiple passes.
Reservoir sampling is the way to go, though the analysis from #abipc, while in the right direction, is not completely correct.
It is easier if we are first clear on what we want. Imagine you have N elements (N unknown) and you need to pick 1000 elements. This means we need to devise a sampling scheme where the probability of any element being in the sample is exactly 1000/N, so each element has the same probability of being in the sample (no preference for any element based on its position in the original list). The scheme mentioned by #abipc works fine; the probability calculation goes like this:
After first step you have 1001 elements so we need to pick each element with probability 1000/1001. We pick the 1001st element with exactly that probability so that is fine. Now we also need to show that every other element also has the same probability of being in the sample.
p(any other element remaining in the sample)
= 1 - p(that element is removed from the sample)
= 1 - p(1001st element is selected) * p(the element is picked to be removed)
= 1 - (1000/1001) * (1/1000) = 1000/1001
Great so now we have proven every element has a probability of 1000/1001 to be in the sample. This precise argument can be extended for the ith step using induction.

As far as I know, this class of algorithms is called reservoir sampling.
I know one of them from data mining, but I don't know its name:
1. Collect the first S elements in your storage, whose maximum size is S.
2. Suppose the next element of the stream has number N.
3. With probability S/N keep the new element; otherwise discard it.
4. If you kept element N, use it to replace one of the elements in the sample S, picked uniformly at random.
5. Set N = N+1, get the next element, and go to step 2.
It can be proved that at any step of such stream processing your storage of size S contains each element seen so far with equal probability S/N_you_have_seen.
So for example S=10;
N_you_have_seen=10^6
S - is finite number;
N_you_have_seen - can be infinite number;

How do I pick the most beneficial combination of items from a set of items?

I'm designing a piece of a game where the AI needs to determine which combination of armor will give the best overall stat bonus to the character. Each character will have about 10 stats, of which only 3-4 are important, and of those important ones, a few will be more important than the others.
Armor will also give a boost to 1 or all stats. For example, a shirt might give +4 to the character's int and +2 stamina while at the same time, a pair of pants may have +7 strength and nothing else.
So let's say that a character has a healthy choice of armor to use (5 pairs of pants, 5 pairs of gloves, etc.) We've designated that Int and Perception are the most important stats for this character. How could I write an algorithm that would determine which combination of armor and items would result in the highest of any given stat (say in this example Int and Perception)?

Targeting one statistic
This is pretty straightforward. First, a few assumptions:
You didn't mention this, but presumably one can only wear at most one kind of armor for a particular slot. That is, you can't wear two pairs of pants, or two shirts.
Presumably, also, the choice of one piece of gear does not affect or conflict with others (other than the constraint of not having more than one piece of clothing in the same slot). That is, if you wear pants, this in no way precludes you from wearing a shirt. But notice, more subtly, that we're assuming you don't get some sort of synergy effect from wearing two related items.
Suppose that you want to target statistic X. Then the algorithm is as follows:
Group all the items by slot.
Within each group, sort the potential items in that group by how much they boost X, in descending order.
Pick the first item in each group and wear it.
The set of items chosen is the optimal loadout.
Proof: The only way to get a higher X stat would be if there was an item A which provided more X than some other in its group. But we already sorted all the items in each group in descending order, so there can be no such A.
What happens if the assumptions are violated?
If assumption one isn't true -- that is, you can wear multiple items in each slot -- then instead of picking the first item from each group, pick the first Q(s) items from each group, where Q(s) is the number of items that can go in slot s.
If assumption two isn't true -- that is, items do affect each other -- then we don't have enough information to solve the problem. We'd need to know specifically how items can affect each other, or else be forced to try every possible combination of items through brute force and see which ones have the best overall results.
Targeting N statistics
If you want to target multiple stats at once, you need a way to tell "how good" something is. This is called a fitness function. You'll need to decide how important the N statistics are, relative to each other. For example, you might decide that every +1 to Perception is worth 10 points, while every +1 to Intelligence is only worth 6 points. You now have a way to evaluate the "goodness" of items relative to each other.
Once you have that, instead of optimizing for X, you instead optimize for F, the fitness function. The process is then the same as the above for one statistic.
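A small sketch of the per-slot selection with a weighted fitness function, under the two assumptions stated above (one item per slot, no interactions between items). The item tuples, stat names and best_loadout are invented for illustration.

```python
from collections import defaultdict

def best_loadout(items, weights):
    """items: list of (name, slot, {stat: bonus}); weights: {stat: importance}.

    Returns {slot: name of the item with the highest fitness in that slot}.
    """
    def fitness(stats):
        # Stats without a weight count as unimportant (weight 0).
        return sum(weights.get(stat, 0) * bonus for stat, bonus in stats.items())

    by_slot = defaultdict(list)
    for name, slot, stats in items:
        by_slot[slot].append((fitness(stats), name))
    return {slot: max(candidates)[1] for slot, candidates in by_slot.items()}

if __name__ == "__main__":
    gear = [("robe", "shirt", {"int": 4, "stamina": 2}),
            ("plate", "shirt", {"strength": 7}),
            ("cap", "head", {"perception": 3}),
            ("hood", "head", {"int": 2})]
    print(best_loadout(gear, {"perception": 10, "int": 6}))
```

Taking the maximum per slot is equivalent to sorting each group in descending order and picking the first item.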

If there is no restriction on the number of items per category, the following will work for multiple statistics and multiple items.
Data preparation:
Give each statistic (Int, Perception) a weight, according to how important you determine it is
Store this as a 1-D array statImportance
Give each item-statistic combination a value, according to how much said item boosts said statistic for the player
Store this as a 2-D array itemStatBoost
Algorithm:
In pseudocode. Here assume that itemScore is a sortable Map with Item as the key and a numeric value as the value, and values are initialised to 0.
Assume that the sort method is able to sort this Map by values (not keys).
//Score each item and rank them
for each statistic as S
    for each item as I
        score = itemScore.get(I) + (statImportance[S] * itemStatBoost[I,S])
        itemScore.put(I, score)
sort(itemScore)
//Decide which items to use
maxEquippableItems = 10 //use the appropriate value
selectedItems = new array[maxEquippableItems]
for 0 <= idx < maxEquippableItems
    selectedItems[idx] = itemScore.getByIndex(idx)

How can I sort a 10 x 10 grid of 100 car images in two dimensions, by price and speed?

Here's the scenario.
I have one hundred car objects. Each car has a property for speed, and a property for price. I want to arrange images of the cars in a grid so that the fastest and most expensive car is at the top right, and the slowest and cheapest car is at the bottom left, and all other cars are in an appropriate spot in the grid.
What kind of sorting algorithm do I need to use for this, and do you have any tips?
EDIT: the results don't need to be exact - in reality I'm dealing with a much bigger grid, so it would be sufficient if the cars were clustered roughly in the right place.

Just an idea inspired by Mr Cantor:
calculate max(speed) and max(price)
normalize all speed and price data into range 0..1
for each car, calculate the "distance" to the possible maximum
based on a²+b²=c², distance could be something like
sqrt( (speed(car[i])/maxspeed)^2 + (price(car[i])/maxprice)^2 )
apply weighting as (visually) necessary
sort cars by distance
place "best" car in "best" square (upper right in your case)
walk the grid in zigzag and fill with next car in sorted list
Result (mirrored, top left is best):
1 - 2   6 - 7
  /   /   /
3   5   8
| /
4
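The zigzag fill can be sketched as below. It assumes cars are (name, speed, price) tuples with positive maxima, uses the distance formula given above, and puts the best car at the top left as in the mirrored diagram. zigzag_cells and arrange are hypothetical names.

```python
import math

def zigzag_cells(rows, cols):
    """Grid cells in Cantor-style zigzag order, starting at (0, 0)."""
    cells = []
    for d in range(rows + cols - 1):
        diagonal = [(r, d - r) for r in range(rows) if 0 <= d - r < cols]
        if d % 2 == 0:
            diagonal.reverse()  # alternate direction on every other diagonal
        cells.extend(diagonal)
    return cells

def arrange(cars, rows, cols):
    """Return a rows x cols grid of car names, best car at the top left."""
    max_speed = max(speed for _, speed, _ in cars)
    max_price = max(price for _, _, price in cars)

    def distance(car):
        return math.hypot(car[1] / max_speed, car[2] / max_price)

    ranked = sorted(cars, key=distance, reverse=True)
    grid = [[None] * cols for _ in range(rows)]
    for (r, c), car in zip(zigzag_cells(rows, cols), ranked):
        grid[r][c] = car[0]
    return grid

if __name__ == "__main__":
    demo = [("a", 10, 10), ("b", 20, 20), ("c", 5, 5), ("d", 15, 15)]
    print(arrange(demo, 2, 2))
```

Mirroring each row afterwards would move the best car to the upper right.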

Treat this as two problems:
1: Produce a sorted list
2: Place members of the sorted list into the grid
The sorting is just a matter of you defining your rules more precisely. "Fastest and most expensive first" doesn't work. Which comes first my £100,000 Rolls Royce, top speed 120, or my souped-up Mini, cost £50,000, top speed 180?
Having got your list how will you fill it? First and last is easy, but where does number two go? Along the top or down? Then where next, along rows, along the columns, zig-zag? You've got to decide. After that coding should be easy.

I guess what you want is to have cars that have "similar" characteristics to be clustered nearby, and additionally that the cost in general increases rightwards, and speed in general increases upwards.
I would try the following approach. Suppose you have N cars and you want to put them in an X * Y grid. Assume N == X * Y.
Put all the N cars in the grid at random locations.
Define a metric that calculates the total misordering in the grid; for example, count the number of car pairs C1=(x,y) and C2=(x',y') such that C1.speed > C2.speed but y < y' plus car pairs C1=(x,y) and C2=(x',y') such that C1.price > C2.price but x < x'.
Run the following algorithm:
Calculate current misordering metric M
Enumerate through all pairs of cars in the grid and calculate the misordering metric M' you obtain if you swap the cars
Swap the pair of cars that reduces the metric most, if any such pair was found
If you swapped two cars, repeat from step 1
Finish
This is a standard "local search" approach to an optimization problem. What you have here is basically a simple combinatorial optimization problem. Another approach to try might be a self-organizing map (SOM) with a preseeded gradient of speed and cost in the matrix.
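A compact sketch of this local search, assuming the grid is a list of rows indexed as grid[y][x] and each car is a (speed, price) tuple. It recomputes the full metric for every candidate swap, so it is only practical for small grids. The function names are illustrative.

```python
from itertools import combinations

def misordering(grid):
    """Count pairs violating 'speed rises with y' or 'price rises with x'."""
    cells = [(x, y, car) for y, row in enumerate(grid) for x, car in enumerate(row)]
    count = 0
    for x1, y1, (speed1, price1) in cells:
        for x2, y2, (speed2, price2) in cells:
            if speed1 > speed2 and y1 < y2:
                count += 1
            if price1 > price2 and x1 < x2:
                count += 1
    return count

def local_search(grid):
    """Repeatedly apply the single swap that lowers the metric the most."""
    grid = [list(row) for row in grid]  # work on a copy
    positions = [(y, x) for y in range(len(grid)) for x in range(len(grid[0]))]
    current = misordering(grid)
    while True:
        best, best_swap = current, None
        for (y1, x1), (y2, x2) in combinations(positions, 2):
            grid[y1][x1], grid[y2][x2] = grid[y2][x2], grid[y1][x1]
            score = misordering(grid)
            grid[y1][x1], grid[y2][x2] = grid[y2][x2], grid[y1][x1]  # undo
            if score < best:
                best, best_swap = score, ((y1, x1), (y2, x2))
        if best_swap is None:
            return grid  # no swap improves the metric: local optimum
        (y1, x1), (y2, x2) = best_swap
        grid[y1][x1], grid[y2][x2] = grid[y2][x2], grid[y1][x1]
        current = best
```

The loop terminates because the metric is a non-negative integer that strictly decreases with every accepted swap, though the result may be a local rather than a global optimum.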

Basically, you have to take one of speed or price as the primary key, sort by it in ascending or descending order as needed, and then sort the cars that share the same primary value by the other property.
Example:
c1(20,1000) c2(30,5000) c3(20, 500) c4(10, 3000) c5(35, 1000)
Let's assume Car(speed, price) is the format used in the above list and the primary is speed.
1 Get the car with minimum speed
2 Then get all the cars with the same speed value
3 Arrange these values in ascending order of car price
4 Get the next car with the next minimum speed value and repeat the above process
c4(10, 3000)
c3(20, 500)
c1(20, 1000)
c2(30, 5000)
c5(35, 1000)
If you post what language you are using, then it would be helpful, as some language constructs make this easier to implement. For example, LINQ makes your life very easy in this situation.
cars.OrderBy(x => x.Speed).ThenBy(p => p.Price);
Edit:
Now you have the list. As for placing these cars into the grid: unless you know in advance how many cars there will be with each of these values, you can't do anything except go with some fixed grid size, as you are doing now.
One option, if you prefer, would be to go with a nonuniform grid, with each row holding the cars of one specific speed, but this only applies when you know that a considerable number of cars share the same speed value.
So each row will have cars of same speed shown in the grid.
Thanks

Is the 10x10 constraint necessary? If it is, you must have ten speeds and ten prices, or else the diagram won't make very much sense. For instance, what happens if the fastest car isn't the most expensive?
I would rather recommend you make the grid size equal to
(number of distinct speeds) x (number of distinct prices),
then it would be a (rather) simple case of ordering by two axes.

If the data originates in a database, then you should order them as you fetch them from the database. This should only mean adding ORDER BY speed, price near the end of your query, but before the LIMIT part (where 'speed' and 'price' are the names of the appropriate fields).
As others have said, "fastest and most expensive" is a difficult thing to do, you ought to just pick one to sort by first. However, it would be possible to make an approximation using this algorithm:
1. Find the highest price and fastest speed.
2. Normalize all prices and speeds to e.g. a fraction out of 1. You do this by dividing each price by the highest price you found in step 1, and likewise each speed by the fastest speed.
3. Multiply the normalized price and speed together to create one "price & speed" number.
4. Sort by this number.
This ensures that if car A is faster and more expensive than car B, it gets put ahead on the list. Cars where one value is higher but the other is lower get roughly sorted. I'd recommend storing these values in the database and sorting as you select.
Putting them in a 10x10 grid is easy. Start outputting items, and when you get to a multiple of 10, start a new row.
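As a rough illustration of the normalize, multiply and sort steps plus the row breaks, here is a sketch that works on an in-memory list instead of a database query. It assumes cars are (name, speed, price) tuples with positive maxima; grid_by_product is a made-up name.

```python
def grid_by_product(cars, columns=10):
    """Rank cars by normalized speed * normalized price, then cut into rows."""
    max_speed = max(speed for _, speed, _ in cars)
    max_price = max(price for _, _, price in cars)

    def score(car):
        _, speed, price = car
        return (speed / max_speed) * (price / max_price)

    names = [name for name, _, _ in sorted(cars, key=score, reverse=True)]
    # Start a new row after every `columns` items.
    return [names[i:i + columns] for i in range(0, len(names), columns)]

if __name__ == "__main__":
    print(grid_by_product([("a", 10, 10), ("b", 20, 20), ("c", 5, 5)], columns=2))
```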

Another option is to apply a score 0 .. 200% to each car, and sort by that score.
Example:
score_i = speed_percent(min_speed, max_speed, speed_i) + price_percent(min_price, max_price, price_i)

Hmmm... a kind of bubble sort could be a simple algorithm here.
1. Make a random 10x10 array.
2. Find two neighbours (horizontal or vertical) that are in "wrong order", and exchange them.
3. Repeat (2) until no such neighbours can be found.
Two neighbour elements are in "wrong order" when:
a) they're horizontal neighbours and left one is slower than right one,
b) they're vertical neighbours and top one is cheaper than bottom one.
But I'm not actually sure whether this algorithm terminates for every input. I'm almost sure it is very slow :-). It should be easy to implement, though, and after some finite number of iterations the partial result might be good enough for your purposes. You can also start by generating the array using one of the other methods mentioned here. It will also maintain your condition on the array shape.
Edit: It is too late here to prove anything, but I made some experiments in Python. It looks like a random 100x100 array can be sorted this way in a few seconds, and I always managed to get a full 2D ordering (that is: at the end I got no wrongly-ordered neighbours). Assuming that the OP can precalculate this array, he can put any reasonable number of cars into the array and get sensible results. Experimental code: http://pastebin.com/f2bae9a79 (you need matplotlib, and I recommend ipython too). iterchange is the sorting method there.
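This is not the pastebin code, just a minimal sketch of the neighbour-swap idea under the "wrong order" rules a) and b) above. Cars are assumed to be (speed, price) tuples, and because termination is not proven, the hypothetical max_passes parameter caps the number of sweeps.

```python
def neighbour_sort(grid, max_passes=1000):
    """Swap wrongly ordered neighbours until none remain or the cap is hit.

    grid[row][col] is a (speed, price) tuple; a sorted copy is returned.
    """
    grid = [list(row) for row in grid]
    rows, cols = len(grid), len(grid[0])
    for _ in range(max_passes):
        swapped = False
        for r in range(rows):
            for c in range(cols):
                # a) horizontal neighbours: left one slower than right one
                if c + 1 < cols and grid[r][c][0] < grid[r][c + 1][0]:
                    grid[r][c], grid[r][c + 1] = grid[r][c + 1], grid[r][c]
                    swapped = True
                # b) vertical neighbours: top one cheaper than bottom one
                if r + 1 < rows and grid[r][c][1] < grid[r + 1][c][1]:
                    grid[r][c], grid[r + 1][c] = grid[r + 1][c], grid[r][c]
                    swapped = True
        if not swapped:
            break  # no wrongly ordered neighbours are left
    return grid
```

If the cap is reached, the grid is returned as it stands, which matches the remark that a partial result may be good enough.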
