Probability of uniforms spaces - probability

Let X be a random variable on probability space (Ω,P), Suppose X~U({1,2,3}), does this mean the space (Ω,P) is uniform space.
I tried to come up with counter example and did not work out but i still think this statement isn't right.

You are correct that the statement isn't right. Here is a concrete counterexample with a bit of R code to illustrate it.
A standard 6-sided die has 3 pairs: (1,6), (2,5), (3,4) where each number in the pair is on the opposite side of the other. Suppose that such a die is biased so that each pair is equally likely but that in a pair the larger of the two numbers is twice as likely as the smaller. For example, 6 is twice as likely as 1. This is easily seen to imply that the numbers 1,2,3 appear with probability 1/9 and the numbers 4,5,6 appear with probability 2/9.
You can simulate 1000 rolls like this:
rolls <- sample(1:6,1000,replace = TRUE, prob = c(1/9,1/9,1/9,2/9,2/9,2/9))
Here is a display created by making a barplot of the tabulation of the results:
Confirming the obvious fact that the distribution is not uniform.
We can define X on it as the function which indicates what pair a rolls is in (so that 1,6 are in the first pair, 2,5 in the second, 3,4 in the third):
X = function(x){min(x,7-x)}
and then:
barplot(table(sapply(rolls,X)))
leading to:
which confirms the obvious fact that X is uniform.

Related

Is it better to reduce the space complexity or the time complexity for a given program?

Grid Illumination: Given an NxN grid with an array of lamp coordinates. Each lamp provides illumination to every square on their x axis, every square on their y axis, and every square that lies in their diagonal (think of a Queen in chess). Given an array of query coordinates, determine whether that point is illuminated or not. The catch is when checking a query all lamps adjacent to, or on, that query get turned off. The ranges for the variables/arrays were about: 10^3 < N < 10^9, 10^3 < lamps < 10^9, 10^3 < queries < 10^9
It seems like I can get one but not both. I tried to get this down to logarithmic time but I can't seem to find a solution. I can reduce the space complexity but it's not that fast, exponential in fact. Where should I focus on instead, speed or space? Also, if you have any input as to how you would solve this problem please do comment.
Is it better for a car to go fast or go a long way on a little fuel? It depends on circumstances.
Here's a proposal.
First, note you can number all the diagonals that the inputs like on by using the first point as the "origin" for both nw-se and ne-sw. The diagonals through this point are both numbered zero. The nw-se diagonals increase per-pixel in e.g the northeast direction, and decreasing (negative) to the southwest. Similarly ne-sw are numbered increasing in the e.g. the northwest direction and decreasing (negative) to the southeast.
Given the origin, it's easy to write constant time functions that go from (x,y) coordinates to the respective diagonal numbers.
Now each set of lamp coordinates is naturally associated with 4 numbers: (x, y, nw-se diag #, sw-ne dag #). You don't need to store these explicitly. Rather you want 4 maps xMap, yMap, nwSeMap, and swNeMap such that, for example, xMap[x] produces the list of all lamp coordinates with x-coordinate x, nwSeMap[nwSeDiagonalNumber(x, y)] produces the list of all lamps on that diagonal and similarly for the other maps.
Given a query point, look up it's corresponding 4 lists. From these it's easy to deal with adjacent squares. If any list is longer than 3, removing adjacent squares can't make it empty, so the query point is lit. If it's only 3 or fewer, it's a constant time operation to see if they're adjacent.
This solution requires the input points to be represented in 4 lists. Since they need to be represented in one list, you can argue that this algorithm requires only a constant factor of space with respect to the input. (I.e. the same sort of cost as mergesort.)
Run time is expected constant per query point for 4 hash table lookups.
Without much trouble, this algorithm can be split so it can be map-reduced if the number of lampposts is huge.
But it may be sufficient and easiest to run it on one big machine. With a billion lamposts and careful data structure choices, it wouldn't be hard to implement with 24 bytes per lampost in an unboxed structures language like C. So a ~32Gb RAM machine ought to work just fine. Building the maps with multiple threads requires some synchronization, but that's done only once. The queries can be read-only: no synchronization required. A nice 10 core machine ought to do a billion queries in well less than a minute.
There is very easy Answer which works
Create Grid of NxN
Now for each Lamp increment the count of all the cells which suppose to be illuminated by the Lamp.
For each query check if cell on that query has value > 0;
For each adjacent cell find out all illuminated cells and reduce the count by 1
This worked fine but failed for size limit when trying for 10000 X 10000 grid

finding the count of cells in a given 2d array satisfying the given constraints

Given a 2-D array starting at (0,0) and proceeding to infinity in positive x and y axes. Given a number k>0 , find the number of cells reachable from (0,0) such that at every moment -> sum of digits of x+ sum of digits of y <=k . Moves can be up, down ,left or right. given x,y>=0 . Dfs gives answers but not sufficient for large values of k. anyone can help me with a better algorithm for this?
I think they asked you to calculate the number of cells (x,y) reachable with k>=x+y. If x=1 for example, then y can take any number between 0 and k-1 and the sum would be <=k. The total number of possibilities can be calculated by
sum(sum(1,y=0..k-x),x=0..k) = 1/2*k²+3/2*k+1
That should be able to do the trick for large k.
I am somewhat confused by the "digits" in your question. The digits make up the index like 3 times 9 makes 999. The sum of digits for the cell (999,888) would be 51. If you would allow the sum of digits to be 10^9 then you could potentially have 10^8 digits for an index, resulting something around 10^(10^8) entries, well beyond normal sizes for a table. I am therefore assuming my first interpretation. If that's not correct, then could you explain it a bit more?
EDIT:
okay, so my answer is not going to solve it. I'm afraid I don't see a nice formula or answer. I would approach it as a coloring/marking problem and mark all valid cells, then use some other technique to make sure all the parts are connected/to count them.
I have tried to come up with something but it's too messy. Basically I would try and mark large parts at once based on the index and k. If k=20, you can mark the cell range (0,0..299) at once (as any lower index will have a lower index sum) and continue to check the rest of the range. I start with 299 by fixing the 2 last digits to their maximum value and look for the max value for the first digit. Then continue that process for the remaining hundreds (300-999) and only fix the last digit to end up with 300..389 and 390..398. However, you can already see that it's a mess... (nevertheless i wanted to give it to you, you might get some better idea)
Another thing you can see immediately is that you problem is symmetric in index so any valid cell (x,y) tells you there's another valid cell (y,x). In a marking scheme / dfs/ bfs this can be exploited.

Find if any permutation of a number is within a range

I need to find if any permutation of the number exists within a specified range, i just need to return Yes or No.
For eg : Number = 122, and Range = [200, 250]. The answer would be Yes, as 221 exists within the range.
PS:
For the problem that i have in hand, the number to be searched
will only have two different digits (It will only contain 1 and 2,
Eg : 1112221121).
This is not a homework question. It was asked in an interview.
The approach I suggested was to find all permutations of the given number and check. Or loop through the range and check if we find any permutation of the number.
Checking every permutation is too expensive and unnecessary.
First, you need to look at them as strings, not numbers,
Consider each digit position as a seperate variable.
Consider how the set of possible digits each variable can hold is restricted by the range. Each digit/variable pair will be either (a) always valid (b) always invalid; or (c) its validity is conditionally dependent on specific other variables.
Now model these dependencies and independencies as a graph. As case (c) is rare, it will be easy to search in time proportional to O(10N) = O(N)
Numbers have a great property which I think can help you here:
For a given number a of value KXXXX, where K is given, we can
deduce that K0000 <= a < K9999.
Using this property, we can try to build a permutation which is within the range:
Let's take your example:
Range = [200, 250]
Number = 122
First, we can define that the first number must be 2. We have two 2's so we are good so far.
The second number must be be between 0 and 5. We have two candidate, 1 and 2. Still not bad.
Let's check the first value 1:
Any number would be good here, and we still have an unused 2. We have found our permutation (212) and therefor the answer is Yes.
If we did find a contradiction with the value 1, we need to backtrack and try the value 2 and so on.
If none of the solutions are valid, return No.
This Algorithm can be implemented using backtracking and should be very efficient since you only have 2 values to test on each position.
The complexity of this algorithm is 2^l where l is the number of elements.
You could try to implement some kind of binary search:
If you have 6 ones and 4 twos in your number, then first you have the interval
[1111112222; 2222111111]
If your range does not overlap with this interval, you are finished. Now split this interval in the middle, you get
(1111112222 + 222211111) / 2
Now find the largest number consisting of 1's and 2's of the respective number that is smaller than the split point. (Probably this step could be improved by calculating the split directly in some efficient way based on the 1 and 2 or by interpreting 1 and 2 as 0 and 1 of a binary number. One could also consider taking the geometric mean of the two numbers, as the candidates might then be more evenly distributed between left and right.)
[Edit: I think I've got it: Suppose the bounds have the form pq and pr (i.e. p is a common prefix), then build from q and r a symmetric string s with the 1's at the beginning and the end of the string and the 2's in the middle and take ps as the split point (so from 1111112222 and 1122221111 you would build 111122222211, prefix is p=11).]
If this number is contained in the range, you are finished.
If not, look whether the range is above or below and repeat with [old lower bound;split] or [split;old upper bound].
Suppose the range given to you is: ABC and DEF (each character is a digit).
Algorithm permutationExists(range_start, range_end, range_index, nos1, nos2)
if (nos1>0 AND range_start[range_index] < 1 < range_end[range_index] and
permutationExists(range_start, range_end, range_index+1, nos1-1, nos2))
return true
elif (nos2>0 AND range_start[range_index] < 2 < range_end[range_index] and
permutationExists(range_start, range_end, range_index+1, nos1, nos2-1))
return true
else
return false
I am assuming every single number to be a series of digits. The given number is represented as {numberOf1s, numberOf2s}. I am trying to fit the digits (first 1s and then 2s) within the range, if not the procudure returns a false.
PS: I might be really wrong. I dont know if this sort of thing can work. I haven't given it much thought, really..
UPDATE
I am wrong in the way I express the algorithm. There are a few changes that need to be done in it. Here is a working code (It worked for most of my test cases): http://ideone.com/1aOa4
You really only need to check at most TWO of the possible permutations.
Suppose your input number contains only the digits X and Y, with X<Y. In your example, X=1 and Y=2. I'll ignore all the special cases where you've run out of one digit or the other.
Phase 1: Handle the common prefix.
Let A be the first digit in the lower bound of the range, and let B be the first digit in the upper bound of the range. If A<B, then we are done with Phase 1 and move on to Phase 2.
Otherwise, A=B. If X=A=B, then use X as the first digit of the permutation and repeat Phase 1 on the next digit. If Y=A=B, then use Y as the first digit of the permutation and repeat Phase 1 on the next digit.
If neither X nor Y is equal to A and B, then stop. The answer is No.
Phase 2: Done with the common prefix.
At this point, A<B. If A<X<B, then use X as the first digit of the permutation and fill in the remaining digits however you want. The answer is Yes. (And similarly if A<Y<B.)
Otherwise, check the following four cases. At most two of the cases will require real work.
If A=X, then try using X as the first digit of the permutation, followed by all the Y's, followed by the rest of the X's. In other words, make the rest of the permutation as large as possible. If this permutation is in range, then the answer is Yes. If this permutation is not in range, then no permutation starting with X can succeed.
If B=X, then try using X as the first digit of the permutation, followed by the rest of the X's, followed by all the Y's. In other words, make the rest of the permutation as small as possible. If this permutation is in range, then the answer is Yes. If this permutation is not in range, then no permutation starting with X can succeed.
Similar cases if A=Y or B=Y.
If none of these four cases succeed, then the answer is No. Notice that at most one of the X cases and at most one of the Y cases can match.
In this solution, I've assumed that the input number and the two numbers in the range all contain the same number of digits. With a little extra work, the approach can be extended to cases where the numbers of digits differ.

How do I calculate the shanten number in mahjong?

This is a followup to my earlier question about deciding if a hand is ready.
Knowledge of mahjong rules would be excellent, but a poker- or romme-based background is also sufficient to understand this question.
In Mahjong 14 tiles (tiles are like
cards in Poker) are arranged to 4 sets
and a pair. A straight ("123") always
uses exactly 3 tiles, not more and not
less. A set of the same kind ("111")
consists of exactly 3 tiles, too. This
leads to a sum of 3 * 4 + 2 = 14
tiles.
There are various exceptions like Kan
or Thirteen Orphans that are not
relevant here. Colors and value ranges
(1-9) are also not important for the
algorithm.
A hand consists of 13 tiles, every time it's our turn we get to pick a new tile and have to discard any tile so we stay on 13 tiles - except if we can win using the newly picked tile.
A hand that can be arranged to form 4 sets and a pair is "ready". A hand that requires only 1 tile to be exchanged is said to be "tenpai", or "1 from ready". Any other hand has a shanten-number which expresses how many tiles need to be exchanged to be in tenpai. So a hand with a shanten number of 1 needs 1 tile to be tenpai (and 2 tiles to be ready, accordingly). A hand with a shanten number of 5 needs 5 tiles to be tenpai and so on.
I'm trying to calculate the shanten number of a hand. After googling around for hours and reading multiple articles and papers on this topic, this seems to be an unsolved problem (except for the brute force approach). The closest algorithm I could find relied on chance, i.e. it was not able to detect the correct shanten number 100% of the time.
Rules
I'll explain a bit on the actual rules (simplified) and then my idea how to tackle this task. In mahjong, there are 4 colors, 3 normal ones like in card games (ace, heart, ...) that are called "man", "pin" and "sou". These colors run from 1 to 9 each and can be used to form straights as well as groups of the same kind. The forth color is called "honors" and can be used for groups of the same kind only, but not for straights. The seven honors will be called "E, S, W, N, R, G, B".
Let's look at an example of a tenpai hand: 2p, 3p, 3p, 3p, 3p, 4p, 5m, 5m, 5m, W, W, W, E. Next we pick an E. This is a complete mahjong hand (ready) and consists of a 2-4 pin street (remember, pins can be used for straights), a 3 pin triple, a 5 man triple, a W triple and an E pair.
Changing our original hand slightly to 2p, 2p, 3p, 3p, 3p, 4p, 5m, 5m, 5m, W, W, W, E, we got a hand in 1-shanten, i.e. it requires an additional tile to be tenpai. In this case, exchanging a 2p for an 3p brings us back to tenpai so by drawing a 3p and an E we win.
1p, 1p, 5p, 5p, 9p, 9p, E, E, E, S, S, W, W is a hand in 2-shanten. There is 1 completed triplet and 5 pairs. We need one pair in the end, so once we pick one of 1p, 5p, 9p, S or W we need to discard one of the other pairs. Example: We pick a 1 pin and discard an W. The hand is in 1-shanten now and looks like this: 1p, 1p, 1p, 5p, 5p, 9p, 9p, E, E, E, S, S, W. Next, we wait for either an 5p, 9p or S. Assuming we pick a 5p and discard the leftover W, we get this: 1p, 1p, 1p, 5p, 5p, 5p, 9p, 9p, E, E, E, S, S. This hand is in tenpai in can complete on either a 9 pin or an S.
To avoid drawing this text in length even more, you can read up on more example at wikipedia or using one of the various search results at google. All of them are a bit more technical though, so I hope the above description suffices.
Algorithm
As stated, I'd like to calculate the shanten number of a hand. My idea was to split the tiles into 4 groups according to their color. Next, all tiles are sorted into sets within their respective groups to we end up with either triplets, pairs or single tiles in the honor group or, additionally, streights in the 3 normal groups. Completed sets are ignored. Pairs are counted, the final number is decremented (we need 1 pair in the end). Single tiles are added to this number. Finally, we divide the number by 2 (since every time we pick a good tile that brings us closer to tenpai, we can get rid of another unwanted tile).
However, I can not prove that this algorithm is correct, and I also have trouble incorporating straights for difficult groups that contain many tiles in a close range. Every kind of idea is appreciated. I'm developing in .NET, but pseudo code or any readable language is welcome, too.
I've thought about this problem a bit more. To see the final results, skip over to the last section.
First idea: Brute Force Approach
First of all, I wrote a brute force approach. It was able to identify 3-shanten within a minute, but it was not very reliable (sometimes too a lot longer, and enumerating the whole space is impossible even for just 3-shanten).
Improvement of Brute Force Approach
One thing that came to mind was to add some intelligence to the brute force approach. The naive way is to add any of the remaining tiles, see if it produced Mahjong, and if not try the next recursively until it was found. Assuming there are about 30 different tiles left and the maximum depth is 6 (I'm not sure if a 7+-shanten hand is even possible [Edit: according to the formula developed later, the maximum possible shanten number is (13-1)*2/3 = 8]), we get (13*30)^6 possibilities, which is large (10^15 range).
However, there is no need to put every leftover tile in every position in your hand. Since every color has to be complete in itself, we can add tiles to the respective color groups and note down if the group is complete in itself. Details like having exactly 1 pair overall are not difficult to add. This way, there are max around (13*9)^6 possibilities, that is around 10^12 and more feasible.
A better solution: Modification of the existing Mahjong Checker
My next idea was to use the code I wrote early to test for Mahjong and modify it in two ways:
don't stop when an invalid hand is found but note down a missing tile
if there are multiple possible ways to use a tile, try out all of them
This should be the optimal idea, and with some heuristic added it should be the optimal algorithm. However, I found it quite difficult to implement - it is definitely possible though. I'd prefer an easier to write and maintain solution first.
An advanced approach using domain knowledge
Talking to a more experienced player, it appears there are some laws that can be used. For instance, a set of 3 tiles does never need to be broken up, as that would never decrease the shanten number. It may, however, be used in different ways (say, either for a 111 or a 123 combination).
Enumerate all possible 3-set and create a new simulation for each of them. Remove the 3-set. Now create all 2-set in the resulting hand and simulate for every tile that improves them to a 3-set. At the same time, simulate for any of the 1-sets being removed. Keep doing this until all 3- and 2-sets are gone. There should be a 1-set (that is, a single tile) be left in the end.
Learnings from implementation and final algorithm
I implemented the above algorithm. For easier understanding I wrote it down in pseudocode:
Remove completed 3-sets
If removed, return (i.e. do not simulate NOT taking the 3-set later)
Remove 2-set by looping through discarding any other tile (this creates a number of branches in the simulation)
If removed, return (same as earlier)
Use the number of left-over single tiles to calculate the shanten number
By the way, this is actually very similar to the approach I take when calculating the number myself, and obviously never to yields too high a number.
This works very well for almost all cases. However, I found that sometimes the earlier assumption ("removing already completed 3-sets is NEVER a bad idea") is wrong. Counter-example: 23566M 25667P 159S. The important part is the 25667. By removing a 567 3-set we end up with a left-over 6 tile, leading to 5-shanten. It would be better to use two of the single tiles to form 56x and 67x, leading to 4-shanten overall.
To fix, we simple have to remove the wrong optimization, leading to this code:
Remove completed 3-sets
Remove 2-set by looping through discarding any other tile
Use the number of left-over single tiles to calculate the shanten number
I believe this always accurately finds the smallest shanten number, but I don't know how to prove that. The time taken is in a "reasonable" range (on my machine 10 seconds max, usually 0 seconds).
The final point is calculating the shanten out of the number of left-over single tiles. First of all, it is obvious that the number is in the form 3*n+1 (because we started out with 14 tiles and always subtracted 3 tiles).
If there is 1 tile left, we're shanten already (we're just waiting for the final pair). With 4 tiles left, we have to discard 2 of them to form a 3-set, leaving us with a single tile again. This leads to 2 additional discards. With 7 tiles, we have 2 times 2 discards, adding 4. And so on.
This leads to the simple formula shanten_added = (number_of_singles - 1) * (2/3).
The described algorithm works well and passed all my tests, so I'm assuming it is correct. As stated, I can't prove it though.
Since the algorithm removes the most likely tiles combinations first, it kind of has a built-in optimization. Adding a simple check if (current_depth > best_shanten) then return; it does very well even for high shanten numbers.
My best guess would be an A* inspired approach. You need to find some heuristic which never overestimates the shanten number and use it to search the brute-force tree only in the regions where it is possible to get into a ready state quickly enough.
Correct algorithm sample: syanten.cpp
Recursive cut forms from hand in order: sets, pairs, incomplete forms, - and count it. In all variations. And result is minimal Shanten value of all variants:
Shanten = Min(Shanten, 8 - * 2 - - )
C# sample (rewrited from c++) can be found here (in Russian).
I've done a little bit of thinking and came up with a slightly different formula than mafu's. First of all, consider a hand (a very terrible hand):
1s 4s 6s 1m 5m 8m 9m 9m 7p 8p West East North
By using mafu's algorithm all we can do is cast out a pair (9m,9m). Then we are left with 11 singles. Now if we apply mafu's formula we get (11-1)*2/3 which is not an integer and therefore cannot be a shanten number. This is where I came up with this:
N = ( (S + 1) / 3 ) - 1
N stands for shanten number and S for score sum.
What is score? It's a number of tiles you need to make an incomplete set complete. For example, if you have (4,5) in your hand you need either 3 or 6 to make it a complete 3-set, that is, only one tile. So this incomplete pair gets score 1. Accordingly, (1,1) needs only 1 to become a 3-set. Any single tile obviously needs 2 tiles to become a 3-set and gets score 2. Any complete set of course get score 0. Note that we ignore the possibility of singles becoming pairs. Now if we try to find all of the incomplete sets in the above hand we get:
(4s,6s) (8m,9m) (7p,8p) 1s 1m 5m 9m West East North
Then we count the sum of its scores = 1*3+2*7 = 17.
Now if we apply this number to the formula above we get (17+1)/3 - 1 = 5 which means this hand is 5-shanten. It's somewhat more complicated than Alexey's and I don't have a proof but so far it seems to work for me. Note that such a hand could be parsed in the other way. For example:
(4s,6s) (9m,9m) (7p,8p) 1s 1m 5m 8m West East North
However, it still gets score sum 17 and 5-shanten according to formula. I also can't proof this and this is a little bit more complicated than Alexey's formula but also introduces scores that could be applied(?) to something else.
Take a look here: ShantenNumberCalculator. Calculate shanten really fast. And some related stuff (in japanese, but with code examples) http://cmj3.web.fc2.com
The essence of the algorithm: cut out all pairs, sets and unfinished forms in ALL possible ways, and thereby find the minimum value of the number of shanten.
The maximum value of the shanten for an ordinary hand: 8.
That is, as it were, we have the beginnings for 4 sets and one pair, but only one tile from each (total 13 - 5 = 8).
Accordingly, a pair will reduce the number of shantens by one, two (isolated from the rest) neighboring tiles (preset) will decrease the number of shantens by one,
a complete set (3 identical or 3 consecutive tiles) will reduce the number of shantens by 2, since two suitable tiles came to an isolated tile.
Shanten = 8 - Sets * 2 - Pairs - Presets
Determining whether your hand is already in tenpai sounds like a multi-knapsack problem. Greedy algorithms won't work - as Dialecticus pointed out, you'll need to consider the entire problem space.

Bin-packing (or knapsack?) problem

I have a collection of 43 to 50 numbers ranging from 0.133 to 0.005 (but mostly on the small side). I would like to find, if possible, all combinations that have a sum between L and R, which are very close together.*
The brute-force method takes 243 to 250 steps, which isn't feasible. What's a good method to use here?
Edit: The combinations will be used in a calculation and discarded. (If you're writing code, you can assume they're simply output; I'll modify as needed.) The number of combinations will presumably be far too large to hold in memory.
* L = 0.5877866649021190081897311406, R = 0.5918521703507438353981412820.
The basic idea is to convert it to an integer knapsack problem (which is easy).
Choose a small real number e and round numbers in your original problem to ones representable as k*e with integer k. The smaller e, the larger the integers will be (efficiency tradeoff) but the solution of the modified problem will be closer to your original one. An e=d/(4*43) where d is the width of your target interval should be small enough.
If the modified problem has an exact solution summing to the middle (rounded to e) of your target interval, then the original problem has one somewhere within the interval.
You haven't given us enough information. But it sounds like you are in trouble if you actually want to OUTPUT every possible combination. For example, consistent with what you told us are that every number is ~.027. If this is the case, then every collection of half of the elements with satisfy your criterion. But there are 43 Choose 21 such sets, which means you have to output at least 1052049481860 sets. (too many to be feasible)
Certainly the running time will be no better than the length of the required output.
Actually, there is a quicker way around this:
(python)
sums_possible = [(0, [])]
# sums_possible is an array of tuples like this: (number, numbers_that_yield_this_sum_array)
for number in numbers:
sums_possible_for_this_number = []
for sum in sums_possible:
sums_possible_for_this_number.insert((number + sum[0], sum[1] + [number]))
sums_possible = sums_possible + sums_possible_for_this_number
results = [sum[1] for sum in sums_possible if sum[0]>=L and sum[1]<=R]
Also, Aaron is right, so this may or may not be feasible for you

Resources