Mark M cells on a NxN board randomly with equal probability [duplicate]

Mark M cells on a NxN board randomly with equal probability [duplicate] - algorithm

This question already has answers here:
Algorithm to select a single, random combination of values?
(7 answers)
Closed 8 years ago.
An interview question:
Given a NxN board with all cells set to 0, mark M (M < NxN) cells to 1. The M cells should be chosen from all cells with equal probability.
E.g. Mark 30 cells in a 10x10 board, then the probability for a cell to be chosen is 0.3.
My idea is to iterate all cells and on each cell compute a random number in range [1-100], mark the cell to 1 if the number is less than or equal to 30.
The interviewer is not impressed by this solution. Any good idea? (You can use any language)

Put 70 zeros (NxN - M) and 30 ones (M) into a vector. Shuffle the vector. Iterate through and map each index k to 2-d indices via i = k / 10 and j = k % 10 for your example (use N as the divisor more generally).
ADDENDUM
After checking out #candu's link, I decided to give that approach a try. Here's an implementation in Ruby:
require 'set'
# implementation of Floyd's uniform subset algorithm for
# values in the range [0,n).
def generateMfromN(m, n)
s = Set.new
((n-m)...n).each {|j| s.add?(rand(j+1)) || s.add(j)}
s.to_a
end
#initialize a 10x10 array of zeros
a = Array.new(10)
10.times {|i| a[i] = Array.new(10,0)}
# create an array of 10 random indices between 0 and 99,
# map each index to 2-d indices, and set the corresponing
# element to 1.
generateMfromN(10,100).each {|index| a[index/10][index%10] = 1}
# show the results
a.each {|v| puts v.to_s}
This produces results such as...
[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
[0, 0, 0, 1, 0, 0, 0, 0, 0, 1]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
[0, 0, 0, 0, 0, 1, 0, 0, 1, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
and appears to require only O(M) work for Floyd's algorithm, since on each of M iterations an element always gets added to the set.
If M is bigger than N*N/2, initialize the array with 1's and randomize placement of zeros instead, as suggested by #btilly.

This can be done in expected running time O(m).
First let's deal with the case where we need at most half the board. So m <= n*n/2. For this case we can keep choosing random points and changing their values, throwing away and we chose before, until we have m of them. The probability of throwing away the next random choice is never more than half, so the number of random choices needed is at worst 2 m = O(m).
In the case where we need more than half the board, it takes time O(m) to flip every cell to 1, and then we use the previous solution to find n*n - m cells to turn back to 0.

Related

CountMultiplicativePairs in Python using optimized way

The complete problem is given below for which I wrote a Python code and wanted to know the complexity of it or whether it can be optimised more. The solutions are available in C# but the logics are quiet complex.
http://www.whatsjs.com/2018/01/codility-countmultiplicativepairs.html
Here is the solution to the problem:
How to find pairs with product greater than sum
Below the code I wrote in Python. Is there any other way or someone who has tried this problem in python as the C# code explained above doesn't have proper explanation
def solution(A,B):
"""
Count the number of pairs (x, y) such that x * y >= x + y.
"""
M = 1000*1000
max_count=1000*1000*1000
zero=count=0
if len(A)<=1:
return "Length of array A should be greater than 1"
if len(B)<=1:
return "Length of array B should be greater than 1"
if len(A)!=len(B):
return "Length of both arrays should be equal"
C=[0]*len(A)
for (i, elem) in enumerate(A):
C[i]=float(A[i])+float(B[i]/M)
for (i, elem) in enumerate(C):
if elem==0:
zero+=1
if elem>0 and elem<=1:
pass
if elem>1:
for j in range(i+1,len(C)):
if round(C[i]*C[j],2)>=C[i]+C[j]:
count+=1
zero_pairs=int(zero*(zero-1)/2)
count+=zero_pairs
return min(count,max_count)
#return C
#print(solution([0,1,2,2,3,5], [500000, 500000, 0, 0, 0, 20000]))
print(solution([1, 1, 1, 2, 2, 3, 5, 6],[200000, 250000, 500000, 0, 0, 0, 0, 0]))
# print(solution([0, 0, 2, 2], [0, 0, 0, 0]))
# print(solution([1, 3], [500000, 10000]))
# print(solution([1, 3], [400000, 500000]))
#print(solution([0, 0, 0, 0] , [0, 0, 0, 0]))
#print(solution([0, 0, 0, 0] , [1, 1, 1, 1]))
I wanted a more optimised way to solve this, as I feel the complexity currently is O(n^2)

How to check if every element in a 2D Array are connected together

Question is in the title. I have a 2D array:
array = [
[0, 0, 1, 0, 1],
[0, 0, 1, 0, 1],
[1, 1, 1, 1, 1],
[0, 0, 1, 0, 0],
[0, 0, 1, 0, 0]
]
How do I check to see if every element "1" in this example are all connected together as neighbors either laterally or horizontally. In this example the function should return TRUE since all of the 1's are all connected together. In contrast:
array = [
[0, 0, 0, 1, 1],
[0, 0, 0, 1, 1],
[0, 0, 0, 0, 0],
[1, 1, 0, 0, 0],
[1, 1, 0, 0, 0]
]
This should return FALSE, since their is a divide between the 1's and not all of them are neighbors.
My initial thought was to iterate through the array and check to see if any of the adjacent items were 1's or not. However, this doesn't work since two elements can be next to each other yet away from the rest of the group. Any help is greatly appreciated.

You can use BFS or DFS for that.
These are exploration algorithms that helps you to discover all nodes connected to your starting one.
The "trick" is to think of your matrix as a graph where:
V = { (i,j) | a[i][j] == 1} (informally, all locations where there is 1 in the matrix
E = { ((i1, j1), (i2, j2)) | (i1, j1), (i2, j2) are adjacent }
Then, just find a place where a[i][j] == 1, and start a BFS or DFS from it to disccover all reachable nodes.
Once you are done, iterate the matrix again, and see if each a[i][j] == 1 element was discovered.
Good luck!

The correct answer for this question is counting all the elements that are 1's then finding any element that is a '1' then using a flood fill algorithm that counts the amount of 1's. If the two values are equal then the answer is True if not then false.
https://en.wikipedia.org/wiki/Flood_fill

Sort algorithms with time complexity in the order of number of elements [duplicate]

This question already has answers here:
How to sort a list with given range in O(n)
(4 answers)
Is there an O(n) integer sorting algorithm?
(6 answers)
Closed 5 years ago.
I am looking for an O(n) sort algorithm where n is the number of elements to sort. I know that highly optimized sorting algorithms are O(n log n) but I was told that under the following condition we can do better. The condition is:
We are sorting numbers in a small enough range, say 0 to 100.

Say we have the following
unsortedArray = [4, 3, 4, 2]
Here is the algorithm:
Step 1) Iterate over the unsortedArray and use each element as the index into a new array we call countingArray. The value we will hold in each position is the count of times that that number appears. Each time we access a position we increment it by 1.
countingArray = [0, 0, 0, 0, 0, ..., 0, 0, 0, 0] // before iteration
countingArray = [0, 0, 0, 0, 1, ..., 0, 0, 0, 0] // after handling 4
countingArray = [0, 0, 0, 1, 1, ..., 0, 0, 0, 0] // after handling 3
countingArray = [0, 0, 0, 1, 2, ..., 0, 0, 0, 0] // after the second 4
countingArray = [0, 0, 1, 1, 2, ..., 0, 0, 0, 0] // after handling 2
We can allocate countingArray in advance because the range of the numbers we wish to sort is limited and known a-priori. In your example countingArray will have 101 elements.
Time complexity of this step is O(n) because you are iterating over n elements from unsortedArray. Inserting them into countingArray has constant time complexity.
Step 2) As shown in the example above countingArray is going to have positions with value 0 where there were no numbers to count in unsortedArray. We are going to skip these positions in the following iteration we will describe.
In countingArray non-zero positions define a number that we want to sort, and the content in that position define the count of how many times that number should appear in the final sortedArray.
We iterate over countingArray and starting at the first position of sortedArray put that number into count number of adjacent positions. This builds sortedArray and takes O(n).
countingArray = [0, 0, 1, 1, 2, ..., 0, 0, 0, 0]
// After skipping the first 2 0s and seeing a count of 1 in position 2
sortedArray = [2, 0, 0, 0]
// After seeing a count of 1 in position 3
sortedArray = [2, 3, 0, 0]
// In position 4 we have a count of 2 so we fill 4 in 2 positions
sortedArray = [2, 3, 4, 4]
=======
Total time complexity is O(n) * 2 = O(n)

How I can find the next value? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
Improve this question
Given an array of 0 and 1, e.g. array[] = {0, 1, 0, 0, 0, 1, ...}, how I can predict what the next value will be with the best possible accuracy?
What kind of methods are best suited for this kind of task?

The prediction method would depend on the interpretation of data.
However, it looks like in this particular case we can make some general assumptions that might justify use of certain machine learning techniques.
Values are generated one after another in chronological order
Values depend on some (possibly non-observable) external state. If the state repeats itself, so do the values.
This is a pretty common scenario in many machine learning contexts. One example is the prediction of stock prices based on history.
Now, to build the predictive model you'll need to define the training data set. Assume our model looks at the last k values. In case if k=1, we might end up with something similar to a Markov chain model.
Our training data set will consist of k-dimensional data points together with their respective dependent values. For example, suppose k=3 and we have the following input data
0,0,1,1,0,1,0,1,1,1,1,0,1,0,0,1...
We'll have the following training data:
(0,0,1) -> 1
(0,1,1) -> 0
(1,1,0) -> 1
(1,0,1) -> 0
(0,1,0) -> 1
(1,0,1) -> 1
(0,1,1) -> 1
(1,1,1) -> 1
(1,1,1) -> 0
(1,1,0) -> 1
(1,0,1) -> 0
(0,1,0) -> 0
(1,0,0) -> 1
Now, let's say you want to predict the next value in the sequence. The last 3 values are 0,0,1, so the model must predict the value of the function at (0,0,1), based on the training data.
A popular and relatively simple approach would be to use a multivariate linear regression on a k-dimensional data space. Alternatively, consider using a neural network if linear regression underfits the training data set.
You might need to try out different values of k and test against your validation set.

You could use a maximum likelihood estimator for the Bernoulli distribution. In essence you would:
look at all observed values and estimate parameter p
then use p to determine the next value
In Python this could look like this:
#!/usr/bin/env python
from __future__ import division
signal = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0]
def maximum_likelihood(s, last=None):
"""
The maximum likelihood estimator selects the parameter value which gives
the observed data the largest possible probability.
http://mathworld.wolfram.com/MaximumLikelihood.html
If `last` is given, only use the last `n` values.
"""
if not last:
return sum(s) / len(s)
return sum(s[:-last]) / last
if __name__ == '__main__':
hits = []
print('p\tpredicted\tcorrect\tsignal')
print('-\t---------\t-------\t------')
for i in range(1, len(signal) - 1):
p = maximum_likelihood(signal[:i]) # p = maximum_likelihood(signal[:i], last=2)
prediction = int(p >= 0.5)
hits.append(prediction == signal[i])
print('%0.3f\t%s\t\t%s\t%s' % (
p, prediction, prediction == signal[i], signal[:i]))
print('accuracy: %0.3f' % (sum(hits) / len(hits)))
The output would like this:
# p predicted correct signal
# - --------- ------- ------
# 1.000 1 False [1]
# 0.500 1 True [1, 0]
# 0.667 1 True [1, 0, 1]
# 0.750 1 False [1, 0, 1, 1]
# 0.600 1 False [1, 0, 1, 1, 0]
# 0.500 1 True [1, 0, 1, 1, 0, 0]
# 0.571 1 False [1, 0, 1, 1, 0, 0, 1]
# 0.500 1 True [1, 0, 1, 1, 0, 0, 1, 0]
# 0.556 1 True [1, 0, 1, 1, 0, 0, 1, 0, 1]
# 0.600 1 False [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
# 0.545 1 True [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0]
# 0.583 1 True [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]
# 0.615 1 True [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
# 0.643 1 True [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1]
# 0.667 1 True [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1]
# 0.688 1 False [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1]
# 0.647 1 True [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0]
# 0.667 1 False [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1]
# 0.632 1 True [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0]
# 0.650 1 True [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1]
# accuracy: 0.650
You could vary the window size for performance reasons or to favor recent events.
In above example, if we would estimate the the next value by looking only at the last 3 observed values, we could increase our accuracy to 0.7.
Update: Inspired by Narek's answer I added a logistic regression classifier example to the gist.

You can predict by calculating the probabilities of 0s and 1s and make their probability ranges and then draw a random number between 0 and 1 to predict.....

If these are series of numbers that are generated each time after some reset event, and next numbers are somehow related to previous ones, you could create a tree (binary tree with two branches at each node in your case) and feed in such historical series from the root, adjusting weights (say a count) on each branch you follow.
Could divide such counts by the number of series you entered before using them, or keep a number on each node too, increased before choosing a branch. That way root node contains number of series entered.
Then, as you feed it a new sequence you can see which branch is "hotter" (would make nice visualization as heatmap/tree btw) to follow, especially if sequence is long enough. That is, assuming order of items in sequence plays a role in what comes next.

Hashi Puzzle Representation to solve all solutions with Prolog Restrictions

Im trying to write a prolog program that receives a representation of an unsolved Hashi board and answers all the possible solutions, using restrictions. Im having an hard time figuring out which is the best (or a very good) way of representing the board with the bridges and without. The program is supposed to draw the boards for an easy reading of the solutions.
board(
[[3, 0, 6, 0, 0, 0, 6, 0, 3],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[2, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 3, 0, 0, 2, 0, 0, 0],
[0, 3, 0, 0, 0, 0, 4, 0, 1]]
).
For example, this representation is only good without the bridges, since it holds no info about them. The drawing of this board would be basicly turning the 0's into spaces, and the board would be drawn like this:
3 6 6 3
1
2 1
1 3 2
3 4 1
which is a decent representation of a real hashi board.
The point now is to be able to draw the same thing, but also draw bridges if there's any. I must be able to do so before i even think of making the restrictions themselves, since going at it with a bad way of representation will make my job alot more difficult.
I started thinking of solutions like this:
if every element of the board would be a list:
[NumberOfConnections, [ListOfConnections]]
but this gives me no info for the drawing, and what would the list of connections really have?
maybe this:
[Index, NumberOfConnections, [ListOfIndex]]
this way every "island" would have a unique ID and the list of connections would have ids
but drawing still sounds kinda hard, in the end the bridges can only be horizontal or vertical
Anyway, anyone can think of a better way of representation that makes it the easiest to achive the final goal of the program?

Nice puzzle, I agree. Here is a half-way solution in ECLiPSe, a Prolog dialect with constraints (http://eclipseclp.org).
The idea is to have, for every field of the board, four variables N, E, S, W (for North, East, etc) that can take values 0..2 and represent the number of connections on that edge of the field. For the node-fields, these connections must sum up to the given number. For the empty fields, the connections must go through (N=S, E=W) and not cross (N=S=0 or E=W=0).
Your example solves correctly:
?- hashi(stackoverflow).
3 = 6 = = = 6 = 3
| X X |
| 1 X X |
| | X X |
2 | X 1 X |
| | X | X |
| | X | X |
1 | 3 - - 2 X |
3 = = = = 4 1
but the wikipedia one doesn't, because there is no connectedness constraint yet!
:- lib(ic). % uses the integer constraint library
hashi(Name) :-
board(Name, Board),
dim(Board, [Imax,Jmax]),
dim(NESW, [Imax,Jmax,4]), % 4 variables N,E,S,W for each field
( foreachindex([I,J],Board), param(Board,NESW,Imax,Jmax) do
Sum is Board[I,J],
N is NESW[I,J,1],
E is NESW[I,J,2],
S is NESW[I,J,3],
W is NESW[I,J,4],
( I > 1 -> N #= NESW[I-1,J,3] ; N = 0 ),
( I < Imax -> S #= NESW[I+1,J,1] ; S = 0 ),
( J > 1 -> W #= NESW[I,J-1,2] ; W = 0 ),
( J < Jmax -> E #= NESW[I,J+1,4] ; E = 0 ),
( Sum > 0 ->
[N,E,S,W] #:: 0..2,
N+E+S+W #= Sum
;
N = S, E = W,
(N #= 0) or (E #= 0)
)
),
% find a solution
labeling(NESW),
print_board(Board, NESW).
print_board(Board, NESW) :-
( foreachindex([I,J],Board), param(Board,NESW) do
( J > 1 -> true ; nl ),
Sum is Board[I,J],
( Sum > 0 ->
write(Sum)
;
NS is NESW[I,J,1],
EW is NESW[I,J,2],
symbol(NS, EW, Char),
write(Char)
),
write(' ')
),
nl.
symbol(0, 0, ' ').
symbol(0, 1, '-').
symbol(0, 2, '=').
symbol(1, 0, '|').
symbol(2, 0, 'X').
% Examples
board(stackoverflow,
[]([](3, 0, 6, 0, 0, 0, 6, 0, 3),
[](0, 0, 0, 0, 0, 0, 0, 0, 0),
[](0, 1, 0, 0, 0, 0, 0, 0, 0),
[](0, 0, 0, 0, 0, 0, 0, 0, 0),
[](2, 0, 0, 0, 0, 1, 0, 0, 0),
[](0, 0, 0, 0, 0, 0, 0, 0, 0),
[](0, 0, 0, 0, 0, 0, 0, 0, 0),
[](1, 0, 3, 0, 0, 2, 0, 0, 0),
[](0, 3, 0, 0, 0, 0, 4, 0, 1))
).
board(wikipedia,
[]([](2, 0, 4, 0, 3, 0, 1, 0, 2, 0, 0, 1, 0),
[](0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 1),
[](0, 0, 0, 0, 2, 0, 3, 0, 2, 0, 0, 0, 0),
[](2, 0, 3, 0, 0, 2, 0, 0, 0, 3, 0, 1, 0),
[](0, 0, 0, 0, 2, 0, 5, 0, 3, 0, 4, 0, 0),
[](1, 0, 5, 0, 0, 2, 0, 1, 0, 0, 0, 2, 0),
[](0, 0, 0, 0, 0, 0, 2, 0, 2, 0, 4, 0, 2),
[](0, 0, 4, 0, 4, 0, 0, 3, 0, 0, 0, 3, 0),
[](0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
[](2, 0, 2, 0, 3, 0, 0, 0, 3, 0, 2, 0, 3),
[](0, 0, 0, 0, 0, 2, 0, 4, 0, 4, 0, 3, 0),
[](0, 0, 1, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0),
[](3, 0, 0, 0, 0, 3, 0, 1, 0, 2, 0, 0, 2))
).

For drawing bridges, you could use ASCII 179 for single vertical bridges, 186 for double vertical bridges, 196 for single horizontal bridges, and 205 for double horizontal bridges. This depends on which extended ASCII set is in use, though. It works in the most common.
For internal representation, I'd use -1 and -2 for single and double bridges in one direction, and -3 and -4 in the other. You could use just about any symbol that isn't 0-8, but this has the added benefit of simply adding the bridges to the island (converting (-3, -4) to (-1, -2)) to check the solution. If the sum is 0, that island is solved.

What a cool puzzle! I did a few myself, and I don't see an obvious way to make solving them deterministic, which is a nice property for a puzzle to have. Games like Tetris derive much of their ongoing play value from the fact that you don't get bored--even a good strategy can continually be refined. This has a practical ramification: if I were coding this, I would spend no further time trying to find a deterministic algorithm. I would instead focus on the generate/test paradigm Prolog excels at.
If you know you're going to do generate-and-test, you know already where all your effort at optimization is going to go: making your generator more intelligent (so it generates better candidates) and making your test fast. So I'm looking at your board representation and I'm asking myself: is it going to be easy and fast to generate alternatives from this? And we both know the answer is no, for several reasons:
Finding alternative islands to connect to from any particular island is going to be highly inefficient: searching a list forward and backward and then indexing all the other lists by the current offset. This is a huge amount of list finagling, which won't be cheap.
Detecting and preventing a bridge crossing is going to be interesting.
More to the point, the proper way to encode bridges is not obvious with this design. Islands can be separated by great distances--are you going to put a 0/1/2 in every connecting cell? If so, you have a data duplication problem; if not, you're going to have some fun calculating which location should hold the bridge count.
It's just an intuition, but having a heterogeneous data structure like this where the "kind" of element is determined entirely by whether the indices are odd or even, strikes me as unwelcome.
I think what you've got for the board layout is a great input format, but I don't think it's going to serve you well as an intermediate representation. The game is clearly a graph problem. This suggests one of the two classic graph data structures might be more helpful: the adjacency list, or the edge matrix. Either of these will expedite choosing alternatives for bridge layout, but it's not obvious to me (maybe to someone who does more graph theory) how one would prevent bridge crossings. Ideally, your data structure would simply prevent bridge crossings from occurring. Next best would be preventing the generator from generating candidate solutions with bridge crossings; worst would be to simply fail them at the test stage.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio