Sample with given probability

Sample with given probability - probability

I stumbled upon a basic discrete math/probability question and I wanted to get some ideas for improvements over my solution.
Assume you are given a collection (an alphabet, the natural numbers, etc.). How do you ensure that you draw a certain value X from this collection with a given probability P?
I'll explain my naïve solution with an example:
Collection = {A, B}
X = A, P = 1/4
We build an array v = [A, B, B, B] and we use a rand function to uniformly sample the indices of the array, i.e., {0, 1, 2, 3}
This approach works, but isn't efficient: the smaller P, the bigger the memory storage of v. Hence, I was wondering what ideas the stackoverflow community might have in improving this.
Thanks!

Partition the interval [0,1] into disjoint intervals whose union is [0,1]. Create the size of each partition to correspond to the probability of selecting each event. Then simply sample randomly from [0,1], evaluate which of your partitions the result lies in, then look up the selection that corresponds to that interval. In your example, this would result in the following 2 intervals [0,1/4) and [1/4,1] - generate a random uniform value from [0,1]. If your sample lies in the first interval then your selection X = A , if in the other interval then X = B.

Your proposed solution is indeed not great, and the most general and efficient way to solve it is as mathematician1975 states (this is known as the inverse CDF method). For your specific problem, which is multinomial sampling, you can also use a series of draws from binomial distributions to sample from your collection. This is often more intuitive if you're not familiar with sampling methods.
If the first item in the collection has probability p_1, sample uniformly in the interval [0-1]. If the sample is less than p_1, return item 1. Otherwise, renormalise the remaining outcomes by 1-p_1 and repeat the process with the next possible outcome. After each unsuccessful sampling, renormalise remaining outcomes by the total probability of rejected outcomes, so that the sum of remaining outcomes is 1. If you get to the last outcome, return it with probability 1. The result of the process will be random samples distributed according to your original vector.
This method is using the fact that individual components of a multinomial are binomially distributed, and any sub vector of the multinomial is also multinomial with parameters given by the renormalisation I describe above.

Related

Iterative random weighted choice [duplicate]

Suppose that I have an n-sided loaded die, where each side k has some probability pk of coming up when I roll it. I’m curious if there is a good data structure for storing this information statically (i.e., for a fixed set of probabilities), so that I can efficiently simulate a random roll of the die.
Currently, I have an O(lg n) solution for this problem. The idea is to store a table of the cumulative probability of the first k sides for all k, then generate a random real number in the range [0, 1) and perform a binary search over the table to get the largest index whose cumulative value is no greater than the chosen value.
I rather like this solution, but it seems odd that the runtime doesn’t take the probabilities into account. In particular, in the extreme cases of one side always coming up or the values being uniformly distributed, it’s possible to generate the result of the roll in O(1) using a naive approach, while my solution will still take logarithmically many steps.
Does anyone have any suggestions for how to solve this problem in a way that is somehow “adaptive” in it’s runtime?
Update: Based on the answers to this question, I have written up an article describing many approaches to this problem, along with their analyses. It looks like Vose’s implementation of the alias method gives Θ(n) preprocessing time and O(1) time per die roll, which is truly impressive. Hopefully this is a useful addition to the information contained in the answers!

You are looking for the alias method which provides a O(1) method for generating a fixed discrete probability distribution (assuming you can access entries in an array of length n in constant time) with a one-time O(n) set-up. You can find it documented in chapter 3 (PDF) of "Non-Uniform Random Variate Generation" by Luc Devroye.
The idea is to take your array of probabilities pk and produce three new n-element arrays, qk, ak, and bk. Each qk is a probability between 0 and 1, and each ak and bk is an integer between 1 and n.
We generate random numbers between 1 and n by generating two random numbers, r and s, between 0 and 1. Let i = floor(r*N)+1. If qi < s then return ai else return bi. The work in the alias method is in figuring out how to produce qk, ak and bk.

Use a balanced binary search tree (or binary search in an array) and get O(log n) complexity. Have one node for each die result and have the keys be the interval that will trigger that result.
function get_result(node, seed):
if seed < node.interval.start:
return get_result(node.left_child, seed)
else if seed < node.interval.end:
// start <= seed < end
return node.result
else:
return get_result(node.right_child, seed)
The good thing about this solution is that is very simple to implement but still has good complexity.

I'm thinking of granulating your table.
Instead of having a table with the cumulative for each die value, you could create an integer array of length xN, where x is ideally a high number to increase accuracy of the probability.
Populate this array using the index (normalized by xN) as the cumulative value and, in each 'slot' in the array, store the would-be dice roll if this index comes up.
Maybe I could explain easier with an example:
Using three dice: P(1) = 0.2, P(2) = 0.5, P(3) = 0.3
Create an array, in this case I will choose a simple length, say 10. (that is, x = 3.33333)
arr[0] = 1,
arr[1] = 1,
arr[2] = 2,
arr[3] = 2,
arr[4] = 2,
arr[5] = 2,
arr[6] = 2,
arr[7] = 3,
arr[8] = 3,
arr[9] = 3
Then to get the probability, just randomize a number between 0 and 10 and simply access that index.
This method might loose accuracy, but increase x and accuracy will be sufficient.

There are many ways to generate a random integer with a custom distribution (also known as a discrete distribution). The choice depends on many things, including the number of integers to choose from, the shape of the distribution, and whether the distribution will change over time.
One of the simplest ways to choose an integer with a custom weight function f(x) is the rejection sampling method. The following assumes that the highest possible value of f is max and each weight is 0 or greater. The time complexity for rejection sampling is constant on average, but depends greatly on the shape of the distribution and has a worst case of running forever. To choose an integer in [1, k] using rejection sampling:
Choose a uniform random integer i in [1, k].
With probability f(i)/max, return i. Otherwise, go to step 1. (For example, if all the weights are integers greater than 0, choose a uniform random integer in [1, max] and if that number is f(i) or less, return i, or go to step 1 otherwise.)
Other algorithms have an average sampling time that doesn't depend so greatly on the distribution (usually either constant or logarithmic), but often require you to precalculate the weights in a setup step and store them in a data structure. Some of them are also economical in terms of the number of random bits they use on average. Many of these algorithms were introduced after 2011, and they include—
The Bringmann–Larsen succinct data structure ("Succinct Sampling from Discrete Distributions", 2012),
Yunpeng Tang's multi-level search ("An Empirical Study of Random Sampling Methods for Changing Discrete Distributions", 2019), and
the Fast Loaded Dice Roller (2020).
Other algorithms include the alias method (already mentioned in your article), the Knuth–Yao algorithm, the MVN data structure, and more. See my section "Weighted Choice With Replacement" for a survey.

Biasing random number generator to some integer n with deviation b

Given an integer range R = [a, b] (where a >=0 and b <= 100), a bias integer n in R, and some deviation b, what formula can I use to skew a random number generator towards n?
So for example if I had the numbers 1 through 10 inclusively and I don't specify a bias number, then I should in theory have equal chances of randomly drawing one of them.
But if I do give a specific bias number (say, 3), then the number generator should be drawing 3 a more frequently than the other numbers.
And if I specify a deviation of say 2 in addition to the bias number, then the number generator should be drawing from 1 through 5 a more frequently than 6 through 10.
What algorithm can I use to achieve this?
I'm using Ruby if it makes it any easier/harder.

i think the simplest route is to sample from a normal (aka gaussian) distribution with the properties you want, and then transform the result:
generate a normal value with given mean and sd
round to nearest integer
if outside given range (normal can generate values over the entire range from -infinity to -infinity), discard and repeat
if you need to generate a normal from a uniform the simplest transform is "box-muller".
there are some details you may need to worry about. in particular, box muller is limited in range (it doesn't generate extremely unlikely values, ever). so if you give a very narrow range then you will never get the full range of values. other transforms are not as limited - i'd suggest using whatever ruby provides (look for "normal" or "gaussian").
also, be careful to round the value. 2.6 to 3.4 should all become 3, for example. if you simply discard the decimal (so 3.0 to 3.999 become 3) you will be biased.
if you're really concerned with efficiency, and don't want to discard values, you can simply invent something. one way to cheat is to mix a uniform variate with the bias value (so 9/10 times generate the uniform, 1/10 times return 3, say). in some cases, where you only care about average of the sample, that can be sufficient.

For the first part "But if I do give a specific bias number (say, 3), then the number generator should be drawing 3 a more frequently than the other numbers.", a very easy solution:
def randBias(a,b,biasedNum=None, bias=0):
x = random.randint(a, b+bias)
if x<= b:
return x
else:
return biasedNum
For the second part, I would say it depends on the task. In a case where you need to generate a billion random numbers from the same distribution, I would calculate the probability of the numbers explicitly and use weighted random number generator (see Random weighted choice )

If you want an unimodal distribution (where the bias is just concentrated in one particular value of your range of number, for example, as you state 3), then the answer provided by andrew cooke is good---mostly because it allows you to fine tune the deviation very accurately.
If however you wish to make several biases---for instance you want a trimodal distribution, with the numbers a, (a+b)/2 and b more frequently than others, than you would do well to implement weighted random selection.
A simple algorithm for this was given in a recent question on StackOverflow; it's complexity is linear. Using such an algorithm, you would simply maintain a list, initial containing {a, a+1, a+2,..., b-1, b} (so of size b-a+1), and when you want to add a bias towards X, you would several copies of X to the list---depending on how much you want to bias. Then you pick a random item from the list.
If you want something more efficient, the most efficient method is called the "Alias method" which was implemented very clearly in Python by Denis Bzowy; once your array has been preprocessed, it runs in constant time (but that means that you can't update the biases anymore once you've done the preprocessing---or you would to reprocess the table).
The downside with both techniques is that unlike with the Gaussian distribution, biasing towards X, will not bias also somewhat towards X-1 and X+1. To simulate this effect you would have to do something such as
def addBias(x, L):
L = concatList(L, [x, x, x, x, x])
L = concatList(L, [x+2])
L = concatList(L, [x+1, x+1])
L = concatList(L, [x-1,x-1,x-1])
L = concatList(L, [x-2])

Fastest way to check if a vector increases matrix rank

Given an n-by-m matrix A, with it being guaranteed that n>m=rank(A), and given a n-by-1 column v, what is the fastest way to check if [A v] has rank strictly bigger than A?
For my application, A is sparse, n is about 2^12, and m is anywhere in 1:n-1.
Comparing rank(full([A v])) takes about a second on my machine, and I need to do it tens of thousands of times, so I would be very happy to discover a quicker way.

There is no need to do repeated solves IF you can afford to do ONE computation of the null space. Just one call to null will suffice. Given a new vector V, if the dot product with V and the nullspace basis is non-zero, then V will increase the rank of the matrix. For example, suppose we have the matrix M, which of course has a rank of 2.
M = [1 1;2 2;3 1;4 2];
nullM = null(M')';
Will a new column vector [1;1;1;1] increase the rank if we appended it to M?
nullM*[1;1;1;1]
ans =
-0.0321573705742971
-0.602164651199413
Yes, since it has a non-zero projection on at least one of the basis vectors in nullM.
How about this vector:
nullM*[0;0;1;1]
ans =
1.11022302462516e-16
2.22044604925031e-16
In this case, both numbers are essentially zero, so the vector in question would not have increased the rank of M.
The point is, only a simple matrix-vector multiplication is necessary once the null space basis has been generated. If your matrix is too large (and the matrix nearly of full rank) that a call to null will fail here, then you will need to do more work. However, n = 4096 is not excessively large as long as the matrix does not have too many columns.
One alternative if null is too much is a call to svds, to find those singular vectors that are essentially zero. These will form the nullspace basis that we need.

I would use sprank for sparse matrixes. Check it out, it might be faster than any other method.
Edit : As pointed out correctly by #IanHincks, it is not the rank. I am leaving the answer here, just in case someone else will need it in the future.

Maybe you can try to solve the system A*x=v, if it has a solution that means that the rank does not increase.
x=(B\A)';
norm(A*x-B) %% if this is small then the rank does not increase

Data structures for loaded dice?

You are looking for the alias method which provides a O(1) method for generating a fixed discrete probability distribution (assuming you can access entries in an array of length n in constant time) with a one-time O(n) set-up. You can find it documented in chapter 3 (PDF) of "Non-Uniform Random Variate Generation" by Luc Devroye.
The idea is to take your array of probabilities pk and produce three new n-element arrays, qk, ak, and bk. Each qk is a probability between 0 and 1, and each ak and bk is an integer between 1 and n.
We generate random numbers between 1 and n by generating two random numbers, r and s, between 0 and 1. Let i = floor(r*N)+1. If qi < s then return ai else return bi. The work in the alias method is in figuring out how to produce qk, ak and bk.

Use a balanced binary search tree (or binary search in an array) and get O(log n) complexity. Have one node for each die result and have the keys be the interval that will trigger that result.
function get_result(node, seed):
if seed < node.interval.start:
return get_result(node.left_child, seed)
else if seed < node.interval.end:
// start <= seed < end
return node.result
else:
return get_result(node.right_child, seed)
The good thing about this solution is that is very simple to implement but still has good complexity.

I'm thinking of granulating your table.
Instead of having a table with the cumulative for each die value, you could create an integer array of length xN, where x is ideally a high number to increase accuracy of the probability.
Populate this array using the index (normalized by xN) as the cumulative value and, in each 'slot' in the array, store the would-be dice roll if this index comes up.
Maybe I could explain easier with an example:
Using three dice: P(1) = 0.2, P(2) = 0.5, P(3) = 0.3
Create an array, in this case I will choose a simple length, say 10. (that is, x = 3.33333)
arr[0] = 1,
arr[1] = 1,
arr[2] = 2,
arr[3] = 2,
arr[4] = 2,
arr[5] = 2,
arr[6] = 2,
arr[7] = 3,
arr[8] = 3,
arr[9] = 3
Then to get the probability, just randomize a number between 0 and 10 and simply access that index.
This method might loose accuracy, but increase x and accuracy will be sufficient.

There are many ways to generate a random integer with a custom distribution (also known as a discrete distribution). The choice depends on many things, including the number of integers to choose from, the shape of the distribution, and whether the distribution will change over time.
One of the simplest ways to choose an integer with a custom weight function f(x) is the rejection sampling method. The following assumes that the highest possible value of f is max and each weight is 0 or greater. The time complexity for rejection sampling is constant on average, but depends greatly on the shape of the distribution and has a worst case of running forever. To choose an integer in [1, k] using rejection sampling:
Choose a uniform random integer i in [1, k].
With probability f(i)/max, return i. Otherwise, go to step 1. (For example, if all the weights are integers greater than 0, choose a uniform random integer in [1, max] and if that number is f(i) or less, return i, or go to step 1 otherwise.)
Other algorithms have an average sampling time that doesn't depend so greatly on the distribution (usually either constant or logarithmic), but often require you to precalculate the weights in a setup step and store them in a data structure. Some of them are also economical in terms of the number of random bits they use on average. Many of these algorithms were introduced after 2011, and they include—
The Bringmann–Larsen succinct data structure ("Succinct Sampling from Discrete Distributions", 2012),
Yunpeng Tang's multi-level search ("An Empirical Study of Random Sampling Methods for Changing Discrete Distributions", 2019), and
the Fast Loaded Dice Roller (2020).
Other algorithms include the alias method (already mentioned in your article), the Knuth–Yao algorithm, the MVN data structure, and more. See my section "Weighted Choice With Replacement" for a survey.

From an interview: Removing rows and columns in an n×n matrix to maximize the sum of remaining values

Given an n×n matrix of real numbers. You are allowed to erase any number (from 0 to n) of rows and any number (from 0 to n) of columns, and after that the sum of the remaining entries is computed. Come up with an algorithm which finds out which rows and columns to erase in order to maximize that sum.

The problem is NP-hard. (So you should not expect a polynomial-time algorithm for solving this problem. There could still be (non-polynomial time) algorithms that are slightly better than brute-force, though.) The idea behind the proof of NP-hardness is that if we could solve this problem, then we could solve the the clique problem in a general graph. (The maximum-clique problem is to find the largest set of pairwise connected vertices in a graph.)
Specifically, given any graph with n vertices, let's form the matrix A with entries a[i][j] as follows:
a[i][j] = 1 for i == j (the diagonal entries)
a[i][j] = 0 if the edge (i,j) is present in the graph (and i≠j)
a[i][j] = -n-1 if the edge (i,j) is not present in the graph.
Now suppose we solve the problem of removing some rows and columns (or equivalently, keeping some rows and columns) so that the sum of the entries in the matrix is maximized. Then the answer gives the maximum clique in the graph:
Claim: In any optimal solution, there is no row i and column j kept for which the edge (i,j) is not present in the graph. Proof: Since a[i][j] = -n-1 and the sum of all the positive entries is at most n, picking (i,j) would lead to a negative sum. (Note that deleting all rows and columns would give a better sum, of 0.)
Claim: In (some) optimal solution, the set of rows and columns kept is the same. This is because starting with any optimal solution, we can simply remove all rows i for which column i has not been kept, and vice-versa. Note that since the only positive entries are the diagonal ones, we do not decrease the sum (and by the previous claim, we do not increase it either).
All of which means that if the graph has a maximum clique of size k, then our matrix problem has a solution with sum k, and vice-versa. Therefore, if we could solve our initial problem in polynomial time, then the clique problem would also be solved in polynomial time. This proves that the initial problem is NP-hard. (Actually, it is easy to see that the decision version of the initial problem — is there a way of removing some rows and columns so that the sum is at least k — is in NP, so the (decision version of the) initial problem is actually NP-complete.)

Well the brute force method goes something like this:
For n rows there are 2n subsets.
For n columns there are 2n subsets.
For an n x n matrix there are 22n subsets.
0 elements is a valid subset but obviously if you have 0 rows or 0 columns the total is 0 so there are really 22n-2+1 subsets but that's no different.
So you can work out each combination by brute force as an O(an) algorithm. Fast. :)
It would be quicker to work out what the maximum possible value is and you do that by adding up all the positive numbers in the grid. If those numbers happen to form a valid sub-matrix (meaning you can create that set by removing rows and/or columns) then there's your answer.
Implicit in this is that if none of the numbers are negative then the complete matrix is, by definition, the answer.
Also, knowing what the highest possible maximum is possibly allows you to shortcut the brute force evaluation since if you get any combination equal to that maximum then that is your answer and you can stop checking.
Also if all the numbers are non-positive, the answer is the maximum value as you can reduce the matrix to a 1 x 1 matrix with that 1 value in it, by definition.
Here's an idea: construct 2n-1 n x m matrices where 1 <= m <= n. Process them one after the other. For each n x m matrix you can calculate:
The highest possible maximum sum (as per above); and
Whether no numbers are positive allowing you to shortcut the answer.
if (1) is below the currently calculate highest maximum sum then you can discard this n x m matrix. If (2) is true then you just need a simple comparison to the current highest maximum sum.
This is generally referred to as a pruning technique.
What's more you can start by saying that the highest number in the n x n matrix is the starting highest maximum sum since obviously it can be a 1 x 1 matrix.
I'm sure you could tweak this into a (slightly more) efficient recursive tree-based search algorithm with the above tests effectively allowing you to eliminate (hopefully many) unnecessary searches.

We can improve on Cletus's generalized brute-force solution by modelling this as a directed graph. The initial matrix is the start node of the graph; its leaves are all the matrices missing one row or column, and so forth. It's a graph rather than a tree, because the node for the matrix without both the first column and row will have two parents - the nodes with just the first column or row missing.
We can optimize our solution by turning the graph into a tree: There's never any point exploring a submatrix with a column or row deleted that comes before the one we deleted to get to the current node, as that submatrix will be arrived at anyway.
This is still a brute-force search, of course - but we've eliminated the duplicate cases where we remove the same rows in different orders.
Here's an example implementation in Python:
def maximize_sum(m):
frontier = [(m, 0, False)]
best = None
best_score = 0
while frontier:
current, startidx, cols_done = frontier.pop()
score = matrix_sum(current)
if score > best_score or not best:
best = current
best_score = score
w, h = matrix_size(current)
if not cols_done:
for x in range(startidx, w):
frontier.append((delete_column(current, x), x, False))
startidx = 0
for y in range(startidx, h):
frontier.append((delete_row(current, y), y, True))
return best_score, best
And here's the output on 280Z28's example matrix:
>>> m = ((1, 1, 3), (1, -89, 101), (1, 102, -99))
>>> maximize_sum(m)
(106, [(1, 3), (1, 101)])

Since nobody asked for an efficient algorithm, use brute force: generate every possible matrix that can be created by removing rows and/or columns from the original matrix, choose the best one. A slightly more efficent version, which most likely can be proved to still be correct, is to generate only those variants where the removed rows and columns contain at least one negative value.

To try it in a simple way:
We need the valid subset of the set of entries {A00, A01, A02, ..., A0n, A10, ...,Ann} which max. sum.
First compute all subsets (the power set).
A valid subset is a member of the power set that for each two contained entries Aij and A(i+x)(j+y), contains also the elements A(i+x)j and Ai(j+y) (which are the remaining corners of the rectangle spanned by Aij and A(i+x)(j+y)).
Aij ...
. .
. .
... A(i+x)(j+y)
By that you can eliminate the invalid ones from the power set and find the one with the biggest sum in the remaining.
I'm sure it can be improved by improving an algorithm for power set generation in order to generate only valid subsets and by that avoiding step 2 (adjusting the power set).

I think there are some angles of attack that might improve upon brute force.
memoization, since there are many distinct sequences of edits that will arrive at the same submatrix.
dynamic programming. Because the search space of matrices is highly redundant, my intuition is that there would be a DP formulation that can save a lot of repeated work
I think there's a heuristic approach, but I can't quite nail it down:
if there's one negative number, you can either take the matrix as it is, remove the column of the negative number, or remove its row; I don't think any other "moves" result in a higher sum. For two negative numbers, your options are: remove neither, remove one, remove the other, or remove both (where the act of removal is either by axing the row or the column).
Now suppose the matrix has only one positive number and the rest are all <=0. You clearly want to remove everything but the positive entry. For a matrix with only 2 positive entries and the rest <= 0, the options are: do nothing, whittle down to one, whittle down to the other, or whittle down to both (resulting in a 1x2, 2x1, or 2x2 matrix).
In general this last option falls apart (imagine a matrix with 50 positives & 50 negatives), but depending on your data (few negatives or few positives) it could provide a shortcut.

Create an n-by-1 vector RowSums, and an n-by-1 vector ColumnSums. Initialize them to the row and column sums of the original matrix. O(n²)
If any row or column has a negative sum, remove edit: the one with the minimum such and update the sums in the other direction to reflect their new values. O(n)
Stop when no row or column has a sum less than zero.
This is an iterative variation improving on another answer. It operates in O(n²) time, but fails for some cases mentioned in other answers, which is the complexity limit for this problem (there are n² entries in the matrix, and to even find the minimum you have to examine each cell once).
Edit: The following matrix has no negative rows or columns, but is also not maximized, and my algorithm doesn't catch it.
1 1 3 goal 1 3
1 -89 101 ===> 1 101
1 102 -99
The following matrix does have negative rows and columns, but my algorithm selects the wrong ones for removal.
-5 1 -5 goal 1
1 1 1 ===> 1
-10 2 -10 2
mine
===> 1 1 1

Compute the sum of each row and column. This can be done in O(m) (where m = n^2)
While there are rows or columns that sum to negative remove the row or column that has the lowest sum that is less than zero. Then recompute the sum of each row/column.
The general idea is that as long as there is a row or a column that sums to nevative, removing it will result in a greater overall value. You need to remove them one at a time and recompute because in removing that one row/column you are affecting the sums of the other rows/columns and they may or may not have negative sums any more.
This will produce an optimally maximum result. Runtime is O(mn) or O(n^3)

I cannot really produce an algorithm on top of my head, but to me it 'smells' like dynamic programming, if it serves as a start point.

Big Edit: I honestly don't think there's a way to assess a matrix and determine it is maximized, unless it is completely positive.
Maybe it needs to branch, and fathom all elimination paths. You never no when a costly elimination will enable a number of better eliminations later. We can short circuit if it's found the theoretical maximum, but other than any algorithm would have to be able to step forward and back. I've adapted my original solution to achieve this behaviour with recursion.
Double Secret Edit: It would also make great strides to reduce to complexity if each iteration didn't need to find all negative elements. Considering that they don't change much between calls, it makes more sense to just pass their positions to the next iteration.
Takes a matrix, the list of current negative elements in the matrix, and the theoretical maximum of the initial matrix. Returns the matrix's maximum sum and the list of moves required to get there. In my mind move list contains a list of moves denoting the row/column removed from the result of the previous operation.
Ie: r1,r1
Would translate
-1 1 0 1 1 1
-4 1 -4 5 7 1
1 2 4 ===>
5 7 1
Return if sum of matrix is the theoretical maximum
Find the positions of all negative elements unless an empty set was passed in.
Compute sum of matrix and store it along side an empty move list.
For negative each element:
Calculate the sum of that element's row and column.
clone the matrix and eliminate which ever collection has the minimum sum (row/column) from that clone, note that action as a move list.
clone the list of negative elements and remove any that are effected by the action taken in the previous step.
Recursively call this algorithm providing the cloned matrix, the updated negative element list and the theoretical maximum. Append the moves list returned to the move list for the action that produced the matrix passed to the recursive call.
If the returned value of the recursive call is greater than the stored sum, replace it and store the returned move list.
Return the stored sum and move list.
I'm not sure if it's better or worse than the brute force method, but it handles all the test cases now. Even those where the maximum contains negative values.

This is an optimization problem and can be solved approximately by an iterative algorithm based on simulated annealing:
Notation: C is number of columns.
For J iterations:
Look at each column and compute the absolute benefit of toggling it (turn it off if it's currently on or turn it on if it's currently off). That gives you C values, e.g. -3, 1, 4. A greedy deterministic solution would just pick the last action (toggle the last column to get a benefit of 4) because it locally improves the objective. But that might lock us into a local optimum. Instead, we probabilistically pick one of the three actions, with probabilities proportional to the benefits. To do this, transform them into a probability distribution by putting them through a Sigmoid function and normalizing. (Or use exp() instead of sigmoid()?) So for -3, 1, 4 you get 0.05, 0.73, 0.98 from the sigmoid and 0.03, 0.42, 0.56 after normalizing. Now pick the action according to the probability distribution, e.g. toggle the last column with probability 0.56, toggle the second column with probability 0.42, or toggle the first column with the tiny probability 0.03.
Do the same procedure for the rows, resulting in toggling one of the rows.
Iterate for J iterations until convergence.
We may also, in early iterations, make each of these probability distributions more uniform, so that we don't get locked into bad decisions early on. So we'd raise the unnormalized probabilities to a power 1/T, where T is high in early iterations and is slowly decreased until it approaches 0. For example, 0.05, 0.73, 0.98 from above, raised to 1/10 results in 0.74, 0.97, 1.0, which after normalization is 0.27, 0.36, 0.37 (so it's much more uniform than the original 0.05, 0.73, 0.98).

It's clearly NP-Complete (as outlined above). Given this, if I had to propose the best algorithm I could for the problem:
Try some iterations of quadratic integer programming, formulating the problem as: SUM_ij a_ij x_i y_j, with the x_i and y_j variables constrained to be either 0 or 1. For some matrices I think this will find a solution quickly, for the hardest cases it would be no better than brute force (and not much would be).
In parallel (and using most of the CPU), use a approximate search algorithm to generate increasingly better solutions. Simulating Annealing was suggested in another answer, but having done research on similar combinatorial optimisation problems, my experience is that tabu search would find good solutions faster. This is probably close to optimal in terms of wandering between distinct "potentially better" solutions in the shortest time, if you use the trick of incrementally updating the costs of single changes (see my paper "Graph domination, tabu search and the football pool problem").
Use the best solution so far from the second above to steer the first by avoiding searching possibilities that have lower bounds worse than it.
Obviously this isn't guaranteed to find the maximal solution. But, it generally would when this is feasible, and it would provide a very good locally maximal solution otherwise. If someone had a practical situation requiring such optimisation, this is the solution that I'd think would work best.
Stopping at identifying that a problem is likely to be NP-Complete will not look good in a job interview! (Unless the job is in complexity theory, but even then I wouldn't.) You need to suggest good approaches - that is the point of a question like this. To see what you can come up with under pressure, because the real world often requires tackling such things.

yes, it's NP-complete problem.
It's hard to easily find the best sub-matrix,but we can easily to find some better sub-matrix.
Assume that we give m random points in the matrix as "feeds". then let them to automatically extend by the rules like :
if add one new row or column to the feed-matrix, ensure that the sum will be incrementive.
,then we can compare m sub-matrix to find the best one.

Let's say n = 10.
Brute force (all possible sets of rows x all possible sets of columns) takes
2^10 * 2^10 =~ 1,000,000 nodes.
My first approach was to consider this a tree search, and use
the sum of positive entries is an upper bound for every node in the subtree
as a pruning method. Combined with a greedy algorithm to cheaply generate good initial bounds, this yielded answers in about 80,000 nodes on average.
but there is a better way ! i later realised that
Fix some choice of rows X.
Working out the optimal columns for this set of rows is now trivial (keep a column if its sum of its entries in the rows X is positive, otherwise discard it).
So we can just brute force over all possible choices of rows; this takes 2^10 = 1024 nodes.
Adding the pruning method brought this down to 600 nodes on average.
Keeping 'column-sums' and incrementally updating them when traversing the tree of row-sets should allow the calculations (sum of matrix etc) at each node to be O(n) instead of O(n^2). Giving a total complexity of O(n * 2^n)

For slightly less than optimal solution, I think this is a PTIME, PSPACE complexity issue.
The GREEDY algorithm could run as follows:
Load the matrix into memory and compute row totals. After that run the main loop,
1) Delete the smallest row,
2) Subtract the newly omitted values from the old row totals
--> Break when there are no more negative rows.
Point two is a subtle detail: subtracted two rows/columns has time complexity n.
While re-summing all but two columns has n^2 time complexity!

Take each row and each column and compute the sum. For a 2x2 matrix this will be:
2 1
3 -10
Row(0) = 3
Row(1) = -7
Col(0) = 5
Col(1) = -9
Compose a new matrix
Cost to take row Cost to take column
3 5
-7 -9
Take out whatever you need to, then start again.
You just look for negative values on the new matrix. Those are values that actually substract from the overall matrix value. It terminates when there're no more negative "SUMS" values to take out (therefore all columns and rows SUM something to the final result)
In an nxn matrix that would be O(n^2)Log(n) I think

function pruneMatrix(matrix) {
max = -inf;
bestRowBitField = null;
bestColBitField = null;
for(rowBitField=0; rowBitField<2^matrix.height; rowBitField++) {
for (colBitField=0; colBitField<2^matrix.width; colBitField++) {
sum = calcSum(matrix, rowBitField, colBitField);
if (sum > max) {
max = sum;
bestRowBitField = rowBitField;
bestColBitField = colBitField;
}
}
}
return removeFieldsFromMatrix(bestRowBitField, bestColBitField);
}
function calcSumForCombination(matrix, rowBitField, colBitField) {
sum = 0;
for(i=0; i<matrix.height; i++) {
for(j=0; j<matrix.width; j++) {
if (rowBitField & 1<<i && colBitField & 1<<j) {
sum += matrix[i][j];
}
}
}
return sum;
}

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio