Number of subsets whose XOR contains less than two set bits

Number of subsets whose XOR contains less than two set bits - algorithm

I have an Array A(size <= 10^5) of numbers(<= 10^8), and I need to answer some queries(50000), for L, R, how many subsets for elements in the range [L, R], the XOR of the subset is a number that has 0 or 1 bit set(power of 2). Also, point modifications in the array are being done in between the queries, so can't really do some offline processing or use techniques like square root decomposition etc.
I have an approach where I use DP to calculate for a given range, something on the lines of this:
https://www.geeksforgeeks.org/count-number-of-subsets-having-a-particular-xor-value/
But this is clearly too slow. This feels like a classical segment tree problem, but can't seem to find as to what data points to store at each node, so that I can use the left child and right child to compute the answer for the given range.

Yeah, that DP won't be fast enough.
What will be fast enough is applying some linear algebra over GF(2), the Galois field with two elements. Each number can be interpreted as a bit-vector; adding/subtracting vectors is XOR; scalar multiplication isn't really relevant.
The data you need for each segment is (1) how many numbers are there in the segment (2) a basis for the subspace of numbers generated by numbers in the segment, which will consist of at most 27 numbers because all numbers are less than 2^27. The basis for a one-element segment is just that number if it's nonzero, else the empty set. To find the span of the union of two bases, use Gaussian elimination and discard the zero vectors.
Given the length of an interval and a basis for it, you can count the number of good subsets using the rank-nullity theorem. Basically, for each target number, use your Gaussian elimination routine to test whether the target number belongs to the subspace. If so, there are 2^(length of interval minus size of basis) subsets. If not, the answer is zero.

Related

How to code the maximum set packing algorithm?

Suppose we have a finite set S and a list of subsets of S. Then, the set packing problem asks if some k subsets in the list are pairwise disjoint .
The optimization version of the problem, maximum set packing, asks for the maximum number of pairwise disjoint sets in the list.
http://en.wikipedia.org/wiki/Set_packing
So, Let S = {1,2,3,4,5,6,7,8,9,10}
and `Sa = {1,2,3,4}`
and `Sb = {4,5,6}`
and `Sc = {5,6,7,8}`
and `Sd = {9,10}`
Then the maximum number of pairwise disjoint sets are 3 ( Sa, Sc, Sd )
I could not find any articles about the algorithm involved. Can you shed some light on the same?
My approach:
Sort the sets according to the size. Start from the set of the smallest size. If no element of the next set intersects with the current set, then we unite the set and increase the count of maximum sets. Does this sound good to you? Any better ideas?

As hivert pointed out, this problem is NP-hard, so there's no efficient way to do this. However, if your input is relatively small, you can still pull it off. Exponential doesn't mean impossible, after all. It's just that exponential problems become impractical very quickly, as the input size grows. But for something like 25 sets, you can easily brute force it.
Here's one approach. Let's say you have n subsets, called S0, S1, ..., etc. We can try every combination of subsets, and pick the one with maximum cardinality. There are only 2^25 = 33554432 choices, so this is probably reasonable enough.
An easy way to do this is to notice that any non-negative number strictly below 2^N represents a particular choice of subsets. Look at the binary representation of the number, and choose the sets whose indices correspond to the bits that are on. So if the number is 11, the 0th, 1st and 3rd bits are on, and this corresponds to the combination [S0, S1, S3]. Then you just verify that these three sets are in fact disjoint.
Your procedure is as follows:
Iterate i from 0 to 2^N - 1
For each value of i, use the bits that are on to figure out the corresponding combination of subsets.
If those subsets are pairwise disjoint, update your best answer with this combination (i.e., use this if it is bigger than your current best).
Alternatively, use backtracking to generate your subsets. The two approaches are equivalent, modulo implementation tradeoffs. Backtracking will have some stack overhead, but can cut off entire lines of computation if you check disjointness as you go. For example, if S1 and S2 are not disjoint, then it will never bother with any bigger combinations containing those two, saving some time. The iterative method can't optimize itself in this way, but is fast and efficient because of the bitwise operations and tight loop.
The only nontrivial matter here is how to check if the subsets are pairwise disjoint. There are all sorts of tricks you can pull here as well, depending on the constraints.
A simple approach is to start with an empty set structure (pick whatever you want from the language of your choice) and add elements from each subset one by one. If you ever hit an element that's already in the set, then it occurs in at least two subsets, and you can give up on this combination.
If the original set S has m elements, and m is relatively small, you can map each of them to the range [0, m-1] and use bitmasks for each set. So if m <= 64, you can use a Java long to represent each subset. Turn on all the bits that correspond to the elements in the subset. This allows blazing fast set operation, because of the speed of bitwise operations. Bitwise AND corresponds to set intersection, and bitwise OR is a union. You can check if two subsets are disjoint by seeing if the intersection is empty (i.e., ANDing the two bitmasks gives you 0).
If you don't have so few elements, you can still avoid repeating the set intersections multiple times. You have very few sets, so precompute which ones are disjoint at the start. You can just store a boolean matrix D, such that D[i][j] = true iff i and j are disjoint. Then you just look up all pairs in a combination to verify pairwise disjointness, rather than doing real set operations.

You can solve the set packing problem searching a Maximum independent set. You encode your problem as follows:
for each set you put a vertex
you put an edge between two vertex if they share a common number.
Then you wan't a maximum set of vertex without two having two related vertex. Unfortunately this is a NP-Hard problem. Any know algorithm is exponential.

Find medians in multiple sub ranges of a unordered list

E.g. given a unordered list of N elements, find the medians for sub ranges 0..100, 25..200, 400..1000, 10..500, ...
I don't see any better way than going through each sub range and run the standard median finding algorithms.
A simple example: [5 3 6 2 4]
The median for 0..3 is 5 . (Not 4, since we are asking the median of the first three elements of the original list)

INTEGER ELEMENTS:
If the type of your elements are integers, then the best way is to have a bucket for each number lies in any of your sub-ranges, where each bucket is used for counting the number its associated integer found in your input elements (for example, bucket[100] stores how many 100s are there in your input sequence). Basically you can achieve it in the following steps:
create buckets for each number lies in any of your sub-ranges.
iterate through all elements, for each number n, if we have bucket[n], then bucket[n]++.
compute the medians based on the aggregated values stored in your buckets.
Put it in another way, suppose you have a sub-range [0, 10], and you would like to compute the median. The bucket approach basically computes how many 0s are there in your inputs, and how many 1s are there in your inputs and so on. Suppose there are n numbers lies in range [0, 10], then the median is the n/2th largest element, which can be identified by finding the i such that bucket[0] + bucket[1] ... + bucket[i] greater than or equal to n/2 but bucket[0] + ... + bucket[i - 1] is less than n/2.
The nice thing about this is that even your input elements are stored in multiple machines (i.e., the distributed case), each machine can maintain its own buckets and only the aggregated values are required to pass through the intranet.
You can also use hierarchical-buckets, which involves multiple passes. In each pass, bucket[i] counts the number of elements in your input lies in a specific range (for example, [i * 2^K, (i+1) * 2^K]), and then narrow down the problem space by identifying which bucket will the medium lies after each step, then decrease K by 1 in the next step, and repeat until you can correctly identify the medium.
FLOATING-POINT ELEMENTS
The entire elements can fit into memory:
If your entire elements can fit into memory, first sorting the N element and then finding the medians for each sub ranges is the best option. The linear time heap solution also works well in this case if the number of your sub-ranges is less than logN.
The entire elements cannot fit into memory but stored in a single machine:
Generally, an external sort typically requires three disk-scans. Therefore, if the number of your sub-ranges is greater than or equal to 3, then first sorting the N elements and then finding the medians for each sub ranges by only loading necessary elements from the disk is the best choice. Otherwise, simply performing a scan for each sub-ranges and pick up those elements in the sub-range is better.
The entire elements are stored in multiple machines:
Since finding median is a holistic operator, meaning you cannot derive the final median of the entire input based on the medians of several parts of input, it is a hard problem that one cannot describe its solution in few sentences, but there are researches (see this as an example) have been focused on this problem.

I think that as the number of sub ranges increases you will very quickly find that it is quicker to sort and then retrieve the element numbers you want.
In practice, because there will be highly optimized sort routines you can call.
In theory, and perhaps in practice too, because since you are dealing with integers you need not pay n log n for a sort - see http://en.wikipedia.org/wiki/Integer_sorting.
If your data are in fact floating point and not NaNs then a little bit twiddling will in fact allow you to use integer sort on them - from - http://en.wikipedia.org/wiki/IEEE_754-1985#Comparing_floating-point_numbers - The binary representation has the special property that, excluding NaNs, any two numbers can be compared like sign and magnitude integers (although with modern computer processors this is no longer directly applicable): if the sign bit is different, the negative number precedes the positive number (except that negative zero and positive zero should be considered equal), otherwise, relative order is the same as lexicographical order but inverted for two negative numbers; endianness issues apply.
So you could check for NaNs and other funnies, pretend the floating point numbers are sign + magnitude integers, subtract when negative to correct the ordering for negative numbers, and then treat as normal 2s complement signed integers, sort, and then reverse the process.

My idea:
Sort the list into an array (using any appropriate sorting algorithm)
For each range, find the indices of the start and end of the range using binary search
Find the median by simply adding their indices and dividing by 2 (i.e. median of range [x,y] is arr[(x+y)/2])
Preprocessing time: O(n log n) for a generic sorting algorithm (like quick-sort) or the running time of the chosen sorting routine
Time per query: O(log n)
Dynamic list:
The above assumes that the list is static. If elements can freely be added or removed between queries, a modified Binary Search Tree could work, with each node keeping a count of the number of descendants it has. This will allow the same running time as above with a dynamic list.

The answer is ultimately going to be "in depends". There are a variety of approaches, any one of which will probably be suitable under most of the cases you may encounter. The problem is that each is going to perform differently for different inputs. Where one may perform better for one class of inputs, another will perform better for a different class of inputs.
As an example, the approach of sorting and then performing a binary search on the extremes of your ranges and then directly computing the median will be useful when the number of ranges you have to test is greater than log(N). On the other hand, if the number of ranges is smaller than log(N) it may be better to move elements of a given range to the beginning of the array and use a linear time selection algorithm to find the median.
All of this boils down to profiling to avoid premature optimization. If the approach you implement turns out to not be a bottleneck for your system's performance, figuring out how to improve it isn't going to be a useful exercise relative to streamlining those portions of your program which are bottlenecks.

Algorithm for computing transitive closures on DAGs by harnessing machine words?

Let G be DAG with n vertices and m edges given by adjacency matrix. I need to calculate it's closure in form of a matrix as well. We have a computer that each word is b bits. and I need to find an algorithm that calculate the transitive closure in (n^2+nm/b)
I'm not really sure I understand what bits means and how can I use it
Adding the algorithm for finding transitive closure of dag:
TransitiveForDAG (Graph G)
int T[1...n,1...n] ={0,...,0}
List L <- TopologicalSort(G)
For each v in reverse(L)
T[v,v]<-1
For each u in Adj[v]
for j<-1,...,n do
T[v,j]<-T[v,j] or T[u,j]

You say you don't know what bits mean, so let's start with that.
Bit is the smallest unit of digital information - a 0 or a 1
Word is the unit of data processed by a computer at once. Processors don't take and process individual bits, but small chunks of them. Most today's computer architectures use words of 32 or 64 bits.
Now, how to work with words of binary data? In most programming languages, you'll use a numeric data type to store the data. To manipulate them, most languages provide bitwise operators - bitwise or (|) is one needed here.
So, how to make your algorithm faster? Look at the T matrix. It can only have values of 0 or 1 - a single bit is enough to store that information. You're processing fields of the matrix one by one; every time you process the last line of your algorithm, you only use one bit from the v'th row and one from the u'th row.
As said before, the processor has to read a whole word to read and process each of those bits. That's ineffective; mostly you wouldn't care about such a detail, but here it's in a critical place - the last line is in the innermost cycle and will be executed very many times.
Now, how to be more effective? Change the way you store the data - use a type with the length of a word as the data type for your matrix. Store b values from your original matrix in every value of the new one - this is called packing. Because of the way your algorithm works, you'll want to store them by row - the first value in the i-th row will contain first b values from the i-th row of the original matrix.
Apart from that, you only have to change the innermost cycle of the algorithm - the cycle will iterate over the words instead of individual fields and inside you'll process the whole words at once using bitwise or
T[v,j]<-T[v,j] | T[u,j]
The cycle is what generates the time complexity of the algorithm. You've now managed to iterate b-times less, so the complexity becomes (n^2+nm/b)

For a simple graph, each entry in the adjacency matrix is either 0 or 1 and can be represented by one bit. This allows a compact representation of the matrix by packing b entries into each computer word. The challenge then is to implement matrix multiplication (to compute the closure) using the correct bit manipulation operators.

possible combinations without zeros

So I am looking for a way to get possible combinations of two integers from an array, say I have
v = [0, 1, 2, 0, 4]
I would like at the end, conceptually a matrix like this, C = v^T v where v^T is the Transpose of the vector so you get a matrix with some nonzeros and the entries will be the combinations of two integers. For row 1 for instance,
(0,0) (1,1) (1,2) (0,0) (0,4)
but I only need (1,1) (1,2) also similar reasoning holds for the other rows in my conceptual matrix visualisation. I can do this by two nested loops by checking if they include 0 or not. Question is: are there some algorithms for these kinds of combinatorial tasks that would do that better than nested loops?

There is no way to generate this output "matrix" without a 2D nested loop (or something directly equivalent). So given you'll have the loops anyway, it's trivial to add the conditional checking.
I suppose you could pre-sort the array, and then start your loop counters at the first non-zero value...

Depending on how many zeroes you have, you could use a sparse matrix data structure and that may reduce the problem from O(n^2) to something slightly smaller (depending on how many zeroes you have).
If you ignore the zeroes for a second, N choose 2 combinations is equal to n^2/2-n/2 combinations. Similarly there are n*(n-1) 2-permutations of n. So no matter whether you are doing combinations or permutations, you still need an algorithm that is worst case O(n^2). Note your matrix would possibly be symmetric depending on certain assumptions, but that would not be enough to change it from O(n^2) because it would essentially be O((n/2)^2) which is still O(n^2).

From an interview: Removing rows and columns in an n×n matrix to maximize the sum of remaining values

Given an n×n matrix of real numbers. You are allowed to erase any number (from 0 to n) of rows and any number (from 0 to n) of columns, and after that the sum of the remaining entries is computed. Come up with an algorithm which finds out which rows and columns to erase in order to maximize that sum.

The problem is NP-hard. (So you should not expect a polynomial-time algorithm for solving this problem. There could still be (non-polynomial time) algorithms that are slightly better than brute-force, though.) The idea behind the proof of NP-hardness is that if we could solve this problem, then we could solve the the clique problem in a general graph. (The maximum-clique problem is to find the largest set of pairwise connected vertices in a graph.)
Specifically, given any graph with n vertices, let's form the matrix A with entries a[i][j] as follows:
a[i][j] = 1 for i == j (the diagonal entries)
a[i][j] = 0 if the edge (i,j) is present in the graph (and i≠j)
a[i][j] = -n-1 if the edge (i,j) is not present in the graph.
Now suppose we solve the problem of removing some rows and columns (or equivalently, keeping some rows and columns) so that the sum of the entries in the matrix is maximized. Then the answer gives the maximum clique in the graph:
Claim: In any optimal solution, there is no row i and column j kept for which the edge (i,j) is not present in the graph. Proof: Since a[i][j] = -n-1 and the sum of all the positive entries is at most n, picking (i,j) would lead to a negative sum. (Note that deleting all rows and columns would give a better sum, of 0.)
Claim: In (some) optimal solution, the set of rows and columns kept is the same. This is because starting with any optimal solution, we can simply remove all rows i for which column i has not been kept, and vice-versa. Note that since the only positive entries are the diagonal ones, we do not decrease the sum (and by the previous claim, we do not increase it either).
All of which means that if the graph has a maximum clique of size k, then our matrix problem has a solution with sum k, and vice-versa. Therefore, if we could solve our initial problem in polynomial time, then the clique problem would also be solved in polynomial time. This proves that the initial problem is NP-hard. (Actually, it is easy to see that the decision version of the initial problem — is there a way of removing some rows and columns so that the sum is at least k — is in NP, so the (decision version of the) initial problem is actually NP-complete.)

Well the brute force method goes something like this:
For n rows there are 2n subsets.
For n columns there are 2n subsets.
For an n x n matrix there are 22n subsets.
0 elements is a valid subset but obviously if you have 0 rows or 0 columns the total is 0 so there are really 22n-2+1 subsets but that's no different.
So you can work out each combination by brute force as an O(an) algorithm. Fast. :)
It would be quicker to work out what the maximum possible value is and you do that by adding up all the positive numbers in the grid. If those numbers happen to form a valid sub-matrix (meaning you can create that set by removing rows and/or columns) then there's your answer.
Implicit in this is that if none of the numbers are negative then the complete matrix is, by definition, the answer.
Also, knowing what the highest possible maximum is possibly allows you to shortcut the brute force evaluation since if you get any combination equal to that maximum then that is your answer and you can stop checking.
Also if all the numbers are non-positive, the answer is the maximum value as you can reduce the matrix to a 1 x 1 matrix with that 1 value in it, by definition.
Here's an idea: construct 2n-1 n x m matrices where 1 <= m <= n. Process them one after the other. For each n x m matrix you can calculate:
The highest possible maximum sum (as per above); and
Whether no numbers are positive allowing you to shortcut the answer.
if (1) is below the currently calculate highest maximum sum then you can discard this n x m matrix. If (2) is true then you just need a simple comparison to the current highest maximum sum.
This is generally referred to as a pruning technique.
What's more you can start by saying that the highest number in the n x n matrix is the starting highest maximum sum since obviously it can be a 1 x 1 matrix.
I'm sure you could tweak this into a (slightly more) efficient recursive tree-based search algorithm with the above tests effectively allowing you to eliminate (hopefully many) unnecessary searches.

We can improve on Cletus's generalized brute-force solution by modelling this as a directed graph. The initial matrix is the start node of the graph; its leaves are all the matrices missing one row or column, and so forth. It's a graph rather than a tree, because the node for the matrix without both the first column and row will have two parents - the nodes with just the first column or row missing.
We can optimize our solution by turning the graph into a tree: There's never any point exploring a submatrix with a column or row deleted that comes before the one we deleted to get to the current node, as that submatrix will be arrived at anyway.
This is still a brute-force search, of course - but we've eliminated the duplicate cases where we remove the same rows in different orders.
Here's an example implementation in Python:
def maximize_sum(m):
frontier = [(m, 0, False)]
best = None
best_score = 0
while frontier:
current, startidx, cols_done = frontier.pop()
score = matrix_sum(current)
if score > best_score or not best:
best = current
best_score = score
w, h = matrix_size(current)
if not cols_done:
for x in range(startidx, w):
frontier.append((delete_column(current, x), x, False))
startidx = 0
for y in range(startidx, h):
frontier.append((delete_row(current, y), y, True))
return best_score, best
And here's the output on 280Z28's example matrix:
>>> m = ((1, 1, 3), (1, -89, 101), (1, 102, -99))
>>> maximize_sum(m)
(106, [(1, 3), (1, 101)])

Since nobody asked for an efficient algorithm, use brute force: generate every possible matrix that can be created by removing rows and/or columns from the original matrix, choose the best one. A slightly more efficent version, which most likely can be proved to still be correct, is to generate only those variants where the removed rows and columns contain at least one negative value.

To try it in a simple way:
We need the valid subset of the set of entries {A00, A01, A02, ..., A0n, A10, ...,Ann} which max. sum.
First compute all subsets (the power set).
A valid subset is a member of the power set that for each two contained entries Aij and A(i+x)(j+y), contains also the elements A(i+x)j and Ai(j+y) (which are the remaining corners of the rectangle spanned by Aij and A(i+x)(j+y)).
Aij ...
. .
. .
... A(i+x)(j+y)
By that you can eliminate the invalid ones from the power set and find the one with the biggest sum in the remaining.
I'm sure it can be improved by improving an algorithm for power set generation in order to generate only valid subsets and by that avoiding step 2 (adjusting the power set).

I think there are some angles of attack that might improve upon brute force.
memoization, since there are many distinct sequences of edits that will arrive at the same submatrix.
dynamic programming. Because the search space of matrices is highly redundant, my intuition is that there would be a DP formulation that can save a lot of repeated work
I think there's a heuristic approach, but I can't quite nail it down:
if there's one negative number, you can either take the matrix as it is, remove the column of the negative number, or remove its row; I don't think any other "moves" result in a higher sum. For two negative numbers, your options are: remove neither, remove one, remove the other, or remove both (where the act of removal is either by axing the row or the column).
Now suppose the matrix has only one positive number and the rest are all <=0. You clearly want to remove everything but the positive entry. For a matrix with only 2 positive entries and the rest <= 0, the options are: do nothing, whittle down to one, whittle down to the other, or whittle down to both (resulting in a 1x2, 2x1, or 2x2 matrix).
In general this last option falls apart (imagine a matrix with 50 positives & 50 negatives), but depending on your data (few negatives or few positives) it could provide a shortcut.

Create an n-by-1 vector RowSums, and an n-by-1 vector ColumnSums. Initialize them to the row and column sums of the original matrix. O(n²)
If any row or column has a negative sum, remove edit: the one with the minimum such and update the sums in the other direction to reflect their new values. O(n)
Stop when no row or column has a sum less than zero.
This is an iterative variation improving on another answer. It operates in O(n²) time, but fails for some cases mentioned in other answers, which is the complexity limit for this problem (there are n² entries in the matrix, and to even find the minimum you have to examine each cell once).
Edit: The following matrix has no negative rows or columns, but is also not maximized, and my algorithm doesn't catch it.
1 1 3 goal 1 3
1 -89 101 ===> 1 101
1 102 -99
The following matrix does have negative rows and columns, but my algorithm selects the wrong ones for removal.
-5 1 -5 goal 1
1 1 1 ===> 1
-10 2 -10 2
mine
===> 1 1 1

Compute the sum of each row and column. This can be done in O(m) (where m = n^2)
While there are rows or columns that sum to negative remove the row or column that has the lowest sum that is less than zero. Then recompute the sum of each row/column.
The general idea is that as long as there is a row or a column that sums to nevative, removing it will result in a greater overall value. You need to remove them one at a time and recompute because in removing that one row/column you are affecting the sums of the other rows/columns and they may or may not have negative sums any more.
This will produce an optimally maximum result. Runtime is O(mn) or O(n^3)

I cannot really produce an algorithm on top of my head, but to me it 'smells' like dynamic programming, if it serves as a start point.

Big Edit: I honestly don't think there's a way to assess a matrix and determine it is maximized, unless it is completely positive.
Maybe it needs to branch, and fathom all elimination paths. You never no when a costly elimination will enable a number of better eliminations later. We can short circuit if it's found the theoretical maximum, but other than any algorithm would have to be able to step forward and back. I've adapted my original solution to achieve this behaviour with recursion.
Double Secret Edit: It would also make great strides to reduce to complexity if each iteration didn't need to find all negative elements. Considering that they don't change much between calls, it makes more sense to just pass their positions to the next iteration.
Takes a matrix, the list of current negative elements in the matrix, and the theoretical maximum of the initial matrix. Returns the matrix's maximum sum and the list of moves required to get there. In my mind move list contains a list of moves denoting the row/column removed from the result of the previous operation.
Ie: r1,r1
Would translate
-1 1 0 1 1 1
-4 1 -4 5 7 1
1 2 4 ===>
5 7 1
Return if sum of matrix is the theoretical maximum
Find the positions of all negative elements unless an empty set was passed in.
Compute sum of matrix and store it along side an empty move list.
For negative each element:
Calculate the sum of that element's row and column.
clone the matrix and eliminate which ever collection has the minimum sum (row/column) from that clone, note that action as a move list.
clone the list of negative elements and remove any that are effected by the action taken in the previous step.
Recursively call this algorithm providing the cloned matrix, the updated negative element list and the theoretical maximum. Append the moves list returned to the move list for the action that produced the matrix passed to the recursive call.
If the returned value of the recursive call is greater than the stored sum, replace it and store the returned move list.
Return the stored sum and move list.
I'm not sure if it's better or worse than the brute force method, but it handles all the test cases now. Even those where the maximum contains negative values.

This is an optimization problem and can be solved approximately by an iterative algorithm based on simulated annealing:
Notation: C is number of columns.
For J iterations:
Look at each column and compute the absolute benefit of toggling it (turn it off if it's currently on or turn it on if it's currently off). That gives you C values, e.g. -3, 1, 4. A greedy deterministic solution would just pick the last action (toggle the last column to get a benefit of 4) because it locally improves the objective. But that might lock us into a local optimum. Instead, we probabilistically pick one of the three actions, with probabilities proportional to the benefits. To do this, transform them into a probability distribution by putting them through a Sigmoid function and normalizing. (Or use exp() instead of sigmoid()?) So for -3, 1, 4 you get 0.05, 0.73, 0.98 from the sigmoid and 0.03, 0.42, 0.56 after normalizing. Now pick the action according to the probability distribution, e.g. toggle the last column with probability 0.56, toggle the second column with probability 0.42, or toggle the first column with the tiny probability 0.03.
Do the same procedure for the rows, resulting in toggling one of the rows.
Iterate for J iterations until convergence.
We may also, in early iterations, make each of these probability distributions more uniform, so that we don't get locked into bad decisions early on. So we'd raise the unnormalized probabilities to a power 1/T, where T is high in early iterations and is slowly decreased until it approaches 0. For example, 0.05, 0.73, 0.98 from above, raised to 1/10 results in 0.74, 0.97, 1.0, which after normalization is 0.27, 0.36, 0.37 (so it's much more uniform than the original 0.05, 0.73, 0.98).

It's clearly NP-Complete (as outlined above). Given this, if I had to propose the best algorithm I could for the problem:
Try some iterations of quadratic integer programming, formulating the problem as: SUM_ij a_ij x_i y_j, with the x_i and y_j variables constrained to be either 0 or 1. For some matrices I think this will find a solution quickly, for the hardest cases it would be no better than brute force (and not much would be).
In parallel (and using most of the CPU), use a approximate search algorithm to generate increasingly better solutions. Simulating Annealing was suggested in another answer, but having done research on similar combinatorial optimisation problems, my experience is that tabu search would find good solutions faster. This is probably close to optimal in terms of wandering between distinct "potentially better" solutions in the shortest time, if you use the trick of incrementally updating the costs of single changes (see my paper "Graph domination, tabu search and the football pool problem").
Use the best solution so far from the second above to steer the first by avoiding searching possibilities that have lower bounds worse than it.
Obviously this isn't guaranteed to find the maximal solution. But, it generally would when this is feasible, and it would provide a very good locally maximal solution otherwise. If someone had a practical situation requiring such optimisation, this is the solution that I'd think would work best.
Stopping at identifying that a problem is likely to be NP-Complete will not look good in a job interview! (Unless the job is in complexity theory, but even then I wouldn't.) You need to suggest good approaches - that is the point of a question like this. To see what you can come up with under pressure, because the real world often requires tackling such things.

yes, it's NP-complete problem.
It's hard to easily find the best sub-matrix,but we can easily to find some better sub-matrix.
Assume that we give m random points in the matrix as "feeds". then let them to automatically extend by the rules like :
if add one new row or column to the feed-matrix, ensure that the sum will be incrementive.
,then we can compare m sub-matrix to find the best one.

Let's say n = 10.
Brute force (all possible sets of rows x all possible sets of columns) takes
2^10 * 2^10 =~ 1,000,000 nodes.
My first approach was to consider this a tree search, and use
the sum of positive entries is an upper bound for every node in the subtree
as a pruning method. Combined with a greedy algorithm to cheaply generate good initial bounds, this yielded answers in about 80,000 nodes on average.
but there is a better way ! i later realised that
Fix some choice of rows X.
Working out the optimal columns for this set of rows is now trivial (keep a column if its sum of its entries in the rows X is positive, otherwise discard it).
So we can just brute force over all possible choices of rows; this takes 2^10 = 1024 nodes.
Adding the pruning method brought this down to 600 nodes on average.
Keeping 'column-sums' and incrementally updating them when traversing the tree of row-sets should allow the calculations (sum of matrix etc) at each node to be O(n) instead of O(n^2). Giving a total complexity of O(n * 2^n)

For slightly less than optimal solution, I think this is a PTIME, PSPACE complexity issue.
The GREEDY algorithm could run as follows:
Load the matrix into memory and compute row totals. After that run the main loop,
1) Delete the smallest row,
2) Subtract the newly omitted values from the old row totals
--> Break when there are no more negative rows.
Point two is a subtle detail: subtracted two rows/columns has time complexity n.
While re-summing all but two columns has n^2 time complexity!

Take each row and each column and compute the sum. For a 2x2 matrix this will be:
2 1
3 -10
Row(0) = 3
Row(1) = -7
Col(0) = 5
Col(1) = -9
Compose a new matrix
Cost to take row Cost to take column
3 5
-7 -9
Take out whatever you need to, then start again.
You just look for negative values on the new matrix. Those are values that actually substract from the overall matrix value. It terminates when there're no more negative "SUMS" values to take out (therefore all columns and rows SUM something to the final result)
In an nxn matrix that would be O(n^2)Log(n) I think

function pruneMatrix(matrix) {
max = -inf;
bestRowBitField = null;
bestColBitField = null;
for(rowBitField=0; rowBitField<2^matrix.height; rowBitField++) {
for (colBitField=0; colBitField<2^matrix.width; colBitField++) {
sum = calcSum(matrix, rowBitField, colBitField);
if (sum > max) {
max = sum;
bestRowBitField = rowBitField;
bestColBitField = colBitField;
}
}
}
return removeFieldsFromMatrix(bestRowBitField, bestColBitField);
}
function calcSumForCombination(matrix, rowBitField, colBitField) {
sum = 0;
for(i=0; i<matrix.height; i++) {
for(j=0; j<matrix.width; j++) {
if (rowBitField & 1<<i && colBitField & 1<<j) {
sum += matrix[i][j];
}
}
}
return sum;
}

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio