duplicacy problems while creating a sudoku puzzle

duplicacy problems while creating a sudoku puzzle - algorithm

I am trying to create my own normal 9x9 sudoku puzzle.
I divided the problem into two parts -
creating a fully filled sudoku, and
removing unnecessary numbers from
the grid
Right now, I am stuck with the first part.
This is the algorithm I use in brief:
a) first of all I choose a number (say 1), generate a random cell position, and place it there if
the cell is not already occupied, and
if the row does not already have the number, and
if the column does not already have the number, and
if the 3x3 box does not already have the number
b) now I check for a situation in which in a row, or a column or a box, only one place is empty and I fill that
c) I check that if there is a number that in not present in a box but is present in the boxes in the same row and the same column (i am talking about 3x3 boxes here), the number's place is fixed and I fill it.
d) I repeat the above steps until every number appears nine times on the grid.
The problem I am facing is that, more than often I am getting an intermediate situation like this:
0 1 0 | 0 0 3 | 0[4/2]0
0 [2] 0 | 0 [4] 1 | 3 0 0
3 0 [4]|[2] 0 0 | 0 0 1
---------+---------+---------
2 0 3 | 0 5 4 | 0 1 0
0 0 1 | 3 0 2 |[4] 0 0
0 4 0 | 0 1 0 |[2] 3 0
---------+---------+---------
1 0 2 | 0 3 0 | 0 0 [4]
4 3 0 | 1 0 0 | 0 0 [2]
5 0 0 | 4 2 0 | 1 0 3
See the place with [4/2] written? that is the place of 2 as well as 4 because of the boxes marked [].
What can I do to avoid getting in this situation (because this situation is a deadlock - I cannot move further)

There's another way to generate sudoku puzzles: Start with a known good grid - any one will do - then randomly 'shuffle' it by applying operations that don't destroy the invariants. Valid operations include:
Swapping rows within a block
Swapping columns within a block
Swapping entire rows of blocks (eg, first 3, middle 3, last 3 rows)
Swapping entire columns of blocks
Swapping all instances of one number with another
Reflecting the board
Rotating the board
With these operations, you can generate a very large range of possible boards. You need to be careful about how you apply the operations, however - much like the naive shuffle, it's easy to write an algorithm that makes some boards more likely than others. A technique similar to the Knuth shuffle may help here.
Edit: It has been pointed out in the comments that these operations alone aren't sufficient to create every possible grid.

You will always get that situation. You need a recursive backtracking search to solve it.
Basically, the only way to determine whether a particular digit really is valid for a cell is to continue the search and see what happens.
Backtracking searches are normally done using recursive calls. Each call will iterate through the (possibly) still valid options for one cell, recursing to evaluate all the options for the next cell. When you can't continue, backtracking means returning from the current call - erasing any digit you tested out for that cell first, of course.
When you find a valid solution, either save it and backtrack to continue (ie find alternatives), or break out of all the recursive calls to finish. Success in a recursive backtracking search is a special case where throwing an exception for success is IMO a good idea - it is exceptional for a particular call to succeed, and the code will be clearer.
If generating a random board, iterate the options in a particular recursive call (for a particular cell) in random order.
The same basic algorithm also applies for a partly completed board (ie to solve existing sodoku) - when evaluating a cell that already has a digit, well, that's the only option for that cell so recurse for the next cell.
Here's the backtracking search from a solver I wrote once - a lot is abstracted out, but hopefully that just makes the principle clearer...
size_t Board::Rec_Search (size_t p_Pos)
{
size_t l_Count = 0;
if (p_Pos == 81) // Found a solution
{
l_Count++;
std::cout << "------------------------" << std::endl;
Draw ();
std::cout << "------------------------" << std::endl;
}
else
{
if (m_Board [p_Pos] == 0) // Need to search here
{
size_t l_Valid_Set = Valid_Set (p_Pos);
if (l_Valid_Set != 0) // Can only continue if there are possible digits
{
size_t l_Bit = 1; // Scan position for valid set
for (size_t i = 1; i <= 9; i++)
{
if (l_Valid_Set & l_Bit)
{
Set_Digit (p_Pos, i);
l_Count += Rec_Search (p_Pos + 1);
}
l_Bit <<= 1;
}
Clr_Digit (p_Pos); // Ensure cleared properly for backtracking
}
}
else // Already filled in - skip
{
l_Count += Rec_Search (p_Pos + 1);
}
}
return l_Count;
}

If you've reached a contradictory state where a cell if both 2 and 4 some of your other 2s and 4s must be placed wrongly. You need to rollback and try some different solutions.
Sounds like you might have an algorithm problem? Some good stuff here.

Related

Optimal choice algorithm

i have an appointment for university which is due today and i start getting nervous. We recently discussed dynamic programming for algorithm optimization and now we shall implement an algorithm ourself which uses dynamic programming.
Task
So we have a simple game for which we shall write an algorithm to find the best possible strategy to get the best possible score (assuming both players play optimized).
We have a row of numbers like 4 7 2 3 (note that according to the task description it is not asured that it always is an equal count of numbers). Now each player turnwise takes a number from the back or the front. When the last number is picked the numbers are summed up for each player and the resulting scores for each player are substracted from each other. The result is then the score for player 1. So an optimal order for the above numbers would be
P1: 3 -> p2: 4 -> p1: 7 -> p2: 2
So p1 would have 3, 7 and p2 would have 4, 2 which results in a final score of (3 + 7) - (4 + 2) = 4 for player 1.
In the first task we should simply implement "an easy recursive way of solving this" where i just used a minimax algorithm which seemed to be fine for the automated test. In the second task however i am stuck since we shall now work with dynamic programming techniques. The only hint i found was that in the task itself a matrix is mentioned.
What i know so far
We had an example of a word converting problem where such a matrix was used it was called Edit distance of two words which means how many changes (Insertions, Deletions, Substitutions) of letters does it take to change one word into another. There the two words where ordered as a table or matrix and for each combination of the word the distance would be calculated.
Example:
W H A T
| D | I
v v
W A N T
editing distance would be 2. And you had a table where each editing distance for each substring was displayed like this:
"" W H A T
1 2 3 4
W 1 0 1 2 3
A 2 1 1 2 3
N 3 2 2 2 3
T 4 3 3 3 2
So for example from WHA to WAN would take 2 edits: insert N and delete H, from WH to WAN would also take 2 edits: substitude H->A and insert N and so on. These values where calculated with an "OPT" function which i think stands for optimization.
I also leanred about bottom-up and top-down recursive schemes but im not quite sure how to attach that to my problem.
What i thought about
As a reminder i use the numbers 4 7 2 3.
i learned from the above that i should try to create a table where each possible result is displayed (like minimax just that it will be saved before). I then created a simple table where i tried to include the possible draws which can be made like this (which i think is my OPT function):
4 7 2 3
------------------
a. 4 | 0 -3 2 1
|
b. 7 | 3 0 5 4
|
c. 2 | -2 -5 0 -1
|
d. 3 | -1 -4 1 0
the left column marks player 1 draws, the upper row marks player 2 draws and each number then stands for numberP1 - numberP2. From this table i can at least read the above mentioned optimal strategy of 3 -> 4 -> 7 -> 2 (-1 + 5) so im sure that the table should contain all possible results, but im not quite sure now how to draw the results from it. I had the idea to start iterating over the rows and pick the one with the highest number in it and mark that as the pick from p1 (but that would be greedy anyways). p2 would then search this row for the lowest number and pick that specific entry which would then be the turn.
Example:
p1 picks row a. 7 | 3 0 5 4 since 5 is the highest value in the table. P2 now picks the 3 from that row because it is the lowest (the 0 is an invalid draw since it is the same number and you cant pick that twice) so the first turn would be 7 -> 4 but then i noticed that this draw is not possible since the 7 is not accessible from the start. So for each turn you have only 4 possibilities: the outer numbers of the table and the ones which are directly after/before them since these would be accessable after drawing. So for the first turn i only have rows a. or d. and from that p1 could pick:
4 which leaves p2 with 7 or 3. Or p1 takes 3 which leaves p2 with 4 or 2
But i dont really know how to draw a conclusion out of that and im really stuck.
So i would really like to know if im on the right way with that or if im overthinking this pretty much. Is this the right way to solve this?

The first thing you should try to write down, when starting a dynamic programming algorithm, is a recurrence relation.
Let's first simplify a very little the problem. We will consider that the number of cards is even, and that we want to design an optimal strategy for the first player to play. Once we have managed to solve this version of the problem, the others (odd number of cards, optimize strategy for second player) follows trivially.
So, first, a recurrence relation. Let X(i, j) be the best possible score that player 1 can expect (when player 2 plays optimally as well), when the cards remaining are from the i^th to the j^th ones. Then, the best score that player 1 can expect when playing the game will be represented by X(1, n).
We have:
X(i, j) = max(Arr[i] + X(i+1, j), X(i, j-1) + Arr[j]) if j-i % 2 == 1, meaning that the best score that player one can expect is the best between taking the card on the left, and taking the card on the right.
In the other case, the other player is playing, so he'll try to minimize:
X(i, j) = min(Arr[i] + X(i+1, j), X(i, j-1) + Arr[j]) if j-i % 2 == 0.
The terminal case is trivial: X(i, i) = Arr[i], meaning that when there is only one card, we just pick it, and that's all.
Now the algorithm without dynamic programming, here we only write the recurrence relation as a recursive algorithm:
function get_value(Arr, i, j) {
if i == j {
return Arr[i]
} else if j - i % 2 == 0 {
return max(
Arr[i] + get_value(i+1, j),
get_value(i, j-1) + Arr[j]
)
} else {
return min(
Arr[i] + get_value(i+1, j),
get_value(i, j-1) + Arr[j]
)
}
}
The problem with this function is that for some given i, j, there will be many redundant calculations of X(i, j). The essence of dynamic programming is to store intermediate results in order to prevent redundant calculations.
Algo with dynamic programming (X is initialized with + inf everywhere.
function get_value(Arr, X, i, j) {
if X[i][j] != +inf {
return X[i][j]
} else if i == j {
result = Arr[i]
} else if j - i % 2 == 0 {
result = max(
Arr[i] + get_value(i+1, j),
get_value(i, j-1) + Arr[j]
)
} else {
result = min(
Arr[i] + get_value(i+1, j),
get_value(i, j-1) + Arr[j]
)
}
X[i][j] = result
return result
}
As you can see the only difference with the algorithm above is that we now use a 2D array X to store intermediate results. The consequence on time complexity is huge, since the first algorithm runs in O(2^n), while the second runs in O(n²).

Dynamic programming problems can generally be solved in 2 ways, top down and bottom up.
Bottom up requires building a data structure from the simplest to the most complex case. This is harder to write, but offers the option of throwing away parts of the data that you know you won't need again. Top down requires writing a recursive function, and then memoizing. So bottom up can be more efficient, top down is usually easier to write.
I will show both. The naive approach can be:
def best_game(numbers):
if 0 == len(numbers):
return 0
else:
score_l = numbers[0] - best_game(numbers[1:])
score_r = numbers[-1] - best_game(numbers[0:-1])
return max(score_l, score_r)
But we're passing a lot of redundant data. So let's reorganize it slightly.
def best_game(numbers):
def _best_game(i, j):
if j <= i:
return 0
else:
score_l = numbers[i] - _best_game(i+1, j)
score_r = numbers[j-1] - _best_game(i, j-1)
return max(score_l, score_r)
return _best_game(0, len(numbers))
And now we can add a caching layer to memoize it:
def best_game(numbers):
seen = {}
def _best_game(i, j):
if j <= i:
return 0
elif (i, j) not in seen:
score_l = numbers[i] - _best_game(i+1, j)
score_r = numbers[j-1] - _best_game(i, j-1)
seen[(i, j)] = max(score_l, score_r)
return seen[(i, j)]
return _best_game(0, len(numbers))
This approach will be memory and time O(n^2).
Now bottom up.
def best_game(numbers):
# We start with scores for each 0 length game
# before, after, and between every pair of numbers.
# There are len(numbers)+1 of these, and all scores
# are 0.
scores = [0] * (len(numbers) + 1)
for i in range(len(numbers)):
# We will compute scores for all games of length i+1.
new_scores = []
for j in range(len(numbers) - i):
score_l = numbers[j] - scores[j+1]
score_r = numbers[j+i] - scores[j]
new_scores.append(max(score_l, score_r))
# And now we replace scores by new_scores.
scores = new_scores
return scores[0]
This is again O(n^2) time but only O(n) space. Because after I compute the games of length 1 I can throw away the games of length 0. Of length 2, I can throw away the games of length 1. And so on.

Develop an algorithm for "The Tumbler Problem"

Problem Statement: Several tumblers are placed in a line on a table. Some tumblers are upside down, some are the right way up. It is required to turn all the tumblers the right way up. However, the tumblers may not be turned individually; an allowed move is to turn any two tumbler simultaneously. From which initial states of the tumblers is it possible to turn all the tumblers the right way up?
I need to understand the problem and develop an algorithm for this problem

Since the tumblers start either upside-down or rightside-up, think of it in binary - each tumbler is a bit: correct orientation = 0, incorrect = 1. Thus, when you have finished the problem, the sum will be 0 (all 0's).
Also note that there are 3 possible moves if you have to do 2 at a time:
flip a 1 into a 0 and a 0 into a 1 (net change is -1 + 1 = 0)
flip two 0's into 1's (net change is 1 + 1 = 2)
flip two 1's into 0's (net change is -1 - 1 = -2)
Since you can only change the sum by +2 and -2, then the starting sum must be even - an even number of incorrectly aligned tumblers need to exist at the start.
Since there are an even number of 1's, then your algorithm should always turn two 1's into 0's every move. There should never be a need to do any other move on a setup that started solvable.
Your algorithm should represent the tumblers as a list of bits like tumblers = [0,1,1,0].
First, check solvability by summing the list and checking if even:
solvable = (sum(tumblers)%2==0).
Only if it's solvable, set up a loop to solve:
if(solvable)
while(sum(tumblers)!=0)
found = 0
indexA = null
indexB = null
foreach(tumblers as index=>tumbler)
if(found==2)
break //exit foreach loop since you found two 1's
endif
if(tumbler==1)
found++
if(indexA==null)
indexA = index
else
indexB = index
endif
endif
endforeach
tumblers[indexA] = 0
tumblers[indexB] = 0
endwhile
endif

Matlab: Speed up loop applied to each of 820,000 elements

I have a set of rainfall data, with a value every 15 minutes over many years, giving 820,000 rows.
The aim (eventually) of my code is to create columns which categorise the data which can then be used to extract relevant chunks of data for further analysis.
I am a Matlab novice and would appreciate some help!
The first steps I have got working sufficiently fast. However, some steps are very slow.
I have tried pre-allocating arrays, and using the lowest intX (8 or 16 depending on situation) possible, but other steps are so slow they don't complete.
The slow ones are for loops, but I don't know if they can be vectorised/split into chunks/anything else to speed them up.
I have a variable "rain" which contains a value for every time step/row.
I have created a variable called "state" of 0 if no rain, and 1 if there is rain.
Also a variable called "begin" which has 1 if it is the first row of a storm, and 0 if not.
The first slow loop is to create a "spell" variable - to give each rain storm a number.
% Generate blank column for spell of size (rain) - preallocate
spell = zeros(size(st),1,'int16');
% Start row for analysis
x=1;
% Populate "spell" variable with a storm number in each row of rain, for the storm number it belongs to (storm number calculated by adding up the number of "begin" values up to that point
for i=1:size(state)
if(state(x)==1)
spell(x) = sum(begin(1:x));
end
x=x+1;
end
The next stage is about length of each storm. The first steps are fast enough.
% List of storm numbers
spellnum = unique(spell);
% Length of each spell
spelllength = histc(spell,spellnum);
The last step below (the for loop) is too slow and just crashes.
% Generate blank column for length
length = zeros(size(state),1,'int16');
% Starting row
x = 1;
% For loop to output the total length of the storm for each row of rain within that storm
for i=1:size(state)
for j=1:size(state)
position = find(spell==x);
for k=1:size(state)
length(position) = spelllength(x+1);
end
end
x=x+1;
end
Is it possible to make this more efficient?
Apologies if examples already exist - I'm not sure what the process would be called!
Many thanks in advance.

Mem. allocation/reallocation tips:
try to create the results directly from expression (eventually trimming another, more general result);
if 1. is not possible, try to pre-allocate whenever possible (when you have an upper limit for the result);
if 2. is not possible try to grow cell-arrays rather than massive matrices (because a matrix requires a contiguous memory area)
Type-choice tips:
try to use always double in intermediate results, because is the basic numeric data type in MATLAB; avoiding conversions back and forth;
use other types for intermediate results only if there's a memory constraint that can be alleviated by using a smaller-size type.
Linearisation tips:
fastest linearisation uses matrix-wise or element-wise basic algebraic operations combined with logical indexing.
loops are not that bad starting with MATLAB R2008;
the worst-performing element-wise processing functions are arrayfun, cellfun and structfun with anonymous functions, because anon functions evaluate the slowest;
try not to calculate the same things twice, even if this gives you better linearisation.
First block:
% Just calculate the entire cumulative sum over begin, then
% trim the result. Check if the cumsum doesn't overflow.
spell = cumsum(begin);
spell(state==0) = 0;
Second block:
% The same, not sure how could you speed this up; changed
% the name of variables to my taste, though.
spell_num = unique(spell);
spell_length = histc(spell,spell_num);
Third block:
% Fix the following issues:
% - the most-inner "for" does not make sense because it rewrites
% several times the same thing;
% - the same looping variable "i" is re-used in three nested loops,
% - thename of the standard function "length" is obscured by declaring
% a variable named "length".
for x = 1:numel(spell_num)
storm_selector = (spell==spell_num(x));
storm_length(storm_selector) = spell_length(x+1);
end;

The combination of code I have ended up using is a mixture from #CST_Link and #Sifu. Thank you very much for your help! I don't think Stackoverflow lets me accept two answers, so for clarity by putting it all together, here is the code which everyone's helped me create!
The only slow part is the for loop in block three, but this still runs in a few minutes, which is good enough for me, and infinitely better than my attempt.
First block:
%% Spell
%spell is cumulative sum of begin
spell = cumsum(begin);
%% start row
x=1;
%% Replace all rows of spell with no rain with 0
spell(state==0)=0
Second block (unchanged except better variable names):
%% Spell number = all values of spell
spell_num = unique(spell);
%% Spell length = how many of each value of spell
spell_length = histc(spell,spell_num);
Third block:
%% Generate blank column for spell of size (state)
spell_length2 = zeros(length(state),1);
%%
for x=1:length(state)
position = find(spell==x);
spell_length2(position) = spell_length(x+1);
end

for the first part if i am following what you are doing
i created some data matching your description for testing.
please tell me if i missed something
state=[ 1 0 0 0 0 1 1 1 1 1 0 1 0 0 1 0 1 1 1 1 0];
begin=[ 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0];
spell = zeros(length(state),1,'int16');
%Start row for analysis
x=1;
% Populate "spell" variable with a storm number in each row of rain, for the storm number it belongs to (storm number calculated by adding up the number of "begin" values up to that point
for i=1:length(state)
if(state(x)==1)
spell(x) = sum(begin(1:x));
end
x=x+1;
end
% can be accomplished by simply using cumsum ( no need for extra variables if you are short in memory)
spell2=cumsum(begin);
spell3=spell2.*(state==1);
and the output for both spell and spell3 as shown
[spell.'; spell3]
0 0 0 0 0 1 1 1 1 1 0 2 0 0 2 0 3 3 3 3 0
0 0 0 0 0 1 1 1 1 1 0 2 0 0 2 0 3 3 3 3 0

Why don't you do that instead?
% For loop to output the total length of the storm for each row of rain within that storm
for x=1:size(state)
position = find(spell==x);
length(position) = spelllength(x+1);
end
I replaced the i iterator for x, that removes 2 lines and some computation.
I then proceeded to removed the two nested loops as they were litteraly useless (each loop would output the same thing)
That's already a good start..

How to sort an integer array on lexicological order using only adjacent swaps for a given max # of swaps(m)

I was asked that one during a phone interview of course, the other questions where fine, but that one I'm still not sure of the best answer.
At first i thought it smelled of a radix sort but since you can't only use adjacent swaps of course not.
So I think it's more of a bubble sort type algo, which is what I tried to do but the "max number of swaps" bit makes it very tricky (along with he lexicological part but i guess that's just a comparaison side issue)
I guess my algo would be something like (of course now i have better ideas than during the interview !)
int index = 0;
while(swapsLeft>0 && index < arrays.length)
{
int smallestIndex = index;
for(int i=index; i < index + swapsLeft)
{
// of course < is not correct, we need to compare as string or "by radix" or something
if(array[i]) < array[smallestIndex]
smallestIndex = i;
}
// if found a smaller item within swap range then swap it to the front
for(int i = smallestIndex; i > index; i--)
{
temp = array[smallestIndex];
array[smallestIndex] = array[index];
array[index] = temp
swapsLeft--;
}
// continue for next item in array
index ++; // edit:could probably optimize to index = index + 1 + (smallestIndex - index) ?
}
Does that seem about right ?
Who as a better solution, I'm curious as to an efficient / proper way to do this.

I am actually working on writing this exact code for my Algorithms class in Java for my Software Engineering Bachelors degree. So I will help you solve this by explaining the problem, and the steps to solve it. You are going to need at least 2 methods to do this more than once.
First you take your first value, just to make this easy lets keep it small and simple.
1 2 3 4
You should be using an array for sorting. To find the next number lexologically, you start out on the far right, move to the left, and stop when you find your first decrease. You have to replace that smaller value with the next largest value on the right. So for our example we would be replacing 3 with 4. So our next number is:
1 2 4 3
That was pretty simple right? Don't worry it gets much harder. Let's now try to get the next number using:
1 4 3 2
Ok so we start out on the far right and move left till our first smaller number. 2 is smaller than 3 is smaller than 4 is larger than 1. Ok so we have our first decrease at 1. So now we need to move back to the right till we hit the last number that is larger than 1. 4 is larger than 1, 3 is larger than 1, and 2 is larger than 1. Ok with 2 being the last number that means that 2 need to replace 1. But what about the rest of the numbers, well they are already in order, they are just backwards of what we need. So we need to flip the order and we come up with:
2 1 3 4
So you need a method that does that sorting, and another method that calls that method in a loop until you have done the correct number of parameters.

Can we compute this in less than O(n*n) ...( nlogn or n)

This is a question asked to me by a very very famous MNC. The question is as follows ...
Input an 2D N*N array of 0's and 1's. If A(i,j) = 1, then all the values corresponding to the ith row and the jth column are going to be 1. If there is a 1 already, it remains as a 1.
As an example , if we have the array
1 0 0 0 0
0 1 1 0 0
0 0 0 0 0
1 0 0 1 0
0 0 0 0 0
we should get the output as
1 1 1 1 1
1 1 1 1 1
1 1 1 1 0
1 1 1 1 1
1 1 1 1 0
The input matrix is sparsely populated.
Is this possible in less than O(N^2)?
No additional space is provided was another condition. I would like to know if there's a way to achieve the complexity using a space <= O(N).
P.S : I don't need answers that give me a complexity of O(N*N). This is not a homework problem. I have tried much and couldn't get a proper solution and thought I could get some ideas here.Leave the printing aside for the complexity
My rough idea was to may be dynamically eliminate the number of elements traversed restricting them to around 2N or so. But I couldn't get a proper idea.

In the worst case, you may need to toggle N * N - N bits from 0 to 1 to generate the output. It would seem you're pretty well stuck with O(N*N).

I would imagine that you can optimize it for the best case, but I'm tempted to say that your worst case is still O(N*N): Your worst case will be an array of all 0s, and you will have to examine every single element.
The optimization would involve skipping a row or column as soon as you found a "1" (I can provide details, but you said you don't care about O(N*N)", but unless you have metadata to indicate that an entire row/column is empty, or unless you have a SIMD-style way to check multiple fields at once (say, if every row is aligned by 4, and you can read 32 bits worth data, or if your data is in form of a bitmask), you will always have to deal with the problem of an all-zero array.

Clearly, nor the output matrix nor its negated version has to be sparse (take a matrix with half of the first row set to 1 and anything else to 0 to see), so time depends on what format you are allowed to use for the output. (I'm assuming the input is a list of elements or something equivalent, since otherwise you couldn't take advantage of the matrix being sparse.)
A simple solution for O(M+N) space and time (M is the number of ones in the input matrix): take two arrays of length N filled with ones, iterate through all ones in the input, and for each drop the X coordinate from the first array and the Y from the second one. The output is the two arrays, which clearly define the result matrix: its (X,Y) coordinate is 0 iff the X coordinate of the first array and the Y coordinate of the second are 0.
Update: depending on the language, you could use some trickery to return a normal 2D array by referencing the same row multiple times. For example in PHP:
// compute N-length arrays $X and $Y which have 1 at the column
// and row positions which had no 1's in the input matrix
// this is O(M+N)
$result = array();
$row_one = array_fill(0,N,1);
for ($i=0; $i<N; $i++) {
if ($Y[$i]) {
$result[$i] = &$row_one;
} else {
$result[$i] = &$X;
}
}
return $result;
Of course this is a normal array only as long as you don't try to write it.

Since every entry of the matrix has to be checked, your worst case is always going to be N*N.
With a small 2*N extra storage, you can perform the operation in O(N*N). Just create a mask for each row and another for each column - scan the array and update the masks as you go. Then scan again to populate the result matrix based on the masks.
If you're doing something where the input matrix is changing, you could store a count of non-zero entries for each row and column of the input (rather than a simple mask). Then when an entry in the input changes, you update the counts accordingly. At that point, I would drop the output matrix entirely and query the masks/counts directly rather than even maintaining the output matrix (which could also be updated as thing change in less than NN time if you really wanted to keep it around). So loading the initial matrix would still be O(NN) but updates could be much less.

The input matrix may be sparse, but unless you can get it in a sparse format (i.e. a list of (i,j) pairs that are initially set), just reading your input will consume Ω(n^2) time. Even with sparse input, it's easy to end up with O(n^2) output to write. As a cheat, if you were allowed to output a list of set rows and set columns, then you could get down to linear time. There's no magic to be had when your algorithm actually has to produce a result more substantial than 'yes' or 'no'.
Mcdowella's comment on another answer suggests another alternative input format: run-length encoding. For a sparse input, that clearly requires no more than O(n) time to read it (consider how many transitions there are between 0 and 1). However, from there it breaks down. Consider an input matrix structured as follows:
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 . . .
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 . . .
. .
. .
. .
That is, alternating 0 and 1 on the first row, 0 everywhere else. Clearly sparse, since there are n/2 ones in total. However, the RLE output has to repeat this pattern in every row, leading to O(n^2) output.

You say:
we should get the output as...
So you need to output the entire matrix, which has N^2 elements. This is O(N*N).
The problem itself is not O(N*N): you dont have to compute and store the entire matrix: you only need two vectors, L and C, each of size N:
L[x] is 1 if line x is a line of ones, 0 otherwise;
C[x] is 1 if line x is a line of ones, 0 otherwise;
You can construct these vectors in O(N), because the initial matrix is sparse; your input data will not be a matrix, but a list containing the coordinates(line,column) of each non-zero element. While reading this list, you set L[line]=1 and C[column]=1, and the problem is solved: M[l,c] == 1 if L[l]==1 OR C[c]==1

Hii guys ,
thanks to the comment from mb14 i think i could get it solved in less than O(NN) time...
The worst would take O(NN)...
Actually , we have the given array suppose
1 0 0 0 1
0 1 0 0 0
0 1 1 0 0
1 1 1 0 1
0 0 0 0 0
Lets have 2 arrays of size N (this would be the worst case) ... One is dedicated for indexing rows and other columns...
Put those with a[i][1] = 0 in one array and then a[1][j] =0 in another..
Then take those values only and check for the second row and colums...In this manner , we get the values of rows and colums where there are only 0;'s entirely...
The number of values in the row array gives number of 0's in the result array and the points a[row-array values][column array value] gives you those points ....
We could solve it in below O(NN) and worst is O(NN) ... As we can seee , the arrays ( of size N) diminishes ....
I did this for a few arrays and got the result for all of them ... :)
Please correct me if i am wrong anywhere...
Thanx for all your comments guys...You are all very helpful and i did learn quite a few things along the way ... :)

There is clearly up to O(N^2) work to do. In the matrix
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
all bits have to be set to 1, and N*(N-1) are not set to one (20, in this 5x5 case).
Conversely, you can come up with an algorithm that always does it in O(N^2) time: sum along the top row and let column, and if the row or column gets a nonzero answer, fill in the entire row or column; then solve the smaller (N-1)x(N-1) problem.
So there exist cases that must take at least N^2 and any case can be solved in N^2 without extra space.

If your matrix is sparse, the complexity depends much on the input encoding and its in particular not well measured in N N2 or something like that but in terms of N your input complexity Min and your output complexity Mout. I'd expect something like O(N + Min + Mout) but much depending on the encoding and the tricks that you can play with it.

That depends entirely of your input data structure. If you pass your matrix (1s and 0s) as a 2D array you need to traverse it and that is O(N^2). But as your data is sparse, if you only pass the 1's as input, you can do it so the ouptut is O(M), where M is not the number of cells but the number of 1 cells. It would be something similar to this (pseudocode below):
list f(list l) {
list rows_1;
list cols_1;
for each elem in l {
rows_1[elem.row] = 1;
cols_1[elem.col] = 1;
}
list result;
for each row in rows_1 {
for each col in cols_1 {
if (row == 1 || col == 1) {
add(result, new_elem(row, col));
}
}
}
return result;
}

Don't fill the center of the matrix when you're checking values. As you go through the elements, when you have 1 set the corresponding element in the first row and the first column. Then go back and fill down and across.
edit: Actually, this is the same as Andy's.

It depends on your data structure.
There are only two possible cases for rows:
A row i is filled with 1's if there is an element (i,_) in the input
All other rows are the same: i.e. the j-th element is 1 iff there is an element (_,j) in the input.
Hence the result could be represented compactly as an array of references to rows. Since we only need two rows the result would also only consume O(N) memory. As an example this could be implemented in python as follows:
def f(element_list, N):
A = [1]*N
B = [0]*N
M = [B]*N
for row, col in element_list:
M[row] = A
B[col] = 1
return M
A sample call would be
f([(1,1),(2,2),(4,3)],5)
with the result
[[0, 1, 1, 1, 0], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [0, 1, 1, 1, 0], [1, 1, 1, 1, 1]]
The important point is that the arrays are not copied here, i.e. M[row]=A is just an assignment of a reference. Hence the complexity is O(N+M), where M is the length of the input.

#include<stdio.h>
include
int main()
{
int arr[5][5] = { {1,0,0,0,0},
{0,1,1,0,0},
{0,0,0,0,0},
{1,0,0,1,0},
{0,0,0,0,0} };
int var1=0,var2=0,i,j;
for(i=0;i<5;i++)
var1 = var1 | arr[0][i];
for(i=0;i<5;i++)
var2 = var2 | arr[i][0];
for(i=1;i<5;i++)
for(j=1;j<5;j++)
if(arr[i][j])
arr[i][0] = arr[0][j] = 1;
for(i=1;i<5;i++)
for(j=1;j<5;j++)
arr[i][j] = arr[i][0] | arr[0][j];
for(i=0;i<5;i++)
arr[0][i] = var1;
for(i=0;i<5;i++)
arr[i][0] = var2;
for(i=0;i<5;i++)
{
printf("\n");
for(j=0;j<5;j++)
printf("%d ",arr[i][j]);
}
getch();
}
This program makes use of only 2 4 temporary variables (var1,var2,i and j) and hence runs in constant space with time complexity O(n^2).. I Think it is not possible at all to solve this problem in < O(n^2).

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio