Matlab: Speed up loop applied to each of 820,000 elements - performance

I have a set of rainfall data, with a value every 15 minutes over many years, giving 820,000 rows.
The aim (eventually) of my code is to create columns which categorise the data which can then be used to extract relevant chunks of data for further analysis.
I am a Matlab novice and would appreciate some help!
The first steps I have got working sufficiently fast. However, some steps are very slow.
I have tried pre-allocating arrays, and using the lowest intX (8 or 16 depending on situation) possible, but other steps are so slow they don't complete.
The slow ones are for loops, but I don't know if they can be vectorised/split into chunks/anything else to speed them up.
I have a variable "rain" which contains a value for every time step/row.
I have created a variable called "state" which is 0 if there is no rain and 1 if there is rain.
Also a variable called "begin" which is 1 if the row is the first row of a storm, and 0 if not.
The first slow loop is to create a "spell" variable - to give each rain storm a number.
% Generate blank column for spell of size (rain) - preallocate
spell = zeros(size(state),1,'int16');
% Start row for analysis
x=1;
% Populate "spell" variable with a storm number in each row of rain, for the storm number it belongs to (storm number calculated by adding up the number of "begin" values up to that point)
for i=1:size(state)
    if(state(x)==1)
        spell(x) = sum(begin(1:x));
    end
    x=x+1;
end
The next stage is about length of each storm. The first steps are fast enough.
% List of storm numbers
spellnum = unique(spell);
% Length of each spell
spelllength = histc(spell,spellnum);
The last step below (the for loop) is too slow and just crashes.
% Generate blank column for length
length = zeros(size(state),1,'int16');
% Starting row
x = 1;
% For loop to output the total length of the storm for each row of rain within that storm
for i=1:size(state)
    for j=1:size(state)
        position = find(spell==x);
        for k=1:size(state)
            length(position) = spelllength(x+1);
        end
    end
    x=x+1;
end
Is it possible to make this more efficient?
Apologies if examples already exist - I'm not sure what the process would be called!
Many thanks in advance.

Memory allocation/reallocation tips:
1. try to create the result directly from an expression (eventually trimming another, more general result);
2. if 1. is not possible, try to pre-allocate whenever possible (when you have an upper limit for the result);
3. if 2. is not possible, try to grow cell arrays rather than massive matrices (because a matrix requires a contiguous memory area).
Type-choice tips:
try to always use double for intermediate results, because it is the basic numeric data type in MATLAB; this avoids conversions back and forth;
use other types for intermediate results only if there's a memory constraint that can be alleviated by using a smaller-size type.
Linearisation tips:
fastest linearisation uses matrix-wise or element-wise basic algebraic operations combined with logical indexing.
loops are not that bad starting with MATLAB R2008;
the worst-performing element-wise processing functions are arrayfun, cellfun and structfun with anonymous functions, because anonymous functions evaluate the slowest;
try not to calculate the same things twice, even if this gives you better linearisation.
First block:
% Just calculate the entire cumulative sum over begin, then
% trim the result. Check if the cumsum doesn't overflow.
spell = cumsum(begin);
spell(state==0) = 0;
Second block:
% The same; not sure how you could speed this up. Changed
% the names of the variables to my taste, though.
spell_num = unique(spell);
spell_length = histc(spell,spell_num);
Third block:
% Fix the following issues:
% - the innermost "for" makes no sense, because it rewrites
% the same thing several times;
% - three nested loops run over the whole array where a single loop
% over the storm numbers suffices;
% - the name of the standard function "length" is obscured by declaring
% a variable named "length".
storm_length = zeros(size(spell)); % pre-allocate the result
for x = 2:numel(spell_num) % spell_num(1) is the "no storm" value 0
    storm_selector = (spell==spell_num(x));
    storm_length(storm_selector) = spell_length(x);
end

The combination of code I have ended up using is a mixture from @CST_Link and @Sifu. Thank you very much for your help! I don't think Stack Overflow lets me accept two answers, so for clarity, here is the code everyone helped me create, all put together.
The only slow part is the for loop in block three, but this still runs in a few minutes, which is good enough for me, and infinitely better than my attempt.
First block:
%% Spell
%spell is cumulative sum of begin
spell = cumsum(begin);
%% Replace all rows of spell with no rain with 0
spell(state==0) = 0;
Second block (unchanged except better variable names):
%% Spell number = all values of spell
spell_num = unique(spell);
%% Spell length = how many of each value of spell
spell_length = histc(spell,spell_num);
Third block:
%% Generate blank column for spell of size (state)
spell_length2 = zeros(length(state),1);
%%
for x = 1:length(spell_num)-1 % storms are numbered 1..length(spell_num)-1
    position = find(spell==x);
    spell_length2(position) = spell_length(x+1);
end
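For reference, the remaining loop can be removed too: each row can simply look up the length of its own storm by using its storm number as an index. Below is a sketch of the whole pipeline in Python/NumPy (an illustration, not MATLAB; np.bincount plays the role of histc, and the test data is borrowed from Sifu's answer below).

import numpy as np

# Test data borrowed from Sifu's answer below.
state = np.array([1,0,0,0,0,1,1,1,1,1,0,1,0,0,1,0,1,1,1,1,0])
begin = np.array([0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0])

# Block one: storm number per row (0 where there is no storm).
spell = np.cumsum(begin)
spell[state == 0] = 0

# Block two: spell_length[k] = number of rows belonging to storm k
# (index 0 counts the rows outside any storm).
spell_length = np.bincount(spell)

# Block three, loop-free: each row looks up the length of its storm.
storm_length = spell_length[spell]
storm_length[spell == 0] = 0

The same direct-indexing idea should work in MATLAB as spell_length(spell+1), zeroing the no-storm rows afterwards.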

For the first part, if I am following what you are doing: I created some data matching your description for testing.
Please tell me if I missed something.
state=[ 1 0 0 0 0 1 1 1 1 1 0 1 0 0 1 0 1 1 1 1 0];
begin=[ 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0];
spell = zeros(length(state),1,'int16');
%Start row for analysis
x=1;
% Populate "spell" variable with a storm number in each row of rain, for the storm number it belongs to (storm number calculated by adding up the number of "begin" values up to that point
for i=1:length(state)
    if(state(x)==1)
        spell(x) = sum(begin(1:x));
    end
    x=x+1;
end
% The same can be accomplished by simply using cumsum (no need for extra variables if you are short on memory)
spell2=cumsum(begin);
spell3=spell2.*(state==1);
and the output for both spell and spell3 is as shown:
[spell.'; spell3]
0 0 0 0 0 1 1 1 1 1 0 2 0 0 2 0 3 3 3 3 0
0 0 0 0 0 1 1 1 1 1 0 2 0 0 2 0 3 3 3 3 0

Why don't you do that instead?
% For loop to output the total length of the storm for each row of rain within that storm
for x=1:length(state)
    position = find(spell==x);
    length(position) = spelllength(x+1);
end
I replaced the i iterator with x, which removes two lines and some computation.
I then removed the two nested loops, as they were literally useless (each iteration would output the same thing).
That's already a good start.

Related

MATLAB: Fast calculation of Adamic-Adar Score

I have an adjacency matrix of a network and want to calculate the Adamic-Adar score. It is defined in the following way: for each pair of nodes x and y, let z be one of their common neighbors, with |z| the degree of that neighbor.
The score is then defined as a sum over all common neighbors z:
AA(x,y) = sum of 1/log(|z|), where z ranges over the common neighbors of x and y.
See for instance this paper, page 3.
I have written a small algorithm for MATLAB, but it uses two for-loops. I am convinced that it can be made much faster, but I don't know how. Could you please indicate ways to speed this up?
% the entries of nn will always be 0 or 1, and the diagonal will always be 0
nn=[0 0 0 0 1 0; ...
0 0 0 1 1 0; ...
0 0 0 0 1 0; ...
0 1 0 0 0 1; ...
1 1 1 0 0 0; ...
0 0 0 1 0 0];
deg=sum(nn>0);
AAScore=zeros(size(nn));
for ii=1:length(nn)-1
    for jj=ii+1:length(nn)
        NBs=nn(ii,:).*nn(jj,:);
        B=NBs.*deg;
        C=B(B>1);
        AAScore(ii,jj)=sum(1./log(C));
    end
end
AAScore
I would appreciate any suggestion, thank you!
Comparing runtimes
My nn has ~2% nonzero entries, so it can be approximated by:
kk=1500;
nn=(rand(kk)>0.98).*(1-eye(kk));
My double-for: 37.404445 seconds.
Divakar's first solution: 58.455826 seconds.
Divakar's updated solution: 22.333510 seconds.
First off, get the indices in the output array that would be set, i.e. non-zero. Looking at the code, we can notice that we are basically AND-ing each row of the input matrix nn against every other row. Given that we are dealing with 1s and 0s, this translates to performing a matrix multiplication. So, the non-zeros in the matrix-multiplication result indicate the places in the squared-matrix output array where computation is needed. This should be efficient, as we iterate over fewer elements. On top of that, since we get an upper-triangular output, we can further reduce the computation by using a mask with triu(...,1).
Following those ideas, here's an implementation -
[R,C] = find(triu(nn*nn.'>0,1));
vals = sum(1./log(bsxfun(@times,nn(R,:).*nn(C,:),deg)),2);
out=zeros(size(nn));
out(sub2ind(size(out),R,C)) = vals;
For a case where the input matrix nn is less sparse and really huge, you would feel the bottleneck at computing bsxfun(@times,nn(R,:).*nn(C,:),deg). For such a case, you can use those R,C indices directly to update the respective selected places in the output array.
Thus, an alternative implementation would be -
[R,C] = find(triu(nn*nn.',1));
out=zeros(size(nn));
for ii=1:numel(R)
    out(R(ii),C(ii)) = sum(1./log(nn(R(ii),:).*nn(C(ii),:).*deg));
end
A middle ground could probably be established between the two above-mentioned approaches: start from the R,C indices, then iteratively select chunks of rows from nn(R,:) and the respective ones from nn(C,:), and apply the vectorized implementation across each chunk. Setting the chunk size could be tricky, as it would largely depend on the system resources, the input array size and its sparseness.
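To illustrate that middle ground, here is a rough sketch in Python/NumPy (an illustration only, not MATLAB; the chunk size of 1024 is an arbitrary assumption to tune):

import numpy as np

def adamic_adar_chunked(nn, chunk=1024):
    # Degree of every node (nn is a 0/1 adjacency matrix).
    deg = (nn > 0).sum(axis=0)
    # Candidate pairs: non-zeros of nn*nn' mark row pairs sharing a neighbour.
    R, C = np.nonzero(np.triu(nn @ nn.T, k=1))
    out = np.zeros(nn.shape)
    # Vectorise within fixed-size chunks of pairs to bound peak memory.
    for start in range(0, len(R), chunk):
        r = R[start:start + chunk]
        c = C[start:start + chunk]
        weighted = (nn[r, :] * nn[c, :]) * deg  # degree of each common neighbour
        terms = np.zeros(weighted.shape)
        mask = weighted > 1                     # common neighbours have degree >= 2
        terms[mask] = 1.0 / np.log(weighted[mask])
        out[r, c] = terms.sum(axis=1)
    return out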

How many times does a zero occur on an odometer

I am solving how many times a zero occurs on an odometer. I count +1 every time I see a zero.
10 -> +1
100 -> +2 because in 100 I see 2 zeros
10004 -> +3 because I see 3 zeros
So I get,
1 - 100 -> +11
1 - 500 -> +91
1 - 501 -> +92
0 - 4294967295-> +3825876150
I used rubydoctest for it. I am not doing anything with begin_number yet. Can anyone explain how to calculate it without a brute-force method?
I made many attempts. They go well for numbers like 10, 1000, 10,000, 100,000,000, but not for numbers like 522 or 2280. If I run the rubydoctest, it fails on # >> algorithm_count_zero(1, 500)
# doctest: algorithm_count_zero(begin_number, end_number)
# >> algorithm_count_zero(1, 10)
# => 1
# >> algorithm_count_zero(1, 1000)
# => 192
# >> algorithm_count_zero(1, 10000000)
# => 5888896
# >> algorithm_count_zero(1, 500)
# => 91
# >> algorithm_count_zero(0, 4294967295)
# => 3825876150
def algorithm_count_zero(begin_number, end_number)
  power = Math::log10(end_number) - 1
  if end_number < 100
    return end_number/10
  else
    end_number > 100
    count = (9*(power)-1)*10**power+1
  end
  answer = ((((count / 9)+power)).floor) + 1
end
end_number = 20000
begin_number = 10000
puts "Algorithm #{algorithm_count_zero(begin_number, end_number)}"
As noticed in a comment, this is a duplicate of another question, where the solution gives you correct guidelines.
However, if you want to test your own solution for correctness, I'll put here a one-liner in the parallel array processing language Dyalog APL (which, by the way, I think everyone modelling mathematics and numbers should use).
Using tryapl.org you'll be able to get a correct answer for any integer argument. TryAPL is a web page with a backend that executes simple APL code statements ("one-liners", which are very typical of the APL language and its extremely compact code).
The APL one-liner is here:
{+/(c×1+d|⍵)+d×(-c←0=⌊(a|⍵)÷d←a×+0.1)+⌊⍵÷a←10*⌽⍳⌈10⍟⍵} 142857
Copy that and paste it into the edit row at tryapl.org, and press enter - you will quickly see an integer, which is the answer to your problem. In the code row above, you can see the argument rightmost; it is 142857 this time but you can change it to any integer.
As you have pasted the one-liner once, and executed it with Enter once, the easiest way to get it back for editing is to press [Up arrow]. This returns the most recently entered statement; then you can edit the number sitting rightmost (after the curly brace) and press Enter again to get the answer for a different argument.
Pasting the code row above will return 66765 - that many zeroes exist up to 142857.
If you paste the row below, which is 2 characters shorter, you will see the individual components of the result - the sum of these components makes up the final result. You will be able to see a pattern, which possibly makes it easier to understand what happens.
Try for example
{(c×1+d|⍵)+d×(-c←0=⌊(a|⍵)÷d←a×+0.1)+⌊⍵÷a←10*⌽⍳⌈10⍟⍵} 1428579376
0 100000000 140000000 142000000 142800000 142850000 142857000 142857900 142857930 142857937
... and see how the intermediate results contain segments of the argument 1428579376, starting from left! There are as many intermediate results as there are numbers in the argument (10 this time).
The result for 1428579376 will be 1239080767, ie. the sum of the 10 numbers above. This many zeroes appear in all numbers between 1 and 1428579376 :-).
Consider each odometer position separately. The position x places from the far right changes once every 10^x times. By looking at the numbers to its right, you know how long it will be until it next changes. It will then hold each value for 10^x times before changing, until it reaches the end of the range you are considering, when it will hold its value at that time for some number of times that you can work out given the value at the very end of the range.
Now you have a sequence of the form x...0123456789012...y where you know the length and you know the values of x and y. One way to count the number of 0s (or any other digit) within this sequence is to clip off the prefix from x.. to just before the first 0, and the suffix from just after the last 9 to y. Count the 0s in that prefix and suffix directly; the long sequence left in the middle has a length divisible by 10 and contains each digit the same number of times.
Based on this you should be able to work out, for each position, how often within the range it will assume each of its 10 possible values. By summing up the values for 0 from each of the odometer positions you get the answer you want.
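That position-by-position idea translates directly into code. Here is a sketch in Python (rather than Ruby) of the standard digit-counting formulation; it reproduces the doctest values above, treating the odometer reading 0 itself as showing one zero:

def count_zeros(n):
    # Zero digits appearing in the decimal representations of 1..n.
    count = 0
    p = 1                     # place value of the position being examined
    while p * 10 <= n:        # a zero is never the leading digit
        high = n // (p * 10)  # digits to the left of the position
        cur = (n // p) % 10   # the digit at the position itself
        low = n % p           # digits to the right of the position
        if cur > 0:
            count += high * p
        else:
            count += (high - 1) * p + low + 1
        p *= 10
    return count

def algorithm_count_zero(begin_number, end_number):
    if begin_number <= 0:
        return count_zeros(end_number) + 1  # "+1" for the reading 0 itself
    return count_zeros(end_number) - count_zeros(begin_number - 1)

# algorithm_count_zero(1, 500)        -> 91
# algorithm_count_zero(1, 10000000)   -> 5888896
# algorithm_count_zero(0, 4294967295) -> 3825876150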

Create multiple combinations summing to 100

I would like to be able to create multiple combinations that sum to 100%, given a defined number of "buckets" with a defined 'difference factor'. In the below example, the difference is a factor of 20 to make it simple, but I will probably reduce it to 1 in the final solution.
For example, with 3 "buckets" A, B, C you could have:
A 100 80 80 60 60 ... 0
B 0 20 0 20 40 ... 0
C 0 0 20 20 0 ... 100
Each column is one combination (summing to 100) that I would like to store and do further calculations on.
This is a business problem and not homework.
Please help me come up with a solution. A brute-force way would be to create a multi-dimensional array for every possible combination, e.g. 100x100x100, and then go through each of the 1 million combinations to see which ones sum to 100. However, this looks like it would be way too inefficient.
Much appreciated. I hope I have explained clearly enough.
This problem is known as partitions rather than combinations, which is something different.
First off: the 'difference factor' just turns the problem from finding partitions of 100 to (in your example) finding partitions of 5 (then multiplying by 20).
Next up: If the number of buckets is constant, you can just do (pseudo code):
for i = 0 to n
  for j = 0 to n-i
    output (i, j, n-(i+j))
If the number of buckets is going to be dynamic, you'd have to be a bit cleverer, but this approach will basically work.
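As a small sketch of that idea in Python (recursion makes the bucket count dynamic; the names are mine):

def compositions(total, buckets):
    # Yield every tuple of `buckets` non-negative integers summing to `total`.
    if buckets == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, buckets - 1):
            yield (first,) + rest

# 100% in steps of 20: split 5 units over 3 buckets, then scale by 20.
for combo in compositions(5, 3):
    print([20 * part for part in combo])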
This looks like it would yield well to a bit of caching and dynamic programming.
fun partition (partitions_left, value):
    if partitions_left == 0:
        # one (empty) partition if nothing remains, none otherwise
        return value == 0 ? list containing empty_list : empty_list
    if value == 0:
        return list containing one list of partitions_left 0 elements
    return_value = empty_list
    for possible_value from value downto 1:
        remainder = value - possible_value
        children = partition(partitions_left - 1, remainder)
        for child in children:
            append (cons of possible_value and child) to return_value
    return return_value
If you also make sure that you serve already-computed values from the cache, "all" you need to then do is to generate all possible permutations of all generated partitions.
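A minimal runnable mirror of that pseudocode in Python, assuming functools.lru_cache as the cache:

from functools import lru_cache

@lru_cache(maxsize=None)
def partition(parts_left, value):
    # Lists of `parts_left` non-negative parts summing to `value`;
    # as noted above, permutations of a result may still be missing.
    if parts_left == 0:
        return ((),) if value == 0 else ()  # one empty partition, or none
    if value == 0:
        return ((0,) * parts_left,)
    result = []
    for possible in range(value, 0, -1):
        for child in partition(parts_left - 1, value - possible):
            result.append((possible,) + child)
    return tuple(result)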
Algorithm-wise, you could make a list of all the numbers between 0 and 100 in steps of 20 in list A, then make a copy of list A as list B.
Next, compare each of list A's values to list B, seeing which pairs add up to 100 or fewer, and store these in list C. Then do the same to list C again (checking all the values between 0 and 100 with a step of 20) to see which values add up to 100.

Can we compute this in less than O(n*n) ...( nlogn or n)

This is a question asked to me by a very very famous MNC. The question is as follows ...
Given a 2D N*N array of 0's and 1's: if A(i,j) = 1, then all the values in the ith row and the jth column become 1. If there is a 1 already, it remains a 1.
As an example , if we have the array
1 0 0 0 0
0 1 1 0 0
0 0 0 0 0
1 0 0 1 0
0 0 0 0 0
we should get the output as
1 1 1 1 1
1 1 1 1 1
1 1 1 1 0
1 1 1 1 1
1 1 1 1 0
The input matrix is sparsely populated.
Is this possible in less than O(N^2)?
"No additional space" was another condition. I would like to know if there's a way to achieve this complexity using space <= O(N).
P.S.: I don't need answers that give me a complexity of O(N*N). This is not a homework problem. I have tried a lot, couldn't get a proper solution, and thought I could get some ideas here. Leave the printing aside when counting the complexity.
My rough idea was to somehow dynamically eliminate elements from the traversal, restricting it to around 2N or so, but I couldn't work out the details.
In the worst case, you may need to toggle N * N - N bits from 0 to 1 to generate the output. It would seem you're pretty well stuck with O(N*N).
I would imagine that you can optimize it for the best case, but I'm tempted to say that your worst case is still O(N*N): Your worst case will be an array of all 0s, and you will have to examine every single element.
The optimization would involve skipping a row or column as soon as you find a "1" (I can provide details, but you said you don't care about O(N*N)); but unless you have metadata to indicate that an entire row/column is empty, or a SIMD-style way to check multiple fields at once (say, if every row is aligned by 4 and you can read 32 bits worth of data, or if your data is in the form of a bitmask), you will always have to deal with the problem of an all-zero array.
Clearly, neither the output matrix nor its negated version has to be sparse (take a matrix with half of the first row set to 1 and everything else 0 to see this), so the time depends on what format you are allowed to use for the output. (I'm assuming the input is a list of elements or something equivalent, since otherwise you couldn't take advantage of the matrix being sparse.)
A simple solution with O(M+N) space and time (M is the number of ones in the input matrix): take two arrays of length N filled with ones, iterate through all the ones in the input, and for each one drop its X coordinate from the first array and its Y coordinate from the second (i.e. set them to 0). The output is the two arrays, which completely define the result matrix: its (X,Y) entry is 0 iff the Xth entry of the first array and the Yth entry of the second are still 1.
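A sketch of that in Python (the names are mine; `ones` stands for the assumed sparse input, a list of (x, y) pairs that are set):

def compress(n, ones):
    rows = [1] * n  # rows[x] drops to 0 once row x contains a 1
    cols = [1] * n  # cols[y] drops to 0 once column y contains a 1
    for x, y in ones:
        rows[x] = 0
        cols[y] = 0
    return rows, cols

def entry(rows, cols, x, y):
    # Result(x, y) is 0 iff both coordinates kept their 1.
    return 0 if rows[x] == 1 and cols[y] == 1 else 1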
Update: depending on the language, you could use some trickery to return a normal 2D array by referencing the same row multiple times. For example in PHP:
// compute N-length arrays $X and $Y which have 1 at the column
// and row positions which had 1's in the input matrix
// this is O(M+N)
$result = array();
$row_one = array_fill(0,N,1);
for ($i=0; $i<N; $i++) {
    if ($Y[$i]) {
        $result[$i] = &$row_one;
    } else {
        $result[$i] = &$X;
    }
}
return $result;
Of course this is a normal array only as long as you don't try to write it.
Since every entry of the matrix has to be checked, your worst case is always going to be N*N.
With a small 2*N extra storage, you can perform the operation in O(N*N). Just create a mask for each row and another for each column - scan the array and update the masks as you go. Then scan again to populate the result matrix based on the masks.
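In Python, that two-scan idea might look like this (a sketch only; `matrix` is assumed to be a list of 0/1 rows):

def fill(matrix):
    n = len(matrix)
    row_mask = [0] * n
    col_mask = [0] * n
    # First scan: record which rows and columns contain a 1.
    for i in range(n):
        for j in range(n):
            if matrix[i][j]:
                row_mask[i] = col_mask[j] = 1
    # Second scan: populate the result from the two masks.
    return [[row_mask[i] | col_mask[j] for j in range(n)] for i in range(n)]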
If you're doing something where the input matrix is changing, you could store a count of non-zero entries for each row and column of the input (rather than a simple mask). Then when an entry in the input changes, you update the counts accordingly. At that point, I would drop the output matrix entirely and query the masks/counts directly, rather than even maintaining the output matrix (which could also be updated in less than N*N time as things change, if you really wanted to keep it around). So loading the initial matrix would still be O(N*N), but updates could be much cheaper.
The input matrix may be sparse, but unless you can get it in a sparse format (i.e. a list of (i,j) pairs that are initially set), just reading your input will consume Ω(n^2) time. Even with sparse input, it's easy to end up with O(n^2) output to write. As a cheat, if you were allowed to output a list of set rows and set columns, then you could get down to linear time. There's no magic to be had when your algorithm actually has to produce a result more substantial than 'yes' or 'no'.
Mcdowella's comment on another answer suggests another alternative input format: run-length encoding. For a sparse input, that clearly requires no more than O(n) time to read it (consider how many transitions there are between 0 and 1). However, from there it breaks down. Consider an input matrix structured as follows:
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 . . .
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 . . .
...
That is, alternating 0 and 1 on the first row, 0 everywhere else. Clearly sparse, since there are n/2 ones in total. However, the RLE output has to repeat this pattern in every row, leading to O(n^2) output.
You say:
we should get the output as...
So you need to output the entire matrix, which has N^2 elements. This is O(N*N).
The problem itself is not O(N*N): you don't have to compute and store the entire matrix; you only need two vectors, L and C, each of size N:
L[x] is 1 if row x is a row of ones, 0 otherwise;
C[x] is 1 if column x is a column of ones, 0 otherwise.
You can construct these vectors in O(N), because the initial matrix is sparse; your input data will not be a matrix, but a list containing the coordinates (line, column) of each non-zero element. While reading this list, you set L[line]=1 and C[column]=1, and the problem is solved: M[l,c] == 1 iff L[l]==1 OR C[c]==1.
Hi guys, thanks to the comment from mb14 I think I could get it solved in less than O(N*N) time... the worst case would take O(N*N).
Actually, suppose we have the given array
1 0 0 0 1
0 1 0 0 0
0 1 1 0 0
1 1 1 0 1
0 0 0 0 0
Let's have 2 arrays of size N (this would be the worst case): one dedicated to indexing rows and the other to columns.
Put the rows i with a[i][1] = 0 in one array, and the columns j with a[1][j] = 0 in the other.
Then take only those values and check them against the second row and column... In this manner we find the rows and columns which are entirely 0.
The number of values in the row array gives the number of 0's in the result array, and the points a[row-array value][column-array value] give you those points.
We could solve it in below O(N*N), with a worst case of O(N*N)... As we can see, the arrays (of size N) diminish as we go.
I did this for a few arrays and got the result for all of them... :)
Please correct me if I am wrong anywhere.
Thanks for all your comments guys... You are all very helpful and I did learn quite a few things along the way... :)
There is clearly up to O(N^2) work to do. In the matrix
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
all bits have to be set to 1, and N*(N-1) are not set to one (20, in this 5x5 case).
Conversely, you can come up with an algorithm that always does it in O(N^2) time: sum along the top row and left column, and if a row or column gets a nonzero answer, fill in that entire row or column; then solve the smaller (N-1)x(N-1) problem.
So there exist cases that must take at least N^2 and any case can be solved in N^2 without extra space.
If your matrix is sparse, the complexity depends greatly on the input encoding; in particular, it is not well measured in terms of N or N^2, but rather in terms of N, your input size M_in and your output size M_out. I'd expect something like O(N + M_in + M_out), but much depends on the encoding and the tricks you can play with it.
That depends entirely on your input data structure. If you pass your matrix (1s and 0s) as a 2D array, you need to traverse it, and that is O(N^2). But as your data is sparse, if you only pass the 1's as input, you can do it so the output is O(M), where M is not the number of cells but the number of 1-cells. It would be something similar to this (pseudocode below):
list f(list l) {
    list rows_1;
    list cols_1;
    for each elem in l {
        rows_1[elem.row] = 1;
        cols_1[elem.col] = 1;
    }
    list result;
    for row = 0 to N-1 {
        for col = 0 to N-1 {
            if (rows_1[row] == 1 || cols_1[col] == 1) {
                add(result, new_elem(row, col));
            }
        }
    }
    return result;
}
Don't fill the center of the matrix when you're checking values. As you go through the elements, when you have 1 set the corresponding element in the first row and the first column. Then go back and fill down and across.
edit: Actually, this is the same as Andy's.
It depends on your data structure.
There are only two possible cases for rows:
A row i is filled with 1's if there is an element (i,_) in the input
All other rows are the same: i.e. the j-th element is 1 iff there is an element (_,j) in the input.
Hence the result could be represented compactly as an array of references to rows. Since we only need two rows the result would also only consume O(N) memory. As an example this could be implemented in python as follows:
def f(element_list, N):
    A = [1]*N
    B = [0]*N
    M = [B]*N
    for row, col in element_list:
        M[row] = A
        B[col] = 1
    return M
A sample call would be
f([(1,1),(2,2),(4,3)],5)
with the result
[[0, 1, 1, 1, 0], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1], [0, 1, 1, 1, 0], [1, 1, 1, 1, 1]]
The important point is that the arrays are not copied here, i.e. M[row]=A is just an assignment of a reference. Hence the complexity is O(N+M), where M is the length of the input.
#include<stdio.h>
#include<conio.h>
int main()
{
    int arr[5][5] = { {1,0,0,0,0},
                      {0,1,1,0,0},
                      {0,0,0,0,0},
                      {1,0,0,1,0},
                      {0,0,0,0,0} };
    int var1=0, var2=0, i, j;
    /* remember whether the original first row / first column had a 1 */
    for(i=0;i<5;i++)
        var1 = var1 | arr[0][i];
    for(i=0;i<5;i++)
        var2 = var2 | arr[i][0];
    /* mark rows and columns containing a 1 into the first column/row */
    for(i=1;i<5;i++)
        for(j=1;j<5;j++)
            if(arr[i][j])
                arr[i][0] = arr[0][j] = 1;
    /* fill the inner cells from the marks */
    for(i=1;i<5;i++)
        for(j=1;j<5;j++)
            arr[i][j] = arr[i][0] | arr[0][j];
    /* finish the first row and column, keeping the marks */
    for(i=0;i<5;i++)
        arr[0][i] = var1 | arr[0][i];
    for(i=0;i<5;i++)
        arr[i][0] = var2 | arr[i][0];
    for(i=0;i<5;i++)
    {
        printf("\n");
        for(j=0;j<5;j++)
            printf("%d ",arr[i][j]);
    }
    getch();
    return 0;
}
This program makes use of only 4 temporary variables (var1, var2, i and j) and hence runs in constant space, with time complexity O(n^2). I think it is not possible to solve this problem in less than O(n^2).

duplicacy problems while creating a sudoku puzzle

I am trying to create my own normal 9x9 sudoku puzzle.
I divided the problem into two parts:
creating a fully filled sudoku, and
removing unnecessary numbers from the grid
Right now, I am stuck with the first part.
This is the algorithm I use in brief:
a) first of all I choose a number (say 1), generate a random cell position, and place it there if
the cell is not already occupied, and
if the row does not already have the number, and
if the column does not already have the number, and
if the 3x3 box does not already have the number
b) now I check for a situation in which, in a row, a column or a box, only one place is empty, and I fill that
c) I check whether there is a number that is not present in a box but is present in the boxes in the same row and the same column (I am talking about 3x3 boxes here); if so, the number's place is fixed and I fill it.
d) I repeat the above steps until every number appears nine times on the grid.
The problem I am facing is that, more often than not, I end up in an intermediate situation like this:
0 1 0 | 0 0 3 | 0[4/2]0
0 [2] 0 | 0 [4] 1 | 3 0 0
3 0 [4]|[2] 0 0 | 0 0 1
---------+---------+---------
2 0 3 | 0 5 4 | 0 1 0
0 0 1 | 3 0 2 |[4] 0 0
0 4 0 | 0 1 0 |[2] 3 0
---------+---------+---------
1 0 2 | 0 3 0 | 0 0 [4]
4 3 0 | 1 0 0 | 0 0 [2]
5 0 0 | 4 2 0 | 1 0 3
See the place with [4/2] written? That is the place of 2 as well as 4, because of the boxes marked [].
What can I do to avoid getting into this situation? (This situation is a deadlock: I cannot move further.)
There's another way to generate sudoku puzzles: Start with a known good grid - any one will do - then randomly 'shuffle' it by applying operations that don't destroy the invariants. Valid operations include:
Swapping rows within a block
Swapping columns within a block
Swapping entire rows of blocks (eg, first 3, middle 3, last 3 rows)
Swapping entire columns of blocks
Swapping all instances of one number with another
Reflecting the board
Rotating the board
With these operations, you can generate a very large range of possible boards. You need to be careful about how you apply the operations, however - much like the naive shuffle, it's easy to write an algorithm that makes some boards more likely than others. A technique similar to the Knuth shuffle may help here.
Edit: It has been pointed out in the comments that these operations alone aren't sufficient to create every possible grid.
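For illustration, here is a sketch in Python of a few of these operations (a partial set only, per the edit above; `grid` is assumed to be a valid 9x9 list of lists of digits 1-9):

import random

def shuffle_grid(grid, steps=100):
    g = [row[:] for row in grid]
    for _ in range(steps):
        op = random.randrange(3)
        if op == 0:    # swap two rows within one 3-row band
            band = random.randrange(3) * 3
            r1, r2 = random.sample(range(band, band + 3), 2)
            g[r1], g[r2] = g[r2], g[r1]
        elif op == 1:  # swap two columns within one 3-column stack
            stack = random.randrange(3) * 3
            c1, c2 = random.sample(range(stack, stack + 3), 2)
            for row in g:
                row[c1], row[c2] = row[c2], row[c1]
        else:          # relabel: swap all instances of two digits
            a, b = random.sample(range(1, 10), 2)
            for row in g:
                for j, v in enumerate(row):
                    row[j] = b if v == a else (a if v == b else v)
    return g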
You will always get that situation. You need a recursive backtracking search to solve it.
Basically, the only way to determine whether a particular digit really is valid for a cell is to continue the search and see what happens.
Backtracking searches are normally done using recursive calls. Each call will iterate through the (possibly) still valid options for one cell, recursing to evaluate all the options for the next cell. When you can't continue, backtracking means returning from the current call - erasing any digit you tested out for that cell first, of course.
When you find a valid solution, either save it and backtrack to continue (ie find alternatives), or break out of all the recursive calls to finish. Success in a recursive backtracking search is a special case where throwing an exception for success is IMO a good idea - it is exceptional for a particular call to succeed, and the code will be clearer.
If generating a random board, iterate the options in a particular recursive call (for a particular cell) in random order.
The same basic algorithm also applies for a partly completed board (ie to solve existing sodoku) - when evaluating a cell that already has a digit, well, that's the only option for that cell so recurse for the next cell.
Here's the backtracking search from a solver I wrote once - a lot is abstracted out, but hopefully that just makes the principle clearer...
size_t Board::Rec_Search (size_t p_Pos)
{
    size_t l_Count = 0;
    if (p_Pos == 81)  // Found a solution
    {
        l_Count++;
        std::cout << "------------------------" << std::endl;
        Draw ();
        std::cout << "------------------------" << std::endl;
    }
    else
    {
        if (m_Board [p_Pos] == 0)  // Need to search here
        {
            size_t l_Valid_Set = Valid_Set (p_Pos);
            if (l_Valid_Set != 0)  // Can only continue if there are possible digits
            {
                size_t l_Bit = 1;  // Scan position for valid set
                for (size_t i = 1; i <= 9; i++)
                {
                    if (l_Valid_Set & l_Bit)
                    {
                        Set_Digit (p_Pos, i);
                        l_Count += Rec_Search (p_Pos + 1);
                    }
                    l_Bit <<= 1;
                }
                Clr_Digit (p_Pos);  // Ensure cleared properly for backtracking
            }
        }
        else  // Already filled in - skip
        {
            l_Count += Rec_Search (p_Pos + 1);
        }
    }
    return l_Count;
}
If you've reached a contradictory state where a cell is both 2 and 4, some of your other 2s and 4s must be placed wrongly. You need to roll back and try some different solutions.
Sounds like you might have an algorithm problem? Some good stuff here.
