Find all possible paths through a maze

I'm trying to create a program that traverses a randomly generated maze, where 1's are open cells and 0's are walls, starting in the top left and ending in the bottom right. The path can go up, down, left, and right.
Currently, my program gives me a solution, but I'm having trouble getting it to print more than one path.
I've read several different versions of this problem, but I'm unable to find one with quite my parameters.
Here's my code; I omitted the part where I randomly generate my maze.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <stdbool.h>
int n, minMatrix, solIndex = 1, minLen = 10000000; //I use the latter 3 variables in order to find the shortest path, not relevant for now
bool solveMaze(int mat[n][n],int x, int y, int sol[][n], int count){
int i, j;
if((!(x >= 0 && x <n && y >=0 && y < n)) || mat[x][y] == 0 || sol[x][y] == 1){
return false;
}
if(x == n-1 && y == n-1){
sol[x][y] = 1;
printf("Solution %d is:\n", solIndex);
for(i = 0; i < n; i++)
{
for( j=0;j<n;j++)
{
printf("%d", sol[i][j]);
}
printf("\n");
}
if(count<minLen)
{
minLen = count;
minMatrix = solIndex;
}
solIndex +=1;
sol[x][y] = 0;
return true;
}
sol[x][y] = 1;
if(solveMaze(mat, x+1, y, sol, count+1)){
return true;
}
if(solveMaze(mat, x-1, y, sol, count+1)){
return true;
}
if(solveMaze(mat, x, y+1, sol, count+1)){
return true;
}
if(solveMaze(mat, x, y-1, sol, count+1)){
return true;
}
sol[x][y] = 0;
return false;
}
I've omitted the part of my main where I randomly generate my maze.
int main(){
if(!solveMaze(**mat, 0, 0, sol, 0)){
printf("No possible paths, run program again\n");
}
else{
printf("the shortest path is %d\n", minMatrix);
}
}
For instance if I have the maze
1100111111
1101111111
1111110110
1110011111
1101101011
1111101011
1110111101
1100111111
1110111011
1101101111
It gives me the first path that it finds
1000000000
1001100000
1111110000
1100011000
1100001000
1100001000
1100001000
1100001011
1100001011
1100001111
Though it takes a roundabout way of getting there, due to the preferences of going in order of down, up, right, and left, it is still one path.
So ultimately, I'm not sure how to iterate for multiple paths.

Here is a straightforward, fully working solution using the example maze from this similar question (which was marked as a duplicate but was standalone compilable): Find all paths in a maze using DFS
It uses a simple DFS with straightforward recursion, which seems to be the same approach as in the question here. It keeps track of the current path in a single string instance and modifies the maze in place to block off the cells on the current path.
#include <iostream>
#include <string>
const int WIDTH = 6;
const int HEIGHT = 5;
void check(int x, int y, int dest_x, int dest_y,
int (&maze)[HEIGHT][WIDTH], std::string& path) {
if (x < 0 || y < 0 || x >= WIDTH|| y >= HEIGHT || !maze[y][x]) {
return;
}
int len = path.size();
path += (char) ('0' + x);
path += ',';
path += (char) ('0' + y);
if (x == dest_x && y == dest_y) {
std::cout << path << "\n";
} else {
path += " > ";
maze[y][x] = 0;
check (x + 0, y - 1, dest_x, dest_y, maze, path);
check (x + 0, y + 1, dest_x, dest_y, maze, path);
check (x - 1, y + 0, dest_x, dest_y, maze, path);
check (x + 1, y + 0, dest_x, dest_y, maze, path);
maze[y][x] = 1;
}
path.resize(len);
}
int main() {
int maze[HEIGHT][WIDTH] = {
{1,0,1,1,1,1},
{1,0,1,0,1,1},
{1,1,1,0,1,1},
{0,0,0,0,1,0},
{1,1,1,0,1,1}};
std::string path;
check(0, 0, 4, 3, maze, path);
return 0;
}
Runnable version: https://code.sololearn.com/cYn18c5p7609
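The same backtracking idea can be sketched for the setup in the question (square maze, 1 = open, 0 = wall, start in the top left, end in the bottom right). The point to notice is that the search never returns early once a path is found, which is what limits the question's code to a single solution. The names Maze, PathStats, explore and countPaths are made up for this illustration, and the sketch only counts paths and tracks the shortest one instead of printing them:

```cpp
#include <cassert>
#include <climits>
#include <vector>

using Maze = std::vector<std::vector<int>>;  // 1 = open, 0 = wall

struct PathStats {
    long long total = 0;     // number of simple paths found
    int shortest = INT_MAX;  // cells on the shortest path, INT_MAX if there is none
};

// Depth-first search with backtracking. It never returns early on success,
// so every simple path from the current cell to the goal is visited.
void explore(Maze& maze, int row, int col, int length, PathStats& stats) {
    const int n = static_cast<int>(maze.size());
    if (row < 0 || col < 0 || row >= n || col >= n || maze[row][col] == 0) return;
    if (row == n - 1 && col == n - 1) {
        ++stats.total;
        if (length < stats.shortest) stats.shortest = length;
        return;
    }
    maze[row][col] = 0;  // block the current cell so the path cannot revisit it
    explore(maze, row + 1, col, length + 1, stats);
    explore(maze, row - 1, col, length + 1, stats);
    explore(maze, row, col + 1, length + 1, stats);
    explore(maze, row, col - 1, length + 1, stats);
    maze[row][col] = 1;  // backtrack: reopen the cell for other paths
}

PathStats countPaths(Maze maze) {  // taken by value, the caller's maze is untouched
    assert(!maze.empty() && maze.size() == maze[0].size());  // square maze assumed
    PathStats stats;
    explore(maze, 0, 0, 1, stats);
    return stats;
}
```

Be aware that the number of simple paths grows very quickly with the maze size, so enumerating all of them on a mostly open 10x10 maze may take a long time.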

I finally found a solution to your question. Honestly, it's not a solution that I developed myself; other people (namely Schröder) had this idea before!
The problem is described by Schröder, but have a look at the German translation, which speaks of permuting a binary tree.
Transform your path and all reachable nodes into a binary tree and permute it! (But be warned, there may be a great many solutions.)
As you can see, these are all the solutions for crossing a 4x4 square (missing the mirrored part, alas).

0-1 knapsack TLE

I was solving the 0-1 knapsack problem (src: https://www.interviewbit.com/problems/0-1-knapsack/)
and would like to understand why I got TLE (time limit exceeded) and how to get rid of it.
My solution (which got TLE on the hard case):
int knapsack(vector<int> A , vector<int> B, int weight , int n , vector<vector<int>> &dp ){
if(weight==0 || n==0){
// dp[weight][n] = 0;
return 0 ;
}
if(dp[weight][n]!=(-1)){
return dp[weight][n];
}
if(B[n-1]<=weight){
dp[weight][n] = max( (knapsack(A,B,weight,n-1,dp)) , (A[n-1] + knapsack(A,B,weight-B[n-1],n-1,dp)) );
return max( (knapsack(A,B,weight,n-1,dp)) , (A[n-1] + knapsack(A,B,weight-B[n-1],n-1,dp)) );
}
// if(B[n-1]>weight){
else{
dp[weight][n] = knapsack(A,B,weight,n-1,dp);
return knapsack(A,B,weight,n-1,dp);
}
}
int Solution::solve(vector<int> &A, vector<int> &B, int C) {
int N = A.size();
// n rows and weights written vertically in columns
vector<vector<int>> dp(C+1, vector<int> (N+1,-1));
return knapsack(A,B,C,N,dp);
}
One solution which I found in the discussion tab, and which does not get TLE, looks exactly the same as my solution:
int knapsack(vector<int>& wt, vector<int>& val, int W, int n, vector<vector<int>>& dp)
{
if(n == 0 || W == 0)
return 0;
if(dp[n][W] != -1)
return dp[n][W];
if(wt[n-1] <= W)
return dp[n][W] = max(val[n-1] + knapsack(wt, val, W-wt[n-1], n-1, dp), knapsack(wt, val, W, n-1, dp));
else
return dp[n][W] = knapsack(wt, val, W, n-1, dp);
}
int Solution::solve(vector<int> &val, vector<int> &wt, int W)
{
int n = wt.size();
vector<vector<int>> dp(n+1 , vector<int> (W+1, -1));
return knapsack(wt, val, W, n, dp);
}
Is it possible that using a bigger variable name caused me a TLE in the hard case ?

You make too many recursive calls.
knapsack is not a pure function. It has the side effect of modifying dp, and the compiler is not smart enough to figure out that the second call in the else branch
dp[weight][n] = knapsack(A,B,weight,n-1,dp);
return knapsack(A,B,weight,n-1,dp);
is redundant. Help the compiler and optimize it out manually:
dp[weight][n] = knapsack(A,B,weight,n-1,dp);
return dp[weight][n];
or, just as in the other solution
return dp[weight][n] = knapsack(A,B,weight,n-1,dp);
Ditto for the if branch.
(And no, variable names do not affect performance).
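Putting the fix together, a minimal sketch of the memoized version might look like the following. Each state is computed once, stored, and the stored value is returned. Taking the vectors by const reference is an extra change that is not discussed above: the version in the question takes A and B by value, so both are copied on every call, which likely adds to the running time as well. The free function solve and the parameter names are only illustrative, not the judge's required signature:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Top-down 0-1 knapsack. Each state (n, capacity) is computed once, stored in
// dp, and the stored value is returned, so no recursive call is repeated.
int knapsack(const std::vector<int>& values, const std::vector<int>& weights,
             int capacity, int n, std::vector<std::vector<int>>& dp) {
    if (n == 0 || capacity == 0) return 0;
    int& memo = dp[n][capacity];
    if (memo != -1) return memo;
    memo = knapsack(values, weights, capacity, n - 1, dp);  // skip item n-1
    if (weights[n - 1] <= capacity) {                       // or take it if it fits
        memo = std::max(memo, values[n - 1] +
                        knapsack(values, weights, capacity - weights[n - 1], n - 1, dp));
    }
    return memo;
}

int solve(const std::vector<int>& values, const std::vector<int>& weights, int capacity) {
    assert(values.size() == weights.size() && capacity >= 0);
    const int n = static_cast<int>(values.size());
    std::vector<std::vector<int>> dp(n + 1, std::vector<int>(capacity + 1, -1));
    return knapsack(values, weights, capacity, n, dp);
}
```

For example, solve({60, 100, 120}, {10, 20, 30}, 50) picks the second and third items.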

How can you find the cuboid with the greatest volume in a heightmap? (with low complexity)

I need to find the cuboid with the greatest volume, contained within a 2D-heightmap.
The heightmap is an array of size w*d where w is width, h is height and d is depth.
In C, this would look along the lines of:
unsigned heightmap[w][d]; // all values are <= h
I already know that there is a naive algorithm which can solve this with O(w*d*h) complexity.
However, I suspect that there is a more efficient method out there.
The naive algorithm works as follows, in pythonic pseudocode:
resultRectangle = None
resultHeight = None
resultVolume = -1
# iterate over all heights (a height of 0 can only give a volume of 0, so start at 1)
for loopHeight in range(1, h + 1):
    # create a 2D bitmap from our heightmap where a 1 represents a height >= loopHeight
    bool bitmap[w][d]
    for x in range(0, w):
        for y in range(0, d):
            bitmap[x][y] = heightmap[x][y] >= loopHeight
    # obtain the greatest-area rectangle at this particular height
    maxRectangle = maxRectangleInBitmap(bitmap)
    volume = maxRectangle.area() * loopHeight
    # compare it to our current maximum and replace it if we found a greater cuboid
    if volume > resultVolume:
        resultHeight = loopHeight
        resultVolume = volume
        resultRectangle = maxRectangle
resultCuboid = resultRectangle.withHeight(resultHeight)
Finding the largest rectangle consisting only of 1s in a bitmap is a known problem that can be solved in O(1) amortized time per pixel, or O(w*d) in our case.
The total complexity of the naive approach is thus O(w*h*d).
So as I already stated, I was wondering if we can beat this complexity.
Perhaps we can get it down to O(w*d * log(h)) by searching through heights more intelligently instead of "brute-forcing" all of them.
The answer to this question Find largest cuboid containing only 1's in an NxNxN binary array by Evgeny Kluev seems to take a similar approach, but it falsely(?) assumes that the volumes which we would find at these heights form a unimodal function.
If this was the case, we could use Golden Section Search to choose heights more intelligently, but I don't think we can.
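For reference, the naive approach described above could be sketched in C++ roughly as follows. The largest all-1 rectangle per height is found with the usual histogram-plus-stack technique; the function names largestInHistogram and largestCuboidVolume are invented for this sketch, and it returns only the best volume, not the cuboid itself:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Largest rectangle (by area) under a histogram, using a monotonic stack.
long long largestInHistogram(const std::vector<int>& bars) {
    std::vector<int> stack;  // indices of bars with strictly increasing heights
    long long best = 0;
    const int n = static_cast<int>(bars.size());
    for (int i = 0; i <= n; ++i) {
        const int current = (i == n) ? 0 : bars[i];  // sentinel flushes the stack
        while (!stack.empty() && bars[stack.back()] >= current) {
            const int height = bars[stack.back()];
            stack.pop_back();
            const int left = stack.empty() ? 0 : stack.back() + 1;
            best = std::max(best, static_cast<long long>(height) * (i - left));
        }
        stack.push_back(i);
    }
    return best;
}

// Naive search: for every candidate height, find the largest rectangle whose
// cells are all at least that tall. Runs in O(w * d * h).
long long largestCuboidVolume(const std::vector<std::vector<unsigned>>& heightmap,
                              unsigned maxHeight) {
    assert(!heightmap.empty());
    const std::size_t w = heightmap.size(), d = heightmap[0].size();
    long long best = 0;
    for (unsigned level = 1; level <= maxHeight; ++level) {
        std::vector<int> run(d, 0);  // run[y]: consecutive rows ending at x with height >= level
        long long area = 0;
        for (std::size_t x = 0; x < w; ++x) {
            for (std::size_t y = 0; y < d; ++y)
                run[y] = heightmap[x][y] >= level ? run[y] + 1 : 0;
            area = std::max(area, largestInHistogram(run));
        }
        best = std::max(best, area * level);
    }
    return best;
}
```

This is only the O(w*d*h) baseline to compare against; it does not answer the question of whether the height loop can be avoided.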

Here is an idea, with a significant assumption. Pseudocode:
P <- points from heightmap sorted by increasing height.
R <- set of rectangles. All maximal empty sub-rectangles for the current height.
R.add(Rectangle(0,0,W,H))
result = last_point_in(P).height()
foreach(p in P):
    RR <- rectangles from R that overlap p (can be found in O(size(RR)), possibly with some logarithmic factors)
    R = R - RR
    foreach(r in RR):
        result = max(result, r.area() * p.height())
        split up r, adding O(1) new rectangles to R.
return result
The assumption, which I have a gut feeling about but can't prove, is that RR will be O(1) size on average.
Edit: to clarify the "splitting", if we split at point p:
AAAAADFFF
AAAAADFFF
AAAAADFFF
BBBBBpGGG
CCCCCEHHH
CCCCCEHHH
We generate new rectangles consisting of:
ABC, CEH, FGH, ADF, and add them to R.

OK, another take. Most of the "meat" is in the go function. It uses the same "splitting" concept as in my other answer, but uses top-down dynamic programming with memoization. rmq2d implements a 2D Range Minimum Query. For size 1000x1000 it takes about 30 seconds (while using 3GB of memory).
#include <iostream>
#include <vector>
#include <cassert>
#include <set>
#include <tuple>
#include <cstring> // memcpy
#include <cstdlib> // rand
#include <climits>
using namespace std;
constexpr int ilog2(int x){
return 31 - __builtin_clz(x);
}
const int MAX_DIM = 100;
template<class T>
struct rmq2d{
struct point{
int x,y;
point():x(0),y(0){}
point(int x,int y):x(x),y(y){}
};
typedef point array_t[MAX_DIM][ilog2(MAX_DIM)+1][MAX_DIM];
int h, logh;
int w, logw;
vector<vector<T>> v;
array_t *A;
rmq2d(){A=nullptr;}
rmq2d &operator=(const rmq2d &other){
assert(sizeof(point)==8);
if(this == &other) return *this;
if(!A){
A = new array_t[ilog2(MAX_DIM)+1];
}
v=other.v;
h=other.h;
logh = other.logh;
w=other.w;
logw=other.logw;
memcpy(A, other.A, (ilog2(MAX_DIM)+1)*sizeof(array_t));
return *this;
}
rmq2d(const rmq2d &other){
A = nullptr;
*this = other;
}
~rmq2d(){
delete[] A;
}
T query(point pos){
return v[pos.y][pos.x];
}
rmq2d(vector<vector<T>> &v) : v(v){
A = new array_t[ilog2(MAX_DIM)+1];
h = (int)v.size();
logh = ilog2(h) + 1;
w = (int)v[0].size();
logw = ilog2(w) + 1;
for(int y=0; y<h; ++y){
for(int x=0;x<w;x++) A[0][y][0][x] = {x, y};
for(int jx=1; jx<logw; jx++){
int sz = 1<<(jx-1);
for(int x=0; x+sz < w; x++){
point i1 = A[0][y][jx-1][x];
point i2 = A[0][y][jx-1][x+sz];
if(query(i1) < query(i2)){
A[0][y][jx][x] = i1;
}else{
A[0][y][jx][x] = i2;
}
}
}
}
for(int jy=1; jy<logh; ++jy){
int sz = 1<<(jy-1);
for(int y=0; y+sz<h; ++y){
for(int jx=0; jx<logw; ++jx){
for(int x=0; x<w; ++x){
point i1 = A[jy-1][y][jx][x];
point i2 = A[jy-1][y+sz][jx][x];
if(query(i1) < query(i2)){
A[jy][y][jx][x] = i1;
}else{
A[jy][y][jx][x] = i2;
}
}
}
}
}
}
point pos_q(int x1, int x2, int y1, int y2){
assert(A);
int lenx = ilog2(x2 - x1);
int leny = ilog2(y2 - y1);
point idxs[] = {
A[leny][y1][lenx][x1],
A[leny][y2-(1<<leny)][lenx][x1],
A[leny][y1][lenx][x2-(1<<lenx)],
A[leny][y2-(1<<leny)][lenx][x2-(1<<lenx)]
};
point ret = idxs[0];
for(int i=1; i<4; ++i){
if(query(ret) > query(idxs[i])) ret = idxs[i];
}
return ret;
}
T val_q(int x1, int x2, int y1, int y2){
point pos = pos_q(x1,x2,y1,y2);
return v[pos.y][pos.x];
}
};
rmq2d<long long> rmq;
set<tuple<int, int, int ,int>> cac;
vector<vector<long long>> v(MAX_DIM-5,vector<long long>(MAX_DIM-5,0));
long long ret = 0;
int nq = 0;
void go(int x1, int x2, int y1, int y2){
if(x1 >= x2 || y1>=y2) return;
if(!cac.insert(make_tuple(x1,y1,x2,y2)).second) return;
++nq;
auto p = rmq.pos_q(x1, x2, y1, y2);
long long cur = v[p.y][p.x]*(x2-x1)*(y2-y1);
if(cur > ret){
cout << x1 << "-" << x2 << ", " << y1 << "-" << y2 << " h=" << v[p.y][p.x] << " :" << cur << endl;
ret = cur;
}
go(p.x+1, x2, y1, y2);
go(x1, p.x, y1, y2);
go(x1, x2, p.y+1, y2);
go(x1, x2, y1, p.y);
}
int main(){
int W = (int)v[0].size();
int H=(int)v.size();
for(int y=0; y<H;++y){
for(int x=0; x<W; ++x){
v[y][x] = rand()%10000;
}
}
rmq = rmq2d<long long>(v);
go(0,W, 0, H);
cout << "nq:" << nq << endl;
}

Smallest Multiple of given number With digits only 0 and 1

You are given an integer N. You have to find smallest multiple of N which consists of digits 0 and 1 only. Since this multiple could be large, return it in form of a string.
Returned string should not contain leading zeroes.
For example,
For N = 55, 110 is smallest multiple consisting of digits 0 and 1.
For N = 2, 10 is the answer.
I saw several related problems, but I could not find the problem with my code.
Here is my code giving TLE on some cases even after using map instead of set.
#define ll long long
int getMod(string s, int A)
{
int res=0;
for(int i=0;i<s.length();i++)
{
res=res*10+(s[i]-'0');
res%=A;
}
return res;
}
string Solution::multiple(int A) {
if(A<=1)
return to_string(A);
queue<string>q;
q.push("1");
set<int>st;
string s="1";
while(!q.empty())
{
s=q.front();
q.pop();
int mod=getMod(s,A);
if(mod==0)
{
return s;
}
else if(st.find(mod)==st.end())
{
st.insert(mod);
q.push(s+"0");
q.push(s+"1");
}
}
}

Here is an implementation in Raku.
my $n = 55;
(1 .. Inf).map( *.base(2) ).first( * %% $n );
(1 .. Inf) is a lazy list from one to infinity. The "whatever star" * establishes a closure and stands for the current element in the map.
base is a method of Raku's Num type which returns a string representation of a given number in the wanted base, here a binary string.
first returns the current element when the "whatever star" closure holds true for it.
The %% is the divisible by operator, it implicitly casts its left side to Int.
Oh, and to top it off. It's easy to parallelize this, so your code can use multiple cpu cores:
(1 .. Inf).race( :batch(1000), :degree(4) ).map( *.base(2) ).first( * %% $n );

As mentioned in the "math" reference, the result is related to the congruence of the power of 10 modulo A.
If
n = sum_i a[i] 10^i
then
n modulo A = sum_i a[i] b[i]
Where the a[i] are equal to 0 or 1, and the b[i] = (10^i) modulo A
Then the problem is to find the minimum a[i] sequence, such that the sum is equal to 0 modulo A.
From a graph point of view, we have to find the shortest path to zero modulo A.
A BFS is generally well suited to finding such a path. The issue is the possible exponential increase in the number of nodes to visit. Here, we are sure to get fewer than A nodes, by rejecting the nodes whose sum (modulo A) has already been obtained (see the vector used in the program). Note that this rejection is needed in order to get the minimum number at the end.
Here is a program in C++. The solution being quite simple, it should be easy to understand even for those not familiar with C++.
#include <iostream>
#include <string>
#include <vector>
struct node {
int sum = 0;
std::string s;
};
std::string multiple (int A) {
// handle the trivial cases first: 10 % A below would divide by zero for A == 0
if (A == 0) return "0";
if (A == 1) return "1";
std::vector<std::vector<node>> nodes (2);
std::vector<bool> used (A, false);
int range = 0;
int ten = 10 % A;
int pow_ten = 1;
nodes[range].push_back (node{0, "0"});
nodes[range].push_back (node{1, "1"});
used[1] = true;
while (1) {
int range_new = (range + 1) % 2;
nodes[range_new].resize(0);
pow_ten = (pow_ten * ten) % A;
for (node &x: nodes[range]) {
node y = x;
y.s = "0" + y.s;
nodes[range_new].push_back(y);
y = x;
y.sum = (y.sum + pow_ten) % A;
if (used[y.sum]) continue;
used[y.sum] = true;
y.s = "1" + y.s;
if (y.sum == 0) return y.s;
nodes[range_new].push_back(y);
}
range = range_new;
}
}
int main() {
std::cout << "input number: ";
int n;
std::cin >> n;
std::cout << "Result = " << multiple(n) << "\n";
return 0;
}
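As a variation on the same remainder-graph idea, here is a sketch that appends digits on the right instead of prepending them, and stores only one parent pointer and one digit per remainder rather than a full string per node. This is a different formulation from the program above, not a rewrite of it, and the function name smallestZeroOneMultiple is made up for the example. It assumes A >= 1:

```cpp
#include <algorithm>
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Breadth-first search over remainders modulo A. Appending a digit d to a
// number with remainder r gives remainder (10 * r + d) % A, so there are at
// most A states. Only a parent pointer and a digit are stored per state.
std::string smallestZeroOneMultiple(int A) {
    assert(A >= 1);
    if (A == 1) return "1";
    std::vector<int> parent(A, -1);
    std::vector<char> digit(A, 0);  // 0 means "remainder not reached yet"
    std::queue<int> pending;
    digit[1] = '1';                 // the number "1" has remainder 1 (since A > 1)
    pending.push(1);
    while (!pending.empty()) {
        const int r = pending.front();
        pending.pop();
        if (r == 0) break;
        for (char d : {'0', '1'}) {  // '0' first, so smaller numbers are reached first
            const int next = static_cast<int>((10LL * r + (d - '0')) % A);
            if (digit[next] != 0) continue;  // this remainder was already reached
            digit[next] = d;
            parent[next] = r;
            pending.push(next);
        }
    }
    std::string result;  // walk the parent chain back from remainder 0
    for (int r = 0; r != -1; r = parent[r]) result.push_back(digit[r]);
    std::reverse(result.begin(), result.end());
    return result;
}
```

Memory use is two arrays of size A, which should be easier on large inputs than keeping a string for every node.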
EDIT
The above program uses a kind of memoization to speed up the process, but for large inputs the memory use becomes too large.
As indicated in a comment, for example, it cannot handle the case N = 60000007.
I improved the speed and the range a little bit with the following modifications:
A function (reduction) was created to simplify the search when the input number is divisible by 2 or 5
For the memorization of the nodes (nodes array), only one array is used now instead of two
A kind of meet-in-the-middle procedure is used: in a first step, a function gene_mem memorizes all relevant 01 sequences up to N_DIGIT_MEM (=20) digits. Then the main procedure multiple2 generates valid 01 sequences "after the 20 first digits" and then looks in the memory for a "complementary sequence" such that the concatenation of both is a valid sequence
With this new program, the case N = 60000007 gives the correct result (100101000001001010011110111, 27 digits) in about 600ms on my PC.
EDIT 2
Instead of limiting the number of digits for the memorization in the first step, I now use a threshold on the size of the memory, as this size does not depend only on the number of digits but also on the input number. Note that the optimal value of this threshold would depend on the input number. Here, I selected a threshold of 50k as a compromise. With a threshold of 20k, for 60000007, I obtain the correct result in 36 ms. Besides, with a threshold of 100k, the worst case 99999999 is solved in 5s.
I made different tests with values less than 10^9. In almost all tested cases, the result is provided in less than 1s. However, I met a corner case, N=99999999, for which the result consists of 72 consecutive "1". In this particular case, the program takes about 6.7s. For 60000007, the correct result is obtained in 69ms.
Here is the new program:
#include <iostream>
#include <string>
#include <vector>
#include <map>
#include <unordered_map>
#include <chrono>
#include <cmath>
#include <algorithm>
std::string reverse (std::string s) {
std::string res {s.rbegin(), s.rend()};
return res;
}
struct node {
int sum = 0;
std::string s;
node (int sum_ = 0, std::string s_ = ""): sum(sum_), s(s_) {};
};
// This function simplifies the search when the input number is divisible by 2 or 5
node reduction (int &X, long long &pow_ten) {
node init {0, ""};
while (1) {
int digit = X % 10;
if (digit == 1 || digit == 3 || digit == 7 || digit == 9) break;
switch (digit) {
case(0):
X /= 10;
break;
case(2):
case(4):
case(6):
case(8):
X = (5*X)/10;
break;
case(5):
X = (2*X)/10;
break;
}
init.s.push_back('0');
pow_ten = (pow_ten * 10) % X;
}
return init;
}
const int N_DIGIT_MEM = 30; // 20
const int threshold_size_mem = 50000;
// This function memorizes all relevant 01 sequences up to N_DIGIT_MEM digits
bool gene_mem (int X, long long &pow_ten, int index_max, std::map<int, std::string> &mem, node &result) {
std::vector<node> nodes;
std::vector<bool> used (X, false);
bool start = true;
for (int index = 0; index < index_max; ++index){
if (start) {
node x = {int(pow_ten), "1"};
nodes.push_back (x);
} else {
for (node &x: nodes) {
x.s.push_back('0');
}
int n = nodes.size();
for (int i = 0; i < n; ++i) {
node y = nodes[i];
y.sum = (y.sum + pow_ten) % X;
y.s.back() = '1';
if (used[y.sum]) continue;
used[y.sum] = true;
if (y.sum == 0) {
result = y;
return true;
}
nodes.push_back(y);
}
}
pow_ten = (10 * pow_ten) % X;
start = false;
int n_mem = nodes.size();
if (n_mem > threshold_size_mem) {
break;
}
}
for (auto &x: nodes) {
mem[x.sum] = x.s;
}
//std::cout << "size mem = " << mem.size() << "\n";
return false;
}
// This function generates valid 01 sequences "after the 20 first digits" and then in the memory
// looks for a "complementary sequence" such that the concatenation of both is a valid sequence
std::string multiple2 (int A) {
std::vector<node> nodes;
std::map<int, std::string> mem;
if (A == 0) return "0"; // must come before 10 % A to avoid a division by zero
int ten = 10 % A;
long long pow_ten = 1;
int digit;
int X = A;
node init = reduction (X, pow_ten);
if (X != A) ten = ten % X;
if (X == 1) {
init.s.push_back('1');
return reverse(init.s);
}
std::vector<bool> used (X, false);
node result;
int index_max = N_DIGIT_MEM;
if (gene_mem (X, pow_ten, index_max, mem, result)) {
return reverse(init.s + result.s);
}
node init2 {0, ""};
nodes.push_back(init2);
while (1) {
for (node &x: nodes) {
x.s.push_back('0');
}
int n = nodes.size();
for (int i = 0; i < n; ++i) {
node y = nodes[i];
y.sum = (y.sum + pow_ten) % X;
if (used[y.sum]) continue;
used[y.sum] = true;
y.s.back() = '1';
if (y.sum != 0) {
int target = X - y.sum;
auto search = mem.find(target);
if (search != mem.end()) {
//std::cout << "mem size 2nd step = " << nodes.size() << "\n";
return reverse(init.s + search->second + y.s);
}
}
nodes.push_back(y);
}
pow_ten = (pow_ten * ten) % X;
}
}
int main() {
std::cout << "input number: ";
int n;
std::cin >> n;
std::string res;
auto t1 = std::chrono::high_resolution_clock::now();
res = multiple2(n);
std::cout << "Result = " << res << " ndigit = " << res.size() << std::endl;
auto t2 = std::chrono::high_resolution_clock::now();
auto duration2 = std::chrono::duration_cast<std::chrono::microseconds>( t2 - t1 ).count();
std::cout << "time = " << duration2/1000 << " ms" << std::endl;
return 0;
}

For people more familiar with Python, here is a converted version of @Damien's code. Damien's important insight is to strongly reduce the search tree, taking advantage of the fact that each partial sum only needs to be investigated once, namely the first time it is encountered.
The problem is also described at Mathpuzzle, but there they mostly fix on the necessary existence of a solution. There's also code mentioned at the online encyclopedia of integer sequences. The sage version seems to be somewhat similar.
I made a few changes:
Starting with an empty list helps to correctly solve A=1 while simplifying the code. The multiplication by 10 is moved to the end of the loop. Doing the same for 0 seems to be hard, as log10(0) is minus infinity.
Instead of alternating between nodes[range] and nodes[new_range], two different lists are used.
As Python supports integers of arbitrary precision, the partial results could be stored as decimal or binary numbers instead of as strings. This is not yet done in the code below.
from collections import namedtuple
node = namedtuple('node', 'sum str')
def find_multiple_ones_zeros(A):
    nodes = [node(0, "")]
    used = set()
    pow_ten = 1
    while True:
        new_nodes = []
        for x in nodes:
            # prepend a 0: the partial sum is unchanged
            y = node(x.sum, "0" + x.str)
            new_nodes.append(y)
            # prepend a 1: only keep it if this partial sum is new
            next_sum = (x.sum + pow_ten) % A
            if next_sum in used:
                continue
            used.add(next_sum)
            y = node(next_sum, "1" + x.str)
            if next_sum == 0:
                return y.str
            new_nodes.append(y)
        pow_ten = (pow_ten * 10) % A
        nodes = new_nodes

minimum length window in string1 where string2 is subsequence

The main DNA sequence (a string) is given (let's say string1), along with another string to search for (let's say string2). You have to find the minimum-length window in string1 of which string2 is a subsequence.
string1 = "abcdefababaef"
string2 = "abf"
Approaches that I thought of, but which do not seem to work:
1. Use the longest common subsequence (LCS) approach and check whether the length of the LCS equals the length of string2. But this only tells me whether string2 is present in string1 as a subsequence, not the smallest window.
2. KMP algo, but not sure how to modify it.
3. Prepare a map of {characters: pos of characters} of string1 which are in string2. Like:
{ a : 0,6,8,10
b : 1,7,9
f : 5,12 }
And then some approach to find min window and still maintaining the order of "abf"
I am not sure whether I am thinking in right directions or am I totally off.
Is there a known algorithm for this, or does anyone know any approach? Kindly suggest.
Thanks in advance.

You can do LCS and find all the longest common subsequences of String1 and String2 using recursion on the DP table of the LCS result. Then calculate the window length of each LCS and take the minimum. You can also stop a branch if it already exceeds the size of the current smallest window found.
See "Reading out all LCS":
http://en.wikipedia.org/wiki/Longest_common_subsequence_problem

Dynamic Programming!
Here is a C++ implementation
#include <iostream>
#include <string>
#include <vector>
using namespace std;
int main() {
string a, b;
cin >> a >> b;
int m = a.size(), n = b.size();
int inf = 100000000;
vector < vector < int > > dp (n + 1, vector < int > (m + 1, inf)); // length of min string a[j...k] such that b[i...] is a subsequence of a[j...k]
dp[n] = vector < int > (m + 1, 0); // b[n...] = "", so dp[n][i] = 0 for each i
for (int i = n - 1; i >= 0; --i) {
for (int j = m - 1; j >= 0; --j) {
if(b[i] == a[j]) dp[i][j] = 1 + dp[i+1][j+1];
else dp[i][j] = 1 + dp[i][j+1];
}
}
int l, r, min_len = inf;
for (int i = 0; i < m; ++i) {
if(dp[0][i] < min_len) {
min_len = dp[0][i];
l = i, r = i + min_len;
}
}
if(min_len == inf) {
cout << "no solution!\n";
} else {
for (int i = l; i < r; ++i) {
cout << a[i];
}
cout << '\n';
}
return 0;
}
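For comparison, the same problem can also be approached without a DP table, using a forward scan to find a match and a backward scan to tighten its left end. This is a different technique from the DP above, shown here only as a rough sketch; the function name minWindowSubsequence is made up, and on ties it keeps the leftmost window:

```cpp
#include <cassert>
#include <string>

// Smallest window of `text` that contains `pattern` as a subsequence.
// Returns an empty string if there is no such window.
// Worst case O(|text| * |pattern|) time, O(1) extra space.
std::string minWindowSubsequence(const std::string& text, const std::string& pattern) {
    assert(!pattern.empty());
    const int m = static_cast<int>(text.size());
    const int n = static_cast<int>(pattern.size());
    int bestStart = -1, bestLen = m + 1;
    int i = 0;
    while (i < m) {
        // Forward pass: greedily match the whole pattern, scanning from i.
        int j = 0, end = i;
        while (end < m && j < n) {
            if (text[end] == pattern[j]) ++j;
            ++end;
        }
        if (j < n) break;  // the pattern cannot be matched any further right
        // Backward pass: match the pattern in reverse from the window's last
        // character to find the latest possible start for this end.
        int start = end - 1;
        for (j = n - 1; j >= 0; --start) {
            if (text[start] == pattern[j]) --j;
        }
        ++start;  // start now points at the first character of the window
        if (end - start < bestLen) {
            bestLen = end - start;
            bestStart = start;
        }
        i = start + 1;  // continue searching just after this window's start
    }
    return bestStart < 0 ? std::string() : text.substr(bestStart, bestLen);
}
```

With the strings from the question, minWindowSubsequence("abcdefababaef", "abf") returns "abaef".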

I found a similar interview question on CareerCup, the only difference being that it uses an array of integers instead of characters. I borrowed an idea and made a few changes; let me know if you have any questions after reading this C++ code.
What I am trying to do here is : The for loop in the main function is used to loop over all elements of the given array and find positions where I encounter the first element of the subarray, once found, I call the find_subsequence function where I recursively match the elements of the given array to the subarray at the same time preserving the order of elements. Finally, find_subsequence returns the position and I calculate the size of the subsequence.
Please excuse my English, wish I could explain it better.
#include <climits> // INT_MAX
#include <iostream>
#include <vector>
using namespace std;
class Solution {
public:
    // Returns the index in s where the last element of c is matched, or -1 if c cannot be completed.
    int find_subsequence(const vector<int>& s, const vector<int>& c, int arrayStart, int subArrayStart) {
        if (subArrayStart == (int)c.size() - 1) return arrayStart; // all of c has been matched
        if (arrayStart + 1 >= (int)s.size()) return -1; // ran out of elements in s
        if (s[arrayStart + 1] == c[subArrayStart + 1])
            return find_subsequence(s, c, arrayStart + 1, subArrayStart + 1);
        else
            return find_subsequence(s, c, arrayStart + 1, subArrayStart);
    }
};
int main()
{
vector<int> v = { 1,5,3,5,6,7,8,5,6,8,7,8,0,7 };
vector<int> c = { 5,6,8,7 };
Solution s;
int size = INT_MAX;
int j = -1;
for (int i = 0; i <v.size(); i++) {
if(v[i]==c[0]){
int x = s.find_subsequence(v, c, i-1, -1);
if (x > -1) {
if (x - i + 1 < size) {
size = x - i + 1;
j = i;
}
if (size == c.size())
break;
}
}
}
cout << size <<" "<<j;
return 0;
}

Connected Component Labeling - Implementation

I have asked a similar question some days ago, but I have yet to find an efficient way of solving my problem.
I'm developing a simple console game, and I have a 2D array like this:
1,0,0,0,1
1,1,0,1,1
0,1,0,0,1
1,1,1,1,0
0,0,0,1,0
I am trying to find all the areas that consist of neighboring 1's (4-way connectivity). So, in this example the 2 areas are as following:
1
1,1
  1
1,1,1,1
      1
and:
        1
      1,1
        1
The algorithm, that I've been working on, finds all the neighbors of the neighbors of a cell and works perfectly fine on this kind of matrices. However, when I use bigger arrays (like 90*90) the program is very slow and sometimes the huge arrays that are used cause stack overflows.
One guy on my other question told me about connected-component labelling as an efficient solution to my problem.
Can somebody show me any C++ code which uses this algorithm, because I'm kinda confused about how it actually works along with this disjoint-set data structure thing...
Thanks a lot for your help and time.

I'll first give you the code and then explain it a bit:
// direction vectors
const int dx[] = {+1, 0, -1, 0};
const int dy[] = {0, +1, 0, -1};
// matrix dimensions
int row_count;
int col_count;
// the input matrix (MAX is an assumed upper bound on the dimensions; adjust as needed)
const int MAX = 100;
int m[MAX][MAX];
// the labels, 0 means unlabeled
int label[MAX][MAX];
void dfs(int x, int y, int current_label) {
if (x < 0 || x == row_count) return; // out of bounds
if (y < 0 || y == col_count) return; // out of bounds
if (label[x][y] || !m[x][y]) return; // already labeled or not marked with 1 in m
// mark the current cell
label[x][y] = current_label;
// recursively mark the neighbors
for (int direction = 0; direction < 4; ++direction)
dfs(x + dx[direction], y + dy[direction], current_label);
}
void find_components() {
int component = 0;
for (int i = 0; i < row_count; ++i)
for (int j = 0; j < col_count; ++j)
if (!label[i][j] && m[i][j]) dfs(i, j, ++component);
}
This is a common way of solving this problem.
The direction vectors are just a nice way to find the neighboring cells (in each of the four directions).
The dfs function performs a depth-first search of the grid. That simply means it will visit all the cells reachable from the starting cell. Each cell will be marked with current_label.
The find_components function goes through all the cells of the grid and starts a component labeling if it finds an unlabeled cell (marked with 1).
This can also be done iteratively using a stack.
If you replace the stack with a queue, you obtain the bfs or breadth-first-search.
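Since the question mentions stack overflows on larger grids, here is a rough sketch of the queue-based (BFS) variant. It uses std::vector instead of the fixed-size global arrays above, and the names Grid and labelComponents are invented for this example:

```cpp
#include <cassert>
#include <queue>
#include <utility>
#include <vector>

using Grid = std::vector<std::vector<int>>;

// Labels 4-connected components of 1-cells with 1, 2, 3, ... (0 = background)
// using an explicit queue, so no deep recursion is needed.
// Returns the number of components; `labels` receives the label grid.
int labelComponents(const Grid& grid, Grid& labels) {
    assert(!grid.empty());
    const int rows = static_cast<int>(grid.size());
    const int cols = static_cast<int>(grid[0].size());
    const int dx[] = {+1, 0, -1, 0};
    const int dy[] = {0, +1, 0, -1};
    labels.assign(rows, std::vector<int>(cols, 0));
    int component = 0;
    for (int i = 0; i < rows; ++i) {
        for (int j = 0; j < cols; ++j) {
            if (!grid[i][j] || labels[i][j]) continue;
            labels[i][j] = ++component;  // start a new component here
            std::queue<std::pair<int, int>> frontier;
            frontier.push({i, j});
            while (!frontier.empty()) {
                const std::pair<int, int> cell = frontier.front();
                frontier.pop();
                for (int dir = 0; dir < 4; ++dir) {
                    const int nx = cell.first + dx[dir];
                    const int ny = cell.second + dy[dir];
                    if (nx < 0 || nx >= rows || ny < 0 || ny >= cols) continue;
                    if (!grid[nx][ny] || labels[nx][ny]) continue;
                    labels[nx][ny] = component;  // mark when enqueued, not when dequeued
                    frontier.push({nx, ny});
                }
            }
        }
    }
    return component;
}
```

Marking a cell at the moment it is enqueued keeps every cell in the queue at most once, so the work stays proportional to the grid size.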

This can be solved with union find (although DFS, as shown in the other answer, is probably a bit simpler).
The basic idea behind this data structure is to repeatedly merge elements in the same component. This is done by representing each component as a tree (with nodes keeping track of their own parent, instead of the other way around). You can check whether 2 elements are in the same component by traversing to their root nodes and comparing them, and you can merge two components by simply making one root the parent of the other root.
A short code sample demonstrating this:
#include <iostream>
using namespace std;
const int w = 5, h = 5;
int input[w][h] = {{1,0,0,0,1},
{1,1,0,1,1},
{0,1,0,0,1},
{1,1,1,1,0},
{0,0,0,1,0}};
int component[w*h];
void doUnion(int a, int b)
{
// get the root component of a and b, and set the one's parent to the other
while (component[a] != a)
a = component[a];
while (component[b] != b)
b = component[b];
component[b] = a;
}
void unionCoords(int x, int y, int x2, int y2)
{
if (y2 < h && x2 < w && input[x][y] && input[x2][y2])
doUnion(x*h + y, x2*h + y2);
}
int main()
{
for (int i = 0; i < w*h; i++)
component[i] = i;
for (int x = 0; x < w; x++)
for (int y = 0; y < h; y++)
{
unionCoords(x, y, x+1, y);
unionCoords(x, y, x, y+1);
}
// print the array
for (int x = 0; x < w; x++)
{
for (int y = 0; y < h; y++)
{
if (input[x][y] == 0)
{
cout << ' ';
continue;
}
int c = x*h + y;
while (component[c] != c) c = component[c];
cout << (char)('a'+c);
}
cout << "\n";
}
}
Live demo.
The above will show each group of ones using a different letter of the alphabet.
p   i
pp ii
 p  i
pppp
   p
It should be easy to modify this to get the components separately or get a list of elements corresponding to each component. One idea is to replace cout << (char)('a'+c); above with componentMap[c].push_back(Point(x,y)) with componentMap being a map<int, list<Point>> - each entry in this map will then correspond to a component and give a list of points.
There are various optimisations to improve the efficiency of union find, the above is just a basic implementation.
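As an illustration of two of the usual optimisations, path compression and union by size, a sketch might look like this. The class name DisjointSet and the helper countComponents are made up for the example and are not part of the code above:

```cpp
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

// Disjoint-set forest with path compression and union by size.
class DisjointSet {
public:
    explicit DisjointSet(int n) : parent_(n), size_(n, 1) {
        std::iota(parent_.begin(), parent_.end(), 0);  // every element starts as its own root
    }
    int find(int a) {
        assert(a >= 0 && a < static_cast<int>(parent_.size()));
        int root = a;
        while (parent_[root] != root) root = parent_[root];
        while (parent_[a] != root) {  // path compression: point the whole chain at the root
            const int next = parent_[a];
            parent_[a] = root;
            a = next;
        }
        return root;
    }
    bool unite(int a, int b) {  // returns true if two different sets were merged
        a = find(a);
        b = find(b);
        if (a == b) return false;
        if (size_[a] < size_[b]) std::swap(a, b);  // attach the smaller tree to the larger
        parent_[b] = a;
        size_[a] += size_[b];
        return true;
    }
private:
    std::vector<int> parent_, size_;
};

// Counts 4-connected components of 1-cells in a rectangular grid.
int countComponents(const std::vector<std::vector<int>>& grid) {
    const int rows = static_cast<int>(grid.size());
    const int cols = rows ? static_cast<int>(grid[0].size()) : 0;
    DisjointSet sets(rows * cols);
    int components = 0;
    for (int x = 0; x < rows; ++x)
        for (int y = 0; y < cols; ++y) {
            if (!grid[x][y]) continue;
            ++components;  // count the cell, then subtract one per successful merge
            if (x + 1 < rows && grid[x + 1][y] && sets.unite(x * cols + y, (x + 1) * cols + y)) --components;
            if (y + 1 < cols && grid[x][y + 1] && sets.unite(x * cols + y, x * cols + y + 1)) --components;
        }
    return components;
}
```

With both optimisations, each operation takes nearly constant amortized time, whereas the basic version can degrade to long parent chains.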

You could also try this transitive closure approach. However, the triple loop for the transitive closure slows things down when there are many separate objects in the image. Suggested code changes are welcome.
Cheers
Dave
void CC(unsigned char* pBinImage, unsigned char* pOutImage, int width, int height, int CON8)
{
int i, j, x, y, k, maxIndX, maxIndY, sum, ct, newLabel=1, count, maxVal=0, sumVal=0, maxEQ=10000;
int *eq=NULL, list[4];
int bAdd;
memcpy(pOutImage, pBinImage, width*height*sizeof(unsigned char));
unsigned char* equivalences=(unsigned char*) calloc(sizeof(unsigned char), maxEQ*maxEQ);
// modify labels this should be done with iterators to modify elements
// current column
for(j=0; j<height; j++)
{
// current row
for(i=0; i<width; i++)
{
if(pOutImage[i+j*width]>0)
{
count=0;
// go through blocks
list[0]=0;
list[1]=0;
list[2]=0;
list[3]=0;
if(j>0)
{
if((i>0))
{
if((pOutImage[(i-1)+(j-1)*width]>0) && (CON8 > 0))
list[count++]=pOutImage[(i-1)+(j-1)*width];
}
if(pOutImage[i+(j-1)*width]>0)
{
for(x=0, bAdd=true; x<count; x++)
{
if(pOutImage[i+(j-1)*width]==list[x])
bAdd=false;
}
if(bAdd)
list[count++]=pOutImage[i+(j-1)*width];
}
if(i<width-1)
{
if((pOutImage[(i+1)+(j-1)*width]>0) && (CON8 > 0))
{
for(x=0, bAdd=true; x<count; x++)
{
if(pOutImage[(i+1)+(j-1)*width]==list[x])
bAdd=false;
}
if(bAdd)
list[count++]=pOutImage[(i+1)+(j-1)*width];
}
}
}
if(i>0)
{
if(pOutImage[(i-1)+j*width]>0)
{
for(x=0, bAdd=true; x<count; x++)
{
if(pOutImage[(i-1)+j*width]==list[x])
bAdd=false;
}
if(bAdd)
list[count++]=pOutImage[(i-1)+j*width];
}
}
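// no labelled neighbour: start a new label, otherwise reuse a neighbour's label
// note: labels are stored in unsigned char, so more than 255 provisional labels will wrap around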
if(count==0)
pOutImage[i+j*width]=newLabel++;
else
{
pOutImage[i+j*width]=list[0];
if(count>1)
{
// store equivalences in table
for(x=0; x<count; x++)
for(y=0; y<count; y++)
equivalences[list[x]+list[y]*maxEQ]=1;
}
}
}
}
}
// floyd-Warshall algorithm - transitive closure - slow though :-(
for(i=0; i<newLabel; i++)
for(j=0; j<newLabel; j++)
{
if(equivalences[i+j*maxEQ]>0)
{
for(k=0; k<newLabel; k++)
{
equivalences[k+j*maxEQ]= equivalences[k+j*maxEQ] || equivalences[k+i*maxEQ];
}
}
}
eq=(int*) calloc(sizeof(int), newLabel);
for(i=0; i<newLabel; i++)
for(j=0; j<newLabel; j++)
{
if(equivalences[i+j*maxEQ]>0)
{
eq[i]=j;
break;
}
}
free(equivalences);
// label image with equivalents
for(i=0; i<width*height; i++)
{
if(pOutImage[i]>0&&eq[pOutImage[i]]>0)
pOutImage[i]=eq[pOutImage[i]];
}
free(eq);
}
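One possible change to the code above, since suggestions were asked for: replace the equivalence matrix and the transitive closure loop with a small union-find over the provisional labels. This is a different technique from the one posted, written as a sketch only. The names labelTwoPass and findRoot are invented for this example, and it returns int labels in a std::vector instead of writing into an unsigned char buffer, so the label count is not limited to 255.

```cpp
#include <cassert>
#include <vector>

// Root of label a in the equivalence forest, with path halving.
static int findRoot(std::vector<int>& parent, int a) {
    while (parent[a] != a) {
        parent[a] = parent[parent[a]];
        a = parent[a];
    }
    return a;
}

// Two-pass labelling. Returns one label per pixel: 0 for background,
// 1..k for the k connected components. Pixels are row-major (x + y*width).
std::vector<int> labelTwoPass(const std::vector<unsigned char>& bin,
                              int width, int height, bool con8) {
    assert(static_cast<int>(bin.size()) == width * height);
    std::vector<int> out(bin.size(), 0);
    std::vector<int> parent(1, 0);  // parent[0] is a dummy entry for the background

    // First pass: take a label from the already-visited neighbours
    // (left, up, and the two upper diagonals when con8 is set).
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++) {
            const int p = x + y * width;
            if (!bin[p]) continue;
            int seen[4], count = 0;
            if (x > 0 && out[p - 1]) seen[count++] = out[p - 1];
            if (y > 0) {
                if (out[p - width]) seen[count++] = out[p - width];
                if (con8 && x > 0 && out[p - width - 1]) seen[count++] = out[p - width - 1];
                if (con8 && x < width - 1 && out[p - width + 1]) seen[count++] = out[p - width + 1];
            }
            if (count == 0) {
                // No labelled neighbour: create a new provisional label.
                const int label = static_cast<int>(parent.size());
                parent.push_back(label);
                out[p] = label;
            } else {
                // Merge the labels of all labelled neighbours into one set.
                const int root = findRoot(parent, seen[0]);
                for (int i = 1; i < count; i++)
                    parent[findRoot(parent, seen[i])] = root;
                out[p] = root;
            }
        }

    // Second pass: replace each provisional label by its root,
    // renumbered so that the final labels are consecutive.
    std::vector<int> finalLabel(parent.size(), 0);
    int next = 0;
    for (int& l : out) {
        if (!l) continue;
        const int r = findRoot(parent, l);
        if (!finalLabel[r]) finalLabel[r] = ++next;
        l = finalLabel[r];
    }
    return out;
}
```

The equivalence bookkeeping here is a single array with one entry per provisional label, so there is no maxEQ*maxEQ table to allocate and no triple loop to run.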

Very useful document => https://docs.google.com/file/d/0B8gQ5d6E54ZDM204VFVxMkNtYjg/edit
Java application (open source) that extracts objects from an image using connected component labeling => https://drive.google.com/file/d/0B8gQ5d6E54ZDTVdsWE1ic2lpaHM/edit?usp=sharing
import java.util.ArrayList;
public class cclabeling
{
int neighbourindex;ArrayList<Integer> Temp;
ArrayList<ArrayList<Integer>> cc=new ArrayList<>();
public int[][][] cclabel(boolean[] Main,int w){
/* This method returns an array of arrays "xycc"; each array contains
the x,y coordinates of the pixels of one connected component.
- Main => binary array of the image (it is cleared as pixels are visited)
- w => width of the image */
long start=System.nanoTime();
int len=Main.length;int id=0;
int[] dir={-w-1,-w,-w+1,-1,+1,+w-1,+w,+w+1};
for(int i=0;i<len;i+=1){
if(Main[i]){
Temp=new ArrayList<>();
Temp.add(i);
for(int x=0;x<Temp.size();x+=1){
id=Temp.get(x);
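for(int u=0;u<8;u+=1){
neighbourindex=id+dir[u];
// skip neighbours that fall outside the array or wrap around to another row
if(neighbourindex<0||neighbourindex>=len||Math.abs(neighbourindex%w-id%w)>1)continue;
if(Main[neighbourindex]){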
Temp.add(neighbourindex);
Main[neighbourindex]=false;
}
}
Main[id]=false;
}
cc.add(Temp);
}
}
int[][][] xycc=new int[cc.size()][][];
int x;int y;
for(int i=0;i<cc.size();i+=1){
xycc[i]=new int[cc.get(i).size()][2];
for(int v=0;v<cc.get(i).size();v+=1){
y=Math.round(cc.get(i).get(v)/w);
x=cc.get(i).get(v)-y*w;
xycc[i][v][0]=x;
xycc[i][v][1]=y;
}
}
long end=System.nanoTime();
long time=end-start;
System.out.println("Connected Component Labeling Time =>"+time/1000000+" milliseconds");
System.out.println("Number Of Shapes => "+xycc.length);
return xycc;
}
}

Please find below sample code for connected component labeling. The code is written in Java.
package addressextraction;
public class ConnectedComponentLabelling {
int[] dx={+1, 0, -1, 0};
int[] dy={0, +1, 0, -1};
int row_count=0;
int col_count=0;
int[][] m;
int[][] label;
public ConnectedComponentLabelling(int row_count,int col_count) {
this.row_count=row_count;
this.col_count=col_count;
m=new int[row_count][col_count];
label=new int[row_count][col_count];
}
void dfs(int x, int y, int current_label) {
if (x < 0 || x == row_count) return; // out of bounds
if (y < 0 || y == col_count) return; // out of bounds
if (label[x][y]!=0 || m[x][y]!=1) return; // already labeled or not marked with 1 in m
// mark the current cell
label[x][y] = current_label;
// System.out.println("****************************");
// recursively mark the neighbors
int direction = 0;
for (direction = 0; direction < 4; ++direction)
dfs(x + dx[direction], y + dy[direction], current_label);
}
void find_components() {
int component = 0;
for (int i = 0; i < row_count; ++i)
for (int j = 0; j < col_count; ++j)
if (label[i][j]==0 && m[i][j]==1) dfs(i, j, ++component);
}
public static void main(String[] args) {
ConnectedComponentLabelling l=new ConnectedComponentLabelling(4,4);
l.m[0][0]=0;
l.m[0][1]=0;
l.m[0][2]=0;
l.m[0][3]=0;
l.m[1][0]=0;
l.m[1][1]=1;
l.m[1][2]=0;
l.m[1][3]=0;
l.m[2][0]=0;
l.m[2][1]=0;
l.m[2][2]=0;
l.m[2][3]=0;
l.m[3][0]=0;
l.m[3][1]=1;
l.m[3][2]=0;
l.m[3][3]=0;
l.find_components();
for (int i = 0; i < 4; i++) {
for (int j = 0; j < 4; j++) {
System.out.print(l.label[i][j]);
}
System.out.println("");
}
}
}
