Rascal MPL: how to extract additional loc segments?

According to the Location documentation, the loc format is |Uri|(O, L, <BL,BC>, <EL,EC>).
Say we have loc l = |java+compilationUnit:///test/src/main.java|(0,123,<1,0>,<7,1>);
How can I retrieve, for example, only <BL,BC>?
As an aside, how can a loc value be converted to a string? toString(l) does not work.

Towards the bottom of the Location documentation, below the list items marked with **, the field accessors are listed:
l.offset, l.length, l.begin, l.end. Some useful examples:
l.uri will return str: "java+compilationUnit:///test/src/main.java"
l.path will return str: "/test/src/main.java"
l.begin will return tuple[int line,int column]: <1,0>


Knuth-Morris-Pratt implementation in Haskell -- Index out of bounds

I've used the pseudocode from Wikipedia in an attempt to write a KMP algorithm in Haskell.
It's giving "index out of bounds" when I try to search beyond the length of the pattern and I can't seem to find the issue; my "fixes" have only ruined the result.
import Control.Monad
import Control.Lens
import qualified Data.ByteString.Char8 as C
import qualified Data.Vector.Unboxed as V
(!) :: C.ByteString -> Int -> Char
(!) = C.index
-- Make the table for the KMP. Directly from Wikipedia. Works as expected for inputs from Wikipedia article.
mkTable :: C.ByteString -> V.Vector Int
mkTable pat = make 2 0 (ix 0 .~ (negate 1) $ V.replicate l 0)
  where
    l = C.length pat
    make :: Int -> Int -> V.Vector Int -> V.Vector Int
    make p c t
      | p >= l    = t
      | otherwise = proc
      where
        proc | pat ! (p-1) == pat ! c
                         = make (p+1) (c+1) (ix p .~ (c+1) $ t)
             | c > 0     = make p (t V.! c) t
             | otherwise = make (p+1) c (ix p .~ 0 $ t)
kmp :: C.ByteString -> C.ByteString -> V.Vector Int -> Int
kmp text pat tbl = search 0 0
  where
    l = C.length text
    search m i
      | m + i >= l = l
      | otherwise  = cond
      where
        -- The conditions for the loop, given in the wiki article
        cond | pat ! i == text ! (m+i)
                 = if i == C.length pat - 1
                     then m
                     else search m (i+1)
             | tbl V.! i > (-1)
                 = search (m + i - (tbl V.! i)) (tbl V.! i)
             | otherwise
                 = search 0 (m+1)
main :: IO ()
main = do
  t <- readLn
  replicateM_ t $ do
    text <- C.getLine
    pat <- C.getLine
    print $ kmp text pat (mkTable pat)
Simple solution: I mixed up m and i in the last condition of kmp.
| otherwise = search 0 (m+1)
Becomes
| otherwise = search (m+1) 0
And the issue is resolved.
Aside from that, it's necessary to use unboxed arrays in the ST monad or the table generation takes an absurd amount of time.
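For comparison, here is a minimal Python sketch of the same Wikipedia pseudocode with the corrected fallback branch (advance m, reset i). The names make_table and kmp_search are chosen here for illustration only; like the Haskell version, the search returns the length of the text when nothing is found.

```python
def make_table(pattern):
    """Partial-match table as in the Wikipedia pseudocode (table[0] = -1)."""
    table = [0] * max(len(pattern), 2)
    table[0] = -1
    pos, cnd = 2, 0
    while pos < len(pattern):
        if pattern[pos - 1] == pattern[cnd]:
            cnd += 1
            table[pos] = cnd
            pos += 1
        elif cnd > 0:
            cnd = table[cnd]
        else:
            table[pos] = 0
            pos += 1
    return table


def kmp_search(text, pattern):
    """Return the index of the first match, or len(text) if there is none."""
    if not pattern:
        return 0
    table = make_table(pattern)
    m = i = 0  # m: start of the current match in text, i: position in pattern
    while m + i < len(text):
        if pattern[i] == text[m + i]:
            if i == len(pattern) - 1:
                return m
            i += 1
        elif table[i] > -1:
            m, i = m + i - table[i], table[i]
        else:
            # The branch that was mixed up in the question: advance m, reset i.
            m, i = m + 1, 0
    return len(text)
```

A plain Python list is mutated in place here, which sidesteps the cost of rebuilding an immutable vector on every table update.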

Remove the first char to extract an integer (unhandled exception: Subscript)

I'm trying to write a function which extracts only the integer in a string.
All my strings have the format Ci where C is a single character and i is an integer. I would like to be able to remove the C from my string.
I tried something like this :
fun transformKripke x =
if size x > 1
then String.substring (x, 1, size x)
else x
But unfortunately, I get an error like unhandled exception: Subscript.
I assume it's because sometimes my string will be empty and size of empty string is not working. But I don't know how to make it work... :/
Thanks in advance for your help
Best Regards.
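The problem is the third argument of String.substring: it is the length of the substring, not the end index, so String.substring (x, 1, size x) asks for one character more than the string has and raises Subscript.
The following should fix your immediate problem:
fun transformKripke s =
if size s = 0
then s
else String.substring (s, 1, size s - 1)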
or slightly prettier:
fun transformKripke s =
if size s = 0
then s
else String.extract (s, 1, NONE) (* means "until the end" *)
But you may want to consider naming your function something more general so that it can be useful in more senses than performing a Kripke transform (whatever that is). For example, you may want to be able to extract an actual int the first time one occurs anywhere in a string, regardless of how many non-digit characters precede it:
fun extractInt s =
let val len = String.size s
fun helper pos result =
if pos = len
then result
else let val c = String.sub (s, pos)
val d = ord c - ord #"0"
in case (Char.isDigit c, result) of
(true, NONE) => helper (pos+1) (SOME d)
| (true, SOME ds) => helper (pos+1) (SOME (ds * 10 + d))
| (false, NONE) => helper (pos+1) NONE
| (false, SOME ds) => SOME ds
end
in helper 0 NONE
end
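The same idea, taking the first run of digits wherever it starts, can be sketched in Python for comparison. The names extract_int and drop_first_char are made up for this example, and returning None mirrors the NONE case of the SML version.

```python
def extract_int(s):
    """Return the first run of decimal digits in s as an int, or None."""
    result = None
    for c in s:
        if "0" <= c <= "9":
            digit = ord(c) - ord("0")
            result = digit if result is None else result * 10 + digit
        elif result is not None:
            break  # the first integer has ended
    return result


def drop_first_char(s):
    """Counterpart of transformKripke: drop the leading character, if any."""
    return s[1:]  # slicing an empty string is safe and yields ""
```

For a string of the form Ci, extract_int("C42") and int(drop_first_char("C42")) both give 42, but only the former copes with extra non-digit characters.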
My mistake was stupid.
The third argument of String.substring is a length, and only size x - 1 characters remain after position 1, not size x. So now it's correct:
fun transformKripke x =
if size x > 1
then String.substring (x, 1, (size x)-1)
else x
Hope it helps! :)

how to match dna sequence pattern

I am having trouble finding an approach to solve this problem.
Input-output sequences are as follows :
**input1 :** aaagctgctagag
**output1 :** a3gct2ag2
**input2 :** aaaaaaagctaagctaag
**output2 :** a6agcta2ag
The input sequence can be up to 10^6 characters long, and the largest continuous patterns should be considered.
For example for input2 "agctaagcta" output will not be "agcta2gcta" but it will be "agcta2".
Any help appreciated.
Explanation of the algorithm:
Having a sequence S with symbols s(1), s(2),…, s(N).
Let B(i) be the best compressed sequence with elements s(1), s(2),…,s(i).
So, for example, B(3) will be the best compressed sequence for s(1), s(2), s(3).
What we want to know is B(N).
To find it, we will proceed by induction. We want to calculate B(i+1), knowing B(i), B(i-1), B(i-2), …, B(1), B(0), where B(0) is the empty sequence and B(1) = s(1)1. At the same time, this constitutes a proof that the solution is optimal. ;)
To calculate B(i+1), we will pick the best sequence among the candidates:
Candidate sequences where the last block has one element:
B(i )s(i+1)1
B(i-1)s(i+1)2 ; only if s(i) = s(i+1)
B(i-2)s(i+1)3 ; only if s(i-1) = s(i) and s(i) = s(i+1)
…
B(1)s(i+1)[i] ; only if s(2)=s(3) and s(3)=s(4) and … and s(i) = s(i+1)
B(0)s(i+1)[i+1] = s(i+1)[i+1] ; only if s(1)=s(2) and s(2)=s(3) and … and s(i) = s(i+1)
Candidate sequences where the last block has 2 elements:
B(i-1)s(i)s(i+1)1
B(i-3)s(i)s(i+1)2 ; only if s(i-2)s(i-1)=s(i)s(i+1)
B(i-5)s(i)s(i+1)3 ; only if s(i-4)s(i-3)=s(i-2)s(i-1) and s(i-2)s(i-1)=s(i)s(i+1)
…
Candidate sequences where the last block has 3 elements:
…
Candidate sequences where the last block has 4 elements:
…
…
Candidate sequences where the last block has i+1 elements:
s(1)s(2)s(3)…s(i+1)1
For each block length, the algorithm stops as soon as the block is no longer repeated. And that's it.
The algorithm will be something like this in pseudo-C code:
B(0) = ""
for (i=1; i<=N; i++) {
    // Calculate all the candidates for B(i)
    BestCandidate = null
    for (j=1; j<=i; j++) {
        // Calculate all the candidates whose last block has j elements
        r = 1;
        do {
            Candidate = B(i-j*r) s(i-j+1)…s(i-1)s(i) r
            if ( (BestCandidate == null)
                 || (Candidate is shorter than BestCandidate))
            {
                BestCandidate = Candidate
            }
            r++;
        } while ( (j*r <= i)
                  && (s(i-j*r+1) s(i-j*r+2)…s(i-j*r+j) == s(i-j+1) s(i-j+2)…s(i-j+j)) )
    }
    B(i) = BestCandidate
}
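The induction above can be sketched in Python as follows. This is only an illustration under stated assumptions: the name dna_compress and the tuple layout of best are invented here, every block is written with its repeat count (including a count of 1), and ties are broken in favour of the first candidate found.

```python
def dna_compress(s):
    """Shortest 'block + count' encoding of s, built by dynamic programming."""
    n = len(s)
    # best[i] = (compressed length, start of last block run, block, count)
    # for the first i characters; best[0] is the empty sequence.
    best = [(0, None, "", 0)] + [None] * n
    for i in range(1, n + 1):
        for j in range(1, i + 1):          # j = length of the last block
            block = s[i - j:i]
            r = 1                          # r = number of repetitions
            # Extend r while the block keeps repeating just before position i.
            while j * r <= i and s[i - j * r:i - j * (r - 1)] == block:
                cand_len = best[i - j * r][0] + j + len(str(r))
                if best[i] is None or cand_len < best[i][0]:
                    best[i] = (cand_len, i - j * r, block, r)
                r += 1
    # Walk the chain of predecessors to rebuild the string right to left.
    parts = []
    i = n
    while i > 0:
        _, prev, block, r = best[i]
        parts.append(block + str(r))
        i = prev
    return "".join(reversed(parts))
```

For example, dna_compress("aaagctgctagag") yields "a3gct2ag2". Because a count of 1 is always written, a trailing unrepeated block comes out as, say, "ag1" rather than "ag".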
Hope that this can help a little more.
The full C program performing the required task is given below. It runs in O(n^2). The central part is only 30 lines of code.
EDIT: I have restructured the code a little, renamed the variables, and added some comments to make it more readable.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>
// This struct represents a compressed segment like atg4, g3, agc1
struct Segment {
char *elements;
int nElements;
int count;
};
// As an example, for the segment agagt3 elements would be:
// {
// elements: "agagt",
// nElements: 5,
// count: 3
// }
struct Sequence {
struct Segment lastSegment;
struct Sequence *prev; // Points to a sequence without the last segment or NULL if it is the first segment
int totalLen; // Total length of the compressed sequence.
};
// as an example, for the sequence agt32ta5, the representation will be:
// {
// lastSegment:{"ta" , 2 , 5},
// prev: #A,
// totalLen: 8
// }
// and A will be
// {
// lastSegment{ "agt", 3, 32},
// prev: NULL,
// totalLen: 5
// }
// This function converts a sequence to a string.
// You have to free the string after using it.
// The strategy is to construct the string from right to left.
char *sequence2string(struct Sequence *S) {
char *Res=malloc(S->totalLen + 1);
char *digits="0123456789";
int p= S->totalLen;
Res[p]=0;
while (S!=NULL) {
// first we insert the count of the last element.
// We do digit by digit starting with the units.
int C = S->lastSegment.count;
while (C) {
p--;
Res[p] = digits[ C % 10 ];
C /= 10;
}
p -= S->lastSegment.nElements;
strncpy(Res + p , S->lastSegment.elements, S->lastSegment.nElements);
S = S ->prev;
}
return Res;
}
// Compresses a dna sequence.
// Returns a string with the in sequence compressed.
// The returned string must be freed after using it.
char *dnaCompress(char *in) {
int i,j;
int N = strlen(in); // Number of elements of the in sequence.
// B is an array of N+1 sequences where B[i] is the best compressed sequence of the first i characters.
// What we want to return is B[N];
struct Sequence *B;
B = malloc((N+1) * sizeof (struct Sequence));
// We first do an initialization for i=0
B[0].lastSegment.elements="";
B[0].lastSegment.nElements=0;
B[0].lastSegment.count=0;
B[0].prev = NULL;
B[0].totalLen=0;
// and set totalLen of all the other sequences to a very high value (INT_MAX). We will try different candidates and keep the minimum one.
for (i=1; i<=N; i++) B[i].totalLen = INT_MAX; // A very high value
for (i=1; i<=N; i++) {
// at this point we want to calculate B[i] and we know B[i-1], B[i-2], .... ,B[0]
for (j=1; j<=i; j++) {
// Here we will check all the candidates where the last segment has j elements
int r=1; // number of times the last segment is repeated
int rNDigits=1; // Number of digits of r
int rNDigitsBound=10; // We will increment r, so this value is when r will have an extra digit.
// when r = 0,1,...,9 => rNDigitsBound = 10
// when r = 10,11,...,99 => rNDigitsBound = 100
// when r = 100,101,.,999 => rNDigitsBound = 1000 and so on.
do {
// Here we analyze a candidate for B[i]
// where the last segment has j elements repeated r times.
int CandidateLen = B[i-j*r].totalLen + j + rNDigits;
if (CandidateLen < B[i].totalLen) {
B[i].lastSegment.elements = in + i - j*r;
B[i].lastSegment.nElements = j;
B[i].lastSegment.count = r;
B[i].prev = &(B[i-j*r]);
B[i].totalLen = CandidateLen;
}
r++;
if (r == rNDigitsBound ) {
rNDigits++;
rNDigitsBound *= 10;
}
} while ( (i - j*r >= 0)
&& (strncmp(in + i -j, in + i - j*r, j)==0));
}
}
char *Res=sequence2string(&(B[N]));
free(B);
return Res;
}
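int main(int argc, char** argv) {
// The sequence to compress is expected as the first command-line argument.
if (argc < 2) {
fprintf(stderr, "usage: %s <dna sequence>\n", argv[0]);
return 1;
}
char *compressedDNA=dnaCompress(argv[1]);
puts(compressedDNA);
free(compressedDNA);
return 0;
}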
Forget Ukkonen. Dynamic programming it is. With a 3-dimensional table:
sequence position
subsequence size
number of segments
TERMINOLOGY: For example, having a = "aaagctgctagag", the sequence position coordinate would run from 1 to 13. At sequence position 4 (letter 'g'), having subsequence size 4, the subsequence would be "gctg". Understood? And as for the number of segments, expressing a as "aaagctgctagag1" consists of 1 segment (the sequence itself). Expressing it as "a3gct2ag2" consists of 3 segments. "aaagctgct1ag2" consists of 2 segments. "a2a1gct2ag2" would consist of 4 segments. Understood? Now, with this, you start filling a 3-dimensional array 13 x 13 x 13, so your time and memory complexity seems to be around n ** 3 for this. Are you sure you can handle it for million-bp sequences? I think that a greedy approach would be better, because large DNA sequences are unlikely to repeat exactly. And, I would suggest that you widen your assignment to approximate matches, and you can publish it straight in a journal.
Anyway, you will start filling the table of compressing a subsequence starting at some position (dimension 1) with length equal to dimension 2 coordinate, having at most dimension 3 segments. So you first fill the first row, representing compressions of subsequences of length 1 consisting of at most 1 segment:
a a a g c t g c t a g a g
1(a1) 1(a1) 1(a1) 1(g1) 1(c1) 1(t1) 1(g1) 1(c1) 1(t1) 1(a1) 1(g1) 1(a1) 1(g1)
The number is the character cost (always 1 for these trivial 1-char sequences; number 1 does not count into the character cost), and in the parenthesis, you have the compression (also trivial for this simple case). The second row will be still simple:
2(a2) 2(a2) 2(ag1) 2(gc1) 2(ct1) 2(tg1) 2(gc1) 2(ct1) 2(ta1) 2(ag1) 2(ga1) 2(ag1)
There is only 1 way to decompose a 2-character sequence into 2 subsequences -- 1 character + 1 character. If they are identical, the result is like a + a = a2. If they are different, such as a + g, then, because only 1-segment sequences are admissible, the result cannot be a1g1, but must be ag1. The third row will be finally more interesting:
2(a3) 2(aag1) 3(agc1) 3(gct1) 3(ctg1) 3(tgc1) 3(gct1) 3(cta1) 3(tag1) 3(aga1) 3(gag1)
Here, you can always choose between 2 ways of composing the compressed string. For example, aag can be composed either as aa + g or a + ag. But again, we cannot have 2 segments, as in aa1g1 or a1ag1, so we must be satisfied with aag1, unless both components consist of the same character, as in aa + a => a3, with character cost 2. We can continue onto the 4th line:
4(aaag1) 4(aagc1) 4(agct1) 4(gctg1) 4(ctgc1) 4(tgct1) 4(gcta1) 4(ctag1) 4(taga1) 3(ag2)
Here, in the first position, we cannot use a3g1, because only 1 segment is allowed at this layer. But at the last position, compression to character cost 3 is achieved by ag1 + ag1 = ag2. This way, one can fill the whole first-level table all the way up to the single subsequence of 13 characters, and each subsequence will have its optimal character cost and its compression under the first-level constraint of at most 1 segment associated with it.
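A rough Python sketch of one cell of this first-level table, assuming the cost rule used in the rows above (the block's characters plus the digits of the count, with a count of 1 not charged). The function name one_segment is hypothetical, and it simply tries every block length that divides the subsequence length.

```python
def one_segment(sub):
    """Best single-segment compression of sub as (character cost, text)."""
    n = len(sub)
    best = None
    for p in range(1, n + 1):              # p = candidate block length
        if n % p == 0 and sub[:p] * (n // p) == sub:
            r = n // p                     # number of repetitions
            cost = p + (len(str(r)) if r > 1 else 0)
            if best is None or cost < best[0]:
                best = (cost, sub[:p] + str(r))
    return best


def first_level_row(seq, size):
    """All cells of the row for subsequences of the given size."""
    return [one_segment(seq[i:i + size]) for i in range(len(seq) - size + 1)]
```

With seq = "aaagctgctagag", first_level_row(seq, 4) starts with (4, "aaag1") and ends with (3, "ag2"), matching the fourth row shown above.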
Then you go to the 2nd level, where 2 segments are allowed... And again, from the bottom up, you identify the optimum cost and compression of each table coordinate under the given level's segment count constraint, by comparing all the possible ways to compose the subsequence using already computed positions, until you fill the table completely and thus compute the global optimum. There are some details to solve, but sorry, I'm not gonna code this for you.
After trying my own way for a while, my kudos to jbaylina for his beautiful algorithm and C implementation. Here's my attempted version of jbaylina's algorithm in Haskell, and below it further development of my attempt at a linear-time algorithm that attempts to compress segments that include repeated patterns in a one-by-one fashion:
import Data.Map (fromList, insert, size, (!))
compress s = (foldl f (fromList [(0,([],0)),(1,([s!!0],1))]) [1..n - 1]) ! n
where
n = length s
f b i = insert (size b) bestCandidate b where
add (sequence, sLength) (sequence', sLength') =
(sequence ++ sequence', sLength + sLength')
j' = [1..min 100 i]
bestCandidate = foldr combCandidates (b!i `add` ([s!!i,'1'],2)) j'
combCandidates j candidate' =
let nextCandidate' = comb 2 (b!(i - j + 1)
`add` ((take j . drop (i - j + 1) $ s) ++ "1", j + 1))
in if snd nextCandidate' <= snd candidate'
then nextCandidate'
else candidate' where
comb r candidate
| r > uBound = candidate
| not (strcmp r True) = candidate
| snd nextCandidate <= snd candidate = comb (r + 1) nextCandidate
| otherwise = comb (r + 1) candidate
where
uBound = div (i + 1) j
prev = b!(i - r * j + 1)
nextCandidate = prev `add`
((take j . drop (i - j + 1) $ s) ++ show r, j + length (show r))
strcmp 1 _ = True
strcmp num bool
| (take j . drop (i - num * j + 1) $ s)
== (take j . drop (i - (num - 1) * j + 1) $ s) =
strcmp (num - 1) True
| otherwise = False
Output:
*Main> compress "aaagctgctagag"
("a3gct2ag2",9)
*Main> compress "aaabbbaaabbbaaabbbaaabbb"
("aaabbb4",7)
Linear-time attempt:
import Data.List (sortBy)
group' xxs sAccum (chr, count)
| null xxs = if null chr
then singles
else if count <= 2
then reverse sAccum ++ multiples ++ "1"
else singles ++ if null chr then [] else chr ++ show count
| [x] == chr = group' xs sAccum (chr,count + 1)
| otherwise = if null chr
then group' xs (sAccum) ([x],1)
else if count <= 2
then group' xs (multiples ++ sAccum) ([x],1)
else singles
++ chr ++ show count ++ group' xs [] ([x],1)
where x:xs = xxs
singles = reverse sAccum ++ (if null sAccum then [] else "1")
multiples = concat (replicate count chr)
sequences ws strIndex maxSeqLen = repeated' where
half = if null . drop (2 * maxSeqLen - 1) $ ws
then div (length ws) 2 else maxSeqLen
repeated' = let (sequence,(sequenceStart, sequenceEnd'),notSinglesFlag) = repeated
in (sequence,(sequenceStart, sequenceEnd'))
repeated = foldr divide ([],(strIndex,strIndex),False) [1..half]
equalChunksOf t a = takeWhile(==t) . map (take a) . iterate (drop a)
divide chunkSize b@(sequence,(sequenceStart, sequenceEnd'),notSinglesFlag) =
let t = take (2*chunkSize) ws
t' = take chunkSize t
in if t' == drop chunkSize t
then let ts = equalChunksOf t' chunkSize ws
lenTs = length ts
sequenceEnd = strIndex + lenTs * chunkSize
newEnd = if sequenceEnd > sequenceEnd'
then sequenceEnd else sequenceEnd'
in if chunkSize > 1
then if length (group' (concat (replicate lenTs t')) [] ([],0)) > length (t' ++ show lenTs)
then (((strIndex,sequenceEnd,chunkSize,lenTs),t'):sequence, (sequenceStart,newEnd),True)
else b
else if notSinglesFlag
then b
else (((strIndex,sequenceEnd,chunkSize,lenTs),t'):sequence, (sequenceStart,newEnd),False)
else b
addOne a b
| null (fst b) = a
| null (fst a) = b
| otherwise =
let (((start,end,patLen,lenS),sequence):rest,(sStart,sEnd)) = a
(((start',end',patLen',lenS'),sequence'):rest',(sStart',sEnd')) = b
in if sStart' < sEnd && sEnd < sEnd'
then let c = ((start,end,patLen,lenS),sequence):rest
d = ((start',end',patLen',lenS'),sequence'):rest'
in (c ++ d, (sStart, sEnd'))
else a
segment xs baseIndex maxSeqLen = segment' xs baseIndex baseIndex where
segment' zzs@(z:zs) strIndex farthest
| null zs = initial
| strIndex >= farthest && strIndex > 0 = ([],(0,0))
| otherwise = addOne initial next
where
next@(s',(start',end')) = segment' zs (strIndex + 1) farthest'
farthest' | null s = farthest
| otherwise = if start /= end && end > farthest then end else farthest
initial@(s,(start,end)) = sequences zzs strIndex maxSeqLen
areExclusive ((a,b,_,_),_) ((a',b',_,_),_) = (a' >= b) || (b' <= a)
combs [] r = [r]
combs (x:xs) r
| null r = combs xs (x:r) ++ if null xs then [] else combs xs r
| otherwise = if areExclusive (head r) x
then combs xs (x:r) ++ combs xs r
else if l' > lowerBound
then combs xs (x: reduced : drop 1 r) ++ combs xs r
else combs xs r
where lowerBound = l + 2 * patLen
((l,u,patLen,lenS),s) = head r
((l',u',patLen',lenS'),s') = x
reduce = takeWhile (>=l') . iterate (\x -> x - patLen) $ u
lenReduced = length reduce
reduced = ((l,u - lenReduced * patLen,patLen,lenS - lenReduced),s)
buildString origStr sequences = buildString' origStr sequences 0 (0,"",0)
where
buildString' origStr sequences index accum@(lenC,cStr,lenOrig)
| null sequences = accum
| l /= index =
buildString' (drop l' origStr) sequences l (lenC + l' + 1, cStr ++ take l' origStr ++ "1", lenOrig + l')
| otherwise =
buildString' (drop u' origStr) rest u (lenC + length s', cStr ++ s', lenOrig + u')
where
l' = l - index
u' = u - l
s' = s ++ show lenS
(((l,u,patLen,lenS),s):rest) = sequences
compress [] _ accum = reverse accum ++ (if null accum then [] else "1")
compress zzs@(z:zs) maxSeqLen accum
| null (fst segment') = compress zs maxSeqLen (z:accum)
| (start,end) == (0,2) && not (null accum) = compress zs maxSeqLen (z:accum)
| otherwise =
reverse accum ++ (if null accum || takeWhile' compressedStr 0 /= 0 then [] else "1")
++ compressedStr
++ compress (drop lengthOriginal zzs) maxSeqLen []
where segment'@(s,(start,end)) = segment zzs 0 maxSeqLen
combinations = combs (fst $ segment') []
takeWhile' xxs count
| null xxs = 0
| x == '1' && null (reads (take 1 xs)::[(Int,String)]) = count
| not (null (reads [x]::[(Int,String)])) = 0
| otherwise = takeWhile' xs (count + 1)
where x:xs = xxs
f (lenC,cStr,lenOrig) (lenC',cStr',lenOrig') =
let g = compare ((fromIntegral lenC + if not (null accum) && takeWhile' cStr 0 == 0 then 1 else 0) / fromIntegral lenOrig)
((fromIntegral lenC' + if not (null accum) && takeWhile' cStr' 0 == 0 then 1 else 0) / fromIntegral lenOrig')
in if g == EQ
then compare (takeWhile' cStr' 0) (takeWhile' cStr 0)
else g
(lenCompressed,compressedStr,lengthOriginal) =
head $ sortBy f (map (buildString (take end zzs)) (map reverse combinations))
Output:
*Main> compress "aaaaaaaaabbbbbbbbbaaaaaaaaabbbbbbbbb" 100 []
"a9b9a9b9"
*Main> compress "aaabbbaaabbbaaabbbaaabbb" 100 []
"aaabbb4"

Convert Excel Column Number to Column Name in Matlab

I am using Excel 2007, which supports up to 16,384 columns. I would like to obtain the column name corresponding to a column number.
Currently, I am using the following code. However, this code only supports up to 256 columns. Any idea how to obtain the column name if the column number is greater than 256?
function loc = xlcolumn(column)
if isnumeric(column)
if column>256
error('Excel is limited to 256 columns! Enter an integer number <256');
end
letters = {'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z'};
count = 0;
if column-26<=0
loc = char(letters(column));
else
while column-26>0
count = count + 1;
column = column - 26;
end
loc = [char(letters(count)) char(letters(column))];
end
else
letters = ['A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z'];
if size(column,2)==1
loc =findstr(column,letters);
elseif size(column,2)==2
loc1 =findstr(column(1),letters);
loc2 =findstr(column(2),letters);
loc = (26 + 26*loc1)-(26-loc2);
end
end
Thanks
As a diversion, here is an all function handle example, with (almost) no file-based functions required. This is based on the dec2base function, since Excel column names are (almost) base 26 numbers, with the frustrating difference that there are no "0" characters.
Note: this is probably a terrible idea overall, but it works. Better solutions are probably found elsewhere in the file exchange.
First, the one file based function that I couldn't get around, to perform arbitrary depth function composition.
function result = compose( fnHandles )
%COMPOSE Compose a set of functions
% COMPOSE({fnHandles}) returns a function handle consisting of the
% composition of the cell array of input function handles.
%
% For example, if F, G, and H are function handles with one input and
% one output, then:
% FNCOMPOSED = COMPOSE({F,G,H});
% y = FNCOMPOSED(x);
% is equivalent to
% y = F(G(H(x)));
if isempty(fnHandles)
result = @(x)x;
elseif length(fnHandles)==1
result = fnHandles{1};
else
fnOuter = fnHandles{1};
fnRemainder = compose(fnHandles(2:end));
result = @(x)fnOuter(fnRemainder(x));
end
Then, the bizarre, contrived path to convert base26 values into the correct string
%Functions leading to "getNumeric", which creates a numeric, base26 array
remapUpper = @(rawBase)(rawBase + (rawBase>='A')*(-55)); %Map the letters 'A-P' to [10:25]
reMapLower = @(rawBase)(rawBase + (rawBase<'A')*(-48)); %Map characters '0123456789' to [0:9]
getRawBase = @(x)dec2base(x, 26);
getNumeric = @(x)remapUpper(reMapLower(getRawBase(x)));
%Functions leading to "correctNumeric"
% This replaces zeros with 26, and reduces the high values entry by 1.
% Similar to "borrowing" as we learned in longhand subtraction
borrowDownFrom = @(x, fromIndex) [x(1:(fromIndex-1)) (x(fromIndex)-1) (x(fromIndex+1)+26) (x((fromIndex+2):end))];
borrowToIfNeeded = @(x, toIndex) (x(toIndex)<=0)*borrowDownFrom(x,toIndex-1) + (x(toIndex)>0)*(x); %Ugly numeric switch
getAllConditionalBorrowFunctions = @(numeric)arrayfun(@(index)@(numeric)borrowToIfNeeded(numeric, index),(2:length(numeric)),'uniformoutput',false);
getComposedBorrowFunction = @(x)compose(getAllConditionalBorrowFunctions(x));
correctNumeric = @(x)feval(getComposedBorrowFunction(x),x);
%Function to replace numerics with letters, and remove leading '@' (leading
%zeros)
numeric2alpha = @(x)regexprep(char(x+'A'-1),'^@','');
%Compose complete function
num2ExcelName = @(x)arrayfun(@(x)numeric2alpha(correctNumeric(getNumeric(x))), x, 'uniformoutput',false)';
Now test using some stressing transitions:
>> num2ExcelName([1:5 23:28 700:704 727:729 1024:1026 1351:1355 16382:16384])
ans =
'A'
'B'
'C'
'D'
'E'
'W'
'X'
'Y'
'Z'
'AA'
'AB'
'ZX'
'ZY'
'ZZ'
'AAA'
'AAB'
'AAY'
'AAZ'
'ABA'
'AMJ'
'AMK'
'AML'
'AYY'
'AYZ'
'AZA'
'AZB'
'AZC'
'XFB'
'XFC'
'XFD'
This function I wrote works for any number of columns (until Excel runs out of columns). It just requires a column number input (e.g. 16368 will return a string 'XEN').
If the application of this concept is different than my function, it's important to note that the first column name consisting of x A's is column number 26^(x-1) + 26^(x-2) + ... + 26^2 + 26 + 1 (e.g. 'AAA' is column 26^2 + 26 + 1 = 703).
function [col_str] = let_loc(num_loc)
test = 2;
old = 0;
x = 0;
while test >= 1
old = 26^x + old;
test = num_loc/old;
x = x + 1;
end
num_letters = x - 1;
str_array = zeros(1,num_letters);
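for i = 1:num_letters
place = 26^(num_letters-i);
loc = floor(num_loc/place);
% Each remaining letter must be at least 'A' (value 1), so borrow one
% from this letter when the remainder is too small (e.g. 52 -> 'AZ')
if num_loc - loc*place < (place - 1)/25
loc = loc - 1;
end
num_loc = num_loc - (loc*place);
str_array(i) = char(65 + (loc - 1));
end
col_str = char(str_array);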
end
Hope this saves someone some time!
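The "base 26 without a zero digit" idea that these answers rely on can be sketched compactly in Python. The function names below are made up for illustration; subtracting 1 before each division is what makes exact multiples of 26 (Z, AZ, ZZ, ...) come out right.

```python
def excel_column_name(n):
    """Convert a 1-based column number to its Excel name (1 -> 'A')."""
    if n < 1:
        raise ValueError("column numbers start at 1")
    letters = []
    while n > 0:
        # Shift to 0..25 so that 26 maps to 'Z' instead of producing a zero digit.
        n, rem = divmod(n - 1, 26)
        letters.append(chr(ord("A") + rem))
    return "".join(reversed(letters))


def excel_column_number(name):
    """Inverse conversion: 'AA' -> 27."""
    n = 0
    for ch in name.upper():
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n
```

For instance, excel_column_name(16384) gives 'XFD' and excel_column_number('XFD') gives 16384 again.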

Code Golf: Solve a Maze [closed]

Here's an interesting problem to solve in minimal amounts of code. I expect the recursive solutions will be most popular.
We have a maze that's defined as a map of characters, where = is a wall, a space is a path, + is your starting point, and # is your ending point. An incredibly simple example is like so:
====
+  =
= ==
=  #
====
Can you write a program to find the shortest path to solve a maze in this style, in as little code as possible?
Bonus points if it works for all maze inputs, such as those with a path that crosses over itself or with huge numbers of branches. The program should be able to work for large mazes (say, 1024x1024 - 1 MB), and how you pass the maze to the program is not important.
The "player" may move diagonally. The input maze will never have a diagonal passage, so your base set of movements will be up, down, left, right. A diagonal movement would be merely looking ahead a little to determine if an up/down and a left/right move could be merged.
Output must be the maze itself with the shortest path highlighted using the asterisk character (*).
Works for any (fixed-size) maze with a minimum of CPU cycles (given a big enough BFG2000). Source size is irrelevant since the compiler is incredibly efficient.
while curr.x != target.x or curr.y != target.y:
    case:
        target.x > curr.x : dx = 1
        target.x < curr.x : dx = -1
        else : dx = 0
    case:
        target.y > curr.y : dy = 1
        target.y < curr.y : dy = -1
        else : dy = 0
    if cell[curr.x+dx,curr.y+dy] == wall:
        destroy cell[curr.x+dx,curr.y+dy] with patented BFG2000 gun.
    curr.x += dx
    curr.y += dy
survey shattered landscape
F#, not very short (72 non-blank lines), but readable. I changed/honed the spec a bit; I assume the original maze is a rectangle fully surrounded by walls, I use different characters (that don't hurt my eyes), I only allow orthogonal moves (not diagonal). I only tried one sample maze. Except for a bug about flipping x and y indices, this worked the first time, so I expect it is right (I've done nothing to validate it other than eyeball the solution on the one sample I gave it).
open System
[<Literal>]
let WALL = '#'
[<Literal>]
let OPEN = ' '
[<Literal>]
let START = '^'
[<Literal>]
let END = '$'
[<Literal>]
let WALK = '.'
let sampleMaze = @"###############
#  # #        #
# ^# # # ###  #
#  # # # # #  #
#      #   #  #
############  #
#    $        #
###############"
let lines = sampleMaze.Split([|'\r';'\n'|], StringSplitOptions.RemoveEmptyEntries)
let width = lines |> Array.map (fun l -> l.Length) |> Array.max
let height = lines.Length
type BestInfo = (int * int) list * int // path to here, num steps
let bestPathToHere : BestInfo option [,] = Array2D.create width height None
let mutable startX = 0
let mutable startY = 0
for x in 0..width-1 do
    for y in 0..height-1 do
        if lines.[y].[x] = START then
            startX <- x
            startY <- y
bestPathToHere.[startX,startY] <- Some([],0)
let q = new System.Collections.Generic.Queue<_>()
q.Enqueue((startX,startY))
let StepTo newX newY (path,count) =
    match lines.[newY].[newX] with
    | WALL -> ()
    | OPEN | START | END ->
        match bestPathToHere.[newX,newY] with
        | None ->
            bestPathToHere.[newX,newY] <- Some((newX,newY)::path,count+1)
            q.Enqueue((newX,newY))
        | Some(_,oldCount) when oldCount > count+1 ->
            bestPathToHere.[newX,newY] <- Some((newX,newY)::path,count+1)
            q.Enqueue((newX,newY))
        | _ -> ()
    | c -> failwithf "unexpected maze char: '%c'" c
while not(q.Count = 0) do
    let x,y = q.Dequeue()
    let (Some(path,count)) = bestPathToHere.[x,y]
    StepTo (x+1) (y) (path,count)
    StepTo (x) (y+1) (path,count)
    StepTo (x-1) (y) (path,count)
    StepTo (x) (y-1) (path,count)
let mutable endX = 0
let mutable endY = 0
for x in 0..width-1 do
    for y in 0..height-1 do
        if lines.[y].[x] = END then
            endX <- x
            endY <- y
printfn "Original maze:"
printfn "%s" sampleMaze
let bestPath, bestCount = bestPathToHere.[endX,endY].Value
printfn "The best path takes %d steps." bestCount
let resultMaze = Array2D.init width height (fun x y -> lines.[y].[x])
// The head of bestPath is the END cell itself, so skip it when marking the walk
bestPath |> List.tail |> List.iter (fun (x,y) -> resultMaze.[x,y] <- WALK)
for y in 0..height-1 do
    for x in 0..width-1 do
        printf "%c" resultMaze.[x,y]
    printfn ""
//Output:
//Original maze:
//###############
//#  # #        #
//# ^# # # ###  #
//#  # # # # #  #
//#      #   #  #
//############  #
//#    $        #
//###############
//The best path takes 27 steps.
//###############
//#  # #....... #
//# ^# #.# ###. #
//# .# #.# # #. #
//# .....#   #. #
//############. #
//#    $....... #
//###############
Python
387 Characters
Takes input from stdin.
import sys
m,n,p=sys.stdin.readlines(),[],'+'
R=lambda m:[r.replace(p,'*')for r in m]
while'#'in`m`:n+=[R(m)[:r]+[R(m)[r][:c]+p+R(m)[r][c+1:]]+R(m)[r+1:]for r,c in[(r,c)for r,c in[map(sum,zip((m.index(filter(lambda i:p in i,m)[0]),[w.find(p)for w in m if p in w][0]),P))for P in zip((-1,0,1,0),(0,1,0,-1))]if 0<=r<len(m)and 0<=c<len(m[0])and m[r][c]in'# ']];m=n.pop(0)
print''.join(R(m))
I did this sort of thing for a job interview once (it was a pre-interview programming challenge)
Managed to get it working to some degree of success and it's a fun little challenge.
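For readers who want an ungolfed reference, here is a minimal Python 3 sketch of the breadth-first search that the readable answers above rely on. It is an illustration only: the name solve_maze and its parameters are invented here, it uses the characters from the question (= wall, + start, # end, * path), and it moves orthogonally only, without merging steps into diagonals.

```python
from collections import deque


def solve_maze(maze, wall="=", start="+", end="#", mark="*"):
    """Return the maze with a shortest path marked, or None if unreachable."""
    grid = [list(row) for row in maze.splitlines()]

    def find(ch):
        return next((r, c) for r, row in enumerate(grid)
                    for c, cell in enumerate(row) if cell == ch)

    s, e = find(start), find(end)
    prev = {s: None}            # cell -> cell we reached it from
    queue = deque([s])
    while queue:
        cur = queue.popleft()
        if cur == e:
            break
        r, c = cur
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < len(grid) and 0 <= nc < len(grid[nr])
                    and grid[nr][nc] != wall and (nr, nc) not in prev):
                prev[(nr, nc)] = cur
                queue.append((nr, nc))
    if e not in prev:
        return None
    # Walk back from the end, marking every cell strictly between start and end.
    cell = prev[e]
    while cell != s:
        grid[cell[0]][cell[1]] = mark
        cell = prev[cell]
    return "\n".join("".join(row) for row in grid)
```

Because every cell is enqueued at most once, the work grows linearly with the number of cells, which should be acceptable for a 1024x1024 maze.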
