Write a short program in SNOBOL that calculate the sum of all even integers from 1 to 100 - snobol

You should use loop for multiple iterations of addition. GOTO statement and labels can be used to design a loop of your requirement.
please give me some ideas.

I think it's not really Snobol if it doesn't contain a pattern match...
This solution generates the even numbers 000 .. 100 by matching a pattern against the string "01012345678902468" and adding the generated numbers through the function add(). Looping is done through the FAIL keyword at the end of the pattern, which forces the pattern scanning to continue looking for alternatives until the expression (*EQ(h t o,100) ABORT) causes the scanning to abort.
&FULLSCAN = 1
DEFINE('add(x)') :(add.End);add sum = sum + x :(RETURN);add.End
"01012345678902468" (LEN(1) $ h *LEN(1 - h)) ARB (LEN(1) $ t
. *LEN(9 - t)) ARB LEN(1) $ o *add(h t o)
. (*EQ(h t o,100) ABORT) FAIL
OUTPUT = sum
END
Works both in Snobol and Spitbol (&FULLSCAN = 1 is needed for Snobol, it's a no-op in Spitbol)

*This works on the Spitbol version of Snobol.
loop m = le(n, 98) (n = n + 2) + m :s(loop); output = m ;end

Related

Write a program that adds together all the integers from `1` to `250` (inclusive) and `puts`es the total

How would I do this using ruby and only using a while loop with if, elsif and else only?
sum = 0
i = 1
while i < 251
sum += i
i += 1
end
puts sum
You don’t need a while loop to implement this. But if they want a while loop…
n = 250
while true
puts ( n * (n + 1)) / 2
break
end
Some of the comments have alternatives that are interesting, but keeping this one because it has the flow of control at least pass through the while loop. Not sure if it counts as using a loop if control doesn't even enter the body of the loop.

A faster alternative to all(a(:,i)==a,1) in MATLAB

It is a straightforward question: Is there a faster alternative to all(a(:,i)==a,1) in MATLAB?
I'm thinking of a implementation that benefits from short-circuit evaluations in the whole process. I mean, all() definitely benefits from short-circuit evaluations but a(:,i)==a doesn't.
I tried the following code,
% example for the input matrix
m = 3; % m and n aren't necessarily equal to those values.
n = 5000; % It's only possible to know in advance that 'm' << 'n'.
a = randi([0,5],m,n); % the maximum value of 'a' isn't necessarily equal to
% 5 but it's possible to state that every element in
% 'a' is a positive integer.
% all, equal solution
tic
for i = 1:n % stepping up the elapsed time in orders of magnitude
%%%%%%%%%% all and equal solution %%%%%%%%%
ax_boo = all(a(:,i)==a,1);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
end
toc
% alternative solution
tic
for i = 1:n % stepping up the elapsed time in orders of magnitude
%%%%%%%%%%% alternative solution %%%%%%%%%%%
ax_boo = a(1,i) == a(1,:);
for k = 2:m
ax_boo(ax_boo) = a(k,i) == a(k,ax_boo);
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
end
toc
but it's intuitive that any "for-loop-solution" within the MATLAB environment will be naturally slower. I'm wondering if there is a MATLAB built-in function written in a faster language.
EDIT:
After running more tests I found out that the implicit expansion does have a performance impact in evaluating a(:,i)==a. If the matrix a has more than one row, all(repmat(a(:,i),[1,n])==a,1) may be faster than all(a(:,i)==a,1) depending on the number of columns (n). For n=5000 repmat explicit expansion has proved to be faster.
But I think that a generalization of Kenneth Boyd's answer is the "ultimate solution" if all elements of a are positive integers. Instead of dealing with a (m x n matrix) in its original form, I will store and deal with adec (1 x n matrix):
exps = ((0):(m-1)).';
base = max(a,[],[1,2]) + 1;
adec = sum( a .* base.^exps , 1 );
In other words, each column will be encoded to one integer. And of course adec(i)==adec is faster than all(a(:,i)==a,1).
EDIT 2:
I forgot to mention that adec approach has a functional limitation. At best, storing adec as uint64, the following inequality must hold base^m < 2^64 + 1.
Since your goal is to count the number of columns that match, my example converts the binary encoding to integer decimals, then you just loop over the possible values (with 3 rows that are 8 possible values) and count the number of matches.
a_dec = 2.^(0:(m-1)) * a;
num_poss_values = 2 ^ m;
num_matches = zeros(num_poss_values, 1);
for i = 1:num_poss_values
num_matches(i) = sum(a_dec == (i - 1));
end
On my computer, using 2020a, Here are the execution times for your first 2 options and the code above:
Elapsed time is 0.246623 seconds.
Elapsed time is 0.553173 seconds.
Elapsed time is 0.000289 seconds.
So my code is 853 times faster!
I wrote my code so it will work with m being an arbitrary integer.
The num_matches variable contains the number of columns that add up to 0, 1, 2, ...7 when converted to a decimal.
As an alternative you can use the third output of unique:
[~, ~, iu] = unique(a.', 'rows');
for i = 1:n
ax_boo = iu(i) == iu;
end
As indicated in a comment:
ax_boo isolates the indices of the columns I have to sum in a row vector b. So, basically the next line would be something like c = sum(b(ax_boo),2);
It is a typical usage of accumarray:
[~, ~, iu] = unique(a.', 'rows');
C = accumarray(iu,b);
for i = 1:n
c = C(i);
end

Boyer-Moore Galil Rule

I was implementing the Boyer-Moore Algorithm for substring search in Python when I learned about the Galil Rule. I've looked around online for the Galil Rule but I haven't found anything more than a couple of sentences, and I cannot get access to the original paper. How can I implement this into my current algorithm?
i = 0
while i < (N - M + 1):
skip = 0
for j in reversed(range(0, M)):
if pattern[j] != text[i + j]:
skip = max(1, j - offsets[text[i+j]])
break
if skip == 0:
return i
i += skip
return -1
Notes:
offsets[c] = -1 if c is not in the pattern
offsets[c] = last index of c in the pattern
Example:
aaabcb
offsets[a] = 2
offsets[b] = 5
offsets[c] = 4
offsets[d] = -1
The few sentences I have found have said to keep track of when the first mismatch occurs in my inner loop (j, if the if-statement inside the inner loop is True) and the position in which I started the comparisons (i + j, in my case). I understand the intuition that I've already checked all the indices in between those, so I shouldn't have to do those comparisons again. I just don't understand how to connect the dots and arrive at an implementation.
The Galil rule is about exploiting periodicity in the pattern to reduce comparisons. Say you have a pattern abcabcab. It's periodic with smallest period abc. In general, a pattern P is periodic if there's a string U such that P is a prefix of UUUUU.... (In the above example, abcabcab is clearly a prefix of the repeating string abc = U.) We call the shortest such string the period of P. Let the length of that period be k (in the example above k = 3 since U = abc).
First of all, keep in mind that the Galil rule applies only after you've found an occurrence of P in the text. When you do that, the Galil rule says that you could shift by k (the periodicity of the pattern) and you only have to compare the last k characters of the now shifted pattern to determine if there was a match.
Here's an example:
P = ababa
T = bababababab
U = ab
k = 2
First occurrence: b[ababa]babab. Now you can shift by k = 2 and you only have to check the last two characters of the pattern:
T = bababa[ba]bab
P = aba[ba] // Only need to compare chars inside brackets for next match.
The rest of P must match since P is periodic and you shifted it by its period k from an existing match (this is crucial) so the repeating parts will nicely line up.
If you've found another match, just repeat. If you find a mismatch, however, you revert to the standard Boyer-Moore algorithm until you find another match. Remember, you can only use the Galil rule when you find a match and you shift by k (otherwise the pattern is not guaranteed to line up with the previous occurrence).
Now, you might wonder, how to determine k for a given pattern P. You'll need to calculate the suffixes array N first, where N[i] will be the length of the longest common suffix of the prefix P[0, i] and P. (You can calculate the suffixes array by calculating the prefixes array Z on the reverse of P using the Z algorithm, as described here, for example.) Once you have the suffixes array, you can easily find k since it'll be the smallest k > 0 such that N[m - k - 1] == m - k (where m = |P|).
For example:
P = ababa
m = 5
N = [1, 0, 3, 0, 5]
k = 2 because N[m - k - 1] == N[5 - 2 - 1] == N[2] == 3 == 5 - k
The answer by #Lajos Nagy has explained the idea of Galil rule perfectly, however we have a more straightforward way to calculate k:
Just use the prefix function of KMP algorithm.
The prefix[i] means the longest proper prefix of P[0..i] which is also a suffix.
And, k = m-prefix[m-1] .
This article has explained the details.

Homework: Implementing Karp-Rabin; For the hash values modulo q, explain why it is a bad idea to use q as a power of 2?

I have a two-fold homework problem, Implement Karp-Rabin and run it on a test file and the second part:
For the hash values modulo q, explain why it is a bad idea to use q as a power of 2. Can you construct a terrible example e.g. for q=64
and n=15?
This is my implementation of the algorithm:
def karp_rabin(text, pattern):
# setup
alphabet = 'ACGT'
d = len(alphabet)
n = len(pattern)
d_n = d**n
q = 2**32-1
m = {char:i for i,char in enumerate(alphabet)}
positions = []
def kr_hash(s):
return sum(d**(n-i-1) * m[s[i]] for i in range(n))
def update_hash():
return d*text_hash + m[text[i+n-1]] - d_n * m[text[i-1]]
pattern_hash = kr_hash(pattern)
for i in range(0, len(text) - n + 1):
text_hash = update_hash() if i else kr_hash(text[i:n])
if pattern_hash % q == text_hash % q and pattern == text[i:i+n]:
positions.append(i)
return ' '.join(map(str, positions))
...The second part of the question is referring to this part of the code/algo:
pattern_hash = kr_hash(pattern)
for i in range(0, len(text) - n + 1):
text_hash = update_hash() if i else kr_hash(text[i:n])
# the modulo q used to check if the hashes are congruent
if pattern_hash % q == text_hash % q and pattern == text[i:i+n]:
positions.append(i)
I don't understand why it would be a bad idea to use q as a power of 2. I've tried running the algorithm on the test file provided(which is the genome of ecoli) and there's no discernible difference.
I tried looking at the formula for how the hash is derived (I'm not good at math) trying to find some common factors that would be really bad for powers of two but found nothing. I feel like if q is a power of 2 it should cause a lot of clashes for the hashes so you'd need to compare strings a lot more but I didn't find anything along those lines either.
I'd really appreciate help on this since I'm stumped. If someone wants to point out what I can do better in the first part (code efficiency, readability, correctness etc.) I'd also be thrilled to hear your input on that.
There is a problem if q divides some power of d, because then only a few characters contribute to the hash. For example in your code d=4, if you take q=64 only the last three characters determine the hash (d**3 = 64).
I don't really see a problem if q is a power of 2 but gcd(d,q) = 1.
Your implementation looks a bit strange because instead of
if pattern_hash % q == text_hash % q and pattern == text[i:i+n]:
you could also use
if pattern_hash == text_hash and pattern == text[i:i+n]:
which would be better because you get fewer collisions.
The Thue–Morse sequence has among its properties that its polynomial hash quickly becomes zero when a power of 2 is the hash module, for whatever polynomial base (d). So if you will try to search a short Thue-Morse sequence in a longer one, you will have a great lot of hash collisions.
For example, your code, slightly adapted:
def karp_rabin(text, pattern):
# setup
alphabet = '01'
d = 15
n = len(pattern)
d_n = d**n
q = 32
m = {char:i for i,char in enumerate(alphabet)}
positions = []
def kr_hash(s):
return sum(d**(n-i-1) * m[s[i]] for i in range(n))
def update_hash():
return d*text_hash + m[text[i+n-1]] - d_n * m[text[i-1]]
pattern_hash = kr_hash(pattern)
for i in range(0, len(text) - n + 1):
text_hash = update_hash() if i else kr_hash(text[i:n])
if pattern_hash % q == text_hash % q : #and pattern == text[i:i+n]:
positions.append(i)
return ' '.join(map(str, positions))
print(karp_rabin('0110100110010110100101100110100110010110011010010110100110010110', '0110100110010110'))
outputs a lot of positions, although only three of then are proper matches.
Note that I have dropped the and pattern == text[i:i+n] check. Obviously if you restore it, the result will be correct, but also it is obvious that the algorithm will do much more work checking this additional condition than for other q. In fact, because there are so many collisions, the whole idea of algorithm becomes not working: you could almost as effectively wrote a simple algorithm that checks every position for a match.
Also note that your implementation is quite strange. The whole idea of polynomial hashing is to take the modulo operation each time you compute the hash. Otherwise your pattern_hash and text_hash are very big numbers. In other languages this might mean arithmetic overflow, but in Python this will invoke big integer arithmetic, which is slow and once again loses the whole idea of the algorithm.

Dynamic programming idiom for combinations

Consider the problem in which you have a value of N and you need to calculate how many ways you can sum up to N dollars using [1,2,5,10,20,50,100] Dollar bills.
Consider the classic DP solution:
C = [1,2,5,10,20,50,100]
def comb(p):
if p==0:
return 1
c = 0
for x in C:
if x <= p:
c += comb(p-x)
return c
It does not take into effect the order of the summed parts. For example, comb(4) will yield 5 results: [1,1,1,1],[2,1,1],[1,2,1],[1,1,2],[2,2] whereas there are actually 3 results ([2,1,1],[1,2,1],[1,1,2] are all the same).
What is the DP idiom for calculating this problem? (non-elegant solutions such as generating all possible solutions and removing duplicates are not welcome)
Not sure about any DP idioms, but you could try using Generating Functions.
What we need to find is the coefficient of x^N in
(1 + x + x^2 + ...)(1+x^5 + x^10 + ...)(1+x^10 + x^20 + ...)...(1+x^100 + x^200 + ...)
(number of times 1 appears*1 + number of times 5 appears * 5 + ... )
Which is same as the reciprocal of
(1-x)(1-x^5)(1-x^10)(1-x^20)(1-x^50)(1-x^100).
You can now factorize each in terms of products of roots of unity, split the reciprocal in terms of Partial Fractions (which is a one time step) and find the coefficient of x^N in each (which will be of the form Polynomial/(x-w)) and add them up.
You could do some DP in calculating the roots of unity.
You should not go from begining each time, but at max from were you came from at each depth.
That mean that you have to pass two parameters, start and remaining total.
C = [1,5,10,20,50,100]
def comb(p,start=0):
if p==0:
return 1
c = 0
for i,x in enumerate(C[start:]):
if x <= p:
c += comb(p-x,i+start)
return c
or equivalent (it might be more readable)
C = [1,5,10,20,50,100]
def comb(p,start=0):
if p==0:
return 1
c = 0
for i in range(start,len(C)):
x=C[i]
if x <= p:
c += comb(p-x,i)
return c
Terminology: What you are looking for is the "integer partitions"
into prescibed parts (you should replace "combinations" in the title).
Ignoring the "dynamic programming" part of the question, a routine
for your problem is given in the first section of chapter 16
("Integer partitions", p.339ff) of the fxtbook, online at
http://www.jjj.de/fxt/#fxtbook

Resources