Ruby loop order? - ruby

I'm trying to brute-force a password. While playing with some loops, I noticed there's a specific order. For example, if I loop with for i in '.'..'~' and puts each i, it prints
.
/
0
1
2
3
4
5
6
7
8
9
:
;
<
=
>
?
@
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Y
Z
[
\
]
^
_
`
a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z
{
|
}
~
After seeing this, I wondered to myself "what is the loop order in Ruby?" What character is the highest priority and which is the lowest priority? Sorry if this question is basic. I just haven't found a site where anyone knows. If you have questions about the question just ask. I hope this is clear enough!

The order is defined by the numeric code (binary representation) of each character, which in turn is defined by a standard. The standard used here is ASCII (American Standard Code for Information Interchange).
http://www.asciitable.com/
Other encoding standards exist, such as EBCDIC, which is used by IBM mid-range computers.
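A quick way to confirm that single-character strings are ordered by their codes (assuming an ASCII-compatible encoding such as UTF-8, Ruby's default) is to sort a few of them and print each code with String#ord:

```ruby
# String#<=> compares strings byte by byte, so sorting single characters
# puts them in ASCII-code order.
chars = %w[a B ~ 0 .]
sorted = chars.sort
sorted.each { |c| puts "#{c} #{c.ord}" }
# . 46
# 0 48
# B 66
# a 97
# ~ 126
```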

for / in is (mostly) syntactic sugar for each, so
for i in '.'..'~' do puts i end
is (roughly) equivalent (modulo local variable scope) to
('.'..'~').each do |i| puts i end
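The "modulo local variable scope" caveat is easy to see in a small sketch: a for loop leaves its variable behind, while the block given to each keeps its parameter to itself.

```ruby
# `for` does not create a new scope, so the loop variable outlives the loop.
for i in 'a'..'c'
  # empty body
end
puts i.inspect           # "c"

# The block passed to `each` does create one: j exists only inside it.
('a'..'c').each { |j| j }
puts defined?(j).inspect # nil
```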
Since for/in delegates to each, we have to look at Range#each for our answer:
each {| i | block } → rng
Iterates over the elements of range, passing each in turn to the block.
The each method can only be used if the begin object of the range supports the succ method. A TypeError is raised if the object does not have succ method defined (like Float).
And the documentation for the Range class itself provides more details:
Custom Objects in Ranges
Ranges can be constructed using any objects that can be compared using the <=> operator. Methods that treat the range as a sequence (#each and methods inherited from Enumerable) expect the begin object to implement a succ method to return the next object in sequence.
So, while it isn't spelled out directly, it is clear that Range#each works by
Repeatedly sending the succ message to the begin object (and then to the object that was returned by succ, and then to that object, and so forth), and
Comparing the current element to the end object using the <=> spaceship combined comparison operator to figure out whether to produce another object or end the loop.
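The two requirements (succ and <=>) can be seen with a custom object. The class Week below is hypothetical, and each_via_succ is a simplified sketch of the mechanism described above, not Ruby's actual C implementation (which has extra cases, for example for String ranges):

```ruby
# Hypothetical class: a week number that can name its successor and
# compare itself with another week. Range#each needs only succ and <=>.
class Week
  include Comparable
  attr_reader :n

  def initialize(n)
    @n = n
  end

  def succ
    Week.new(@n + 1)
  end

  def <=>(other)
    @n <=> other.n
  end

  def to_s
    "W#{@n}"
  end
end

weeks = (Week.new(1)..Week.new(4)).map(&:to_s)
puts weeks.inspect  # ["W1", "W2", "W3", "W4"]

# Simplified sketch: call succ repeatedly until <=> says we passed the end.
def each_via_succ(first, last)
  current = first
  while (current <=> last) <= 0
    yield current
    current = current.succ
  end
end

manual = []
each_via_succ(Week.new(1), Week.new(4)) { |w| manual << w.to_s }
puts manual.inspect # ["W1", "W2", "W3", "W4"]
```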
For a String range, that means we have to look at String#succ next:
succ → new_str
Returns the successor to str. The successor is calculated by incrementing characters starting from the rightmost alphanumeric (or the rightmost character if there are no alphanumerics) in the string. Incrementing a digit always results in another digit, and incrementing a letter results in another letter of the same case. Incrementing nonalphanumerics uses the underlying character set’s collating sequence.
Basically, what this means is:
incrementing a letter does what you expect
incrementing a digit does what you expect
incrementing something that is neither a letter nor a digit is arbitrary and dependent on the string's character set's collating sequence
In this particular case, you didn't tell us what the collating sequence of your string is, but I assume it is ASCII, which means you get what is colloquially called ASCIIbetical ordering.
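The rules above can be checked directly; the inputs below are taken from, or modelled on, the examples in the String#succ documentation:

```ruby
# Each pair is input => the successor String#succ returns for it.
examples = {
  'a'   => 'b',   # a letter becomes the next letter
  'az'  => 'ba',  # 'z' wraps to 'a' and carries into the letter on its left
  'zz'  => 'aaa', # a carry past the left end adds a new character
  'a9'  => 'b0',  # digits stay digits, letters stay letters
  '.'   => '/',   # no alphanumerics: next character in the collating sequence
  '***' => '**+'  # only the rightmost non-alphanumeric is incremented
}

examples.each do |input, expected|
  actual = input.succ
  puts "#{input.inspect}.succ => #{actual.inspect} (expected #{expected.inspect})"
end
```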

It's not about priority, but about the order of their values. As already said, each character has its own ASCII code (e.g., 'a' has the value 97 and 'z' has the value 122).
You can see this for yourself by trying this:
('a'..'z').each do |c|
  puts c.ord
end
Analogously, this should also work:
(97..122).each do |i|
  puts i.chr
end
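The same check can be applied to the range from the question. Assuming an ASCII-compatible string encoding, the codes come out as one unbroken run, so '.' is the lowest and '~' the highest:

```ruby
# Codes of every character produced by the question's loop.
codes = ('.'..'~').map(&:ord)
puts codes.first             # 46, the lowest ('.')
puts codes.last              # 126, the highest ('~')
puts codes == (46..126).to_a # true: one step per code, no gaps
```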

Related

number of letters to be deleted from a string so that it is divisible by another string

I am doing this problem https://www.spoj.com/problems/DIVSTR/
We are given two strings S and T.
S is divisible by string T if there is some non-negative integer k that satisfies the equation S = k*T (that is, S is T repeated k times).
What is the minimum number of characters which should be removed from S, so that S is divisible by T?
The main idea was to match T against S with a pointer and count the occurrences of T in S: when a full occurrence has been counted, move the pointer back to the start of T, and if there's a mismatch, compare T's first letter with the current letter of S.
This code works fine with the test cases they provided and with custom test cases I wrote, but it fails the hidden test cases.
This is the code:
def no_of_letters(string1,string2):
    # print(len(string1),len(string2))
    count = 0
    pointer = 0
    if len(string1)<len(string2):
        return len(string1)
    if (len(string1)==len(string2)) and (string1!=string2):
        return len(string1)
    for j in range(len(string1)):
        if (string1[j]==string2[pointer]) and pointer<(len(string2)-1):
            pointer+=1
        elif (string1[j]==string2[pointer]) and pointer == (len(string2)-1):
            count+=1
            pointer=0
        elif (string1[j]!=string2[pointer]):
            if string1[j]==string2[0]:
                pointer=1
            else:
                pointer = 0
    return len(string1)-len(string2)*count
One place where I think there could be confusion is when the same letters can be part of two counts, but that should not be a problem, because the answer doesn't need to take overlapping into account.
For example, S = 'akaka' and T = 'aka' give the output 2, whether the first 'aka' (leaving 'ka') or the second 'aka' (leaving 'ak') is counted.
I believe the solution is much more straightforward than you make it. You're simply trying to find how many times the characters of T appear, in order, in S. Everything else is the characters you remove. For instance, given RobertBaron's example of S="akbaabka" and T="aka", you would write your routine to locate the characters a, k, a, in that order, from the start of S:
akbaabka
ak a^
# with some pointer, ptr, now at position 4, marked with a caret above
With that done, you can now recur on the remainder of the string:
find_chars(S[ptr:], T)
With each call, you look for T in S; if you find it, count 1 repetition and recur on the remainder of S; if not, return 0 (base case). As you crawl back up your recursion stack, accumulate all the 1 counts, and there is your value of k.
The number of chars to remove is len(S) - k*len(T).
Can you take it from there?
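A minimal Ruby sketch of this idea, with some assumptions: it is a translation (the question's code is Python), the loop is iterative rather than recursive, and the name chars_to_remove is hypothetical. It walks S once, advancing a pointer through T on each matching character, and counts one repetition every time the pointer reaches the end of T:

```ruby
# Greedily match the characters of t, in order, against s.
# Each completed pass over t is one repetition (k); every character
# not used by a completed repetition has to be removed.
def chars_to_remove(s, t)
  k = 0
  ptr = 0 # index of the next character of t we are looking for
  s.each_char do |c|
    next unless c == t[ptr]
    ptr += 1
    if ptr == t.length
      k += 1
      ptr = 0
    end
  end
  s.length - k * t.length
end

puts chars_to_remove("akbaabka", "aka") # 2 (the two b's)
puts chars_to_remove("akaka", "aka")    # 2
```

Matching greedily from the left is what makes this work: taking the earliest possible position for each character of T never leaves fewer options for later repetitions.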

Need efficient algorithm in combinatorics

I am trying to find the best (realistic) algorithm for solving a cryptography challenge, in which:
the given cipher text C is made of about 6000 characters taken from the set S={A,B,C,...,Y,a,b,c,...,y}, so |S| = 50.
the encryption scheme does not allow two identical adjacent characters in C
25 letters in S are called Nulls, and are unknown
these Nulls must be removed from C to obtain the actual cipher text C' which can then be attacked.
the list of Nulls in C is named N and |N| is close to |C|/2 = 3000
so: |N| + |C'| = |C|
My aim is to identify the 25 Nulls, satisfying these two conditions:
there may not be two identical adjacent characters in C'
there may not be two identical adjacent Nulls in N
Obviously by brute force there are 50!/(25! 25!) = 126410606437752 combinations of 25 Nulls in S, so this is not a realistic approach.
I have tried to recursively explore the tree of sets of Nulls and 'cut branches' as much and as soon as possible.
For example, when adding a letter of S to the subset of Nulls, if the sequence "x n1n2 x" appears in C where x is not yet a Null and n1n2 are Nulls, then x should be a Null too.
However, this is not enough to get the run time below a few centuries...
Can you think of a cleverer algorithm for identifying these 25 Nulls?
Note: there might be more than one set of Nulls satisfying the two conditions
Let's try something like this:
Create a list of sets, each containing one char from S (each set is a group of chars that are either all Nulls or all non-Nulls).
while you have more than two sets:
    for each set
        search the cipher text for X[<set-chars>]+X, where X is a char outside the set
        if found, union the set with the set X is in.
    if no sets were united, start recursing with two sets united.
You can speed things up if you keep a different cipher text for each set, removing from it the chars in the set. If you do so, the search is easier: you are searching for XX, which has constant length. Every time you union two sets, you need to remove all the chars in the sets from the cipher text.
The time this will take depends on the string C you are given.
An explanation of the sets: each set is an option for C' or N. If you find that A and X are in the same group, then {A, X} is either a subset of N or of C'. If you later find the same about Y and B, then {Y, B} is such a subset too. Later, finding a substring YAXAXY means that Y is in the same group as A and X, and so is B, because it's with Y. At the end you will end up with two groups, one for C' and one for N, which you can't distinguish between.
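Below is a minimal Ruby sketch of just the merging step, assuming the sets are kept in a union-find structure. The class Groups and the method merge_groups are hypothetical names, and both the "start recursing with two sets united" step and the per-set reduced cipher text are left out. It rescans C until a pass merges nothing, because one merge can enable another:

```ruby
# Union-find over characters: each root identifies one group of chars
# that must be either all Nulls or all non-Nulls.
class Groups
  def initialize
    @parent = Hash.new { |h, c| h[c] = c } # every char starts alone
  end

  def find(c)
    root = c
    root = @parent[root] while @parent[root] != root
    root
  end

  # Merge the groups of a and b; returns true if they were separate.
  def union(a, b)
    ra = find(a)
    rb = find(b)
    return false if ra == rb
    @parent[ra] = rb
    true
  end
end

# Apply the rule "X g+ X (all g in one group, X outside it) => X joins
# that group" until a full pass over the cipher text merges nothing.
def merge_groups(cipher)
  groups = Groups.new
  chars = cipher.chars
  loop do
    changed = false
    chars.each_index do |i|
      x = chars[i]
      j = i + 1
      next if j >= chars.size
      g = groups.find(chars[j])
      next if g == groups.find(x)
      # Skip over the run of characters belonging to group g.
      j += 1 while j < chars.size && chars[j] != x && groups.find(chars[j]) == g
      changed = groups.union(x, g) || changed if j < chars.size && chars[j] == x
    end
    break unless changed
  end
  groups
end

g = merge_groups("CABCABA")
puts [g.find('A'), g.find('B'), g.find('C')].uniq.size # 1: A, B and C end up together
```

In "CABCABA" the run A B A first puts A and B together; only on the next pass does C A B C show that C belongs with them.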
elyashiv's method is the good one.
It is very fast.
I have produced the two sets C' and N, which are equivalent.
The subsets S1 and S2 of S that produce C' and N satisfy S = S1 ∪ S2, as expected.
Thank you.

Ukkonen's suffix tree algorithm: procedure 'test and split' unclear

Ukkonen's on-line construction algorithm
I'm having trouble understanding the 'test-and-split' procedure, which is as follows:
procedure test–and–split(s, (k, p), t):
1. if k ≤ p then
2. let g′(s,(k′,p′)) = s′ be the t(k)–transition from s
3. if t = t(k′+p−k+1) then return (true, s)
My problem is: what exactly does line 2 mean? How can g′(s,(k′,p′)) still be a t(k)-transition if it starts from s and is followed by t(k′) instead of t(k)?
Probably you already figured it out and you don't need an answer anymore, but since I had the same problem in trying to understand it, and maybe it'll be useful for someone else in the future, the answer I think is the following one.
In Ukkonen's on-line construction algorithm paper, on page 7, you can read:
...
The string w spelled out by the transition path in STrie(T) between two explicit states s and r is represented in STree(T) as generalized transition g′(s,w) = r. To save space the string w is actually represented as a pair (k,p) of pointers (the left pointer k and the right pointer p) to T such that tk . . . tp = w. In this way the generalized transition gets form g′(s, (k, p)) = r.
Such pointers exist because there must be a suffix Ti such that the transition path for Ti in STrie(T) goes through s and r. We could select the smallest such i, and let k and p point to the substring of this Ti that is spelled out by the transition path from s to r. A transition g′(s, (k, p)) = r is called an a–transition if tk = a. Each s can have at most one a–transition for each a ∈ Σ.
...
This means that we are looking for the smallest indexes k and p such that tk . . . tp = w in T
=> if there is more than one occurrence of w in T, with k and p we always reference the first one.
Now, procedure test–and–split(s,(k,p),t) tests whether or not a state with canonical reference pair (s,(k,p)) is the endpoint, that is, a state that in STrie(T i−1) would have a ti –transition. Symbol ti is given as input parameter t.
The first lines of the algorithm are the following:
procedure test–and–split(s,(k,p),t):
1. if k ≤ p then
2. let g′(s,(k′,p′)) = s′ be the t(k)–transition from s;
3. if t = t(k′+p−k+1) then return(true,s)
4. else ...
On line 1 we check if the state is implicit (that is when k <= p).
If so, then on line 2 we want to find the transition from s that starts with the character we find in pos k of T (that is tk). Note that tk must be equal to tk' but indexes k and k' can be different because we always point to the first occurrence of a string w in T (remember also that from one state there can be at most one transition that starts with character tk => so that's the correct and the only one).
Then on line 3 we check whether the state referenced by the canonical reference pair (s,(k,p)) is the endpoint, that is, whether it has a ti-transition. The state (s,(k,p)) is the one (implicit or not) that we reach from state s by following the tk′-transition (which is the tk-transition, because tk′ = tk) for p − k + 1 characters, i.e. through tk′ … tk′+p−k. This explains the index k′+p−k+1: the +1 selects the next character, the one we check for equality with t (where t = ti). If they are equal, we have reached the endpoint and we return true.
Else, starting from line 4, we split the transition g′(s,(k′,p′)) = s′ to make explicit the state (s,(k,p)) and return the new explicit state.
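A worked instance of the index arithmetic in lines 2 and 3, assuming T = "abab" (1-indexed as in the paper). In that construction the 'a'-transition from the root is labelled with the first occurrence, so k′ = 1, while after reading t3 the suffix "a" is referenced as (root, (3, 3)), so k = p = 3. The variable names are only illustrative:

```ruby
# 1-indexed text as in the paper; the leading space is padding so that
# t[1] is the first character.
t = " abab"

k_prime = 1 # k': left pointer stored on the edge (first occurrence of "a")
k_ref   = 3 # k of the reference pair (root, (3, 3)), which spells "a"
p_ref   = 3 # p of the same pair

# Line 2: the edge is the t(k)-transition because its first character
# t(k') equals t(k), even though k' != k.
raise "not a t(k)-transition" unless t[k_prime] == t[k_ref]

# Line 3: the state lies p - k + 1 characters into the edge, so the next
# character on the edge is t(k' + p - k + 1).
next_index = k_prime + p_ref - k_ref + 1
puts next_index    # 2
puts t[next_index] # b
```

So when t4 = 'b' arrives, test–and–split returns true for this state; for any other character it would split the edge right after t1.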

Scope of variables and the digits function

My question is twofold:
1) As far as I understand, constructs like for loops introduce scope blocks; however, I'm having some trouble with a variable that is defined outside of such a construct. The following code is an attempt to extract digits from a number and place them in an array.
n = 654068
l = length(n)
a = Int64[]
for i in 1:(l-1)
    temp = n/10^(l-i)
    if temp < 1 # ith digit is 0
        a = push!(a,0)
    else # ith digit is != 0
        push!(a,floor(temp))
        # update n
        n = n - a[i]*10^(l-i)
    end
end
# last digit
push!(a,n)
The code executes fine, but when I look at the a array I get this result
julia> a
0-element Array{Int64,1}
I thought that anything that goes on inside the for loop is invisible to the outside, unless I'm operating on variables defined outside the for loop. Moreover, I thought that by using the ! syntax I would operate directly on a; this does not seem to be the case. I would be grateful if anyone can explain to me how this works :)
2) The second question is about the syntax used when explaining functions. There is apparently a function called digits that extracts digits from a number and puts them in an array; using the help function I get
julia> help(digits)
Base.digits(n[, base][, pad])
Returns an array of the digits of "n" in the given base,
optionally padded with zeros to a specified size. More significant
digits are at higher indexes, such that "n ==
sum([digits[k]*base^(k-1) for k=1:length(digits)])".
Can anyone explain how to interpret the information given about functions in Julia? How am I to interpret digits(n[, base][, pad])? How does one correctly call the digits function? Surely it isn't like this: digits(40125[, 10])?
I'm unable to reproduce your result; running your code gives me
julia> a
1-element Array{Int64,1}:
654068
There are a few mistakes and inefficiencies in the code:
length(n) doesn't give the number of digits in n, but always returns 1 (currently, numbers are iterable and return a sequence that contains only one number: itself). So the for loop, over 1:0, is never run.
/ between integers does floating-point division. For extracting digits, you're better off with div(x,y), which does integer division.
There's no reason to write a = push!(a,x), since push! modifies a in place. It is equivalent to writing push!(a,x); a = a.
There's no reason to treat digits that are zero specially; they are handled just fine by the general case.
Your description of scoping in Julia seems to be correct, I think that it is the above which is giving you trouble.
You could use something like
n = 654068
a = Int64[]
while n != 0
    push!(a, n % 10)
    n = div(n, 10)
end
reverse!(a)
This loop extracts the digits in opposite order to avoid having to figure out the number of digits in advance, and uses the modulus operator % to extract the least significant digit. It then uses reverse! to get them in the order you wanted, which should be pretty efficient.
About the documentation for digits: [, base] just means that base is an optional parameter. The signature should probably read digits(n[, base[, pad]]), since it's not possible to specify pad unless you specify base. Also note that digits returns the least significant digit first, which is what we get if we remove the reverse! from the code above.
Is this cheating?
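For readers following the Ruby question above, the same loop translates directly to Ruby. This is a sketch: digits_msd_first is a hypothetical name, and Integer#divmod returns quotient and remainder together. Ruby's built-in Integer#digits (Ruby 2.4 and later) also returns the least significant digit first, like Julia's digits:

```ruby
# Peel off the least significant digit with divmod, then reverse.
def digits_msd_first(n)
  raise ArgumentError, "expected a non-negative integer" if n.negative?
  return [0] if n.zero? # the loop below would return [] for 0
  a = []
  while n > 0
    n, d = n.divmod(10) # d is the least significant digit
    a << d
  end
  a.reverse!
end

p digits_msd_first(654068) # [6, 5, 4, 0, 6, 8]
p 654068.digits            # [8, 6, 0, 4, 5, 6], least significant first
```

The guard against negative numbers matters in Ruby: Integer#divmod rounds toward negative infinity, so for a negative n the loop condition would never become false in a version that tested n != 0.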
n = 654068
nstr = string(n)
a = map((x) -> x |> string |> int , collect(nstr))
outputs:
6-element Array{Int64,1}:
6
5
4
0
6
8

Why is there not a destructive version of `to_s`?

Let's say:
n = 5
n.to_s
p n
The value of n is still 5 rather than "5". What's the shortest way to replace the original variable n with the converted value without having to go through the following:
n = 5
a = n.to_s
p a
Why doesn't Ruby allow me to call to_s! on the object?
An integer cannot magically turn itself into a String. Methods (including ! methods) can only change an object's value, not its type. Besides, integers are immutable: the integer itself can't be modified (but the name pointing at it can be re-pointed at a new integer).
Therefore, to_s! does not exist, and instead you need to rebind the variable by writing e.g.
n = n.to_s
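A short sketch of the difference between rebinding a name and mutating an object in place:

```ruby
n = 5
m = n        # a second name for the same Integer object
n = n.to_s   # rebinds n to a new String; the Integer 5 is untouched
p n          # "5"
p m          # 5
p 5.frozen?  # true: Integer objects can't be modified in place

# A bang method mutates its receiver, but the receiver's class stays the same.
s = "abc".dup # dup gives a mutable copy of the literal
t = s         # a second name for the same String object
s.upcase!
p t           # "ABC"
p t.class     # String
```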
