Writing an interpreter in Ruby, not able to consume whitespace properly - ruby

I am following the "Writing an interpreter" book and implementing it in Ruby instead of Go. I am able to scan tokens like ;, =, +, etc., but the lexer behaves differently when my input string contains identifiers or numbers such as let or 10. I have tried to hunt this bug all week in vain, so I thought a fresh pair of eyes might be able to catch it.
Here is an overview.
The codebase is very small and most of the logic resides in lib/lexer/lexer.rb
The class Lexer maintains the following state: a cursor for the current character in the input string, a cursor for the next character, and the current character itself.
Lexer has the following methods:
read_char, which sets the data members to the appropriate values
read_identifier, which extracts all the characters belonging to strings that are identifiers rather than reserved keywords, and calls read_char before returning
read_number, same as read_identifier but for numbers
consume_whitespace, to skip over spaces, newlines, etc.
next_token, which matches the current character against the appropriate case, returns its Token object (defined in lib/token/token.rb), and calls read_char to increment the cursors before returning
require_relative '../token/token'
def is_letter(ch) #basically decides syntax acceptable for variable names
#puts ch.class
'a' <= ch && ch <= 'z' || 'A' <= ch && ch <= 'Z' || ch == '_'
end
def is_digit(ch) #checks if digit
'0' <= ch && ch <= '9'
end
class Lexer
  def initialize(input)
    @input = input
    @position = 0
    @readPosition = 0
    @ch = ''
    read_char
  end
  def read_char
    #puts caller[0]
    @ch = @readPosition >= @input.length ? '' : @input[@readPosition]
    @position = @readPosition
    @readPosition += 1
    #puts "INSIDE READ_CHAR #{@position} #{@readPosition} #{@ch}"
  end
  # SUPPOSED TO BE A LOOP WAS JUST A CONDITION. NOW FIXED.
  def consume_whitespace
    while @ch == ' ' || @ch =='\t' || @ch == '\n' || @ch == '\r' do
      read_char
    end
  end
  def read_identifier
    pos = @position
    #puts "RI: char #{@ch} pos #{pos} position #{@position}"
    while is_letter(@ch) do
      #puts @ch
      read_char
    end
    puts "METHOD read_identifier: char #{@ch} pos #{pos} position #{@position}\n"
    @input[pos..@position-1]
  end
  def read_number
    pos = @position
    #puts "RN: char #{@ch} pos #{pos} position #{@position}"
    while is_digit(@ch) do
      read_char
    end
    puts "METHOD read_number: char #{@ch} pos #{pos} position #{@position}\n"
    @input[pos..@position-1]
  end
  def next_token
    #puts @ch, @ch.class
    #puts "\nX=X=X=X=X=X=X=X=X=: #{@ch}, #{@ch.ord}, X=X=X=X=X=X=X=X=X=\n"
    tok = nil
    consume_whitespace
    tok =
      case @ch
      when '=' then Token.new(ASSIGN, @ch)
      when '+' then Token.new(PLUS, @ch)
      when '-' then Token.new(MINUS, @ch)
      when '/' then Token.new(DIVIDE, @ch)
      when '*' then Token.new(MULTIPLY, @ch)
      when '%' then Token.new(MODULO, @ch)
      #when '==' then Token.new(EQUAL_TO, @ch)
      when '>' then Token.new(GREATER_THAN, @ch)
      when '<' then Token.new(LESS_THAN, @ch)
      #when '!=' then Token.new(UNEQUAL_TO, @ch)
      #when '&&' then Token.new(AND, @ch)
      #when '||' then Token.new(OR, @ch)
      when '!' then Token.new(NOT, @ch)
      when ',' then Token.new(COMMA, @ch)
      when ';' then Token.new(SEMICOLON, @ch)
      when '?' then Token.new(QUESTION, @ch)
      when '(' then Token.new(LPAREN, @ch)
      when ')' then Token.new(RPAREN, @ch)
      when '[' then Token.new(LSQUARE, @ch)
      when ']' then Token.new(RSQUARE, @ch)
      when '{' then Token.new(LCURLY, @ch)
      when '}' then Token.new(RCURLY, @ch)
      else
        #puts 'hello from next_token', @ch.ord
        # STATE WAS BEING MUTATED NOW FIXED
        puts "letter #{@ch}"
        puts "letter ascii #{@ch.ord}"
        #puts "isletter "
        if is_letter(@ch)
          literal = read_identifier
          Token.new(look_up_ident(literal), literal)
        elsif is_digit(@ch)
          Token.new(INT, read_number)
        else
          Token.new(ILLEGAL, "ILLEGAL")
        end
      end
    read_char
    return tok
  end
end
The rake test failures weren't helpful in debugging, so I decided to write a simple main.rb script that imports and runs my lexer, and I sprinkled a lot of puts calls throughout the codebase.
This is my main.rb
require_relative 'lib/lexer/lexer'
lex = Lexer.new('five = 5;
ten = 10;')
i = 1
while i <= 8
tok = lex.next_token
puts "\nIN_MAIN: #{tok.type} ==> #{tok.literal}\n\n"
i=i+1
end
This is the output of ruby main.rb
letter f
letter ascii 102
METHOD read_identifier: char pos 0 position 4
IN_MAIN: IDENTIFIER ==> five
IN_MAIN: = ==> =
letter 5
letter ascii 53
METHOD read_number: char ; pos 7 position 8
IN_MAIN: INT ==> 5
letter
letter ascii 10
IN_MAIN: ILLEGAL ==> ILLEGAL
letter t
letter ascii 116
METHOD read_identifier: char pos 27 position 30
IN_MAIN: IDENTIFIER ==> ten
IN_MAIN: = ==> =
letter 1
letter ascii 49
METHOD read_number: char ; pos 33 position 35
IN_MAIN: INT ==> 10
letter
Traceback (most recent call last):
2: from main.rb:8:in `<main>'
1: from /home/palash25/gundoochy/lib/lexer/lexer.rb:89:in `next_token'
/home/palash25/gundoochy/lib/lexer/lexer.rb:89:in `ord': empty string (ArgumentError)
We can ignore the traceback at the end, because I have not yet worked out how to return a token object for EOF, but here is the gist of what happens before that.
The lexer scans the tokens correctly up to five = 5. After that it skips the next character, which is ;, and does not return a token object for it. Instead it returns a token of type ILLEGAL for the \n right after the ; (I even printed the ASCII value of the character to be sure that it was the \n producing the ILLEGAL).
This should not happen, since consume_whitespace is supposed to skip all kinds of whitespace, yet it does not skip newlines. After that the lexer scans the next line, ten = 10, but the last semicolon is missing from the output just like the first one.
If I use an input string without any identifier or number it works perfectly fine.
Here is the link to the full codebase https://gitlab.com/palash25/gundoochy

In your Lexer code (which you should include in your original question), you have the following method:
def consume_whitespace
  while @ch == ' ' || @ch =='\t' || @ch == '\n' || @ch == '\r' do
    read_char
  end
end
Here, you attempt to specify various whitespace characters. However, since you have written them with single quotes, the backslash escape sequences are not applied. Instead, each literal is a two-character string: a backslash followed by a t, n, or r. A single character in @ch can never equal one of those, so tabs, newlines, and carriage returns are never skipped.
If you use double quotes here, the characters in your source code are interpreted as tab, newline, or carriage return characters respectively:
def consume_whitespace
  while @ch == ' ' || @ch == "\t" || @ch == "\n" || @ch == "\r" do
    read_char
  end
end
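The quoting fix explains the ILLEGAL token for the newline, but the missing semicolons look like a separate issue: read_identifier and read_number already leave the cursor on the first character after the literal, and next_token then calls read_char once more, which steps over that character (a space after five, but the ; after 5 and 10). Below is a minimal sketch of a lexer that avoids both problems and also returns an EOF token instead of calling ord on an empty string. It is not the book's or the repository's code: MiniLexer, the Token struct, and the symbol token types are made up for illustration.

```ruby
# Hypothetical stand-in for the project's Token class in lib/token/token.rb.
Token = Struct.new(:type, :literal)

class MiniLexer
  # Double quotes, so "\t", "\n" and "\r" are real single characters.
  WHITESPACE = [" ", "\t", "\n", "\r"].freeze

  def initialize(input)
    @input = input
    @position = 0
  end

  def next_token
    @position += 1 while WHITESPACE.include?(current)
    ch = current
    return Token.new(:EOF, '') if ch.nil? # past the end of the input

    if ch =~ /[A-Za-z_]/
      # read_while already advanced the cursor, so no extra step here.
      Token.new(:IDENTIFIER, read_while(/[A-Za-z_]/))
    elsif ch =~ /[0-9]/
      Token.new(:INT, read_while(/[0-9]/))
    else
      @position += 1 # only single-character tokens advance at this point
      Token.new(:SYMBOL, ch)
    end
  end

  private

  def current
    @input[@position]
  end

  # Consumes matching characters and stops on the first non-match.
  def read_while(pattern)
    start = @position
    @position += 1 while current && current =~ pattern
    @input[start...@position]
  end
end

lexer = MiniLexer.new("five = 5;\nten = 10;")
loop do
  tok = lexer.next_token
  break if tok.type == :EOF
  puts "#{tok.type} ==> #{tok.literal}"
end

p '\n'.length # 2: a backslash and the letter n
p "\n".length # 1: a newline character
```

The design choice is that each branch is responsible for advancing the cursor exactly once past its own token, so there is no unconditional read_char at the end of next_token.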

Related

How to use .ord and .chr properly in a loop?

I am trying to make a function that takes a jumbled sequence of letters and returns English. For some reason, I can't get word = (word.ord-4).chr to work properly. The secret to the code is that the letters are shifted 4 slots backwards, which is why I'm converting it to an integer first, subtracting 4, and then turning it back to a string.
The loop also appears to be ignoring the fact that I told it to skip a word if it's any of those special characters. What am I doing wrong?
Any suggestions or sources that will bring me closer to solving this problem?
def north_korean_cipher(coded_mesage)
input = coded_mesage.split('') # splits the coded message into array of letters
input.each do |word|
word = (word.ord - 4).chr
if word == '@' || '#' || '$' || '%' || '^' || '&' || '*'
next
end
end
print input
end
north_korean_cipher('m^aerx%e&gsoi!')
You want a mapping like this:
input: abcdefghijklmnopqrstuvwxyz
output: wxyzabcdefghijklmnopqrstuv
Unfortunately your approach doesn't work for the first 4 letters:
("a".ord - 4).chr #=> "]"
("b".ord - 4).chr #=> "^"
("c".ord - 4).chr #=> "_"
("d".ord - 4).chr #=> "`"
I'd use String#tr. It replaces each occurrence in the first string with the corresponding character in the second string:
"m^aerx%e&gsoi!".tr("abcdefghijklmnopqrstuvwxyz", "wxyzabcdefghijklmnopqrstuv")
#=> "i^want%a&coke!"
There's also a "c1-c2 notation to denote ranges of characters":
"m^aerx%e&gsoi!".tr("a-z", "w-za-v")
#=> "i^want%a&coke!"
The documentation further says:
If to_str is shorter than from_str, it is padded with its last character in order to maintain the correspondence.
So it can be used to easily replace the "special characters" with a space:
"m^aerx%e&gsoi!".tr("a-z@#$%^&*", "w-za-v ")
#=> "i want a coke!"
This:
if word == '@' || '#' || '$' || '%' || '^' || '&' || '*'
does not do what you expect it to do, because '#' as a condition will always be true. You can't compare objects like that. You should do something like
if word == '@' || word == '#' || word == '$' || word == '%' || word == '^' || word == '&' || word == '*'
You can write it in a more succinct way by asking:
if %w(@ # $ % ^ & *).include? word
Which checks if word is in the collection of options...
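If you would rather keep the ord/chr approach than switch to tr, the two fixes above can be combined. The sketch below wraps around the alphabet with a modulo so that a to d map to w to z, and uses include? for the special characters. It also uses map instead of each, because reassigning the block variable inside each does not change the array. The constant name SPECIALS and the choice to turn special characters into spaces are assumptions made for the example.

```ruby
# Characters treated as word separators (assumed from the question).
SPECIALS = %w(@ # $ % ^ & *).freeze

def north_korean_cipher(coded_message)
  decoded = coded_message.chars.map do |ch|
    if SPECIALS.include?(ch)
      ' '
    elsif ('a'..'z').cover?(ch)
      # Shift back by 4 and wrap around inside the 26 lowercase letters.
      (((ch.ord - 'a'.ord - 4) % 26) + 'a'.ord).chr
    else
      ch # leave anything else (e.g. "!") untouched
    end
  end
  decoded.join
end

puts north_korean_cipher('m^aerx%e&gsoi!')
```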

Ruby method to reverse characters not recursing for words

The 'reverse by characters' works but the third test "by words" doesn't -
expected: "sti gniniar"
got: "sti" (using ==)
def reverse_itti(msg, style='by_character')
new_string = ''
word = ''
if style == 'by_character'
msg.each_char do |one_char|
new_string = one_char + new_string
end
elsif style == 'by_word'
msg.each_char do |one_char|
if one_char != ' '
word+= one_char
else
new_string+= reverse_itti(word, 'by_character')
word=''
end
end
else
msg
end
new_string
end
describe "It should reverse sentences, letter by letter" do
it "reverses one word, e.g. 'rain' to 'niar'" do
reverse_itti('rain', 'by_character').should == 'niar'
end
it "reverses a sentence, e.g. 'its raining' to 'gniniar sti'" do
reverse_itti('its raining', 'by_character').should == 'gniniar sti'
end
it "reverses a sentence one word at a time, e.g. 'its raining' to 'sti gniniar'" do
reverse_itti('its raining', 'by_word').should == 'sti gniniar'
end
end
The problem is in this loop:
msg.each_char do |one_char|
if one_char != ' '
word+= one_char
else
new_string+= reverse_itti(word, 'by_character')
word=''
end
end
The else block reverses the current word and adds it to the output string, but it only runs when the loop encounters a space character. Since there is no space at the very end of the string, the last word is never added to the output. You can fix this by adding new_string+= reverse_itti(word, 'by_character') after the end of the loop.
Also, you probably want to add a space to the end of the output string in the else block, too.
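As a sketch of the fix described above (flush the last word after the loop, and add the space back in the else branch), here is a stripped-down by-word version. The method name reverse_words is made up for the example, and it uses String#reverse for each word instead of the recursive call.

```ruby
def reverse_words(msg)
  new_string = ''
  word = ''
  msg.each_char do |one_char|
    if one_char != ' '
      word += one_char
    else
      # End of a word: emit it reversed and keep the separating space.
      new_string += word.reverse + ' '
      word = ''
    end
  end
  # The last word has no trailing space, so it is flushed here.
  new_string + word.reverse
end

puts reverse_words('its raining')
```

A shorter alternative with the same result for single-space input is msg.split(' ').map(&:reverse).join(' ').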

What causes the "already initialized constant" warning?

What's wrong with my code? Is FileNameArray being reused?
f.rb:17: warning: already initialized constant FileNameArray
number = 0
while number < 99
number = number + 1
if number <= 9
numbers = "000" + number.to_s
elsif
numbers = "00" + number.to_s
end
files = Dir.glob("/home/product/" + numbers + "/*/*.txt")
files.each do |file_name|
File.open(file_name,"r:utf-8").each do | txt |
if txt =~ /http:\/\//
if txt =~ /static.abc.com/ or txt =~ /static0[1-9].abc.com/
elsif
$find = txt
FileNameArray = file_name.split('/')
f = File.open("error.txt", 'a+')
f.puts FileNameArray[8], txt , "\n"
f.close
end
end
end
end
end
You might be a Ruby beginner, so I tried to rewrite the same code in a more idiomatic Ruby way:
(1..99).each do |number|
  Dir.glob("/home/product/" + ("%04d" % number) + "/*/*.txt").each do |file_name|
    File.open(file_name, "r:utf-8").each do |txt|
      next unless txt =~ /http:\/\//
      next if txt =~ /static.abc.com/ || txt =~ /static0[1-9].abc.com/
      $find = txt
      file_name_array = file_name.split('/')
      f = File.open("error.txt", 'a+')
      f.puts file_name_array[8], txt, "\n"
      f.close
    end
  end
end
Points to note:
In Ruby, a variable prefixed with the $ symbol is a global variable, so use $find only if it is really required.
In Ruby, a constant starts with a capital letter, and we are usually NOT supposed to change a constant's value. FileNameArray is assigned again on every pass through the loop, which is what triggers the warning in your program.
(1..99) is a literal that creates an instance of the Range class, covering the values from 1 to 99.
In Ruby, the case of a variable name matters. Local variables must start with a lowercase character, constants with an uppercase one.
So, please try to rename FileNameArray to fileNameArray.
Also, glob takes advanced expressions that can save you one loop and a dozen lines of code. In your case the expression should look something like:
Dir.glob("/home/product/00[0-9][0-9]/*/*.txt")
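To illustrate the two points from the answers without touching the file system, here is a small sketch. Both helper names (product_dirs and path_segment) are hypothetical. The first shows what the "%04d" format produces for the loop counter, the second does the split with a snake_case local variable, which can be assigned on every call without any warning.

```ruby
# Hypothetical helper: the zero-padded directory names the loop would visit.
def product_dirs(range = 1..99)
  range.map { |number| format('/home/product/%04d', number) }
end

# Hypothetical helper: a local variable instead of a constant.
def path_segment(file_name, index)
  file_name_array = file_name.split('/')
  file_name_array[index]
end

p product_dirs(1..3)
p path_segment('/home/product/0001/a/b.txt', 3)
```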

Capitalize first letter in Ruby with UTF-8 strings with exceptions

I would like to capitalize each word of a UTF-8 string. However, I need the function to ignore some special characters in the beginning of words, like "(-.,". The function will be used to capitalize song titles which can look like this:
marko, gabriel boni, simple jack - recall (original mix)
...would output:
Marko, Gabriel Boni, Simple Jack - Recall (Original Mix)
It should also be able to capitalize UTF-8 chars like "å" > "Å". "é" > "É".
Is there a reason why the Unicode::capitalize method from the unicode library does not suit your needs?
irb(main):013:0> require 'unicode'
=> true
irb(main):014:0> begin Unicode::capitalize 'åäöéèí' rescue $stderr.print "unicode error\n" end
=> "Åäöéèí"
irb(main):015:0> begin Unicode::capitalize '-åäöéèí' rescue $stderr.print "unicode error\n" end
=> "-åäöéèí"
"åbc".mb_chars.capitalize
#=> "Åbc"
"ébc".mb_chars.capitalize.to_s
#=> "Ébc"
Update:
And to ignore non-word characters:
string = "-åbc"
str = string.match(/^(\W*)(.*)/)
str[1] + str[2].mb_chars.capitalize.to_s
#=> "-Åbc"
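For the whole-title case, a sketch that needs neither gem, assuming Ruby 2.4 or newer (where String#capitalize handles non-ASCII letters): match each run of letters with \p{L} and capitalize it, so leading characters such as "(" or "-" are left alone. The method name capitalize_title is made up for the example. Note that capitalize also downcases the rest of each word, so an input like "DJ" would become "Dj".

```ruby
def capitalize_title(title)
  # \p{L}+ matches a run of letters in any script; punctuation is skipped.
  title.gsub(/\p{L}+/) { |word| word.capitalize }
end

puts capitalize_title('marko, gabriel boni, simple jack - recall (original mix)')
puts capitalize_title('-åbc (éte)')
```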
I did this and wanted to filter a lot of things.
I created a constants file initializers/constants.rb
letters = ("a".."z").to_a
numbers = ("1".."9").to_a
symbols = %w[! @ # $ % ^ & * ( ) _ - + = | \] { } : ; ' " ? / > . < , ]
FILTER = letters + numbers + symbols
And then just did a check to see if it was in my filter:
if !FILTER.include?(c)
#no
else
#yes
end
You can also check the value of the unicode but you need to know the range or specific values. I did this with chinese characters, so that is where I got my values. I will post some code just to give you an idea:
def check(char)
char = char.unpack('U*').first
if char >= 0x4E00 && char <= 0x9FFF
return true
end
if char >= 0x3400 && char <= 0x4DBF
return true
end
if char >= 0x20000 && char <= 0x2A6DF
return true
end
if char >= 0x2A700 && char <= 0x2B73F
return true
end
return false
end
You need to know the specific values here of course.
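The same check can be written more compactly by keeping the code point ranges in a list. This is only a restructuring sketch of the method above: the ranges are copied from it unchanged, and the names CJK_RANGES and cjk_ideograph? are made up.

```ruby
# Code point ranges taken from the check method above.
CJK_RANGES = [
  0x4E00..0x9FFF,
  0x3400..0x4DBF,
  0x20000..0x2A6DF,
  0x2A700..0x2B73F
].freeze

def cjk_ideograph?(char)
  # String#ord returns the code point of the first character.
  CJK_RANGES.any? { |range| range.cover?(char.ord) }
end

p cjk_ideograph?('中')
p cjk_ideograph?('a')
```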

Syntax error, unexpected '='

I have the following as part of a class
def to_s
i = 0
first_line? = true
output = ''
@selections.each do | selection |
i += 1
if first_line?
output << selection.to_s(first_line?)
first_line? = false
else
output << selection.to_s
end
if i >= 5
output << "\r"
i = 0
else (output << " $ ")
end
end
return output
end
And i am getting the following syntax errors
SyntaxError: list2sel.rb:45: syntax error, unexpected '='
first_line? = true
^
list2sel.rb:47: syntax error, unexpected keyword_do_block, expecting keyword_end
@selections.each do | selection |
^
list2sel.rb:51: syntax error, unexpected '='
first_line? = false
^
What gives? Thanks in advance, this is driving me nuts.
I suppose, you can't name variables with '?' at the end.
Variable names (with a few exceptions noted below) can only contain letters, numbers and the underscore. (Also, they must begin with a letter or the underscore; they can't begin with a number.) You can't use ? or ! in a variable name.
Beyond that rule, there is a strong convention in Ruby that a question mark at the end of something indicates a method that returns a boolean value:
4.nil? # => returns false....
So even if you could use it, a variable like first_line? would confuse (and then annoy) the hell out of Rubyists. They would expect it to be a method testing whether something was the first line of something (whatever exactly that means in context).
Exceptions about variable names:
Global variables begin with $ - e.g., $stdin for standard input.
Instance variables begin with @ - e.g. @name for an object
Class variables begin with @@ - e.g. @@total for a class
I believe this is a more concise way of doing the above (untested):
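def to_s
  output = ""
  @selections.each_with_index do |selection, line|
    # Parentheses are needed: << binds tighter than == and ?:
    output << (line == 0 ? selection.to_s(true) : selection.to_s)
    # 0 is truthy in Ruby, so compare the remainder explicitly
    output << ((line + 1) % 5 == 0 ? "\r" : " $ ")
  end
  return output
end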
If you are not a fan of the ternary operator (x ? y : z) then you can make them ifs:
def to_s
  output = ""
  @selections.each_with_index do |selection, line|
    if line == 0
      output << selection.to_s(true)
    else
      output << selection.to_s
    end
    if (line + 1) % 5 == 0
      output << "\r"
    else
      output << " $ "
    end
  end
  return output
end
Variable names allow non-ASCII letters, and there are non-ASCII versions of the question mark, so you can put question marks (and also some forms of space characters) into variable names.
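To make the naming rule concrete, here is a small sketch with a hypothetical Report class: the trailing ? lives on a method name, while the flag inside to_s is an ordinary local variable without it.

```ruby
# Hypothetical class, only here to show the naming rule.
class Report
  def initialize(lines)
    @lines = lines
  end

  # A trailing ? is fine in a method name and signals a boolean result.
  def first_line?(index)
    index.zero?
  end

  def to_s
    first_line = true # a plain local name; a trailing ? is not allowed here
    @lines.map do |line|
      prefix = first_line ? '> ' : '  '
      first_line = false
      prefix + line
    end.join("\n")
  end
end

puts Report.new(%w[alpha beta]).to_s
```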
