Regex that matches valid Ruby local variable names - ruby

Does anyone know the rules for valid Ruby variable names? Can it be matched using a RegEx?
UPDATE: This is what I could come up with so far:
^[_a-z][a-zA-Z0-9_]+$
Does this seem right?

Identifiers are pretty straightforward. They begin with letters or an underscore, and contain letters, underscore and numbers. Local variables can't (or shouldn't?) begin with an uppercase letter, so you could just use a regex like this.
/^[a-z_][a-zA-Z_0-9]*$/

It's possible for variable names to be unicode letters, in which case most of the existing regexes don't match.
varname = "\u2211" # => "∑"
eval(varname + '= "Tony the Pony"') => "Tony the Pony"
puts varname # => ∑
local_variable_identifier = /Insert large regular expression here/
varname =~ local_variable_identifier # => nil
See also "Fun with Unicode" in either the Ruby 1.9 Pickaxe or at Fun with Unicode.

According to http://rubylearning.com/satishtalim/ruby_names.html a Ruby variable consists of:
A name is an uppercase letter,
lowercase letter, or an underscore
("_"), followed by Name characters
(this is any combination of upper- and
lowercase letters, underscore and
digits).
In addition, global variables begin with a dollar sign, instance variables with a single at-sign, and class variables with two at-signs.
A regular expression to match all that would be:
%r{
(\$|#{1,2})? # optional leading punctuation
[A-Za-z_] # at least one upper case, lower case, or underscore
[A-Za-z0-9_]* # optional characters (including digits)
}x
Hope that helps.

I like #aboutruby's answer, but just to complete it, here's the equivalent using POSIX bracket expressions.
/^[_[:lower:]][_[:alnum:]]*$/
Or, since a-z is actually shorter than [:lower:]:
/^[_a-z][_[:alnum:]]*$/

I think /^(\$){0,1}[_a-zA-Z][a-zA-Z0-9_]*([?!]){0,1}$/ is a bit closer to what you will need...
It depends on whether you want to match method names as well.
If you are trying to match a name that might be encountered in an expression, then it might start with $ and it might end with ? or !. If you know for sure that it is just a local variable then the rule will be much simpler.

i was trying to figure one out for a rails patch, and Matthew Draper wrote this one, using the ruby parser as a reference:
/\A(?![A-Z0-9])(?:[[:alnum:]_]|[^\0-\177])+\z/

And here it is, straight from the horse's mouth. (The horse in this case is the Draft ISO Ruby Specification):
local-variable-identifier → ( lowercase-character | _ ) identifier-character *
identifier-character → lowercase-character | uppercase-character | decimal-digit | _
uppercase-character → A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z
lowercase-character → a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z
decimal-digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
In Ruby 1.9, using named groups, you can translate this literally:
local_variable_identifier = %r{
(?<uppercase_character> A | B | C | D | E | F | G | H | I | J | K | L | M
| N | O | P | Q | R | S | T | U | V | W | X | Y | Z
){0}
(?<lowercase_character> a | b | c | d | e | f | g | h | i | j | k | l | m
| n | o | p | q | r | s | t | u | v | w | x | y | z
){0}
(?<decimal_digit> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9){0}
(?<identifier_character> \g<lowercase_character>
| \g<uppercase_character>
| \g<decimal_digit>
| _
){0}
( \g<lowercase_character> | _ ) \g<identifier_character>*
}x
Of course, this is not how you would really write it.

Related

Ruby adding empty strings to hash for CSV spacing

I have:
hash = {"1"=>["A", "B", "C", ... "Z"], "2"=>["B", "C"], "3"=>["A", "C"]
My goal is to use hash as a source for creating a CSV with columns whose names are a letter of the alphabet and with rows hash(key) = 1,2,3 etc.
I created an array of all hash.values.unshift("")values that serve as row 1 (columns labels).
desired output:
| A | B | C | ... | Z |
1| A | B | C | ... | Z |
2| | B | C | ....... |
3| A | | C | ....... |
Creating CSV:
CSV.open("groups.csv", 'w') do |csv|
csv << row1
hash.each do |v|
csv << v.flatten
end
end
This makes the CSV look almost what I want but There is no spacing to get columns to align.
Any advice on how to make a method for modifying my hash that compares my all [A-Z] against each subsequent hash key (rows) to insert empty strings to provide spacing?
Can Class CSV do it better?
Something like this?
require 'csv'
ALPHA = ('A'..'Z').to_a.freeze
hash={"1"=>ALPHA, "2"=>["B", "C"], "3"=>["A", "C"]}
csv = CSV.generate("", col_sep: "|") do |csv|
csv << [" "] + ALPHA # header
hash.each do |k, v|
alphabet = ALPHA.map { |el| [el, 0] }.to_h
v.each { |el| alphabet[el] += 1 }
csv << [k, *alphabet.map { |k, val| val == 1 ? k : " " }]
end
end
csv.split("\n").each { |row| puts row }
output:
|A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z
1|A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z
2| |B|C| | | | | | | | | | | | | | | | | | | | | | |
3|A| |C| | | | | | | | | | | | | | | | | | | | | | |
If your values are truly single characters and don't need the CSV escaping, then I recommend bypassing CSV altogether and building the string in plain Ruby.
Assuming you want to align your lines correctly regardless of the number of digits in the row number (e.g. 1, 10, and 100), you can use printf style formatting to guarantee horizontal aligment (assuming your row number width never exceeds the value of ROWNUM_WIDTH).
By the way, I changed the hash's keys to integers, hope that's ok.
#!/usr/bin/env ruby
FIELDS = ('A'..'Z').to_a
DATA = { 1 => FIELDS, 2 => %w(B C), 3 => %w(A C) }
ROWNUM_WIDTH = 3
output = ' ' * ROWNUM_WIDTH + " | #{FIELDS.join(' | ')} |\n"
DATA.each do |rownum, values|
line = "%*d | " % [ROWNUM_WIDTH, rownum]
FIELDS.each do |field|
char = values.include?(field) ? field : ' '
line << "#{char} | "
end
output << line << "\n"
end
puts output
=begin
Outputs:
| A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z |
1 | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z |
2 | | B | C | | | | | | | | | | | | | | | | | | | | | | | |
3 | A | | C | | | | | | | | | | | | | | | | | | | | | | | |
=end
all = [*?A..?Z]
hash = {"1"=>[*?A..?Z], "2"=>["B", "C"], "3"=>["A", "C"]}
hash.map do |k, v|
[k, *all.map { |k| v.include?(k) ? k : ' ' }]
end.unshift([' ', *all]).
map { |row| row.join('|') }
#⇒ [
# [0] " |A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z",
# [1] "1|A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z",
# [2] "2| |B|C| | | | | | | | | | | | | | | | | | | | | | | ",
# [3] "3|A| |C| | | | | | | | | | | | | | | | | | | | | | | "
# ]

EBNF Syntax for imaginary language

In school we have been studying metalanguages, in particular, railroad diagrams and EBNF. I received a question where an imaginary programming language (winston) was described in EBNF. Here it is:
Digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
LCase = a | b | c | d
UCase = A | B | C | D | E | F | G | H | I | J
Operator = + | - | * | /
Logical = < | > | <= | >= | <>
Constant = [-] <Digit>{<Digit>}
Identifier = <UCase>{<LCase> | <Digit>}
Assignment = Set <Identifier> to <Constant> | <Identifier
{<Operator>(<Constant> | <Identifier>)}
Condition = <Identifier> <Logical> (<Identifier> | <Constant>)
{(and | or) <Identifier> <Logical> (<Identifier> | <Constant>)}
When = (<Assignment> | <Condition> {<Assignment> | <Condition>})
Statement = <Input> | <Output> | <Assignment> | <Condition> | <When> | <Pretest> | <Posttest>
Program = Start <Statement> {! <Statement>} Stop
The program written below was made with winston but doesn't execute properly. Use the EBNF descriptions to identify the error.
Start
Input J1
Input J2
When (J1 = J2, Set A3 to 0), (J1 < J2, Set A3 to -1), Set A3 to 1
Output A3
Stop
My working so far: To me, this program seems legitimate. It is a program so if must start with "start" and end with "stop", which it does. The statements in the middle seemingly are allowed to be in there. Can someone point me in the right direction?
Also, can someone tell me what it means in the EBNF description of a program what this means:<statement>
I think it means statements like when and if but Im not too sure. Thanks for the help :)
When is comma-separated and the grammar doesn't specify commas at all.
J1 = J2 -- there is no = comparison op in the grammar (see Logical), so J1 = J2 is neither Assignment, nor Condition and is thus invalid.
<statement> -- the grammar wraps symbols in angle brackets on the right-hand side, e.g. Identifier on the left-hand and, later, <Identifier> in Assignment rule -- that doesn't look like a valid EBNF.

Infinite loop nested within a block in Ruby

Here's what I'm trying to do. I'm iterating through an array of strings, each of which may contain [] multiple times. I want to use String#match as many times as necessary to process every occurrence. So I have an Array#each block, and nested within that is an infinite loop that should break only when I run out of matches for a given string.
def follow(c, dir)
case c
when "\\"
dir.rotate!
when "/"
dir.rotate!.map! {|x| -x}
end
return dir
end
def parse(ar)
ar.each_with_index { |s, i|
idx = 0
while true do
t = s.match(/\[[ ]*([a-zA-Z]+)[ ]*\]/) { |m|
idx = m.end 0
r = m[1]
s[m.begin(0)...idx] = " "*m[0].size
a = [i, idx]
dir = [0, 1]
c = ar[a[0]][a[1]]
while !c.nil? do
dir = follow c, dir
ar[a[0]][a[1]] = " "
a[0] += dir[0]; a[1] += dir[1]
c = ar[a[0]][a[1]]
if c == ">" then
ar[a[0]][a[1]+1] = r; c=nil
elsif c == "<" then
ar[a[0]][a[1]-r.size] = r; c=nil
end
end
ar[a[0]][a[1]] = " "
puts ar
}
if t == nil then break; end
end
}
parse File.new("test", "r").to_a
Contents of test:
+--------+----------+-------------+
| Colors | Foods | Countries |
+--------+----------+-------------+
| red | pizza | Switzerland |
/--> /----> | |
| |[kale]/ | hot dogs | Brazil |
| | <----------------------\ |
| | orange |[yellow]\ | [green]/ |
| +--------+--------|-+-------------+
\-------------------/
Goal:
+--------+----------+-------------+
| Colors | Foods | Countries |
+--------+----------+-------------+
| red | pizza | Switzerland |
yellow kale | |
| | hot dogs | Brazil |
| green |
| orange | | |
+--------+-------- -+-------------+
Actual output of program:
+--------+----------+-------------+
| Colors | Foods | Countries |
+--------+----------+-------------+
| red | pizza | Switzerland |
yellow kale | |
| | hot dogs | Brazil |
| <----------------------\ |
| orange | | [green]/ |
+--------+-------- -+-------------+
(Since the array is modified in place, I figure there is no need to update the match index.) The inner loop should run again for each pair of [] I find, but instead I only get one turn per array entry. (It seems that the t=match... bit isn't the problem, but I could be wrong.) How can I fix this?

Print all possible path using haskell

I need to write a program to draw all possible paths in a given matrix that can be had by moving in only left, right and up direction.
One should not cross the same location more than once. Note also that on a particular path, we may or may not use motion in all possible directions.
Path will start in the bottom-left corner in the matrix and will reach the top-right corner.
Following symbols are used to denote the direction of the motion in the current position:
+---+
| > | right
+---+
+---+
| ^ | up
+---+
+---+
| < | left
+---+
The symbol * is used in the final location to indicate end of path.
Example:
For a 5x8 matrix, using left, right and up directions, 2 different paths are shown below.
Path 1:
+---+---+---+---+---+---+---+---+
| | | | | | | | * |
+---+---+---+---+---+---+---+---+
| | | > | > | > | > | > | ^ |
+---+---+---+---+---+---+---+---+
| | | ^ | < | < | | | |
+---+---+---+---+---+---+---+---+
| | > | > | > | ^ | | | |
+---+---+---+---+---+---+---+---+
| > | ^ | | | | | | |
+---+---+---+---+---+---+---+---+
Path 2
+---+---+---+---+---+---+---+---+
| | | | > | > | > | > | * |
+---+---+---+---+---+---+---+---+
| | | | ^ | < | < | | |
+---+---+---+---+---+---+---+---+
| | | | | | ^ | | |
+---+---+---+---+---+---+---+---+
| | | > | > | > | ^ | | |
+---+---+---+---+---+---+---+---+
| > | > | ^ | | | | | |
+---+---+---+---+---+---+---+---+
Can anyone help me with this?
I tried to solve using lists. It i soon realized that i am making a disaster. Here is the code i tried with.
solution x y = travel (1,1) (x,y)
travelRight (x,y) = zip [1..x] [1,1..] ++ [(x,y)]
travelUp (x,y) = zip [1,1..] [1..y] ++ [(x,y)]
minPaths = [[(1,1),(2,1),(2,2)],[(1,1),(1,2),(2,2)]]
travel startpos (x,y) = rt (x,y) ++ up (x,y)
rt (x,y) | odd y = map (++[(x,y)]) (furtherRight (3,2) (x,2) minPaths)
| otherwise = furtherRight (3,2) (x,2) minPaths
up (x,y) | odd x = map (++[(x,y)]) (furtherUp (2,3) (2,y) minPaths)
| otherwise = furtherUp (2,3) (2,y) minPaths
furtherRight currpos endpos paths | currpos == endpos = (travelRight currpos) : map (++[currpos]) paths
| otherwise = furtherRight (nextRight currpos) endpos ((travelRight currpos) : (map (++[currpos]) paths))
nextRight (x,y) = (x+1,y)
furtherUp currpos endpos paths | currpos == endpos = (travelUp currpos) : map (++[currpos]) paths
| otherwise = furtherUp (nextUp currpos) endpos ((travelUp currpos) : (map(++[currpos]) paths))
nextUp (x,y) = (x,y+1)
identify lst = map (map iden) lst
iden (x,y) = (x,y,1)
arrows lst = map mydir lst
mydir (ele:[]) = "*"
mydir ((x1,y1):(x2,y2):lst) | x1==x2 = '>' : mydir ((x2,y2):lst)
| otherwise = '^' : mydir ((x2,y2):lst)
surroundBox lst = map (map createBox) lst
bar = "+ -+"
mid x = "| "++ [x] ++" |"
createBox chr = bar ++ "\n" ++ mid chr ++ "\n" ++ bar ++ "\n"
This ASCII grids are much more confusing than enlightening. Let me describe a better way to represent each possible path.
Each non-top row will have exactly one cell with UP. I claim that once each of the UP cells has been chosen that the LEFT and RIGHT and EMPTY cells can be determined. I claim that all possible cells in each of the non-top rows can be UP in all combination.
Each path is thus isomorphic to a (rows-1) length list of numbers in the range (1..columns) that determine the UP cells. The number of allowed paths is thus columns^(rows-1) and enumerating the possible paths in this format should be easy.
Then you could make a printer that converts this format to the ASCII art. This may be annoying, depending on skill level.
Looks like a homework so I will try to give enough hints
Try first filling number of paths from a cell to your goal.
So
+---+---+---+---+---+---+---+---+
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | * |
+---+---+---+---+---+---+---+---+
| | | | | | | | |
+---+---+---+---+---+---+---+---+
| | | | | | | | |
+---+---+---+---+---+---+---+---+
| | | | | | | | |
+---+---+---+---+---+---+---+---+
| | | | | | | | |
+---+---+---+---+---+---+---+---+
The thing to note here is from the cell in the top level there will always be one path to the *.
Number of possible path from cells in the same row will be same. You can realize this as all the paths will ultimately have to move up as there is no down action so in any path the cell above the current row can be reached by any cell in the current row.
You can feel the all possible paths from the current cell has its relation with the possible paths from the cell left,right and above. But as we know we can find all possible paths from only one cell in a row and rest of cells' possible paths will be some movements in the same row followed by a suffix of possible paths from that cell.
Maybe I will give you a example
+---+---+---+
| 1 | 1 | * |
+---+---+---+
| | | |
+---+---+---+
| | | |
+---+---+---+
You know all possible paths from cells in the first row. You need to find the same in the second row. So a good strategy would be to do it for the right most cell
+---+---+---+
| > | > | * |
+---+---+---+
| ^ | < | < |
+---+---+---+
| | | |
+---+---+---+
+---+---+---+
| | > | * |
+---+---+---+
| | ^ | < |
+---+---+---+
| | | |
+---+---+---+
+---+---+---+
| | | * |
+---+---+---+
| | | ^ |
+---+---+---+
| | | |
+---+---+---+
Now finding this for rest of the cells in the same row is trivial using these as I have told before.
In the end if you have m X n matrix the number of paths from bottom-left corner to top-right corner will be n^(m-1).
Another way
This way is not very optimal but easy to implement. Consider m X n grid
Find the path of longest length. You dont need the exact path just the number of <,>,^.
You can find the direct formula in terms of m and n.
Like
^ = m - 1
< = (n-1) * floor((m-1)/2)
> = (n-1) * (floor((m-1)/2) + 1)
Any valid path will be a prefix of the permutations of this which you can search exhaustively. Use permutations from Data.List to get all possible permutations. Then make a function which given a path strips a valid path from this. map this over the list of permutations and remove duplicates. The thing to note is path will be a prefix of what you get from permutation, so there can be several permutations for the same path.
Can you create that matrix and define the "fields"? Even if you can't (a specific matrix is given), you can map an [(Int, Int)] matrix (which sounds reasonable for this kind of task) to your own representation.
Since you didn't specify what your skill level was, I hope you don't mind that I suggest that you first try to create some kind of a grid in order to have something to work on:
data Status = Free | Left | Right | Up
deriving (Read, Show, Eq)
type Position = (Int, Int)
type Field = (Position, Status)
type Grid = [Field]
grid :: Grid
grid = [((x, y), stat) | x <- [1..10], y <- [1..10], let stat = Free]
Of course there are other ways to achieve this. Afterwards you can define some movement, map Position to Grid index and Statuses to printable characters... Try to fiddle with it and you might get some ideas.

Data management with several variables

Currently I am facing the following problem, which I'm working in Stata to solve. I have added the algorithm tag, because it's mainly the steps that I'm interested in rather than the Stata code.
I have some variables, say, var1 - var20 that can possibly contain a string. I am only interested in some of these strings, let us call them A,B,C,D,E,F, but other strings can occur also (all of these will be denoted X). Also I have a unique identifier ID. A part of the data could look like this:
ID | var1 | var2 | var3 | .. | var20
1 | E | | | | X
1 | | A | | | C
2 | X | F | A | |
8 | | | | | E
Now I want to create an entry for every ID and for every occurrence of one of the strings A,B,C,E,D,F in any of the variables. The above data should look like this:
ID | var1 | var2 | var3 | .. | var20
1 | E | | | .. |
1 | | A | | |
1 | | | | | C
2 | | F | | |
2 | | | A | |
8 | | | | | E
Here we ignore every time there's a string X that is NOT A,B,C,D,E or F. My attempt so far was to create a variable that for each entry counts the number, N, of occurrences of A,B,C,D,E,F. In the original data above that variable would be N=1,2,2,1. Then for each entry I create N duplicates of this. This results in the data:
ID | var1 | var2 | var3 | .. | var20
1 | E | | | | X
1 | | A | | | C
1 | | A | | | C
2 | X | F | A | |
2 | X | F | A | |
8 | | | | | E
My problem is how do I attack this problem from here? And sorry for the poor title, but I couldn't word it any more specific.
Sorry, I thought the finally block was your desired output (now I understand that it's what you've accomplished so far). You can get the middle block with two calls to reshape (long, then wide).
First I'll generate data to match yours.
clear
set obs 4
* ids
generate n = _n
generate id = 1 in 1/2
replace id = 2 in 3
replace id = 8 in 4
* generate your variables
forvalues i = 1/20 {
generate var`i' = ""
}
replace var1 = "E" in 1
replace var1 = "X" in 3
replace var2 = "A" in 2
replace var2 = "F" in 3
replace var3 = "A" in 3
replace var20 = "X" in 1
replace var20 = "C" in 2
replace var20 = "E" in 4
Now the two calls to reshape.
* reshape to long, keep only desired obs, then reshape to wide
reshape long var, i(n id) string
keep if inlist(var, "A", "B", "C", "D", "E", "F")
tempvar long_id
generate int `long_id' = _n
reshape wide var, i(`long_id') string
The first reshape converts your data from wide to long. The var specifies that the variables you want to reshape to long all start with var. The i(n id) specifies that each unique combination of n and i is a unique observation. The reshape call provides one observation for each n-id combination for each of your var1 through var20 variables. So now there are 4*20=80 observations. Then I keep only the strings that you'd like to keep with inlist().
For the second reshape call var specifies that the values you're reshaping are in variable var and that you'll use this as the prefix. You wanted one row per remaining letter, so I made a new index (that has no real meaning in the end) that becomes the i index for the second reshape call (if I used n-id as the unique observation, then we'd end up back where we started, but with only the good strings). The j index remains from the first reshape call (variable _j) so the reshape already knows what suffix to give to each var.
These two reshape calls yield:
. list n id var1 var2 var3 var20
+-------------------------------------+
| n id var1 var2 var3 var20 |
|-------------------------------------|
1. | 1 1 E |
2. | 2 1 A |
3. | 2 1 C |
4. | 3 2 F |
5. | 3 2 A |
|-------------------------------------|
6. | 4 8 E |
+-------------------------------------+
You can easily add back variables that don't survive the two reshapes.
* if you need to add back dropped variables
forvalues i =1/20 {
capture confirm variable var`i'
if _rc {
generate var`i' = ""
}
}

Resources