Defining a Template in Flex - compilation

I want to define a "KEYER" in flex, which is a "KEY" enclosed in "[]". A "KEY" starts with a letter, followed by a string of letters, digits, and the following characters: "~_'?$. -".
I defined:
keyChar ([a-zA-z0-9~_'?$. \-])
letter ([a-zA-Z])
key ({letter}{keyChar}+)
keyer ("["{key}"]")
and:
<*>{keyer} print("KEYER");
Somehow the input:
[keyer1] [keyer2] [keyer 3]
is read as one KEYER and not three of them. What did I do wrong?

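You wrote A-z instead of A-Z in the pattern for keyChar. In ASCII, [A-z] also includes the six characters between Z and a ([, \, ], ^, _ and `), so both brackets match keyChar and the pattern runs on through to the last ] in the input.
On the whole, it is better to avoid range expressions when they are not necessary. I would have written: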
keyChar ([[:alnum:]~_'?$. -])
key ([[:alpha:]]{keyChar}+)
keyer ("["{key}"]")
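The effect of the A-z typo can be reproduced outside flex. The following is a minimal Ruby sketch, not flex code: it assumes the flex definitions translate directly into Ruby regex syntax, and the constant names are made up for illustration.

```ruby
# Characters that sit between "Z" and "a" in ASCII; the range A-z includes all of them.
EXTRA_CHARS = (("Z".ord + 1)..("a".ord - 1)).map(&:chr)
p EXTRA_CHARS

INPUT = "[keyer1] [keyer2] [keyer 3]"

# Hypothetical Ruby translations of the flex "keyer" definition.
BUGGY_KEYER = /\[[a-zA-Z][a-zA-z0-9~_'?$. \-]+\]/ # A-z typo kept on purpose
FIXED_KEYER = /\[[a-zA-Z][a-zA-Z0-9~_'?$. \-]+\]/

p INPUT.scan(BUGGY_KEYER) # one match spanning the whole input
p INPUT.scan(FIXED_KEYER) # three separate matches
```

With the typo, "[", "]" and the space all count as key characters, so the longest possible match runs from the first "[" to the last "]".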

Related

How to replace _A_&_B_ using gsub in R

I am trying to join two columns containing company names from two distinct data tables in R. In one column I have the pattern _A_&_B_, where A and B can be any letters. I would like to get rid of those two letters, i.e. any single letter surrounded by underscores.
So if I have John_K_&_E_Scott, I would like to get John__&__Scott, since I can remove the punctuation afterwards. I have tried the following:
names[, JOINING_ID := gsub("[A-Za-z]_&_[A-Za-z]\\w", "", JOINING_ID)]
But this transforms John_A_&_ BOYS_ into John__&_ OYS_, which is not what I want.
Use the following regex pattern:
_[[:alpha:]]_&_[[:alpha:]]_
and replace with __&__. It won't match strings like John_A_&_BOYS_, so you won't run into the issue you are having.
Note that [[:alpha:]] matches any letter.
R usage:
gsub("_[[:alpha:]]_&_[[:alpha:]]_", "__&__", JOINING_ID)
Or, if you only expect 1 match per string, use sub:
sub("_[[:alpha:]]_&_[[:alpha:]]_", "__&__", JOINING_ID)

How does pack work in Ruby?

I am a tad confused about what I see here:
a = [ "a", "b", "c" ]
n = [ 65, 66, 67 ]
a.pack("A3A3A3") #=> "a  b  c  "
a.pack("a3a3a3") #=> "a\000\000b\000\000c\000\000"
n.pack("ccc") #=> "ABC"
From the docs:
Packs the contents of arr into a binary sequence according to the directives in aTemplateString (see the table below). Directives "A", "a", and "Z" may be followed by a count, which gives the width of the resulting field.
Here are the directives:
So we're using the A directive 3 times it seems? What does it mean to pack the string a into an arbitrary binary string (space padded, count is width?) Can you help me understand the output? Why are there so many 0s?
In the first case, you're printing "a" but padding its length to 3 with spaces, hence the two spaces to get the total length to 3.
In the second case, you're doing the same but padding with null bytes instead (ASCII value 0). Null bytes in Ruby are printed (and can be read) using the escape syntax \000 (this is one character), so \000\000 is actually just two null bytes.
The variable n is irrelevant, so you can ignore it.
In the pack statements, the bytes "a", "b" and "c" are concatenated ("packed") into a single string, with padding between them. The padding is such that the number of bytes (the width) taken up by the contents plus the padding equals the number provided.
So in the first pack statement, the "a" is padded with two spaces to make these three bytes: "a.." where I've put a . in place of the spaces to make it clear. That is concatenated with the "b" and the "c" similarly padded, to produce "a..b..c..".
In the second pack statement, null characters ('\000') are used instead of spaces. The \xxx notation (called an "escape sequence") means the byte with octal value xxx. It's used when there is no printable ASCII character (like 'a' or ' ') to show. A null character has no printable representation, so the \xxx notation is used instead.
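To make the padding visible, it can help to look at the raw byte values rather than the printed strings. A small sketch, using the same array as the question (the constant names are only for illustration):

```ruby
LETTERS = ["a", "b", "c"].freeze

SPACE_PADDED = LETTERS.pack("A3A3A3") # each field padded to 3 bytes with spaces
NULL_PADDED  = LETTERS.pack("a3a3a3") # each field padded to 3 bytes with null bytes

# In the byte arrays, 97/98/99 are "a"/"b"/"c", 32 is a space, and 0 is a null byte.
p SPACE_PADDED.bytes
p NULL_PADDED.bytes

# "\000" is a single byte written as an octal escape, not four characters.
p "\000".bytesize
```

Both results are nine bytes long: three fields of width 3, differing only in the padding byte.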

When using gsub with [a-zA-Z] in ruby

I've seen this [a-zA-Z] for the gsub method:
string.gsub(/[a-zA-Z]/,"-")
where it will find any lowercase letters a-z or uppercase letters A-Z.
My question is: why does a-z work back to back with A-Z, as in a-zA-Z?
Where might I find more info on using [a-zA-Z] in ruby?
Inside a character class (the [] inside the regex), you can list all the characters you want:
/[abcdefg]/
To save space, you can define a range with a hyphen (-) and a character on each side of it:
/[a-g]/
Since it's clear that this range runs from a to g, you can write another character directly after it:
/[a-gm]/
You can also define another range:
/[a-gm-z]/
From the documentation:
A range can be followed by another range, so [abcdwxyz] is equivalent to [a-dw-z]
Note that for your example, you could also use a case-insensitive regex:
string.gsub(/[a-z]/i,"-")
Finally, you can use ranges with Unicode characters:
arrows = /[\u2190-\u21FF]/
"a⇸b⇙c↺d↣e↝f".scan(arrows)
# => ["⇸", "⇙", "↺", "↣", "↝"]
I frequently use http://rubular.com/ as a reference
[a-zA-Z] Any single character in the range a-z or A-Z
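The equivalences described in the answers above can be checked directly. A short sketch, where the sample string and constant names are invented for illustration:

```ruby
SAMPLE = "Ruby 3 rocks!"

# Two back-to-back ranges versus one range with the case-insensitive flag.
EXPLICIT_RANGES  = SAMPLE.gsub(/[a-zA-Z]/, "-")
CASE_INSENSITIVE = SAMPLE.gsub(/[a-z]/i, "-")
p EXPLICIT_RANGES
p CASE_INSENSITIVE

# Listing every character versus writing the same set as two ranges.
ALPHABET = ("a".."z").to_a.join
LISTED = ALPHABET.scan(/[abcdefgmnopqrstuvwxyz]/)
RANGED = ALPHABET.scan(/[a-gm-z]/)
p LISTED == RANGED
```

Digits, spaces and punctuation are left alone in both gsub calls because they fall outside the ranges.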

What is an escape character in Ruby?

I would like to split lines that contain a [ (an opening square bracket). However, when I type the pattern as /[/, it is treated as a comment.
You need to escape the [ char like /\[/.
I infer that you're using string.split, which accepts a regex (the part between the / /) describing the delimiter on which to split the string into a list.
Regexes use the [ and ] characters in a special way: they delimit a character class, which matches any one of the characters inside.
[abc] => matches a, b, or c
Since you need to match the [ symbol literally, you have to escape it with a backslash (\).
So, write your split as:
string.split(/\[/)
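A quick way to see the escaped form at work, as a sketch with a made-up sample line. It also shows two alternatives that avoid writing the backslash by hand: a plain string delimiter, and Regexp.escape.

```ruby
LINE = "alpha[beta[gamma"

# Escaped bracket inside a regex literal.
BY_REGEX = LINE.split(/\[/)

# A string delimiter is taken literally, so no escaping is needed.
BY_STRING = LINE.split("[")

# Regexp.escape adds the backslash for us when the delimiter comes from a variable.
BY_ESCAPED = LINE.split(Regexp.new(Regexp.escape("[")))

p BY_REGEX
p BY_STRING
p BY_ESCAPED
```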

Regular expression to match my pattern of words, wild chars

can you help me with this:
I want a regular expression for my Ruby program to match a word with the below pattern
Pattern has
List of letters ( For example. ABCC => 1 A, 1 B, 2 C )
N wildcard characters (N can be 0, 1, or 2)
A fixed word (for example “XY”).
Rules:
Regarding the List of letters, it should match words with
a. 0 or 1 A
b. 0 or 1 B
c. 0 or 1 or 2 C
Based on the value of N, there can be 0 or 1 or 2 wild chars
Fixed word is always in the order it is given.
The combination of all these can be in any order and should match words like below
ABWXY ( if wild char = 1)
BAXY
CXYCB
But not words with 2 A’s or 2 B’s
I am using a pattern like ^[ABCC]*.XY$
But it also matches words with more than 1 A, 1 B, or 2 C's, and it only matches words that end with XY. I want all words that have XY anywhere, with the letters and wildcard characters in any position.
If it HAS to be a regex, the following could be used:
if subject =~
  /^                             # start of string
   (?!(?:[^A]*A){2})             # assert that there are fewer than two As
   (?!(?:[^B]*B){2})             # and fewer than two Bs
   (?!(?:[^C]*C){3})             # and fewer than three Cs
   (?!(?:[ABCXY]*[^ABCXY]){3})   # and fewer than three non-ABCXY characters
   (?=.*XY)                      # and that XY is contained in the string.
  /x
  # Successful match
else
  # Match attempt failed
end
This assumes that none of the characters A, B, C, X, or Y are allowed as wildcards.
I consider myself to be fairly good with regular expressions and I can't think of a way to do what you're asking. Regular expressions look for patterns, and what you seem to want is quite a few different patterns. It might be more appropriate in your case to write a function that splits the string into characters and counts what you have, so you can check your criteria.
Just to give an example of your problem, a regex like /[abc]/ will match every single occurrence of a, b and c regardless of how many times those letters appear in the string. You can try /c{1,2}/ and it will match "c", "cc", and "ccc". It matches the last case because the pattern is not anchored, so it is satisfied by a run of one or two c's inside "ccc".
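The /c{1,2}/ behaviour is easy to check. In this sketch the anchored variant using \A and \z is my own addition for contrast, not something from the question:

```ruby
UNANCHORED = /c{1,2}/
ANCHORED   = /\Ac{1,2}\z/ # assumption: the whole string must consist of one or two c's

p "ccc"[UNANCHORED]        # the part of "ccc" that was actually matched
p "ccc".match?(UNANCHORED) # succeeds, because a substring is enough
p "ccc".match?(ANCHORED)   # fails, because three c's is too many for the whole string
p "cc".match?(ANCHORED)
```

Anchoring limits the count for a single contiguous run, but it still does not help when the letters may be scattered through the word in any order.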
One thing I have found invaluable when developing and debugging regular expressions is rubular.com. Try some examples and I think you'll see what you're up against.
I don't know if this is really any help but it might help you choose a direction.
You need to break out your pattern properly. In regexp terms, [ABCC] means "any one of A, B or C" where the duplicate C is ignored. It's a set operator, not a grouping operator like () is.
What you seem to be describing is creating a regexp based on parameters. You can do this by passing a string to Regexp.new and using the result.
An example is roughly:
def match_for_options(options)
  pattern = '^'
  pattern << 'A' * options[:a] if (options[:a])
  pattern << 'B' * options[:b] if (options[:b])
  pattern << 'C' * options[:c] if (options[:c])
  Regexp.new(pattern)
end
You'd use it something like this:
if (match_for_options(:a => 1, :c => 2).match('ACC'))
  # ...
end
Since you want to allow these "elements" to appear in any order, you might be better off writing a bit of Ruby code that goes through the string from beginning to end, counts the number of As, Bs, and Cs, and checks whether it contains your desired substring. If the number of As, Bs, and Cs is within your desired limits, the string contains the desired substring, and its length (i.e. the number of characters) equals the length of the desired substring, plus the number of As, Bs, and Cs, plus at most N more characters, then the string is good; otherwise it is bad.
Actually, to be careful, you should first search for your desired substring and remove it from the original string, and only then count the As, Bs, and Cs, because otherwise you may unintentionally count As, Bs, and Cs that appear in your desired substring, if there are any.
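The counting approach described above could look roughly like this. It is only a sketch under a few assumptions: the method name and parameters are hypothetical, N is treated as an upper bound on wildcards, and any leftover character that is not one of the listed letters counts as a wildcard.

```ruby
# Hypothetical helper; limits maps each listed letter to its maximum count.
def word_ok?(word, limits, fixed, max_wildcards)
  start = word.index(fixed)
  return false if start.nil? # the fixed word must appear somewhere

  # Remove the fixed word first so its letters are not counted.
  rest = word.dup
  rest[start, fixed.length] = ""

  listed_total = 0
  limits.each do |letter, max|
    seen = rest.each_char.count(letter)
    return false if seen > max # too many of a listed letter
    listed_total += seen
  end

  # Everything left over is treated as a wildcard character.
  (rest.length - listed_total) <= max_wildcards
end

limits = { "A" => 1, "B" => 1, "C" => 2 }
p word_ok?("ABWXY", limits, "XY", 1)
p word_ok?("BAXY", limits, "XY", 0)
p word_ok?("CXYCB", limits, "XY", 0)
p word_ok?("AABXY", limits, "XY", 2)
```

The first three calls use the example words from the question and are accepted; the last one is rejected because it contains two As.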
You can do what you want with a regular expression, but it would be a long ugly regular expression. Why? Because you would need a separate "case" in the regular expression for each of the possible orders of the elements. For example, the regular expression "^ABC..XY$" will match any string beginning with "ABC" and ending with "XY" and having two wild card characters in the middle. But only in that order. If you want a regular expression for all possible orders, you'd need to list all of those orders in the regular expression, e.g. it would begin something like "^(ABC..XY|ACB..XY|BAC..XY|BCA..XY|" and go on from there, with about 5! = 120 different orders for that list of 5 elements, then you'd need more for the cases where there was no A, then more for cases where there was no B, etc. I think a regular expression is the wrong tool for the job here.
