Ruby: %q with strings [case] - ruby

I encountered this line:
at = @seq.slice(@seq.length - 2, 2).count(%q[at])
where @seq is a string. I know how slice and %q work, but I don't get the idea of putting a variable, at (which we define here), as an argument of [] after %q.

It is very verbose code.
@seq.length - 2 gives the index of the second-to-last character in @seq.
@seq.slice(@seq.length - 2, 2) gives the last two characters in @seq.
Applying count(%q[at]) to that returns how many of its characters belong to the set of characters in %q[at] (i.e., "at"), so both "a" and "t" are counted. Since there are only two characters, the result is 0, 1, or 2.
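To see the expression in action, here is a small sketch on made-up sample strings. The method names last_two_count and last_two_count_short are hypothetical; the short form uses a negative start index (counted from the end of the string) and only behaves the same as the original for strings of at least two characters.

```ruby
# Hypothetical helper wrapping the expression from the question
# (there, the string is the instance variable @seq).
def last_two_count(seq)
  seq.slice(seq.length - 2, 2).count(%q[at])
end

# Shorter form: a negative start index counts from the end of the string.
# Assumes seq has at least 2 characters ("a"[-2, 2] is nil).
def last_two_count_short(seq)
  seq[-2, 2].count("at")
end

p last_two_count("gattacat")        # => 2  (last two chars "at": one 'a', one 't')
p last_two_count("gattaca")         # => 1  (last two chars "ca": one 'a')
p last_two_count_short("gattacat")  # => 2
```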

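%q literals behave like single-quoted strings, whether you use a paired delimiter such as [] or {} or a repeated character such as !. In other words, %q[at], %q!at!, and %q{at} are all equivalent to 'at', so the at inside the brackets is just literal text, not a reference to the variable at.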
%q[at]
# => "at"
P.S. %Q works similarly, but behaves like a double-quoted string (escape sequences and #{} interpolation are processed).
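A quick sketch of the difference, using a hypothetical variable target: %q leaves #{...} and backslash escapes alone, while %Q processes them, just as single and double quotes do.

```ruby
target = "at"

single = %q[count #{target}]   # like 'count #{target}': no interpolation
double = %Q[count #{target}]   # like "count #{target}": interpolated

puts single   # prints: count #{target}
puts double   # prints: count at

# Backslash escapes follow the same split: \t stays two characters in %q.
raw    = %q[a\tb]
cooked = %Q[a\tb]
p raw.length      # => 4
p cooked.length   # => 3
```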

Related

How does pack work in Ruby?

I am a tad confused about what I see here:
a = [ "a", "b", "c" ]
n = [ 65, 66, 67 ]
a.pack("A3A3A3") #=> "a  b  c  "
a.pack("a3a3a3") #=> "a\000\000b\000\000c\000\000"
n.pack("ccc") #=> "ABC"
From the docs:
Packs the contents of arr into a binary sequence according to the directives in aTemplateString (see the table below). Directives "A", "a", and "Z" may be followed by a count, which gives the width of the resulting field.
Here are the directives:
So we're using the A directive 3 times, it seems? What does it mean to pack the string "a" into an "arbitrary binary string (space padded, count is width)"? Can you help me understand the output? Why are there so many 0s?
In the first case, you're packing "a" into a field 3 bytes wide and padding it with spaces, hence the two spaces that bring the total length to 3.
In the second case, you're doing the same but padding with null bytes instead (ASCII value 0). Null bytes in Ruby are printed (and can be read) using the escape syntax \000 (this is one character), so \000\000 is actually just two null bytes.
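The variable n is irrelevant to the padding question, so you can ignore it (n.pack("ccc") just packs the integers 65, 66 and 67 as 8-bit characters, which are the ASCII codes for "A", "B" and "C").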
In the pack statements, the strings "a", "b" and "c" are concatenated ("packed") into a single string, each followed by its own padding. The padding is chosen so that the number of bytes (the width) taken up by each item plus its padding equals the count given in the directive.
So in the first pack statement, the "a" is padded with two spaces to make these three bytes: "a.." where I've put a . in place of the spaces to make it clear. That is concatenated with the "b" and the "c" similarly padded, to produce "a..b..c..".
In the second pack statement, null characters ('\000') are used instead of spaces. The \xxx notation (called an "escape sequence") means the byte with octal value xxx. It's used when there isn't a useful ASCII character (like 'a' or ' ') to show. A null character has no printable representation, so the \xxx notation is used instead.
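The padding can be checked by looking at the raw bytes of each result. This sketch replays the arrays from the question; the truncation and unpack lines are extra illustrations of the same directives (a count is a field width, and unpacking with "A" strips trailing spaces and nulls).

```ruby
letters = ["a", "b", "c"]
numbers = [65, 66, 67]

space_padded = letters.pack("A3A3A3")  # each item padded to 3 bytes with spaces
null_padded  = letters.pack("a3a3a3")  # each item padded to 3 bytes with NUL bytes

p space_padded.bytes   # => [97, 32, 32, 98, 32, 32, 99, 32, 32]
p null_padded.bytes    # => [97, 0, 0, 98, 0, 0, 99, 0, 0]
p numbers.pack("ccc")  # => "ABC" (65, 66, 67 as 8-bit integers)

# The count is a field width, so a longer string is truncated to fit.
truncated = ["hello"].pack("A3")
p truncated            # => "hel"

# Unpacking with "A" removes the trailing spaces (and nulls) again.
p space_padded.unpack("A3A3A3")  # => ["a", "b", "c"]
```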

Regex out elements of server name in format of "ubuntu-prod-sfo1-01"

Trying to extract the individual elements of a server name in the format "ubuntu-prod-sfo1-01", which would give me the result ["ubuntu","prod", "sfo1", "01"]. That is, everything between "-" characters, plus the first and last elements, which end and start with a "-" respectively.
"ubuntu-prod-sfo1-01" ==> ["ubuntu","prod", "sfo1", "01"]
My attempts at it have failed: the best solution I could find would give me the first "prod" but fail to get each of the remaining elements. The problem seems to be reusing the '-' between the elements.
You can split a string into substrings at a delimiter, in your case "-".
string = "ubuntu-prod-sfo1-01"
string.split('-')
=> ["ubuntu", "prod", "sfo1", "01"]
From the official documentation:
Divides str into substrings based on a delimiter, returning an array of these substrings.
If pattern is a String, then its contents are used as the delimiter when splitting str. If pattern is a single space, str is split on whitespace, with leading whitespace and runs of contiguous whitespace characters ignored.
You don't need a regex for that, you can just do...
"ubuntu-prod-sfo1-01".split('-')
=> ["ubuntu","prod", "sfo1", "01"]
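Both ideas in a short sketch: split on the literal delimiter, or, if a regex is preferred, scan for runs of characters that are not "-" instead of trying to match the hyphens themselves. The variable names os, env, region and index are only illustrative labels for the four parts.

```ruby
name = "ubuntu-prod-sfo1-01"

parts = name.split("-")        # split on the literal "-" delimiter
p parts                        # => ["ubuntu", "prod", "sfo1", "01"]

# Regex alternative: match the pieces themselves, not the separators.
p name.scan(/[^-]+/)           # => ["ubuntu", "prod", "sfo1", "01"]

# Multiple assignment gives each piece a name (names are illustrative).
os, env, region, index = parts
puts "#{env} in #{region}"     # prints: prod in sfo1
```

One difference worth knowing: for a name like "a--b", split("-") keeps an empty string between the two hyphens, while scan(/[^-]+/) skips it.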

Splitting the content of brackets without separating the brackets ruby

I am currently working on a Ruby program to calculate terms. It works perfectly fine except for one thing: brackets. I need to filter out the content of the brackets, or at least put it into an array, but I have tried for an hour without coming up with a solution. Here is my code:
splitted = term.split(/\(+|\)+/)
I need an array instead of the brackets, for example:
"1-(2+3)" #=>["1", "-", ["2", "+", "3"]]
I already tried this:
/(\((?<=.*)\))/
but it returned:
Invalid pattern in look-behind.
Can someone help me with this?
UPDATE
I forgot to mention that my program already splits the term; I only need the content of the brackets to be an array.
If you need to keep track of the hierarchy of parentheses with arrays, you won't manage it just with regular expressions. You'll need to parse the string word by word, and keep a stack of expressions.
Pseudocode:
    expressions = new stack
    add a new array to the stack
    while word in string:
        if word is "(": add a new array to the stack
        else if word is ")": remove the last array from the stack and add it to the (next) last array of the stack
        else: add the word to the last array of the stack
When exiting the loop, there should be only one array in the stack (if not, you have inconsistent opening/closing parentheses).
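A minimal Ruby sketch of the pseudocode above. It assumes a hypothetical tokenizer (tokenize) that treats integers and single-character operators or parentheses as words; since the asker's program already splits the term, that part would be replaced by the existing splitting code.

```ruby
# Assumed tokenizer: integers and single-character operators/parentheses;
# whitespace is simply skipped because it never matches.
def tokenize(term)
  term.scan(/\d+|[-+*\/()]/)
end

# Build nested arrays using a stack of arrays, as in the pseudocode.
def nest(term)
  stack = [[]]                    # bottom array holds the top-level expression
  tokenize(term).each do |token|
    case token
    when "("
      stack.push([])              # start a new sub-expression
    when ")"
      raise ArgumentError, "unbalanced ')'" if stack.size == 1
      inner = stack.pop
      stack.last << inner         # attach the finished group to its parent
    else
      stack.last << token
    end
  end
  raise ArgumentError, "unbalanced '('" unless stack.size == 1
  stack.first
end

p nest("1-(2+3)")       # => ["1", "-", ["2", "+", "3"]]
p nest("(1+(2*3))-4")   # => [["1", "+", ["2", "*", "3"]], "-", "4"]
```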
Note: if your ultimate goal is to evaluate the expression, you could save time by converting the string to postfix notation (also known as Reverse Polish Notation), which can then be evaluated with a simple stack.
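For reference, one standard way to do that conversion is Dijkstra's shunting-yard algorithm, sketched below. This is not code from either answer; it assumes non-negative integers, the four left-associative binary operators +, -, * and /, and parentheses. Unary minus is not handled, and "/" is Ruby's Integer#/, which rounds toward negative infinity.

```ruby
# Binary operators and their precedence (all treated as left-associative).
PRECEDENCE = { "+" => 1, "-" => 1, "*" => 2, "/" => 2 }.freeze

# Shunting-yard: infix string -> array of tokens in postfix (RPN) order.
def to_rpn(term)
  output = []
  ops = []
  term.scan(/\d+|[-+*\/()]/).each do |tok|
    case tok
    when /\A\d+\z/
      output << tok
    when "("
      ops.push(tok)
    when ")"
      output << ops.pop while ops.any? && ops.last != "("
      raise ArgumentError, "unbalanced ')'" if ops.empty?
      ops.pop # discard the matching "("
    else
      # Pop operators that bind at least as tightly before pushing this one.
      output << ops.pop while ops.any? && ops.last != "(" &&
                              PRECEDENCE[ops.last] >= PRECEDENCE[tok]
      ops.push(tok)
    end
  end
  raise ArgumentError, "unbalanced '('" if ops.include?("(")
  output.concat(ops.reverse)
end

# Evaluate RPN tokens with a stack of integers.
def eval_rpn(tokens)
  stack = []
  tokens.each do |tok|
    if tok.match?(/\A\d+\z/)
      stack.push(tok.to_i)
    else
      right = stack.pop
      left = stack.pop
      stack.push(left.public_send(tok, right)) # e.g. 1.public_send("-", 5)
    end
  end
  stack.pop
end

p to_rpn("1-(2+3)")            # => ["1", "2", "3", "+", "-"]
p eval_rpn(to_rpn("1-(2+3)"))  # => -4
```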
Also consider using off-the-shelf libraries.
A solution depends on the pattern you expect between the parentheses, which you have not specified. (For example, for "(st12uv)" you might want ["st", "12", "uv"], ["st12", "uv"], ["st1", "2uv"] and so on). If, as in your example, it is a natural number followed by a +, followed by another natural number, you could do this:
str = "1-( 2+ 3)"
r = /
  \(\s*   # match a left parenthesis followed by >= 0 whitespace chars
  (\d+)   # match one or more digits in a capture group
  \s*     # match >= 0 whitespace chars
  (\+)    # match a plus sign in a capture group
  \s*     # match >= 0 whitespace chars
  (\d+)   # match one or more digits in a capture group
  \s*     # match >= 0 whitespace chars
  \)      # match a right parenthesis
/x
str.scan(r).first
=> ["2", "+", "3"]
Suppose instead + could be +, -, * or /. Then you could change:
(\+)
to:
([-+*\/])
Note that, in a character class, + needn't be escaped and - needn't be escaped if it is the first or last character of the class (as in those cases it would not signify a range).
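Incidentally, you received the error message "Invalid pattern in look-behind" because Ruby's regex engine does not allow variable-length patterns (e.g., .*) inside a lookbehind; lookaheads have no such restriction. For a positive lookbehind, you can usually get around that by matching the prefix normally and then using \K, which drops everything matched so far from the reported match. For example,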
r = /
  \d+     # match one or more digits
  \K      # forget everything previously matched
  [a-z]+  # match one or more lowercase letters
/x
"123abc"[r] #=> "abc"

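Applied to the question's own input, \K lets you use a variable-length prefix such as \(\s* (which would be rejected inside a lookbehind) while still reporting only the part after it. The replacement value "7" in the sub call is arbitrary; only the text after \K is replaced.

```ruby
str = "1-( 2+ 3)"

# First number after "(" plus any amount of whitespace. The prefix is
# variable-length, but \K keeps it out of the reported match.
p str[/\(\s*\K\d+/]            # => "2"

# sub/gsub only replace the part of the match that comes after \K.
p str.sub(/\(\s*\K\d+/, "7")   # => "1-( 7+ 3)"
```
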
Complex requirements for string split around select commas

TL;DR
I need some help making a regex that will match any commas in a string that are side by side with unlimited white space around them and between them. The commas and their surrounding white space cannot be within matching single quotes or double quotes. I then need to capture the non-whitespace values from around those commas and count how many of those commas there are.
The values captured from around the commas will become their own values in the final array, while the commas that were counted will become nil values that are added to the final array.
Explanation of the problem:
This is a pretty complex problem so any help is greatly appreciated. I'm adding functionality to a library I've been using for a while now. I have this string that contains an array:
"['d,og,f:asdf,:hello,",,\",,alsee',,,'ho,la', "-123,4,5.3", true, :good, false,,, "gr\'\'\'true,\',\'ee\"n", ":::testme", true]"
I would like to split this string only around select commas so that I have an array containing the following values
'd,og,f:asdf,:hello,",,\",,alsee'
nil
nil
'ho,la'
"-123,4,5.3"
true
:good
false
nil
nil
"gr\'\'\'true,\',\'ee\"n"
":::testme"
true
The nil values come from the adjacent commas that are not contained in any string. I wrote the following regex to split the string above (I had already removed the start and end brackets):
/(?<=(?:['\"]|false|true|^|,)),(?=(?:\s*(?:(?::[\w]+)|(?:(?::?(?:\"[\s\S]*\")|(?:'[\s\S]*'))|(?:false|true)))\s*(?:,|$)))/
This splits the string so I get these values:
(0) "'d,og,f:asdf,:hello,",,\",,alsee',,"
(1) "'ho,la'"
(2) " "-123,4,5.3""
(3) " true"
(4) " :good, false,,"
(5) " "gr\'\'\'true,\',\'ee\"n""
(6) " ":::testme""
(7) " true"
All the values are strings as can be seen by their surrounding double quotes. They will not all end up that way though. A true or false will be converted to a boolean. The values surrounded by internal quotes will end up as strings. Then a value preceded with a : will end up as a symbol.
There are problems with the values at index 0 and 4. Index 0 should be this:
(0.0) "'d,og,f:asdf,:hello,",,\",,alsee'"
(0.1) nil
(0.2) nil
As you can see, the two commas at the end are gone. They have become the two nil values you see above. Then the string starts at the first single quote and ends at the last single quote, signifying that this value in the array is a string.
Then index 4 (" :good, false,,") should be this:
(4.0) " :good"
(4.1) " false"
(4.2) nil
(4.3) nil
The two commas at the end have become nil. Then " false" is its own value, which will later be converted to a boolean, while " :good" is also its own value and will later be converted to a symbol.
To fix the problem with index 4 I have all the values run through a second regex. Here it is:
/^(\s*:(?:(?:[\w]+|\"[\s\S]+\"|'[\s\S]+')\s*)),([\s\S]*)$/
Instead of splitting this one I get the capture groups. It ends up returning this array for the value at index 4:
(4.0) " :good"
(4.1) " false,,"
That's what I wanted except for one problem. The value at index 4.1 (" false,,") has the two trailing commas which should be nil values in the array.
"['d,og,f:asdf,:hello,"
,,\
",,alsee',,,'ho,la', "
-123,4,5.3
", true, :good, false,,, "
gr\
'\'
I count 4 strings: 3 in double quotes and the last one in single quotes?
You say this is broken down into smaller strings by your regex, but what about the characters outside the 4 strings?
Sorry, it looks a bit of a mess.
Try putting it all in a heredoc string and then breaking it down with a regex.
I finally figured it out myself. You can see how it fits in with the rest if you look at the description of the question above.
/^(([\s]*,)*)[\s]*((?::[\w]+)|(?::?(?:\"[\s\S]*\")|(?:'[\s\S]*')|false|true))?(([\s]*,)*)$/
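As a sketch of how the captures of that final regex could be turned into values and nils: group 1 holds the commas before a value, group 3 the value itself, and group 4 the commas after it. The helper expand_chunk is hypothetical and not taken from the asker's library; it returns a chunk unchanged if the regex does not match.

```ruby
# The asker's final regex, unchanged.
CHUNK = /^(([\s]*,)*)[\s]*((?::[\w]+)|(?::?(?:\"[\s\S]*\")|(?:'[\s\S]*')|false|true))?(([\s]*,)*)$/

# Hypothetical helper: one chunk -> its value plus one nil per stray comma
# before (group 1) or after (group 4) the value (group 3).
def expand_chunk(chunk)
  m = CHUNK.match(chunk)
  return [chunk] unless m                # leave unrecognised chunks untouched

  leading  = Array.new(m[1].count(","))  # Array.new(n) gives n nils
  trailing = Array.new(m[4].count(","))
  value    = m[3] ? [m[3]] : []
  leading + value + trailing
end

p expand_chunk(" false,,")   # => ["false", nil, nil]
p expand_chunk(",, true")    # => [nil, nil, "true"]
```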

Splitting with empty space in Ruby [duplicate]

This question already has an answer here:
How do I avoid trailing empty items being removed when splitting strings?
Closed 8 years ago.
In both Ruby and JavaScript I can write the expression " x ".split(/[ ]+/). In JavaScript I get the somewhat reasonable result ["", "x", ""], but in Ruby (2.0.0) I get ["", "x"], which I find quite counterintuitive. I have trouble understanding how regular expressions work in Ruby. Why don't I get the same result as in JavaScript, or just ["x"]?
From the String#split documentation:
split(pattern=$;, [limit])
If pattern is a String, then its contents are used as the delimiter when splitting str. If pattern is a single space, str is split on whitespace, with leading whitespace and runs of contiguous whitespace characters ignored.
If pattern is a Regexp, str is divided where the pattern matches. Whenever the pattern matches a zero-length string, str is split into individual characters. If pattern contains groups, the respective matches will be returned in the array as well.
If pattern is omitted, the value of $; is used. If $; is nil (which is the default), str is split on whitespace as if ' ' were specified.
If the limit parameter is omitted, trailing null fields are suppressed. If limit is a positive number, at most that number of fields will be returned (if limit is 1, the entire string is returned as the only entry in an array). If negative, there is no limit to the number of fields returned, and trailing null fields are not suppressed.
So if you were to use " x ".split(/[ ]+/, -1), you would get your expected result of ["", "x", ""].
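A short sketch comparing the calls, including the awk-style whitespace split that produces the ["x"] result the question also asks about. The comma-delimited string is a made-up example showing that string delimiters follow the same trailing-field rule.

```ruby
s = " x "

p s.split(/[ ]+/)      # => ["", "x"]      trailing empty field suppressed
p s.split(/[ ]+/, -1)  # => ["", "x", ""]  negative limit keeps it
p s.split(" ")         # => ["x"]          single space: awk-style split
p s.split              # => ["x"]          no pattern and $; is nil: same as " "

# The same rule applies to string delimiters.
p "a,b,,".split(",")      # => ["a", "b"]
p "a,b,,".split(",", -1)  # => ["a", "b", "", ""]
```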
I found this in the C code for String#split, almost right at the end:
if (NIL_P(limit) && lim == 0) {
    long len;
    while ((len = RARRAY_LEN(result)) > 0 &&
           (tmp = RARRAY_AREF(result, len-1), RSTRING_LEN(tmp) == 0))
        rb_ary_pop(result);
}
So it actually pops empty strings off the end of the result array before returning! It looks like the creators of Ruby didn't want String#split to return a bunch of empty strings.
Notice the check for NIL_P(limit); this matches exactly what the documentation says, as @dax pointed out.
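In Ruby terms, that C loop behaves roughly like the hypothetical helper below, which splits with a limit of -1 (keeping every field) and then pops trailing empty strings. It is an illustration of the rule, not Ruby's actual implementation.

```ruby
# Hypothetical helper: keep every field (limit -1), then drop trailing
# empty strings one by one, like the rb_ary_pop loop in the C code.
def split_dropping_trailing_empties(str, pattern)
  fields = str.split(pattern, -1)
  fields.pop while !fields.empty? && fields.last.empty?
  fields
end

p split_dropping_trailing_empties(" x ", /[ ]+/)  # => ["", "x"]
p split_dropping_trailing_empties("a,b,,", ",")   # => ["a", "b"]
```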
