Ruby global variable dollar sign semicolon ($;) regex equivalent - ruby

I have a string "\nbed.bed_id,\nbed.bed_label,\nbed.room_id,\nbed.pool_bed, nbed.record_state\n" and I need to split it on whitespace and commas.
I tried split(/,?\s+/), which works but also leaves a "" at the beginning.
Using split($;) doesn't. What I'm looking for is something like split(/,?$;/). Is there a way to retain the default functionality and just add to it?
(P.S. I know I can do split(/,?\s+/)[1..-1]; there are so many ways to do things in Ruby.)
Update:
My issue was with $;. I wasn't really sure what it was and thought it had special meaning, because as a variable irb> $; #=> nil. It may be that I missed it, or that the documentation has been updated, but ruby-doc.org says "If pattern is omitted, the value of $; is used. If $; is nil (which is the default), str is split on whitespace as if ' ' were specified."
Also, $; comes from Perl or awk, where it is known as SUBSEP. A further explanation of why the beginning is stripped away with $; is in "Why is split(' ') trying to be (too) smart?"

You can't stop split from returning empty elements in this case.
Try rejecting empty strings from the array:
string.split(/,?\s+/).reject(&:empty?)

You can do it with split and then drop the first element:
str = "\nbed.bed_id,\nbed.bed_label,\nbed.room_id,\nbed.pool_bed, nbed.record_state\n"
st = str.split(/,?\s+/)
st.shift
st
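If the goal is to avoid the leading empty string in the first place rather than remove it afterwards, a few alternatives are sketched below. None of them comes from the answers above; the variable names are made up for illustration. The third one is probably closest to "keep the default behaviour and add to it".

```ruby
str = "\nbed.bed_id,\nbed.bed_label,\nbed.room_id,\nbed.pool_bed, nbed.record_state\n"

# Option 1: trim the string first, so the separator regex can never
# match at position 0 and no leading "" is produced.
via_strip = str.strip.split(/,?\s+/)

# Option 2: collect the tokens instead of splitting on the separators.
# A token is a run of characters that are neither commas nor whitespace.
via_scan = str.scan(/[^,\s]+/)

# Option 3: keep the default whitespace split (the $; == nil behaviour)
# and remove a trailing comma from each piece afterwards.
via_default = str.split.map { |field| field.chomp(",") }

p via_strip
# ["bed.bed_id", "bed.bed_label", "bed.room_id", "bed.pool_bed", "nbed.record_state"]
p via_strip == via_scan && via_scan == via_default # true
```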

Related

What is this regex replacing?

I have this line in a Ruby file loading program:
row_hash.map{|k,v| v.gsub!(/\A"|"\Z/, '').try(:strip!) if !v.nil? }
I remember adding it, though the reason escapes me. I know that \A and \Z are the start and end of a string, respectively.
I've written regexes intermittently for 15 years, but the "|" is what's really mystifying me.
It strips quotes from strings.
This regex suffers from leaning toothpick syndrome. We can ease that by using %r, balanced delimiters, and extended formatting to ignore whitespace.
%r{ \A" | "\Z }x
It matches a quote at the beginning of the string, or one at the end (or just before a trailing newline at the end).
So looking at it all together...
v.gsub!( %r{ \A" | "\Z }x, '' ).try(:strip) if !v.nil?
The gsub! will apply the match until it doesn't match anymore. So it will match quotes at the beginning and end of v and replace them with nothing, all in place to v. The end result is v is stripped of beginning and ending quotes.
Then there's the blah.try(:strip). That's a Rails extension which is roughly equivalent to...
blah.strip if blah
Since gsub! will return nil if the match fails, that means it will strip v only if it was in quotes. It will do it after the quotes have been stripped and it will only do it if there were quotes. I suspect this is not the intended behavior.
However, strip doesn't alter v in place so probably does nothing unless you're using the return value of map which would make this even more complicated. You probably want try(:strip!).
Finally if !v.nil? means all that will only happen if v wasn't nil. Putting it at the end of an already complicated statement makes things even harder to understand.
This is a bit over-complicated as one line. It would be better if the nil check were done separately and the whole thing properly spaced out. I've also decided to use an if condition instead of try, to make it more obvious that the stripping only happens if the gsub matches. I don't think that's the desired behavior, and I want it to be really obvious to anyone reading it.
row_hash.map { |_,v|
  next if v.nil?
  if v.gsub!( %r{ \A" | "\Z }x, '' )
    v.strip!
  end
}
Finally, since the behavior is really specific and finicky (and probably subtly wrong) the inner portion should be turned into a method so it can be named, documented and tested.
row_hash.map { |_,v| v.strip_quotes! }
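One possible shape for such a method is sketched below. The name strip_quotes! comes from the answer, but the body is an assumption: it is written here as a plain helper that takes the string as an argument rather than as a method on String, and it deliberately keeps the original behaviour of stripping whitespace only when a quote was actually removed. The sample row_hash is made up.

```ruby
# Hypothetical helper (not from the original code base).
# Removes a leading and/or trailing double quote in place, then strips
# surrounding whitespace ONLY if a quote was removed, mirroring the
# behaviour of the original one-liner. Returns the (possibly nil) value.
def strip_quotes!(value)
  return value if value.nil?

  value.strip! if value.gsub!(/\A"|"\Z/, '')
  value
end

# Made-up sample data; dup gives mutable copies of the literals.
row_hash = {
  name: '" Alice "'.dup,
  city: ' Paris '.dup,
  note: nil
}

row_hash.each_value { |v| strip_quotes!(v) }

p row_hash[:name] # "Alice"
p row_hash[:city] # " Paris "  (no quotes, so no stripping either)
p row_hash[:note] # nil
```

Naming the behaviour this way makes the "only strips when quoted" quirk easy to test, and easy to change if it turns out to be unintended.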
It replaces the quote character at the start and end of the string. It ignores other occurrences of the character. Here's a sample of how the regex works.
http://rubular.com/r/pVMbQ9aqSl
"|" does not mean that the pipe is quoted. It basically matches \A" (start of the string followed by " ) or "\Z ( " followed by end of the string)
Let me know if this helps.
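To see the alternation at work, here is a small sketch with made-up sample strings. It also shows the difference between \Z, which tolerates one trailing newline, and \z, which matches only at the very end of the string.

```ruby
edge_quotes = /\A"|"\Z/

plain   = '"hello"'.gsub(edge_quotes, '')       # both edge quotes removed
inner   = 'say "hi" now'.gsub(edge_quotes, '')  # quotes in the middle are ignored
newline = "\"line\"\n".gsub(edge_quotes, '')    # \Z matches before the final newline
strict  = "\"line\"\n".gsub(/\A"|"\z/, '')      # \z does not, so the last quote stays

p plain   # "hello"
p inner   # "say \"hi\" now"
p newline # "line\n"
p strict  # "line\"\n"
```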

ruby remove variable length string from regular expression leaving hyphen

I have a string such as this: "im# -33.870816,151.203654"
I want to extract the two numbers including the hyphen.
I tried this:
mystring = "im# -33.870816,151.203654"
/\D*(\-*\d+\.\d+),(\-*\d+\.\d+)/.match(mystring)
This gives me:
33.870816,151.203654
How do I get the hyphen?
I need to do this in Ruby.
Edit: I should clarify that the "im# " was just an example; there can be any set of characters before the numbers. The numbers are mostly well formed, with the comma between them. I was having trouble with the hyphen (-).
Edit 2: Note that the two numbers are latitude and longitude. That pattern is mostly fixed. However, in theory, the preceding string can be arbitrary. I don't expect it to contain numbers or hyphens, but you never know.
How about this?
arr = "im# -33.2222,151.200".split(/[, ]/)[1..-1]
and arr is ["-33.2222", "151.200"] (using the split method).
Now arr[0].to_f is -33.2222 and arr[1].to_f is 151.2.
EDIT: stripped the "im#" part with [1..-1], as suggested in the comments.
EDIT 2: this also works regardless of what the leading characters are, as long as they contain no spaces or commas.
If you want to capture the two numbers with the hyphen you can use this regex:
> str = "im# -33.870816,151.203654"
> str.match(/([\d.,-]+)/).captures
=> ["-33.870816,151.203654"]
Edit: now it captures the hyphen.
This one captures each number separately: http://rubular.com/r/NNP2OTEdiL
Note: String#scan returns all occurrences of the given pattern; in this case:
> str.scan(/(-?\d+\.\d+)/)
=> [["-33.870816"], ["151.203654"]] # Good, but flattened version is better
> str.scan(/(-?\d+\.\d+)/).flatten
=> ["-33.870816", "151.203654"]
I recommend playing around a little with Rubular. There are also some docs about regular expressions in Ruby:
http://www.ruby-doc.org/docs/ProgrammingRuby/html/language.html#UJ
http://www.regular-expressions.info/ruby.html
http://www.ruby-doc.org/core-1.9.3/Regexp.html
Your regex doesn't work because the hyphen is caught by \D, so you have to modify it to catch only the right set of characters.
[^0-9-]* would be a good option.
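Applied to the regex from the question, that suggestion might look like the sketch below. Two things here are my own choices rather than part of the answer: the optional sign is written as -? instead of \-*, and the captures are converted to floats at the end.

```ruby
mystring = "im# -33.870816,151.203654"

# [^0-9-]* skips the prefix but stops at the first digit or hyphen,
# so an optional leading "-" is left over for the capture groups.
match = /[^0-9-]*(-?\d+\.\d+),(-?\d+\.\d+)/.match(mystring)

latitude, longitude = match.captures.map(&:to_f)

p match.captures # ["-33.870816", "151.203654"]
p latitude       # -33.870816
p longitude      # 151.203654
```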

Variable Declaration Regex

I'm trying to make a simple Ruby regex to detect a JavaScript variable declaration, but it fails.
Regex:
lines.each do |line|
  unminifiedvar = /var [0-9a-zA-Z] = [0-9];/.match(line)
  next if unminifiedvar == nil #no variable declarations on the line
  #...
end
Testing Line:
var testvariable10 = 9;
A variable name can have more than one character, so you need a + after the character-set [...]. (Also, JS variable names can contain other characters besides alphanumerics.) A numeric literal can have more than one character, so you want a + on the RHS too.
More importantly, though, there are lots of other bits of flexibility that you'll find more painful to process with a regular expression. For instance, consider var x = 1+2+3; or var myString = "foo bar baz";. A variable declaration may span several lines. It need not end with a semicolon. It may have comments in the middle of it. And so on. Regular expressions are not really the right tool for this job.
Of course, it may happen that you're parsing code from a particular source with a very special structure and can guarantee that every declaration has the particular form you're looking for. In that case, go ahead, but if there's any danger that the nature of the code you're processing might change then you're going to be facing a painful problem that really isn't designed to be solved with regular expressions.
[EDITED about a day after writing, to fix a mistake kindly pointed out by "the Tin Man".]
You forgot the +, as in, more than one character for the variable name.
var [0-9a-zA-Z]+ = [0-9];
You may also want to add a + after the [0-9]. That way it can match multiple digits.
var [0-9a-zA-Z]+ = [0-9]+;
http://rubular.com/r/kPlNcGRaHA
Try /var [0-9a-zA-Z]+ = \d+;/
Without the +, [0-9a-zA-Z] will only match a single alphanumeric character. With +, it can match 1 or more alphanumeric characters.
By the way, to make it more robust, you may want to make it match any number of spaces between the tokens, not just exactly one space each. You may also want to make the semicolon at the end optional (because JavaScript syntax doesn't require a semicolon). You might also want to make it always match against the whole line, not just a part of the line. That would be:
/\Avar\s+[0-9a-zA-Z]+\s*=\s*\d+;?\Z/
(There is a way to write [0-9a-zA-Z] more concisely, but it has slipped my memory; if someone else knows, feel free to edit this answer.)
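As a rough check of the more robust pattern, the sketch below runs it over a few made-up lines. The capture groups around the name and the value are an addition for illustration and are not part of the regex in the answer. On the shorthand question: [[:alnum:]] is close, but in Ruby it also accepts non-ASCII letters, so it is not an exact replacement for [0-9a-zA-Z].

```ruby
# Same pattern as above, with capture groups added (an assumption for
# this example) so the name and the numeric value can be pulled out.
DECLARATION = /\Avar\s+([0-9a-zA-Z]+)\s*=\s*(\d+);?\Z/

lines = [
  "var testvariable10 = 9;",
  "var x=42",
  "x = 1;",
  'var s = "foo";'
]

declarations = lines.filter_map do |line|
  match = DECLARATION.match(line)
  next unless match # no simple numeric declaration on this line

  [match[1], match[2].to_i]
end

p declarations # [["testvariable10", 9], ["x", 42]]
```

The last two lines are skipped, which illustrates the earlier point that anything beyond the simplest declarations quickly falls outside what a regex like this handles.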

Replacing partial regex matches in place with Ruby

I want to transform the following text
This is a ![foto](foto.jpeg), here is another ![foto](foto.png)
into
This is a ![foto](/folder1/foto.jpeg), here is another ![foto](/folder2/foto.png)
In other words I want to find all the image paths that are enclosed between brackets (the text is in Markdown syntax) and replace them with other paths. The string containing the new path is returned by a separate real_path function.
I would like to do this using String#gsub in its block version. Currently my code looks like this:
re = /!\[.*?\]\((.*?)\)/
rel_content = content.gsub(re) do |path|
real_path(path)
end
The problem with this regex is that it will match ![foto](foto.jpeg) instead of just foto.jpeg. I also tried other regexen like (?>\!\[.*?\]\()(.*?)(?>\)) but to no avail.
My current workaround is to split the path and reassemble it later.
Is there a Ruby regex that matches only the path inside the brackets and not all the contextual required characters?
Post-answers update: The main problem here is that Ruby's regexen have no way to specify zero-width lookbehinds. The most generic solution is to group the part of the regexp before the real matching part and the part after it, i.e. /(pre)(matching-part)(post)/, and reconstruct the full string afterwards.
In this case the solution would be
re = /(!\[.*?\]\()(.*?)(\))/
rel_content = content.gsub(re) do
$1 + real_path($2) + $3
end
A quick solution (adjust as necessary):
s = 'This is a ![foto](foto.jpeg)'
s.sub!(/(!\[.*?\])\((.*?)\)/, '\1(/folder1/\2)' )
p s # "This is a ![foto](/folder1/foto.jpeg)"
You can always do it in two steps - first extract the whole image expression out and then second replace the link:
str = "This is a ![foto](foto.jpeg), here is another ![foto](foto.png)"
str.gsub(/\!\[[^\]]*\]\(([^)]*)\)/) do |image|
  image.gsub(/(?<=\()(.*)(?=\))/) do |link|
    "/a/new/path/" + link
  end
end
#=> "This is a ![foto](/a/new/path/foto.jpeg), here is another ![foto](/a/new/path/foto.png)"
I changed the first regex a bit, but you can use the same one you had before in its place. image is the image expression like ![foto](foto.jpeg), and link is just the path like foto.jpeg.
[EDIT] Clarification: Ruby does have lookbehinds (and they are used in my answer):
You can create lookbehinds with (?<=regex) for positive and (?<!regex) for negative, where regex is an arbitrary regular expression subject to the following condition: expressions in lookbehinds have to be fixed width due to limitations of the regex implementation, which means they can't include expressions with an unknown number of repetitions or alternations with different-width choices. If you try to do that, you'll get an error. (The restriction doesn't apply to lookaheads, though.)
In your case, the [foto] part has a variable width (foto can be any string) so it can't go into a lookbehind due to the above. However, lookbehind is exactly what we need since it's a zero-width match, and we take advantage of that in the second regex which only needs to worry about (fixed-length) compulsory open parentheses.
Obviously you can put real_path in from here, but I just wanted a test-able example.
I think that this approach is more flexible and more readable than reconstructing the string through the match group variables
In your block, use $1 to access the first capture group ($2 for the second and so on).
From the documentation:
In the block form, the current match string is passed in as a parameter, and variables such as $1, $2, $`, $&, and $' will be set appropriately. The value returned by the block will be substituted for the match on each call.
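Here is a small sketch of that block form applied to the Markdown example from the question. real_path below is a hypothetical stand-in, since the question does not show the real function; it simply picks a folder based on the file extension so that the output matches the desired text.

```ruby
# Hypothetical stand-in for the asker's real_path function.
def real_path(path)
  folder = path.end_with?('.png') ? 'folder2' : 'folder1'
  "/#{folder}/#{path}"
end

content = 'This is a ![foto](foto.jpeg), here is another ![foto](foto.png)'

# Group 1: everything up to and including "(", group 2: the path,
# group 3: the closing ")". Only group 2 is rewritten.
re = /(!\[.*?\]\()(.*?)(\))/

rel_content = content.gsub(re) { "#{$1}#{real_path($2)}#{$3}" }

puts rel_content
# This is a ![foto](/folder1/foto.jpeg), here is another ![foto](/folder2/foto.png)
```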
As a side note, some people consider '\1' unsuitable when the matched text has no fixed length. For example, if you want to match and modify content in the middle of a string, how do you preserve the characters on either side?
It's easy: put capture groups around the parts you want to keep as well.
For example, say I want to turn a-ruby-programming-book-531070.png into a-ruby-programming-book.png, that is, remove everything between the last "-" and the last ".".
I can use /.*(-.*?)\./ to match -531070. Now how should I replace it? Note that
the rest of the name has no definite format.
The answer is to capture the surrounding text too, and put it back in the replacement:
"a-ruby-programming-book-531070.png".sub(/(.*)(-.*?)\./, '\1.')
# => "a-ruby-programming-book.png"
If you want to add something before the matched content, you can use:
"a-ruby-programming-book-531070.png".sub(/(.*)(-.*?)\./, '\1-2019\2.')
# => "a-ruby-programming-book-2019-531070.png"

How can I remove the string "\n" from within a Ruby string?

I have this string:
"some text\nandsomemore"
I need to remove the "\n" from it. I've tried
"some text\nandsomemore".gsub('\n','')
but it doesn't work. How do I do it? Thanks for reading.
You need to use "\n" not '\n' in your gsub. The different quote marks behave differently.
Double quotes " allow character expansion and expression interpolation, i.e. they let you use escaped control characters like \n to represent their true value (in this case, a newline) and allow the use of #{expression}, so you can weave variables and pretty much any Ruby expression you like into the text.
While on the other hand, single quotes ' treat the string literally, so there's no expansion, replacement, interpolation or what have you.
In this particular case, it's better to use either the .delete or .tr String method to delete the newlines.
See here for more info
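The difference is easy to check directly. The sketch below uses the string from the question; the variable names are made up for illustration.

```ruby
text = "some text\nandsomemore"

single_quoted = '\n' # two characters: a backslash and the letter n
double_quoted = "\n" # one character: a newline

unchanged = text.gsub(single_quoted, '') # the text contains no backslash, so nothing matches
cleaned   = text.gsub(double_quoted, '') # the newline is found and removed

puts single_quoted.length # 2
puts double_quoted.length # 1
puts unchanged == text    # true
puts cleaned              # some textandsomemore
```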
If you want or don't mind having all the leading and trailing whitespace from your string removed you can use the strip method.
" hello ".strip #=> "hello"
"\tgoodbye\r\n".strip #=> "goodbye"
as mentioned here.
Edit: The original title for this question was different. My answer is for the original question.
When you want to remove a string, rather than replace it you can use String#delete (or its mutator equivalent String#delete!), e.g.:
x = "foo\nfoo"
x.delete!("\n")
x now equals "foofoo"
In this specific case String#delete is more readable than gsub since you are not actually replacing the string with anything.
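For comparison, here is a small sketch of how delete, tr and strip treat the same input. The sample string is made up and has extra leading spaces and a trailing newline so that the differences show.

```ruby
text = "  some text\nandsomemore\n"

deleted    = text.delete("\n") # removes every newline, wherever it is
translated = text.tr("\n", "") # same effect: tr with an empty replacement deletes
stripped   = text.strip        # trims both ends, keeps the newline in the middle

p deleted    # "  some textandsomemore"
p translated # "  some textandsomemore"
p stripped   # "some text\nandsomemore"
```

So strip only helps when the unwanted newline is at one end of the string, while delete and tr remove it anywhere.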
You don't need a regex for this. Use tr:
"some text\nandsomemore".tr("\n","")
If the newline is at the end of the string, you can also use chomp or strip (neither touches a newline in the middle):
"abcd\n".chomp #=> "abcd"
"abcd\n".strip #=> "abcd"

Resources