Chaining array into new split function call - ruby

I have the following and am trying to split on '.' and then split the returned first part on '-' and return the last of the first part. I want to return 447.
a="cat-vm-447.json".split('.').split('-')
Also, how would I do this as a regular expression? I have this:
a="cat-vm-447.json".split(/-[\d]+./)
but this is splitting on the value. I want to return the number.
I can do this:
a="cat-vm-447.json".slice(/[\d]+/)
This gives me back 447, but I would really like to specify that - and . surround it. Adding those to the regex returns them as part of the match.

First question. split returns an array, so you need to use Array#[] to get the first (0) or last (-1) element of that array. The Array#first and Array#last methods are alternatives.
a="cat-vm-447.json".split('.')[0].split('-')[-1] # => "447"
Second question. You can capture your number in a group and then read it from the result. It will have index 1; the item at index 0 is the full match ("-447." in your case). You can use String#[] or String#match (among others) to apply your regex.
"cat-vm-447.json"[/-(\d+)\./, 1] # => "447"
# or
"cat-vm-447.json".match(/-(\d+)\./)[1] # => "447"
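Neither form above says what happens when the pattern is absent. A minimal sketch, assuming you want nil rather than an exception in that case (extract_id is a hypothetical helper name, not part of the question):

```ruby
# Hypothetical helper: returns the first run of digits enclosed by "-" and ".",
# or nil when the filename contains no such run.
def extract_id(filename)
  filename[/-(\d+)\./, 1]
end

extract_id("cat-vm-447.json") # => "447"
extract_id("readme.txt")      # => nil

# String#match returns nil when nothing matches, so guard before indexing.
m = "readme.txt".match(/-(\d+)\./)
id = m && m[1]                # => nil
```

String#[] with a regex already returns nil on a miss, which is why the helper needs no explicit guard.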

Split returns an array, so you need to specify the index for the next split.
a="cat-vm-447.json".split('.').first.split('-').last
For the regular expression, you need to wrap what you want to capture in parentheses.
/-(\d+)\./

a = "cat-vm-447.json"
b = a.match(/-(\d+)\./)
p b[1] # => "447" (b[0] is the full match, "-447.")

Try something like this:
if "cat-vm-447.json" =~ /([\d]+)/
  p $1
else
  p "No matches"
end
The parentheses in the regex capture the matched digits, and Ruby stores that capture in the $1 variable.

When you split a second time, you are actually trying to split an Array instead of a String.
ruby-1.9.3-head :003 > "cat-vm-447.json".split('.')
# => ["cat-vm-447", "json"]
With a regexp, you can split on both characters at once using /[-.]/
ruby-1.9.3-head :008 > "cat-vm-447.json".split(/[-.]/)
# => ["cat", "vm", "447", "json"]
ruby-1.9.3-head :009 > "cat-vm-447.json".split(/[-.]/)[2]
# => "447"

Related

Why does /[<>]/ not return both angle brackets with String#match?

I expect this example to match the two characters < and >:
a = "<1acf457f477b41d4a363e89a1d0a6e57#Open-Xchange>"
a.match /[<>]/
# => #<MatchData "<">
It matches only the first character. Why?
As you have seen, #match returns only the first match, as a MatchData object. #scan returns all matches.
>> a="<1acf457f477b41d4a363e89a1d0a6e57#Open-Xchange>"
=> "<1acf457f477b41d4a363e89a1d0a6e57#Open-Xchange>"
>> a.scan /[<>]/
=> ["<", ">"]
Problem
You are misunderstanding your expression. /[<>]/ means:
Match a single character from the character class, which may be either < or >.
Ruby is correctly giving you exactly what you've asked for in your pattern.
Solution
If you're expecting the entire string between the two characters, you need a different pattern. For example:
"<1acf457f477b41d4a363e89a1d0a6e57#Open-Xchange>".match /<.*?>/
#=> #<MatchData "<1acf457f477b41d4a363e89a1d0a6e57#Open-Xchange>">
Alternatively, if you just want to match all the instances of < or > in your string, then you should use String#scan with a character class or alternation. In this particular case, the results will be identical either way. For example:
"<1acf457f477b41d4a363e89a1d0a6e57#Open-Xchange>".scan /<|>/
#=> ["<", ">"]
"<1acf457f477b41d4a363e89a1d0a6e57#Open-Xchange>".scan /[<>]/
#=> ["<", ">"]
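The difference between the two methods, as a minimal sketch reusing the string from the question (the variable names are illustrative only):

```ruby
a = "<1acf457f477b41d4a363e89a1d0a6e57#Open-Xchange>"

# match stops at the first hit and wraps it in a MatchData object.
first = a.match(/[<>]/)
first[0]                  # => "<"

# scan keeps going and returns every non-overlapping hit as an array.
all = a.scan(/[<>]/)      # => ["<", ">"]

# To get what sits between the brackets, capture it in a group.
inner = a[/<(.*?)>/, 1]   # => "1acf457f477b41d4a363e89a1d0a6e57#Open-Xchange"
```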

Drop elements from array if regexp does not match

Is there a way to do this?
I have an array:
["file_1.jar", "file_2.jar","file_3.pom"]
And I want to keep only "file_3.pom", what I want to do is something like this:
array.drop_while{|f| /.pom/.match(f)}
But this way I keep everything in the array except "file_3.pom". Is there a way to do something like "not_match"?
I found these:
f !~ /.pom/ # => leaves all elements in array
OR
f !~ /*.pom/ # => leaves all elements in array
But none of those returns what I expect.
How about select?
selected = array.select { |f| /.pom/.match(f) }
p selected
# => ["file_3.pom"]
Hope that helps!
In your case you can use the Enumerable#grep method to get an array of the elements that matches a pattern:
["file_1.jar", "file_2.jar", "file_3.pom"].grep(/\.pom\z/)
# => ["file_3.pom"]
As you can see, I've also slightly modified your regular expression so that it matches only strings that end with .pom:
\. matches a literal dot; without the \ it matches any character
\z anchors the pattern to the end of the string; without it the pattern would match .pom anywhere in the string.
Since you are searching for a literal string you can also avoid regular expression altogether, for example using the methods String#end_with? and Array#select:
["file_1.jar", "file_2.jar", "file_3.pom"].select { |s| s.end_with?('.pom') }
# => ["file_3.pom"]
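For comparison, a short sketch of why drop_while does not filter the way the question expects, alongside the filtering methods (files is an illustrative variable name; grep_v needs Ruby 2.3 or later):

```ruby
files = ["file_1.jar", "file_2.jar", "file_3.pom"]

# drop_while only removes leading elements, and stops at the first element
# for which the block is falsy. "file_1.jar" does not match, so nothing is dropped.
untouched = files.drop_while { |f| f =~ /\.pom\z/ }

# grep keeps the elements that match; grep_v keeps the ones that do not.
poms     = files.grep(/\.pom\z/)    # => ["file_3.pom"]
non_poms = files.grep_v(/\.pom\z/)  # => ["file_1.jar", "file_2.jar"]

# reject with !~ is the "not_match" form: drop whatever fails to match.
also_poms = files.reject { |f| f !~ /\.pom\z/ }  # => ["file_3.pom"]
```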
If you want to keep only the strings that match a regexp, you can use the Array#keep_if method.
Note that this method modifies the original array in place.
a = ["file_1.jar", "file_2.jar","file_3.pom"]
a.keep_if{|file_name| /.pom/.match(file_name)}
p a
# => ["file_3.pom"]
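To make the in-place behaviour concrete, here is a small sketch contrasting select, which returns a new array, with keep_if, which changes the receiver (the variable names are illustrative):

```ruby
original = ["file_1.jar", "file_2.jar", "file_3.pom"]

# select builds a new array and leaves the receiver alone.
filtered = original.select { |name| name.end_with?(".pom") }
size_after_select = original.size   # => 3

# keep_if removes the non-matching elements from the receiver itself.
original.keep_if { |name| name.end_with?(".pom") }
size_after_keep_if = original.size  # => 1
```

If other code still holds a reference to the same array, select is usually the safer choice.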

regex for a pattern at end of string

I have a string which looks like:
hello/world/1.9.2-some-text
hello/world/2.0.2-some-text
hello/world/2.11.0
Through regex I want to get the string after last '/' and until end of line i.e. in above examples output should be 1.9.2-some-text, 2.0.2-some-text, 2.11.0
I tried this - ^(.+)\/(.+)$ which returns me an array of which first object is "hello/world" and 2nd object is "1.9.2-some-text"
Is there a way to just get "1.9.2-some-text" as the output?
Try using a negated character class ([^…]) like this:
[^\/]+$
This will match one or more of any character other than / followed by the end of the string.
You can use a negated match here.
'hello/world/1.9.2-some-text'.match(Regexp.new('[^/]+$'))
# => #<MatchData "1.9.2-some-text"> (call [0] or to_s on it to get the string)
Meaning any character except: / (1 or more times) followed by the end of the string.
Although, the simplest way would be to split the string.
'hello/world/1.9.2-some-text'.split('/').last
# => "1.9.2-some-text"
OR
'hello/world/1.9.2-some-text'.split('/')[-1]
# => "1.9.2-some-text"
If you do not need to use a regex, the usual way of doing this is:
File.basename("hello/world/1.9.2-some-text")
#=> "1.9.2-some-text"
This is one way:
s = 'hello/world/1.9.2-some-text
hello/world/2.0.2-some-text
hello/world/2.11.0'
s.lines.map { |l| l[/.*\/(.*)/,1] }
#=> ["1.9.2-some-text", "2.0.2-some-text", "2.11.0"]
You said, "in above examples output should be 1.9.2-some-text, 2.0.2-some-text, 2.11.0". That's neither a string nor an array, so I assumed you wanted an array. If you want a string, tack .join(', ') onto the end.
Regex quantifiers are greedy by default, so .*\/ will match all characters up to and including the last / in each line. The second argument, 1, returns the contents of the capture group (.*) (capture group 1).
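The effect of greediness can be seen by comparing the greedy quantifier with its lazy form, as a minimal sketch on one of the lines from the question:

```ruby
line = "hello/world/1.9.2-some-text"

# Greedy .* runs to the last "/" before handing over to the capture group.
greedy = line[/.*\/(.*)/, 1]   # => "1.9.2-some-text"

# Lazy .*? stops at the first "/", so the capture keeps the rest of the path.
lazy = line[/.*?\/(.*)/, 1]    # => "world/1.9.2-some-text"
```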

Splitting string based on word

I have a string composed of words separated by '#', for instance 'this#is#an#example', and I need to extract the last word or the last two words depending on the second-to-last word.
If the second to last is 'myword' I need the last two words otherwise just the last one.
'this#is#an#example' => 'example'
'this#is#an#example#using#myword#also' => 'myword#also'
Is there a better way than splitting and checking the second to last? perhaps using regular expression?
Thanks.
You can use the end-of-line anchor $ and make the myword# prefix optional:
str = 'this#is#an#example'
str[/(?:#)((myword#)?[^#]+)$/, 1]
#=> "example"
str = 'this#is#an#example#using#myword#also'
str[/(?:#)((myword#)?[^#]+)$/, 1]
#=> "myword#also"
However, I don't think using a regular expression is "better" in this case. I would use something like Santosh's (deleted) answer: split the line by # and use an if clause.
def foo(str)
  *, a, b = str.split('#')
  if a == 'myword'
    "#{a}##{b}"
  else
    b
  end
end
str = 'this#is#an#example#using#myword#also'
array = str.split('#')
array[-2] == 'myword' ? array[-2..-1].join('#') : array[-1]
With regex:
'this#is#an#example'[/(myword\#)*\w+$/]
# => "example"
'this#is#an#example#using#myword#also'[/(myword\#)*\w+$/]
# => "myword#also"
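One way to check that a split-based and a regex-based version agree on the examples from the question is to wrap each in a method. This is only a sketch: tail_by_split and tail_by_regex are hypothetical names, and the regex is a variant of the ones above using a non-capturing group and \z.

```ruby
# Split on "#" and look at the second-to-last word explicitly.
def tail_by_split(str)
  parts = str.split('#')
  parts[-2] == 'myword' ? parts.last(2).join('#') : parts.last
end

# Optional "myword#" prefix followed by the final word, anchored at the end.
def tail_by_regex(str)
  str[/(?:myword#)?[^#]+\z/]
end

tail_by_split('this#is#an#example')                   # => "example"
tail_by_regex('this#is#an#example')                   # => "example"
tail_by_split('this#is#an#example#using#myword#also') # => "myword#also"
tail_by_regex('this#is#an#example#using#myword#also') # => "myword#also"
```

The two can disagree when the second-to-last word merely ends in myword (for example 'a#notmyword#also'), because the regex is not anchored to a word boundary there. That is one reason the split version may be easier to reason about.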

Two strings evaluated by regex, but one of the scan results are being put into an extra array?

I can't figure out what I'm doing differently in the example below. I have two strings which, from my perspective, are similar: plain strings. For each string I have a regex, but the first regex, /\*Hi (.*) \*,/, gives me a result where the match is nested in two arrays: [["result"]]. I need my result to be in just one array: ["result"]. What am I doing differently in the two examples below?
✗ irb
2.0.0p247 :001 > name_line_1 = "*Hi Peter Parker *,"
=> "*Hi Peter Parker *,"
2.0.0p247 :002 > name_line_1.scan(/\*Hi (.*) \*,/)
=> [["Peter Parker"]]
2.0.0p247 :003 > name_line_2 = "Peter Parker<br />Memory Lane 60<br />0000 Gotham<br />USA<br />TEL:: 00000000000<br />peter#parker.com<br />\r"
=> "Peter Parker<br />Memory Lane 60<br />0000 Gotham<br />USA<br />TEL:: 00000000000<br />peter#parker.com<br />\r"
2.0.0p247 :004 > name_line_2.scan(/^[^<]*/)
=> ["Peter Parker"]
scan returns an array of matches. As the other answers point out, if your regex has capturing groups (parentheses), that means each match will return an array, with one string for each capturing group within the match.
If it didn't do this, scan wouldn't be very useful, as it is very common to use capturing groups in a regex to pick out different parts of the match.
I suspect that scan is not really the best method for your situation. scan is useful when you want to get all the matches from a string. But in the string you show, there is only one match anyways. If you want to get a specific capturing group from the first match in a string, the easiest way is:
string[/regex/, 1] # extract the first capturing group, or nil if there is no match
Another way is to do something like this:
if string =~ /regex/
  # $1 will contain the first capturing group from the first match
end
Or:
if match = string.match(/regex/)
  # match[1] will contain the first capturing group
end
If you really want to get all matches in the string, and need to use a capturing group (or feel it's more readable than using lookahead and lookbehind, which it is):
string.scan(/regex/) do |match|
# do something with match[0]
end
Or:
string.scan(/regex/).map(&:first)
It's because you are capturing the name in name_line_1 using parentheses. This causes the scan method to return an array of arrays. If you absolutely must return a one-dimensional array, you can use lookbehind and lookahead like so:
/(?<=\*Hi ).*(?= \*,)/
Or, if you find that too confusing, you could always just call .flatten on the resulting array ;-)
The difference is that, in the first regex, you have a captured substring (). When a regex matches, the whole match is captured as $&, and in addition to that, you can capture as many parts of it as you want by using (). They will be captured as $1, $2, ...
And scan behaves differently depending on whether you have $1, $2, ... When you don't, it returns an array of all the $& values. When you do have $1, $2, ..., it returns an array of [$1, $2, ...] arrays, one per match.
To avoid $1 in the first regex, you have to avoid using a captured substring:
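A minimal sketch of how the shape of scan's result changes with and without a capturing group, using the first string from the question (the variable names are illustrative):

```ruby
line = "*Hi Peter Parker *,"

# With a capturing group: one inner array per match.
nested = line.scan(/\*Hi (.*) \*,/)              # => [["Peter Parker"]]

# With a non-capturing group: flat array, but of whole matches.
whole = line.scan(/\*Hi (?:.*) \*,/)             # => ["*Hi Peter Parker *,"]

# With lookbehind and lookahead: flat array containing only the name.
flat = line.scan(/(?<=\*Hi ).*(?= \*,)/)         # => ["Peter Parker"]

# Flattening the nested result gives the same thing.
flattened = line.scan(/\*Hi (.*) \*,/).flatten   # => ["Peter Parker"]
```

Note that simply dropping the capture returns the full matched text, so the lookaround or flatten forms are the ones that yield just the name.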

Resources