Remove characters from right side until a special character is reached - ruby

I have a variable that represents a path:
path = "/foo/bar"
I want to remove the last part of the path. I tried it with gsub! like this:
path.gsub!("/bar","")
but I also want to raise an error if "/bar" isn't at the end of the string. I also tried path.split("/"), but that doesn't seem very memory efficient. The method is called a lot, so an in-place approach would be perfect. Another variation would be to remove everything from the right until a "/" is hit, without raising the error.
What would be a fast and memory efficient method to do this?

You could use a Regexp to match only at the end of the string:
'/foo/bar'.gsub!(/\/bar\z/, '')
#=> '/foo'
Since gsub! returns nil if there wasn't a match, just combine it with raising an error:
'/foo/blub'.gsub!(/\/bar\z/, '') || raise(StandardError)
#=> StandardError: StandardError
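If the suffix is a fixed string, the same "mutate in place or raise" idea can be sketched without a regexp. This assumes Ruby 2.5 or later, where String#delete_suffix! exists and returns nil when nothing was removed; strip_suffix! is a hypothetical helper name, not something from the question.

```ruby
# Hypothetical helper: removes suffix from path in place,
# or raises if path does not end with it.
def strip_suffix!(path, suffix)
  # delete_suffix! returns nil when the string did not end with suffix
  path.delete_suffix!(suffix) ||
    raise(ArgumentError, "#{path.inspect} does not end with #{suffix.inspect}")
end

path = +"/foo/bar"          # unary plus gives an unfrozen copy if needed
strip_suffix!(path, "/bar")
path                        # => "/foo"
```

Because the receiver is modified in place, no intermediate array or second string is built for the successful case.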

To get what you want, you can do:
File.dirname("/foo/bar")
# => "/foo"
To raise an error is a different thing:
raise unless "/foo/bar".end_with?("/bar")

You could use String#rindex to find the last occurrence of / and use that value to get the preceding sub-string:
path[0, path.rindex("/")]
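For the "no error" variation, the rindex idea can also be applied in place. The following is a minimal sketch; drop_last_segment! is a made-up name, and returning the string unchanged when it contains no "/" is an assumption about the desired behaviour.

```ruby
# Hypothetical helper: cuts everything from the last "/" onward, in place.
def drop_last_segment!(path)
  idx = path.rindex("/")
  return path if idx.nil?   # no slash: leave the string untouched (assumed)
  path.slice!(idx..-1)      # removes the tail from the receiver itself
  path
end

path = +"/foo/bar"
drop_last_segment!(path)
path  # => "/foo"
```

Note that the original one-liner, path[0, path.rindex("/")], raises a TypeError when there is no "/" at all, because rindex returns nil in that case.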

Related

How to chain methods in Ruby?

I need to call several methods on lnk.href to get f_name.
I want to write the code as shown below, but it gives me
undefined method `gsub!' for nil:NilClass (NoMethodError)
If I don't want to write these in one line (as it's hard to read), what's a better way in Ruby?
f_name = lnk.href.split('/').last
.gsub!(/[(]+/, "_")
.gsub!(/[)]+/, "_")
String#gsub! returns nil if there's no match:
'1'.gsub!(/2/, '_')
# => nil
'1'.gsub!(/2/, '_').gsub!(/1/, '_')
# NoMethodError: undefined method `gsub!' for nil:NilClass
# from (irb):6
# from C:/Ruby200-x64/bin/irb:12:in `<main>'
Replacing gsub! with gsub will probably solve your problem:
'1'.gsub(/2/, '_')
# => "1"
'1'.gsub(/2/, '_').gsub(/1/, '_')
# => "_"
This is not due to the syntax but only because some methods return nil in your case.
So, if I understand correctly, you are trying to put some of the chained method calls on their own lines to make the code more readable? In Ruby 1.9 and later, a line may start with .method_name and continue the previous expression, so the line breaks themselves are not the problem; the NoMethodError means that an earlier call in the chain returned nil.
I'm operating on the assumption that what you are trying to accomplish is something like this:
lnk = Some Link
f_name = lnk.href.split('/').last.gsub!(/[(]+/, "_").gsub!(/[)]+/, "_")
So, you're trying to find and break up links in HTML by splitting them at the /, then pulling out and manipulating the last part of the URL.
There are a couple things you can do to make this more readable and logical. One is to fix your regex. Because you're using .gsub!, you are already searching for all matching occurrences, so the + is unnecessary. You can also combine your two calls to .gsub! into one like this: .gsub!(/[()]/, "_")
That will match and substitute all occurrences of either the open or close paren, making your chain one method shorter.
For the rest, I would suggest breaking this into two steps at the logical place: between creating a data structure, and manipulating the data within it. First, create the array of substrings:
f_name = lnk.href.split('/')
Then manipulate the data from that array:
manipulated_substring = f_name.last.gsub(/[()]/, "_")
That will make your code more readable, and keep your data intact!
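Putting the pieces of this thread together, a multi-line chain might look like the sketch below. It assumes Ruby 1.9 or later (where a line may begin with a dot to continue the previous expression) and uses the non-bang gsub so that no step can return nil. The method name file_name_from and the example URLs are illustrative only.

```ruby
# Hypothetical helper: last path segment with parentheses replaced.
def file_name_from(href)
  href
    .split("/")          # break the URL into segments
    .last                # keep the final segment
    .gsub(/[()]/, "_")   # gsub (no bang) always returns a String
end

file_name_from("http://example.com/files/report(final).pdf")
# => "report_final_.pdf"
file_name_from("http://example.com/files/plain.pdf")
# => "plain.pdf"
```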

how can i consolidate this expression?

I'm removing the initial "The" and spaces of band names for concatenating into a url.
I have this, but it's ugly and I'd like to consolidate into one expression.
@artist.sub!(/[Tt]he/, '')
@artist.gsub!(/\s+/, '')
Try:
@artist.gsub!(/(\A[Tt]he)|(\s+)/, '')
You can of course chain #sub and #gsub expressions; e.g.,
@artist = @artist.sub(/^[Tt]he/, '').gsub(/\s+/, '')
Any more compact and I would hesitate to call it elegant—just clever (and unclear).
Note the use of #sub and #gsub instead of #sub! and #gsub!. Per @pguardiario's comment, the bang versions return nil if there is no match, which would cause a NoMethodError when chained. Also, note that this has an anchor to prevent "The" from being removed from the middle of the string.
If you're trying to create a slug for use in URLs, you might be better going with a method in a library.
I'd go with:
@artist = @artist.sub(/\Athe\b/i, '').strip
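To see why the anchor and the word boundary matter, here is a small sketch that combines the anchored, case-insensitive sub from the last answer with the whitespace removal from the question. band_slug is a hypothetical name and the band names are only sample input.

```ruby
# Hypothetical helper: drop a leading "The" (any case) and all whitespace.
def band_slug(name)
  name
    .sub(/\Athe\b/i, "")  # only a whole word "the" at the very start
    .gsub(/\s+/, "")      # then remove every run of whitespace
end

band_slug("The Rolling Stones")   # => "RollingStones"
band_slug("Theory of a Deadman")  # => "TheoryofaDeadman"
band_slug("Cage the Elephant")    # => "CagetheElephant"
```

Without \b the second name would lose its first three letters, and without \A the third name would lose its middle word.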

Optimising ruby regexp -- lots of match groups

I'm working on a Ruby-based lexer. To improve performance, I joined all the tokens' regexps into one big regexp with named match groups. The resulting regexp looks like:
/\A(?<__anonymous_-1038694222803470993>(?-mix:\n+))|\A(?<__anonymous_-1394418499721420065>(?-mix:\/\/[\A\n]*))|\A(?<__anonymous_3077187815313752157>(?-mix:include\s+"[\A"]+"))|\A(?<LET>(?-mix:let\s))|\A(?<IN>(?-mix:in\s))|\A(?<CLASS>(?-mix:class\s))|\A(?<DEF>(?-mix:def\s))|\A(?<DEFM>(?-mix:defm\s))|\A(?<MULTICLASS>(?-mix:multiclass\s))|\A(?<FUNCNAME>(?-mix:![a-zA-Z_][a-zA-Z0-9_]*))|\A(?<ID>(?-mix:[a-zA-Z_][a-zA-Z0-9_]*))|\A(?<STRING>(?-mix:"[\A"]*"))|\A(?<NUMBER>(?-mix:[0-9]+))/
I'm matching it to my string producing a MatchData where exactly one token is parsed:
bigregex =~ "\n ... garbage"
puts $~.inspect
Which outputs
#<MatchData
"\n"
__anonymous_-1038694222803470993:"\n"
__anonymous_-1394418499721420065:nil
__anonymous_3077187815313752157:nil
LET:nil
IN:nil
CLASS:nil
DEF:nil
DEFM:nil
MULTICLASS:nil
FUNCNAME:nil
ID:nil
STRING:nil
NUMBER:nil>
So, the regex actually matched the "\n" part. Now, I need to figure out the match group it belongs to (it's clearly visible from the #inspect output that it's __anonymous_-1038694222803470993, but I need to get it programmatically).
I could not find any option other than iterating over #names:
m.names.each do |n|
  if m[n]
    type = n.to_sym
    resolved_type = (n.start_with?('__anonymous_') ? nil : type)
    val = m[n]
    break
  end
end
which verifies that the match group did have a match.
The problem here is that it's slow: I spend about 10% of the time in the loop, and another 8% grabbing @input[@pos..-1] to make sure that \A works as expected to match the start of the string (I do not discard input, I just shift @pos in it).
You can check the full code at GH repo.
Any ideas on how to make it at least a bit faster? Is there any option to figure the "successful" match group easier?
You can do this using the MatchData methods .captures() and .names():
matching_string = "\n ...garbage" # or whatever this really is in your code
@input = matching_string.match bigregex # bigregex = your regex
arr = @input.captures
the_name_you_want = nil # declared here so it is still visible after the block
arr.each_with_index do |value, index|
  unless value.nil?
    the_name_you_want = @input.names[index]
  end
end
Or if you expect multiple successful values, you could do:
success_names_arr = []
success_names_arr.push(@input.names[index]) # within the above loop
Pretty similar to your original idea, but if you're looking for efficiency .captures() method should help with that.
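As a rough sketch of both ideas in this thread, the example below pairs names with captures to find the group that matched, and passes a start offset to Regexp#match with a \G anchor so that no substring has to be sliced off the input. The token set is a cut-down stand-in for the real one, and TOKEN_RE and next_token are hypothetical names. Whether this is actually faster in the original lexer would need to be measured.

```ruby
# Reduced, illustrative token regexp; \G anchors at the offset given to match.
TOKEN_RE = /\G(?:(?<NEWLINE>\n+)|(?<LET>let\s)|(?<ID>[a-zA-Z_][a-zA-Z0-9_]*)|(?<NUMBER>[0-9]+))/

# Hypothetical helper: returns [type, text, next_pos] or nil if nothing matches.
def next_token(input, pos)
  m = TOKEN_RE.match(input, pos)
  return nil unless m
  # names and captures are in the same order, so zip pairs them up
  name, text = m.names.zip(m.captures).find { |_, value| value }
  [name.to_sym, text, m.end(0)]
end

next_token("let x", 0)  # => [:LET, "let ", 4]
next_token("let x", 4)  # => [:ID, "x", 5]
next_token("let x", 3)  # => nil (a lone space is not a token here)
```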
I may have misunderstood this completely, but I'm assuming that all but one of the groups are nil and the non-nil one is what you're after?
If so then, depending on the flavour of regex you're using, you could use a negative lookahead to check for a non-nil value
([^\n:]+:(?!nil)[^\n\>]+)
This will match the whole token ie NAME:value.

Ruby: check if object is nil

def parse( line )
_, remote_addr, status, request, size, referrer, http_user_agent, http_x_forwarded_for = /^([^\s]+) - (\d+) \"(.+)\" (\d+) \"(.*)\" \"([^\"]*)\" \"(.*)\"/.match(line).to_a
print line
print request
if request && request != nil
_, referrer_host, referrer_url = /^http[s]?:\/\/([^\/]+)(\/.*)/.match(referrer).to_a if referrer
method, full_url, _ = request.split(' ')
in parse: private method 'split' called for nil:NilClass (NoMethodError)
So, as I understand it, split is being called on nil rather than on a string.
This part parses a web server log, but I can't understand why it's getting nil. As I understand it, that's null.
Did some of the subpatterns in the regex fail? So is it the web server's fault for sometimes generating wrong logging strings?
By the way, how do I write to a file in Ruby? I can't read the output properly in the cmd window under Windows.
You seem to have a few questions here, so I'll take a stab at what seems to be the main one:
If you want to see if something is nil, just use .nil? - so in your example, you can just say request.nil?, which returns true if it is nil and false otherwise.
Ruby 2.3.0 added a safe navigation operator (&.) that checks for nil before calling a method.
request&.split(' ')
This is functionally* equivalent to
!request.nil? && request.split(' ')
*(They are slightly different. When request is nil, the top expression evaluates to nil, while the bottom expression evaluates to false.)
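The difference described in the footnote can be seen in a short sketch. It assumes Ruby 2.3 or later for &.; the two method names are made up for the comparison.

```ruby
# Hypothetical helpers contrasting the two nil checks.
def first_word_safe(request)
  request&.split(" ")&.first                 # nil propagates through the chain
end

def first_word_guarded(request)
  !request.nil? && request.split(" ").first  # false when request is nil
end

first_word_safe(nil)                            # => nil
first_word_guarded(nil)                         # => false
first_word_safe("GET /index.html HTTP/1.1")     # => "GET"
first_word_guarded("GET /index.html HTTP/1.1")  # => "GET"
```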
To write to a file:
File.open("file.txt", "w") do |file|
  file.puts "whatever"
end
As I write in a comment above - you didn't say what is nil. Also, check whether referrer contains what you think it contains. EDIT I see it's request that is nil. Obviously, regexp trouble.
Use rubular.com to easily test your regexp. Copy a line from your input file into "Your test string", and your regexp into "Your regular expression", and tweak until you get a highlight in "Match result".
Also, what are "wrong logging strings"? If we're talking Apache, log format is configurable.
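As for why request ends up nil: when the regexp does not match, match returns nil, nil.to_a is an empty array, and the multiple assignment then sets every variable to nil. A minimal sketch of guarding against that is shown below. The pattern is a simplified copy of the one in the question, and the sample log line is invented, so treat both as assumptions about the real log format.

```ruby
# Simplified version of the question's pattern (assumed log format).
LOG_PATTERN = /^(\S+) - (\d+) "(.+)" (\d+) "(.*)" "([^"]*)" "(.*)"/

# Hypothetical helper: returns a Hash of fields, or nil for unparsable lines.
def parse_fields(line)
  match = LOG_PATTERN.match(line)
  return nil if match.nil?          # line did not match: nothing to split
  _, remote_addr, status, request = match.to_a
  http_method, full_url = request.split(" ")
  { remote_addr: remote_addr, status: status,
    http_method: http_method, full_url: full_url }
end

good = '10.0.0.1 - 200 "GET /index.html HTTP/1.1" 512 "-" "Mozilla/5.0" "-"'
parse_fields(good)[:full_url]  # => "/index.html"
parse_fields("garbage")        # => nil
```

Printing or logging the lines for which parse_fields returns nil is one way to find out which input the pattern fails on.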

Capture arbitrary string before either '/' or end of string

Suppose I have:
foo/fhqwhgads
foo/fhqwhgadshgnsdhjsdbkhsdabkfabkveybvf/bar
And I want to replace everything that follows 'foo/' up until I either reach '/' or, if '/' is never reached, then up to the end of the line. For the first part I can use a lookbehind like this:
(?<=foo\/).+
And that's where I get stuck. I could match to the second '/' like this:
(?<=foo\/).+(?=\/)
That doesn't help for the first case though. Desired output is:
foo/blah
foo/blah/bar
I'm using Ruby.
Try this regex:
/(?<=foo\/)[^\/]+/
Implementing @Endophage's answer:
def fix_post_foo_portion(string)
  portions = string.split("/")
  index_to_replace = portions.index("foo") + 1
  portions[index_to_replace] = "blah"
  portions.join("/")
end
strings = %w{foo/fhqwhgads foo/fhqwhgadshgnsdhjsdbkhsdabkfabkveybvf/bar}
strings.each {|string| puts fix_post_foo_portion(string)}
I'm not a ruby dev but is there some equivalent of php's explode() so you could explode the string, insert a new item at the second array index then implode the parts with / again... Of course you can match on the first array element if you only want to do the switch in certain cases.
['foo/fhqwhgads', 'foo/fhqwhgadshgnsdhjsdbkhsdabkfabkveybvf/bar'].each do |s|
  puts s.sub(%r|^(foo/)[^/]+(/.*)?|, '\1blah\2')
end
Output:
foo/blah
foo/blah/bar
I'm too tired to think of a nicer way to do it but I'm sure there is one.
Checking for the end-of-string anchor -- $ -- as well as the / character should do the trick. You'll also need to make the .+ non-greedy by changing it to .+? since the greedy version will always match right up to the end of the string, given the chance.
(?<=foo\/).+?(?=\/|$)
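Both lookbehind patterns suggested in this thread can be checked against the two sample strings with a short sketch. The constant and method names are invented for the comparison.

```ruby
# The two suggested patterns, written with %r{} to avoid escaping "/".
NEGATED_CLASS = %r{(?<=foo/)[^/]+}       # everything up to the next "/"
LAZY_LOOKAHEAD = %r{(?<=foo/).+?(?=/|$)} # lazy match, stop before "/" or end

# Hypothetical helper: replaces the segment after "foo/" with "blah".
def replace_after_foo(string, pattern)
  string.sub(pattern, "blah")
end

samples = %w[foo/fhqwhgads foo/fhqwhgadshgnsdhjsdbkhsdabkfabkveybvf/bar]
samples.map { |s| replace_after_foo(s, NEGATED_CLASS) }
# => ["foo/blah", "foo/blah/bar"]
samples.map { |s| replace_after_foo(s, LAZY_LOOKAHEAD) }
# => ["foo/blah", "foo/blah/bar"]
```

The negated character class is usually the simpler of the two to read, since it needs neither a lazy quantifier nor a lookahead.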
