Capture arbitrary string before either '/' or end of string - ruby

Suppose I have:
foo/fhqwhgads
foo/fhqwhgadshgnsdhjsdbkhsdabkfabkveybvf/bar
And I want to replace everything that follows 'foo/' up until I either reach '/' or, if '/' is never reached, then up to the end of the line. For the first part I can use a non-capturing group like this:
(?<=foo\/).+
And that's where I get stuck. I could match to the second '/' like this:
(?<=foo\/).+(?=\/)
That doesn't help for the first case though. Desired output is:
foo/blah
foo/blah/bar
I'm using Ruby.

Try this regex:
/(?<=foo\/)[^\/]+/

Implementing #Endophage's answer:
def fix_post_foo_portion(string)
portions = string.split("/")
index_to_replace = portions.index("foo") + 1
portions[index_to_replace ] = "blah"
portions.join("/")
end
strings = %w{foo/fhqwhgads foo/fhqwhgadshgnsdhjsdbkhsdabkfabkveybvf/bar}
strings.each {|string| puts fix_post_foo_portion(string)}

I'm not a ruby dev but is there some equivalent of php's explode() so you could explode the string, insert a new item at the second array index then implode the parts with / again... Of course you can match on the first array element if you only want to do the switch in certain cases.

['foo/fhqwhgads', 'foo/fhqwhgadshgnsdhjsdbkhsdabkfabkveybvf/bar'].each do |s|
puts s.sub(%r|^(foo/)[^/]+(/.*)?|, '\1blah\2')
end
Output:
foo/blah
foo/blah/bar
I'm too tired to think of a nicer way to do it but I'm sure there is one.

Checking for the end-of-string anchor -- $ -- as well as the / character should do the trick. You'll also need to make the .+ non-greedy by changing it to .+? since the greedy version will always match right up to the end of the string, given the chance.
(?<=foo\/).+?(?=\/|$)

Related

regex to get all slashes from url

I have the following URL:
localhost:3000/filter/shoes/color/white
I need to replace all slashes to - except the first slash from localhost:3000/.
The final URL must be:
localhost:3000/filter-shoes-color-white
I've tried some regex with ruby but I didn't have any success.
Thanks.
Here is a regexp that match all the / but the first:
\G(?:\A[^\/]*\/)?+[^\/]*\K\/
So you can do:
"localhost:3000/filter/shoes/color/white".gsub(/\G(?:\A[^\/]*\/)?+[^\/]*\K\//,'-')
#=> "localhost:3000/filter-shoes-color-white"
But it won't work if you have a scheme on your URI.
TL;DR:
regex is:
\/(?<!localhost:3000\/)
Longer one
A famous old Chinese saying is: Teaching how to fishing is better than giving you the fish.
For regex, you can use online regex site such as regex101.com to test immediately with your regex and test string. link
Found other answers from stackoverflow using other key words to describe your situation: Regex for matching something if it is not preceded by something else
Make you own magic.
This is a pretty simple parsing problem, so I question the need for a regular expression. I think the code would probably be easier to understand and maintain if you just iterated through the characters of the string with a loop like this:
def transform(url)
url = url.dup
slash_count = 0
(0...url.size).each do |i|
if url[i] == '/'
slash_count += 1
url[i] = '-' if slash_count >= 2
end
end
url
end
Here is something even simpler using Ruby's String#gsub method:
def transform2(url)
slash_count = 0
url.gsub('/') do
slash_count += 1
slash_count >= 2 ? '-' : '/'
end
end
Using Ruby >= 2.7 with String#partition
Provided you aren't passing in a URI scheme like 'https://' as part of your string, you can do this as a single method chain with String#partition and String#tr. Using Ruby 3.0.2
'localhost:3000/filter-shoes-color-white'.partition(?/).
map { _1.match?(/^\/$/) ? _1 : _1.tr(?/, ?-) }.join
#=> "localhost:3000/filter-shoes-color-white"
This basically relies on the fact that there are no forward slashes in the first array element returned by #partition, and the second element contains a slash and nothing else. You are then free to use #tr to replace forward slashes with dashes in the final element.
If you have an older Ruby, you'll need a different solution since String#partition wasn't introduced before Ruby 2.6.1. If you don't like using character literals, ternary operators, or numbered block arguments (introduced in Ruby 2.7), then you can refactor the solution to suit your own stylistic tastes.
And another way of doing it. No regex and "localhost" lookback needed.
[url.split("/").take(2).join("/"),url.split("/").drop(2).join("-")].join("-")

How to extract part of the string which comes after given substring?

For example I have url string like:
https://abc.s3-something.amazonaws.com/subfolder/1234/5.html?X-Amz-Credential=abcd12bhhh34-1%2Fs3%2Faws4_request&X-Amz-Date=2016&X-Amz-Expires=3&X-Amz-SignedHeaders=host&X-Amz-Signature=abcd34hhhhbfbbf888ksdskj
From this string I need to extract number 1234 which comes after subfolder/. I tried with gsub but no luck. Any help would be appreciated.
Suppose your url is saved in a variable called url.
Then the following should return 1234
url.match(/subfolder\/(\d*)/)[1]
Explanation:
url.match(/ # call the match function which takes a regex
subfolder\/ # search for the first appearance of the string 'subfolder/'
# note: we must escape the `/` so we don't end the regex early
(\d*) # match any number of digits in a capture group,
/)[1] # close the regex and return the first capture group
lwassink has the right idea, but it can be done more simply. If subfolder is always the same:
url = "https://abc.s3-something.amazonaws.com/subfolder/1234/5.html?X-Amz-Credential=abcd12bhhh34-1%2Fs3%2Faws4_request&X-Amz-Date=2016&X-Amz-Expires=3&X-Amz-SignedHeaders=host&X-Amz-Signature=abcd34hhhhbfbbf888ksdskj"
url[/subfolder\/\K\d+/]
# => "1234"
The \K discards the matched text up to that point, so only "1234" is returned.
If you want to get the number after any subfolder, and the domain name is always the same, you might do this instead:
url[%r{amazonaws\.com/[^/]+/\K\d+}]
# => "1234"
s.split('/')[4]
Add a .to_i at the end if you like.
Or, to key it on a substring like you asked for...
a = s.split '/'
a[a.find_index('subfolder') + 1]
Or, to do it as a one-liner I suppose you could:
s.split('/').tap { |a| #i = 1 + a.find_index('subfolder')}[#i]
Or, since I am a damaged individual, I would actually write that:
s.split('/').tap { |a| #i = 1 + (a.find_index 'subfolder')}[#i]
url = 'http://abc/xyz'
index= url.index('/abc/')
url[index+5..length_of_string_you_want_to_extract]
Hope, that helps!

regex for a pattern at end of string

I have a string which looks like:
hello/world/1.9.2-some-text
hello/world/2.0.2-some-text
hello/world/2.11.0
Through regex I want to get the string after last '/' and until end of line i.e. in above examples output should be 1.9.2-some-text, 2.0.2-some-text, 2.11.0
I tried this - ^(.+)\/(.+)$ which returns me an array of which first object is "hello/world" and 2nd object is "1.9.2-some-text"
Is there a way to just get "1.9.2-some-text" as the output?
Try using a negative character class ([^…]) like this:
[^\/]+$
This will match one or more of any character other than / followed by the end of the string.
You can use a negated match here.
'hello/world/1.9.2-some-text'.match(Regexp.new('[^/]+$'))
# => "1.9.2-some-text"
Meaning any character except: / (1 or more times) followed by the end of the string.
Although, the simplest way would be to split the string.
'hello/world/1.9.2-some-text'.split('/').last
# => "1.9.2-some-text"
OR
'hello/world/1.9.2-some-text'.split('/')[-1]
# => "1.9.2-some-text"
If you do not need to use a regex, the ordinary way of doing such thing is:
File.basename("hello/world/1.9.2-some-text")
#=> "1.9.2-some-text"
This is one way:
s = 'hello/world/1.9.2-some-text
hello/world/2.0.2-some-text
hello/world/2.11.0'
s.lines.map { |l| l[/.*\/(.*)/,1] }
#=> ["1.9.2-some-text", "2.0.2-some-text", "2.11.0"]
You said, "in above examples output should be 1.9.2-some-text, 2.0.2-some-text, 2.11.0". That's neither a string nor an array, so I assumed you wanted an array. If you want a string, tack .join(', ') onto the end.
Regex's are naturally "greedy", so .*\/ will match all characters up to and including the last / in each line. 1 returns the contents of the capture group (.*) (capture group 1).

ruby/regex getting the first letter of each word

I want to get the first letter of each word put together, making something like "I need help" turn into "Inh". I was thinking to trim everything off, then going from there, or grab each first letter right away.
You could simply use split, map and join together here.
string = 'I need help'
result = string.split.map(&:first).join
puts result #=> "Inh"
How about regular expressions? Using the split method here forces a focus on the parts of the string that you don't need to for this problem, then taking another step of extracting the first letter of each word (chr). that's why I think regular expressions is better for this case. Node that this will also work if you have a - or another special character in the string. And then, of course you can add .upcase method at the end to get a proper acronym.
string = 'something - something and something else'
string.scan(/\b\w/).join
#=> ssase
Alternative solution using regex
string = 'I need help'
result = string.scan(/(\A\w|(?<=\s)\w)/).flatten.join
puts result
This basically says "look for either the first letter or any letter directly preceded by a space". The scan function returns array of arrays of matches, which is flattened (made into one array) and joined (made into a string).
string = 'I need help'
result = string.split.map(&:chr).join
puts result
http://ruby-doc.org/core-2.0/String.html#method-i-chr

Optimising ruby regexp -- lots of match groups

I'm working on a ruby baser lexer. To improve performance, I joined up all tokens' regexps into one big regexp with match group names. The resulting regexp looks like:
/\A(?<__anonymous_-1038694222803470993>(?-mix:\n+))|\A(?<__anonymous_-1394418499721420065>(?-mix:\/\/[\A\n]*))|\A(?<__anonymous_3077187815313752157>(?-mix:include\s+"[\A"]+"))|\A(?<LET>(?-mix:let\s))|\A(?<IN>(?-mix:in\s))|\A(?<CLASS>(?-mix:class\s))|\A(?<DEF>(?-mix:def\s))|\A(?<DEFM>(?-mix:defm\s))|\A(?<MULTICLASS>(?-mix:multiclass\s))|\A(?<FUNCNAME>(?-mix:![a-zA-Z_][a-zA-Z0-9_]*))|\A(?<ID>(?-mix:[a-zA-Z_][a-zA-Z0-9_]*))|\A(?<STRING>(?-mix:"[\A"]*"))|\A(?<NUMBER>(?-mix:[0-9]+))/
I'm matching it to my string producing a MatchData where exactly one token is parsed:
bigregex =~ "\n ... garbage"
puts $~.inspect
Which outputs
#<MatchData
"\n"
__anonymous_-1038694222803470993:"\n"
__anonymous_-1394418499721420065:nil
__anonymous_3077187815313752157:nil
LET:nil
IN:nil
CLASS:nil
DEF:nil
DEFM:nil
MULTICLASS:nil
FUNCNAME:nil
ID:nil
STRING:nil
NUMBER:nil>
So, the regex actually matched the "\n" part. Now, I need to figure the match group where it belongs (it's clearly visible from #inspect output that it's _anonymous-1038694222803470993, but I need to get it programmatically).
I could not find any option other than iterating over #names:
m.names.each do |n|
if m[n]
type = n.to_sym
resolved_type = (n.start_with?('__anonymous_') ? nil : type)
val = m[n]
break
end
end
which verifies that the match group did have a match.
The problem here is that it's slow (I spend about 10% of time in the loop; also 8% grabbing the #input[#pos..-1] to make sure that \A works as expected to match start of string (I do not discard input, just shift the #pos in it).
You can check the full code at GH repo.
Any ideas on how to make it at least a bit faster? Is there any option to figure the "successful" match group easier?
You can do this using the regexp methods .captures() and .names():
matching_string = "\n ...garbage" # or whatever this really is in your code
#input = matching_string.match bigregex # bigregex = your regex
arr = #input.captures
arr.each_with_index do |value, index|
if not value.nil?
the_name_you_want = #input.names[index]
end
end
Or if you expect multiple successful values, you could do:
success_names_arr = []
success_names_arr.push(#input.names[index]) #within the above loop
Pretty similar to your original idea, but if you're looking for efficiency .captures() method should help with that.
I may have misunderstood this completely but but I'm assuming that all but one token is not nil and that's the one your after?
If so then, depending on the flavour of regex you're using, you could use a negative lookahead to check for a non-nil value
([^\n:]+:(?!nil)[^\n\>]+)
This will match the whole token ie NAME:value.

Resources