How to extract part of the string which comes after given substring? - ruby

For example, I have a URL string like:
https://abc.s3-something.amazonaws.com/subfolder/1234/5.html?X-Amz-Credential=abcd12bhhh34-1%2Fs3%2Faws4_request&X-Amz-Date=2016&X-Amz-Expires=3&X-Amz-SignedHeaders=host&X-Amz-Signature=abcd34hhhhbfbbf888ksdskj
From this string I need to extract the number 1234, which comes after subfolder/. I tried gsub but had no luck. Any help would be appreciated.

Suppose your url is saved in a variable called url.
Then the following should return the string "1234":
url.match(/subfolder\/(\d*)/)[1]
Explanation:
url.match(/ # call the match function which takes a regex
subfolder\/ # search for the first appearance of the string 'subfolder/'
# note: we must escape the `/` so we don't end the regex early
(\d*) # match any number of digits in a capture group,
/)[1] # close the regex and return the first capture group
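One caveat with the match(...)[1] form: match returns nil when the pattern is not found, so indexing the result raises NoMethodError. A minimal sketch of a nil-safe variant follows, using String#[] with a capture-group index. The two URLs are shortened, hypothetical stand-ins for the one in the question, and \d+ is used instead of \d* so that an empty match is not reported as a hit.

```ruby
# Hypothetical, shortened stand-ins for the URL in the question.
url_with_id    = 'https://example.com/subfolder/1234/5.html?X-Amz-Date=2016'
url_without_id = 'https://example.com/other/5.html'

# String#[] with a regex and a group index returns nil instead of raising
# when nothing matches.
found   = url_with_id[/subfolder\/(\d+)/, 1]
missing = url_without_id[/subfolder\/(\d+)/, 1]

puts found.inspect    # "1234"
puts missing.inspect  # nil
```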

lwassink has the right idea, but it can be done more simply. If subfolder is always the same:
url = "https://abc.s3-something.amazonaws.com/subfolder/1234/5.html?X-Amz-Credential=abcd12bhhh34-1%2Fs3%2Faws4_request&X-Amz-Date=2016&X-Amz-Expires=3&X-Amz-SignedHeaders=host&X-Amz-Signature=abcd34hhhhbfbbf888ksdskj"
url[/subfolder\/\K\d+/]
# => "1234"
The \K discards the matched text up to that point, so only "1234" is returned.
If you want to get the number after any subfolder, and the domain name is always the same, you might do this instead:
url[%r{amazonaws\.com/[^/]+/\K\d+}]
# => "1234"
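If the substring is only known at runtime, the same \K idea can probably be wrapped in a small helper. This is a sketch under assumptions: digits_after is a hypothetical name, and the URL is a shortened stand-in for the one above.

```ruby
# Hypothetical helper: returns the run of digits that directly follows
# `marker` in `str`, or nil when there is none.
def digits_after(str, marker)
  # Regexp.escape makes regex metacharacters in the marker (".", "+", ...)
  # match literally; \K drops everything matched so far from the result.
  str[/#{Regexp.escape(marker)}\K\d+/]
end

url = 'https://example.com/subfolder/1234/5.html' # shortened, hypothetical URL

puts digits_after(url, 'subfolder/').inspect  # "1234"
puts digits_after(url, 'missing/').inspect    # nil
```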

s.split('/')[4]
Add a .to_i at the end if you like.
Or, to key it on a substring like you asked for...
a = s.split '/'
a[a.find_index('subfolder') + 1]
Or, to do it as a one-liner I suppose you could:
s.split('/').tap { |a| @i = 1 + a.find_index('subfolder')}[@i]
Or, since I am a damaged individual, I would actually write that:
s.split('/').tap { |a| @i = 1 + (a.find_index 'subfolder')}[@i]
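A variation on the split approach, in case the position of the segment should not depend on the scheme and host: parse the URL with the standard uri library first and split only the path. This is a sketch, not the answerer's code; the URL is a shortened version of the one in the question, and the nil guard is an addition.

```ruby
require 'uri'

# Shortened version of the URL from the question.
url = 'https://abc.s3-something.amazonaws.com/subfolder/1234/5.html?X-Amz-Date=2016&X-Amz-Expires=3'

# URI#path excludes the scheme, host and query string.
segments = URI.parse(url).path.split('/')
position = segments.index('subfolder')

# Guard against a missing segment instead of calling + on nil.
id = position ? segments[position + 1] : nil

puts segments.inspect  # ["", "subfolder", "1234", "5.html"]
puts id.inspect        # "1234"
```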

url = 'http://abc/xyz'
index = url.index('/abc/')
# 5 is the length of '/abc/'; the second argument is a length, not an end index
url[index + 5, length_of_string_you_want_to_extract]
Hope that helps!
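Applied to the original question, the index approach might look like the sketch below. It assumes the wanted value runs from the end of the marker to the next slash; the URL is a shortened, hypothetical stand-in.

```ruby
url    = 'https://example.com/subfolder/1234/5.html' # hypothetical, shortened URL
marker = 'subfolder/'

start = url.index(marker)                    # nil when the marker is absent
if start
  rest   = url[(start + marker.length)..-1]  # everything after the marker
  stop   = rest.index('/') || rest.length    # up to the next slash, if any
  result = rest[0, stop]                     # start offset and length
end

puts result.inspect  # "1234"
```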

Related

How to get text between %{...} with RegEx?

I have the following text:
My name is %{name}
how can I get name inside of %{ ... }?
I'm trying with:
/%{(.*)}/
but it returns the whole %{name}, and I need just name.
When I try this expression on regex101.com, it gives me two results: Full match (%{name}) and Group 1 (name). My Ruby code gives me the full match, but I need the group.
What is the problem?
You can use lookaround:
(?<=%{)[^%]*(?=})
see demo.
(?<=%{) will ensure that the next part is preceded with %{
[^%]* matches any run of characters other than %, so the match cannot spill over into a later %{...} field
(?=}) will ensure that it's followed by a }
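Because the lookarounds keep the delimiters out of the match, the same pattern should also work with String#scan when a string holds several fields. A small sketch, with a made-up template and the braces escaped for readability:

```ruby
template = 'Dear %{title} %{name}, your order %{id} has shipped.' # hypothetical

# The lookbehind and lookahead are zero-width, so each match is only the
# text between "%{" and "}".
names = template.scan(/(?<=%\{)[^%]*(?=\})/)

puts names.inspect  # ["title", "name", "id"]
```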
Don't know how you're applying that regex to the string, but .match method returns a MatchData object, from which you can extract matched groups
s = 'My name is %{name}'
regex = /%{(.*)}/
m = s.match(regex) # => #<MatchData "%{name}" 1:"name">
m[0] # => "%{name}"
m[1] # => "name"
It looks nicer with named groups
s = 'My name is %{name}'
regex = /%{(?<var>.*)}/
m = s.match(regex) # => #<MatchData "%{name}" var:"name">
m[:var] # => "name"
In Ruby, you can easily access any capture group you need with
s[/regex/, n]
where n is the ID of the capturing group. So, in your case, use
s[/%{([^}]*)}/, 1]
or
s[/%{(.*?)}/m, 1]
See the online demo
You need to make the Group 1 subpattern lazy, or restrict it to any characters other than }, so that it matches as few characters as possible and does not run on into the next match.
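The difference only shows up once a string contains more than one field. A quick sketch with a made-up input (braces escaped for readability):

```ruby
s = 'Hello %{first} and %{second}' # hypothetical input with two fields

greedy  = s[/%\{(.*)\}/, 1]     # .* runs on to the last "}"
lazy    = s[/%\{(.*?)\}/, 1]    # .*? stops at the first "}"
negated = s[/%\{([^}]*)\}/, 1]  # [^}]* cannot cross a "}"

puts greedy.inspect   # "first} and %{second"
puts lazy.inspect     # "first"
puts negated.inspect  # "first"
```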

Swap part of a string in Ruby

What's the easiest way in Ruby to interchange a part of a string with another value? Let's say that I have an email, and I want to check it on two domains, but I don't know which one I'll get as an input. The app I'm building should work with @gmail.com and @googlemail.com domains.
Example:
swap_string 'user@gmail.com' # >>user@googlemail.com
swap_string 'user@googlemail.com' # >>user@gmail.com
If you're looking to substitute a part of a string with something else, gsub works quite well.
Link to Gsub docs
It lets you match a part of a string with regex, and then substitute just that part with another string. Naturally, in place of regex, you can just use a specific string.
Example:
"user@gmail.com".gsub(/@gmail/, '@googlemail')
returns
"user@googlemail.com"
In my example I used @gmail and @googlemail instead of just gmail and googlemail. The reason for this is to make sure it's not an account with gmail in the name. It's unlikely, but could happen.
Don't match the .com either, as that can change depending on where the user's email is.
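To illustrate why the @ matters, here is a short sketch with a made-up address whose local part also contains gmail. It assumes the usual @ separator between user and domain.

```ruby
address = 'gmailfan@gmail.com' # hypothetical address

# Without the "@", gsub also rewrites the user name.
loose  = address.gsub(/gmail/, 'googlemail')

# Including "@" and the following dot limits the change to the domain.
strict = address.sub(/@gmail\./, '@googlemail.')

puts loose   # googlemailfan@googlemail.com
puts strict  # gmailfan@googlemail.com
```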
Assuming googlemail.com and gmail.com are the only two possibilities, you can use sub to replace a pattern with given replacement:
def swap_string(str)
  # escape the dot so it only matches a literal "."
  if str =~ /gmail\.com$/
    str.sub("gmail.com", "googlemail.com")
  else
    str.sub("googlemail.com", "gmail.com")
  end
end
swap_string 'user@gmail.com'
# => "user@googlemail.com"
swap_string 'user@googlemail.com'
# => "user@gmail.com"
You can try Ruby's gsub:
e.g.:
"user@gmail.com".gsub("gmail.com", "googlemail.com")
Since you need to pass a string parameter to a function, this should do:
def swap_mails(str)
  if str =~ /gmail\.com$/
    str.sub('gmail.com', 'googlemail.com')
  else
    str.sub('googlemail.com', 'gmail.com')
  end
end
swap_mails "vgmail@gmail.com" # => "vgmail@googlemail.com"
swap_mails "vgmail@googlemail.com" # => "vgmail@gmail.com"
My addition :
def swap_domain str
  str[/.+@/] + [ 'gmail.com', 'googlemail.com' ].detect do |d|
    d != str.split('@')[1]
  end
end
swap_domain 'user@gmail.com'
#=> user@googlemail.com
swap_domain 'user@googlemail.com'
#=> user@gmail.com
And this is bad code, imo.
String has a neat trick up its sleeve in the form of String#[]=:
def swap_string(string, lookups = {})
  string.tap do |s|
    lookups.each { |find, replace| s[find] = replace and break if s[find] }
  end
end
# Example Usage
lookups = {"googlemail.com"=>"gmail.com", "gmail.com"=>"googlemail.com"}
swap_string("user@gmail.com", lookups) # => "user@googlemail.com"
swap_string("user@googlemail.com", lookups) # => "user@gmail.com"
Allowing lookups to be passed to your method makes it more reusable but you could just as easily have that hash inside of the method itself.
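One thing to be aware of: assigning through s[find] = replace modifies the string that was passed in, which fails on frozen strings. If that is a concern, a non-mutating variant could look roughly like this. SWAPS and swap_domain_copy are hypothetical names, and the sketch assumes the domain sits after an @ at the very end of the address.

```ruby
# Hypothetical lookup table, kept inside the file instead of being passed in.
SWAPS = {
  'gmail.com'      => 'googlemail.com',
  'googlemail.com' => 'gmail.com'
}.freeze

def swap_domain_copy(address)
  # Match the domain that follows "@" at the end of the string and look up
  # its replacement. String#sub returns a new string, so the argument is
  # left untouched; addresses on other domains come back unchanged.
  address.sub(/(?<=@)(?:googlemail|gmail)\.com\z/) { |domain| SWAPS[domain] }
end

puts swap_domain_copy('user@gmail.com')       # user@googlemail.com
puts swap_domain_copy('user@googlemail.com')  # user@gmail.com
puts swap_domain_copy('user@example.com')     # user@example.com
```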

regex for a pattern at end of string

I have a string which looks like:
hello/world/1.9.2-some-text
hello/world/2.0.2-some-text
hello/world/2.11.0
Through regex I want to get the string after last '/' and until end of line i.e. in above examples output should be 1.9.2-some-text, 2.0.2-some-text, 2.11.0
I tried this - ^(.+)\/(.+)$ which returns me an array of which first object is "hello/world" and 2nd object is "1.9.2-some-text"
Is there a way to just get "1.9.2-some-text" as the output?
Try using a negated character class ([^…]) like this:
[^\/]+$
This will match one or more of any character other than / followed by the end of the string.
You can use a negated match here.
# match returns a MatchData object; [0] extracts the matched string
'hello/world/1.9.2-some-text'.match(Regexp.new('[^/]+$'))[0]
# => "1.9.2-some-text"
Meaning any character except: / (1 or more times) followed by the end of the string.
Although, the simplest way would be to split the string.
'hello/world/1.9.2-some-text'.split('/').last
# => "1.9.2-some-text"
OR
'hello/world/1.9.2-some-text'.split('/')[-1]
# => "1.9.2-some-text"
If you do not need to use a regex, the ordinary way of doing such a thing is:
File.basename("hello/world/1.9.2-some-text")
#=> "1.9.2-some-text"
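The three approaches agree on the sample strings, but they appear to differ once a string ends in a slash. A small comparison sketch; the trailing-slash input is an added, hypothetical case.

```ruby
path = 'hello/world/1.9.2-some-text'

by_regex    = path[%r{[^/]+$}]
by_split    = path.split('/').last
by_basename = File.basename(path)

puts [by_regex, by_split, by_basename].inspect
# all three are "1.9.2-some-text"

# Hypothetical edge case: a trailing slash.
trailing = 'hello/world/'

regex_trailing    = trailing[%r{[^/]+$}]      # no non-slash run before the end
split_trailing    = trailing.split('/').last  # split drops the trailing empty field
basename_trailing = File.basename(trailing)   # basename ignores the trailing slash

puts regex_trailing.inspect     # nil
puts split_trailing.inspect     # "world"
puts basename_trailing.inspect  # "world"
```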
This is one way:
s = 'hello/world/1.9.2-some-text
hello/world/2.0.2-some-text
hello/world/2.11.0'
s.lines.map { |l| l[/.*\/(.*)/,1] }
#=> ["1.9.2-some-text", "2.0.2-some-text", "2.11.0"]
You said, "in above examples output should be 1.9.2-some-text, 2.0.2-some-text, 2.11.0". That's neither a string nor an array, so I assumed you wanted an array. If you want a string, tack .join(', ') onto the end.
Regexes are greedy by default, so .*\/ will match all characters up to and including the last / in each line. The second argument, 1, returns the contents of capture group 1, which is (.*).
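To see the greediness at work, compare the same pattern with a lazy leading quantifier. This is only an illustration on one of the sample lines:

```ruby
line = 'hello/world/2.11.0'

greedy_tail = line[/.*\/(.*)/, 1]   # .* consumes up to the LAST slash
lazy_tail   = line[/.*?\/(.*)/, 1]  # .*? stops at the FIRST slash

puts greedy_tail.inspect  # "2.11.0"
puts lazy_tail.inspect    # "world/2.11.0"
```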

Capture arbitrary string before either '/' or end of string

Suppose I have:
foo/fhqwhgads
foo/fhqwhgadshgnsdhjsdbkhsdabkfabkveybvf/bar
And I want to replace everything that follows 'foo/' up until I either reach '/' or, if '/' is never reached, then up to the end of the line. For the first part I can use a lookbehind like this:
(?<=foo\/).+
And that's where I get stuck. I could match to the second '/' like this:
(?<=foo\/).+(?=\/)
That doesn't help for the first case though. Desired output is:
foo/blah
foo/blah/bar
I'm using Ruby.
Try this regex:
/(?<=foo\/)[^\/]+/
Implementing @Endophage's answer:
def fix_post_foo_portion(string)
  portions = string.split("/")
  index_to_replace = portions.index("foo") + 1
  portions[index_to_replace] = "blah"
  portions.join("/")
end
strings = %w{foo/fhqwhgads foo/fhqwhgadshgnsdhjsdbkhsdabkfabkveybvf/bar}
strings.each {|string| puts fix_post_foo_portion(string)}
I'm not a ruby dev but is there some equivalent of php's explode() so you could explode the string, insert a new item at the second array index then implode the parts with / again... Of course you can match on the first array element if you only want to do the switch in certain cases.
['foo/fhqwhgads', 'foo/fhqwhgadshgnsdhjsdbkhsdabkfabkveybvf/bar'].each do |s|
puts s.sub(%r|^(foo/)[^/]+(/.*)?|, '\1blah\2')
end
Output:
foo/blah
foo/blah/bar
I'm too tired to think of a nicer way to do it but I'm sure there is one.
Checking for the end-of-string anchor -- $ -- as well as the / character should do the trick. You'll also need to make the .+ non-greedy by changing it to .+? since the greedy version will always match right up to the end of the string, given the chance.
(?<=foo\/).+?(?=\/|$)
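For the two sample strings, this lazy pattern and the negated-class pattern from the earlier answer should give the same result when passed to String#sub. A quick check, as a sketch:

```ruby
inputs = ['foo/fhqwhgads', 'foo/fhqwhgadshgnsdhjsdbkhsdabkfabkveybvf/bar']

lazy_lookahead = /(?<=foo\/).+?(?=\/|$)/  # lazy match up to "/" or end of line
negated_class  = /(?<=foo\/)[^\/]+/       # run of non-slash characters

with_lazy    = inputs.map { |s| s.sub(lazy_lookahead, 'blah') }
with_negated = inputs.map { |s| s.sub(negated_class, 'blah') }

puts with_lazy.inspect     # ["foo/blah", "foo/blah/bar"]
puts with_negated.inspect  # ["foo/blah", "foo/blah/bar"]
```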

Cut off the filename and extension of a given string

I built a little script that searches a directory for files of a given file type and stores each location (including the filename) in an array. It looks like this:
def getFiles(directory)
arr = Dir[directory + '/**/*.plt']
arr.each do |k|
puts "#{k}"
end
end
The output is the path and the files. But I want only the path.
Instead of /foo/bar.txt I want only the /foo/
My first thought was a regexp but I am not sure how to do that.
Could File.dirname be of any use?
File.dirname(file_name) → dir_name
Returns all components of the filename given in file_name except the last one. The filename must be formed using forward slashes ("/") regardless of the separator used on the local file system.
File.dirname("/home/gumby/work/ruby.rb") #=> "/home/gumby/work"
You don't need a regex or split.
File.dirname("/foo/bar/baz.txt")
# => "/foo/bar"
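Applied to a list of paths like the one Dir[...] returns in the question, this could look as follows. The file list is hard-coded and hypothetical so the sketch runs without touching the file system; the File.join(dir, '') step is only needed if the trailing slash from the question is wanted.

```ruby
# Hypothetical stand-in for the result of Dir[directory + '/**/*.plt'].
files = ['/data/a/one.plt', '/data/a/two.plt', '/data/b/three.plt']

# Strip the filename from each path and drop duplicate directories.
directories = files.map { |f| File.dirname(f) }.uniq

# Joining with an empty component appends the trailing separator.
with_slash = directories.map { |dir| File.join(dir, '') }

puts directories.inspect  # ["/data/a", "/data/b"]
puts with_slash.inspect   # ["/data/a/", "/data/b/"]
```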
The following code should work (tested in the ruby console):
>> path = "/foo/bar/file.txt"
=> "/foo/bar/file.txt"
>> path[0..path.rindex('/')]
=> "/foo/bar/"
rindex finds the index of the last occurrence of a substring. Here is the documentation: http://docs.huihoo.com/api/ruby/core/1.8.4/classes/String.html#M001461
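Note that rindex returns nil when the path contains no slash at all, so it may be worth guarding that case. A minimal sketch; directory_prefix is a hypothetical name, and returning an empty string for a bare filename is an assumption about what the caller wants.

```ruby
# Hypothetical helper: everything up to and including the last "/",
# or an empty string when the path has no directory part.
def directory_prefix(path)
  slash = path.rindex('/')
  slash ? path[0..slash] : ''
end

puts directory_prefix('/foo/bar/file.txt').inspect  # "/foo/bar/"
puts directory_prefix('file.txt').inspect           # ""
```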
Good luck!
I would split it into an array by the slashes, then remove the last element (the filename), then join it into a string again.
path = '/foo/bar.txt'
path = path.split '/'
path.pop
path = path.join '/'
# path is now '/foo'
Not sure what language you're in, but here is a regex for everything from the last / to the end of the string:
/[^\/]*+$/
It translates to: all characters that are not '/' before the end of the string.
For a regular expression, this should work, since * is greedy:
.*/

Resources