Ruby - remove pattern from string - ruby

I have a string pattern that, as an example, looks like this:
WBA - Skinny Joe vs. Hefty Hal
I want to truncate the pattern "WBA - " from the string and return just "Skinny Joe vs. Hefty Hal".

Assuming that the "WBA" spot will be a sequence of any letter or number, followed by a space, dash, and space:
str = "WBA - Skinny Joe vs. Hefty Hal"
str.sub(/^\w+\s-\s/, '')
# => "Skinny Joe vs. Hefty Hal"
By the way, RegexPal is a great tool for testing regular expressions like these.
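One caveat worth knowing: in Ruby, ^ matches at the start of every line, not only at the start of the string, while \A anchors at the very beginning. A minimal sketch of the difference (the helper name strip_prefix and the multi-line sample are made up for illustration):

```ruby
# Hypothetical helper: removes a leading "<word> - " prefix, if present.
def strip_prefix(str)
  # \A anchors at the very start of the string; ^ would also match after a newline.
  str.sub(/\A\w+\s-\s/, '')
end

puts strip_prefix("WBA - Skinny Joe vs. Hefty Hal")

multi = "Main event\nWBA - Skinny Joe vs. Hefty Hal"
caret_result    = multi.sub(/^\w+\s-\s/, '')  # ^ also matches on the second line
anchored_result = strip_prefix(multi)         # \A finds nothing, string is unchanged
```

For single-line input both anchors behave the same, so this only matters if the string can contain newlines.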

If you need a more complex string replacement, you can look into writing a more sophisticated regular expression. Otherwise:
Keep it simple! If you only need to remove "WBA - " from the beginning of the string, use String#sub.
s = "WBA - Skinny Joe vs. Hefty Hal"
puts s.sub(/^WBA - /, '')
# => Skinny Joe vs. Hefty Hal

You can also remove the first occurrence of a pattern with the following snippet:
s[/^WBA - /] = ''
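Note that this form modifies the string in place and raises an IndexError when the pattern does not match. A small sketch of that behaviour, with String#slice! as a variant that returns nil instead (the second sample string is invented):

```ruby
s = "WBA - Skinny Joe vs. Hefty Hal".dup  # dup so this also works with frozen string literals
s[/^WBA - /] = ''                         # modifies s in place

other = "IBF - Big Bob vs. Tiny Tim".dup  # made-up string without the "WBA - " prefix
raised = false
begin
  other[/^WBA - /] = ''                   # no match: raises IndexError
rescue IndexError
  raised = true
end

removed = other.slice!(/^WBA - /)         # no match: returns nil and leaves the string alone
puts s
```

If the input may lack the prefix, String#sub or String#slice! is the safer choice.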

Related

How to write a regex to match .com or .org with a "-" in the domain name

How do I write a regex in ruby that will look for a "-" and ".org" or "com" like:
some-thing.org
some-thing.org.sg
some-thing.com
some-thing.com.sg
some-thing.com.* (there are too many countries, so for now any suffix is fine; I will deal with this problem later)
but not:
some-thing
some-thing.moc
I wrote: /.-.(org)?|.*(.com)/i
but it fails to stop "some-thing" or "some-thing.moc" :(
Support optional hyphen
I came up with this regex:
(https?:\/\/)?(www\.)?[a-z0-9-]+\.(com|org)(\.[a-z]{2,3})?
Keep in mind that I used capturing groups for simplicity, but if you want to avoid capturing the content you can use non capturing groups like this:
(?:https?:\/\/)?(?:www\.)?[a-z0-9-]+\.(?:com|org)(?:\.[a-z]{2,3})?
^--- Notice "?:" to use non capturing groups
Additionally, if you don't want to use protocol and www pattern you can use:
[a-z0-9-]+\.(?:com|org)(?:\.[a-z]{2,3})?
Support mandatory hyphen
However, as Greg Hewgill pointed out in his comment, if you want to ensure there is at least one hyphen, you can use this regex:
(?:https?:\/\/)?(?:www\.)?[a-z0-9]+(?:[-][a-z0-9]+)+\.(?:com|org)(?:\.[a-z]{2,3})?
Be aware, though, that this regex can run into heavy backtracking on some inputs.
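To check the mandatory-hyphen pattern against the names from the question, here is a rough sketch. It drops the protocol/www part and adds \A and \z so the whole string has to match; both choices are assumptions about the intended use, not part of the regex above.

```ruby
# Mandatory-hyphen pattern, anchored so the entire string must be a domain name.
hyphenated_domain = /\A[a-z0-9]+(?:-[a-z0-9]+)+\.(?:com|org)(?:\.[a-z]{2,3})?\z/i

candidates = %w[
  some-thing.org some-thing.org.sg some-thing.com some-thing.com.sg
  some-thing some-thing.moc
]

# Split the candidates into names the pattern accepts and names it rejects.
accepted, rejected = candidates.partition { |name| hyphenated_domain.match?(name) }
puts "accepted: #{accepted.join(', ')}"
puts "rejected: #{rejected.join(', ')}"
```

Without the anchors the pattern would also find a match inside a longer string, which may or may not be what you want.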
This may help:
/[a-z0-9]+-?[a-z0-9]+\.(org|com)(\.[a-z]+)?/i
It matches '-' in the middle optionally, i.e. still matches names without '-'.
I had a similar issue when I was writing an HTTP server...
... I ended up using the following Regexp:
m = url.match(/(([a-z0-9A-Z]+):\/\/)?(([^\/\:]+))?(:([0-9]+))?([^\?\#]*)(\?([^\#]*))?/)
m[2] # => requested protocol (optional), e.g. https, http, ws, ftp (m[1] also includes the "://")
m[4] # => host name (optional), e.g. www.my-site.com
m[6] # => port (optional)
m[7] # => encoded path, e.g. /index.htm
If what you are trying to do is validate a host name, you can simply make sure it doesn't contain the few illegal characters (:, /) and contains at least one dot separated string.
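A loose version of that host name check could look like the sketch below. The method name plausible_host? is hypothetical, and this is only a plausibility test, not real host name validation.

```ruby
# Hypothetical helper: rejects ":" and "/" and requires at least two
# non-empty, dot-separated parts.
def plausible_host?(host)
  return false if host.match?(%r{[:/]})  # the illegal characters named above
  labels = host.split('.', -1)           # -1 keeps empty trailing parts, e.g. for "a."
  labels.length >= 2 && labels.none?(&:empty?)
end

puts plausible_host?("www.my-site.com")
puts plausible_host?("localhost")
puts plausible_host?("my-site.com/index.htm")
```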
If you want to validate only .com or .org (+ country codes), you can do something like this:
def is_legit_url?(url)
  allowed_master_domains = %w{com org}
  allowed_country_domains = %w{sg it uk}
  # The country codes get their own group so that the leading dot applies to each of them.
  url.match(/[^\/\:]+\.(#{allowed_master_domains.join '|'})(\.(#{allowed_country_domains.join '|'}))?/i) && true
end
* Notice that certain countries use .co, e.g. the UK uses www.amazon.co.uk
I would convert the Regexp to a constant, for performance reasons:
module MyURLReview
  ALLOWED_MASTER_DOMAINS = %w{com org}
  ALLOWED_COUNTRY_DOMAINS = %w{sg it uk}
  # The country codes get their own group so that the leading dot applies to each of them.
  DOMAINS_REGEXP = /[^\/\:]+\.(#{ALLOWED_MASTER_DOMAINS.join '|'})(\.(#{ALLOWED_COUNTRY_DOMAINS.join '|'}))?/i
  def self.is_legit_url?(url)
    url.match(DOMAINS_REGEXP) && true
  end
end
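When a joined list of alternatives is interpolated into a regexp, keep in mind that | has the lowest precedence, so the list should sit inside a group of its own. A short sketch of the difference, with Regexp.union as an alternative that is embedded as its own group automatically (the variable names are just for illustration):

```ruby
codes = %w[sg it uk]

ungrouped = /\A\.#{codes.join('|')}\z/      # same as /\A\.sg|it|uk\z/
grouped   = /\A\.(?:#{codes.join('|')})\z/  # dot and anchors apply to every code
union     = /\A\.#{Regexp.union(codes)}\z/  # an interpolated Regexp is embedded as a group

puts ungrouped.match?("edit")  # true: the bare "it" branch matches anywhere
puts grouped.match?("edit")    # false
puts grouped.match?(".uk")     # true
puts union.match?(".it")       # true
```

Regexp.union also escapes any special characters in the strings, which helps if the list ever contains a dot.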
Good Luck!
/[a-zA-Z0-9]-[a-zA-Z0-9]+?\.(?:org|com)\.?/
Of course, the above could be simplified depending on how lenient your rules are. The following is a simpler pattern, but would allow s0me-th1ng.com-plete to pass through:
/\w-\w+?\.(?:org|com)\b/
You could use a lookahead:
^(?=[^.]+-[^.]+)([^.]+\.(?:org|com).*)
Assuming you are looking for the general pattern of letters-letters where letters could be Unicode, you can do:
^(?=\p{L}+-\p{L}+)([^.]+\.(?:org|com).*)
If you want to add digits:
^(?=[\p{L}0-9]+-[\p{L}0-9]+)([^.]+\.(?:org|com).*)
So that you can match sòme1-thing.com
(Ruby 2.0+ for \p{L} I think...)
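As a quick check of the lookahead approach, the sketch below runs both patterns over a few names. The extra sample some-thing.community is invented to show that the trailing .* accepts anything after .com, in line with the "any suffix is fine" requirement in the question.

```ruby
hyphen_first   = /^(?=[^.]+-[^.]+)([^.]+\.(?:org|com).*)/
unicode_hyphen = /^(?=[\p{L}0-9]+-[\p{L}0-9]+)([^.]+\.(?:org|com).*)/

names = %w[
  some-thing.org some-thing.com.sg something.com
  some-thing some-thing.moc some-thing.community
]
results = names.map { |name| [name, hyphen_first.match?(name)] }
results.each { |name, ok| puts "#{name}: #{ok}" }

accented = "s\u00F2me1-thing.com"  # "sòme1-thing.com" written with an escape
puts unicode_hyphen.match?(accented)
```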

Drop everything from a string before a specific word?

How can I remove everything in a string before a specific word (or including the first space and back)?
I have a string like this:
12345 Delivered to: Joe Schmoe
I only want Delivered to: Joe Schmoe
So, basically anything from the first space and back I don't want.
I'm running Ruby 1.9.3.
Use a regex to select just the part of the string you want.
"12345 Delivered to: Joe Schmoe"[/Delive.*/]
# => "Delivered to: Joe Schmoe"
Quite a few different ways are possible. Here are a couple:
s = '12345 Delivered to: Joe Schmoe'
s.split(' ')[1..-1].join(' ') # split on spaces, drop the first part, rejoin with spaces
# or
s[s.index(' ')+1..-1] # Find the index of the first space and just take the rest
# or
/.*?\s(.*)/.match(s)[1] # Use a reg ex to pick out the bits after the first space
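A few more variants, sketched mainly to show how they behave when the string contains no space at all (the no_space sample is an assumption added for illustration):

```ruby
s = '12345 Delivered to: Joe Schmoe'

by_partition = s.partition(' ').last   # text after the first space
by_split     = s.split(' ', 2).last    # split into at most two pieces
by_sub       = s.sub(/\A\S+\s+/, '')   # strip the leading non-space run plus whitespace

no_space = '12345'
empty_tail = no_space.partition(' ').last  # "" because there is nothing after a space
whole      = no_space.split(' ', 2).last   # "12345" because the only piece is returned
missing    = no_space.index(' ')           # nil, so index(' ') + 1 would raise NoMethodError

puts by_partition
```

Which behaviour is right for the no-space case depends on what the caller expects, so it is worth deciding explicitly.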
If Delivered isn't always the second word, you can take everything after the first space instead:
s_line = "12345 Delivered to: Joe Schmoe"
puts s_line[/\s.*/].strip #=> "Delivered to: Joe Schmoe"

What does `(?:| ...)` mean in a Ruby regular expression?

While reading Engineering long-lasting software: an Agile approach using SaaS and cloud computing I came across the following regex (Chapter 5, Section 5.3 Introducing Cucumber and Capybara):
/^(?:|I )am on (.+)$/
I know about the non-capturing (?: ...) syntax, but what I don’t understand is the meaning of the first pipe character after the colon. Is it a typo? Does it serve any particular purpose?
The pipe in regex means alternative. In this case, it is expressing alternation between an empty string "" and the string "I ".
It is just an "or": the group can match either nothing or "I " (with a trailing space). The rest is a non-capturing group, as you mention.
The regex matches something like "I am on a diet" and also "am on a diet"; in both examples it captures "a diet" in the first group.
Try it out on Rubular - http://rubular.com/r/q3RFEoxj1e
(?:|something)
("nothing / empty string or the match")
For matching purposes, this is the same as:
(?:something)?
("the match, once or none")
In other words: the non-capturing subpattern is optional.
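A quick way to convince yourself is to run both spellings against the same inputs. In this sketch the third sample step is made up to show a non-match:

```ruby
with_pipe     = /^(?:|I )am on (.+)$/
with_optional = /^(?:I )?am on (.+)$/

steps = ["I am on a diet", "am on a diet", "You am on a diet"]
captures = steps.map do |step|
  a = with_pipe.match(step)
  b = with_optional.match(step)
  [a && a[1], b && b[1]]  # first capture group of each pattern, or nil without a match
end

steps.zip(captures).each { |step, pair| puts "#{step.inspect}: #{pair.inspect}" }
```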

How do I split names with a regular expression?

I am new to Ruby and regular expressions and am trying to figure out how to approach separating the string of baseball players below into first/last name combinations.
This is a sample string:
"JohnnyCuetoJ.J.PutzBrianMcCann"
This is the desired output:
Johnny Cueto
J.J. Putz
Brian McCann
I have figured out how to separate by capital letters which gets me close, but the outlier names like J.J. and McCann mess that pattern up. Anyone have ideas on the best way to approach this?
If you don't have to do it in one single gsub, then it gets a bit easier.
string = "JohnnyCuetoJ.J.PutzBrianMcCann"
string.gsub!(/([A-Z][^A-Z]+)/, '\1 ') # separate by capital letters
string.gsub!(/(\.) ([A-Z]\.)/, '\1\2') # paste together "J. J." -> "J.J."
string.gsub!(/Mc /, 'Mc') # Remove the space in "Mc "
string.strip! # Remove the extra space after "Cann " (in place, like the gsub! calls)
...and of course you can put this on a single line by chaining the gsub calls, but that will basically kill the readability of the code (but on the other hand, how readable is a block of regexen anyway?)
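For reference, a chained version that also groups the result into first/last pairs might look like this. The method name split_player_names is hypothetical, and the sketch assumes every player has exactly two name parts and that "Mc" is the only prefix that needs rejoining.

```ruby
# Hypothetical helper built from the three substitutions above.
def split_player_names(run_together)
  spaced = run_together
    .gsub(/([A-Z][^A-Z]+)/, '\1 ')    # add a space after each capitalized chunk
    .gsub(/(\.) ([A-Z]\.)/, '\1\2')   # paste "J. J." back together as "J.J."
    .gsub(/Mc /, 'Mc')                # rejoin "Mc Cann" as "McCann"
    .strip
  # Pair up consecutive words as first name and last name.
  spaced.split(' ').each_slice(2).map { |first, last| "#{first} #{last}" }
end

names = split_player_names("JohnnyCuetoJ.J.PutzBrianMcCann")
puts names
```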

how to convert strings like "this is an example" to "this-is-an-example" under ruby

How do I convert strings like "this is an example" to "this-is-an-example" under ruby?
The simplest version:
"this is an example".tr(" ", "-")
#=> "this-is-an-example"
You could also do something like this, which is slightly more robust and easier to extend by updating the regular expression:
"this is an example".gsub(/\s+/, "-")
#=> "this-is-an-example"
The above will replace every chunk of white space (any combination of spaces, tabs, and newlines) with a single dash.
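The difference between the two versions shows up as soon as the input has repeated spaces or tabs. A small sketch with a made-up input:

```ruby
text = "this  is\tan example"  # two spaces after "this", a tab after "is"

with_tr   = text.tr(" ", "-")      # replaces each space one-for-one, ignores the tab
with_gsub = text.gsub(/\s+/, "-")  # collapses every whitespace run into one dash

puts with_tr.inspect
puts with_gsub.inspect
```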
See the String class reference for more details about the methods that can be used to manipulate strings in Ruby.
If you are trying to generate a string that can be used in a URL, you should also consider stripping other non-alphanumeric characters (especially the ones that have special meaning in URLs), or replacing them with an alphanumeric equivalent (example, as suggested by Rob Cameron in his answer).
If you are trying to make something that is a good URL slug, there are lots of ways to do it.
Generally, you want to remove everything that is not a letter or number, and then replace all whitespace characters with dashes.
So:
s = "this is an 'example'"
s = s.gsub(/\W+/, ' ').strip
s = s.gsub(/\s+/,'-')
At the end s will equal "this-is-an-example"
I used the source code from a ruby testing library called contest to get this particular way to do it.
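Wrapped into a method, the same two steps could look like the sketch below. The name slugify and the added downcase are assumptions, not part of the snippet above. Note that \W leaves underscores in place.

```ruby
# Hypothetical helper combining the two gsub steps, plus downcase.
def slugify(text)
  text.downcase
      .gsub(/\W+/, ' ')   # turn every run of non-word characters into one space
      .strip              # drop leading and trailing spaces
      .gsub(/\s+/, '-')   # join the remaining words with dashes
end

puts slugify("this is an 'example'")
puts slugify("  Hello,   World!  ")
puts slugify("snake_case name")
```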
If you're using Rails, take a look at parameterize(); it does exactly what you're looking for:
http://api.rubyonrails.org/classes/ActiveSupport/CoreExtensions/String/Inflections.html#M001367
foo = "Hello, world!"
foo.parameterize # => "hello-world"
