I have a string "FooFoo2014".
I want the result to be => "Foo Foo 2014"
Any idea?
This works fine:
puts "FooFoo2014".scan(/(\d+|[A-Z][a-z]+)/).join(' ')
# => Foo Foo 2014
Of course, this only works on the condition that every word starts with a capital letter followed by lowercase letters, and that the numbers stand apart from the words.
"FooFoo2014"
.gsub(/(?<=\d)(?=\D)|(?<=\D)(?=\d)|(?<=[a-z])(?=[A-Z])/, " ")
# => "Foo Foo 2014"
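The three alternatives in that pattern are zero-width boundaries, so gsub inserts a space without consuming any characters. A commented, free-spacing version may make this easier to read. This is only a sketch: the names BOUNDARY and space_out are made up for illustration.

```ruby
# Zero-width positions where a space should be inserted.
BOUNDARY = /
  (?<=\d)(?=\D)          # between a digit and a non-digit
  | (?<=\D)(?=\d)        # between a non-digit and a digit
  | (?<=[a-z])(?=[A-Z])  # between a lowercase and an uppercase letter
/x

# Hypothetical helper: inserts a space at every boundary.
def space_out(str)
  str.gsub(BOUNDARY, ' ')
end

puts space_out('FooFoo2014')  # prints Foo Foo 2014
```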
Your example is a little generic. So this might be guessing in the wrong direction. That being said, it seems like you want to reformat the string a little:
"FooFoo2014".scan(/^([A-Z].*)([A-Z]\D*)(\d+)$/).flatten.join(" ")
# => "Foo Foo 2014"
As "FooFoo2014" is a string with some internal structure important to you, you need to come up with the right regular expression yourself.
From your question, I extract two tasks:
split the FooFoo at the capital letter.
/([A-Z].*)([A-Z].*)/ would do that, given you only have standard latin letters
split the letter from the digits
/(.*\D)(\d+)/ achieves that.
The result of scan is an array in my version of Ruby. Please verify that in your setup.
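The two tasks can also be done as two separate steps, which keeps each regex small. A rough sketch, assuming the input is capitalized ASCII words followed by trailing digits (split_label is a hypothetical name, not something from the question):

```ruby
# Hypothetical helper: splits "FooFoo2014" into "Foo Foo 2014".
def split_label(str)
  m = str.match(/\A(.*\D)(\d+)\z/)     # task 2: letters vs. trailing digits
  return str unless m                  # leave unexpected input untouched

  letters, digits = m.captures
  words = letters.scan(/[A-Z][a-z]*/)  # task 1: split at each capital letter
  (words + [digits]).join(' ')
end

puts split_label('FooFoo2014')  # prints Foo Foo 2014
```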
If you think that regular expressions are too complicated for this, I suggest that you take a good look into ActiveSupport. http://api.rubyonrails.org/v3.2.1/ might help you.
If it's only letters followed by only digits:
target = "FooFoo2014"
match_data = target.match(/([A-Za-z]+)(\d+)/)
p match_data[1] # => "FooFoo"
p match_data[2] # => "2014"
If it is two words each made of one capitalized letter then lowercase letters, then digits:
target = "FooBar2014"
match_data = target.match(/([A-Z][a-z]+)([A-Z][a-z]+)(\d+)/)
p match_data[1] # => "Foo"
p match_data[2] # => "Bar"
p match_data[3] # => "2014"
Better regexes are probably possible.
Here are my test cases.
Expected:
JUNKINFRONThttp://francium.tech should be http://francium.tech
JUNKINFRONThttp://francium.tech/http should be http://francium.tech/http
francium.tech/http should be francium.tech/http (unaffected)
Actual result:
http://francium.tech
francium.tech/http
http
I am trying to write a regex replace for this. I tried this,
text.sub(/.*http/,'http')
However, my second and third test cases fail because the match is greedy and runs to the last "http" in the string. It would help if the answer could also handle case-insensitivity.
2.5.0 :001 > url = 'francium.tech/http'
=> "francium.tech/http"
2.5.0 :002 > url.sub(/^.*?(?=http)/i,'')
=> "http"
As per my original comments, you can use the pattern as shown below. If you want a really small performance gain, you can remove one step in the regex by using the second pattern instead. If you're especially concerned with performance, the last one performs even quicker.
^.*?(?=https?://)
^.*?(?=https?:/{2})
^.*?(?=ht{2}ps?:/{2})
Code in use:
strings = [
"JUNKINFRONThttp://francium.tech",
"JUNKINFRONThttp://francium.tech/http",
"francium.tech/http"
]
strings.each { |s| puts s.sub(%r{^.*?(?=https?://)}, '') }
Outputs the following:
http://francium.tech
http://francium.tech/http
francium.tech/http
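The question also asked for case-insensitivity. One way to get it is the i flag on the same lookahead pattern. This is a sketch only: strip_leading_junk is a hypothetical name, and \A is used here instead of ^ on the assumption that each input is a single line.

```ruby
# Hypothetical helper: drops everything before the first http:// or https://,
# matching the scheme case-insensitively. Strings without a scheme are untouched.
def strip_leading_junk(text)
  text.sub(%r{\A.*?(?=https?://)}i, '')
end

[
  'JUNKINFRONThttp://francium.tech',
  'JUNKINFRONTHTTPS://francium.tech/http',
  'francium.tech/http'
].each { |s| puts strip_leading_junk(s) }
```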
I think this may solve your problem.
str1 = 'JUNKINFRONThttp://francium.tech'# should be http://francium.tech
str2 = 'JUNKINFRONThttp://francium.tech/http'# should be http://francium.tech/http
str3 = 'francium.tech/http' #should be francium.tech/http (unaffected)
str4 = 'JUNKINFRONThttps://francium.tech/http'# should be https://francium.tech/http
[str1, str2, str3, str4].each do |str|
puts str.gsub(/^.*(http|https):\/\//i, "\\1://")
end
Result:
http://francium.tech
http://francium.tech/http
francium.tech/http
https://francium.tech/http
When using a regex, make sure to match on something distinctive like http:// or, better, http://[SOMETHING].[AT_LEAST_TWO_CHARS][MAYBE_A_SLASH] and so on...
This works for your given cases:
strings = ['JUNKINFRONThttp://francium.tech',
           'JUNKINFRONThttp://francium.tech/http',
           'francium.tech/http']
strings.each do |str|
  puts str.sub(/^.*?(https?:\/{2})/, '\1') # with capturing group
  puts str.sub(/^.*?(?=https?:\/{2})/, '') # with positive lookahead
end
With a capturing group, the captured text can be reused in the replacement. The other method is a positive lookahead, which leaves the URL scheme out of the match so nothing has to be put back.
Given a sentence, I want to count all the duplicated words:
It is an exercise from Exercism.io (Word count).
For example, for the input "olly olly in come free":
olly: 2
in: 1
come: 1
free: 1
For example, I have this test:
def test_with_quotations
phrase = Phrase.new("Joe can't tell between 'large' and large.")
counts = {"joe"=>1, "can't"=>1, "tell"=>1, "between"=>1, "large"=>2, "and"=>1}
assert_equal counts, phrase.word_count
end
This is my method:
def word_count
phrase = @phrase.downcase.split(/\W+/)
counts = phrase.group_by{|word| word}.map {|k,v| [k, v.count]}
Hash[*counts.flatten]
end
For the test above I have this failure when I run it in the terminal:
2) Failure:
PhraseTest#test_with_apostrophes [word_count_test.rb:69]:
--- expected
+++ actual
@@ -1 +1 @@
-{"first"=>1, "don't"=>2, "laugh"=>1, "then"=>1, "cry"=>1}
+{"first"=>1, "don"=>2, "t"=>2, "laugh"=>1, "then"=>1, "cry"=>1}
My problem is removing every character except the apostrophe.
The regex in the method almost works:
phrase = @phrase.downcase.split(/\W+/)
but it also removes the apostrophes.
I don't want to keep single quotes around a word: 'Hello' => Hello
but apostrophes inside a word should stay: Don't be cruel => Don't be cruel
Maybe something like:
string.scan(/\b[\w']+\b/i).each_with_object(Hash.new(0)) { |word, counts| counts[word] += 1 }
The regex employs word boundaries (\b).
The scan outputs an array of the found words and for each word in the array they are added to the hash, which has a default value of zero for each item which is then incremented.
Turns out my solution whilst finding all items and ignoring case will still leave the items in the case they were found in originally.
This would now be a decision for Nelly to either accept as is or to perform a downcase on the original string or the array item as it is added to the hash.
I'll leave that decision up to you :)
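If lowercase keys are wanted, one option is to downcase the whole phrase before scanning. A minimal sketch follows. It is written as a standalone method taking the phrase as an argument, which is an assumption; the exercise itself wraps this in a Phrase class.

```ruby
# Counts words case-insensitively, keeping apostrophes inside words
# ("can't") but not quotes wrapped around a word ('large').
def word_count(phrase)
  phrase.downcase.scan(/\b[\w']+\b/).each_with_object(Hash.new(0)) do |word, counts|
    counts[word] += 1
  end
end

p word_count("Joe can't tell between 'large' and large.")
# => {"joe"=>1, "can't"=>1, "tell"=>1, "between"=>1, "large"=>2, "and"=>1}
```

The word boundaries do the quote stripping here: a quote next to a space or punctuation is never at a \b position, so the match cannot start or end on it.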
Given:
irb(main):015:0> phrase
=> "First: don't laugh. Then: don't cry."
Try:
irb(main):011:0> Hash[phrase.downcase.scan(/[a-z']+/)
.group_by{|word| word.downcase}
.map{|word, words|[word, words.size]}
]
=> {"first"=>1, "don't"=>2, "laugh"=>1, "then"=>1, "cry"=>1}
With your update, if you want to remove single quotes, do that first:
irb(main):038:0> p2
=> "Joe can't tell between 'large' and large."
irb(main):039:0> p2.gsub(/(?<!\w)'|'(?!\w)/,'')
=> "Joe can't tell between large and large."
Then use the same method.
But, you say, gsub(/(?<!\w)'|'(?!\w)/,'') will also remove the apostrophe in 'Twas the night before. To which I reply: if /(?<!\w)'|'(?!\w)/ is not sufficient, you will eventually need to build a parser that can distinguish an apostrophe from a single quote.
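To make that trade-off concrete, here is a small sketch of the quote-stripping step on its own (QUOTE and strip_quotes are illustrative names). It keeps apostrophes that have a word character on both sides and drops the rest, including the leading one in 'Twas.

```ruby
# A single quote with no word character before it, or none after it.
QUOTE = /(?<!\w)'|'(?!\w)/

# Hypothetical helper: removes quotes around words, keeps inner apostrophes.
def strip_quotes(text)
  text.gsub(QUOTE, '')
end

puts strip_quotes("Joe can't tell between 'large' and large.")
# prints Joe can't tell between large and large.
puts strip_quotes("'Twas the night before")
# prints Twas the night before (the leading apostrophe is lost)
```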
You can also use word boundaries:
irb(main):041:0> Hash[p2.downcase.scan(/\b[a-z']+\b/)
.group_by{|word| word.downcase}
.map{|word, words|[word, words.size]}
]
=> {"joe"=>1, "can't"=>1, "tell"=>1, "between"=>1, "large"=>2, "and"=>1}
But that does not solve 'Tis the night either.
Another way:
str = "First: don't 'laugh'. Then: 'don't cry'."
reg = /
[a-z] #single letter
[a-z']+ #one or more letters or apostrophe
[a-z] #single letter
'? #optional single apostrophe
/ix #case-insensitive and free-spacing regex
str.scan(reg).group_by(&:itself).transform_values(&:count)
#=> {"First"=>1, "don't"=>2, "laugh'"=>1, "Then"=>1, "cry'"=>1}
I'm writing a Rack app to split hostnames ending with certain suffixes.
For example, the hostname (and port) hello.world.lvh.me:3000 needs to be split into tokens hello.world, .lvh.me and :3000. Additionally, the prefix (hello.world), suffix (.lvh.me) and port (:3000) are all optional.
So far, I have a (Ruby) regex that looks like /(.*)(\.lvh\.me)(\:\d+)?/.
This successfully breaks the hostname into component parts but it falls down when one or more of the optional components is missing, e.g. for hello.world:3000 or lvh.me:3000 or even plain old hello.world.
I've tried adding ? to each group to make them optional (/(.*)?(\.lvh\.me)?(\:\d+)?/) but this invariably ends up with the first group, (.*), capturing the entire string and stopping there.
My gut feeling is that this is something which might be solved using lookaround but I'll admit this is a totally new realm of regex for me.
You can try with this pattern:
\A(?=[^:])(.+?)??((?:\.|\A)lvh\.me)?(:[0-9]+)?\z
The lookahead (?=[^:]) checks that the string starts with a character other than : (in other words, that it is not the port alone). This means that at least hello.world or lvh.me is present.
The first group is optional and non-greedy (??), which means it is matched only when needed.
\A and \z anchor the start and the end of the string (whereas ^ and $ anchor the start and end of a line).
Note that in Ruby the character class \d matches only ASCII digits by default, so it is equivalent to [0-9] here; writing [0-9] simply makes that explicit.
Note too that \A(?=[^:])((?>[^l:\n.]+|\.|\Bl|l(?!vh\.me\b))*)((?:\.|\A)lvh\.me)?(:[0-9]+)?\z may be more performant.
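For reference, this is roughly how the first pattern could be used from Ruby. It is a sketch: HOST_PATTERN and split_host are names invented here, and groups that do not take part in the match come back as nil.

```ruby
HOST_PATTERN = /\A(?=[^:])(.+?)??((?:\.|\A)lvh\.me)?(:[0-9]+)?\z/

# Hypothetical helper: returns [prefix, suffix, port], or nil when the
# string does not match at all (for example a bare port).
def split_host(host)
  m = HOST_PATTERN.match(host)
  m && m.captures
end

p split_host('hello.world.lvh.me:3000')  # => ["hello.world", ".lvh.me", ":3000"]
p split_host('lvh.me:3000')              # => [nil, "lvh.me", ":3000"]
p split_host('hello.world:3000')         # => ["hello.world", nil, ":3000"]
p split_host('hello.world')              # => ["hello.world", nil, nil]
p split_host(':3000')                    # => nil
```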
Try ^(.*?)?(\.?lvh\.me)?(\:\d+)?$
I added:
a ? to the first group making the * non-greedy
^,$ to anchor it to the start and end.
a ? to the \. before lvh because you want to match lvh.me:3000 not .lvh.me:3000
A Tokenizing Answer
Just for fun, I decided to see if there was a relatively simple way to do what you wanted without a complicated regular expression. The only regular expressions I used were for splitting and validation.
This works for me with your provided corpus, and several variations.
str = 'hello.world.lvh.me:3000'
tokens = str.split(/[.:]/)
port = tokens.last =~ /\A\d+\z/ ? ":#{tokens.pop}" : ''
domain = sprintf('.%s.%s', *tokens.pop(2))
prefix = tokens.join('.')
You'll certainly need to check for empty strings in certain cases, but it seems like it might be more straightforward and/or flexible than a pure regex solution. I find it more readable, anyway. If you truly need a single regular expression, though, I'm sure one of the other answers will help you out.
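As one possible shape for those checks, here is a guarded variant of the tokenizing approach. It is a sketch under assumptions: tokenize_host and its suffix parameter are made up for illustration, the suffix is taken to be the two labels lvh and me, and a trailing all-digit token is treated as the port.

```ruby
# Hypothetical helper: returns [prefix, domain, port], using empty strings
# for the parts that are missing.
def tokenize_host(str, suffix = %w[lvh me])
  tokens = str.split(/[.:]/)
  # Assumption: a trailing all-digit token is the port.
  port = tokens.last.to_s.match?(/\A\d+\z/) ? ":#{tokens.pop}" : ''
  # Only peel off the domain when the last two labels really are the suffix.
  domain = tokens.last(2) == suffix ? ".#{tokens.pop(2).join('.')}" : ''
  prefix = tokens.join('.')
  [prefix, domain, port]
end

p tokenize_host('hello.world.lvh.me:3000')  # => ["hello.world", ".lvh.me", ":3000"]
p tokenize_host('hello.world:3000')         # => ["hello.world", "", ":3000"]
p tokenize_host('hello.world')              # => ["hello.world", "", ""]
```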
You could try splitting rather than matching,
irb(main):012:0> "hello.world.lvh.me:3000".split(/\.(?=[^.:]+\.[^:.]+(?::\d+)?$)|:/)
=> ["hello.world", "lvh.me", "3000"]
irb(main):013:0> "hello.world:3000".split(/\.(?=[^.:]+\.[^:.]+(?::\d+)?$)|:/)
=> ["hello.world", "3000"]
irb(main):014:0> "lvh.me:3000".split(/\.(?=[^.:]+\.[^:.]+(?::\d+)?$)|:/)
=> ["lvh.me", "3000"]
irb(main):015:0> "hello.world".split(/\.(?=[^.:]+\.[^:.]+(?::\d+)?$)|:/)
=> ["hello.world"]
irb(main):016:0> "hello.world.lvh.me".split(/\.(?=[^.:]+\.[^:.]+(?::\d+)?$)|:/)
=> ["hello.world", "lvh.me"]
Look, ma, no regex!
def split_up(str)
str.sub(':','.:')
.split('.')
.each_slice(2)
.map { |arr| arr.join('.') }
end
split_up("hello.world.lvh.me:3000") #=> ["hello.world", "lvh.me", ":3000"]
split_up("hello.world:3000") #=> ["hello.world", ":3000"]
split_up("hello.world.lvh.me") #=> ["hello.world", "lvh.me"]
split_up("hello.world") #=> ["hello.world"]
split_up("") #=> []
Steps:
str1 = "hello.world.lvh.me:3000" #=> "hello.world.lvh.me:3000"
str2 = str1.sub(':','.:') #=> "hello.world.lvh.me.:3000"
arr = str2.split('.') #=> ["hello", "world", "lvh", "me", ":3000"]
enum = arr.each_slice(2) #=> #<Enumerator: ["hello", "world", "lvh",
# "me", ":3000"]:each_slice(2)>
enum.to_a #=> [["hello", "world"], ["lvh", "me"],
# [":3000"]]
enum.map { |arr| arr.join('.') } #=> ["hello.world", "lvh.me", ":3000"]
I'd like to split the following string on letters:
1234B
There are always only ever 4 digits and one letter. I just want to split those out.
Here is my attempt. I think I have the right method and the regex matches the number, but I don't think my syntax or my regex fits the problem I'm attempting to solve.
"1234A".split(/^\d{4}/)
What you want is not clear, but a general solution to this kind of situation is:
"1234A".scan(/\d+|\D+/)
# => ["1234", "A"]
If there are always 4 digits and 1 letter, there's no need to use regular expressions to split the string. Just do this:
str = "1234A"
digits, letter = str[0..3], str[4]
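If the input might not always have that exact shape, a validating match is a reasonable hedge. The sketch below uses a hypothetical helper name, split_code, and assumes the letter is an ASCII letter. The comment on the last line shows why the original split attempt did not help: split discards what the pattern matched, which here is the digits.

```ruby
# Hypothetical helper: returns [digits, letter] for input like "1234A",
# or nil when the string is not exactly four digits plus one letter.
def split_code(str)
  m = /\A(?<digits>\d{4})(?<letter>[A-Za-z])\z/.match(str)
  m && [m[:digits], m[:letter]]
end

p split_code('1234A')          # => ["1234", "A"]
p split_code('12A')            # => nil
p '1234A'.split(/^\d{4}/)      # => ["", "A"]
```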
Looking at it purely from the perspective of splitting any string into groups of 4:
"1234A".scan(/.{1,4}/)
# => ["1234", "A"]
Another no-regex version:
str = "1234A"
str.chars.to_a.last # => "A"
str.chop # => "1234"
How can I get the content in between "{ }" in Ruby? For example,
I love {you}
How can I fetch the element "you"? If I want to replace the content, say change "you" to "her", how should I do that? Probably using gsub?
replacements = {
'you' => 'her',
'angels' => 'demons',
'ice cream' => 'puppies',
}
my_string = "I love {you}.\nYour voice is like {angels} singing.\nI would love to eat {ice cream} with you sometime!"
replacements.each do |source, replacement|
my_string.gsub! "{#{source}}", replacement
end
puts my_string
# => I love her.
# => Your voice is like demons singing.
# => I would love to eat puppies with you sometime!
The simple way to get the content from the inside of the {...} is:
str = 'I love {you}'
str[/{(.+)}/, 1] # => "you"
That basically says, "grab everything inside a leading { to a trailing }". It's not very sophisticated and can be fooled by nested {} pairs.
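The same greediness also shows up when a string holds more than one pair of braces. A short sketch of the difference, using a made-up example string:

```ruby
str = 'I love {you} and {angels}'

# Greedy .+ runs from the first { to the last }.
p str[/{(.+)}/, 1]                  # => "you} and {angels"

# Excluding braces from the captured text keeps each match inside one pair,
# and scan collects every placeholder.
p str.scan(/\{([^{}]+)\}/).flatten  # => ["you", "angels"]
```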
Replacing the target string can be done various ways:
replace_str = 'her'
'I love {you}'.sub('you', replace_str) # => "I love {her}"
A simple sub will replace the first occurrence of the target string with the replacement text.
You could use a regex instead of the string:
'I love you {you}'.sub(/you/, replace_str) # => "I love her {you}"
If there are multiple occurrences of the target string then use a bit more text to locate it. This uses the wrapping delimiters to locate it, and then replaces them also. There are other ways to do this, but I'd do it like:
'I love you {you}'.sub(/{.+}/, "{#{ replace_str }}") # => "I love you {her}"
Alex Wayne's answer came close but didn't go all the way: Ruby's gsub has a really nice feature, where you can pass it a regex and a hash, and it will replace all the occurrences of the regex matches with the values in the hash:
hash = {
'I' => 'She',
'love' => 'loves',
'you' => 'me'
}
str.gsub(Regexp.union(hash.keys), hash) # => "She loves {me}"
That's really powerful when you want to take a template and quickly replace all the placeholders in it.
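For templates where only the text inside {...} should be replaced, the block form of gsub is one way to go. This is a sketch: fill_template is a hypothetical name, and leaving unknown placeholders untouched is a design choice made here, not something the question specifies.

```ruby
# Hypothetical helper: replaces each {key} with values[key]; placeholders
# without a matching key are left as they are.
def fill_template(template, values)
  template.gsub(/\{([^{}]+)\}/) do
    key = Regexp.last_match(1)
    values.fetch(key) { "{#{key}}" }
  end
end

puts fill_template('I love {you}. You sing like {angels}.',
                   'you' => 'her', 'angels' => 'demons')
# prints I love her. You sing like demons.
```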
You can always use .index:
a = 'I love {bill gates}'
a[a.index('{')+1..a.index('}')-1]
The last line just says get 'a' from right after the first occurrence of '{' and right before the first occurrence of '}'. It is important to note, however, that this will only get the text between the first occurrences of {}. So it will work for your above example.
I would use indexing also to add something new between the {}s.
That would look something like:
a[0..a.index('{')] + 'Steve Jobs' + a[a.index('}')..-1]
Again this only works for the first occurrence of '{' and '}'.
Why not use a template engine such as Mustache: https://github.com/defunkt/mustache
Note that Ruby itself can do this with %{} format references:
"foo = %{foo}" % { :foo => 'bar' }
#=> "foo = bar"
And finally, do not forget to check the existing Ruby template engines. Do not reinvent the wheel!
Regular expressions are the way to go with gsub. Something like:
existingString.gsub(/\{(.*?)\}/) { "her" }