Replacing text except in the URL - ruby

Say that we have the following text
example abc http://www.example.com
I know how to replace example with some other text. But when I do that, how can I tell the program NOT to substitute the example inside the URL?

UPDATE
@kiddorails reminded me of a known trick to work around the missing variable-width look-behind, which can be implemented in Ruby as well. However, the regex used by @kiddorails will not replace example before the URL. Also, it is not dynamic.
Here is a function that removes a specific word everywhere except inside a URL, even if the word contains symbols that must be escaped in a regex. Whole-word matching is enforced with \b; it can be removed if you need to match strings with non-word leading and trailing characters:
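def removeOutsideOfURL(word, input)
  # Match the reversed word unless it is followed (in the reversed text)
  # by non-space characters ending in "ptth", i.e. unless it sits inside a URL.
  rx = Regexp.new("(?i)\\b" + Regexp.escape(word.reverse) + "(?!\\S+ptth\\b)")
  # Reverse the input, remove the matches, then reverse the result back.
  return input.reverse.gsub(rx,"").reverse
end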
puts removeOutsideOfURL("example", "example def http://www.example.com with a new example")
Output of a sample program:
def http://www.example.com with a new
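Since the question asks about replacing rather than removing, the same reverse-and-look-ahead trick can be sketched with a replacement string. This is only a minimal sketch, not the answer's own code: the name replace_outside_url and its replacement parameter are hypothetical, and it assumes a second \b is wanted so that both ends of the word are checked.

```ruby
# Hypothetical helper: replace `word` with `replacement` everywhere except
# inside a whitespace-delimited token that starts with "http".
def replace_outside_url(word, replacement, input)
  # Work on the reversed string so the URL check becomes a look-ahead,
  # which (unlike look-behind) may contain variable-width patterns.
  rx = Regexp.new("\\b" + Regexp.escape(word.reverse) + "\\b(?!\\S+ptth\\b)",
                  Regexp::IGNORECASE)
  # A block avoids backslash interpretation in the replacement text.
  input.reverse.gsub(rx) { replacement.reverse }.reverse
end

puts replace_outside_url("example", "sample",
                         "example def http://www.example.com with a new example")
# prints: sample def http://www.example.com with a new sample
```

Because \S+ can absorb the extra "s", an https:// URL is protected in the same way.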
ORIGINAL ANSWER
For this concrete example and context, you can use /(?<!http:\/\/www\.)example/:
puts "example def http://www.example.com".gsub(/(?<!http:\/\/www\.)example/, '')
>> def http://www.example.com
Demo on IDEONE
You can add more look-behinds to set more conditions, e.g. /(?<!http:\/\/www\.)(?<!http:\/\/)example/ to also keep example straight after http://.
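To see what the stacked look-behinds do and do not cover, here is a small sketch that runs the two-look-behind pattern over a few inputs. The constant name and the sample strings are assumptions made for illustration.

```ruby
# Two fixed-width look-behinds, one per URL prefix.
NOT_AFTER_PREFIX = /(?<!http:\/\/www\.)(?<!http:\/\/)example/

samples = [
  "example http://example.com",
  "example http://www.example.com",
  "example http://sub.example.com"   # prefix not covered by either look-behind
]

samples.each { |s| puts s.gsub(NOT_AFTER_PREFIX, '').inspect }
# " http://example.com"
# " http://www.example.com"
# " http://sub..com"
```

The last line shows the limit of this approach: every prefix that can precede the word has to be listed explicitly.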
Or, you can also check for periods on both ends:
(?<!\.)example(?!\.)
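A quick sketch of the period check in use. The constant name and the third sample sentence are assumptions added for illustration.

```ruby
# Skip any "example" that touches a period on either side.
NO_DOTS = /(?<!\.)example(?!\.)/

puts "example def http://www.example.com".gsub(NO_DOTS, '').inspect
# " def http://www.example.com"
puts "example http://example.com".gsub(NO_DOTS, '').inspect
# " http://example.com"
puts "An example. Another example".gsub(NO_DOTS, '').inspect
# "An example. Another "
```

The third case shows the trade-off: a word that ends a sentence is also followed by a period, so it is left alone.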

You can use sub, which replaces only the first occurrence:
"example def http://www.example.com".sub("example","")
Result:
" def http://www.example.com"
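For comparison, a short sketch of sub next to gsub on the same text. The second input is an assumption added to show when sub stops helping.

```ruby
text = "example def http://www.example.com"

puts text.sub("example", "").inspect    # first occurrence only
# " def http://www.example.com"
puts text.gsub("example", "").inspect   # every occurrence
# " def http://www..com"

# sub only helps when the plain word comes before the URL:
puts "http://www.example.com example".sub("example", "").inspect
# "http://www..com example"
```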

Update: Pointers by @stribizhev :)
For this particular use case, I would go with the negative lookbehind regex that @stribizhev used above.
But there is one gotcha with negative lookbehind: it only accepts fixed-length patterns.
So, if the URLs look like http://example.com or http://www.example.com, a single lookbehind can handle either the first case or the second, but not both.
I suggest this approach - reverse the url, use negative lookahead regex and substitute reverse of "example" in your string. Here is the demo below:
regex = /elpmaxe(?!\S+ptth)/
str1 = "example http://example.com"
str2 = "example http://www.example.com"
str3 = "foo example http://wwww.someexampleurl.com"
str4 = "example def http://www.example.com with a new example"
[str1, str2, str3, str4].map do |str|
  str.reverse.gsub(regex, '').reverse
end
#=> [" http://example.com",
#    " http://www.example.com",
#    "foo  http://wwww.someexampleurl.com",
#    " def http://www.example.com with a new "]

Related

Delete all the whitespaces that occur after a word in ruby

I have a string " hello world! How is it going?"
The output I need is " helloworld!Howisitgoing?"
So all the whitespaces after hello should be removed. I am trying to do this in ruby using regex.
I tried strip and delete(' ') methods but I didn't get what I wanted.
some_string = " hello world! How is it going?"
some_string.delete(' ') #deletes all spaces
some_string.strip #removes trailing and leading spaces only
Please help. Thanks in advance!
There are numerous ways this could be accomplished without regular expressions, but using one is probably the "cleanest" looking approach, since it avoids taking sub-strings, etc. The regular expression I believe you are looking for is /(?!^)(\s)/.
" hello world! How is it going?".gsub(/(?!^)(\s)/, '')
#=> " helloworld!Howisitgoing?"
The \s matches any whitespace character (including tabs, etc.), and the ^ is an "anchor" meaning the beginning of a line (which, for a single-line string, is the beginning of the string). The (?!...) construct is a negative lookahead: it rejects a match at any position where the enclosed pattern would match. Used together, they match every whitespace character except one at the very start.
If you are not familiar with gsub, it replaces every match of a pattern (a regular expression or a plain string) in the string. It also has a gsub! counterpart that mutates the string in place instead of creating a new altered copy.
Note that strictly speaking, this isn't all whitespace "after a word" to quote the exact question, but I gathered from your examples that your intentions were "all whitespace except beginning of string", which this will do.
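One detail worth checking on multi-line input: in Ruby, ^ anchors at the start of every line, while \A anchors only at the start of the string. A small sketch of the difference, where the sample text is an assumption:

```ruby
text = " a b\n c d"

# ^ matches after the newline too, so the space that starts line 2 survives.
puts text.gsub(/(?!^)\s/, '').inspect    # " ab cd"

# \A matches only at the very start of the string.
puts text.gsub(/(?!\A)\s/, '').inspect   # " abcd"
```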
def remove_spaces_after_word(str, word)
  i = str.index(/\b#{word}\b/i)
  return str if i.nil?
  i += word.size
  str.gsub(/ /) { Regexp.last_match.begin(0) >= i ? '' : ' ' }
end
remove_spaces_after_word("Hey hello world! How is it going?", "hello")
#=> "Hey helloworld!Howisitgoing?"

Replace all words before the start of the first word (Regex and Ruby)

Here are my test cases.
Expected:
JUNKINFRONThttp://francium.tech should be http://francium.tech
JUNKINFRONThttp://francium.tech/http should be http://francium.tech/http
francium.tech/http should be francium.tech/http (unaffected)
Actual result:
http://francium.tech
francium.tech/http
http
I am trying to write a regex replace for this. I tried this,
text.sub(/.*http/,'http')
However, my second and third test cases fail because it searches till the end. It would help if the answer could also do the case insensitivity.
2.5.0 :001 > url = 'francium.tech/http'
=> "francium.tech/http"
2.5.0 :002 > url.sub(/^.*?(?=http)/i,'')
=> "http"
As per my original comments, you can use the pattern as shown below. If you want a really small performance gain, you can remove one step in the regex by using the second pattern instead. If you're especially concerned with performance, the last one performs even quicker.
^.*?(?=https?://)
^.*?(?=https?:/{2})
^.*?(?=ht{2}ps?:/{2})
See code in use here
strings = [
  "JUNKINFRONThttp://francium.tech",
  "JUNKINFRONThttp://francium.tech/http",
  "francium.tech/http"
]
strings.each { |s| puts s.sub(%r{^.*?(?=https?://)}, '') }
Outputs the following:
http://francium.tech
http://francium.tech/http
francium.tech/http
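The difference between the question's greedy attempt and the lazy pattern can be sketched side by side. The /i flag is added here because the question asks for case insensitivity, and the uppercase sample string is an assumption.

```ruby
url = "JUNKINFRONThttp://francium.tech/http"

# Greedy: .* runs as far as it can, so it stops at the LAST "http".
puts url.sub(/.*http/, 'http')
# http

# Lazy: .*? stops at the FIRST position where the look-ahead sees a scheme.
puts url.sub(%r{^.*?(?=https?://)}i, '')
# http://francium.tech/http

# With /i, an uppercase scheme is found as well.
puts "junkHTTPS://Francium.tech".sub(%r{^.*?(?=https?://)}i, '')
# HTTPS://Francium.tech
```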
I think this may solve your problem.
str1 = 'JUNKINFRONThttp://francium.tech'        # should be http://francium.tech
str2 = 'JUNKINFRONThttp://francium.tech/http'   # should be http://francium.tech/http
str3 = 'francium.tech/http'                     # should be francium.tech/http (unaffected)
str4 = 'JUNKINFRONThttps://francium.tech/http'  # should be https://francium.tech/http
[str1, str2, str3, str4].each do |str|
  puts str.gsub(/^.*(http|https):\/\//i, "\\1://")
end
Result:
http://francium.tech
http://francium.tech/http
francium.tech/http
https://francium.tech/http
When using regex you should make sure to match on something unique, like http:// or, better, http://[SOMETHING].[AT_LEAST_TWO_CHARS][MAYBE_A_SLASH] and so on...
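One possible reading of that stricter shape, as a rough sketch only. The constant and method names are hypothetical, and the character classes are an assumption about what "SOMETHING" and "AT_LEAST_TWO_CHARS" should allow.

```ruby
# Assumed shape: scheme, a host label, a dot, then at least two more characters.
JUNK_BEFORE_URL = %r{\A.*?(?=https?://[^\s/.]+\.[^\s/]{2,})}i

# Hypothetical helper that drops whatever precedes the first such URL.
def strip_leading_junk(text)
  text.sub(JUNK_BEFORE_URL, '')
end

puts strip_leading_junk("JUNKINFRONThttp://francium.tech/http")
# http://francium.tech/http
puts strip_leading_junk("francium.tech/http")
# francium.tech/http
puts strip_leading_junk("JUNKhttp://localhost")
# JUNKhttp://localhost
```

The last case is left untouched because "localhost" has no dot, which may or may not be what you want.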
This works for your given cases:
str = ['JUNKINFRONThttp://francium.tech',
       'JUNKINFRONThttp://francium.tech/http',
       'francium.tech/http']
str.each do |s|
  puts s.sub(/^.*?(https?:\/{2})/, '\1') # with capturing group
  puts s.sub(/^.*?(?=https?:\/{2})/, '') # with positive lookahead
end
With a capturing group, the captured scheme can be reused in the replacement. Another method is to use a positive lookahead, which leaves the scheme out of the match altogether.

Regex: How to replace all characters except a word/sequence of pattern?

I have the following strings:
"ft-2 MY AWESOME ft-12 APP"
"MY AWESOME APP"
"MY AWESOME APP ft-20"
I want to do some modification (titleization in this case) on the words except ft-<NUMBER> parts. ft-<NUMBER> word can appear anywhere. It can appear multiple times or may not be present at all. After string manipulation, the end results should look like this:
"ft-2 My Awesome ft-12 App"
"My Awesome App"
"My Awesome App ft-20"
Is it possible to write any regex in Ruby that can do this transformation?
I tried like this:
"ft-4 MY AWESOME ft-5 APP".gsub(/(?<=ft-\d\s).*/) { |s| s.titleize }
I got this: ft-4 My Awesome Ft 5 App in return.
R = /
    [[:alpha:]]+  # match one or more uppercase or lowercase letters
    (?=\s|\z)     # match a whitespace or end of string (positive lookahead)
    /x            # free-spacing regex definition mode
def doit(str)
  str.gsub(R) { |s| s.capitalize }
end
doit "ft-2 MY AWESOME ft-12 APP"
#=> "ft-2 My Awesome ft-12 App"
doit "MY AWESOME APP"
#=> "My Awesome App"
doit "MY AWESOME APP ft-20"
#=> "My Awesome App ft-20"
Your (?<=ft-\d\s).* pattern matches any location that is preceded with ft-<digits><whitespace>, and then matches any 0+ chars other than line break chars that you titleize.
You need to match whole words that do not start with the ft-<NUMBER> pattern. Then all you need is to capitalize each match:
s.gsub(/\b(?!ft-\d)\p{L}+/) { |m| m.capitalize }
Or, if you prefer to use $1 variable, add a capturing group:
s.gsub(/\b(?!ft-\d)(\p{L}+)/) { $1.capitalize }
See the Ruby demo
Pattern details:
\b - first of all, assert the position before a letter (because the next consuming pattern is \p{L} that matches a letter)
(?!ft-\d) - a negative lookahead that fails the match if the next 2 letters are ft that are followed with a - and a digit
(\p{L}+) - a capturing group matching 1+ letters (that is later referred to with $1 in the replacement block)
The capitalize "returns a copy of str with the first character converted to uppercase and the remainder to lowercase".
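Putting the pattern to work on the three strings from the question, as a minimal sketch. The constant and method names are assumptions.

```ruby
# A run of letters at a word boundary that does not begin an ft-<digit> tag.
WORD_NOT_TAG = /\b(?!ft-\d)\p{L}+/

# Hypothetical wrapper around the gsub call from the answer.
def titleize_except_tags(str)
  str.gsub(WORD_NOT_TAG) { |m| m.capitalize }
end

puts titleize_except_tags("ft-2 MY AWESOME ft-12 APP")  # ft-2 My Awesome ft-12 App
puts titleize_except_tags("MY AWESOME APP")             # My Awesome App
puts titleize_except_tags("MY AWESOME APP ft-20")       # My Awesome App ft-20
```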
I am not 100% sure what you want, but I think you want to downcase and then titleize a string.
I have something similar in one of my projects:
# lib/core_ext/string.rb
class String
  # humanize comes from ActiveSupport (Rails); it is not part of core Ruby.
  def my_titleize
    humanize.gsub(/\b('?[a-z])/) { $1.capitalize }
  end
end
humanize(options = {}): Capitalizes the first word, turns underscores into spaces, and strips a trailing "_id" if present. Like titleize, this is meant for creating pretty output. The capitalization of the first word can be turned off by setting the optional parameter capitalize to false. By default, this parameter is true.

How to split a string which contains multiple forward slashes

I have a string as given below,
./component/unit
and need to split it to get component/unit, which I will use as a key when inserting into a hash.
I tried .split(/.\//).last, but it gives only unit instead of component/unit.
I think, this should help you:
string = './component/unit'
string.split('./')
#=> ["", "component/unit"]
string.split('./').last
#=> "component/unit"
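Your regex was almost fine:
split(/\.\//)
You need to escape . (which otherwise matches any character) as well as / (the regex delimiter). With the unescaped ., the "t/" in "component/unit" also counts as a separator, which is why last returned only unit.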
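The effect of the unescaped dot is easy to see by splitting the same path both ways (a small sketch):

```ruby
path = './component/unit'

# Unescaped: "." matches any character, so "t/" is a separator too.
p path.split(/.\//)    # ["", "componen", "unit"]

# Escaped: only a literal "./" is a separator.
p path.split(/\.\//)   # ["", "component/unit"]
```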
As an alternative, you could just remove the first './' substring :
'./component/unit'.sub('./','')
#=> "component/unit"
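If only a leading './' should ever be removed, String#delete_prefix (Ruby 2.5 and later) is a possible alternative to sub. In this sketch the units hash, the stored value and the 'lib./component' input are assumptions for illustration.

```ruby
key = './component/unit'.delete_prefix('./')
puts key                                    # component/unit

# Using the result as a hash key, as the question intends.
units = {}
units[key] = :registered
p units                                     # {"component/unit" => :registered} (exact format varies by Ruby version)

# sub removes the first './' wherever it is; delete_prefix only at the start.
puts 'lib./component'.sub('./', '')         # libcomponent
puts 'lib./component'.delete_prefix('./')   # lib./component
```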
All the other answers are fine, but I think you are not really dealing with a String here but with a URI or Pathname, so I would advise you to use these classes if you can. If so, please adjust the title, as it is not about do-it-yourself-regexes, but about proper use of the available libraries.
Link to the ruby doc:
https://docs.ruby-lang.org/en/2.1.0/URI.html
and
https://ruby-doc.org/stdlib-2.1.0/libdoc/pathname/rdoc/Pathname.html
An example with Pathname is:
require 'pathname'
pathname = Pathname.new('./component/unit')
puts pathname.cleanpath # prints component/unit
# pathname.cleanpath.to_s # => "component/unit"
Whether this is a good idea (and/or using URI would be cool too) also depends on what your real problem is, i.e. what you want to do with the extracted String. As stated, I doubt a bit that you are really interested in Strings.
Using a positive lookbehind, you could use this regex:
reg = /(?<=\.\/)[\w+\/]+\w+\z/
Demo
str = './component'
str2 = './component/unit'
str3 = './component/unit/ruby'
str4 = './component/unit/ruby/regex'
[str, str2, str3, str4].each { |s| puts s[reg] }
#component
#component/unit
#component/unit/ruby
#component/unit/ruby/regex

How to replace a specific character in a string along with the immediate next character

I have a string of text:
string = "%hello %world ho%w is i%t goin%g"
I want to return the following:
"Hello World hoW is iT goinG"
The % sign is a key that tells me the next character should be capitalized. The closest I have gotten so far is:
@thing = "%this is a %test this is %only a %test"
if @thing.include?('%')
  indicator_position = @thing.index("%")
  lowercase_letter_position = indicator_position + 1
  lowercase_letter = @thing[lowercase_letter_position]
  @thing.gsub!("%#{lowercase_letter}","#{lowercase_letter.upcase}")
end
This returns:
"This is a Test this is %only a Test"
It looks like I need to iterate through the string to make it work as it is only replacing the lowercase 't' but I can't get it to work.
You can do this with gsub and a block:
string.gsub(/%(.)/) do |m|
  m[1].upcase
end
Using a block allows you to run arbitrary code on each match.
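A closely related sketch uses the capture group through $1 instead of indexing into the match. The second pattern, which only touches a % followed by a lowercase letter, is an assumption about the desired behavior and not part of the answer.

```ruby
string = "%hello %world ho%w is i%t goin%g"

# $1 holds the character captured by (.) for the current match.
puts string.gsub(/%(.)/) { $1.upcase }
# Hello World hoW is iT goinG

# Assumed stricter variant: only a lowercase letter after % is affected.
puts "100% done, %ok".gsub(/%(\p{Ll})/) { $1.upcase }
# 100% done, Ok
```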
Inferior to @tadman's answer, but you could write:
string.gsub(/%./, &:upcase).delete('%')
#=> "Hello World hoW is iT goinG"

Resources