I'm a little stumped on this one and I'm also not still on 1.8 so I don't have lookahead.
I have a bunch of strings which can look like:
"a/b/c/d/e/f 1/2/3"
which I want to turn into:
"a/b/c/d/e" "f" "1/2" "3"
So basically I want it to split by the last slash before the beginning of whitespace. I feel like I can do this normally but split always seems to do weird things.
1.8 lacks lookbehind, not lookahead! All you need is this:
str.split(/\/(?=[^\/]+(?: |$))| /)
This split pattern matches a) any slash that is followed by non-slash characters up to the next space or the end of the string, and b) any space.
def foo s
return [$1,$2] if s =~ /(.+)\/(\S)/
end
str = "a/b/c/d/e/f 1/2/3"
a = str.split /\s+/
a.collect { |e| foo e }.flatten
=> ["/a/b/c/d/e", "f", "1/2", "3"]
I broke down the split and collect. You could, of course, shorten this as needed.
Related
Based on "How to Delete Strings that Start with Certain Characters in Ruby", I know that the way to remove a string that starts with the character "#" is:
email = email.gsub( /(?:\s|^)#.*/ , "") #removes strings that start with "#"
I want to also remove strings that end in ".". Inspired by "Difference between \A \z and ^ $ in Ruby regular expressions" I came up with:
email = email.gsub( /(?:\s|$).*\./ , "")
Basically I used gsub to remove the dollar sign for the carrot and reversed the order of the part after the closing parentheses (making sure to escape the period). However, it is not doing the trick.
An example I'd like to match and remove is:
"a8&23q2aas."
You were so close.
email = email.gsub( /.*\.\s*$/ , "")
The difference lies in the fact that you didn't consider the relationship between string of reference and the regex tokens that describe the condition you wish to trigger. Here, you are trying to find a period (\.) which is followed only by whitespace (\s) or the end of the line ($). I would read the regex above as "Any characters of any length followed by a period, followed by any amount of whitespace, followed by the end of the line."
As commenters pointed out, though, there's a simpler way: String#end_with?.
I'd use:
words = %w[#a day in the life.]
# => ["#a", "day", "in", "the", "life."]
words.reject { |w| w.start_with?('#') || w.end_with?('.') }
# => ["day", "in", "the"]
Using a regex is overkill for this if you're only concerned with the starting or ending character, and, in fact, regular expressions will slow your code in comparison with using the built-in methods.
I would really like to stick to using gsub....
gsub is the wrong way to remove an element from an array. It could be used to turn the string into an empty string, but that won't remove that element from the array.
def replace_suffix(str,suffix)
str.end_with?(suffix)? str[0, str.length - suffix.length] : str
end
I want to convert all the words(alphabetic) in the string to their abbreviations like i18n does. In other words I want to change "extraordinary" into "e11y" because there are 11 characters between the first and the last letter in "extraordinary". It works with a single word in the string. But how can I do the same for a multi-word string? And of course if a word is <= 4 there is no point to make an abbreviation from it.
class Abbreviator
def self.abbreviate(x)
x.gsub(/\w+/, "#{x[0]}#{(x.length-2)}#{x[-1]}")
end
end
Test.assert_equals( Abbreviator.abbreviate("banana"), "b4a", Abbreviator.abbreviate("banana") )
Test.assert_equals( Abbreviator.abbreviate("double-barrel"), "d4e-b4l", Abbreviator.abbreviate("double-barrel") )
Test.assert_equals( Abbreviator.abbreviate("You, and I, should speak."), "You, and I, s4d s3k.", Abbreviator.abbreviate("You, and I, should speak.") )
Your mistake is that your second parameter is a substitution string operating on x (the original entire string) as a whole.
Instead of using the form of gsub where the second parameter is a substitution string, use the form of gsub where the second parameter is a block (listed, for example, third on this page). Now you are receiving each substring into your block and can operate on that substring individually.
def short_form(str)
str.gsub(/[[:alpha:]]{4,}/) { |s| "%s%d%s" % [s[0], s.size-2, s[-1]] }
end
The regex reads, "match four or more alphabetic characters".
short_form "abc" # => "abc"
short_form "a-b-c" #=> "a-b-c"
short_form "cats" #=> "c2s"
short_form "two-ponies-c" #=> "two-p4s-c"
short_form "Humpty-Dumpty, who sat on a wall, fell over"
#=> "H4y-D4y, who sat on a w2l, f2l o2r"
I would recommend something along the lines of this:
class Abbreviator
def self.abbreviate(x)
x.gsub(/\w+/) do |word|
# Skip the word unless it's long enough
next word unless word.length > 4
# Do the same I18n conversion you do before
"#{word[0]}#{(word.length-2)}#{word[-1]}"
end
end
end
The accepted answer isn't bad, but it can be made a lot simpler by not matching words that are too short in the first place:
def abbreviate(str)
str.gsub(/([[:alpha:]])([[:alpha:]]{3,})([[:alpha:]])/i) { "#{$1}#{$2.size}#{$3}" }
end
abbreviate("You, and I, should speak.")
# => "You, and I, s4d s3k."
Alternatively, we can use lookbehind and lookahead, which makes the Regexp more complex but the substitution simpler:
def abbreviate(str)
str.gsub(/(?<=[[:alpha:]])[[:alpha:]]{3,}(?=[[:alpha:]])/i, &:size)
end
I meet some hard task for me. I has a string which need to parse into array and some other elements. I have a troubles with REGEXP so wanna ask help.
I need delete from string all non-digits, except commas (,) and dashes (-)
For example:
"!1,2e,3,6..-10" => "1,2,3,6-10"
"ffff5-10...." => "5-10"
"1.2,15" => "12,15"
and so.
[^0-9,-]+
This should do it for you.Replace by empty string.See demo.
https://regex101.com/r/vV1wW6/44
We must have at least one non-regex solution:
def keep_some(str, keepers)
str.delete(str.delete(keepers))
end
keep_some("!1,2e,3,6..-10", "0123456789,-")
#=> "1,2,3,6-10"
keep_some("ffff5-10....", "0123456789,-")
#=> "5-10"
keep_some("1.2,15", "0123456789,-")
#=> "12,15"
"!1,2e,3,6..-10".gsub(/[^\d,-]+/, '') # => "1,2,3,6-10"
Use String#gsub with a pattern that matches everything except what you want to keep, and replace it with the empty string. In a reguar expression, the negated character class [^whatever] matches everything except the characters in the "whatever", so this works:
a_string.gsub /[^0-9,-]/, ''
Note that the hyphen has to come last, as otherwise it will be interpreted as a range indicator.
To demonstrate, I put all your "before" strings into an Array and used Enumerable#map to run the above gsub call on all of them, producing an Array of the "after" strings:
["!1,2e,3,6..-10", "ffff5-10....", "1.2,15"].map { |s| s.gsub /[^0-9,-]/, '' }
# => ["1,2,3,6-10", "5-10", "12,15"]
I'm writing a Rack app to split hostnames ending with certain prefixes.
For example, the hostname (and port) hello.world.lvh.me:3000 needs to be split into tokens hello.world, .lvh.me and :3000. Additionally, the prefix (hello.world), suffix (.lvh.me) and port (:3000) are all optional.
So far, I have a (Ruby) regex that looks like /(.*)(\.lvh\.me)(\:\d+)?/.
This successfully breaks the hostname into component parts but it falls down when one or more of the optional components is missing, e.g. for hello.world:3000 or lvh.me:3000 or even plain old hello.world.
I've tried adding ? to each group to make them optional (/(.*)?(\.lvh\.me)?(\:(\d+)?/) but this invariably ends up with the first group, (.*), capturing the entire string and stopping there.
My gut feeling is that this is something which might be solved using lookaround but I'll admit this is a totally new realm of regex for me.
You can try with this pattern:
\A(?=[^:])(.+?)??((?:\.|\A)lvh\.me)?(:[0-9]+)?\z
the lookahead (?=[^:]) checks there is at least one character that is not the : (in other words, not the port alone). This means that at least hello.word or lvh.me is present.
The first group is optional and non-greedy ??, this means that it is matched only when needed.
\A and \z are anchors for the start and the end of the string (when ^ and $ are used for the line)
Note that the character class \d matches all unicode digits in Ruby, but in this case you only need ascii digits. It's better to use [0-9]
Note too that \A(?=[^:])((?>[^l:\n.]+|\.|\Bl|l(?!vh\.me\b))*)((?:\.|\A)lvh\.me)?(:[0-9]+)?\z may be more performant.
online demo
Try ^(.*?)?(\.?lvh\.me)?(\:\d+)?$
I added:
a ? to the first group making the * non-greedy
^,$ to anchor it to the start and end.
a ? to the \. before lvh because you want to match lvh.me:3000 not .lvh.me:3000
A Tokenizing Answer
Just for fun, I decided to see if there was a relatively simple way to do what you wanted without a complicated regular expression. The only regular expressions I used were for splitting and validation.
This works for me with your provided corpus, and several variations.
str = 'hello.world.lvh.me:3000'
tokens = str.split /[.:]/
port = tokens.last =~ /\A\d+\z/ ? ?: + tokens.pop : ''
domain = sprintf '.%s.%s', *tokens.pop(2)
prefix = tokens.join ?.
You'll certainly need to check for empty strings in certain cases, but it seems like it might be more straightforward and/or flexible than a pure regex solution. I find it more readable, anyway. If you truly need a single regular expression, though, I'm sure one of the other answers will help you out.
You could try splitting rather than matching,
irb(main):012:0> "hello.world.lvh.me:3000".split(/\.(?=[^.:]+\.[^:.]+(?::\d+)?$)|:/)
=> ["hello.world", "lvh.me", "3000"]
irb(main):013:0> "hello.world:3000".split(/\.(?=[^.:]+\.[^:.]+(?::\d+)?$)|:/)
=> ["hello.world", "3000"]
irb(main):014:0> "lvh.me:3000".split(/\.(?=[^.:]+\.[^:.]+(?::\d+)?$)|:/)
=> ["lvh.me", "3000"]
irb(main):015:0> "hello.world".split(/\.(?=[^.:]+\.[^:.]+(?::\d+)?$)|:/)
=> ["hello.world"]
irb(main):016:0> "hello.world.lvh.me".split(/\.(?=[^.:]+\.[^:.]+(?::\d+)?$)|:/)
=> ["hello.world", "lvh.me"]
Look, ma, no regex!
def split_up(str)
str.sub(':','.:')
.split('.')
.each_slice(2)
.map { |arr| arr.join('.') }
end
split_up("hello.world.lvh.me:3000") #=> ["hello.world", "lvh.me", ":3000"]
split_up("hello.world:3000") #=> ["hello.world", ":3000"]
split_up("hello.world.lvh.me") #=> ["hello.world", "lvh.me"]
split_up("hello.world") #=> ["hello.world"]
split_up("") #=> []
Steps:
str1 = "hello.world.lvh.me:3000" #=> "hello.world.lvh.me:3000"
str2 = str1.sub(':','.:') #=> "hello.world.lvh.me.:3000"
arr = str2.split('.') #=> ["hello", "world", "lvh", "me", ":3000"]
enum = arr.each_slice(2) #=> #<Enumerator: ["hello", "world", "lvh",
# "me", ":3000"]:each_slice(2)>
enum.to_a #=> [["hello", "world"], ["lvh", "me"],
# [":3000"]]
enum.map { |arr| arr.join('.') } #=> ["hello.world", "lvh.me", ":3000"]
If I wanted to remove things like:
.!,'"^-# from an array of strings, how would I go about this while retaining all alphabetical and numeric characters.
Allowed alphabetical characters should also include letters with diacritical marks including à or ç.
You should use a regex with the correct character property. In this case, you can invert the Alnum class (Alphabetic and numeric character):
"◊¡ Marc-André !◊".gsub(/\p{^Alnum}/, '') # => "MarcAndré"
For more complex cases, say you wanted also punctuation, you can also build a set of acceptable characters like:
"◊¡ Marc-André !◊".gsub(/[^\p{Alnum}\p{Punct}]/, '') # => "¡MarcAndré!"
For all character properties, you can refer to the doc.
string.gsub(/[^[:alnum:]]/, "")
The following will work for an array:
z = ['asfdå', 'b12398!', 'c98347']
z.each { |s| s.gsub! /[^[:alnum:]]/, '' }
puts z.inspect
I borrowed Jeremy's suggested regex.
You might consider a regular expression.
http://www.regular-expressions.info/ruby.html
I'm assuming that you're using ruby since you tagged that in your post. You could go through the array, put it through a test using a regexp, and if it passes remove/keep it based on the regexp you use.
A regexp you might use might go something like this:
[^.!,^-#]
That will tell you if its not one of the characters inside the brackets. However, I suggest that you look up regular expressions, you might find a better solution once you know their syntax and usage.
If you truly have an array (as you state) and it is an array of strings (I'm guessing), e.g.
foo = [ "hello", "42 cats!", "yöwza" ]
then I can imagine that you either want to update each string in the array with a new value, or that you want a modified array that only contains certain strings.
If the former (you want to 'clean' every string the array) you could do one of the following:
foo.each{ |s| s.gsub! /\p{^Alnum}/, '' } # Change every string in place…
bar = foo.map{ |s| s.gsub /\p{^Alnum}/, '' } # …or make an array of new strings
#=> [ "hello", "42cats", "yöwza" ]
If the latter (you want to select a subset of the strings where each matches your criteria of holding only alphanumerics) you could use one of these:
# Select only those strings that contain ONLY alphanumerics
bar = foo.select{ |s| s =~ /\A\p{Alnum}+\z/ }
#=> [ "hello", "yöwza" ]
# Shorthand method for the same thing
bar = foo.grep /\A\p{Alnum}+\z/
#=> [ "hello", "yöwza" ]
In Ruby, regular expressions of the form /\A………\z/ require the entire string to match, as \A anchors the regular expression to the start of the string and \z anchors to the end.