I am trying to clean up email strings surrounded by extra characters. The method I am using is as follows:
def email_clean(email)
  email = email.gsub(/(<+\w)/, "")
  email = email.gsub(/(>+\w)/, "")
  email = email.gsub(/(\w+=)/,"")
  email = email.gsub(/(\w+:)/, "")
  email = email.gsub(/\A"|"\Z/, '') # gsub, not gsub!: gsub! returns nil when nothing matches
  email = email.delete('"')
  return email
end
I'm calling it with the following example string:
email_clean('href="mailto:darren#*********.com"><span')
And getting the following output:
darren#*********.coman
I am trying to figure out why the first two gsub calls did not remove the trailing "an" when removing the angle brackets.
Your regular expression here is a problem:
email = email.gsub(/(<+\w)/, "")
This removes one or more < characters followed by a single word character. What you probably meant was:
/<\w+/
Though based on your data, you can probably just drop the < and everything after it:
/<.*/
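A small sketch comparing the original pattern with the two suggested ones, using the question's sample string written in single quotes so the embedded double quotes parse:

```ruby
sample = 'href="mailto:darren#*********.com"><span'

# Original: one or more "<" followed by exactly ONE word character,
# so only "<s" is removed and "pan" survives.
original = sample.gsub(/<+\w/, "")

# Suggested: "<" followed by one or more word characters removes "<span".
word_run = sample.gsub(/<\w+/, "")

# Broader: drop "<" and everything after it on the line.
rest_of_line = sample.gsub(/<.*/, "")

puts original      # href="mailto:darren#*********.com">pan
puts word_run      # href="mailto:darren#*********.com">
puts rest_of_line  # href="mailto:darren#*********.com">
```

With the original pattern, the second call, /(>+\w)/, then removes only ">p", which is where the trailing "an" comes from.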
Keep in mind you can chain gsub operations together, plus you can rack up a bunch of "cleaner" expressions in an array defined beforehand:
MOPS = [
  /<.*/,
  /\A"|"\Z/
]

email = MOPS.inject(email) do |e, mop|
  e.gsub(mop, '')
end
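Here is a self-contained sketch of the array-of-patterns idea applied to the whole sample string. The patterns beyond /<.*/ are my own assumptions about what needs removing (stray >, the href= attribute name, the mailto: scheme and leftover quotes), so adjust them to your real data:

```ruby
# Patterns applied in order; each match is replaced with "".
MOPS = [
  /<.*/,   # "<span" and anything after it
  />/,     # stray closing angle brackets
  /\w+=/,  # attribute names such as href=
  /\w+:/,  # URI schemes such as mailto:
  /"/      # any remaining double quotes
].freeze

def email_clean(email)
  # Thread the string through every pattern; gsub (not gsub!) never returns nil.
  MOPS.inject(email) { |e, mop| e.gsub(mop, "") }
end

puts email_clean('href="mailto:darren#*********.com"><span')
# darren#*********.com
```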
I have this line:
msg = "Couldn't find column: #{missing_columns.map(&:inspect).join(', ')}"
that outputs: Couldn't find column: /firstname/i, /lastname/i
Is there a way that I can use gsub to return only the name of the column without the "/" and "/i"? Or is there a better way to do it?
I've tried errors = msg.gsub(/\/|i/, '') but it turns the first missing column into "frstname".
Given that these appear to be case-insensitive regular expressions, meaning
missing_columns
#=> [/firstname/i,/lastname/i]
In this case, rather than converting them to strings and manipulating those strings, you can use a method that a Regexp already responds to, e.g. Regexp#source.
Regexp#source: "Returns the original string of the pattern." It does not return the literal delimiters (/) or the options (i in this case).
missing_columns.map(&:source).join(', ')
#=> "firstname, lastname"
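For comparison, here are three ways to turn a Regexp into a string; only source drops both the delimiters and the flags. missing_columns is the example array from the question:

```ruby
missing_columns = [/firstname/i, /lastname/i]

p missing_columns.map(&:inspect)  # ["/firstname/i", "/lastname/i"]
p missing_columns.map(&:to_s)     # ["(?i-mx:firstname)", "(?i-mx:lastname)"]
p missing_columns.map(&:source)   # ["firstname", "lastname"]

# The case-insensitive flag is still available separately if you need it.
p missing_columns.first.casefold? # true
```

Keep in mind that source returns the pattern text verbatim, so a column pattern containing metacharacters (for example /first_?name/i) would show them as-is.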
/\/|i/
Let's break this down. The // on the outside are delimiters, sort of like quotation marks for strings. So the actual regex is on the inside.
\/|i
\/ says to match a literal forward slash. \ prevents it from being interpreted as the end of the regular expression.
i says to match a literal i. So far nothing fancy. But | is an alternation: it matches either the thing on the left or the thing on the right. Effectively, this removes every slash and every letter i from your string. You want to remove each / or /i, but not an i on its own. You can still do that with alternation, provided you include the slash on both sides and put the longer alternative first, because Ruby tries the alternatives from left to right and uses the first one that matches:
/\/i|\//
You can also do it more compactly with the ? quantifier, which makes the thing before it optional. Because ? is greedy, the i is consumed whenever it follows a slash:
/\/i?/
Finally, you can avoid escaping the slash by using the %r{...} regular expression literal instead of /.../:
%r{/i?}
All in all, that's
errors = msg.gsub(%r{/i?}, '')
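A quick check of the patterns discussed here against the message string. The second line shows why the order of the alternatives matters: Ruby takes the first alternative that matches, so a bare \/ listed first wins before \/i gets a chance:

```ruby
msg = "Couldn't find column: /firstname/i, /lastname/i"

puts msg.gsub(/\/|i/, '')    # Couldn't fnd column: frstname, lastname
puts msg.gsub(/\/|\/i/, '')  # Couldn't find column: firstnamei, lastnamei
puts msg.gsub(/\/i|\//, '')  # Couldn't find column: firstname, lastname
puts msg.gsub(%r{/i?}, '')   # Couldn't find column: firstname, lastname
```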
It seems that missing_columns contains an array of Regexps. So you can use Regexp#source instead of Regexp#inspect.
For instance
msg = "Couldn't find column: #{missing_columns.map(&:source).join(', ')}"
pp msg # => "Couldn't find column: firstname, lastname"
instead of
msg = "Couldn't find column: #{missing_columns.map(&:inspect).join(', ')}"
pp msg # => "Couldn't find column: /firstname/i, /lastname/i"
Feel free to browse the documentation for Regexp#source.
Hope this helps!
I want to write a regex in Ruby that will add a backslash prior to any open square brackets.
str = "my.name[0].hello.line[2]"
out = str.gsub(/\[/,"\\[")
# desired out = "my.name\[0].hello.line\[2]"
I've tried multiple combinations of backslashes in the substitution string and can't get it to leave a single backslash.
You don't need a regular expression here.
str = "my.name[0].hello.line[2]"
puts str.gsub('[', '\[')
# my.name\[0].hello.line\[2]
I tried your code and it worked correctly:
str = "my.name[0].hello.line[2]"
out = str.gsub(/\[/,"\\[")
puts out #my.name\[0].hello.line\[2]
If you replace puts with p, you get the inspect version of the string:
p out #"my.name\\[0].hello.line\\[2]"
Note the surrounding " and the escaped \: inspect shows each backslash as \\. Maybe that is the result you saw.
As Daniel already answered, you can also write the replacement with single quotes ('\['), so you don't need to escape the backslash.
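To confirm that only one backslash is inserted per bracket, compare what puts and p print and count the backslashes. The block form of gsub is an extra option not mentioned in these answers: the block's return value is inserted literally, so backslash sequences in it are not interpreted:

```ruby
str = "my.name[0].hello.line[2]"

out = str.gsub('[', '\[')           # single quotes keep the backslash as-is
puts out                            # my.name\[0].hello.line\[2]
p out                               # "my.name\\[0].hello.line\\[2]" (inspect doubles each \)
puts out.count("\\")                # 2, one backslash per bracket

block_out = str.gsub('[') { '\[' }  # block result is used literally
puts block_out == out               # true
```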
I was doing the challenges from pythonchallenge, writing the code in Ruby, specifically this one. It contains a really long string of special characters in the page source, and I was trying to find a way to delete them, or to check for the alphabetical characters.
I tried using the scan method, but I might not be using it properly. I also tried delete! like this:
a = "PAGE SOURCE CODE PASTED HERE"
a.delete! "!", "#" #and so on with special chars, does not work(?)
a
How can I do that?
Thanks
You can do this
a.gsub!(/[^0-9A-Za-z]/, '')
Try it with gsub:
a.gsub!(/[!#%&"]/,'')
You can try the regexp on rubular.com.
If you want something more general, you can list the valid chars and remove everything that is not among them:
a.gsub!(/[^abcdefghijklmnopqrstuvwxyz ]/,'')
When you give multiple arguments to String#delete, it deletes the intersection of those character sets. a.delete! "!", "#" deletes the intersection of the sets {!} and {#}, which is empty, so nothing is deleted and delete! returns nil.
What you wanted to do is a.delete! "!#" with the characters to delete passed as a single string.
Since the challenge asks you to clean up the mess and find a message in it, I would go with a whitelist instead of deleting special characters. The delete method accepts ranges with - and negations with ^ (similar to a regex character class), so you can do something like this: a.delete! "^A-Za-z ".
You could also use regular expressions as shown by #arieljuod.
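A few small cases showing the set semantics of String#delete described above: multiple arguments intersect, a single string is one set, ^ negates and - makes a range. The string 'a!b#c' is just an illustrative input:

```ruby
s = 'a!b#c'

p s.delete('!', '#')      # "a!b#c" (intersection of {!} and {#} is empty)
p s.delete('!#')          # "abc"   (one set containing ! and #)
p s.delete('^a-z')        # "abc"   (everything except a..z)
p s.delete('a-c', 'b-z')  # "a!#"   (intersection of a..c and b..z is {b, c})

t = s.dup
p t.delete!('!', '#')     # nil, because nothing was removed
```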
gsub is one of the most used Ruby methods in the wild.
specialname="Hello!#$#"
cleanedname = specialname.gsub(/[^a-zA-Z0-9\-]/,"")
I think a.gsub(/[^A-Za-z0-9 ]/, '') works better in this case than a lowercase-only whitelist. With lowercase only, a sentence, which typically starts with a capital letter, loses that capital letter, and you would also lose any 1337 speak or other encoded text hidden in it.
Case in point:
phrase = "Joe can't tell between 'large' and large."
=> "Joe can't tell between 'large' and large."
phrase.gsub(/[^a-z ]/, '')
=> "oe cant tell between large and large"
phrase.gsub(/[^A-Za-z0-9 ]/, '')
=> "Joe cant tell between large and large"
phrase2 = "W3 a11 f10a7 d0wn h3r3!"
phrase2.gsub(/[^a-z ]/, '')
=> " a fa dwn hr"
phrase2.gsub(/[^A-Za-z0-9 ]/, '')
=> "W3 a11 f10a7 d0wn h3r3"
If you don't want to change the original string (for example, if you only need to read the message to solve the challenge), you can print the lowercase letters one at a time:
str.each_char do |letter|
  if letter =~ /[a-z]/
    p letter
  end
end
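If you only want to read the hidden letters without touching the original, the non-bang methods return a new string and leave it alone. A sketch using scan (which the question mentions) and delete with a negated set; the sample text is made up for illustration:

```ruby
a = '!!e#q%u^a&l*i(t)y@@'  # hypothetical noisy input

letters_via_scan   = a.scan(/[a-z]/).join  # every lowercase letter, in order
letters_via_delete = a.delete('^a-z')      # same result, no regex needed

puts letters_via_scan    # equality
puts letters_via_delete  # equality
puts a                   # unchanged: !!e#q%u^a&l*i(t)y@@
```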
You will have to write your own string-sanitizing function; that is easy to do with a regex and the gsub method.
Standalone example:
your_text.gsub!(/[!#\[;\]^%*\(\);\-_\/&\\|$\{#\}<>:`~"]/,'')
API sample:
Route: post 'api/sanitize_text', to: 'api#sanitize_text'
Controller:
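def sanitize_text
  # render_bad_request and render_response are not Rails built-ins; presumably app-defined helpers
  return render_bad_request unless params[:text].present?
  # gsub rather than gsub!, which would return nil for input that needs no cleaning
  sanitized_text = params[:text].gsub(/[!#\[;\]^%*\(\);\-_\/&\\|$\{#\}<>:`~"]/, '')
  render_response({ safe_text: sanitized_text })
end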
Then call it like this:
POST /api/sanitize_text?text=abcdefghijklmnopqrstuvwxyz123456<>$!#%23^%26*[]:;{}()`,.~'"\|/
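A framework-free sketch of the same character-class idea (the duplicated ; and # are dropped from the class), mainly to show why the returned value should come from gsub rather than gsub!: gsub! returns nil when nothing matched, so clean input would come back as nil. The constant and method names are made up for illustration:

```ruby
# Characters to strip: the same set as in the answer, duplicates removed.
UNSAFE_CHARS = /[!#\[\];^%*()\-_\/&\\|${}<>:`~"]/

def sanitize_text(text)
  text.gsub(UNSAFE_CHARS, '') # always returns a new String
end

p sanitize_text('a<b>$c')              # "abc"
p sanitize_text('hello, world.')       # "hello, world." (comma, period and space are kept)
p 'hello'.dup.gsub!(UNSAFE_CHARS, '')  # nil, nothing was replaced
```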
I have the following code, which is supposed to remove a particular email address from a string if it exists. The problem is that I get the error "invalid range "y-d" in string transliteration (ArgumentError)", which I assume is because it's treating my input as a regex. In the actual code I will need to do this delete with a variable, not a string literal, but this is a simplified version of the problem.
So how do I properly perform this operation?
myvar = "test1#my-domain.com test2#my-domain.com"
myvar = myvar.delete("test1#my-domain.com")
Try
myvar = "test1#my-domain.com test2#my-domain.com"
myvar = myvar.gsub("test1#my-domain.com", '').strip
String#delete(str) does not delete the literal string str; it builds a set out of the individual characters of str and deletes all occurrences of those characters. Try this:
"sets".delete("test")
=> ""
"sets".delete("est")
=> ""
The hyphen has a special meaning, it defines a range of characters. String#delete("a-d") will delete all occurrences of a,b,c and d characters. Range boundary characters should be given in ascending order: you should write "a-d" but not "d-a".
In your original example, ruby tries to build a character range from y-d substring and fails.
Use String#gsub method instead.
You can do it like this
myvar = "test1#my-domain.com test2#my-domain.com"
remove = "test1#my-domain.com"
myvar.gsub!(remove, "")
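A sketch contrasting these approaches on the example string. gsub with a String pattern matches it literally, so the hyphen needs no escaping, while delete raises the error from the question. The split/reject variant at the end is an alternative not mentioned above; it avoids leftover double spaces when the removed address sits in the middle of the list:

```ruby
myvar  = 'test1#my-domain.com test2#my-domain.com'
remove = 'test1#my-domain.com'

# gsub with a String argument removes that exact substring.
p myvar.gsub(remove, '').strip  # "test2#my-domain.com"

# delete treats its argument as a character set; "y-d" is a descending range.
begin
  myvar.delete(remove)
rescue ArgumentError => e
  puts e.message                # invalid range "y-d" in string transliteration
end

# Alternative: treat the string as a space-separated list of addresses.
p myvar.split(' ').reject { |addr| addr == remove }.join(' ')  # "test2#my-domain.com"
```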
Edit: I solved this by using strip! to remove leading and trailing whitespace, as I show in this video. I then restored the whitespace at the end of each string in the array by iterating through it and appending whitespace. This problem differs from the "dupe" because my intent is to keep the whitespace at the end; however, strip! will remove both leading and trailing whitespace if that is your intent. (I would have posted this as an answer, but since the question is incorrectly marked as a dupe, I could only edit it into my original question.)
I have an array of words where I am trying to remove any whitespace that may exist at the beginning of the word instead of at the end. rstrip! just takes care of the end of a string. I want whitespaces removed from the beginning of a string.
example_array = ['peanut', ' butter', 'sammiches']
desired_output = ['peanut', 'butter', 'sammiches']
As you can see, not all elements in the array have the whitespace problem, so I can't just delete the first character as I would if all elements started with a single whitespace char.
Full code:
words = params[:word].gsub("\n", ",").delete("\r").split(",")
words.delete_if {|x| x == ""}
words.each do |e|
  e.lstrip!
end
Sample text that a user may enter on the form:
Corn on the cob,
Fibonacci,
StackOverflow
Chat, Meta, About
Badges
Tags,,
Unanswered
Ask Question
String#lstrip (or String#lstrip!) is what you're after.
desired_output = example_array.map(&:lstrip)
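A minimal check with the example array from the question. lstrip only touches the left end, so trailing whitespace you want to keep survives; the ' jelly ' element is a made-up addition to show that:

```ruby
example_array = ['peanut', ' butter', 'sammiches', ' jelly ']

p example_array.map(&:lstrip)  # ["peanut", "butter", "sammiches", "jelly "]
p example_array.map(&:strip)   # ["peanut", "butter", "sammiches", "jelly"]
```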
More comments about your code:
delete_if {|x| x == ""} can be replaced with delete_if(&:empty?)
Note that delete_if already modifies the existing array in place, just like reject!; the only difference is the return value: reject! returns nil when nothing was removed, while delete_if always returns the array.
words.each {|e| e.lstrip!} can be replaced with words.each(&:lstrip!)
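delete("\r") is only redundant if the input never contains carriage returns. Reading a Windows-style text file in text mode on Windows converts \r\n to \n, but form input such as params[:word] often arrives with \r\n line breaks regardless of platform, so you will probably still want it here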
split(",") can be replaced with split(", ") or split(/, */) (or /, ?/ if there should be at most one space)
So now it looks like:
words = params[:word].delete("\r").gsub("\n", ",").split(/, ?/)
words.reject!(&:empty?)
words.each(&:lstrip!)
I'd be able to give more advice if you had the sample text available.
Edit: Ok, here goes:
temp_array = text.split("\n").map do |line|
  fields = line.split(/, */)
  fields.reject(&:empty?) # the block's last value becomes the mapped element
end
words = temp_array.flatten(1)
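A self-contained sketch that runs the same idea on the sample text from the question. It assumes the text may arrive with \r\n line breaks, which is common for form input, and splits on /\r?\n/ so either line ending works; the variable text is a stand-in for params[:word]:

```ruby
# Hypothetical stand-in for params[:word], built from the question's sample.
text = "Corn on the cob,\r\nFibonacci,\r\nStackOverflow\r\nChat, Meta, About\r\n" \
       "Badges\r\nTags,,\r\nUnanswered\r\nAsk Question"

lines  = text.split(/\r?\n/)                        # one entry per line; \r is dropped too
fields = lines.flat_map { |line| line.split(',') }  # one entry per comma-separated field
words  = fields.map(&:lstrip).reject(&:empty?)      # drop leading spaces and empty fields

p words
# ["Corn on the cob", "Fibonacci", "StackOverflow", "Chat", "Meta", "About",
#  "Badges", "Tags", "Unanswered", "Ask Question"]
```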
The methods used are String#split, Enumerable#map, Enumerable#reject and Array#flatten.
Ruby also has libraries for parsing comma-separated files, but I think they're a little different between 1.8 and 1.9.
> ' string '.lstrip.chop
=> "string"
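That strips the space from both ends in this example, but chop removes the last character unconditionally, so 'peanut'.lstrip.chop gives "peanu"; use strip if you want to remove whitespace from both ends.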
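A short comparison showing why chop is not a general substitute for strip: chop always removes the last character, whether or not it is whitespace:

```ruby
p ' string '.lstrip.chop  # "string"
p 'peanut'.lstrip.chop    # "peanu" (a real letter is lost)
p ' string '.strip        # "string"
p 'peanut'.strip          # "peanut"
```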