Replace/strip empty lines using gsub - ruby

I have an HTML page:
<strong>
Product Name:
</strong>
I want to strip its empty lines (^\n or ^$). Expected HTML is:
<strong>
Product Name:
</strong>
Here is my syntax:
r.gsub!(/^\\n/, '')
It doesn't seem to work. I tried many combinations and I can't get it to do anything. puts r.class prints String, and r always has spaces in it. I'm actually trying a larger set of reductions:
r.gsub!(/\\n\s+?/, '').gsub!(/\\t\s+?/, '').gsub!(/^\\n/, '')

The problem seems to be that you are escaping backslashes when you shouldn't be. E.g. /\\n/ will match a literal backslash followed by n, not a newline character. /\n/ will match a newline character. Same goes for \t.
If you want to play around with Ruby regular expressions, I recommend checking out Rubular.
Also, be careful with gsub!, especially when chaining calls like that. gsub! returns nil if nothing is replaced, and the next call in the chain will then fail with an undefined method error for nil (NoMethodError). You're much better off with
r = r.gsub(...).gsub(...) ...

I got it to work.
r = r.gsub(/\t\s+?/, "")
r = r.gsub(/^\s*$/, "")
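The "\n" can be matched by \s*, because \s includes \n. $ does not mean \n: it is a zero-width anchor for the end of a line, so it does not consume the line break.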
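A small sketch of the difference in practice. The sample string and variable names are made up for illustration. Because $ is zero-width, /^\s*$/ empties blank lines without removing their line breaks, whereas a pattern that ends in \n removes the whole line.

```ruby
# Hypothetical sample: a whitespace-only line and a truly empty line.
html = "<strong>\n   \nProduct Name:\n\n</strong>\n"

# $ is a zero-width end-of-line anchor: the "\n" is not consumed,
# so blank lines are emptied but their line breaks stay.
emptied = html.gsub(/^\s*$/, '')

# Ending the pattern with \n consumes the blank line and its line break.
stripped = html.gsub(/^\s*\n/, '')

# Regex-free alternative: keep only lines that have visible content.
kept = html.lines.reject { |line| line.strip.empty? }.join

p emptied   # "<strong>\n\nProduct Name:\n\n</strong>\n"
p stripped  # "<strong>\nProduct Name:\n</strong>\n"
p kept      # same as stripped
```

Since \s also matches \n, /^\s*\n/ can swallow several consecutive blank lines in a single match, which is what you want here.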

Related

How do I remove "/" and "/i" in a returned string with gsub?

I have this line:
msg = "Couldn't find column: #{missing_columns.map(&:inspect).join(',')}"
that outputs: Couldn't find column: /firstname/i, /lastname/i
Is there a way that I can use gsub to return only the name of the column without the "/" and "/i"? Or is there a better way to do it?
I've tried errors = msg.gsub(/\/|i/, '') but it returns the first missing column as "frstname".
Given that these appear to be case-insensitive regular expressions, meaning
missing_columns
#=> [/firstname/i,/lastname/i]
In this case, rather than converting them to strings and trying to manipulate them from there, you can use methods that a Regexp already responds to, e.g. Regexp#source.
Regexp#source: "Returns the original string of the pattern." It will not return the literal delimiters (/) or the options (i in this case).
missing_columns.map(&:source).join(', ')
#=> "firstname, lastname"
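A quick sketch comparing Regexp#inspect and Regexp#source on data shaped like the question's (the missing_columns array here is a stand-in). It also shows that dropping the /i from the text does not lose the flag, since Regexp#casefold? still reports it.

```ruby
# Hypothetical data matching the question.
missing_columns = [/firstname/i, /lastname/i]

# inspect reproduces the literal, delimiters and flags included.
with_delimiters = missing_columns.map(&:inspect).join(', ')

# source returns only the text between the slashes.
names = missing_columns.map(&:source).join(', ')

# The flags are not lost: casefold? reports the i option.
all_case_insensitive = missing_columns.all?(&:casefold?)

puts with_delimiters  # /firstname/i, /lastname/i
puts names            # firstname, lastname
```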
/\/|i/
Let's break this down. The // on the outside are delimiters, sort of like quotation marks for strings. So the actual regex is on the inside.
\/|i
\/ says to match a literal forward slash. \ prevents it from being interpreted as the end of the regular expression.
i says to match a literal i. So far nothing fancy. But | is an alternation. It says to match either the thing on the left or the thing on the right. Effectively, this removes every slash and every i from your string. You want to remove all / or /i, but not i on its own. You can still do that with alternation, provided you include the slash on both sides and put the longer alternative first, since Ruby tries the alternatives from left to right and uses the first one that matches:
/\/i|\//
You can also do it more compactly with the ? modifier, which makes the thing before it optional.
/\/i?/
Finally, you can avoid the /\/ fencepost shenanigans by using the %r{...} regular expression form rather than /.
%r{/i?}
All in all, that's
errors = msg.gsub(%r{/i?}, '')
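A sketch of the %r{/i?} approach on the question's output string, plus two things worth checking before relying on string surgery. The "/id/i" column is a hypothetical name chosen to show the edge case: a column name that starts with "i" loses that letter, which is one reason to prefer Regexp#source when you actually have Regexp objects.

```ruby
msg = "Couldn't find column: /firstname/i, /lastname/i"

errors = msg.gsub(%r{/i?}, '')
puts errors  # Couldn't find column: firstname, lastname

# Alternation order matters: Ruby tries alternatives left to right,
# so listing "/" before "/i" leaves the "i" behind.
wrong_order = "/firstname/i".gsub(%r{/|/i}, '')   # "firstnamei"
right_order = "/firstname/i".gsub(%r{/i|/}, '')   # "firstname"

# Edge case: a column name starting with "i" loses that letter,
# because the leading "/i" of "/id/i" also matches.
edge = "/id/i".gsub(%r{/i?}, '')                  # "d"
```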
It seems that missing_columns contains an array of Regexps. So you can use Regexp#source instead of Regexp#inspect.
For instance
msg = "Couldn't find column: #{missing_columns.map(&:source).join(', ')}"
pp msg # => "Couldn't find column: firstname, lastname"
instead of
msg = "Couldn't find column: #{missing_columns.map(&:inspect).join(', ')}"
pp msg # => "Couldn't find column: /firstname/i, /lastname/i"
Feel free to browse the documentation for Regexp#source.
Hope this helps!

Removing all whitespace from a string in Ruby

How can I remove all newlines and spaces from a string in Ruby?
For example, if we have a string:
"123\n12312313\n\n123 1231 1231 1"
It should become this:
"12312312313123123112311"
That is, all whitespaces should be removed.
You can use something like:
var_name.gsub!(/\s+/, '')
Or, if you want to return the changed string, instead of modifying the variable,
var_name.gsub(/\s+/, '')
This will also let you chain it with other methods (e.g. something_else = var_name.gsub(...).to_i to strip the whitespace and then convert it to an integer). gsub! edits the string in place, so you'd have to write var_name.gsub!(...); something_else = var_name.to_i. Strictly speaking, as long as at least one change is made, gsub! will return the new version (i.e. the same thing gsub would return), but if the string happens to contain no whitespace, it'll return nil and things will break. Because of that, I'd prefer gsub if you're chaining methods.
gsub works by replacing any matches of the first argument with the contents of the second argument. In this case, it matches any sequence of consecutive whitespace characters (or just a single one) with the regex /\s+/, then replaces those with an empty string. There's also a block form if you want to do some processing on the matched part, rather than just replacing directly; see String#gsub for more information about that.
The Ruby docs for the class Regexp are a good starting point to learn more about regular expressions -- I've found that they're useful in a wide variety of situations where a couple of milliseconds here or there don't count and you don't need to match things that can be nested arbitrarily deeply.
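A short sketch of the gsub versus gsub! difference described above, using the question's sample string. The dup calls are only there so the sketch does not mutate a string literal.

```ruby
var_name = "123\n12312313\n\n123 1231 1231 1"

# gsub returns a new string, so it chains safely.
number = var_name.gsub(/\s+/, '').to_i

# gsub! edits in place and returns the string only if something changed.
changed = var_name.dup.gsub!(/\s+/, '')     # the cleaned string
unchanged = "123456".dup.gsub!(/\s+/, '')   # nil: nothing to replace

p number     # 12312312313123123112311
p unchanged  # nil
```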
As Gene suggested in his comment, you could also use tr:
var_name.tr(" \t\r\n", '')
It works in a similar way, but instead of replacing regex matches, it replaces every instance of the nth character of the first argument with the nth character of the second argument. If the second argument is shorter, it is padded with its last character; if it is empty, as here, the listed characters are simply deleted. See String#tr for more information.
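A sketch of both tr behaviours: an empty replacement deletes the listed characters, and a shorter, non-empty replacement is padded with its last character. The "a.b,c" string is just an illustrative input.

```ruby
str = "123\n12312313\n\n123 1231 1231 1"

# An empty replacement set deletes every listed character.
via_tr = str.tr(" \t\r\n", '')

# A shorter, non-empty replacement set is padded with its last character,
# so both "." and "," become ";".
padded = "a.b,c".tr('.,', ';')

puts via_tr  # 12312312313123123112311
puts padded  # a;b;c
```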
You could also use String#delete:
str = "123\n12312313\n\n123 1231 1231 1"
str.delete "\s\n"
#=> "12312312313123123112311"
You could use String#delete! to modify str in place, but note that delete! returns nil if no change is made.
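One detail in the delete example is easy to misread: inside a double-quoted Ruby string, "\s" is just a space character, not the regex whitespace class, so tabs (for instance) are not removed. A sketch, with the tab example as an illustrative input:

```ruby
str = "123\n12312313\n\n123 1231 1231 1"

# In a double-quoted string literal, "\s" is simply a space.
space_is_space = ("\s" == " ")

cleaned  = str.delete("\s\n")      # "12312312313123123112311"
tab_kept = "a\tb".delete("\s\n")   # "a\tb": the tab is not in the set

# delete! returns nil when it has nothing to delete.
no_change = "abc".dup.delete!("\s\n")
```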
Alternatively you could scan the string for digits /\d+/ and join the result:
string = "123\n\n12312313\n\n123 1231 1231 1\n"
string.scan(/\d+/).join
#=> "12312312313123123112311"
Please note that this would also remove alphabetical characters, dashes, symbols, basically everything that is not a digit.

Deleting all special characters from a string - ruby

I was doing the challenges from pythonchallenge, writing the code in Ruby, specifically this one. It contains a really long string with special characters in the page source. I was trying to find a way to delete them, or to check for the alphabetical characters.
I tried using the scan method, but I think I might not be using it properly. I also tried delete! like this:
a = "PAGE SOURCE CODE PASTED HERE"
a.delete! "!", "#" #and so on with special chars, does not work(?)
a
How can I do that?
Thanks
You can do this
a.gsub!(/[^0-9A-Za-z]/, '')
Try with gsub:
a.gsub!(/[!#%&"]/,'')
Try the regexp on rubular.com.
If you want something more general, you can have a string with the valid characters and remove everything that's not in it:
a.gsub!(/[^abcdefghijklmnopqrstuvwxyz ]/,'')
When you give multiple arguments to String#delete, it's the intersection of those arguments that is deleted. a.delete! "!", "#" deletes the intersection of the sets ! and #, which is empty, so nothing is deleted and the method returns nil.
What you wanted to do is a.delete! "!#" with the characters to delete passed as a single string.
Since the challenge is asking to clean up the mess and find a message in it, I would go with a whitelist instead of deleting special characters. The delete method accepts ranges with - and negations with ^ (similar to a regex) so you can do something like this: a.delete! "^A-Za-z ".
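A sketch of the String#delete rules described above on a made-up string: several arguments delete only their intersection, a single string lists every character to delete, and a leading ^ negates the set, which gives the whitelist.

```ruby
a = "Hello, World! #42"

# Several arguments: only characters present in ALL sets are deleted.
# "!" and "#" share no characters, so nothing is removed.
intersection = a.delete("!", "#")    # "Hello, World! #42"
overlap = "hello".delete("l", "lo")  # "heo": only "l" is in both sets

# One string lists every character to delete.
single = a.delete("!#")              # "Hello, World 42"

# Whitelist with a negated set: keep letters and spaces only.
letters = a.delete("^A-Za-z ")       # "Hello World "
```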
You could also use regular expressions as shown by @arieljuod.
gsub is one of the most used Ruby methods in the wild.
specialname="Hello!#$#"
cleanedname = specialname.gsub(/[^a-zA-Z0-9\-]/,"")
I think a.gsub(/[^A-Za-z0-9 ]/, '') works better in this case. Otherwise, if you have a sentence, which typically should start with a capital letter, you will lose your capital letter. You would also lose any 1337 speak, or other possible crypts within the text.
Case in point:
phrase = "Joe can't tell between 'large' and large."
=> "Joe can't tell between 'large' and large."
phrase.gsub(/[^a-z ]/, '')
=> "oe cant tell between large and large"
phrase.gsub(/[^A-Za-z0-9 ]/, '')
=> "Joe cant tell between large and large"
phrase2 = "W3 a11 f10a7 d0wn h3r3!"
phrase2.gsub(/[^a-z ]/, '')
=> " a fa dwn hr"
phrase2.gsub(/[^A-Za-z0-9 ]/, '')
=> "W3 a11 f10a7 d0wn h3r3"
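If you don't want to change the original string (for instance, you only need to read the letters to solve the challenge), you can iterate over the characters and print only the lowercase letters:
str.each_char do |letter|
  if letter =~ /[a-z]/
    p letter
  end
end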
You will have to write your own string-sanitizing function; it could easily use a regex and the gsub method.
Atomic sample:
your_text.gsub!(/[!#\[;\]^%*\(\);\-_\/&\\|$\{#\}<>:`~"]/,'')
API sample:
Route: post 'api/sanitize_text', to: 'api#sanitize_text'
Controller:
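def sanitize_text
  # render_bad_request and render_response are presumably app-specific helpers (not Rails built-ins).
  return render_bad_request unless params[:text].present?
  # gsub rather than gsub!: gsub! returns nil when there is nothing to remove.
  sanitized_text = params[:text].gsub(/[!#\[;\]^%*\(\);\-_\/&\\|$\{#\}<>:`~"]/,'')
  render_response({ safe_text: sanitized_text })
end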
Then you call it
POST /api/sanitize_text?text=abcdefghijklmnopqrstuvwxyz123456<>$!#%23^%26*[]:;{}()`,.~'"\|/

Ruby: Remove whitespace chars at the beginning of a string

Edit: I solved this by using strip! to remove leading and trailing whitespace, as I show in this video. Then I restored the whitespace at the end of each string in the array by iterating through it and adding the whitespace back. This problem differs from the "dupe" because my intent is to keep the whitespace at the end. However, strip! will remove both the leading and trailing whitespace if that is your intent. (I would have made this an answer, but as this is incorrectly marked as a dupe, I could only edit my original question to include this.)
I have an array of words where I am trying to remove any whitespace that may exist at the beginning of the word instead of at the end. rstrip! just takes care of the end of a string. I want whitespaces removed from the beginning of a string.
example_array = ['peanut', ' butter', 'sammiches']
desired_output = ['peanut', 'butter', 'sammiches']
As you can see, not all elements in the array have the whitespace problem, so I can't just delete the first character as I would if all elements started with a single whitespace char.
Full code:
words = params[:word].gsub("\n", ",").delete("\r").split(",")
words.delete_if {|x| x == ""}
words.each do |e|
  e.lstrip!
end
Sample text that a user may enter on the form:
Corn on the cob,
Fibonacci,
StackOverflow
Chat, Meta, About
Badges
Tags,,
Unanswered
Ask Question
String#lstrip (or String#lstrip!) is what you're after.
desired_output = example_array.map(&:lstrip)
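A sketch of map(&:lstrip) on the question's array, with the three strip variants side by side on an illustrative padded string for comparison.

```ruby
example_array = ['peanut', ' butter', 'sammiches']

# lstrip removes leading whitespace only; elements without it are unchanged.
desired_output = example_array.map(&:lstrip)

left  = "  padded  ".lstrip   # "padded  "
right = "  padded  ".rstrip   # "  padded"
both  = "  padded  ".strip    # "padded"

p desired_output  # ["peanut", "butter", "sammiches"]
```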
More comments about your code:
delete_if {|x| x == ""} can be replaced with delete_if(&:empty?)
Array#delete_if already modifies the array in place; reject!(&:empty?) does the same, except that it returns nil when nothing was removed, so either works here.
words.each {|e| e.lstrip!} can be replaced with words.each(&:lstrip!)
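delete("\r") should be redundant if you're reading a Windows-style text document on a Windows machine, or a Unix-style document on a Unix machine. Form input is different, though: browsers normally submit textarea line breaks as \r\n, so for params[:word] the \r may still need handling.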
split(",") can be replaced with split(", ") or split(/, */) (or /, ?/ if there should be at most one space)
So now it looks like:
words = params[:word].gsub("\n", ",").split(/, ?/)
words.reject!(&:empty?)
words.each(&:lstrip!)
I'd be able to give more advice if you had the sample text available.
Edit: Ok, here goes:
temp_array = text.split("\n").map do |line|
  fields = line.split(/, */)
  fields.reject(&:empty?)
end
temp_array.flatten(1)
The methods used are String#split, Enumerable#map, Enumerable#reject and Array#flatten.
Ruby also has libraries for parsing comma-separated (CSV) files, but I think they're a little different between 1.8 and 1.9.
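The approach above can be tried against the sample text from the question. This sketch swaps in flat_map (which combines map and flatten(1)) and lines(chomp: true) (Ruby 2.4+). It also strips each field, which removes any stray \r if the text arrives with Windows-style line endings. The text variable is assumed to hold the raw form input.

```ruby
# Assumed raw form input, with "\r\n" line endings.
text = "Corn on the cob,\r\nFibonacci,\r\nStackOverflow\r\nChat, Meta, About\r\n" \
       "Badges\r\nTags,,\r\nUnanswered\r\nAsk Question"

words = text.lines(chomp: true).flat_map do |line|
  # Split on commas followed by any spaces, trim, then drop empty fields.
  line.split(/, */).map(&:strip).reject(&:empty?)
end

p words
```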
> ' string '.lstrip.chop
=> "string"
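This strips both spaces here, but note that chop removes the last character whatever it is, so it only works when the string really ends in a single whitespace character. strip removes leading and trailing whitespace safely.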
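A sketch showing why chop is fragile here. It drops the final character even when that character is not whitespace, while strip only removes whitespace.

```ruby
padded   = ' string '.lstrip.chop   # "string"
unpadded = 'string'.lstrip.chop     # "strin": the "g" is lost
safe     = ' string '.strip         # "string"
safe_too = 'string'.strip           # "string": nothing to remove
```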

How to remove the first 4 characters from a string if it matches a pattern in Ruby

I have the following string:
"h3. My Title Goes Here"
I basically want to remove the first four characters from the string so that I just get back:
"My Title Goes Here".
The thing is I am iterating over an array of strings and not all have the h3. part in front so I can't just ditch the first four characters blindly.
I checked the docs and the closest thing I could find was chomp, but that only works for the end of a string.
Right now I am doing this:
"h3. My Title Goes Here".reverse.chomp(" .3h").reverse
This gives me my desired output, but there has to be a better way. I don't want to reverse a string twice for no reason. Is there another method that will work?
To alter the original string, use sub!, e.g.:
my_strings = [ "h3. My Title Goes Here", "No h3. at the start of this line" ]
my_strings.each { |s| s.sub!(/^h3\. /, '') }
To not alter the original and only return the result, remove the exclamation point, i.e. use sub. In the general case you may have regular expressions that can match more than one instance, and you want to replace all of them; in that case use gsub! and gsub. Without the g, only the first match is replaced (as you want here). Note that in Ruby ^ matches at the start of any line, not only at the start of the string. For single-line strings like these the difference doesn't matter, but \A anchors strictly to the start of the string.
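On Ruby 2.5 and later, String#delete_prefix answers the question's "is there another method" directly. It removes a fixed prefix only when the string starts with it, with no regex and no double reverse. This is a swapped-in alternative to the sub! approach, sketched on the same array.

```ruby
my_strings = ["h3. My Title Goes Here", "No h3. at the start of this line"]

# delete_prefix (Ruby 2.5+) removes the text only if the string starts with it.
cleaned = my_strings.map { |s| s.delete_prefix("h3. ") }

# start_with? tests for the prefix without changing anything.
has_heading = my_strings.map { |s| s.start_with?("h3. ") }

p cleaned      # ["My Title Goes Here", "No h3. at the start of this line"]
p has_heading  # [true, false]
```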
You can use sub with a regular expression:
s = 'h3. foo'
s.sub!(/^h[0-9]+\. /, '')
puts s
Output:
foo
The regular expression should be understood as follows:
^ Match at the start of a line (for a single-line string, the start of the string).
h A literal "h".
[0-9] A digit from 0-9.
+ One or more of the previous (i.e. one or more digits)
\. A literal period.
(space) A literal space character (yes, spaces are significant by default in regular expressions!)
You can modify the regular expression to suit your needs. See a regular expression tutorial or syntax guide, for example here.
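A sketch of the generalised /^h[0-9]+\. / pattern on a few illustrative strings, including how ^ behaves on multi-line input compared with \A.

```ruby
heading = /^h[0-9]+\. /

titles = ["h3. foo", "h12. bar", "plain text"].map { |s| s.sub(heading, '') }
# => ["foo", "bar", "plain text"]

# ^ matches at the start of any line, so a marker on a later line of a
# multi-line string is removed too; \A restricts the match to the very start.
multi = "intro\nh2. Section"
any_line   = multi.sub(heading, '')          # "intro\nSection"
first_only = multi.sub(/\Ah[0-9]+\. /, '')   # "intro\nh2. Section" (unchanged)
```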
A standard approach would be to use regular expressions:
"h3. My Title Goes Here".gsub(/^h3\. /, '') #=> "My Title Goes Here"
gsub means globally substitute and it replaces a pattern by a string, in this case an empty string.
The regular expression is enclosed in / and consists of:
^ means the beginning of a line (here, the beginning of the string)
h3 is matched literally, so it means h3
\. - a dot normally means any character, so we escape it with a backslash
(space) a single space is matched literally

Resources