Ruby how to remove a more than one space character? - ruby

Ruby
Okay, I want to remove a more than one space character in a strings if there's any. What I mean is, let's say I have a text like this:
I want to learn ruby more and more.
See there's a more than one space character after "to" and before "learn" either it a tab or just a several spaces. Now what I want is, how can I know if there's something like this in a text file, and I want to make it just one space per word or string. So it will become like this
I want to learn ruby more and more.
Can I use Gsub? or do I need to use other method? I tried Gsub, but can't figure out how to implement it the right way so it can produce the result I want. Hopefully I explained it clear. Any help is appreciated, thanks.

String#squeeze remove runs of more than one character:
'I want to learn ruby more and more.'.squeeze(' ')
# => "I want to learn ruby more and more."

You can use gsub to replace one or more whitespace (regex / +/) to a single whitespace:
'I want to learn ruby more and more.'.gsub(/ +/, " ")
#=> "I want to learn ruby more and more."

Use this regex to remove all whitespace from a string, including spaces and also tabs. I use this for stripping whitespace from email addresses on login fields.
' I want to learn ruby more and more.'.gsub(/\s/,"")
# => "Iwanttolearnrubymoreandmore."
The /\s/ matches any whitespace character including tabs, whereas / +/ won't.

Related

Escape whitespace characters in Ruby

I have a string:
"hello\n\nsomeletters\t\nmoreletters\n"
What I want:
"hello\\n\\nsomeletters\\t\\nmoreletters\\n"
How to do it?
I know a gsub way. But it sounds very simple and seems to be a common problem therefore I am sure that Ruby Gods have already sent us a solution.
There are different possibilities. The closest to what you want would be Regexp#escape:
Regexp.escape "hello\n\nsomeletters\t\nmoreletters\n"
#⇒ "hello\\n\\nsomeletters\\t\\nmoreletters\\n"
But be aware it will escape some other symbols having a special meaning in regular expressions.
Also, we have Shellwords#escape, which is probably not what you want here.
For escaping backslashes only there is no dedicated method because this operation basically has a little sense and it is not worth it to call it instead of:
"hello\n\nsomeletters\t\nmoreletters\n".gsub(
/\n|\t/, {"\n" => "\\n", "\t" => "\\t"}
)
Please note, there are no slash characters in the initial string, hence you are to match all the expected sequences.

delete matched characters using regex in ruby

I need to write a regex for the following text:
"How can you restate your point (something like: \"<font>First</font>\") as a clear topic?"
that keeps whatever is between the
\" \"
characters (in this case <font>First</font>
I came up with this:
/"How can you restate your point \(something like: |\) as a clear topic\?"/
but how do I get ruby to remove the unwanted surrounding text and only return <font>First</font>?
lookbehind, lookahead and making what is greedy, lazy.
str[/(?<=\").+?(?=\")/] #=> "<font>First</font>"
If you have strings just like that, you can .split and get the first:
> str.split(/"/)[1]
=> "<font>First</font>"
You certainly can use a regular expression, but you don't need to:
str = "How can you restate (like: \"<font>First</font>\") as a clear topic?"
str[str.index('"')+1...str.rindex('"')]
#=> "<font>First</font>"
or, for those like me who never use three dots:
str[str.index('"')+1..str.rindex('"')-1]

Convert Ruby string to *nix filename-compatible string

In Ruby I have an arbitrary string, and I'd like to convert it to something that is a valid Unix/Linux filename. It doesn't matter what it looks like in its final form, as long as it is visually recognizable as the string it started as. Some possible examples:
"Here's my string!" => "Heres_my_string"
"* is an asterisk, you see" => "is_an_asterisk_you_see"
Is there anything built-in (maybe in the file libraries) that will accomplish this (or close to this)?
By your specifications, you could accomplish this with a regex replacement. This regex will match all characters other than basic letters and digits:
s/[^\w\s_-]+//g
This will remove any extra whitespace in between words, as shown in your examples:
s/(^|\b\s)\s+($|\s?\b)/\\1\\2/g
And lastly, replace the remaining spaces with underscores:
s/\s+/_/g
Here it is in Ruby:
def friendly_filename(filename)
filename.gsub(/[^\w\s_-]+/, '')
.gsub(/(^|\b\s)\s+($|\s?\b)/, '\\1\\2')
.gsub(/\s+/, '_')
end
First, I see that it was asked purely in ruby, and second that it's not the same purpose (*nix filename compatible), but if you are using Rails, there is a method called parameterize that should help.
In rails console:
"Here's my string!".parameterize => "here-s-my-string"
"* is an asterisk, you see".parameterize => "is-an-asterisk-you-see"
I think that parameterize, as being compliant with URL specifications, may work as well with filenames :)
You can see more about here:
http://api.rubyonrails.org/classes/ActiveSupport/Inflector.html#method-i-parameterize
There's also a whole lot of another helpful methods.

regex to match trailing whitespace, but not lines which are entirely whitespace (indent placeholders)

I've been trying to construct a ruby regex which matches trailing spaces - but not indentation placeholders - so I can gsub them out.
I had this /\b[\t ]+$/ and it was working a treat until I realised it only works when the line ends are [a-zA-Z]. :-( So I evolved it into this /(?!^[\t ]+)[\t ]+$/ and it seems like it's getting better, but it still doesn't work properly. I've spent hours trying to get this to work to no avail. Please help.
Here's some text test so it's easy to throw into Rubular, but the indent lines are getting stripped so it'll need a few spaces and/or tabs. Once lines 3 & 4 have spaces back in, it shouldn't match on lines 3-5, 7, 9.
some test test
some test test
some other test (text)
some other test (text)
likely here{ dfdf }
likely here{ dfdf }
and this ;
and this ;
Alternatively, is there an simpler / more elegant way to do this?
If you're using 1.9, you can use look-behind:
/(?<=\S)[\t ]+$/
but unfortunately, it's not supported in older versions of ruby, so you'll have to handle the captured character:
str.gsub(/(\S)[\t ]+$/) { $1 }
Your first expression is close, and you just need to change the \b to a negated character class. This should work better:
/([^\t ])[\t ]+$
In plain words, this matches all tabs and spaces on lines that follow a character that is not a tab or a space.
Wouldn't this help?
/([^\t ])([\t ]+)$/
You need to do something with the matched last non-space character, though.
edit: oh, you meant non blank lines. Then you would need something like /([^\s])\s+/ and sub them with the first part
I'm not entirely sure what you are asking for, but wouldn't something like this work if you just want to capture the trailing whitespaces?
([\s]+)$
or if you only wanted to capture tabs
([ \t]+)$
Since regexes are greedy, they'll capture as much as they can. You don't really need to give them context beforehand if you know what you want to capture.
I still am not sure what you mean by trailing indentation placeholders, so I'm sorry if I'm misunderstanding.
perhaps this...
[\t|\s]+?$
or
[ ]+$

how to convert strings like "this is an example" to "this-is-an-example" under ruby

How do I convert strings like "this is an example" to "this-is-an-example" under ruby?
The simplest version:
"this is an example".tr(" ", "-")
#=> "this-is-an-example"
You could also do something like this, which is slightly more robust and easier to extend by updating the regular expression:
"this is an example".gsub(/\s+/, "-")
#=> "this-is-an-example"
The above will replace all chunks of white space (any combination of multiple spaces, tabs, newlines) to a single dash.
See the String class reference for more details about the methods that can be used to manipulate strings in Ruby.
If you are trying to generate a string that can be used in a URL, you should also consider stripping other non-alphanumeric characters (especially the ones that have special meaning in URLs), or replacing them with an alphanumeric equivalent (example, as suggested by Rob Cameron in his answer).
If you are trying to make something that is a good URL slug, there are lots of ways to do it.
Generally, you want to remove everything that is not a letter or number, and then replace all whitespace characters with dashes.
So:
s = "this is an 'example'"
s = s.gsub(/\W+/, ' ').strip
s = s.gsub(/\s+/,'-')
At the end s will equal "this-is-an-example"
I used the source code from a ruby testing library called contest to get this particular way to do it.
If you're using Rails take a look at parameterize(), it does exactly what you're looking for:
http://api.rubyonrails.org/classes/ActiveSupport/CoreExtensions/String/Inflections.html#M001367
foo = "Hello, world!"
foo.parameterize => 'hello-world'

Resources