Convert leading spaces to tabs in ruby - ruby

Given the following indented text:
two spaces
four
six
non-leading spaces
I'd like to convert every 2 leading spaces to a tab, essentially converting from soft tabs to hard tabs. I'm looking for the following result (using an 'x' instead of "\t"):
xtwo spaces
xxfour
xxxsix
non-leading spaces
What is the most efficient or eloquent way to do this in ruby?
What I have so far seems to be working, but it doesn't feel right.
input.gsub!(/^ {2}/,"x")
res = []
input.split(/\n/).each do |line|
while line =~ /^x+ {2}/
line.gsub!(/^(x+) {2}/,"\\1x")
end
res << line
end
puts res.join("\n")
I noticed the answer using sed and \G:
perl -pe '1 while s/\G {2}/\t/gc' input.txt >output.txt
But I can't figure out how to mimic the pattern in Ruby. This is as far as I got:
rep = 1
while input =~ /^x* {2}/ && rep < 10
input.gsub!(/\G {2}/,"x")
rep += 1
end
puts input

Whats wrong with using (?:^ {2})|\G {2} in multi-line mode?
The first match will always be at the beginning of the line,
then \G will match succesively right next to that, or the match
will fail. The next match will always be the beginning of the line.. repeats.
In Perl its $str =~ s/(?:^ {2})|\G {2}/x/mg; or $str =~ s/(?:^ {2})|\G {2}/\t/mg;
Ruby http://ideone.com/oZ4Os
input.gsub!(/(?:^ {2})|\G {2}/m,"x")
Edit: Of course the anchors can be factored out and put into an alternation
http://ideone.com/1oDOJ
input.gsub!(/(?:^|\G) {2}/m,"x")

You can just use a single gsub for that:
str.gsub(/^( {2})+/) { |spaces| "\t" * (spaces.length / 2) }

Related

What is the best way to delimit a csv files thats contain commas and double quotes?

Lets say I have the following string and I want the below output without requiring csv.
this, "what I need", to, do, "i, want, this", to, work
this
what i need
to
do
i, want, this
to
work
This problem is a classic case of the technique explained in this question to "regex-match a pattern, excluding..."
We can solve it with a beautifully-simple regex:
"([^"]+)"|[^, ]+
The left side of the alternation | matches complete "quotes" and captures the contents to Group1. The right side matches characters that are neither commas nor spaces, and we know they are the right ones because they were not matched by the expression on the left.
Option 2: Allowing Multiple Words
In your input, all tokens are single words, but if you also want the regex to work for my cat scratches, "what I need", your dog barks, use this:
"([^"]+)"|[^, ]+(?:[ ]*[^, ]+)*
The only difference is the addition of (?:[ ]*[^, ]+)* which optionally adds spaces + characters, zero or more times.
This program shows how to use the regex (see the results at the bottom of the online demo):
subject = 'this, "what I need", to, do, "i, want, this", to, work'
regex = /"([^"]+)"|[^, ]+/
# put Group 1 captures in an array
mymatches = []
subject.scan(regex) {|m|
$1.nil? ? mymatches << $& : mymatches << $1
}
mymatches.each { |x| puts x }
Output
this
what I need
to
do
i, want, this
to
work
Reference
How to match (or replace) a pattern except in situations s1, s2, s3...
Article about matching a pattern unless...

Display characters around regex match

Is it possible to display the characters around a regex match? I have the string below, and I want to substitute every occurrence of "change" while displaying the 3-5 characters before the match.
string = "val=change anotherval=change stringhere:change: foo=bar foofoo=barbar"
What I have so far
while line.match(/change/)
printf "\n\n Substitute the FIRST change below:\n"
printf "#{line}\n"
printf "\n\tSubstitute => "
substitution = gets.chomp
line = line.sub(/change/, "#{substitution}")
end
If you want to get down and dirty Perl style:
before_chars = $`[-3, 3]
This is the last three characters just before your pattern match.
You would likely use gsub! with block given in the following manner:
line = "val=change anotherval=change stringhere:change: foo=bar foofoo=barbar"
# line.gsub!(/(?<where>.{0,3})change/) {
line.gsub!(/(?<where>\S+)change/) {
printf "\n\n Substitute the change around #{Regexp.last_match[:where]} => \n"
substitution = gets.chomp
"#{Regexp.last_match[:where]}#{substitution}"
}
puts line
Yielding:
Substitute the change around val= =>
111
Substitute the change around anotherval= =>
222
Substitute the change around stringhere: =>
333
val=111 anotherval=222 stringhere:333: foo=bar foofoo=barbar
gsub! will substitute the matches in place, while more suitable pattern \S+ instead of commented out .{0,3} will give you an ability to print out the human-readable hint.
Alternative: Use $1 Match Variable
tadman's answer uses the special prematch variable ($`). Ruby will also store a capture group in a numbered variable, which is probably just as magical but possibly more intuitive. For example:
string = "val=change anotherval=change stringhere:change: foo=bar foofoo=barbar"
string.sub(/(.{3})?change/, "\\1#{substitution}")
$1
# => "al="
No matter what method you use, though, make sure you explicitly check your match variables for nils in the event that your last attempted match was unsuccessful.

Ruby remove empty lines from string

How do i remove empty lines from a string?
I have tried
some_string = some_string.gsub(/^$/, "");
and much more, but nothing works.
Remove blank lines:
str.gsub /^$\n/, ''
Note: unlike some of the other solutions, this one actually removes blank lines and not line breaks :)
>> a = "a\n\nb\n"
=> "a\n\nb\n"
>> a.gsub /^$\n/, ''
=> "a\nb\n"
Explanation: matches the start ^ and end $ of a line with nothing in between, followed by a line break.
Alternative, more explicit (though less elegant) solution:
str.each_line.reject{|x| x.strip == ""}.join
squeeze (or squeeze!) does just that - without a regex.
str.squeeze("\n")
Replace multiple newlines with a single one:
fixedstr = str.gsub(/\n\n+/, "\n")
or
str.gsub!(/\n\n+/, "\n")
You could try to replace all occurrences of 2 or more line breaks with just one:
my_string.gsub(/\n{2,}/, '\n')
Originally
some_string = some_string.gsub(/\n/,'')
Updated
some_string = some_string.gsub(/^$\n/,'')

Ruby: Remove whitespace chars at the beginning of a string

Edit: I solved this by using strip! to remove leading and trailing whitespaces as I show in this video. Then, I followed up by restoring the white space at the end of each string the array by iterating through and adding whitespace. This problem varies from the "dupe" as my intent is to keep the whitespace at the end. However, strip! will remove both the leading and trailing whitespace if that is your intent. (I would have made this an answer, but as this is incorrectly marked as a dupe, I could only edit my original question to include this.)
I have an array of words where I am trying to remove any whitespace that may exist at the beginning of the word instead of at the end. rstrip! just takes care of the end of a string. I want whitespaces removed from the beginning of a string.
example_array = ['peanut', ' butter', 'sammiches']
desired_output = ['peanut', 'butter', 'sammiches']
As you can see, not all elements in the array have the whitespace problem, so I can't just delete the first character as I would if all elements started with a single whitespace char.
Full code:
words = params[:word].gsub("\n", ",").delete("\r").split(",")
words.delete_if {|x| x == ""}
words.each do |e|
e.lstrip!
end
Sample text that a user may enter on the form:
Corn on the cob,
Fibonacci,
StackOverflow
Chat, Meta, About
Badges
Tags,,
Unanswered
Ask Question
String#lstrip (or String#lstrip!) is what you're after.
desired_output = example_array.map(&:lstrip)
More comments about your code:
delete_if {|x| x == ""} can be replaced with delete_if(&:empty?)
Except you want reject! because delete_if will only return a different array, rather than modify the existing one.
words.each {|e| e.lstrip!} can be replaced with words.each(&:lstrip!)
delete("\r") should be redundant if you're reading a windows-style text document on a Windows machine, or a Unix-style document on a Unix machine
split(",") can be replaced with split(", ") or split(/, */) (or /, ?/ if there should be at most one space)
So now it looks like:
words = params[:word].gsub("\n", ",").split(/, ?/)
words.reject!(&:empty?)
words.each(&:lstrip!)
I'd be able to give more advice if you had the sample text available.
Edit: Ok, here goes:
temp_array = text.split("\n").map do |line|
fields = line.split(/, */)
non_empty_fields = fields.reject(&:empty?)
end
temp_array.flatten(1)
The methods used are String#split, Enumerable#map, Enumerable#reject and Array#flatten.
Ruby also has libraries for parsing comma seperated files, but I think they're a little different between 1.8 and 1.9.
> ' string '.lstrip.chop
=> "string"
Strips both white spaces...

Ruby Regex match unless escaped with \

Using Ruby I'm trying to split the following text with a Regex
~foo\~\=bar =cheese~monkey
Where ~ or = denotes the beginning of match unless it is escaped with \
So it should match
~foo\~\=bar
then
=cheese
then
~monkey
I thought the following would work, but it doesn't.
([~=]([^~=]|\\=|\\~)+)(.*)
What is a better regex expression to use?
edit To be more specific, the above regex matches all occurrences of = and ~
edit Working solution. Here is what I came up with to solve the issue. I found that Ruby 1.8 has look ahead, but doesn't have lookbehind functionality. So after looking around a bit, I came across this post in comp.lang.ruby and completed it with the following:
# Iterates through the answer clauses
def split_apart clauses
reg = Regexp.new('.*?(?:[~=])(?!\\\\)', Regexp::MULTILINE)
# need to use reverse since Ruby 1.8 has look ahead, but not look behind
matches = clauses.reverse.scan(reg).reverse.map {|clause| clause.strip.reverse}
matches.each do |match|
yield match
end
end
What does "remove the head" mean in this context?
If you want to remove everything before a certain char, this will do:
.*?(?<!\\)= // anything up to the first "=" that is not preceded by "\"
.*?(?<!\\)~ // same, but for the squiggly "~"
.*?(?<!\\)(?=~) // same, but excluding the separator itself (if you need that)
Replace by "", repeat, done.
If your string has exactly three elements ("1=2~3") and you want to match all of them at once, you can use:
^(.*?(?<!\\)(?:=))(.*?(?<!\\)(?:~))(.*)$
matches: \~foo\~\=bar =cheese~monkey
| 1 | 2 | 3 |
Alternatively, you split the string using this regex:
(?<!\\)[=~]
returns: ['\~foo\~\=bar ', 'cheese', 'monkey'] for "\~foo\~\=bar =cheese~monkey"
returns: ['', 'foo\~\=bar ', 'cheese', 'monkey'] for "~foo\~\=bar =cheese~monkey"

Resources