Format output to 40 characters long per line - ruby

I'm fairly new to Ruby and I've been searching Google for a few hours now.
Does anyone know how to format the output of a print to be no more than 40 characters long?
For example:
What I want to print:
This is a simple sentence.
This simple
sentence appears
on four lines.
But I want it formatted as:
This is a simple sentence. This simple
sentence appears on four lines.
I have each line of the original put into an array.
So x = ["This is a simple sentence.", "This simple", "sentence appears", "on three lines."]
I tried x.each { |n| print n[0..40], " " } but it didn't seem to do anything.
Any help would be fantastic!

The method word_wrap expects a String and produces a kind of pretty print.
Your array is converted to a string with join("\n").
The code:
def word_wrap(text, line_width = 40)
  return text if line_width <= 0
  text.gsub(/\n/, ' ').gsub(/(.{1,#{line_width}})(\s+|$)/, "\\1\n").strip
end
x = ["This is a simple sentence.", "This simple", "sentence appears", "on three lines."]
puts word_wrap(x.join("\n"))
x << 'a' * 50 #To show what happens with long words
x << 'end'
puts word_wrap(x.join("\n"))
Code explanation:
x.join("\n") builds a string, and text.gsub(/\n/, ' ') then turns it into one long line.
In this special case these two steps could be merged: x.join(" ")
And now the magic happens with
gsub(/(.{1,#{line_width}})(\s+|$)/, "\\1\n")
(.{1,#{line_width}}): match any characters, from 1 up to line_width of them.
(\s+|$): the match must be followed by whitespace or the end of the line. In other words, the first group may match fewer than line_width characters, backing off until the character after it is whitespace.
"\\1\n": replace the match with the captured text (up to line_width characters) followed by a newline.
gsub repeats this for every match until it reaches the end of the string.
Finally, strip removes leading and trailing whitespace, including the final newline.
I also added a long word (50 a's). What happens? The regex cannot find a break point inside the word, so the word is left intact on its own line, even though it is longer than line_width.
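To see the long-word behavior concretely, the sketch below repeats word_wrap (so it runs on its own), appends the 50-character word and 'end', and prints each output line with its length. Only the long word exceeds 40 characters, because the regex has nowhere to break it.

```ruby
# word_wrap as defined above, repeated so this snippet runs on its own
def word_wrap(text, line_width = 40)
  return text if line_width <= 0
  text.gsub(/\n/, ' ').gsub(/(.{1,#{line_width}})(\s+|$)/, "\\1\n").strip
end

x = ["This is a simple sentence.", "This simple", "sentence appears", "on three lines."]
x += ['a' * 50, 'end']

wrapped = word_wrap(x.join(" "))
# Print each output line with its length to see which ones fit in 40
wrapped.each_line do |line|
  line = line.chomp
  puts "#{line.length.to_s.rjust(2)}: #{line}"
end
```

The lengths printed are 38, 32, 50 and 3.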

puts x.join(" ").scan(/(.{1,40})(?:\s|$)/m)
This is a simple sentence. This simple
sentence appears on three lines.

Ruby 1.9 (and not overly efficient):
>> x.join(" ").each_char.each_slice(40).to_a.map(&:join)
=> ["This is a simple sentence. This simple s", "entence appears on three lines."]
The reason your solution doesn't work is that all the individual strings are shorter than 40 characters, so n[0..40] always returns the entire string.
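A quick check of that explanation. It also shows a detail of Ruby ranges: 0..40 is inclusive, so it covers 41 characters, while n[0, 40] or n[0...40] takes exactly 40.

```ruby
short = "This is a simple sentence."
# The range runs past the end of the string, so the whole string comes back
p short[0..40] == short   # => true

long = "a" * 50
p long[0..40].length      # => 41 (0..40 includes index 40)
p long[0, 40].length      # => 40 (start index, length)
p long[0...40].length     # => 40 (exclusive range)
```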

Related

How to remove strings that end with a particular character in Ruby

Based on "How to Delete Strings that Start with Certain Characters in Ruby", I know that the way to remove a string that starts with the character "#" is:
email = email.gsub( /(?:\s|^)#.*/ , "") #removes strings that start with "#"
I want to also remove strings that end in ".". Inspired by "Difference between \A \z and ^ $ in Ruby regular expressions" I came up with:
email = email.gsub( /(?:\s|$).*\./ , "")
Basically, I swapped the caret (^) for the dollar sign ($) and reversed the order of the part after the closing parenthesis (making sure to escape the period). However, it is not doing the trick.
An example I'd like to match and remove is:
"a8&23q2aas."
You were so close.
email = email.gsub( /.*\.\s*$/ , "")
The difference is that you didn't consider the relationship between the string you are matching against and the regex tokens that describe the condition you want to trigger. Here, you are trying to find a period (\.) followed only by optional whitespace (\s*) and then the end of the line ($). I would read the regex above as "Any characters of any length, followed by a period, followed by any amount of whitespace, followed by the end of the line."
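A few checks of both patterns (inputs other than "a8&23q2aas." are made up for illustration). The question's pattern finds nothing because (?:\s|$) has to match before the period, and this string has no whitespace; its only $ position is the very end, where no period can follow.

```ruby
asker  = /(?:\s|$).*\./   # pattern from the question
answer = /.*\.\s*$/       # pattern from this answer

p "a8&23q2aas.".gsub(asker, "")    # => "a8&23q2aas." (no match)
p "a8&23q2aas.".gsub(answer, "")   # => ""
p "a8&23q2aas. ".gsub(answer, "")  # => "" (trailing whitespace is allowed)
p "a8&23q2aas".gsub(answer, "")    # => "a8&23q2aas" (no period, untouched)
# In Ruby, $ is end of *line*, so only the line ending in "." is removed
p "keep\nremove.".gsub(answer, "") # => "keep\n"
```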
As commenters pointed out, though, there's a simpler way: String#end_with?.
I'd use:
words = %w[#a day in the life.]
# => ["#a", "day", "in", "the", "life."]
words.reject { |w| w.start_with?('#') || w.end_with?('.') }
# => ["day", "in", "the"]
Using a regex is overkill for this if you're only concerned with the starting or ending character, and, in fact, regular expressions will slow your code in comparison with using the built-in methods.
I would really like to stick to using gsub...
gsub is the wrong way to remove an element from an array. It could be used to turn the string into an empty string, but that won't remove that element from the array.
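To make that concrete, the sketch below blanks the unwanted words with gsub (the combined regex is an illustration, not from the question) and compares the result with reject.

```ruby
words = %w[#a day in the life.]

# gsub can only rewrite each string; unwanted ones become "" but stay in the array
blanked = words.map { |w| w.gsub(/\A#.*|.*\.\z/, "") }
p blanked   # => ["", "day", "in", "the", ""]

# reject (with the built-in predicates) actually removes the elements
kept = words.reject { |w| w.start_with?("#") || w.end_with?(".") }
p kept      # => ["day", "in", "the"]
```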
def replace_suffix(str, suffix)
  str.end_with?(suffix) ? str[0, str.length - suffix.length] : str
end
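If you are on Ruby 2.5 or later (an assumption; the question does not say which version), String#delete_suffix does the same job as replace_suffix without custom code. Like replace_suffix, it only trims the string; it does not remove anything from an array.

```ruby
# String#delete_suffix needs Ruby 2.5 or later
email = "a8&23q2aas."
p email.delete_suffix(".")           # => "a8&23q2aas"
p "no period".delete_suffix(".")     # => "no period" (unchanged)

# Applied to a list, it trims the suffix but keeps every element
words = %w[#a day in the life.]
p words.map { |w| w.delete_suffix(".") }  # => ["#a", "day", "in", "the", "life"]
```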

What's different about this ruby regex?

I was trying to substitute either a comma or a percent sign, and it kept failing, so I opened IRB and tried some things out. Can anyone explain why the first regex below (/[%]*|[,]*/) doesn't work but the flipped version (/[,]*|[%]*/) does? I've looked it up and down and I don't see any typos, so it must be something to do with the rule, but I can't see what.
b.gsub(/[%]*|[,]*/,"")
# => "245,324"
b.gsub(/[,]*/,"")
# => "245324"
b.gsub(/[,]*|[%]*/,"")
# => "245324"
b
# => "245,324"
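Because [%]* can match zero characters, Ruby happily finds [%]* zero times at every position in your string and substitutes that empty match. Since the first alternative already succeeds, the [,]* alternative is never tried, so the comma survives. Check out this result: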
b = '232,000'
puts b.gsub(/[%]*/,"-")
--output:--
-2-3-2-,-0-0-0-
If you put all the characters that you want to erase into the same character class, then you will get the result you want:
b = "%245,324,000%"
puts b.gsub(/[%,]*/, '')
--output:--
245324000
Even then, there are a lot of needless substitutions going on:
b = "%245,324,000%"
puts b.gsub(/[%,]*/, '-')
--output:--
--2-4-5--3-2-4--0-0-0--
It's the zero-or-more quantifier (*) that gets you into trouble, because Ruby can find lots of places where there are 0 percent signs or 0 commas. You don't want substitutions where Ruby finds zero of your characters; instead, you want substitutions where at least one of your characters occurs:
b = '%232,000,000%'
puts b.gsub(/%+|,+/,"")
--output:--
232000000
Or, equivalently:
puts b.gsub(/[%,]+/, '')
Also, note that regexes are like double-quoted strings, so you can interpolate into them, as if the // delimiters were double quotes:
one_or_more_percents = '%+'
one_or_more_commas = ',+'
b = '%232,000,000%'
puts b.gsub(/#{one_or_more_percents}|#{one_or_more_commas}/,"")
--output:--
232000000
But when your regexes consist of single characters, just use a character class: [%,]+
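For removing single characters, Ruby also has String#delete, which takes a set of characters and removes every occurrence without a regex. A quick comparison on the answer's sample string:

```ruby
b = "%245,324,000%"

# Character class with +: remove each run of % or , characters
p b.gsub(/[%,]+/, "")   # => "245324000"

# String#delete removes every character in the given set
p b.delete("%,")        # => "245324000"
```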

Extract first word from a line in a file using Ruby

How do I get the first word from each line? Thanks to help from someone on Stack Overflow, I am working with the code below:
File.open("pastie.rb", "r") do |file|
  while (line = file.gets)
    next if (line[0,1] == " ")
    labwords = line.split.first
    print labwords.join(' ')
  end
end
It extracts the first word from each line, but it has problems with spaces. I need help adjusting it. I need to use the first method, but I don't know how to use it.
If you want the first word from each line from a file:
first_words = File.read(file_name).lines.map { |l| l.split(/\s+/).first }
It's pretty simple. Let's break it apart:
File.read(file_name)
Reads the entire contents of the file and returns it as a string.
.lines
Splits a string by newline characters (\n) and returns an array of strings. Each string represents a "line."
.map { |l| ... }
Array#map calls the provided block passing in each item and taking the return value of the block to build a new array. Once Array#map finishes it returns the array containing new values. This allows you to transform the values. In the sample block here |l| is the block params portion meaning we're taking one argument and we'll reference it as l.
|l| l.split(/\s+/).first
This is the block body; I've included the block params here too for completeness. Here we split the line by /\s+/. This is a regular expression: \s means any whitespace character (such as space, \t, \r and \n) and the + following it means one or more, so \s+ matches one or more whitespace characters, as many consecutive ones as possible. Passing this to String#split returns an array of the substrings that occur between occurrences of the given separator. Since our separator is one or more whitespace characters, we get everything between whitespace. If we had the string "A list of words" we'd get ["A", "list", "of", "words"] after the split call. It's very useful. Finally, we call .first, which returns the first element of the array (in this case, the first word).
Now, in Ruby the value of the last expression evaluated in a block is automatically returned, so our first word is returned, and since this block is passed to map we get an array of the first words from the file. To demonstrate, assume our file contains:
This is line one
And line two here
Don't forget about line three
Line four is very board
Line five is the best
It all ends with line six
Running this through the line above we get:
["This", "And", "Don't", "Line", "Line", "It"]
Which is the first word from each line.
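One caveat with split(/\s+/): when a line starts with whitespace, the first field is an empty string, which is exactly the "problems with spaces" from the question. Calling split with no argument ignores leading whitespace. The sample lines below are made up for illustration.

```ruby
lines = ["  indented line", "normal line", "", "\ttabbed line"]

# Splitting on /\s+/ keeps an empty first field when a line starts with whitespace
p lines.map { |l| l.split(/\s+/).first }  # => ["", "normal", nil, ""]

# With no argument, split skips leading whitespace (awk-style)
p lines.map { |l| l.split.first }         # => ["indented", "normal", nil, "tabbed"]

# compact drops the nil produced by the blank line
p lines.map { |l| l.split.first }.compact # => ["indented", "normal", "tabbed"]
```

With a real file, the same idea reads as File.foreach(file_name).map { |l| l.split.first }.compact.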
Consider this:
def first_words_from_file(file_name)
  # readlines keeps each "\n", so strip before dropping blank lines
  lines = File.readlines(file_name).map(&:strip).reject(&:empty?)
  lines.map do |line|
    line.split.first
  end
end
puts first_words_from_file('pastie.rb')

Matching English words algorithm stops working when using Ruby bang methods

I am writing a matching algorithm that checks a user-entered word against a huge list of English words to see how many matches it can find. Everything works, except I have two lines of code that are essentially meant to not pick the same letters twice, and they make the whole thing just return a single letter. Here is what I've done:
word_array = []
File.open("wordsEn.txt").each do |line|
  word_array << line.chomp
end
puts "Please enter a string of characters with no spaces:"
user_string = gets.chomp.downcase
user_string_array = user_string.split("")
matching_words = []
word_array.each do |word|
  one_array = word.split("")
  tmp_user_string_array = user_string_array
  letter_counter = 0
  for i in 0...word.length
    if tmp_user_string_array.include? one_array[i]
      letter_counter += 1
      string_index = tmp_user_string_array.index(one_array[i])
      tmp_user_string_array.slice!(string_index)
    end
  end
  if letter_counter == word.length
    matching_words << word
  end
end
puts matching_words
This part here is what breaks it:
string_index = tmp_user_string_array.index(one_array[i])
tmp_user_string_array.slice!(string_index)
Can anyone see an issue here? It all makes sense to me.
I see what's happening. You're eliminating letters for non-matching words, which prevents matching words from being found.
For example, take this word list:
ant
bear
cat
dog
emu
And this input to your program:
catdog
The first word you look for is ant, which causes the a and t to be sliced out of catdog, leaving cdog. Now the word cat can no longer be found.
The cure is to make sure that your tmp_user_string_array really is a temporary array. Currently it's a reference to the original user_string_array, which means that you're destructively modifying the user input. You should make a copy of it before you start slicing and dicing.
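A small sketch of the aliasing problem and the Array#dup fix, using the catdog input from the example above:

```ruby
user_string_array = "catdog".split("")

# Plain assignment: both names refer to the same array object
tmp = user_string_array
tmp.slice!(tmp.index("a"))
p user_string_array        # => ["c", "t", "d", "o", "g"] (the input was modified)

user_string_array = "catdog".split("")

# dup makes a new (shallow) copy, so slicing it leaves the original alone
tmp = user_string_array.dup
tmp.slice!(tmp.index("a"))
p user_string_array        # => ["c", "a", "t", "d", "o", "g"]
p tmp                      # => ["c", "t", "d", "o", "g"]
```

In the question's loop, that means writing tmp_user_string_array = user_string_array.dup inside word_array.each.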
Once you've got that working, you might like to think about more efficient approaches that don't require duplicating and slicing arrays. Consider this: what if you were to sort each word of your lexicon as well as the input string before starting to look for a match? This would turn the word cat into act and the input catdog into acdgot. Do you see how you could traverse the sorted word and the sorted input in search of a match without the need to do any slicing?
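One possible sketch of that idea (the method name can_build? is made up for illustration): sort the letters of both strings, then walk the two sorted arrays together. Each letter of the word must meet an equal letter in the input; smaller input letters can be skipped, but if the input's current letter is already larger, the word needs a letter the input does not have.

```ruby
# Hypothetical helper: true if every letter of word (counting repeats)
# is available in input. Both strings are sorted, then walked in step.
def can_build?(word, input)
  w = word.chars.sort
  s = input.chars.sort
  i = 0 # position in sorted word
  j = 0 # position in sorted input
  while i < w.length && j < s.length
    if w[i] == s[j]
      i += 1       # letter matched: consume it from both sides
      j += 1
    elsif w[i] > s[j]
      j += 1       # input letter is smaller and not needed: skip it
    else
      return false # word needs a letter the input lacks
    end
  end
  i == w.length    # matched only if every word letter was consumed
end

words = %w[ant bear cat dog emu]
p words.select { |word| can_build?(word, "catdog") }  # => ["cat", "dog"]
```

For a large word list you would sort the input once outside the loop rather than on every call.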

How do the anchors \z and \G work in Ruby?

I am using Ruby 1.9.3. I am new to this platform.
From the docs I just became familiar with two anchors, \z and \G. I played with \z a little to see how it works, because its definition ("End of String") confused me: I can't understand what "End" means here. So I tried the small snippets below, but I still can't figure it out.
CODE
irb(main):011:0> str = "Hit him on the head me 2\n" + "Hit him on the head wit>
=> "Hit him on the head me 2\nHit him on the head with a 24\n"
irb(main):012:0> str =~ /\d\z/
=> nil
irb(main):013:0> str = "Hit him on the head me 24 2\n" + "Hit him on the head >
=> "Hit him on the head me 24 2\nHit him on the head with a 24\n"
irb(main):014:0> str =~ /\d\z/
=> nil
irb(main):018:0> str = "Hit1 him on the head me 24 2\n" + "Hit him on the head>
=> "Hit1 him on the head me 24 2\nHit him on the head with a11 11 24\n"
irb(main):019:0> str =~ /\d\z/
=> nil
irb(main):020:0>
Every time I got nil as the output. So how is \z evaluated, and what does "End" mean? I think I have misunderstood the word "End" in the docs. Could anyone help me understand what is happening and why?
I also couldn't find any example for the anchor \G. Could someone give an example that shows how \G is used in real-world programming?
EDIT
irb(main):029:0>
irb(main):030:0* ("{123}{45}{6789}").scan(/\G(?!^)\{\d+\}/)
=> []
irb(main):031:0> ('{123}{45}{6789}').scan(/\G(?!^)\{\d+\}/)
=> []
irb(main):032:0>
Thanks
\z matches the end of the input. You are trying to find a match where 4 occurs at the end of the input. The problem is that there is a newline at the end of the input, so you don't find a match. \Z matches either at the end of the input or just before a newline at the end of the input.
So:
/\d\z/
matches the "4" in:
"24"
and:
/\d\Z/
matches the "4" in the above example and the "4" in:
"24\n"
Check out this question for examples of using \G:
Examples of regex matcher \G (The end of the previous match) in Java would be nice
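Back to \z: applied to the first string from the question, the difference between \z, \Z and $ looks like this (each result is the index where the match starts, or nil). Note that in Ruby $ means end of line, not end of string, so it finds the 2 at the end of the first line.

```ruby
str = "Hit him on the head me 2\n" + "Hit him on the head with a 24\n"

p(str =~ /\d\z/)        # => nil: the very last character is "\n", not a digit
p(str =~ /\d\Z/)        # => 53: \Z may also match just before a final "\n"
p(str.chomp =~ /\d\z/)  # => 53: with the trailing newline removed, \z works
p(str =~ /\d$/)         # => 23: in Ruby, $ matches at the end of every line
```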
UPDATE: Real-World uses for \G
I came up with a more real-world example. Say you have a list of words that are separated by arbitrary characters that cannot be well predicted (or there are too many possibilities to list). You'd like to match these words, each word as its own match, up until a particular word, after which you don't want to match any more words. For example:
foo,bar.baz:buz'fuzz*hoo-har/haz|fil^bil!bak
You want to match each word until 'har'. You don't want to match 'har' or any of the words that follow. You can do this relatively easily using the following pattern:
/(?<=^|\G\W)\w+\b(?<!har)/
The first attempt matches at the beginning of the input (the ^ alternative of the lookbehind, so zero non-word characters), followed by 3 word characters ('foo'), followed by a word boundary. Finally, a negative lookbehind ensures that the word which has just been matched is not 'har'.
On the second attempt, matching picks back up at the end of the last match. One non-word character (',') is matched by the lookbehind, though it is not part of the match because a lookbehind is a zero-width assertion, followed by 3 word characters ('bar').
This continues until 'har' is matched, at which point the negative lookbehind is triggered and the match fails. Because all matches are supposed to be "attached" to the last successful match, no additional words will be matched.
The result is:
foo
bar
baz
buz
fuzz
hoo
If you want to reverse it and have all words after 'har' (but, again, not including 'har'), you can use an expression like this:
/(?!^)(?<=har\W|\G\W)\w+\b/
This will match a word that is immediately preceded either by 'har' plus one non-word character, or by the end of the last match plus one non-word character (except we have to make sure not to match at the beginning of the input). The list of matches is:
haz
fil
bil
bak
If you do want to match 'har' and all following words, you could use this:
/\bhar\b|(?!^)(?<=\G\W)\w+\b/
This produces the following matches:
har
haz
fil
bil
bak
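For the simpler case in the question's EDIT, a minimal sketch of how \G behaves with String#scan (the second input string is made up for illustration): each match must begin exactly where the previous one ended, and the first match must begin where scanning starts. That is also why the question's pattern returns []: (?!^) forbids the first match at position 0, and \G forbids starting anywhere else.

```ruby
# \G chains matches: each one must start where the previous one ended
p "{123}{45}{6789}".scan(/\G\{\d+\}/)      # => ["{123}", "{45}", "{6789}"]

# A gap breaks the chain: the next match would have to start at "x"
p "{123}x{45}{6789}".scan(/\G\{\d+\}/)     # => ["{123}"]

# Without \G, scan finds every group wherever it is
p "{123}x{45}{6789}".scan(/\{\d+\}/)       # => ["{123}", "{45}", "{6789}"]

# The question's pattern: no position satisfies both \G and (?!^)
p "{123}{45}{6789}".scan(/\G(?!^)\{\d+\}/) # => []
```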
It sounds like you want to know how regexes work. Or do you want to know how regexes work in Ruby specifically?
Check these out.
Regexp Class description
The Regex Coach - Great for testing regex matching
Regex cheat sheet
I understand \G to be a boundary matcher: it requires the next match to start at the end of the last match. Perhaps since you haven't made a match yet, you can't have a second one.
Here is the best example I can find. It's not in Ruby, but the concept should be the same.
I take it back; this might be more useful.
