Alternative code to read and process array by newline in Ruby

My code is supposed to read a file on the server, store its content in an Array, then read the array elements (eventually each element is a line) and split each line into 7 parts by (:)
I wrote this code and it works 100% fine.
lines = File.readlines('/etc/passwd')
lines.each do |line|
line = line.chomp # remove the trailing \n (chomp! would return nil when there is nothing to remove)
line_arr = line.split(/:/)
puts line_arr.inspect
puts "*************"
end
I just want to know if there is a shortcut to do this since each element of the array ends with \n.
Maybe I am a bit confused between array elements that end with \n and a string that contains \n.
the content of the file looks like this
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
lp:x:7:7:lp:/var/spool/lpd:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
uucp:x:10:10:uucp:/var/spool/uucp:/bin/sh
As for the output, there's no specific format, because I am going to use this part and extend my code later. As long as I can access those 7 parts that I extracted into line_arr, I should be fine.
thank you

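require 'etc'
# Collect the seven standard passwd fields for every user. Extra members such as
# change, uclass and expire exist only on some platforms (e.g. BSD), so they are left out.
[].tap {|ary| Etc.passwd {|u|
ary << [u.name, u.passwd, u.uid, u.gid, u.gecos, u.dir, u.shell]
}}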
Rule of thumb: never try to reimplement behavior that someone else has already written for you. Unless you are really, really, really, REALLY smart.
Actually, now that you have edited your question, I don't even see why you need those arrays in the first place and cannot just use the Etc.passwd iterator and Struct::Passwd directly.
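If the goal is only the shortcut asked about (dropping the trailing \n and splitting in one pass), a minimal sketch could look like the following. It assumes Ruby 2.4 or later for the chomp: true keyword. The sample data and the temporary file are stand-ins for /etc/passwd, and records is a name introduced here for illustration.

```ruby
require 'tempfile'

# Hypothetical sample standing in for /etc/passwd.
sample = <<~PASSWD
  root:x:0:0:root:/root:/bin/bash
  daemon:x:1:1:daemon:/usr/sbin:/bin/sh
PASSWD

records = []
Tempfile.create('passwd_sample') do |f|
  f.write(sample)
  f.flush
  # chomp: true strips the line ending while reading, so no separate chomp call is needed.
  # The limit of 7 keeps empty trailing fields and caps each record at seven parts.
  records = File.foreach(f.path, chomp: true).map { |line| line.split(':', 7) }
end

records.each { |fields| p fields }
```

Each element of records is an array of seven strings, so the individual parts can be reached with ordinary indexing, for example records[0][6] for the shell of the first entry.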

Related

How to preserve format of content while writing to another file?

I'm reading some content from a file and use a regex and scan to discard a few things in the file and write the content to another file.
If I look at the newly written file, it has escape characters and "\n" in the file instead of actual new line.
filea.txt is:
test
in run
]
}
end
I'm getting the content between 'test' and 'end' using:
file = File.open('filea.txt', 'r')
result = file.read
regex = /(?<=test) .*?(?=end)/mx
ans = result.scan(regex)
Writing ans to a new file like fileb.txt puts:
in run'\"\n ]\n }
But, if I try writing the entire result, then it has correct content format in fileb.txt.
Your question isn't clear and needs work, but you're using read in a way that can cause scalability problems.
Here's how to accomplish the same sort of task without using read:
content = []
DATA.each_line do |li|
marker = li.lstrip
if marker =~ /^in run/i .. marker =~ /^end of file/i
content << li
end
end
content # => ["in run\n", "]\n", "}\n", "end of file\n"]
__END__
test file
in run
]
}
end of file
The .. (two-dot range operator) is a multitool in Ruby (and other languages). We use it to define ranges, but it can also flip-flop between logic states. In this case I'm using it in the second form, a "flip-flop".
When Ruby runs the code it checks
marker =~ /^in run/i
If that is false the if fails and the code continues. If
marker =~ /^in run/i
succeeds, Ruby will remember that it succeeded and immediately test for
marker =~ /^end of file/i
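Whether or not that second test succeeds, Ruby will fall into the if block for this line and do whatever is inside the block, then continue as normal. If the second test failed, the flip-flop stays switched on.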
The next loop of each_line will hit the if tests and .. will remember that
marker =~ /^in run/i
succeeded previously and immediately test the second condition. If it is true it steps into the block and resets itself to a false again, so that any subsequent loops will fail until
marker =~ /^in run/i
returns true again.
This logic is really powerful and makes it easy to build code that can scan huge files, extracting portions of them.
There are other ways to do it but they generally run into messier logic.
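To make the on/off behaviour described above easier to see, here is a small self-contained sketch that runs the same flip-flop over an in-memory StringIO instead of DATA. The input text, including the extra "trailing line", is made up for illustration.

```ruby
require 'stringio'

# Hypothetical input standing in for a real file.
input = StringIO.new("test file\nin run\n]\n}\nend of file\ntrailing line\n")

selected = []
input.each_line do |li|
  marker = li.lstrip
  # Turns on when a line starts with "in run", stays on, and turns off
  # after a line that starts with "end of file" has been collected.
  if marker =~ /^in run/i .. marker =~ /^end of file/i
    selected << li
  end
end

p selected
```

The line before the start marker and the line after the end marker are skipped, while both marker lines themselves are collected.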
In the example code I'm also using __END__ which has some rarely seen magic to it also. You'll want to read about __END__ and DATA if you don't understand what's happening.
If you're dealing with files in the GB or TB range, with lots of content you're grabbing, it might be smart to not accumulate too much into your data-gathering array content. A minor tweak will keep that from happening:
if marker =~ /^in run/i .. marker =~ /^end of file/i
content << li
next
end
unless content.empty?
# do something that clears content:
end
In this code I'm using DATA.each_line. In real life you'd want to use File.foreach instead.
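As for the escaped "\n" reported in the question, one plausible cause (an assumption, since the writing code is not shown) is that ans was written as-is. String#scan returns an Array, and writing an Array to a file writes its to_s form, which is the inspect-style text with quotes and escapes. The sketch below uses a made-up string in place of filea.txt and shows the difference between that form and the joined text.

```ruby
# Hypothetical stand-in for the contents of filea.txt.
result = "test\nin run\n]\n}\nend\n"
regex = /(?<=test) .*?(?=end)/mx

ans = result.scan(regex)   # an Array of matches, not a String

as_written = ans.to_s      # what ends up in the file if the Array itself is written
as_text    = ans.join      # the matched text with real newlines

puts as_written
puts as_text
```

Writing as_text (or ans.first when only one match is expected) instead of ans should keep the original line breaks.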

How far does .each read? To the end of the line?

Sorry for the newbie question. Was loading a .txt file into the following code:
line_count = 0
File.open("text.txt").each {|line| line_count += 1}
puts line_count
Does .each simply read until the end of a line before passing its value to the code block? Little explanation would be great. Thanks!
You can use .each_line to be more explicit, but yes, each (http://www.ruby-doc.org/core-2.0.0/IO.html#method-i-each) reads a line.
f = File.new("testfile")
f.each {|line| puts "#{f.lineno}: #{line}" }
It's really important to read the documentation, because all sorts of things are explained there. For instance, the documentation for each says:
Executes the block for every line in ios, where lines are separated by sep.
sep defaults to the value of the special $/ global variable, the input record separator, which is "\n" unless you change it. You can tell Ruby to use a different value for the line-end/separator if you know the file uses something else.
Regarding your code:
I'd do it this way:
line_count = 0
File.foreach("text.txt") do |line|
line_count += 1
end
puts line_count
foreach is very self-explanatory, which is important when writing code. You want it to be self-documenting as much as possible. foreach iterates over "each" line in the file. It also assumes the line-ends are the same as $/, but you can force it to be something different, perhaps the letter "z" or "." or " ", depending on your whim and fancy at the moment.
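As a rough illustration of forcing a different separator, the sketch below reads the same temporary file twice, once with the default separator and once with "." passed as the second argument to File.foreach. The file contents and variable names are invented for the example.

```ruby
require 'tempfile'

default_chunks = nil
dot_chunks = nil

Tempfile.create('sep_demo') do |f|
  f.write("one.two.three\nfour")
  f.flush
  default_chunks = File.foreach(f.path).to_a        # split on $/, which is "\n" by default
  dot_chunks     = File.foreach(f.path, '.').to_a   # split on "." instead
end

p default_chunks
p dot_chunks
```

In both cases the separator stays attached to the end of each chunk, just as "\n" stays attached to each line.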

How do I create an array from a txt file converted by a HTML file (Ruby)?

I'm trying to complete the first task to our assignment:
Get 5 regular emails and 5 advance-fee fraud emails (aka spam). Convert them all into text files and then turn each into an array of words (split may help here). Then use a bunch of regular expressions to search the array of words looking for keywords to classify which files are spam or not. If you want to get fancy you could give each array a spam-score out of 10.
Open HTML page and read file.
Strip script, links etc from file.
Have body/para on its own.
Open text file (file2) & write to it (UTF-8).
Pass content from HTML document (file 1).
Now put the words from text file (file2) into an array and later split.
Go through array finding any words that are considered spam and print message to screen stating if the email is a spam or not.
Here is my code:
require 'nokogiri'
file = File.open("EMAILS/REG/Membership.htm", "r")
doc = Nokogiri::HTML(file)
#What ever is passed from elements to the newFile is being put into the new array however the euro sign doesn't appear correctly
elements = doc.xpath("/html/body//p").text
#puts elements
newFile = File.open("test1.txt", "w")
newFile.write(elements)
newFile.close()
#I want to open the file again and print the lines to the screen
#
array_of_words = {}
puts "\n\tRetrieving test1.txt...\n\n"
File.open("test1.txt", "r:UTF-8").each_line do |line|
words = line.split(' ')
words.each do |word|
puts "#{word}"
#array_of_words[word] = gets.chomp.split(' ')
end
end
EDITED: Here I've edited the file, however, I'm unable to retrieve the UTF-8 encoding of the euro sign in the array (see the image).
require 'nokogiri'
doc = Nokogiri::HTML(File.open("EMAILS/REG/Membership.htm", "r:UTF-8"))
#What ever is passed from elements to the newFile is being put into the new
#array however the euro sign doesn't appear correctly
elements = doc.xpath("//p").text
#puts elements
File.write("test1.txt", elements)
puts "\n\tRetrieving test1.txt...\n\n"
#I want to open the file again and print the lines to the screen
#
word_array = Array.new
File.read("test1.txt").each_line do |line|
line.split(' ').each do |word|
puts "#{word}"
word_array << word
end
end
Because this is an assignment, I'm not going to try to answer how you're supposed to do this; you're supposed to figure it out on your own.
What I will do is show you how you should have written what you've already done, and point you in a direction:
require 'nokogiri'
doc = Nokogiri::HTML(File.read("EMAILS/REG/Membership.htm"))
# What ever is passed from elements to the newFile is being put into the new
# array however the euro sign doesn't appear correctly
elements = doc.xpath("//p").text
File.write("test1.txt", elements)
print "\n\tRetrieving test1.txt...\n\n"
# I want to open the file again and print the lines to the screen
word_hash = {}
File.foreach("test1.txt", encoding: "UTF-8") do |line|
line.split(' ').each do |word|
puts "#{word}"
#word_hash[word] = gets.chomp.split(' ')
end
end
Many of Ruby's IO methods, and File's by inheritance, can take advantage of blocks, which automatically close the stream when the block exits. Use that capability as leaving files open throughout the run-time of an app is not good.
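A small sketch of that block behaviour follows. It assumes a throwaway file in a temporary directory, and handle is a name introduced only so the file object can be inspected after the block has finished.

```ruby
require 'tmpdir'

handle = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, 'test1.txt')   # hypothetical file in a scratch directory
  File.write(path, "one two\nthree\n")

  # Block form of File.open: the file is closed as soon as the block returns,
  # even if the block raises.
  File.open(path, 'r:UTF-8') do |f|
    handle = f
    f.each_line { |line| puts line }
  end
end

puts handle.closed?   # prints true
```

File.foreach and File.write go one step further and never expose the open file object at all.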
array_of_words = {} doesn't define an array, it's a hash.
#array_of_words[word] = gets.chomp.split(' ') wouldn't work because of where gets wants to read from. By default it's STDIN, which would be the console, meaning the keyboard. You've already got word at that point so do something with it.
But think, you're basically creating the basis for a Bayesian Filter. You need to be counting the number of occurrences of words, so merely assigning the word to the hash won't get you what you want to know, you need to know how many times a particular word was seen. Stack Overflow has a lot of questions answered about how to count the number of words found in a string, so search for those.
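For the counting part alone, a minimal sketch, assuming the text is already in a string, is a hash with a default value of 0. The sample sentence and the crude normalization (lowercasing and stripping punctuation) are assumptions made for illustration, not part of the assignment.

```ruby
# Hypothetical paragraph text standing in for the contents of elements.
elements = "Claim your prize now. Your PRIZE is waiting, claim it today"

word_counts = Hash.new(0)   # missing keys start at 0, so += 1 works on first sight

elements.split.each do |word|
  key = word.downcase.gsub(/[^a-z0-9']/, '')   # crude normalization, an assumption
  word_counts[key] += 1 unless key.empty?
end

p word_counts
```

How those counts turn into a spam decision is the part left to work out.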
You're making things harder for yourself. You already have the paragraph text in elements so there's no need to read test1.txt after writing to it. Then use String#split without arguments to split on all whitespace.

Swap Words in File with hash

I have a text file and I am trying to replace certain lines with the values in a hash. I am trying to make it loop through the file, and swap out anything that matches the hash. For some reason this isn't working, it only duplicates the file, doesn't swap anything out. Any Ideas?
HASHBROWNS = {
'mustard' => 'dijon',
'ketchup' => 'catsup',
}
File.open('new_hashed_file.txt', 'w') do |file|
File.open('oldfile.txt', 'r').readlines.each do |swaparoo|
if HASHBROWNS.has_key?(swaparoo.downcase)
file.puts HASHBROWNS[swaparoo.downcase]
else
file.puts swaparoo
end
end
end
Thanks
Ryn
Change this line:
File.open('oldfile.txt', 'r').readlines.each do |swaparoo|
to this:
File.open('oldfile.txt', 'r').readlines.map(&:chomp).each do |swaparoo|
The problem is your array of lines contains newlines.
When you read data with readlines there will be a newline present in each string. This is what's making your match miss. The easy way is to just trim it off with chomp. You may want to modify your test slightly:
File.open('new_hashed_file.txt', 'w') do |file|
File.open('oldfile.txt', 'r').readlines.each do |line|
line = line.chomp.downcase
file.puts HASHBROWNS[line] || line
end
end
One thing to pay attention to is not repeatedly calling methods like downcase if you can simply save the result to a temporary variable and recycle it.
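Putting those points together, a self-contained sketch might look like this. It assumes Ruby 2.4 or later for chomp: true, uses a local swaps hash as a stand-in for HASHBROWNS, and works on made-up files in a temporary directory rather than the real oldfile.txt.

```ruby
require 'tmpdir'

# Stand-in for HASHBROWNS from the question.
swaps = { 'mustard' => 'dijon', 'ketchup' => 'catsup' }

output = Dir.mktmpdir do |dir|
  old_path = File.join(dir, 'oldfile.txt')
  new_path = File.join(dir, 'new_hashed_file.txt')
  File.write(old_path, "Mustard\nrelish\nketchup\n")   # hypothetical input

  File.open(new_path, 'w') do |out|
    # chomp: true removes the newline so the hash lookup can match.
    File.foreach(old_path, chomp: true) do |line|
      # fetch returns the replacement, or the original line when there is none.
      out.puts swaps.fetch(line.downcase, line)
    end
  end

  File.read(new_path)
end

puts output
```

Using fetch with a default keeps the original casing of lines that are not swapped, whereas HASHBROWNS[line] || line on a downcased line writes the lowercased text.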

How do you loop through a multiline string in Ruby?

Pretty simple question from a first-time Ruby programmer.
How do you loop through a slab of text in Ruby? Every time a newline is met, I want to restart the inner loop.
def parse(input)
...
end
String#each_line
str.each_line do |line|
#do something with line
end
What Iraimbilanja said.
Or you could split the string at new lines:
str.split(/\r?\n|\r/).each { |line| … }
Beware that each_line keeps the line feed chars, while split eats them.
Note the regex I used here will take care of all three line ending formats. String#each_line separates lines by the optional argument sep_string, which defaults to $/, which itself defaults to "\n" simply.
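The difference between the two approaches can be seen on a short string that mixes all three line endings. This is only an illustrative sketch, and the sample string is made up.

```ruby
# Hypothetical text with "\n", "\r\n" and a bare "\r".
str = "alpha\nbeta\r\ngamma\rdelta"

kept     = str.each_line.to_a       # splits on "\n" only and keeps the line endings
stripped = str.split(/\r?\n|\r/)    # splits on all three endings and drops them

p kept
p stripped
```

Because each_line only looks for "\n" by default, the bare "\r" does not start a new line there, while the regex passed to split treats it as a line break.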
Lastly, if you want to do more complex string parsing, check out the built-in StringScanner class.
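As a taste of what StringScanner offers, here is a minimal sketch that walks through a made-up "key=value;" format. The input format and the pairs variable are assumptions for illustration only.

```ruby
require 'strscan'

# Hypothetical input format: key=number pairs separated by semicolons.
scanner = StringScanner.new("key1=10;key2=20")

pairs = []
until scanner.eos?
  key = scanner.scan(/\w+/)     # read the key and advance past it
  scanner.skip(/=/)             # step over the separator
  value = scanner.scan(/\d+/)   # read the digits
  scanner.skip(/;/)             # step over the optional trailing semicolon
  pairs << [key, value.to_i]
end

p pairs
```

The scanner keeps track of its position in the string, so each call picks up where the previous one stopped.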
You can also do this with any pattern:
str.scan(/\w+/) do |w|
#do something
end
str.each_line(chomp: true) do |line|
# do something with a clean line without line feed characters
end
I think this should take care of the newlines.
