Read files into variables, using Dir and arrays - ruby

For an assignment, I'm using the Dir.glob method to read a series of famous speech files, and then perform some basic speech analytics on each one (number of words, number of sentences, etc). I'm able to read the files, but have not figured out how to read each file into a variable, so that I may operate on the variables later.
What I've got is:
Dir.glob('/students/~pathname/public_html/speeches/*.txt').each do |speech|
  # code to process the speech.
  lines = File.readlines(speech)
  puts lines
end
This prints all the speeches out onto the page as one huge block of text. Can anyone offer some ideas as to why?
What I'd like to do, within that code block, is to read each file into a variable, and then perform operations on each variable such as:
Dir.glob('/students/~pathname/public_html/speeches/*.txt').each do |speech|
  # code to process the speech.
  lines = File.readlines(speech)
  text = lines.join
  line_count = lines.size
  sentence_count = text.split(/\.|\?|!/).length
  paragraph_count = text.split(/\n\n/).length
  puts "#{line_count} lines"
  puts "#{sentence_count} sentences"
  puts "#{paragraph_count} paragraphs"
end
Any advice or insight would be hugely appreciated! Thanks!

Regarding your first question:
readlines converts the file into an array of Strings, and what you then see is the behaviour of puts with an array of Strings as the argument: it prints each element on its own line, so the output looks like the original text.
Try puts lines.inspect if you would rather see the data as an array.
Also: Have a look at the Ruby console irb in case you have not done so already. It is very useful for trying out the kinds of things you are asking about.
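To make the difference concrete, here is a small sketch you can paste into irb. The array is hard-coded sample data standing in for what File.readlines would return; it is not taken from a real speech file.

```ruby
# Stand-in for the array File.readlines returns (made-up sample lines).
lines = ["Four score and seven years ago\n", "our fathers brought forth\n"]

puts lines          # prints each element on its own line, so it looks like plain text
puts lines.inspect  # prints the array literal, brackets and "\n" escapes included
p lines.size        # the data is still an array, one element per line
```

So nothing is lost when puts shows a block of text; the variable still holds an array you can work with.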

Here's what wound up working:
speeches = []
Dir.glob('/PATH TO DIRECTORY/speeches/*.txt').each do |speech|
  # read each speech into an array of lines and keep it
  f = File.readlines(speech)
  speeches << f
end
def process_file(lines)
  # count the lines (the argument is the array of lines, not a file name)
  line_count = lines.size
  return line_count
end
process_file(speeches[0])
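If you want to keep track of which numbers belong to which speech, one option is a hash keyed by file name. This is only a sketch: speech_stats and analyze_speeches are names I made up, the sentence and paragraph regexes are rough approximations, and the demo writes a made-up sample file into a temporary directory instead of using the real speeches folder.

```ruby
require 'tmpdir'

# Count lines, sentences and paragraphs in one speech's text.
def speech_stats(text)
  {
    lines:      text.lines.size,
    sentences:  text.split(/[.?!]+/).reject { |s| s.strip.empty? }.size,
    paragraphs: text.split(/\n{2,}/).reject { |para| para.strip.empty? }.size
  }
end

# Read every .txt file in dir into a hash of file name => stats.
def analyze_speeches(dir)
  Dir.glob(File.join(dir, '*.txt')).sort.each_with_object({}) do |path, results|
    results[File.basename(path)] = speech_stats(File.read(path))
  end
end

# Demo with a throwaway directory and a made-up speech.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'sample.txt'), "Hello there. How are you?\n\nFine!\n")
  analyze_speeches(dir).each do |name, stats|
    puts "#{name}: #{stats[:lines]} lines, #{stats[:sentences]} sentences, #{stats[:paragraphs]} paragraphs"
  end
end
```

Because the results live in a hash, you can look up any speech later by name instead of relying on its position in an array.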

Related

Outputting hash to text file

I am having trouble outputting the contents of my hash to a file. The program is one that manages a list of student records, including their StudentID, first name, last name, Major, and catalog year. Once the user is finished adding records, it is then added to the hash.
Everything in the program works perfectly, except when I try running the quit_program function, it doesn't save the contents in the file. Additionally, i am not getting any errors, any ideas?
could it potentially not be working because it is having trouble with converting the text in my hash, which is alphanumeric, into the text file?
def quit_program()
  puts "Save Changes? y/n"
  @changes = gets().chomp
  if @changes=="y"
    @fh=File.open(@file_name, 'w')
    @this_string=""
    @sDB.each do |key, store_account_data| #line 50
      puts "#{key}: #{store_account_data.join(',')}"
    end
  end
  @fh.puts(@this_string)
  @fh.close()
end
You're not writing anything to the file. The string @this_string is empty. You should do
@sDB.each do |key, store_account_data|
  @fh.puts "#{key}: #{store_account_data.join(',')}"
end
it doesn't save the contents in the file.
The following is NOT how you write to a file:
puts "#{key}: #{store_account_data.join(',')}"
That is how you write to your terminal/console window.
And this code:
@this_string=""
@fh.puts(@this_string)
writes a blank string to the file.
Here is how you write to a file:
class Student
  def initialize(sDB, filename)
    @sDB = sDB
    @filename = filename
  end
  def save_changes()
    puts "Save Changes? y/n"
    user_answer = gets().chomp
    if user_answer == "y"
      # the block form of File.open closes the file automatically
      File.open(@filename, 'w') do |f|
        @sDB.each do |key, store_account_data| #line 50
          f.puts "#{key}: #{store_account_data.join(',')}"
        end
      end
    end
  end
end
could it potentially not be working because it is having trouble with
converting the text in my hash, which is alphanumeric, into the text
file?
No. Here is a concrete example you can try:
data = {
  "John" => ['a', 123, 'b', 456],
  "Sally" => ['c', 789, 'b', 0]
}
File.open('data.txt', 'w') do |f|
  # block variable renamed so it no longer shadows the outer data hash
  data.each do |name, values|
    f.puts "#{name}: #{values.join(',')}"
  end
end
$ ruby myprog.rb
$ cat data.txt
John: a,123,b,456
Sally: c,789,b,0
Also, ruby indenting is 2 spaces--not 0 spaces or 3 spaces, or anything else.
The answer is given in the error message: undefined local variable or method 'sDB'. (Which you have since removed from your question making the edited version next to impossible to answer.) Where and when is sDB defined in your program? You are evidently attempting to quit before initializing it.
In any case it is not a good thing to be accessing instance variables directly inside other methods. You should use accessor (getter and setter) methods instead. That would have probably prevented this situation from biting you in the first place.
def sdb
  @sDB ||= Hash.new
end
def sdb=( key, value )
  sdb
  @sDB[ key ] = value
end
. . .
You are not properly writing to a file even if @sDB is defined. See Ruby - Printing a hash to txt file for an example.
Your question is missing essential input data, so there's no way to test our suggested changes.
Here's untested code I'd work from:
def quit_program
  puts "Save Changes? y/n"
  if gets.chomp.downcase == 'y'
    File.write(
      @file_name,
      @s_db.map{ |k, v| "#{ k }: #{ v.join(',') }" }.join("\n")
    )
  end
end
Note:
@sDB isn't a proper variable name in Ruby. We use snake_case, not camelCase, for variables and method names. ItsAMatterOfReadability. Follow the convention or suffer the wrath of your team members the first time you have a code review.
Don't add empty parentheses to method names (quit_program()) or calls (gets()) unless it's essential to tell the difference between a variable and a method invocation. You should also never name a variable the same as a method because it'll confuse everyone working on the code, so that should never be a consideration.
Don't create a variable (@changes) you use once and throw away, unless what you're doing is so complex you need to break down the operation into smaller chunks. And, if you're doing that, it'd be a really good candidate for refactoring into separate methods, so again, just don't.
When comparing user-input to something you expect, fold the case of their input to match what you expect. (gets.chomp.downcase == 'y'). It really irritates users to enter "y" and fail because you insisted on "Y".
While you can use File.open to create or write to a file, there's less visual noise to use File.write. open is great when you need to use various options for the mode but for plain text write is sufficient.
The whole block used for writing looks like it can be cleaned up to a single map and join, which coerces the data into an array of strings then into a single string.
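Since the question has no input data, here is a sketch of the map-and-join idea with stand-in data. The student_db hash, its field values and the serialize helper are all hypothetical, and the file goes into a temporary directory so the demo cleans up after itself.

```ruby
require 'tmpdir'

# Hypothetical stand-in for the student records hash (ID => fields).
student_db = {
  'S001' => ['Ada', 'Lovelace', 'Math', 2014],
  'S002' => ['Alan', 'Turing', 'CS', 2015]
}

# Turn each record into one "key: a,b,c" line, then join into a single string.
def serialize(records)
  records.map { |id, fields| "#{id}: #{fields.join(',')}" }.join("\n")
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'students.txt')
  File.write(path, serialize(student_db) + "\n")  # File.write opens, writes and closes
  puts File.read(path)
end
```

Keeping the string-building in its own method also makes it easy to check the output without touching the disk.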

Puts arrays in file using ruby

This is a part of my file:
project(':facebook-android-sdk-3-6-0').projectDir = new File('facebook-android-sdk-3-6-0/facebook-android-sdk-3.6.0/facebook')
project(':Forecast-master').projectDir = new File('forecast-master/Forecast-master/Forecast')
project(':headerListView').projectDir = new File('headerlistview/headerListView')
project(':library-sliding-menu').projectDir = new File('library-sliding-menu/library-sliding-menu')
I need to extract the names of the libs. This is my ruby function:
def GetArray
  out_file = File.new("./out.txt", "w")
  File.foreach("./file.txt") do |line|
    l=line.scan(/project\(\'\:(.*)\'\).projectDir/)
    File.open(out_file, "w") do |f|
      l.each do |ch|
        f.write("#{ch}\n")
      end
    end
    puts "#{l} "
  end
end
My function returns this:
[]
[["CoverFlowLibrary"]]
[["Android-RSS-Reader-Library-master"]]
[["library"]]
[["facebook-android-sdk-3-6-0"]]
[["Forecast-master"]]
My problem is that I find nothing in out_file. How can I write to a file? Otherwise, I only need to get the name of the libs in the file.
Meditate on this:
"project(':facebook-android-sdk-3-6-0').projectDir'".scan(/project\(\'\:(.*)\'\).projectDir/)
# => [["facebook-android-sdk-3-6-0"]]
When scan sees the capturing (...), it will create a sub-array. That's not what you want. The knee-jerk reaction is to flatten the resulting array of arrays but that's really just a band-aid on the code because you chose the wrong method.
Instead consider this:
"project(':facebook-android-sdk-3-6-0').projectDir'"[/':([^']+)'/, 1]
# => "facebook-android-sdk-3-6-0"
This is using String's [] method to apply a regular expression with a capture and return that captured text. No sub-arrays are created.
scan is powerful and definitely has its place, but not for this sort of "find one thing" parsing.
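Here is a short sketch that applies the same String#[] lookup to several lines at once. The first two input lines are copied from the question; the third is a made-up line with no project name, to show that a miss returns nil.

```ruby
# Two lines from the question's file plus one made-up non-matching line.
input = <<~TEXT
  project(':headerListView').projectDir = new File('headerlistview/headerListView')
  project(':library-sliding-menu').projectDir = new File('library-sliding-menu/library-sliding-menu')
  // a line without a project name
TEXT

# String#[] with a regex and a capture index returns the captured text,
# or nil when the line does not match; compact drops the nils.
names = input.each_line.map { |line| line[/':([^']+)'/, 1] }.compact

puts names
```

The result is a flat array of strings, so there is nothing to flatten afterwards.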
Regarding your code, I'd do something like this untested code:
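def get_array
  # File.open (not File.new) yields the file to the block and closes it afterwards
  File.open('./out.txt', 'w') do |out_file|
    File.foreach('./file.txt') do |line|
      l = line[/':([^']+)'/, 1]
      next unless l # skip lines that don't contain a project name
      out_file.puts l
      puts l
    end
  end
end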
Methods in Ruby are NOT camelCase, they're snake_case. Constants, like classes, start with a capital letter and are CamelCase. Don't go all Java on us, especially if you want to write code for a living. So GetArray should be get_array. Also, don't start methods with "get_", and don't call it array; use to_a to be idiomatic.
When building a regular expression start simple and do your best to keep it simple. It's a maintainability thing and helps to reduce insanity. /':([^']+)'/ is a lot easier to read and understand, and accomplishes the same as your much-too-complex pattern. Regular expression engines are greedy and lazy and want to do as little work as possible, which is sometimes totally evil, but once you understand what they're doing it's possible to write very small/succinct patterns to accomplish big things.
Breaking it down, it basically says "find the first ': then start capturing text until the next '", and that captured text is what you're looking for. project( can be ignored, as can ).projectDir.
And actually,
/':([^']+)'/
could really be written
/:([^']+)'/
but I felt generous and looked for the leading ' too.
The problem is that you're opening the file twice: once in:
out_file = File.new("./out.txt", "w")
and then once for each line:
File.open(out_file, "w") do |f| ...
Try this instead:
def GetArray
  File.open("./out.txt", "w") do |f|
    File.foreach("./file.txt") do |line|
      l=line.scan(/project\(\'\:(.*)\'\).projectDir/)
      l.each do |ch|
        f.write("#{ch.first}\n") # ch is a one-element array; write the captured name
      end # l.each
    end # File.foreach
  end # File.open
end # def GetArray

How can I read a word list in chunks of 100?

I want to read words in chunks of 100 from a file and then process them.
I can do it by adding an extra counter, etc., but is there a built-in command in one of the IO libs that does this? I wasn't able to find one.
require 'pp'
arr = []
i = 0
f=File.open("/home/pboob/Features/KB/178/synthetic/dataCreation/uniqEnglish.out").each(" ") { |word|
  i=i+1
  arr << word
  if i==100
    pp arr
    arr.clear
    i=0
  end
}
pp arr
Thanks!
P.S:
The file is too big to fit in memory, so I will have to use ".each "
The file is too big to fit in memory, so I will have to use ".each "
Better than each, laziness with enumerable-lazy:
require 'enumerable/lazy'
result = open('/tmp/foo').lines.lazy.map(&:chomp).each_slice(100).map do |group_of_words|
  # f(group_of_words)
end
More on functional programming and laziness here.
Actually, I believe the implementation of "each_slice" is sufficiently lazy for your purposes. Try this:
open('tmp/foo').lines.each_slice(100) do |lines|
  lines = lines.collect &:chomp # optional
  # do something with lines
end
Not as elegant as tokland's solution but it avoids adding an extra dependency to your app, which is always nice.
I think this might be useful to you:
http://blog.davidegrayson.com/2012/03/ruby-enumerable-module.html
Assuming one word per line, and the ability to slurp an entire file into memory:
IO.readlines('/tmp/foo').map(&:chomp).each_slice(100).to_a
If you are memory-constrained, then you can iterate in chunks by specifying only the chunk size; no counter required!
File.open('/tmp/foo') do |f|
  chunk = []
  f.each do |line|
    chunk.push(line)
    next unless f.eof? or chunk.size == 100
    puts chunk.inspect
    chunk.clear
  end
end
That's pretty verbose, though it does make it clear what's going on with the chunking. If you don't mind being less explicit, you can still use slicing with an Enumerator:
File.open('/tmp/foo').lines.map(&:chomp).each_slice(100) {|words| p words}
and replace the block with whatever processing you want to perform on each chunk.
Maybe it's more straightforward to do:
File.open(filename) do |file|
  do_things(100.times.map{file.gets ' '}) until file.eof?
end
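Several snippets in this thread call lines on a file object; that method was removed in Ruby 3.0, so on a current Ruby the same idea can be sketched with File.foreach, which returns an enumerator when called without a block. The helper name each_word_chunk is made up, and the demo assumes one word per line and writes its own sample file into a temporary directory.

```ruby
require 'tmpdir'

# Yield the words of a one-word-per-line file in arrays of chunk_size.
# File.foreach reads line by line, so only one chunk is held in memory at a time.
def each_word_chunk(path, chunk_size = 100)
  File.foreach(path).each_slice(chunk_size) do |lines|
    yield lines.map(&:chomp)
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'words.txt')  # throwaway sample file
  File.write(path, (1..250).map { |i| "word#{i}" }.join("\n") + "\n")
  each_word_chunk(path) { |words| puts "#{words.size} words, first: #{words.first}" }
end
```

If the words are separated by spaces rather than newlines, passing ' ' as the second argument to File.foreach should split on spaces instead; the words would then need strip rather than chomp.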

Nested scan block inside select block

I'm new to Ruby, I want to select some lines from a file that match a regex and then store to a list.
So I write the following code:
def get_valid_instr istream
  p istream.select { |line| line.scan(/(^\s*\w+\s*;\s*\w+\s*$)/){|instr| instr[0].upcase.strip.split(";")}}
end
trace_instr = File.open("#{file_name}", "r"){|stream| get_valid_instr stream}
The output is simply the whole file.
If I put a print in the scan block, I see exactly what I want.
There are other ways to do that (filling an external list), but I wonder why it doesn't work and whether there is a Ruby way.
If you pass a block to scan, it will return something different than if you don't:
"abc".scan(/./)
# => ["a", "b", "c"]
"abc".scan(/./) {|l| puts l }
# a
# b
# c
# => "abc"
You need to be aware of this when using scan.
However, even better than your current solution would be to use grep. You can pass both your regular expression and your block to grep.
It would be helpful to see some of the data you want to test with.
Is the data split by line? I'm not sure about you splitting by the semi-colon. What's the reason for that? If you could post some example data and some example output, I'll be able to help further.
This is my attempt at interpreting what you're trying to achieve, but it may be well off as I've not seen real data. Thanks!
def get_valid_instr(lines)
  regex = /(^\s*\w+\s*;\s*\w+\s*$)/
  lines.inject([]) do |matched_lines, line|
    if match = line.match(regex)
      p match[0]
      matched_lines << match[0].upcase.strip.split(";")
    end
    matched_lines
  end
end
trace_instr = get_valid_instr(File.readlines(file_name))
pp trace_instr
def get_valid_instr istream
  istream.grep(/^\s*\w+\s*;\s*\w+\s*$/).map do |instr|
    instr.upcase.strip.split(";")
  end
end
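As mentioned above, grep also accepts the block directly, which saves the separate map. Here is a sketch with made-up input lines, since the real trace file was never posted; splitting on /\s*;\s*/ instead of ";" is my own tweak so the pieces come out without surrounding spaces.

```ruby
# Made-up sample input; the real trace file format is an assumption.
lines = ["mov ; ax\n", "this line is not an instruction\n", "  add;bx  \n"]

# grep with a block: keep the matching lines and map each one through the block.
instrs = lines.grep(/^\s*\w+\s*;\s*\w+\s*$/) do |line|
  line.strip.upcase.split(/\s*;\s*/)
end

p instrs
# => [["MOV", "AX"], ["ADD", "BX"]]
```

An open File responds to grep as well, because IO includes Enumerable, so the same call should work on the stream passed in by File.open.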

Useful file output from reading a file (ruby/rails environment)

I have a model connected to a log, so I'm beginning to build ways to use that info with the model and pass it around elsewhere.
this method:
def read_log
  counter = 1
  f = File.open(self.log_file_path, 'r')
  while (line = f.gets)
    puts "#{counter}: #{line}"
    counter = counter + 1
  end
end
works and dumps the log to the command line, but it returns nil: everything goes to stdout, so when I call the method I get nothing back. How can I read the contents into a more useful format? I need to read this into a controller variable for a template in Rails on a webpage. It is basic, but something I haven't done yet.
contents = f.read
Now contents contains... the contents. Not sure what "useful" means in your context, but you can do things like split on newline to get each line.
You can also create an enumerator via f.each_line (f.lines in older Rubies; it was removed in Ruby 3.0). Whether or not that's more useful, I'm not sure.
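The original method returns nil because the while loop is its last expression and all the real work happens in puts. A sketch of a version that returns the numbered lines instead is below. It takes the path as an argument so it can run on its own (in the model it would presumably use self.log_file_path), and the log content is made up.

```ruby
require 'tmpdir'

# Return the log as an array of numbered lines instead of printing it.
# log_file_path is passed in here; in the model it would come from self.log_file_path.
def read_log(log_file_path)
  File.foreach(log_file_path).with_index(1).map do |line, number|
    "#{number}: #{line.chomp}"
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'sample.log')  # throwaway file with made-up entries
  File.write(path, "started\nfinished\n")
  entries = read_log(path)
  puts entries
end
```

A controller could then assign the returned array to an instance variable and let the template loop over it.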

Resources