Would it be better to use collect in this situation? - ruby

I'm just starting out using Ruby and I've written a bit of code to do basic parsing of a CSV file (Line is a basic class, omitted for brevity):
class File
def each_csv
each do |line|
yield line.split(",")
end
end
end
lines = Array.new
File.open("some.csv") do |file|
file.each_csv do |csv|
lines << Line.new(:field1 => csv[0], :field2 => csv[1])
end
end
I have a feeling I would be better off using collect somehow rather than pushing each Line onto the array but I can't work out how to do it.
Can anyone show me how to do it or is it perfectly fine as it is?
Edit: I should have made it clear that I'm not actually going to use this code in production, it's more to get used to the constructs of the language. It is still useful to know there are libraries to do this properly though.

Here's a (possibly wild) idea, use the Struct class instead of rolling your own simple POD class. But what you want from this is to have a constructor that accepts all of the arguments that could be generated from the file data.
Line = Struct.new(:field1, :field2, :field3)
Then at the core of the algorithm you want something like:
File.open("test.csv").lines.inject([]) do |result, line|
result << Line.new(line.split(",", Line.length))
end
or being a bit more concise and functional-like:
lines = File.open("test.csv").lines.map { |line| Line.new(line.split(",", Line.length)) }
To be honest I haven't used the Struct class much, but I should be, and I will probably refactor stuff already written to use it. It allows you to access the variables by their names like:
Line.field1 = blah
Line.field2 = 1
The Ruby Struct class.
So to actually answer your question, and looking above at the code, I would say it would be much simpler to use collect/map to perform the computation. The map function together with inject are very powerful and I find I use them quite frequently.

I don't know if you are aware of it, but ruby has it's own class for parsing and writing CSV files.
I found an example of using collect to turn a csv file into an array of hashes.
def csv_to_array(file_location)
csv = CSV::parse(File.open(file_location, 'r') {|f| f.read })
fields = csv.shift
csv.collect { |record| Hash[*(0..(fields.length - 1)).collect {|index| [fields[index],record[index].to_s] }.flatten ] }
end
This example is taken from this article.
If you are unfamiliar with the * notion, it basically dissolves the outer [] brackets, turning an array into a comma separated list of its elements.

Have you looked at FasterCSV, it does what your trying to do here, along with dealing with some of the brain deadness you find in some CSV files

See how this works for you (functional programming is fun!):
Try using inject. Inject takes as a parameter the starting "accumulator", and then a two parameter block:
[1,2,3].inject(0) { |sum,num| sum+num }
is naturally 6
[1,2,3].inject(5) { |sum,num| sum+num }
is 11
[1,2,3].inject(2) { |sum,num| sum*num }
is 12
To the point:
class Line
def initialize(options)
#options = options
end
def to_s
#options[:field1]+" "+#options[:field2]
end
end
File.open("test.csv").lines.inject([]) do |lines,line|
split = line.split(",")
lines << Line.new(:field1 => split[0],:field2 => split[1])
end

Related

I an getting an "Undefined method 'new' for.... (A number that changes each time)"

I made a simple program with a single method and I'm trying to test it, but I keep getting this weird error, and I have no idea why it keeps happening.
Here's my code for the only method I wrote:
def make_database(lines)
i = 0
foods = hash.new()
while i < lines.length do
lines[i] = lines[i].chomp()
words = lines[i].split(',')
if(words[1].casecmp("b") == 0)
foods[words[0]] = words[3]
end
end
return foods
end
And then here's what I have for calling the method (Inside the same program).
if __FILE__ == $PROGRAM_NAME
lines = []
$stdin.each { |line| lines << line}
foods = make_database(lines).new
puts foods
end
I am painfully confused, especially since it gives me a different random number for each "Undefined method 'new' for (Random number)".
It's a simple mistake. hash calls a method on the current object that returns a number used by the Hash structure for indexing entries, where Hash is the hash class you're probably intending:
foods = Hash.new()
Or more succinctly:
foods = { }
It's ideal to use { } in place of Hash.new unless you need to specify things like defaults, as is the case with:
Hash.new(0)
Where all values are initialized to 0 by default. This can be useful when creating simple counters.
Ruby classes are identified by leading capital letters to avoid confusion like this. Once you get used to the syntax you'll have an easier time spotting mistakes like that.
Note that when writing Ruby code you will almost always omit braces/brackets on empty argument lists. That is x() is expressed simply as x. This keeps code more readable, especially when chaining, like x.y.z instead of x().y().z()
Other things to note include being able to read in all lines with readlines instead of what you have there where you manually compose it. Try:
make_database($stdin.readlines.map(&:chomp))
A more aggressive refactoring of your code looks like this:
def make_database(lines)
# Define a Hash based on key/value pairs in an Array...
Hash[
# ...where these pairs are based on the input lines...
lines.map do |line|
# ...which have comma-separated components.
line.split(',')
end.reject do |key, flag, _, value|
# Pick out only those that have the right flag.
flag.downcase == 'b'
end.map do |key, flag, _, value|
# Convert to a simple key/value pair array
[ key, value ]
end
]
end
That might be a little hard to follow, but once you get the hang of chaining together a series of otherwise simple operations your Ruby code will be a lot more flexible and far easier to read.

Puts arrays in file using ruby

This is a part of my file:
project(':facebook-android-sdk-3-6-0').projectDir = new File('facebook-android-sdk-3-6-0/facebook-android-sdk-3.6.0/facebook')
project(':Forecast-master').projectDir = new File('forecast-master/Forecast-master/Forecast')
project(':headerListView').projectDir = new File('headerlistview/headerListView')
project(':library-sliding-menu').projectDir = new File('library-sliding-menu/library-sliding-menu')
I need to extract the names of the libs. This is my ruby function:
def GetArray
out_file = File.new("./out.txt", "w")
File.foreach("./file.txt") do |line|
l=line.scan(/project\(\'\:(.*)\'\).projectDir/)
File.open(out_file, "w") do |f|
l.each do |ch|
f.write("#{ch}\n")
end
end
puts "#{l} "
end
end
My function returns this:
[]
[["CoverFlowLibrary"]]
[["Android-RSS-Reader-Library-master"]]
[["library"]]
[["facebook-android-sdk-3-6-0"]]
[["Forecast-master"]]
My problem is that I find nothing in out_file. How can I write to a file? Otherwise, I only need to get the name of the libs in the file.
Meditate on this:
"project(':facebook-android-sdk-3-6-0').projectDir'".scan(/project\(\'\:(.*)\'\).projectDir/)
# => [["facebook-android-sdk-3-6-0"]]
When scan sees the capturing (...), it will create a sub-array. That's not what you want. The knee-jerk reaction is to flatten the resulting array of arrays but that's really just a band-aid on the code because you chose the wrong method.
Instead consider this:
"project(':facebook-android-sdk-3-6-0').projectDir'"[/':([^']+)'/, 1]
# => "facebook-android-sdk-3-6-0"
This is using String's [] method to apply a regular expression with a capture and return that captured text. No sub-arrays are created.
scan is powerful and definitely has its place, but not for this sort of "find one thing" parsing.
Regarding your code, I'd do something like this untested code:
def get_array
File.new('./out.txt', 'w') do |out_file|
File.foreach('./file.txt') do |line|
l = line[/':([^']+)'/, 1]
out_file.puts l
puts l
end
end
end
Methods in Ruby are NOT camelCase, they're snake_case. Constants, like classes, start with a capital letter and are CamelCase. Don't go all Java on us, especially if you want to write code for a living. So GetArray should be get_array. Also, don't start methods with "get_", and don't call it array; Use to_a to be idiomatic.
When building a regular expression start simple and do your best to keep it simple. It's a maintainability thing and helps to reduce insanity. /':([^']+)'/ is a lot easier to read and understand, and accomplishes the same as your much-too-complex pattern. Regular expression engines are greedy and lazy and want to do as little work as possible, which is sometimes totally evil, but once you understand what they're doing it's possible to write very small/succinct patterns to accomplish big things.
Breaking it down, it basically says "find the first ': then start capturing text until the next ', which is what you're looking for. project( can be ignored as can ).projectDir.
And actually,
/':([^']+)'/
could really be written
/:([^']+)'/
but I felt generous and looked for the leading ' too.
The problem is that you're opening the file twice: once in:
out_file = File.new("./out.txt", "w")
and then once for each line:
File.open(out_file, "w") do |f| ...
Try this instead:
def GetArray
File.open("./out.txt", "w") do |f|
File.foreach("./file.txt") do |line|
l=line.scan(/project\(\'\:(.*)\'\).projectDir/)
l.each do |ch|
f.write("#{ch}\n")
end # l.each
end # File.foreach
end # File.open
end # def GetArray

Push an array into another array with Ruby, and return square brackets

I've spent a few hours searching for a way to push an array into another array or into a hash. Apologies in advance if the formatting of this question is bit messy. This is the first time I've asked a question on StackOverflow so I'm trying to get the hang of styling my questions properly.
I have to write some code to make the following test unit past:
class TestNAME < Test::Unit::TestCase
def test_directions()
assert_equal(Lexicon.scan("north"), [['direction', 'north']])
result = Lexicon.scan("north south east")
assert_equal(result, [['direction', 'north'],
['direction', 'south'],
['direction', 'east']])
end
end
The most simple thing I've come up with is below. The first part passes, but then the second part is not returning the expected result when I run rake test.
Instead or returning:
[["direction", "north"], ["direction", "south"], ["direction",
"east"]]
it's returning:
["north", "south", "east"]
Although, if I print the result of y as a string to the console, I get 3 separate arrays that are not contained within another array (as below). Why hasn't it printed the outermost square brackets of the array, y?
["direction", "north"]
["direction", "south"]
["direction", "east"]
Below is the code I've written in an attempt to pass the test unit above:
class Lexicon
def initialize(stuff)
#words = stuff.split
end
def self.scan(word)
if word.include?(' ')
broken_words = word.split
broken_words.each do |word|
x = ['direction']
x.push(word)
y = []
y.push(x)
end
else
return [['direction', word]]
end
end
end
Any feedback about this will be much appreciated. Thank you all so much in advance.
What you're seeing is the result of each, which returns the thing being iterated over, or in this case, broken_words. What you want is collect which returns the transformed values. Notice in your original, y is never used, it's just thrown out after being composed.
Here's a fixed up version:
class Lexicon
def initialize(stuff)
#words = stuff.split
end
def self.scan(word)
broken_words = word.split(/\s+/)
broken_words.collect do |word|
[ 'direction', word ]
end
end
end
It's worth noting a few things were changed here:
Splitting on an arbitrary number of spaces rather than one.
Simplifying to a single case instead of two.
Eliminating the redundant return statement.
One thing you might consider is using a data structure like { direction: word } instead. That makes referencing values a lot easier since you'd do entry[:direction] avoiding the ambiguous entry[1].
If you're not instantiating Lexicon objects, you can use a Module which may make it more clear that you're not instantiating objects.
Also, there is no need to use an extra variable (i.e. broken_words), and I prefer the { } block syntax over the do..end syntax for functional blocks vs. iterative blocks.
module Lexicon
def self.scan str
str.split.map {|word| [ 'direction', word ] }
end
end
UPDATE: based on Cary's comment (I assume he meant split when he said scan), I've removed the superfluous argument to split.

Functionally find mapping of first value that passes a test

In Ruby, I have an array of simple values (possible encodings):
encodings = %w[ utf-8 iso-8859-1 macroman ]
I want to keep reading a file from disk until the results are valid. I could do this:
good = encodings.find{ |enc| IO.read(file, "r:#{enc}").valid_encoding? }
contents = IO.read(file, "r:#{good}")
...but of course this is dumb, since it reads the file twice for the good encoding. I could program it in gross procedural style like so:
contents = nil
encodings.each do |enc|
if (s=IO.read(file, "r:#{enc}")).valid_encoding?
contents = s
break
end
end
But I want a functional solution. I could do it functionally like so:
contents = encodings.map{|e| IO.read(f, "r:#{e}")}.find{|s| s.valid_encoding? }
…but of course that keeps reading files for every encoding, even if the first was valid.
Is there a simple pattern that is functional, but does not keep reading the file after a the first success is found?
If you sprinkle a lazy in there, map will only consume those elements of the array that are used by find - i.e. once find stops, map stops as well. So this will do what you want:
possible_reads = encodings.lazy.map {|e| IO.read(f, "r:#{e}")}
contents = possible_reads.find {|s| s.valid_encoding? }
Hopping on sepp2k's answer: If you can't use 2.0, lazy enums can be easily implemented in 1.9:
class Enumerator
def lazy_find
self.class.new do |yielder|
self.each do |element|
if yield(element)
yielder.yield(element)
break
end
end
end
end
end
a = (1..100).to_enum
p a.lazy_find { |i| i.even? }.first
# => 2
You want to use the break statement:
contents = encodings.each do |e|
s = IO.read( f, "r:#{e}" )
s.valid_encoding? and break s
end
The best I can come up with is with our good friend inject:
contents = encodings.inject(nil) do |s,enc|
s || (c=File.open(f,"r:#{enc}").valid_encoding? && c
end
This is still sub-optimal because it continues to loop through encodings after finding a match, though it doesn't do anything with them, so it's a minor ugliness. Most of the ugliness comes from...well, the code itself. :/

Nested scan block inside select block

I'm new to Ruby, I want to select some lines from a file that match a regex and then store to a list.
So I write the following code:
def get_valid_instr istream
p istream.select { |line| line.scan(/(^\s*\w+\s*;\s*\w+\s*$)/){|instr| instr[0].upcase.strip.split(";")}}
end
trace_instr = File.open("#{file_name}", "r"){|stream| get_valid_instr stream}
The output is simply the display of all file.
If I put a print in scan block, I see exactly what I want.
There are other ways to do that (filling an external list) but I wonder why it doesn't work and if there is ruby way.
If you pass a block to scan, it will return something different than if you don't:
"abc".scan(/./)
# => ["a", "b", "c"]
"abc".scan(/./) {|l| puts l }
# a
# b
# c
# => "abc"
You need to be aware of this when using scan.
However, even better than your current solution would be to use grep. You can pass both your regular expression and your block to grep.
It would be helpful to see some of the data you want to test with.
Is the data split by line? I'm not sure about you splitting by the semi-colon. What's the reason for that? If you could post some example data and some example output, I'll be able to help further.
This is my attempt at interpreting what you're trying to achieve, but it may be well off as I've not seen real data. Thanks!
def get_valid_instr(lines)
regex = /(^\s*\w+\s*;\s*\w+\s*$)/
lines.inject([]) do |matched_lines, line|
if match = line.match(regex)
p match[0]
matched_lines << match[0].upcase.strip.split(";")
end
matched_lines
end
end
trace_instr = get_valid_instr(File.readlines(file_name))
pp trace_instr
def get_valid_instr istream
istream.grep(/^\s*\w+\s*;\s*\w+\s*$/).map do |instr|
instr.upcase.strip.split(";")
end
end

Resources