Nested scan block inside select block - ruby

I'm new to Ruby. I want to select the lines from a file that match a regex and then store them in a list.
So I wrote the following code:
def get_valid_instr istream
  p istream.select { |line| line.scan(/(^\s*\w+\s*;\s*\w+\s*$)/) { |instr| instr[0].upcase.strip.split(";") } }
end
trace_instr = File.open("#{file_name}", "r"){|stream| get_valid_instr stream}
The output is simply the entire file printed back out.
If I put a print inside the scan block, I see exactly what I want.
There are other ways to do this (filling an external list), but I wonder why it doesn't work and whether there is a more Ruby way.

If you pass a block to scan, it will return something different than if you don't:
"abc".scan(/./)
# => ["a", "b", "c"]
"abc".scan(/./) {|l| puts l }
# a
# b
# c
# => "abc"
When given a block, scan returns the receiver (the original string) rather than the matches. Every string is truthy, so your select keeps every line, which is why you see the whole file. You need to be aware of this when using scan.
However, even better than your current solution would be to use grep. You can pass both your regular expression and your block to grep.
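For example, grep collects the block's results for every element that matches the pattern (a generic illustration, not your data):
%w[foo bar baz].grep(/^b/) { |word| word.upcase }
# => ["BAR", "BAZ"]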

It would be helpful to see some of the data you want to test with.
Is the data split by line? I'm not sure about the splitting on the semicolon. What's the reason for that? If you could post some example data and some example output, I'll be able to help further.
This is my attempt at interpreting what you're trying to achieve, but it may be well off as I've not seen real data. Thanks!
def get_valid_instr(lines)
  regex = /(^\s*\w+\s*;\s*\w+\s*$)/
  lines.inject([]) do |matched_lines, line|
    if match = line.match(regex)
      p match[0]
      matched_lines << match[0].upcase.strip.split(";")
    end
    matched_lines
  end
end
trace_instr = get_valid_instr(File.readlines(file_name))
pp trace_instr

def get_valid_instr istream
  istream.grep(/^\s*\w+\s*;\s*\w+\s*$/).map do |instr|
    instr.upcase.strip.split(";")
  end
end

Related

Puts arrays in file using ruby

This is a part of my file:
project(':facebook-android-sdk-3-6-0').projectDir = new File('facebook-android-sdk-3-6-0/facebook-android-sdk-3.6.0/facebook')
project(':Forecast-master').projectDir = new File('forecast-master/Forecast-master/Forecast')
project(':headerListView').projectDir = new File('headerlistview/headerListView')
project(':library-sliding-menu').projectDir = new File('library-sliding-menu/library-sliding-menu')
I need to extract the names of the libs. This is my ruby function:
def GetArray
  out_file = File.new("./out.txt", "w")
  File.foreach("./file.txt") do |line|
    l = line.scan(/project\(\'\:(.*)\'\).projectDir/)
    File.open(out_file, "w") do |f|
      l.each do |ch|
        f.write("#{ch}\n")
      end
    end
    puts "#{l} "
  end
end
My function returns this:
[]
[["CoverFlowLibrary"]]
[["Android-RSS-Reader-Library-master"]]
[["library"]]
[["facebook-android-sdk-3-6-0"]]
[["Forecast-master"]]
My problem is that I find nothing in out_file. How can I write to the file? All I really need is to get the names of the libs from the file.
Meditate on this:
"project(':facebook-android-sdk-3-6-0').projectDir'".scan(/project\(\'\:(.*)\'\).projectDir/)
# => [["facebook-android-sdk-3-6-0"]]
When scan sees the capturing (...), it will create a sub-array. That's not what you want. The knee-jerk reaction is to flatten the resulting array of arrays but that's really just a band-aid on the code because you chose the wrong method.
Instead consider this:
"project(':facebook-android-sdk-3-6-0').projectDir'"[/':([^']+)'/, 1]
# => "facebook-android-sdk-3-6-0"
This is using String's [] method to apply a regular expression with a capture and return that captured text. No sub-arrays are created.
scan is powerful and definitely has its place, but not for this sort of "find one thing" parsing.
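For contrast, scan earns its keep when a string contains several matches you want all at once, for example:
"a=1 b=2 c=3".scan(/(\w+)=(\d+)/)
# => [["a", "1"], ["b", "2"], ["c", "3"]]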
Regarding your code, I'd do something like this untested code:
def get_array
  # File.open (not File.new) yields the handle to the block and closes it when the block ends
  File.open('./out.txt', 'w') do |out_file|
    File.foreach('./file.txt') do |line|
      l = line[/':([^']+)'/, 1]
      out_file.puts l
      puts l
    end
  end
end
Methods in Ruby are NOT camelCase, they're snake_case. Constants, like classes, start with a capital letter and are CamelCase. Don't go all Java on us, especially if you want to write code for a living. So GetArray should be get_array. Also, don't start methods with "get_", and don't call it array; use to_a to be idiomatic.
When building a regular expression start simple and do your best to keep it simple. It's a maintainability thing and helps to reduce insanity. /':([^']+)'/ is a lot easier to read and understand, and accomplishes the same as your much-too-complex pattern. Regular expression engines are greedy and lazy and want to do as little work as possible, which is sometimes totally evil, but once you understand what they're doing it's possible to write very small/succinct patterns to accomplish big things.
Breaking it down, it basically says "find the first ': then start capturing text until the next '", which is what you're looking for. project( can be ignored, as can ).projectDir.
And actually,
/':([^']+)'/
could really be written
/:([^']+)'/
but I felt generous and looked for the leading ' too.
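To see why the negated character class matters, compare a greedy .* against one of the sample lines (illustrative only):
line = "project(':headerListView').projectDir = new File('headerlistview/headerListView')"
line[/':(.*)'/, 1]
# => "headerListView').projectDir = new File('headerlistview/headerListView"
line[/':([^']+)'/, 1]
# => "headerListView"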
The problem is that you're opening the file twice: once in:
out_file = File.new("./out.txt", "w")
and then once for each line:
File.open(out_file, "w") do |f| ...
Try this instead:
def GetArray
  File.open("./out.txt", "w") do |f|
    File.foreach("./file.txt") do |line|
      l = line.scan(/project\(\'\:(.*)\'\).projectDir/)
      l.each do |ch|
        f.write("#{ch}\n")
      end # l.each
    end # File.foreach
  end # File.open
end # def GetArray

Using ARGF and hashes?

I'm trying to write a Ruby program that contains a method that will convert a single-letter amino acid code into its corresponding amino acid name after reading from ARGF.
Here's the code that I have so far:
#!/usr/bin/ruby
def convert
  aminos = Hash.new
  aminos = {"A" => "alanine", "B" => "aspartate/asparagine", "C" => "cystine"}
  output = aminos.values_at()
  puts "#{output}"
end

ARGF.each_char do convert
end
If I run this program, it does the following:
$ ruby amino_acid.rb ABC
[]
[]
[]
When what I want it to do is this:
$ ruby amino_acid.rb ABC
alanine
aspartate/asparagine
cystine
I am not sure what to put in the parentheses after
aminos.values_at
or whether I am even on the right track.
I think I have to pass a parameter to this method, maybe something like
def convert {|k,v| puts "#{v}}
Thanks in advance!
You are using ARGF when, based on your example code, you want ARGV.
Code Review will give you other avenues for style, etc., but I think this will get you going.
#!/usr/bin/env ruby
def convert(value)
  aminos = { 'A' => 'alanine', 'B' => 'aspartate/asparagine', 'C' => 'cystine' }
  aminos[value.upcase]
end

# join the command-line arguments into one string and walk it character by character
ARGV.join.each_char do |letter|
  puts convert letter
end
You could, of course, put that statement inside your method, but really a method should do one focused thing. Passing it an argument, rather than having it act on ARGV directly, gives it the freedom to be used in different ways, and keeps it easier to maintain.
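If you really do want ARGF (so the letters come from a file named on the command line, or from standard input, e.g. echo ABC | ruby amino_acid.rb), a minimal sketch reusing the same convert method might look like:
ARGF.each_line do |line|
  line.chomp.each_char do |letter|
    name = convert(letter)
    puts name if name   # skip characters that aren't in the table
  end
end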

Read files into variables, using Dir and arrays

For an assignment, I'm using the Dir.glob method to read a series of famous speech files, and then perform some basic speech analytics on each one (number of words, number of sentences, etc). I'm able to read the files, but have not figured out how to read each file into a variable, so that I may operate on the variables later.
What I've got is:
Dir.glob('/students/~pathname/public_html/speeches/*.txt').each do |speech|
  # code to process the speech.
  lines = File.readlines(speech)
  puts lines
end
This prints all the speeches out onto the page as one huge block of text. Can anyone offer some ideas as to why?
What I'd like to do, within that code block, is to read each file into a variable, and then perform operations on each variable such as:
Dir.glob('/students/~pathname/public_html/speeches/*.txt').each do |speech|
  # code to process the speech.
  lines = File.readlines(speech)
  text = lines.join
  line_count = lines.size
  sentence_count = text.split(/\.|\?|!/).length
  paragraph_count = text.split(/\n\n/).length
  puts "#{line_count} lines"
  puts "#{sentence_count} sentences"
  puts "#{paragraph_count} paragraphs"
end
Any advice or insight would be hugely appreciated! Thanks!
Regarding your first question:
readlines converts the file into an array of strings, and what you then see is the behaviour of puts when given an array of strings as its argument.
Try puts lines.inspect if you would rather see the data as an array.
Also: Have a look at the Ruby console irb in case you have not done so already. It is very useful for trying out the kinds of things you are asking about.
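A quick way to see the difference:
lines = ["first line\n", "second line\n"]
puts lines           # prints each string on its own line, so it looks like one block of text
puts lines.inspect   # prints ["first line\n", "second line\n"]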
Here's what wound up working:
speeches = []
Dir.glob('/PATH TO DIRECTORY/speeches/*.txt').each do |speech|
  # code to process the speech.
  f = File.readlines(speech)
  speeches << f
end

def process_file(file_name)
  # count the lines
  line_count = file_name.size
  return line_count
end

process_file(speeches[0])
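Building on that, a small sketch of running the same count over every speech (process_file is really receiving an array of lines here, so its size is the line count):
speeches.each_with_index do |speech_lines, index|
  puts "Speech #{index + 1}: #{process_file(speech_lines)} lines"
end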

Functionally find mapping of first value that passes a test

In Ruby, I have an array of simple values (possible encodings):
encodings = %w[ utf-8 iso-8859-1 macroman ]
I want to keep reading a file from disk until the results are valid. I could do this:
good = encodings.find { |enc| IO.read(file, mode: "r:#{enc}").valid_encoding? }
contents = IO.read(file, mode: "r:#{good}")
...but of course this is dumb, since it reads the file twice for the good encoding. I could program it in gross procedural style like so:
contents = nil
encodings.each do |enc|
  if (s = IO.read(file, mode: "r:#{enc}")).valid_encoding?
    contents = s
    break
  end
end
But I want a functional solution. I could do it functionally like so:
contents = encodings.map { |e| IO.read(file, mode: "r:#{e}") }.find { |s| s.valid_encoding? }
…but of course that keeps reading the file for every encoding, even if the first was valid.
Is there a simple pattern that is functional, but does not keep reading the file after the first success is found?
If you sprinkle a lazy in there, map will only consume those elements of the array that are used by find - i.e. once find stops, map stops as well. So this will do what you want:
possible_reads = encodings.lazy.map { |e| IO.read(file, mode: "r:#{e}") }
contents = possible_reads.find { |s| s.valid_encoding? }
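If you want to convince yourself that the work stops at the first hit, the same shape with a plain array and a side effect shows it (Enumerable#lazy needs Ruby 2.0+):
[1, 2, 3, 4].lazy.map { |n| puts "reading #{n}"; n * 2 }.find { |n| n > 2 }
# reading 1
# reading 2
# => 4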
Hopping on sepp2k's answer: If you can't use 2.0, lazy enums can be easily implemented in 1.9:
class Enumerator
  def lazy_find
    self.class.new do |yielder|
      self.each do |element|
        if yield(element)
          yielder.yield(element)
          break
        end
      end
    end
  end
end

a = (1..100).to_enum
p a.lazy_find { |i| i.even? }.first
# => 2
You want to use the break statement:
contents = encodings.each do |e|
  s = IO.read(file, mode: "r:#{e}")
  s.valid_encoding? and break s
end
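The trick is that break with an argument becomes the return value of each. One caveat: if no encoding is valid, no break happens and each returns its receiver, so contents would end up being the encodings array itself. A tiny illustration:
[1, 2, 3, 4].each { |n| break n * 10 if n.even? }
# => 20
[1, 3, 5].each { |n| break n * 10 if n.even? }
# => [1, 3, 5]   (no break, so each returns the array)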
The best I can come up with is with our good friend inject:
contents = encodings.inject(nil) do |s, enc|
  s || ((c = IO.read(file, mode: "r:#{enc}")).valid_encoding? && c)
end
This is still sub-optimal because it continues to loop through encodings after finding a match, though it doesn't do anything with them, so it's a minor ugliness. Most of the ugliness comes from...well, the code itself. :/

Would it be better to use collect in this situation?

I'm just starting out using Ruby and I've written a bit of code to do basic parsing of a CSV file (Line is a basic class, omitted for brevity):
class File
  def each_csv
    each do |line|
      yield line.split(",")
    end
  end
end

lines = Array.new
File.open("some.csv") do |file|
  file.each_csv do |csv|
    lines << Line.new(:field1 => csv[0], :field2 => csv[1])
  end
end
I have a feeling I would be better off using collect somehow rather than pushing each Line onto the array but I can't work out how to do it.
Can anyone show me how to do it or is it perfectly fine as it is?
Edit: I should have made it clear that I'm not actually going to use this code in production, it's more to get used to the constructs of the language. It is still useful to know there are libraries to do this properly though.
Here's a (possibly wild) idea: use the Struct class instead of rolling your own simple POD class. What you want from this is a constructor that accepts all of the arguments that can be generated from the file data.
Line = Struct.new(:field1, :field2, :field3)
Then at the core of the algorithm you want something like:
File.open("test.csv").lines.inject([]) do |result, line|
result << Line.new(line.split(",", Line.length))
end
or being a bit more concise and functional-like:
lines = File.foreach("test.csv").map { |line| Line.new(*line.chomp.split(",", Line.members.length)) }
To be honest I haven't used the Struct class much, but I should be, and I will probably refactor stuff already written to use it. It allows you to access the variables by their names like:
line = Line.new
line.field1 = "blah"
line.field2 = 1
The Ruby Struct class.
So to actually answer your question, looking at the code above, I would say it would be much simpler to use collect/map to perform the computation. map and inject together are very powerful, and I find I use them quite frequently.
I don't know if you are aware of it, but Ruby has its own class for parsing and writing CSV files.
I found an example of using collect to turn a csv file into an array of hashes.
require 'csv'

def csv_to_array(file_location)
  csv = CSV::parse(File.open(file_location, 'r') { |f| f.read })
  fields = csv.shift
  csv.collect { |record| Hash[*(0..(fields.length - 1)).collect { |index| [fields[index], record[index].to_s] }.flatten] }
end
This example is taken from this article.
If you are unfamiliar with the * (splat) notation, it basically dissolves the outer [] brackets, turning an array into a comma-separated list of its elements.
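For example, the splat in Hash[*...] is what turns a flattened list of pairs into a hash (the values here are made up purely for illustration):
pairs = [["name", "Forecast-master"], ["dir", "forecast-master"]]
Hash[*pairs.flatten]
# => {"name"=>"Forecast-master", "dir"=>"forecast-master"}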
Have you looked at FasterCSV? It does what you're trying to do here, along with dealing with some of the brain-deadness you find in some CSV files.
See how this works for you (functional programming is fun!):
Try using inject. Inject takes as a parameter the starting "accumulator", and then a two parameter block:
[1,2,3].inject(0) { |sum,num| sum+num }
is naturally 6
[1,2,3].inject(5) { |sum,num| sum+num }
is 11
[1,2,3].inject(2) { |sum,num| sum*num }
is 12
To the point:
class Line
  def initialize(options)
    @options = options
  end

  def to_s
    @options[:field1] + " " + @options[:field2]
  end
end

File.foreach("test.csv").inject([]) do |lines, line|
  split = line.split(",")
  lines << Line.new(:field1 => split[0], :field2 => split[1])
end
