In Ruby, I have an array of simple values (possible encodings):
encodings = %w[ utf-8 iso-8859-1 macroman ]
I want to keep reading a file from disk until the results are valid. I could do this:
good = encodings.find{ |enc| IO.read(file, "r:#{enc}").valid_encoding? }
contents = IO.read(file, "r:#{good}")
...but of course this is dumb, since it reads the file twice for the good encoding. I could program it in gross procedural style like so:
contents = nil
encodings.each do |enc|
if (s=IO.read(file, "r:#{enc}")).valid_encoding?
contents = s
break
end
end
But I want a functional solution. I could do it functionally like so:
contents = encodings.map{|e| IO.read(f, "r:#{e}")}.find{|s| s.valid_encoding? }
…but of course that keeps reading files for every encoding, even if the first was valid.
Is there a simple pattern that is functional, but does not keep reading the file after the first success is found?
If you sprinkle a lazy in there, map will only consume those elements of the array that are used by find - i.e. once find stops, map stops as well. So this will do what you want:
possible_reads = encodings.lazy.map {|e| IO.read(f, "r:#{e}")}
contents = possible_reads.find {|s| s.valid_encoding? }
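To see the short-circuiting without touching the disk, here is a small sketch that swaps the file read for an in-memory byte string. The sample bytes and the attempts array are made up for illustration; they are not part of the original question.

```ruby
# Stand-in for the file contents: "café" encoded as ISO-8859-1 (assumed sample data).
raw = "caf\xE9".b
encodings = %w[utf-8 iso-8859-1 macroman]
attempts = []  # records which encodings were actually tried

contents = encodings.lazy.map { |enc|
  attempts << enc
  raw.dup.force_encoding(enc)  # re-tag the bytes, much like reading with "r:#{enc}"
}.find(&:valid_encoding?)

p attempts           # macroman never shows up here
p contents.encoding
```

Without the lazy, attempts would list all three encodings, because map would finish before find even starts.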
Hopping on sepp2k's answer: If you can't use 2.0, lazy enums can be easily implemented in 1.9:
class Enumerator
def lazy_find
self.class.new do |yielder|
self.each do |element|
if yield(element)
yielder.yield(element)
break
end
end
end
end
end
a = (1..100).to_enum
p a.lazy_find { |i| i.even? }.first
# => 2
You want to use the break statement:
contents = encodings.each do |e|
s = IO.read( f, "r:#{e}" )
s.valid_encoding? and break s
end
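One caveat with this pattern, sketched below with throwaway numbers instead of file reads: when break never fires, each returns its receiver, so contents would end up holding the encodings array rather than nil.

```ruby
# break with a value makes that value the result of the whole each call.
found = [1, 3, 4, 5].each { |n| break n if n.even? }

# With no break, each simply returns the array it was called on.
missed = [1, 3, 5].each { |n| break n if n.even? }

p found
p missed
```

If that matters, a guard such as checking that the result is a String is one way to tell the two cases apart.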
The best I can come up with is with our good friend inject:
contents = encodings.inject(nil) do |s,enc|
s || ((c = File.open(f, "r:#{enc}", &:read)).valid_encoding? && c)
end
This is still sub-optimal because it continues to loop through encodings after finding a match, though it doesn't do anything with them, so it's a minor ugliness. Most of the ugliness comes from...well, the code itself. :/
I made a simple program with a single method and I'm trying to test it, but I keep getting this weird error, and I have no idea why it keeps happening.
Here's my code for the only method I wrote:
def make_database(lines)
i = 0
foods = hash.new()
while i < lines.length do
lines[i] = lines[i].chomp()
words = lines[i].split(',')
if(words[1].casecmp("b") == 0)
foods[words[0]] = words[3]
end
end
return foods
end
And then here's what I have for calling the method (Inside the same program).
if __FILE__ == $PROGRAM_NAME
lines = []
$stdin.each { |line| lines << line}
foods = make_database(lines).new
puts foods
end
I am painfully confused, especially since it gives me a different random number for each "Undefined method 'new' for (Random number)".
It's a simple mistake. Lowercase hash is a method call on the current object; it returns the integer that Hash uses to index that object, which is the number you see in the error message. Hash, with a capital H, is the class you're probably intending:
foods = Hash.new()
Or more succinctly:
foods = { }
It's ideal to use { } in place of Hash.new unless you need to specify things like defaults, as is the case with:
Hash.new(0)
Where all values are initialized to 0 by default. This can be useful when creating simple counters.
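As a quick illustration of the counter case (the word list is just sample data):

```ruby
counts = Hash.new(0)  # missing keys read as 0 instead of nil

%w[apple pear apple].each { |word| counts[word] += 1 }

p counts
p counts["kiwi"]  # a key that was never stored still reads as 0
```

With a plain { } the first += would fail, since nil has no + method.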
Ruby classes are identified by leading capital letters to avoid confusion like this. Once you get used to the syntax you'll have an easier time spotting mistakes like that.
Note that when writing Ruby code you will almost always omit the parentheses on empty argument lists. That is, x() is expressed simply as x. This keeps code more readable, especially when chaining, like x.y.z instead of x().y().z()
Other things to note include being able to read in all lines with readlines instead of what you have there where you manually compose it. Try:
make_database($stdin.readlines.map(&:chomp))
A more aggressive refactoring of your code looks like this:
def make_database(lines)
# Define a Hash based on key/value pairs in an Array...
Hash[
# ...where these pairs are based on the input lines...
lines.map do |line|
# ...which have comma-separated components.
line.split(',')
end.select do |key, flag, _, value|
# Pick out only those that have the right flag.
flag.downcase == 'b'
end.map do |key, flag, _, value|
# Convert to a simple key/value pair array
[ key, value ]
end
]
end
That might be a little hard to follow, but once you get the hang of chaining together a series of otherwise simple operations your Ruby code will be a lot more flexible and far easier to read.
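For comparison, here is a more compact sketch of the same chain. The method name make_database_sketch and the sample rows are hypothetical, and it assumes the same layout as the question: name in the first column, flag in the second, value in the fourth.

```ruby
def make_database_sketch(lines)
  lines
    .map { |line| line.chomp.split(',') }                      # split each line into fields
    .select { |_key, flag| flag.to_s.casecmp('b').zero? }      # keep rows flagged "b" (any case)
    .map { |key, _flag, _unused, value| [key, value] }         # reduce to key/value pairs
    .to_h
end

sample = ["Bread,b,x,100\n", "Milk,f,y,50\n", "Bagel,B,z,250\n"]
p make_database_sketch(sample)
```

The flag.to_s guard is only there so a short or blank line does not raise on nil.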
This is a part of my file:
project(':facebook-android-sdk-3-6-0').projectDir = new File('facebook-android-sdk-3-6-0/facebook-android-sdk-3.6.0/facebook')
project(':Forecast-master').projectDir = new File('forecast-master/Forecast-master/Forecast')
project(':headerListView').projectDir = new File('headerlistview/headerListView')
project(':library-sliding-menu').projectDir = new File('library-sliding-menu/library-sliding-menu')
I need to extract the names of the libs. This is my ruby function:
def GetArray
out_file = File.new("./out.txt", "w")
File.foreach("./file.txt") do |line|
l=line.scan(/project\(\'\:(.*)\'\).projectDir/)
File.open(out_file, "w") do |f|
l.each do |ch|
f.write("#{ch}\n")
end
end
puts "#{l} "
end
end
My function returns this:
[]
[["CoverFlowLibrary"]]
[["Android-RSS-Reader-Library-master"]]
[["library"]]
[["facebook-android-sdk-3-6-0"]]
[["Forecast-master"]]
My problem is that I find nothing in out_file. How can I write to a file? Otherwise, I only need to get the name of the libs in the file.
Meditate on this:
"project(':facebook-android-sdk-3-6-0').projectDir'".scan(/project\(\'\:(.*)\'\).projectDir/)
# => [["facebook-android-sdk-3-6-0"]]
When scan sees the capturing (...), it will create a sub-array. That's not what you want. The knee-jerk reaction is to flatten the resulting array of arrays but that's really just a band-aid on the code because you chose the wrong method.
Instead consider this:
"project(':facebook-android-sdk-3-6-0').projectDir'"[/':([^']+)'/, 1]
# => "facebook-android-sdk-3-6-0"
This is using String's [] method to apply a regular expression with a capture and return that captured text. No sub-arrays are created.
scan is powerful and definitely has its place, but not for this sort of "find one thing" parsing.
Regarding your code, I'd do something like this untested code:
def get_array
File.open('./out.txt', 'w') do |out_file|
File.foreach('./file.txt') do |line|
l = line[/':([^']+)'/, 1]
out_file.puts l
puts l
end
end
end
Methods in Ruby are NOT camelCase, they're snake_case. Constants, like classes, start with a capital letter and are CamelCase. Don't go all Java on us, especially if you want to write code for a living. So GetArray should be get_array. Also, don't start methods with "get_", and don't call it array; Use to_a to be idiomatic.
When building a regular expression start simple and do your best to keep it simple. It's a maintainability thing and helps to reduce insanity. /':([^']+)'/ is a lot easier to read and understand, and accomplishes the same as your much-too-complex pattern. Regular expression engines are greedy and lazy and want to do as little work as possible, which is sometimes totally evil, but once you understand what they're doing it's possible to write very small/succinct patterns to accomplish big things.
Breaking it down, it basically says "find the first ': then start capturing text until the next '", which is what you're looking for. project( can be ignored as can ).projectDir.
And actually,
/':([^']+)'/
could really be written
/:([^']+)'/
but I felt generous and looked for the leading ' too.
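Applied to a few lines like the ones in the question, that looks roughly like this. The sample array stands in for the file, and the blank entry shows what happens on a line with no match.

```ruby
sample = [
  "include ':headerListView', ':library-sliding-menu'".sub(/.*/, ''),  # blank line stand-in
  "project(':facebook-android-sdk-3-6-0').projectDir = new File('facebook-android-sdk-3-6-0/facebook')",
  "project(':Forecast-master').projectDir = new File('forecast-master/Forecast-master/Forecast')",
  "project(':headerListView').projectDir = new File('headerlistview/headerListView')"
]

# String#[] with a regex and a group index returns the capture, or nil if nothing matched.
names = sample.map { |line| line[/':([^']+)'/, 1] }.compact

p names
```

The compact call drops the nil produced by the non-matching line, so only real names are left.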
The problem is that you're opening the file twice: once in:
out_file = File.new("./out.txt", "w")
and then once for each line:
File.open(out_file, "w") do |f| ...
Try this instead:
def GetArray
File.open("./out.txt", "w") do |f|
File.foreach("./file.txt") do |line|
l=line.scan(/project\(\'\:(.*)\'\).projectDir/)
l.each do |ch|
f.write("#{ch}\n")
end # l.each
end # File.foreach
end # File.open
end # def GetArray
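A small sketch of why the placement of File.open matters, using a scratch directory and made-up names rather than the files from the question: mode "w" truncates on every open, so reopening per item keeps only the last write.

```ruby
require 'tmpdir'

reopened = kept_open = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, 'out.txt')  # hypothetical scratch file

  # Reopening with "w" for every item truncates the file each time.
  %w[one two three].each { |name| File.open(path, 'w') { |f| f.puts name } }
  reopened = File.read(path)

  # Opening once and writing inside the block keeps every line.
  File.open(path, 'w') { |f| %w[one two three].each { |name| f.puts name } }
  kept_open = File.read(path)
end

p reopened   # only the last item survives
p kept_open
```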
Update: for the record, here's the implementation I ended up using.
Here's a trimmed down version of a parser I'm working on. There's still some code, but it should be quite easy to grasp the basic concepts of this parser.
class Markup
def initialize(markup)
@markup = markup
end
def to_html
@html ||= @markup.split(/(\r\n){2,}|\n{2,}/).map {|p| Paragraph.new(p).to_html }.join("\n")
end
class Paragraph
def initialize(paragraph)
@p = paragraph
end
def to_html
@p.gsub!(/'{3}([^']+)'{3}/, "<strong>\\1</strong>")
@p.gsub!(/'{2}([^']+)'{2}/, "<em>\\1</em>")
@p.gsub!(/`([^`]+)`/, "<code>\\1</code>")
case @p
when /^=/
level = (@p.count("=") / 2) + 1 # Starting on h2
@p.gsub!(/^[= ]+|[= ]+$/, "")
"<h#{level}>" + @p + "</h#{level}>"
when /^(\*|\#)/
# I'm parsing lists here. Quite a lot of code, and not relevant, so
# I'm leaving it out.
else
@p.gsub!("\n", "\n<br/>")
"<p>" + @p + "</p>"
end
end
end
end
p Markup.new("Here is `code` and ''emphasis'' and '''bold'''!\n\nBaz").to_html
# => "<p>Here is <code>code</code> and <em>emphasis</em> and <strong>bold</strong>!</p>\n<p>Baz</p>"
So, as you can see, I'm breaking the text into paragraphs, and each paragraph is either a header, a list or a regular paragraph.
Is it feasible to add support for nowiki tags (where everything between <nowiki></nowiki> is not being parsed) for a parser like this? Feel free to answer "no", and suggest alternative methods of creating a parser :)
As a sidenote, you can see the actual parser code on Github. markup.rb and paragraph.rb
If you make use of a simple tokenizer, it's much easier to manage this sort of thing. One approach is to create a single regular expression that can capture your entire grammar, but this might prove to be problematic. An alternative is to split up the document into sections that need to be rewritten, and sections that should be skipped, which is likely the easier approach here.
Here's a simple framework you can extend as required:
def wiki_subst(string)
buffer = string.dup
result = ''
while (m = buffer.match(/<\s*nowiki\s*>.*?<\s*\/\s*nowiki\s*>/i))
result << yield(m.pre_match)
result << m.to_s
buffer = m.post_match
end
result << yield(buffer)
result
end
example = "replace me<nowiki>but not me</nowiki>replace me too<NOWIKI>but not me either</nowiki>and me"
puts wiki_subst(example) { |s| s.upcase }
# => REPLACE ME<nowiki>but not me</nowiki>REPLACE ME TOO<NOWIKI>but not me either</nowiki>AND ME
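A variation on the same idea, as a sketch only: split on the nowiki blocks with a capturing group, so the protected chunks stay in the result, then apply the inline rules to everything else. The method name render_inline is hypothetical, only two of the question's inline rules are shown, and unlike the framework above it strips the nowiki tags from the output.

```ruby
NOWIKI = %r{<nowiki>(.*?)</nowiki>}im

def render_inline(text)
  # A capturing group in split keeps the delimiters (the nowiki blocks) in the array.
  text.split(%r{(<nowiki>.*?</nowiki>)}im).map { |chunk|
    if (m = chunk.match(/\A#{NOWIKI}\z/))
      m[1]  # protected text: emit as-is, without the tags
    else
      chunk.gsub(/'{3}([^']+)'{3}/, '<strong>\1</strong>')
           .gsub(/'{2}([^']+)'{2}/, '<em>\1</em>')
    end
  }.join
end

puts render_inline("''a'' <nowiki>''b''</nowiki> '''c'''")
```

In the Markup class this would have to run before the text is split into paragraphs, or a nowiki block containing a blank line would be cut in two.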
I've got a bit of an odd situation. If I were using a hash, this issue would be easy, however, I'm trying to use "OpenStruct" in Ruby as it provides some decently cool features.
Basically, I think I need to "constantize" a return value. I've got a regular expression:
textopts = OpenStruct.new()
textopts.recipients = []
fileparts = fhandle.read.split("<<-->>")
fileparts[0].chomp.each{|l|
if l =~ /Recipient.*/i
textopts.recipients << $&
elsif l =~ /(ServerAddress.*|EmailAddress.*)/i
textopts.$& = $&.split(":")[1]
end
}
I need a way to turn the $& for "textopts" into a valid property for filling. I've tried "constantize" and some others, but nothing works. I would assume this is possible, but perhaps I'm wrong. Obviously if I were using a hash I could just do "textopts[$&] = .....".
Any ideas?
Keeping the structure of your solution, this is one way to do it:
textopts = OpenStruct.new(:recipients => [])
fileparts = fhandle.read.split('<<-->>')
fileparts.first.chomp.each_line do |l|
case l
when /Recipient.*/i
textopts.recipients << $&
when /(Server|Email)Address.*/i
name, value = $&.split(':', 2)
textopts.send "#{name.strip}=", value.to_s.strip
end
end
But I can't help but think that this should be a proper parser.
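As a sketch of a slightly more parser-like take, assuming the header lines look like "Name: value" (the sample text below is invented): OpenStruct also supports hash-style assignment with [], which avoids building a setter name for send.

```ruby
require 'ostruct'

# Hypothetical header section, standing in for fileparts.first.
header = "Recipient: a@example.com\nServerAddress: mail.example.com\nEmailAddress: me@example.com\n"

opts = OpenStruct.new(recipients: [])
header.each_line do |line|
  name, value = line.chomp.split(':', 2).map(&:strip)
  case name
  when /\ARecipient\z/i
    opts.recipients << value
  when /\A(Server|Email)Address\z/i
    opts[name] = value  # same effect as calling the "#{name}=" setter
  end
end

p opts.recipients
p opts.ServerAddress
```

Note that this stores only the value for each recipient, whereas the code above keeps the whole matched line.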
I'm just starting out using Ruby and I've written a bit of code to do basic parsing of a CSV file (Line is a basic class, omitted for brevity):
class File
def each_csv
each do |line|
yield line.split(",")
end
end
end
lines = Array.new
File.open("some.csv") do |file|
file.each_csv do |csv|
lines << Line.new(:field1 => csv[0], :field2 => csv[1])
end
end
I have a feeling I would be better off using collect somehow rather than pushing each Line onto the array but I can't work out how to do it.
Can anyone show me how to do it or is it perfectly fine as it is?
Edit: I should have made it clear that I'm not actually going to use this code in production, it's more to get used to the constructs of the language. It is still useful to know there are libraries to do this properly though.
Here's a (possibly wild) idea, use the Struct class instead of rolling your own simple POD class. But what you want from this is to have a constructor that accepts all of the arguments that could be generated from the file data.
Line = Struct.new(:field1, :field2, :field3)
Then at the core of the algorithm you want something like:
File.foreach("test.csv").inject([]) do |result, line|
result << Line.new(*line.chomp.split(",", Line.members.size))
end
or being a bit more concise and functional-like:
lines = File.foreach("test.csv").map { |line| Line.new(*line.chomp.split(",", Line.members.size)) }
To be honest I haven't used the Struct class much, but I should, and I will probably refactor code I've already written to use it. It lets you access the fields of each instance by name, like:
line = lines.first
line.field1 = blah
line.field2 = 1
The Ruby Struct class.
So to actually answer your question, and looking above at the code, I would say it would be much simpler to use collect/map to perform the computation. The map function together with inject are very powerful and I find I use them quite frequently.
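Here is a self-contained sketch of the Struct plus map combination, reading from an in-memory string so it runs without a file. Row and the sample data are made-up names for illustration.

```ruby
Row = Struct.new(:field1, :field2, :field3)

data = "a,b,c\nd,e,f\n"  # stand-in for the contents of the CSV file

rows = data.each_line.map do |line|
  # The splat turns the array of fields into separate constructor arguments.
  Row.new(*line.chomp.split(',', Row.members.size))
end

p rows.first.field2
p rows.last.to_a
```

Passing Row.members.size as the limit keeps any extra commas inside the last field instead of raising a "struct size differs" error.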
I don't know if you are aware of it, but Ruby has its own class for parsing and writing CSV files.
I found an example of using collect to turn a csv file into an array of hashes.
require 'csv'
def csv_to_array(file_location)
csv = CSV::parse(File.open(file_location, 'r') {|f| f.read })
fields = csv.shift
csv.collect { |record| Hash[*(0..(fields.length - 1)).collect {|index| [fields[index],record[index].to_s] }.flatten ] }
end
This example is taken from this article.
If you are unfamiliar with the * notation, it basically dissolves the outer [] brackets, turning an array into a comma-separated list of its elements.
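With a reasonably recent version of the standard csv library, the header handling can be left to the parser. This is a sketch using an in-memory string with invented column names, not the code from the article:

```ruby
require 'csv'

data = "name,qty\napple,3\npear,5\n"  # sample input; the first row holds the field names

# headers: true makes each row a CSV::Row keyed by the header names.
records = CSV.parse(data, headers: true).map(&:to_h)

p records
```

All values come back as strings, the same as with the to_s call in the version above.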
Have you looked at FasterCSV? It does what you're trying to do here, along with dealing with some of the brain-deadness you find in some CSV files.
See how this works for you (functional programming is fun!):
Try using inject. Inject takes as a parameter the starting "accumulator", and then a two parameter block:
[1,2,3].inject(0) { |sum,num| sum+num }
is naturally 6
[1,2,3].inject(5) { |sum,num| sum+num }
is 11
[1,2,3].inject(2) { |sum,num| sum*num }
is 12
To the point:
class Line
def initialize(options)
@options = options
end
def to_s
@options[:field1]+" "+@options[:field2]
end
end
File.foreach("test.csv").inject([]) do |lines,line|
split = line.split(",")
lines << Line.new(:field1 => split[0],:field2 => split[1])
end