Strange number conversion while reading a CSV file with Ruby

I've got a strange problem in Ruby on Rails.
There is a CSV file, made with Excel 2003:
5437390264172534;Mark;5
I have a page with an upload input, and I read the file like this:
file = params[:upload]['datafile']
file.read.split("\n").each do |line|
  num, name, type = line.split(";")
  logger.debug "row: #{num} #{name} #{type}"
end
etc
So finally I've got the following:
num = 5437...2534
name = Mark
type = 5
Why does num have such a strange value?
I also tried this:
str = file.read
csv = CSV.parse(str)
csv.each do |line|
RAILS_DEFAULT_LOGGER.info "######## #{line.to_yaml}"
end
but again I got:
######## ---
- !str:CSV::Cell "5437...2534;Mark;5"
The CSV file is in win1251 (I can't change the file encoding).
The Ruby file is in UTF-8.
Ruby version: 1.8.4
Rails version: 2.0.2

If it indeed has a strange value, it probably has to do with the code you didn't post. Edit your question and include the smallest bit of code that will run independently and still produce your questionable output.
split() returns an array of Strings, so the first value from your CSV row is a String, not a Bignum. Maybe you need num.to_i, or a test like num.is_a?(Bignum), somewhere in your code.
file = File.open("test.csv", "r")
# Just getting the first line
line = file.gets
num,name,type = line.split(";")
# split() returns an array of String
puts num.class
puts num
# Make num a number
puts num.to_i.class
puts num.to_i
file.close
Running that file here gives me this:
$ ruby test.rb
String
5437390264172534
Bignum
5437390264172534
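Separately, the reason your second attempt produced a single CSV::Cell containing the whole line is that the CSV parser splits on commas by default, while your file uses semicolons. A minimal sketch, assuming the 1.8-era CSV API where the field separator is the second argument to CSV.parse (on 1.9 and later you would pass :col_sep => ';' instead):
require 'csv'

str = "5437390264172534;Mark;5"

# Ruby 1.8: the second argument to CSV.parse is the field separator
rows = CSV.parse(str, ';')
num, name, type = rows.first

num.to_i # num is still a String (or CSV::Cell) until you convert it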

Related

Stub a CSV file in Ruby Test::Unit

So I have a Ruby script which parses a CSV file as a list of rules, does some processing, and returns an array of hashes as the processed data.
The CSV looks like this:
id_number,rule,notes
1,"dummy_rule","necessary"
2,"sample_rule","optional"
The parsing of CSV looks like this:
def parse_csv(file_name)
  filtered_data = []
  CSV.foreach(file_name, headers: true) do |row|
    filtered_data << row # some processing
  end
  filtered_data
end
Now I am wondering if it is possible to stub/mock an actual CSV file for unit testing in such a way that I could pass a "filename" into this function. I could build a CSV object with the generate method, but then there would be no actual filename to pass.
def test_parse_csv
  expected_result = ["id_number"=>1, "rule"=>"dummy_rule", "notes"=>"optional"}]
  # stub/mock csv with filename: say "rules.csv"
  assert.equal(expected_result, parse_csv(file_name))
end
I use Test::Unit.
I also found a Ruby gem called mocha, but I don't know how it would work for CSVs:
https://github.com/freerange/mocha
Any thoughts are welcome!
I would create a support directory inside your test directory, then create a file:
# test/support/rules.csv
id_number,rule,notes
1,"dummy_rule","necessary"
2,"sample_rule","optional"
Your test should look like this, adding the opening curly bracket that it looks like you've missed on line #2:
def test_parse_csv
  expected_result = [{"id_number"=>1, "rule"=>"dummy_rule", "notes"=>"optional"}]
  assert_equal(expected_result, parse_csv(csv_file))
end

def csv_file
  Rails.root.join('test', 'support', 'rules.csv')
end
EDIT: Sorry, I should have noticed this wasn't Ruby on Rails... here's a plain Ruby solution:
def test_parse_csv
  expected_result = [{"id_number"=>1, "rule"=>"dummy_rule", "notes"=>"optional"}]
  assert_equal(expected_result, parse_csv(csv_file))
end

def csv_file
  'test/support/rules.csv'
end
I did this in my test: let(:raw_csv) { "row_number \n 1 \n 2" }, reasoning that a CSV file is ultimately read as a plain string with line breaks and commas, and it works fine. Be careful with extra whitespace and commas; it is easy to make a mistake.
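Another option, if you'd rather not keep a fixture file in the repo at all, is to write the sample rows to a Tempfile inside the test and pass its path along, since parse_csv only needs a filename. A minimal sketch (the test class name and file handling are illustrative, not from the question):
require 'csv'
require 'tempfile'
require 'test/unit'

# parse_csv as defined in the question
def parse_csv(file_name)
  filtered_data = []
  CSV.foreach(file_name, headers: true) do |row|
    filtered_data << row
  end
  filtered_data
end

class ParseCsvTest < Test::Unit::TestCase
  def test_parse_csv_with_tempfile
    file = Tempfile.new(['rules', '.csv'])
    file.write("id_number,rule,notes\n1,\"dummy_rule\",\"necessary\"\n2,\"sample_rule\",\"optional\"\n")
    file.close

    rows = parse_csv(file.path) # a real filename, as the question asks for
    assert_equal 'dummy_rule', rows.first['rule']
  ensure
    file.unlink # clean up the temporary file
  end
end
Note that with headers: true each row comes back as a CSV::Row whose values are strings, so an expectation like "id_number" => 1 (an Integer) will not match without a converter.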

Ruby - iterate tasks with files

I am struggling to iterate a task over a set of files in Ruby.
(Purpose of the program: every week I have to save 40 PDF files off the school system containing student scores, then manually compare them to last week's PDFs and update one spreadsheet with every student who has passed their target this week. This is a task for a computer!)
I have converted a PDF file to text, and my program then extracts the correct data from the text files and turns each student into an array [name, score, house group]. It then checks each new array against the data in the CSV file and adds any new results.
My program works on a single PDF file, because I've manually typed in:
f = File.open('output\agb summer report.txt')
agb = []
f.each_line do |line|
  agb.push line
end
But I have a whole folder of PDF files that I want to run the program over, and I've also had problems when I try to write each result to a newly named file.
I've tried things with variables and code blocks, but now I don't think you can use a variable in that way.
Dir.foreach('output') do |ea|
  f = File.open(ea)
  agb = []
  f.each_line do |line|
    agb.push line
  end
end
^ This doesn't work. I've also tried exporting the directory names to an array, and doing something like:
a.each do |ea|
  var = '\'output\\' + ea + '\''
  f = File.open(var)
  agb = []
  f.each_line do |line|
    agb.push line
  end
end
I think I'm fundamentally confused about what sorts of objects File and Dir are. I've searched a lot and haven't found a solution yet; I am fairly new to Ruby.
Anyway, I'm sure this can be done; my current backup plan is to copy my program 40 times with different details, but that sounds absurd. Any thoughts?
You're very close. Dir.foreach() yields file names, whereas File.open() wants a path. A crude example to illustrate this:
directory = 'example_directory'
Dir.foreach(directory) do |file|
  # Assuming a Unix-style filesystem, skip . and ..
  next if file.start_with? '.'
  # Simply puts the contents
  path = File.join(directory, file)
  puts File.read(path)
end
Use Globbing for File Lists
You need to use Dir.glob to get your list of files. For example, given three PDF files in /tmp/pdf, you collect them with a glob like so:
Dir.glob('/tmp/pdf/*pdf')
# => ["/tmp/pdf/1.pdf", "/tmp/pdf/2.pdf", "/tmp/pdf/3.pdf"]
Dir.glob('/tmp/pdf/*pdf').class
# => Array
Once you have a list of filenames, you can iterate over them with something like:
Dir.glob('/tmp/pdf/*pdf').each do |pdf|
  # the trailing "-" tells pdftotext to write the extracted text to standard output
  text = %x(pdftotext "#{pdf}" -)
  # do something with your textual data
end
If you're on a Windows system, then you might need a gem like pdf-reader or something else from Ruby Toolbox that suits you better to actually parse the PDF. Regardless, you should use globbing to create a file list; what you do after that depends on what kind of data the file actually holds. IO#read and descendants like File#read are good places to start.
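If you do go the pdf-reader route, the basic extraction loop looks roughly like this (a sketch assuming the pdf-reader gem is installed; the glob path is the same illustrative one as above):
require 'pdf-reader' # gem install pdf-reader

Dir.glob('/tmp/pdf/*.pdf').each do |pdf|
  reader = PDF::Reader.new(pdf)
  # Join the text of every page into one string for later processing
  text = reader.pages.map(&:text).join("\n")
  puts "#{pdf}: #{text.length} characters extracted"
end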
Handling Text Files
If you're dealing with text files rather than PDF files, then something like this will get you started:
Dir.glob('/tmp/pdf/*txt').each do |text|
  # Do something with your textual data. In this case, just
  # dump the files to standard output.
  p File.read(text)
end
You can use Dir.new("./") to get all the files in the current directory, so something like this should work:
file_names = Dir.new "./"
file_names.each do |file_name|
  if file_name.end_with? ".txt"
    f = File.open(file_name)
    agb = []
    f.each_line do |line|
      agb.push line
    end
  end
end
By the way, you can just use agb = f.to_a to convert the file contents into an array where each element is a line from the file:
file_names = Dir.new "./"
file_names.each do |file_name|
  if file_name.end_with? ".txt"
    f = File.open file_name
    agb = f.to_a
    # do whatever processing you need to do
  end
end
If you assign your target folder like this, /path/to/your/folder/*.txt, it will only iterate over text files.
2.2.0 :009 > target_folder = "/home/ziya/Desktop/etc3/example_folder/*.txt"
=> "/home/ziya/Desktop/etc3/example_folder/*.txt"
2.2.0 :010 > Dir[target_folder].each do |texts|
2.2.0 :011 > puts texts
2.2.0 :012?> end
/home/ziya/Desktop/etc3/example_folder/ex4.txt
/home/ziya/Desktop/etc3/example_folder/ex3.txt
/home/ziya/Desktop/etc3/example_folder/ex2.txt
/home/ziya/Desktop/etc3/example_folder/ex1.txt
Iteration over the text files is OK:
2.2.0 :002 > Dir[target_folder].each do |texts|
2.2.0 :003 > File.open(texts, 'w') {|file| file.write("your content\n")}
2.2.0 :004?> end
Results:
2.2.0 :008 > system ("pwd")
/home/ziya/Desktop/etc3/example_folder
=> true
2.2.0 :009 > system("for f in *.txt; do cat $f; done")
your content
your content
your content
your content

CSV.generate and converters?

I'm trying to create a converter to remove newline characters from CSV output.
I've got:
nonewline = lambda do |s|
  s.gsub(/(\r?\n)+/, ' ')
end
I've verified that this works properly IF I load a variable and then run something like:
csv = CSV(variable, :converters => [nonewline])
However, I'm attempting to use this code to update a bunch of preexisting code using CSV.generate, and it does not appear to work at all.
CSV.generate(:converters => [nonewline]) do |csv|
  csv << ["hello\ngoodbye"]
end
returns:
"\"hello\ngoodbye\"\n"
I've tried quite a few things, as well as other examples I've found online, and it appears that :converters has no effect when used with CSV.generate.
Is this correct, or is there something I'm missing?
You need to write your converter as below:
CSV::Converters[:nonewline] = lambda do |s|
  s.gsub(/(\r?\n)+/, ' ')
end
Then do:
CSV.generate(:converters => [:nonewline]) do |csv|
  csv << ["hello\ngoodbye"]
end
Read the documentation for CSV::Converters.
Okay, I've left the part above in place to show you how to write a custom CSV converter; the way you wrote it is incorrect.
Now read the documentation for CSV::generate:
This method wraps a String you provide, or an empty default String, in a CSV object which is passed to the provided block. You can use the block to append CSV rows to the String and when the block exits, the final String will be returned.
After reading the docs, it is quite clear that this method is for writing to a CSV, not for reading it. The converter options (like :converters and :header_converters) are applied when you are reading a CSV file, but not when you are writing one.
Let me show you two examples to illustrate this more clearly.
require 'csv'

string = <<_
foo,bar
baz,quack
_
File.write('a', string)

CSV::Converters[:upcase] = lambda do |s|
  s.upcase
end
I am reading from a CSV file, so the :converters option is applied:
CSV.open('a', 'r', :converters => :upcase) do |csv|
  puts csv.read
end
Output:
# >> FOO
# >> BAR
# >> BAZ
# >> QUACK
Now I am writing into the CSV file, and the converters option is not applied:
CSV.open('a', 'w', :converters => :upcase) do |csv|
  csv << ['dog', 'cat']
end
CSV.read('a') # => [["dog", "cat"]]
Attempting to remove newlines using :converters did not work.
I had to override the << method from csv.rb, adding the following code to it:
# Change all CR/NL's into one space
row.map! { |element|
  if element.is_a?(String)
    element.gsub(/(\r?\n)+/, ' ')
  else
    element
  end
}
Placed right before
output = row.map(&@quote).join(@col_sep) + @row_sep # quote and separate
at line 21.
I would think this would be a good patch to CSV, as newlines will always produce bad CSV output.
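A less invasive alternative to patching csv.rb (just a sketch, not part of the original answer) is to scrub the rows yourself before appending them, so CSV.generate never sees the newlines:
require 'csv'

rows = [["hello\ngoodbye", "plain"], ["multi\r\nline", 42]]

output = CSV.generate do |csv|
  rows.each do |row|
    # Replace any CR/LF runs with a single space, but only in String cells
    csv << row.map { |cell| cell.is_a?(String) ? cell.gsub(/(\r?\n)+/, ' ') : cell }
  end
end

puts output
# hello goodbye,plain
# multi line,42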

How do I make an array of arrays out of a CSV?

I have a CSV file that looks like this:
Jenny, jenny@example.com ,
Ricky, ricky@example.com ,
Josefina josefina@example.com ,
I'm trying to get this output:
users_array = [
  ['Jenny', 'jenny@example.com'], ['Ricky', 'ricky@example.com'], ['Josefina', 'josefina@example.com']
]
I've tried this:
users_array = Array.new
file = File.new('csv_file.csv', 'r')
file.each_line("\n") do |row|
  puts row + "\n"
  columns = row.split(",")
  users_array.push columns
  puts users_array
end
Unfortunately, in Terminal, this returns:
Jenny
jenny@example.com
Ricky
ricky@example.com
Josefina
josefina@example.com
Which I don't think will work for this:
users_array.each_with_index do |user|
  add_page.form_with(:id => 'new_user') do |f|
    f.field_with(:id => "user_email").value = user[0]
    f.field_with(:id => "user_name").value = user[1]
  end.click_button
end
What do I need to change? Or is there a better way to solve this problem?
Ruby's standard library has a CSV class with an API similar to File's, but with a number of useful methods for working with tabular data. To get the output you want, all you need to do is this:
require 'csv'
users_array = CSV.read('csv_file.csv')
PS - I think you are getting the output you expected with your file parsing as well, but maybe you're thrown off by how it is printing to the terminal. puts behaves differently with arrays, printing each member object on a new line instead of as a single array. If you want to view it as an array, use puts my_array.inspect.
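For example (a quick illustration of that point, not from the original answer):
my_array = [['Jenny', 'jenny@example.com'], ['Ricky', 'ricky@example.com']]

puts my_array # prints each element on its own line
# Jenny
# jenny@example.com
# Ricky
# ricky@example.com

puts my_array.inspect # or: p my_array
# [["Jenny", "jenny@example.com"], ["Ricky", "ricky@example.com"]]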
Assuming that your CSV file actually has a comma between the name and email address on the third line:
require 'csv'
users_array = []
CSV.foreach('csv_file.csv') do |row|
  users_array.push row.delete_if(&:nil?).map(&:strip)
end
users_array
# => [["Jenny", "jenny#example.com"],
# ["Ricky", "ricky#example.com"],
# ["Josefina", "josefina#example.com"]]
There may be a simpler way, but what I'm doing there is discarding the nil field created by the trailing comma and stripping the spaces around the email addresses.
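One simpler variant (a sketch along the same lines, not from the original answer) does the cleanup in a single pass over CSV.read; it makes the same assumption about the comma on the third line:
require 'csv'

users_array = CSV.read('csv_file.csv').map do |row|
  row.compact.map(&:strip) # drop the trailing nil field and trim the surrounding spaces
end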

File system crawler - iteration bugs

I'm currently building a file system crawler with the following code:
require 'find'
require 'spreadsheet'
Spreadsheet.client_encoding = 'UTF-8'
count = 0
Find.find('/Users/Anconia/crawler/') do |file|
  if file =~ /\b.xls$/ # check if filename ends in desired format
    contents = Spreadsheet.open(file).worksheets
    contents.each do |row|
      if row =~ /regex/
        puts file
        count += 1
      end
    end
  end
end
puts "#{count} files were found"
And I am receiving the following output:
0 files were found
The regex is tested and correct - I currently use it in another crawler that works.
The output of row.inspect is:
#<Spreadsheet::Excel::Worksheet:0x003ffa5d418538 @row_addresses= @default_format= @selected= @dimensions= @name=Sheet1 @workbook=#<Spreadsheet::Excel::Workbook:0x007ff4bb147140> @rows=[] @columns=[] @links={} @merged_cells=[] @protected=false @password_hash=0 @changes={} @offsets={} @reader=#<Spreadsheet::Excel::Reader:0x007ff4bb1f3b98> @ole=#<Ole::Storage::RangesIOMigrateable:0x007ff4bb126fa8> @offset=15341 @guts={} @rows[3]> - certainly nothing to iterate over.
Try this:
content = Spreadsheet.open(file)
sheet = content.worksheet 0
sheet.each do |row|
...
As Diego mentioned, I should have been iterating over the rows of each worksheet rather than over the worksheets themselves; really appreciate the clarification! It should also be noted that row must be converted to a string before the regex match takes place.
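Putting the fix back into the original crawler gives something like this (a sketch based on the answer above; the regex and path are the placeholders from the question):
require 'find'
require 'spreadsheet'

Spreadsheet.client_encoding = 'UTF-8'
count = 0

Find.find('/Users/Anconia/crawler/') do |file|
  next unless file =~ /\.xls$/ # only process .xls files

  Spreadsheet.open(file).worksheets.each do |sheet|
    sheet.each do |row| # iterate the rows of each worksheet
      if row.to_s =~ /regex/ # convert the row to a string before matching
        puts file
        count += 1
      end
    end
  end
end

puts "#{count} files were found"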
