I'm getting an "Illegal quoting in line 1. (CSV::MalformedCSVError)" error when I try to read a CSV that I download using Selenium WebDriver:
CSV.foreach( "foo.csv" ) do |row|
# anger :(
end
But when I copy the contents, paste them into a new file, and save it, it works just fine:
CSV.foreach( "bar.csv" ) do |row|
# works fine
end
Here are the first 5 lines of the CSV in question, in case it helps:
"Name","W","L","ERA","GS","G","SV","IP","H","ER","HR","SO","BB","WHIP","K/9","BB/9","FIP","WAR","playerid"
"Craig Kimbrel","5","1","1.79","0","65","35","65.0","42","13","4","95","19","0.95","13.16","2.65","1.84","1.7","6655"
"Aroldis Chapman","2","1","1.93","0","30","27","30.0","18","6","2","47","12","0.99","14.24","3.56","2.22","0.6","10233"
"Greg Holland","5","2","2.39","0","65","34","65.0","47","17","5","83","21","1.05","11.53","2.95","2.48","1.3","7196"
"Kenley Jansen","5","2","2.16","0","65","32","65.0","46","16","6","86","19","1.00","11.97","2.64","2.51","0.9","3096"
I haven't been able to find or come up with a way to get my raw, Selenium-downloaded CSV to be read correctly. Has anyone run into this, or does anyone have ideas about what could be wrong with my data or how I can fix this programmatically?
Thank you!
It's very likely that your file has a byte-order mark U+FEFF at the very beginning. You are probably losing it when you copy and paste again.
The proper solution is:
CSV.foreach("foo.csv", "r:bom|utf-8") { ... }
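As a minimal sketch of the suggestion above, the following writes a small CSV that starts with a UTF-8 byte-order mark and reads it back with the "r:bom|utf-8" mode. The temporary file, its two rows, and the helper name read_rows_skipping_bom are assumptions made for illustration, and passing the mode as the second positional argument assumes a reasonably recent version of the csv library.

```ruby
require "csv"
require "tmpdir"

# Hypothetical helper: read every row, letting Ruby strip a leading BOM.
def read_rows_skipping_bom(path)
  rows = []
  CSV.foreach(path, "r:bom|utf-8") { |row| rows << row }
  rows
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "foo.csv")
  # "\uFEFF" is the byte-order mark; in UTF-8 it is stored as EF BB BF.
  File.write(path, "\uFEFF\"Name\",\"W\"\n\"Craig Kimbrel\",\"5\"\n")

  rows = read_rows_skipping_bom(path)
  p rows.first
end
```

To check whether a downloaded file really starts with a BOM, File.binread("foo.csv", 3).bytes shows its first three bytes.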
I have this block of Ruby code. I need to read a big json.gz file that cannot be loaded into RAM at once (so no GzipReader.new method). To achieve this, I use GzipReader and then read lazily with batch loading. Everything works perfectly, but for some reason not all of the data from the JSON reaches the block. Only 5500125 rows are processed by this code, but the file has approximately 6600000 rows. If I use File.open('authors.jsonl.gz') instead of Zlib, then all rows are processed, but they are not unzipped.
I spent almost all day looking through the documentation and haven't found anything :( I also tried to unzip each row as it is processed, but all my attempts failed. Is there a way to unzip the file and then read it in chunks (all of its content, not just part), or at least to read it line by line and unzip each line on its own?
Thank you guys :)
Zlib::GzipReader.wrap(File.open('authors.jsonl.gz')) do |file|
  file.lazy.each_slice(batch_size) do |lines|
    lines.each do |line|
      parsed_line = JSON.parse(line.gsub('\u0000', ''))
      array_of_authors << {id: parsed_line['id'],
                           name: parsed_line['name'],
                           username: parsed_line['username'],
                           description: parsed_line['description'],
                           followers_count: parsed_line.dig('public_metrics', 'followers_count'),
                           following_count: parsed_line.dig('public_metrics', 'following_count'),
                           tweet_count: parsed_line.dig('public_metrics', 'tweet_count'),
                           listed_count: parsed_line.dig('public_metrics', 'listed_count')}
    end
  end
end
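One possible explanation for the missing rows, offered as an assumption since nothing in the question confirms it, is that authors.jsonl.gz consists of several gzip members concatenated together. Zlib::GzipReader stops at the end of the first member, so any later members are never read. The sketch below keeps opening a new reader until the underlying file is exhausted, using GzipReader#unused to find where the next member starts. The method name each_gzip_line and the tiny sample file are hypothetical.

```ruby
require "zlib"
require "tmpdir"

# Hypothetical helper: yield every line from every member of a gzip file.
def each_gzip_line(path)
  File.open(path, "rb") do |file|
    loop do
      gz = Zlib::GzipReader.new(file)
      gz.each_line { |line| yield line }
      unused = gz.unused          # bytes buffered past this member, or nil
      gz.finish                   # release the reader, keep the file open
      break if unused.nil?
      file.pos -= unused.length   # rewind to the start of the next member
    end
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "sample.jsonl.gz")
  # Two gzip members written back to back (an assumed layout for the demo).
  File.binwrite(path, Zlib.gzip("{\"id\":1}\n{\"id\":2}\n") + Zlib.gzip("{\"id\":3}\n"))

  count = 0
  each_gzip_line(path) { count += 1 }
  puts "lines read: #{count}"
end
```

Because lines are yielded one at a time, batching can be layered on top by collecting lines into an array and processing it whenever it reaches batch_size.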
I am trying to create a program that will count the word frequency within a text file that I have created. I have a text file titled moms_letter.txt and this is my code:
word_count = {}
File.open("moms_letter.txt", "r") do |f|
  f.each_line do |line|
    words = line.split(' ').each do |word|
      word_count[word] += 1 if word_count.has_key? word
      word_count[word] = 1 if not word_count.has_key? word
    end
  end
end
puts word_count
The problem I am getting is when I go to run the file, I get the error:
there is no such file or directory - moms_letter.txt (Errno::ENOENT)
Not quite sure why this is occurring when I have the text file created.
Any help is appreciated.
I am also a newbie in Ruby, so thanks for your patience.
You are probably executing your program from outside the directory where your moms_letter.txt file resides. Relative paths are resolved against the current working directory, not the location of the script. Either use an absolute path to open the file, so instead of "moms_letter.txt" go with "complete/path/to/file/moms_letter.txt", or always execute your program from the directory where the .txt is.
I'm fairly new to Ruby too, but have worked with text files a bit recently. It may seem like an obvious question, but is the text file you're trying to open in the same directory as your .rb file? Otherwise you'll need to include the relative path to it.
For troubleshooting's sake, try File.new("temp.txt", "w") and then File.open("temp.txt", "r") to see if that works. Then you'll know whether the issue is with your code or with the txt file you're trying to access.
Also, File.exist?("moms_letter.txt") will help you determine whether you can access that file from within your .rb script. (The older spelling File.exists? is deprecated and was removed in Ruby 3.2.)
Hope that helps!
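To make the path advice above concrete, here is a minimal sketch that resolves moms_letter.txt relative to the script's own directory rather than the current working directory, and checks that the file exists before reading it. The helper name count_words is hypothetical, and Hash.new(0) is a substitution for the two has_key? checks in the question, not the asker's code.

```ruby
# Hypothetical helper: count how often each whitespace-separated word occurs.
def count_words(path)
  counts = Hash.new(0)   # missing keys start at 0
  File.foreach(path) do |line|
    line.split.each { |word| counts[word] += 1 }
  end
  counts
end

# __dir__ is the directory of the current script, so this path does not
# depend on where the program is launched from.
path = File.expand_path("moms_letter.txt", __dir__)

if File.exist?(path)
  puts count_words(path)
else
  warn "Cannot find #{path} (current directory: #{Dir.pwd})"
end
```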
Is there a nice way to assert the contents of a CSV file in Ruby?
I understand how to use the CSV libraries and how to read in the CSV file, but that results in a long list of assertions such as:
assert_equal("0", @csv_array[0].field('impressions'))
assert_equal("7", @csv_array[0].field('clicks'))
assert_equal("330", @csv_array[0].field('currency.GBP.commissions'))
assert_equal("6", @csv_array[0].field('currency.GBP.conversions'))
assert_equal("3300", @csv_array[0].field('currency.GBP.ordervalue'))
Is there some sort of file comparator so I could write:
assert_equal(expected.csv ,actual.csv )
or something along those lines?
How about this:
expected_csv = "impressions,clicks,currency.GBP.commissions,currency.GBP.conversions,currency.GBP.ordervalue
0,7,330,6,3300"
actual_csv = File.read('actual.csv')
assert_equal(expected_csv, actual_csv)
That should work if the entire contents of the CSV file is only 2 lines. Otherwise you will have to manipulate actual_csv to get the parts you want to test. You could do that like so:
IO.readlines('actual.csv')[2]
That will get you the third line (the index is zero-based). You can then concatenate it with a header line or compare it to a string without the header.
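An alternative to comparing raw strings, sketched here on the assumption that both files are small enough to parse in memory, is to compare the parsed records. That way differences such as a trailing newline do not cause a failure. The helper name csv_records and the sample data are hypothetical.

```ruby
require "csv"

# Hypothetical helper: turn CSV text into an array of {header => value} hashes.
def csv_records(text)
  CSV.parse(text, headers: true).map(&:to_h)
end

expected = csv_records("impressions,clicks\n0,7\n")
actual   = csv_records("impressions,clicks\n0,7")   # no trailing newline

puts expected == actual
```

In a test this could be written as assert_equal(csv_records(File.read('expected.csv')), csv_records(File.read('actual.csv'))).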
If you have to test a lot of output, you might find approval testing an interesting approach. Basically, the output is saved the first time your test runs. You can then check the output manually and approve it if it is correct. On subsequent runs, there will be an error when the output differs.
I created a quick and dirty method for doing this which I may clean up and turn into a gem at some point. https://gist.github.com/bpardee/513b4a15e5ebdc596e0b
For instance, the following code:
file = 'test.csv'
File.open(file, 'w') do |fout|
  fout.puts "foo,bar,zulu\n1,2,3\n4,5,6"
end
assert_csv(file) do |csv|
  csv << %w(foo bar warrior)
  csv << [1,3,5]
  csv << [4,5,6]
end
Would result in:
Missing columns: ["zulu"]
Unexpected columns: ["warrior"]
The following mismatches were found in line 2:
bar actual=3 expected=2
I don't recommend this for big csv files since everything is loaded into memory.
Is there a way to edit each line in a file, without involving 2 files? Say, the original file has,
test01
test02
test03
I want to edit it like
test01,a
test02,a
test03,a
I tried something like the code block below, but it replaces some of the characters.
Writing to a temporary file and then replacing the original file works. However, I need to edit the file quite often and would therefore prefer to do it within the file itself. Any pointers are appreciated.
Thank you!
File.open('mytest.csv', 'r+') do |file|
  file.each_line do |line|
    file.seek(-line.length, IO::SEEK_CUR)
    file.puts 'a'
  end
end
f = open 'mytest.csv', 'r+'
r = f.readlines.map { |e| e.strip << ',a' }
f.rewind
f.puts r
f.close # you can leave out this line if it's the last one that runs
Here is a one-liner variation. Note that in this case two file descriptors are left open until the program exits.
open(F='mytest.csv','r+').puts open(F,'r').readlines.map{|e|e.strip<<',a'}
Writing to a file doesn't insert; it always overwrites. This makes it awkward to modify text in-place, because you have to rewrite the entire rest of the contents of the file every time you add something new.
If the file is small enough to fit in memory, you can read it in, modify it, and write it back out. Otherwise, you really are better off with the temporary file.
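For the small-file case, the read, modify, and write back approach can be sketched as follows. The helper name append_to_each_line is hypothetical, and the whole file is held in memory while it runs.

```ruby
require "tmpdir"

# Hypothetical helper: append a suffix to every line of a small file.
def append_to_each_line(path, suffix)
  lines = File.readlines(path, chomp: true)   # read everything, dropping "\n"
  File.write(path, lines.map { |line| "#{line}#{suffix}\n" }.join)
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "mytest.csv")
  File.write(path, "test01\ntest02\ntest03\n")

  append_to_each_line(path, ",a")
  print File.read(path)
end
```

This works without a second file only because the new content is built in full before the original is overwritten.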
I'm trying to create a new file and things don't seem to be working as I expect them to. Here's what I've tried:
File.new "out.txt"
File.open "out.txt"
File.new "out.txt","w"
File.open "out.txt","w"
According to everything I've read online all of those should work but every single one of them gives me this:
Errno::ENOENT: No such file or directory - out.txt
This happens from IRB as well as a Ruby script. What am I missing?
Use:
File.open("out.txt", [your-option-string]) {|f| f.write("write your stuff here") }
where your options are:
r - Read only. The file must exist.
w - Write only. The file is created if it does not exist and truncated if it does.
a - Append to a file. The file is created if it does not exist.
r+ - Open a file for both reading and writing. The file must exist.
w+ - Create an empty file for both reading and writing. An existing file is truncated.
a+ - Open a file for reading and appending. The file is created if it does not exist.
In your case, 'w' is preferable.
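The difference between some of these modes can be seen in a short sketch. The method name demo_modes and the temporary path are hypothetical and exist only for this illustration.

```ruby
require "tmpdir"

# Hypothetical demo: exercise the "r", "w" and "a" modes on one path.
def demo_modes(path)
  results = {}

  begin
    File.open(path, "r") { |f| f.read }      # the file does not exist yet
  rescue Errno::ENOENT => e
    results[:read_missing] = e.class
  end

  File.open(path, "w") { |f| f.write("first\n") }    # creates the file
  File.open(path, "a") { |f| f.write("second\n") }   # adds to the end
  results[:after_append] = File.read(path)

  File.open(path, "w") { |f| f.write("replaced\n") } # truncates, then writes
  results[:after_rewrite] = File.read(path)

  results
end

Dir.mktmpdir do |dir|
  p demo_modes(File.join(dir, "out.txt"))
end
```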
OR you could have:
out_file = File.new("out.txt", "w")
#...
out_file.puts("write your stuff here")
#...
out_file.close
Try
File.open("out.txt", "w") do |f|
  f.write(data_you_want_to_write)
end
without using the
File.new "out.txt"
Try using "w+" as the write mode instead of just "w":
File.open("out.txt", "w+") { |file| file.write("boo!") }
OK, now I feel stupid. The first two definitely do not work but the second two do. Not sure how I convinced myself that I had tried them. Sorry for wasting everyone's time.
In case this helps anyone else, this can occur when you are trying to make a new file in a directory that does not exist.
If the objective is just to create a file, the most direct way I see is:
require 'fileutils'
FileUtils.touch "foobar.txt"
The directory doesn't exist. Make sure it exists as open won't create those dirs for you.
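A minimal sketch of creating the missing directories first, using FileUtils.mkdir_p from the standard library. The helper name write_with_parents and the nested path are hypothetical.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: create any missing parent directories, then write.
def write_with_parents(path, content)
  FileUtils.mkdir_p(File.dirname(path))   # no error if it already exists
  File.write(path, content)
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "reports", "2024", "out.txt")
  write_with_parents(path, "boo!\n")
  puts File.exist?(path)
end
```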
I ran into this myself a while back.
File.new and File.open default to read mode ('r') as a safety mechanism, to avoid possibly overwriting a file. We have to explicitly tell Ruby to use write mode ('w' is the most common way) if we're going to output to the file.
If the text to be output is a string, rather than write:
File.open('foo.txt', 'w') { |fo| fo.puts "bar" }
or worse:
fo = File.open('foo.txt', 'w')
fo.puts "bar"
fo.close
Use the more succinct write:
File.write('foo.txt', 'bar')
File.write also accepts a mode: option, so we can use 'w', 'a', or 'r+' if necessary.
open with a block is useful if you have to compute the output in an iterative loop and want to leave the file open as you do so. write is useful if you are going to output the content in one blast then close the file.
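As a short sketch of the two styles, the following appends with File.write's mode: option and then uses File.open with a block for output produced in a loop. The helper name append_text and the file contents are hypothetical.

```ruby
require "tmpdir"

# Hypothetical helper: append text in one call using File.write's mode: option.
def append_text(path, text)
  File.write(path, text, mode: "a")
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "foo.txt")

  File.write(path, "bar\n")     # one blast: create or overwrite
  append_text(path, "baz\n")    # one blast: append

  # Iterative output: keep the file open while the lines are computed.
  File.open(path, "a") do |fo|
    3.times { |i| fo.puts "line #{i}" }
  end

  print File.read(path)
end
```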
See the documentation for more information.
data = 'data you want inside the file'
You can use File.write('name of file here', data)
You can also use constants instead of strings to specify the mode you want. The benefit is that if you make a typo in a constant name, Ruby raises a NameError at runtime.
The constants include File::RDONLY, File::WRONLY, and File::CREAT. You can combine them with the bitwise OR operator (|) if you like.
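For example, the following sketch combines constants to get the same behaviour as the "w" and "a" mode strings. The names WRITE_FLAGS and APPEND_FLAGS are hypothetical and introduced only for this illustration.

```ruby
require "tmpdir"

# Same effect as mode "w": write only, create if missing, truncate if present.
WRITE_FLAGS  = File::WRONLY | File::CREAT | File::TRUNC
# Same effect as mode "a": write only, create if missing, always append.
APPEND_FLAGS = File::WRONLY | File::CREAT | File::APPEND

Dir.mktmpdir do |dir|
  path = File.join(dir, "out.txt")

  File.open(path, WRITE_FLAGS)  { |f| f.write("first\n") }
  File.open(path, APPEND_FLAGS) { |f| f.write("second\n") }

  print File.read(path)
end
```

A misspelled constant such as File::WRONGLY fails immediately with a NameError.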
Full description of file open modes on ruby-doc.org