Ruby - CSV works while SmarteCSV doesn't - ruby

I want to open a csv file using SmarterCSV.process
market_csv = SmarterCSV.process(market)
p "just read #{market_csv}"
The problem is that the data is not read and this prints:
[]
However, if I attempt the same thing with the default CSV library implementation the content of the file is read(the following print statement prints the file).
CSV.foreach(market) do |row|
p row
end
The content of the file I was reading is of the form:
Date,Close
03/06/15,0.1634
02/06/15,0.1637
01/06/15,0.1638
31/05/15,0.1638

The problem could come from the line separator, the file is not exactly the same if you're using windows or unix system ("\r\n" or "\r"). Try to identify and specify the character in the SmarterCSV.process like this:
market_csv = SmarterCSV.process(market, row_sep: "\r")
p "just read #{market_csv}"
or like this:
market_csv = SmarterCSV.process(market, row_sep: :auto)
p "just read #{market_csv}"

Related

CSV.foreach Not Reading First Column in CSV File

Learning Ruby for the first time to automate cleaning up some CSV files.
I've managed to piece together the script below from other SO questions but for some reason the script does not read the first column of the original CSV file. If I add a dummy first column everything works perfectly. What am I missing?
require 'csv'
COLUMNS = ['SFID','Date','Num','Transaction Type']
CSV.open("invoicesfixed.csv", "wb",
:write_headers=> true,
:headers => ["Account__c","Invoice_Date__c","Invoice_Number__c","Transaction_Type__c"]) do |csv|
CSV.foreach('invoices.csv', :headers=>true, :converters => :all) do |row|
#convert date format to be compatible with Salesforce
row['Date'] = Date.strptime(row['Date'], '%m/%d/%y').strftime('%Y-%m-%d')
csv << COLUMNS.map { |col| row[col] }
end
end
This input file:
Transaction Type,Date,Num,SFID
Invoice,7/1/19,151466,SFID1
Invoice,7/1/19,151466,SFID2
Invoice,7/1/19,151466,SFID3
Invoice,7/1/19,151466,SFID4
Invoice,7/1/19,151466,SFID5
Invoice,7/1/19,151466,SFID6
Invoice,7/1/19,151153,SFID7
Sales Receipt,7/1/19,149487,SFID8
Sales Receipt,7/1/19,149487,SFID9
Sales Receipt,7/1/19,149758,SFID10
Sales Receipt,7/1/19,149758,SFID11
Yields this output:
Account__c,Invoice_Date__c,Invoice_Number__c,Transaction_Type__c
SFID1,2019-07-01,151466,
SFID2,2019-07-01,151466,
SFID3,2019-07-01,151466,
SFID4,2019-07-01,151466,
SFID5,2019-07-01,151466,
SFID6,2019-07-01,151466,
SFID7,2019-07-01,151153,
SFID8,2019-07-01,149487,
SFID9,2019-07-01,149487,
SFID10,2019-07-01,149758,
SFID11,2019-07-01,149758,
However, this input:
Dummy,Transaction Type,Date,Num,SFID
,Invoice,7/1/19,151466,SFID1
,Invoice,7/1/19,151466,SFID2
,Invoice,7/1/19,151466,SFID3
,Invoice,7/1/19,151466,SFID4
,Invoice,7/1/19,151466,SFID5
,Invoice,7/1/19,151466,SFID6
,Invoice,7/1/19,151153,SFID7
,Sales Receipt,7/1/19,149487,SFID8
,Sales Receipt,7/1/19,149487,SFID9
,Sales Receipt,7/1/19,149758,SFID10
,Sales Receipt,7/1/19,149758,SFID11
Yields the correct output of:
Account__c,Invoice_Date__c,Invoice_Number__c,Transaction_Type__c
SFID1,2019-07-01,151466,Invoice
SFID2,2019-07-01,151466,Invoice
SFID3,2019-07-01,151466,Invoice
SFID4,2019-07-01,151466,Invoice
SFID5,2019-07-01,151466,Invoice
SFID6,2019-07-01,151466,Invoice
SFID7,2019-07-01,151153,Invoice
SFID8,2019-07-01,149487,Sales Receipt
SFID9,2019-07-01,149487,Sales Receipt
SFID10,2019-07-01,149758,Sales Receipt
SFID11,2019-07-01,149758,Sales Receipt
Any ideas why this might be happening?
I had a similar problem, though running your example worked.
I realized that problem (at least for me) was that I was creating CSV file using "Save As UTF-8 CSV" from Excel.
This adds BOM to the beginning of the file - before the first column header name and consequently row['firstColumnName'] was returning nil.
Saving file as CSV fixed the issue for me.

Replace and remove string

From file access_file.txt, which has around 500 entries, with the content:
id\hzxcr
roll\85pol
id\byt65_d
rfc\myid
sub\aa_frt_09
.........
.........
I want to check if any of its lines is present in any of the files under the directory :D:/Details/Ruby_new, which has around 100+ files, with an extension ending i.e., *-accessfile.txt, as follows:
Name_accessfile.txt
ID_accessfile.txt
domain_accessfile.txt
roll_accessfile.txt
.......
.......
If present, I want to delete that instance or string, and save it in the same file. I don't want to create a new file or a backup file, but edit and save in the same file.
I came up with the following code:
value=File.open('D:\\my_work\\access_file.txt').read
value.gsub!(/\r\n?/, "\n")
value.each_line do |line|
line.chomp!
Dir.glob("D:/Details/Ruby_new/*-accessfile.txt") do |file_name|
text = File.read(file_name)
#print "FileName: #{file_name}\n"
replace = text.gsub(/#{line}/, "")
File.open(file_name, "w") { |file| file.puts replace }
end
end
but I'm facing the following warning, and the string is not removed from the target files.
my_ruby.rb:10: warning: invalid subexp call: /id\hzxcr/ my_ruby:10:
warning: invalid subexp call: /id\hzxcr/
Looking for any suggestions.
Try this replace = text.gsub(line.strip, "") instead of replace = text.gsub(/#{line}/, "")

How to sequentially create multiple CSV files in Ruby?

Silly question, but I want to do some processing on a dataset and put them into different CSVs, like UDID1.csv, UDID2.csv, ..., UDID1000.csv. So this is my code:
for i in 1..1000
logfile = File.new('C:\Users\hp1\Desktop\Datasets\New File\UDID#{i}\.csv',"a")
#I'll do some processing here
end
But the program throws an error when running because of the UDID#{i} part. So, how to overcome this issue? Thanks.
Edit: This is the error:
in `initialize': No such file or directory # rb_sysopen - C:\Users\hp1\Desktop\Datasets\New File\udid#{1}\.csv (Errno::ENOENT)from C:/Ruby21/bin/hashedUDID.rb:38:in `new' from C:/Ruby21/bin/hashedUDID.rb:38:in '<main>'
The ' is one problem, another problem is the path.
In your posting the New File must exist as a directory. Inside this directory must exist another directories like UDID0001. This gets a .csv file.
Correct is (I don't use the non-rubyesk for-loop):
1.upto(1000) do |i|
logfile = File.new("C:\\Users\\hp1\\Desktop\\Datasets\\UDID#{i}.csv", "a")
#I'll do some processing here
logfile.close #Don't forget to close the file
end
Inside " the backslash must be masked (\\). Instead you may use /:
logfile = File.new("C:/Users/hp1/Desktop/Datasets/New File/UDID#{i}/.csv", "a")
Another possibility is the usage of %i to insert the number:
logfile = File.new("C:/Users/hp1/Desktop/Datasets/New File/UDID%02i/.csv" % i, "a")
I prefer to use open, then the file is closed with the end of the block:
File.open("C:/Users/hp1/Desktop/Datasets/New File/UDID%04i/.csv" % i, "a") do |logfile|
#I'll do some processing here
end #closes the file
Warning:
I'm not sure, if you really want to create 1000 log files (The File is opened inside the loop. so each step creates a file.).
If yes, then the %04i-version has the advantage, that the files get all the same number of digits (starting with 0001 and ending with 1000).
(1..10).each { |i| logfile = File.new("/base/path/UDID#{i}.csv") }
You must use double quote (") when you need string interpolation.
#{} can only be used in strings with double quotes ". So change your code to:
for i in 1..1000
logfile = File.new("C:\Users\hp1\Desktop\Datasets\New File\UDID#{i}\.csv","a")
# other stuff
end

Python edit file with an insanely long line

I am trying to edit particular html files that I download in python. I am running into a problem where I run my code to edit the file and my python context locks up. I checked the file it's writing to and found that there are two files. The html file and a .bak file.
The html file starts out at 0kb and the .bak file constantly grows to a point, maybe 12 mb or so, then the .html file will grow to a larger size, then the .bak file will grow again. This seems to cycle endlessly. The html file I am editing is 22kb. I watched the output file grow to a gig once just to see if it would stop... It doesn't.
Here is the function I am using to edit the file:
def replace(self, search_str, replace_str):
f = open(self.path,'r+')
content = f.readlines()
for i, line in enumerate(content):
content[i] = line.replace(search_str, replace_str)
f.writelines(content)
f.close()
The issue, I imagine relates to the fact that the html file, as downloaded, is mostly in a single line with ~ 21,000 characters in it. Any ideas?
edit:
I have also tried another function, but get the same result:
def replace(self, search_str, replace_str):
assert self.path != None, 'No file path provided.'
fi = fileinput.FileInput(self.path,inplace=1)
for line in fi:
if search_str in line:
line=line.replace(search_str,replace_str)
print line
fi.close()
Try using generator. Thats the way to go if you need to read a large file
for line in open(self.path,'r+'):
# do stuff with line
I re-wrote the function to write everything out to a new file and it works.
def replace(self, search_str, replace_str):
f = open(self.path,'r+')
new_path = self.path.split('.')[0]+'.TEMP'
new_f = open(new_path,'w')
new_lines = [x.replace(search_str, replace_str) for x in f]
new_f.writelines(new_lines)
f.close()
new_f.close()
os.remove(self.path)
os.rename(new_path, self.path)

No such file or directory - ruby

I am trying to read the contents of the file from a local disk as follows :
content = File.read("C:\abc.rb","r")
when I execute the rb file I get an exception as Error: No such file or directory .What am I missing in this?
In a double quoted string, "\a" is a non-printable bel character. Similar to how "\n" is a newline. (I think these originate from C)
You don't have a file with name "C:<BEL>bc.rb" which is why you get the error.
To fix, use single quotes, where these interpolations don't happen:
content = File.read('C:\abc.rb')
content = File.read("C:\/abc.rb","r")
First of all:
Try using:
Dir.glob(".")
To see what's in the directory (and therefore what directory it's looking at).
open("C:/abc.rb", "rb") { |io| a = a + io.read }
EDIT: Unless you're concatenating files together, you could write it as:
data = File.open("C:/abc.rb", "rb") { |io| io.read }

Resources