Ruby CSV: no rows found in single-column import file

I'm happily using the Ruby 1.9.3 CSV library to import CSV files, but when a file has only a single column, no data rows are found, even though the header field is picked up.
require 'csv'
csv = CSV.new(File.open(import_dir + "#{table}.csv"), headers: true, col_sep: ';')
csv.each do |row|
  # process row
end
each doesn't yield any rows for a single-column file. The same code works fine for all the other files.
The file is simply:
name
sample account
The code finds the header "name" but sees no data rows. I tried quoting the value and adding extra rows. If I add a second column before or after the existing one, the data rows are seen.
Any ideas?

This turned out to be caused by a bug in the application code; it had nothing to do with the Ruby CSV library.
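For reference, a minimal sketch showing that Ruby's CSV handles a single-column, semicolon-delimited file with headers just fine (the data and options mirror the question; the file is simulated with a string):

```ruby
require 'csv'

# Simulate the single-column import file from the question.
data = "name\nsample account\n"

# CSV.new accepts a String as well as an IO object.
csv  = CSV.new(data, headers: true, col_sep: ';')
rows = csv.to_a

puts rows.length        # number of data rows found
puts rows.first['name'] # value of the single column
```

With the sample data above, one data row is found and the header lookup works, which is consistent with the problem being elsewhere in the application code.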

Related

Talend tInputFileDelimited component java.lang.NumberFormatException for CSV file

As a beginner to TOS for BD, I am trying to read two CSV files in Talend Open Studio. I have inferred the metadata schema from the same CSV file, set up the first row to be the header, and set the delimiter to comma (,).
In my code:
The tMap reads the CSV file, does a lookup on another CSV file, and generates two output files: passed and rejected records.
But while running the job I get the error below.
Couldn't parse value for column 'Product_ID' in 'row1', value is '4569,Laptop,10'. Details: java.lang.NumberFormatException: For input string: "4569,Laptop,10"
I believe it is treating the entire row as one string and taking that as the value for the "Product_ID" column.
I don't know why that is happening when I have set the delimiter and row separator correctly.
(screenshot: Schema)
I can see that no rows flow out of the first tInputFileDelimited because of the above error.
(screenshots: Job Run, Input component)
Any idea what else I can check?
Thanks in advance.
In your last screenshot, you can see that the Field separator of your tFileInputDelimited_1 is ; and not ,.
I believe that you haven't set up your component to use the metadata you created for your csv file.
So you need to configure the component to use the metadata you've created by selecting Repository under Property Type, and selecting the delimited file metadata.

Ruby: extract a spreadsheet column and put it in an array

I have an xlsx spreadsheet that holds database information, like table names and other info. Is there any way to extract the whole column called Table_Names and put it into an array of strings?
UPDATE: I can only get rows; my question is how to get a specific column:
require 'spreadsheet'
Spreadsheet.open('MyTestSheet.xls') do |book|
  book.worksheet('Sheet1').each do |row|
    break if row[0].nil?
    puts row.join(',')
  end
end
Thanks,
To deal with Excel files, use the spreadsheet gem and add require 'spreadsheet'.
Thanks.
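The column-extraction part the question asks about boils down to finding the header's index in the first row and collecting that index from every following row. A minimal sketch of that logic, with plain arrays standing in for worksheet rows (the sample data is made up; with the spreadsheet gem, each row the worksheet yields can be indexed the same way):

```ruby
# Rows as a worksheet would yield them; the first row holds the titles.
rows = [
  ['Table_Names', 'Owner'],
  ['users',       'alice'],
  ['orders',      'bob']
]

# Find which column holds the wanted title, then collect it from every data row.
col = rows.first.index('Table_Names')
table_names = rows.drop(1).map { |row| row[col].to_s }

p table_names  # => ["users", "orders"]
```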

CSV only returns strings. I need to keep value types

I'm trying to parse through a CSV file and grab every row and upload it to Postgres. The problem is that CSV.foreach returns every value as a string and Postgres won't accept string values in double columns.
Is there an easy way to keep the value types? Or am I going to have to go column by column and convert the strings into doubles and date formats?
require 'csv'
CSV.foreach("C:\\test\\file.csv") do |row|
  print row
end
All I need is for the values to keep their types rather than be returned as strings. I don't know if this is possible with CSV. It works just fine when I use the spreadsheet gem to parse .xls files.
CSVs do not natively have types; a CSV contains simple comma-separated text. When you view a CSV, you are seeing everything there is to the file. In an Excel file, there is a lot of hidden metadata that tracks the type of each cell.
When you #foreach through a CSV, each row is given as an array of string values. A row might look something like
[ "2.33", "4", "Hello" ]
with each value given as a string. You may think of "2.33" as a float/double, but CSV parsers only know to think of it as a string.
You can convert strings to other types using Ruby's type conversion methods, assuming each column contains only one type (which, since you're loading into an SQL database, is a pretty safe assumption).
You could write something like this to convert the values in each row to specific types. This example converts the first column to a float (which should work with Postgres's double precision type), the second column to an integer, and the third column to a string.
require 'csv'
CSV.foreach("C:\\test\\file.csv") do |row|
  p [row[0].to_f, row[1].to_i, row[2].to_s]
end
Given the sample row above, this prints an array like
[2.33, 4, "Hello"]
You should be able to use these converted values in whatever else you're doing with Postgres.
require 'csv'
CSV.foreach("test.txt", converters: :all) do |row|
  print row
end
This should convert numerics and datetimes. For integers and floats this works perfectly, but I was not able to get an actual conversion to DateTime going.
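If :all doesn't pick up your date format, you can mix a custom converter (a lambda) in with the built-in ones; converters run in order, and a field that a converter can't handle is passed through as a string to the next one. A sketch, assuming a yyyy-mm-dd date column (the sample data is made up):

```ruby
require 'csv'
require 'date'

# Try to parse the field as a yyyy-mm-dd date; leave it unchanged otherwise.
date_converter = lambda do |field|
  Date.strptime(field, '%Y-%m-%d') rescue field
end

row = CSV.parse_line('2.33,4,2018-07-09,Hello',
                     converters: [:numeric, date_converter])
p row  # floats/ints become numbers, the date becomes a Date object
```

This gives you typed values without converting column by column inside the loop body.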

Importing CSV data to update existing records with Rails

I'm having a bit of trouble getting a CSV into my application, which I'd like to use to update existing records and create new ones. My CSV data has only two headers, Date and Total. I've created an import method in my model which creates everything, but when I upload the CSV again it won't update existing records, it just creates duplicates.
Here is my method. As you can see, I'm finding the row by the Date heading using find_by, then creating a new record if this returns nothing, and updating it with the data from the current row. But that doesn't seem to happen; I just get duplicate rows.
def self.import(file)
  CSV.foreach(file.path, headers: true) do |row|
    entry = find_by(Date: row["Date"]) || new
    entry.update row.to_hash
    entry.save!
  end
end
As discovered in the comments, the date in the CSV file is in dd-mm-yyyy format, while the database stores dates in yyyy-mm-dd format. Doing a find_by with the raw CSV string never returned results, because the format differed from the one used in the database.
Date.parse will convert the string read from the CSV file into a true Date object which can be successfully compared against the date stored in the database.
So, rather than:
entry = find_by(Date: row["Date"]) || new
Use:
entry = find_by(Date: Date.parse(row["Date"])) || new
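A quick check of why this works: Date.parse turns the dd-mm-yyyy string read from the CSV into a Date object whose default string form matches the yyyy-mm-dd representation stored in the database (the sample date is made up):

```ruby
require 'date'

csv_value = '09-07-2018'           # dd-mm-yyyy, as read from the CSV
parsed    = Date.parse(csv_value)  # normalized Date object

puts parsed.to_s  # prints "2018-07-09", the same form the database stores
```

So the find_by comparison is done Date-to-Date rather than against a string in the wrong format.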

Parsing Excel and Google Docs Spreadsheet with column headers or titles

I have excel or google docs spreadsheet files I need to parse. The columns may be ordered differently from file to file but there is always a title in the first row.
I'm looking for a way to use column titles in referencing cells when reading a file with Roo or another similarly purposed gem.
Here's what I'd like to be able to do
Get the value in the 4th row for the column titled Widget Count, no matter what the column's position is:
thisWidgetCount = cell[4,'Widget Count'];
I realize I could just walk the columns and build a hash of title to column index, but it seems likely that someone has already wrapped this into a lib.
You can extend Roo, or just write a helper:
def tcell(roo, line, column_title)
  # find the column (1-based) whose header in row 1 matches the title
  column = roo.first_column.upto(roo.last_column)
              .map { |col| roo.cell(1, col) }
              .index(column_title) + 1
  roo.cell(line, column)
end

oo = Openoffice.new("simple_spreadsheet.ods")
this_widget_count = tcell(oo, 3, 'Widget Count')
And of course, it would be better to preload all the captions into an Array, because in this version you are fetching the titles on every lookup (which is bad for performance).
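Preloading the captions means building the title-to-index map once and reusing it for every lookup. A sketch of that idea with plain arrays standing in for the sheet (the data is made up; with Roo you would fill headers from row 1 via roo.cell):

```ruby
# Sheet as rows; the first row holds the column titles.
sheet = [
  ['Name', 'Widget Count', 'Price'],
  ['foo',  3,              9.99],
  ['bar',  7,              4.50]
]

# Build the title => column-index map once, up front.
headers = sheet.first.each_with_index.to_h

def tcell(sheet, headers, line, column_title)
  sheet[line - 1][headers[column_title]]  # 1-based line number, like Roo
end

p tcell(sheet, headers, 3, 'Widget Count')  # => 7
```

Each lookup is then a hash access instead of a scan across the header row.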
