Clean headers and remove bold text - django-import-export

I have a CSV that has bold text in the first column. I want to sanitize it before importing, because the import currently fails to find the row I need.
I tried printing the row in before_import_row, and this is what it looks like:
('\ufeffaccount_number', '000-152-1808')

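This is possible using dynamic columns in tablib. Add a callable that returns the unsanitized column's value for each row, then append it as a new column with a clean header.
    def accno_cleaned(row):
        # Assumes the BOM-prefixed account number is the first column of each row.
        return row[0]

    # Method on your Resource subclass.
    def before_import(self, dataset, using_transactions, dry_run, **kwargs):
        dataset.append_col(accno_cleaned, header='account_number')
However, I think it is better to sanitize the data before it reaches django-import-export if you can, because that will be easier to maintain in the long run. The '\ufeff' prefix on the header is a Unicode byte order mark (BOM) rather than bold text; decoding the file with Python's utf-8-sig codec strips it.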

Related

Ruby CSV, how to write two variables and an array to the same row?

Hey guys, I'm writing a Ruby program that reads a database of food items and recipes in CSV format and writes it back to a file. I'm having issues writing to the CSV file correctly.
I want to write an object's attributes to a CSV file:
    csv_text = CSV.open("FoodDB1.txt", "w") do |i|
      @@dataList.each do |j|
        if j.get_type == "b"
          i << [j.name, j.get_type, j.cal]
        elsif j.get_type == "r"
          i << [j.name, j.get_type, j.print_bFood]
        end
      end
    end
I have two types of objects, basic foods and recipes. Both are stored in the dataList array. I check each object for its type. If it's a basic food, writing it is easy, since it is just three simple fields. If it is a recipe, I write the name, the type, and the basic foods that make up that recipe.
The issue I'm having is at this line
i << [j.name,j.get_type,j.print_bFood]
It prints out the name of the recipe, the type (whether it's a basic food or a recipe), and finally the list of foods in the recipe. That last part is where I'm having issues.
bFood is an array of basic foods stored in the object, and I'm having trouble adding it to the CSV row. I tried writing a method (print_bFood) that returns the array combined into a string using .join(","), but because of the commas in the string, CSV wraps it in quotes when it writes it to the file:
"PB&J Sandwich,r,"Jelly,Peanut butter,Bread slice, Bread slice""
I want it to look like this
"PB&J Sandwich,r,Jelly,Peanut butter,Bread slice, Bread slice"
Any ideas on what could help? I've looked for ways to do this and I just can't think of anything anymore.
One idea I had was if I had the ability to just add on to a row, I could iterate through the bFood array and add each one to the row, but I haven't found any functionality that can do that.
If I read this correctly you should just need...
i << [j.name, j.get_type, j.bFood].flatten
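A minimal sketch of why flatten helps, assuming bFood is a plain Ruby array of strings. The Food struct and the sample recipe are hypothetical stand-ins for the asker's objects. Joining first produces one field that contains commas, which CSV has to quote; flattening produces one field per basic food, so nothing needs quoting:

```ruby
require "csv"

# Hypothetical stand-in for the asker's recipe object.
Food = Struct.new(:name, :get_type, :bFood)
recipe = Food.new("PB&J Sandwich", "r",
                  ["Jelly", "Peanut butter", "Bread slice", "Bread slice"])

# Joining first yields a single field containing commas, so CSV quotes it.
joined = CSV.generate_line([recipe.name, recipe.get_type, recipe.bFood.join(",")])

# Flattening yields one field per basic food, so no quoting is needed.
flat = CSV.generate_line([recipe.name, recipe.get_type, recipe.bFood].flatten)

puts joined
puts flat
```

The trade-off is that recipe rows then have a variable number of fields, so whatever reads the file back has to treat everything after the type as the list of basic foods.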

Ruby 1.9.3 - How does CSV.table know if no headers are in a CSV file?

I have been doing some testing with CSV.table. I have two small and almost identical CSV files, however one is missing the header row.
When I run CSV.table against the CSV file with the header row, everything works as expected.
When I run it against the CSV file without the header row I get:
NoMethodError: undefined method `encode' for nil:NilClass
I tried this with different types of data, with different types of headers, and get the same results.
I am curious about the magic of CSV.table. If I use CSV.parse with headers set to true, it always treats the first row as headers no matter what. So I have been using CSV.table to check whether a CSV file being imported has a header row, but I am not comfortable with this because I don't understand if or when it will work the way I'm using it.
    begin
      CSV.table(csv_file_path)
    rescue
      # Add error to log or something.
    end
Does anyone know?
P.S. I've already read through this and the source code it provides on each method - http://www.ruby-doc.org/stdlib-1.9.3/libdoc/csv/rdoc/CSV.html
There isn't any magic involved, and it won't work for you in general.
As you can see from the source, table literally just calls read with headers: true. But it also converts the header to symbols (header_converters: :symbol) and this is the key to why it appears to work.
You get an error without headers because you have a blank column in your first row of data (something like a,b,,d,e). The blank gets read in as nil, and since nil can't be converted to a symbol, it blows up.
Try it with some data that doesn't have a blank in the first row - you'll see that table will treat that row of data as headers just like any of the other methods.
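A minimal sketch of that last point, using made-up header-less data with no blank field in the first row. CSV.parse is called here with the same headers and header_converters options that table passes to read (table also adds numeric field converters), so the behavior should carry over:

```ruby
require "csv"

# Hypothetical header-less data; the first row is real data, not headers.
data = "fox,1,90\ncow,3,2500\n"

# Same header handling as CSV.table: first row becomes symbolized headers.
table = CSV.parse(data, headers: true, header_converters: :symbol)

p table.headers  # the first data row, converted to symbols
p table.size     # how many rows are left over as data
```

No error is raised: the first data row is silently consumed as the header row, which is why a rescue around CSV.table cannot reliably detect a missing header.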

How do I create a copy of some columns of a CSV file in Ruby with different data in one column?

I have a CSV file called "A.csv". I need to generate a new CSV file called "B.csv" with data from "A.csv".
I will be using a subset of columns from "A.csv" and will have to update one column's values to new values in "B.csv". Ultimately, I will use this data from B.csv to validate against a database.
How do I create a new CSV file?
How do I copy the required columns' data from A.csv to "B.csv"?
How do I append values for a particular column?
I am new to Ruby, but I am able to read CSV to get an array or hash.
As mikeb pointed out, there are the docs - http://ruby-doc.org/stdlib-1.9.3/libdoc/csv/rdoc/CSV.html - Or you can follow along with the examples below (all are tested and working):
To create a new file:
In this file we'll have two rows, a header row and a data row, for a very simple CSV:
    require "csv"
    CSV.open("file.csv", "wb") do |csv|
      csv << ["animal", "count", "price"]
      csv << ["fox", "1", "$90.00"]
    end
The result is a file called "file.csv" with the following:
animal,count,price
fox,1,$90.00
How to append data to a CSV
This is almost the same formula as above, only instead of "wb" mode we'll use "a+" mode. For more information on these modes, see this Stack Overflow answer: What are the Ruby File.open modes and options?
    CSV.open("file.csv", "a+") do |csv|
      csv << ["cow", "3", "2500"]
    end
Now when we open our file.csv we have:
animal,count,price
fox,1,$90.00
cow,3,2500
Read from our CSV file
Now you know how to create a file and append to it. To read a CSV and grab the data for manipulation, you just do:
    CSV.foreach("file.csv") do |row|
      puts row # the first row would be ["animal", "count", "price"], and so on
    end
Of course, this is just one of many ways to pull data from a CSV using this library. For more info, I suggest visiting the docs now that you have a primer: http://ruby-doc.org/stdlib-1.9.3/libdoc/csv/rdoc/CSV.html
Have you seen Ruby's CSV class? It seems pretty comprehensive. Check it out here:
http://ruby-doc.org/stdlib-1.9.3/libdoc/csv/rdoc/CSV.html
You will probably want to use CSV::parse to help Ruby understand your CSV as the table of data that it is and enable easy access to values by header.
Unfortunately, the available documentation on the CSV::parse method doesn't make it very clear how to actually use it for this purpose.
I had a similar task and was helped much more by How to Read & Parse CSV Files With Ruby on rubyguides.com than by the CSV class documentation or by the answers pointing to it from here.
I recommend reading that page in its entirety. The crucial part is about transforming a given CSV into a CSV::Table object using:
table = CSV.parse(File.read("cats.csv"), headers: true)
Now there's documentation on the CSV::Table class, but again you might be helped more by the clear examples on the rubyguides.com page. One thing I'll highlight is that when you tell .parse to expect headers, the resulting table will treat the first row of data as row [0].
You will probably be especially interested in the .by_col method available for your new Table object. This will allow you to iterate through different column index positions in the input and/or output and either copy from one to the other or add a new value to the output. If I get it working, I'll come back and post an example.
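A minimal sketch of the whole task, under stated assumptions: the column names, the sample data, and the replacement value "new" are all hypothetical, and in-memory strings stand in for A.csv and B.csv so the example is self-contained. It uses plain header lookups on each row rather than .by_col:

```ruby
require "csv"

# Hypothetical stand-in for the contents of A.csv.
a_csv = <<~EOS
  id,name,status,notes
  1,fox,old,first
  2,cow,old,second
EOS

keep = %w[id name status] # subset of columns to carry over into B.csv

table = CSV.parse(a_csv, headers: true)

b_csv = CSV.generate do |out|
  out << keep # header row for the new file
  table.each do |row|
    # Copy each kept column, replacing the value in the "status" column.
    out << keep.map { |header| header == "status" ? "new" : row[header] }
  end
end

puts b_csv
```

With real files, CSV.read("A.csv", headers: true) would replace the CSV.parse call and File.write("B.csv", b_csv) would save the result.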

performance issue of watir table object processing. How to make Nokogiri html table into array?

The following works but is always very slow, seemingly halting my scraping program and its Firefox or Chrome browser for even whole minutes per page:
pp recArray = $browser.table(:id,"recordTable").to_a
Getting the HTML table's text or html source is fast though:
htmlcode = $browser.table(:id,"recordTable").html # .text shows only plaintext portion like lynx
How might I be able to create the same recArray (each element from a <TR>) using for example a Nokogiri object holding only that table's html?
recArray = Nokogiri::HTML(htmlcode). ??
I wrote a blog post about that a few days ago: http://zeljkofilipin.com/watir-nokogiri/
If you have further questions, ask.
You want each tr in the table?
Nokogiri::HTML($browser.html).css('table[id="recordTable"] > tr')
This gives a NodeSet, which can be more useful than an Array. Of course, there's still to_a.
Thought it would be useful to sum up all the steps here and there:
The question was how to produce the same array object filled with strings from the page's text content that a Watir::Webdriver Table #to_a might produce, but much faster:
recArray = Nokogiri::HTML(htmlcode). ??
So instead of this as I was doing before:
recArray=$browser.table(:class, 'detail-table w-Positions').to_a
I send the whole page's html as a string to Nokogiri to let it do the parsing:
recArray = Nokogiri::HTML($browser.html).css('table[class="detail-table w-Positions"] tr').to_a
Which found me the rows of the table I want and put them into an array.
I'm not done yet, since the elements of that array are still Nokogiri (table row?) types, which barfed when I attempted things like .join(",") (useful for writing into a .CSV file or database, for instance).
So the following iterates through each row element, turning each into an array of pure Ruby String types, containing only the text content of each table cell stripped of html tags:
recArray= recArray.map {|row| row.css("td").map {|c| c.text}.to_a } # Could of course be merged with above to even longer, nastier one-liner
Each cell had previously also been a Nokogiri Element type; the .text mapping converts it to a plain String.
Significant speedup achieved.
Next I wonder what it would take to simply override the #to_a method of every Watir::Webdriver Table object globally in my Ruby code files....
(I realize that may not be 100% compatible but it would spare me so much code rewriting. Am willing to try in my personal.lib.rb include file.)

Read Dates as Strings with Spreadsheet Ruby Gem

I have been looking for a way to read an Excel spreadsheet while keeping the dates in it as strings. Unfortunately, I can't tell whether this is possible. Has anyone managed to do this, or does anyone know how?
You may want to have a look at the Row class of the spreadsheet gem:
http://spreadsheet.rubyforge.org/Spreadsheet/Row.html
There's a lot that you can get there, but the Row#formatted method is probably what you want:
row = sheet.to_a[row_index] # Get row object
value = row.formatted[column_index]
The formatted method applies the Excel formatting data for you and gives you an array of Ruby-classed objects.
I think you can try the row.at(col_index) method.
You can refer to this page
