Prevent backslash escaping in CSV import - laravel

I'm using Laravel Excel for importing products from CSV files.
CSV settings:
ISO-8859-1 encoding
Comma as delimiter
" as enclosure
A few of the products have a name ending with a backslash.
Ex.: Example product model DN123\
Looking at the raw CSV file in a text editor, it will look like this:
... "Example product model DN123\", "Shoes", "Men" ...
When importing, the enclosure (") after the backslash is treated as escaped. How can I prevent this? When I open the CSV in the macOS app Numbers, everything looks fine.

It's not well documented, but the escape_character can be changed in custom CSV settings. I changed it to " and it worked.
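The underlying idea can be seen with Python's csv module (shown here only to illustrate the concept, not Laravel Excel's API): when the parser has no escape character, a backslash before the closing quote is read as a literal character and the field ends cleanly.

```python
import csv
import io

# Raw CSV line where a field ends with a literal backslash.
raw = '"Example product model DN123\\","Shoes","Men"\n'

# Python's csv reader uses no escape character by default, so the
# backslash is kept literally and the closing quote ends the field.
row = next(csv.reader(io.StringIO(raw)))
print(row)  # ['Example product model DN123\\', 'Shoes', 'Men']
```

Setting the escape character equal to the enclosure character, as in the answer above, has the same effect: the backslash loses its special meaning.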

Related

Talend Open Studio: delimited file with semi colon and header with quotes

I have a delimited file that is delimited by semi colon.
The first row in this file is the header, and the header tokens are in double quotes: an example is below:
"name", "telephone", "age", "address", "y"
When using the tFileDelimited and tMap and you pull the fields in, they look like this with underscores around the fields:
_name_, _telephone_, _age_, _address_, Column05
So it seems that in the field names the double quotes are changed to underscore characters, and for some reason the last field, a single character without quotes, is ignored by Talend, which assigns its own default name instead.
Just wondering if anyone has encountered this kind of behaviour and whether one should use a regex to remove the double quotes, to preprocess this first.
Any help appreciated.
Be sure to remove the extra blank spaces between header tokens in the first row. If you use Metadata to import your file, you should get the right names (just check the option 'heading rows as column names' and set " as the text enclosure).
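The stray spaces matter because most parsers only honor an enclosure at the very start of a field. Python's csv module (used here just to illustrate the general rule, not Talend's behavior) shows the effect:

```python
import csv
import io

header = '"name", "telephone", "age", "address", "y"\n'

# With a space after each delimiter, the quote is no longer the first
# character of the field, so it is kept as a literal character.
print(next(csv.reader(io.StringIO(header))))
# ['name', ' "telephone"', ' "age"', ' "address"', ' "y"']

# skipinitialspace=True (or removing the spaces) restores clean names.
print(next(csv.reader(io.StringIO(header), skipinitialspace=True)))
# ['name', 'telephone', 'age', 'address', 'y']
```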

Changing escape character for quotes

I am trying to read a CSV file which contains escaped quote values such as:
"1","unquoted text","\"quoted text\""
It seems that SuperCSV expects quotes to be escaped by doubling, as in:
"1","unquoted text","""quoted text"""
Is there a way to change the escape character to a backslash? I've looked at the docs and not seen anything.
Just found a link to an issue logged on GitHub: https://github.com/super-csv/super-csv/issues/14
Seems like a different CSV handler is in order.
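For comparison (this is Python's csv module, not SuperCSV), a parser that supports a configurable escape character can read the backslash-escaped form directly by disabling quote doubling:

```python
import csv
import io

# A row using backslash-escaped quotes instead of RFC 4180 doubling.
raw = '"1","unquoted text","\\"quoted text\\""\n'

# doublequote=False plus escapechar='\\' tells the parser that \" is
# an escaped quote inside a quoted field.
row = next(csv.reader(io.StringIO(raw), escapechar='\\', doublequote=False))
print(row)  # ['1', 'unquoted text', '"quoted text"']
```

Any Java CSV library offering equivalent options would serve the same purpose.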

How to escape both " and ' when importing each row

I import a text file and save each row as a new record:
CSV.foreach(csv_file_path) do |row|
# saving each row to a new record
end
Strangely enough, the following escapes double quotes, but I have no clue how to escape different characters:
CSV.foreach(csv_file_path, {quote_char: "\""}) do |row|
How do I escape both the characters " and '?
Note that you have additional options available to configure the CSV handler. The useful options for specifying character delimiter handling are these:
:col_sep - defines the column separator character
:row_sep - defines the row separator character
:quote_char - defines the quote separator character
Now, for traditional CSV (comma-separated) files, these values default to { col_sep: ",", row_sep: "\n", quote_char: "\"" }. These will satisfy many needs, but not necessarily all. You can specify the right set to suit your well-formed CSV needs.
However, for non-standard CSV input, consider using a two-pass approach to reading your CSV files. I've done a lot of work with CSV files from Real Estate MLS systems, and they're basically all broken in some fundamental way. I've used various pre- and post-processing approaches to fixing the issues, and had quite a lot of success with files that were failing to process with default options.
In the case of handling single quotes as a delimiter, you could possibly strip off leading and trailing single quotes after you've parsed the file using the standard double quotes. Iterating on the values and using a gsub replacement may work just fine if the single quotes were used in the same way as double quotes.
There's also an "automatic" converter mechanism that the CSV parser applies when retrieving values for individual columns. You can specify the :converters option, like so: { converters: [:my_converter] }
Writing a converter is pretty simple: it's just a small function that checks whether the column value matches the right format, and then returns the re-formatted value. Here's one that should strip leading and trailing single quotes:
CSV::Converters[:strip_surrounding_single_quotes] = lambda do |field|
  return nil if field.nil?
  match = field.match(/^'([^']*)'$/)
  match.nil? ? field : match[1]
end
CSV.parse(input, { converters: [:strip_surrounding_single_quotes] })
You can use as many converters as you like, and they're evaluated in the order that you specify. For instance, to use the pre-defined :all along with the custom converter, you can write it like so:
CSV.parse(input, { converters: [:all, :strip_surrounding_single_quotes] })
If there's an example of the input data to test against, we can probably find a complete solution.
In general, you can't, because that would create a CSV-like record that is not standard CSV (Wikipedia presents the rules in a somewhat easier-to-read format). In CSV, only double quotes are escaped, by doubling, not by using a backslash.
What you are trying to write is not a CSV; you should not use a CSV library to do it.
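The standard behavior is easy to verify; here is a quick check with Python's csv module (just as an illustration of the CSV rules, not the Ruby API):

```python
import csv
import io

buf = io.StringIO()
# A field with a double quote and a field with a single quote.
csv.writer(buf).writerow(['say "hi"', "it's fine"])

# The double quote is doubled inside an enclosed field;
# the single quote needs no escaping at all.
print(buf.getvalue())  # "say ""hi""",it's fine
```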

Can't import product names/descriptions with special characters

I'm trying to import a product CSV which has Bulgarian product names/descriptions (using the standard import under import/export->import)
The only way I've been able to import any so far is by wrapping them in quotes or by putting roman characters in front of the Bulgarian.
e.g. 'Ламинирани ПДЧ' or xxx Ламинирани ПДЧ
without adding these characters it outputs the error: Required attribute 'name' has an empty value in rows: 1
It seems like the Bulgarian is being stripped out completely? My file is encoded as UTF-8 and I've also set the default charset as UTF-8 in the htaccess file.
Is it possible to import the Bulgarian without quotes/roman characters?
I haven't personally tried it, but I have seen this answer that seems to work.
Try saving your csv file through Libre Office Calc with UTF-8 and try import again.
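If re-saving through LibreOffice is not an option, a small script can often repair the encoding. This is a minimal sketch, assuming the file was actually saved as Windows-1251 (a common culprit for Cyrillic text that looks stripped after import); the filenames are hypothetical:

```python
# Hypothetical filenames; adjust to your paths.
src_path, dst_path = "products_src.csv", "products_utf8.csv"

# Create a sample Windows-1251 source file for the demo.
with open(src_path, "w", encoding="cp1251") as f:
    f.write('"Ламинирани ПДЧ";"описание"\n')

# Re-encode to plain UTF-8. Reading with encoding="utf-8-sig" instead
# would strip a BOM if the file is already UTF-8 with one.
with open(src_path, encoding="cp1251") as src, \
     open(dst_path, "w", encoding="utf-8", newline="") as dst:
    dst.write(src.read())
```

If the source encoding is unknown, inspect a few bytes of the file first rather than guessing.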

How to clean a csv file where fields contains the csv separator and delimiter

I'm currently struggling to clean automatically generated CSV files whose fields contain the CSV separator and the field delimiter, using sed or awk or via a script.
The source software has no settings to play with to improve the situation.
Format of the csv:
"111111";"text";"";"text with ; and " sometimes "; or ;" multiple times";"user";
Fortunately, the csv is "well" formatted, the exporting software just doesn't escape or replace "forbidden" chars from the fields.
In the last few days I tried to improve my knowledge of regular expression and find expression to clean the files but I failed.
What I managed to do so far:
RegEx to find the fields (I wanted to find the fields and perform a replace inside but I didn't find a way to do it)
(?:";"|^")(.*?)(?=";"|";\n)
RegEx that finds a semicolon; it does not work if the semicolon is the last character of the field, and it only finds one per field:
(?:^"|";")(?:.*?)(;)(?:[^"\n].*?)(?=";"|";\n)
RegEx to find the double quotes; it seems to pick up only the first double quote of the line in online regex testers:
(?:^"|";")(?:.*?)[^;](")(?:[^;].*?)(?=";"|";\n)
I thought of adding a space between each character in the fields, then searching for lone semicolons and double quotes and removing the single spaces afterwards, but I don't know if that's even possible, and it seems like a poor solution anyway.
Any standard library should be able to handle it if there is no explicit error in the CSV itself; this is why we have quote characters and escape characters.
When you create a CSV yourself, you may forget to handle such cases and end up with a file like this in your final output. AWK is not a CSV reader but simply a text-processing utility.
This is what your row should rather look like.
"111111";"text";"";"text with \; and \" sometimes \"; or ;\" multiple times";"user";
So if you can still re-fetch the data, find a way to export the CSV either through the database's own export functionality or through a CSV library for the language you work with.
In Python, this would look like this (doublequote=False is needed so that quotes are backslash-escaped rather than doubled):
mywriter = csv.writer(csvfile, delimiter=';', quotechar='"', escapechar="\\", doublequote=False)
But if you can't create the CSV again, the only hope is that you can expect some pattern within the fields, as in this question: parse a csv file that contains commas in the fields with awk
But this is rarely true of textual data, especially comments or posts on a webpage. Another idea in such situations would be to use '\t' as the separator.
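A minimal sketch of writing the example row with backslash escaping in Python, using the field values from the malformed row above (the exact option set is an assumption about what the exporter should have done):

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf, delimiter=';', quotechar='"',
                    escapechar='\\', doublequote=False,
                    quoting=csv.QUOTE_ALL)
writer.writerow(['111111', 'text', '',
                 'text with ; and " sometimes "; or ;" multiple times',
                 'user'])
print(buf.getvalue())
# "111111";"text";"";"text with ; and \" sometimes \"; or ;\" multiple times";"user"
```

Semicolons inside enclosed fields need no escaping at all; only the quote character does.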