Need to strip out invalid characters in CSV file - vbscript

I am generating a CSV file from a Microsoft SQL database that was provided to me, but somehow there are invalid characters in about two dozen places throughout the text (there are many thousands of lines of data). When I open the CSV in my text editor, they display as red, upside-down question marks (there are two of them in the attached screenshot).
When I copy the character and view the "find/replace" dialog in my text editor, I see this:
\x{0D}
...but I have no idea what that means. I need to modify my script that generates the CSV so it strips these characters out, but I don't know how to identify them. My script is written in Classic ASP.

You can also use RegEx to remove unwanted characters:
Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Global = True
objRegEx.Pattern = "[^A-Za-z]"
strCSV = objRegEx.Replace(strCSV, "")
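This code is from the following article, which explains in detail what it does:
How Can I Remove All the Non-Alphabetic Characters in a String?
In your case you will want to allow more characters in the Pattern. Keep the character class negated and unanchored (no ^ and $ around it) so that Replace removes everything outside the list, for example:
[^a-zA-Z0-9!#$&()\-`.+,/"]
Inside a VBScript string literal, write the double quote as "". Add a space, \r and \n to the class if spaces and line breaks must be kept.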

You can simply use the Replace function and specify Chr(191) (or "¿" directly):
Replace(yourCSV, Chr(191), "")
or
Replace(yourCSV, "¿", "")
This will remove the character. If you need to replace it with something else, change the last parameter from "" to a different value ("-" for example).
In general, you can open Character Map (charmap.exe) from the Run menu, select Arial, find the symbol and copy it to the clipboard. You can then check its value with Asc("¿"), which returns the character code to use with Chr().
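For what it's worth, the \x{0D} shown in the find/replace dialog is hexadecimal 0D, i.e. character code 13 (carriage return), so in VBScript the character to target would likely be Chr(13) rather than Chr(191). A minimal sketch of the idea, written in Python for illustration only (the name strip_stray_cr and the sample data are made up), which removes carriage returns that are not part of a CRLF line ending:

```python
import re

# Matches a carriage return (0x0D) that is NOT followed by a line feed,
# i.e. a stray CR inside a field rather than part of a CRLF line ending.
STRAY_CR = re.compile(r"\r(?!\n)")


def strip_stray_cr(text: str, replacement: str = "") -> str:
    """Remove (or replace) lone carriage returns, keeping CRLF line endings."""
    return STRAY_CR.sub(replacement, text)


# Hypothetical CSV content with one stray CR inside a quoted field.
sample = 'id,notes\r\n1,"first\rsecond"\r\n'
cleaned = strip_stray_cr(sample, " ")
print(repr(cleaned))
```

If the CSV is written with plain LF line endings, every CR can simply be removed instead.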

Related

Symbol # in variable cannot be handled

I got a CSV file from my front-end as a XString and after I convert it into String it looks as follows:
In the next step I'm trying to perform SPLIT lv_string AT '##' INTO TABLE itab so I can get my data but it doesn't split anything, itab contains one line equal to lv_string.
If I try REPLACE '#' IN lv_string WITH space, lv_string doesn't change and sy-subrc is 4.
From my point of view I have this problem because the symbol # is used by SAP in this context as a symbol for non-printable symbols (that result from the conversion byte->string).
My question is: how may I use SPLIT/REPLACE with # in this case?
I also thought that I can change the SAP code page when converting XString to String but I already use the SAP code page 4110 (utf-8) and don't know a better alternative...

When you display a variable with the debugger, it displays the generic character # (U+0023) for all control characters which are not assigned a glyph ("non-printable symbols" as you say).
If the variable holds the contents of a text file and ## occurs frequently, there is a good chance that it is the pair of control characters U+000D and U+000A, which together mark a newline in Windows files.
In the backend debugger, you can check the hexadecimal values of those characters by clicking the button "Hexadezimal" (shown in your screenshot).
You may use the constant CL_ABAP_CHAR_UTILITIES=>CR_LF, which contains those two control characters, as the separator, for example SPLIT lv_string AT cl_abap_char_utilities=>cr_lf INTO TABLE itab.
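The same point can be illustrated outside ABAP. This is only a sketch in Python with an invented payload: the separator is the real CR+LF pair, not the "#" placeholder a debugger displays, so splitting on a literal "#" finds nothing.

```python
# Hypothetical payload: UTF-8 bytes of a small CSV with Windows line endings.
raw = b"name;city\r\nAnna;Berlin\r\nBob;Paris\r\n"

text = raw.decode("utf-8")

# A debugger may render both control characters as "#", but the actual
# separator is CR (U+000D) followed by LF (U+000A).
CR_LF = "\r\n"

# Splitting on the placeholder leaves the string in one piece.
by_hash = text.split("##")

# Splitting on the real control characters yields one entry per line.
rows = [line for line in text.split(CR_LF) if line]
print(len(by_hash), rows)
```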

Certain characters make moveItemAtURL:toURL:error: crash. how do I avoid them?

First, I'm using Swift. Second this line works fine in my code:
let didIt = fileManager.moveItemAtURL(originalFilePath, toURL: newFilePath, error: nil)
...as long as there are no special characters in the newFilePath. If the newFilePath has a dollar sign or an ampersand ($, &) in it, the line fails. My issue is that the newFilePath comes from a text field in a window where the user can type any old thing. How do I escape special characters, or encode them so they will pass the test and be included in the new filename?
Thanks in advance for any pointers.

"My issue is that the newFilePath comes from a text field in a window where the user can type any old thing."
Right there is your problem. Why are you not using an NSSavePanel for letting the user select a name under which to save a file?
If you insist on taking input from a text field, the docs for -URLByAppendingPathComponent: specifically say that the path component string should be "in its original form (not URL encoded)" (emphasis mine).
How did you originally create newFilePath, before appending the path component? For example, you should have used one of the methods with "[fF]ileURL" in the name.

Notepad++ - Binary text error

I have a huge txt file made using Python. When I try to sort it using Notepad++/TextFX, it returns the error: This tool is not compatible with binary text. Please select text without [NUL] characters. Does it mean that I have non-printable characters in this txt file? Is it possible to convert this file to a compatible format so I can sort it using TextFX?
EDIT: I used mode 'a' in Python to write this file.
Thank you for your advice.

Using TextFX in Notepad++, you could try the following:
Mark the suspicious part or the whole text
Select TextFX, TextFX Characters, Zap all nonprintable characters to #. (The last entry in that submenu.)
All the problematic characters should have been replaced with "#", you can then search for "#".
Another idea is the function Search, "Find characters in range". Check "My range:" and enter "0" and "0" as the range to find [NUL] characters.
Lars
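Since the file was produced with Python, the NUL bytes could also be counted and removed there before the file is opened in Notepad++. A minimal sketch, assuming the NULs are stray bytes; the helper name strip_nul and the sample data are made up:

```python
from typing import Tuple


def strip_nul(data: bytes) -> Tuple[bytes, int]:
    """Return the data without NUL bytes and the number of NULs removed."""
    count = data.count(b"\x00")
    return data.replace(b"\x00", b""), count


# Hypothetical sample; for a real file, read it with open(path, "rb").
sample = b"pear\x00\napple\nfig\x00\x00\n"
cleaned, removed = strip_nul(sample)
print(removed)
print(sorted(cleaned.decode("ascii").splitlines()))
```

If a NUL follows almost every character, the file is more likely UTF-16 encoded, in which case it should be re-encoded rather than stripped.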

Parsing out abnormal characters

I have to work with text that was previously copy/pasted from an Excel document into a .txt file. There are a few characters that I assume mean something to Excel but that show up as an unrecognised character (i.e. that '?' symbol in gedit, or one of those rectangles in some other text editors). I wanted to parse those out somehow, but I'm unsure of how to do so. I know regular expressions can be helpful, but there really isn't a pattern that matches unrecognisable characters. How should I set about doing this?

You could perhaps use http://spreadsheet.rubyforge.org/ to read and parse the data.

I suppose you're getting these characters because the text file contains invalid Unicode characters, which means your '?'s and rectangles could actually be unrecognized multi-byte sequences.
If you want to properly handle the spreadsheet contents, I recommend first exporting the data to CSV using (Open|Libre)Office and choosing UTF-8 as the file encoding.
https://en.wikipedia.org/wiki/Comma-separated_values
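One way to check whether the file really contains byte sequences that are invalid in UTF-8 is to decode it leniently and count the replacements. This is only a sketch in Python, assuming the file is meant to be UTF-8; the helper name decode_lenient and the sample bytes are invented:

```python
from typing import Tuple

# U+FFFD is the character Python substitutes for undecodable bytes.
REPLACEMENT = "\ufffd"


def decode_lenient(data: bytes) -> Tuple[str, int]:
    """Decode as UTF-8, drop undecodable bytes, and report how many were hit."""
    text = data.decode("utf-8", errors="replace")
    bad = text.count(REPLACEMENT)
    return text.replace(REPLACEMENT, ""), bad


# Hypothetical sample: 0x96 and 0xA0 are single bytes from a legacy
# Windows code page and are not valid UTF-8 on their own.
sample = b"price \x96 10\nsize\xa0L\n"
cleaned, bad = decode_lenient(sample)
print(bad, repr(cleaned))
```

A non-zero count suggests the text was saved in a different encoding, in which case decoding with that encoding preserves the characters instead of dropping them.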

If you are not worried about multi-byte sequences, I find this regex handy:
line.gsub( /[^0-9a-zA-Z\-_]/, '*' )

ADO and Microsoft Text Driver - Field Delimiter Problem

I'm using VB6 and ADO together with the Microsoft Text Driver to import data from an ASCII file. The file is comma delimited but it also contains double quotation marks around text data fields. The fields are also fixed width.
My problem is that the driver reads the columns incorrectly any time one of the rows contains a double quotation mark inside the content. This happens inside the "part description" column, which is the second column from the left. When this occurs, the columns to the right all come back as Null, which is not the case in the text file.
I think it would be better to use only the commas as delimiters. However, I believe that commas also occur in the "part description" column so this means I should really load the file as fixed width. I'm not aware that there is any way of doing this unless I can specify this in the schema.ini file.
Any ideas on how to resolve this?
Edit:
You are allowed to specify fixed width in your Schema.ini file. However, it appears to me that the commas and quotation marks that also exist as delimiters/qualifiers will prevent this from working properly. It looks like I may have to "manually" read the file in and write it back out in my own format before I load it using the MS Text driver. Still looking for other opinions.

I would try changing the Format value in the registry for the Jet text engine (if that's what you're using) at the key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Text. I think the default is CSVDelimited but you would change this to FixedLength. See http://msdn.microsoft.com/en-us/library/ms974559.aspx
It's probably worth adding that although you have a Schema.ini file for settings, the registry overrules it for some options anyway.
Actually, looking at the link I supplied, it seems you have to use a schema.ini file for fixed-length files. Have you tried something like the following, which specifies the width?
[Test.txt]
Format=FixedLength
Col1=FirstName Text Width 7
Col2=LastName Text Width 10
Col3=ID Integer Width 3
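If the driver cannot be made to cooperate, the "manual" route mentioned in the question amounts to slicing each line by column position, which ignores commas and quotation marks entirely. A rough sketch in Python for illustration only; the widths 7, 10 and 3 mirror the example schema, and the sample line is invented:

```python
from itertools import accumulate

# Column names and widths taken from the example schema; the data is made up.
COLUMNS = [("FirstName", 7), ("LastName", 10), ("ID", 3)]


def parse_fixed_width(line: str) -> dict:
    """Slice one line into fields by position, ignoring commas and quotes."""
    ends = list(accumulate(width for _, width in COLUMNS))
    starts = [0] + ends[:-1]
    return {
        name: line[start:end].strip()
        for (name, _), start, end in zip(COLUMNS, starts, ends)
    }


# The embedded quote and comma stay inside the LastName field.
record = parse_fixed_width('John   O"Neil, J 042')
print(record)
```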

I'm extra cautious with regional settings, since some users change the default list separator. I usually fix this with a schema.ini like this:
[MyFile.csv]
Format=Delimited(,)
