ADO and Microsoft Text Driver - Field Delimiter Problem - vb6

I'm using VB6 and ADO together with the Microsoft Text Driver to import data from an ASCII file. The file is comma delimited but it also contains double quotation marks around text data fields. The fields are also fixed width.
I'm having a problem where the driver reads the columns incorrectly any time one of the rows contains a double quotation mark inside the content. This happens in the "part description" column, which is the second column from the left. When this occurs, the columns to the right are all Null, which is not the case in the text file.
I think it would be better to use only the commas as delimiters. However, I believe that commas also occur in the "part description" column so this means I should really load the file as fixed width. I'm not aware that there is any way of doing this unless I can specify this in the schema.ini file.
Any ideas on how to resolve this?
Edit:
You are allowed to specify fixed width in your Schema.ini file. However, it appears to me that the commas and quotation marks that also exist as delimiters/qualifiers will prevent this from working properly. It looks like I may have to "manually" read the file in and write it back out in my own format before I load it using the MS Text driver. Still looking for other opinions.

I would try changing the Format value in the registry for the Jet text engine (if that's what you're using) at the key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Text. I think the default is CSVDelimited but you would change this to FixedLength. See http://msdn.microsoft.com/en-us/library/ms974559.aspx
It's probably worth adding that although you have a Schema.ini file for settings, the registry overrules it for some options anyway.
Actually, looking at the link I supplied, it seems you have to use a schema.ini file for fixed-length files. Have you tried something like the following, which specifies the width?
[Test.txt]
Format=FixedLength
Col1=FirstName Text Width 7
Col2=LastName Text Width 10
Col3=ID Integer Width 3
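If the driver still trips over the embedded quotes and commas, the "manual" route mentioned in the question is to slice each line by position. Below is a minimal sketch of that idea in Python rather than VB6. The column names and widths simply mirror the Schema.ini example above, and `parse_fixed_width` is a hypothetical helper, not part of any driver.

```python
# Hypothetical column layout mirroring the Schema.ini example: (name, width).
COLUMNS = [("FirstName", 7), ("LastName", 10), ("ID", 3)]


def parse_fixed_width(line, columns=COLUMNS):
    """Slice one line into fields purely by position.

    Commas and quotation marks inside a field are treated as ordinary
    characters, so they cannot shift the columns to the right of them.
    """
    record = {}
    offset = 0
    for name, width in columns:
        record[name] = line[offset:offset + width].strip()
        offset += width
    return record


# Build a sample line with ljust so the widths are exact.
sample = "John".ljust(7) + "Smith".ljust(10) + "042"
print(parse_fixed_width(sample))
```

The real file would need its own widths, and any surrounding quotes or separator commas would have to be counted as part of the layout or stripped afterwards.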

I'm extra cautious with regional settings, because some users change the default list separator. I usually fix this with a schema.ini like this:
[MyFile.csv]
Format=Delimited(,)

Related

Need to strip out invalid characters in CSV file

I am generating a CSV file from a Microsoft SQL database that was provided to me, but somehow there are invalid characters in about two dozen places throughout the text (there are many thousands of lines of data). When I open the CSV in my text editor, they display as red, upside-down question marks (there are two of them in the attached screenshot).
When I copy the character and view the "find/replace" dialog in my text editor, I see this:
\x{0D}
...but I have no idea what that means. I need to modify my script that generates the CSV so it strips these characters out, but I don't know how to identify them. My script is written in Classic ASP.
You can also use RegEx to remove unwanted characters:
Set objRegEx = CreateObject("VBScript.RegExp")
objRegEx.Global = True
objRegEx.Pattern = "[^A-Za-z]"
strCSV = objRegEx.Replace(strCSV, "")
This code is from the following article which explains in details what it does:
How Can I Remove All the Non-Alphabetic Characters in a String?
In your case you will want to add some characters to the Pattern:
^[a-zA-Z0-9!##$&()\\-`.+,/\"]*$
You can simply use the Replace function and specify Chr(191) (or "¿" directly):
Replace(yourCSV, Chr(191), "")
or
Replace(yourCSV, "¿", "")
This will remove the character. If you need to replace it with something else, change the last parameter from "" to a different value ("-" for example).
In general, you can open charmap.exe (Character Map) from the Run menu, select Arial, find a symbol and copy it to the clipboard. You can then check its value using Asc("¿"), which returns the character code to use with Chr().
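One note on the \x{0D} shown in the find/replace dialog: that is hexadecimal notation for character code 13, the carriage return (Chr(13) in VBScript), so the editor may be drawing a stray carriage return as "¿". The Python sketch below shows how to list which control characters are present and how to strip carriage returns. Both function names are made up for this example.

```python
def describe_control_chars(text):
    """Return the sorted code points of control characters found in text.

    Tab and newline are skipped because they are expected in a CSV file.
    """
    return sorted({ord(ch) for ch in text if ord(ch) < 32 and ch not in "\t\n"})


def strip_carriage_returns(text):
    """Remove every carriage return (code point 13, hex 0D)."""
    return text.replace("\r", "")


row = "Widget A\r,12.50\n"
print(describe_control_chars(row))        # lists code point 13, i.e. 0x0D
print(repr(strip_carriage_returns(row)))
```

The same idea in Classic ASP would be a Replace call on Chr(13) instead of Chr(191), assuming the carriage return really is the offending character.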

Symbol # in variable cannot be handled

I got a CSV file from my front-end as a XString and after I convert it into String it looks as follows:
In the next step I'm trying to perform SPLIT lv_string AT '##' INTO TABLE itab so I can get my data but it doesn't split anything, itab contains one line equal to lv_string.
If I try REPLACE '#' IN lv_string WITH space, lv_string doesn't change and sy-subrc is 4.
From my point of view I have this problem because the symbol # is used by SAP in this context as a symbol for non-printable symbols (that result from the conversion byte->string).
My question is: how may I use SPLIT/REPLACE with # in this case?
I also thought that I can change the SAP code page when converting XString to String but I already use the SAP code page 4110 (utf-8) and don't know a better alternative...
When you display a variable with the debugger, it displays the generic character # (U+0023) for all control characters which are not assigned a glyph ("non-printable symbols" as you say).
If the variable corresponds to the contents of a text file, and ## frequently occurs, there is a big chance that it's the combination of the control characters U+000D and U+000A which correspond to "newline" in Windows files.
In the backend debugger, you can check the hexadecimal values of those characters by clicking the button "Hexadezimal" (shown in your screenshot).
You may use the variable CL_ABAP_CHAR_UTILITIES=>CR_LF which contains those two control characters.
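To illustrate why splitting at a literal '##' finds nothing, here is a small Python sketch (not ABAP). `debugger_view` is a hypothetical function that mimics a display which substitutes # for control characters. The actual split has to use the real CR LF pair.

```python
def debugger_view(text):
    """Mimic a display that shows '#' for every control character."""
    return "".join("#" if ord(ch) < 32 else ch for ch in text)


# Sample payload, assumed to be UTF-8 bytes with Windows line endings.
raw = b"id;name\r\n1;foo\r\n2;bar"
text = raw.decode("utf-8")

print(debugger_view(text))   # the line breaks show up as '##'
print(text.split("##"))      # one element: there is no literal '#' in the data
print(text.split("\r\n"))    # three rows: split on the real control characters
```

In ABAP the equivalent of the last line is splitting at CL_ABAP_CHAR_UTILITIES=>CR_LF, as suggested above.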

Feeding multiple values into a barcode leaves a '0' barcode. How do I remove this barcode entirely?

I have a system currently set up that creates a barcode for a UPC on a label. This works for single items, but sometimes I have more than one item that tries to feed into that barcode, and when that happens it is set to have no value.
However, instead of there being no barcode, there is actually a small barcode that scans in as 0. How do I ensure that no barcode appears?
^FT350,698^BY2,,75
^BCN,75,N,N,N^FD$ItemBarCode$^FS
"$ItemBarCode$" is an item from a populated table that I do not control, and there can be as many items as needed. The customer requires no barcode when there are multiple items and requires a barcode when there is one. Their sample does not use a typical UPC style barcode.
You say you don't have control over the data in the table, but do you have control over the content/format of $ItemBarCode$?
Have the variable contain the ^FD prefix and ^FS suffix (and remove from the ZPL code). When the variable is blank/empty, nothing will print.
According to the software developer consultant, the solution is to create a customization in the system's code that allows for a logic line to fix this error. This is not something that can be fixed within ZPL itself, rather, there will be two separate labels. For instance,
if single item then print X
if multiple items then print Y
I have the same situation. My solution is to put the barcode command on a single line together with its data, including the ^FD prefix and ^FS terminator. Then, while parsing the label file line by line, if the data is zero or invalid, I remove the entire line. It works for me.
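The answers above all come down to deciding, before the label is sent, whether the barcode commands are emitted at all. Here is a minimal Python sketch of that logic. `barcode_lines` and its list argument are hypothetical, and the ZPL strings are copied from the question.

```python
def barcode_lines(item_barcodes):
    """Return the ZPL lines for the barcode field.

    The lines are emitted only when there is exactly one item. With zero or
    several items nothing is returned, so no barcode (not even a '0' one)
    is printed.
    """
    if len(item_barcodes) != 1:
        return []
    return [
        "^FT350,698^BY2,,75",
        "^BCN,75,N,N,N^FD" + item_barcodes[0] + "^FS",
    ]


print(barcode_lines(["012345678905"]))
print(barcode_lines(["012345678905", "098765432109"]))
```

Whether this can live in the labelling system itself depends on how $ItemBarCode$ gets substituted, which the question does not show.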

Parsing out abnormal characters

I have to work with text that was previously copy/pasted from an excel document into a .txt file. There are a few characters that I assume mean something to excel but that show up as an unrecognised character (i.e. that '?' symbol in gedit, or one of those rectangles in some other text editors.). I wanted to parse those out somehow, but I'm unsure of how to do so. I know regular expressions can be helpful, but there really isn't a pattern that matches unrecognisable characters. How should I set about doing this?
you could work with http://spreadsheet.rubyforge.org/ maybe to read / parse the data
I suppose you're getting these characters because the text file contains invalid Unicode characters, which means your '?'s and rectangles could actually be unrecognized multi-byte sequences.
If you want to handle the spreadsheet contents properly, I recommend that you first export the data to CSV using (Open|Libre)Office and choose UTF-8 as the file encoding.
https://en.wikipedia.org/wiki/Comma-separated_values
If you are not worried about multi byte sequences I find this regex to be handy:
line.gsub( /[^0-9a-zA-Z\-_]/, '*' )

Least used delimiter character in normal text < ASCII 128

For coding reasons which would horrify you (I'm too embarrassed to say), I need to store a number of text items in a single string.
I will delimit them using a character.
Which character is best to use for this, i.e. which character is the least likely to appear in the text? Must be printable and probably less than 128 in ASCII to avoid locale issues.
I would choose "Unit Separator" ASCII code "US": ASCII 31 (0x1F)
In the old, old days, most things were done serially, without random access. This meant that a few control codes were embedded into ASCII.
ASCII 28 (0x1C) File Separator - Used to indicate separation between files on a data input stream.
ASCII 29 (0x1D) Group Separator - Used to indicate separation between tables on a data input stream (called groups back then).
ASCII 30 (0x1E) Record Separator - Used to indicate separation between records within a table (within a group). These roughly map to a tuple in modern nomenclature.
ASCII 31 (0x1F) Unit Separator - Used to indicate separation between units within a record. These roughly map to fields in modern nomenclature.
Unit Separator is in ASCII, and there is Unicode support for displaying it (typically a "us" in the same glyph) but many fonts don't display it.
If you must display it, I would recommend displaying it in-application, after it was parsed into fields.
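A minimal Python sketch of using the Unit Separator this way. The function names are made up for the example, and it assumes the items themselves never contain 0x1F (it raises an error if one does).

```python
US = "\x1f"  # ASCII 31, Unit Separator


def join_with_us(items):
    """Join text items into one string, separated by the Unit Separator."""
    for item in items:
        if US in item:
            raise ValueError("item already contains the Unit Separator")
    return US.join(items)


def split_on_us(packed):
    """Recover the original items from a string built by join_with_us."""
    return packed.split(US)


packed = join_with_us(["first, item", "second | item", "third"])
print(split_on_us(packed))
```

Note that 0x1F is a control character, so this does not meet the "must be printable" requirement from the question.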
Assuming for some embarrassing reason you can't use CSV I'd say go with the data. Take some sample data, and do a simple character count for each value 0-127. Choose one of the ones which doesn't occur. If there is too much choice get a bigger data set. It won't take much time to write, and you'll get the answer best for you.
The answer will be different for different problem domains, so | (pipe) is common in shell scripts, ^ is common in math formulae, and the same is probably true for most other characters.
I personally think I'd go for | (pipe) if given a choice but going with real data is safest.
And whatever you do, make sure you've worked out an escaping scheme!
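The character count suggested above takes only a few lines. This is a sketch in Python. `unused_ascii_chars` is a hypothetical name, and it only looks at codes 33 to 126 (printable, excluding space) because the question asks for a printable delimiter.

```python
from collections import Counter


def unused_ascii_chars(samples):
    """Return the printable ASCII characters that never occur in the samples."""
    counts = Counter(ch for text in samples for ch in text)
    return [chr(code) for code in range(33, 127) if chr(code) not in counts]


sample_data = ["Widget, large (blue)", "Bolt 3/8\" x 2", "a^2 + b^2 = c^2"]
candidates = unused_ascii_chars(sample_data)
print(candidates)
```

Any character in the result is only a candidate. A bigger data set can still rule it out, which is why the escaping scheme is still needed.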
When using different languages, this symbol: ¬
proved to be the best. However I'm still testing.
Probably | or ^ or ~ you could also combine two characters
You said "printable", but that can include characters such as a tab (0x09) or form feed (0x0c). I almost always choose tabs rather than commas for delimited files, since commas can sometimes appear in text.
(Interestingly enough the ascii table has characters GS (0x1D), RS (0x1E), and US (0x1F) for group, record, and unit separators, whatever those are/were.)
If by "printable" you mean a character that a user could recognize and easily type in, I would go for the pipe | symbol first, with a few other weird characters (# or ~ or ^ or \, or backtick which I can't seem to enter here) as a possibility. These characters +=!$%&*()-'":;<>,.?/ seem like they would be more likely to occur in user input. As for underscore _ and hash # and the brackets {}[] I don't know.
How about using a CSV-style format? Characters can be escaped in a standard CSV format, and there are already a lot of parsers written.
Can you use a pipe symbol? That's usually the next most common delimiter after comma or tab delimited strings. It's unlikely most text would contain a pipe, and ord('|') returns 124 for me, so that seems to fit your requirements.
For fast escaping I use stuff like this:
Say you want to concatenate str1, str2 and str3.
what I do is:
delimitedStr=str1.Replace("#","#a").Replace("|","#p")+"|"+str2.Replace("#","#a").Replace("|","#p")+"|"+str3.Replace("#","#a").Replace("|","#p");
then to retrieve original use:
splitStr=delimitedStr.Split("|".ToCharArray());
str1=splitStr[0].Replace("#p","|").Replace("#a","#");
str2=splitStr[1].Replace("#p","|").Replace("#a","#");
str3=splitStr[2].Replace("#p","|").Replace("#a","#");
Note: the order of the replacements is important.
It's unbreakable and easy to implement.
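The same scheme can be written for any number of strings. This is a sketch in Python rather than the C#-style code above, and the four function names are mine.

```python
def escape(field):
    """Escape '#' first, then '|', so the two escape codes cannot collide."""
    return field.replace("#", "#a").replace("|", "#p")


def unescape(field):
    """Undo escape(): restore '|' first, then '#'."""
    return field.replace("#p", "|").replace("#a", "#")


def join_fields(fields):
    """Join any number of strings into one pipe-delimited string."""
    return "|".join(escape(field) for field in fields)


def split_fields(joined):
    """Split a string built by join_fields back into the original strings."""
    return [unescape(part) for part in joined.split("|")]


original = ["a|b", "c#d", "#p", ""]
joined = join_fields(original)
print(joined)
print(split_fields(joined) == original)
```

After escaping, every '#' in the joined string is followed by 'a' or 'p', and every remaining '|' is a real delimiter. That is why the round trip is safe.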
Pipe for the win! |
We use ascii 0x7f which is pseudo-printable and hardly ever comes up in regular usage.
Well it's going to depend on the nature of your text to some extent but a vertical bar 0x7C doesn't crop up in text very often.
I don't think I've ever seen an ampersand followed by a comma in natural text, but you can check the file first to see if it contains the delimiter, and if so, use an alternative. If you want to always be able to know that the delimiter you use will not cause a conflict, then do a loop checking the file for the delimiter you want, and if it exists, then double the string until the file no longer has a match. It doesn't matter if there are similar strings because your program will only look for exact delimiter matches.
This can be good or bad (usually bad) depending on the situation and language, but keep in mind that you can always Base64 encode the whole thing. You then don't have to worry about escaping and unescaping various patterns on each side, and you can simply separate and split strings based on a character which isn't used in your Base64 charset.
I have had to resort to this solution when faced with putting XML documents into XML properties/nodes. Properties can't have CDATA blocks in them at all, and nodes escaped as CDATA obviously cannot have further CDATA blocks inside that without breaking the structure.
CSV is probably a better idea for most situations, though.
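A minimal Python sketch of the Base64 approach. The function names are hypothetical, and it uses a comma as the separator because a comma is not part of the standard Base64 alphabet (A-Z, a-z, 0-9, +, / and = for padding).

```python
import base64


def pack_b64(items):
    """Base64-encode each item and join the results with commas."""
    encoded = [base64.b64encode(item.encode("utf-8")).decode("ascii") for item in items]
    return ",".join(encoded)


def unpack_b64(packed):
    """Split on commas and decode each part back to text."""
    return [base64.b64decode(part).decode("utf-8") for part in packed.split(",")]


packed = pack_b64(["one, two", "three | four", "¬ and ^"])
print(packed)
print(unpack_b64(packed))
```

The cost is that the stored string is no longer human-readable and grows by roughly a third.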
Both pipe and caret are the obvious choices. I would note that if users are expected to type the entire response, caret is easier to find on any keyboard than is pipe.
I've used double pipe and double caret before. The idea of a non-printable char works if you're not creating or modifying the file by hand. For quick random-access file storage and retrieval, fixed field widths are used. You don't even have to read the whole file; you're literally pulling from the file by reference. This is how databases do some storage, but they also manage the spaces between records and such. It also introduces the problem of a maximum data element width. (In the old days, indexes attached a header that defined the width of each element and its data type. Later, compression with character remapping was introduced, which allows a text file to shrink to about 1/8 the size in transmission. Variable-length char encoding for the win.)
make it dynamic : )
announce your control characters in the file header
for example
delimiter: ~
escape: \
wrapline: $
width: 19
hello world~this i$
s \\just\\ a sampl$
e text~$someVar$~h$
ere is some \~\~ma$
rkdown strikethrou$
gh\~\~ text
would give the strings
hello world
this is \just\ a sample text
$someVar$
here is some ~~markdown strikethrough~~ text
i have implemented something similar:
a plaintar text container format,
to escape and wrap utf16 text in ascii,
as an alternative to mime multipart messages.
see https://github.com/milahu/live-diff-html-editor
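The header-announced format in the example above could be read back roughly like this. It is a Python sketch under my own assumptions, not the linked project's implementation: a line is treated as wrapped only when it is exactly `width` characters long and ends in the wrap marker, and the header values are passed in as parameters instead of being parsed.

```python
def parse_body(lines, delimiter="~", escape="\\", wrapline="$", width=19):
    """Rejoin wrapped lines, then split on unescaped delimiters."""
    # Step 1: undo the line wrapping.
    text = ""
    for index, line in enumerate(lines):
        if len(line) == width and line.endswith(wrapline):
            text += line[:-1]          # wrapped: drop the marker, keep going
        else:
            text += line
            if index < len(lines) - 1:
                text += "\n"           # a real line break inside the data

    # Step 2: split into fields, honouring the escape character.
    fields, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == escape and i + 1 < len(text):
            current.append(text[i + 1])  # take the next character literally
            i += 2
        elif ch == delimiter:
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    fields.append("".join(current))
    return fields


body = [
    r"hello world~this i$",
    r"s \\just\\ a sampl$",
    r"e text~$someVar$~h$",
    r"ere is some \~\~ma$",
    r"rkdown strikethrou$",
    r"gh\~\~ text",
]
for field in parse_body(body):
    print(field)
```

A full-width data line that legitimately ends in the wrap character would be misread by this sketch. A real implementation would need a rule for that case.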

Resources