How to load the "|" character (vertical bar) using vertica copy command - vertica

I'm trying to load a CSV file that contains the "|" character, without success.
Can I escape it or use some other technique?
can you help?
thanks

If you are using '|' as your delimiter and some fields also contain '|', you can escape them as '\|'. (Or with some other character, if you've changed your escape character. But by default, '\'.)
If you have a lot of these, it might be easier to change your delimiter character. It doesn't have to be '|'. For example, you can do this:
=> COPY t1 FROM '/data/*.csv' DELIMITER '+';
You can use any ASCII value in the range E'\000' to E'\177', inclusive. See the documentation for COPY parameters.
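If changing the delimiter is not an option, the escaping described above can be applied when the file is generated. The following is a minimal Python sketch, not part of the original answer. It assumes the default backslash escape character and that a literal backslash must itself be doubled; the helper names escape_field and to_delimited_line are hypothetical.

```python
def escape_field(value: str, delimiter: str = "|", escape: str = "\\") -> str:
    """Escape the escape character first, then the delimiter."""
    # Assumption: a literal backslash in the data has to be doubled so that
    # it is not read as the start of an escape sequence.
    value = value.replace(escape, escape + escape)
    return value.replace(delimiter, escape + delimiter)


def to_delimited_line(fields, delimiter: str = "|") -> str:
    """Join already-parsed field values into one line for loading."""
    return delimiter.join(escape_field(f, delimiter) for f in fields)


# The second field contains the delimiter and is written as a\|b.
print(to_delimited_line(["1", "a|b", "plain"]))
```

Escaping the backslash before the delimiter matters: doing it in the other order would double the backslashes that were just added.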

Related

How to insert text starting with double quotes in a column delimited with | in a import command in db2

Table contains 3 columns
ID -integer
Name-varchar
Description-varchar
A file with .FILE extension has data with delimiter as |
Eg: 12|Ramu|"Ramu" is an architect
Command I am using to load data to db2:
db2 "Load CLIENT FROM ABC.FILE of DEL MODIFIED BY coldel0x7x keepblanks REPLACE INTO tablename(ID,Name,Description) nonrecoverable"
Data is loaded as follows:
12 Ramu Ramu
but I want it as:
12 Ramu "Ramu" is an architect
Take a look at how the format of delimited ASCII files is defined. The double quote (") is the optional delimiter for character data, so you need to escape it. I have not tested it, but I would assume that you double the quote as you would in SQL:
12|Ramu|"""Ramu"" is an architect"
Delimited files (CSV) are defined in RFC 4180. You need to either quote the entire field or not quote it at all. Quotes may appear inside a field only if the whole field is enclosed in quotes, and they must then be escaped by doubling them, as shown.
Use the nochardel modifier.
If you use '|' as a column delimiter, you must use 0x7C and not 0x7x:
MODIFIED BY coldel0x7C keepblanks nochardel
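The quote-doubling rule from RFC 4180 mentioned in the answers above can be checked with Python's csv module. This is only a sketch of the file format, not a tested db2 load; the helper names write_pipe_row and read_pipe_row are hypothetical.

```python
import csv
import io


def write_pipe_row(fields):
    """Write one row with '|' as the column delimiter and doubled quotes."""
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        delimiter="|",
        quotechar='"',
        doublequote=True,           # embedded quotes are written as ""
        quoting=csv.QUOTE_MINIMAL,  # quote only fields that need it
        lineterminator="\n",
    )
    writer.writerow(fields)
    return buf.getvalue()


def read_pipe_row(line):
    """Parse one line back into its field values."""
    reader = csv.reader(io.StringIO(line), delimiter="|", quotechar='"')
    return next(reader)


row = ["12", "Ramu", '"Ramu" is an architect']
encoded = write_pipe_row(row)
print(encoded, end="")          # 12|Ramu|"""Ramu"" is an architect"
print(read_pipe_row(encoded))   # the original three values
```

With the nochardel modifier the quotes are treated as ordinary data instead, so the file would not need this extra quoting.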

Apache NiFi Replace Text processor to use control character as delimiter

Using the ReplaceText processor to convert a fixed-width file to a delimited one works with ordinary delimiter characters such as ';', '|', or ','. However, using a control character such as \u0001, [^]A, or \^A does not work as expected.
To use special characters, you can combine the literal and unescapeXml NiFi expression language functions:
${literal('&#1;'):unescapeXml()}

ruby gsub new line characters

I have a string with newline characters that I want to gsub out for white space.
"hello I\r\nam a test\r\n\r\nstring".gsub(/[\\r\\n]/, ' ')
Something like the above, only my regex seems to be replacing the letters 'r' and 'n' as well. The other constraint is that the pattern sometimes repeats twice in a row and would then be replaced with two spaces. That is not ideal, but it is better than all the text being cut apart.
Is there a way to select only the newline characters? Or, even better, is there a more idiomatic Ruby way of approaching this without using a regex?
If you have mixed consecutive line breaks that you want to replace with a single space, you may use the following regex solution:
s.gsub(/\R+/, ' ')
See the Ruby demo.
The \R matches any type of line break and + matches one or more occurrences of the quantified subpattern.
Note that if you have to deal with an older version of Ruby, you will need to use the character class [\r\n], which matches either \r or \n:
s.gsub(/[\r\n]+/, ' ')
or add all possible line breaks:
s.gsub(/(?:\u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029])+/, ' ')
This should work for your test case:
"hello I\r\nam a test\r\n\r\nstring".gsub(/[\r\n]/, ' ')
If you don't want successive \r\n characters to result in duplicate spaces you can use this instead:
"hello I\r\nam a test\r\n\r\nstring".gsub(/[\r\n]+/, ' ')
(Note the addition of the + after the character class.)
As Wiktor mentioned, you're using \\ in your regex. Inside the regex literal /.../ that is an escaped backslash, so your character class matches a literal backslash \, r, or n. To match the control characters themselves, write \r and \n with a single backslash. Escaping works differently in regex literals than in regular strings, which are a whole different animal.
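The same distinction can be reproduced outside Ruby. The Python sketch below is an illustration added here, not part of the answers; it uses raw strings so the regex engine sees the same patterns as the Ruby regex literals.

```python
import re

text = "hello I\r\nam a test\r\n\r\nstring"

# Doubled backslashes: the class holds a literal backslash, 'r', and 'n',
# so the letters in "string" are replaced and the line breaks are left alone.
wrong = re.sub(r"[\\r\\n]", " ", text)

# Single backslashes: the class holds only CR and LF, and + collapses a run
# of consecutive line breaks into one space.
right = re.sub(r"[\r\n]+", " ", text)

print(repr(wrong))
print(repr(right))
```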

pig custom function to load multiple character ^^ (double caret) delimiter

I am new to Pig. Can someone help me load a file that uses multiple characters (in my case '^^') as the column delimiter?
For example, I have a file with the following columns:
aisforapple^^bisforball^^cisforcat^^disfordoll^^andeisforelephant
fisforfish^^gisforgreen^^hisforhat^^iisforicecreem^^andjisforjar
kisforking^^lisforlion^^misformango^^nisfornose^^andoisfororange
Regards
A regex is best suited to this kind of multi-character delimiter.
input.txt
aisforapple^^bisforball^^cisforcat^^disfordoll^^andeisforelephant
fisforfish^^gisforgreen^^hisforhat^^iisforicecreem^^andjisforjar
kisforking^^lisforlion^^misformango^^nisfornose^^andoisfororange
PigScript
A = LOAD 'input.txt' AS line;
B = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT_ALL(line,'(.*)\\^\\^(.*)\\^\\^(.*)\\^\\^(.*)\\^\\^(.*)')) AS (f1,f2,f3,f4,f5);
DUMP B;
Output:
(aisforapple,bisforball,cisforcat,disfordoll,andeisforelephant)
(fisforfish,gisforgreen,hisforhat,iisforicecreem,andjisforjar)
(kisforking,lisforlion,misformango,nisfornose,andoisfororange)
Explanation:
For better understanding, I have broken the regex into multiple lines:
(.*)\\^\\^ ->Any character match till ^^ and stored into f1,(double backslash for special characters)
(.*)\\^\\^ ->Any character match till ^^ and stored into f2,(double backslash for special characters)
(.*)\\^\\^ ->Any character match till ^^ and stored into f3,(double backslash for special characters)
(.*)\\^\\^ ->Any character match till ^^ and stored into f4,(double backslash for special characters)
(.*) ->Any character match till the end of string and stored into f5
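The five-group pattern explained above can be tried outside Pig. This Python sketch is an illustration only: it assumes the regex must match the whole line and uses single backslashes in a raw string where the Pig script needs doubled ones.

```python
import re

line = "aisforapple^^bisforball^^cisforcat^^disfordoll^^andeisforelephant"

# Same pattern as in the Pig script: five capture groups separated by ^^.
pattern = re.compile(r"(.*)\^\^(.*)\^\^(.*)\^\^(.*)\^\^(.*)")

# The greedy groups backtrack until each ^^ separator lines up.
fields = pattern.fullmatch(line).groups()
print(fields)

# For a fixed literal delimiter, a plain split gives the same five values.
assert list(fields) == line.split("^^")
```

The regex only matches lines with at least four ^^ separators; a line with fewer yields no match at all, whereas a split would return however many fields are present.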

Sed or String Replace command in Unix to change last First Character after Sequence to UpperCase

I basically have these XML files where I need to change the first letter after
Eg.
Result:
I tried: sed 's/<structure name=\"/\U\/g'
However, this changes the entire word to uppercase. Can someone help me out?
\U converts all of the following characters to uppercase. You will need to use \u to convert only the next character.
Also, you will need to group them to ensure the correct letter is converted:
sed 's/\(<structure name=\"\)\(.\)/\1\u\2/' xml-file
sed 's/<structure name=\"\(.\)/<structure name=\"\U\1/'
sed only changes the case of text in the replacement part of the substitution. We can use a capturing group so that only the first character after the sequence is converted to uppercase.
Alternatively, you can use \E, which ends a case conversion started by \U instead of starting one.
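The capture-group approach from the sed answers can be sketched in Python as well. This is an illustration under assumptions: the sample input line is hypothetical (the original example was lost), and capitalize_after is a made-up helper name.

```python
import re


def capitalize_after(text: str, prefix: str = '<structure name="') -> str:
    """Uppercase the single character that follows each occurrence of prefix."""
    # Group 1 is the fixed prefix, group 2 is the one character to convert.
    pattern = re.compile("(" + re.escape(prefix) + ")(.)")
    return pattern.sub(lambda m: m.group(1) + m.group(2).upper(), text)


# Hypothetical input line, for illustration only.
sample = '<structure name="person">'
print(capitalize_after(sample))
```

Unlike the sed commands above, which have no g flag and change only the first match on each line, re.sub replaces every match in the text.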
