Sed command to replace comma-separated values with pipe-separated values - shell

My CSV file has records in the following format:
571283,1,"R","01/15/2002","IBMS,SL"
I want to convert them to the following format:
571283|1|R|01/15/2002|IBMS,SL
I tried this:
sed -e 's/ //g' -e 's/\"\,\"/\|/g' -e 's/\,\"/\|/g' -e 's/\"$//' test.csv
but the output I am getting is:
571283,1|R|01/15/2002|IBMS,SL
Please advise.

Using gnu-awk with FPAT:
awk -v FPAT='"[^"]+"|[^,]+' -v OFS='|' '{for(i=1; i<=NF; i++) gsub(/"/, "", $i)} 1' file
571283|1|R|01/15/2002|IBMS,SL
In case gnu-awk is unavailable use this perl command:
perl -pe 's/(?=(([^"]*"){2})*[^"]*$),/|/g; s/"//g' file
571283|1|R|01/15/2002|IBMS,SL

This works:
sed -e 's/,/|/' -e 's/\"\,\"/\|/g' -e 's/\,\"/\|/g' -e 's/\"$//' test.csv
Result is:
571283|1|R|01/15/2002|IBMS,SL
Your first sequence:
-e 's/ //g'
has to be changed in:
-e 's/,/|/'
Expanding to reply to your comment.
First of all, keep in mind that sed applies its expressions sequentially, so the order of the transformations matters.
In your string:
Market Basket - WF Note A-2,RECM-PS Transfer,09/22/2015,"330930929, 330931800",,
The same characters appear in contexts where you want different results, so both the g (global) flag and the order of the transformations are very important.
Let us build the sequence:
First, let us get rid of ",, which we want to turn into ||:
sed -e 's/\"\,\,/||/' test.csv
will give us:
Market Basket - WF Note A-2,RECM-PS Transfer,09/22/2015,"330930929, 330931800||
Then we do the same with ," which should become |:
sed -e 's/\"\,\,/||/' -e 's/\,\"/|/' test.csv
gives:
Market Basket - WF Note A-2,RECM-PS Transfer,09/22/2015|330930929, 330931800||
Now two commas remain that we want to turn into |, while the third must stay as it is; the simplest way is to repeat that substitution twice:
sed -e 's/\"\,\,/||/' -e 's/\,\"/|/' -e 's/,/|/' -e 's/,/|/' test.csv
Market Basket - WF Note A-2|RECM-PS Transfer|09/22/2015|330930929, 330931800||
That's it!
Advice:
Think through the order of your transformations.
Apply one transformation at a time and look at the result, remembering that this result is what gets fed to the next one.
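The advice above can be sketched as a small helper that prints the line after each expression in turn. The name trace_sed is hypothetical (it is not part of sed), and the sample line is made up for illustration:

```shell
# trace_sed LINE EXPR...
# Hypothetical helper: apply each sed expression in order and print
# the intermediate result, so you can see what the next one receives.
trace_sed() {
    line=$1
    shift
    for expr in "$@"; do
        line=$(printf '%s\n' "$line" | sed -e "$expr")
        printf '%s => %s\n' "$expr" "$line"
    done
}

# Made-up sample line with one quoted field and two empty trailing fields.
trace_sed 'w,x,"y, z",,' 's/",,/||/' 's/,"/|/' 's/,/|/'
```

Each output line shows an expression followed by the state of the line after it ran, which makes it easy to spot the step where a comma is converted too early.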
Finally, I think you need to transform BOTH lines with the same command, so that:
Market Basket - WF Note A-2,RECM-PS Transfer,09/22/2015,"330930929, 330931800",,
571283,1,"R","01/15/2002","IBMS,SL"
becomes:
Market Basket - WF Note A-2|RECM-PS Transfer|09/22/2015|330930929, 330931800||
571283|1|R|01/15/2002|IBMS,SL
This is the sequence of transformations that does that:
sed -e 's/\",\"/|/' -e 's/\"\,\,/||/' -e 's/\,\"/|/' -e 's/,/|/' -e 's/,/|/' -e 's/\"//g' test.csv
Regards

Generally, using a bona fide CSV parser is the simplest and most robust choice when it comes to parsing CSV data.
For instance, python's CSV parsing provides a straightforward solution:
$ python -c 'import csv,sys; reader=csv.reader(sys.stdin)
for row in reader:
    print("|".join(row))' < test.csv
571283|1|R|01/15/2002|IBMS,SL
As a one-liner (in bash, ksh, or zsh):
python -c $'import csv,sys; reader=csv.reader(sys.stdin)\nfor row in reader:\n print("|".join(row))' < test.csv

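sed -e 's/ //g' -e 's/\"\,\"/\|/g' -e 's/\,\"/\|/g' -e 's/\"$//' -e 's/\,/\|/g' btest.txt
Note that the final s/\,/\|/g also converts commas inside quoted fields, so IBMS,SL comes out as IBMS|SL.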

Related

multiple sed with -e and escape characters

I'm trying to do multiple replacements in a gzipped file and have been having trouble.
zcat PteBra.fa.align.gz | sed -e 's#Simple_repeat/Satellite/Y-chromosome#Simple_repeat/Satellite#g' -e sed 's#Unknown/Unknown/Y-chromosome#Unknown/Unknown#g' -e sed 's#DNA/DNA/TcMar#DNA/TcMar#g' -e sed 's#DNA/DNA/Crypton#DNA/Crypton#g' -e sed 's#DNA/DNA/PIF-Harbinger#DNA/PIF-Harbinger#g' -e sed 's#DNA/DNA/CMC-Chapaev-3#DNA/CMC-Chapaev-3#g' -e sed 's#SINE/SINE/RTE#SINE/RTE#g' > PteBra.fa.align.corrected
Note that I'm using # instead of the standard / because of the presence of / in the text I want to replace. Each individual sed works with no problem but stringing them together yields this consistent error:
sed: -e expression #2, char 3: unterminated `s' command
I have looked all over for a solution but finally, to get the work done, just did all the sed's individually. It takes FOREVER, so I'd like to get this option working.
I've been at this for hours and would appreciate some help.
What am I doing wrong?
Thanks.
You don't have to write -e sed each time! -e will do.
zcat PteBra.fa.align.gz | sed -e 's#Simple_repeat/Satellite/Y-chromosome#Simple_repeat/Satellite#g' -e 's#Unknown/Unknown/Y-chromosome#Unknown/Unknown#g' -e 's#DNA/DNA/TcMar#DNA/TcMar#g' -e 's#DNA/DNA/Crypton#DNA/Crypton#g' -e 's#DNA/DNA/PIF-Harbinger#DNA/PIF-Harbinger#g' -e 's#DNA/DNA/CMC-Chapaev-3#DNA/CMC-Chapaev-3#g' -e 's#SINE/SINE/RTE#SINE/RTE#g' > PteBra.fa.align.corrected
Alternatively, you can put all the commands in a single sed expression, separated by semicolons:
zcat PteBra.fa.align.gz | sed -e '
s#Simple_repeat/Satellite/Y-chromosome#Simple_repeat/Satellite#g;
s#Unknown/Unknown/Y-chromosome#Unknown/Unknown#g;
s#DNA/DNA/TcMar#DNA/TcMar#g;
s#DNA/DNA/Crypton#DNA/Crypton#g;
s#DNA/DNA/PIF-Harbinger#DNA/PIF-Harbinger#g;
s#DNA/DNA/CMC-Chapaev-3#DNA/CMC-Chapaev-3#g;
s#SINE/SINE/RTE#SINE/RTE#g
' > PteBra.fa.align.corrected
Since you already have a proper answer, this is not yet another answer
but a small suggestion for the actual operation.
Writing the whole sed command on one line can get messy. How about
preparing a lookup table that lists each search string and its replacement
on one line in CSV format, like this:
table.txt
Simple_repeat/Satellite/Y-chromosome,Simple_repeat/Satellite
Unknown/Unknown/Y-chromosome,Unknown/Unknown
DNA/DNA/TcMar,DNA/TcMar
DNA/DNA/Crypton,DNA/Crypton
DNA/DNA/PIF-Harbinger,DNA/PIF-Harbinger
DNA/DNA/CMC-Chapaev-3,DNA/CMC-Chapaev-3
SINE/SINE/RTE,SINE/RTE
Then you can execute the following awk script to replace the strings:
zcat PteBra.fa.align.gz | awk -F, '
    NR==FNR {repl[$1] = $2; next}
    {
        for (r in repl) gsub(r, repl[r])
        print
    }
' table.txt - > PteBra.fa.align.corrected
Hope this helps.
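The same lookup table could also be turned into a sed script and run with -f. This is only a sketch: it assumes that neither column contains #, &, backslashes or regex metacharacters, and the temporary files and sample input are made up for illustration:

```shell
# Two-column CSV lookup table: search string, replacement.
table=$(mktemp)
cat > "$table" <<'EOF'
DNA/DNA/TcMar,DNA/TcMar
SINE/SINE/RTE,SINE/RTE
EOF

# Turn every "search,replacement" row into a line "s#search#replacement#g".
script=$(mktemp)
sed -e 's/^\([^,]*\),\(.*\)$/s#\1#\2#g/' "$table" > "$script"

# Apply all generated substitutions in a single sed pass.
result=$(printf 'x DNA/DNA/TcMar y SINE/SINE/RTE\n' | sed -f "$script")
printf '%s\n' "$result"

rm -f "$table" "$script"
```

In the real case the printf would be replaced by zcat PteBra.fa.align.gz, and the generated script can be inspected before it is run.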

Error on sed script - extra characters after command

I've been trying to create a sed script that reads a list of phone numbers and only prints ones that match the following schemes:
+1(212)xxx-xxxx
1(212)xxx-xxxx
I'm an absolute beginner, but I tried to write a sed script that would print this for me using the -n -r flags (the contents of which are as follows):
/\+1\(212\)[0-9]{3}-[0-9]{4}/p
/1\(212\)[0-9]{3}-[0-9]{4}/p
If I run this in sed directly, it works fine (i.e. sed -n -r '/\+1\(212\)[0-9]{3}-[0-9]{4}/p' sample.txt prints matching lines as expected). This does NOT work in the sed script I wrote; instead sed says:
sed: -e expression #1, char 2: extra characters after command
I could not find a good solution, this error seems to have so many causes and none of the answers I found apply easily here.
EDIT: I ran it with sed -n -r script.sed sample.txt
sed cannot automatically determine whether you intended an argument to be a script file or a script string.
To run a sed script from a file, you have to use -f:
$ echo 's/hello/goodbye/g' > demo.sed
$ echo "hello world" | sed -f demo.sed
goodbye world
If you omit the -f, sed will try to run the filename itself as a script, and the d (delete) command is not happy to have emo.sed after it:
$ echo "hello world" | sed demo.sed
sed: -e expression #1, char 2: extra characters after command
Of the various unix tools out there, two use BRE as their default regex dialect. Those two tools are sed and grep.
In most operating systems, you can use egrep or grep -E to tell that tool to use ERE as its dialect. A smaller (but still significant) number of sed implementations will accept a -E option to use ERE.
In BRE mode, however, you can still create groups with parentheses, and you do so by escaping them. That's why your initial expression is failing -- the parentheses are NOT special by default in BRE, but you're MAKING THEM SPECIAL by preceding them with backslashes.
The other thing to keep in mind is that if you want sed to execute a script from a command line argument, you should use the -e option.
So:
$ cat ph.txt
+1(212)xxx-xxxx
1(212)xxx-xxxx
212-xxx-xxxx
$ grep '^+\{0,1\}1([0-9]\{3\})' ph.txt
+1(212)xxx-xxxx
1(212)xxx-xxxx
$ egrep '^[+]?1\([0-9]{3}\)' ph.txt
+1(212)xxx-xxxx
1(212)xxx-xxxx
$ sed -n -e '/^+\{0,1\}1([0-9]\{3\})/p' ph.txt
+1(212)xxx-xxxx
1(212)xxx-xxxx
$ sed -E -n -e '/^[+]?1\([0-9]{3}\)/p' ph.txt
+1(212)xxx-xxxx
1(212)xxx-xxxx
Depending on your OS, you may be able to get a full list of how this works from man re_format.

How to combine multiple sed commands into one [duplicate]

I have 4 different sed commands that I run on a file. To improve performance, I want to combine them into one.
Each command is complex and uses the -E switch. I have searched many forums but could not find an answer for my specific case.
sed -i -E ':a; s/('"$search_str"'X*)[^X&]/\1X/; ta' "$newfile"
sed -i -E '/[<]ExtData[>?" "]/{:a; /Name=/{/Name="'"$nvp_list_ORed"'"/!b}; /Value=/bb; n; ba; :b; s/(Value="X*)[^X"]/\1X/; tb; }' "$newfile"
sed -i -E ':a; s/('"$search_str1"'X*)[^X\<]/\1X/; ta' "$newfile"
sed -i -E ':a; s/('"$search_str2"'X*)[^X\/]/\1X/; ta' "$newfile"
I want to combine them into something like:
sed -i -E 'command1' -e 'command2' -e 'command3' -e 'command4' "$newfile"
But it is not working, maybe because -E and -e can't be combined.
Please let me know.
Thanks! Puneet
-E means "extended regex" and is a standalone flag; -e means "expression" and must be followed by a sed expression.
You can combine them, but if you pass several expressions, each one must be preceded by -e, which is not the case for your first one.
sed -i -E -e 'command1' -e 'command2' -e 'command3' -e 'command4' "$newfile"
A second option is to write all the commands in the same expression:
sed -i -E 'command1;command2;command3;command4' "$newfile"
However, since you're using labels, I wouldn't rely on this option; some implementations may not support it, as John1024 pointed out.
Lastly, as mentioned by Mad Physicist, you can write your sed expressions to a file and reference it with the -f option.
The file must contain one sed expression per line (you can write multiline expressions by ending each line but the last with a \, thus escaping the line feed).
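A minimal sketch of the -f approach, using a masking loop in the same style as the question. The id= prefix, the sample input and the temporary file are assumptions made for illustration:

```shell
# Write the script to a temporary file, one command per line.
# A label on its own line avoids any doubt about how "; " after a
# label name is parsed by a given sed implementation.
script=$(mktemp)
cat > "$script" <<'EOF'
:a
s/(id=X*)[0-9]/\1X/
ta
s/ok/done/
EOF

# -E selects extended regular expressions, -f reads the script file.
result=$(printf 'id=1234 ok\n' | sed -E -f "$script")
printf '%s\n' "$result"

rm -f "$script"
```

The loop replaces one digit after id= per iteration until none is left, and the final substitution then runs on the same line.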
Simply pipe them:
sed -E 'A' file | sed -E 'B' | ... >file.tmp && mv file.tmp file
As @Aaron observed, if you want to give multiple separate expressions to sed, you must designate them as -e options; they will be combined. You can also combine a bunch of expressions into one by separating the pieces with semicolons.
Your case is a bit special, however: your particular expressions use labels and branch instructions, with one of the label names (a) repeated in each expression. In order to combine these, each label should be distinct, and each branch (whether conditional or unconditional) should specify the correct label. That would look something like this:
sed -i -E \
-e ':a1; s/('"$search_str"'X*)[^X&]/\1X/; ta1' \
-e '/[<]ExtData[>?" "]/ {:a2; /Name=/ {/Name="'"$nvp_list_ORed"'"/ !b}; /Value=/ bb2; n; ba2; :b2; s/(Value="X*)[^X"]/\1X/; tb2; }' \
-e ':a3; s/('"$search_str1"'X*)[^X\<]/\1X/; ta3' \
-e ':a4; s/('"$search_str2"'X*)[^X\/]/\1X/; ta4' \
"$newfile"
Do note that even with proper quoting from a shell perspective, which you appear to have, your approach will not do what you expect if the value of any of the interpolated shell variables contains a regex metacharacter.
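One common way to guard against this is to backslash-escape the variable before interpolating it. The helper below is a sketch only; escape_ere is a hypothetical name, and it assumes the value is a single line that will be used inside an extended regex with / as the delimiter:

```shell
# escape_ere STRING
# Hypothetical helper: put a backslash in front of every character
# that is special in an extended regex, plus the "/" delimiter.
# A comma is used as the s delimiter so "/" can sit in the bracket list.
escape_ere() {
    printf '%s\n' "$1" | sed -e 's,[][\\.*^$(){}?+|/],\\&,g'
}

# Without escaping, the "." in a.b would also match the "x" in axb.
pat=$(escape_ere 'a.b')
printf 'axb a.b\n' | sed -E "s/$pat/HIT/g"
```

With the escaped value, only the literal text a.b is replaced.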
Warning: It is not always possible to combine multiple sed scripts into a single one without change. Sometimes you might have to do a redesign of your algorithm.
Sed has two kinds of memory: the pattern space and the hold space. Concatenating scripts only works if the contents of these two spaces are the same in the combined script as they are in the separate sed commands. Below you find an example where the pattern space changes:
$ echo aa | sed -e 's/./&\n/' | sed -e '1s/a/b/g'
b
a
$ echo aa | sed -e 's/./&\n/' -e '1s/a/b/g'
b
b
$ echo aa | gsed -e 's/./&\n/;1s/a/b/g'
b
b
In the original pipeline, the first sed command works on the pattern space aa, while the second script's pattern space is only a.

Case insensitive search matching with sed?

I'm trying to use sed to extract the text between two words, such as "Account" and "Recognized", and I'd like the search to be case insensitive. So I tried to use the I parameter, but I receive this error message:
cat Security.txt | sed -n "/Account/,/Recognized/pI" | sed -e '1d' -e '$d'
sed: -e expression #1, char 24: extra characters after command
Avoid the useless use of cat.
/pattern/I is how to specify case-insensitive matching in sed (the I modifier is a GNU extension):
sed -n "/Account/I,/Recognized/Ip" Security.txt | sed -e '1d' -e '$d'
You can use single sed command to achieve the same:
sed -n '/account/I,/recognized/I{/account/I!{/recognized/I!p}}' Security.txt
Or awk
awk 'BEGIN{IGNORECASE=1} /account/{f=1; next} /recognized/{f=0} f' Security.txt
Reference:
How to select lines between two patterns?
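Where neither the I modifier nor IGNORECASE is available, a rough portable alternative is to lowercase each line with awk's tolower before comparing. This is a sketch under assumptions: between_ci is a hypothetical name, and it matches the two words as fixed strings rather than as regular expressions:

```shell
# between_ci START END FILE
# Hypothetical helper: print the lines strictly between the first line
# containing START and the next line containing END, ignoring case.
between_ci() {
    awk -v from="$1" -v to="$2" '
        BEGIN { from = tolower(from); to = tolower(to) }
        { line = tolower($0) }                      # compare in lowercase
        !inside && index(line, from) { inside = 1; next }
        inside && index(line, to)    { inside = 0; next }
        inside                                      # print while inside
    ' "$3"
}
```

For example, between_ci Account Recognized Security.txt prints the lines between the two markers without the marker lines themselves.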
Use:
sed -n "/Account/I,/Recognized/Ip"
i.e. put the I flag right after each address, before the p command, instead of writing pI.
You have useless use of cat where you should've fed the file directly to sed. Below could be a way of doing it.
$ cat file.txt
Some stuff Account sllslsljjs Security.
Another stuff account name and ffss security.
$ sed -nE 's/^.*account[[:blank:]]*(.*)[[:blank:]]*security.*$/\1/pI' file.txt
sllslsljjs
name and ffss
The leading [[:blank:]]* strips the spaces before the required text; because (.*) is greedy, any spaces just before security stay in the captured text. The -E option enables the use of extended regular expressions.

How to compose custom command-line argument from file lines?

I know about the xargs utility, which allows me to convert lines into multiple arguments, like this:
echo -e "a\nb\nc\n" | xargs
Results in:
a b c
But I want to get:
a:b:c
The character : is used for an example. I want to be able to insert any separator between lines to get a single argument. How can I do it?
If you have a file with multiple lines that you want to turn into a single argument by replacing each newline with a single character, the paste command is what you need:
$ echo -en "a\nb\nc\n" | paste -s -d ":"
a:b:c
Then, your command becomes:
your_command "$(paste -s -d ":" your_file)"
EDIT:
If you want to insert more than a single character as a separator, you could use sed before paste:
your_command "$(sed -e '2,$s/^/<your_separator>/' your_file | paste -s -d '\0')"
Or use a single, more complicated sed:
your_command "$(sed -n -e '1h;2,$H;${x;s/\n/<your_separator>/gp}' your_file)"
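The same join can also be sketched with awk, which avoids having to escape the separator for sed. The name join_lines is hypothetical, and note that awk -v interprets backslash escapes in the separator value:

```shell
# join_lines SEP [FILE]
# Hypothetical helper: join all lines of FILE (or stdin) into one line,
# with SEP between them. SEP may be longer than one character.
join_lines() {
    awk -v sep="$1" '
        NR > 1 { printf "%s", sep }     # separator before every line but the first
        { printf "%s", $0 }
        END { if (NR) print "" }        # finish with a single newline
    ' "${2:--}"
}

printf 'a\nb\nc\n' | join_lines ', '
```

It can then be used as your_command "$(join_lines ':' your_file)".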
The example you gave is not working for me. You would need:
echo -e "a\nb\nc\n" | xargs
to get a b c.
Coming back to your need, you could do this:
echo "a b c" | awk -v OFS=":" '{print $1, $2, $3}'
it will change the separator from space to : or whatever you want it to be.
You can also use sed:
echo "a b c" | sed -e 's/ /:/g'
that will output a:b:c.
After all this data processing, you can use xargs to run the command you want. Just append | xargs followed by your command.
Hope it helps.
You can join the lines using xargs and then replace the spaces (' ') using sed.
echo -e "a\nb\nc"|xargs| sed -e 's/ /:/g'
will result in
a:b:c
Obviously, you can pass this output as an argument to another command using another xargs:
echo -e "a\nb\nc"|xargs| sed -e 's/ /:/g'|xargs

Resources