Replace special character with sed - shell

I'm trying to replace a special character with sed: every Þ should be replaced with ;
The lines of the file look like this, for example:
0370ÞA020Þ4000011600ÞRED USADOÞ0,00Þ20190414
0370ÞA020Þ4000011601ÞRED USADOÞ0,00Þ20190414
0370ÞA020Þ4000011602ÞRED USADOÞ0,00Þ20190414
Thanks!
Edit
It worked; solved.
Thanks!!!

Try this; a simple substitution works for me:
sed 's/Þ/;/g'

That's the job tr was created to do but look at these results:
$ tr 'Þ' ';' < file
0370;;A020;;4000011600;;RED USADO;;0,00;;20190414
0370;;A020;;4000011601;;RED USADO;;0,00;;20190414
0370;;A020;;4000011602;;RED USADO;;0,00;;20190414
$ sed 's/Þ/;/g' < file
0370;A020;4000011600;RED USADO;0,00;20190414
0370;A020;4000011601;RED USADO;0,00;20190414
0370;A020;4000011602;RED USADO;0,00;20190414
tr seems to treat every Þ as 2 separate characters, most likely because Þ is encoded as 2 bytes in UTF-8 and this tr works byte by byte. sed may see it the same way, but while tr maps a set of characters to a set of characters, sed replaces a regexp match with a string, so even if it considers Þ to be 2 characters wide it still does what you want. So take this as a warning about using tr to replace non-ASCII characters. YMMV!
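To make the byte-versus-character point concrete, here is a small sketch. It assumes the file is UTF-8, where Þ is stored as the two bytes 0xC3 0x9E; the names thorn and convert_separators are only illustrative.

```shell
# Build the separator from its UTF-8 bytes (0xC3 0x9E) so the example does
# not depend on how this snippet itself is encoded.
thorn=$(printf '\303\236')

# wc -c counts bytes, not characters, so one Þ is reported as 2.
thorn_bytes=$(printf '%s' "$thorn" | wc -c | tr -d ' ')
echo "bytes in separator: $thorn_bytes"

# Replace every Þ with ; using sed. The pattern is matched as a sequence,
# so the result is the same whether sed sees 1 character or 2 bytes.
convert_separators() {
    sed "s/$thorn/;/g"
}

sample=$(printf '0370%sA020%s4000011600' "$thorn" "$thorn")
printf '%s\n' "$sample" | convert_separators
```

A byte-oriented tr given the same 2-byte set maps each byte to its own ;, which would explain the doubled separators in the output above.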

If your data is in a file named d, try GNU sed:
sed -E 'y/Þ/;/' d

Related

How to replace 2 characters if they are side by side in bash?

I'm looking to replace 2 characters in a string only if they are side by side.
Output I have now
hu1.cqf51:qu-2/2
hu2.cqf55:qe-2/2
hu2.cqf41:qe-2/2
The first line is incorrect as it has "qu" instead of "qe"
I'm looking to replace the qu with qe without replacing any other "q" or "e" in the string
Desired output
hu1.cqf51:qe-2/2
hu2.cqf55:qe-2/2
hu2.cqf41:qe-2/2
What I have tried
sed -r 's/[qu]+/qe/g'
sed -e 's//qe/g'
sed 's/\S*\(qu\)\S*//g'
Even considered just trying to delete the whole word alone if it matched with the command below, however it deleted everything with a q or u in it.
sed -e 's/[^ ]*qu[^ ]*//g'
Thank you for your help!!
Can you just try this?
sed -e 's/qu/qe/' your_file
I'd suggest
sed 's/:qu-/:qe-/'
but if this is a standard format there is probably a standard way.
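For anyone wondering why the first attempt in the question misfires: [qu]+ is a bracket expression, so it matches any run of q or u characters, not the two-letter sequence qu. A quick sketch of the difference on the sample line (the function names are made up for the example):

```shell
line='hu1.cqf51:qu-2/2'

# Bracket expression: hits the u in "hu1" and the q in "cqf51" as well.
broad_fix() {
    sed -E 's/[qu]+/qe/g'
}

# Literal text anchored by the surrounding : and -, so only the field changes.
narrow_fix() {
    sed 's/:qu-/:qe-/'
}

printf '%s\n' "$line" | broad_fix    # prints hqe1.cqef51:qe-2/2
printf '%s\n' "$line" | narrow_fix   # prints hu1.cqf51:qe-2/2
```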

How to add double quote in csv file where field contains space?

One feature of our legacy code doesn't work, and I have to work around it by redeveloping a quick-and-dirty version.
We generate a CSV file, and the legacy code produced something like this:
foo; bar;"foo bar";foobar
"bla ble"; bli;blo;"blu bly"
Each field in my CSV that contains a space must be surrounded by double quotes (").
Currently, with my quick-and-dirty script, the CSV file contains only
foo; bar;foo bar;foobar
bla ble; bli;blo;blu bly
This is not good, because my quick-and-dirty script would be a breaking change for clients :D
I am developing the script in shell (/bin/bash). I've searched around sed and awk but wasn't able to find anything that helps.
Will you? :)
Thanks!
Here is a simple awk:
$ awk 'BEGIN{FS=OFS=";"}{for(i=1;i<=NF;++i) if ($i ~ / /) $i = "\042" $i "\042"}1' file.csv
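Note that the one-liner above also quotes a field such as " bar", which only has a leading space, while the legacy sample in the question leaves it bare. A possible variant, assuming the legacy rule is "quote only when a space sits between two non-space characters" (that rule is my guess from the sample, and quote_spaced_fields is just an illustrative name):

```shell
quote_spaced_fields() {
    awk 'BEGIN { FS = OFS = ";" }
         {
             for (i = 1; i <= NF; ++i)
                 # Quote only fields with a space between two non-space chars.
                 if ($i ~ /[^ ] +[^ ]/)
                     $i = "\"" $i "\""
             print
         }'
}

printf '%s\n' 'foo; bar;foo bar;foobar' 'bla ble; bli;blo;blu bly' |
    quote_spaced_fields
```

On the two sample lines this reproduces the legacy output shown in the question.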
To quote fields that contain spaces (for example foo;foo bar -> foo;"foo bar") you can use sed:
sed 's/ *\(\w\+ \)\+\w\+/"&"/g' input.csv > output.csv
The pattern  *\(\w\+ \)\+\w\+ matches zero or more spaces, followed by one or more repetitions \+ of the group \(\w\+ \), which is a word followed by a space, followed by a final word \w\+. The replacement "&" wraps the whole match in double quotes. Note that \w and \+ are GNU sed extensions.
Using Miller (https://github.com/johnkerl/miller) and running
mlr --icsvlite --ocsv --quote-all --fs ";" cat input
you will have
"foo";"bar";"foo bar";"foobar"
"bla ble";"bli";"blo";"blu bly"
I assume it's not a problem for you to have double quotes around every field:
echo "foo; bar;foo bar;foobar" | sed s'#;#+#'g | tr '+' '\n' | \
sed s'#^#\"#'g | sed s'#$#\";#'g | tr -d '\n'
The first thing this code does is replace the semicolon delimiters with a placeholder, which can then be replaced with newlines.
From there, it's simple. I first put a double quote at the start of every line, and then a closing double quote and a semicolon at the end.
After that, I use tr to remove the newlines again, which puts all of the semicolon-delimited fields back on the same line.
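The pipeline above ends its output with an extra ; after the last field and without a final newline. If quoting every field is acceptable, a single substitution might be enough. This is only a sketch, and quote_all_fields is a name made up for the example:

```shell
# Wrap every non-empty run of non-semicolon characters in double quotes.
# Empty fields (two semicolons in a row) are left empty.
quote_all_fields() {
    sed -E 's/[^;]+/"&"/g'
}

echo 'foo; bar;foo bar;foobar' | quote_all_fields
# prints "foo";" bar";"foo bar";"foobar"
```

Leading spaces end up inside the quotes here, which may or may not be what the consumers of the file expect.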

Insert character after pattern with character exclusion using sed

I have this string of file names.
FileNames="FileName1.txtStrange-File-Name2.txt.zipAnother-FileName.txt"
What I'd like to do is separate the file names with semicolons so I can iterate over them. For the .zip extension I have a working command.
I tried the following:
FileNames="${FileNames//.zip/.zip;}"
echo "$FileNames" | sed 's|.txt[^.zip]|.txt;|g'
This works partially. It adds a semicolon after .zip as expected, but where sed matches the .txt I get this output:
FileName1.txt;trange-File-Name2.txt.zip;Another-FileName.txt
I think because of the character exclusion sed replaces the following character after the match.
I would like to have an output like this:
FileName1.txt;Strange-File-Name2.txt.zip;Another-FileName.txt
I'm not tied to sed, but using it would be fine.
There might be a better way, but you can do it with sed like this:
$ echo "FileName1.txtStrange-File-Name2.txt.zipAnother-FileName.txt" | sed 's/\(zip\|txt\)\([^.]\)/\1;\2/g'
FileName1.txt;Strange-File-Name2.txt.zip;Another-FileName.txt
Beware that [^.zip] matches 'one char that is not ., nor z, nor i nor p'. It does not match 'a word that is not .zip'
Note the less verbose solution by @sundeep:
sed -E 's/(zip|txt)([^.])/\1;\2/g'
sed -r 's/(\.[a-z]{3})(.)/\1;\2/g'
would be a more generic expression.
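Since the goal was to iterate over the names, here is a sketch that puts the capture-group version together with a loop. It assumes the names contain no semicolons, whitespace surprises or glob characters; the variable names are only for illustration.

```shell
names='FileName1.txtStrange-File-Name2.txt.zipAnother-FileName.txt'

# \2 puts back the character that [^.] consumed, so nothing is lost.
separated=$(printf '%s\n' "$names" | sed -E 's/(zip|txt)([^.])/\1;\2/g')
printf '%s\n' "$separated"

# Split on ; by changing IFS just for the loop, then restore it.
count=0
old_ifs=$IFS
IFS=';'
for name in $separated; do
    count=$((count + 1))
    printf '%d: %s\n' "$count" "$name"
done
IFS=$old_ifs
```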

Unix shell script to remove new lines preceded with specific characters

First, thanks in advance for your help.
I need to replace newlines (\n) with a space in a Unix file when they are not preceded by ';'.
For example, if a Unix file contains something like:
TestFields;TestFields2
;TestFields3;TestFields4
The output should be :
TestFields;TestFields2 ;TestFields3;TestFields4
So I am using a sed command like this:
sed ':a;N;$!ba;s/[^;]\n/ /g'
The problem is that this command also replaces the character before the \n, so my output looks like:
TestFields;TestFields ;TestFields3;TestFields4
I lose the '2' in 'TestFields2'.
Does anyone have an idea how to keep that character but still replace the \n?
Capture the matched character and reuse it in the replacement:
$ sed -r ':a;N;$!ba;s/([^;])\n/\1 /g' file
TestFields;TestFields2 ;TestFields3;TestFields4
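The g flag is needed only when more than one newline in the file has to be replaced; for this two-line sample it makes no difference.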
This might work for you (GNU sed):
sed ':a;N;/;\n/!s/\n/ /;ta;P;D' file
This is an alternative to slurping the whole file into memory, and it follows the question as worded: if the character preceding the newline is a ;, do nothing; otherwise replace the newline with a space.
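If the label-and-branch syntax of the sed one-liners is a concern (it is written for GNU sed), the same rule can be sketched in awk instead. This is a different technique from the answers above, not a drop-in copy of them, and join_lines is an illustrative name. It holds back one line at a time, so the file is never loaded whole.

```shell
join_lines() {
    awk '
        # Emit the previous line, followed by a newline if it ended in ;
        # and by a space otherwise.
        NR > 1 { printf "%s%s", prev, (prev ~ /;$/ ? "\n" : " ") }
        { prev = $0 }
        # The last line always keeps its newline.
        END { if (NR) print prev }
    '
}

printf 'TestFields;TestFields2\n;TestFields3;TestFields4\n' | join_lines
# prints TestFields;TestFields2 ;TestFields3;TestFields4
```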

Trying to remove non-printable characters (junk values) from a UNIX file

I am trying to remove non-printable characters (e.g. ^@) from the records in my file. The file holds too many records for reading it with cat in a loop to be an option, as the loop takes too much time.
I tried using
sed -i 's/[^@a-zA-Z 0-9`~!@#$%^&*()_+\[\]\\{}|;'\'':",.\/<>?]//g' FILENAME
but the ^@ characters are still not removed.
I also tried using
awk '{ sub("[^a-zA-Z0-9\"!@#$%^&*|_\[](){}", ""); print }' FILENAME > NEWFILE
but it did not help either.
Can anybody suggest some alternative way to remove non-printable characters?
I tried tr -cd, but it also removes accented characters, which are required in the file.
Perhaps you could go with the complement of [:print:], which contains all printable characters. Newline is not part of [:print:], so it is added to the set here to keep the line breaks:
tr -cd '[:print:]\n' < file > newfile
If your version of tr doesn't support multi-byte characters (it seems that many don't), this works for me with GNU sed (with UTF-8 locale settings):
sed 's/[^[:print:]]//g' file
Remove all control characters first:
tr -dc '\007-\011\012-\015\040-\376' < file > newfile
Then try your string:
sed -i 's/[^@a-zA-Z 0-9`~!@#$%^&*()_+\[\]\\{}|;'\'':",.\/<>?]//g' newfile
I believe that the ^@ you see is in fact a zero value \0.
The tr filter from above will remove those as well.
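If the junk really is NUL plus other ASCII control bytes, a narrower tr set might be worth trying. The sketch below assumes the file is UTF-8 and that tab and newline should survive; it deletes only bytes below 0x20 (except those two) and DEL, so the bytes of accented characters, which are all 0x80 or higher, are left alone. The function name is made up, and note that carriage returns are removed too.

```shell
# Delete ASCII control bytes, keeping tab (\011) and newline (\012).
strip_controls() {
    tr -d '\000-\010\013-\037\177'
}

# Sample record: "café", a NUL, a BEL, then " ok<TAB>x" and a CRLF ending.
printf 'caf\303\251\000\007 ok\tx\r\n' | strip_controls
```

Because the set is given as explicit byte ranges rather than a character class, the result should not depend on whether tr understands multi-byte characters.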
strings -1 file... > outputfile
seems to work. The strings program will take all printable characters, in this case of length 1 (the -1 argument) and print them. It effectively is removing all the non-printable characters.
"man strings" will provide the documentation.
Was searching for this for a while & found a rather simple solution:
The package ansifilter does exactly this. All you need to do is just pipe the output through it.
On Mac:
brew install ansifilter
Then:
cat file.txt | ansifilter
