Sed replace character tab delimited csv - shell

I am very new to shell scripting and am having an issue replacing a character in a tab-delimited CSV file.
I want to convert the CSV to a text file and change the delimiter from tab to ~. I tried the code below, but the delimiter comes out as something like a Japanese character followed by " instead of ~.
sed 's/\t/\"\~\"/g' test.csv > test.txt
I would appreciate your help. Thanks in advance.

If it is a one-character-for-one-character replacement, use tr:
tr '\t' '~' < test.csv > test.txt
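As a quick check of the tr approach, here is a minimal sketch run on an inline sample rather than a real file (the sample data and variable names are made up for illustration). It also shows one way to get the three-character "~" separator from the question with sed, passing a literal tab via printf since not every sed understands \t.

```shell
# Hypothetical sample line standing in for one row of test.csv
sample=$(printf 'a\tb\tc')

# One character for one character: tr is enough
with_tr=$(printf '%s\n' "$sample" | tr '\t' '~')
printf '%s\n' "$with_tr"

# Multi-character replacement: sed with a literal tab in the pattern
tab=$(printf '\t')
with_sed=$(printf '%s\n' "$sample" | sed "s/$tab/\"~\"/g")
printf '%s\n' "$with_sed"
```

This assumes the input is already in a single-byte-compatible encoding such as UTF-8; a UTF-16 file would need converting first.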

The odd characters suggest that the file is encoded as UTF-16. Use iconv to convert it to UTF-8 first:
iconv -f utf-16 -t utf-8 < test.csv | sed 's/\t/"~"/g' > test.txt
The " and ~ characters do not need to be escaped.

Why is awk overwriting the earlier-printed columns? [duplicate]

I have a text file that shows ^M characters when opened with the less command in the Mac terminal.
I tried the commands below to remove the ^M characters.
awk '{ gsub("\n", "\r"); print $0;}' input > output
cat input | tr '\n' '\r' > output
But neither of them worked. Could someone help me fix this using some Linux commands?
You can use sed:
sed 's/^M//' filename > newfilename
If you wish to use awk then do:
awk '{sub(/^M/,"")}1' filename > newfilename
To enter ^M, type CTRL-V, then CTRL-M. That is, hold down the CTRL key then press V and M in succession.
Update
As suggested by @glenn jackman in the comments, it is easier to use \r than to type ^M.
col < input > output
Or:
vim "+set ff=unix" "+saveas output" "+q" input
Use the octal value of the carriage return (015) from http://www.asciitable.com/:
echo "1'2^M34" | awk 'gsub(/\015/,":")'
1'2:34
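For comparison, here is a small sketch of removing the carriage returns without typing a literal ^M at all. It is only an illustration on inline sample data; the variable names are hypothetical. The sed variant removes a CR at the end of each line, while tr deletes every CR in the stream.

```shell
# A literal carriage return, so nothing has to be entered with CTRL-V
cr=$(printf '\r')

# Strip a trailing CR from every line with sed
cleaned_sed=$(printf 'one\r\ntwo\r\n' | sed "s/$cr\$//")

# Or delete every CR in the stream with tr
cleaned_tr=$(printf 'one\r\ntwo\r\n' | tr -d '\r')

printf '%s\n' "$cleaned_sed" "$cleaned_tr"
```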

convert a fully quoted csv file to tsv format

I have a file like:
"a","b","c"...
And I want to change the delimiter from comma to tab.
I tried:
sed -e 's/","/"\t"/g' < input_file > output_file
Yet it looks like the only effect is to change the comma to the literal character t:
"a"t"b"t"c"...
Is anything wrong with my sed expression?
This is a problem with non-GNU versions of sed, which do not interpret \t as a tab. If possible, use a space as the delimiter, paste a literal tab into the sed expression, or use "$(printf '\t')" in place of \t.
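The printf workaround can be sketched as follows. This is an illustrative example on an inline sample line, with hypothetical variable names; it should behave the same with GNU and non-GNU sed because the tab reaches sed as a literal character.

```shell
# Store a literal tab once; printf interprets \t portably
tab=$(printf '\t')

# Replace "," with "<tab>" between the quoted fields
tsv=$(printf '"a","b","c"\n' | sed "s/\",\"/\"$tab\"/g")
printf '%s\n' "$tsv"
```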

Need to replace two things in a CSV file: Y with 'Y, and the newline character with ','

I need to replace two things in a CSV file using Unix shell scripting: Y with 'Y, and the newline character with ','.
My CSV (the values are listed vertically, one per line, and all start with Y):
YC1234
YC5678
The expected output is needed on a single line, like this:
'YC1234','YC5678'
Kindly help, as I am new to shell scripting.
I tried sed, but it is difficult to remove the newline:
cat master_upd.csv | sed -e 's//'\'','\''/g' | sed 's/\n/ /'
tr '\n' ',' < master.csv
The first command 'cat' is probably unnecessary.
Here is a thread on replacing newlines with something else using sed: How can I replace a newline (\n) using sed?
sed -e 's/^.*$/'\''&'\''/' master_upd.csv | sed ':a;N;$!ba;s/\n/,/g'
The first sed puts in the single quotes, the second one replaces newlines with commas. Notice that I removed the cat and just told sed to open your file directly.
awk -v q="'" 'NR>1{printf ","}{printf "%s",q$0q}END{print ""}' file
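Another possible combination, not taken from the answers above, is to let sed add the quotes and paste join the lines. A minimal sketch on inline sample data (the variable name is hypothetical):

```shell
# Wrap each line in single quotes, then join all lines with commas
joined=$(printf 'YC1234\nYC5678\n' | sed "s/.*/'&'/" | paste -s -d, -)
printf '%s\n' "$joined"
```

Because paste only inserts the delimiter between lines, there is no trailing comma to clean up.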

Remove Unicode characters from textfiles - sed , other Bash/shell methods

How do I remove Unicode characters from a bunch of text files in the terminal?
I've tried this, but it didn't work:
sed 'g/\u'U+200E'//' -i *.txt
I need to remove these Unicode characters from the text files:
U+0091 - sort of weird "control" space
U+0092 - same sort of weird "control" space
U+00A0 - non-breaking space
U+200E - left to right mark
Clear all non-ASCII characters of file.txt:
$ iconv -c -f utf-8 -t ascii file.txt
$ strings file.txt
Options:
-c # discard unconvertible characters
-f # from ENCODING
-t # to ENCODING
If you want to remove only particular characters and you have Python (the command below uses Python 2 syntax), you can:
CHARS=$(python -c 'print u"\u0091\u0092\u00a0\u200E".encode("utf8")')
sed 's/['"$CHARS"']//g' < /tmp/utf8_input.txt > /tmp/ascii_output.txt
For UTF-8 encoding of Unicode, you can use this regular expression for sed:
sed 's/\xc2\x91\|\xc2\x92\|\xc2\xa0\|\xe2\x80\x8e//g'
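The \x escapes in that expression are not understood by every sed. As a hedged alternative, the same UTF-8 byte sequences can be produced with printf octal escapes and passed to sed as literal bytes. The sketch below assumes UTF-8 input and uses an inline sample string; the variable names are made up for illustration.

```shell
# UTF-8 byte sequences of the characters to remove, written in octal
lrm=$(printf '\342\200\216')   # U+200E left-to-right mark
nbsp=$(printf '\302\240')      # U+00A0 non-breaking space
c91=$(printf '\302\221')       # U+0091
c92=$(printf '\302\222')       # U+0092

# LC_ALL=C makes sed treat the input as plain bytes
cleaned=$(printf 'a\342\200\216b\302\240c\302\221d\302\222\n' |
  LC_ALL=C sed -e "s/$lrm//g" -e "s/$nbsp//g" -e "s/$c91//g" -e "s/$c92//g")
printf '%s\n' "$cleaned"
```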
Use iconv:
iconv -f utf8 -t ascii//TRANSLIT < /tmp/utf8_input.txt > /tmp/ascii_output.txt
This will translate characters like "Š" into "S" (most similar looking ones).
Convert Swift files from UTF-8 to ASCII:
for file in *.swift; do
iconv -f utf-8 -t ascii "$file" > "$file".tmp
mv -f "$file".tmp "$file"
done
Swift auto completion not working in Xcode 6 Beta

shell replace cr\lf by comma

I have input.txt
1
2
3
4
5
I need to get such output.txt
1,2,3,4,5
How to do it?
Try this:
tr '\n' ',' < input.txt > output.txt
With sed, you could use:
sed -e 'H;${x;s/\n/,/g;s/^,//;p;};d'
The H appends the pattern space to the hold space (saving the current line in the hold space). The ${...} surrounds actions that apply to the last line only. Those actions are: x swap hold and pattern space; s/\n/,/g substitute embedded newlines with commas; s/^,// delete the leading comma (there's a newline at the start of the hold space); and p print. The d deletes the pattern space - no printing.
You could also use, therefore:
sed -n -e 'H;${x;s/\n/,/g;s/^,//;p;}'
The -n suppresses default printing so the final d is no longer needed.
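To try the hold-space version without creating a file, something like the following should work; the input is generated inline and the variable name is hypothetical.

```shell
# Collect every line in the hold space, then join on the last line
joined=$(printf '1\n2\n3\n4\n5\n' | sed -n -e 'H;${x;s/\n/,/g;s/^,//;p;}')
printf '%s\n' "$joined"
```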
This solution assumes that the CRLF line endings are the local native line ending (so you are working on DOS) and that sed will therefore generate the local native line ending in the print operation. If you have DOS-format input but want Unix-format (LF only) output, then you have to work a bit harder - but you also need to stipulate this explicitly in the question.
It worked OK for me on MacOS X 10.6.5 with the numbers 1..5, and 1..50, and 1..5000 (23,893 characters in the single line of output); I'm not sure that I'd want to push it any harder than that.
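If the input really has DOS line endings and Unix-style output is wanted, one possible approach (a sketch, not part of the answer above) is to delete the carriage returns before joining. The sample input and variable name are assumptions for illustration.

```shell
# Drop the CRs first, then join the remaining lines with commas
joined=$(printf '1\r\n2\r\n3\r\n' | tr -d '\r' | paste -s -d, -)
printf '%s\n' "$joined"
```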
In response to @Jonathan's comment to @eumiro's answer:
tr -s '\r\n' ',' < input.txt | sed -e 's/,$/\n/' > output.txt
tr and sed used to be very good, but when it comes to file parsing and regex you can't beat perl
(Not sure why people think that sed and tr are closer to shell than perl... )
perl -pe 's/\n/,/' your_file
If you want pure shell to do it, then look at string matching:
${string/#substring/replacement}
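The ${string/pattern/replacement} form is a bash feature rather than plain POSIX sh. As an assumption-labeled alternative, a pure-shell join can also be sketched with a read loop, which avoids external commands altogether. The here-document stands in for input.txt and the variable names are hypothetical.

```shell
joined=
while IFS= read -r line; do
  # Add a comma only when something has already been collected
  joined=${joined:+$joined,}$line
done <<'EOF'
1
2
3
4
5
EOF
printf '%s\n' "$joined"
```

With a real file, the here-document would be replaced by a redirection such as `done < input.txt`.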
Use the paste command. Here it is using a pipe:
printf '1\n2\n3\n4\n5\n' | paste -s -d, -
Here it is using a file:
printf '1\n2\n3\n4\n5\n' > /tmp/input.txt
paste -s -d, /tmp/input.txt
Per the man page, -s concatenates all lines into one and -d sets the delimiter character.
Awk versions:
awk '{printf("%s,",$0)}' input.txt
awk 'BEGIN{ORS=","} {print $0}' input.txt
Output - 1,2,3,4,5,
Since you asked for 1,2,3,4,5 rather than 1,2,3,4,5, (note the comma after the 5; most of the solutions above also leave this trailing comma), here are two more versions with awk (with wc and sed) that get rid of the last comma:
i='input.txt'; awk -v c="$(wc -l < "$i")" '{printf("%s",$0);if(NR<c){printf(",")}}' "$i"
awk '{printf("%s,",$0)}' input.txt | sed 's/,\s*$//'
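A single-pass variant is also possible: print the comma before every line except the first, so nothing has to be counted or trimmed afterwards. This is a minimal sketch on inline input, with a hypothetical variable name.

```shell
# Comma goes before each line after the first; END adds the final newline
joined=$(printf '1\n2\n3\n4\n5\n' |
  awk 'NR>1{printf ","} {printf "%s",$0} END{print ""}')
printf '%s\n' "$joined"
```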
printf "1\n2\n3" | tr '\n' ','
If you want to output that to a file, just do:
printf "1\n2\n3" | tr '\n' ',' > myFile
If you have the content in a file, do:
tr '\n' ',' < myInput.txt > myOutput.txt
python version:
python -c 'import sys; print(",".join(sys.stdin.read().splitlines()))'
Doesn't have the trailing comma problem (because join works that way), and splitlines splits data on native line endings (and removes them).
cat input.txt | sed -e 's|$|,|' | xargs -i echo "{}"
