convert a fully quoted csv file to tsv format - bash

I have a file like:
"a","b","c"...
And want to convert the comma to tab for the delimiter.
I tried:
sed -e 's/","/"\t"/g' < input_file > output_file
Yet it looks like the only effect is to change the comma to the literal character t:
"a"t"b"t"c"...
Anything wrong with my sed expression?

This is a problem with non-GNU versions of sed, which do not interpret \t as a tab. If possible, use a space as the delimiter, type a literal tab character into the sed expression, or use $(printf '\t') instead of \t.
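A minimal sketch of the printf approach, which should behave the same with GNU and non-GNU sed. The function name csv_to_tsv and the variable tab are only illustrative; they do not come from the question.

```shell
# Store a real tab character; command substitution keeps it intact.
tab=$(printf '\t')

# csv_to_tsv is a hypothetical helper name: it replaces every "," between
# quoted fields with "<tab>" and reads from standard input.
csv_to_tsv() {
    sed "s/\",\"/\"${tab}\"/g"
}

printf '"a","b","c"\n' | csv_to_tsv
```

For a real file this would be called as csv_to_tsv < input_file > output_file.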

Related

How to convert separators using regex in bash

How do I modify my bash file to achieve the expected result shown below ?
#!/bin/bash
filename=$1
var="$(<$filename)" | tr -d '\n'
sed -i 's/;/,/g' $var
Convert this input file
a,b;c^d"e}
f;g,h!;i8j-
To this output file
a,b,c,d,e,f,g,h,i,j
How to convert separators using regex in bash
You would, quite literally, do exactly that: convert any of the separators using a regex. This consists of the following steps:
most importantly, figuring out the exact definition of what constitutes a "separator"
writing a regex for it
writing an algorithm for it
running and testing the code
For example, assuming a separator is a sequence of any of the \n,;^"}!8- characters, you could do:
sed -zi 's/[\n,;^"}!8-]\+/,/g; s/,$/\n/' input_file
Or, when -z is not available in your sed, do something similar by first converting newlines with tr '\n' ',' and then passing the result of tr to sed. The second expression replaces the trailing , with a newline at the end of the output.
Additionally, in your code:
var is unset on the sed line, because the parts of a | pipeline run in subshells.
var=$(<$filename) contains the contents of the file, whereas sed wants a filename as its argument, not file contents.
var=.... | ... is piping the result of the assignment to tr. The output of an assignment is empty, so that line produces nothing, and its output is unused.
Remember to check bash scripts with shellcheck.
For a somewhat portable solution, maybe try
tr -cs A-Za-z , <input_file | sed '$s/,$/\n/' >output_file
The use of \n to force a final newline is still not entirely reliable; there are some sed versions which interpret the sequence as a literal n.
You'd move output_file back on top of input_file after this command if you want to replace the original.
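Putting the remarks on the original script together, a corrected version might look like the sketch below. It assumes, as the tr command does, that every run of non-letter characters counts as one separator. The function name convert_separators is made up for illustration.

```shell
#!/bin/bash
# convert_separators FILE: hypothetical helper that prints FILE with every
# run of non-letters collapsed into a single comma.
convert_separators() {
    # tr -c complements the set (everything except letters) and -s squeezes
    # repeated output commas into one. sed drops the trailing comma.
    # The command substitution strips any trailing newline and printf adds
    # exactly one back, which avoids relying on \n inside sed.
    printf '%s\n' "$(tr -cs 'A-Za-z' ',' < "$1" | sed 's/,$//')"
}

# Sample input taken from the question.
tmp=$(mktemp)
printf '%s\n' 'a,b;c^d"e}' 'f;g,h!;i8j-' > "$tmp"
convert_separators "$tmp"    # prints: a,b,c,d,e,f,g,h,i,j
rm -f "$tmp"
```

Input that starts with a non-letter would produce a leading comma; this sketch does not handle that case.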

sed or awk in bash to format input list > output list and escape special characters

Hope someone can help me with a sed / awk pipe which I can use in bash to take an input list like this
Battleztar Bazinga
com.plumanalytics
ECCP/1.0
Go!Zilla
GT::WWW
MegaIndex.ru
MS Web Services Client
POE-Component-Client-HTTP
and give an output list like this
Battleztar\ Bazinga
com\.plumanalytics
ECCP\/1\.0
Go\!Zilla
GT\:\:WWW
MegaIndex\.ru
MS\ Web\ Services\ Client
POE\-Component\-Client\-HTTP
Basically escaping all special characters and spaces with backslash \
You can use this sed:
sed 's/[^[:alnum:]_]/\\&/g' file
Battleztar\ Bazinga
com\.plumanalytics
ECCP\/1\.0
Go\!Zilla
GT\:\:WWW
MegaIndex\.ru
MS\ Web\ Services\ Client
POE\-Component\-Client\-HTTP
The negated bracket expression [^[:alnum:]_] matches any character that is neither alphanumeric nor an underscore.
In the replacement, \\& places a \ before the matched text.
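A quick way to check the expression on a single string before running it on a whole file. The wrapper name escape_specials and the test string are only illustrations.

```shell
# escape_specials is a hypothetical wrapper around the sed command above;
# it reads from standard input.
escape_specials() {
    sed 's/[^[:alnum:]_]/\\&/g'
}

# The spaces are escaped; the underscore is left alone.
printf '%s\n' 'MS Web_Services Client' | escape_specials
# prints: MS\ Web_Services\ Client
```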
Using awk
awk 'gsub(/[^[:alnum:]]/,"\\\\&")+1' infile
If you have gawk, then you may use
gawk -i inplace 'gsub(/[^[:alnum:]]/,"\\\\&")+1' infile
which will modify the original file.
To keep a backup of the original before it is modified:
gawk -v INPLACE_SUFFIX=.bak -i inplace 'gsub(/[^[:alnum:]]/,"\\\\&")+1' infile

Bash Concatenate and Replace carriage returns with newline

I need to convert a series of text files that are formatted with line breaks to single lines separated by newlines (\n). For example:
This is an example text file
where the contents are separated
by line breaks
What I want this to look like is:
This is an example text file\nwhere the contents are separated\nby line breaks\n
I'm open to using awk, sed, or any builtin POSIX commands.
Please try this solution:
awk 'BEGIN{RS="\n";ORS="\\n"}1' file.txt
RS="\n" makes the newline the record separator, and ORS="\\n" sets the output record separator to the two characters \n; the doubled backslash is how awk writes a literal backslash in a string. The pattern 1 is always true, so the default action (print the whole record) runs for every line.
If you have any problem let me know, I don't have an awk available to try it.
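Since that answer was posted untested, here is a small check of the same awk program on the sample text from the question. The function name join_with_literal_n is hypothetical.

```shell
join_with_literal_n() {
    # ORS is set to the two characters backslash and n, not a real newline.
    awk 'BEGIN { RS = "\n"; ORS = "\\n" } 1' "$@"
}

printf '%s\n' 'This is an example text file' \
              'where the contents are separated' \
              'by line breaks' | join_with_literal_n
echo    # the awk output has no real trailing newline, so add one for display
```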
It's not clear whether by "line break" you mean Carriage Return, Line Feed, Newline, or something else. Nor is it clear whether you want to replace newlines with the string \n, just strip Carriage Returns before newlines, or something else. If it's the latter, then all you need is:
dos2unix file
If you don't have dos2unix you can do it with any awk:
$ printf 'foo\r\nbar\r\n' | cat -v
foo^M
bar^M
$ printf 'foo\r\nbar\r\n' | awk '{sub(/\r$/,"")}1' | cat -v
foo
bar
You can't do it robustly with tr since it can't tell when a \r is at the end of a line or not, and you can't do it portably with sed.
This might work for you (GNU sed):
sed '1h;1!H;$!d;x;s/\n/\\n/g' file
Slurp the file into memory and quote newlines.
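The same script, spread over several lines with a comment per command, as a sketch of how the slurp works. The function name quote_newlines is made up for illustration.

```shell
quote_newlines() {
    sed '
        # Line 1: copy the pattern space into the hold space.
        1h
        # Every other line: append it to the hold space after a newline.
        1!H
        # Not the last line yet: discard the pattern space and read on.
        $!d
        # Last line: swap, so the pattern space holds the whole file.
        x
        # Replace each embedded newline with backslash + n.
        s/\n/\\n/g
    ' "$@"
}

printf 'foo\nbar\nbaz\n' | quote_newlines
# prints: foo\nbar\nbaz
```

Only the newlines between lines are quoted; the final newline stays a real newline, so the output does not end in a literal \n.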

Bash: Filtering records in a file based on multi column delimiter

Need help in Bash to filter records based on a multicolumn delimiter.
Delimiter is |^^|
Sample record
xyz#ATT.NET|^^|xyz|^^|307
Awk runs fine when used with a single-character delimiter but not with a multi-character one.
awk -F"|^^|" "NF !=3 {print}" file.txt
Any suggestions?
The issue is that every character in your delimiter is a regexp metacharacter so you need to escape them when appropriate so awk knows you want them treated literally. This might be overkill:
awk -F'\\|\\^\\^\\|' 'NF!=3' file.txt
but I can't test it since you only provided one line of input, not the selection of lines some of which do/don't match that'd be required to test the script.
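To make the command testable, here is a sketch with a few records. Only the first line comes from the question; the other two are invented so that they have two and four fields. The function name bad_records is hypothetical.

```shell
bad_records() {
    # Each \\ reaches the regex engine as a single backslash, so the
    # separator regex is \|\^\^\| (a literal |^^|).
    awk -F'\\|\\^\\^\\|' 'NF != 3' "$@"
}

printf '%s\n' \
    'xyz#ATT.NET|^^|xyz|^^|307' \
    'abc|^^|two-fields' \
    'a|^^|b|^^|c|^^|d' | bad_records
# prints the second and third records, which do not have exactly 3 fields
```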
awk -F "<regex>" ...
The argument to -F is not a literal multi-character delimiter; it is a regular expression.
Simple regexes that match a single character are what you get used to, but they are not all there is.
One way is to escape all the regex metacharacters, as in Ed Morton's answer.
Alternatively, you can replace every |^^| with a single character that never appears in your file content, say a comma:
sed 's/|^^|/,/g' file.txt
xyz#ATT.NET,xyz,307
The command would be
sed 's/|^^|/,/g' file.txt | awk -F, 'NF != 3'

How to replace newlines with tab characters?

I have pattern like below
hi
hello
hallo
greetings
salutations
no more hello for you
I am trying to replace all newlines with tab using the following command
sed -e "s_/\n_/\t_g"
but it's not working.
Could anybody please help? I'm looking for a solution in sed/awk.
tr is better here, I think:
tr "\n" "\t" < newlines
As Nifle suggested in a comment, newlines here is the name of the file holding the original text.
Because sed is so line-oriented, it's more complicated to use in a case like this.
I'm not sure what output you want, but this may be it:
# awk -vRS="\n" -vORS="\t" '1' file
hi hello hallo greetings salutations no more hello for you
sed '$!{:a;N;s/\n/\t/;ta}' file
You can't replace newlines on a line-by-line basis with sed. You have to accumulate lines and replace the newlines between them.
text abc\n <- can't replace this one
text abc\ntext def\n <- you can replace the one after "abc" but not the one at the end
This sed script accumulates all the lines and eliminates all the newlines but the last:
sed -n '1{x;d};${H;x;s/\n/\t/g;p};{H}'
By the way, your sed script sed -e "s_/\n_/\t_g" is trying to say "replace all slashes followed by newlines with slashes followed by tabs". The underscores are taking on the role of delimiters for the s command so that slashes can be more easily used as characters for searching and replacing.
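The difference between per-line substitution and accumulating lines first can be seen in a short experiment. This is a sketch: the names per_line and joined are illustrative, and the tab is injected through a shell variable because not every sed understands \t.

```shell
tab=$(printf '\t')

# Per line: sed strips the newline before running the script, so the
# pattern never matches and the input passes through unchanged.
per_line() {
    sed "s/\n/${tab}/g"
}

# Joined: keep appending the next line (N) until the last one has been
# read, then replace every embedded newline in one go.
joined() {
    sed -e ':a' -e '$!N' -e '$!ba' -e "s/\n/${tab}/g"
}

printf 'hi\nhello\nhallo\n' | per_line    # still three separate lines
printf 'hi\nhello\nhallo\n' | joined      # one line, tab-separated
```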
paste -s
-s Concatenate all of the lines of each separate input file in
command line order. The newline character of every line
except the last line in each input file is replaced with the
tab character, unless otherwise specified by the -d option.
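A short usage sketch for paste -s on part of the sample from the question. The - operand tells paste to read standard input; the function name join_with_tabs is hypothetical.

```shell
join_with_tabs() {
    # Use the file given as the first argument, or standard input ("-").
    paste -s "${1:--}"
}

printf '%s\n' hi hello hallo greetings | join_with_tabs
```

The output is a single line with the four words separated by tabs and terminated by one newline.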
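You are almost there with your sed script as far as the syntax goes. With / as the delimiter it would be:
sed -e "s/\n/\t/g"
The \ is enough for escaping; you don't need to add _.
The / before g at the end tells sed where the replacement ends. Note, however, that sed strips the trailing newline from each line before running the script, so this substitution only matches once lines have been joined in the pattern space (for example with N).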
