How to append a character after N patterns at each line in bash? - bash

How can I insert a ',' after the 2nd ',' on each line?
I want the following:
input.txt
a,b,c,d,e
e,f,g,
h,,i
output.txt
a,b,,c,d,e
e,f,,g,
h,,,i
Thanks in advance

input
$ cat input
a,b,c,d,e
e,f,g,
h,,i
using sed like:
$ N=2
$ cat input | sed "s/,/&,/${N}"
a,b,,c,d,e
e,f,,g,
h,,,i
$ N=3
$ cat input | sed "s/,/&,/${N}"
a,b,c,,d,e
e,f,g,,
h,,i
You can change N as needed.
s/pattern/replacement/flags
Substitute the replacement string for the pattern.
The flags of the substitute command are zero or more of the following:
N Make the substitution only for the N'th occurrence
g Make the substitution for all occurrences
In s/,/&,/${N}, sed finds the N'th comma and replaces it with two commas (an ampersand (&) in the replacement stands for the text the pattern matched). ${N} is just a shell variable, which is why the script is in double quotes.
BTW, because the script is in double quotes, you need to escape the double quote character if you want to insert ,""
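To avoid the quoting problem mentioned above, the sed call can be wrapped in a small function. This is only a sketch: the name insert_after_nth and its argument order are made up for illustration, and it assumes the inserted text contains no newline.

```shell
# insert_after_nth N TEXT
# Insert TEXT after the N-th comma of every line read from stdin.
# (Hypothetical helper, not part of the original answer.)
insert_after_nth() {
    n=$1
    text=$2
    # Refuse anything that is not a positive integer before it reaches sed.
    case $n in
        ''|*[!0-9]*|0)
            echo "insert_after_nth: N must be a positive integer" >&2
            return 1 ;;
    esac
    # Escape the characters that are special in a sed replacement:
    # backslash, ampersand and the / delimiter.
    escaped=$(printf '%s' "$text" | sed 's|[\\&/]|\\&|g')
    sed "s/,/&${escaped}/${n}"
}
```

With this, printf 'a,b,c\n' | insert_after_nth 2 '""' prints a,b,""c without any manual escaping of the quotes.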

awk to the rescue!
$ awk -F, -v OFS=, '{$3=OFS $3}1' file
a,b,,c,d,e
e,f,,g,
h,,,i
The text after the second , is the third field, so prefix the third field with , and print.
Or, making the column number a parameter and writing delimiter once.
$ awk -F, -v c=3 'BEGIN{OFS=FS} {$c=OFS $c}1' file
This can be read as "insert a new column at position 3". Note that it also works for a column that does not exist yet, such as the 6th, which would be hard to replicate with sed.
$ awk -F, -v c=6 'BEGIN{OFS=FS} {$c=OFS $c}1' file
a,b,c,d,e,,
e,f,g,,,,
h,,i,,,,

Using sed:
sed -E 's/^([^,]*,[^,]*,)(.*)/\1,\2/' file.txt
Example:
% cat file.txt
a,b,c,d,e
e,f,g,
h,,i
% sed -E 's/^([^,]*,[^,]*,)(.*)/\1,\2/' file.txt
a,b,,c,d,e
e,f,,g,
h,,,i

You can use sed like this:
sed 's/^[^,]*,[^,]*/&,/' file
a,b,,c,d,e
e,f,,g,
h,,,i

Related

Writing the output of a command to specific columns of a csv file, unix

I wanted to write the output of command to specific columns (3rd and 5th) of the csv file.
#!/bin/bash
echo -e "Value,1\nCount,1" >> file.csv
echo "Header1,Header2,Path,Header4,Value,Header6" >> file.csv
sed 'y/ /,/' input.csv >> file.csv
input.csv in the above snippet will look something like this
1234567890 /training/folder
0325435287 /training/newfolder
Current output of file.csv
Value,1
Count,1
Header1,Header2,Path,Header4,Value,Header6
1234567890,/training/folder
0325435287,/training/newfolder
Expected Output of file.csv
Value,1
Count,1
Header1,Header2,Path,Header4,Value,Header6
,,/training/folder,,1234567890,
,,/training/newfolder,,0325435287,
All the operations can be done in a single awk:
awk -v OFS=, -v pre="Value,1\nCount,1" -v hdr="Header1,Header2,Path,Header4,Value,Header6" '
BEGIN {print pre; print hdr}
{print "", "", $2, "", $1, ""}
' input.csv
Value,1
Count,1
Header1,Header2,Path,Header4,Value,Header6
,,/training/folder,,1234567890,
,,/training/newfolder,,0325435287,
With sed you could try the following, which uses back-references.
sed -E 's/(^[^ ]*) +(.*$)/,,\2,,\1,/' Input_file
Explanation: the -E option enables ERE (extended regular expressions), and the s command performs the substitution. The pattern creates two capture groups (matched text that sed remembers for use in the replacement): the first holds everything up to the first space, the second holds the rest of the line. The replacement rewrites the whole line as 2 commas, the 2nd group \2, 2 commas, the 1st group \1, and a trailing comma.
You can use awk instead of sed
awk '{print ",," $2 ",," $1 ","}' input.csv >> file.csv
awk reads its input line by line and splits each line into whitespace-separated fields ($1 and $2 in your case). The print statement concatenates the literal commas with $2 (the path) and $1 (the value), so they land in the 3rd and 5th columns.
You can trivially add empty columns as part of your sed script.
sed 'y/ /,/;s/,/,,/;s/^/,,/;s/$/,/' input.csv >> file.csv
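This replaces the first comma with two, then adds two up front and one at the end. Note that it keeps the value before the path (,,1234567890,,/training/folder,); it does not swap them as the expected output does.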
Your expected output does not look like valid CSV, though. This is also brittle in that it will fail for any file names which contain a space or a comma.
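As a partial answer to the brittleness noted above, here is a sketch that splits each line only at the first run of spaces, so a path containing spaces stays in one column. The function name to_columns is made up, and it assumes lines do not start with whitespace; a comma inside a path would still break the CSV.

```shell
# to_columns: read "VALUE PATH" lines from stdin and print
# ,,PATH,,VALUE, so PATH lands in column 3 and VALUE in column 5.
# (Hypothetical helper; assumes no leading whitespace on a line.)
to_columns() {
    awk -v OFS=, '{
        value = $1
        path = $0
        sub(/^[^ ]+ +/, "", path)   # drop the value and the spaces after it
        print "", "", path, "", value, ""
    }'
}
```

Usage would be to_columns < input.csv >> file.csv after the header lines have been written.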

Repeatedly replace a delimiter at a given count (4) with another character

Given this line:
12,34,56,47,56,34,56,78,90,12,12,34,45
If the count of commas (,) is greater than four, replace the 4th comma with ||.
If the count is less than or equal to 4, there is no need to replace the comma.
I am able to find the count by the following awk:
awk -F\, '{print NF-1}' text.txt
Then I used an if condition to check whether the result is greater than 4, but I am unable to replace the 4th comma with ||.
How can I find the count of the delimiter in a line and replace the one at a particular position with another character?
Update:
I want to replace every 4th comma with ||. Sorry for the confusion.
Expected output:
12,34,56,47||56,34,56,78||90,12,12,34||45
With GNU awk for gensub():
$ echo '12,34,56,47,56,34' | awk -F, 'NF>5{$0=gensub(/,/,"||",4)}1'
12,34,56,47||56,34
$ echo '12,34,56,47,56' | awk -F, 'NF>5{$0=gensub(/,/,"||",4)}1'
12,34,56,47,56
$ echo 12,34,56,47,56,34,56,78,90,12,12,34,45 | sed 's/,/||/4'
12,34,56,47||56,34,56,78,90,12,12,34,45
$ echo 12,34,56,47 | sed 's/,/||/4'
12,34,56,47
Should work with any POSIX sed
Update:
For the updated question you can use
$ echo 12,34,56,47,56,34,56,78,90,12,12,34,45 | sed -e 's/\(\([^,]*,\)\{3\}[^,]*\),/\1||/g'
12,34,56,47||56,34,56,78||90,12,12,34||45
Unfortunately, POSIX sed's s command can take either a number or g as a flag, but not both. GNU sed allows the combination, but it does not do what we want in this case. So you have to spell it out in the regular expression.
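The spelled-out expression generalizes to any count by computing the interval from a variable. A sketch, with a made-up function name, assuming N is a positive integer and the replacement contains no /, & or backslash:

```shell
# replace_every_nth N REPLACEMENT
# Replace every N-th comma on each stdin line with REPLACEMENT.
# (Hypothetical helper; REPLACEMENT is inserted into the sed script
# unescaped, so it must not contain /, & or a backslash.)
replace_every_nth() {
    n=$1
    rep=$2
    # N-1 "field," groups plus one more field, then the comma to replace.
    sed "s/\(\([^,]*,\)\{$((n - 1))\}[^,]*\),/\1${rep}/g"
}
```

For example, echo 12,34,56,47,56,34,56,78,90,12,12,34,45 | replace_every_nth 4 '||' gives 12,34,56,47||56,34,56,78||90,12,12,34||45, and a line with fewer than four commas is left alone.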
Using awk you can do:
s='12,34,56,47,56,34,56,78,90,12,12,34,45'
awk -F, '{for (i=1; i<NF; i++) printf "%s%s", $i, (i%4?FS:"||"); print $i}' <<< "$s"
12,34,56,47||56,34,56,78||90,12,12,34||45
"if the count is greater than four i want to replace 4th comma(,) with ||"
give this line a try (gnu sed):
sed -r '/([^,]*,){4}.*,/s/,/||/4' file
test:
kent$ echo ",,,,,"|sed -r '/([^,]*,){4}.*,/s/,/||/4'
,,,||,
kent$ echo ",,,,"|sed -r '/([^,]*,){4}.*,/s/,/||/4'
,,,,
kent$ echo ",,,"|sed -r '/([^,]*,){4}.*,/s/,/||/4'
,,,
with awk (k is reset for every line, and lines with four or fewer commas are printed unchanged)
awk -F, 'NF-1>4{k=""; for(i=1;i<NF;i++) k=k $i (i==4?"||":","); $0=k $NF} 1' filename

Concatenating characters on each field of CSV file

I am dealing with a CSV file which has the following form:
Dates;A;B;C;D;E
"1999-01-04";1391.12;3034.53;66.515625;86.2;441.39
"1999-01-05";1404.86;3072.41;66.3125;86.17;440.63
"1999-01-06";1435.12;3156.59;66.4375;86.32;441
Since the BLAS routine I need to implement on such data takes double-floats only, I guess the easiest way is to concatenate d0 at the end of each field, so that each line looks like:
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
In pseudo-code, that would be:
For every line except the first line
For every field except the first field
Substitute ; with d0; and Substitute newline with d0 newline
I imagine it should be something like
cat file.csv | awk -F; 'NR>1 & NF>1'{print line} | sed 's/;/d0\n/g' | sed 's/\n/d0\n/g'
Any input?
You could use this sed command:
sed '1!{s/\(;[^;]*\)/\1d0/g}' file
It skips the first line, then replaces each field that begins with ; (that is, every field but the first) with itself followed by d0.
Output
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0
I would say:
$ awk 'BEGIN{FS=OFS=";"} NR>1 {for (i=2;i<=NF;i++) $i=$i"d0"} 1' file
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0
That is, set the field separator to ;. Starting on line 2, loop through all the fields from the 2nd one appending d0. Then, use 1 to print the line.
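One thing the loop above does not handle is a missing value: an empty field would become a bare d0. A sketch of a variant that leaves empty fields alone and takes the suffix as an argument (the function name append_suffix is made up, and whether empty fields occur in the real data is an assumption):

```shell
# append_suffix SUFFIX
# Append SUFFIX to fields 2..NF of every line after the header,
# skipping empty fields so a missing value stays empty.
# (Hypothetical helper, not from the original answer.)
append_suffix() {
    awk -v sfx="$1" 'BEGIN { FS = OFS = ";" }
        NR > 1 { for (i = 2; i <= NF; i++) if ($i != "") $i = $i sfx }
        1'
}
```

It would be called as append_suffix d0 < file.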
Your data format looks a bit weird. Enclosing the first column in double quotes makes me think that it can contain the delimiter, the semicolon, itself. I don't know the application that produces the data, but if that is the case, you can use the following GNU awk command:
awk 'NR>1{for(i=2;i<=NF;i++){$i=$i"d0"}}1' OFS=\; FPAT='("[^"]+")|([^;]+)' file
The key here is the FPAT variable. With it you define what a field looks like instead of being limited to specifying a set of field delimiters.
big-prices.csv
Dates;A;B;C;D;E
"1999-01-04";1391.12;3034.53;66.515625;86.2;441.39
"1999-01-05";1404.86;3072.41;66.3125;86.17;440.63
"1999-01-06";1435.12;3156.59;66.4375;86.32;441
preprocess script
head -n 1 big-prices.csv 1>output.txt; \
tail -n +2 big-prices.csv | \
sed 's/;/d0;/g' | \
sed 's/$/d0/g' | \
sed 's/"d0/"/g' 1>>output.txt;
output.txt
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0
Note: the second sed would need a minor modification if the file has trailing whitespace at the end of lines.
Using awk
Input
$ cat file
Dates;A;B;C;D;E
"1999-01-04";1391.12;3034.53;66.515625;86.2;441.39
"1999-01-05";1404.86;3072.41;66.3125;86.17;440.63
"1999-01-06";1435.12;3156.59;66.4375;86.32;441
gsub (any awk)
$ awk 'FNR>1{ gsub(/;[^;]*/,"&d0")}1' file
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0
gensub (gawk)
$ awk 'FNR>1{ print gensub(/(;[^;]*)/,"\\1d0","g"); next }1' file
Dates;A;B;C;D;E
"1999-01-04";1391.12d0;3034.53d0;66.515625d0;86.2d0;441.39d0
"1999-01-05";1404.86d0;3072.41d0;66.3125d0;86.17d0;440.63d0
"1999-01-06";1435.12d0;3156.59d0;66.4375d0;86.32d0;441d0

How to retrieve digits including the separator "."

I am using grep to get a string like this: ANS_LENGTH=266.50 then I use sed to only get the digits: 266.50
This is my full command: grep --text 'ANS_LENGTH=' log.txt | sed -e 's/[^[[:digit:]]]*//g'
The result is : 26650
How can this line be changed so the result still shows the separator: 266.50
You don't need grep if you are going to use sed. Just use sed's /.../ address to match the lines you need to print.
sed -n '/ANS_LENGTH/s/[^=]*=\(.*\)/\1/p' log.txt
-n suppresses the automatic printing of every line, so lines that do not match /ANS_LENGTH/ produce no output.
Using a capture group, we keep only the value after the = sign.
The p flag at the end prints the lines on which the substitution succeeded.
If your grep happens to support -P option then you can do:
grep -oP '(?<=ANS_LENGTH=).*' log.txt
(?<=...) is a look-behind construct: it requires ANS_LENGTH= to appear immediately before the match without making it part of the match. This requires the -P option.
-o prints only the matched part, that is, the value.
You need to keep the literal dot as well as the digits.
Try sed -e 's/[^[:digit:].]*//g'
Inside a bracket expression the dot is already literal, so it needs no backslash. Outside of one, an unescaped dot matches any single character.
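Stripping every character that is not a digit or a dot also keeps digits from elsewhere on the line (a line like some data ANS_LENGTH=266.50 other=22 would yield 266.5022). A sketch that captures only the number following ANS_LENGTH= instead; the function name is made up, and it assumes at most one ANS_LENGTH= per line:

```shell
# extract_ans_length: print the digits-and-dots value that follows
# ANS_LENGTH= on each matching stdin line; other lines print nothing.
# (Hypothetical helper; assumes one ANS_LENGTH= per line.)
extract_ans_length() {
    sed -n 's/.*ANS_LENGTH=\([0-9.]*\).*/\1/p'
}
```

It replaces both the grep and the sed stage: extract_ans_length < log.txt.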
Here are some awk examples:
cat file:
some data ANS_LENGTH=266.50 other=22
not mye data=43
gnu awk (due to RS)
awk '/ANS_LENGTH/ {f=NR} f&&NR-1==f' RS="[ =]" file
266.50
awk '/ANS_LENGTH/ {getline;print}' RS="[ =]" file
266.50
Plain awk
awk -F"[ =]" '{for(i=1;i<=NF;i++) if ($i=="ANS_LENGTH") print $(i+1)}' file
266.50
awk '{for(i=1;i<=NF;i++) if ($i~"ANS_LENGTH") {split($i,a,"=");print a[2]}}' file
266.50

Shell scripting - replace every 5 commas with a newline

How can I replace every 5th comma in some input with a newline?
For example:
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
becomes
1,2,3,4,5
6,7,8,9,10
11,12,13,14,15
Looking for a one-liner using something like sed...
This should work:
sed 's/\(\([^,]*,\)\{4\}[^,]*\),/\1\n/g'
Example:
$ echo "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15" |
> sed 's/\(\([^,]*,\)\{4\}[^,]*\),/\1\n/g'
1,2,3,4,5
6,7,8,9,10
11,12,13,14,15
This expression will do.
sed 's/\(\([0-9]\+,\)\{4\}\)\([0-9]\+\),/\1\3\n/g'
http://ideone.com/d4Va2
$ echo -n 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 | xargs -d, printf '%d,%d,%d,%d,%d\n'
1,2,3,4,5
6,7,8,9,10
11,12,13,14,15
The accepted solution works, but is overly complicated. Try:
sed ':d s/,/\n/5; P; D; Td'
Not all sed implementations allow commands to be separated by semicolons, so you may need a literal newline after each semicolon. Also, I'm not sure that all of them allow a label followed by a command, so a literal newline may be required before the s command. In other words:
sed ':d
s/,/\n/5
P
D
Td'
nawk -F, '{for(i=1;i<=NF;i++){printf("%s%s",$i,i%5?",":"\n")}}' file3
test:
pearl.246> nawk -F, '{for(i=1;i<=NF;i++){printf("%s%s",$i,i%5?",":"\n")}}' file3
1,2,3,4,5
6,7,8,9,10
11,12,13,14,15
pearl.247>
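The nawk loop above leaves a trailing comma and no final newline when the number of fields is not a multiple of 5, and \n in a sed replacement is not supported by every sed. A sketch of an awk variant that handles a short last group; the function name split_every is made up, and the group size is assumed to be a positive integer:

```shell
# split_every N
# Print the comma-separated fields of each stdin line in groups of N,
# one group per output line; the last group may be shorter.
# (Hypothetical helper, not from the original answers.)
split_every() {
    awk -F, -v n="$1" '{
        for (i = 1; i <= NF; i++)
            # End the output line after every N-th field and after the last one.
            printf "%s%s", $i, (i == NF || i % n == 0 ? "\n" : ",")
    }'
}
```

For example, echo 1,2,3,4,5,6,7 | split_every 5 prints 1,2,3,4,5 on the first line and 6,7 on the second.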
