Repeatedly replace a delimiter at a given count (4) with another character - shell

Given this line:
12,34,56,47,56,34,56,78,90,12,12,34,45
If the number of commas (,) is greater than four, replace the 4th comma with ||.
If the count is less than or equal to 4, there is no need to replace the comma.
I am able to find the count by the following awk:
awk -F\, '{print NF-1}' text.txt
Then I used an if condition to check whether the result is greater than 4, but I am unable to replace the 4th comma with ||.
In short: find the count of the delimiter in a line and replace the one at a particular position with another character.
Update:
I want to replace every 4th occurrence of the comma with ||. Sorry for the confusion.
Expected output:
12,34,56,47||56,34,56,78||90,12,12,34||45

With GNU awk for gensub():
$ echo '12,34,56,47,56,34' | awk -F, 'NF>5{$0=gensub(/,/,"||",4)}1'
12,34,56,47||56,34
$ echo '12,34,56,47,56' | awk -F, 'NF>5{$0=gensub(/,/,"||",4)}1'
12,34,56,47,56

$ echo 12,34,56,47,56,34,56,78,90,12,12,34,45 | sed 's/,/||/4'
12,34,56,47||56,34,56,78,90,12,12,34,45
$ echo 12,34,56,47 | sed 's/,/||/4'
12,34,56,47
Should work with any POSIX sed
Update:
For the updated question you can use
$ echo 12,34,56,47,56,34,56,78,90,12,12,34,45 | sed -e 's/\(\([^,]*,\)\{3\}[^,]*\),/\1||/g'
12,34,56,47||56,34,56,78||90,12,12,34||45
Unfortunately, POSIX sed's s command can take either a number or g as a flag, but not both. GNU sed allows the combination, but it does not do what we want in this case. So you have to spell it out in the regular expression.
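If the count is not always 4, the same regular expression can be built from a parameter. This is only a minimal sketch: the function name replace_every_nth is made up for illustration, and it assumes the count is at least 2 and the delimiter is a comma.

```shell
# Hypothetical helper (not from the answer above): replace every n-th comma
# on each input line with ||. Assumes n >= 2.
replace_every_nth() {
    # Group n-1 "field," pairs plus one more field, then swap the comma
    # that follows the group for ||.
    sed "s/\(\([^,]*,\)\{$(( $1 - 1 ))\}[^,]*\),/\1||/g"
}

printf '%s\n' '12,34,56,47,56,34,56,78,90,12,12,34,45' | replace_every_nth 4
```

With 4 as the argument this expands to the same expression as above, so it should print the expected output from the question.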

Using awk you can do:
s='12,34,56,47,56,34,56,78,90,12,12,34,45'
awk -F, '{for (i=1; i<NF; i++) printf "%s%s", $i, (i%4?FS:"||"); print $i}' <<< "$s"
12,34,56,47||56,34,56,78||90,12,12,34||45

if the count is greater than four i want to replace 4th comma(,) with ||
give this line a try (gnu sed):
sed -r '/([^,]*,){4}.*,/s/,/||/4' file
test:
kent$ echo ",,,,,"|sed -r '/([^,]*,){4}.*,/s/,/||/4'
,,,||,
kent$ echo ",,,,"|sed -r '/([^,]*,){4}.*,/s/,/||/4'
,,,,
kent$ echo ",,,"|sed -r '/([^,]*,){4}.*,/s/,/||/4'
,,,

with awk
awk -F, 'NF-1>4{k="";for(i=1;i<NF;i++){if(i==4)k=k$i"||";else k=k$i","} print k$NF; next}1' filename

Related

Using sed command in shell script for substring and replace position to need

I'm working with data in a text file and I can't find a way with sed to select a substring at a fixed position and replace it.
This is what I have:
X|001200000000000000000098765432|1234567890|TQ
This is what I need:
'X','00000098765432','1234567890','TQ'
The following sed code adds the quotes, but it does not replace the second field with the substring I need (00000098765432):
echo " X|001200000000000000000098765432|1234567890|TQ" | sed "s/ *//g;s/|/','/g;s/^/'/;s/$/'/"
Could you help me?
Rather than sed, I would use awk for this.
echo "X|001200000000000000000098765432|1234567890|TQ" | awk 'BEGIN {FS="|";OFS=","} {print $1,substr($2,17,14),$3,$4}'
Gives output:
X,00000098765432,1234567890,TQ
Here is how it works:
FS = Field separator (in the input)
OFS = Output field separator (the way you want output to be delimited)
BEGIN -> think of it as the place where configurations are set. It runs only one time. So you are saying you want output to be comma delimited and input is pipe delimited.
substr($2,17,14) -> Take $2 (i.e. the second field; awk begins counting from 1) and apply substring to it. 17 is the starting character position and 14 is the number of characters to take from that position onwards.
In my opinion, this is much more readable and maintainable than sed version you have.
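To see the 1-based counting of substr() in isolation, here is a small sketch. The helper name fixed_substr and the sample record are invented for illustration; the alphabet is used only because it makes the positions easy to check by eye.

```shell
# Hypothetical helper: print `len` characters of one field, starting at a
# 1-based position. Arguments: separator, field number, start, len.
fixed_substr() {
    awk -F"$1" -v f="$2" -v s="$3" -v n="$4" '{ print substr($f, s, n) }'
}

# Made-up sample record; Q is the 17th letter, so this prints QRST.
printf '%s\n' 'X|ABCDEFGHIJKLMNOPQRSTUVWXYZ|TQ' | fixed_substr '|' 2 17 4
```

Passing the positions with -v keeps the awk program itself constant, which is one reason it tends to be easier to maintain than an equivalent sed expression.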
If you want to put the quotes in, I'd still use awk.
$: awk -F'|' 'BEGIN{q="\047"} {print q $1 q","q substr($2,17,14) q","q $3 q","q $4 q}' <<< "X|001200000000000000000098765432|1234567890|TQ"
'X','00000098765432','1234567890','TQ'
If you just want to use sed, note that you say above you want to remove 16 characters, but you are actually only removing 14.
$: sed -E "s/^(.)[|].{14}([^|]+)[|]([^|]+)[|]([^|]+)/'\1','\2','\3','\4'/" <<< "X|0012000000000000000098765432|1234567890|TQ"
'X','00000098765432','1234567890','TQ'
Using sed
$ sed "s/|\(0[0-9]\{15\}\)\?/','/g;s/^\|$/'/g" input_file
'X','00000098765432','1234567890','TQ'
Using any POSIX awk:
$ echo 'X|001200000000000000000098765432|1234567890|TQ' |
awk -F'|' -v OFS="','" -v q="'" '{sub(/.{16}/,"",$2); print q $0 q}'
'X','00000098765432','1234567890','TQ'
not as elegant as I hoped for, but it gets the job done :
'X','00000098765432','1234567890','TQ'
# gawk profile, created Mon May 9 21:19:17 2022
# BEGIN rule(s)
'BEGIN {
1 _ = sprintf("%*s", (__ = +2)^++__+--__*++__,__--)
1 gsub(".", "[0-9]", _)
1 sub("$", "$", _)
1 FS = "[|]"
1 OFS = "\47,\47"
}
# Rule(s)
1 (NF *= NF == __*__) * sub(_, "|&", $__) * \
sub("^.*[|]", "", $__) * sub(".+", "\47&\47") }'
Tested and confirmed working on gnu gawk 5.1.1, mawk 1.3.4, mawk 1.9.9.6, and macosx nawk
— The 4Chan Teller
awk -v del1="\047" \
-v del2="," \
-v start="3" \
-v len="17" \
'{
gsub(substr($0,start+1,len),"");
gsub(/[\|]/,del1 del2 del1);
print del1$0del1
}' input_file
'X',00000098765432','1234567890','TQ'

Use sed to transform a space-separated list into a comma-separated list with quotes around each element

I have this
a/b/Test b/c/Test c/d/Test
and want to transform it into:
"a/b/Test", "b/c/Test", "c/d/Test"
I know I can use this (here: path=a/b/Test b/c/Test c/d/Test)
test=$(echo $path | sed 's/ /", "/g')
to transform it into
a/b/Test", "b/c/Test", "c/d/Test
But here I am missing the first and last ".
I don't quite know how to use sed for this. Can I somehow use the anchors ^ and $ to match the start and the end of the string and add " there?
sed 's/.*/"&"/g ; s/ /", "/g' filename
You may use awk:
s='a/b/Test b/c/Test c/d/Test'
awk -v OFS=', ' '{for (i=1; i<=NF; i++) $i = "\"" $i "\""} 1' <<< "$s"
"a/b/Test", "b/c/Test", "c/d/Test"
awk is easier:
awk -v OFS=", " -v q='"' '{for(i=1;i<=NF;i++)$i=q $i q}7'
You may just add double quotes if you have a single line text:
test="a/b/Test b/c/Test c/d/Test"
test='"'$(echo "$test" | sed 's/[[:space:]]\{1,\}/",&"/g')'"'
echo "$test"
If you have multiple lines use
test=$(echo "$test" | sed 's/[[:space:]]\{1,\}/",&"/g; s/^/"/g; s/$/"/g')
test=$(echo "$test" | sed -E 's/[[:space:]]+/",&"/g; s/^|$/"/g')
The [[:space:]]\{1,\} POSIX BRE pattern (equal to [[:space:]]+ POSIX ERE) matches one or more whitespace chars and & in the replacement pattern inserts this matched value back in the resulting string.
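The effect of & is easiest to see when the whitespace run is longer than one character. A minimal sketch, with quote_list as a made-up wrapper name around the BRE variant above:

```shell
# Hypothetical wrapper: quote each whitespace-separated element on a line.
# "&" re-inserts whatever whitespace was matched, so runs are preserved.
quote_list() {
    sed 's/[[:space:]]\{1,\}/",&"/g; s/^/"/; s/$/"/'
}

printf '%s\n' 'a/b/Test b/c/Test c/d/Test' | quote_list
printf '%s\n' 'x   y' | quote_list   # three spaces survive between the elements
```

If you would rather normalize the whitespace to a single space, replace ",&" with ", " in the first expression.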

How to print keys from all key-value pairs

Text file looks like this:
key11=val1|key12=val2|key13=val3
key21=val1|key22=val2|key23=val3
How can I extract keys so that:
key11|key12|key13
key21|key22|key23
I have tried unsuccessfully :
awk '{ gsub(/[^[|]=]+=/,"") }1' file.txt
gives back the actual data:
key11=val1|key12=val2|key13=val3
key21=val1|key22=val2|key23=val3
Since you tagged bash
while IFS='=|' read -ra words; do
    n=${#words[@]}
    for ((i=1; i<n; i+=2)); do
        unset 'words[i]'
    done
    ( IFS='|'; echo "${words[*]}" )
done < file
gawk
This can be done by awk, by setting FS and OFS :
kent$ awk -F'=[^|]*' -v OFS="" '$1=$1' file
key11|key12|key13
key21|key22|key23
or safer: awk -F.... '{$1=$1}1' file
substitution (by sed for example):
kent$ sed 's/=[^|]*//g' file
key11|key12|key13
key21|key22|key23
Here's one solution
echo "key11=val1|key12=val2|key13=val3" \
| awk -F'[=|]' '{
for (i=1;i<=NF;i+=2){
printf("%s%s", $i, (i<(NF-1))?"|":"")
}
print""
}'
output
key11|key12|key13
It should also work by passing in the filename as an argument to awk, i.e.
awk -F'[=|]' '{for (i=1;i<=NF;i+=2){printf("%s%s", $i, (i<(NF-1))?"|":"") }print""}' file1 [file_more_as_will_fit]
Discussion
We use a character class as the value for FS (the field separator), so each = and | char marks the beginning of a new field.
-F'[=|]'
Because we know we want to start with field1 for output and skip every other field, we use
for (i=1;i<=NF;i+=2)
printf formats the output as defined by the format string '%s%s'. There are a zillion options available for printf format strings, but you only need the value for $i (the looping value that generates the key) and whether to print a | char or not.
printf("%s%s", $i ...)
And we use awk's ternary operator, which evaluates what element number is being processed (i<..). As long as it is not the 2nd to last field, the | char is emitted.
(i<(NF-1))?"|":""
IHTH
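Because the loop just steps over every other field, the same approach can print the values instead of the keys by starting at field 2. A rough sketch, where pick_fields and its start argument are names invented for this example:

```shell
# Hypothetical helper: print every other field, starting at field `start`
# (1 = keys, 2 = values), joined with |.
pick_fields() {
    awk -F'[=|]' -v start="$1" '{
        out = ""
        for (i = start; i <= NF; i += 2) {
            sep = (i > start) ? "|" : ""   # no separator before the first item
            out = out sep $i
        }
        print out
    }'
}

printf '%s\n' 'key11=val1|key12=val2|key13=val3' | pick_fields 1
printf '%s\n' 'key11=val1|key12=val2|key13=val3' | pick_fields 2
```

Here the separator is emitted before every item except the first, rather than after every item except the last; either way works, this one just avoids depending on NF.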
sed
I did this with sed:
sed -r 's/([[:alnum:]]*)=[[:alnum:]]*/\1/g' < file.txt
tested here and got:
key11|key12|key13
key21|key22|key23
s/<pattern>/<subst>/ means "replace <pattern> by <subst>", and with the g in the end it will do it for every pattern found in the line.
The [[:alnum:]]* is equivalent to [0-9a-zA-Z]*, and means any number of letters or digits.
The first pattern between parentheses will correspond to \1 in the substitution, the second to \2, and so on.
So, it will match every "key=value" and replace it by "key".
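To show \1 and \2 together, here is a small sketch that swaps each key with its value. It uses basic regular expressions (escaped parentheses) so that no -r flag is needed; swap_pairs is just an illustrative name.

```shell
# Hypothetical helper: turn every key=value into value=key.
# \1 is the first parenthesized group (the key), \2 the second (the value).
swap_pairs() {
    sed 's/\([[:alnum:]]*\)=\([[:alnum:]]*\)/\2=\1/g'
}

printf '%s\n' 'key11=val1|key12=val2|key13=val3' | swap_pairs
```

Replacing \2=\1 with just \2 would print only the values.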
awk -F'[=|]' '{print $1,$3,$5}' OFS="|" file
key11|key12|key13
key21|key22|key23

How to append a character after N patterns at each line in bash?

How can I insert a ',' after the 2nd character ',' at each line ?
I want the following :
input.txt
a,b,c,d,e
e,f,g,
h,,i
output.txt
a,b,,c,d,e
e,f,,g
h,,,i
Thanks in advance
input
$ cat input
a,b,c,d,e
e,f,g,
h,,i
using sed like:
$ N=2
$ cat input | sed "s/,/&,/${N}"
a,b,,c,d,e
e,f,,g,
h,,,i
$ N=3
$ cat input | sed "s/,/&,/${N}"
a,b,c,,d,e
e,f,g,,
h,,i
you can change the N.
s/pattern/replacement/flags
Substitute the replacement string for the pattern.
The value of flags in substitute function is zero or more of the following:
N Make the substitution only for the N'th occurrence
g Make the substitution for all
The command s/,/&,/${N} finds the N'th comma and replaces it with two commas (an ampersand (&) appearing in the replacement is replaced by the string that matched the pattern). ${N} is just a shell variable.
BTW, you need to escape the special character double quote if you want to insert ,""
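One way to sidestep the quoting question is to pass the text to insert as an argument. This is only a sketch: insert_after_nth is a made-up name, and it assumes the inserted text contains no /, & or backslash, since those are special in a sed replacement.

```shell
# Hypothetical helper: insert text after the N-th comma of each line.
# $1 = which comma, $2 = text to insert (assumed free of /, & and \).
insert_after_nth() {
    sed "s/,/&$2/$1"
}

printf '%s\n' 'a,b,c,d,e' | insert_after_nth 2 ','
printf '%s\n' 'a,b,c,d,e' | insert_after_nth 2 '""'
```

Because the double quotes arrive inside a variable, they do not need escaping at the call site beyond the usual single quotes.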
awk to the rescue!
$ awk -F, -v OFS=, '{$3=OFS $3}1' file
a,b,,c,d,e
e,f,,g,
h,,,i
after second , is the third field. Prefix the third field with , and print.
Or, making the column number a parameter and writing delimiter once.
$ awk -F, -v c=3 'BEGIN{OFS=FS} {$c=OFS $c}1' file
This can be read as "insert a new column at position 3". Note that this will also work, adding the 6th column, which will be hard to replicate with sed.
$ awk -F, -v c=6 'BEGIN{OFS=FS} {$c=OFS $c}1' file
a,b,c,d,e,,
e,f,g,,,,
h,,i,,,,
Using sed:
sed -E 's/^([^,]*,[^,]*,)(.*)/\1,\2/' file.txt
Example:
% cat file.txt
a,b,c,d,e
e,f,g,
h,,i
% sed -E 's/^([^,]*,[^,]*,)(.*)/\1,\2/' file.txt
a,b,,c,d,e
e,f,,g,
h,,,i
You can use sed like this:
sed 's/^[^,]*,[^,]*/&,/' file
a,b,,c,d,e
e,f,,g,
h,,,i

Explode to Array

I put together this shell script to do two things:
Change the delimiters in a data file ('::' to ',' in this case)
Select the columns I want and append them to a new file
It works but I want a better way to do this. I specifically want to find an alternative method for exploding each line into an array. Using command line arguments doesn't seem like the way to go. ANY COMMENTS ARE WELCOME.
# Takes a ::-separated file as the 1st parameter
SOURCE=$1
# create csv target file
TARGET=${SOURCE/dat/csv}
touch "$TARGET"
# quote the header: an unquoted # would start a comment
echo '#userId,itemId' > "$TARGET"
IFS=","
while read -r LINE
do
    # Replaces all matches of :: with a ,
    CSV_LINE=${LINE//::/,}
    set -- $CSV_LINE
    echo "$1,$2" >> "$TARGET"
done < "$SOURCE"
Instead of set, you can use an array:
arr=($CSV_LINE)
echo "${arr[0]},${arr[1]}"
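If an array is not strictly needed, read can do the splitting itself. The following is a sketch under a few assumptions: the fields never contain a colon, the sample rows are invented, and first_two_columns is a hypothetical name. With IFS set to a single colon, each :: yields an empty field between the real ones, which is caught in a throwaway variable.

```shell
# Hypothetical helper: print the 1st and 2nd "::"-separated columns as CSV.
# IFS=: splits on every colon, so "a::b::c" becomes a, "", b, and the rest.
first_two_columns() {
    while IFS=: read -r user empty item rest; do
        printf '%s,%s\n' "$user" "$item"
    done
}

# Made-up sample rows in the same shape as the question's data file.
printf '%s\n' '1::1193::5' '2::661::3' | first_two_columns
```

This avoids both set -- and the intermediate CSV_LINE variable, and it does not change IFS for the rest of the script.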
The following would print columns 1 and 2 from infile.dat. Replace $1, $2 with a comma-separated list of the numbered columns you do want.
awk 'BEGIN { FS="::"; OFS="," } { print $1, $2 }' infile.dat > infile.csv
Perl probably has a 1 liner to do it.
Awk can probably do it easily too.
My first reaction is a combination of awk and sed:
Sed to convert the delimiters
Awk to process specific columns
cat inputfile | sed -e 's/::/,/g' | awk -F, '{print $1, $2}'
# Or to avoid a UUOC award (and prolong the life of your keyboard by 3 characters)
sed -e 's/::/,/g' inputfile | awk -F, '{print $1, $2}'
awk is indeed the right tool for the job here, it's a simple one-liner.
$ cat test.in
a::b::c
d::e::f
g::h::i
$ awk -F:: -v OFS=, '{$1=$1;print;print $2,$3 >> "altfile"}' test.in
a,b,c
d,e,f
g,h,i
$ cat altfile
b,c
e,f
h,i
$
