Use sed to replace percent signs with brackets - bash

I need to replace text surrounded by percent signs with curly brackets, e.g.:
This %is% a %test%
should become
This {is} a {test}
I tried: sed 's/\%([^]]*)\%/{\1}/g'
But that resulted in:
This {is% a %test}

Try this:
$ echo "This %is% a %test%" | sed -e 's/%\([^%]*\)%/{\1}/g'
This {is} a {test}
you need to escape the groups: \(...\) (otherwise you get invalid reference \1 on 's' command's RHS)
use [^%]* to match anything but %
you don't need to escape % (but it works with \% as well).
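A minimal sketch contrasting the two regex flavours, assuming a sed that accepts -E for extended regular expressions (GNU and BSD sed both do). The function names are hypothetical. The last command reproduces the greedy result from the question: with .* instead of [^%]*, the group runs from the first % to the last one.

```shell
# Hypothetical helpers: wrap every %...% span in curly braces.

# Basic regex (BRE): the group has to be written \( \).
pct_to_braces() {
    sed -e 's/%\([^%]*\)%/{\1}/g'
}

# Extended regex (ERE, sed -E): bare ( ) form the group.
pct_to_braces_ere() {
    sed -E 's/%([^%]*)%/{\1}/g'
}

echo 'This %is% a %test%' | pct_to_braces        # This {is} a {test}
echo 'This %is% a %test%' | pct_to_braces_ere    # This {is} a {test}

# Greedy .* swallows everything between the first and the last %.
echo 'This %is% a %test%' | sed -E 's/%(.*)%/{\1}/g'   # This {is% a %test}
```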

I would suggest using awk instead:
s='This %is% a %test%'
awk -F'%' '{for (i=1; i<NF; i++) p = p $i (i%2 ? "{" : "}"); print p $NF}' <<< "$s"
This {is} a {test}
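The awk one-liner accumulates the result in p without resetting it, which is fine for the single string in $s but would carry text over from one line to the next on a multi-line file. A sketch that resets p for every record (the function name and the second sample line are made up for illustration):

```shell
# Hypothetical helper name; reads lines on stdin.
pct_to_braces_awk() {
    awk -F'%' '{
        p = ""                              # reset for every input line
        for (i = 1; i < NF; i++)
            p = p $i (i % 2 ? "{" : "}")    # odd separators open, even ones close
        print p $NF
    }'
}

printf '%s\n' 'This %is% a %test%' 'no markers here' | pct_to_braces_awk
# This {is} a {test}
# no markers here
```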

Related

Use sed to transform a comma space separated list into a comma separated list with quotes around each element

I have this
a/b/Test b/c/Test c/d/Test
and want to transform it into:
"a/b/Test", "b/c/Test", "c/d/Test"
I know I can use this (here: path=a/b/Test b/c/Test c/d/Test)
test=$(echo $path | sed 's/ /", "/g')
to transform it into
a/b/Test", "b/c/Test", "c/d/Test
But here I am missing the first and last ".
I don't quite know how to use sed for this. Can I somehow change it and use the anchors ^ and $ to get the first and last part of the string and add " there?
sed 's/.*/"&"/g ; s/ /", "/g' filename
You may use awk:
s='a/b/Test b/c/Test c/d/Test'
awk -v OFS=', ' '{for (i=1; i<=NF; i++) $i = "\"" $i "\""} 1' <<< "$s"
"a/b/Test", "b/c/Test", "c/d/Test"
awk is easier:
awk -v OFS=", " -v q='"' '{for(i=1;i<=NF;i++)$i=q $i q}7'
You may just add the double quotes around the result if you have single-line text:
test="a/b/Test b/c/Test c/d/Test"
test='"'$(echo "$test" | sed 's/[[:space:]]\{1,\}/",&"/g')'"'
echo "$test"
If you have multiple lines, use either of these (the second one uses ERE via sed -E):
test=$(echo "$test" | sed 's/[[:space:]]\{1,\}/",&"/g; s/^/"/g; s/$/"/g')
test=$(echo "$test" | sed -E 's/[[:space:]]+/",&"/g; s/^|$/"/g')
The [[:space:]]\{1,\} POSIX BRE pattern (equal to [[:space:]]+ POSIX ERE) matches one or more whitespace chars and & in the replacement pattern inserts this matched value back in the resulting string.
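Because & re-inserts the whole whitespace run, the original spacing (several spaces, or a tab) is kept after each comma. If a uniform ", " separator is preferred, a fixed replacement string can be used instead. A sketch with hypothetical function names:

```shell
# Keeps the original whitespace run after each comma (uses &).
quote_keep_space() {
    sed 's/[[:space:]]\{1,\}/",&"/g; s/^/"/; s/$/"/'
}

# Replaces every whitespace run with exactly one ", " separator.
quote_normalized() {
    sed 's/[[:space:]]\{1,\}/", "/g; s/^/"/; s/$/"/'
}

printf 'a/b/Test   b/c/Test\n' | quote_keep_space   # "a/b/Test",   "b/c/Test"
printf 'a/b/Test   b/c/Test\n' | quote_normalized   # "a/b/Test", "b/c/Test"
```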

Ignore comma after backslash in a line in a text file using awk or sed

I have a text file containing several lines of the following format:
name,list_of_subjects,list_of_sports,school
Eg1: john,science\,social,football,florence_school
Eg2: james,painting,tennis\,ping_pong\,chess,highmount_school
I need to parse the text file and print the output of fields ignoring the escaped commas. Here those will be fields 2 or 3 like this:
science, social
tennis, ping_pong, chess
I do not know how to ignore escaped characters. How can I do it with awk or sed in terminal?
Substitute \, with a character that your records do not contain normally (e.g. \n), and restore it before printing. For example:
$ awk -F',' 'NR>1{ if(gsub(/\\,/,"\n")) gsub(/\n/,",",$2); print $2 }' file
science,social
painting
Since the first gsub is performed on the whole record (i.e. $0), awk is forced to recompute the fields. The second one is performed only on the second field (i.e. $2), so it does not affect the other fields. See: Changing Fields.
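The field-recomputation rule can be seen directly by counting fields after each kind of gsub (toy input, not from the question): modifying $0 makes awk split the record again, while modifying a single field leaves NF unchanged and only rebuilds $0.

```shell
# gsub on the whole record: awk re-splits $0, so "x y" becomes two fields.
echo 'a b c' | awk '{ gsub(/b/, "x y"); print NF }'        # 4

# gsub on one field: $2 now contains a space, but NF is not recomputed.
echo 'a b c' | awk '{ gsub(/b/, "x y", $2); print NF }'    # 3
```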
To be able to extract multiple fields with properly escaped commas you need to gsub \ns in all fields with a for loop as in the following example:
$ awk 'BEGIN{ FS=OFS="," } NR>1{ if(gsub(/\\,/,"\n")) for(i=1;i<=NF;++i) gsub(/\n/,"\\,",$i); print $2,$3 }' file
science\,social,football
painting,tennis\,ping_pong\,chess
See also: What's the most robust way to efficiently parse CSV using awk?
You could replace the \, sequences with another character that won't appear in your text, split the text on the remaining commas, then replace the chosen character with commas:
sed $'s/\\\,/\037/g' input | awk -F, '{ printf "Name: %s\nSubjects : %s\nSports: %s\nSchool: %s\n\n", $1, $2, $3, $4 }' | tr $'\037' ','
This uses the ASCII control character "Unit Separator" (decimal 31, written \037 in octal inside $'...'), which your input is very unlikely to contain.
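bash's $'...' quoting reads a backslash followed by digits as an octal byte value, and printf format strings do the same. So \31 is octal 31 (byte 0x19, "End of Medium") rather than decimal 31, while the ASCII Unit Separator is \037 (or $'\x1f'). A quick check with od, using printf so it runs in any POSIX shell; byte_hex is a hypothetical helper name:

```shell
# Hypothetical helper: print the hex value of the byte produced by the
# printf escape sequence passed as $1 (for example '\31').
byte_hex() {
    printf "$1" | od -An -tx1 | tr -d ' \n'
    echo
}

byte_hex '\31'     # 19  (octal 31 = decimal 25, "End of Medium")
byte_hex '\037'    # 1f  (octal 37 = decimal 31, "Unit Separator")
```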
You don't strictly need awk here; bash with coreutils and sed is enough:
# Feed the sample input through a here-document (cat only starts the pipe)
cat <<EOF |
name,list_of_subjects,list_of_sports,school
Eg1: john,science\,social,football,florence_school
Eg2: james,painting,tennis\,ping_pong\,chess,highmount_school
EOF
# remove first line!
tail -n+2 |
# substitute `\,` by an unreadable character:
sed 's/\\\,/\xff/g' |
# read the comma separated list
while IFS=, read -r name list_of_subjects list_of_sports school; do
# read the \xff separated list into an array
IFS=$'\xff' read -r -d '' -a list_of_subjects < <(printf "%s" "$list_of_subjects")
# read the \xff separated list into an array
IFS=$'\xff' read -r -d '' -a list_of_sports < <(printf "%s" "$list_of_sports")
echo "list_of_subjects : ${list_of_subjects[@]}"
echo "list_of_sports : ${list_of_sports[@]}"
done
will output:
list_of_subjects : science social
list_of_sports : football
list_of_subjects : painting
list_of_sports : tennis ping_pong chess
Note that this will most probably be slower than a solution using awk.
Note that the principle of operation is the same as in the other answers: substitute the \, string with some other unique character, then use that character to iterate over the elements of the second and third fields.
This might work for you (GNU sed):
sed -E 's/\\,/\n/g;y/,\n/\n,/;s/^[^,]*$//Mg;s/\n//g;/^$/d' file
Replace the escaped commas with newlines, then swap the two characters so that commas become newlines and newlines become commas. Clear every line that does not contain a comma, remove the remaining newlines, and delete the lines that end up empty.
Using Perl: change each \, to some control character, say \x01, then replace it with , again:
$ cat laxman.txt
john,science\,social,football,florence_school
james,painting,tennis\,ping_pong\,chess,highmount_school
$ perl -ne ' s/\\,/\x01/g and print ' laxman.txt | perl -F, -lane ' for(@F) { if( /\x01/ ) { s/\x01/,/g ; print } } '
science,social
tennis,ping_pong,chess
You can perhaps join columns with a function.
function joincol(col, i) {
$col=$col FS $(col+1)
for (i=col+1; i<NF; i++) {
$i=$(i+1)
}
NF--
}
This might get used thusly:
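BEGIN { FS = OFS = "," }
{
for (col=1; col<=NF; col++) {
# keep joining while the field still ends in a backslash
while (col < NF && $col ~ /\\$/) {
joincol(col)
}
}
print $2
}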
Note that decrementing NF is undefined behaviour in POSIX. It may delete the last field, or it may not, and still be POSIX compliant. This works for me in BSD awk and gawk. YMMV. May contain nuts.
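As a portable alternative that never touches NF, the joined fields can be collected into an array instead of shifting $i in place. A sketch; escaped_field and its column argument are hypothetical, and it assumes \, is the only escape sequence in the input:

```shell
# Hypothetical helper: print logical field N of comma-separated input,
# treating "\," as an escaped comma, with the backslashes removed.
escaped_field() {
    awk -v col="$1" 'BEGIN { FS = "," }
    {
        n = 0; joining = 0
        for (i = 1; i <= NF; i++) {
            if (joining)
                f[n] = f[n] FS $i     # previous piece ended in "\": glue this one on
            else
                f[++n] = $i           # start a new logical field
            joining = ($i ~ /\\$/)    # does this piece end with a backslash?
        }
        val = (col <= n) ? f[col] : ""
        gsub(/\\/, "", val)           # drop the escaping backslashes
        print val
    }'
}

printf '%s\n' 'james,painting,tennis\,ping_pong\,chess,highmount_school' | escaped_field 3
# tennis,ping_pong,chess
```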
Use gawk's FPAT:
awk -v FPAT='(\\\\.|[^,\\\\]*)+' '{print $3}' file
#list_of_sports
#football
#tennis\,ping_pong\,chess
then use gensub to remove the backslashes:
awk -v FPAT='(\\\\.|[^,\\\\]*)+' '{print gensub("\\\\", "", "g", $3)}' file
#list_of_sports
#football
#tennis,ping_pong,chess

Bash sorting commas and strings

This is the list I have. I would like it to appear as the second one (remove the commas and break the line before the second word):
jkdlfid
ljidklf,
kdjfhda,kdospad,kfmduaj,
hello
lkoplkj
Would like the result to be:
jkdlfid
ljidklf
kdjfhda
kdospad
kfmduaj
hello
lkoplkj
Is there any grep command for this? To clarify, I would like to break the line at each comma and remove the comma.
The following will do the trick :
grep -o '[^,]*' file
The idea is to match anything other than a comma ([^,], a negated character class), zero or more times. However, I would personally use + (one or more times), so that an empty match can never occur:
grep -Eo '[^,]+' file
f.awk
function emptyp(s) { # 1 if `s' consists of spaces and tabs
return s ~ /^[ \t]*$/
}
{
n = split($0, a, ",")
for (i=1; i<=n; i++)
if (!emptyp(a[i])) print a[i]
}
f.example
jkdlfid
ljidklf,
kdjfhda,kdospad,kfmduaj,
hello
lkoplkj
Usage:
awk -f f.awk f.example
You can use the tr command to do that.
I am assuming that your input is in test.txt:
tr -cs "[:alpha:]" "\n" < test.txt
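tr -cs "[:alpha:]" turns every run of non-letters into a single newline, so items containing digits, underscores or other punctuation would be split as well. A narrower variant, assuming no line starts with a comma, maps only commas and newlines to newlines and squeezes the resulting runs (split_commas and the item_2 sample are made up):

```shell
# Map "," and "\n" to "\n" (sets of equal length), then -s squeezes
# repeated newlines, so trailing commas do not leave empty lines.
split_commas() {
    tr -s ',\n' '\n\n'
}

printf '%s\n' 'jkdlfid' 'ljidklf,' 'kdjfhda,kdospad,kfmduaj,' 'item_2' | split_commas
# jkdlfid
# ljidklf
# kdjfhda
# kdospad
# kfmduaj
# item_2
```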
You can easily translate commas to newlines and remove any resulting empty lines:
$ printf 'foo\nbar,baz,ban\nbay,bat\n' | tr ',' '\n' | grep -v '^$'
foo
bar
baz
ban
bay
bat
Remove the trailing comma on each line with sub, then replace the remaining commas with RS (a newline):
awk '{sub(/,$/, ""); gsub(/,/, RS)} 1' file
jkdlfid
ljidklf
kdjfhda
kdospad
kfmduaj
hello
lkoplkj

Modify content inside quotation marks, BASH

Good day to all,
I was wondering how to modify the content inside quotation marks and leave the outside unmodified.
Input line:
,,,"Investigacion,,, desarrollo",,,
Output line:
,,,"Investigacion, desarrollo",,,
Initial try:
sed 's/\"",,,""*/,/g'
But nothing happens. Thanks in advance for any clue.
The idiomatic awk way to do this is simply:
$ awk 'BEGIN{FS=OFS="\""} {sub(/,+/,",",$2)} 1' file
,,,"Investigacion, desarrollo",,,
or if you can have more than one set of quoted strings on each line:
$ cat file
,,,"Investigacion,,, desarrollo",,,"foo,,,,bar",,,
$ awk 'BEGIN{FS=OFS="\""} {for (i=2;i<=NF;i+=2) sub(/,+/,",",$i)} 1' file
,,,"Investigacion, desarrollo",,,"foo,bar",,,
This approach works because everything up to the first " is field 1, and everything from there to the second " is field 2 and so on so everything between "s is the even-numbered fields. It can only fail if you have newlines or escaped double quotes inside your fields but that'd affect every other possible solution too so you'd need to add cases like that to your sample input if you want a solution that handles it.
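sub collapses only the first run of commas in each quoted field. If a quoted field can contain several runs, gsub collapses all of them. A sketch with a hypothetical function name and made-up input:

```shell
squeeze_quoted_commas() {
    # FS=OFS='"': the even-numbered fields are the text between quotes.
    awk 'BEGIN { FS = OFS = "\"" } { for (i = 2; i <= NF; i += 2) gsub(/,+/, ",", $i) } 1'
}

echo ',,,"a,,,b,,c",,,' | squeeze_quoted_commas                      # ,,,"a,b,c",,,
echo ',,,"Investigacion,,, desarrollo",,,' | squeeze_quoted_commas   # ,,,"Investigacion, desarrollo",,,
```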
Using a language that has built-in CSV parsing capabilities like perl will help.
perl -MText::ParseWords -ne '
print join ",", map { $_ =~ s/,,,/,/; $_ } parse_line(",", 1, $_)
' file
,,,"Investigacion, desarrollo",,,
Text::ParseWords is a core module, so you don't need to download it from CPAN. With the parse_line function we set the delimiter and a flag to keep the quotes. Then we just do a simple substitution and join the fields to make the CSV line again.
Using egrep, sed and tr:
s=',,,"Investigacion,,, desarrollo",,,'
r=$(egrep -o '"[^"]*"|,' <<< "$s"|sed '/^"/s/,\{2,\}/,/g'|tr -d "\n")
echo "$r"
,,,"Investigacion, desarrollo",,,
Using awk:
awk '{ p = ""; while (match($0, /"[^"]*,{2,}[^"]*"/)) { t = substr($0, RSTART, RLENGTH); gsub(/,+/, ",", t); p = p substr($0, 1, RSTART - 1) t; $0 = substr($0, RSTART + RLENGTH); }; $0 = p $0 } 1'
Test:
$ echo ',,,"Investigacion,,, desarrollo",,,' | awk ...
,,,"Investigacion, desarrollo",,,
$ echo ',,,"Investigacion,,, desarrollo",,,",,, "' | awk ...
,,,"Investigacion, desarrollo",,,", "

Extract string from brackets

I'm pretty new at bash, so this is a pretty noob question.
Suppose I have a string:
string1 [string2] string3 string4
I would like to extract string2 from the square brackets; but the brackets may be surrounding any other string at any other time.
How would I use sed, etc, to do this? Thanks!
Try this:
echo "$str" | cut -d "[" -f2 | cut -d "]" -f1
Here's one way using awk:
echo "string1 [string2] string3 string4" | awk -F'[][]' '{print $2}'
This sed option also works:
echo "string1 [string2] string3 string4" | sed 's/.*\[\([^]]*\)\].*/\1/g'
Here's a breakdown of the sed command:
s/ <-- this means it should perform a substitution
.* <-- this means match zero or more characters
\[ <-- this means match a literal [ character
\( <-- this starts saving the pattern for later use
[^]]* <-- this means match any character that is not a ] character
the outer [ and ] signify that this is a character class
having the ^ character as the first character in the class means "not"; a ] right after the ^ is taken literally
\) <-- this closes the saving of the pattern match for later use
\] <-- this means match a literal ] character
.* <-- this means match zero or more characters
/\1 <-- this means replace everything matched with the first saved pattern
(the match between "\(" and "\)" )
/g <-- this means the substitution is global (all occurrences on the line); it is redundant here, because the leading and trailing .* already consume the whole line
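Because the leading .* is greedy, this command returns the contents of the last bracket pair when a line has several. Anchoring at the start and skipping only non-[ characters picks the first pair instead (the input line is made up):

```shell
line='a [x] b [y]'

# Greedy .* runs up to the LAST "[" that still lets the rest match.
echo "$line" | sed 's/.*\[\([^]]*\)\].*/\1/'       # y

# [^[]* cannot cross a "[", so the match starts at the FIRST one.
echo "$line" | sed 's/^[^[]*\[\([^]]*\)\].*/\1/'   # x
```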
In pure bash:
STR="string1 [string2] string3 string4"
STR=${STR#*\[}
STR=${STR%\]*}
echo "$STR"
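% removes the shortest suffix matching ]*, i.e. everything from the last ], so with several bracket pairs the result spans from the first [ to the last ]. Using %% (longest suffix, i.e. everything from the first ]) keeps only the first pair. A sketch with a hypothetical function name; the brackets are escaped so they are not read as a glob bracket expression, and a string without brackets is returned unchanged:

```shell
first_bracketed() {
    s=${1#*\[}         # drop everything up to and including the first "["
    s=${s%%\]*}        # drop everything from the first "]" onward
    printf '%s\n' "$s"
}

first_bracketed 'string1 [string2] string3 string4'   # string2
first_bracketed 'a [x] b [y]'                         # x
```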
Specify awk multiple delimiters with -F '[delimiters]'
If the delimiters are square brackets, put them back to back like this ][
awk -F '[][]' '{print $2}'
otherwise you will have to escape them
awk -F '[\\[\\]]' '{print $2}'
Other examples to get the value between the brackets:
echo "string1 (string2) string3" | awk -F '[()]' '{print $2}'
echo "string1 {string2} string3" | awk -F '[{}]' '{print $2}'
Here's another one, but it takes care of multiple occurrences, e.g.:
$ echo "string1 [string2] string3 [string4 string5]" | awk -v RS=']' -v FS='[' 'NF>1{print $2}'
string2
string4 string5
The simple logic is this: split on "]", go through the pieces looking for a "[", then split each such piece on "[" and take the last part. In Python:
for item in "string1 [string2] string3 [string4 string5]".split("]"):
    if "[" in item:
        print(item.split("[")[-1])
Here is an awk example, but matching on parentheses, which also makes it more obvious how -F works:
echo 'test (lskdjf)' | awk -F'[()]' '{print $2}'
Another awk:
$ echo "string1 [string2] string3 [string4]" |
awk -v RS='[' -v FS=']' 'NR>1{print $1}'
string2
string4
Read file in which the delimiter is square brackets:
$ cat file
123;abc[202];124
125;abc[203];124
127;abc[204];124
To print the value present within the brackets:
$ awk -F '[][]' '{print $2}' file
202
203
204
At first sight, the delimiter used in the above command might be confusing, but it's simple. Two delimiters are used in this case: [ and ]. Since the delimiters themselves are square brackets, and they have to be placed inside the square brackets of a bracket expression, it looks tricky at first.
Note: if square brackets are the delimiters, they must be written in exactly this order: first ], then [. Writing the delimiter as -F '[[]]' is interpreted differently altogether: [[] is a bracket expression containing only [, followed by a literal ], so it matches the two-character sequence [].
Refer to this link: http://www.theunixschool.com/2012/07/awk-10-examples-to-read-files-with.html
An inline solution could be:
a="first \"Foo1\" and second \"Foo2\""
echo ${a#*\"} | { read b; echo ${b%%\"*}; }
You can test in single line:
a="first \"Foo1\" and second \"Foo2\""; echo ${a#*\"} | { read b; echo ${b%%\"*}; }
Output: Foo1
Example with brackets:
a="first [Foo1] and second [Foo2]"
echo ${a#*[} | { read b; echo ${b%%]*}; }
That in one line:
a="first [Foo1] and second [Foo2]"; echo ${a#*[} | { read b; echo ${b%%]*}; }
Output: Foo1
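The examples above stop at the first pair (Foo1). To print every bracketed value, the same parameter expansions can be repeated in a loop; a sketch with a hypothetical function name, using case instead of [[ ]] so it also runs in a POSIX sh:

```shell
# Hypothetical helper: print the contents of every [...] pair, one per line.
all_bracketed() {
    rest=$1
    while :; do
        case $rest in
            *\[*\]*) ;;                  # a "[" followed later by a "]"
            *) break ;;
        esac
        rest=${rest#*\[}                 # drop up to and including the next "["
        printf '%s\n' "${rest%%\]*}"     # print the text up to the following "]"
        rest=${rest#*\]}                 # continue after that "]"
    done
}

all_bracketed 'first [Foo1] and second [Foo2]'
# Foo1
# Foo2
```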
