Use sed to transform a space-separated list into a comma-separated list with quotes around each element - bash

I have this
a/b/Test b/c/Test c/d/Test
and want to transform it into:
"a/b/Test", "b/c/Test", "c/d/Test"
I know I can use this (here path='a/b/Test b/c/Test c/d/Test'):
test=$(echo "$path" | sed 's/ /", "/g')
to transform it into
a/b/Test", "b/c/Test", "c/d/Test
But here I am missing the first and last ".
I don't quite know how to use sed for this. Can I change it to use the anchors ^ and $ to match the start and end of the string and add " there?

sed 's/.*/"&"/g ; s/ /", "/g' filename
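To see what the two expressions do, here is a minimal sketch that wraps them in a shell function. `quote_list` is a hypothetical name, and the sketch assumes the items are separated by single spaces (a run of spaces would produce empty `""` items).

```shell
#!/usr/bin/env bash
# quote_list is a hypothetical helper name used only for this sketch.
quote_list() {
  # 1st expression: wrap the whole line in quotes (& is the whole match).
  # 2nd expression: turn every space into '", "'.
  printf '%s\n' "$1" | sed 's/.*/"&"/; s/ /", "/g'
}

quote_list 'a/b/Test b/c/Test c/d/Test'
# prints: "a/b/Test", "b/c/Test", "c/d/Test"
```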

You may use awk:
s='a/b/Test b/c/Test c/d/Test'
awk -v OFS=', ' '{for (i=1; i<=NF; i++) $i = "\"" $i "\""} 1' <<< "$s"
"a/b/Test", "b/c/Test", "c/d/Test"

awk is easier:
awk -v OFS=", " -v q='"' '{for(i=1;i<=NF;i++)$i=q $i q}7'

If you have a single line of text, you can simply add the double quotes around the sed output:
test="a/b/Test b/c/Test c/d/Test"
test='"'$(echo "$test" | sed 's/[[:space:]]\{1,\}/",&"/g')'"'
echo "$test"
If you have multiple lines, use either of these (the first in POSIX BRE, the second with -E for ERE):
test=$(echo "$test" | sed 's/[[:space:]]\{1,\}/",&"/g; s/^/"/g; s/$/"/g')
test=$(echo "$test" | sed -E 's/[[:space:]]+/",&"/g; s/^|$/"/g')
The POSIX BRE pattern [[:space:]]\{1,\} (equivalent to [[:space:]]+ in POSIX ERE) matches one or more whitespace characters, and & in the replacement inserts the matched whitespace back into the result, so a single space becomes ", ".
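For comparison, a pure-bash sketch that needs neither sed nor awk: it splits the string into an array with `read -a` and lets `printf` repeat its format once per element. `join_quoted` is a hypothetical helper name, and the sketch assumes the elements themselves contain no whitespace.

```shell
#!/usr/bin/env bash
# join_quoted is a hypothetical helper name used only for this sketch.
join_quoted() {
  local -a items
  local out
  # read -a splits on the default IFS (spaces/tabs); -r keeps backslashes literal
  read -r -a items <<< "$1"
  # printf reuses its format for every array element
  printf -v out '"%s", ' "${items[@]}"
  # drop the trailing ", "
  printf '%s\n' "${out%, }"
}

join_quoted 'a/b/Test b/c/Test c/d/Test'
# prints: "a/b/Test", "b/c/Test", "c/d/Test"
```

Unlike the plain sed version, runs of spaces or tabs between items do not create empty elements, because `read -a` splits on the default IFS.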

Related

Getting the first part of a string that has two parts

I have a string that has two parts (path and owner) separated by a space.
This is the input file, input.txt:
/dir1/dir2/file1 #owner1
/dir1/dir2/foo\ bar #owner2
I want to extract all the paths to a separate output file, output.txt. I cannot use a space as the delimiter, since paths can also contain filenames with spaces and delimiter characters in them.
Expected output:
/dir1/dir2/file1
/dir1/dir2/foo\ bar
Here is a different way of doing it, with rev + GNU grep (-P enables \K, which drops everything matched before it from the output):
rev file | grep -oP '.*# \K.*' | rev
OR
rev file | grep -oP '.*#\s+\K.*' | rev
With the original, simpler approach, use " #" as the field separator:
awk -F' #' '{print $1}' Input_file
Assuming spaces that shouldn't be parsed as delimiters are escaped by a backslash as in your sample, you could use the following regex :
^(\\ |[^ ])*
For instance with grep :
grep -oE '^(\\ |[^ ])*'
The regex matches, from the start of the line, any number of either a backslash followed by a space or any character other than a space, and stops at the first space that isn't preceded by a backslash.
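A minimal sketch of the grep approach as a filter, fed with the two sample lines. `path_part` is a hypothetical name. POSIX ERE matching prefers the longest match, which is what lets it continue across an escaped `\ `.

```shell
#!/usr/bin/env bash
# path_part is a hypothetical helper name used only for this sketch.
path_part() {
  # (\\ |[^ ]) = "backslash followed by space" OR "any non-space character"
  grep -oE '^(\\ |[^ ])*'
}

printf '%s\n' '/dir1/dir2/file1 #owner1' '/dir1/dir2/foo\ bar #owner2' | path_part
# prints:
# /dir1/dir2/file1
# /dir1/dir2/foo\ bar
```

With the question's files this would be `path_part < input.txt > output.txt`.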
I would trim the ending part with sed.
sed 's/ [^ ]*$//' /path/to/file
The pattern matches the last space on the line and everything after it:
(blank) matches the space character
[^ ]* matches the longest string that contains no spaces, i.e. #owner1
$ matches the end of the line
The match is replaced by nothing, which deletes it.
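A small sketch of the sed approach as a filter. `strip_owner` is a hypothetical name. It assumes every line really ends in an owner field: on a line without one, the last space-free run of the path itself would be removed.

```shell
#!/usr/bin/env bash
# strip_owner is a hypothetical helper name used only for this sketch.
strip_owner() {
  # delete the last space and the space-free word after it (the owner)
  sed 's/ [^ ]*$//'
}

printf '%s\n' '/dir1/dir2/foo\ bar #owner2' | strip_owner
# prints: /dir1/dir2/foo\ bar
```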
A one-liner would do it:
while read p _; do printf '%q\n' "$p"; done <input.txt >output.txt
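Why the one-liner works: `read` is used without `-r`, so `\ ` is taken as an escaped space and the whole path lands in `p`, while `#owner` goes into the throw-away variable `_`. `printf '%q'` then quotes the space with a backslash again. A sketch of the same loop as a filter (`paths_from` is a hypothetical name):

```shell
#!/usr/bin/env bash
# paths_from is a hypothetical helper name used only for this sketch.
paths_from() {
  # no -r on purpose: "\ " is read as an escaped space, not a field break
  while read p _; do
    # %q re-escapes the space (and any other shell-special characters)
    printf '%q\n' "$p"
  done
}

printf '%s\n' '/dir1/dir2/file1 #owner1' '/dir1/dir2/foo\ bar #owner2' | paths_from
# prints:
# /dir1/dir2/file1
# /dir1/dir2/foo\ bar
```

Keep in mind that `%q` may also escape characters that were not escaped in the input, such as parentheses.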
You can put them in an array and process them:
mapfile -t test < input.txt; test=("${test[@]% *}")
echo "${test[@]}"
echo "${test[0]}"
echo "${test[1]}"
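You can try a simple awk (note that it leaves a trailing space and squeezes runs of spaces, because awk rebuilds the line with OFS):
awk ' { $NF=""; print } '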
Try it here https://ideone.com/W8J1ZO

Extract a string between quotes in a script

My text:
(
"en-US"
)
What I need:
en-US
Currently I'm able to get it by piping it through
... | tr -d '[:space:]' | sed s/'("'// | sed s/'("'// | sed s/'")'//
I wonder if there is a simple way to extract the string between the quotes rather than chopping off the useless parts one by one.
... | grep -oP '(?<=").*(?=")'
Explanation:
-o: Only output matching string
-P: Use Perl style RegEx
(?<="): Lookbehind, so only match text that is preceded by a double quote
.*: Match any characters
(?="): Lookahead, so only match text that is followed by a double quote
With sed
echo '(
"en-US"
)' | sed -rn 's/.*"(.*)".*/\1/p'
With two commands:
echo '(
"en-US"
)' | tr -d "\n" | cut -d '"' -f2
Could you please try the following, where var is a bash variable holding the sample text shown above.
echo "$var" | awk 'match($0,/".*"/){print substr($0,RSTART+1,RLENGTH-2)}'
Explanation (the same command, annotated; the comments are for explanation only):
echo "$var" | ##Use echo to print the variable var and pipe (|) its output into awk.
awk ' ##Start the awk program.
match($0,/".*"/){ ##match() finds the text from the first " to the last " on the line (.* is greedy) and sets RSTART and RLENGTH.
print substr($0,RSTART+1,RLENGTH-2) ##RSTART is where the match begins and RLENGTH is its length; starting 1 later and taking 2 fewer characters drops both quotes.
}' ##Close the awk program.
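A note on the design: `".*"` is greedy, so on a line with two quoted values it spans from the first quote to the last. A sketch using `"[^"]*"` instead, which stops at the next quote and returns only the first value (`first_quoted` is a hypothetical name):

```shell
#!/usr/bin/env bash
# first_quoted is a hypothetical helper name used only for this sketch.
first_quoted() {
  # [^"]* cannot run past a quote, so only the first quoted value matches
  awk 'match($0, /"[^"]*"/) { print substr($0, RSTART + 1, RLENGTH - 2) }'
}

printf '%s\n' 'say "en-US" or "de-DE"' | first_quoted
# prints: en-US
```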
Consider using
... | grep -o '"[^"]\{1,\}"' | sed -e 's/^"//' -e 's/"$//'
grep extracts every non-empty quoted substring (quotes included), and sed then removes the quotes at both ends.
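The same pipeline as a reusable filter, in a minimal sketch. `between_quotes` is a hypothetical name. Each match is printed on its own line, so several quoted values on one line come out as several lines.

```shell
#!/usr/bin/env bash
# between_quotes is a hypothetical helper name used only for this sketch.
between_quotes() {
  # grep -o emits each "..." match on its own line; sed strips the quotes
  grep -o '"[^"]\{1,\}"' | sed -e 's/^"//' -e 's/"$//'
}

printf '%s\n' '(' '"en-US"' ')' | between_quotes
# prints: en-US
```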
And this one?
... | grep '"' | cut -d '"' -f 2
It works if you have just one quoted value per line (with more, only the first is printed).

Repeatedly replace a delimiter at a given count (4) with another character

Given this line:
12,34,56,47,56,34,56,78,90,12,12,34,45
If the count of commas (,) is greater than four, replace the 4th comma with ||.
If the count is less than or equal to 4, there is no need to replace any comma.
I am able to find the count by the following awk:
awk -F\, '{print NF-1}' text.txt
Then I used an if condition to check whether the result is greater than 4, but I am unable to replace the 4th comma with ||.
In short: find the count of the delimiter in a line and replace the delimiter at a particular position with another character.
Update:
I want to replace comma with || symbol after every 4th occurrence of the comma. Sorry for the confusion.
Expected output:
12,34,56,47||56,34,56,78||90,12,12,34||45
With GNU awk for gensub():
$ echo '12,34,56,47,56,34' | awk -F, 'NF>5{$0=gensub(/,/,"||",4)}1'
12,34,56,47||56,34
$ echo '12,34,56,47,56' | awk -F, 'NF>5{$0=gensub(/,/,"||",4)}1'
12,34,56,47,56
$ echo 12,34,56,47,56,34,56,78,90,12,12,34,45 | sed 's/,/||/4'
12,34,56,47||56,34,56,78,90,12,12,34,45
$ echo 12,34,56,47 | sed 's/,/||/4'
12,34,56,47
This should work with any POSIX sed.
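One caveat: `s/,/||/4` also fires on a line with exactly four commas (`echo 12,34,56,47,56 | sed 's/,/||/4'` prints `12,34,56,47||56`), which the original question wanted left alone. A minimal sketch that guards the substitution with a line address, in POSIX BRE; `replace_4th` is a hypothetical name:

```shell
#!/usr/bin/env bash
# replace_4th is a hypothetical helper name used only for this sketch.
replace_4th() {
  # Address: \(,[^,]*\)\{5\} matches only lines with at least 5 commas.
  # Command: s/,/||/4 then replaces the 4th comma on those lines.
  sed '/\(,[^,]*\)\{5\}/s/,/||/4'
}

echo 12,34,56,47,56,34 | replace_4th   # prints: 12,34,56,47||56,34
echo 12,34,56,47,56 | replace_4th      # prints: 12,34,56,47,56
```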
Update:
For the updated question you can use
$ echo 12,34,56,47,56,34,56,78,90,12,12,34,45 | sed -e 's/\(\([^,]*,\)\{3\}[^,]*\),/\1||/g'
12,34,56,47||56,34,56,78||90,12,12,34||45
Unfortunately, POSIX leaves the result unspecified when the s command is given both a number and g as flags. GNU sed allows the combination (s/,/||/4g replaces the 4th comma and every one after it), but that is not what we want here. So you have to spell out the grouping in the regular expression.
Using awk you can do:
s='12,34,56,47,56,34,56,78,90,12,12,34,45'
awk -F, '{for (i=1; i<NF; i++) printf "%s%s", $i, (i%4?FS:"||"); print $i}' <<< "$s"
12,34,56,47||56,34,56,78||90,12,12,34||45
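The awk loop generalizes naturally to "every n-th delimiter". A sketch under that assumption, with `n` passed in via `-v`; `every_nth` is a hypothetical name, and the delimiter is still hard-coded as a comma:

```shell
#!/usr/bin/env bash
# every_nth is a hypothetical helper name used only for this sketch.
# Usage: every_nth N   (reads lines on stdin)
every_nth() {
  awk -F, -v n="$1" '{
    out = ""
    for (i = 1; i < NF; i++) {
      sep = (i % n) ? FS : "||"   # every n-th separator becomes ||
      out = out $i sep
    }
    print out $NF                 # the last field has no separator after it
  }'
}

echo 12,34,56,47,56,34,56,78,90,12,12,34,45 | every_nth 4
# prints: 12,34,56,47||56,34,56,78||90,12,12,34||45
```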
The question says: "if the count is greater than four, I want to replace the 4th comma (,) with ||".
Give this line a try (GNU sed):
sed -r '/([^,]*,){4}.*,/s/,/||/4' file
test:
kent$ echo ",,,,,"|sed -r '/([^,]*,){4}.*,/s/,/||/4'
,,,||,
kent$ echo ",,,,"|sed -r '/([^,]*,){4}.*,/s/,/||/4'
,,,,
kent$ echo ",,,"|sed -r '/([^,]*,){4}.*,/s/,/||/4'
,,,
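With awk (this replaces only the 4th comma, as originally asked, and prints other lines unchanged):
awk -F, 'NF-1>4{k=""; for(i=1;i<NF;i++){if(i==4)k=k$i"||";else k=k$i","} print k$NF; next} 1' filename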

How to retrieve digits including the separator "."

I am using grep to get a string like this: ANS_LENGTH=266.50 then I use sed to only get the digits: 266.50
This is my full command: grep --text 'ANS_LENGTH=' log.txt | sed -e 's/[^[[:digit:]]]*//g'
The result is : 26650
How can this line be changed so the result still shows the separator: 266.50
You don't need grep if you are going to use sed. Just use a sed /regex/ address to select the lines you need:
sed -n '/ANS_LENGTH/s/[^=]*=\(.*\)/\1/p' log.txt
-n suppresses sed's automatic printing of every line.
The captured group holds the value after the = sign.
The p flag at the end prints only the lines on which the substitution succeeded, i.e. those matching /ANS_LENGTH/.
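A minimal sketch of the sed approach as a filter (`ans_length` is a hypothetical name). It assumes the value runs to the end of the line: on a line such as `some data ANS_LENGTH=266.50 other=22`, it would print `266.50 other=22`.

```shell
#!/usr/bin/env bash
# ans_length is a hypothetical helper name used only for this sketch.
ans_length() {
  # -n: no automatic printing; p prints only lines where s/// succeeded
  sed -n '/ANS_LENGTH/s/[^=]*=\(.*\)/\1/p'
}

printf '%s\n' 'ANS_LENGTH=266.50' 'OTHER=1' | ans_length
# prints: 266.50
```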
If your grep supports the -P option, you can do:
grep -oP '(?<=ANS_LENGTH=).*' log.txt
(?<=...) is a look-behind assertion: it requires ANS_LENGTH= before the match without including it in the output. This requires the -P option.
-o prints only the matched value.
You need to keep the dot as well as the digits:
sed -e 's/[^[:digit:].]//g'
Inside a bracket expression the dot is already literal, so it needs no backslash.
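A quick sketch of the corrected bracket expression (`digits_and_dots` is a hypothetical name). Note that it keeps every digit and dot on the line, so it is only safe when the line holds nothing but the key and its value; a capture-group approach is more targeted.

```shell
#!/usr/bin/env bash
# digits_and_dots is a hypothetical helper name used only for this sketch.
digits_and_dots() {
  # [^[:digit:].] = any character that is not a digit or a dot
  sed 's/[^[:digit:].]//g'
}

printf '%s\n' 'ANS_LENGTH=266.50' | digits_and_dots
# prints: 266.50
```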
Here is some awk example:
$ cat file
some data ANS_LENGTH=266.50 other=22
not mye data=43
GNU awk (because RS is a regular expression here):
awk '/ANS_LENGTH/ {f=NR} f&&NR-1==f' RS="[ =]" file
266.50
awk '/ANS_LENGTH/ {getline;print}' RS="[ =]" file
266.50
Plain awk:
awk -F"[ =]" '{for(i=1;i<=NF;i++) if ($i=="ANS_LENGTH") print $(i+1)}' file
266.50
awk '{for(i=1;i<=NF;i++) if ($i~"ANS_LENGTH") {split($i,a,"=");print a[2]}}' file
266.50

Extract string from brackets

I'm pretty new at bash, so this is a pretty noob question.
Suppose I have a string:
string1 [string2] string3 string4
I would like to extract string2 from the square brackets; but the brackets may be surrounding any other string at any other time.
How would I use sed, etc, to do this? Thanks!
Try this:
echo "$str" | cut -d "[" -f2 | cut -d "]" -f1
Here's one way using awk:
echo "string1 [string2] string3 string4" | awk -F'[][]' '{print $2}'
This sed option also works:
echo "string1 [string2] string3 string4" | sed 's/.*\[\([^]]*\)\].*/\1/g'
Here's a breakdown of the sed command:
s/ <-- this means it should perform a substitution
.* <-- this means match zero or more characters
\[ <-- this means match a literal [ character
\( <-- this starts saving the pattern for later use
[^]]* <-- this means match any character that is not a ] character
the outer [ and ] signify that this is a character class
having the ^ character as the first character in the class means "not"
\) <-- this closes the saving of the pattern match for later use
\] <-- this means match a literal ] character
.* <-- this means match zero or more characters
/\1 <-- this means replace everything matched with the first saved pattern
(the match between "\(" and "\)" )
/g <-- this means the substitution is global (all occurrences on the line); here it changes nothing, because the pattern already spans the whole line
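A sketch wrapping the command in a function (`in_brackets` is a hypothetical name). Because the leading `.*` is greedy, when a line contains several bracketed values it is the last one that gets captured; lines without brackets pass through unchanged.

```shell
#!/usr/bin/env bash
# in_brackets is a hypothetical helper name used only for this sketch.
in_brackets() {
  # greedy .* runs up to the LAST "[" that still allows a match
  sed 's/.*\[\([^]]*\)\].*/\1/'
}

echo 'string1 [string2] string3 string4' | in_brackets   # prints: string2
echo 'a [x] b [y] c' | in_brackets                       # prints: y
```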
In pure bash:
STR="string1 [string2] string3 string4"
STR=${STR#*\[}
STR=${STR%\]*}
echo "$STR"
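A pure-bash sketch that returns the first bracketed value even when there are several. `#` removes the shortest prefix up to the first `[`, and `%%` removes the longest suffix that starts with `]`, i.e. everything from the first remaining `]` on. `first_in_brackets` is a hypothetical name; with no brackets at all, the input comes back unchanged.

```shell
#!/usr/bin/env bash
# first_in_brackets is a hypothetical helper name used only for this sketch.
first_in_brackets() {
  local s=$1
  s=${s#*\[}                 # drop everything through the first "["
  printf '%s\n' "${s%%\]*}"  # drop everything from the first "]" on
}

first_in_brackets 'string1 [string2] string3 string4'   # prints: string2
first_in_brackets 'a [x] b [y] c'                        # prints: x
```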
Specify awk multiple delimiters with -F '[delimiters]'
If the delimiters are square brackets, put them back to back like this ][
awk -F '[][]' '{print $2}'
otherwise you will have to escape them
awk -F '[\\[\\]]' '{print $2}'
Other examples to get the value between the brackets:
echo "string1 (string2) string3" | awk -F '[()]' '{print $2}'
echo "string1 {string2} string3" | awk -F '[{}]' '{print $2}'
Here's another one, which also handles multiple occurrences, e.g.:
$ echo "string1 [string2] string3 [string4 string5]" | awk -vRS="]" -vFS="[" '{print $2}'
string2
string4 string5
The logic is simple: split on "]", keep the pieces that contain a "[", then split each of those on "[" and take the last part. In Python:
for item in "string1 [string2] string3 [string4 string5]".split("]"):
    if "[" in item:
        print(item.split("[")[-1])
Here is an awk example, but matching on parentheses, which also makes it more obvious how -F works.
echo 'test (lskdjf)' | awk -F'[()]' '{print $2}'
Another awk:
$ echo "string1 [string2] string3 [string4]" |
awk -v RS=[ -v FS=] 'NR>1{print $1}'
string2
string4
Reading a file in which the delimiters are square brackets:
$ cat file
123;abc[202];124
125;abc[203];124
127;abc[204];124
To print the value present within the brackets:
$ awk -F '[][]' '{print $2}' file
202
203
204
At first sight, the delimiter used in the above command might be confusing, but it's simple. Two delimiters are used in this case: [ and ]. Since the delimiters are themselves square brackets placed inside the square brackets of a bracket expression, it looks tricky at first.
Note: if square brackets are the delimiters, they must be written in this order, ] first and then [. Written as -F '[[]]', the expression becomes a bracket expression containing only [ followed by a literal ], so awk would split only on the two-character sequence [].
Refer this link: http://www.theunixschool.com/2012/07/awk-10-examples-to-read-files-with.html
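Because `[][]` makes every `[` and `]` a field separator, the bracketed values always land in the even-numbered fields. A sketch that prints all of them rather than only `$2`, assuming the brackets are balanced and not nested (`all_in_brackets` is a hypothetical name):

```shell
#!/usr/bin/env bash
# all_in_brackets is a hypothetical helper name used only for this sketch.
all_in_brackets() {
  # fields: outside, inside, outside, inside, ... so inside = $2, $4, ...
  awk -F '[][]' '{ for (i = 2; i <= NF; i += 2) print $i }'
}

printf '%s\n' '123;abc[202];124' 'x [a] y [b c] z' | all_in_brackets
# prints 202, a and "b c", one per line
```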
An inline solution could be:
a="first \"Foo1\" and second \"Foo2\""
echo ${a#*\"} | { read b; echo ${b%%\"*}; }
You can test in single line:
a="first \"Foo1\" and second \"Foo2\""; echo ${a#*\"} | { read b; echo ${b%%\"*}; }
Output: Foo1
Example with brackets:
a="first [Foo1] and second [Foo2]"
echo "${a#*[}" | { read -r b; echo "${b%%]*}"; }
That in one line:
a="first [Foo1] and second [Foo2]"; echo "${a#*[}" | { read -r b; echo "${b%%]*}"; }
Output: Foo1
