Bash - remove specific textblock from file - bash

I want to remove a specific block of text from a file. I want to find the start of the text block to remove, and remove everything until a specific pattern is found.
Example string to search in:
\n---\n# Source: app/templates/deployment.yaml\n# template file\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: component and then follow many more characters with various special characters -- / ending with another \n---\n that I dont want to remove
I want to remove everything, starting from this string match \n---\n# Source: app/templates/deployment.yaml\n# template file\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: component
So basically, find pattern \n---\n# Source: app/templates/deployment.yaml\n# template file\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: component and remove everything until I match the next \n---\n
Expected output here would be:
\n---\n that I dont want to remove
Things I tried with sed:
sed 's/\n---\n# Source: app/templates/deployment.yaml\n# template file\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: component.*\n---\n//g'
Things I tried with grep:
echo $string | grep -Ewo "\\\n---\\\n# Source: app/templates/deployment.yaml\\\n# template file\napiVersion: apps/v1\\\nkind: Deployment\nmetadata:\\\n name: component"
Nothing really works. Is there any bash wizard that can help?

Using literal strings to avoid having to escape any characters and assuming your target string only exists once in the input:
$ cat tst.sh
#!/usr/bin/env bash
awk '
BEGIN {
begStr = ARGV[1]
endStr = ARGV[2]
ARGV[1] = ARGV[2] = ""
begLgth = length(begStr)
}
begPos = index($0,begStr) {
tail = substr($0,begPos+begLgth)
endPos = begPos + begLgth + index(tail,endStr) - 1
print substr($0,1,begPos-1) substr($0,endPos)
}
' \
'\n---\n# Source: app/templates/deployment.yaml\n# template file\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: component' \
'\n---\n' \
"${@:--}"
$ ./tst.sh file
\n---\n that I dont want to remove
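If the start string can occur more than once on a line, the same index()/substr() idea can be run in a loop. This is only a sketch and not part of the answer above: the function name remove_blocks and the environment variables BEG_STR and END_STR are made up for illustration. The strings are passed through the environment rather than with -v, because awk would interpret the \n sequences in a -v assignment as escapes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove_blocks BEGIN_STRING END_STRING
# Reads stdin; on each line removes every span that starts at BEGIN_STRING
# and runs up to (but not including) the next END_STRING.
remove_blocks() {
    BEG_STR=$1 END_STR=$2 awk '
    BEGIN {
        begStr = ENVIRON["BEG_STR"]      # literal strings, no regex involved
        endStr = ENVIRON["END_STR"]
    }
    {
        out  = ""
        rest = $0
        while ((begPos = index(rest, begStr)) > 0) {
            tail   = substr(rest, begPos + length(begStr))
            endPos = index(tail, endStr)
            if (endPos == 0) break       # no closing delimiter: keep the rest as is
            out  = out substr(rest, 1, begPos - 1)
            rest = substr(tail, endPos)  # the closing delimiter is kept
        }
        print out rest
    }'
}

printf '%s\n' 'a\n---\nX1 foo\n---\nb\n---\nX1 bar\n---\nc' |
    remove_blocks '\n---\nX1' '\n---\n'
# prints: a\n---\nb\n---\nc
```

A line without the start string, or with a start string that is never closed, is printed unchanged.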

With your shown samples, please try the following awk code. It matches lines containing the string \\n---\\n# Source: app\/templates\/deployment.yaml\\n# template file\\napiVersion: apps\/v1\\nkind: Deployment\\nmetadata:\\n name: component, uses \\\\n---\\\\n followed by a space as the field separator, and prints the last field of each matching line, prefixed with that separator.
awk -v OFS="\\\\n---\\\\n " -F'\\\\n---\\\\n ' '
/\\n---\\n# Source: \
app\/templates\/deployment.yaml\\n# template \
file\\napiVersion: apps\/v1\\nkind: Deployment\
\\nmetadata:\\n name: component/{
print OFS $NF
}
' Input_file
Output will be as follows:
\n---\n that I dont want to remove

You need to escape the backslashes in the regexp to match them literally.
If the part between \\n---\\n123456789 and \\n---\\n can't contain another -, you can use
sed 's/\\n---\\n123456789[^-]*\\n---\\n//g'
This assumption is needed because sed doesn't support non-greedy quantifiers, and .* will match until the last \\n---\\n, not the next one.
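The difference between the greedy .* and the bounded [^-]* can be seen on a small input. A minimal sketch; the sample string and the function names greedy and bounded are invented for illustration:

```shell
# Sample input: literal backslash-n sequences, two delimiters after the block.
s='keep1\n---\n123456789 drop\n---\nkeep2\n---\nkeep3'

# .* runs to the LAST \n---\n on the line.
greedy()  { sed 's/\\n---\\n123456789.*\\n---\\n//g'; }

# [^-]* cannot cross a dash, so it stops at the NEXT \n---\n.
bounded() { sed 's/\\n---\\n123456789[^-]*\\n---\\n//g'; }

printf '%s\n' "$s" | greedy     # prints: keep1keep3
printf '%s\n' "$s" | bounded    # prints: keep1keep2\n---\nkeep3
```

Note that both commands also remove the closing \n---\n; to keep it, put it back in the replacement part.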

So basically, find pattern \n---\n123456789 and remove everything until I match the next \n---\n
Using gnu-awk it might be simpler by making \n---\n a record separator (a non-regex approach):
s='aaa aaa\n---\n123456789 hha faewb\n---\naaaaaa\n---\n67891 0238\n---\nbbbf bb'
awk -v RS='\\\\n---\\\\n' '$1 != 123456789 {ORS=RT; print}' <<< "$s"
aaa aaa\n---\naaaaaa\n---\n67891 0238\n---\nbbbf bb

This might work for you (GNU sed):
sed 'N;/\n---$/!{P;D};:a;N;//!ba
s~\n---\n# Source: app/templates/deployment.yaml\n# template file\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: component.*~\n---~' file
Open a two line window and if the second line in the window does not match \n--- print/delete the first of the lines and repeat.
If the second line matches \n---, gather up any following lines until another match is made and if the subsequent lines also match the required string, delete all lines up until the second match.
Otherwise print lines as normal.
N.B. This does not cater for two such matches in a row.

Related

Unix sed command - global replacement is not working

I have a scenario where we want to collapse repeated double quotes inside the data into a single double quote. Because the input is comma-delimited and every column is enclosed in double quotes, empty columns ("") get in the way, as explained below:
The sample data looks like this:
"int","","123","abd"""sf123","top"
So, the output would be:
"int","","123","abd"sf123","top"
I tried the approach below, but only the first occurrence is handled correctly and I am not sure why:
sed -ie 's/,"",/,"NULL",/g;s/""/"/g;s/,"NULL",/,"",/g' inputfile.txt
replacing all ---> from ,"", to ,"NULL",
replacing all multiple occurrences of ---> from """ or "" or """" to " (single occurrence)
replacing 1 step changes back to original ---> from ,"NULL", to ,"",
But, only first occurrence is getting changed and remaining looks same as below:
If input is :
"int","","","123","abd"""sf123","top"
the output is coming as:
"int","","NULL","123","abd"sf123","top"
But, the output should be:
"int","","","123","abd"sf123","top"
You may try this perl with a lookahead:
perl -pe 's/("")+(?=")//g' file
"int","","123","abd"sf123","top"
"int","","","123","abd"sf123","top"
"123"abcs"
Where input is:
cat file
"int","","123","abd"""sf123","top"
"int","","","123","abd"""sf123","top"
"123"""""abcs"
Breakup:
("")+: Match 1+ pairs of double quotes
(?="): If those pairs are followed by a single "
Using sed
$ sed -E 's/(,"",)?"+(",)?/\1"\2/g' input_file
"int","","123","abd"sf123","top"
"int","","NULL","123","abd"sf123","top"
"int","","","123","abd"sf123","top"
In awk with your shown samples please try following awk code. Written and tested in GNU awk, should work in any version of awk.
awk '
BEGIN{ FS=OFS="," }
{
for(i=1;i<=NF;i++){
if($i!~/^""$/){
gsub(/"+/,"\"",$i)
}
}
}
1
' Input_file
Explanation: set the field separator and the output field separator to , for all lines of Input_file. Then loop over each field of the line; if a field is not an empty quoted string (""), globally replace every run of one or more " with a single ". Finally, print the line.
With sed you could repeat 1 or more times sets of "" using a group followed by matching a single "
Then in the replacement use a single "
sed -E 's/("")+"/"/g' file
For this content
$ cat file
"int","","123","abd"""sf123","top"
"int","","","123","abd"""sf123","top"
"123"""""abcs"
The output is
"int","","123","abd"sf123","top"
"int","","","123","abd"sf123","top"
"123"abcs"
sed s'#"""#"#' file
That works. I will demonstrate another method though, which you may also find useful in other situations.
#!/bin/sh -x
cat > ed1 <<EOF
3s/"""/"/
wq
EOF
cp file stack
cat stack | tr ',' '\n' > f2
ed -s f2 < ed1
cat f2 | tr '\n' ',' > stack
rm -v ./f2
rm -v ./ed1
The point of this is that if you have a big csv record all on one line, and you want to edit a specific field, then if you know the field number, you can convert all the commas to newlines, and use the field number as a line number to either substitute, append after it, or insert before it with ed; and then convert back to csv.
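The same field-number-as-line-number idea can be sketched as a single pipeline without temporary files. This swaps ed for sed and rejoins the lines with paste; the helper name edit_field is hypothetical, and the sketch assumes one record on one line with no commas inside fields:

```shell
# Hypothetical helper: edit_field FIELD_NUMBER SED_COMMAND
# Splits the record on commas (one field per line), applies SED_COMMAND
# only to line FIELD_NUMBER, then joins the lines back with commas.
edit_field() {
    tr ',' '\n' | sed "$1$2" | paste -sd, -
}

printf '%s\n' '"int","","123","abd"""sf123","top"' |
    edit_field 4 's/"""/"/'
# prints: "int","","123","abd"sf123","top"
```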

sed replace string with pipe and stars

I have the following string:
|**barak**.version|2001.0132012031539|
in file text.txt.
I would like to replace it with the following:
|**barak**.version|2001.01.2012031541|
So I run:
sed -i "s/\|\*\*$module\*\*.version\|2001.0132012031539/|**$module**.version|$version/" text.txt
but the result is a duplicate instead of replacing:
|**barak**.version|2001.01.2012031541|**barak**.version|2001.0132012031539|
What am I doing wrong?
Here is the value for module and version:
$ echo $module
barak
$ echo $version
2001.01.2012031541
Assumptions:
lines of interest start and end with a pipe (|) and have one more pipe somewhere in the middle of the data
search is based solely on the value of ${module} existing between the 1st/2nd pipes in the data
we don't know what else may be between the 1st/2nd pipes
the version number is the only thing between the 2nd/3rd pipes
we don't know the version number that we'll be replacing
Sample data:
$ module='barak'
$ version='2001.01.2012031541'
$ cat text.txt
**barak**.version|2001.0132012031539| <<<=== leave this one alone
|**apple**.version|2001.0132012031539|
|**barak**.version|2001.0132012031539| <<<=== replace this one
|**chuck**.version|2001.0132012031539|
|**barak**.peanuts|2001.0132012031539| <<<=== replace this one
One sed solution with -Extended regex support enabled and making use of a capture group:
$ sed -E "s/^(\|[^|]*${module}[^|]*).*/\1|${version}|/" text.txt
Where:
\| - escaped pipe; with -E an unescaped | means alternation, so the literal pipe at the start of the pattern has to be escaped; the pipes inside the bracket expressions ([^|]) and in the replacement are already literal
^(\|[^|]*${module}[^|]*) - first capture group that starts at the beginning of the line, starts with a pipe, then some number of non-pipe characters, then the search pattern (${module}), then more non-pipe characters (continues up to next pipe character)
.* - matches rest of the line (which we're going to discard)
\1|${version}| - replace line with our first capture group, then a pipe, then the new replacement value (${version}), then the final pipe
The above generates:
**barak**.version|2001.0132012031539|
|**apple**.version|2001.0132012031539|
|**barak**.version|2001.01.2012031541| <<<=== replaced
|**chuck**.version|2001.0132012031539|
|**barak**.peanuts|2001.01.2012031541| <<<=== replaced
An awk alternative using GNU awk:
awk -v mod="$module" -v vers="$version" -F \| '{ OFS=FS;split($2,map,".");inmod=substr(map[1],3,length(map[1])-4);if (inmod==mod) { $3=vers } }1' file
Pass two variables mod and vers to awk using $module and $version. Set the field delimiter to |. Split the second field into array map using the split function and using . as the delimiter. Then strip the leading and ending "**" from the first index of the array to expose the module name as inmod using the substr function. Compare this to the mod variable and if there is a match, change the 3rd delimited field to the variable vers. Print the lines with short hand 1
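In basic regular expressions an unescaped pipe is a literal character; it is special only in extended regular expressions (sed -E). In GNU sed's basic syntax, \| means alternation, so the pattern in the question starts with an empty alternative that matches at the beginning of the line. That is why the replacement is inserted in front of the original text instead of replacing it.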
There's no reason why you need extended here, stick with basic regex:
sed "
# for lines matching module.version
/|\*\*$module\*\*.version|/ {
# replace the version
s/|2001.0132012031539|/|$version|/
}
" text.txt
or as an unreadable one-liner
sed "/|\*\*$module\*\*.version|/ s/|2001.0132012031539|/|$version|/" text.txt
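A variant that does not hard-code the old version number, still in basic regex. This is a sketch only: the function name set_version is made up, and it assumes the module name and the new version contain no /, & or regex metacharacters:

```shell
module='barak'
version='2001.01.2012031541'

# Hypothetical helper: set_version MODULE NEW_VERSION
# In basic regex the unescaped pipes are literal; [^|]* matches whatever
# version currently sits between the second and third pipe.
set_version() {
    sed "s/^|\*\*$1\*\*\.version|[^|]*|/|**$1**.version|$2|/"
}

printf '%s\n' \
    '|**apple**.version|2001.0132012031539|' \
    '|**barak**.version|2001.0132012031539|' |
    set_version "$module" "$version"
# prints:
# |**apple**.version|2001.0132012031539|
# |**barak**.version|2001.01.2012031541|
```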

bash script: how to insert text between two specific characters

For example, I have a file containing a line as below:
"abc":"def"
I need to insert 123 between "abc":" and def" so that the line will become: "abc":"123def".
As "abc" appears only once so I think I can just search it and do the insertion.
How to do this with bash script such as sed or awk?
AMD$ sed 's/"abc":"/&123/' File
"abc":"123def"
Match "abc":", then append this match with 123 (& will contain the matched string "abc":")
If you want to take care of space before and after :, you can use:
sed 's/"abc" *: *"/&123/'
For replacing all such patterns, use g with sed.
sed 's/"abc" *: *"/&123/g' File
sed:
$ sed -E 's/(:")(.*)/\1123\2/' <<<'"abc":"def"'
"abc":"123def"
(:") gets :" and put in captured group 1
(.*) gets the remaining portion and put in captured group 2
in the replacement, \1123\2 puts 123 between the groups
awk:
$ awk -F: 'sub(".", "&123", $2)' <<<'"abc":"def"'
"abc" "123def"
In the sub() function, the second field ($2) is being operated on; the pattern . matches the first character of that field (here the opening "), and in the replacement the matched portion (&) is followed by 123.
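The awk output above loses the colon because modifying $2 rebuilds the record with the default output separator, a space. A small sketch that sets OFS to the same value as FS to keep it; the function name insert_123 is only for illustration:

```shell
# Hypothetical helper: insert 123 after the opening quote of the 2nd field,
# keeping ":" as the separator when awk rebuilds the record.
insert_123() {
    awk 'BEGIN { FS = OFS = ":" } { sub(/"/, "&123", $2) } 1'
}

printf '%s\n' '"abc":"def"' | insert_123
# prints: "abc":"123def"
```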
echo '"abc":"def"'| awk '{sub(/def/,"123def")}1'
"abc":"123def"

Replace first match after a match with sed

What I want to achieve is to find a string in a file, then find the first occurrence of another pattern after it and replace that with some value.
Ex: string = {Name: name;Address:someadd;var1:var1;var2:var2},{Name: differntName;Address:someadd;var1:var1;var2:var2}
Now what I need to do is find Name: name, then find the first occurrence of "var2:var2" after "Name: name" and replace it with "var2:newvarvalue".
Please note that I need to complete this task with sed in a bash script.
Thanks in advance.
Edit : i am trying to modify .yaml docker compose file
Here is a terrible solution using sed :
split on ,, one part per line :
sed 's/,/\n/g'
replace var2:var2 by var2:newvarvalue if on the same line as Name: name :
sed '/Name: name/s/var2:var2/var2:newvarvalue/'
or
sed -E 's/(Name: name.*)var2:var2/\1var2:newvarvalue/'
It's terrible because any extra comma or linefeed might break the whole thing.
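The steps above can be chained into one pipeline that also joins the parts back together. A sketch under the same fragile assumptions (no commas or linefeeds inside an object); it uses tr for the split, since a \n in a sed replacement is not portable across sed implementations, and the function name replace_var2 is hypothetical:

```shell
# Hypothetical helper: split on commas, edit only the part that contains
# "Name: name", then join the parts back with commas.
replace_var2() {
    tr ',' '\n' |
        sed '/Name: name/s/var2:var2/var2:newvarvalue/' |
        paste -sd, -
}

printf '%s\n' '{Name: name;Address:someadd;var1:var1;var2:var2},{Name: differntName;Address:someadd;var1:var1;var2:var2}' |
    replace_var2
# prints: {Name: name;Address:someadd;var1:var1;var2:newvarvalue},{Name: differntName;Address:someadd;var1:var1;var2:var2}
```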
var='{Name: name;Address:somea,dd;var1:var1;var2:var2},{Name: differntName;Address:someadd;var1:var1;var2:var2}'
NEW_VAL='new_value'
IFS=$'\n'
OBJECTS=("$(echo "${var}" | sed -nE 's/[^{]*(\{[^}]+\})/\1\n/gp')")
for obj in "${OBJECTS[@]}"; do
echo "${obj}" | sed -E 's/(.*var2:)(var2)(.*)/\1'"${NEW_VAL}"'\3/'
done
Output:
{Name: name;Address:somea,dd;var1:var1;var2:new_value}
{Name: differntName;Address:someadd;var1:var1;var2:new_value}
This solution accounts for a comma in the object (as exemplified in the first object) by extracting each and setting the delimiter to a newline.

Need regex to remove a character in a datetime string in csv file

I have a csv file which has the following string:
"2016-10-25T14:07:49.298-07:00"
which I would like to replace with:
"2016-10-25", "14:07:49"
I matched the original string with a regular expression:
([0-9]{4}-[0-9]{2}-[0-9]{2})[T]([0-9]{2}\:[0-9]{2}\:[0-9]{2})\.[0-9]{3}-07\:00
but I need some help
With awk, assuming T and . are unique
$ echo '"2016-10-25T14:07:49.298-07:00"' | awk -F'[T.]' '{print $1 "\", \"" $2 "\""}'
"2016-10-25", "14:07:49"
-F'[T.]' assign T or . as field separator
Then print first and second field with required formatting
With sed:
sed -E 's/^([^T]+)T([^.]+).*/\1", "\2"/'
^([^T]+) matches the portion up to T, and puts it in captured group 1
T matches T literally
([^.]+) matches up to the next ., and puts it in captured group 2
.* matches the rest
in the replacement, the captured groups are used with proper formatting to get desired output, \1", "\2"
Example:
$ sed -E 's/^([^T]+)T([^.]+).*/\1", "\2"/' <<<'"2016-10-25T14:07:49.298-07:00"'
"2016-10-25", "14:07:49"

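When the timestamp is one quoted field among others on a CSV line, a stricter pattern in the spirit of the regular expression from the question can be anchored on the quotes instead of the start of the line. A sketch only: the function name split_timestamps and the sample line are invented, and accepting any +hh:mm or -hh:mm offset is an assumption beyond the -07:00 shown in the question:

```shell
# Hypothetical helper: rewrite every quoted ISO timestamp on a line into
# two quoted fields, date and time, dropping milliseconds and UTC offset.
split_timestamps() {
    sed -E 's/"([0-9]{4}-[0-9]{2}-[0-9]{2})T([0-9]{2}:[0-9]{2}:[0-9]{2})\.[0-9]{3}[+-][0-9]{2}:[0-9]{2}"/"\1", "\2"/g'
}

printf '%s\n' 'id1,"2016-10-25T14:07:49.298-07:00",ok' | split_timestamps
# prints: id1,"2016-10-25", "14:07:49",ok
```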
Resources