Sed conditional match and execute command with offset - bash

I am looking for a bash command that does a conditional replacement with a variable offset. The existing posts that I've found cover conditional replacement without an offset or with a fixed offset.
Task: If uid contains 8964, then insert the line FORBIDDEN before DOB.
Each TXT file below represents one user, and it contains (in the following order)
some property(ies)
unique uid
some quality(ies)
unique DOB
a random lorem ipsum
I hope I can transform the following files
# file1.txt (uid doesn't match 8964)
admin: false
uid: 123456
happy
movie
DOB: 6543-02-10
lorem ipsum
seo varis lireccuni paccem noba sako
# file2.txt (uid matches 8964)
citizen: true
hasSEAcct: true
uid: 289641
joyful hearty
final debug Juno XYus
magazine
DOB: 1234-05-06
saadi torem lopez dupont
into
# file1.txt (uid doesn't match 8964)
admin: false
uid: 123456
happy
movie
DOB: 6543-02-10
lorem ipsum
seo varis lireccuni paccem noba sako
# file2.txt (uid matches 8964)
citizen: true
hasSEAcct: true
uid: 289641
joyful hearty
final debug Juno XYus
magazine
FORBIDDEN
DOB: 1234-05-06
saadi torem lopez dupont
My try:
If uid contains 8964, then do a 2nd match with DOB, and insert FORBIDDEN above DOB.
sed '/^uid: [0-9]*8964[0-9]*$/{n;/^DOB: .*$/{iFORBIDDEN}}' file*.txt
This gives me an unmatched { error.
sed: -e expression #1, char 0: unmatched `{'
I know that sed '/PAT/{n;p}' will execute {n;p} if PAT is matched, but it seems impossible to put /PAT2/{iTEXT} inside /PAT/{ }.
How can I perform such FORBIDDEN insertion?

$ awk '
/^uid/ && /8964/ {f=1} #1
/^DOB/ && f {print "FORBIDDEN"; f=0} #2
1 #3
' file
If a line starting with "uid" matches "8964", set flag
If a line starts with "DOB" and flag is set, print string and unset flag
print every line
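To try the flag approach on several files at once, here is a minimal sketch. The temporary directory, the variable names, and the extra FNR == 1 rule are my own additions, not part of the answer above; the reset only matters if a file with a matching uid has no DOB line, since otherwise the flag would carry over into the next file.

```shell
# Hypothetical demo directory; file contents are copied from the question.
demo_dir=$(mktemp -d)
printf '%s\n' 'admin: false' 'uid: 123456' 'happy' 'movie' \
    'DOB: 6543-02-10' 'lorem ipsum' \
    'seo varis lireccuni paccem noba sako' > "$demo_dir/file1.txt"
printf '%s\n' 'citizen: true' 'hasSEAcct: true' 'uid: 289641' 'joyful hearty' \
    'final debug Juno XYus' 'magazine' 'DOB: 1234-05-06' \
    'saadi torem lopez dupont' > "$demo_dir/file2.txt"

flag_awk='
# New file: forget any flag left over from the previous one.
FNR == 1          { f = 0 }
# uid line containing 8964: set the flag.
/^uid:/ && /8964/ { f = 1 }
# Flagged DOB line: print the marker first, then clear the flag.
/^DOB:/ && f      { print "FORBIDDEN"; f = 0 }
# Print every line.
1
'

flag_out=$(awk "$flag_awk" "$demo_dir"/file*.txt)
printf '%s\n' "$flag_out"
rm -rf "$demo_dir"
```

Only file2.txt gets the FORBIDDEN line; file1.txt passes through unchanged.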
$ awk -v RS='' '/uid: [0-9]*8964/{sub(/DOB/, "FORBIDDEN\nDOB")} 1' file
Alternatively, treat every block separated by a blank line as a single record, then sub in "FORBIDDEN\nDOB" if there's a match. I think the first one's better practice. As a very general rule, once you start thinking in terms of fields/records, it's time for awk/perl.

In my opinion, this is a good use-case for sed.
Here is a GNU sed solution with some explanation:
# script.sed
/^uid:.*8964/,/DOB/ { # Search only inside this range, if it exists.
/DOB/i FORBIDDEN # Insert FORBIDDEN before the line matching /DOB/.
}
Testing:
▶ gsed -f script.sed FILE2
citizen: true
hasSEAcct: true
uid: 289641
joyful hearty
final debug Juno XYus
magazine
FORBIDDEN
DOB: 1234-05-06
saadi torem lopez dupont
▶ gsed -f script.sed FILE1
admin: false
uid: 123456
happy
movie
DOB: 6543-02-10
lorem ipsum
seo varis lireccuni paccem noba sako
Or on one line:
▶ gsed -e '/^uid:.*8964/,/DOB/{/DOB/i FORBIDDEN' -e '}' FILE*
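The closing } needs its own -e because the one-line form of GNU sed's i command takes the rest of the script line as the text to insert, which is also what produces the unmatched { error in the question. If you would rather avoid i altogether, a possible variant (my own sketch, assuming GNU sed, where \n in the replacement is a newline) uses s inside the same range:

```shell
# Sample record copied from the question's file2.txt.
record='citizen: true
hasSEAcct: true
uid: 289641
joyful hearty
final debug Juno XYus
magazine
DOB: 1234-05-06
saadi torem lopez dupont'

# Inside the uid..DOB range, rewrite the DOB line as "FORBIDDEN<newline>DOB:...".
# No i command is involved, so nothing swallows a closing brace.
range_out=$(printf '%s\n' "$record" |
    sed '/^uid:.*8964/,/^DOB:/ s/^DOB:/FORBIDDEN\n&/')
printf '%s\n' "$range_out"
```

On a non-GNU sed the \n in the replacement may come out as a literal n, so treat this as GNU-only.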

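Tried on GNU sed. This avoids the unmatched `{' error by closing the block in a separate -e, but note that n only advances to the line right after uid, so FORBIDDEN is inserted only when DOB immediately follows the uid line: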
sed -Ee '/^uid:\s*\w*8964\w*$/{n;/^DOB:/iFORBIDDEN' -e '}' file*.txt

Related

Bash - remove specific textblock from file

I want to remove a specific block of text from a file. I want to find the start of the text block to remove, and remove everything until a specific pattern is found.
Example string to search in:
\n---\n# Source: app/templates/deployment.yaml\n# template file\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: component and then follow many more characters with various special characters -- / ending with another \n---\n that I dont want to remove
I want to remove everything, starting from this string match \n---\n# Source: app/templates/deployment.yaml\n# template file\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: component
So basically, find pattern \n---\n# Source: app/templates/deployment.yaml\n# template file\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: component and remove everything until I match the next \n---\n
Expected output here would be:
\n---\n that I dont want to remove
Things I tried with sed:
sed 's/\n---\n# Source: app/templates/deployment.yaml\n# template file\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: component.*\n---\n//g'
Things I tried with grep:
echo $string | grep -Ewo "\\\n---\\\n# Source: app/templates/deployment.yaml\\\n# template file\napiVersion: apps/v1\\\nkind: Deployment\nmetadata:\\\n name: component"
Nothing really works. Is there any bash wizard that can help?
Using literal strings to avoid having to escape any characters and assuming your target string only exists once in the input:
$ cat tst.sh
#!/usr/bin/env bash
awk '
BEGIN {
begStr = ARGV[1]
endStr = ARGV[2]
ARGV[1] = ARGV[2] = ""
begLgth = length(begStr)
}
begPos = index($0,begStr) {
tail = substr($0,begPos+begLgth)
endPos = begPos + begLgth + index(tail,endStr) - 1
print substr($0,1,begPos-1) substr($0,endPos)
}
' \
'\n---\n# Source: app/templates/deployment.yaml\n# template file\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: component' \
'\n---\n' \
"${@:--}"
$ ./tst.sh file
\n---\n that I dont want to remove
With your shown samples, please try the following awk code. It searches for the string \\n---\\n# Source: app\/templates\/deployment.yaml\\n# template file\\napiVersion: apps\/v1\\nkind: Deployment\\nmetadata:\\n name: component, uses \\\\n---\\\\n as the field separator, and prints the last field of each matching line.
awk -v OFS="\\\\n---\\\\n " -F'\\\\n---\\\\n ' '
/\\n---\\n# Source: \
app\/templates\/deployment.yaml\\n# template \
file\\napiVersion: apps\/v1\\nkind: Deployment\
\\nmetadata:\\n name: component/{
print OFS $NF
}
' Input_file
Output will be as follows:
\n---\n that I dont want to remove
You need to escape the backslashes in the regexp to match them literally.
If the part between \\n---\\n123456789 and \\n---\\n can't contain another -, you can use
sed 's/\\n---\\n123456789[^-]*\\n---\\n//g'
This assumption is needed because sed doesn't support non-greedy quantifiers, and .* will match until the last \\n---\\n, not the next one.
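If the block may contain a -, one common workaround for the missing non-greedy quantifier is to first turn the multi-character delimiter into a single character that cannot occur in the line, then use a negated bracket expression. A minimal sketch, assuming GNU sed (which understands \n in the regex, in brackets, and in the replacement) and reusing the 123456789 placeholder from above; the variable names are mine:

```shell
s='aaa aaa\n---\n123456789 hha faewb\n---\naaaaaa\n---\n67891 0238\n---\nbbbf bb'

nongreedy_out=$(printf '%s\n' "$s" | sed '
  # 1. Replace every literal \n---\n delimiter with a real newline.
  s/\\n---\\n/\n/g
  # 2. Drop each block that starts with 123456789, up to the next delimiter.
  s/\n123456789[^\n]*//g
  # 3. Turn the remaining newlines back into literal \n---\n delimiters.
  s/\n/\\n---\\n/g
')
printf '%s\n' "$nongreedy_out"
# prints: aaa aaa\n---\naaaaaa\n---\n67891 0238\n---\nbbbf bb
```

This works because sed reads one line at a time, so a real newline never appears in the pattern space until step 1 puts it there.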
So basically, find pattern \n---\n123456789 and remove everything until I match the next \n---\n
Using gnu-awk it might be simpler by making \n---\n a record separator (a non-regex approach):
s='aaa aaa\n---\n123456789 hha faewb\n---\naaaaaa\n---\n67891 0238\n---\nbbbf bb'
awk -v RS='\\\\n---\\\\n' '$1 != 123456789 {ORS=RT; print}' <<< "$s"
aaa aaa\n---\naaaaaa\n---\n67891 0238\n---\nbbbf bb
This might work for you (GNU sed):
sed 'N;/\n---$/!{P;D};:a;N;//!ba
s~\n---\n# Source: app/templates/deployment.yaml\n# template file\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n name: component.*~\n---~' file
Open a two line window and if the second line in the window does not match \n--- print/delete the first of the lines and repeat.
If the second line matches \n---, gather up any following lines until another match is made and if the subsequent lines also match the required string, delete all lines up until the second match.
Otherwise print lines as normal.
N.B. This does not cater for two such matches in a row.

extract text between two words using sed

I want to extract the text in a shell variable that lies between two matching words/characters, as shown below.
Input string-
extract='sometext Query State: FINISHED\n Query Status: OK\n soonnnnnnnnnnnn Query State: STARTING\n'
I want to extract the query state, which is in between the text 'Query State' and first occurrence of '\n'
I have used below sed expression-
query_state=$(echo $extract | sed 's/.*Query State: \(.*\)\\n .*/\1/')
but I get output as - FINISHED\n Query Status: OK,
basically, the above is giving everything between the words 'Query State' and the last occurrence of '\n'.
So, I changed to sed expression like below to get output 'FINISHED'
query_state=$(echo $extract | sed 's/.*Query State: \(.*\)\\n Query Status.*/\1/')
But the above expression has a hard dependency on the text 'Query Status'. How can I change the expression to stop exactly at the first occurrence of '\n'?
Update: I want to extract the query state, which is between the first occurrence of the text 'Query State' and the first occurrence of '\n' that follows it.
-Thanks
grep solution (since you are only looking to find a match, you are not looking to edit anything):
$ echo "$extract"
sometext Query State: FINISHED\n Query Status: OK\n soonnnnnnnnnnnn
$ echo "$extract" | grep -oP '(?<=Query State: ).*?(?=\\n)'
FINISHED
Explanation:
-o Return only the matched substring (this will return all matches, one per line)
-P For perl-compatible regular expressions; needed for lookaround as well as lazy quantifier
(?<= ... ) lookbehind : The match must start immediately after the text given between the opening sequence (?<= and the closing parenthesis (here, right after the space that follows "Query State:").
.*? zero or more characters (any characters), as few as possible. *? is called lazy (or non-greedy) quantifier.
(?=\\n) lookahead : Similar to lookbehind. Backslash must be escaped.
EDIT:
If the "Query State: ..." fragment may appear at the very end of the string, not terminated by the \n marker, and if in that case the state must still be returned, the regular expression needs to be modified as follows:
$ echo $extract
sometext Query State: FINISHED
$ echo $extract | grep -oP '(?<=Query State: ).*?((?=\\n)|$)'
FINISHED
Notice the alternation in the lookahead: we are looking for the substring \n or the end of the input string; either one will work.
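If grep -P is not available, the same "first Query State up to the first \n" extraction can probably be done with plain shell parameter expansion, no external tool needed. This is only a sketch; the variable name rest is my own, and it assumes the string really contains "Query State: " (otherwise the first expansion leaves the string unchanged).

```shell
extract='sometext Query State: FINISHED\n Query Status: OK\n soonnnnnnnnnnnn Query State: STARTING\n'

# Strip the shortest prefix ending in the first "Query State: ".
rest=${extract#*Query State: }
# Strip the longest suffix starting at the first literal backslash-n.
query_state=${rest%%\\n*}

printf '%s\n' "$query_state"
# prints: FINISHED
```

The # operator removes the shortest matching prefix and %% the longest matching suffix, which together give the non-greedy behaviour that sed lacks.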
For a short case you can consider an extra call to sed:
echo "$extract" | sed -n 's/\\n/\n/g; s/.*Query State: //p'
Can you tell anything about the possible values of the state? Another solution can be something like
echo "$extract" | sed -r 's/.*Query State: ([A-Za-z ]*).*/\1/'
This works.
extract='sometext Query State: FINISHED\n Query Status: OK\n soonnnnnnnnnnnn'
echo "$extract" | sed 's/.*Query State: \([^\\n]\+\).*$/\1/'
Output
FINISHED
Works with awk
echo "$extract" | awk -F'[: \\\\n]+' '{print $4}'

How to get the value after a particular string is found in a text file , using shell scripting

I have a text file and the contents of the file are as follows:
#SERVICE INFO:
srv id [8503]
serv rqst id xxxxxx
serv rqst len [17]
serv status [C]
#SERVICE INFO:
srv id [8501]
serv rqst id xxxxxx
serv rqst len [17]
serv status [C]
#SERVICE INFO:
srv id [8500]
serv rqst id xxxxxx
serv rqst len [17]
serv status [C]
I want to read the srv id and find its corresponding status and use it for further validation.
For ex:
for srv id 8500, serv status is C
I have tried the below awk statement:
awk '{for (I=1;I<=NF;I++) if ($I == "service id") {print $(I+1)};}' $testfile
It gives me a blank output.
Here testfile is my sample text file.
Any input is appreciated.
awk -F '[][]' '$1 ~ /srv id/ {id = $2} $1 ~ /serv status/ {print id, $2}' file
That uses [ or ] as the field separator. If the first field contains "srv id", remember the id. If the first field contains "serv status", print the id and the status value.
Output:
8503 C
8501 C
8500 C
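Since the goal is to use the status for further validation, the same awk logic can be wrapped in a small shell function that returns the status of one id. This is a sketch only: status_for_id, want, and the temporary file are hypothetical names of mine, and the sample data is copied from the question.

```shell
testfile=$(mktemp)
printf '%s\n' \
    '#SERVICE INFO:' 'srv id [8503]' 'serv rqst id xxxxxx' 'serv rqst len [17]' 'serv status [C]' \
    '#SERVICE INFO:' 'srv id [8501]' 'serv rqst id xxxxxx' 'serv rqst len [17]' 'serv status [C]' \
    '#SERVICE INFO:' 'srv id [8500]' 'serv rqst id xxxxxx' 'serv rqst len [17]' 'serv status [C]' \
    > "$testfile"

# Hypothetical helper: print the status of the given srv id ($1) found in file ($2).
status_for_id() {
    awk -F '[][]' -v want="$1" '
        # Remember the most recent srv id.
        $1 ~ /srv id/                    { id = $2 }
        # On its status line, print the value and stop reading.
        $1 ~ /serv status/ && id == want { print $2; exit }
    ' "$2"
}

status=$(status_for_id 8501 "$testfile")
printf 'srv id 8501 has status %s\n' "$status"
rm -f "$testfile"
```

An unknown id prints nothing, so an empty result can be treated as "not found" in the validation step.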
If you don't mind Perl:
perl -00 -ne 'm{srv id.+?(\d+).+status.+\[(\w)\]}s and print "$1 $2\n"' file
This yields:
8503 C
8501 C
8500 C
The -00 switch tells Perl to read the file in "paragraph mode" where a record separator is one or more blank lines.
We match a sequence of characters that begin with "srv id" and ends with the token "status" followed by its value. A dot is any character; the + signifies one or more; and the +? denotes non-greedy matching. \d signifies a digit and \w a word character. Opening and closing square brackets must be escaped to mean themselves. In order to have a dot match a newline character, we add the s modifier at the end of the match pattern m{...}s
Should you want to look up an ID to print its status, simply pipe the output to grep:
perl ... | grep 8501

Using awk to print a new column without apostrophes or spaces

I'm processing a text file and adding a column composed of certain components of other columns. A new requirement to remove spaces and apostrophes was requested and I'm not sure the most efficient way to accomplish this task.
The file's content can be created by the following script:
content=(
john smith thomas blank 123 123456 10
jane smith elizabeth blank 456 456123 12
erin "o'brien" margaret blank 789 789123 9
juan "de la cruz" carlos blank 1011 378943 4
)
# put this into a tab-separated file, with the syntactic (double) quotes above removed
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' "${content[@]}" >infile
This is what I have now, but it fails to remove spaces and apostrophes:
awk -F "\t" '{OFS="\t"; print $1,$2,$3,$5,$6,$7,$6 tolower(substr($2,0,3)); }' infile > outfile
My second attempt, below, throws the error "sub third parameter is not a changeable object", which makes sense since I'm trying to process output instead of input, I guess.
awk -F "\t" '{OFS="\t"; print $1,$2,$3,$5,$6,$7,$6 sub("'\''", "",tolower(substr($2,0,3))); }' infile > outfile
Is there a way I can print a combination of column 6 and part of column 2 in lower case, all while removing spaces and apostrophes from the output to the new column? Worst case scenario, I can just create a new file with my first command and process that output with a new awk command, but I'd like to do it in one pass if possible.
The second approach was close, but for order of operations:
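awk -F "\t" '
BEGIN { OFS="\t"; }
{
var=$2;
# gsub (not sub) removes every apostrophe and whitespace character
gsub("['\''[:space:]]", "", var);
# awk strings are 1-indexed; lower-case the 3-character prefix
var=tolower(substr(var, 1, 3));
print $1,$2,$3,$5,$6,$7,$6 var;
}
' infile > outfile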
Assigning the contents you want to modify to a variable lets that variable be modified in-place.
Characters you want to remove should be removed before taking the substring, since otherwise you shorten your 3-character substring.
It's a guess since you didn't provide the expected output but is this what you're trying to do?
$ cat tst.awk
BEGIN { FS=OFS="\t" }
{
abbr = $2
gsub(/[\047[:space:]]/,"",abbr)
abbr = tolower(substr(abbr,1,3))
print $1,$2,$3,$5,$6,$7,$6 abbr
}
$ awk -f tst.awk infile
john smith thomas 123 123456 10 123456smi
jane smith elizabeth 456 456123 12 456123smi
erin o'brien margaret 789 789123 9 789123obr
juan de la cruz carlos 1011 378943 4 378943del
Note that the way to represent a ' in a '-enclosed awk script is with the octal \047 (which will continue to work if/when you move your script to a file, unlike "'\''", which only works from the command line). Also, strings, arrays, and fields in awk start at 1, not 0, so your substr(..,0,3) is wrong: depending on the awk, the invalid start position of 0 is either treated as 1 or counted against the length (gawk, for example, returns only the first 2 characters).
The "sub third parameter is not a changeable object" error you were getting is because sub() modifies the object you call it with as the 3rd argument but you're calling it with a literal string (the output of tolower(substr(...))) and you can't modify a literal string - try sub(/o/,"","foo") and you'll get the same error vs if you used var="foo"; sub(/o/,"",var) which is valid since you can modify the content of variables.

bash script: how to insert text between two specific characters

For example, I have a file containing a line as below:
"abc":"def"
I need to insert 123 between "abc":" and def" so that the line will become: "abc":"123def".
Since "abc" appears only once, I think I can just search for it and do the insertion.
How can I do this in a bash script with a tool such as sed or awk?
AMD$ sed 's/"abc":"/&123/' File
"abc":"123def"
Match "abc":", then append 123 to the match (& holds the matched string "abc":").
If you want to take care of space before and after :, you can use:
sed 's/"abc" *: *"/&123/'
For replacing all such patterns, use g with sed.
sed 's/"abc" *: *"/&123/g' File
sed:
$ sed -E 's/(:")(.*)/\1123\2/' <<<'"abc":"def"'
"abc":"123def"
(:") matches :" and stores it in capture group 1
(.*) matches the rest of the line and stores it in capture group 2
in the replacement, \1123\2 puts 123 between the two groups
awk:
$ awk -F: 'sub(".", "&123", $2)' <<<'"abc":"def"'
"abc" "123def"
In the sub() function, the second ($2) field is being operated on, pattern is used as . (which would match "), and in the replacement the matched portion (&) is followed by 123.
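Note that the awk output above has a space where the colon was: assigning to $2 makes awk rebuild the record with the default output field separator. A small sketch that keeps the colon by setting OFS as well (the variable names are mine, and the pattern is anchored to the leading quote to make the intent explicit):

```shell
line='"abc":"def"'

# Use ":" for both input and output separators so the rebuilt record keeps its colon.
kept_colon=$(printf '%s\n' "$line" |
    awk 'BEGIN { FS = OFS = ":" } { sub(/^"/, "&123", $2) } 1')

printf '%s\n' "$kept_colon"
# prints: "abc":"123def"
```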
echo '"abc":"def"'| awk '{sub(/def/,"123def")}1'
"abc":"123def"
