Shell script to insert string in line if it does not contain pattern - shell

I have a line which may or may not contain [TICKET: 12345] (any 5-digit number) in it.
I want to write a shell script which will insert [TICKET: 54234] (a random 5-digit number) if it does not exist.
Example:
Subject: [TICKET: 12345] test subject // do nothing
Output: Subject: [TICKET: 12345] test subject
Subject: test subject //insert [TICKET: 12345]
Output: Subject: [TICKET: 12345] test subject
I tried some sed and awk commands but could not achieve the result.
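A minimal sketch, assuming the lines look exactly like the examples above (the helper name `add_ticket` is illustrative, not from the question):

```shell
#!/bin/bash
# Insert a random 5-digit [TICKET: NNNNN] tag into a Subject line,
# but only when no such tag is already present.
add_ticket() {
    local ticket=$(( RANDOM % 90000 + 10000 ))   # random number in 10000..99999
    printf '%s\n' "$1" |
        sed -E "/\[TICKET: [0-9]{5}\]/! s/^Subject: /Subject: [TICKET: ${ticket}] /"
}
```

`add_ticket 'Subject: test subject'` prints the line with a tag inserted; a line that already carries a tag passes through unchanged.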

Related

How to search for multiple Substrings in a string

I'm new to bash programming and I've been trying to write a function to search for multiple substrings in a string for log analysis.
For example:
I have a Log-File which contains a string like this:
"01-01-2020 STREETNEW Function Triggered Command 3 processed - New street created."
Now I want to search for 2 substrings in this string.
The first substring I'm looking for is "Command 3" to identify which action was triggered. If "Command 3" is found, I want to search for a second substring "New street created" to check the output of the triggered action.
So far, I wrote a contains function which helps me find a match, and this works fine. The problem is that this function is only able to find a match for one substring.
My function looks like this:
declare -a arrayLog             # This array contains my log file, line by line
declare -a arrayCodeNumber      # Contains the code numbers, e.g. "Command 3"
declare -a arrayActionTriggered # Contains the result of the triggered action, e.g. "New street created"
#[... build arrays etc...]
function contains() {
    local n=$#
    local value=${!n}
    for ((i=1; i < $#; i++)) {
        shopt -s nocasematch
        [[ "${!i}" =~ "${value^^}" ]] && echo "y;$i" || echo "n;$i"
    }
}
#I'm calling the function like this in a for-loop:
contains "${arrayLog[@]}" "${arrayCodeNumber[i]}"
#[... processing function results ...]
My function returns "y;$i" or "n;$i" to indicate whether there was a match and in which line of the log file the match was found - I need this output for processing the matching results later in my code.
Unfortunately I don't know how to extend or improve my function to search for multiple substrings in a line.
What would I need to do to make the function accept 2 input arrays (for my matching parameters) and 1 log array, and to extend the matching process accordingly?
Thanks a lot in advance!
Kind regards,
Tobi
Consider this approach
#!/bin/bash
cmd=('Command 2' 'Command 3')
act=('Street destroyed' 'New street created')
for i in "${!cmd[@]}"; {
    grep -no "${cmd[$i]}.*${act[$i]}" file
}
Usage
$ ./test
2:Command 2 processed - Street destroyed
1:Command 3 processed - New street created
From grep help
$ grep --help
...
-o, --only-matching show only the part of a line matching PATTERN
-n, --line-number print line number with output lines
...
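If you'd rather stay in pure bash, here is a sketch of an extended version of the asker's contains() that checks a pair of substrings per log line while keeping the "y;line" / "n;line" output format (the function name contains_pair is illustrative):

```shell
#!/bin/bash
# Check each log line for BOTH a command substring and an action substring.
# Prints "y;<n>" when line n contains both, "n;<n>" otherwise.
contains_pair() {   # usage: contains_pair CMD ACT logline...
    local c=$1 a=$2 n=0 line
    shift 2
    for line in "$@"; do
        n=$((n + 1))
        if [[ $line == *"$c"* && $line == *"$a"* ]]; then
            echo "y;$n"
        else
            echo "n;$n"
        fi
    done
}
```

Called as `contains_pair "${arrayCodeNumber[i]}" "${arrayActionTriggered[i]}" "${arrayLog[@]}"`, it preserves the line-number bookkeeping the later processing relies on.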

Parsing colon-separated values with broken quotes in bash

I have a colon-separated file cik.coleft.c, which looks like this:
!J INC:0001438823:
#1 A LIFESAFER HOLDINGS, INC.:0001509607:
#1 ARIZONA DISCOUNT PROPERTIES LLC:0001457512:
#1 PAINTBALL CORP:0001433777:
$ LLC:0001427189:
& S MEDIA GROUP LLC:0001447162:
&TV COMMUNICATIONS INC.:0001479357:
'MKTG, INC.':0000886475:
11:11 CAPITAL CORP.:0001463262:
It's a two-column CSV where the separating commas were replaced with colons. Meanwhile, single quotes escape values containing commas, rather than values containing colons (the actual separator).
But the first column can itself contain colons, which breaks parsers. So when I try to convert cik.coleft.c into a normal CSV...
curl -o cik.coleft.c 'https://www.sec.gov/edgar/NYU/cik.coleft.c'
in2csv --format 'csv' -d ':' -q "'" -e 'latin1' cik.coleft.c > cik.coleft.csv
... I get four and more columns.
I tried reading the lines with sed, but haven't succeeded.
How can I convert this into a proper two-column table?
You can use awk and do some string manipulation with substr and length:
awk 'BEGIN{OFS="|"}{col1=substr($0,1,length($0)-12);col2=substr($0,length($0)-10, 10);print col1,col2}' yourfile
That sets the output field separator OFS to pipe |. It delineates the two columns using substr() and length(). Column 1 starts at character 1 and ends 12 characters before the end of the record. Column 2 starts 10 characters before the end of the record and grabs the following 10 characters.
Test output:
$ awk 'BEGIN{OFS="|"}{col1=substr($0,1,length($0)-12);col2=substr($0,length($0)-10, 10);print col1,col2}' test
!J INC|0001438823
#1 A LIFESAFER HOLDINGS, INC.|0001509607
#1 ARIZONA DISCOUNT PROPERTIES LLC|0001457512
#1 PAINTBALL CORP|0001433777
$ LLC|0001427189
& S MEDIA GROUP LLC|0001447162
&TV COMMUNICATIONS INC.|0001479357
'MKTG, INC.'|0000886475
11:11 CAPITAL CORP.|0001463262
This only works because your second field appears to always be a 10 digit number. If that varies in other parts of the file, then you'll have to go a different route.
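One route that does not assume a fixed id length, sketched with awk: since every line ends with a colon, the second-to-last colon-separated field is the id, and the trailing `:id:` can be stripped off to recover the name (sample lines are inlined here for illustration):

```shell
# Split on ":"; the last field is empty because each line ends with ":",
# so $(NF-1) is the id regardless of how many colons the name contains.
awk -F: '{
    id = $(NF - 1)
    name = $0
    sub(/:[^:]*:$/, "", name)   # strip the trailing ":id:" from the name
    print name "|" id
}' <<'EOF'
'MKTG, INC.':0000886475:
11:11 CAPITAL CORP.:0001463262:
EOF
```

This handles ids of any length, at the cost of assuming the id itself contains no colon.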
You can approach it from the back:
$ rev file | sed 's/:/~/3' | rev | column -ts:
!J INC 0001438823
#1 A LIFESAFER HOLDINGS, INC. 0001509607
#1 ARIZONA DISCOUNT PROPERTIES LLC 0001457512
#1 PAINTBALL CORP 0001433777
$ LLC 0001427189
& S MEDIA GROUP LLC 0001447162
&TV COMMUNICATIONS INC. 0001479357
'MKTG, INC.' 0000886475
11~11 CAPITAL CORP. 0001463262
Knowing that there are two columns, we reverse each line and replace the third occurrence of the : with ~.
If more than one extra colon needs to be replaced, with GNU sed use the 3g flag instead of 3.
Possible solution in TXR:
The strategy is to match through the data, but with the lines reversed left to right. For that, we redirect the input using #(next ...) to a lazy :list of lines, produced by lazily mapping the output of (get-lines) through the reverse function. The following is fixcolon.txr:
#(next :list #[mapcar* reverse (get-lines)])
#(repeat)
# (assert)
# (cases)
:#right:'#left'
# (or)
:#right:#left
# (end)
# (do (put-line (reverse
(if (break-str left ":")
`:#right:'#left'`
`:#right:#left`))))
#(end)
Basically there are only two cases: we have a single quoted left or we don't. We want to remove the single quotes if they are present, and re-instate them only if the field contains colons.
The following extra line has been added to the data:
11:11 CA:PI:TAL CORP.:0001463262:
Output:
$ txr fixcolon.txr < data
!J INC:0001438823:
#1 A LIFESAFER HOLDINGS, INC.:0001509607:
#1 ARIZONA DISCOUNT PROPERTIES LLC:0001457512:
#1 PAINTBALL CORP:0001433777:
$ LLC:0001427189:
& S MEDIA GROUP LLC:0001447162:
&TV COMMUNICATIONS INC.:0001479357:
MKTG, INC.:0000886475:
'11:11 CAPITAL CORP.':0001463262:
'11:11 CA:PI:TAL CORP.':0001463262:
The superfluous quoting is gone around MKTG, INC.. Quotes are introduced around the 11:11 ... fields. (No attempt is made to handle embedded single quotes, since the sample data and question text do not specify or imply any requirements).
The #(assert) ensures that the pattern matching blows up with an exception on data which doesn't match the cases that follow. The directive effectively says "everything after me matches, or else I throw!" Without it, the #(repeat) directive will skip over non-matching data. (If told not to skip using :gap 0 it will stop at the first nonmatching line. Then to catch this issue we need an assertion that we are at EOF).
$ txr fixcolon.txr
foo:bar:
junk!
[Ctrl-D][Enter]
foo:bar:
txr: unhandled exception of type assert:
txr: (fixcolon.txr:3) assertion (at var:2)
txr: during evaluation at fixcolon.txr:3 of form (assert)

Extracting lines between two patterns and including line above the first and below the second

Given the following text file, I need to extract and print the lines between two patterns, and also include the line above the first pattern and the one following the second:
asdgs sdagasdg sdagdsag
asdfgsdagg gsfagsaf
asdfsdaf dsafsdfdsfas
asdfdasfadf
nnnn nnnnn aaaaa
line before first pattern
***** FIRST *****
dddd ffff cccc
wwww rrrrrrrr xxxx
***** SECOND *****
line after second pattern
asdfgsdagg gsfagsaf
asdfsdaf dsafsdfdsfas
asdfdasfadf
nnnn nnnnn aaaaa
I have found many solutions with sed and awk to extract between two tags, such as the following:
sed -n '/FIRST/,/SECOND/p' FileName
but how do I include the line before and after the patterns?
Desired output:
line before first pattern
***** FIRST *****
dddd ffff cccc
wwww rrrrrrrr xxxx
***** SECOND *****
line after second pattern
As you've asked for a sed/awk solution (and everyone is scared of ed ;-), here's one way you can do it in awk:
awk '/FIRST/{print p; f=1} {p=$0} /SECOND/{c=1} f; c--==0{f=0}' file
When the first pattern is matched, print the previous line p and set the print flag f. When the second pattern is matched set c to 1. If f is 1 (true), the current line will be printed. c--==0 is only true the line after the second pattern is matched.
Another way you can do this is by looping through the file twice:
awk 'NR==FNR{if(/FIRST/)s=NR;else if(/SECOND/)e=NR;next}FNR>=s-1&&FNR<=e+1' file file
The first pass through the file records the line numbers where the patterns match. The second prints the lines in that range.
The advantage of the second approach is that it is trivially easy to print M lines before and N lines after the range, simply by changing the numbers in the script.
To use shell variables instead of hard-coded patterns, you can pass the variables like this:
awk -v first="$first" -v second="$second" '...' file
Then use $0 ~ first instead of /FIRST/.
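As a sketch of that parametrised variant, with m lines printed before the range and n lines after (sample.txt stands in for the real file and is built here only for illustration):

```shell
#!/bin/bash
# Build a small stand-in input file.
printf '%s\n' 'aaa' 'line before' '***** FIRST *****' 'body' \
              '***** SECOND *****' 'line after' 'zzz' > sample.txt

# Pass 1 records the pattern line numbers; pass 2 prints the widened range.
awk -v m=1 -v n=1 '
    NR == FNR { if (/FIRST/) s = NR; else if (/SECOND/) e = NR; next }
    FNR >= s - m && FNR <= e + n
' sample.txt sample.txt
```

Raising m or n widens the context without touching the rest of the script.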
I'd say
sed '/FIRST/ { x; G; :a n; /SECOND/! ba; n; q; }; h; d' filename
That is:
/FIRST/ { # If a line matches FIRST
x # swap hold buffer and pattern space,
G # append hold buffer to pattern space.
# We saved the last line before the match in the hold
# buffer, so the pattern space now contains the previous
# and the matching line.
:a # jump label for looping
n # print pattern space, fetch next line.
/SECOND/! ba # unless it matches SECOND, go back to :a
n # fetch one more line after the match
q # quit (printing that last line in the process)
}
h # If we get here, it's before the block. Hold the current
# line for later use.
d # don't print anything.
Note that BSD sed (as comes with Mac OS X and *BSD) is a bit picky about branching commands. If you're working on one of those platforms,
sed -e '/FIRST/ { x; G; :a' -e 'n; /SECOND/! ba' -e 'n; q; }; h; d' filename
should work.
This will work whether or not there are multiple ranges in your file:
$ cat tst.awk
/FIRST/ { print prev; gotBeg=1 }
gotBeg {
    print
    if (gotEnd) gotBeg=gotEnd=0
    if (/SECOND/) gotEnd=1
}
{ prev=$0 }
$ awk -f tst.awk file
line before first pattern
***** FIRST *****
dddd ffff cccc
wwww rrrrrrrr xxxx
***** SECOND *****
line after second pattern
If you ever need to print more than 1 line before FIRST change prev to an array. If you ever need to print more than 1 line after SECOND, change gotEnd to a count.
sed '#n
H;$!d
x;s/\n/²/g
/FIRST.*SECOND/!b
s/.*²\([^²]*²[^²]*FIRST\)/\1/
:a
s/\(FIRST.*SECOND[^²]*²[^²]*\)².\{1,\}/\1/
ta
s/²/\
/g
p' YourFile
A POSIX sed version (for GNU sed, use --posix).
It also accepts the SECOND pattern when it appears on the same line as FIRST; this is easy to adapt to require at least one newline between them.
#n : don't print unless explicitly requested (like p)
H;$!d : append each line to the hold buffer; if this is not the last line, delete the pattern space and start the next cycle
x;s/\n/²/g : load the buffer and replace every newline with another character (here I use ²), because POSIX sed does not allow [^\n]
/FIRST.*SECOND/!b : if the patterns are not present, quit without output
s/.*²\([^²]*²[^²]*FIRST\)/\1/ : remove everything before the line preceding your first pattern
:a : label for a goto (used later)
s/\(FIRST.*SECOND[^²]*²[^²]*\)².\{1,\}/\1/ : remove everything after the line following your second pattern. It takes the longest match, so the last occurrence of the pattern is the reference
ta : if the last s/// made a substitution, go back to label a. This cycles until it stops at the first occurrence of SECOND (after FIRST) in the file
s/²/\
/g : put the newlines back
p : print the result
Based on Tom's comment: if the file isn't large, we can just store it in an array and then loop over it (note <=e+1, so that the line after the second pattern is included):
awk '{a[++i]=$0} /FIRST/{s=NR} /SECOND/{e=NR} END {for(i=s-1;i<=e+1;i++) print a[i]}'
I would do it with Perl personally. We have the 'range operator' which we can use to detect if we're between two patterns:
if ( m/FIRST/ .. /SECOND/ )
That's the easy part. What's a little less easy is 'catching' the preceding and next lines. So I set a $prev_line value, so that when I first hit that test I know what to print. And I clear $prev_line, both because then it's empty when I print it again, and because then I can spot the transition at the end of the range.
So something like this:
#!/usr/bin/perl
use strict;
use warnings;
my $prev_line = " ";
while (<DATA>) {
    if ( m/FIRST/ .. /SECOND/ ) {
        print $prev_line;
        $prev_line = '';
        print;
    }
    else {
        if ( not $prev_line ) {
            print;
        }
        $prev_line = $_;
    }
}
__DATA__
asdgs sdagasdg sdagdsag
asdfgsdagg gsfagsaf
asdfsdaf dsafsdfdsfas
asdfdasfadf
nnnn nnnnn aaaaa
line before first pattern
***** FIRST *****
dddd ffff cccc
wwww rrrrrrrr xxxx
***** SECOND *****
line after second pattern
asdfgsdagg gsfagsaf
asdfsdaf dsafsdfdsfas
asdfdasfadf
nnnn nnnnn aaaaa
This might work for you (GNU sed):
sed '/FIRST/!{h;d};H;g;:a;n;/SECOND/{n;q};$!ba' file
If the current line does not match FIRST, save it in the hold space and delete it. If the line matches FIRST, append it to the saved line, then print both and any further lines until SECOND, when one additional line is printed and the script exits.

How to create an array of lines of double quoted strings in a shell script

I'm trying to split a line with double quoted strings into an array:
input.txt:
"ABC" "This is TEST 1" "12.3.0"
"AC" "This is TEST 221" "123"
"CX" "This is TEST 16" "123.2"
"LM" "This is TEST 9000" "123.6.6.1"
The outcome I'm hoping for, for each line:
print $a[0] $a[1] $a[2]
ABC This is TEST 1 12.3.0
What's the best way to grab each string per line? I'm trying to do this via the command line and/or a shell script.
Update:
To help reduce complexity, I've updated my "input.txt" file as follows:
input.txt:
'ABC' 'This is TEST 1' '12.3.0'
'AC' 'This is "TEST" 221' '123'
'CX' 'This is TEST 16' '123.2'
'LM' 'This is TEST 9000' '123.6.6.1'
All the double quotes have been replaced with single quotes, other than the ones within a value.
Assuming you are using bash:
IFS='"' a=("ABC" "This is TEST1" "12.3.0")
should almost work. The indexes will be off, with empty entries, but:
while IFS='"' read -a a; do
    echo ${a[1]} ${a[3]} ${a[5]}
done < input
gets you most of the way there. Keep in mind that this is pretty fragile.
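A somewhat sturdier sketch, assuming every field is double-quoted as in the original input: peel off one "..." field at a time with parameter expansion (the function name parse_line is illustrative):

```shell
#!/bin/bash
# Parse one line of the form: "field1" "field2" "field3" ...
# Fills the global array `fields` with the unquoted values.
parse_line() {
    local line=$1
    fields=()
    while [[ $line == *\"*\"* ]]; do
        line=${line#*\"}              # drop up to and including the opening quote
        fields+=("${line%%\"*}")      # field = text before the closing quote
        line=${line#*\"}              # drop the field and its closing quote
    done
}
```

For the sample line `"ABC" "This is TEST 1" "12.3.0"` this yields a three-element array with no empty entries, so the indexes are 0, 1, 2 as hoped.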

How to remove \n and \t from a string, e.g. "\n\t\t\t\tDay of Week\n\t\t\t"?

How can I remove \n and \t from a string?
For example, "\n\t\t\t\tDay of Week\n\t\t\t" should be viewed as "Day of Week" when cucumber reads it from a table.
I'm getting the error below:
RuntimeError: Element do not match for entry : Day of Week
The test data is entered as plain text, but cucumber reads it from the application as "\n\t\t\t\tDay of Week\n\t\t\t", so the comparison fails.
What exactly needs to be done?
This is the way:
p "\n\t\t\t\tDay of Week\n\t\t\t".strip!
# >> "Day of Week"
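One caveat worth knowing when adapting this for a comparison: strip! returns nil when nothing was stripped, so the non-bang strip is the safer choice. A small sketch:

```ruby
# strip always returns a String; strip! returns nil if the string was unchanged.
raw = "\n\t\t\t\tDay of Week\n\t\t\t"
clean = raw.strip                 # "Day of Week" - safe to compare directly
already_clean = "Day of Week".strip!  # nil, because nothing was removed
```

So in the cucumber step, compare with `value.strip == expected` rather than calling strip! on the value.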
