Using awk to get lines between two patterns - bash

I'm new to awk and am trying to write a bash script that uses it to print the lines between two patterns in a log file, and for the life of me I cannot make it work.
I am thinking I need to escape some of the characters.
Here's an example of the section of log I am trying to get lines from:
Processing... AP710 (/var/opt/testsys/rptprint/AP710)
sidjosajdois
sokds3488sds
doskdoskdoskdo
sodk229929
sending entire report to Job Mgr (spool) for user
I want the four lines between the "Processing..." line (first pattern) and the "sending" line (second pattern), and there is only one section of the log that has this above section with both the first pattern line and second pattern line.
I've tried using awk with the following command using a portion of the first pattern, and escaping the "/" characters as needed:
awk '/\/var\/opt\/testsys\/rptprint\/AP710/{flag=1;next}/sending entire report to Job Mgr/{flag=0}flag' log
But it gives me some other different section of the log that also happens to have the path "/var/opt/testsys/rptprint/AP710", so then I tried changing it to have more of the line (first pattern) by adding "Processing..." and it doesn't return anything....
awk '/Processing\.\.\. AP710 \(\/var\/opt\/testsys\/rptprint\/AP710/{flag=1;next}/sending entire report to Job Mgr/{flag=0}flag' log
Can someone give some guidance about awk so I can get the lines between the 2 patterns? After spending a few hours I am going a little bonkers trying to figure it out, I think my being new to awk is causing me to miss something obvious.
Cheers.

Whenever you find yourself escaping characters in a regexp to make them literal, really consider whether or not you should be using a regexp or if instead you should be doing a string comparison. In fact, always start out with a string comparison and switch to regexp if you need to.
$ awk '
$0=="sending entire report to Job Mgr (spool) for user" { inSection=0 }
inSection;
$0=="Processing... AP710 (/var/opt/testsys/rptprint/AP710)" { inSection=1 }
' file
sidjosajdois
sokds3488sds
doskdoskdoskdo
sodk229929
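
If the real log lines carry extra text after the part you know (a trailing space, a user name, and so on, which is only an assumption about your log), an exact $0== comparison will not match. A minimal sketch of the same string-comparison idea using awk's index() as a literal prefix test is below; the sample log and the variable names start, end and out are made up for illustration:

```shell
# Hypothetical sample log, including a decoy line that contains the same path.
log=$(mktemp)
cat > "$log" <<'EOF'
other /var/opt/testsys/rptprint/AP710 noise
Processing... AP710 (/var/opt/testsys/rptprint/AP710)
sidjosajdois
sokds3488sds
sending entire report to Job Mgr (spool) for user
EOF

# index($0, s) == 1 is true when the line starts with the literal string s,
# so no regexp escaping is needed.
out=$(awk -v start='Processing... AP710 (' -v end='sending entire report to Job Mgr' '
index($0, end) == 1   { inSection = 0 }
inSection
index($0, start) == 1 { inSection = 1 }
' "$log")

printf '%s\n' "$out"
rm -f "$log"
```

The order of the three rules matters in the same way as in the answer above: the end test runs before the print and the start test after it, so neither boundary line is printed.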

Related

extract data between similar patterns

I am trying to use sed to print the contents between two patterns including the first one. I was using this answer as a source.
My file looks like this:
>item_1
abcabcabacabcabcabcabcabacabcabcabcabcabacabcabc
>item_2
bcdbcdbcdbcdbbcdbcdbcdbcdbbcdbcdbcdbcdbbcdbcdbcdbcdbbcdbcdbcdbcdb
>item_3
cdecde
>item_4
defdefdefdefdefdefdef
I want it to start at item_2 (inclusive) and finish at the next occurring > (exclusive). So my code is sed -n '/item_2/,/>/{/>/!p;}'.
The result wanted is:
item_2
bcdbcdbcdbcdbbcdbcdbcdbcdbbcdbcdbcdbcdbbcdbcdbcdbcdbbcdbcdbcdbcdb
but I get it without item_2.
Any ideas?
Using awk, split input by >s and print part(s) matching item_2.
$ awk 'BEGIN{RS=">";ORS=""} /item_2/' file
item_2
bcdbcdbcdbcdbbcdbcdbcdbcdbbcdbcdbcdbcdbbcdbcdbcdbcdbbcdbcdbcdbcdb
I would go for the awk method suggested by oguz for its simplicity. Now if you are interested in a sed way, out of curiosity, you could fix what you have already tried with a minor change:
sed -n '/^>item_2/ s/.// ; //,/>/ { />/! p }' input_file
The empty regex // recalls the previous regex, which is handy here to avoid duplicating /item_2/. But keep in mind that // is actually dynamic: it recalls the latest regex evaluated at runtime, which is not necessarily the closest regex on its left (although that is often the case). Depending on the program flow (branching, address range), the content of the same // can change and... actually here we have an interesting example! (and I'm not saying that because it's my baby ^^)
On a line where /^>item_2/ matches, the s/.// command is executed and the latest regex before // becomes /./, so the following address range is equivalent to /./,/>/.
On a line where /^>item_2/ does not match, the latest regex before // is /^>item_2/ so the range is equivalent to /^>item_2/,/>/.
To avoid confusion here as the effect of // changes during execution, it's important to note that an address range evaluates only its left side when not triggered and only its right side when triggered.
This might work for you (GNU sed):
sed -n ':a;/^>item_2/{s/.//;:b;p;n;/^>/!bb;ba}' file
Turn off implicit printing -n.
If a line begins >item_2, remove the first character, print the line and fetch the next line
If that line does not begin with a >, repeat the last two instructions.
Otherwise, repeat the whole set of instructions.
If there will always be only one line following >item_2, then:
sed '/^>item_2/!d;s/.//;n' file
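
For comparison, here is a flag-based awk sketch that compares the whole header line as a string, so a header such as >item_20 would not be picked up by accident. The shortened sample data and the names input, keep and out are assumptions for illustration only:

```shell
# Hypothetical, shortened sample input in the same layout as the question.
input=$(mktemp)
printf '%s\n' '>item_1' 'abcabc' '>item_2' 'bcdbcd' '>item_3' 'cdecde' > "$input"

# On every header line, decide whether the following block is wanted;
# while keep is set, drop a leading ">" and print the line.
out=$(awk '
/^>/ { keep = ($0 == ">item_2") }
keep { sub(/^>/, ""); print }
' "$input")

printf '%s\n' "$out"
rm -f "$input"
```

With the sample above this prints item_2 followed by bcdbcd.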

Newlines in shell script variable not being replaced properly

Situation: Using a shell script (bash/ksh), there is a message that should be shown in the console log, and subsequently sent via email.
Problem: There are newline characters in the message.
Example below:
ErrMsg="File names must be unique. Please correct and rerun.
Duplicate names are listed below:
File 1.txt
File 1.txt
File 2.txt
File 2.txt
File 2.txt"
echo "${ErrMsg}"
# OK. After showing the message in the console log, send an email
Question: How can these newline characters be translated into HTML line breaks for the email?
Constraint: We must use HTML email. Downstream processes (such as Microsoft Outlook) are too inconsistent for anything else to be of use. Simple text email is usually a good choice, but off the table for this situation.
To be clear, the newlines do not need to be completely removed, but HTML line breaks must be inserted wherever there is a newline character.
This question is being asked because I have already attempted to use several commands, such as sed, tr, and awk with varying degrees of success.
TL;DR: The following snippet will do the job:
ErrMsg=`echo "$ErrMsg"|awk 1 ORS='<br/>'`
Just make sure there are double quotes around the variable when using echo.
This turned out to be a tricky situation. Some notes of explanation are below.
Using sed
Turns out, sed reads through input line by line, which makes finding and replacing those newlines somewhat outside the norm. There were several clever tricks that appeared to work, but I felt they were far too complicated to apply appropriately to this rather simple situation.
Using tr
According to this answer the tr command should work. Unfortunately, this only translates character by character. The two character strings are not the same length, and I am limited to translating the newline into a space or other single character.
For the following:
ErrMsg="Line 1
Line 2
"
ErrMsg=`printf '%s' "$ErrMsg" | tr '\n' 'BREAK'`
# You might expect:
# "Line 1BREAKLine 2BREAK"
# But instead you get:
# "Line 1BLine 2B"
echo "${ErrMsg}"
Using awk
Using awk according to this answer initially appeared to work, but due to some other circumstances with echo there was a subtle problem. The solution is noted in this forum.
You must have double quotes around your variable; otherwise the shell splits the unquoted value on whitespace and echo joins the pieces with single spaces, so every newline in the middle is lost. (Of course, awk will still receive a newline at the end, because echo appends one after its output.)
This snippet is good: (line breaks in the middle are preserved and replaced correctly)
ErrMsg=`echo "$ErrMsg"|awk 1 ORS='<br/>'`
This snippet is bad: (newlines converted to spaces before echo sees them, one line break at end)
ErrMsg=`echo $ErrMsg|awk 1 ORS='<br/>'`
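
Since the question says the newlines may stay as long as an HTML break is inserted at each one, a sketch without any external command is also possible. It assumes bash (the ${var//pattern/replacement} expansion and $'\n' quoting are not available in a plain POSIX sh), and the names nl, br and html are made up for illustration:

```shell
ErrMsg="Line 1
Line 2"

nl=$'\n'      # a literal newline
br='<br/>'    # the HTML line break to insert

# Replace every newline with "<br/>" followed by the newline itself.
html=${ErrMsg//"$nl"/"$br$nl"}

printf '%s\n' "$html"
```

This prints Line 1<br/> on the first line and Line 2 on the second. The original ErrMsg is left untouched, so it can still be echoed to the console log as plain text.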
You can wrap your message in HTML using <pre>, something like
<pre>
${ErrMsg}
and more.
</pre>

Sed keep original indentation and camel-casing a variable

I have a simple sed script that dynamically replaces a bunch of lines in my application using a variable; the variable is a list of strings. My function works but does not keep the original indentation. The function deletes the line if it contains a certain string and replaces it with a completely new line; I could not do a substitution due to certain syntax restrictions.
How do I keep my original indentation when the line is replaced?
Can I capitalize my variable and remove the underscore on the fly? That is, the title is a capitalized version of variableName with the underscore removed. The list of items in the variable array is really long, so I am trying to do this in one shot.
Ex: I want report_type -> Report Type done mid process
Is there a better way to solve this with sed? Thanks for any inputs much appreciated.
sed function is as follows
variableName=$1
sed -i "/name\=\"${variableName}\.name\" value\=model\.${variableName}\.name options\=\#lists\./c\\{\{\> \_dropdown title\=\"${variableName}\" required\=true name\=\"${variableName}\"\}\}" test
SAMPLE INPUT
{{> _select title="Report Type" required=true name="report_type.name" value=model.report_type.name options=#lists.report_type}}
SAMPLE EXPECTED OUTPUT
{{> _dropdown title="Report Type" required=true name="report_type" value=model.report_type.name}}
sample input variable
report_type
Try this:
sed -E "s/^(\s+).*name\=\"(report_type)\.name\" value\=model\.report_type\.name options\=\#lists\..*$/\1\{\{\> \_dropdown title\=\"\2\" required\=true name\=\"\2\"\}\}/;T;s/\"(\w+)_(\w+)\"/\"\u\1 \u\2\"/g" input.txt > output.txt
I used "report_type" instead of ${variableName} for testing as an sed one-liner.
Please change back to ${variableName}.
Then go back to using -i (in addition to -E, which is for extended regex).
I am not sure whether I can do it without extended regex, let me know if that is necessary.
use s/// to replace fine tuned line
first capture group for the white space making the indentation
second capture group for the variable name
stop if that did not replace anything, T;
another s///
look for something consisting of only letters between "",
with a "_" between two parts,
seems safe enough because this step is only done on the already replaced line
replace by two parts, without "_"
\u (a GNU sed extension) uppercases the first letter of each part, giving the capitalized title
Note:
Doing this on your sample input creates two very similar lines.
I assume that is intentional. Otherwise please provide desired output.
Using GNU sed version 4.2.1.
Interesting line of output:
{{> _dropdown title="Report Type" required=true name="Report Type"}}
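
If only the title should be capitalized while name keeps the raw variable, one option is to build the title in bash first and let sed do nothing more than carry the indentation over. This is a sketch under assumptions: it needs bash 4 or later for ${w^}, it keeps the value attribute because the sample expected output shows one, and the names title, parts, line and result are invented for the example:

```shell
variableName=report_type

# Build "Report Type" from "report_type": split on "_", capitalize each word.
title=""
IFS=_ read -ra parts <<< "$variableName"
for w in "${parts[@]}"; do
  title+="${title:+ }${w^}"
done

# Hypothetical indented input line, as in the sample input.
line='    {{> _select title="Report Type" required=true name="report_type.name" value=model.report_type.name options=#lists.report_type}}'

# \1 holds the leading whitespace, so the new line keeps the old indentation.
result=$(printf '%s\n' "$line" | sed -E "s/^([[:space:]]*)\{\{> _select .*name=\"${variableName}\.name\".*\$/\1{{> _dropdown title=\"${title}\" required=true name=\"${variableName}\" value=model.${variableName}.name}}/")

printf '%s\n' "$result"
```

Because the title is computed outside sed, the \u escape is not needed here.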

Substitution of substring doesn't work in bash (tried sed, ${a/b/c/})

Before writing, I of course read many other similar questions. For example, I used #!/bin/bash instead of #!/bin/sh.
I have a very simple script that reads lines from a template file and should replace some keywords with real data. For example, the string <NAME> will be replaced with a real name. In the example I want to replace it with the word Giuseppe. I tried two solutions, but they don't work.
#!/bin/bash
#read the template and change variable information
while read LINE
do
sed 'LINE/<NAME>/Giuseppe' #error: sed: -e expression #1, char 2: extra characters after command
${LINE/<NAME>/Giuseppe} #error: WORD(*) command not found
done < template_mail.txt
(*) WORD is the first word found in the line
I am sorry if the question is too basic, but I cannot see the error and the error message is not helping.
EDIT1:
The input file should not be changed; I want to use it for every mail. Every time I read it, I will substitute a different name according to the recipient.
EDIT2:
Thanks to your answers I am closer to the solution. My example was a simplified case, but I also want to change other data. I want to do multiple substitutions on the same string, but bash seems to allow me only one substitution. In every programming language I have used I was able to substitute within a string, but bash makes this very difficult for me. The following lines don't work:
CUSTOM_MAIL=$(sed 's/<NAME>/Giuseppe/' template_mail.txt) # from file it's ok
CUSTOM_MAIL=$(sed 's/<VALUE>/30/' CUSTOM_MAIL) # from variable doesn't work
I want to modify CUSTOM_MAIL a few times in order to include several pieces of real information.
CUSTOM_MAIL=$(sed 's/<VALUE1>/value1/' template_mail.txt)
${CUSTOM_MAIL/'<VALUE2>'/'value2'}
${CUSTOM_MAIL/'<VALUE3>'/'value3'}
${CUSTOM_MAIL/'<VALUE4>'/'value4'}
What's the way?
No need to do the loop manually. sed command itself runs the expression on each line of provided file:
sed 's/<NAME>/Giuseppe/' template_mail.txt > output_file.txt
You might need g modifier if there are more appearances of the <NAME> string on one line: s/<NAME>/Giuseppe/g
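
Regarding EDIT2: a line consisting only of ${CUSTOM_MAIL/.../...} makes bash run the expanded text as a command, which is why the first word is reported as "command not found"; the result has to be assigned back to the variable. A minimal sketch of two ways to chain substitutions follows, assuming bash; the template contents are invented for the example:

```shell
# Hypothetical template file standing in for template_mail.txt.
template=$(mktemp)
printf '%s\n' 'Dear <NAME>,' 'your balance is <VALUE>.' > "$template"

# Option 1: several substitutions in a single sed call, one -e per expression.
CUSTOM_MAIL=$(sed -e 's/<NAME>/Giuseppe/g' -e 's/<VALUE>/30/g' "$template")

# Option 2: read the template once, then rewrite the variable step by step.
# Each expansion is assigned back; // replaces every occurrence.
CUSTOM_MAIL=$(cat "$template")
CUSTOM_MAIL=${CUSTOM_MAIL//'<NAME>'/Giuseppe}
CUSTOM_MAIL=${CUSTOM_MAIL//'<VALUE>'/30}

printf '%s\n' "$CUSTOM_MAIL"
rm -f "$template"
```

Both options leave the template file unchanged, so it can be reused for every recipient.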

Concatenating strings fails when read from certain files

I have a web application that is deployed to a server. I am trying to create a script that, among other things, reads the current version of the web application from a properties file that is deployed along with the application.
The file looks like this:
//other content
version=[version number]
build=[buildnumber]
//other content
I want to create a variable that looks like this: version-buildnumber
Here is my script for it:
VERSION_FILE=myfile
VERSION_LINE="$(grep "version=" $VERSION_FILE)"
VERSION=${VERSION_LINE#$"version="}
BUILDNUMBER_LINE=$(grep "build=" $VERSION_FILE)
BUILDNUMBER=${BUILDNUMBER_LINE#$"build="}
THEVERSION=${VERSION}-${BUILDNUMBER}
The strange thing is that this works in some cases but not in others.
The problem I get is when I am trying to concatenate the strings (i.e. the last line above). In some cases it works perfectly, but in others characters from one string replace the characters from the other instead of being placed afterwards.
It does not work in these cases:
When I read from the deployed file
If I copy the deployed file to another location and read from there
It does work in these cases:
If I write a file from scratch and read from that one.
If I create my own file and then copy the content from the deployed file into my created file.
I find this very strange. Is there someone out there recognizing this?
It is likely that your files have carriage returns in them. You can fix that by running dos2unix on the file.
You may also be able to do it on the fly on the strings you're retrieving.
Here are a couple of ways:
Do it with sed instead of grep:
VERSION_LINE="$(sed -n "/version=/{s///;s/\r//g;p}" $VERSION_FILE)"
and you won't need the Bash parameter expansion to strip the "version=".
OR
Do the grep as you have it now and do a second parameter expansion to strip the carriage return.
VERSION=${VERSION_LINE#$"version="}
VERSION=${VERSION//$'\r'}
By the way, I recommend habitually using lowercase or mixed case variable names in order to reduce the chance of name collisions.
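
To see the effect end to end, here is a small sketch that builds a file with Windows line endings and strips the trailing carriage return from each value before joining them. The version and build values (1.4 and 77) and all variable names are made up for the example, and it assumes bash for the $'\r' quoting:

```shell
# Hypothetical properties file written with CRLF line endings.
version_file=$(mktemp)
printf '//other content\r\nversion=1.4\r\nbuild=77\r\n//other content\r\n' > "$version_file"

cr=$'\r'   # a literal carriage return

version_line=$(grep '^version=' "$version_file")
build_line=$(grep '^build=' "$version_file")

# Strip the key, then strip one trailing carriage return if present.
version=${version_line#version=}
version=${version%"$cr"}
build=${build_line#build=}
build=${build%"$cr"}

theversion="${version}-${build}"
printf '%s\n' "$theversion"
rm -f "$version_file"
```

Without the two ${...%"$cr"} lines, the carriage return left at the end of the version would move the terminal cursor back to the start of the line, so the build number would be drawn over the version instead of after it.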
Given this foo.txt:
//other content
version=[version number]
build=[buildnumber]
//other content
you can extract a version-build string more easily with awk:
awk -F'=' '$1 == "version" { version = $2}; $1 == "build" { build = $2}; END { print version"-"build}' foo.txt
I don't know why your script doesn't work. Can you provide an example of erroneous output?
From this sentence:
In some cases it works perfectly, but in others characters from one string replace the characters from the other instead of being placed afterwards.
I can't understand what's actually going on (I'm not a native English speaker so it's probably my fault).
Cheers,
Giacomo

Resources