How to parse a config file using sed - bash

I've never used sed apart from the few hours trying to solve this. I have a config file with parameters like:
test.us.param=value
test.eu.param=value
prod.us.param=value
prod.eu.param=value
I need to parse these and output this if REGIONID is US:
test.param=value
prod.param=value
Any help on how to do this (with sed or otherwise) would be great.

This works for me:
sed -n 's/\.us\././p'
i.e. if the ".us." can be replaced by a dot, print the result.
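Since the question mentions a REGIONID variable, here is a minimal sketch of the same substitution with the region taken from that variable. It assumes REGIONID arrives in upper case while the keys in the file are lower case; the sample data is copied from the question.

```shell
# Sample data copied from the question.
config='test.us.param=value
test.eu.param=value
prod.us.param=value
prod.eu.param=value'

REGIONID=US   # assumption: the region is supplied in upper case

# The keys use lower case, so normalise the region first.
region=$(printf '%s' "$REGIONID" | tr '[:upper:]' '[:lower:]')

# Double quotes let the shell expand $region; \\. reaches sed as \.
result=$(printf '%s\n' "$config" | sed -n "s/\\.$region\\././p")
printf '%s\n' "$result"
```

This only stays safe as long as the region value contains no characters that are special to sed, such as / or &.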

If there are hundreds and hundreds of lines it might be more efficient to first search for lines containing .us. and then do the string replacement. awk is another good choice, or you can pipe grep into sed:
grep '\.us\.' INPUT_FILE | sed 's/\.us\././g'
Of course if '.us.' can be in the value this isn't sufficient.
You could also do this with the address syntax (technically you can embed the second sed into the first statement as well; I just can't remember the syntax):
sed -n '/\(prod\|test\)\.us\.[^=]*=/p' FILE | sed 's/\.us\././g'
We should probably do something cleaner. If the format is always environment.region.param, we can restrict the substitution to the text before the equals sign:
sed -n 's/^\([^.=]*\)\.us\.\([^=]*\)=/\1.\2=/p'
This only matches lines that start with an environment name (any characters other than '.' and '='), followed by '.us.', then any number of characters before the '=' sign. This way we won't modify a '.us.' that appears within a value.
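As awk was mentioned as another option, here is a rough sketch of the same idea: split each line at the first '=' and only look at the key. It assumes keys always have exactly three dot-separated parts (environment.region.param); the sample values containing '.us.' are made up to show that values are left alone.

```shell
# Hypothetical sample: the first value contains ".us." on purpose.
config='test.us.url=http://a.us.example
test.eu.url=http://a.eu.example
prod.us.param=value'

result=$(printf '%s\n' "$config" | awk -v region=us '{
    eq = index($0, "=")              # position of the first "="
    if (eq == 0) next                # skip lines that are not key=value
    key  = substr($0, 1, eq - 1)
    rest = substr($0, eq)            # "=" plus the untouched value
    n = split(key, part, "[.]")      # split the key on literal dots
    if (n == 3 && part[2] == region) print part[1] "." part[3] rest
}')
printf '%s\n' "$result"
```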

Related

bash how to extract a field based on its content from a delimited string

Problem - I have a set of strings that essentially look like this:
|AAAAAA|BBBBBB|CCCCCCC|...|XXXXXXXXX|...|ZZZZZZZZZ|
The '...' denotes omitted fields.
Please note that the fields between the pipes ('|') can appear in ANY ORDER and not all fields are necessarily present. My task is to find the "XXXXXXX" field and extract it from the string; I can specify that field with a regex and find it with grep/awk/etc., but once I have that one line extracted from the file, I am at a loss as to how to extract just that text between the pipes.
My searches have turned up splitting the line into individual fields and then extracting the Nth field, however, I do not know what N is, that is the trick.
I've thought of splitting the string by the delimiter, substituting the delimiter with a newline, piping those lines into a grep for the field, but that involves running another program and this will be run on a production server through near-TB of data, so I wanted to minimize program invocations. And I cannot copy the files to another machine nor do I have the benefit of languages like Python, Perl, etc., I'm stuck with the "standard" UNIX commands on SunOS. I think I'm being punished.
Thanks
As an example, let's extract the field that matches MyField:
Using sed
$ s='|AAAAAA|BBBBBB|CCCCCCC|...|XXXXXXXXX|12MyField34|ZZZZZZZZZ|'
$ sed -E 's/.*[|]([^|]*MyField[^|]*)[|].*/\1/' <<<"$s"
12MyField34
Using awk
$ awk -F\| -v re="MyField" '{for (i=1;i<=NF;i++) if ($i~re) print $i}' <<<"$s"
12MyField34
Using grep -P
$ grep -Po '(?<=\|)[^|]*MyField[^|]*' <<<"$s"
12MyField34
The -P option requires GNU grep.
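Because the question asks to keep program invocations low, the awk variant can also do the line selection itself, so no separate grep is needed. A small sketch, assuming only the first matching field per line is wanted; the three sample lines are invented for illustration.

```shell
# Hypothetical input: the field can be anywhere, or missing entirely.
data='|AAAAAA|BBBBBB|12MyField34|ZZZZZZZZZ|
|CCCCCCC|DDDDDD|
|MyField99|AAAAAA|'

result=$(printf '%s\n' "$data" | awk -F'|' -v re='MyField' '{
    for (i = 1; i <= NF; i++)
        if ($i ~ re) { print $i; break }   # stop at the first hit on this line
}')
printf '%s\n' "$result"
```

Lines without a matching field simply produce no output, so this is one process per file instead of grep plus sed.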
$ sed -e 's/^.*|\(XXXXXXXXX\)|.*$/\1/'
Naturally, this only makes sense if XXXXXXXXX is a regular expression.
This should be really fast if used like this:
$ grep '|XXXXXXXXX|' somefile | sed -e ...
One hackish way -
sed 's/^.*|\(<whatever your regex is>\)|.*$/\1/'
but that might be too slow for your production server since it may involve a fair amount of regex backtracking.

Use awk to extract value from a line

I have these two lines within a file:
<first-value system-property="unique.setting.limit">3</first-value>
<second-value-limit>50000</second-value-limit>
where I'd like to get the following as output using awk or sed:
3
50000
Using this sed command does not work as I had hoped, and I suspect this is due to the presence of the quotes and delimiters in my line entry.
sed -n '/WORD1/,/WORD2/p' /path/to/file
How can I extract the values I want from the file?
awk -F'[<>]' '{print $3}' input.txt
input.txt:
<first-value system-property="unique.setting.limit">3</first-value>
<second-value-limit>50000</second-value-limit>
Output:
3
50000
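To see why the value ends up in $3, it can help to print every field that awk produces when it splits on < and >. A quick sketch using the second sample line:

```shell
line='<second-value-limit>50000</second-value-limit>'

# Print each field with its index; the text before the first "<" is an empty $1.
fields=$(printf '%s\n' "$line" |
    awk -F'[<>]' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }')
printf '%s\n' "$fields"
# 1=[]
# 2=[second-value-limit]
# 3=[50000]
# 4=[/second-value-limit]
# 5=[]
```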
sed -e 's/[a-zA-Z.<\/>=" -]//g' file
Using sed:
sed -E 's/.*limit"*>([0-9]+)<.*/\1/' file
Explanation:
.* takes care of everything that comes before the string limit
limit"* takes care of both the lines, one with limit" and the other one with just limit
([0-9]+) takes care of matching numbers and only numbers as stated in your requirement.
\1 is a back-reference to the first capture group. When a pattern wraps all or part of its content in a pair of parentheses, it captures that content and stores it temporarily in memory, and \1 inserts it into the replacement. For more details, please refer to https://www.inkling.com/read/introducing-regular-expressions-michael-fitzgerald-1st/chapter-4/capturing-groups-and
The script solution with parameter expansion:
#!/bin/bash
while IFS= read -r line || test -n "$line" ; do
value="${line%<*}"
printf "%s\n" "${value##*\>}"
done <"$1"
output:
$ ./ltags.sh dat/ltags.txt
3
50000
Looks like XML to me, so assuming it forms part of some valid XML, e.g.
<root>
<first-value system-property="unique.setting.limit">3</first-value>
<second-value-limit>50000</second-value-limit>
</root>
You can use Perl's XML::Simple and do something like this:
perl -MXML::Simple -E '$xml = XMLin("file"); say $xml->{"first-value"}->{"content"}; say $xml->{"second-value-limit"}'
Output:
3
50000
If the XML structure is more complicated, then you may have to drill down a bit deeper to get to the values you want. If that's the case, you should edit the question to show the bigger picture.
Ashkan's awk solution is straightforward, but let me suggest a sed solution that accepts non-integer numbers:
sed -n 's/[^>]*>\([.[:digit:]]*\)<.*/\1/p' input.txt
This extracts the number between the first > character of the line and the following <. In my RE this "number" can be the empty string; if you don't want to accept an empty string, add the -r option to sed and replace \([.[:digit:]]*\) with ([.[:digit:]]+).

grep pipe searching for one word, not line

For some reason I cannot get this to output just the version of this line. I suspect it has something to do with how grep interprets the dash.
This command:
admin#DEV:~/TEMP$ sendemail
Yields the following:
sendemail-1.56 by Brandon Zehm
More output below omitted
The first line is of interest. I'm trying to store the version to variable.
TESTVAR=$(sendemail | grep '\s1.56\s')
Does anyone see what I am doing wrong? Thanks
TESTVAR is just empty. Even without TESTVAR, the output is empty.
I just tried the following too, thinking this might work.
sendemail | grep '\<1.56\>'
I just tried it again while editing, and I think I have another issue. Perhaps I'm not handling the output correctly. It's outputting the entire line, but I can see that grep is finding 1.56 because it highlights it in the line.
$ TESTVAR=$(echo 'sendemail-1.56 by Brandon Zehm' | grep -Eo '1.56')
$ echo $TESTVAR
1.56
The point is grep -Eo '1.56'
from grep man page:
-E, --extended-regexp
Interpret PATTERN as an extended regular expression (ERE, see below). (-E is specified by POSIX.)
-o, --only-matching
Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
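Note that the dot in '1.56' is unescaped, so it matches any character, and the version number is hard-coded. A hedged sketch of a more general pattern, assuming the version always looks like digits, a dot, then digits (the -o option is supported by GNU and BSD grep but is not part of POSIX):

```shell
line='sendemail-1.56 by Brandon Zehm'

# Escaped dot, unknown version: take the first digits.digits match.
TESTVAR=$(printf '%s\n' "$line" | grep -Eo '[0-9]+\.[0-9]+' | head -n 1)
printf '%s\n' "$TESTVAR"

# For comparison: the unescaped dot also accepts other characters.
loose=$(printf '%s\n' 'build 1x56' | grep -Eo '1.56')
printf '%s\n' "$loose"
```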
Your regular expression doesn't match the form of the version. You have specified that the version is surrounded by spaces, yet in front of it you have a dash.
Replace the first \s with the capitalized form \S, or explicit set of characters and it should work.
I'm wondering: In your example you seem to know the version (since you grep for it), so you could just assign the version string to the variable. I assume that you want to obtain any (unknown) version string there. The regular expression for this in sed could be (using POSIX character classes):
sendemail |sed -n -r '1 s/sendemail-([[:digit:]]+\.[[:digit:]]+).*/\1/ p'
The -n suppresses the normal default output of every line; -r enables extended regular expressions; the leading 1 tells sed to only work on line 1 (I assume the version appears in the first line). I anchored the version number to the telltale string sendemail- so that potential other numbers elsewhere in that line are not matched. If the program name changes or the hyphen goes away in future versions, this wouldn't match any longer though.
Both the grep solution above and this one have the disadvantage of reading the whole output, which (as emails go these days) may be long. In addition, grep would find all other lines in the program's output which contain the pattern (if it's indeed emails, somebody might discuss this problem in them, with examples!). If it's indeed the first line, piping through head -n 1 first would be efficient and prudent.
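Putting the head and sed suggestions together could look roughly like this. fake_sendemail is a stand-in I made up so the snippet runs without the real program; it assumes the version is on the first line, right after "sendemail-". The pattern is written as a basic regular expression, so it needs no -r.

```shell
# Hypothetical stand-in for the real sendemail command.
fake_sendemail() {
    printf '%s\n' 'sendemail-1.56 by Brandon Zehm' 'More output that mentions 1.56 again'
}

# Keep only line 1, then print just the captured version number.
TESTVAR=$(fake_sendemail | head -n 1 |
    sed -n 's/^sendemail-\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p')
printf '%s\n' "$TESTVAR"
```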
jayadevan#jayadevan-Vostro-2520:~$ echo $sendmail
sendemail-1.56 by Brandon Zehm
jayadevan#jayadevan-Vostro-2520:~$ echo $sendmail | cut -f2 -d "-" | cut -f1 -d" "
1.56

Insert line after match using sed

For some reason I can't seem to find a straightforward answer to this and I'm on a bit of a time crunch at the moment. How would I go about inserting a choice line of text after the first line matching a specific string using the sed command. I have ...
CLIENTSCRIPT="foo"
CLIENTFILE="bar"
And I want insert a line after the CLIENTSCRIPT= line resulting in ...
CLIENTSCRIPT="foo"
CLIENTSCRIPT2="hello"
CLIENTFILE="bar"
Try doing this using GNU sed:
sed '/CLIENTSCRIPT="foo"/a CLIENTSCRIPT2="hello"' file
if you want to substitute in-place, use
sed -i '/CLIENTSCRIPT="foo"/a CLIENTSCRIPT2="hello"' file
Output
CLIENTSCRIPT="foo"
CLIENTSCRIPT2="hello"
CLIENTFILE="bar"
Doc
see sed doc and search \a (append)
Note the standard sed syntax (as in POSIX, so supported by all conforming sed implementations around (GNU, OS/X, BSD, Solaris...)):
sed '/CLIENTSCRIPT=/a\
CLIENTSCRIPT2="hello"' file
Or on one line:
sed -e '/CLIENTSCRIPT=/a\' -e 'CLIENTSCRIPT2="hello"' file
(-e expressions (and the contents of -f files) are joined with newlines to make up the sed script that sed interprets.)
The -i option for in-place editing is also a GNU extension, some other implementations (like FreeBSD's) support -i '' for that.
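If -i is not available or behaves differently, one way to get the same effect is to write to a temporary file and move it over the original. A minimal sketch using the standard a\ form; the file name client.conf is just an example, and mktemp -d is assumed to be available.

```shell
tmpdir=$(mktemp -d)
file="$tmpdir/client.conf"      # hypothetical file name
printf '%s\n' 'CLIENTSCRIPT="foo"' 'CLIENTFILE="bar"' > "$file"

# Standard form of the a command: backslash, newline, then the text.
sed '/CLIENTSCRIPT=/a\
CLIENTSCRIPT2="hello"' "$file" > "$file.tmp" && mv "$file.tmp" "$file"

result=$(cat "$file")
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

The && keeps the original untouched if sed fails.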
Alternatively, for portability, you can use perl instead:
perl -pi -e '$_ .= qq(CLIENTSCRIPT2="hello"\n) if /CLIENTSCRIPT=/' file
Or you could use ed or ex:
printf '%s\n' /CLIENTSCRIPT=/a 'CLIENTSCRIPT2="hello"' . w q | ex -s file
Sed command that works on MacOS (at least, OS 10) and Unix alike (ie. doesn't require gnu sed like Gilles' (currently accepted) one does):
sed -e '/CLIENTSCRIPT="foo"/a\'$'\n''CLIENTSCRIPT2="hello"' file
This works in bash and maybe other shells too that know the $'\n' evaluation quote style. Everything can be on one line and work in older/POSIX sed commands. If there might be multiple lines matching the CLIENTSCRIPT="foo" (or your equivalent) and you wish to only add the extra line the first time, you can rework it as follows:
sed -e '/^ *CLIENTSCRIPT="foo"/b ins' -e b -e ':ins' -e 'a\'$'\n''CLIENTSCRIPT2="hello"' -e ': done' -e 'n;b done' file
(this creates a loop after the line insertion code that just cycles through the rest of the file, never getting back to the first sed command again).
You might notice I added a '^ *' to the matching pattern in case that line shows up in a comment, say, or is indented. It's not 100% perfect but covers some other situations likely to be common. Adjust as required...
These two solutions also get around the problem (for the generic solution to adding a line) that if your new inserted line contains unescaped backslashes or ampersands, they will be interpreted by sed and likely not come out the same, just like the \n is; e.g. \0 would be the first line matched. This is especially handy if you're adding a line that comes from a variable, where you'd otherwise have to escape everything first using ${var//} or another sed statement.
This solution is a little less messy in scripts (though that quoting and \n is not easy to read) when you don't want to put the replacement text for the a command at the start of a line, say in a function with indented lines. I've taken advantage of the fact that $'\n' is evaluated to a newline by the shell; it's not in regular '\n' single-quoted values.
It's getting long enough though that I think perl or even awk might win due to being more readable.
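For comparison, a sketch of the "first match only" case in awk. The new text is passed through the environment (the variable name NEW_LINE is my own choice) so that backslashes and ampersands in it are printed literally; the sample input with two matching lines is made up.

```shell
input='CLIENTSCRIPT="foo"
CLIENTFILE="bar"
CLIENTSCRIPT="foo"'

result=$(printf '%s\n' "$input" |
    NEW_LINE='CLIENTSCRIPT2="hello & \0"' awk '
        { print }                                  # copy every line through
        !done && /^ *CLIENTSCRIPT="foo"/ {         # first matching line only
            print ENVIRON["NEW_LINE"]
            done = 1
        }')
printf '%s\n' "$result"
```

ENVIRON is used rather than -v because awk processes escape sequences in -v assignments.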
A POSIX compliant one using the s command:
sed '/CLIENTSCRIPT="foo"/s/.*/&\
CLIENTSCRIPT2="hello"/' file
Maybe a bit late to post an answer for this, but I found some of the above solutions a bit cumbersome.
I tried simple string replacement in sed and it worked:
sed 's/CLIENTSCRIPT="foo"/&\nCLIENTSCRIPT2="hello"/' file
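The & stands for the matched string; then you add \n and the new line. Note that \n in the replacement is interpreted as a newline by GNU sed; other sed implementations may not support it.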
As mentioned, if you want to do it in-place:
sed -i 's/CLIENTSCRIPT="foo"/&\nCLIENTSCRIPT2="hello"/' file
Another thing. You can match using an expression:
sed -i 's/CLIENTSCRIPT=.*/&\nCLIENTSCRIPT2="hello"/' file
Hope this helps someone
The awk variant :
awk '1;/CLIENTSCRIPT=/{print "CLIENTSCRIPT2=\"hello\""}' file
I had a similar task, and was not able to get the above perl solution to work.
Here is my solution:
perl -i -pe "BEGIN{undef $/;} s/^\[mysqld\]$/[mysqld]\n\ncollation-server = utf8_unicode_ci\n/sgm" /etc/mysql/my.cnf
Explanation:
Uses a regular expression to search for a line in my /etc/mysql/my.cnf file that contained only [mysqld] and replaced it with
[mysqld]
collation-server = utf8_unicode_ci
effectively adding the collation-server = utf8_unicode_ci line after the line containing [mysqld].
I had to do this recently for both macOS and Linux. After browsing through many posts and trying many things out, I never got to where I wanted, which was a solution that is simple to understand, uses well-known standard commands with simple patterns, fits in one line, is portable, and can be extended with more constraints. Then I looked at it from a different perspective and realized I could do without the "one-liner" requirement if a "two-liner" met the rest of my criteria. In the end I came up with this solution, which works on both Ubuntu and Mac:
insertLine=$(( $(grep -n "foo" sample.txt | cut -f1 -d: | head -1) + 1 ))
sed -i -e "$insertLine"' i\'$'\n''bar'$'\n' sample.txt
In first command, grep looks for line numbers containing "foo", cut/head selects 1st occurrence, and the arithmetic op increments that first occurrence line number by 1 since I want to insert after the occurrence.
In second command, it's an in-place file edit, "i" for inserting: an ansi-c quoting new line, "bar", then another new line. The result is adding a new line containing "bar" after the "foo" line. Each of these 2 commands can be expanded to more complex operations and matching.

Limiting SED to the first 10 characters of a line

I'm running sed as a part of a shell script to clean up bind logs for insertion into a database.
One of the sed commands is the following:
sed -i 's/-/:/g' $DPath/named.query.log
This turns out to be problematic as it disrupts any resource requests that also include a dash (I'm using : as a delimiter for an awk statement further down).
My question is how do I limit the sed command above to only the first ten characters of the line? I haven't seen a specific switch that does this, and I'm nowhere near good enough with RegEx to even start on developing one that works. I can't just use regex to match the preceding numbers because it's possible that the pattern could be part of a resource request. Heck, I can't even use pattern matching for ####-##-## because, again, it could be part of the resource.
Any ideas are much appreciated.
It's [almost always] simpler with awk:
awk '{target=substr($0,1,10); gsub(/-/,":",target); print target substr($0,11)}' file
I think the shortest solution, and perhaps the simplest, is provided by sed itself, rather than awk[ward]:
sed "h;s/-/:/g;G;s/\(..........\).*\n........../\1/"
Explanation:
(h) copy everything to the hold space
(s) do the substitution (to the entire pattern space)
(G) append the hold space, with a \n separator
(s) delete the characters up to the tenth after the \n, but keep the first ten.
Some test code:
echo "--------------------------------" > foo
sed -i "h;s/-/:/g;G;s/\(..........\).*\n........../\1/" foo
cat foo
::::::::::----------------------
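A quick check of the hold-space trick against the awk approach on something closer to a log line. The sample line is invented (the real bind log format may differ), and both commands assume lines of at least ten characters.

```shell
# Hypothetical log line with dashes in the date and in the host name.
line='2013-05-01 10:00:00 client 10.0.0.1: query: my-host.example.com IN A'

via_sed=$(printf '%s\n' "$line" |
    sed 'h;s/-/:/g;G;s/\(..........\).*\n........../\1/')

via_awk=$(printf '%s\n' "$line" |
    awk '{ head = substr($0, 1, 10); gsub(/-/, ":", head); print head substr($0, 11) }')

printf '%s\n' "$via_sed" "$via_awk"
```

Both should leave the dash in my-host alone and only change the date.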
I'm not sure how to make sed do it per se; however, I do know that you can feed sed the first 10 characters and then paste the rest back in, like so:
paste -d"\0" <(cut -c1-10 $DPath/named.query.log | sed 's/\-/:/g') <(cut -c11- $DPath/named.query.log)
You can do the following:
cut -c 1-10 $DPath/named.query.log | sed 's/-/:/g'
The cut statement takes only the first 10 characters of each line in that file, and sed replaces the dashes in them. The output should be redirected to a file; as written, it is just printed to your terminal.
