Cutting a string using multiple delimiters using the awk or sed commands - bash

I am using a SIPP server simulator to verify incoming calls.
What I need to verify is the caller ID and the dialed digits. I've logged this information to a file, which now contains, for example, the following:
From: <sip:972526134661#server>;tag=60=.To: <sip:972526134662#server>}
in each line.
What I want is to modify it to a csv file containing simply the two phone numbers, such as follows:
and so on.
I've tried using the awk -F command, but then I can only use the sip: as a delimiter or the # or / as delimiters.
Basically, what I want to do is take every string that begins with < and ends with >, and then take whatever follows the sip: delimiter.
Using the cut command is also not an option, as I understand that it cannot use strings as delimiters.
I guess it should be really simple, but I haven't found quite the right thing to use. I would appreciate the help, thanks!

OK, for fun, picking some random data (from your original post) and using awk -F as you originally wanted.
To note, because your file is "generated", we can assume a regular format for the data and not expect the "short" patterns to cause mis-hits.
[g]awk -F'sip:|#' -v OFS="," '{print $2,$4}' yourlogfile
It uses both sip: and # as the Field Separator, by means of the alternation operator |. It can easily be extended to allow further characters or strings to also be used to separate fields in the input if required. The built-in variable FS can contain a regular expression/regexp like this.
For that first sample in your question, it yields this:
For the latest (revision 8) version, and guessing what you want:
[g]awk -F'sip:|#|to_number:' -v OFS="," '{print $2,$5}' yourlogfile
Yields this:
The [g]awk is because I used gawk on my machine, and got the same behaviour with awk.
A slight amendment in style, suggested by @fedorqui: use the command-line option -v to set the Output Field Separator (OFS, an awk built-in variable which can be set with -v like any other variable) and separate the print fields with a comma. That way they are treated as fields in the output, rather than building a string with a hard-coded "," and treating it as one field.
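As a quick check of the field splitting described above, here is a minimal sketch run against the sample line from the question. The variable names line and result are only for illustration.

```shell
# Sample line copied from the question (the names "line" and "result" are illustrative).
line='From: <sip:972526134661#server>;tag=60=.To: <sip:972526134662#server>}'

# FS is a regex: a field ends at either "sip:" or "#".
# Fields: $1="From: <"  $2=caller  $3="server>;tag=60=.To: <"  $4=dialed  $5="server>}"
result=$(printf '%s\n' "$line" | awk -F'sip:|#' -v OFS=',' '{ print $2, $4 }')
printf '%s\n' "$result"
```

Because the numbers are printed as separate fields, changing OFS is enough to switch the output to another delimiter.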

I would suggest using sed to extract the two numbers:
$ sed -n 's/^From: <sip:\([0-9]*\).*To: <sip:\([0-9]*\).*/\1,\2/p' file
The regular expression matches a line beginning with From and captures the two numbers after <sip:. If the spaces are variable, you may want to add * to those places.
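A minimal sketch of the variable-spacing variant mentioned above, assuming the spaces can vary only directly after From: and To:. The input line is made up for illustration.

```shell
# Hypothetical line with irregular spacing after "From:" and "To:".
line='From:   <sip:111#server>;tag=1.To:<sip:222#server>}'

# " *" after each label accepts zero or more spaces before "<sip:".
result=$(printf '%s\n' "$line" |
  sed -n 's/^From: *<sip:\([0-9]*\).*To: *<sip:\([0-9]*\).*/\1,\2/p')
printf '%s\n' "$result"
```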

You can use a regex replace, as long as the format stays the same (order is always From/To):
sed -E "s/^.*sip:([0-9]+)#.*sip:([0-9]+)#.*$/\1,\2/"
It's not a very specific or perfect solution, but in most cases an approach like this is enough.


Logfile reformatting with regex

I'm using grep to filter certain lines from a log file and present them to my conky config.
The log file is /var/log/messages.
The entries pertain to UFW block events.
The trouble is that I only care about certain strings of each line.
I can grep only the UFW blocks, but the line is too long to fit in conky.
Even if conky were not part of the equation, learning to only show pieces of a log line would benefit me in future.
I have got somewhere by using the following:
grep -Ewoh '(IN=([a-z]){4,})|((DST|SRC)=(([0-9]){1,3}\.){3,}([0-9]){1,3})|(PROTO=[a-z]{2,6})|((SPT|DPT)=[0-9]{1,5})' /var/log/messages
This ugly-looking regex is filtering for entries (like) this:
Which (nearly) line-for-line is as such:
The problem is that this produces a new line for each matching word, where I just want the filtered strings of each line, in their line.
$ grep -Ewoh '(IN=([a-z]){4,})|((DST|SRC)=(([0-9]){1,3}\.){3,}([0-9]){1,3})|(PROTO=[a-z]{2,6})|((SPT|DPT)=[0-9]{1,5})' /var/log/messages
I'd rather not use a very complicated awk method unless I can walk back into it a few months later and remember it easily. awk is incredible, but it can be difficult to digest if you drop the ball just once!
Thank you.
If I understand correctly, instead of the list produced by grep -o, you want to remove the non-matching strings and print only the matching strings in place, i.e. in the lines and order in which they appear.
Using gawk's FPAT:
gawk -v FPAT='my-regex' '$1=$1'
Replace my-regex with the regex for strings you want to see.
This will print the matches on each line, in order, delimited by a space.
Add -v OFS= to remove the space, or for example, -v OFS=', ' to change the delimiting string.
You were using grep -w to match a whole word. You can do this in a gawk regex by using \< and \> for left and right word boundary (respectively).
For example, add parentheses and word boundaries around the whole list of 'or' operators (|):
-v FPAT='\<((IN=([a-z]){4,})|((DST|SRC)=(([0-9]){1,3}\.){3,}([0-9]){1,3})|(PROTO=[a-z]{2,6})|((SPT|DPT)=[0-9]{1,5}))\>'
Note the bugs in your regex, as tshiono commented: for example, it won't match PROTO=TCP, because [a-z] only matches lowercase letters.
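FPAT is specific to gawk. Where only a POSIX awk is available, roughly the same per-line result can be sketched with a match() loop instead. This is a different technique from FPAT, shown only as an illustration: the log line and the simplified regex (key names followed by any non-space characters) are assumptions, not taken from the question.

```shell
# Hypothetical UFW-style line; real log lines carry more fields.
line='kernel: [UFW BLOCK] IN=eth0 OUT= SRC=192.168.1.10 DST=10.0.0.1 LEN=40 PROTO=TCP SPT=51234 DPT=22'

result=$(printf '%s\n' "$line" | awk '{
  out = ""; rest = $0
  # Repeatedly take the leftmost match and continue after it.
  while (match(rest, /(IN|SRC|DST|PROTO|SPT|DPT)=[^ ]+/)) {
    sep = (out == "") ? "" : " "
    out = out sep substr(rest, RSTART, RLENGTH)
    rest = substr(rest, RSTART + RLENGTH)
  }
  if (out != "") print out
}')
printf '%s\n' "$result"
```

Unlike grep -w, this simplified regex does not enforce word boundaries, so a key that merely ends in one of the listed names would also match.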

bash how to extract a field based on its content from a delimited string

Problem - I have a set of strings that essentially look like this:
The '...' denotes omitted fields.
Please note that the fields between the pipes ('|') can appear in ANY ORDER and not all fields are necessarily present. My task is to find the "XXXXXXX" field and extract it from the string; I can specify that field with a regex and find it with grep/awk/etc., but once I have that one line extracted from the file, I am at a loss as to how to extract just that text between the pipes.
My searches have turned up splitting the line into individual fields and then extracting the Nth field, however, I do not know what N is, that is the trick.
I've thought of splitting the string by the delimiter, substituting the delimiter with a newline, piping those lines into a grep for the field, but that involves running another program and this will be run on a production server through near-TB of data, so I wanted to minimize program invocations. And I cannot copy the files to another machine nor do I have the benefit of languages like Python, Perl, etc., I'm stuck with the "standard" UNIX commands on SunOS. I think I'm being punished.
As an example, let's extract the field that matches MyField:
Using sed
$ sed -E 's/.*[|]([^|]*MyField[^|]*)[|].*/\1/' <<<"$s"
Using awk
$ awk -F\| -v re="MyField" '{for (i=1;i<=NF;i++) if ($i~re) print $i}' <<<"$s"
Using grep -P
$ grep -Po '(?<=\|)[^|]*MyField[^|]*' <<<"$s"
The -P option requires GNU grep.
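Since the sample string is not shown here, the following sketch uses a made-up record to try the awk and sed variants above. The contents of s are an assumption for illustration only.

```shell
# Hypothetical pipe-delimited record; the field order is arbitrary.
s='alpha=1|beta=2|MyField=42|gamma=3'

# awk: scan every field and print those matching the regex in "re".
from_awk=$(printf '%s\n' "$s" |
  awk -F'|' -v re='MyField' '{ for (i = 1; i <= NF; i++) if ($i ~ re) print $i }')

# sed: the wanted field must have a pipe on both sides for this pattern to match.
from_sed=$(printf '%s\n' "$s" | sed -E 's/.*[|]([^|]*MyField[^|]*)[|].*/\1/')

printf '%s\n%s\n' "$from_awk" "$from_sed"
```

The awk loop does not care where the field sits, whereas the sed pattern as written would miss a matching field at the very start or end of the line.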
$ sed -e 's/^.*|\(XXXXXXXXX\)|.*$/\1/'
Naturally, this only makes sense if XXXXXXXXX is a regular expression.
This should be really fast if used like this:
$ grep '|XXXXXXXXX|' somefile | sed -e ...
One hackish way -
sed 's/^.*|\(<whatever your regex is>\)|.*$/\1/'
but that might be too slow for your production server since it may involve a fair amount of regex backtracking.

How to pull a value from between 2 strings which occur several times in a file

I am trying to pull the value from in between 2 strings and put each result on its own line. I am then hoping to combine this with another value from the same document, pulled the same way. The problem is that there are NO line breaks in this file and it is quite large. Here is an example of the file.
My end result would ideally look something like this.
My closest attempts so far have been creating variables with grep, but I can't seem to format them into a table. I'm also very new to scripting, so forgive my ignorance.
If your grep supports -P (--perl-regexp), you can use the regex below.
$ grep -oP '<ID>\K[^<>]*(?=</ID>)|<DNS_NAME>\K[^<>]*(?=</DNS_NAME>)' file | sed 'N;s/\n/-----/g'
\K discards the previously matched characters from the reported match, so they are not printed.
(?=...) is a positive lookahead assertion: it requires the given text to follow the match, but does not consume any characters.
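A small sketch of how \K and the lookahead behave, assuming GNU grep built with PCRE support. The input is invented to mimic a single-line file with alternating ID and DNS_NAME elements.

```shell
# Hypothetical single-line input; the real file is much larger.
xml='<ID>101</ID><DNS_NAME>host-a.example</DNS_NAME><ID>102</ID><DNS_NAME>host-b.example</DNS_NAME>'

# -o prints each match on its own line; \K drops the opening tag from the match,
# and the lookahead requires the closing tag without including it.
# sed then joins each pair of lines (ID, DNS_NAME) with "-----".
result=$(printf '%s\n' "$xml" |
  grep -oP '<ID>\K[^<>]*(?=</ID>)|<DNS_NAME>\K[^<>]*(?=</DNS_NAME>)' |
  sed 'N;s/\n/-----/')
printf '%s\n' "$result"
```

The pairing done by sed relies on every ID being followed by exactly one DNS_NAME. If an element can be missing, the lines will fall out of step.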
Here is a GNU awk solution (GNU awk is needed due to the multi-character RS) to get your data:
awk -v RS="<ID>" -F"<|>" 'NR>1 {print $1"-----"$9}' file

Using sed/awk to process a pattern in bash

I have a command whose output is of the form:
[{"foo1":<some value>,"foo2":<some value>,"foo3":<some value>}]
I want to take the output of this command and just get the value corresponding to foo2
How do I use sed/awk or any other shell utility readily available in a bash script to do this?
Assuming that the values do not contain commas, this sed rune will do it:
sed -n 's/.*"foo2":\([^,]*\),.*/\1/'p
sed -n tells sed not to print lines by default.
The s ("substitute") command uses a regexp group delimited by \( and \) to pick out just the bit you want.
"foo2": provides the context needed to find the right value.
[^,]* means "a character that is not a comma, any number of times". This is your <some value>. If values are not delimited by commas, change this (and the comma after the grouping parens) to match correctly.
.* means "any character, any number of times", and it is used to match all the characters before and after the bit you want. Now the regexp will match the entire line.
\1 means the contents of the grouping parentheses. sed will substitute the string that matches the pattern (which is the whole line, because we used .* at the beginning and end) with the contents of the parens, <some value>.
Finally, the p on the end means "print the resulting line".
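The pattern above expects a comma after the value, so it would not match the last key in the object. One possible adjustment, sketched here with made-up numeric values, is to stop at either a comma or a closing brace and drop the literal comma from the pattern.

```shell
# Sample data in the shape shown in the question, with made-up numeric values.
data='[{"foo1":11,"foo2":222,"foo3":3333}]'

# [^,}]* stops at a comma or a closing brace, so the last key works as well.
foo2=$(printf '%s\n' "$data" | sed -n 's/.*"foo2":\([^,}]*\).*/\1/p')
foo3=$(printf '%s\n' "$data" | sed -n 's/.*"foo3":\([^,}]*\).*/\1/p')
printf '%s %s\n' "$foo2" "$foo3"
```

This still assumes that values contain neither commas nor closing braces.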
With this awk for example:
$ awk -F'[:,]' '{print $4}' file
<some value2>
-F'[:,]' sets the possible field separators to : or , (the quotes stop the shell from treating the brackets as a glob pattern). Then it is a matter of counting the position of the <some value> belonging to foo2, which happens to be the 4th field.
With sed:
$ sed 's/.*"foo2":\([^,]*\).*/\1/g' file
<some value2>
.*"foo2":\([^,]*\).* gets the string coming after foo2: and until the comma appears. Then it prints it back with \1.
Your block of data looks like JSON. There is no native JSON parsing in bash, sed or awk, so ALL the answers here will either suggest that you use a different, more appropriate tool, or they will be hackish and might easily fail if your real data looks different from the example you've provided here.
That said, if you are confident that your variable:value blocks and line structure are always in the same format as this example, you may be able to get away with writing your own (very) basic parser that will work for just your use case.
Note that you can't really parse things in sed, it's just not designed for that. If your data always looks the same, a sed solution may be sufficient ... but remember that you are simply pattern matching, not parsing the input data. There are other answers already which cover this.
For very simple matching of the string that appears after the colon after "foo2", as Peter suggested, you could use the following:
$ data='[{"foo1":11,"foo2":222,"foo3":3333}]'
$ echo "$data" | sed -ne 's/.*"foo2":\([^,]*\),.*/\1/p'
As I say, this should in no way be confused with parsing of your JSON. It would work equally well (or badly) with an input string of abcde"foo2":bar,abcde.
In awk, you can make things that are a bit more advanced, but you still have serious limitations when it comes to JSON. For example, if you choose to separate fields with commas, but then you put a comma inside the <some value> in your data, awk doesn't know how to distinguish it from a field separator.
That said, if your JSON is only one level deep (i.e. matches your sample data), the following might work for you:
$ data='[{"foo1":11,"foo2":222,"foo3":3333}]'
$ echo "$data" | awk -F: -vRS=, '{gsub(/[^[:alnum:]]/,"",$1)} $1=="foo2" {print $2}'
This awk script considers commas as record separators and colons as field separators. It does not support any level of depth in your JSON, and depends on alphanumeric variable names. But it should handle JSON split on to multiple lines.
Alternately, if you want to avoid ugly hacks, and perl or python solutions don't work for you, you might want to try out jsawk. With it, you might use something like this:
$ data='[{"foo1":11,"foo2":222,"foo3":3333}]'
$ echo "$data" | jsawk -a 'return this.foo2'
SEE ALSO: Parsing json with awk/sed in bash to get key value pair
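This worked for me. You can try this one:
echo '[{"foo1":<some value>,"foo2":<some value>,"foo3":<some value>}]' | awk -F'[:,]+' '{ if ($3 == "\"foo2\"") print $4 }'
The awk command above uses multiple field separators; I have used colon and comma here. The input is single-quoted so that its double quotes reach awk intact, which is why the comparison is against "foo2" including the quotes.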
Since this looks like JSON, let's parse it like JSON:
perl -MJSON -ne '$json = decode_json($_); print $json->[0]{foo2}, "\n"' <<END
[{"foo1":"some value","foo2":"some, value","foo3":"some value"}]
END
some, value

Delete a specific string with tr

Is it possible to delete a specific string with tr command in a UNIX-Shell?
For example: If I type:
tr -d "1."
and the input is 1.1231, it would show 23 as an output, but I want it to show 1231 (notice only the first 1 has gone). How would I do that?
If you know a solution or a better way, please explain the syntax since I don't want to just copy&paste but also to learn.
I have huge problems with awk, so if you use this, please explain it even more.
In your example above the cut command would suffice.
Example: echo '1.1231' | cut -d '.' -f 2 would return 1231.
For more information on cut, just type man cut.
You would be better off using some kind of regex (maybe something like sed).
For example, with the input 1.1231 you could use the following to get the 1231 output:
sed 's/1\.//g'
You could also use sed for this kind of thing:
$ echo "1.1231" | sed -e "s/1\.//"
This is just using sed to run a regular expression search and replace, replacing "1." (with appropriate escaping) with "". It only deletes the first match by default.
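To see the difference between the default behaviour and the g flag, here is a short sketch using an invented input in which "1." occurs twice.

```shell
# Hypothetical input containing "1." twice.
input='1.1.5'

first_only=$(printf '%s\n' "$input" | sed -e 's/1\.//')    # removes only the first "1."
all_matches=$(printf '%s\n' "$input" | sed -e 's/1\.//g')  # g flag removes every "1."
printf '%s %s\n' "$first_only" "$all_matches"
```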
If you are using bash, you can do this easily with parameter substitution:
$ a=1.1231
$ echo ${a#1.}
This will remove the leading "1." string. If you want to remove everything up to and including the first occurrence, use ${a#*1.}, and if you want to remove everything up to and including the last occurrence, use ${a##*1.}.
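The three forms can be compared side by side. The value of b below is made up so that "1." appears twice and not at the start, which makes the differences visible.

```shell
# Hypothetical value with "1." in the middle, twice.
b='x1.y1.z'

anchored=${b#1.}     # pattern must match at the very start; here it does not, so b is unchanged
shortest=${b#*1.}    # removes the shortest prefix ending in "1."
longest=${b##*1.}    # removes the longest prefix ending in "1."
printf '%s %s %s\n' "$anchored" "$shortest" "$longest"
```

Note that these are shell glob patterns rather than regular expressions, so the dot is a literal dot and * matches any run of characters.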
The TLDP page on string manipulation has further options (such as substring extraction).
Note that using standard sh built-in string manipulation tools for such simple transformations is typically much faster than using an external tool such as sed, awk or cut, because the shell doesn't have to create a sub-process to perform the operation. However, for more complicated things (e.g. when you need regular expressions or when the input is large), you're better off using the dedicated tools.
Since you asked specifically about awk, here is another one.
awk '{ gsub(/1\./,"") }1' input.txt
As any awk tutorial will tell you, the general form of an awk program is a sequence of 'condition { actions }'. If you have no actions, the default action is to print. If you have no conditions, the actions will be taken unconditionally. This program uses both of these special cases.
The first part is an action without a condition, i.e. it will be taken for all lines. The action is to substitute all occurrences of the regular expression /1\./ with nothing. So this will trim any '1.' (regardless of context) from a line.
The second part is a condition without an action, i.e. it will print if the condition is true, and the condition is always true. This is a common idiom for "we are done -- print whatever we have now". It consists simply of the constant 1 (which when used as a condition means "true", simply).
This could be reformulated in a number of ways. For example, you could factor the print into the first action;
awk '{ gsub(/1\./,""); print }' input.txt
Perhaps you want to substitute the integer part, i.e. any numbers before a period sign. The regex for that would be something like /[0-9]+\./.
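Note that gsub (like sub) is specified by POSIX rather than being a GNU extension, but it is missing from some legacy awk implementations (the old /usr/bin/awk on Solaris, for example). If you need portability to those, use nawk or another POSIX-conforming awk.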
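As an illustration of the integer-part regex mentioned above, here is a minimal sketch that strips everything up to and including the first period when the line starts with digits. The two input values are made up, and the ^ anchor is an addition so that only a leading integer part is removed.

```shell
# Hypothetical inputs, one number per line.
result=$(printf '1.1231\n12.345\n' | awk '{ sub(/^[0-9]+\./, "") } 1')
printf '%s\n' "$result"
```

sub replaces only the first match on each line, which is all that is needed once the pattern is anchored.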
