Is it possible to delete a specific string with tr command in a UNIX-Shell?
For example: If I type:
tr -d "1."
and the input is 1.1231, it shows 23 as the output, but I want it to show 1231 (notice that only the first 1 and its dot are gone). How would I do that?
If you know a solution or a better way, please explain the syntax, since I don't want to just copy & paste but to learn.
I have huge problems with awk, so if you use this, please explain it even more.
In your example above the cut command would suffice.
Example: echo '1.1231' | cut -d '.' -f 2 would return 1231.
For more information on cut, just type man cut.
You would be better off using some kind of regex (maybe something like sed).
For example, with the input 1.1231 you could use the following to get the 1231 output:
sed 's/1\.//g'
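For example, on a pipeline:
$ echo '1.1231' | sed 's/1\.//g'
1231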
Maybe have a look here:
http://tldp.org/LDP/abs/html/string-manipulation.html
You could also use sed for this kind of thing:
$ echo "1.1231" | sed -e "s/1\.//"
1231
This is just using sed to run a regular expression search and replace, replacing "1." (with appropriate escaping) with "". It only deletes the first match by default.
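If you wanted every occurrence removed instead, add the g flag. For example (an illustration with a second occurrence added to the input):
$ echo "1.21.3" | sed -e "s/1\.//g"
23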
If you are using bash, you can do this easily with parameter substitution:
$ a=1.1231
$ echo ${a#1.}
1231
This will remove the leading "1." string. If you want to remove everything up to and including the first occurrence, use ${a#*1.}, and if you want to remove everything up to and including the last occurrence, use ${a##*1.}.
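For example, with a value containing two occurrences of "1.":
$ a=x1.y1.z
$ echo ${a#*1.}
y1.z
$ echo ${a##*1.}
z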
The TLDP page on string manipulation has further options (such as substring extraction).
Note that using the standard sh built-in string manipulation tools for such simple transformations will generally be much faster than using an external tool such as sed, awk or cut, because the shell doesn't have to create a sub-process to perform the operation. However, for more complicated things (e.g. when you need regular expressions or when the input is large), you're better off using the dedicated tools.
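A rough way to see the cost of the sub-process yourself (just a sketch; the exact numbers will vary by machine):
$ time for i in {1..1000}; do a=1.1231; b=${a#1.}; done
$ time for i in {1..1000}; do b=$(echo 1.1231 | sed 's/1\.//'); done
The second loop forks a new sed process on every iteration and is typically orders of magnitude slower.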
Since you asked specifically about awk, here is another one.
awk '{ gsub(/1\./,"") }1' input.txt
As any awk tutorial will tell you, the general form of an awk program is a sequence of 'condition { actions }'. If you have no actions, the default action is to print. If you have no conditions, the actions will be taken unconditionally. This program uses both of these special cases.
The first part is an action without a condition, i.e. it will be taken for all lines. The action is to substitute all occurrences of the regular expression /1\./ with nothing. So this will trim any '1.' (regardless of context) from a line.
The second part is a condition without an action, i.e. it will print if the condition is true, and this condition is always true. This is a common idiom for "we are done -- print whatever we have now". It consists simply of the constant 1, which as a condition always evaluates to true.
This could be reformulated in a number of ways. For example, you could factor the print into the first action:
awk '{ gsub(/1\./,""); print }' input.txt
Perhaps you want to substitute the integer part, i.e. any numbers before a period sign. The regex for that would be something like /[0-9]+\./.
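For example (a quick sketch; adjust the regex to your data):
$ echo '12.34' | awk '{ gsub(/[0-9]+\./, "") } 1'
34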
gsub is standard in POSIX awk, but it is missing from some truly ancient pre-POSIX implementations, so if you need portability to legacy awks you may have to fall back on sub in a loop or a match()/substr() construction.
Related
How to tell awk to ignore the delimiter inside double quotation marks
e.g.
line='test,t2,t3,"t5,"'
$(echo $line | awk -F "," '{print $4}')
Expected value is "t5,"
but in fact it prints "t5 (the quoted field is split at the embedded comma)
how to get "t5,"?
With GNU awk for FPAT, all you need for your case is:
$ line='test,t2,t3,"t5,"'
$ echo "$line" | awk -v FPAT='([^,]*)|("[^"]*")' '{print $4}'
"t5,"
and if your input can contain newlines and escaped quotes, then see What's the most robust way to efficiently parse CSV using awk?.
Arbitrary input can't be fixed reliably, but if you know exactly where your input is badly formatted, you can patch around it: here substr() drops the leading quote by starting from index 2 in column 4.
$ echo 'test,t2,t3,"t5,"' | awk -F, '{printf "%s,\n", substr($4,2) }'
t5,
Perhaps this is better.
echo 'test,t2,t3,"t5,"' | awk -F, '{print $(NF-1),$NF}' OFS=,
"t5,"
In the general case, you can't. You need a full parser to remember a tag, change state, then go back to the prior state when it encounters the matching tag. You can't do it with a regular expression unless you make a lot of assumptions about the shape of your data--and since I see you're parsing CSV, those assumptions will not hold true.
If you like awk, I suggest trying perl for this problem. You can either use somebody else's CSV parsing library (search here), or you can write your own. Of course, there's no reason you can't write a CSV parser in pure awk, so long as you understand that this is not what awk is good at. You need to parse character by character (don't separate records by newlines), remember the current state (is the line quoted?) and remember the previous character to see whether it was a backslash (for treating a quote as a literal quote or a comma as a literal comma). You need to remember the previous quote so you can parse "" as an escaped quote instead of a malformed field. It's kind of fun, and it's a bitch. Use somebody else's library if you like. I wouldn't choose awk to write any parser where the records don't have an obvious separator.
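For a taste of what that state machine looks like, here is a minimal sketch in plain awk (my own illustration, not a robust parser: it handles quoted commas and doubled quotes, but not embedded newlines or backslash escapes):
awk '{
    nf = 0; field = ""; inq = 0
    n = length($0)
    for (i = 1; i <= n; i++) {
        c = substr($0, i, 1)
        if (c == "\"") {
            if (inq && substr($0, i + 1, 1) == "\"") { field = field "\""; i++ }  # "" inside quotes -> literal quote
            else inq = !inq                                                       # toggle the in-quotes state
        } else if (c == "," && !inq) {
            f[++nf] = field; field = ""                                           # unquoted comma ends a field
        } else {
            field = field c
        }
    }
    f[++nf] = field
    for (j = 1; j <= nf; j++) print j ": " f[j]
}' file.csv
Fed the line test,t2,t3,"t5," it prints four fields, the last one being t5, with the surrounding quotes stripped.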
Edit: Ed Morton actually did write a full CSV parser for Gawk, which he linked to in his answer. I helped him break it, and he quickly fixed the problem case. His script will be useful, though it will be somewhat unwieldy to adapt to real-world uses.
I am using a SIPP server simulator to verify incoming calls.
What I need to verify is the caller ID and the dialed digits. I've logged this information to a file, which now contains, for example, the following:
From: <sip:972526134661#server>;tag=60=.To: <sip:972526134662#server>}
in each line.
What I want is to modify it to a csv file containing simply the two phone numbers, such as follows:
972526134661,972526134662
and so on.
I've tried using the awk -F command, but then I can only use the sip: as a delimiter or the # or / as delimiters.
Basically, what I want to do is to take all the strings which begin with < and end with >, and then take the strings that follow the sip: delimiter.
Using the cut command is also not an option, as I understand it cannot use strings as delimiters.
I guess it should be really simple, but I haven't found quite the right thing to use. Would appreciate the help, thanks!
OK, for fun, picking some random data (from your original post) and using awk -F as you originally wanted.
Note that because your file is "generated", we can assume a regular format for the data and not expect the "short" patterns to cause mis-hits.
[g]awk -F'sip:|#' -v OFS="," '{print $2,$4}' yourlogfile
It uses both sip: and # as the Field Separator, by means of the alternation operator |. It can easily be extended to allow further characters or strings to also be used to separate fields in the input if required. The built-in variable FS can contain a regular expression/regexp like this.
For that first sample in your question, it yields this:
972526134661,972526134662
For the latest (revision 8) version, and guessing what you want:
[g]awk -F'sip:|#|to_number:' -v OFS="," '{print $2,$5}' yourlogfile
Yields this:
from_number,972526134662
The [g]awk is because I used gawk on my machine, and got same behaviour with awk.
A slight amendment in style, suggested by @fedorqui: use the command-line option -v to set the value of the Output Field Separator (an AWK built-in variable which can be set with -v like any other variable), and separate the print fields with a comma, so that they are treated in the output as two fields rather than a single string built with a hard-coded ",".
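To make the contrast concrete, these two commands produce the same output; the first builds a single string with a hard-coded comma, the second prints two fields joined by OFS:
awk -F'sip:|#' '{print $2 "," $4}' yourlogfile
awk -F'sip:|#' -v OFS=',' '{print $2, $4}' yourlogfile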
I would suggest using sed to extract the two numbers:
$ sed -n 's/^From: <sip:\([0-9]*\).*To: <sip:\([0-9]*\).*/\1,\2/p' file
972526134661,972526134662
The regular expression matches a line beginning with From and captures the two numbers after <sip:. If the spaces are variable, you may want to add * to those places.
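For instance, a variant with optional spaces (my tweak, untested against your real log):
$ sed -n 's/^From: *<sip:\([0-9]*\).*To: *<sip:\([0-9]*\).*/\1,\2/p' file
972526134661,972526134662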
You can use a regex replace, as long as the format stays the same (order is always From/To):
sed -E "s/^.*sip:([0-9]+)#.*sip:([0-9]+)#.*$/\1,\2/"
It's not a very specific or perfect solution, but in most cases an approach like this is enough.
I have a command whose output is of the form:
[{"foo1":<some value>,"foo2":<some value>,"foo3":<some value>}]
I want to take the output of this command and just get the value corresponding to foo2
How do I use sed/awk or any other shell utility readily available in a bash script to do this?
Assuming that the values do not contain commas, this sed rune will do it:
sed -n 's/.*"foo2":\([^,]*\),.*/\1/'p
sed -n tells sed not to print lines by default.
The s ("substitute") command uses a regexp group delimited by \( and \) to pick out just the bit you want.
"foo2": provides the context needed to find the right value.
[^,]* means "a character that is not a comma, any number of times". This is your <some value>. If values are not delimited by commas, change this (and the comma after the grouping parens) to match correctly.
.* means "any character, any number of times", and it is used to match all the characters before and after the bit you want. Now the regexp will match the entire line.
\1 means the contents of the grouping parentheses. sed will substitute the string that matches the pattern (which is the whole line, because we used .* at the beginning and end) with the contents of the parens, <some value>.
Finally, the p on the end means "print the resulting line".
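Putting it together with concrete stand-in values for <some value>:
$ echo '[{"foo1":11,"foo2":222,"foo3":3333}]' | sed -n 's/.*"foo2":\([^,]*\),.*/\1/p'
222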
With this awk for example:
$ awk -F'[:,]' '{print $4}' file
<some value2>
-F'[:,]' sets the possible field separators to : or , (quoted so the shell can't glob the brackets). Then it is a matter of counting the position in which the <some value> of foo2 sits. It happens to be the 4th.
With sed:
$ sed 's/.*"foo2":\([^,]*\).*/\1/g' file
<some value2>
.*"foo2":\([^,]*\).* gets the string coming after foo2: and until the comma appears. Then it prints it back with \1.
Your block of data looks like JSON. There is no native JSON parsing in bash, sed or awk, so ALL the answers here will either suggest that you use a different, more appropriate tool, or they will be hackish and might easily fail if your real data looks different from the example you've provided here.
That said, if you are confident that your variable:value blocks and line structure are always in the same format as this example, you may be able to get away with writing your own (very) basic parser that will work for just your use case.
Note that you can't really parse things in sed, it's just not designed for that. If your data always looks the same, a sed solution may be sufficient ... but remember that you are simply pattern matching, not parsing the input data. There are other answers already which cover this.
For very simple matching of the string that appears after the colon after "foo2", as Peter suggested, you could use the following:
$ data='[{"foo1":11,"foo2":222,"foo3":3333}]'
$ echo "$data" | sed -ne 's/.*"foo2":\([^,]*\),.*/\1/p'
As I say, this should in no way be confused with parsing of your JSON. It would work equally well (or badly) with an input string of abcde"foo2":bar,abcde.
In awk, you can make things that are a bit more advanced, but you still have serious limitations when it comes to JSON. For example, if you choose to separate fields with commas, but then you put a comma inside the <some value> in your data, awk doesn't know how to distinguish it from a field separator.
That said, if your JSON is only one level deep (i.e. matches your sample data), the following might work for you:
$ data='[{"foo1":11,"foo2":222,"foo3":3333}]'
$ echo "$data" | awk -F: -vRS=, '{gsub(/[^[:alnum:]]/,"",$1)} $1=="foo2" {print $2}'
This awk script considers commas as record separators and colons as field separators. It does not support any level of depth in your JSON and depends on alphanumeric variable names, but it should handle JSON split onto multiple lines.
Alternately, if you want to avoid ugly hacks, and perl or python solutions don't work for you, you might want to try out jsawk. With it, you might use something like this:
$ data='[{"foo1":11,"foo2":222,"foo3":3333}]'
$ echo "$data" | jsawk -a 'return this.foo2'
[222]
SEE ALSO: Parsing json with awk/sed in bash to get key value pair
This worked for me. You can try this one:
echo '[{"foo1":<some value>,"foo2":<some value>,"foo3":<some value>}]' | awk -F'[:,]+' '{ if ($3 == "\"foo2\"") print $4 }'
The awk above uses both colon and comma as field separators. Note that with proper shell quoting the third field still carries the JSON double quotes ("foo2", quotes included), so the comparison has to include them too.
Since this looks like JSON, let's parse it like JSON:
perl -MJSON -ne '$json = decode_json($_); print $json->[0]{foo2}, "\n"' <<END
[{"foo1":"some value","foo2":"some, value","foo3":"some value"}]
END
some, value
I've been searching Google forever, and I cannot find an example of how to do this. I also do not grasp the concept of how to construct a regular expression for sed, so I was hoping someone could explain this to me.
I'm running a bash script against a file full of lines of text that look like this: 2222,H,73.82,04,07,2012
and I need to make them all look like this: 2222,H,73.82,04072012
I need to remove the last two commas, which are the 16th and 19th characters in the line.
Can someone tell me how to do that? I was going to use colrm, which is blessedly simple, but I can't seem to get it installed in Cygwin. Please and thank you!
I'd use awk for this:
awk -F',' -v OFS=',' '{ print $1, $2, $3, $4$5$6 }' inputfile
This takes a CSV file and prints the first, second and third fields, each followed by the output field separator (",") and then the fourth, fifth and sixth fields concatenated.
Personally I find this easier to read and maintain than regular expression-based solutions in sed and it will cope well if any of your columns get wider (or narrower!).
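For example, with the sample line from the question:
$ echo "2222,H,73.82,04,07,2012" | awk -F',' -v OFS=',' '{ print $1, $2, $3, $4$5$6 }'
2222,H,73.82,04072012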
This will work on any string and will remove only the last 2 commas:
sed -e 's/\(.*\),\([^,]*\),\([^,]*\)$/\1\2\3/' infile.txt
Note that in my sed variant I have to escape the parentheses; YMMV.
I also do not grasp the concept of how to construct a regular
expression for SED, so I was hoping someone could explain this to me.
The basic notation that people are telling you here is: s/PATTERN/REPLACEMENT/
Your PATTERN is a regular expression, which may contain parts that are in brackets. Those parts can then be referred to in the REPLACEMENT part of the command. For example:
> echo "aabbcc" | sed 's/\(..\)\(..\)\(..\)/\2\3\1/'
bbccaa
Note that the version of sed I'm using defaults to the "basic" RE dialect, where the brackets in expressions need to be escaped. You can do the same thing in the "extended" dialect:
> echo "aabbcc" | sed -E 's/(..)(..)(..)/\2\3\1/'
bbccaa
(In GNU sed, which you'd find on Linux, you can get the same result with the -r option instead of -E. I'm using OS X.)
I should say that for your task, I would definitely follow Johnsyweb's advice and use awk instead of sed. Much easier to understand. :)
It should work:
sed -e 's~,~~4g' file.txt
This removes the 4th comma and every comma after it. Note that combining a match number with the g flag, as in 4g, is a GNU sed extension.
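For example, with / as the delimiter instead of ~ (the two are interchangeable here):
$ echo "2222,H,73.82,04,07,2012" | sed 's/,//4g'
2222,H,73.82,04072012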
echo "2222,H,73.82,04,07,2012" | sed -r 's/(.{15}).(..)./\1\2/'
Take 15 chars, drop one, take 2, drop one.
sed -e 's/\(..\),\(..\),\(....\)$/\1\2\3/' myfile.txt
I want to test whether a phone number is valid, and then translate it to a different format using a script. This far I can test the number like this:
sed -n -e '/(0..)-...\s..../p' -e '/(0..)-...-..../p'
However, I don't just want to test the number and output it, I would like to remove the brackets, dashes and spaces and output that.
Is there any way to do that using sed? Or should I be using something else, like AWK?
I'm not sure why you're using a 0 in that position. You're saying "a zero followed by any two characters" in the area code position. Is that really what you mean?
Anyway, you want to use the sed substitution operator with the p command in conjunction with the -n switch. Here's one way to do it:
sed -n 's/(\([0-9][0-9][0-9]\))\s\?\([0-9][0-9][0-9]\)[- ]\([0-9][0-9][0-9][0-9]\)/\1\2\3/p'
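For example (this assumes GNU sed, since \s and \? are GNU extensions to basic regular expressions):
$ echo '(012) 345-6789' | sed -n 's/(\([0-9][0-9][0-9]\))\s\?\([0-9][0-9][0-9]\)[- ]\([0-9][0-9][0-9][0-9]\)/\1\2\3/p'
0123456789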
You can also use something as simple as egrep to validate lines and tr to remove the characters you don't want to see:
egrep "\([0-9]+\)[0-9.-]+" <file> |tr -d '()\-'
Note that it will only work if you don't want to keep any of those characters.
This is a more succinct version of Jonathan Feinberg's answer. It uses extended regular expressions to avoid having to do all the escaping that the curly braces would require (in addition to moving the escaping of parentheses from the special ones to the literal ones).
sed -r 's/\(([[:digit:]]{3})\)\s?([[:digit:]]{3})[ -]([[:digit:]]{4})/\1\2\3/'
This suggestion depends on what your number format looks like. For example, I assume a phone number like this:
echo "(703) 234 5678" | awk '
{
for(i=1;i<=NF;i++){
gsub(/\(|\)/,"",$i) # remove ( and )
if ($i+0>=0 ){ # check if it more than 0 and a number
print $i
}
if (){
# some other checks
}
}
}
'
Do it systematically, and you don't have to waste time crafting complex regexes.