Bash get last sentence in line - bash

Assume we have the following variable containing a string:
text="All of this is one line. But it consists of multiple sentences. Those are separated by dots. I'd like to get this sentence."
I now need the last sentence "I'd like to get this sentence.". I tried using sed:
echo "$text" | sed 's/.*\.*\.//'
I thought it would delete everything up to the pattern .*.. It doesn't.
What's the issue here? I'm sure this can be resolved quickly, but unfortunately I did not find a solution.

Using awk you can do:
awk -F '\\. *' '{print $(NF-1) "."}' <<< "$text"
I'd like to get this sentence.
Using sed:
sed -E 's/.*\.([^.]+\.)$/\1/' <<< "$text"
I'd like to get this sentence.

Don't forget the built-in
echo "${text##*. }"
This requires a space after the full stop, but the pattern is easy to adapt if you don't want that.
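As for your failed attempt, the regex is valid, but weird. The pattern \.*\. looks for zero or more literal periods followed by a literal period, i.e. effectively one or more period characters. The real problem is the leading .*, which is greedy: it matches as much as it can, so the whole expression matches everything up to and including the last period in the line. Since your text ends with a period, the entire line is deleted.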
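A minimal sketch of the difference, assuming (as in the question) that sentences are separated by a period followed by a space. The variable names original and fixed are only for illustration:

```shell
text="All of this is one line. But it consists of multiple sentences. Those are separated by dots. I'd like to get this sentence."

# Original attempt: the greedy .* runs to the final period, so nothing is left.
original=$(printf '%s\n' "$text" | sed 's/.*\.*\.//')

# Requiring "period + space" stops the match before the last sentence.
fixed=$(printf '%s\n' "$text" | sed 's/.*\. //')

printf 'original: [%s]\n' "$original"
printf 'fixed:    [%s]\n' "$fixed"
```

The second command still relies on greediness, but here it helps: .* extends to the last period that is followed by a space.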

Related

Using sed/awk to process a pattern in bash

I have a command whose output is of the form:
[{"foo1":<some value>,"foo2":<some value>,"foo3":<some value>}]
I want to take the output of this command and just get the value corresponding to foo2
How do I use sed/awk or any other shell utility readily available in a bash script to do this?
Assuming that the values do not contain commas, this sed rune will do it:
sed -n 's/.*"foo2":\([^,]*\),.*/\1/'p
sed -n tells sed not to print lines by default.
The s ("substitute") command uses a regexp group delimited by \( and \) to pick out just the bit you want.
"foo2": provides the context needed to find the right value.
[^,]* means "a character that is not a comma, any number of times". This is your <some value>. If values are not delimited by commas, change this (and the comma after the grouping parens) to match correctly.
.* means "any character, any number of times", and it is used to match all the characters before and after the bit you want. Now the regexp will match the entire line.
\1 means the contents of the grouping parentheses. sed will substitute the string that matches the pattern (which is the whole line, because we used .* at the beginning and end) with the contents of the parens, <some value>.
Finally, the p on the end means "print the resulting line".
With this awk for example:
$ awk -F[:,] '{print $4}' file
<some value2>
-F[:,] sets the field separator to either : or ,. Then it is a matter of counting which field holds the <some value> of foo2. It happens to be the 4th.
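To make the field counting concrete, here is a small sketch with made-up numeric values (the data string is an assumption, not the asker's real output). The separator is quoted so the shell cannot treat [:,] as a glob pattern:

```shell
# Hypothetical sample data in the format from the question.
data='[{"foo1":11,"foo2":222,"foo3":3333}]'

# Splitting on : or , gives  [{"foo1" | 11 | "foo2" | 222 | ...
value=$(printf '%s\n' "$data" | awk -F'[:,]' '{print $4}')
printf '%s\n' "$value"
```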
With sed:
$ sed 's/.*"foo2":\([^,]*\).*/\1/g' file
<some value2>
.*"foo2":\([^,]*\).* gets the string coming after foo2: and until the comma appears. Then it prints it back with \1.
Your block of data looks like JSON. There is no native JSON parsing in bash, sed or awk, so ALL the answers here will either suggest that you use a different, more appropriate tool, or they will be hackish and might easily fail if your real data looks different from the example you've provided here.
That said, if you are confident that your variable:value blocks and line structure are always in the same format as this example, you may be able to get away with writing your own (very) basic parser that will work for just your use case.
Note that you can't really parse things in sed, it's just not designed for that. If your data always looks the same, a sed solution may be sufficient ... but remember that you are simply pattern matching, not parsing the input data. There are other answers already which cover this.
For very simple matching of the string that appears after the colon after "foo2", as Peter suggested, you could use the following:
$ data='[{"foo1":11,"foo2":222,"foo3":3333}]'
$ echo "$data" | sed -ne 's/.*"foo2":\([^,]*\),.*/\1/p'
As I say, this should in no way be confused with parsing of your JSON. It would work equally well (or badly) with an input string of abcde"foo2":bar,abcde.
In awk, you can make things that are a bit more advanced, but you still have serious limitations when it comes to JSON. For example, if you choose to separate fields with commas, but then you put a comma inside the <some value> in your data, awk doesn't know how to distinguish it from a field separator.
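A short illustration of that limitation, using a hypothetical input in which the value of foo1 contains a comma. The same field-counting approach now lands on the wrong field:

```shell
# Hypothetical input: the value of foo1 itself contains a comma.
data='[{"foo1":"a,b","foo2":222,"foo3":3333}]'

# The comma inside "a,b" is treated as a separator, shifting every later field.
fourth=$(printf '%s\n' "$data" | awk -F'[:,]' '{print $4}')
printf 'field 4 is now: %s\n' "$fourth"
```

Field 4 is now the key "foo2" rather than its value, which is why this only works when you control the shape of the data.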
That said, if your JSON is only one level deep (i.e. matches your sample data), the following might work for you:
$ data='[{"foo1":11,"foo2":222,"foo3":3333}]'
$ echo "$data" | awk -F: -vRS=, '{gsub(/[^[:alnum:]]/,"",$1)} $1=="foo2" {print $2}'
This awk script considers commas as record separators and colons as field separators. It does not support any level of depth in your JSON, and depends on alphanumeric variable names. But it should handle JSON split on to multiple lines.
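As a quick check of the multi-line claim, here is the same script run against a hypothetical copy of the data that has been wrapped after each comma. It uses -v RS=, with a space, which should behave the same as -vRS=, above:

```shell
# Hypothetical input: the same one-level JSON, wrapped across three lines.
data='[{"foo1":11,
"foo2":222,
"foo3":3333}]'

# Records end at commas; gsub strips quotes, brackets and newlines from the key.
value=$(printf '%s\n' "$data" |
  awk -F: -v RS=, '{gsub(/[^[:alnum:]]/,"",$1)} $1=="foo2" {print $2}')
printf '%s\n' "$value"
```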
Alternately, if you want to avoid ugly hacks, and perl or python solutions don't work for you, you might want to try out jsawk. With it, you might use something like this:
$ data='[{"foo1":11,"foo2":222,"foo3":3333}]'
$ echo "$data" | jsawk -a 'return this.foo2'
[222]
SEE ALSO: Parsing json with awk/sed in bash to get key value pair
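This worked for me. You can try this one:
echo '[{"foo1":<some value>,"foo2":<some value>,"foo3":<some value>}]' | awk -F'[:,]+' '{ if ($3 == "\"foo2\"") { print $4 } }'
Here awk uses multiple field separators; I have used colon and comma. Note that the input must be single-quoted so that the shell keeps the double quotes, which then have to appear in the comparison as well.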
Since this looks like JSON, let's parse it like JSON:
perl -MJSON -ne '$json = decode_json($_); print $json->[0]{foo2}, "\n"' <<END
[{"foo1":"some value","foo2":"some, value","foo3":"some value"}]
END
some, value

Sed is not replacing all occurrences of pattern

I've got the following variable LINES, with each line in the format date;album;song;duration;singer;author;genre.
August 2013;MDNA;Falling Free;00:31:40;Madonna;Madonna;Pop
August 2013;MDNA;I don't give a;00:45:40;Madonna;Madonna;Pop
August 2013;MDNA;I'm a sinner;01:00:29;Madonna;Madonna;Pop
August 2013;MDNA;Give Me All Your Luvin';01:15:02;Madonna;Madonna;Pop
I want to output author-song, so I made this script:
echo $LINES | sed s_"^[^;]*;[^;]*;\([^;]*\);[^;]*;[^;]*;\([^;]*\)"_"\2-\1"_g
The desired output is:
Madonna-Falling Free
Madonna-I don't give a
Madonna-I'm a sinner
Madonna-Give Me All Your Luvin'
However, I am getting this:
Madonna-Falling Free;Madonna;Pop August 2013;MDNA;I don't give a;00:45:40;Madonna;Madonna;Pop August 2013;MDNA;I'm a sinner;01:00:29;Madonna;Madonna;Pop August 2013;MDNA;Give Me All Your Luvin';01:15:02;Madonna;Madonna;Pop
Why?
EDIT: I need to use sed.
When I run your sed script on your input, I get this output:
Madonna-Falling Free;Pop
Madonna-I don't give a;Pop
Madonna-I'm a sinner;Pop
Madonna-Give Me All Your Luvin';Pop
which is fine except for the extra ;Pop - you just need to add .*$ to the end of your regex so that the entire line is replaced.
Based on your reported output, I'm guessing your input file is using a different newline convention from what sed expects.
In any case, this is a pretty silly thing to use sed for. Much better with awk, for instance:
awk 'BEGIN {FS=";";OFS="-"} {print $6,$3}'
or, slightly more tersely,
awk -F\; -v OFS=- '{print $6,$3}'
If you want sed to see more than one line of input, you must quote the variable to echo:
echo "$LINES" | sed ...
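A small sketch of why the quoting matters, using two of the sample lines. The variable is called songs here (a name chosen for this example) because bash itself may set LINES to the terminal height. The sed expression is the one from the question with .*$ appended so the whole line is consumed:

```shell
songs="August 2013;MDNA;Falling Free;00:31:40;Madonna;Madonna;Pop
August 2013;MDNA;I'm a sinner;01:00:29;Madonna;Madonna;Pop"

# Unquoted: word splitting turns the newline into a space, so sed sees one line.
unquoted_lines=$(echo $songs | awk 'END {print NR}')

# Quoted: the newline survives, so sed sees one line per song.
quoted_lines=$(echo "$songs" | awk 'END {print NR}')

result=$(echo "$songs" |
  sed 's_^[^;]*;[^;]*;\([^;]*\);[^;]*;[^;]*;\([^;]*\).*$_\2-\1_')

printf 'unquoted: %s line(s), quoted: %s line(s)\n' "$unquoted_lines" "$quoted_lines"
printf '%s\n' "$result"
```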
Note that I'm not even going to try to evaluate the correctness of your sed script; using sed here is a travesty, given that awk is so much better suited to the task.
It looks like sed is viewing your entire sample text as a single line. So it is performing the operation requested and then leaving the rest unchanged.
I would look into the newline issue first. How are you populating $LINES?
You should also add to the pattern that seventh field in your input (genre), so that the expression actually does consume all of the text that you want it to. And perhaps anchor the end of the pattern on $ or \b (word boundary) or \s (a spacey character) or \n (newline).
If your format is fixed, try this:
echo "$line" | sed 's#.*;.*;\(.*\);.*;.*;\(.*\);.*#\2-\1#'

How to parse a config file using sed

I've never used sed apart from the few hours trying to solve this. I have a config file with parameters like:
test.us.param=value
test.eu.param=value
prod.us.param=value
prod.eu.param=value
I need to parse these and output this if REGIONID is US:
test.param=value
prod.param=value
Any help on how to do this (with sed or otherwise) would be great.
This works for me:
sed -n 's/\.us\././p'
i.e. if the ".us." can be replaced by a dot, print the result.
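If the region comes from a variable, the same command can be parameterised. This is only a sketch: it assumes REGIONID holds an upper-case code such as US, as in the question, and the names region, config and result are made up for the example:

```shell
REGIONID=US
# The config keys use lower-case region codes, so normalise the variable first.
region=$(printf '%s' "$REGIONID" | tr '[:upper:]' '[:lower:]')

config='test.us.param=value
test.eu.param=value
prod.us.param=value
prod.eu.param=value'

# Double quotes let the shell expand ${region}; the backslashes reach sed intact.
result=$(printf '%s\n' "$config" | sed -n "s/\.${region}\././p")
printf '%s\n' "$result"
```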
If there are hundreds and hundreds of lines, it might be more efficient to first search for lines containing .us. and then do the string replacement. awk is another good choice, or you can pipe grep into sed:
cat INPUT_FILE | grep "\.us\." | sed 's/\.us\./\./g'
Of course if '.us.' can be in the value this isn't sufficient.
You could also do this with the address syntax (technically you can embed the second sed into the first statement as well; I just can't remember the syntax):
sed -n '/\(prod\|test\).us.[^=]*=/p' FILE | sed 's/\.us\./\./g'
We should probably do something cleaner. If the format is always environment.region.param we could look at forcing this only to occur on the text PRIOR to the equal sign.
sed -n 's/^\([^=]*\)\.us\.\([^=]*\)=/\1.\2=/p'
This only matches lines that start with any number of characters other than '=', followed by '.us.', then any number of characters before the '=' sign. This way we won't modify a '.us.' found within a value.
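To see the difference, here is a sketch with a hypothetical config whose second line contains .us. only in the value. The anchored expression below uses [^=]* for both groups and the p flag, so that nothing after the first '=' can take part in the match:

```shell
# Hypothetical config: the second line has ".us." only inside its value.
config='test.us.url=http://a.example
test.eu.url=http://b.us.example'

# Unanchored: also rewrites (and prints) the line where only the value matched.
naive=$(printf '%s\n' "$config" | sed -n 's/\.us\././p')

# Anchored to the key: ".us." must appear before the first "=".
anchored=$(printf '%s\n' "$config" |
  sed -n 's/^\([^=]*\)\.us\.\([^=]*\)=/\1.\2=/p')

printf 'naive:\n%s\n' "$naive"
printf 'anchored:\n%s\n' "$anchored"
```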

Remove nth character from middle of string using Shell

I've been searching Google forever, and I cannot find an example of how to do this. I also do not grasp the concept of how to construct a regular expression for sed, so I was hoping someone could explain this to me.
I'm running a bash script against a file full of lines of text that look like this: 2222,H,73.82,04,07,2012
and I need to make them all look like this: 2222,H,73.82,04072012
I need to remove the last two commas, which are the 16th and 19th characters in the line.
Can someone tell me how to do that? I was going to use colrm, which is blessedly simple, but I can't seem to get that installed in Cygwin. Please and thank you!
I'd use awk for this:
awk -F',' -v OFS=',' '{ print $1, $2, $3, $4$5$6 }' inputfile
This takes a CSV file and prints the first, second and third fields, each followed by the output field separator (",") and then the fourth, fifth and sixth fields concatenated.
Personally I find this easier to read and maintain than regular expression-based solutions in sed and it will cope well if any of your columns get wider (or narrower!).
This will work on any string and will remove only the last 2 commas:
sed -e 's/\(.*\),\([^,]*\),\([^,]*\)$/\1\2\3/' infile.txt
Note that in my sed variant I have to escape parentheses, YMMV.
I also do not grasp the concept of how to construct a regular
expression for SED, so I was hoping someone could explain this to me.
The basic notation that people are telling you here is: s/PATTERN/REPLACEMENT/
Your PATTERN is a regular expression, which may contain parts that are in brackets. Those parts can then be referred to in the REPLACEMENT part of the command. For example:
> echo "aabbcc" | sed 's/\(..\)\(..\)\(..\)/\2\3\1/'
bbccaa
Note that the version of sed I'm using defaults to the "basic" RE dialect, where the brackets in expressions need to be escaped. You can do the same thing in the "extended" dialect:
> echo "aabbcc" | sed -E 's/(..)(..)(..)/\2\3\1/'
bbccaa
(In GNU sed (which you'd find in Linux), you can get the same results with the -r option instead of -E. I'm using OS X.)
I should say that for your task, I would definitely follow Johnsyweb's advice and use awk instead of sed. Much easier to understand. :)
This should work, at least with GNU sed:
sed -e 's~,~~4g' file.txt
It removes the 4th comma and every comma after it.
echo "2222,H,73.82,04,07,2012" | sed -r 's/(.{15}).(..)./\1\2/'
Take 15 chars, drop one, take 2, drop one.
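A brief comparison of counting characters versus anchoring on the commas themselves. The second input line is hypothetical (one column made wider), and the helper names by_position and by_commas are invented for this sketch. Both use -E, which GNU and BSD sed accept:

```shell
line='2222,H,73.82,04,07,2012'
wide='22222,H,173.82,04,07,2012'   # hypothetical line with wider columns

# Position-based: drops characters 16 and 19, whatever they happen to be.
by_position() { sed -E 's/(.{15}).(..)./\1\2/'; }

# Comma-based: drops the last two commas, wherever they are.
by_commas() { sed -E 's/,([^,]*),([^,]*)$/\1\2/'; }

printf '%s\n' "$line" | by_position
printf '%s\n' "$line" | by_commas
printf '%s\n' "$wide" | by_position   # no longer the intended result
printf '%s\n' "$wide" | by_commas
```

Both agree on the original line; only the comma-based version keeps working once a column changes width.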
sed -E 's/(..),(..),(....)$/\1\2\3/' myfile.txt

How to use sed to test and then edit one line of input?

I want to test whether a phone number is valid, and then translate it to a different format using a script. This far I can test the number like this:
sed -n -e '/(0..)-...\s..../p' -e '/(0..)-...-..../p'
However, I don't just want to test the number and output it, I would like to remove the brackets, dashes and spaces and output that.
Is there any way to do that using sed? Or should I be using something else, like AWK?
I'm not sure why you're using a 0 in that position. You're saying "a zero followed by any two characters" in the area code position. Is that really what you mean?
Anyway, you want to use the sed substitution operator with the p command in conjunction with the -n switch. Here's one way to do it:
sed -n 's/(\([0-9][0-9][0-9]\))\s\?\([0-9][0-9][0-9]\)[- ]\([0-9][0-9][0-9][0-9]\)/\1\2\3/p'
You can also use something as simple as egrep to validate lines and tr to remove the characters you don't want to see:
egrep "\([0-9]+\)[0-9.-]+" <file> |tr -d '()\-'
Note that it will only work if you don't want to keep any of those characters.
This is a more succinct version of Jonathan Feinberg's answer. It uses extended regular expressions to avoid having to do all the escaping that the curly braces would require (in addition to moving the escaping of parentheses from the special ones to the literal ones).
sed -r 's/\(([[:digit:]]{3})\)\s?([[:digit:]]{3})[ -]([[:digit:]]{4})/\1\2\3/'
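To both validate and translate in one pass, the substitution can be combined with -n and the p flag, so that lines which do not match are simply not printed. This is a sketch under assumed formats, "(ddd) ddd-dddd" or "(ddd)ddd dddd"; normalize_phone is a hypothetical helper name. It uses -E with a literal space instead of \s, so it should not depend on GNU extensions:

```shell
normalize_phone() {
  # Print only lines that are a complete, valid number, with punctuation removed.
  sed -nE 's/^\(([0-9]{3})\) ?([0-9]{3})[ -]([0-9]{4})$/\1\2\3/p'
}

result=$(printf '%s\n' '(703) 234-5678' '(703)234 5678' 'not a number' |
  normalize_phone)
printf '%s\n' "$result"
```

The ^ and $ anchors make the test stricter than the original, since extra characters before or after the number cause the line to be rejected.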
This suggestion depends on what your number format looks like. For example, I assume a phone number like this:
echo "(703) 234 5678" | awk '
{
    for (i = 1; i <= NF; i++) {
        gsub(/[()]/, "", $i)        # remove ( and )
        if ($i ~ /^[0-9]+$/) {      # keep the field only if it is all digits
            print $i
        }
        # add further checks here, e.g. the expected length of each group
    }
}
'
Do it systematically, and you don't have to waste time crafting a complex regex.

Resources