Printing only the matched pattern between <start> and <end> in sed - bash

Input -
ajflkajlkaj
ddddddddddd/need/to/print/thiskkkkkkkkkkkk
ahfkjahkfja
Output needed
/need/to/print/this
Googling and tweaking I was able to extract "this" like by doing this-
sed -n 's/^.*\(this\).*$/\1/p' myfile
but can't seem to get the syntax right for getting the rest of the pattern starting from from "/need".
I would appreciate any pointers/suggestions.
Thanks in advance.

You need to let us know what the ddd and kkk really stand for. Assuming there is some white space behind "this", you can match this pattern instead:
sed -n 's/^[^\/]*\(.*\)\s.*$/\1/p' myfile
[^\/]* means match as many non-/ characters first.

Try this :
sed -n 's![^/]\+\(.*/this\).*!\1!p' filename.txt
NOTE
the sed separator can be what you want instead of the default /. I use ! here.
If you prefer, a simple grep command will works too :
grep --binary-files=text -o '/.*/this' filename.txt

Related

How to replace 2 characters if they are side by side in bash?

I'm looking to replace 2 characters in a string only if they are side by side.
Output I have now
hu1.cqf51:qu-2/2
hu2.cqf55:qe-2/2
hu2.cqf41:qe-2/2
The first line is incorrect as it has "qu" instead of "qe"
I'm looking to replace the qu with qe without replacing any other "q" or "e" in the string
Desired output
hu1.cqf51:qe-2/2
hu2.cqf55:qe-2/2
hu2.cqf41:qe-2/2
What I have tried
sed -r 's/[qu]+/qe/g'
sed -e 's//qe/g'
sed 's/\S*\(qu\)\S*//g'
Even considered just trying to delete the whole word alone if it matched with the command below, however it deleted everything with a q or u in it.
sed -e 's/[^ ]*qu[^ ]*//g'
Thank you for your help!!
Can you just try this ?
sed -e 's/qu/qe/' your_file
I'd suggest
sed 's/:qu-/:qe-/'
but if this is a standard format there is probably a standard way.

Extracting all but a certain sequence of characters in Bash

In bash I need to extract a certain sequence of letters and numbers from a filename. In the example below I need to extract just the S??E?? section of the filenames. This must work with both upper/lowercase.
my.show.s01e02.h264.aac.subs.mkv
great.s03e12.h264.Dolby.mkv
what.a.fab.title.S05E11.Atmos.h265.subs.eng.mp4
Expected output would be:
s01e02
s03e12
S05E11
I've been trying to do this with SED but can't get it to work. This is what I have tried, without success:
sed 's/.*s[0-9][0-9]e[0-9][0-9].*//'
Many thanks for any help.
With sed we can match the desired string in a capture group, and use the I suffix for case-insensitive matching, to accomplish the desired result.
For the sake of this answer I'm assuming the filenames are in a file:
$ cat fnames
my.show.s01e02.h264.aac.subs.mkv
great.s03e12.h264.Dolby.mkv
what.a.fab.title.S05E11.Atmos.h265.subs.eng.mp4
One sed solution:
$ sed -E 's/.*\.(s[0-9][0-9]e[0-9][0-9])\..*/\1/I' fnames
s01e02
s03e12
S05E11
Where:
-E - enable extended regex support
\.(s[0-9][0-9]e[0-9][0-9])\. - match s??e?? with a pair of literal periods as bookends; the s??e?? (wrapped in parens) will be stored in capture group #1
\1 - print out capture group #1
/I - use case-insensitive matching
I think your pattern is ok. With the grep -o you get only the matched part of a string instead of matching lines. So
grep -io 'S[0-9]{2}E[0-9]{2}'
solves your problem. Compared to your pattern only numbers will be matched. Maybe you can put it in an if, so lines without a match show that something is wrong with the filename.
Suppose you have those file names:
$ ls -1
great.s03e12.h264.Dolby.mkv
my.show.s01e02.h264.aac.subs.mkv
what.a.fab.title.S05E11.Atmos.h265.subs.eng.mp4
You can extract the substring this way:
$ printf "%s\n" * | sed -E 's/^.*([sS][0-9][0-9][eE][0-9][0-9]).*/\1/'
Or with grep:
$ printf "%s\n" *.m* | grep -o '[sS][0-9][0-9][eE][0-9][0-9]'
Either prints:
s03e12
s01e02
S05E11
You could use that same sed or grep on a file (with filenames in it) as well.

Use awk to extract value from a line

I have these two lines within a file:
<first-value system-property="unique.setting.limit">3</first-value>
<second-value-limit>50000</second-value-limit>
where I'd like to get the following as output using awk or sed:
3
50000
Using this sed command does not work as I had hoped, and I suspect this is due to the presence of the quotes and delimiters in my line entry.
sed -n '/WORD1/,/WORD2/p' /path/to/file
How can I extract the values I want from the file?
awk -F'[<>]' '{print $3}' input.txt
input.txt:
<first-value system-property="unique.setting.limit">3</first-value>
<second-value-limit>50000</second-value-limit>
Output:
3
50000
sed -e 's/[a-zA-Z.<\/>= \-]//g' file
Using sed:
sed -E 's/.*limit"*>([0-9]+)<.*/\1/' file
Explanation:
.* takes care of everything that comes before the string limit
limit"* takes care of both the lines, one with limit" and the other one with just limit
([0-9]+) takes care of matching numbers and only numbers as stated in your requirement.
\1 is actually a shortcut for capturing pattern. When a pattern groups all or part of its content into a pair of parentheses, it captures that content and stores it temporarily in memory. For more details, please refer https://www.inkling.com/read/introducing-regular-expressions-michael-fitzgerald-1st/chapter-4/capturing-groups-and
The script solution with parameter expansion:
#!/bin/bash
while read line || test -n "$line" ; do
value="${line%<*}"
printf "%s\n" "${value##*\>}"
done <"$1"
output:
$ ./ltags.sh dat/ltags.txt
3
50000
Looks like XML to me, so assuming it forms part of some valid XML, e.g.
<root>
<first-value system-property="unique.setting.limit">3</first-value>
<second-value-limit>50000</second-value-limit>
</root>
You can use Perl's XML::Simple and do something like this:
perl -MXML::Simple -E '$xml = XMLin("file"); say $xml->{"first-value"}->{"content"}; say $xml->{"second-value-limit"}'
Output:
3
50000
If the XML structure is more complicated, then you may have to drill down a bit deeper to get to the values you want. If that's the case, you should edit the question to show the bigger picture.
Ashkan's awk solution is straightforward, but let me suggest a sed solution that accepts non-integer numbers:
sed -n 's/[^>]*>\([.[:digit:]]*\)<.*/\1/p' input.txt
This extracts the number between the first > character of the line and the following <. In my RE this "number" can be the empty string, if you don't want to accept an empty string please add the -r option to sed and replace \([.[:digit:]]*\) by ([.[:digit:]]+).

How to parse a config file using sed

I've never used sed apart from the few hours trying to solve this. I have a config file with parameters like:
test.us.param=value
test.eu.param=value
prod.us.param=value
prod.eu.param=value
I need to parse these and output this if REGIONID is US:
test.param=value
prod.param=value
Any help on how to do this (with sed or otherwise) would be great.
This works for me:
sed -n 's/\.us\././p'
i.e. if the ".us." can be replaced by a dot, print the result.
If there are hundreds and hundreds of lines it might be more efficient to first search for lines containing .us. and then do the string replacement... AWK is another good choice or pipe grep into sed
cat INPUT_FILE | grep "\.us\." | sed 's/\.us\./\./g'
Of course if '.us.' can be in the value this isn't sufficient.
You could also do with with the address syntax (technically you can embed the second sed into the first statement as well just can't remember syntax)
sed -n '/\(prod\|test\).us.[^=]*=/p' FILE | sed 's/\.us\./\./g'
We should probably do something cleaner. If the format is always environment.region.param we could look at forcing this only to occur on the text PRIOR to the equal sign.
sed -n 's/^\([^,]*\)\.us\.\([^=]\)=/\1.\2=/g'
This will only work on lines starting with any number of chars followed by '.' then 'us', then '.' and then anynumber prior to '=' sign. This way we won't potentially modify '.us.' if found within a "value"

Remove nth character from middle of string using Shell

I've been searching google for ever, and I cannot find an example of how to do this. I also do not grasp the concept of how to construct a regular expression for SED, so I was hoping someone could explain this to me.
I'm running a bash script against a file full of lines of text that look like this: 2222,H,73.82,04,07,2012
and I need to make them all look like this: 2222,H,73.82,04072012
I need to remove the last two commas, which are the 16th and 19th characters in the line.
Can someone tell me how to do that? I was going to use colrm, which is blessedly simple, but i can't seem to get that installed in CYGWIN. Please and thank you!
I'd use awk for this:
awk -F',' -v OFS=',' '{ print $1, $2, $3, $4$5$6 }' inputfile
This takes a CSV file and prints the first, second and third fields, each followed by the output field separator (",") and then the fourth, fifth and sixth fields concatenated.
Personally I find this easier to read and maintain than regular expression-based solutions in sed and it will cope well if any of your columns get wider (or narrower!).
This will work on any string and will remove only the last 2 commas:
sed -e 's/\(.*\),\([^,]*\),\([^,]*\)$/\1\2\3/' infile.txt
Note that in my sed variant I have to escape parenthesis, YMMV.
I also do not grasp the concept of how to construct a regular
expression for SED, so I was hoping someone could explain this to me.
The basic notation that people are telling you here is: s/PATTERN/REPLACEMENT/
Your PATTERN is a regular expression, which may contain parts that are in brackets. Those parts can then be referred to in the REPLACEMENT part of the command. For example:
> echo "aabbcc" | sed 's/\(..\)\(..\)\(..\)/\2\3\1/'
bbccaa
Note that in the version of sed I'm using defaults to the "basic" RE dialect, where the brackets in expressions need to be escaped. You can do the same thing in the "extended" dialect:
> echo "aabbcc" | sed -E 's/(..)(..)(..)/\2\3\1/'
bbccaa
(In GNU sed (which you'd find in Linux), you can get the same results with the -r options instead of -E. I'm using OS X.)
I should say that for your task, I would definitely follow Johnsyweb's advice and use awk instead of sed. Much easier to understand. :)
It should work :
sed -e 's~,~~4g' file.txt
remove 4th and next commas
echo "2222,H,73.82,04,07,2012" | sed -r 's/(.{15}).(..)./\1\2/'
Take 15 chars, drop one, take 2, drop one.
sed -e 's/(..),(..),(....)$/\1\2\3/' myfile.txt

Resources