bash: How to extract episode number from a string - bash

Suppose you have this string variable in bash:
filename="House Of Lies 5x02 HDTV XviD [DivxTotaL]"
What can I do to get the 5x02 part?
I've tried with grep with no luck:
echo "$filename" > grep -c '[0-9]x[0-9]{2}'

The option -c which you are passing with grep is wrong
-c Only a count of selected lines is written to standard output.
$ echo $filename | grep -oE '[0-9]{1,2}x[0-9]{1,3}'
5x02
-o Prints only the matching part of the lines.
-E Extended Regex

echo "$filename" | egrep -o '[0-9]x[0-9]{2}'
>file redirects output to a file; |cmd pipes it to another command. -c counts the number of matches, which isn't useful here; -o outputs the matching string(s). To be able to use {2} you need to enable extended regexes, which egrep does.

Related

User input into variables and grep a file for pattern

H!
So I am trying to run a script which looks for a string pattern.
For example, from a file I want to find 2 words, located separately
"I like toast, toast is amazing. Bread is just toast before it was toasted."
I want to invoke it from the command line using something like this:
./myscript.sh myfile.txt "toast bread"
My code so far:
text_file=$1
keyword_first=$2
keyword_second=$3
find_keyword=$(cat $text_file | grep -w "$keyword_first""$keyword_second" )
echo $find_keyword
i have tried a few different ways. Directly from the command line I can make it run using:
cat myfile.txt | grep -E 'toast|bread'
I'm trying to put the user input into variables and use the variables to grep the file
You seem to be looking simply for
grep -E "$2|$3" "$1"
What works on the command line will also work in a script, though you will need to switch to double quotes for the shell to replace variables inside the quotes.
In this case, the -E option can be replaced with multiple -e options, too.
grep -e "$2" -e "$3" "$1"
You can pipe to grep twice:
find_keyword=$(cat $text_file | grep -w "$keyword_first" | grep -w "$keyword_second")
Note that your search word "bread" is not found because the string contains the uppercase "Bread". If you want to find the words regardless of this, you should use the case-insensitive option -i for grep:
find_keyword=$(cat $text_file | grep -w -i "$keyword_first" | grep -w -i "$keyword_second")
In a full script:
#!/bin/bash
#
# usage: ./myscript.sh myfile.txt "toast" "bread"
text_file=$1
keyword_first=$2
keyword_second=$3
find_keyword=$(cat $text_file | grep -w -i "$keyword_first" | grep -w -i "$keyword_second")
echo $find_keyword

pattern matching in the filename and change extension - bash script

I want to use name-last.txt files to call another several files in previous directories which names belong to the filename string:
For example, for Perez-Castillo.txt, I want to used: (1) grep in Perez-Castillo.txt, (2) grep in Perez.list and (3) grep in Castillo.list.
I have this part:
for i in *.txt;
do
wc -l $i > out1.txt
grep -c "something" ../${i%-*}.list > out2.txt
grep -c "something" ../${i#*-}.list > out3.txt
done;
However, I fail to call i.e Castillo.list, as my script is calling Castillo.txt.list
Any suggestion?
Bash doesn't let you nest two transformations into a single parameter expansion, so there is no way to delete both a prefix and a suffix with a parameter expansion.
So the simplest approach is to just remove the .txt extension at the beginning:
for i in *.txt; do
pfx=${i%.txt}
wc -l "${pfx}.txt" > out1.txt
grep -c "something" "../${pfx%-*}.list" > out2.txt
grep -c "something" "../${pfx#*-}.list" > out3.txt
done;

How to grep and match the first occurrence of a line?

Given the following content:
title="Bar=1; Fizz=2; Foo_Bar=3;"
I'd like to match the first occurrence of Bar value which is 1. Also I don't want to rely on soundings of the word (like double quote in the front), because the pattern could be in the middle of the line.
Here is my attempt:
$ grep -o -m1 'Bar=[ ./0-9a-zA-Z_-]\+' input.txt
Bar=1
Bar=3
I've used -m/--max-count which suppose to stop reading the file after num matches, but it didn't work. Why this option doesn't work as expected?
I could mix with head -n1, but I wondering if it is possible to achieve that with grep?
grep is line-oriented, so it apparently counts matches in terms of lines when using -m[1]
- even if multiple matches are found on the line (and are output individually with -o).
While I wouldn't know to solve the problem with grep alone (except with GNU grep's -P option - see anubhava's helpful answer), awk can do it (in a portable manner):
$ awk -F'Bar=|;' '{ print $2 }' <<<"Bar=1; Fizz=2; Foo_Bar=3;"
1
Use print "Bar=" $2, if the field name should be included.
Also note that the <<< method of providing input via stdin (a so-called here-string) is specific to Bash, Ksh, Zsh; if POSIX compliance is a must, use echo "..." | grep ... instead.
[1] Options -m and -o are not part of the grep POSIX spec., but both GNU and BSD/OSX grep support them and have chosen to implement the line-based logic.
This is consistent with the standard -c option, which counts "selected lines", i.e., the number of matching lines:
grep -o -c 'Bar=[ ./0-9a-zA-Z_-]\+' <<<"Bar=1; Fizz=2; Foo_Bar=3;" yields 1.
Using perl based regex flavor in gnu grep you can use:
grep -oP '^(.(?!Bar=\d+))*Bar=\d+' <<< "Bar=1; Fizz=2; Foo_Bar=3;"
Bar=1
(.(?!Bar=\d+))* will match 0 or more of any characters that don't have Bar=\d+ pattern thus making sure we match first Bar=\d+
If intent is to just print the value after = then use:
grep -oP '^(.(?!Bar=\d+))*Bar=\K\d+' <<< "Bar=1; Fizz=2; Foo_Bar=3;"
1
You can use grep -P (assuming you are on gnu grep) and positive look ahead ((?=.*Bar)) to achieve that in grep:
echo "Bar=1; Fizz=2; Foo_Bar=3;" | grep -oP -m 1 'Bar=[ ./0-9a-zA-Z_-]+(?=.*Bar)'
First use a grep to make the line start with Bar, and then get the Bar at the start of the line:
grep -o "Bar=.*" input.txt | grep -o -m1 "^Bar=[ ./0-9a-zA-Z_-]\+"
When you have a large file, you can optimize with
grep -o -m1 "Bar=.*" input.txt | grep -o -m1 "^Bar=[ ./0-9a-zA-Z_-]\+"

How could I redirect file name into counts by tab using one line commands in bash?

I have some files in fasta format and want to counts their reads and would like to have output in file names and their corresponding counts.
input file names:
1.fa
2.fa
3.fa
...
I tried:
for i in $(ls -t -v *.fa); do grep -c '>' $i > echo $i >> out.txt ; done
Problem:
It gives me out.txt but double file names and their counts by ':' separated. However, I need a tab and unique file names.
1.fa:7323580
1.fa:7323580
2.fa:5591179
2.fa:5591179
...
Suggested solution
grep -c '>' *.fa | sed 's/:/'$'\t'/ > out.txt
The $'\t\' is a Bash-ism called ANSI C Quoting.
Analysis of what went wrong
Your code is:
for i in $(ls -t -v *.fa); do grep -c '>' $i > echo $i >> out.txt ; done
It isn't a good idea to parse the output of the ls command. However, if your file names are well behaved (roughly, in the portable filename character set, which is [-A-Za-z._]), you'll be reasonably OK.
Your grep command, though, is confused. It is:
grep -c '>' $i > echo $i >> out.txt
That could be written more clearly as:
grep -c '>' $i $i > echo >> out.txt
This means 'count the number of lines containing > in $i, and then in $i again, and send the output first to a file echo, and then append to out.txt. Since the append overrides the redirection, the file echo is empty. You get the file name included in the output because there are two files to search; with only one file, you wouldn't get the file name too. (One way to ensure you get file names with regular (not -c or -l) grep is to scan /dev/null too. Many versions of grep also provide options to get the name explicitly, but POSIX doesn't mandate one. BSD grep uses -H; so does GNU grep.)
So, that's why you got the double file names and entries in your output.
Try this:
for i in $(ls -t -v *.fa)
do
c=$(grep -c '>' $i | awk -F: '{print $2}')
echo "$i: $c" >> out.txt
done

How to pass output of grep to sed?

I have a command like this :
cat error | grep -o [0-9]
which is printing only numbers like 2,30 and so on. Now I wish to pass this number to sed.
Something like :
cat error | grep -o [0-9] | sed -n '$OutPutFromGrep,$OutPutFromGrepp'
Is it possible to do so?
I'm new to shell scripting. Thanks in advance
If the intention is to print the lines that grep returns, generating a sed script might be the way to go:
grep -E -o '[0-9]+' error | sed 's/$/p/' | sed -f - error
You are probably looking for xargs, particularly the -I option:
themel#eristoteles:~$ xargs -I FOO echo once FOO, twice FOO
hi
once hi, twice hi
there
once there, twice there
Your example:
themel#eristoteles:~$ cat error
error in line 123
error in line 234
errors in line 345 and 346
themel#eristoteles:~$ grep -o '[0-9]*' < error | xargs -I OutPutFromGrep echo sed -n 'OutPutFromGrep,OutPutFromGrepp'
sed -n 123,123p
sed -n 234,234p
sed -n 345,345p
sed -n 346,346p
For real-world use, you'll probably want to pass sed an input file and remove the echo.
(Fixed your UUOC, by the way. )
Yes you can pass output from grep to sed.
Please note that in order to match whole numbers you need to use [0-9]* not only [0-9] which would match only a single digit.
Also note you should use double quotes to get variables expanded(in the sed argument) and it seems you have a typo in the second variable name.
Hope this helps.

Resources