How to get only matching part of a string? - bash

I know how to use sed in bash, and grep is also good. But they always output the whole line containing the pattern.
Is there a way in bash to cut out a particular string containing the pattern I want (using regexp)?

grep has a flag for this. From grep --help:
-o, --only-matching show only the part of a line matching PATTERN
So you can do:
$ echo 'hello world' | grep -o hello
hello
Using sed:
$ echo "hello world" | sed 's/.*\(hello\).*/\1/'
hello

With GNU grep you can use -o:
-o Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
Example:
$ echo ab cd ef gh | grep a
ab cd ef gh
$ echo ab cd ef gh | grep -o a
a

With GNU sed (the \| alternation and the -r flag are GNU extensions):
$ echo 'hello world' | sed 's/\(hello\)\|./\1/g'
hello
$ echo 'hello world' | sed -r 's/(hello)|./\1/g'
hello
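If you want to stay inside bash without calling grep or sed, the `[[ string =~ regex ]]` test stores the matched text in the BASH_REMATCH array. Here is a minimal sketch, assuming bash rather than plain sh; the function name extract_match is made up for illustration:

```shell
#!/usr/bin/env bash
# extract_match STRING REGEX  (hypothetical helper name)
# Prints the first part of STRING that matches REGEX (POSIX extended syntax)
# and returns non-zero when nothing matches.
extract_match() {
    local string=$1 regex=$2
    if [[ $string =~ $regex ]]; then
        # Element 0 of BASH_REMATCH holds the text matched by the whole regex.
        printf '%s\n' "${BASH_REMATCH[0]}"
    else
        return 1
    fi
}

extract_match 'hello world' 'hello'        # prints: hello
extract_match 'id=12345;name=x' '[0-9]+'   # prints: 12345
```

Unlike grep -o, this only gives you the first match in the string, so it fits best when you expect a single match.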

Related

Strip everything after the ampersand in a URL [duplicate]

How can I remove all text after a character, in this case a colon (":"), in bash? Can I remove the colon, too? I have no idea how to.
In Bash (and ksh, zsh, dash, etc.), you can use parameter expansion with % which will remove characters from the end of the string or # which will remove characters from the beginning of the string. If you use a single one of those characters, the smallest matching string will be removed. If you double the character, the longest will be removed.
$ a='hello:world'
$ b=${a%:*}
$ echo "$b"
hello
$ a='hello:world:of:tomorrow'
$ echo "${a%:*}"
hello:world:of
$ echo "${a%%:*}"
hello
$ echo "${a#*:}"
world:of:tomorrow
$ echo "${a##*:}"
tomorrow
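The same expansions work for any delimiter, not just ":". As a small sketch for the ampersand-in-a-URL case, assuming a made-up URL that is only there for illustration:

```shell
# Hypothetical URL used only for illustration.
url='http://example.com/watch?v=abc123&list=xyz&index=2'

# %% removes the longest suffix matching "&*", i.e. everything from the first "&".
base=${url%%&*}
# ## removes the longest prefix matching "*&", leaving the text after the last "&".
last_param=${url##*&}

printf '%s\n' "$base"         # prints: http://example.com/watch?v=abc123
printf '%s\n' "$last_param"   # prints: index=2
```

No external process is started here, which can matter if you do this inside a loop.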
An example might have been useful, but if I understood you correctly, this would work:
echo "Hello: world" | cut -f1 -d":"
This will convert Hello: world into Hello.
$ echo 'hello:world:again' |sed 's/:.*//'
hello
I know some solutions:
# Our mock data:
A=user:mail:password
With awk and pipe:
$ echo $A | awk -v FS=':' '{print $1}'
user
Via bash variables:
$ echo ${A%%:*}
user
With pipe and sed:
$ echo $A | sed 's#:.*##g'
user
With pipe and grep:
$ echo $A | grep -Eo '^[^:]+'
user
With pipe and cut:
$ echo $A | cut -f1 -d\:
user
grep -Eo '^[^:]*:'
This keeps everything up to and including the first ":".
To trim off everything after the last instance of ":":
grep -o '^.*:' fileListingPathsAndFiles.txt
And if you want to drop that last ":" as well:
grep -o '^.*:' file.txt | sed 's/:$//'
@kp123: for paths you'd want to replace : with / (written as \/ inside the sed expression)
Let's say you have a path with a file in this format:
/dirA/dirB/dirC/filename.file
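Now you only want the directory part. With "/" as the delimiter, the path splits into five fields, the first of which is the empty string before the leading "/", so the directory is fields 1 to 4. Type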
$ echo "/dirA/dirB/dirC/filename.file" | cut -f1-4 -d"/"
and your output will be
/dirA/dirB/dirC
The advantage of using cut is that you can also cut off the deepest directory as well as the file (in this example), so if you type
$ echo "/dirA/dirB/dirC/filename.file" | cut -f1-3 -d"/"
your output would be
/dirA/dirB
You can also cut from the other side of the string, although that makes less sense in this case, as typing
$ echo "/dirA/dirB/dirC/filename.file" | cut -f2-4 -d"/"
results in
dirA/dirB/dirC
In other situations that last form might be helpful. Note that there is no "/" at the beginning of the last output, because the empty first field was dropped.
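If you don't know how many directory levels the path has, counting fields for cut gets awkward. A sketch using the parameter expansion from the earlier answer instead, with the same example path:

```shell
path='/dirA/dirB/dirC/filename.file'

dir=${path%/*}      # shortest suffix matching "/*" removed: the directory part
file=${path##*/}    # longest prefix matching "*/" removed: the file name
parent=${dir%/*}    # same expansion applied again: one level further up

printf '%s\n' "$dir"      # prints: /dirA/dirB/dirC
printf '%s\n' "$file"     # prints: filename.file
printf '%s\n' "$parent"   # prints: /dirA/dirB
```

The external dirname and basename utilities give the same results for a path like this one.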

command to count occurrences of word in entire file

I am trying to count the occurrences of a word in a file.
If the word occurs multiple times in a line, every occurrence should be counted.
The following command gives me a count, but it counts matching lines, so it fails if a line has multiple occurrences of the word:
grep -c "word" filename.txt
Is there a one-liner?
You can use grep -o to show the exact matches and then count them:
grep -o "word" filename.txt | wc -l
Test
$ cat a
hello hello how are you
hello i am fine
but
this is another hello
$ grep -c "hello" a # Normal `grep -c` fails
3
$ grep -o "hello" a
hello
hello
hello
hello
$ grep -o "hello" a | wc -l # grep -o solves it!
4
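If you need this more than once, the pipeline can be wrapped in a small function. This is only a sketch: the name count_occurrences is made up, and -w (whole words only) and -F (fixed string, no regex) are my additions that you may or may not want:

```shell
# count_occurrences WORD FILE  (hypothetical helper name)
# Counts every match of WORD in FILE, including repeats on one line.
# -w restricts matches to whole words; -F treats WORD as a fixed string.
count_occurrences() {
    # The arithmetic expansion strips the padding some wc versions print.
    echo "$(( $(grep -o -w -F -- "$1" "$2" | wc -l) ))"
}

# Recreate the sample file from the test above in a temporary location.
sample=$(mktemp)
printf '%s\n' 'hello hello how are you' 'hello i am fine' 'but' 'this is another hello' > "$sample"

count_occurrences hello "$sample"   # prints: 4
rm -f "$sample"
```

Drop -w if substrings such as the "hello" in "othello" should be counted too.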
Set RS (the record separator) to the word in awk for a shorter one. This relies on a multi-character RS, which GNU awk supports:
awk 'END{print NR-1}' RS="word" file
GNU awk allows it to be done in a single command, without the use of multiple piped commands:
awk -v w="word" '$1==w{n++} END{print n}' RS=' |\n' file
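Another way to count inside a single awk call, which should not depend on GNU extensions, is to use the return value of gsub. A sketch, with a made-up function name; note that the word is treated as a regular expression and that substrings are counted too:

```shell
# count_with_awk WORD FILE  (hypothetical helper name)
# gsub returns the number of replacements it made; replacing each match with
# itself ("&") leaves the line unchanged while still counting the matches.
count_with_awk() {
    awk -v w="$1" '{ n += gsub(w, "&") } END { print n + 0 }' "$2"
}

# Same sample data as in the grep -o test.
sample=$(mktemp)
printf '%s\n' 'hello hello how are you' 'hello i am fine' 'but' 'this is another hello' > "$sample"

count_with_awk hello "$sample"   # prints: 4
rm -f "$sample"
```

The n + 0 in the END block makes awk print 0 instead of an empty line when there are no matches.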
tr ' ' '\n' < file | grep -c word
This puts each space-separated word on its own line and then counts the lines containing the word. It assumes that all words in the file have spaces between them. If punctuation joins the word to itself, or there is otherwise no space between two occurrences on a line, they'll count as one.
grep word filename.txt | wc -l
grep prints the lines that match, then wc -l prints the number of lines matched. Note that this counts matching lines, like grep -c, not individual occurrences.

Resources