Colorize specific strings in a text - shell

I would like to highlight a few strings while outputting a text file. For example the literals [2], quick and lazy in:
... => any number of lines with non-matching content
He’s quick but lazy.
...
• The future belongs to those who believe in the beauty of their dreams [2].
...
I’m lazy but quick (2 times faster); is there a difference when "lazy" comes before "quick"?
...
My intuitive approach would be to use grep for the colorization (in fact I'm not fixed on any specific tool):
grep -F -e '[2]' -e 'quick' -e 'lazy' --color file.txt
But it has two problems:
It filters out the lines that don't match while I want to include them in the output.
It doesn't highlight all the matching strings; it seems like the order in which the -e expressions are provided matters (problem noticed with macOS grep).
My expected output (with <...> standing for the colorization) would be:
... => any number of lines with non-matching content
He’s <quick> but <lazy>.
...
• The future belongs to those who believe in the beauty of their dreams <[2]>.
...
I’m <lazy> but <quick> (2 times faster); is there a difference when "<lazy>" comes before "<quick>"?
...

grep -n -F -e '[2]' -e 'quick' -e 'lazy' --color=always file.txt |
awk -F':' '
FILENAME==ARGV[1] { n=substr($1,9,length($1)-22); sub(/[^:]+:/,""); a[n]=$0; next }
{ print (FNR in a ? a[FNR] : $0) }
' - file.txt
would use grep to find and highlight the strings, and then awk would print the grep output for those lines and the original lines from the input file otherwise.

UPDATE
I found a way using grep -E instead of grep -F. As a side-effect, matching a literal string will require its ERE-escaping.
The method is to build a single regex composed of the union of the search strings plus an additional $ anchor (for selecting the "non-matching" lines).
Hence, for highlighting the literals [2], quick and lazy in the sample text, you can use:
grep -E '\[2]|quick|lazy|$' --color file.txt
edit: I replaced the ^ anchor with the $ one because on macOS:
grep -E '\[2]|quick|lazy|^' --color doesn't highlight any word
grep -E -e '\[2]|quick|lazy' -e '^' --color SEGFAULTS !!!

Related

What is the best way to process multiple lines and extract a large list of specific strings? [duplicate]

I'm after a grep-type tool to search for purely literal strings. I'm looking for the occurrence of a line of a log file, as part of a line in a seperate log file. The search text can contain all sorts of regex special characters, e.g., []().*^$-\.
Is there a Unix search utility which would not use regex, but just search for literal occurrences of a string?
You can use grep for that, with the -F option.
-F, --fixed-strings PATTERN is a set of newline-separated fixed strings
That's either fgrep or grep -F which will not do regular expressions. fgrep is identical to grep -F but I prefer to not have to worry about the arguments, being intrinsically lazy :-)
grep -> grep
fgrep -> grep -F (fixed)
egrep -> grep -E (extended)
rgrep -> grep -r (recursive, on platforms that support it).
Pass -F to grep.
you can also use awk, as it has the ability to find fixed string, as well as programming capabilities, eg only
awk '{for(i=1;i<=NF;i++) if($i == "mystring") {print "do data manipulation here"} }' file
cat list.txt
one:hello:world
two:2:nothello
three:3:kudos
grep --color=always -F"hello
three" list.txt
output
one:hello:world
three:3:kudos
I really like the -P flag available in GNU grep for selective ignoring of special characters.
It makes grep -P "^some_prefix\Q[literal]\E$" possible
from grep manual
-P, --perl-regexp
Interpret I as Perl-compatible regular
expressions (PCREs). This option is experimental when
combined with the -z (--null-data) option, and grep -P may
warn of unimplemented features.

Grep multiple strings from text file

Okay so I have a textfile containing multiple strings, example of this -
Hello123
Halo123
Gracias
Thank you
...
I want grep to use these strings to find lines with matching strings/keywords from other files within a directory
example of text files being grepped -
123-example-Halo123
321-example-Gracias-com-no
321-example-match
so in this instance the output should be
123-example-Halo123
321-example-Gracias-com-no
With GNU grep:
grep -f file1 file2
-f FILE: Obtain patterns from FILE, one per line.
Output:
123-example-Halo123
321-example-Gracias-com-no
You should probably look at the manpage for grep to get a better understanding of what options are supported by the grep utility. However, there a number of ways to achieve what you're trying to accomplish. Here's one approach:
grep -e "Hello123" -e "Halo123" -e "Gracias" -e "Thank you" list_of_files_to_search
However, since your search strings are already in a separate file, you would probably want to use this approach:
grep -f patternFile list_of_files_to_search
I can think of two possible solutions for your question:
Use multiple regular expressions - a regular expression for each word you want to find, for example:
grep -e Hello123 -e Halo123 file_to_search.txt
Use a single regular expression with an "or" operator. Using Perl regular expressions, it will look like the following:
grep -P "Hello123|Halo123" file_to_search.txt
EDIT:
As you mentioned in your comment, you want to use a list of words to find from a file and search in a full directory.
You can manipulate the words-to-find file to look like -e flags concatenation:
cat words_to_find.txt | sed 's/^/-e "/;s/$/"/' | tr '\n' ' '
This will return something like -e "Hello123" -e "Halo123" -e "Gracias" -e" Thank you", which you can then pass to grep using xargs:
cat words_to_find.txt | sed 's/^/-e "/;s/$/"/' | tr '\n' ' ' | dir_to_search/*
As you can see, the last command also searches in all of the files in the directory.
SECOND EDIT: as PesaThe mentioned, the following command would do this in a much more simple and elegant way:
grep -f words_to_find.txt dir_to_search/*

read values of txt file from bash [duplicate]

This question already has answers here:
How to grep for contents after pattern?
(8 answers)
Closed 5 years ago.
I'm trying to read values from a text file.
I have test1.txt which looks like
sub1 1 2 3
sub8 4 5 6
I want to obtain values '1 2 3' when I specify 'sub1'.
The closest I get is:
subj="sub1"
grep "$subj" test1.txt
But the answer is:
sub8 4 5 6
I've read that grep gives you the next line to the match, so I've tried to change the text file to the following:
test2.txt looks like:
sub1
1 2 3
sub8
4 5 6
However, when I type
grep "$subj" test2.txt
The answer is:
sub1
It should be something super simple but I've tried awk, seg, grep,egrep, cat and none is working...I've also read some posts somehow related but none was really helpful
Awk works: awk '$1 == "'"$subj"'" { print $2, $3, $4 }' test1.txt
The command outputs fields two, three, and four for all lines in test1.txt where the first field is $subj (i.e.: the contents of the variable named subj).
With your original text file format:
target=sub1
while IFS=$' \t\n' read -r key values; do
if [[ $key = "$target" ]]; then
echo "Found values: $values"
fi
done <test1.txt
This requires no external tools, using only functionality built into bash itself. See BashFAQ #1.
As has come up during debugging in comments, if you have a traditional Apple-format text file (CR newlines only), then you might want something more like:
target=sub1
while IFS=$' \t\n' read -r -d $'\r' key values || [[ $key ]]; do
if [[ $key = "$target" ]]; then
echo "Found values: $values"
fi
done <test1.txt
Alternately, using awk (for a standard UNIX text file):
target="sub1"
awk -v target="$target" '$1 == target { $1 = ""; print; }' <test1.txt
...or, for a file with CR-only newlines:
target="sub1"
tr '\r' '\n' <test1.txt | awk -v target="$target" '$1 == target { $1 = ""; print; }'
This version will be slower if the text file being read is small (since awk, like any other external tool, takes time to start up); but faster if it's large (since awk's operation is much faster than that of bash's built-ins once it's done starting up).
grep "sub1" test1.txt | cut -c6-
or
grep -A 1 "sub1" test2.txt | tail -n 1
You doing it right, but it seems like test1.txt has a wrong value in it.
with grep foo you get all lines with foo in it. use grep -m1 foo to find the first line with foo in it only.
then you can use cut -d" " -f2- to get all the values behind foo, while seperated by empty spaces.
In the end the command would look like this ...
$ subj="sub1"
$ grep -m1 "$subj" test1.txt | cut -d" " -f2-
But this doenst explain why you could not find sub1 in the first place.
Did you read the proper file ?
There's a bunch of ways to do this (and shorter/more efficient answers than what I'm giving you), but I'm assuming you're a beginner at bash, and therefore I'll give you something that's easy to understand:
egrep "^$subj\>" file.txt | sed "s/^\S*\>\s*//"
or
egrep "^$subj\>" file.txt | sed "s/^[^[:blank:]]*\>[[:blank:]]*//"
The first part, egrep, will search for you subject at the beginning of the line in file.txt (that's what the ^ symbol does in the grep string). It also is looking for a whole word (the \> is looking for an end of word boundary -- that way sub1 doesn't match sub12 in the file.) Notice you have to use egrep to get the \>, as grep by default doesn't recognize that escape sequence. Once done finding the lines, egrep then passes it's output to sed, which will strip the first word and trailing whitespace off of each line. Again, the ^ symbol in the sed command, specifies it should only match at the beginning of the line. The \S* tells it to read as many non-whitespace characters as it can. Then the \s* tells sed to gobble up as many whitespace as it can. sed then replaces everything it matched with nothing, leaving the other stuff behind.
BTW, there's a help page in Stack overflow that tells you how to format your questions (I'm guessing that was the reason you got a downvote).
-------------- EDIT ---------
As pointed out, if you are on a Mac or something like that you have to use [:alnum:] instead of \S, and [:blank:] instead of \s in your sed expression (as these are portable to all platforms)
awk '/sub1/{ print $2,$3,$4 }' file
1 2 3
What happens? After regexp /sub1/ the three following fields are printed.
Any drawbacks? It affects the space.
Sed also works: sed -n -e 's/^'"$subj"' *//p' file1.txt
It outputs all lines matching $subj at the beginning of a line after having removed the matching word and the spaces following. If TABs are used the spaces should be replaced by something like [[:space:]].

How to grep and match the first occurrence of a line?

Given the following content:
title="Bar=1; Fizz=2; Foo_Bar=3;"
I'd like to match the first occurrence of Bar value which is 1. Also I don't want to rely on soundings of the word (like double quote in the front), because the pattern could be in the middle of the line.
Here is my attempt:
$ grep -o -m1 'Bar=[ ./0-9a-zA-Z_-]\+' input.txt
Bar=1
Bar=3
I've used -m/--max-count which suppose to stop reading the file after num matches, but it didn't work. Why this option doesn't work as expected?
I could mix with head -n1, but I wondering if it is possible to achieve that with grep?
grep is line-oriented, so it apparently counts matches in terms of lines when using -m[1]
- even if multiple matches are found on the line (and are output individually with -o).
While I wouldn't know to solve the problem with grep alone (except with GNU grep's -P option - see anubhava's helpful answer), awk can do it (in a portable manner):
$ awk -F'Bar=|;' '{ print $2 }' <<<"Bar=1; Fizz=2; Foo_Bar=3;"
1
Use print "Bar=" $2, if the field name should be included.
Also note that the <<< method of providing input via stdin (a so-called here-string) is specific to Bash, Ksh, Zsh; if POSIX compliance is a must, use echo "..." | grep ... instead.
[1] Options -m and -o are not part of the grep POSIX spec., but both GNU and BSD/OSX grep support them and have chosen to implement the line-based logic.
This is consistent with the standard -c option, which counts "selected lines", i.e., the number of matching lines:
grep -o -c 'Bar=[ ./0-9a-zA-Z_-]\+' <<<"Bar=1; Fizz=2; Foo_Bar=3;" yields 1.
Using perl based regex flavor in gnu grep you can use:
grep -oP '^(.(?!Bar=\d+))*Bar=\d+' <<< "Bar=1; Fizz=2; Foo_Bar=3;"
Bar=1
(.(?!Bar=\d+))* will match 0 or more of any characters that don't have Bar=\d+ pattern thus making sure we match first Bar=\d+
If intent is to just print the value after = then use:
grep -oP '^(.(?!Bar=\d+))*Bar=\K\d+' <<< "Bar=1; Fizz=2; Foo_Bar=3;"
1
You can use grep -P (assuming you are on gnu grep) and positive look ahead ((?=.*Bar)) to achieve that in grep:
echo "Bar=1; Fizz=2; Foo_Bar=3;" | grep -oP -m 1 'Bar=[ ./0-9a-zA-Z_-]+(?=.*Bar)'
First use a grep to make the line start with Bar, and then get the Bar at the start of the line:
grep -o "Bar=.*" input.txt | grep -o -m1 "^Bar=[ ./0-9a-zA-Z_-]\+"
When you have a large file, you can optimize with
grep -o -m1 "Bar=.*" input.txt | grep -o -m1 "^Bar=[ ./0-9a-zA-Z_-]\+"

bash grep newline

[Editorial insertion: Possible duplicate of the same poster's earlier question?]
Hi, I need to extract from the file:
first
second
third
using the grep command, the following line:
second
third
How should the grep command look like?
Instead of grep, you can use pcregrep which supports multiline patterns
pcregrep -M 'second\nthird' file
-M allows the pattern to match more than one line.
Your question abstract "bash grep newline", implies that you would want to match on the second\nthird sequence of characters - i.e. something containing newline within it.
Since the grep works on "lines" and these two are different lines, you would not be able to match it this way.
So, I'd split it into several tasks:
you match the line that contains "second" and output the line that has matched and the subsequent line:
grep -A 1 "second" testfile
you translate every other newline into the sequence that is guaranteed not to occur in the input. I think the simplest way to do that would be using perl:
perl -npe '$x=1-$x; s/\n/##UnUsedSequence##/ if $x;'
you do a grep on these lines, this time searching for string ##UnUsedSequence##third:
grep "##UnUsedSequence##third"
you unwrap the unused sequences back into the newlines, sed might be the simplest:
sed -e 's/##UnUsedSequence##/\n'
So the resulting pipe command to do what you want would look like:
grep -A 1 "second" testfile | perl -npe '$x=1-$x; s/\n/##UnUsedSequence##/ if $x;' | grep "##UnUsedSequence##third" | sed -e 's/##UnUsedSequence##/\n/'
Not the most elegant by far, but should work. I'm curious to know of better approaches, though - there should be some.
I don't think grep is the way to go on this.
If you just want to strip the first line from any file (to generalize your question), I would use sed instead.
sed '1d' INPUT_FILE_NAME
This will send the contents of the file to standard output with the first line deleted.
Then you can redirect the standard output to another file to capture the results.
sed '1d' INPUT_FILE_NAME > OUTPUT_FILE_NAME
That should do it.
If you have to use grep and just don't want to display the line with first on it, then try this:
grep -v first INPUT_FILE_NAME
By passing the -v switch, you are telling grep to show you everything but the expression that you are passing. In effect show me everything but the line(s) with first in them.
However, the downside is that a file with multiple first's in it will not show those other lines either and may not be the behavior that you are expecting.
To shunt the results into a new file, try this:
grep -v first INPUT_FILE_NAME > OUTPUT_FILE_NAME
Hope this helps.
I don't really understand what do you want to match. I would not use grep, but one of the following:
tail -2 file # to get last two lines
head -n +2 file # to get all but first line
sed -e '2,3p;d' file # to get lines from second to third
(not sure how standard it is, it works in GNU tools for sure)
So you just don't want the line containing "first"? -v inverts the grep results.
$ echo -e "first\nsecond\nthird\n" | grep -v first
second
third
Line? Or lines?
Try
grep -E -e '(second|third)' filename
Edit: grep is line oriented. you're going to have to use either Perl, sed or awk to perform the pattern match across lines.
BTW -E tell grep that the regexp is extended RE.
grep -A1 "second" | grep -B1 "third" works nicely, and if you have multiple matches it will even get rid of the original -- match delimiter
grep -E '(second|third)' /path/to/file
egrep -w 'second|third' /path/to/file
you could use
$ grep -1 third filename
this will print a string with match and one string before and after. Since "third" is in the last string you get last two strings.
I like notnoop's answer, but building on AndrewY's answer (which is better for those without pcregrep, but way too complicated), you can just do:
RESULT=`grep -A1 -s -m1 '^\s*second\s*$' file | grep -s -B1 -m1 '^\s*third\s*$'`
grep -v '^first' filename
Where the -v flag inverts the match.

Resources