How can I prefix the output of each match in grep with some text? - bash

I have a file with a list of phrases
apples
banananas
oranges
I'm running cat file.txt | xargs -I% sh -c "grep -Eio '(an)' >> output.txt"
What I can't figure out is how to make the output include the original line, for example:
banananas,an
oranges,an
How can I prefix the output of grep to also include the value being piped to it?

This is a job for awk. Try the following:
awk '/an/{print $0",an"}' Input_file
This looks for the string an in every line of Input_file and appends ,an to each line that contains it.

Solution with sed:
sed '/an/s/$/,an/' input_file
This finds lines that match the pattern /an/ and appends ,an at the end of the line ($ anchors the end of the pattern space).

Use awk instead of grep:
$ awk -v s="an" ' # search string
BEGIN {
OFS="," # separating comma
}
match($0,s) { # when there is a match
print $0,substr($0,RSTART,RLENGTH) # output
}' file
Output:
banananas,an
oranges,an
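If you would rather keep grep in the pipeline, the same result can be sketched with a shell loop that reads each line, asks grep -o for the match, and prints the two together. This is only an illustration: it assumes you want the first match per line, and the temporary file stands in for file.txt from the question.

```shell
# Sample input taken from the question (a temp file stands in for file.txt)
input=$(mktemp)
printf '%s\n' apples banananas oranges > "$input"

# For each line, take the first case-insensitive match of "an" (if any)
# and print "line,match"
result=$(
  while IFS= read -r line; do
    if match=$(printf '%s\n' "$line" | grep -Eio 'an' | head -n 1) && [ -n "$match" ]; then
      printf '%s,%s\n' "$line" "$match"
    fi
  done < "$input"
)
printf '%s\n' "$result"
rm -f "$input"
```

This starts one grep per input line, so for large files the awk answers above will be considerably faster.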

Related

How to ignore case when using awk or sed [duplicate]

sed -i '/first/i This line to be added'
In this case, how can I ignore case when searching for the pattern first?
You can use the following:
sed 's/[Ff][Ii][Rr][Ss][Tt]/last/g' file
Otherwise, GNU sed has the I flag (also accepted as i on the s command):
sed 's/first/last/Ig' file
From man sed:
I
i
The I modifier to regular-expression matching is a GNU extension which
makes sed match regexp in a case-insensitive manner.
Test
$ cat file
first
FiRst
FIRST
fir3st
$ sed 's/[Ff][Ii][Rr][Ss][Tt]/last/g' file
last
last
last
fir3st
$ sed 's/first/last/Ig' file
last
last
last
fir3st
GNU sed
sed '/first/Ii This line to be added' file
You can try
sed 's/first/somethingelse/gI'
If you want to save some typing, try GNU awk, which has the IGNORECASE variable (POSIX sed has no equivalent option):
awk -v IGNORECASE="1" '/first/{your logic}' file
For versions of awk that don't understand the IGNORECASE special variable, you can use something like this:
awk 'toupper($0) ~ /PATTERN/ { print "string to insert" } 1' file
Convert each line to uppercase before testing whether it matches the pattern and if it does, print the string. 1 is the shortest true condition, so awk does the default thing: { print }.
To use a variable, you could go with this:
awk -v var="$foo" 'BEGIN { pattern = toupper(var) } toupper($0) ~ pattern { print "string to insert" } 1' file
This passes the shell variable $foo and transforms it to uppercase before the file is processed.
Slightly shorter with bash would be to use -v pattern="${foo^^}" and skip the BEGIN block.
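As a quick check of the portable toupper() approach, here is a small sketch run against a throwaway file. The sample lines and the value of foo are made up for illustration; only the awk technique comes from the answer above.

```shell
# Hypothetical sample data: one line matches "first" regardless of case
foo='first'
input=$(mktemp)
printf '%s\n' 'alpha' 'FiRst' 'omega' > "$input"

# Uppercase both the pattern and each line before comparing, so the test
# is case-insensitive in any POSIX awk; the trailing 1 prints every line
result=$(awk -v var="$foo" '
  BEGIN { pattern = toupper(var) }
  toupper($0) ~ pattern { print "string to insert" }
  1' "$input")
printf '%s\n' "$result"
rm -f "$input"
```

The inserted string appears just before the FiRst line, and the other lines pass through unchanged.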
Use the following, \b for word boundary
sed 's/\bfirst\b/This line to be added/Ig' file

Grep to exclude comment like # and -- with trailing spaces and within line

I am trying to grep for a word in files that use # and -- as comment markers. The command I used is
grep "^[^#]" -H -R -I "pathtofile" | grep "^[^--]" | grep -in ${1} | awk -F : ' { print $2 } ' | uniq)
which prints the name of each file containing the word. However, if there is a line like this
--test_specific_word_test test
the command above does not skip it. The same applies when a comment sits on the same line as code, such as var=1 --comment.
Should I use sed to delete the comments first, or can I do this with grep alone?
The catch is that I have a large number of files to search, my GNU grep is version 2.0, and I can't upgrade it because I don't have permission.
The command you've provided uses grep three times. You can skip commented lines with a single grep command:
grep -v "^ *\(--\|#\)" "pathtofile"
To print the filenames containing word1 use cut like so:
grep -Hv "^ *\(--\|#\)" filenames | grep "word1" | cut -d: -f1
To skip inline comments use sed:
sed "s/\(.*\)\(--\|#\).*/\1/g" inputfile
Sample input:
word1
word2
-word3 # inline comment
#comment1
--comment2
#comment3
output:
word1
word2
-word3
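For comparison, the whole-line and inline cases can be handled in one pass with awk. The sketch below is an assumption-laden illustration, not a drop-in replacement: it treats every # or -- as the start of a comment (even inside quoted strings) and runs on a temporary copy of the sample input above.

```shell
# Recreate the sample input from the answer in a temp directory
dir=$(mktemp -d)
printf '%s\n' 'word1' 'word2' '-word3 # inline comment' \
  '#comment1' '--comment2' ' #comment3' > "$dir/sample.txt"

# Remove everything from the first # or -- onward, trim trailing blanks,
# then print only lines that still contain at least one field (NF > 0)
result=$(awk '{ sub(/(#|--).*/, ""); sub(/[ \t]+$/, "") } NF' "$dir/sample.txt")
printf '%s\n' "$result"
rm -rf "$dir"
```

A single leading dash, as in -word3, is left alone because only a double dash counts as a comment marker here.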
If in fact you are attempting to parse a programming language's source files, you are probably better off using a proper parser. Here is an attempt at refactoring your code into an Awk script, with several guesses as to what exactly the script should actually do.
find pathtofile -type f -exec awk -v word="$1" -F : '
# this doesn't reimplement grep -I though
{ sub("(#|--).*", "") } # remove comments
tolower($0) ~ tolower(word) && !($2 in a) { print FILENAME ":" FNR ":" $2; a[$2] }' {} +
This has the obvious flaw that if the programming language allows for # or -- in quoted strings and doesn't regard those as comments, the script will do the wrong thing.
There are no word boundaries in your script, so I didn't put any in mine either. This means if word="dog" then it will print any string which contains the three adjacent letters d-o-g in this order, even in substring matches like "doggone" or "endogenous". If that's not what you want, you can add word boundary markers; if you have GNU Awk, you can say BEGIN { word = "\\<" word "\\>" } at the beginning of the script.
The technique to add the key to an array and only print the key if it wasn't already in the array is a common way to implement uniq. This will fail if find returns so many files that it will end up running more than one instance of awk -- this will be controlled by the value of ARG_MAX of your kernel.

grep text after keyword with unknown spaces and remove comments

I am having trouble saving variables from file using grep/sed/awk.
The text in file.txt is of the form:
NUM_ITER = 1000 # Number of iterations
NUM_STEP = 1000
And I would like to save these to bash variables without the comments.
So far, I have attempted this:
grep -oP "^NUM_ITER[ ]*=\K.*#" file.txt
which yields
1000 #
Any suggestions?
I would use awk, like this:
awk -F'[=[:blank:]#]+' '$1 == "NUM_ITER" {print $2}' file
To store it in a variable:
NUM_ITER=$(awk -F'[=[:blank:]#]+' '$1 == "NUM_ITER" {print $2}' file)
As long as a line can only contain a single match, this is easy with sed.
sed -n '# Remove comments
s/[ ]*#.*//
# If keyword found, remove keyword and print value
s/^NUM_ITER[ ]*=[ ]*//p' file.txt
This can be trimmed down to a one-liner if you remove the comments.
sed -n 's/[ ]*#.*//;s/^NUM_ITER[ ]*=[ ]*//p' file.txt
The -n option turns off automatic printing, and the p flag on the final substitution prints the line only if that substitution succeeded.
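To store several keys as shell variables, the one-liner can be wrapped in a small function. This is a sketch under assumptions: get_value is a hypothetical helper name, the key is assumed to contain no regex metacharacters, and NUM_STEP is given a different sample value here so the two results are easy to tell apart.

```shell
# Throwaway config file; the NUM_STEP value is illustrative only
cfg=$(mktemp)
printf '%s\n' 'NUM_ITER = 1000 # Number of iterations' 'NUM_STEP = 250' > "$cfg"

# get_value KEY FILE (hypothetical helper): strip any trailing comment,
# then print whatever follows "KEY =" on the matching line
get_value() {
  sed -n "s/[ ]*#.*//;s/^$1[ ]*=[ ]*//p" "$2"
}

NUM_ITER=$(get_value NUM_ITER "$cfg")
NUM_STEP=$(get_value NUM_STEP "$cfg")
printf 'NUM_ITER=%s NUM_STEP=%s\n' "$NUM_ITER" "$NUM_STEP"
rm -f "$cfg"
```

The sed script is in double quotes so that $1 expands to the key. That is safe here only because the script contains no other $ or backslash sequences.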

how to extract string appears after one particular string in Shell

I am working on a script that greps for lines containing -abc_1.
I need to extract the string that appears right after it, as follows:
option : -abc_1 <some_path>
I have used the following code:
grep "abc_1" | awk -F " " '{print $4}'
This code fails if more spaces are used between the strings, e.g.:
option   :   -abc_1      <some_path>
It would be helpful if I could extract the path regardless of the number of spaces.
thanks
This should do:
echo 'option : -abc_1 <some_path>' | awk '/abc_1/ {print $4}'
<some_path>
If you do not specify a field separator, awk splits on runs of one or more blanks.
PS: you do not need both grep and awk.
With sed you can do the search and the filter in one step:
sed -n 's/^.*abc_1 *\([^ ]*\).*$/\1/p'
The -n option suppresses printing, but the p command at the end still prints if a successful substitution was made.
perl -lne 'print $1 if /-abc_1\s+(.*)/' your_file
Or if you want to use awk:
awk '{for(i=1;i<=NF;i++)if($i=="-abc_1")print $(i+1)}' your_file
Try this grep-only approach (it needs a grep that supports -P):
grep -Po '^option\s*:\s*-abc_1\s*\K.*' file
or if the white spaces were fixed:
grep -Po '^option : -abc_1 \K.*' file
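To confirm that default field splitting copes with irregular spacing, here is a minimal sketch. The input line and /some/path are placeholders for the real option line and path; the loop looks for the -abc_1 field by value instead of relying on it being field 3.

```shell
# Hypothetical input with uneven spacing; /some/path is a placeholder
line='option   :    -abc_1      /some/path'

# With the default separator, awk treats any run of blanks as one delimiter.
# Scan the fields for "-abc_1" and print the field that follows it.
result=$(printf '%s\n' "$line" |
  awk '{ for (i = 1; i < NF; i++) if ($i == "-abc_1") { print $(i + 1); exit } }')
printf '%s\n' "$result"
```

Note the == comparison: a single = would assign "-abc_1" to every field and make the condition true for each of them.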

How to retrieve digits including the separator "."

I am using grep to get a string like this: ANS_LENGTH=266.50 then I use sed to only get the digits: 266.50
This is my full command: grep --text 'ANS_LENGTH=' log.txt | sed -e 's/[^[[:digit:]]]*//g'
The result is : 26650
How can this line be changed so the result still shows the separator: 266.50
You don't need grep if you are going to use sed. Just use a sed /pattern/ address to select the lines you need to print.
sed -n '/ANS_LENGTH/s/[^=]*=\(.*\)/\1/p' log.txt
-n suppresses sed's automatic printing of every line.
The captured group \(.*\) holds the value after the = sign, and \1 substitutes it for the whole match.
The p flag at the end prints only the lines on which the substitution succeeded.
If your grep happens to support -P option then you can do:
grep -oP '(?<=ANS_LENGTH=).*' log.txt
(?<=...) is a look-behind construct: it requires ANS_LENGTH= to come just before the match without making it part of the match. This requires the -P option
-o allows us to print only the value part.
You need to match a literal dot as well as the digits.
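Try sed -e 's/[^[:digit:].]*//g'
Outside a bracket expression, an unescaped dot matches any single character. Inside a bracket expression it is already literal, so it needs no backslash. Note also that [:digit:] takes only one pair of outer brackets.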
Here are some awk examples:
cat file:
some data ANS_LENGTH=266.50 other=22
not mye data=43
GNU awk (a regex RS is not portable to every awk)
awk '/ANS_LENGTH/ {f=NR} f&&NR-1==f' RS="[ =]" file
266.50
awk '/ANS_LENGTH/ {getline;print}' RS="[ =]" file
266.50
Plain awk
awk -F"[ =]" '{for(i=1;i<=NF;i++) if ($i=="ANS_LENGTH") print $(i+1)}' file
266.50
awk '{for(i=1;i<=NF;i++) if ($i~"ANS_LENGTH") {split($i,a,"=");print a[2]}}' file
266.50
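If other text can follow the value on the same line, a stricter sed pattern that captures only digits with an optional decimal part may be safer than deleting non-digits. The sketch below uses only POSIX basic regular expressions and runs on a temporary file that stands in for log.txt; it assumes the value is an unsigned decimal number.

```shell
# Temp file standing in for log.txt, with extra text around the value
log=$(mktemp)
printf '%s\n' 'some data ANS_LENGTH=266.50 other=22' 'not my data=43' > "$log"

# Capture one or more digits, optionally followed by a dot and more digits,
# right after ANS_LENGTH=, and print only that captured value
result=$(sed -n 's/.*ANS_LENGTH=\([0-9][0-9]*\(\.[0-9][0-9]*\)\{0,1\}\).*/\1/p' "$log")
printf '%s\n' "$result"
rm -f "$log"
```

Because the pattern names the key, other numbers on the line (such as other=22) are ignored, and lines without ANS_LENGTH= print nothing.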
