Equivalent of grep -v -f -F using sed? - bash

I would like to use sed instead of grep because my shell script runs with set -e (grep exits with a non-zero status when it selects no lines, which aborts the script).
I am using grep -v -F -f to filter some patterns out of a file, like this:
grep -v -F -f file.txt data.txt > filtereddata.txt
This keeps only the lines of data.txt that do not match any pattern in file.txt.
I need to do this using sed. I found that sed can invert a match with '/pattern/!p', but I could not find a way to combine that with a pattern file like grep's -f.
Is the above grep command possible using sed with equivalent or better performance?
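A possible workaround that keeps grep, offered as a sketch rather than an accepted answer: sed's own -f option reads a sed script, not a list of patterns, so it has no direct counterpart of grep -F -f. The helper name filter_out below is hypothetical. It treats grep's exit status 1 (no lines selected) as success, so set -e only aborts on real errors.

```shell
set -e

# Hypothetical helper: print the lines of data file $2 that contain none of
# the fixed strings listed (one per line) in pattern file $1.
filter_out() {
    # grep exits 1 when it selects no lines; accept that status and let
    # genuine errors (status 2 or higher) still fail the script.
    grep -v -F -f "$1" "$2" || [ "$?" -eq 1 ]
}
```

It would be called as filter_out file.txt data.txt > filtereddata.txt.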

Related

User input into variables and grep a file for pattern

Hi!
So I am trying to run a script which looks for a string pattern.
For example, from a file I want to find 2 words, located separately
"I like toast, toast is amazing. Bread is just toast before it was toasted."
I want to invoke it from the command line using something like this:
./myscript.sh myfile.txt "toast bread"
My code so far:
text_file=$1
keyword_first=$2
keyword_second=$3
find_keyword=$(cat $text_file | grep -w "$keyword_first""$keyword_second" )
echo $find_keyword
I have tried a few different ways. Directly from the command line I can make it run using:
cat myfile.txt | grep -E 'toast|bread'
I'm trying to put the user input into variables and use the variables to grep the file.
You seem to be looking simply for
grep -E "$2|$3" "$1"
What works on the command line will also work in a script, though you will need to switch to double quotes for the shell to replace variables inside the quotes.
In this case, the -E option can be replaced with multiple -e options, too.
grep -e "$2" -e "$3" "$1"
You can pipe to grep twice:
find_keyword=$(cat $text_file | grep -w "$keyword_first" | grep -w "$keyword_second")
Note that your search word "bread" is not found because the string contains the uppercase "Bread". If you want to find the words regardless of this, you should use the case-insensitive option -i for grep:
find_keyword=$(cat $text_file | grep -w -i "$keyword_first" | grep -w -i "$keyword_second")
In a full script:
#!/bin/bash
#
# usage: ./myscript.sh myfile.txt "toast" "bread"
text_file=$1
keyword_first=$2
keyword_second=$3
# Keep only the lines that contain both words, ignoring case
find_keyword=$(grep -w -i -- "$keyword_first" "$text_file" | grep -w -i -- "$keyword_second")
echo "$find_keyword"
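The two answers above select different lines: repeated -e options (or an -E alternation) keep lines containing either word, while two chained greps keep only lines containing both. A minimal sketch of the difference, with hypothetical helper names match_either and match_both, each taking a file and two words:

```shell
# Hypothetical helpers; arguments are: file, first word, second word.

# Lines containing either word (alternation via repeated -e).
match_either() {
    grep -i -w -e "$2" -e "$3" -- "$1"
}

# Lines containing both words (the second grep filters the first one's output).
match_both() {
    grep -i -w -- "$2" "$1" | grep -i -w -- "$3"
}
```

Which one is wanted depends on whether "find 2 words" means any of them or all of them on the same line.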

GREP: is there a way to use grep inserting a text between filename and the pattern?

I have grep --color -EH "^([^,]*\,){3}5" try.csv
and the output it does is this:
try.csv:410,30151010,K,5001,,,,,,,,,,,,,,,,,,,,,,,,,,,,,512
try.csv:652,20151010,K,5001,,,,,,,,,,,,,,,,,,,,,,,,,,,,,41
try.csv:109,30151010,R,5005,,,,,,,,,,,,,,,,,,,,,,,,,,,,,455
I tried grep --color -EH "^([^,]*,){3}5" try.csv | perl -ne 'print ",$_"'
but the output looks like this :
,try.csv:410,30151010,K,5001,,,,,,,,,,,,,,,,,,,,,,,,,,,,,512
,try.csv:652,20151010,K,5001,,,,,,,,,,,,,,,,,,,,,,,,,,,,,41
,try.csv:109,30151010,R,5005,,,,,,,,,,,,,,,,,,,,,,,,,,,,,455
Expected output:
try.csv:,410,30151010,K,5001,,,,,,,,,,,,,,,,,,,,,,,,,,,,,512
try.csv:,652,20151010,K,5001,,,,,,,,,,,,,,,,,,,,,,,,,,,,,41
try.csv:,109,30151010,R,5005,,,,,,,,,,,,,,,,,,,,,,,,,,,,,455
I am very new to Perl and shell. I'm searching in the CSV files.
You may insert a comma by using sed,
$ grep --color -EH "^([^,]*\,){3}5" try.csv | sed 's/:/&,/'
s/:/&,/: the special character & in the replacement stands for the part of the line that matched (here, the colon), so putting a comma after & inserts a comma right after the first colon.
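The substitution can be tried on its own, without grep. In this sketch the helper name comma_after_name is hypothetical and the sample line is a shortened version of the output shown in the question:

```shell
# Hypothetical helper: add a comma after the first colon of each input line.
comma_after_name() {
    sed 's/:/&,/'
}

printf '%s\n' 'try.csv:410,30151010,K,5001' | comma_after_name
# prints: try.csv:,410,30151010,K,5001
```

Because there is no g flag, only the first colon on each line is affected, so later colons in the data would be left alone.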

How to save automatically with append comment to file name?

I am extracting data from files and I'd like to apply these working (ugly) command lines to all the txt files from a given folder. Thus I would also need to append a string to the output file name to avoid overwriting during the loop... any suggestion is warmly welcome.
for file in ./TEST/;
do
awk '/R.Time/,/LC/' 070_WT3a.txt|awk '/R.Time/,/PDA/'|grep -v -E "PDA|LC"|grep -w -v "80,00000"|grep -w -v "80,00833"|grep -w -v "80,01667"|grep -w -v "80,01067"|grep -w -v "80,02133"|sed -n '1,9601p' > ./Output/Fluo.txt;
awk '/R.Time/,/LC/' 070_WT3a.txt|awk '/R.Time/,/PDA/'|grep -v -E "PDA|LC"|grep -w -v "80,00000"|grep -w -v "80,00833"|grep -w -v "80,01667"|grep -w -v "80,01067"|grep -w -v "80,02133"|sed -n '9603,19203p' > ./Output/RID.txt;
done
Inside the loop you can use the variable ${file}. A first improvement (a pipeline can continue on the next line after each pipe):
for file in ./TEST/*.txt;
do
filebasename=${file##*/}
awk '/R.Time/,/LC/' "${file}" |
awk '/R.Time/,/PDA/' |
grep -v -E "PDA|LC" |
grep -w -v "80,00000"|
grep -w -v "80,00833"|
grep -w -v "80,01667"|
grep -w -v "80,01067"|
grep -w -v "80,02133"|
sed -n '1,9601p' > "./Output/Fluo_${filebasename}";
awk '/R.Time/,/LC/' "${file}" |
awk '/R.Time/,/PDA/'|
grep -v -E "PDA|LC"|
grep -w -v "80,00000"|
grep -w -v "80,00833"|
grep -w -v "80,01667"|
grep -w -v "80,01067"|
grep -w -v "80,02133"|
sed -n '9603,19203p' > "./Output/RID_${filebasename}";
done
The next thing you can do is improve the parsing.
Without example input and output it is hard to test a solution. I cannot tell whether every file needs to be split at lines 9601/9603/19203; those numbers seem to work for 070_WT3a.txt.
I would like to start by skipping the 80* lines, but those lines might contain the R.Time/LC boundaries, so that would not help.
You might want to test on 070_WT3a.txt with
awk '/R.Time/,/LC/' 070_WT3a.txt |awk '/R.Time/,/PDA/'|
grep -v -E "PDA|LC"|grep -Ewv "80,0(0000|0833|1667|1067|2133)"
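The single alternation is meant to drop the same lines as the five chained grep -w -v calls. A small sketch to check that on made-up sample lines (the helper names and the sample values are hypothetical):

```shell
# Hypothetical helper: the original chain of five inverted whole-word greps.
drop_chained() {
    grep -w -v "80,00000" | grep -w -v "80,00833" | grep -w -v "80,01667" |
        grep -w -v "80,01067" | grep -w -v "80,02133"
}

# Hypothetical helper: the same filter as one extended-regex alternation.
drop_combined() {
    grep -Ewv "80,0(0000|0833|1667|1067|2133)"
}

printf '79,99167 10\n80,00000 11\n80,00833 12\n80,00417 13\n' | drop_combined
# prints the 79,99167 and 80,00417 lines only
```

Running both helpers on a real input file and comparing the results with diff would confirm the rewrite before the chain is removed.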
You can try to combine the two awk commands into one (or even move the greps into the awk), but that is becoming off-topic and is difficult to test without clear requirements and examples.
EDIT:
After testing with an example input I found this simplified version:
for file in TEST/*.txt; do
filebasename=${file##*/}
# The parentheses are escaped so that awk matches them literally
awk '/LC Chromatogram\(Detector A-Ch1\)/,/^80,/' "${file}" |
grep -E "^[0-7]" > "Output/Fluo_${filebasename}"
awk '/LC Chromatogram\(Detector B-Ch1\)/,/^80,/' "${file}" |
grep -E "^[0-7]" > "Output/RID_${filebasename}"
done
Inside the loop I use ${file}, which holds a different filename on each iteration.
The filename is also used to name the output files. It starts with TEST/, which can be stripped with ${file##*/} (there are many other ways, such as cut -d"/" and sed 's/.., but this one is fast).
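The ${file##*/} expansion can be tried on a single path. In this sketch the variable stem and the ${...%.txt} step are additions for illustration, not part of the answer:

```shell
file='TEST/070_WT3a.txt'          # sample path, as the loop would supply it
filebasename=${file##*/}          # remove the longest prefix ending in "/"
stem=${filebasename%.txt}         # hypothetical extra step: drop the extension

printf '%s\n' "Output/Fluo_${filebasename}" "Output/RID_${stem}.txt"
# prints: Output/Fluo_070_WT3a.txt
#         Output/RID_070_WT3a.txt
```

Both expansions are handled by the shell itself, so no extra process is started per file.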

xargs sed and command substitution in bash

I'm trying to pass a xargs string replace into a sed replacement inside of a substitution, here's the non-working code.
CALCINT=$CALCINT$(seq $CALCLINES | xargs -Iz echo $CALCINT' -F "invoiceid'z'="'$(sed -n '/invoiceid'z'/s/.*name="invoiceid'z'"\s\+value="\([^"]\+\).*/\1/p' output.txt))
Everything works up until the sed inside the second substitution. The 'z' should be a number from 1 to 20, based on the $CALCLINES variable. I know it has something to do with not escaping properly for sed, but I'm having trouble wrapping my head around how sed wants things escaped in this situation.
Here's the surrounding lines of code:
curl -b mycookiefile -c mycookiefile http://localhost/calcint.php > output.txt
CALCLINES=`grep -o 'class="addinterest"' output.txt | wc -l`
CALCINT=$CALCINT$(seq $CALCLINES | xargs -Iz echo $CALCINT' -F "invoiceid'z'="'$(sed -n '/invoiceid17/s/.*name="invoiceid17"\s\+value="\([^"]\+\).*/\1/p' output.txt))
echo $CALCINT
Output: (What I get now)
-F "invoiceid1=" -F "invoiceid2=" -F "invoiceid3=" -F "invoiceid4=" -F "invoiceid5=" -F "invoiceid6=" -F "invoiceid7=" -F "invoiceid8=" -F "invoiceid9=" -F "invoiceid10=" -F "invoiceid11=" -F "invoiceid12=" -F "invoiceid13=" -F "invoiceid14=" -F "invoiceid15=" -F "invoiceid16=" -F "invoiceid17=" -F "invoiceid18=" -F "invoiceid19=" -F "invoiceid20="
What I'm hoping to see as output is something like this
-F "invoiceid1=2342" -F "invoiceid2=456456" -F "invoiceid3=78987" ...etc etc
-------------------------EDIT-----------------------
FWIW...here's the output.txt and other things I've tried.
for i in $(seq -f "%02g" ${CALCLINES});do
sed -n "/interest$i/s/.*name=\"interest$i\"\s\+value=\"\([^\"]\+\).*/\1/p" output.txt > output2.txt
done
output2.txt contains nothing
Thanks to @janos's response for clearing things up. Taking a step back makes it clear to me that the root of the issue is that I'm struggling to get the invoice ids out. The HTML is dynamically generated ("....name="invoiceid7" value="556"..."), so there isn't anything consistent in those particular tags that I can grep on. That is why I was counting another tag that IS consistent and then trying to use a variable in sed to deduce the tag name and extract the value.
And here is output.txt: https://pastebin.com/ewUaddVi
------UPDATE-----
Working solution
Stuff sed into a loop. Note how I had to close and reopen the single quotes (') to use variables in the sed string. That is well documented elsewhere on here. :)
for i in $(seq ${CALCLINES});do
e="interest$i"
CALCINT=$CALCINT' -F "'$e'='
CALCINT=$CALCINT$(sed -n '/'$e'/s/.*name="'$e'"\s\+value="\([^"]\+\).*/\1/p' output.txt)'"'
done
Please read through the comments on the solution below, there is a cleaner way of doing this.
Your current approach cannot work, specifically this part:
... | xargs -Iz echo -F "invoiceid'z'="$(sed ...)"
The problem is that the $(sed ...) will not be evaluated for each line in the input during the execution of xargs.
The shell will evaluate this once, before it actually runs xargs.
But what you need there is a value computed separately for each line of the input.
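A small sketch of that evaluation order, using tr as a stand-in for the real sed command (the function names are hypothetical):

```shell
# The inner $(...) runs once, before xargs starts, so it sees the literal "z"
# and every output line carries the same value.
expanded_once() {
    seq 3 | xargs -I z echo "id z=$(echo z | tr z Z)"
}

# A loop re-evaluates the substitution for each value.
expanded_per_item() {
    for n in $(seq 3); do
        echo "id $n=$(echo "$n" | tr 123 abc)"
    done
}

expanded_once       # prints: id 1=Z, id 2=Z, id 3=Z (one per line)
expanded_per_item   # prints: id 1=a, id 2=b, id 3=c (one per line)
```

In the first function xargs only substitutes z into a string that the shell has already finished expanding.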
You can make this work by taking a different approach:
Extract the invoice ids. For example, write a grep or sed pipeline that produces as output simply the list of invoice ids
Transform the invoice list to the -F "invoiceidNUM=..." form that you need
For the second step, Awk could be practical. The script could be something like this:
curl -b mycookiefile -c mycookiefile http://localhost/calcint.php > output.txt
args=$(sed ... output.txt | awk '{ print "-F \"invoiceid" NR "=" $0 "\"" }')
echo $args
For example if the sed step produces 2342, 456456, 78987, then the output will be:
-F "invoiceid1=2342" -F "invoiceid2=456456" -F "invoiceid3=78987"
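The awk step can be tried without the real page. In this sketch extract_ids is a hypothetical stand-in for the elided sed command and simply emits the three sample ids; the field name follows the invoiceidNUM form from the question:

```shell
# Stand-in for the elided sed step: emits the three sample ids, one per line.
extract_ids() {
    printf '%s\n' 2342 456456 78987
}

# NR is awk's current line number, so it supplies the 1, 2, 3 suffix.
args=$(extract_ids | awk '{ print "-F \"invoiceid" NR "=" $0 "\"" }')
echo $args   # unquoted on purpose so the lines are joined with spaces
# prints: -F "invoiceid1=2342" -F "invoiceid2=456456" -F "invoiceid3=78987"
```

One caveat: the double quotes inside $args are literal characters, so if the string is later expanded unquoted on a curl command line they are passed along as part of each field.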

Shell script for string search between particular lines, timestamps

I have a file with more than 10000 lines. I am trying to search for a string in between particular set of lines, between 2 timestamps.
I am using sed command to achieve this.
sed -n '1,4133p' filename | sed -n '/'2015-08-12'/, /'2015-09-12'/p' filename | grep -i "string"
With the above command I am not getting the desired result: it considers the entire file, not just the lines I specified.
Is there a way to achieve this?
Please help.
I think the problem is here:
sed -n '1,4133p' filename | sed -n '/'2015-08-12'/, /'2015-09-12'/p' filename |
                                                                     ^^^^^^^^
You want to pipe the output of your first sed command into the second. The way you have this, the second sed is given filename as an argument, so it ignores the piped input and re-scans the whole file.
Try this:
sed -n '1,4133p' filename | sed -n '/'2015-08-12'/, /'2015-09-12'/p' | grep -i "string"
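The effect of the extra file argument can be reproduced on a throwaway file. This is a sketch with made-up three-line sample data:

```shell
# Throwaway three-line sample file (hypothetical data).
sample=$(mktemp)
printf 'alpha\nbeta\ngamma\n' > "$sample"

# File argument given: the second sed ignores the pipe and scans the whole file.
with_file=$(sed -n '1,2p' "$sample" | sed -n '/gamma/p' "$sample")

# No file argument: the second sed only sees lines 1-2 coming from the pipe.
piped=$(sed -n '1,2p' "$sample" | sed -n '/gamma/p')

printf 'with file argument: [%s]\n' "$with_file"   # [gamma]
printf 'reading the pipe:   [%s]\n' "$piped"       # []
rm -f "$sample"
```

The line gamma is outside the 1,2 range, so it only shows up when the second sed re-reads the file.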
Any time you find yourself chaining together pipes of seds and greps, stop and just use a single awk command instead:
awk -v IGNORECASE=1 '/2015-08-12/{f=1} f&&/string/; /2015-09-12/||(NR==4133){exit}' file
The above uses GNU awk for IGNORECASE, with other awks you'd just change /string/ to tolower($0)~/string/.
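A parameterised variant for awks without IGNORECASE, as a sketch. The function name search_range and its argument order are hypothetical, and it swaps the regular expressions for literal index() matching, so the search string must be given in lowercase:

```shell
# Hypothetical helper.
# Arguments: file, start stamp, end stamp, last line number, lowercase string.
search_range() {
    awk -v start="$2" -v stop="$3" -v last="$4" -v want="$5" '
        index($0, start)              { f = 1 }   # first stamp seen: start printing
        f && index(tolower($0), want) { print }   # case-insensitive literal match
        index($0, stop) || NR == last { exit }    # second stamp or line limit reached
    ' "$1"
}
```

A call such as search_range filename 2015-08-12 2015-09-12 4133 string would mirror the one-liner above. As there, the line carrying the end stamp is still tested before awk exits.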
