How to save automatically with append comment to file name? - bash

I am extracting data from files and I'd like to apply these working (ugly) command lines to all the .txt files in a given folder. I would also need to append a string to the output file names to avoid overwriting them during the loop... any suggestion is warmly welcome.
for file in ./TEST/;
do
awk '/R.Time/,/LC/' 070_WT3a.txt|awk '/R.Time/,/PDA/'|grep -v -E "PDA|LC"|grep -w -v "80,00000"|grep -w -v "80,00833"|grep -w -v "80,01667"|grep -w -v "80,01067"|grep -w -v "80,02133"|sed -n '1,9601p' > ./Output/Fluo.txt;
awk '/R.Time/,/LC/' 070_WT3a.txt|awk '/R.Time/,/PDA/'|grep -v -E "PDA|LC"|grep -w -v "80,00000"|grep -w -v "80,00833"|grep -w -v "80,01667"|grep -w -v "80,01067"|grep -w -v "80,02133"|sed -n '9603,19203p' > ./Output/RID.txt;
done

Inside the loop you can use the variable ${file}. A first improvement (the long pipelines can be broken over several lines after each pipe):
for file in ./TEST/*.txt;
do
filebasename=${file##*/}
awk '/R.Time/,/LC/' "${file}" |
awk '/R.Time/,/PDA/' |
grep -v -E "PDA|LC" |
grep -w -v "80,00000"|
grep -w -v "80,00833"|
grep -w -v "80,01667"|
grep -w -v "80,01067"|
grep -w -v "80,02133"|
sed -n '1,9601p' > "./Output/Fluo_${filebasename}";
awk '/R.Time/,/LC/' "${file}" |
awk '/R.Time/,/PDA/'|
grep -v -E "PDA|LC"|
grep -w -v "80,00000"|
grep -w -v "80,00833"|
grep -w -v "80,01667"|
grep -w -v "80,01067"|
grep -w -v "80,02133"|
sed -n '9603,19203p' > "./Output/RID_${filebasename}";
done
The next thing you can do is improve the parsing.
Without example input/output it is hard to test a solution; I cannot tell for sure that all files need to be split at lines 9601/9603/19203, which is what seems to work for 070_WT3a.txt.
I would like to start by skipping the 80* lines, but those lines might contain the R.Time/LC boundaries, so that won't help.
You might want to test on 070_WT3a.txt with
awk '/R.Time/,/LC/' 070_WT3a.txt |awk '/R.Time/,/PDA/'|
grep -v -E "PDA|LC"|grep -Ewv "80,0(0000|0833|1667|1067|2133)"
You can try to combine the two awk's into one (or even move the grep's inside the awk), but that is becoming off-topic and difficult to test without clear requirements and examples.
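For example, a sketch of what a combined command could look like (untested, and assuming the inner /R.Time/,/PDA/ range is the restrictive one, as in the pipelines above):
# sketch only: one awk does the range selection and the PDA/LC filtering,
# the combined grep from above removes the 80,0... lines
awk '/R.Time/,/PDA/ { if ($0 !~ /PDA|LC/) print }' 070_WT3a.txt |
grep -Ewv "80,0(0000|0833|1667|1067|2133)"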
EDIT:
After testing with an example input, I found this simplified version:
for file in TEST/*.txt; do
filebasename=${file##*/}
awk '/LC Chromatogram\(Detector A-Ch1\)/,/^80,/' "${file}" |
grep -E "^[0-7]" > "Output/Fluo_${filebasename}"
awk '/LC Chromatogram\(Detector B-Ch1\)/,/^80,/' "${file}" |
grep -E "^[0-7]" > "Output/RID_${filebasename}"
done
Inside the loop I use ${file}, which holds a different filename on each iteration.
The filename is also used for the names of the output files. The filename starts with TEST/, which can be stripped with ${file##*/} (there are many other ways, such as cut -d"/" or sed, but this one is fast).
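For example:
file=TEST/070_WT3a.txt
echo "${file##*/}"    # prints 070_WT3a.txt, the longest */ prefix is stripped from the front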

Related

grep from two variables

I am trying to eliminate the duplicate lines of a list like this.
LINES='opa
opa
eita
eita
argh'
DUPLICATE='opa
eita'
The output I am looking for is argh.
Till now, this is what I tried:
echo -e "$DUPLICATE" | grep --invert-match -Ff- <(echo -e "$LINES")
And:
grep --invert-match -Ff- <(echo -e "$DUPLICATE") <(echo -e "$LINES")
But unsuccessfully.
I know that I can achieve this if I put the content of $LINES into a file:
echo -e "$DUPLICATE" | grep --invert-match -Ff- FILE
But I'd like to know if this is possible only with variables.
Passing a dash as the file name to -f means "read from stdin". Get rid of it so the file name given to -f is the process substitution.
There's no need for echo -e, and -v is shorter and more common than --invert-match.
echo "$LINES" | grep -vFf <(echo "$DUPLICATE")
Equivalently, using a herestring:
grep -vFf <(echo "$DUPLICATE") <<< "$LINES"
Another approach, which doesn't require creating the duplicate list separately:
$ awk '{a[$0]++} END{for(k in a) if(a[k]==1) print k}' <<< "$LINES"
It counts the occurrences of each line and prints a line only if it is not duplicated (count == 1).

GREP by result of awk

Output of awk '{print $4}' is
b05808aa-c6ad-4d30-a334-198ff5726f7c
59996d37-9008-4b3b-ab22-340955cb6019
2b41f358-ff6d-418c-a0d3-ac7151c03b78
7ac4995c-ff2c-4717-a2ac-e6870a5670f0
I need to grep the file st.log for these records, something like:
awk '{print $4}' | xargs -i grep -w "pattern from awk" st.log
I don't know how to pass the pattern correctly.
What about
awk '{print $4}' | grep -F -f - st.log
Credit to Eric Renouf, who noticed that -f - can be used for standard input instead of -f <(cat). Note: -f /dev/stdin also works and avoids launching a new process.
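For example (input.txt stands in for whatever file feeds the awk; /dev/stdin is available on Linux and most modern systems):
# the awk output arrives on grep's stdin and is used as the pattern list
awk '{print $4}' input.txt | grep -F -f /dev/stdin st.log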
or closer to the question to have the output ordered
awk '{print $4}' | xargs -i grep -F {} st.log
Maybe -w was not the option the OP needed, but -F:
grep --help
-F, --fixed-strings PATTERN is a set of newline-separated strings
-w, --word-regexp force PATTERN to match only whole words
-w will match only lines that contain the pattern as a whole word
examples
grep -w . <<<a # matches
grep -w . <<<ab # doesn't match
grep -F . <<<a # doesn't match
grep -F . <<<a.b # matches
Maybe something along these lines will be helpful:
How to process each line received as a result of grep command
awk '{print $4}' | while read -r line; do
    grep "$line" st.log
done

Execute piped shell commands in Tcl

I want to execute these piped shell commands in Tcl:
grep -v "#" inputfile | grep -v ">" | sort -r -nk7 | head
I try:
exec grep -v "#" inputfile | grep -v ">" | sort -r -nk7 | head
and get an error:
Error: grep: invalid option -- 'k'
When I try to pipe only 2 of the commands:
exec grep -v "#" inputfile | grep -v ">"
I get:
Error: can't specify ">" as last word in command
Update: I also tried {} and {bash -c '...'}:
exec {bash -c 'grep -v "#" inputfile | grep -v ">"'}
Error: couldn't execute "bash -c 'grep -v "#" inputfile | grep -v ">"'": no such file or directory
My question: how can I execute the initial piped commands in a Tcl script?
Thanks
The problem is that exec does “special things” when it sees a > on its own (or at the start of a word) as that indicates a redirection. Unfortunately, there's no practical way to avoid this directly; this is an area where Tcl's syntax system doesn't help. You end up having to do something like this:
exec grep -v "#" inputfile | sh -c {exec grep -v ">"} | sort -r -nk7 | head
You can also move the entire pipeline to the Unix shell side:
exec sh -c {grep -v "#" inputfile | grep -v ">" | sort -r -nk7 | head}
Though to be frank this is something that you can do in pure Tcl, which will then make it portable to Windows too…
The > is causing problems here.
You need to escape it from Tcl and the shell to make it work here.
exec grep -v "#" inputfile | grep -v {\\>} | sort -r -nk7 | head
or (and this is better since you have one less grep)
exec grep -Ev {#|>} inputfile | sort -r -nk7 | head
If you look in the directory you were running this from (assuming tclsh or similar) you'll probably see that you created an oddly named file (i.e. |) before.
In pure Tcl:
package require fileutil
set lines {}
::fileutil::foreachLine line inputfile {
if {![regexp {#|>} $line]} {
lappend lines $line
}
}
set lines [lsort -decreasing -integer -index 6 $lines]
set lines [lrange $lines 0 9]
puts [join $lines \n]\n
(-double might be more appropriate than -integer)
Edit: I mistranslated the (1-based) -k index for the command sort when writing the (0-based) -index option for lsort. It is now corrected.
Documentation: fileutil package, if, join, lappend, lrange, lsort, package, puts, regexp, set

Equivalent of grep -v -f -F using sed?

I would like to use sed instead of grep because of the set -e presence in my shell script (grep exits with a non-zero status when it selects no lines, which would abort the script).
I am using grep -v -F -f to filter out the patterns listed in a file file.txt, like this:
grep -v -F -f file.txt data.txt > filtereddata.txt
This keeps the data that does not match any pattern in file.txt.
I need to do this using sed. I was able to find the inverted match using '/pattern/!p' in sed, but could not find a way to combine -f with it.
Is the above grep command possible using sed, with equivalent or better performance?
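One possible sketch (untested; it assumes the patterns in file.txt contain no regex metacharacters or '/' characters, since sed matches regular expressions rather than grep -F's fixed strings):
# build one /pattern/d command per line of file.txt and apply the resulting
# script to data.txt; sed exits 0 even when nothing matches, so set -e is not triggered
sed "$(sed 's|.*|/&/d|' file.txt)" data.txt > filtereddata.txt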

Solaris: find files not containing a string (alternative to grep -L)

I want to search for files that do not contain a specific string.
I tried -lv, but that was a huge mistake: it returns every file that contains any line not containing my string.
I know what I need is exactly grep -L; however, Solaris grep does not implement this option.
What is the alternative, if any?
You can exploit grep -c and do the following (thanks @Scrutinizer for the /dev/null hint: it forces grep to prefix each count with a filename even when * expands to a single file, and awk's NR>1 skips the /dev/null entry itself):
grep -c foo /dev/null * 2>/dev/null | awk -F: 'NR>1&&!$2{print $1}'
Unfortunately this will also print directories (if * expands to any), which might not be desired; in that case a simple loop, albeit slower, might be your best bet:
for file in *; do
[ -f "${file}" ] || continue
grep -q foo "${file}" 2>/dev/null || echo "${file}"
done
However, if you have GNU awk 4 on your system you can do:
awk 'BEGINFILE{f=0} /foo/{f=1} ENDFILE{if(!f)print FILENAME}' *
After using grep -c you can use grep again to find the desired filenames:
grep -c 'pattern' * | grep ':0$'
and to see just the filenames:
grep -c 'pattern' * | grep ':0$' | cut -d":" -f1
You can use awk like this:
awk '!/not this/' file
To negate multiple patterns:
awk '!/jan|feb|mars/' file
