BASH output from grep - bash

I am relatively new to bash and I am testing my code for the first case.
counter=1
for file in not_processed/*.txt; do
if [ $counter -le 1 ]; then
grep -v '2018-07' $file > bis.txt;
counter=$(($counter+1));
fi;
done
I want to remove all the lines containing '2018-07' from my file. The new file needs to be named $file_bis.txt.
Thanks

sed or awk can do this as well, and both cope better once the processing gets more complex.
sed '/2018-07/d' not_processed/*.txt
This deletes the matching lines and prints the rest to your console. To keep the result, redirect the output to a new file:
sed '/2018-07/d' not_processed/*.txt > out.txt

This does it for all files in not_processed/*.txt:
for file in not_processed/*.txt
do
grep -v '2018-07' "$file" > "$file"_bis.txt
done
And this does it only for the first 2 files in not_processed/*.txt (parsing ls output like this breaks on file names that contain whitespace):
for file in $(ls not_processed/*.txt | head -2)
do
grep -v '2018-07' "$file" > "$file"_bis.txt
done
Don't forget the double quotes around $file: without them, bash reads $file_bis as a different variable, which has no value assigned.
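The two loops above can be folded into one function that uses a glob instead of ls. This is only a sketch: the name strip_matching and its optional LIMIT argument are made up for illustration. It also skips earlier *_bis.txt results, because "a.txt_bis.txt" still matches the *.txt glob on a second run.

```shell
#!/usr/bin/env bash
# strip_matching DIR PATTERN [LIMIT]   (hypothetical helper, not from the question)
# For each DIR/*.txt, write the lines that do NOT contain PATTERN to
# "<file>_bis.txt". If LIMIT is given and positive, stop after that many files.
strip_matching() {
    local dir=$1 pattern=$2 limit=${3:-0} count=0 file
    for file in "$dir"/*.txt; do
        [ -e "$file" ] || continue              # the glob matched nothing
        case $file in *_bis.txt) continue ;; esac   # skip output of earlier runs
        if [ "$limit" -gt 0 ] && [ "$count" -ge "$limit" ]; then
            break
        fi
        # -F treats PATTERN as a fixed string. grep exits 1 when every line
        # was filtered out, which is not an error here.
        grep -vF -- "$pattern" "$file" > "${file}_bis.txt" || true
        count=$((count + 1))
    done
}
```

For example, strip_matching not_processed 2018-07 1 would process only the first file, and leaving out the last argument would process them all.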

I don't understand why you are using a counter and an if condition for this simple requirement. The script below will do what you need:
# first store all the files in a variable
files=$(ls /your/path/*.txt)
# now use a for loop
for file in $files;
do
grep -v '2018-07' "$file" >> bis.txt
done
Better still, avoid the for loop here, as the single line below suffices:
grep -hv '2018-07' /your/path/*.txt > bis.txt

Related

BASH - Check if file exists and if it does, append the filename to a .txt

I'm trying to create a bash script that first looks for a name and then checks whether a certain filename, for example vacation021.jpg exists in the file system, and if it exists I want to append the filename to a .txt file.
I'm having a lot of issues with this, I'm still very new to bash.
This is as far as I've gotten.
> oldFiles.txt
files=$(grep "jane " list.txt)
for i in $files; do
if test -e vacation021.jpg;
then echo $i >> oldFiles.txt; fi
done
This however appends all the separate words in the list.txt to the oldFiles.txt.
Any help would be much appreciated.
for i in $files will iterate over each word in $files, not the lines. If you want to iterate over the lines, pipe the output of grep to the loop:
grep 'jane ' list.txt | while read -r i; do
if test -e vacation021.jpg
then printf '%s\n' "$i"
fi
done > oldFiles.txt
But as mentioned in the comments, unless the vacation021.jpg file is going to be created or deleted during the loop, you can simply use a single if:
if test -e vacation021.jpg
then
grep 'jane ' list.txt > oldFiles.txt
fi
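To see the difference between the two loop styles, here is a small sketch. The list contents are invented sample data, not taken from the question's list.txt.

```shell
#!/usr/bin/env bash
# Contrast word splitting with line-by-line reading.
# The list contents below are invented sample data.
list=$(mktemp)
printf '%s\n' '001 jane vacation021.jpg' '002 jane report.pdf' > "$list"

# Unquoted command substitution: the loop runs once per
# whitespace-separated word.
words=0
for w in $(grep 'jane ' "$list"); do
    words=$((words + 1))
done

# read consumes one line per iteration. The here-document keeps the loop
# in the current shell, so the counter is still set afterwards.
lines=0
while IFS= read -r line; do
    lines=$((lines + 1))
done <<EOF
$(grep 'jane ' "$list")
EOF

echo "for loop iterations:   $words"
echo "while read iterations: $lines"
rm -f "$list"
```

With the two sample lines, the for loop runs 6 times and the while read loop runs twice.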

I'm trying to validate the domains from a .csv file in bash script

Here is what I have, and it is not working:
for i in `cat cnames.csv`
do nslookup $i | grep -v "8.8.8.8\|=\|Non-authoritative" >> output.txt
done
Any better solutions?
This is Bash FAQ 001; you don't iterate over a file using a for loop.
while IFS= read -r i; do
nslookup "$i"
done < cnames.csv | grep -v "8.8.8.8\|=\|Non-authoritative" > output.txt
Note that you don't need to run grep separately for each call to nslookup; you can pipe the aggregate output to a single call.
You can use the exit status of nslookup.
while IFS= read -r i; do
if nslookup "$i"; then
echo "$i is valid"
else
echo "$i not found"
fi
done < cnames.csv
Is cnames.csv a real .csv file? Wouldn't that require extracting only the column that holds the addresses? Right now the commas and any other fields are read too.
You could probably get them all looked up faster in parallel and more succinctly with GNU Parallel
parallel -a cnames.csv nslookup {} | grep ...
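If cnames.csv really has several columns, the column extraction and the exit-status check can be combined. The following is a sketch under assumptions: the function name check_domains, the column argument, and the pluggable resolver command are all hypothetical, and it relies on nslookup returning a non-zero status for a failed lookup, as the answer above does.

```shell
#!/usr/bin/env bash
# check_domains FILE COLUMN [RESOLVER...]   (hypothetical helper)
# Takes field COLUMN from each comma-separated line of FILE and reports
# whether RESOLVER exits successfully for it. RESOLVER defaults to
# nslookup; it is a parameter only so the sketch can be tried offline.
check_domains() {
    local file=$1 column=$2 domain
    shift 2
    [ $# -gt 0 ] || set -- nslookup
    cut -d, -f"$column" "$file" | while IFS= read -r domain; do
        [ -n "$domain" ] || continue            # skip empty fields
        # stdin comes from /dev/null so the resolver cannot eat the loop input
        if "$@" "$domain" < /dev/null > /dev/null 2>&1; then
            echo "$domain is valid"
        else
            echo "$domain not found"
        fi
    done
}
```

A call such as check_domains cnames.csv 1 would check the first column. Note that cut does not understand quoted CSV fields, so this only suits simple files.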

How could I redirect file name into counts by tab using one line commands in bash?

I have some files in FASTA format and want to count their reads, with the output listing the file names and their corresponding counts.
input file names:
1.fa
2.fa
3.fa
...
I tried:
for i in $(ls -t -v *.fa); do grep -c '>' $i > echo $i >> out.txt ; done
Problem:
It gives me out.txt, but each file name appears twice and the names and counts are separated by ':'. However, I need a tab separator and unique file names.
1.fa:7323580
1.fa:7323580
2.fa:5591179
2.fa:5591179
...
Suggested solution
grep -c '>' *.fa | sed 's/:/'$'\t'/ > out.txt
The $'\t' is a Bash-ism called ANSI C Quoting.
Analysis of what went wrong
Your code is:
for i in $(ls -t -v *.fa); do grep -c '>' $i > echo $i >> out.txt ; done
It isn't a good idea to parse the output of the ls command. However, if your file names are well behaved (roughly, in the portable filename character set, which is [-A-Za-z0-9._]), you'll be reasonably OK.
Your grep command, though, is confused. It is:
grep -c '>' $i > echo $i >> out.txt
That could be written more clearly as:
grep -c '>' $i $i > echo >> out.txt
This means 'count the number of lines containing > in $i, and then in $i again, and send the output first to a file echo, and then append to out.txt'. Since the append overrides the redirection, the file echo is empty. You get the file name included in the output because there are two files to search; with only one file, you wouldn't get the file name too. (One way to ensure you get file names with regular (not -c or -l) grep is to scan /dev/null too. Many versions of grep also provide options to get the name explicitly, but POSIX doesn't mandate one. BSD grep uses -H; so does GNU grep.)
So, that's why you got the double file names and entries in your output.
Try this (with a single file argument, grep -c prints only the count, so there is nothing for awk to split):
for i in $(ls -t -v *.fa)
do
c=$(grep -c '>' "$i")
printf '%s\t%s\n' "$i" "$c" >> out.txt
done
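The same idea also works without parsing ls. This is a minimal sketch, assuming that FASTA header lines start with '>' (the original commands match '>' anywhere on the line); count_records is a made-up name.

```shell
#!/usr/bin/env bash
# count_records FILE...   (hypothetical helper)
# Print "<name><TAB><count>" for each file, counting the lines that
# start with '>'.
count_records() {
    local f
    for f in "$@"; do
        # Reading from stdin means grep never prints a file name prefix.
        printf '%s\t%s\n' "$f" "$(grep -c '^>' < "$f")"
    done
}
```

Usage would be count_records *.fa > out.txt, with the files listed in glob order rather than the order ls -t -v gives.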

Passing input to sed, and sed info to a string

I have a list of files (~1000) and there is 1 file per line in my text file named: 'files.txt'
I have a macro that looks something like the following:
#!/bin/sh
b=$(sed '${1}q;d' files.txt)
cat > MyMacro_${1}.C << +EOF
myFile = new TFile("/MYPATHNAME/$b");
+EOF
and I use this input script by doing
./MakeMacro.sh 1
and later I want to do
./MakeMacro.sh 2
./MakeMacro.sh 3
...etc
So that it reads the n'th line of my files.txt and feeds that string to my created .C macro.
"So that it reads the n'th line of my files.txt and feeds that string to my created .C macro."
Given this statement and your tags, I'm going to answer using shell tools and not really address the issue of the .C macro.
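The first line of your script contains a sed script. Because it is in single quotes, the shell never expands ${1}, so sed does not receive the line number. There are numerous ways to get the Nth line from a text file. The simplest might be to use head and tail.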
$ head -n "${i}" files.txt | tail -n 1
This takes the first $i lines of files.txt, and shows you the last line of that set.
$ sed -ne "${i}p" files.txt
This use of sed uses -n to avoid printing by default, then prints the $ith line. For better performance, try:
$ sed -ne "${i}{p;q;}" files.txt
This does the same, but quits after printing the line, so that sed doesn't bother traversing the rest of the file.
$ awk -v i="$i" 'NR==i' files.txt
This passes the shell variable $i into awk, then evaluates an expression that tests whether the number of records processed is the same as that variable. If the expression evaluates true, awk prints the line. For better performance, try:
$ awk -v i="$i" 'NR==i{print;exit}' files.txt
Like the second sed script above, this will quit after printing the line, so as to avoid traversing the rest of the file.
Plenty of ways you could do this by loading the file into an array as well, but those ways would take more memory and perform less well. I'd use one-liners if you can. :)
To put any of these one-liners into your script, you already have the notation. Here i holds the line number, which would be $1 in your script:
if expr "$i" : '[0-9][0-9]*$' >/dev/null; then
b=$(sed -ne "${i}{p;q;}" files.txt)
else
echo "ERROR: invalid line number" >&2; exit 1
fi
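The validation and the lookup can also be wrapped in one function. This is only a sketch: nth_line is a hypothetical name, and it treats an empty result as "no such line", so it cannot return a line that is genuinely blank.

```shell
#!/usr/bin/env bash
# nth_line N FILE   (hypothetical helper, not part of the original script)
# Print line N of FILE. Fails if N is not a positive integer or if FILE
# has no non-empty line N.
nth_line() {
    local n=$1 file=$2 line
    case $n in
        ''|*[!0-9]*|0*)
            echo "ERROR: invalid line number: $n" >&2
            return 1 ;;
    esac
    # Double quotes let the shell expand ${n}; sed then sees e.g. "2{p;q;}".
    line=$(sed -n "${n}{p;q;}" "$file")
    [ -n "$line" ] || { echo "ERROR: no line $n in $file" >&2; return 1; }
    printf '%s\n' "$line"
}
```

In the script this could be used as b=$(nth_line "$1" files.txt) || exit 1.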
If I am understanding you correctly, you can use a for loop in bash to call the script multiple times with different arguments, where n holds the number of lines in files.txt:
for i in $(seq 1 "$n"); do ./MakeMacro.sh "$i"; done
Based on the OP's comment, it seems they want to submit the generated files to Condor. You can modify the loop above to include the Condor submission:
for i in $(seq 1 "$n"); do ./MakeMacro.sh "$i"; condor_submit <OutputFile>; done
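i=0
while IFS= read -r file
do
((i++))
cat > MyMacro_${i}.C <<-EOF
myFile = new TFile("$file");
EOF
done < files.txt
Beware: with <<-, any indentation of the here-document lines, including the EOF line, must use tabs, not spaces. The delimiter is left unquoted so that $file is expanded.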
I'm puzzled about why this is the way you want to do the job. You could have your C++ code read files.txt at runtime and it would likely be more efficient in most ways.
If you want to get the Nth line of files.txt into MyMacro_N.C, then:
{
echo
sed -n -e "${1}{s/.*/myFile = new TFile(\"&\");/p;q;}" files.txt
echo
} > MyMacro_${1}.C
Good grief. The entire script should just be (untested):
awk -v nr="$1" 'NR==nr{printf "\nmyFile = new TFile(\"/MYPATHNAME/%s\");\n\n",$0 > ("MyMacro_"nr".C")}' files.txt
You can throw in a ;exit before the } if performance is an issue but I doubt if it will be.
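Putting the pieces together, the whole of MakeMacro.sh could look roughly like the function below. It is a sketch, not the asker's actual script: make_macro and its optional LIST and PREFIX arguments are assumptions, with /MYPATHNAME kept as the placeholder path from the question.

```shell
#!/usr/bin/env bash
# make_macro N [LIST] [PREFIX]   (hypothetical helper)
# Write line N of LIST (default files.txt) into MyMacro_N.C.
# PREFIX defaults to the placeholder path used in the question.
make_macro() {
    local n=$1 list=${2:-files.txt} prefix=${3:-/MYPATHNAME} name
    # exit stops awk as soon as the wanted line has been printed
    name=$(awk -v n="$n" 'NR == n { print; exit }' "$list")
    [ -n "$name" ] || { echo "ERROR: no line $n in $list" >&2; return 1; }
    # The delimiter is unquoted, so $prefix and $name are expanded.
    cat > "MyMacro_${n}.C" <<EOF
myFile = new TFile("$prefix/$name");
EOF
}
```

Calling make_macro 2 in the directory that holds files.txt would write MyMacro_2.C with the second file name filled in.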

grep-ing multiple files

I want to grep multiple files in a directory and collect the output of each grep in a separate file. So if I grep 20 files, I should get 20 output files containing the searched item. Can anybody help me with this? Thanks.
Use a for statement:
for a in *.txt; do grep target "$a" > "$a.out"; done
Or use just one gawk command (the parentheses make the output file name unambiguous):
gawk '/target/ {print $0 > (FILENAME ".out")}' *.txt
You can use just the shell, with no need for external commands:
for file in *.txt
do
while IFS= read -r line
do
case "$line" in
*pattern*) printf '%s\n' "$line" >> "${file%.txt}.out";;
esac
done < "$file"
done
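One difference between the approaches above is that the grep loop creates an empty .out file for every input without a match, while the gawk and pure-shell versions do not. A sketch of a grep-based variant that also skips the empty files, using a made-up function name:

```shell
#!/usr/bin/env bash
# grep_each PATTERN FILE...   (hypothetical helper)
# For every FILE, write its matching lines to "<FILE>.out".
# Files without a match get no output file.
grep_each() {
    local pattern=$1 f
    shift
    for f in "$@"; do
        [ -f "$f" ] || continue
        # grep exits non-zero when nothing matched (or on error);
        # in that case remove the empty output file again.
        grep -e "$pattern" -- "$f" > "$f.out" || rm -f -- "$f.out"
    done
}
```

For the question's case this would be called as grep_each target *.txt.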
