I need your help with a short bash script. I have a folder which contains about 150,000(!) XML files. I need a script which finds all the files that contain a specified line. The script should work as fast as possible, because it has to be run very often.
My first approach was the following, using grep:
for f in temp/*
do
if grep "^.*the line which should be equal.*$" "$f"
then
echo "use this file"
else
echo "this file does not contain the line"
fi
done
This approach works, but it takes too much time. Does somebody know a faster approach? If another scripting language is a better choice, that is also fine.
Best regards,
Michael
You can use grep on its own, without any bash loop around it.
-l, --files-with-matches
Suppress normal output; instead print the name of each input file from which output would normally have been printed. The scanning will stop on the first match. (-l is specified by POSIX.)
So, try this:
grep "the line which should be equal" --files-with-matches temp/*
Related
I have many files, each in a directory. My script should:
Find a string in a file. Let's say the file is called "results" and the string is "average."
Then append everything else on the string's line to another file called "allResults." After running the script, the file "allResults" should contain as many lines as there are "results" files, like
allResults.txt (what I want):
Everything on the same line as the string, "average" in directory1/results
Everything on the same line as the string, "average" in directory2/results
Everything on the same line as the string, "average" in directory3/results
...
Everything on the same line as the string, "average" in directory-i/results
My script can find what I need. I have checked by doing a "cat" on "allResults.txt" as the script is working and an "ls -l" on the parent directory of "allResults.txt." I.e., I can see the output of the "find" on my screen and the size of "allResults.txt" increases briefly, then goes back to 0. The problem is that "allResults.txt" is empty when the script has finished. So the results of the "find" are not being appended/added to "allResults.txt." They're being overwritten.
Here is my script (I use "gsed", GNU sed, because I'm a Mac OSX Sierra user):
#!/bin/bash
# Loop over all directories, find.
let allsteps=100000
for ((step=0; step <= allsteps; step++)); do
i=$((step));
findme="average"
find ${i}/experiment-1/results.dat -type f -exec gsed -n -i "s/${findme}//p" {} \; >> allResults.txt
done
Please note that I have used ">>" in my example here because I read that it appends (which is what I want--a list of all lines matching my "find" from all files), whereas ">" overwrites. However, in both cases (when I use ">" or ">>"), I end up with an empty allResults.txt file.
grep's default behavior is to print out matching lines; using sed here is overkill. The -i flag is also why allResults.txt ends up empty: -i sends sed's output back into the input file instead of to stdout, so the >> redirection never receives anything. Worse, s/${findme}//p combined with -n -i rewrites each results.dat to contain only the matching lines, with the search word stripped out.
You also don't need an explicit loop. Unnecessary looping is a habit programmers tend to import from languages where it is the norm; most shell commands and constructs accept multiple file names.
grep average */experiment-1/results.dat > allResults.txt
What's nice about this is the output file is only opened once and is written to in one fell swoop.
If you indeed have hundreds of thousands of files to process you might encounter a command-line length limit. If that happens you can switch to a find call which will make sure not to call grep with too many files at once.
find . -name results.dat -exec grep average {} + > allResults.txt
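Note that when grep is given more than one file it prefixes each matched line with the file name; if you want only the line contents in allResults.txt, add -h to suppress that prefix:

find . -name results.dat -exec grep -h average {} + > allResults.txt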
Disclaimer: I'm very new to bash and for some reason I'm having a very hard time learning this one. The syntax seems very different depending on the website I visit.
I have a simple wrapper script that I want to test if a file is gzipped or not, and if so, to zcat the file to a new temporary file and open it in an editor. Here's part of the script:
if file $FILE | grep -q gzip
then
timestamp=$(date +"%D_%T")
$( zcat $FILE > tmp-$timestamp )
fi
I'm getting an error: "tmp-10/19/15_15:16:41: No such file or directory"
I tried removing the command substitution syntax or putting tmp-$timestamp in double quotes and I get the same error. If I remove the -$timestamp part, then it seems to work fine. Can someone tell me what's going on here? I'm clearly missing something very simple.
The %D format produces a date containing slashes (10/19/15), and slashes are interpreted as directory separators: tmp-10/19/15_15:16:41 refers to a file named 15_15:16:41 in directory 19, which is a subdirectory of tmp-10. If those directories and subdirectories do not exist, you cannot write to them.
Replace:
timestamp=$(date +"%D_%T")
With:
timestamp=$(date +"%F_%T")
This gives the date without any slashes.
As an example of this format:
$ date +"%F_%T"
2015-10-19_12:37:05
With %F, the year comes before the month which comes before the day. This means that your files will sort properly. For most people, that is an important advantage over %D.
Revised script
Your script can be simplified to:
if file "$file" | grep -q gzip
then
zcat "$file" > "tmp-$(date +"%F_%T")"
fi
Notes:
It is best practice not to use all caps for your shell variables. The system uses all caps for its own variables and you don't want to accidentally overwrite one of them. Use lower case or mixed case and you'll be safe.
File names, such as $file, should always be in double quotes. Some day, someone will give you a file name with a space in it, and you don't want that to cause your script to fail (see the short demonstration below).
The command substitution $(...) does not belong here. It has been removed.
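As a quick demonstration of the quoting point, with a hypothetical file name containing a space:

file="my archive.gz"
file $file      # word-splits into: file my archive.gz (two arguments, both wrong)
file "$file"    # stays one argument: file "my archive.gz" (correct)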
I have this script:
#!/bin/bash
FASTQFILES=~/Programs/ncbi-blast-2.2.29+/DB_files/*.fastq
FASTAFILES=~/Programs/ncbi-blast-2.2.29+/DB_files/*.fasta
clear
for file in $FASTQFILES
do cat $FASTQFILES | perl -e '$i=0;while(<>){if(/^\#/&&$i==0){s/^\#/\>/;print;}elsif($i==1){print;$i=-3}$i++;}' > ~/Programs/ncbi-blast-2.2.29+/DB_files/"${FASTQFILES%.*}.fasta"
mv $FASTAFILES ~/Programs/ncbi-blast-2.2.29+/db/
done
I'm trying it to grab the files defined in $FASTQFILES, do the .fastq to .fasta conversion, name the output with the same filename of the input, and move it to a new folder. E.g., ~/./DB_files/HELLO.fastq should give a converted ~/./db/HELLO.fasta
The problem is that the output of the conversion is a properly formatted hidden file called .fasta in the first folder instead of the expected one named HELLO.fasta. So there is nothing to mv. I think I'm messing up in the ${FASTQFILES%.*}.fasta argument but I can't seem to fix it.
I see three problems:
One part of your trouble is that you use cat $FASTQFILES instead of cat $file.
You also need to fix the I/O redirection at the end of that line to > "${file%.fastq}.fasta". Since $file already contains the full path to the file, there is no need to prefix the directory again.
The mv command needs to be executed outside the loop.
In fact, when processing a single file at a time, you don't need to use cat at all (UUOC — Useless Use Of Cat). Simply provide "$file" as an argument to the Perl script.
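Putting the three fixes together, the loop could look like this (a sketch, assuming the same paths as in the question):

#!/bin/bash
FASTQFILES=~/Programs/ncbi-blast-2.2.29+/DB_files/*.fastq
clear
for file in $FASTQFILES
do
    # Feed each file to perl directly; no cat needed.
    perl -e '$i=0;while(<>){if(/^\#/&&$i==0){s/^\#/\>/;print;}elsif($i==1){print;$i=-3}$i++;}' "$file" > "${file%.fastq}.fasta"
done
# Move all the converted files once, after the loop has finished.
mv ~/Programs/ncbi-blast-2.2.29+/DB_files/*.fasta ~/Programs/ncbi-blast-2.2.29+/db/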
Hi, I want to write a script that will go to a directory with many files, take a filename, e.g. test_HTTP_abc.txt, and search it for the string pattern HTTP; if it contains this string, then set a variable equal to something:
something like:
var1=0
search for 06
if it contains 06 then
var1=1
else
var1=0
end if
but in a Unix shell script. Thanks
Probably the simplest thing is:
if test "${filename#*HTTP}" = "$filename"; then
# the variable does not contain the string `HTTP`
var=0
else
var=1
fi
Some shells allow regex matches in [[ comparisons, but it's not necessary to introduce that sort of non-portable code into your script.
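For reference, the non-portable version being referred to would look something like this in bash:

if [[ $filename =~ HTTP ]]    # bash-only regex match
then
    var=1
else
    var=0
fi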
Like this?
var=0
if fgrep -q 06 /path/to/dir/*HTTP*
then
var=1
fi
fgrep will return 0 ("truth") if there is a match in one of the files, and a non-zero status otherwise (including the case of no matching input files).
If you want a list of matching files, try fgrep -l.
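For example, to capture that list in a variable (the name matching is just illustrative, same hypothetical paths as above):

matching=$(fgrep -l 06 /path/to/dir/*HTTP*)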
Well, I'm not going to write the script for you; you have to learn. :)
It's easy if you break it down into smaller tasks:
The ls command is for looking at a directory's contents. You can also use the find command, which is a bit more flexible, like find /some/folder -name "*string*"
To sift through the output of a command, you can store it in a variable or pass it along using pipes.
You can search this output with something like awk (link), grep (link) and so on.
Setting variables is easy in bash, too: http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO-5.html
foundit=1
Why don't you have a go at solving this puzzle first, rather than having someone tell you the answer? :D Show us where you get stuck.
I need to add an import statement to a large number of Java source files. I'm thinking of using find or grep to collect the files, and maybe sed to make the change, but what do I do with the output? Or would it be better to use "argdo" in vim?
Note: this question is asking for command-line solutions, not IDEs. Answers and comments suggesting IDEs will be calmly, politely and serenely flagged. :-)
I am huge fan of the following
export MYLIST=`find . -type f -name '*.java'`
for a in $MYLIST; do
mv $a $a.orig
echo "import.stuff" >> $a
cat $a.orig >> $a
chmod 755 $a
done;
mv is evil and eventually this will get you. But I use this same construct for a lot of things and it is my utility knife of choice.
Update: This method also backs up the files, which you should do with any method. In addition, it does not use anything but the shell's features, so you don't have to jog your memory about tools you don't use often. It is simple enough to teach a monkey (and believe me, I have) to do. And you are generally wise enough to just throw it away, because it took four seconds to write.
you can use sed to insert a line before the first line of the file:
sed -ie "1i import package.name.*;" YourClass.java
Use a for loop to iterate through all your files and run this expression on each of them, as sketched below. But be careful if you have packages, because the import statements must come after the package declaration; you can use a more complex sed expression if that's the case.
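A minimal version of that loop, assuming all the files sit in the current directory:

for f in *.java
do
    sed -i -e "1i import package.name.*;" "$f"    # GNU sed; BSD sed wants -i ''
done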
I'd suggest sed -i to obviate the need to worry about the output. Since you don't specify your platform, check your man pages; the semantics of sed -i vary from Linux to BSD.
I would use sed if there were a decent way to say "do this for the first line only," but I don't know of one off the top of my head. Why not use Perl instead? Something like:
find . -name '*.java' -exec perl -p -i.bak -e '
BEGIN {
print "import package.name.*;\n"
}' {} \;
should do the job. Check perlrun(1) for more details.
for i in *.java
do
sed -i '.old' '1 i\
Your include statement here.
' "$i"
done
Should do it. -i does an in-place replacement and .old saves the old file just in case something goes wrong. Replace the *.java glob as necessary (maybe with find . -name '*.java' or something instead).
You may also use the ed command to do in-file search and replace:
# delete all lines matching foobar
ed -s test.txt <<< $'g/foobar/d\nw'
see: http://bash-hackers.org/wiki/doku.php?id=howto:edit-ed
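For the import-insertion task in this question, the same idea would look something like this (0a appends after line zero, i.e. before the first line; "." ends the input and "w" writes the file):

ed -s YourClass.java <<< $'0a\nimport package.name.*;\n.\nw'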
I've actually started doing it using "argdo" in vim. First of all, set the args:
:args **/*.java
The "**" traverses all the subdir, and the "args" sets them to be the arg list (as if you started vim with all those files in as arguments to vim, eg: vim package1/One.java package1/Two.java package2/One.java)
Then fiddle with whatever commands I need to make the transform I want, eg:
:/^package.*$/s/$/\rimport package.name.*;/
The "/^package.*$/" acts as an address for the ordinary "s///" substitution that follows it; the "/$/" matches the end of the package's line; the "\r" is to get a newline.
Now I can automate this over all files, with argdo. I hit ":", then uparrow to get the above line, then insert "argdo " so it becomes:
:argdo /^package.*$/s/$/\rimport package.name.*;/
This "argdo" applies that transform to each file in the argument list.
What is really nice about this solution is that it isn't dangerous: it hasn't actually changed the files yet, but I can look at them to confirm it did what I wanted. I can undo on specific files, or I can exit if I don't like what it's done (BTW: I've mapped ^n and ^p to :n and :N so I can scoot quickly through the files). Now, I commit them with ":wa" - "write all" files.
:wa
At this point, I can still undo specific files, or finesse them as needed.
This same approach can be used for other refactorings (e.g. change a method signature and calls to it, in many files).
BTW: This is clumsy: "s/$/\rtext/"... There must be a better way to append text from vim's commandline...