use shell script to traverse subdirectories and do grep

My directory has the following structure (on Mac)
main_folder/
|_folder1/
|_folder2/
|_folder3/
Each subfolder has a file with the identical name "classFile.trans".
I want to traverse the subfolders and run grep on each classFile.trans, but I don't know how to save the resulting output file in the corresponding subfolder. Thanks
#!/bin/bash
for file in ./main_folder/*/classFile.trans; do
    grep -v "keyword" "$file" > newClassFile.trans # how do I save the new output file in the corresponding subfolder?
done

Probably easiest to run the grep in each subdirectory:
#!/bin/bash
for d in ./main_folder/*; do
( cd "$d" || exit
  file=classFile.trans
  test -f "$file" && grep -v "keyword" "$file" > newClassFile.trans
)
done
The parentheses cause the body of the loop to run in a subshell, so the working directory of the main shell is not changed. However, this makes error messages from grep fairly useless: if one of the classFile.trans files is not readable, for example, the error message will not indicate which directory it came from. So it is probably better to do:
for d in ./main_folder/*; do grep -v keyword "$d/classFile.trans" > "$d/newClassFile.trans"; done
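For reference, a fully quoted version of that as a standalone script (a sketch; it skips directories that lack the input file and keeps grep's error messages pointing at full paths):
#!/bin/bash
for d in ./main_folder/*/; do
    # only process directories that actually contain the input file
    [ -f "${d}classFile.trans" ] && grep -v "keyword" "${d}classFile.trans" > "${d}newClassFile.trans"
done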

I would use
#!/bin/bash
for file in $(find ./main_folder/ -name "classFile.trans"); do
    newFile=$(dirname "$file")/newClassFile.trans
    grep -v "keyword" "$file" > "$newFile"
done
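If any of the directory names contain whitespace, the for loop above will split them apart. A safer variant (a sketch using find -print0 with a read loop, which is bash-specific) avoids word splitting entirely:
#!/bin/bash
# NUL-separated names survive spaces and newlines intact
find ./main_folder/ -name "classFile.trans" -print0 |
while IFS= read -r -d '' file; do
    grep -v "keyword" "$file" > "$(dirname "$file")/newClassFile.trans"
done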

The find command has -exec* flags that allow it to run commands for each file matched. For your case you would do:
find ./main_folder -name classFile.trans \
-execdir $SHELL -c "grep -v '$keyword' {} >newClassFile.trans" \;
(\ and linebreak added so the whole command can be read without scrolling)
Breaking the various arguments down:
-name classFile.trans searches for all files named classFile.trans
-execdir runs everything up to the ; character in the directory that contains the matched file
$SHELL -c runs your $SHELL (e.g., /bin/bash) with the -c flag, which executes the given string as a command instead of starting an interactive shell you can type into
"grep -v '$keyword' {} >newClassFile.trans" runs your grep and output redirection in the file's respective directory thanks to -execdir; note that find turns {} into the matched file's name
This is necessary so the > redirection runs in the sub-command, not the "current" shell the find command is being run in
\; escapes the ; character so it can be sent to find instead of acting as a command terminator
A test:
# Set up the folders and test files
$ mkdir -p main_folder/{f1,f2,f3}
$ for i in main_folder/f?; do printf 'a\nb\nc\n' >$i/classFile.trans; done
# Contents of one of the test files
$ cat main_folder/f1/classFile.trans
a
b
c
# Set the keyword to the letter 'b'
$ keyword=b
$ find ./main_folder -name classFile.trans -execdir $SHELL -c "grep -v '$keyword' {} >newClassFile.trans" \;
# newClassFile.trans was created sans 'b'
$ cat main_folder/f1/newClassFile.trans
a
c

Related

Create archive from difference of two folders

I have the following problem.
There are two nested folders A and B. They are mostly identical, but B has a few files that A does not. (These are two mounted rootfs images).
I want to create a shell script that does the following:
1. Find out which files are contained in B but not in A.
2. Copy the files found in step 1 from B and create a tar.gz that contains these files, keeping the folder structure.
The goal is to import the additional data from image B afterwards on an embedded system that contains the contents of image A.
For the first step I put together the following code snippet. (Note on the grep "Nur": "Nur in" is German for "Only in".)
diff -rq <A> <B>/ 2>/dev/null | grep Nur | awk '{print substr($3, 1, length($3)-1) "/" substr($4, 1, length($4)-1)}'
The result is the output of the paths relative to folder B.
I have no idea how to implement the second step. Can someone give me some help?
Using diff for finding files which don't exist is severe overkill; you are doing a lot of calculations to compare the contents of the files, where clearly all you care about is whether a file name exists or not.
Maybe try this instead.
tar zcf newfiles.tar.gz $(comm -13 <(cd A && find . -type f | sort) <(cd B && find . -type f | sort) | sed 's/^\./B/')
The find commands produce a listing of the file name hierarchies; comm -13 extracts the elements which are unique to the second input file (which here isn't really a file at all; we are using the shell's process substitution facility to provide the input) and the sed command adds the path into B back to the beginning.
Passing a command substitution $(...) as the argument to tar is problematic; if there are a lot of file names, you will run into "command line too long", and if your file names contain whitespace or other irregularities in them, the shell will mess them up. The standard solution is to use xargs but using xargs tar cf will overwrite the output file if xargs ends up calling tar more than once; though perhaps your tar has an option to read the file names from standard input.
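GNU tar, for instance, accepts -T - to read the file list from standard input, which sidesteps both problems. A sketch, assuming the same A/B layout as above (names containing newlines would still defeat the line-oriented comm):
comm -13 <(cd A && find . -type f | sort) <(cd B && find . -type f | sort) |
    sed 's/^\./B/' | tar zcf newfiles.tar.gz -T -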
With find:
$ mkdir -p A B
$ touch A/a A/b
$ touch B/a B/b B/c B/d
$ cd B
$ find . -type f -exec sh -c '[ ! -f ../A/"$1" ]' _ {} \; -print
./c
./d
The idea is to use the exec action with a shell script that tests the existence of the current file in the other directory. There are a few subtleties:
The first argument of sh -c is the script to execute, the second (here _ but could be anything else) corresponds to the $0 positional parameter of the script and the third ({}) is the current file name as set by find and passed to the script as positional parameter $1.
The -print action at the end is needed, even if it is normally the default with find, because the use of -exec cancels this default.
Example of use to generate your tarball with GNU tar:
$ cd B
$ find . -type f -exec sh -c '[ ! -f ../A/"$1" ]' _ {} \; -print > ../list.txt
$ tar -c -v -f ../diff.tar --files-from=../list.txt
./c
./d
Note: if you have unusual file names the --verbatim-files-from GNU tar option can help. Or a combination of the -print0 action of find and the --null option of GNU tar.
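For instance, the NUL-separated combination might look like this (a sketch, still run from inside B, with GNU tar):
find . -type f -exec sh -c '[ ! -f ../A/"$1" ]' _ {} \; -print0 |
    tar -c -v -f ../diff.tar --null --files-from=-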
Note: if the shell is POSIX (e.g., bash) you can also run find from the parent directory and get the path of the files relative from there, if you prefer:
$ mkdir -p A B
$ touch A/a A/b
$ touch B/a B/b B/c B/d
$ find B -type f -exec sh -c '[ ! -f A"${1#B}" ]' _ {} \; -print
B/c
B/d

shell script to batch replace specific string in .csv file

I want to replace some strings in my raw CSV files for further use. I searched the internet and put together the script below, but it doesn't seem to work. I hope someone can help me.
The CSV file looks like this, and I want to delete the "^M" characters and the "# Columns: " prefix so that I can read the file.
# Task: bending1^M
# Frequency (Hz): 20^M
# Clock (millisecond): 250^M
# Duration (seconds): 120^M
# Columns: time,avg_rss12,var_rss12,avg_rss13,var_rss13,avg_rss23,var_rss23^M
#!/usr/bin/env bash
function scandir(){
cd `dirname $0`
echo `pwd`
local cur_dir parent_dir workir
workdir=$1
cd ${workdir}
if [ ${workdir}="/" ]
then
cur_dir=""
else
cur_dir=$(pwd)
fi
for dirlist in $(ls ${cur_dir})
do
if test -d ${dirlist}
then
cd ${dirlist}
scandir ${cur_dir}/${dirlist}
cd ..
else
vi ${cur_dir}/${dirlist} << EOF
:%s/\r//g
:%s/\#\ Columns:\ //g
:wq
EOF
fi
done
}
Your whole script can be reduced to just:
find "$workdir" -type f | xargs -n1 sed -i -e 's/\r//g; s/^# Columns: //'
Notes to your script:
Check your scripts for validity on https://www.shellcheck.net/
The use of the << EOF here-document is invalid. The closing word EOF has to start at the beginning of the line inside the script:
vi ${cur_dir}/${dirlist} << EOF
:%s/\r//g
:%s/\#\ Columns:\ //g
:wq
EOF
#^^ no spaces in front of EOF, also no spaces/tabs after EOF
# the whole line needs to be exactly 'EOF'
There cannot be any spaces or tabs in front of it. Also, I don't think vi is the best tool for running substitutions on a file, and I don't know how it acts with tabs or spaces in front of the :. You may want to try running it with no whitespace in front of the : commands:
vi ${cur_dir}/${dirlist} << EOF
:%s/\r//g
:%s/\#\ Columns:\ //g
:wq
EOF
Backticks ` are deprecated, less readable and don't allow for easy nesting. Use $( ... ) command substitution instead.
echo `pwd` is a redundant use of echo; just use pwd.
for dirlist in $(ls ${cur_dir}): parsing ls output is bad. Use the find command instead, or, if you have to, shell globbing, i.e. for dirlist in *.
if [ ${workdir}="/" ] is invalid. It tests whether the single string "${workdir}=/" is non-empty, which is always true. The shell needs spaces between = and its operands: it should be if [ "${workdir}" = "/" ].
Always quote your variables: don't write cd ${dirlist}, write cd "${dirlist}", and so on.
The posted answers are correct, but I would recommend this syntax:
find "$1" -type f -name '*.csv' -exec sed -e 's/\r$//;s/^# Columns: //' -i~ {} +
Using + instead of \; at the end of the find command permits sed to work on many files at once, reducing forks and making the whole job quicker.
The ~ after the -i option keeps the original files as backups, renaming them with a tilde appended instead of discarding them.
Using -type f ensures we work on regular files only (no symlinks, directories, sockets, FIFOs, devices...).
You can reduce the entire script to one command, and you do not have to use Vim to process the files:
find "${workdir}" -name '*.csv' -exec sed -i -e 's/\r$//; /^#/d' '{}' \;
Explanation:
find <dir> -name <pattern> -exec <command> \; will search <dir> for files matching <pattern> and execute <command> on each file. You want to search for CSV files and do something with them (run a command on them).
The command run on every (CSV) file that was found will be sed -i -e 's/\r$//; /^#/d'. This means to edit the files in-place (-i) and run two transformations on them. s/\r$// will remove the ^M from each line, and /^#/d will remove all lines that start with a #.
'{}' is replaced with the files found by find and \; marks the end of the command run by find (see the find manual page for this).
Most of your script emulates part of the find command. That is not a good idea.
Also, for simple text processing, it is easier and faster to use sed instead of invoking an editor such as Vim.

Using touch and sed within a find -ok command

I have some wav files. For each of those files I would like to create a new text file with the same name (obviously with the wav extension being replaced with txt).
I first tried this:
find . -name *.wav -exec 'touch $(echo '{}" | sed -r 's/[^.]+\$/txt/')" \;
which outputted
< touch $(echo {} | sed -r 's/[^.]+$/txt/') ... ./The_stranglers-Golden_brown.wav > ?
Then find complained after I hit y key with:
find: ‘touch $(echo ./music.wav | sed -r 's/[^.]+$/txt/')’: No such file or directory
I figured out I was using a pipe and actually needed a shell. I then ran:
find . -name *.wav -exec sh -c 'touch $(echo "'{}"\" | sed -r 's/[^.]+\$/txt/')" \;
Which did the job.
Actually, I do not really get what is being done internally, but I guess a shell is spawned for every file, right? I fear this is costly.
Then, what if I need to run this command on a large bunch of files and directories!?
Now, is there a way to do this in a more efficient way?
Basically I need to transform the current file's name and feed it to the touch command.
Thank you.
This find with bash parameter-expansion will do the trick for you. You don't need sed at all.
find . -type f -name "*.wav" -exec sh -c 'x=$1; file="${x##*/}"; woe="${file%.*}"; touch "${woe}.txt"; ' sh {} \;
The idea is this:
x=$1 holds each entry returned from the output of find
file="${x##*/}" strips the path of the file leaving only the last file name part (only filename.ext)
The part woe="${file%.*}" stores the name without extension, and the new file is created with an extension .txt from the name found.
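A quick interactive illustration of those two expansions (the path is made up):
$ x=./music/track01.wav
$ file="${x##*/}"; echo "$file"   # ## strips the longest match of */ from the front
track01.wav
$ woe="${file%.*}"; echo "$woe"   # % strips the shortest match of .* from the end
track01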
EDIT
Parameter expansion frees us from the command substitution $() sub-process and from sed.
After looking at the sh man page, I figured out that the command above can be simplified.
Synopsis -c [-aCefnuvxIimqVEbp] [+aCefnuvxIimqVEbp] [-o option_name] [+o option_name] command_string [command_name [argument ...]]
...
-c Read commands from the command_string operand instead of from the standard input. Special parameter 0 will be set from the command_name operand and the positional parameters ($1, $2, etc.) set from the remaining argument operands.
We can directly pass the file path, skipping the shell's name (which is useless inside the script anyway). {} is then passed as the command_name, becoming $0, which can be expanded right away.
We end up with a cleaner command.
find . -name "*.wav" -exec sh -c 'touch "${0%.*}".txt' {} \;
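A quick test of the simplified command (hypothetical file names):
$ mkdir -p music && touch music/a.wav music/b.wav
$ find . -name "*.wav" -exec sh -c 'touch "${0%.*}".txt' {} \;
$ ls music
a.txt  a.wav  b.txt  b.wav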

shell script does not find the directory

I'm starting out with shell scripting. I need to compute checksums of a lot of files, so I thought I'd automate the process using a shell script.
I made two scripts. The first script runs a recursive ls command with an egrep -v, taking as parameter the path I input; the command's output is saved in a variable and converted to a string, then a for loop cuts that string into lines and passes each line as a parameter when calling the second script. The second script takes this parameter and passes it to the hashdeep command, whose output is in turn saved in a variable and converted to a string, then cut up using IFS; lastly I take the line of interest and append it to a text file.
The output is:
/home/douglas/Trampo/shell_scripts/2016-10-27-001757.jpg: No such file
or directory
----Checksum FILE: 2016-10-27-001757.jpg
----Checksum HASH:
The issue is: I set the directory ~/Pictures as the parameter, but the error output refers to another directory, /home/douglas/Trampo/shell_scripts/ (the script's own directory). The file 2016-10-27-001757.jpg is in the ~/Pictures directory. Why is the script looking in its own directory?
First script:
#/bin/bash
arquivos=$(ls -R $1 | egrep -v '^d')
for linha in $arquivos
do
bash ./task2.sh $linha
done
second script:
#/bin/bash
checksum=$(hashdeep $1)
concatenado=''
for i in $checksum
do
concatenado+=$i
done
IFS=',' read -ra ADDR <<< "$concatenado"
echo
echo '----Checksum FILE:' $1
echo '----Checksum HASH:' ${ADDR[4]}
echo
echo ${ADDR[4]} >> ~/Trampo/shell_scripts/txt2.txt
I think that's it... sorry about the English grammar errors. I hope the question has become clear.
Thanks in advance!
There are several things wrong in the first script alone.
When running ls in recursive mode with -R, the output is grouped per directory, and each file is listed relative to its parent directory instead of by its full pathname.
ls -R doesn't list directories in long format, as your | egrep -v '^d' implies you expect; it seems you are trying to keep only files (non-directories).
In your specific case, the missing file 2016-10-27-001757.jpg is in a subdirectory, but you lost its location by using ls -R.
Do not parse the output of ls. Use find and you won't have the same issue.
The first script can be replaced by a single line.
Try this:
#!/bin/bash
find "$1" -type f -exec ./task2.sh "{}" \;
Or if you prefer using xargs, try this:
#!/bin/bash
find "$1" -type f -print0 | xargs -0 -n1 -I{} ./task2.sh "{}"
Note: enclosing {} in quotes ensures that task2.sh receives a complete filename even if it contains spaces.
In task2.sh the parameter $1 should also be quoted "$1".
If task2.sh is executable, you are all set. If not, add bash in the line so it reads as:
find "$1" -type f -exec bash ./task2.sh "{}" \;
As it turned out, task2.sh was not executable; it was missing the execute permission.
Add execute permission to it by running chmod like:
chmod a+x task2.sh
Good luck.

How to cd into grep output?

I have a shell script which basically searches all folders inside a location and I use grep to find the exact folder I want to target.
for dir in /root/*; do
grep "Apples" "${dir}"/*.* || continue
While grep successfully finds my target directory, I'm stuck on how to move the folders I want into that target directory. An idea I had was to cd into the grep output, but that's where I got stuck. I tried some Google results; none helped with my case.
Example grep output: Binary file /root/ant/containers/secret/Documents/2FD412E0/file.extension matches
I want to cd into 2FD412E0 and move two folders inside that directory.
dirname is the key to that:
cd $(dirname $(grep "...." ...))
will let you enter the directory.
As people mentioned, dirname is the right tool to strip off the file name from the path.
I would use find for such kind of task:
while read -r file
do
target_dir=$(dirname "$file")
# do something with "$target_dir"
done < <(find /root/ -type f \
-exec grep "Apples" --files-with-matches {} \;)
Consider using find's -maxdepth option. See the man page for find.
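For example, to keep the search within two directory levels of /root/ (a sketch):
find /root/ -maxdepth 2 -type f -exec grep --files-with-matches "Apples" {} \;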
Well, there is actually a simpler solution :) I just like writing bash scripts. You can simply use a single find command like this:
find /root/ -type f -exec grep Apples {} ';' -exec ls -l {} ';'
Note the second -exec. It will be executed if the previous -exec command exited with status 0 (success). From the man page:
-exec command ;
Execute command; true if 0 status is returned. All following arguments to find are taken to be arguments to the command until an argument consisting of ; is encountered. The string {} is replaced by the current file name being processed everywhere it occurs in the arguments to the command, not just in arguments where it is alone, as in some versions of find.
Replace the ls -l command with your stuff.
And if you want to execute dirname within the -exec command, you may do the following trick:
find /root/ -type f -exec grep -q Apples {} ';' \
    -exec sh -c 'cd "$(dirname "$0")"; pwd' {} ';'
Replace pwd with your stuff.
When find is not available
In the comments you write that find is not available on your system. The following solution works without find:
grep -R --files-with-matches Apples "${dir}" | while read -r file
do
target_dir=$(dirname "$file")
# do something with "$target_dir"
echo "$target_dir"
done
