I am using a shell script to remove the XML tags from a set of files in a folder. This is what my script looks like:
#!/bin/sh
find texts -type f -name '*.xml' -exec sh -c '
mkdir -p modified
file="$0"
sed "s/<[^>]*>//g" "$file" > modified/modified_texts
' {} ';'
This is supposed to take all the files (using $file) in the "texts" folder, remove their XML tags, and place the stripped output into the "modified" directory.
The problem is that, instead of processing all the files, it fills "modified_texts" with the content of just one of them (the tag removal itself works).
I don't really understand what I'm doing wrong, so I would appreciate any help.
Instead of doing the output redirection (with truncation!) for every sed command, move it to the outer scope, so the output file is opened (and its prior contents are truncated) only once, before find is started at all.
#!/bin/sh
mkdir -p modified # this only needs to happen once, so move it outside
find texts -type f -name '*.xml' -exec sed 's/<[^>]*>//g' {} ';' > modified/modified_texts
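If you would rather keep one stripped copy per input file instead of concatenating everything into a single modified_texts, here is a minimal sketch; the output naming scheme is my assumption, not part of the question:
#!/bin/sh
mkdir -p modified
find texts -type f -name '*.xml' -exec sh -c '
    for file in "$@"; do
        # assumed naming: modified/<basename>.txt; note that files with the
        # same basename in different subdirectories would overwrite each other
        sed "s/<[^>]*>//g" "$file" > "modified/$(basename "$file" .xml).txt"
    done
' sh {} +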
I need to copy files out of a number of directories that all contain the same file name, a.txt. The difference comes from the parent folder, so
example1\a.txt
example2\a.txt
...
so I am hoping to run a find command that will capture each a.txt without overwriting the file as it moves from folder to folder.
so the output would be
example1_a.txt
example2_a.txt
From another post, the find command I want is the following:
find . -name "a.txt" -execdir echo cp -v {} /path/to/dest/ \;
So I want to modify it in some way to prefix the file name with the source folder; my guess is to manipulate {} somehow to do it.
Thanks in advance
A one-liner might be possible, but you could use this:
#!/bin/bash
targetprefix="targetdir"
find . -name "a.txt" -print0 | while IFS= read -r -d '' line
do
    path=$(dirname "$line")
    newpath=$(echo "${path#./}" | tr '/' '_')
    filename=$(basename "$line")
    # e.g. ./example1/a.txt is copied to targetdir/example1_a.txt
    target="$targetprefix/${newpath}_${filename}"
    cp -v "$line" "$target"
done
Change the variable "targetprefix" to the destination directory you desire.
This find with -print0 and while pattern comes from https://mywiki.wooledge.org/BashFAQ/001
Since results from find all start with "./", I use "${path#./}" to remove that prefix.
The tr replaces any remaining "/" with an underscore, which takes care of subdirectories.
WARNING: I did not test all "weird" directory and filename formats for proper execution (like carriage returns in filenames!).
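For what it's worth, a one-liner in the spirit of the question's -exec attempt is possible too; this is a sketch that assumes the destination directory targetdir already exists:
find . -name "a.txt" -exec sh -c '
    for f in "$@"; do
        d=$(dirname "$f")                 # e.g. ./example1
        cp -v "$f" "targetdir/$(printf %s "${d#./}" | tr / _)_a.txt"
    done' sh {} +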
I wrote a script in bash that should read the contents of a text file, look for the corresponding files for each line, and copy them to another folder. It's not copying all the files, only two: the third and the last.
#!/bin/bash
filelist=~/Desktop/file.txt
sourcedir=~/ownCloud2
destdir=~/Desktop/file_out
while read line; do
find $sourcedir -name $line -exec cp '{}' $destdir/$line \;
echo find $sourcedir -name $line
sleep 1
done < "$filelist"
If I use this string on the command line, it finds and copies the file.
find ~/ownCloud2 -name 123456AA.pdf -exec cp '{}' ~/Desktop/file_out/123456AA.pdf \;
If I use the script instead it doesn't work.
I used your exact script and had no problems, with both bash and sh, so maybe you are using another shell in your shebang line.
Use find only when you need to find the file "somewhere" in multiple directories under the search start point.
If you know the exact directory in which the file is located, there is no need to use find. Just use the simple copy command.
Also, if you use "cp -v ..." instead of the "echo", you can see what the command is actually doing, which may help you spot what is wrong.
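For reference, here is a lightly hardened variant of the loop; the quoting and the cp -v are my additions, not requirements of the original script:
#!/bin/bash
filelist=~/Desktop/file.txt
sourcedir=~/ownCloud2
destdir=~/Desktop/file_out
while IFS= read -r line; do
    # quote every expansion so names with spaces are passed intact,
    # and use cp -v so each copy is reported
    find "$sourcedir" -name "$line" -exec cp -v '{}' "$destdir/$line" \;
done < "$filelist"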
I've a .txt file which contains names of various files.
When I simply use a while loop it works fine,
while read -r name
do
echo "$name"
done <fileNames.txt
But,
when I try to use find inside the loop, like this:
while read -r name
do
find ./ -iname "$name" -exec sed -i '1s/^/NEW LINE INSERTED \n/' '{}' ';'
done < fileNames.txt
nothing happens!
If I use find outside the loop with a specific file name, it does what it's supposed to do; I can also use it on all files of a specific type, but it doesn't work inside the loop.
What am I doing wrong over here?
I'm trying to read file names from a file, search for each one inside a folder recursively, and then insert a line at the beginning using sed.
Use xargs instead to capture the results of find
while read -r name
do
find ./ -iname "$name" |xargs sed -i '1s/^/NEW LINE INSERTED \n/'
done <fileNames.txt
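If the file names may contain spaces, or a name may match nothing at all, a more defensive sketch (the -print0/-0 and -r flags assume GNU findutils):
while read -r name
do
    # -print0/-0 keep unusual file names intact; -r stops xargs from
    # running sed with no arguments when nothing matched
    find ./ -iname "$name" -print0 | xargs -0 -r sed -i '1s/^/NEW LINE INSERTED \n/'
done <fileNames.txt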
The history of this problem is:
I have millions of files and directories on a NAS system. I found a count of 1,095,601 empty (0 byte) files. These files used to have data but were destroyed by a predecessor not using the correct toolsets to migrate data between an XSAN and this Isilon NAS.
The files were media production data, like fonts, PDFs, and image files. They are no longer useful beyond the history of their existence. Before I proceed to delete them, the production users need a record of which files used to exist, so when they browse a project folder, they can use the unaffected files but then refer to a text file in the same directory which records which files used to be there and thus explains why certain reference files are broken.
So how do I find files across multiple directories and delete them but first output their filename to a text file which would be saved to each relevant path location?
I am thinking along the lines of:
for file in $(find . -type f -size 0); do
echo "$file" >> /PATH/TO/FOUND/FILE/PARENT/DIR/deletedFiles.txt -print0 |
xargs -0 rm ;
done
To delete each empty file while leaving behind a file called deletedFiles.txt which contains the names of the deleted files, try:
PATH=/bin:/usr/bin find . -empty -type f -execdir bash -c 'printf "%s\n" "$@" >>deletedFiles.txt' none {} + -delete
How it works
PATH=/bin:/usr/bin
This sets a temporary but secure path.
find .
This starts find looking in the current directory.
-empty
This tells find to only look for empty files.
-type f
This restricts find to looking for regular files.
-execdir bash -c 'printf "%s\n" "$@" >>deletedFiles.txt' none {} +
In each directory that contains an empty file, this adds the name of each empty file to the file deletedFiles.txt.
Notice the peculiar use of none in the command:
bash -c 'printf "%s\n" "$@" >>deletedFiles.txt' none {} +
When this command is run, bash executes the string printf "%s\n" "$@" >>deletedFiles.txt, and the arguments that follow that string are assigned to the positional parameters: $0, $1, $2, etc. When we use $@, it does not include $0; it expands, as usual, to $1, $2, .... Thus, we add the placeholder none so that it is assigned to $0, which we ignore, and the complete list of file names is assigned to "$@".
-delete
This deletes each empty file.
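You can see the placeholder trick in isolation by running:
bash -c 'printf "%s\n" "$@"' none one two
# prints:
# one
# two
The word none lands in $0 and is ignored; only one and two reach "$@".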
Why not simply
find . -type f -size 0 -exec rm -v {} + |
sed -e 's%^removed .\./%%' -e 's/.$//' >deletedFiles.txt
If your find is too old to support -exec ... + you'll need to revert to -exec rm -v {} \; or refactor to
find . -type f -size 0 -print0 |
xargs -r -0 rm -v |
sed -e 's%^removed .\./%%' -e 's/.$//' >deletedFiles.txt
The brief sed script is to postprocess the output from rm -v which looks like
removed ‘./bar’
removed ‘./foo’
(with some funny quote characters around the file name) on my system. If you are fine with that output, of course, just omit the sed script from the pipeline.
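For example, in a UTF-8 locale (where sed's . matches the whole multibyte quote character), the postprocessing behaves like this:
printf "removed ‘./bar’\n" | sed -e 's%^removed .\./%%' -e 's/.$//'
# prints: bar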
If you know in advance which directories contain empty files, you can run the above snippet individually in those directories. Assuming you saved the snippet above as a script (with a proper shebang and execute permissions) named find-empty, you could simply use
for path in /path/to/first /path/to/second/directory /path/to/etc; do
cd "$path" && find-empty
done
This will only work if you have absolute paths (if not, you can run the body of the loop in a subshell by adding parentheses around it).
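That parenthesized variant, for relative paths, would look like this (the paths are placeholders):
for path in first/dir second/dir third/dir; do
    ( cd "$path" && find-empty )   # the subshell keeps each cd from leaking into the next iteration
done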
If you want to inspect all the directories in a tree, change the script to print to standard output instead (remove >deletedFiles.txt from the script) and try something like
find /path/to/tree -type d -exec sh -c '
t=$(mktemp -t find-emptyXXXXXXXX)
cd "$1" &&
find-empty | grep . >"$t" &&
mv "$t" deletedFiles.txt ||
rm "$t"' _ {} \;
This uses a temporary file so as to avoid updating the timestamp of directories which do not contain any empty files. The grep . is used purely for its exit status: if any (non-empty) lines are printed, it returns success; otherwise, it reports failure. This way, we know whether or not to move the temporary file into the target directory.
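You can verify that exit-status behaviour of grep . on its own:
printf '' | grep . ; echo $?                 # no lines at all: prints 1 (failure)
printf 'x\n' | grep . >/dev/null ; echo $?   # at least one non-empty line: prints 0 (success)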
With prompting from @JonathanLeffler I have succeeded with the following:
#!/bin/bash
## call this script with: find . -type f -empty -exec handleEmpty.sh {} +
for file in "$@"
do
file2="$(basename "$file")"
echo "$file2" >> "$(dirname "$file")"/deletedFiles.txt
rm "$file"
done
This means I retain a trace of the removed files in a deletedFiles.txt flag file in each respective directory for the users to see when files are missing. That way, they can go back to the archive CDs to retrieve these deleted files, which are hopefully not 0-byte copies.
Thanks to @John1024 for the suggestion of using the -empty test rather than -size.
I was wondering if a Bash script could do this without manually copying each file in this parent directory:
"/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS7.0.sdk
/System/Library/PrivateFrameworks"
So in this folder PrivateFrameworks, there are many subfolders, and each subfolder contains the file that I would like to copy out to another location. The structure of the path looks like this:
-PrivateFrameworks
    -AccessibilityUI.framework
        -AccessibilityUI <- copy this
    -AccountSettings.framework
        -AccountSettings <- copy this
I do not want to copy the entire content of the folder, as there might be cases where the folders contain files which I do not want to copy. So the only way I thought of is to copy by file extension. However, as you can see, the files I want to copy do not have an extension (I think?). I am new to bash scripting, so I am not sure if this can be done with it.
To copy all files in or below the current directory that do not have extensions, use:
find . ! -name '*.*' -exec cp -t /your/destination/dir/ {} +
The find . command looks for all files in or below the current directory. The argument -name '*.*' would restrict that search to files that have extensions. By preceding it with a not (!), however, we get all files that do not have an extension. Then, -exec cp -t /your/destination/dir/ {} + tells find to copy those files to the destination.
To do the above starting in your directory with the long name, use:
find "/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS7.0.sdk/System/Library/PrivateFrameworks" ! -name '*.*' -exec cp -t /your/destination/dir/ {} +
UPDATE: The unix tag on this question has been removed and replaced with an OSX tag. That means we can't use the -t option on cp. The workaround is:
find . ! -name '*.*' -exec cp {} /your/destination/dir/ \;
This is less efficient because a new cp process is created for every file moved instead of once for all the files that fit on a command line. But, it will accomplish the same thing.
MORE: There are two variations of the -exec clause of a find command. In the first use above, the clause ended with {} + which tells find to fill up the end of command line with as many file names as will fit on the line.
Since OSX lacks cp -t, however, we have to put the file name in the middle of the command. So, we put {} where we want the file name and then, to signal to find where the end of the exec command is, we add a semicolon. There is a trick, though. Because bash would normally consume the semicolon itself rather than pass it on to find, we have to escape the semicolon with a backslash. That way bash gives it to the find command.
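If you want to preview either form before running it for real, prefixing the command with echo gives a harmless dry run:
find . ! -name '*.*' -exec echo cp -t /your/destination/dir/ {} +   # one big invocation
find . ! -name '*.*' -exec echo cp {} /your/destination/dir/ \;     # one invocation per file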
sh SCRIPT.sh copy-from-directory .extension copy-to-directory
FROM_DIR=$1
EXTENSION=$2
TO_DIR=$3
USAGE="""Usage: sh SCRIPT.sh copy-from-directory .extension copy-to-directory
- EXAMPLE: sh SCRIPT.sh PrivateFrameworks .framework .
- NOTE: 'copy-to-directory' argument is optional
"""
## print usage if fewer than 2 args
if [[ $# -lt 2 ]]; then echo "${USAGE}" && exit 1 ; fi   # -lt compares numbers; '<' inside [[ ]] compares strings
## set copy-to-dir default args
if [[ -z "$TO_DIR" ]] ; then TO_DIR=$PWD ; fi
## DO SOMETHING...
## find directories; find target file;
## copy target file to copy-to-dir if file exist
find "$FROM_DIR" -type d | while read -r DIR ; do
    # e.g. DIR=PrivateFrameworks/AccessibilityUI.framework -> FILE_TO_COPY=AccessibilityUI
    FILE_TO_COPY=$(basename "$DIR" | sed "s/$EXTENSION//")
    if [[ -f "$DIR/$FILE_TO_COPY" ]] ; then
        cp "$DIR/$FILE_TO_COPY" "$TO_DIR"
    fi
done