I'm having trouble dealing with escape characters in filenames when working with tar. They can't be removed due to the nature of the data I'm moving. No matter how I format the escape character, tar seems to treat \.file and \\.file as the same thing.
Example:
File contents:
namefile1 contains \.file
namefile2 contains \\.file
Commands and their output as they appear:
tar -cvzf "./exampleout.tar" -C . -T namefile1
\\.file
tar -cvzf "./exampleout.tar" -C . -T namefile2
\\.file
If I try to list either, I get this:
tar -tvf ./exampleout.tar
\\.file
I think I have resolved my issue. If you have \ in your filename, you must still escape it, even when you pass the name in via a file. Although a single backslash works in some test cases, doubling it is the only way to make it work consistently. If you list the contents of your tar file, the name appears exactly as it was written in the list file, but on restore the file comes back in its original form, provided you can manage to find a glob that tar will accept.
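For illustration, a minimal sketch of that workaround (GNU tar assumed; the file and list names here are just examples):

touch '\.file'                        # a file whose name contains a literal backslash
printf '%s\n' '\\.file' > namefile    # double the backslash inside the -T list
tar -cvzf ./exampleout.tar -C . -T namefile
tar -tvf ./exampleout.tar             # the listing may display the name as \\.file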
Related
I'm trying to find a glob that I can use with tar to match all files and folders in the current directory, including hidden files, but without including the parent directory. I need my tar to have nothing leading before the file names. I also can't use shopt.
I'm using a format similar to
tar --ignore-failed-read -czvf ../archive.tar.gz .[^.]* *
The --ignore-failed-read is required in case there are NO hidden files: when the first glob matches nothing, the shell passes it to tar literally, and tar fails to read a file by that name.
I'm considering the following types of names for hidden files:
.a
.aa
..a
I've found a few examples of globs that will work but here are the problems I have:
.[^.]*
gets files .a and .aa but misses ..a
.??*
gets .aa and ..a but misses .a
Any ideas here?
Is there any way I can remove --ignore-failed-read?
Try this:
tar --ignore-failed-read -czvf ../archive.tar.gz .[^.]* ..?* *
Just add a separate wildcard to cover the two dots case.
The ? says there needs to be at least one character after the second dot.
You'll probably want to quote these wildcards to prevent the shell from expanding them, but I'll assume you understand how to handle that correctly.
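As a quick sanity check, you can see what each glob expands to with echo in a scratch directory before handing them to tar (the directory name here is made up):

mkdir globtest && cd globtest
touch .a .aa ..a regular
echo .[^.]*    # .a .aa
echo ..?*      # ..a
echo *         # regular (a plain * skips dot files)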
I can remove file extensions if I know the extensions, for example to remove .txt from files:
foreach file (`find . -type f`)
mv $file `basename $file .txt`
end
However if I don't know what kind of file extension to begin with, how would I do this?
I tried:
foreach file (`find . -type f`)
mv $file `basename $file .*`
end
but it wouldn't work.
What shell is this? At least in bash you can do:
find . -type f | while read -r; do
mv -- "$REPLY" "${REPLY%.*}"
done
(The usual caveats apply: This doesn't handle files whose name contains newlines.)
You can use sed to compute base file name.
foreach file (`find . -type f`)
mv $file `echo $file | sed -e 's/^\(.*\)\.[^.]\+$/\1/'`
end
Be cautious: The command you seek to run could cause loss of data!
If you don't think your file names contain newlines or double quotes, then you could use:
find . -type f -name '?*.*' |
sed 's/\(.*\)\.[^.]*$/mv "&" "\1"/' |
sh
This generates your list of files (making sure that the names contain at least one character plus a .), runs each file name through the sed script to convert it into an mv command by effectively removing the material from the last . onwards, and then runs the stream of commands through a shell.
Clearly, you test this first by omitting the | sh part. Consider running it with | sh -x to get a trace of what the shell's doing. Consider making sure you capture the output of the shell, standard output and standard error, into a log file so you've got a record of the damage that occurred.
Do make sure you've got a backup of the original set of files before you start playing with this. It need only be a tar file stored in a different part of the directory hierarchy, and you can remove it as soon as you're happy with the results.
You can choose any shell; this doesn't rely on any shell constructs except pipes and single quotes and double quotes (pretty much common to all shells), and the sed script is version neutral too.
Note that if you have files xyz.c and xyz.h before you run this, you'll only have a file xyz afterwards (and what it contains depends on the order in which the files are processed, which needn't be alphabetic order).
If you think your file names might contain double quotes (but not single quotes), you can play with changing the quotes in the sed script. If you might have to deal with both, you need a more complex sed script. If you need to deal with newlines in file names, then it is time to (a) tell your user(s) to stop being silly and (b) fix the names so they don't contain newlines. Then you can use the script above. If that isn't feasible, you have to work a lot harder to get the job done accurately; you probably need to make sure you've got a find that supports -print0, a sed that supports -z and an xargs that supports -0 (installing the most recent GNU versions if you don't already have the right support in place).
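For completeness, a rough sketch of a newline-safe variant in bash, assuming your find supports -print0 (GNU find does):

find . -type f -name '?*.*' -print0 |
while IFS= read -r -d '' f; do
    mv -- "$f" "${f%.*}"
done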
It's very simple:
$ set filename=/home/foo/bar.dat
$ echo ${filename:r}
/home/foo/bar
See more in man tcsh, in "History substitution":
r
Remove a filename extension '.xxx', leaving the root name.
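Combined with the loop from the question, that modifier gives a sketch like this (tcsh, and still subject to the usual caveats about whitespace in file names):

foreach file (`find . -type f`)
    mv $file $file:r
end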
I'm trying to write two (edit: shell) scripts and am having some difficulty. I'll explain the purpose and then provide the script and current output.
1: get a list of every file name in a directory recursively. Then search the contents of all files in that directory for each file name. Should return the path, filename, and line number of each occurrence of the particular file name.
2: get a list of every file name in a directory recursively. Then search the contents of all files in the directory for each file name. Should return the path and filename of each file which is NOT found in any of the files in the directories.
I ultimately want to use script 2 to find and delete (actually move them to another directory for archiving) unused files in a website. Then I would want to use script 1 to see each occurrence and filter through any duplicate filenames.
I know I can make script 2 move each file as it is running rather than as a second step, but I want to confirm the script functions correctly before I do any of that. I would modify it after I confirm it is functioning correctly.
I'm currently testing this on an IBM i system in strqsh.
My test folder structure is:
scriptTest
---subDir1
------file4.txt
------file5.txt
------file6.txt
---subDir2
------file1.txt
------file7.txt
------file8.txt
------file9.txt
---file1.txt
---file2.txt
---file3.txt
I have text in some of those files which contains existing file names.
This is my current script 1:
#!/bin/bash
files=`find /www/Test/htdocs/DLTest/scriptTest/ ! -type d -exec basename {} \;`
for i in $files
do
grep -rin $i "/www/Test/htdocs/DLTest/scriptTest" >> testReport.txt;
done
Right now it functions correctly, except that it doesn't provide the path to the file that had a match. Doesn't grep return the file path by default?
I'm a little further away with script 2:
#!/bin/bash
files=`find /www/Test/htdocs/DLTest/scriptTest/ ! -type d`
for i in $files
do
#split $i on '/' and store into an array
IFS='/' read -a array <<< "$i"
#get last element of the array
echo "${array[-1]}"
#perform a grep similar to script 2 and store it into a variable
filename="grep -rin $i "/www/Test/htdocs/DLTest/scriptTest" >> testReport.txt;"
#Check if the variable has anything in it
if [ $filename = "" ]
#if not then output $i for the full path of the current needle.
then echo $i;
fi
done
I don't know how to split the string $i into an array. I keep getting an error on line 6
001-0059 Syntax error on line 6: token redirection not expected.
I'm planning on trying this on an actual linux distro to see if I get different results.
I appreciate any insight in advance.
Introduction
This isn't really a full solution, as I'm not 100% sure I understand what you're trying to do. However, the following contain pieces of a solution that you may be able to stitch together to do what you want.
Create Test Harness
cd /tmp
mkdir -p scriptTest/subDir{1,2}
touch scriptTest/subDir1/file{4,5,6}.txt
touch scriptTest/subDir2/file{1,8}.txt
touch scriptTest/file{1,2,3}.txt
Finding and Deleting Duplicates
In the most general sense, you could use find's -exec flag or a Bash loop to run grep or another comparison on your files. However, if all you're trying to do is remove duplicates, then you might simply be better off using the fdupes or duff utilities to identify (and optionally remove) files with duplicate contents.
For example, given that all the .txt files in the test corpus are zero-length duplicates, consider the following duff and fdupes examples.
duff
Duff has more options, but won't delete files for you directly. You'll likely need to use a command like duff -e0 * | xargs -0 rm to delete duplicates. To find duplicates using the default comparisons:
$ duff -r scriptTest/
8 files in cluster 1 (0 bytes, digest da39a3ee5e6b4b0d3255bfef95601890afd80709)
scriptTest/file1.txt
scriptTest/file2.txt
scriptTest/file3.txt
scriptTest/subDir1/file4.txt
scriptTest/subDir1/file5.txt
scriptTest/subDir1/file6.txt
scriptTest/subDir2/file1.txt
scriptTest/subDir2/file8.txt
fdupes
This utility offers the ability to delete duplicates directly in various ways. One such way is to invoke fdupes . --delete --noprompt once you're confident that you're ready to proceed. However, to find the list of duplicates:
$ fdupes -R scriptTest/
scriptTest/subDir1/file4.txt
scriptTest/subDir1/file5.txt
scriptTest/subDir1/file6.txt
scriptTest/subDir2/file1.txt
scriptTest/subDir2/file8.txt
scriptTest/file1.txt
scriptTest/file2.txt
scriptTest/file3.txt
Get a List of All Files, Including Non-Duplicates
$ find scriptTest -name \*.txt
scriptTest/file1.txt
scriptTest/file2.txt
scriptTest/file3.txt
scriptTest/subDir1/file4.txt
scriptTest/subDir1/file5.txt
scriptTest/subDir1/file6.txt
scriptTest/subDir2/file1.txt
scriptTest/subDir2/file8.txt
You could then act on each file with find's -exec {} + feature, or simply use a grep that supports the --recursive and --files-with-matches flags to find files with matching content.
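For instance, a rough sketch of script 2's core idea using recursive grep (GNU grep assumed; the paths match the test harness above):

find scriptTest -type f | while IFS= read -r path; do
    name=$(basename "$path")
    # if no file in the tree mentions this name, report it as unused
    if ! grep -rqF "$name" scriptTest; then
        printf 'unused: %s\n' "$path"
    fi
done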
Passing Find Results to a Bash Loop as an Array
Alternatively, if you know for sure that you won't have spaces in the file names, you can also use a Bash array to store the files into a variable you can iterate over in a Bash for-loop. For example:
files=( $(find scriptTest -name \*.txt) )
for file in "${files[@]}"; do
: # do something with each "$file"
done
Looping like this is often slower, but may provide you with the additional flexibility you need if you're doing something complicated. YMMV.
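If file names might contain spaces after all, a while-read loop is a safer sketch than word-splitting into an array:

find scriptTest -name '*.txt' | while IFS= read -r file; do
    : # do something with each "$file"
done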
I am a total newbie to Linux and bash scripting and am currently stumped with this problem!
I have a directory containing many images from which I need to copy the unique images to a new location. I know there are numerous options for how to go about doing this but have very limited knowledge at the moment so appreciate I may be going about this the wrong way.
I used find and cat to create this list and have attempted to copy the files across with the intention of comparing them (using md5 and checking file names) when they are there.
However, the text file lists 30 files, but only 18 have been copied over. Can anyone advise?
My code to find files is -
find $1 -name "IMG_****.JPG" | cat > list.txt
and my code to copy from the list is
for image in $(cat list.txt);
do
cp $image $2
done
You're making this much too complicated. Do not pipe find's output through cat just to write it to a list; that's an unnecessary use of cat. If you want the list file, you can redirect the program's output directly:
find "$1" -name "IMG_*.JPG" > list.txt
Also, do not use for to read lines from a file. Better use while with read:
while read -r filename; do
cp "$filename" "$2"
done < list.txt
But it's even easier. You can just work with the files directly from find:
find "$1" -name "IMG_*.JPG" -exec cp {} "$2" \;
Here, {} will be replaced by each filename that find finds. Don't forget to quote your variables, so that spaces in file paths are no problem.
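If your cp supports -t (GNU coreutils does), you can also batch the copies instead of running one cp per file; a hedged variant:

find "$1" -name "IMG_*.JPG" -exec cp -t "$2" {} +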
Another much simpler method with Bash options:
shopt -s nullglob globstar
cp -t "$2" -- "$1"/**/IMG_*.JPG
Here, globstar enables recursive matching of directories through **. The -t option to cp specifies the target of the copy operation.* The command will be expanded to cp -t target -- source1/IMG_foo.JPG source2/IMG_bar.JPG et cetera.
Now, as to your original issue, it could have been that some images have a space in their name. This would have broken your original script. If your image files contained a newline in their name, it also wouldn't have worked with while read; but in that case you would have gotten an error about a file not being found.
Also note that cp overwrites files with the same name. Without asking for confirmation. So if in your subdirectories there are images with the same filename, you'd only get one result, with the latest overwriting the existing one.
* The -- isn't strictly necessary, but it's a good habit to include it to tell the command when the option arguments are over.
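Regarding the overwrite caveat above: if your cp is from GNU coreutils, a no-clobber sketch (with the same shopt settings as before) keeps existing files instead of silently replacing them:

cp -n -t "$2" -- "$1"/**/IMG_*.JPG    # -n skips files that already exist in the target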
I have several folders with some files that I would like to rename from
Foo'Bar - Title
to
Title
I'm using OS X 10.7. I've looked at other solutions, but none that address recursion very well.
Any suggestions?
There are two parts to your problem: Finding files to operate on recursively, and renaming them.
For the first, if everything is exactly one level below the current directory, you can just list the contents of every directory in the current directory (as in Mattias Wadman's answer above), but more generally (and possibly easier to understand, to boot), you can just use the find command.
For the second, you can use sed and work out how to get the quoting and piping right (which you should definitely eventually learn), but it's much simpler to use the rename command. Unfortunately, this one isn't built in on Mac, but you can install it with, e.g., Homebrew, or just download the perl script and sudo install -m755 rename /usr/local/bin/rename.
So, you can do this:
find . -exec rename 's|[^/]* - ||' {} +
If you want to do a "dry run" to make sure it's right, add the "-n" flag to rename:
find . -exec rename -n 's|[^/]* - ||' {} +
To understand how it works, you really should read the tutorial for find, and the manpage for rename, but breaking it down:
find . means 'find all files recursively under the current directory'.
You can add additional tests to filter things (e.g., -type f if you want to skip everything but regular files, or -name '*Title' if you want to only change files that end in 'Title'), but that isn't necessary for your use.
-exec … + means to batch up the found files, and pass as many of them as possible in place of any {} in the command that appears in the '…'.
rename 's|[^/]* - ||' {} means for each file in {}, apply the perl expression s|[^/]* - || to the filename, and, if the result is different, rename it to that result.
s|[^/]* - || means to match the regular expression '[^/]* -' and replace the match with '' (the empty string).
[^/]* -  means to match any string of non-slash characters that ends with ' - '. So, in './A/FooBar - Title', it'll match the 'FooBar - '.
I should mention that, when I have something complicated to do like this, if after a few minutes and a couple attempts to get it right with find/sed/awk/rename/etc., I still haven't got it, I often just code it up imperatively with Python and os.walk. If you know Python, that might be easier for you to understand (although more verbose and less simple), and easier for you to modify to other use cases, so if you're interested, ask for that.
Try this:
ls -1 * | while read f ; do mv "$f" "`echo $f | sed 's/^.* - //'`" ; done
I recommend adding an echo before the mv before running it, to make sure the commands look OK. And as abarnert noted in the comments, this command will only work for one directory at a time.
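That dry run would look like this:

ls -1 * | while read f ; do echo mv "$f" "`echo $f | sed 's/^.* - //'`" ; done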
Detailed explanation of the various commands:
ls -1 * will output a line for each file (and directory) in the current directory (except .-files). So this will be expanded into ls -1 file1 file2 ...; the -1 flag tells ls to list only the file name, one file per line.
The output is then piped into while read f ; ... ; done, which will loop while read f returns zero, which it does until it reaches end of file. read f reads one line at a time from standard input (which in this case is the output from ls -1 ...) and stores it in the variable specified, in this case f.
In the while loop we run a mv command with two arguments: first "$f" as the source file (note the quotes to handle filenames with spaces etc.), and second the destination filename, which uses sed and ` (backticks) to do what is called command substitution, which runs the command inside the backticks and replaces it with the command's standard output.
echo $f | sed 's/^.* - //' pipes the current file $f into sed, which will match a regular expression and do substitution (the s in s/) and output the result on standard output. The regular expression is ^.* -  which matches from the start of the string ^ (called anchoring), then any characters .*, followed by ' - ', and replaces the match with the empty string (the string between //).
I know you asked for a batch rename, but I suggest you use Automator.
It works perfectly, and if you create it as a service you will have the option in your contextual menu :)
After some trial and error, I came across this solution that worked for me to solve the same problem.
find <dir> -name '*.<oldExt>' -exec rename -S .<oldExt> .<newExt> {} \;
Basically, I leverage the find and rename utilities. The trick here is figuring out where to place the {} (which represents the files found by find that rename should process).
P.S. rename is not a built-in Linux utility. I work with OS X and used Homebrew to install rename.