Commandline find, sed, exec - bash

I have a bunch of files in a folder and its subfolders, and I'm trying to make some kind of one-liner for quick copy/pasting once in a while.
The contents are too long to paste here: http://pastebin.com/4aZCPbwT
I've tried the following commands:
List all files and their directories
find . -name '[!.]*'
Replace all instances of "Namespace" with "Test":
find . -name '[!.]*' -print0 | sed 's/Namespace/Test/gI' | xargs -i -0 echo '{}'
What I need to do is:
Replace folder names like above, and copy the folders (including files) to another location. Create the folders if they don't exist (they most likely won't) - BUT, there are some of them that I don't need, like ./app, as this folder exists. I could use -wholename './app' for that.
When they are copied, I need to replace some text inside each file, same as above (Namespace with Test - it also occurs inside the files), and save them of course.
Something like this I would imagine:
-print -exec sed -i 's/Namespace/Test/gI' {} \;
Can these 3 things be done in a one-liner? Replace text in files (Namespace <=> Test), copy files including their directories with cp -p (I don't want to overwrite folders), but renaming each directory/file as above (Namespace <=> Test).
Thanks a lot :-)

Besides describing the how with painstaking verbosity below, this method may also be unique in that it incorporates built-in debugging. It basically doesn't do anything at all as written except compile and save to a variable all of the commands it believes it should run in order to perform the work requested.
It also explicitly avoids loops as much as possible. Besides the sed recursive search for more than one match of the pattern there is no other recursion as far as I know.
And last, this is entirely null delimited - it doesn't trip on any character in any filename except the null, which you shouldn't have in a filename anyway.
By the way, this is REALLY fast. Look:
% _mvnfind() { mv -n "${1}" "${2}" && cd "${2}"
> read -r SED <<SED
> :;s|${3}\(.*/[^/]*${5}\)|${4}\1|;t;:;s|\(${5}.*\)${3}|\1${4}|;t;s|^[0-9]*\(.*\)${5}|\1|p
> SED
> find . -name "*${3}*" -printf "%d\tmv %P ${5} %P\000" |
> sort -zg | sed -nz ${SED} | read -r ${6}
> echo <<EOF
> Prepared commands saved in variable: ${6}
> To view do: printf ${6} | tr "\000" "\n"
> To run do: sh <<EORUN
> $(printf ${6} | tr "\000" "\n")
> EORUN
> EOF
> }
% rm -rf "${UNNECESSARY:=/any/dirs/you/dont/want/moved}"
% time ( _mvnfind ${SRC=./test_tree} ${TGT=./mv_tree} \
> ${OLD=google} ${NEW=replacement_word} ${sed_sep=SsEeDd} \
> ${sh_io:=sh_io} ; printf %b\\000 "${sh_io}" | tr "\000" "\n" \
> | wc - ; echo ${sh_io} | tr "\000" "\n" | tail -n 2 )
<actual process time used:>
0.06s user 0.03s system 106% cpu 0.090 total
<output from wc:>
Lines Words Bytes
115 362 20691 -
<output from tail:>
mv .config/replacement_word-chrome-beta/Default/.../googlestars \
.config/replacement_word-chrome-beta/Default/.../replacement_wordstars
NOTE: The above function will likely require GNU versions of sed and find to properly handle the find -printf and the sed -z and :;recursive regex test;t calls. If these are not available to you, the functionality can likely be duplicated with a few minor adjustments.
This should do everything you wanted from start to finish with very little fuss. I did fork with sed, but I was also practicing some sed recursive branching techniques so that's why I'm here. It's kind of like getting a discount haircut at a barber school, I guess. Here's the workflow:
rm -rf ${UNNECESSARY}
I intentionally left out any functional call that might delete or destroy data of any kind. You mention that ./app might be unwanted. Delete it or move it elsewhere beforehand, or, alternatively, you could build in a \( -path PATTERN -exec rm -rf \{\} \) routine to find to do it programmatically, but that one's all yours.
_mvnfind "${@}"
Declare its arguments and call the worker function. ${sh_io} is especially important in that it saves the return from the function. ${sed_sep} comes in a close second; this is an arbitrary string used to reference sed's recursion in the function. If ${sed_sep} is set to a value that could potentially be found in any of your path- or file-names acted upon... well, just don't let it be.
mv -n $1 $2
The whole tree is moved from the beginning. It will save a lot of headache; believe me. The rest of what you want to do - the renaming - is simply a matter of filesystem metadata. If you were, for instance, moving this from one drive to another, or across filesystem boundaries of any kind, you're better off doing so at once with one command. It's also safer. Note the -n (no-clobber) option set for mv; as written, this function will not put ${SRC_DIR} where a ${TGT_DIR} already exists.
read -r SED <<HEREDOC
I located all of sed's commands here to save on escaping hassles and read them into a variable to feed to sed below. Explanation below.
find . -name "*${OLD}*" -printf
We begin the find process. With find we search only for anything that needs renaming because we already did all of the place-to-place mv operations with the function's first command. Rather than take any direct action with find, like an exec call, for instance, we instead use it to build out the command-line dynamically with -printf.
%dir-depth :tab: 'mv '%path-to-${SRC}' '${sed_sep}'%path-again :null delimiter:'
After find locates the files we need it directly builds and prints out (most) of the command we'll need to process your renaming. The %dir-depth tacked onto the beginning of each line will help to ensure we're not trying to rename a file or directory in the tree with a parent object that has yet to be renamed. find uses all sorts of optimization techniques to walk your filesystem tree and it is not a sure thing that it will return the data we need in a safe-for-operations order. This is why we next...
sort -general-numerical -zero-delimited
We sort all of find's output based on %directory-depth so that the paths nearest in relationship to ${SRC} are worked first. This avoids possible errors involving mving files into non-existent locations, and it minimizes the need for recursive looping. (In fact, you might be hard-pressed to find a loop at all.)
sed -ex :rcrs;srch|(save${sep}*til)${OLD}|\saved${SUBSTNEW}|;til ${OLD=0}
I think this is the only loop in the whole script, and it only loops over the second %Path printed for each string in case it contains more than one ${OLD} value that might need replacing. All other solutions I imagined involved a second sed process, and while a short loop may not be desirable, certainly it beats spawning and forking an entire process.
So basically what sed does here is search for ${sed_sep}, then, having found it, saves it and all characters it encounters until it finds ${OLD}, which it then replaces with ${NEW}. It then heads back to ${sed_sep} and looks again for ${OLD}, in case it occurs more than once in the string. If it is not found, it prints the modified string to stdout (which it then catches again next) and ends the loop.
This avoids having to parse the entire string, and ensures that the first half of the mv command string, which needs to include ${OLD} of course, does include it, and the second half is altered as many times as is necessary to wipe the ${OLD} name from mv's destination path.
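If the branch-and-retry idea is hard to picture, here is a minimal standalone sketch of the same technique, using the example values from the timing run above. It is only an illustration, not part of the function: sed keeps substituting ${OLD} after the separator and branching back until no match remains, then falls through.
printf '%s\n' 'mv googledir/file SsEeDd googledir/file' |
sed -e ':loop' -e 's|\(SsEeDd.*\)google|\1replacement_word|' -e 't loop'
# -> mv googledir/file SsEeDd replacement_worddir/file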
sed -ex...-ex search|%dir_depth(save*)${sed_sep}|(only_saved)|out
The two -e expressions here happen without a second fork. In the first, as we've seen, we modify the mv command as supplied by find's -printf as necessary to properly alter all references of ${OLD} to ${NEW}, but in order to do so we had to use some arbitrary reference points which should not be included in the final output. So once sed finishes all it needs to do, we instruct it to wipe out its reference points from the pattern space before passing it along.
AND NOW WE'RE BACK AROUND
read will receive a command that looks like this:
% mv /path2/$SRC/$OLD_DIR/$OLD_FILE /same/path_w/$NEW_DIR/$NEW_FILE \000
read will store it in the variable named by ${sh_io} (the function's sixth argument), which can be examined at will outside of the function.
Cool.
-Mike

I haven't tested this, but I think it's what you're after.
find . -name '[!.]*' -print | while IFS= read -r line; do nfile=$(echo "$line" | sed 's/Namespace/Test/gI'); mkdir -p "$(dirname "$nfile")"; cp -p "$line" "$nfile"; sed -i 's/Namespace/Test/gI' "$nfile"; done

Related

How do I find duplicate files by comparing them by size (ie: not hashing) in bash

Testbed files:
-rw-r--r-- 1 usern users 68239 May 3 12:29 The W.pdf
-rw-r--r-- 1 usern users 68239 May 3 12:29 W.pdf
-rw-r--r-- 1 usern users 8 May 3 13:43 X.pdf
Yes, files can have spaces (Boo!).
I want to check files in the same directory, and move the ones which match something else into a 'these are probably duplicates' folder.
My probable use-case is going to have humans randomly mis-naming a smaller set of files (ie: not generating files of arbitrary length). It is fairly unlikely that two files will be the same size and yet be different files. Sure, as a backup I could hash and check two files of identical size. But mostly, it will be people taking a file and misnaming it / re-adding it to a pile it is already in.
So, preferably a solution with widely installed tools (posix?). And I'm not supposed to parse the output of ls, so I need another way to get actual size (and not a du approximate).
"Vote to close!"
Hold up cowboy.
I bet you're going to suggest this (cool, you can google search):
https://unix.stackexchange.com/questions/71176/find-duplicate-files
No fdupes (nor jdupes, nor...), nor finddup, nor rmlint, nor fslint - I can't guarantee those on other systems (much less mine), and I don't want to be stuck as customer support dealing with installing them on random systems from now to eternity, nor even in getting emails about that sh...stuff and having to tell them to RTFM and figure it out. Plus, in reality, I should write my script to test functionality of what is installed, but, that's beyond the scope.
https://unix.stackexchange.com/questions/192701/how-to-remove-duplicate-files-using-bash
All these solutions want to start by hashing. Some cool ideas in some of these: hash just a chunk of both files, starting somewhere past the header, then only do full compare if those turn up matching. Good idea for double checking work, but would prefer to only do that on the very, very few that actually are duplicate. As, looking over the first several thousand of these by hand, not one duplicate has been even close to a different file.
https://unix.stackexchange.com/questions/277697/whats-the-quickest-way-to-find-duplicated-files
Proposed:
find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate
Breaks for me:
find: unknown option -- n
usage: find [-dHhLXx] [-f path] path ... [expression]
uniq: unknown option -- w
usage: uniq [-ci] [-d | -u] [-f fields] [-s chars] [input_file [output_file]]
find: unknown option -- t
usage: find [-dHhLXx] [-f path] path ... [expression]
xargs: md5sum: No such file or directory
https://unix.stackexchange.com/questions/170693/compare-directory-trees-regarding-file-name-and-size-and-date
Haven't been able to figure out how rsync -nrvc --delete might work in the same directory, but there might be solution in there.
Well how about cmp? Yeah, that looks pretty good, actually!
cmp -z file1 file2
Bummer, my version of cmp does not include the -z size option.
However, I tried implementing it just for grins - and when it failed, looking at it I realized that I also need help constructing my loop logic. Removing things from my loops in the midst of processing them is probably a recipe for breakage, duh.
if [ ! -d ../Dupes/ ]; then
    mkdir ../Dupes/ || exit 1 # Cuz no set -e, and trap not working
fi
for i in ./*
do
    for j in ./*
    do
        if [[ "$i" != "$j" ]]; then # Yes, it will be identical to itself
            if [[ $(cmp -s "$i" "$j") ]]; then
                echo "null" # Cuz I can't use negative of the comparison?
            else
                mv -i "$i" ../Dupes/
            fi
        fi
    done
done
https://unix.stackexchange.com/questions/367749/how-to-find-and-delete-duplicate-files-within-the-same-directory
Might have something I could use, but I'm not following what's going on in there.
https://superuser.com/questions/259148/bash-find-duplicate-files-mac-linux-compatible
If it were something that returns size, instead of md5, maybe one of the answers in here?
https://unix.stackexchange.com/questions/570305/what-is-the-most-efficient-way-to-find-duplicate-files
Didn't really get answered.
TIL: Sending errors from . scriptname will close my terminal instantly. Thanks, Google!
TIL: Sending errors from scripts executed via $PATH will close the terminal if shopt -s extdebug + trap checkcommand DEBUG are set in profile to try and catch rm -r * - but at least will respect my alias for exit
TIL: Backticks deprecated, use $(things) - Ugh, so much re-writing to do :P
TIL: How to catch non-ascii characters in filenames, without using basename
TIL: "${file##*/}"
TIL: file - yes, X.pdf is not a PDF.
On the matter of POSIX
I'm afraid you cannot get the actual file size (not the number of blocks allocated by the file) in a plain posix shell without using ls. All the solutions like du --apparent-size, find -printf %s, and stat are not posix.
However, as long as your filenames don't contain linebreaks (spaces are ok) you could create safe solutions relying on ls. Correctly handling filenames with linebreaks would require very non-posix tools (like GNU sort -z) anyway.
Bash+POSIX Approach Actually Comparing The Files
I would drop the approach to compare only the file sizes and use cmp instead. For huge directories the posix script will be slow no matter what you do. Also, I expect cmp to do some fail fast checks (like comparing the file sizes) before actually comparing the file contents. For common scenarios with only a few files speed shouldn't matter anyway as even the worst script will run fast enough.
The following script places each group of actual duplicates (at least two, but can be more) into its own subdirectory of dups/. The script should work with all filenames; spaces, special symbols, and even linebreaks are ok. Note that we are still using bash (which is not posix). We just assume that all tools (like mv, find, ...) are posix.
#! /usr/bin/env bash
files=()
for f in *; do [ -f "$f" ] && files+=("$f"); done
max=${#files[@]}
for (( i = 0; i < max; i++ )); do
    sameAsFileI=()
    for (( j = i + 1; j < max; j++ )); do
        cmp -s "${files[i]}" "${files[j]}" &&
            sameAsFileI+=("${files[j]}") &&
            unset 'files[j]'
    done
    (( ${#sameAsFileI[@]} == 0 )) && continue
    mkdir -p "dups/$i/"
    mv "${files[i]}" "${sameAsFileI[@]}" "dups/$i/"
    # no need to unset files[i] because loops won't visit this entry again
    files=("${files[@]}") # un-sparsify array
    max=${#files[@]}
done
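A minimal usage sketch (the script name dedup.sh is made up; run it from the directory holding the files):
cd /path/to/the/files   # the directory holding The W.pdf, W.pdf, X.pdf
bash dedup.sh           # run the script above
ls dups/*/              # each numbered subdirectory is one group of byte-identical files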
Fairly Portable Non-POSIX Approach Using File Sizes Only
If you need a faster approach that only compares the file sizes, I suggest not using a nested loop. Loops in bash are slow already, but if you nest them you get quadratic time complexity. It is faster and easier to ...
print only the file sizes without file names
apply sort | uniq -d to retrieve duplicates in time O(n log n)
Move all files having one of the duplicated sizes to a directory
This solution is not strictly POSIX conformant. However, I tried to verify that the tools and options in this solution are supported by most implementations. Your find has to support the non-posix options -maxdepth and -printf with %s for the actual file size and %f for the file basename (%p for the full path would be acceptable too).
The following script places all files of the same size into the directory potential-dups/. If there are two files of size n and two files of size m, all four files end up in this single directory. The script should work with all file names except those with linebreaks (that is \n; \r should be fine though).
#! /usr/bin/env sh
all=$(find . -maxdepth 1 -type f -printf '%s %f\n' | sort)
dupRegex=$(printf %s\\n "$all" | cut -d' ' -f1 | uniq -d |
sed -e 's/[][\.|$(){}?+*^]/\\&/g' -e 's/^/^/' | tr '\n' '|' | sed 's/|$//')
[ -z "$dupRegex" ] && exit
mkdir -p potential-dups
printf %s\\n "$all" | grep -E "$dupRegex" | cut -d' ' -f2- |
sed 's/./\\&/' | xargs -I_ mv _ potential-dups
In case you wonder about some of the sed commands: They quote the file names such that spaces and special symbols are processed correctly by subsequent tools. sed 's/[][\.|$(){}?+*^]/\\&/g' is for turning raw strings into equivalent extended regular expressions (ERE) and sed 's/./\\&/' is for literal processing by xargs. See the posix documentation of xargs:
-I replstr [...] Any <blank>s at the beginning of each line shall be ignored.
[...]
Note that the quoting rules used by xargs are not the same as in the shell. [...] An easy rule that can be used to transform any string into a quoted form that xargs interprets correctly is to precede each character in the string with a backslash.
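To see that quoting rule in action on a single made-up name (a throwaway sketch, not part of the script above):
printf '%s\n' 'file with spaces.pdf' | sed 's/./\\&/g' |
xargs -I_ printf '<%s>\n' _
# -> <file with spaces.pdf>   (the backslashes protect every character from xargs' quoting rules)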

How do I loop through a certain number of folders

I am trying to loop through a folder that has 744 subdirectories. How do I loop through only a certain number of folders? Since I have 744 subdirectories, I would split this in half and loop through the first 372 directories, and then later on loop through the remaining 372 directories. I want to make sure I don't copy directories multiple times. Below is what I tried, but I want to know what would be an effective way of doing this to avoid duplication.
for d in `ls -tr|tail -372`
do
echo $d
done
Since my xargs answer didn't receive any feedback, here's another approach.
printf "%s\n" */ |
awk 'BEGIN { n=1; OFS="\t"
             split("first:second:third", destination, /:/) }
     (i++ % 372) == 0 && NR > 1 { ++n }
     { print destination[n], $0 }'
This will add a field in front of each directory name, which you can use to process the results further. Sample output:
first directory1/
first directory2/
first directory3/
:
first directory372/
second directory373/
second directory374/
:
second directory743/
second directory744/
So the field value third from the Awk script is never used, but I put it in anyway to demonstrate that this could easily be extended to do three-way partitions, or four-way or what have you.
You would use this e.g. by piping to
while IFS=$'\t' read -r dest dir; do
    echo mv "$dir" "$dest"
done
Unlike the xargs -0 answer, this is not robust against arbitrary file names; in particular, directory names which contain newlines will not work correctly.
Actually a much better solution would be to split the files the other way -- i.e. for a two-way partition, print first on every other line and second on the remaining lines. Then you don't have to hard-code the number of items, just the number of partitions.
printf "%s\n" */ |
awk 'BEGIN { OFS="\t"
             n = split("ernie:bert", host, /:/) }
     { print host[1+((NR-1) % n)], $0 }' |
while IFS=$'\t' read -r server dir; do
    mkdir -p "$server"
    mv "$dir" "$server/"
done
Regardless of the number of directories, this splits them evenly into the directories ernie and bert, on the optimistic assumption that you (too) might have named your file servers after Sesame Street characters.
If you want to scp the directories instead of mv them, grouping them by server name would be a lot more efficient; but a simple sort takes care of that if necessary. (That's not the only reason we print the destination before each file name; it's also useful because then we don't have to worry that the directory names could contain our field separator.)
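For instance, the same pipeline with a sort inserted would emit all ernie lines first, then all bert lines (a sketch only; the remote path is made up, and echo keeps it a dry run):
printf "%s\n" */ |
awk 'BEGIN { OFS="\t"; n = split("ernie:bert", host, /:/) }
     { print host[1+((NR-1) % n)], $0 }' |
sort |
while IFS=$'\t' read -r server dir; do
    echo scp -r "$dir" "$server:/backup/dirs/"
done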
You can use xargs but this requires the 372 directory names to fit in one invocation (i.e. the directory names combined must not exceed ARG_MAX).
printf '%s\0' */ |
xargs -n 372 -r -0 sh -c '
d=dest$$; mkdir "$d"; cp -r "$@" "$d"' _
This will generate a unique new directory with the prefix dest and a number for each batch of directories it copies. There are probably better ways to split the files (and calling sh from xargs is not exactly a newbie-friendly answer) but maybe this should at least give you some ideas.
In some more detail, xargs -n 372 limits the number of arguments that get processed in one go, and the command you pass to xargs could be something a lot simpler; xargs -n 372 cp -t fnord would copy the first 372 directories to fnord, then another 372; but in order for this to be actually useful, we want the destination directory to change each time we call xargs, and so I put in a simple script which does that.
You also need to understand that 372 is a maximum, and if the directory names are really long, xargs could decide that it needs to pass fewer directories in order to not cross the "argument list too long" limit. But for your use case, on any remotely modern system, we are probably far below that limit anyway.
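You can watch the -n batching behaviour with a throwaway example (the names are made up and nothing is copied):
printf '%s\0' dir1 dir2 dir3 dir4 dir5 |
xargs -0 -n 2 echo batch:
# batch: dir1 dir2
# batch: dir3 dir4
# batch: dir5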
xargs -0 and cp -t are GNU extensions, i.e. they should work on Linux out of the box, and you can install them on most other platforms; if you really need to support something like Solaris without installing external tools, that's going to be slightly more challenging.
Addendum: Here's a xargs implementation of the ernie & bert part of the Awk answer:
printf "%s\0" */ |
xargs -r -0 -n 2 sh -c '
mkdir -p ernie bert
mv "$1" ernie
mv "$2" bert' _
There will be an ugly but harmless error message for the last item if you have an uneven number of input directories. There are obvious but inelegant ways to fix that, or elegant but obscure ones; but I prefer to keep this plain for now.

bash change absolute path in file line by line for script creation

I'm trying to create a bash script based on an input file (list.txt). The input file contains a list of files with absolute paths. The output should be a bash script (move.sh) which moves the files to another location, preserving the folder structure, but changing the target folder name slightly in the process.
The input list.txt file looks like this, for example:
/In/Folder_1/SomeFoldername1/somefilename_x.mp3
/In/Folder_2/SomeFoldername2/somefilename_y.mp3
/In/Folder_3/SomeFoldername3/somefilename_z.mp3
The output file (move.sh) should look like this after creation:
mv "/In/Folder_1/SomeFoldername1/somefilename_x.mp3" /gain/Folder_1/
mv "/In/Folder_2/SomeFoldername2/somefilename_y.mp3" /gain/Folder_2/
mv "/In/Folder_3/SomeFoldername3/somefilename_z.mp3" /gain/Folder_3/
The folder structure should be preserved, more or less.
After executing the created bash script (move.sh), the result should look like this:
/gain/Folder_1/somefilename_x.mp3
/gain/Folder_2/somefilename_y.mp3
/gain/Folder_3/somefilename_z.mp3
What I've done so far:
1. Create a list of files with absolute paths:
find /In/ -iname "*.mp3" -type f > /home/maars/mp3/list.txt
2. Create the move.sh script:
cp -a /home/maars/mp3/list.txt /home/maars/mp3/move.sh
# read the list and split the absolute path into fields
while IFS= read -r line;do
fields=($(printf "%s" "$line"|cut -d'/' --output-delimiter=' ' -f1-))
done < /home/maars/mp3/move.sh
# add the target path based on variables at the end of the line
sed -i -E "s|\.mp3|\.mp3"\"" /gain/"${fields[1]}"/|g" /home/maars/mp3/move.sh
sed -i "s|/In/|mv "\""/In/|g" /home/maars/mp3/move.sh
The script just uses the value of ${fields[1]}, which is Folder_1, and puts this at the end of all lines, instead of Folder_2 and Folder_3.
The current result looks like
mv "/In/Folder_1/SomeFoldername1/somefilename_x.mp3" /gain/Folder_1/
mv "/In/Folder_2/SomeFoldername2/somefilename_y.mp3" /gain/Folder_1/
mv "/In/Folder_3/SomeFoldername3/somefilename_z.mp3" /gain/Folder_1/
rsync is not an option since I need the full control of files to be moved.
What could I do better to solve this issue ?
EDIT : #Socowi helped me a lot by pointing me in the right direction. After I did a deep dive into the World of Regex, I could solve my Issues. Thank you very much
The script just uses the value of ${fields[1]}, which is Folder_1, and puts this at the end of all lines, instead of Folder_2 and Folder_3.
You iterate over all lines and update fields for every line. After you finish the loop, fields retains its value from the last line. You would have to move the sed commands into your loop and make sure that only the current line is replaced by sed. However, there's a better way – see down below.
What could I do better
There are a lot of things you could improve, for instance
Creating the array fields with mapfile -d/ fields instead of printf+cut+($()). That way, you also wouldn't have problems with spaces in paths (a short sketch follows after the sed example below).
Use sed only once instead of creating the array fields and using multiple sed commands. You can replace step 2 with this small script:
cp -a /home/maars/mp3/list.txt /home/maars/mp3/move.sh
sed -i -E 's|^/[^/]*/([^/]*).*$|mv "&" "/gain/\1"|' /home/maars/mp3/move.sh
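For the first suggestion, here is a minimal sketch of the mapfile variant (assumes bash 4.4+ for mapfile -d; note that fields[0] is empty because the paths start with /, so Folder_1 ends up in fields[2]):
while IFS= read -r line; do
    mapfile -d / -t fields < <(printf '%s' "$line")   # split the path on "/"
    printf 'mv "%s" "/gain/%s/"\n' "$line" "${fields[2]}"
done < /home/maars/mp3/list.txt > /home/maars/mp3/move.sh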
However, the best optimization would be to drop that three step approach and use only one script to find and move the files:
find /In/ -iname "*.mp3" -type f -exec rename -n 's|^/.*?/(.*?)/.*/(.*)$|/gain/$1/$2|' {} +
The -n option will print what would be renamed without actually renaming anything. Remove the -n when you are happy with the result. Here is the output:
rename(/In/Folder_1/SomeFoldername1/somefilename_x.mp3, /gain/Folder_1/somefilename_x.mp3)
rename(/In/Folder_2/SomeFoldername2/somefilename_y.mp3, /gain/Folder_2/somefilename_y.mp3)
rename(/In/Folder_3/SomeFoldername3/somefilename_z.mp3, /gain/Folder_3/somefilename_z.mp3)
It's not builtin to bash, but the mmv command is nice for this kind of mv where you need to use wildcards in paths. Something like the following should work:
mmv "in/*/*/*" "#1/#3"
Note that this won't create the directories for you - but in your example above it looks like these already exist?

Iterate through list of filenames in order they were created in bash

Parsing the output of ls to iterate through a list of files is bad. So how should I go about iterating through a list of files in the order in which they were created? I browsed several questions here on SO and they all seem to parse ls.
The embedded link suggests:
Things get more difficult if you wanted some specific sorting that
only ls can do, such as ordering by mtime. If you want the oldest or
newest file in a directory, don't use ls -t | head -1 -- read Bash FAQ
99 instead. If you truly need a list of all the files in a directory
in order by mtime so that you can process them in sequence, switch to
perl, and have your perl program do its own directory opening and
sorting. Then do the processing in the perl program, or -- worst case
scenario -- have the perl program spit out the filenames with NUL
delimiters.
Even better, put the modification time in the filename, in YYYYMMDD
format, so that glob order is also mtime order. Then you don't need ls
or perl or anything. (The vast majority of cases where people want the
oldest or newest file in a directory can be solved just by doing
this.)
Does that mean there is no native way of doing it in bash? I don't have the liberty to modify the filenames to include the time in them. I need to schedule a script in cron that would run every 5 minutes, generate an array containing all the files in a particular directory ordered by their creation time, and perform some actions on the filenames and move them to another location.
The following worked but only because I don't have funny filenames. The files are created by a server so it will never have special characters, spaces, newlines etc.
files=( $(ls -1tr) )
I can write a perl script that would do what I need but I would appreciate if someone can suggest the right way to do it in bash. Portable option would be great but solution using latest GNU utilities will not be a problem either.
sorthelper=();
for file in *; do
    # We need something that can easily be sorted.
    # Here, we use "<date><filename>".
    # Note that this works with any special characters in filenames
    sorthelper+=("$(stat -n -f "%Sm%N" -t "%Y%m%d%H%M%S" -- "$file")"); # Mac OS X only
    # or
    sorthelper+=("$(stat --printf "%Y %n" -- "$file")"); # Linux only
done;
sorted=();
while read -d $'\0' elem; do
    # this strips away the first 14 characters (<date>)
    sorted+=("${elem:14}");
done < <(printf '%s\0' "${sorthelper[@]}" | sort -z)
for file in "${sorted[@]}"; do
    # do your stuff...
    echo "$file";
done;
Other than sort and stat, all commands are actual native Bash commands (builtins)*. If you really want, you can implement your own sort using Bash builtins only, but I see no way of getting rid of stat.
The important parts are read -d $'\0', printf '%s\0' and sort -z. All these commands are used with their null-delimiter options, which means that any filename can be processed safely. Also, the use of double-quotes in "$file" and "${anarray[*]}" is essential.
*Many people feel that the GNU tools are somehow part of Bash, but technically they're not. So, stat and sort are just as non-native as perl.
With all of the cautions and warnings against using ls to parse a directory notwithstanding, we have all found ourselves in this situation. If you do find yourself needing sorted directory input, then about the cleanest use of ls to feed your loop is ls -opts | while read -r name; do ... This will handle spaces in filenames, etc. without requiring a reset of IFS, due to the nature of read itself. Example:
ls -1rt | while read -r fname; do # where '1' is ONE not little 'L'
So do look for cleaner solutions avoiding ls, but if push comes to shove, ls -opts can be used sparingly without the sky falling or dragons plucking your eyes out.
Let me add the disclaimer to keep everyone happy: if you like newlines inside your filenames, then do not use ls to populate a loop. If you do not have newlines inside your filenames, there are no other adverse side-effects.
Contra: TLDP Bash Howto Intro:
#!/bin/bash
for i in $( ls ); do
echo item: $i
done
It appears that SO users do not know what the use of contra means -- please look it up before downvoting.
You can try using the stat command piped to sort:
stat -c '%Y %n' * | sort -t ' ' -nk1 | cut -d ' ' -f2-
Update: To deal with filenames containing newlines we can use the %N format in stat, and instead of cut we can use awk, like this:
LANG=C stat -c '%Y^A%N' *| sort -t '^A' -nk1| awk -F '^A' '{print substr($2,2,length($2)-2)}'
Use of LANG=C is needed to make sure stat uses single quotes only in quoting file names.
^A is the Control-A character, typed using Ctrl+V followed by Ctrl+A.
How about a solution with GNU find + sed + sort?
As long as there are no newlines in the file name, this should work:
find . -type f -printf '%T# %p\n' | sort -k 1nr | sed 's/^[^ ]* //'
It may be a little more work to ensure it is installed (it may already be, though), but using zsh instead of bash for this script makes a lot of sense. The filename globbing capabilities are much richer, while still using a sh-like language.
files=( *(oc) )
will create an array whose entries are all the file names in the current directory, but sorted by change time. (Use a capital O instead to reverse the sort order). This will include directories, but you can limit the match to regular files (similar to the -type f predicate to find):
files=( *(.oc) )
find is needed far less often in zsh scripts, because most of its uses are covered by the various glob flags and qualifiers available.
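For completeness, a tiny sketch of using such a glob directly in a loop (zsh):
for f in *(.oc); do        # regular files only, ordered by change time as above
    print -r -- "$f"       # replace with the actual processing
done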
I've just found a way to do it with bash and ls (GNU).
Suppose you want to iterate through the filenames sorted by modification time (-t):
while read -r fname; do
fname=${fname:1:((${#fname}-2))} # remove the leading and trailing "
fname=${fname//\\\"/\"} # removed the \ before any embedded "
fname=$(echo -e "$fname") # interpret the escaped characters
file "$fname" # replace (YOU) `file` with anything
done < <(ls -At --quoting-style=c)
Explanation
Given some filenames with special characters, this is the ls output:
$ ls -A
filename with spaces .hidden_filename filename?with_a_tab filename?with_a_newline filename_"with_double_quotes"
$ ls -At --quoting-style=c
".hidden_filename" " filename with spaces " "filename_\"with_double_quotes\"" "filename\nwith_a_newline" "filename\twith_a_tab"
So you have to process a little each filename to get the actual one. Recalling:
${fname:1:((${#fname}-2))} # remove the leading and trailing "
# ".hidden_filename" -> .hidden_filename
${fname//\\\"/\"} # removed the \ before any embedded "
# filename_\"with_double_quotes\" -> filename_"with_double_quotes"
$(echo -e "$fname") # interpret the escaped characters
# filename\twith_a_tab -> filename with_a_tab
Example
$ ./script.sh
.hidden_filename: empty
filename with spaces : empty
filename_"with_double_quotes": empty
filename
with_a_newline: empty
filename with_a_tab: empty
As seen, file (or the command you want) interprets well each filename.
Each file has three timestamps:
Access time: the file was opened and read. Also known as atime.
Modification time: the file was written to. Also known as mtime.
Inode modification time: the file's status was changed, such as the file had a new hard link created, or an existing one removed; or if the file's permissions were chmod-ed, or a few other things. Also known as ctime.
Neither one represents the time the file was created, that information is not saved anywhere. At file creation time, all three timestamps are initialized, and then each one gets updated appropriately, when the file is read, or written to, or when a file's permissions are chmoded, or a hard link created or destroyed.
So, you can't really list the files according to their file creation time, because the file creation time isn't saved anywhere. The closest match would be the inode modification time.
See the descriptions of the -t, -u, -c, and -r options in the ls(1) man page for more information on how to list files in atime, mtime, or ctime order.
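As a quick cheat sheet for those options (standard ls behaviour; check your ls(1) for details):
ls -lt     # sort by modification time (mtime), newest first
ls -ltu    # sort by access time (atime)
ls -ltc    # sort by inode change time (ctime)
ls -ltr    # add -r to any of the above to reverse the order (oldest first)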
Here's a way using stat with an associative array.
n=0
declare -A arr
for file in *; do
    # modified=$(stat -f "%m" "$file") # For use with BSD/OS X
    modified=$(stat -c "%Y" "$file") # For use with GNU/Linux
    # Ensure stat timestamp is unique
    if [[ -n ${arr[$modified]} ]]; then
        modified=${modified}.$n
        ((n++))
    fi
    arr[$modified]="$file"
done
files=()
for index in $(IFS=$'\n'; echo "${!arr[*]}" | sort -n); do
    files+=("${arr[$index]}")
done
Since sort sorts lines, $(IFS=$'\n'; echo "${!arr[*]}" | sort -n) ensures the indices of the associative array get sorted by setting the field separator in the subshell to a newline.
The quoting at arr[$modified]="${file}" and files+=("${arr[$index]}") ensures that file names with caveats like a newline are preserved.

How can I process a list of files that includes spaces in its names in Unix?

I'm trying to list the files in a directory and do something to them in the Mac OS X prompt.
It should go like this: for f in $(ls -1); do echo $f; done
If I have files without spaces in their names (fileA.txt, fileB.txt), the echo works fine.
If the files include spaces in their names ("file A.txt", "file B.txt"), I get 4 strings (file, A.txt, file, B.txt).
I've tried quoting the listing command, but it only changed the problem.
If I do this: for f in "$(ls -1)"; do echo $f; done
I get: file A.txt\nfile B.txt
(It displays correctly, but it is a single string, and I need the 2 lines separated.)
Step away from ls if at all possible. Use find from the findutils package.
find /target/path -type f -print0 | xargs -0 your_command_here
-print0 will cause find to output the names separated by NUL characters (ASCII zero). The -0 argument to xargs tells it to expect the arguments separated by NUL characters too, so everything will work just fine.
Replace /target/path with the path under which your files are located.
-type f will only locate files. Use -type d for directories, or omit altogether to get both.
Replace your_command_here with the command you'll use to process the file names. (Note: If you run this from a shell using echo for your_command_here you'll get everything on one line - don't get confused by that shell artifact, xargs will do the expected right thing anyway.)
Edit: Alternatively (or if you don't have xargs), you can use the much less efficient
find /target/path -type f -exec your_command_here \{\} \;
\{\} \; is the escaped form of {} ;. Here {} is the placeholder for the currently processed file and ; terminates the -exec command; both are escaped so the shell doesn't interpret them. find will then invoke your_command_here with {} replaced by the file name, and since your_command_here will be launched by find and not by the shell, the spaces won't matter.
The second version will be less efficient since find will launch a new process for each and every file found. xargs is smart enough to pipe the commands to a newly launched process if it can figure it's safe to do so. Prefer the xargs version if you have the choice.
for f in *; do echo "$f"; done
should do what you want. Why are you using ls instead of * ?
In general, dealing with spaces in shell is a PITA. Take a look at the $IFS variable, or better yet at Perl, Ruby, Python, etc.
Here's an answer using $IFS as discussed by derobert
http://www.cyberciti.biz/tips/handling-filenames-with-spaces-in-bash.html
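The gist of that approach is to restrict word splitting to newlines (a rough sketch of the same idea; it still breaks if a filename contains a newline or glob characters):
SAVEIFS=$IFS
IFS=$'\n'                 # split unquoted expansions on newlines only
for f in $(ls -1); do
    echo "$f"
done
IFS=$SAVEIFS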
You can pipe the arguments into read. For example, to cat all files in the directory:
ls -1 | while read -r FILENAME; do cat "$FILENAME"; done
This means you can still use ls, as you have in your question, or any other command that produces $IFS delimited output.
The while loop makes it much easier to do several things to the argument, and makes complex processing more readable in my opinion. A contrived example:
ls -1 | while read -r FILE
do
    echo 1: "$FILE"
    echo 2: "$FILE"
done
Look at the --quoting-style option.
For instance, --quoting-style=c would produce:
$ ls --quoting-style=c
"file1" "file2" "dir one"
Check out the manpage for xargs:
It works like this:
ls -1 /tmp/*.jpeg | xargs rm
