I'm grouping files by dates in the filenames and processing them by groups.
for m in {01..12}; do
for d in {01..31}; do
f=`ls ./mydir/2018.${m}.${d}T*.jpg`
# process files
done
done
However, the code raises an error if no files exist for some dates, e.g.,
ls: cannot access '2018.01.20T*.jpg': No such file or directory
How can I skip missing dates?
Enable nullglob so non-matching wildcards expand to nothing. Then you can skip parsing ls altogether and simply iterate over the matching files.
shopt -s nullglob
for m in {01..12}; do
for d in {01..31}; do
for f in ./mydir/2018.${m}.${d}T*.jpg; do
# process file
done
done
done
If you want all the file names at once, store them in an array. Arrays are better than plain strings because they can handle file names with spaces and other special characters.
shopt -s nullglob
for m in {01..12}; do
for d in {01..31}; do
files=(./mydir/2018.${m}.${d}T*.jpg)
# process files
echo "processing ${files[@]}..."
done
done
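To actually skip the missing dates when collecting each group into an array, you can test the array length before processing. Here is a minimal sketch of that idea; the function name process_groups and the summary line it prints are made up for illustration, and the body runs in a subshell so nullglob does not leak into the caller.

```shell
#!/usr/bin/env bash
# Sketch only: process_groups is a hypothetical name, not part of the question.
# The ( ... ) body is a subshell, so the nullglob change stays local.
process_groups() (
    shopt -s nullglob
    dir=$1
    for m in {01..12}; do
        for d in {01..31}; do
            files=("$dir"/2018."$m"."$d"T*.jpg)
            # With nullglob on, a date with no files gives an empty array.
            (( ${#files[@]} )) || continue
            echo "2018.$m.$d: ${#files[@]} file(s)"
        done
    done
)

# Usage: process_groups ./mydir
```

Replace the echo with whatever per-group processing you need; "${files[@]}" expands to one word per file name, spaces included.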
What's the cleanest way to localise the shopt so as to restore nullglob to its original (unknown) value after this block?
Use a subshell: surround the block with parentheses. A subshell creates a child process which ensures changes don't leak into the parent.
(
shopt -s nullglob
...
)
It's polite to do this whenever you're changing shell options, and it's an elegant alternative to pushd+popd. Note that any variable assignments will be local to the subshell, so be careful there.
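If you need the variable assignments to survive, an alternative to the subshell is to save the option's state and restore it afterwards. A rough sketch, assuming bash; count_jpgs is a hypothetical helper used only to show the pattern.

```shell
#!/usr/bin/env bash
# Sketch only: count_jpgs is a hypothetical helper name.
count_jpgs() {
    local dir=$1 saved matches
    # `shopt -p nullglob` prints the command that recreates the current
    # setting. It returns non-zero when the option is off, hence `|| true`.
    saved=$(shopt -p nullglob) || true
    shopt -s nullglob
    matches=("$dir"/*.jpg)
    eval "$saved"              # put nullglob back the way it was
    echo "${#matches[@]}"
}

# Usage: count_jpgs ./mydir
```

No child process is involved here, so anything assigned between the save and the restore is still visible to the rest of the script.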
Here is another way, using find:
Assume the following dir:
$ ls -l mydir/
total 0
-rw-r--r-- 1 0 Jan 23 16:46 2018.01.20Thellowet.jpg
-rw-r--r-- 1 0 Jan 23 16:47 2018.04.24Thellowet.jpg
-rw-r--r-- 1 0 Jan 23 16:46 some_random_crap
-rw-r--r-- 1 0 Jan 23 16:46 wet
-rw-r--r-- 1 0 Jan 23 16:46 when
-rw-r--r-- 1 0 Jan 23 16:46 who
-rw-r--r-- 1 0 Jan 23 16:46 wtf
Using find:
find ./mydir/ -type f -regextype sed -regex ".*2018\.[0-9]\{,2\}\.[0-9]\{,2\}T.*\.jpg.*" -exec echo "---{}" \;
Gives (minor processing of data by appending --- to the file name):
---./mydir/2018.04.24Thellowet.jpg
---./mydir/2018.01.20Thellowet.jpg
NOTE: This will also return files that have 2018.00.xy or 2018.xy.00 where x and y can be any number between 0 and 9
Regex explained:
.* : any pattern
[0-9]{,2}: a number of up to 2 digits (zero, one or two digits; use {2} to require exactly two)
The \ are used to escape special characters.
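For comparison, a plain shell glob can require exactly two digits in each field without any regex. This is only a sketch under the same directory layout as above; list_dated_jpgs is a made-up name, and the --- prefix just mirrors the find example.

```shell
#!/usr/bin/env bash
# Sketch only: list_dated_jpgs is a hypothetical name.
# The ( ... ) body is a subshell, so nullglob does not leak to the caller.
list_dated_jpgs() (
    shopt -s nullglob
    # [0-9][0-9] matches exactly two digits, unlike \{,2\} in the regex.
    for f in "$1"/2018.[0-9][0-9].[0-9][0-9]T*.jpg; do
        printf -- '---%s\n' "$f"
    done
)

# Usage: list_dated_jpgs ./mydir
```

Like the find version, this still accepts 2018.00.xy; it only tightens the digit count.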
I would like to list all directories in a directory. Some of them have spaces in their names. There are also files in the target directory, which I would like to ignore.
Here is the output of ls -lah data/:
drwxr-xr-x 5 me staff 160B 24 Sep 11:30 Wrecsam - Wrexham
-rw-r--r-- 1 me staff 77M 24 Sep 11:31 Wrexham.csv
drwxr-xr-x 5 me staff 160B 24 Sep 11:32 Wychavon
-rw-r--r-- 1 me staff 84M 24 Sep 11:33 Wychavon.csv
I would like to iterate only over the "Wrecsam - Wrexham" and "Wychavon" directories.
This is what I've tried.
for d in "$(find data -maxdepth 1 -type d -print | sort -r)"; do
echo $d
done
But this gives me output like this:
Wychavon
Wrecsam
-
Wrexham
I want output like this:
Wychavon
Wrecsam - Wrexham
What can I do?
Your for loop is not doing the right thing because of word splitting. You can use a glob instead of having to invoke an external command in a subshell:
shopt -s nullglob # make glob expand to nothing if there are no matches
for dir in data/*/; do
echo dir="$dir"
done
Related:
Looping over directories in Bash
Why you shouldn't parse the output of ls(1)
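If you also want the reverse order that sort -r was giving you, one option is to put the glob results in an array and walk it backwards. A small sketch, assuming the glob's own (locale-dependent) sort order is acceptable; list_dirs_reversed is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch only: list_dirs_reversed is a hypothetical name.
list_dirs_reversed() (
    shopt -s nullglob
    dirs=("$1"/*/)                     # trailing slash: directories only
    for (( i = ${#dirs[@]} - 1; i >= 0; i-- )); do
        d=${dirs[i]%/}                 # drop the trailing slash from the glob
        printf '%s\n' "${d##*/}"       # print just the last path component
    done
)

# Usage: list_dirs_reversed data
```

Each array element is a whole directory name, so "Wrecsam - Wrexham" stays in one piece.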
I am trying to output the number of directories in a given path on a SINGLE line. My desire is to output this:
X-many directories
Currently, with my bash script, I get this:
X-many
directories
Here's my code:
ARGUMENT=$1
ls -l $ARGUMENT | egrep -c '^drwx'; echo -n "directories"
How can I fix my output? Thanks
I suggest
echo "$(ls -l "$ARGUMENT" | egrep -c '^drwx') directories"
This uses the shell's feature of final newline removal for command substitution.
Do not parse ls output to count directories, as you can get wrong results if file or directory names contain special characters.
To count directories use:
shopt -s nullglob
arr=( "$ARGUMENT"/*/ )
echo "${#arr[@]} directories"
/ at the end of glob will make sure to match only directories in "$ARGUMENT" path.
shopt -s nullglob is to make sure to return empty results if glob pattern fails (no directory in given argument).
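One thing to keep in mind is that */ skips hidden directories and also counts symlinks that point to directories. If hidden directories should be counted too, a possible variation is below; count_dirs is a hypothetical name, and enabling dotglob is my assumption about what you might want, not part of the answer above.

```shell
#!/usr/bin/env bash
# Sketch only: count_dirs is a hypothetical name.
# The ( ... ) body is a subshell, so the shopt changes stay local.
count_dirs() (
    shopt -s nullglob dotglob          # dotglob: also match names starting with "."
    arr=("${1:-.}"/*/)                 # "." and ".." are never matched by *
    echo "${#arr[@]} directories"
)

# Usage: count_dirs "$ARGUMENT"
```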
As an alternative solution:
$ bc <<< "$(find /etc -maxdepth 1 -type d | wc -l)-1"
116
Another one:
$ count=0; while read curr_line; do count=$((count+1)); done < <(ls -l ~/etc | grep ^d); echo ${count}
116
This works correctly with spaces in the folder name:
$ ls -la
total 20
drwxrwxr-x 5 alex alex 4096 Jun 30 18:40 .
drwxr-xr-x 11 alex alex 4096 Jun 30 16:41 ..
drwxrwxr-x 2 alex alex 4096 Jun 30 16:43 asdasd
drwxrwxr-x 2 alex alex 4096 Jun 30 16:43 dfgerte
drwxrwxr-x 2 alex alex 4096 Jun 30 16:43 somefoler with_space
$ count=0; while read curr_line; do count=$((count+1)); done < <(ls -l ./ | grep ^d); echo ${count}
3
I'm trying to do the following on OSX:
ls -lR --ignore *.app
So that I can recursively search through all folders except for .app folders.
However, there seems to be no --ignore or --hide option in Darwin.
Perhaps I need a script that recursively searches one folder deep for a given set? I'm not sure I can pipe ls -lR through anything because of the format of the output:
./ROOT/Applications/Some_app:
drwxr-xr-x 3 admin root 102 26 Jun 11:03 app-bundle.app #<- WANT THIS
drwxr-xr-x@ 24 admin root 816 26 Jun 11:24 folder #<- WANT THIS
./ROOT/Applications/Some_app/app-bundle.app: #<- DON'T WANT
drwxr-xr-x 7 admin root 238 26 Jun 11:03 Contents #<- DON'T WANT
...
Use find:
find . -ls -name '*.app' -prune
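The order of the tests matters here: the listing action runs for every entry before -name is checked, so a .app directory is still listed itself and only then pruned, which keeps find from descending into it. A sketch of the same idea with -print instead of -ls, purely to get simpler output; list_without_app_contents is a made-up name.

```shell
#!/usr/bin/env bash
# Sketch only: list_without_app_contents is a hypothetical name.
list_without_app_contents() {
    # -print fires for every entry; -name/-prune then stop the descent
    # into anything whose name ends in .app.
    find "$1" -print -name '*.app' -prune
}

# Usage: list_without_app_contents ./ROOT/Applications
```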
In bash, you can use extended globbing to exclude a pattern.
shopt -s extglob # this must be on its own line
echo !(*.app) # match everything except for the given pattern
If you have bash version 4 or higher, you can use globstar to do this recursively.
shopt -s globstar
shopt -s extglob
echo **/!(*.app)
An alternative is to pipe to grep:
ls | grep -v '\.app$'
If the glob */ only matches directories, then logically the extglob !(*/) should match non-directories; but this doesn't work. Is this a bug or am I missing something? Does this work on any shell?
Test 1 to prove that */ works
$ cd /tmp; ls -ld */
drwxr-xr-x 2 seand users 4096 Jan 1 15:59 test1//
drwxr-xr-x 2 seand users 4096 Jan 1 15:59 test2//
drwxr-xr-x 2 seand users 4096 Jan 1 15:59 test3//
Test 2 to show potential bug with !(*/)
$ cd /tmp; shopt -s extglob; ls -ld !(*/)
/bin/ls: cannot access !(*/): No such file or directory
In Bash, !() (like *, ?, *(), and @()) only applies to one path component. Thus, !(anything containing a / slash) doesn't work.
If you switch to zsh, you can use *(^/) to match all non-directories, or *(.) to match all plain files.
The answer to the specific question has already been given; and I am not sure if you really wanted another solution or if you were just interested to analyze the behavior, but one way to list all non-directories in the current folder is to use find:
find . -maxdepth 1 ! -type d
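If you would rather stay in bash without extglob, a plain loop with a -d test does roughly the same job. This is a sketch with a hypothetical function name; note that -d follows symlinks, so a symlink to a directory is skipped here, and hidden files are not matched by * unless dotglob is set.

```shell
#!/usr/bin/env bash
# Sketch only: list_non_dirs is a hypothetical name.
list_non_dirs() (
    shopt -s nullglob
    for f in "${1:-.}"/*; do
        # Print every entry that is not a directory.
        [[ -d $f ]] || printf '%s\n' "$f"
    done
)

# Usage: list_non_dirs /tmp
```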
I'm using bash shell.
Hi,ppl
Would be glad if someone could provide some kind of advice, because googling around yielded some answers
but couldn't still get the script to work.
I'm new to bash scripting and got a script to modify because it was failing to copy
a large number of files from an input directory to an output directory after the files were processed.
Description:
We have a bunch of pdf's in a large directory.
We process a file called filename.pdf, after it's processed an additional file is created called filename.pdf.marker
Then both files filename.pdf.marker and filename.pdf should be moved from input/in directory to directory output/out.
We work with about 10-15 thousand files.
The script should do the following:
select all .marker file names
move .marker files from input/in directory to directory output/out (done in separate line)
remove the .marker from the selected filename,
move the file filename.pdf to the output/out directory
Old script (didn't work for a larger number of files) :
FILELIST=$(ls ${V04}/*.pdf.marker 2> /dev/null | sort)
for FILEMARKER in ${FILELIST}; do
FILENAME=${V04}/$(basename $FILEMARKER .marker)
mv ${FILENAME} ${VLOGDIR}/.
mv ${FILENAME}.marker ${VLOGDIR}/.
done
Because of that I needed to use xargs command.
Problem:
I managed to move the .marker files in a separate line.
Now I need to move the .pdf files with this script line.
find /input/in -iname "*.marker" -print0 | xargs -0 -r -I {} mv `basename {} .marker` /output/out
My problem lies in the part: `basename {} .marker`
Why isn't the string filename.pdf extracted from the string filename.pdf.marker, and substituted into the mv command ?
Any help is welcome ;)
UPDATED
Corrected description of what script should do: Both filetypes .pdf
and .pdf.marker should be moved in my script not copied.
Added old script that didn't work well for larger amount of files.
The problem is that the command in backticks is executed once before xargs is ever invoked.
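You can see this for yourself by putting echo in front of mv. In the sketch below (the function name and the sample path are made up, and $(...) behaves the same as backticks here), basename is handed the literal string {}, returns it unchanged, and xargs only substitutes the file name afterwards.

```shell
#!/usr/bin/env bash
# Sketch only: demo_early_expansion and the sample path are hypothetical.
demo_early_expansion() {
    # The shell runs basename first; its output is the literal "{}",
    # so xargs is really given: echo mv {} /output/out
    printf '%s\n' /input/in/filename.pdf.marker |
        xargs -I {} echo mv $(basename {} .marker) /output/out
}

demo_early_expansion
# prints: mv /input/in/filename.pdf.marker /output/out
```

So the .marker file itself is what would get moved, never filename.pdf.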
The fix is a bit harder, not least because your step 2 says 'copy' but the previous description suggests 'move'. I'd probably create a simple script to be invoked by xargs:
find /input/in -name '*.marker' -print0 | xargs -0 mover.sh
The contents of mover.sh might be:
for mrk_source in "$@"
do
pdf_source=$(echo "$mrk_source" | sed 's/\.marker$//')
mrk_target=$(echo "/output/out/$mrk_source" | sed 's%/input/in%%')
pdf_target=$(echo "/output/out/$pdf_source" | sed 's%/input/in%%')
mv "$mrk_source" "$mrk_target"
mv "$pdf_source" "$pdf_target"
done
Note that this code preserves any directory structure under /input/in but assumes that the corresponding directory exists under /output/out (without checking). It would be possible to alter the code to flatten any directory structure, or to create the directories as needed (exercise for the reader). There is a small sleight-of-hand going on in the file name manipulation in the two xxx_target assignment lines; I think it will work OK for relative names as well as absolute names, but be a little cautious with that part (test before using, in other words).
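For the "create the directories as needed" variant, something along these lines might do. It is a sketch, not a tested replacement for mover.sh: move_pair and its three arguments are hypothetical, and it uses shell parameter expansion rather than echo and sed to work out the paths.

```shell
#!/usr/bin/env bash
# Sketch only: move_pair and its arguments are hypothetical.
# Usage: move_pair SRC_ROOT DST_ROOT MARKER_PATH
move_pair() {
    local src_root=$1 dst_root=$2 marker=$3
    local pdf=${marker%.marker}          # strip the .marker suffix
    local rel=${marker#"$src_root"/}     # path relative to the source root
    local rel_dir=.
    [[ $rel == */* ]] && rel_dir=${rel%/*}
    # Create the matching sub-directory, then move both files into it.
    mkdir -p -- "$dst_root/$rel_dir" &&
        mv -- "$marker" "$pdf" "$dst_root/$rel_dir"/
}

# Usage: move_pair /input/in /output/out /input/in/dir1/abc.pdf.marker
```

It assumes the marker path really starts with SRC_ROOT; if it does not, the relative path is left as the full path and the result is probably not what you want.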
tripleee commented:
The echo and sed invocations are very brittle -- for example, echo on some platforms will interpret backslashes in the filename as escape sequences. Fortunately, you can use the shell's substitution mechanisms to mv "${mrk_source%.marker}" /output/out instead. (Why would you want to calculate the destination file name, when all you need to give to mv is the destination directory?)
I explained the destination file name - preserving sub-directories, so /input/in/dir1/abc.pdf goes to /output/out/dir1/abc.pdf; if you want to flatten the directory structure (or there is no directory structure), then simply specifying the destination is sufficient.
The problem with echo 'should not' be a problem in the sense that the original design of echo was simple and all the later additional ... baggage simply makes what should be utterly reliable into something horrendously unreliable. That said, I suspected there could be problems with names containing backticks, $(...) and so on; testing shows there are no problems with backticks or $(...) in the names. There is a problem with backslashes in the name.
$ mkdir -p input/in output/out
$ for name in a b 'c d' 'e f g' '$(cat x)' '`cat y`' 'a\\nb'
> do
> cp /dev/null input/in/"$name.pdf"
> cp /dev/null "input/in/$name.pdf.marker"
> done
$ ls -lR [io]*
input:
total 0
drwxr-xr-x 16 jleffler staff 544 Aug 22 00:45 in
input/in:
total 0
-rw-r--r-- 1 jleffler staff 0 Aug 22 00:45 $(cat x).pdf
-rw-r--r-- 1 jleffler staff 0 Aug 22 00:45 $(cat x).pdf.marker
-rw-r--r-- 1 jleffler staff 0 Aug 22 00:45 `cat y`.pdf
-rw-r--r-- 1 jleffler staff 0 Aug 22 00:45 `cat y`.pdf.marker
-rw-r--r-- 1 jleffler staff 0 Aug 22 00:45 a.pdf
-rw-r--r-- 1 jleffler staff 0 Aug 22 00:45 a.pdf.marker
-rw-r--r-- 1 jleffler staff 0 Aug 22 00:45 a\\nb.pdf
-rw-r--r-- 1 jleffler staff 0 Aug 22 00:45 a\\nb.pdf.marker
-rw-r--r-- 1 jleffler staff 0 Aug 22 00:45 b.pdf
-rw-r--r-- 1 jleffler staff 0 Aug 22 00:45 b.pdf.marker
-rw-r--r-- 1 jleffler staff 0 Aug 22 00:45 c d.pdf
-rw-r--r-- 1 jleffler staff 0 Aug 22 00:45 c d.pdf.marker
-rw-r--r-- 1 jleffler staff 0 Aug 22 00:45 e f g.pdf
-rw-r--r-- 1 jleffler staff 0 Aug 22 00:45 e f g.pdf.marker
output:
total 0
drwxr-xr-x 2 jleffler staff 68 Aug 22 00:45 out
output/out:
$ find input/in -name '*.marker' -print0 | xargs -0 sh mover.sh
mv: rename input/in/a\nb.pdf to ./output/out/a
b.pdf: No such file or directory
$ ls -lR [io]*
input:
total 0
drwxr-xr-x 3 jleffler staff 102 Aug 22 00:46 in
input/in:
total 0
-rw-r--r-- 1 jleffler staff 0 Aug 22 00:45 a\\nb.pdf
output:
total 0
drwxr-xr-x 15 jleffler staff 510 Aug 22 00:46 out
output/out:
total 0
-rw-r--r-- 1 jleffler staff 0 Aug 22 00:45 $(cat x).pdf
-rw-r--r-- 1 jleffler staff 0 Aug 22 00:45 $(cat x).pdf.marker
-rw-r--r-- 1 jleffler staff 0 Aug 22 00:45 `cat y`.pdf
-rw-r--r-- 1 jleffler staff 0 Aug 22 00:45 `cat y`.pdf.marker
-rw-r--r-- 1 jleffler staff 0 Aug 22 00:45 a.pdf
-rw-r--r-- 1 jleffler staff 0 Aug 22 00:45 a.pdf.marker
-rw-r--r-- 1 jleffler staff 0 Aug 22 00:45 a\nb.pdf.marker
-rw-r--r-- 1 jleffler staff 0 Aug 22 00:45 b.pdf
-rw-r--r-- 1 jleffler staff 0 Aug 22 00:45 b.pdf.marker
-rw-r--r-- 1 jleffler staff 0 Aug 22 00:45 c d.pdf
-rw-r--r-- 1 jleffler staff 0 Aug 22 00:45 c d.pdf.marker
-rw-r--r-- 1 jleffler staff 0 Aug 22 00:45 e f g.pdf
-rw-r--r-- 1 jleffler staff 0 Aug 22 00:45 e f g.pdf.marker
$
Using the Bash built-ins is sensible; I'm still stuck in the 1980s on occasion, and need reminding of that.
Solution that works with backslashes etc
for mrk_source in "$@"
do
pdf_source=${mrk_source%.marker}
mrk_target=${mrk_source/\/input\/in/\/output\/out}
pdf_target=${pdf_source/\/input\/in/\/output\/out}
mv "$mrk_source" "$mrk_target"
mv "$pdf_source" "$pdf_target"
done
With the same set of input files, this code works cleanly:
EDIT: As pointed out in the comments, this will not work if there are spaces in the filenames. In that case see @Jonathan Leffler's answer (even if there are no spaces now, you should probably use his version anyway, to avoid breakage when there suddenly are spaces...).
Since the command is expanded before it is executed, you can't use it that way. The command you'll give xargs would look like this:
xargs -0 -r -I {} mv {} /output/out
This is because basename has already run by then: it tries to remove any path components, and a .marker suffix, from the literal string {}, which leaves {} unchanged.
I'd say you want to use a loop in this case:
for f in $(find /input/in -iname "*.marker"); do
mv `basename $f .marker` /output/out
done
With GNU Parallel you should be able to do:
ls "$V04"/*.pdf.marker | parallel -q mv {.} {} "$VLOGDIR"
This will work even if $V04 and $VLOGDIR contain ' " space \t.
Watch the intro video to learn more: http://www.youtube.com/watch?v=OpaiGYxkSuQ
The problem with the old script was that it tried to capture all entries in one variable, which has size limits.
If you don't need to sort the entries, you can solve it this way:
ls -1 "${V04}"/*.pdf.marker | while read -r FM;
do
mv "${FM}" "${VLOGDIR}/"
mv "${V04}/$(basename "${FM}" .marker)" "${VLOGDIR}/"
done;
The backticks are executed at evaluation time, not when xargs runs. Perhaps try something like this?
find /input/in -iname "*.marker" -print0 |
xargs -r0 -i sh -c 'mv `basename "{}" .marker` /output/out; mv "{}" /output/out'
Edit: The shell is still problematic here; if the file name contains double quotes, it will not parse correctly. Using a separate script might be better:
find /input/in -iname "*.marker" -exec ./myscript {} \;
where myscript contains the simple moving commands:
#!/bin/sh
mv "${1%.marker}" /output/out   # strip only the .marker suffix; keep the directory so mv can find the file
mv "$1" /output/out
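If you don't want a separate script file, the same two moves can be folded into find itself with -exec sh -c and {} +, which hands many file names to each sh invocation. A sketch under the assumption that flattening into one destination directory is fine (same-named files from different sub-directories would collide); move_marked is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch only: move_marked is a hypothetical name.
# Usage: move_marked SOURCE_DIR DEST_DIR
move_marked() {
    local src=$1 dest=$2
    find "$src" -name '*.marker' -exec sh -c '
        dest=$1; shift
        for marker do
            # Move the marker and the file it marks in one mv call.
            mv -- "$marker" "${marker%.marker}" "$dest"/
        done' sh "$dest" {} +
}

# Usage: move_marked /input/in /output/out
```

The file names are passed as arguments and never re-parsed by a shell, so spaces, quotes and backslashes in names should not matter.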