How to use "while read filename; do" to take filenames, strip everything from "(" to the end, and then create a directory with that name? - bash

I have hundreds of movies saved as "Title (year).mkv". They are all in one directory; however, I wish to create a directory from just the "Title" part of each file name and then mv the file into the newly created directory to clean things up a little.
Here is what I have so far:
dest=/storage/Uploads/destination/
find "$dest" -maxdepth 1 -mindepth 1 -type f -printf "%P\n" | sort -n | {
while read filename ; do
echo $filename;
dir=${filename | cut -f 1 -d '('};
echo $dir;
# mkdir $dest$dir;
# rename -n "s/ *$//" *;
done;
}
dest=/storage/Uploads/destination/
is my working directory
find $dest -maxdepth 1 -mindepth 1 -type f -printf "%P\n" | sort -n | {
is my find all files in $dest variable
while read filename ; do
as long as there's a filename to read, the loop continues
echo $filename
just so I can see what it is
dir=${filename | cut -f 1 -d '('};
dir = the results of command within the {}
echo $dir;
So I can see the name of the upcoming directory
mkdir $dest$dir;
Make the directory
rename -n "s/ *$//" *;
will rename the pesky directories that have a trailing space
And since we have more files to read, starts over until the last one, and
done;
}
When I run it, I get:
./new.txt: line 8: ${$filename | cut -f 1 -d '('}: bad substitution
I have two lines commented so it won't use those until I get the other working. Anyone have a way to do what I'm trying to do? I would prefer a bash script so I can run it again when necessary.
Thanks in advance!

dir=${filename | cut -f 1 -d '('}; is invalid. To run a command and capture its output, use $( ) and echo the text into the pipe, as in dir=$(echo "$filename" | cut -f 1 -d '('). By the way, that cut will leave a trailing space, which you probably don't want.
But don't use external programs like cut when there is no need; bash parameter expansion will do it for you and get rid of the trailing space:
filename="Title (year).mkv"
# remove all the characters on the right after and including <space>(
dir=${filename%% (*}
echo "$dir"
Gives
Title
The general syntax is ${variable%%pattern}, which removes the longest match of pattern from the right. The pattern uses glob (filename expansion) syntax, so " (*" is a space, followed by (, followed by zero or more of any character.
% removes the shortest match instead, and ## and # do the same but remove from the left of the value.
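Applied to the original task, the expansion might be used as in the following sketch. It assumes every file is named "Title (year).mkv" and sits directly in the destination directory; organize_movies is a hypothetical helper name, not something from the question.

```shell
#!/usr/bin/env bash
# Sketch only: organize_movies is a hypothetical helper name.
# Moves each "Title (year).mkv" in the given directory into a
# subdirectory named "Title".
organize_movies() {
    local dest=$1 path filename dir
    for path in "$dest"/*.mkv; do
        [ -f "$path" ] || continue            # skip an unmatched glob or non-files
        filename=${path##*/}                  # drop the leading directory
        dir=${filename%% (*}                  # drop " (year).mkv"
        [ "$dir" != "$filename" ] || continue # no " (" in the name: leave it alone
        mkdir -p -- "$dest/$dir"
        mv -- "$path" "$dest/$dir/"
    done
}
```

Calling organize_movies /storage/Uploads/destination would process the directory from the question. Using a glob instead of find | while read avoids word-splitting problems, and mkdir -p makes the script safe to run again later.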

Related

Automator/Apple Script: Move files with same prefix on a new folder. The folder name must be the files prefix

I'm a photographer and I have multiple jpg files of clothing in one folder. The file name structure is:
TYPE_FABRIC_COLOR (Example: BU23W02CA_CNU_RED, BU23W02CA_CNU_BLUE, BU23W23MG_LINO_WHITE)
I have to move files of the same TYPE (BU23W02CA) into one folder named after the TYPE.
For example:
MAIN FOLDER>
BU23W02CA_CNU_RED.jpg, BU23W02CA_CNU_BLUE.jpg, BU23W23MG_LINO_WHITE.jpg
Becomes:
MAIN FOLDER>
BU23W02CA_CNU > BU23W02CA_CNU_RED.jpg, BU23W02CA_CNU_BLUE.jpg
BU23W23MG_LINO > BU23W23MG_LINO_WHITE.jpg
Here are some scripts.
V1
#!/bin/bash
find . -maxdepth 1 -type f -name "*.jpg" -print0 | while IFS= read -r -d '' file
do
# Extract the directory name
dirname=$(echo "$file" | cut -d'_' -f1-2 | sed 's#\./\(.*\)#\1#')
#DEBUG echo "$file --> $dirname"
# Create it if not already existing
if [[ ! -d "$dirname" ]]
then
mkdir "$dirname"
fi
# Move the file into it
mv "$file" "$dirname"
done
it assumes all files that the find lists are of the format you described in your question, i.e. TYPE_FABRIC_COLOR.ext.
dirname is the extraction of the first two words delimited by _ in the file name.
since find lists the files with a ./ prefix, it is removed from the dirname as well (that is what the sed command does).
the find specifies the name of the files to consider as *.jpg. You can change this to something else, if you want to restrict which files are considered in the move.
this version loops through each file, creates a directory named after its first two sections (if it does not exist already), and moves the file into it.
if you want to see what the script is doing to each file, you can add option -v to the mv command. I used it to debug.
However, since it loops through each file one by one, this might take time with a large number of files, hence this next version.
V2
#!/bin/bash
while IFS= read -r dirname
do
echo ">$dirname"
# Create it if not already existing
if [[ ! -d "$dirname" ]]
then
mkdir "$dirname"
fi
# Move the file into it
find . -maxdepth 1 -type f -name "${dirname}_*" -exec mv {} "$dirname" \;
done < <(find . -maxdepth 1 -type f -name "*.jpg" -print | sed 's#^\./\(.*\)_\(.*\)_.*\..*$#\1_\2#' | sort | uniq)
this version loops on the directory names instead of on each file.
the last line does the "magic". It finds all files, and extracts the first two words (with sed) right away. Then these words are sorted and "uniqued".
the while loop then creates each directory one by one.
the find inside the while loop moves all files that match the directory being processed into it. Why did I not simply do mv ${dirname}_* ${dirname}? Because the expansion of the * wildcard could produce an argument list too long for the mv command. Doing it with find ensures that it will work even with a LARGE number of files.
A suggested one-liner awk script:
echo "$(ls -1 *.jpg)"| awk '{system("mkdir -p "$1 OFS $2);system("mv "$0" "$1 OFS $2)}' FS=_ OFS=_
Explanation:
echo "$(ls -1 *.jpg)": List all jpg files in current directory one file per line
FS=_ : Set awk field separator to _ $1=type $2=fabric $3=color.jpg
OFS=_ : Set awk output field separator to _
awk script explanation
{ # for each file name from list
system ("mkdir -p "$1 OFS $2); # execute "mkdir -p type_fabric"
system ("mv " $0 " " $1 OFS $2); # execute "mv current-file to type_fabric"
}
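The same grouping can also be sketched with bash parameter expansion alone, without cut, sed or awk. This assumes names of the form TYPE_FABRIC_COLOR.jpg as described in the question; group_by_prefix is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: group_by_prefix is a hypothetical helper name.
# Moves every TYPE_FABRIC_COLOR.jpg in the given directory into a
# subdirectory named TYPE_FABRIC.
group_by_prefix() {
    local dir=$1 path name rest prefix
    for path in "$dir"/*_*_*.jpg; do
        [ -f "$path" ] || continue          # skip an unmatched glob
        name=${path##*/}                    # TYPE_FABRIC_COLOR.jpg
        rest=${name#*_}                     # FABRIC_COLOR.jpg
        prefix=${name%%_*}_${rest%%_*}      # TYPE_FABRIC
        mkdir -p -- "$dir/$prefix"
        mv -- "$path" "$dir/$prefix/"
    done
}
```

Because the names never pass through a pipeline or through system(), file names containing spaces are handled without extra quoting tricks.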

Shell script to loop over all files in a folder and pick them in numerical order

I have the following code to loop through the files of a folder. Files are named 1.txt, 2.txt all the way to 15.txt
for file in .solutions/*; do
if [ -f "$file" ]; then
echo "test case ${file##*/}:"
cat ./testcases/${file##*/}
echo
echo "result:"
cat "$file"
echo
echo
fi
done
My issue is that I get 1.txt, then 10.txt to 15.txt displayed.
I would like them to be displayed in numerical order instead of lexicographical order; in other words, I want the loop to iterate through the files in numerical order. Is there any way to achieve this?
ls *.txt | sort -n
This would solve the problem, provided .solutions is a directory and no directory is named with an extension .txt.
and if you want complete accuracy,
ls -al *.txt | awk '$0 ~ /^-/ {print $9}' | sort -n
Update:
As per your edits,
you can simply do this,
ls | sort -n |
while read file
do
#do whatever you want here
:
done
Looping through ls is usually a bad idea since file names can have newlines in them. Redirecting using process substitution instead of piping the results will keep the scope the same (variables you set will stay after the loop).
#!/usr/bin/env bash
while IFS= read -r -d '' file; do
echo "test case ${file##*/}:"
cat "./testcases/${file##*/}"
echo
echo "result:"
cat "$file"
echo
echo
done < <(find '.solutions/' -name '*.txt' -type f -print0 | sort -z -t / -k 2n)
Setting IFS to "" keeps the leading/trailing spaces, -r stops backslashes from being interpreted, and -d '' uses NUL instead of newline as the delimiter.
The find command looks for regular files with -type f, so the if [ -f "$file" ] check isn't needed. It finds -name '*.txt' files in '.solutions/' and prints them NUL-terminated with -print0.
The sort command accepts NUL-terminated strings with the -z option. Because every path starts with .solutions/, a plain -n would see no leading number and fall back to plain string order, so -t / -k 2n sorts numerically on the part after the slash.
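sort -n compares the number at the start of each record, so another way to get numeric order is to have find emit only the base name. The sketch below assumes GNU find (for -printf and -maxdepth) and GNU sort (for -z); list_numeric is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: list_numeric is a hypothetical helper name.
# Prints the *.txt files directly inside the given directory, one per
# line, ordered by the number each base name starts with.
list_numeric() {
    local file
    while IFS= read -r -d '' file; do
        printf '%s\n' "$file"
    done < <(find "$1" -maxdepth 1 -type f -name '*.txt' -printf '%f\0' | sort -zn)
}
```

Inside the loop, the full path can be rebuilt as "$1/$file" when the file needs to be read.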

Faster way to list files with similar names (using bash)?

I have a directory with more than 20K files all with a random number prefix (eg 12345--name.jpg). I want to find files with similar names and remove all but one. I don't care which one because they are duplicates.
To find duplicated names I've used
find . -type f \( -name "*.jpg" \) | | sed -e 's/^[0-9]*--//g' | sort | uniq -d
as the list of a for/next loop.
To find all but one to delete, I'm currently using
rm $(ls -1 *name.jpg | tail -n +2)
This operation is pretty slow. I want to speed this up. Any suggestions?
I would do it like this.
*Note that you are dealing with rm command, so make sure that you have backup of the existing directory in case something goes south.
Create a backup directory and take backup of existing files. Once done check if all the files are there.
mkdir bkp_dir; cp *.jpg bkp_dir/
Create another temp directory where we will keep only one file for each similar name. So all unique file names will be here.
$ mkdir tmp
$ for i in $(ls -1 *.jpg|sed 's/^[[:digit:]].*--\(.*\.jpg\)/\1/'|sort|uniq);do cp $(ls -1|grep "$i"|head -1) tmp/ ;done
*An explanation of the command is given at the end. Once executed, check in the tmp/ directory that you got unique instances of the files.
Remove all *.jpg files from main directory. Saying again, please verify that all files have been backed up before executing rm command.
rm *.jpg
Copy the unique instances back from the temp directory.
cp tmp/*.jpg .
Explanation of command in step 2.
Command to get unique file names for step 2 will be
for i in $(ls -1 *.jpg|sed 's/^[[:digit:]].*--\(.*\.jpg\)/\1/'|sort|uniq);do cp $(ls -1|grep "$i"|head -1) tmp/ ;done
$(ls -1 *.jpg|sed 's/^[[:digit:]].*--\(.*\.jpg\)/\1/'|sort|uniq) will get the unique file names like file1.jpg , file2.jpg
for i in $(...);do cp $(ls -1|grep "$i"|head -1) tmp/ ;done will copy one file for each filename to tmp/ directory.
You should not be using ls in scripts and there is no reason to use a separate file list like in userunknown's reply.
keepone () {
shift
rm "$@"
}
keepone *name.jpg
If you are running find to identify the files you want to isolate anyway, traversing the directory twice is inefficient. Filter the output from find directly.
find . -type f -name "*.jpg" |
awk '{ f=$0; sub(/^.*\/[0-9]*--/, "", f); if (a[f]++) print }' |
xargs echo rm
Take out the echo if the results look like what you expect.
As an aside, the /g flag to sed is useless for a regex which can only match once. The flag says to replace all occurrences on a line instead of the first occurrence on a line, but if there can be only one, the first is equivalent to all.
Assuming no subdirectories and no whitespace-in-filenames involved:
find . -type f -name "*.jpg" | sed -e 's|^\./[0-9]*--||' | sort | uniq -d > namelist
removebutone () { shift; echo rm "$@"; }; cat namelist | while read n; do removebutone *--"$n"; done
or, better readable:
removebutone () {
shift
echo rm "$@"
}
cat namelist | while read n; do removebutone *--"$n"; done
shift drops the first positional parameter, so "$@" then holds every matching file except the first.
Note that the parens around the name parameter are superfluous, and that there shouldn't be two pipes before sed. Maybe you had something else there, which needed to be covered.
If it looks promising, you have, of course, to remove the 'echo' in front of 'rm'.
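For names that may contain whitespace, a NUL-delimited variant can be sketched with an associative array (bash 4 or later). It assumes the "digits--name.jpg" layout from the question and a flat directory; list_duplicates is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: list_duplicates is a hypothetical helper name.
# Prints, NUL-terminated, every *.jpg in the given directory whose name
# (with the leading "digits--" prefix removed) has already been seen.
list_duplicates() {
    local dir=$1 path key re='^[0-9]+--(.+)$'
    local -A seen=()
    while IFS= read -r -d '' path; do
        key=${path##*/}
        if [[ $key =~ $re ]]; then
            key=${BASH_REMATCH[1]}          # name without the numeric prefix
        fi
        if [[ -n ${seen[$key]:-} ]]; then
            printf '%s\0' "$path"           # a later copy of a known name
        else
            seen[$key]=1                    # first copy: keep it
        fi
    done < <(find "$dir" -maxdepth 1 -type f -name '*.jpg' -print0)
}

# Review the list first, for example:
#   list_duplicates . | tr '\0' '\n'
# and only then delete (GNU xargs assumed for -r):
#   list_duplicates . | xargs -0 -r rm --
```

This traverses the directory once and starts a single rm, which is where most of the time goes in the one-rm-per-name approach.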

Find the biggest index in extension of file in a bash script

So I have a folder with a bunch of files:
File, File.0, File.1, File.2
I'm trying to find the biggest index in the extensions of these files. So here it has to be 2.
I wrote this command, which counts all files with a numeric extension.
But it doesn't work properly when the index is greater than 10. In fact it's not working at all, because I just want to find the biggest index, not the number of files with a numeric extension.
$1 (is file name in this case File)
y=$(echo $(ls -d $1.[0-inf] | wc -l))
How can I do this ?
First tip : do not parse the output of ls. Especially in your case.
You could use the following script in pure bash to address your issue :
#!/bin/bash
# needed for correct glob expansion
shopt -s nullglob
# we check every file following the format $1.extension
max_index=0
for f in "$1".*
do
# we retrieve the last extension
ext=${f##*.}
re="^[0-9]+$"
# if ext is a number and greater than our max, we store it
if [[ $ext =~ $re && $ext -gt $max_index ]]
then
max_index=$ext
fi
done
echo $max_index
You can try this:
for i in file\.*; do echo ${i##*.}; done | sort -g | tail -n1
${i##*.} is removing everything before the last . in the filename.
sort -g is sorting as numeric value.
tail -n1 prints the last index.
A less error-prone way is to use the find command, as it will cope with files not matching the pattern, filenames with spaces, and so on.
find -type f -name "file\.*" -exec bash -c 'echo ${1/*\.}' _ "{}" \; 2>/dev/null | sort -n | tail -n1
bash -c 'echo ${1/*\.}' _ "{}" is the command that will strip the characters up to and including the last '.'.
You may want to add -maxdepth 1 at the beginning of the command to avoid looking recursively inside directories.
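The pure-bash idea can also be wrapped in a function. The sketch below makes two assumptions of its own: it prints -1 when no numbered file exists, and it forces base 10 so that an extension such as 08 is not read as octal. max_index is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: max_index is a hypothetical helper name.
# Prints the largest purely numeric extension among "$1".*,
# or -1 if there is none.
max_index() {
    local base=$1 f ext max=-1
    for f in "$base".*; do
        ext=${f##*.}                          # text after the last dot
        [[ $ext =~ ^[0-9]+$ ]] || continue    # also skips an unmatched glob
        if (( 10#$ext > max )); then          # 10# forces decimal
            max=$((10#$ext))
        fi
    done
    echo "$max"
}
```

Called as max_index File in the folder from the question, it would consider File.0, File.1 and File.2 and ignore File itself.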

How can I escape white space in a bash loop list?

I have a bash shell script that loops through all child directories (but not files) of a certain directory. The problem is that some of the directory names contain spaces.
Here are the contents of my test directory:
$ ls -F test
Baltimore/ Cherry Hill/ Edison/ New York City/ Philadelphia/ cities.txt
And the code that loops through the directories:
for f in `find test/* -type d`; do
echo $f
done
Here's the output:
test/Baltimore
test/Cherry
Hill
test/Edison
test/New
York
City
test/Philadelphia
Cherry Hill and New York City are treated as 2 or 3 separate entries.
I tried quoting the filenames, like so:
for f in `find test/* -type d | sed -e 's/^/\"/' | sed -e 's/$/\"/'`; do
echo $f
done
but to no avail.
There's got to be a simple way to do this.
The answers below are great. But to make this more complicated - I don't always want to use the directories listed in my test directory. Sometimes I want to pass in the directory names as command-line parameters instead.
I took Charles' suggestion of setting the IFS and came up with the following:
dirlist="${@}"
(
[[ -z "$dirlist" ]] && dirlist=`find test -mindepth 1 -type d` && IFS=$'\n'
for d in $dirlist; do
echo $d
done
)
and this works just fine unless there are spaces in the command line arguments (even if those arguments are quoted). For example, calling the script like this: test.sh "Cherry Hill" "New York City" produces the following output:
Cherry
Hill
New
York
City
First, don't do it that way. The best approach is to use find -exec properly:
# this is safe
find test -type d -exec echo '{}' +
The other safe approach is to use NUL-terminated list, though this requires that your find support -print0:
# this is safe
while IFS= read -r -d '' n; do
printf '%q\n' "$n"
done < <(find test -mindepth 1 -type d -print0)
You can also populate an array from find, and pass that array later:
# this is safe
declare -a myarray
while IFS= read -r -d '' n; do
myarray+=( "$n" )
done < <(find test -mindepth 1 -type d -print0)
printf '%q\n' "${myarray[@]}" # printf is an example; use it however you want
If your find doesn't support -print0, your result is then unsafe -- the below will not behave as desired if files exist containing newlines in their names (which, yes, is legal):
# this is unsafe
while IFS= read -r n; do
printf '%q\n' "$n"
done < <(find test -mindepth 1 -type d)
If one isn't going to use one of the above, a third approach (less efficient in terms of both time and memory usage, as it reads the entire output of the subprocess before doing word-splitting) is to use an IFS variable which doesn't contain the space character. Turn off globbing (set -f) to prevent strings containing glob characters such as [], * or ? from being expanded:
# this is unsafe (but less unsafe than it would be without the following precautions)
(
IFS=$'\n' # split only on newlines
set -f # disable globbing
for n in $(find test -mindepth 1 -type d); do
printf '%q\n' "$n"
done
)
Finally, for the command-line parameter case, you should be using arrays if your shell supports them (i.e. it's ksh, bash or zsh):
# this is safe
for d in "$@"; do
printf '%s\n' "$d"
done
will maintain separation. Note that the quoting (and the use of $@ rather than $*) is important. Arrays can be populated in other ways as well, such as glob expressions:
# this is safe
entries=( test/* )
for d in "${entries[@]}"; do
printf '%s\n' "$d"
done
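On bash 4.4 or later, the read loop that fills an array can be replaced by mapfile with a NUL delimiter. The following is a sketch under that version assumption, with GNU find and sort assumed as well; list_subdirs is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: list_subdirs is a hypothetical helper name.
# Requires bash 4.4+ for "mapfile -d ''".
# Prints the name of each immediate subdirectory in brackets, so that
# embedded spaces are easy to see.
list_subdirs() {
    local -a dirs
    local d
    mapfile -d '' -t dirs < <(find "$1" -mindepth 1 -maxdepth 1 -type d -print0 | sort -z)
    for d in "${dirs[@]}"; do
        printf '[%s]\n' "${d##*/}"
    done
}
```

The array keeps each path as one element regardless of spaces or newlines, so it can be passed on later as "${dirs[@]}".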
find . -type d | while read file; do echo $file; done
However, this doesn't work if the file name contains newlines. The above is the only solution I know of when you actually want to have the directory name in a variable. If you just want to execute some command, use xargs.
find . -type d -print0 | xargs -0 echo 'The directory is: '
Here is a simple solution which handles tabs and/or whitespaces in the filename. If you have to deal with other strange characters in the filename like newlines, pick another answer.
The test directory
ls -F test
Baltimore/ Cherry Hill/ Edison/ New York City/ Philadelphia/ cities.txt
The code to go into the directories
find test -type d | while read f ; do
echo "$f"
done
The filename must be quoted ("$f") if used as argument. Without quotes, the spaces act as argument separator and multiple arguments are given to the invoked command.
And the output:
test/Baltimore
test/Cherry Hill
test/Edison
test/New York City
test/Philadelphia
This is exceedingly tricky in standard Unix, and most solutions run foul of newlines or some other character. However, if you are using the GNU tool set, then you can exploit the find option -print0 and use xargs with the corresponding option -0 (minus-zero). There are two characters that cannot appear in a simple filename; those are slash and NUL '\0'. Obviously, slash appears in pathnames, so the GNU solution of using a NUL '\0' to mark the end of the name is ingenious and fool-proof.
You could change IFS (the internal field separator) temporarily:
OLD_IFS=$IFS # Stores Default IFS
IFS=$'\n' # Set it to line break
for f in `find test/* -type d`; do
echo $f
done
IFS=$OLD_IFS
Why not just put
IFS=$'\n'
in front of the for command? This changes the field separator from <Space><Tab><Newline> to just <Newline>
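The quoting of the IFS value matters here: $'\n' is a single newline character, while '\n' in plain single quotes is the two characters backslash and n. A small sketch that counts the resulting fields shows the difference; count_fields and the sample string are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: count_fields is a hypothetical helper name.
# Prints how many words the string in $2 splits into when IFS is $1.
# The function body is a subshell (parentheses), so the caller's IFS
# and globbing settings stay untouched.
count_fields() (
    IFS=$1
    set -f          # no glob expansion while splitting
    set -- $2       # deliberately unquoted: this is the word splitting
    echo "$#"
)

names=$'Trenton NJ\nNew York City'
count_fields $' \t\n' "$names"   # default IFS: prints 5
count_fields $'\n'    "$names"   # newline only: prints 2
count_fields '\n'     "$names"   # backslash and "n": prints 3, split at each "n" in "Trenton"
```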
find . -print0|while read -d $'\0' file; do echo "$file"; done
I use
SAVEIFS=$IFS
IFS=$(echo -en "\n\b")
for f in $( find "$1" -type d ! -path "$1" )
do
echo $f
done
IFS=$SAVEIFS
Wouldn't that be enough?
Idea taken from http://www.cyberciti.biz/tips/handling-filenames-with-spaces-in-bash.html
Don't store lists as strings; store them as arrays to avoid all this delimiter confusion. Here's an example script that'll either operate on all subdirectories of test, or the list supplied on its command line:
#!/bin/bash
if [ $# -eq 0 ]; then
# if no args supplied, build a list of subdirs of test/
dirlist=() # start with empty list
for f in test/*; do # for each item in test/ ...
if [ -d "$f" ]; then # if it's a subdir...
dirlist=("${dirlist[@]}" "$f") # add it to the list
fi
done
else
# if args were supplied, copy the list of args into dirlist
dirlist=("$@")
fi
# now loop through dirlist, operating on each one
for dir in "${dirlist[@]}"; do
printf "Directory: %s\n" "$dir"
done
Now let's try this out on a test directory with a curve or two thrown in:
$ ls -F test
Baltimore/
Cherry Hill/
Edison/
New York City/
Philadelphia/
this is a dirname with quotes, lfs, escapes: "\''?'?\e\n\d/
this is a file, not a directory
$ ./test.sh
Directory: test/Baltimore
Directory: test/Cherry Hill
Directory: test/Edison
Directory: test/New York City
Directory: test/Philadelphia
Directory: test/this is a dirname with quotes, lfs, escapes: "\''
'
\e\n\d
$ ./test.sh "Cherry Hill" "New York City"
Directory: Cherry Hill
Directory: New York City
ps if it is only about space in the input, then some double quotes worked smoothly for me...
read artist;
find "/mnt/2tb_USB_hard_disc/p_music/$artist" -type f -name '*.mp3' -exec mpg123 '{}' \;
To add to what Jonathan said: use the -print0 option for find in conjunction with xargs as follows:
find test/* -type d -print0 | xargs -0 command
That will execute the command command with the proper arguments; directories with spaces in them will be properly quoted (i.e. they'll be passed in as one argument).
#!/bin/bash
dirtys=()
for folder in *
do
if [ -d "$folder" ]; then
dirtys=("${dirtys[@]}" "$folder")
fi
done
for dir in "${dirtys[@]}"
do
for file in "$dir"/*.mov # <== *.mov
do
#dir_e=`echo "$dir" | sed 's/[[:space:]]/\\\ /g'` -- This line will replace each space into '\ '
out=`echo "$file" | sed 's/\(.*\)\/\(.*\)/\2/'` # These two line code can be written in one line using multiple sed commands.
out=`echo "$out" | sed 's/[[:space:]]/_/g'`
#echo "ffmpeg -i $out_e -sameq -vcodec msmpeg4v2 -acodec pcm_u8 $dir_e/${out/%mov/avi}"
ffmpeg -i "$file" -sameq -vcodec msmpeg4v2 -acodec pcm_u8 "$dir/${out/%mov/avi}"
done
done
The above code converts .mov files to .avi. The .mov files are in different folders, and the folder names contain white space too. The script writes each .avi file into the same folder as its .mov file. I don't know whether it helps you.
Case:
[sony@localhost shell_tutorial]$ ls
Chapter 01 - Introduction Chapter 02 - Your First Shell Script
[sony@localhost shell_tutorial]$ cd Chapter\ 01\ -\ Introduction/
[sony@localhost Chapter 01 - Introduction]$ ls
0101 - About this Course.mov 0102 - Course Structure.mov
[sony@localhost Chapter 01 - Introduction]$ ./above_script
... successfully executed.
[sony@localhost Chapter 01 - Introduction]$ ls
0101_-_About_this_Course.avi 0102_-_Course_Structure.avi
0101 - About this Course.mov 0102 - Course Structure.mov
[sony@localhost Chapter 01 - Introduction]$ CHEERS!
Cheers!
I had to deal with whitespace in pathnames, too. What I finally did was use recursion and for item in /path/*:
function recursedir {
local item
for item in "${1%/}"/*
do
if [ -d "$item" ]
then
recursedir "$item"
else
command
fi
done
}
Convert the file list into a Bash array. This uses Matt McClure's approach for returning an array from a Bash function:
http://notes-matthewlmcclure.blogspot.com/2009/12/return-array-from-bash-function-v-2.html
The result is a way to convert any multi-line input to a Bash array.
#!/bin/bash
# This is the command where we want to convert the output to an array.
# Output is: fileSize fileNameIncludingPath
multiLineCommand="find . -mindepth 1 -printf '%s %p\\n'"
# This eval converts the multi-line output of multiLineCommand to a
# Bash array. To convert stdin, remove: < <(eval "$multiLineCommand" )
eval "declare -a myArray=`( arr=(); while read -r line; do arr[${#arr[@]}]="$line"; done; declare -p arr | sed -e 's/^declare -a arr=//' ) < <(eval "$multiLineCommand" )`"
for f in "${myArray[@]}"
do
echo "Element: $f"
done
This approach appears to work even when bad characters are present, and is a general way to convert any input to a Bash array. The disadvantage is if the input is long you could exceed Bash's command line size limits, or use up large amounts of memory.
Approaches that pipe the list into the loop that works on it have two disadvantages: the loop's stdin is taken by the pipe, so reading anything else from stdin (such as asking the user for input) is not easy, and the loop runs in a new process, so you may be wondering why variables you set inside the loop are not available after the loop finishes.
I also dislike setting IFS; it can mess up other code.
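The variable-scope point can be seen in a few lines. In the sketch below, assuming bash with its default settings (the lastpipe option off), a counter incremented inside a piped loop is lost, while the same loop fed by process substitution keeps it. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: both function names are hypothetical.

# The loop is the last stage of a pipeline, so it runs in a subshell
# and its changes to n are discarded.
count_with_pipe() {
    local n=0 line
    printf '%s\n' "$@" | while IFS= read -r line; do
        n=$((n + 1))
    done
    echo "$n"
}

# The loop runs in the current shell; only printf runs elsewhere.
count_with_procsub() {
    local n=0 line
    while IFS= read -r line; do
        n=$((n + 1))
    done < <(printf '%s\n' "$@")
    echo "$n"
}

count_with_pipe a b c       # prints 0
count_with_procsub a b c    # prints 3
```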
Well, I see too many complicated answers. I don't want to pass the output of find utility or to write a loop , because find has "exec" option for this.
My problem was that I wanted to move all files with dbf extension to the current folder and some of them contained white space.
I tackled it so:
find . -name \*.dbf -print0 -exec mv '{}' . ';'
Looks much simpler to me.
I just found out there are some similarities between my question and yours. Apparently, if you want to pass arguments into commands
test.sh "Cherry Hill" "New York City"
to print them out in order
for SOME_ARG in "$@"
do
echo "$SOME_ARG";
done;
notice the $@ is surrounded by double quotes, some notes here
I needed the same concept to compress sequentially several directories or files from a certain folder. I solved it using awk to parse the list from ls and to avoid the problem of blank spaces in the names.
source="/xxx/xxx"
dest="/yyy/yyy"
n_max=`ls . | wc -l`
echo "Loop over items..."
i=1
while [ $i -le $n_max ];do
item=`ls . | awk 'NR=='$i'' `
echo "File selected for compression: $item"
tar -cvzf $dest/"$item".tar.gz "$item"
i=$(( i + 1 ))
done
echo "Done!!!"
what do you think?
find Downloads -type f | while read file; do printf "%q\n" "$file"; done
For me this works, and it is pretty much "clean":
for f in "$(find ./test -type d)" ; do
echo "$f"
done
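One caveat with quoting the whole command substitution: the quotes turn the entire find output into a single word, so the loop body runs once with every path in $f, which is harmless for echo but not for per-directory work. The sketch below counts iterations to make that visible and contrasts it with a glob; both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: both function names are hypothetical.

# Counts how often the body of the quoted-substitution loop runs.
quoted_loop_count() {
    local n=0 f
    for f in "$(find "$1" -mindepth 1 -type d)"; do
        n=$((n + 1))
    done
    echo "$n"
}

# Counts immediate subdirectories with a glob: one iteration per name,
# however many spaces the names contain.
glob_loop_count() {
    local n=0 d
    for d in "$1"/*/; do
        if [ -d "$d" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

For the test directory from the question, quoted_loop_count test would print 1, while glob_loop_count test would print 5.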
Just had a simple variant problem... Convert files of type .flv to .mp3 (yawn).
find . -name '*.flv' | while IFS= read -r file; do ffmpeg -nostdin -i "$file" -acodec copy "$file.mp3"; done
recursively find all the Macintosh user flash files and turn them into audio (copy, no transcode) ... it's like the while loops above: reading each name with read, instead of iterating with 'for file in', keeps names containing spaces intact.

Resources