c shell script: find directory and rename the output of find - shell

I am still new to shell scripting. I have been given a task that I am having difficulty with.
I have a lot of directories whose names are based on ddmmyyyy (for example 03Mar2014):
03Mar2014 08Aug2013 11Jan2015 16Jan2014 22Feb2014 26Mar2014
03Nov2013 08Jan2014 11Jul2013 16Jul2013 22Jul2013 26Oct2014
03Oct2013 08Jan2015 11Nov2014 16May2014 22Mar2014 26Sep2013
The task is to rename each directory to mmyyyy (for example Mar2014).
So far, my code is
foreach file(`find . -type d | awk -F/ 'NF == 3'`)
echo $file
set newmove = `echo $file | cut -c 1-2,5-`
echo $newmove
mv $file $newmove
output:
for find . -type d | awk -F/ 'NF == 3':
./24Jan2015/W51A
`echo $file | cut -c 1-2,5-`
./Jan2015/W51A
mv $file $newmove
mv: cannot rename ./24Jan2015/W51A to ./Jan2015/W51A: No such file or directory
but the script didnt work.
Do you guys have any idea how to do this?

First of all, the issue is that you're renaming the subdirectory, when what you actually want to rename is its parent directory. The mv fails because the new parent (./Jan2015) does not exist yet.
As I understand it, the idea is to rename the folders in the current working directory to the desired format and, by doing so, merge the contents of folders that end up with the same mmYYYY name (for example, 08Jan2014 and 16Jan2014 will both be renamed to Jan2014).
You can try this:
foreach dir ( `ls` )
set newdir = `echo $dir| cut -c 3-`
mkdir -p $newdir
mv -f $dir/* $newdir
rmdir $dir
end
-p will create the folder and will do nothing if the folder already exists.
Some assumptions:
1. The folders are all in the same place, which is pwd
2. There are always two leading digits in the date
3. You want to merge the contents of folders with the same mmYYYY format
4. Files with the same name in different folders will be overwritten
5. There are only folders in pwd (add a check if that's not the case)
In case these folders are in different places (which is not the case according to your output: ./24Jan2015 ) and collision is not an issue - the code should be changed to :
Use find
Create the new folder with the correct path
No merge and overwrite will occur, so 1,3,4 and 5 are not relevant.
UPDATE:
After additional input: if I understand correctly, your find is looking only for folders whose path has three /-separated fields (for example ./24Jan2015/W51A). I can't say why, but with GNU find you can achieve the same much faster with
find . -mindepth 2 -maxdepth 2 -type d
The output is the list of subfolders that sit directly inside the date folders.
Then you need to get the second folder's name (assuming it will always be of the expected format).
set olddir = `echo $file| cut -f 1-2 -d '/'`
set newdir = `echo $olddir | cut -c 1-2,5-`
and finally
foreach file (`find . -mindepth 2 -maxdepth 2 -type d`)
set olddir = `echo $file| cut -f 1-2 -d '/'`
set newdir = `echo $olddir | cut -c 1-2,5-`
mkdir -p $newdir
mv -f $file $newdir
end
This will also handle the case if two folders were found under the same path.
UPDATE 2:
Since the script will run on Unix - the following updates should be made:
The original find is restored, since Unix find lacks the mindepth/maxdepth options
We should try to remove olddir to clean up the empty folders. This will fail if the folder is not empty, but the script will continue to run
foreach file(`find . -type d | awk -F/ 'NF == 3'`)
set olddir = `echo $file| cut -f 1-2 -d '/'`
set newdir = `echo $olddir | cut -c 1-2,5-`
set dir_name=`basename "$file"`
if ( -d "$newdir/$dir_name" ) then
mv -f $file/* $newdir/$dir_name/
else
mkdir -p $newdir
mv -f $file $newdir
endif
rmdir $olddir
end
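For readers who can use bash instead of csh, the merge logic above might be sketched as follows. This is only a sketch under assumptions: merge_date_dirs is a hypothetical name, the date folders sit directly under the directory passed as the first argument, and an entry that already exists in the target is left where it is instead of being overwritten (unlike the mv -f above).

```shell
#!/bin/bash
# Hypothetical helper: merge every ddMonYYYY folder under "$1" into MonYYYY.
merge_date_dirs() {
    local base=$1 dir name new entry
    for dir in "$base"/[0-9][0-9][A-Z][a-z][a-z][0-9][0-9][0-9][0-9]; do
        [ -d "$dir" ] || continue            # skips non-directories and an unmatched glob
        name=${dir##*/}                      # e.g. 08Jan2014
        new=$base/${name:2}                  # drop the two day digits -> .../Jan2014
        mkdir -p "$new"
        for entry in "$dir"/*; do
            [ -e "$entry" ] || continue      # empty source folder
            if [ -e "$new/${entry##*/}" ]; then
                # Assumption: never overwrite, leave the colliding entry in place
                echo "skipping $entry: already present in $new"
            else
                mv -- "$entry" "$new"/
            fi
        done
        # rmdir only succeeds when everything was moved out
        rmdir "$dir" 2>/dev/null || echo "kept $name: not empty after merge"
    done
}
```

Calling merge_date_dirs . would process the current directory; anything reported as kept still needs a manual decision.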

I really think c shell is the wrong tool for just about anything that involves programming. That said, this looks like it would do what you want with only a little help from an external tool:
#!/bin/csh
foreach file ([0-9][0-9][A-Z][a-z][a-z][0-9][0-9][0-9][0-9])
set new = `echo $file:q | cut -c 3-`
if ( -d "$new" ) then
echo "skipping $file because $new already exists"
continue
endif
mv -v "$file" "$new"
end
Note the glob that matches your list of directories to rename. This script isn't bothering to confirm whether the matched files ARE in fact directories, so if there's the possibility they might not be, you should account for that somehow.
Note that we are using a back-quoted expression to use an external tool, cut to grab a substring from each directory name. We use this (as you did in your question) because CSH IS NOT A PROGRAMMING LANGUAGE, and has no string processing capabilities of its own.
The if statement within the loop will skip any directory whose target already exists. So for example, if you were to go through the top row of your input in your question, 26Mar2014 would be converted to Mar2014, which already exists due to 03Mar2014. Since you haven't specified how this should be handled, this script skips that condition.
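If bash is an option, the same skip-on-collision rename can be sketched without any external tool, because bash can take the substring itself. The function name rename_date_dirs is made up for this illustration, and the sketch assumes the date folders sit directly under the directory given as the first argument.

```shell
#!/bin/bash
# Hypothetical helper: rename ddMonYYYY directories under "$1" to MonYYYY,
# skipping any directory whose target name already exists.
rename_date_dirs() {
    local base=$1 dir name new
    for dir in "$base"/[0-9][0-9][A-Z][a-z][a-z][0-9][0-9][0-9][0-9]; do
        [ -d "$dir" ] || continue        # unlike the csh version, check it is a directory
        name=${dir##*/}                  # e.g. 03Mar2014
        new=${name:2}                    # substring from offset 2 -> Mar2014
        if [ -e "$base/$new" ]; then
            echo "skipping $name because $new already exists"
            continue
        fi
        mv -- "$dir" "$base/$new"
    done
}
```

As in the csh script, the first folder that maps to a given name wins and later ones are reported and left untouched.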

Related

Automator/Apple Script: Move files with same prefix on a new folder. The folder name must be the files prefix

I'm a photographer and I have multiple jpg files of clothings in one folder. The files name structure is:
TYPE_FABRIC_COLOR (Example: BU23W02CA_CNU_RED, BU23W02CA_CNU_BLUE, BU23W23MG_LINO_WHITE)
I have to move files of same TYPE (BU23W02CA) on one folder named as TYPE.
For example:
MAIN FOLDER>
BU23W02CA_CNU_RED.jpg, BU23W02CA_CNU_BLUE.jpg, BU23W23MG_LINO_WHITE.jpg
Became:
MAIN FOLDER>
BU23W02CA_CNU > BU23W02CA_CNU_RED.jpg, BU23W02CA_CNU_BLUE.jpg
BU23W23MG_LINO > BU23W23MG_LINO_WHITE.jpg
Here are some scripts.
V1
#!/bin/bash
find . -maxdepth 1 -type f -name "*.jpg" -print0 | while IFS= read -r -d '' file
do
# Extract the directory name
dirname=$(echo "$file" | cut -d'_' -f1-2 | sed 's#\./\(.*\)#\1#')
#DEBUG echo "$file --> $dirname"
# Create it if not already existing
if [[ ! -d "$dirname" ]]
then
mkdir "$dirname"
fi
# Move the file into it
mv "$file" "$dirname"
done
it assumes all files that the find lists are of the format you described in your question, i.e. TYPE_FABRIC_COLOR.ext.
dirname is the extraction of the first two words delimited by _ in the file name.
since find lists the files with a ./ prefix, it is removed from the dirname as well (that is what the sed command does).
the find specifies the name of the files to consider as *.jpg. You can change this to something else, if you want to restrict which files are considered in the move.
this version loops through each file, creates a directory named after its first two sections (if it does not exist already), and moves the file into it.
if you want to see what the script is doing to each file, you can add option -v to the mv command. I used it to debug.
However, since it loops through each file one by one, this might take time with a large number of files, hence this next version.
V2
#!/bin/bash
while IFS= read -r dirname
do
echo ">$dirname"
# Create it if not already existing
if [[ ! -d "$dirname" ]]
then
mkdir "$dirname"
fi
# Move the file into it
find . -maxdepth 1 -type f -name "${dirname}_*" -exec mv {} "$dirname" \;
done < <(find . -maxdepth 1 -type f -name "*.jpg" -print | sed 's#^\./\(.*\)_\(.*\)_.*\..*$#\1_\2#' | sort | uniq)
this version loops on the directory names instead of on each file.
the last line does the "magic". It finds all files, and extracts the first two words (with sed) right away. Then these words are sorted and "uniqued".
the while loop then creates each directory one by one.
the find inside the while loop moves all files that match the directory being processed into it. Why did I not simply do mv ${dirname}_* ${dirname}? Since the expansion of the * wildcard could result in a too long arguments list for the mv command. Doing it with the find ensures that it will work even on LARGE number of files.
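As a possible variation on V1, the directory name can be derived with bash parameter expansion alone, which avoids starting cut and sed once per file. This is a sketch, not a replacement for the scripts above: group_by_prefix is a hypothetical name, and it assumes every file follows the TYPE_FABRIC_COLOR.jpg pattern and lives directly in the directory passed as the first argument.

```shell
#!/bin/bash
# Hypothetical helper: move each TYPE_FABRIC_COLOR.jpg under "$1"
# into a TYPE_FABRIC subfolder of "$1".
group_by_prefix() {
    local dir=$1 file name type rest fabric prefix
    for file in "$dir"/*_*_*.jpg; do
        [ -f "$file" ] || continue       # also covers the "no match" case
        name=${file##*/}                 # BU23W02CA_CNU_RED.jpg
        type=${name%%_*}                 # BU23W02CA
        rest=${name#*_}                  # CNU_RED.jpg
        fabric=${rest%%_*}               # CNU
        prefix=${type}_${fabric}         # BU23W02CA_CNU (same as cut -d_ -f1-2)
        mkdir -p "$dir/$prefix"
        mv -- "$file" "$dir/$prefix/"
    done
}
```

A for loop over a glob does not build an argument list for an external command, so the "too long arguments list" concern mentioned for mv ${dirname}_* does not apply here.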
Suggesting oneliner awk script:
echo "$(ls -1 *.jpg)"| awk '{system("mkdir -p "$1 OFS $2);system("mv "$0" "$1 OFS $2)}' FS=_ OFS=_
Explanation:
echo "$(ls -1 *.jpg)": List all jpg files in current directory one file per line
FS=_ : Set awk field separator to _ $1=type $2=fabric $3=color.jpg
OFS=_ : Set awk output field separator to _
awk script explanation
{ # for each file name from list
system ("mkdir -p "$1 OFS $2); # execute "mkdir -p type_fabric"
system ("mv " $0 " " $1 OFS $2); # execute "mv current-file to type_fabric"
}

Shell Script: How to copy files with specific string from big corpus

I have a small bug and don't know how to solve it. I want to copy files from a big folder with many files, where the files contain a specific string. For this I use grep, ack or (in this example) ag. When I'm inside the folder it matches without problem, but when I want to do it with a loop over the files in the following script it doesn't loop over the matches. Here my script:
ag -l "${SEARCH_QUERY}" "${INPUT_DIR}" | while read -d $'\0' file; do
echo "$file"
cp "${file}" "${OUTPUT_DIR}/${file}"
done
SEARCH_QUERY holds the String I want to find inside the files, INPUT_DIR is the folder where the files are located, OUTPUT_DIR is the folder where the found files should be copied to. Is there something wrong with the while do?
EDIT:
Thanks for the suggestions! I took this one now, because it also looks for files in subfolders and saves a list with all the files.
ag -l "${SEARCH_QUERY}" "${INPUT_DIR}" > "output_list.txt"
while read file
do
echo "${file##*/}"
cp "${file}" "${OUTPUT_DIR}/${file##*/}"
done < "output_list.txt"
Better implement it like below with a find command:
find "${INPUT_DIR}" -name "*.*" | xargs grep -l "${SEARCH_QUERY}" > /tmp/file_list.txt
while read file
do
echo "$file"
cp "${file}" "${OUTPUT_DIR}/${file}"
done < /tmp/file_list.txt
rm /tmp/file_list.txt
or another option:
grep -l "${SEARCH_QUERY}" "${INPUT_DIR}"/*.* > /tmp/file_list.txt
while read file
do
echo "$file"
cp "${file}" "${OUTPUT_DIR}/${file}"
done < /tmp/file_list.txt
rm /tmp/file_list.txt
if you do not mind doing it in just one line, then
grep -lr 'ONE\|TWO\|THREE' | xargs -I xxx -P 0 cp xxx dist/
guide:
-l just print file name and nothing else
-r search recursively the CWD and all sub-directories
'ONE\|TWO\|THREE' matches any one of these words: 'ONE' or 'TWO' or 'THREE'
| pipe the output of grep to xargs
-I xxx name of the files is saved in xxx it is just an alias
-P 0 run all the command (= cp) in parallel (= as fast as possible)
cp each file xxx to the dist directory
If i understand the behavior of ag correctly, then you have to
adjust the read delimiter to '\n' or
use ag -0 -l to force delimiting by '\0'
to solve the problem in your loop.
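The second option (NUL-delimited file names) might look like the sketch below. It uses grep instead of ag so that it does not depend on ag being installed, and it assumes GNU grep, whose -Z option ends each file name with a NUL byte. The function name copy_matches is hypothetical, and each match is copied under its base name, as in the question's edit.

```shell
#!/bin/bash
# Hypothetical helper: copy every file under in_dir whose contents match
# query into out_dir, keeping only the file's base name.
copy_matches() {
    local query=$1 in_dir=$2 out_dir=$3 file
    mkdir -p "$out_dir"
    # -r recurse, -l list matching files only, -Z terminate names with NUL;
    # read -d '' then splits on NUL, so spaces and newlines in names are safe
    while IFS= read -r -d '' file; do
        cp -- "$file" "$out_dir/${file##*/}"
    done < <(grep -rlZ -- "$query" "$in_dir")
}
```

Files with the same base name in different subfolders would overwrite each other in out_dir, so this only fits a corpus where base names are unique.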
Alternatively, you can use the following script, which is based on find instead of ag. Note that it matches SEARCH_QUERY against the file names, not the file contents.
while read file; do
echo "$file"
cp "$file" "$OUTPUT_DIR/$file"
done < <(find "$INPUT_DIR" -name "*$SEARCH_QUERY*" -print)

Faster way to list files with similar names (using bash)?

I have a directory with more than 20K files all with a random number prefix (eg 12345--name.jpg). I want to find files with similar names and remove all but one. I don't care which one because they are duplicates.
To find duplicated names I've use
find . -type f \( -name "*.jpg" \) | | sed -e 's/^[0-9]*--//g' | sort | uniq -d
as the list of a for/next loop.
To find all but one to delete, I'm currently using
rm $(ls -1 *name.jpg | tail -n +2)
This operation is pretty slow. I want to speed this up. Any suggestions?
I would do it like this.
*Note that you are dealing with the rm command, so make sure that you have a backup of the existing directory in case something goes south.
1. Create a backup directory and back up the existing files. Once done, check that all the files are there.
mkdir bkp_dir; cp *.jpg bkp_dir/
2. Create another temporary directory that will hold only one file for each similar name, so all unique file names will be here.
$ mkdir tmp
$ for i in $(ls -1 *.jpg|sed 's/^[[:digit:]].*--\(.*\.jpg\)/\1/'|sort|uniq);do cp $(ls -1|grep "$i"|head -1) tmp/ ;done
*An explanation of the command is given at the end. Once executed, check in the tmp directory that you got one instance of each file.
3. Remove all *.jpg files from the main directory. Again, please verify that all files have been backed up before executing the rm command.
rm *.jpg
4. Copy the unique instances back from the temporary directory.
cp tmp/*.jpg .
Explanation of command in step 2.
Command to get unique file names for step 2 will be
for i in $(ls -1 *.jpg|sed 's/^[[:digit:]].*--\(.*\.jpg\)/\1/'|sort|uniq);do cp $(ls -1|grep "$i"|head -1) tmp/ ;done
$(ls -1 *.jpg|sed 's/^[[:digit:]].*--\(.*\.jpg\)/\1/'|sort|uniq) will get the unique file names like file1.jpg , file2.jpg
for i in $(...);do cp $(ls -1|grep "$i"|head -1) tmp/ ;done will copy one file for each filename to tmp/ directory.
You should not be using ls in scripts and there is no reason to use a separate file list like in userunknown's reply.
keepone () {
shift
rm "$@"
}
keepone *name.jpg
If you are running find to identify the files you want to isolate anyway, traversing the directory twice is inefficient. Filter the output from find directly.
find . -type f -name "*.jpg" |
awk '{ f=$0; sub(/^.*\/[0-9]*--/, "", f); if (a[f]++) print }' |
xargs echo rm
Take out the echo if the results look like what you expect.
As an aside, the /g flag to sed is useless for a regex which can only match once. The flag says to replace all occurrences on a line instead of the first occurrence on a line, but if there can be only one, the first is equivalent to all.
Assuming no subdirectories and no whitespace-in-filenames involved:
find . -type f -name "*.jpg" | sed -e 's/^\.\/[0-9]*--//' | sort | uniq -d > namelist
removebutone () { shift; echo rm "$@"; }; cat namelist | while read n; do removebutone *--"$n"; done
or, more readable:
removebutone () {
shift
echo rm "$@"
}
cat namelist | while read n; do removebutone *--"$n"; done
Shift takes the first parameter off $*, so every match except the first one is passed to rm. The glob must stay outside the quotes, otherwise it is not expanded.
Note that the parens around the name parameter are superfluous, and that there shouldn't be two pipes before sed. Maybe you had something else there, which needed to be covered.
If it looks promising, you have, of course, to remove the 'echo' in front of 'rm'.
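Another way to avoid both ls and a second directory scan is to track the names already seen in a bash associative array. The following is a sketch only: list_extra_copies is a hypothetical name, it needs bash 4 or later for associative arrays, and it assumes all files sit directly in the directory given as the first argument and are named NUMBER--name.jpg.

```shell
#!/bin/bash
# Hypothetical helper: print every file under "$1" whose name (after the
# "NUMBER--" prefix) has already been seen, i.e. all copies but the first.
list_extra_copies() {
    local dir=$1 file name key
    local -A seen=()
    for file in "$dir"/*--*.jpg; do
        [ -f "$file" ] || continue
        name=${file##*/}                 # 12345--name.jpg
        key=${name#*--}                  # name.jpg
        if [[ -n ${seen[$key]:-} ]]; then
            printf '%s\n' "$file"        # a duplicate: candidate for rm
        else
            seen[$key]=1                 # first occurrence: keep it
        fi
    done
}
```

The function only prints the candidates; review the list before feeding it to rm.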

Delete all files in a directory matching a time pattern

I back up an important folder every day using cron. Each backup is stored under a name that contains the current date.
Now I need to keep only the backups from the current day and the previous two days.
i.e I want to keep only:
test_2016-11-04.tgz
test_2016-11-03.tgz
test_2016-11-02.tgz
The remaining backups have to be deleted automatically. Please let me know how to do this in a shell script.
Below is my backup folder structure.
test_2016-10-30.tgz test_2016-11-01.tgz test_2016-11-03.tgz
test_2016-10-31.tgz test_2016-11-02.tgz test_2016-11-04.tgz
With ls -lrt | head -n -3 | awk '{print $9}'
you can print all but the last 3 files in your directory.
Passing this output to rm gives the desired result.
You could append this to the end of the backup script:
find ./backupFolder -name "test_*.tgz" -mtime +3 -type f -delete
You could also use this:
ls -1 test_*.tgz | sort -r | awk 'NR > 3 { print }' | xargs -d '\n' rm -f --
Generate an array of the files you want to keep:
names=()
for d in {0..2}; do
names+=( "test_"$(date -d"$d days ago" "+%Y-%m-%d")".tgz" )
done
so that it looks like this:
$ printf "%s\n" "${names[@]}"
test_2016-11-04.tgz
test_2016-11-03.tgz
test_2016-11-02.tgz
Then, loop through the files and remove those that are not in the array:
for file in test_*.tgz; do
[[ ! ${names[*]} =~ "$file" ]] && echo "remove $file" || echo "keep $file"
done
If run in your directory, this would produce output like:
remove test_2016-10-30.tgz
remove test_2016-10-31.tgz
remove test_2016-11-01.tgz
keep test_2016-11-02.tgz
keep test_2016-11-03.tgz
keep test_2016-11-04.tgz
So now it is just a matter of replacing those echo commands with something more meaningful, like rm.
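A name-based variant is also possible, because file names with a YYYY-MM-DD date sort chronologically. The sketch below prints every backup except the newest ones; old_backups is a hypothetical name, and the test_*.tgz pattern and the directory argument are taken from the question as assumptions.

```shell
#!/bin/bash
# Hypothetical helper: print all test_*.tgz files under "$1" except the
# newest "$2" ones, relying on the date in the name for ordering.
old_backups() {
    local dir=$1 keep=$2 i count
    local files=("$dir"/test_*.tgz)      # glob results are sorted, oldest first
    [ -e "${files[0]}" ] || return 0     # no backups at all
    count=${#files[@]}
    for (( i = 0; i < count - keep; i++ )); do
        printf '%s\n' "${files[i]}"
    done
}
```

Unlike the date-based array above, this keeps the three newest backups even if a day was missed; whether that is preferable depends on the retention policy.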

How can I manipulate file names using bash and sed?

I am trying to loop through all the files in a directory.
I want to do some stuff on each file (convert it to xml, not included in example), then write the file to a new directory structure.
for file in `find /home/devel/stuff/static/ -iname "*.pdf"`;
do
echo $file;
sed -e 's/static/changethis/' $file > newfile +".xml";
echo $newfile;
done
I want the results to be:
$file => /home/devel/stuff/static/2002/hello.txt
$newfile => /home/devel/stuff/changethis/2002/hello.txt.xml
How do I have to change my sed line?
If you need to rename multiple files, I would suggest to use rename command:
# remove "-n" after you verify it is what you need
rename -n 's/hello/hi/g' $(find /home/devel/stuff/static/ -type f)
or, if you don't have rename try this:
find /home/devel/stuff/static/ -type f | while read FILE
do
# modify line below to do what you need, then remove leading "echo"
echo mv $FILE $(echo $FILE | sed 's/hello/hi/g')
done
Are you trying to change the filename? Then
for file in /home/devel/stuff/static/*/*.txt
do
echo "Moving $file"
mv "$file" "${file/static/changethis}.xml"
done
Please make sure /home/devel/stuff/static/*/*.txt is what you want before using the script.
First, you have to create the name of the new file based on the name of the initial file. The obvious solution is:
newfile=${file/static/changethis}.xml
Second you have to make sure that the new directory exists or create it if not:
mkdir -p $(dirname $newfile)
Then you can do something with your file:
doSomething < $file > $newfile
I wouldn't use the for loop because of the possibility of overloading your command line. Command lines have a limited length, and if you overload it, it'll simply drop off the excess without giving you any warning. It might work if your find returns 100 files. It might work if it returns 1000 files, but it might fail if it returns more, and you'll never know.
The best way to handle this is to pipe the find into a while read statement, as glenn jackman does.
The sed command only works on STDIN and on files, but not on file names, so if you want to munge your file name, you'll have to do something like this:
newname="$(echo "$oldname" | sed 's/old/new/')"
to get the new name of the file. The $() construct executes the command and substitutes whatever the command writes to STDOUT.
So, your script will look something like this:
find /home/devel/stuff/static/ -name "*.pdf" | while read file
do
echo $file;
newfile="$(echo $file | sed -e 's/static/changethis/')"
newfile="$newfile.xml"
echo $newfile;
done
Now, since you're renaming the file directory, you'll have to make sure the directory exists before you do your move or copy:
find /home/devel/stuff/static/ -name "*.pdf" | while read file
do
echo $file;
newfile="$(echo $file | sed -e 's/static/changethis/')"
newfile="$newfile.xml"
echo $newfile;
#Check for directory and create it if it doesn't exist
dirname=$(dirname "$newfile")
if [ ! -d "$dirname" ]
then
mkdir -p "$dirname"
fi
#Directory now exists, so you can do the move
mv "$file" "$newfile"
done
Note the quotation marks to handle the case there's a space in the file name.
By the way, instead of doing this:
if [ ! -d "$dirname" ]
then
mkdir -p "$dirname"
fi
You can do this:
[ -d "$dirname" ] || mkdir -p "$dirname"
The || means to execute the following command only if the test isn't true. Thus, if [ -d "$dirname" ] is a false statement (the directory doesn't exist), you run mkdir.
It's a fairly common shortcut when you see shell scripts.
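Putting the pieces together, a NUL-safe version of the loop might look like the following sketch. The names target_name and convert_tree are made up for this illustration, cp stands in for the real conversion step, and the sketch assumes that the first occurrence of "static" in the path is the one to replace.

```shell
#!/bin/bash
# Hypothetical helper: map .../static/... to .../changethis/....xml
target_name() {
    local file=$1
    printf '%s\n' "${file/static/changethis}.xml"   # replaces the first "static" only
}

# Hypothetical helper: process every *.pdf below "$1" into the new tree.
convert_tree() {
    local src=$1 file newfile
    # -print0 with read -d '' keeps names containing spaces or newlines intact
    find "$src" -type f -iname '*.pdf' -print0 |
    while IFS= read -r -d '' file; do
        newfile=$(target_name "$file")
        mkdir -p "$(dirname "$newfile")"            # make sure the new directory exists
        cp -- "$file" "$newfile"                    # placeholder for the real conversion
    done
}
```

For example, target_name /home/devel/stuff/static/2002/hello.txt prints /home/devel/stuff/changethis/2002/hello.txt.xml, which matches the result asked for in the question.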
find ... | while read file; do
newfile=$(basename "$file").xml;
do something to "$file" > "$somedir/$newfile"
done
OUTPUT="$(pwd)";
for file in `find . -iname "*.pdf"`;
do
echo $file;
cp $file $file.xml
echo "file created in directory = {$OUTPUT}"
done
This will create a new file with name whatyourfilename.xml, for hello.pdf the new file created would be hello.pdf.xml, basically it creates a new file with .xml appended at the end.
Note that the script above searches the current directory (find .) for files whose names match the pattern given to find (in this case *.pdf), and creates each copy next to the original file.
The find command in this particular script only finds files with names ending in .pdf. If you want to search /home/devel/stuff/static/ for files with names ending in .txt instead, change the find command to find /home/devel/stuff/static/ -iname "*.txt".
Once I wanted to remove trailing -min from my files. i.e. wanted alg-min.jpg to turn into alg.jpg. so after some struggle, managed to figure something like this:
for f in *; do echo $f; mv $f $(echo $f | sed 's/-min//g');done;
Hope this helps someone who wants to REMOVE or SUBSTITUTE some part of their file names.

Resources