bash for semantic file structure creation

Update 2010-11-02 7p: Shortened description; posted initial bash solution.
I'd like to create a semantic file structure to better organize my data. I don't want to go a route like recoll, strigi, or beagle; I want no gui and full control. The closest might be oyepa or even closer, Tagsistant.
Here's the idea: one maintains a "regular" tree of their files. For example, mine are organized in project folders like this:
| ~/proj1
| ---- ../proj1_file1[tag1-tag2].ext
| ---- ../proj1_file2[tag3]_yyyy-mm-dd.ext
| ~/proj2
| ---- ../proj2_file3[tag2-tag4].ext
| ---- ../proj1_file4[tag1].ext
proj1, proj2 are very short abbreviations I have for my projects.
Then what I want to do is recursively go through the directory and get the following:
proj ID
tags
extension
Together, these will form a complete "tag list" for each file.
Then in a user-defined directory, a "semantic hierarchy" will be created based on these tags. This gets a bit long, so just take a look at the directory structure created for all files containing tag2 in the name:
| ~/tag2
| --- ../proj1_file1[tag1-tag2].ext -> ~/proj1/proj1_file1[tag1-tag2].ext
| --- ../proj2_file3[tag2-tag4].ext -> ~/proj2/proj2_file3[tag2-tag4].ext
| ---../tag1
| ------- ../proj1_file1[tag1-tag2].ext -> ~/proj1/proj1_file1[tag1-tag2].ext
| --- ../tag4
| ------- ../proj2_file3[tag2-tag4].ext -> ~/proj2/proj2_file3[tag2-tag4].ext
| --- ../proj1
| ------- ../proj1_file1[tag1-tag2].ext -> ~/proj1/proj1_file1[tag1-tag2].ext
| --- ../proj2
| ------- ../proj2_file3[tag2-tag4].ext -> ~/proj2/proj2_file3[tag2-tag4].ext
In other words, directories are created with all combinations of a file's tags, and each contains a symlink to the actual files having those tags. I have omitted the file type directories, but these would also exist. It looks really messy in type, but I think the effect would be very cool. One could then find a given file along a number of "tag bread crumbs."
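As a rough sketch of the "all combinations" idea (not part of any working script yet): the hypothetical tag_paths function below prints every directory path that can be built from a file's tag list, using each tag at most once per path. It assumes tags contain no spaces and are not repeated, and it relies on local, which bash provides.

```shell
# Hypothetical helper: print every ordered path built from the given tags.
# Usage: tag_paths "" tag1 tag2 ...   (the first argument is the path so far)
# Assumes tags have no spaces and no duplicates.
tag_paths() {
    local prefix="$1" tag other rest
    shift
    for tag in "$@"; do
        # Emit the path that ends at this tag.
        printf '%s\n' "${prefix:+$prefix/}$tag"
        # Collect the tags not used yet on this path...
        rest=
        for other in "$@"; do
            [ "$other" = "$tag" ] || rest="$rest $other"
        done
        # ...and extend the path with each of them in turn.
        tag_paths "${prefix:+$prefix/}$tag" $rest
    done
}
```

For example, tag_paths "" proj1 tag1 tag2 prints 15 paths (proj1, proj1/tag1, proj1/tag1/tag2, and so on), each of which could be handed to mkdir -p under the semantic directory.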
My thoughts so far:
ls -R in a top directory to get all the file names
identify those files with a [ and ] in the filename (tagged files)
with what's left, enter a loop:
strip out the proj ID, tags, and extension
create all the necessary dirs based on the tags
create symlinks to the file in all of the dirs created
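The "strip out the proj ID, tags, and extension" step could be sketched with plain parameter expansion. This is only a sketch: parse_tagged_name is a hypothetical helper, and it assumes the project ID is everything before the first underscore and that tags sit between the first [ and the following ], separated by hyphens.

```shell
# Hypothetical helper: split "proj1_file1[tag1-tag2].ext" into its tag list.
# Prints one item per line: project ID, each tag, then the extension.
# Returns 1 for names that carry no [tags].
parse_tagged_name() {
    fname=$1
    case $fname in
        *\[*\]*) ;;          # tagged file: carry on
        *) return 1 ;;       # untagged file: nothing to do
    esac
    proj=${fname%%_*}        # text before the first "_"
    tags=${fname#*\[}        # drop everything up to the first "["
    tags=${tags%%\]*}        # drop everything from the first "]"
    ext=${fname##*.}         # text after the last "."
    printf '%s\n' "$proj"
    printf '%s\n' "$tags" | tr '-' '\n'
    printf '%s\n' "$ext"
}
```

Calling parse_tagged_name 'proj1_file1[tag1-tag2].ext' prints proj1, tag1, tag2 and ext on separate lines; a date suffix after the closing bracket is ignored.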
First Solution! 2010-11-3 7p
Here's my current working code. It only works on files in the top level directory, does not figure out extension types yet, and only works on 2 tags + the project ID for a total of 3 tags per file. It is a hacked manual chug solution but maybe it would help someone see what I'm doing and how this could be muuuuch better:
#### User Variables ####
## set top directory for the semantic filer
## example: ~/semantic
## result will be ~/semantic/tag1, ~/semantic/tag2, etc.
top_dir=~/semantic
## set document extensions, space separated
## example: "doc odt txt"
doc_ext="doc odt txt"
## set presentation extensions, space separated
pres_ext="ppt odp pptx"
## set image extensions, space separated
img_ext="jpg png gif"
#### End User Variables ####
#### Begin Script ####
cd "$top_dir"
ls -1 | (while read fname;
do
    ## only tagged files, i.e. names containing a [
    if [[ $fname == *\[* ]]
    then
        tag_names=$( echo $fname | sed -e 's/-/ /g' -e 's/_.*\[/ /' -e 's/\].*$//' )
        num_tags=$(echo $tag_names | wc -w)
        current_tags=( `echo $tag_names | sed -e 's/ /\n/g'` )
        echo ${current_tags[0]}
        echo ${current_tags[1]}
        echo ${current_tags[2]}
        case $num_tags in
        3)
            mkdir -p ./${current_tags[0]}/${current_tags[1]}/${current_tags[2]}
            mkdir -p ./${current_tags[0]}/${current_tags[2]}/${current_tags[1]}
            mkdir -p ./${current_tags[1]}/${current_tags[0]}/${current_tags[2]}
            mkdir -p ./${current_tags[1]}/${current_tags[2]}/${current_tags[0]}
            mkdir -p ./${current_tags[2]}/${current_tags[0]}/${current_tags[1]}
            mkdir -p ./${current_tags[2]}/${current_tags[1]}/${current_tags[0]}
            cd $top_dir/${current_tags[0]}
            echo $PWD
            ln -s $top_dir/$fname
            ln -s $top_dir/$fname ./${current_tags[1]}/$fname
            ln -s $top_dir/$fname ./${current_tags[2]}/$fname
            cd $top_dir/${current_tags[1]}
            echo $PWD
            ln -s $top_dir/$fname
            ln -s $top_dir/$fname ./${current_tags[0]}/$fname
            ln -s $top_dir/$fname ./${current_tags[2]}/$fname
            cd $top_dir/${current_tags[2]}
            echo $PWD
            ln -s $top_dir/$fname
            ln -s $top_dir/$fname ./${current_tags[0]}/$fname
            ln -s $top_dir/$fname ./${current_tags[1]}/$fname
            cd $top_dir
            ;;
        esac
    fi
done)
It's actually pretty neat. If you want to try it, do this:
create a dir somewhere
use touch to create a bunch of files with the format above: proj_name[tag1-tag2].ext
define the top_dir variable
run the script
play around!
To do:
make this work using an "ls -R" in order to get into sub-dirs in my actual tree
robustness check
consider switching languages; hey, I've always wanted to learn perl and/or python!
Still open to any suggestions you have. Thanks!

Hmm, big problem, too big to do on a short break...
But I can give you an example of one of the various ways you could structure the script...
ls -1 / | (while read fname; do
    echo "$fname"
    # example transformation...
    test2=`echo $fname | tr a-z A-Z`
    echo "$test2"
done
echo post-loop processing here, $test2
# then finally close the subshell with a right paren
)

Maybe something like this for each tag?
find . -type f|grep -Z "[[-]$tag[]-]"| \
xargs -0 -I %%% ln -s "../../%%%" "tagfolder/$tag/"
Note: The second line doesn't really work, don't know why.
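One possible reason, as far as I can tell: in GNU grep, -Z only changes the byte printed after file names (as with -l), not after matching lines, so xargs -0 receives one newline-separated blob. Here is a sketch that sidesteps grep and xargs by letting find do the matching. link_tag is a hypothetical helper; it assumes tags contain no glob characters and that the tag directory lives outside or alongside the source tree.

```shell
# Hypothetical helper: link_tag TAG SRC_DIR DEST_DIR
# Links every regular file under SRC_DIR whose name contains TAG,
# delimited by "[", "]" or "-", into DEST_DIR/TAG/.
link_tag() {
    tag=$1
    dest=$3/$tag
    # Resolve the source tree to an absolute path so the links stay valid
    # wherever the tag directory ends up.
    src=$(CDPATH= cd -- "$2" && pwd) || return 1
    mkdir -p "$dest" || return 1
    # find hands each path to the inline script as a single argument,
    # so names with spaces survive intact.
    find "$src" -type f -name "*[[-]$tag[]-]*" -exec sh -c '
        dest=$1; shift
        for f in "$@"; do
            ln -sf "$f" "$dest/"
        done' sh "$dest" {} +
}
```

Called once per tag, for instance link_tag tag2 ~/proj1 ~/semantic, this fills ~/semantic/tag2 with symlinks; the nested tag/tag directories would still need a second pass.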


Change date modified of multiple folders to match that of their most recently modified file

I've been using the following bash script as an app that I can drop a folder on; it updates the folder's modification date to match that of the most recently modified file in that folder.
for f in each "$@"
do
echo "$f"
done
$HOME/setMod "$@"
This gets the folder name, and then passes it to this setMod script in my home folder.
# Check that exactly one parameter has been specified - the directory
if [ $# -eq 1 ]; then
    # Go to that directory or give up and die
    cd "$1" || exit 1
    # Get name of newest file
    newest=$(stat -f "%m:%N" * | sort -rn | head -1 | cut -f2 -d:)
    # Set modification date of folder to match
    touch -r "$newest" .
fi
However, if I drop more than one folder on it at a time, it won't work, and I can't figure out how to make it work with multiple folders at once.
Also, I learned from Apple Support that the reason so many of my folders keep getting the mod date updated is due to some Time Machine-related process, despite the fact I haven't touched some of them in years. If anyone knows of a way to prevent this from happening, or to somehow automatically periodically update the date modified of folders to match the date/time of the most-recently-modified file in them, that would save me from having to run this step manually pretty regularly.
The setMod script currently accepts only one parameter.
You could either make it accept many parameters and loop over them,
or you could make the calling script use a loop.
I'll take the second option, because the caller script has some mistakes and weak points. Here it is, corrected and extended for your purpose:
for dir; do
    echo "$dir"
    "$HOME"/setMod "$dir"
done
Or to make setMod accept multiple parameters:
setMod() {
    cd "$1" || return 1
    # Get name of newest file
    newest=$(stat -f "%m:%N" * | sort -rn | head -1 | cut -f2 -d:)
    # Set modification date of folder to match
    touch -r "$newest" .
}
for dir; do
    if [ ! -d "$dir" ]; then
        echo not a directory, skipping: $dir
        continue
    fi
    (setMod "$dir")
done
for dir; do is equivalent to for dir in "$@"; do
The parentheses around (setMod "$dir") make it run in a sub-shell, so the script itself doesn't change its working directory; the effect of the cd operation is limited to the sub-shell within (...).
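Note that stat -f "%m:%N" is the BSD/macOS form of stat. As a sketch of a variant that avoids stat altogether, the newest file can be picked with the shell's -nt test (a widely supported extension, available in bash). set_mod_all is a hypothetical name, and unlike the stat version it only considers regular files, not sub-folders.

```shell
# Sketch: set each given directory's modification time to that of its
# newest regular file. Works on any number of directories and never
# changes the working directory, so no sub-shell is needed.
set_mod_all() {
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "not a directory, skipping: $dir" >&2
            continue
        fi
        newest=
        for f in "$dir"/*; do
            [ -f "$f" ] || continue      # ignore sub-folders and unmatched globs
            if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
                newest=$f
            fi
        done
        # An empty folder has no file to copy the timestamp from.
        if [ -n "$newest" ]; then
            touch -r "$newest" "$dir"
        fi
    done
}
```

Dropping several folders on the app would then amount to set_mod_all "$@".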

Readlink - How to crop full path?

I use readlink to find a file's full path:
cek=$(readlink -f "$1")
mkdir -p "$ydk$cek"
mv "$1" "$ydk/$cek/$ydkfile"
But readlink -f "$1" gives me the full path. How can I crop the full path?
For example:
But I need just
How can I do it?
Judging from multiple comments:
The output should be the last four directory components of the full path returned by readlink.
the output should be:
(Don't build any assumption about today's date into the path trimming code.)
If you need the last four directory components of the full path, and if you don't have newlines in the full path, and if you have GNU grep or BSD (Mac OS X) grep with support for -o (output only the matched material) then this gives the required result:
$ cek="/home/test/test/2014/10/13/log.file"
$ echo "${cek%/*}"
$ echo "${cek%/*}" | grep -o -E -e '(/[^/]+){4}$'
$ full_path=/home/some/where/hidden/test/2014/08/29/sparefile.log
$ echo "${full_path%/*}" | grep -o -E -e '(/[^/]+){4}$'
I need path starting /201[0-9]:
/home/bla/bla2/bla3/2014/01/13/13… ⟶ /2014/01/13/13….
So, you need to use grep -o again, starting with the year pattern:
echo "${fullpath%/*}" | grep -o -e '/201[0-9]/.*$'
This is much simpler; you don't even need extended regular expressions for this!
If you need the path element before the year too, then you need:
echo "${fullpath%/*}" | grep -o -e '/[^/][^/]*/201[0-9]/.*$'
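If you would rather not spawn grep at all, the year-anchored trimming can also be sketched with parameter expansion alone. from_year_dir is a hypothetical helper; like the grep version, it assumes the year directory is followed by at least one more path component.

```shell
# Hypothetical helper: print the directory part of a path, starting at the
# first component that looks like a year from 2010 to 2019.
# Returns 1 when no such component exists.
from_year_dir() {
    dir=${1%/*}                      # drop the file name
    prefix=${dir%%/201[0-9]/*}       # everything before the first /201x/
    [ "$prefix" != "$dir" ] || return 1
    printf '%s\n' "${dir#"$prefix"}"
}
```

For instance, from_year_dir /home/test/test/2014/10/13/log.file prints /2014/10/13.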
Do you really need to remove "/home" ?
dir=$(dirname "$cek")
echo "${dir#/home}"
Just last 4 directory components:
last4dirs() {
    local IFS=/
    local -a components=($1)
    local l=${#components[@]}
    echo "${components[*]:l-5:4}"
}
last4dirs /home/some/where/hidden/test/2014/08/29/sparefile.log

Custom unix command combination assigning to variable

I want to write a UNIX script that automatically moves the files in my working directory into newly created directories.
Example: in your directory you have these files:
Two of the files will be moved to ./NewDir/001-file and the other two to ./NewDir/002-file.
My problem is that after I get the correct string from the Unix commands, I cannot assign it to a variable.
Here is my code:
echo "Starting script"
echo "Dir = "$(pwd)
read -p "Please enter count(max '999') of different file groups:" max_i
read -p "Enter new dir name:" outer_dir_name
for ((i=0; i<=$max_i;i++)) do
inner_dir_name=$((ls *[$a1][$a2][$a3]* 2>/dev/null | head -n 1 | cut -f1 -d"."))
echo $inner_dir_name
echo "--------------"
done
One pair of round parentheses is enough for command substitution.
inner_dir_name=$(ls *[$a1][$a2][$a3]* 2>/dev/null | head -n 1 | cut -f1 -d".")
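To make the difference concrete, here is a tiny sketch; the file names are made up for illustration. A single pair of parentheses captures a command's output, while a double pair evaluates integer arithmetic.

```shell
# Command substitution: run a pipeline and capture its output as text.
first_prefix=$(printf '%s\n' 001-file.txt 001-file.pdf | head -n 1 | cut -f1 -d".")

# Arithmetic expansion: evaluate an integer expression.
next_group=$(( 1 + 1 ))

printf '%s %s\n' "$first_prefix" "$next_group"
```

This prints 001-file 2.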
It looks like you're going about the operation the hard way. I would probably do something like this, assuming that there are no spaces in the file names:
ls | sed 's/\..*$//' | sort -u |
while read prefix
do
    mkdir -p $outer_dir_name/$prefix
    mv $prefix.* $outer_dir_name/$prefix
done
The ls could be made more precise with:
ls [0-9][0-9][0-9]-file.*
If I was worried about blanks and other odd-ball characters in the file names, I'd have to use something more careful:
for file in [0-9][0-9][0-9]-file.*
do
    prefix=${file%%.*}
    [ -d "$outer_dir_name/$prefix" ] || mkdir -p "$outer_dir_name/$prefix"
    mv "$file" "$outer_dir_name/$prefix"
done
This executes more mv commands, in general.

grep spacing error

Hi guys, I have a problem with grep. I don't know if there is another way to search in a shell script.
I'm trying to back up a folder, AhmetsFiles, which is stored on my flash disk, but at the same time I have to group the files by their extensions and save them into an [extensionName] folder.
An example: /media/FlashDisk/AhmetsFiles/lecture.pdf must be stored in /home/$(whoami)/Desktop/backups/pdf
The problem is that I can't copy a file whose name contains spaces (lecture 2.pptx).
After this introduction, here is my code.
exec 3<&0
exec 0< $filename
mkdir "/home/$(whoami)/Desktop/backups"
while read extension
do
    cd "/home/$(whoami)/Desktop/backups"
    rm -rf "$extension"
    mkdir "$extension"
    cd "/media/FlashDisk/AhmetsFiles"
    files=( `ls | grep -i "$extension"` )
    fCount=( `ls | grep -c -i "$extension"` )
    for (( i=0 ; $i<$fCount ; i++ ))
    do
        cp -f "/media/FlashDisk/AhmetsFiles/${files[$i]}" "/home/$(whoami)/Desktop/backups/$extension"
    done
    let count++
done
exec 0<&3
exit 0
Your looping is way more complicated than it needs to be, no need for either ls or grep or the files and fCount variables:
for file in *.$extension
do
    cp -f "/media/FlashDisk/AhmetsFiles/$file" "$HOME/Desktop/backups/$extension"
done
This works correctly with spaces.
I'm assuming that you actually wanted to interpret $extension as a file extension, not some random string in the middle of the filename like your original code does.
Why don't you
ls | grep -i "$extension" | while IFS=: read x ; do
    cp ..
done
Also, I believe you may prefer something like grep -i "\.$extension\$" instead (escape the dot and anchor the match to the end of the line).
On the other hand, the simplest way is probably
cp -f /media/FlashDisk/AhmetsFiles/*.$extension "$HOME/Desktop/backups/$extension/"
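One thing to watch with the glob approach: when no file has a given extension, the pattern is passed to cp unexpanded and cp complains. A sketch that guards against that and handles several extensions in one call follows. backup_by_ext is a hypothetical helper; unlike the original script it matches case-sensitively and does not wipe existing backup folders first.

```shell
# Hypothetical helper: backup_by_ext SRC DEST EXT...
# Copies every SRC/*.EXT file into DEST/EXT/ for each extension given.
# Names with spaces are safe because glob results are never re-split.
backup_by_ext() {
    src=$1
    dest=$2
    shift 2
    for extension in "$@"; do
        mkdir -p "$dest/$extension" || return 1
        for file in "$src"/*."$extension"; do
            # With no match the pattern stays literal; skip that case.
            [ -f "$file" ] || continue
            cp -f "$file" "$dest/$extension/"
        done
    done
}
```

With the paths from the question, this would be called as backup_by_ext /media/FlashDisk/AhmetsFiles "$HOME/Desktop/backups" pdf pptx.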

Need help writing bash script to move folders around

What I need to do is replace the folder amtlib.framework in each Adobe app on my Mac.
If I do:
cd /Applications; ls | grep Adobe
this gives me all the folders I need.
here's some pseudo code:
apps = ls | grep Adobe
for each x in apps
if (x/ //if this folder exists
add .bak extension //amtlib.framework.bak
copy ~/Downloads/.../amtlib.framwork to x/
How would I implement this as a bash script?
Something like
for x in $( ls | grep Adobe) ; do
    if [[ -d "${x}"/"${x}".app/contents/frameworks/amtlib.framwork ]] ; then
        # add .bak extension # //amtlib.framework.bak
        #? mkdir "${x}"/"${x}".app/contents/frameworks/amtlib.framwork.bak
        #? /bin/mv "${x}"/"${x}".app/contents/frameworks/amtlib.framwork "${x}"/"${x}".app/contents/frameworks/amtlib.framwork.bak
        /bin/cp -R ~/Downloads/.../amtlib.framwork "${x}"/"${x}".app/contents/frameworks/
    else
        : # ??? what do you want to do if there's not
    fi
done # loop
If you're likely to have spaces in your dirnames (not sure if OSX supports -print0), try
find . -name '*Adobe*' -print0 \
| while IFS= read -r -d '' x ; do
if ....
As an FYI, assignments in bash are done like (without spaces around the =):
apps=$(ls | grep Adobe)
Depending on the situation, you'll then want to use "$apps", which keeps the whole list as a single string, or just plain $apps, which leaves each word in the list as a separate token. (If there are spaces in a filename or path, 1 path/file becomes 2 words, which will cause issues.) There is also array notation, apps=( $(ls | grep Adobe) ), with which you can use ${#apps[@]} (number of elements), ${apps[@]} (all elements), and ${apps[0]} (first element; bash arrays are indexed from 0).
Also, it's not clear what you intend by "add .bak extension". My best guess is my 2nd option, /bin/mv ... .bak.
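As a sketch of the array notation in practice: filling the array from a glob, rather than from ls | grep, keeps each name as one element even when it contains spaces. list_matching is a hypothetical helper, and this one needs bash because of the arrays.

```shell
# Hypothetical helper: list_matching DIR PATTERN
# Collects DIR/*PATTERN* into an array and reports what it found.
list_matching() {
    local dir=$1 pattern=$2 app
    local -a apps=( "$dir"/*"$pattern"* )
    # An unmatched glob is left as the literal pattern; treat that as empty.
    [ -e "${apps[0]}" ] || apps=()
    echo "count: ${#apps[@]}"
    # Quoting "${apps[@]}" yields one word per element, spaces included.
    for app in "${apps[@]}"; do
        echo "item: ${app##*/}"
    done
}
```

For the question at hand, list_matching /Applications Adobe would print the number of matching folders followed by one item line per folder.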
First of all, there's a typo in the original post that has made its way through the examples given. The folder you are looking to rename/replace is amtlib.framework, not framwork.
Second, for some reason, the test for the existence of the .bak directory is not working for me; it doesn't work even when I split it out into a separate if-then statement:
cd /Applications
for x in *Adobe* ; do
    printf "%s \n" "$x"
    printf "%s" " "
    if [ -d "$x/$" ]; then
        printf "removing old bak... "
    fi
    if [ -d "$x/$" ]; then
        printf "moving... "
        printf "copying... "
        printf "%s\n" "done!"
    else
        printf "%s\n" "nothing to do here!"
    fi
done
cd ~
Finally, as I understand the goal, this will fail to update a couple of apps that have an additional folder level (e.g., Acrobat Pro and Illustrator).
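One way around the extra folder level might be to let find locate the framework at whatever depth it sits. The following is only a sketch: replace_frameworks is a hypothetical helper, it deletes any previous .bak copy, and it assumes the replacement copy lives outside the tree being searched. Try it on a scratch directory first.

```shell
# Hypothetical helper: replace_frameworks ROOT NEW_COPY
# For every directory named amtlib.framework under ROOT, keep the current
# one as amtlib.framework.bak and put a copy of NEW_COPY in its place.
replace_frameworks() {
    root=$1
    new_copy=$2
    # -prune stops find from descending into the frameworks it reports.
    find "$root" -type d -name 'amtlib.framework' -prune -exec sh -c '
        new_copy=$1; shift
        for old in "$@"; do
            rm -rf "$old.bak"            # drop any previous backup
            mv "$old" "$old.bak"         # keep the current copy as .bak
            cp -R "$new_copy" "$old"     # put the replacement in place
        done' sh "$new_copy" {} +
}
```

Because the search is by name rather than by a fixed path, apps with and without the additional folder level are handled the same way.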
