Batch to rename files with metadata name - bash

I recently accidentally formatted a 2TB hard drive (Mac OS Journaled)!
I was able to recover the files with Data Rescue 3; the only problem is that the program didn't give me the files as they were, with their original directory tree and names.
For example I had
|-Music
||-Enya
|||-Sonadora.mp3
|||-Now we are free.mp3
|-Documents
||-CV.doc
||-LetterToSomeone.doc
...and so on
And now I got
|-MP3
||-M0001.mp3
||-M0002.mp3
|-DOCUMENTS
||-D0001.doc
||-D0002.doc
So with a huge amount of data it would take me centuries to manually open each file, see what it is, and rename it.
Is there a batch script that can scan all my subfolders and restore the previous names? From the metadata, perhaps?
Or do you know a better tool that will keep the original names and paths of the files? (It doesn't matter if it costs money; there's always a solution for that :P)
Thank you

My contribution, for your music at least...
The idea is to go through all of the MP3 files found and distribute them based on their ID3 tags.
I'd do something like:
find /MP3 -type f -iname "*.mp3" | while read -r i; do
    ARTIST=$(id3v2 -l "$i" | grep TPE1 | cut -d":" -f2 | sed -e 's/^[[:space:]]*//')   # This gets you the Artist
    ALBUM=$(id3v2 -l "$i" | grep TALB | cut -d":" -f2 | sed -e 's/^[[:space:]]*//')    # This gets you the Album title
    TRACK_NUM=$(id3v2 -l "$i" | grep TRCK | cut -d":" -f2 | sed -e 's/^[[:space:]]*//' | cut -d"/" -f1)  # Track position; "2/13" is trimmed to "2" so the slash can't break the path
    TR_TITLE=$(id3v2 -l "$i" | grep TIT2 | cut -d":" -f2 | sed -e 's/^[[:space:]]*//') # Track title
    mkdir -p "/MUSIC/$ARTIST/$ALBUM/"
    cp "$i" "/MUSIC/$ARTIST/$ALBUM/$TRACK_NUM.$TR_TITLE.mp3"
done
Basically:
* It looks for all ".mp3" files in /MP3
* then analyses each file's ID3 tags, parsing them into 4 variables with the "id3v2" tool (you'll need to install it first). The tags are cleaned to keep only the value; sed trims the leading spaces that might pollute them.
* then creates (if needed) a tree in /MUSIC/ with the artist name and album name
* then copies each input file into the new tree, renaming it according to the tags.

Related

How to create argument variable in bash script

I am trying to write a script that identifies the number of characters of the n-th largest file in a sub-directory.
I was trying to pass n and the name of the sub-directory as arguments, like $1 and $2.
Current directory: Greetings
Sub-directories: language_files, others
Sub-directories: English, German, French
Files: Goodmorning.csv, Goodafternoon.csv, Goodevening.csv ...
I would be in the directory "Greetings"; when I indicate a subdirectory (English, German, French), the script should find the n-th largest file in that subdirectory and report its number of characters.
For instance, to get the number of characters of the 2nd largest file in English, I tried:
langs=$1
n=$2
for langs in language_files/;
Do count=$(find language_files/$1 name "*.csv" | wc -m | head -n -1 | sort -n -r | sed -n $2(p))
Done | echo "The file has $count bytes!"
The result I wanted was:
$ ./script1.sh English 2
The file has 1100 bytes!
The main problem behind all of this is that I don't understand how variables and looping work in a bash script.
No need for looping:
find language_files/"$1" -name "*.csv" | xargs wc -m | sort -nr | sed -n "$2{p;q}"
For byte counting you should use -c, since -m is for character counting (they may give the same result for you).
You don't use the loop variable in the script anyway.
Bash loops are interesting. You are encouraged to learn more about them when you have some time. However, this particular problem might not need a loop. Set lang (you can call it langs if you prefer) and n appropriately, and then try this:
count=$(stat -c'%s %n' language_files/"$lang"/* | sort -nr | head -n"$n" | tail -n1 | sed -re 's/^[[:space:]]*([[:digit:]]+).*/\1/')
That should give you the $count you need. Then you can echo it however you like.
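For instance, a minimal wrapper script built around that one-liner (my sketch, assuming the language_files layout from the question and GNU stat, as in the command above) might look like:
#!/bin/bash
# Usage: ./script1.sh English 2
lang=$1   # sub-directory name, e.g. English
n=$2      # rank of the file by size (1 = largest)
# list "size name" for every file, sort by size descending, pick the n-th line, keep only the size
count=$(stat -c'%s %n' language_files/"$lang"/* | sort -nr | head -n"$n" | tail -n1 | sed -re 's/^[[:space:]]*([[:digit:]]+).*/\1/')
echo "The file has $count bytes!"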
EXPLANATION
If you wish to learn how it works:
The stat command outputs various statistics about the named file (or files), in this case %s the file's size and %n the file's name.
The head and tail commands output, respectively, the first and last several lines of their input. Together, they select a specific line.
The sed command extracts a certain part of that line. (You can use cut instead, if you prefer.)
If you wish to be cleverer, then you can optimize as @karafka has done.

find - grep taking too much time

First of all, I'm a newbie at bash scripting, so forgive me if I'm making easy mistakes.
Here's my problem. I needed to download my company's website. I accomplished this using wget with no problems, but because some files have the ? symbol in their names and Windows doesn't like filenames with ?, I had to create a script that renames those files and also updates the source code of every file that references a renamed file.
To accomplish this I use the following code:
find . -type f -name '*\?*' | while read -r file ; do
SUBSTRING=$(echo $file | rev | cut -d/ -f1 | rev)
NEWSTRING=$(echo $SUBSTRING | sed 's/?/-/g')
mv "$file" "${file//\?/-}"
grep -rl "$SUBSTRING" * | xargs sed -i '' "s/$SUBSTRING/$NEWSTRING/g"
done
This has 2 problems:
It is taking way too long; I've waited more than 5 hours and it is still running.
It looks like it keeps appending to the source code, because when I stop the script and check the changes, the URL is repeated about 4 times (or more).
Thanks all for your comments. I will try the two separate steps and see. Also, just as an FYI, there are 3291 files that were downloaded with wget. Do you still think bash scripting is preferable to other tools for this?
It seems odd that a file would have ? in its name. Website URLs use ? to indicate the passing of parameters. wget from a website also doesn't guarantee you're getting the site's actual files, especially if server-side execution takes place, as with PHP files. So I suspect that as wget does its recursive fetch, it finds URLs passing parameters and thus creates such filenames for you.
To really get the site, you should have direct access to the files.
If I were you, I'd start over and not use wget.
You may also be having issues with files or directories with spaces in their names.
Instead of that line with xargs: you're already handling one file at a time, but then grepping through everything recursively. Just run the sed on the renamed file itself, as sketched below.
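For example, a hedged sketch of that change inside the existing loop (this assumes the references you need to rewrite live in the renamed file itself) could be:
find . -type f -name '*\?*' | while read -r file ; do
    SUBSTRING=$(echo "$file" | rev | cut -d/ -f1 | rev)
    NEWSTRING=$(echo "$SUBSTRING" | sed 's/?/-/g')
    newfile="${file//\?/-}"
    mv "$file" "$newfile"
    # rewrite references only in the file we just renamed, instead of grepping the whole tree
    sed -i '' "s/$SUBSTRING/$NEWSTRING/g" "$newfile"
done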
Ok, here's the idea (untested):
In the first loop, just move the files and compose a global sed replacement file.
Once that is done, scan all the files and apply sed with all the patterns at once, thus saving a lot of the read/write operations that are likely the cause of the performance issue here.
I would avoid putting the script itself in the current directory, or it will be processed by sed too, so I assume that all the files to be processed are not in the current directory but in a data directory.
code:
sedfile=/tmp/tmp.sed
data=data
rm -f $sedfile
# locate ourselves in the subdir to preserve the naming logic
cd $data
# rename the files and compose the big sedfile
find . -type f -name '*\?*' | while read -r file ; do
SUBSTRING=$(echo $file | rev | cut -d/ -f1 | rev)
NEWSTRING=$(echo $SUBSTRING | sed 's/?/-/g')
mv "$file" "${file//\?/-}"
echo "s/$SUBSTRING/$NEWSTRING/g" >> $sedfile
done
# now apply the big sedfile once on all the files:
# if you need to go recursive:
find . -type f | xargs sed -i -f $sedfile
# if you don't:
sed -i -f $sedfile *
Instead of using grep, you can use the find command or ls command to list the files and then operate directly on them.
For example, you could do:
ls -1 /path/to/files/* | xargs sed -i '' "s/$SUBSTRING/$NEWSTRING/g"
Here's where I got the idea, from another question where grep took too long:
Linux - How to find files changed in last 12 hours without find command

only copying files with unique content

I am trying to filter through data, and would like to copy only the files in which each group is represented exactly once. For example, a file might look like:
sample_AAAAA_9824_r1
GGAAGCATCGTGGGAACTGCTTCACTAAGAAGGAAGTCACAGTTACTTCATAGATATCCATCACTAAAYGTGAGTAGATTGTGTTAATGTGTTATATATGACTGAAAAATTTTGCCTGGATCAGAATACGAAACCTTCTTGAGATATTGTAATGAATTTCAGTCATATGAGAAGTGATGGAGGGGGTGTGAATACATATACTGTGTCATTATCCATGCAGTATkATACTRCAAAGTTC-----
sample_AACCC_12358_r1
GGAAGCATCGTGGGAACTGCTTCACTAAGAAGGAAGTCACAGTTACTTCATAGATATCCATCACTAAATGTGAGTAGATTGTGTTAATGTGTTATATATGACTGAAAAWTTTTGCCTGGATCAGAATACGAAACCTTCTTGAGATATTGTAATGAATTTCAGTCATATGAGAAGTGATGGAGGGGGTGTGAATACATATACTGTGTCATTATCCATGCAGTATTATACTGCAAAGTTC-----
sample_AATTT_3905_r1
GGAAGCATCGTGGGAACTGCTTCACTAAGAAGGAAGTCACAGTTACTTCATAGATATCCATCACTAAATGTGAGTAGATTGTGTTAATGTGTTATATATGACTGAAAAATTTTGCCTGGATCAGAATACGAAACCTTCTTGAGATATTTTCAGTCATATGAGAATTGATGGAGGGGGTGTGAATACATATACTGTGTCATTATCCATGCAGTATGATACTACAAAGTTCCTTCCCATA-----
sample_ACGTA_178_r1
GGAAGCATCGTAGGAACTGCTTCACTAAGAAGGAAGTCACAGTTACTTCATAGATATCCATCACTAAATGTGAGTAGATTGTGTTAATGTGTTATATATGACTGAAAATTTTTGCCTGGATCAGAATACGAAACCTTCTTGAGATATTGTAATGAATTTCAGTCATATGAGAAGCGATGGAGGGGGTGTGAATACATATACTGTGTCATTATCCATGCAGTATGATACTACAAAGTTC-----
sample_ACTGC_9933_r1
GGAAGCATCGTRGGAACTGCTTCACTAAGAAGGAAGTCACAGTTACTTCATAGATATCCATCACTAAATGTGAGTAGATTGTGTTAATGTGTTATATATGACTGAAAAwTTTTGCCTGGATCAGAATACGAAACCTTCTTGAGATATTGTAATGAATTTCAGTCATATGAGAAGYGATGGAGGGGGTGTGAATACATATACTGTGTCATTATCCATGCAGTATGATACTACAAAGTTC-----
I have about 36000 of these files, and would like to copy only those which have only one entry per sample (one sample is, for example, ACTGC) to a different folder. There are 26 sample "codes", each consisting of 5 letters (e.g. AAAAA, AATTT, ACGTC, ...); the number that follows and the "r1" are irrelevant.
I have been looking through different bash scripts for this, but cannot find exactly what I need. I can count the occurrences of each sample in a file, but that is probably not the way to go...
Any help is greatly appreciated,
Yannick
You can use a loop and cmp to compare the sorted list of sample codes against the output of sort | uniq; if the two are identical, no sample code appears more than once in that file:
for f in files/*
do
    if cmp -s <(grep sample "${f}" | cut -d'_' -f2 | sort) <(grep sample "${f}" | cut -d'_' -f2 | sort | uniq)
    then
        echo "copying file ${f} here..."
        # ... copy
    else
        echo "not copying file ${f} here" # do nothing...!
    fi
done
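An equivalent check (my own variation, not part of the answer above; the destination folder is hypothetical) is to ask uniq -d for duplicated sample codes and copy only when it prints nothing:
for f in files/*
do
    dups=$(grep sample "$f" | cut -d'_' -f2 | sort | uniq -d)
    if [ -z "$dups" ]; then
        cp "$f" /path/to/unique_only/   # hypothetical destination folder
    fi
done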

is it possible to view a recursive tree of files in a folder of a certain file type?

I have a series of folders and files and I would like to copy a listing of the folder and file structure, including the files each folder contains. Is there a way to do this on a Mac, in the Terminal or otherwise?
This looks like a good option: http://www.cyberciti.biz/faq/linux-show-directory-structure-command-line/ but I can't see whether it supports filtering by file type.
Updated Answer
I just came across the package called tree within Homebrew. It is rather nice and has many options for output. If you have Homebrew, you just run
brew install tree
Then you can type tree --help to see how it works. Recommended!
tree v1.7.0 (c) 1996 - 2014 by Steve Baker, Thomas Moore, Francesc Rocher
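Since the question asks about a certain file type: tree can restrict its listing to files matching a wildcard pattern with -P, and --prune drops directories that end up empty, so (as a sketch, untested on your setup) something like this may do what you want:
tree -P '*.doc' --prune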
Original Answer
You can start a Terminal and run a command like this:
ls -R | grep ":" | sed -e 's/://' -e 's/[^-][^\/]*\//--/g' -e 's/^/ /' -e 's/-/|/'
which gives output like this:
|-CocoaDialog.app
|---Contents
|-----MacOS
|-----Resources
|-------Inputbox.nib
|-------MainMenu.nib
|-------Msgbox.nib
|-------PopUpButton.nib
|-------Progressbar.nib
|-------SecureInputbox.nib
|-------Textbox.nib
|-OpenTerminalHere.app
If you want to copy this, simply add "| pbcopy" to the command above and everything it outputs will be saved in your Clipboard and you can then paste into Email, MS-Word documents or wherever you like.
ls -R | grep ":" | sed -e 's/://' -e 's/[^-][^\/]*\//--/g' -e 's/^/ /' -e 's/-/|/' | pbcopy
Or you may be happier with something simpler, like this:
find `pwd`
/Users/mark/bin
/Users/mark/bin/.DS_Store
/Users/mark/bin/a
/Users/mark/bin/AirPortWirelessPower
/Users/mark/bin/analyze.awk
/Users/mark/bin/analyze_fs
/Users/mark/bin/apachestart
/Users/mark/bin/atime
Or you can specify file names to "find" like this:
find `pwd` -name "*.doc"
/Users/mark/Documents/Correspondence/Anderson 0001.doc
/Users/mark/Documents/Correspondence/Anderson 0002.doc
/Users/mark/Documents/Correspondence/Anderson 0003.doc
You can add "| pbcopy" to all of these to copy the output to the Clipboard.

Scripting: get number of root files in RAR archive

I'm trying to write a bash script that determines whether a RAR archive has more than one root file.
The unrar command provides the following type of output if I run it with the v option:
[...#... dir]$ unrar v my_archive.rar
UNRAR 4.20 freeware Copyright (c) 1993-2012 Alexander Roshal
Archive my_archive.rar
Pathname/Comment
Size Packed Ratio Date Time Attr CRC Meth Ver
-------------------------------------------------------------------------------
file1.foo
2208411 2037283 92% 08-08-08 08:08 .....A. 00000000 m3g 2.9
file2.bar
103 103 100% 08-08-08 08:08 .....A. 00000000 m0g 2.9
baz/file3.qux
9911403 9003011 90% 08-08-08 08:08 .....A. 00000000 m3g 2.9
-------------------------------------------------------------------------------
3 12119917 11040397 91%
and since RAR is proprietary I'm guessing this output is as close as I'll get.
If I can get just the file list part (the lines between ------), and then perhaps filter out all even lines or lines beginning with multiple spaces, then I could do num_root_files=$(list of files | cut -d'/' -f1 | uniq | wc -l) and see whether [ $num_root_files -gt 1 ].
How do I do this? Or is there a saner approach?
I have searched for and found ways to grep text between two words, but then I'd have to include those "words" in the command, and doing that with entire lines of dashes is just too ugly. I haven't been able to find any solutions for "grep text between lines beginning with".
What I need this for is to decide whether to create a new directory or not before extracting RAR archives.
The unrar program does provide the x option to extract with full path and e for extracting everything to the current path, but I don't see how that could be useful in this case.
SOLUTION using the accepted answer:
num_root_files=$(unrar v "$file" | sed -n '/^----/,/^----/{/^----/!p}' | grep -v '^ ' | cut -d'/' -f1 | uniq | wc -l)
which seems to be the same as the shorter:
num_root_files=$(unrar v "$file" | sed -n '/^----/,/^----/{/^----/!p}' | grep -v '^ ' | grep -c '^ *[^/]*$')
OR using 7z as mentioned in a comment below:
num_root_files=$(7z l -slt "$file" | grep -c 'Path = [^/]*$')
# check if value is gt 2 rather than gt 1 - the archive itself is also listed
Oh no... I didn't have a man page for unrar, so I looked one up online, and it apparently lacked some options that I have just discovered via unrar --help. Here's the real solution:
unrar vb "$file" | grep -c '^[^/]*$'
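Putting it together, a sketch of the final extraction decision (deriving the directory name from the archive name is my own assumption, not part of the original script) could look like:
file="my_archive.rar"
# count entries that contain no "/", i.e. entries sitting at the archive root
num_root_files=$(unrar vb "$file" | grep -c '^[^/]*$')
if [ "$num_root_files" -gt 1 ]; then
    dir="${file%.rar}"       # e.g. my_archive
    mkdir -p "$dir"
    unrar x "$file" "$dir/"  # extract with full paths into the new directory
else
    unrar x "$file"          # only one root entry: extract in place
fi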
I haven't been able to find any solutions for "grep text between lines beginning with".
In order to get the lines between ----, you can say:
unrar v my_archive.rar | sed -n '/^----/,/^----/{/^----/!p}'
