Scripting: get number of root files in RAR archive - bash

I'm trying to write a bash script that determines whether a RAR archive has more than one root file.
The unrar command provides the following type of output if I run it with the v option:
[...#... dir]$ unrar v my_archive.rar
UNRAR 4.20 freeware Copyright (c) 1993-2012 Alexander Roshal
Archive my_archive.rar
Pathname/Comment
Size Packed Ratio Date Time Attr CRC Meth Ver
-------------------------------------------------------------------------------
file1.foo
          2208411   2037283  92% 08-08-08 08:08 .....A.  00000000 m3g 2.9
file2.bar
              103       103 100% 08-08-08 08:08 .....A.  00000000 m0g 2.9
baz/file3.qux
          9911403   9003011  90% 08-08-08 08:08 .....A.  00000000 m3g 2.9
-------------------------------------------------------------------------------
3 12119917 11040397 91%
and since RAR is proprietary I'm guessing this output is as close as I'll get.
If I can get just the file list part (the lines between ------), and then perhaps filter out all even lines or lines beginning with multiple spaces, then I could do num_root_files=$(list of files | cut -d'/' -f1 | uniq | wc -l) and see whether [ $num_root_files -gt 1 ].
How do I do this? Or is there a saner approach?
I have searched for and found ways to grep text between two words, but then I'd have to include those "words" in the command, and doing that with entire lines of dashes is just too ugly. I haven't been able to find any solutions for "grep text between lines beginning with".
What I need this for is to decide whether to create a new directory or not before extracting RAR archives.
The unrar program does provide the x option to extract with full path and e for extracting everything to the current path, but I don't see how that could be useful in this case.
SOLUTION using the accepted answer:
num_root_files=$(unrar v "$file" | sed -n '/^----/,/^----/{/^----/!p}' | grep -v '^ ' | cut -d'/' -f1 | uniq | wc -l)
which seems to be the same as the shorter:
num_root_files=$(unrar v "$file" | sed -n '/^----/,/^----/{/^----/!p}' | grep -v '^ ' | grep -c '^ *[^/]*$')
OR using 7z as mentioned in a comment below:
num_root_files=$(7z l -slt "$file" | grep -c 'Path = [^/]*$')
# check if value is gt 2 rather than gt 1 - the archive itself is also listed
Oh no... I didn't have a man page for unrar, so I looked one up online, and it apparently lacked some options that I've just discovered with unrar --help. Here's the real solution:
unrar vb "$file" | grep -c '^[^/]*$'

I haven't been able to find any solutions for "grep text between lines beginning with".
In order to get the lines between ----, you can say:
unrar v my_archive.rar | sed -n '/^----/,/^----/{/^----/!p}'
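To close the loop on the original goal (create a directory only when the archive has more than one root entry), here is a rough sketch of a wrapper built on the vb-based count; the function name and the ${file%.rar} directory naming are my own assumptions, not part of the answer:

# Sketch: extract "$file", creating a wrapper directory only when the
# archive has more than one root entry (per the unrar vb count above).
extract_rar() {
    local file=$1
    local num_root_files
    num_root_files=$(unrar vb "$file" | grep -c '^[^/]*$')
    if [ "$num_root_files" -gt 1 ]; then
        local dir=${file%.rar}              # e.g. my_archive.rar -> my_archive/
        mkdir -p "$dir" && unrar x "$file" "$dir/"
    else
        unrar x "$file"                     # single root entry: extract as-is
    fi
}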

Related

How to create argument variable in bash script

I am trying to write a script such that I can identify the number of characters of the n-th largest file in a sub-directory.
I was trying to pass n and the name of the sub-directory as arguments, like $1 and $2.
Current directory: Greetings
Sub-directory: language_files, others
Sub-directory: English, German, French
Files: Goodmorning.csv, Goodafternoon.csv, Goodevening.csv ….
I would be at the directory "Greetings"; when I indicate a subdirectory (English, German, French), it should show the n-th largest file in that subdirectory and calculate its number of characters as well.
For instance, if I am trying to figure out the number of characters of the 2nd largest file in English, I did:
langs=$1
n=$2
for langs in language_files/;
Do count=$(find language_files/$1 name "*.csv" | wc -m | head -n -1 | sort -n -r | sed -n $2(p))
Done | echo "The file has $count bytes!"
The result I wanted was:
$ ./script1.sh English 2
The file has 1100 bytes!
The main problem with all of this is that I don't understand how variables and looping work in a bash script.
No need for looping:
find language_files/"$1" -name "*.csv" | xargs wc -m | sort -nr | sed -n "$2{p;q}"
For byte counting you should use -c, since -m is for character counting (it may be the same for you).
You don't use the loop variable in the script anyway.
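For context, a minimal script1.sh built around that one-liner might look like this (the argument-count check is my own addition):

#!/bin/bash
# Usage: ./script1.sh English 2
# $1 = language subdirectory, $2 = n (pick the n-th largest .csv file)
if [ "$#" -ne 2 ]; then
    echo "Usage: $0 <language> <n>" >&2
    exit 1
fi
find language_files/"$1" -name "*.csv" | xargs wc -m | sort -nr | sed -n "$2{p;q}"

One caveat I'd add: when more than one file matches, wc also prints a final total line, which sorts to the top; filtering it out first (e.g. with grep -v ' total$' before the sort) keeps the n-th line aligned with the n-th largest file.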
Bash loops are interesting. You are encouraged to learn more about them when you have some time. However, this particular problem might not need a loop. Set lang (you can call it langs if you prefer) and n appropriately, and then try this:
count=$(stat -c'%s %n' language_files/$lang/* | sort -nr | head -n$n | tail -n1 | sed -re 's/^[[:space:]]*([[:digit:]]+).*/\1/')
That should give you the $count you need. Then you can echo it however you like.
EXPLANATION
If you wish to learn how it works:
The stat command outputs various statistics about the named file (or files), in this case %s the file's size and %n the file's name.
The head and tail commands output, respectively, the first and last several lines of their input. Together, they select a specific line.
The sed command extracts just the size part of that line. (You can use cut instead, if you prefer.)
If you wish to be cleverer, then you can optimize as @karafka has done.
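For example, the cut variant mentioned above could look like this (same pipeline, only the last stage changes):

# Same idea, extracting the size field with cut instead of sed;
# stat prints "size name", so the first space-delimited field is the size.
count=$(stat -c'%s %n' language_files/$lang/* | sort -nr | head -n$n | tail -n1 | cut -d' ' -f1)
echo "The file has $count bytes!"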

How to extract "Create Date" in a faster way than with "identify"

I have done a short and ugly script to create a list of photos and the datetime when each was taken.
identify -verbose *.JPG | grep "Image:\|CreateDate:" | sed ':a;N;$!ba;s/JPG\n/JPG/g' | sed 's[^ ]* \([^ ]*\)[^0-9]*\(.*\)$/\1 \2/'
The output looks like
photo1.JPG 2018-11-28T16:11:44.06
photo2.JPG 2018-11-28T16:11:48.32
photo3.JPG 2018-11-28T16:13:23.01
It works pretty well, but my last folder had 3000 images and the script ran for a few hours before completing the task. This is mostly because identify is very slow. Does anyone have an alternative method? Preferably (but not exclusively) using native tools, because it's a server and it is not so easy to convince the admin to install new tools.
Lose the grep and sed and such and use -format. This took about 10 seconds for 500 JPGs:
$ for i in *jpg ; do identify -format '%f %[date:create]\n' "$i" ; done
Output:
image1.jpg 2018-01-19T04:53:59+02:00
image2.jpg 2018-01-19T04:53:59+02:00
...
If you want to modify the output, put the command after the done to avoid forking a process after each image, like:
$ for i in *jpg ; do identify -format '%f %[date:create]\n' "$i" ; done | awk '{gsub(/+.*/,"",$NF)}1'
image1.jpg 2018-01-19T04:53:59
image2.jpg 2018-01-19T04:53:59
...
Native tools? identify is the best tool for this job ("native" - I would call ImageMagick a native tool). I don't think you'll find a faster method. Run it for the 3000 images in parallel and you will get something like an n-times speedup.
find . -maxdepth 1 -name '*.JPG' |
xargs -n1 -P0 -- sh -c "
identify -verbose \"\$1\" |
grep 'Image:\|CreateDate:' |
sed ':a;N;$!ba;s/JPG\n/JPG/g' |
sed 's[^ ]* \([^ ]*\)[^0-9]*\(.*\)$/\1 \2/'
" --
Or you can just use bash: for f in *.JPG; do ( identify -verbose "$f" | .... ) & done.
Your seds look strange and output "unmatched ]" on my platform; I don't know what they are supposed to do, but I think cut -d: -f2 | tr -d '\n' would suffice. Grepping for the image name is also strange - you already know the image name...
find . -maxdepth 1 -name '*.JPG' |
xargs -n1 -P0 -- sh -c "
echo \"\$1 \$(
identify -verbose \"\$1\" |
grep 'CreateDate:' |
tr -d '[:space:]' |
cut -d: -f2-
)\"
" --
This will work for filenames without any spaces in them. I think that will be OK for you, since your output is space-separated, so you are already assuming your filenames have no special characters.
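Spelled out, the plain-bash backgrounding idea mentioned above might look something like the sketch below; I've borrowed -format from the earlier answer to keep each job simple, and launching one unthrottled background job per image is only reasonable if the machine can cope with that many simultaneous identify processes:

# Sketch: one background identify per image, then wait for all of them.
# Output lines from the parallel jobs may arrive in arbitrary order.
for f in *.JPG; do
    ( identify -format '%f %[date:create]\n' "$f" ) &
done
wait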
jhead is small, fast and a stand-alone utility. Sample output:
jhead ~/sample/images/iPhoneSample.JPG
Sample Output
File name    : /Users/mark/sample/images/iPhoneSample.JPG
File size    : 2219100 bytes
File date    : 2013:03:09 08:59:50
Camera make  : Apple
Camera model : iPhone 4
Date/Time    : 2013:03:09 08:59:50
Resolution   : 2592 x 1936
Flash used   : No
Focal length : 3.8mm (35mm equivalent: 35mm)
Exposure time: 0.0011 s (1/914)
Aperture     : f/2.8
ISO equiv.   : 80
Whitebalance : Auto
Metering Mode: pattern
Exposure     : program (auto)
GPS Latitude : N 20d 50.66m 0s
GPS Longitude: E 107d 5.46m 0s
GPS Altitude : 1.13m
JPEG Quality : 96
I did 5,000 iPhone images like this in 0.13s on a MacBook Pro:
jhead *jpg | awk '/^File name/{f=substr($0,16)} /^Date\/Time/{print f,substr($0,16)}'
In case you are unfamiliar with awk, that says "Look out for lines starting with File name and if you see one, save characters 16 onwards as f, the filename. Look out for lines starting with Date/Time and if you see any, print the last filename you remembered and the 16th character of the current line onwards".
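If the extensions are uppercase as in the question, or there are enough files that a plain *jpg glob overflows the argument list, the same jhead/awk pipeline can be fed through find and xargs; this wrapping is my own suggestion, not part of the timing above:

# -iname matches .jpg and .JPG alike; -print0/xargs -0 keeps awkward names intact.
find . -maxdepth 1 -iname '*.jpg' -print0 |
xargs -0 jhead |
awk '/^File name/{f=substr($0,16)} /^Date\/Time/{print f,substr($0,16)}'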

How to read CSV file stored in variable

I want to read a CSV file using shell, but for some reason it doesn't work.
I use this to locate the latest added csv file in my csv folder
lastCSV=$(ls -t csv-output/ | head -1)
and this to count the lines.
wc -l $lastCSV
Output
wc: drupal_site_livinglab.csv: No such file or directory
If I echo the file it says: drupal_site_livinglab.csv
Your issue is that you're one directory up from the path you are trying to read. The quick fix would be wc -l "csv-output/$lastCSV".
Bear in mind that parsing ls -t, though convenient, isn't completely robust, so you should consider something like this to protect you from awkward file names:
last_csv=$(find csv-output/ -mindepth 1 -maxdepth 1 -printf '%T@\t%p\0' |
sort -znr | head -zn1 | cut -zf2-)
wc -l "$last_csv"
GNU find lists all files along with their last modification time, separating the output using null bytes to avoid problems with awkward filenames.
If you remove -maxdepth 1, this becomes a recursive search.
GNU sort arranges the files from newest to oldest, with -z to accept null byte-delimited input.
GNU head -z returns the first record from the sorted list.
GNU cut -z at the end discards the timestamp, leaving you with only the filename.
You can also replace find with stat (again, this assumes that you have GNU coreutils):
last_csv=$(stat csv-output/* --printf '%Y\t%n\0' | sort -znr | head -zn1 | cut -zf2-)
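If you need this in more than one place, the same find pipeline can be wrapped in a small helper; the function name below is just for illustration:

# Print the most recently modified entry directly under the given directory,
# using the GNU find/sort/head/cut options described above.
newest_file() {
    find "$1" -mindepth 1 -maxdepth 1 -printf '%T@\t%p\0' |
        sort -znr | head -zn1 | cut -zf2-
}

lastCSV=$(newest_file csv-output/)
wc -l "$lastCSV"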

Keep only largest/smallest sized versioned file each in a directory?

In a directory with thousands of versioned files (ls -v format: filename ... filename (n)), how do I keep only the largest- or smallest-sized version (in bytes, not by version number) of each file?
Extra bonus if also possible to keep both smallest and largest, should it be needed.
Keep as in delete all others.
Any usual Unix shell tools; preferably avoiding xargs (the host system doesn't have xargs installed).
It can be trusted that any filename ending in (number).ext is a versioned file.
It took a little while to get started, especially because ls -S did not work properly (this is a low-end SOHO file server box with a 2.6 kernel and ancient low-memory versions of just about everything), but the core of the idea is to sort each file's versions with sort: if the permissions, owner and group are all the same, it will sort according to size.
ls -1p | grep -v / | sed 's/ *([0-9]*)//' | sed 's/\.ext//' | uniq | \
awk '{system("ls -l \""$0"\"\* | sort -r | head -n 1")}' | \
sed 's/.*[0-9][0-9]:[0-9][0-9] //' > largest.txt
A fancier solution would drop both the smallest and the largest inside the awk system call with another awk and run rm on the remaining entries, but my awk is very rusty.
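For the deletion step itself, here is a cautious sketch of my own (untested on such a box) that removes every non-directory file not listed in largest.txt, again without xargs; swap rm for echo on a dry run first:

# Delete everything that is not one of the kept (largest) versions.
# Assumes largest.txt holds one filename per line and that no filename
# contains a newline.
ls -1p | grep -v / | while IFS= read -r f; do
    [ "$f" = largest.txt ] && continue           # don't delete the list itself
    grep -qxF -e "$f" largest.txt || rm -- "$f"
done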

Batch to rename files with metadata name

I recently accidentally formatted a 2TB hard drive (Mac OS journaled)!
I was able to recover the files with Data Rescue 3; the only problem is the program didn't give me the files as they were, with their original directory tree and names.
For example I had
|-Music
||-Enya
|||-Sonadora.mp3
|||-Now we are free.mp3
|-Documents
||-CV.doc
||-LetterToSomeone.doc
...and so on
And now I got
|-MP3
||-M0001.mp3
||-M0002.mp3
|-DOCUMENTS
||-D0001.doc
||-D0002.doc
So with a huge amount of data it would take me centuries to manually open each file, see what it is, and rename it.
Is there some batch script which can scan all my subfolders and recover the previous names? From the metadata, perhaps?
Or do you know a better tool which will keep the same names and paths of the files? (It doesn't matter if I must pay, there's always a solution for that :P)
Thank you
My contribution, for your music at least...
The idea is to go through all of the MP3 files found, and distribute them based on their ID3 tags.
I'd do something like :
for i in `find /MP3 -type f -iname "*.mp3"`;
do
ARTIST=`id3v2 -l $i | grep TPE1 | cut -d":" -f2 | sed -e 's/^[[:space:]]*//'`; # This gets you the Artist
ALBUM=`id3v2 -l $i | grep TALB | cut -d":" -f2 | sed -e 's/^[[:space:]]*//'`; # This gets you the Album title
TRACK_NUM=`id3v2 -l $i | grep TRCK | cut -d":" -f2 | sed -e 's/^[[:space:]]*//'`; # This gets the track ID/position, like "2/13"
TR_TITLE=`id3v2 -l $i | grep TIT2 | cut -d":" -f2 | sed -e 's/^[[:space:]]*//'`; # Track title
mkdir -p /MUSIC/$ARTIST/$ALBUM/;
cp $i /MUSIC/$ARTIST/$ALBUM/$TRACK_NUM.$TR_TITLE.mp3
done
Basically:
* It looks for all ".mp3" files in /MP3
* then analyses each file's ID3 tags, parsing them to fill 4 variables using the "id3v2" tool (you'll need to install it first). The tags are cleaned to keep only the value; sed is used to trim the leading spaces that might pollute it.
* then creates (if needed) a tree in /MUSIC/ with the artist name and album name
* then copies each input file into the new tree, renaming it according to its tags.
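Since artist and album names very often contain spaces, a slightly more defensive variant of the same idea might look like this; the ID3 parsing is unchanged, only the looping and quoting differ:

# Space-safe variant: read NUL-delimited filenames from find and quote
# every expansion. Requires the same id3v2 tool as above.
find /MP3 -type f -iname "*.mp3" -print0 | while IFS= read -r -d '' i; do
    ARTIST=$(id3v2 -l "$i" | grep TPE1 | cut -d":" -f2 | sed -e 's/^[[:space:]]*//')
    ALBUM=$(id3v2 -l "$i" | grep TALB | cut -d":" -f2 | sed -e 's/^[[:space:]]*//')
    TRACK_NUM=$(id3v2 -l "$i" | grep TRCK | cut -d":" -f2 | sed -e 's/^[[:space:]]*//')
    TR_TITLE=$(id3v2 -l "$i" | grep TIT2 | cut -d":" -f2 | sed -e 's/^[[:space:]]*//')
    # NOTE: TRCK values like "2/13" contain a slash; strip the "/total" part
    # before using it in a filename if your files carry it.
    mkdir -p "/MUSIC/$ARTIST/$ALBUM"
    cp "$i" "/MUSIC/$ARTIST/$ALBUM/$TRACK_NUM.$TR_TITLE.mp3"
done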
