Text file content match if else image Resolution - bash

I'm working on a project where I need to cat out the images and videos Resolution if they are not equal script should exit.
I've used command to cat the image Resolution. In this example I've 2 images. In a directory
find $PWD -iname "*.jpg" -type f -exec identify -format '%i %wx%h\n' '{}' \;|awk '{print $NF}'
OUTPUT 1280x720
640x362
I want them both to match if the file size is say it should say Okay else Check the file resolution and exit.
I tried the command to convert the output in to two variables a1 and a2. But it is not working. I tried to copy paste the output to the terminal than it is working. Please help
find $PWD -iname "*.jpg" -type f -exec identify -format '%i %wx%h\n' '{}' \;|awk '{print $NF}'|egrep -n "x"|sed 's#:#=#g'|sed 's#^#a#g'|paste -sd ";"|bash

You did
... | awk '{print $NF}'
and got
1280x720
640x362
I would harness GNU AWK as follows to check if all sizes match
... | awk '{arr[$NF]+=1}END{print length(arr)==1?"Everything match":"At least one mismatch"}'
Which will print either Everything match or At least one mismatch. Explanation: I use array arr for every encountered resolution I increase value for that resolution by 1. At END if arr has exactly 1 distinct key I print Everything match otherwise At least one mismatch using ternany operator i.e. condition?valueiftrue:valueiffalse.
(tested in gawk 4.2.1)

Related

Shorten filename to n characters while preserving file extension

I'm trying to shorten a filename while preserving the extension.
I think cut may be the best tool to use, but I'm not sure how to preserve the file extension.
For example, I'm trying to rename abcdefghijklmnop.txt to abcde.txt
I'd like to simply lop off the end of the filename so that the total character length doesn't exceed [in this example] 5 characters.
I'm not concerned with filename clashes because my dataset likely won't contain any, and anyway I'll do a find, analyze the files, and test before I rename anything.
The background for this is ultimately that I want to mass truncate filenames that exceed 135 characters so that I can rsync the files to an encrypted share on a Synology NAS.
I found a good way to search for all filenames that exceed 135 characters:
find . -type f | awk -F'/' 'length($NF)>135{print $0}'
And I'd like to pipe that to a simple cut command to trim the filename down to size. Perhaps there is a better way than this. I found a method to shorten filenames while preserving extensions, but I need to recurse through all sub-directories.
Any help would be appreciated, thank you!
Update for clarification:
I'd like to use a one-liner with a syntax like this:
find . -type f | awk -F'/' 'length($NF)>135{print $0}' | some_code_here_to_shorten_the_filename_while_preserving_the_extension
With GNU find and bash:
export n=10 # change according to your needs
find . -type f \
! -name '.*' \
-regextype egrep \
! -regex '.*\.[^/.]{'"$n"',}' \
-regex '.*[^/]{'$((n+1))',}' \
-execdir bash -c '
echo "PWD=$PWD"
for f in "${##./}"; do
ext=${f#"${f%.*}"}
echo mv -- "$f" "${f:0:n-${#ext}}${ext}"
done' bash {} +
This will perform a dry-run, that is it shows folders followed by the commands to be executed within them. Once you're happy with its output you can drop echo before mv (and echo "PWD=$PWD" line too if you want) and it'll actually rename all the files whose names exceed n characters to names exactly of n characters length including extension.
Note that this excludes hidden files, and files whose extensions are equal to or longer than n in length (e.g. .hidden, app.properties where n=10).
use bash string manipulations
Details: https://www.linuxtopia.org/online_books/advanced_bash_scripting_guide/string-manipulation.html.
scroll to "Substring Extraction"
example below cut filename to 10 chars preserving extension
~ % cat test
rawFileName=$(basename "$1")
filename="${rawFileName%.*}"
ext="${rawFileName##*.}"
if [[ ${#filename} < 9 ]]; then
echo ${filename:0:10}.${ext}
else
echo $1
fi
And tests:
~ % ./test 12345678901234567890.txt
1234567890.txt
~ % ./test 1234567.txt
1234567.txt
Update
Since your file are distributed in a tree of directories, you can use my original approach, but passing the script to a sh command passed to the -exec option of find:
n=5 find . -type f -exec sh -c 'f={}; d=${f%/*}; b=${f##*/}; e=${b##*.}; b=${b%.*}; mv -- "$f" "$d/${b:0:n}.$e"' \;
Original answer
If the filename is in a variable x, then ${x:0:5}.${x##*.} should do the job.
So you might do something like
n=5 # or 135, or whatever you like
for f in *; do
mv -- "$f" "${f:0:n}.${f##*.}"
done
Clearly this assumes that there are no clashes between the shortened names. If there are clashes, then only one would survive! So be careful.

grep files against a list only containing numbers

I have several files (~70000) that have numbers in the name, a couple of examples would be 991000_Metatissue.qsub.file 828000_Metatissue.qsub.file, and then I have another file (files_failed.txt) with a bunch of numbers that I would use to grep. This list looks like this:
4578000
458000
4582000
527000
5288000
5733000
653000
6548000
6663000
I have tried with: ls -1 *.qsub.file | grep -F -f files_failed.txt - and even doing this:
ls -1 *.qsub.file > files_to_submit.txt
grep -F -f files_failed.txt files_to_submit.txt
But always got all the qsub.files...
grep -f isn't well composed (see GNU bug 16305), so I recommend using awk instead:
find . -name '*_*.qsub.file' |awk -F_ '
NR == FNR { failed[$NR] = 1; next }
$1 in failed
' files_failed.txt /dev/stdin
This uses find to locate the files in question, piping them into awk. Before awk processes that, it reads files_failed.txt and stores the values into an associated array (aka dictionary or hash) when the line number (NR, number of records so far) equals the line number of the current file (FNR), meaning it's the first file read. If the first column (the number of the file since we delimited by _) is in that array, it was a failure. AWK's default action on a stanza is to print it, so you will get a list of those failed files.
Note the lack of regular expressions! On a big directory, this is much faster than grep -F -f …, which itself is much faster than grep -f …, even assuming the aforementioned bug is fixed.
You should be using find and you need to modify your "patterns". Here is one way that should work:
# List all files ending in "qsub.file"
find . -name '*.qsub.file' |
# Add ./ and _ to each number to make the match exact
grep -F -f <(sed -e 's:^:./:' -e 's/$/_/' files_failed.txt)
70000 files is too much for a ls, you should use find instead.
And I prefer invert the logic, list just what a want instead of list all and then filter.
Something like
while read line; do find -iname $line_Metatissue.qsub.file; done < files_failed.txt
If you need the exit in another file?
while read line; do find -iname $line_Metatissue.qsub.file; done < files_failed.txt >> files_to_submit.txt
You can use the below script:-
ls -1 *.qsub.file > filelist.txt
while read pattern
do
filefound=$(grep $pattern filelist.txt)
if [ "$filefound" != "" ]; then
echo "File Found : $filefound"
fi
done < files_failed.txt
Second option:-
while read pattern
do
find . -name "$pattern*.qsub.file" >> filefound.txt
done < files_failed.txt
All your files will be stored in file filefound.txt

Pass .txt list of .jpgs to convert (bash)

I'm currently working on an exercise that requires me to write a shell script whose function is to take a single command-line argument that is a directory. The script takes the given directory, and finds all the .jpgs in that directory and its sub-directories, and creates an image-strip of all the .jpgs in order of modification time (newest on bottom).
So far, I've written:
#!bin/bash/
dir=$1 #the first argument given will be saved as the dir variable
#find all .jpgs in the given directory
#then ls is run for the .jpgs, with the date format %s (in seconds)
#sed lets the 'cut' process ignore the spaces in the columns
#fields 6 and 7 (the name and the time stamp) are then cut and sorted by modification date
#then, field 2 (the file name) is selected from that input
#Finally, the entire sorted output is saved in a .txt file
find "$dir" -name "*.jpg" -exec ls -l --time-style=+%s {} + | sed 's/ */ /g' | cut -d' ' -f6,7 | sort -n | cut -d' ' -f2 > jgps.txt
The script correctly outputs the directory's .jpgs in order of time modification. The part that I am currently struggling on is how to give the list in the .txt file to the convert -append command that will create an image-strip for me (For those who aren't aware of that command, what would be inputted is: convert -append image1.jpg image2.jpg image3.jpg IMAGESTRIP.jpgwith IMAGESTRIP.jpg being the name of the completed image strip file made up of the previous 3 images).
I can't quite figure out how to pass the .txt list of files and their paths to this command. I've been scouring the man pages to find a possible solution but no viable ones have arisen.
xargs is your friend:
find "$dir" -name "*.jpg" -exec ls -l --time-style=+%s {} + | sed 's/ */ /g' | cut -d' ' -f6,7 | sort -n | cut -d' ' -f2 | xargs -I files convert -append files IMAGESTRIP.jpg
Explanation
The basic use of xargs is:
find . -type f | xargs rm
That is, you specify a command to xargs, it appends the arguments it receives from standard input and then executes it. The avobe line would execute:
rm file1 file2 ...
But you also need to specify a final argument to the command, so you need to use the xarg -I parameter, which tells xargs the string you will use after to indicate where the arguments read from standard input will be put.
So, we use the string files to indicate it. Then we write the command, putting the string files where the variable arguments will be, resulting in:
xargs -I files convert -append files IMAGESTRIP.jpg
Put the list of filenames in a file called filelist.txt and call convert with the filename prepended by an ampersand:
convert #filelist.txt -append result.jpg
Here's a little example:
# Create three blocks of colour
convert xc:red[200x100] red.png
convert xc:lime[200x100] green.png
convert xc:blue[200x100] blue.png
# Put their names in a file called "filelist.txt"
echo "red.png green.png blue.png" > filelist.txt
# Tell ImageMagick to make a strip
convert #filelist.txt +append strip.png
As there's always some image with a pesky space in its name...
# Make the pesky one
convert -background black -pointsize 128 -fill white label:"Pesky" -resize x100 "image with pesky space.png"
# Whack it in the list for IM
echo "red.png green.png blue.png 'image with pesky space.png'" > filelist.txt
# IM do your stuff
convert #filelist.txt +append strip.png
By the way, it is generally poor practice to parse the output of ls in case there are spaces in your filenames. If you want to find a list of images, across directories and sort them by time, look at something like this:
# Find image files only - ignoring case, so "JPG", "jpg" both work
find . -type f -iname \*.jpg
# Now exec `stat` to get the file ages and quoted names
... -exec stat --format "%Y:%N {} \;
# Now sort that, and strip the times and colon at the start
... | sort -n | sed 's/^.*://'
# Put it all together
find . -type f -iname \*.jpg -exec stat --format "%Y:%N {} \; | sort -n | sed 's/^.*://'
Now you can either redirect all that to filelist.txt and call convert like this:
find ...as above... > file list.txt
convert #filelist +append strip.jpg
Or, if you want to avoid intermediate files and do it all in one go, you can make this monster where convert reads the filelist from its standard input stream:
find ...as above... | sed 's/^.*://' | convert #- +append strip.jpg

Get total size of a list of files in UNIX

I want to run a find command that will find a certain list of files and then iterate through that list of files to run some operations. I also want to find the total size of all the files in that list.
I'd like to make the list of files FIRST, then do the other operations. Is there an easy way I can report just the total size of all the files in the list?
In essence I am trying to find a one-liner for the 'total_size' variable in the code snippet below:
#!/bin/bash
loc_to_look='/foo/bar/location'
file_list=$(find $loc_to_look -type f -name "*.dat" -size +100M)
total_size=???
echo 'total size of all files is: '$total_size
for file in $file_list; do
# do a bunch of operations
done
You should simply be able to pass $file_list to du:
du -ch $file_list | tail -1 | cut -f 1
du options:
-c display a total
-h human readable (i.e. 17M)
du will print an entry for each file, followed by the total (with -c), so we use tail -1 to trim to only the last line and cut -f 1 to trim that line to only the first column.
Methods explained here have hidden bug. When file list is long, then it exceeds limit of shell comand size. Better use this one using du:
find <some_directories> <filters> -print0 | du <options> --files0-from=- --total -s|tail -1
find produces null ended file list, du takes it from stdin and counts.
this is independent of shell command size limit.
Of course, you can add to du some switches to get logical file size, because by default du told you how physical much space files will take.
But I think it is not question for programmers, but for unix admins :) then for stackoverflow this is out of topic.
This code adds up all the bytes from the trusty ls for all files (it excludes all directories... apparently they're 8kb per folder/directory)
cd /; find -type f -exec ls -s \; | awk '{sum+=$1;} END {print sum/1000;}'
Note: Execute as root. Result in megabytes.
The problem with du is that it adds up the size of the directory nodes as well. It is an issue when you want to sum up only the file sizes. (Btw., I feel strange that du has no option for ignoring the directories.)
In order to add the size of files under the current directory (recursively), I use the following command:
ls -laUR | grep -e "^\-" | tr -s " " | cut -d " " -f5 | awk '{sum+=$1} END {print sum}'
How it works: it lists all the files recursively ("R"), including the hidden files ("a") showing their file size ("l") and without ordering them ("U"). (This can be a thing when you have many files in the directories.) Then, we keep only the lines that start with "-" (these are the regular files, so we ignore directories and other stuffs). Then we merge the subsequent spaces into one so that the lines of the tabular aligned output of ls becomes a single-space-separated list of fields in each line. Then we cut the 5th field of each line, which stores the file size. The awk script sums these values up into the sum variable and prints the results.
ls -l | tr -s ' ' | cut -d ' ' -f <field number> is something I use a lot.
The 5th field is the size. Put that command in a for loop and add the size to an accumulator and you'll get the total size of all the files in a directory. Easier than learning AWK. Plus in the command substitution part, you can grep to limit what you're looking for (^- for files, and so on).
total=0
for size in $(ls -l | tr -s ' ' | cut -d ' ' -f 5) ; do
total=$(( ${total} + ${size} ))
done
echo ${total}
The method provided by #Znik helps with the bug encountered when the file list is too long.
However, on Solaris (which is a Unix), du does not have the -c or --total option, so it seems there is a need for a counter to accumulate file sizes.
In addition, if your file names contain special characters, this will not go too well through the pipe (Properly escaping output from pipe in xargs
).
Based on the initial question, the following works on Solaris (with a small amendment to the way the variable is created):
file_list=($(find $loc_to_look -type f -name "*.dat" -size +100M))
printf '%s\0' "${file_list[#]}" | xargs -0 du -k | awk '{total=total+$1} END {print total}'
The output is in KiB.

How to find Disk usage in UNIX? [duplicate]

So, in many situations I wanted a way to know how much of my disk space is used by what, so I know what to get rid of, convert to another format, store elsewhere (such as data DVDs), move to another partition, etc. In this case I'm looking at a Windows partition from a SliTaz Linux bootable media.
In most cases, what I want is the size of files and folders, and for that I use NCurses-based ncdu:
But in this case, I want a way to get the size of all files matching a regex. An example regex for .bak files:
.*\.bak$
How do I get that information, considering a standard Linux with core GNU utilities or BusyBox?
Edit: The output is intended to be parseable by a script.
I suggest something like: find . -regex '.*\.bak' -print0 | du --files0-from=- -ch | tail -1
Some notes:
The -print0 option for find and --files0-from for du are there to avoid issues with whitespace in file names
The regular expression is matched against the whole path, e.g. ./dir1/subdir2/file.bak, not just file.bak, so if you modify it, take that into account
I used h flag for du to produce a "human-readable" format but if you want to parse the output, you may be better off with k (always use kilobytes)
If you remove the tail command, you will additionally see the sizes of particular files and directories
Sidenote: a nice GUI tool for finding out who ate your disk space is FileLight. It doesn't do regexes, but is very handy for finding big directories or files clogging your disk.
du is my favorite answer. If you have a fixed filesystem structure, you can use:
du -hc *.bak
If you need to add subdirs, just add:
du -hc *.bak **/*.bak **/**/*.bak
etc etc
However, this isn't a very useful command, so using your find:
TOTAL=0;for I in $(find . -name \*.bak); do TOTAL=$((TOTAL+$(du $I | awk '{print $1}'))); done; echo $TOTAL
That will echo the total size in bytes of all of the files you find.
Hope that helps.
Run this in a Bourne Shell to declare a function that calculates the sum of sizes of all the files matching a regex pattern in the current directory:
sizeofregex() { IFS=$'\n'; for x in $(find . -regex "$1" 2> /dev/null); do du -sk "$x" | cut -f1; done | awk '{s+=$1} END {print s}' | sed 's/^$/0/'; unset IFS; }
(Alternatively, you can put it in a script.)
Usage:
cd /where/to/look
sizeofregex 'myregex'
The result will be a number (in KiB), including 0 (if there are no files that match your regex).
If you do not want it to look in other filesystems (say you want to look for all .so files under /, which is a mount of /dev/sda1, but not under /home, which is a mount of /dev/sdb1, add a -xdev parameter to find in the function above.
The previous solutions didn't work properly for me (I had trouble piping du) but the following worked great:
find path/to/directory -iregex ".*\.bak$" -exec du -csh '{}' + | tail -1
The iregex option is a case insensitive regular expression. Use regex if you want it to be case sensitive.
If you aren't comfortable with regular expressions, you can use the iname or name flags (the former being case insensitive):
find path/to/directory -iname "*.bak" -exec du -csh '{}' + | tail -1
In case you want the size of every match (rather than just the combined total), simply leave out the piped tail command:
find path/to/directory -iname "*.bak" -exec du -csh '{}' +
These approaches avoid the subdirectory problem in #MaddHackers' answer.
Hope this helps others in the same situation (in my case, finding the size of all DLL's in a .NET solution).
If you're OK with glob-patterns and you're only interested in the current directory:
stat -c "%s" *.bak | awk '{sum += $1} END {print sum}'
or
sum=0
while read size; do (( sum += size )); done < <(stat -c "%s" *.bak)
echo $sum
The %s directive to stat gives bytes not kilobytes.
If you want to descend into subdirectories, with bash version 4, you can shopt -s globstar and use the pattern **/*.bak
The accepted reply suggests to use
find . -regex '.*\.bak' -print0 | du --files0-from=- -ch | tail -1
but that doesn't work on my system as du doesn't know a --files-0-from option on my system. Only GNU du knows that option, it's neither part of the POSIX Standard (so you won't find it in FreeBSD or macOS), nor will you find it on BusyBox based Linux systems (e.g. most embedded Linux systems) or any other Linux system that does not use the GNU du version.
Then there's a reply suggesting to use:
find path/to/directory -iregex .*\.bak$ -exec du -csh '{}' + | tail -1
This solution will work as long as there aren't too many files found, as + means that find will try call du with as many hits as possible in a single call, however, there might be a maximum number of arguments (N) a system supports and if there are more hits than this value, find will call du multiple times, splitting the hits into groups smaller than or equal to N items each and this case the result will be wrong and only show the size of the last du call.
Finally there is an answer using stat and awk, which is a nice way to do it, but it relies on shell globbing in a way that only Bash 4.x or later supports. It will not work with older versions and if it works with other shells is unpredictable.
A POSIX conform solution (works on Linux, macOS and any BSD variants), that doesn't suffer by any limitation and that will surely work with every shell would be:
find . -regex '.*\.bak' -exec stat -f "%z" {} \; | awk '{s += $1} END {print s}'

Resources