Naming files after splitting them with split command within a pipeline - bash

I've been recently dealing with splitting large files and processing them further.
My current pipeline is very simple:
find . -type f -size +100M | split -b 100M
But this doesn't do exactly what I'm after. I would like the split output files named after the input files to split; for example, if the files found by find are:
file1
file2
file3
I would like output along the lines of:
file101 file102 ...
file201 file202 ...
file301 file302 ...
I tried with:
split -b 100M -d $(find . -type f -size +1000M) $(find . -type f -size +1000M)
but it doesn't work the way I want; it throws an error.
Thanks.

split doesn't read filenames from standard input; you have to give the filename as an argument. You can do that with find's -exec option, using the {} placeholder to substitute the filename for both the input file and the output prefix arguments.
find . -type f -size +100M -exec split -b 100M -d {} {} \;
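The same idea can be tried safely at a reduced scale. The sketch below uses a scratch directory, a 3 KB file, and 1 KB chunks instead of 100 MB; all names and sizes here are illustrative:

```shell
#!/bin/bash
set -e
dir=$(mktemp -d)
cd "$dir"

# Create a 3 KB file named "file1" so split will produce three 1 KB pieces.
dd if=/dev/zero of=file1 bs=1024 count=3 2>/dev/null

# Split every regular file larger than 2 KB, using the input name itself
# as the output prefix; -d gives numeric suffixes.
find . -maxdepth 1 -type f -size +2k -exec split -b 1K -d {} {} \;

ls file1*
```

With -d, split appends numeric suffixes directly to the prefix, so file1 yields file100, file101, file102; append a separator to the prefix (e.g. `{}.`) if you prefer names like file1.00.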

Related

How to exclude files from list

I have ~100k files in a directory. I need to delete some of them, excluding any file that matches one of a list of ~15k patterns:
Directory: /20210111/
Example files:
/20210111/xxx_yyy_zzz.zip
/20210111/aaa_bbb_ccc.zip
/20210111/ddd_eee_fff.zip
...
Exclude.list
ddd
aaa
...
I tried with find:
find /20210111/ -type f -iname "*.zip" ! -iname "*$(cat Exclude.list)*" -exec ...
Getting the error "Argument list too long", because Exclude.list has a lot of lines.
How can I do that?
You can use grep to filter the output of find, then use xargs to process the resulting list.
find /20210111/ -type f -iname '*.zip' -print0 \
| grep -zvFf Exclude.list - \
| xargs -0 rm
The -print0, -z, and -0 options separate the filenames with null bytes, so the names can contain any valid character (you can't store patterns containing literal newlines in your Exclude.list anyway).
grep's -F interprets the patterns as fixed strings instead of regexes.
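A scaled-down sketch of the whole pipeline, using the directory name from the question with invented file contents:

```shell
#!/bin/bash
set -e
dir=$(mktemp -d)
cd "$dir"
mkdir 20210111
touch 20210111/xxx_yyy_zzz.zip 20210111/aaa_bbb_ccc.zip 20210111/ddd_eee_fff.zip
printf '%s\n' ddd aaa > Exclude.list

# Remove every .zip whose path does NOT contain a string from Exclude.list;
# the files matching the exclude list survive.
find 20210111 -type f -iname '*.zip' -print0 \
  | grep -zvFf Exclude.list \
  | xargs -0 rm

ls 20210111
```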

using the `find` command with -exec for a lot of files, how can I compare the output against a value?

I am trying to iterate over a large number of images and check their 'mean' value using ImageMagick.
The following command finds the images I want to check, and executes the correct command on them.
find `pwd` -type f -name "*.png" -exec /usr/bin/identify -ping -format "%[mean]" info: {} \;
Now I want to compare the output to see if it comes up with a certain value, 942.333
How can I check each value that find returns, and print the filename of any matched image whose output is 942.333 from my command?
Thanks!
Change your identify command so it outputs the filename and the mean, then use grep:
find `pwd` -type f -name "*.png" -exec identify -ping -format "%[mean] %f\n" {} \; | grep "942\.333"
Or, if you really have lots of images, you could put all your lovely CPU cores to work and do them in parallel, using GNU Parallel:
find . -name \*.png -print0 | parallel -m -0 'identify -ping -format "%[mean] %f\n" {}' | grep ...
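Note that an unescaped "." in a grep pattern matches any character, which can produce false positives on numbers like 942.333. Since ImageMagick may not be installed everywhere, here is a sketch on simulated identify output (the numbers and file names are invented):

```shell
#!/bin/bash
set -e
# Simulated "identify -ping -format '%[mean] %f\n'" output; the second
# line is crafted to demonstrate the regex pitfall.
out='942.333 real.png
9423333 false-positive.png'

echo "$out" | grep '942.333'     # "." matches any character: BOTH lines match
echo "$out" | grep -F '942.333'  # fixed-string match: only real.png
```

Escaping the dot (grep "942\.333") works equally well.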

How do I recursively find files with specific names and join using ImageMagick in Terminal?

I have created an ImageMagick command to join images with certain names:
convert -append *A_SLIDER.jpg *B_SLIDER.jpg out.jpg
I have lots of folders with files named *A_SLIDER.jpg and *B_SLIDER.jpg next to each other (only ever one pair in a directory).
I would like to recursively search a directory with many folders and execute the command to join the images.
If it is possible to name the output image based on the input images that would be great e.g.
=> DOGS_A_SLIDER.jpg and DOGS_B_SLIDER.jpg would combine to DOGS_SLIDER.jpg
Something like this, but back up first and try on a sample directory only!
#!/bin/bash
find . -name "*A_SLIDER*" -execdir bash -c '
  out=$(ls *A_SLIDER*)
  out=${out/_A/}
  convert -append *A_SLIDER* *B_SLIDER* "$out"' \;
Find all files containing the letters "A_SLIDER" and go to the containing directory and start bash there. While you are there, get the name of the file, and remove the _A part to form the output filename. Then execute ImageMagick convert with the _A_ and the corresponding _B_ files to form the output file.
Or, a slightly more concise suggestion from @gniourf_gniourf... thank you.
#!/bin/bash
find . -name "*A_SLIDER.jpg" -type f -execdir bash -c 'convert -append "$1" "${1/_A_/_B_}" "${1/_A/}"' _ {} \;
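The two parameter expansions used there can be checked in isolation with plain bash (the file name is taken from the question's example):

```shell
#!/bin/bash
set -e
# bash pattern substitution: ${var/pattern/replacement} replaces the
# first match; an empty replacement deletes it.
f="DOGS_A_SLIDER.jpg"

echo "${f/_A_/_B_}"   # partner file: DOGS_B_SLIDER.jpg
echo "${f/_A/}"       # output name:  DOGS_SLIDER.jpg
```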
The "find" command will recursively search folders:
$ find . -name "*.jpg" -print
That will display all the filenames. You might instead want "-iname" which does case-insensitive filename matching.
You can add a command line with "-exec", in which "{}" is replaced by the name of the file. You must terminate the command line with "\;":
$ find . -name "*.jpg" -exec ls -l {} \;
You can use sed to edit the name of a file:
$ echo DOGS_A_SLIDER.jpg | sed 's=_.*$=='
DOGS
Can you count on all of your "B" files being named the same as the corresponding "A" files? That is, you will not have "DOGS_A_SLIDER.jpg" and "CATS_A_SLIDER.jpg" in the same directory. If so, something like the following isn't everything you need, but will contribute to your solution:
$ find . -type f -name "*.jpg" -exec sh -c 'echo "$1" | sed "s=_.*=="' _ {} \;
That particular sed script will do the wrong thing if you have any directory names with underscores in them.
"find . -type f" finds regular files; it runs modestly faster than without the -type. Use "-type d" to find directories.
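The underscore-in-directory pitfall mentioned above can be avoided by stripping the directory part before applying the sed edit (the path here is invented):

```shell
#!/bin/bash
set -e
f=./my_dir/DOGS_A_SLIDER.jpg

# basename removes the directory, so sed only ever edits the file name.
basename "$f" | sed 's=_.*=='
```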

Get the filename from bash script recursively using find

I am trying to retrieve the filename from the find command recursively. This command prints all the filenames with their full paths:
> for f in $(find -name '*.png'); do echo "$f";done
> ./x.png
> ./bg.png
> ./s/bg.png
But when I try to get just the name of the file using these commands, it prints:
for f in $(find -name '*.png'); do echo "${f##*/}";done
bg.png
and
for f in $(find -name '*.png'); do echo $(basename $f);done
bg.png
It omits the other 2 files. I am new to shell scripting and couldn't figure out what's wrong with this.
EDIT:
This is what I actually want:
I want to loop through a directory recursively and find all png images,
send each to pngnq for RGBA compression,
which outputs a new file named orgfilename-nq8.png,
send that to pngcrush to generate a new file (the original file will be overwritten), and
remove the -nq8 file.
I have code which works on a single directory:
for f in *.png; do pngnq -f -n 256 "$f" && pngcrush "${f%.*}"-nq8.png "$f";rm "${f%.*}"-nq8.png; done
I want to do this recursively
Simply do:
find -name '*.png' -printf '%f\n'
If you want to run something for each file:
find -name '*.png' -printf '%f\n' |
while IFS= read -r file; do
# do something with "$file"
done
Or with xargs:
find -name '*.png' -printf '%f\n' | xargs -n1 command
You can also use find directly, like this:
find -name '*.png' -exec command {} +
or
find -name '*.png' -exec bash -c 'do_something with "${1##*/}"' -- {} \;
Search for -printf on http://unixhelp.ed.ac.uk/CGI/man-cgi?find or in
man find | less +/^' *-printf'
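A quick sketch of both forms on a throwaway tree shaped like the question's example (GNU find is assumed, since -printf is a GNU extension):

```shell
#!/bin/bash
set -e
dir=$(mktemp -d)
cd "$dir"
mkdir s
touch x.png bg.png s/bg.png

# %f prints only the final component of each matched path.
find . -name '*.png' -printf '%f\n' | sort

# The same names fed into a loop; IFS= and -r keep read from mangling
# leading blanks and backslashes.
find . -name '*.png' -printf '%f\n' |
while IFS= read -r file; do
    printf 'processing %s\n' "$file"
done
```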

Find files containing a given text

In bash I want to return the file name (and the path to the file) for every file of type .php, .html, or .js containing the case-insensitive string "document.cookie" or "setcookie".
How would I do that?
egrep -ir --include=*.{php,html,js} "(document\.cookie|setcookie)" .
The r flag means to search recursively (search subdirectories). The i flag means case insensitive.
If you just want file names add the l (lowercase L) flag:
egrep -lir --include=*.{php,html,js} "(document\.cookie|setcookie)" .
Try something like grep -r -n -i --include='*.html' --include='*.php' --include='*.js' searchstringhere .
the -i makes it case insensitive
the . at the end means you want to start from your current directory, this could be substituted with any directory.
the -r means do this recursively, right down the directory tree
the -n prints the line number for matches.
the --include lets you restrict the search to certain file names or extensions; wildcards are accepted, with one glob per --include option
For more info see: http://www.gnu.org/software/grep/
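A minimal sketch of the --include form on a scratch directory (the file names and contents are invented):

```shell
#!/bin/bash
set -e
dir=$(mktemp -d)
cd "$dir"
echo 'document.cookie = "a=1";' > tracker.js
echo 'setCookie("a", "1");'     > page.php
echo 'nothing to see here'      > readme.txt

# -r recurse, -i ignore case, -l list matching file names only;
# one --include per extension we care about.
grep -rilE 'document\.cookie|setcookie' \
     --include='*.php' --include='*.html' --include='*.js' . | sort
```

readme.txt is skipped entirely because no --include glob matches it, and -i lets "setCookie" match "setcookie".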
find them and grep for the string:
This will find all files of your 3 types in /starting/path and grep for the regular expression '(document\.cookie|setcookie)'. Split over 2 lines with the backslash just for readability...
find /starting/path -type f \( -name "*.php" -o -name "*.html" -o -name "*.js" \) -print0 | \
xargs -0 egrep -i '(document\.cookie|setcookie)'
Note the parentheses: without them, -type f would bind only to the first -name because -o has lower precedence than the implicit -a.
Sounds like a perfect job for grep or perhaps ack
Or this wonderful construction:
find . -type f \( -name '*.php' -o -name '*.html' -o -name '*.js' \) -exec grep "document\.cookie\|setcookie" /dev/null {} \;
find . -type f \( -name '*php' -o -name '*js' -o -name '*html' \) -print0 |\
xargs -0 grep -liE 'document\.cookie|setcookie'
Just to include one more alternative, you could also use this:
find "/starting/path" -type f -regextype posix-extended -regex "^.*\.(php|html|js)$" -exec grep -EH '(document\.cookie|setcookie)' {} \;
Where:
-regextype posix-extended tells find what kind of regex to expect
-regex "^.*\.(php|html|js)$" tells find the regex itself filenames must match
-exec grep -EH '(document\.cookie|setcookie)' {} \; tells find to run the command (with its options and arguments) specified between the -exec option and the \; for each file it finds, where {} represents where the file path goes in this command.
while
E option tells grep to use extended regex (to support the parentheses) and...
H option tells grep to print file paths before the matches.
And, given this, if you only want file paths, you may use:
find "/starting/path" -type f -regextype posix-extended -regex "^.*\.(php|html|js)$" -exec grep -EH '(document\.cookie|setcookie)' {} \; | sed -r 's/(^.*):.*$/\1/' | sort -u
Where
| [pipe] sends the output of find to the next command after it (first sed, then sort)
r option tells sed to use extended regex.
s/HI/BYE/ tells sed to replace the first occurrence (per line) of "HI" with "BYE" and...
s/(^.*):.*$/\1/ tells it to replace the whole line with the first group [stuff enclosed by ()]: (^.*) greedily captures everything from the beginning of the line [^] up to the last ':' on the line, and .*$ matches whatever follows until the end of line [$]; \1 keeps only the captured part, i.e. the file path. (Being greedy, it can misbehave if the matched text itself contains a ':'.)
u tells sort to remove duplicate entries (take sort -u as optional).
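What the sed and sort steps do, applied to fabricated grep -EH output (paths and matches invented, with no extra colons in the matched text):

```shell
#!/bin/bash
set -e
# Three fake "path:match" lines, two of them from the same file.
printf '%s\n' './a.php:document.cookie=1' './a.php:setcookie(x)' './sub/b.js:setcookie(y)' \
  | sed -r 's/(^.*):.*$/\1/' | sort -u
```

GNU sed's -r flag is assumed; the duplicate ./a.php collapses to one entry thanks to sort -u.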
...FAR from being the most elegant way. As I said, my intention is to broaden the range of possibilities (and also to give more complete explanations of some of the tools you could use).