Passing awk results to command after pipe - bash

I'm trying to pass what would be the awk outputs of print $1 and print $2 to setfattr after a pipe. The value of the extended attribute is an MD5 hash which is calculated from input files from the output of a find command. This is what I have so far:
find /path/to/dir -type f \
-regextype posix-extended \
-not -iregex '.*\.(jpg|docx|psd|jpeg|png|html|bmp|gif|txt|pdf|mp3|bts|srt)' \
| parallel -j 64 md5sum | awk '{system("setfattr -n user.digest.md5 -v " $1 $2)}'
Having awk '{print $1}' and $2 after the last pipe returns the hash and file path respectively just fine, I'm just not sure how to get those values into setfattr. setfattr just throws a generic usage error when that command is run. Is this just a syntax issue or am I going about this totally wrong?

Try piping the output of the parallel command into a while loop:
find /path/to/dir -type f \
-regextype posix-extended \
-not -iregex '.*\.(jpg|docx|psd|jpeg|png|html|bmp|gif|txt|pdf|mp3|bts|srt)' |
parallel -j 64 md5sum |
while read -r hash file; do
  setfattr -n user.digest.md5 -v "$hash" "$file"
done
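The original awk attempt fails because `$1 $2` in awk concatenates the two fields with nothing in between, so setfattr receives one glued-together value and no file argument. The sketch below shows how `read` splits an md5sum-style line instead. The sample hash and path are made up for illustration, and the setfattr command is only printed, not executed.

```shell
# Hypothetical md5sum-style line: hash, two spaces, then a path containing a space.
sample='d41d8cd98f00b204e9800998ecf8427e  /tmp/my file.bin'

# read -r stores the first word in "hash" and the rest of the line in "file".
read -r hash file <<EOF
$sample
EOF

# Print the command rather than running it, so the sketch has no side effects.
printf 'setfattr -n user.digest.md5 -v "%s" "%s"\n' "$hash" "$file"
```

Quoting "$hash" and "$file" in the real loop keeps paths with spaces intact; paths containing newlines would still need a different approach.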

Related

Displaying the result of two grep commands in bash

I am trying to find the number of files in a directory with two different patterns in the filenames. I don't want a combined count; I want both counts displayed together on one line.
Command 1: find | grep ".coded" | wc -l | Output : 4533
Command 2: find | grep ".read" | wc -l | Output: 654
Output sought: 4533 | 654 in one line
Any suggestions? Thanks!
With the bash shell, using process substitution and pr:
pr -mts' | ' <(find | grep "\.coded" | wc -l) <(find | grep "\.read" | wc -l)
With GNU find, you can use -printf to print whatever you want, for example a c for each file matching .coded and an "r" for each file matching .read, and then use awk to count how many of each you have:
find -type f \
\( -name '*.coded*' -printf 'c\n' \) \
-o \
\( -name '*.read*' -printf 'r\n' \) \
| awk '{ ++a[$0] } END{ printf "%d | %d\n", a["c"], a["r"] }'
By the way, your grep patterns match Xcoded and Yread, or really anything for your period; if it is a literal period, it has to be escaped, as in '\.coded' and '\.read'. Also, if your filenames contain linebreaks, your count is off.
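As another option, the two counts can be captured with command substitution and joined with printf or echo. This is a minimal sketch, assuming a throwaway directory with made-up file names; `grep -c` is used in place of `grep | wc -l`, and the period is escaped as suggested above.

```shell
# Build a throwaway directory so the sketch is self-contained.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.coded" "$demo_dir/b.coded" "$demo_dir/c.read" "$demo_dir/other.txt"

# grep -c counts matching lines itself, so no separate wc -l is needed.
coded=$(cd "$demo_dir" && find . -type f | grep -c '\.coded')
reads=$(cd "$demo_dir" && find . -type f | grep -c '\.read')

result="$coded | $reads"
echo "$result"

rm -rf "$demo_dir"
```

Like the original commands, this walks the tree twice and miscounts file names that contain newlines; the single-pass find and awk version avoids the double walk.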

How to count files in subdir and filter output in bash

Hi hoping someone can help, I have some directories on disk and I want to count the number of files in them (as well as dir size if possible) and then strip info from the output. So far I have this
find . -type d -name "*,d" -print0 | xargs -0 -I {} sh -c 'echo -e $(find "{}" | wc -l) "{}"' | sort -n
This gets me all the dir's that match my pattern as well as the number of files - great!
This gives me something like
2 ./bob/sourceimages/psd/dzv_body.psd,d
2 ./bob/sourceimages/psd/dzv_body_nrm.psd,d
2 ./bob/sourceimages/psd/dzv_body_prm.psd,d
2 ./bob/sourceimages/psd/dzv_eyeball.psd,d
2 ./bob/sourceimages/psd/t_zbody.psd,d
2 ./bob/sourceimages/psd/t_gear.psd,d
2 ./bob/sourceimages/psd/t_pupil.psd,d
2 ./bob/sourceimages/z_vehicles_diff.tga,d
2 ./bob/sourceimages/zvehiclesa_diff.tga,d
5 ./bob/sourceimages/zvehicleswheel_diff.jpg,d
From that I would like to filter based on max number of files so > 4 for example, I would like to capture filetype as a variable for each remaining result e.g ./bob/sourceimages/zvehicleswheel_diff.jpg,d
I guess I could use awk for this?
Then finally I would like to remove all the results from disk. With find I normally just do something like -exec rm -rf {} \; but I'm not clear how it would work here
Thanks a lot
EDITED
While this is clearly not the answer, these commands get me the info I want in the form I want it. I just need a way to put it all together and not search multiple times as that's total rubbish
filetype=$(find . -type d -name "*,d" -print0 | awk 'BEGIN { FS = "." }; { print $3 }' | cut -d',' -f1)
filesize=$(find . -type d -name "*,d" -print0 | xargs -0 -I {} sh -c 'du -h {};' | awk '{ print $1 }')
filenumbers=$(find . -type d -name "*,d" -print0 | xargs -0 -I {} sh -c 'echo -e $(find "{}" | wc -l);')
files_count=`ls -keys | nl`
For instance:
ls | nl
nl prints each line of its input prefixed with a line number, so the last number is the count of entries.
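The count, filter, and filetype steps from the question can also be done in a single pass over the matching directories. The following is a rough sketch under stated assumptions: the demo directory names are invented, the threshold variable `max` is hypothetical, directory names contain no newlines, and the delete step is left commented out.

```shell
# Throwaway tree: one ",d" directory with 1 file, one with 4 files.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/a.psd,d" "$demo_dir/b.jpg,d"
touch "$demo_dir/a.psd,d/1"
touch "$demo_dir/b.jpg,d/1" "$demo_dir/b.jpg,d/2" "$demo_dir/b.jpg,d/3" "$demo_dir/b.jpg,d/4"

max=4   # hypothetical threshold: keep directories whose count is greater than this

matches=$(
  cd "$demo_dir" &&
  find . -type d -name '*,d' | while IFS= read -r dir; do
    # Same count as in the question: it includes the directory itself.
    count=$(find "$dir" | wc -l | tr -d ' ')
    if [ "$count" -gt "$max" ]; then
      name=${dir%,d}          # strip the trailing ",d"
      filetype=${name##*.}    # keep the text after the last dot
      echo "$filetype $count $dir"
      # rm -rf -- "$dir"      # uncomment to delete the directory
    fi
  done
)
echo "$matches"

rm -rf "$demo_dir"
```

Each directory is visited once, and the filetype is derived with parameter expansion instead of a second find and awk pipeline.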

Print an ordered list of files based on files size in bash

I made the following script to find files based on a 'find' command and then print out the results:
#!/bin/bash
loc_to_look='./'
file_list=$(find $loc_to_look -type f -name "*.txt" -size +5M)
total_size=`du -ch $file_list | tail -1 | cut -f 1`
echo 'total size of all files is: '$total_size
for file in $file_list; do
size_of_file=`du -h $file | cut -f 1`
echo $file" "$size_of_file
done
...which give me output like:
>>> ./file_01.txt 12.0M
>>> ./file_04.txt 24.0M
>>> ./file_06.txt 6.0M
>>> ./file_02.txt 6.2M
>>> ./file_07.txt 84.0M
>>> ./file_09.txt 55.0M
>>> ./file_10.txt 96.0M
What I would like to do first, though, is sort the list by file size before printing it out. What is the best way to go about doing this?
This is easy if you grab the file size in bytes and pipe the result to sort:
find "$loc_to_look" -type f -name "*.txt" -size +5M -printf "%f %s\n" | sort -n -k 2
If you want the sizes printed in MB, pipe the sorted output to awk:
find "$loc_to_look" -type f -name "*.txt" -size +5M -printf "%f %s\n" | sort -n -k 2 | awk '{ printf "%s %.1fM\n", $1, $2/1024/1024}'
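The space-separated format above breaks when a file name contains spaces. One possible variation, sketched here with GNU find assumed, puts the size first and separates it from the path with a tab. The demo files and their sizes are invented, and kilobytes are used only to keep the test files small.

```shell
# Throwaway directory with three files of known sizes, one with a space in its name.
demo_dir=$(mktemp -d)
head -c 3072 /dev/zero > "$demo_dir/a.txt"
head -c 1024 /dev/zero > "$demo_dir/b file.txt"
head -c 2048 /dev/zero > "$demo_dir/c.txt"

sorted=$(
  cd "$demo_dir" &&
  # Size first, then a tab, then the path, so sort -n works on the leading number.
  find . -type f -name '*.txt' -printf '%s\t%p\n' |
    sort -n |
    awk -F'\t' '{ printf "%s %.1fK\n", $2, $1 / 1024 }'
)
echo "$sorted"

rm -rf "$demo_dir"
```

Note that `%s` reports the file's byte length, while `du -h` in the original script reports disk usage, so the numbers can differ slightly.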

Using awk to print ALL spaces within filenames which have a varied number of spaces

I'm executing the following using bash and awk to get the potentially space-full filename, colon, file size. (Column 5 contains the space delimited size, and 9 to EOL the file name):
src="Desktop"
echo "Constructing $src files list. `date`"
cat /dev/null > "$src"Files.txt
find -s ~/"$src" -type f -exec ls -l {} \; |
awk '{for(i=9;i<=NF;i++) {printf("%s", $i " ")} print ":" $5}' |
grep -v ".DS_Store" | grep -v "Icon\r" |
while read line ; do filespacesize=`basename "$line"`; filesize=`echo "$filespacesize" |
sed -e 's/ :/:/1'`
path=`dirname "$line"`; echo "$filesize:$path" >> "$src"Files.txt ;
done
And it works fine, BUT…
If a filename has > 1 space between parts, I only get 1 space between filename parts, and the colon, followed by the filesize.
How can I get the full filename, :, and then the file size?
It seems you want the following (provided your find handles the printf option with the %f, %s and %h modifiers):
src=Desktop
echo "Constructing $src files list. $(date)"
find ~/"$src" -type f -printf '%f:%s:%h\n' > "$src"Files.txt
Much shorter and much more efficient than your method!
This will not discard the .DS_Store and Icon\r things… but I'm not really sure what you really want to discard. Note that -name is case-sensitive, so the pattern must be spelled .DS_Store. If you want to skip anything named .DS_Store altogether (file or directory):
find ~/"$src" -name '.DS_Store' -prune -o -type f -printf '%f:%s:%h\n' > "$src"Files.txt
@guido seems to have guessed what you mean by grep -v "Icon\r": ignore files ending with Icon. If his guess is right, then this will do:
find ~/"$src" -name '.DS_Store' -prune -o ! -name '*Icon' -type f -printf '%f:%s:%h\n' > "$src"Files.txt
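The reason the original pipeline loses spaces is that awk splits records on runs of whitespace, so rebuilding a name from `$9` onward joins the parts with single spaces. The sketch below contrasts that with `find -printf`, which emits the name verbatim. It assumes GNU find (the `-s` flag in the question suggests BSD find, which has no `-printf`), and the file name is made up.

```shell
# Throwaway directory holding one 5-byte file with a double space in its name.
demo_dir=$(mktemp -d)
printf 'hello' > "$demo_dir/two  spaces.txt"

# Rebuilding the name from awk fields collapses the double space to one.
collapsed=$(printf '%s\n' 'two  spaces.txt' |
  awk '{ out = $1; for (i = 2; i <= NF; i++) out = out " " $i; print out }')

# find -printf prints name, size in bytes, and parent directory without re-splitting.
listing=$(cd "$demo_dir" && find . -type f -printf '%f:%s:%h\n')

echo "$collapsed"
echo "$listing"

rm -rf "$demo_dir"
```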

bash - padding find results

I'm running the following command to get a directory listing:
find ./../ \
-type f -newer ./lastsearchstamp -path . -prune -name '*.txt' -o -name '*.log' \
| awk -F/ '{print $NF " - " $FILENAME}'
Is there some way I can format the output in a 2 column left indented layout so that the output looks legible?
The command above always adds a constant spacing between the filename and the path.
Expected output:
abc.txt /root/somefolder/someotherfolder/
helloworld.txt /root/folder/someotherfolder/
a.sh /root/folder/someotherfolder/scripts
A nice tool for this kind of thing is column -t. You just add the command on to the end of the pipeline:
find ... | awk -F/ '{print $NF " - " $FILENAME}' | column -t
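Because column -t splits on any whitespace, file names containing spaces would be broken into extra columns. As an alternative technique (not column), awk's printf can pad the first column itself. This is a sketch only: the sample paths are taken from the expected output in the question, and the width of 16 is an arbitrary assumption.

```shell
# Sample paths standing in for the find output.
paths='/root/somefolder/someotherfolder/abc.txt
/root/folder/someotherfolder/helloworld.txt
/root/folder/someotherfolder/scripts/a.sh'

padded=$(printf '%s\n' "$paths" | awk -F/ '{
    name = $NF
    # Everything up to and including the last slash is the directory part.
    dir = substr($0, 1, length($0) - length(name))
    # %-16s left-aligns the name and pads it to 16 characters.
    printf "%-16s%s\n", name, dir
}')
echo "$padded"
```

A fixed width has to be at least as long as the longest file name; column -t computes the width automatically, which is why it is the simpler choice when names contain no whitespace.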
