Why does "... >> out | sort -n -o out" not actually run sort? - bash

As an exercise, I should find all .c files starting from my home directory, count the lines of each file and store the sorted output in sorted_statistics.txt, using find, wc, cut and sort.
I found this command to work
find /home/user/ -type f -name "*.c" 2> /dev/null -exec wc -l {} \; | cut -f 1 -d " " | sort -n -o sorted_statistics.txt
but I can't understand why
find /home/user/ -type f -name "*.c" 2> /dev/null -exec wc -l {} \; | cut -f 1 -d " " >> sorted_statistics.txt | sort -n sorted_statistics.txt
stops just before the sort command.
Just out of curiosity, why is that?

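You were appending everything to sorted_statistics.txt (which consumes all of cut's output) and then trying to pipe that nonexistent output to sort. Because every command in a pipeline starts at the same time, sort may also read the file before cut has finished writing it. I have corrected your code so it works now.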
find /home/user/ -type f -name "*.c" 2> /dev/null -exec wc -l {} \; | cut -f 1 -d " " >> tmp.txt && sort -n tmp.txt > sorted_statistics.txt
Regards!

This part of the command makes no sense:
cut -f 1 -d " " >> sorted_statistics.txt | sort ...
because the output of cut is appended to the file sorted_statistics.txt and no output at all goes to the sort command. You will probably want to use tee:
cut -f 1 -d " " | tee -a sorted_statistics.txt | sort ...
The tee command sends its input to a file and also to the standard output. It is like a Tee junction in a pipeline.
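A minimal sketch of the difference, assuming bash and using made-up input in place of the find output; the file names and the temporary directory are hypothetical:

```shell
# Scratch directory so that no real files are touched (assumption for the demo).
workdir=$(mktemp -d)
sample='3 a.c
1 b.c
2 c.c'

# Redirect before the pipe: cut writes to the file, so sort receives nothing.
printf '%s\n' "$sample" | cut -f 1 -d ' ' >> "$workdir/redirected.txt" | sort -n > "$workdir/after_redirect.txt"

# tee: the numbers are appended to the file and also passed on to sort.
printf '%s\n' "$sample" | cut -f 1 -d ' ' | tee -a "$workdir/raw.txt" | sort -n > "$workdir/after_tee.txt"

wc -c < "$workdir/after_redirect.txt"   # size of what sort produced without input
cat "$workdir/after_tee.txt"            # the counts, sorted numerically
```

In the first pipeline the file gets the unsorted numbers and sort produces an empty result; in the second, the file still gets the unsorted numbers but sort sees them too.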

Related

execute an if statement on every folder

I have, for example, 3 files (it could be 1 or it could be 30) like this:
name_date1.tgz
name_date2.tgz
name_date3.tgz
When extracted it will look like :
name_date1/data/info/
name_date2/data/info/
name_date3/data/info/
Here is how it looks inside each folder:
name_date1/data/info/
you.log
you.log.1.gz
you.log.2.gz
you.log.3.gz
name_date2/data/info/
you.log
name_date3/data/info/
you.log
you.log.1.gz
you.log.2.gz
What I want to do is concatenate all the you files from each folder, and then concatenate all of those results into one single file.
1st step: extract all the folders
for a in *.tgz
do
a_dir=${a%.tgz}
mkdir $a_dir 2>/dev/null
tar -xvzf $a -C $a_dir >/dev/null
done
2nd step: executing an if statement on each folder available and cat everything
myarray=(`find */data/info/ -maxdepth 1 -name "you.log.*.gz"`)
ls -d */ | xargs -I {} bash -c "cd '{}' &&
if [ ${#myarray[@]} -gt 0 ];
then
find data/info -name "you.log.*.gz" -print0 | sort -z -rn -t. -k4 | xargs -0 zcat | cat - data/info/you.log > youfull1.log
else
cat - data/info/you.log > youfull1.log
fi "
cat */youfull1.log > youfull.log
My issue: when I put in multiple name_date*.tgz files, it gives me this error:
gzip: stdin: unexpected end of file
Despite the error, I still get all my files concatenated, so why the error message?
When I put in only one .tgz file, I don't have any issue, regardless of the number of you files.
Any suggestions, please?
Try something simpler. No need for myarray. Pass files one at a time as they are inputted and decide what to do with them one at a time. Try:
find */data/info -maxdepth 1 -type f -name "you.log*" -print0 |
sort -z |
xargs -0 -n1 bash -c '
if [[ "${1##*.}" == "gz" ]]; then
zcat "$1";
else
cat "$1";
fi
' --
If you have to iterate over directories, don't use ls, still use find.
find . -maxdepth 1 -type d -name 'name_date*' -print0 |
sort -z |
while IFS= read -r -d '' dir; do
cat "$dir"/data/info/you.log
find "$dir"/data/info -maxdepth 1 -type f -name 'you.log.*.gz' -print0 |
sort -z -t'.' -n -k3 |
xargs -r -0 zcat
done
or (if you have to) with xargs, which should give you the idea how it's used:
find . -maxdepth 1 -type d -name 'name_date*' -print0 |
sort -z |
xargs -0 -n1 bash -c '
cat "$1"/data/info/you.log
find "$1"/data/info -maxdepth 1 -type f -name "you.log.*.gz" -print0 |
sort -z -t"." -n -k3 |
xargs -r -0 zcat
' --
Use -t option with xargs to see what it's doing.
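The -r in the snippets above matters. A plausible cause of the gzip message in the question is that, for a folder with no rotated logs, find prints nothing and GNU xargs still runs zcat once with no file names, so zcat tries to decompress an empty stdin. A small sketch, assuming GNU xargs (where -r means "do not run the command if the input is empty") and a hypothetical empty directory:

```shell
# An empty directory stands in for a data/info folder without any .gz files.
empty_dir=$(mktemp -d)

# With -r, zcat is never started when find produces no names,
# so there is no empty stdin for it to complain about.
result=$(find "$empty_dir" -maxdepth 1 -name 'you.log.*.gz' -print0 | xargs -0 -r zcat 2>&1)

echo "output with -r: '$result'"
```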

How to return an MD5 and SHA1 value for multiple files in a directory using BASH

I am creating a Bash script that takes a directory as an argument and prints to stdout a list of all files in that directory, with both the MD5 and SHA1 values of each file. The only files I'm interested in are those between 100K and 500K. This is what I have so far (section of the script):
cd $1 &&
find . -type f -size +100k -size -500k -printf '%f \t %s \t' -exec md5sum {} \; |
awk '{printf "NAME:" " " $1 "\t" "MD5:" " " $3 "\t" "BYTES:" "\t" $2 "\n"}'
I'm getting a little confused when adding the SHA1, and I'm obviously leaving something out.
Can anybody suggest a way to achieve this?
Ideally, I'd like the script to format its output in the following way:
Name Md5 SHA1
(with the relevant fields underneath)
Your awk printf bit is overly complicated. Try this:
find . -type f -printf "%f\t%s\t" -exec md5sum {} \; | awk '{ printf "NAME: %s MD5: %s BYTES: %s\n", $1, $3, $2 }'
Just read line by line the list of files outputted by find:
find . -type f |
while IFS= read -r l; do
echo "$(basename "$l") $(md5sum <"$l" | cut -d" " -f1) $(sha1sum <"$l" | cut -d" " -f1)"
done
It's better to use a zero separated stream:
find . -type f -print0 |
while IFS= read -r -d '' l; do
echo "$(basename "$l") $(md5sum <"$l" | cut -d" " -f1) $(sha1sum <"$l" | cut -d" " -f1)"
done
You could speed things up with xargs, and run multiple processes in parallel by adding its -P option:
find . -type f -print0 |
xargs -0 -n1 sh -c 'echo "$(basename "$1") $(md5sum <"$1" | cut -d" " -f1) $(sha1sum <"$1" | cut -d" " -f1)"' --
Consider adding -maxdepth 1 to find if you are not interested in files in subdirectories recursively.
It's easy from xargs to go to -exec:
find . -type f -exec sh -c 'echo "$1 $(md5sum <"$1" | cut -d" " -f1) $(sha1sum <"$1" | cut -d" " -f1)"' -- {} \;
Tested on repl.
Add those -size +100k -size -500k args to find to limit the sizes.
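Putting those pieces together, a sketch of the whole script might look like the following. It assumes bash and the GNU md5sum and sha1sum tools; the function name list_hashes, the tab-separated layout and the -maxdepth 1 limit are illustrative choices, not something the question prescribes:

```shell
#!/bin/bash
# list_hashes DIR: print name, MD5 and SHA1 of the regular files in DIR
# whose size is between 100k and 500k (hypothetical helper name).
list_hashes() {
    local dir=$1 file md5 sha1
    printf 'Name\tMd5\tSHA1\n'
    find "$dir" -maxdepth 1 -type f -size +100k -size -500k -print0 |
    while IFS= read -r -d '' file; do
        # Reading from stdin makes both tools print "HASH  -"; keep only the hash.
        md5=$(md5sum < "$file" | cut -d' ' -f1)
        sha1=$(sha1sum < "$file" | cut -d' ' -f1)
        printf '%s\t%s\t%s\n' "$(basename "$file")" "$md5" "$sha1"
    done
}
```

Calling list_hashes "$1" at the end of a script would use the directory given as the first argument; piping the result through column -t would align the fields.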
The | cut -d" " -f1 is used to remove the - that is outputted by both md5sum and sha1sum. If there are no spaces in filenames, you could run a single cut process for the whole stream, so it should be slightly faster:
find . -type f -print0 |
xargs -0 -n1 sh -c 'echo "$(basename "$1") $(md5sum <"$1") $(sha1sum <"$1")"' -- |
cut -d" " -f1,2,5
I also think that running a single md5sum and a single sha1sum process would probably be faster than spawning separate processes for each file, but that method needs to store all the filenames somewhere. Below, a bash array is used:
IFS=$'\n' files=($(find . -type f))
paste -d' ' <(
printf "%s\n" "${files[@]}") <(
md5sum "${files[@]}" | cut -d' ' -f1) <(
sha1sum "${files[@]}" | cut -d' ' -f1)
Your find is fine, you want to join the results of two of those, one for each hash. The command for that is join, which expects sorted inputs.
doit() { find -type f -size +100k -size -500k -exec $1 {} + |sort -k2; }
join -j2 <(doit md5sum) <(doit sha1sum)
and that gets you the raw data in sane environments. If you want pretty data, you can use the column utility:
join -j2 <(doit md5sum) <(doit sha1sum) | column -t
and add nice headers:
(echo Name Md5 SHA1; join -j2 <(doit md5sum) <(doit sha1sum)) | column -t
and if you're in an unclean environment where people put spaces in file names, protect against that by subbing in tabs for the field markers:
doit() { find -type f -size +100k -size -500k -exec $1 {} + \
| sed 's, ,\t,'| sort -k2 -t$'\t' ; }
join -j2 -t$'\t' <(doit md5sum) <(doit sha1sum) | column -ts$'\t'

How to count files in subdir and filter output in bash

Hi, hoping someone can help. I have some directories on disk and I want to count the number of files in them (as well as the directory size, if possible) and then strip info from the output. So far I have this:
find . -type d -name "*,d" -print0 | xargs -0 -I {} sh -c 'echo -e $(find "{}" | wc -l) "{}"' | sort -n
This gets me all the dir's that match my pattern as well as the number of files - great!
This gives me something like
2 ./bob/sourceimages/psd/dzv_body.psd,d
2 ./bob/sourceimages/psd/dzv_body_nrm.psd,d
2 ./bob/sourceimages/psd/dzv_body_prm.psd,d
2 ./bob/sourceimages/psd/dzv_eyeball.psd,d
2 ./bob/sourceimages/psd/t_zbody.psd,d
2 ./bob/sourceimages/psd/t_gear.psd,d
2 ./bob/sourceimages/psd/t_pupil.psd,d
2 ./bob/sourceimages/z_vehicles_diff.tga,d
2 ./bob/sourceimages/zvehiclesa_diff.tga,d
5 ./bob/sourceimages/zvehicleswheel_diff.jpg,d
From that I would like to filter on the number of files (more than 4, for example), and I would like to capture the filetype as a variable for each remaining result, e.g. ./bob/sourceimages/zvehicleswheel_diff.jpg,d
I guess I could use awk for this?
Then finally I would like to remove all the results from disk. With find I normally just do something like -exec rm -rf {} \; but I'm not clear how it would work here.
Thanks a lot
EDITED
While this is clearly not the answer, these commands get me the info I want in the form I want it. I just need a way to put it all together and not search multiple times as that's total rubbish
filetype=$(find . -type d -name "*,d" -print0 | awk 'BEGIN { FS = "." }; { print $3 }' | cut -d',' -f1)
filesize=$(find . -type d -name "*,d" -print0 | xargs -0 -I {} sh -c 'du -h {};' | awk '{ print $1 }')
filenumbers=$(find . -type d -name "*,d" -print0 | xargs -0 -I {} sh -c 'echo -e $(find "{}" | wc -l);')
files_count=`ls -keys | nl`
For instance:
ls | nl
nl prints its input with the lines numbered.
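For the counting and filtering part of the question, one possible sketch in bash is shown below. The function names count_entries and large_dirs are hypothetical, the count includes the directory itself (as in the question's own command), and file names containing newlines would throw the count off:

```shell
#!/bin/bash
# count_entries DIR: print "COUNT<TAB>PATH" for every directory below DIR
# whose name ends in ",d" (hypothetical helper).
count_entries() {
    local dir count
    find "$1" -type d -name '*,d' -print0 |
    while IFS= read -r -d '' dir; do
        # find lists the directory itself too, so an empty directory counts as 1.
        count=$(find "$dir" | wc -l)
        printf '%d\t%s\n' "$count" "$dir"
    done
}

# large_dirs DIR MIN: keep only the directories with more than MIN entries,
# sorted by count (hypothetical helper).
large_dirs() {
    count_entries "$1" | awk -F '\t' -v min="$2" '$1 > min' | sort -n
}
```

For example, large_dirs . 4 would print only the directories with more than 4 entries, and the second column of that output is the list of paths to act on.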

moving files with xargs

I want to pipe the output of ls into head and pipe it into mv.
I used the following command on terminal but it isn't working properly.
ls -t Downloads/ | head -7 | xargs -i mv {} ~/cso/
Please do rectify the error. Thanks in advance!
It is well documented that parsing ls output is not recommended. You can use this safe approach using find + sort + cut + head + xargs pipeline:
find . -maxdepth 1 -type f -printf '%T@\t%p\0' |
sort -z -rnk1 |
cut -z -f2 |
head -z -n 7 |
xargs -0 -I {} mv {} ~/cso/
Use -I like this, and pass Downloads/* so that ls prints paths that mv can find from the current directory:
ls -t Downloads/* | head -7 | xargs -I '{}' mv '{}' ~/cso/
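The find-based pipeline can be wrapped in a small function. This is only a sketch, assuming bash with GNU find and a GNU coreutils recent enough to support the -z option of sort, head and cut; the name move_newest and its arguments are made up for illustration:

```shell
#!/bin/bash
# move_newest SRC DEST COUNT: move the COUNT most recently modified regular
# files directly inside SRC into DEST (hypothetical helper).
move_newest() {
    local src=$1 dest=$2 count=$3
    find "$src" -maxdepth 1 -type f -printf '%T@\t%p\0' |   # "mtime<TAB>path", NUL-terminated
    sort -z -rn |                 # newest first
    head -z -n "$count" |         # keep the first COUNT records
    cut -z -f2- |                 # drop the timestamp, keep the path
    xargs -0 -r -I {} mv -- {} "$dest"/
}
```

With that in place, move_newest Downloads ~/cso 7 would correspond to the command in the question, and it keeps working for file names that contain spaces.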

How to print number of occurrences of a word in a file in unix

This is my shell script.
Given a directory, and a word, search the directory and print the absolute path of the file that has the maximum occurrences of the word and also print the number of occurrences.
I have written the following script
#!/bin/bash
if [[ -n $(find / -type d -name $1 2> /dev/null) ]]
then
echo "Directory exists"
x=` echo " $(find / -type d -name $1 2> /dev/null)"`
echo "$x"
cd $x
y=$(find . -type f | xargs grep -c $2 | grep -v ":0"| grep -o '[^/]*$' | sort -t: -k2,1 -n -r )
echo "$y"
else
echo "Directory does not exist"
fi
result: scriptname directoryname word
output: /somedirectory/vtb/wordsearch : 4
/foo/bar: 3
Is there any option to replace xargs grep -c $2? grep -c prints the number of lines that contain the word, but I need the exact number of occurrences of the word in the files in a given directory.
Using grep's -c count feature:
grep -c "SEARCH" /path/to/files* | sort -n -r -t : -k 2 | head -n 1
The grep command will output each file in a /path/name:count format, the sort will numerically (-n) sort by the 2nd (-k 2) field as delimited by a colon (-t :) in reverse order (-r). We then use head to keep the first result (-n 1).
Try This:
grep -o -w 'foo' bar.txt | wc -w
OR
grep -o -w 'word' /path/to/file/ | wc -w
grep -Fwor "$word" "$dir" | sed "s/:${word}\$//" | sort | uniq -c | sort -n | tail -1
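To tie these together for the original task (the file with the most occurrences, plus the count), here is a rough sketch in bash. The function name max_occurrences is hypothetical, the word is matched as a fixed string and as a whole word, and the printed path is absolute only if the directory argument is:

```shell
#!/bin/bash
# max_occurrences DIR WORD: print "FILE : COUNT" for the file under DIR that
# contains the most occurrences of WORD (hypothetical helper).
max_occurrences() {
    local dir=$1 word=$2 best_file='' best_count=0 file count
    while IFS= read -r -d '' file; do
        # grep -o prints each match on its own line, so wc -l counts
        # occurrences, whereas grep -c would count matching lines.
        count=$(grep -o -w -F -- "$word" "$file" | wc -l)
        if [ "$count" -gt "$best_count" ]; then
            best_file=$file
            best_count=$count
        fi
    done < <(find "$dir" -type f -print0)
    if [ -n "$best_file" ]; then
        printf '%s : %d\n' "$best_file" "$best_count"
    fi
}
```

The process substitution keeps the loop in the current shell, so best_file and best_count are still set after the loop ends.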
