improve bash script to add one more action if success - bash

I have a bash script to find out all non empty files under some directory. currently it only print the file name if found.
I would like to add one more line to print first 32 bytes in hex format.
#!/bin/sh
files=$(find /data/ -type f ! -empty)
for f in $files;
do
if [ -f "$f" ]; then
tr -d '\000' <$f | tr -c '\000' '\n' | grep -q -m 1 ^ || echo $f
fi
done
I try to add one more "&& xxd -g 1 -l 32 $f" at the end but it doesn't work!

Get the first 32 chars from the file:
dd if=so.bash ibs=32 count=1 2>/dev/null | od -h
dd gets the first 32 chars
od -h prints them in hex format
You could do it with xxd as well
xxd -l 32 $f
where $f is the file
#!/bin/bash
files=$(find /data/ -type f ! -empty)
for f in $files
do
if [ -f "$f" ]; then
tr -d '\000' <"$f" | tr -c '\000' '\n' | grep -q -m 1 ^ || echo $f
xxd -l 32 "$f"
echo ""
fi
done
the echo "" is to have an empty line between each file to split the output.

Suggesting one line gawk script:
gawk 'BEGINFILE{print FILENAME}/[^\x00]/{system("xxd -l 32 "FILENAME;nextfile}' $(find /data/ -type f ! -empty)
gawk Explanation
BEGINFILE{print FILENAME}
BEGINFILE{ # before processing a file
print FILENAME; # print the filename
}
/[^\x00]/{system("xxd -l 32 "FILENAME);nextfile}
/[^\x00]/{ # if records contains non NULL character
system("xxd -l 32 "FILENAME); # print first 32 hex charachters
nextfile; # read next file
}

Every command run in bash shell returns a value that’s stored in the bash variable “$?”.
Then the command can be split as below:
#!/bin/sh
files=$(find /data/ -type f ! -empty)
for f in $files;
do
if [ -f "$f" ]; then
tr -d '\000' <$f | tr -c '\000' '\n' | grep -q -m 1 ^
if [ $? -ne 0 ]; then
echo $f
xxd -l 32 $f
fi
fi
done

Related

Print top N files by word count in two columns

I would like to make a script that prints the filenames for the top n files from two directories (n being the number of files I give in in the command line) in order of number of words they have. My biggest problem however is in the way they should be displayed.
Say my command line looks like this:
myscript.sh 5 dir1 dir2
The output should have 2 columns: on the left the top 5 files in descending order from dir1, and on the right the top 5 files in descending order from dir2.
This is what I have in terms of code, however I'm missing something. I think that pr -m -t should do what i want, but I couldn't make it work.
#!/bin/bash
dir=$1
dir2=$2
for files in "$dir"
do
find ./reuters-topics/$dir -type f -exec wc -l {} + | sort -rn |head -n 15
done
for files in "$dir2"
do
find ./reuters-topics/$dir2 -type f -exec wc -l {} + | sort -rn | head -n 15
done
This is a solution in fish:
for i in (find . -type f); wc -l $i; end | sort -rn | head -n15 | awk '{print $2 "\t" $1}'
As you can see, the re-ordering (filename first, number of words second) is done by awk. As a separator I use a tab character:
awk '{print $2 "\t" $1}'
The difference between my loop and your find call, btw, is that I do not get the "total" line in the output.
I did not test if this (including awk) also works well for files with spaces in the name.
#!/usr/bin/env bash
_top_files_by_words_usage() {
local usage=""
read -r -d '' usage <<-"EOF"
Usage:
top_files_by_words <show_count> <dir1> <dir2>
EOF
1>&2 printf "%s\n" "$usage"
}
top_files_by_words() {
if (( $# != 3 )) || [[ "$1" != +([0-9]) ]]; then
_top_files_by_words_usage
return 1
fi
local -i showCount=0
local dir1=""
local dir2=""
showCount="$1"
dir1="$2"
dir2="$3"
shopt -s extglob
if [[ ! -d "$dir1" ]]; then
1>&2 printf "directory '%s' does not exist or is not a directory\n" "$dir1"
return 1
fi
if [[ ! -d "$dir2" ]]; then
1>&2 printf "directory '%s' does not exist or is not a directory\n" "$dir2"
return 1
fi
local -a out1=()
local -a out2=()
IFS=$'\n' read -r -d '' -a out1 < <(find "$dir1" -type f -exec wc -w {} \; | sort -k 1gr | head -n "$showCount")
IFS=$'\n' read -r -d '' -a out2 < <(find "$dir2" -type f -exec wc -w {} \; | sort -k 1gr | head -n "$showCount")
local -i i=0
local -i maxLen=0
local -i len=0;
for ((i = 0; i < showCount; ++i)); do
len="${#out1[$i]}"
if (( len > maxLen )); then
maxLen=$len
fi
# len="${#out2[$i]}"
# if (( len > maxLen )); then
# maxLen=$len
# fi
done
for (( i = 0; i < showCount; ++i)); do
printf "%-*.*s %s\n" "$maxLen" "$maxLen" "${out1[$i]}" "${out2[$i]}"
done
return 0
}
top_files_by_words "$#"
$ ~/tmp/count_words.bash 15 tex tikz
2309328 tex/resume.log 9692402 tikz/tikz-Graphics in LaTeX with TikZ.mp4
2242997 tex/resume_cv.log 2208818 tikz/tikz-Tikz-Graphs and Automata.mp4
2242969 tex/cover_letters/resume_cv.log 852631 tikz/tikz-Drawing Automata with TikZ in LaTeX.mp4
73859 tex/pgfplots/plotdata/heightmap.dat 711004 tikz/tikz-tutorial.mp4
49152 tex/pgfplots/lena.dat 300038 tikz/.ipynb_checkpoints/TikZ 11 Design Principles-checkpoint.ipynb
43354 tex/nancy.mp4 300038 tikz/TikZ 11 Design Principles.ipynb
31226 tex/pgfplots/pgfplotstodo.tex 215583 tikz/texample/bridges-of-konigsberg.svg
26000 tex/pgfplots/plotdata/ou.dat 108040 tikz/Visual TikZ.pdf
20481 tex/pgfplots/pgfplotstable.tex 82540 tikz/worldflags.pdf
19571 tex/pgfplots/pgfplots.reference.3dplots.tex 37608 tikz/texample/india-map.tex
19561 tex/pgfplots/plotdata/risingdrop3d_coord.dat 35798 tikz/.ipynb_checkpoints/TikZ-checkpoint.ipynb
19561 tex/pgfplots/plotdata/risingdrop3d_vel.dat 35656 tikz/texample/periodic_table.svg
18207 tex/pgfplots/ChangeLog 35501 tikz/TikZ.ipynb
17710 tex/pgfplots/pgfplots.reference.markers-meta.tex 25677 tikz/tikz-Graphics in LaTeX with TikZ.info.json
13800 tex/pgfplots/pgfplots.reference.axisdescription.tex 14760 tikz/tikz-Tikz-Graphs and Automata.info.json
column can print files side-by-side in columns. You can use process substitution with <(command) to have those "files" be live commands instead of actual files.
#!/bin/bash
top-files() {
local n="$1"
local dir="$2"
find "$dir" -type f -exec wc -l {} + |
head -n -1 | sort -rn | head -n "$n"
}
n="$1"
dir1="$2"
dir2="$3"
column <(top-files "$n" reuters-topics/"$dir1") \
<(top-files "$n" reuters-topics/"$dir2")

find and grep / zgrep / lzgrep progress bar

I would like to add a progress bar to this command line:
find . \( -iname "*.bz" -o -iname "*.zip" -o -iname "*.gz" -o -iname "*.rar" \) -print0 | while read -d '' file; do echo "$file"; lzgrep -a stringtosearch\.anything "$file"; done
The progress file should be calculated on the total of compressed size files (not on the single file).
Of course, it can be a script too.
I would also like to add other progress bars, if possible:
The total number of files processed (example 3 out of 21)
The percentage of progress of the single file
Can anybody help me please?
Here some example of it should look alike (example from here):
tar cf - /folder-with-big-files -P | pv -s $(du -sb /folder-with-big-files | awk '{print $1}') | gzip > big-files.tar.gz
Multiple progress bars (example from here):
pv -cN orig < foo.tar.bz2 | bzcat | pv -cN bzcat | gzip -9 | pv -cN gzip > foo.tar.gz
Thanks,
This is the first time I've ever heard of pv and it's not on any machine I have access to but assuming it needs to know a total at startup and then a number on each iteration of a command, you could do something like this to get a progress bar per file processed:
IFS= readarray -d '' files < <(find . -whatever -print0)
printf '%s\n' "${files[#]}" | pv -s "${#files[#]}" | command
The first line gives you an array of files so you can then use "${#files[#]}" to provide pv it's initial total value (looks like you use -s value for that?) and then do whatever you normally do to get progress as each file is processed.
I don't see any way to tell pv that the pipe it's reading from is NUL-terminated rather than newline-terminated so if your files can have newlines in their names then you'd have to figure out how to solve that problem.
To additionally get progress on a single file you might need something like:
IFS= readarray -d '' files < <(find . -whatever -print0)
printf '%s\n' "${files[#]}" |
pv -s "${#files[#]}" |
xargs -n 1 -I {} sh -c 'pv {} | command'
I don't have pv so all of the above is untested so check the syntax, especially since I've never heard of pv :-).
Thanks to Max C., I found a solution for the main question:
find ./ -type f -iname *\.gz -o -iname *\.bz | (tot=0;while read fname; do s=$(stat -c%s "$fname"); if [ ! -z "$s" ] ; then echo "$fname"; tot=$(($tot+$s)); fi; done; echo $tot) | tac | (read size; xargs -i{} cat "{}" | pv -s $size | lzgrep -a something -)
But this work only for gz and bz files, now I have to develop to use different tool according to extension.
I'm gonna to try the Ed solution too.
Thanks to ED and Max C., here the verision 0.2
This version work with zgrep, but not with lzgrep. :-\
#!/bin/bash
echo -n "collecting dump... "
IFS= readarray -d '' files < <(find . \( -iname "*.bz" -o -iname "*.gz" \) -print0)
echo done
echo "Calculating archives size..."
tot=0
for line in "${files[#]}"; do
s=$(stat -c\%s "$line")
if [ ! -z "$s" ]
then
tot=$(($tot+$s))
fi
done
(for line in "${files[#]}"; do
s=$(stat -c\%s "$line")
if [ ! -z "$s" ]
then
echo "$line"
fi
done
) | xargs -i{} sh -c 'echo Processing file: "{}" 1>&2 ; cat "{}"' | pv -s $tot | zgrep -a anything -

How to print number of occurances of a word in a file in unix

This is my shell script.
Given a directory, and a word, search the directory and print the absolute path of the file that has the maximum occurrences of the word and also print the number of occurrences.
I have written the following script
#!/bin/bash
if [[ -n $(find / -type d -name $1 2> /dev/null) ]]
then
echo "Directory exists"
x=` echo " $(find / -type d -name $1 2> /dev/null)"`
echo "$x"
cd $x
y=$(find . -type f | xargs grep -c $2 | grep -v ":0"| grep -o '[^/]*$' | sort -t: -k2,1 -n -r )
echo "$y"
else
echo "Directory does does not exists"
fi
result: scriptname directoryname word
output: /somedirectory/vtb/wordsearch : 4
/foo/bar: 3
Is there any option to replace xargs grep -c $2 ? Because grep -c prints the count=number of lines which contains the word but i need to print the exact occurrence of a word in the files in a given directory
Using grep's -c count feature:
grep -c "SEARCH" /path/to/files* | sort -r -t : -k 2 | head -n 1
The grep command will output each file in a /path/name:count format, the sort will numerically (-n) sort by the 2nd (-k 2) field as delimited by a colon (-t :) in reverse order (-r). We then use head to keep the first result (-n 1).
Try This:
grep -o -w 'foo' bar.txt | wc -w
OR
grep -o -w 'word' /path/to/file/ | wc -w
grep -Fwor "$word" "$dir" | sed "s/:${word}\$//" | sort | uniq -c | sort -n | tail -1

How do I list newest directory and add as variable to bash script to process files recursively

How do I list newest directory and add as variable to bash script to process files recursively
ls -t1 | head -n1
Works perfectly to list the latest directory, but I want to add that directory name to my script so I can process the files within using the following script:
#!/bin/bash
ls | while read -r FILE
do
mv -v "$FILE" `echo $FILE | tr ' ' '_' `
done
ls | while read -r FILE
do
mv -v "$FILE" `echo $FILE | tr '\*.JPEG' '\*.jpg' `
done
mogrify -resize 750 *.jpg
wait
jpegoptim *.jpg –max=70 --strip-all
exit
I also want to process the files recursively, there might be at most one level of sub directories.
Basically keep the bash script at the root of the directory and process all latest directories and sub directories files.
OK I modified the script to this:
#!/bin/bash
DIR=ls -t1 | head -n1
ls $DIR | while read -r FILE
do
mv -v "$FILE" `echo $FILE | tr ' ' '_' `
done
ls $DIR | while read -r FILE
do
mv -v "$FILE" `echo $FILE | tr '\*.JPEG' '\*.jpg' `
done
mogrify -resize 750 $DIR/*.jpg
wait
jpegoptim $DIR/*.jpg –max=70 --strip-all
exit
But it does not seem to recognise the $DIR variable.
This bash script will rename and convert the jpg files in the newest directory in the current directory and the files in the first level of directories under that directory.
#!/bin/bash
FIRST_DIR=`ls -t1F | grep / | head -n1`
DIR="./${FIRST_DIR}"
ls -t1F $DIR | while read -r FILE
do
if [ "$FILE" ]
then
if [[ $FILE = */ ]]
then
echo "here ${DIR}${FILE}."
DEEP_DIR="${DIR}${FILE}"
ls -t1 $DEEP_DIR | while read -r FILE2
do
if [ "$FILE2" ]
then
if [[ $FILE2 != */ ]]
then
RENAME=`echo ${FILE2//\*/} | tr ' ' '_' `
mv -v "${DEEP_DIR}${FILE2//\*/}" "${DEEP_DIR}${RENAME}"
FILE2=$RENAME
RENAME2=`echo ${FILE2//\*/} | tr '\*.JPEG' '\*.jpg' `
mv -v "${DEEP_DIR}${FILE2//\*/}" "${DEEP_DIR}${RENAME2}"
FILE2=$RENAME2
fi
fi
done
mogrify -resize 750 "$DEEP_DIR*.jpg"
wait
jpegoptim "$DEEP_DIR*.jpg" –max=70 --strip-all
fi
if [[ $FILE != */ ]]
then
RENAME=`echo ${FILE//\*/} | tr ' ' '_' `
mv -v "${DIR}${FILE//\*/}" "${DIR}${RENAME}"
FILE=$RENAME
RENAME2=`echo ${FILE//\*/} | tr '\*.JPEG' '\*.jpg' `
mv -v "${DIR}${FILE//\*/}" "${DIR}${RENAME2}"
FILE=$RENAME2
fi
fi
done
if [ "$FIRST_DIR" ]
then
mogrify -resize 750 "$DIR*.jpg"
wait
jpegoptim "$DIR*.jpg" –max=70 --strip-all
fi
Here are a couple of good links about bash programming:
http://tldp.org/LDP/abs/html/comparison-ops.html
http://tldp.org/LDP/abs/html/string-manipulation.html

Get just the integer from wc in bash

Is there a way to get the integer that wc returns in bash?
Basically I want to write the line numbers and word counts to the screen after the file name.
output: filename linecount wordcount
Here is what I have so far:
files=\`ls`
for f in $files;
do
if [ ! -d $f ] #only print out information about files !directories
then
# some way of getting the wc integers into shell variables and then printing them
echo "$f $lines $words"
fi
done
Most simple answer ever:
wc < filename
Just:
wc -l < file_name
will do the job. But this output includes prefixed whitespace as wc right-aligns the number.
You can use the cut command to get just the first word of wc's output (which is the line or word count):
lines=`wc -l $f | cut -f1 -d' '`
words=`wc -w $f | cut -f1 -d' '`
wc $file | awk {'print "$4" "$2" "$1"'}
Adjust as necessary for your layout.
It's also nicer to use positive logic ("is a file") over negative ("not a directory")
[ -f $file ] && wc $file | awk {'print "$4" "$2" "$1"'}
Sometimes wc outputs in different formats in different platforms. For example:
In OS X:
$ echo aa | wc -l
1
In Centos:
$ echo aa | wc -l
1
So using only cut may not retrieve the number. Instead try tr to delete space characters:
$ echo aa | wc -l | tr -d ' '
The accepted/popular answers do not work on OSX.
Any of the following should be portable on bsd and linux.
wc -l < "$f" | tr -d ' '
OR
wc -l "$f" | tr -s ' ' | cut -d ' ' -f 2
OR
wc -l "$f" | awk '{print $1}'
If you redirect the filename into wc it omits the filename on output.
Bash:
read lines words characters <<< $(wc < filename)
or
read lines words characters <<EOF
$(wc < filename)
EOF
Instead of using for to iterate over the output of ls, do this:
for f in *
which will work if there are filenames that include spaces.
If you can't use globbing, you should pipe into a while read loop:
find ... | while read -r f
or use process substitution
while read -r f
do
something
done < <(find ...)
If the file is small you can afford calling wc twice, and use something like the following, which avoids piping into an extra process:
lines=$((`wc -l "$f"`))
words=$((`wc -w "$f"`))
The $((...)) is the Arithmetic Expansion of bash. It removes any whitespace from the output of wc in this case.
This solution makes more sense if you need either the linecount or the wordcount.
How about with sed?
wc -l /path/to/file.ext | sed 's/ *\([0-9]* \).*/\1/'
typeset -i a=$(wc -l fileName.dat | xargs echo | cut -d' ' -f1)
Try this for numeric result:
nlines=$( wc -l < $myfile )
Something like this may help:
#!/bin/bash
printf '%-10s %-10s %-10s\n' 'File' 'Lines' 'Words'
for fname in file_name_pattern*; {
[[ -d $fname ]] && continue
lines=0
words=()
while read -r line; do
((lines++))
words+=($line)
done < "$fname"
printf '%-10s %-10s %-10s\n' "$fname" "$lines" "${#words[#]}"
}
To (1) run wc once, and (2) not assign any superfluous variables, use
read lines words <<< $(wc < $f | awk '{ print $1, $2 }')
Full code:
for f in *
do
if [ ! -d $f ]
then
read lines words <<< $(wc < $f | awk '{ print $1, $2 }')
echo "$f $lines $words"
fi
done
Example output:
$ find . -maxdepth 1 -type f -exec wc {} \; # without formatting
1 2 27 ./CNAME
21 169 1065 ./LICENSE
33 130 961 ./README.md
86 215 2997 ./404.html
71 168 2579 ./index.html
21 21 478 ./sitemap.xml
$ # the above code
404.html 86 215
CNAME 1 2
index.html 71 168
LICENSE 21 169
README.md 33 130
sitemap.xml 21 21
Solutions proposed in the answered question doesn't work for Darwin kernels.
Please, consider following solutions that work for all UNIX systems:
print exactly the number of lines of a file:
wc -l < file.txt | xargs
print exactly the number of characters of a file:
wc -m < file.txt | xargs
print exactly the number of bytes of a file:
wc -c < file.txt | xargs
print exactly the number of words of a file:
wc -w < file.txt | xargs
There is a great solution with examples on stackoverflow here
I will copy the simplest solution here:
FOO="bar"
echo -n "$FOO" | wc -l | bc # "3"
Maybe these pages should be merged?
Try this:
wc `ls` | awk '{ LINE += $1; WC += $2 } END { print "lines: " LINE " words: " WC }'
It creates a line count, and word count (LINE and WC), and increase them with the values extracted from wc (using $1 for the first column's value and $2 for the second) and finally prints the results.
"Basically I want to write the line numbers and word counts to the screen after the file name."
answer=(`wc $f`)
echo -e"${answer[3]}
lines: ${answer[0]}
words: ${answer[1]}
bytes: ${answer[2]}"
Outputs :
myfile.txt
lines: 10
words: 20
bytes: 120
files=`ls`
echo "$files" | wc -l | perl -pe "s#^\s+##"
You have to use input redirection for wc:
number_of_lines=$(wc -l <myfile.txt)
respectively in your context
echo "$f $(wc -l <"$f") $(wc -w <"$f")"

Resources