How to pass a parameter from bash and do string comparison in awk? - bash

How to pass a parameter to awk to compare a string against the piped input?
For example, the following is used to filter files created before Aug 2011 under a specific folder:
#!/bin/bash
FILTER_DIR=$1
# file date before it should be listed.
FILTER_BEFORE="2011-08"
# $6 in awk statement is date of file name($8)
find $1 -type f | \
sed 's/^/ls -l /g' | \
sh | \
awk ' if ( $6 le $FILTER_BEFORE ) { print $8 }'
The result lists all files under $FILTER_DIR without filtering.
It seems awk did not receive $FILTER_BEFORE from bash properly.
Any comment is appreciated!

Pass it as a parameter with -v (this works in POSIX awk generally, not just gawk):
find $1 -type f |
sed 's/^/ls -l /g' |
sh |
awk -v filter_before="${FILTER_BEFORE}" '{ if ( $6 <= filter_before ) { print $8 } }'
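A quick way to confirm the value actually reaches awk (a minimal sketch):

```shell
FILTER_BEFORE="2011-08"
# BEGIN runs before any input is read, so this needs no piped data.
awk -v filter_before="$FILTER_BEFORE" 'BEGIN { print filter_before }'
# prints: 2011-08
```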

You will need to use double quotes and escape the awk field variables so they don't get interpreted by bash.
find $1 -type f | \
sed 's/^/ls -l /g' | \
sh | \
awk "{ if ( \$6 <= \"$FILTER_BEFORE\" ) { print \$8 } }"
Alternatively you can break out just the variable into double quotes so you can avoid escaping.
find $1 -type f | \
sed 's/^/ls -l /g' | \
sh | \
awk '{ if ( $6 <= "'"$FILTER_BEFORE"'" ) { print $8 } }'

I'd go with:
touch -t 201107302359 30_july_2011
find . -type f ! -newer 30_july_2011
Or this (GNU find only):
find . -type f ! -newermt '2011-07-30 23:59'
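Applied to the original script, the whole ls-parsing pipeline collapses into one find call (a sketch; assumes GNU find for -newermt):

```shell
#!/bin/bash
# Sketch: list regular files under $1 last modified before Aug 2011 (GNU find only).
FILTER_DIR=${1:-.}
find "$FILTER_DIR" -type f ! -newermt '2011-08-01 00:00'
```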

The following statements seem to work properly.
Thanks for everybody's help.
find $1 -type f | \
sed 's/^/ls -l /g' | \
sh | \
awk -v filter_before=${FILTER_BEFORE} '{ if ( $6 < filter_before ) { print $8 } }'
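This works because awk compares strings lexicographically, and ISO-style dates sort in date order; a minimal check with fabricated ls -l style lines:

```shell
# Fields mimic `ls -l` output: $6 is the date, $8 the file name (made-up data).
printf '%s\n' \
  '-rw-r--r-- 1 user group 10 2011-07-15 09:00 file1' \
  '-rw-r--r-- 1 user group 10 2011-09-01 09:00 file2' |
awk -v filter_before="2011-08" '{ if ( $6 < filter_before ) { print $8 } }'
# prints: file1
```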

Related

Get second part of output separated by two spaces

I have this script
#!/bin/bash
path=$1
find "$path" -type f -exec sha1sum {} \; | sort | uniq -D -w 32
It outputs this:
3c8b9f4b983afa9f644d26e2b34fa3e03a2bef16 ./dups/dup1-1.txt
3c8b9f4b983afa9f644d26e2b34fa3e03a2bef16 ./dups/dup1.txt
ffc752244b634abb4ed68d280dc74ec3152c4826 ./dups/subdups/dup2-2.txt
ffc752244b634abb4ed68d280dc74ec3152c4826 ./dups/subdups/dup2.txt
Now I only want to save the last part (the path) in an array.
When I add this after the sort
| awk -F " " '{ print $1 }'
I get this as output:
3c8b9f4b983afa9f644d26e2b34fa3e03a2bef16
3c8b9f4b983afa9f644d26e2b34fa3e03a2bef16
ffc752244b634abb4ed68d280dc74ec3152c4826
ffc752244b634abb4ed68d280dc74ec3152c4826
When I change the $1 to $2, I get nothing, but I want to get the path of the file.
How should I do this?
EDIT:
This script
#!/bin/bash
path=$1
find "$path" -type f -exec sha1sum {} \; | awk '{ print $1 }' | sort | uniq -D -w 32
Outputs this
parallels@mbp:~/bin$ duper ./dups
3c8b9f4b983afa9f644d26e2b34fa3e03a2bef16
3c8b9f4b983afa9f644d26e2b34fa3e03a2bef16
ffc752244b634abb4ed68d280dc74ec3152c4826
ffc752244b634abb4ed68d280dc74ec3152c4826
When I change it to $2 it outputs this
parallels@mbp:~/bin$ duper ./dups
parallels@mbp:~/bin$
Expected Output
./dups/dup1-1.txt
./dups/dup1.txt
./dups/subdups/dup2-2.txt
./dups/subdups/dup2.txt
There are some files in the directory that are not duplicates of each other, such as nodup1.txt and nodup2.txt. That's why they don't show up.
Change your find command to this:
find "$path" -type f -exec sha1sum {} \; | sort | uniq -D -w 41 | awk '{print $2}'
I kept the sort before uniq (uniq only detects adjacent duplicates) and made it consider just the first 41 characters, aiming to match just the sha1sum hash.
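The width-limited comparison can be seen on a couple of sample lines (hashes and names made up to mirror the question):

```shell
# -w 41 compares only the 40-character hash plus the following space,
# so the differing file names don't prevent a duplicate match.
printf '%s\n' \
  '3c8b9f4b983afa9f644d26e2b34fa3e03a2bef16  ./dups/dup1-1.txt' \
  '3c8b9f4b983afa9f644d26e2b34fa3e03a2bef16  ./dups/dup1.txt' \
  'ffc752244b634abb4ed68d280dc74ec3152c4826  ./nodup.txt' |
uniq -D -w 41
# prints only the two lines sharing the 3c8b… hash
```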
You can achieve the same result piping to tr and then cut:
echo '3c8b9f4b983afa9f644d26e2b34fa3e03a2bef16 ./dups/dup1-1.txt' |\
tr -s ' ' | cut -d ' ' -f 2
Outputs:
./dups/dup1-1.txt
-s ' ' on tr is to squeeze spaces
-d ' ' -f 2 on cut is to output the second field delimited by spaces
I like to use cut for stuff like this. With this input:
3c8b9f4b983afa9f644d26e2b34fa3e03a2bef16 ./dups/dup1-1.txt
I'd do cut -d ' ' -f 2 which should return:
./dups/dup1-1.txt
I haven't tested it though for your case.
EDIT: Gonzalo Matheu's answer is better as he ensured to remove any extra spaces between your outputs before doing the cut.
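Note that both the awk and cut approaches break on file names containing spaces. Since sha1sum always emits a 40-character hash followed by a two-character separator, stripping that fixed prefix is space-safe (a sketch):

```shell
# The hash is always 40 hex chars; remove it plus the two-space separator.
printf '%s\n' '3c8b9f4b983afa9f644d26e2b34fa3e03a2bef16  ./dups/my file.txt' |
sed 's/^[0-9a-f]\{40\}  //'
# prints: ./dups/my file.txt
```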

Getting a list of substring based unique filenames in an array

I have a directory my_dir with files having names like:
a_v5.json
a_v5.mapping.json
a_v5.settings.json
f_v39.json
f_v39.mapping.json
f_v39.settings.json
f_v40.json
f_v40.mapping.json
f_v40.settings.json
c_v1.json
c_v1.mapping.json
c_v1.settings.json
I'm looking for a way to get an array [a_v5, f_v40, c_v1] in bash. Here, array of file names with the latest version number is what I need.
Tried this: ls *.json | find . -type f -exec basename "{}" \; | cut -d. -f1, but it returns results including files that do not have the .json extension.
You can use the following command if your filenames don't contain whitespace and special symbols like * or ?:
array=($(
find . -type f -iname \*.json |
sed -E 's|(.*/)*(.*_v)([0-9]+)\..*|\2 \3|' |
sort -Vr | sort -uk1,1 | tr -d ' '
))
It's ugly and unsafe. The following solution is longer but can handle all file names, even those with linebreaks in them.
maxversions() {
find -type f -iname \*.json -print0 |
gawk 'BEGIN { RS = "\0"; ORS = "\0" }
match($0, /(.*\/)*(.*_v)([0-9]+)\..*/, group) {
prefix = group[2];
version = group[3] + 0;   # force numeric comparison, so v9 < v10
if (version > maxversion[prefix])
maxversion[prefix] = version
}
END {
for (prefix in maxversion)
print prefix maxversion[prefix]
}'
}
mapfile -d '' array < <(maxversions)
In both cases you can check the contents of array with declare -p array.
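The extraction step of the first pipeline can also be checked in isolation on the sample names from the question (a sketch):

```shell
# Each name becomes "<prefix> <version>"; version sort puts the newest first,
# sort -u keeps one line per prefix, and tr rejoins the pieces.
printf '%s\n' a_v5.json f_v39.json f_v40.json c_v1.json |
sed -E 's|(.*/)*(.*_v)([0-9]+)\..*|\2 \3|' |
sort -Vr | sort -uk1,1 | tr -d ' '
# a_v5
# c_v1
# f_v40
```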
Arrays and bash string parsing.
declare -A tmp=()
for f in $SOURCE_DIR/*.json
do f=${f##*/} # strip path
tmp[${f%%.*}]=1 # strip extraneous data after . in filename
done
declare -a c=( $( printf "%s\n" "${!tmp[@]}" | cut -c 1 | sort -u ) ) # get just the first chars
declare -a lst=( $( for f in "${c[@]}"
do printf "%s\n" "${!tmp[@]}" |
grep "^${f}_" |
sort -n |
tail -1; done ) )
echo "[ ${lst[@]} ]"
[ a_v5 c_v1 f_v40 ]
Or, if you'd rather,
declare -a arr=( $(
for f in $SOURCE_DIR/*.json
do d=${f%/*} # get dir path
f=${f##*/} # strip path
g=${f:0:2} # get leading str
( cd $d && printf "%s\n" ${g}*.json |
sort -n | sed -n '$ { s/[.].*//; p; }' )
done | sort -u ) )
echo "[ ${arr[@]} ]"
[ a_v5 c_v1 f_v40 ]
This is one possible way to accomplish this :
arr=( $( { for name in $( ls {f,n,m}*.txt ); do echo ${name:0:1} ; done; } | sort | uniq ) )
Output :
$ echo ${arr[0]}
f
$ echo ${arr[1]}
m
$ echo ${arr[2]}
n
Regards!
AWK SOLUTION
This is not an elegant solution... my knowledge of awk is limited.
You should find this functional.
I've updated this to remove the redundant uniq as suggested by @socowi.
I've also included the printf version as @socowi suggested.
ls *.json | cut -d. -f1 | sort -rn | awk -v last="xx" '$1 !~ last{ print $1; last=substr($1,1,3) }'
OR
printf %s\\n *.json | cut -d. -f1 | sort -rn | awk -v last="xx" '$1 !~ last{ print $1; last=substr($1,1,3) }'
Old understanding below
Find files with names matching the pattern.
Now take the second dot-separated field, since the results will be prefixed with ./
find . -type f -iname "*.json" | cut -d. -f2
To get the unique headings....
find . -type f -iname "*.json" | cut -d. -f2 | sort | uniq

Shell Output Alignment

How can I align this script's output?
for instance in `find /bxp/*/*/*/prod/*/apache_*/httpd/htdocs/ -type f -name status.txt` ; do
echo "`hostname`: `ls -ltr | ${instance}` : `cat ${instance}`"
done
Output looks like:
r008abc, /bxp/xip/xip.pentaho-server_pentaho-server-assembly/pentaho.prod.jobengine/prod/xip.pentaho-server_web.partition_0.0.1/apache_5.3.3-2.2.
26/httpd/htdocs/status.txt, Web server is disabled
However i want the output be like:
r008abc| xip - xip.pentaho-server_web.partition_0.0.1 | Web server is disabled
Here xip is just the second column of $instance, and xip.pentaho-server_web.partition_0.0.1 is the 6th column. How can I achieve this? I tried an awk command but it was not helpful. Your suggestion is appreciated.
Command I tried
for instance in `find /bxp/*/*/*/prod/*/apache_*/httpd/htdocs/ -type f -name status.txt` ; do
echo "`hostname`: `"ls -ltr | awk -F '/' '{print $3}"' ${instance}` : `cat ${instance}`"
done
Something like this one-liner:
find /bxp/*/*/*/prod/*/apache_*/httpd/htdocs/ -type f -name status.txt | awk -F/ -v host=$(hostname) '{cmd="cat \047" $0 "\047"; if ((cmd | getline out) > 0){printf "%s| %s - %s | %s\n", host,$3, $7, out} close(cmd);}'
Explanation of awk-script:
awk -F/ # use / as field separator
-v host=$(hostname) # set var host
'{
cmd="cat \047" $0 "\047" # define cmd
if ((cmd | getline out) > 0) # read output of cmd
printf "%s| %s - %s | %s\n",host,$3,$7,out; # print formatted result
close(cmd);
}'
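The same field extraction can also be done in pure bash with read, avoiding getline entirely (a sketch; the path layout is taken from the question, and r008abc stands in for $(hostname)):

```shell
# Split on "/"; the leading "/" yields an empty first field, so the
# 3rd and 7th fields hold the app name and the partition directory.
instance='/bxp/xip/xip.pentaho-server_pentaho-server-assembly/pentaho.prod.jobengine/prod/xip.pentaho-server_web.partition_0.0.1/apache_5.3.3-2.2.26/httpd/htdocs/status.txt'
IFS=/ read -r _ _ app _ _ _ partition _ <<< "$instance"
printf '%s| %s - %s | %s\n' 'r008abc' "$app" "$partition" 'Web server is disabled'
# r008abc| xip - xip.pentaho-server_web.partition_0.0.1 | Web server is disabled
```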

To get \n instead of n in echo -e command in shell script

I am trying to get the output for the echo -e command as shown below
Command used:
echo -e "cd \${2}\nfilesModifiedBetweenDates=\$(find . -type f -exec ls -l --time-style=full-iso {} \; | awk '{print \$6,\$NF}' | awk '{gsub(/-/,\"\",\$1);print}' | awk '\$1>= '$fromDate' && \$1<= '$toDate' {print \$2}' | tr \""\n"\" \""\;"\")\nIFS="\;" read -ra fileModifiedArray <<< "\$filesModifiedBetweenDates"\nfor fileModified in \${fileModifiedArray[@]}\ndo\n egrep -w "\$1" "\$fileModified" \ndone"
cd ${2}
Expected output:
cd ${2}
filesModifiedBetweenDates=$(find . -type f -exec ls -l --time-style=full-iso {} \; | awk '{print $6,$NF}' | awk '{gsub(/-/,"",$1);print}' | awk '$1>= '20140806' && $1<= '20140915' {print $2}' | tr "\n" ";")
IFS=; read -ra fileModifiedArray <<< $filesModifiedBetweenDates
for fileModified in ${fileModifiedArray[@]}
do
egrep -w $1 $fileModified
done
Original output:
cd ${2}
filesModifiedBetweenDates=$(find . -type f -exec ls -l --time-style=full-iso {} \; | awk '{print $6,$NF}' | awk '{gsub(/-/,"",$1);print}' | awk '$1>= '20140806' && $1<= '20140915' {print $2}' | tr "n" ";")
IFS=; read -ra fileModifiedArray <<< $filesModifiedBetweenDates
for fileModified in ${fileModifiedArray[@]}
do
egrep -w $1 $fileModified
done
How can I handle "\" in this?
For long blocks of text, it's much simpler to use a quoted here document than to try to embed a multi-line string in a single argument to echo or printf.
cat <<"EOF"
cd ${2}
filesModifiedBetweenDates=$(find . -type f -exec ls -l --time-style=full-iso {} \; | awk '{print $6,$NF}' | awk '{gsub(/-/,"",$1);print}' | awk '$1>= '20140806' && $1<= '20140915' {print $2}' | tr "\n" ";")
IFS=; read -ra fileModifiedArray <<< $filesModifiedBetweenDates
for fileModified in ${fileModifiedArray[@]}
do
egrep -w $1 $fileModified
done
EOF
You'd better use printf to have better control:
$ printf "tr %s %s\n" '"\n"' '";"'
tr "\n" ";"
As you see, we put the placeholders within double quotes: printf "text %s %s", and then we define what content should be substituted for these parameters.
In case you really have to use echo, then escape the \:
$ echo -e 'tr "\\n" ";"'
tr "\n" ";"
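The underlying rule: printf %s passes its arguments through verbatim, while echo -e interprets escapes, so the backslash must be doubled for echo. Side by side (a sketch):

```shell
printf '%s\n' 'tr "\n" ";"'   # %s does not interpret backslashes in arguments
echo -e 'tr "\\n" ";"'        # -e turns \\ into a single literal backslash
# both print: tr "\n" ";"
```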
Interesting read: Why is printf better than echo?

bash command substitution force to foreground

I have this:
echo -e "\n\n"
find /home/*/var/*/logs/ \
-name transfer.log \
-exec awk -v SUM=0 '$0 {SUM+=1} END {print "{} " SUM}' {} \; \
> >( sed '/\b0\b/d' \
| awk ' BEGIN {printf "\t\t\tTRANSFER LOG\t\t\t\t\t#OF HITS\n"}
{printf "%-72s %-s\n", $1, $2}
' \
| (read -r; printf "%s\n" "$REPLY"; sort -nr -k2)
)
echo -e "\n\n"
When run on a machine with bash 4.1.2 it always returns correctly, except that all 4 of my newlines end up at the top.
When run on a machine with bash 3.00.15 it puts all 4 newlines at the top, returns the prompt in the middle of the output, and never completes; it just hangs.
I would really like to fix this for both versions as we have a lot of machines running both.
Why make life so difficult and unintelligible? Why not simplify?
TXFRLOG=$(find /home..... transfer.log)
awk .... ${TXFRLOG}
The answer I found was to use a while read loop:
echo -e "\n\n"; \
printf "\t\t\tTRANSFER LOG\t\t\t\t\t#OF HITS\n"; \
while read -r line; \
do echo "$line" |sed '/\b0\b/d' | awk '{printf "%-72s %-s\n", $1, $2}'; \
done < <(find /home/*/var/*/logs/ -name transfer.log -exec awk -v SUM=0 '$0 {SUM+=1} END{print "{} " SUM}' {} \;;) \
|sort -nr -k2; \
echo -e "\n\n"
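Both symptoms come from the shell not waiting for the >( … ) process substitution, so its output races the surrounding echo commands. A plain pipeline has no such race, since each stage finishes before the prompt returns. A reordered sketch of the same report (sample lines stand in for the per-logfile awk counts):

```shell
# Sort the data first, then prepend the header; no process substitution needed.
printf '%s\n' '/home/a/var/x/logs/transfer.log 12' \
              '/home/b/var/y/logs/transfer.log 0' \
              '/home/c/var/z/logs/transfer.log 40' |
sed '/\b0\b/d' |   # drop logs with zero hits
sort -nr -k2 |     # highest hit count first
awk 'BEGIN { printf "%-72s %s\n", "TRANSFER LOG", "#OF HITS" }
           { printf "%-72s %s\n", $1, $2 }'
```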
