Using if in awk to compare against today's date - bash

I am looking for a command that lets me use if in awk to compare a field against the current date.
The A/B folder has files with different dates. I need to select only the files dated today whenever the script runs.
A) Gives an output with all the dates,
date=`date +"%Y-%m-%d"`
s3cmd ls --recursive s3://A/B/ | grep A-B | grep .tar | awk '{ if ($1 -eq "$date" ) print $1" "$2 " " $3 " " $4 }' | sort -r
B)Replaces $1 which contains dates with "$date" to all of them
date=`date +"%Y-%m-%d"`
s3cmd ls --recursive s3://A/B/ | grep A-B | grep .tar | awk '{ if ($1 = "$date" ) print $1" "$2 " " $3 " " $4 }' | sort -r
C)Does not give any output. leaves blank
date=`date +"%Y-%m-%d"`
s3cmd ls --recursive s3://A/B/ | grep A-B | grep .tar | awk '{ if ($1 == "$date" ) print $1" "$2 " " $3 " " $4 }' | sort -r
If I remove the double quotes, I get no output in any of the cases.

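The shell doesn't substitute variables inside single quotes, and the whole awk script is single-quoted, so awk sees the literal text "$date". You should assign an awk variable from the shell variable. Also, the equality comparison in awk is ==, not -eq (shell test syntax) or = (assignment).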
awk -v date="$date" '$1 == date { print $1" "$2 " " $3 " " $4 }'
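A minimal sketch of the same idea that can be run without S3 access. The listing lines below are made-up stand-ins for `s3cmd ls` output, and the variable names `today`, `listing` and `filtered` are only illustrative:

```shell
#!/bin/sh
# Made-up sample lines standing in for `s3cmd ls --recursive` output.
today=$(date +%Y-%m-%d)
listing=$(printf '%s\n' \
    "$today 10:00 1024 s3://A/B/A-B-1.tar" \
    "1999-01-01 09:00 2048 s3://A/B/A-B-0.tar")

# The awk program is single-quoted, so the shell never expands anything
# inside it; the date is handed over explicitly with -v instead.
filtered=$(printf '%s\n' "$listing" |
    awk -v date="$today" '$1 == date { print $1, $2, $3, $4 }')

printf '%s\n' "$filtered"
```

Only the line whose first field equals today's date should be printed.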

You don't really need awk for that. Just use find:
find /path/to/search/ -type f ! -newermt "$(date +"%Y-%m-%d")"
$(..) is command substitution; here it expands to the current date in the format YYYY-MM-DD.
! -newermt makes find look for files last modified before the specified date (drop the ! to get files modified since midnight today).
-type f restricts the search to regular files.
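As a rough illustration of how -newermt behaves with and without the negation, here is a sketch on a throwaway directory. It assumes a find that supports -newermt (GNU find does); the file names are invented for the example:

```shell
#!/bin/sh
# Throwaway directory with one file touched now and one dated in 2000.
tmpdir=$(mktemp -d)
touch "$tmpdir/new.tar"
touch -t 200001010000 "$tmpdir/old.tar"

today=$(date +%Y-%m-%d)

# Files last modified before today 00:00 (what "! -newermt" selects).
older=$(find "$tmpdir" -type f ! -newermt "$today")

# Files modified since today 00:00 (same test without the negation).
newer=$(find "$tmpdir" -type f -newermt "$today")

printf 'older: %s\nnewer: %s\n' "$older" "$newer"
```

Note that find only sees local files, so this applies when the files are on disk rather than listed from a remote store.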

awk is not shell. It is a completely separate tool with its own syntax and capabilities. Therefore you should not expect to be able to use shell variables or shell syntax in an awk script. Try this:
s3cmd ls --recursive s3://A/B/ |
awk -v date="$date" -v OFS=" " '/A-B/ && /.tar/ && ($1 == date) { print $1, $2, $3, $4 }' |
sort -r
You probably meant \.tar instead of .tar though (an unescaped . matches any character), and as @jaypal said, this is a job for find, not ls piped to awk.

Related

Splitting out a large file

I would like to process a 200 GB file with lines like the following:
...
{"captureTime": "1534303617.738","ua": "..."}
...
The objective is to split this file into multiple files grouped by hours.
Here is my basic script:
#!/bin/sh
echo "Splitting files"
echo "Total lines"
sed -n '$=' $1
echo "First Date"
head -n1 $1 | jq '.captureTime' | xargs -i date -d '@{}' '+%Y%m%d%H'
echo "Last Date"
tail -n1 $1 | jq '.captureTime' | xargs -i date -d '@{}' '+%Y%m%d%H'
while read p; do
date=$(echo "$p" | sed 's/{"captureTime": "//' | sed 's/","ua":.*//' | xargs -i date -d '@{}' '+%Y%m%d%H')
echo $p >> split.$date
done <$1
Some facts:
80 000 000 lines to process
jq doesn't work well since some JSON lines are invalid.
Could you help me to optimize this bash script?
Thank you
This awk solution might come to your rescue:
awk -F'"' '{file="split."strftime("%Y%m%d%H",$4); print >> file; close(file) }' "$1"
It essentially replaces your while-loop.
Furthermore, you can replace the complete script with:
# Start AWK file
BEGIN{ FS="\"" }   # awk has no single-quoted strings; escape the double quote instead
(NR==1){tmin=tmax=$4}
($4 > tmax) { tmax = $4 }
($4 < tmin) { tmin = $4 }
{ file="split."strftime("%Y%m%d%H",$4); print >> file; close(file) }
END {
print "Total lines processed: ", NR
print "First date: "strftime("%Y%m%d%H",tmin)
print "Last date: "strftime("%Y%m%d%H",tmax)
}
Which you then can run as:
awk -f <awk_file.awk> <jq-file>
Note: the usage of strftime indicates that you need to use GNU awk.
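If GNU awk is not available, one possible workaround is to bucket by epoch hour with plain arithmetic instead of strftime. The sketch below is an assumption-laden variant, not the answer's method: it names files split.<epoch of the hour start> rather than split.YYYYmmddHH, and it closes a file only when the hour changes, which assumes the input is roughly ordered by time. The `dir` variable and sample lines are invented for the example:

```shell
#!/bin/sh
# Small made-up sample in the same shape as the question's input.
tmpdir=$(mktemp -d)
printf '%s\n' \
    '{"captureTime": "1534303617.738","ua": "a"}' \
    '{"captureTime": "1534303700.000","ua": "b"}' \
    '{"captureTime": "1534307300.123","ua": "c"}' > "$tmpdir/sample.json"

awk -F'"' -v dir="$tmpdir" '{
    bucket = int($4 / 3600) * 3600      # start of the hour, in epoch seconds
    file = dir "/split." bucket
    if (file != prev) {                 # close the old file only when the hour changes
        if (prev != "") close(prev)
        prev = file
    }
    print >> file                       # append, so reopening a bucket later is safe
}' "$tmpdir/sample.json"

ls "$tmpdir"
```

Keeping the current output file open avoids one open/close pair per input line, which matters at 80 000 000 lines.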
You can start optimizing by replacing this
sed 's/{"captureTime": "//' | sed 's/","ua":.*//'
with this
sed -nE 's/(\{"captureTime": ")([0-9\.]+)(.*)/\2/p'
-n suppresses automatic printing of the pattern space
-E enables extended regular expressions in the script

Argument not recognised/accessed by egrep - Shell

Egrep and awk to output columns of a line with a specific value in the first column
I am tasked with writing a shell program which, when run as
./tool.sh -f file -id id OR ./tool.sh -id id -f file
must output the name, surname and birthdate (3 columns of the file) for that specific id.
So far my code is structured as such :
elif [ "$#" -eq 4 ];
then
while [ "$1" != "" ];
do
case $1 in
-f)
cat < "$2" | egrep '"$4"' | awk ' {print $3 "\t" $2 "\t" $5}'
shift 4
;;
-id)
cat < "$4" | egrep '"$2"' | awk ' {print $3 "\t" $2 "\t" $5}'
shift 4
esac
done
(Ignore the opening elif; there are more subtasks for later.)
I get no output. The program just runs and exits.
I've tested the cat < people.dat | egrep '125' | awk ' {print $3 "\t" $2 "\t" $5}'
and it runs just fine.
At one point I did get output from the program when it was written like this
cat < "$2" | egrep '["$4"]' | awk ' {print $3 "\t" $2 "\t" $5}'
but the output was not limited to that specific ID.
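`egrep "$4"` is correct, rather than `egrep '"$4"'` or `egrep '["$4"]'`, in
`cat < "$2" | egrep '["$4"]' | awk ' {print $3 "\t" $2 "\t" $5}'`
Double quotes allow variable expansion, single quotes don't. Inside single quotes, `["$4"]` is a literal bracket expression that matches any line containing a double quote, a dollar sign or the digit 4, which is why it printed more than one ID. No command requires a particular type of quote: quotes are purely a shell feature and are not passed to the command (as mentioned by @that other guy).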
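The difference can be seen on a small made-up file. The column layout below (id, surname, name, sex, birthdate) is only an assumption chosen to fit the `$3 $2 $5` print order from the question, and the variable names are illustrative. The last command is a possible alternative that compares the first column exactly, so id 125 does not also match 1250:

```shell
#!/bin/sh
# Made-up data; assumed layout: id surname name sex birthdate
people=$(mktemp)
printf '%s\n' '125 Doe Jane F 1990-01-01' '1250 Roe John M 1985-05-05' > "$people"
id=125

# Single quotes: grep receives the literal pattern "$id" and matches nothing.
single=$(grep -E '"$id"' "$people" || true)

# Double quotes: the shell expands $id, but 125 is also a substring of 1250.
double=$(grep -E "$id" "$people" || true)

# Exact match on the first column, with the value passed in via -v.
exact=$(awk -v id="$id" '$1 == id { print $3 "\t" $2 "\t" $5 }' "$people")

printf 'single: [%s]\ndouble:\n%s\nexact: %s\n' "$single" "$double" "$exact"
```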

Creating directories from list preserving whitespaces

I have list of names in a file that I need to create directories from. The list looks like
Ada Lovelace
Jean Bartik
Leah Culver
I need the folders to be the exact same, preserving the whitespace(s). But with
awk '{print $0}' myfile | xargs mkdir
I create separate folders for each word
Ada
Lovelace
Jean
Bartik
Leah
Culver
Same happens with
awk '{print $1 " " $2}' myfile | xargs mkdir
Where is the error?
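xargs splits its input on any whitespace by default, so each word becomes a separate argument. With GNU xargs you can use the -d option to set the delimiter to \n only. This way you can also avoid awk.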
xargs -d '\n' mkdir -p < file
If you don't have GNU xargs then you can use tr to convert all \n to \0 first:
tr '\n' '\0' < file | xargs -0 mkdir
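If neither xargs variant is an option, a plain read loop is another way to keep each line intact. This is only a sketch; the `names.txt` file and `out` directory are invented for the example:

```shell
#!/bin/sh
# Sample list taken from the question, written to a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'Ada Lovelace' 'Jean Bartik' 'Leah Culver' > "$tmpdir/names.txt"

# IFS= keeps leading/trailing whitespace, -r keeps backslashes literal,
# and quoting "$name" passes the whole line to mkdir as one argument.
while IFS= read -r name; do
    if [ -n "$name" ]; then
        mkdir -p -- "$tmpdir/out/$name"
    fi
done < "$tmpdir/names.txt"

ls "$tmpdir/out"
```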
@birgit: try this, based entirely on the sample Input_file provided.
awk -vs1="\"" 'BEGIN{printf "mkdir ";}{printf("%s%s%s ",s1,$0,s1);} END{print ""}' Input_file | sh
awk '{ system ( sprintf( "mkdir \"%s\"", $0)) }' YourFile
# OR
awk '{ print "mkdir \"" $0 "\"" | "/bin/sh" }' YourFile
# OR for 1 subshell
awk '{ Cmd = sprintf( "%s%smkdir \"%s\"", Cmd, (NR==1?"":"\n"), $0) } END { system ( Cmd ) }' YourFile
The last version is better because it creates only 1 subshell.
If there is a huge number of folders (the shell limits the length of a command), you could loop and run several smaller commands instead.

Converting date to unix epoch using awk in log files

I have file containing multiple lines in format "[dd.mm.yyyy.] text value". I need to convert this to "Unix epoch| text value". I tried to use awk to do this but I can't seem to find the correct command
For example, if the file is:
[30.08.2013 13:54:49.126] Foo
[30.08.2013 13:56:49.126] Bar
[30.08.2013 13:59:49.126] Foo bar
I use the following (probably too complex awk command):
cat sample.txt | cut -c 2- |awk -F'[. :]' ' { $cmd="date --date " "\""$3$2$1" "$4":"$5":"$6"\""" +%s" ; $cmd |& getline epoch; close($cmd); printf epoch"|"; print $0 ;}';
The problem is that I get the time in epoch correctly but I can't access the rest of the line. The $0 (and other $ variables) contain the date command. So the output is
1377863689|date --date "20130830 13:54:49" +%s
1377863809|date --date "20130830 13:56:49" +%s
1377863989|date --date "20130830 13:59:49" +%s
What I wish to get is
1377863689|Foo
1377863809|Bar
1377863989|Foo bar
Is there a (preferably simple) way of accomplishing this? Should I use some other tool?
Assuming you have gawk (fair assumption since you are using GNU date) you can do this all internally to gawk:
$ awk 'match($0, /\[(.*)\] (.*)/, a) &&
match(a[1], /([0-9]{2})\.([0-9]{2})\.([0-9]{4}) ([0-9:]+)(\.[0-9]+)/,b) {
gsub(/:/," ",b[4])
s=b[3] " " b[2] " " b[1] " " b[4]
print mktime(s) "|" a[2]
}' file
1377896089|Foo
1377896209|Bar
1377896389|Foo bar
Or, a Bash solution:
while IFS= read -r line; do
if [[ "$line" =~ \[([[:digit:]]{2})\.([[:digit:]]{2})\.([[:digit:]]{4})\ +([[:digit:]:]+)\.([[:digit:]]+)\]\ +(.*) ]]
then
printf "%s|%s\n" $(gdate +"%s" --date="${BASH_REMATCH[3]}${BASH_REMATCH[2]}${BASH_REMATCH[1]} ${BASH_REMATCH[4]}") "${BASH_REMATCH[6]}"
fi
done <file
I propose to simplify it to
IFS=' |.|[';
while read -r _ day month year hour _ name; do
date=$(date --date "$year$month$day $hour" +%s);
echo "$date|$name";
done < sample.txt
Or, if you prefer to continue with awk
awk -F'[\\[\\]. ]' '{
split($0,a,"] ")
cmd = "date --date \"" $4$3$2" "$5"\" +%s"
cmd | getline date
close(cmd)   # close each command so open pipes do not pile up on long files
printf "%s|%s\n",date,a[2]
}' sample.txt
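The epoch values in the question and in the first answer differ because date and mktime interpret the timestamp in the local time zone of whoever ran them. The sketch below pins the conversion to UTC so the result is reproducible. It assumes GNU date (for -d), and the helper name `to_epoch_line` is made up for the example:

```shell
#!/bin/sh
# Convert one "[dd.mm.yyyy HH:MM:SS.mmm] text" line to "epoch|text" (UTC).
to_epoch_line() {
    stamp=${1%%]*}             # everything before the first ]
    stamp=${stamp#\[}          # drop the leading [
    text=${1#*] }              # everything after "] "
    day=${stamp%%.*};   rest=${stamp#*.}
    month=${rest%%.*};  rest=${rest#*.}
    year=${rest%% *};   time=${rest#* }
    time=${time%%.*}           # strip the milliseconds
    # -u makes GNU date read the timestamp as UTC instead of local time.
    printf '%s|%s\n' "$(date -u -d "$year-$month-$day $time" +%s)" "$text"
}

to_epoch_line '[30.08.2013 13:54:49.126] Foo bar'
```

Dropping -u gives values in the local time zone instead, as in the outputs shown in the thread.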

Extract file size, time and row count in a one-line shell script

Hey everyone! I am pretty new to shell scripting and I am stuck.
I need to extract the following information: file_name, size, time and row_count, and I want to do it in one command line. I tried this:
ls -l * && wc -l file.txt && du -ks file.txt | cut -f1| awk '{print $5" " $6 " " $7 " "$8 " " $9 " "$1 " "$2}'
but it is not working properly.
I also tried doing it in a loop, but I don't know how to extract the values from there:
for file in `ls -ltr /export/home/oracle/dbascripts/scripts`
do
[[ -f $file ]] && echo $file | awk '{print $3}'
done
Then I want to append the result to a file with >> so it can be loaded with SQL*Loader.
Thanks in advance!
This could be a start if you have GNU find and GNU coreutils (most Linux distributions do):
for i in /my/path/*; do
find "$i" ! -type d -printf '%p %TY-%Tm-%Td %TH:%TM:%TS %s '
wc -l <"$i"
done
/my/path/* should be changed to match the files you want to probe.
Keep in mind that this loop misbehaves if any of the matched paths are directories (find descends into them and wc fails). This version is safer in that regard:
for i in *; do
if [[ -d "$i" ]]; then
continue
fi
find "$i" -printf '%p %TY-%Tm-%Td %TH:%TM:%TS %s '
wc -l <"$i"
done
You will want to see the manual page for GNU find to understand this better.
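For the pipe-separated layout the question is aiming at, a sketch along the following lines may be closer to what SQL*Loader needs. It assumes GNU date (for -r, which reads a file's modification time); the field order name|time|size|rows and the scratch file are assumptions made for the example:

```shell
#!/bin/sh
# Scratch directory with one 3-line, 6-byte sample file.
tmpdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmpdir/file.txt"

report=$(
    for f in "$tmpdir"/*; do
        [ -f "$f" ] || continue                        # skip directories and the like
        size=$(wc -c < "$f" | tr -d ' ')               # size in bytes
        rows=$(wc -l < "$f" | tr -d ' ')               # number of lines
        mtime=$(date -r "$f" '+%d-%b-%Y %H:%M:%S')     # GNU date: mtime of the file
        printf '%s|%s|%s|%s\n' "${f##*/}" "$mtime" "$size" "$rows"
    done
)

printf '%s\n' "$report"
```

Reading the size and time directly avoids parsing ls -l output, whose columns vary with locale and file age.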
EDIT:
There is at least one other, faster way, using join and bash process substitution, but it's a bit ugly and somewhat harder to make safe and debug.
ExtractInformation()
{
timesep="-"
sep="|"
dot=":"
sec="00"
lcount=`wc -l < $fname`
modf_time=`ls -l $fname`
f_size=`echo $modf_time | awk '{print $5}'`
time_month=`echo $modf_time | awk '{print $6}'`
time_day=`echo $modf_time | awk '{print $7}'`
time_hrmin=`echo $modf_time | awk '{print $8}'`
time_hr=`echo $time_hrmin | cut -d ':' -f1`
time_min=`echo $time_hrmin | cut -d ':' -f2`
time_year=`date '+%Y'`
time_param="DD-MON-YYYY HH24:MI:SS"
time_date=$time_day$timesep$time_month$timesep$time_year" "$time_hrmin$dot$sec
result=$fname$sep$time_date$sep$f_size$sep$lcount$sep$time_param
sqlresult=`echo $result | awk '{FS = "|" ;q=sprintf("%c", 39); print "INSERT INTO SIP_ICMS_FILE_T(f_name, f_date_time,f_size,f_row_count) VALUES (" q $1 q ", TO_DATE("q $2 q,q $5 q "),"$3","$4");";}'`
echo $sqlresult>>data.sql
echo "Reading data....."
}
UploadData()
{
#ss=`sqlplus -s a/a@adb @data.sql
#set serveroutput on
#set feedback off
#set echo off`
echo "loading with sql Loader....."
}
f_data=data.sql
[[ -f $f_data ]] && rm data.sql
for fname in * ;
do
if [[ -f $fname ]]; then
ExtractInformation
fi
UploadData
#Zipdata
done
