Edit a text file in Unix - bash

sname;id;is_up;p_data
"0000";4256;100;"052263"
"006335";5228;100;"00522633"
"ABTEST";1452;100;"1522620 0"
How do I edit the above file in Unix to:
Add 2 lines at the top for a title and the system date and time
Add ; at the end of each row
Add an End tag at the end of the file
The final file should look like
!title
!Time: 2014-12-33
sname;id;is_up;p_data
"0000";4256;100;"052263";
"006335";5228;100;"00522633";
"ABTEST";1452;100;"1522620 0";
!End

Use sed to add ; to the end of each line. If you want to skip the first line from this operation, use the address 1!.
{
echo '!title'
echo -n '!Time: '
date +%Y-%m-%d
sed '1! s/$/;/' file
echo '!End'
} > newfile
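If you have GNU sed, the whole job can also be done in a single invocation, since GNU sed accepts the one-line i (insert) and a (append) commands and expands \n inside their text. A minimal sketch, GNU sed only:
sed -e "1i !title\n!Time: $(date '+%F %T')" -e '1!s/$/;/' -e '$a !End' file > newfile
Text inserted with i never passes through the pattern space, so the 1!s/$/;/ substitution does not touch the header lines.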

You can make use of the BEGIN and END blocks, together with $0=$0";", to add a ; at the end of every line except the first:
awk -v d="$(date)" '
BEGIN{print "!title"; print "!Time: ", d}
NR>1{$0=$0";"} 1;
END {print "!End"}
' file
See output:
$ awk -v d="$(date)" 'BEGIN{print "!title"; print "!Time: ", d} NR>1{$0=$0";"} 1; END {print "!End"}' file
!title
!Time: Wed Apr 30 15:25:53 CEST 2014
sname;id;is_up;p_data
"0000";4256;100;"052263";
"006335";5228;100;"00522633";
"ABTEST";1452;100;"1522620 0";
!End
For the date format, you should decide which one you want; date alone prints the full default timestamp. I would maybe go for:
$ date "+%F %T"
2014-04-30 15:41:04
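(With GNU date, and with strftime formats generally, %F is shorthand for %Y-%m-%d and %T for %H:%M:%S.)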

Related

Store variables from lines in a text file using awk and cut in a for loop

I have a tab separated text file, call it input.txt
cat input.txt
Begin Annotation Diff End Begin,End
6436687 >ENST00000422706.5|ENSG00000100342.21|OTTHUMG00000030427.9|-|APOL1-205|APOL1|2901|protein_coding| 50 6436736 6436687,6436736
6436737 >ENST00000426053.5|ENSG00000100342.21|OTTHUMG00000030427.9|-|APOL1-206|APOL1|2808|protein_coding| 48 6436784 6436737,6436784
6436785 >ENST00000319136.8|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000075315.5|APOL1-201|APOL1|3000|protein_coding| 51 6436835 6436785,6436835
6436836 >ENST00000422471.5|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319151.1|APOL1-204|APOL1|561|nonsense_mediated_decay| 11 6436846 6436836,6436846
6436847 >ENST00000475519.5|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319153.1|APOL1-212|APOL1|600|retained_intron| 11 6436857 6436847,6436857
6436858 >ENST00000438034.5|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319152.2|APOL1-210|APOL1|566|protein_coding| 11 6436868 6436858,6436868
6436869 >ENST00000439680.5|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319252.1|APOL1-211|APOL1|531|nonsense_mediated_decay| 10 6436878 6436869,6436878
6436879 >ENST00000427990.5|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319154.2|APOL1-207|APOL1|624|protein_coding| 12 6436890 6436879,6436890
6436891 >ENST00000397278.8|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319100.4|APOL1-202|APOL1|2795|protein_coding| 48 6436938 6436891,6436938
6436939 >ENST00000397279.8|ENSG00000100342.21|OTTHUMG00000030427.9|-|APOL1-203|APOL1|1564|protein_coding| 28 6436966 6436939,6436966
6436967 >ENST00000433768.5|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319253.2|APOL1-209|APOL1|541|protein_coding| 11 6436977 6436967,6436977
6436978 >ENST00000431184.1|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319254.1|APOL1-208|APOL1|550|nonsense_mediated_decay| 11 6436988 6436978,6436988
Using the information in input.txt I want to obtain information from a file called Other_File.fa. This file is an annotation file filled with ENST numbers (transcript IDs) and sequences of A's, T's, C's, and G's. I want to store the sequence in a file called Output.log (see example below), and I want to store the command used to retrieve the text in a file called Input.log (see example below).
I have tried to do this using awk and cut so far using a for loop. This is the code I have tried.
for line in `awk -F "\\t" 'NR != 1 {print substr($2,2,17)"#"$5}' input.txt`
do
transcript=`cut -d "#" -f 1 $line`
range=`cut -d "#" -f 2 $line` #Range is the string location in Other_File.fa
echo "Our transcript is ${transcript} and our range is ${range}" >> Input.log
sed -n '${range}' Other_File.fa >> Output.log
done
Here is an example of the 11 lines between ENST00000433768.5 and ENST00000431184.1 in Other_File.fa.
grep -A 11 ENST00000433768.5 Other_File.fa
>ENST00000433768.5|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319253.2|APOL1-209|APOL1|541|protein_coding|
ATCCACACAGCTCAGAACAGCTGGATCTTGCTCAGTCTCTGCCAGGGGAAGATTCCTTGG
AGGAGCACACTGTCTCAACCCCTCTTTTCCTGCTCAAGGAGGAGGCCCTGCAGCGACATG
GAGGGAGCTGCTTTGCTGAGAGTCTCTGTCCTCTGCATCTGGATGAGTGCACTTTTCCTT
GGTGTGGGAGTGAGGGCAGAGGAAGCTGGAGCGAGGGTGCAACAAAACGTTCCAAGTGGG
ACAGATACTGGAGATCCTCAAAGTAAGCCCCTCGGTGACTGGGCTGCTGGCACCATGGAC
CCAGGCCCAGCTGGGTCCAGAGGTGACAGTGGAGAGCCGTGTACCCTGAGACCAGCCTGC
AGAGGACAGAGGCAACATGGAGGTGCCTCAAGGATCAGTGCTGAGGGTCCCGCCCCCATG
CCCCGTCGAAGAACCCCCTCCACTGCCCATCTGAGAGTGCCCAAGACCAGCAGGAGGAAT
CTCCTTTGCATGAGAGCAGTATCTTTATTGAGGATGCCATTAAGTATTTCAAGGAAAAAG
T
>ENST00000431184.1|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319254.1|APOL1-208|APOL1|550|nonsense_mediated_decay|
The range value in input.txt for this transcript is 6436967,6436977. In my file Input.log for this transcript I hope to get
Our transcript is ENST00000433768.5 and our range is 6436967,6436977
And in Output.log for this transcript I hope to get
>ENST00000433768.5|ENSG00000100342.21|OTTHUMG00000030427.9|OTTHUMT00000319253.2|APOL1-209|APOL1|541|protein_coding|
ATCCACACAGCTCAGAACAGCTGGATCTTGCTCAGTCTCTGCCAGGGGAAGATTCCTTGG
AGGAGCACACTGTCTCAACCCCTCTTTTCCTGCTCAAGGAGGAGGCCCTGCAGCGACATG
GAGGGAGCTGCTTTGCTGAGAGTCTCTGTCCTCTGCATCTGGATGAGTGCACTTTTCCTT
GGTGTGGGAGTGAGGGCAGAGGAAGCTGGAGCGAGGGTGCAACAAAACGTTCCAAGTGGG
ACAGATACTGGAGATCCTCAAAGTAAGCCCCTCGGTGACTGGGCTGCTGGCACCATGGAC
CCAGGCCCAGCTGGGTCCAGAGGTGACAGTGGAGAGCCGTGTACCCTGAGACCAGCCTGC
AGAGGACAGAGGCAACATGGAGGTGCCTCAAGGATCAGTGCTGAGGGTCCCGCCCCCATG
CCCCGTCGAAGAACCCCCTCCACTGCCCATCTGAGAGTGCCCAAGACCAGCAGGAGGAAT
CTCCTTTGCATGAGAGCAGTATCTTTATTGAGGATGCCATTAAGTATTTCAAGGAAAAAG
T
But I am getting the following error, and I am unsure as to why or how to fix it.
cut: ENST00000433768.5#6436967,6436977: No such file or directory
cut: ENST00000433768.5#6436967,6436977: No such file or directory
Our transcript is and our range is
My thought was that each line from the awk would be read as a string, and then cut could split the string along the "#" symbol I added; but cut is reading each line as a file, and throws an error when it can't locate the file in my directory.
Thanks.
EDIT2: This is a generic solution which will compare the 2 files (input.txt and other_file.fa) and print whichever range is found, on whatever line it is found. E.g., if the range numbers are found on line 300 but the range says to print from 1 to 20, it will work in that case too. Also note that this calls the system command, which in turn calls sed (like you were using the range within sed); there are other ways too, like loading the whole Input_file into an array and then printing, but I am going with this one here. Fair warning: this is not tested with huge files.
awk -F'[>| ]' '
FNR==NR{                 # first pass: input file
  arr[$2]=$NF            # remember the range (last field) for each transcript ID
  next
}
($2 in arr){             # second pass: header whose ID was seen in the first file
  split(arr[$2],lineNum,",")
  print arr[$2]
  start=lineNum[1]
  end=lineNum[2]
  print "sed -n \047" start","end"p \047 " FILENAME
  system("sed -n \047" start","end"p\047 " FILENAME)   # let sed print that line range
  start=end=0
}
' file1 FS="[>|]" other_file.fa
EDIT: With OP's edited samples, please try the following to print lines based on the other file. It assumes that the range values always refer to lines after the line on which they are found (e.g., the range values are found on the 3rd line and the range is 4 to 10).
awk -F'[>| ]' '
FNR==NR{                      # first pass: input file
  arr[$2]=$NF                 # remember the range for each transcript ID
  next
}
($2 in arr){                  # second pass: header whose ID was seen
  split(arr[$2],lineNum," ")
  start=lineNum[1]
  end=lineNum[2]
}
FNR>=start && FNR<=end{       # while inside the range, print the lines
  print
  if(FNR==end){
    start=end=0               # reset once the range is done
  }
}
' file1 FS="[>|]" other_file.fa
You need not do this with a for loop that calls an awk program for each line. This can be done in a single awk, considering that you only have to print them. Written and tested with your shown samples.
awk -F'[>| ]' 'FNR>1{print "Our transcript is:"$3" and our range is:"$NF}' Input_file
NOTE: This will print the transcript and range values for each line of your Input_file; in case you want to perform some further operation with their values, please do mention it.
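If you do want to keep a shell loop, the immediate fixes to the code in the question are to feed each string to cut on stdin with a here-string instead of passing it as a filename, and to double-quote the sed address so that ${range} actually expands. A sketch under those assumptions:
while IFS= read -r line; do
    transcript=$(cut -d '#' -f 1 <<< "$line")       # cut now reads the string, not a file
    range=$(cut -d '#' -f 2 <<< "$line")            # e.g. 6436967,6436977
    echo "Our transcript is ${transcript} and our range is ${range}" >> Input.log
    sed -n "${range}p" Other_File.fa >> Output.log  # double quotes so ${range} expands; p prints that line range
done < <(awk -F'\t' 'NR != 1 {print substr($2,2,17)"#"$5}' input.txt)
Note this still spawns several processes per input line; a single-awk approach like the one above is faster.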

Is there a way to filter a log file based on the date read as argument?

I'm processing this log file:
2021-03-21 20:06:45; ABC; 531.54
2021-03-21 20:06:47; DEF; 136. 81
2021-03-21 20:06:51; GHI; 222.34
I was wondering whether it's possible to use awk to create a filter for the file so that the only lines printed out after applying it are the ones whose dates are later than the date given to the script as an argument.
I run the script as:
./script -a 2021-03-21 20:06:46
And expect the output to be:
2021-03-21 20:06:47; DEF; 136. 81
2021-03-21 20:06:51; GHI; 222.34
How can this be achieved?
If GNU Awk, which supports the mktime() function, is available, please try the following:
#!/bin/bash
dy=$1                                 # e.g. "2021-03-21"
tm=$2                                 # e.g. "20:06:46"

awk -F ";" -v dy="$dy" -v tm="$tm" '  # pass bash arguments to awk
BEGIN {
    gsub("-", " ", dy); gsub(":", " ", tm)
    given = mktime(dy " " tm)         # convert the passed day&time to seconds since the epoch
}
{
    str = $1; gsub("[-:]", " ", str)  # extract the timestamp out of the log line
    sec = mktime(str)                 # convert it to seconds since the epoch
    if (sec > given) print            # compare with the given day&time
}
' file.log
Save the script above as a file, say script, add the executable permission with chmod a+x script, then invoke with something like ./script 2021-03-21 20:06:46.
The output will be:
2021-03-21 20:06:47; DEF; 136. 81
2021-03-21 20:06:51; GHI; 222.34
[Alternative]
Even without the mktime() function, you can just say:
awk -F ";" -v dy="$1" -v tm="$2" '
$1 > dy " " tm
' file.log
which will output the same result. This works because the given date and time strings can be compared in dictionary (lexicographic) order.
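For example, run directly against the sample file (here file.log is assumed to hold the three lines above):
$ awk -F ";" -v dy="2021-03-21" -v tm="20:06:46" '$1 > dy " " tm' file.log
2021-03-21 20:06:47; DEF; 136. 81
2021-03-21 20:06:51; GHI; 222.34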

How to read the most recent 10 minutes of a log file [duplicate]

My server is having unusually high CPU usage, and I can see Apache is using way too much memory.
I have a feeling I'm being DoS'd by a single IP. Maybe you can help me find the attacker?
I've used the following line, to find the 10 most "active" IPs:
cat access.log | awk '{print $1}' |sort |uniq -c |sort -n |tail
The top 5 IPs have about 200 times as many requests to the server, as the "average" user. However, I can't find out if these 5 are just very frequent visitors, or they are attacking the servers.
Is there a way to restrict the above search to a time interval, e.g. the last two hours, or between 10 and 12 today?
Cheers!
UPDATED 23 OCT 2011 - The commands I needed:
Get entries within last X hours [Here two hours]
awk -vDate=`date -d'now-2 hours' +[%d/%b/%Y:%H:%M:%S` ' { if ($4 > Date) print Date FS $4}' access.log
Get most active IPs within the last X hours [Here two hours]
awk -vDate=`date -d'now-2 hours' +[%d/%b/%Y:%H:%M:%S` ' { if ($4 > Date) print $1}' access.log | sort |uniq -c |sort -n | tail
Get entries within relative timespan
awk -vDate=`date -d'now-4 hours' +[%d/%b/%Y:%H:%M:%S` -vDate2=`date -d'now-2 hours' +[%d/%b/%Y:%H:%M:%S` ' { if ($4 > Date && $4 < Date2) print Date FS Date2 FS $4}' access.log
Get entries within absolute timespan
awk -vDate=`date -d '13:20' +[%d/%b/%Y:%H:%M:%S` -vDate2=`date -d'13:30' +[%d/%b/%Y:%H:%M:%S` ' { if ($4 > Date && $4 < Date2) print $0}' access.log
Get most active IPs within absolute timespan
awk -vDate=`date -d '13:20' +[%d/%b/%Y:%H:%M:%S` -vDate2=`date -d'13:30' +[%d/%b/%Y:%H:%M:%S` ' { if ($4 > Date && $4 < Date2) print $1}' access.log | sort |uniq -c |sort -n | tail
Yes, there are multiple ways to do this. Here is how I would go about it. For starters, there's no need to pipe the output of cat; just open the log file with awk.
awk -vDate=`date -d'now-2 hours' +[%d/%b/%Y:%H:%M:%S` '$4 > Date {print Date, $0}' access_log
Assuming your log looks like mine (they're configurable), the date is stored in field 4 and is bracketed. What I am doing above is finding everything within the last 2 hours. Note the -d'now-2 hours', or translated literally, now minus 2 hours, which for me looks something like this: [10/Oct/2011:08:55:23
So what I am doing is storing the formatted value of two hours ago and comparing against field four. The conditional expression should be straightforward. I am then printing the Date, followed by the output field separator (OFS, a space in this case), followed by the whole line, $0. You could use your previous expression and just print $1 (the IP addresses):
awk -vDate=`date -d'now-2 hours' +[%d/%b/%Y:%H:%M:%S` '$4 > Date {print $1}' access_log | sort | uniq -c | sort -n | tail
If you wanted to use a range, specify two date variables and construct your expression appropriately.
So if you wanted to find something between 2 and 4 hours ago, your expression might look something like this:
awk -vDate=`date -d'now-4 hours' +[%d/%b/%Y:%H:%M:%S` -vDate2=`date -d'now-2 hours' +[%d/%b/%Y:%H:%M:%S` '$4 > Date && $4 < Date2 {print Date, Date2, $4}' access_log
Here is a question I answered regarding dates in bash you might find helpful.
Print date for the monday of the current week (in bash)
Introduction
As the accepted answer from matchew is wrong, per Antoine's comment: awk will do alphanumeric comparisons. So if your logfile lists events across the end and beginning of two months:
[27/Feb/2023:00:00:00
[28/Feb/2023:00:00:00
[01/Mar/2023:00:00:00
awk will consider:
[01/Mar/2023:00:00:00 < [27/Feb/2023:00:00:00 < [28/Feb/2023:00:00:00
Which is wrong! You have to compare date strings!
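You can reproduce the mis-ordering with a plain lexicographic sort of those three timestamps:
$ printf '%s\n' '[27/Feb/2023:00:00:00' '[28/Feb/2023:00:00:00' '[01/Mar/2023:00:00:00' | LC_ALL=C sort
[01/Mar/2023:00:00:00
[27/Feb/2023:00:00:00
[28/Feb/2023:00:00:00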
For this, you could use libraries, depending on the language you use.
I will present two different ways here: one using Perl with the Date::Parse library, and another (quicker) using bash with GNU date.
As this is a common Perl task, and because this is not exactly the same as extracting the last 10 minutes from a logfile (which is about a span of time up to the end of the logfile), and because I needed it, I (quickly) wrote this:
#!/usr/bin/perl -ws
# This script parses logfiles for a specific period of time

sub usage {
    printf "Usage: %s -s=<start time> [-e=<end time>] <logfile>\n", $0;
    die $_[0] if $_[0];
    exit 0;
}

use Date::Parse;

usage "No start time submitted" unless $s;
my $startim=str2time($s) or die;

my $endtim=str2time($e) if $e;
$endtim=time() unless $e;

usage "Logfile not submitted" unless $ARGV[0];
open my $in, "<" . $ARGV[0] or usage "Can't open '$ARGV[0]' for reading";

$_=<$in>;
exit unless $_; # empty file

# Determine the regular expression, depending on the log format
my $logre=qr{^(\S{3}\s+\d{1,2}\s+(\d{2}:){2}\d+)};
$logre=qr{^[^\[]*\[(\d+/\S+/(\d+:){3}\d+\s\+\d+)\]} unless /$logre/;

while (<$in>) {
    /$logre/ && do {
        my $ltim=str2time($1);
        print if $endtim >= $ltim && $ltim >= $startim;
    };
}
This could be used like:
./timelapsinlog.pl -s=09:18 -e=09:24 /path/to/logfile
for printing logs between 09h18 and 09h24.
./timelapsinlog.pl -s='2017/01/23 09:18:12' /path/to/logfile
for printing from January 23rd, 9h18'12" up to now.
In order to reduce the Perl code, I've used the -s switch to permit auto-assignment of variables from the command line: -s=09:18 will populate a variable $s which will contain 09:18. Take care not to miss the equal sign = and use no spaces!
Note: This holds two different kinds of regex for two different log standards. If you require different date/time format parsing, either post your own regex or post a sample of the formatted date from your logfile:
^(\S{3}\s+\d{1,2}\s+(\d{2}:){2}\d+) # ^Jan 1 01:23:45
^[^\[]*\[(\d+/\S+/(\d+:){3}\d+\s\+\d+)\] # ^... [01/Jan/2017:01:23:45 +0000]
Quicker** bash version:
Answering Gilles Quénot's comment, I've tried to create a bash version.
As this version seems quicker than the Perl version, I post it here:
#!/bin/bash
prog=${0##*/}

usage() {
    cat <<EOUsage
Usage: $prog <start date> <end date> <logfile>
Each argument is required. End date can be "now".
EOUsage
}

die() {
    echo >&2 "ERROR $prog: $*"
    exit 1
}

(($#==3)) || { usage; die 'Wrong number of arguments.';}
[[ -f $3 ]] || die "File not found."

# Convert the arguments to EPOCHSECONDS by asking `date` for the two conversions
{
    read -r start
    read -r end
} < <(
    date -f - +%s <<<"$1"$'\n'"$2"
)

# Determine which kind of log format we have, between "apache logs" and "system logs":
read -r oline <"$3" # read one log line
if [[ $oline =~ ^[^\ ]{3}\ +[0-9]{1,2}\ +([0-9]{2}:){2}[0-9]+ ]]; then
    # Looks like syslog format
    sedcmd='s/^\([^ ]\{3\} \+[0-9]\{1,2\} \+\([0-9]\{2\}:\)\{2\}[0-9]\+\).*/\1/'
elif [[ $oline =~ ^[^\[]+\[[0-9]+/[^\ ]+/([0-9]+:){3}[0-9]+\ \+[0-9]+\] ]]; then
    # Looks like apache logs
    sedcmd='s/^[0-9.]\+ \+[^ ]\+ \+[^ ]\+ \[\([^]]\+\)\].*$/\1/;s/:/ /;y|/|-|'
else
    die 'Log format not recognized'
fi

# Print lines beginning with `1<tabulation>`
sed -ne s/^1\\o11//p <(
    # paste `bc` tests with the log file
    paste <(
        # bc does the comparisons against the EPOCHSECONDS returned by date and $start - $end
        bc < <(
            # Create a bc function for testing against $start - $end.
            cat <<EOInitBc
define void f(x) {
if ((x>$start) && (x<$end)) { 1;return ;};
0;}
EOInitBc
            # Run sed to extract the date strings from the logfile, then
            # run date to convert the strings to EPOCHSECONDS
            sed "$sedcmd" <"$3" |
                date -f - +'f(%s)'
        )
    ) "$3"
)
Explanation
The script runs sed to extract the date strings from the logfile.
It passes the date strings to date -f - +%s to convert all of them to EPOCH (Unix timestamp) in one run.
It runs bc for the tests: print 1 if start < date < end, else print 0.
It runs paste to merge bc's output with the logfile.
Finally, it runs sed again to find the lines that match 1<tab>, replace the match with nothing, and print.
So this script forks 5 subprocesses to do dedicated things with specialised tools, but it won't run a shell loop over each line of the logfile!
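The selection trick in miniature (a standalone sketch with made-up data; GNU sed assumed for \t):
$ paste <(printf '1\n0\n1\n') <(printf 'keep me\ndrop me\nkeep me too\n') | sed -ne 's/^1\t//p'
keep me
keep me too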
** Note:
Of course, this is quicker on my host because I run on a multicore processor, where each task runs in parallel!
Conclusion:
This is not a program! This is an aggregation script!
If you consider bash not as a programming language, but as a super language or a tools aggregator, you can harness the full power of all your tools!
If someone encounters the awk: invalid -v option error, here's a script to get the most active IPs in a predefined time range:
cat <FILE_NAME> | awk '$4 >= "[04/Jul/2017:07:00:00" && $4 < "[04/Jul/2017:08:00:00"' | awk '{print $1}' | sort -n | uniq -c | sort -nr | head -20
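The cat and the two awk invocations can also be merged into a single awk with the same logic; a sketch:
awk '$4 >= "[04/Jul/2017:07:00:00" && $4 < "[04/Jul/2017:08:00:00" {print $1}' <FILE_NAME> | sort -n | uniq -c | sort -nr | head -20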
A very quick and readable way to do it in Python. This seems to be faster than the bash version. (The computed time is displayed using an internal module which has been stripped from this code.)
./ext_lines.py -v -s 'Feb 12 00:23:00' -e 'Feb 15 00:23:00' -i /var/log/syslog.1
Total time : 445 ms 187 musec
Time per line : 7 musec 58 ns
Number of lines : 63,072
Number of extracted lines : 29,265
I can't compare this code with the daemon.log file used by others... but here is my config:
Operating System: Kubuntu 22.10
KDE Plasma Version: 5.25.5
KDE Frameworks Version: 5.98.0
Qt Version: 5.15.6
Kernel Version: 6.2.0-060200rc8-generic (64-bit)
Graphics Platform: X11
Processors: 16 × AMD Ryzen 7 5700U with Radeon Graphics
Memory: 14.9 GiB of RAM
The essential code could fit in just one line (dts = ...), but to make it more readable it's been "split" in three. It's not only rather fast, it's also very compact :-)
from argparse import ArgumentParser, FileType
from datetime import datetime
from os.path import basename
from sys import argv, float_info
from time import mktime, localtime, strptime

__version__ = '1.0.0'  # Workaround (internal use)
now = datetime.now
progname = basename(argv[0])

parser = ArgumentParser(description = 'Is Python strptime faster than sed and Perl ?',
                        prog = progname)
parser.add_argument('--version',
                    dest = 'version',
                    action = 'version',
                    version = '{} : {}'.format(progname, str(__version__)))
parser.add_argument('-i', '--input',
                    dest = 'infile',
                    default = '/var/log/syslog.1',
                    type = FileType('r', encoding = 'UTF-8'),
                    help = 'Input file (stdin not yet supported)')
parser.add_argument('-f', '--format',
                    dest = 'fmt',
                    default = '%b %d %H:%M:%S',
                    help = 'Date input format')
parser.add_argument('-s', '--start',
                    dest = 'start',
                    default = None,
                    help = 'Starting date : >=')
parser.add_argument('-e', '--end',
                    dest = 'end',
                    default = None,
                    help = 'Ending date : <=')
parser.add_argument('-v',
                    dest = 'verbose',
                    action = 'store_true',
                    default = False,
                    help = 'Verbose mode')
args = parser.parse_args()

verbose = args.verbose
start = args.start
end = args.end
infile = args.infile
fmt = args.fmt

############### Start code ################
lines = tuple(infile)

# Use default values if start or end are undefined
if not start:
    start = lines[0][:14]
if not end:
    end = lines[-1][:14]

# Convert start and end to timestamps
start = mktime(strptime(start, fmt))
end = mktime(strptime(end, fmt))

# Extract matching lines
t1 = now()
dts = [(x, line)
       for x, line in [(mktime(strptime(line[:14], fmt)), line)
                       for line in lines]
       if start <= x <= end]
t2 = now()

# Print stats
if verbose:
    total_time = 'Total time'
    time_p_line = 'Time per line'
    n_lines = 'Number of lines'
    n_ext_lines = 'Number of extracted lines'
    print(f'{total_time:<25} : {(t2 - t1) * 1000} ms')
    print(f'{time_p_line:<25} : {(t2 - t1) / len(lines) * 1000} ms')
    print(f'{n_lines:<25} : {len(lines):,}')
    print(f'{n_ext_lines:<25} : {len(dts):,}')

# Print extracted lines
print(''.join([x[1] for x in dts]))
To parse the access.log precisely within a specified range, in this case only the last 10 minutes (based on the EPOCH, aka the number of seconds since 1970/01/01):
Input file:
172.16.0.3 - - [17/Feb/2023:17:48:41 +0200] "GET / HTTP/1.1" 200 123 "" "Mozilla/5.0 (compatible; Konqueror/2.2.2-2; Linux)"
172.16.0.4 - - [17/Feb/2023:17:25:41 +0200] "GET / HTTP/1.1" 200 123 "" "Mozilla/5.0 (compatible; Konqueror/2.2.2-2; Linux)"
172.16.0.5 - - [17/Feb/2023:17:15:41 +0200] "GET / HTTP/1.1" 200 123 "" "Mozilla/5.0 (compatible; Konqueror/2.2.2-2; Linux)"
A Perl one-liner:
With the reliable Time::Piece time parser, using strptime() to parse the date and strftime() to format the new one. This module is installed in core (by default); that is not the case with the unreliable Date::Parse.
$ perl -MTime::Piece -sne '
    BEGIN{
        my $t = localtime;
        our $now = $t->epoch;
        our $monthsRe = join "|", $t->mon_list;
    }
    m!\[(\d{2}/(?:$monthsRe)/\d{4}:\d{2}:\d{2}:\d{2})\s!;
    my $d = Time::Piece->strptime("$1", "%d/%b/%Y:%H:%M:%S");
    my $old = $d->strftime("%s");
    my $diff = (($now - $old) + $gap);
    if ($diff > $min and $diff < $max) {print}
' -- -gap=$({ echo -n "0"; date "+%:::z*3600"; } | bc) \
     -min=0 \
     -max=600 access.log
Explanation of the arguments: the -gap, -min, and -max switches
-gap is the gap with UTC: $((7*3600)), aka 25200 seconds, i.e. +7 hours in seconds in my current case 🇹🇭 (Thai TZ)¹, rewritten as { echo -n "0"; date "+%:::z*3600"; } | bc if you have GNU date. If not, use another way to set the gap.
-min is the minimum age, in seconds, of the log lines to print
-max is the maximum age, in seconds, of the log lines to print
To know the gap from UTC, take a look at:
¹
$ LANG=C date
Fri Feb 17 15:50:13 +07 2023
The +07 is the gap.
This way, you can filter at the exact range of seconds with this snippet.
Sample output
172.16.0.3 - - [17/Feb/2023:17:48:41 +0200] "GET / HTTP/1.1" 200 123 "" "Mozilla/5.0 (compatible; Konqueror/2.2.2-2; Linux)"

Changing the date string format

Hello Sed/Bash/Awk experts,
I have a file full of dates in the following format:
Feb 5 2015
Nov 25 2014
Apr 16 2015
What I would like is to convert them to this format:
YYYY-MM-DD
So they should look like this:
2015-02-05
2014-11-25
2015-04-16
Thanks for your help.
You can simply use:
date -f dates.txt +%Y-%m-%d
With the -f option (GNU date) you can provide your input file, with one date per line.
Using awk
awk 'BEGIN{x=" JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC"}
{printf "%04d-%02d-%02d\n",$3,(index(x,toupper($1))+1)/3,$2}' file
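With the leading space in x, JAN starts at position 2, FEB at 5, MAR at 8, and so on, so (index(...)+1)/3 maps the month names to 1 through 12 (awk's %d truncates, hence the +1).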
The date command is your friend here:
date --date="Feb 5 2015" +"%Y-%m-%d"
2015-02-05
so, you can say:
$ cat my_file | while read -r dt
> do
> date --date="${dt}" +"%Y-%m-%d"
> done
2015-02-05
2014-11-25
2015-04-16
Paste the following:
{
    month="00";
    mon=toupper($1);
    if      (mon=="JAN") month="01";
    else if (mon=="FEB") month="02";
    else if (mon=="MAR") month="03";
    else if (mon=="APR") month="04";
    else if (mon=="MAY") month="05";
    else if (mon=="JUN") month="06";
    else if (mon=="JUL") month="07";
    else if (mon=="AUG") month="08";
    else if (mon=="SEP") month="09";
    else if (mon=="OCT") month="10";
    else if (mon=="NOV") month="11";
    else if (mon=="DEC") month="12";
    printf("%s-%s-%02d\n", $3, month, $2)
}
into a file (we'll refer to the filename as [script_filename]), then
execute the following:
awk -F' ' -f [script_filename] [date_filename]
where [date_filename] refers to the file which contains the dates you wish to convert.
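For example, with the sample dates above saved in [date_filename], the output would be:
2015-02-05
2014-11-25
2015-04-16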

Calculate the difference between two numbers in a file

I want to know if it is possible to calculate the difference between two float numbers contained on two distinct lines of a file, in one bash command line.
File content example :
Start at 123456.789
...
...
...
End at 123654.987
I would like to do an echo of 123654.987-123456.789.
Is that possible? What is the magic command line?
Thank you!
awk '
/Start/ { start = $3 }   # 3rd field on the line matching "Start"
/End/   {
    end = $3             # 3rd field on the line matching "End"
    print end - start    # Print the difference.
}
' < file
If you really want to do this on one line:
awk '/Start/ { start = $3 } /End/ { end = $3; print end - start }' < file
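With the example file above, both forms print 198.198.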
You can do this with this command:
start=`grep 'Start' FILENAME| cut -d ' ' -f 3`; end=`grep 'End' FILENAME | cut -d ' ' -f 3`; echo "$end-$start" | bc
You need the bc program for this (for floating-point math). You can install it with apt-get install bc, or yum, rpm, zypper... it's OS specific :)
Bash doesn't support floating-point operations, but you can split your numbers into integer and fractional parts and perform integer operations. (Note that this naive split misbehaves when the fractional part would need to borrow, e.g. going from 1.9 to 2.1.) Example:
#!/bin/bash
echo $(( ${2%.*} - ${1%.*} )).$(( ${2#*.} - ${1#*.} ))
Result:
./test.sh 123456.789 123654.987
198.198
EDIT:
The correct solution would be to use not a command-line hack, but a tool designed for performing floating-point operations. For example, bc:
echo 123654.987-123456.789 | bc
output:
198.198
Here's a weird way:
printf -- "-%s+%s\n" $(grep -oP '(Start|End) at \K[\d.]+' file) | bc
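Here grep -oP extracts just the two numbers (\K discards the matched "Start/End at " prefix), printf stitches them into the expression -123456.789+123654.987, and bc evaluates it to 198.198.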
