How can I extract the data between two times in two or more log files - bash

I have two log files namely, Log1.log and Log2.log each containing following data.
Log1.log:
Apr 10 02:07:20 Data 1
May 10 04:11:09 Data 2
June 11 06:22:35 Data 3
Aug 12 09:08:07 Data 4
Log2.log:
Apr 10 09:07:20 Data 1
Apr 10 10:07:10 Data 2
Jul 11 11:07:30 Data 3
Aug 18 12:50:40 Data 4
What command can I use to get the data between Apr 10 02:07:20 and Aug 18 12:50:40?
I have used
$ awk -v start=01:06:04 -v stop=01:07:16 'start <= $3 && $3 <= stop' Log1.log Log2.log
I have also used
awk -v StartTime="$StartTime" -v EndTime="$EndTime" -f script.sh Log1.log Log2.log
where script.sh contains,
BEGIN { Keep = 0; }
{
    if ($3 >= StartTime)
    {
        keep = 1;
    }
    if ($3 > EndTime)
    {
        exit;
    }
    if (keep)
    {
        print;
    }
}
I am not getting the desired result. Can someone help me improve my approach? Thanks in advance.

I would first use sort to sort the input. Then I would use sed to extract that range:
LC_TIME=C sort -t' ' -k1,1M -k2,3n 1.log 2.log \
| sed -n '/Apr 10 02:07:20/,/Aug 18 12:50:40/p'
Btw, it is not fully clear to me if you want to exclude or include the range borders. The above example includes them, the below example excludes them:
LC_TIME=C sort -t' ' -k1,1M -k2,3n 1.log 2.log \
| sed -n '/Apr 10 02:07:20/,/Aug 12 09:08:07/{/Apr 10 02:07:20/!{/Aug 12 09:08:07/!p}}'
At least GNU sed lets you simplify the latter command to:
LC_TIME=C sort -t' ' -k1,1M -k2,3n 1.log 2.log \
| sed -n '/Apr 10 02:07:20/,/Aug 12 09:08:07/{//!p}'
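If you would rather not sort first, you can compare full timestamps in awk by converting each "Mon DD hh:mm:ss" prefix into a lexically sortable "MM DD hh:mm:ss" key. A minimal sketch, assuming all entries fall within the same year:
awk -v start="Apr 10 02:07:20" -v stop="Aug 18 12:50:40" '
function key(mon, day, time) {
    # map the month name to 01-12 and build a sortable key
    return sprintf("%02d %02d %s", (index(months, substr(mon, 1, 3)) + 2) / 3, day, time)
}
BEGIN {
    months = "JanFebMarAprMayJunJulAugSepOctNovDec"
    split(start, s, " "); split(stop, e, " ")
    lo = key(s[1], s[2], s[3]); hi = key(e[1], e[2], e[3])
}
{
    k = key($1, $2, $3)
    if (lo <= k && k <= hi) print    # borders included
}' Log1.log Log2.log
This keeps matching lines from both files regardless of input order, and substr(mon, 1, 3) also tolerates spelled-out names like the "June" in Log1.log.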

Related

Get the longest logon time of a given user using awk

My task is to write a bash script, using awk, to find the longest logon of a given user ("still logged in" does not count), and print the month, day, IP, and logon time in minutes.
Sample input: ./scriptname.sh username1
Content of last username1:
username1 pts/ IP Apr 2 .. .. .. .. (00.03)
username1 pts/ IP Apr 3 .. .. .. .. (00.13)
username1 pts/ IP Apr 5 .. .. .. .. (12.00)
username1 pts/ IP Apr 9 .. .. .. .. (12.11)
Sample output:
Apr 9 IP 731
(note: 12 hours and 11 minutes is in total 731 minutes)
I have written this script, but a bunch of errors pop up, and I am really confused:
#!/bin/bash
usr=$1
last $usr | grep -v "still logged in" | awk 'BEGIN {max=-1;}
{
    h=substr($10,2,2);
    min=substr($10,5,2) + h/60;
}
(max < min){
    max = min;
}
END{
    maxh=max/60;
    maxmin=max-maxh;
    ($maxh == 0 && $maxmin >=10){
        last $usr | grep "00:$maxmin" | awk '{print $5," ",$6," ", $3," ",$maxmin}'
        exit 1
    }
    ($maxh == 0 $$ $maxmin < 10){
        last $usr | grep "00:0$maxmin" | awk '{print $5," ",$6," ",$3," ",$maxmin}'
        exit 1
    }
    ($maxh < 10 && $maxmin == 0){
        last $usr | grep "0$maxh:00" | awk '{print $5," ",$6," ",$3," ",$maxmin}'
        exit 1
    }
    ($maxh < 10 && $maxmin < 10){
        last $usr | grep "0$maxh:0$maxmin" | awk '{print $5," ",$6," ",$3," ",$maxmin}'
        exit 1
    }
    ($maxh >= 10 && $maxmin < 10){
        last $usr | grep "$maxh:0$maxmin" | awk '{print $5," ",$6," ",$3," ",$maxmin}'
        exit 1
    }
    ($maxh >=10 && $maxmin >= 10){
        last $usr | grep "$maxh:$maxmin" | awk '{print $5," ",$6," ",$3," ",$maxmin}'
        exit 1
    }
}'
So a bit of explanation of how I imagined this would work:
After the initialization, I want to find the (hh:mm) column of the last $usr output, save the hours and minutes of every line, and find the biggest number (in minutes), meaning the longest logon time.
After finding the longest logon time (in minutes, stored in the variable max), I have to reformat the minutes-only value back to hh:mm to be able to grep for it: use the last command again, now searching only for the line(s) that contain the max logon time, and print all of the needed information in the "month day IP logon time in minutes" format, using another awk.
Errors I get when running this code: A bunch of syntax errors when I try using grep and awk inside the original awk.
awk is not shell. You can't directly call tools like last, grep and awk from awk any more than you could call them directly from a C program.
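(Strictly speaking, awk can start external commands with system() or read a command's output through getline, but that is explicit process spawning rather than shell-style substitution, and it is rarely the right design. A hypothetical illustration of the getline form:
awk -v usr="username1" 'BEGIN {
    cmd = "last " usr
    while ((cmd | getline line) > 0)   # read the command output line by line
        print "got:", line
    close(cmd)
}'
For this task, a single awk body over the output of last is far simpler, as below.)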
Using any awk in any shell on every Unix box, and assuming that if multiple rows share the max time you'd want all of them printed, and that if no timestamped rows are found you want something like "No matching records" printed (both are easy tweaks if not; just state your requirements for those cases and include them in the example in your question):
last username1 |
awk '
/still logged in/ {          # skip sessions without a duration
    next
}
{
    split($NF,t,/[().]/)     # "(12.11)" -> t[2]="12" hours, t[3]="11" minutes
    cur = (t[2] * 60) + t[3] # session length in minutes
}
cur >= max {
    # start a new result on a strictly larger time, append on a tie
    out = ( cur > max ? "" : out ORS ) $4 OFS $5 OFS $3 OFS cur
    max = cur
}
END {
    print (out ? out : "No matching records")
}
'
Apr 9 IP 731
If gnu-awk is available, you might use a pattern with 2 capture groups for the numbers in the last field, and print the format that you want in the END block.
Assuming, in this example, that file contains the example content and the last column holds the logon duration:
awk '
match($NF, /\(([0-9]+)\.([0-9]+)\)/, a) {   # a[1] = hours, a[2] = minutes
    hm = (a[1] * 60) + a[2]
    if (hm > max) { max = hm; line = $0 }
}
END {
    n = split(line, a, /[[:space:]]+/)
    print a[3], a[4], a[5], max
}
' file
Output
IP Apr 9 731
Testing the last command on my machine (Red Hat Linux 7.8), I got the following output:
user0022 pts/1 10.164.240.158 Sat Apr 25 19:32 - 19:47 (00:14)
user0022 pts/1 10.164.243.80 Sat Apr 18 22:31 - 23:31 (1+01:00)
user0022 pts/1 10.164.243.164 Sat Apr 18 19:21 - 22:05 (02:43)
user0011 pts/0 10.70.187.1 Thu Nov 21 15:26 - 18:37 (03:10)
user0011 pts/0 10.70.187.1 Thu Nov 7 16:21 - 16:59 (00:38)
astukals pts/0 10.70.187.1 Mon Oct 7 19:10 - 19:13 (00:03)
reboot system boot 3.10.0-957.10.1. Mon Oct 7 22:09 - 14:30 (156+17:21)
astukals pts/0 10.70.187.1 Mon Oct 7 18:56 - 19:08 (00:12)
reboot system boot 3.10.0-957.10.1. Mon Oct 7 21:53 - 19:08 (-2:-44)
IT pts/0 10.70.187.1 Mon Oct 7 18:50 - 18:53 (00:03)
IT tty1 Mon Oct 7 18:48 - 18:49 (00:00)
user0022 pts/1 30.30.30.168 Thu Apr 16 09:43 - 14:54 (05:11)
user0022 pts/1 30.30.30.59 Wed Apr 15 11:48 - 04:59 (17:11)
user0022 pts/1 30.30.30.44 Tue Apr 14 19:03 - 04:14 (09:11)
Found that the time format is DD+HH:MM, where the DD+ part appears only when the number of days is not zero.
Found that there are additional technical users (IT, system, reboot) that need to be filtered out.
Suggested solution:
last | awk 'BEGIN {FS="[ ()+:]*"}
/reboot|system|still/{next}
{ print $5 OFS $6 OFS $3 OFS $(NF-1) + ($(NF-2) * 60) + ($(NF-3) * 60 * 24) }
' | sort -rnk 4 | head -1
Result:
Apr 15 30.30.30.59 85991
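Note that FS="[ ()+:]*" is fragile, because the logout time also contains a colon, so fields counted from the end can land on the wrong numbers. A sketch of a more explicit parse of the final duration field, handling both (HH:MM) and (DD+HH:MM); the user filter here is an assumption:
last | awk '
/reboot|system boot|still logged in|^$/ { next }
{
    d = $NF                                 # e.g. "(00:14)" or "(1+01:00)"
    gsub(/[()]/, "", d)
    days = 0
    if (d ~ /\+/) { split(d, p, /\+/); days = p[1]; d = p[2] }
    split(d, hm, /:/)
    mins = days*24*60 + hm[1]*60 + hm[2]    # total logon time in minutes
    if (mins > max) { max = mins; rec = $5 OFS $6 OFS $3 }
}
END { if (rec) print rec, max }'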

distribute data in both increment and decrement order

I have a file which has n rows; I want its data distributed across 7 files in the order shown below.
** My input file has n rows; this is just an example.
Input file
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
.
.
28
Output file
1 2 3 4 5 6 7
14 13 12 11 10 9 8
15 16 17 18 19 20 21
28 27 26 25 24 23 22
So if I open the first file, it should have the rows
1
14
15
28
Similarly, if I open the second file, it should have the rows
2
13
16
27
Similarly for the other output files.
Can anybody please help? With the code below the data gets distributed, but not in the required order.
awk '{print > ("te1234"++c".txt");c=(NR%n)?c:0}' n=7 test6.txt
1 2 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
EDIT: Since the OP has changed the Input_file sample completely, adding this solution now; again, this is written and tested with the shown samples only.
With xargs + single awk: (recommended one)
xargs -n7 < Input_file |
awk '
FNR%2!=0{                          # odd rows: left to right
    for(i=1;i<=NF;i++){
        print $i >> (i".txt")
        close(i".txt")
    }
    next
}
FNR%2==0{                          # even rows: right to left
    for(i=NF;i>0;i--){
        count++
        print $i >> (count".txt")
        close(count".txt")
    }
    count=""
}'
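Here xargs -n7 reflows the one-number-per-line input into rows of seven, so odd rows can be written out left to right and even rows right to left:
$ xargs -n7 < Input_file
1 2 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28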
Initial solution:
xargs -n7 < Input_file |
awk '
FNR%2==0{
    for(i=NF;i>0;i--){
        val=(val?val OFS:"")$i
    }
    $i=val          # i is 0 after the loop, so this assigns the reversed row to $0
    val=""
}
1' |
awk '
{
    for(i=1;i<=NF;i++){
        print $i >> (i".txt")
        close(i".txt")
    }
}'
The above could also be done with a single awk; the xargs + single awk version above is the recommended one.
For the original sample, could you please try the following, written and tested with the shown samples in GNU awk.
awk '{for(i=1;i<=NF;i++){print $i >> (i".txt");close(i".txt")}}' Input_file
The output file counter could descend for each second group of seven:
awk 'FNR%n==1 {asc=!asc}
{
    out = "te1234" (asc ? ++c : c--) ".txt"
    print >> out
    close(out)
}' n=7 test6.txt
$ ls
file tst.awk
$ cat tst.awk
{ rec = (cnt % 2 ? $1 sep rec : rec sep $1); sep=FS }
!(NR%n) {
++cnt
nf = split(rec,flds)
for (i=1; i<=nf; i++) {
out = "te1234" i ".txt"
print flds[i] >> out
close(out)
}
rec=sep=""
}
$ awk -v n=7 -f tst.awk file
$ ls
file te12342.txt te12344.txt te12346.txt tst.awk
te12341.txt te12343.txt te12345.txt te12347.txt
$ cat te12341.txt
1
14
15
28
$ cat te12342.txt
2
13
16
27
If you can have input that's not an exact multiple of n then move the code that's currently in the !(NR%n) block into a function and call that function there and in an END section.
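A sketch of that refactor, with the per-group work moved into a function so a trailing partial group still gets written:
$ cat tst.awk
{ rec = (cnt % 2 ? $1 sep rec : rec sep $1); sep=FS }
!(NR%n) { flush() }
END { if (rec != "") flush() }

function flush(   nf, flds, i, out) {
    ++cnt
    nf = split(rec,flds)
    for (i=1; i<=nf; i++) {
        out = "te1234" i ".txt"
        print flds[i] >> out
        close(out)
    }
    rec = sep = ""
}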
This might work for you (GNU sed & parallel):
parallel 'echo {1}~14w file{1}; echo {2}~14w file{1}' ::: {1..7} :::+ {14..8} |
sed -n -f - file &&
paste file{1..7}
Create a sed script to write files named filen, where n is 1 thru 7 (see the first set of parameters in the parallel command above, and also the paste command).
The sed script uses the n~m address where n is the starting address and m is the modulo thereafter.
The distributed files are created first and the paste command then joins them all together to produce a single output file (tab separated by default, use paste -d option to get desired delimiter).
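For reference, the sed script that parallel generates consists of lines like these (job output order may vary, which is harmless here since each w command is independent):
1~14w file1
14~14w file1
2~14w file2
13~14w file2
...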
Alternative using Bash & sed:
for ((n=1,m=14;n<=7;n++,m--));do echo "$n~14w file$n";echo "$m~14w file$n";done |
sed -nf - file &&
paste file{1..7}

Dividing one file into separate based on line numbers

I have the following test file:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
I want to separate it in a way that each file contains the last line of the previous file as the first line. The example would be:
file 1:
1
2
3
4
5
file2:
5
6
7
8
9
file3:
9
10
11
12
13
file4:
13
14
15
16
17
file5:
17
18
19
20
That would make 4 files with 5 lines and 1 file with 4 lines.
As a first step, I tried to test the following commands I wrote to produce only the first file, which should contain the first 5 lines. I can't figure out why the awk command in the if statement prints the whole 20 lines instead of the first 5.
d=$(wc test)
a=$(echo $d | cut -f1 -d " ")
lines=$(echo $a/5 | bc -l)
integer=$(echo $lines | cut -f1 -d ".")
for i in $(seq 1 $integer); do
    start=$(echo $i*5 | bc -l)
    var=$((var+=1))
    echo start $start
    echo $var
    if [[ $var = 1 ]]; then
        awk 'NR<=$start' test
    fi
done
Thanks!
Why not just use the split utility available in your POSIX toolkit? It has an option to split on a number of lines, which you can set to 5:
split -l 5 input-file
From the man split page:
-l, --lines=NUMBER
put NUMBER lines/records per output file
Note that -l is POSIX compliant. Plain split will not, however, repeat the last line of each chunk as the first line of the next; see the awk approach below for that.
$ ls
$
$ seq 20 | awk 'NR%4==1{ if (out) { print > out; close(out) } out="file"++c } {print > out}'
$
$ ls
file1 file2 file3 file4 file5
$ cat file1
1
2
3
4
5
$ cat file2
5
6
7
8
9
$ cat file3
9
10
11
12
13
$ cat file4
13
14
15
16
17
$ cat file5
17
18
19
20
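The same one-liner with comments, for clarity (the boundary line is deliberately written to both the file being closed and the one being opened):
seq 20 | awk '
NR%4==1 {               # every 4th line, starting at line 1, is a boundary
    if (out) {          # if a file is already open...
        print > out     # ...the boundary line becomes its last line
        close(out)
    }
    out = "file" ++c    # ...and it also starts the next file
}
{ print > out }         # every line goes to the current file
'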
If you're ever tempted to use a shell loop to manipulate text again, make sure to read https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice first to understand at least some of the reasons to use awk instead. To learn awk, get the book Effective Awk Programming, 4th Edition, by Arnold Robbins.
Oh, and with regard to why your awk command awk 'NR<=$start' test didn't work: awk is not shell; it has no more access to shell variables (or vice versa) than a C program does. To initialize an awk variable named awkstart with the value of a shell variable named start, and then use that awk variable in your script, you'd do awk -v awkstart="$start" 'NR<=awkstart' test. The awk variable can also be named start or anything else sensible; it is completely unrelated to the name of the shell variable.
You could improve your code by removing the unnecessary echo, cut and bc, and doing it like this:
#!/bin/bash
for i in $(seq $(wc -l < test) ); do
    (( i % 4 != 1 )) && continue
    tail -n +$i test | head -5 > "file$(( 1+i/4 ))"
done
But still, the awk solution is much better. Reading the file only once and taking actions based on readily available information (like the line number) is the way to go. In shell you have to count the lines; there is no way around it. awk will give you that (and a lot of other things) for free.
Use split:
$ seq 20 | split -l 5
$ for fn in x*; do echo "$fn"; cat "$fn"; done
xaa
1
2
3
4
5
xab
6
7
8
9
10
xac
11
12
13
14
15
xad
16
17
18
19
20
Or, if you have a file:
$ split -l 5 test_file

pick up files based on dates in ksh script

I have this list of files. Now I have to pick the latest file based on some conditions:
3679 Jul 21 23:59 belk_rpo_error_**po9324892**_07212014.log
0 Jul 22 23:59 belk_rpo_error_**po9324892**_07222014.log
3679 Jul 23 23:59 belk_rpo_error_**po9324892**_07232014.log
22 Jul 22 06:30 belk_rpo_error_**po9324267**_07012014.log
0 Jul 20 05:50 belk_rpo_error_**po9999992**_07202014.log
411 Jul 21 06:30 belk_rpo_error_**po9999992**_07212014.log
742 Jul 21 07:30 belk_rpo_error_**po9999991**_07212014.log
0 Jul 23 2014 belk_rpo_error_**po9999991**_07232014.log
For a PARTICULAR Order_No (marked with ** **):
If the latest file is 0 kB, we discard it (and the rest of the files with the same Order_No as well).
If the latest file is non-zero, I take it (only the latest one),
then append its contents to a txt file.
My expected output would be:
411 Jul 21 06:30 belk_rpo_error_**po9999992**_07212014.log
3679 Jul 23 23:59 belk_rpo_error_**po9324892**_07232014.log
22 Jul 22 06:30 belk_rpo_error_**po9324267**_07012014.log
I am at my wits' end here. I can't seem to figure out how to compare dates in Unix. Any help is much appreciated.
You can try something like:
touch test.txt
for var in `find . ! -empty -exec ls -r {} \;`
do
    cat $var >> test.txt
done
untested
Use stat to emit date (epoch time), size and filename.
Use awk to filter out zero-length files and extract the order number.
Sort by order number and date.
Use awk again to pick up the last filename for each order number.
stat -c $'%Y\t%s\t%n' *.log |
awk -F'\t' -v OFS='\t' '
$2 > 0 {
split($3, a, /_/)
print a[4], $1, $3
}' |
sort -t $'\t' -k1,1 -k2,2n |
awk -F'\t' '
NR > 1 && $1 != prev_order {print filename}
{filename = $3; prev_order = $1}
END {print filename}
'
The sort command might be wrong: In order to group by order number, you might need to sort first by file time then by order number.
If I understand your question, the resulting files need to be concatenated and appended to a file. If the above pipeline is working OK, then pipe into | xargs cat >> something.log
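Putting the whole thing together (something.log is a hypothetical destination, and the middle awk stage is the one-line equivalent of the filter above):
stat -c $'%Y\t%s\t%n' *.log |
awk -F'\t' -v OFS='\t' '$2 > 0 { split($3, a, /_/); print a[4], $1, $3 }' |
sort -t $'\t' -k1,1 -k2,2n |
awk -F'\t' '
    NR > 1 && $1 != prev_order { print filename }
    { filename = $3; prev_order = $1 }
    END { print filename }
' | xargs cat >> something.log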

Grep to multiple output files

I have one huge file (over 6GB) and about 1000 patterns. I want to extract the lines matching each pattern to a separate file. For example, my patterns are:
1
2
my file:
a|1
b|2
c|3
d|123
As a output I would like to have 2 files:
1:
a|1
d|123
2:
b|2
d|123
I can do it by grepping the file multiple times, but that is inefficient for 1000 patterns and a huge file. I also tried something like this:
grep -f pattern_file huge_file
but it makes only 1 output file. I can't sort my huge file; it takes too much time. Maybe AWK can do it?
awk -F\| 'NR == FNR {
    patt[$0]; next
}
{
    for (p in patt)
        if ($2 ~ p) print > p
}' patterns huge_file
With some awk implementations you may hit the max number of open files limit.
Let me know if that's the case so I can post an alternative solution.
P.S.: This version will keep only one file open at a time:
awk -F\| 'NR == FNR {
    patt[$0]; next
}
{
    for (p in patt) {
        if ($2 ~ p) print >> p
        close(p)
    }
}' patterns huge_file
You can accomplish this (if I understand the problem) using bash "process substitution", e.g., consider the following sample data:
$ cal -h
September 2013
Su Mo Tu We Th Fr Sa
1 2 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30
Then selected lines can be grepped to different output files in a single command as:
$ cal -h \
| tee >( egrep '1' > f1.txt ) \
| tee >( egrep '2' > f2.txt ) \
| tee >( egrep 'Sept' > f3.txt )
In this case, each grep is processing the entire data stream (which may or may not be what you want: this may not save a lot of time vs. just running concurrent grep processes):
$ more f?.txt
::::::::::::::
f1.txt
::::::::::::::
September 2013
1 2 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20 21
::::::::::::::
f2.txt
::::::::::::::
September 2013
1 2 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30
::::::::::::::
f3.txt
::::::::::::::
September 2013
This might work for you (although sed might not be the quickest tool!):
sed 's,.*,/&/w &_file,' pattern_file > sed_file
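For the two example patterns, the generated sed_file would contain (note the output names get an _file suffix here):
/1/w 1_file
/2/w 2_file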
Then run this file against the source:
sed -nf sed_file huge_file
I did a cursory test, and the GNU sed version 4.1.5 I was using easily opened 1000 files OK; however, your unix system may well have smaller limits.
Grep cannot output matches of different patterns to different files. tee is able to redirect its input into multiple destinations, but I don't think this is what you want.
Either use multiple grep commands or write a program to do it in Python or whatever other language you fancy.
I had this need, so I added the capability to my own copy of grep.c that I happened to have lying around. But it just occurred to me: if the primary goal is to avoid multiple passes over a huge input, you could run egrep once over the huge input searching for any of your patterns (which, I know, is not what you want by itself), redirect its output to an intermediate file, and then make multiple passes over that much smaller intermediate file, once per individual pattern, redirecting to a different final output file each time.
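A sketch of that two-pass idea (the intermediate file name is hypothetical):
# pass 1: keep only the lines that match at least one pattern
grep -f pattern_file huge_file > intermediate

# pass 2: one scan of the much smaller intermediate file per pattern
while IFS= read -r p; do
    grep -- "$p" intermediate > "$p"
done < pattern_file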
