How to reduce live log data? - bash

A program produces a log file, which I am watching. Unfortunately, the log file sometimes contains the same line 50 times in a row.
Is there a way to get, instead of
program.sh
Line 1
Line 1
Line 1
Line 1
...
Line 1
Line 1
Line 2
just something like:
program.sh
Line 1
= repeated 43 times
Line 2

You can use this awk:
awk 'function prnt() { print p; if (c>1) print "= repeated " c " times"; }
p && p != $0{prnt(); c=0} {p=$0; c++}; END{prnt()}' file
Line 1
= repeated 43 times
Line 2
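For a live stream rather than a saved file, you can pipe the program's output straight into the same awk (a sketch, assuming program.sh writes its log lines to stdout):
program.sh | awk 'function prnt() { print p; if (c>1) print "= repeated " c " times"; }
p && p != $0{prnt(); c=0} {p=$0; c++}; END{prnt()}'
Note that a run of identical lines is only summarised once the next different line (or end of input) arrives, so the "repeated" line lags slightly behind the live output.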

Related

Unable to parse the log file using Shell and python

I am trying to parse the log file using a shell or python script. I used awk and sed but had no luck. Can someone help me resolve this? Below is the input and the expected output.
Input:
customer1:123
SRE:1
clientID:1
Error=1
customer1:124
SRE:1
clientID:1
Error=2
customer1:125
SRE:1
clientID:1
Error=3
customer1:126
SRE:1
clientID:1
Error=4
Output:
Customer | Error
123 1
124 2
125 3
126 4
It's usual to show some of your work, or what you've tried so far, but here's a rough guess at what you're looking for.
tmp$ awk -F: '/^customer1:/ {CUST=$2} ; /^Error/ {split($0,a,"=") ; print CUST, a[2]} ' t
Or breaking down by line:
tmp$ awk -F: '\
> /^customer1:/ {CUST=$2} ; \
> /^Error/ {split($0,a,"=") ; print CUST, a[2]} \
> ' t
123 1
124 2
125 3
126 4
The first rule
/^customer1:/ {CUST=$2} ;
does two things: it matches lines that start (^ means start of line) with customer1, and it saves the customer number into CUST. Those lines have already been split on : because we said -F: at the start of our command, so that number is the second field, $2.
/^Error/ {split($0,a,"=") ; print CUST, a[2]} ;
Matches lines that start with Error, splits those lines into the array a on the delimiter "=", and then prints the last value of CUST followed by a[2], the value after the = on the Error line.
Hopefully that all makes sense. It's worth reading an awk tutorial like https://www.grymoire.com/Unix/Awk.html
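If you also want the header row from your expected output, a small extension of the same command (the header text is simply taken from your sample) could be:
tmp$ awk -F: 'BEGIN {print "Customer | Error"} ; \
> /^customer1:/ {CUST=$2} ; \
> /^Error/ {split($0,a,"=") ; print CUST, a[2]} \
> ' t
Customer | Error
123 1
124 2
125 3
126 4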

Process a line based on lines before and after in bash

I am trying to figure out how to write a bash script which uses the lines immediately before and after a line as a condition. I will give an example in a python-like pseudocode which makes sense to me.
Basically:
for line in FILE:
    if line_minus_1 == line_plus_one:
        line = line_minus_1
What would be the best way to do this?
So if I have an input file that reads:
3
1
1
1
2
2
1
2
1
1
1
2
2
1
2
my output would be:
3
1
1
1
2
2
2
2
1
1
1
2
2
2
2
Notice that it works from the first line to the last and respects changes already made to earlier lines, so if I have:
2
1
2
1
2
2
I would get:
2
2
2
2
2
2
and not:
2
1
1
1
2
2
$ awk 'minus2==$0{minus1=$0} NR>1{print minus1} {minus2=minus1; minus1=$0} END{print minus1}' file
3
1
1
1
2
2
2
2
1
1
1
2
2
2
2
How it works
minus2==$0{minus1=$0}
If the line from 2 lines ago is the same as the current line, then set the line from 1 line ago equal to the current line.
NR>1{print minus1}
If we are past the first line, then print the line from 1 line ago.
minus2=minus1; minus1=$0
Update the variables.
END{print minus1}
After we have finished reading the file, print the last line.
Multiple line version
For those who like their code spread over multiple lines:
awk '
minus2==$0{
    minus1=$0
}
NR>1{
    print minus1
}
{
    minus2=minus1
    minus1=$0
}
END{
    print minus1
}
' file
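Since the question asks for bash specifically, here is a plain-bash sketch of the same sliding-window logic (assuming the input is in a file named file; bash will be much slower than awk on large inputs):
#!/usr/bin/env bash
minus2='' minus1='' n=0
while IFS= read -r line; do
    # If the line from 2 lines ago equals the current line,
    # overwrite the line from 1 line ago with it.
    if [ "$n" -ge 2 ] && [ "$minus2" = "$line" ]; then
        minus1=$line
    fi
    # Past the first line, print the line from 1 line ago.
    [ "$n" -ge 1 ] && printf '%s\n' "$minus1"
    minus2=$minus1
    minus1=$line
    n=$((n+1))
done < file
# After the loop, print the last buffered line.
printf '%s\n' "$minus1"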
Here is a (GNU) sed solution:
$ sed -r '1N;N;/^(.*)\n.*\n\1$/s/^(.*\n).*\n/\1\1/;P;D' infile
3
1
1
1
2
2
2
2
1
1
1
2
2
2
2
This works with a moving three line window. A bit more readable:
sed -r ' # -r for extended regular expressions: () instead of \(\)
1N # On first line, append second line to pattern space
N # On all lines, append third line to pattern space
/^(.*)\n.*\n\1$/s/^(.*\n).*\n/\1\1/ # See below
P # Print first line of pattern space
D # Delete first line of pattern space
' infile
N;P;D is the idiomatic way to get a moving two line window: append a line, print first line, delete first line of pattern space. To get a moving three line window, we read an additional line, but only once, namely when processing the first line (1N).
The complicated bit is checking if the first and third line of the pattern space are identical, and if they are, replacing the second line with the first line. To check if we have to make the substitution, we use the address
/^(.*)\n.*\n\1$/
The anchors ^ and $ are not really required as we'll always have exactly two newlines in the pattern space, but they make it clearer that we want to match the complete pattern space. We put the first line into a capture group and see if it is repeated on the third line by using a backreference.
Then, if this is the case, we perform the substitution
s/^(.*\n).*\n/\1\1/
This captures the first line including the newline, matches the second line including the newline, and substitutes with twice the first line. P and D then print and remove the first line.
When reaching the end, the whole pattern space is printed so we're not swallowing any lines.
This also works with the second input example:
$ sed -r '1N;N;/^(.*)\n.*\n\1$/s/^(.*\n).*\n/\1\1/;P;D' infile2
2
2
2
2
2
2
To use this with BSD sed (as found in OS X), you'd either have to use the -E instead of the -r option, or use no option at all, i.e., basic regular expressions, and escape all parentheses (\(\)) in the capture groups. The newline matching should work, but I didn't test it. If in doubt, check this great answer laying out all the differences.
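For example, the BSD-compatible form of the same command would be (untested, as noted):
sed -E '1N;N;/^(.*)\n.*\n\1$/s/^(.*\n).*\n/\1\1/;P;D' infile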

Auto-insert blank lines in `tail -f`

Having a log file such as:
[DEBUG][2016-06-24 11:10:10,064][DataSourceImpl] - [line A...]
[DEBUG][2016-06-24 11:10:10,069][DataSourceImpl] - [line B...]
[DEBUG][2016-06-24 11:10:12,112][DataSourceImpl] - [line C...]
which is under tail -f real-time monitoring, is it possible to auto-insert (via a command we would pipe to the tail) "blank lines" after, let's say, 2 seconds of inactivity?
Expected result:
[DEBUG][2016-06-24 11:10:10,064][DataSourceImpl] - [line A...]
[DEBUG][2016-06-24 11:10:10,069][DataSourceImpl] - [line B...]
---
[DEBUG][2016-06-24 11:10:12,112][DataSourceImpl] - [line C...]
(because there is a gap of more than 2 seconds between 2 successive lines).
awk -F'[][\\- ,:]+' '1'
The above will split fields on ], [, -, space, comma and :, so the fields number off as shown below ($1 is the empty string before the first [):
[DEBUG][2016-06-24 11:10:10,064][DataSourceImpl] - [line A...]
 22222  3333 44 55 66 77 88 999 ...
You can then concatenate some of the fields and use that to measure time difference:
tail -f input.log | awk -F'[][\\- ,:]+' '{ curr=$3$4$5$6$7$8$9 }
prev + 2000 < curr { print "" } # Print empty line if two seconds
# have passed since last record.
{ prev=curr } 1'
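The digit-by-digit comparison above can misfire around minute or hour boundaries (e.g. 11:10:59 and 11:11:00 differ by far more than 2000 in concatenated form although only one second passed). If you have GNU awk, a variant using mktime (a gawk-only function) converts the fields to epoch milliseconds first:
tail -f input.log | awk -F'[][\\- ,:]+' '{
    # Epoch seconds from the date/time fields, plus the millisecond field.
    curr = mktime($3 " " $4 " " $5 " " $6 " " $7 " " $8) * 1000 + $9
}
NR > 1 && prev + 2000 < curr { print "" }   # blank line after a 2+ second gap; NR > 1 skips the first record
{ prev = curr } 1'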
tail does not have such a feature. If you want, you could implement a program or script that checks the last line of the file; something like (pseudocode):
previous_last_line = last line of your file
while (sleep 2 seconds)
{
    last_line = last line of your file
    if (last_line == previous_last_line)
        print newline
    else
        print lines since previous_last_line
        previous_last_line = last_line
}
Two remarks:
this will cause output to lag by up to 2 seconds; you could check the last line more often and keep a timestamp, but that requires more code...
this depends on all lines being unique, which is reasonable in your case since you have timestamps in each line
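A runnable take on that polling idea in bash might look like this (a sketch only; it tracks the line count instead of the last line, which also sidesteps the uniqueness assumption, and logfile is a placeholder name):
#!/usr/bin/env bash
logfile=logfile
count=$(wc -l < "$logfile")                  # lines already seen

while sleep 2; do
    new_count=$(wc -l < "$logfile")
    if [ "$new_count" -eq "$count" ]; then
        echo ""                              # nothing new in the last 2 seconds
    else
        tail -n +"$((count + 1))" "$logfile" # print the lines added since the last check
        count=$new_count
    fi
done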

Filtering file in Unix

I have a big problem with filtering an output error file.
The log file:
Important some words flags
Line 1
Line 2
...
Line N
Important some words
Line 1
Line 2
...
Line N
Important some words
Line 1
Line 2
...
Line N
Important some words flags
Line 1
Line 2
...
Line N
So, some sections have the word "flags" and others do not.
Desired output file is:
Important some words flags
Line 1
Line 2
...
Line N
Important some words flags
Line 1
Line 2
...
Line N
I want only the sections whose first line starts with "Important" and ends with "flags".
All sections have a random number of lines, so I can't use something like:
grep -B1 -P '!^Important*flags' logfile
because I don't know how many lines will come before/after that line...
There are more succinct ways to handle it, but this is fairly clear:
awk '/^Important.*flags$/ { p = 1; print; next }
     /^Important/         { p = 0; next }
     { if (p) print }'
If the line is important and flagged, set p to 1, print the line, and skip to the next.
Else, if the line is important (but not flagged), set p to 0 and skip to the next.
Otherwise, it is an 'unimportant' line; print it if p is non-zero (which means that the last important line was flagged).
Any lines before the first Important line will find p is 0 anyway, so they won't be printed.
perl -n0E 'say /(Important\N*flags.*?)(?=Important|$)/sg'
(With -0 the whole file is read as one record; the regex captures every section that begins with an Important ... flags line, matching non-greedily up to the next Important or the end of input, and say prints the captures.)

awk reading in values

Hello, the following is the awk code I use to split a file:
BEGIN{body=0}
!body && /^\/\/$/ {body=1}
body && /^\[/ {print > "first_"FILENAME}
body && /^pos/{$1="";print > "second_"FILENAME}
body && /^[01]+/ {print > "third_"FILENAME}
body && /^\[[0-9]+\]/ {
print > "first_"FILENAME
print substr($0, 2, index($0,"]")-2) > "fourth_"FILENAME
}
The file looks like this:
header
//
SeqT: {"POS-s":174.683, "time":0.0130084}
SeqT: {"POS-s":431.49, "time":0.0221447}
[2.04545e+2]:0.00843832,469:0.0109533):0.00657864,((((872:0.00120503,((980:0.0001);
[29]:((962:0.000580339,930:0.000580339):0.00543993);
absolute:
gthcont: 5 4 2 1 3 4 543 5 67 657 78 67 8 5645 6
01010010101010101010101010101011111100011
1111010010010101010101010111101000100000
00000000000000011001100101010010101011111
The problem is that for the fourth file (print substr($0, 2, index($0,"]")-2) > "fourth_"FILENAME), a number written in scientific notation with an e does not get through; it only works as long as the number is written without it. How can I change the awk so that it also picks up numbers like 2.7e+7?
The problem is you're trying to match E notation when your regex is looking for integers only.
Instead of:
/^\[[0-9]+\]/
use something like:
/^\[[0-9]+(\.[0-9]+(e[+-]?[0-9]+)?)?\]/
This will match positive integers, floats, and E notation wrapped in square brackets at the start of the line.
See demo
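For context, the corrected rule slots into the original script like this (only the address changes; the actions inside the braces stay the same):
body && /^\[[0-9]+(\.[0-9]+(e[+-]?[0-9]+)?)?\]/ {
    print > "first_"FILENAME
    print substr($0, 2, index($0,"]")-2) > "fourth_"FILENAME
}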
