I need to validate my log files:
-Every new log line must start with a date.
-The date must follow the ISO 8601 standard. Example:
2011-02-03 12:51:45,220Z -
With a shell script, I can validate this by looping over each line and checking the date pattern.
The code is below:
#!/bin/bash
processLine(){
# get all args
line="$@"
result=`echo $line | egrep "[0-9]{4}-[0-9]{2}-[0-9]{2} [012][0-9]:[0-9]{2}:[0-9]{2},[0-9]{3}Z" -a -c`
if [ "$result" == "0" ]; then
echo "The log is not with correct date format: "
echo $line
exit 1
fi
}
# Make sure we get file name as command line argument
if [ "$1" == "" ]; then
echo "You must enter a logfile"
exit 0
else
file="$1"
# make sure file exist and readable
if [ ! -f $file ]; then
echo "$file : does not exists"
exit 1
elif [ ! -r $file ]; then
echo "$file: can not read"
exit 2
fi
fi
# Set loop separator to end of line
BAKIFS=$IFS
IFS=$(echo -en "\n\b")
exec 3<&0
exec 0<"$file"
while read -r line
do
# use $line variable to process line in processLine() function
processLine $line
done
exec 0<&3
# restore $IFS which was used to determine what the field separators are
IFS=$BAKIFS
echo SUCCESS
But there is a problem: some log entries contain stack traces or other content that spans more than one line. A stack trace is just one example; it can be anything. Stack trace example:
2011-02-03 12:51:45,220Z [ERROR] - File not found
java.io.FileNotFoundException: fred.txt
at java.io.FileInputStream.<init>(FileInputStream.java)
at java.io.FileInputStream.<init>(FileInputStream.java)
at ExTest.readMyFile(ExTest.java:19)
at ExTest.main(ExTest.java:7)
...
This will not pass my script, but it is valid!
So if I run my script on a log file that contains stack traces, for example, it fails, because it checks the file line by line.
I have the correct pattern and I need to validate the logger's date format, but I don't have a pattern for wrongly formatted dates that would tell me which lines to skip.
I don't know how to solve this problem. Can somebody help me?
Thanks
You need to anchor your search for the date to the start of the line (otherwise the date could appear anywhere in the line - not just at the beginning).
The following snippet will loop over all lines that do not begin with a valid date. You still have to determine if the lines constitute errors or not.
DATEFMT='^[0-9]{4}-[0-9]{2}-[0-9]{2} [012][0-9]:[0-9]{2}:[0-9]{2},[0-9]{3}Z'
grep -Ev "${DATEFMT}" /path/to/log | while IFS= read -r LINE; do
echo "${LINE}" # did not begin with date.
done
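To see why the anchor matters, here is a minimal bash sketch. The function name starts_with_date and the sample lines are made up for illustration; only the pattern comes from the snippet above.

```bash
#!/bin/bash
# Same pattern as above; the leading ^ pins the match to the start of the line.
DATEFMT='^[0-9]{4}-[0-9]{2}-[0-9]{2} [012][0-9]:[0-9]{2}:[0-9]{2},[0-9]{3}Z'

# Hypothetical helper: succeeds only if its argument BEGINS with the date.
starts_with_date() {
    [[ $1 =~ $DATEFMT ]]
}

starts_with_date '2011-02-03 12:51:45,220Z - started' && echo "ok"
starts_with_date 'at 2011-02-03 12:51:45,220Z - nested' || echo "rejected"
```

Without the ^, the second sample line would be accepted as well, because the date appears somewhere inside it.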
So just (silently) discard a single stack trace. In somewhat verbose bash:
STATE=idle
while read -r line; do
case $STATE in
idle)
if [[ $line =~ ^java\..*Exception ]]; then
STATE=readingexception
else
processLine "$line"
fi
;;
readingexception)
if ! [[ $line =~ ^' '*'at ' ]]; then
STATE=idle
processLine "$line"
fi
;;
*)
echo "Urk! internal error [$STATE]" >&2
exit 1
;;
esac
done <logfile
This relies on processLine not continuing on error, else you will need to track a tad more state to avoid two consecutive stack traces.
This makes 2 assumptions.
1. Lines that begin with whitespace are continuations of previous lines. We match a leading space or a leading tab.
2. Lines that have a non-whitespace character at the start (^) are new log lines.
If a line matching #2 doesn't match the date format, we have an error, so print the error, and include the line number.
count=0
processLine() {
count=$(( count + 1 ))
line="$@"
result=$( echo "$line" | egrep '^[0-9]{4}-[0-9]{2}-[0-9]{2} [012][0-9]:[0-9]{2}:[0-9]{2},[0-9]{3}Z' -a -c )
if (( $result == 0 )); then
# if result = 0, then my line did not start with the proper date.
# if the line starts with whitespace, then it may be a continuation
# of a multi-line log entry (like a java stacktrace)
continues=$( echo "$line" | egrep '^[[:blank:]]' -a -c )
if (( $continues == 0 )); then
# if we got here, then the line did not start with a proper date,
# AND the line did not start with white space. This is a bad line.
echo "The line is not with correct date format: "
echo "$count: $line"
exit 1
fi
fi
}
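The same two assumptions can be sketched as a self-contained reader loop. This is only an illustration: validate_log is a hypothetical name, and an unindented continuation line such as `java.io.FileNotFoundException: fred.txt` from the question would still be reported as bad under these assumptions.

```bash
#!/bin/bash
DATEFMT='^[0-9]{4}-[0-9]{2}-[0-9]{2} [012][0-9]:[0-9]{2}:[0-9]{2},[0-9]{3}Z'

# Hypothetical helper: reads a log on stdin, prints "<line number>: <line>"
# for the first bad line and returns 1; returns 0 if every line is fine.
validate_log() {
    local line count=0 cont='^[[:blank:]]'
    # IFS= and -r keep leading whitespace and backslashes intact
    while IFS= read -r line; do
        count=$(( count + 1 ))
        [[ $line =~ $DATEFMT ]] && continue   # a new log entry
        [[ $line =~ $cont ]] && continue      # an indented continuation
        echo "$count: $line"
        return 1
    done
    return 0
}
```

A possible call would be `validate_log < /path/to/log`, checking the exit status afterwards.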
Create a condition to check if the line starts with a date. If not, skip that line as it is part of a multi-line log.
processLine(){
# get all args
line="$@"
result=`echo "$line" | egrep "^[0-9]{4}-[0-9]{2}-[0-9]{2} [012][0-9]:[0-9]{2}:[0-9]{2},[0-9]{3}Z" -a -c`
if [ "$result" == "0" ]; then
echo "Log entry is multi-lined - continuing."
fi
}
I'm trying to generate a new output file from each existing file in a directory of .txt files. I want to check each file line by line for two substrings and append the lines that match to that file's new output file.
I'm having trouble generating the new files.
This is what I currently have:
#!/bin/sh
# My first Script
success="(Compiling)\s\".*\"\s\-\s(Succeeded)"
failure="(Compiling)\s\".*\"\s\-\s(Failed)"
count_success=0
count_failure=0
for i in ~/Documents/reports/*;
do
while read -r line;
do
if [[$success=~$line]]; then
echo $line >> output_$i
count_success++
elif [[$failure=~$]]; then
echo $line >> output_$i
count_failure++
fi
done
done
echo "$count_success of jobs ran succesfully"
echo "$count_failure of jobs didn't work"
Any help would be appreciated, thanks
Please, use https://www.shellcheck.net/ to check your shell scripts.
If you use Visual Studio Code, you could install "ShellCheck" (by Timon Wong) extension.
About your program:
- Assume bash.
- Use different extensions for input and output files (really important if they are in the same directory).
- Loop over the report (input) files only.
- Clear the output file.
- Read the input file.
- In the if sequence:
  - write if [[ ... ]] with a space after [[ and before ]]
  - put spaces before and after operators (=~)
  - reverse the operand order for the =~ operator
- Prevent globbing with "...".
#! /bin/bash
# Input file extension
declare -r EXT_REPORT=".txt"
# Output file extension
declare -r EXT_OUTPUT=".output"
# RE
declare -r success="(Compiling)\s\".*\"\s\-\s(Succeeded)"
declare -r failure="(Compiling)\s\".*\"\s\-\s(Failed)"
# Counters
declare -i count_success=0
declare -i count_failure=0
for REPORT_FILE in ~/Documents/reports/*"${EXT_REPORT}"; do
# Clear output file
: > "${REPORT_FILE}${EXT_OUTPUT}"
# Read input file (see named file in "done" line)
while read -r line; do
# does the line match the success pattern ?
if [[ $line =~ $success ]]; then
echo "$line" >> "${REPORT_FILE}${EXT_OUTPUT}"
count_success+=1
# does the line match the failure pattern ?
elif [[ $line =~ $failure ]]; then
echo "$line" >> "${REPORT_FILE}${EXT_OUTPUT}"
count_failure+=1
fi
done < "$REPORT_FILE"
done
echo "$count_success of jobs ran succesfully"
echo "$count_failure of jobs didn't work"
What about using grep?
success='Compiling\s".*"\s-\sSucceeded'
failure='Compiling\s".*"\s-\sFailed'
count_success=0
count_failure=0
for i in ~/Documents/reports/*.txt; do
    # $i is a full path, so build the output name from its base name
    out="output_$(basename "$i")"
    (( count_success += $(grep -E "$success" "$i" | tee "$out" | wc -l) ))
    (( count_failure += $(grep -E "$failure" "$i" | tee -a "$out" | wc -l) ))
done
echo "$count_success of jobs ran succesfully"
echo "$count_failure of jobs didn't work"
I have a series of commands chained together with pipes:
should_create_one_line | expects_one_line
The first command should_create_one_line should produce an output that only has one line, but under strange circumstances it is possible for the output to be multiline or empty.
I would like to add a step in between these two, validate_one_line:
should_create_one_line | validate_one_line | expects_one_line
If its input contains exactly 1 line then validate_one_line will simply output its input. If its input contains more than 1 line or is empty then validate_one_line should cause the whole sequence of steps to stop and return an error code.
What command can I use for validate_one_line?
Use read. Here's a shell function that meets your specs:
exactly_one_line() {
local line # Use to echo the line
read -r line || return # Guarantee at least one line is read
read && return 1 # Indicate failure if another line is successfully read
echo "$line"
}
Notes
"One line" assumes a single line terminated by a newline. If your input might be, say, a file with content but no trailing newline, then this will fail.
Given a pipeline like a|b, a cannot prevent b from running. At a minimum, b needs to handle when a produces no output.
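If unterminated input has to be accepted as well, one possible variation is sketched below. The name exactly_one_line_lenient is made up for this example. It relies on read still storing a partial last line in the variable even though it returns a non-zero status at end of input.

```bash
#!/bin/bash
# Hypothetical variant: also accepts a single line without a trailing newline.
exactly_one_line_lenient() {
    local line extra
    # read fails at EOF, but "line" still holds any unterminated text
    IFS= read -r line || [[ -n $line ]] || return 1
    # any further input, terminated or not, means more than one line
    if IFS= read -r extra || [[ -n $extra ]]; then
        return 1
    fi
    printf '%s\n' "$line"
}
```

Empty input still fails, and two or more lines still fail; only the missing final newline is tolerated.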
Demo:
$ wc -l empty oneline twolines
0 empty
1 oneline
2 twolines
3 total
$ exactly_one_line < empty; echo $?
1
$ exactly_one_line < oneline; echo $?
oneline
0
$ exactly_one_line < twolines; echo $?
1
First off, you should seriously consider adding the validation code to expects_one_line. According to this post, each process starts in its own subshell, meaning that even if validate_one_line fails, you will get an error in expects_one_line because it will try to run with no input (or a blank line). That being said, here is a bash one-liner that you can insert into your pipe to validate:
should_create_one_line.sh | ( var="$(cat)"; [ $(echo "$var" | wc -l) -ne 1 ] && exit 1 || echo "$var") | expects_one_line.sh
The problem here is that when the validation subshell returns in the exit 1 case, expects_one_line.sh will still get a single blank line. If this works for you, then great. If not, it would be better to just put the following into the beginning of expects_one_line.sh:
input="$(cat)"
[ $(echo "$input" | wc -l) -ne 1 ] && exit 1
This would guarantee that expects_one_line.sh fails properly when it does not get exactly one line, without having to wonder about what the empty line that the validation outputs will do to the script.
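Since one stage of a pipeline cannot stop the others, another option in bash is to let the pipeline's exit status reflect the validator's failure with set -o pipefail. The sketch below uses two stand-in functions, validate_one_line and expects_one_line, which are assumptions for the demo and not the real commands.

```bash
#!/bin/bash
set -o pipefail   # pipeline status becomes the last non-zero stage status

# Stand-in validator: passes exactly one line through, fails otherwise.
validate_one_line() {
    local line
    IFS= read -r line || return 1   # no complete line at all
    read -r && return 1             # a second line exists
    printf '%s\n' "$line"
}

# Stand-in consumer: succeeds even on empty input, like many real tools.
expects_one_line() { cat >/dev/null; }

if printf 'a\nb\n' | validate_one_line | expects_one_line; then
    echo "pipeline ok"
else
    echo "pipeline failed"
fi
```

With pipefail set this prints "pipeline failed", although expects_one_line itself still ran and exited with status 0.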
You may find this post helpful: How to read mutliline input from stdin into variable and how to print one out in shell(sh,bash)?
You can use a shell script to check the incoming data and call the other command only when the input is exactly 1 line.
The following code starts cat ONLY when it is fed exactly 1 line:
sh -c 'while read CMD; do [ ! -z "$LINE" ] && exit 1; LINE=$CMD; done; [ -z "$LINE" ] && exit 1; printf "%s\n" "$LINE" | "$0" "$@"' cat
How this works
1. Try reading a line; if that fails, go to step 5
2. If variable $LINE is NOT empty, go to step 6
3. Save the line in variable $LINE
4. Go to step 1
5. If $LINE is NOT empty, go to step 7
6. Exit the program with status code 1
7. Call our program and pass $LINE to it using printf
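The steps above can also be written as a bash function, which may be easier to read than the one-liner. This is a sketch: run_if_one_line is a hypothetical name, and like the one-liner it does not count empty lines, so an empty line followed by a non-empty one is treated as a single line.

```bash
#!/bin/bash
# Hypothetical helper: runs "$@" with the single input line on its stdin,
# or returns 1 without running it.
run_if_one_line() {
    local current line=""
    while IFS= read -r current; do      # steps 1 and 4
        [ -n "$line" ] && return 1      # steps 2 and 6: a second line was seen
        line=$current                   # step 3
    done
    [ -z "$line" ] && return 1          # steps 5 and 6: nothing was read
    printf '%s\n' "$line" | "$@"        # step 7
}
```

It would be used as `should_create_one_line | run_if_one_line expects_one_line`.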
Example usage:
Printing out only if grep found 1 match:
grep .... | sh -c 'while read CMD; do [ ! -z "$LINE" ] && exit 1; LINE=$CMD; done; [ -z "$LINE" ] && exit 1; printf "%s\n" "$LINE" | "$0" "$@"' cat
Example of the question poster:
should_create_one_line | sh -c 'while read CMD; do [ ! -z "$LINE" ] && exit 1; LINE=$CMD; done; [ -z "$LINE" ] && exit 1; printf "%s\n" "$LINE" | "$0" "$@"' expects_one_line
I am trying to read lines from a file containing multiple lines. I want to identify lines that contain only spaces.
By definition, an empty line is empty and does not contain anything (including spaces).
I want to detect lines that seems to be empty but they are not (lines that contain spaces only)
while read line; do
if [[ `echo "$line" | wc -w` == 0 && `echo "$line" | wc -c` > 1 ]];
then
echo "Fake empty line detected"
fi
done < "$1"
But because read strips spaces at the start and at the end of a line, my code isn't working.
An example of a file:
hi
hi
(empty line, no spaces or any other char)
hi
(two spaces)
hey
Please help me to fix the code
Disable word splitting by clearing the value of IFS (the internal field separator):
while IFS= read -r line; do
....
done < "$1"
The -r isn't strictly necessary, but it is good practice.
Also, a simpler way to check the value of line (I assume you're looking for a line with nothing but whitespace):
if [[ $line =~ ^[[:space:]]+$ ]]; then
echo "Fake empty line detected"
fi
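Putting the two pieces together, a minimal sketch could look like this. The function name count_fake_empty is an assumption for the example, and the sample input mirrors the file from the question.

```bash
#!/bin/bash
# Hypothetical helper: reads stdin and prints how many lines consist
# only of spaces or tabs (at least one of them).
count_fake_empty() {
    local line n=0 blank_only='^[[:blank:]]+$'
    # IFS= stops read from stripping the very whitespace we want to detect
    while IFS= read -r line; do
        if [[ $line =~ $blank_only ]]; then
            n=$(( n + 1 ))
        fi
    done
    echo "$n"
}

# hi, hi, a truly empty line, hi, a line of two spaces, hey
printf 'hi\nhi\n\nhi\n  \nhey\n' | count_fake_empty
```

This prints 1: the truly empty line is not counted, only the line made of two spaces.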
Following your code, it can be improved.
while read line; do
if [ -z "$line" ]
then
echo "Fake empty line detected"
fi
done < "$1"
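The test -z checks if $line is empty. Note that read (without IFS=) strips leading and trailing whitespace, so this flags truly empty lines as well as lines that contain only spaces.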
Output:
Fake empty line detected
Fake empty line detected
This is just a simple problem but I don't understand why I got an error here. This is just a for loop inside an if statement.
This is my code:
#!/bin/bash
if (!( -f $argv[1])) then
echo "Argv must be text file";
else if ($#argv != 1) then
echo "Max argument is 1";
else if (-f $argv[1]) then
for i in `cut -d ',' -f2 $argv[1]`
do
ping -c 3 $i;
echo "finish pinging host $i"
done
fi
The error is reported at line 16, the line after fi, which is a blank line.
Can someone please explain why I get this error?
many, many errors.
If I try to stay close to your example code:
#!/bin/sh
if [ ! -f "${1}" ]
then
echo "Argv must be text file";
else if [ "${#}" -ne 1 ]
then
echo "Max argument is 1";
else if [ -f "${1}" ]
then
for i in $(cat "${1}" | cut -d',' -f2 )
do
ping -c 3 "${i}";
echo "finish pinging host ${i}"
done
fi
fi
fi
another way, exiting each time the condition is not met :
#!/bin/sh
[ "${#}" -ne 1 ] && { echo "There should be 1 (and only 1) argument" ; exit 1 ; }
[ ! -f "${1}" ] && { echo "Argv must be a file." ; exit 1 ; }
[ -f "${1}" ] && {
for i in $(cat "${1}" | cut -d',' -f2 )
do
ping -c 3 "${i}";
echo "finish pinging host ${i}"
done
}
#!/usr/local/bin/bash -x
if [ ! -f "${1}" ]
then
echo "Argument must be a text file."
else
while-loop-script "${1}"
fi
I have broken this up, because I personally consider it extremely bad form to nest one function inside another; or truthfully to even have more than one function in the same file. I don't care about file size, either; I've got several scripts which are 300-500 bytes long. I'm learning FORTH; fractalism in that sense is a virtue.
# while-loop-script
while IFS= read -r line
do
    ping -c 3 "${line}"
done < "${1}"
Don't loop over $(cat file) in order to feed individual file lines to a script; the output is split on whitespace rather than on line boundaries, and each resulting word is also subject to filename expansion, so lines that contain spaces or wildcard characters get mangled. Looping over the output of sed has the same problem.
The most robust method I know of for feeding lines to a script, preserving all spaces and formatting, is a while-read loop rather than a for loop over the output of cat or sed.
To be sure about preserving whitespace, clear the internal field separator (IFS) for the read command only, as in IFS= read -r above. Because the assignment applies to that one command, there is nothing to reset after the loop.
For every opening if, you must have a corresponding closing fi. This is also true for else if. Better use elif instead
if test ! -f "$1"; then
echo "Argv must be text file";
elif test $# != 1; then
echo "Max argument is 1";
elif test -f "$1"; then
for i in `cut -d ',' -f2 "$1"`
do
ping -c 3 $i;
echo "finish pinging host $i"
done
fi
There's also no argv variable. If you want to access the command line arguments, you must use $1, $2, ...
Next point is $#argv: in bash this evaluates to $# (the number of command line arguments) followed by the literal text argv. The $argv[1] and $#argv forms are csh syntax, not bash.
Furthermore, testing is done with either test ... or [ ... ], not ( ... )
And finally, you should enclose at least your command line arguments in double quotes "$1". If you don't and there is no command line argument, you have for example
test ! -f
instead of
test ! -f ""
This lets the test fail and go on to the second if, instead of echoing the proper message.
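The difference is easy to reproduce. In this sketch the two function names are made up, and each function is called without arguments so that $1 is unset.

```bash
#!/bin/bash
# Unquoted: with no argument this becomes "test ! -f", which only checks
# whether the string "-f" is empty and then negates the result.
check_unquoted() { test ! -f $1; echo "unquoted: $?"; }

# Quoted: this becomes "test ! -f ''", a real file test on an empty name.
check_quoted() { test ! -f "$1"; echo "quoted: $?"; }

check_unquoted   # prints "unquoted: 1"
check_quoted     # prints "quoted: 0"
```

Only the quoted form returns 0 here, so only it would reach the branch that prints the proper message.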
How come the additional 'Line' inside echo "Line $line" is not prepended to every file name in the for loop?
#!/bin/bash
INPUT=targets.csv
IFS=","
[ ! -f $INPUT ] && { echo "$INPUT file not found"; exit 99; }
while read target user password path
do
result=$(sshpass -p "$password" ssh -n "$user"@"$target" ls "$path"*file* 2>/dev/null)
if [ $? -ne 0 ]
then
echo "No Heap dumps detected."
else
echo "Found a Heap dump! Possible OOM issue detected"
for line in $result
do
echo "Line $line"
done
fi
done < $INPUT
.csv file contents ..
rob@laptop:~/scripts$ cat targets.csv
server.com,root,passw0rd,/root/
script output ..
rob@laptop:~/scripts$ ./checkForHeapdump.sh
Found a Heap dump! Possible OOM issue detected
Line file1.txt
file2.txt
The statement:
for line in $result
performs word splitting on $result to get each element that $line should be set to. Word splitting uses the delimiters in $IFS. Earlier in the script you set this to just ,. So this loop will iterate over comma-separated data in $result. Since there aren't any commas in it, it's just a single element.
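The effect can be reproduced without ssh. In this sketch the value of result is made-up data standing in for the ls output, and count_words is a hypothetical helper that reports how many words the unquoted expansion produces for a given IFS.

```bash
#!/bin/bash
# Made-up data standing in for the output of the remote ls
result=$'file1.txt\nfile2.txt'

# Hypothetical helper: counts the words $result splits into under IFS=$1
count_words() {
    local IFS="$1"
    # left unquoted on purpose: this is where word splitting happens
    set -- $result
    echo "$#"
}

count_words ","      # prints 1: there are no commas to split on
count_words $'\n'    # prints 2: split at the newline
```

Because IFS is declared local inside the function, the caller's IFS is left unchanged.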
If you want to split it by lines, do:
IFS="
"
for line in $result