read line by line with awk and parse variables - bash

I have a script that reads log files and parses the data to insert it into a MySQL table.
My script looks like this:
while read x; do
  var=$(echo ${x}|cut -d+ -f1)
  var2=$(echo ${x}|cut -d_ -f3)
  ...
  echo "$var,$var2,.." >> mysql.infile
done < logfile
The problem is that the log files are thousands of lines long, so this takes hours.
I read that awk is faster. I tried it, but I don't know the syntax for parsing out the variables.
EDIT:
The inputs are structured firewall logs, so they are pretty large files, like
#timestamp $HOST reason="idle Timeout" source-address="x.x.x.x"
source-port="19219" destination-address="x.x.x.x"
destination-port="53" service-name="dns-udp" application="DNS"....
So I'm using a lot of grep calls for ~60 variables, e.g.
sourceaddress=$(echo ${x}|grep -P -o '.{0,0}source-address=\".{0,50}'|cut -d\" -f2)
If you think Perl would be better, I'm open to suggestions, and maybe a hint on how to script it...

To answer your question, I assume the following rules of the game:
each line contains various variables
each variable can be found by a different delimiter.
This gives you the following awk script:
awk 'BEGIN{OFS=","}
{ FS="+"; $0=$0; var1=$1;
  FS="_"; $0=$0; var2=$3;
  ...
  print var1,var2,... >> "mysql.infile"
}' logfile
It basically does the following:
set the output separator to ,
read a line
set the field separator to +, re-parse the line ($0=$0) and determine the first variable
set the field separator to _, re-parse the line ($0=$0) and determine the second variable
... continue for all variables
print the line to the output file.
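As a minimal runnable sketch, here is the same script with just the two variables from your cut calls filled in (the field positions are assumed from cut -d+ -f1 and cut -d_ -f3):
awk 'BEGIN{OFS=","}
{ FS="+"; $0=$0; var1=$1;   # everything before the first "+"
  FS="_"; $0=$0; var2=$3;   # the third "_"-separated field
  print var1, var2 >> "mysql.infile"
}' logfile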

The Perl script below might help:
perl -ane '/^[^+]*/;printf "%s,",$&;/^([^_]*_){2}([^_]*){1}_.*/;printf "%s\n",$+' logfile
Since $& can incur a performance penalty, you could also use the /p modifier, as below:
perl -ane '/^[^+]*/p;printf "%s,",${^MATCH};/^([^_]*_){2}([^_]*){1}_.*/;printf "%s\n",$+' logfile
For more on Perl regex matching, refer to PerlDoc.
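For instance, on a hypothetical sample line containing both delimiters, the one-liner prints the part before the first + and the third _-separated field:
$ echo 'alpha+beta_gamma_delta_rest' | perl -ane '/^[^+]*/;printf "%s,",$&;/^([^_]*_){2}([^_]*){1}_.*/;printf "%s\n",$+'
alpha,delta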

If you're extracting the values in order, something like this will help:
$ awk -F\" '{for(i=2;i<=NF;i+=2) print $i}' file
idle Timeout
x.x.x.x
19219
x.x.x.x
53
dns-udp
DNS
You can easily change the output format as well:
$ awk -F\" -v OFS=, '{for(i=2;i<=NF;i+=2)
    printf "%s", $i ((i>NF-2)?ORS:OFS)}' file
idle Timeout,x.x.x.x,19219,x.x.x.x,53,dns-udp,DNS
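If you need the fields by name rather than by position, one possible sketch (assuming every variable has the key="value" form shown in the sample line) is to collect the pairs into an array first; the keys printed below are taken from the question's example:
awk -F'" ' -v OFS=, '{
  split("", kv)                          # clear the array for each line
  for (i = 1; i <= NF; i++) {            # each chunk looks like key="value
    if (split($i, pair, "=\"") == 2) {
      sub(/"$/, "", pair[2])             # strip the trailing quote on the last chunk
      kv[pair[1]] = pair[2]
    }
  }
  print kv["source-address"], kv["source-port"], kv["destination-port"]
}' logfile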

Related

How to get '2f8b547d..eb94967a' string from the log 'Updating 2f8b547d..eb94967a Fast-forward....' in shell?

I am building a shell script.
The script gets git log output such as:
"Updating 2f8b547d..eb94967a Fast-forward...."
but I want to get just the 2f8b547d..eb94967a snippet.
I am new to the shell, so thanks for your help.
Update:
What's more, I want to use the snippet as a parameter, because I will execute
git log 2f8b547d..eb94967a
You can pipe it to awk like so:
echo "Updating 2f8b547d..eb94967a Fast-forward...." | awk '{print $2}'
Your result will be 2f8b547d..eb94967a.
If it is a script, say abc.sh, that produces such output, then you can run:
$> ./abc.sh | awk '{print $2}'
awk takes the output and splits it on whitespace. Updating is represented by $1, 2f8b547d..eb94967a is $2, and so on. In the above example, we ask awk to print the 2nd item of the output.
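Since you want to use the snippet as a parameter afterwards (per the update), you can capture awk's output with command substitution; assuming abc.sh prints that Updating line:
hash=$(./abc.sh | awk '{print $2}')
git log "$hash"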
As an alternative to awk (don't get me wrong, awk is super for this job as well), you can simply use cut with a space delimiter to extract the second field, e.g.
cut -d' ' -f2 yourgit.log
You can also pipe output to cut, or redirect the input file to it using <. It does essentially the same as the awk command; it's just a different choice of which utility to use.
Here is another alternative:
echo "Updating 2f8b547d..eb94967a Fast-forward...." | read u hash rest
After this, the string you are looking for is stored in the variable hash (note: this works in shells like zsh, where the last command of a pipeline runs in the current shell; in bash, read runs in a subshell and hash is lost):
echo $hash
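In bash you can avoid the subshell problem with a here-string, which keeps read in the current shell:
read -r u hash rest <<< "Updating 2f8b547d..eb94967a Fast-forward...."
echo "$hash"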

Bash awk append to same line

There are numerous posts about removing leading white space and appending an entry to a single existing line in a file using awk. None of my attempts work; here are just three examples of the many I have tried.
Say I have a file called $log with a single line:
a:b:c
and I want to add a fourth entry:
awk '{ print $4"d" }' $log | tee -a $log
output seems to contain a newline:
a:b:c:
d
whereas, I want all on the same line;
a:b:c:d
try
BEGIN { FS = ":" } ; awk '{ print $4"d" }' $log | tee -a $log
or, this - avoid a new line
awk 'BEGIN { ORS=":" }; { print $4"d" }' $log | tee -a $log
no change:
a:b:c:
d
awk is placing a space after c: and then writing d to the next line.
EDIT: | tee -a $log appears to be necessary to write the additional string to the file.
$log contains 39 variables and was generated using awk without | tee -a
odd...
The actual command to write $40 to the single-line entries:
awk '{ print $40"'$imagedir'" }' $log
output
+ awk '{ print $40"/home/geoland/Asterism-DEVEL/DSO" }'
/home/geoland/.asterism/log
but this does not write to the $log file.
How should I append d to the same line without leading white space, using awk? I'm also looking at sed, xargs, and other alternatives.
Using awk:
awk '{ print $0":d" }' file
Using sed:
sed 's/$/:d/' file
Using only bash:
while IFS= read -r line; do
echo "$line:d"
done < file
Using sed:
$ echo a:b:c | sed 's,\(^.*$\),\1:d,'
a:b:c:d
Thanks all... This is the solution I went with. I also needed to write the entire line to a perpetual log file, because the log file is overwritten by each new process instance.
I will further investigate an awk solution.
logname=$imagedir/log_$name
while IFS=: read -r line; do
  echo "$line$imagedir"
done < $log | tee $logname
This places $imagedir directly after the last ':'-separated field.
There is probably room for refinement.
I too am not entirely sure what you're trying to do here.
Your command line, awk '{ print $4"d" }' $log | tee -a $log, is problematic in a number of ways.
First, your awk script tries to print the 4th field, which is empty. Unless you say otherwise, fields are separated by whitespace, and the string a:b:c contains no whitespace, so awk prints just "d". And tee -a appends to your existing log file, so what you're seeing is the original data along with the d printed by awk. That's totally expected.
Second, you have tee appending to the same file that awk is in the process of reading. This won't make an endless loop, as awk should stop reading the input file after whatever was its last byte when the file was opened, but it does mean you may end up with repeated data.
Your other attempts, aside from some syntactical errors, all suffer from the same assumption that $4 means something that it does not.
The following awk snippet sets the input and output field separators to :, then sets the 4th field to "d", then prints the line.
$ echo "a:b:c" | awk 'BEGIN{FS=OFS=":"} {$4="d"} 1'
a:b:c:d
Is that what you want?
If you really do need to append this data to an existing log file, you can do so with tee -a or simple >> redirection. Just bear in mind that awk will only see the content of the file as of the time it was run, and by appending, you are not replacing lines.
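If you do want the line replaced rather than appended to, a common pattern (a sketch only; $log.tmp is an arbitrary scratch name) is to write awk's output to a temporary file and move it back:
awk 'BEGIN{FS=OFS=":"} {$4="d"} 1' "$log" > "$log.tmp" && mv "$log.tmp" "$log"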
One other thing. If you are actually hoping to use the content of the shell variable $imagedir inside awk, you should pass the variable in with -v rather than stepping out of your quotes. For example:
$ echo "a:b:c" | awk -v d="foo/bar" 'BEGIN{FS=OFS=":"} {$4=d} 1'
a:b:c:foo/bar
sed "s|$|$imagedir|" file | tee newfile
This does the trick. Read 'file' and write the contents of 'file' with the substitution to a 'new file', so as to read the image directory when using a secondary standalone process.
Because the variable is a directory with several / these need to be escaped, so as not to interpret as sed delimiters. I had difficulty with this using a variable.
A neater option was to use an alternative delimiter. Not to be confused with the pipe that follows.
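Putting it together with the variable (reusing names from the snippets above, so this is illustrative rather than exact):
imagedir=/home/geoland/Asterism-DEVEL/DSO   # example value taken from the trace earlier
sed "s|$|$imagedir|" file | tee newfile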

Swap in shell from file

Through cut -d":" -f1,3 I made a new file which looks like this:
username1:number1
username2:number2
username3:number3
But my point is, I want my file to look like this:
number1:username1
number2:username2
number3:username3
I tried cut -d":" -f3,1, but it still gives me username1:number1, even though I want the 3rd column to come first and the 1st column to be printed last. Any help with that?
cut -f3,1 will print the same as cut -f1,3, because cut always outputs fields in their original order, no matter how you list them. Use awk:
awk -F: '{print $3 FS $1}' file
I like awk for this sort of thing. You've already got an awk answer though.
To do this in pure bash, you'd do the following:
while IFS=: read -r one two; do printf '%s:%s\n' "$two" "$one"; done < input.txt
The IFS variable is the field separator used to slice the input into separate variables for read, and printf gives us predictably formatted output.
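A quick run against the sample data (fed via printf here instead of input.txt):
$ printf '%s\n' username1:number1 username2:number2 |
    while IFS=: read -r one two; do printf '%s:%s\n' "$two" "$one"; done
number1:username1
number2:username2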

error in Unix and sed coding

I have two questions. The first is that sometimes when I am working in Unix and I type a command wrong, I get a new line without my prompt, and no matter what I type, nothing happens until I exit and re-enter. Does anyone know why this happens?
Secondly,
I have a file whose lines consist of: filename, a space, then data.
I need to get the data. I heard that I should use awk or sed, but I am not sure how to do it. Any help is welcome.
Dennis has already answered your first question well. (Note: please put only one question in at a time!)
For your second question, it can be done much more simply.
awk '{ print $2 }' yourfile
By default, awk uses space as its column delimiter, so this simply tells awk to print out the second column. If you want the output sent to a new file, then just do this:
awk '{ print $2 }' yourfile > newfile
First question:
Enter echo " and it will happen. Unix supports multi-line commands.
Example:
echo "
is a multi-line
command"
Type in " and enter to terminate.
Second question:
Here's a link to a nice AWK tutorial: Awk - A Tutorial and Introduction
Basically, you use
awk '{ print "echo " $2 }' filename | sh
for example, to echo all the data.
$2 accesses the second chunk of information on each line (chunks are separated by spaces).
print "echo " $2 will cause awk to output echo data.
Lastly, you pipe awk's output to sh to execute the generated commands.
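As a worked example with a made-up file:
$ cat filename
report.txt 42
audit.log 17
$ awk '{ print "echo " $2 }' filename | sh
42
17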

Substring extraction using bash shell scripting and awk

So, I have a file called 'dummy' which contains the string:
"There is 100% packet loss at node 1".
I also have a small script that I want to use to grab the percentage from this file. The script is below.
result=`grep 'packet loss' dummy` |
awk '{ first=match($0,"[0-9]+%")
last=match($0," packet loss")
s=substr($0,first,last-first)
print s}'
echo $result
I want the value of $result to basically be 100% in this case. But for some reason, it just prints out a blank string. Can anyone help me?
You would need to put the closing backtick after the end of the awk command, but it's preferable to use $() instead:
result=$( grep 'packet loss' dummy |
awk '{ first=match($0,"[0-9]+%")
last=match($0," packet loss")
s=substr($0,first,last-first)
print s}' )
echo $result
but you could just do (note the dummy file argument, which was missing):
result=$( grep 'packet loss' dummy | grep -o "[0-9]\+%" )
Try
awk '{print $3}'
instead.
The solution below can be used when you don't know where the percentage number appears (and there's no need to combine awk with greps):
$ results=$(awk '/packet loss/{for(i=1;i<=NF;i++)if($i~/[0-9]+%$/)print $i}' file)
$ echo $results
100%
You could also do this in plain shell using expr.
i=`expr "There is 98.76% packet loss at node 1" : '[^0-9.]*\([0-9.]*%\)[^0-9.]*'`; echo $i;
This extracts the substring matching the regex within \( \).
Here I'm assuming that the output lines you're interested in adhere strictly to your example, with the percentage value being the only variation.
With that assumption, you really don't need anything more complicated than:
awk '/packet loss/ { print $3 }' dummy
This quite literally means "print the 3rd field of any line containing 'packet loss'". By default awk treats whitespace as the field delimiter, which is perfect for you.
If you are doing more than simply printing the percentage, you could save the results to a shell variable using backticks, or redirect the output to a file. But your sample code simply echoes the percentages to stdout, and then exits. The one-liner does the exact same thing. No need for backticks or $() or any other shell machinations whatsoever.
NB: In my experience, piping the output of grep to awk usually means you're doing something that awk can do all by itself.
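For example, the grep-plus-awk combination above collapses into a single awk call: /packet loss/ does the filtering and match() does the extraction (a sketch, using the dummy file from the question):
result=$( awk '/packet loss/ && match($0, /[0-9]+%/) { print substr($0, RSTART, RLENGTH) }' dummy )
echo $result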
