Efficient way to get unique value from log file - shell

There is a large 10GB log file, formatted as follows:
node123`1493000000`POST /api/info`app_id=123&token=123&sign=abc
node456`1493000000`POST /api/info`app_id=456&token=456&sign=abc
node456`1493000000`POST /api/info`token=456&app_id=456&sign=abc
node456`1493000000`POST /api/info`token=456&sign=abc&app_id=456
Now I want to get unique app_ids from the log file. For example, the expected result of the log file above should be:
123
456
I do that with the shell pipeline awk -F 'app_id=' '{print $2}' $filename | awk -F '&' '{print $1}' | sort | uniq. Is there a more efficient way?

If your log file's name is log_file.txt, you can use one of these commands (note that app_id is not always the first parameter, so the extraction has to find it wherever it occurs, and sort -u keeps only unique values):
grep -Po '\bapp_id=\K[0-9]+' log_file.txt | sort -u
awk -F '[`&=]' '{for (i = 1; i < NF; i++) if ($i == "app_id") print $(i+1)}' log_file.txt | sort -u
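A single awk pass can also de-duplicate in memory, avoiding the external sort entirely; a sketch using a two-line stand-in for the real 10GB file:

```shell
# Two sample lines standing in for the large log
printf '%s\n' \
  'node123`1493000000`POST /api/info`app_id=123&token=123&sign=abc' \
  'node456`1493000000`POST /api/info`token=456&app_id=456&sign=abc' \
  > log_file.txt

# Split on backtick, & and =; the seen[] array prints each value only once
awk -F '[`&=]' '{
  for (i = 1; i < NF; i++)
    if ($i == "app_id" && !seen[$(i+1)]++)
      print $(i+1)
}' log_file.txt
```

This prints 123 and 456, one per line, in first-seen order.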

Change the logfile name to match yours:
awk '{print $17" "$18" "$19" "$20}' log.txt | sort -k1 | uniq >> z   # apache log; adjust the fields for your format
# z now holds one unique key per line; count each key's occurrences
while read -r x
do
echo "$x"
grep -c "$x" log.txt
done < z
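Re-reading the whole log once per key gets slow as z grows; a single awk pass can tally every key at once. A sketch over hypothetical sample data (the key here is the whole line; substitute a key built from the relevant fields for the apache case):

```shell
# Hypothetical sample log standing in for log.txt
printf '%s\n' 'GET /a' 'GET /b' 'GET /a' > count_demo.txt

# One pass: count[] accumulates the number of occurrences per key
awk '{count[$0]++} END {for (k in count) print count[k], k}' count_demo.txt | sort
```

The sort at the end is only for stable output; awk's for-in order is unspecified.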

Related

Rename file as per fixed pattern in UNIX

I have these files in mydir:
APPLE_STORE_iphone12.csv
APPLE_STORE_iphonex.csv
APPLE_STORE_ipad.csv
APPLE_STORE_imac.csv
Need to rename the files after a matching pattern "APPLE_STORE_".
Required O/P
APPLE_STORE_NY_iphone12_20210107140443.csv
APPLE_STORE_NY_iphonex_20210107140443.csv
APPLE_STORE_NY_ipad_20210107140443.csv
APPLE_STORE_NY_imac_20210107140443.csv
Here is what I tried:
filelist=/mydir/APPLE_STORE_*.csv
dtstamp=`date +%Y%m%d%H%M%S`
location='NY'
for file in ${filelist}
do
filebase=${file%.csv}
mv ${file} ${filebase}_${location}_${dtstamp}.csv
done
This is giving me name like APPLE_STORE_imac_NY_20210107140443.csv
Another (maybe not so elegant) way is to first explicitly split the filename into its parts using awk with separator "_", and then rebuild it as needed. Your script could then look like this:
#!/bin/bash
filelist=./APPLE_STORE_*.csv
dtstamp=$(date +%Y%m%d%H%M%S)
location='NY'
for file in ${filelist}
do
filebase=${file%.csv}
part1=$(echo "${filebase}" | awk -F '_' '{print $1}')
part2=$(echo "${filebase}" | awk -F '_' '{print $2}')
part3=$(echo "${filebase}" | awk -F '_' '{print $3}')
mv "${file}" "${part1}_${part2}_${location}_${part3}_${dtstamp}.csv"
done
I tested it successfully.
You are so close.
destfile="$(echo "$file" | sed -e "s/^APPLE_STORE/APPLE_STORE_${location}/" -e "s/\.csv$/_${dtstamp}.csv/")"
mv "$file" "$destfile"
...or something like that.
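The same rename can also be done with parameter expansion alone, stripping the known prefix and suffix, so no sed or awk is needed; a sketch that creates its own sample file (the filename is hypothetical):

```shell
# Create one sample file to rename (stand-in for the real directory)
touch APPLE_STORE_ipad.csv

dtstamp=$(date +%Y%m%d%H%M%S)
location='NY'

for file in APPLE_STORE_*.csv; do
  base=${file#APPLE_STORE_}   # strip the fixed prefix -> ipad.csv
  base=${base%.csv}           # strip the extension    -> ipad
  mv -- "$file" "APPLE_STORE_${location}_${base}_${dtstamp}.csv"
done
```

Because the glob is expanded before the loop body runs, the freshly renamed files are not picked up again.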

grep search with filename as parameter

I'm working on a shell script.
OUT=$1
here, the OUT variable is my filename.
I'm using grep search as follows:
l=`grep "$pattern " -A 15 $OUT | grep -w $i | awk '{print $8}'|tail -1 | tr '\n' ','`
The issue is that the filename parameter I must pass is test.log. However, I have this folder structure:
test.log
test.log.001
test.log.002
I would ideally like to pass the filename as test.log and have it search all the log files. I know the usual way is to use test.log.* on the command line, but I'm having difficulty replicating that in a shell script.
My efforts:
var=$'.*'
l=`grep "$pattern " -A 15 $OUT$var | grep -w $i | awk '{print $8}'|tail -1 | tr '\n' ','`
However, I did not get the desired result.
Hopefully this will get you closer:
#!/bin/bash
for f in "${1}"*; do
grep "$pattern" -A15 "$f"
done | grep -w "$i" | awk 'END{print $8}'
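Since grep accepts multiple files, the glob can also be passed straight to a single grep call; the key point is that the star must stay outside the quotes so the shell expands it. A self-contained sketch with made-up sample files, patterns, and field positions:

```shell
# Sample rotated logs standing in for the real test.log set
printf 'start\nvalue a b c d e f 42\n' > test.log
printf 'start\nvalue a b c d e f 43\n' > test.log.001

OUT=test.log
pattern='start'
i='value'

# "$OUT"* : quoted variable, unquoted star, so the glob still expands.
# grep -h suppresses filename prefixes so awk sees clean fields.
l=$(grep -h "$pattern" -A 15 "$OUT"* | grep -w "$i" | awk 'END {print $8}')
echo "$l"
```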

How to get output of awk into a tab-delimited file merging two lines to a line every time?

I have multiple files in gz format and used this script which counts lines in each file and prints 1/4 of lines for each file:
for file in *.gz;
do echo "$file";
gunzip -c "$file" | wc -l | awk '{print $1/4}';
done
STDOUT:
AB.gz
12
CD.gz
4
How I can pipe outputs of awk into a tab-delimited file like this merging two lines each time:
AB.gz 12
CD.gz 4
I tried paste by piping | paste -sd '\t' > output.txt in the script but it didn't work.
You can use a script like this:
for file in *.gz; do
gzcat "$file" | awk -v fn="$file" -v OFS='\t' 'END{print fn, int(NR/4)}'
done
Do not echo a newline after the filename, and use a tab so the output is tab-delimited:
for file in *.gz;
do
printf '%s\t' "${file}"
gunzip -c "$file" | wc -l | awk '{print $1/4}';
done
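The paste attempt was close: -s joins all lines into one, whereas reading standard input twice (paste - -) consumes two lines per output row, which is exactly the pairwise merge wanted:

```shell
# Name/count pairs on alternating lines, as in the question's STDOUT
printf 'AB.gz\n12\nCD.gz\n4\n' |
paste - -   # each output row takes two input lines; tab is the default delimiter
```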

No output when using awk inside bash script

My bash script is:
output=$(curl -s http://www.espncricinfo.com/england-v-south-africa-2012/engine/current/match/534225.html | sed -nr 's/.*<title>(.*?)<\/title>.*/\1/p')
score=echo"$output" | awk '{print $1}'
echo $score
The above script prints just a newline in my console whereas my required output is
$ curl -s http://www.espncricinfo.com/england-v-south-africa-2012/engine/current/match/534225.html | sed -nr 's/.*<title>(.*
?)<\/title>.*/\1/p' | awk '{print $1}'
SA
So why am I not getting the output from my bash script when it works fine in the terminal? Am I using echo "$output" in the wrong way?
#!/bin/bash
output=$(curl -s http://www.espncricinfo.com/england-v-south-africa-2012/engine/current/match/534225.html | sed -nr 's/.*<title>(.*?)<\/title>.*/\1/p')
score=$( echo "$output" | awk '{ print $1 }' )
echo "$score"
The score variable was empty because the syntax was wrong: score=echo"$output" is parsed as a plain variable assignment (the literal string echo followed by the contents of output), executed in a pipeline subshell, so awk receives no input and nothing is captured.
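A minimal contrast between the broken assignment and command substitution (the output string is a hypothetical stand-in for the scraped title):

```shell
output="SA 51/2"

# Broken: score=echo"$output" would be parsed as an assignment prefixed
# to the pipeline, run in a subshell, so nothing is captured into score.

# Working: $( ... ) captures the pipeline's stdout
score=$(echo "$output" | awk '{print $1}')
echo "$score"
```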

Why can't I split the string?

I want to read a file by shell script, and process it line by line. I would like to extract 2 fields from each line. Here is my code:
#!/bin/bsh
mlist=`ls *.log.2011-11-1* | grep -v error`
for log in $mlist
do
while read line
do
echo ${line} | awk -F"/" '{print $4}' #This produce nothing
echo ${line} #This work and print each line
done < $log | grep "java.lang.Exception"
done
This is a sample line from the input file:
<ERROR> LimitFilter.WebContainer : 4 11-14-2011 21:56:55 - java.lang.Exception: File - /AAA/BBB/CCC/DDDDDDDD.PDF does not exist
If I don't use bsh, I can use ksh, and the result is the same. We have no bash here.
It's because you are passing the output of your while loop through grep "java.lang.Exception".
The output of echo $line | awk -F"/" '{print $4}' is CCC. When this is piped through grep, nothing is printed because CCC does not match the search pattern.
Try removing | grep "java.lang.Exception" and you will see the output of your loop come out correctly.
An alternative approach to take might be to remove the while loop and instead just use:
grep "java.lang.Exception" $log | awk -F"/" '{print $4}'
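awk can do the filtering itself, so the grep can be dropped too; a sketch using the sample line from the question as input:

```shell
# The sample line from the question, standing in for $log
printf '%s\n' '<ERROR> LimitFilter.WebContainer : 4 11-14-2011 21:56:55 - java.lang.Exception: File - /AAA/BBB/CCC/DDDDDDDD.PDF does not exist' > sample.log

# Select the exception lines and split on "/" in one pass
awk -F '/' '/java\.lang\.Exception/ {print $4}' sample.log
```

For the sample line this prints CCC, matching the expected output of the loop.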
