Evaluating a log file using a sh script - bash

I have a log file with a lot of lines with the following format:
IP - - [Timestamp Zone] 'Command Weblink Format' - size
I want to write a script.sh that gives me the number of times each website has been clicked.
The command:
awk '{print $7}' server.log | sort -u
should give me a list which puts each unique weblink in a separate line. The command
grep 'Weblink1' server.log | wc -l
should give me the number of times Weblink1 has been clicked. I want to store each line produced by the awk command above in a variable and then loop, running the grep command on each extracted weblink. I could use
while IFS='' read -r line || [[ -n "$line" ]]; do
echo "Text read from file: $line"
done
(source: Read a file line by line assigning the value to a variable) but I don't want to save the output of the Awk script in a .txt file.
My guess would be:
while IFS='' read -r line || [[ -n "$line" ]]; do
grep '$line' server.log | wc -l | ='$variabel' |
echo " $line was clicked $variable times "
done
But I'm not really familiar with connecting commands in a loop, as this is my first time. Would this loop work and how do I connect my loop and the Awk script?

Shell commands in a loop connect the same way they do without a loop, and you aren't very close. But yes, this can be done in a loop if you want the horribly inefficient way for some reason such as a learning experience:
awk '{print $7}' server.log |
sort -u |
while IFS= read -r line; do
n=$(grep -c "$line" server.log)
echo "$line" clicked $n times
done
# you only need the read || [ -n ] idiom if the input can end with an
# unterminated partial line (is ill-formed); awk print output can't.
# you don't really need the IFS= and -r because the data here is URLs
# which cannot contain whitespace and shouldn't contain backslash,
# but I left them in as good-habit-forming.
# in general variable expansions should be double-quoted
# to prevent word splitting and/or globbing, although in this case
# $line is a URL which cannot contain whitespace and practically
# cannot be a glob. $n is a number and definitely safe.
# grep -c does the count so you don't need wc -l
or more simply
awk '{print $7}' server.log |
sort -u |
while IFS= read -r line; do
echo "$line" clicked $(grep -c "$line" server.log) times
done
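Note that grep -c "$line" treats each URL as a regular expression and counts every line that contains it anywhere, so a URL like /a is also counted on lines for /about, and a . matches any character. A minimal sketch that keeps the same loop structure but compares field 7 exactly, using awk -v instead of grep (count_each_url is a hypothetical helper name; it still rereads the whole log once per URL, so it is just as slow):

```bash
#!/usr/bin/env bash
# Sketch: per-URL loop that counts exact matches of field 7
# instead of any line that merely contains the URL somewhere.
count_each_url() {
  local logfile=$1 url
  awk '{ print $7 }' "$logfile" |
  sort -u |
  while IFS= read -r url; do
    # -v hands the URL to awk as a string; == compares it to $7 exactly
    printf '%s clicked %d times\n' "$url" \
      "$(awk -v u="$url" '$7 == u { c++ } END { print c + 0 }' "$logfile")"
  done
}
```

Call it as count_each_url server.log. awk -v processes backslash escapes in the value it is given, which is harmless for ordinary URLs.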
However, if you just want the correct results, it is much more efficient and somewhat simpler to do it in one pass in awk:
awk '{n[$7]++}
END{for(i in n){
print i,"clicked",n[i],"times"}}' server.log |
sort
# or GNU awk 4+ can do the sort itself, see the doc:
awk '{n[$7]++}
END{PROCINFO["sorted_in"]="@ind_str_asc";
for(i in n){
print i,"clicked",n[i],"times"}}' server.log
The associative array n collects the values from the seventh field as keys, and on each line, the value for the extracted key is incremented. Thus, at the end, the keys in n are all the URLs in the file, and the value for each is the number of times it occurred.
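For comparison, here is a rough pure-bash sketch of the same idea, assuming bash 4 or later for associative arrays (count_clicks is a hypothetical name). It splits each line on whitespace like awk does, so index 6 of the 0-based array is awk's $7. A bash read loop is usually much slower than awk on a large log, so this only illustrates the mechanism:

```bash
#!/usr/bin/env bash
# Sketch: one pass over the log with a bash 4+ associative array,
# mirroring awk's n[$7]++.
count_clicks() {
  local logfile=$1 url
  local -a f
  local -A n=()
  while read -r -a f; do
    url=${f[6]:-}                        # f[6] is awk's $7 (0-based array)
    [[ -n $url ]] || continue            # skip lines with fewer than 7 fields
    n[$url]=$(( ${n[$url]:-0} + 1 ))     # bump this URL's counter
  done < "$logfile"
  # Key order of an associative array is unspecified, so sort the report
  for url in "${!n[@]}"; do
    printf '%s clicked %d times\n' "$url" "${n[$url]}"
  done | sort
}
```

Call it as count_clicks server.log.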

Related

Naming awk output in loop

I'm relatively new to the world of shell scripts so hopefully this won't be too difficult. I have a file (dirlist) with a list of directories. I want to
cat 'dirlist' with the path to each file
use a program called samtools to modify the file from dirlist
use awk to subset the samtools output on a variable chr17
write the output to a file that uses the 8th field of the directory, from 'dirlist' for naming
do this for all the files listed in dirlist
I think I have all the pieces here. Items 1-3 are working fine but the loop is simply naming the file "echo".
for i in `cat dirlist`; do samtools depth $i | awk '$1 == "chr17" {print $0}' echo $i | awk -F'[/]' '{print $8}'; done
Any help would be greatly appreciated
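A native bash implementation follows (no awk is started for each file; the filtering and the path splitting are done by bash itself). It writes the chr17 lines of each file to a file named after the 8th /-separated field of its path:
while IFS= read -r filename; do
  # split the path on "/"; with a leading "/", pieces[7] is the same field as awk's $8
  IFS=/ read -r -a pieces <<<"$filename"
  while IFS= read -r line; do
    if [[ $line = "chr17"[[:space:]]* ]]; then
      printf '%s\n' "$line"
    fi
  done < <(samtools depth "$filename") >"${pieces[7]}"
done <dirlist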
I think that's what you want to do
... | awk -v f="$i" 'BEGIN{split(f,fs,"/")} $1=="chr17" {print > fs[8]}'
The final file name will be generated from the original file name split by "/", using only the 8th segment. Kind of unusual; it probably needs some error handling.
Not tested, caveat emptor...
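Put together with a loop over dirlist, the one-liner might look like the sketch below. It assumes samtools is on PATH and that every path in dirlist has at least eight /-separated fields; write_chr17 and process_dirlist are hypothetical helper names. A while read loop replaces for i in `cat dirlist` so that paths containing spaces stay intact:

```bash
#!/usr/bin/env bash
# Filter chr17 rows from stdin into a file named after the 8th
# "/"-separated field of the given path (awk's split() is 1-based,
# so a leading "/" makes fs[1] empty).
write_chr17() {
  local path=$1
  awk -v f="$path" 'BEGIN { split(f, fs, "/") } $1 == "chr17" { print > fs[8] }'
}

# Drive it from dirlist; assumes samtools is installed and on PATH.
process_dirlist() {
  local listfile=${1:-dirlist} path
  while IFS= read -r path; do
    samtools depth "$path" | write_chr17 "$path"
  done < "$listfile"
}
```

Run it as process_dirlist dirlist. awk only creates an output file once it sees at least one chr17 row.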

Read a file in a Bash script

I have a file in my file system that I want to read in a bash script. The file has a specific format, and I want to read only selected values from it. I don't want to read the whole file, as it is very large. Below is my file format:
Name=TEST
Add=TEST
LOC=TEST
The file contains data like the above. From it, I want to get only the Add value into a variable. Could you please suggest how I can do this?
As of now, I am doing this to read the file:
file="data.txt"
while IFS= read line
do
# display $line or do something with $line
echo "$line"
done < "$file"
Use the right tool for the job, awk in this case, to speed things up!
dateValue="$(awk -F"=" '$1=="Add"{print $2; exit}' "$file")"
printf "%s\n" "$dateValue"
TEST
The idea is to split input lines using = as the delimiter. The awk program checks whether the first field ($1) equals Add and, if so, prints the corresponding value ($2).
The exit after print is optional: it stops processing as soon as the Add line is found, which helps if the file is as large as you indicated.
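One limitation of -F"=" is that a value which itself contains = (a URL with a query string, say) is cut off at its second =. A hedged variation that splits only at the first = and takes the key as a parameter (get_value is a hypothetical helper name):

```bash
#!/usr/bin/env bash
# Print the value for key $1 from a KEY=value file $2, stopping at the first match.
get_value() {
  local key=$1 file=$2
  awk -v k="$key" '
    { i = index($0, "=") }                  # position of the first "="
    i > 0 && substr($0, 1, i - 1) == k {    # exact key match
      print substr($0, i + 1)               # everything after the first "="
      exit                                  # stop reading a huge file early
    }
  ' "$file"
}
```

For example, addValue=$(get_value Add "$file").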
You could rewrite your loop this way; notice the break once you have found your line:
while IFS='=' read -r key value; do
  if [[ $key == "Add" ]]; then
    # your logic, e.g. use "$value"
    break
  fi
done < "$file"
If your intention is to just get the very first occurrence of "Add=", then you could use grep this way:
value=$(grep -m 1 '^Add=' "$file" | cut -f2 -d=)

Extract first word in colon separated text file

How do I iterate through a file and print only the first word of each line? The lines are colon-separated, for example:
root:01:02:toor
The file contains several lines. This is what I've done so far, but it doesn't work.
FILE=$1
k=1
while read line; do
echo $1 | awk -F ':'
((k++))
done < $FILE
I'm not good with bash-scripting at all. So this is probably very trivial for one of you..
edit: variable k is to count the lines.
Use cut:
cut -d: -f1 filename
-d specifies the delimiter
-f specifies the field(s) to keep
If you need to count the lines, just
count=$( wc -l < filename )
-l tells wc to count lines
awk -F: '{print $1}' FILENAME
That will print the first word when separated by colon. Is this what you are looking for?
To use a loop, you can do something like this:
$ cat test.txt
root:hello:1
user:bye:2
test.sh
#!/bin/bash
while IFS=':' read -r line || [[ -n $line ]]; do
echo "$line" | awk -F: '{print $1}'
done < test.txt
Example of reading line by line in bash: Read a file line by line assigning the value to a variable
Result:
$ ./test.sh
root
user
A solution using perl
%> perl -F: -ane 'print "$F[0]\n";' [file(s)]
change the "\n" to " " if you don't want a new line printed.
You can get the first word without any external commands in bash like so:
printf '%s\n' "${line%%:*}"
which takes the value of the variable line and removes the longest suffix matching the glob :*, i.e. everything from the first colon onward (that's what %% does; a single % would remove only the shortest match, starting at the last colon).
With this solution you do need to write the loop yourself, though. If this is all you want to do with each line, the cut solution is better because you don't have to iterate over the file yourself.
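A sketch of that loop, assuming each line of the file should produce its first colon-separated field (print_first_fields is a hypothetical name); the || [[ -n $line ]] part also handles a last line without a trailing newline:

```bash
#!/usr/bin/env bash
# Print the text before the first ":" on every line of the given file.
print_first_fields() {
  local line
  while IFS= read -r line || [[ -n $line ]]; do
    # ${line%%:*} removes the longest suffix matching ":*",
    # i.e. everything from the first colon onward
    printf '%s\n' "${line%%:*}"
  done < "$1"
}
```

With the test.txt shown earlier, print_first_fields test.txt prints root and user.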

BASH - How to retrieve a single line from the file?

How to retrieve a single line from the file?
file.txt
"aaaaaaa"
"bbbbbbb"
"ccccccc"
"ddddddd"
I need to retrieve the line 3 ("ccccccc")
Thank you.
sed is your friend. sed -n 3p prints the third line (-n: no automatic print, 3p: print when line number is 3). You can also have much more complex patterns, for example sed -n 3,10p to print lines 3 to 10.
If the file is very big, you may want to avoid reading all of it and quit right after printing: sed -n '3{p;q}'
If you know you need line 3, one approach is to use head to get the first three lines, and tail to get only the last of these:
varname="$(head -n 3 file.txt | tail -n 1)"
Another approach, using only Bash builtins, is to call read three times:
{ read; read; IFS= read -r varname; } < file.txt
Here's a way to do it with awk:
awk 'FNR==3 {print; exit}' file.txt
Explanation:
awk '...' : Invoke awk, a tool for manipulating files line-by-line. Instructions enclosed by single quotes are executed by awk.
FNR==3 {print; exit}: FNR is the record (line) number within the current input file; just think of it as "number of lines read so far for this file". Here we are saying: if we are on the 3rd line of the file, print the entire line and then exit awk immediately so we don't waste time reading the rest of a large file.
file.txt: specify the input file as an argument to awk to save a cat.
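The same command can be wrapped so that the line number is a parameter; a small sketch with a hypothetical helper name, nth_line, passing the number in with awk -v:

```bash
#!/usr/bin/env bash
# Print line number $1 of file $2 and stop reading right after it.
nth_line() {
  local n=$1 file=$2
  awk -v n="$n" 'FNR == n { print; exit }' "$file"
}
```

For example, varname=$(nth_line 3 file.txt).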
There are many possibilities. Try this:
sed '3!d' test
Here is a very fast version:
sed "1d; 2d; 3q" file.txt
Are other tools than bash allowed? On systems that include bash, you'll usually find sed and awk or other basic tools:
$ line="$(sed -ne 3p input.txt)"
$ echo "$line"
or
$ read line < <(awk 'NR==3' input.txt)
$ echo "$line"
or if you want to optimize this by quitting after the 3rd line is read:
$ read line < <(awk 'NR==3{print;nextfile}' input.txt)
$ echo "$line"
or how about even simpler tools (though less optimized):
$ line="`head -n 3 input.txt | tail -n 1`"
$ echo "$line"
Of course, if you really want to do this all within bash, you can still make it a one-liner, without using any external tools.
$ for (( i=3 ; i-- ; )); do read line; done < input.txt
$ echo "$line"
There are many ways to achieve the same thing. Pick one that makes sense for your task. Next time, perhaps explain your overall needs a bit better, so we can give you answers more applicable to your situation.
Since, as usual, all the other answers involve trivial and usual stuff (pipe through grep then awk then sed then cut or you-name-it), here's a very unusual and (sadly) not very well-known one (so, I hereby claim that I have the most original answer):
mapfile -s2 -n1 -t < input.txt
echo "$MAPFILE"
I would say this is fairly efficient (mapfile is quite efficient and it's a bash builtin).
Done!
Fast bash version:
i=1
while (( i <= 3 )); do
  IFS= read -r current
  (( i == 3 )) && line=$current
  (( i++ ))
done < file.txt
Output
echo "$line" # Third line
"ccccccc"
Explanation
i=1 - Start counting at the first line.
while (( i <= 3 )) - Loop once for each of the first three lines, then exit the loop.
IFS= read -r current - Read the next line of the file into $current; every iteration must consume a line, otherwise line 3 would never be reached.
(( i == 3 )) && line=$current - Keep the line only when $i equals 3.
(( i++ )) - Increment $i by 1 on each iteration.
done < file.txt - Redirect the file into the while loop.

How to handle variables that contain ";"?

I have a configuration file that contains lines like "hallo;welt;" and I want to do a grep on this file.
Whenever I try something like grep "$1;$2" my.config, echo "$1;$2", or even line="$1;$2", my script fails with something like:
: command not found95: line 155: =hallo...
How can I tell bash to ignore ; while evaluating "..." blocks?
EDIT: an example of my code.
# find entry
$line=$(grep "$1;$2;" $PERMISSIONSFILE)
# splitt line
reads=$(echo $line | cut -d';' -f3)
writes=$(echo $line | cut -d';' -f4)
admins=$(echo $line | cut -d';' -f5)
# do some stuff on the permissions
# replace old line with new line
nline="$1;$2;$reads;$writes;$admins"
sed -i "s/$line/$nline/g" $TEMPPERM
My script should be called like this: sh script "table" "a.b.*.>"
EDIT: another, simpler example
$test=$(grep "$1;$2;" temp.authorization.config)
the temp file:
table;pattern;read;write;stuff
the call sh test.sh table pattern results in: : command not foundtable;pattern;read;write;stuff
Don't use $ on the left side of an assignment in bash -- if you do it'll substitute the current value of the variable rather than assigning to it. That is, use:
test=$(grep "$1;$2;" temp.authorization.config)
instead of:
$test=$(grep "$1;$2;" temp.authorization.config)
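A small sketch that shows what actually happens with the $ on the left, assuming the variable is still empty (test_var is a hypothetical name; test is avoided because it is also a shell builtin). The expanded word =hello;welt; is not a NAME=value assignment, so bash tries to run it as a command and fails with status 127 (command not found):

```bash
#!/usr/bin/env bash
test_var=                  # hypothetical variable, still empty

# "$test_var=..." expands to the single word "=hello;welt;". That is not an
# assignment, so bash looks for a command with that name; the subshell keeps
# the failure contained and status records its exit code.
status=0
( $test_var=$(printf 'hello;welt;') ) 2>/dev/null || status=$?
printf 'exit status: %s\n' "$status"   # 127 means command not found

# The correct assignment: no "$" on the left-hand side.
test_var=$(printf 'hello;welt;')
printf '%s\n' "$test_var"
```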
Edit: also, variable expansions should be in double-quotes unless there's a good reason otherwise. For example, use:
reads=$(echo "$line" | cut -d';' -f3)
instead of:
reads=$(echo $line | cut -d';' -f3)
This doesn't matter for semicolons, but does matter for spaces, wildcards, and a few other things.
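To see the difference on a value containing two consecutive spaces (the sample line is made up for illustration):

```bash
#!/usr/bin/env bash
line='table;my  pattern;read'           # made-up sample: two spaces inside field 2

unquoted=$(echo $line | cut -d';' -f2)  # word splitting collapses the two spaces
quoted=$(echo "$line" | cut -d';' -f2)  # the value reaches cut unchanged
printf '[%s] [%s]\n' "$unquoted" "$quoted"
```

This prints [my pattern] [my  pattern]: only the quoted version preserves the original field.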
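A ; inside quotes has no meaning at all for bash, and the result of expanding $1 is never re-parsed, so even a doublequote inside $1 is harmless. The only way a ; escapes its quotes is a quoting mistake in the script text itself, for example a stray doublequote that leaves you with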
grep "something";$2"
which'll be parsed by bash as two separate commands:
grep "something" ; other"
^---command 1----^ ^----command 2---^
Please show exactly what your script does around the spot where the error occurs, and what data you're feeding into it.
Counter-example:
$ cat file.txt
hello;welt;
hello;world;
hell;welt;
$ cat xx.sh
grep "$1;$2" file.txt
$ bash -x xx.sh hello welt
+ grep 'hello;welt' file.txt
hello;welt;
$
You have not yet classified your problem accurately.
If you try to assign the result of grep to a variable (like I do), your example breaks.
Please show what you mean. Using the same data file as before and doing an assignment, this is the output I get:
$ cat xx.sh
grep "$1;$2" file.txt
output=$(grep "$1;$2" file.txt)
echo "$output"
$ bash -x xx.sh hello welt
+ grep 'hello;welt' file.txt
hello;welt;
++ grep 'hello;welt' file.txt
+ output='hello;welt;'
+ echo 'hello;welt;'
hello;welt;
$
Seems to work for me. It also demonstrates why the question needs an explicit, complete, executable, minimal example so that we can see what the questioner is doing that is different from what people answering the question think is happening.
I see you've provided some sample code:
# find entry
$line=$(grep "$1;$2;" $PERMISSIONSFILE)
# splitt line
reads=$(echo $line | cut -d';' -f3)
writes=$(echo $line | cut -d';' -f4)
admins=$(echo $line | cut -d';' -f5)
The line $line=$(grep ...) is wrong. You should omit the $ before line. Although it is syntactically valid, it is not an assignment at all: bash first expands $line (presumably empty at that point), ends up with a word such as =table;pattern;read;write;stuff, and tries to run that word as a command. That is where your "command not found" error comes from. Assigning to a variable whose name is stored in another variable needs something like declare or printf -v, and is rarely what you want.
For safety if nothing else, I would also enclose the $line values in double quotes in the echo lines. It may not strictly be necessary, but it is simple protective programming.
The changes lead to:
# find entry
line=$(grep "$1;$2;" "$PERMISSIONSFILE")
# split line
reads=$( echo "$line" | cut -d';' -f3)
writes=$(echo "$line" | cut -d';' -f4)
admins=$(echo "$line" | cut -d';' -f5)
The rest of your script was fine.
It seems like you are trying to read a semicolon-delimited file and identify a line starting with 'table;pattern;', where table is a string you specify and pattern is a regular expression grep will understand. Once the line is identified, you wish to replace the 3rd, 4th and 5th fields with different data and write the updated line back to the file.
Does this sound correct?
If so, try this code
#!/bin/bash
in_table="$1"
in_pattern="$2"
file="$3"
while IFS=';' read -r -d$'\n' tuple pattern reads writes admins ; do
line=$(cut -d: -f1<<<"$tuple")
table=$(cut -d: -f2<<<"$tuple")
# do some stuff with the variables
# e.g., update the values
reads=1
writes=2
admins=12345
# replace the old line with the new line
# (GNU sed; do not add -n here, or -i would drop every other line of the file)
sed -i'' "$line"'{i\
'"$table;$pattern;$reads;$writes;$admins"'
;d;}' "$file"
done < <(grep -n '^'"${in_table}"';'"${in_pattern}"';' "${file}")
I chose to update by line number here to avoid problems with unknown characters on the left-hand side of the substitution.
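If sed's i\ syntax is awkward, a different technique (awk instead of sed, swapped in here rather than taken from the answer) can replace a line by number with no escaping of the new text at all, because the text is passed through the environment. replace_line is a hypothetical helper name, and the sketch assumes a writable temporary directory:

```bash
#!/usr/bin/env bash
# Replace line number $2 of file $1 with the text in $3.
replace_line() {
  local file=$1 lineno=$2 tmp rc
  tmp=$(mktemp) || return 1
  # ENVIRON hands the text to awk verbatim; -v would process backslashes
  NEWTEXT=$3 awk -v n="$lineno" 'FNR == n { print ENVIRON["NEWTEXT"]; next } { print }' "$file" > "$tmp" &&
    cat "$tmp" > "$file"   # copy back over the original, keeping its permissions
  rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

For example, replace_line "$file" "$line" "$table;$pattern;$reads;$writes;$admins" could stand in for the sed call. Writing back with cat instead of mv keeps the original file's permissions.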
