how to select the last line of the shell output - bash

Hi I have a shell command like this.
s3=$(awk 'BEGIN{ print "S3 bucket path" }
/Executing command\(queryId/{ sub(/.*queryId=[^[:space:]]+: /,""); q=$0 }
/s3:\/\//{ print "," $10 }' OFS=',' hive-server2.log)
The output of the above command looks like this.
echo $s3
2018-02-21T17:58:22,
2018-02-21T17:58:26,
2018-02-21T18:05:33,
2018-02-21T18:05:34
I want to select the last line only. I need the last output like this.
2018-02-21T18:05:34
I tried like this.
awk -v $s3 '{print $(NF)}'
It is not working. Any help will be appreciated.

In general, command | tail -n 1 prints the last line of the output from command. However, where command is of the form awk '... { ... print something }' you can refactor to awk '... { ... result = something } END { print result }' to avoid spawning a separate process just to discard the other output. (Conversely, you can replace awk '/condition/ { print something }' | head -n 1 with awk '/condition/ { print something; exit }'.)
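As a rough illustration of both refactorings, here is a minimal sketch. The three-line sample in the variable log is made up for the demonstration; it only stands in for real command output.

```shell
# Hypothetical sample input standing in for the real log.
log='a 2018-02-21T17:58:22
b 2018-02-21T17:58:26
c 2018-02-21T18:05:34'

# Instead of "awk ... | tail -n 1": remember the value, print once at the end.
last=$(printf '%s\n' "$log" | awk '{ result = $2 } END { print result }')

# Instead of "awk ... | head -n 1": print the first value and stop reading.
first=$(printf '%s\n' "$log" | awk '{ print $2; exit }')

echo "last=$last first=$first"
```

Both variants run a single awk process and no extra tail or head.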
If you already have the result in a shell variable s3 and want to print just the last line, a parameter expansion echo "${s3##*$'\n'}" does that. The C-style string $'\n' to represent a newline is a Bash extension (the ## operator itself, which removes the longest matching prefix, is specified by POSIX but missing from some legacy Bourne shells), so you should make sure the shebang line says #!/bin/bash, not #!/bin/sh.
Notice also that $s3 without quotes is an error unless you specifically require the shell to perform whitespace tokenization and wildcard expansion on the value. You should basically always use double quotes around variables except in a couple of very specific scenarios.
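To make the last two paragraphs concrete, here is a small sketch. The variable nl is a helper introduced only for this example: it holds a literal newline, so the snippet does not need the Bash-only $'\n' syntax.

```shell
s3='2018-02-21T17:58:22,
2018-02-21T17:58:26,
2018-02-21T18:05:34'

# nl (hypothetical helper) holds a literal newline; in Bash, $'\n' would do the same.
nl='
'
last=${s3##*"$nl"}     # strip everything up to and including the last newline

unquoted=$(echo $s3)   # word splitting: the lines are joined with single spaces
quoted=$(echo "$s3")   # quoted: the newlines survive

echo "$last"
```

Here unquoted ends up as one long line, which is why tools that look for "the last line" appear to return everything when the variable is not quoted.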
Your Awk command would not work for two reasons. Firstly, as explained in the previous paragraph, the unquoted $s3 is split into tokens, so -v receives only the first token of the value (and -v needs the name=value form, as in -v s3=...), while the second token is taken as your Awk script (probably a syntax error). In more detail, you are basically running
awk -v s3=firstvalue secondvalue thirdvalue '{ print $(NF) }'
       ^ value       ^ script    ^ names of files ...
where you probably wanted to say
awk -v s3=$'firstvalue\nsecondvalue\nthirdvalue' '{ print $(NF) }'
But even with quoting, your script would set s3 to something but then tell Awk to (ignore the variable and) process standard input, which on the command line leaves it reading from your terminal. A fixed script might look like
awk 'END { print }' <<<"$s3"
which passes the variable as standard input to Awk, which prints the last line. The <<<value "here string" syntax is also a Bash extension, and not portable to POSIX sh.
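If the script has to run under plain sh, one possible substitute for the here string is a pipe from printf. This is only a sketch with sample data; it is not part of the original command.

```shell
s3='2018-02-21T17:58:22,
2018-02-21T18:05:33,
2018-02-21T18:05:34'

# printf feeds the variable to awk on standard input; no Bash-only syntax needed.
last=$(printf '%s\n' "$s3" | awk 'END { print }')

echo "$last"
```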

A much simpler way is
command | grep "your filter" | tail -n 1
or directly
command | tail -n 1

You could try this:
echo -e "This is the first line \nThis is the second line" | awk 'END{print}'

Another approach is to process the file from the end and exit after the first match.
tac file | awk '/match/{print; exit}'

Hi, you can do it just by adding echo "$s3" | sed '$!d'. The double quotes matter: without them the shell joins all the lines into one, and sed then prints that whole line.
s3=$(awk 'BEGIN{ print "S3 bucket path" }/Executing command\(queryId/{ sub(/.*queryId=[^[:space:]]+: /,""); q=$0 } /s3:\/\//{ print "," $10 }' OFS=',' hive-server2.log)
echo "$s3" | sed '$!d'
It will simply print:
2018-02-21T18:05:34
Hope this will help you.

Related

awk match by variable with dot in it

I have a script that will iterate over a file containing domains (google.com, youtube.com, etc). The purpose of the script is to check how many times each domain is included in the 12th column of a tab-separated value file.
while read domain; do
awk -F '\t' '$12 == '$domain'' data.txt | wc -l
done < domains.txt
However, awk seems to be interpreting the dots in the domains as a special character. The following error message is shown:
awk: syntax error at source line 1
context is
$12 ~ >>> google. <<< com
awk: bailing out at source line 1
I am a beginner in bash so any help would be greatly appreciated!
When you write:
domain='google.com'
awk -F '\t' '$12 == '$domain'' data.txt
the $domain is outside of any quotes:
awk -F '\t' '$12 == '$domain' ' data.txt
            <       >       < >
            start   end     start end
and so exposed to the shell for interpretation first and THEN it becomes part of the body of the awk script before awk sees it. So what awk sees is:
awk -F '\t' '$12 == google.com' data.txt
and google.com is not a valid symbol (e.g. variable or function) name nor string nor number. What you MEANT to do was:
awk -F '\t' '$12 == "'"$domain"'"' data.txt
so the shell would see "$domain" instead of just $domain (see https://mywiki.wooledge.org/Quotes for why that's important) and awk would finally see:
awk -F '\t' '$12 == "google.com"' data.txt
which is fine as now "google.com" is a string, not a symbol BUT you should never allow shell variables to expand to become part of an awk script as there are other caveats so what you should really have done is:
awk -F '\t' -v dom="$domain" '$12 == dom' data.txt
See How do I use shell variables in an awk script? for more information.
By the way, even after fixing the above problem do not do this:
while read domain; do
awk -F '\t' -v dom="$domain" '$12 == dom' data.txt | wc -l
done < domains.txt
as it'll be immensely slow and contains insidious bugs (see why-is-using-a-shell-loop-to-process-text-considered-bad-practice). Do something like this instead (untested):
awk -F'\t' '
    NR==FNR {
        cnt[$1] = 0
        next
    }
    $12 in cnt {
        cnt[$12]++
    }
    END {
        for ( dom in cnt ) {
            print dom, cnt[dom]
        }
    }
' domains.txt data.txt
That will be far more efficient, robust, and portable than calling awk inside a shell read loop.
See What are NR and FNR and what does "NR==FNR" imply? for how that awk script works. Get the book Effective AWK Programming, 5th Edition, by Arnold Robbins to learn awk.
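Since the script above is marked untested, here is one way it could be tried on throwaway data. The temporary directory, the file contents, and the trailing sort (added only to make the output order predictable) are assumptions made for this sketch.

```shell
tmpdir=$(mktemp -d)
printf 'google.com\nyoutube.com\n' > "$tmpdir/domains.txt"

# Four rows of 12 tab-separated columns; only column 12 matters here.
for d in google.com youtube.com google.com googlexcom; do
    printf 'a\tb\tc\td\te\tf\tg\th\ti\tj\tk\t%s\n' "$d"
done > "$tmpdir/data.txt"

counts=$(awk -F'\t' '
    NR == FNR  { cnt[$1] = 0; next }   # first file: register each domain
    $12 in cnt { cnt[$12]++ }          # second file: count exact matches
    END        { for (dom in cnt) print dom, cnt[dom] }
' "$tmpdir/domains.txt" "$tmpdir/data.txt" | sort)

rm -rf "$tmpdir"
printf '%s\n' "$counts"
```

The googlexcom row is not counted, because the lookup is an exact string match and the dot is never treated as a regex wildcard.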
awk -F '\t' '$12 == '$domain'' data.txt | wc -l
The single quotes are building an awk program. They are not something visible to awk. So awk sees this:
$12 == google.com
Since there aren't any quotes around google.com, that is a syntax error. You just need to add quotation marks.
awk -F '\t' '$12 == "'"$domain"'"' data.txt
The quotes jammed together like that are a little confusing, but it's just this:
'....' stuff to send to awk. Single quotes are for the shell.
'..."...' a double quote inside the awk program for awk to see
'...'"..." stuff in double quotes _outside_ the awk program for the shell
We can combine those like this:
'..."'"$var"'"...'
That's a bunch of literal awk code ending in a double quote, followed by the expansion of the shell parameter var, which is double-quoted as usual in the shell for safety, followed by more literal awk code starting with a double quote. So the end result is a string passed to awk that includes the value of var inside double quotes.
But you don't have to be so fancy or confusing since awk provides the -v option to set variables from the shell:
awk -v domain="$domain" '$12 == domain' data.txt
Since the domain is not quoted inside the awk code, it is interpreted as the name of a variable. (Periods are not legal in variable names, which is why you got a syntax error with your domains; if you hadn't, though, awk would have treated them as empty and been looking for lines whose twelfth field was likewise blank.)
Use a combination of cut to print the 12th column of the TAB-delimited file, sort and uniq to count the items:
cut -f12 data.txt | sort | uniq -c
This should give the count of how many lines of the input have "google.com" in $12:
{m,g}awk -v __="${domain}" '
BEGIN { _*=\
( _ ="\t[^\t]*")*gsub(".",(_)_,_)*sub(".","",_)*\
gsub("[.:&=/-]","[&]",__)*sub("[[][^[]+$",__"\t?",_)*(\
FS=_ } { _+=NF } END { print _-NR }'

Remove the first and last char in the file

I have a data file, and I want to remove the first and last char. Is there a simple way to do this? I searched online, but I didn't find a simple way.
Thanks!
Could you please try following, if you are ok with awk.
awk '
prev{
  if(++count==1){
    print substr(prev,2)
  }
  else{
    print prev
  }
  prev=""
}
{
  prev=$0
}
END{
  if(prev){
    print substr(prev,1,length(prev)-1)
  }
}' Input_file
Using awk*. Testing with Lorem\nipsum (no trailing newline, as otherwise that would be removed**):
$ echo -en Lorem\\nipsum | awk -v RS="^$" '{gsub(/^.|.$/,"")}1'
orem
ipsu
Solution reads the whole input into memory before processing so that could be considered a downside.
* Successfully tested with gawk, awk-20121220 and Busybox awk, failed with mawk.
** "Last character":
$ echo -e Lorem\\nipsum # outputs:
Lorem\n
ipsum\n # \n is last char but print (well, 1) adds another
To avoid that, use printf instead:
$ echo -e Lorem\\nipsum | awk -v RS="^$" '{gsub(/^.|.$/,"");printf "%s",$0}'
orem
ipsum$
Try:
sed -i '1s/.\(.*\)/\1/; $s/\(.*\)./\1/' path/to/file.txt
-i: modify file in place
1s: apply substitute command to line number 1 only
$s: apply substitute command to last line only
\( ... \): a capture group to make use of later by referencing it with \1
Basically these are two sed substitute commands glued with ;
EDIT: Just realized you said simple... Not sure if this falls under simple.
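For comparison, a slightly shorter variant of the same idea, shown here as a sketch on a made-up two-line input and without -i, so nothing on disk is modified. It anchors the expressions with ^ and $ instead of using capture groups.

```shell
# 1s/^.//  removes the first character of the first line
# $s/.$//  removes the last character of the last line
result=$(printf 'Lorem\nipsum\n' | sed '1s/^.//; $s/.$//')

printf '%s\n' "$result"
```

If the input has only one line, both substitutions apply to that line.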

Dealing with variable inside awk result division by zero

I'm writing a simple shell command using awk, as follows:
input_folder='/home/Desktop/files'
results_folder='/home/results'
for entry in $input_folder/*
do
re=$(samtools view -H $entry | grep -P '^#SQ' | cut -f 3 -d ':' | awk '{sum+=$1} END {print sum}')
echo -e "$(samtools depth $entry | awk '{sum+=$3} END { print $(sum/$re)}')\t/$entry" >> $results_folder/Results.txt
done
The result in variable re is a number, but using re in the second command, print $(sum/$re)}', gives me this error:
awk: cmd. line:1: (FILENAME=- FNR=312843568) fatal: division by zero attempted
I tried not to put $ with the variable but also the same error.
Any help with that please?
Change the awk part to:
awk -v re="$re" '{sum+=$3} END { if(re) print sum/re; else print "oo";}'
You have to use -v to transfer the variable into awk.
And also it's better to check if re is zero.
I used oo to represent Infinity symbol.
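A quick way to try this without samtools is to feed awk some made-up depth rows, as in the sketch below. The row values and re=40 are invented for the demonstration. As far as I can tell, the original error arises because inside the single-quoted awk program re is an unset awk variable, so $re means $0, and the last input line evaluates to 0 as a number.

```shell
# Hypothetical stand-ins for the samtools output: three depth rows and a total length.
re=40
avg=$(printf 'chr1\t1\t10\nchr1\t2\t20\nchr1\t3\t30\n' |
    awk -v re="$re" '{ sum += $3 } END { if (re) print sum / re; else print "oo" }')

# With re=0 the guard prints the placeholder instead of aborting.
guarded=$(printf 'chr1\t1\t10\n' |
    awk -v re=0 '{ sum += $3 } END { if (re) print sum / re; else print "oo" }')

echo "$avg $guarded"
```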
I am not clear why you are sending the output of the echo command to awk. To avoid the error (which tells you that you are dividing by zero), try changing your awk program to the following:
awk -v re="$re" '{sum+=$3} END {if(re){print (sum/re)} else {print "Please check: the value of re seems to be ZERO, which would cause an error in the awk program"}}'
Inlining the bash variable into the awk program will also do the job:
awk '{sum+=$3} END { print(sum/'"${re}"') }'
You'd better also check in bash that ${re} is non-zero before passing it to awk.

How to write a bash script that dumps itself out to stdout (for use as a help file)?

Sometimes I want a bash script that's mostly a help file. There are probably better ways to do things, but sometimes I want to just have a file called "awk_help" that I run, and it dumps my awk notes to the terminal.
How can I do this easily?
Another idea, use #!/bin/cat -- this will literally answer the title of your question since the shebang line will be displayed as well.
Turns out it can be done as pretty much a one-liner, thanks to @CharlesDuffy for the suggestions!
Just put the following at the top of the file, and you're done (the exit stops bash from going on to execute the notes as commands)
cat "$BASH_SOURCE" | grep -v EZREMOVEHEADER; exit
So for my awk_help example, it'd be:
cat "$BASH_SOURCE" | grep -v EZREMOVEHEADER; exit
# Basic form of all awk commands
awk search pattern { program actions }
# advanced awk
awk 'BEGIN {init} search1 {actions} search2 {actions} END { final actions }' file
# awk boolean example for matching "(me OR you) OR (john AND ! doe)"
awk '( /me|you/ ) || (/john/ && ! /doe/ )' /path/to/file
# awk - print # of lines in file
awk 'END {print NR,"coins"}' coins.txt
# Sum up gold ounces in column 2, and find out value at $425/ounce
awk '/gold/ {ounces += $2} END {print "value = $" 425*ounces}' coins.txt
# Print the last column of each line in a file, using a comma (instead of space) as a field separator:
awk -F ',' '{print $NF}' filename
# Sum the values in the first column and pretty-print the values and then the total:
awk '{s+=$1; print $1} END {print "--------"; print s}' filename
# functions available
length($0) > 72, toupper,tolower
# count the # of times the word PASSED shows up in the file /tmp/out
cat /tmp/out | awk 'BEGIN {X=0} /PASSED/{X+=1; print $1 X}'
# awk regex operators
https://www.gnu.org/software/gawk/manual/html_node/Regexp-Operators.html
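A different technique that gives a similar result is a quoted here-document: the notes are passed to cat as data, so the shell never tries to run them. The sketch below writes such a help script to a temporary file and runs it; the file name and the two note lines are placeholders.

```shell
script=$(mktemp)

# Write a tiny help script; the quoted 'EOF' keeps $ and quotes in the notes literal.
cat > "$script" <<'OUTER'
#!/bin/sh
cat <<'EOF'
# awk - print # of lines in file
awk 'END {print NR,"coins"}' coins.txt
EOF
OUTER

out=$(sh "$script")
rm -f "$script"
printf '%s\n' "$out"
```

Unlike the grep approach, this does not need a marker word, at the cost of two extra lines around the notes.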
I found another solution that works on Mac/Linux and works exactly as one would hope.
Just use the following as your "shebang" line, and it'll output everything from line 2 on down:
test.sh
#!/usr/bin/tail -n+2
hi there
how are you
Running this gives you what you'd expect:
$ ./test.sh
hi there
how are you
Another possible solution: just use less as the interpreter, and that way your file will open in a searchable pager
#!/usr/bin/less
and this way you can grep it for something too, e.g.
$ ./test.sh | grep something

How to execute awk command in shell script

I have an awk command that extracts the 16th column from 3rd line in a csv file and prints the first 4 characters.
awk -F"," 'NR==3{print $16}' sample.csv|sed -e 's/^[ \t]*//'|awk '{print substr($0,0,4)}'
This works fine.
But when I execute it from a shell script, I get an error
#!/bin/ksh
YEAR=awk -F"," 'NR==3{print $16}' sample.csv|sed -e 's/^[ \t]*//'|awk '{print substr($0,0,4)}'
Error message:
-F,: not found
Use command substitution to assign the output of a command to a variable, as shown below:
YEAR=$(awk -F"," 'NR==3{print $16}' sample.csv|sed -e 's/^[ \t]*//'|awk '{print substr($0,0,4)}')
You are asking the shell to do:
VAR=value command [arguments...]
which means: launch command, but first add VAR=value to its environment.
(For example, LC_ALL=C grep '[0-9]*' /some/file.txt will grep for a number in file.txt, with the LC_ALL variable set to C just for the duration of the grep call.)
So here you ask the shell to launch the -F"," command (i.e. -F, once the shell has turned "," into ,) with the arguments 'NR==3... and with the variable YEAR set to the value awk for the duration of the command invocation.
Just replace it with:
#!/bin/ksh
YEAR="$(awk -F',' 'NR==3{print $16}' sample.csv|sed -e 's/^[ \t]*//'|awk '{print substr($0,1,4)}')"
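(I didn't try it, but I hope it works for you and your sample.csv file.)
(Note that you use 0 as the start position in substr(). Awk strings are indexed from 1, and implementations differ in how they treat 0: some treat it as 1, while others, gawk for example, count the length from position 0 and so return one character fewer. Use 1 to be safe.)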
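The difference between the two forms can be seen in a small sketch. The inner sh -c command is just a stand-in for any program that reads its environment.

```shell
# VAR=value command: the assignment exists only in the environment of that one command.
unset YEAR
seen=$(YEAR=awk sh -c 'printf "%s" "$YEAR"')
after=${YEAR-unset}    # YEAR was never set in the calling shell

# Command substitution: the command's output becomes the value.
YEAR=$(printf '2018-02-21\n' | awk '{ print substr($0, 1, 4) }')

echo "seen=$seen after=$after YEAR=$YEAR"
```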
From your description, it looks like you want to extract the year from the 16th field, which might contain leading spaces. You can accomplish it by calling AWK once:
YEAR=$(awk -F, 'NR==3{sub(/^[ \t]*/, "", $16); print substr($16,1,4) }' sample.csv)
Better yet, you don't even have to use awk. Since you are already writing shell script, let's do it all in shell script:
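{ read -r line; read -r line; read -r line; } < sample.csv # Get the third line
IFS=,; set -- $line # Break the line into comma-separated fields (IFS must be set before $line is expanded)
IFS=' '; set -- ${16} # Trick to remove leading spaces: field 16 becomes field 1
unset IFS # Restore default word splitting
YEAR=${1:0:4} # Extract the first 4 characters of field 1 (ksh93/bash syntax)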
Do this:
year=$(awk -F, 'NR==3{sub(/^[ \t]+/,"",$16); print substr($16,1,4); exit }' sample.csv)
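To check the one-liner without the real file, it can be run against a throwaway CSV, as sketched here. The temporary file and its contents are invented; only field 16 of line 3 matters.

```shell
csv=$(mktemp)

# Hypothetical sample: field 16 of line 3 holds a date with leading spaces.
{
    echo 'h1,h2,h3,h4,h5,h6,h7,h8,h9,h10,h11,h12,h13,h14,h15,h16'
    echo '1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,  2017-01-01'
    echo '1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,  2018-02-21'
} > "$csv"

year=$(awk -F, 'NR==3 { sub(/^[ \t]+/, "", $16); print substr($16, 1, 4); exit }' "$csv")

rm -f "$csv"
echo "$year"
```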

Resources