Find the first or third column - shell

The following command is working as expected. What I need to find is the thread id that is available in the first or third column.
# tail -1000 general.log | grep Connect | egrep -v "(abc|slave_user)"
2856057 Connect root#localhost on
111116 5:14:01 2856094 Connect root#localhost on
If the line starts with the date, select the third column i.e. 2856094 or the first column i.e. 2856057
Expected output:
2856057
2856094

Another way to look at it is that you always take the fourth column when counting from the right:
awk '{ print $(NF-3) }'
Otherwise, if the date is really the only reliable indicator, try this:
awk -v Date=$(date "+%y%m%d") '$1 == Date { print $3; next } { print $1 }'

If your data really is that regular (i.e. all the columns are fixed width), then you could use cut:
tail -1000 general.log | grep Connect | egrep -v "(abc|slave_user)" | cut -c17-23

This might work for you:
tail -1000 general.log | sed -e '/abc\|slave_user/d;/ Connect.*/!d;s///;s/.* //'

Use the awk inbuilt variable NF to capture the number of fields. If they equal to 6 then print 3 column else print 1st column.
awk 'NF==6{ print $3;next } { print $1 }' INPUT_FILE

Without knowing the format of the file, maybe try:
$ tail -1000 general.log | grep Connect | egrep -v "(abc|slave_user)" | awk '{if ($3 == "root#localhost"){print $1;}else{print $3}}'
Or maybe this would work which is simpler:
$ awk '/Connect/ {if ($3 == "root#localhost"){print $1;}else{print $3}}' general.log
I tried. If I'm wrong, or there is a better way, I to will learn it in time. :)
Maybe this using int() ??????
$ awk '/Connect/ {if (!int($3)){print $1;}else{print $3}}' general.log

Related

Cut and sort delimited dates from stdout via pipe

I am trying to split some strings from stdout to get the dates from it, but I have two cases
full.20201004T033103Z.vol93.difftar.gz
full.20201007T033103Z.vol94.difftar.gz
Which should produce: 20201007T033103Z which is the nearest date to now (newest)
Or:
inc.20200830T033103Z.to.20200906T033103Z.vol1.difftar.gz
inc.20200929T033103Z.to.20200908T033103Z.vol10.difftar.gz
Should get the second date (after .to.) not the first one, and print only the newest date: 20200908T033103Z
What I tried:
cat dates_file | awk -F '.to.' 'NF > 1 {print $2}' | cut -d\. -f1 | sort -r -t- -k3.1,3.4 -k2,2 | head -1
This only works for the second case and not covering the first, also I am not sure about the date sorting logic.
Here is a sample data
full.20201004T033103Z.vol93.difftar.gz
full.20201004T033103Z.vol94.difftar.gz
full.20201004T033103Z.vol95.difftar.gz
full.20201004T033103Z.vol96.difftar.gz
full.20201004T033103Z.vol97.difftar.gz
full.20201004T033103Z.vol98.difftar.gz
full.20201004T033103Z.vol99.difftar.gz
inc.20200830T033103Z.to.20200906T033103Z.manifest
inc.20200830T033103Z.to.20200906T033103Z.vol1.difftar.gz
inc.20200830T033103Z.to.20200906T033103Z.vol10.difftar.gz
inc.20200830T033103Z.to.20200906T033103Z.vol11.difftar.gz
inc.20200830T033103Z.to.20200906T033103Z.vol12.difftar.gz
inc.20200830T033103Z.to.20200906T033103Z.vol13.difftar.gz
inc.20200830T033103Z.to.20200906T033103Z.vol14.difftar.gz
inc.20200830T033103Z.to.20200906T033103Z.vol15.difftar.gz
inc.20200830T033103Z.to.20200906T033103Z.vol16.difftar.gz
inc.20200830T033103Z.to.20200906T033103Z.vol17.difftar.gz
To get most recent data from your sample data you can use this awk:
awk '{
sub(/^(.*\.to|[^.]+)\./, "")
gsub(/\..+$|[TZ]/, "")
}
$0 > max {
max = $0
}
END {
print max
}' file
20201004033103

awk or shell command to count occurence of value in 1st column based on values in 4th column

I have a large file with records like below :
jon,1,2,apple
jon,1,2,oranges
jon,1,2,pineaaple
fred,1,2,apple
tom,1,2,apple
tom,1,2,oranges
mary,1,2,apple
I want to find the no of person (names in col 1) have apple and oranges both. And the command should take as less memory as possible and should be fast. Any help appreciated!
Output :
awk/sed file => 2 (jon and tom)
Using awk is pretty easy:
awk -F, \
'$4 == "apple" { apple[$1]++ }
$4 == "oranges" { orange[$1]++ }
END { for (name in apple) if (orange[name]) print name }' data
It produces the required output on the sample data file:
jon
tom
Yes, you could squish all the code onto a single line, and shorten the names, and otherwise obfuscate the code.
Another way to do this avoids the END block:
awk -F, \
'$4 == "apple" { if (apple[$1]++ == 0 && orange[$1]) print $1 }
$4 == "oranges" { if (orange[$1]++ == 0 && apple[$1]) print $1 }' data
When it encounters an apple entry for the first time for a given name, it checks to see if the name also (already) has an entry for oranges and prints it if it has; likewise and symmetrically, if it encounters an orange entry for the first time for a given name, it checks to see if the name also has an entry for apple and prints it if it has.
As noted by Sundeep in a comment, it could use in:
awk -F, \
'$4 == "apple" { if (apple[$1]++ == 0 && $1 in orange) print $1 }
$4 == "oranges" { if (orange[$1]++ == 0 && $1 in apple) print $1 }' data
The first answer could also use in in the END loop.
Note that all these solutions could be embedded in a script that would accept data from standard input (a pipe or a redirected file) — they have no need to read the input file twice. You'd replace data with "$#" to process file names if they're given, or standard input if no file names are specified. This flexibility is worth preserving when possible.
With awk
$ awk -F, 'NR==FNR{if($NF=="apple") a[$1]; next}
$NF=="oranges" && ($1 in a){print $1}' ip.txt ip.txt
jon
tom
This processes the input twice
In first pass, add key to an array if last field is apple (-F, would set , as input field separator)
In second pass, check if last field is oranges and if first field is a key of array a
To print only number of matches:
$ awk -F, 'NR==FNR{if($NF=="apple") a[$1]; next}
$NF=="oranges" && ($1 in a){c++} END{print c}' ip.txt ip.txt
2
Further reading: idiomatic awk for details on two file processing and awk idioms
I did a work around and used only grep and comm commands.
grep "apple" file | cut -d"," -f1 | sort > file1
grep "orange" file | cut -d"," -f1 | sort > file2
comm -12 file1 file2 > names.having.both.apple&orange
comm -12 shows only the common names between the 2 files.
Solution from Jonathan also worked.
For the input:
jon,1,2,apple
jon,1,2,oranges
jon,1,2,pineaaple
fred,1,2,apple
tom,1,2,apple
tom,1,2,oranges
mary,1,2,apple
the command:
sed -n "/apple\|oranges/p" inputfile | cut -d"," -f1 | uniq -d
will output a list of people with both apples and oranges:
jon
tom
Edit after comment: For an for input file where lines are not ordered by 1st column and where each person can have two or more repeated fruits, like:
jon,1,2,apple
fred,1,2,apple
fred,1,2,apple
jon,1,2,oranges
jon,1,2,pineaaple
jon,1,2,oranges
tom,1,2,apple
mary,1,2,apple
tom,1,2,oranges
This command will work:
sed -n "/\(apple\|oranges\)$/ s/,.*,/,/p" inputfile | sort -u | cut -d, -f1 | uniq -d

How to count duplicates in Bash Shell

Hello guys I want to count how many duplicates there are in a column of a file and put the number next to them. I use awk and sort like this
awk -F '|' '{print $2}' FILE | sort | uniq -c
but the count (from the uniq -c) appears at the left side of the duplicates.
Is there any way to put the count on the right side instead of the left, using my code?
Thanks for your time!
Though I believe you shouls show us your Input_file so that we could create a single command or so for this requirement, since you have't shown Input_file so trying to solve it with your command itself.
awk -F '|' '{print $2}' FILE | sort | uniq -c | awk '{for(i=2;i<=NF;i++){printf("%s ",$i)};printf("%s%s",$1,RS)}'
You can just use awk to reverse the output like below:
awk -F '|' '{print $2}' FILE | sort | uniq -c | awk {'print $2" "$1'}
awk -F '|' '{print $2}' FILE | sort | uniq -c| awk '{a=$1; $1=""; gsub(/^ /,"",$0);print $0,a}'
You can use awk to calculate the amount of duplicates, so your command can be simplified as followed,
awk -F '|' '{a[$2]++}END{for(i in a) print i,a[i]}' FILE | sort
Check this command:
awk -F '|' '{c[$2]++} END{for (i in c) print i, c[i]}' FILE | sort
Use awk to do the counting is enough. If you do not want to sort by browser, remove the pipe and sort.

Bash string replace on command result

I have a simple bash script which is getting the load average using uptime and awk, for example
LOAD_5M=$(uptime | awk -F'load averages:' '{ print $2}' | awk '{print $2}')
However this includes a ',' at the end of the load average
e.g.
0.51,
So I have then replaced the comma with a string replace like so:
LOAD_5M=${LOAD_5M/,/}
I'm not an awk or bash wizzkid so while this gives me the result I want, I am wondering if there is a succinct way of writing this, either by:
Using awk to get the load average without the comma, or
Stripping the comma in a single line
You can do that in same awk command:
uptime | awk -F 'load averages?: *' '{split($2, a, ",? "); print a[2]}'
1.32
The 5 min load is available in /proc/loadavg. You can simply use cut:
cut -d' ' -f2 /proc/loadavg
With awk you can issue:
awk '{print $2}' /proc/loadavg
If you are not working on Linux the file /proc/loadavg will not being present. In this case I would suggest to use sed, like this:
uptime | sed 's/.*, \(.*\),.*,.*/\1/'
uptime | awk -F'load average:' '{ print $2}' | awk -F, '{print $2}'
0.38
(My uptime output has 'load average:' singular)
The load average numbers are always the last 3 fields in the 'uptime' output so:
IFS=' ,' read -a uptime_fields <<<"$(uptime)"
LOAD_5M=${uptime_fields[#]: -2:1}

bash awk first 1st column and 3rd column with everything after

I am working on the following bash script:
# contents of dbfake file
1 100% file 1
2 99% file name 2
3 100% file name 3
#!/bin/bash
# cat out data
cat dbfake |
# select lines containing 100%
grep 100% |
# print the first and third columns
awk '{print $1, $3}' |
# echo out id and file name and log
xargs -rI % sh -c '{ echo %; echo "%" >> "fake.log"; }'
exit 0
This script works ok, but how do I print everything in column $3 and then all columns after?
You can use cut instead of awk in this case:
cut -f1,3- -d ' '
awk '{ $2 = ""; print }' # remove col 2
If you don't mind a little whitespace:
awk '{ $2="" }1'
But UUOC and grep:
< dbfake awk '/100%/ { $2="" }1' | ...
If you'd like to trim that whitespace:
< dbfake awk '/100%/ { $2=""; sub(FS "+", FS) }1' | ...
For fun, here's another way using GNU sed:
< dbfake sed -r '/100%/s/^(\S+)\s+\S+(.*)/\1\2/' | ...
All you need is:
awk 'sub(/.*100% /,"")' dbfake | tee "fake.log"
Others responded in various ways, but I want to point that using xargs to multiplex output is rather bad idea.
Instead, why don't you:
awk '$2=="100%" { sub("100%[[:space:]]*",""); print; print >>"fake.log"}' dbfake
That's all. You don't need grep, you don't need multiple pipes, and definitely you don't need to fork shell for every line you're outputting.
You could do awk ...; print}' | tee fake.log, but there is not much point in forking tee, if awk can handle it as well.

Resources