Can't format properly with AWK? - bash

Does anyone know how to align the columns properly? I've spent so much time on it but when I make changes the rest of the columns fall out of alignment. It needs to be neatly aligned like the little table I drew below:
________________________________________________________
|Username| UID | GID | Home | Shell |
________________________________________________________
| root | 0 | 0 |/root |/bin/bash |
| posgres| 120 | 125 |/var/lib/postgresql |/bin/bash |
| student| 100 | 1000|/home/student |/bin/bash |
________________________________________________________
#!/bin/bash
echo "_________________________________________________________"
echo -e "| Username\t| UID\t| GID\t| Home\t| Shell\t\t|"
echo "---------------------------------------------------------"
awk 'BEGIN { FS=":" }
{ printf("\033[34m| %s\033[0m\t\t| %s\t| %s\t| %s\t| %s\t|\n", $1, $3, $4, $6, $7)
}' /etc/passwd | sed -n '/\/bin\/bash/p'

It's what our lecturer wanted.
Since this could be an assignment, I will share some hints instead of posting working code.
First of all, awk can do the job of your | sed, so you can save one process. For example: awk '/pattern/{print $1}'
To build the table, you need to find the longest value in each column before you print anything, because that value decides the column width in your printf call. Read the documentation for printf: you can pad %s to a fixed width.
You can read the lines you need and store each column in an array, e.g. col1[1]=foo; col2[1]=bar; col1[2]=foo2; col2[2]=bar2, where [1] would be the line number NR. You can also use a multi-dimensional array or an array of arrays to do that. A quick search will turn up some tutorials.
Once you have everything you need, you can start printing.
good luck.
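The hints above can be sketched roughly as follows. This is only one possible reading of them, not the answerer's own code: the function name passwd_table is made up for illustration, and the sketch assumes a passwd-format file whose shell is the last field. It stores the header as row 0, tracks the widest value per column while reading, and builds each printf format from those widths at the end.

```shell
#!/usr/bin/env bash
# passwd_table is a hypothetical helper; $1 is a passwd-format file.
passwd_table() {
  awk -F: '
    BEGIN {
      # Row 0 holds the header, so it also takes part in the width calculation.
      n = split("Username UID GID Home Shell", head, " ")
      split("1 3 4 6 7", src, " ")        # passwd fields shown in each column
      for (c = 1; c <= n; c++) {
        cell[0, c] = head[c]
        width[c] = length(head[c])
      }
    }
    /\/bin\/bash$/ {                      # the filter that sed used to do
      rows++
      for (c = 1; c <= n; c++) {
        cell[rows, c] = $(src[c] + 0)
        if (length(cell[rows, c]) > width[c]) width[c] = length(cell[rows, c])
      }
    }
    END {
      for (r = 0; r <= rows; r++) {
        line = "|"
        for (c = 1; c <= n; c++)
          line = line sprintf(" %-" width[c] "s |", cell[r, c])   # left-justified padding
        print line
      }
    }
  ' "$1"
}

# Usage: passwd_table /etc/passwd
```

Horizontal rules and colours are left out on purpose; once the widths are known, a rule is just a line of dashes as long as any printed row.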

With bash, awk, ed and column it's not pretty but it does the job.
#!/usr/bin/env bash
mapfile -t passwd < <(
printf '0a\nUsername:2:UID:GID:5:Home:Shell:/bin/bash\n.\n,p\nQ\n' |
ed -s /etc/passwd
)
ed -s <(
printf '%s\n' "${passwd[@]}" |
awk 'BEGIN { FS=":" ; OFS=" | "
} /\/bin\/bash$/ {
print "| " $1, $3, $4, $6, $7 " |"
}' | column -t) < <(
echo '$t$
$s/|/+/g
$s/[^+]/-/g
1t0
1s/|/+/g
1s/[^+]/-/g
3t3
3s/|/+/g
3s/[^+]/-/g
,p
Q
'
)
As far as I remember, just awk, column and GNU datamash can do what the OP is asking, but I can't remember where the link is now.
This works with GNU ed and should also work with the BSD variant of ed.
Also, mapfile is a bash 4+ feature, jfyi.

Related

Printing data vertically in columns using paste

I'm attempting to use paste to list data vertically in two columns.
Pretend I have the below data in a text file
Bob:75:Male
Mary:85:Female
Troy:12:Male
I extract all the names and store it as a variable:
NAMES=$(cat $FILE | awk -F: '{print $1}')
I also do the same for age
AGE=$(cat $FILE | awk -F: '{print $2}')
I now would like to paste them together, however paste requires that you use files with text files. For simplicities sake I'd rather not create an extra file, so how do I input the variable into paste to list the data in rows and columns?
Any help is appreciated, thanks :)
EDIT: Just to clarify I have dumbed down the question, I am doing other things to the data so the solution I'm looking for isn't simply using
awk -F: '{print $1 $2}'
I need to input to paste using variables from an awk statement.
EDIT #2: To answer the comment it should look like
Bob 75
Mary 85
Troy 12
You can use paste with process substitution:
#!/bin/bash
# read value from `$file` in 2 arrays
names=()
age=()
while read -r n a; do
    names+=("$n")
    age+=("$a")
done < <(awk -F: '{print $1, $2}' "$file")
# paste 2 arrays together
paste <(printf "%s\n" "${names[@]}") <(printf "%s\n" "${age[@]}")
Bob 75
Mary 85
Troy 12
paste -d' ' <(echo $NAMES | tr ' ' '\n') <(echo $AGE | tr ' ' '\n')
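If the values are already in variables as in the question, a variant that avoids both the arrays and the tr step might look like the sketch below. It assumes bash (for process substitution) and relies on the fact that a quoted variable keeps the newlines that the command substitution captured. The temporary file and the RESULT variable are only there to make the example self-contained.

```shell
#!/usr/bin/env bash
# Sample data from the question, written to a temporary file.
FILE=$(mktemp)
printf '%s\n' 'Bob:75:Male' 'Mary:85:Female' 'Troy:12:Male' > "$FILE"

NAMES=$(awk -F: '{print $1}' "$FILE")
AGE=$(awk -F: '{print $2}' "$FILE")

# Quoting "$NAMES" and "$AGE" preserves one value per line,
# so each process substitution looks like a file to paste.
RESULT=$(paste -d' ' <(printf '%s\n' "$NAMES") <(printf '%s\n' "$AGE"))
printf '%s\n' "$RESULT"

rm -f "$FILE"
```

Unlike the unquoted echo version, this does not break values that contain spaces.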

awk loop over all fields in one file

This statement gives me the count of unique values in column 1:
awk -F ',' '{print $1}' infile1.csv | sort | uniq -c | sort -nr > outfile1.csv
It does what I expected (gives the count (left) of unique values (right) in the column):
117 5
58 0
18 4
14 3
11 1
9 2
However, now I want to create a loop, so it will go through all columns.
I tried:
for i in {1..10}
do
awk -F ',' '{print $$i}' infile.csv | sort | uniq -c | sort -nr > outfile$i.csv
done
This does not do the job (it does produce a file but with much more data). I think that a variable in a print statement, as I tried with print $$i, is not something that works in general, since I did not come across it so far.
I also tried this:
awk -F ',' '{for(i=1;i<=NF;i++) infile.csv | sort | uniq -c | sort -nr}' > outfile$i.csv
But this does not give any result at all (meaning syntax errors for infile and sort command). I am sure I am using the for statement the wrong way.
Ideally, I would like the code to find the count of unique values for each column and print them all in the same output file. However, I am already very happy with a well functioning loop.
Please let me know if this explanation is not good enough, I will do my best to clarify.
Any time you write a loop in shell just to manipulate text you have the wrong approach. Just do it in one awk command, something like this using GNU awk for 2D arrays and sorted_in (untested since you didn't provide any sample input):
awk -F, '
BEGIN { PROCINFO["sorted_in"] = "@val_num_desc" }
{ for (i=1; i<=NF; i++) cnt[i][$i]++ }
END {
for (i=1; i<=NF; i++)
for (val in cnt[i])
print val, cnt[i][val] > ("outfile" i ".csv")
}
' infile.csv
No need for half a dozen different commands, pipes, etc.
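Where GNU awk is not available, the same one-pass idea can be sketched with the classic SUBSEP-joined keys that every POSIX awk supports. This is an illustrative variant, not the answer's code: the function name count_by_column is hypothetical, everything goes to one stream as "column value count" (which is what the question ideally wanted), and a single sort does the ordering. It assumes values are non-empty and contain no spaces.

```shell
#!/usr/bin/env bash
# count_by_column is a hypothetical helper; $1 is a comma-separated file.
count_by_column() {
  awk -F, '
    # Count every (column number, value) pair in a single pass.
    { for (i = 1; i <= NF; i++) cnt[i SUBSEP $i]++ }
    END {
      for (key in cnt) {
        split(key, part, SUBSEP)          # part[1] = column, part[2] = value
        print part[1], part[2], cnt[key]
      }
    }
  ' "$1" | sort -k1,1n -k3,3nr            # by column, then by count descending
}

# Usage: count_by_column infile.csv > outfile.csv
```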
You want to loop through the columns and perform the same command on each one of them. So what you are doing is fine: pass the column number to awk. However, you need to pass the value differently, so that it is an awk variable:
for i in {1..10}
do
awk -F ',' -v col=$i '{print $col}' infile.csv | sort | uniq -c | sort -nr > outfile$i.csv
^^^^^^^^^^^^^^^^^^^^^^^^
done

BASH: Cannot awk with a variable in a while loop

I have a problem when trying to run awk on input from read inside a while loop.
This is my code:
#!/bin/bash
read -p "Please enter the Array LUN ID (ALU) you wish to query, separated by a comma (e.g. 2036,2037,2045): " ARRAY_LUNS
LUN_NUMBER=`echo $ARRAY_LUNS | awk -F "," '{ for (i=1; i<NF; i++) printf $i"\n" ; print $NF }' | wc -w`
echo "you entered $LUN_NUMBER LUN's"
s=0
while [ $s -lt $LUN_NUMBER ];
do
s=$[$s+1]
LUN_ID=`echo $ARRAY_LUNS | awk -F, '{print $'$s'}' | awk -v n1="$s" 'NR==n1'`
echo "NR $s :"
echo "awk -v n1="$s" 'NR==n1'$LUN_ID"
done
No matter which awk options I try, I can't get it to display more than the first entry before the comma. It looks to me as if the loop has trouble counting the variable s upwards. But on the other hand, the code line:
LUN_ID=`echo $ARRAY_LUNS | awk -F, '{print $'$s'}' | awk -v n1="$s" 'NR==n1'`
works just great! Any idea on how to solve this. Another solution to my READ input would be just fine as well.
#!/bin/bash
typeset -a ARRAY_LUNS
IFS=, read -r -p "Please enter the Array LUN ID (ALU) you wish to query, separated by a comma (e.g. 2036,2037,2045): " -a ARRAY_LUNS
LUN_NUMBER="${#ARRAY_LUNS[@]}"
echo "you entered $LUN_NUMBER LUNs"
for((s=0;s<LUN_NUMBER;s++))
do
echo "LUN id $s: ${ARRAY_LUNS[s]}"
done
Why does your awk code not work?
The problem is not the counter. It is the last awk command in the pipe, i.e.
awk -v n1="$s" 'NR==n1'.
This awk code tries to print the first line when s is 1, the second line when s is 2, the third line when s is 3, and so on... But how many lines are printed by echo $ARRAY_LUNS? Just ONE... there is no second line, no third line... just ONE line and just ONE line is printed.
That line contains all LUN_IDs in ONE LINE, i.e, one LUN_ID next to another LUN_ID, like this way:
34 45 21 223
NOT this way
34
45
21
223
Those LUN_IDs are fields printable by awk using $1, $2, $3, ... and so on.
Therefore, if you want your code to run fine, just remove that last command in the pipe:
LUN_ID=$(echo "$ARRAY_LUNS" | awk -F, '{print $'$s'}')
Please, for any further question, firstly read this awk guide
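For comparison, here is a minimal sketch of the same loop with the field number handed to awk through -v instead of being spliced into the program text. The function name list_luns is made up for the example, and it takes the comma-separated string as an argument rather than prompting with read.

```shell
#!/usr/bin/env bash
# list_luns is a hypothetical helper; $1 is a comma-separated list of LUN IDs.
list_luns() {
  local luns=$1 count s lun_id
  # NF is the number of comma-separated fields on the single input line.
  count=$(printf '%s\n' "$luns" | awk -F, '{print NF}')
  for ((s = 1; s <= count; s++)); do
    # n is an awk variable, so $n selects field number s.
    lun_id=$(printf '%s\n' "$luns" | awk -F, -v n="$s" '{print $n}')
    echo "NR $s: $lun_id"
  done
}

list_luns "2036,2037,2045"
```

There is still only one input line, so no NR test is involved; the counter picks a field, not a line.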

bash awk first 1st column and 3rd column with everything after

I am working on the following bash script:
# contents of dbfake file
1 100% file 1
2 99% file name 2
3 100% file name 3
#!/bin/bash
# cat out data
cat dbfake |
# select lines containing 100%
grep 100% |
# print the first and third columns
awk '{print $1, $3}' |
# echo out id and file name and log
xargs -rI % sh -c '{ echo %; echo "%" >> "fake.log"; }'
exit 0
This script works ok, but how do I print everything in column $3 and then all columns after?
You can use cut instead of awk in this case:
cut -f1,3- -d ' '
awk '{ $2 = ""; print }' # remove col 2
If you don't mind a little whitespace:
awk '{ $2="" }1'
But UUOC and grep:
< dbfake awk '/100%/ { $2="" }1' | ...
If you'd like to trim that whitespace:
< dbfake awk '/100%/ { $2=""; sub(FS "+", FS) }1' | ...
For fun, here's another way using GNU sed:
< dbfake sed -r '/100%/s/^(\S+)\s+\S+(.*)/\1\2/' | ...
All you need is:
awk 'sub(/.*100% /,"")' dbfake | tee "fake.log"
Others responded in various ways, but I want to point out that using xargs to multiplex output is a rather bad idea.
Instead, why don't you:
awk '$2=="100%" { sub("100%[[:space:]]*",""); print; print >>"fake.log"}' dbfake
That's all. You don't need grep, you don't need multiple pipes, and definitely you don't need to fork shell for every line you're outputting.
You could do awk ...; print}' | tee fake.log, but there is not much point in forking tee, if awk can handle it as well.
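To see the pieces together on the sample data, the following sketch keeps the id, drops the percentage column, and writes each selected line to both stdout and a log from inside awk. It is an illustration under assumptions: the temporary files stand in for dbfake and fake.log, and log_file is a name chosen here (log itself is a built-in awk function, so it cannot be used as a variable).

```shell
#!/usr/bin/env bash
# Temporary stand-ins for dbfake and fake.log.
dbfake=$(mktemp)
log=$(mktemp)
printf '%s\n' '1 100% file 1' '2 99% file name 2' '3 100% file name 3' > "$dbfake"

out=$(awk -v log_file="$log" '
  $2 == "100%" {
    $2 = ""               # blank column 2; awk rebuilds $0 with single spaces
    sub(/  /, " ")        # collapse the double space left where column 2 was
    print                 # to stdout
    print >> log_file     # and to the log, without forking tee
  }' "$dbfake")
printf '%s\n' "$out"
```

Note that assigning to a field rebuilds the record, so any runs of spaces inside the file name are squeezed to one space.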

how to print IP_ADDRESS:port # before sed command

Is there an easy way to print the IP_Address:port#? As soon as it gets to the sed command, the :port# is stripped.
input file example
Apr 6 14:20:41 TCP 178.255.83.1:80 in
preferred output like this
Apr 6 14:20:41 TCP 178.255.83.1:80 in United Kingdom
egrep -w 'TCP|UDP' $Denied_IPs |
sed 's/:[^:]* in/ in/; s/:[^:]* out/ out/' |
awk '{cmd="echo "$5" | code | fgrep 'Country:' | cut -c 16-43";
cmd | getline rslt;
close(cmd);
print $1" "$2" "$3" "$4" "$5" "$6" "rslt}' >> "$IP2COUNTRY"
The sed command is stripping the port explicitly. Given that that is all the sed command is doing, simply remove it from the expression.
That's a rather suboptimal implementation, by the way. Especially after we remove the sed, the egrep can be folded into the awk:
awk '/ (TCP|UDP) / {
split($5, addr, /:/);
cmd = "echo " addr[1] " | code | fgrep Country: | cut -c 16-43";
cmd | getline rslt;
close(cmd);
print $1, $2, $3, $4, $5, $6, rslt
}' < "$Denied_IPs" >> "$IP2COUNTRY"
and I can't help but think that the invocation of code within awk can be optimized a bit.
(I also removed the single quotes around 'Country:', which were doing nothing useful — and if they had been needed, they would in fact have broken the script because the whole thing is already wrapped in single quotes.)
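The key step is that split() copies the pieces of $5 into an array and leaves $5 itself alone, so the port survives in the printed line. A minimal sketch on the sample line, with the external code lookup left out (the function name split_endpoint is hypothetical):

```shell
#!/usr/bin/env bash
# split_endpoint is a hypothetical helper; it reads log lines on stdin.
split_endpoint() {
  awk '/ (TCP|UDP) / {
    split($5, addr, ":")      # addr[1] = IP address, addr[2] = port
    print $5, addr[1], addr[2]
  }'
}

printf '%s\n' 'Apr 6 14:20:41 TCP 178.255.83.1:80 in' | split_endpoint
```

Only addr[1] would be passed to the lookup command, while $5 is printed unchanged.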
