I have an ASCII file like:
color green
color black
color yellow
os Linux
os android
os windows
os mac
drink coffee
drink water
number 1
number 0
number 100
I want to make a table something like:
color   os       drink   number
-----------------------------------
green   Linux    coffee  1
black   android  water   0
yellow  windows          100
        mac
I wrote this code and it does what I want.
Are there any better solutions?
#!/bin/bash
inpfile=$1
keys=$(cat $inpfile | cut -d" " -f 1 | uniq)
for key in $keys
do
    values=$(cat $inpfile | grep $key | cut -d" " -f 2)
    result=$(echo "$key ----- $values")
    echo $result | datamash -W transpose > /tmp/$key.table
done
paste -d "\t" /tmp/*.table
rm /tmp/*.table
The shell script in your question would fail given various input values (substrings, regexp metachars, spaces, etc.) and/or environment settings and/or even the contents of the directory you run it from. Copy/paste it into http://shellcheck.net and it'll tell you about some of the issues. It'll also be extremely slow.
Here's how to do it robustly, efficiently, and portably using any awk in any shell on every Unix box:
$ cat tst.awk
BEGIN { OFS="\t" }
!($1 in tag2colNr) {
    tag2colNr[$1] = ++numCols
    rowNr = ++numVals[numCols]
    vals[rowNr,numCols] = $1
}
{
    colNr = tag2colNr[$1]
    rowNr = ++numVals[colNr]
    vals[rowNr,colNr] = $2
    numRows = (numVals[colNr]>numRows ? numVals[colNr] : numRows)
}
END {
    for (rowNr=1; rowNr<=numRows; rowNr++) {
        for (colNr=1; colNr<=numCols; colNr++) {
            printf "%s%s", vals[rowNr,colNr], (colNr<numCols ? OFS : ORS)
        }
    }
}
$ awk -f tst.awk file
color   os      drink   number
green   Linux   coffee  1
black   android water   0
yellow  windows         100
        mac
There are various ways to add the line of dashes under the tags (header) line and/or change the spacing between fields if you feel there's some value in it, but since you said "I want to make a table something like..." I assume the above is like enough.
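For instance, one way to add the dashed line (a sketch, not the only way) is to pipe the table through a small awk filter that prints a row of dashes mirroring the header's shape after the first line:

```shell
# Sketch: add a line of dashes under the header of a tab-separated table.
# The printf here stands in for the output of tst.awk above.
printf 'color\tos\tdrink\tnumber\ngreen\tLinux\tcoffee\t1\n' |
awk 'NR==1 { print; gsub(/[^\t]/,"-"); print; next } 1'
```

The `gsub(/[^\t]/,"-")` replaces every non-tab character of the header with a dash, so the dashed row keeps the same column structure.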
Another possibility is using tput:
#!/bin/bash
set -euo pipefail
function handleFile() {
    local data_file="${1}"
    local x=0
    local y=0
    local column_width=20

    # we are using tput below to place the cursor,
    # so clear the screen for clean output.
    clear

    while read -r header ; do
        # each header is a new column so reset y coordinate to 0
        y=0
        # place header with underline
        printf "$(tput cup "${y}" "${x}")$(tput smul)%s$(tput sgr0)" "${header}"
        while read -r row ; do
            # move down 1 row by incrementing y coordinate value
            ((y+=1))
            # place the current row value without underline
            printf "$(tput cup "${y}" "${x}")%s$(tput sgr0)" "${row}"
        # get row values for each header using grep/awk
        done < <(grep "${header}" "${data_file}" | awk '{ print $2 }')
        # move to next column by incrementing x coordinate by desired column width
        ((x+=column_width))
    # use awk/sort to get unique header values
    done < <(awk '{ print $1 }' "${data_file}" | sort -u)

    # move cursor down 2 lines and then remove all cursor settings
    ((y+=2)) && tput cup "${y}" 0 && tput sgr0
}
handleFile "${1}"
Sample usage is:
./script.sh data.txt
color               drink               number              os
green               coffee              1                   Linux
black               water               0                   android
yellow                                  100                 windows
                                                            mac
---- my text file, in which I have to search for the keywords [name of the file: test] <cat -Evt file>
centos is my bro$
red hat is my course$
ubuntu is my OS$
fqdn is stupid $
$
$
$
tom outsmart jerry$
red hat is my boy$
jerry is samall
------ the keyword file is [word.txt] <cat -Evt file>
red hat$
we$
hello$
bye$
Compensation
----- my code
while read "p"; do
paste -d',' <(echo -n "$p" ) <(echo "searchall") <( grep -i "$p" test | wc -l) <(grep -i -A 1 -B 1 "$p" test )
done <word.txt
---- my expectation: output should be
keyword,searchall,frequency,line above it
line it finds the keyword in
line below it
red hat,searchall,2,centos is my bro
red hat is my course
ubuntu is my OS
red hat,searchall,2,tom outsmart jerry
red hat is my boy
jerry is samall
---- but this is the output coming from my code
red hat,searchall,2,centos is my bro
,,,red hat is my course
,,,ubuntu is my OS
,,,--
,,,tom outsmart jerry
,,,red hat is my boy
,,,jerry is samall
---- please give me a suggestion and point me in the right direction to get the desired output.
---- I am trying to grep the keywords from the file and print them.
Here two records should be created, as the keyword (red hat) appears two times.
---- how can I loop through each occurrence of the keyword?
This sounds very much like a homework assignment.
cf. BashFAQ for better reads; I'm keeping this simple to focus on what you asked for.
Rewritten for more precise formatting -
while read key                              # read each search key
do  cnt=$(grep "$key" test|wc -l)           # count the hits
    pad="$key,searchall,$cnt,"              # build the "header" fields
    while read line                         # read the input from grep
    do  if [[ "$line" =~ ^-- ]]             # treat hits separately
        then pad="$key,searchall,$cnt,"     # reset the "header"
             echo                           # add the blank line
             continue                       # skip to next line of data
        fi
        echo "$pad$line"                    # echo "header" and data
        pad="${pad//?/ }"                   # convert header to spacing
    done < <( grep -B1 -A1 "$key" test )    # pull hits for this key
    echo                                    # add blank lines between
done < word.txt                             # set stdin for the outer read
$: cat word.txt
course
red hat
$: ./tst
course,searchall,1,centos is my bro
                   red hat is my course
                   ubuntu is my OS

red hat,searchall,2,centos is my bro
                    red hat is my course
                    ubuntu is my OS

red hat,searchall,2,tom outsmart jerry
                    red hat is my boy
                    jerry is samall
This will produce the expected output based on one interpretation of your requirements and should be easy to modify if I've made any wrong guesses about what you want to do:
$ cat tst.awk
BEGIN {
    RS = ""
    FS = "\n"
}
{ gsub(/^[[:space:]]+|[[:space:]]+$/,"") }
NR == FNR {
    words[$0]
    next
}
{
    for (word in words) {
        for (i=1; i<=NF; i++) {
            if ($i ~ word) {
                map[word,++cnt[word]] = (i>1 ? $(i-1) : "") FS $i FS $(i+1)
            }
        }
    }
}
END {
    for (word in words) {
        for (i=1; i<=cnt[word]; i++) {
            beg = sprintf("%s,searchall,%d,", word, cnt[word])
            split(map[word,i],lines)
            for (j=1; j in lines; j++) {
                print beg lines[j]
                beg = sprintf("%*s",length(beg),"")
            }
            print ""
        }
    }
}
$ awk -f tst.awk words file
red hat,searchall,2,centos is my bro
                    red hat is my course
                    ubuntu is my OS

red hat,searchall,2,tom outsmart jerry
                    red hat is my boy
                    jerry is samall
I assumed your real input doesn't start with a bunch of blanks as in your posted example - if it does that's easy to accommodate.
I have data in a csv file. I wrote a script that cats this file and uses column -s, -t to nicely tabularize it into nice columns:
Foosballs  Barbells  Bazketballs
22         39        14
86         94        37
17         44        28
However, I'd like to display the header row in bold. I can do that by writing the format codes directly to the file.
bold=$(tput bold)
reset=$(tput sgr0)
echo "${bold}Foosballs,Barbells,Bazketballs${reset}" > /path/to/file
This works fine with cat; the format codes are displayed correctly when I cat the file. But they screw up column -t: any colored/bolded row is no longer aligned with the other rows.
Foosballs  Barbells  Bazketballs
22             39        14
86             94        37
17             44        28
Is there some way to get column -t to ignore color codes when lining up data into columns? (Or is there a better way to display csv data in columns?)
UPDATE:
Applying column first and the format codes second will work, as some answers point out. But in many cases I want to apply different formats/colors to individual values in the row, not to the entire row. Here's a simple example:
echo "${underline}foo${reset} ${underline}bar${reset}"
In general, I might want to use arbitrary formatting logic that's difficult or impossible to apply post-hoc (i.e., after I've already printed the line and called column -t on it). Formatting after tabularizing (as in Charles Duffy's answer) is a great start but may not always work for me (at least, conveniently).
I could always write a utility to do this format-code-transparent tabularization myself, but then I'd have to bring that with me wherever I work. I don't want to have to know column widths in advance; I need something like column -t I can throw on the end of a pipe with arbitrary delimited data. Basically, I need a clever one-liner or a third-party util that's readily available via Homebrew or other package managers.
To sum up: For the bounty, I'm looking for a simple, (reasonably) portable method to tabularize previously-formatted data.
One mechanism to enforce alignment and inject color codes is to use printf:
printf '%s%-20s %-20s %-20s%s\n' "$bold" "Foosballs" "Barbells" "Bazketballs" "$reset"
Note that we're using %s placeholders for the color codes, and strings like %-20s (20 characters, left-aligned) for the other fields. This does mean that your code needs to be responsible for knowing the desired length for each column.
If you don't want to do that, you can postprocess:
generate_data() {
  echo "Foosballs,Barbells,Bazketballs"
  echo 22,39,14
  echo 86,94,28
  echo 17,44,28
}

bold=$(tput bold)
reset=$(tput sgr0)

generate_data | column -s, -t | {
  IFS= read -r header                     # read first line
  printf '%s\n' "${bold}$header${reset}"  # write first line in bold
  cat                                     # pass rest of stream through unmodified
}
Or, to color just one column:
generate_data() { printf '%s\n' "Foosballs,Barbells,Bazketballs" 22,39,14 86,94,28 17,44,28; }
color_column() {
  gawk -v column_nr="$1" -v color_start="$2" -v color_end="$3" '
    BEGIN { FPAT = "([[:space:]]*[^[:space:]]+)"; }
    { $column_nr = color_start $column_nr color_end; print $0 }
  '
}
generate_data | column -s, -t | color_column 2 "$(tput bold)" "$(tput sgr0)"
For this test file:
$ cat file
Foosballs,Barbells,Bazketballs
22,39,14
86,94,28
The simple way:
d='\e[0m' #default env
r='\e[31m' #red color
printf "$r"; column -s, -t file; printf "$d"
More complicated with different color for each column:
s=',' #delimiter
d='\\e[0m' #default env
r='\\e[31m' #red color
g='\\e[32m' #green color
b='\\e[34m' #blue color
echo -e "$(
  awk -F $s \
      -v s="$s" \
      -v d="$d" \
      -v r="$r" \
      -v g="$g" \
      -v b="$b" \
      '{ print r $1 s g $2 s b $3 d }' file | column -s$s -t
)"
And to make header bold, just add this code \e[1m to the echo command like this:
...
B='\e[1m'
echo -e "$B$(
awk -F "$s" \
...
)"
Consider a plain text file containing page-breaking ASCII control character "Form Feed" ($'\f'):
alpha\n
beta\n
gamma\n\f
one\n
two\n
three\n
four\n
five\n\f
earth\n
wind\n
fire\n
water\n\f
Note that each page has a random number of lines.
I need a bash routine that returns the page number of a given line number in a text file containing the page-breaking ASCII control character.
After a long time researching the solution I finally came across this piece of code:
function get_page_from_line
{
    local nline="$1"
    local input_file="$2"
    local npag=0
    local ln=0
    local total=0

    while IFS= read -d $'\f' -r page; do
        npag=$(( ++npag ))
        ln=$(echo -n "$page" | wc -l)
        total=$(( total + ln ))
        if [ $total -ge $nline ]; then
            echo "${npag}"
            return
        fi
    done < "$input_file"
    echo "0"
    return
}
But, unfortunately, this solution proved to be very slow in some cases.
Any better solution?
Thanks!
The idea to use read -d $'\f' and then to count the lines is good.
This version might appear inelegant: if nline is less than or equal to the number of lines in the file, then the file is read twice.
Give it a try, because it is super fast:
function get_page_from_line ()
{
    local nline="${1}"
    local input_file="${2}"

    if [[ $(wc -l "${input_file}" | awk '{print $1}') -lt nline ]] ; then
        printf "0\n"
    else
        printf "%d\n" $(( $(head -n ${nline} "${input_file}" | grep -c "^"$'\f') + 1 ))
    fi
}
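As a quick sanity check, here is a hypothetical demo against a sample file that mirrors the pages from the question (the function is repeated so the snippet is self-contained, and the file name is made up):

```shell
# The function from above, repeated so this demo runs standalone.
function get_page_from_line ()
{
    local nline="${1}"
    local input_file="${2}"

    if [[ $(wc -l "${input_file}" | awk '{print $1}') -lt nline ]] ; then
        printf "0\n"
    else
        printf "%d\n" $(( $(head -n ${nline} "${input_file}" | grep -c "^"$'\f') + 1 ))
    fi
}

# Build a 3-page sample file matching the question's layout.
printf 'alpha\nbeta\ngamma\n\fone\ntwo\nthree\nfour\nfive\n\fearth\nwind\nfire\nwater\n' > /tmp/pages.$$
get_page_from_line 4 /tmp/pages.$$    # line 4 ("one") is on page 2
get_page_from_line 13 /tmp/pages.$$   # past the last line, prints 0
rm -f /tmp/pages.$$
```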
The performance of awk is better than that of the bash version above; awk was created for exactly this kind of text processing.
Give this tested version a try:
function get_page_from_line ()
{
    awk -v nline="${1}" '
        BEGIN {
            npag=1;
        }
        {
            if (index($0,"\f")>0) {
                npag++;
            }
            if (NR==nline) {
                print npag;
                linefound=1;
                exit;
            }
        }
        END {
            if (!linefound) {
                print 0;
            }
        }' "${2}"
}
When \f is encountered, the page number is increased.
NR is the current line number.
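A quick usage sketch against the sample pages from the question (the function is repeated here so the example is self-contained; the temp file name is made up):

```shell
# The awk-based function from above, repeated for a standalone demo.
function get_page_from_line ()
{
    awk -v nline="${1}" '
        BEGIN { npag=1 }
        {
            if (index($0,"\f")>0) { npag++ }
            if (NR==nline) { print npag; linefound=1; exit }
        }
        END { if (!linefound) { print 0 } }' "${2}"
}

# 3 pages: the \f sits at the start of the first line of each new page.
printf 'alpha\nbeta\ngamma\n\fone\ntwo\nthree\nfour\nfive\n\fearth\nwind\nfire\nwater\n' > /tmp/sample.$$
get_page_from_line 3 /tmp/sample.$$    # "gamma" is still on page 1
get_page_from_line 9 /tmp/sample.$$    # "earth" starts page 3
rm -f /tmp/sample.$$
```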
----
For the record, here is another bash version.
This version uses only built-in commands to count the lines in the current page.
The speedtest.sh that you provided in the comments showed it to be a little ahead (approx. 20 sec), which makes it roughly equivalent to your version:
function get_page_from_line ()
{
    local nline="$1"
    local input_file="$2"
    local npag=0
    local total=0

    while IFS= read -d $'\f' -r page; do
        npag=$(( npag + 1 ))
        IFS=$'\n'
        for line in ${page}
        do
            total=$(( total + 1 ))
            if [[ total -eq nline ]] ; then
                printf "%d\n" ${npag}
                unset IFS
                return
            fi
        done
        unset IFS
    done < "$input_file"

    printf "0\n"
    return
}
awk to the rescue!
awk -v RS='\f' -v n=09 '$0~"^"n"." || $0~"\n"n"." {print NR}' file
3
updated anchoring as commented below.
$ for i in $(seq -w 12); do awk -v RS='\f' -v n="$i" '$0~"^"n"." || $0~"\n"n"." {print n,"->",NR}' file; done
01 -> 1
02 -> 1
03 -> 1
04 -> 2
05 -> 2
06 -> 2
07 -> 2
08 -> 2
09 -> 3
10 -> 3
11 -> 3
12 -> 3
A script of similar length can be written in bash itself to locate and respond to the embedded <form-feed>'s contained in a file. (It will work in POSIX shell as well, with substitutes for the string indexing and expr for the math.) For example,
#!/bin/bash

declare -i ln=1                      ## line count
declare -i pg=1                      ## page count

fname="${1:-/dev/stdin}"             ## read from file or stdin

printf "\nln:pg text\n"              ## print header

while read -r l; do                  ## read each line
    if [ "${l:0:1}" = $'\f' ]; then  ## if form-feed found
        ((pg++))
        printf "<ff>\n%2s:%2s '%s'\n" "$ln" "$pg" "${l:1}"
    else
        printf "%2s:%2s '%s'\n" "$ln" "$pg" "$l"
    fi
    ((ln++))
done < "$fname"
Example Input File
The simple input file with embedded <form-feed>'s was created with:
$ echo -e "a\nb\nc\n\fd\ne\nf\ng\nh\n\fi\nj\nk\nl" > dat/affex.txt
Which when output gives:
$ cat dat/affex.txt
a
b
c
d
e
f
g
h
i
j
k
l
Example Use/Output
$ bash affex.sh <dat/affex.txt
ln:pg text
 1: 1 'a'
 2: 1 'b'
 3: 1 'c'
<ff>
 4: 2 'd'
 5: 2 'e'
 6: 2 'f'
 7: 2 'g'
 8: 2 'h'
<ff>
 9: 3 'i'
10: 3 'j'
11: 3 'k'
12: 3 'l'
With Awk, you can define RS (the record separator, default newline) as form feed (\f) and FS (the input field separator, default any sequence of horizontal whitespace) as newline (\n), and obtain the number of lines as the number of "fields" in a "record", which is a "page".
The placement of form feeds in your data will produce some empty lines within a page so the counts are off where that happens.
awk -F '\n' -v RS='\f' '{ print NF }' file
You could reduce the number by one if $NF == "", and perhaps pass in the number of the desired page as a variable:
awk -F '\n' -v RS='\f' -v p="2" 'NR==p { print NF - ($NF == "") }' file
To obtain the page number for a particular line, just feed head -n number to the script, or loop over the numbers until you have accrued the sum of lines.
line=1
page=1
for count in $(awk -F '\n' -v RS='\f' '{ print NF - ($NF == "") }' file); do
    old=$line
    ((line += count))
    echo "Lines $old through $((line - 1)) are on page $page"
    ((page++))
done
This GNU awk script prints the "page" for the line number given as a command-line argument:
BEGIN { ffcount=1;
        search = ARGV[2]
        delete ARGV[2]
        if (!search ) {
            print "Please provide linenumber as argument"
            exit(1);
        }
}

$1 ~ search { printf( "line %s is on page %d\n", search, ffcount) }

/[\f]/ { ffcount++ }
Use it like awk -f formfeeds.awk formfeeds.txt 05 where formfeeds.awk is the script, formfeeds.txt is the file, and '05' is a line number.
The BEGIN rule deals mostly with the command line argument. The other rules are simple rules:
$1 ~ search applies when the first field matches the commandline argument stored in search
/[\f]/ applies when there is a formfeed
I need advice on how to achieve this output:
myoutputfile.txt
Tom Hagen 1892
State: Canada
Hank Moody 1555
State: Cuba
J.Lo 156
State: France
output of mycommand:
/usr/bin/mycommand
Tom Hagen
1892
Canada
Hank Moody
1555
Cuba
J.Lo
156
France
I'm trying to achieve this with the following shell script:
IFS=$'\r\n' GLOBIGNORE='*' :; names=( $(/usr/bin/mycommand) )
for name in "${names[@]}"
do
    #echo $name
    echo ${name[0]}
    #echo ${name:0}
done
Thanks
Assuming you can always rely on the command to output groups of 3 lines, one option might be
/usr/bin/mycommand |
while read name;
      read year;
      read state; do
    echo "$name $year"
    echo "State: $state"
done
An array isn't really necessary here.
One improvement could be to exit the loop if you don't get all three required lines:
while read name && read year && read state; do
# Guaranteed that name, year, and state are all set
...
done
An easy one-liner (not tuned for performance):
/usr/bin/mycommand | xargs -d '\n' -L3 printf "%s %s\nState: %s\n"
It reads 3 lines at a time from the pipe and then passes them to a new instance of printf which is used to format the output.
If you have whitespace at the beginning (it looks like that in your example output), you may need to use something like this:
/usr/bin/mycommand | sed -e 's/^\s*//g' | xargs -d '\n' -L3 printf "%s %s\nState: %s\n"
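If GNU xargs isn't available (BSD/macOS xargs lacks -d), a similar effect can be sketched with POSIX paste, which joins every 3 input lines with tabs before an awk reformat (the printf here stands in for /usr/bin/mycommand):

```shell
# Portable sketch: paste with three "-" operands merges every 3 lines
# of stdin into one tab-separated line, then awk formats the output.
printf '%s\n' 'Tom Hagen' 1892 Canada 'Hank Moody' 1555 Cuba |
  paste - - - |
  awk -F '\t' '{ print $1 " " $2; print "State: " $3 }'
```

Each `-` tells paste to read the next line from standard input, so three of them consume three lines per output record.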
#!/bin/bash

COUNTER=0
/usr/bin/mycommand | while read LINE
do
    if [ $COUNTER = 0 ]; then
        NAME="$LINE"
        COUNTER=$(($COUNTER + 1))
    elif [ $COUNTER = 1 ]; then
        YEAR="$LINE"
        COUNTER=$(($COUNTER + 1))
    elif [ $COUNTER = 2 ]; then
        STATE="$LINE"
        COUNTER=0
        echo "$NAME $YEAR"
        echo "State: $STATE"
    fi
done
chepner's pure bash solution is simple and elegant, but slow with large input files (loops in bash are slow).
Michael Jaros' solution is even simpler, if you have GNU xargs (verify with xargs --version), but also does not perform well with large input files (external utility printf is called once for every 3 input lines).
If performance matters, try the following awk solution:
/usr/bin/mycommand | awk '
{ ORS = (NR % 3 == 1 ? " " : "\n")
gsub("^[[:blank:]]+|[[:blank:]]*\r?$", "") }
{ print (NR % 3 == 0 ? "State: " : "") $0 }
' > myoutputfile.txt
NR % 3 gives the position of each input line within its respective group of 3 consecutive lines: 1 for the 1st line, 2 for the 2nd, and 0(!) for the 3rd.
{ ORS = (NR % 3 == 1 ? " " : "\n") determines ORS, the output-record separator, based on that index: a space for line 1, and a newline for lines 2 and 3; the space ensures that line 2 is appended to line 1 with a space when using print.
gsub("^[[:blank:]]+|[[:blank:]]*\r?$", "") strips leading and trailing whitespace from the line - including, if present, a trailing \r, which your input seems to have.
{ print (NR % 3 == 0 ? "State: " : "") $0 } prints the trimmed input line, prefixed by "State: " only for every 3rd input line, and implicitly followed by ORS (due to use of print).
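To see it in action, here is the same awk program run on three inline sample lines with leading whitespace, standing in for mycommand's output:

```shell
# Demo of the ORS trick: 3 indented input lines become 2 output lines.
printf '  Tom Hagen\n  1892\n  Canada\n' | awk '
  { ORS = (NR % 3 == 1 ? " " : "\n")
    gsub("^[[:blank:]]+|[[:blank:]]*\r?$", "") }
  { print (NR % 3 == 0 ? "State: " : "") $0 }'
```

This prints `Tom Hagen 1892` on one line followed by `State: Canada` on the next.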
I would like to distinguish adjacent matches more easily, while still retaining the context of input. In order to do so, it would be nice to cycle through a list of colors for each match found by grep.
I would like to modify the command
echo -e "AA\nAABBAABBCC\nBBAABB" | grep --color=always "AABB\|"
So that instead of printing all matches in a single color, it would print adjacent matches in alternating colors.
Can this be done in grep? The closest answer I could find was matching two different (and non-overlapping) grep queries in different colors.
Alternatively, how can I most easily get this functionality in an Ubuntu terminal?
You can achieve a grep with color rotation using perl and a single substitution using an experimental regex feature to wrap each occurrence in ANSI escape sequences. Here's something to serve as a starting point that you can wrap in a shell function:
$ printf "FOOBARFOOFOOFOOBAR\nFOOBARFOOFOOFOOBARFOOFOOFOOFOOFOOFOOFOOFOOFOOFOOFOOFOO\n" \
| perl -ne 'next unless /FOO/; $m=0; s#(?<!\[0mFOO)\K((?{$n=30+(++$m%8)})FOO)#\033\[1;${n}m\1\033\[0m#g; print'
Not very pretty but at least brief.
I'm too lazy to redo the screenshot, but you might want to skip the dark grey by doing $n = 31 + (++$m % 7) instead. If you only want two colors, set the divisor to 2 (obviously).
Awk
This can be achieved with an awk script which utilizes ANSI escape sequences
#!/usr/bin/awk -f
# USAGE: echo -e "AA\nAABBAABBCC\nBBAABB" | awk -f color_grep.awk -v regex="AABB"

BEGIN {
    # Bold Red ANSI Code
    c[0] = "\x1b[1;31m"
    # Bold Blue ANSI Code
    c[1] = "\x1b[1;34m"
    # Default ANSI Code
    n = "\x1b[0m"
}

{
    i--
    j = 1
    do {
        temp = $0;
        i = (i + 1) % 2
        $0 = gensub("(" regex ")", c[i] "\\1" n, j, temp);
        j++
    } while ($0 != temp)
    print $0
}
Or as a one liner on the command line:
echo -e "AA\nAABBAABBCC\nBBAABB" | awk 'BEGIN { c[0]="\x1b[1;31m"; c[1]="\x1b[1;34m"; n="\x1b[0m" } { i--; j=1; do { $0=gensub(/(AABB)/, c[i=(i+1)%2] "\\1" n, j++, temp=$0) } while ($0!=temp); print $0 }'
Perl
After seeing Adrian's answer, I decided to come up with my own perl solution.
#!/usr/bin/perl
# USAGE: echo -e "AA\nAABBAABBCC\nBBAABB" | ~/color_grep.perl "AABB"
$regex = $ARGV[0];

# Generates ANSI escape sequences for bold text colored as follows:
# 0 - Red, 1 - Green, 2 - Yellow, 3 - Blue, 4 - Magenta, 5 - Cyan
sub color { "\033\[1;" . (31 + $_[0] % 6) . "m" }

# ANSI escape sequence for default text
$default = "\033\[0m";

while (<STDIN>) {
    # Surround the matched expression with the color start and color end tags.
    # After outputting each match, increment to the next color index
    s/($regex)/color($i++) . $1 . $default/ge;
    print;
}
As a one liner:
printf "FOOBARFOOFOOFOOBAR\nFOOBARFOOFOOFOOBARFOOFOOFOOFOOFOOFOOFOOFOOFOOFOOFOOFOO\n" | perl -ne 'BEGIN{sub c {"\033\[1;".(31+$_[0]%6)."m"} $d="\033\[0m";} s/(FOO)/c($i++).$1.$d/ge; print'
You can use colout: http://nojhan.github.io/colout/
This example will colorize your pattern in text stream cycling through rainbow colors.
echo -e "AA\nAABBAABBCC\nBBAABB" | colout AABB rainbow
You can change rainbow to random, use some other color map or define it on the fly:
echo -e "AA\nAABBAABBCC\nBBAABB" | colout -c AABB red,blue
The -c option instructs colout to cycle the comma-separated colors at each match, using them as a color map.