trying to remove some strings with awk command [duplicate] - bash

---- My text file to search for the keywords (file name: test), shown with cat -Evt test:
centos is my bro$
red hat is my course$
ubuntu is my OS$
fqdn is stupid $
$
$
$
tom outsmart jerry$
red hat is my boy$
jerry is samall
------ The keyword file word.txt, shown with cat -Evt word.txt:
red hat$
we$
hello$
bye$
Compensation
----- My code:
while read "p"; do
paste -d',' <(echo -n "$p" ) <(echo "searchall") <( grep -i "$p" test | wc -l) <(grep -i -A 1 -B 1 "$p" test )
done <word.txt
---- My expectation; the output should be:
keyword,searchall,frequency,line above the match
line containing the keyword
line below the match
red hat,searchall,2,centos is my bro
red hat is my course
ubuntu is my OS
red hat,searchall,2,tom outsmart jerry
red hat is my boy
jerry is samall
---- But the actual output from my code is:
red hat,searchall,2,centos is my bro
,,,red hat is my course
,,,ubuntu is my OS
,,,--
,,,tom outsmart jerry
,,,red hat is my boy
,,,jerry is samall
---- Please give me suggestions and point me in the right direction to get the desired output.
---- I am trying to grep the keywords from the file and print the matches.
Two records should be created here, as the keyword (red hat) occurs two times.
---- How can I loop through each occurrence of the keyword?

This sounds very much like a homework assignment.
Cf. the BashFAQ for better reads; I'm keeping this simple to focus on what you asked for.
Rewritten for more precise formatting:
while read key                             # read each search key
do  cnt=$(grep "$key" test | wc -l)        # count the hits
    pad="$key,searchall,$cnt,"             # build the "header" fields
    while read line                        # read the input from grep
    do  if [[ "$line" =~ ^-- ]]            # treat hit groups separately
        then pad="$key,searchall,$cnt,"    # reset the "header"
             echo                          # add the blank line
             continue                      # skip to next line of data
        fi
        echo "$pad$line"                   # echo "header" and data
        pad="${pad//?/ }"                  # convert header to spacing
    done < <( grep -B1 -A1 "$key" test )   # pull hits for this key
    echo                                   # add blank lines between
done < word.txt                            # set stdin for the outer read
$: cat word.txt
course
red hat
$: ./tst
course,searchall,1,centos is my bro
                   red hat is my course
                   ubuntu is my OS

red hat,searchall,2,centos is my bro
                    red hat is my course
                    ubuntu is my OS

red hat,searchall,2,tom outsmart jerry
                    red hat is my boy
                    jerry is samall

This will produce the expected output based on one interpretation of your requirements and should be easy to modify if I've made any wrong guesses about what you want to do:
$ cat tst.awk
BEGIN {
    RS = ""
    FS = "\n"
}
{ gsub(/^[[:space:]]+|[[:space:]]+$/,"") }
NR == FNR {
    words[$0]
    next
}
{
    for (word in words) {
        for (i=1; i<=NF; i++) {
            if ($i ~ word) {
                map[word,++cnt[word]] = (i>1 ? $(i-1) : "") FS $i FS $(i+1)
            }
        }
    }
}
END {
    for (word in words) {
        for (i=1; i<=cnt[word]; i++) {
            beg = sprintf("%s,searchall,%d,", word, cnt[word])
            split(map[word,i],lines)
            for (j=1; j in lines; j++) {
                print beg lines[j]
                beg = sprintf("%*s",length(beg),"")
            }
            print ""
        }
    }
}
$ awk -f tst.awk words file
red hat,searchall,2,centos is my bro
                    red hat is my course
                    ubuntu is my OS

red hat,searchall,2,tom outsmart jerry
                    red hat is my boy
                    jerry is samall
I assumed your real input doesn't start with a bunch of blanks as in your posted example - if it does that's easy to accommodate.
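One possible accommodation (a sketch only, not the author's stated fix): extend the whitespace-stripping block so that each line inside a paragraph record is trimmed too, not just the ends of the record:
{
    gsub(/^[[:space:]]+|[[:space:]]+$/,"")          # trim the record ends
    for (i=1; i<=NF; i++)                           # sketch: also trim each line (field)
        gsub(/^[[:space:]]+|[[:space:]]+$/,"",$i)
}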

Related

How can we rearrange a table with bash?

I have an ASCII file like:
color green
color black
color yellow
os Linux
os android
os windows
os mac
drink coffee
drink water
number 1
number 0
number 100
I want to make a table something like:
color  os      drink  number
-----------------------------------
green  Linux   coffee 1
black  android water  0
yellow windows        100
       mac
I wrote this code and it does what I want. Are there better solutions?
#!/bin/bash
inpfile=$1

keys=$(cat $inpfile | cut -d" " -f 1 | uniq)
for key in $keys
do
    values=$(cat $inpfile | grep $key | cut -d" " -f 2)
    result=$(echo "$key ----- $values")
    echo $result | datamash -W transpose > /tmp/$key.table
done
paste -d "\t" /tmp/*.table
rm /tmp/*.table
The shell script in your question would fail given various input values (substrings, regexp metachars, spaces, etc.) and/or environment settings and/or even the contents of the directory you run it from. Copy/paste it into http://shellcheck.net and it'll tell you about some of the issues. It'll also be extremely slow.
Here's how to do it robustly, efficiently, and portably using any awk in any shell on every Unix box:
$ cat tst.awk
BEGIN { OFS="\t" }
!($1 in tag2colNr) {
    tag2colNr[$1] = ++numCols
    rowNr = ++numVals[numCols]
    vals[rowNr,numCols] = $1
}
{
    colNr = tag2colNr[$1]
    rowNr = ++numVals[colNr]
    vals[rowNr,colNr] = $2
    numRows = (numVals[colNr]>numRows ? numVals[colNr] : numRows)
}
END {
    for (rowNr=1; rowNr<=numRows; rowNr++) {
        for (colNr=1; colNr<=numCols; colNr++) {
            printf "%s%s", vals[rowNr,colNr], (colNr<numCols ? OFS : ORS)
        }
    }
}
$ awk -f tst.awk file
color   os      drink   number
green   Linux   coffee  1
black   android water   0
yellow  windows         100
        mac
There are various ways to add the line of dashes under the tags (header) line and/or change the spacing between fields if you feel there's some value in it (one option is sketched below), but since you said "I want to make a table something like..." I assume the above is like enough.
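For instance, a minimal post-processing sketch (the width of the dashed line here is arbitrary):
$ awk -f tst.awk file | awk 'NR==1 { print; print "-----------------------------------"; next } 1'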
Another possibility is using tput:
#!/bin/bash

set -euo pipefail

function handleFile() {
    local data_file="${1}"
    local x=0
    local y=0
    local column_width=20

    # we are using tput below to place the cursor,
    # so clear the screen for clean output.
    clear

    while read -r header ; do
        # each header is a new column so reset y coordinate to 0
        y=0
        # place header with underline
        printf "$(tput cup "${y}" "${x}")$(tput smul)%s$(tput sgr0)" "${header}"
        while read -r row ; do
            # move down 1 row by incrementing y coordinate value
            ((y+=1))
            # place the current row value without underline
            printf "$(tput cup "${y}" "${x}")%s$(tput sgr0)" "${row}"
        # get row values for each header using grep/awk
        done < <(grep "${header}" "${data_file}" | awk '{ print $2 }')
        # move to next column by incrementing x coordinate by desired column width
        ((x+=column_width))
    # use awk/sort to get unique header values
    done < <(awk '{ print $1 }' "${data_file}" | sort -u)

    # move cursor down 2 lines and then remove all cursor settings
    ((y+=2)) && tput cup "${y}" 0 && tput sgr0
}

handleFile "$@"
Sample usage is:
./script.sh data.txt
color               drink               number              os
green               coffee              1                   Linux
black               water               0                   android
yellow                                  100                 windows
                                                            mac

BASH - read lines from section of file

I have a file formatted like this:
[SITE1]
north
west
[MOTOR]
west
south
north
[AREA]
west
east
north
[CLEAR]
What I need to be able to do is read all values from a specific section.
Eg: read AREA and be returned:
west
east
north
The examples I've found online are for ini files, which have key value pairs.
Can anyone advise how I can do this?
Thanks
Using sed:
category=MOTOR; sed -nE "/^\[$category\]$/{:l n;/^(\[.*\])?$/q;p;bl}" /path/to/your/file
It doesn't do anything until it matches a line that consists of your target category, at which point it enters a loop. In this loop, it consumes a line, exits if it reaches an empty line, another category, or the end of the file, and otherwise prints the line.
The sed commands used are the following :
/pattern/ executes the next command or group of commands when the current line matches the pattern
{commands} regroups commands, for instance to execute them conditionally.
:l defines a label named "l", to which you'll be able to jump.
n asks sed to start working on the next line.
q exits
p prints the current line
bl jumps to the "l" label
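For example, with the sample file from the question saved as sections.txt (a placeholder name):
$ category=AREA; sed -nE "/^\[$category\]$/{:l n;/^(\[.*\])?$/q;p;bl}" sections.txt
west
east
north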
Two options come to mind - use a filter (e.g., awk, sed) to extract the relevant section, or use bash to filter to the specific section.
With bash, using a function:
#! /bin/bash
function read_section {
    local id=$1
    local match
    input=()
    while read p ; do
        if [ "$p" = "[$id]" ] ; then
            # Read data here
            while read p ; do
                # Check for end of section - empty line
                if [ "$p" = "" ] ; then
                    break
                fi
                # Do something with '$p'
                input+=("$p")
                echo "Item $p"
            done
            # Indicate section was found
            return 0
        fi
    done
    # Indicate section not found
    return 1
}

if read_section "AREA" < p.txt ; then
    echo "Found Area" "${#input[$#]}"
else
    echo "Missing AREA"
fi
if read_section "FOO" < p.txt ; then
    echo "Found FOO"
else
    echo "Missing FOO"
fi
Output: (placing sample input into property file p.txt)
Item west
Item east
Item north
Found Area 4
Missing FOO
Notes:
It's not clear whether each section ends with an empty line. The code assumes that this is the case. Otherwise, the section-change test can be modified to if [[ "$p" = \[* ]] or similar, with an extra check to ignore empty lines (see the sketch after these notes).
The function returns true/false to indicate whether the section was found. The script can act on this information.
The loaded items are placed into the input array for further processing.
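A minimal sketch of that modified inner loop (an illustration of the note above, assuming blank lines inside a section should simply be skipped):
while read p ; do
    # ignore empty lines inside the section
    if [ "$p" = "" ] ; then
        continue
    fi
    # a new "[...]" header ends the current section
    case $p in
        \[*\]) break ;;
    esac
    input+=("$p")
    echo "Item $p"
done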
The alternative is to use an external program to filter the input. This MAY provide a performance advantage if the input file is VERY large, or if additional logic is needed.
function filter_section {
    local id=$1
    awk -v ID="$id" '/^\[/ { p = ($0 == "[" ID "]") ; next } p && $0 { print }' < p.txt
}

function read_section {
    local id=$1
    local match
    input=()
    while read p ; do
        # Do something with '$p'
        input+=("$p")
        echo "Item $p"
    done <<< "$(filter_section "$id")"
    # Indicate whether the section was found
    [ "${#input[*]}" -gt 0 ] && return 0
    return 1
}

if read_section "AREA" < p.txt ; then
    echo "Found Area" "${#input[$#]}"
else
    echo "Missing AREA"
fi
if read_section "FOO" < p.txt ; then
    echo "Found FOO"
else
    echo "Missing FOO"
fi

bash routine to return the page number of a given line number from text file

Consider a plain text file containing page-breaking ASCII control character "Form Feed" ($'\f'):
alpha\n
beta\n
gamma\n\f
one\n
two\n
three\n
four\n
five\n\f
earth\n
wind\n
fire\n
water\n\f
Note that each page has a random number of lines.
I need a bash routine that returns the page number of a given line number in a text file containing the page-breaking ASCII control character.
After a long time researching the solution I finally came across this piece of code:
function get_page_from_line
{
    local nline="$1"
    local input_file="$2"
    local npag=0
    local ln=0
    local total=0

    while IFS= read -d $'\f' -r page; do
        npag=$(( ++npag ))
        ln=$(echo -n "$page" | wc -l)
        total=$(( total + ln ))
        if [ $total -ge $nline ]; then
            echo "${npag}"
            return
        fi
    done < "$input_file"
    echo "0"
    return
}
But, unfortunately, this solution proved to be very slow in some cases.
Any better solution ?
Thanks!
The idea to use read -d $'\f' and then count the lines is good.
This version might appear inelegant: if nline is greater than or equal to the number of lines in the file, then the file is read twice.
Give it a try, because it is super fast:
function get_page_from_line ()
{
    local nline="${1}"
    local input_file="${2}"

    if [[ $(wc -l "${input_file}" | awk '{print $1}') -lt nline ]] ; then
        printf "0\n"
    else
        printf "%d\n" $(( $(head -n ${nline} "${input_file}" | grep -c "^"$'\f') + 1 ))
    fi
}
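For example, assuming the sample text above is saved as pages.txt (a placeholder name), line 7 falls on the second page:
$ get_page_from_line 7 pages.txt
2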
The performance of awk is better than that of the bash version above; awk was created for such text processing.
Give this tested version a try:
function get_page_from_line ()
{
    awk -v nline="${1}" '
        BEGIN {
            npag=1;
        }
        {
            if (index($0,"\f")>0) {
                npag++;
            }
            if (NR==nline) {
                print npag;
                linefound=1;
                exit;
            }
        }
        END {
            if (!linefound) {
                print 0;
            }
        }' "${2}"
}
When \f is encountered, the page number is increased.
NR is the current line number.
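For example, with the sample text saved as pages.txt (a placeholder name):
$ get_page_from_line 9 pages.txt
3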
----
For the record, there is another bash version.
This version uses only built-in commands to count the lines in the current page.
The speedtest.sh that you provided in the comments showed it a little bit ahead (approx. 20 sec), which makes it roughly equivalent to your version:
function get_page_from_line ()
{
    local nline="$1"
    local input_file="$2"
    local npag=0
    local total=0

    while IFS= read -d $'\f' -r page; do
        npag=$(( npag + 1 ))
        IFS=$'\n'
        for line in ${page}
        do
            total=$(( total + 1 ))
            if [[ total -eq nline ]] ; then
                printf "%d\n" ${npag}
                unset IFS
                return
            fi
        done
        unset IFS
    done < "$input_file"

    printf "0\n"
    return
}
awk to the rescue!
awk -v RS='\f' -v n=09 '$0~"^"n"." || $0~"\n"n"." {print NR}' file
3
updated anchoring as commented below.
$ for i in $(seq -w 12); do awk -v RS='\f' -v n="$i" \
      '$0~"^"n"." || $0~"\n"n"." {print n,"->",NR}' file; done
01 -> 1
02 -> 1
03 -> 1
04 -> 2
05 -> 2
06 -> 2
07 -> 2
08 -> 2
09 -> 3
10 -> 3
11 -> 3
12 -> 3
A script of similar length can be written in bash itself to locate and respond to the embedded <form-feed>s contained in a file. (It will work in POSIX shell as well, with a substitute for the string indexing and expr for the math.) For example,
#!/bin/bash

declare -i ln=1              ## line count
declare -i pg=1              ## page count
fname="${1:-/dev/stdin}"     ## read from file or stdin

printf "\nln:pg text\n"      ## print header

while read -r l; do                     ## read each line
    if [ "${l:0:1}" = $'\f' ]; then     ## if form-feed found
        ((pg++))
        printf "<ff>\n%2s:%2s '%s'\n" "$ln" "$pg" "${l:1}"
    else
        printf "%2s:%2s '%s'\n" "$ln" "$pg" "$l"
    fi
    ((ln++))
done < "$fname"
Example Input File
The simple input file with embedded <form-feed>s was created with:
$ echo -e "a\nb\nc\n\fd\ne\nf\ng\nh\n\fi\nj\nk\nl" > dat/affex.txt
Which when output gives:
$ cat dat/affex.txt
a
b
c
d
e
f
g
h
i
j
k
l
Example Use/Output
$ bash affex.sh <dat/affex.txt
ln:pg text
1: 1 'a'
2: 1 'b'
3: 1 'c'
<ff>
4: 2 'd'
5: 2 'e'
6: 2 'f'
7: 2 'g'
8: 2 'h'
<ff>
9: 3 'i'
10: 3 'j'
11: 3 'k'
12: 3 'l'
With Awk, you can define RS (the record separator, default newline) as form feed (\f) and FS (the input field separator, default any sequence of horizontal whitespace) as newline (\n), and obtain the number of lines as the number of "fields" in a "record", which is a "page".
The placement of form feeds in your data will produce some empty lines within a page so the counts are off where that happens.
awk -F '\n' -v RS='\f' '{ print NF }' file
You could reduce the number by one if $NF == "", and perhaps pass in the number of the desired page as a variable:
awk -F '\n' -v RS='\f' -v p="2" 'NR==p { print NF - ($NF == "") }' file
To obtain the page number for a particular line, just feed head -n number to the script (sketched below), or loop over the numbers until you have accrued the sum of lines.
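A minimal sketch of the head variant (pages.txt is a placeholder name); the page of line $number is the count of form-feed-separated records read so far:
head -n "$number" pages.txt | awk -F '\n' -v RS='\f' 'END { print NR }'
The loop variant: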
line=1
page=1
for count in $(awk -F '\n' -v RS='\f' '{ print NF - ($NF == "") }' file); do
    old=$line
    ((line += count))
    echo "Lines $old through $((line - 1)) are on page $page"
    ((page++))
done
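With the sample file above, this prints:
Lines 1 through 3 are on page 1
Lines 4 through 8 are on page 2
Lines 9 through 12 are on page 3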
This GNU awk script prints the "page" for the line number given as a command-line argument:
BEGIN { ffcount=1;
        search = ARGV[2]
        delete ARGV[2]
        if (!search) {
            print "Please provide linenumber as argument"
            exit(1);
        }
}
$1 ~ search { printf( "line %s is on page %d\n", search, ffcount) }
/[\f]/ { ffcount++ }
Use it like awk -f formfeeds.awk formfeeds.txt 05, where formfeeds.awk is the script, formfeeds.txt is the file, and '05' is a line number.
The BEGIN rule deals mostly with the command-line argument. The other rules are simple:
$1 ~ search applies when the first field matches the command-line argument stored in search
/[\f]/ applies when there is a form feed

Bash script, command - output to array, then print to file

I need advice on how to achieve this output:
myoutputfile.txt
Tom Hagen 1892
State: Canada
Hank Moody 1555
State: Cuba
J.Lo 156
State: France
output of mycommand:
/usr/bin/mycommand
Tom Hagen
1892
Canada
Hank Moody
1555
Cuba
J.Lo
156
France
I'm trying to achieve this with the following shell script:
IFS=$'\r\n' GLOBIGNORE='*' :; names=( $(/usr/bin/mycommand) )
for name in ${names[@]}
do
    #echo $name
    echo ${name[0]}
    #echo ${name:0}
done
Thanks
Assuming you can always rely on the command to output groups of 3 lines, one option might be:
/usr/bin/mycommand |
while read name;
      read year;
      read state; do
    echo "$name $year"
    echo "State: $state"
done
An array isn't really necessary here.
One improvement could be to exit the loop if you don't get all three required lines:
while read name && read year && read state; do
    # Guaranteed that name, year, and state are all set
    ...
done
An easy one-liner (not tuned for performance):
/usr/bin/mycommand | xargs -d '\n' -L3 printf "%s %s\nState: %s\n"
It reads 3 lines at a time from the pipe and then passes them to a new instance of printf which is used to format the output.
If you have whitespace at the beginning (it looks like that in your example output), you may need to use something like this:
/usr/bin/mycommand | sed -e 's/^\s*//g' | xargs -d '\n' -L3 printf "%s %s\nState: %s\n"
#!/bin/bash

COUNTER=0
/usr/bin/mycommand | while read LINE
do
    if [ $COUNTER = 0 ]; then
        NAME="$LINE"
        COUNTER=$(($COUNTER + 1))
    elif [ $COUNTER = 1 ]; then
        YEAR="$LINE"
        COUNTER=$(($COUNTER + 1))
    elif [ $COUNTER = 2 ]; then
        STATE="$LINE"
        COUNTER=0
        echo "$NAME $YEAR"
        echo "State: $STATE"
    fi
done
chepner's pure bash solution is simple and elegant, but slow with large input files (loops in bash are slow).
Michael Jaros' solution is even simpler, if you have GNU xargs (verify with xargs --version), but also does not perform well with large input files (external utility printf is called once for every 3 input lines).
If performance matters, try the following awk solution:
/usr/bin/mycommand | awk '
{ ORS = (NR % 3 == 1 ? " " : "\n")
gsub("^[[:blank:]]+|[[:blank:]]*\r?$", "") }
{ print (NR % 3 == 0 ? "State: " : "") $0 }
' > myoutputfile.txt
NR % 3 identifies each input line's position within its group of 3 consecutive lines: 1 for the 1st line, 2 for the 2nd, and 0(!) for the 3rd.
{ ORS = (NR % 3 == 1 ? " " : "\n") determines ORS, the output-record separator, based on that index: a space for line 1, and a newline for lines 2 and 3; the space ensures that line 2 is appended to line 1 with a space when using print.
gsub("^[[:blank:]]+|[[:blank:]]*\r?$", "") strips leading and trailing whitespace from the line - including, if present, a trailing \r, which your input seems to have.
{ print (NR % 3 == 0 ? "State: " : "") $0 } prints the trimmed input line, prefixed by "State: " only for every 3rd input line, and implicitly followed by ORS (due to use of print).

Grep - Cycle through colors for each match

I would like to distinguish adjacent matches more easily, while still retaining the context of input. In order to do so, it would be nice to cycle through a list of colors for each match found by grep.
I would like to modify the command
echo -e "AA\nAABBAABBCC\nBBAABB" | grep --color=always "AABB\|"
So that instead of highlighting every match in a single color, adjacent matches would be highlighted in alternating colors. (The original question illustrated the difference with two screenshots.)
Can this be done in grep? The closest answer I could find was matching two different (and non-overlapping) grep queries in different colors.
Alternatively, how can I most easily get this functionality in an Ubuntu terminal?
You can achieve a grep with color rotation using perl and a single substitution using an experimental regex feature to wrap each occurrence in ANSI escape sequences. Here's something to serve as a starting point that you can wrap in a shell function:
$ printf "FOOBARFOOFOOFOOBAR\nFOOBARFOOFOOFOOBARFOOFOOFOOFOOFOOFOOFOOFOOFOOFOOFOOFOO\n" \
| perl -ne 'next unless /FOO/; $m=0; s#(?<!\[0mFOO)\K((?{$n=30+(++$m%8)})FOO)#\033\[1;${n}m\1\033\[0m#g; print'
Not very pretty but at least brief.
I'm too lazy to redo the screenshot but you might want to skip the dark grey by doing $n = 31 + (++$m % 7) instead. If you only want two colors set the divisor to 2 (obviously).
Awk
This can be achieved with an awk script which utilizes ANSI escape sequences:
#!/usr/bin/awk -f
# USAGE: echo -e "AA\nAABBAABBCC\nBBAABB" | awk -f color_grep.awk -v regex="AABB"

BEGIN {
    # Bold Red ANSI Code
    c[0] = "\x1b[1;31m"
    # Bold Blue ANSI Code
    c[1] = "\x1b[1;34m"
    # Default ANSI Code
    n = "\x1b[0m"
}
{
    i--
    j = 1
    do {
        temp = $0;
        i = (i + 1) % 2
        $0 = gensub("(" regex ")", c[i] "\\1" n, j, temp);
        j++
    } while ($0 != temp)
    print $0
}
Or as a one-liner on the command line:
echo -e "AA\nAABBAABBCC\nBBAABB" | awk 'BEGIN { c[0]="\x1b[1;31m"; c[1]="\x1b[1;34m"; n="\x1b[0m" } { i--; j=1; do { $0=gensub(/(AABB)/, c[i=(i+1)%2] "\\1" n, j++, temp=$0); } while ($0!=temp); print $0 }'
Perl
After seeing Adrian's answer, I decided to come up with my own perl solution.
#!/usr/bin/perl
# USAGE: echo -e "AA\nAABBAABBCC\nBBAABB" | ~/color_grep.perl "AABB"

$regex = $ARGV[0];

# Generates ANSI escape sequences for bold text colored as follows:
# 0 - Red, 1 - Green, 2 - Yellow, 3 - Blue, 4 - Magenta, 5 - Cyan
sub color { "\033\[1;" . (31 + $_[0] % 6) . "m" }

# ANSI escape sequence for default text
$default = "\033\[0m";

while (<STDIN>) {
    # Surround the matched expression with the color start and color end tags.
    # After outputting each match, increment to the next color index.
    s/($regex)/color($i++) . $1 . $default/ge;
    print;
}
As a one liner:
printf "FOOBARFOOFOOFOOBAR\nFOOBARFOOFOOFOOBARFOOFOOFOOFOOFOOFOOFOOFOOFOOFOOFOOFOO\n" | perl -ne 'BEGIN{sub c {"\033\[1;".(31+$_[0]%6)."m"} $d="\033\[0m";} s/(FOO)/c($i++).$1.$d/ge; print'
You can use colout: http://nojhan.github.io/colout/
This example will colorize your pattern in the text stream, cycling through rainbow colors.
echo -e "AA\nAABBAABBCC\nBBAABB" | colout AABB rainbow
You can change rainbow to random, use some other color map or define it on the fly:
echo -e "AA\nAABBAABBCC\nBBAABB" | colout -c AABB red,blue
-c option instructs colout to cycle comma-separated colors at each match using them as a color map.
