I need a little help from the community:
I have these two lines in a large text file:
Connected clients: 42
4 ACTIVE CLIENTS IN LAST 20 SECONDS
How can I find, extract, and assign the numbers to variables?
clients=42
active=4
SED, AWK, GREP? Which one should I use?
clients=$(grep -Po '^Connected clients: \K[0-9]+$' filename)
active=$(grep -Po '^([0-9]+)(?= ACTIVE CLIENTS IN LAST [0-9]+ SECONDS$)' filename)
or
clients=$(sed -n 's/^Connected clients: \([0-9]\+\)$/\1/p' filename)
active=$(sed -n 's/^\([0-9]\+\) ACTIVE CLIENTS IN LAST [0-9]\+ SECONDS$/\1/p' filename)
str='Connected clients: 42 4 ACTIVE CLIENTS IN LAST 20 SECONDS'
set -- $str
clients=$3
active=$4
If it's two lines, fine.
str1='Connected clients: 42'
str2='4 ACTIVE CLIENTS IN LAST 20 SECONDS'
set -- $str1
clients=$3
set -- $str2
active=$1
Reading two lines from a file may be done by
{ read str1; read str2; } < file
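Putting the pieces together, a quick sketch (assuming those are the first two lines of the file and they appear in that order):
{ read -r str1; read -r str2; } < filename
set -- $str1    # Connected clients: 42
clients=$3
set -- $str2    # 4 ACTIVE CLIENTS IN LAST 20 SECONDS
active=$1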
Alternatively, do the reading and writing in AWK, and slurp the results into Bash.
eval "$(awk '/^Connected clients: / { print "clients=" $3 }
/[0-9]+ ACTIVE CLIENTS/ { print "active=" $1 }
' filename)"
You can use awk:
$ set -- $(awk '/Connected/{c=$NF}/ACTIVE/{a=$1}END{print c,a}' file)
$ echo $1
42
$ echo $2
4
assign $1, $2 to appropriate variable names as desired
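For example, with the names from the question:
clients=$1
active=$2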
Or you can assign directly using declare:
$ declare $(awk '/Connected/{c=$NF}/ACTIVE/{a=$1}END{print "client="c;print "active="a}' file)
$ echo $client
42
$ echo $active
4
Related
I would like to substitute a set of single-byte characters with a set of literal strings in a stream, without any constraint on the line size.
#!/bin/bash
for (( i = 1; i <= 0x7FFFFFFFFFFFFFFF; i++ ))
do
printf '\a,\b,\t,\v'
done |
chars_to_strings $'\a\b\t\v' '<bell>' '<backspace>' '<horizontal-tab>' '<vertical-tab>'
The expected output would be:
<bell>,<backspace>,<horizontal-tab>,<vertical-tab><bell>,<backspace>,<horizontal-tab>,<vertical-tab><bell>...
I can think of a bash function that would do that, something like:
chars_to_strings() {
    local delim buffer
    while true
    do
        delim=''
        IFS='' read -r -d '.' -n 4096 buffer && (( ${#buffer} != 4096 )) && delim='.'
        if [[ -n "${delim:+_}" ]] || [[ -n "${buffer:+_}" ]]
        then
            # Do the replacements in "$buffer"
            # ...
            printf "%s%s" "$buffer" "$delim"
        else
            break
        fi
    done
}
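For concreteness, the replacement placeholder could be filled in with plain parameter expansions, something like this (a rough sketch of what I have in mind):
buffer=${buffer//$'\a'/<bell>}
buffer=${buffer//$'\b'/<backspace>}
buffer=${buffer//$'\t'/<horizontal-tab>}
buffer=${buffer//$'\v'/<vertical-tab>}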
But I'm looking for a more efficient way, any thoughts?
Since you seem to be okay with using ANSI C quoting via $'...' strings, then maybe use sed?
sed $'s/\a/<bell>/g; s/\b/<backspace>/g; s/\t/<horizontal-tab>/g; s/\v/<vertical-tab>/g'
Or, via separate commands:
sed -e $'s/\a/<bell>/g' \
-e $'s/\b/<backspace>/g' \
-e $'s/\t/<horizontal-tab>/g' \
-e $'s/\v/<vertical-tab>/g'
Or, using awk, which replaces newline characters too (by customizing the Output Record Separator, i.e., the ORS variable):
$ printf '\a,\b,\t,\v\n' | awk -vORS='<newline>' '
{
gsub(/\a/, "<bell>")
gsub(/\b/, "<backspace>")
gsub(/\t/, "<horizontal-tab>")
gsub(/\v/, "<vertical-tab>")
print $0
}
'
<bell>,<backspace>,<horizontal-tab>,<vertical-tab><newline>
For a simple one-liner with reasonable portability, try Perl.
for (( i = 1; i <= 0x7FFFFFFFFFFFFFFF; i++ ))
do
printf '\a,\b,\t,\v'
done |
perl -pe 's/\a/<bell>/g;
s/\x08/<backspace>/g;s/\t/<horizontal-tab>/g;s/\x0B/<vertical-tab>/g'
(In a Perl regex, \b means a word boundary and \v a vertical-whitespace class, so the backspace and vertical-tab characters are written as \x08 and \x0B here.)
Perl internally does some intelligent optimizations so it's not encumbered by lines which are longer than its input buffer or whatever.
Perl by itself is not POSIX, of course; but it can be expected to be installed on any even remotely modern platform (short of perhaps embedded systems etc).
Assuming the overall objective is to provide the ability to process a stream of data in real time without having to wait for an EOL/end-of-buffer occurrence to trigger processing ...
A few items:
continue to use the while/read -n loop to read a chunk of data from the incoming stream and store it in a buffer variable
push the conversion code into something that's better suited to string manipulation (i.e., something other than bash); for the sake of discussion we'll choose awk
within the while/read -n loop, printf "%s\n" "${buffer}" and pipe the output from the while loop into awk; NOTE: the key item is to introduce an explicit \n into the stream so as to trigger awk processing for each new 'line' of input; the OP can decide whether this additional \n must be distinguished from a \n occurring in the original stream of data
awk then parses each line of input as per the replacement logic, making sure to append anything left over to the front of the next line of input (i.e., for when the while/read -n breaks an item in the 'middle')
General idea:
chars_to_strings() {
    while read -r -n 15 buffer   # using '15' for demo purposes, otherwise replace with '4096' or whatever the OP wants
    do
        printf "%s\n" "${buffer}"
    done | awk '{print NR,FNR,length($0)}'   # replace 'print ...' with the OP's replacement logic
}
Take for a test drive:
for (( i = 1; i <= 20; i++ ))
do
printf '\a,\b,\t,\v'
sleep 0.1 # add some delay to data being streamed to chars_to_strings()
done | chars_to_strings
1 1 15 # output starts printing right away
2 2 15 # instead of waiting for the 'for'
3 3 15 # loop to complete
4 4 15
5 5 13
6 6 15
7 7 15
8 8 15
9 9 15
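For the actual conversion, the print NR,FNR,length($0) placeholder could be swapped for the gsub() calls from the earlier answers, e.g. an awk body along these lines (a sketch; printing with printf and no newline drops the \n the loop added, along with any real newlines, which is exactly the trade-off noted above):
awk '{
    gsub(/\a/, "<bell>")
    gsub(/\b/, "<backspace>")
    gsub(/\t/, "<horizontal-tab>")
    gsub(/\v/, "<vertical-tab>")
    printf "%s", $0
}'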
A variation on this idea using a named pipe:
mkfifo /tmp/pipeX
sleep infinity > /tmp/pipeX &        # keep pipe open so awk does not exit
awk '{print NR,FNR,length($0)}' < /tmp/pipeX &
chars_to_strings() {
    while read -r -n 15 buffer
    do
        printf "%s\n" "${buffer}"
    done > /tmp/pipeX
}
Take for a test drive:
for (( i = 1; i <= 20; i++ ))
do
printf '\a,\b,\t,\v'
sleep 0.1
done | chars_to_strings
1 1 15 # output starts printing right away
2 2 15 # instead of waiting for the 'for'
3 3 15 # loop to complete
4 4 15
5 5 13
6 6 15
7 7 15
8 8 15
9 9 15
# kill background 'awk' and/or 'sleep infinity' when no longer needed
Don't waste FS/OFS: use the built-in variables to take care of 2 of the 5 replacements needed:
echo $' \t abc xyz \t \a \n\n ' |
mawk 'gsub(/\7/, "<bell>", $!(NF = NF)) + gsub(/\10/,"<bs>") +\
gsub(/\11/,"<h-tab>")^_' OFS='<v-tab>' FS='\13' ORS='<newline>'
<h-tab> abc xyz <h-tab> <bell> <newline><newline> <newline>
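If the compressed syntax is hard to follow, the same idea in more conventional gawk might look something like this (a sketch: FS/OFS handle the vertical tab, ORS handles the newline, and gsub() covers the rest):
awk -v FS='\v' -v OFS='<v-tab>' -v ORS='<newline>' '
{
    $1 = $1                     # rebuild the record so every FS (\v) becomes OFS
    gsub(/\a/, "<bell>")
    gsub(/\b/, "<bs>")
    gsub(/\t/, "<h-tab>")
    print
}'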
To have NO constraint on the line length you could do something like this with GNU awk:
awk -v RS='.{1,100}' -v ORS= '{
    $0 = RT
    gsub(foo,bar)
    print
}'
That will read and process the input 100 chars at a time no matter which chars are present, whether it has newlines or not, and even if the input was one multi-terabyte line.
Replace gsub(foo,bar) with whatever substitution(s) you have in mind, e.g.:
$ printf '\a,\b,\t,\v' |
awk -v RS='.{1,100}' -v ORS= '{
    $0 = RT
    gsub(/\a/,"<bell>")
    gsub(/\b/,"<backspace>")
    gsub(/\t/,"<horizontal-tab>")
    gsub(/\v/,"<vertical-tab>")
    print
}'
<bell>,<backspace>,<horizontal-tab>,<vertical-tab>
And of course it'd be trivial to pass a list of old and new strings to awk rather than hardcoding them; you'd just have to sanitize any regexp or backreference metachars before calling gsub().
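A rough sketch of that parameterized version (the variable names olds/news are made up for illustration; it assumes each "old" entry is a single character, as in the question):
printf '\a,\b,\t,\v' |
awk -v RS='.{1,100}' -v ORS= \
    -v olds=$'\a\b\t\v' \
    -v news='<bell>,<backspace>,<horizontal-tab>,<vertical-tab>' '
BEGIN {
    n = split(news, repl, ",")
    meta = "\\^$.[]|()*+?{}"          # ERE metacharacters that would need escaping
    for (i = 1; i <= n; i++) {
        c = substr(olds, i, 1)
        pat[i] = index(meta, c) ? "\\" c : c
    }
}
{
    $0 = RT
    for (i = 1; i <= n; i++) gsub(pat[i], repl[i])
    print
}'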
Related
I am trying to assign variables obtained by awk from a two-column text file to a command that uses each line's two values as two variables.
For example, the file I use is;
foo.txt
10 20
33 40
65 78
My command is aiming to print:
end=20 start=10
end=40 start=33
end=78 start=65
Basically, I want to iterate over every line, and for each line the output should contain two variables taken from the two columns of the input file.
I am not an awk expert (I am trying my best); what I have come up with so far is this attempt:
while read -r line ; do awk '{ second_variable=$2 ; first_variable=$1 ; }' ; echo "end=$first_name start=$second_name"; done <foo.txt
but it only gives this output:
end= start=
only one time, without any variable values. I would appreciate any suggestion. Thank you.
In bash you only need while, read and printf:
while read -r start end
do printf 'end=%d start=%d\n' "$end" "$start"
done < foo.txt
end=20 start=10
end=40 start=33
end=78 start=65
With awk, you could do:
awk '{print "end=" $2, "start=" $1}' foo.txt
end=20 start=10
end=40 start=33
end=78 start=65
With sed you'd use regular expressions:
sed -E 's/([0-9]+) ([0-9]+)/end=\2 start=\1/' foo.txt
end=20 start=10
end=40 start=33
end=78 start=65
Just in Bash:
while read -r start end; do echo "end=$end start=$start"; done <foo.txt
What about using xargs?
xargs -n2 sh -c 'echo end=$2 start=$1' sh < file.txt
Demo
xargs -n2 sh -c 'echo end=$2 start=$1' sh <<INPUT
10 20
33 40
65 78
INPUT
Output
end=20 start=10
end=40 start=33
end=78 start=65
I was trying to solve one of my old assignments, and I am stuck on this one. Can anyone help me?
There is a file called "datafile". This file has names of some friends and their
ages. But unfortunately, the names are not in the correct format. They should be
lastname, firstname
But, by mistake they are firstname,lastname
The task is to write a shell script called fix_datafile
to correct the problem and sort the names alphabetically. The corrected file
is called datafile.fix.
Please make sure the original structure of the file is kept untouched.
The following is a sample of the datafile.fix file:
#personal information
#******** Name ********* ***** age *****
Alexanderovich,Franklin 47
Amber,Christine 54
Applesum,Franky 33
Attaboal,Arman 18
Balad,George 38
Balad,Sam 19
Balsamic,Shery 22
Bojack,Steven 33
Chantell,Alex 60
Doyle,Jefry 45
Farland,Pamela 40
Handerman,jimmy 23
Kashman,Jenifer 25
Kasting,Ellen 33
Lorux,Allen 29
Mathis,Johny 26
Maxter,Jefry 31
Newton,Gerisha 40
Osama,Franklin 33
Osana,Gabriel 61
Oxnard,George 20
Palomar,Frank 24
Plomer,Susan 29
Poolank,John 31
Rochester,Benjami 40
Stanock,Verona 38
Tenesik,Gabriel 29
Whelsh,Elsa 21
If you can use awk (I suppose you can), then here's a script which does what you need:
#!/bin/bash
RESULT_FILE_NAME="datafile.new"
head -4 datafile.fix > "$RESULT_FILE_NAME"
tail -n +5 datafile.fix | awk -F"[, ]" '{if(!$2){print ""}else{print($2","$1, $3)}}' >> "$RESULT_FILE_NAME"
Passing -F"[, ]" allows awk to split columns both by , and space and all that remains is just print columns in a needed format. The downsides are that we should use if statement to preserve empty lines and file header also should be treated separately.
Another option is using sed:
cat datafile.fix | sed -E 's/([a-zA-Z]+),([a-zA-Z]+) ([0-9]+)/\2,\1 \3/g' > datafile.new
The downside is that it requires a regex that is not as obvious as the awk syntax.
awk -F'[, ]' '
!/^$/ && !/^#/ {
    first = $1
    last  = $2
    sub(/^[^,]+,[^ ]+/, last "," first)   # swap the two names, keep the rest of the line
    map[last][first] = $0
}
END {
    PROCINFO["sorted_in"] = "#ind_str_asc"
    for (i in map) {
        for (j in map[i]) {
            print map[i][j]
        }
    }
}' namesfile > datafile.fix
One liner:
awk -F'[, ]' '!/^$/ && !/^#/ { first=$1; last=$2; sub(/^[^,]+,[^ ]+/, last "," first); map[last][first]=$0 } END { PROCINFO["sorted_in"]="#ind_str_asc"; for (i in map) { for (j in map[i]) { print map[i][j] } } }' namesfile > datafile.fix
A solution completely in gawk.
Set the field separator to both , and space. Then ignore any lines that are empty or start with #. Take the first and last names from the delimited fields, swap them into lastname,firstname order with sub(), and store each corrected line in a two-dimensional array called map indexed by last and then first name. At the end, set the array traversal order to string-ascending indices and loop through the array, printing the names in sorted order as requested.
Completely in bash:
re="^[[:space:]]*([^#]([[:space:]]|[[:alpha:]])+),(([[:space:]]|[[:alpha:]])*[[:alpha:]]) *([[:digit:]]+)"
while read line
do
if [[ ${line} =~ $re ]]
then
echo ${BASH_REMATCH[3]},${BASH_REMATCH[1]} ${BASH_REMATCH[5]}
else
echo "${line}"
fi
done < names.txt
The core of this is to capture, using bash regex matching (the =~ operator of the [[ command), parenthesis groupings, and the BASH_REMATCH array, the name before the comma (([^#]([[:space:]]|[[:alpha:]])+)), the name after the comma ((([[:space:]]|[[:alpha:]])*[[:alpha:]])), and the age ( *([[:digit:]]+)). The first-name regex is constructed so as to exclude comments, and the last-name regex is constructed so as to handle multiple spaces before the age without including them in the name.
Preconditions: commented lines, with or without leading spaces (handled by ^[[:space:]]* and [^#]), and lines without a comma, are passed through unchanged. Either first names or last names may have internal spaces.
Once the last name and first name are isolated, it is easy to print them in reverse order followed by the age (echo ${BASH_REMATCH[3]},${BASH_REMATCH[1]} ${BASH_REMATCH[5]}). Note that the letter/space groupings count as matches too, which is why we skip groups 2 and 4.
I have tried using awk and sed.
Try this and see if it works:
cat datafile.fix | sed 's/ /,/g' | awk -F "," '{print $2,$1,$3}' | sed 's/ /,/' | sed 's/^,//' | sort -u > datafile_new.fix
I have a file with a bunch of paths that look like so:
7 /usr/file1564
7 /usr/file2212
6 /usr/file3542
I am trying to use sort to pull out and print the path(s) with the most occurrences. Here is what I have so far:
cat temp | sort | uniq -c | sort -rk1 > temp
I am unsure how to only print the highest occurrences. I also want my output to be printed like this:
7 1564
7 2212
7 being the total number of occurrences and the other numbers being the file numbers at the end of the name. I am rather new to bash scripting so any help would be greatly appreciated!
To emit only the first line of output (with the highest number, since you're doing a reverse numeric sort immediately prior), pipe through head -n1.
To remove all content which is not either a number or whitespace, pipe through tr -cd '0-9[:space:]'.
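For instance, combining both of those with the pipeline from the question might look like this (a sketch; the sort is made numeric with -rn so counts of 10 or more order correctly, and the tr pass keeps every number on the top line, including the count added by uniq -c):
sort temp | uniq -c | sort -rn | head -n1 | tr -cd '0-9[:space:]'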
To filter for only the values with the highest number, allowing there to be more than one:
{
    read -r firstnum name && printf '%s\t%s\n' "$firstnum" "$name"
    while read -r num name; do
        [[ $num = $firstnum ]] || break
        printf '%s\t%s\n' "$num" "$name"
    done
} < temp
If you want to avoid sort and you are allowed to use awk, then you can do this:
awk '{
    if ($1 > maxcnt)       { s = $1 " " substr($2,10,4); maxcnt = $1 }
    else if ($1 == maxcnt) { s = s "\n" $1 " " substr($2,10,4) }
} END { print s }' temp
I have a log file that is grouping http requests in 5 minute increments based on a unique set of characteristics. Format is as follows:
beginTime endTime platform hostname osVersion os requestType httpStatus nbInstances
So a sample log line could be:
1423983600 1423983900 platform1 test01 8.1 win createAcct 200 15
This indicates that in that 5-minute timeframe there were 15 requests with this unique attribute set. What I would like to do is take this and generate 15 identical lines in an output file.
Right now I have a very simple script that is getting the job done but probably not very efficient:
#!/bin/bash
file=$1
count=0
cat $file | while read line
do
    string=`echo $line | awk '{print $1,$2,$3,$4,$5,$6,$7,$8}'`
    nbInst=`echo $line | awk '{print $9}'`
    while [[ $count -lt $nbInst ]]
    do
        echo "$string" >> test_data.log
        count=`expr $count + 1`
    done
    count=0
done
Any ideas on a faster solution in bash or perl? Thanks.
As mentioned in the comments, it seems unusual that you need to de-coalesce your events in order to process and index them.
However this should do what you're asking:
#!/usr/bin/perl
use strict;
use warnings;
# uses DATA segment from below as file. You'll probably want either STDIN
# or open a file handle.
while (<DATA>) {
    # separate line on whitespace
    my @line = split;
    # grab the last element of the line (pop returns the value, and removes it
    # from the list)
    for ( 1 .. pop(@line) ) {
        print join( " ", @line ), "\n";
    }
}
__DATA__
1423983600 1423983900 platform1 test01 8.1 win createAcct 200 15
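To run it against a real log instead of the DATA section, the same idea fits in a one-liner reading a file argument or standard input (a sketch; the input file name here is just a placeholder, and the output file name is the one from the question):
perl -ane 'print join(" ", @F[0..$#F-1]), "\n" for 1 .. $F[-1];' mylog.txt > test_data.log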