I'm fetching a URL with cURL, which returns HTML. Using awk, I'm extracting the sensor name and its status:
(curl <MY URL> | awk -F"Sensor<\/th><td>" '{print $2}' | awk -F"<\/td></tr>" '{print $1}'; \
curl <my URL> | awk -F"Status<\/th><td><strong>" '{print $2}' | awk -F"<\/strong>" '{printf $1}' \
) | tr -d '\n' >> output
The cURL output looks like this:
<html><head><title>Sensor status for NumberOfThreadsSensor-NumberOfThreads</title></head><body>
<h1>Sensor status for NumberOfThreadsSensor-NumberOfThreads</h1>
<table>
<tr><th>Plugin</th><td>NumberOfThreadsSensor</td></tr><tr><th>Sensor</th><td>NumberOfThreads</td></tr><tr><th>Status</th><td>Ok</td></tr><tr><th>Created</th><td>Fri Aug 14 09:03:10 UTC 2020 (13 seconds ago)</td></tr><tr><th>TTL</th><td>30 seconds</td></tr><tr><th>Short message</th><td>1;14;28</td></tr><tr><th>Long message</th><td>1 [interval: 1 min];14 [interval: 30 min];28 [interval: 60 min]</td></tr></table>
<h2>Formats</h2><p>The status shown on this page is also available in the following machine-friendly formats:</p>
<ul>
<li>A simple status string, Possible values: OK, WARNING, CRITICAL, UNKNOWN.</li>
<li>Nagios plugin output, output formatted for easy integration with Nagios.</li>
<li>Full xml all available data in xml for easy parsing by ad-hoc monitoring tools.</li>
<li>Prometheus output, all available data in prometheus format</li>
</ul>
<p>Please do not rely on the output of this page for automated monitoring, use one of the formats above.</p>
</body></html>
Current output: ScoreProcessorWarning
Expected output: ScoreProcessor Warning
Please help me simplify my shell script; I'm still learning. Thanks for the help.
With the input presented saved in /tmp/input.txt:
<h1>Sensor status for EventProcessorStatus-ScoreProcessor</h1>
<table>
<tr><th>Plugin</th><td>EventProcessorStatus</td></tr><tr><th>Sensor</th><td>ScoreProcessor</td></tr><tr><th>Status</th><td><strong>Warning</strong></td></tr><tr><th>Created</th><td>Fri Aug 10 00:16:23 UTC 2020 (0 seconds ago)</td></tr><tr><th>TTL</th><td>30 seconds</td></tr><tr><th>Short message</th><td>Endpoint is running, but has errors</td></tr><tr><th>Long message</th><td>Endpoint is running, but has errors<br/>
Number of errors in background process (xxxx) logs: 4<br/>
</td></tr></table>
<h2>Performance data</h2><table>
With my very limited knowledge of xmllint, I ended up with:
# Extract only the table; get the text of all its cells
xmllint --html --xpath '//table//tr//text()' /tmp/input.txt |
# Each row has two cells (th and td), so join every two lines together
sed 'N;s/\n/\t/' |
# Keep only the Sensor and Status rows
sed -n '/Sensor\t/{s///;h}; /Status\t/{s///;x;G;p}' |
# Read the sensor and status into bash variables
{ IFS= read -r name; IFS= read -r status; echo "name=$name status=$status" ;}
which outputs:
name=ScoreProcessor status=Warning
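A minimal sketch of the "one curl, one awk" simplification the question asks for, assuming the table rows look like the samples above (<tr><th>Name</th><td>value</td></tr>, with the status sometimes wrapped in <strong>). extract_sensor_status is a hypothetical helper name, not part of any tool:

```shell
# Sketch: pull the sensor name and its status out of the status page in one
# pass, so the page is fetched only once. extract_sensor_status is a
# hypothetical name; the patterns assume the <th>Name</th><td>value</td> layout.
extract_sensor_status() {
  awk '
    match($0, /<th>Sensor<\/th><td>[^<]+/) {
      sensor = substr($0, RSTART, RLENGTH)
      sub(/.*<td>/, "", sensor)        # keep only the text after <td>
    }
    match($0, /<th>Status<\/th><td>(<strong>)?[^<]+/) {
      status = substr($0, RSTART, RLENGTH)
      sub(/.*>/, "", status)           # drop <th>...<td> and an optional <strong>
    }
    END { print sensor, status }       # one line, separated by a space
  '
}
```

For example, curl -s "$URL" | extract_sensor_status >> output, with URL set to your sensor page, fetches the page once and appends a line such as ScoreProcessor Warning. Regex-based HTML scraping is brittle, so if the layout changes, the xmllint approach or one of the machine-friendly formats the page itself lists is the safer choice.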
I'd like to get an opinion on how best to do this in bash. Thank you.
For X number of servers, each has its own list of replication agreements and their statuses. It's easy to run a few commands and get this data, e.g.:
Get the servers (variables set in a local config file); output:
. ./ldap-config ; echo "$MASTER $REPLICAS"
dc1-server1 dc1-server2 dc2-server1 dc2-server2 dc3...
For dc1-server1, get the agreements; output:
ipa-replica-manage -p $(cat ~/.dspw) list -v $SERVER.$DOMAIN | grep ': replica' | sed 's/: replica//'
dc2-server1
dc3-server1
dc4-server1
For dc1-server1, get the agreement status codes; output:
ipa-replica-manage -p $(cat ~/.dspw) list -v $SERVER.$DOMAIN | grep 'status: Error (' | sed -e 's/.*status: Error (//' -e 's/).*//'
0
0
18
So the output would be several columns, one per server from the 'get servers' list, each listing that server's 'replica: status' pairs.
I'm looking to achieve something like:
dc2-server1: 0 dc2-server2: 0 dc1-server1: 0 ...
dc3-server1: 0 dc3-server2: 18 dc3-server1: 13 ...
dc4-server1: 18 dc4-server2: 0 dc4-server1: 0 ...
Generally eval is considered evil. Nevertheless, I'm going to use it.
paste is handy for printing files side-by-side.
Bash process substitutions can be used where you'd use a filename.
So, I'm going to dynamically build up a paste command and then eval it.
I'm going to use get.sh as a placeholder for your mystery commands.
cmd="paste"
while read -ra servers; do
for server in "${servers[@]}"; do
cmd+=" <(./get.sh \"$server\" agreements | sed 's/\$/:/')"
cmd+=" <(./get.sh \"$server\" status)"
done
done < <(./get.sh servers)
eval "$cmd" | column -t
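If you'd rather avoid eval, here is a sketch (bash-only, since it uses arrays and process substitution) that writes each server's column to a temporary file and then pastes the files together. get_agreements, get_status and replica_table are hypothetical names; the two helpers simply wrap the question's ipa-replica-manage pipelines, with DOMAIN and ~/.dspw assumed to come from the question's setup:

```shell
# Sketch: same side-by-side layout as the eval answer, but without eval.
# Assumes bash (arrays, process substitution). Function names are hypothetical.

# Agreements for one server, wrapping the question's pipeline.
get_agreements() {
  ipa-replica-manage -p "$(cat ~/.dspw)" list -v "$1.$DOMAIN" |
    grep ': replica' | sed 's/: replica//'
}

# Status codes for one server, in the same order as the agreements.
get_status() {
  ipa-replica-manage -p "$(cat ~/.dspw)" list -v "$1.$DOMAIN" |
    grep 'status: Error (' | sed -e 's/.*status: Error (//' -e 's/).*//'
}

# replica_table SERVER...: one "replica: status" column per server.
replica_table() {
  [ "$#" -gt 0 ] || return 1          # paste with no files would read stdin
  local tmpdir server i=0
  local -a cols=()
  tmpdir=$(mktemp -d) || return 1
  for server in "$@"; do
    # Glue each agreement to its status code, e.g. "dc2-server1: 0"
    paste -d ' ' <(get_agreements "$server" | sed 's/$/:/') <(get_status "$server") \
      > "$tmpdir/$i"
    cols+=("$tmpdir/$i")
    i=$((i + 1))
  done
  paste "${cols[@]}"                  # columns side by side, tab-separated
  rm -rf "$tmpdir"
}
```

For example, . ./ldap-config; replica_table $MASTER $REPLICAS | column -t -s $'\t' keeps each replica: status pair together in one aligned column. ipa-replica-manage still runs twice per server here, just as in the question.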
I have a requirement to read certain parameters from a log file and then update a database. I am trying to achieve the first part, i.e. reading from the log file using awk commands in a shell script.
The log file may consist of lines like the following, or more:
[2018-05-22T11:35:17,857] [RQST: rqst_3ADE-5439-598D-1B8B | TB: 9000042] - [588455375] - INFO - com.test.webapp.services.functions.TestTransactionService - Line 769 - requestType="TESTING",partnerName="Test Merchant 123",testId="123456",lob="TEST1_TO_TEST2",tranType="TEST1",paymentType="P2M",amount="110.00",currency="840",processor="CBN",network="TestSend",responseCode="00", acctNumLastFour="0087",binCountry="USA",binCurr="USD"
[2018-05-22T11:35:17,857] [RQST: rqst_2AEF-2339-598D-1B8B | TB: 9000043] - [588455376] - INFO - com.test.webapp.services.functions.TestTransactionService - Line 770 - requestType="TESTING",partnerName="Test Merchant 234",testId="234567",lob="TEST2_TO_TEST3",tranType="TEST2",paymentType="P2M",amount="120.00",currency="850",processor="CBN",network="TestSend",responseCode="00", acctNumLastFour="0087",binCountry="USA",binCurr="USD"
[2018-05-22T11:35:17,857] [RQST: rqst_4EDA-4539-598D-1B8B | TB: 9000044] - [588455377] - INFO - com.test.webapp.services.functions.TestTransactionService - Line 771 - requestType="TESTING",partnerName="Test Merchant 345",testId="345678",lob="TEST3_TO_TEST4",tranType="TEST3",paymentType="P2M",amount="130.00",currency="860",processor="CBN",network="TestSend",responseCode="00", acctNumLastFour="0087",binCountry="USA",binCurr="USD"
I need to apply filters processor and paymentType and retrieve values of the amount, currency, network and responseCode to variables in a shell script which will be inserted into an Oracle DB table.
I am new to shell scripting and awk and can't figure this out. I have tried using
awk '/amount/{print}' testAPI.log
however, it returns every row that contains amount.
Since you didn't specify the expected output, here is a template you can tailor to your needs:
$ awk -F' - ' '{n=split($NF,a,",");
for(i=1;i<=n;i++) {split(a[i],b,"="); kv[b[1]]=b[2]}}
kv["processor"]=="\"CBN\""
&& kv["paymentType"]=="\"P2M\""{print kv["amount"],kv["currency"]}' file
"110.00" "840"
"120.00" "850"
"130.00" "860"
You can trim the double quotes as well, but I'm not sure that's needed.
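To go one step further and get the values into shell variables, here is a sketch that builds on the template above: it strips the quotes, applies the two filters, and prints amount, currency, network and responseCode tab-separated. parse_log is a hypothetical helper name, and the filter values are the ones from the question:

```shell
# Sketch: split the key="value" tail of each log line into an awk array,
# strip the quotes, filter on processor and paymentType, and print the wanted
# fields tab-separated. parse_log is a hypothetical name.
parse_log() {
  awk -F' - ' '
    {
      split("", kv)                   # clear keys left over from the previous line
      n = split($NF, pairs, ",")      # $NF is the text after the last " - "
      for (i = 1; i <= n; i++) {
        p = pairs[i]
        sub(/^ +/, "", p)             # drop the space before acctNumLastFour
        eq = index(p, "=")
        if (eq == 0) continue         # not a key=value pair
        val = substr(p, eq + 1)
        gsub(/"/, "", val)            # strip the double quotes
        kv[substr(p, 1, eq - 1)] = val
      }
    }
    kv["processor"] == "CBN" && kv["paymentType"] == "P2M" {
      print kv["amount"] "\t" kv["currency"] "\t" kv["network"] "\t" kv["responseCode"]
    }
  ' "$@"
}
```

Reading the output with parse_log testAPI.log | while IFS=$'\t' read -r amount currency network responseCode; do ...; done gives one loop iteration per matching log line, with each value in its own variable, ready to be passed to your Oracle client (that part is not shown here). A value that itself contains a comma would break this simple split.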
I tried it with the three entries in the question; the command below gives the output you want.
It checks whether $5 is paymentType="P2M" and $8 is processor="CBN" (the filters you were looking for), then keeps amount, currency, network and responseCode; substitute the filters you need. grep -F treats "[RQST: rqst" as a literal string, since an unescaped [ would start an unterminated bracket expression.
grep -iF "[RQST: rqst" testAccelAPI.log | cut -d ' ' -f 19 | awk -F, '{ if ($5 == "paymentType=\"P2M\"" && $8 == "processor=\"CBN\"") print $5 "=" $6 "=" $7 "=" $8 "=" $9 "=" $10 }' | cut -d= -f 4,6,10,12 | tr = " "
I have a bunch of text output from a command, and I need to display only specific matching lines, plus some additional lines after the "Message:" match (the message text is obviously longer than one line).
What I tried was:
grep -e 'Subject:' -e 'Date:' -A50 -e 'Message:'
but it included 50 lines after EACH match, and I need -A50 to apply to only one of the patterns. How would I do that?
Here is the full command:
(<...>) | telnet <mailserver> 110 | grep -e 'Subject:' -e 'Date:' -A50 -e 'Message:'
Part of the telnet output:
Date: Tue, 10 Sep 2013 16
Message-ID: <00fb01ceae25$
MIME-Version: 1.0
Content-Type: multipart/alternative;
boundary="----=_NextPart_000_00FC_01CEAE3E.DE32CE40"
X-Mailer: Microsoft Office Outlook 12.0
Thread-Index: Ac6uJWYdA3lUzs1cT8....
Content-Language: lt
X-Mailman-Approved-At: Tue, 10 Sep 2013 16:0 ....
Subject: ...
X-BeenThere: ...
Precedence: list
Try the following:
... | telnet ... > <file>
grep -e 'Subject:' -e 'Date:' <file> && grep -A50 -e 'Message:' <file>
You will need to dump the output to a file first.
This can be done with awk as well, without the need for dumping output to a file.
... | telnet ... | awk '/Date:/ {print}; /Subject:/ {print}; /Message:/ {c=50} c && c--'
This would be hard to do with grep; awk is a better fit:
awk '/Subject:|Date:/; /Message:/ {print; for (l = 1; l <= 50 && (getline) > 0; l++) print}'
Here awk prints the Message: line and the 50 lines below it, and only the matching line itself for the other patterns.
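A variant of the counter idiom above, wrapped as a reusable function; show_mail_parts is a hypothetical name, and the window size (50 in the question) is passed as the first argument. Each line is printed at most once, even if a Date: or Subject: line falls inside the message window:

```shell
# Sketch: print every Date:/Subject: line, plus the Message: line and the
# N lines after it. show_mail_parts is a hypothetical name; N defaults to 50.
show_mail_parts() {
  awk -v n="${1:-50}" '
    /Message:/       { c = n + 1 }          # arm the counter: this line plus n more
    c > 0            { print; c--; next }   # inside the message window
    /Date:|Subject:/ { print }              # otherwise, single matching lines only
  '
}
```

Usage: ... | telnet <mailserver> 110 | show_mail_parts 50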
I'm trying to analyze an enormous text file (1.6GB), whose data lines look like this:
20090118025859 -2.400000 78.100000 1023.200000 0.000000
20090118025900 -2.500000 78.100000 1023.200000 0.000000
20090118025901 -2.400000 78.100000 1023.200000 0.000000
I don't even know how many lines there are, but I'm trying to split the file by date. The left number is a time stamp (these lines, for example, are from January 18th, 2009).
How can I split this file into pieces according to the date?
The number of entries per date differs, so using split with a constant number won't work.
All I know how to do is grep '^20090118' file > data20090118.dat, but surely there is a way to do all the dates at once?
Thanks in advance,
Alex
Using awk:
awk '{print > "data" substr($1,1,8) ".dat"}' myfile
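awk keeps every output file open, so with many days some awk implementations can run out of file descriptors. Here is a sketch that closes each file once its day ends; split_by_day is a hypothetical helper name, and the data<YYYYMMDD>.dat naming follows the awk answer above:

```shell
# Sketch: split a time-stamped data file into one file per day (YYYYMMDD
# prefix of the first field). split_by_day is a hypothetical name.
split_by_day() {
  awk '
    {
      day = substr($1, 1, 8)            # YYYYMMDD part of the time stamp
      out = "data" day ".dat"
      if (out != prev) {                # a new day begins
        if (prev != "") close(prev)     # keep only one output file open at a time
        prev = out
      }
      print >> out                      # append, so a reopened day is not truncated
    }
  ' "$@"
}
```

Because it appends with >>, remove old data*.dat files before re-running it.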
This should work if the items are in date sequence:
date= # Date of the current output file, taken from the data
while IFS= read -r line
do
    if [ "${line:0:8}" != "$date" ]
    then
        date=${line:0:8}   # move to the next date actually present (handles month ends)
    fi
    echo "$line" >> "$date.dat"
done < log.dat
With the caveats that each day needs to have more than one record and that the output files will contain blank lines:
uniq --all-repeated=separate -w8 file | csplit -s - '/^$/' '{*}'
uniq really should have an option to output the non-repeated records too.
Also, csplit should have an option to suppress the matched line (newer GNU csplit versions do provide --suppress-matched).
Is there a way to get the size of a remote file like
http://api.twitter.com/1/statuses/public_timeline.json
in a shell script?
You can download the file and get its size. But we can do better.
Use curl to get only the response header using the -I option.
In the response header look for Content-Length: which will be followed by the size of the file in bytes.
$ URL="http://api.twitter.com/1/statuses/public_timeline.json"
$ curl -sI $URL | grep -i Content-Length
Content-Length: 134
To get the size use a filter to extract the numeric part from the output above:
$ curl -sI $URL | grep -i Content-Length | awk '{print $2}'
134
Two caveats to the other answers:
Some servers don't return the correct Content-Length for a HEAD request, so you might need to do the full download.
You'll likely get an unrealistically large response (compared to a modern browser) unless you specify gzip/deflate headers.
Also, you can do this without grep/awk or piping:
curl 'http://api.twitter.com/1/statuses/public_timeline.json' --location --silent --write-out 'size_download=%{size_download}\n' --output /dev/null
And the same request with compression:
curl 'http://api.twitter.com/1/statuses/public_timeline.json' --location --silent -H 'Accept-Encoding: gzip,deflate' --write-out 'size_download=%{size_download}\n' --output /dev/null
Similar to codaddict's answer, but without the call to grep:
curl -sI http://api.twitter.com/1/statuses/public_timeline.json | awk 'tolower($1) == "content-length:" { print $2 }'
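Two details can trip up these header filters: curl -I output ends each line with \r, and HTTP/2 servers send lowercase header names. Here is a sketch of the awk approach that handles both and keeps only the last value (the final response when curl follows redirects with -L); content_length is a hypothetical helper name:

```shell
# Sketch: read HTTP response headers on stdin and print the value of the last
# Content-Length header. content_length is a hypothetical name.
content_length() {
  awk '
    tolower($1) == "content-length:" {   # case-insensitive header match
      len = $2
      sub(/\r$/, "", len)                # curl -I lines end in \r
    }
    END { if (len != "") print len }     # last one wins (final redirect hop)
  '
}
```

Usage: curl -sIL "$URL" | content_length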
The preceding answers won't work when there are redirections. For example, if you want the size of the Debian DVD ISO, you must use the --location option; otherwise, the reported size may be that of the 302 Moved Temporarily response body, not that of the real file.
Suppose you have the following url:
$ url=http://cdimage.debian.org/debian-cd/8.1.0/amd64/iso-dvd/debian-8.1.0-amd64-DVD-1.iso
With curl, you could obtain:
$ curl --head --location ${url}
HTTP/1.0 302 Moved Temporarily
...
Content-Type: text/html; charset=iso-8859-1
...
HTTP/1.0 200 OK
...
Content-Length: 3994091520
...
Content-Type: application/x-iso9660-image
...
That's why I prefer using HEAD, which is an alias for the lwp-request command from the libwww-perl package (on Debian). Another advantage is that it strips the extra \r characters, which eases subsequent string processing.
So to retrieve the size of the debian iso DVD, one could do for example:
$ size=$(HEAD ${url})
$ size=${size##*Content-Length: }
$ size=${size%%[[:space:]]*}
Please note that:
this method requires launching only one process;
the ${size##...} and ${size%%...} expansions are POSIX, so it works in bash, dash, ksh and other POSIX shells.
In shells without these expansions (e.g. csh), you may have to resort to sed, awk, grep et al.
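Because ## removes the longest matching prefix, the first expansion above jumps to the last Content-Length: in the text, so the same idea also works on curl -sIL output, which contains one header block per redirect, as long as the server uses that capitalization. Here is a POSIX sh sketch; last_content_length is a hypothetical name, and it assumes the header is spelled exactly "Content-Length: ":

```shell
# Sketch: size of the final response from a block of HTTP headers, using only
# POSIX parameter expansion. last_content_length is a hypothetical name.
last_content_length() {
  headers=$1
  case $headers in
    *'Content-Length: '*) ;;            # header present, carry on
    *) return 1 ;;                      # no Content-Length at all
  esac
  size=${headers##*Content-Length: }    # longest prefix: cut up to the LAST match
  size=${size%%[!0-9]*}                 # keep the leading digits (drops \r, \n, ...)
  printf '%s\n' "$size"
}
```

For example, last_content_length "$(curl -sIL "$url")" prints the size reported by the final response.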
I think the easiest way to do this would be to:
run cURL in silent mode (-s),
fetch only the headers with -I (so as to avoid downloading the whole file),
do a case-insensitive grep with -i,
and print the second field with awk ($2).
The output is in bytes.
Examples:
curl -sI http://api.twitter.com/1/statuses/public_timeline.json | grep -i content-length | awk '{print $2}'
# output: 52
or
curl -sI https://code.jquery.com/jquery-3.1.1.min.js | grep -i content-length | awk '{print $2}'
# output: 86709
or
curl -sI http://download.thinkbroadband.com/1GB.zip | grep -i content-length | awk '{print $2}'
# output: 1073741824
Show as kilobytes/megabytes
If you would like to show the size in kilobytes, change the awk to:
awk '{print $2/1024}'
or, for megabytes:
awk '{print $2/1024/1024}'
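Rather than dividing by 1024 a fixed number of times, awk can pick the unit itself. Here is a sketch using binary (1024-based) units like the examples above; human_size is a hypothetical helper name:

```shell
# Sketch: turn a byte count into a human-readable size with awk.
# human_size is a hypothetical name; units are 1024-based.
human_size() {
  awk -v bytes="$1" 'BEGIN {
    split("B K M G T", unit, " ")
    size = bytes + 0                    # force numeric (ignores a trailing \r)
    i = 1
    while (size >= 1024 && i < 5) {     # scale down until the number is < 1024
      size /= 1024
      i++
    }
    fmt = (i == 1) ? "%d %s\n" : "%.1f %s\n"
    printf fmt, size, unit[i]
  }'
}
```

For example, human_size "$(curl -sI "$URL" | grep -i content-length | awk '{print $2}')" prints something like 84.7 K.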
The accepted solution did not work for me; this does (note that it downloads the entire file):
curl -s https://code.jquery.com/jquery-3.1.1.min.js | wc -c
I have a shell function, based on codaddict's answer, which gives a remote file's size in a human-readable format thusly:
remote_file_size () {
  printf "%q" "$*" |
    xargs curl -sI |
    grep Content-Length |
    awk '{print $2}' |
    tr -d '\040\011\012\015' |
    gnumfmt --to=iec-i --suffix=B
  # The `g' prefix on `numfmt' is only needed on systems that lack the GNU
  # coreutils by default, i.e. non-Linux systems. If you're on Linux, drop
  # the `g' and call numfmt; on BSD or macOS, install the GNU coreutils.
}
This shows detailed information about the ongoing download; you just need to specify a URL, as in the example below.
$ curl -O -w 'We downloaded %{size_download} bytes\n' \
    https://cmake.org/files/v3.8/cmake-3.8.2.tar.gz
Output:
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 7328k 100 7328k 0 0 244k 0 0:00:29 0:00:29 --:--:-- 365k
We downloaded 7504706 bytes
For automated purposes, just add the command to your script file (note that -O saves the whole file to disk).
Combining all of the above, this works for me:
URL="http://cdimage.debian.org/debian-cd/current/i386/iso-dvd/debian-9.5.0-i386-DVD-1.iso"
curl --head --silent --location "$URL" | grep -i "content-length:" | tr -d " \t\r" | cut -d ':' -f 2
This will return just the content length in bytes:
3767500800
You can kinda do it like this, including auto-following 301/302 redirections:
curl -ILs 'https://twitter.com/i/csp_report?a=ORTGK%3D%3D%3D&ro=fals' |
mawk 'NF*=!_<NF' \
OFS= FS='^[Cc][Oo][Nn][Tt][Ee][Nn][Tt]-[Ll][Ee][Nn][Gg][Tt][Hh]: '
1 41
It's very brute force, but it gets the job done. Note that it prints whatever raw value the server reports, so you may have to adjust it as you see fit.
The same works for a site that redirects from plain http to https, since -L follows that redirect (-g, i.e. --globoff, only turns off curl's URL globbing of [] and {}):
curl -gILs 'http://apple.com' |
mawk 'NF *= !_<NF' OFS= \
FS='^[Cc][Oo][Nn][Tt][Ee][Nn][Tt]-[Ll][Ee][Nn][Gg][Tt][Hh]: '
1 304
2 106049
(I'm guessing the second value is the main site, and the first one was the redirection page.)
The question is old and has been sufficiently answered, but let me expand on the existing answers. If you want to automate this task (checking the sizes of multiple files), here's a one-liner.
First, write the URLs of the files to a file:
cat url_of_files.txt
https://stpubdata-jwst.stsci.edu/ero/jw02734/jw02734002001/jw02734002001_04101_00001-seg002_nis_x1dints.fits
https://stpubdata-jwst.stsci.edu/ero/jw02734/jw02734002001/jw02734002001_04101_00001-seg003_nis_calints.fits
https://stpubdata-jwst.stsci.edu/ero/jw02734/jw02734002001/jw02734002001_04102_00001-seg001_nis_calints.fits
https://stpubdata-jwst.stsci.edu/ero/jw02734/jw02734002001/jw02734002001_02101_00002-seg001_nis_cal.fits
...
Then run this from the command line (in the same directory as url_of_files.txt):
grep '^https' url_of_files.txt | xargs curl -sI | awk '/[Cc]ontent-[Ll]ength/{kb=$2/1024;mb=kb/1024;gb=mb/1024;print ( $2>1024 ? ( kb>1024 ? ( mb>1024 ? gb " G" : mb " M") : kb " K" ) : $2 " B" ) }'
This handles file sizes ranging from bytes to gigabytes. I use this line to check the FITS data files made available by the JWST team.
It checks each file's size and, depending on how large it is, roughly converts it to an appropriate number with a B, K, M or G suffix, denoting bytes, kilobytes, megabytes and gigabytes.
Result:
...
177.188 K
177.188 K
236.429 M
177.188 K
5.95184 M
1.83608 G
1.20326 G
130.059 M
1.20326 G
...
My solution uses awk's END block to print only the last Content-Length:
function curl2contentlength() {
curl -sI -L -H 'Accept-Encoding: gzip,deflate' "$1" | grep -i Content-Length | awk 'END{print $2}'
}
curl2contentlength "$@"
./curl2contentlength.sh "https://chrt.fm/track/B63133/stitcher.simplecastaudio.com/ec74d48c-cbf1-4764-923e-7d584dce50fa/episodes/a85954a3-24c3-48ed-bced-ef0607b7149a/audio/128/default.mp3?aid=rss_feed&awCollectionId=ec74d48c-cbf1-4764-923e-7d584dce50fa&awEpisodeId=a85954a3-24c3-48ed-bced-ef0607b7149a&feed=qm_9xx0g"
10806508
Without END, it would have printed one value per response:
0
0
10806508
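I use it like this ([Cc]ontent-[Ll]ength:) because this server's response headers mention Content-Length more than once (for example in Access-Control-Expose-Headers); requiring the trailing colon matches only the real header:
curl -sI "http://someserver.com/hls/125454.ts" | grep '[Cc]ontent-[Ll]ength:' | awk '{ print $2 }'
Response headers from that server: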
Accept-Ranges: bytes
Access-Control-Expose-Headers: Date, Server, Content-Type, Content-Length
Server: WowzaStreamingEngine/4.5.0
Cache-Control: no-cache
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
Access-Control-Allow-Methods: OPTIONS, GET, POST, HEAD
Access-Control-Allow-Headers: Content-Type, User-Agent, If-Modified-Since, Cache-Control, Range
Date: Tue, 10 Jan 2017 01:56:08 GMT
Content-Type: video/MP2T
Content-Length: 666460
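A different solution, if you have SSH access to the server:
ssh userName@IP ls -s PATH | grep FILENAME | awk '{print $1}'
This gives you the allocated size in blocks (1 KiB blocks with GNU ls), which can differ from the exact file size.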