I'm fetching URL content with cURL, which returns HTML. Using awk I'm extracting the sensor name and its status.
(curl <MY URL> | awk -F"Sensor<\/th><td>" '{print $2}' | awk -F"<\/td></tr>" '{print $1}'; \
curl <MY URL> | awk -F"Status<\/th><td><strong>" '{print $2}' | awk -F"<\/strong>" '{printf $1}' \
) | tr -d '\n' >> output
The HTML that cURL fetches looks like this:
<html><head><title>Sensor status for NumberOfThreadsSensor-NumberOfThreads</title></head><body>
<h1>Sensor status for NumberOfThreadsSensor-NumberOfThreads</h1>
<table>
<tr><th>Plugin</th><td>NumberOfThreadsSensor</td></tr><tr><th>Sensor</th><td>NumberOfThreads</td></tr><tr><th>Status</th><td>Ok</td></tr><tr><th>Created</th><td>Fri Aug 14 09:03:10 UTC 2020 (13 seconds ago)</td></tr><tr><th>TTL</th><td>30 seconds</td></tr><tr><th>Short message</th><td>1;14;28</td></tr><tr><th>Long message</th><td>1 [interval: 1 min];14 [interval: 30 min];28 [interval: 60 min]</td></tr></table>
<h2>Formats</h2><p>The status shown on this page is also available in the following machine-friendly formats:</p>
<ul>
<li>A simple status string, Possible values: OK, WARNING, CRITICAL, UNKNOWN.</li>
<li>Nagios plugin output, output formatted for easy integration with Nagios.</li>
<li>Full xml all available data in xml for easy parsing by ad-hoc monitoring tools.</li>
<li>Prometheus output, all available data in prometheus format</li>
</ul>
<p>Please do not rely on the output of this page for automated monitoring, use one of the formats above.</p>
</body></html>
Current output: ScoreProcessorWarning
Expected output: ScoreProcessor Warning
Please help me simplify my shell script; I'm still in the learning phase. Thanks for the help.
With the presented input saved in /tmp/input.txt:
<h1>Sensor status for EventProcessorStatus-ScoreProcessor</h1>
<table>
<tr><th>Plugin</th><td>EventProcessorStatus</td></tr><tr><th>Sensor</th><td>ScoreProcessor</td></tr><tr><th>Status</th><td><strong>Warning</strong></td></tr><tr><th>Created</th><td>Fri Aug 10 00:16:23 UTC 2020 (0 seconds ago)</td></tr><tr><th>TTL</th><td>30 seconds</td></tr><tr><th>Short message</th><td>Endpoint is running, but has errors</td></tr><tr><th>Long message</th><td>Endpoint is running, but has errors<br/>
Number of errors in background process (xxxx) logs: 4<br/>
</td></tr></table>
<h2>Performance data</h2><table>
With my very limited knowledge of xmllint I ended up with:
# Extract only the table, get the text of all table cells
xmllint --html --xpath '//table//tr//text()' /tmp/input.txt |
# Each table row yields two lines (header and value), so join every two lines together
sed 'N;s/\n/\t/' |
# Filter Sensor and status only
sed -n '/Sensor\t/{s///;h}; /Status\t/{s///;x;G;p}' |
# Read the sensor and status into bash variables
{ IFS= read -r name; IFS= read -r status; echo "name=$name status=$status" ;}
which outputs:
name=ScoreProcessor status=Warning
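To run the same extraction against the live page instead of a saved copy, you can let xmllint read cURL's output from standard input — a sketch, substituting your real URL; the 2>/dev/null hides the HTML parser warnings:
curl -s '<MY URL>' |
xmllint --html --xpath '//table//tr//text()' - 2>/dev/null |
sed 'N;s/\n/\t/' |
sed -n '/Sensor\t/{s///;h}; /Status\t/{s///;x;G;p}' |
{ IFS= read -r name; IFS= read -r status; echo "$name $status" ;}
This prints the two fields separated by a space, which is the "ScoreProcessor Warning" form you asked for.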
I want to execute a command on the body of every incoming Postfix mail.
sed ':a;N;$!ba;s/=\n//g' /path-to/message-file | sed 's/</\n\</g' | sed -r '/'"$(sed -r 's/\\/\\\\/g;s/\//\\\//g;s/\^/\\^/g;s/\[/\\[/g;s/'\''/'\'"\\\\"\'\''/g;s/\]/\\]/g;s/\*/\\*/g;s/\$/\\$/g;s/\./\\./g' whitelist | paste -s -d '|')"'/! s/http/httx/g'
I think it could be possible with Postfix After-Queue Content Filter, but I don't know how to do it...
EDIT:
afterqueue.sh
#!/bin/sh
# Simple shell-based filter. It is meant to be invoked as follows:
# /path/to/script -f sender recipients...
# Localize these. The -G option does nothing before Postfix 2.3.
INSPECT_DIR=/var/spool/filter
SENDMAIL="/usr/sbin/sendmail -G -i" # NEVER NEVER NEVER use "-t" here.
# Exit codes from <sysexits.h>
EX_TEMPFAIL=75
EX_UNAVAILABLE=69
# Clean up when done or when aborting.
trap "rm -f in.$$" 0 1 2 3 15
# Start processing.
cd $INSPECT_DIR || {
    echo $INSPECT_DIR does not exist; exit $EX_TEMPFAIL; }
cat >in.$$ || {
    echo Cannot save mail to file; exit $EX_TEMPFAIL; }
# Specify your content filter here.
sh /path/to/remove_links.sh <in.$$
$SENDMAIL "$#" <in.$$
exit $?
remove_links.sh
#!/bin/bash
sed ':a;N;$!ba;s/=\n//g' $1 | sed 's/</\n\</g' | sed -r '/'"$(sed -r 's/\\/\\\\/g;s/\//\\\//g;s/\^/\\^/g;s/\[/\\[/g;s/'\''/'\'"\\\\"\'\''/g;s/\]/\\]/g;s/\*/\\*/g;s/\$/\\$/g;s/\./\\./g' /path/to/whitelist | paste -s -d '|')"'/! s/http/httx/g'
It works if I call it by hand, but if I add it to /etc/postfix/master.cf like this:
# ==========================================================================
# service type  private unpriv  chroot  wakeup  maxproc command
#               (yes)   (yes)   (yes)   (never) (100)
# ==========================================================================
filter    unix  -       n       n       -       10      pipe
  flags=Rq user=filter null_sender=
  argv=/path/to/afterqueue.sh -f ${sender} -- ${recipient}
there are no changes in the mail.
I get the following syslog:
Apr 13 15:14:08 rs211184 postfix/qmgr[7492]: 3FFDF23CB5F: from=<test@gmail.com>, size=4358, nrcpt=1 (queue active)
Apr 13 15:14:08 rs211184 postfix/pipe[7504]: 116E523CA8C: to=<example@example.de>, relay=filter, delay=0.2, delays=0.16/0/0/0.04, dsn=2.0.0, status=sent (delivered via filter service)
Apr 13 15:14:08 rs211184 postfix/qmgr[7492]: 116E523CA8C: removed
Apr 13 15:14:08 rs211184 postfix-local[7522]: postfix-local: from=test@gmail.com, to=example@example.de, dirname=/var/qmail/mailnames
Apr 13 15:14:08 rs211184 postfix/pipe[7521]: 3FFDF23CB5F: to=<dsehlhoff@lcdev1.de>, relay=plesk_virtual, delay=0.02, delays=0.01/0/0/0.01, dsn=2.0.0, status=sent (delivered via plesk_virtual service)
Apr 13 15:14:08 rs211184 postfix/qmgr[7492]: 3FFDF23CB5F: removed
You seem to expect the message in a file, and oddly a static file name, but that's not how it works. The message arrives on standard input. Minimally, just remove /path/to/message-file -- but really, piping sed to sed is very often a mistake; you should refactor this to a single sed script (or Awk, or Python, or what have you).
sed -e ':a;N;$!ba;s/=\n//g' -e 's/</\n\</g' |
# This is too convoluted, really!
sed -r '/'"$(sed -r 's/\\/\\\\/g;s/\//\\\//g;s/\^/\\^/g;s/\[/\\[/g;s/'\''/'\'"\\\\"\'\''/g;s/\]/\\]/g;s/\*/\\*/g;s/\$/\\$/g;s/\./\\./g' whitelist |
paste -s -d '|')"'/! s/http/httx/g'
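For what it's worth, the metacharacter escaping can be collapsed into a single bracket expression instead of nine chained substitutions — a sketch, assuming GNU sed and a whitelist file with one pattern per line:
# escape ] [ \ ^ $ . * / in each whitelist entry, then join the entries with |
escaped=$(sed 's|[][\^$.*/]|\\&|g' whitelist | paste -s -d '|')
# join soft line breaks, put each tag on its own line,
# then defang http on every line not matching the whitelist
sed -e ':a;N;$!ba;s/=\n//g' -e 's/</\n</g' | sed -r "/$escaped/! s/http/httx/g"
Because the joined expression is expanded inside double quotes rather than spliced between single quotes, the single-quote escaping gymnastics disappear entirely.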
The output of my shell script is as follows:
workflow_Name1
Succeeded
Tue May 19 11:15:33 2015
workflow_Name2
Succeeded
Wed Jun 10 18:00:21 2015
I want this to be changed to
workflow_Name1 :-Succeeded :-Tue May 19 11:15:33 2015
workflow_Name2 :-Succeeded :-Wed Jun 10 18:00:21 2015
Following is the script I am using. Could you please let me know how to achieve this?
#!/bin/bash
# source $HOME/.bash_profile
output=/home/infaprd/cron/output.lst
sqlplus -s user/test@dev <<EOF >$output # Capture output from SQL
set linesize 55 pages 500
spool output_temp.lst;
set head off;
select sysdate from dual;
set head on;
spool off;
EOF
for name in workflow_Name1 workflow_Name2; do
pmcmd getworkflowdetails -Repository ${name}
done |
grep -e "Workflow:" -e "Workflow run status:" -e "End time:" | cut -d'[' -f2 | cut -d']' -f1 |
sed -e 's/ *$//' >> $output
mail -s "Output - `date '+%d-%m-%y'`" akhil#gmail.com <$output
You can do it using awk:
awk '{getline a;getline b; if($0) printf "%-s\n", $0 " :-" a " :-" b}'
Output:
workflow_Name1 :-Succeeded :-Tue May 19 11:15:33 2015
workflow_Name2 :-Succeeded :-Wed Jun 10 18:00:21 2015
You can also use sed to accomplish this task:
sed 'N;N;s/\n/ :-/g'
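For example, in the original script you could append either one to the existing pipeline, so the joining happens before the result is written to $output:
grep -e "Workflow:" -e "Workflow run status:" -e "End time:" | cut -d'[' -f2 | cut -d']' -f1 |
sed -e 's/ *$//' | sed 'N;N;s/\n/ :-/g' >> $output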
Consider the example:
Feb 14 26:00:01 randomtext here mail from user10@mailbox.com more random text
Feb 15 25:08:82 randomtext random text mail from user8@mailbox.com more random text
Jan 20 26:23:89 randomtext iortest test test mail from user6@mailbox.com more random
Mar 15 18:23:01 randomtext here mail from user4@mailbox.com more random text
Jun 15 20:04:01 randomtext here mail from user10@mailbox.com more random text
Using Bash I am trying to retrieve the hour part of the timestamp, for example '26' or '25', and the email of the user, for example 'user10@mailbox.com'.
The output would then roughly look like:
26 user10@mailbox.com
25 user8@mailbox.com
26 user6@mailbox.com
18 user4@mailbox.com
20 user10@mailbox.com
I have tried using:
cat myfile | grep -o '[0-9][0-9].*.com'
but it gives me excess text in the middle.
How would I go about retrieving just the two strings I need?
Use sed with capture groups to select the parts you want.
sed 's/^.* \([0-9][0-9]\):.* mail from \(.*@.*\.com\).*/\1 \2/' myfile
^ = beginning of line
.* = any sequence of characters followed by space
\([0-9][0-9]\): = 2 digits followed by a colon. The digits will be saved in capture group #1
.* mail from = any sequence up to a space followed by mail from and another space
\(.*@.*\.com\) = any sequence followed by @ followed by any sequence up to .com. This will be saved in capture group #2
.* = any sequence; this will match the rest of the line
Everything this matches (the whole line) will be replaced by capture group #1, a space, and capture group #2.
Try
cat myfile | awk '{print $3, $8}' | sed 's/:[0-9][0-9]//g'
Disclaimer: my awk skills are rusty - there should be a way to do this solely in awk without resorting to sed.
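For the record, here is one way to do it solely in awk — a sketch that assumes the email address is the only field containing an @:
awk '{ split($3, t, ":"); for (i = 1; i <= NF; i++) if ($i ~ /@/) print t[1], $i }' myfile
split() breaks the timestamp on its colons, so t[1] is the hour regardless of how many words precede the address on the line.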
If all your email addresses have only the .com domain, the previous answer using sed is perfect.
But if you can have different domains, it's better to improve that sed:
sed 's/^.* \([0-9][0-9]\):.* mail from \(.*@.*\..*\) more.*/\1 \2/' file
With perl:
$ perl -lne '
print "$1 $2" if /^\w+\s+\d+\s+(\d+):\d+:\d+\s+.*?([-\w\.]+#\S+)/
' file.txt
Output:
26 0@mailbox.com
25 8@mailbox.com
26 6@mailbox.com
18 4@mailbox.com
20 0@mailbox.com
Is there a way to get the size of a remote file like
http://api.twitter.com/1/statuses/public_timeline.json
in a shell script?
You can download the file and get its size. But we can do better.
Use curl to get only the response header using the -I option.
In the response headers, look for Content-Length:, which will be followed by the size of the file in bytes.
$ URL="http://api.twitter.com/1/statuses/public_timeline.json"
$ curl -sI $URL | grep -i Content-Length
Content-Length: 134
To get just the size, use a filter to extract the numeric part from the output above:
$ curl -sI $URL | grep -i Content-Length | awk '{print $2}'
134
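One caveat: HTTP header lines end in \r\n, so the value printed above may carry a trailing carriage return that confuses later arithmetic or comparisons. Stripping it is cheap:
$ curl -sI $URL | grep -i Content-Length | awk '{print $2}' | tr -d '\r'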
Two caveats to the other answers:
Some servers don't return the correct Content-Length for a HEAD request, so you might need to do the full download.
You'll likely get an unrealistically large response (compared to a modern browser) unless you specify gzip/deflate headers.
Also, you can do this without grep/awk or piping:
curl 'http://api.twitter.com/1/statuses/public_timeline.json' --location --silent --write-out 'size_download=%{size_download}\n' --output /dev/null
And the same request with compression:
curl 'http://api.twitter.com/1/statuses/public_timeline.json' --location --silent -H 'Accept-Encoding: gzip,deflate' --write-out 'size_download=%{size_download}\n' --output /dev/null
Similar to codaddict's answer, but without the call to grep:
curl -sI http://api.twitter.com/1/statuses/public_timeline.json | awk '/Content-Length/ { print $2 }'
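Note that over HTTP/2 header names arrive all-lowercase (content-length:), so a case-insensitive match is more robust, for example:
curl -sI http://api.twitter.com/1/statuses/public_timeline.json | awk 'tolower($1) == "content-length:" { print $2 }'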
The preceding answers won't work when there are redirections. For example, if one wants the size of the Debian ISO DVD, one must use the --location option; otherwise, the reported size may be that of the 302 Moved Temporarily response body, not that of the real file.
Suppose you have the following URL:
$ url=http://cdimage.debian.org/debian-cd/8.1.0/amd64/iso-dvd/debian-8.1.0-amd64-DVD-1.iso
With curl, you could obtain:
$ curl --head --location ${url}
HTTP/1.0 302 Moved Temporarily
...
Content-Type: text/html; charset=iso-8859-1
...
HTTP/1.0 200 OK
...
Content-Length: 3994091520
...
Content-Type: application/x-iso9660-image
...
That's why I prefer using HEAD, which is an alias for the lwp-request command from the libwww-perl package (on Debian). Another advantage it has is that it strips the extra \r characters, which eases subsequent string processing.
So to retrieve the size of the debian iso DVD, one could do for example:
$ size=$(HEAD ${url})
$ size=${size##*Content-Length: }
$ size=${size%%[[:space:]]*}
Please note that:
this method will require launching only one process
it will work only with bash, because of the special expansion syntax used
For other shells, you may have to resort to sed, awk, grep et al.
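For example, an equivalent sed extraction (same HEAD alias assumed) could be:
$ size=$(HEAD ${url} | sed -n 's/^Content-Length: *//p')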
I think the easiest way to do this would be to:
use cURL to run in silent mode -s,
pull only the headers -I (so as to avoid downloading the whole file)
then do a case-insensitive grep -i
and return the second field using awk $2.
The output is returned in bytes.
Examples:
curl -sI http://api.twitter.com/1/statuses/public_timeline.json | grep -i content-length | awk '{print $2}'
//output: 52
or
curl -sI https://code.jquery.com/jquery-3.1.1.min.js | grep -i content-length | awk '{print $2}'
//output: 86709
or
curl -sI http://download.thinkbroadband.com/1GB.zip | grep -i content-length | awk '{print $2}'
//output: 1073741824
Show as Kilobytes/Megabytes
If you would like to show the size in Kilobytes then change the awk to:
awk '{print $2/1024}'
or Megabytes
awk '{print $2/1024/1024}'
The accepted solution was not working for me; this does:
curl -s https://code.jquery.com/jquery-3.1.1.min.js | wc -c
I have a shell function, based on codaddict's answer, which gives a remote file's size in a human-readable format thusly:
remote_file_size () {
  printf "%q" "$*" |
    xargs curl -sI |
    grep Content-Length |
    awk '{print $2}' |
    tr -d '\040\011\012\015' |
    gnumfmt --to=iec-i --suffix=B
  # `gnumfmt' is GNU numfmt as installed on systems that lack the GNU
  # coreutils by default (BSD, macOS); on Linux, drop the `g' prefix
  # and call plain `numfmt'.
}
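A hypothetical invocation, assuming the jQuery file mentioned above is still 86709 bytes (which numfmt rounds up to 85KiB):
$ remote_file_size https://code.jquery.com/jquery-3.1.1.min.js
85KiB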
This will show you detailed info about an ongoing download; you just need to specify a URL, as in the example below.
$ curl -O -w 'We downloaded %{size_download} bytes\n' \
https://cmake.org/files/v3.8/cmake-3.8.2.tar.gz
Output:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 7328k  100 7328k    0     0   244k      0  0:00:29  0:00:29 --:--:--  365k
We downloaded 7504706 bytes
For automated purposes, you'll just need to add the command to your script file.
Combining all of the above, this works for me:
URL="http://cdimage.debian.org/debian-cd/current/i386/iso-dvd/debian-9.5.0-i386-DVD-1.iso"
curl --head --silent --location "$URL" | grep -i "content-length:" | tr -d " \t" | cut -d ':' -f 2
This will return just the content length in bytes:
3767500800
You can kinda do it like this, including auto-following 301/302 redirections:
curl -ILs 'https://twitter.com/i/csp_report?a=ORTGK%3D%3D%3D&ro=fals' |
mawk 'NF*=!_<NF' \
OFS= FS='^[Cc][Oo][Nn][Tt][Ee][Nn][Tt]-[Ll][Ee][Nn][Gg][Tt][Hh]: '
41
It's very brute force but gets the job done; note that it's whatever raw value the server reports, so you may have to make adjustments to it as you see fit.
You may also have to add the -g flag so it can auto-handle the switchover from vanilla http to https:
curl -gILs 'http://apple.com' |
mawk 'NF *= !_<NF' OFS= \
FS='^[Cc][Oo][Nn][Tt][Ee][Nn][Tt]-[Ll][Ee][Nn][Gg][Tt][Hh]: '
304
106049
(I'm guessing the second value might be the main site, and the first item was the redirection page?)
The question is old and has been sufficiently answered, but let me expand upon the existing answers. If you want to automate this task (checking the file sizes of multiple files), here's a one-liner.
First, write the URLs of the files into a file:
cat url_of_files.txt
https://stpubdata-jwst.stsci.edu/ero/jw02734/jw02734002001/jw02734002001_04101_00001-seg002_nis_x1dints.fits
https://stpubdata-jwst.stsci.edu/ero/jw02734/jw02734002001/jw02734002001_04101_00001-seg003_nis_calints.fits
https://stpubdata-jwst.stsci.edu/ero/jw02734/jw02734002001/jw02734002001_04102_00001-seg001_nis_calints.fits
https://stpubdata-jwst.stsci.edu/ero/jw02734/jw02734002001/jw02734002001_02101_00002-seg001_nis_cal.fits
...
Then, from the command line (in the same directory as your url_of_files.txt):
eval $(sed -rn '/^https/s/(https.*$)/curl -sI \1/p' url_of_files.txt) | awk '/[Cc]ontent-[Ll]ength/{kb=$2/1024;mb=kb/1024;gb=mb/1024;print ( $2>1024 ? ( kb>1024 ? ( mb>1024 ? gb " G" : mb " M") : kb " K" ) : $2 " B" ) }'
This handles file sizes ranging from bytes to GBs. I use this line to check the FITS data files being made available by the JWST team.
It checks the file size and, depending on its magnitude, roughly converts it to an appropriate number with a B, K, M, or G suffix denoting bytes, kilobytes, megabytes, or gigabytes.
result:
...
177.188 K
177.188 K
236.429 M
177.188 K
5.95184 M
1.83608 G
1.20326 G
130.059 M
1.20326 G
...
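If you'd rather avoid the eval, a plain read loop does the same job — a sketch that prints the raw byte count next to each URL (same url_of_files.txt assumed):
while IFS= read -r url; do
  case $url in https*) ;; *) continue ;; esac   # skip non-URL lines
  curl -sIL "$url" | awk -v u="$url" 'tolower($1) == "content-length:" { print u, $2 }'
done < url_of_files.txt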
My solution uses awk's END to make sure we grep only the last Content-Length:
function curl2contentlength() {
curl -sI -L -H 'Accept-Encoding: gzip,deflate' "$1" | grep -i Content-Length | awk 'END{print $2}'
}
curl2contentlength "$@"
./curl2contentlength.sh "https://chrt.fm/track/B63133/stitcher.simplecastaudio.com/ec74d48c-cbf1-4764-923e-7d584dce50fa/episodes/a85954a3-24c3-48ed-bced-ef0607b7149a/audio/128/default.mp3?aid=rss_feed&awCollectionId=ec74d48c-cbf1-4764-923e-7d584dce50fa&awEpisodeId=a85954a3-24c3-48ed-bced-ef0607b7149a&feed=qm_9xx0g"
10806508
In fact, without END it would have been:
0
0
10806508
I grep with the trailing colon ([Cc]ontent-[Ll]ength:) because the server mentioned Content-Length in several headers of its response:
curl -sI "http://someserver.com/hls/125454.ts" | grep '[Cc]ontent-[Ll]ength:' | awk '{ print $2 }'
For reference, the raw response headers were:
Accept-Ranges: bytes
Access-Control-Expose-Headers: Date, Server, Content-Type, Content-Length
Server: WowzaStreamingEngine/4.5.0
Cache-Control: no-cache
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true
Access-Control-Allow-Methods: OPTIONS, GET, POST, HEAD
Access-Control-Allow-Headers: Content-Type, User-Agent, If-Modified-Since, Cache-Control, Range
Date: Tue, 10 Jan 2017 01:56:08 GMT
Content-Type: video/MP2T
Content-Length: 666460
A different solution:
ssh userName@IP ls -s PATH | grep FILENAME | awk '{print $1}'
This gives you the size in KB.
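If GNU coreutils are installed on the remote host, stat gives you the exact size in bytes instead of rounded kilobytes:
ssh userName@IP stat -c %s PATH/FILENAME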