How to extract specific information from a text file table (UNIX/shell scripting)

I'm really new to UNIX/shell scripting. I'm trying to extract disk usage from numerous servers, so I'm writing a shell script that runs
df -g > diskusage.txt to produce the following table and then extract the values marked with ** below:
Filesystem Size Used Avail Use% Mounted on
/dev/ibm_lv 84.00 56.81 33% 637452 5% /usr/IBM
/dev/apps_lv 10.00 9.95 **1%** 5 1% /usr/apps
/dev/upi_lv 110.00 85.85 **22%** 90654 1% /usr/app/usr
user08:/backup 2000.00 1611.22 20% 177387 1% /backup
Depending on the server there are more filesystems, but I only want the disk usage for /usr/app/usr and /usr/apps, regardless of how many filesystems there are. (/usr/app/usr and /usr/apps will always be in the last three rows.)
I'm pretty sure there are simpler ways than reading the last three lines, discarding the last one, and searching for % on each line.
If there is a better way to extract this data, please let me know.

df -g | awk '/\/usr\/app/ {print $4}'
That gets you the available percentages, but it doesn't tell you which one goes with which. You can always include the mountpoint in the output, but then you still have to do some parsing to get the numbers out, something like this:
while read avail mount; do
    echo "$mount has $avail available"
done < <(df -g | awk '/\/usr\/app/ {print $4, $NF}')
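If you need the values in shell variables rather than just printed, one small variation is possible. This is only a sketch: it assumes bash 4+ for associative arrays, and the variable names are illustrative.
#!/bin/bash
# Collect each matching mount's percentage (field 4) into an array keyed by mountpoint.
declare -A avail
while read -r pct mount; do
    avail[$mount]=$pct
done < <(df -g | awk '/\/usr\/app/ {print $4, $NF}')
echo "/usr/apps:    ${avail[/usr/apps]}"
echo "/usr/app/usr: ${avail[/usr/app/usr]}"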

Related

What is the way to read tables displayed after commands in bash

Is there a good way to read the results of bash commands that are displayed as a table, when I need a particular row and column from them?
For example, when I run this line:
gcloud app instances describe $instance_name --service=postprocessing --version=$instance_version
I get this:
startTime: '2020-08-03T16:29:29.142Z'
vmDebugEnabled: true
vmIp: xx.xx.xxx.xxx
vmStatus: RUNNING
I need only the IP from this output; how do I get it?
The only way I found is to save the whole output as an array and then take the third element from the end. I was wondering if there is a better way to do it?
Also, for other types of output with more "columns" and a header, like this one:
mst#cloudshell:~ (me)$ free -m
              total        used        free      shared  buff/cache   available
Mem:           1995         469         279           0        1247        1379
Swap:           767           0         767
How do I get the free value of the Swap row, for example?
I suggest using awk to get the value you want in the output.
To set the $VMIP variable to the vmIp value (2nd field):
VMIP=$(gcloud app ... | awk '/vmIp:/ {print $2}' )
To set the $SWAP_FREE variable to the Swap free value (4th field):
SWAP_FREE=$(free -m | awk '/Swap:/ {print $4}')
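For the gcloud case specifically, you can often skip the parsing altogether by asking gcloud to print just the field you want. A sketch, assuming vmIp is a top-level key of the describe output as shown above:
# Use gcloud's own projection instead of awk (value() prints the bare field).
VMIP=$(gcloud app instances describe $instance_name \
        --service=postprocessing --version=$instance_version \
        --format="value(vmIp)")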

Enhanced docker stats command with total amount of RAM and CPU

I just want to share a small script that I made to enhance the docker stats command.
I am not sure about the exactitude of this method.
Can I assume that the total amount of memory consumed by the complete Docker deployment is the sum of each container consumed memory ?
Please share your modifications and/or corrections. This command is documented here: https://docs.docker.com/engine/reference/commandline/stats/
When running docker stats, the output looks like this:
$ docker stats --all --format "table {{.MemPerc}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.Name}}"
MEM %               CPU %               MEM USAGE / LIMIT       NAME
0.50%               1.00%               77.85MiB / 15.57GiB     ecstatic_noether
1.50%               3.50%               233.55MiB / 15.57GiB    stoic_goodall
0.25%               0.50%               38.92MiB / 15.57GiB     drunk_visvesvaraya
My script will add the following line at the end:
2.25% 5.00% 350.32MiB / 15.57GiB TOTAL
docker_stats.sh
#!/bin/bash
# This script is used to complete the output of the docker stats command.
# The docker stats command does not compute the total amount of resources (RAM or CPU)
# Get the total amount of RAM, assumes there are at least 1024*1024 KiB, therefore > 1 GiB
HOST_MEM_TOTAL=$(grep MemTotal /proc/meminfo | awk '{print $2/1024/1024}')
# Get the output of the docker stat command. Will be displayed at the end
# Without clearing the special variable IFS, the unquoted expansion of the docker stats output
# below would lose its newlines, which breaks the per-line awk processing
IFS=;
DOCKER_STATS_CMD=$(docker stats --no-stream --format "table {{.MemPerc}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.Name}}")
SUM_RAM=$(echo $DOCKER_STATS_CMD | tail -n +2 | sed "s/%//g" | awk '{s+=$1} END {print s}')
SUM_CPU=$(echo $DOCKER_STATS_CMD | tail -n +2 | sed "s/%//g" | awk '{s+=$2} END {print s}')
SUM_RAM_QUANTITY=$(LC_NUMERIC=C printf %.2f $(echo "$SUM_RAM*$HOST_MEM_TOTAL*0.01" | bc))
# Output the result
echo $DOCKER_STATS_CMD
echo -e "${SUM_RAM}%\t\t\t${SUM_CPU}%\t\t${SUM_RAM_QUANTITY}GiB / ${HOST_MEM_TOTAL}GiB\tTOTAL"
From the documentation that you have linked above,
The docker stats command returns a live data stream for running containers.
To limit data to one or more specific containers, specify a list of container names or ids separated by a space.
You can specify a stopped container but stopped containers do not return any data.
and then furthermore,
Note: On Linux, the Docker CLI reports memory usage by subtracting page cache usage from the total memory usage.
The API does not perform such a calculation but rather provides the total memory usage and the amount from the page cache so that clients can use the data as needed.
Given that, it looks like you can assume so, but keep in mind that the sum also factors in containers that exist but are not running.
Your docker_stats.sh does the job for me, thanks!
I had to add unset LC_ALL somewhere before LC_NUMERIC is used, though, as the former overrides the latter; otherwise I get this error:
"Zeile 19: printf: 1.7989: Ungültige Zahl." ("line 19: printf: 1.7989: invalid number"). This is probably due to my using a German locale.
There is also a discussion to add this feature to the "docker stats" command itself.
Thanks for sharing the script! I've updated it so that it depends on DOCKER_MEM_TOTAL instead of HOST_MEM_TOTAL, since Docker has its own memory limit, which can differ from the host's total memory.
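A minimal sketch of that variant, assuming docker info exposes the daemon's memory total via the MemTotal field (it reports bytes, so it is converted to GiB here):
# Derive the memory total from Docker itself instead of /proc/meminfo.
DOCKER_MEM_TOTAL=$(docker info --format '{{.MemTotal}}' | awk '{printf "%.2f", $1/1024/1024/1024}')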

How to zgrep the last line of a gz file without tail

Here is my problem: I have a set of big gz log files, and the very first piece of information on each line is a datetime text, e.g.: 2014-03-20 05:32:00.
I need to check which set of log files holds specific data.
For the initial record I simply do a:
zgrep -m 1 '^20140320-04' 20140320-0{3,4}*gz
BUT how do I do the same for the last line, without processing the whole file as would be done with zcat (too heavy):
zcat foo.gz | tail -1
Additional info: those logs are created with the datetime of their initial record, so if I want to query logs at 14:00:00 I have to search, also, in files created BEFORE 14:00:00, as a file could be created at 13:50:00 and closed at 14:10:00.
The easiest solution would be to alter your log rotation to create smaller files.
The second easiest solution would be to use a compression tool that supports random access.
Projects like dictzip, BGZF, and csio each add sync flush points at various intervals within gzip-compressed data, which allow a program aware of that extra information to seek within the file. While sync flush points exist in the standard, vanilla gzip does not add such markers, either by default or by option.
Files compressed by these random-access-friendly utilities are slightly larger (by perhaps 2-20%) due to the markers themselves, but fully support decompression with gzip or another utility that is unaware of these markers.
You can learn more at this question about random access in various compression formats.
There's also a "Blasted Bioinformatics" blog by Peter Cock with several posts on this topic, including:
BGZF - Blocked, Bigger & Better GZIP! – gzip with random access (like dictzip)
Random access to BZIP2? – An investigation (result: can't be done, though I do it below)
Random access to blocked XZ format (BXZF) – xz with improved random access support
Experiments with xz
xz (an LZMA compression format) actually has random access support on a per-block level, but you will only get a single block with the defaults.
File creation
xz can concatenate multiple archives together, in which case each archive gets its own block. GNU split can do this easily:
split -b 50M --filter 'xz -c' big.log > big.log.sp.xz
This tells split to break big.log into 50MB chunks (before compression) and run each one through xz -c, which outputs the compressed chunk to standard output. We then collect that standard output into a single file named big.log.sp.xz.
To do this without GNU, you'd need a loop:
split -b 50M big.log big.log-part
for p in big.log-part*; do xz -c $p; done > big.log.sp.xz
rm big.log-part*
Parsing
You can get the list of block offsets with xz --verbose --list FILE.xz. If you want the last block, you need its compressed size (column 5) plus 36 bytes for overhead (found by comparing the size to hd big.log.sp0.xz |grep 7zXZ). Fetch that block using tail -c and pipe that through xz. Since the above question wants the last line of the file, I then pipe that through tail -n1:
SIZE=$(xz --verbose --list big.log.sp.xz |awk 'END { print $5 + 36 }')
tail -c $SIZE big.log.sp.xz |unxz -c |tail -n1
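Wrapped up as a small helper function (a sketch only; the 36-byte overhead constant comes from the comparison above and may differ for other xz versions or options):
xzlast() {
    # Print the last line of a multi-block .xz file without decompressing all of it.
    local file=$1 size
    size=$(xz --verbose --list "$file" | awk 'END { print $5 + 36 }')
    tail -c "$size" "$file" | unxz -c | tail -n1
}
xzlast big.log.sp.xz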
Side note
Version 5.1.1 introduced support for the --block-size flag:
xz --block-size=50M big.log
However, I have not been able to extract a specific block since it doesn't include full headers between blocks. I suspect this is nontrivial to do from the command line.
Experiments with gzip
gzip also supports concatenation. I (briefly) tried mimicking this process for gzip without any luck. gzip --verbose --list doesn't give enough information and it appears the headers are too variable to find.
This would require adding sync flush points, and since their size depends on the size of the last buffer in the previous compression, that's too hard to do on the command line (use dictzip or another of the previously discussed tools).
I did apt-get install dictzip and played with dictzip, but just a little. It doesn't work without arguments, creating a (massive!) .dz archive that neither dictunzip nor gunzip could understand.
Experiments with bzip2
bzip2 has headers we can find. This is still a bit messy, but it works.
Creation
This is just like the xz procedure above:
split -b 50M --filter 'bzip2 -c' big.log > big.log.sp.bz2
I should note that this is considerably slower than xz (48 min for bzip2 vs 17 min for xz vs 1 min for xz -0) as well as considerably larger (97M for bzip2 vs 25M for xz -0 vs 15M for xz), at least for my test log file.
Parsing
This is a little harder because we don't have the nice index. We have to guess at where to go, and we have to err on the side of scanning too much, but with a massive file, we'd still save I/O.
My guess for this test was 50000000 (out of the original 52428800, a pessimistic guess that isn't pessimistic enough for e.g. an H.264 movie.)
GUESS=50000000
LAST=$(tail -c$GUESS big.log.sp.bz2 \
|grep -abo 'BZh91AY&SY' |awk -F: 'END { print '$GUESS'-$1 }')
tail -c $LAST big.log.sp.bz2 |bunzip2 -c |tail -n1
This takes just the last 50 million bytes, finds the binary offset of the last BZIP2 header, subtracts that from the guess size, and pulls that many bytes off of the end of the file. Just that part is decompressed and thrown into tail.
Because this has to query the compressed file twice and has an extra scan (the grep call seeking the header, which examines the whole guessed space), this is a suboptimal solution. See also the below section on how slow bzip2 really is.
Perspective
Given how fast xz is, it's easily the best bet; using its fastest option (xz -0) is quite fast to compress or decompress and creates a smaller file than gzip or bzip2 on the log file I was testing with. Other tests (as well as various sources online) suggest that xz -0 is preferable to bzip2 in all scenarios.
                 ————— No Random Access ——————    ——————— Random Access ———————
FORMAT           SIZE     RATIO    WRITE   READ   SIZE     RATIO    WRITE   SEEK
—————————————    —————————————————————————————    —————————————————————————————
(original)       7211M    1.0000       -   0:06   7211M    1.0000       -   0:00
bzip2              96M    0.0133   48:31   3:15     97M    0.0134   47:39   0:00
gzip               79M    0.0109    0:59   0:22
dictzip                                            605M    0.0839    1:36   (fail)
xz -0              25M    0.0034    1:14   0:12     25M    0.0035    1:08   0:00
xz                 14M    0.0019   16:32   0:11     14M    0.0020   16:44   0:00
Timing tests were not comprehensive: I did not average anything, and disk caching was in use. Still, they look correct; there is a very small amount of overhead from split plus launching 145 compression instances rather than just one (this may even be a net gain if it allows an otherwise non-multithreaded utility to consume multiple threads).
Well, you can randomly access a gzipped file if you previously create an index for each file ...
I've developed a command line tool which creates indexes for gzip files which allow for very quick random access inside them:
https://github.com/circulosmeos/gztool
The tool has two options that may be of interest for you:
The -S option supervises a still-growing file and creates an index for it as it grows - this can be useful for gzipped rsyslog files, as in practice it reduces the index-creation time to zero.
The -t option tails a gzip file; this way you can do: $ gztool -t foo.gz | tail -1
Please note that if the index doesn't exist, this will take the same time as a complete decompression: but as the index is reusable, subsequent searches will be greatly reduced in time!
This tool is based on zran.c demonstration code from original zlib, so there's no out-of-the-rules magic!

How do I get the percentage of used storage in the UNIX server [closed]

I want to get only the percentage of used disk space on the UNIX server.
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       457G   90G  344G  21% /
udev            2.0G  4.0K  2.0G   1% /dev
tmpfs           798M  1.1M  797M   1% /run
none            5.0M     0  5.0M   0% /run/lock
none            2.0G   23M  2.0G   2% /run/shm
cgroup          2.0G     0  2.0G   0% /sys/fs/cgroup
I am using the following command to get the Percentage data
df -h > space.txt
space=`head -2 space.txt | tail -1 | cut -d' ' -f15 | sed 's/%.*$//'`
Is there any command to get the "Used percentage" directly?
You could use this:
used=$(df / | awk 'END{print $5}')
echo $used
56%
Rather than running df without specifying the filesystem you actually mean and then looking for it in a mass of lines, I specify / as the filesystem up front, so I know the result will be on the last line. I take advantage of that by using END in awk to get the 5th field of the last line only.
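If your df comes from GNU coreutils (8.21 or later, I believe), you can also ask it for just the percentage column; a sketch:
# Print only the Use% column for /, then strip everything but the digits.
used=$(df --output=pcent / | tail -n 1 | tr -dc '0-9')
echo "$used"    # e.g. 56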
Using GNU coreutils df:
$ df -P | awk '/\/dev\/sd/ { print $1, $5, $6 } '
/dev/sda3 54% /
/dev/sda2 95% /media/data
Note the use of the -P flag to force each mount to print on exactly one line.
This is a requirement for predictable results when scripting.
This example includes both device $1 and mountpoint $6. You can drop either or both as you wish.
‘-P’
‘--portability’
Use the POSIX output format. This is like the default format except for the following:
The information about each file system is always printed on exactly one line; a mount device is never put on a line by itself. This means that if the mount device name is more than 20 characters long (e.g., for some network mounts), the columns are misaligned.
The labels in the header output line are changed to conform to POSIX.
The default block size and output format are unaffected by the DF_BLOCK_SIZE, BLOCK_SIZE and BLOCKSIZE environment variables. However, the default block size is still affected by POSIXLY_CORRECT: it is 512 if POSIXLY_CORRECT is set, 1024 otherwise. See Block size.

Script to send alert mail if disk usage exceeds a percentage

I am new to shell scripting, and want to implement a script on my server which will automatically send e-mail alerts if:
Disk usage exceeds 90%
Disk usage exceeds 95% (In addition to the previous e-mail)
My filesystem is abc:/xyz/abc and my mount is /pqr. How can I set this up via scripts?
You can use the df command to check the filesystem usage. As a starting point, you can use the command below:
df -h | awk -v val=90 '$1=="/pqr"{x=int($5)>val?1:0;print x}'
The above command prints 1 if the usage is above the threshold, and 0 otherwise. The threshold is set in val.
Note: Please ensure the 5th column of your df output is the use percentage; otherwise, use the appropriate column.
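To turn that check into the actual alerts, here is a minimal sketch: the recipient address is a placeholder, and it assumes a working mail/mailx setup and that column 5 of df -P is Use% on your system.
#!/bin/bash
# Cron-friendly sketch: warn at 90%, and additionally alert at 95% (as requested above).
MOUNT=/pqr
RECIPIENT=admin@example.com   # placeholder address
# Last line of df -P output for the mount; gsub strips the % sign before printing field 5.
USAGE=$(df -P "$MOUNT" | awk 'END {gsub(/%/, ""); print $5}')
if [ "$USAGE" -ge 90 ]; then
    echo "Disk usage on $MOUNT is ${USAGE}%" | mail -s "WARNING: $MOUNT at ${USAGE}%" "$RECIPIENT"
fi
if [ "$USAGE" -ge 95 ]; then
    echo "Disk usage on $MOUNT is ${USAGE}%" | mail -s "CRITICAL: $MOUNT at ${USAGE}%" "$RECIPIENT"
fi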
