bash awk file compare

I have a config file:
[LogicalUnit1] UnitInquiry "NFSN00Y5IP51ZL" LUN0 /mnt/extent0 64MB
[LogicalUnit2] UnitInquiry "NFSN00N49CQL28" LUN0 /mnt/extent1 64MB
[LogicalUnit3] UnitInquiry "NFSNBRGQOCXK" LUN0 /mnt/extent4 10MB
[LogicalUnit4] UnitInquiry "NFSNE7IXADFJ" LUN0 /mnt/extent5 25MB
This file is read by a bash script; I use awk to parse it and extract the variables:
awk '/UnitInquiry/ {print $1, $3, $5, $6}' "$ctld_config" | while read -r a b c d ; do
    if [ -f "$a" ]
    then
        ctladm create -b block -o file="$c" -S "$b" -d "$a"
        ctladm devlist -v > "$lun_config"
    else
        truncate -s "$d" "$c" ; ctladm create -b block -o file="$c" -S "$b" -d "$a"
    fi
done
This initializes the LUNs properly at boot. However, if I add a LUN, it recreates all of them again. How can I compare what is running with what is configured, and only initialize the ones that are not already live? There is a command to list the devices:
ctladm devlist -v
LUN Backend Size (Blocks) BS Serial Number Device ID
0 block 131072 512 "NFSN00Y5IP51ZL [LogicalUnit1]
lun_type=0
num_threads=14
file=/mnt/extent0
1 block 131072 512 "NFSN00N49CQL28 [LogicalUnit2]
lun_type=0
num_threads=14
file=/mnt/extent1
2 block 20480 512 "NFSNBRGQOCXK" [LogicalUnit3]
lun_type=0
num_threads=14
file=/mnt/extent4
3 block 51200 512 "NFSNE7IXADFJ" [LogicalUnit4]
lun_type=0
num_threads=14
file=/mnt/extent5

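Why not add the following after the then:
ctladm devlist -v | grep -qF -- "$a" && continue
This will:
run the command that shows the currently active devices;
check whether the LogicalUnit name you want to register is already listed; and, if so,
skip the rest of the loop body.
The -F option makes grep treat $a as a fixed string. Without it, a name such as [LogicalUnit1] is read as a bracket expression and matches any line that contains one of those characters.
If $a (the logical unit name) is not unique enough, you can grep for another, more unique identifier instead, e.g. the serial number.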
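The same idea can be sketched as a separate pass that prints only the configured LUNs that are not live yet. This is a minimal sketch, not a tested ctladm integration: the function names list_live_devices and missing_luns are made up for illustration, and list_live_devices returns canned text here so the example runs anywhere; on a real system its body would be the ctladm devlist -v call.

```shell
# Stand-in for `ctladm devlist -v` (assumption: canned output for illustration).
list_live_devices() {
    cat <<'EOF'
0 block 131072 512 "NFSN00Y5IP51ZL" [LogicalUnit1]
file=/mnt/extent0
1 block 131072 512 "NFSN00N49CQL28" [LogicalUnit2]
file=/mnt/extent1
EOF
}

# Print "id serial file size" for every config entry whose device ID
# does not appear in the live device list. $1 is the config file.
missing_luns() {
    local live
    live=$(list_live_devices)
    awk '/UnitInquiry/ {print $1, $3, $5, $6}' "$1" |
    while read -r id serial file size; do
        # -F: fixed-string match, so the brackets in the ID are taken literally
        if printf '%s\n' "$live" | grep -qF -- "$id"; then
            continue
        fi
        printf '%s %s %s %s\n' "$id" "$serial" "$file" "$size"
    done
}
```

The output of missing_luns could then be fed to the existing while read loop, so that truncate and ctladm create only run for the entries that are actually missing.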

Related

echo 3 statements based wc -l count number returned in output

Team, the command below works fine, but I want to enhance it to produce the results described below.
My goal is to report the count and display a statement along with it.
There are three conditions to handle:
1 - if result = 0 mounts not found
2 - if result = 1-64, mounts found under 64
3 - if result = 64+, mounts found above 64
if count is 0 I want to output:
0 mounts found on this hostname
if 1-64 mounts found, then I want to say whatever number is found
x mounts found on hostname.
if anything beyond 64 mounts are found, then i want to say
x mounts found on hostname that are more than 64
mount | grep csi | grep -e /dev/sd | wc -l && echo "mounts found on $HOSTNAME"
I am trying to learn how to compare the returned count to 64 and display a statement accordingly. I need a single-line shell command for all of this, not multiple lines, because I need to fit it into the Ansible shell module.
sample output:
mount | grep csi
tmpfs on /var/lib/kubelet/pods/abaa868f-2109-11ea-a1f8-ac1f6b5995dc/volumes/kubernetes.io~secret/csi-nodeplugin-token-type tmpfs (rw,relatime)
/host/dev/sdc on /var/lib/kubelet/pods/11ea-a1f8-ac1f6b5995dc/volumes/kubernetes.io~csi/ea6728b2-08d0-5fb7-b93a-5f63e49f770c/mount type iso9660 (ro,relatime,nojoliet,check=s,map=n,blocksize=2048,fsc,readahead=4096)
mount | grep csi | grep /dev/sd
/host/dev/sdc on /var/lib/kubelet/pods/11ea-a1f8-ac1f6b5995dc/volumes/kubernetes.io~csi/b93a-5f63e49f770c/mount type iso9660 (ro,relatime,nojoliet,check=s,map=n,blocksize=2048,fsc,readahead=4096)
Any hint as to why the attempt below is not working?
Attempted solution, with awk and comparison operators:
mount | grep -Ec '/dev/sd.*\<csi' | awk '$0 = 0 { printf "No mounts found", $0,"TRUE" ; } ($0 > 0 && $0 <= 64) { print "Mounts are less than 64", $0 ;} $0 > 64 { print "Mounts are more than 64", $0 ;}'
output:
node1
expected:
node1 No mounts found

With an extended and optimized pipeline:
mount | grep -Ec '/dev/sd.*\<csi' \
| awk '{ print $0,"mounts found on hostname"($0>64? " that are more than 64." : ".") }'
grep's -c option suppresses normal output and prints a count of matching lines instead.
The symbols \< and \> match the empty string at the beginning and end of a word, respectively.
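As for why the attempt in the question prints nothing: in awk, $0 = 0 is an assignment, not a comparison. It overwrites the record with 0, evaluates as false, and leaves the later patterns testing 0 as well; the comparison operator is ==. A minimal sketch that passes the host name in with -v, so that the real name is printed instead of the literal word hostname; the function name report_mounts is an assumption made for illustration, and the awk program itself still fits on one line.

```shell
# Hypothetical helper: reads a count on stdin and prints the message for host $1.
report_mounts() {
    awk -v host="$1" '{ msg = $0 " mounts found on " host; if ($0 > 64) msg = msg " that are more than 64"; print msg }'
}

# Typical call (not run here):
#   mount | grep -Ec '/dev/sd.*\<csi' | report_mounts "$HOSTNAME"
```

A count of 0 needs no special case here, because the wanted message for it has the same shape as the one for 1 to 64.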

How to loop through a list and pass as variable in bash / awk

Updated question:
I have a config.file in which I define a few variables that are ultimately called in a different script.
$ cat config.file
#1 Accession number ref
ref=L41223.2
#2 Accession number SRA
SRA=SRA7361534
#3 Path to SRA
path_SRA='/Volumes/5TB/sra/'
#4 Path to ref
path_ref='/Volumes/5TB/results/species1/'
Item #3 (the path to SRA) is constant and never changes. For the other variables ($ref, $SRA and $path_ref), I would like to read the values one by one from different fields of an input.file:
$ cat input.file
species1 L41223.2 SRA7361534
species2 D45023.5 SRA9473231
species3 L42823.6 SRA0918881
...
All these variables are called several times in a script.sh:
#!/bin/bash
# Path to the configuration file
. /Users/Main/config.file
# Use NCBI's e-utilities to download reference files
esearch -db nucleotide -query $ref | efetch -format fasta > $path_ref$ref.fasta
# Using NCBI's sratoolkit to download SRA file
prefetch $SRA
cd $path_SRA
mv *.sra $path_ref
# Decompress the SRA file
cd $path_ref; if fastq-dump --split-3 $SRA.sra ; then
echo "SRA file successfully decompressed. Deleting the SRA file now..."
rm $SRA.sra
else
echo "Could not decompress SRA file"
fi
# Use bwa to align DNA reads to the reference sequence
cd $path_ref;
bwa index -p INDEX $ref.fasta
bwa aln -t $core INDEX *_1.fastq > 1.sai
bwa aln -t $core INDEX *_2.fastq > 2.sai
bwa sampe INDEX 1.sai 2.sai *_1.fastq *_2.fastq | samtools view -hq 5 > $SRA.Q5.sam
# Use samtools for conversion
samtools view -bT $ref.fasta $SRA.Q5.sam > $SRA.Q5.bam
samtools sort $SRA.Q5.bam -o $SRA.sorted
# use bedtools for coverage
bedtools genomecov -d -ibam $SRA.sorted.bam > $SRA.gencov.txt
# use awk for extraction
awk '$2 ~ /81|161|97|145/ {print $0}' $SRA.Q5.sam > $SRA.OTW.sam
samtools view -bT $ref.fasta $SRA.OTW.sam > $SRA.OTW.bam
samtools sort $SRA.OTW.bam -o $SRA.OTW.sorted.bam
# Extract FLAG, POS, CIGAR and TLEN for outward-oriented reads
awk '$2 ~ /81|161|97|145/ {print $2, $4, $6, $9}' $SRA.Q5.sam > $SRA.OTW.txt
# Get per-base coverage for outward-oriented reads
bedtools genomecov -d -ibam $SRA.OTW.sorted.bam > $SRA.OTW.gencoverage.txt
# Simplify the output by averaging read coverage over 50 bp window; prints the average count value and last genomic position
awk '{sum+=$3; count++} FNR % 50 == 0 {print $2, (sum/count); count=sum = ""}' $SRA.OTW.gencoverage.txt > $SRA.OTW.50sum.txt
#### End of the script
What I would like to do is "read" from the input.file into the config.file. The first field (species1...) would be used as input for $path_ref, field 2 (L41223.2...) would be used as input for $ref and third field (SRA7361534...) would be used as input for $SRA variable. Once the first round (basically the first line) has been done, the script.sh would run again and read fields 1,2 and 3 from the line 2 and so on. Basically, a loop, but somewhat more complicated than the answer below because different variables are called at different places in the script.
This works fine for one variable, however I couldn't implement it with three different variables called throughout the script:
while read -r c1 c2 c3; do
bwa index -p INDEX ${c2}.fasta
# place rest of your script here
done < input.file
Many thanks in advance.

In script.sh, after the line . /Users/Main/config.file, add these lines:
number_of_inputs=$(wc -l < input.file)
for (( i=1 ; i <= number_of_inputs ; i++ )); do
# extract columns $1, $2, $3 from line $i of input.file
species=$(awk "NR==$i{print \$1}" input.file)
ref=$(awk "NR==$i{print \$2}" input.file)
SRA=$(awk "NR==$i{print \$3}" input.file)
path_ref="/Volumes/5TB/results/$species/"
Then add a done at the end of the file, so that the whole script loops over the lines of input.file, setting the variables from each line in turn.
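The while read loop from the question can also carry all three variables at once, which avoids re-reading input.file with awk for every field. A minimal sketch, assuming the field order described in the question (species, ref, SRA) and that path_ref is built from the species name under /Volumes/5TB/results/; the wrapper name run_all and the variable name species are made up for illustration.

```shell
# Hypothetical wrapper: loop over the lines of the input file given as $1.
run_all() {
    while read -r species ref SRA; do
        [ -n "$species" ] || continue          # skip blank lines
        path_ref="/Volumes/5TB/results/${species}/"
        # The body of script.sh would go here and use $ref, $SRA and $path_ref.
        # This sketch only prints the values it would run with.
        printf '%s %s %s\n' "$ref" "$SRA" "$path_ref"
    done < "$1"
}
```

One caution with this form: any command inside the loop that reads standard input will consume lines of the input file, so such commands should have their input redirected, for example from /dev/null.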

Bash error: Integer expression expected

In the sections below, you'll see the shell script I am trying to run on a UNIX machine, along with a transcript.
When I run this program, it gives the expected output but it also gives an error shown in the transcript. What could be the problem and how can I fix it?
First, the script:
#!/usr/bin/bash
while read A B C D E F
do
E=`echo $E | cut -f 1 -d "%"`
if test $# -eq 2
then
I=`echo $2`
else
I=90
fi
if test $E -ge $I
then
echo $F
fi
done
And the transcript of running it:
$ df -k | ./filter.sh -c 50
./filter.sh: line 12: test: capacity: integer expression expected
/etc/svc/volatile
/var/run
/home/ug
/home/pg
/home/staff/t
/packages/turnin
$ _

Before the line that says:
if test $E -ge $I
temporarily place the line:
echo "[$E]"
and you'll find something very much non-numeric, and that's because the output of df -k looks like this:
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sdb1 954316620 212723892 693109608 24% /
udev 10240 0 10240 0% /dev
: :
The offending line there is the first, which will have its fifth field Use% turned into Use, which is definitely not an integer.
A quick fix may be to change your usage to something like:
df -k | sed -n '2,$p' | ./filter.sh -c 50
or:
df -k | tail -n+2 | ./filter.sh -c 50
Either of those extra filters (sed or tail) will print only from line 2 onwards.
If you're open to not needing a special script at all, you could probably just get away with something like:
df -k | awk -vlimit=40 '$5+0>=limit&&NR>1{print $5" "$6}'
The way it works is to operate only on lines where both of the following hold:
the fifth field, converted to a number, is at least equal to the limit passed in with -v; and
the record number (line) is two or greater.
It then simply outputs the relevant information for those matching lines.
This particular example outputs the usage (as a percentage like 42%) and the mount point but, if you just want the mount point as per your script, change the print to output $6 on its own: {print $6}.
Alternatively, if you want the percentage without the %, you can use the same method I used in the conditional: {print $5+0" "$6}.
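If you would rather keep a shell loop, the loop itself can skip any line whose fifth field is not a plain number, so the header no longer reaches test. A minimal sketch, assuming the six-column df layout shown above; the function name filter_usage and its single limit argument are made up for illustration and replace the -c option handling of the original script.

```shell
# Hypothetical helper: reads df-style lines on stdin and prints the mount
# point of every line whose usage is at least the limit in $1 (default 90).
filter_usage() {
    local limit=${1:-90}
    while read -r fs blocks used avail pct mount; do
        pct=${pct%\%}                      # drop a trailing % sign
        case $pct in
            ''|*[!0-9]*) continue ;;       # not an integer: header or odd line
        esac
        if [ "$pct" -ge "$limit" ]; then
            printf '%s\n' "$mount"
        fi
    done
}

# Typical call (not run here):  df -k | filter_usage 50
```

The case guard means the function also tolerates other header spellings, such as the capacity column name that appears in the transcript.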

bash Assign Result Query to Variable

I have an unknown number of volume groups, with unknown names, and an unknown number of disks assigned to each of them.
Example :
pvs -o pv_name,vg_name
PV VG
/dev/vdd appvg01
/dev/vdb appvg01
/dev/vdf3 vg00
/dev/vdh testvg
vgs --noheadings | awk '{print $1}'| while read line ; do echo $line;vgs --noheadings -o pv_name $line; done
appvg01
/dev/vdd
/dev/vdb
testvg
/dev/vdh
vg00
/dev/vdf3
At the final stage I'd like to mirror each volume with a new disk that I'll add manually:
for i in `/sbin/lvs| /bin/awk '{if ($2 ~ /appvg01/) print $1}'`; do
/sbin/lvconvert -b -m0 appvg01/$i /dev/vde
done
However, I don't know in advance which volume group name to match on, as it could be anything.
What is the best approach for this structure?
Thanks

The correct data structure for storing this kind of information in bash is an associative array:
declare -A pvs
{
read # skip the header
while read -r pv vg; do
pvs[$pv]=$vg
done
} < <(pvs -o pv_name,vg_name)
Thereafter, you can iterate and do lookups:
for pv in "${!pvs[@]}"; do
vg="${pvs[$pv]}"
echo "vg $vg is backed by pv $pv"
done
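Since the question is organized per volume group rather than per disk, the array can also be keyed the other way round, collecting the physical volumes of each group. A minimal sketch, assuming bash 4 or later and input in the two-column pvs -o pv_name,vg_name format shown in the question; the function name group_pvs_by_vg is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "PV VG" lines on stdin (header first) and
# prints one line per volume group with its physical volumes.
group_pvs_by_vg() {
    local -A by_vg=()
    local pv vg
    read -r _                              # skip the header line
    while read -r pv vg; do
        [ -n "$vg" ] || continue           # ignore PVs without a volume group
        # append, separating entries with a space after the first one
        by_vg[$vg]+="${by_vg[$vg]:+ }$pv"
    done
    for vg in "${!by_vg[@]}"; do
        printf '%s: %s\n' "$vg" "${by_vg[$vg]}"
    done | sort
}

# Typical call (not run here):  pvs -o pv_name,vg_name | group_pvs_by_vg
```

The final sort is only there to make the order predictable, because bash does not guarantee any particular order when iterating over the keys of an associative array.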

syntax error: operand expected (error token is ">= 75 ")

#!/bin/bash
CURRENT=$(df -h / | grep / | awk '{ print $4}')
THRESHOLD=75
if (( "$CURRENT" >= "$THRESHOLD" )); then
mail -s "CENTOS-6 localhost 10.10.1.238 Disk Space Alert" sss@abc.net << EOF
Your root partition remaining free space is critically low. Used: $CURRENT%
EOF
fi
When I run the script, I get the following error: syntax error: operand expected (error token is ">= 75 ")

It's because CURRENT contains a percent sign, so it is not a valid operand for the comparison.
You can remove the last character like this:
CURRENT=${CURRENT%?}
Also make sure that df -h / | grep / | awk '{ print $4}' actually returns the usage percentage; on most systems you have to use print $5.
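The wording of the error follows from how arithmetic evaluation reads a value such as 24%: the % is taken as the modulo operator, bash then expects another operand, and finds >= 75 instead. A minimal sketch of stripping the sign before comparing; the helper name strip_percent and the sample value 80% are assumptions made for illustration, not output from a real df call.

```shell
# Hypothetical helper: remove one trailing percent sign, if present.
strip_percent() {
    printf '%s\n' "${1%\%}"
}

CURRENT=$(strip_percent "80%")   # sample value standing in for the df field
THRESHOLD=75

if [ "$CURRENT" -ge "$THRESHOLD" ]; then
    echo "Disk usage is at ${CURRENT}%, at or above the ${THRESHOLD}% threshold"
fi
```

Unlike ${CURRENT%?}, which removes whatever the last character happens to be, the pattern used here only removes a literal % and leaves an already clean number untouched.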

A couple of things:
You don't need grep at all; awk is quite capable of doing its own regex matching.
If you search for / in the df output, you'll probably get most lines, as most mounts have a / somewhere in them. If you just want the root mount point, you can use <space>/$.
Check that 4 is the correct field number; on my box it's 5.
In any case, that field is of the form 55%, which will not be considered numeric. You can use gsub to get rid of the % sign.
With that in mind, the following snippet can be used to get the percentage:
df -h | awk '$0 ~ / \/$/ { gsub("%","",$5); print $5 }'
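A variant of the same idea, written so that it can be tried on canned input: it selects the line whose last field is exactly / rather than matching a regex against the whole line. This is a minimal sketch, assuming the usage percentage is in field 5 and that the device name contains no spaces; the function name root_usage is made up for illustration.

```shell
# Hypothetical helper: reads df-style output on stdin and prints the Use%
# value, without the % sign, of the line whose last field is exactly "/".
root_usage() {
    awk '$NF == "/" { sub(/%$/, "", $5); print $5 }'
}

# Typical call (not run here):  df -h | root_usage
```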
And, just as an aside, I'm not that big a fan of here-docs in shell scripts since it either (1) screws up my nicely indented files; or (2) makes me burn half an hour while I try to remember the various syntax options which will allow indented EOF strings :-)
I prefer something like:
(
echo Your root partition remaining free space is critically low: Used: ${CURRENT}%
) | mail -s "CENTOS-6 localhost 10.10.1.238 Disk Space Alert" sss@abc.net
Especially since that means I can put arbitrarily complex commands in the sub-shell to generate whatever info I want in the mail message (rather than just simple text substitutions).
So, bottom line, I'd be looking at something more like:
#!/usr/bin/env bash
# Config section.
LIMIT=75
# Code section.
CURR=$(df -h | awk '$0 ~ / \/$/ { gsub("%","",$5); print $5 }')
if [[ ${CURR} -ge ${LIMIT} ]] ; then
(
echo "Your root partition remaining free space is critically low: Used: ${CURR}%"
) | mail -s "CENTOS-6 localhost 10.10.1.238 Disk Space Alert" sss@abc.net
fi

Just try:
CURRENT=$(df -h | awk '{print $4}' | sort -n | tail -n1 | sed 's/%//g')
THRESHOLD=90
if [ "$THRESHOLD" -gt "$CURRENT" ]; then
# the highest value found in column 4 is below the threshold
:
fi
