Checking if strings exist in a file (ksh) - shell

What this KornShell (ksh) script should do is check my dmesg output for disks, both internal and external. In the dmesg output, internal drives appear as wd[0-9] and external ones as sd[0-9]. Of course I do not have that many disks, but I want the script to cover as many possibilities as possible, so devices 0-9 will be checked. This is the idea:
Create two arrays of size 10; search dmesg for wd0 and, if it exists, set the first element of internals to 1; if wd1 exists, set the second element to 1, and to 0 otherwise; and so on.
If it were to search for one specific disk, e.g. wd0, I could do something like:
internal=`dmesg | grep "^wd0" | head -n 1 | cut -d' ' -f1`
which sets
internal=wd0
But how do I check whether the strings wd0-wd9 exist in the dmesg output in a "loopy" way?
# create arrays
set -A internals 0 0 0 0 0 0 0 0 0 0
set -A externals 0 0 0 0 0 0 0 0 0 0
(the code below is not ksh code; it just presents the idea in C-like syntax):
for (i = 0; i <= 9; i++) {
    if (wd_i exists in dmesg)   /* that is, if wd0 exists, if wd1 exists, etc. */
        internals[i] = 1;
    else
        internals[i] = 0;
}
And of course the same process should be followed for the externals.

I think the following should get you going; I'm not exactly sure what values you want your arrays set to, but here goes:
#!/bin/ksh
set -A internal
for i in {0..9}
do
    echo "Looking for wd$i"
    internal[$i]=`dmesg | grep "^wd$i" | head -n 1 | cut -d' ' -f1`
    if [[ ${internal[$i]} = "wd$i" ]]; then
        internal[$i]=1
    else
        internal[$i]=0
    fi
done
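If all you need is the 1/0 flags, you don't need cut at all: grep -q reports a match through its exit status alone, and one loop can fill both arrays. A minimal sketch along the same lines (the explicit 0-9 list avoids depending on brace expansion support):
#!/bin/ksh
set -A internals 0 0 0 0 0 0 0 0 0 0
set -A externals 0 0 0 0 0 0 0 0 0 0
for i in 0 1 2 3 4 5 6 7 8 9
do
    # grep -q is silent and exits 0 only on a match
    dmesg | grep -q "^wd$i" && internals[$i]=1
    dmesg | grep -q "^sd$i" && externals[$i]=1
done
echo "internals: ${internals[*]}"
echo "externals: ${externals[*]}"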

Related

Is there a way to change floating to whole number in for loop in bash

I have a bash loop that copies 2 files from the HPC to my local drive, iterating over the processors and all the timesteps. On the HPC the timesteps are saved as
1 2 3
whereas the bash loop generates
1.0 2.0 3.0
probably because of the 0.5 increment. Is there a way to change $j to a whole number (without the decimal) when running the script?
Script I use:
for i in $(seq 0 1 23)
do
    mkdir Run1/processor$i
    for j in $(seq 0 0.5 10)
    do
        mkdir Run1/processor$i/$j
        scp -r xx@login.hpc.xx.xx:/scratch/Run1/processor$i/$j/p Run1/processor$i/$j/
        scp -r xx@login.hpc.xx.xx:/scratch/Run1/processor$i/$j/U Run1/processor$i/$j/
    done
done
Result:
scp: /scratch/Run1/processor0/1.0/p: No such file or directory
The correct directory that exists is
/scratch/Run1/processor0/1
Thanks!
Well, yes! But it depends on what the end result should be.
I will assume you want to floor the decimal number. I can think of two options:
pipe the number to cut
do a little bit of Perl
for i in $(seq 0 1 23); do
    for j in $(seq 0 0.5 10); do
        # pipe to cut
        echo /scratch/Run1/processor$i/$(echo $j | cut -f1 -d".")/U Run1/processor"$i/$j"/
        # pipe to perl
        echo /scratch/Run1/processor$i/$(echo $j | perl -nl -MPOSIX -e 'print floor($_);')/U Run1/processor"$i/$j"/
    done
done
Result:
...
/scratch/Run1/processor23/9/U Run1/processor23/9/
/scratch/Run1/processor23/9/U Run1/processor23/9.5/
/scratch/Run1/processor23/9/U Run1/processor23/9.5/
/scratch/Run1/processor23/10/U Run1/processor23/10/
/scratch/Run1/processor23/10/U Run1/processor23/10/
Edit:
Experimented a little and found another way:
echo /scratch/Run1/processor$i/${j%%.[[:digit:]]}/U Run1/processor"$i/$j"/
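Applied to the original script, the same stripping can be done on $j before building the remote path. A minimal sketch using the parameter-expansion variant (host and paths are the placeholders from the question; like the output above, this floors 9.5 to 9):
for i in $(seq 0 1 23); do
    mkdir -p Run1/processor$i
    for j in $(seq 0 0.5 10); do
        d=${j%%.[[:digit:]]}   # 1.0 -> 1, 9.5 -> 9
        mkdir -p Run1/processor$i/$j
        scp -r xx@login.hpc.xx.xx:/scratch/Run1/processor$i/$d/p Run1/processor$i/$j/
        scp -r xx@login.hpc.xx.xx:/scratch/Run1/processor$i/$d/U Run1/processor$i/$j/
    done
done
If the half timesteps do exist on the remote side as 0.5, 1.5, and so on, strip only a trailing .0 instead: d=${j%.0}.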

Bash - Count frequency of palindromes from text file

This is a follow-up to my other post:
Printing all palindromes from text file
I want to be able to print the number of palindromes that I have found in my text file, similar to a frequency table. It'll show the count followed by the word, in a format like this:
100 did
32 sas
17 madam
My code right now is:
#!/usr/bin/env bash
function search
{
    grep -oiE '[a-z]{3,}' "$1" | sort -n | tr '[:upper:]' '[:lower:]' | while read -r word; do
        [[ $word == $(rev <<< "$word") ]] && echo "$word" | uniq -c
    done
}
search "$1"
Compared to my last post, Printing all palindromes from text file, I have added "sort -n" and "uniq -c". From my knowledge, "sort -n" is to sort the palindromes found in alphabetical order, and "uniq -c" is to print the number of occurrences of the words found.
Just to test the script, I have a test file named "testingfile.txt". It contains:
testing words testing words testing words
palindromes
Sas
Sas
Sas
sas
bob
Sas
Sas
Sas Sas madam
midim poop goog tot sas did i want to go to the movies did
otuikkiuto
pop
poop
This file is just so I can test before trying this script on a much larger file, where it'll take much longer.
When I type this in the console ("palindrome" is the name of my script):
source palindrome testingfile.txt
The output appears like this:
1 bob
1 did
1 did
1 goog
1 madam
1 midim
1 otuikkiuto
1 poop
1 poop
1 pop
1 sas
1 sas
1 sas
1 sas
1 sas
1 sas
1 sas
1 sas
1 sas
1 tot
Is there something I am missing to get the result that I want:
9 sas
2 did
2 poop
1 bob
1 goog
1 madam
1 midim
1 otuikkiuto
1 pop
1 tot
Solutions to this would be greatly appreciated! If a solution needs other commands, an explanation of the reasoning behind them would also be greatly appreciated.
Thank you
You missed two important details:
You need to pass all of the input at once to a single uniq -c to count the words, not one word at a time, each to its own uniq.
uniq expects its input to be sorted. The sort you had in the grep pipeline is ineffective, because after the transformation to lowercase, the values would need to be sorted again.
You can apply sort | uniq -c to the output of the entire loop,
by piping the loop itself:
grep -oiE '[a-z]{3,}' "$1" | tr '[:upper:]' '[:lower:]' | while read -r word; do
    [[ $word == $(rev <<< "$word") ]] && echo "$word"
done | sort | uniq -c
Finally, to get an output sorted in descending order by count,
you need to further pipe the output to sort -nr.
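Putting it together, a minimal sketch of the full corrected script, with the final sort -nr appended so the counts come out in descending order:
#!/usr/bin/env bash
function search
{
    grep -oiE '[a-z]{3,}' "$1" | tr '[:upper:]' '[:lower:]' | while read -r word; do
        # keep only the words that read the same reversed
        [[ $word == $(rev <<< "$word") ]] && echo "$word"
    done | sort | uniq -c | sort -nr
}
search "$1"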

Creating histograms in bash

EDIT
I read the question that this is supposed to be a duplicate of (this one). I don't agree. In that question the aim is to get the frequencies of individual numbers in the column. However if I apply that solution to my problem, I'm still left with my initial problem of grouping the frequencies of the numbers in a particular range into the final histogram. i.e. if that solution tells me that the frequency of 0.45 is 2 and 0.44 is 1 (for my input data), I'm still left with the problem of grouping those two frequencies into a total of 3 for the range 0.4-0.5.
END EDIT
QUESTION-
I have a long column of data with values between 0 and 1.
This will be of the type-
0.34
0.45
0.44
0.12
0.45
0.98
.
.
.
A long column of decimal values with repetitions allowed.
I'm trying to change it into a histogram sort of output such as (for the input shown above)-
0.0-0.1 0
0.1-0.2 1
0.2-0.3 0
0.3-0.4 1
0.4-0.5 3
0.5-0.6 0
0.6-0.7 0
0.7-0.8 0
0.8-0.9 0
0.9-1.0 1
Basically the first column has the lower and upper bounds of each range and the second column has the number of entries in that range.
I wrote it (badly) as-
for i in $(seq 0 0.1 0.9)
do
    awk -v var=$i '{ if ($1 > var && $1 < var+0.1) print $1 }' input | wc -l
done
This basically does a wc -l on the entries it finds in each range.
Output formatting is not a part of the problem. If I simply get the frequencies corresponding to the different bins, that will be good enough. Also please note that the bin size should be a variable, as in my proposed solution.
I already read this answer and want to avoid the loop. I'm sure there's a much, much faster way in awk that bypasses the for loop. Can you help me out here?
Following the same algorithm as my previous answer, I wrote a script in awk which is extremely fast.
The script is the following:
#!/usr/bin/awk -f
BEGIN {
    bin_width = 0.1;
}
{
    bin = int(($1 - 0.0001) / bin_width);
    if (bin in hist) {
        hist[bin] += 1
    } else {
        hist[bin] = 1
    }
}
END {
    for (h in hist)
        printf " * > %2.2f -> %i \n", h * bin_width, hist[h]
}
The bin_width is the width of each channel. To use the script, copy it into a file, make it executable (with chmod +x <namefile>) and run it with ./<namefile> <name_of_data_file>.
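Since the question asks for the bin size to stay a variable: awk applies var=value operands after BEGIN but before the data file is read, so you can override the default without editing the script. A small usage sketch (histogram.awk is a hypothetical name for the file above):
# count with 0.05-wide bins instead of the default 0.1
./histogram.awk bin_width=0.05 <name_of_data_file>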
For this specific problem, I would drop the last digit, then count occurrences of the sorted data:
cut -b1-3 input | sort | uniq -c
which gives, on the specified input set:
1 0.1
1 0.3
3 0.4
1 0.9
Output formatting can be done by piping through this awk command:
| awk 'BEGIN{r=0.0}
{while($2>r){printf "%1.1f-%1.1f %3d\n",r,r+0.1,0;r=r+.1}
printf "%1.1f-%1.1f %3d\n",$2,$2+0.1,$1}
END{while(r<0.9){printf "%1.1f-%1.1f %3d\n",r,r+0.1,0;r=r+.1}}'
The only loop in this algorithm is over the lines of the file.
This is an example of how to do what you asked in bash. Bash is probably not the best language for this, since it is slow at math; I use bc, but you can use awk if you prefer.
How the algorithm works
Imagine you have many bins: each bin corresponds to an interval and is characterized by a width (CHANNEL_DIM) and a position. Together, the bins must cover the entire interval your data fall into. Dividing a value by the bin width gives the index of the bin it belongs to, so you just add 1 to that bin's counter. Here is a much more detailed explanation.
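For example, with a channel width of 0.1, the value 0.37 is placed like this (the same two steps the find_channel function in the script below performs):
echo "0.37 / 0.1" | bc -l    # 3.70000000000000000000
printf '%.0f\n' 3.7          # rounds to 4: the value goes into channel 4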
#!/bin/bash
# This is the input: you can use $1 and $2 to read input as cmd line arguments
FILE='bash_hist_test.dat'
CHANNEL_NUMBER=9  # There are actually 10: 0 is already a channel

# Check the max and the min to define the dimension of the channels:
MAX=`sort -n $FILE | tail -n 1`
MIN=`sort -rn $FILE | tail -n 1`

# Define the channel width
CHANNEL_DIM_LONG=`echo "($MAX-$MIN)/($CHANNEL_NUMBER)" | bc -l`
CHANNEL_DIM=`printf '%2.2f' $CHANNEL_DIM_LONG`
# Probably printf is not the best function in this context because
#+the result could be system dependent.

# Determine the channel for a given number
# Usage: find_channel <number_to_histogram> <width_of_histogram_channel>
function find_channel(){
    NUMBER=$1
    CHANNEL_DIM=$2
    # The channel is found by dividing the value by the channel width and
    #+rounding it.
    RESULT_LONG=`echo $NUMBER/$CHANNEL_DIM | bc -l`
    RESULT=`printf '%.0f' $RESULT_LONG`
    echo $RESULT
}

# Read the file and do the computation
while IFS='' read -r line || [[ -n "$line" ]]; do
    CHANNEL=`find_channel $line $CHANNEL_DIM`
    [[ -z ${HIST[$CHANNEL]} ]] && HIST[$CHANNEL]=0
    let HIST[$CHANNEL]+=1
done < $FILE

counter=0
for i in ${HIST[*]}; do
    CHANNEL_START=`echo "$CHANNEL_DIM * $counter - .04" | bc -l`
    CHANNEL_END=`echo "$CHANNEL_DIM * $counter + .05" | bc`
    printf '%+2.1f : %2.1f => %i\n' $CHANNEL_START $CHANNEL_END $i
    let counter+=1
done
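A quick way to try it (the input file name is hard-coded in the script; bash_hist.sh is a hypothetical name for the file above):
printf '%s\n' 0.34 0.45 0.44 0.12 0.45 0.98 > bash_hist_test.dat
chmod +x bash_hist.sh
./bash_hist.sh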
Hope this helps. Comment if you have other questions.

Incrementing a single number in a history line

So I'm having difficulty figuring this out.
What I am trying to do is display the most recently entered command.
Let's use this as an example:
MD5=$(cat $DICT | head -1 | tail -1 | md5sum)
This command has just been executed; it is contained inside a shell script.
After it is executed, the output is checked in an if..then..else statement.
If the condition is met, I want it to run the command above, except incremented by one each time it is run.
For instance:
MD5=$(cat $DICT | head -1 | tail -1 | md5sum)
if test ! $MD5=$HASH  # $HASH is a user-defined MD5 hash; this checks whether $MD5 does NOT equal the user's $HASH
then  # one-liner to display the history, to display the most recent
    "MD5=$(cat $DICT | head -1 | tail -1 | md5sum)"  # pipe it to remove the column count, then increment the "head -1" to "head -2"
else
    echo "The hash is the same."
fi  # I also need this if..then..else statement to run until the "else" condition is met.
Can anyone help? Please and thank you; I'm having a brain fart.
I was thinking of using sed or awk to increment, and grep to display the most recent of the commands.
So say:
$ history 3
Would output:
1 MD5=$(cat $DICT | head -1 | tail -1 | md5sum)
2 test ! $MD5=$HASH
3 history 3
-
$ history 3 | grep MD5
Would output:
1 MD5=$(cat $DICT | head -1 | tail -1 | md5sum)
Now I want it to remove the leading 1, add 1 to head's value, and rerun that command, sending it back through the if..then..else test.
UPDATED
If I understood your problem correctly, this can be a solution:
# Set up the test environment
DICT=infile
cat >"$DICT" <<XXX
Kraftwerk
King Crimson
Solaris
After Cyring
XXX
HASH=$(md5sum <<<"After Cyring")

# Process the input file and look for a match
while read line; do
    md5=$(md5sum <<<"$line")
    ((++count))
    [ "$HASH" == "$md5" ] && echo "The hash is the same. ($count)" && break
done <$DICT
Output:
The hash is the same. (4)
I improved the script a little bit. It spares one more clone(2) and pipe(2) call by using the md5sum<<<word notation instead of echo word|md5sum.
First it sets up the test environment, creating infile and a HASH. Then it reads each line of the input file, computes the MD5 checksum, and checks whether it matches HASH. If so, it writes a message to stdout and breaks the loop.
IMHO the original problem was a little bit over-thought.
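For completeness, here is a sketch of the literal increment-the-head idea from the question: bump the line number handed to head on every pass until the hashes match (assuming $DICT and $HASH are set as above):
n=1
total=$(wc -l < "$DICT")
while [ "$n" -le "$total" ]; do
    # Nth line of the file, hashed: the "head -1 | tail -1" from the question, with 1 incremented
    MD5=$(head -"$n" "$DICT" | tail -1 | md5sum)
    if [ "$MD5" = "$HASH" ]; then
        echo "The hash is the same. (line $n)"
        break
    fi
    n=$((n + 1))
done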

Bash script that reads from files has garbled output

I am very new to Bash scripting. I am trying to write a script that works with two files. Each line of the files looks like this:
INST <_variablename_> = <_value_>;
The two files share many variables, but they are in a different order, so I can't just diff them. What I want to do is go through the files and find all the variables that have different values, or all the variables that are specified in one file but not the other.
Here is my script so far. Again, I'm very new to Bash so please go easy on me, but also feel free to suggest improvements (I appreciate it).
#!/bin/bash
line_no=1
while read LINE
do
    search_var=`echo $LINE | awk '{print $2}'`
    result_line=`grep -w $search_var file2`
    if [ $? -eq 1 ]
    then
        echo "$line_no: not found [ $search_var ]"
    else
        value=`echo $LINE | awk '{print $4}'`
        result_value=`echo $result_line | awk '{print $4}'`
        if [ "$value" != "$result_value" ]
        then
            echo "$line_no: mismatch [ $search_var , $value , $result_value ]"
        fi
    fi
    line_no=`expr $line_no + 1`
done < file1
Now here's an example of some of the output that I'm getting:
111: mismatch [ TXAREFBIASSEL , TRUE; , "TRUE"; ]
, 4'b1100; ] [ TXTERMTRIM , 4'b1100;
113: not found [ VREFBIASMODE ]
, 2'b00; ]ch [ CYCLE_LIMIT_SEL , 2'b00;
, 3'b100; ]h [ FDET_LCK_CAL , 3'b101;
The first line is what I would expect (I'll deal with the quotes later). On the second, fourth, and fifth lines, it looks like the final value is overwriting the "line_no: mismatch" part. Furthermore, on the second and fourth lines, the values DO match: it shouldn't print anything at all!
I asked my friend about this, and his suggestion was "Do it in Perl." So I'm learning Perl right now, but I'd still like to know what's going on and why this is happening.
Thank you!
EDIT:
Sigh. I figured out the problem. One of the files had Unix line breaks, and the other had DOS line breaks. I actually thought this might be the case, but I also thought that vi was supposed to display some character if it opened a dos-ended file. Since they looked the same, I assumed that they were the same.
Thanks for your help and suggestions everybody!
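For anyone who lands on the same symptom: the stray carriage returns are what make the terminal jump back and overwrite the start of each line. Stripping them fixes the input; a minimal one-liner (file2 standing in for whichever file is DOS-ended):
tr -d '\r' < file2 > file2.unix && mv file2.unix file2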
Rather than simply replacing the Bash language with Perl, how about a paradigm shift?
diff -w <(sort file1) <(sort file2)
This will sort both files, so that the variables will appear in the same order in each, and will diff the results (ignoring whitespace differences, just for fun).
This may give you more or less what you need, without any "code" per se. Note that you could also sort the files into intermediate files and run diff on those if you find that easier...I happen to like doing it with no temporary files.
What about this? A count of 2 means the line is available in both files with the same value; the other values can be parsed out easily.
sort 1.txt 2.txt | uniq -c
2 a = 10
1 b = 20
1 b = 40
1 c = 10
1 c = 30
1 e = 50
Or like this, to get just your keys and values:
sed 's|INST \(.*\) = \(.*\)|\1 = \2|' 1.txt 2.txt | sort | uniq -c
2 a = 10
1 b = 20
1 b = 40
1 c = 10
1 c = 30
1 e = 50
