Bash: checking substring increments with modular arithmetic - bash

I have a list of files with file names that contain a substring of 6 numbers that represents HHMMSS, HH: 2 digits hour, MM: 2 digits minutes, SS: 2 digits seconds.
If the list of files is ordered, the increments should be in steps of 30 minutes, that is, the first substring should be 000000, followed by 003000, 010000, 013000, ..., 233000.
I want to check that no file is missing iterating the list of files and checking that neither of these substrings is missing. My approach:
string_check=000000
for file in "${file_list[@]}"; do
if [[ ${file:22:6} == $string_check ]]; then
echo "Ok"
else
echo "Problem: an hour (file) is missing"
exit 99
fi
string_check=$((string_check+3000)) #this is the key line
done
The second-to-last line is the key one. The result should be formatted to 6 digits, which I know how to do, but I want to add time like a clock, that is, with modular arithmetic modulo 60 on the minutes. How can that be done?

Assumptions:
all 6-digit strings are of the format xx[03]000 (ie, the minutes must be exactly 00 or 30, with no seconds)
strings like xx1529 will be ignored (see the 2nd half of the answer, the use of comm, which addresses OP's comment that these types of strings should be an error)
Instead of trying to do a bunch of mod 60 math for the MM (minutes) portion of the string, we can use a sequence generator to generate all the desired strings:
$ for string_check in {00..23}{00,30}00; do echo $string_check; done
000000
003000
010000
013000
... snip ...
230000
233000
While OP should be able to add this to the current code, I'm thinking we might go one step further and look at pre-parsing all of the filenames, pulling the 6-digit strings into an associative array (ie, the 6-digit strings act as the indexes), eg:
unset myarray
declare -A myarray
for file in "${file_list[@]}"
do
myarray[${file:22:6}]+=" ${file}" # in case multiple files have same 6-digit string
done
Using the sequence generator as the driver of our logic, we can pull this together like such:
for string_check in {00..23}{00,30}00
do
[[ -z "${myarray[${string_check}]}" ]] &&
echo "Problem: (file) '${string_check}' is missing"
done
NOTE: OP can decide if the process should finish checking all strings or if it should exit on the first missing string (per OP's current code).
One idea for using comm to compare the 2 lists of strings:
# display sequence generated strings that do not exist in the array:
comm -23 <(printf "%s\n" {00..23}{00,30}00) <(printf "%s\n" "${!myarray[@]}" | sort)
# OP has commented that strings not like 'xx[03]000' should generate an error;
# display strings (extracted from file names) that do not exist in the sequence
comm -13 <(printf "%s\n" {00..23}{00,30}00) <(printf "%s\n" "${!myarray[@]}" | sort)
Where:
comm -23 - display only the lines from the first 'file' that do not exist in the second 'file' (ie, missing sequences of the format xx[03]000)
comm -13 - display only the lines from the second 'file' that do not exist in the first 'file' (ie, filenames with strings not of the format xx[03]000)
These lists could then be used as input to a loop, or passed to xargs, for additional processing as needed; keeping in mind the comm -13 output will display the indices of the array, while the associated contents of the array will contain the name of the original file(s) from which the 6-digit string was derived.
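If the clock-style increment from the question is still wanted, it could be sketched roughly as follows. This is only an illustration, not part of the answer above: next_slot is a hypothetical helper name, and it assumes the input is always a well-formed HHMMSS string with the seconds at 00.

```bash
#!/usr/bin/env bash
# next_slot (hypothetical helper): advance an HHMMSS string by 30 minutes,
# wrapping around at midnight (24 * 60 = 1440 minutes per day).
next_slot() {
    local t=$1
    # 10# forces base 10, so "08" and "09" are not rejected as invalid octal
    local hh=$((10#${t:0:2}))
    local mm=$((10#${t:2:2}))
    local total=$(( (hh * 60 + mm + 30) % 1440 ))
    printf '%02d%02d00\n' "$((total / 60))" "$((total % 60))"
}

next_slot 003000   # prints 010000
next_slot 233000   # prints 000000
```

In the loop from the question, the key line would then become string_check=$(next_slot "$string_check").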

This is easy to do in a POSIX shell using only built-ins:
#!/usr/bin/env sh
# Print an x for each glob matched file, and store result in string_check
string_check=$(printf '%.0sx' ./*[0-2][0-9][03]000*)
# Now string_check length reflects the number of matches
if [ ${#string_check} -eq 48 ]; then
echo "Ok"
else
echo "Problem: an hour (file) is missing"
exit 99
fi
Alternatively:
#!/usr/bin/env sh
if [ "$(printf '%.0sx' ./*[0-2][0-9][03]000*)" \
= 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' ]; then
echo "Ok"
else
echo "Problem: an hour (file) is missing"
exit 99
fi
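The count tells you whether 48 files matched, but not which slot is missing. A rough POSIX-only sketch that names the missing slots might look like this. check_slots is a hypothetical helper, and it assumes the 6-digit string can appear anywhere in the file name, so the glob is deliberately loose and may need tightening for real file names.

```sh
#!/usr/bin/env sh
# check_slots (hypothetical helper): report every HHMM00 slot that has no
# matching file in directory $1; returns non-zero if any slot is missing.
check_slots() {
    dir=$1
    missing=0
    h=0
    while [ "$h" -lt 24 ]; do
        for m in 00 30; do
            slot=$(printf '%02d%s00' "$h" "$m")
            # Replace the positional parameters with the glob matches;
            # an unmatched glob stays literal, so test that $1 exists.
            set -- "$dir"/*"$slot"*
            if [ ! -e "$1" ]; then
                echo "missing: $slot"
                missing=$((missing + 1))
            fi
        done
        h=$((h + 1))
    done
    [ "$missing" -eq 0 ]
}

# Example call (the current directory is an assumption):
# check_slots . || exit 99
```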

Related

shell script to create multiple files, incrementing from last file upon next execution

I'm trying to create a shell script that will create multiple files (or a batch of files) of a specified amount. When the amount is reached, script stops. When the script is re-executed, the files pick up from the last file created. So if the script creates files 1-10 on first run, then on the next script execution should create 11-20, and so on.
#!/bin/bash
NAME=XXXX
valid=true
NUMBER=1
while [ $NUMBER -le 5 ];
do
touch $NAME$NUMBER
((NUMBER++))
echo $NUMBER + "batch created"
if [ $NUMBER == 5 ];
then
break
fi
touch $NAME$NUMBER
((NUMBER+5))
echo "batch complete"
done
Based on my comment above and your description, you can write a script that will create 10 numbered files (by default) each time it is run, starting with the next available number. As mentioned, rather than just use a raw-unpadded number, it's better for general sorting and listing to use zero-padded numbers, e.g. 001, 002, ...
If you just use 1, 2, ... then you end up with odd sorting each time you reach a power of 10. Consider the first 12 files numbered 1...12 without padding. A general listing sort would produce:
file1
file11
file12
file2
file3
file4
...
Where 11 and 12 are sorted before 2. Adding leading zeros with printf -v avoids the problem.
Taking that into account, and allowing the user to change the prefix (first part of the file name) by giving it as an argument, and also change the number of new files to create by passing the count as the 2nd argument, you could do something like:
#!/bin/bash
prefix="${1:-file_}" ## beginning of filename
number=1 ## start number to look for
ext="txt" ## file extension to add
newcount="${2:-10}" ## count of new files to create
printf -v num "%03d" "$number" ## create 3-digit start number
fname="$prefix$num.$ext" ## form first filename
while [ -e "$fname" ]; do ## while filename exists
number=$((number + 1)) ## increment number
printf -v num "%03d" "$number" ## form 3-digit number
fname="$prefix$num.$ext" ## form filename
done
while ((newcount--)); do ## loop newcount times
touch "$fname" ## create filename
((! newcount)) && break; ## newcount 0, break (optional)
number=$((number + 1)) ## increment number
printf -v num "%03d" "$number" ## form 3-digit number
fname="$prefix$num.$ext" ## form filename
done
Running the script without arguments will create the first 10 files, file_001.txt - file_010.txt. Run a second time, it would create 10 more files file_011.txt to file_020.txt.
To create a new group of 5 files with the prefix of list_, you would do:
bash scriptname list_ 5
Which would result in the 5 files list_001.txt to list_005.txt. Running again with the same options would create list_006.txt to list_010.txt.
Since the scheme above with 3 digits is limited to 1000 files max (if you include 000), there isn't a big need to get the number from the last file written (bash can count to 1000 quite fast). However, if you used 7-digits, for 10 million files, then you would want to parse the last number with ls -1 | tail -n 1 (or version sort and choose the last file). Something like the following would do:
number=$(ls -1 "$prefix"* | tail -n 1 | grep -o '[1-9][0-9]*')
(note: that is ls -(one) not ls -(ell))
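As an alternative to parsing ls output, the highest existing number could also be found with a glob and parameter expansion. The following is only a sketch under the same prefix/number/extension naming scheme used above; last_number is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# last_number (hypothetical helper): print the highest number found in files
# named <prefix><digits>.<ext>, or 0 if there are none.
last_number() {
    local prefix=$1 ext=$2 f n max=0
    for f in "$prefix"*."$ext"; do
        [[ -e $f ]] || continue              # glob matched nothing
        n=${f#"$prefix"}                     # strip the prefix
        n=${n%."$ext"}                       # strip the extension
        [[ $n =~ ^[0-9]+$ ]] || continue     # skip names without a pure number
        if (( 10#$n > max )); then           # 10# avoids octal on leading zeros
            max=$((10#$n))
        fi
    done
    echo "$max"
}

# Example: number=$(( $(last_number "$prefix" "$ext") + 1 ))
```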
Let me know if that is what you are looking for.

How to iterate over two strings simultaneously ksh

I'm using data that is returned by another person's ksh93 script in the format of a print to the standard output. Depending on the flag I give it, their script gives me the information I need for my code. It comes out like a list separated by spaces, such that a run of the program has the format of:
"1 3 4 7 8"
"First Third Fourth Seventh Eighth"
For what I'm working on, I need to be able to match the entries of each output, so that I could make the information print in the following format:
1:First
3:Third
4:Fourth
7:Seventh
8:Eighth
I need to do more than just printing with the data, I just need to be able to access the pairs of information in each of the strings. Even though the actual contents of the strings can be any number of values, the two strings I get from running the other script will always be the same length.
I'm wondering if there exists a way to iterate over both at the same time, something along the lines of:
str_1=$(other_script -f)
str_2=$(other_script -i)
for a,b in ${str_1},${str_2} ; do
print "${a}:${b}"
done
This obviously isn't the right syntax, but I have been unable to find a way to make it work. Is there a way to iterate over both at the same time?
I know I could convert them to arrays first then iterate by numerical element, but I would like to save the time of converting them if there's a way to iterate over both simultaneously.
Why do you think it is not quick to convert the strings to arrays?
For example:
#!/bin/ksh93
set -u
set -A line1
string1="1 3 4 7 8"
line1+=( ${string1} )
set -A line2
string2="First Third Fourth Seventh Eighth"
line2+=( ${string2} )
typeset -i num_elem_line1=${#line1[@]}
typeset -i num_elem_line2=${#line2[@]}
typeset -i loop_counter=0
if (( num_elem_line1 == num_elem_line2 ))
then
while (( loop_counter < num_elem_line1 ))
do
print "${line1[${loop_counter}]}:${line2[${loop_counter}]}"
(( loop_counter += 1 ))
done
fi
As with the other comments, not sure why an array would be out of the question, especially if you plan on referencing the individual elements more than once later in your code.
A sample script that assumes you want to maintain your str_1/str_2 variables as strings; we'll load into arrays for referencing individual elements:
$ cat testme
#!/bin/ksh
str_1="1 3 4 7 8"
str_2="First Third Fourth Seventh Eighth"
str1=( ${str_1} )
str2=( ${str_2} )
# at this point matching array elements have the same index (0..4) ...
echo "++++++++++ str1[index]=element"
for i in "${!str1[@]}"
do
echo "str1[${i}]=${str1[${i}]}"
done
echo "++++++++++ str2[index]=element"
for i in "${!str1[@]}"
do
echo "str2[${i}]=${str2[${i}]}"
done
# since matching array elements have the same index, we just need
# to loop through one set of indexes to allow us to access matching
# array elements at the same time ...
echo "++++++++++ str1:str2"
for i in "${!str1[@]}"
do
echo "${str1[${i}]}:${str2[${i}]}"
done
echo "++++++++++"
And a run of the script:
$ testme
++++++++++ str1[index]=element
str1[0]=1
str1[1]=3
str1[2]=4
str1[3]=7
str1[4]=8
++++++++++ str2[index]=element
str2[0]=First
str2[1]=Third
str2[2]=Fourth
str2[3]=Seventh
str2[4]=Eighth
++++++++++ str1:str2
1:First
3:Third
4:Fourth
7:Seventh
8:Eighth
++++++++++
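If you really want to avoid named arrays, one possible sketch is to load the second string into the positional parameters and shift through them while looping over the first. pair_up is a hypothetical helper name; the sketch assumes both strings have the same number of whitespace-separated words and contain no glob characters, since the expansions are left unquoted on purpose.

```sh
# pair_up (hypothetical helper): print "key:value" for matching words of
# the two space-separated strings passed as $1 and $2.
pair_up() {
    keys=$1
    set -- $2              # unquoted on purpose: split string 2 into $1, $2, ...
    for k in $keys; do     # unquoted on purpose: split string 1 into words
        printf '%s:%s\n' "$k" "$1"
        shift              # move on to the next word of string 2
    done
}

pair_up "1 3 4 7 8" "First Third Fourth Seventh Eighth"
```

This prints the same 1:First ... 8:Eighth pairs as the array version above.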

Random word Bash script if a number is supplied as the first command line argument then it will select from only words with that many characters

I am trying to create a Bash script that
- prints a random word
- if a number is supplied as the first command line argument then it will select from only words with that many characters.
This is my go at the first section (print a random word):
C=$(sed -n "$RANDOM p" /usr/share/dict/words)
echo $C
I am really stuck with the second section. Can anyone help?
This might help someone coming from Ryan's tutorial:
#!/bin/bash
charlen=$1
grep -E "^.{$charlen}$" "$PWD/words.txt" | shuf -n 1
You can use a while loop to read every line of that file and check whether the length of each word (including apostrophes) equals the specified number. On my system the file has 99171 lines.
#!/usr/bin/env bash
readWords() {
declare -i int="$1"
(( int == 0 )) && {
printf "%s\n" "$int is 0, can't find 0 words"
return 1
}
while IFS= read -r getWords; do
if [[ ${#getWords} -eq $int ]];then
printf "%s\n" "$getWords"
fi
done < /usr/share/dict/words
}
readWords 20
This function takes a single argument. The declare -i command coerces the argument into an integer; if the argument is a non-numeric string, it is coerced to 0. Since no word has 0 characters, the function returns early if the argument is 0 (or a string coerced to 0).
It then reads every line in /usr/share/dict/words, gets the length of each line with ${#getWords} (the ${#name} form gives the length of a string and ${#array[@]} gives the number of elements in an array, while $# gives the number of command-line parameters), and checks whether it equals the specified argument (number).
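A quick illustration of the different length expansions, using made-up values:

```bash
#!/usr/bin/env bash
word="apple"
arr=(a b c)
set -- x y             # set two positional parameters for the demo

echo "${#word}"        # 5: length of the string in $word
echo "${#arr[@]}"      # 3: number of elements in the array
echo "$#"              # 2: number of positional parameters
```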
A loop is not required, you can do something like
CH=$1; # how many characters the word must have
WordFile=/usr/share/dict/words; # file to read from
# find how many words that matches that length
TOTW=$(grep -Ec "^.{$CH}$" $WordFile);
# pick a random one, if you expect more than 32767 hits you
# need to do something like ($RANDOM+1)*($RANDOM+1)
RWORD=$(($RANDOM%$TOTW+1));
#show that word
grep -E "^.{$CH}$" $WordFile|sed -n "$RWORD p"
Depending on your needs, you will probably want to add checks, for example that $1 is a reasonable number, that the file exists, that TOTW is > 0, and so on.
This code would achieve what you want:
awk -v n="$1" 'length($0) == n' /usr/share/dict/words > /tmp/wordsHolder
shuf -n 1 /tmp/wordsHolder
Some comments: "$RANDOM" (as used in your original script attempt) generates an integer in the range 0 - 32767, which could be more (or less) than the number of words (lines) available for the desired word length, so it can select a line that does not exist or never reach the later lines.
To avoid that, we use shuf, which retrieves a pseudo-randomly picked word (line) from the file using its entire range (from line 1 to the last line of the file).
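Putting both requirements together without a temporary file could look roughly like this. random_word is a hypothetical function name, the optional second argument (word file) is an assumption added for illustration, and shuf from GNU coreutils is assumed to be installed.

```bash
#!/usr/bin/env bash
# random_word (hypothetical helper): print a random word, optionally
# restricted to words of exactly $1 characters.
random_word() {
    local len=${1:-} file=${2:-/usr/share/dict/words}
    if [[ ! -r $file ]]; then
        echo "cannot read $file" >&2
        return 1
    fi
    if [[ -z $len ]]; then
        shuf -n 1 "$file"                                     # any word
    elif [[ $len =~ ^[1-9][0-9]*$ ]]; then
        awk -v n="$len" 'length($0) == n' "$file" | shuf -n 1  # filter, then pick
    else
        echo "length must be a positive integer" >&2
        return 2
    fi
}

# Example: random_word "$1"
```

If no word has the requested length, the function prints nothing.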

Is there a way to implement a counter in bash but for letters instead of numbers?

I'm working with an existing script which was written a bit messily. Setting up a loop with all of the spaghetti code could make a bigger headache than I want to deal with in the near term. Maybe when I have more time I can clean it up but for now, I'm just looking for a simple fix.
The script deals with virtual disks on a xen server. It reads multipath output and asks if particular LUNs should be formatted in any way based on specific criteria. However, rather than taking that disk path and inserting it, already formatted, into a configuration file, it simply presents every line in the format
'phy:/dev/mapper/UUID,xvd?,w',
UUID, of course, is an actual UUID.
The script actually presents each of the found LUNs in this format expecting the user to copy and paste them into the config file replacing each ? with a letter in sequence. This is tedious at best.
There are several ways to increment a number in bash. Among others:
var=$((var+1))
((var+=1))
((var++))
Is there a way to do the same with characters which doesn't involve looping over the entire alphabet such that I could easily "increment" the disk assignment from xvda to xvdb, etc?
To do an "increment" on a letter, define the function:
incr() { LC_CTYPE=C printf "\\$(printf '%03o' "$(($(printf '%d' "'$1")+1))")"; }
Now, observe:
$ echo $(incr a)
b
$ echo $(incr b)
c
$ echo $(incr c)
d
Because this increments through the ASCII table, incr z becomes {.
How it works
The first step is to convert a letter to its ASCII numeric value. For example, a is 97:
$ printf '%d' "'a"
97
The next step is to increment that:
$ echo "$((97+1))"
98
Or:
$ echo "$(($(printf '%d' "'a")+1))"
98
The last step is to convert the newly incremented number back to a letter:
$ LC_CTYPE=C printf "\\$(printf '%03o' "98")"
b
Or:
$ LC_CTYPE=C printf "\\$(printf '%03o' "$(($(printf '%d' "'a")+1))")"
b
Alternative
With bash, we can define an associative array to hold the next character:
$ declare -A Incr; last=a; for next in {b..z}; do Incr[$last]=$next; last=$next; done; Incr[z]=a
Or, if you prefer code spread out over multiple lines:
declare -A Incr
last=a
for next in {b..z}
do
Incr[$last]=$next
last=$next
done
Incr[z]=a
With this array, characters can be incremented via:
$ echo "${Incr[a]}"
b
$ echo "${Incr[b]}"
c
$ echo "${Incr[c]}"
d
In this version, the increment of z loops back to a:
$ echo "${Incr[z]}"
a
How about an array with entries A-Z assigned to indexes 1-26?
IFS=':' read -r -a alpharray <<< ":A:B:C:D:E:F:G:H:I:J:K:L:M:N:O:P:Q:R:S:T:U:V:W:X:Y:Z"
This has 1=A, 2=B, etc. If you want 0=A, 1=B, and so on, remove the first colon.
IFS=':' read -r -a alpharray <<< "A:B:C:D:E:F:G:H:I:J:K:L:M:N:O:P:Q:R:S:T:U:V:W:X:Y:Z"
Then later, where you actually need the letter;
var=$((var+1))
echo "'phy:/dev/mapper/UUID,xvd${alpharray[$var]},w',"
The only problem is that if you end up running past 26 letters, you'll start getting blanks returned from the array.
Use a Bash 4 Range
You can use a Bash sequence expression (brace expansion) that lets you specify a range of letters. For example:
for letter in {a..z}; do
echo "phy:/dev/mapper/UUID,xvd${letter},w"
done
See also Ranges in the Bash Wiki.
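Applied to the original problem, the range can be stored in an array and indexed alongside the list of disks. This is only a sketch: print_disk_lines is a hypothetical helper and the UUID arguments are placeholders for whatever the existing script extracts from the multipath output.

```bash
#!/usr/bin/env bash
# print_disk_lines (hypothetical helper): emit one config line per UUID,
# assigning xvda, xvdb, ... in order.
print_disk_lines() {
    local letters i=0 uuid
    letters=({a..z})
    for uuid in "$@"; do
        if (( i >= ${#letters[@]} )); then
            echo "Error: more than 26 disks" >&2
            return 1
        fi
        printf "'phy:/dev/mapper/%s,xvd%s,w',\n" "$uuid" "${letters[i]}"
        i=$((i + 1))
    done
}

print_disk_lines UUID-1 UUID-2 UUID-3
# 'phy:/dev/mapper/UUID-1,xvda,w',
# 'phy:/dev/mapper/UUID-2,xvdb,w',
# 'phy:/dev/mapper/UUID-3,xvdc,w',
```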
Here's a function that will return the next letter in the range a-z. An input of 'z' returns 'a'.
nextl(){
((num=(36#$(printf '%c' $1)-9) % 26+97));
printf '%b\n' '\x'$(printf "%x" $num);
}
It treats the first letter of the input as a base 36 integer, subtracts 9, and returns the character whose ordinal number is 'a' plus that value mod 26.
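The arithmetic can be checked by hand in Bash, where 36#x reads x as a base-36 digit (a is 10, z is 35):

```bash
#!/usr/bin/env bash
echo $((36#a))                      # 10
echo $((36#z))                      # 35
echo $(( (36#a - 9) % 26 + 97 ))    # 98, the ASCII code for 'b'
echo $(( (36#z - 9) % 26 + 97 ))    # 97, the ASCII code for 'a' (wraps around)
```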
Use Jot
While the Bash range option uses built-ins, you can also use a utility like the BSD jot utility. This is available on macOS by default, but your mileage may vary on Linux systems. For example, you'll need to install athena-jot on Debian.
More Loops
One trick here is to pre-populate a Bash array and then use an index variable to grab your desired output from the array. For example:
letters=( "" $(jot -w %c 26 a) )
for idx in 1 26; do
echo ${letters[$idx]}
done
A Loop-Free Alternative
Note that you don't have to increment the counter in a loop. You can do it other ways, too. Consider the following, which will increment any letter passed to the function without having to prepopulate an array:
increment_var () {
local new_var=$(jot -nw %c 2 "$1" | tail -1)
if [[ "$new_var" == "{" ]]; then
echo "Error: You can't increment past 'z'" >&2
exit 1
fi
echo -n "$new_var"
}
var="c"
var=$(increment_var "$var")
echo "$var"
This is probably closer to what the OP wants, but it certainly seems more complex and less elegant than the original loop recommended elsewhere. However, your mileage may vary, and it's good to have options!

bash split a variable into two or more based on character length

I built a script for an SMS autoresponder. My goal is that when an SMS's content is longer than 160 characters, the script splits the content into two or more variables and then sends them separately.
myvar="this variable has more than ten character length"
That variable is 48 characters long. How do I print characters 1 to 25 and characters 26 to 48? That way I'll have 2 variables in the end and can send each one by SMS:
firstvar="this variable has more th"
secondvar="an ten character length"
I know there's a command split but my openwrt doesn't support that command, so I have to find another way to do that.
Bash can split a variable into substrings using its substitution rules.
echo ${variable:4:8}
Will display eight characters starting at offset four. The offset starts at zero.
In general:
${parameter:offset:length}
this snippet should help you:
myvar="this variable has more than ten character length"
size=${#myvar}
if [ $size -gt 25 ]; then
firstvar=${myvar:0:25}
secondvar=${myvar:25} # offsets are zero-based, so the remainder starts at offset 25
echo "$firstvar"
echo "$secondvar"
fi
A pure Bash possibility, no external tools (hence only depends on Bash and no other specific third-party tools) and no subshells:
#!/bin/bash
mysms="this variable has more than ten character length"
maxlength=25
sms_tmp=$mysms
sms_ary=()
while [[ $sms_tmp ]]; do
sms_ary+=( "${sms_tmp::maxlength}" )
sms_tmp=${sms_tmp:maxlength}
done
# At this point, you have your sms split in the array sms_ary:
# You can print them, one per line:
printf '%s\n' "${sms_ary[@]}"
# You can print them, one per line, with header:
printf -- '--START SMS-- %s --END SMS--\n' "${sms_ary[@]}"
# You can print them, space padded (spaces on the right):
printf -- "--START SMS-- %-${maxlength}s --END SMS--\n" "${sms_ary[@]}"
# You can print them, space padded (spaces on the left):
printf -- "--START SMS-- %${maxlength}s --END SMS--\n" "${sms_ary[@]}"
# You can loop through them:
for sms in "${sms_ary[@]}"; do
printf 'Doing stuff with SMS: %s\n' "$sms"
done
# You can loop through them with index (C-style loop):
for ((i=0;i<${#sms_ary[@]};++i)); do
printf 'This is SMS #%d at index %d: %s\n' "$((i+1))" "$i" "${sms_ary[i]}"
done
# You can loop through them (using array key as variable):
n=1
for i in "${!sms_ary[@]}"; do
printf 'This is SMS #%d at index %d: %s\n' "$((n++))" "$i" "${sms_ary[i]}"
done
# Here's the number of SMS:
printf 'That was fun. There were %d chunks of SMS\n' "${#sms_ary[@]}"
Another way to split your string:
#!/bin/bash
mysms="this variable has more than ten character length"
maxlength=25
sms_ary=()
for ((i=0;i<${#mysms};i+=maxlength)); do
sms_ary+=( "${mysms:i:maxlength}" )
done
# Same as before, at this point you have your chunks in array sms_ary

Resources