substring extraction in bash


iamnewbie: This code is inefficient, but it should extract the substring. The problem is with the last echo statement; I need some insight.
function regex {
#this function gives the regular expression needed
echo -n \'
for (( i = 1 ; i <= $1 ; i++ ))
do
echo -n .
done
echo -n '\('
for (( i = 1 ; i <= $2 ; i++ ))
do
echo -n .
done
echo -n '\)'
echo -n \'
}
# regex function ends
echo "Enter the string:"
read stg
#variable stg holds the string entered
if [ -z "$stg" ] ; then
echo "Null string"
exit
else
echo "Length of the $stg is:"
z=`expr "$stg" : '.*' `
#variable z holds the length of given string
echo $z
fi
echo "Enter the number of trailing characters to be extracted from $stg:"
read n
m=`expr $z - $n `
#variable m holds an integer value which is equal to total length - length of characters to be extracted
x=$(regex $m $n)
echo ` expr "$stg" : "$x" `
#the echo statement(above) is just printing a newline!! But not the result
What I intend is this: if I enter "racecar" and give "3", it should display "car", which are the last three characters. Instead of displaying "car", it just prints a newline. Please correct this code rather than giving a better one.

Although you didn't ask for a better solution, it's worth mentioning:
$ n=3
$ stg=racecar
$ echo "${stg: -n}"
car
Note that the space after the : in ${stg: -n} is required. Without the space, the parameter expansion is a default-value expansion rather than a substring expansion. With the space, it's a substring expansion; -n is interpreted as an arithmetic expression (which means that n is interpreted as $n) and since the result is a negative number, it specifies the number of characters from the end to start the substring. See the Bash manual for details.
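A minimal sketch of that difference, assuming bash. The variable names other than stg and n are made up for illustration. It also shows that wrapping the offset in parentheses is another way to avoid the ":-" reading:

```shell
stg=racecar
n=3

# Illustrative names. Without the space, ":-" is the use-default expansion:
# stg is set and non-empty, so the result is simply "racecar".
default_form=${stg:-n}

# With the space, this is a substring expansion; -n is an arithmetic
# expression, so the offset is -3, i.e. three characters from the end.
substring_form=${stg: -n}

# Parentheses around the offset also avoid the ":-" ambiguity.
paren_form=${stg:(-n)}

# For an unset variable, ":-" substitutes the literal word "n".
unset empty
fallback=${empty:-n}

printf '%s\n' "$default_form" "$substring_form" "$paren_form" "$fallback"
```

This prints racecar, car, car and n on separate lines.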
Your solution is based on evaluating the equivalent of:
expr "$stg" : '......\(...\)'
with an appropriate number of dots. It's important to understand what the above bash syntax actually means. It invokes the command expr, passing it three arguments:
arg 1: the contents of the variable stg
arg 2: :
arg 3: ......\(...\)
Note that there are no quotes visible. That's because the quotes are part of bash syntax, not part of the argument values.
If the value of stg had enough characters, the result of the above expr invocation would be to print out the 7th, 8th and 9th characters of the value of stg. Otherwise, it would print a blank line and exit with a failure status.
But that's not what you are doing. You're creating the regular expression:
'......\(...\)'
which has single quotes in it. Since single quotes are not special characters in a regex, they match themselves; in other words, that pattern will match a string which starts with a single quote, followed by nine arbitrary characters, followed by another single quote. And if the string does match, it will print the three characters prior to the second single quote.
Of course, since the regular expression you make has a . for every character in the target string, it won't match the target even if the target started and ended with a single quote, since there would be too many dots in the regex to match that.
If you don't put single quotes into the regex, then your program will work, but I have to say that few times have I seen such an intensely circuitous implementation of the substring function. If you're not trying to win an obfuscated bash competition (a difficult challenge since most production bash code is obfuscated by nature), I'd suggest you use normal bash features instead of trying to do everything with regexen.
One of those is the syntax to determine the length of a string:
$ stg=racecar
$ echo ${#stg}
7
(although, as shown at the beginning, you don't actually even need that.)
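If you do want to keep the original expr approach, a minimal sketch of the fix described above is to drop the two echo -n \' lines so that no quote characters end up in the regex. This version uses printf instead of echo -n, and last_chars is a hypothetical wrapper name, not part of the original script:

```shell
# Rewrite of the question's regex function: prints $1 dots (characters to
# skip), then \( followed by $2 dots and \) (characters to capture).
# No single quotes are emitted.
regex() {
  local i
  for (( i = 1; i <= $1; i++ )); do printf '.'; done
  printf '%s' '\('
  for (( i = 1; i <= $2; i++ )); do printf '.'; done
  printf '%s' '\)'
}

# Hypothetical wrapper: prints the last $2 characters of $1.
last_chars() {
  local stg=$1 n=$2 z m
  z=$(expr "$stg" : '.*')   # length of stg (number of characters matched)
  m=$(( z - n ))            # number of leading characters to skip
  expr "$stg" : "$(regex "$m" "$n")"
}

last_chars racecar 3
```

For "racecar" and 3, the generated regex is ....\(...\), and expr prints the captured group, car.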

What about:
$ n=3
$ string="racecar"
$ [[ "$string" =~ (.{$n})$ ]]
$ echo ${BASH_REMATCH[1]}
car
This looks for the last n characters at the end of the line. In a script:
#!/bin/bash
read -p "Enter a string: " string
read -p "Enter the number of characters you want from the end: " n
[[ "$string" =~ (.{$n})$ ]]
echo "These are the last $n characters: ${BASH_REMATCH[1]}"
You may want to add some more error handling, but this'll do it.
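One possible shape for that error handling, as a sketch. last_n is a hypothetical function name, and the return codes (2 for a count that is not a non-negative integer, 1 when the string is too short) are an arbitrary choice:

```shell
# Hypothetical helper: prints the last $2 characters of $1.
# Returns 2 if $2 is not a non-negative integer, 1 if the regex does not
# match (for example when $1 has fewer than $2 characters).
last_n() {
  local string=$1 n=$2
  [[ $n =~ ^[0-9]+$ ]] || return 2
  [[ $string =~ (.{$n})$ ]] || return 1
  printf '%s\n' "${BASH_REMATCH[1]}"
}

last_n racecar 3 || echo "could not extract" >&2
```

In the script above, the [[ ... ]] and echo lines could then be replaced with a call such as last_n "$string" "$n".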

I'm not sure you need loops for this task. I wrote an example that reads two values from the user and cuts the word accordingly.
#!/bin/bash
read -p "Enter some word? " -e stg
#variable stg holds the string entered
if [ -z "$stg" ] ; then
echo "Null string"
exit 1
fi
read -p "Enter some number to set word length? " -e cutNumber
# check that cutNumber is a number
if ! [ "$cutNumber" -eq "$cutNumber" ] 2>/dev/null; then
echo "Not a number!"
exit 1
fi
echo "Cut first n characters:"
echo "${stg:$cutNumber}"
echo
echo "Show first n characters:"
echo "${stg:0:$cutNumber}"
echo "Alternative get last n characters:"
echo -n "$stg" | tail -c "$cutNumber"
echo
Example:
Enter some word? TheRaceCar
Enter some number to set word length? 7
Cut first n characters:
Car
Show first n characters:
TheRace
Alternative get last n characters:
RaceCar

Related

Identifying hash encoding

I am creating a function that will accept an input and determine if the value is a certain type of hash encoding (md5, sha1, sha256, and sha512). I have asked a few classmates and logically it makes sense, but clearly something is wrong.
#!/usr/bin/bash
function identify-hash() {
encryptinput=$(echo $1 | grep -E -i '^[a-z0-9=]+${32}')
if [[ -n $encryptinput ]]; then
echo "The $1 is a valid md5sum string"
exit
else
encryptinput=$(echo $1 | grep -E -i '^[a-z0-9=]+${40}')
if [[ -n $encryptinput ]]; then
echo "The $1 is a valid sha1sum string"
exit
else
encryptinput=$(echo $1 | grep -E -i '^[a-z0-9=]+${64}')
if [[ -n $encryptinput ]]; then
echo "The $1 is a valid sha256sum string"
exit
else
encryptinput=$(echo $1 | grep -E -i '^[a-z0-9=]+${128}')
if [[ -n $encryptinput ]]; then
echo "The $1 is a valid sha512sum string"
exit
else
echo "Unable to determine the hash function used to generate the input"
fi
fi
fi
fi
}
identify-hash $1
I know that hashes have a specific number of characters, but I don't know exactly why it's not working. Removing the {32} from line 4 allows it to answer as an md5sum, but then it assumes everything is an md5sum.
Suggestions?

Fixed your script. I suspect you would have spotted most of the issues if you had used ShellCheck:
#!/usr/bin/env bash
identify_hash() {
# local variables
local -- encrypt_input
local -- sumname
# Regex capture the hexadecimal digits
if [[ "$1" =~ ([[:xdigit:]]+) ]]; then
encrypt_input="${BASH_REMATCH[1]}"
else
encrypt_input=''
fi
# Determine name of sum algorithm based on length of encrypt_input
case "${#encrypt_input}" in
32) sumname=md5sum ;;
40) sumname=sha1sum ;;
64) sumname=sha256sum ;;
128) sumname=sha512sum ;;
*) sumname=;;
esac
# If sum algorithm name found (sumname is not empty)
if [ -n "$sumname" ]; then
printf 'The %s is a valid %s string\n' "$encrypt_input" "$sumname"
else
printf 'Unable to determine the hash function used to generate the input\n' >&2
exit 1
fi
}
identify_hash "$1"

Something shorter, using bash:
checkHash() {
local -ar sumnames=([32]=md5sum [40]=sha1sum [64]=sha256sum [128]=sha512sum)
[[ "$1" =~ [[:xdigit:]]{32,129} ]]
echo "${sumnames[${#BASH_REMATCH}]+String $BASH_REMATCH could be }${sumnames[${#BASH_REMATCH}]:-No hash tool match this string.}"
}
This will extract the run of [:xdigit:] characters from anywhere in a line:
checkHash 'Filename: 13aba32dbe4db7a7117ed40a25c29fa8 --'
String 13aba32dbe4db7a7117ed40a25c29fa8 could be md5sum
checkHash a32dba32dbe4db7a7117ed40a25c29fa8e4db7a7117ed40a25c29fa8
No hash tool match this string.
checkHash a32dba32dbe4db7a7117ed40a25c29fa8e4db7a7117ed40a25c29fa8da921adb
String a32dba32dbe4db7a7117ed40a25c29fa8e4db7a7117ed40a25c29fa8da921adb could be sha256sum
... then ${var+return this only if $var is set}
... and ${var:-return this if $var is unset or empty}
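A small sketch of how those two expansions behave for an unset, an empty, and a non-empty variable (v and the result names are illustrative only):

```shell
unset v
unset_plus=${v+set}            # v is unset, so ${v+...} expands to nothing
unset_default=${v:-fallback}   # unset, so the default word is used

v=''
empty_plus=${v+set}            # v is set (to the empty string), so "set"
empty_default=${v:-fallback}   # ":-" treats empty the same as unset

v=abc
full_plus=${v+set}             # "set"
full_default=${v:-fallback}    # v is non-empty, so its own value "abc"

printf '[%s] [%s]\n' "$unset_plus" "$unset_default" \
                     "$empty_plus" "$empty_default" \
                     "$full_plus" "$full_default"
```

In checkHash, an unmatched length leaves sumnames[length] unset, so the + part disappears and the :- part supplies the "No hash tool" message.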

Further explaining @Gordon Davisson's comment, plus some basics for anyone who stops by.
NB: This answer is extremely simplified to apply only to the current question. Here's my preferred guide for more regex.
Basics of regex
^ - start of a line
$ - end of a line
[...] - list of possible characters
has special sauce
a-z = all lowercase (English) letters; 0-9 = all digits; etc.
also accepts character classes, e.g. [:xdigit:] for hexadecimal characters
the expression is then [[:xdigit:]], i.e. [:class:] inside [...]
{...} - number of times the preceding expression should be matched
^[a]{1}$ will match a but not aa
^f[o]{2}d$ will match food but not fod, foood, fooo*d
^[a-z]{4}$ will match
ball ✔️ but not buffalo ❌
cove ✔️ but not cover ❌
basically any line (because of the ^...$) consisting of exactly 4 lowercase (English) letters
{1,5} - at least 1 and at most 5
* - shorthand for {0,} meaning 0 or any number of times
+ - shorthand for {1,} meaning at least 1; but no upper limit
? - shorthand for {0,1} meaning 0 or 1 times
So ${32} is looking for 32 "end of line"s ($ repeated 32 times) rather than 32 characters, and what you need is [a-z0-9=]{32}$ instead.
BUT, as Andrej Podzimek also pointed out in the comments, you need to match only hexadecimal [0-9a-f] characters, which (with grep -i) is the same as [[:xdigit:]]. Either can be used.
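Applying both corrections to the question's grep-based approach could look like the sketch below. It is only an illustration: the function is renamed with an underscore as in the fixed script above, the length-to-name table is a sparse indexed array as in the checkHash answer, and it uses return instead of exit so it can be reused:

```shell
# Variant of the question's function using anchored grep -E patterns.
identify_hash() {
  local input=$1 len
  # Sparse indexed array: index = number of hex digits, value = tool name.
  local -a names=([32]=md5sum [40]=sha1sum [64]=sha256sum [128]=sha512sum)
  for len in "${!names[@]}"; do
    # ^ and $ anchor the whole line; {$len} repeats the bracket expression
    # exactly len times (unlike the question's ${32}).
    if printf '%s\n' "$input" | grep -Eiq "^[0-9a-f]{$len}\$"; then
      printf 'The %s is a valid %s string\n' "$input" "${names[len]}"
      return 0
    fi
  done
  printf 'Unable to determine the hash function used to generate the input\n' >&2
  return 1
}

identify_hash 13aba32dbe4db7a7117ed40a25c29fa8
```

The indices of an indexed array are listed in ascending order, so the lengths are tried from 32 up to 128. Because each pattern is anchored at both ends, at most one of them can match.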
PS
more Basics
. (fullstop/period) matches ANY character including spaces and special characters
(...) groups part of a pattern (and captures what that part matched)
[a-z ]*(chicken).*
will match anything from chicken coop to chicken soup and please pass that chicken cookbook, Alex?
[.] means period/fullstop not any character
note the space after z: it makes space (ASCII 32) one of the possible characters
and . is case-insensitive, in the sense that it matches any character, uppercase or lowercase (such as the A in Alex)
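To try the examples above in bash itself, here is a minimal sketch using the =~ operator, which uses extended regular expressions like grep -E. matches is a hypothetical helper name:

```shell
# Hypothetical helper: succeeds if string $1 matches ERE $2.
# An unquoted $2 on the right of =~ is treated as a regex, not a literal.
matches() { [[ $1 =~ $2 ]]; }

matches food  '^f[o]{2}d$' && echo "food: exactly two o's"
matches foood '^f[o]{2}d$' || echo "foood: no match"

matches 'a.c' '^a[.]c$' && echo "[.] matches a literal period"
matches 'abc' '^a[.]c$' || echo "[.] does not match b"
matches 'abc' '^a.c$'   && echo ". matches any character"

matches color   '^colou?r$' && echo "? allows zero u"
matches colour  '^colou?r$' && echo "? allows one u"
matches colouur '^colou?r$' || echo "? rejects two u"

if matches 'please pass that chicken cookbook, Alex?' '[a-z ]*(chicken).*'; then
  printf 'captured: %s\n' "${BASH_REMATCH[1]}"
fi
```

BASH_REMATCH is a global variable, so the captured group is still available after matches returns.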
PPS if it's for homework/assignment/schoolwork, please specify so in your question :)

How to check if the user input value is Upper case, Lower case or a digit using Shell Script?

I am trying to write a shell script to read user input data and check if the input value is either Upper Case, Lower Case or anything else. But what I wrote only checks a single character.
Here is what I wrote:
printf 'Please enter a character: '
IFS= read -r c
case $c in
([[:lower:]]) echo lowercase letter;;
([[:upper:]]) echo uppercase letter;;
([[:alpha:]]) echo neither lower nor uppercase letter;;
([[:digit:]]) echo decimal digit;;
(?) echo any other single character;;
("") echo nothing;;
(*) echo anything else;;
esac
How can I make it read a longer string instead of a single character and get the output accordingly?

You can do it in many ways; here is one:
#!/bin/bash
read -p "Enter something: " str
echo "Your input is: $str"
strUppercase=$(printf '%s\n' "$str" | awk '{ print toupper($0) }')
strLowercase=$(printf '%s\n' "$str" | awk '{ print tolower($0) }')
if [ -z "${str//[0-9]}" ]
then
echo "Digit"
elif [ "$str" == "$strLowercase" ]
then
echo "Lowercase"
elif [ "$str" == "$strUppercase" ]
then
echo "Uppercase"
else
echo "Something else"
fi
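If bash 4 or later is available, the two awk calls can be replaced by the ${var,,} (lowercase) and ${var^^} (uppercase) expansions. A sketch of that variant: classify_case is a hypothetical name, and it adds an explicit check for empty input:

```shell
# Hypothetical function; needs bash 4+ for ${str,,} and ${str^^}.
classify_case() {
  local str=$1
  if [[ -z $str ]]; then
    echo "Empty"
  elif [[ -z ${str//[0-9]/} ]]; then
    echo "Digit"            # nothing is left after deleting every digit
  elif [[ $str == "${str,,}" ]]; then
    echo "Lowercase"        # lowercasing the string changes nothing
  elif [[ $str == "${str^^}" ]]; then
    echo "Uppercase"        # uppercasing the string changes nothing
  else
    echo "Something else"
  fi
}

classify_case "Hello World"
```

As in the awk version, a string such as abc1 counts as lowercase, because lowercasing it changes nothing. After the read -p line, call it as classify_case "$str".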

Preceding your use with shopt -s extglob, you can use +([[:upper:]]) to match a string composed of one or more uppercase letters.
From man 1 bash:
If the extglob shell option is enabled using the shopt builtin, several extended pattern matching operators are recognized. In the following description, a pattern-list is a list of one or more patterns separated by a |. Composite patterns may be formed using one or more of the following sub-patterns:
?(pattern-list)
Matches zero or one occurrence of the given patterns
*(pattern-list)
Matches zero or more occurrences of the given patterns
+(pattern-list)
Matches one or more occurrences of the given patterns
@(pattern-list)
Matches one of the given patterns
!(pattern-list)
Matches anything except one of the given patterns
Use, for example, +([[:upper:][:digit:] .]) to match one or more {uppercase letters, digits, spaces, dots}. Consider using some of the following character classes, which bash recognizes (all except ascii and word are defined in the POSIX standard):
alnum alpha ascii blank cntrl digit graph lower print punct space upper word xdigit
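Proof (just a test on an example) that it works. Run shopt -s extglob on its own line first, because extglob must already be enabled when bash parses the line that uses the extended pattern:
shopt -s extglob
case "1A5. .Q7." in (+([[:upper:][:digit:] .])) echo "it works";; esac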
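A sketch applying these extglob patterns to whole strings, in the spirit of the question's case statement. classify is a hypothetical name and the categories are illustrative:

```shell
# extglob must be on before the code that uses +( ) is parsed,
# so it is enabled on its own line, ahead of the function definition.
shopt -s extglob

# Hypothetical function: classifies the whole string, not one character.
classify() {
  case $1 in
    ('') echo "nothing" ;;
    (+([[:digit:]])) echo "digits" ;;
    (+([[:lower:]])) echo "lowercase" ;;
    (+([[:upper:]])) echo "uppercase" ;;
    (+([[:alpha:]])) echo "mixed-case letters" ;;
    (*) echo "anything else" ;;
  esac
}

classify "racecar"
classify "RaceCar"
classify "5B299"
```

The order of the patterns matters: a string of only lowercase letters also matches +([[:alpha:]]), so the more specific patterns come first.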

How to pull a string apart by its contents

I have a string in a common pattern that I want to manipulate. I want to be able to turn string 5B299 into 5B300 (increment the last number by one).
I want to avoid blindly splicing the string by index, as the first number and letter can change in size. Essentially I want to be able to get the entire value of everything after the first character, increment it by one, and re-append it.
The only things I've found online so far show me how to cut by a delimiter, but I don't have a constant delimiter.

You could use the regex features supported by the bash shell with its =~ construct, which supports POSIX Extended Regular Expression (ERE) matching. All you need to do is define a regex and work on the captured groups to get the resulting string:
str=5B299
re='^(.*[A-Z])([0-9]+)$'
Now use the =~ operator to do the regex match. The =~ operator populates the array BASH_REMATCH with the captured groups if the regex match was successful. Index 0 holds the whole match, the first group (5B in the example) is stored at index 1, and the second group (299) at index 2. We increment the value at index 2 with the $((..)) operator.
if [[ $str =~ $re ]]; then
result="${BASH_REMATCH[1]}$(( BASH_REMATCH[2] + 1 ))"
printf '%s\n' "$result"
fi
The POSIX version of the regex, free of the locale dependency, uses character classes instead of range expressions:
posix_re='^(.*[[:alpha:]])([[:digit:]]+)$'
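One caveat with $(( BASH_REMATCH[2] + 1 )): bash reads a number with a leading zero, such as 099, as octal, and 099 is not a valid octal number, so the arithmetic fails. A sketch that forces base 10 with 10# and pads the result back to the original width. increment_suffix is a hypothetical name:

```shell
# Hypothetical function: increments the trailing number of $1, keeping the
# prefix (everything up to the last letter) and the digit count.
increment_suffix() {
  local re='^(.*[[:alpha:]])([[:digit:]]+)$'
  [[ $1 =~ $re ]] || return 1
  local prefix=${BASH_REMATCH[1]} digits=${BASH_REMATCH[2]}
  # 10# forces base 10, so 099 means ninety-nine rather than invalid octal;
  # %0*d pads the new number with zeros to at least the old width.
  printf '%s%0*d\n' "$prefix" "${#digits}" "$(( 10#$digits + 1 ))"
}

increment_suffix 5B299
increment_suffix 5B099
increment_suffix 7cf999
```

This prints 5B300, 5B100 and 7cf1000. When the number gains a digit, the padding width is only a minimum, so nothing is truncated.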

You can do what you are attempting fairly easily with the bash parameter-expansion for string indexes along with the POSIX arithmetic operator. For instance you could do:
#!/bin/bash
[ -z "$1" ] && { ## validate at least 1 argument provided
printf "error: please provide a number.\n" >&2
exit 1
}
[[ $1 =~ [^0-9][^0-9]* ]] && { ## validate all digits in argument
printf "error: input contains non-digit characters.\n" >&2
exit 1
}
suffix=${1:1} ## take all character past 1st as suffix
suffix=$((suffix + 1)) ## increment suffix by 1
result=${1:0:1}$suffix ## append suffix to original 1st character
echo "$result" ## output
exit 0
This leaves the 1st character alone, increments the number formed by the remaining characters by 1, and then joins it again with the original 1st digit. It also validates that the input consists only of digits, e.g.:
Example Use/Output
$ bash prefixsuffix.sh
error: please provide a number.
$ bash prefixsuffix.sh 38a900
error: input contains non-digit characters.
$ bash prefixsuffix.sh 38900
38901
$ bash prefixsuffix.sh 39999
310000
Look things over and let me know if that is what you intended.

You can use sed in conjunction with awk:
increment() {
echo $1 | sed -r 's/([0-9]+[a-zA-Z]+)([0-9]+)/\1 \2/' | awk '{printf "%s%d", $1, ++$2}'
}
echo $(increment "5B299")
echo $(increment "127ABC385")
echo $(increment "7cf999")
Output:
5B300
127ABC386
7cf1000

BASH trying to create scrabble word generator, generated error: line 15: ?: syntax error: operand expected (error token is "?")

This is the script that will run if there are any "?" characters in the command line argument of the main script. I'm using i to run through the alphabet and essentially setting each "?" to $i for each letter in the alphabet. I think that the error is occurring because a special character is showing up at the end of the character array that can't be compared to "?", but I don't know how to deal with that. I did use ShellCheck already.
EDIT: The text file that is generated by the main script includes all permutations (no repeats) of the letters provided in the command line argument with a hard return between each permutation, and I am pretty sure that the hard return character is what is generating the error. Here is an example of permute.txt
at?
a?t
ta?
t?a
?at
?ta
And here is the slightly edited script:
#reads the text file produced by another script into the array
readarray words < ./permute.txt
#runs through loop 26 times with i as a letter in the alphabet
for i in {a..z}; do
#runs through every word in the array provided by the text file
for j in "${words[@]}"; do
s=$j #sets $j to a variable to mess with
declare -a a
word=""
#splits string s into character array
while read -n 1 c; do a+=($c); done <<< "$s"
#checks array for "?", used as a blank tile and changes that character to $i
for k in "${a[@]}"; do if [ "$k" = "?" ]; then a[$k]=$i; fi; done
#puts the array back together into a string
for ((k=0; k<${#a}; k++)); do echo $word; word=$word${a[$k]}; done
valid=$(grep -w $word /usr/share/dict/words) #checks for word in dictionary
#statement that fixes bug where words with apostrophes end up in the output
if [ ${#valid} -eq ${#i} ]; then
echo "$valid"
fi
done
done

Do I understand correctly that you want to replace the ? in the permutations with every character a..z and then check whether the results are in the dictionary?
I would skip the manual array work and let sed do the replacement of ? for you.
for i in {a..z}; do
for word in $(sed "s/?/$i/" permute.txt); do
grep -w $word /usr/share/dict/words
done
done
You could even use bash's built in replacement in variables:
for i in {a..z}; do
for word in $(< permute.txt); do
grep -w ${word/\?/$i} /usr/share/dict/words
done
done
By the way, I don't understand the last if statement. Since $i is a single character and $valid is a whole line containing at least a 3-character word (in the example), if [ ${#valid} -eq ${#i} ] will never be true.
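The error message itself most likely comes from a[$k]=$i in the question: k holds a character such as ?, and the subscript of an indexed array is evaluated as an arithmetic expression, where ? is not a valid operand. A sketch that avoids the character array entirely, building on the replacement idea above. expand_blanks and find_words are hypothetical names, and every ? in a word receives the same letter, so two blanks are not tried independently:

```shell
# Hypothetical helper: prints $1 once per letter a..z, with every '?'
# (blank tile) replaced by that letter. [?] is a bracket expression that
# matches a literal '?'; a bare ? would match any single character.
expand_blanks() {
  local word=$1 letter
  for letter in {a..z}; do
    printf '%s\n' "${word//[?]/$letter}"
  done
}

# Hypothetical helper: prints the lines of dictionary file $2 that equal a
# candidate built from any permutation listed in file $1 (one per line).
find_words() {
  local perms=$1 dict=$2
  grep -Fx -f <(
    while IFS= read -r word; do
      expand_blanks "$word"
    done < "$perms"
  ) "$dict"
}

# Example usage with the question's files:
# find_words ./permute.txt /usr/share/dict/words
```

grep -Fx treats each candidate as a fixed string that must equal a whole dictionary line, which avoids the partial matches grep -w allows (for example, cat also matching cat's), so the apostrophe workaround is not needed.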

shell - put variable in ALPHABETICAL range of cycle

Is there some way to put a variable in an ALPHABETICAL range of a loop?
This doesn't work:
read -p "Where I should start?" start #there will be entered one small letter
for aaa in {$start..z}; do #how put variable $start in range?
...
done
Thanks for reply.

Use eval to expand the variable:
$ s=t
$ eval echo {$s..z}
t u v w x y z
Your example then becomes:
read -p "Where I should start?" start #there will be entered one small letter
for aaa in $(eval echo {$start..z}); do
echo $aaa
done
Since you have user input to eval, you may want to check the value of start as being a single lower case character first:
read -p "Where I should start?" start #there will be entered one small letter
if [[ $start =~ ^[a-y]$ ]]; then
for aaa in $(eval echo {$start..z}); do
echo $aaa
done
else
echo "Need to use a letter 'a-y'"
fi
You can read more about Bash brace expansion here

Unfortunately, you can't put variables inside {start..end} ranges in bash, because brace expansion is performed before parameter (variable) expansion.
This does what you want:
until [[ $s == "_" ]]; do echo $s && s=$(tr "a-z" "b-z_" <<<$s); done
It uses tr to translate each character to the next one. "z" is translated to "_", which ends the loop.
For example:
$ s=t
$ until [[ $s == "_" ]]; do echo $s && s=$(tr "a-z" "b-z_" <<<$s); done
t
u
v
w
x
y
z
If you don't mind using Perl, you could use this:
perl -le 'print for shift .. "z"' $s
It uses .. to create a list between the first argument on the command line and "z".
A slightly more esoteric way to do it in bash would be:
for ((i=$(LC_CTYPE=C printf '%d' "'$s"); i<=122; ++i)); do
printf "\\$(printf '%03o' $i)\n"
done
The for loop goes from the ASCII character number of the variable $s to "z", which is ASCII character 122. The format specifier in the inner printf converts the character number to octal, zero-padded to three digits. The outer printf then interprets this as an escape sequence and prints the character. Credit goes to Greg's wiki for the code used to convert ASCII characters to their values.
Of course, you could just use eval to expand the variable, the advantage being that the code required to do so is much shorter. However, executing arbitrary strings that have been read into your script is arguably a bit of a security hole.
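An eval-free alternative, as a sketch: it needs neither an external process nor brace expansion, and slices a literal alphabet string with parameter expansion. letters_from is a hypothetical name, and only the 26 lowercase ASCII letters are supported:

```shell
# Hypothetical function: prints the letters from $1 through z, one per line.
letters_from() {
  local alphabet=abcdefghijklmnopqrstuvwxyz start=$1 tail i
  # Accept exactly one character, and only one that occurs in the alphabet.
  [[ ${#start} -eq 1 && $alphabet == *"$start"* ]] || return 1
  # ${alphabet%%"$start"*} is everything before start; removing that
  # prefix from alphabet leaves start through z.
  tail=${alphabet#"${alphabet%%"$start"*}"}
  for (( i = 0; i < ${#tail}; i++ )); do
    printf '%s\n' "${tail:i:1}"
  done
}

letters_from t
```

In the question's loop this could be used as for aaa in $(letters_from "$start"); do ... done. Because letters_from rejects anything that is not a single lowercase letter, no user input is ever executed.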

x=t
for I in $(sed -nr "s/.*($x.*)/\1/;s/ /\n/g;p" <(echo {a..z}))
do
# do something with $I
done
Output:
t
u
v
w
x
y
z

I would avoid the use of eval.
for aaa in {a..z}; do
[[ $aaa < $start ]] && continue
...
done
The overhead of comparing $aaa to $start should be negligible, especially compared to the cost of starting a separate process to compute the range.
