How to pull a string apart by its contents - bash

I have a string in a common pattern that I want to manipulate. I want to be able to turn string 5B299 into 5B300 (increment the last number by one).
I want to avoid blindly splicing the string by index, as the first number and letter can change in size. Essentially I want to be able to get the entire value of everything after the first character, increment it by one, and re-append it.
The only things I've found online so far show me how to cut by a delimiter, but I don't have a constant delimiter.

You could use the regex support built into bash: the =~ operator inside [[ ]] matches a string against an Extended Regular Expression (ERE). All you need to do is define a regex and work on the captured groups to build the resulting string:
str=5B299
re='^(.*[A-Z])([0-9]+)$'
Now use the =~ operator to do the regex match. If the match succeeds, it populates the array BASH_REMATCH: index 0 holds the entire match, index 1 the first captured group (5B in the example), and index 2 the second (299). We increment the value at index 2 with the $((..)) arithmetic expansion.
if [[ $str =~ $re ]]; then
result="${BASH_REMATCH[1]}$(( BASH_REMATCH[2] + 1 ))"
printf '%s\n' "$result"
fi
A version of the regex that is free of the locale dependency of range expressions uses POSIX character classes instead:
posix_re='^(.*[[:alpha:]])([[:digit:]]+)$'
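Putting the pieces together, here is a minimal sketch of a reusable function built on the POSIX-class regex above. The name increment_suffix and the 10# base prefix (which keeps a digit run such as 099 from being read as octal) are additions for illustration, not part of the answer itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper: increment the digits that follow the last letter.
increment_suffix() {
    local re='^(.*[[:alpha:]])([[:digit:]]+)$'
    if [[ $1 =~ $re ]]; then
        # BASH_REMATCH[1] = prefix up to the last letter, [2] = trailing digits.
        # 10# forces base 10 so that an input like 5B099 is not parsed as octal.
        printf '%s%d\n' "${BASH_REMATCH[1]}" "$(( 10#${BASH_REMATCH[2]} + 1 ))"
    else
        printf 'no match: %s\n' "$1" >&2
        return 1
    fi
}

increment_suffix 5B299      # prints 5B300
increment_suffix 127ABC385  # prints 127ABC386
```

Note that this sketch drops leading zeros in the digit run (5B099 becomes 5B100, but 5B009 becomes 5B10).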

You can do what you are attempting fairly easily with the bash parameter-expansion for string indexes along with the POSIX arithmetic operator. For instance you could do:
#!/bin/bash
[ -z "$1" ] && { ## validate at least 1 argument provided
printf "error: please provide a number.\n" >&2
exit 1
}
[[ $1 =~ [^0-9][^0-9]* ]] && { ## validate all digits in argument
printf "error: input contains non-digit characters.\n" >&2
exit 1
}
suffix=${1:1} ## take all characters past the 1st as suffix
suffix=$((suffix + 1)) ## increment suffix by 1
result=${1:0:1}$suffix ## append suffix to the original 1st character
echo "$result" ## output
exit 0
Which will leave the 1st character alone while incrementing the remaining characters by 1 and then joining again with the original 1st digit, while validating that the input consisted only of digits, e.g.
Example Use/Output
$ bash prefixsuffix.sh
error: please provide a number.
$ bash prefixsuffix.sh 38a900
error: input contains non-digit characters.
$ bash prefixsuffix.sh 38900
38901
$ bash prefixsuffix.sh 39999
310000
Look things over and let me know if that is what you intended.

You can use sed in conjunction with awk:
increment() {
echo "$1" | sed -r 's/([0-9]+[a-zA-Z]+)([0-9]+)/\1 \2/' | awk '{printf "%s%d", $1, ++$2}'
}
echo $(increment "5B299")
echo $(increment "127ABC385")
echo $(increment "7cf999")
Output:
5B300
127ABC386
7cf1000

Related

Identifying hash encoding

I am creating a function that will accept an input and determine if the value is a certain type of hash encoding (md5, sha1, sha256, and sha512). I have asked a few classmates and logically it makes sense, but clearly something is wrong.
#!/usr/bin/bash
function identify-hash() {
encryptinput=$(echo $1 | grep -E -i '^[a-z0-9=]+${32}')
if [[ -n $encryptinput ]]; then
echo "The $1 is a valid md5sum string"
exit
else
encryptinput=$(echo $1 | grep -E -i '^[a-z0-9=]+${40}')
if [[ -n $encryptinput ]]; then
echo "The $1 is a valid sha1sum string"
exit
else
encryptinput=$(echo $1 | grep -E -i '^[a-z0-9=]+${64}')
if [[ -n $encryptinput ]]; then
echo "The $1 is a valid sha256sum string"
exit
else
encryptinput=$(echo $1 | grep -E -i '^[a-z0-9=]+${128}')
if [[ -n $encryptinput ]]; then
echo "The $1 is a valid sha512sum string"
exit
else
echo "Unable to determine the hash function used to generate the input"
fi
fi
fi
fi
}
identify-hash $1
I know that hashes have a specific number of characters for them, but I don't know exactly why it's not working. Removing the {32} out of line 4 allows it to answer as a md5sum, but then it assumes everything is md5sum.
Suggestions?
Fixed your script. You would have spotted most of the issues yourself if you had run it through ShellCheck:
#!/usr/bin/env bash
identify_hash() {
# local variables
local -- encrypt_input
local -- sumname
# Regex capture the hexadecimal digits
if [[ "$1" =~ ([[:xdigit:]]+) ]]; then
encrypt_input="${BASH_REMATCH[1]}"
else
encrypt_input=''
fi
# Determine name of sum algorithm based on length of encrypt_input
case "${#encrypt_input}" in
32) sumname=md5sum ;;
40) sumname=sha1sum ;;
64) sumname=sha256sum ;;
128) sumname=sha512sum ;;
*) sumname=;;
esac
# If sum algorithm name found (sumname is not empty)
if [ -n "$sumname" ]; then
printf 'The %s is a valid %s string\n' "$encrypt_input" "$sumname"
else
printf 'Unable to determine the hash function used to generate the input\n' >&2
exit 1
fi
}
identify_hash "$1"
Something shorter, using bash:
checkHash() {
local -ar sumnames=([32]=md5sum [40]=sha1sum [64]=sha256sum [128]=sha512sum)
[[ "$1" =~ [[:xdigit:]]{32,129} ]]
echo "${sumnames[${#BASH_REMATCH}]+String $BASH_REMATCH could be }${sumnames[${#BASH_REMATCH}]:-No hash tool match this string.}"
}
This will extract [:xdigit:] part out of any complete line:
checkHash 'Filename: 13aba32dbe4db7a7117ed40a25c29fa8 --'
String 13aba32dbe4db7a7117ed40a25c29fa8 could be md5sum
checkHash a32dba32dbe4db7a7117ed40a25c29fa8e4db7a7117ed40a25c29fa8
No hash tool match this string.
checkHash a32dba32dbe4db7a7117ed40a25c29fa8e4db7a7117ed40a25c29fa8da921adb
String a32dba32dbe4db7a7117ed40a25c29fa8e4db7a7117ed40a25c29fa8da921adb could be sha256sum
... then ${var+return this only if $var is set}
... and ${var:-return this if $var is unset or empty}
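A small sketch of how those two expansions behave for unset, empty, and non-empty variables. The variable names here are made up for the demonstration and do not come from the answer.

```bash
#!/usr/bin/env bash
# Illustrative values only; these variable names are assumptions.
unset missing
empty=''
set_var='md5sum'

echo "[${missing+was set}]"    # prints []          (unset: nothing substituted)
echo "[${empty+was set}]"      # prints [was set]   (set, even though empty)
echo "[${set_var+was set}]"    # prints [was set]
echo "[${missing:-fallback}]"  # prints [fallback]  (unset)
echo "[${empty:-fallback}]"    # prints [fallback]  (set but empty)
echo "[${set_var:-fallback}]"  # prints [md5sum]
```

In checkHash, the array element is unset for any length that is not a known hash length, so the first expansion contributes nothing and the second supplies the fallback message.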
Further explaining @Gordon Davisson's comment, plus some basics for anyone who stops by
NB This answer is extremely simplified to apply only to the current question. here's my preferred guide for more regex
Basics of regex
^ - start of a line
$ - end of a line
[...] - list of possible characters
has special sauce
a-z = all lowercase (English) letters; 0-9 = all digits; etc.
also accepts character classes - e.g [:xdigit:] for hexadecimal characters
the expression is now [[:xdigit:]] - i.e [:class:] inside [...]
{...} - number of times the preceding expression should be matched
^[a]{1}$ will match a but not aa
^f[o]{2}d$ will match food but not fod, foood, fooo*d
^[a-z]{4}$ will match
ball ✔️ but not buffalo ❌
cove ✔️ but not cover ❌
basically any line ( because of the ^...$) containing a string of exactly 4 (English) alphabetic characters
{1,5} - at least 1 and at most 5
* - shorthand for {0,} meaning 0 or any number of times
+ - shorthand for {1,} meaning at least 1; but no upper limit
? - shorthand for {0,1} meaning at most once (i.e. optional)
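So ${32} asks for the end-of-line anchor $ repeated 32 times, which is not what you meant; what you need is '^[a-z0-9=]{32}$' instead, with the count applied to the bracket expression.
BUT as also pointed out by Andrej Podzimek in the comments, a hash contains only hexadecimal characters, so [0-9a-f] (together with grep -i) is enough; the character class [[:xdigit:]] matches the same characters in either case. Either can be used.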
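As a minimal sketch of the point above, the interval expression can be attached to the bracket expression and anchored at both ends. The helper name is_hex_of_length is hypothetical; the sample value is the md5-length string used elsewhere in this thread.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if $1 is exactly $2 hexadecimal characters.
is_hex_of_length() {
    # {N} repeats the bracket expression; ^ and $ anchor the whole string.
    local re="^[[:xdigit:]]{$2}\$"
    [[ $1 =~ $re ]]
}

md5_like='13aba32dbe4db7a7117ed40a25c29fa8'   # 32 hex characters
is_hex_of_length "$md5_like" 32 && echo "32 hex characters"
is_hex_of_length "$md5_like" 40 || echo "not 40 hex characters"
```

A matching length only says the input could be a hash of that type; it does not prove which tool produced it.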
PS
more Basics
. (fullstop/period) matches ANY character including spaces and special characters
(...) groups a sub-pattern (and captures what it matched)
[a-z ]*(chicken).*
will match anything from chicken coop to chicken soup and please pass that chicken cookbook, Alex?
[.] means period/fullstop not any character
note the space after z this is to make space (ascii 32 ) a possible character
and . is case-insensitive
PPS if it's for homework/assignment/schoolwork, please specify so in your question :)

How to check if the user input value is Upper case, Lower case or a digit using Shell Script?

I am trying to write a shell script to read user input data and check if the input value is either Upper Case, Lower Case or anything else. But what I wrote is only checking a single character
Here is what I wrote:
printf 'Please enter a character: '
IFS= read -r c
case $c in
([[:lower:]]) echo lowercase letter;;
([[:upper:]]) echo uppercase letter;;
([[:alpha:]]) echo neither lower nor uppercase letter;;
([[:digit:]]) echo decimal digit;;
(?) echo any other single character;;
("") echo nothing;;
(*) echo anything else;;
esac
How can I make it read a long String other than a single character and get the output accordingly?
You can do it in many ways, here you have one:
#!/bin/bash
read -p "Enter something: " str
echo "Your input is: $str"
strUppercase=$(printf '%s\n' "$str" | awk '{ print toupper($0) }')
strLowercase=$(printf '%s\n' "$str" | awk '{ print tolower($0) }')
if [ -z "${str//[0-9]}" ]
then
echo "Digit"
elif [ "$str" == "$strLowercase" ]
then
echo "Lowercase"
elif [ "$str" == "$strUppercase" ]
then
echo "Uppercase"
else
echo "Something else"
fi
Preceding your use with shopt -s extglob, you can use +([[:upper:]]) to match a string composed of one or more uppercase letters.
From man 1 bash:
If the extglob shell option is enabled using the shopt builtin, several
extended pattern matching operators are recognized. In the following
description, a pattern-list is a list of one or more patterns separated
by a |. Composite patterns may be formed using one or more of the
following sub-patterns:
?(pattern-list)
Matches zero or one occurrence of the given patterns
*(pattern-list)
Matches zero or more occurrences of the given patterns
+(pattern-list)
Matches one or more occurrences of the given patterns
@(pattern-list)
Matches one of the given patterns
!(pattern-list)
Matches anything except one of the given patterns
Use, for example, +([[:upper:][:digit:] .]) to match one or more {uppercase letters, digits, spaces, dots}. Consider using some of the other following classes defined in the POSIX standard:
alnum alpha ascii blank cntrl digit graph lower print punct space upper word xdigit
Proof (just a test on an example) that it works:
shopt -s extglob; case "1A5. .Q7." in (+([[:upper:][:digit:] .])) echo "it works";; esac
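Applied to the original question, the extglob patterns can replace the single-character patterns in the case statement. This is only a sketch: the function name classify and the wording of the messages are assumptions.

```bash
#!/usr/bin/env bash
shopt -s extglob   # must be enabled before the patterns below are parsed

# Hypothetical helper: classify a whole string rather than a single character.
classify() {
    case $1 in
        '')              echo "nothing" ;;
        +([[:lower:]]))  echo "lowercase letters" ;;
        +([[:upper:]]))  echo "uppercase letters" ;;
        +([[:digit:]]))  echo "decimal digits" ;;
        *)               echo "anything else" ;;
    esac
}

classify hello   # prints lowercase letters
classify HELLO   # prints uppercase letters
classify 2024    # prints decimal digits
classify Hello1  # prints anything else
```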

How can I increment a number at the end of a string in bash?

Basically i need to create a function where an argument is passed, and i need to update the number so for example the argument would be
version_2 and after the function it would change it to version_3
just increments by one
in java I would just create a new string, and grab the last character update by one and append but not sure how to do it in bash.
updateVersion() {
version=$1
}
the prefix can be anything for example it can be dog12 or dog_12 and always has one number to update.
after the update it would be dog13 or dog_13 respectively.
updateVersion()
{
[[ $1 =~ ([^0-9]*)([0-9]+) ]] || { echo 'invalid input' >&2; return 1; }
echo "${BASH_REMATCH[1]}$(( ${BASH_REMATCH[2]} + 1 ))"
}
# Usage
updateVersion version_11 # output: version_12
updateVersion version11 # output: version12
updateVersion something_else123 # output: something_else124
updateVersion "with spaces 99" # output: with spaces 100
# Putting it in a variable
v2="$(updateVersion version2)"
echo "$v2" # output: version3
Use parameter expansion:
#! /bin/bash
shopt -s extglob
for version in version_1 version_19 version_34.14 ; do
echo $version
v=${version##*[^0-9]}
((++v))
echo ${version%%+([0-9])}$v
done
extglob is needed for the +([0-9]) construct which means "one or more digits".
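If enabling extglob is not desirable, the same idea can be sketched with plain parameter expansion by stripping the extracted digits back off the end. The function name bump_trailing_number is hypothetical, and the 10# prefix is an addition that guards against leading zeros being read as octal.

```bash
#!/usr/bin/env bash
# Hypothetical helper built on the same idea, without extglob.
bump_trailing_number() {
    local digits=${1##*[^0-9]}     # the run of digits at the end (may be empty)
    if [[ -z $digits ]]; then
        printf '%s\n' "$1"         # nothing to increment; print the input back
        return
    fi
    local prefix=${1%"$digits"}    # everything before that run
    # 10# forces base 10; note that leading zeros are not preserved.
    printf '%s%d\n' "$prefix" "$(( 10#$digits + 1 ))"
}

bump_trailing_number version_19     # prints version_20
bump_trailing_number version_34.14  # prints version_34.15
```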
incrementTrailingNumber() {
local prefix number
if [[ $1 =~ ^(.*[^[:digit:]])([[:digit:]]+)$ ]]; then
prefix=${BASH_REMATCH[1]}
number=${BASH_REMATCH[2]}
printf '%s%s\n' "$prefix" "$(( number + 1 ))"
else
printf '%s\n' "$1"
fi
}
Usage as:
$ incrementTrailingNumber version_30
version_31
$ incrementTrailingNumber foo-2.15
foo-2.16
$ incrementTrailingNumber noNumberHereAtAll # demonstrate noop case
noNumberHereAtAll
Late to the party here, but there is an issue with the accepted answer. It works for the OP's case where there are no numbers before the end, but I had an example like this:
1.0.143
For that, the regexp needs to be a bit looser. Here's how I did it, preserving leading zeroes:
#!/usr/bin/env bash
updateVersion()
{
[[ ${1} =~ ^(.*[^0-9])?([0-9]+)$ ]] && \
[[ ${#BASH_REMATCH[1]} -gt 0 ]] && \
printf "%s%0${#BASH_REMATCH[2]}d" "${BASH_REMATCH[1]}" "$((10#${BASH_REMATCH[2]} + 1 ))" || \
printf "%0${#BASH_REMATCH[2]}d" "$((10#${BASH_REMATCH[2]} + 1))" || \
printf "${1}"
}
# Usage
updateVersion 09 # output 10
updateVersion 1.0.450 # output 1.0.451
updateVersion version_01 # output version_02
updateVersion version12 # output version13
updateVersion version19 # output version20
Notes:
You only need to double-quote the first argument to printf.
Replace ${1} with content in "" if you want to use it on a command line,
instead of in a function.
You can switch the last printf to a basic echo if you prefer. If you are just printing to stdout or stderr, consider adding a newline (\n) at the end of each printf.
You can combine the function content into a single line, but it's harder to read. It's better to break it into lines with \ at every if (&&) and else (||), as above.
What the function does - line by line:
Test the passed value ends with a number of one or more digits, optionally prefixed by at least one non-number. Split into two groups accordingly (indexing is 1-based).
When ending in a number, test there is a non-numeric prefix (i.e. length of group 1 > 0).
When there are non-numerics, print group 1 (a string) followed by group 2 (an integer padded with zeroes to match the original string size). Group 2 is base-10 converted and incremented by 1. The conversion is important - leading zeroes are interpreted as octal by default.
When there are only numbers, increment as above but just print group 2.
If the input is anything else, return the supplied string.
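The steps above can also be written as an explicit if/else, which some readers may find easier to follow than the && / || chain. This is a sketch of the same logic and not the answer's own code: the name update_version_padded is hypothetical, and it uses printf's * width specifier in place of building the format string.

```bash
#!/usr/bin/env bash
# Hypothetical rewrite of the logic described above as an if/else.
update_version_padded() {
    local re='^(.*[^0-9])?([0-9]+)$'
    if [[ $1 =~ $re ]]; then
        local prefix=${BASH_REMATCH[1]}   # empty when the input is all digits
        local number=${BASH_REMATCH[2]}
        # Pad to the original width; 10# avoids octal interpretation of 09.
        printf '%s%0*d\n' "$prefix" "${#number}" "$(( 10#$number + 1 ))"
    else
        printf '%s\n' "$1"                # no trailing number: print unchanged
    fi
}

update_version_padded 09          # prints 10
update_version_padded 1.0.450     # prints 1.0.451
update_version_padded version_01  # prints version_02
```

Because an empty prefix prints as nothing, the separate "only numbers" branch is not needed in this form.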

Bash shell test if all characters in one string are in another string

I have two strings which I want to compare for equal chars, the strings must contain the exact chars but mychars can have extra chars.
mychars="abcdefg"
testone="abcdefgh" # false h is not in mychars
testtwo="abcddabc" # true all char in testtwo are in mychars
function test() {
if each char in $1 is in $2 # PSEUDO CODE
then
return 1
else
return 0
fi
}
if test $testone $mychars; then
echo "All in the string" ;
else ; echo "Not all in the string" ; fi
# should echo "Not all in the string" because the h is not in the string mychars
if test $testtwo $mychars; then
echo "All in the string" ;
else ; echo "Not all in the string" ; fi
# should echo 'All in the string'
What is the best way to do this? My guess is to loop over all the chars in the first parameter.
You can use tr to replace any char from mychars with a symbol, then test whether the resulting string is anything other than that single symbol, e.g.:
tr -s "[$mychars]" "." <<< "ggaaabbbcdefg"
Outputs:
.
But:
tr -s "[$mychars]" "." <<< "xxxggaaabbbcdefgxxx"
Prints:
xxx.xxx
So, your function could be like the following:
function test() {
local dictionary="$1"
local res=$(tr -s "[$dictionary]" "." <<< "$2")
if [ "$res" == "." ]; then
return 1
else
return 0
fi
}
Update: As suggested by @mklement0, the whole function could be shortened (and the logic fixed) by the following:
function test() {
local dictionary="$1"
[[ '.' == $(tr -s "[$dictionary]" "." <<< "$2") ]]
}
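For comparison, the character-by-character loop that the question guessed at can be sketched in pure bash. The function name all_chars_in is hypothetical (chosen to avoid shadowing the test builtin), and it takes the string to check first and the allowed characters second, matching the call order in the question.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if every character of $1 occurs in $2.
all_chars_in() {
    local str=$1 allowed=$2 i
    for (( i = 0; i < ${#str}; i++ )); do
        # The quoted substring is matched literally, so *, ? and [ are safe.
        [[ $allowed == *"${str:i:1}"* ]] || return 1
    done
    return 0   # also reached for an empty $1 (vacuously true)
}

mychars="abcdefg"
all_chars_in "abcddabc" "$mychars" && echo "All in the string"
all_chars_in "abcdefgh" "$mychars" || echo "Not all in the string"
```

Unlike the tr version, this returns 0 for success, so it can be used directly in an if statement.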
The accepted answer's solution is short, clever, and efficient.
Here's a less efficient alternative, which may be of interest if you want to know which characters are unique to the 1st string, returned as a sorted, distinct list:
charTest() {
local charsUniqueToStr1
# Determine which chars. in $1 aren't in $2.
# This returns a sorted, distinct list of chars., each on its own line.
charsUniqueToStr1=$(comm -23 \
<(sed 's/\(.\)/\1\'$'\n''/g' <<<"$1" | sort -u) \
<(sed 's/\(.\)/\1\'$'\n''/g' <<<"$2" | sort -u))
# The test succeeds if there are no chars. in $1 that aren't also in $2.
[[ -z $charsUniqueToStr1 ]]
}
mychars="abcdefg" # define reference string
charTest "abcdefgh" "$mychars"
echo $? # print exit code: 1 - 'h' is not in reference string
charTest "abcddabc" "$mychars"
echo $? # print exit code: 0 - all chars. are in reference string
Note that I've renamed test() to charTest() to avoid a name collision with the test builtin/utility.
sed 's/\(.\)/\1\'$'\n''/g' splits the input into individual characters by placing each on a separate line.
Note that the command creates an extra empty line at the end, but that doesn't matter in this case; to eliminate it, append ; ${s/\n$//;} to the sed script.
The command is written in a POSIX-compliant manner, which complicates it, because an actual \-escaped newline has to be spliced in (via an ANSI C-quoted string, $'\n'); if you have GNU sed, you can simplify to sed -r 's/(.)/\1\n/g'.
sort -u then sorts the resulting list of characters and weeds out duplicates (-u).
comm -23 compares the distinct set of sorted characters in both strings and prints those unique to the 1st string (comm uses a 3-column layout, with the 1st column containing lines unique to the 1st file, the 2nd column containing lines unique to the 2nd file, and the 3rd column printing lines the two input files have in common; -23 suppresses the 2nd and 3rd columns, effectively only printing the lines that are unique to the 1st input).
[[ -z $charsUniqueToStr1 ]] then tests if $charsUniqueToStr1 is empty (-z);
in other words: success (exit code 0) is indicated, if the 1st string contains no chars. that aren't also contained in the 2nd string; otherwise, failure (exit code 1); by virtue of the conditional ([[ .. ]]) being the last statement in the function, its exit code also becomes the function's exit code.
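To see the list of offending characters itself, the same comm pipeline can be sketched with fold doing the splitting instead of sed. This swaps in a different splitting tool than the answer uses: it assumes single-byte (ASCII) input, and the function name chars_unique_to_first is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical variant: print the characters of $1 that do not occur in $2.
chars_unique_to_first() {
    # fold -w 1 puts each character on its own line (single-byte input assumed).
    comm -23 \
        <(fold -w 1 <<<"$1" | sort -u) \
        <(fold -w 1 <<<"$2" | sort -u)
}

chars_unique_to_first "abcdefgh" "abcdefg"   # prints h
chars_unique_to_first "abcddabc" "abcdefg"   # prints nothing
```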

substring extraction in bash

iamnewbie: this code is inefficient but it should extract the substring, the problem is with last echo statement,need some insight.
function regex {
#this function gives the regular expression needed
echo -n \'
for (( i = 1 ; i <= $1 ; i++ ))
do
echo -n .
done
echo -n '\('
for (( i = 1 ; i <= $2 ; i++ ))
do
echo -n .
done
echo -n '\)'
echo -n \'
}
# regex function ends
echo "Enter the string:"
read stg
#variable stg holds the string entered
if [ -z "$stg" ] ; then
echo "Null string"
exit
else
echo "Length of the $stg is:"
z=`expr "$stg" : '.*' `
#variable z holds the length of given string
echo $z
fi
echo "Enter the number of trailing characters to be extracted from $stg:"
read n
m=`expr $z - $n `
#variable m holds an integer value which is equal to total length - length of characters to be extracted
x=$(regex $m $n)
echo ` expr "$stg" : "$x" `
#the echo statement(above) is just printing a newline!! But not the result
What I intend to do with this code is, if I enter "racecar" and give "3", it should display "car", which are the last three characters. Instead of displaying "car" it's just printing a newline. Please correct this code rather than giving a better one.
Although you didn't ask for a better solution, it's worth mentioning:
$ n=3
$ stg=racecar
$ echo "${stg: -n}"
car
Note that the space after the : in ${stg: -n} is required. Without the space, the parameter expansion is a default-value expansion rather than a substring expansion. With the space, it's a substring expansion; -n is interpreted as an arithmetic expression (which means that n is interpreted as $n) and since the result is a negative number, it specifies the number of characters from the end to start the substring. See the Bash manual for details.
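Two edge cases are worth guarding against when wrapping this in a function: a count of 0 makes the offset -0, which is offset 0 and therefore yields the whole string, and a count larger than the string's length yields an empty result. A minimal sketch, with the hypothetical name last_n and with the choice to return the whole string for an oversized count being an assumption about the desired behavior:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the last $2 characters of $1.
last_n() {
    local s=$1 n=$2
    if (( n <= 0 )); then
        printf '\n'               # nothing requested
    elif (( n >= ${#s} )); then
        printf '%s\n' "$s"        # asked for at least the whole string
    else
        printf '%s\n' "${s: -n}"  # the space before -n is required
    fi
}

last_n racecar 3    # prints car
last_n racecar 10   # prints racecar
```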
Your solution is based on evaluating the equivalent of:
expr "$stg" : '......\(...\)'
with an appropriate number of dots. It's important to understand what the above bash syntax actually means. It invokes the command expr, passing it three arguments:
arg 1: the contents of the variable stg
arg 2: :
arg 3: ......\(...\)
Note that there are no quotes visible. That's because the quotes are part of bash syntax, not part of the argument values.
If the value of stg had enough characters, the result of the above expr invocation would be to print out the 7th, 8th and 9th characters of the value of stg. Otherwise, it would print a blank line, and fail.
But that's not what you are doing. You're creating the regular expression:
'......\(...\)'
which has single quotes in it. Since single-quotes are not special characters in a regex, they match themselves; in other words, that pattern will match a string which starts with a single quote, followed by nine arbitrary characters, followed by another single quote. And if the string does match, it will print the three characters prior to the second single-quote.
Of course, since the regular expression you make has a . for every character in the target string, it won't match the target even if the target started and ended with a single quote, since there would be too many dots in the regex to match that.
If you don't put single quotes into the regex, then your program will work, but I have to say that few times have I seen such an intensely circuitous implementation of the substring function. If you're not trying to win an obfuscated bash competition (a difficult challenge since most production bash code is obfuscated by nature), I'd suggest you use normal bash features instead of trying to do everything with regexen.
One of those is the syntax to determine the length of a string:
$ stg=racecar
$ echo ${#stg}
7
(although, as shown at the beginning, you don't actually even need that.)
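Since the question asked for the original approach to be corrected, the fix described above (dropping the literal single quotes from the generated pattern) can be sketched as follows. The function keeps the question's name regex; building the pattern in a local variable instead of a series of echo -n calls is a stylistic assumption.

```bash
#!/usr/bin/env bash
# Sketch of the question's regex function with the literal quotes removed.
# $1 = number of leading characters to skip, $2 = number to capture.
regex() {
    local skip=$1 keep=$2 i pattern=''
    for (( i = 0; i < skip; i++ )); do pattern+='.'; done
    pattern+='\('
    for (( i = 0; i < keep; i++ )); do pattern+='.'; done
    pattern+='\)'
    printf '%s' "$pattern"
}

stg=racecar
n=3
# Quoting "$(...)" here is shell syntax; expr never sees the quote characters.
expr "$stg" : "$(regex $(( ${#stg} - n )) "$n")"   # prints car
```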
What about:
$ n=3
$ string="racecar"
$ [[ "$string" =~ (.{$n})$ ]]
$ echo ${BASH_REMATCH[1]}
car
This looks for the last n characters at the end of the line. In a script:
#!/bin/bash
read -p "Enter a string: " string
read -p "Enter the number of characters you want from the end: " n
[[ "$string" =~ (.{$n})$ ]]
echo "These are the last $n characters: ${BASH_REMATCH[1]}"
You may want to add some more error handling, but this'll do it.
I'm not sure you need loops for this task. I wrote some example to get two parameters from user and cut the word according to it.
#!/bin/bash
read -p "Enter some word? " -e stg
#variable stg holds the string entered
if [ -z "$stg" ] ; then
echo "Null string"
exit 1
fi
read -p "Enter some number to set word length? " -e cutNumber
# check that cutNumber is a number
if ! [ "$cutNumber" -eq "$cutNumber" ] 2>/dev/null; then
echo "Not a number!"
exit 1
fi
echo "Cut first n characters:"
echo ${stg:$cutNumber}
echo
echo "Show first n characters:"
echo ${stg:0:$cutNumber}
echo "Alternative get last n characters:"
echo -n "$stg" | tail -c $cutNumber
echo
Example:
Enter some word? TheRaceCar
Enter some number to set word length? 7
Cut first n characters:
Car
Show first n characters:
TheRace
Alternative get last n characters:
RaceCar
