Length of string in bash - bash

How do you get the length of a string stored in a variable and assign that to another variable?
myvar="some string"
echo ${#myvar}
# 11
How do you set another variable to the output 11?

To get the length of a string stored in a variable, say:
myvar="some string"
size=${#myvar}
To confirm it was properly saved, echo it:
$ echo "$size"
11
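Once the length is in a variable it can be used like any other number. The following is a minimal sketch; the helper name fits_within is hypothetical and not part of the original answer:

```shell
# Hypothetical helper: succeed when a string is at most a given number of characters long.
fits_within() {
    [ "${#1}" -le "$2" ]   # $1 = string, $2 = maximum length
}

myvar="some string"
size=${#myvar}
if fits_within "$myvar" 20; then
    echo "'$myvar' has $size characters and fits"
fi
```

With the string above, size is 11 and the message is printed.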

Edit 2023-02-13: Use of printf %n instead of locales...
UTF-8 string length
In addition to fedorqui's correct answer, I would like to show the difference between string length and byte length:
myvar='Généralités'
chrlen=${#myvar}
oLang=$LANG oLcAll=$LC_ALL
LANG=C LC_ALL=C
bytlen=${#myvar}
LANG=$oLang LC_ALL=$oLcAll
printf "%s is %d char len, but %d bytes len.\n" "${myvar}" $chrlen $bytlen
will render:
Généralités is 11 char len, but 14 bytes len.
you could even have a look at stored chars:
myvar='Généralités'
chrlen=${#myvar}
oLang=$LANG oLcAll=$LC_ALL
LANG=C LC_ALL=C
bytlen=${#myvar}
printf -v myreal "%q" "$myvar"
LANG=$oLang LC_ALL=$oLcAll
printf "%s has %d chars, %d bytes: (%s).\n" "${myvar}" $chrlen $bytlen "$myreal"
will answer:
Généralités has 11 chars, 14 bytes: ($'G\303\251n\303\251ralit\303\251s').
Note: Following Isabell Cowan's comment, I now set $LC_ALL along with $LANG.
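The save-and-restore of the locale variables can also be confined to a function, so the caller's settings are never touched. This is only a sketch: the name bytelen is hypothetical, and it assumes a shell where local is available and where assigning LC_ALL takes effect immediately (as it does in bash):

```shell
# Hypothetical helper: print the byte length of its first argument.
bytelen() {
    local LC_ALL=C          # scoped to this function; the previous value returns afterwards
    printf '%s\n' "${#1}"   # in the C locale every byte counts as one character
}

myvar='Généralités'
printf '%s: %d chars, %d bytes\n' "$myvar" "${#myvar}" "$(bytelen "$myvar")"
```

Under a UTF-8 locale this should agree with the figures shown above (11 characters, 14 bytes); under a single-byte locale both numbers will be the byte count.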
Same, but without having to play with locales
I recently learned about the %n format of the printf builtin:
myvar='Généralités'
chrlen=${#myvar}
printf -v _ %s%n "$myvar" bytlen
printf "%s is %d char len, but %d bytes len.\n" "${myvar}" $chrlen $bytlen
Généralités is 11 char len, but 14 bytes len.
The syntax is a little counter-intuitive, but this is very efficient! (The strU8DiffLen function further down is about 2 times quicker using printf than the previous version using local LANG=C.)
Length of an argument, working sample
Arguments work the same as regular variables:
showStrLen() {
local -i chrlen=${#1} bytlen
printf -v _ %s%n "$1" bytlen
printf "String '%s' is %d bytes, but %d chars len: %q.\n" "$1" $bytlen $chrlen "$1"
}
will work as
showStrLen théorème
String 'théorème' is 10 bytes, but 8 chars len: $'th\303\251or\303\250me'
A useful printf correction tool:
If you run:
for string in Généralités Language Théorème Février "Left: ←" "Yin Yang ☯";do
printf " - %-14s is %2d char length\n" "'$string'" ${#string}
done
- 'Généralités' is 11 char length
- 'Language' is 8 char length
- 'Théorème' is 8 char length
- 'Février' is 7 char length
- 'Left: ←' is 7 char length
- 'Yin Yang ☯' is 10 char length
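Not really pretty output: printf pads by bytes, not characters, so strings containing multibyte characters end up misaligned.
To correct this, here is a little function: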
strU8DiffLen() {
local -i bytlen
printf -v _ %s%n "$1" bytlen
return $(( bytlen - ${#1} ))
}
or written in one line:
strU8DiffLen() { local -i _bl;printf -v _ %s%n "$1" _bl;return $((_bl-${#1}));}
And now:
for string in Généralités Language Théorème Février "Left: ←" "Yin Yang ☯";do
strU8DiffLen "$string"
printf " - %-$((14+$?))s is %2d chars length, but uses %2d bytes\n" \
"'$string'" ${#string} $((${#string}+$?))
done
- 'Généralités' is 11 chars length, but uses 14 bytes
- 'Language' is 8 chars length, but uses 8 bytes
- 'Théorème' is 8 chars length, but uses 10 bytes
- 'Février' is 7 chars length, but uses 8 bytes
- 'Left: ←' is 7 chars length, but uses 9 bytes
- 'Yin Yang ☯' is 10 chars length, but uses 12 bytes
Unfortunately, this is not perfect!
Some strange UTF-8 behaviours remain, such as double-width characters, zero-width characters, reverse displacement and others that cannot be handled this simply...
Have a look at diffU8test.sh or diffU8test.sh.txt for more limitations.

I wanted the simplest case; in the end this is what I came up with:
echo -n 'Tell me the length of this sentence.' | wc -m;
36

You can use:
MYSTRING="abc123"
MYLENGTH=$(printf "%s" "$MYSTRING" | wc -c)
wc -c or wc --bytes counts bytes: a Unicode character may count as 2, 3 or more bytes.
wc -m or wc --chars counts characters: each Unicode character counts as one, however many bytes it uses.
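A small sketch of both counts side by side. Wrapping the pipeline in $(( ... )) is an addition not found in the answer above; it is there because some wc implementations pad the number with spaces:

```shell
MYSTRING="abc123"
# Arithmetic expansion normalises the count, dropping any padding wc may add.
bytes=$(( $(printf '%s' "$MYSTRING" | wc -c) ))
chars=$(( $(printf '%s' "$MYSTRING" | wc -m) ))
echo "bytes=$bytes chars=$chars"
```

For this ASCII-only string both values are 6; they only diverge once the string contains multibyte characters and the locale is a multibyte one.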

In response to the post starting:
If you want to use this with command line or function arguments...
with the code:
size=${#1}
There might be the case where you just want to check for a zero length argument and have no need to store a variable. I believe you can use this sort of syntax:
if [ -z "$1" ]; then
#zero length argument
else
#non-zero length
fi
See GNU and wooledge for a more complete list of Bash conditional expressions.
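As an illustration, the zero-length check can guard a function before it does any work. The function name describe_arg is made up for this sketch:

```shell
# Hypothetical function: refuse an empty first argument, otherwise report its length.
describe_arg() {
    if [ -z "$1" ]; then
        echo "error: empty or missing argument" >&2
        return 1
    fi
    echo "argument has ${#1} characters"
}

describe_arg "hello"                  # prints: argument has 5 characters
describe_arg "" || echo "rejected"    # error goes to stderr, then prints: rejected
```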

If you want to use this with command line or function arguments, make sure you use size=${#1} instead of size=${#$1}. The second one may be more instinctual but is incorrect syntax.
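A related point that is easy to mix up: $# is the number of arguments, while ${#1} is the length of the first one. A tiny sketch, with show_args as a hypothetical name:

```shell
# Hypothetical function: contrast the argument count with the length of the first argument.
show_args() {
    echo "count=$# first_len=${#1}"
}

show_args "some string" other   # prints: count=2 first_len=11
```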

Using your example provided
#KISS (Keep it simple stupid)
size=${#myvar}
echo $size

Here are a couple of ways to calculate the length of a variable:
echo "${#VAR}"
echo -n "$VAR" | wc -m
echo -n "$VAR" | wc -c
printf '%s' "$VAR" | wc -m
expr length "$VAR"
expr "$VAR" : '.*'
To store the result in another variable, wrap any of the commands above in backquotes (command substitution):
otherVar=`echo -n "$VAR" | wc -m`
echo "$otherVar"
http://techopsbook.blogspot.in/2017/09/how-to-find-length-of-string-variable.html

I know that the question and answers are quite old, but today I faced this task for the first time. I usually use the ${#var} construct, but it fails with Unicode: most of the text I process with bash is in Cyrillic...
Based on @atesin's answer, I made a short function (which could be shortened further) that may be useful for scripting. The task that led me to this question was showing a message of variable length in a pseudo-graphics box. So, here it is:
$ cat draw_border.sh
#!/bin/sh
#based on https://stackoverflow.com/questions/17368067/length-of-string-in-bash
border()
{
local BPAR="$1"
local BPLEN=`echo $BPAR|wc -m`
local OUTLINE=\|\ "$1"\ \|
# line below based on https://www.cyberciti.biz/faq/repeat-a-character-in-bash-script-under-linux-unix/
# comment of Bit Twiddler Jun 5, 2021 # 8:47
local OUTBORDER=\+`head -c $(($BPLEN+1))</dev/zero|tr '\0' '-'`\+
echo $OUTBORDER
echo $OUTLINE
echo $OUTBORDER
}
border "Généralités"
border 'А вот еще одна '$LESSCLOSE' '
border "pure ENGLISH"
And what this sample produces:
$ draw_border.sh
+-------------+
| Généralités |
+-------------+
+----------------------------------+
| А вот еще одна /usr/bin/lesspipe |
+----------------------------------+
+--------------+
| pure ENGLISH |
+--------------+
First example (in French?) was taken from someone's example above.
The second one combines Cyrillic and the value of some variable. The third one is self-explanatory: only characters from the first half of the 8-bit range, i.e. plain ASCII.
I used echo $BPAR|wc -m instead of printf ... so as not to depend on whether printf is built in or not.
Above I saw discussion of the trailing newline and the -n parameter for echo. I did not use it, so I add only one to $BPLEN. Had I used -n, I would have had to add 2.
To explain the difference between wc -m and wc -c, see the same script with only one minor change: -m was replaced with -c
$ draw_border.sh
+----------------+
| Généralités |
+----------------+
+---------------------------------------------+
| А вот еще одна /usr/bin/lesspipe |
+---------------------------------------------+
+--------------+
| pure ENGLISH |
+--------------+
Accented Latin characters and most Cyrillic characters take two bytes, so the horizontal lines drawn are longer than the real length of the message.
Hope it saves someone some time :-)
p.s. Russian text says "here is one more"
p.p.s. Working "two-liner"
#!/bin/sh
#based on https://stackoverflow.com/questions/17368067/length-of-string-in-bash
border()
{
# line below based on https://www.cyberciti.biz/faq/repeat-a-character-in-bash-script-under-linux-unix/
# comment of Bit Twiddler Jun 5, 2021 # 8:47
local OUTBORDER=\+`head -c $(( $(echo "$1"|wc -m) +1))</dev/zero|tr '\0' '-'`\+
echo $OUTBORDER"\n"\|\ "$1"\ \|"\n"$OUTBORDER
}
border "Généralités"
border 'А вот еще одна '$LESSCLOSE' '
border "pure ENGLISH"
To avoid cluttering the code with repeated drawing of OUTBORDER, I put the construction of OUTBORDER into a separate command.
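For comparison, here is a sketch of the same idea that sizes the frame with ${#1} rather than wc. The function name box is hypothetical, and the sketch assumes a shell such as bash in which ${#1} counts characters under a UTF-8 locale; in a shell that counts bytes it will behave like the wc -c variant above:

```shell
# Hypothetical variant of border(): sizes the frame with ${#1} instead of wc.
box() {
    local len line
    len=$(( ${#1} + 2 ))                            # text plus one space on each side
    line="+$(printf "%${len}s" '' | tr ' ' '-')+"   # pad an empty string, turn the spaces into dashes
    printf '%s\n' "$line" "| $1 |" "$line"
}

box "pure ENGLISH"
```

For the ASCII example this draws the same 14-dash frame as the script above. Wide or zero-width characters would still throw the alignment off, since the character count is not the display width.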

Maybe just use wc -c, which counts bytes (equal to the number of characters only when every character is a single byte):
myvar="Hello, I am a string."
echo -n "$myvar" | wc -c
Result:
21

Length of string in bash
str="Welcome to Stackoveflow"
length=`expr length "$str"`
echo "Length of '$str' is $length"
OUTPUT
Length of 'Welcome to Stackoveflow' is 23

Related

Is there a command for substituting a set of characters by a set of strings?

I would like to substitute a set of single-byte characters with a set of literal strings in a stream, without any constraint on the line size.
#!/bin/bash
for (( i = 1; i <= 0x7FFFFFFFFFFFFFFF; i++ ))
do
printf '\a,\b,\t,\v'
done |
chars_to_strings $'\a\b\t\v' '<bell>' '<backspace>' '<horizontal-tab>' '<vertical-tab>'
The expected output would be:
<bell>,<backspace>,<horizontal-tab>,<vertical-tab><bell>,<backspace>,<horizontal-tab>,<vertical-tab><bell>...
I can think of a bash function that would do that, something like:
chars_to_strings() {
local delim buffer
while true
do
delim=''
IFS='' read -r -d '.' -n 4096 buffer && (( ${#buffer} != 4096 )) && delim='.'
if [[ -n "${delim:+_}" ]] || [[ -n "${buffer:+_}" ]]
then
# Do the replacements in "$buffer"
# ...
printf "%s%s" "$buffer" "$delim"
else
break
fi
done
}
But I'm looking for a more efficient way, any thoughts?
Since you seem to be okay with using ANSI C quoting via $'...' strings, then maybe use sed?
sed $'s/\a/<bell>/g; s/\b/<backspace>/g; s/\t/<horizontal-tab>/g; s/\v/<vertical-tab>/g'
Or, via separate commands:
sed -e $'s/\a/<bell>/g' \
-e $'s/\b/<backspace>/g' \
-e $'s/\t/<horizontal-tab>/g' \
-e $'s/\v/<vertical-tab>/g'
Or, using awk, which replaces newline characters too (by customizing the Output Record Separator, i.e., the ORS variable):
$ printf '\a,\b,\t,\v\n' | awk -vORS='<newline>' '
{
gsub(/\a/, "<bell>")
gsub(/\b/, "<backspace>")
gsub(/\t/, "<horizontal-tab>")
gsub(/\v/, "<vertical-tab>")
print $0
}
'
<bell>,<backspace>,<horizontal-tab>,<vertical-tab><newline>
For a simple one-liner with reasonable portability, try Perl.
for (( i = 1; i <= 0x7FFFFFFFFFFFFFFF; i++ ))
do
printf '\a,\b,\t,\v'
done |
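perl -pe 's/\a/<bell>/g;
s/\x08/<backspace>/g;s/\t/<horizontal-tab>/g;s/\x0B/<vertical-tab>/g'
Note that in a Perl regex \b is a word boundary and \v matches any vertical whitespace, hence the \x08 and \x0B escapes for backspace and vertical tab. Perl internally does some intelligent optimizations so it's not encumbered by lines which are longer than its input buffer or whatever.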
Perl by itself is not POSIX, of course; but it can be expected to be installed on any even remotely modern platform (short of perhaps embedded systems etc).
Assuming the overall objective is to provide the ability to process a stream of data in real time without having to wait for an EOL/End-of-buffer occurrence to trigger processing ...
A few items:
continue to use the while/read -n loop to read a chunk of data from the incoming stream and store in buffer variable
push the conversion code into something that's better suited to string manipulation (ie, something other than bash); for sake of discussion we'll choose awk
within the while/read -n loop printf "%s\n" "${buffer}" and pipe the output from the while loop into awk; NOTE: the key item is to introduce an explicit \n into the stream so as to trigger awk processing for each new 'line' of input; OP can decide if this additional \n must be distinguished from a \n occurring in the original stream of data
awk then parses each line of input as per the replacement logic, making sure to append anything leftover to the front of the next line of input (ie, for when the while/read -n breaks an item in the 'middle')
General idea:
chars_to_strings() {
while read -r -n 15 buffer # using '15' for demo purposes otherwise replace with '4096' or whatever OP wants
do
printf "%s\n" "${buffer}"
done | awk '{print NR,FNR,length($0)}' # replace 'print ...' with OP's replacement logic
}
Take for a test drive:
for (( i = 1; i <= 20; i++ ))
do
printf '\a,\b,\t,\v'
sleep 0.1 # add some delay to data being streamed to chars_to_strings()
done | chars_to_strings
1 1 15 # output starts printing right away
2 2 15 # instead of waiting for the 'for'
3 3 15 # loop to complete
4 4 15
5 5 13
6 6 15
7 7 15
8 8 15
9 9 15
A variation on this idea using a named pipe:
mkfifo /tmp/pipeX
sleep infinity > /tmp/pipeX & # keep pipe open so awk does not exit
awk '{print NR,FNR,length($0)}' < /tmp/pipeX &
chars_to_strings() {
while read -r -n 15 buffer
do
printf "%s\n" "${buffer}"
done > /tmp/pipeX
}
Take for a test drive:
for (( i = 1; i <= 20; i++ ))
do
printf '\a,\b,\t,\v'
sleep 0.1
done | chars_to_strings
1 1 15 # output starts printing right away
2 2 15 # instead of waiting for the 'for'
3 3 15 # loop to complete
4 4 15
5 5 13
6 6 15
7 7 15
8 8 15
9 9 15
# kill background 'awk' and/or 'sleep infinity' when no longer needed
Don't waste FS/OFS: use the built-in variables to take care of 2 of the 5 substitutions needed:
echo $' \t abc xyz \t \a \n\n ' |
mawk 'gsub(/\7/, "<bell>", $!(NF = NF)) + gsub(/\10/,"<bs>") +\
gsub(/\11/,"<h-tab>")^_' OFS='<v-tab>' FS='\13' ORS='<newline>'
<h-tab> abc xyz <h-tab> <bell> <newline><newline> <newline>
To have NO constraint on the line length you could do something like this with GNU awk:
awk -v RS='.{1,100}' -v ORS= '{
$0 = RT
gsub(foo,bar)
print
}'
That will read and process the input 100 chars at a time no matter which chars are present, whether it has newlines or not, and even if the input was one multi-terabyte line.
Replace gsub(foo,bar) with whatever substitution(s) you have in mind, e.g.:
$ printf '\a,\b,\t,\v' |
awk -v RS='.{1,100}' -v ORS= '{
$0 = RT
gsub(/\a/,"<bell>")
gsub(/\b/,"<backspace>")
gsub(/\t/,"<horizontal-tab>")
gsub(/\v/,"<vertical-tab>")
print
}'
<bell>,<backspace>,<horizontal-tab>,<vertical-tab>
and of course it'd be trivial to pass a list of old and new strings to awk rather than hardcoding them, you'd just have to sanitize any regexp or backreference metachars before calling gsub().
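A sketch of passing the characters and their replacement strings as arguments follows. It is an assumption-laden variant rather than the GNU awk solution above: it uses plain awk, works line by line (so it does not lift the line-length constraint), and looks each character up in an array instead of calling gsub(), which avoids regexp sanitizing altogether:

```shell
# Sketch: chars_to_strings CHARS STR1 STR2 ...  (line-based, plain awk)
chars_to_strings() {
    awk '
    BEGIN {
        chars = ARGV[1]
        # ARGV[i] is the replacement for character number i-1 of chars
        for (i = 2; i < ARGC; i++)
            map[substr(chars, i - 1, 1)] = ARGV[i]
        ARGC = 1   # no argument is an input file: read standard input
    }
    {
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            printf "%s", ((c in map) ? map[c] : c)
        }
        print ""
    }' "$@"
}

printf '\a,\b,\t,\v\n' |
    chars_to_strings "$(printf '\a\b\t\v')" '<bell>' '<backspace>' '<horizontal-tab>' '<vertical-tab>'
```

For the sample input this prints <bell>,<backspace>,<horizontal-tab>,<vertical-tab>.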

Format a string to have the same number of characters in a shell posix script

I have various variables and I want to print them all with the same width (in characters). To achieve that, I first need to find the longest string and add one to its length, and then print the shorter ones with that width, padded with spaces.
Ideally I want the output to look like this:
IFROGZ FREE REIN 2 "00:00:00:00:B5:C8"
Mi Phone "A4:50:46:AC:32:59"
realme Watch "D8:CA:8E:CD:5D:7C"
where, if a device is connected, 2 of the 4 spaces on the left become asterisks (*), but this question is mostly about the right padding/formatting.
Something like printf %15s $something ' ' $isuppose would be the ideal solution, but it doesn't work; how do I find the width to put there?
It is very important that the script be POSIX compliant.
Here's where I'm at, but the code at this point is very redundant because I've tried brute-forcing the solution.
inc=$#
inc=$((inc-1))
demon=$(eval printf \"\$$inc\");
inc=$((inc-2))
tellar=$(eval printf \"\$$inc\");
demon=${demon:1:-1}$tella$tellar
inc=$((inc-1))
while (( $inc >= 2 )); do
aussie=$(eval printf \"\$$inc\");
inc=$((inc-2))
tellar=$(eval printf \"\$$inc\");
inc=$((inc+2))
demon=$(printf "%s»%"$smoll"s%s" $demon ${aussie:1:-1} " " $tellar);
inc=$((inc-3))
done
demon=$(echo $demon | sed -E "s/»/`space=${#demon}; while (( smoll > i++ )); do ( printf " " ); done; unset space;`\n/g")
Here's the current input and output although the input is "wrong".
Given a set of variables, the maximum width can be easily calculated with:
setwidth(){
width=0
for str in "$@"; do
[ $width -lt ${#str} ] && width=${#str}
done
}
We store the result in a global variable width for later use in a printf format string.
Example of use:
var1="123"
var2=" 2345"
var3="123456 89"
testprint(){
setwidth "$@"
echo right-justified:
printf "\055 %${width}s |\n" "$@"
echo
echo left-justified:
printf "\055 %-${width}s |\n" "$@"
}
testprint "$var1" "$var2" "$var3"
giving:
right-justified:
-       123 |
-      2345 |
- 123456 89 |

left-justified:
- 123       |
-  2345     |
- 123456 89 |
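Applied to the name and address pairs from the question, the same width calculation could look like the sketch below. The function name print_table and the two-space gap between columns are assumptions made for illustration:

```shell
# Hypothetical helper: print NAME ADDRESS pairs with the names padded to a common width.
print_table() {
    local width=0 i=0 s
    for s in "$@"; do
        i=$((i + 1))
        # names sit at odd positions (1, 3, 5, ...)
        if [ $((i % 2)) -eq 1 ] && [ "${#s}" -gt "$width" ]; then
            width=${#s}
        fi
    done
    while [ "$#" -ge 2 ]; do
        printf "%-${width}s  \"%s\"\n" "$1" "$2"
        shift 2
    done
}

print_table "IFROGZ FREE REIN 2" "00:00:00:00:B5:C8" \
            "Mi Phone" "A4:50:46:AC:32:59" \
            "realme Watch" "D8:CA:8E:CD:5D:7C"
```

The first loop only measures the names; the second consumes the arguments two at a time, so the quoted addresses all start in the same column.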

How to write a bash function that can detect if a given input ends in Kilobytes `K` or Megabytes `M`?

I have a bash function that is currently set up as:
MB=$(( $(echo $(FUNCTION_THAT_RETURNS_Kb_OR_Mb) | cut -d "K" -f 1 | sed 's/^.*- //') / 1000 ))
where the middle portion echo $(FUNCTION_THAT_RETURNS_Kb_OR_Mb) returns a value that ends in K or M, (for example: 515223 K or 36326 M) for Kilobytes or Megabytes. I currently have designed the function to strip the trailing units indicator for K, and then divide by 1000 to convert to megabytes. However, when the inside part of it ends in M, it fails. How can I write a function that detects if its in kilobytes or megabytes?
Don't reinvent the wheel - there is numfmt:
function_that_returns_Kb_or_Mb() { echo "515223 K"; }
mb=$(function_that_returns_Kb_or_Mb | numfmt -d '' --from=iec --to-unit=Mi)
# mb=504
function_that_returns_Kb_or_Mb() { echo "36326 M"; }
mb=$(function_that_returns_Kb_or_Mb | numfmt -d '' --from=iec --to-unit=Mi)
# mb=36326
Notes:
echo $(FUNCTION_THAT_RETURNS_Kb_OR_Mb) is a useless use of echo. It's like echo $(echo $(echo $(...))). Just FUNCTION_THAT_RETURNS_Kb_OR_Mb | blabla.
By convention UPPERCASE VARIABLES are used for exported variables, like PATH COLUMNS UID PWD etc. - use lower case identifiers in your scripts.
I assumed input and output is using IEC scale, for SI scale use --from=si --to-unit=M.
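Where numfmt is not available, the unit can be detected with a case statement. This is a sketch under stated assumptions: the helper name to_mb is hypothetical, the input is assumed to be a number followed by an optional space and a single K or M, and the IEC scale (1024) is used as in the numfmt example:

```shell
# Hypothetical helper: convert "<number> K" or "<number> M" to whole mebibytes.
to_mb() {
    local input=$1 unit value
    unit=${input#"${input%?}"}   # last character: the unit letter
    value=${input%?}             # everything before the unit
    value=${value% }             # drop one separating space, if present
    case $unit in
        K) echo $(( $value / 1024 )) ;;   # integer division truncates
        M) echo "$value" ;;
        *) echo "unknown unit in '$input'" >&2; return 1 ;;
    esac
}

to_mb "515223 K"   # prints 503
to_mb "36326 M"    # prints 36326
```

The first result differs from the numfmt output above (504) because shell arithmetic truncates where numfmt rounds.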

How can I get printf to produce "+ 123" instead of " +123"?

I want to print a number with a certain field width for the digits, have the digits right-aligned, and print a sign indicator - not right before the digits, but rather before the spacing. Thus
$ magic -123 7
-   123
rather than
$ magic -123 7
   -123
Can I do that with the GNU coreutils version of the printf utility? Other versions of it perhaps?
Note: To be clear, the solution should work for any field spacing and any value, e.g.:
There might be zero, one or many spaces
The number might "overflow" the specified width
Simply transform the output:
printf %+d 12 | sed 's/[+-]/\0 /'
+ 12
To directly answer your question, I do not believe that you can, with the GNU coreutils version of the printf, have space padding be inserted between the sign character and the nonzero digits of the number. printf seems to always group the sign with the unpadded digits, placing any additional space padding to the left of the sign.
You can use a function called magic like this using pure shell utilities:
magic() {
# some sanity checks to make sure you get $1 and $2
[[ $2 -lt 0 ]] && printf "-" || printf "+"
printf "%${1}s\n" "${2#[+-]}"
}
Now use it as:
$> magic 5 120
+  120
$> magic 5 120234
+120234
$> magic 5 -120234
-120234
$> magic 5 -120
-  120
$> magic 5 1
+    1
$> magic 5 +120
+  120
Based on @KarolyHorvath's suggestion, I suppose this should work:
printf "%+7d" 123 | sed -r 's/^( *)([+-])/\2\1/'
magic () {
local sign="+" number=$1 width=$2
if ((number < 0)); then
sign="-"
((number *= -1))
fi
printf '%s%*d\n' "$sign" "$((width - 1))" "$number"
}
or
magic () {
printf '%+*d\n' "$2" "$1" | sed -r 's/^( *)([+-])/\2\1/'
}
Uses the * in the format specification to take the field width from the arguments.

Prevent bc from auto truncating leading zeros when converting from hex to binary

I'm trying to convert a hex string to binary. I'm using:
echo "ibase=16; obase=2; $line" | BC_LINE_LENGTH=9999 bc
It is truncating the leading zeroes. That is, if the hex string is 4F, it is converted to 1001111 and if it is 0F, it is converted to 1111. I need it to be 01001111 and 00001111
What can I do?
The output from bc is correct; it simply isn't what you had in mind (but it is what the designers of bc had in mind). If you converted hex 4F to decimal, you would not expect to get 079 out of it, would you? Why should you get leading zeroes if the output base is binary? Short answer: you shouldn't, so bc doesn't emit them.
If you must make the binary output a multiple of 8 bits, you can add an appropriate number of leading zeroes using some other tool, such as awk:
awk '{ len = (8 - length % 8) % 8; printf "%.*s%s\n", len, "00000000", $0}'
Pure Bash solution (beside bc):
paddy()
{
how_many_bits=$1
read number
zeros=$(( $how_many_bits - ${#number} ))
for ((i=0;i<$zeros;i++)); do
echo -en 0
done && echo $number
}
Usage:
>bc <<< "obase=2;ibase=16; 20" | paddy 8
00100000
You can pipe to awk like this:
echo "ibase=16; obase=2; $line" | BC_LINE_LENGTH=9999 bc | awk '{ printf "%08d\n", $0 }'
You can do it in Python:
line=4F
python3 -c "print(''.join(bin(int(i, 16))[2:].zfill(4) for i in '$line'))"
result:
01001111
What's frustrating is that bc expects the input to be zero padded but doesn't provide a similar output option. Here's another alternative using sed:
sed 's_0_0000_g; s_1_0001_g; s_2_0010_g; s_3_0011_g;
s_4_0100_g; s_5_0101_g; s_6_0110_g; s_7_0111_g;
s_8_1000_g; s_9_1001_g; s_[aA]_1010_g; s_[bB]_1011_g;
s_[cC]_1100_g; s_[dD]_1101_g; s_[eE]_1110_g; s_[fF]_1111_g;'
You can use seq and sed to help you pad:
function paddington(){
PADDLE=8; while read IN; do
seq -f '0' -s '' 1 $PADDLE | \
sed "s/0\{${#IN}\}\$/$IN/"
done
}
bc <<< "ibase=16; obase=2; 4F; 1E; 0F" | paddington
The output:
01001111
00011110
00001111
You can use printf to left-pad the result with zeros if the result's length is not a multiple of four:
hex_nr=2C8B; hex_len=${#hex_nr}; binary_nr=$(bc <<< "obase=2;ibase=16;$hex_nr"); \
bin_length=$(( hex_len * 4 )); printf "%0${bin_length}d\n" $binary_nr
will result in
0010110010001011
instead of bc's output of
10110010001011
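Since every hex digit corresponds to exactly four bits, the conversion can also be done without bc by mapping digit by digit, in the spirit of the sed answer above. A sketch, with hex2bin as a hypothetical name:

```shell
# Hypothetical helper: convert a hex string to binary, four bits per digit.
hex2bin() {
    local hex=$1 out='' digit
    while [ -n "$hex" ]; do
        digit=${hex%"${hex#?}"}   # first remaining character
        hex=${hex#?}              # the rest of the string
        case $digit in
            0) out=${out}0000 ;;
            1) out=${out}0001 ;;
            2) out=${out}0010 ;;
            3) out=${out}0011 ;;
            4) out=${out}0100 ;;
            5) out=${out}0101 ;;
            6) out=${out}0110 ;;
            7) out=${out}0111 ;;
            8) out=${out}1000 ;;
            9) out=${out}1001 ;;
            [aA]) out=${out}1010 ;;
            [bB]) out=${out}1011 ;;
            [cC]) out=${out}1100 ;;
            [dD]) out=${out}1101 ;;
            [eE]) out=${out}1110 ;;
            [fF]) out=${out}1111 ;;
            *) echo "not a hex digit: $digit" >&2; return 1 ;;
        esac
    done
    printf '%s\n' "$out"
}

hex2bin 4F     # prints 01001111
hex2bin 2C8B   # prints 0010110010001011
```

Leading zeros are preserved by construction, and there is no limit imposed by integer size because the value is never treated as a number.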
