ORD and CHR a file in Bash

I built ord and chr functions, and they work just fine.
But if I take a file that contains newlines (\n), for example:
hello
CHECK THIS HIT
YES
when I run ord on every character, I don't get any newline values. Why is that? I'm writing in Bash.
Here is the code that I am using:
function ord {
ordr="`printf "%d\n" \'$1`"
}
TEXT="`cat $1`"
for (( i=0; i<${#TEXT}; i++ ))
do
ord "${TEXT:$i:1}"
echo "$ordr"
done

Your ord function is really weird. Maybe it would be better to write it as:
function ord {
printf -v ordr "%d" "'$1"
}
Then you would use it as:
TEXT=$(cat "$1")
for (( i=0; i<${#TEXT}; i++ )); do
ord "${TEXT:$i:1}"
printf '%s\n' "$ordr"
done
This still leaves two problems: you won't be able to have null bytes and you won't see trailing newlines. For example (I called your script banana and chmod +x banana):
$ ./banana <(printf 'a\0b\n')
97
98
Two problems show here: the null byte is removed by Bash in the TEXT=$(cat "$1") step, since a Bash variable can't contain null bytes. Moreover, command substitution also trims trailing newlines.
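Both effects can be reproduced in isolation. The following is a minimal sketch (the variable names text and kept are only for illustration); the second half shows a commonly used sentinel workaround for the newline trimming, which is not part of the answer above:

```shell
#!/usr/bin/env bash
# Command substitution drops NUL bytes and strips every trailing newline.
# (Bash 4.4+ also prints a warning about the ignored NUL; it is silenced here.)
{ text=$(printf 'a\0b\n\n'); } 2>/dev/null
printf 'length: %d\n' "${#text}"    # length: 2

# Workaround for the newline stripping only: append a sentinel character
# inside the substitution, then remove it afterwards.
kept=$(printf 'b\n\n'; printf x)
kept=${kept%x}
printf 'length: %d\n' "${#kept}"    # length: 3
```

The sentinel trick keeps newlines but still cannot keep null bytes, which is why reading from the file directly is the safer route.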
A more robust approach would be to use read:
while IFS= read -r -n 1 -d '' char; do
ord "$char"
printf '%s\n' "$ordr"
done < "$1"
With this modification:
$ ./banana <(printf 'a\0b\n')
97
0
98
10
Note that this script will depend on your locale. With my locale (LANG="en_US.UTF-8"):
$ ./banana <(printf 'a\0ℂ\n')
97
0
8450
10
whereas:
$ LANG= ./banana <(printf 'a\0ℂ\n')
97
0
226
132
130
10
That's to show you that Bash doesn't read bytes, but characters. So depending on how you want Bash to treat your data, set LANG accordingly.
If your script only does that, it's much simpler to not use an ord function at all:
#!/bin/bash
while IFS= read -r -n 1 -d '' char; do
printf '%d\n' "'$char"
done < "$1"
It's that simple!
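If what you need is raw byte values regardless of locale, one alternative (not from the answer above) is to let od do the decoding. This is a sketch using only POSIX od options; byte_values is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# byte_values: print the decimal value of every byte on stdin, one per line.
byte_values() {
    local value
    # -An: no offset column, -v: never collapse repeated lines,
    # -tu1: unsigned decimal, one byte per value.
    for value in $(od -An -v -tu1); do
        printf '%s\n' "$value"
    done
}

printf 'a\0b\n' | byte_values          # 97, 0, 98, 10 (one per line)
printf '\342\204\202' | byte_values    # 226, 132, 130: the UTF-8 bytes of ℂ
```

Because od works on bytes, the result does not change with LANG, at the cost of starting one external process.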

Related

Hex to decimal conversion in bash without using gawk

Input:
cat test1.out
12 , maze|style=0x48570006, column area #=0x7, location=0x80000d
13 , maze|style=0x48570005, column area #=0x7, location=0x80aa0d
....
...
..
.
Output needed:
12 , maze|style=0x48570006, column area #=0x7, location=8388621 <<<8388621 is decimal of 0x80000d
....
I want to convert just the last column to decimal.
I cannot use gawk, as it is not available on all of our company machines.
I tried awk --non-decimal-data, but it didn't work either.
I am wondering whether the printf command alone can convert the last word from hex to decimal.
Any other ideas?

There's no need for awk or any other external command here: bash's native arithmetic handles hexadecimal values correctly in an arithmetic context (this is why echo $((0xff)) emits 255).
#!/usr/bin/env bash
# ^^^^- must be really bash, not /bin/sh
location_re='location=(0x[[:xdigit:]]+)([[:space:]]|$)'
while read -r line; do
if [[ $line =~ $location_re ]]; then
hex=${BASH_REMATCH[1]}
dec=$(( $hex ))
printf '%s\n' "${line/location=$hex/location=$dec}"
else
printf '%s\n' "$line"
fi
done
You can see this running at https://ideone.com/uN7qNY

In case the strtonum() function is not available, how about:
#!/bin/bash
awk -F'location=0x' '
function hex2dec(str,
i, x, c, tab) {
for (i = 0; i <= 15; i++) {
tab[substr("0123456789ABCDEF", i + 1, 1)] = i;
}
x = 0
for (i = 1; i <= length(str); i++) {
c = toupper(substr(str, i, 1))
x = x * 16 + tab[c]
}
return x
}
{
print $1 "location=" hex2dec($2)
}
' test1.out
where hex2dec() is a homemade substitute for strtonum().

Wait, can't you just use printf in other awks? It won't work with gawk but it does with other awks, right? For example with mawk:
$ mawk 'BEGIN{FS=OFS="="}{$NF=sprintf("%d", $NF);print}' file
12 , maze|style=0x48570006, column area #=0x7, location=8388621
13 , maze|style=0x48570005, column area #=0x7, location=8432141
I tested with mawk, awk-20070501, awk-20121220 and Busybox awk.
Discarded after edit but left for comments' sake:
Using rev and cut to extract around the last = and printf for hex2dec conversion:
$ while IFS='' read -r line || [[ -n "$line" ]]
do
printf "%s=%d\n" "$(echo "$line" | rev | cut -d = -f 2- | rev)" \
$(echo "$line" | rev | cut -d = -f 1 | rev)
done < file
Output:
12 , maze|style=0x48570006, column area #=0x7, location=8388621
13 , maze|style=0x48570005, column area #=0x7, location=8432141

If you have Perl installed, not having Gawk is rather inconsequential.
perl -pe 's/location=\K0x([0-9a-f]+)/ hex($1) /e' file

This might work for you (GNU sed and Bash):
sed 's/\(.*location=\)\(0x[0-9a-f]\+\)/echo "\1$((\2))"/Ie' file
Use pattern matching and back references to split each line and then evaluate an echo command.
Alternative:
sed 's/\(.*location=\)\(0x[0-9a-f]\+\)/echo "\1$((\2))"/I' file | sh

BASH_REMATCH array info:
http://molk.ch/tips/gnu/bash/rematch.html
The basic principle:
[[ string =~ regexp ]]
[[ "abcdef" =~ (b)(.)(d)e ]]
If 'string' matches 'regexp', the matched part of the string and its captured groups are stored in the BASH_REMATCH array.
# Now:
# BASH_REMATCH[0]=bcde # the whole match
# BASH_REMATCH[1]=b # the 1st captured group
# BASH_REMATCH[2]=c # the 2nd captured group
# BASH_REMATCH[3]=d # the 3rd captured group
Enjoy!
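The same example as a runnable sketch. Storing the regular expression in a variable (here called pattern, a name chosen for this illustration) is a common way to avoid quoting surprises inside [[ ]]:

```shell
#!/usr/bin/env bash
pattern='(b)(.)(d)e'
if [[ abcdef =~ $pattern ]]; then
    printf 'whole match: %s\n' "${BASH_REMATCH[0]}"   # bcde
    printf 'group 1: %s\n' "${BASH_REMATCH[1]}"       # b
    printf 'group 2: %s\n' "${BASH_REMATCH[2]}"       # c
    printf 'group 3: %s\n' "${BASH_REMATCH[3]}"       # d
fi
```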

Bash's arithmetic expansion handles hexadecimal values natively, and the printf builtin accepts them as well.
Example:
echo $(( 0xff))
255
printf '%d' 0xf0
240
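When the hex digits arrive without a 0x prefix, arithmetic expansion's base#number form can do the same conversion. A small sketch, assuming a hypothetical helper named hex_to_dec; it validates the input first, because arbitrary text should not be handed to an arithmetic context:

```shell
#!/usr/bin/env bash
# hex_to_dec: accepts "ff", "0xff" or "0XFF" and prints the decimal value.
hex_to_dec() {
    local digits=${1#0[xX]}                        # drop an optional 0x/0X prefix
    [[ $digits =~ ^[[:xdigit:]]+$ ]] || return 1   # reject anything that is not hex
    printf '%d\n' "$(( 16#$digits ))"
}

hex_to_dec 0x80000d   # 8388621
hex_to_dec ff         # 255
```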

How to read null terminated strings in pairs using bash

Let's say I have a command genpairs which generates null-terminated strings:
key1 \0 val1 \0 key2 \0 val2 \0
I want to read this input into bash variables in pairs. The following does not work for me:
genpairs() { #for the demo
printf "%d\0x\0" 1
printf "%d\0y\0" 2
printf "%d\0z\0" 3
}
#the above generates 1 \0 x \0 2 \0 y \0 3 \0 z \0 etc...
while IFS= read -r -d '' key val; do
echo "key:[$key] val:[$val]"
done < <(genpairs)
prints
key:[1] val:[]
key:[x] val:[]
key:[2] val:[]
key:[y] val:[]
key:[3] val:[]
key:[z] val:[]
i.e. read does not split the input on $'\0' into the two variables.
The wanted output:
key:[1] val:[x]
key:[2] val:[y]
key:[3] val:[z]
How to read null-terminated input into multiple variables?
EDITED the OP's question - added a better demo - x y z
I can solve it as:
n=0
while IFS= read -r -d '' inp; do
if (( n % 2 ))
then
val="$inp"
echo "key:[$key] val:[$val]"
else
key="$inp"
fi
let n++
done < <(genpairs)
This prints:
key:[1] val:[x]
key:[2] val:[y]
key:[3] val:[z]
but it looks like a really terrible solution to me...

Just use two read statements:
while IFS= read -r -d '' key && IFS= read -r -d '' val; do
echo "key:[$key] val:[$val]"
done < <(genpairs)
Using Bash≥4.4, you can also use mapfile with its -d switch:
while mapfile -n 2 -d '' ary && ((${#ary[@]}>=2)); do
echo "key:[${ary[0]}] val:[${ary[1]}]"
done < <(genpairs)
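The single read in the question cannot work because -d '' makes every read stop at the first null byte, so each record holds one field and nothing is left over for val. As a sketch of the two-read approach put to use, the pairs can be collected into an associative array (the array name pairs is an assumption of this example, and associative arrays need Bash 4 or later):

```shell
#!/usr/bin/env bash
genpairs() {   # same demo generator as in the question
    printf '%d\0x\0' 1
    printf '%d\0y\0' 2
    printf '%d\0z\0' 3
}

declare -A pairs=()
# One read per field; the loop stops as soon as either read hits end of input.
while IFS= read -r -d '' key && IFS= read -r -d '' val; do
    pairs[$key]=$val
done < <(genpairs)

printf 'key:[2] val:[%s]\n' "${pairs[2]}"   # key:[2] val:[y]
```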

How to cut and assign the string to a dynamic array inside the for loop

This is what I have done, but I am not getting the result I want.
#!/bin/sh
DIRECTIONPART1=4-7-9
for (( i=1; i<=3; i++ ))
do
x=`echo $DIRECTIONPART1| awk -F'-' '{print $i}'`
myarray[$i]=$x
done
for (( c=1; c<=3; c++ ))
do
echo ${myarray[$c]}
done
We traced the problem to this step:
x=`echo $DIRECTIONPART1| awk -F'-' '{print $i}'`
Please help me get the right result.
This is what I get:
4-7-9
4-7-9
4-7-9
But I want this:
4
7
9

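You have identified the right line. The problem is that the shell does not expand $i inside single quotes, so awk sees its own (unset) variable i, and $i becomes $0, the whole line. A small workaround that worked for me is to pass the value in with -v: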
x=`echo $DIRECTIONPART1| awk -F '-' -v var=$i '{print $var }'`
in all it looks like:
#!/bin/sh
DIRECTIONPART1=4-7-9
for (( i=1; i<=3; i++ ))
do
x=`echo $DIRECTIONPART1| awk -F '-' -v var=$i '{print $var }'`
myarray[$i]=$x
done
for (( c=1; c<=3; c++ ))
do
echo ${myarray[$c]}
done
with expected output:
# sh test.sh
4
7
9
#

The simplest portable way to get the desired output is to use $IFS (in a subshell):
#!/bin/sh
DIRECTIONPART1=4-7-9
(IFS=-; printf '%s\n' $DIRECTIONPART1)
The shell array would not work portably, as POSIX, ksh, and bash do not
agree on arrays. POSIX doesn't have any; ksh and bash use different syntax.
If you really want an array, I would suggest to do the entire thing in awk:
#!/bin/sh
DIRECTIONPART1=4-7-9
awk -v v=${DIRECTIONPART1} 'BEGIN {
n=split(v,a,"-")
for (i=1;i<=n;i++) {
print a[i]
}
}'
This will produce one line for each value in the string:
4
7
9
And if you want bash arrays, drop the #!/bin/sh, and do something like this:
#!/bin/bash
DIRECTIONPART1=4-7-9
A=( $(IFS=- && echo $DIRECTIONPART1) )
for ((i=0;i<${#A[@]};i++))
do
echo ${A[i]}
done

Calling awk multiple times, or even once, is not the right thing to do. Use the bash built-in read to populate the array.
# Note that the quotes here are only necessary to
# work around a bug that was fixed in bash 4.3. It
# doesn't hurt to use them in any version, though.
$ IFS=- read -a myarray <<< "$DIRECTIONPART1"
$ printf '%s\n' "${myarray[@]}"
4
7
9

[akshay@localhost tmp]$ cat test.sh
#!/usr/bin/env bash
DIRECTIONPART1=4-7-9
# Create array
IFS='-' read -a array <<< "$DIRECTIONPART1"
#To access an individual element:
echo "${array[0]}"
#To iterate over the elements:
for element in "${array[@]}"
do
echo "$element"
done
#To get both the index and the value:
for index in "${!array[@]}"
do
echo "$index ${array[index]}"
done
Output
[akshay@localhost tmp]$ bash test.sh
4
4
7
9
0 4
1 7
2 9
OR
[akshay@localhost tmp]$ cat test1.sh
#!/usr/bin/env bash
DIRECTIONPART1=4-7-9
array=(${DIRECTIONPART1//-/ })
for index in "${!array[@]}"
do
echo "$index ${array[index]}"
done
Output
[akshay@localhost tmp]$ bash test1.sh
0 4
1 7
2 9

Read a file by bytes in BASH

I need to read the first byte of a file I specify, then the second byte, the third, and so on. How can I do this in Bash?
P.S. I need the hex value of each byte.

Full rewrite: September 2019!
A lot shorter and simpler than previous versions! (Somewhat faster, but not by much.)
Yes, bash can read and write binary:
Syntax:
LANG=C IFS= read -r -d '' -n 1 foo
will populate $foo with one binary byte. Unfortunately, as bash strings cannot hold null bytes (\0), the input has to be read one byte at a time.
As for getting the value of the byte read, I had missed this in man bash (have a look at the 2016 post at the bottom of this answer):
printf [-v var] format [arguments]
...
Arguments to non-string format specifiers are treated as C constants,
except that ..., and if the leading character is a single or double
quote, the value is the ASCII value of the following character.
So:
read8() {
local _r8_var=${1:-OUTBIN} _r8_car LANG=C IFS=
read -r -d '' -n 1 _r8_car
printf -v "$_r8_var" %d "'$_r8_car"
}
This populates the named variable (default: $OUTBIN) with the decimal value of the first byte from STDIN.
read16() {
local _r16_var=${1:-OUTBIN} _r16_lb _r16_hb
read8 _r16_lb &&
read8 _r16_hb
printf -v $_r16_var %d $(( _r16_hb<<8 | _r16_lb ))
}
This populates the named variable (default: $OUTBIN) with the decimal value of the first 16-bit word from STDIN...
Of course, to switch endianness, you have to swap the order:
read8 _r16_hb &&
read8 _r16_lb
And so on:
# Usage:
# read[8|16|32|64] [varname] < binaryStdInput
read8() { local _r8_var=${1:-OUTBIN} _r8_car LANG=C IFS=
read -r -d '' -n 1 _r8_car
printf -v "$_r8_var" %d "'$_r8_car" ;}
read16() { local _r16_var=${1:-OUTBIN} _r16_lb _r16_hb
read8 _r16_lb && read8 _r16_hb
printf -v $_r16_var %d $(( _r16_hb<<8 | _r16_lb )) ;}
read32() { local _r32_var=${1:-OUTBIN} _r32_lw _r32_hw
read16 _r32_lw && read16 _r32_hw
printf -v $_r32_var %d $(( _r32_hw<<16| _r32_lw )) ;}
read64() { local _r64_var=${1:-OUTBIN} _r64_ll _r64_hl
read32 _r64_ll && read32 _r64_hl
printf -v $_r64_var %d $(( _r64_hl<<32| _r64_ll )) ;}
So you could source this, then if your /dev/sda is gpt partitioned,
read totsize < <(blockdev --getsz /dev/sda)
read64 gptbackup < <(dd if=/dev/sda bs=8 skip=68 count=1 2>/dev/null)
echo $((totsize-gptbackup))
1
The answer should be 1: the primary GPT header is at sector 1, and one sector is 512 bytes. The location of the backup GPT is stored at byte 32 of that header, i.e. at byte 512 + 32 = 544. With bs=8, that is 544 / 8 = 68 blocks to skip. (See GUID Partition Table at Wikipedia.)
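To see the little-endian assembly at work on a known input, here is a minimal sketch with hypothetical helper names (read_u8 and read_u16_le, not the functions above). It is only exercised with byte values below 128, since how printf reports larger bytes may depend on the C library and locale:

```shell
#!/usr/bin/env bash
# read_u8 VAR: read one byte from stdin and store its decimal value in VAR.
# Returns non-zero at end of input.
read_u8() {
    local LC_ALL=C IFS= _byte
    # An empty delimiter means NUL, so a NUL byte yields success with an empty _byte.
    read -r -d '' -n 1 _byte || return 1
    printf -v "$1" '%d' "'$_byte"
}

# read_u16_le VAR: read two bytes, low byte first, and store the 16-bit value.
read_u16_le() {
    local _lo _hi
    read_u8 _lo && read_u8 _hi || return 1
    printf -v "$1" '%d' $(( _hi << 8 | _lo ))
}

read_u16_le value < <(printf '\064\022')   # bytes 0x34 0x12
echo "$value"                              # 4660, i.e. 0x1234
```

Swapping the two read_u8 calls would read the same bytes as big endian instead.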
Quick small write function...
write () {
local i=$((${2:-64}/8)) o= v r
r=$((i-1))
for ((;i--;)) {
printf -vv '\%03o' $(( ($1>>8*(0${3+-1}?i:r-i))&255 ))
o+=$v
}
printf "$o"
}
This function defaults to 64 bits, little endian.
Usage: write <integer> [bits:64|32|16|8] [switchto big endian]
With two parameters, the second must be one of 8, 16, 32 or 64 and sets the bit length of the generated output.
With any dummy third parameter (even an empty string), the function switches to big endian.
.
read64 foo < <(write -12345);echo $foo
-12345
...
First post 2015...
Upgrade for adding specific bash version (with bashisms)
With newer versions of the printf built-in, you can do a lot without having to fork ($(...)), which makes your script a lot faster.
First, let's see (using seq and sed) how to parse hd output:
echo ;sed <(seq -f %02g 0 $(( COLUMNS-1 )) ) -ne '
/0$/{s/^\(.*\)0$/\o0337\o033[A\1\o03380/;H;};
/[1-9]$/{s/^.*\(.\)/\1/;H};
${x;s/\n//g;p}';hd < <(echo Hello good world!)
0 1 2 3 4 5 6 7
012345678901234567890123456789012345678901234567890123456789012345678901234567
00000000 48 65 6c 6c 6f 20 67 6f 6f 64 20 77 6f 72 6c 64 |Hello good world|
00000010 21 0a |!.|
00000012
where the hexadecimal part begins at column 10 and ends at column 56, with values spaced 3 characters apart and one extra space at column 34.
So parsing this could be done by:
while read line ;do
for x in ${line:10:48};do
printf -v x \\%o 0x$x
printf $x
done
done < <( ls -l --color | hd )
Old original post
Edit 2 for Hexadecimal, you could use hd
echo Hello world | hd
00000000 48 65 6c 6c 6f 20 77 6f 72 6c 64 0a |Hello world.|
or od
echo Hello world | od -t x1 -t c
0000000 48 65 6c 6c 6f 20 77 6f 72 6c 64 0a
H e l l o w o r l d \n
shortly
while IFS= read -r -n1 car;do [ "$car" ] && echo -n "$car" || echo ; done
try them:
while IFS= read -rn1 c;do [ "$c" ]&&echo -n "$c"||echo;done < <(ls -l --color)
Explain:
while IFS= read -rn1 car # unset InputFieldSeparator so read every chars
do [ "$car" ] && # Test if there is ``something''?
echo -n "$car" || # then echo them
echo # Else, there is an end-of-line, so print one
done
Edit; Question was edited: need hex values!?
od -An -t x1 | while read line;do for char in $line;do echo $char;done ;done
Demo:
od -An -t x1 < <(ls -l --color ) | # Translate binary to 1 byte hex
while read line;do # Read line of HEX pairs
for char in $line;do # For each pair
printf "\x$char" # Print translate HEX to binary
done
done
Demo 2: We have both hex and binary
od -An -t x1 < <(ls -l --color ) | # Translate binary to 1 byte hex
while read line;do # Read line of HEX pairs
for char in $line;do # For each pair
bin="$(printf "\x$char")" # translate HEX to binary
dec=$(printf "%d" 0x$char) # translate to decimal
[ $dec -lt 32 ] || # if character not printable
( [ $dec -gt 128 ] && # change bin to a single dot.
[ $dec -lt 160 ] ) && bin="."
str="$str$bin"
echo -n $char \ # Print HEX value and a space
((i++)) # count printed values
if [ $i -gt 15 ] ;then
i=0
echo " - $str"
str=""
fi
done
done
New post, September 2016:
This could be useful in very specific cases (I've used it to manually copy GPT partitions between two disks, at a low level, without having /usr mounted...).
Yes, bash can read binary!
... but only one byte at a time (because a NUL byte, char(0), cannot be stored in a variable, the only way to detect one is to check for end-of-file: if no character was read and end of file was not reached, then the byte read was a char(0)).
This is more a proof of concept than a really useful tool: a pure bash version of hd (hexdump).
It uses recent bashisms and needs bash v4.3 or higher.
#!/bin/bash
printf -v ascii \\%o {32..126}
printf -v ascii "$ascii"
printf -v cntrl %-20sE abtnvfr
values=()
todisplay=
address=0
printf -v fmt8 %8s
fmt8=${fmt8// / %02x}
while LANG=C IFS= read -r -d '' -n 1 char ;do
if [ "$char" ] ;then
printf -v char "%q" "$char"
((${#char}==1)) && todisplay+=$char || todisplay+=.
case ${#char} in
1|2 ) char=${ascii%$char*};values+=($((${#char}+32)));;
7 ) char=${char#*\'\\};values+=($((8#${char%\'})));;
5 ) char=${char#*\'\\};char=${cntrl%${char%\'}*};
values+=($((${#char}+7)));;
* ) echo >&2 ERROR: $char;;
esac
else
values+=(0)
fi
if [ ${#values[@]} -gt 15 ] ;then
printf "%08x $fmt8 $fmt8 |%s|\n" $address ${values[@]} "$todisplay"
((address+=16))
values=() todisplay=
fi
done
if [ "$values" ] ;then
((${#values[@]}>8))&&fmt="$fmt8 ${fmt8:0:(${#values[@]}%8)*5}"||
fmt="${fmt8:0:${#values[@]}*5}"
printf "%08x $fmt%$((
50-${#values[@]}*3-(${#values[@]}>8?1:0)
))s |%s|\n" $address ${values[@]} ''""'' "$todisplay"
fi
printf "%08x (%d chars read.)\n" $((address+${#values[@]})){,}
You could try/use this, but don't try to compare performances!
time hd < <(seq 1 10000|gzip)|wc
1415 25480 111711
real 0m0.020s
user 0m0.008s
sys 0m0.000s
time ./hex.sh < <(seq 1 10000|gzip)|wc
1415 25452 111669
real 0m2.636s
user 0m2.496s
sys 0m0.048s
Same job: about 20 ms for hd versus about 2,600 ms for my bash script.
... but if you want to read 4 bytes from a file header, or even a sector address on a hard drive, this can do the job...

Did you try xxd? It gives hex dump directly, as you want..
For your case, the command would be:
xxd -c 1 /path/to/input_file | while read offset hex char; do
#Do something with $hex
done
Note: derive the character from the hex value rather than reading it from the line, because read will not capture whitespace characters properly.

using read a single char can be read at a time as follows:
read -n 1 c
echo $c
Try this:
#!/bin/bash
# data file
INPUT=/path/to/input.txt
# while loop
while IFS= read -r -n1 char
do
# display one character at a time
echo "$char"
done < "$INPUT"
From this link
Second method,
Using awk, loop through char by char
awk '{for(i=1;i<=length;i++) print substr($0, i, 1)}' /home/cscape/Desktop/table2.sql
third way,
$ fold -1 /home/cscape/Desktop/table.sql | awk '{print $0}'
EDIT: To print each char as HEX number:
Suppose I have a file name file :
$ cat file
123A3445F
I have written an awk script (named x.awk) that reads the file character by character and prints each character's value in hex:
$ cat x.awk
#!/bin/awk -f
BEGIN { _ord_init() }
function _ord_init( low, high, i, t)
{
low = sprintf("%c", 7) # BEL is ascii 7
if (low == "\a") { # regular ascii
low = 0
high = 127
} else if (sprintf("%c", 128 + 7) == "\a") {
# ascii, mark parity
low = 128
high = 255
} else { # ebcdic(!)
low = 0
high = 255
}
for (i = low; i <= high; i++) {
t = sprintf("%c", i)
_ord_[t] = i
}
}
function ord(str, c)
{
# only first character is of interest
c = substr(str, 1, 1)
return _ord_[c]
}
function chr(c)
{
# force c to be numeric by adding 0
return sprintf("%c", c + 0)
}
{ x=$0; printf("%s , %x\n",$0, ord(x) )}
To write this script I used the awk documentation.
Now you can use this awk script as follows:
$ fold -1 /home/cscape/Desktop/file | awk -f x.awk
1 , 31
2 , 32
3 , 33
A , 41
3 , 33
4 , 34
4 , 34
5 , 35
F , 46
NOTE: The value of A is 41 in hexadecimal. To print decimal values instead, change %x to %d in the last line of x.awk.
Give it a Try!!

Yet another solution, using head, tail and printf:
for a in $( seq $( cat file.txt | wc -c ) ) ; do cat file.txt | head -c$a | tail -c1 | xargs -0 -I{} printf '%s %0X\n' {} "'{}" ; done
More readable:
#!/bin/bash
function usage() {
echo "Need file with size > 0"
exit 1
}
test -s "$1" || usage
for a in $( seq $( cat $1 | wc -c ) )
do
cat $1 | head -c$a | tail -c1 | \
xargs -0 -I{} printf '%c %#02x\n' {} "'{}"
done

use read with -n option.
while read -n 1 ch; do
echo $ch
done < moemoe.txt

I have a suggestion, but I would like feedback from everybody, and mainly advice from the user syntaxerror.
I don't know much about bash, but I thought it might be better to store the output of "cat $1" in a variable. The problem is that the echo command will also add a small overhead, right?
test -s "$1" || (echo "Need a file with size greater than 0!"; exit 1)
a=0
rfile=$(cat $1)
max=$(echo $rfile | wc -c)
while [[ $((++a)) -lt $max ]]; do
echo $rfile | head -c$a | tail -c1 | \
xargs -0 -I{} printf '%c %#02x\n' {} "'{}"
done
In my opinion it would perform better, but I haven't benchmarked it.

Although I would rather have expanded Perleone's own post (as it was his basic concept!), my edit was rejected after all, and I was kindly advised that this should be posted as a separate answer. Fair enough, so I will do that.
Considerations in short for the improvements on Perleone's original script:
seq would be totally overkill here. A simple while loop with a used as a (likewise simple) counter variable will do the job just fine (and much quicker too)
The max value, $(cat $1 | wc -c) must be assigned to a variable, otherwise it will be recalculated every time and make this alternate script run even slower than the one it was derived from.
There's no need to waste a function on a simple usage info line. However, it is necessary to know about the (mandatory) curly braces around two commands, for without the { }, the exit 1 command will be executed in either case, and the script interpreter will never make it to the loop. (Last note: ( ) will work too, but not in the same way! Parentheses will spawn a subshell, whilst curly braces will execute commands inside them in the current shell.)
#!/bin/bash
test -s "$1" || { echo "Need a file with size greater than 0!"; exit 1; }
a=0
max=$(cat $1 | wc -c)
while [[ $((++a)) -le $max ]]; do
cat $1 | head -c$a | tail -c1 | \
xargs -0 -I{} printf '%c %#02x\n' {} "'{}"
done

How to perform a for loop on each character in a string in Bash?

I have a variable like this:
words="这是一条狗。"
I want to make a for loop on each of the characters, one at a time, e.g. first character="这", then character="是", character="一", etc.
The only way I know is to output each character to separate line in a file, then use while read line, but this seems very inefficient.
How can I process each character in a string through a for loop?

You can use a C-style for loop:
foo=string
for (( i=0; i<${#foo}; i++ )); do
echo "${foo:$i:1}"
done
${#foo} expands to the length of foo. ${foo:$i:1} expands to the substring starting at position $i of length 1.

With sed in the dash shell and LANG=en_US.UTF-8, I got the following to work correctly:
$ echo "你好嗎 新年好。全型句號" | sed -e 's/\(.\)/\1\n/g'
你
好
嗎
新
年
好
。
全
型
句
號
and
$ echo "Hello world" | sed -e 's/\(.\)/\1\n/g'
H
e
l
l
o
w
o
r
l
d
Thus, the output can be looped over with while read ... ; do ... ; done
Edited to add an English translation of the sample text:
"你好嗎 新年好。全型句號" is zh_TW.UTF-8 text meaning:
"你好嗎" = How are you[ doing]
" " = a normal space character
"新年好" = Happy new year
"。全型句號" = a full-width full stop followed by its description ("full-width full stop")

${#var} returns the length of var
${var:pos:N} returns N characters from pos onwards
Examples:
$ words="abc"
$ echo ${words:0:1}
a
$ echo ${words:1:1}
b
$ echo ${words:2:1}
c
so it is easy to iterate.
another way:
$ grep -o . <<< "abc"
a
b
c
or
$ grep -o . <<< "abc" | while read letter; do echo "my letter is $letter" ; done
my letter is a
my letter is b
my letter is c

I'm surprised no one has mentioned the obvious bash solution utilizing only while and read.
while read -n1 character; do
echo "$character"
done < <(echo -n "$words")
Note the use of echo -n to avoid the extraneous newline at the end. printf is another good option and may be more suitable for your particular needs. If you want to ignore whitespace then replace "$words" with "${words// /}".
Another option is fold. Please note however that it should never be fed into a for loop. Rather, use a while loop as follows:
while read char; do
echo "$char"
done < <(fold -w1 <<<"$words")
The primary benefit to using the external fold command (of the coreutils package) would be brevity. You can feed its output to another command such as xargs (part of the findutils package) as follows:
fold -w1 <<<"$words" | xargs -I% -- echo %
You'll want to replace the echo command used in the example above with the command you'd like to run against each character. Note that xargs will discard whitespace by default. You can use -d '\n' to disable that behavior.
Internationalization
I just tested fold with some of the Asian characters and realized it doesn't have Unicode support. So while it is fine for ASCII needs, it won't work for everyone. In that case there are some alternatives.
I'd probably replace fold -w1 with an awk array:
awk 'BEGIN{FS=""} {for (i=1;i<=NF;i++) print $i}'
Or the grep command mentioned in another answer:
grep -o .
Performance
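FYI, I benchmarked the aforementioned options. The while loop and the fold loop were fast and nearly tied (the fold loop used slightly less user CPU time, while the while loop finished slightly sooner in wall-clock time), with the awk and grep loops close behind. Unsurprisingly, xargs was the slowest: roughly 75x slower.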
Here is the (abbreviated) test code:
words=$(python -c 'from string import ascii_letters as l; print(l * 100)')
testrunner(){
for test in test_while_loop test_fold_loop test_fold_xargs test_awk_loop test_grep_loop; do
echo "$test"
(time for (( i=1; i<$((${1:-100} + 1)); i++ )); do "$test"; done >/dev/null) 2>&1 | sed '/^$/d'
echo
done
}
testrunner 100
Here are the results:
test_while_loop
real 0m5.821s
user 0m5.322s
sys 0m0.526s
test_fold_loop
real 0m6.051s
user 0m5.260s
sys 0m0.822s
test_fold_xargs
real 7m13.444s
user 0m24.531s
sys 6m44.704s
test_awk_loop
real 0m6.507s
user 0m5.858s
sys 0m0.788s
test_grep_loop
real 0m6.179s
user 0m5.409s
sys 0m0.921s

I believe there is still no ideal solution that would correctly preserve all whitespace characters and is fast enough, so I'll post my answer. Using ${foo:$i:1} works, but is very slow, which is especially noticeable with large strings, as I will show below.
My idea is an expansion of a method proposed by Six, which involves read -n1, with some changes to keep all characters and work correctly for any string:
while IFS='' read -r -d '' -n 1 char; do
# do something with $char
done < <(printf %s "$string")
How it works:
IFS='' - Redefining internal field separator to empty string prevents stripping of spaces and tabs. Doing it on a same line as read means that it will not affect other shell commands.
-r - Means "raw", which prevents read from treating \ at the end of the line as a special line concatenation character.
-d '' - Passing empty string as a delimiter prevents read from stripping newline characters. Actually means that null byte is used as a delimiter. -d '' is equal to -d $'\0'.
-n 1 - Means that one character at a time will be read.
printf %s "$string" - Using printf instead of echo -n is safer, because echo treats -n and -e as options. If you pass "-e" as a string, echo will not print anything.
< <(...) - Passing string to the loop using process substitution. If you use here-strings instead (done <<< "$string"), an extra newline character is appended at the end. Also, passing string through a pipe (printf %s "$string" | while ...) would make the loop run in a subshell, which means all variable operations are local within the loop.
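Putting those pieces together, here is a small sketch that collects the characters into an array so the whitespace handling can be checked. The function name split_chars and the global array chars are assumptions of this example:

```shell
#!/usr/bin/env bash
# split_chars STRING: store every character of STRING, including spaces,
# tabs and newlines, as one element of the global array `chars`.
split_chars() {
    chars=()
    local char
    while IFS= read -r -d '' -n 1 char; do
        chars+=("$char")
    done < <(printf %s "$1")
}

split_chars $' a\tb\n'
printf '%d characters\n' "${#chars[@]}"   # 5 characters
```

Because the loop is fed through process substitution rather than a pipe, the array is still populated after the loop ends.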
Now, let's test the performance with a huge string.
I used the following file as a source:
https://www.kernel.org/doc/Documentation/kbuild/makefiles.txt
The following script was called through time command:
#!/bin/bash
# Saving contents of the file into a variable named `string'.
# This is for test purposes only. In real code, you should use
# `done < "filename"' construct if you wish to read from a file.
# Using `string="$(cat makefiles.txt)"' would strip trailing newlines.
IFS='' read -r -d '' string < makefiles.txt
while IFS='' read -r -d '' -n 1 char; do
# remake the string by adding one character at a time
new_string+="$char"
done < <(printf %s "$string")
# confirm that new string is identical to the original
diff -u makefiles.txt <(printf %s "$new_string")
And the result is:
$ time ./test.sh
real 0m1.161s
user 0m1.036s
sys 0m0.116s
As we can see, it is quite fast.
Next, I replaced the loop with one that uses parameter expansion:
for (( i=0 ; i<${#string}; i++ )); do
new_string+="${string:$i:1}"
done
The output shows exactly how bad the performance loss is:
$ time ./test.sh
real 2m38.540s
user 2m34.916s
sys 0m3.576s
The exact numbers may vary on different systems, but the overall picture should be similar.

I've only tested this with ascii strings, but you could do something like:
while test -n "$words"; do
c=${words:0:1} # Get the first character
echo character is "'$c'"
words=${words:1} # trim the first character
done

It is also possible to split the string into a character array using fold and then iterate over this array:
for char in `echo "这是一条狗。" | fold -w1`; do
echo $char
done

The C style loop in @chepner's answer is in the shell function update_terminal_cwd, and the grep -o . solution is clever, but I was surprised not to see a solution using seq. Here's mine:
read word
for i in $(seq 1 ${#word}); do
echo "${word:i-1:1}"
done

#!/bin/bash
word=$(echo 'Your Message' |fold -w 1)
for letter in ${word} ; do echo "${letter} is a letter"; done
Here is the output:
Y is a letter
o is a letter
u is a letter
r is a letter
M is a letter
e is a letter
s is a letter
s is a letter
a is a letter
g is a letter
e is a letter

To iterate ASCII characters on a POSIX-compliant shell, you can avoid external tools by using the Parameter Expansions:
#!/bin/sh
str="Hello World!"
while [ ${#str} -gt 0 ]; do
next=${str#?}
echo "${str%"$next"}"
str=$next
done
or
str="Hello World!"
while [ -n "$str" ]; do
next=${str#?}
echo "${str%"$next"}"
str=$next
done

sed works with unicode
IFS=$'\n'
for z in $(sed 's/./&\n/g' <(printf '你好嗎')); do
echo hello: "$z"
done
outputs
hello: 你
hello: 好
hello: 嗎

Another approach, if you don't care about whitespace being ignored:
for char in $(sed -E s/'(.)'/'\1 '/g <<<"$your_string"); do
# Handle $char here
done

Another way is:
Characters="TESTING"
index=1
while [ $index -le ${#Characters} ]
do
echo ${Characters} | cut -c${index}-${index}
index=$(expr $index + 1)
done

fold and while read are great for the job as shown in some answers here. Contrary to those answers, I think it's much more intuitive to pipe in the order of execution:
echo "asdfg" | fold -w 1 | while read c; do
echo -n "$c "
done
Outputs: a s d f g

I share my solution:
read word
for char in $(grep -o . <<<"$word") ; do
echo $char
done

TEXT="hello world"
for i in {1..${#TEXT}}; do
echo ${TEXT[i]}
done
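where {1..N} is an inclusive range
${#TEXT} is the number of characters in the string
${TEXT[i]} gets a character from the string like an item from an array
Note that this is zsh syntax. In Bash, {1..${#TEXT}} is not expanded into a range, and ${TEXT[i]} is an array subscript rather than a character index.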
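In Bash, brace expansion is performed before parameter expansion, so a range cannot be built from ${#TEXT} this way. A C-style loop with substring expansion gives the same effect; the following is a sketch, with each_char as a hypothetical function name:

```shell
#!/usr/bin/env bash
# each_char STRING: print every character of STRING on its own line.
each_char() {
    local text=$1 i
    # ${#text} is the length; ${text:i:1} is the one-character substring at offset i.
    for (( i = 0; i < ${#text}; i++ )); do
        printf '%s\n' "${text:i:1}"
    done
}

each_char "hello world"
```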
