Padded printf format strings not adding enough padding with multi-byte characters - bash

I often use printf inside shell scripts to produce nicely aligned output.
The problem is that every time there is an accented character (éèà) in the printed string, the following string shifts one step back.
Example :
printf "%-10s %s\n" "toto" "test"
printf "%-10s %s\n" "titi" "test"
printf "%-10s %s\n" "tété" "test"
printf "%-10s %s\n" "toto" "test"
Expected :
toto       test
titi       test
tété       test
toto       test
Got :
toto       test
titi       test
tété     test
toto       test
Does someone have an explanation for this, and what can I do to make printf handle special characters correctly?
Thank you for your help

Does someone have an explanation for this?
é is a character encoded with two bytes in UTF-8, and printf counts bytes, not characters, when it applies the field width.
What can I do to make printf handle special characters correctly?
Design your own padding method that takes UTF-8 into account. Ideally, I believe a tool like wprintf, or making the %Ls format specifier call wcwidth() to determine the character width, or something similar would be welcome and useful.
For now, at least my bash takes UTF-8 characters into account when calculating string length. You could insert the padding yourself:
printf "%-10s %s\n" "titi" "test";
s="tété";
# (echo -n "$s" | wc -c) is 6 , but ${#s} is 4!
printf "%s%*s %s\n" "$s" "$((10-${#s}))" "" "test"
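The byte/character mismatch can be checked directly. Below is a minimal sketch, assuming bash running in a UTF-8 locale; the helper names byte_len and pad_right are made up for illustration. Note that ${#s} counts characters, not display columns, so double-width or combining characters would still misalign.

```shell
#!/usr/bin/env bash
# byte_len STRING: number of bytes in STRING (what printf's field width counts).
byte_len() { printf '%s' "$1" | wc -c | tr -d ' '; }

# pad_right WIDTH STRING: hypothetical helper that pads with spaces based on
# ${#s}, which counts characters (not bytes) when bash runs in a UTF-8 locale.
pad_right() {
    local width=$1 s=$2
    local fill=$(( width - ${#s} ))
    (( fill < 0 )) && fill=0        # never pad by a negative amount
    printf '%s%*s' "$s" "$fill" ''
}

s="tété"
echo "bytes: $(byte_len "$s"), characters: ${#s}"
printf '%s %s\n' "$(pad_right 10 "$s")" "test"
printf '%s %s\n' "$(pad_right 10 "toto")" "test"
```

Strings longer than the requested width are printed unpadded rather than truncated; that is a choice made for this sketch only.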

Adapted my answer from https://unix.stackexchange.com/a/592479/310674
#!/usr/bin/env bash
# Truncate to the width first so the padding never goes negative.
align_left(){ local s=${2:0:$1}; printf '%s%*s' "$s" "$(($1-${#s}))" '';}
printf '%s %s\n' \
"$(align_left 10 "toto")" "test" \
"$(align_left 10 "titi")" "test" \
"$(align_left 10 "tété")" "test" \
"$(align_left 10 "têtu")" "test"
Output:
toto       test
titi       test
tété       test
têtu       test

You can also use another tool to print your report this way. The following example uses awk:
echo "toto" | awk '{printf "%-10s test\n", $1}'
echo "tété" | awk '{printf "%-10s test\n", $1}'
echo "titi" | awk '{printf "%-10s test\n", $1}'
EDIT:
The following statement was partially wrong:
printf might not be part of bash, but coreutils. Coreutils have a long history with multibyte characters - https://crashcourse.housegordon.org/coreutils-multibyte-support.html.
As noted in a comment by @charles-duffy, printf is a shell builtin in this case. You can check it with:
[Alex@NormandySR2 ~]$ type printf
printf is a shell builtin
I also agree that most shells implement their own printf. I checked the following:
fish
bash
zsh
tcsh
ksh
dash
oil
All of them use a printf builtin that can differ in details. So my assumption that printf was part of coreutils was, in this case, wrong.

Related

How to parse multiple line output as separate variables

I'm relatively new to bash scripting and I would like someone to explain this properly, thank you. Here is my code:
#! /bin/bash
echo "first arg: $1"
echo "first arg: $2"
var="$( grep -rnw $1 -e $2 | cut -d ":" -f1 )"
var2=$( grep -rnw $1 -e $2 | cut -d ":" -f1 | awk '{print substr($0,length,1)}')
echo "$var"
echo "$var2"
The problem I have is with the output. The script I'm trying to write is a C++ function searcher, so I launch it with 2 arguments: the first is the directory and the second is the function name. This is what my output looks like:
first arg: Projekt
first arg: iseven
Projekt/AX/include/ax.h
Projekt/AX/src/ax.cpp
h
p
Now my question is: how can I save the line-by-line output as variables, so that later on I can use var as a path, or use var2 as a character to compare? My plan was to use if statements to determine the type. The idea: IF(last_char == p){echo:"something"}
What I've tried is the approach from this question: Capturing multiple line output into a Bash variable, and then treating the result as an array, so my code looked like "${var[0]}". Please explain how I can use my line output later on as variables.
I'd use readarray to populate an array variable, in case your command's output contains spaces that shouldn't act as field separators and would otherwise mess up foo=( ... ). You can also use the shell's substring parameter expansion to get the last character of a variable; there is no need for the awk step in your var2:
#!/usr/bin/env bash
readarray -t lines < <(printf "%s\n" "Projekt/AX/include/ax.h" "Projekt/AX/src/ax.cpp")
for line in "${lines[@]}"; do
printf "%s\n%s\n" "$line" "${line: -1}" # Note the space before the -1
done
will display
Projekt/AX/include/ax.h
h
Projekt/AX/src/ax.cpp
p
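To act on the last character as the question intended, a case statement is one option. This is only a sketch: classify_path and its "header"/"source"/"other" labels are invented for illustration, and the printf inside the process substitution stands in for the real grep pipeline.

```shell
#!/usr/bin/env bash
# classify_path PATH: hypothetical helper that branches on the last character.
classify_path() {
    local path=$1
    case "${path: -1}" in    # note the space before -1
        h) echo "header" ;;
        p) echo "source" ;;
        *) echo "other" ;;
    esac
}

# Stand-in for the grep pipeline from the question.
readarray -t lines < <(printf '%s\n' "Projekt/AX/include/ax.h" "Projekt/AX/src/ax.cpp")

for line in "${lines[@]}"; do
    printf '%s: %s\n' "$line" "$(classify_path "$line")"
done
```

Each element of lines can be used on its own as a path, for example "${lines[0]}".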

How to print "-" using printf

I am trying to print "-" multiple times using printf. I am using the below command to print the same character multiple times, which works fine for all except "-".
printf "`printf '=%.0s' {1..30}` \n"
When I try to do the same for "-", it gives error.
printf "`printf '-%.0s' {1..30}` \n"
bash: printf: -%: invalid option
It is trying to take it as user-passed option. How do I work around this?
Pass -- before everything else to each printf invocation:
printf -- "`printf -- '-%.0s' {1..30}` \n"
Like many commands, printf takes options in the form of tokens starting with - (although -v and -- are the only options I know). This interferes with your argument string, as printf is instead trying to parse -%.0s as an option. For that case however, it supports the -- option (like many other commands), which terminates option parsing and passes through all following arguments literally.
Are you trying to print 30 hyphens? This is how I do that:
printf "%*s\n" 30 "" | sed 's/ /-/g'
The printf command prints a line of 30 spaces, and sed then turns them all into hyphens.
This can be encapsulated into a function:
ruler() { printf "%*s\n" "$1" "" | sed "s/ /${2//\//\\\/}/g"; }
And then you can do stuff like:
ruler $(tput cols) =
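The same idea also works without sed, using bash's printf -v and pattern substitution. A minimal sketch follows; repeat_char is a made-up name, and it is only meant for simple fill characters such as - or =. Because the hyphens are passed as an argument to a '%s\n' format, printf never tries to parse them as options.

```shell
#!/usr/bin/env bash
# repeat_char COUNT CHAR: hypothetical helper; prints CHAR COUNT times.
repeat_char() {
    local line
    printf -v line '%*s' "$1" ''    # COUNT spaces, stored in $line
    printf '%s\n' "${line// /$2}"   # replace every space with CHAR
}

repeat_char 30 -
repeat_char 10 =
```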

Awk dealing with variables containing whitespace

I want to pass a string containing whitespace as a variable to awk from a Bash script but, regardless of how I escape it, awk complains. Please consider the following example:
list1:
one
two
three
four
The output:
[user@actual ~]$ ./dator.sh list1
1470054866 two (...)
A working script:
CMD='awk'
DATE=$(date +%s)
VARIABLES="-v time=$DATE"
SCRIPT='NR>=2 {printf "%s %s\n", time, $1;}'
$CMD $VARIABLES "$SCRIPT" $1
And only changing the date-formatting will break it:
CMD='awk'
DATE=$(date -u)
VARIABLES="-v time=$DATE"
SCRIPT='NR>=2 {printf "%s %s\n", time, $1;}'
$CMD $VARIABLES "$SCRIPT" $1
How should I escape it?
Every kind of quoting I'm aware of doesn't work.
Translating and inserting escaping "\" before whitespace doesn't make a difference.
Printing the variable via a function as suggested in another solution didn't work.
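The unquoted $VARIABLES expansion is split on whitespace, so awk receives each word of the date as a separate argument. Arrays were designed for storing arbitrary arguments: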
current_date=$(date -u)
variables=( -v "time=$current_date")
script='NR >= 2 {printf "%s %s\n", time, $1;}'
awk "${variables[@]}" "$script" "$1"
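The difference between the string and the array can be seen by counting the words each one expands to. This is a sketch only: count_args is a made-up helper, and the stamp value is an example string, not captured output.

```shell
#!/usr/bin/env bash
# count_args: hypothetical helper that reports how many arguments it received.
count_args() { echo "$#"; }

stamp="Mon Aug  1 12:34:26 UTC 2016"   # example value containing spaces

as_string="-v time=$stamp"
as_array=( -v "time=$stamp" )

count_args $as_string          # unquoted string: split into one word per field
count_args "${as_array[@]}"    # array: exactly two words, -v and time=...

# awk receives the whole value, spaces included.
awk "${as_array[@]}" 'BEGIN { print time }'
```

Quoting "$as_string" as a whole would not help either, since awk would then get -v and the assignment glued together as a single argument.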

Bash: Keeping indentation during interpolation

I have a variable containing a multiline string.
I want to interpolate this variable into another multiline echoed string that is indented.
Here's an example:
ip_status=`ip -o addr | awk 'BEGIN { printf "%-12s %-12s %-12s\n", "INTERFACE", "PROTOCOL", "ADDRESS"
printf "%-12s %-12s %-12s\n", "---------", "--------", "-------" }
{ printf "%-12s %-12s %-12s\n", $2, $3, $4 }'`
echo -e "
    ->
    $ip_status
    ->
"
When I run that, the first line of $ip_status is left-aligned with the ->, but the subsequent lines are not.
It's easier to see if you run that in your bash. This is the output:
    ->
    INTERFACE    PROTOCOL     ADDRESS
---------    --------     -------
lo           inet         127.0.0.1/8
lo           inet6        ::1/128
eth0         inet         10.0.2.15/24
eth0         inet6        fe80::a00:27ff:fed3:76c/64
    ->
I want all the lines in the $ip_status to be aligned with the ->, not just the first line.
You need to insert the indentation yourself. Bash comes with no feature for making text pretty, although there are some possibly useful utilities (column -t is frequently useful in this sort of application, for example).
Still, inserting indentation isn't too difficult. Here's one solution:
echo "
    ->
    ${ip_status//$'\n'/$'\n    '}
    ->
"
Note: I removed the non-standard -e flag because it really isn't necessary.
Another alternative would be to apply the replacement on the entire output, using a tool like sed:
echo "
    ->
    $ip_status
    ->
" | sed 's/^ */    /'
This second one has the possible advantage that it tidies up the indentation even if it is ragged, as in the example. If you don't want that effect, use 's/^/    /' instead.
Or a little shell function whose first argument is the desired indent and whose remaining arguments are indented and concatenated with a newline after each one:
indent() {
local s=$(printf '%*s' $1 "")
shift
printf "$s%s\n" "${@//$'\n'/$'\n'$s}"
}
indent 4 '->' "$ip_status" '->'
That might require some explanation:
printf accepts * as a field width, just like the C version. It means "use the corresponding argument as the numeric value". So local s=$(printf '%*s' $1 "") creates a string of $1 spaces.
Also, printf repeats its format as often as necessary to consume all arguments. So the second printf applies an indent at the beginning and a newline at the end of each argument.
"${@/pattern/subst}" is a substitution applied to each argument in turn. Using two slashes at the beginning ("${@//pattern/subst}") makes it a repeated substitution.
$'\n' is a common syntax for interpreting C-style backslash escapes, implemented by bash and a variety of other shells. (But it's not available in a minimal POSIX standard shell.)
So "${@//$'\n'/$'\n'$s}" inserts $s -- that is, the desired indentation -- after every newline in each argument.
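The individual pieces described above can be tried in isolation. A small sketch, assuming bash; the variable names pad, nl and text are arbitrary, and the newline is kept in a variable only to keep the quoting simple.

```shell
#!/usr/bin/env bash
# '*' takes the field width from the next argument: this yields four spaces.
pad=$(printf '%*s' 4 '')
printf '[%s]\n' "$pad"

# The format is reused until all arguments are consumed: two lines are printed.
printf '%s=%s\n' a 1 b 2

# Insert the padding after every newline in a multi-line value.
nl=$'\n'
text="first${nl}second"
printf '%s\n' "$pad${text//$nl/$nl$pad}"
```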
echo "    ->"
while IFS= read -r line
do
    echo "    $line"
done <<< "$ip_status"
echo "    ->"
You can read the variable line by line and echo it with the number of spaces you need before it. I have used the accepted answer of this question.
To make it a function:
myfunction() {
    echo "    ->"
    while IFS= read -r line
    do
        echo "    $line"
    done <<< "$1"
    echo "    ->"
}
myfunction "$ip_status"
A simple form is to use readarray, process substitution and printf:
readarray -t ip_status < <(exec ip -o addr | awk 'BEGIN { printf "%-12s %-12s %-12s\n", "INTERFACE", "PROTOCOL", "ADDRESS"
printf "%-12s %-12s %-12s\n", "---------", "--------", "-------" }
{ printf "%-12s %-12s %-12s\n", $2, $3, $4 }')
printf '    %s\n' '->' "${ip_status[@]}" '->'
Reference: http://www.gnu.org/software/bash/manual/bashref.html

problem with bash echo function

How do I print using echo in bash so that the row won't "jump" a bit to the right because of the length of the variable? Can you please help me with a command that does so?
Try using the printf shell command:
$ printf "%5d %s\n" 1 test
    1 test
$ printf "%5d %s\n" 123 another
  123 another
To trim leading whitespace inside a variable you can use Bash parameter expansion:
var=" value"
echo "${var#"${var%%[![:space:]]*}"}"
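The same expansion trick can be mirrored for trailing whitespace. A minimal sketch; the function name trim is an assumption made for illustration.

```shell
#!/usr/bin/env bash
# trim STRING: hypothetical helper; strips leading and trailing whitespace.
trim() {
    local var=$1
    var=${var#"${var%%[![:space:]]*}"}   # drop the leading run of whitespace
    var=${var%"${var##*[![:space:]]}"}   # drop the trailing run of whitespace
    printf '%s' "$var"
}

printf '[%s]\n' "$(trim '   value with spaces   ')"
```

Whitespace inside the value is left alone; only the two ends are stripped.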
Use tabs to separate your columns.
echo -e "$var1\t$var2"
or, better, use printf to do it:
printf "%s\t%s\n" "$var1" "$var2"
Or, as Greg Hewgill showed, use field widths (even with strings - the hyphen makes them left-aligned):
printf "%-6s %-8s %10s\n" abcde fghij 12345

Resources