I have a file in the following format:
Column1 Column2
str1 1
str2 2
str3 3
I want the columns to be rearranged. I tried the command below:
cut -f2,1 file.txt
The command doesn't reorder the columns. Any idea why it's not working?
From the cut(1) man page:

Use one, and only one of -b, -c or -f. Each LIST is made up of one range, or many ranges separated by commas. Selected input is written in the same order that it is read, and is written exactly once.
cut reaches field 1 first, so that is printed first, followed by field 2; the order of the fields in the LIST does not change the order of the output.
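You can verify this quickly (assuming file.txt is tab-delimited, which is what cut expects by default); both of these commands print identical output:

$ cut -f1,2 file.txt
$ cut -f2,1 file.txt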
Use awk instead:
awk '{ print $2 " " $1}' file.txt
You may also combine cut and paste:
paste <(cut -f2 file.txt) <(cut -f1 file.txt)
From the comments: it's possible to avoid process substitution and remove one instance of cut by doing:
paste file.txt file.txt | cut -f2,3
Using join:
join -t $'\t' -o 1.2,1.1 file.txt file.txt
Notes:
-t $'\t' In GNU join the more intuitive -t '\t' (without the $) fails, because the shell passes the two literal characters backslash and t rather than a tab; the ANSI-C quoting $'\t' produces an actual tab character. See: unix join separator char.
Even though there's just one file being worked on, join syntax requires two filenames. Repeating the file name allows join to perform the desired action.
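With the question's sample file (assuming tab delimiters), the output should look like this, tab-separated:

Column2 Column1
1       str1
2       str2
3       str3

Note that join expects its input sorted on the join field; the sample file happens to be sorted already.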
For systems with low resources join offers a smaller footprint than some of the tools used in other answers:
wc -c $(realpath `which cut join sed awk perl`) | head -n -1
43224 /usr/bin/cut
47320 /usr/bin/join
109840 /bin/sed
658072 /usr/bin/gawk
2093624 /usr/bin/perl
Using just the shell:
while read -r col1 col2
do
  echo "$col2 $col1"
done < "file"
You can use Perl for that:
perl -ane 'print "$F[1] $F[0]\n"' < file.txt
-e means execute the code that follows it
-n means read line by line (open the file, STDIN in this case via the < redirection, and loop over lines)
-a means autosplit such lines into an array called @F ("F" like Field). Perl indexes arrays starting from 0, unlike cut, which indexes fields starting from 1.
You can add -F pattern (with no space between -F and pattern) to use pattern as a field separator when reading the file instead of the default whitespace
The advantage of running Perl is that (if you know Perl) you can do much more computation on @F than rearranging columns.
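For instance, a hypothetical variation (the uc call is just an illustration) that swaps the columns and also uppercases the first field:

perl -ane 'print "$F[1] ", uc($F[0]), "\n"' file.txt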
I've just been working on something very similar. I am not an expert, but I thought I would share the commands I have used. I had a multi-column csv of which I only required 4 columns, and then I needed to reorder them.
My file was pipe '|' delimited but that can be swapped out.
LC_ALL=C cut -d$'|' -f1,2,3,8,10 ./file/location.txt | sed -E "s/(.*)\|(.*)\|(.*)\|(.*)\|(.*)/\3\|\5\|\1\|\2\|\4/" > ./newcsv.csv
Admittedly it is really rough and ready but it can be tweaked to suit!
Just as an addition to the answers that suggest duplicating the columns and then doing cut: for duplication, paste etc. will only work with files, not streams. In that case, use sed instead.
cat file.txt | sed s/'.*'/'&\t&'/ | cut -f2,3
This works on both files and streams, which is useful if, instead of just reading from a file with cat, you do something interesting before rearranging the columns.
By comparison, the following does not work:
cat file.txt | paste - - | cut -f2,3
Here, the two stdin placeholders (- -) do not make paste duplicate stdin; each - reads the next line of it.
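You can see that behavior directly; each - consumes the next line of stdin in turn:

$ printf 'a\nb\nc\nd\n' | paste - -
a	b
c	d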
Using sed
Use sed with basic regular expression subexpressions to capture and reorder the column content. This approach is best suited when there is a limited number of columns to reorder, as in this case.
The basic idea is to surround interesting portions of the search pattern with \( and \), which can be played back in the replacement pattern with \# where # represents the sequential position of the subexpression in the search pattern.
For example:
$ echo "foo bar" | sed "s/\(foo\) \(bar\)/\2 \1/"
yields:
bar foo
Text outside a subexpression is scanned but not retained for playback in the replacement string.
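For example, the literal text between the subexpressions here is matched and consumed, but never reappears in the output:

$ echo "foo and bar" | sed "s/\(foo\) and \(bar\)/\2 \1/"
bar foo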
Although the question did not discuss fixed-width columns, we will discuss them here, as this is a worthy measure of any solution posed. For simplicity, let's assume the file is space-delimited, although the solution can be extended to other delimiters.
Collapsing Spaces
To illustrate the simplest usage, let's assume that multiple spaces can be collapsed into single spaces, and that the second column values are terminated with EOL (and not space-padded).
File:
bash-3.2$ cat f
Column1    Column2
str1       1
str2       2
str3       3
bash-3.2$ od -a f
0000000 C o l u m n 1 sp sp sp sp C o l u m
0000020 n 2 nl s t r 1 sp sp sp sp sp sp sp 1 nl
0000040 s t r 2 sp sp sp sp sp sp sp 2 nl s t r
0000060 3 sp sp sp sp sp sp sp 3 nl
0000072
Transform:
bash-3.2$ sed "s/\([^ ]*\)[ ]*\([^ ]*\)[ ]*/\2 \1/" f
Column2 Column1
1 str1
2 str2
3 str3
bash-3.2$ sed "s/\([^ ]*\)[ ]*\([^ ]*\)[ ]*/\2 \1/" f | od -a
0000000 C o l u m n 2 sp C o l u m n 1 nl
0000020 1 sp s t r 1 nl 2 sp s t r 2 nl 3 sp
0000040 s t r 3 nl
0000045
Preserving Column Widths
Let's now extend the method to a file with constant width columns, while allowing columns to be of differing widths.
File:
bash-3.2$ cat f2
Column1    Column2
str1       1      
str2       2      
str3       3      
bash-3.2$ od -a f2
0000000 C o l u m n 1 sp sp sp sp C o l u m
0000020 n 2 nl s t r 1 sp sp sp sp sp sp sp 1 sp
0000040 sp sp sp sp sp nl s t r 2 sp sp sp sp sp sp
0000060 sp 2 sp sp sp sp sp sp nl s t r 3 sp sp sp
0000100 sp sp sp sp 3 sp sp sp sp sp sp nl
0000114
Transform:
bash-3.2$ sed "s/\([^ ]*\)\([ ]*\) \([^ ]*\)\([ ]*\)/\3\4 \1\2/" f2
Column2 Column1   
1       str1      
2       str2      
3       str3      
bash-3.2$ sed "s/\([^ ]*\)\([ ]*\) \([^ ]*\)\([ ]*\)/\3\4 \1\2/" f2 | od -a
0000000 C o l u m n 2 sp C o l u m n 1 sp
0000020 sp sp nl 1 sp sp sp sp sp sp sp s t r 1 sp
0000040 sp sp sp sp sp nl 2 sp sp sp sp sp sp sp s t
0000060 r 2 sp sp sp sp sp sp nl 3 sp sp sp sp sp sp
0000100 sp s t r 3 sp sp sp sp sp sp nl
0000114
Lastly, although the question's example does not have strings of unequal length, this sed expression supports that case as well.
File:
bash-3.2$ cat f3
Column1    Column2
str1       1      
string2    2      
str3       3      
Transform:
bash-3.2$ sed "s/\([^ ]*\)\([ ]*\) \([^ ]*\)\([ ]*\)/\3\4 \1\2/" f3
Column2 Column1   
1       str1      
2       string2   
3       str3      
bash-3.2$ sed "s/\([^ ]*\)\([ ]*\) \([^ ]*\)\([ ]*\)/\3\4 \1\2/" f3 | od -a
0000000 C o l u m n 2 sp C o l u m n 1 sp
0000020 sp sp nl 1 sp sp sp sp sp sp sp s t r 1 sp
0000040 sp sp sp sp sp nl 2 sp sp sp sp sp sp sp s t
0000060 r i n g 2 sp sp sp nl 3 sp sp sp sp sp sp
0000100 sp s t r 3 sp sp sp sp sp sp nl
0000114
Comparison to other methods of column reordering under shell
Surprisingly for a file manipulation tool, awk is not well-suited for cutting from a field to the end of the record. In sed this can be accomplished with a regular expression, e.g. \(xxx.*$\) where xxx is the expression to match the column.
Using paste and cut subshells gets tricky inside shell scripts. Code that works from the command line can fail to parse when brought inside a shell script. At least this was my experience (which drove me to this approach).
Expanding on the answer from @Met, also using Perl:
If the input and output are TAB-delimited:
perl -F'\t' -lane 'print join "\t", @F[1, 0]' in_file
If the input and output are whitespace-delimited:
perl -lane 'print join " ", @F[1, 0]' in_file
Here,
-e tells Perl to look for the code inline, rather than in a separate script file,
-n reads the input 1 line at a time,
-l removes the input record separator (\n on *NIX) after reading the line (similar to chomp), and adds the output record separator (\n on *NIX) to each print,
-a splits the input line on whitespace into the array @F,
-F'\t', in combination with -a, splits the input line on TABs instead of whitespace into the array @F.
@F[1, 0] is the array slice made up of the 2nd and 1st elements of @F, in this order. Remember that arrays in Perl are zero-indexed, while fields in cut are 1-indexed. So the fields in @F[0, 1] are the same fields as the ones in cut -f1,2.
Note that such notation enables more flexible manipulation of input than in some other answers posted above (which are fine for a simple task). For example:
# reverses the order of fields:
perl -F'\t' -lane 'print join "\t", reverse @F' in_file
# prints last and first fields only:
perl -F'\t' -lane 'print join "\t", @F[-1, 0]' in_file
itsmejitu@itsmejitu:~$ numbers=(47 -78 12 45 6)
itsmejitu@itsmejitu:~$ printf "%d \n" ${numbers[@]} | sort -n
-78
6
12
45
47
itsmejitu@itsmejitu:~$ declare -a letters
itsmejitu@itsmejitu:~$ letters=(a c e z l s q a d c v)
itsmejitu@itsmejitu:~$ printf "%s \0" ${letters[@]} | sort -z | xargs -0n1
a
a
c
c
d
e
l
q
s
v
z
itsmejitu@itsmejitu:~$ printf "%s \n" ${letters[@]} | sort -z | xargs -0n1
a
c
e
z
l
s
q
a
d
c
v
Sorting integers is straightforward.
I tried to sort letters in bash but couldn't do it, so my friend sent me this. He couldn't explain it though. I looked through the printf and xargs manuals, but the terms used there are beyond my understanding (I'm not a CS student).
Is there any simpler way to understand this?
Thanks!!
In the first example, sort sees 5 different numbers separated by line feeds.
In the second example, sort and xargs see 11 different two-character strings (each has a trailing space) separated by null characters.
In the third example, sort and xargs see a single string (containing embedded line feeds and spaces) "separated" by a null character.
It might help to pipe the output of printf through hexdump -C or od to see what sort sees in each case.
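For example (od's exact spacing may vary), the NUL-delimited case looks like this; those \0 bytes are what sort -z and xargs -0 key on:

$ printf '%s \0' a b | od -c
0000000   a      \0   b      \0
0000006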
Suppose I have the following data:
# all the numbers are their own number. I want to reshape exactly as below
0 a
1 b
2 c
0 d
1 e
2 f
0 g
1 h
2 i
...
And I would like to reshape the data such that it is:
0 a d g ...
1 b e h ...
2 c f i ...
Without writing a complex composition. Is this possible using the unix/bash toolkit?
Yes, trivially I can do this inside a language. The idea is NOT to "just" do that. So if some cat X.csv | rs [magic options] sort of solution exists, that is what I am looking for (rs, the reshape utility, would be great, except it isn't working for me here on Debian stretch).
Otherwise, an equivalent answer that involves a composition of commands or a script is out of scope: I've already got that, but would rather not have it.
Using GNU datamash:
$ datamash -s -W -g 1 collapse 2 < file
0 a,d,g
1 b,e,h
2 c,f,i
Options:
-s sort
-W use whitespace (spaces or tabs) as delimiters
-g 1 group on the first field
collapse 2 print comma-separated list of values of the second field
To convert the tabs and commas to space characters, pipe the output to tr:
$ datamash -s -W -g 1 collapse 2 < file | tr '\t,' ' '
0 a d g
1 b e h
2 c f i
bash version:
function reshape {
  local index number key
  declare -A result
  while read -r index number; do
    result[$index]+=" $number"
  done
  for key in "${!result[@]}"; do
    echo "$key${result[$key]}"
  done
}
reshape < input
We just need to make sure the input is in Unix format.
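For example, if the input may have come from Windows, one common precaution (the file names here are just placeholders) is to strip carriage returns first:

tr -d '\r' < input.dos > input.unix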
I have extended ASCII chars in Oracle table data which I am able to extract to a file using sqlplus with the \ escape character prefixed. I want to use nzload to load the exact same data into a Netezza table.
nzload adds a couple of extra bytes when it encounters this character sequence (c2bf) in the extracted file data:
echo "PROFESSIONAL¿" | od -x
0000000 5052 4f46 4553 5349 4f4e 414c **c2bf** 0a00
after nzload:
echo "PROFESSIONAL¿" | od -x
0000000 5052 4f46 4553 5349 4f4e 414c **c382 c2bf**
on the nzload command line, I have these options:
-escapechar \ -ctrlchars
Can anyone provide any help with this?
I'm not very savvy with Unicode conversion issues, but I've done this to myself before, and I'll demonstrate what I think is happening.
I believe what you are seeing here is not an issue with loading special characters with nzload; rather, it's an issue with how your display/terminal software displays the data and/or how Netezza is storing the character data. I suspect a double conversion to/from UTF-8 (the Unicode encoding that Netezza supports). Let's see if we can suss out which it is.
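As a preview, the double conversion is easy to reproduce outside the database (a sketch, assuming iconv is available): take the UTF-8 bytes c2 bf, pretend they are Latin-1, and re-encode to UTF-8. The extra c3 82 appears, exactly as in the question:

$ printf 'PROFESSIONAL\xc2\xbf\n' | iconv -f LATIN1 -t UTF-8 | od -An -tx1
 50 52 4f 46 45 53 53 49 4f 4e 41 4c c3 82 c2 bf
 0a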
Here I am using PuTTY with the default (for me) Remote Character Set as Latin-1.
$ od -xa input.txt
0000000 5250 464f 5345 4953 4e4f 4c41 bfc2 000a
P R O F E S S I O N A L B ? nl
0000017
$ cat input.txt
PROFESSIONALÂ¿
Here we can see from od that the file has only the data we expect, however when we cat the file we see the extra character. If it's not in the file, then the character is likely coming from the display translation.
If I change the PuTTY settings to have UTF-8 be the remote character set, we see it this way:
$ od -xa input.txt
0000000 5250 464f 5345 4953 4e4f 4c41 bfc2 000a
P R O F E S S I O N A L B ? nl
0000017
$ cat input.txt
PROFESSIONAL¿
So, the same source data, but two different on-screen representations, which are, not coincidentally, the same as your two different outputs. The same data can be displayed at least two ways.
Now let's see how it loads into Netezza, once into a VARCHAR column, and again into an NVARCHAR column.
create table test_enc_vchar (col1 varchar(50));
create table test_enc_nvchar (col1 nvarchar(50));
$ nzload -db testdb -df input.txt -t test_enc_vchar -escapechar '\' -ctrlchars
Load session of table 'TEST_ENC_VCHAR' completed successfully
$ nzload -db testdb -df input.txt -t test_enc_nvchar -escapechar '\' -ctrlchars
Load session of table 'TEST_ENC_NVCHAR' completed successfully
The data loaded with no errors. Note while I specify the escapechar option for nzload, none of the characters in this specific sample of input data require escaping, nor are they escaped.
I will now use the rawtohex function from the SQL Extension Toolkit as an in-database tool like we've used od from the command line.
select rawtohex(col1) from test_enc_vchar;
RAWTOHEX
------------------------------
50524F46455353494F4E414CC2BF
(1 row)
select rawtohex(col1) from test_enc_nvchar;
RAWTOHEX
------------------------------
50524F46455353494F4E414CC2BF
(1 row)
At this point both columns seem to have exactly the same data as the input file. So far, so good.
What if we select the column? For the record, I am doing this in a PuTTY session with remote character set of UTF-8.
select col1 from test_enc_vchar;
COL1
----------------
PROFESSIONALÂ¿
(1 row)
select col1 from test_enc_nvchar;
COL1
---------------
PROFESSIONAL¿
(1 row)
Same binary data, but different display. If I then copy the output of each of those selects into echo piped to od,
$ echo PROFESSIONALÂ¿ | od -xa
0000000 5250 464f 5345 4953 4e4f 4c41 82c3 bfc2
P R O F E S S I O N A L C stx B ?
0000020 000a
nl
0000021
$ echo PROFESSIONAL¿ | od -xa
0000000 5250 464f 5345 4953 4e4f 4c41 bfc2 000a
P R O F E S S I O N A L B ? nl
0000017
Based on this output, I'd wager that you are loading your sample data, which I'd also wager is UTF-8, into a VARCHAR column rather than an NVARCHAR column. This is not, in and of itself, a problem, but it can cause display/conversion issues down the line.
Generally speaking, you'd want to load UTF-8 data into NVARCHAR columns.
I am trying to replace values in a large space-delimited text-file and could not find a suitable answer for this specific problem:
Say I have a file "OLD_FILE", containing a header and approximately 2 million rows:
COL1 COL2 COL3 COL4 COL5
rs10 7 92221824 C A
rs1000000 12 125456933 G A
rs10000010 4 21227772 T C
rs10000012 4 1347325 G C
rs10000013 4 36901464 C A
rs10000017 4 84997149 T C
rs1000002 3 185118462 T C
rs10000023 4 95952929 T G
...
I want to replace the first value of each row with a corresponding value, using a large (2.8M rows) conversion table. In this conversion table, the first column lists the value I want to have replaced, and the second column lists the corresponding new values:
COL1_b36 COL2_b37
rs10 7_92383888
rs1000000 12_126890980
rs10000010 4_21618674
rs10000012 4_1357325
rs10000013 4_37225069
rs10000017 4_84778125
rs1000002 3_183635768
rs10000023 4_95733906
...
The desired output would be a file where all values in the first column have been changed according to the conversion table:
COL1 COL2 COL3 COL4 COL5
7_92383888 7 92221824 C A
12_126890980 12 125456933 G A
4_21618674 4 21227772 T C
4_1357325 4 1347325 G C
4_37225069 4 36901464 C A
4_84778125 4 84997149 T C
3_183635768 3 185118462 T C
4_95733906 4 95952929 T G
...
Additional info:
Performance is an issue; the following command takes approximately a year:
while read a b; do sed -i "s/\b$a\b/$b/g" OLD_FILE ; done < CONVERSION_TABLE
A complete match is necessary before replacing
Not every value in the OLD_FILE can be found in the conversion table...
...but every value that could be replaced, can be found in the conversion table.
Any help is very much appreciated.
Here's one way using awk:
awk 'NR==1 { next } FNR==NR { a[$1]=$2; next } $1 in a { $1=a[$1] }1' TABLE OLD_FILE
Results:
COL1 COL2 COL3 COL4 COL5
7_92383888 7 92221824 C A
12_126890980 12 125456933 G A
4_21618674 4 21227772 T C
4_1357325 4 1347325 G C
4_37225069 4 36901464 C A
4_84778125 4 84997149 T C
3_183635768 3 185118462 T C
4_95733906 4 95952929 T G
Explanation, in order of appearance:
NR==1 { next } # simply skip processing the first line (header) of
# the first file in the arguments list (TABLE)
FNR==NR { ... } # This is a construct that only returns true for the
# first file in the arguments list (TABLE)
a[$1]=$2 # So when we loop through the TABLE file, we add the
# column one to an associative array, and we assign
# this key the value of column two
next # This simply skips processing the remainder of the
# code by forcing awk to read the next line of input
$1 in a { ... } # Now when awk has finished processing the TABLE file,
# it will begin reading the second file in the
# arguments list which is OLD_FILE. So this construct
# is a condition that returns true literally if column
# one exists in the array
$1=a[$1] # re-assign column one's value to be the value held
# in the array
1 # The 1 on the end simply enables default printing. It
# would be like saying: $1 in a { $1=a[$1]; print $0 }'
This might work for you (GNU sed):
sed -r '1d;s|(\S+)\s*(\S+).*|/^\1\\>/s//\2/;t|' table | sed -f - file
This turns the conversion table into a sed script (one command per table row, of the form /^old\>/s//new/;t) and then applies that script to the data file via sed -f -.
You can use join:
join -o '2.2 1.2 1.3 1.4 1.5' <(tail -n+2 file1 | sort) <(tail -n+2 file2 | sort)
This drops the headers of both files; you can add them back with head -n1 file1.
Output:
12_126890980 12 125456933 G A
4_21618674 4 21227772 T C
4_1357325 4 1347325 G C
4_37225069 4 36901464 C A
4_84778125 4 84997149 T C
3_183635768 3 185118462 T C
4_95733906 4 95952929 T G
7_92383888 7 92221824 C A
Another way with join. Assuming the files are sorted on the 1st column:
head -1 OLD_FILE
join <(tail -n+2 CONVERSION_TABLE) <(tail -n+2 OLD_FILE) | cut -f 2-6 -d' '
But with data of this size, you should consider using a database engine.
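For example, a rough sqlite3 sketch (untested; it assumes single-space delimiters and that the tables don't already exist, in which case .import consumes each file's header row as column names):

sqlite3 scratch.db <<'EOF'
.separator " "
.import CONVERSION_TABLE map
.import OLD_FILE old
CREATE INDEX idx ON map(COL1_b36);
SELECT coalesce(m.COL2_b37, o.COL1), o.COL2, o.COL3, o.COL4, o.COL5
FROM old AS o LEFT JOIN map AS m ON m.COL1_b36 = o.COL1;
EOF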
How can I add spaces between every character or symbol within a UTF-8 document? E.g. 123hello! becomes 1 2 3 h e l l o !.
I have BASH, OpenOffice.org, and gedit, if any of those can do that.
I don't care if it sometimes leaves extra spaces in places (e.g. 2 or 3 spaces in a single place is no problem).
Shortest sed version
sed 's/./& /g'
Output
$ echo '123hello!' | sed 's/./& /g'
1 2 3 h e l l o !
Obligatory awk version
awk '$1=$1' FS= OFS=" "
Output
$ echo '123hello!' | awk '$1=$1' FS= OFS=" "
1 2 3 h e l l o !
sed(1) can do this:
$ sed -e 's/\(.\)/\1 /g' < /etc/passwd
r o o t : x : 0 : 0 : r o o t : / r o o t : / b i n / b a s h
d a e m o n : x : 1 : 1 : d a e m o n : / u s r / s b i n : / b i n / s h
It works well on e.g. UTF-8 encoded Japanese content:
$ file japanese
japanese: UTF-8 Unicode text
$ sed -e 's/\(.\)/\1 /g' < japanese
E X I F 中 の 画 像 回 転 情 報 対 応 に よ り 、 一 部 画 像 ( 特 に 『
$
sed is ok but this is pure bash:
string=hello
for ((i=0; i<${#string}; i++)); do
  string_new+="${string:$i:1} "
done
echo "$string_new"
Since you have bash, I will assume that you have access to sed. The following command line will do what you wish.
$ sed -e 's:\(.\):\1 :g' < input.txt > output.txt
I like these solutions because they do not have a trailing space like the rest here.
GNU awk:
echo 123hello! | awk NF=NF FS=
GNU awk:
echo 123hello! | awk NF=NF FPAT=.
POSIX awk:
echo 123hello! | awk '{while(a=substr($0,++b,1))printf b-1?FS a:a}'
This might work for you:
echo '1 23h ello ! ' | sed 's/\s*/ /g;s/^\s*\(.*\S\)\s*$/\1/;l'
1 2 3 h e l l o !$
1 2 3 h e l l o !
In retrospect a far better solution:
sed 's/\B/ /g' file
This inserts a space at every position that is not a word boundary, i.e. between adjacent characters within a word.
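For example:

$ echo 'hello' | sed 's/\B/ /g'
h e l l o

Note that \B never matches at a word boundary, so with mixed input like 123hello! no space is inserted between o and ! (they stay joined as o!).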
string='abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
echo ${string} | sed -r 's/(.{1})/\1 /g'
Pure POSIX Shell version:
addspace() {
__addspace_str="$1"
while [ -n "${__addspace_str#?}" ]; do
printf '%c ' "$__addspace_str"
__addspace_str="${__addspace_str#?}"
done
printf '%c' "$__addspace_str"
}
Or if you need to put it in a variable:
addspace_var() {
addspace_result=""
__addspace_str="$1"
while [ -n "${__addspace_str#?}" ]; do
addspace_result="$addspace_result${__addspace_str%${__addspace_str#?}} "
__addspace_str="${__addspace_str#?}"
done
addspace_result="$addspace_result$__addspace_str"
}
addspace_var abc
echo "$addspace_result"
Tested with dash, ksh, zsh, bash (+ bash --posix), and busybox ash.
Explanation
${x#?}
This parameter expansion removes the first character of x. ${x#...} in general removes a prefix given by a pattern, and ? matches any single character.
printf '%c ' "$str"
The %c format directive reduces the string argument to its first character, so the full format string '%c ' prints the first character of the string followed by a space. Note that if the string were empty this would cause issues, but we already checked that it isn't, so it's fine. To print the first character safely in any situation we could use '%.1s', but I like living dangerously :3
${x%${x#?}}
This is an alternate way to get the first character of the string. We already know that ${x#?} is all but the first character. Well, ${x%...} removes ... from the end of x, so ${x%${x#?}} removes all but the first character from the end of x, leaving only the first one.
__prefixed_variable_names
POSIX doesn't define local, so to avoid variable conflicts it's safer to create unique names that are unlikely to clobber each other. I am starting to experiment with M4 to generate unique names without mangling my code every time, but it's probably overkill for people who don't use shell as much as me.
[ -n "${str#?}" ]
Why not just [ -n "$str" ]? It's to avoid the dreaded trailing space; it's also why we have a little statement outside the loop at the bottom there. The loop goes until the string is one character long, then we finish outside of it so we can append this last character without adding a space.
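A quick demo of the two expansions doing the heavy lifting above (x is a throwaway variable):

$ x=abc
$ echo "${x#?}"
bc
$ echo "${x%${x#?}}"
a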
When should I use this?
This is good for small inputs in long-running loops, since it avoids the overhead of calling an external process, but for larger inputs it starts lagging behind fast, especially the var version (I blame the ${x%${x#?}} trick).
Benchmark Commands
# addspace
time dash -c ". ./addspace.sh; for x in $(seq -s ' ' 1 10000); do addspace \"$input\" >/dev/null; done"
# addspace_var
time dash -c ". ./addspace.sh; for x in $(seq -s ' ' 1 10000); do addspace_var \"$input\" >/dev/null; done"
# sed for comparison
time dash -c ". ./addspace.sh; for x in $(seq -s ' ' 1 10000); do echo \"$input\" | sed 's/./& /g' >/dev/null; done"
Input Length = 3
        addspace    addspace_var    sed
real    0m0,106s    0m0,106s        0m10,651s
user    0m0,077s    0m0,075s        0m9,349s
sys     0m0,029s    0m0,031s        0m3,030s
Input Length = 200
        addspace    addspace_var    sed
real    0m6,050s    0m47,115s       0m11,049s
user    0m5,557s    0m46,919s       0m9,727s
sys     0m0,488s    0m0,068s        0m3,085s
Input Length = 1000
        addspace    addspace_var    sed
real    0m55,989s   TBD             0m11,534s
user    0m53,560s   TBD             0m10,214s
sys     0m2,428s    TBD             0m2,975s
(Yeah, I was waiting a bit for that last var one.)
In situations like this you can simply check the length of the input and call the appropriate function for maximum performance.
addspace() {
    if [ ${#1} -lt 100 ]; then
        addspace_builtins "$1"
    else
        addspace_process "$1"
    fi
}