convert txt to columnated file - bash

I need to convert the test.txt file to a columnated file.
I know how to convert it with awk if the number of lines after each keyword is the same, but here the counts differ.
awk 'NR % 5 {printf "%s ", $0; next}1' test.txt
This is the code for when the number of lines is the same, but it will not work with this input file.
Any way to convert this? Please advise.
test.txt
"abc"
4
21
22
25
"standard"
1
"test"
4
5
10
11
12
Expected Output:
"abc" 4 21 22 25
"standard" 1
"test" 4 5 10 11 12

$ awk '{printf "%s%s", (/^"/ ? ors : OFS), $0; ors=ORS} END{print ""}' file
"abc" 4 21 22 25
"standard" 1
"test" 4 5 10 11 12
This prints a separator before each line rather than after it: ORS (a newline) before every line that starts with a quote, OFS (a space) before everything else. Since ors is empty until after the first line, no leading blank line is produced, and the END block supplies the final newline.

A bit magic, but works in this case:
sed -z 's/\n"/\n\x01"/g' test.txt |
tr '\n' ' ' |
tr $'\x01' '\n'
Each "header" is a string between " ... ". So:
Using sed, I put a delimiter (I chose 0x01 in hex) between a newline and a ", everywhere in the file. Note that -z is a GNU extension.
Then I substitute all newlines with spaces.
Then I substitute all 0x01 bytes with newlines.
This method is a little tricky, but it is simple and works in cases where the header starts with a certain character at the beginning of the line.
One can get by with sed without the GNU extension by using, for example:
sed '2,$s/^"/\x01"/'
i.e., from line 2 onward, if the line starts with a ", add the 0x01 byte at the beginning of the line.
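Putting it together, a minimal sketch of the fully portable pipeline (using bash's $'...' quoting so the 0x01 byte is passed to sed literally, rather than relying on sed to interpret \x01 itself):
sed $'2,$s/^"/\x01"/' test.txt |  # mark each header line except the first
tr '\n' ' ' |                     # join all lines with spaces
tr $'\x01' '\n'                   # split again at the markers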

With GNU awk:
$ awk -v RS='\n"' '{$1=$1; printf "%s", rt $0; rt=RT}' file
"abc" 4 21 22 25
"standard" 1
"test" 4 5 10 11 12

POSIX awk:
$ awk '/^"/{if (s) print s; s=$0; next} {s=s OFS $0} END{print s}' file
"abc" 4 21 22 25
"standard" 1
"test" 4 5 10 11 12
Or with perl:
$ perl -0777 -lnE 'for (/^"[^"]+"\R(?:[\s\S]+?)(?=^"|\z)/mg) {tr /\n/ /; say} ' file
If your fields do not have spaces in them, you can use a simple tr and sed pipe:
$ cat file | tr '\n' ' ' | sed -E 's/ ("[^"]*")/\
\1/g'
Or GNU sed:
$ cat file | tr '\n' ' ' | sed -E 's/ ("[^"]*")/\n\1/g'

While an awk or sed solution is advisable, since the question is also tagged bash, you can do all that is needed with a simple read loop and a flag variable that controls the newline output for the first iteration. Essentially, you read each line and use substring parameter expansion to test whether the first character is a non-digit. On the first iteration you simply output the string; on all later iterations you output the string preceded by a '\n'. If the line begins with a digit, you output it preceded by a space.
For example:
#!/bin/bash
declare -i n=0    ## simple flag to omit '\n' on first string output
while read -r line; do                    ## read each line
    [[ ${line:0:1} =~ [^0-9] ]] && {      ## begins with non-digit
        ## 1st iteration, just output $line, rest output '\n$line'
        ((n == 0)) && printf "%s" "$line" || printf "\n%s" "$line"
    } || printf " %s" "$line"             ## begins with digit - output " $line"
    n=1                                   ## set flag
done < "$1"
echo ""    ## tidy up with newline
Example Use/Output
$ bash fmtlines test.txt
"abc" 4 21 22 25
"standard" 1
"test" 4 5 10 11 12
While awk and sed will generally be faster, here, with nothing more than a while read loop, a few conditionals and parameter expansions, the native bash solution is not bad by comparison.
Look things over and let me know if you have questions.

Related

Indent long line stdout

Let's say I have a standard 80-column terminal and execute a command with long output (e.g. stdout from ls) that wraps onto two or more lines, and I want to indent the continuation lines of all my bash stdout.
Indent should be configurable, 1 or 2 or 3 or whatever spaces.
from this
lrwxrwxrwx 1 root root 24 Feb 19 1970 sdcard -> /storage/emula
ted/legacy/
to this
lrwxrwxrwx 1 root root 24 Feb 19 1970 sdcard -> /storage/emula
  ted/legacy/
I read Indenting multi-line output in a shell script, so I tried to pipe | sed 's/^/ /', but that gives me the exact opposite: it indents the first line and not the continuation.
Ideally I would put a script in profile.rc or whatever, so that every time I open an interactive shell and execute any command, long output gets indented.
I'd use awk for this.
awk -v width="$COLUMNS" -v spaces=4 '
BEGIN {
    pad = sprintf("%*s", spaces, "")    # generate `spaces` spaces
}
NF {                                    # if current line is not empty
    while (length > width) {            # while length of current line is greater than `width`
        print substr($0, 1, width)      # print `width` chars from the beginning of it
        $0 = pad substr($0, width + 1)  # and leave `pad` + remaining chars
    }
    if ($0 != "")                       # if there are remaining chars
        print                           # print them too
    next
} 1' file
In one line:
awk -v w="$COLUMNS" -v s=4 'BEGIN{p=sprintf("%*s",s,"")} NF{while(length>w){print substr($0,1,w);$0=p substr($0,w+1)} if($0!="") print;next} 1'
As @Mark suggested in comments, you can put this in a function and add it to .bashrc for ease of use.
function wrap() {
    awk -v w="$COLUMNS" -v s=4 'BEGIN{p=sprintf("%*s",s,"")} NF{while(length>w){print substr($0,1,w);$0=p substr($0,w+1)} if($0!="") print;next} 1'
}
Usage:
ls -l | wrap
Edit by Ed Morton per request:
Very similar to oguzismail's script above, but should work with BusyBox or any other awk:
$ cat tst.awk
BEGIN { pad = sprintf("%" spaces "s","") }
{
    while ( length($0) > width ) {
        printf "%s", substr($0,1,width)
        $0 = substr($0,width+1)
        if ( $0 != "" ) {
            print ""
            $0 = pad $0
        }
    }
    print
}
$
$ echo '123456789012345678901234567890' | awk -v spaces=3 -v width=30 -f tst.awk
123456789012345678901234567890
$ echo '123456789012345678901234567890' | awk -v spaces=3 -v width=15 -f tst.awk
123456789012345
   678901234567
   890
$ echo '' | awk -v spaces=3 -v width=15 -f tst.awk
$
That first test case is to show that you don't get a blank line printed after a full-width input line and the third is to show that it doesn't delete blank rows. Normally I'd have used sprintf("%*s",spaces,"") to create the pad string but I see in a comment that that doesn't work in the apparently non-POSIX awk you're using.
This might work for you (GNU sed):
sed 's/./&\n  /80;P;D' file
This splits lines at length 80 and indents the following line(s) by 2 spaces.
Or if you prefer:
s='  ' c=80
sed "s/./&\n$s/$c;P;D" file
To prevent empty lines being printed when a line is an exact multiple of the width, use:
sed 's/./&\n  /80;s/\n  $//;P;D' file
or more easily:
sed 's/./\n  &/81;P;D' file
One possible solution using pure bash string manipulation.
You can make the script read stdin and format whatever it reads.
MAX_LEN=5  # length of the first/longest line
IND_LEN=2  # number of indentation spaces
short_len="$((MAX_LEN-IND_LEN))"
while read line; do
    printf "${line:0:$MAX_LEN}\n"
    for i in $(seq $MAX_LEN $short_len ${#line}); do
        printf "%*s${line:$i:$short_len}\n" $IND_LEN
    done
done
Usage: (assuming the script is saved as indent.sh)
$ echo '0123456789' | ./indent.sh
01234
  567
  89

bash - how do I use 2 numbers on a line to create a sequence

I have this file content:
2450TO3450
3800
4500TO4560
And I would like to obtain something of this sort:
2450
2454
2458
...
3450
3800
4500
4504
4508
..
4560
Basically I would need a one-liner in sed/awk that would read the values on both sides of the TO separator and either inject them into a seq command or do the loop on its own, then dump the result into the same file as one value per line, with an arbitrary increment, let's say 4 in the example above.
I know I could use a temp file, the read command and so on, but I would like to do it in a one-liner starting with cat filename | etc., as it is already part of a bigger script.
Correctness of the input is guaranteed, so the left side of TO is always smaller than the right side.
Thanks
Like this:
awk -F'TO' -v inc=4 'NF==1{print $1;next}{for(i=$1;i<=$2;i+=inc)print i}' file
or, if you like starting with cat:
cat file | awk -F'TO' -v inc=4 'NF==1{print $1;next}{for(i=$1;i<=$2;i+=inc)print i}'
Something like this might work:
awk -F TO '{system("seq " $1 " 4 " ($2 ? $2 : $1))}'
This would tell awk to system (execute) the command seq 10 4 10 for lines just containing 10 (which outputs 10), and something like seq 10 4 40 for lines like 10TO40. The output seems to match your example.
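A quick check against the sample file above (output abbreviated with ...):
$ awk -F TO '{system("seq " $1 " 4 " ($2 ? $2 : $1))}' file
2450
2454
...
3450
3800
4500
4504
...
4560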
Given:
txt="2450TO3450
3800
4500TO4560"
You can do:
echo "$txt" | awk -F TO '{$2<$1 ? t=$1 : t=$2; for(i=$1; i<=t; i++) print i}'
If you want an increment greater than 1:
echo "$txt" | awk -F TO -v p=4 '{$2<$1 ? t=$1 : t=$2; for(i=$1; i<=t; i+=p) print i}'
Give this a try:
sed 's/TO/ /' file.txt | while read first second; do if [ ! -z "$second" ] ; then seq $first 4 $second; else printf "%s\n" $first; fi; done
sed is used to replace TO with a space character.
read is used to read the line; if there are 2 numbers, seq is used to generate the sequence. Otherwise, the single number is printed.
This might work for you (GNU sed):
sed -r 's/(.*)TO(.*)/seq \1 4 \2/e' file
This evaluates the RHS of the substitution command if the LHS contains TO.

Sort a text file by line length including spaces

I have a CSV file that looks like this
AS2345,ASDF1232, Mr. Plain Example, 110 Binary ave.,Atlantis,RI,12345,(999)123-5555,1.56
AS2345,ASDF1232, Mrs. Plain Example, 1121110 Ternary st. 110 Binary ave..,Atlantis,RI,12345,(999)123-5555,1.56
AS2345,ASDF1232, Mr. Plain Example, 110 Binary ave.,Liberty City,RI,12345,(999)123-5555,1.56
AS2345,ASDF1232, Mr. Plain Example, 110 Ternary ave.,Some City,RI,12345,(999)123-5555,1.56
I need to sort it by line length including spaces. The following command doesn't
include spaces, is there a way to modify it so it will work for me?
cat $@ | awk '{ print length, $0 }' | sort -n | awk '{$1=""; print $0}'
Answer
cat testfile | awk '{ print length, $0 }' | sort -n -s | cut -d" " -f2-
Or, to do your original (perhaps unintentional) sub-sorting of any equal-length lines:
cat testfile | awk '{ print length, $0 }' | sort -n | cut -d" " -f2-
In both cases, we have solved your stated problem by moving away from awk for your final cut.
Lines of matching length - what to do in the case of a tie:
The question did not specify whether or not further sorting was wanted for lines of matching length. I've assumed that this is unwanted and suggested the use of -s (--stable) to prevent such lines being sorted against each other, and keep them in the relative order in which they occur in the input.
(Those who want more control of sorting these ties might look at sort's --key option.)
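For instance, a sketch of tie-breaking by line content instead (the key specs here are illustrative): sort primarily on the numeric length field, then alphabetically on the rest:
cat testfile | awk '{ print length, $0 }' | sort -k1,1n -k2 | cut -d" " -f2-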
Why the question's attempted solution fails (awk line-rebuilding):
It is interesting to note the difference between:
echo "hello awk world" | awk '{print}'
echo "hello awk world" | awk '{$1="hello"; print}'
They yield respectively
hello awk world
hello awk world
The relevant section of (gawk's) manual only mentions as an aside that awk is going to rebuild the whole of $0 (based on the separator, etc) when you change one field. I guess it's not crazy behaviour. It has this:
"Finally, there are times when it is convenient to force awk to rebuild the entire record, using the current value of the fields and OFS. To do this, use the seemingly innocuous assignment:"
$1 = $1 # force record to be reconstituted
print $0 # or whatever else with $0
"This forces awk to rebuild the record."
Test input including some lines of equal length:
aa A line   with     MORE    spaces
bb The very longest line in the file
ccb
9   dd equal len.  Orig pos = 1
500 dd equal len.  Orig pos = 2
ccz
cca
ee A line with  some       spaces
1   dd equal len.  Orig pos = 3
ff
5   dd equal len.  Orig pos = 4
g
The AWK solution from neillb is great if you really want to use awk, and it explains why it's a hassle there, but if you just want to get the job done and don't care what you do it in, one solution is to use Perl's sort() function with a custom comparison routine to iterate over the input lines. Here is a one-liner:
perl -e 'print sort { length($a) <=> length($b) } <>'
You can put this in your pipeline wherever you need it, either receiving STDIN (from cat or a shell redirect) or just give the filename to perl as another argument and let it open the file.
In my case I needed the longest lines first, so I swapped out $a and $b in the comparison.
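For reference, that reversed (longest-first) variant is:
perl -e 'print sort { length($b) <=> length($a) } <>'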
Benchmark results
Below are the results of a benchmark across solutions from other answers to this question.
Test method
10 sequential runs on a fast machine, averaged
Perl 5.24
awk 3.1.5 (gawk 4.1.0 times were ~2% faster)
The input file is a 550MB, 6 million line monstrosity (British National Corpus txt)
Results
Caleb's perl solution took 11.2 seconds
my perl solution took 11.6 seconds
neillb's awk solution #1 took 20 seconds
neillb's awk solution #2 took 23 seconds
anubhava's awk solution took 24 seconds
Jonathan's awk solution took 25 seconds
Fritz's bash solution takes 400x longer than the awk solutions (using a truncated test case of 100000 lines). It works fine, just takes forever.
Another perl solution
perl -ne 'push @a, $_; END{ print sort { length $a <=> length $b } @a }' file
Try this command instead:
awk '{print length, $0}' your-file | sort -n | cut -d " " -f2-
Pure Bash:
declare -a sorted
while read line; do
    if [ -z "${sorted[${#line}]}" ] ; then               # does line length already exist?
        sorted[${#line}]="$line"                          # element for new length
    else
        sorted[${#line}]="${sorted[${#line}]}\n$line"     # append to lines with equal length
    fi
done < data.csv
for key in ${!sorted[*]}; do    # iterate over existing indices
    echo -e "${sorted[$key]}"   # echo lines with equal length
done
Python Solution
Here's a Python one-liner that does the same, tested with Python 3.9.10 and 2.7.18. It's about 60% faster than Caleb's perl solution, and the output is identical (tested with a 300MiB wordlist file with 14.8 million lines).
python -c 'import sys; sys.stdout.writelines(sorted(sys.stdin.readlines(), key=len))'
Benchmark:
python -c 'import sys; sys.stdout.writelines(sorted(sys.stdin.readlines(), key=len))'
real 0m5.308s
user 0m3.733s
sys 0m1.490s
perl -e 'print sort { length($a) <=> length($b) } <>'
real 0m8.840s
user 0m7.117s
sys 0m2.279s
The length() function does include spaces. I would make just minor adjustments to your pipeline (including avoiding UUOC).
awk '{ printf "%d:%s\n", length($0), $0;}' "$@" | sort -n | sed 's/^[0-9]*://'
The sed command directly removes the digits and colon added by the awk command. Alternatively, keeping your formatting from awk:
awk '{ print length($0), $0;}' "$@" | sort -n | sed 's/^[0-9]* //'
I found these solutions will not work if your file contains lines that start with a number, since they will be sorted numerically along with all the counted lines. The solution is to give sort the -g (general-numeric-sort) flag instead of -n (numeric-sort):
awk '{ print length, $0 }' lines.txt | sort -g | cut -d" " -f2-
With POSIX Awk:
{
    c = length
    m[c] = m[c] ? m[c] RS $0 : $0
}
END {
    for (c in m) print m[c]
}
Example usage (the file name bylen.awk is just for illustration):
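$ awk -f bylen.awk testfile
Note that POSIX leaves the iteration order of for (c in m) unspecified, so the length groups are not guaranteed to come out in ascending order; with gawk you can force it by adding BEGIN { PROCINFO["sorted_in"] = "@ind_num_asc" } to the script.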
1) pure awk solution, assuming that line length cannot be more than 1024:
cat filename | awk 'BEGIN {min = 1024; s = "";} {l = length($0); if (l < min) {min = l; s = $0;}} END {print s}'
2) one-liner bash solution assuming all lines have just 1 word, but it can be reworked for any case where all lines have the same number of words:
LINES=$(cat filename); for k in $LINES; do printf "$k "; echo $k | wc -L; done | sort -k2 | head -n 1 | cut -d " " -f1
using Raku (formerly known as Perl6)
~$ cat "BinaryAve.txt" | raku -e 'given lines() {.sort(*.chars).join("\n").say};'
AS2345,ASDF1232, Mr. Plain Example, 110 Binary ave.,Atlantis,RI,12345,(999)123-5555,1.56
AS2345,ASDF1232, Mr. Plain Example, 110 Ternary ave.,Some City,RI,12345,(999)123-5555,1.56
AS2345,ASDF1232, Mr. Plain Example, 110 Binary ave.,Liberty City,RI,12345,(999)123-5555,1.56
AS2345,ASDF1232, Mrs. Plain Example, 1121110 Ternary st. 110 Binary ave..,Atlantis,RI,12345,(999)123-5555,1.56
To reverse the sort, add .reverse in the middle of the chain of method calls--immediately after .sort(). Here's code showing that .chars includes spaces:
~$ cat "number_triangle.txt" | raku -e 'given lines() {.map(*.chars).say};'
(1 3 5 7 9 11 13 15 17 19 0)
~$ cat "number_triangle.txt"
1
1 2
1 2 3
1 2 3 4
1 2 3 4 5
1 2 3 4 5 6
1 2 3 4 5 6 7
1 2 3 4 5 6 7 8
1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9 0
Here's a time comparison between awk and Raku using a 9.1MB txt file from Genbank:
~$ time cat "rat_whole_genome.txt" | raku -e 'given lines() {.sort(*.chars).join("\n").say};' > /dev/null
real 0m1.308s
user 0m1.213s
sys 0m0.173s
~$ #awk code from neillb
~$ time cat "rat_whole_genome.txt" | awk '{ print length, $0 }' | sort -n -s | cut -d" " -f2- > /dev/null
real 0m1.189s
user 0m1.170s
sys 0m0.050s
HTH.
https://raku.org
Here is a multibyte-compatible method of sorting lines by length. It requires:
wc -m is available to you (macOS has it).
Your current locale supports multi-byte characters, e.g., by setting LC_ALL=UTF-8. You can set this either in your .bash_profile, or simply by prepending it before the following command.
testfile has a character encoding matching your locale (e.g., UTF-8).
Here's the full command:
cat testfile | awk '{l=$0; gsub(/\047/, "\047\"\047\"\047", l); cmd=sprintf("echo \047%s\047 | wc -m", l); cmd | getline c; close(cmd); sub(/ */, "", c); { print c, $0 }}' | sort -ns | cut -d" " -f2-
Explaining part-by-part:
l=$0; gsub(/\047/, "\047\"\047\"\047", l); ← makes a copy of each line in awk variable l and escapes every ' so the line can safely be echoed as a shell command (\047 is a single quote in octal notation).
cmd=sprintf("echo \047%s\047 | wc -m", l); ← this is the command we'll execute, which echoes the escaped line to wc -m.
cmd | getline c; ← executes the command and copies the character count value that is returned into awk variable c.
close(cmd); ← close the pipe to the shell command to avoid hitting a system limit on the number of open files in one process.
sub(/ */, "", c); ← trims white space from the character count value returned by wc.
{ print c, $0 } ← prints the line's character count value, a space, and the original line.
| sort -ns ← sorts the lines (by prepended character count values) numerically (-n), and maintaining stable sort order (-s).
| cut -d" " -f2- ← removes the prepended character count values.
It's slow (only 160 lines per second on a fast Macbook Pro) because it must execute a sub-command for each line.
Alternatively, just do this solely with gawk (as of version 3.1.5, gawk is multibyte aware), which would be significantly faster. It's a lot of trouble doing all the escaping and double-quoting to safely pass the lines through a shell command from awk, but this is the only method I could find that doesn't require installing additional software (gawk is not available by default on macOS).
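A minimal sketch of that gawk-only idea (assuming a UTF-8 locale, in which gawk's length() counts characters rather than bytes):
LC_ALL=en_US.UTF-8 gawk '{ print length($0), $0 }' testfile | sort -ns | cut -d" " -f2-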
Revisiting this one. This is how I approached it (count length of LINE and store it as LEN, sort by LEN, keep only the LINE):
cat test.csv | while IFS= read -r LINE; do LEN=$(echo "${LINE}" | wc -c); echo "${LEN} ${LINE}"; done | sort -k 1n | cut -d ' ' -f 2-

How to print all the columns after a particular number using awk?

On shell, I pipe to awk when I need a particular column.
This prints column 9, for example:
... | awk '{print $9}'
How can I tell awk to print all the columns including and after column 9, not just column 9?
awk '{ s = ""; for (i = 9; i <= NF; i++) s = s $i " "; print s }'
When you want to print a range of fields, awk doesn't really have a straightforward way to do this. I would recommend cut instead:
cut -d' ' -f 9- ./infile
Edit
Added a space field delimiter, since the default is a tab. Thanks to Glenn for pointing this out.
awk '{print substr($0, index($0,$9))}'
Edit:
Note, this doesn't work if any field before the ninth contains the same value as the ninth.
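A quick illustration of that failure mode, using field 3 for brevity: index() finds the first occurrence of the value, so instead of the expected "x z" you get the whole line:
$ echo 'x y x z' | awk '{print substr($0, index($0,$3))}'
x y x z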
sed -re 's,\s+, ,g' | cut -d ' ' -f 9-
Instead of dealing with variable width whitespace, replace all whitespace as single space. Then use simple cut with the fields of interest.
It doesn't use awk so isn't germane but seemed appropriate given the other answers/comments.
Generally perl replaces awk/sed/grep et al., and is much more portable (as well as just being a better penknife).
perl -lane 'print "#F[8..$#F]"'
Timtowtdi applies of course.
awk -v m="\x01" -v N="3" '{$N=m$N ;print substr($0, index($0,m)+1)}'
This chops off whatever comes before the given field number, N, and prints all the rest of the line, including field number N, maintaining the original spacing (it does not reformat). It doesn't matter if the string of the field also appears somewhere else in the line, which is the problem with Ascherer's answer.
Define a function:
fromField () {
    awk -v m="\x01" -v N="$1" '{$N=m$N; print substr($0,index($0,m)+1)}'
}
And use it like this:
$ echo " bat bi iru lau bost " | fromField 3
iru lau bost
$ echo " bat bi iru lau bost " | fromField 2
bi iru lau bost
Output maintains everything, including trailing spaces
For N=0 it returns the whole line, as is, and for N>NF the empty string.
Here is an example of ls -l output:
-rwxr-----@ 1 ricky.john 1493847943 5610048 Apr 16 14:09 00-Welcome.mp4
-rwxr-----@ 1 ricky.john 1493847943 27862521 Apr 16 14:09 01-Hello World.mp4
-rwxr-----@ 1 ricky.john 1493847943 21262056 Apr 16 14:09 02-Typical Go Directory Structure.mp4
-rwxr-----@ 1 ricky.john 1493847943 10627144 Apr 16 14:09 03-Where to Get Help.mp4
My solution to print anything post $9 is awk '{print substr($0, 61, 50)}'
Using cut instead of awk, and overcoming the issue of figuring out which column to start at by using cut's -c (character) option.
Here I am saying, give me all but the first 49 characters of the output.
ls -l /some/path/*/* | cut -c 50-
The /*/* at the end of the ls command says to show what is in subdirectories too.
You can also pull out certain ranges of characters, à la (from the cut man page) showing the names and login times of the currently logged-in users:
who | cut -c 1-16,26-38
To display the first 3 fields and print the remaining fields you can use:
awk '{s = ""; for (i=4; i<=NF; i++) s = s $i " "; print $1, $2, $3, s}' filename
where $1 $2 $3 are the first 3 fields.
function print_fields(field_num1, field_num2){
    input_line = $0                     # save the original record
    j = 1
    for (i = field_num1; i <= field_num2; i++){
        $(j++) = $(i)                   # move the wanted fields to the front
    }
    NF = field_num2 - field_num1 + 1    # truncate the record to just those fields
    print $0
    $0 = input_line                     # restore the original record
}
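A usage sketch (the file name printfields.awk and the sample input are mine):
$ cat printfields.awk
{ print_fields(4, NF) }    # print from field 4 through the last field
# ...followed by the print_fields function defined above
$ echo 'a b c d e f' | awk -f printfields.awk
d e f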
Usually it is desired to pass the remaining columns unmodified. That is, without collapsing contiguous white space.
Imagine the case of processing the output of ls -l or ps faux (not recommended, just giving examples where the last column may contain sequences of whitespace). We'd want any contiguous white space in the remaining columns preserved so that a file named my  file.txt doesn't become my file.txt.
Preserving white space for the remainder of the line is surprisingly difficult using awk. The accepted awk-based answer does not, even with the suggested refinements.
sed or perl are better suited to this task.
sed
echo '1 2 3 4 5 6 7 8 9 10' | sed -E 's/^([^ \t]*[ \t]*){8}//'
Result:
9 10
The -E option enables modern ERE regex syntax. This saves me the trouble of backslash escaping the parentheses and braces.
The {8} is a quantifier indicating to match the previous item exactly 8 times.
The sed s command replaces 8 occurrences of white space delimited words by an empty string. The remainder of the line is left intact.
perl
Perl regex supports the \h escape for horizontal whitespace.
echo '1 2 3 4 5 6 7 8 9 10' | perl -pe 's/^(\H*\h*){8}//'
Result:
9 10
ruby -lane 'print $F[3..-1].join(" ")' file

awk line break with printf

I have a simple shell script, shown below, and I want to put a line break after each line returned by it.
#!/bin/bash
vcount=`db2 connect to db_lexus > /dev/null; db2 list tablespaces | grep -i "Tablespace ID" | wc -l`
db2pd -d db_lexus -tablespaces | grep -i "Tablespace Statistics" -A $vcount | awk '{printf ($2 $7)}'
The output is:
Statistics:IdFreePgs0537610230083224460850d
and I want the output to be something like that:
Statistics:
Id FreePgs
0 5376
1 0
2 3008
3 224
4 608
5 0
Is that possible to do with shell scripting?
Your problem can be reduced to the following:
$ cat infile
11 12
21 22
$ awk '{ printf ($1 $2) }' infile
11122122
printf is for formatted printing. I'm not even sure the behaviour of the above usage is defined, but it's not how printf is meant to be used. Consider:
$ awk '{ printf ("%d %d\n", $1, $2) }' infile
11 12
21 22
"%d %d\n" is an expression that describes how to format the output: "a decimal integer, a space, a decimal integer and a newline", followed by the numbers that go where the %d are. printf is very flexible, see the manual for what it can do.
In this case, we don't really need the power of printf, we can just use print:
$ awk '{ print $1, $2 }' infile
11 12
21 22
This prints the first and second field, separated by a space[1] – and print adds a newline without us telling it to.
[1] More precisely, "separated by the value of the output field separator OFS", which defaults to a space and is printed wherever we use , between two arguments. Forgetting the comma is a popular mistake that leads to no space between the record fields.
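For example, with the same infile as above, omitting the comma concatenates the fields with no space in between:
$ awk '{ print $1 $2 }' infile
1112
2122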
It looks like you just want to print columns 2 and 7 of whatever is passed to AWK. Try changing your AWK command to
awk '{print $2, $7}'
This will also add a line break at the end.
I realize you are asking about how to do something in a shell script, but it would certainly be a LOT easier to get this from the database using SQL:
#!/bin/bash
export DB2DBDFT=db_lexus
db2 "select tbsp_id, tbsp_free_pages \
from table(mon_get_tablespace('',-2)) as T \
order by tbsp_id"
