Replace tab with variable amount of spaces, maintaining the alignment - bash

I have a tab separated file, consisting of 7 columns.
ABC 1437 1 0 71 15.7 174.4
DEF 0 0 0 1 45.9 45.9
GHIJ 2 3 0 9 1.1 1.6
What I need is to replace the tab character with variable amount of space characters in order ot maintain the column alignment. Note that, I do not want every tab to be replaced by 8 spaces. Instead, I want 5 spaces after row #1 column #1 (8 - length(ABC) = 5), 4 spaces after row #1 column #2 (8 - length(1437) = 4), etc.
Is there a linux tool to do it for me, or I should write it myself?

The POSIX utility pr called as pr -e -t does exactly what you want and AFAIK is present in every Unix installation.
$ cat file
ABC 1437 1 0 71 15.7 174.4
DEF 0 0 0 1 45.9 45.9
GHIJ 2 3 0 9 1.1 1.6
$ pr -e -t file
ABC 1437 1 0 71 15.7 174.4
DEF 0 0 0 1 45.9 45.9
GHIJ 2 3 0 9 1.1 1.6
and with the tabs visible as ^Is:
$ cat -ET file
ABC^I1437^I1^I0^I71^I15.7^I174.4$
DEF^I0^I0^I0^I1^I45.9^I45.9$
GHIJ^I2^I3^I0^I9^I1.1^I1.6$
$ pr -e -t file | cat -ET
ABC 1437 1 0 71 15.7 174.4$
DEF 0 0 0 1 45.9 45.9$
GHIJ 2 3 0 9 1.1 1.6$

There is command pair dedicated for this task.
$ expand file
will do exactly what you want. The counterpart unexpand -a to do the reverse. There are few other useful options in both.

Use column, as suggested in the comment by anubhava, specifically using -t and -s options:
column -t -s $'\t' in_file
From the column manual:
-s, --separator separators
Specify the possible input item delimiters (default is
whitespace).
-t, --table
Determine the number of columns the input contains and
create a table. Columns are delimited with whitespace, by
default, or with the characters supplied using the
--output-separator option. Table output is useful for
pretty-printing.

Related

select multiple patterns with grep

I have file that looks like that:
t # 3-7, 1
v 0 104
v 1 92
v 2 95
u 0 1 2
u 0 2 2
u 1 2 2
t # 3-8, 1
v 0 94
v 1 13
v 2 19
v 3 5
u 0 1 2
u 0 2 2
u 0 3 2
t # 3-9, 1
v 0 94
v 1 13
v 2 19
v 3 7
u 0 1 2
u 0 2 2
u 0 3 2
t corresponds to header of each block.
I would like to extract multiple patterns from the file and output transactions that contain required patterns altogether.
I tried the following code:
ps | grep -e 't\|u 0 1 2' file.txt
and it works well to extract header and pattern 'u 0 1 2'. However, when I add one more pattern, the output list only headers start with t #. My modified code looks like that:
ps | grep -e 't\|u 0 1 2 && u 0 2 2' file.txt
I tried sed and awk solutions, but they do not work for me as well.
Thank you for your help!
Olha
Use | as the separator before the third alternative, just like the second alternative.
grep -E 't|u 0 1 2|u 0 2 2' file.txt
Also, it doesn't make sense to specify a filename and also pipe ps to grep. If you provide filename arguments, it doesn't read from the pipe (unless you use - as a filename).
You can use grep with multiple -e expressions to grep for more than one thing at a time:
$ printf '%d\n' {0..10} | grep -e '0' -e '5'
0
5
10
Expanding on #kojiro's answer, you'll want to use an array to collect arguments:
mapfile -t lines < file.txt
for line in "${lines[#]}"
do
arguments+=(-e "$line")
done
grep "${arguments[#]}"
You'll probably need a condition within the loop to check whether the line is one you want to search for, but that's it.

Combining multiple awk output statements into one line

I have some ascii files I’m processing, with 35 columns each, and variable number of rows. I need to take the difference between two columns (N+1), and place the results into a duplicate ascii file on column number 36. Then, I need to take another column, and divide it (row by row) by column 36, and place that result into the same duplicate ascii file in column 37.
I’ve done similar processing in the past, but by outputting temp files for each awk command, reading each successive temp file in to eventually create a final ascii file. Then, I would delete the temp files after. I’m hoping there is an easier/faster method than having to create a bunch of temp files.
Below is an initial working processing step, that the above awk commands would need to follow and fit into. This step gets the data from foo.txt, removes the header, and processes only the rows containing a particular, but varying, string.
cat foo.txt | tail -n +2 | awk '$17 ~ /^[F][0-9][0-9][0-9]$/' >> foo_new.txt
There’s another processing step for different data files, that I would also need the 2 new columns discussed earlier. This is simply appending a unique file name from what’s being catted to the last column of every row in a new ascii file. This command is actually in a loop with varying input files, but I’ve simplified it here.
cat foo.txt | tail -n +2 | awk -v fname="$fname" '{print $0 OFS fname;}' >> foo_new.txt
An example of one of the foo.txt files.
20 0 5 F001
4 2 3 F002
12 4 8 F003
100 10 29 O001
Below would be the example foo_new.txt desired. The requested 2 columns of output from awk (last 2 columns). In this example, column 5 is the difference between column 3 and 2 plus 1. Column 6 is the result of column 1 divided by column 5.
20 0 5 F001 6 3.3
4 2 3 F002 2 2.0
12 4 8 F003 5 2.4
For the second example foo_new.txt. The last column is an example of fname. These are computed in the shell script, and passed to awk. I don't care if the results in column 7 (fname) are at the end or placed between columns 4 and 5, so long as it gets along with the other awk statements.
20 0 5 F001 6 3.3 C1
4 2 3 F002 2 2.0 C2
12 4 8 F003 5 2.4 C3
The best luck so far, but unfortunately this is producing a file with the original output first, and the added output below it. I'd like to have the added output appended on as columns (#5 and #6).
cat foo.txt | tail -n +2 | awk '$17 ~ /^[F][0-9][0-9][0-9]$/' >> foo_new.txt
cat foo_new.txt | awk '{print $4=$3-$2+1, $5=$1/($3-$2+1)}' >> foo_new.txt
Consider an input file data with header line like this (based closely on your minimal example):
Col1 Col2 Col3 Col4
20 0 5 F001
4 2 3 F002
12 4 8 F003
100 10 29 O001
You want the output to contain a column 5 that is the value of $3 - $2 + 1 (column 3 minus column 2 plus 1), and a column 6 that is the value of column 1 divided by column 5 (with 1 decimal place in the output), and a file name that is based on a variable fname passed to the script but that has a unique value for each line. And you only want lines where column 4 matches F and 3 digits, and you want to skip the first line. That can all be written directly in awk:
awk -v fname=C '
NR == 1 { next }
$4 ~ /^F[0-9][0-9][0-9]$/ { c5 = $3 - $2 + 1
c6 = sprintf("%.1f", $1 / c5)
print $0, c5, c6, fname NR
}' data
You could write that on one line too:
awk -v fname=C 'NR==1{next} $4~/^F[0-9][0-9][0-9]$/ { c5=$3-$2+1; print $0,c5,sprintf("%.1f",$1/c5), fname NR }' data
The output is:
20 0 5 F001 6 3.3 C2
4 2 3 F002 2 2.0 C3
12 4 8 F003 5 2.4 C4
Clearly, you could change the file name so that the counter starts from 0 or 1 by using counter++ or ++counter respectively in place of the NR in the print statement, and you could format it with leading zeros or whatever else you want with sprintf() again. If you want to drop the first line of each file, rather than just the first file, change the NR == 1 condition to FNR == 1 instead.
Note that this does not need the preprocessing provided by cat foo.txt | tail -n +2.
I need to take the difference between two columns (N+1), and place the results into a duplicate ascii file on column number 36. Then, I need to take another column, and divide it (row by row) by column 36, and place that result into the same duplicate ascii file in column 37.
That's just:
awk -vN=9 -vanother_column=10 '{ v36 = $N - $(N+1); print $0, v36, $another_column / v36 }' input_file.tsv
I guess your file has some "header"/special "first line", so if it's the first line, then preserve it:
awk ... 'NR==1{print $0, "36_header", "37_header"} NR>1{ ... the script above ... }`
Taking first 3 columns from the example script you presented, and substituting N for 2 and another_column for 1, we get the following script:
# recreate input file
cat <<EOF |
20 0 5
4 2 3
12 4 8
100 10 29
EOF
tr -s ' ' |
tr ' ' '\t' > input_file.tsv
awk -vOFS=$'\t' -vIFS=$'\t' -vN=2 -vanother_column=1 '{ tmp = $(N + 1) - $N; print $0, tmp, $another_column / tmp }' input_file.tsv
and it will output:
20 0 5 5 4
4 2 3 1 4
12 4 8 4 3
100 10 29 19 5.26316
Such script:
awk -vOFS=$'\t' -vIFS=$'\t' -vN=2 -vanother_column=1 '{ tmp = $(N + 1) - $N + 1; print $0, tmp, sprintf("%.1f", $another_column / tmp) }' input_file.tsv
I think get's closer output to what you want:
20 0 5 6 3.3
4 2 3 2 2.0
12 4 8 5 2.4
100 10 29 20 5.0
And I guess that by that (N+1) you meant "the difference between two columns with 1 added".

Divide column values of different files by a constant then output one minus the other

I have two files of the form
file1:
#fileheader1
0 123
1 456
2 789
3 999
4 112
5 131
6 415
etc.
file2:
#fileheader2
0 442
1 232
2 542
3 559
4 888
5 231
6 322
etc.
How can I take the second column of each, divide it by a value then minus one from the other and then output a new third file with the new values?
I want the output file to have the form
#outputheader
0 123/c-422/k
1 456/c-232/k
2 789/c-542/k
etc.
where c and k are numbers I can plug into the script
I have seen this question: subtract columns from different files with awk
But I don't know how to use awk to do this by myself, does anyone know how to do this or could explain what is going on in the linked question so I can try to modify it?
I'd write:
awk -v c=10 -v k=20 ' ;# pass values to awk variables
/^#/ {next} ;# skip headers
FNR==NR {val[$1]=$2; next} ;# store values from file1
$1 in val {print $1, (val[$1]/c - $2/k)} ;# perform the calc and print
' file1 file2
output
0 -9.8
1 34
2 51.8
3 71.95
4 -33.2
5 1.55
6 25.4
etc. 0

strange behaviour in awk sub replacement

I've got a string a
[root#rh-1 ~]# echo $a
4 11 10 7 11
This set of values need to be checked against the id field in the file table
[root#rh-1 ~]# cat table
id name inuse rules
1 Critical data 0 ...
2 Important data 0 ...
3 Normal data 0 ...
4 nc-test 0 ...
5 schedule one 0 ...
7 foo sc2 0 ...
8 foo sc3 0 ...
9 foo-sc4 0 ...
10 foo_sc5 0 ...
11 foosc6 0 ...
Wherever the id is one of the numbers in $a, I need to replace the value of the inuse column with 1.The rules column can be discarded
[root#rh-1 ~]# cat table | awk -v a="$a" ' {
split(a,sid," "); NF=NF-1;}
{ $NF=0;n=length(sid);
for (i=1;i<n;i++)
{ if($1 == sid[i]){
sub($NF,1);
break;
}
}
if($1=="id")
print("mid name InUse" );
else
printf("%-25s\n",$0);
} '
The approach above works, but for some reason here, in the line
10 foo_sc5
instead of replacing the 0 at the end,
to
10 foo_sc5 1
it replaces as
11 foo_sc5 0
while all others are replaced correctly.
mid name InUse
1 Critical data 0
2 Important data 0
3 Normal data 0
4 nc-test 1
5 schedule one 0
7 foo sc2 1
8 foo sc3 0
9 foo-sc4 0
11 foo_sc5 0
11 foosc6 1
[root#rh-1 ~]#
Somehow the replacement only for that particular line is faulty. Can someone help ?
Replace sub($NF,1) with $NF=1.
sub($NF,1) uses $NF as a regular expression and, For the first occurrence in the line, it replaces with 1. Since $NF starts out as 0, this means that the first zero in the line is replaced with a one.
By contrast, $NF=1 merely sets the last column to one.

Paste side by side multiple files by numerical order

I have many files in a directory with similar file names like file1, file2, file3, file4, file5, ..... , file1000. They are of the same dimension, and each one of them has 5 columns and 2000 lines. I want to paste them all together side by side in a numerical order into one large file, so the final large file should have 5000 columns and 2000 lines.
I tried
for x in $(seq 1 1000); do
paste `echo -n "file$x "` > largefile
done
Instead of writing all file names in the command line, is there a way I can paste those files in a numerical order (file1, file2, file3, file4, file5, ..., file10, file11, ..., file1000)?
for example:
file1
1 1 1 1 1
1 1 1 1 1
1 1 1 1 1
...
file2
2 2 2 2 2
2 2 2 2 2
2 2 2 2 2
....
file 3
3 3 3 3 3
3 3 3 3 3
3 3 3 3 3
....
paste file1 file2 file3 .... file 1000 > largefile
largefile
1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
....
Thanks.
If your current shell is bash: paste -d " " file{1..1000}
you need rename the files with leading zeroes, like
paste <(ls -1 file* | sort -te -k2.1n) <(seq -f "file%04g" 1000) | xargs -n2 echo mv
The above is for "dry run" - Remove the echo if you satisfied...
or you can use e.g. perl
ls file* | perl -nlE 'm/file(\d+)/; rename $_, sprintf("file%04d", $1);'
and after you can
paste file*
With zsh:
setopt extendedglob
paste -d ' ' file<->(n)
<x-y> is to match positive decimal integer numbers from x to y. x and/or y can be omitted so <-> is any positive decimal integer number. It could also be written [0-9]## (## being the zsh equivalent of regex +).
The (n) is the globbing qualifiers. The n globbing qualifier turns on numeric sorting which sorts on all sequences of decimal digits appearing in the file names.

Resources