How can I add 6 more single-space-separated columns to a file?
The input file looks like this:
-11.160574
...
-11.549076
-12.020907
...
-12.126601
...
-11.93235
...
-8.297653
Where ... represents 50 more lines of numbers.
The output I want is this:
-11.160574 1 1 1 1 1 14
...
-11.549076 51 51 1 1 1 14
-12.020907 1 1 2 2 1 14
...
-12.126601 51 51 2 2 1 14
...
-11.93235 1 1 51 51 1 14
...
-8.297653 51 51 51 51 1 14
The 2nd and 3rd columns loop from 1 to 51.
The 4th and 5th columns also loop from 1 to 51, but one level up: they advance by one each time the inner loop completes.
The last two columns are the constants 1 and 14.
Use a loop to read the file line-by-line and maintain counters to keep track of the field numbers as shown below:
#!/bin/bash
field1=1
field2=1
while read -r line
do
echo "$line $field1 $field1 $field2 $field2 1 14"
(( field1++ ))
if (( $field1 == 52 )); then
field1=1
(( field2++ ))
fi
done < file
Here you go, an awk script:
{
mod = 51
a = (NR - 1) % mod + 1
b = int((NR - 1) / mod) + 1
c = 1
d = 14
print $0,a,a,b,b,c,d
}
Run it with something like awk -f the-script.awk in-file.txt. Or make it executable and add #!/usr/bin/awk -f at the top, and you can run it directly without typing awk -f.
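If the loop length might change, the same index arithmetic can be wrapped in a small shell function. This is only a sketch: the function name add_columns and its argument are my own, not part of the answer above, and the constants 1 and 14 are taken from the question.

```shell
# Hypothetical helper: append the loop-index columns to each line on stdin.
# $1 is the loop length (51 in the question); 1 and 14 are the constant columns.
add_columns() {
  awk -v mod="${1:-51}" '{
    inner = (NR - 1) % mod + 1        # cycles 1..mod
    outer = int((NR - 1) / mod) + 1   # advances after every mod lines
    print $0, inner, inner, outer, outer, 1, 14
  }'
}
```

For the file in the question you would run something like add_columns 51 < in-file.txt > out-file.txt.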
Here is what I have:
#!/bin/bash
#create a multiplication table 5 columns 10 rows
echo " Multiplication Table "
echo "-----+-------------------------"
for x in {0..5}
do
for y in {0..10}
do
echo -n "$(( $x * $y )) "
done
echo
echo "-----+--------------------------"
done
This is my Output:
Multiplication Table
-----+-------------------------
0 0 0 0 0 0 0 0 0 0 0
-----+--------------------------
0 1 2 3 4 5 6 7 8 9 10
-----+--------------------------
0 2 4 6 8 10 12 14 16 18 20
-----+--------------------------
0 3 6 9 12 15 18 21 24 27 30
-----+--------------------------
0 4 8 12 16 20 24 28 32 36 40
-----+--------------------------
0 5 10 15 20 25 30 35 40 45 50
-----+--------------------------
This is the Needed Output:
Multiplication Table
----+-------------------------------------
| 0 1 2 3 4
----+-------------------------------------
0 | 0 0 0 0 0
1 | 0 1 2 3 4
2 | 0 2 4 6 8
3 | 0 3 6 9 12
4 | 0 4 8 12 16
5 | 0 5 10 15 20
6 | 0 6 12 18 24
7 | 0 7 14 21 28
8 | 0 8 16 24 32
9 | 0 9 18 27 36
----+-------------------------------------
I've tried to write this many different ways, but I'm struggling with finding a way to format it correctly. The first is pretty close, but I need it to have the sequential numbers being multiplied on the top and left side. I'm not sure how to use, or if I can use, the seq command to achieve this or if there is a better way. I also need to have straight columns and rows with the defining lines setting the table layout, but my looking up the column command hasn't produced the right output.
Here was my final output and code:
#!/bin/bash
#create a multiplication table 5 columns 10 rows
#Create top of the table
echo " Multiplication Table"
echo "----+------------------------------"
#Print the nums at top of table and format dashes
echo -n " |"; printf '\t%d' {0..5}; echo
echo "----+------------------------------"
#for loops to create table nums
for y in {0..9}
do
#Print the side nums and |
echo -n "$y |"
#for loop to create x
for x in {0..5}
do
#Multiply vars, tab for spacing
echo -en "\t$((x*y))"
done
#Print
echo
done
#Print bottom dashes for format
echo "----+------------------------------"
I changed a bit of Armali's code just to make it more appealing to the eye, and the echo was moved to the bottom (out of the loop) so it didn't print as many lines. But again, thank you Armali, as I would've spent a lot more time figuring out exactly how to write that printf code to get the format correct.
I'm not sure how to use, or if I can use, the seq command to achieve this …
seq offers no advantage here over bash's sequence expression combined with printf.
This variant of your script produces (with the usual 8-column tabs) the needed output:
#!/bin/bash
#create a multiplication table 5 columns 10 rows
echo " Multiplication Table"
echo "----+-------------------------------------"
echo -n " |"; printf '\t%d' {0..4}; echo
echo "----+-------------------------------------"
for y in {0..9}
do echo -n "$y |"
for x in {0..4}
do echo -en "\t$((x*y))"
done
echo
echo "----+-------------------------------------"
done
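If you would rather not depend on tab stops, fixed-width printf fields give straight columns as well. The following is a minimal sketch, not the code from either post above; the function name mult_table and the field widths (3 for the row label, 5 per cell) are my own choices.

```shell
# Hypothetical helper: print a multiplication table with $1 columns and $2 rows.
mult_table() {
  local cols=$1 rows=$2 x y
  local rule='----+'
  # Extend the rule by one 5-character segment per column.
  for ((x = 0; x < cols; x++)); do rule+='-----'; done
  echo "$rule"
  # Header row: blank label cell, then the column multipliers.
  printf '%3s |' ''
  for ((x = 0; x < cols; x++)); do printf '%5d' "$x"; done
  echo
  echo "$rule"
  # One line per row multiplier.
  for ((y = 0; y < rows; y++)); do
    printf '%3d |' "$y"
    for ((x = 0; x < cols; x++)); do printf '%5d' "$((x * y))"; done
    echo
  done
  echo "$rule"
}
```

Calling mult_table 5 10 prints the multipliers 0 to 4 across the top and 0 to 9 down the side.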
I'm currently using:
printf "%14s %14s %14s %14s %14s %14s\n" $(cat NFE.txt)>prueba.txt
This reads a list in NFE.txt and generates 6 columns. I need to generate N columns where N is a variable.
Is there a simple way of saying something like:
printf "N*(%14s)\n" $(cat NFE.txt)>prueba.txt
Which generates the desired output?
# T1 is a string of N blanks
T1=$(printf "%${N}s")
# Replace every blank in T1 with the string "%14s " and assign the result to T2
T2="${T1// /%14s }"
# Note that T2 contains a trailing blank.
# ${T2% } stands for T2 without the trailing blank
printf "${T2% }\n" $(cat NFE.txt)>prueba.txt
You can do this, although I don't know how robust it will be:
$(printf 'printf '; printf '%%14s%0.s' {1..6}; printf '\\n') $(<file)
The 6 in {1..6} is your variable number of strings.
The command substitution prints out a printf command with the correct number of format strings, and the shell then executes the result.
Input
10 20 30 40 50 1 0
1 3 45 6 78 9 4 3
123 4
5 4 8 4 2 4
Output
10 20 30 40 50 1
0 1 3 45 6 78
9 4 3 123 4 5
4 8 4 2 4
You could write this in pure bash, but then you could just use an existing language. For example:
printf "$(python -c 'print("%14s "*6)')\n" $(<NFE.txt)
In pure bash, you could write, for example:
repeat() { (($1)) && printf "%s%s" "$2" "$(repeat $(($1-1)) "$2")"; }
and then use that in the printf:
printf "$(repeat 6 "%14s ")\n" $(<NFE.txt)
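For a count held in a variable, the blank-replacement idea from the first answer can also be packaged as a function. This is a sketch under my own naming (make_format is hypothetical); it relies on bash's printf -v and the ${var//pattern/replacement} expansion.

```shell
# Hypothetical helper: print a format string with $1 copies of %14s,
# separated by single blanks.
make_format() {
  local n=$1 blanks fmt
  printf -v blanks '%*s' "$n" ''   # a string of n blanks
  fmt=${blanks// /%14s }           # each blank becomes "%14s "
  printf '%s\n' "${fmt% }"         # drop the trailing blank
}
```

It would then be used as printf "$(make_format "$N")\n" $(<NFE.txt) > prueba.txt.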
I have a large file that needs to be split based on line numbers.
For instance, my file looks like this:
aaaaaa
bbbbbb
cccccc
dddddd
****** //here blank line//
eeeeee
ffffff
gggggg
hhhhhh
*******//here blank line//
ıııııı
jjjjjj
kkkkkk
llllll
******
//And so on...
I need two separate files: one should contain the first 4 lines, the third 4 lines, the fifth 4 lines, and so on; the other should contain the second 4 lines, the fourth 4 lines, the sixth 4 lines, and so on. How can I do that in a bash script?
You can play with the number of the line, NR:
$ awk 'NR%10>0 && NR%10<5' your_file > file1
$ awk 'NR%10>5' your_file > file2
If the line number is 10K + n with 0 < n < 5, the line goes to the first file.
If the line number is 10K + n with n > 5, the line goes to the second file.
In one line:
$ awk 'NR%10>0 && NR%10<5 {print > "file1"} NR%10>5 {print > "file2"}' file
Test
$ cat a
1
2
3
4
6
7
8
9
11
12
13
14
16
17
18
19
21
22
23
24
26
27
28
29
31
32
33
34
36
37
38
39
41
42
43
44
46
47
48
49
51
$ awk 'NR%10>0 && NR%10<5 {print > "file1"} NR%10>5 {print > "file2"}' a
$ cat file1
1
2
3
4
11
12
13
14
21
22
23
24
31
32
33
34
41
42
43
44
51
$ cat file2
6
7
8
9
16
17
18
19
26
27
28
29
36
37
38
39
46
47
48
49
You can do this with head and tail (which are not part of bash itself):
head -n 20 <file> | tail -n 5
gives you lines 16 to 20.
This is however inefficient, if you want to get multiple sections of your file, since it has to be parsed again and again. In this case I'd prefer some real scripting.
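For a single range, sed can do the same job in one process. A minimal sketch; the function name print_range is my own, and the q command is only there to stop reading once the last wanted line has been printed.

```shell
# Hypothetical helper: print lines $1 through $2 of file $3.
print_range() {
  local first=$1 last=$2 file=$3
  # p prints lines in the range; q quits after the last line of the range.
  sed -n "${first},${last}p; ${last}q" "$file"
}
```

For example, print_range 16 20 file prints the same five lines as the head and tail pipeline above.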
Another approach is to treat blank-line-separated paragraphs as the records, and print odd-numbered and even-numbered records to different files:
awk -v RS= -v ORS='\n\n' '{
outfile = (NR % 2 == 1) ? "file1" : "file2"
print > outfile
}' file
Maybe something like that:
#!/bin/bash
EVEN="even.log"
ODD="odd.log"
line_count=0
block_count=0
while read line
do
# ignore blank lines
if [ ! -z "$line" ]; then
if [ $(( $block_count % 2 )) -eq 0 ]; then
# even
echo "$line" >> "$EVEN"
else
# odd
echo "$line" >> "$ODD"
fi
line_count=$(( line_count + 1 ))
if [ "$line_count" -eq "4" ]; then
block_count=$(( block_count + 1 ))
line_count=0
fi
fi
done < "$1"
The first argument is the source file: ./split.sh split_input
This script prints lines from file 1.txt with indexes 0, 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19, ...
i=0
while read -r p; do
# [ ] does no arithmetic itself, so compute the remainder with $(( ))
if [ $(( i % 8 )) -lt 4 ]
then
echo "$p"
fi
i=$(( i + 1 ))
done < 1.txt
This script prints lines with indexes 4, 5, 6, 7, 12, 13, 14, 15, ...
i=0
while read -r p; do
# [ ] does no arithmetic itself, so compute the remainder with $(( ))
if [ $(( i % 8 )) -gt 3 ]
then
echo "$p"
fi
i=$(( i + 1 ))
done < 1.txt
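Both halves can also be written in a single pass. The sketch below assumes, like the two loops above, that the input has no blank separator lines; the function name split_blocks and the output names file1 and file2 are my own choices.

```shell
# Hypothetical helper: send alternating blocks of $1 lines from file $2
# to file1 and file2 in the current directory.
split_blocks() {
  local n=$1 file=$2
  awk -v n="$n" '{
    # Block index is int((NR-1)/n); even blocks go to file1, odd to file2.
    out = (int((NR - 1) / n) % 2 == 0) ? "file1" : "file2"
    print > out
  }' "$file"
}
```

With split_blocks 4 1.txt, lines 1-4, 9-12, ... land in file1 and lines 5-8, 13-16, ... in file2.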
I have a time series of files 0000.vx.dat, 0000.vy.dat, 0000.vz.dat; ...; 0077.vx.dat, 0077.vy.dat, 0077.vz.dat... Each file is a space-separated 2D matrix. I would like to take each triplet of files and combine them all into a coordinate-based data format, i.e.:
[timestep + 1] [i] [j] [vx(i,j)] [vy(i,j)] [vz(i,j)]
Each file number corresponds to a particular time step. Given the amount of data I have in this time series (~ 4 GB), bash wasn't cutting it so it seemed to be time to head over to awk... specifically mawk. It was pretty stupid to try this in bash but here is my ill-fated attempt:
for x in $(seq 1 78)
do
tfx=${tf[$x]} # an array of padded zeros
for y in $(seq 1 1568)
do
for z in $(seq 1 1344)
do
echo $x $y $z $(awk -v i=$z -v j=$y "FNR == i {print j}" $tfx.vx.dat) $(awk -v i=$z -v j=$y "FNR == i {print j}" $tfx.vy.dat) $(awk -v i=$z -v j=$y "FNR == i {print j}" $tfx.vz.dat) >> $file
done
done
done
edit: Thank you, ruakh, for pointing out that I had kept j in shell variable format with a $ in front! This is just a snippet of the original script, but I guess would be considered the guts of it!
Suffice it to say this would have taken about six months because of all the overhead in bash associated with O(MxN) algorithms, subshells and pipes and whatnot. I was looking for more along the lines of a day at most. Each file is around 18 MB, so it should not be that much of a problem. I would be happy with doing this one timestep at a time in awk provided that I get one output file per timestep. I could just cat them all together without much issue afterwards, I think. It is important, though, that the time step number be the first item on the coordinate list. I could achieve this with an awk -v argument (see above) within a bash routine. I do not know how to look up specific elements of matrices in three separate files and put them all together into one output. This is the main hurdle I would like to overcome. I was hoping mawk could provide a nice balance between effort and computational speed. If this seems to be too much for an awk script, I could go to something lower level, and would appreciate any of those answering letting me know I should just go to C instead.
Thank you in advance! I really like awk, but am afraid I am a novice.
The three files, 0000.vx.dat, 0000.vy.dat, and 0000.vz.dat would read as follows (except huge and of the correct dimensions):
0000.vx.dat:
1 2 3
4 5 6
7 8 9
0000.vy.dat:
10 11 12
13 14 15
16 17 18
0000.vz.dat:
19 20 21
22 23 24
25 26 27
I would like to be able to input:
awk -v t=1 -f stackoverflow.awk 0000.vx.dat 0000.vy.dat 0000.vz.dat
and get the following output:
1 1 1 1 10 19
1 1 2 2 11 20
1 1 3 3 12 21
1 2 1 4 13 22
1 2 2 5 14 23
1 2 3 6 15 24
1 3 1 7 16 25
1 3 2 8 17 26
1 3 3 9 18 27
edit: Thank you, shellter, for suggesting I put the desired input and output more clearly!
Personally, I use gawk to process most of my text files. However, since you have requested a mawk compatible solution, here's one way to solve your problem. Run, in your present working directory:
for i in *.vx.dat; do nawk -f script.awk "$i" "${i%%.*}.vy.dat" "${i%%.*}.vz.dat"; done
Contents of script.awk:
FNR==1 {
FILENAME++
c=0
}
{
for (i=1;i<=NF;i++) {
c++
a[c] = (a[c] ? a[c] : FILENAME FS NR FS i) FS $i
}
}
END {
for (j=1;j<=c;j++) {
print a[j] > sprintf("%04d.dat", FILENAME)
}
}
When you run the above, the results should be a single file for each set of three files containing your coordinates. These output files will have filenames of the form: timestep + 1 followed by ".dat". I decided to zero-pad these filenames to four digits for your convenience, but you can change this to whatever format you like. Here are the results I get from the sample data you've posted. Contents of 0001.dat:
1 1 1 1 10 19
1 1 2 2 11 20
1 1 3 3 12 21
1 2 1 4 13 22
1 2 2 5 14 23
1 2 3 6 15 24
1 3 1 7 16 25
1 3 2 8 17 26
1 3 3 9 18 27
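If holding a whole timestep in an array is a concern, the three files can instead be read in lockstep with getline, so only one row per file is in memory at a time. This is a sketch rather than a tested replacement for the script above: the function name combine_triplet and its argument order are my own, and it assumes all three files have the same dimensions.

```shell
# Hypothetical helper: combine_triplet T VX VY VZ
# Prints "T row col vx vy vz" for every cell of the three matrices.
combine_triplet() {
  local t=$1 vx=$2 vy=$3 vz=$4
  awk -v t="$t" -v vy="$vy" -v vz="$vz" '{
    # Read the matching row from the other two files; stop on a short file.
    if ((getline ly < vy) <= 0 || (getline lz < vz) <= 0) exit 1
    split(ly, y, " ")
    split(lz, z, " ")
    for (j = 1; j <= NF; j++)
      print t, FNR, j, $j, y[j], z[j]
  }' "$vx"
}
```

For the first timestep this would be called as combine_triplet 1 0000.vx.dat 0000.vy.dat 0000.vz.dat > 0001.dat.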
I have a set of tables in the following format:
1000 3 0 15 14
2000 3 0 7 13
3000 2 3 14 12
4000 3 1 11 14
5000 1 1 9 14
6000 3 1 13 11
7000 3 0 10 15
They are in simple text files.
I want to merge these files into a new table in the same format, where each cell (X,Y) is the sum of all cells (X,Y) from the original set of tables. One slightly complicating factor is that the numbers from the first column should not be summed, since these are labels.
I suspect this can be done with AWK, but I'm not particularly versed in this language and can't find a solution on the web. If someone suggests another tool, that's also fine.
I want to do this from a bash shell script.
Give this a try (it uses asort(), which is a gawk extension, so awk must be gawk):
#!/usr/bin/awk -f
{
for (i=2;i<=NF; i++)
a[$1,i]+=$i
b[$1]=$1
if (NF>maxNF) maxNF=NF
}
END {
n=asort(b,c)
for (i=1; i<=n; i++) {
printf "%s ", b[c[i]]
for (j=2;j<=maxNF;j++) {
printf "%d ", a[c[i],j]
}
print ""
}
}
Run it like this:
./sumcell.awk table1 table2 table3
or
./sumcell.awk table*
The output using your example input twice would look like this:
$ ./sumcell.awk table1 table1
1000 6 0 30 28
2000 6 0 14 26
3000 4 6 28 24
4000 6 2 22 28
5000 2 2 18 28
6000 6 2 26 22
7000 6 0 20 30
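Where gawk is not available, a variant without asort() can print the labels in the order they first appear instead of sorting them. This is a sketch under that assumption; the function name sum_tables is my own.

```shell
# Hypothetical helper: sum cell-by-cell over all table files given as arguments.
# Labels (column 1) are printed in first-seen order, not sorted.
sum_tables() {
  awk '{
    # Remember each label the first time it appears.
    if (!($1 in seen)) { seen[$1] = 1; order[++n] = $1 }
    for (i = 2; i <= NF; i++) sum[$1, i] += $i
    if (NF > maxNF) maxNF = NF
  }
  END {
    for (k = 1; k <= n; k++) {
      line = order[k]
      # Adding 0 turns a missing cell into 0 instead of an empty string.
      for (i = 2; i <= maxNF; i++) line = line " " (sum[order[k], i] + 0)
      print line
    }
  }' "$@"
}
```

It is called the same way as the script above, for example sum_tables table1 table2 table3.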
Sum each line, presuming at least one numeric column on each line.
while read line ; do
label=($line)
printf ${label[0]}' ' ;
expr $(
printf "${label[1]}"
for c in "${label[@]:2}" ; do
printf ' + '$c
done
)
done < table
EDIT: Of course I didn't see the comment about combining based on the label, so this is incomplete.
perl -anE'$h{$F[0]}[$_]+=$F[$_]for 1..4}{say$_,"@{$h{$_}}"for sort{$a<=>$b}keys%h' file_1 file_2