how to print every fifth row in a file - bash

I have a file with numbers
20
18
21
16
14
30
40
24
and I need to output four files, each containing every fourth row.
So we have rows 1,5,9...
20
14
Then rows 2,6,10...
18
30
Then 3,7,11...
21
40
and then 4,8,12...
16
24
I did try the code below, but it does not give me control over the starting row:
awk 'NR % 4 == 0'

In a single awk you can do:
awk '{print > ("file" (NR%4))}' inputfile
This will send the output to four files: file0, file1, file2 and file3 (rows 4, 8, ... go to file0).
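For example, with the sample input above, two of the resulting files would contain:
$ cat file1
20
14
$ cat file0
16
24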

You may use these awk commands:
awk -v n=1 'NR%4 == n%4' file
20
14
awk -v n=2 'NR%4 == n%4' file
18
30
awk -v n=3 'NR%4 == n%4' file
21
40
awk -v n=4 'NR%4 == n%4' file
16
24
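If you want all four output files in one go, the same idea fits in a small loop (part1.txt..part4.txt are made-up output names):
for n in 1 2 3 4; do
    awk -v n="$n" 'NR%4 == n%4' file > "part$n.txt"
done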

IMHO awk is the best solution, but you can also use sed:
Input file generated with seq 12:
for ((i=1; i<5; i++)); do
    sed -n "$i~4w$i.out" <(seq 12)
done
Here w$i.out writes to file $i.out.
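With seq 12 as input, each file then contains every fourth number starting at its offset, e.g.:
$ cat 1.out
1
5
9
$ cat 4.out
4
8
12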

This might work for you (GNU sed):
sed -ne '1~4w file1' -e '2~4w file2' -e '3~4w file3' -e '4~4w file4' file

Related

How do I use loops to read each line at a time in bash?

I have a text file like this:
01_AA_00 11
02_BB_00 11
03_CC_01 22
04_BB_01 22
05_CC_02 33
06_CC_02 33
Expected output in a new file is:
01_AA_00 11 AABBCCDD
02_BB_00 11 AABBCCDD
03_CC_01 22 AABBCCDD
04_BB_01 22 AABBCCDD
05_CC_02 33 AABBCCDD
06_CC_02 33 AABBCCDD
What I have been trying to do:
while IFS= read -r line; do
    fName=$(awk '{print $1}' $1)
    printf "$fName AABBCCDD\n" > nFile.txt
done < $1
The output I am getting is like this:
01_AA_00 11
02_BB_00 11
03_CC_01 22
04_BB_01 22
05_CC_02 33
06_CC_02 33 AABBCCDD
I am not looking to just add text after each line; I know that could be done with awk '{print $0, "AABBCCDD"}' file.txt > nFile.txt, since I have to use other information stored in variables.
You can just:
while IFS= read -r line; do
    echo "$line AABBCCDD" >> nFile.txt
done < "$1"
You can do it with awk only
awk -v v="AABBCCDD" '{print $0 " " v}' file.txt > nFile.txt
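Since in your real case the text to append lives in a shell variable, you can pass it in the same way; suffix is just an illustrative name here:
suffix="AABBCCDD"
awk -v v="$suffix" '{print $0 " " v}' file.txt > nFile.txt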
Just leverage ORS (the output record separator, which awk appends after every record):
{m,g}awk 1 ORS=' AABBCCDD\n'
01_AA_00 11 AABBCCDD
02_BB_00 11 AABBCCDD
03_CC_01 22 AABBCCDD
04_BB_01 22 AABBCCDD
05_CC_02 33 AABBCCDD
06_CC_02 33 AABBCCDD
Another way is to use OFS instead (FS='^$' makes the whole line one field, and ++NF forces awk to rebuild the record, appending OFS):
{m,g}awk ++NF FS='^$' OFS=' AABBCCDD'
A sed solution:
sed 's/$/ AABBCCDD/' file.txt > nFile.txt
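If the appended text is in a shell variable, the same sed works with double quotes, assuming the value contains no characters special to sed (/, &, \); suffix is again an illustrative name:
suffix="AABBCCDD"
sed "s/$/ $suffix/" file.txt > nFile.txt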

How to add a number to the beginning of each line?

This is what I normally use to add numbers to the beginning of each line:
awk '{ print FNR " " $0 }' file
However, what I need to do is start the number at 1000001. Is there a way to start with a specific number like this instead of having to use line numbers?
There is a special command for this: nl
nl -v1000001 file
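For instance (exact padding and separator depend on nl's defaults):
$ seq 3 | nl -v1000001
1000001  1
1000002  2
1000003  3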
You can just add 1000001 to FNR (or NR):
awk '{ print (1000001 + FNR), $0 }' file
$ seq 5 | awk -v n=1000000 '{print ++n, $0}'
1000001 1
1000002 2
1000003 3
1000004 4
1000005 5
$ seq 5 | awk -v n=30 '{print ++n, $0}'
31 1
32 2
33 3
34 4
35 5

Get n lines from a file which are equally spaced

I have a big file with 1000 lines. I want to get 110 lines from it.
The lines should be evenly spread across the input file.
For example, here I read 4 lines from a file with 10 lines:
Input File
1
2
3
4
5
6
7
8
9
10
outFile:
1
4
7
10
Use:
sed -n '1~9p' < file
The -n option stops sed from printing anything by default; '1~9p' tells it to print line 1 and then every 9th line after that (the p at the end is the print command).
To get close to 110 lines you have to print every 9th line (1000/110 ≈ 9).
Update: this will print 112 lines; if you need exactly 110 lines, you can limit the output with head like this:
sed -n '1~9p' < file | head -n 110
$ cat tst.awk
NR==FNR { next }
FNR==1 { mod = int((NR-1)/tgt) }
!( (FNR-1)%mod ) { print; cnt++ }
cnt == tgt { exit }
$ wc -l file1
1000 file1
$ awk -v tgt=110 -f tst.awk file1 file1 > file2
$ wc -l file2
110 file2
$ head -5 file2
1
10
19
28
37
$ tail -5 file2
946
955
964
973
982
Note that this will not produce the exact output you posted in your question for your posted input file, because that would require an algorithm that doesn't always use the same interval between output lines. You could dynamically calculate mod and adjust it as you parse the input if you like, but the above may be good enough.
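If you do want the evenly-spread behaviour from the question (where the gap is allowed to vary by one line), here is a sketch of such a dynamic variant, again reading the file twice with tgt passed via -v; spread.awk is a made-up name and it assumes 1 < tgt <= the number of lines:
$ cat spread.awk
NR==FNR { total++; next }
FNR==1  { want = 1 }
FNR == int(want + 0.5) { print; cnt++; want += (total - 1) / (tgt - 1) }
cnt == tgt { exit }
$ awk -v tgt=4 -f spread.awk file file
1
4
7
10
Here want is the (possibly fractional) line number of the next line to print, so rounding it keeps the output evenly spread even when the spacing is not a whole number.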
With awk you can do:
awk -v interval=3 '(NR-1)%interval==0' file
where interval is the gap in line count between consecutive printed lines. Its value is essentially the total number of lines in the file divided by the number of lines you want printed.
I often like to use a combination of shell and awk for these sorts of things
#!/bin/bash
filename=$1
toprint=$2
awk -v tot="$(wc -l < "$filename")" -v toprint="$toprint" '
    BEGIN { interval = int((tot-1)/(toprint-1)) }
    (NR-1) % interval == 0 { print; nbr++ }
    nbr == toprint { exit }
' "$filename"
Some examples:
$./spread.sh 1001lines 5
1
251
501
751
1001
$ ./spread.sh 1000lines 110 |head -n 3
1
10
19
$ ./spread.sh 1000lines 110 |tail -n 3
964
973
982

Print every n lines from a file

I'm trying to print every nth line from file, but n is not a constant but a variable.
For instance, I want to replace sed -n '1~5p' with something like sed -n '1~${i}p'.
Is this possible?
awk can also do it in a more elegant way:
awk -v n=YOUR_NUM 'NR%n==1' file
With -v n=YOUR_NUM you pass the number in. Then NR%n==1 evaluates to true exactly when the line number is of the form n*k+1, so that line is printed.
Note how convenient awk is for this: if you want the lines of the form n*k+m, you just need awk -v n=7 'NR%n==m' file, replacing m with the offset you want (for example NR%n==3).
Example
Let's print every 7th line:
$ seq 50 | awk -v n=7 'NR%n==1'
1
8
15
22
29
36
43
50
Or in sed:
$ n=7
$ seq 50 | sed -n "1~$n p" # quote the expression, so that "$n" is expanded
1
8
15
22
29
36
43
50
The point is that you should use double quotes instead of single quotes to wrap your sed code; variables are not expanded within single quotes. So:
sed -n "1~${i} p"
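For example (GNU sed, output shown for seq 10):
$ i=3
$ seq 10 | sed -n "1~${i}p"
1
4
7
10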
You can do it like this for example:
i=3
sed "2,${i}s/.*/changed line/g" InputFile
Example:
AMD$ cat File
aaaaaaaaaaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbbbbbbbb
cccccccccccccccccccccc
ddddddddddddddddddddddd
eeeeeeeeeeeeeeeeeeeeeeee
fffffffffffffffffffff
ggggggggggggggggggggg
AMD$ i=4; sed "2,${i}s/.*/changed line/g" File
aaaaaaaaaaaaaaaaaaaaaaaa
changed line
changed line
changed line
eeeeeeeeeeeeeeeeeeeeeeee
fffffffffffffffffffff
ggggggggggggggggggggg
The key is to use double quotes so that the variable is substituted.
Why sed?
head -n "$i" YourFile

bash process data from two files

file1:
456
445
2323
file2:
433
456
323
I want to get the difference between the data in the two files and output it to output.txt, that is:
23
-11
2000
How do I do this? Thank you.
$ paste file1 file2 | awk '{ print $1 - $2 }'
23
-11
2000
Use paste to create the formulae, and use bc to perform the calculations:
paste -d - file1 file2 | bc
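To see what bc actually receives, you can run the paste step on its own with the sample files above:
$ paste -d - file1 file2
456-433
445-456
2323-323
$ paste -d - file1 file2 | bc
23
-11
2000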
In pure bash, with no external tools:
while read -u 4 line1 && read -u 5 line2; do
    printf '%s\n' "$(( line1 - line2 ))"
done 4<file1 5<file2
This works by opening both files (attaching them to file descriptors 4 and 5), then looping, reading one line from each descriptor per iteration (exiting the loop when either file runs out), and calculating and printing the result.
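To capture the result in output.txt as asked, redirect the loop as a whole rather than each printf, e.g.:
while read -u 4 line1 && read -u 5 line2; do
    printf '%s\n' "$(( line1 - line2 ))"
done 4<file1 5<file2 > output.txt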
You could use paste and awk to operate between columns:
paste -d" " file1 file2 | awk -F" " '{print ($1-$2)}'
Or even pipe to a file:
paste -d" " file1 file2 | awk -F" " '{print ($1-$2)}' > output.txt
Hope it helps!
