Extracting lines 1, 11, 21, etc. from a text file - shell

Is it possible to retrieve data from lines number 1, 11, 21 ,31 from a text file using Linux commands?
I need to do the same for 2, 12, 22, 32 and so on.

You can use awk for this:
awk '(NR % 10 == 1){ print }' your_input_file
For example:
$ seq 1 100|awk '(NR%10 == 2){print}'
2
12
22
32
42
52
62
72
82
92
As glenn jackman points out, you can parametrize the awk script to make it more easy to use. And given that print is the default action, you can simply write:
$ seq 1 20|awk -v step=10 -v idx=3 'NR%step==idx'
3
13

Related

Insert rows using awk

How can I insert a row using awk?
My file looks as:
1 43
2 34
3 65
4 75
I would like to insert three rows with "?" So my desire file looks as:
1 ?
2 ?
3 ?
4 43
5 34
6 65
7 75
I am trying with the below script.
awk '{if(NR<=3){print "NR ?"}} {printf" " NR $2}' file.txt
Here's one way to do it:
$ awk 'BEGIN{s=" "; for(c=1; c<4; c++) print c s "?"}
{print c s $2; c++}' ip.txt
1 ?
2 ?
3 ?
4 43
5 34
6 65
7 75
$ awk 'BEGIN {printf "1 ?\n2 ?\n3 ?\n"} {printf "%d", $1 + 3; printf " %s\n", $2}' file.txt
1 ?
2 ?
3 ?
4 43
5 34
6 65
7 75
You could also add the 3 lines before awk, e.g.:
{ seq 3; cat file.txt; } | awk 'NR <= 3 { $2 = "?" } $1 = NR' OFS='\t'
Output:
1 ?
2 ?
3 ?
4 43
5 34
6 65
7 75
I would do it following way using GNU AWK, let file.txt content be
1 43
2 34
3 65
4 75
then
awk 'BEGIN{OFS=" "}NR==1{print 1,"?";print 2,"?";print 3,"?"}{print NR+3,$2}' file.txt
output
1 ?
2 ?
3 ?
4 43
5 34
6 65
7 75
Explanation: I set output field separator (OFS) to 7 spaces. For 1st row I do print three lines which consisting of subsequent number and ? sheared by output field separator. You might elect to do this using for loop, especially if you expect that requirement might change here. For every line I print number of row plus 4 (to keep order) and 2nd column ($2). Thanks to use of OFS, you would need to make only one change if requirement regarding number of spaces will be altered. Note that construct like
{if(condition){dosomething}}
might be written in GNU AWK in more concise manner as
(condition){dosomething}
(tested in gawk 4.2.1)

How to add the elements in a for loop [duplicate]

This question already has answers here:
Summing values of a column using awk command
(2 answers)
Closed 1 year ago.
so basically my code looks through data and greps whatever it begins with, and so I've been trying to figure out a way where I'm able to add the those values.
the sample input is
35 45 75 76
34 45 53 55
33 34 32 21
my code:
for id in $(awk '{ print $1 }' < $3); do echo $id; done
I'm printing it right now to see the values but basically whats outputted is
35
34
33
I'm trying to add them all together but I cant figure out how, some help would be appreciated.
my desired output would be
103
Lots of ways to do this, a few ideas ...
$ cat numbers.dat
35 45 75 76
34 45 53 55
33 34 32 21
Tweaking OP's current code:
$ sum=0
$ for id in $(awk '{ print $1 }' < numbers.dat); do ((sum+=id)); done
$ echo "${sum}"
102
Eliminating awk:
$ sum=0
$ while read -r id rest_of_line; do sum=$((sum+id)); done < numbers.dat
$ echo "${sum}"
102
Using just awk (looks like Aivean beat me to it):
$ awk '{sum+=$1} END {print sum}' numbers.dat
102
awk '{ sum += $1 } END { print sum }'
Test:
35 45 75 76
34 45 53 55
33 34 32 21
Result:
102
(sum(35, 34, 33) = 102, that's what you want, right?)
Here is the detailed explanation of how this works:
$1 is the first column of the input.
sum is the variable that holds the sum of all the values in the first column.
END { print sum } is the action to be performed after all the input has been processed.
So the awk program is basically summing up the first column of the input and printing the result.
This answer was partially generated by Davinci Codex model, supervised and verified by me.

Collapse sequential numbers to ranges in bash

I am trying to collapse sequential numbers to ranges in bash. For example, if my input file is
1
2
3
4
15
16
17
18
22
23
45
46
47
I want the output as:
1 4
15 18
22 23
45 47
How can I do this with awk or sed in a single line command?
Thanks for any help!
$ awk 'NR==1{first=$1;last=$1;next} $1 == last+1 {last=$1;next} {print first,last;first=$1;last=first} END{print first,last}' file
1 4
15 18
22 23
45 47
Explanation
NR==1{first=$1;last=$1;next}
On the first line, initialize the variables first and last and skip to next line.
$1 == last+1 {last=$1;next}
If this line continues in the sequence from the last, update last and jump to the next line.
print first,last;first=$1;last=first
If we get here, we have a break in the sequence. Print out the range for the last sequence and reinitialize the variables for a new sequence.
END{print first,last}
After we get to the end of the file, print the final sequence.

Removing multiple block of lines of a text file in bash

Assume a text file with 40 lines of data. How can I remove lines 3 to 10, 13 to 20, 23 to 30, 33 to 40, in place using bash script?
I already know how to remove lines 3 to 10 with sed, but I wonder if there is a way to do all the removing, in place, with only one command line. I can use for loop but the problem is that with each iteration of loop the lines number will be changed and it needs some additional calculation of line numbers to be removed.
here is an awk oneliner, works for your needs no matter your file has 40 lines or 40k lines:
awk 'NR~/[12]$/' file
for example, with 50 lines:
kent$ seq 50|awk 'NR~/[12]$/'
1
2
11
12
21
22
31
32
41
42
sed -i '3,10d;13,20d;23,30d;33,40d' file
This might work for you (GNU sed):
sed '3~10,+7d' file
Deletes lines in the range of 3 and thereafter steps of 10 for the following 7 lines to be deleted.
If the file was longer than 40 lines and you were only interested in the first 40 lines:
sed '41,$b;3~10,+7d' file
The first instruction tells sed to ignore lines 41 to end-of-file.
Could also be written:
sed '1,40{3~10,+7d}' file
#Kent's answer is the way to go for this particular case, but in general:
$ seq 50 | awk '{idx=(NR%10)} idx>=1 && idx<=2'
1
2
11
12
21
22
31
32
41
The above will work even if you want to select the 4th through 7th lines out of every 13, for example:
$ seq 50 | awk '{idx=(NR%13)} idx>=4 && idx<=7'
4
5
6
7
17
18
19
20
30
31
32
33
43
44
45
46
its not constrained to N out of 10.
Or to select just the 3rd, 5th and 6th lines out of every 13:
$ seq 50 | awk 'BEGIN{split("3 5 6",tmp); for (i in tmp) tgt[tmp[i]]=1} tgt[NR%13]'
3
5
6
16
18
19
29
31
32
42
44
45
The point is - selecting ranges of lines is a job for awk, definitely not sed.
awk '{m=NR%10} !(m==0 || m>=3)' file > tmp && mv tmp file

split a file based upon line number

I have a large file that needs to be slitted based on line numbers.
For instance , my file is like that:
aaaaaa
bbbbbb
cccccc
dddddd
****** //here blank line//
eeeeee
ffffff
gggggg
hhhhhh
*******//here blank line//
ıııııı
jjjjjj
kkkkkk
llllll
******
//And so on...
I need two separate files as such that one file should have first 4 lines, third 4 lines, fifth 4 lines in it and the other file should have second 4 lines, fourth 4 lines, sixth 4 lines in it and so on. how can I do that in bash script?
You can play with the number of the line, NR:
$ awk 'NR%10>0 && NR%10<5' your_file > file1
$ awk 'NR%10>5' your_file > file2
If it is 10K + n, 0 < n < 5, then goes to the first file.
If it is 10K + n, n > 5, then goes to the second file.
In one line:
$ awk 'NR%10>0 && NR%10<5 {print > "file1"} NR%10>5 {print > "file2"}' file
Test
$ cat a
1
2
3
4
6
7
8
9
11
12
13
14
16
17
18
19
21
22
23
24
26
27
28
29
31
32
33
34
36
37
38
39
41
42
43
44
46
47
48
49
51
$ awk 'NR%10>0 && NR%10<5 {print > "file1"} NR%10>5 {print > "file2"}' a
$ cat file1
1
2
3
4
11
12
13
14
21
22
23
24
31
32
33
34
41
42
43
44
51
$ cat file2
6
7
8
9
16
17
18
19
26
27
28
29
36
37
38
39
46
47
48
49
You can do this with head and tail (which are not be part of the bash itself):
head -n 20 <file> | tail -n 5
gives you the lines 15 to 20.
This is however inefficient, if you want to get multiple sections of your file, since it has to be parsed again and again. In this case I'd prefer some real scripting.
Another approach is to treat blank-line-separated paragraphs as the records, and print odd-numbered and even-numbered records to different files:
awk -v RS= -v ORS='\n\n' '{
outfile = (NR % 2 == 1) ? "file1" : "file2"
print > outfile
}' file
Maybe something like that:
#!/bin/bash
EVEN="even.log"
ODD="odd.log"
line_count=0
block_count=0
while read line
do
# ignore blank lines
if [ ! -z "$line" ]; then
if [ $(( $block_count % 2 )) -eq 0 ]; then
# even
echo "$line" >> "$EVEN"
else
# odd
echo "$line" >> "$ODD"
fi
line_count=$[$line_count +1]
if [ "$line_count" -eq "4" ]; then
block_count=$[$block_count +1]
line_count=0
fi
fi
done < "$1"
The first argument is the source file: ./split.sh split_input
This script prints lines from file 1.txt with indexes 0, 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19, ...
i=0
while read p; do
if [ $i%8 -lt 4 ]
then
echo $p
fi
let i=$i+1
done < 1.txt
This script prints lines with indexes 4, 5, 6, 7, 12, 13, 14, 15, ...
i=0
while read p; do
if [ $i%8 -gt 3 ]
then
echo $p
fi
let i=$i+1
done < 1.txt

Resources