Extract blocks of lines with sed - bash

How would one use sed to extract a block of n lines starting at every m-th line of a file?
Say my textfile looks like this:
myfile.dat:
1
2
3
4
5
6
7
8
9
10
Say that I want to extract blocks of three lines and then skip two lines, repeating this throughout the entire file, so that my output looks like this:
output.dat:
1
2
3
6
7
8
Any suggestions on how one could achieve this with sed?
Edit:
For my example I could just have used
sed -n 'p;n;p;n;p;n;n' myfile.dat > output.dat
or with GNU sed (not preferred due to portability)
sed '1~5b;2~5b;3~5b;d' myfile.dat > output.dat
However, I typically want to print blocks of 2450 lines from a file with 49 002 450 lines, such that my output file contains 247 450 lines.

This might work for you (GNU sed):
sed -n '1~5,+2p' file
Starting at line 1, this prints every fifth line (1, 6, 11, ...) together with the two lines that follow it.
An alternative:
sed -n 'N;N;p;n;n' file

In your case the following would work. It is a range pattern: printing starts on a line whose number leaves remainder 1 when divided by 5 and stops on the line whose number leaves remainder 3:
awk 'NR%5==1, NR%5==3' myfile.dat
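For the larger case in the question (blocks of 2450 lines), the same remainder test can be parameterised instead of spelling out each line. A minimal sketch, assuming the block length n and the period m (block plus skipped lines) are both known; extract_blocks is a hypothetical helper name, not something from the answers above:

```shell
# extract_blocks N M [FILE...]  (hypothetical helper)
# Print the first N lines of every M-line period: lines 1..N, M+1..M+N, ...
extract_blocks() {
    n=$1
    m=$2
    shift 2
    # (NR - 1) % m is the 0-based position of the line inside its period
    awk -v n="$n" -v m="$m" '(NR - 1) % m < n' "$@"
}

# The example from the question: keep 3 lines out of every 5
seq 10 | extract_blocks 3 5
```

With the real data this would be called as extract_blocks 2450 M myfile.dat > output.dat, where M is whatever period applies to the file; the question does not state it.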

Related

Convert .csv file with multiple values in each line to one per line

I have a csv file with multiple values on each line like this
0,1,2,3,4,5,6
I would like to convert it to
0
1
2
3
4
5
6
Is there any quick, easy way to do this in a Linux terminal?
tr ',' '\n' < mycsvfile.txt > aaa.txt
Just found this (http://www.askmeaboutlinux.com/?p=2742)
With a single command you can use sed -i 's/,/\n/g' file.txt (GNU sed).
This replaces every , in file.txt with a newline (\n) and edits the file in place.
You may find an explanation of how this command works in this answer.
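Both answers can be checked on the sample line from the question. The sketch below wraps the tr approach in a hypothetical helper, one_per_line, and assumes the values contain no quoted commas (neither tr nor sed understands CSV quoting):

```shell
# one_per_line [FILE]  (hypothetical helper)
# Turn every comma into a newline; reads FILE, or standard input if omitted.
one_per_line() {
    if [ "$#" -gt 0 ]; then
        tr ',' '\n' < "$1"
    else
        tr ',' '\n'
    fi
}

# The sample line from the question
printf '0,1,2,3,4,5,6\n' | one_per_line
```

Unlike sed -i, this leaves the input untouched and writes the result to standard output.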

How to delete multiple first columns from multiple files?

I have multiple files like this:
trans_ENSG00000047849.txt.traw
trans_ENSG00000047848.txt.traw
trans_ENSG00000047847.txt.traw
...
and each has around 300 columns. Columns are separated by tabs. I would like to remove the first 7 columns from each of those files.
I know how to do it for each file:
cut -f 7- trans_ENSG00000047849.txt.traw > trans_ENSG00000047849.txt.trawN
Is there a way to do it at once for all files?
NOTE: there is a tab at the beginning. Therefore I used cut -f 7- here rather than cut -f 8- to remove the first 7 columns.
Just use a for loop:
for file in *.txt.traw
do
cut -f 7- "$file" > "$file"N
done
Back up your files first, then try this (GNU sed):
sed -ri 's/^([^\t]*\t){7}//' trans_*.txt.traw
The -i option makes sed change your files in place. (You can remove the i for testing.)
For example:
$ cat file
1 2 3 4 5 6 7 8 9 0
a b c d e f g h i j
dfad da
$ sed -ri 's/^([^\t]*\t){7}//' file
$ cat file
8 9 0
h i j
dfad da
However, the command is deliberately simple, so it won't remove anything from lines with fewer than 7 columns. (I guess you won't have lines like this, right?)
If you still want to remove the columns when there are fewer than 7:
sed -r 's/^([^\t]*(\t|$)){,7}//'
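To combine the loop with an in-place update without relying on GNU sed's -i, each file can be written to a temporary file and then moved over the original. A minimal sketch, assuming tab-separated files with no leading tab, so that column k is field k for cut (if each line starts with a tab, as the question notes, the first field is empty and the count needs adjusting). strip_first_columns and the .tmp suffix are hypothetical choices:

```shell
# strip_first_columns K FILE...  (hypothetical helper)
# Remove the first K tab-separated columns from each FILE, in place.
strip_first_columns() {
    k=$1
    shift
    for file in "$@"; do
        # cut -f N- keeps fields N onwards, so dropping K columns means N = K + 1
        cut -f "$((k + 1))-" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    done
}

# Usage (back up the files first):
#   strip_first_columns 7 trans_*.txt.traw
```

The original file is only replaced when cut succeeds, so a failed run leaves it as it was.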

Unix file read from specific position onwards

I have a Unix text file and I want to search it for a specific pattern. When I find the first occurrence of the pattern, I want to read the complete file from that position onward, to the end.
How can I achieve this with bash commands?
Regards,
DKamran
Something like this perhaps?
seq 15 | sed '0,/4/d'
This deletes everything up to and including the first line that matches 4 and prints the rest (the 0,/regex/ address form is specific to GNU sed).
perl -ne 'if (/your_pattern/..eof){print}' your_file.txt
This uses the flip-flop (range) operator: every line is ignored until your pattern matches, and from then on each line is printed until the end of the file is reached.
awk variant:
$ seq 1 15 | awk '/6/{t=1}t'
6
7
8
9
10
11
12
13
14
15
Alternatively, grep for the pattern using the -n option to get the line number. Cut out the line number using awk or cut, then use tail with the -n option and a + followed by the line number you obtained to print the file from that point onwards.
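The grep, cut and tail steps just described can be sketched as follows. print_from_match is a hypothetical helper name, and the sketch assumes the input is a regular file, since it is read twice:

```shell
# print_from_match PATTERN FILE  (hypothetical helper)
# Print FILE from the first line matching PATTERN to the end.
print_from_match() {
    # grep -n prefixes each matching line with "LINENO:"; keep the first number
    line=$(grep -n -e "$1" "$2" | head -n 1 | cut -d: -f1)
    # no match: print nothing and report failure
    [ -n "$line" ] || return 1
    tail -n "+$line" "$2"
}

# Usage:
#   print_from_match 'your_pattern' your_file.txt
```

Unlike the sed '0,/4/d' variant, this keeps the matching line itself in the output.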

How to select the nth line of a file from a variable in bash?

Possible duplicate: Bash tool to get nth line from a file
I need to select the nth line of a file, where n is given by the variable PBS_ARRAYID.
The accepted solution in the other question (link above) is:
sed 'NUMq;d' job_params
I'm trying to adapt it to use the variable like this (I actually tried lots of things, but this is the one that makes the most sense):
sed "${PBS_ARRAYID}q;d" job_params
But I get the following error:
sed: -e expression #1, char 2: invalid usage of line address 0
What am I doing wrong?
Your solution is correct:
sed "${PBS_ARRAYID}q;d" job_params
The only problem is that sed considers the first line to be line 1 (thanks rici), so PBS_ARRAYID must be in the range [1, X], where X is the number of lines in the input file, as reported by:
wc -l job_params
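The error message itself points at this: the expression sed saw started with 0, so PBS_ARRAYID was 0 for that job. A minimal sketch of a guard around the same sed command; nth_line is a hypothetical helper, and rejecting 0 (rather than, say, mapping it to line 1) is an assumption about what is wanted:

```shell
# nth_line N FILE  (hypothetical helper)
# Print line N (counting from 1) of FILE, refusing values sed would reject.
nth_line() {
    case $1 in
        '' | *[!0-9]* | 0)
            echo "nth_line: line number must be a positive integer" >&2
            return 1
            ;;
    esac
    # quit after printing line N; delete every line before it
    sed "${1}q;d" "$2"
}

# Usage:
#   nth_line "$PBS_ARRAYID" job_params
```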
Here is an awk example.
Let's say we have this file:
cat file
1 one
2 two
3 three
4 four
5 five
6 six
7 seven
8 eight
9 nine
Then we have these variables:
var="four"
number=2
Then this awk command, which prints the line that comes number lines after the line matching var, gives:
awk '$0~v {f=NR} f && f+n==NR' v="$var" n="$number" file
6 six

Extract a range of rows, with overlap using sed

I have a (dummy) file that looks like this:
header
1
2
3
4
5
6
7
8
9
10
And I need a command that would give me different files, each made of four consecutive rows, with consecutive files overlapping by two rows. So I would have something like this:
1
2
3
4
3
4
5
6
5
6
7
8
7
8
9
10
So here is what I got (it is not much, sorry):
tail -n +2 | sed -n 1,4p > window1.txt
But I don't know how to apply this over all the file, with an overlap.
Thanks in advance.
This might work for you (GNU sed and split):
sed -nr '1{N;N;N};:a;p;$q;s/^.*\n.*\n(.*\n.*)$/\1/;N;N;ba' file | split -dl4
EDIT:
To make this programmable use:
sed -nr ':a;$!{N;s/[^\n]+/&/4;Ta};p;$q;s/.*((\n[^\n]*){2})$/\1/;D' file |
split -dl4 file-name-prefix
Here 4 is the number of lines per file and 2 is the number of overlapping lines.
File-name-prefix is your chosen file name which will have numbers appended (see man split).
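The same windows can also be produced with awk instead of sed and split. This is a different technique from the answer above, sketched under some assumptions: the first line is a header to skip, the whole file fits in memory, and the window size is larger than the overlap. overlap_windows and the PREFIXnn.txt naming are hypothetical choices:

```shell
# overlap_windows SIZE OVERLAP PREFIX FILE  (hypothetical helper)
# Write windows of SIZE rows, each sharing OVERLAP rows with the previous one,
# to PREFIX01.txt, PREFIX02.txt, ...
overlap_windows() {
    # tail -n +2 drops the header line before awk sees the data
    tail -n +2 "$4" | awk -v size="$1" -v ov="$2" -v prefix="$3" '
        { row[NR] = $0 }                      # buffer every data row
        END {
            step = size - ov                  # rows to advance per window
            if (step <= 0) exit 1             # would never advance
            # stop once only overlap rows would remain
            # (inputs with OVERLAP rows or fewer produce no files)
            for (start = 1; start + ov <= NR; start += step) {
                out = sprintf("%s%02d.txt", prefix, ++w)
                for (i = start; i < start + size && i <= NR; i++)
                    print row[i] > (out)
                close(out)
            }
        }'
}

# Usage:
#   overlap_windows 4 2 window file
```

For the dummy file in the question this would write window01.txt to window04.txt. If the number of rows does not fit the pattern exactly, the last file is shorter than SIZE.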
