SED: copy lines from a file to specific line in another file - bash

I can do this using the following example. The first command writes lines 16-80 of file1 to patch, while the second inserts the contents of patch after line 18 of file2:
sed -n '16,80p' file1 > patch
sed -i '18r patch' file2
However, I would like to copy directly from one file to another without using a temporary file in-between, in one command using sed (not awk, etc.). I'm pretty sure this is possible, just don't know how.

Doing this with sed requires some additional shell trickery. Assuming bash, you could use
sed -i 18r<(sed '16,80!d' file1) file2
Here <(sed '16,80!d' file1) is substituted with the name of a pipe from which the output of sed '16,80!d' file1 can be read.
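If you want to see what the shell actually hands to sed there, you can echo the process substitution on its own (the exact path varies by system; on Linux it is typically something like /dev/fd/63):
$ echo <(sed '16,80!d' file1)
/dev/fd/63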
Generally, I feel that it is nicer to do this with awk (if a little longer), because awk is better equipped to handle multiple input files. For example:
awk 'NR == FNR { if(FNR >= 16 && FNR <= 80) { patch = patch $0 ORS }; next } FNR == 18 { $0 = patch $0 } 1' file1 file2
This works as follows:
NR == FNR { # While processing the first file
if(FNR >= 16 && FNR <= 80) { # remember the patch lines
patch = patch $0 ORS
}
next # and do nothing else
}
FNR == 18 { # then, while processing the second file:
$0 = patch $0 # prepend the patch to line 18
}
1 # and print regardless of whether the current
# line was patched.
However, this approach does not lend itself to in-place editing of files. This is not usually a problem; I'd simply use
cp file2 file2~
awk ... file1 file2~ > file2
with the added advantage of having a backup in case things go pear-shaped, but in the end it's up to you.
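Put together with the awk program from above (it is the same one-liner, only reading from the backup copy):
cp file2 file2~
awk 'NR == FNR { if(FNR >= 16 && FNR <= 80) { patch = patch $0 ORS }; next } FNR == 18 { $0 = patch $0 } 1' file1 file2~ > file2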

I have done something similar using:
head -n 80 file | tail -n +16 > patch
head -n 80 keeps the first 80 lines, and tail -n +16 then starts output at line 16 of those, leaving lines 16-80. Check the documentation for your local versions of head and tail, and change the two integers to suit your requirements.

sed -i '1,15 d
34 r patch
81,$ d' YourFile
# oneliner version
sed -i -e '1,15 d' -e '34 r patch' -e '81,$ d' YourFile
The order of the lines is not important.
You can adapt it a bit, or script it with variables, like this:
sed -i "1,16 d
$(( 16 + 18 )) r patch
81,$ d" YourFile
but add some sanity checks on the line counts in this case.
If the patch read by r is more than one line, the following lines are still counted from their original positions, and the final file ends up bigger than 80 - 16 lines.
I did not test exactly which lines end up taken, excluded or modified (e.g. whether 34 really is the 18th line of the cropped file), but the principle is the same.
Explanation of the line index references used in this sample:
1,15 are the leading lines to remove, so the file keeps lines from 16 onward in this case.
34 is the line where the content is inserted; it is the 18th line AFTER the first kept line (line 16 in our case), so 16 + 18 = 34.
81,$ are the trailing lines to remove; $ means the last line, and 81 is the first unwanted trailing line (just after line 80, the last one kept).
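As a sketch of that batch form with the positions in shell variables (the variable names and the length check are mine, not from the original answer):
start=16; end=80; offset=18
[ "$(wc -l < YourFile)" -ge "$end" ] || echo "warning: YourFile has fewer than $end lines" >&2
sed -i "1,$((start - 1)) d
$((start + offset)) r patch
$((end + 1)),\$ d" YourFile
This expands to the same 1,15 d / 34 r patch / 81,$ d script as above.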

I had this problem and did it in two steps (1. tail, 2. head). For example, in a text file with 20 lines (test.txt), to copy lines 13 to 17 to another file (final.txt):
tail -8 test.txt > temp.txt
head -5 temp.txt > final.txt
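In general, for lines START through END of a file with TOTAL lines, the two counts are TOTAL - START + 1 for tail and END - START + 1 for head (the variables below are mine, just to show the arithmetic):
total=$(wc -l < test.txt)   # 20 in this example
start=13; end=17
tail -n $((total - start + 1)) test.txt | head -n $((end - start + 1)) > final.txt
# i.e. tail -n 8 test.txt | head -n 5, same as above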

Related

Delete range of lines when line number is known or not in Unix using head and tail?

This is my sample file, and this is what I want to do:
I have a fixed requirement to delete the 2nd and 3rd lines, keeping the 1st line.
From the bottom, I want to delete the 2 lines above the last line, excluding the last line itself; I don't know my last line number, as it depends on the file.
Once I delete the 2nd and 3rd lines, the 4th line should ideally become the 2nd, and so on; the same applies at the bottom after the delete.
I want to use the head/tail commands and modify the existing file only, i.e. write the changes back to the same file.
Sample file text format.
Input File
> This is First Line
> Delete Delete Delete This Line
> Delete Delete Delete This Line
> ..
> ..
> ..
> ..
> Delete Delete Delete This Line
> Delete Delete Delete This Line
> This is Last Line, should not be deleted It could be come at any line number (variable)
Output file (same file modified)
This is First Line
..
..
..
..
This is Last Line, should not be deleted It could be come at any line number (variable)
Edit: Because of compatibility issues on Unix (using HP-UX with the ksh shell) I want to implement this using head/tail/awk, not sed.
Adding a solution as per the OP's request to make it a generic solution.
Approach: in this solution the OP can provide line numbers from the starting point and from the ending point of any Input_file, and those lines will be skipped.
What the code does: it generates an awk command from the given lines to be skipped, and then runs it too.
cat print_lines.ksh
start_line="2,3"
end_line="2,3"
total_lines=$(wc -l<Input_file)
awk -v len="$total_lines" -v OFS="||" -v s1="'" -v start="$start_line" -v end="$end_line" -v lines=$(wc -l <Input_file) '
BEGIN{
  num_start=split(start, a,",");
  num_end=split(end, b,",");
  for(i=1;i<=num_start;i++){
    val=val?val OFS "FNR=="a[i]:"FNR=="a[i]
  };
  for(j=1;j<=num_end;j++){
    b[j]=b[j]>1?len-(b[j]-1):b[j];
    val=val?val OFS "FNR=="b[j]:"FNR=="b[j]
  };
  print "awk " s1 val "{next} 1" s1" Input_file"
}
' | sh
Change Input_file name to your actual file name and let me know how it goes then.
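For illustration only: with start_line="2,3", end_line="2,3" and an Input_file of 10 lines, the script above generates and runs something like
awk 'FNR==2||FNR==3||FNR==9||FNR==8{next} 1' Input_file
i.e. it skips lines 2 and 3 from the top and the 2nd and 3rd lines from the bottom.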
The following awk may help you with the same (since I don't have an HP system, I didn't test it).
awk -v lines=$(wc -l <Input_file) 'FNR==2 || FNR==3 || FNR==(lines-1) || FNR==(lines-2){next} 1' Input_file
EDIT: Adding a non-one-liner form of the solution too now.
awk -v lines=$(wc -l <Input_file) '
FNR==2 || FNR==3 || FNR==(lines-1) || FNR==(lines-2){
next}
1
' Input_file
wc + sed solution:
len=$(wc -l inpfile | cut -d' ' -f1)
sed "$(echo "$((len-2)),$((len-1))")d; 2,3d" inpfile > tmp_f && mv tmp_f inpfile
$ cat inputfile
> This is First Line
> ..
> ..
> ..
> ..
> This is Last Line, should not be deleted It could be come at any line
Perl suggestion... read whole file into array @L, get index of last line. Delete 2nd last, 3rd last, 3rd and 2nd line. Print what's left.
perl -e '@L=<>; $p=$#L; delete $L[$p-1]; delete $L[$p-2]; delete $L[2]; delete $L[1]; print @L' file.txt
Or, maybe a little more succinctly with splice:
perl -e '@L=<>; splice @L,1,2; splice @L,$#L-2,2; print @L' file.txt
If you wish to have some flexibility, a ksh script approach may work, though it is a little expensive in terms of resources:
#!/bin/ksh
[ -f "$1" ] || echo "Input is not a file" || exit 1
total=$(wc -l "$1" | cut -d' ' -f1 )
echo "How many lines to delete at the end?"
read no
[ -z "$no" ] && echo "Not sure how many lines to delete, aborting" && exit 1
sed "2,3d;$((total-no)),$((total-1))d" "$1" >tempfile && mv tempfile "$1"
And feed the file as argument to the script.
Notes
This deletes the second and third lines.
Plus it deletes the given number of lines from the end, excluding the last line, as read from the user.
Note: My ksh version is 93u+ 2012-08-01
awk '{printf "%d\t%s\n", NR, $0}' < file | sed '2,3d;N;$!P;D'
The awk here serves the purpose of providing line numbers and then passing the output to the sed which uses the line numbers to do the required operations.
%d : used to print numbers. You can also use %i.
\t : places a tab between the number and the string.
%s : prints the string of characters.
\n : creates a new line.
NR : the current line number, starting from 1.
For sed:
N : read/append the next line of input into the pattern space.
$! : do not apply the following command to the last line.
D : if the pattern space contains no newline, start a new cycle as if the d command was issued. Otherwise, delete text in the pattern space up to the first newline, and restart the cycle with the resultant pattern space, without reading a new line of input.
P : print up to the first embedded newline of the current pattern space. This prints the lines after removing the subjected lines.
I enjoyed this task and wrote an awk script for the more scalable case (huge files).
It reads/scans the input file once (no need to know the line count) and does not store the whole file in memory.
script.awk
BEGIN { range = 3} # define sliding window range
{lines[NR] = $0} # capture each line in array
NR == 1 {print} # print 1st line
NR > range * 2{ # for lines in sliding window range bottom
print lines[NR - range]; # print sliding window top line
delete lines[NR - range]; # delete sliding window top line
}
END {print} # print last line
running:
awk -f script.awk input.txt
input.txt
line 1
line 2
line 3
line 4
line 5
line 6
line 7
line 8
line 9
line 10
output:
line 1
line 4
line 5
line 6
line 7
line 10

Printing numerous specific lines from file using awk or sed command loop

I've got this big txt file with ID names. It has 2500 lines, one column. Let's call it file.txt
H3430
H3467
H9805
Also, I've got another file, index.txt, which has 390 numbers:
1
4
9
13
15
Those numbers are the line numbers (of IDs) I have to extract from file.txt. I need to generate another file, let's call it newfile.txt, with only the 390 IDs that are on the specific lines that index.txt demands (the first ID of the list, the fourth, the ninth, and so on).
So, I tried to do the following loop, but it didn't work.
num=$'index.txt'
for i in num
do
awk 'NR==i' "file.txt" > newfile.txt
done
I'm a noob regarding these matters... so I need some help, whether it is with my loop or with a new solution you suggest. Thank you :)
Let's create an example file that simulates your 2500-line file with seq:
$ seq 2500 > /tmp/2500
And use the example you have for the line numbers to print in a file called 390:
$ echo "1
4
9
13
15" > /tmp/390
You can print the wanted lines of the file 2500 by reading the line numbers into an array and printing each line whose number is in that array:
$ awk 'NR==FNR{ a[$1]++; next} a[FNR]' /tmp/390 /tmp/2500
You can also use a sed command file:
$ sed 's/$/p/' /tmp/390 > /tmp/sed_cmd
$ sed -n -f /tmp/sed_cmd /tmp/2500
With GNU sed, you can do sed 's/$/p/' /tmp/390 | sed -n -f - /tmp/2500 but that does not work on OS X :-(
You can do this tho:
$ sed -n -f <(sed 's/$/p/' /tmp/390) /tmp/2500
You can read the index.txt file into a map and then compare it with the line numbers of file.txt, redirecting the output to another file.
awk 'NR==FNR{line[$1]; next}(FNR in line){print $1}' index.txt file.txt > newfile.txt
When you work with two files, using FNR is necessary, as it gets reset to 1 when a new file starts (whereas NR keeps incrementing across files).
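A quick way to see the difference (the two throwaway files are just for illustration):
$ printf 'a\nb\n' > f1; printf 'c\nd\n' > f2
$ awk '{print FILENAME, NR, FNR}' f1 f2
f1 1 1
f1 2 2
f2 3 1
f2 4 2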
As Ed Morton suggests in the comments, the command can be refined further by removing {print $1}, since awk prints the line by default when the condition is true.
awk 'NR==FNR{line[$1]; next} FNR in line' index.txt file.txt > newfile.txt
If index.txt is sorted, we could walk file.txt in order.
That reduces the number of actions to the very minimum (faster script):
awk 'BEGIN {
  indexfile="index.txt"
  if ( (getline ind < indexfile) <= 0 )
    { printf("Empty %s; exiting\n", indexfile); exit }
}
{
  if ( FNR < ind ) next
  if ( FNR == ind ) printf("%s %s\n", ind, $0)
  if ( (getline ind < indexfile) <= 0 ) { exit }
}' file.txt
If the file is not actually sorted, get it quickly sorted with sort:
sort -n index.txt > temp.index.txt
rm index.txt
mv temp.index.txt index.txt
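Or, more compactly, since sort can safely write its output back to the input file with -o:
sort -n -o index.txt index.txt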

How to delete a large number of lines from a file

I have a file with ~700,000 lines and I would like to remove a bunch of specific lines (~30,000) using bash scripting or another method.
I know I can remove lines using sed:
sed -i.bak -e '1d;34d;45d;678d' myfile.txt # an example
I have the line numbers in a text file, but I don't know if I can use that as input to sed; maybe perl?
Thanks
A few options:
sed -f <(sed 's/$/d/' lines_file) data_file
awk 'NR==FNR {del[$1]; next} !(FNR in del)' lines_file data_file
perl -MPath::Class -e '
%del = map {$_ => 1} file("lines_file")->slurp(chomp => 1);
$f = file("data_file")->openr();
while (<$f>) {
print unless $del{$.};
}
'
perl -ne'
BEGIN{ local @ARGV = pop; @h{<>} = () }
exists $h{"$.\n"} or print;
' myfile.txt lines
You can remove the lines using a sed script file.
First make a list of the lines to remove (one line number per line).
$ cat lines
1
34
45
678
Convert this file to sed format.
$ sed -e 's|$| d|' lines >lines.sed
$ cat lines.sed
1 d
34 d
45 d
678 d
Now give this sed file as input to the sed command.
$ sed -i.bak -f lines.sed file_with_70k_lines
This will remove the lines.
If you can create a text file of the format
1d
34d
45d
678d
then you can run something like
sed -i.bak -f scriptfile datafile
You can use a genuine editor for that, and ed is the standard editor.
I'm assuming your lines are in a file lines.txt, one number per line, e.g.,
1
34
45
678
Then (with a blatant bashism):
ed -s file.txt < <(sed -n '/^[[:digit:]]\+$/p' lines.txt | sort -nr | sed 's/$/d/'; printf '%s\n' w q)
A first sed selects only the numbers from file lines.txt (just in case).
There's something quite special to take into account here: when you delete line 1, line 34 of the original file becomes line 33. So it's better to remove the lines from the end: start with 678, then 45, and so on. That's why we're using sort -nr (to sort the numbers in reverse order). A final sed appends d (ed's delete command) to the numbers.
Then we issue the w (write) and q (quit) commands.
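For the sample lines.txt above, the script fed to ed would look like this (shown only for illustration; the numbers are your line numbers sorted in reverse, followed by the write and quit commands):
678d
45d
34d
1d
w
q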
Note that this overwrites the original file!

Extract specified lines from a file

I have a file and I want to extract specific lines from it, like lines 2, 10, 15, 21, ... and so on. There are around 200 thousand lines to be extracted from the file. How can I do it efficiently in bash?
Maybe you're looking for:
sed -n -e 1p -e 4p afile
Put the line numbers of the lines you want in a file called "wanted", like this:
2
10
15
21
Then run this script:
#!/bin/bash
while read w
do
sed -n ${w}p yourfile
done < wanted
TOTALLY ALTERNATIVE METHOD
Or you could let "awk" do it all for you, like this, which is probably miles faster since you won't have to create 200,000 sed processes:
awk 'FNR==NR{a[$1]=1;next}{if(FNR in a){print;}}' wanted yourfile
The FNR==NR portion detects when awk is reading the file called "wanted" and, if so, sets element "$1" of array "a" to "1" so we know that this line number is wanted. The stuff in the second set of curly braces is active only when processing your bigger file, and it prints the current line if its line number is in the array "a" we created when reading the "wanted" file.
$ gawk 'ARGIND==1 { L[$0]++ }; ARGIND==2 && FNR in L' lines file > file.lines
The wanted line numbers have to be stored in lines, delimited by newlines, and they may safely be in random order. It is almost exactly the same as @Mark Setchell's second method, but uses a slightly clearer way to determine which file is current. Note that ARGIND is a GNU extension, hence gawk. If you are limited to original AWK or mawk, you can write it as:
$ awk 'FILENAME==ARGV[1] { L[$0]++ }; FILENAME==ARGV[2] && FNR in L' lines file > file.lines
Efficiency test:
$ awk 'BEGIN { for (i=1; i<=1000000; i++) print i }' > file
$ shuf -i 1-1000000 -n 200000 > lines
$ time gawk 'ARGIND==1 { L[$0]++ }; ARGIND==2 && FNR in L' lines file > file.lines
real 0m1.734s
user 0m1.460s
sys 0m0.052s
UPD:
As @Costi Ciudatu pointed out, there is room for improvement for the case when all wanted lines are near the head of a file.
#!/usr/bin/gawk -f
ARGIND==1 { L[$0]++ }
ENDFILE { L_COUNT = FNR }
ARGIND==2 && FNR in L { L_PRINTED++; print }
ARGIND==2 && L_PRINTED == L_COUNT { exit 0 }
The script exits when the last wanted line is printed, so now it takes a few milliseconds to filter out 2000 random lines from the first 1% of a one-million-line file.
$ time ./getlines.awk lines file > file.lines
real 0m0.016s
user 0m0.012s
sys 0m0.000s
Reading the whole file, by contrast, still takes about a second:
$ time gawk 'ARGIND==1 { L[$0]++ }; ARGIND==2 && FNR in L' lines file > file.lines
real 0m0.780s
user 0m0.756s
sys 0m0.016s
Provided your system supports sed -f - (i.e. for sed to read its script on standard input; it works on Linux, but not on some other platforms) you can turn the file of line numbers into a sed script, naturally using sed:
sed 's/$/p/' lines | sed -n -f - inputfile >output
If the lines you're interested in are close to the beginning of the file, you can make use of head and tail to efficiently extract specific lines.
For your example line numbers (assuming the list doesn't go on until close to 200,000), a naive but still efficient approach to read those lines would be the following:
for n in 2 10 15 21; do
head -n $n /your/large/file | tail -1
done
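Each head | tail pair rereads the file from the start, so this only pays off when the wanted lines are near the top. A variant that stops reading at the wanted line, using sed's quit command, is (a sketch, not part of the original answer):
for n in 2 10 15 21; do
  sed -n "${n}{p;q;}" /your/large/file
done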
sed Example
sed -n '2p' file
awk Example
awk 'NR==2' file
This will print the 2nd line of the file.
Use the same logic in a loop; say, a for loop:
for VARIABLE in 2 10 15 21
do
awk "NR==$VARIABLE" file
done
Give your line numbers this way.
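If the line numbers are already in a file (like the wanted file above), the same loop can read them from there; note this still runs one awk per line number, so it will be slow for 200,000 of them (a sketch, not part of the original answer):
while read -r n; do
  awk -v n="$n" 'NR==n' file
done < wanted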

Apply a gawk script to multiple files in a folder

I would like to use the following awk line to remove every even line (and keep the odd lines) in a text file.
awk 'NR%2==1' filename.txt > output
The problem is that I struggle to either loop properly in awk or build a shell script to apply this to all *.txt files in a folder. I tried to use this one-liner
gawk 'FNR==1{if(o)close(o);o=FILENAME;
sub(/\.txt/,"_oddlines.txt",o)}{NR%2==1; print>o}'
but that didn't remove the even lines. And I am even less familiar with shell scripting. I use gawk under Win7 or Cygwin with bash. Many thanks for any kind of ideas.
Your existing gawk one-liner is really close. Here it is formatted as a more readable script:
FNR == 1 {
if (o)
close(o)
o = FILENAME
sub(/\.txt/, "_oddlines.txt", o)
}
{
NR % 2 == 1
print > o
}
This should make the error obvious¹. So now we remove that error:
FNR == 1 {
if (o)
close(o)
o = FILENAME
sub(/\.txt/, "_oddlines.txt", o)
}
NR % 2 == 1 {
print > o
}
$ awk -f foo.awk *.txt
and it works (and of course you can re-one-line-ize this).
(Normally I would do this with a for like the other answers, but I wanted to show you how close you were!)
¹ Per comment, maybe not quite so obvious?
Awk's basic language construct is the "pattern-action" statement. An awk program is just a list of such statements. The "pattern" is so named because originally they were mostly grep-like regular expression patterns:
$ awk '/^be.*st$/' < /usr/share/dict/web2
beanfeast
beast
[snip]
(Except for the slashes, this is basically just running grep, since it uses the default action, print.)
Patterns can actually contain two addresses, but it's more typical to use one, as in these cases. Patterns not enclosed within slashes allow tests like FNR == 1 (File-specific Number of this Record equals 1) or NR % 2 == 1 (Number of this Record—cumulative across all files!—mod 2 equals 1).
Once you hit the open brace, though, you're into the "action" part. Now NR % 2 == 1 simply calculates the result (true or false) and then throws it away. If you leave out the "pattern" part entirely, the "action" part is run on every input line. So this prints every line.
Note that the test NR % 2 == 1 is testing the cumulative record-number. So if some file has an odd number of lines ("records"), the next file will print out every even-numbered line (and this will persist until you hit another file with an odd number of lines).
For instance, suppose the two input files are A.txt and B.txt. Awk starts reading A.txt and has both FNR and NR set to 1 for the first line, which might be, e.g., file A, line 1. Since FNR == 1 the first "action" is done, setting o. Then awk tests the second pattern. NR is 1, so NR % 2 is 1, so the second "action" is done, printing that line to A_oddlines.txt.
Now suppose file A.txt contains only that one line. Awk now goes on to file B.txt, resetting FNR but leaving NR cumulative. The first line of B might be file B, line 1. Awk tries the first "pattern", and indeed, FNR == 1 so this closes the old o and sets up the new one.
But NR is 2, because NR is cumulative across all input files. So the second pattern (NR % 2 == 1) computes 2 % 2 (which is 0) and compares == 1 which is false, and thus awk skips the second "action" for line 1 of file B.txt. Line 2, if it exists, will have FNR == 2 and NR == 3, so that line will be copied out.
(I originally assumed, since your script was close to working, that you intended this and were just stuck a bit on syntax.)
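A minimal reproduction of that effect (the file contents below are my own illustration, not from the question):
$ printf 'file A, line 1\n' > A.txt
$ printf 'file B, line 1\nfile B, line 2\n' > B.txt
$ gawk 'FNR==1{if(o)close(o);o=FILENAME;sub(/\.txt/,"_oddlines.txt",o)} NR%2==1{print > o}' A.txt B.txt
$ cat B_oddlines.txt
file B, line 2
Line 1 of B.txt never makes it into B_oddlines.txt, exactly as described above.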
With GNU awk you could just do:
$ awk 'FNR%2{print > (FILENAME".odd")}' *.txt
This will create a .odd file for every .txt file in the current directory containing only the odd lines.
However sed has the upper hand on conciseness here. The following GNU sed command will remove all even lines and store the old file with the extension .bck for all .txt files in the current directory:
$ sed -ni.bck '1~2p' *txt
Demo:
$ ls
f1.txt f2.txt
$ cat f1.txt
1
2
3
4
5
$ cat f2.txt
6
7
8
9
10
$ sed -ni.bck '1~2p' *txt
$ ls
f1.txt f1.txt.bck f2.txt f2.txt.bck
$ cat f1.txt
1
3
5
$ cat f1.txt.bck
1
2
3
4
5
$ cat f2.txt
6
8
10
$ cat f2.txt.bck
6
7
8
9
10
If you don't want the backup files then simply:
$ sed -ni '1~2p' *txt
Personally, I'd use
for filename in *.txt; do
awk 'NR%2==1' "$filename" > "oddlines-$filename"
done
EDIT: quote filenames
You can try a for loop:
#!/bin/bash
for file in dir/*.txt
do
oddfile=$(echo "$file" | sed -e 's|\.txt|_odd\.txt|g') #This will create file_odd.txt
awk 'NR%2==1' "$file" > "$oddfile" # This will output it in the same dir.
done
Your problem is that NR%2==1 is inside the {NR%2==1; print>o} 'action block' and is not kicking in as a 'condition'. Use this instead:
gawk 'FNR==1{if(o)close(o);o=FILENAME;sub(/\.txt/,"_oddlines.txt",o)};
FNR%2==1{print > o}' *.txt
