Merge two text files at a specific location, sed or awk - bash

I have two text files, I want to place a text in the middle of another, I did some research and found information about adding single strings:
I have a comment in the second text file called STUFFGOESHERE, so I tried:
sed '/^STUFFGOESHERE/a file1.txt' file2.txt
sed: 1: "/^STUFFGOESHERE/a long.txt": command a expects \ followed by text
So I tried something different, trying to place the contents of the text based on a given line, but no luck.
Any ideas?

This should do it:
sed '/STUFFGOESHERE/ r file1.txt' file2.txt
If you want to remove the STUFFGOESHERE line:
sed -e '/STUFFGOESHERE/ r file1.txt' -e '/STUFFGOESHERE/d' file2.txt
If you want to modify file2 in place:
sed -i -e...
(BSD/macOS sed needs an explicit, possibly empty, backup suffix: sed -i '' -e... I'm using GNU sed 4.1.5.)
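A self-contained sketch of the two-expression form above, run against throwaway files in a temporary directory. The file contents are made up for illustration; only the sed command comes from the answer.

```shell
# Work in a scratch directory so nothing real is touched.
tmpdir=$(mktemp -d)
printf '%s\n' 'inserted line 1' 'inserted line 2' > "$tmpdir/file1.txt"
printf '%s\n' 'header' 'STUFFGOESHERE' 'footer' > "$tmpdir/file2.txt"

# r queues file1.txt for output after the matching line;
# d then drops the marker line itself without printing it.
merged=$(cd "$tmpdir" && sed -e '/STUFFGOESHERE/ r file1.txt' -e '/STUFFGOESHERE/d' file2.txt)

printf '%s\n' "$merged"
rm -rf "$tmpdir"
```

This should print header, the two inserted lines, then footer, with the marker line gone.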

If you can use ex or ed, try
cat <<EOF | ex -e - file2.txt
/^STUFFGOESHERE/
.r file1.txt
w
q
EOF
The same script works for ed:
cat <<EOF | ed file2.txt
/^STUFFGOESHERE/
.r file1.txt
w
q
EOF

awk '/STUFFGOESHERE/{while((getline line<"file1")>0){ print line};next}1' file2

From a Unix shell (bash, csh, zsh, whatever):
: | perl -e '@c = join("", map {<>} 0..eof); print $c[0] =~ /STUFFGOESHERE/ ? $` . $c[1] . $'"'"' : $c[0]' file2.txt file1.txt > newfile2.txt

Add the first line to the beginning of a file to each line with shell

I have a lot of files with the first line of them as an identifier. The subsequent lines are products of the identifier. Here is an example of the file:
0G000001:
Product_2221
Product_2222
Product_2122
...
I want to put the identifier at the beginning of every line of the file. The final output would be like this:
0G000001: Product_2221
0G000001: Product_2222
0G000001: Product_2122
....
I want to make a loop for all the files that I have. I've been trying with:
for i in $(echo `head -n1 file.$i.txt);
do
cat - file.$i.txt > file_id.$i.txt;
done
But that only duplicates the first line of the file. I know that sed can add specific text at the beginning of each line, but I can't figure out how to specify that the text is the first line of the file, or how to do it in a loop.
No explicit loop necessary:
awk '
FNR==1 { close(out); out=FILENAME; sub(/\./,"_id&",out); hdr=$0; next }
{ print hdr, $0 > out }
' file.*.txt
With awk:
awk 'NR==1 { prod = $0 } NR>1 { print prod, $0 }' infile
Output:
0G000001: Product_2221
0G000001: Product_2222
0G000001: Product_2122
A sed command to do what you want could look like this:
$ sed '1{h;d};G;s/\(.*\)\n\(.*\)/\2 \1/' infile
0G000001: Product_2221
0G000001: Product_2222
0G000001: Product_2122
This does the following:
1 { # On the first line
h # Copy the pattern space to the hold space
d # Delete the line, move to next line
}
G # Append the hold space to the pattern space
s/\(.*\)\n\(.*\)/\2 \1/ # Swap the lines in the pattern space
Some seds might complain about {h;d} and require an extra semicolon, {h;d;}.
To do this in-place for a file, you can use
sed -i '1{h;d};G;s/\(.*\)\n\(.*\)/\2 \1/' infile
for GNU sed, or
sed -i '' '1{h;d};G;s/\(.*\)\n\(.*\)/\2 \1/' infile
for macOS sed. Or, if your sed doesn't support -i at all:
sed '1{h;d};G;s/\(.*\)\n\(.*\)/\2 \1/' infile > tmpfile && mv tmpfile infile
To do it in a loop over all files in a directory:
for f in /path/to/dir/*; do
sed -i '1{h;d};G;s/\(.*\)\n\(.*\)/\2 \1/' "$f"
done
or even directly with a glob:
sed -i '1{h;d};G;s/\(.*\)\n\(.*\)/\2 \1/' /path/to/dir/*
The latter works for sure with GNU sed; not sure about other seds.
sed + head solution:
for f in *.txt; do sed -i '1d; s/^/'"$(head -n1 "$f")"' /' "$f"; done
-i - to modify file in-place
1d; - delete the 1st line
$(head -n1 "$f") - extract the 1st line from file (getting identifier)
s/^/<identifier> / - prepend identifier to each line in file
This might work for you (GNU sed):
sed -ri '1h;1d;G;s/(.*)\n(.*)/\2 \1/' file ...
Save the first line in the hold space (HS) and then delete it from the pattern space (PS). For every line (other than the first), append the HS to the PS and then swap the lines and replace the newline with a space.
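If you would rather not depend on sed -i at all, a loop around awk with a temporary file is one portable option. This is a minimal sketch under assumptions: the file names follow the file.N.txt pattern from the question, and the sample contents and the .tmp suffix are made up for illustration.

```shell
# Build two sample files in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' '0G000001:' 'Product_2221' 'Product_2222' > "$tmpdir/file.1.txt"
printf '%s\n' '0G000002:' 'Product_3331' > "$tmpdir/file.2.txt"

for f in "$tmpdir"/file.*.txt; do
    # Line 1 is remembered as the identifier and skipped;
    # every later line is printed with the identifier in front.
    awk 'NR == 1 { id = $0; next } { print id, $0 }' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

out1=$(cat "$tmpdir/file.1.txt")
out2=$(cat "$tmpdir/file.2.txt")
printf '%s\n' "$out1" "$out2"
rm -rf "$tmpdir"
```

Because the identifier is held in an awk variable rather than spliced into a sed expression, characters such as / or & in the first line need no escaping.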

how to delete a large number of lines from a file

I have a file with ~700,000 lines and I would like to remove a bunch of specific lines (~30,000) using bash scripting or another method.
I know I can remove lines using sed:
sed -i.bak -e '1d;34d;45d;678d' myfile.txt # an example
I have the lines in a text file but I don't know if I can use it as input to sed, maybe perl??
Thanks
A few options:
sed -f <(sed 's/$/d/' lines_file) data_file
awk 'NR==FNR {del[$1]; next} !(FNR in del)' lines_file data_file
perl -MPath::Class -e '
%del = map {$_ => 1} file("lines_file")->slurp(chomp => 1);
$f = file("data_file")->openr();
while (<$f>) {
print unless $del{$.};
}
'
perl -ne'
BEGIN{ local @ARGV =pop; @h{<>} =() }
exists $h{"$.\n"} or print;
' myfile.txt lines
You can remove the lines using a sed script file.
First make a list of the lines to remove (one line number per line).
$ cat lines
1
34
45
678
Convert this list into a sed script.
$ sed -e 's|$| d|' lines >lines.sed
$ cat lines.sed
1 d
34 d
45 d
678 d
Now pass this script file to sed with -f.
$ sed -i.bak -f lines.sed file_with_70k_lines
This will remove the lines.
If you can create a text file of the format
1d
34d
45d
678d
then you can run something like
sed -i.bak -f scriptfile datafile
You can use a genuine editor for that, and ed is the standard editor.
I'm assuming your lines are in a file lines.txt, one number per line, e.g.,
1
34
45
678
Then (with a blatant bashism):
ed -s file.txt < <(sed -n '/^[[:digit:]]\+$/p' lines.txt | sort -nr | sed 's/$/d/'; printf '%s\n' w q)
A first sed selects only the numbers from file lines.txt (just in case).
There's something to take into account here: when you delete line 1, line 34 of the original file becomes line 33. So it's better to remove the lines from the end: start with 678, then 45, etc. That's why we're using sort -nr (to sort the numbers in reverse order). A final sed appends d (ed's delete command) to the numbers.
Then we issue the w (write) and q (quit) commands.
Note that this overwrites the original file!
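For comparison, here is a small sketch of the awk and sed -f approaches from the earlier answers on a ten-line sample. The file names data.txt, lines.txt and lines.sed are placeholders. Unlike ed, sed and awk address lines by their input line number, so the list does not need to be sorted in reverse.

```shell
tmpdir=$(mktemp -d)
seq 1 10 > "$tmpdir/data.txt"              # ten lines containing 1..10
printf '%s\n' 2 5 9 > "$tmpdir/lines.txt"  # line numbers to drop

# awk: while reading the first file, record each number as an array key;
# while reading the second, print only lines whose number was not recorded.
kept_awk=$(awk 'NR == FNR { del[$1]; next } !(FNR in del)' "$tmpdir/lines.txt" "$tmpdir/data.txt")

# sed: turn each number N into the command "Nd" and run the result as a script.
sed 's/$/d/' "$tmpdir/lines.txt" > "$tmpdir/lines.sed"
kept_sed=$(sed -f "$tmpdir/lines.sed" "$tmpdir/data.txt")

printf '%s\n' "$kept_awk"
rm -rf "$tmpdir"
```

Both variants should leave lines 1, 3, 4, 6, 7, 8 and 10. Neither modifies the data file; redirect the output to a new file if you want to keep the result.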

Extract lines from a file in bash

I have a file like this
I would like to extract the lines consisting of 0s and 1s (all such lines in the file) into a separate file. The sequence does not have to start with a 0; it could also start with a 1. However, the line always comes directly after the SITE: line. Moreover, I would like to extract the SITE line itself into a separate file. Could somebody tell me how to do that in bash?
Moreover, I would like to extract the SITE line itself into a separate file.
That’s the easy part:
grep '^SITE:' infile > outfile.site
Extracting the line after that is slightly harder:
grep --after-context=1 '^SITE:' infile \
| grep '^[01]*$' \
> outfile.nr
--after-context (or -A) specifies how many lines after the matching line to print as well. We then use the second grep to print only that line, and not the actually matching line (nor the delimiter which grep puts between each matching entry when specifying an after-context).
Alternatively, you could use the following to match the numeric lines:
grep '^[01]*$' infile > outfile.nr
That’s much easier, but it will find all lines consisting solely of 0s and 1s, regardless of whether they come after a line which starts with SITE:.
You could try something like :
$ egrep -o "^(0|1)+$" test.txt > test2.txt
$ cat test2.txt
0000000000001010000000000000010000000000000000000100000000000010000000000000000000000000000000000000
0000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000
0011010000000000001010000000000000001000010001000000001001001000011000000000000000101000101010101000
$ grep "^SITE:" test.txt > test3.txt
$ cat test3.txt
SITE: 0 0.000340988542 0.0357651018
SITE: 1 0.000529755514 0.00324293642
SITE: 2 0.000577745511 0.052214098
Another solution, using bash :
$ while read; do [[ $REPLY =~ ^(0|1)+$ ]] && echo "$REPLY"; done < test.txt > test2.txt
$ cat test2.txt
0000000000001010000000000000010000000000000000000100000000000010000000000000000000000000000000000000
0000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000
0011010000000000001010000000000000001000010001000000001001001000011000000000000000101000101010101000
To remove the characters 0 at beginning of the line :
$ egrep "^(0|1)+$" test.txt | sed "s/^0\{1,\}//g" > test2.txt
$ cat test2.txt
1010000000000000010000000000000000000100000000000010000000000000000000000000000000000000
1000000000000000000000000000000000000000000000000000000000
11010000000000001010000000000000001000010001000000001001001000011000000000000000101000101010101000
UPDATE : New file format provided in comments :
$ egrep "^SITE:" test.txt|egrep -o "(0|1)+$"|sed "s/^0\{1,\}//g" > test2.txt
$ cat test2.txt
100000000000000000000001000001000000000000000000000000000000000000
1010010010000000000111101000010000001001010111111100000000000010010001101010100011101011110011100
10000000000
$ egrep "^SITE:" test.txt|sed "s/[01\ ]\{1,\}$//g" > test3.txt
$ cat test3.txt
SITE: 967 0.189021866 0.0169990123
SITE: 968 0.189149593 0.246619149
SITE: 969 0.189172266 6.84752689e-05
Here's a simple awk solution that matches all lines starting with SITE: and outputs the respective next line:
awk '/^SITE:/ { if (getline) print }' infile > outfile
Simply omit the { ... } block part to extract all lines starting with SITE: themselves to a separate file:
awk '/^SITE:/' infile > outfile
If you wanted to combine both operations:
outfile1 and outfile2 are the names of the 2 output files, passed to awk as variables f1 and f2:
awk -v f1=outfile1 -v f2=outfile2 \
'/^SITE:/ { print > f1; if (getline) print > f2 }' infile
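A quick way to try the combined command is to run it on a tiny made-up input. The sample lines and the output names sites and seqs below are invented for illustration, and the getline result is compared against 0 so that a read error is not mistaken for success.

```shell
tmpdir=$(mktemp -d)
# Hypothetical input: each SITE: line is followed by its 0/1 sequence.
printf '%s\n' 'SITE: 0 0.1 0.2' '0010' 'SITE: 1 0.3 0.4' '1100' > "$tmpdir/infile"

# Write each SITE: line to f1 and the line that follows it to f2.
awk -v f1="$tmpdir/sites" -v f2="$tmpdir/seqs" \
    '/^SITE:/ { print > f1; if ((getline) > 0) print > f2 }' "$tmpdir/infile"

sites=$(cat "$tmpdir/sites")
seqs=$(cat "$tmpdir/seqs")
printf '%s\n' "$sites" "$seqs"
rm -rf "$tmpdir"
```

The sites file should hold the two SITE: lines and the seqs file the two sequences, in input order.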

head and grep simultaneously

Is there a unix one liner to do this?
head -n 3 test.txt > out_dir/test.head.txt
grep hello test.txt > out_dir/test.tmp.txt
cat out_dir/test.head.txt out_dir/test.tmp.txt > out_dir/test.hello.txt
rm out_dir/test.head.txt out_dir/test.tmp.txt
I.e., I want to get the header and some grep lines from a given file, simultaneously.
Use awk:
awk 'NR<=3 || /hello/' test.txt > out_dir/test.hello.txt
You can say:
{ head -n 3 test.txt ; grep hello test.txt ; } > out_dir/test.hello.txt
Try using sed
sed -n '1,3p; /hello/p' test.txt > out_dir/test.hello.txt
The awk solution is the best, but I'll add a sed solution for completeness:
$ sed -n -e '1,3p' -e '4,$s/hello/hello/p' test.txt > "$output_file"
The -n says not to print a line unless told to. Each -e introduces a command: 1,3p prints the first three lines; 4,$s/hello/hello/p finds every line from line 4 onward that contains hello and substitutes hello back in. The p at the end prints each line the substitution operated on.
There should be a way of using 4,$g/HELLO/p, but I couldn't get it to work. It's been a long time since I really messed with sed.
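sed has no ed-style g command, but the same effect can be had by attaching a { ... } block to the 4,$ address. A minimal sketch with made-up sample lines:

```shell
tmpdir=$(mktemp -d)
printf '%s\n' 'h1 hello' 'h2' 'h3' 'skip me' 'hello world' 'bye' > "$tmpdir/test.txt"

# Lines 1-3 are printed unconditionally; from line 4 to the end,
# only lines matching /hello/ are printed.
picked=$(sed -n -e '1,3p' -e '4,${/hello/p;}' "$tmpdir/test.txt")

printf '%s\n' "$picked"
rm -rf "$tmpdir"
```

Note that the first sample line contains hello yet is printed only once, because the /hello/ test is restricted to line 4 onward. The shorter sed -n '1,3p; /hello/p' would print such a header line twice.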
Of course, I would go awk but here is an ed solution for the pre-vi nostalgics:
ed test.txt <<%
4,$ v/hello/d
w test.hello.txt
%

Using sed to insert file content

I'm trying to insert a file content before a given pattern
Here is my code:
sed -i "" "/pattern/ {
i\\
r $scriptPath/adapters/default/permissions.xml"
}" "$manifestFile"
It adds the path instead of the content of the file.
Any ideas ?
In order to insert text before a pattern, you need to swap the pattern space into the hold space before reading in the file. For example:
sed "/pattern/ {
h
r $scriptPath/adapters/default/permissions.xml
g
N
}" "$manifestFile"
Just remove i\\.
Example:
$ cat 1.txt
abc
pattern
def
$ echo hello > 2.txt
$ sed -i '/pattern/r 2.txt' 1.txt
$ cat 1.txt
abc
pattern
hello
def
I tried Todd's answer and it works great,
but I found that the "h" and "g" commands can be omitted.
Thanks to this faq (found from @vscharf's comments), Todd's answer can be reduced to this one-liner:
sed -i -e "/pattern/ {r $file" -e 'N}' "$manifestFile"
Edit:
If you need here-doc version, please check this.
I got something like this using awk. Looks ugly but did the trick in my test:
command:
cat test.txt | awk '
/pattern/ {
line = $0;
while ((getline < "insert.txt") > 0) {print};
print line;
next
}
{print}'
test.txt:
$ cat test.txt
some stuff
pattern
some other stuff
insert.txt:
$ cat insert.txt
this is inserted file
this is inserted file
output:
some stuff
this is inserted file
this is inserted file
pattern
some other stuff
CodeGnome's solution doesn't work if the pattern is on the last line.
So I used 3 commands.
sed -i '/pattern/ i\
INSERTION_MARKER
' "$manifestFile"
sed -i "/INSERTION_MARKER/r $scriptPath/adapters/default/permissions.xml" "$manifestFile"
sed -i '/INSERTION_MARKER/d' "$manifestFile"
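Another way around the last-line problem is to let awk do the insertion, since it prints the inserted file before the matching line regardless of where that line sits. A minimal sketch, with manifest.txt and insert.txt as placeholder names and made-up contents:

```shell
tmpdir=$(mktemp -d)
printf '%s\n' 'abc' 'pattern' > "$tmpdir/manifest.txt"   # pattern is the LAST line
printf '%s\n' 'inserted 1' 'inserted 2' > "$tmpdir/insert.txt"

result=$(awk -v ins="$tmpdir/insert.txt" '
    /pattern/ {
        # Copy the insert file to the output before the matching line.
        while ((getline line < ins) > 0) print line
        close(ins)   # so the file can be read again if the pattern matches twice
    }
    { print }
' "$tmpdir/manifest.txt")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Reading into a separate variable (line) leaves $0 untouched, so the matching line is still printed by the final { print }. This writes to standard output; redirect to a temporary file and mv it over the original if you need an in-place edit.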
