Copy and paste between two txt files - bash

Hi, I am new to bash and sed and need a little help.
I have two txt files and need to copy text from one into the other. For the first file I know both the text and where it is placed; for the second file I don't know the text, but I do know where it is placed.
I want to take the two words or numbers from file2 and put them into file1 as shown below.
When file2 is created, all I will know about it is that it has two words or numbers on the same line, line 4.
I have been trying with this:
sed $'10{e sed "4!d" /home/Desktop/file1.txt\n;d}' /home/Desktop/file2.txt
and
awk 'NR==4{a=$0}NR==FNR{next}FNR==10{print a}4' /home/Desktop/file2.txt /home/Desktop/file1.txt
This is what my files would look like
file1.txt
cat
hat
sat
fat
mat
rat
file2.txt
line1
line2
line3
text1 text2
line5
I need it to look like this
file1.txt
cat
hat
sat text1
fat text2
mat
rat
thanks for any help

This might work for you (GNU sed):
sed -E '1{x;s#^#sed -n 4p file2#e;x};3{G;s/\n(\S+).*/ \1/};4{G;s/\n\S+//}' file1
Stuff the line from file2 into the hold space when processing file1 and append and manipulate that line when needed.
A more explicit explanation:
By default, sed reads each line of a file. For each cycle, it removes the newline, places the result in the pattern space, goes through a sequence of commands, re-appends the newline and prints the result e.g. sed '' file replicates the cat command. The sed commands are usually placed between '...' and represent a cycle, thus:
1{x;s#^#sed -n 4p file2#e;x}
1{..} executes the commands between the braces on the first line of file1. Commands are separated by semicolons.
x sed provides two buffers. After removing the newline that delimits each line of a file, the result is placed in the pattern space. Another buffer, called the hold space, is provided and is empty at the start of each invocation. The x swaps the pattern space for the hold space.
s#^#sed -n 4p file2#e this inserts another sed invocation into the pattern space (which now holds the empty hold-space contents) and evaluates it by the use of the e flag. The second invocation turns off implicit printing (-n option) and then prints line 4 of file2 only.
x the hold space is now swapped with the pattern space. Thus, line 4 of file2 is placed in the hold space.
3{G;s/\n(\S+).*/ \1/}
3{..} executes the commands between the braces on the third line of file1.
G append the contents of hold space to the pattern space using a newline as a separator.
s/\n(\S+).*/ \1/ match on the appended hold space and replace it by a space and the first column.
4{G;s/\n\S+//}
4{..} executes the commands between the braces on the fourth line of file1.
G append the contents of hold space to the pattern space using a newline as a separator.
s/\n\S+// match on the appended hold space and remove the newline and the first column, thus leaving a space and the second column.
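For systems without GNU sed's e flag, the same result can be sketched with standard tools. This is only an illustration and not the answerer's method: the scratch directory and the variable names first, second and result are assumptions, and the file contents are copied from the question.

```shell
#!/bin/sh
# Recreate the question's sample files in a scratch directory (assumed paths).
dir=$(mktemp -d)
printf '%s\n' cat hat sat fat mat rat > "$dir/file1.txt"
printf '%s\n' line1 line2 line3 'text1 text2' line5 > "$dir/file2.txt"

# Split line 4 of file2 into its two whitespace-separated fields.
read -r first second <<EOF
$(sed -n '4p' "$dir/file2.txt")
EOF

# Append the first field to line 3 of file1 and the second field to line 4.
result=$(awk -v a="$first" -v b="$second" '
    NR == 3 { $0 = $0 " " a }
    NR == 4 { $0 = $0 " " b }
    { print }
' "$dir/file1.txt")

printf '%s\n' "$result"
rm -rf "$dir"
```

Passing the fields with awk -v keeps characters such as / or & in file2 from being interpreted as sed syntax, although awk still processes backslash escapes in the values.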

Assuming you want to append the fields of the 4th line of file2.txt to the 3rd and the following lines of file1.txt, one field per line, how about:
awk 'FNR==NR {if (FNR==4) split($0, ary, " "); next} {k = FNR - 2; print (k in ary ? $0 " " ary[k] : $0)}' /home/Desktop/file2.txt /home/Desktop/file1.txt
Result:
cat
hat
sat text1
fat text2
mat
rat

Related

How to refresh the line numbers in sed

How can you refresh the line numbers of a sed output inside the same sed command?
I have a sed script as follows -
#!/usr/bin/sed -f
/pattern/i #inserting a line
1~10i ####
What this does is that it inserts lines wherever the pattern is matched and then inserts #### every ten lines. The problem is that it inserts the hashes every 10 lines according to the line numbers of the original file before inserting the lines for the matching pattern. I want to refresh the line numbers after inserting the lines and use them for inserting the 4 hashes every 10 lines.
Is there any way this can be done without piping the output into a new sed?
Interesting challenge. If your file is not too large, the following may work for you (tested with GNU sed):
#!/usr/bin/sed -nEf
:a; N; $!ba
{
s/([^\n]*pattern[^\n]*\n)/#inserting a line\n\1/g
s/\n/ \n/g
s/\`/####\n/
:b
s/(.*####\n([^\n]* \n){9}[^\n]*) \n/\1\n####\n/
tb
s/ \n/\n/g
p
}
Explanations, line by line:
No print, extended RE mode (-nE).
Loop around label a to concatenate the whole file in the pattern space (reason why its size matters).
Add #inserting a line\n before each line containing pattern.
Add a space before all endline characters.
Insert ####\n before the first line.
Label b.
After a ####\n followed by 10 space-terminated lines, append ####\n and remove the tenth line's trailing space (to prevent subsequent matches).
Goto b if there was a substitution.
Remove all spaces at the end of a line.
print.
Note: if your file does not contain NUL characters the -z option of GNU sed saves a few commands:
#!/usr/bin/sed -Ezf
s/([^\n]*pattern[^\n]*\n)/#inserting a line\n\1/g
s/\n/ \n/g
s/\`/####\n/
:a
s/(.*####\n([^\n]* \n){9}[^\n]*) \n/\1\n####\n/
ta
s/ \n/\n/g
Note: with the hold space we could probably do the same on the fly, instead of storing the whole file in the pattern space.
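An on-the-fly variant is easier to sketch in awk than with sed's hold space. The following is a swapped-in technique, not the answer's script: the function name emit, the counter n and the twelve-line sample input are assumptions made for illustration.

```shell
#!/bin/sh
# Sample input (assumed): twelve lines, one of which contains "pattern".
input=$(printf '%s\n' 1 2 3 4 pattern 6 7 8 9 10 11 12)

result=$(printf '%s\n' "$input" | awk '
# emit() prints one line and, before output lines 1, 11, 21, ...,
# prints the #### marker first. n counts lines other than the markers.
function emit(line) {
    if (n % 10 == 0) print "####"
    n++
    print line
}
/pattern/ { emit("#inserting a line") }   # inserted lines are counted too
{ emit($0) }
')

printf '%s\n' "$result"
```

Because the counter is advanced for inserted lines as well as original ones, the markers follow the renumbered output and the file never has to be held in memory.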
This might work for you (GNU sed):
sed -zE 's/.*pattern/# insert line\n&/mg
s/([^\n]*\n){10}/&####\n/g
s/^/####\n/' file
Slurp the file into memory.
Insert desired text before lines containing pattern.
Insert #### every 10 lines and before the first line.

Sed range and removing last matching line

I have this data:
One
  two
  three
Four
  five
  six
Seven
  eight
And this command:
sed -n '/^Four$/,/^[^[:blank:]]/p'
I get the following output:
Four
  five
  six
Seven
How can I change this sed expression to not match the final line of the output? So the ideal output should be:
Four
  five
  six
I've tried many things involving exclamation points but haven't managed to get close to getting this working.
Use a "do..while()" loop:
sed -n '/^Four$/{:a;p;n;/^[[:blank:]]/ba}'
details:
/^Four$/ {
:a # define the label "a"
p # print the pattern-space
n # load the next line in the pattern space
/^[[:blank:]]/ba # if the pattern succeeds, go to label "a"
}
You may pipe to another sed and skip last line:
sed -n '/^Four$/,/^[^[:blank:]]/p' file | sed '$d'
Four
  five
  six
Alternatively you may use:
sed -n '/^Four$/,/^[^[:blank:]]/{/^Four$/p; /^[^[:blank:]]/!p;}' file
You're using the wrong tool. sed is for doing s/old/new, that is all. Just use awk:
$ awk '/^[^[:blank:]]/{f=/^Four$/} f' file
Four
  five
  six
How it works: Every time it finds a line that doesn't start with a blank (/^[^[:blank:]]/), it sets a flag f (for "found") to 1 if that line is exactly Four and to 0 otherwise (f=/^Four$/). Whenever f is non-zero, that is interpreted as a true condition and so invokes awk's default behavior, which is to print the current line. So when it hits a block starting with Four, it prints every line in that block because f is 1/true, and for every other block it doesn't print since f is 0/false.
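The flag idiom can be tried on a small reconstruction of the question's data. This sketch assumes the lowercase lines are indented by two spaces, and it spells the character class as an explicit space-or-tab set for awk implementations that lack POSIX classes such as [:blank:]. The variable names data and result are illustrative.

```shell
#!/bin/sh
# Sample data with the lowercase lines indented by two spaces (assumed).
data=$(printf '%s\n' 'One' '  two' '  three' 'Four' '  five' '  six' 'Seven' '  eight')

# f is recomputed on every line that starts a block (no leading blank);
# the bare pattern f then prints each line while the flag is true.
result=$(printf '%s\n' "$data" | awk '/^[^ \t]/ { f = /^Four$/ } f')

printf '%s\n' "$result"
```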
Following awk may help you here.
awk '!/^ /{flag=""} /Four/{flag=1} flag' Input_file
Output will be as follows.
Four
  five
  six
Also, if you need to save the output into Input_file itself, append > temp_file && mv temp_file Input_file to the above command.
grep -Pzo '\n\KFour\n(\s.+\n)+' input.txt
Output
Four
  five
  six
This might work for you (GNU sed):
sed '/^Four/{:a;n;/^\s/ba};d' file
If the line begins with Four print it and any following lines beginning with a space.
Another way:
sed '/^\S/h;G;/^Four/MP;d' file
If a line begins with a non-space, copy it to the hold space (HS). Append the HS to each line and if either line begins with Four print the first line and delete the rest. This will delete all lines other than the section beginning with Four.

match repeated character in sed on mac

I am trying to find all instances of 3 or more new lines and replace them with only 2 new lines (imagine a file with wayyy too much white space). I am using sed, but OK with an answer using awk or the like if that's easier.
note: I'm on a mac, so sed is slightly different than on linux (BSD vs GNU)
My actual goal is new lines, but I can't get it to work at all so for simplicity I'm trying to match 3 or more repetitions of bla and replace that with BLA.
Make an example file called stupid.txt:
$ cat stupid.txt
blablabla
$
My understanding is that you match i or more things using regex syntax thing{i,}.
I have tried variations of this to match the 3 blas with no luck:
cat stupid.txt | sed 's/bla{3,}/BLA/g' # simplest way
cat stupid.txt | sed 's/bla\{3,\}/BLA/g' # escape curly brackets
cat stupid.txt | sed -E 's/bla{3,}/BLA/g' # use extended regular expressions
cat stupid.txt | sed -E 's/bla\{3,\}/BLA/g' # use -E and escape brackets
Now I am out of ideas for what else to try!
thing{3,} matches thinggg. Use (..) to group things to make the quantifier apply to what you want:
$ echo blablabla | sed -E 's/(bla){3}/BLA/g'
BLA
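A quick way to see what the quantifier binds to is to feed both forms a string with repeated a characters and a string with repeated bla. The variable names below are only for illustration.

```shell
#!/bin/sh
# Without a group, {3,} applies only to the preceding character "a".
ungrouped_a=$(echo blaaa | sed -E 's/bla{3,}/BLA/g')        # bl + aaa matches
ungrouped_b=$(echo blablabla | sed -E 's/bla{3,}/BLA/g')    # no run of three a characters
# With a group, {3,} applies to the whole string "bla".
grouped=$(echo blablabla | sed -E 's/(bla){3,}/BLA/g')

printf '%s\n' "$ungrouped_a" "$ungrouped_b" "$grouped"
```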
If slurping the whole file is acceptable:
perl -0777pe 's/(\n){3,}/\n\n/g' newlines.txt
Where you should replace \n with whatever newline sequence is appropriate.
-0777 tells perl to not break each line into its own record, which allows a regex that works across lines to function.
If you are satisfied with the result, -i causes perl to replace the file in-place rather than output to stdout:
perl -i -0777pe 's/(\n){3,}/\n\n/g' newlines.txt
You can also use -i~ to create a backup file with the given suffix (~ in this case).
If slurping the whole file is not acceptable:
perl -ne 'if (/^$/) {$i++}else{$i=0}print if $i<2' newlines.txt
This prints any line that is not the second (or later) consecutive empty line, so a run of three or more newlines is reduced to two. -i works with this the same.
ps--MacOS comes with perl installed.
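Since the question allows awk, here is a line-by-line sketch in awk that reduces any run of three or more newlines to two, i.e. keeps at most one empty line in a row. It is an illustration rather than part of the answer above; the counter name blank and the sample input are assumptions.

```shell
#!/bin/sh
# Sample input (assumed): four newlines after "a", two after "b".
result=$(printf 'a\n\n\n\nb\n\nc\n' | awk '
    /^$/ { if (++blank > 1) next }   # skip the 2nd, 3rd, ... empty line of a run
    /./  { blank = 0 }               # any non-empty line ends the run
    { print }
')
printf '%s\n' "$result"
```

This uses only standard awk features, so it should behave the same with the BSD awk on a Mac and with GNU awk.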
sed -E 's/bla{3,}/BLA/g'
The above matches bl followed by three or more repetitions of a. This is not what you want. It appears that you actually want three or more repetitions of bla. If that is the case, then replace:
$ sed -E 's/bla{3,}/BLA/g' stupid.txt
blablabla
With:
$ sed -E 's/(bla){3,}/BLA/g' stupid.txt
BLA
The above, though, doesn't directly help with your task of replacing newlines because, by default, sed reads in only one line at a time.
Replacing newlines
Let's consider this file which has 3 newlines between the 1 and 3:
$ cat file.txt
1


3
To replace any occurrence of three or more newlines with a single newline:
$ sed -E 'H;1h;$!d;x; s/\n{3,}/\n/g' file.txt
1
3
How it works:
H;1h;$!d;x
This complex series of commands reads in the whole file. It is probably simplest to think of this as an idiom. If you really want to know the gory details:
H - Append current line to hold space
1h - If this is the first line, overwrite the hold space with it
$!d - If this is not the last line, delete pattern space and jump to the next line.
x - Exchange hold and pattern space to put whole file in pattern space
s/\n{3,}/\n/g
This replaces all sequences of three or more newlines with a single newline.
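The slurp idiom can be checked on a reconstructed sample. This sketch assumes GNU sed (for \n in the replacement); the variable names single and double are illustrative. The second command is a variant that keeps one blank line, which is what the question asked for.

```shell
#!/bin/sh
# Reconstructed sample (assumed): three newlines between 1 and 3.
single=$(printf '1\n\n\n3\n' | sed -E 'H;1h;$!d;x; s/\n{3,}/\n/g')
# Variant: replace each run with two newlines, leaving one blank line.
double=$(printf '1\n\n\n\n\n3\n' | sed -E 'H;1h;$!d;x; s/\n{3,}/\n\n/g')

printf '%s\n---\n%s\n' "$single" "$double"
```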
Alternate
The above solution reads in the whole file at once. For large (gigabyte) files that could be a disadvantage. This alternate approach avoids that:
$ sed -E '/^$/{:a; N; /\n$/ba; s/\n{2,}([^\n]*)/\1/}' file.txt # GNU only
1
3
How it works:
/^$/{...}
This selects blank lines. For blank lines and only blank lines, the commands in braces are executed and they are:
:a
This defines a label a.
N
This reads in the next line from the file into the pattern space, separated from the previous by a newline.
/\n$/ba
If the last line read in is empty, branch (jump) to label a.
s/\n{2,}([^\n]*)/\1/
If we didn't branch, then this substitution is performed, which removes the excess newlines. The pattern space begins at the first blank line, so the newline that ended the preceding non-blank line is not in it; a run of three newlines in the file therefore appears here as two, hence \n{2,}.
BSD Version: I don't have a BSD system to test this on but I am guessing:
sed -E -e '/^$/{:a' -e N -e '/\n$/ba' -e 's/\n{2,}([^\n]*)/\1/}' file.txt
To keep only 2 newlines, you can try this sed
sed '
/^$/!b
N
/../b
h
:A
y/\n/#/
/^#$/!bB
s/#//
$bB
N
bA
:B
s/^#//
/./ {
x
G
b
}
g
' infile
/^$/!b If the line is not empty, branch to the end of the script, which prints it
N append the next line
/../b if this next line is not empty, print the 2 lines
h keep the 2 empty lines in the hold buffer
:A label A
At this point there are always 2 lines in the pattern buffer and the first is empty
y/\n/#/ substitute \n by # (you can choose another char not present in your file)
/^#$/!bB If the second line is not empty jump to B
s/#// remove the #
$bB If it's the last line jump to B
At this point there is 1 empty line in the pattern space
N append the next line
bA jump to A
:B label B
s/^#// remove the # at the start of the line
/./ { If the last line is not empty
x exchange pattern and hold buffer
G append the hold buffer to the pattern space
b jump to end
}
g replace the pattern space (empty) by the hold space
print the pattern space

Find and replace text containing a new line

In bash, how can I find and replace some text containing a new line?
I want to match exactly 2 lines as specified (I can't match them separately as both lines appear at different places separately & I only want to replace where both lines appear consecutively). Using sed I was able to find and replace the individual lines and new line separately, but not together!
In case it is needed, below are the lines I want to find and replace (from multiple files at once!):
} elseif ($this->aauth->is_member('Default')) {
$form_data['userstat'] = $this->aauth->get_user()->id;
In general you can use sed -z (GNU sed), which tells sed to split its input on the null character instead of the newline. Assume you have the file text containing
Hello World
This is a line
line1
line2
Hello World, again
line1
line2
end
Executing sed -z -e 's/line1\nline2/xxx/g' text yields
Hello World
This is a line
xxx
Hello World, again
xxx
end
You can add  * (that is, <space><star>) to the pattern to handle inconsistent whitespace.
In your specific case, if you want to delete the second line, you can use a block to advance to the next line and delete it if it matches the second line:
sed -e '/line1/{n;/line2/d}' text
This might work for you (GNU sed):
sed -i 'N;s/first line\nsecond line/replacement/;P;D' file ...
Keep a moving window of two lines in the pattern space and replace when necessary.
N.B. -i option updates file(s) in place.
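The two-line window can be watched in action on the sample text from the earlier answer. This is a sketch only: it uses $!N instead of a bare N so that the last line is handled the same way by sed implementations that do not print the pattern space when N runs out of input, and the names text and result are illustrative.

```shell
#!/bin/sh
# Sample file contents from the earlier answer.
text=$(printf '%s\n' 'Hello World' 'This is a line' 'line1' 'line2' \
    'Hello World, again' 'line1' 'line2' 'end')

# $!N appends the next line (except on the last line), the substitution
# acts on the two-line window, P prints the first line of the window and
# D drops it and restarts the cycle with whatever remains.
result=$(printf '%s\n' "$text" | sed '$!N;s/line1\nline2/xxx/;P;D')

printf '%s\n' "$result"
```

A line1 that is not immediately followed by line2 is left alone, because the pair is only replaced when both lines sit in the window together.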
Also using a range and the change command:
sed -i '/first line/,/second line/c\replacement1\nreplacement2\netc' file ...

How could I put these lines in range format?

I have a text file with 826,838 lines. Text file looks like this (sorry, couldn't get the image uploader to work).
I'm using sed (sed -n '2p;$p') to print the second and last line but can't figure out how to put the lines in range format.
Current output:
1 3008.00 7380.00 497724.00 3158482.00 497724.00 3158482.00
826838 4744.00 7409.00 480729.00 3207718.00 480729.00 3207718.00
Desired output:
1-826838 3008.00-4744.00 7380.00-7409.00 497724.00-480729.00 3158482.00-3207718.00 497724.00-480729.00 3158482.00-3207718.00
Thank you for your help!
This might work for you (GNU sed):
sed -r '2H;$!d;H;x;:a;s/\n\s*(\S+)\s*(.*\n)\s*(\S+\s*)/\1-\3\n\2/;ta;P;d' file
Store line 2 and the last line in the hold space (HS). Following the last line, swap to the HS and then repeatedly move the first fields of the second and third lines to the first line. Finally print the first line only.
With a single awk expression (it picks out the needed lines and builds the ranges):
awk 'NR==2{ split($0,a) }END{ for(i=1;i<=NF;i++) printf("%s\t",a[i]"-"$i); print "" }' file
The output:
1-826838 3008.00-4744.00 7380.00-7409.00 497724.00-480729.00 3158482.00-3207718.00 497724.00-480729.00 3158482.00-3207718.00
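A variant of the awk approach that separates the ranges with single spaces and does not rely on $0 still being set in the END block might look like this. It is a sketch under assumptions: the header line and the middle data line in the sample file are made up, and only the second and last lines come from the question.

```shell
#!/bin/sh
# Sample file (assumed): a header line, the two data lines from the
# question, and one made-up line in between.
file=$(mktemp)
cat > "$file" <<'EOF'
id a b c d e f
1 3008.00 7380.00 497724.00 3158482.00 497724.00 3158482.00
2 3010.00 7381.00 497700.00 3158500.00 497700.00 3158500.00
826838 4744.00 7409.00 480729.00 3207718.00 480729.00 3207718.00
EOF

result=$(awk '
NR == 2 { split($0, first) }   # fields of line 2
        { last = $0 }          # remember the most recent line
END {
    m = split(last, final)     # fields of the last line
    for (i = 1; i <= m; i++)
        printf "%s-%s%s", first[i], final[i], (i < m ? " " : "\n")
}' "$file")

printf '%s\n' "$result"
rm -f "$file"
```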
