I'm trying to reverse the lines in a file, but I want to do it two lines by two lines.
For the following input:
1
2
3
4
…
97
98
I would like the following output:
97
98
…
3
4
1
2
I found lots of ways to reverse a file line by line (especially on this topic: How can I reverse the order of lines in a file?).
tac. The simplest. Doesn't seem to have an option for what I want, even if I tried to play around with options -r and -s.
tail -r. Not POSIX compliant, and my version doesn't seem to have an option for this either.
That leaves three sed formulas, and I think a small modification would do the trick. But I don't understand what they're doing, so I'm stuck here.
sed '1!G;h;$!d'
sed -n '1!G;h;$p'
sed 'x;1!H;$!d;x'
Any help would be appreciated. I'll try to understand these formulas and answer this question myself.
Okay, I'll bite. In pure sed, we'll have to build the complete output in the hold buffer before printing it (because we see the stuff we want to print first last). A basic template can look like this:
sed 'N;G;h;$!d' filename # Incomplete!
That is:
N # fetch another line, append it to the one we already have in the pattern
# space
G # append the hold buffer to the pattern space.
h # save the result of that to the hold buffer
$!d # and unless the end of the input was reached, start over with the next
# line.
The hold buffer always contains the reversed version of the input processed so far, and the code takes two lines and glues them to the top of that. In the end, it is printed.
This has two problems:
If the number of input lines is odd, it prints only the last line of the file, and
we get a superfluous empty line at the end of the output.
The first is because N bails out if no more lines exist in the input, which happens with an odd number of input lines; we can solve the problem by executing it only when the end of the input has not yet been reached. Just like the $!d above, this is done with $!N, where $ is the end-of-input condition and ! inverts it.
The second is because at the very beginning, the hold buffer contains an empty line that G appends to the pattern space when the code is run for the very first time. Since with $!N we don't know if at that point the line counter is 1 or 2, we should inhibit it conditionally on both. This can be done with 1,2!G, where 1,2 is a range spanning from line 1 to line 2, so that 1,2!G will run G if the line counter is not between 1 and 2.
The whole script then becomes
sed '$!N;1,2!G;h;$!d' filename
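As a cross-check for the pairing rule described here, a minimal awk sketch may help. It is not the sed script itself: awk is swapped in, and the function name pair_reverse_top is made up for illustration. Pairs are counted from the top of the file, so with an odd number of lines the last line stands alone and comes out first.

```shell
# Hypothetical helper: reverse a file two lines at a time, pairing from the top.
pair_reverse_top() {
  awk '{ line[NR] = $0 }                      # remember every line
       END {
         start = (NR % 2) ? NR : NR - 1       # first line of the last pair
         for (i = start; i >= 1; i -= 2) {
           print line[i]
           if (i < NR) print line[i + 1]      # no partner for an odd last line
         }
       }' "$@"
}

printf '%s\n' 1 2 3 4 5 | pair_reverse_top    # prints 5, 3, 4, 1, 2 (one per line)
```

Unlike the sed version, this keeps the whole file in an awk array rather than in the hold buffer, which makes no practical difference for small files.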
Another approach is to combine sed with tac, such as
tac filename | sed -r 'N; s/(.*)\n(.*)/\2\n\1/' # requires GNU sed
That is not the shortest possible way to use sed here (you could also use tac filename | sed -n 'h;$!{n;G;};p'), but perhaps easier to understand: Every time a new line is processed, N fetches another line, and the s command swaps them. Because tac feeds us the lines in reverse, this restores pairs of lines to their original order.
The key difference to the first approach is the behavior for an odd number of lines: with the second approach, the first line of the file will be alone without a partner, whereas with the first it'll be the last.
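To see the odd-line behavior of the tac variant, here is a small sketch that wraps the shorter pipeline mentioned above. The wrapper name pair_reverse_tac is an assumption for illustration, and tac must be available (it is part of GNU coreutils).

```shell
# Hypothetical wrapper around the tac | sed pipeline from this answer.
pair_reverse_tac() {
  # h: save the line; n: fetch the next one; G: append the saved line; p: print
  tac "$@" | sed -n 'h;$!{n;G;};p'
}

printf '%s\n' 1 2 3 4 5 | pair_reverse_tac    # prints 4, 5, 2, 3, 1 (one per line)
```

With five input lines, the first line of the file (1) is the one left without a partner.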
I would go with this:
tac file | while IFS= read -r a && IFS= read -r b; do printf '%s\n' "$b" "$a"; done
Here is an awk you can use:
cat file
1
2
3
4
5
6
7
8
awk '{a[NR]=$0} END {for (i=NR;i>=1;i-=2) print a[i-1]"\n"a[i]}' file
7
8
5
6
3
4
1
2
It stores every line in an array a, then prints them in reverse order, two at a time.
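One caveat: with an odd number of lines, the one-liner also prints a[0], which is empty, so an extra blank line appears before the first line of the file. A possible variant that guards against this (the function name pair_reverse_awk is made up for illustration):

```shell
# Hypothetical variant of the awk answer that tolerates an odd line count.
pair_reverse_awk() {
  awk '{ a[NR] = $0 }
       END {
         for (i = NR; i >= 1; i -= 2) {
           if (i > 1) print a[i - 1]   # skip the nonexistent line 0
           print a[i]
         }
       }' "$@"
}

printf '%s\n' 1 2 3 4 5 | pair_reverse_awk    # prints 4, 5, 2, 3, 1 (one per line)
```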
Having a file like foo.txt with content
1
2
3
4
5
How do I get the lines starting with 4 and 5 out of it (everything after the last empty line), given that the number of lines can vary?
Updated
Let's try a slightly simpler approach with just sed.
$: sed -n '/^$/{g;D;}; N; $p;' foo.txt
4
5
-n says don't print unless I tell you to.
/^$/{g;D;}; says on each blank line, clear it all out with this:
g : Replace the contents of the pattern space with the contents of the hold space. Since we never put anything in the hold space, this erases the (possibly long accumulated) pattern space. Note that I could have used z since this is GNU sed, but I wanted to keep it usable for the non-GNU version below, and g works for both.
D : remove the now empty line from the pattern space, and go read the next.
Now previously accumulated lines have been wiped if (and only if) we saw a blank line. The D loops back to the beginning, so N will never see a blank line.
N : Add a newline to the pattern space, then append the next line of input to the pattern space. This is done on every line except blanks, after which the pattern space will be empty.
This accumulates all nonblanks until either 1) a blank is hit, which will clear and restart the buffer as above, or 2) we reach EOF with a buffer intact.
Finally, $p says on the LAST line (which will already have been added to the pattern space unless the last line was blank, which will have removed the pattern space...), print the pattern space. The only time this will have nothing to print is if the last line of the file was a blank line.
So the whole logic boils down to: clean the buffer on empty lines, otherwise pile the non-empty lines up and print at the end.
If you don't have GNU sed, just put the commands on separate lines.
sed -n '
/^$/{
g
D
}
N
$p
' foo.txt
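The same "reset on empty lines, pile up the rest, print at the end" logic can also be sketched in awk, which may be easier to follow. This is an alternative under that assumption, not the sed script itself, and the function name after_last_blank is hypothetical.

```shell
# Hypothetical helper: print everything after the last empty line.
after_last_blank() {
  awk '/^$/ { buf = ""; next }       # empty line: forget what was collected
       { buf = buf $0 "\n" }         # otherwise append the line to the buffer
       END { printf "%s", buf }      # print whatever followed the last empty line
      ' "$@"
}

printf '1\n2\n\n3\n\n4\n5\n' | after_last_blank    # prints 4 and 5
```

As with the sed version, nothing is printed when the last line of the input is empty.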
Alternate
The method above is efficient, but could potentially build up a very large pattern buffer on certain data sets. If that's not an issue, go with it.
Or, if you want it in simple steps, don't mind more processes doing less work each, and prefer less memory consumed:
last=$( sed -n '/^$/=' foo.txt | tail -n 1 ) # find the last blank
next=$(( ${last:-0} + 1 )) # get the number of the line after
cmd="$next,\$p" # compose the range command to print
sed -n "$cmd" foo.txt # run it to print the range you wanted
This runs a lot of small, simple tasks outside of sed so that it can give sed the simplest, most direct and efficient description of the task possible. It will read the target file twice, but won't have to manage filling, flushing, and refilling the accumulation of data in the pattern buffer with records before a blank line. Still likely slower unless you are memory bound, I'd think.
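The same two-pass idea can be sketched as a single awk process that reads the file twice. This is an assumption-laden alternative rather than the method above: the function name after_last_blank_2pass is made up, and it needs a real file because the input is opened twice.

```shell
# Hypothetical helper: two passes over the same file, no large buffer.
after_last_blank_2pass() {
  awk 'NR == FNR { if (/^$/) last = FNR; next }   # pass 1: remember the last empty line
       FNR > last                                 # pass 2: print what follows it
      ' "$1" "$1"
}

tmp=$(mktemp)
printf '1\n2\n\n3\n\n4\n5\n' > "$tmp"
after_last_blank_2pass "$tmp"    # prints 4 and 5
rm -f "$tmp"
```

If the file has no empty line at all, last stays unset (treated as 0) and every line is printed.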
Reverse the file, print everything up to the first blank line, reverse it again.
$ tac foo.txt | awk '/^$/{exit}1' | tac
4
5
Using GNU awk:
awk -v RS='\n\n' 'END{printf "%s",$0}' file
RS, the record separator, is set to two consecutive newlines, i.e. an empty line.
The END statement prints the last record.
Try this:
tail -n +$(($(grep -nE '^$' test.txt | tail -n1 | sed -e 's/://g')+1)) test.txt
grep your input file for empty lines.
get last line with tail => 5:
remove unnecessary :
add 1 to 5 => 6
tail starting from 6
You can try with sed:
sed -n ':A;$bB;/^$/{x;s/.*//;x};H;n;bA;:B;H;x;s/^..//;p' infile
With GNU sed:
sed ':a;/$/{N;s/.*\n\n//;ba;}' file
# cat file
LBL 434
any lines but not block start
...
LBL 75677
...
any
LBL 777
...
LBL 798
...
# sed -ne '/LBL 75677/,/LBL/p' file | head -n -1
LBL 75677
...
any
#
The above command is good for me, but I would like to know:
Can I suppress the last line without the head command, using only a single sed script? I know sed's commands and control flow (N P D b ...), but I couldn't figure it out.
@Cyrus, thanks. It works fine and I understand how it works.
But I would like to find a different solution, if there is one.
I tried to collect the lines of the block /LBL 75677/,/LBL/ in sed's pattern space with the N command, have D remove the last line from it (the first line of the next block), and print the rest. Can somebody do it that way?
The script below:
sed -n '/LBL 75677/{p;:loop;n;/LBL/!{p;b loop}}' file
may be what you're looking for.
:loop here is a label and b loop is an unconditional jump to that label.
Here we create a small loop and go on to print the lines until the next LBL is reached.
sed is for simple substitutions on individual lines (s/old/new/), that is all. For anything else you should be using awk:
$ awk '/LBL/{f=0} /LBL 75677/{f=1} f' file
LBL 75677
...
any
In addition to being simpler and clearer than an equivalent sed script, the above will execute faster (especially if you only want one record output and so can change /LBL/{f=0} to f&&/LBL/{exit}, guarded by f so that LBL lines before the target block do not end the run), and be more portable, as it will work as-is on all awks on all UNIX systems. It will also be vastly easier to enhance if/when your requirements change (when dealing with anything more than s/old/new/, a tiny requirements change typically means a complete rewrite for a sed script).
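A minimal sketch of the early-exit variant, assuming only one block is wanted. The function name first_block is made up for illustration. Note the f guard on the exit rule: without it, awk would quit at the very first LBL line, before the wanted block is reached.

```shell
# Hypothetical helper: print the "LBL 75677" block and stop reading afterwards.
first_block() {
  awk 'f && /LBL/ { exit }      # a later LBL line ends the block: stop reading
       /LBL 75677/ { f = 1 }    # the wanted block starts here
       f                        # print lines while inside the block
      ' "$@"
}

printf '%s\n' 'LBL 434' 'x' 'LBL 75677' '...' 'any' 'LBL 777' '...' | first_block
```

For the sample input in the usage line this prints the three lines LBL 75677, ... and any.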
If you're using any constructs other than s, g, and p (with -n) in sed then you are using constructs that became obsolete in the mid-1970s when awk was invented and so sed no longer needed all the cryptic runes to perform simple multi-line tasks.
I have a big file made up of 316125000 lines. This file is made up of 112500 data blocks, and each data block has 2810 lines.
I need to reduce the size of the file, so I want to keep the 1st, 11th, 21st, ..., 112481st, and 112491st data blocks and remove all the others. This will give me 11250 data blocks.
In other words, I want to remove lines 2811 to 28100 of every 28100 lines, keeping lines 1 to 2810, 28101 to 30910, and so on.
I was thinking of awk, sed or grep, but which one is fastest, and how can I achieve this? I know how to remove every 2nd or 3rd line with awk and NR, but I don't know how to remove big chunks of lines repeatedly.
Thanks
Best,
Something along these lines might work:
awk 'int((NR - 1) / 2810) % 10 == 0' <infile >outfile
That is, int((NR - 1) / 2810) gives the (zero-based) number of the block of 2810 lines for the current line (NR), and if the remainder of that block number divided by ten is 0 (% 10 == 0) prints the line. This should result in every 10th block being printed, including the first (block number 0).
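To convince yourself of the arithmetic before running it on 316125000 lines, the same expression can be tried on a toy input. In this sketch the block size and the step are passed in as awk variables; the function name keep_blocks and the variable names size and step are assumptions for illustration.

```shell
# Hypothetical helper: keep one block of $1 lines out of every $2 blocks.
keep_blocks() {
  awk -v size="$1" -v step="$2" 'int((NR - 1) / size) % step == 0'
}

# Blocks of 2 lines, keep every 3rd block: prints 1, 2, 7, 8 (one per line).
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 11 12 | keep_blocks 2 3
```

For the real file the call would be keep_blocks 2810 10 <infile >outfile.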
I wouldn't guess which is fastest, but I can provide a GNU sed recipe for your benchmarking:
sed -e '2811~28100,+25289d' <input >output
This says: starting at line 2811 and every 28100 lines thereafter, delete it and the next 25289 lines.
Equivalently, we can use sed -n and print lines 1-2810 every 28100 lines:
sed -ne '1~28100,+2809p' <input >output
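For other block sizes, the numbers in the delete expression can be derived with shell arithmetic. A small sketch (the function name sed_delete_expr is made up; it only builds the expression string and does not run sed):

```shell
# Hypothetical helper: build the GNU sed delete expression for blocks of $1 lines,
# keeping one block out of every $2.
sed_delete_expr() {
  block=$1 step=$2
  period=$((block * step))         # 28100 for 2810-line blocks, keeping 1 in 10
  first=$((block + 1))             # 2811: first line to delete in each period
  extra=$((period - block - 1))    # 25289: further lines deleted after that one
  printf '%s~%s,+%sd\n' "$first" "$period" "$extra"
}

sed_delete_expr 2810 10    # prints 2811~28100,+25289d
```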
Say I have a file a.txt in which each line contains a word followed by a number:
and 3
now 2
for 2
something 7
completely 8
different 6
I need to select the nth character from every word (n being the number next to the word).
cat a.txt | cut -d' ' -f2 | xargs -i -n1 cut a.txt -c {}
I tried this command, which selects the numbers and uses xargs to put them into the -c option of cut, but the cut command gets executed on every line, instead of a.txt being looped over (which I had expected to happen). How can I resolve this problem?
EDIT: Since it seems to be unclear, I want to select a character from a word. The character which I need to select can be found next to the word, for example:
"and 3" will give me d. I want to do this for the entire file, which will then form a word :)
A pure shell solution (bash, since ${word:offset:length} is not POSIX):
$ while read -r word num; do echo "${word:$((num-1)):1}"; done < a.txt
d
o
o
i
e
r
This is using a classic while; do ... ; done shell loop and the read builtin. The general format is
while read variable1 variable2 ... variableN; do something; done < input_file
This will iterate over each line of your input file, splitting it into as many variables as you've given. By default, it will split at whitespace, but you can change that by changing the $IFS variable. If you give a single variable, the entire line is saved in it; if you give more, read fills as many variables as you give it and saves the rest of the line in the last one.
In this particular loop, we're reading the word into $word and the number into $num. Once we have the word, we can use the shell's string manipulation capabilities to extract a substring. The general format is
${string:start:length}
So, ${string:0:2} would extract the first two characters from the variable $string. Here, the variable is $word, the start is the number minus one (this starts counting at 0) and the length is one. The result is the single letter at the position given by the number.
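If your shell lacks the ${string:start:length} expansion (it is not part of POSIX sh), a possible fallback is to keep the same read loop and let cut pick the character. This is a sketch under that assumption; the function name nth_chars is made up. cut counts from 1, so no subtraction is needed.

```shell
# Hypothetical POSIX fallback: one cut call per input line.
nth_chars() {
  while read -r word num; do
    printf '%s\n' "$word" | cut -c "$num"   # character number $num of $word
  done
}

printf 'and 3\nnow 2\nfor 2\n' | nth_chars    # prints d, o, o (one per line)
```

This starts a cut process for every line, so it is slower than the substring expansion on large inputs.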
I would suggest that you use awk:
awk '{print substr($1,$2,1)}' file
substr takes a substring of the first field starting from the number contained in the second field and of length 1.
Testing it out (using the original input from your question):
$ cat file
and 3
now 2
for 2
something 7
completely 8
different 6
$ awk '{print substr($1,$2,1)}' file
d
o
o
i
e
r
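Since the question mentions that the selected letters should form a word, a small variation can print them on one line by using printf instead of print. The function name nth_chars_word is an assumption for illustration.

```shell
# Hypothetical variant: join the selected characters into a single word.
nth_chars_word() {
  awk '{ printf "%s", substr($1, $2, 1) }   # emit the character without a newline
       END { print "" }                     # finish the line once at the end
      ' "$@"
}

printf 'and 3\nnow 2\nfor 2\nsomething 7\ncompletely 8\ndifferent 6\n' | nth_chars_word
```

For the sample input this prints dooier.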
I have a file which looks like this:
Guest-List 1
All present
Guest-list 2
All present
Guest-List 3
Guest-list 4
All present
Guest-list 5
I want to remove the line containing "All present" and its title (the line just above "All present"). The desired output would be:
Guest-List 3
Guest-list 5
I am interested in implementing this using sed. Because I am a rookie, other possible solutions without sed will be appreciated as well (when answering please provide detailed explanation so I can learn) : )
(I know I can delete a line matching a regex, and could store the line above it by sending it to the hold buffer, something like this: sed '/^.*present$/d; h' ... then the "g" command would copy the hold buffer back to the pattern space... but how do I tell sed to delete that as well?)
Thanks in advance!
You can use fgrep like this:
fgrep -v -f <(fgrep 'All present' -B1 file) file
Guest-List 3
Guest-list 5
sed -n '/All present$/{s/.*//;x;d;};x;p;${x;p;}' file | sed '/^$/d'
Where file is your file.
This is an adapted example from here.
It has a great explanation:
In order to delete the line prior to the pattern, we store every line in a buffer called the hold space. Whenever the pattern matches, we delete the content present in both the pattern space, which contains the current line, and the hold space, which contains the previous line.
Let me explain this command: x;p; gets executed for every line.
x exchanges the content of the pattern space with the hold space. p prints the pattern space. As a result, each time, the current line goes to the hold space, and the previous line comes to the pattern space and gets printed. When the pattern /All present$/ matches, we empty (s/.*//) the pattern space, exchange (x) it with the hold space (so the hold space becomes empty), and delete (d) the pattern space, which now contains the previous line. Hence both the current and the previous line are deleted on encountering the pattern. The ${x;p;} prints the last line, which would otherwise remain in the hold space.
The second part of sed is to remove the empty lines created by the first sed command.
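The "hold the previous line for one step" idea can also be sketched in awk in a single pass, without producing empty lines that need a second cleanup command. This is an alternative under that assumption, not the quoted sed script; the function name drop_with_title and the variable names prev and held are made up.

```shell
# Hypothetical helper: drop each "All present" line and the line just above it.
drop_with_title() {
  awk '/All present/ { held = 0; next }   # drop this line and the held title
       held { print prev }                # the held line is safe to print now
       { prev = $0; held = 1 }            # hold the current line for one step
       END { if (held) print prev }       # flush the last held line, if any
      ' "$@"
}

printf '%s\n' 'Guest-List 1' 'All present' 'Guest-List 3' 'Guest-list 4' \
  'All present' 'Guest-list 5' | drop_with_title
```

For the sample in the usage lines this prints Guest-List 3 and Guest-list 5.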
If you are using more than the s, g, and p (with -n) commands in sed then you are using language constructs that became obsolete in the mid-1970s when awk was invented.
sed is an excellent tool for simple substitutions on a single line, for anything else just use awk:
$ cat file
Guest-List 1
All present
Guest-list 2
All present
Guest-List 3
Guest-list 4
All present
Guest-list 5
$ awk 'NR==FNR{ if (/All present/) {skip[FNR-1]; skip[FNR]} next} !(FNR in skip)' file file
Guest-List 3
Guest-list 5
The above just parses the file twice - first time to create an array named skip of the line numbers (FNR) you do not want output, and the second time to print the lines that are not in that array. Simple, clear, maintainable, extensible, ....