Sed range and removing last matching line - bash

I have this data:
One
    two
    three
Four
    five
    six
Seven
    eight
And this command:
sed -n '/^Four$/,/^[^[:blank:]]/p'
I get the following output:
Four
    five
    six
Seven
How can I change this sed expression to not match the final line of the output? So the ideal output should be:
Four
    five
    six
I've tried many things involving exclamation points but haven't managed to get close to getting this working.

Use a "do..while()" loop:
sed -n '/^Four$/{:a;p;n;/^[[:blank:]]/ba}'
details:
/^Four$/ {
:a # define the label "a"
p # print the pattern-space
n # load the next line in the pattern space
/^[[:blank:]]/ba # if the pattern succeeds, go to label "a"
}
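Some sed implementations read a label name up to the end of the line, so the ;-separated one-liner may not parse outside GNU sed. A sketch of a more portable spelling, which passes each piece as its own -e fragment (the function name print_four_block is only illustrative, not part of the answer):

```shell
# print_four_block FILE  (hypothetical helper name)
# Same do..while loop as above, with every command in its own -e fragment
# so that the label and the branch target each end at a fragment boundary.
print_four_block() {
    sed -n \
        -e '/^Four$/{' \
        -e ':a' \
        -e 'p' \
        -e 'n' \
        -e '/^[[:blank:]]/ba' \
        -e '}' \
        "$1"
}
```

Called as print_four_block file, it should print the Four line and the indented lines that follow it.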

You may pipe to another sed and skip last line:
sed -n '/^Four$/,/^[^[:blank:]]/p' file | sed '$d'
Four
    five
    six
Alternatively you may use:
sed -n '/^Four$/,/^[^[:blank:]]/{/^Four$/p; /^[^[:blank:]]/!p;}' file

You're using the wrong tool. sed is for doing s/old/new, that is all. Just use awk:
$ awk '/^[^[:blank:]]/{f=/^Four$/} f' file
Four
    five
    six
How it works: every time awk finds a line that doesn't start with a blank (/^[^[:blank:]]/), it sets a flag f (for "found") to 1 if that line is exactly Four and to 0 otherwise (f=/^Four$/). A non-zero f is a true condition, which invokes awk's default action of printing the current line. So when awk reaches a block starting with Four it prints every line in that block because f is 1 (true), and for every other block it prints nothing because f is 0 (false).
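The same flag idea can be wrapped in a small shell function so that the header is passed in rather than hard-coded. A minimal sketch, assuming the header should be compared as an exact string; print_block and hdr are made-up names, and [^ \t] is used as a spelling of [^[:blank:]] that older awks also accept:

```shell
# print_block HEADER FILE  (hypothetical helper name)
# Prints the line equal to HEADER plus the indented lines that follow it.
print_block() {
    awk -v hdr="$1" '
        /^[^ \t]/ { f = ($0 == hdr) }   # unindented line: set or clear the flag
        f                               # print every line while the flag is set
    ' "$2"
}
```

For example, print_block Four file should give the same three lines as the one-liner above.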

The following awk may help you here.
awk '!/^ /{flag=""} /Four/{flag=1} flag' Input_file
The output will be as follows.
Four
    five
    six
If you need to save the output back into Input_file itself, append > temp_file && mv temp_file Input_file to the above command.

grep -Pzo '\n\KFour\n(\s.+\n)+' input.txt
Output
Four
    five
    six

This might work for you (GNU sed):
sed '/^Four/{:a;n;/^\s/ba};d' file
If the line begins with Four print it and any following lines beginning with a space.
Another way:
sed '/^\S/h;G;/^Four/MP;d' file
If a line begins with a non-space, copy it to the hold space (HS). Append the HS to each line and if either line begins with Four print the first line and delete the rest. This will delete all lines other than the section beginning with Four.

Related

How to get all lines from a file after the last empty line?

Having a file like foo.txt with content
1
2

3

4
5
How do I get the lines starting with 4 and 5 out of it (everything after the last empty line), assuming the number of lines can vary?
Updated
Let's try a slightly simpler approach with just sed.
$ sed -n '/^$/d; :a; $!{N; /\n$/d; ba;}; p' foo.txt
4
5
-n says don't print unless I tell you to.
/^$/d; says on a blank line at the start of a cycle, throw it away and go read the next line.
:a; is a label to loop back to. The loop matters: without it, sed would drop the pattern space at the end of every cycle and nothing would accumulate.
$!{N; /\n$/d; ba;}; says on every line except the last, do this:
N : Add a newline to the pattern space, then append the next line of input to the pattern space.
/\n$/d : If the line just appended was blank, the pattern space now ends with a newline. Delete the (possibly long accumulated) pattern space and go read the next line.
ba : Otherwise branch back to a and keep appending.
This accumulates all nonblanks until either 1) a blank is hit, which will clear and restart the buffer as above, or 2) we reach EOF with a buffer intact.
Finally, p prints the pattern space once the last line has been read. The only time this will have nothing to print is if the last line of the file was a blank line.
So the whole logic boils down to: clean the buffer on empty lines, otherwise pile the non-empty lines up and print at the end.
If you don't have GNU sed, just put the commands on separate lines.
sed -n '
/^$/d
:a
$!{
N
/\n$/d
ba
}
p
' foo.txt
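The "clean the buffer on empty lines, pile up the rest, print at the end" logic may be easier to follow in awk. A minimal sketch of that idea, with after_last_blank and buf as made-up names; like the sed version, it holds the current block in memory:

```shell
# after_last_blank FILE  (hypothetical helper name)
after_last_blank() {
    awk '
        /^$/ { buf = ""; next }      # empty line: throw away what was collected
             { buf = buf $0 "\n" }   # any other line: append it to the buffer
        END  { printf "%s", buf }    # print what followed the last empty line
    ' "$1"
}
```

If the file ends with an empty line the buffer is empty at the end, so nothing is printed.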
Alternate
The method above is efficient, but could potentially build up a very large pattern buffer on certain data sets. If that's not an issue, go with it.
Or, if you want it in simple steps, don't mind more processes doing less work each, and prefer less memory consumed:
last=$( sed -n '/^$/=' foo.txt | tail -n 1 ) # find the last blank
next=$(( ${last:-0} + 1 )) # get the number of the line after
cmd="$next,\$p" # compose the range command to print
sed -n "$cmd" foo.txt # run it to print the range you wanted
This runs a lot of small, simple tasks outside of sed so that it can give sed the simplest, most direct and efficient description of the task possible. It will read the target file twice, but won't have to manage filling, flushing, and refilling the accumulation of data in the pattern buffer with records before a blank line. Still likely slower unless you are memory bound, I'd think.
Reverse the file, print everything up to the first blank line, reverse it again.
$ tac foo.txt | awk '/^$/{exit}1' | tac
4
5
Using GNU awk:
awk -v RS='\n\n' 'END{printf "%s",$0}' file
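RS, the record separator, is set to two consecutive newlines, i.e. an empty line. POSIX leaves a multi-character RS unspecified, which is why this relies on GNU awk.
The END block prints the last record.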
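POSIX awk also has a "paragraph mode": with RS set to the empty string, records are separated by runs of empty lines. A minimal sketch under that assumption; last_paragraph and last are illustrative names. Unlike the command above, paragraph mode ignores trailing empty lines, so a file that ends in an empty line still yields its last non-empty block.

```shell
# last_paragraph FILE  (hypothetical helper name)
last_paragraph() {
    awk -v RS= '
        { last = $0 }               # remember the most recent record
        END { if (NR) print last }  # print the final one, if there was any
    ' "$1"
}
```

Storing the record in a variable avoids depending on $0 still being set inside END.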
Try this:
tail -n +$(($(grep -n '^$' test.txt | tail -n1 | sed -e 's/://g')+1)) test.txt
grep the input file for empty lines, with line numbers.
Get the last match with tail => 5:
Remove the unnecessary :
Add 1 to 5 => 6
tail starting from line 6
You can try with sed:
sed -n ':A;$bB;/^$/{x;s/.*//;x};H;n;bA;:B;H;x;s/^..//;p' infile
With GNU sed:
sed ':a;/$/{N;s/.*\n\n//;ba;}' file

Align numbers using only sed

I need to align decimal numbers with the "," symbol using only the sed command. The "," should go in the 5th position. For example:
183,7
2346,7
7,999
Should turn into:
 183,7
2346,7
   7,999
The maximum number of digits before the comma is 4. I have tried using this to remove spaces:
sed 's/ //g' input.txt > nospaces.txt
And then I thought about adding spaces depending on the number of digits before the comma, but I don't know how to do this using only sed.
Any help would be appreciated.
Assuming that there is only one number on each line; that there are at most four digits before the ,, and that there is always a ,:
sed 's/[^0-9,]*\([0-9]\+,[0-9]*\).*/    \1/;s/.*\(.....,.*\)/\1/;'
The first s gets rid of everything other than the (first) number on the line, and puts four spaces before it. The second one deletes everything before the fifth character prior to the ,, leaving just enough spaces to right justify the number.
The second s command might mangle input lines which didn't match the first s command. If it is possible that the input contains such lines, you can add a conditional branch to avoid executing the second substitution if the first one failed. With GNU sed, this is trivial:
sed 's/[^0-9,]*\([0-9]\+,[0-9]*\).*/    \1/;T;s/.*\(.....,.*\)/\1/;'
T jumps to the end of the commands if the previous s failed. POSIX standard sed only has a conditional branch on success, so you need to use this circuitous construction:
sed 's/[^0-9,]*\([0-9]\+,[0-9]*\).*/    \1/;ta;b;:a;s/.*\(.....,.*\)/\1/;'
where ta (conditional branch to a on success) is used to skip over a b (unconditional branch to end). :a is the label referred to by the t command.
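To watch the two substitutions work, the sketch below wraps the branch-on-success variant in a function and writes each sed command on its own line. It assumes four padding spaces in the first replacement, as described above, and spells [0-9]\+ as [0-9][0-9]* to avoid the GNU-only \+; align_commas is a made-up name.

```shell
# align_commas  (hypothetical helper name): reads one number per line on stdin
align_commas() {
    sed 's/[^0-9,]*\([0-9][0-9]*,[0-9]*\).*/    \1/
ta
b
:a
s/.*\(.....,.*\)/\1/'
}

# Feed the sample numbers from the question through the function.
printf '183,7\n2346,7\n7,999\n' | align_commas
```

Each number comes out with exactly five characters before the comma, and lines without a number pass through unchanged because the b command skips the second substitution.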
If you change your mind, here is an awk solution:
$ awk -F, 'NF{printf "%5d,%s\n", $1,$2} !NF' file
  183,7
 2346,7
    7,999
Set the delimiter to comma and handle both parts as separate fields. The fractional part is printed with %s so that leading zeros (as in 3,05) are preserved.
Try this:
gawk -F, '{ if($0=="") print ; else printf "%5d,%s\n", $1, $2 }' input.txt
If you are using GNU sed, you could do as below:
sed -r 's/([0-9]+),([0-9]+)/printf "%5s,%s" \1 \2/e' input.txt

How could I put these lines in range format?

I have a text file with 826,838 lines. Text file looks like this (sorry, couldn't get the image uploader to work).
I'm using sed (sed -n '2p;$p') to print the second and last line but can't figure out how to put the lines in range format.
Current output:
1 3008.00 7380.00 497724.00 3158482.00 497724.00 3158482.00
826838 4744.00 7409.00 480729.00 3207718.00 480729.00 3207718.00
Desired output:
1-826838 3008.00-4744.00 7380.00-7409.00 497724.00-480729.00 3158482.00-3207718.00 497724.00-480729.00 3158482.00-3207718.00
Thank you for your help!
This might work for you (GNU sed):
sed -r '2H;$!d;H;x;:a;s/\n\s*(\S+)\s*(.*\n)\s*(\S+\s*)/\1-\3\n\2/;ta;P;d' file
Store line 2 and the last line in the hold space (HS). Following the last line, swap to the HS and then repeatedly move the first fields of the second and third lines to the first line. Finally print the first line only.
With single awk expression (will get the needed lines and make the needed ranges):
awk 'NR==2{ split($0,a) }END{ for(i=1;i<=NF;i++) printf("%s\t",a[i]"-"$i); print "" }' file
The output:
1-826838 3008.00-4744.00 7380.00-7409.00 497724.00-480729.00 3158482.00-3207718.00 497724.00-480729.00 3158482.00-3207718.00

What is the meaning of "0,/xxx" in sed?

A sed command is used in a script as follows:
sed -i "0,/^ENABLE_DEBUG.*/s/^ENABLE_DEBUG.*/ENABLE_DEBUG = YES/" MakeConfig
I know that
s/^ENABLE_DEBUG.*/ENABLE_DEBUG = YES/
replaces a line that starts with
ENABLE_DEBUG with ENABLE_DEBUG = YES
But I have no idea about the meaning of
0,/^ENABLE_DEBUG.*/
Can anyone help me?
0,/^ENABLE_DEBUG.*/ means that the substitution will only occur on lines from the beginning, 0, to the first line that matches /^ENABLE_DEBUG.*/. No substitution will be made on subsequent lines even if they match /^ENABLE_DEBUG.*/
Other examples of ranges
This will substitute only on lines 2 through 5:
sed '2,5 s/old/new/'
This will substitute from line 2 to the first line after it which includes something:
sed '2,/something/ s/old/new/'
This will substitute from the first line that contains something to the end of the file:
sed '/something/,$ s/old/new/'
POSIX vs. GNU ranges: the meaning of line "0"
Consider this test file:
$ cat test.txt
one
two
one
three
Now, let's apply sed over the range 1,/one/:
$ sed '1,/one/ s/one/Hello/' test.txt
Hello
two
Hello
three
The range starts with line 1 and ends with the first line after line 1 that matches one. Thus two substitutions are made above.
Suppose that we only wanted the first one replaced. With POSIX sed, this cannot be done with ranges. As NeronLeVelu points out, GNU sed offers an extension for this case: it allows us to specify the range as 0,/one/. This range ends with the first occurrence of one in the file:
$ sed '0,/one/ s/one/Hello/' test.txt
Hello
two
one
three
Thus, the range 0,/^ENABLE_DEBUG/ ends with the first line that begins with ENABLE_DEBUG even if that line is the first line. This requires GNU sed.
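Where GNU sed is not available, the "first match only" behaviour of 0,/re/ can be approximated with a flag in awk. A minimal sketch of that alternative; replace_first, re, repl and replaced are made-up names, and & or backslashes in the replacement keep their special meaning for sub():

```shell
# replace_first REGEX REPLACEMENT FILE  (hypothetical helper name)
replace_first() {
    awk -v re="$1" -v repl="$2" '
        !replaced && $0 ~ re { sub(re, repl); replaced = 1 }  # first matching line only
        { print }                                             # every line is printed
    ' "$3"
}
```

With the test.txt shown above, replace_first one Hello test.txt changes only the first line, including when the match is on line 1.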

'grep +A': print everything after a match [duplicate]

I have a file that contains a list of URLs. It looks like below:
file1:
http://www.google.com
http://www.bing.com
http://www.yahoo.com
http://www.baidu.com
http://www.yandex.com
....
I want to get all the records after http://www.yahoo.com, so that the result looks like this:
file2:
http://www.baidu.com
http://www.yandex.com
....
I know that I could use grep to find the line number where yahoo.com lies using
grep -n 'http://www.yahoo.com' file1
3:http://www.yahoo.com
But I don't know how to get the part of the file after line number 3. Also, I know grep has a flag, -A, to print the lines after a match. However, you need to specify how many lines you want after the match. I am wondering whether there is a way around that, like:
Pseudocode:
grep -n 'http://www.yahoo.com' -A all file1 > file2
I know we could use the line number I got and wc -l to get the number of lines after yahoo.com, however... it feels pretty lame.
AWK
If you don't mind using AWK:
awk '/yahoo/{y=1;next}y' data.txt
This script has two parts:
/yahoo/ { y = 1; next }
y
The first part states that if we encounter a line with yahoo, we set the variable y=1 and then skip that line (the next command jumps to the next line, skipping any further processing of the current one). Without the next command, the yahoo line itself would also be printed.
The second part is shorthand for:
y != 0 { print }
This means that, for each line, if the variable y is non-zero, we print that line. In AWK, a variable that is referred to without having been set is created with the value zero or the empty string, depending on context. Before yahoo is encountered, y is 0, so the script does not print anything. After yahoo is encountered, y is 1, so every line after that is printed.
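Because /yahoo/ is a regular expression, it also matches any other line that merely contains yahoo. A sketch of a stricter variant that compares the whole line as a plain string; lines_after, target and found are made-up names:

```shell
# lines_after LINE FILE  (hypothetical helper name)
lines_after() {
    awk -v target="$1" '
        found        { print }      # already past the marker: print the line
        $0 == target { found = 1 }  # marker seen: printing starts with the next line
    ' "$2"
}
```

Putting the print rule first means the marker line itself is never printed, so no next is needed. Usage would be lines_after 'http://www.yahoo.com' file1.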
Sed
Or, using sed, the following will delete everything up to and including the line with yahoo:
sed '1,/yahoo/d' data.txt
This is much easier done with sed than grep. sed can apply any of its one-letter commands to an inclusive range of lines; the general syntax for this is
START , STOP COMMAND
except without any spaces. START and STOP can each be a number (meaning "line number N", starting from 1); a dollar sign (meaning "the end of the file"), or a regexp enclosed in slashes, meaning "the first line that matches this regexp". (The exact rules are slightly more complicated; the GNU sed manual has more detail.)
So, you can do what you want like so:
sed -n -e '/http:\/\/www\.yahoo\.com/,$p' file1 > file2
The -n means "don't print anything unless specifically told to", and -e introduces the script, which means "from the first appearance of a line that matches the regexp /http:\/\/www\.yahoo\.com/ to the end of the file, print."
This will include the line with http://www.yahoo.com/ on it in the output. If you want everything after that point but not that line itself, the easiest way to do that is to invert the operation:
sed -e '1,/http:\/\/www\.yahoo\.com/d' file1 > file2
which means "for line 1 through the first line matching the regexp /http:\/\/www\.yahoo\.com/, delete the line" (and then, implicitly, print everything else; note that -n is not used this time).
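The escaped slashes can be avoided: sed accepts a different delimiter for an address regexp when the opening delimiter is written with a backslash (\#...# below). A sketch of the same deletion on that basis; the function name after_yahoo is only illustrative, and the regexp is additionally anchored so that it has to match the whole line:

```shell
# after_yahoo FILE  (hypothetical helper name)
# Deletes line 1 through the first later line that is exactly the yahoo URL,
# then prints the rest.
after_yahoo() {
    sed -e '1,\#^http://www\.yahoo\.com$#d' "$1"
}
```

As with any 1,/re/ range, the search for the end of the range starts at line 2, so this assumes the URL is not on the first line.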
awk '/yahoo/ ? c++ : c' file1
Or golfed
awk '/yahoo/?c++:c' file1
Result
http://www.baidu.com
http://www.yandex.com
This is most easily done in Perl:
perl -ne 'print unless 1 .. m(http://www\.yahoo\.com)' file
In other words, print all lines that aren’t between line 1 and the first occurrence of that pattern.
Using this script:
# Get the line number of the first line containing "yahoo"
index=$(grep -n "yahoo" filepath | head -n 1 | cut -d':' -f1)
# Get the total number of lines in the file
totallines=$(wc -l < filepath)
# Subtract index from totallines
result=$((totallines - index))
# Print the matching line and everything after it
grep -A "$result" "yahoo" filepath
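The line arithmetic can be shortened, since tail -n +N starts printing at line N. A minimal sketch along those lines, which prints only what comes after the first match; lines_after_match and n are made-up names:

```shell
# lines_after_match PATTERN FILE  (hypothetical helper name)
lines_after_match() {
    # Line number of the first line matching PATTERN (empty if none).
    n=$(grep -n -- "$1" "$2" | head -n 1 | cut -d: -f1)
    [ -n "$n" ] || return 1          # no match: print nothing
    tail -n +"$((n + 1))" "$2"       # start at the line after the match
}
```

For example, lines_after_match yahoo file1 > file2.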

Resources