Grep lines between two patterns, one unique and one repeated - bash

I have a text file which looks like this
1
bbbbb
aaa
END
2
ttttt
mmmm
uu
END
3
....
END
The number of lines between the single number patterns (1,2,3) and END is variable. So the upper delimiting pattern changes, but the final one does not. Using some bash commands, I would like to grep lines between a specified upper partner and the corresponding END, for example a command that takes as input 2 and returns
2
ttttt
mmmm
uu
END
I've tried various solutions with sed and awk, but still can't figure it out. The main problem is that I may need to grep a entry in the middle of the file, so I can't use sed with /pattern/q...Any help will be greatly appreciated!

With awk we set a flag f when matching the start pattern, which is an input argument. After that row, the flag is on and it prints every line. When reaching "END" (AND the flag is on!) it exits.
awk -v p=2 '$0~p{f=1} f{print} f&&/END/{exit}' file

Use sed and its addresses to only print a part of the file between the patterns:
#!/bin/bash
start=x
while [[ $start = *[^0-9]* ]] ; do
read -p 'Enter the start pattern: ' start
done
sed -n "/^$start$/,/^END$/p" file

You can use the sed with an address range. Modify the first regular expression (RE1) in /RE1/,/RE2/ as your convenience:
sed -n '/^[[:space:]]*2$/,/^[[:space:]]*END$/p' file
Or,
sed '
/^[[:space:]]*2$/,/^[[:space:]]*END$/!d
/^[[:space:]]*END$/q
' file
This quits upon reading the END, thus may be more efficient.

Another option/solution using just bash
#!/usr/bin/env bash
start=$1
while IFS= read -r lines; do
if [[ ${lines##* } == $start ]]; then
print=on
elif [[ ${lines##* } == [0-9] ]]; then
print=off
fi
case $print in on) printf '%s\n' "$lines";; esac
done < file.txt
Run the script with the number as the argument, 1 can 2 or 3 or ...
./myscript 1

This might work for you (GNU sed):
sed -n '/^\s*2$/{:a;N;/^\s*END$/M!ba;p;q}' file
Switch off implicit printing by setting the -n option.
Gather up the lines beginning with a line starting with 2 and ending in a line starting with END, print the collection and quit.
N.B. The second regexp uses the M flag, which allows the ^ and $ to match start and end of lines when multiple lines are being matched. Another thing to bear in mind is that using a range i.e. sed -n '/start/,/end/p' file, will start printing lines the moment the first condition is met and if the second match does not materialise, it will continue printing to the end of the file.

Related

echo last character of text file in Unix/Bash

I need to see the last characters of bunch of text files (or alternatively test whether they are "}" and give a list of files that test negative ). Is there an easy way to do this from the command line.
(Ideally the solution works without reading the whole file from the start because in addition to there being many they can also be quite large.
P.S.: Any answer would be great but I would really appreciate if the function and syntax of everything in the answer can be fully explained.
It can be done fairly easily with tail and then string indexing in bash. For example, you obtain the last line in a file with, tail -n1 file. You will need to store the line in a variable using command-substitution, e.g.
lastln=$(tail -n1 file)
Then it is simply a matter of indexing the last characters, e.g.
echo ${lastln:(-1)}
(note: when indexing from the end of the string, you must put the offset (e.g. -1 in parenthesis (-1) -- or -- you must leave a space before the -1, e.g. echo ${lastln: -1} is also valid.)
You can try this:
for file in file1 file2; do tail -n 1 "$file" | grep -q '}$' || echo "$file"; done
where you should replace file1 file2 with the list of files you want to analyze, e.g. * or the like. Now what happens here? The outer part
for file in file1 file2; do ...; done
is a simple loop over the files, where inside the loop, you can refer to the current file as $file. Then,
tail -n 1 "$file"
prints the last line of the given file and
| grep -q '}$'
redirects the output to grep (turned into silent mode with -q), which looks for '}' immediatly followed by the end of the line ($). The return value of this command can be used to chain another action: when grep returns non-zero (indicating failure, i.e., the pattern is not matched), the last part
|| echo "$file"
is executed, resulting in the list of files you need.

How to get line WITH tab character using tail and head

I have made a script to practice my Bash, only to realize that this script does not take tabulation into account, which is a problem since it is designed to find and replace a pattern in a Python script (which obviously needs tabulation to work).
Here is my code. Is there a simple way to get around this problem ?
pressure=1
nline=$(cat /myfile.py | wc -l) # find the line length of the file
echo $nline
for ((c=0;c<=${nline};c++))
do
res=$( tail -n $(($(($nline+1))-$c)) myfile.py | head -n 1 | awk 'gsub("="," ",$1){print $1}' | awk '{print$1}')
#echo $res
if [ $res == 'pressure_run' ]
then
echo "pressure_run='${pressure}'" >> myfile_mod.py
else
echo $( tail -n $(($nline-$c)) myfile.py | head -n 1) >> myfile_mod.py
fi
done
Basically, it finds the line that has pressure_run=something and replaces it by pressure_run=$pressure. The rest of the file should be untouched. But in this case, all tabulation is deleted.
If you want to just do the replacement as quickly as possible, sed is the way to go as pointed out in shellter's comment:
sed "s/\(pressure_run=\).*/\1$pressure/" myfile.py
For Bash training, as you say, you may want to loop manually over your file. A few remarks for your current version:
Is /myfile.py really in the root directory? Later, you don't refer to it at that location.
cat ... | wc -l is a useless use of cat and better written as wc -l < myfile.py.
Your for loop is executed one more time than you have lines.
To get the next line, you do "show me all lines, but counting from the back, don't show me c lines, and then show me the first line of these". There must be a simpler way, right?
To get what's the left-hand side of an assignment, you say "in the first space-separated field, replace = with a space , then show my the first space separated field of the result". There must be a simpler way, right? This is, by the way, where you strip out the leading tabs (your first awk command does it).
To print the unchanged line, you do the same complicated thing as before.
A band-aid solution
A minimal change that would get you the result you want would be to modify the awk command: instead of
awk 'gsub("="," ",$1){print $1}' | awk '{print$1}'
you could use
awk -F '=' '{ print $1 }'
"Fields are separated by =; give me the first one". This preserves leading tabs.
The replacements have to be adjusted a little bit as well; you now want to match something that ends in pressure_run:
if [[ $res == *pressure_run ]]
I've used the more flexible [[ ]] instead of [ ] and added a * to pressure_run (which must not be quoted): "if $res ends in pressure_run, then..."
The replacement has to use $res, which has the proper amount of tabs:
echo "$res='${pressure}'" >> myfile_mod.py
Instead of appending each line each loop (and opening the file each time), you could just redirect output of your whole loop with done > myfile_mod.py.
This prints literally ${pressure} as in your version, because it's single quoted. If you want to replace that by the value of $pressure, you have to remove the single quotes (and the braces aren't needed here, but don't hurt):
echo "$res=$pressure" >> myfile_mod.py
This fixes your example, but it should be pointed out that enumerating lines and then getting one at a time with tail | head is a really bad idea. You traverse the file for every single line twice, it's very error prone and hard to read. (Thanks to tripleee for suggesting to mention this more clearly.)
A proper solution
This all being said, there are preferred ways of doing what you did. You essentially loop over a file, and if a line matches pressure_run=, you want to replace what's on the right-hand side with $pressure (or the value of that variable). Here is how I would do it:
#!/bin/bash
pressure=1
# Regular expression to match lines we want to change
re='^[[:space:]]*pressure_run='
# Read lines from myfile.py
while IFS= read -r line; do
# If the line matches the regular expression
if [[ $line =~ $re ]]; then
# Print what we matched (with whitespace!), then the value of $pressure
line="${BASH_REMATCH[0]}"$pressure
fi
# Print the (potentially modified) line
echo "$line"
# Read from myfile.py, write to myfile_mod.py
done < myfile.py > myfile_mod.py
For a test file that looks like
blah
test
pressure_run=no_tab
blah
something
pressure_run=one_tab
pressure_run=two_tabs
the result is
blah
test
pressure_run=1
blah
something
pressure_run=1
pressure_run=1
Recommended reading
How to read a file line-by-line (explains the IFS= and -r business, which is quite essential to preserve whitespace)
BashGuide

Delete lines where 3rd character equals a number

I have a consistent file with numbers like
0123456
0234566
.
.
.
etc
With bash tools, command line preferable, how can I remove each line if the third digit equals 2 .
eg, with cut -c3 I can get the correct digit but I cannot combine it effectively with sed or something similar. I am not looking for a pattern, only the 3rd digit.
(I have done it in a script in python but I was wondering how its done through a one-line bash command). Thank you!
EDIT: Additionally, if I want to delete the lines where the third digit NOT equals to 2 (opposite question)
You can just do this with sed
sed -i '/^..2/d' file
If you want to do the opposite you can do:
sed -i '/^..[^2]/d' file
since you are dealing with a specific character.
I would use awk:
$ awk -F "" '$3!=2' file
0234566
by setting the field separator to "" (empty, just valid on GNU-awk), every character is stored in a different field. Then, saying $3 != 2 checks if the 3rd character is not 2 and, if so, the line is printed.
Or with pure bash, using Using shell parameter expansion ${parameter:offset:length}:
while IFS= read -r line
do
[ "${line:2:1}" != "2" ] && echo "$line"
done < file

'grep +A': print everything after a match [duplicate]

This question already has answers here:
How to get the part of a file after the first line that matches a regular expression
(12 answers)
Closed 7 years ago.
I have a file that contains a list of URLs. It looks like below:
file1:
http://www.google.com
http://www.bing.com
http://www.yahoo.com
http://www.baidu.com
http://www.yandex.com
....
I want to get all the records after: http://www.yahoo.com, results looks like below:
file2:
http://www.baidu.com
http://www.yandex.com
....
I know that I could use grep to find the line number of where yahoo.com lies using
grep -n 'http://www.yahoo.com' file1
3 http://www.yahoo.com
But I don't know how to get the file after line number 3. Also, I know there is a flag in grep -A print the lines after your match. However, you need to specify how many lines you want after the match. I am wondering is there something to get around that issue. Like:
Pseudocode:
grep -n 'http://www.yahoo.com' -A all file1 > file2
I know we could use the line number I got and wc -l to get the number of lines after yahoo.com, however... it feels pretty lame.
AWK
If you don't mind using AWK:
awk '/yahoo/{y=1;next}y' data.txt
This script has two parts:
/yahoo/ { y = 1; next }
y
The first part states that if we encounter a line with yahoo, we set the variable y=1, and then skip that line (the next command will jump to the next line, thus skip any further processing on the current line). Without the next command, the line yahoo will be printed.
The second part is a short hand for:
y != 0 { print }
Which means, for each line, if variable y is non-zero, we print that line. In AWK, if you refer to a variable, that variable will be created and is either zero or empty string, depending on context. Before encounter yahoo, variable y is 0, so the script does not print anything. After encounter yahoo, y is 1, so every line after that will be printed.
Sed
Or, using sed, the following will delete everything up to and including the line with yahoo:
sed '1,/yahoo/d' data.txt
This is much easier done with sed than grep. sed can apply any of its one-letter commands to an inclusive range of lines; the general syntax for this is
START , STOP COMMAND
except without any spaces. START and STOP can each be a number (meaning "line number N", starting from 1); a dollar sign (meaning "the end of the file"), or a regexp enclosed in slashes, meaning "the first line that matches this regexp". (The exact rules are slightly more complicated; the GNU sed manual has more detail.)
So, you can do what you want like so:
sed -n -e '/http:\/\/www\.yahoo\.com/,$p' file1 > file2
The -n means "don't print anything unless specifically told to", and the -e directive means "from the first appearance of a line that matches the regexp /http:\/\/www\.yahoo\.com/ to the end of the file, print."
This will include the line with http://www.yahoo.com/ on it in the output. If you want everything after that point but not that line itself, the easiest way to do that is to invert the operation:
sed -e '1,/http:\/\/www\.yahoo\.com/d' file1 > file2
which means "for line 1 through the first line matching the regexp /http:\/\/www\.yahoo\.com/, delete the line" (and then, implicitly, print everything else; note that -n is not used this time).
awk '/yahoo/ ? c++ : c' file1
Or golfed
awk '/yahoo/?c++:c' file1
Result
http://www.baidu.com
http://www.yandex.com
This is most easily done in Perl:
perl -ne 'print unless 1 .. m(http://www\.yahoo\.com)' file
In other words, print all lines that aren’t between line 1 and the first occurrence of that pattern.
Using this script:
# Get index of the "yahoo" word
index=`grep -n "yahoo" filepath | cut -d':' -f1`
# Get the total number of lines in the file
totallines=`wc -l filepath | cut -d' ' -f1`
# Subtract totallines with index
result=`expr $total - $index`
# Gives the desired output
grep -A $result "yahoo" filepath

Using bash and sed

Okay, so I'm not too great at this, but I have a bash script to pick a random number, then use sed to read lines off of files.
It's not working and I must have done something wrong. Could anyone correct my code?
I want the code to pull the line (random number) from each of those files, then output it as a single string (with spaces between).
NUMBER=$[ ( $RANDOM % 100 ) + 1 ]
sed -n NUMBER'p' /Users/user/Desktop/Street.txt
sed -n NUMBER'p' /Users/user/Desktop/City.txt
sed -n NUMBER'p' /Users/user/Desktop/State.txt
sed -n NUMBER'p' /Users/user/Desktop/Zip.txt
You probably need to use $NUMBER in your sed commands, rather than just NUMBER (or ${NUMBER} if other text is directly next to it). Example:
sed -n "${NUMBER}p" /Users/user/Desktop/Street.txt
The following script will use the same randomly chosen number to grab that line from each of the 4 input files you specified and concatenate those lines into a single variable called $outstring.
#!/bin/bash
NUMBER=$(((RANDOM % 100)+1))
for file in Street City State Zip; do
outstring+="$(sed -n "${NUMBER}p" "./${file}.txt") "
done
echo $outstring
Note: If you want (potentially) different line numbers from each of the 4 input files, then simply put the NUMBER= statement inside the for-loop.
This has the advantage of choosing from the whole of each file rather than only the first 100 lines. It will choose a different line from each file.
for f in Street City State Zip
do
printf '%s ' "$(shuf -n 1 "/Users/user/Desktop/$f.txt")"
done
printf '\n'

Resources