Reading a subset of the lines in a text file, with bash

I have a file
line a - this is line a
line b - this is line b
line c - this is line c
line d - this is line d
line e - this is line e
The question is: How can I output the lines starting from "line b" till "line d" using bash commands?
I mean, to obtain:
"line b - this is line b
line c - this is line c
line d - this is line d"

sed -n '/line b/,/line d/p' file

Your example is not enough to infer what you want in the general case, but assuming you want to remove the first and last line, you can simply use
tail -n+2 "$filename" | head -n-1
Here tail -n+2 prints all the lines starting from the second, and head -n-1 prints all the lines except the last.
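Note that head -n-1 (print all but the last line) is a GNU extension that is missing from BSD/macOS head; a portable sketch of the same idea using only POSIX sed:

```shell
# Delete the first line (1d) and the last line ($d).
printf 'line a\nline b\nline c\nline d\nline e\n' > /tmp/sample.txt
sed '1d;$d' /tmp/sample.txt
# line b
# line c
# line d
```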

for your set of sample data:
awk '/line b/,/line d/' file
Or
awk '/line d/{f=0;print}/line b/{f=1}f' file
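The second form uses an explicit flag f instead of a range pattern, which makes it easier to extend later; a quick self-contained run on the sample data:

```shell
# f is switched on at "line b" and off (after printing) at "line d",
# so exactly the lines from "line b" through "line d" are printed.
printf 'line %s - this is line %s\n' a a b b c c d d e e |
    awk '/line d/{f=0;print}/line b/{f=1}f'
```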

If by bash, you mean actually bash alone, I can't help you. You really should be using the right tools for the job. If you mean standard UNIX utilities that you can call from bash, I would be using awk for that.
echo 'line a - this is line a
line b - this is line b
line c - this is line c
line d - this is line d
line e - this is line e' | awk '
BEGIN {e=0}
/^line b/ {e=1}
/^line d/ {if (e==1) {print;exit}}
{if (e==1) print}
'
This outputs:
line b - this is line b
line c - this is line c
line d - this is line d
The way it works is simple.
e is the echo flag, initially set to false (0).
when you find line b, set echo to true (1) - don't print yet; printing is handled by the last rule below.
when you find line d and echo is on, print it and exit.
when echo is on, print the line (this includes line b).
I've made an assumption here that you don't want to exit on a line d unless you're already echoing. If that's wrong, move the exit outside of the if statement for line d:
/^line d/ {if (e==1) print;exit}
Then, if you get a line d before your line b, it will just exit without echoing anything.
The "/^line X/"-type clauses can be made very powerful, to match pretty well anything you can throw at them.

You can do it using bash alone, though I agree with Pax that using other tools is probably a better solution. Here's a bash-only solution:
while IFS= read -r line
do
    t=${line#line b}
    if test "$t" != "$line"
    then
        echo "$line"
        while IFS= read -r line
        do
            echo "$line"
            t=${line#line d}
            if test "$t" != "$line"
            then
                exit 0
            fi
        done
    fi
done

Another approach which depends on what you mean:
pcregrep -M 'line b - this is line b
line c - this is line c
line d - this is line d' file


Bash Square Bracket Expansion

The official bash manual states: "For example, the regular expression ‘[0123456789]’ matches any single digit, whereas ‘[^()]’ matches any single character that is not an opening or closing parenthesis." (I've copied a link at the bottom of this question for context.)
I tested every way I could think of to do the "negate" part of this, but I could not get it to work:
$ cat test
a line
b line
c line
d line
$ grep [^abc] test
a line
b line
c line
d line
$ grep '[^abc]' test
a line
b line
c line
d line
$ grep '[^(abc)]' test
a line
b line
c line
d line
$ grep [^(abc)] test
bash: syntax error near unexpected token `('
https://www.gnu.org/software/grep/manual/html_node/Character-Classes-and-Bracket-Expressions.html
I was expecting just "d line" to be shown
[^abc] matches any character which is not a, b, or c.
It matches "a line" because that line includes a space, l, i, n, and e, none of which are among the excluded characters.
To ensure that no character in the string matches any in the list you would need grep '^[^abc]*$' test
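A quick sketch contrasting the two patterns on the question's test file:

```shell
# [^abc] matches lines containing ANY character outside a/b/c;
# ^[^abc]*$ matches lines containing ONLY such characters.
printf 'a line\nb line\nc line\nd line\n' > /tmp/test
grep '[^abc]' /tmp/test      # matches all four lines (each contains l, i, n, e)
grep '^[^abc]*$' /tmp/test   # matches only "d line"
```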
To stick with the bash tag on the question, here’s a pure Bash option that takes advantage of extglob (enable it with shopt -s extglob if it isn’t already on).
while IFS= read -r line; do
    [[ "$line" = *([^abc]) ]] && printf '%s\n' "$line"
done < test

Shell separate line into multiple lines after every number

So I have a selection of text files, all of which are on one line.
I need a way to separate the line into multiple lines after every number.
At the minute I have something like this:
a 111111b 222c 3d 444444
and I need a way to get it to this
a 111111
b 222
c 3
d 444444
I have been trying to write a gawk command with a regex, but I'm not aware of a way to get this to work. (I am fairly new to shell.)
Easy with sed.
$: cat file
a 51661b 99595c 65652d 51515
$: sed -E 's/([a-z] [0-9]+)\n*/\1\n/g' file
a 51661
b 99595
c 65652
d 51515
Pretty easy with awk.
$: awk '{ print gensub("([a-z] [0-9]+)\n*", "\\1\n", "g") }' file
a 51661
b 99595
c 65652
d 51515
Could even do with bash built-ins only...but don't...
while read -r line
do while [[ "$line" =~ [a-z]\ [0-9]+ ]]
do printf "%s\n" "$BASH_REMATCH"
line=${line#$BASH_REMATCH}
done
done < file
a 51661
b 99595
c 65652
d 51515
You already have a good answer from Paul, but for sed an arguably more direct expression simply using the first two numbered backreferences separated by a newline would be:
sed -E 's/([0-9])([^0-9])/\1\n\2/g' file
Example Use/Output
In your case that would be:
$ echo "a 111111b 222c 3d 444444" | sed -E 's/([0-9])([^0-9])/\1\n\2/g'
a 111111
b 222
c 3
d 444444
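If the fields are as regular as the sample, grep -o (print only the matched parts, one match per line) gives another short sketch:

```shell
# Each match of "letter, space, digits" is printed on its own line.
echo "a 111111b 222c 3d 444444" | grep -oE '[a-z] [0-9]+'
# a 111111
# b 222
# c 3
# d 444444
```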

Coloring output of a script that overwrites lines?

I'm using this to color the output of a script/command:
commandWithOutput | sed -r 's/(pattern)/'"${COLOR_RED}"'\1'"${COLOR_DEFAULT}"'/g'
(This will color all occurrences of the string "pattern" in the command's output.) And it works fine with traditional commands. However, if the script/command overwrites lines in its output (maybe this has more to do with the terminal/console than just standard output?), e.g.:
Building project X:
CXX Building file XYZ.cpp... [123/1034]
the behavior isn't as expected. My sed will still color the output but the overwriting doesn't work anymore, i.e.:
Building project X:
CXX Building file ABC.cpp... [1/1034]
CXX Building file DEF.cpp... [2/1034]
CXX Building file GHI.cpp... [3/1034]
CXX Building file JKL.cpp... [4/1034]
CXX Building file MNO.cpp... [5/1034]
// and so on...
CXX Building file XYZ.cpp... [123/1034]
Is there a way to color the output of a script/command that overwrites lines?
I've tried several different ideas ... IFS=$'\r' + OP's sed command ... trying to use an intermediate pipe (mkfifo) for processing the output from commandWithOutput ... a few attempts at trying to unbuffer stdout and/or stdin ... but (so far) could only get an awk solution to work, so fwiw ...
NOTE: I'm assuming OP's command is generating a \r when overwriting a line; if this is not the case the OP can try piping their command's output to | od -c to see what character is at the 'end of the line', with the idea being to use said character in place of my \r references (below).
First we'll write a small script to generate some data, (re)printing over the first few lines, and then printing some 'standalone' lines:
$ cat overwrite
#!/usr/bin/bash
for (( i=1 ; i<="${1}" ; i++ ))
do
printf "this is a test ... ${i}\r"
sleep 1
done
printf "\nanother test output \t and a tab\n"
echo "X."
Running the above generates the following output:
$ overwrite 3
this is a test ... 3 << this line is actually printed 3 times, with suffixes of '1', '2' and '3'
another test output and a tab
X.
Running this through od shows the \r at the end of the first 3 lines:
$ overwrite 3 | od -c
0000000   t   h   i   s       i   s       a       t   e   s   t       .
0000020   .   .       1  \r   t   h   i   s       i   s       a       t
0000040   e   s   t       .   .   .       2  \r   t   h   i   s       i
0000060   s       a       t   e   s   t       .   .   .       3  \r  \n
0000100   a   n   o   t   h   e   r       t   e   s   t       o   u   t
0000120   p   u   t      \t       a   n   d       a       t   a   b  \n
0000140   X   .  \n
0000143
We'll now look at one awk solution for recoloring a specific pattern in the output from our overwrite script ...
First we'll define the start and clear/reset variables for our desired color; for this exercise I'm going to use 'red':
$ myred=$(tput setaf 1) # set our highlight color to red
$ myreset=$(tput sgr0) # disable coloring
NOTE: There are a few ways to define these colors (and the disable/reset); I'll leave that up to the reader to pick what works best in their environment.
Here's one awk solution I found that works:
$ overwrite 3 | awk -v ptn="test" -v cstart="${myred}" -v creset="${myreset}" -v RS="[\n\r]" '{ sub(ptn,cstart ptn creset) ; printf "%s%s", $0, RT }'
Where:
-v ptn="test" - we want to recolor all instances of the string test; we'll pass this in as awk variable ptn
-v cstart="${myred}" - assign our highlight color code (red) to our awk variable cstart
-v creset="${myreset}" - assign our color clear/reset code to the awk variable creset
-v RS="[\n\r]" - redefine our input record separator as either \r or \n (a regex-valued RS is a GNU awk extension)
sub(ptn,cstart ptn creset) - replace the first instance of test on each record with <red> + test + <reset> (use gsub to replace all instances)
printf "%s%s", $0, RT - print our new record; RT (also GNU awk) holds the same separator that RS matched for this record, and the "%s" format keeps any % in the data from being misread as a format specifier
Running the above generates:
this is a test ... 3 << this line is actually printed 3 times, with suffixes of '1', '2' and '3', and the 'test' string printed in red
another test output and a tab << the 'test' string is printed in red
X.

sed/awk between two patterns in a file: pattern 1 set by a variable from lines of a second file; pattern 2 designated by a specified character

I have two files. One file contains a pattern that I want to match in a second file. I want to use that pattern to print between that pattern (included) up to a specified character (not included) and then concatenate into a single output file.
For instance,
File_1:
a
c
d
and File_2:
>a
MEEL
>b
MLPK
>c
MEHL
>d
MLWL
>e
MTNH
I have been using variations of this loop:
while read $id;
do
sed -n "/>$id/,/>/{//!p;}" File_2;
done < File_1
hoping to obtain something like the following output:
>a
MEEL
>c
MEHL
>d
MLWL
But have had no such luck. I have played around with grep/fgrep awk and sed and between the three cannot seem to get the right (or any output). Would someone kindly point me in the right direction?
Try:
$ awk -F'>' 'FNR==NR{a[$1]; next} NF==2{f=$2 in a} f' file1 file2
>a
MEEL
>c
MEHL
>d
MLWL
How it works
-F'>'
This sets the field separator to >.
FNR==NR{a[$1]; next}
While reading in the first file, this creates a key in array a for every line in file1.
NF==2{f=$2 in a}
For every line in file 2 that has two fields, this sets variable f to true if the second field is a key in a or false if it is not.
f
If f is true, print the line.
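The whole pipeline can be checked end to end with a quick self-contained run (temporary copies of the two sample files):

```shell
printf 'a\nc\nd\n' > /tmp/file1
printf '>a\nMEEL\n>b\nMLPK\n>c\nMEHL\n>d\nMLWL\n>e\nMTNH\n' > /tmp/file2
# Header lines split on ">" into an empty $1 and the id in $2,
# so NF==2 identifies headers and f stays set for the sequence lines.
awk -F'>' 'FNR==NR{a[$1]; next} NF==2{f=$2 in a} f' /tmp/file1 /tmp/file2
# >a
# MEEL
# >c
# MEHL
# >d
# MLWL
```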
A plain (GNU) sed solution. Files are read only once. It is assumed that the characters in File_1 don't need to be quoted in a sed expression.
pat=$(sed ':a; $!{N;ba;}; y/\n/|/' File_1)
sed -E -n ":a; /^>($pat)/{:b; p; n; /^>/ba; bb}" File_2
Explanation:
The first call to sed generates a regular expression to be used in the second call to sed and stores it in the variable pat. The aim is to avoid reading the entire File_2 repeatedly for each line of File_1. It just "slurps" File_1 and replaces new-line characters with | characters. So the sample File_1 becomes a string with the value a|c|d. The regular expression a|c|d matches if at least one of the alternatives (a, c, d for this example) matches (this is a GNU sed extension).
The second sed expression, ":a; /^>($pat)/{:b; p; n; /^>/ba; bb}", could be converted to pseudo code like this:
begin:
read next line (from File_2) or quit on end-of-file
label_a:
if line begins with `>` followed by one of the alternatives in `pat` then
label_b:
print the line
read next line (from File_2) or quit on end-of-file
if line begins with `>` goto label_a else goto label_b
else goto begin
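Putting the two calls together on the sample files (GNU sed assumed, as noted above):

```shell
printf 'a\nc\nd\n' > /tmp/File_1
printf '>a\nMEEL\n>b\nMLPK\n>c\nMEHL\n>d\nMLWL\n>e\nMTNH\n' > /tmp/File_2
pat=$(sed ':a; $!{N;ba;}; y/\n/|/' /tmp/File_1)   # pat becomes a|c|d
sed -E -n ":a; /^>($pat)/{:b; p; n; /^>/ba; bb}" /tmp/File_2
# >a
# MEEL
# >c
# MEHL
# >d
# MLWL
```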
Let me try to explain why your approach does not work well:
You need to say while read id instead of while read $id.
The sed command />$id/,/>/{//!p;} will exclude the lines which start
with >.
Then you might want to say something like:
while read id; do
sed -n "/^>$id/{N;p}" File_2
done < File_1
Output:
>a
MEEL
>c
MEHL
>d
MLWL
But the code above is inefficient because it reads File_2 as many times as the count of the id's in File_1.
Please try the elegant solution by John1024 instead.
If ed is available, and since the shell is involved:
#!/usr/bin/env bash
mapfile -t to_match < file1.txt
ed -s file2.txt <<-EOF
g/\(^>[${to_match[*]}]\)/;/^>/-1p
q
EOF
It runs ed only once, not once per matching pattern from file1. Say file1 contains a through z: ed will still not run 26 times.
Requires bash4+ because of mapfile.
How it works
mapfile -t to_match < file1.txt
Saves the entry/value from file1 in an array named to_match
ed -s file2.txt points ed at file2; the -s flag means don't print info about the file (the same info you get with wc file)
<<-EOF A here document, shell syntax.
g/\(^>[${to_match[*]}]\)/;/^>/-1p
g means search the whole file aka global.
( ) capture group, it needs escaping because ed only supports BRE, basic regular expression.
^> If line starts with a > the ^ is an anchor which means the start.
[ ] is a bracket expression match whatever is inside of it, in this case the value of the array "${to_match[*]}"
; Include the next address/pattern
/^>/ Match a leading >
-1 go back one line after the pattern match.
p print whatever was matched by the pattern.
q quit ed

Bash merge file lines

I have a file that holds output from a test.
test 1
42
test 2
69
test 3
420
test 4
55378008
I would like to make the test output appear on the same line as the test name. like so:
test 1: 42
test 2: 69
test 3: 420
test 4: 55378008
I am sure there is some fancy sed, awk or perl way to do this but I am stuck.
And here is another one in sed flavor to complete the set:
sed 'N ; s/\n/: /' input_file
For each (odd) line starting from the first, append the next (even) one in pattern space separated by a LF, then just replace this LF by :.
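For example, on the first two pairs from the question:

```shell
# N appends the next input line to the pattern space with an embedded \n;
# the substitution then turns that \n into ": ".
printf 'test 1\n42\ntest 2\n69\n' | sed 'N ; s/\n/: /'
# test 1: 42
# test 2: 69
```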
awk 'FNR%2{printf "%s: ", $0; next}1' file
This prints odd lines with suffix : and without newline and even lines with a newline.
pr has this built-in, but if you need whitespace adjustment as well, then sed/awk/perl solutions suggested in other answers will suit you better
$ pr -2ats': ' ip.txt
test 1: 42
test 2: 69
test 3: 420
test 4: 55378008
This combines 2 lines at a time with : as the separator.
Just replace the line feed of odd lines with :␠.
perl -pe's/\n/: / if $. % 2'
You have mentioned that you want to removing leading and trailing whitespace as well. For that, you can use the following:
perl -pe's/^\h+|\h+$//g; s/\n/: / if $. % 2'
Specifying file to process to Perl one-liner
A shell solution, which is very slow on a large set of data/files.
while IFS= read -r odd_line; do
    IFS= read -r even_line
    printf '%s: %s\n' "$odd_line" "$even_line"
done < file.txt
On the other hand if the colon is not a requirement paste can do the job.
paste - - < file.txt
Bash solution
skips empty lines
process both UNIX/DOS format 'end of line'
accepts filename as argument or otherwise reads data from STDIN
#!/bin/bash
while read -r p1
do
    [[ -z $p1 ]] && continue
    # p1=`echo -n $p1 | tr -d "\r"` # replaced with following line
    p1=${p1//$'\r'/}
    read -r p2
    echo "$p1: $p2"
done < "${1:-/dev/stdin}"
Output
test 1: 42
test 2: 69
test 3: 420
test 4: 55378008
NOTE: no empty lines allowed between lines for join
