Split one file into multiple files based on pattern - bash

I have a binary file which I convert into a regular text file using hexdump and a few awk and sed commands. The output file looks something like this -
$cat temp
3d3d01f87347545002f1d5b2be4ee4d700010100018000cc57e5820000000000000000000
000000087d3f513000000000000000000000000000000000001001001010f000000000026
58783100b354c52658783100b43d3d0000ad6413400103231665f301010b9130194899f2f
fffffffffff02007c00dc015800a040402802f1d5b2b8ca5674504f433031000000000004
6363070000000000000000000000000065450000b4fb6b4000393d3d1116cdcc57e58287d
3f55285a1084b
The temp file has a few eye-catchers (3d3d) which don't repeat that often. They denote the start of a new binary record. I need to split the file based on those eye-catchers.
My desired output is to have multiple files (one per eye-catcher in my temp file).
So my output would look something like this -
$cat temp1
3d3d01f87347545002f1d5b2be4ee4d700010100018000cc57e582000000000000000
0000000000087d3f513000000000000000000000000000000000001001001010f00000000
002658783100b354c52658783100b4
$cat temp2
3d3d0000ad6413400103231665f301010b9130194899f2ffffffffffff02007c00dc0
15800a040402802f1d5b2b8ca5674504f4330310000000000046363070000000000000000
000000000065450000b4fb6b400039
$cat temp3
3d3d1116cdcc57e58287d3f55285a1084b

The RS variable in awk is nice for this, allowing you to define the record separator. Thus, you just need to capture each record in its own temp file. The simplest version is:
cat temp |
awk -v RS="3d3d" '{ print $0 > "temp" NR }'
The sample text starts with the eye-catcher 3d3d, so temp1 will be an empty file. Further, the eye-catcher itself won't be at the start of the temp files, as was shown for the temp files in the question. Finally, if there are a lot of records, you could run into the system limit on open files. Some minor complications will bring it closer to what you want and make it safer:
cat temp |
awk -v RS="3d3d" 'NR > 1 { print RS $0 > "temp" (NR-1); close("temp" (NR-1)) }'
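Note that a multi-character RS is a GNU awk extension; POSIX and BSD awk only honour a single-character record separator. A minimal portable sketch of the same idea, assuming the input is first flattened to a single line (the parts name is illustrative):
tr -d '\n' < temp |
awk '{ n = split($0, parts, "3d3d")
       for (i = 2; i <= n; i++) {
         f = "temp" (i - 1)
         printf "3d3d%s\n", parts[i] > f
         close(f)
       } }'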

#!/usr/bin/perl
undef $/;                          # slurp mode: read the whole input at once
$_ = <>;
$n = 0;
for $match (split(/(?=3d3d)/)) {   # split before each eye-catcher, keeping it
    open(O, '>', 'temp' . ++$n) or die "temp$n: $!";
    print O $match;
    close(O);
}
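Assuming the script is saved as split.pl (name illustrative), a typical invocation might be:
tr -d '\n' < temp | perl split.pl
Piping through tr first guards against an eye-catcher that spans a line break; the script itself reads whatever arrives on stdin in one slurp.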

This might work:
# sed 's/3d3d/\n&/2g' temp | split -dl1 - temp
# ls
temp temp00 temp01 temp02
# cat temp00
3d3d01f87347545002f1d5b2be4ee4d700010100018000cc57e5820000000000000000000000000087d3f513000000000000000000000000000000000001001001010f00000000002658783100b354c52658783100b4
# cat temp01
3d3d0000ad6413400103231665f301010b9130194899f2ffffffffffff02007c00dc015800a040402802f1d5b2b8ca5674504f4330310000000000046363070000000000000000000000000065450000b4fb6b400039
# cat temp02
3d3d1116cdcc57e58287d3f55285a1084b
EDIT:
If there are newlines in the source file, you can remove them first with tr -d '\n' <temp and then pipe the output through the above sed command. If, however, you wish to preserve them, then:
sed 's/3d3d/\n&/g;s/^\n\(3d3d\)/\1/' temp | csplit -zf temp - '/^3d3d/' {*}
should do the trick.
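For reference, csplit's -z suppresses empty output pieces and -f sets the filename prefix, so with the three-record sample you would end up with temp00, temp01 and temp02.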

Mac OS X answer
where that nice awk -v RS="pattern" trick doesn't work (stock BSD awk traditionally treats RS as a single character). Here's what I got working:
Given this example concatted.txt
filename=foo bar
foo bar line1
foo bar line2
filename=baz qux
baz qux line1
baz qux line2
use this command (strip the comments before running, or the pipeline will fail)
# cat: useless use of cat ^__^;
# tr: replace all newlines with delimiter1 (which must not occur in concatted.txt) so we have one line of all the text
# sed: insert delimiter2 (which must not occur in concatted.txt) before each file-start pattern so we know where to split out each file
# tr: replace delimiter2 with the NUL character, since sed can't emit one
# xargs: split the giant single-line input on NUL and pass 1 line (= 1 file) at a time to echo into the pipe
# sed: keep all but the last line (same as head -n -1), which is an extra produced because the stream ends in a NUL character
# awk: does a bunch of stuff as the final command. Remember it's getting a single line to work with:
# {replace all delimiter1s in the line with newlines}
# {match the regex (sets RSTART and RLENGTH), then set filename to the match. The 9 is the length of "filename=" and the 2 drops the trailing "§" (two bytes in UTF-8)}
# {write the line to filename and close the file (to avoid a "too many files open" error)}
cat concatted.txt \
| tr '\n' '§' \
| sed 's/filename=/∂filename=/g' \
| tr '∂' '\0' \
| xargs -t -0 -n1 echo \
| sed \$d \
| awk '{match($0, /filename=[^§]+§/)} {filename=substr($0, RSTART+9, RLENGTH-9-2)".txt"} {gsub(/§/, "\n", $0)} {print $0 > filename; close(filename)}'
results in these two files named foo bar.txt and baz qux.txt respectively:
filename=foo bar
foo bar line1
foo bar line2
filename=baz qux
baz qux line1
baz qux line2
Hope this helps!
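If you'd rather avoid the delimiter juggling entirely, here is a minimal BSD-awk sketch under the same assumption (every block starts with a filename= line; the 10 skips past the 9 characters of "filename="):
awk '/^filename=/ { if (out) close(out); out = substr($0, 10) ".txt" }
     { print > out }' concatted.txt
Each block, including its filename= header, ends up in the file named on that header line.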

It depends whether your temp file is a single line or not. But assuming it's a single line, you can go with:
sed 's/\(.\)\(3d3d\)/\1#\2/g' FILE | awk -F "#" '{ for (i=1; i<=NF; i++) { print $i > "temp" i } }'
The sed command inserts a # before each subsequent 3d3d as a field separator, then awk splits on # and prints every "field" to its own file.
If the input file is already split on 3d3d then you can go with:
awk '/^3d3d/ { i++ } { print > "temp" i }' temp
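(Caveat: if the first line didn't begin with 3d3d, i would still be unset, and print > "temp" i would write to a file literally named temp — the input file itself — so this relies on every record starting cleanly with the eye-catcher.)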
HTH

Related

Replace a range of numbers in a file

I would like to replace a range of numbers in a file with another range. Let's say I have:
/dev/raw/raw16
/dev/raw/raw17
/dev/raw/raw18
And I want modify them as:
/dev/raw/raw1
/dev/raw/raw2
/dev/raw/raw3
I know I can do it using sed or awk but just cannot write it correctly. What is the easiest way to do it?
awk to the rescue!
$ awk -F'/dev/raw/raw' '{print FS (++c)}' file
/dev/raw/raw1
/dev/raw/raw2
/dev/raw/raw3
I would not recommend changing device names.
Anyway, just to replace letters or numbers you could use sed's s command.
cat file.txt | sed 's/raw16/raw1/g' > newfile.txt
In this example you replace all occurrences of raw16 with raw1.
Here are some other examples ...
sed 's/foo/bar/' # replaces only the first foo in each line
sed 's/foo/bar/4' # replaces only the 4th occurrence in each line
sed 's/foo/bar/g' # replaces all foo with bar
sed 's/\(.*\)foo/\1bar/' # replaces only the last occurrence per line
# using /raw as the field separator, so $2 is the end number in this case
awk -v From=16 -v To=18 -v NewStart=1 -F '/raw' '
  # for lines where the last number is in scope
  $2 >= From && $2 <= To {
    # change the last number to the corresponding one in the new scope
    sub( /[0-9]+$/, $2 - From + NewStart )
  }
  # print the line, modified or not (any non-zero value acts as a "print everything" filter)
  7
  ' file.txt \
  > newfile.txt
Note:
adapt the field separator to your real data
this suits your sample of data; if the lines contain other elements you can easily adapt the code for your purpose

How to take multiple arguments in bash and pass them to awk?

I am writing a function in which I am replacing the leading/trailing space
from a column, and if there is no value in the column, replacing it with NULL.
The function is working fine for one column, but how can I modify it for multiple columns?
Function :
#cat trimfunction
#!/bin/bash
function trim
{
vCol=$1 ###input column name
vFile=$2 ###input file name
var3=/home/vipin/temp ###temp file
awk -v col="${vCol}" -f /home/vipin/colf.awk ${vFile} > $var3 ###operation
mv -f $var3 $vFile ###Forcefully mv
}
AWK script :
#cat colf.awk
#!/bin/awk -f
BEGIN{FS=OFS="|"}{
gsub(/^[ \t]+|[ \t]+$/, "", $col) ###trim leading/trailing space from the given column
}
{if ($col=="") {print $1,"NULL",$3} else print $0} ###replace an empty column with NULL
Input file : leading/trailing/white space in 2nd column
#cat filename.txt
1| 2016-01|00000321|12
2|2016-02 |000000432|13
3|2017-03 |000004312|54
4| |000005|32
5|2017-05|00000543|12
Script :
#cat script.sh
. /home/vipin/trimfunction
trim 2 filename.txt
Output file : leading/trailing/white space removed in 2nd column
#./script.sh
#cat filename.txt
1|2016-01|00000321|12
2|2016-02|000000432|13
3|2017-03|000004312|54
4|NULL|000005
5|2017-05|00000543|12
If the input file is like below (white/leading/trailing space in the 2nd
and 5th columns):
1|2016-01|00000321|12|2016-01 |00000
2|2016-02 |000000432|13| 2016-01|00000
3| 2017-03|000004312|54| |00000
4| |000005|2016-02|0000
5|2017-05 |00000543|12|2016-02 |0000
How do I achieve the output below (all leading/trailing space trimmed and
white space replaced with NULL in the 2nd and 5th columns) with something like
trim 2 5 filename.txt ###passing two column names as
input
1|2016-01|00000321|12|2016-01|00000
2|2016-02|000000432|13|2016-01|00000
3|2017-03|000004312|54|NULL|00000
4|NULL|000005|2016-02|0000
5|2017-05|00000543|12|2016-02|0000
This will do what you said you wanted:
$ cat tst.sh
file="${!#}"
cols=( "$@" )
unset cols[$(( $# - 1 ))]
awk -v cols="${cols[*]}" '
BEGIN {
split(cols,c)
FS=OFS="|"
}
{
for (i in c) {
gsub(/^[[:space:]]+|[[:space:]]+$/,"",$(c[i]))
sub(/^$/,"NULL",$(c[i]))
}
print
}' "$file"
$ ./tst.sh 2 5 file
1|2016-01|00000321|12|2016-01|00000
2|2016-02|000000432|13|2016-01|00000
3|2017-03|000004312|54|NULL|00000
4|NULL|000005|2016-02|0000
5|2017-05|00000543|12|2016-02|0000
but if what you REALLY wanted was to operate on ALL fields instead of specific ones then of course there's a simpler solution.
Never do cmd file > tmp; mv tmp file by the way, always do cmd file > tmp && mv tmp file instead (note the &&) so you only overwrite your original file if the command succeeded. Also - always quote your shell variables unless you have a very specific purpose in mind by not doing so and fully understand all of the implications, so use "$file", not $file. Google it.
You can pass a list of columns to modify as a parameter. Create these files:
$ cat trim.awk
BEGIN {
split(c, a)
FS = OFS = "|"
}
{
for (i in a) {
i = a[i]
gsub(/^[ \t]+|[ \t]+$/, "", $i)
if (!length($i)) $i = "NULL"
}
print
}
and
$ cat filename.txt
1|2016-01|00000321|12|2016-01 |00000
2|2016-02 |000000432|13| 2016-01|00000
3| 2017-03|000004312|54| |00000
4| |000005|2016-02|0000
5|2017-05 |00000543|12|2016-02 |0000
Usage:
awk -v c="2 5" -f trim.awk filename.txt
If managing leading/trailing spaces is all you want to do, you probably don't want all that awk code.
cat q1.txt | tr -s ' ' | sed 's/|\ |/|NULL|/g' | sed 's/\ //g' should do.
Break-down
tr -s ' ' : Squeeze multiple spaces into one
sed 's/|\ |/|NULL|/g' : Replace all "| |" with "|NULL|"
sed 's/\ //g' : Replace all spaces with empty string.
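Note that the last step removes every space on the line, so this only suits data whose fields never contain internal spaces.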

Sort alphabetically lines between 2 patterns in Bash

I'd like to alphabetically sort the lines between 2 patterns in a Bash shell script.
Given the following input file:
aaa
bbb
PATTERN1
foo
bar
baz
qux
PATTERN2
ccc
ddd
I expect as output:
aaa
bbb
PATTERN1
bar
baz
foo
qux
PATTERN2
ccc
ddd
Preferred tool is an AWK "one-liner". Sed and other solutions also accepted. It would be nice if an explanation is included.
This is a perfect case to use asort() to sort an array in GNU awk:
gawk '/PATTERN1/ {f=1; delete a}
/PATTERN2/ {f=0; n=asort(a); for (i=1;i<=n;i++) print a[i]}
!f
f{a[$0]=$0}' file
This uses similar logic to How to select lines between two marker patterns which may occur multiple times with awk/sed, with the addition that it:
Prints lines outside this range
Stores lines within this range
And when the range is over, sorts and prints them.
Detailed explanation:
/PATTERN1/ {f=1; delete a} when finding a line matching PATTERN1, sets a flag on, and clears the array of lines.
/PATTERN2/ {f=0; n=asort(a); for (i=1;i<=n;i++) print a[i]} when finding a line matching PATTERN2, sets the flag off. Also, sorts the array a[] containing all the lines in the range and print them.
!f if the flag is off (that is, outside the range), evaluate as True so that the line is printed.
f{a[$0]=$0} if the flag is on, store the line in the array a[] so that its info can be used later on.
Test
▶ gawk '/PATTERN1/ {f=1; delete a} /PATTERN2/ {f=0; n=asort(a); for (i=1;i<=n;i++) print a[i]} !f; f{a[$0]=$0}' FILE
aaa
bbb
PATTERN1
bar
baz
foo
qux
PATTERN2
ccc
ddd
You can use sed with head and tail:
{
sed '1,/^PATTERN1$/!d' FILE
sed '/^PATTERN1$/,/^PATTERN2$/!d' FILE | head -n-1 | tail -n+2 | sort
sed '/^PATTERN2$/,$!d' FILE
} > output
The first line prints everything from the 1st line to PATTERN1.
The second line takes the lines between PATTERN1 and PATTERN2, removes the last and first line, and sorts the remaining lines.
The third line prints everything from PATTERN2 to the end of the file.
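Note that head -n-1 (print all but the last line) is a GNU extension; on BSD/macOS you could use sed '$d' in its place.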
More complicated, but may ease the memory load of storing lots of lines (your cfg file would have to be pretty huge for this to matter, but nevertheless...). Using GNU awk and a sort coprocess:
gawk -v p=1 '
/^PATTERN2/ { # when we see the 2nd marker:
# close the "write" end of the pipe to sort. Then sort will know it
# has all the data and it can begin sorting
close("sort", "to");
# then sort will print out the sorted results, so read and print that
while (("sort" |& getline line) >0) print line
# and turn the boolean back to true
p=1
}
p {print} # if p is true, print the line
!p {print |& "sort"} # if p is false, send the line to `sort`
/^PATTERN1/ {p=0} # when we see the first marker, turn off printing
' FILE
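Note that the |& coprocess operator and the two-argument close() are GNU awk features; this won't run under mawk or BSD awk.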
It's a little unconventional but using Vim:
vim -c 'exe "normal /PATTERN1\<cr>jV/PATTERN2\<cr>k: ! sort\<cr>" | wq!' FILE
Where \<cr> is a carriage return, entered as CTRL-v then CTRL-M.
Further explanation:
Using vim normal mode,
/PATTERN1\<cr> - search for the first pattern
j - go to the next line
V - enter visual mode
/PATTERN2\<cr> - search for the second pattern
k - go back one line
: ! sort\<cr> - sort the visual text you just selected
wq! - save and exit
Obviously this is inferior to the GNU AWK solution, but all the same, this is a GNU sed solution:
sed '
/PATTERN1/,/PATTERN2/ {
/PATTERN1/b # branch/break if /PATTERN1/. This line is printed
/PATTERN2/ { # if /PATTERN2/,
x # swap hold and pattern spaces
s/^\n// # delete the leading newline. The first H puts it there
s/.*/sort <<< "&"/e # sort the pattern space by calling Unix sort
p # print the sorted pattern space
x # swap hold and pattern space again to retrieve PATTERN2
p # print it also
}
H # Append the pattern space to the hold space
d # delete this line for now - it will be printed in the block above
}
' FILE
Note that I rely on the e command, a GNU extension.
Testing:
▶ gsed '
/PATTERN1/,/PATTERN2/ {
/PATTERN1/b
/PATTERN2/ {
x
s/^\n//; s/.*/sort <<< "&"/ep
x
p
}
H
d
}
' FILE
aaa
bbb
PATTERN1
bar
baz
foo
qux
PATTERN2
ccc
ddd
Here is a small and easy to understand shell script for sorting lines between two patterns:
#!/bin/bash
in_file=$1
out_file=$2
temp_file_for_sort="$out_file.temp.for_sort"
curr_state=0
in_between_count=0
rm -rf "$out_file"
while IFS='' read -r line; do
    if (( curr_state == 0 )); then
        #write this line to the output
        echo "$line" >> "$out_file"
        is_start_line=`echo "$line" | grep "^PATTERN_START$"`
        if [ -z "$is_start_line" ]; then
            continue
        else
            rm -rf "$temp_file_for_sort"
            in_between_count=0
            curr_state=1
        fi
    else
        is_end_line=`echo "$line" | grep "^PATTERN_END"`
        if [ -z "$is_end_line" ]; then
            #line inside block - to be sorted
            echo "$line" >> "$temp_file_for_sort"
            in_between_count=$(( in_between_count + 1 ))
        else
            #end of block
            curr_state=0
            if (( in_between_count != 0 )); then
                sort -o "$temp_file_for_sort" "$temp_file_for_sort"
                cat "$temp_file_for_sort" >> "$out_file"
                rm -rf "$temp_file_for_sort"
            fi
            echo "$line" >> "$out_file"
        fi
    fi
done < "$in_file"
#if something remains (case when PATTERN_END is missing)
if [ -f "$temp_file_for_sort" ]; then
    cat "$temp_file_for_sort" >> "$out_file"
fi
rm -rf "$temp_file_for_sort"
Usage: <script_path> <input_file> <output_file>.
The pattern is hardcoded in the file and can be changed as required (or taken as an argument). The script also creates a temporary file to sort intermediate data (<output_file>.temp.for_sort).
Algorithm:
Start with state = 0 and read the file line by line.
In state 0, write the line to the output file, and if PATTERN_START is encountered, set state to 1.
In state 1, if the line is not PATTERN_END, write it to the temporary file.
In state 1, if the line is PATTERN_END, sort the temporary file, append its contents to the output file (and remove it), write PATTERN_END to the output file, and change state back to 0.
At the end, if something is left in the temporary file (the case when PATTERN_END is missing), write its contents to the output file.
Along the lines of the solution proposed by @choroba, using GNU sed (depends on the Q command):
{
sed -n '1,/PATTERN1/p' FILE
sed '1,/PATTERN1/d; /PATTERN2/Q' FILE | sort
sed -n '/PATTERN2/,$p' FILE
}
Explanation:
The p command prints the lines in the inclusive ranges 1 to /PATTERN1/ and /PATTERN2/ to $ ($ is the end of file) in '1,/PATTERN1/p' and '/PATTERN2/,$p' respectively.
The -n option disables the default behaviour of printing all lines, which is useful in conjunction with p.
In the middle line, the d command deletes lines 1 to /PATTERN1/, and Q (quit without printing, GNU sed only) exits on the first line matching /PATTERN2/. The lines that remain are the ones to be sorted, and are thus fed into sort.
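As with the grouped-sed answer above, the three commands print to stdout in sequence; append > output after the closing brace to capture the result in a file.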
This can also be done with non-GNU awk and the system sort command, making it work on both macOS and Linux.
awk -v SP='PATTERN1' -v EP='PATTERN2' -v cmd=sort '{
if (match($0, SP)>0) {flag=1}
else if (match($0, EP)>0) {
for (j=0;j<length(a);j++) {print a[j]|cmd}
close(cmd); delete a; i=0; flag=0}
else if (flag==1) {a[i++]=$0; next}
print $0
}' FILE
Output:
aaa
bbb
PATTERN1
bar
baz
foo
qux
PATTERN2
ccc
ddd

How to quickly delete the lines in a file that contain items from a list in another file in BASH?

I have a file called words.txt containing a list of words. I also have a file called file.txt containing a sentence per line. I need to quickly delete any lines in file.txt that contain one of the lines from words.txt, but only if the match is found somewhere between { and }.
E.g. file.txt:
Once upon a time there was a cat.
{The cat} lived in the forest.
The {cat really liked to} eat mice.
E.g. words.txt:
cat
mice
Example output:
Once upon a time there was a cat.
The other two lines are removed because "cat" is found on them and the words are also between { and }.
The following script successfully does this task:
while read -r line
do
sed -i "/{.*$line.*}/d" file.txt
done < words.txt
This script is very slow. Sometimes words.txt contains several thousand items, so the while loop takes several minutes. I attempted to use the sed -f option, which seems to allow reading a file, but I cannot find any manuals explaining how to use this.
How can I improve the speed of the script?
An awk solution:
awk 'NR==FNR{a["{[^{}]*"$0"[^{}]*}"]++;next}{for(i in a)if($0~i)next;b[j++]=$0}END{printf "">FILENAME;for(i=0;i in b;++i)print b[i]>FILENAME}' words.txt file.txt
It rewrites file.txt in place to produce the expected output.
Once upon a time there was a cat.
Uncondensed version:
awk '
NR == FNR {
a["{[^{}]*" $0 "[^{}]*}"]++
next
}
{
for (i in a)
if ($0 ~ i)
next
b[j++] = $0
}
END {
printf "" > FILENAME
for (i = 0; i in b; ++i)
print b[i] > FILENAME
}
' words.txt file.txt
If the files are expected to get so large that awk may not be able to handle them, we can only redirect the output to stdout; we may not be able to modify the file directly:
awk '
NR == FNR {
a["{[^{}]*" $0 "[^{}]*}"]++
next
}
{
for (i in a)
if ($0 ~ i)
next
}
1
' words.txt file.txt
You can use grep to match patterns from one file against another like this:
grep -vf words.txt file.txt
I think that using the grep command should be way faster. For example:
grep -f words.txt -v file.txt
The -f option makes grep use the words.txt file as the list of matching patterns.
The -v option inverts the matching, i.e. keeps the lines that do not match one of the patterns.
It doesn't solve the {} constraint, but that is easily avoidable, for example by adding the brackets to the pattern file (or to a temporary file created at runtime).
I think this should work for you:
sed -e 's/.*/{.*&.*}/' words.txt | grep -vf- file.txt > out ; mv out file.txt
This basically just transforms the contents of words.txt on the fly and uses the result as the pattern file for grep.
In pure native bash (4.x):
#!/usr/bin/env bash
# ^-- MUST be a bash shebang (bash 4+ for readarray), NOT /bin/sh
readarray -t words <words.txt # read words into array
IFS='|' # use | as delimiter when expanding $*
words_re="[{].*(${words[*]}).*[}]" # form a regex matching all words
while read -r; do # for each line in file...
if ! [[ $REPLY =~ $words_re ]]; then # ...check whether it matches...
printf '%s\n' "$REPLY" # ...and print it if not.
fi
done <file.txt
Native bash is somewhat slower than awk, but this still is a single-pass solution (O(n+m), whereas the sed -i approach was O(n*m)), making it vastly faster than any iterative approach.
You could do this in two steps:
Wrap each word in words.txt with {.* and .*}:
awk '{ print "{.*" $0 ".*}" }' words.txt > wrapped.txt
Use grep with inverse match:
grep -v -f wrapped.txt file.txt
This would be particularly useful if words.txt is very large, as a pure-awk approach (storing all the entries of words.txt in an array) would require a lot of memory.
If you would prefer a one-liner and would like to skip creating the intermediate file, you could do this:
awk '{ print "{.*" $0 ".*}" }' words.txt | grep -v -f - file.txt
The - is a placeholder which tells grep to use stdin
Update:
If the size of words.txt isn't too big, you could do the whole thing in awk:
awk 'NR==FNR{a[$0]++;next}{p=1;for(i in a){if ($0 ~ "{.*" i ".*}") { p=0; break}}}p' words.txt file.txt
expanded:
awk 'NR==FNR { a[$0]++; next }
{
p=1
for (i in a) {
if ($0 ~ "{.*" i ".*}") { p=0; break }
}
}p' words.txt file.txt
The first block builds an array containing each line in words.txt. The second block runs for every line in file.txt. A flag p controls whether the line is printed. If the line matches the pattern, p is set to false. When the p outside the last block evaluates to true, the default action occurs, which is to print the line.

Search file, show matches and first line

I've got a semicolon-separated text file, which contains the column headers in the first line:
column1;column2;colum3
foo;123;345
bar;345;23
baz;089;09
Now I want a short command that outputs the first line and the matching line(s). Is there a shorter way than:
head -n 1 file ; cat file | grep bar
This should do the job:
sed -n '1p;2,${/bar/p}' file
where:
1p will print the first line
2,$ will match from second line to the last line
/bar/p will print those lines that match bar
Note that this won't print the header line twice if there's a match in the column names.
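Applied to the sample file, that gives:
$ sed -n '1p;2,${/bar/p}' file
column1;column2;colum3
bar;345;23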
This might work for you:
cat file | awk 'NR<2;$0~v' v=baz
column1;column2;colum3
baz;089;09
Usually cat file | ... is useless but in this case it keeps the file argument out of the way and allows the variable v to be amended quickly.
Another solution:
cat file | sed -n '1p;/foo/p'
column1;column2;colum3
foo;123;345
You can use grouping commands, then pipe to column command for pretty-printing
$ { head -1; grep bar; } <input.txt | column -ts';'
column1 column2 colum3
bar 345 23
What if the first row contains bar too? Then it's printed twice with your version. awk solution:
awk 'NR == 1 { print } NR > 1 && $0 ~ "bar" { print }' FILE
If you want the search string as (almost) the last item on the command line:
awk 'ARGIND > 1 { exit } NR == 1 { print } NR > 1 && $0 ~ ARGV[2] { print }' FILE YOURSEARCHSTRING 2>/dev/null
sed solution:
sed -n '1p;1d;/bar/p' FILE
The advantage for both of them, that it's a single process.
head -n 1 file && grep bar file — maybe there is an even shorter version, but it will get a bit complicated.
EDIT: as per bobah's comment I have added && between the commands so there is only a single error for a missing file.
Here is the shortest command yet:
awk 'NR==1||/bar/' file
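Here NR==1 selects the header line and /bar/ selects the matching lines; with no action given, awk's default action is to print any line for which the pattern is true.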
