Insert text before a certain line using Bash

How can I insert a set of lines (about 5) into a file at the first place a string is found?
For example:
BestAnimals.txt
dog
cat
dolphin
cat
$ "Insert giraffe to BestAnimals.txt before cat" > NewBestAnimals.txt
NewBestAnimals.txt
dog
giraffe
cat
dolphin
cat

If using GNU sed:
$ cat animals
dog
cat
dolphin
cat
$ sed "/cat/ { N; s/cat\n/giraffe\n&/ }" animals
dog
giraffe
cat
dolphin
cat
match a line containing cat (/cat/)
append the next line to the pattern space (N)
substitute the matched pattern with the insertion followed by the matched string, where & represents the matched string.

awk -v insert=giraffe -v before=cat '
    $1 == before && !inserted {
        print insert
        inserted++
    }
    {print}
' BestAnimals.txt > NewBestAnimals.txt

If you know (or somehow find out) the line number:
sed -n '/cat/=' BestAnimals.txt
You can use sed:
sed -i '2i giraffe' BestAnimals.txt
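The two steps can be combined in a small script (file name from the question; GNU sed assumed, both for -i and for the one-line form of the i command), quitting after the first match so only the first cat counts:

```shell
# sample file from the question
printf 'dog\ncat\ndolphin\ncat\n' > BestAnimals.txt

# line number of the FIRST match only; q quits so later matches are ignored
line=$(sed -n '/cat/{=;q}' BestAnimals.txt)

# insert before that line, in place (GNU sed)
sed -i "${line}i giraffe" BestAnimals.txt
cat BestAnimals.txt
```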

An awk solution:
awk '/cat/ && c == 0 {c = 1; print "giraffe"}; {print}' \
BestAnimals.txt
If the animals you want to insert are in "MyOtherBestAnimals.txt" you can also do
awk '/cat/ && c == 0 {c = 1; system("cat MyOtherBestAnimals.txt") }; {print} ' \
BestAnimals.txt
This answer can basically be broken down as follows, because ; separates the awk condition-action pairs:
/cat/ && c == 0 { c = 1; ... } sets c to 1 at the first row containing cat. The commands put at the ... are then executed, but only once, because c is 1 now.
{print} is the action print with no condition: prints any input line. This is done after the above condition-action pair.
Depending on what is actually at the ..., giraffe is printed, or the contents of "MyOtherBestAnimals.txt" is sent to the standard output, before printing the first line containing "cat".
Edit
After analysis of @glenn jackman's solution, it seems this solution can still be improved: when using the input file
nyan cat
cat
the data is inserted before nyan cat and not before the line equal to cat. The solution is then to require the full line to be equal to cat:
awk '$0 == "cat" && c == 0 {c = 1; print "giraffe"}; {print}' \
BestAnimals.txt
for the insertion of a single line and
awk '$0 == "cat" && c == 0 {c = 1; system("cat MyOtherBestAnimals.txt") }; {print} ' \
BestAnimals.txt
for the insertion of a file

I would:
Use grep to find the line number of the first match
Use head to get the text leading up to the match
Insert the new content using cat
Use tail to get the lines after the match
It's neither quick, efficient nor elegant. But it's pretty straightforward, and as long as the file isn't gigantic and you don't need to do this many times a second, it should be fine.
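A sketch of those four steps, using the file names from the question and assuming the new content lives in MyOtherBestAnimals.txt:

```shell
# sample inputs (names are the question's, contents hypothetical)
printf 'dog\ncat\ndolphin\ncat\n' > BestAnimals.txt
printf 'giraffe\n' > MyOtherBestAnimals.txt

# 1. line number of the first match; -m1 stops grep at the first hit
line=$(grep -n -m1 'cat' BestAnimals.txt | cut -d: -f1)

{
  head -n "$((line - 1))" BestAnimals.txt   # 2. everything before the match
  cat MyOtherBestAnimals.txt                # 3. the inserted content
  tail -n "+$line" BestAnimals.txt          # 4. the match and everything after
} > NewBestAnimals.txt
```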

Related

Computing the size of array in text file in bash

I have a text file that sometimes (not always) will have an array with a unique name, like this:
unique_array=(1,2,3,4,5,6)
I would like to find the size of the array (6 in the above example) when it exists, and skip it or return -1 if it doesn't exist.
grepping the file will tell me if the array exists but not how to find its size.
The array can fill multiple lines like
unique_array=(1,2,3,
4,5,6,
7,8,9,10)
Some of the elements in the array can be negative as in
unique_array=(1,2,-3,
4,5,6,
7,8,-9,10)
awk -v RS=\) -F, '/unique_array=\(/ {print /[0-9]/?NF:0}' file.txt
-v RS=\) - delimit records by ) instead of newlines
-F, - delimit fields by , instead of whitespace
/unique_array=\(/ - look for a record containing the unique identifier
/[0-9]/?NF:0 - if the record contains a digit, print the number of fields (i.e. commas + 1), otherwise 0
There is a bad bug in the code above: commas preceding the array may be erroneously counted. A fix is to truncate the prefix:
awk -v RS=\) -F, 'sub(/.*unique_array=\(/,"") {print /[0-9]/?NF:0}' file.txt
Your specifications are woefully incomplete, but guessing a bit as to what you are actually looking for, try this at least as a starting point.
awk '/^unique_array=\(/ { in_array = 1; sub(/^unique_array=\(/, "") }
     in_array && /\)/   { sub(/\).*/, ""); quit = 1 }
     in_array { if ($0 != "") { m = split($0, arr, ","); if (arr[m] == "") m--; n += m }
                if (quit) { print n; in_array = quit = n = 0 } }' file
We keep a state variable in_array which tells us whether we are currently in the region that contains the array. It is set to 1 when we see the beginning of the array (whose prefix we strip off), and back to 0 when we see the closing parenthesis. At that point, we remove the closing parenthesis and everything after it, and set a second variable quit to trigger the finishing logic in the next condition. The last condition performs two tasks: it adds the items from this line to the count in n (discounting the empty field that a trailing comma would otherwise produce), and then checks whether quit is true; if it is, we are at the end of the array and print the number of elements.
This will simply print nothing if the array was not found. You could embellish the script to set a different exit code or print -1 if you like, but these details seem like unnecessary complications for a simple script.
I think what you probably want is this, using GNU awk for multi-char RS and RT and word boundaries:
$ awk -v RS='\\<unique_array=[(][^)]*[)]' 'RT{exit} END{print (RT ? gsub(/,/,"",RT)+1 : -1)}' file
With your shown samples please try following awk.
awk -v RS= '
{
    while (match($0, /\<unique_array=[(][^)]*\)/)) {
        line = substr($0, RSTART, RLENGTH)
        gsub(/[[:space:]]*\n[[:space:]]*|(^|\n)unique_array=\(|(\)$|\)\n)/, "", line)
        print gsub(/,/, "&", line) + 1
        $0 = substr($0, RSTART + RLENGTH)
    }
}
' Input_file
Using sed and declare -a. The test file is like this:
$ cat f
saa
dfsaf
sdgdsag unique_array=(1,2,3,
4,5,6,
7,8,9,10) sdfgadfg
sdgs
sdgs
sfsaf(sdg)
Testing:
$ declare -a "$(sed -n '/unique_array=(/,/)/s/,/ /gp' f | \
sed 's/.*\(unique_array\)/\1/;s/).*/)/;
s/`.*`//g')"
$ echo ${unique_array[@]}
1 2 3 4 5 6 7 8 9 10
And then you can do whatever you want with ${unique_array[@]}
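Since the question ultimately asks for the array's size, note that once the array is populated, the size is just ${#unique_array[@]}:

```shell
unique_array=(1 2 3 4 5 6 7 8 9 10)   # as produced by the declare -a pipeline above
echo "${#unique_array[@]}"            # number of elements: 10
```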
With GNU grep or similar that support -z and -o options:
grep -zo 'unique_array=([^)]*)' file.txt | tr -dc =, | wc -c
-z - (effectively) treat file as a single line
-o - only output the match
tr -dc =, - strip everything except = and ,
wc -c - count the result
Note: both one- and zero-element arrays will be treated as being size 1. Will return 0 rather than -1 if not found.
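For example, on the multi-line sample from the question (hypothetical file name):

```shell
# the question's multi-line sample, written to a scratch file
printf 'unique_array=(1,2,-3,\n4,5,6,\n7,8,-9,10)\n' > file.txt

grep -zo 'unique_array=([^)]*)' file.txt | tr -dc =, | wc -c
# nine commas plus one '=' survive tr, so wc -c reports 10, the element count
```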
Here's an awk solution that works with gawk, mawk 1/2, and nawk:
TEST INPUT
saa
dfsaf
sdgdsag unique_array=(1,2,3,
4,5,6,
7,8,9,10) sdfgadfg
sdgs
sdgs
sfsaf(sdg)
CODE
{m,n,g}awk '
BEGIN { __ = "-1:_ERR_NOT_FOUND_"
RS = "^$" (_ = OFS = "")
FS = "(^|[ \t-\r]?)unique[_]array[=][(]"
___ = "[)].*$|[^0-9,.+-]"
} $!NF = NR < NF ? $(gsub(___,_)*_) : __'
OUTPUT
1,2,3,4,5,6,7,8,9,10

Bash: Separating a file by blank lines and assigning to a list

So I have a file, for example:
a

b
c

d
I'd like to make a list of the data lines out of this, with the empty line as the separator. So the above file's list would be:
First element = a
Second element = b
c
Third element = d
Replace blank lines with a comma, then remove the newline characters:
cat <file> | sed 's/^$/, /' | tr -d '\n'
The following awk would do:
awk 'BEGIN{RS="";ORS=",";FS="\n";OFS=""}($1=$1)' file
This adds an extra , at the end. You can get rid of that in the following way:
awk 'BEGIN{RS="";ORS=",";FS="\n";OFS=""}
{$1=$1;s=s $0 ORS}END{sub(ORS"$","",s); print s}' file
However, by making this slight modification to eliminate the last ORS (i.e. the comma), you now have to store the full thing in memory. So you could instead do it in a more boring and less elegant way by storing the full file in memory:
awk '{s=s $0}END{gsub(/\n\n/,",",s);gsub(/\n/,"",s); print s}' file
The following sed does exactly the same. Store the full file in memory and process it.
sed ':a;N;$!ba;s/\n\n/,/g;s/\n//g' <file>
There is, however, a way to play it a bit more clever with awk.
awk 'BEGIN{RS=OFS="";FS="\n"}{$1=$1; print (NR>1?",":"")$0}' file
It depends on what you need to do with that data.
With perl, you have a one-liner:
$ perl -00 -lnE 'say "element $. = $_"' file.txt
element 1 = a
element 2 = b
c
element 3 = d
But clearly you need to process the elements in some way, and I suspect Perl is not your cup of tea.
With bash you could do:
elements=()
n=0
while IFS= read -r line; do
[[ $line ]] && elements[n]+="$line"$'\n' || ((n++))
done < file.txt
# strip the trailing newline from each element
elements=("${elements[@]/%$'\n'/}")
# and show what's in the array
declare -p elements
declare -a elements='([0]="a" [1]="b
c" [2]="d")'
$ awk -v RS= '{print "Element " NR " = " $0}' file
Element 1 = a
Element 2 = b
c
Element 3 = d
If you really want to say First Element instead of Element 1 then enjoy the exercise :-).
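For completeness, one way to do that exercise is to hard-code the ordinals in a small lookup table (only the first five are covered in this sketch):

```shell
printf 'a\n\nb\nc\n\nd\n' > file   # the question's blank-line-separated sample

awk -v RS= 'BEGIN { split("First Second Third Fourth Fifth", ord, " ") }
            { print ord[NR] " element = " $0 }' file
```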

Including empty lines using pattern

My problem is the following: I have a text file with no empty lines. I would now like to add empty lines according to a pattern file, where 1 means print the line as-is and 0 means insert an empty line. My text file is:
apple
banana
orange
milk
bread
The pattern file is:
1
1
0
1
0
1
1
The desired output, correspondingly:
apple
banana

orange

milk
bread
What I tried is:
for i in $(cat pattern_file);
do
awk -v var=$i '{if (var==1) {print $0} else {printf "\n"}}' file;
done
But it prints all the lines first, and only after that it changes $i
Thanks for any prompts.
Read the pattern file into an array, then use that array when processing the text file.
awk 'NR==FNR { newlines[NR] = $0; next}
{ print $0 (newlines[FNR] ? "" : "\n") }' patternfile textfile
This variant allows multiple 0s between 1s. Self-documented code:
awk '# for file 1 only
NR==FNR {
    # load an array with 0 and 1 (reversed, because the default value of a non-existing element is 0)
    n[NR] = !$1
    # cycle to the next line (do not go further in the script for this line)
    next
}
# at each line (of file 2, due to the next in the last block)
{
    # loop while the (next, due to a++) element of the array is 1
    for (a++; n[a] == 1; a++) {
        # print an empty line
        printf "\n"
    }
    # print the original line
    print
}' pattern YourFile
The value is inverted to avoid printing endless empty lines after the last line in case the pattern file contains fewer entries than the data file has lines (a missing array element evaluates to 0).
Multiple 0s need a loop plus a test.
A mismatch between the pattern file and the data file is a problem when using a direct array lookup (unless the array keeps track of how many newlines to insert, which is another way of doing it).
This is a bit of a hack, but I present it as an alternative to your traditionally awk-y solutions:
paste -d, file.txt <(cat pattern | tr '\n' ' ' | sed 's,1 0,10,g' | tr ' ' '\n' | tr -d '1') | tr '0' '\n' | tr -d ','
The output looks like this:
apple
banana

orange

milk
bread
Inverse of Barmar's, read the text into an array and then print as you process the pattern:
$ awk 'NR==FNR {fruit[NR]=$0; next} {print $0?fruit[++i]:""}' fruit.txt pattern.txt
apple
banana

orange

milk
bread
For an answer using only bash:
i=0; mapfile file < file
for p in $(<pattern); do
((p)) && printf "%s" "${file[i++]}" || echo
done

Sort alphabetically lines between 2 patterns in Bash

I'd like to alphabetically sort lines between 2 patterns in a Bash shell script.
Given the following input file:
aaa
bbb
PATTERN1
foo
bar
baz
qux
PATTERN2
ccc
ddd
I expect as output:
aaa
bbb
PATTERN1
bar
baz
foo
qux
PATTERN2
ccc
ddd
Preferred tool is an AWK "one-liner". Sed and other solutions also accepted. It would be nice if an explanation is included.
This is a perfect case to use asort() to sort an array in GNU awk:
gawk '/PATTERN1/ {f=1; delete a}
/PATTERN2/ {f=0; n=asort(a); for (i=1;i<=n;i++) print a[i]}
!f
f{a[$0]=$0}' file
This uses a similar logic as How to select lines between two marker patterns which may occur multiple times with awk/sed with the addition that it:
Prints lines outside this range
Stores lines within this range
And when the range is over, sorts and prints them.
Detailed explanation:
/PATTERN1/ {f=1; delete a} when finding a line matching PATTERN1, sets a flag on, and clears the array of lines.
/PATTERN2/ {f=0; n=asort(a); for (i=1;i<=n;i++) print a[i]} when finding a line matching PATTERN2, sets the flag off. Also, sorts the array a[] containing all the lines in the range and print them.
!f if the flag is off (that is, outside the range), evaluate as True so that the line is printed.
f{a[$0]=$0} if the flag is on, store the line in the array a[] so that its info can be used later on.
Test
▶ gawk '/PATTERN1/ {f=1} /PATTERN2/ {f=0; n=asort(a); for (i=1;i<=n;i++) print a[i]} !f; f{a[$0]=$0}' FILE
aaa
bbb
PATTERN1
bar
baz
foo
qux
PATTERN2
ccc
ddd
You can use sed with head and tail:
{
sed '1,/^PATTERN1$/!d' FILE
sed '/^PATTERN1$/,/^PATTERN2$/!d' FILE | head -n-1 | tail -n+2 | sort
sed '/^PATTERN2$/,$!d' FILE
} > output
The first line prints everything from the 1st line to PATTERN1.
The second line takes the lines between PATTERN1 and PATTERN2, removes the last and first line, and sorts the remaining lines.
The third line prints everything from PATTERN2 to the end of the file.
More complicated, but may ease the memory load of storing lots of lines (your cfg file would have to be pretty huge for this to matter, but nevertheless...). Using GNU awk and a sort coprocess:
gawk -v p=1 '
/^PATTERN2/ { # when we see the 2nd marker:
# close the "write" end of the pipe to sort. Then sort will know it
# has all the data and it can begin sorting
close("sort", "to");
# then sort will print out the sorted results, so read and print that
while (("sort" |& getline line) >0) print line
# and turn the boolean back to true
p=1
}
p {print} # if p is true, print the line
!p {print |& "sort"} # if p is false, send the line to `sort`
/^PATTERN1/ {p=0} # when we see the first marker, turn off printing
' FILE
It's a little unconventional but using Vim:
vim -c 'exe "normal /PATTERN1\<cr>jV/PATTERN2\<cr>k: ! sort\<cr>" | wq!' FILE
Where \<cr> is a carriage return, entered as CTRL-v then CTRL-M.
Further explanation:
Using vim normal mode,
/PATTERN1\<cr> - search for the first pattern
j - go to the next line
V - enter visual mode
/PATTERN2\<cr> - search for the second pattern
k - go back one line
: ! sort\<cr> - sort the visual text you just selected
wq! - save and exit
Obviously this is inferior to the GNU AWK solution, but all the same, this is a GNU sed solution:
sed '
/PATTERN1/,/PATTERN2/ {
/PATTERN1/b # branch/break if /PATTERN1/. This line is printed
/PATTERN2/ { # if /PATTERN2/,
x # swap hold and pattern spaces
s/^\n// # delete the leading newline. The first H puts it there
s/.*/sort <<< "&"/e # sort the pattern space by calling Unix sort
p # print the sorted pattern space
x # swap hold and pattern space again to retrieve PATTERN2
p # print it also
}
H # Append the pattern space to the hold space
d # delete this line for now - it will be printed in the block above
}
' FILE
Note that I rely on the e command, a GNU extension.
Testing:
▶ gsed '
/PATTERN1/,/PATTERN2/ {
/PATTERN1/b
/PATTERN2/ {
x
s/^\n//; s/.*/sort <<< "&"/ep
x
p
}
H
d
}
' FILE
aaa
bbb
PATTERN1
bar
baz
foo
qux
PATTERN2
ccc
ddd
Here is a small and easy to understand shell script for sorting lines between two patterns:
#!/bin/sh
in_file=$1
out_file=$2
temp_file_for_sort="$out_file.temp.for_sort"
curr_state=0
in_between_count=0
rm -rf $out_file
while IFS='' read -r line; do
if (( $curr_state == 0 )); then
#write this line to output
echo "$line" >> $out_file
is_start_line=`echo $line | grep "^PATTERN_START$"`
if [ -z "$is_start_line" ]; then
continue
else
rm -rf $temp_file_for_sort
in_between_count=0
curr_state=1
fi
else
is_end_line=`echo "$line" | grep "^PATTERN_END$"`
if [ -z "$is_end_line" ]; then
#Line inside block - to be sorted
echo "$line" >> $temp_file_for_sort
in_between_count=$(( $in_between_count +1 ))
else
#End of block
curr_state=0
if (( $in_between_count != 0 )); then
sort -o $temp_file_for_sort $temp_file_for_sort
cat $temp_file_for_sort >> $out_file
rm -rf $temp_file_for_sort
fi
echo "$line" >> $out_file
fi
fi
done < $in_file
#if something remains
if [ -f $temp_file_for_sort ]; then
cat $temp_file_for_sort >> $out_file
fi
rm -rf $temp_file_for_sort
Usage: <script_path> <input_file> <output_file>.
Pattern is hardcoded in file, can be changed as required (or taken as argument). Also, it creates a temporary file to sort intermediate data (<output_file>.temp.for_sort)
Algorithm:
Start with state = 0 and read the file line by line.
In state 0, line is written to output file and if START_PATTERN is encountered, state is set to 1.
In state 1, if line is not STOP_PATTERN, write line to temporary file
In state 1, if line is STOP_PATTERN, sort temporary file, append temporary file contents to output file (and remove temporary file) and write STOP_PATTERN to output file. Also, change state to 0.
At last if something is left in temporary file (case when STOP_PATTERN is missing), write contents of temporary file to output file
Along the lines of the solution proposed by @choroba, using GNU sed (depends on the Q command):
{
sed -n '1,/PATTERN1/p' FILE
sed '1,/PATTERN1/d; /PATTERN2/Q' FILE | sort
sed -n '/PATTERN2/,$p' FILE
}
Explanation:
Use of the p command prints the lines in the ranges 1 to /PATTERN1/ inclusive and /PATTERN2/ to $ ($ is the end of file), in '1,/PATTERN1/p' and '/PATTERN2/,$p' respectively.
Use of -n disables default behaviour of printing all lines. Useful in conjunction with p.
In the middle line, the d command is used to delete lines 1 to the /PATTERN1/ and also to Q (quit without printing, GNU sed only) on the first line matching /PATTERN2/. These are the lines to be sorted, and are thus fed into sort.
This can also be done with non-GNU awk and the system sort command, making it work on both macOS and Linux.
awk -v SP='PATTERN1' -v EP='PATTERN2' -v cmd=sort '{
if (match($0, SP)>0) {flag=1}
else if (match($0, EP)>0) {
for (j=0;j<length(a);j++) {print a[j]|cmd}
close(cmd); delete a; i=0; flag=0}
else if (flag==1) {a[i++]=$0; next}
print $0
}' FILE
Output:
aaa
bbb
PATTERN1
bar
baz
foo
qux
PATTERN2
ccc
ddd

Grab nth occurrence in between two patterns using awk or sed

I have an issue where I want to parse through the output from a file, and I want to grab the nth occurrence of text between two patterns, preferably using awk or sed.
category
1
s
t
done
category
2
n
d
done
category
3
r
d
done
category
4
t
h
done
Let's just say for this example I want to grab the third occurrence of text in between category and done, essentially the output would be
category
3
r
d
done
This might work for you (GNU sed):
sed -n '/category/{:a;N;/done/!ba;x;s/^/x/;/^x\{3\}$/{x;p;q};x}' file
Turn off automatic printing by using the -n option. Gather up lines between category and done. Store a counter in the hold space and when it reaches 3 print the collection in the pattern space and quit.
Or if you prefer awk:
awk '/^category/,/^done/{if(++m==1)n++;if(n==3)print;if(/^done/)m=0}' file
Try doing this :
awk -v n=3 '/^category/{l++} (l==n){print}' file.txt
Or more cryptic :
awk -v n=3 '/^category/{l++} l==n' file.txt
If your file is big :
awk -v n=3 '/^category/{l++} l>n{exit} l==n' file.txt
If your file doesn't contain any null characters, here's one way using GNU sed. This will find the third occurrence of a pattern range. However, you can easily modify this to get any occurrence you'd like.
sed -n '/^category/ { x; s/^/\x0/; /^\x0\{3\}$/ { x; :a; p; /done/q; n; ba }; x }' file.txt
Results:
category
3
r
d
done
Explanation:
Turn off default printing with the -n switch. Match the word 'category' at the start of a line. Swap the pattern space with the hold space and append a null character to the start of the pattern. In the example, if the pattern then contains three leading null characters (i.e. this is the third occurrence), pull the pattern out of the hold space. Now create a loop and print the contents of the pattern space until the last pattern is matched. When this last pattern is found, sed will quit. If it's not found, sed will read the next line of input and continue in its loop.
awk -v tgt=3 '
/^category$/ { fnd=1; rec="" }
fnd {
rec = rec $0 ORS
if (/^done$/) {
if (++cnt == tgt) {
printf "%s",rec
exit
}
fnd = 0
}
}
' file
With GNU awk you can set the record separator to a regular expression:
<file awk 'NR==n+1 { print rt, $0 } { rt = RT }' RS='\\<category' ORS='' n=3
Output:
category
3
r
d
done
RT is the matched record separator. Note that the record relative to n will be off by one as the first record refers to what precedes the first RS.
Edit
As per Ed's comment, this will not work when the records have other data in between them, e.g.:
category
1
s
t
done
category
2
n
d
done
foo
category
3
r
d
done
bar
category
4
t
h
done
One way to get around this is to clean up the input with a second (or first) awk:
<file awk '/^category$/,/^done$/' |
awk 'NR==n+1 { print rt, $0 } { rt = RT }' RS='\\<category' ORS='' n=3
Output:
category
3
r
d
done
Edit 2
As Ed has noted in the comments, the above methods do not search for the ending pattern. One way to do this, which hasn't been covered by the other answers, is with getline (note that there are some caveats with awk getline):
<file awk '
/^category$/ {
v = $0
while(!/^done$/) {
if(!getline)
exit
v = v ORS $0
}
if(++nr == n)
print v
}' n=3
On one line:
<file awk '/^category$/ { v = $0; while(!/^done$/) { if(!getline) exit; v = v ORS $0 } if(++nr == n) print v }' n=3
