Remove empty lines followed by a pattern - bash

I'm trying to find a way to remove empty lines that appear in my asciidoc file before a marker string, such as:
//Empty line
[source,shell]
I'd need:
[source,shell]
I'm trying with:
sed '/^\s*$\n[source,shell]/d' file
however it doesn't produce the expected effect (even when escaping the brackets). Any help?

You may use this awk script to delete the empty line that precedes the marker:
awk -v desired_val="[source,shell]" '
BEGIN { first_time = 1 }
{
    # Print the buffered previous line, unless the current line is the
    # marker and the previous line is empty
    if ( !($0 == desired_val && prev ~ /^[[:space:]]*$/) && first_time != 1 ) { print prev }
    prev = $0
    first_time = 0
}
END { print $0 }' your_file
The next script goes a little further than the previous one: it deletes all empty lines before the desired value.
# AWK script file
# Clears all empty lines in front of the desired value
# Usage: awk -v DESIRED_VAL="your_value" -f "awk_script_fname" input_fname
BEGIN { i=0 }
{
    # If the line is empty, count it
    if ( length($0) == 0 ) { i++ }
    # If the line is not empty and is DESIRED_VAL, print it and drop the buffered empties
    if ( length($0) != 0 && $0 == DESIRED_VAL ) {
        print $0; i=0
    }
    # If the line is not empty and not DESIRED_VAL, print the buffered empty lines, then the line
    if ( length($0) != 0 && $0 != DESIRED_VAL ) {
        for (m=0; m<i; m++) { print "" }
        i=0; print $0
    }
}
# If the last lines are empty, print them
END { for (m=0; m<i; m++) { print "" } }
Run this script with the following command:
awk -v DESIRED_VAL="your_value" -f "awk_script_fname" input_fname

Your sed line doesn't work because sed processes one line at a time, so it will not match a pattern that includes \n unless you manipulate the pattern space.
If you still want to do it with sed:
sed '/^$/{N;s/\n\(\[source,shell]\)/\1/}' file
How it works: when an empty line matches, read the next line into the pattern space and remove the empty line if the marker is found. Note that this won't work correctly if you have two empty lines before the marker: the first empty line will consume the second one, and the marker will never be matched.
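If you need to handle any number of empty lines before the marker, a buffering awk approach (a sketch, with the marker passed in as a variable) avoids the pattern-space juggling:

```shell
# Buffer runs of empty lines; drop the buffer when the marker follows,
# otherwise flush it before the current line.
awk -v marker='[source,shell]' '
/^[[:space:]]*$/ { n++; next }                # buffer empty lines
{
    if ($0 != marker)                         # not the marker: keep the empties
        for (i = 0; i < n; i++) print ""
    n = 0
    print
}
END { for (i = 0; i < n; i++) print "" }      # keep trailing empty lines
' file
```

Unlike the sed version, this collapses any number of consecutive empty lines immediately before the marker, not just one.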

AWK Script: Matching Two Files with a Unique Identifier and appending to the record if matched

I'm trying to compare two files, using a field as the unique identifier to match on.
File 1 has an account number, which is compared against the second file.
If the account number is in both files, a condition then determines the value to append to the original record.
Sample file 1:
ACCT1,PHONE1,TEST1
ACCT2,PHONE2,TEST3
Sample file 2:
ACCT1,SOMETHING1
ACCT1,SOMETHING3
ACCT1,SOMETHING1
ACCT1,SOMETHING3
ACCT2,SOMETHING1
ACCT2,SOMETHING3
ACCT2,SOMETHING1
ACCT2,SOMETHING1
But the awk always picks up the last occurrence for each account, even when there is already a match earlier in the records.
Actual output based on the condition below:
ACCT1,PHONE1,TEST1,000
ACCT2,PHONE2,TEST3,001
Expected Output:
ACCT1,PHONE1,TEST1,001
ACCT2,PHONE2,TEST3,001
Code I'm trying:
awk -f test.awk pass=0 samplefile2.txt pass=1 samplefile1.txt > output.txt
BEGIN{
}
pass==0{
    FS=","
    ACT=$1
    RES1[ACT]=$2
}
pass==1{
    ACCTNO=$1
    PHNO=$2
    FIELD3=$3
    LVCODE=RES1[ACCTNO]
    if(LVCODE=="SOMETHING1"){ OTHERFLAG="001" }
    else if(LVCODE=="SOMETHING4"){ OTHERFLAG="002" }
    else{ OTHERFLAG="000" }
    printf("%s,", ACCTNO)
    printf("%s,", PHNO)
    printf("%s,", FIELD3)
    printf("%s", OTHERFLAG)
    printf "\n"
}
I tried looping over the variable that holds the array, but unfortunately that turned into an infinite loop when I ran it.
You may use this awk command:
awk '
BEGIN {FS=OFS=","}
NR==FNR {
    map[$1] = $0
    next
}
$1 in map {
    print map[$1], ($2 == "SOMETHING1" ? "001" : ($2 == "SOMETHING4" ? "002" : "000"))
    delete map[$1]
}' file1 file2
ACCT1,PHONE1,TEST1,001
ACCT2,PHONE2,TEST3,001
Once we print a matching record from file2, we delete that record from the associative array map to ensure only the first matching record is evaluated.
It sounds like you want the first occurrence of ACCTx in samplefile2.txt where SOMETHING1 or SOMETHING4 is present. I think you should read samplefile1.txt into a data structure first, and then iterate line by line through samplefile2.txt looking for your criteria:
BEGIN {
    FS=","
    while ((getline < ACCOUNTFILE) > 0) accounts[$1]=$0
}
{ OTHERFLAG = "" }
$2 == "SOMETHING1" { OTHERFLAG="001" }
$2 == "SOMETHING4" { OTHERFLAG="002" }
($1 in accounts) && OTHERFLAG!="" {
    print(accounts[$1] "," OTHERFLAG)
    # Delete the account so that it does not print again:
    # only the first occurrence in samplefile2.txt matters.
    delete accounts[$1]
}
END {
    # Print remaining accounts that did not match above
    for (acct in accounts) print(accounts[acct] ",000")
}
Run above with:
awk -v ACCOUNTFILE=samplefile1.txt -f test.awk samplefile2.txt
I am not sure what you want to do if both SOMETHING1 and SOMETHING4 are in samplefile2.txt for the same ACCT1. If you want precedence, so that SOMETHING4 overrules SOMETHING1 when it comes later, you will need additional logic. In that case you probably want to avoid the delete and keep updating the accounts[$1] array until you reach the end of the file, then print all the accounts in the END block.
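That precedence variant could be sketched like this (the flag values and file names are taken from the question; the rule that SOMETHING4 always overrules SOMETHING1 is an assumption):

```shell
# Keep updating the flag per account until end of file; SOMETHING4 wins
# over SOMETHING1, and unmatched accounts fall back to 000.
awk -v ACCOUNTFILE=samplefile1.txt '
BEGIN {
    FS = ","
    while ((getline line < ACCOUNTFILE) > 0) {
        split(line, f, ",")
        accounts[f[1]] = line
        flag[f[1]] = "000"                    # default when nothing matches
    }
}
($1 in accounts) && $2 == "SOMETHING1" && flag[$1] != "002" { flag[$1] = "001" }
($1 in accounts) && $2 == "SOMETHING4" { flag[$1] = "002" }
END {
    for (a in accounts) print accounts[a] "," flag[a]
}' samplefile2.txt
```

Note that the END loop prints accounts in no particular order; pipe through sort if order matters.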

Use awk to create index of words from file

I'm learning UNIX for school and I'm supposed to create a command line that takes a text file and generates a dictionary index showing the words (excluding articles and prepositions) and the lines where each appears in the file.
I found a similar problem to mine at https://unix.stackexchange.com/questions/169159/how-do-i-use-awk-to-create-an-index-of-words-in-file?newreg=a75eebee28fb4a3eadeef5a53c74b9a8 The problem is that when I run the solution
$ awk '
{
    gsub(/[^[:alpha:] ]/,"");
    for(i=1;i<=NF;i++) {
        a[$i] = a[$i] ? a[$i]", "FNR : FNR;
    }
}
END {
    for (i in a) {
        print i": "a[i];
    }
}' file | sort
The output contains special characters (which I don't want) like:
-Quiero: 21
Sancho,: 2, 4, 8
How can I remove all the special characters and exclude articles and prepositions?
$ echo This is this test. | # some test text
awk '
BEGIN {
    x["a"]; x["an"]; x["the"]; x["on"]    # the stop words
    OFS=", "                              # list separator
}
{
    for(i=1;i<=NF;i++)                    # iterate over words in a line
        if(!($i in x)) {                  # if word is not a stop word
            $i=tolower($i)                # lowercase it
            gsub(/^[^a-z]|[^a-z]$/,"",$i) # remove leading and trailing non-alphabetics
            a[$i]=a[$i] (a[$i]==""?"":OFS) NR # append record number to the list
        }
}
END {                                     # after the file is processed
    for(i in a)                           # in no particular order...
        print i ": " a[i]                 # ...print the elements of a
}'
this: 1, 1
test: 1
is: 1
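One caveat with the script above: the stop-word test runs before the word is lowercased and stripped, so "The" or "the," would slip through. A variant that normalizes each word first (the stop-word list here is just an assumption) might look like:

```shell
awk '
BEGIN { split("a an the on in of", s, " "); for (k in s) stop[s[k]] }
{
    for (i = 1; i <= NF; i++) {
        w = tolower($i)                       # lowercase first...
        gsub(/^[^a-z]+|[^a-z]+$/, "", w)      # ...then strip surrounding punctuation
        if (w != "" && !(w in stop))
            idx[w] = idx[w] (idx[w] == "" ? "" : ", ") FNR
    }
}
END { for (w in idx) print w ": " idx[w] }
' file | sort
```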

awk: get the next line

I'm trying to use awk to format a file that contains multiple lines.
Contents of the file:
ABC;0;1
ABC;0;0;10
ABC;0;2
EFG;0;1;15
HIJ;2;8;00
KLM;4;18;12
KLM;6;18;1200
KLM;10;18;14
KLM;1;18;15
result desired:
ABC;0;1;10;2
EFG;0;1;15
HIJ;2;8;00
KLM;4;18;12;1200;14;15
I am using the code below :
awk -F ";" '{
ligne= ligne $0
ma_var = $1
{getline
if($1 != ma_var){
ligne= ligne "\n" $0
}
else {
ligne= ligne";"NF
}
}
}
END {
print ligne
} ' ${FILE_IN} > ${FILE_OUT}
The objective is to compare the first column of the next line to the first column of the current line; if they match, append the last column of the next line to the current line and delete the next line, otherwise print the next line.
Kind regards,
As with life, it's a lot easier to make decisions based on what has happened (the previous line) than on what will happen (the next line). Re-state your requirements as "the objective is to compare the first column of the current line to the first column of the previous line; if they match, append the last column of the current line to the previous line and delete the current line, otherwise print the current line" and the code to implement it becomes relatively straightforward:
$ cat tst.awk
BEGIN { FS=OFS=";" }
$1 == p1 { prev = prev OFS $NF; next }
{ if (NR>1) print prev; prev=$0; p1=$1 }
END { print prev }
$ awk -f tst.awk file
ABC;0;1;10;2
EFG;0;1;15
HIJ;2;8;00
KLM;4;18;12;1200;14;15
If you're ever tempted to use getline again, be sure you fully understand everything discussed at http://awk.freeshell.org/AllAboutGetline before making a decision.
I would take a slightly different approach than Ed:
$ awk '$1 == p { printf ";%s", $NF; next } NR > 1 { print "" } {p=$1;
printf "%s" , $0} END{print ""}' FS=\; input
At each line, check whether the first column matches the previous line's. If it does, append just the last field. If it doesn't, terminate the previous output line and print the whole current line with no trailing newline.

AWK: find if a line is empty or just #

I have the following; it ignores lines containing just #, but not empty lines (lines containing only a newline).
Do you know of a way I can hit two birds with one stone?
I.e., if a line doesn't contain more than one character, delete it.
function check_duplicates {
    awk '
    FNR==1{files[FILENAME]}
    {
        if((FILENAME, $0) in a) dupsInFile[FILENAME]
        else {
            a[FILENAME, $0]
            dups[$0] = $0 in dups ? (dups[$0] RS FILENAME) : FILENAME
            count[$0]++
        }
    }
    {
        if ($0 ~ /#/) {
            delete dups[$0]
        }
    }
    # Print duplicates found in more than one file
    END{
        for(k in dups) {
            if(count[k] > 1) {
                print ("\n\nDuplicate line found: " k) " - In the following file(s)"
                print dups[k]
            }
        }
        printf "\n"
    }' $SITEFILES

    awk '
    NR {
        b[$0]++
    }
    $0 in b {
        if ($0 ~ /#/) {
            delete b[$0]
        }
        if (b[$0]>1) {
            print ("\n\nRepeated line found: "$0) " - In the following file"
            print FILENAME
            delete b[$0]
        }
    }' $SITEFILES
}
}
The expected input is usually as follows.
#File Path's
/path/to/file1
/path/to/file2
/path/to/file3
/path/to/file4
#
/more/paths/to/file1
/more/paths/to/file2
/more/paths/to/file3
/more/paths/to/file4
/more/paths/to/file5
/more/paths/to/file5
In this case, /more/paths/to/file5, occurs twice, and should be flagged as such.
However, there are also many newlines, which I'd rather ignore.
Er, it also has to be awk, I'm doing a tonne of post processing, and don't want to vary from awk for this bit, if that's okay :)
It really seems to be a bit tougher than I would have expected.
Cheers,
Ben
You can combine both checks into a single regex. Note that $0 never contains a newline (awk strips the record separator), so matching \n won't work; test for an empty line with ^$ instead:
if ($0 ~ /^$|#/) {
    delete dups[$0]
}
OR
To be more specific you can write
if ($0 ~ /^#?$/) {
    delete dups[$0]
}
What it does:
^ matches the start of the line.
#? matches zero or one #.
$ matches the end of the line.
So ^$ matches empty lines, and ^#$ matches lines containing only #.
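Putting it together, here's a minimal sketch of the duplicate check with that filter applied (simplified from the function above; it doesn't track which file each duplicate came from, and the sitefile name is just an example):

```shell
awk '
/^#?$/ { next }                     # skip empty lines and bare #
{
    if (++count[$0] == 2)           # report once, on the second sighting
        print "Duplicate line found: " $0
}' sitefile1
```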

match a pattern and print nth line if condition matches

My requirement is something like this:
Read File:
If ( line contains /String1/)
{
Increment cursor by 10 lines;
If (line contains /String2/ )
{ print line; }
}
so far I have got:
awk '/String1/{nr[NR]; nr[NR+10]}; NR in nr' file1.log
The result of this should then be input to:
awk 'match ($0 , /String2/) {print $0}' file1.log
How can I achieve it? Is there a better way?
Thanks.
You are close; you just need to record only NR+10 (not NR as well) and also test for String2:
awk '/String1/ { linematch[NR+10]=1; } /String2/ && NR in linematch;' file1.log
Each time a line matches String1, you save the record (line) number plus 10. Any time you match String2, check if the current line number is one we are expecting, and if so, print the line.
Here's another way to describe your algorithm. Instead of:
If ( line contains /String1/)
{
Increment cursor by 10 lines;
If (line contains /String2/ )
{ print line; }
}
which would require jumping ahead in your input stream somehow, think of it as:
If ( line contains /String2/)
{
If (line 10 lines previously contained /String1/ )
{ print line; }
}
which just requires you to re-visit what you already read in:
awk '/String1/{f[NR]} /String2/ && (NR-10) in f' file
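A quick way to sanity-check that one-liner is with synthetic input (the line numbers below are just illustrative):

```shell
# 30 numbered lines: String1 on line 5, String2 on lines 15 and 20.
# Only line 15 (exactly 10 lines after String1) should print.
awk 'BEGIN {
    for (i = 1; i <= 30; i++) {
        s = "line " i
        if (i == 5)             s = s " String1"
        if (i == 15 || i == 20) s = s " String2"
        print s
    }
}' > demo.log
awk '/String1/{f[NR]} /String2/ && (NR-10) in f' demo.log
# prints: line 15 String2
```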