Change three or more empty lines into two using bash, sed or awk

Let's say that we have a string containing words and multiple empty lines. For instance:
"1\n2\n\n3\n\n\n4\n\n\n\n2\n\n3\n\n\n1\n"
I would like to "shrink" three or more empty lines into two using bash, sed or awk, to obtain the string:
"1\n2\n\n3\n\n4\n\n2\n\n3\n\n1\n"
Does anybody have an idea?

With awk:
$ awk -v RS= -v ORS='\n\n' 1 file

If perl is acceptable,
perl -00 -lpe1
ought to do it. It reads and outputs whole paragraphs, which has the side effect of normalizing 2+ newlines to just \n\n.

If the data isn't too voluminous and you have GNU sed, use sed -z to make it work on a single null-terminated record rather than on one newline-terminated record per line:
sed -z 's/\n\n\n\n*/\n\n/g'
Or with extended regexes:
sed -zr 's/\n{3,}/\n\n/g'
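As a quick sanity check, here is the GNU sed -z form applied to the question's sample string (a sketch; requires GNU sed, since -z is a GNU extension):

```shell
# Collapse every run of three or more newlines down to exactly two.
printf '1\n2\n\n3\n\n\n4\n\n\n\n2\n\n3\n\n\n1\n' |
    sed -z 's/\n\n\n\n*/\n\n/g'
```

Because -z makes sed read the whole input as one record, the \n characters are visible to the substitution, which a normal line-by-line sed never sees.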

Remove filler spaces from blank lines in linux script

I am trying to work on a bash script that will take files from one GitHub repo and copy them over to another one.
I have this mostly working; however, one file I am trying to move over has spaces on all of its blank lines, like so:
FROM metrics_flags ORDER BY DeliveryDate ASC
)
SELECT * FROM selected;
""";
Notice how it's not just a blank line; there are actually 10-20 spaces on that "blank" line between the two blocks of code.
Is there some unix command that can parse the file and remove the spaces (but keep the blank line)?
I tried
awk 'NF { $1=$1; print }' file.txt
and
sed -e 's/^[ \t]*//' file.txt
with no success.
awk, used without changing delimiters, splits records (lines) into whitespace-separated fields. By default, any print command obeys the same separators for the output, but empty fields can be removed, resulting in their whitespace separators not being used.
The 'trick' is to get awk to re-evaluate the line by setting any field (even empty ones) to itself:
awk '{$1=$1; print}' test.txt
will remove all white space that is not surrounding other printable characters and return the file contents to stdout where it can be redirected to file if required.
I don't know why you used NF as a pattern in your awk attempt, nor why it caused it to fail, but the similar approach without it, as above, works fine.
Edit: after a quick experiment, I think what was happening with your awk attempt was that setting the pattern to NF caused awk to skip lines with no printable fields completely. Removing that pattern allows the now-empty lines to be printed.
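To make the NF difference concrete, here is a sketch on a hypothetical three-line file (sample.txt is made up for illustration) whose middle line holds only spaces:

```shell
# Middle line is "blank" but contains spaces.
printf 'a b\n   \nc d\n' > sample.txt
# With the NF pattern, the whitespace-only line is skipped entirely:
awk 'NF { $1=$1; print }' sample.txt    # prints 2 lines
# Without it, the line is kept, just emptied by the $1=$1 rebuild:
awk '{ $1=$1; print }' sample.txt       # prints 3 lines
```

NF is zero on a line with no fields, so as a pattern it suppresses the action for exactly those lines.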
This should do what you describe, replacing leading whitespace only from empty lines:
sed -E 's|^\s+$||' file
The -E (extended regex) flag is required for \s+ (and \t), meaning one or more whitespace characters. I think you might have accidentally used a lowercase -e.
If you like the output, you can add -i to apply the edit to your file.
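A minimal check of that sed command (a sketch; the three-line input is hypothetical, and \s is a GNU sed extension):

```shell
# The middle line holds only spaces; sed empties it but keeps the line itself.
printf 'a\n   \nb\n' | sed -E 's|^\s+$||'
```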
This is an example of using awk to achieve the same:
awk '{gsub(/^\s+$/, "")}; { print }' file
To apply it, use -i inplace:
awk -i inplace '{gsub(/^\s+$/, "")}; { print }' file
I tested this on Ubuntu 22.04 with GNU sed 4.8 and GNU awk 5.1.0
Odd ...
sed -i 's/^[[:space:]]*$//g' file.txt
definitely works for me; I don't see why your sed version wouldn't, though.
On MacOS, this works (TESTED):
sed -E -i "" 's/^[[:space:]]*$//g' file.txt

Sed command to replace numbers between space and :

I have a file with a records like the below
FIRST 1: SECOND 2: THREE 4: FIVE 255: SIX 255
I want to remove the values between the space and the :, to get:
FIRST:SECOND:THREE:FIVE:SIX
I tried this code:
awk -F '[[:space:]]*,:*' '{$1=$1}1' OFS=, file
Tried on GNU awk:
awk -F' [0-9]*(: *|$)' -vOFS=':' '{print $1,$2,$3,$4,$5}' file
Tried on GNU sed:
sed -E 's/\s+[0-9]+(:|$)\s*/\1/g' file
Explanation of the awk:
The regex matches a space, followed by digits ([0-9]*), followed by either a literal : with optional trailing spaces (: *) or the end of the line ($). Because -F makes that the field separator (FS), everything outside the matches, i.e. FIRST, SECOND, and so on, lands in $1, $2, .... To make the output look right, the fields are rejoined with :, which is what the output field separator definition -vOFS=':' does.
You can also add [[:digit:]] with a trailing asterisk to the field separator, and leave OFS empty (nothing after OFS=):
$ awk -F '[[:space:]][[:digit:]]*' '{$1=$1}1' OFS= file
FIRST:SECOND:THREE:FIVE:SIX
To get the output we want in idiomatic awk, we make the input field separator (with -F) contain all the stuff we want to eliminate (anchored with :), and make the output field separator (OFS) what we want it replaced with. The catch is that this won't eliminate the space and numbers at the end of the line, and for that we need to do something more. GNU's implementation of awk allows a regular expression for the input record separator (RS), but we could just do a simple sub() with POSIX-compliant awk as well. Finally, we force recalculation via $1=$1. The side effects of this pattern/statement are that the record is rebuilt, applying the FS/OFS substitution for us, and that non-blank lines take the default action, which is to print.
gawk -F '[[:space:]]*[[:digit:]]*:[[:space:]]*' -v OFS=: -v RS='[[:space:]]*[[:digit:]]*\n' '$1=$1' file
Or:
awk -F '[[:space:]]*[[:digit:]]*:[[:space:]]*' -v OFS=: '{ sub(/[[:space:]]*[[:digit:]]*$/, "") } $1=$1' file
A sed implementation is fun but probably slower (because current versions of awk have better regex implementations).
sed 's/[[:space:]]*[[:digit:]]*:[[:space:]]/:/g; s/[[:space:]]*[[:digit:]]*[[:space:]]*$//' file
Or if POSIX character classes are not available...
sed 's/[\t ]*[0-9]*:[\t ]/:/g; s/[\t ]*[0-9]*[\t ]*$//' file
Something tells me that your "FIRST, SECOND, THIRD..." might be more complicated, and might contain digits... in that case, you might want to experiment with replacing * with + for awk, or with \+ for sed.
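For a quick check, the portable (non-POSIX-class) sed above can be run directly on the question's sample line:

```shell
line='FIRST 1: SECOND 2: THREE 4: FIVE 255: SIX 255'
# First substitution strips each "<spaces><digits>: " run down to ":";
# the second removes the trailing " 255" left at the end of the line.
echo "$line" | sed 's/[\t ]*[0-9]*:[\t ]/:/g; s/[\t ]*[0-9]*[\t ]*$//'
# -> FIRST:SECOND:THREE:FIVE:SIX
```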

How to extract lines after founding specific string

My example text is,
AA BB CC
DDD
process.get('name1')
process.get('name2')
process.get('name3')
process.get('name4')
process.get('name5')
process.get('name6')
EEE
FFF
...
I want to search for the string "process.get('name1')" first; if found, extract the lines from "process.get('name1')" to "process.get('name6')".
How do I extract the lines using sed?
This should work and... it uses sed as per OP request:
$ sed -n "/^process\.get('name1')$/,/^process\.get('name6')$/p" file
sed is for simple substitutions on individual lines; for anything more interesting, you should be using awk:
$ awk -v beg="process.get('name1')" -v end="process.get('name6')" \
'index($0,beg){f=1} f; index($0,end){f=0}' file
process.get('name1')
process.get('name2')
process.get('name3')
process.get('name4')
process.get('name5')
process.get('name6')
Note that you could use a range in awk, just like you are forced to in sed:
awk -v beg="process.get('name1')" -v end="process.get('name6')" \
'index($0,beg),index($0,end)' file
and you could use regexps after escaping metachars in awk, just like you are forced to in sed:
awk "/process\.get\('name1'\)/,/process\.get\('name6'\)/" file
but the first awk version above using strings instead of regexps and a flag variable is simpler (in as much as you don't have to figure out which chars are/aren't RE metacharacters), more robust and more easily extensible in future.
It's important to note that sed CANNOT operate on strings, just regexps, so when you say "I want to search for a string" you should stop trying to force sed to behave as if it can do that.
Imagine your search strings are passed in to a script as positional parameters $1 and $2. With awk you'd just init the awk variables from them in the expected way:
awk -v beg="$1" -v end="$2" 'index($0,beg){f=1} f; index($0,end){f=0}' file
whereas with sed you'd have to do something like:
beg=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<< "$1")
end=$(sed 's/[^^]/[&]/g; s/\^/\\^/g' <<< "$2")
sed -n "/^${beg}$/,/^${end}$/p" file
to deactivate any metacharacters present. See Is it possible to escape regex metacharacters reliably with sed for details on escaping RE metachars for sed.
Finally - as mentioned above you COULD use a range expression with strings in awk:
awk -v beg="$1" -v end="$2" 'index($0,beg),index($0,end)' file
but I personally have never found that useful; there's always some slight requirements change that comes along to make me wish I'd started out using a flag. See Is a /start/,/end/ range expression ever useful in awk? for details on that.
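The flag-variable idiom above can be sketched end to end on a shortened, hypothetical version of the question's file (sample.txt is made up here):

```shell
cat > sample.txt <<'EOF'
AA BB CC
DDD
process.get('name1')
process.get('name2')
process.get('name3')
process.get('name6')
EEE
EOF
# beg/end are plain strings matched with index(), so no regex escaping is needed.
awk -v beg="process.get('name1')" -v end="process.get('name6')" \
    'index($0,beg){f=1} f; index($0,end){f=0}' sample.txt
```

The flag is raised on the line containing beg, `f;` prints every line while the flag is up, and the flag is lowered after the line containing end, so both boundary lines are included.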

How to extract specific string in a file using awk, sed or other methods in bash?

I have a file with the following text (multiple lines with different values):
TokenRange(start_token:8050285221437500528,end_token:8051783269940793406,...
I want to extract the value of start_token and end_token. I tried awk and cut, but I am not able to figure out the best way to extract the targeted values.
Something like:
cat filename | get the values of start_token and end_token
grep -oP '(?<=token:)\d+' filename
Explanation:
-o: print only part that matches, not complete line
-P: use Perl regex engine (for look-around)
(?<=token:): positive look-behind – zero-width pattern
\d+: one or more digits
Result:
8050285221437500528
8051783269940793406
A (potentially more efficient) variant of this, as pointed out by hek2mgl in his comment, uses \K, the variable-width look-behind:
grep -oP 'token:\K\d+'
\K keeps everything that has been matched to the left of it, but does not include it in the match (see perlre).
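Both grep variants can be checked on a hypothetical one-line file shaped like the question's data (tokens.txt is made up; requires a grep built with PCRE support for -P):

```shell
printf 'TokenRange(start_token:8050285221437500528,end_token:8051783269940793406,...\n' > tokens.txt
# Look-behind variant: digits preceded by "token:".
grep -oP '(?<=token:)\d+' tokens.txt
# \K variant: same matches, using a match-reset instead of a look-behind.
grep -oP 'token:\K\d+' tokens.txt
```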
Using awk:
awk -F '[(:,]' '{print $3, $5}' file
8050285221437500528 8051783269940793406
First value is start_token and last value is end_token.
A sed version:
sed -e '/^TokenRange(/!d' -e 's/.*:\([0-9]*\),.*:\([0-9]*\),.*/\1 \2/' YourFile
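A quick check of the awk field-splitting answer above, on a hypothetical line shaped like the question's (the trailing "rest" stands in for the elided tail):

```shell
line='TokenRange(start_token:8050285221437500528,end_token:8051783269940793406,rest)'
# Splitting on "(", ":" and "," puts the two numbers in fields 3 and 5.
echo "$line" | awk -F '[(:,]' '{print $3, $5}'
# -> 8050285221437500528 8051783269940793406
```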

Appending to line with sed, adding separator if necessary

I have a properties file, which, when unmodified has the following line:
worker.list=
I would like to use sed to append to that line a value so that after sed has run, the line in the file reads:
worker.list=test
But, when I run the script a second time, I want sed to pick up that a value has already been added, and thus adds a separator:
worker.list=test,test
That's the bit that stumps me (frankly sed scares me with its power, but that's my problem!)
Rich
That's easy! If you're running GNU sed, you can write it rather short:
sed -e '/worker.list=/{s/$/,myValue/;s/=,/=/}'
That'll add ',myValue' to the line, and then remove the comma (if any) after the equals sign.
If you're stuck on some other platform you need to break it apart like so
sed -e '/worker.list=/{' -e 's/$/,myValue/' -e 's/=,/=/' -e '}'
It's a pretty stupid script in that it doesn't know about the existence of values etc. (I suppose you CAN do more elaborate parsing, but why should you?), but I guess that's the beauty of it. Oh, and it'll destroy a line like this:
worker.list=,myval
which will turn into
worker.list=myval,myValue
If that's a problem let me know, and I'll fix that for you.
HTH.
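Run twice, the one-liner behaves as the question asks. A sketch, using "test" as the appended value:

```shell
# First pass: the empty list gains its first value
# (",test" is appended, then "=," is collapsed to "=").
printf 'worker.list=\n' | sed '/worker.list=/{s/$/,test/;s/=,/=/}'
# -> worker.list=test
# Second pass: an existing value gets a comma-separated second entry.
printf 'worker.list=\n' |
    sed '/worker.list=/{s/$/,test/;s/=,/=/}' |
    sed '/worker.list=/{s/$/,test/;s/=,/=/}'
# -> worker.list=test,test
```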
You can also use awk. Set the field delimiter to "="; then what you want to append is always field number 2. Example:
$ more file
worker.list=
$ awk -F"=" '/worker\.list/{$2=($2=="")? $2="test" : $2",test"}1' OFS="=" file
worker.list=test
$ awk -F"=" '/worker\.list/{$2=($2=="")? $2="test" : $2",test"}1' OFS="=" file >temp
$ mv temp file
$ awk -F"=" '/worker\.list/{$2=($2=="")? $2="test1" : $2",test1"}1' OFS="=" file
worker.list=test,test1
Or the equivalent of the sed answer:
$ awk -F"=" '/worker\.list/{$2=",test1";sub("=,","=")}1' OFS="=" file
