Remove filler spaces from blank lines in linux script - macos

I am trying to work on a bash script that will take files from one github repo and copy them over to another one.
I have this mostly working; however, one file I am trying to move over has spaces on all of its blank lines, like so:
FROM metrics_flags ORDER BY DeliveryDate ASC
)
SELECT * FROM selected;
""";
Notice how it's not just a blank line: there are actually 10-20 spaces on the blank line between the two blocks of code.
Is there some unix command that can parse the file and remove the spaces (but keep the blank line)?
I tried
awk 'NF { $1=$1; print }' file.txt
and
sed -e 's/^[ \t]*//' file.txt
with no success.

Used without changing delimiters, awk splits each record (line) into whitespace-separated fields. A plain print writes the line back unchanged, but whenever awk rebuilds a record it joins the fields with the output field separator (a single space by default), so leading, trailing, and repeated whitespace is dropped.
The 'trick' is to get awk to rebuild the line by assigning any field (even an empty one) to itself:
awk '{$1=$1; print}' test.txt
This turns whitespace-only lines into empty lines and writes the file contents to stdout, where they can be redirected to a file if required. Be aware that it also strips leading indentation and collapses runs of whitespace inside the other lines.
I don't know why you used NF as a pattern in your awk attempt, nor why it caused it to fail, but the similar approach without it, as above, works fine.
Edit: after a quick experiment, I think the NF pattern in your awk attempt made awk skip every line with no printable fields (NF is 0 there, which counts as false). Removing that pattern allows the now-empty lines to be printed.
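If the indentation of the other lines matters, a variation on the same idea is to touch only the lines that have no fields at all. This is a minimal sketch, not part of the original answer; the function name blank_ws_lines is made up for illustration.

```shell
# Hypothetical helper: empty out whitespace-only lines, leave all other lines untouched.
blank_ws_lines() {
    # NF is 0 when a line has no fields, i.e. it is empty or holds only spaces/tabs.
    awk 'NF == 0 { $0 = "" } { print }' "$@"
}

# Sample input: an indented line, a line of spaces, and a line with inner spacing.
printf '    SELECT *\n      \nFROM   t;\n' | blank_ws_lines
```

Here the indented line and the double spaces inside the last line come through unchanged; only the middle line is emptied.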

This should do what you describe, removing whitespace only from lines that contain nothing else:
sed -E 's|^\s+$||' file
The -E option (extended regex) is what makes + mean "one or more"; \s, meaning any whitespace character, is a GNU extension. Note that the lowercase -e in your attempt only introduces the script and does not change the regex syntax.
If you like the output, you can add -i to apply the edit to your file.
This is an example of using awk to achieve the same:
awk '{gsub(/^\s+$/, "")}; { print }' file
To apply it in place with GNU awk, use -i inplace:
awk -i inplace '{gsub(/^\s+$/, "")}; { print }' file
I tested this on Ubuntu 22.04 with GNU sed 4.8 and GNU awk 5.1.0

Odd ...
sed -i 's/^[[:space:]]*$//g' file.txt
definitely works for me; I don't see why your sed version wouldn't, though.
On MacOS, this works (TESTED):
sed -E -i "" 's/^[[:space:]]*$//g' file.txt
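Since the -i syntax differs between GNU sed and the sed shipped with macOS, a script that has to run on both could avoid -i altogether and go through a temporary file. A rough sketch under that assumption; strip_blank_padding is a made-up name.

```shell
# Hypothetical helper: strip whitespace from whitespace-only lines, in place,
# without relying on sed -i (its argument syntax differs between GNU and BSD sed).
strip_blank_padding() {
    file=$1
    tmp=$(mktemp) || return 1
    if sed 's/^[[:space:]]*$//' "$file" > "$tmp"; then
        # Copy the contents back so the original file keeps its permissions.
        cat "$tmp" > "$file"
    fi
    rm -f "$tmp"
}
```

Usage would be strip_blank_padding file.txt. The pattern only uses the POSIX [[:space:]] class and *, so it does not need -E.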

Related

bash scripting: Can I get sed to output the original line, then a space, then the modified line?

I'm new to Unix in all its forms, so please go easy on me!
I have a bash script that will pipe an ls command with arbitrary filenames into sed, which will use an arbitrary replacement pattern on the files, and then this will be piped into awk for some processing. The catch is, awk needs to know both the original file name and the new one.
I've managed everything except getting the original file names into awk. For instance, let's say my files are test.* and my replacement pattern is 's:es:ar:', which would change every occurrence of "test" to "tart". For testing purposes I'm just using awk to print what it's receiving:
ls "$@" | sed "$pattern" | awk '{printf "0: %s\n1: %s\n2: %s\n", $0,$1,$2}'
where test.* is in $@ and the pattern is stored in $pattern.
Clearly, this doesn't get me to where I want to be. The output is obviously
0: tart.c
1: tart.c
2:
If I could get sed to output "test.c tart.c", then I'd have two parameters for awk. I've played around with the pattern to no avail, even hardcoding "test.c" into the replacement. But of course that just gave me amateur results like "ttest.c art.c". Is it possible for sed to remember the input, then work it into the beginning of the output? Do I even have the right ideas? Thanks in advance!
Two ways to change the first t into a b in the duplicated field.
Duplicate the line (& replays the matched part), change the first word, and swap the two (capture two strings with a space in between):
echo test.c | sed -r 's/.*/& &/;s/t/b/;s/([^ ]*) (.*)/\2 \1/'
or with more magic (copy the original value to the hold buffer, make the change, swap so the original comes first, append the changed value from the buffer as a second line, and replace the newline with a space):
echo test.c | sed 'h;s/t/b/;x;G;s/\n/ /'
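The hold-space version is easier to follow when each command sits on its own line. The sketch below is the same script as above, only spread out and commented; the wrapper name dup_with_change is invented for the example.

```shell
# Hypothetical wrapper around the hold-space script, one sed command per line.
dup_with_change() {
    sed '
        # h: copy the pattern space (the original line) into the hold space
        h
        # apply the edit to the pattern space
        s/t/b/
        # x: swap, so the pattern space holds the original and the hold space the edited line
        x
        # G: append a newline plus the hold space to the pattern space
        G
        # join the two lines with a single space
        s/\n/ /
    '
}

printf 'test.c\n' | dup_with_change
```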
Use Perl instead of sed:
echo test.c | perl -lne 'print "$_ ", s/es/ar/r'
-l removes the newline from input and adds it after each print. The /r modifier to the substitution returns the modified string instead of changing the variable (Perl 5.14+ needed).
Old answer, not working for s/t/b/2 or s/.*/replaced/2:
You can duplicate the contents of the line with s/.*/& &/, then just tell sed that it should only apply the second substitution (this works at least in GNU sed):
echo test.c | sed 's/.*/& &/; s/es/ar/2'
$ echo 'foo' | awk '{old=$0; gsub(/o/,"e"); print old, $0}'
foo fee
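Another option, sketched here as an assumption rather than something taken from the answers, is to skip ls entirely and loop over the names in the shell, running sed once per name. The function name pair_names is hypothetical.

```shell
# Hypothetical helper: print "original modified" for every name given.
# $1 is the sed script; the remaining arguments are the file names.
pair_names() {
    pattern=$1
    shift
    for name in "$@"; do
        new=$(printf '%s\n' "$name" | sed "$pattern")
        printf '%s %s\n' "$name" "$new"
    done
}

pair_names 's:es:ar:' test.c test.h
```

Each output line can then be piped into awk as $1 and $2. Names containing spaces would still confuse awk's default field splitting, so a different separator would be needed for those.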

Sed command to replace numbers between space and :

I have a file with records like the one below
FIRST 1: SECOND 2: THREE 4: FIVE 255: SIX 255
I want to remove values between space and :
FIRST:SECOND:THREE:FIVE:SIX
with code
awk -F '[[:space:]]*,:*' '{$1=$1}1' OFS=, file
tried on gnu awk:
awk -F' [0-9]*(: *|$)' -vOFS=':' '{print $1,$2,$3,$4,$5}' file
tried on gnu sed:
sed -E 's/\s+[0-9]+(:|$)\s*/\1/g' file
Explanation of the awk command:
The field separator regex matches a space, then zero or more digits ([0-9]*), then either a literal : followed by zero or more spaces (: *) or the end of the line ($). Because -F makes that pattern the field separator (FS), everything it matches is discarded and the fields $1, $2, ... hold what is left: FIRST, SECOND, and so on. The output still needs something between the fields, so -vOFS=':' sets the output field separator to :.
You can also use [[:digit:]] with a trailing asterisk and set OFS to the empty string (note the space right after OFS=):
$ awk -F '[[:space:]][[:digit:]]*' '{$1=$1}1' OFS= file
FIRST:SECOND:THREE:FIVE:SIX
To get the output we want in idiomatic awk, we make the input field separator (with -F) contain all the stuff we want to eliminate (anchored with :), and make the output field separator (OFS) what we want it replaced with. The catch is that this won't eliminate the space and numbers at the end of the line, and for this we need to do something more. GNU’s implementation of awk will allow us to use a regular expression for the input record separator (RS), but we could just do a simple sub() with POSIX-compliant awk as well. Finally, force recalculation via $1=$1. The side effects of this pattern/statement are that the buffer will be recalculated, doing the FS/OFS substitution for us, and that non-blank lines will take the default action, which is to print.
gawk -F '[[:space:]]*[[:digit:]]*:[[:space:]]*' -v OFS=: -v RS='[[:space:]]*[[:digit:]]*\n' '$1=$1' file
Or:
awk -F '[[:space:]]*[[:digit:]]*:[[:space:]]*' -v OFS=: '{ sub(/[[:space:]]*[[:digit:]]*$/, "") } $1=$1' file
A sed implementation is fun but probably slower (because current versions of awk have better regex implementations).
sed 's/[[:space:]]*[[:digit:]]*:[[:space:]]/:/g; s/[[:space:]]*[[:digit:]]*[[:space:]]*$//' file
Or if POSIX character classes are not available...
sed 's/[\t ]*[0-9]*:[\t ]/:/g; s/[\t ]*[0-9]*[\t ]*$//' file
Something tells me that your “FIRST, SECOND, THIRD...” might be more complicated, and might contain digits... in this case, you might want to experiment with replacing * with + for awk or with \+ for sed.
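As a sketch of that last suggestion, the variant below uses + with sed -E so that at least one space and one digit are required before anything is removed. The function name strip_numbers and the second sample record are made up for illustration.

```shell
# Hypothetical helper: remove " <digits>" before each colon and at the end of the line.
# Requiring at least one space and one digit (+) keeps digits that belong to a name.
strip_numbers() {
    sed -E 's/[[:space:]]+[[:digit:]]+:[[:space:]]*/:/g; s/[[:space:]]+[[:digit:]]+[[:space:]]*$//' "$@"
}

printf 'FIRST 1: SECOND 2: THREE 4: FIVE 255: SIX 255\n' | strip_numbers
printf 'V2 1: NAME10 22: LAST 255\n' | strip_numbers
```

With the second record, the digits inside V2 and NAME10 are left alone because they are not preceded by whitespace.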

Bash script delete a line in the file

I have a file, which has multiple lines.
For example:
a
ab#
ad.
a12fs
b
c
...
I want to use sed or awk to delete a line if it includes symbols or numbers. (For example, I want to delete the ab#, ad., a12fs... lines.)
In other words, I just want to keep the lines that contain only [a-z][A-Z].
I know how to delete lines containing numbers,
sed '/[0-9]/d' file.txt
but I do not know how to delete lines containing symbols.
Or is there an easy way to do that?
To keep blank lines:
grep '^[[:alpha:]]*$' file
sed '/[^[:alpha:]]/d' file
awk '/^[[:alpha:]]*$/' file
To remove blank lines:
grep -E '^[[:alpha:]]+$' file
sed -E -n '/^[[:alpha:]]+$/p' file
awk '/^[[:alpha:]]+$/' file
grep works well too and is even simpler. Just do the reverse: keep the lines that interest you, which are much easier to define.
grep -i '^[a-z]*$' file.txt
(This matches lines containing only letters, plus empty lines; the -i option makes grep case-insensitive.)
To remove empty lines as well (-E is needed so that + means "one or more"):
grep -Ei '^[a-z]+$' file.txt
Take care with Windows text files: there is a carriage return at the end of each line, so depending on the grep version nothing may match (I tested on Windows here and it works).
But just in case:
grep -iP '^[a-z]*\r?$'
(Note the P option, which enables Perl-compatible expressions; without it \r is not recognized.)
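Where grep -P is not available, one way around the carriage-return problem is to drop the \r characters before filtering. A small sketch under that assumption; letters_only is an invented name, and note that the output loses the Windows line endings.

```shell
# Hypothetical helper: read text on stdin, drop carriage returns,
# then keep only non-empty lines made up entirely of letters.
letters_only() {
    tr -d '\r' | grep -E '^[[:alpha:]]+$'
}

# Sample input with Windows (CRLF) line endings.
printf 'a\r\nab#\r\nad.\r\na12fs\r\nb\r\n' | letters_only
```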
You can use this sed (\+ is a GNU extension):
sed '/^[A-Za-z]\+$/!d' file
(OR)
sed '/[^A-Za-z]/d' file
$ awk '!/[^[:alpha:]]/' file.txt
a
b
c

Delete all lines beginning with a # from a file

All of the lines with comments in a file begin with #. How can I delete all of the lines (and only those lines) which begin with #? Other lines containing #, but not at the beginning of the line should be ignored.
This can be done with a sed one-liner:
sed '/^#/d'
This says, "find all lines that start with # and delete them, leaving everything else."
I'm a little surprised nobody has suggested the most obvious solution:
grep -v '^#' filename
This solves the problem as stated.
But note that a common convention is for everything from a # to the end of a line to be treated as a comment:
sed 's/#.*$//' filename
though that treats, for example, a # character within a string literal as the beginning of a comment (which may or may not be relevant for your case) (and it leaves empty lines).
A line starting with arbitrary whitespace followed by # might also be treated as a comment:
grep -v '^ *#' filename
if whitespace is only spaces, or
grep -v '^[  ]*#' filename
where the two spaces are actually a space followed by a literal tab character (type "control-v tab").
For all these commands, omit the filename argument to read from standard input (e.g., as part of a pipe).
The opposite of Raymond's solution:
sed -n '/^#/!p'
"don't print anything, except for lines that DON'T start with #"
you can directly edit your file with
sed -i '/^#/ d'
If you want also delete comment lines that start with some whitespace use
sed -i '/^\s*#/ d'
Usually you want to keep the first line of your script if it is a shebang, so sed should not delete lines starting with #!. It should also delete lines that contain only a hash and no text. Putting it all together:
sed -i '/^\s*\(#[^!].*\|#$\)/d'
To be compatible with all sed variants, add a backup extension to the -i option:
sed -i.bak '/^[[:space:]]*#/ d' "$file"
rm -f "$file.bak"
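The shebang rule can also be written in awk without GNU-only regex syntax such as \s or \|. The following is a sketch of that idea, not the answer's own method; strip_comments is a hypothetical name, and it is meant for a single file or stdin.

```shell
# Hypothetical helper: drop full-line comments but keep a shebang on line 1.
strip_comments() {
    awk '
        NR == 1 && /^#!/ { print; next }   # keep a shebang on the first line
        /^[ \t]*#/       { next }          # drop comment lines, indented or not
                         { print }         # everything else passes through
    ' "$@"
}

printf '#!/bin/sh\n# comment\n  # indented\necho hi # trailing\n#\n' | strip_comments
```

Lines that merely contain a # after other text, like the echo line in the sample, are kept as they are.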
You can use the following for an awk solution -
awk '!/^#/' inputfile
This answer builds upon the earlier answer by Keith.
egrep -v "^[[:blank:]]*#" should filter out comment lines.
egrep -v "^[[:blank:]]*(#|$)" should filter out both comments and empty lines, as is frequently useful.
For information about [:blank:] and other character classes, refer to https://en.wikipedia.org/wiki/Regular_expression#Character_classes.
If you want to delete from the file starting with a specific word, then do this:
grep -v '^pattern' currentFileName > newFileName && mv newFileName currentFileName
This removes all the lines starting with the pattern by writing the remaining content to a new file and then renaming that file over the original.
You also might want to remove empty lines as well
sed -E '/(^$|^#)/d' inputfile
Delete all empty lines and also all lines starting with a # after any spaces:
sed -E '/^$|^\s*#/d' inputfile
For example, see the following 3 deleted lines (including just line numbers!):
1. # first comment
2.
3. # second comment
After testing the command above, you can use option -i to edit the input file in place.
Just this!
Here it is with a loop over all files with some extension:
for i in *.filename_extension   # let the shell expand the names instead of parsing ls output
do
echo "$i"
sed -i '/^#/d' "$i"
done

Appending to line with sed, adding separator if necessary

I have a properties file, which, when unmodified has the following line:
worker.list=
I would like to use sed to append to that line a value so that after sed has run, the line in the file reads:
worker.list=test
But, when I run the script a second time, I want sed to pick up that a value has already been added, and thus adds a separator:
worker.list=test,test
That's the bit that stumps me (frankly sed scares me with its power, but that's my problem!)
Rich
That's easy! If you're running GNU sed, you can write it rather short:
sed -e '/worker.list=/{s/$/,myValue/;s/=,/=/}'
That'll add ',myValue' to the line, and then remove the comma (if any) after the equal sign.
If you're stuck on some other platform you need to break it apart like so
sed -e '/worker.list=/{' -e 's/$/,myValue/' -e 's/=,/=/' -e '}'
It's a pretty stupid script in that it doesn't know about the existence of values etc. (I suppose you CAN do more elaborate parsing, but why should you?), but I guess that's the beauty of it. Oh, and it'll destroy a line like this
worker.list=,myval
which will turn into
worker.list=myval,test
If that's a problem let me know, and I'll fix that for you.
HTH.
You can also use awk. Set the field delimiter to "=", so the part you want to append to is always field number 2. Example:
$ more file
worker.list=
$ awk -F"=" '/worker\.list/{$2=($2=="")? $2="test" : $2",test"}1' OFS="=" file
worker.list=test
$ awk -F"=" '/worker\.list/{$2=($2=="")? $2="test" : $2",test"}1' OFS="=" file >temp
$ mv temp file
$ awk -F"=" '/worker\.list/{$2=($2=="")? $2="test1" : $2",test1"}1' OFS="=" file
worker.list=test,test1
or the equivalent of the sed answer
$ awk -F"=" '/worker\.list/{$2=$2",test1";sub("=,","=")}1' OFS="=" file
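For use inside a script, the awk approach can be wrapped so that the value is passed as a variable and the file is updated through a temporary copy. This is only a sketch; the name append_worker is made up, and it assumes the key sits at the start of the line with no surrounding spaces.

```shell
# Hypothetical helper: append VALUE to the worker.list line of FILE,
# adding a comma only when the line already has a value.
append_worker() {
    file=$1
    value=$2
    tmp=$(mktemp) || return 1
    if awk -v val="$value" '
        BEGIN { FS = OFS = "=" }
        $1 == "worker.list" { $2 = ($2 == "" ? val : $2 "," val) }
        { print }
    ' "$file" > "$tmp"; then
        # Copy the contents back so the original file keeps its permissions.
        cat "$tmp" > "$file"
    fi
    rm -f "$tmp"
}
```

Calling append_worker file test twice on a file containing worker.list= leaves worker.list=test,test. Values containing backslashes would need care, because awk processes escape sequences in -v assignments.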

Resources