Delete empty lines and trim surrounding spaces in Bash

This command removes empty lines:
sed -e '/^$/d' file
But how do I remove the spaces at the beginning and end of each non-empty line?

$ sed 's/^ *//; s/ *$//; /^$/d' file.txt
`s/^ *//` => left trim
`s/ *$//` => right trim
`/^$/d` => remove empty line
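The patterns above strip only space characters. A minimal variant, assuming you also want tabs treated as whitespace, swaps the literal space for the POSIX [[:space:]] class. The function name trim_nonempty is only for illustration:

```shell
#!/bin/sh
# trim_nonempty (hypothetical helper name): read stdin, strip leading and
# trailing whitespace (spaces and tabs) from every line, then drop lines
# that end up empty. The commands run in order, so whitespace-only lines
# are emptied first and then deleted.
trim_nonempty() {
  sed -e 's/^[[:space:]]*//' \
      -e 's/[[:space:]]*$//' \
      -e '/^$/d'
}

# Usage: trim_nonempty < file.txt
```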

An even simpler method uses awk:
awk 'NF { $1=$1; print }' file
NF selects non-blank lines, and $1=$1 trims leading and trailing spaces (with the side effect of squeezing sequences of spaces in the middle of the line).
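If squeezing the interior spaces is unwanted, a sketch that keeps the NF filter but trims with sub() instead of rebuilding the record leaves the middle of each line untouched (trim_keep_inner is a hypothetical name):

```shell
#!/bin/sh
# Print non-blank lines with leading/trailing spaces and tabs removed.
# Unlike $1=$1, sub() does not rebuild the record, so inner spacing survives.
trim_keep_inner() {
  awk 'NF { sub(/^[ \t]+/, ""); sub(/[ \t]+$/, ""); print }' "$@"
}

# Usage: trim_keep_inner file
```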

This might work for you:
sed -r 's/^\s*(.*\S)?\s*$/\1/;/^$/d' file.txt

Similar, but using the ex editor:
ex -s +"g/^$/de" +"%s/^\s\+//e" +"%s/\s\+$//e" -cwq foo.txt
For multiple files:
ex -s +'bufdo!g/^$/de' +'bufdo!%s/^\s\+//e' +'bufdo!%s/\s\+$//e' -cxa *.txt
To process files recursively, you can enable bash's globstar option (shopt -s globstar, bash 4.0 or later) and use a pattern such as **/*.txt.
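A sketch of the recursive case, assuming bash with globstar support and using GNU sed (for -i without a backup suffix) in place of ex; clean_tree is an illustrative name, not part of any tool:

```shell
#!/usr/bin/env bash
# ** only recurses when globstar is on; nullglob makes the loop a no-op
# when nothing matches instead of passing the literal pattern to sed.
shopt -s globstar nullglob

# clean_tree (hypothetical helper): trim whitespace and drop empty lines
# in every *.txt file under the current directory, editing in place.
clean_tree() {
  local f
  for f in **/*.txt; do
    sed -i -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' "$f"
  done
}
```

Note that **/*.txt also matches files in the current directory itself, because ** can match zero directories.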

Related

How to remove a comma 2nd to the last line of the file using SED?

I'm trying to find and remove the comma (,) on the second-to-last line using sed.
This is what I have now:
}
"user-account-id": "John",
"user-account-number": "v1001",
"user-account-app": "v10.0.0",
"user-account-dbase": "v10.1.0",
}
I want the end result to be like this:
}
"user-account-id": "John",
"user-account-number": "v1001",
"user-account-app": "v10.0.0",
"user-account-dbase": "v10.1.0"
}
I thought I had found the answer an hour after posting this, but I was wrong: it didn't work.
A dry run with any of these combinations doesn't work:
sed '2,$ s/,$//' filename
sed '2,$ s/,//' filename
sed '2,$ s/,//g' filename
sed '2,$s/,$//' filename
sed '2,$s/,//' filename
sed '2,$s/,//g' filename
Actual in-place removal with any of these combinations doesn't work either:
sed -i '2,$ s/,$//' filename
sed -i '2,$ s/,//' filename
sed -i '2,$ s/,//g' filename
sed -i '2,$s/,$//' filename
sed -i '2,$s/,//' filename
sed -i '2,$s/,//g' filename
I thought running sed with '2,$' would only modify the second-to-last line in the file.
Instead, it deletes the commas on every line, which doesn't make sense to me:
}
"user-account-id": "John"
"user-account-number": "v1001"
"user-account-app": "v10.0.0"
"user-account-dbase": "v10.1.0"
}
2,$ is a range starting at the 2nd line from the beginning and ending at the last line (so all lines except the first one). Modifying the second-to-last line is hard in sed; see for example Replace the "pattern" on second-to-last line of a file.
But in your case, there is an easier solution with GNU sed:
Treat the entire file as one string and delete the last comma that is followed by a } at the end of the file (ignoring any whitespace in between, including line breaks).
sed -Ez 's/,([ \t\r\n]*)\}([ \t\r\n]*)$/\1}\2/' file
In case you know the last
Another tactic: reverse the file, remove the trailing comma the first time one is seen, then reverse the file again:
tac file | awk -v p=1 'p && /,$/ {sub(/,$/, ""); p=0} 1' | tac
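tac comes from GNU coreutils and may be missing on non-GNU systems. A portable awk-only sketch targets the second-to-last line directly by buffering the whole file, which is fine for small files like this JSON snippet (drop_penultimate_comma is a hypothetical name):

```shell
#!/bin/sh
# Buffer every line, remove a trailing comma from the second-to-last line
# only, then print all lines back in their original order.
drop_penultimate_comma() {
  awk '{ line[NR] = $0 }
       END {
         if (NR >= 2) sub(/,$/, "", line[NR-1])
         for (i = 1; i <= NR; i++) print line[i]
       }' "$@"
}

# Usage: drop_penultimate_comma file
```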
This might work for you (GNU sed):
sed 'N;$s/,//;P;D' file
This opens a two-line window that slides through the file; when the last line is read into the window, the first comma in the window (here, the trailing comma of the second-to-last line) is removed.
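Because s/,// removes the first comma anywhere in the window, it can hit the wrong comma if the second-to-last line contains other commas. A variant, assuming GNU sed (which accepts \n in both the regex and the replacement), anchors on the comma directly before the embedded newline; $!N is the usual portable form that skips N on the last line:

```shell
#!/bin/sh
# Keep a two-line window (current + next line). When the window holds the
# last line, delete only the comma directly before the newline, i.e. the
# trailing comma of the second-to-last line.
strip_penultimate_comma() {
  sed '$!N;$s/,\n/\n/;P;D' "$@"
}

# Usage: strip_penultimate_comma file
```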

getting first part of a string that has two parts

I have a string with two parts (a path and an owner) separated by a space.
This is the input file input.txt
/dir1/dir2/file1 #owner1
/dir1/dir2/foo\ bar #owner2
I want to extract all the paths to a separate output file - output.txt
I can't use a space as the delimiter, because file names in the paths can themselves contain spaces. Expected output:
/dir1/dir2/file1
/dir1/dir2/foo\ bar
Here is a different way of doing it with rev and GNU grep:
rev file | grep -oP '.*# \K.*' | rev
OR
rev file | grep -oP '.*#\s+\K.*' | rev
For a simpler solution, split on " #":
awk -F' #' '{print $1}' Input_file
Assuming that spaces which shouldn't be treated as delimiters are escaped by a backslash, as in your sample, you could use the following regex:
^(\\ |[^ ])*
For instance with grep :
grep -oE '^(\\ |[^ ])*'
The regex matches, from the start of the line, any number of either a backslash followed by a space or any character other than a space, and stops at the first occurrence of a space that isn't preceded by a backslash.
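The same idea also works in pure bash with the =~ operator. This sketch, assuming bash rather than sh, uses a slightly stricter form of the regex: a backslash followed by any character counts as one unit, and the other alternative excludes backslashes, so the result does not depend on longest-match rules. first_part is a hypothetical name:

```shell
#!/usr/bin/env bash
# first_part (hypothetical name): print each line up to the first space
# that is not escaped with a backslash.
first_part() {
  # Either an escaped character (\ plus anything) or a character that is
  # neither a backslash nor a space. Stored in a variable so that bash
  # treats it as a regex rather than a quoted literal string.
  local re='^(\\.|[^\\ ])*'
  local line
  # read -r keeps the backslashes; the || test handles a missing final newline
  while IFS= read -r line || [[ -n $line ]]; do
    [[ $line =~ $re ]] && printf '%s\n' "${BASH_REMATCH[0]}"
  done
}

# Usage: first_part < input.txt > output.txt
```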
I would trim the ending part with sed.
sed 's/ [^ ]*$//' /path/to/file
This will match from the end of the line:
(blank) matches the space character
[^ ]* matches the longest string that contains no spaces, i.e. #owner1
$ matches the end of the line
And they will be replaced by nothing, which will act as if you deleted the matched string.
A one-liner would do it (read without -r turns the escaped "\ " into a plain space, and printf %q escapes it again):
while read p _; do printf '%q\n' "$p"; done <input.txt >output.txt
You can read the lines into an array and strip everything from the last space onwards in each element:
mapfile -t test < input.txt; test=("${test[@]% *}")
echo "${test[@]}"
echo "${test[0]}"
echo "${test[1]}"
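You can try a simple awk command (note that it leaves a trailing space on each line, because emptying $NF keeps the separator before it):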
awk ' { $NF=""; print } '
Try it here https://ideone.com/W8J1ZO

How to ignore case when using awk or sed [duplicate]

sed -i '/first/i This line to be added'
In this case, how do I ignore case while searching for the pattern first?
You can use the following:
sed 's/[Ff][Ii][Rr][Ss][Tt]/last/g' file
Otherwise, you have the I and i flags:
sed 's/first/last/Ig' file
From man sed:
I
i
The I modifier to regular-expression matching is a GNU extension which
makes sed match regexp in a case-insensitive manner.
Test
$ cat file
first
FiRst
FIRST
fir3st
$ sed 's/[Ff][Ii][Rr][Ss][Tt]/last/g' file
last
last
last
fir3st
$ sed 's/first/last/Ig' file
last
last
last
fir3st
GNU sed
sed '/first/Ii This line to be added' file
You can try
sed 's/first/somethingelse/gI'
If you want to save some typing, try GNU awk, whose IGNORECASE variable makes regex matching case-insensitive:
awk -v IGNORECASE="1" '/first/{your logic}' file
For versions of awk that don't understand the IGNORECASE special variable, you can use something like this:
awk 'toupper($0) ~ /PATTERN/ { print "string to insert" } 1' file
Convert each line to uppercase before testing whether it matches the pattern and if it does, print the string. 1 is the shortest true condition, so awk does the default thing: { print }.
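A concrete instance of the toupper() approach for the original question (inserting a line before every line that contains first in any case), using only POSIX awk features; insert_before_first is a hypothetical name:

```shell
#!/bin/sh
# Before each line whose upper-cased text contains FIRST, print the new line;
# the trailing 1 then prints the original line unchanged.
insert_before_first() {
  awk 'toupper($0) ~ /FIRST/ { print "This line to be added" } 1' "$@"
}

# Usage: insert_before_first file
```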
To use a variable, you could go with this:
awk -v var="$foo" 'BEGIN { pattern = toupper(var) } toupper($0) ~ pattern { print "string to insert" } 1' file
This passes the shell variable $foo and transforms it to uppercase before the file is processed.
Slightly shorter with bash would be to use -v pattern="${foo^^}" and skip the BEGIN block.
Use \b for word boundaries (GNU sed):
sed 's/\bfirst\b/This line to be added/Ig' file

Find the pattern (YYYY-MM-DD) and replace it with the same value concatenating with apostrophes

I have this kind of data:
1,1990-01-01,2,A,2015-02-09
1,NULL,2,A,2015-02-09
1,1990-01-01,2,A,NULL
I'm looking for a solution that replaces each date in the file with the same value wrapped in apostrophes. The expected result for the example is:
1,'1990-01-01',2,A,'2015-02-09'
1,NULL,2,A,'2015-02-09'
1,'1990-01-01',2,A,NULL
I have found how to match my dates, but I can't work out what to replace them with:
sed 's/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/????/' a.txt > b.txt
Capture the date in a group by surrounding the pattern with parentheses (). You can then refer to the captured group with \1 (a second group would be \2, etc.).
sed "s/\([0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]\)/'\1'/g"
Note the g at the end, which ensures that all matches are replaced (if there are more than one in one line).
If you add -r switch to sed, the awkward backslashes before () can be omitted:
sed -r "s/([0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9])/'\1'/g"
This can be further simplified using quantifiers:
sed -r "s/([0-9]{4}-[0-9]{2}-[0-9]{2})/'\1'/g"
Or even:
sed -r "s/([0-9]{4}(-[0-9]{2}){2})/'\1'/g"
As mentioned in the comments, in this particular case you can also use & instead of \1 (& stands for the whole match) and omit the ():
sed -r "s/[0-9]{4}(-[0-9]{2}){2}/'&'/g"
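The interval version does not actually need -r/-E: POSIX basic regular expressions support intervals written as \{n\}. A sketch of the same & substitution without any extended-regex flag (quote_dates is an illustrative name):

```shell
#!/bin/sh
# Wrap every YYYY-MM-DD looking string in single quotes.
# The sed script is double-quoted so the single quotes can appear literally;
# & in the replacement stands for the whole match.
quote_dates() {
  sed "s/[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}/'&'/g" "$@"
}

# Usage: quote_dates a.txt > b.txt
```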
You need to use a capture group, as well as replace all matching occurrences with the g flag.
sed 's/\([0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]\)/'"'"'\1'"'"'/g' a.txt > b.txt
The replacement text is a bit confusing because a single-quoted string in the shell cannot contain a single quote, so you have to close the single-quoted string, add a double-quoted single quote, and then reopen it. Using $'...'-style quoting in bash simplifies this a bit, at the cost of needing to escape the backslashes.
sed $'s/\\([0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]\\)/\'\\1\'/g' a.txt > b.txt
Or, you can simply double-quote the script, since there's nothing currently in it that is subject to expansion:
sed "s/\([0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]\)/'\1'/g" a.txt > b.txt
There is also the special & replacement text, which expands to whatever the regular expression matched, so you can avoid an explicit capture group:
sed "s/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/'&'/g" a.txt > b.txt
With GNU sed:
sed -E 's/([0-9]{2,4}-?){3}/'\''&'\''/g' file
Depending on your file content, the dates may also be described as 1 or 2 followed by a combination of nine dashes or digits:
sed -E 's/[12][-0-9]{9}/'\''&'\''/g' file
Here is one in awk:
$ awk -v q="'" '
BEGIN { FS=OFS="," } # set delimiters
{
for(i=1;i<=NF;i++) # loop all fields
if($i~/[0-9]{4}-[0-9]{2}-[0-9]{2}/) # if field has a date looking string
$i=q $i q # quote it
}1' file
Output:
1,'1990-01-01',2,A,'2015-02-09'
1,NULL,2,A,'2015-02-09'
1,'1990-01-01',2,A,NULL
Could you please try the following? (The regex inside match() could also be written as [0-9]{4}-[0-9]{2}-[0-9]{2}, but my awk is an old version, so I couldn't test that.)
awk -v s1="'" '
{
val=""
# copy the text before each date, then the date wrapped in quotes
while(match($0,/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/)){
val=val substr($0,1,RSTART-1) s1 substr($0,RSTART,RLENGTH) s1
$0=substr($0,RSTART+RLENGTH)
}
# append whatever follows the last date
print val $0
}' Input_file
Output will be as follows.
1,'1990-01-01',2,A,'2015-02-09'
1,NULL,2,A,'2015-02-09'
1,'1990-01-01',2,A,NULL
With Perl, it is simple
perl -pe ' s/(\d{4}-\d\d-\d\d)/\x27$1\x27/g '
\x27 is the hex escape for a single quote, so the Perl code can stay inside shell single quotes. With your input:
$ cat liubo.txt
1,1990-01-01,2,A,2015-02-09
1,NULL,2,A,2015-02-09
1,1990-01-01,2,A,NULL
$ perl -pe ' s/(\d{4}-\d\d-\d\d)/\x27$1\x27/g ' liubo.txt
1,'1990-01-01',2,A,'2015-02-09'
1,NULL,2,A,'2015-02-09'
1,'1990-01-01',2,A,NULL
$
If you want to write the single quotes literally, wrap the command in double quotes and escape the $:
$ perl -pe " s/(\d{4}-\d\d-\d\d)/\'\$1\'/g " liubo.txt
1,'1990-01-01',2,A,'2015-02-09'
1,NULL,2,A,'2015-02-09'
1,'1990-01-01',2,A,NULL
$

Replacing/removing excess white space between columns in a file

I am trying to parse a file with similar contents:
I am a string 12831928
I am another string 41327318
A set of strings 39842938
Another string 3242342
I want the out file to be tab delimited:
I am a string\t12831928
I am another string\t41327318
A set of strings\t39842938
Another string\t3242342
I have tried the following:
sed 's/\s+/\t/g' filename > outfile
I have also tried cut, and awk.
Just use awk:
$ awk -F'  +' -v OFS='\t' '{sub(/ +$/,""); $1=$1}1' file
I am a string 12831928
I am another string 41327318
A set of strings 39842938
Another string 3242342
Breakdown:
-F'  +' # tell awk that input fields (FS) are separated by 2 or more blanks
-v OFS='\t' # tell awk that output fields are separated by tabs
'{sub(/ +$/,""); # remove all trailing blank spaces from the current record (line)
$1=$1} # rebuild the current record (line), replacing FSs by OFSs
1' # idiomatic: any true condition invokes the default action of "print"
I highly recommend the book Effective Awk Programming, 4th Edition, by Arnold Robbins.
The difficulty comes from the varying number of words per line. While you can handle this with awk, a simple script that reads each word of a line into an array and then tab-delimits the last word will work as well:
#!/bin/bash
fn="${1:-/dev/stdin}"
while read -r line || test -n "$line"; do
read -ra arr <<< "$line"   # split the line into words without glob expansion
nword=${#arr[@]}
for ((i = 0; i < nword - 1; i++)); do
test "$i" -eq '0' && word="${arr[i]}" || word=" ${arr[i]}"
printf "%s" "$word"
done
printf "\t%s\n" "${arr[i]}"
done < "$fn"
Example Use/Output
(using your input file)
$ bash rfmttab.sh < dat/tabfile.txt
I am a string 12831928
I am another string 41327318
A set of strings 39842938
Another string 3242342
Each number is tab-delimited from the rest of the string. Look it over and let me know if you have any questions.
sed -E 's/[ ][ ]+/\t/g' filename > outfile
NOTE: [ ] is an opening bracket, a space, and a closing bracket.
-E enables extended regular expression support.
The pattern [ ][ ]+ replaces only runs of two or more consecutive spaces with a tab.
Tested on MacOS and Ubuntu versions of sed.
Your input has spaces at the end of each line, which makes things a little more difficult. This sed command replaces the spaces before the last column with a tab:
$ sed 's/[[:blank:]]*\([^[:blank:]]*[[:blank:]]*\)$/\t\1/' infile | cat -A
I am a string^I12831928 $
I am another string^I41327318 $
A set of strings^I39842938 $
Another string^I3242342 $
This matches – anchored at the end of the line – blanks, non-blanks and again blanks, zero or more of each. The last column and the optional blanks after it are captured.
The blanks before the last column are then replaced by a single tab, and the rest stays the same – see output piped to cat -A to show explicit line endings and ^I for tab characters.
If there are no blanks at the end of each line, this simplifies to
sed 's/[[:blank:]]*\([^[:blank:]]*\)$/\t\1/' infile
Notice that some seds, notably BSD sed as found in MacOS, can't use \t for tab in a substitution. In that case, you have to use either '$'\t'' or '"$(printf '\t')"' instead.
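Following that advice, a portable sketch can combine both cases: it builds a real tab character with printf (so it does not rely on sed understanding \t) and drops any trailing blanks in the same substitution. to_tsv is a hypothetical name:

```shell
#!/bin/sh
# Replace the blank run before the last column with one tab and drop any
# trailing blanks. The leftmost position where the pattern can reach the
# end of the line is the blank run right before the last column.
to_tsv() {
  tab=$(printf '\t')   # a literal tab character
  sed "s/[[:blank:]]*\([^[:blank:]]*\)[[:blank:]]*\$/${tab}\1/" "$@"
}

# Usage: to_tsv infile > outfile
```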
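Another approach, with GNU sed and rev (this assumes there are no trailing spaces; otherwise, after rev, the first run of spaces is the trailing one):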
$ rev file | sed -r 's/ +/\t/1' | rev
You have trailing spaces on each line. So you can do two sed expressions in one go like so:
$ sed -E -e 's/ +$//' -e $'s/ +/\t/' /tmp/file
I am a string 12831928
I am another string 41327318
A set of strings 39842938
Another string 3242342
Note the $'s/ +/\t/': This tells bash to replace \t with an actual tab character prior to invoking sed.
To show that these deletions and \t insertions are in the right place you can do:
$ sed -E -e 's/ +$/X/' -e $'s/ +/Y/' /tmp/file
I am a stringY12831928X
I am another stringY41327318X
A set of stringsY39842938X
Another stringY3242342X
Simple and without invisible semantic characters in the code:
perl -lpe 's/\s+$//; s/\s\s+/\t/' filename
Explanation:
Options:
-l: strip the trailing newline from each input line and add it back when printing
-p: loop over records (like awk) and print
-e: code follows
Code:
remove trailing whitespace
change two or more whitespace to tab
Tested on OP data. The trailing spaces are removed for consistency.
