Concatenating sed commands - bash

I have a .txt file and there are three sed commands that I am using to manipulate it. First I convert it to a .csv by replacing tabs with commas (A), then I remove lines 1-8 (B), and then I remove the '# ' at the beginning of line 9 (C).
(A) sed 's/\t/,/g' individuals/$message/$message.txt > individuals/$message/$message.csv
(B) sed -i 1,8d individuals/$message/$message.csv
(C) sed -i 's/.\{2\}//' individuals/$message/$message.csv
Is there a better way to do it, maybe integrating these three commands into a single one? It doesn't need to be done using sed but it does need to be done via bash commands.
Here is a sample of my data:
# This data file generated by PLINK at: Mon Jul 11 16:18:56 2022
#
# Below is a text version of your data. Fields are TAB-separated.
# Each line corresponds to a single SNP. For each SNP, we provide its
# identifier, its location on a reference human genome, and the genotype call.
# For further information (e.g. which reference build was used), consult the
# original source of your data.
#
# rsid chromosome position genotype
22:16050607G-A 22 16050607 GG
I deeply appreciate the help!
PS: Line 9 is the '# rsid chromosome...' one and it should be kept in the file, just without the '# '

Use multiple -e options to run all three steps in a single sed call. Note that the addresses refer to input line numbers, so 9 still matches the header line even though lines 1-8 are deleted by the first expression.
sed -e '1,8d' -e '9s/^# //' -e '9,$s/\t/,/g' "individuals/$message/$message.txt" > "individuals/$message/$message.csv"
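As a quick sanity check, running the combined command against a copy of the posted sample (sample.txt is a hypothetical file holding the lines above, with real tabs between the fields) should give:
sed -e '1,8d' -e '9s/^# //' -e '9,$s/\t/,/g' sample.txt
rsid,chromosome,position,genotype
22:16050607G-A,22,16050607,GG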

Related

removing hosts from a comma delimited file

I am trying to script a way of removing hosts from the hostgroup file in Nagios Core.
The format of the hostgroup file is:
server1,server2,server3,server4
When removing a server, I need to be able to not only remove the server, but also the comma that follows it. So in my example above, if I am removing server2, the file would result as follows
server1,server3,server4
So I have googled and tested the following which works to remove server2 and a comma after it (I don't know what the b is used for exactly)
sed -i 's/\bserver2\b,//g' myfile
What I want to be able to do is to feed a list of hostnames to a small script to remove a bunch of hosts (and their following comma) with something similar to the following. The problem is that placing a variable like ${x} there breaks the script, so nothing happens.
#!/bin/ksh
for x in `cat /tmp/list`
do
sed -i 's/\b${x}\b,//g' myfile
done
I think I am very close on a solution here, but could use a little help. Thanks much in advance for your kind assistance.
Using single quotes tells the shell not to expand ${x}; it turns off variable interpolation, if you want to google for it:
https://www.tldp.org/LDP/abs/html/quotingvar.html. So use double quotes around the sed expression instead:
while read -r x; do sed -i "s/\b${x},\b//g" myfile; done < /tmp/list
But since the last field won't have a comma after it, might be a good idea to run two sed commands, one looking for \bword,\b and the other for ,word$ - where \b is a word boundary and $ is the end of line.
while read -r x; do sed -i "s/\b${x},\b//g" myfile; sed -i "s/,${x}$//" myfile ; done < /tmp/list
One other possible boundary condition - what if you have just server2 on a line by itself and that's what you're trying to delete? Perhaps add a third sed, but this one will leave a blank line behind which you might want to remove:
while read -r x
do
sed -i "s/\b${x},\b//g" myfile # find and delete word,
sed -i "s/,${x}$//" myfile # find and delete ,word
sed -i "s/^${x}$//" myfile # find word on a line by itself
done < /tmp/list
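If you prefer, the three passes can be combined into a single sed invocation per host with multiple -e expressions (a sketch using the same patterns as above):
while read -r x
do
sed -i -e "s/\b${x},\b//g" -e "s/,${x}$//" -e "s/^${x}$//" myfile
done < /tmp/list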
This works quite nicely:
#!/bin/bash
IN_FILE=$1
shift
sed -i "s/\bserver[$*],*\b//g" "$IN_FILE"
sed -i "s/,$//g" "$IN_FILE"
if you invoke it like ./remove_server.sh myfile "1 4" for your example file containing server1,server2,server3,server4, you get the following output:
server2,server3
A quick explanation of what it does:
shift shifts the arguments down by one (making sure that "myfile" isn't fed into the regex)
First sed removes the servers whose numbers are supplied in the string argument (e.g. "1 4"); after shift, [$*] becomes the bracket expression [1 4]
Second sed looks for a trailing comma and removes it
The \b matches a word boundary
This is a great resource for learning about and testing regex: https://regex101.com/r/FxmjO5/1. I would recommend you check it out and use it each time you have a regex problem. It's helped me on so many occasions!
An example of this script working in a more general sense:
I tried it out on this file:
# This is some file containing server info:
# Here are some servers:
server2,server3
# And here are more servers:
server7,server9
with ./remove_server.sh myfile "2 9" and got this:
# This is some file containing info:
# Here are some servers:
server3
# And here are more servers:
server7
Pretty sure there is a pure sed solution for this but here is a script.
#!/usr/bin/env bash
hosts=()
while read -r host; do
hosts+=("s/\b$host,\{,1\}\b//g")
done < /tmp/list
opt=$(IFS=';' ; printf '%s' "${hosts[*]};s/,$//")
sed "$opt" myfile
It does not run sed once per host; there is only one sed invocation in total. So even if you have to remove 20+ patterns, sed still runs only once.
Add -i once you are happy with the output.
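To see what actually gets passed to sed, suppose /tmp/list contains server2 and server4; $opt then expands to a single script like
s/\bserver2,\{,1\}\b//g;s/\bserver4,\{,1\}\b//g;s/,$//
and echo "$opt" is a handy way to inspect it before adding -i.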
Using perl and regex by setting the servers to a regex group in a shell variable:
$ remove="(server1|server4)"
$ perl -p -e "s/(^|,)$remove(?=(,|$))//g;s/^,//" file
server2,server3
Explained:
remove="(server1|server4)" or "server1" or even "server."
"s/(^|,)$remove(?=(,|$))//g" double-quoted to allow shell vars, remove leading comma, expected to be followed by a comma or the end of string
s/^,// file remove leading comma if the first entry was deleted
Use the -i switch for infile editing.
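For example, to edit in place while keeping a backup of the original (the .bak suffix is just an illustration):
perl -pi.bak -e "s/(^|,)$remove(?=(,|$))//g;s/^,//" file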
bash script that reads the servers to remove from standard input, one per line, and uses perl to remove them from the hostfile (Passed as the first argument to the script):
#!/usr/bin/env bash
# Usage: removehost.sh hostgroupfile < listfile
mapfile -t -u 0 servers
IFS="|"
export removals="${servers[*]}"
perl -pi -e 's/,?(?:$ENV{removals})\b//g; s/^,//' "$1"
It reads the servers to remove into an array, joins that into a pipe-separated string, and then uses that in the perl regular expression to remove all the servers in a single pass through the file. Slashes and other funky characters (as long as they're not regex metacharacters) won't break the perl code, because the string is passed through an environment variable instead of being embedded directly. It also uses a word boundary so that removing server2 won't remove that part of server22.
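A hypothetical run, with hostgroup.cfg containing server1,server2,server3,server4:
printf '%s\n' server1 server4 | ./removehost.sh hostgroup.cfg
cat hostgroup.cfg
server2,server3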

How to split a text file content by a string?

Suppose I've got a text file that consists of two parts separated by the delimiting string ---
aa
bbb
---
cccc
dd
I am writing a bash script to read the file and assign the first part to var part1 and the second part to var part2:
part1= ... # should be aa\nbbb
part2= ... # should be cccc\ndd
How would you suggest writing this in bash?
You can use awk:
foo="$(awk 'NR==1' RS='---\n' ORS='' file.txt)"
bar="$(awk 'NR==2' RS='---\n' ORS='' file.txt)"
This reads the file twice, but handling text files in the shell, i.e. storing their content in variables, should generally be limited to small files anyway. Given that your file is small, this shouldn't be a problem.
Note: Depending on your actual task, you may be able to just use awk for the whole thing. Then you neither need to store the content in shell variables nor read the file twice.
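As a sketch of that idea (still assuming an awk with multi-character RS support, such as GNU awk), the whole job can stay inside one awk process:
awk -v RS='---\n' 'NR==1{part1=$0} NR==2{part2=$0} END{printf "part1=<%s>\npart2=<%s>\n", part1, part2}' file.txt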
A solution using sed:
foo=$(sed -n '/^---$/q;p' file.txt)
bar=$(sed -n '1,/^---$/b;p' file.txt)
The -n command line option tells sed to not print the input lines as it processes them (by default it prints them). sed runs a script for each input line it processes.
The first sed script
/^---$/q;p
contains two commands (separated by ;):
/^---$/q - quit when you reach the line matching the regex ^---$ (a line that contains exactly three dashes);
p - print the current line.
The second sed script
1,/^---$/b;p
contains two commands:
1,/^---$/b - starting with line 1 until the first line matching the regex ^---$ (a line that contains only ---), branch to the end of the script (i.e. skip the second command);
p - print the current line;
Using csplit:
csplit --elide-empty-files --quiet --prefix=foo_bar file.txt "/---/" "{*}" && sed -i '/---/d' foo_bar*
If your coreutils version is >= 8.22, the --suppress-matched option can be used and the sed post-processing is not required, like
csplit --suppress-matched --elide-empty-files --quiet --prefix=foo_bar file.txt "/---/" "{*}".
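With the --suppress-matched variant, the two chunks end up in foo_bar00 and foo_bar01, which can then be read back into the variables from the question ($(<file) is bash and strips the trailing newline):
part1=$(<foo_bar00)
part2=$(<foo_bar01)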

How to extract (read and delete) a line from file with a single command?

I would like to extract the first line from a file, read it into a variable and delete it right afterwards, with a single command. I know sed can read the first line as follows:
sed '1q' file.txt
or delete it as follows:
sed '1d' file.txt
but can I somehow do both with a single command?
The reason for this is that multiple processes will be reading the first line of the file, and I want to minimize the chances of them getting the same line.
It's impossible.
Except if you read the man page and have GNU sed:
printf '%s\n' {1..3} > input
cat input
1
2
3
sed -n '1p;2,$ Woutput' input
1
cat output
2
3
Explanation:
sed -n '1p;2,$ Woutput' input
-n no output by default
1p; print line 1
2,$ from line 2 until $ last line
W (non-POSIX) write the pattern space to the file named output
From the GNU sed man page:
w filename
Write the current pattern space to filename.
W filename
Write the first line of the current pattern space to filename. This is a GNU extension.
However, reading and experimenting takes longer than opening the file in a full-blown office suite and deleting the line by hand, or invoking a text-to-speech framework and training it to do the job.
It doesn't work if invoked in POSIX mode:
sed -n --posix '1p;2,$ Woutput' input
And you still have the hard handwork of renaming output to input again.
I didn't try to write to input in place, because that could damage my carefully crafted input file; try it at your own risk:
sed -n '1p;2,$ Winput' input
However, you might set up a filesystem notify job which always renames freshly created output files back to input. But I fear you can't do it from within the sed command. Except ... (to be continued)
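For what it's worth, the pieces above can be glued into a (non-atomic) sketch that captures the first line and shrinks the file in one go, assuming GNU sed and at least two lines in the file:
line=$(sed -n '1p;2,$ Woutput' input) && mv output input
echo "$line"
This still doesn't solve the locking problem from the question: two processes running it at the same time can grab the same line.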

Using BASH, how to increment a number that uniquely only occurs once in most lines of an HTML file?

The target is always going to be between two characters, 'E' and '/', there will never be more than one occurrence of this combination (e.g. 'E01/') per line, it occurs in most lines of the HTML file, and the number will always be between '01' and '90'.
So, I need to programmatically read the file and replace each occurrence of 'Enn/', where 'nn' is between '01' and '90', incrementing the existing number by 1 throughout the HTML file while keeping the leading '0' for '01' to '09'.
Is this doable and if so how best to go about it?
Edit: Target lines will be in one of the following two formats:
<DT>ProgramName
<DT>Program Name
You can use sed inside BASH as a fantastic one-liner, either:
sed -ri 's/(.*E)([0-9]{2})(\/.*)/printf "\1%02u\3" $((10#\2+(10#\2>=90?0:1)))/ge' FILENAME
or if you are guaranteed the number is lower than 100:
sed -ri 's/(.*E)([0-9]{2})(\/.*)/printf "\1%02u\3" $((10#\2+1))/ge' FILENAME
Basically, you'll be doing an in-place search and replace. The above will not add anything after 90 (since you didn't specify the exact nature of the overflow condition). So E89/ -> E90/, E90/ -> E90/, and if by chance you have E91/, it will remain E91/. Add this line inside a loop for multiple files.
A small explanation of the above command:
-r states that you'll be using extended regular expressions
-i states to write back to the same file (be careful with overwriting!)
s/search/replace/ge this is the regex command you'll be using
s/ starts the substitute command
(.*E) first grouping: all characters up to the last 'E' on the line (case sensitive)
([0-9]{2}) second grouping: exactly two digits 0 through 9 (fixed width)
(\/.*) third grouping: the escaped trailing slash and everything after it
/ (slash separator) denotes end of search pattern and beginning of replacement pattern
printf "format" var this is the expression used for each replacement
\1 place first grouping found here
%02u the replace format for the var
\3 place third grouping found here
$((expression)) BASH arithmetic expression to use in printf format
10#\2 force second grouping as a base 10 number
+(10#\2>=90?0:1) add 0 or 1 to the second grouping based on if it is >= 90 (as used in first command)
+1 add 1 to the second grouping (see second command)
/ge flags: g for global replacement, and e (a GNU extension) to execute the assembled replacement as a shell command and use its output
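A quick demonstration of the e flag at work, on a made-up input line (this assumes GNU sed and a /bin/sh that understands the bash-style 10# prefix, since the assembled printf command is handed to /bin/sh):
echo '<DT>E09/Program Name' | sed -r 's/(.*E)([0-9]{2})(\/.*)/printf "\1%02u\3" $((10#\2+1))/ge'
<DT>E10/Program Name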
GNU sed and awk are very powerful tools to do this sort of thing.
You can use the following perl one-liner to increment the numbers while maintaining the ones with leading 0s.
perl -pe 's/E\K([0-9]+)/sprintf "%02d", 1+$1/e' file
$ cat file
<DT>ProgramName
<DT>Program Name
<DT>Program Name
<DT>Program Name
$ perl -pe 's/E\K([0-9]+)/sprintf "%02d", 1+$1/e' file
<DT>ProgramName
<DT>Program Name
<DT>Program Name
<DT>Program Name
You can add the -i option to make changes in-place. I would recommend creating backup before doing so.
Not as elegant as a one-line sed!
Breaking the work into multiple commands makes it easier to debug your bash, grep, or sed.
# find the number
# use -o to grep to just return pattern
# use head -n1 for safety to just get 1 number
n=$(grep -o "E[0-9][0-9]\/" file.html |grep -o "[0-9][0-9]"|head -n1)
# leading zeros make 08 and 09 invalid octal in arithmetic, so force base 10
n1=10#$n
echo Debug n1=$n1 n=$n
n2=$n1
# bash arithmetic done inside (( ))
# as ever with bash bracketing whitespace is needed
(( n2++ ))
echo debug n2=$n2
# use sed with -i -e for inline edit to replace number
sed -i -e "s/E$n\//E$(printf '%02d' $n2)\//" file.html
grep "E[0-9][0-9]" file.html
awk might be better; maybe it could even be done in one awk command (see the sketch at the end of this answer).
The sed one-liner in the other answer is awesome :-)
This works in bash or sh.
http://unixhelp.ed.ac.uk/CGI/man-cgi?grep
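A sketch of that awk idea, under the question's guarantee of at most one Enn/ per line (note: unlike the capped sed command, this does not stop at 90, and it writes to stdout rather than in place):
awk 'match($0, /E[0-9][0-9]\//) { n = substr($0, RSTART+1, 2) + 1; $0 = substr($0, 1, RSTART) sprintf("%02d", n) substr($0, RSTART+3) } 1' file.html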

sed not replacing lines

I have a file with 1 line of text, called output. I have write access to the file. I can change it from an editor with no problems.
$ cat output
1
$ ls -l o*
-rw-rw-r-- 1 jbk jbk 2 Jan 27 18:44 output
What I want to do is replace the first (and only) line in this file with a new value, either a 1 or a 0. It seems to me that sed should be perfect for this:
$ sed '1 c\ 0' output
0
$ cat output
1
But it never changes the file. I've tried it spread over 2 lines at the backslash, and with double quotes, but I cannot get it to put a 0 (or anything else) in the first line.
Sed operates on streams and prints its output to standard out.
By default, it does not modify the input file.
It's typically used like this when you want to capture its output in a file:
#
# replace every occurrence of foo with bar in input-file
#
sed 's/foo/bar/g' input-file > output-file
The above command invokes sed on input-file and redirects the output to a new file named output-file.
Depending on your platform, you might be able to use sed's -i option to modify files in place:
sed -i.bak 's/foo/bar/g' input-file
NOTE:
Not all versions of sed support -i.
Also, different versions of sed implement -i differently.
On some platforms you MUST specify a backup extension (on others you don't have to).
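For illustration, the two common in-place spellings:
sed -i 's/foo/bar/g' input-file       # GNU sed: no backup made
sed -i '' 's/foo/bar/g' input-file    # BSD/macOS sed: an (empty) backup suffix argument is mandatory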
Since this is an incredibly simple file, sed may actually be overkill. It sounds like you want the file to have exactly one character: a '0' or a '1'.
It may make better sense in this case to just overwrite the file rather than to edit it, e.g.:
echo "1" > output
or
echo "0" > output
