SED script to remove whitespace from a region - bash

I have a file with a bunch of strings that looks like this:
new Tab("Hello World")
I want to turn those particular lines into something like:
new Tab("helloWorld")
Is this possible using SED and if so, how can I accomplish this? I figure I have to use grouping and regions but I can't figure out the replacement string.
This is what I have so far
sed -n 's/new Tab("\(.*\)"/new Tab("\1")'

This solution is not perfect: it assumes the line contains just new Tab("some string blah blah blah") and nothing else on that line. Here is my *remove_space.sed:*
/new Tab/ {
s/ *//g
s/newTab/new Tab/
}
To invoke:
sed -f remove_space.sed data.txt
The first substitution blindly removes all spaces on the line; the second puts back the space between new and Tab. It does not lowercase the first letter of the string.
You don't have to put this in a file, the script works on command line as well:
sed '/new Tab/s/ *//g;s/newTab/new Tab/' data.txt
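The script above strips every space on the line and leaves the capital H in place. A minimal sketch that only touches the text inside the quotes and also lowercases its first letter could look like this. It assumes GNU sed (the \l case conversion is a GNU extension), and camel_tabs is a hypothetical helper name, not something from the question.

```shell
# Hypothetical helper: camel-case the string inside new Tab("...").
# Assumes GNU sed: \l (lowercase the next character) is a GNU extension.
camel_tabs() {
  sed -E \
    -e ':a' \
    -e 's/(new Tab\("[^" ]*) +/\1/' \
    -e 'ta' \
    -e 's/(new Tab\(")([A-Z])/\1\l\2/' \
    "$@"
}

# The first s removes one run of spaces inside the quotes per pass;
# "ta" jumps back to label a until no run of spaces is left.
result=$(printf '%s\n' 'new Tab("Hello World")' 'other line stays' | camel_tabs)
printf '%s\n' "$result"
```

Lines that do not contain new Tab(" are passed through unchanged, so the trailing fix-up substitution from the original script is not needed.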

I'm not enough of a sed guru, but here's a piece of Perl:
perl -pe 's/(?<=new Tab\(")[^"]+/ lcfirst(join("", split(" ", $&))) /e'

My first thought was using awk but I came up with something I don't really like:
echo "new Tab(\"Hello World\")" | gawk 'match($0, /new Tab\("(.*)"\)/, r) {print r[1]}' | sed -e 's/ *//g'

Related

How to remove a comma 2nd to the last line of the file using SED?

I'm trying to search and remove a comma , at the 2nd to the last line using sed.
This is what I have now:
}
"user-account-id": "John",
"user-account-number": "v1001",
"user-account-app": "v10.0.0",
"user-account-dbase": "v10.1.0",
}
I want the end result to be like this:
}
"user-account-id": "John",
"user-account-number": "v1001",
"user-account-app": "v10.0.0",
"user-account-dbase": "v10.1.0"
}
I thought I found the answer an hour after I posted this but I was wrong. It didn't work.
A dry run with any of these combinations doesn't work:
sed '2,$ s/,$//' filename
sed '2,$ s/,//' filename
sed '2,$ s/,//g' filename
sed '2,$s/,$//' filename
sed '2,$s/,//' filename
sed '2,$s/,//g' filename
Actual removal with any of these combinations doesn't work either:
sed -i '2,$ s/,$//' filename
sed -i '2,$ s/,//' filename
sed -i '2,$ s/,//g' filename
sed -i '2,$s/,$//' filename
sed -i '2,$s/,//' filename
sed -i '2,$s/,//g' filename
I thought running sed with 2,$ would only modify the second-to-last line of the file.
Instead, the output has the commas deleted from every line, which doesn't make sense:
}
"user-account-id": "John"
"user-account-number": "v1001"
"user-account-app": "v10.0.0"
"user-account-dbase": "v10.1.0"
}
2,$ is a range starting at the 2nd line from the beginning and ending at the last line (so all lines except for the first one). Modifying the 2nd last line is hard in sed, see for example Replace the "pattern" on second-to-last line of a file.
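A small demonstration of that range, using made-up three-line input rather than the file from the question: the comma on line 1 survives, and every later line loses its trailing comma.

```shell
# 2,$ selects line 2 through the last line, so only line 1 keeps its comma.
result=$(printf '%s\n' 'a,' 'b,' 'c,' | sed '2,$ s/,$//')
printf '%s\n' "$result"
```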
But in your case, there is an easier solution with GNU sed:
Treat the entire file as one string (that is what -z does when the file contains no NUL bytes) and delete the last comma followed by a } at the end of the file (ignoring any whitespace, even line breaks).
sed -Ez 's/,([ \t\r\n]*)\}([ \t\r\n]*)$/\1}\2/' file
In case you know the last
Another tactic: reverse the file, remove the trailing comma on the first time it's seen, then re-reverse the file:
tac file | awk -v p=1 'p && /,$/ {sub(/,$/, ""); p=0} 1' | tac
This might work for you (GNU sed):
sed 'N;$s/,//;P;D' file
Open a two line window throughout the file and when the last line is encountered, remove the first ,.
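For systems without GNU sed or tac, the same two-line window can be sketched in plain awk. This is an illustration under assumptions, not one of the posted answers: strip_penultimate_comma is a hypothetical name, and it removes a trailing comma (plus any spaces or tabs after it) only from the second-to-last line.

```shell
# Hypothetical helper: drop the trailing comma on the second-to-last line.
# a holds the previous line, b the current one; a line is printed only once
# it can no longer be one of the last two.
strip_penultimate_comma() {
  awk '
    NR > 2 { print a }
    { a = b; b = $0 }
    END {
      if (NR >= 2) { sub(/,[ \t]*$/, "", a); print a }
      if (NR >= 1) print b
    }
  ' "$@"
}

input='{
"user-account-app": "v10.0.0",
"user-account-dbase": "v10.1.0",
}'
result=$(printf '%s\n' "$input" | strip_penultimate_comma)
printf '%s\n' "$result"
```

To change the file itself, redirect the output to a temporary file and move it over the original.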

How to ignore case when using awk or sed [duplicate]

sed -i '/first/i This line to be added'
In this case, how can I ignore case while searching for the pattern first?
You can use the following:
sed 's/[Ff][Ii][Rr][Ss][Tt]/last/g' file
Otherwise, GNU sed has the I flag (the s command also accepts i):
sed 's/first/last/Ig' file
From man sed:
I
i
The I modifier to regular-expression matching is a GNU extension which
makes sed match regexp in a case-insensitive manner.
Test
$ cat file
first
FiRst
FIRST
fir3st
$ sed 's/[Ff][Ii][Rr][Ss][Tt]/last/g' file
last
last
last
fir3st
$ sed 's/first/last/Ig' file
last
last
last
fir3st
GNU sed
sed '/first/Ii This line to be added' file
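Since the I address modifier and the one-line form of i are GNU extensions, a sketch of the same insertion for other sed implementations could combine the bracket-expression pattern shown earlier with the traditional multi-line i\ syntax. insert_before_first is a hypothetical helper name.

```shell
# Hypothetical helper: insert a line before every line matching "first",
# ignoring case, without relying on GNU-only flags.
insert_before_first() {
  sed '/[Ff][Ii][Rr][Ss][Tt]/i\
This line to be added
' "$@"
}

result=$(printf '%s\n' 'a' 'FiRst' 'b' | insert_before_first)
printf '%s\n' "$result"
```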
You can try
sed 's/first/somethingelse/gI'
If you want to save some typing, try awk (the IGNORECASE variable is specific to gawk). I don't think sed has that option.
awk -v IGNORECASE="1" '/first/{your logic}' file
For versions of awk that don't understand the IGNORECASE special variable, you can use something like this:
awk 'toupper($0) ~ /PATTERN/ { print "string to insert" } 1' file
Convert each line to uppercase before testing whether it matches the pattern and if it does, print the string. 1 is the shortest true condition, so awk does the default thing: { print }.
To use a variable, you could go with this:
awk -v var="$foo" 'BEGIN { pattern = toupper(var) } toupper($0) ~ pattern { print "string to insert" } 1' file
This passes the shell variable $foo and transforms it to uppercase before the file is processed.
Slightly shorter with bash would be to use -v pattern="${foo^^}" and skip the BEGIN block.
Use the following, \b for word boundary
sed 's/\bfirst\b/This line to be added/Ig' file

How to remove symbols and add file name to fasta headers

I have several fasta files with the following headers:
M01498:408:000000000-BLBYD:1:1101:11790:1823 1:N:0:1
I want to remove all symbols (colon, dash, and space), and add "barcodelabel=FILENAME;"
I can do it for one file using:
cat A1.fasta |sed s/-//g | sed s/://g| sed s/\ //g|sed 's/^>/>barcodelabel=A1;/g' >A1.renamed.fasta
How can I do this but for all of my files at once? I tried the code below but it didn't work:
for i in {A..H}{1..6}; do cat ${i}.fasta |sed s/-//g | sed s/://g| sed s/\ //g | sed 's/^>/>barcodelabel=${i};/g' >${i}.named.fasta; done
any help would be appreciated !
Assuming you want to delete every -, : and space and append a label at the end of the first line of each file, the following may help:
awk 'FNR==1{gsub(/:|-| +/,"");print $0,"barcodelabel=" FILENAME;next} 1' Input_file
To save the output back into Input_file, append the following to the command above: > temp_file && mv temp_file Input_file
I figured it out. First, I reduced the number of sed calls to simplify the code. The mistake was in the final sed: I had used single quotes, and it needs double quotes so that the shell expands ${i}. The final code is:
for i in {A..H}{1..6}; do cat ${i}.fasta |
sed 's/[-: ]//g' |
sed "s/^>/>barcodelabel=${i};/g" > ${i}.final4.fasta; done
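The same idea can be sketched as a single sed call per file, with no cat. This is only an illustration: relabel_fasta is a hypothetical helper, it restricts the symbol removal to header lines (those starting with >), and it assumes the file name contains no characters that are special in a sed replacement (such as / or &).

```shell
# Hypothetical helper: print one FASTA file with cleaned, labelled headers.
# The label is the file name without its directory and .fasta suffix.
relabel_fasta() {
  label=$(basename "$1" .fasta)
  # The first expression stays in single quotes (nothing to expand); the
  # second uses double quotes so the shell substitutes ${label}.
  sed -e '/^>/ s/[-: ]//g' -e "s/^>/>barcodelabel=${label};/" "$1"
}

# Usage (bash brace expansion, as in the question):
#   for i in {A..H}{1..6}; do relabel_fasta "$i.fasta" > "$i.renamed.fasta"; done
```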

Replace string by regex

I have a bunch of strings like "{one}two", where "{one}" can vary and "two" is always the same. I need to replace the original string with "three{one}", where "three" is also constant. It could easily be done with Python, for example, but I need to do it with shell tools such as sed or awk.
If I understand correctly, you want:
{one}two --> three{one}
{two}two --> three{two}
{n}two --> three{n}
SED with a backreference will do that:
echo "{one}two" | sed 's/\(.*\)two$/three\1/'
The search stores all text up to your fixed string and then replaces the line with your new string prepended to the stored text. sed is greedy by default, so it grabs everything up to the final fixed string even if the variable part repeats it (e.g., {two}two still maps to three{two}).
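A quick check of the greedy behaviour, using the two inputs mentioned above:

```shell
# \(.*\) is greedy and two$ is anchored, so only the final "two" is consumed.
result=$(printf '%s\n' '{one}two' '{two}two' | sed 's/\(.*\)two$/three\1/')
printf '%s\n' "$result"
```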
Using sed:
s="{one}two"
sed 's/^\(.*\)two/three\1/' <<< "$s"
three{one}
echo "XXXtwo" | sed -E 's/(.*)two/three\1/'
Here's a Bash only solution:
string="{one}two"
echo "three${string/two/}"
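One caveat with ${string/two/}: it removes the first occurrence of two, so {two}two would lose the wrong one. A minimal variant, assuming the fixed part is always a suffix, uses the POSIX suffix-removal expansion instead:

```shell
string="{two}two"
# ${string%two} strips "two" only from the end of the value.
result="three${string%two}"
printf '%s\n' "$result"
```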
awk '{a=gensub(/(.*)two/,"three\\1","g"); print a}' <<< "{one}two"
Output:
three{one}
awk '/{.*}two/ { split($0,s,"}"); print "three"s[1]"}" }' <<< "{one}two"
does also output
three{one}
Here, we are using awk to find the correct lines, and then split on "}" (which means your lines should not contain more than the one to indicate the field).
Through GNU sed,
$ echo 'foo {one}two bar' | sed -r 's/(\{[^}]*\})two/three\1/g'
foo three{one} bar
Basic sed,
$ echo 'foo {one}two bar' | sed 's/\({[^}]*}\)two/three\1/g'
foo three{one} bar

Shell scripting, replace the characters using sed

In shell scripting, I want to replace 1bc1034gf45dna22 (16 characters total) with 1b:c1:03:4g:f4:5d:na:22 (separated by colons) using sed.
Edit
I have tried
sed 's/\w{2}/\w{2}:/g' a.txt > b.txt
where a.txt has
1bc1034gf45dna22
Too Many Edge Cases
Sed is not an ideal tool for this job, because there are just too many edge cases. If you have very regular data and a known corpus then that's fine, but in some cases you need something more powerful that can act on pieces of a line independently after matching/capturing the text of interest. For example, given an input file containing:
foo 1bc1034gf45dna22 bar
you might need to extract the second column, transform it, and then substitute it in place. You might do this with Ruby as follows:
$ echo 'foo 1bc1034gf45dna22 bar' |
    ruby -pe '
      if /(?<str>\p{Alnum}{16})/ =~ $_
        $_.sub!(str, str.scan(/../).join(?:))
      end'
This correctly yields:
foo 1b:c1:03:4g:f4:5d:na:22 bar
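The same field-by-field idea can be sketched with awk for readers who want to stay with shell tools. This is an illustration, not part of the answer above: colonize is a hypothetical name, it only rewrites whitespace-separated fields made of exactly 16 letters or digits, and because it assigns to a field, awk rejoins the line with single spaces.

```shell
# Hypothetical helper: insert colons between character pairs, but only in
# fields that are exactly 16 alphanumeric characters long.
colonize() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i ~ /^[0-9A-Za-z]+$/ && length($i) == 16) {
        out = ""
        for (j = 1; j <= 16; j += 2)
          out = out (j > 1 ? ":" : "") substr($i, j, 2)
        $i = out
      }
    print
  }' "$@"
}

result=$(printf '%s\n' 'foo 1bc1034gf45dna22 bar' | colonize)
printf '%s\n' "$result"
```

Short words such as foo and bar are left alone, which is the edge case a bare pair-matching substitution would get wrong.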
Does
sed -e 's/../&:/g' -e 's/:$//' a.txt > b.txt
or
sed -e 's/\(..\)/\1:/g' -e 's/:$//' a.txt > b.txt
work for you?
This might work for you (GNU sed):
sed 's/\w\w\B/&:/g' file
Why not sed 's/1bc1034gf45dna22/1b:c1:03:4g:f4:5d:na:22/g' file ?
You could:
sed -i 's/\(\w\{2\}\)/\1:/g' file
This would also append a : to the end of the string. Since we cannot use positive lookahead in our sed regex syntax, we would then need to:
sed -i 's/:$//g' file
Note that we used in-place editing (-i), so the file itself is modified.
Alternatively, you could use perl, where a lookahead is available; the substitution would be s/(\w{2})(?=\w{2})/$1:/g
With GNU sed on any GNU/Linux distro you can use extended regular expressions. This will convert a string of any length to a string where each two letters are separated by a colon:
echo 1bc1034gf45dna22 | sed -re 's/(\w{2})/\1:/g' -e 's/:$//'
On your AIX platform, you can leverage awk:
echo 1bc1034gf45dna22 | awk '
    BEGIN { FS=""; ORS=""; }
    {
        for (i=1; i <=NF; i++) {
            if (i > 1 && (i + 1) % 2 == 0) {
                print(":");
            }
            print($i);
        }
    }'
Well, there's always the brute-force method:
sed 's/\<'\
'\([0-9a-z][0-9a-z]\)'\
'\([0-9a-z][0-9a-z]\)'\
'\([0-9a-z][0-9a-z]\)'\
'\([0-9a-z][0-9a-z]\)'\
'\([0-9a-z][0-9a-z]\)'\
'\([0-9a-z][0-9a-z]\)'\
'\([0-9a-z][0-9a-z]\)'\
'\([0-9a-z][0-9a-z]\)'\
'\>/\1:\2:\3:\4:\5:\6:\7:\8/g'
With no leading spaces on the continuation lines the shell pastes the strings together properly.
