Deleting all until you find capital letter in bash - bash

I have an output
timeout.o:
U alarm
000000000000t000 T catch_sig_alarm
0000000000000b13 T set_timeout
U signal
0000000g00000000 B timeout
and I need to get rid of the numbers and letters before T, U and B, so the output looks like this:
timeout.o:
U alarm
T catch_sig_alarm
T set_timeout
U signal
B timeout
How can I do that using sed? I tried something like sed 's/[0-9]*//;s/ *//' but I don't know how to tell it to delete the letters too.

Update
Based on the real input data (I thought timeout.o was the file name):
... | awk 'NF>1 {sub("^[^A-Z]*","")} {print}'
timeout.o:
U alarm
T catch_sig_alarm
T set_timeout
U signal
B timeout
It performs the substitution only when the line contains more than one field, so the first line is left untouched. In this case, using NR>1 as the condition would give the same result.
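To see the NF>1 guard at work, here is a small sketch on made-up lines shaped like the question's output. The function name strip_addresses, the sample text and the LC_ALL=C setting are illustrative assumptions, not part of the original answer.

```shell
# strip_addresses is a hypothetical helper name wrapping the awk one-liner.
# LC_ALL=C is an added assumption so that [A-Z] means only ASCII capitals.
strip_addresses() {
  LC_ALL=C awk 'NF>1 {sub("^[^A-Z]*","")} {print}'
}

# The header has a single field and is left alone; symbol lines are trimmed.
printf '%s\n' 'timeout.o:' '0000000000000b13 T set_timeout' '                 U signal' |
  strip_addresses
```

The header survives only because it contains no whitespace; a file name with a space in it would have more than one field and would be trimmed as well.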
You can use this:
$ sed 's/^[^A-Z]*//' timeout.o
U alarm
T catch_sig_alarm
T set_timeout
U signal
B timeout
It takes all the characters from the beginning of the line (^ anchors the match there) that are not capital letters (that is what [^A-Z]* means) and replaces them with an empty string.
Note that the expression sed 's/hello/bye/' replaces only the first hello on each line with bye. If you want to replace every occurrence (not needed in this case), use sed 's/hello/bye/g'.
If you want to do an in-place substitution, use sed -i ....
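A minimal sketch of that substitution on sample lines, including what happens to a line that has no capital letter at all. The helper name strip_to_capital and the LC_ALL=C prefix are assumptions added for the example.

```shell
# strip_to_capital is a hypothetical name; LC_ALL=C keeps [A-Z] ASCII-only.
strip_to_capital() {
  LC_ALL=C sed 's/^[^A-Z]*//'
}

# Everything before the first capital letter is removed.
printf '%s\n' '0000000000000b13 T set_timeout' '0000000g00000000 B timeout' |
  strip_to_capital

# A line without any capital letter is emptied, because [^A-Z]* matches all of it.
printf '%s\n' 'timeout.o:' | strip_to_capital
```

The second call is the reason a header line such as timeout.o: needs to be excluded when it is part of the input rather than the file name.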

input | sed '/^[a-zA-Z0-9.]\+\.[a-z]\+:$/!s/^[^A-Z]*//'
Explanation: [^A-Z] matches any character that is not an uppercase letter. The leading ^ anchors the expression to the beginning of the line, so it cannot go rogue in the middle of the line. The expression simply deletes everything from the start of a line up to the first uppercase letter.
The first part, /^[a-zA-Z0-9.]\+\.[a-z]\+:$/!, up to the s, restricts the removal to lines that do not (the final !) match exactly [letter]...[a dot][letter]...[a colon], which looks like a filename.

cat timeout.o | sed 's/^[^BUT]* //'
Or
sed 's/[0-9a-z]* //;s/ *//'

Given that the first column seems to have a fixed width, I'd just use
{ read; echo "$REPLY"; cut -c18-; } < timeout.o
to remove the first 17 characters (while preserving the initial line in full).
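A sketch of the same idea reading from a pipe, under the assumption that every line after the first carries the symbol type in column 18 (16 address characters plus one space). The name keep_header_cut is made up, and read is given an explicit variable so the snippet does not depend on bash's REPLY.

```shell
# keep_header_cut is a hypothetical helper name.
keep_header_cut() {
  IFS= read -r header        # consume only the first line
  printf '%s\n' "$header"    # print it unchanged
  cut -c18-                  # remaining lines lose their first 17 characters
}

{
  printf '%s\n' 'timeout.o:'
  printf '%s\n' '0000000000000b13 T set_timeout'
  printf '%17s%s\n' '' 'U signal'   # 17 characters of padding, then the symbol
} | keep_header_cut
```

This is the only approach here that does not look at the characters at all, so it breaks if the address width changes (for example on a 32-bit target).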

Related

Remove comma from last element in each block

I've got a file with the following contents, and want to remove the last comma (in this case, the comma after the 'c' and 'f').
heading1(
a,
b,
c,
);
some more text
heading2(
d,
e,
f,
);
This has to be done using bash and not Perl or Python etc., as these are not installed on my target system. I can use sed, awk etc., but I cannot use sed with the -z argument as I'm using an old version of the utility.
So sed -zi 's/,\n);/\n);/g' $file is off the table.
Any help would be greatly appreciated. Thanks
This might work in your version of sed. Then again it might not.
sed 'x;1d;G;/;$/s/,//;$!s/\n.*//' $file
Rough translation: "Swap this line with the hold space. If this is the first line, do no more with it. Append the hold space to the line in the buffer (so that you're looking at the last line and the current one). If what you have ends with a semicolon, delete the comma. If you're not on the last line of the file, delete the second of the two lines you have (i.e. the current line, which we'll deal with after we see the next one)."
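The one-line-delay behaviour described in that translation can be checked on a shortened copy of the sample. This is only a sketch; the function name strip_block_comma and the test input are assumptions.

```shell
# strip_block_comma is a hypothetical wrapper around the hold-space command.
strip_block_comma() {
  sed 'x;1d;G;/;$/s/,//;$!s/\n.*//'
}

# Each line is printed one cycle late, after the following line has been seen.
printf '%s\n' 'heading1(' 'a,' 'b,' 'c,' ');' 'some more text' | strip_block_comma
```

Only the comma after c disappears. Note that s/,// removes the first comma in the two-line buffer, so a line such as "x, y," directly before ); would lose the wrong comma.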
Using awk, RS="^$" to read in the whole file and regex to replace parts of the text:
$ awk -v RS=^$ '{gsub(/,\n\);/,"\n);")}1' file
Some output:
heading1(
a,
b,
c
);
...
This should work with GNU sed and BSD sed on the shown input:
sed -e ':a' -e '/,\n);$/!{N' -e 'ba' -e '}' -e 's/,\n);$/\n);/' file.txt
We concatenate lines in the pattern space until it ends with ,\n);. Then we delete the comma, print (the default) and restart the cycle with a new line.
Simpler and more readable version with GNU sed (that you do not have):
sed ':a;/,\n);$/!{N;ba};s/,\n);$/\n);/' file.txt
Using awk:
awk '
$0==");" {sub(/,$/, "", l)}
FNR!=1 {print l}
{l=$0}
END {print l}'
This might work for you (GNU sed):
sed '/,$/{N;/);$/Ms/,$//M;P;D}' file
If a line ends with a comma, fetch the next line and if this ends in );, remove the comma.
Otherwise, if the following line does not match as above, print/delete the first of the lines and repeat.
Using sed there are broadly two approaches:
Keep multiple lines in the pattern space; or
Keep the previous line in the hold space.
Using just the pattern space means a very concise version:
sed 'N; s/,\([[:space:]]*\n*[[:space:]]*)\)/\1/; P; D'
This relies on the pattern space being able to hold multiple lines, and being able to match the newline with \n. Not all versions of sed can do this, but GNU sed can.
This also relies on the implicit behaviours of N, P, and D, which change depending on when end-of-input is reached. Read man sed for the gory details.
Unrolling this to one command per line gets:
sed '
N
s/,\([[:space:]]*\n*[[:space:]]*)\)/\1/
P
D
'
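The N, P and D commands together act as a sliding two-line window. A hedged sketch of that window follows; it uses $!N instead of a bare N so the last line does not depend on how a given sed treats N at end of input, and it captures the whitespace before ) so the line break is kept. The function name is an assumption.

```shell
# strip_comma_window is a hypothetical name for this two-line-window variant.
strip_comma_window() {
  # $!N : append the next line unless we are already on the last one
  # s   : drop a comma that is followed only by whitespace and then ")"
  # P;D : print and delete the first line, then loop with what is left
  sed '$!N; s/,\([[:space:]]*)\)/\1/; P; D'
}

printf '%s\n' 'heading1(' 'a,' 'c,' ');' 'tail' | strip_comma_window
```

Because D restarts the cycle without reading new input, each line gets paired with both its predecessor and its successor.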
If you have only a POSIX version of sed available, you'll need to use the hold space as well. In this case the idea is that when you see the ) in the pattern space, you edit the line that's in the hold space to remove the comma:
sed '1 { h; d; }; /^)/ { x; s/,[[:space:]]*$//; x; }; x; $ { p; x; s/,[[:space:]]*$//; }'
Unrolling that we get:
sed '
1 {
h
d
}
/^)/ {
x
s/,[[:space:]]*$//
x
}
x
$ {
p
x
s/,[[:space:]]*$//
}
'
Breaking that apart: what follows is a "sed script"; so just put '' around it and "sed" in front of it:
sed '
Start by unconditionally copying the first line from the pattern space to the hold space, and then deleting the pattern space (which forces a skip to the next line)
1 {
h
d
}
For each line that starts with ')', swap the pattern space and hold space (so you now have the previous line in the pattern space), remove the trailing comma (if any), and then swap back again:
/^)/ {
x
s/,[[:space:]]*$//
x
}
Now swap the pattern space with the hold space, so that the hold space now holds the current line and the pattern space holds the previous line.
x
Normally contents of the pattern space will be sent to output when the end of the script is reached, but we have one more case to take care of first.
On the last line, print the previous line, then swap to retrieve the last line and then (because we reach the end of the script) print it too. This code will also remove a trailing comma from the last line, but that's optional; you can remove the s command in the following if you don't want that.
$ {
p
x
s/,[[:space:]]*$//
}
Upon reaching the end of the sed script, the pattern space will be printed; so there's no "p" at the end.
As mentioned before, close the quote from the beginning.
'
Note:
If you need to scan ahead more than one line, instead of "x" to swap one line, use "H;g" to append to the hold space and then copy the hold space to the pattern space, then "P;D" to print and remove up to the first newline. (H, P and D are standard POSIX sed commands, so this does not require GNU sed.)

sed/awk between two patterns in a file: pattern 1 set by a variable from lines of a second file; pattern 2 designated by a specified character

I have two files. One file contains a pattern that I want to match in a second file. I want to use that pattern to print between that pattern (included) up to a specified character (not included) and then concatenate into a single output file.
For instance,
File_1:
a
c
d
and File_2:
>a
MEEL
>b
MLPK
>c
MEHL
>d
MLWL
>e
MTNH
I have been using variations of this loop:
while read $id;
do
sed -n "/>$id/,/>/{//!p;}" File_2;
done < File_1
hoping to obtain something like the following output:
>a
MEEL
>c
MEHL
>d
MLWL
But have had no such luck. I have played around with grep/fgrep awk and sed and between the three cannot seem to get the right (or any output). Would someone kindly point me in the right direction?
Try:
$ awk -F'>' 'FNR==NR{a[$1]; next} NF==2{f=$2 in a} f' file1 file2
>a
MEEL
>c
MEHL
>d
MLWL
How it works
-F'>'
This sets the field separator to >.
FNR==NR{a[$1]; next}
While reading the first file, this creates a key in array a for every line of file1.
NF==2{f=$2 in a}
For every line in file 2 that has two fields, this sets variable f to true if the second field is a key in a or false if it is not.
f
If f is true, print the line.
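The steps above can be tried end to end with the question's sample data. This is a sketch: the temporary directory, the wrapper name select_records and the extra parentheses around ($2 in a) are additions for the example.

```shell
# Recreate the question's sample files in a temporary directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' a c d > "$tmp/File_1"
printf '%s\n' '>a' MEEL '>b' MLPK '>c' MEHL '>d' MLWL '>e' MTNH > "$tmp/File_2"

# select_records is a hypothetical wrapper around the awk command.
# f stays set between header lines, so the body lines of a wanted record print too.
select_records() {
  awk -F'>' 'FNR==NR {a[$1]; next} NF==2 {f = ($2 in a)} f' "$1" "$2"
}

select_records "$tmp/File_1" "$tmp/File_2"
```

Since the lookup is an exact array-key test, an ID of a selects >a but not a header such as >ab, which a regex-based match would also pick up.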
A plain (GNU) sed solution. Each file is read only once. It is assumed that the lines of File_1 contain no characters that would need escaping in a sed expression.
pat=$(sed ':a; $!{N;ba;}; y/\n/|/' File_1)
sed -E -n ":a; /^>($pat)/{:b; p; n; /^>/ba; bb}" File_2
Explanation:
The first call to sed generates a regular expression to be used in the second call to sed and stores it in the variable pat. The aim is to avoid rereading the entire File_2 for each line of File_1. It just "slurps" File_1 and replaces the newline characters with | characters, so the sample File_1 becomes the string a|c|d. The regular expression a|c|d matches if at least one of the alternatives (a, c, d in this example) matches (this is a GNU sed extension).
The second sed expression, ":a; /^>($pat)/{:b; p; n; /^>/ba; bb}", could be converted to pseudo code like this:
begin:
read next line (from File_2) or quit on end-of-file
label_a:
if line begins with `>` followed by one of the alternatives in `pat` then
label_b:
print the line
read next line (from File_2) or quit on end-of-file
if line begins with `>` goto label_a else goto label_b
else goto begin
Let me try to explain why your approach does not work well:
You need to say while read id instead of while read $id.
The sed command />$id/,/>/{//!p;} will exclude the lines which start with >.
Then you might want to say something like:
while read id; do
sed -n "/^>$id/{N;p}" File_2
done < File_1
Output:
>a
MEEL
>c
MEHL
>d
MLWL
But the code above is inefficient because it reads File_2 as many times as the count of the id's in File_1.
Please try the elegant solution by John1024 instead.
If ed is available, and since the shell is involved:
#!/usr/bin/env bash
mapfile -t to_match < file1.txt
ed -s file2.txt <<-EOF
g/\(^>[${to_match[*]}]\)/;/^>/-1p
q
EOF
ed runs only once, not once per pattern from file1. For example, if file1 contains a to z, ed will not run 26 times.
Requires bash 4+ because of mapfile.
How it works
mapfile -t to_match < file1.txt
Saves the entry/value from file1 in an array named to_match
ed -s file2.txt points ed to file2 with the -s flag, which suppresses the informational output, such as the byte count ed prints after reading a file (the same number wc -c file would report)
<<-EOF A here document, shell syntax.
g/\(^>[${to_match[*]}]\)/;/^>/-1p
g means search the whole file aka global.
( ) capture group, it needs escaping because ed only supports BRE, basic regular expression.
^> If line starts with a > the ^ is an anchor which means the start.
[ ] is a bracket expression match whatever is inside of it, in this case the value of the array "${to_match[*]}"
; Include the next address/pattern
/^>/ Match a leading >
-1 go back one line after the pattern match.
p print whatever was matched by the pattern.
q quit ed

Sed converting underscore string to CamelCase fails on numbers

I have an assignment to convert function names that are written like this: function_name() to camelCase. There are some restrictions:
don't convert functions with uppercase character in them
don't convert part of function with two underscores (two__underscores())
I came up with a sed command that works fairly well, except that it fails on a single digit between underscores:
command:
sed -re '/[A-Z]+/!s/([0-9a-z])(_)([a-z0-9])/\1\u\3/g'
What it does:
this_is_simple() -> thisIsSimple()
this_is_2_simple() -> thisIs2_simple()
this_is_22_simple() -> thisIs22Simple()
The problem is the second example. Why does it fail on a single digit but not on a number with more digits? I tried using [[:digit:]] and replacing ([0-9a-z]) with ([a-z0-9]|[[:digit:]]). They behave the same.
Thank you in advance.
Loop through it manually and replace up until there is nothing more to replace.
sed -re '/[A-Z]+/!{ : again; /([0-9a-zA-Z])_([a-z0-9])/{ s//\1\u\2/; b again; }; }'
I have added A-Z in the first regex to handle cases like:
this_is_a_simple -> thisIsASimple
After the first match it becomes thisIsA_simple, so in the second loop we want to match A_simple.
Maybe a better version would be:
sed -re '/[A-Z]+/!{ : again; /(.*[0-9a-z])_([a-z0-9])/{ s//\1\u\2/; b again; }; }'
Because regex is greedy, this will replace from the end, so this_is_a_simple at first becomes this_is_aSimple, then this_isASimple, then thisIsASimple.
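One way to see why the single-pass version misses this_is_2_simple: with the g flag, sed resumes searching after the end of the previous match, so once s_2 has been consumed the 2 is no longer available as the left-hand character for 2_s. A two-digit number leaves a spare digit, which is why 22 works. The sketch below compares the two behaviours; the function names are made up, the loop uses s with t rather than the b form above, \u requires GNU sed, and LC_ALL=C is an added assumption to keep [A-Z] ASCII-only.

```shell
# camel_once: one pass with the g flag; matches cannot overlap.
camel_once() {
  LC_ALL=C sed -E '/[A-Z]/!s/([0-9a-z])_([a-z0-9])/\1\u\2/g'
}

# camel_loop: replace one underscore at a time and retry until nothing matches.
# A-Z is allowed on the left because earlier rounds introduce capitals.
camel_loop() {
  LC_ALL=C sed -E '/[A-Z]/!{
    :again
    s/([0-9a-zA-Z])_([a-z0-9])/\1\u\2/
    t again
  }'
}

printf '%s\n' 'this_is_2_simple' | camel_once   # prints thisIs2_simple
printf '%s\n' 'this_is_2_simple' | camel_loop   # prints thisIs2Simple
```

Names containing a double underscore or an existing capital are left unchanged by both functions, matching the restrictions in the question.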

insert a string at specific position in a file by SED awk

I have a string which I need to insert at a specific position in a file.
The file contains multiple semicolons (;) and I need to insert the string just before the last ";".
Is this possible with sed?
Please post an explanation with the command, as I am new to shell scripting.
before :
adad;sfs;sdfsf;fsdfs
string = jjjjj
after
adad;sfs;sdfsf jjjjj;fsdfs
Thanks in advance
This might work for you:
echo 'adad;sfs;sdfsf;fsdfs'| sed 's/\(.*\);/\1 jjjjj;/'
adad;sfs;sdfsf jjjjj;fsdfs
The \(.*\) is greedy and swallows the whole line; the ; then makes the regexp backtrack to the last ;. The \(.*\) also creates the back reference \1. Put together, the RHS of the s command inserts jjjjj before the last ;.
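If the string to insert lives in a shell variable, the same greedy pattern can be wrapped as below. This is a sketch under the assumption that the inserted text contains nothing special to sed (no /, &, backslash or newline); the function name insert_before_last is hypothetical.

```shell
# insert_before_last is a hypothetical helper; $1 is the text to insert.
# Double quotes let the shell expand $1 while \( \) and \1 reach sed unchanged.
insert_before_last() {
  sed "s/\(.*\);/\1 $1;/"
}

printf '%s\n' 'adad;sfs;sdfsf;fsdfs' | insert_before_last jjjjj
# prints adad;sfs;sdfsf jjjjj;fsdfs
```

Lines without any semicolon pass through unchanged because the pattern does not match them.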
sed 's/\([^;]*\)\(;[^;]*;$\)/\1jjjjj\2/' filename
(substitute jjjjj with what you need to insert).
Example:
$ echo 'adad;sfs;sdfsf;fsdfs;' | sed 's/\([^;]*\)\(;[^;]*;$\)/\1jjjjj\2/'
adad;sfs;sdfsfjjjjj;fsdfs;
Explanation:
sed finds the following pattern: \([^;]*\)\(;[^;]*;$\). Escaped round brackets (\(, \)) form numbered groups so we can refer to them later as \1 and \2.
[^;]* is "everything but ;", repeated any number of times.
$ means end of the line.
Then it changes it to \1jjjjj\2.
\1 and \2 are groups matched in first and second round brackets.
For now, the shorter solution using sed : =)
sed -r 's#;([^;]+);$#; jjjjj;\1#' <<< 'adad;sfs;sdfsf;fsdfs;'
-r option stands for extended regexp
# is the delimiter; the usual / separator can be replaced by any other character
we match a ;, followed by one or more characters that are not ;, followed by the final ;, where $ means end of the line
the part between the two ; is captured with ()
finally, we replace the matched part with "; jjjjj;" and append the captured part
Edit: version without -r, using basic regular expressions (note that \+ is a GNU extension; strict POSIX would write \{1,\}):
echo 'adad;sfs;sdfsf;fsdfs;' | sed 's#;\([^;]\+\);$#; jjjjj;\1#'
echo 'adad;sfs;sdfsf;fsdfs;' | sed -r 's/(.*);(.*);/\1 jjjj;\2;/'
You don't need the negation of ; because sed's .* is greedy by default and will match as many characters as it can.
sed -e 's/\(;[^;]*\)$/ jjjj\1/'
Inserts jjjj before the part where a semicolon is followed by any number of non-semicolons ([^;]*) at the end of the line $. \1 is called a backreference and contains the characters matched between \( and \).
UPDATE: Since the sample input has no longer a ";" at the end.
Something like this may work for you:
echo "adad;sfs;sdfsf;fsdfs"| awk 'BEGIN{FS=OFS=";"} {$(NF-1)=$(NF-1) " jjjjj"; print}'
OUTPUT:
adad;sfs;sdfsf jjjjj;fsdfs
Explanation: awk starts by setting FS (field separator) and OFS (output field separator) to a semicolon ;. NF in awk stands for the number of fields, so $(NF-1) is the next-to-last field. The assignment $(NF-1)=$(NF-1) " jjjjj" simply appends jjjjj to that field.

sed command to edit stream on given rule

I have an input stream like this:
afs=1;bgd=1;cgd=1;djh=1;fgjhh=1;
Now the rule I have to edit the stream is:
(1) if we have "djh=number;", replace it with "djh=number,"
(2) else replace "string=number;" with "string,"
I can handle case 2 as:
sed 's/afs=1/afs,/g;s/dbg=1/dbg,/g;..... so on for rest
How to take care for condition 1?
The "djh" number can be any number(1,12,100), the other numbers are always 1.
all the double quotes I have used are for reference only; no double quotes are present in the input stream. "afs" can be "Afs" also.
Thanks in advance.
sed -e 's/;/,/g; s/,djh=/,#=/; s/\([a-z][a-z]*\)=[0-9]*,/\1,/g; s/#/djh/g'
This does the following
replace all ; by ,
replace djh with #
remove =number from all lower cased strings
replace # with djh
This results in afs,bgd,cgd,djh=1,fgjhh, for your input. Of course you could substitute djh with any other character that makes it easy to match the other strings. This is just illustrating the idea.
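A quick check of the placeholder idea with a djh value other than 1. This is a sketch; the wrapper name rewrite_stream and the sample input are assumptions.

```shell
# rewrite_stream is a hypothetical wrapper around the four substitutions.
rewrite_stream() {
  sed -e 's/;/,/g; s/,djh=/,#=/; s/\([a-z][a-z]*\)=[0-9]*,/\1,/g; s/#/djh/g'
}

printf '%s\n' 'afs=1;bgd=1;djh=12;fgjhh=1;' | rewrite_stream
# prints afs,bgd,djh=12,fgjhh,
```

Two limits follow from the patterns as written: the second substitution looks for ,djh=, so a djh entry at the very start of the stream is not protected, and [a-z] does not cover a capitalised name such as Afs.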
echo 'afs=1;bgd=1;cgd=1;djh=1;fgjhh=1;' |
sed -e 's/\(djh=[0-9]\+\);/\1,/g' -e 's/\([a-zA-Z0-9]\+\)=1;/\1,/g'
This might work for you:
echo "afs=1;bgd=1;cgd=1;djh=1;fgjhh=1;" |
sed 's/^/\n/;:a;/\n\(djh=[0-9]*\);/s//\1,\n/;ta;s/\n\([^=]*\)=1;/\1,\n/;ta;s/.$//'
afs,bgd,cgd,djh=1,fgjhh,
Explanation:
This method uses a unique marker (\n is a good choice because it cannot appear in the pattern space as it is used by sed as the line delimiter) as anchor for comparing throughout the input string. It is slow but can scale if more than one exception is needed.
Place the marker in front of the string s/^/\n/
Name a loop label :a
Match the exception(s) /\n\(djh=[0-9]*\);/
If the exception occurs substitute as necessary. Also bump the marker along s//\1,\n/
If the above is true break to loop label ta
Match the normal and substitute. Also bump the marker along s/\n\([^=]*\)=1;/\1,\n/
If the above is true break to loop label ta
All done remove the marker s/.$//
or:
echo "afs=1;bgd=1;cgd=1;djh=1;fgjhh=1;" |
sed 's/\<djh=/\n/g;s/=[^;]*;/,/g;s/\n\([^;]*\);/djh=\1,/g'
afs,bgd,cgd,djh=1,fgjhh,
Explanation:
This is fast but does not scale for multiple exceptions:
Globally replace the exception string with a new line s/\<djh=/\n/g
Globally replace the normal condition s/=[^;]*;/,/g
Globally replace the \n by the exception string s/\n\([^;]*\);/djh=\1,/g
N.B. When replacing the exception string make sure that it begins on a word boundary \<
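The three steps can be exercised on an input where djh comes first and another name is capitalised. This sketch assumes GNU sed, since \< and \n in the replacement are GNU features; the function name rewrite_fast is made up.

```shell
# rewrite_fast is a hypothetical wrapper around the newline-marker command.
rewrite_fast() {
  sed 's/\<djh=/\n/g;s/=[^;]*;/,/g;s/\n\([^;]*\);/djh=\1,/g'
}

printf '%s\n' 'djh=100;Afs=1;bgd=1;' | rewrite_fast
# prints djh=100,Afs,bgd,
```

Because the middle substitution keys on = rather than on the name, it handles any spelling of the other names without listing them.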

Resources