Sed substitution - spaces to tabs - bash

I'm trying to format a batch of .c files with the sed command in a shell script, to properly align the function names. I'm replacing int(space)function1() with int(3 tabs)function1():
int function1(int foo)
{
*my_function_code*
}
char function2(int foo)
{
*my_function_code*
}
int main(int foo)
{
*my_function_code*
}
I'm currently using the following loop to apply my substitutions:
#align global scope
printf " Correct global scope alignement...\n"
for file in ${FILES[@]}; do
sed -i -e 's/^int */int\t\t\t/g' \
-i -e 's/^char */char\t\t\t/g' \
-i -e 's/^float */float\t\t\t/g' \
-i -e 's/^long int */long int\t\t\t/g' ${file}
done
done
The problem is that if I rerun the script, instead of doing nothing, it adds more tabs each time, giving me this:
int function1(int foo)
{
*my_function_code*
}
char function2(int foo)
{
*my_function_code*
}
int main(int foo)
{
*my_function_code*
}
Isn't the * supposed to match only the spaces and not the tabs, or is it treated as matching all blank characters?

Could you please try the following, written and tested with the shown samples. It simply checks whether a line starts with int or char (you could add float and long int to the condition too) and then substitutes the spaces with 3 tabs.
sed -E '/^int|^char/s/ +/\t\t\t/' Input_file
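If the goal is also to make the script safe to rerun, a variation along these lines may help. This is only a sketch, assuming GNU sed (for -E, -i and \t in the replacement); it replaces any run of spaces or tabs after the type keyword with exactly three tabs, so a second run changes nothing:
sed -E -i 's/^(int|char|float|long int)[[:blank:]]+/\1\t\t\t/' "$file"
Because [[:blank:]] matches both spaces and tabs, rerunning it just rewrites the existing three tabs as three tabs.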

sed or awk: add content to the body of a C function, running in git bash v2.21.0

To turn this function in a C file (test.c):
void Fuction(uint8, var)
{
dosomething();
}
// void Fuction(uint8, var)
// should not be injected below a comment with same pattern content
into:
void Fuction(uint8, var)
{
injected1();
injected2();
injected3();
dosomething();
}
// void Fuction(uint8, var)
// should not be injected below a comment with same pattern content
By injecting this one (inject.c)
injected1();
injected2();
injected3();
I tried several approaches with sed and awk, but I was not able to inject the code below the opening curly brace; the code was always injected before the brace.
On a regex website I was able to select the pattern including the curly braces, but in my script it did not work. Maybe awk is better suited, but I have no deeper experience with awk. Could someone help here?
With awk I had an additional problem passing the pattern variable with a ^ anchor.
The call in git bash should be like this:
./inject.sh "void Fuction(uint8, var)" test.c inject.c
(my current inject.sh bash script):
PATTERN=$1
FILE=$2
INJECTFILE=$3
sed -i "/^$PATTERN/r $INJECTFILE" $FILE
#sed -i "/^$PATTERN\r\{/r $INJECTFILE" $FILE
I currently have no idea how to also match the \n and the { on the next line.
My result is:
void Fuction(uint8, var)
injected1();
injected2();
injected3();
{
dosomething();
}
// void Fuction(uint8, var)
// should not be injected below a comment with same pattern content
Expanding on OP's sed code:
sed "/^${PATTERN}/,/{/ {
/{/ r ${INJECTFILE}
}" $FILE
# or as a one-liner
sed -e "/^${PATTERN}/,/{/ {" -e "/{/ r ${INJECTFILE}" -e "}" $FILE
Where:
/^${PATTERN}/,/{/ - finds the range of rows starting with ^${PATTERN} and ending with a line that contains a {
{ ... } - within that range ...
/{/ r ${INJECTFILE} - find the line containing a { and append the contents of ${INJECTFILE}
Results:
$ ./inject.sh "void Fuction(uint8, var)" test.c inject.c
void Fuction(uint8, var)
{
injected1();
injected2();
injected3();
dosomething();
}
// void Fuction(uint8, var)
// should not be injected below a comment with same pattern content
Once OP verifies the output, the -i flag can be added to have sed overwrite the file.
NOTE: OP's expected output shows the injected lines with some additional leading white space; if the intention is to auto-indent the injected lines to match the surrounding lines ... I'd probably want to look at something like awk in order to provide the additional formatting.
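As a rough illustration of that idea, here is an awk sketch of my own (not tested beyond the shown sample; the extra four-space indent for the injected lines is an assumption). It uses index() for a literal match, so the parentheses in the pattern don't need regex escaping, and it reuses the indentation of the line that holds the opening brace:
awk -v pat="void Fuction(uint8, var)" -v inj="inject.c" '
{ print }                              # always echo the current line first
index($0, pat) == 1 { waiting = 1 }    # literal match at the start of the line (the // comment lines do not match)
waiting && /{/ {                       # first line containing a { after the match
    match($0, /^[ \t]*/)               # indentation of the brace line
    indent = substr($0, 1, RLENGTH)
    while ((getline line < inj) > 0)
        print indent "    " line       # inject with one extra level of indentation
    close(inj)
    waiting = 0
}' test.c
It writes to stdout, so the result would be redirected to a temporary file and moved back over test.c rather than edited in place.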

Substituting specific string using sed followed by a dot (.)

I am doing some substitution in files as follows:
file: abc.txt
`include foo.h
int main
{
int foo
foo and bar
barfoobar
}
I want to replace 'foo' inside the braces, but I don't want to replace the 'foo' written in the include directive.
I tried using:
sed -i "s/\bfoo\b/my_foo/g"
Output:
`include my_foo.h
int main
{
int my_foo
my_foo and bar
barfoobar
}
Any suggestions?
To match a string not followed by a . and optionally followed by the end of the line:
sed -E 's/foo[^.]?$/my_foo/g' test.txt
It needs extended regular expressions: -E on macOS or -r on Linux (see man sed).
There are regex testers such as http://www.regextester.com/ that allow for exploration of regular expressions, or lots of IDEs have them built in.
e.g. Regex to match URL end-of-line or "/" character
sed '/include/b; s/foo/my_&/' foo
include foo.h
int main
{
int my_foo
}
This means: if include is found, branch (jump) to the end of the commands, skipping the substitution.
I believe this command is what you are looking for:
If you want to replace foo on all lines, excluding lines with include:
sed -i '/foo/ {/include/! s/foo/my_foo/g}' test
Or
If you want to replace all occurrences of foo, excluding occurrences followed by a dot (foo.):
sed -i '/foo/ {/foo./! s/foo/my_foo/g}' test
As the last line of the question and the title seem ambiguous, I have answered both situations.
Session output:
$ cat test
include foo.h
int main
{
int foo
}
$
$ sed -i '/foo/ {/foo./! s/foo/my_foo/g}' test
$ cat test
include foo.h
int main
{
int my_foo
}
Try this:
sed -i "/^[^`]/ s/\bfoo\b/my_foo/g"
What it does is apply the substitution only to lines whose first character is not a backtick (`).
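Another option, if the goal really is "replace foo only inside the braces" rather than "skip the include line", is to limit the substitution to the address range between { and }. A sketch, assuming GNU sed (for \b):
sed '/{/,/}/ s/\bfoo\b/my_foo/g' abc.txt
With the sample file this changes "int foo" and "foo and bar" but leaves the include line and "barfoobar" untouched.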

awk substitution ascii table rules bash

I want to perform a hierarchical set of (non-recursive) substitutions in a text file.
I want to define the rules in an ASCII file "table.txt" which contains lines of whitespace-separated pairs of strings:
aaa 3
aa 2
a 1
I have tried to solve it with an awk script "substitute.awk":
BEGIN { while (getline < file) { subs[$1]=$2; } }
{ line=$0; for(i in subs)
{ gsub(i,subs[i],line); }
print line;
}
When I call the script giving it the string "aaa":
echo aaa | awk -v file="table.txt" -f substitute.awk
I get
21
instead of the desired "3". Permuting the lines in "table.txt" doesn't help. Can someone explain what the problem is here, and how to circumvent it? (This is a simplified version of my actual task, where I have a large file containing ASCII-encoded phonetic symbols which I want to convert into LaTeX code. The ASCII encoding of the symbols contains $, &, -, %, [a-z], [0-9], ....)
Any comments and suggestions are welcome!
PS:
Of course, in this application, for a substitution table.txt:
aa ab
a 1
an original string "aa" should be converted into "ab" and not "1b". That means a string which was produced by applying a rule must be left untouched.
How to account for that?
The order of the loop for (i in subs) is undefined by default.
In newer versions of awk you can use PROCINFO["sorted_in"] to control the sort order. See section 12.2.1 Controlling Array Traversal and (the linked) section 8.1.6 Using Predefined Array Scanning Orders for details about that.
Alternatively, if you can't or don't want to do that you could store the replacements in numerically indexed entries in subs and walk the array in order manually.
To do that you will need to store both the pattern and the replacement in the value of the array and that will require some care to combine. You can consider using SUBSEP or any other character that cannot be in the pattern or replacement and then split the value to get the pattern and replacement in the loop.
Also note the caveats etc. with getline listed on http://awk.info/?tip/getline and consider not using it manually but instead using NR==FNR{...} and just listing table.txt as the first file argument to awk.
Edit: Actually, for the manual loop version you could also just keep two arrays one mapping input file line number to the patterns to match and another mapping patterns to replacements. Then looping over the line number array will get you the pattern and the pattern can be used in the second array to get the replacement (for gsub).
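To make the PROCINFO["sorted_in"] route concrete, here is a minimal sketch of my own (not part of the original answer), assuming GNU awk 4.0 or later. It visits the patterns longest first, so aaa is tried before aa and a; note that it still treats the first column as a regex and does not address the later requirement that already-substituted text must not be substituted again:
# run as: echo aaa | gawk -f subs_sorted.awk table.txt -
function by_length_desc(i1, v1, i2, v2) {     # longer pattern (array index) sorts first
    return length(i2) - length(i1)
}
BEGIN { PROCINFO["sorted_in"] = "by_length_desc" }
NR == FNR { subs[$1] = $2; next }             # first file: load the substitution table
{
    for (pat in subs)                         # traversal order controlled by PROCINFO["sorted_in"]
        gsub(pat, subs[pat])
    print
}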
Instead of storing the replacements in an associative array, put them in two arrays indexed by integer (one array for the strings to replace, one for the replacements) and iterate over the arrays in order:
BEGIN {i=0; while (getline < file) { subs[i]=$1; repl[i++]=$2}
n = i}
{ for(i=0;i<n;i++) { gsub(subs[i],repl[i]); }
print tolower($0);
}
It seems like perl's zero-width word boundary is what you want. It's a pretty straightforward conversion from the awk:
#!/usr/bin/env perl
use strict;
use warnings;
my %subs;
BEGIN{
    open my $f, '<', 'table.txt' or die "table.txt:$!";
    while(<$f>) {
        my ($k,$v) = split;
        $subs{$k}=$v;
    }
}
while(<>) {
    while(my($k, $v) = each %subs) {
        s/\b$k\b/$v/g;
    }
    print;
}
Here's an answer pulled from another StackExchange site, from a fairly similar question: Replace multiple strings in a single pass.
It's slightly different in that it does the replacements in inverse order by length of target string (i.e. longest target first), but that is the only sensible order for targets which are literal strings, as appears to be the case in this question as well.
If you have tcc installed, you can use the following shell function, which processes the file of substitutions into a lex-generated scanner which it then compiles and runs using tcc's compile-and-run option.
# Call this as: substitute replacements.txt < text_to_be_substituted.txt
# Requires GNU sed because I was too lazy to write a BRE
substitute () {
tcc -run <(
{
printf %s\\n "%option 8bit noyywrap nounput" "%%"
sed -r 's/((\\\\)*)(\\?)$/\1\3\3/;
s/((\\\\)*)\\?"/\1\\"/g;
s/^((\\.|[^[:space:]])+)[[:space:]]*(.*)/"\1" {fputs("\3",yyout);}/' \
"$1"
printf %s\\n "%%" "int main(int argc, char** argv) { return yylex(); }"
} | lex -t)
}
With gcc or clang, you can use something similar to compile a substitution program from the replacement list, and then execute that program on the given text. Posix-standard c99 does not allow input from stdin, but gcc and clang are happy to do so provided you tell them explicitly that it is a C program (-x c). In order to avoid excess compilations, we use make (which needs to be gmake, Gnu make).
The following requires that the list of replacements be in a file with a .txt extension; the cached compiled executable will have the same name with a .exe extension. If the makefile were in the current directory with the name Makefile, you could invoke it as make repl (where repl is the name of the replacement file without a text extension), but since that's unlikely to be the case, we'll use a shell function to actually invoke make.
Note that in the following file, the whitespace at the beginning of each line starts with a tab character:
substitute.mak
.SECONDARY:
%: %.exe
	@$(<D)/$(<F)
%.exe: %.txt
	@{ printf %s\\n "%option 8bit noyywrap nounput" "%%"; \
	sed -r \
	's/((\\\\)*)(\\?)$$/\1\3\3/; #\
	s/((\\\\)*)\\?"/\1\\"/g; #\
	s/^((\\.|[^[:space:]])+)[[:space:]]*(.*)/"\1" {fputs("\3",yyout);}/' \
	"$<"; \
	printf %s\\n "%%" "int main(int argc, char** argv) { return yylex(); }"; \
	} | lex -t | c99 -D_POSIX_C_SOURCE=200809L -O2 -x c -o "$@" -
Shell function to invoke the above:
substitute() {
gmake -f/path/to/substitute.mak "${1%.txt}"
}
You can invoke the above command with:
substitute file
where file is the name of the replacements file. (The filename must end with .txt but you don't have to type the file extension.)
The format of the input file is a series of lines consisting of a target string and a replacement string. The two strings are separated by whitespace. You can use any valid C escape sequence in the strings; you can also \-escape a space character to include it in the target. If you want to include a literal \, you'll need to double it.
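As an illustration of that format (my own example, not from the answer), a replacements file could look like this; the last two rules double the backslash so that the output contains a literal \, which matches the OP's LaTeX use case, and a space inside a target would be written as \ :
aaa 3
aa 2
a 1
$ \\$
% \\%
Saved as rules.txt, it would be run as substitute rules < input.txt (or as substitute rules.txt < input.txt with the tcc version above).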
If you don't want C escape sequences and would prefer to have backslashes not be metacharacters, you can replace the sed program with a much simpler one:
sed -r 's/([\\"])/\\\1/g' "$<"; \
(The ; \ is necessary because of the way make works.)
a) Don't use getline unless you have a very specific need and fully understand all the caveats, see http://awk.info/?tip/getline
b) Don't use regexps when you want strings (yes, this means you cannot use sed).
c) The while loop needs to constantly move beyond the part of the line you've already changed or you could end up in an infinite loop.
You need something like this:
$ cat substitute.awk
NR==FNR {
    if (NF==2) {
        strings[++numStrings] = $1
        old2new[$1] = $2
    }
    next
}
{
    for (stringNr=1; stringNr<=numStrings; stringNr++) {
        old = strings[stringNr]
        new = old2new[old]
        slength = length(old)
        tail = $0
        $0 = ""
        while ( sstart = index(tail,old) ) {
            $0 = $0 substr(tail,1,sstart-1) new
            tail = substr(tail,sstart+slength)
        }
        $0 = $0 tail
    }
    print
}
$ echo aaa | awk -f substitute.awk table.txt -
3
$ echo aaaa | awk -f substitute.awk table.txt -
31
and adding some RE metacharacters to table.txt to show they are treated just like every other character and showing how to run it when the target text is stored in a file instead of being piped:
$ cat table.txt
aaa 3
aa 2
a 1
. 7
\ 4
* 9
$ cat foo
a.a\aa*a
$ awk -f substitute.awk table.txt foo
1714291
Your new requirement requires a solution like this:
$ cat substitute.awk
NR==FNR {
    if (NF==2) {
        strings[++numStrings] = $1
        old2new[$1] = $2
    }
    next
}
{
    delete news
    for (stringNr=1; stringNr<=numStrings; stringNr++) {
        old = strings[stringNr]
        new = old2new[old]
        slength = length(old)
        tail = $0
        $0 = ""
        charPos = 0
        while ( sstart = index(tail,old) ) {
            charPos += sstart
            news[charPos] = new
            $0 = $0 substr(tail,1,sstart-1) RS
            tail = substr(tail,sstart+slength)
        }
        $0 = $0 tail
    }
    numChars = split($0, olds, "")
    $0 = ""
    for (charPos=1; charPos <= numChars; charPos++) {
        $0 = $0 (charPos in news ? news[charPos] : olds[charPos])
    }
    print
}
$ cat table.txt
1 a
2 b
$ echo "121212" | awk -f substitute.awk table.txt -
ababab

Removing control / special characters from log file

I have a log file captured by tclsh which captures all the backspace characters (Ctrl-H, which show up as "^H") and color-setting sequences (e.g. ^[[32m .... ^[[0m). What is an efficient way to remove them?
^[...m
This one is easy since I can just do "sed -i s/^[.*m//g" to remove them.
^H
Right now I have "sed -i s/.^H//", which "applies" a backspace, but I have to keep looping this until there are no more backspaces.
while [ logfile == `grep -l ^H logfile` ]; do sed -i s/.^H// logfile ; done;
"sed -i s/.^H//g" doesn't work because it would match consecutive backspaces. This process takes 11 mins for my log file with ~6k lines, which is too long.
Any better ways to remove the backspace?
You could always write a simple pipeline command to implement the backspace stripping, something like this:
#include <stdio.h>
#include <stdlib.h>

#define BUFFERSIZE 10240

int main(int argc, char* argv[])
{
    int c;
    int buf[BUFFERSIZE];
    int pos = 0;

    while ((c = getchar()) != EOF)
    {
        switch (c)
        {
        case '\b':
        {
            if (pos > 0)
                pos--;
            break;
        }
        case '\n':
        {
            int i;
            for (i = 0; i < pos; ++i)
                putchar(buf[i]);
            putchar('\n');
            pos = 0;
            break;
        }
        default:
        {
            buf[pos++] = c;
            break;
        }
        }
    }
    return 0;
}
I've only given the code a minimal test and you may need to adjust the buffer size depending on how big your lines are. It might be an idea to assert that pos is < BUFFERSIZE after pos++ just to be safe!
Alternatively you could maybe implement something similar with the Tcl code that captures the log file in the first place; but without knowing how that works it's a bit hard to say.
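If it helps, wiring it up as a filter could look like this (the file and program names here are just placeholders):
cc -O2 -o stripbs stripbs.c
./stripbs < logfile > logfile.clean && mv logfile.clean logfile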
You could try:
sed -i s/[^^H]^H//g
This might or might not work in one go, but should at least be faster than one at a time as you seem to be doing now.
Did you know that “sed” doesn't just do substitutions? The commands of a sed script have to be on separate lines though (or at least they do on the version of sed I've got on this machine).
sed -i bak 's/^[[^^]]*m//g
: again
s/[^^H]^H//g
t again' logfile
The : sets up a label (again in this case) and t branches to a label if any substitutions have been performed (since the start/last branch). Wrapping those round a suitable s gets the substitution applied until it can't any more.
Just to put it out here, I ended up doing this. It's not a pretty solution and not as flexible as Jackson's answer but does what I need in my particular case. I basically use the inner loop to generate the match string for sed.
# "Applies" up to 10 consecutive backspaces
for i in {10..1}; do
match=""
for j in `seq 1 $i`; do
match=".${match}^H"
done;
# Can't put quotes around s//g or else backspaces are evaluated
sed -i s/${match}//g ${file-to-process}
done;

sed: how to replace CR and/or LF with "\r" "\n", so any file will be in one line

I have files like
aaa
bbb
ccc
I need sed to turn them into aaa\r\nbbb\r\nccc
It should work for both Unix and Windows files, replacing the line endings with \n or \r\n accordingly.
The problem is that sed adds the \n at the end of each line but keeps the lines separate. How can I fix it?
These two commands together should do what you want:
sed ':a;N;$!ba;s/\r/\\r/g'
sed ':a;N;$!ba;s/\n/\\n/g'
Pass your input file through both to get the output you want. Theres probably a way to combine them into a single expression.
Stolen and Modified from this question:
How can I replace a newline (\n) using sed?
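For what it's worth, the two passes can be folded into one; a sketch, assuming GNU sed:
sed ':a;N;$!ba; s/\r/\\r/g; s/\n/\\n/g' file
The :a;N;$!ba part slurps the whole file into the pattern space, and the two substitutions then escape the carriage returns and the remaining newlines in a single invocation.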
It's possible to merge lines in sed, but personally, I consider needing to change line breaks a sign that it's time to give up on sed and use a more powerful language instead. What you want is one line of perl:
perl -e 'undef $/; while (<>) { s/\n/\\n/g; s/\r/\\r/g; print $_, "\n" }'
or 12 lines of python:
#! /usr/bin/python
import fileinput
from sys import stdout
first = True
for line in fileinput.input(mode="rb"):
    if fileinput.isfirstline() and not first:
        stdout.write("\n")
    if line.endswith("\r\n"): stdout.write(line[:-2] + "\\r\\n")
    elif line.endswith("\n"): stdout.write(line[:-1] + "\\n")
    elif line.endswith("\r"): stdout.write(line[:-1] + "\\r")
    first = False
if not first: stdout.write("\n")
or 10 lines of C to do the job, but then a whole bunch more because you have to process argv yourself:
#include <stdio.h>

void process_one(FILE *fp)
{
    int c;
    while ((c = getc(fp)) != EOF)
        if (c == '\n') fputs("\\n", stdout);
        else if (c == '\r') fputs("\\r", stdout);
        else putchar(c);
    fclose(fp);
    putchar('\n');
}

int main(int argc, char **argv)
{
    FILE *cur;
    int i, consumed_stdin = 0, rv = 0;
    if (argc == 1) /* no arguments */
    {
        process_one(stdin);
        return 0;
    }
    for (i = 1; i < argc; i++)
    {
        if (argv[i][0] == '-' && argv[i][1] == 0)
        {
            if (consumed_stdin)
            {
                fputs("cannot read stdin twice\n", stderr);
                rv = 1;
                continue;
            }
            cur = stdin;
            consumed_stdin = 1;
        }
        else
        {
            cur = fopen(argv[i], "rb");
            if (!cur)
            {
                perror(argv[i]);
                rv = 1;
                continue;
            }
        }
        process_one(cur);
    }
    return rv;
}
awk '{printf("%s\\r\\n",$0)} END {print ""}' file
tr -s '\r' '\n' <file | unix2dos
EDIT (it's been pointed out that the above misses the point entirely!)
tr -s '\r' '\n' <file | perl -pe 's/\s+$/\\r\\n/'
The tr gets rid of empty lines and DOS line endings. The pipe means two processes, which is fine on modern hardware.
