I would like to parse long options in a shell script. POSIX only provides getopts to parse single letter options. Does anyone know of a portable (POSIX) way to implement long option parsing in the shell? I've looked at what autoconf does when generating configure scripts, but the result is far from elegant. I can live with accepting only the full spellings of long options. Single letter options should still be allowed, possibly in groups.
I'm thinking of a shell function taking a space separated list of args of the form option[=flags], where the flags indicate that the option takes an arg or can be specified multiple times. Unlike its C counterpart there is no need to distinguish between strings, integers and floats.
Design notes towards a portable shell getopt_long command
I have a program getoptx which works with single-letter options (hence it is not the answer to your problem), but it handles arguments with spaces correctly, which the original getopt command (as opposed to the shell built-in getopts) does not. The specification in the source code says:
/*
** Usage: eval set -- $(getoptx abc: "$@")
** eval set -- $(getoptx abc: -a -c 'a b c' -b abc 'd e f')
** The positional parameters are:
** $1 = "-a"
** $2 = "-c"
** $3 = "a b c"
** $4 = "-b"
** $5 = "--"
** $6 = "abc"
** $7 = "d e f"
**
** The advantage of this over the standard getopt program is that it handles
** spaces and other metacharacters (including single quotes) in the option
** values and other arguments. The standard code does not! The downside is
** the non-standard invocation syntax compared with:
**
** set -- $(getopt abc: "$@")
*/
I recommend the eval set -- $(getopt_long "$optspec" "$@") notation for your getopt_long.
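As a rough illustration of why the eval set -- notation can preserve spaces and quotes, here is a minimal POSIX sh sketch. The name quote_args is made up for this example and is not part of getoptx; it only re-quotes its arguments and does no option parsing. It assumes that no argument ends in a newline, since command substitution strips trailing newlines, and it wraps the substitution in double quotes, which the usage line above does not.

```shell
#!/bin/sh
# Hypothetical helper (not part of getoptx): print each argument wrapped in
# single quotes, with embedded single quotes rewritten as '\'' so that the
# output can be fed back to "eval set --".
quote_args() {
    for _qa_arg in "$@"; do
        printf "'%s' " "$(printf '%s\n' "$_qa_arg" | sed "s/'/'\\\\''/g")"
    done
    printf '\n'
}

# Round trip: the positional parameters come back unchanged.
eval "set -- $(quote_args -a -c 'a b c' "it's" 'd e f')"
printf '%s\n' "$#" "$3" "$4"
```

A real getopt_long would reorder and validate the options before quoting them; the quoting step is the only part sketched here.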
One major issue with getopt_long is the complexity of the argument specification — the $optspec in the example.
You may want to look at the notation used in the Solaris CLIP (Command Line Interface Paradigm); it uses a single string (like the original getopt() function) to describe the options. (Google: 'solaris clip command line interface paradigm'; using just 'solaris clip' gets you to video clips.)
This material is a partial example derived from Sun's getopt_clip():
/*
Example 2: Check Options and Arguments.
The following example parses a set of command line options and prints
messages to standard output for each option and argument that it
encounters.
This example can be expanded to be CLIP-compliant by substituting the
long string for the optstring argument:
While not encouraged by the CLIP specification, multiple long-option
aliases can also be assigned as shown in the following example:
:a(ascii)b(binary):(in-file)(input)o:(outfile)(output)V(version)?(help)
*/
static const char *arg0 = 0;
static void print_help(void)
{
printf("Usage: %s [-a][-b][-V][-?][-f file][-o file][path ...]\n", arg0);
printf("Usage: %s [-ascii][-binary][-version][-in-file file][-out-file file][path ...]\n", arg0);
exit(0);
}
static const char optstr[] =
":a(ascii)b(binary)f:(in-file)o:(out-file)V(version)?(help)";
int main(int argc, char **argv)
{
int c;
char *filename;
arg0 = argv[0];
while ((c = getopt_clip(argc, argv, optstr)) != -1)
{
...
}
...
}
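To make the single-string notation more concrete from the shell side, here is a small POSIX sh sketch that looks up a long option name in a CLIP-style option string. The function name clip_lookup and the way it reports its result are assumptions made for this illustration; it is not Sun's getopt_clip() and it does not parse a command line.

```shell
#!/bin/sh
# Hypothetical helper: look up a long option name in a CLIP-style optstring.
# Prints the matching short letter, followed by ':' if the option takes an
# argument. Returns 1 if the long name is not present.
clip_lookup() {
    _cl_spec=$1
    _cl_name=$2
    case $_cl_spec in
        *"($_cl_name)"*) ;;
        *) return 1 ;;
    esac
    # Keep everything before "(name)".
    _cl_head=${_cl_spec%%"($_cl_name)"*}
    # Skip back over earlier aliases of the same option, e.g. "(in-file)".
    while :; do
        case $_cl_head in
            *")") _cl_head=${_cl_head%"("*} ;;
            *) break ;;
        esac
    done
    # An optional ':' marks an option that takes an argument.
    _cl_colon=
    case $_cl_head in
        *:) _cl_colon=:; _cl_head=${_cl_head%:} ;;
    esac
    # The last remaining character is the short option letter.
    printf '%s%s\n' "${_cl_head#"${_cl_head%?}"}" "$_cl_colon"
}

optstr=':a(ascii)b(binary)f:(in-file)(input)o:(out-file)V(version)?(help)'
clip_lookup "$optstr" input
clip_lookup "$optstr" version
```

With the option string shown, the two calls print f: and V, so a wrapper could translate --input into -f before handing the arguments to getopts.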
Related
I need to pass an argument which will change every time from C program to a shell script.
int val=1234;
char buf[100];
sprintf(buf,"echo %d",val);
system("call.sh $buf");
call.sh::
#!/bin/sh
echo "welcome"
echo $*
echo "done"
output of C is::
welcome
done
I can't see the argument value, which is 1234, in the script. Can anybody suggest how to get the right value?
You can't pass a C variable as a shell variable. You need to build the whole command line in the string, and then pass it to system(...)
int val=1234;
char buf[100];
sprintf(buf, "call.sh %d", val);
system(buf);
You should use the setenv(), getenv() or putenv() functions (defined in stdlib.h). Quoting the man page:
The setenv() function adds the variable name to the environment with the value value, if name does not already exist. If name does exist in the environment, then its value is changed to value if overwrite is nonzero; if overwrite is zero, then the value of name is not changed. This function makes copies of the strings pointed to by name and value (by contrast with putenv(3)).
The prototype of the function is the following:
int setenv(const char *name, const char *value, int overwrite);
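For comparison, here is a sketch of what the receiving script could look like under either approach. The variable name VAL is an assumption for this example: it stands for whatever name the C program would pass to setenv() before calling system(). The positional parameter covers the case where the value is built into the command line instead.

```shell
#!/bin/sh
# Sketch of the receiving script. VAL is a hypothetical variable name that the
# C program would set with setenv("VAL", "1234", 1) before calling system().
report_value() {
    echo "welcome"
    # Value passed on the command line, as in: call.sh 1234
    echo "argument: ${1:-<none>}"
    # Value inherited through the environment.
    echo "environment: ${VAL:-<unset>}"
    echo "done"
}

report_value "$@"
```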
I like to have a command-line calculator handy. The requirements are:
Support all the basic arithmetic operators: +, -, /, *, ^ for exponentiation, plus parentheses for grouping.
Require minimal typing: I don't want to have to start a program, interact with it, and then ask it to exit.
Ideally only one character and a space in addition to the expression itself should be entered into the command line.
It should know how to ignore commas and dollar (or other currency symbols)
in numbers to allow me to copy/paste from the web without worrying
about having to clean every number before pasting it into the calculator
Be white-space tolerant, presence or lack of spaces shouldn't cause errors
No need for quoting anything in the expression to protect it from the shell - again for the benefit of minimal typing
Since tcsh supports alias positional arguments, and since alias expansion precedes all other expansions except history expansion, it was straightforward to implement something close to my ideal in tcsh.
I used this:
alias C 'echo '\''\!*'\'' |tr -d '\'',\042-\047'\'' |bc -l'
Now I can do stuff like the following with minimal typing:
# the basic stuff:
tcsh> C 1+2
3
# dollar signs, multiplication, exponentiation:
tcsh> C $8 * 1.07^10
15.73721085831652257992
# parentheses, mixed spacing, zero power:
tcsh> C ( 2+5 ) / 8 * 2^0
.87500000000000000000
# commas in numbers, no problem here either:
tcsh> C 1,250.21 * 1.5
1875.315
As you can see there's no need to quote anything to make all these work.
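To see what the tr filter in the alias contributes on its own, here is a small POSIX sh sketch. The function name clean_expr is made up for this example; the character set is the one from the tcsh alias, a comma plus the octal range 042-047 (the characters " # $ % & '). The sketch leaves out bc so that it only shows the cleaning step.

```shell
#!/bin/sh
# Hypothetical helper: delete the characters that the alias's tr filter
# removes, i.e. ',' plus octal 042-047 (" # $ % & ').
clean_expr() {
    printf '%s\n' "$*" | tr -d ',\042-\047'
}

# The argument is quoted here only because this is plain sh, not the alias.
clean_expr '$1,250.21 * 1.5'    # prints: 1250.21 * 1.5
```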
Now comes the problem. Trying to do the same in bash, where aliases with parameters aren't supported, forces me to implement the calculator as a shell function and pass the parameters using "$@":
function C () { echo "$@" | tr -d ', \042-\047' | bc -l; }
This breaks in various ways e.g:
# works:
bash$ C 1+2
3
# works:
bash$ C 1*2
2
# Spaces around '*' lead to file expansion with everything falling apart:
bash$ C 1 * 2
(standard_in) 1: syntax error
(standard_in) 1: illegal character: P
(standard_in) 1: illegal character: S
(standard_in) 1: syntax error
...
# Non-leading parentheses seem to work:
bash$ C 2*(2+1)
6
# but leading-parentheses don't:
bash$ C (2+1)*2
bash: syntax error near unexpected token `2+1'
Of course, adding quotes around the expression solves these issues, but is against the original requirements.
I understand why things break in bash. I'm not looking for explanations. Rather, I'm looking for a solution which doesn't require manually quoting the arguments. My question to bash wizards: is there any way to make bash support the handy minimal-typing calculator alias without requiring quoting, like tcsh does? Or is this impossible? Thanks!
If you're prepared to type C Enter instead of C Space, the sky's the limit. The C command can take input in whatever form you desire, unrelated to the shell syntax.
C () {
local line
read -p "Arithmetic: " -e line
echo "$line" | tr -d \"-\', | bc -l
}
In zsh:
function C {
local line=
vared -p "Arithmetic: " line
echo $line | tr -d \"-\', | bc -l
}
In zsh, you can turn off globbing for the arguments of a specific command with the noglob modifier. It is commonly hidden in an alias. This makes *^() be taken literally, but it does not affect quotes or $.
quickie_arithmetic () {
echo "$*" | tr -d \"-\', | bc -l
}
alias C='noglob quickie_arithmetic'
At least preventing the expansion of * is possible using 'set -f' (following someone's blog post):
alias C='set -f -B; Cf '
function Cf () { echo "$@" | tr -d ', \042-\047' | bc -l; set +f; };
Turning it off in the alias, before the calculation, and back on afterwards
$ C 2 * 3
6
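The effect of set -f can be checked in isolation with a short POSIX sh sketch. The function names count_args and noglob_demo are invented for this example, and the bash-specific -B flag from the alias is left out.

```shell
#!/bin/sh
# Hypothetical helpers: count the arguments that arrive when pathname
# expansion is switched off around a command.
count_args() {
    echo "$#"
}

noglob_demo() {
    set -f              # disable pathname expansion, as the alias does
    count_args 2 * 3    # the * stays a literal word
    set +f              # re-enable it, as Cf does
}

noglob_demo             # prints 3, whatever files are in the directory
```

Without the set -f line, the count would depend on how many file names the * matches in the current directory.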
I downloaded the bash sources and looked very closely. It seems the parenthesis error occurs directly during the parsing of the command line, before any command is run or alias is expanded. And without any flag to turn it off.
So it would be impossible to do it from a bash script.
This means it is time to bring out the heavy weapons. Before parsing, the command line is read from stdin using readline. Therefore, if we intercept the call to readline, we can do whatever we want with the command line.
Unfortunately bash is statically linked against readline, so the call cannot be intercepted directly. But at least readline is a global symbol, so we can get the address of the function using dlsym, and with that address we can insert arbitrary instructions in readline.
Modifying readline directly is prone to errors if readline changes between bash versions, so we modify the function calling readline instead, leading to the following plan:
Locate readline with dlsym
Replace readline with our own function that uses the current stack to locate the function calling readline (yy_readline_get) on its first call and then restores the original readline
Modify yy_readline_get to call our wrapper function
Within the wrapper function: Replace the parentheses with non problematic symbols, if the input starts with "C "
Written in C for amd64, we get:
#include <string.h>
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#ifndef __USE_GNU
#define __USE_GNU
#endif
#ifndef __USE_MISC
#define __USE_MISC
#endif
#include <dlfcn.h>
#include <unistd.h>
#include <sys/mman.h>
#include <errno.h>
//-----------Assembler helpers----------
#if (defined(x86_64) || defined(__x86_64__))
//assembler instruction to read rbp, which we need to read the stack
#define MOV_EBP_OUT "mov %%rbp, %0"
//size of a call instruction
#define RELATIVE_CALL_INSTRUCTION_SIZE 5
#define IS64BIT (1)
/*
To replace a function with a new one, we use the push-ret trick, pushing the destination address on the stack and let ret jump "back" to it
This has the advantage that we can set an additional return address in the same way, if the jump goes to a function
This struct corresponds to the following assembler fragment:
68 ???? push <low_dword (address)>
C7442404 ???? mov DWORD PTR [rsp+4], <high_dword (address)>
C3 ret
*/
typedef struct __attribute__((__packed__)) LongJump {
char push; unsigned int destinationLow;
unsigned int mov_dword_ptr_rsp4; unsigned int destinationHigh;
char ret;
// char nopFiller[16];
} LongJump;
void makeLongJump(void* destination, LongJump* res) {
res->push = 0x68;
res->destinationLow = (uintptr_t)destination & 0xFFFFFFFF;
res->mov_dword_ptr_rsp4 = 0x042444C7;
res->destinationHigh = ((uintptr_t)(destination) >> 32) & 0xFFFFFFFF;
res->ret = 0xC3;
}
//Macros to save and restore the rdi register, which is used to pass an address to readline (standard amd64 calling convention)
typedef unsigned long SavedParameter;
#define SAVE_PARAMETERS SavedParameter savedParameters; __asm__("mov %%rdi, %0": "=r"(savedParameters));
#define RESTORE_PARAMETERS __asm__("mov %0, %%rdi": : "r"(savedParameters));
#else
#error only implemented for amd64...
#endif
//Simulates the effect of the POP instructions, popping from a passed "stack pointer" and returning the popped value
static void * pop(void** stack){
void* temp = *(void**)(*stack);
*stack += sizeof(void*);
return temp;
}
//Disables the write protection of an address, so we can override it
static int unprotect(void * POINTER){
const int PAGESIZE = sysconf(_SC_PAGE_SIZE);
if (mprotect((void*)(((uintptr_t)POINTER & ~(PAGESIZE-1))), PAGESIZE, PROT_READ | PROT_WRITE | PROT_EXEC)) {
fprintf(stderr, "Failed to set permission on %p\n", POINTER);
return 1;
}
return 0;
}
//Debug stuff
static void fprintfhex(FILE* f, void * hash, int len) {
for (int i=0;i<len;i++) {
if ((uintptr_t)hash % 8 == 0 && (uintptr_t)i % 8 == 0 && i ) fprintf(f, " ");
fprintf(f, "%.2x", ((unsigned char*)(hash))[i]);
}
fprintf(f, "\n");
}
//---------------------------------------
//Address of the original readline function
static char* (*real_readline)(const char*)=0;
//The wrapper around readline we want to inject.
//It replaces () with [], if the command line starts with "C "
static char* readline_wrapper(const char* prompt){
if (!real_readline) return 0;
char* result = real_readline(prompt);
char* temp = result; while (*temp == ' ') temp++;
if (temp[0] == 'C' && temp[1] == ' ')
for (int len = strlen(temp), i=0;i<len;i++)
if (temp[i] == '(') temp[i] = '[';
else if (temp[i] == ')') temp[i] = ']';
return result;
}
//Backup of the changed readline part
static unsigned char oldreadline[2*sizeof(LongJump)] = {0x90};
//A wrapper around the readline wrapper, needed on amd64 (see below)
static LongJump* readline_wrapper_wrapper = 0;
static void readline_initwrapper(){
SAVE_PARAMETERS
if (readline_wrapper_wrapper) { fprintf(stderr, "ERROR!\n"); return; }
//restore readline
memcpy(real_readline, oldreadline, 2*sizeof(LongJump));
//find call in yy_readline_get
void * frame;
__asm__(MOV_EBP_OUT: "=r"(frame)); //current stackframe
pop(&frame); //pop current stackframe (??)
void * returnToFrame = frame;
if (pop(&frame) != real_readline) {
//now points to current return address
fprintf(stderr, "Got %p instead of %p=readline, when searching caller\n", frame, real_readline);
return;
}
void * caller = pop(&frame); //now points to the instruction following the call to readline
caller -= RELATIVE_CALL_INSTRUCTION_SIZE; //now points to the call instruction
//fprintf(stderr, "CALLER: %p\n", caller);
//caller should point to 0x00000000004229e1 <+145>: e8 4a e3 06 00 call 0x490d30 <readline>
if (*(unsigned char*)caller != 0xE8) { fprintf(stderr, "Expected CALL, got: "); fprintfhex(stderr, caller, 16); return; }
if (unprotect(caller)) return;
//We can now override caller to call an arbitrary function instead of readline.
//However, the CALL instruction accepts only a 32-bit parameter, so the called function has to be in the same 32-bit address space
//Solution: Allocate memory at an address close to that CALL instruction and put a long jump to our real function there
void * hint = caller;
readline_wrapper_wrapper = 0;
do {
if (readline_wrapper_wrapper) munmap(readline_wrapper_wrapper, 2*sizeof(LongJump));
readline_wrapper_wrapper = mmap(hint, 2*sizeof(LongJump), PROT_EXEC | PROT_READ | PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
if (readline_wrapper_wrapper == MAP_FAILED) { fprintf(stderr, "mmap failed: %i\n", errno); return; }
hint += 0x100000;
} while ( IS64BIT && ( (uintptr_t)readline_wrapper_wrapper >= 0xFFFFFFFF + ((uintptr_t) caller) ) ); //repeat until we get an address really close to caller
//fprintf(stderr, "X:%p\n", readline_wrapper_wrapper);
makeLongJump(readline_wrapper, readline_wrapper_wrapper); //Write the long jump in the newly allocated space
//fprintfhex(stderr, readline_wrapper_wrapper, 16);
//fprintfhex(stderr, caller, 16);
//patch caller to become call <readline_wrapper_wrapper>
//called address is relative to address of CALL instruction
*(uint32_t*)(caller+1) = (uint32_t) ((uintptr_t)readline_wrapper_wrapper - (uintptr_t)(caller + RELATIVE_CALL_INSTRUCTION_SIZE) );
//fprintfhex(stderr, caller, 16);
*(void**)(returnToFrame) = readline_wrapper_wrapper; //change stack to jump to wrapper instead real_readline (or it would not work on the first entered command)
RESTORE_PARAMETERS
}
static void _calc_init(void) __attribute__ ((constructor));
static void _calc_init(void){
if (!real_readline) {
//Find readline
real_readline = (char* (*)(const char*)) dlsym(RTLD_DEFAULT, "readline");
if (!real_readline) return;
//fprintf(stdout, "loaded %p\n", real_readline);
//fprintf(stdout, " => %x\n", * ((int*) real_readline));
if (unprotect(real_readline)) { fprintf(stderr, "Failed to unprotect readline\n"); return; }
memcpy(oldreadline, real_readline, 2*sizeof(LongJump)); //backup readline's instructions
//Replace readline with readline_initwrapper
makeLongJump(real_readline, (LongJump*)real_readline); //add a push/ret long jump from readline to readline, to have readline's address on the stack in readline_initwrapper
makeLongJump(readline_initwrapper, (LongJump*)((char*)real_readline + sizeof(LongJump) - 1)); //add a push/ret long jump from readline to readline_initwrapper, overriding the previous RET
}
}
This can be compiled to an intercepting library with:
gcc -g -std=c99 -shared -fPIC -o calc.so calc.c -ldl
and then loaded in bash with:
gdb --batch-silent -ex "attach $BASHPID" -ex 'print dlopen("calc.so", 0x101)'
Now, when the previous alias extended with parenthesis replacement is loaded:
alias C='set -f -B; Cf '
function Cf () { echo "$@" | tr -d ', \042-\047' | tr [ '(' | tr ] ')' | bc -l; set +f; };
We can write:
$ C 1 * 2
2
$ C 2*(2+1)
6
$ C (2+1)*2
6
It gets even better if we switch from bc to qalculate:
alias C='set -f -B; Cf '
function Cf () { echo "$@" | tr -d ', \042-\047' | tr [ '(' | tr ] ')' | xargs qalc ; set +f; };
Then we can do:
$ C e ^ (i * pi)
e^(i * pi) = -1
$ C 3 c
3 * speed_of_light = approx. 899.37737(km / ms)
GCC allows customization of printf specifiers. However, I don't see how I can "teach" it to accept my string class for the %s specifier. My string class is a simple wrapper over a char pointer and has exactly one member variable (char * data) and no virtual functions. So it's more or less OK to pass it as-is to printf-like functions in place of a regular char *. The problem is that GCC's static checking prevents me from doing so, and I have to explicitly cast it to const char * to avoid warnings or errors.
My cstring looks something like this:
class cstring
{
public:
cstring() : data(NULL){}
cstring(const char * str) : data(strdup(str)){}
cstring(const cstring & str) : data(strdup(str.data)){}
~cstring()
{
free(data);
}
...
const char * c_str() const
{
return data;
}
private:
char * data;
};
Example code that uses cstring:
cstring str("my string");
printf("str: '%s'", str);
On GCC I get this error:
error: cannot pass objects of non-trivially-copyable type 'class cstring' through '...'
error: format '%s' expects argument of type 'char*', but argument 1 has type 'cstring' [-Werror=format]
cc1plus.exe: all warnings being treated as errors
The C++ standard doesn't require compilers to support this sort of code, and not all versions of gcc support it. (https://gcc.gnu.org/onlinedocs/gcc/Conditionally-supported-behavior.html suggests that gcc-6.0 does, at least - an open question whether it will work with classes such as the one here.)
The relevant text in the C++11 standard is 5.2.2, paragraph 7:
When there is no parameter for a given argument, the argument is passed in such a way that the receiving function can obtain the value of the argument by invoking va_arg ...
Passing a potentially-evaluated argument of class type (Clause 9)
having a non-trivial copy constructor, a non-trivial move constructor,
or a non-trivial destructor, with no corresponding parameter, is
conditionally-supported with implementation-defined semantics.
(But look on the bright side: if you get into the habit of using c_str, then at least you won't get tripped up when/if you use std::string.)
I know I can type print someFloatVariable when I set a breakpoint or po [self someIvarHoldingAnObject], but I can't do useful things like:
[self setAlpha:1];
Then it spits out:
error: '[self' is not a valid command.
The weird thing is that I can call po [self someIvarHoldingAnObject] and it will print its description.
I believe I've seen a video a year ago where someone demonstrated how to execute code through the console at runtime, and if I am not mistaken this guy also provided arguments and assigned objects to pointers. How to do that?
The canonical reference for gdb v. lldb commands is http://lldb.llvm.org/lldb-gdb.html
You want to use the expr command which evaluates an expression. It's one of the lldb commands that takes "raw input" in addition to arguments so you often need a "--" to indicate where the arguments (to expr) end and the command(s) begin. e.g.
(lldb) expr -- [self setAlpha:1]
There is a shortcut, "p", which does the -- for you (but doesn't allow any arguments), e.g.
(lldb) p [self setAlpha:1]
If the function(s) you're calling are not part of your program, you'll often need to explicitly declare their return type so lldb knows how to call them. e.g.
(lldb) p printf("hi\n")
error: 'printf' has unknown return type; cast the call to its declared return type
error: 1 errors parsing expression
(lldb) p (int)printf("hi\n")
(int) $0 = 3
hi
(lldb)
There is a neat way to work around the floating point argument problem, BTW. You create an "expression prefix" file which is added to every expression you enter in lldb, with a prototype of your class methods. For instance, I have a class MyClass which inherits from NSObject, it has two methods of interest, "setArg:" and "getArg" which set and get a float ivar. This is a silly little example, but it shows how to use it. Here's a prefix file I wrote for lldb:
@interface NSObject
@end
@interface MyClass : NSObject
- init;
- setArg: (float)arg;
- (float) getArg;
@end
extern "C" {
int strcmp (const char *, const char *);
int printf(const char * __restrict, ...);
int puts (const char *);
}
In my ~/.lldbinit file I add
settings set target.expr-prefix /Users/jason/lldb-prefix.h
and now I can do
(lldb) p [var getArg]
(float) $0 = 0.5
(lldb) p [var setArg:0.7]
(id) $1 = 0x0000000100104740
(lldb) p [var getArg]
(float) $2 = 0.7
You'll notice I included a couple of standard C library functions in here too. After doing this, I don't need to cast the return types of these any more, e.g.
(lldb) p printf("HI\n")
<no result>
HI
(lldb) p strcmp ("HI", "THERE")
(int) $3 = -12
(a fix for that "<no result>" thing has been committed to the lldb TOT sources already.)
If you need multiline, use expression:
expression
do {
try thing.save()
} catch {
print(error)
}
// code will execute now
Blank line to finish and execute the code.
I have a couple structure definitions in my input code. For example:
struct node {
int val;
struct node *next;
};
or
typedef struct {
int numer;
int denom;
} Rational;
I used the following line to convert them into one line and copy it twice.
sed '/struct[^(){]*{/{:l N;s/\n//;/}[^}]*;/!t l;s/ */ /g;p;p}'
the result is this:
struct node { int val; struct node *next;};
struct node { int val; struct node *next;};
struct node { int val; struct node *next;};
typedef struct { int numer; int denom;} Rational;
typedef struct { int numer; int denom;} Rational;
typedef struct { int numer; int denom;} Rational;
This is what I want:
1. I would like the first line to be restored to the original structure block.
2. I would like the second line to turn into a function heading that looks like this:
void init_structName( structName *var, int data1, int data2 )
- structName is the name of the structure.
- var is any name you like.
- data1, data2, ... are the members of the struct.
3. I would like the third line to turn into the function body, where I initialize the data members. It would look like this:
{
var->data1 = data1;
var->data2 = data2;
}
Keep in mind that ALL my struct definitions in the input file are placed on one line and copied three times. So when the code finds a structure definition, it can assume that there will be two more copies below.
For example, this is the output I want if the input file had the repeating lines shown above.
struct node {
int val;
struct node *next;
};
void init_node(struct node *var, int val, struct node *next)
{
var->val = val;
var->next = next;
}
typedef struct {
int numer;
int denom;
} Rational;
void init_Rational( Rational *var, int numer, int denom )
{
var->numer = numer;
var->denom = denom;
}
In case someone was curious. These functions will be called from the main function to initialize the struct variables.
Can someone help? I realize this is kind of tough.
Thanks so much!!
Seeing that sed is Turing Complete, it is possible to do it in a single go, but that doesn't mean that the code is very user friendly =)
My attempt at a solution would be:
#!/bin/sed -nf
/struct/b continue
p
d
: continue
# 1st step:
s/\(struct\s.*{\)\([^}]*\)\(}.*\)/\1\
\2\
\3/
s/;\(\s*[^\n}]\)/;\
\1/g
p
s/.*//
n
# 2nd step:
s/struct\s*\([A-Za-z_][A-Za-z_0-9]*\)\s*{\([^}]*\)}.*/void init_\1(struct \1 *var, \2)/
s/typedef\s*struct\s*{\([^}]*\)}\s*\([A-Za-z_][A-Za-z_0-9]*\)\s*;/void init_\2(\2 *var, \1)/
s/;/,/g
s/,\s*)/)/
p
s/.*//
n
# 3rd step
s/.*{\s*\([^}]*\)}.*/{\
\1}/
s/[A-Za-z \t]*[\* \t]\s*\([A-Za-z_][A-Za-z_0-9]*\)\s*;/\tvar->\1 = \1;\
/g
p
I'll try to explain everything I did, but firstly I must warn that this probably isn't very generalized. For example, it assumes that the three identical lines follow each other (ie. no other line between them).
Before starting, notice that the file is a script that requires the "-n" flag to run. This tells sed not to print anything to standard output unless the script explicitly tells it to (through the "p" command, for example). The "-f" option is a "trick" to tell sed to open the file that follows. When executing the script with "./myscript.sed", the system will run "/bin/sed -nf myscript.sed", so it will correctly read the rest of the script.
Step zero would be just a check to see if we have a valid line. I'm assuming every valid line contains the word struct. If the line is valid, the script branches (jumps, the "b" command is equivalent to the goto statement in C) to the continue label (differently from C, labels start with ":", rather than ending with it). If it isn't valid, we force it to be printed with the "p" command, and then delete the line from pattern space with the "d" command. By deleting the line, sed will read the next line and start executing the script from the beginning.
If the line is valid, the actions to change the lines start. The first step is to generate the struct body. This is done by a series of commands.
Separate the line into three parts, everything up to the opening bracket, everything up to the closing bracket (but without including it), and everything from the closing bracket (now including it). I should mention that one of the quirks of sed is that we search for newlines with "\n", but write newlines with a "\" followed by an actual newline. That's why this command is split into three different lines. IIRC this behaviour is specific to POSIX sed, but probably the GNU version (present in most Linux distributions) allows writing a newline with "\n".
Add a newline after every semicolon. The way this works is a bit awkward: we copy everything after the semicolon to after a newline inserted after the semicolon. The g flag tells sed to do this repeatedly, and that is why it works. Also note again the newline escaping.
Force the result to be printed
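The newline quirk from this step can be tried on its own. This is a minimal sketch, not the script above: the function name split_members is invented for the example, and the pattern is simplified to use a plain space instead of "\s" so that it does not rely on GNU extensions.

```shell
#!/bin/sh
# Hypothetical helper: put each struct member on its own line. The newline in
# the replacement is written as a backslash followed by a real newline, which
# POSIX sed accepts.
split_members() {
    sed 's/; *\([^ ]\)/;\
\1/g'
}

printf '%s\n' 'int val; struct node *next;' | split_members
```

The last semicolon is left alone because nothing follows it, which mirrors why the original command requires a character after the semicolon.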
Before the second step, we manually clear the lines from the pattern-space (ie. buffer), so we can start fresh for the next line. If we did this with the "d" command, sed would start reading the commands from the start of the file again. The "n" command then reads the next line into the pattern-space. After that, we start the commands to transform the line into a function declaration:
We first match the word struct, followed by zero or more white space, then followed by a C identifier that can start with underscore or alphabetic letters, and can contain underscores and alphanumeric characters. The identifier is captured into the "variable" "\1". We then match the content between brackets, which is stored into "\2". These are then used to generate the function declaration.
We then do the same process, but now for the "typedef" case. Notice that now the identifier is after the brackets, so "\1" now contains the contents inside the brackets and "\2" contains the identifier.
Now we replace all semicolons with commas, so it can start looking more like a function definition.
The last substitute command removes the extra comma before the closing parenthesis.
Finally print the result.
Again, before the last step, manually clean the pattern-space and read the next line. The step will then generate the function body:
Match and capture everything inside the brackets. Notice the ".*" before the opening bracket and after the closing bracket. This is used so only the contents of the brackets are written afterwards. When writing the output, we place the opening bracket on a separate line.
We match alphabetic characters and spaces, so we can skip the type declaration. We require at least a white space character or an asterisk (for pointers) to mark the start of the identifier. We then proceed to capture the identifier. This only works because of what follows the capture: we explicitly require that after the identifier there are only optional white spaces followed by a semicolon. This forces the expression to get the identifier characters before the semicolon, ie. if there are more than two words, it will only get the last word. Therefore it would work with "unsigned int var", capturing "var" correctly. When writing the output, we place some indentation, followed by the desired format, including the escaped newline.
Print the final output.
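The identifier capture from the last step can also be tested in isolation. The sketch below is a simplified variant rather than the exact command from the script: the function name field_to_assignment is made up, it handles one member per input line, and it uses the POSIX class [[:space:]] in place of "\s" and "\t".

```shell
#!/bin/sh
# Hypothetical helper: turn one member declaration into an assignment.
# The type words are skipped; the identifier is the last word before the
# semicolon, preceded by white space or an asterisk.
field_to_assignment() {
    sed 's/[A-Za-z[:space:]]*[*[:space:]][[:space:]]*\([A-Za-z_][A-Za-z_0-9]*\)[[:space:]]*;/var->\1 = \1;/'
}

printf '%s\n' 'unsigned int count;' | field_to_assignment   # var->count = count;
printf '%s\n' 'struct node *next;' | field_to_assignment    # var->next = next;
```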
I don't know if I was clear enough. Feel free to ask for any clarifications.
Hope this helps =)
This should give you a few tips on how inappropriate sed actually is for this sort of task. I couldn't figure out how to do it in one pass and by the time I finished writing the scripts, I noticed you were expecting somewhat different results.
Your problem is better suited for a scripting language and a parsing library. Consider python + pyparsing (here is an example C struct parsing grammar, but you would need something much simpler than that) or perl6's rules.
Still, perhaps this will be of some use if you decide to stick to sed:
pass-one.sh
#!/bin/sed -nf
/^struct/ {
s|^\(struct[^(){]*{\)|\1\n|
s|[^}];|;\n|gp
a \\n
}
/^typedef/ {
h
# create signature
s|.*{\(.*\)} \(.*\);|void init_\2( \2 *var, \1 ) {|
# insert argument list to signature and remove trailing ;
s|\([^;]*\); ) {|\1 ) {|g
s|;|,|g
p
g
# add constructor (further substitutions follow in pass-two)
s|.*{\(.*\)}.*|\1|
s|;|;\n|g
s|\n$||p
a }
a \\n
}
pass-two.sh
#!/bin/sed -f
# fix struct indent
/^struct/ {
:loop1
n
s|^ | |
t loop1
}
# unsigned int name -> var->name = name
/^void init_/{
:loop2
n
s|.* \(.*\);| var->\1 = \1;|
t loop2
}
Usage
$ cat << EOF | ./pass-one.sh | ./pass-two.sh
struct node { int val; struct node *next;};
typedef struct { int numer; int denom;} Rational;
struct node { int val; struct node *next;};
typedef struct { int numer; unsigned int denom;} Rational;
EOF
struct node {
int va;
struct node *nex;
};
void init_Rational( Rational *var, int numer, int denom ) {
var->numer = numer;
var->denom = denom;
}
struct node {
int va;
struct node *nex;
};
void init_Rational( Rational *var, int numer, unsigned int denom ) {
var->numer = numer;
var->denom = denom;
}