Removing occurrences of #ifdef/#endif from a file with perl - xcode

I have a code file that has some #ifdefs I would like removed in the header file after building a library. My first thought was to do this as a perl script that XCode can run. While I can certainly open the header file and read all content of it into a string in perl, I'm curious as to the best way to do the following
Find any occurrence of #ifdef EXAMPLE
Remove it and anything in between the following #endif
So the example is:
int i;
NSString *someString;
#ifdef EXAMPLE
NSString *exampleString;
#endif
bool done;
and the output would be:
int i;
NSString *someString;
bool done;
Options I'm considering:
finding index of every #ifdef EXAMPLE and removing it via substring with the next found #endif
Write a regex that can somehow remove these occurrences.
Considering I haven't written Perl before (Objective-C is my primary language), I was curious whether any XCode or Perl developers had suggestions on what the best approach would be.

I'm not sure why you want to strip out ifdefs, and you can probably use a C pre-processor to do this, but here's how you'd do it in Perl because it means I get to play with the flip-flop operator.
First thing is to craft a sufficient regex to match the ifdefs. IIRC they can be indented and there can be indentation between the # and the word.
#ifdef
#  ifdef
   #ifdef
Not sure if that last one is valid, but I'm going with it anyway.
my $ifdef_re = qr{^\s*#\s*ifdef\b};
my $endif_re = qr{^\s*#\s*endif\b};
If it were just a matter of removing text between #ifdef and #endif, Perl has the little-used flip-flop operator (.. in scalar context).
#!/usr/bin/env perl
use strict;
use warnings;
my $ifdef_re = qr{^\s*#\s*ifdef\b};
my $endif_re = qr{^\s*#\s*endif\b};
while (<DATA>) {
    my $in_ifdef = /$ifdef_re/ .. /$endif_re/;
    print if !$in_ifdef;
}
__DATA__
int i;
NSString *someString;
#ifdef EXAMPLE
NSString *exampleString;
#endif
bool done;
But since we need to worry about nested ifdefs, it's insufficient. A depth counter takes care of that.
#!/usr/bin/env perl
use strict;
use warnings;
my $ifdef_re = qr{^\s*#\s*ifdef\b};
my $endif_re = qr{^\s*#\s*endif\b};
my $ifdef_count = 0;
while (<DATA>) {
    $ifdef_count++ if /$ifdef_re/;
    print if $ifdef_count <= 0;
    $ifdef_count-- if /$endif_re/;
}
__DATA__
int i;
NSString *someString;
#ifdef EXAMPLE
NSString *exampleString;
# ifdef FOO
this should not appear
# endif
nor should this
#endif
bool done;
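Both Perl versions above strip every #ifdef block, whereas the question only asks for #ifdef EXAMPLE to go. As a rough sketch of the same depth-counter idea restricted to one macro, here it is in Python; the function name strip_ifdef and the sample header are made up for illustration, and #else branches inside a removed block are simply dropped along with the rest.

```python
import re

# Any directive that opens a conditional (#if, #ifdef, #ifndef), indented or not.
IF_RE = re.compile(r"^\s*#\s*if(?:n?def)?\b")
ENDIF_RE = re.compile(r"^\s*#\s*endif\b")


def strip_ifdef(lines, macro):
    """Drop every `#ifdef macro` block, through its matching #endif.

    Hypothetical helper: conditionals on other macros are kept as-is.
    """
    target_re = re.compile(r"^\s*#\s*ifdef\s+" + re.escape(macro) + r"\b")
    kept = []
    depth = 0  # > 0 while inside a block that is being removed
    for line in lines:
        if depth > 0:
            if IF_RE.match(line):
                depth += 1      # nested conditional inside the removed block
            elif ENDIF_RE.match(line):
                depth -= 1      # closes either a nested block or the target
            continue            # everything inside is dropped
        if target_re.match(line):
            depth = 1
            continue
        kept.append(line)
    return kept


header = [
    "int i;",
    "NSString *someString;",
    "#ifdef EXAMPLE",
    "NSString *exampleString;",
    "#  ifdef FOO",
    "this should not appear",
    "#  endif",
    "nor should this",
    "#endif",
    "bool done;",
    "#ifdef OTHER",
    "int kept;",
    "#endif",
]
result = strip_ifdef(header, "EXAMPLE")
```

With the sample above, result keeps the three declarations from the question plus the untouched #ifdef OTHER block.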

I love regexes, but for this problem I wouldn't use a regex. I'd just read line by line, keeping track of whether I was inside an #ifdef:
my $nesting = 0;
while (<STDIN>)
{
    $nesting += 1 if /^#ifdef/;
    print $_ unless $nesting;
    $nesting -= 1 if /^#endif/;
}
If you really want to use a regex, and have read the whole file into the variable $source, I think this will work, if you don't need to worry about nesting:
$source =~ s/^#ifdef.*?^#endif.*?$//gms;
The ^ characters anchor those parts of the expression to the beginning of a line. The $ makes the last part of the match only happen at the end of a line.
The .*? behaves almost like .*, which matches zero or more characters, except that it does minimal matching. So instead of matching all the way to the last #endif, it matches to the first one.
The /gms at the end makes it:
Substitute every occurrence, not just one (that's the g)
Make ^ and $ match at line boundaries, not just string boundaries (the m)
Make . match newlines (the s)
You might want to follow every #ifdef and #endif with \s, to only match if there is whitespace following that string.
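To see the three flags at work outside Perl, here is a small Python sketch of the same substitution. re.M and re.S play the roles of /m and /s, and re.sub replaces every match, like /g. The trailing \n? is an addition not present in the Perl version; it swallows the newline after #endif so no blank line is left behind. As with the Perl regex, nested blocks are not handled.

```python
import re

source = """int i;
NSString *someString;
#ifdef EXAMPLE
NSString *exampleString;
#endif
bool done;
"""

# re.M: ^ and $ match at line boundaries; re.S: . also matches newlines.
# .*? is the minimal match, so the pattern stops at the first #endif.
pattern = re.compile(r"^#ifdef.*?^#endif.*?$\n?", re.M | re.S)

# re.sub replaces every occurrence, like the /g modifier.
stripped = pattern.sub("", source)
```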

I'd just do this with unifdef. XCode installs this by default:
-U will remove #ifdef and matching #else/#endif as if <constant> is undefined.
-D will remove #ifdef and matching #else/#endif as if <constant> is defined.
Here's an example:
$ cat test.h
#ifdef TEST
#ifdef DEBUG
# define AWESOME_DEBUG_LEVEL 1
#else
# define AWESOME_DEBUG_LEVEL 0
#endif
#endif
$ unifdef -U DEBUG test.h
#ifdef TEST
# define AWESOME_DEBUG_LEVEL 0
#endif
$ unifdef -U DEBUG -D TEST test.h
# define AWESOME_DEBUG_LEVEL 0

Related

Sed function (Bash) - Changing way of commented lines

I have a task to do, but I have no idea how to write it with sed.
I have to change the style of the comments in a file from:
//something6
//something4
//something5
//something3
//something2
to
/*something6
* something4
* something5
* something3
* something2*/
from
//something6
//something4
//something5
//something3
//something2
to
/*something6
something4
something5
something3
something2*/
from
/*something6
* something4
* something5
* something3
* something2*/
to
//something6
//something4
//something5
//something3
//something2
from
/*something6
something4
something5
something3
something2*/
to
//something6
//something4
//something5
//something3
//something2
Those 4 conversions must be done with sed (I guess, but I'm not sure about that).
I tried doing it, but without luck. I can replace single words with other ones, but how do I change the comment style? No clue. I would be very grateful for help and assistance.
Given that the task is:
Please write a script that allows to change style of comments in source files for example : /* .... */ goes to // .... The style of comment is an argument of the script.
I have tried to use just typical:
sed -i 's/'"$lookingfor"'/'"$changing"'/g' $filename
In this context, either $lookingfor or $changing or both will contain slashes, so that simple formulation doesn't work, as you correctly observe.
The conversion of // comments to /* comments is easy as long as you know that you can choose an arbitrary character to separate the sections of the s/// command, such as %. So, for example, you could use:
sed -i.bak -e 's%// *\(.*\)%/*\1 */%'
This looks for a double-slash followed by zero or more spaces and anything and converts it to /* anything */.
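For comparison, the same substitution can be sketched in Python, where the delimiter problem does not arise because the pattern is an ordinary string. The helper name line_to_block is made up here, and the replacement follows the description above (a space after /* and before */). Like the sed command, it is fooled by // inside a string literal such as a URL.

```python
import re


def line_to_block(line):
    """Turn the first `// text` on a line into `/* text */` (hypothetical helper)."""
    # `// *` eats the slashes and any spaces after them; (.*) captures the
    # rest of the line (without the newline, since . does not match it here).
    return re.sub(r"// *(.*)", r"/* \1 */", line, count=1)


converted = [line_to_block(s) for s in ["//something6", "int x; // counter"]]
```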
The conversion of /* comments is much harder. There are two cases to be concerned about:
/* A single line comment */
/*
** A multiline comment
*/
That's before you get into:
/* OK */ "/* OK */" /* Really?! */
which is a single line containing two comments and a string containing text that outside a string would look like a comment. This I am studiously ignoring! Or, more accurately, I am studiously deciding that it will be OK when converted to:
// OK */ "/* OK */" /* Really?!
which isn't the same at all, but serves you right for writing convoluted C in the first place.
You can deal with the first case with something like:
sed -e '\%/\*\(.*\)\*/% { s%%//\1%; n; }'
I have the grouping braces and the n command in there so that single line comments don't also match the second case:
-e '\%/\*%,\%\*/% {
    \%/\*% { s%/\*\(.*\)%//\1%; n; }
    \%\*/% { s%\(.*\)\*/%//\1%; n; }
    s%^\( *\)%\1//%
}'
The first line selects a range of lines between one matching /* and the next matching */. The \% tells sed to use the % instead of / as the search delimiter. There are three operations within the outer grouping { … }:
Convert /*anything into //anything and start on the next line.
Convert anything*/ into //anything and start on the next line.
Convert any other line so that it preserves leading blanks but puts // after them.
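The single-line case and the three range operations can also be written out as an explicit state machine. The following Python sketch is only an approximation of the sed script under the same simplifying assumptions (one comment per line, no comment markers inside strings); the function name is invented for illustration.

```python
import re


def block_to_line_comments(lines):
    """Approximate the sed script: rewrite /* ... */ comments as // comments."""
    out = []
    in_block = False  # True between an opening /* line and its closing */ line
    for line in lines:
        if not in_block:
            if re.search(r"/\*.*\*/", line):
                # Whole comment on one line: /* text */ becomes // text
                out.append(re.sub(r"/\*(.*)\*/", r"//\1", line, count=1))
            elif "/*" in line:
                # Opening line: /*anything becomes //anything
                out.append(line.replace("/*", "//", 1))
                in_block = True
            else:
                out.append(line)
        elif "*/" in line:
            # Closing line: anything*/ becomes //anything
            out.append(re.sub(r"(.*)\*/", r"//\1", line, count=1))
            in_block = False
        else:
            # Interior line: keep leading blanks, then insert //
            out.append(re.sub(r"^( *)", r"\1//", line, count=1))
    return out


sample = [
    "/* A single line comment */",
    "/*",
    "** A multiline comment",
    "*/",
    "int x;",
]
converted = block_to_line_comments(sample)
```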
This is still ridiculously easy to subvert if the comments are maliciously formed. For example:
/* a comment */ int x = 0;
is mapped to:
// a comment int x = 0;
Fixing problems like that, and the example with a string, is something I'd not even start trying in sed. And that's before you get onto the legal but implausible C comments, like:
/\
\
* comment
*\
\
/
/\
/\
noisiness \
commentary \
continued
Which contains just two comments (but does contain two comments!). And before you decide to deal with trigraphs (??/ is a backslash). Etc.
So, a moderate approximation to a C to C++ comment conversion is:
sed -e '\%/\*\(.*\)\*/% { s%%//\1%; n; }' \
    -e '\%/\*%,\%\*/% {
        \%/\*% { s%/\*\(.*\)%//\1%; n; }
        \%\*/% { s%\(.*\)\*/%//\1%; n; }
        s%^\( *\)%\1//%
    }' \
    -i.bak "$@"
I'm assuming you aren't using a C shell; if you are, you need more backslashes at the ends of the lines in the script so that the multi-line single-quoted sed command is treated correctly.

How to make a syntax manipulator?

Firstly, sorry for the question, I know I've heard something that could help, but I just can't remember.
Basically I would like to create my own syntax for a programming language. For example this code:
WRITE OUT 'Hello World!'
NEW LINE
would turn into this Java code:
System.out.print("Hello World!");
System.out.println();
How could I achieve this? Is there a method?
Hello.
There are techniques and proper algorithms to do that.
Search for "compiler techniques" and "Interpreter pattern".
An initial approach could be a basic pattern interpreter.
Assuming simple sentences and only one sentence per line, you could read the input file line by line and search for defined patterns (regular expressions).
The patterns describe the structure of the commands in your invented language.
If you get a match then you do the translation.
In particular, we use the regex.h library in C to perform the regular expression search.
Of course, regular expressions are also available in Java.
Ex. NEW LINE matches the pattern " *NEW +LINE *"
The * means that the preceding character occurs 0 or more times.
The + means that the preceding character occurs 1 or more times.
Thus, this pattern can match the command " NEW LINE " with arbitrary spaces between the words.
Ex. WRITE OUT 'Hello World!' matches the pattern "WRITE OUT '([[:print:]]*)'"
or, if you want to allow extra spaces, " *WRITE +OUT +'([[:print:]]*)' *"
[[:print:]] means: match one printable character (ex. 'a' or 'Z' or '0' or '+')
Thus, [[:print:]]* matches a sequence of 0 or more printable characters.
If a line of your input file matches the pattern of some command, you can do the translation, but in most cases you will need to retrieve some information first,
ex. the arbitrary text after WRITE OUT. That's why you need to put parentheses around [[:print:]]*. They tell the search function that you want to retrieve that particular part of the match.
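As a quick illustration of how the parentheses hand back the captured text, here is a minimal Python sketch of the same two patterns. Python's re module has no [[:print:]] class, so [ -~] (printable ASCII) stands in for it; the function name translate is an assumption, and quotes inside the captured text are not escaped.

```python
import re

NEW_LINE = re.compile(r" *NEW +LINE *")
# [ -~] covers printable ASCII, standing in for POSIX [[:print:]].
WRITE_STR = re.compile(r" *WRITE +OUT +'([ -~]*)' *")


def translate(line):
    """Translate one line of the invented language to Java, or return None."""
    text = line.rstrip("\n")
    if NEW_LINE.fullmatch(text):
        return "System.out.println();"
    match = WRITE_STR.fullmatch(text)
    if match:
        # group(1) is the part of the line matched inside the parentheses.
        return 'System.out.print("%s");' % match.group(1)
    return None


java = [translate(s) for s in ["WRITE OUT 'Hello World!'", "  NEW   LINE "]]
```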
By a nice coincidence, I recently helped a friend with a college project similar to the problem you want to solve: a translator from C to BASIC. I reused that code to make an example for you.
I tested the code and it works.
It can translate:
WRITE OUT 'some text'
WRITE OUT variable
NEW LINE
#include <stdio.h>
#include <stdlib.h>
#include <regex.h>
#include <string.h>

#define STR_SHORT 100
#define MATCHES_SIZE 10

/**************************************************************
 Returns the string of a match
**************************************************************/
char * GetExp(char *Source, char *Destination, regmatch_t Matches) {
    //Source      The string that was searched
    //Destination Will contain the matched string
    //Matches     One element of the vector passed to regexec
    int Length = Matches.rm_eo - Matches.rm_so;
    strncpy(Destination, Source+Matches.rm_so, Length);
    Destination[Length]=0;
    return Destination;
}

/**************************************************************
 MAIN
**************************************************************/
int main(int argc, char *argv[]) {
    //Usage
    if (argc==1) {
        printf("Usage:\n");
        printf("interpreter source_file\n");
        printf("\n");
        printf("Implements a very basic interpreter\n");
        return 0;
    }

    //Open the source file
    FILE *SourceFile;
    if ( (SourceFile=fopen(argv[1], "r"))==NULL )
        return 1;

    //This variable is used to get the strings that matched the pattern
    //Matches[0] -> the whole text matched by the pattern
    //Matches[1] -> first parenthetical
    //Matches[2] -> second parenthetical
    regmatch_t Matches[MATCHES_SIZE];
    char MatchedStr[STR_SHORT];

    //Regular expression for NEW LINE
    regex_t Regex_NewLine;
    regcomp(&Regex_NewLine, " *NEW +LINE *", REG_EXTENDED);

    //Regular expression for WRITE OUT 'some text'
    regex_t Regex_WriteOutStr;
    regcomp(&Regex_WriteOutStr, " *WRITE +OUT +'([[:print:]]*)' *", REG_EXTENDED);

    //Regular expression for WRITE OUT variable
    regex_t Regex_WriteOutVar;
    regcomp(&Regex_WriteOutVar, " *WRITE +OUT +([_[:alpha:]][[:alnum:]]*) *", REG_EXTENDED);

    //Regular expression for an empty line
    regex_t Regex_EmptyLine;
    regcomp(&Regex_EmptyLine, "^([[:space:]]+)$", REG_EXTENDED);

    //Now we read the file line by line
    char Buffer[STR_SHORT];
    while( fgets(Buffer, STR_SHORT, SourceFile)!=NULL ) {
        //printf("%s", Buffer);

        //Shortcut for an empty line
        if ( regexec(&Regex_EmptyLine, Buffer, MATCHES_SIZE, Matches, 0)==0 ) {
            printf("\n");
            continue;
        }

        //NEW LINE
        if ( regexec(&Regex_NewLine, Buffer, MATCHES_SIZE, Matches, 0)==0 ) {
            printf("System.out.println();\n");
            continue;
        }

        //WRITE OUT 'some text'
        if ( regexec(&Regex_WriteOutStr, Buffer, MATCHES_SIZE, Matches, 0)==0 ) {
            printf("System.out.print(\"%s\");\n", GetExp(Buffer, MatchedStr, Matches[1]));
            continue;
        }

        //WRITE OUT variable
        //Assumes variable is a string variable; System.out.print takes a single argument
        if ( regexec(&Regex_WriteOutVar, Buffer, MATCHES_SIZE, Matches, 0)==0 ) {
            printf("System.out.print(%s);\n", GetExp(Buffer, MatchedStr, Matches[1]));
            continue;
        }

        //Unknown command
        printf("Unknown command: %s", Buffer);
    }

    fclose(SourceFile);
    return 0;
}
Proper solution for this question requires the following steps:
Parse the original syntax code and create a syntax tree.
That is commonly done with tools like ANTLR.
Go through the syntax tree and either convert it to Java code, or to a Java syntax tree.
Both of those steps have their own complexity, so it would be better to ask separate questions about specific issues you encounter while implementing them.
Strictly speaking you can skip step 2 and generate Java directly when parsing, but unless your language is very simple renaming of Java concepts, you wouldn't be able to do that easily.
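To make the two steps concrete, here is a deliberately tiny Python sketch: a hand-written parser that stands in for what a tool like ANTLR would generate, followed by a separate pass that walks the resulting nodes and emits Java. The node classes WriteOut and NewLine and the functions parse and to_java are invented for this illustration, and the grammar only covers the two statements from the question.

```python
import re
from dataclasses import dataclass


@dataclass
class WriteOut:
    text: str


@dataclass
class NewLine:
    pass


def parse(source):
    """Step 1: turn source text into a flat list of syntax-tree nodes."""
    nodes = []
    for number, line in enumerate(source.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        match = re.fullmatch(r"WRITE +OUT +'(.*)'", line)
        if match:
            nodes.append(WriteOut(match.group(1)))
        elif re.fullmatch(r"NEW +LINE", line):
            nodes.append(NewLine())
        else:
            raise SyntaxError(f"line {number}: unknown statement: {line}")
    return nodes


def to_java(nodes):
    """Step 2: walk the nodes and emit one Java statement per node."""
    statements = []
    for node in nodes:
        if isinstance(node, WriteOut):
            # Escape backslashes and double quotes for a Java string literal.
            escaped = node.text.replace("\\", "\\\\").replace('"', '\\"')
            statements.append(f'System.out.print("{escaped}");')
        elif isinstance(node, NewLine):
            statements.append("System.out.println();")
    return "\n".join(statements)


tree = parse("WRITE OUT 'Hello World!'\nNEW LINE\n")
java = to_java(tree)
```

Keeping the tree separate from the output means a second back end (say, one emitting C) would only need a new to_java-style function, which is the main argument for not generating code directly while parsing.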

How to make "echo"ed line editable in an interactive shell script?

I have the following problem: in an interactive script, when asking for input, I want to display a suggestion and make it editable. It is a similar functionality to "arrow up and edit the last command" in a command prompt, except without the "arrow up". I tried several different things but no success so far.
These are the things I tried:
1) Get input from editor, like so:
echo "$SUGGESTION\c"
INPUT=`ed -` # problem with this approach is that 'ed' starts in command mode
# by default, and I would need input mode
2) Use read -e
echo "$SUGGESTION\c"
read -e INPUT # doesn't work as advertised
After extensive Googling I am convinced that the 2) should work, but it doesn't. First of all, I cannot delete the $SUGGESTION without typing some input first; after some characters are typed, backspace deletes the whole line, not just one character.
So my question is: how to make "read -e" work or is there another approach to solve this? Your help is very much appreciated!
It does work as advertised, but you need an extra parameter to do what you want:
read -e -i "$SUGGESTION" INPUT
Unfortunately, that's only available in Bash 4.
If you have a C compiler and readline available, here's a quick hack that you could use. Save the following to myread.c (or whatever) and compile it (you'll need to link with readline). For GCC, that would be: gcc -o myread myread.c -lreadline.
#include <stdio.h>
#include <readline/readline.h>

int main(int argc, char **argv)
{
    if (argc != 2)
        return 1;

    // stuff the input buffer with the default value
    char *def = argv[1];
    while (*def) {
        rl_stuff_char(*def);
        def++;
    }

    // let the user edit
    char *input = readline(0);
    if (!input)
        return 1;

    // write out the result to standard error
    fprintf(stderr, "%s", input);
    return 0;
}
You can use it like this:
myread "$SUGGESTION" 2> some_temp_file
if [ $? -eq 0 ] ; then
    # some_temp_file contains the edited value
    INPUT=$(cat some_temp_file)
fi
Lots of room for improvement, but I guess it's a start.

What does the double "at" (@@) symbol do in a Makefile?

I have often seen Makefiles that start commands with an "@" symbol to suppress normal output.
target: test
	@echo foo
Has this output:
$ make test
foo
But I often encounter Makefiles with @@ in front of commands:
target: test
	@@echo foo
And the output is identical, as far as I can tell, from Makefiles with only one @ before the echo command.
What's the difference?
(The @@ seems to be common practice, as seen by this Google Code search: http://www.google.com/codesearch#search/&q=@@echo%20makefile&type=cs)
In OpusMake, @@ means, "really, really quiet". It causes OpusMake to suppress printing the commands even when invoked as make -n. Probably somebody, somewhere, had some familiarity with that feature, wrote their makefiles to use it, somebody else saw it and copied it, and since it doesn't break other make variants (at least, not GNU make), it just stuck around.
Looking at the code, it seems that it just strips all the leading @ (or +/-) characters, so you can put as many @ there as you wish, but I'm not 100% sure. Look at job.c in the make source code.
while (*p != '\0')
  {
    if (*p == '@')
      flags |= COMMANDS_SILENT;
    else if (*p == '+')
      flags |= COMMANDS_RECURSE;
    else if (*p == '-')
      child->noerror = 1;
    else if (!isblank ((unsigned char)*p))
      break;
    ++p;
  }
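The effect of that loop is easier to see in a small model. The Python sketch below mimics it (it is not GNU make's code; the function name and the returned dictionary are invented for illustration): every leading @, + or - sets a flag, blanks are skipped, and the first other character ends the prefix. Repeating @ just sets the same flag again, which is why @@echo behaves like @echo.

```python
def parse_recipe_prefix(line):
    """Model of the prefix loop: consume leading @, +, - and blanks."""
    flags = {"silent": False, "recurse": False, "ignore_errors": False}
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "@":
            flags["silent"] = True          # setting it twice changes nothing
        elif ch == "+":
            flags["recurse"] = True
        elif ch == "-":
            flags["ignore_errors"] = True
        elif ch not in " \t":
            break                           # first real character of the command
        i += 1
    return flags, line[i:]


single = parse_recipe_prefix("@echo foo")
double = parse_recipe_prefix("@@echo foo")
```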

Why is String#scan not finding all the matches?

Let's use this as sample data :
text=<<EOF
#if A==20
int b = 20;
#else
int c = 30;
#endif
EOF
And this code :
puts text.scan(/\#.*?\#/m)
Why is this only capturing this:
#if A==20
int b = 20;
#
I was expecting this to match as well:
#else
int c = 30;
#
What do I have to modify so that it captures that as well? I used /m for multiline matching, but it doesn't seem to work.
It doesn't match the second part because the "#" before the else has already been consumed, so all that's left is
else
int c = 30;
#
which does not match the pattern. You can fix this by using lookahead to match the second # without consuming it:
text.scan(/#.*?(?=#)/m)
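The difference between consuming the second # and merely looking ahead at it can be reproduced in other regex engines too. Here is a short Python sketch of both patterns, where re.S is the counterpart of Ruby's /m (dot matches newlines) and the sample text is the one from the question:

```python
import re

text = """#if A==20
int b = 20;
#else
int c = 30;
#endif
"""

# The closing # is part of the match, so the scan resumes after it,
# at "else...", and only one further # remains: no second match.
consuming = re.findall(r"#.*?#", text, re.S)

# (?=#) only checks that a # follows; the scan resumes at that #,
# so "#else..." can start the next match.
lookahead = re.findall(r"#.*?(?=#)", text, re.S)
```

Note that the final "#endif" is still not returned by the lookahead version, because no further # follows it.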
Second # in your input was already matched by the first substring scan found. From there, it proceeds to scan the remaining part of the string, which is:
else
int c = 30;
#endif
which of course doesn't contain anything to match your regex anymore.
.*? finds the shortest match. Try just .* instead.
