Run custom preprocessing step for source files in Xcode build

As part of my build process (in Visual Studio), I use a custom MSBuild task that takes all the input source files, copies them to a secondary location, runs a preprocessing tool over them, and then gives the copies back to MSBuild to continue the build process with. I'm now working on a project for iOS, and I need to be able to do the same thing. It's been a very long time since I've worked with Xcode, so I'm pretty rusty on how I can set up the build process to work the same way I just described.
Specifically, here's the preprocessing that I'm doing:
As with many games and engines, there are often a lot of named resources, events, script symbols, object states, etc. that need to be human readable when represented in source code but for which having to do a full string comparison at runtime would be much too costly. Instead of using a full string, I use a 32-bit StringId integer type to represent these values. My preprocessing tool runs through the source code and replaces all instances of a macro in the form SID('some-named-identifier') with the 32-bit hash of the string inside that macro. During development, programmers and designers can use arbitrary strings as identifiers for whatever they need to be used for. At runtime, comparisons between StringIds are simple integer comparisons and since they are the hashed versions of the actual strings, there are no strings stored in the compiled binary that could be extracted.
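For illustration, one common choice of 32-bit string hash for this kind of identifier mapping is FNV-1a. Here is a minimal sketch of what a preprocessing tool might compute for each SID('...') occurrence, assuming FNV-1a rather than whatever hash the asker's tool actually uses (the function name is invented):

#include <stdint.h>

/* 32-bit FNV-1a: a common, simple hash for short identifier strings. */
uint32_t sid_hash(const char *s)
{
    uint32_t h = 2166136261u;   /* FNV offset basis */
    while (*s) {
        h ^= (uint8_t)*s++;
        h *= 16777619u;         /* FNV prime */
    }
    return h;
}

The tool would then replace SID('some-named-identifier') in the source with the literal value of sid_hash("some-named-identifier"), so only the 32-bit integer survives into the compiled binary.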
Additionally, when preprocessing the SID macros, I populate a MySQL database with the strings and their hashed values. This lets me do a reverse lookup at runtime in order to print the human-readable strings while debugging. It's a great system, and I'd love to get it working in Xcode as well!
Thanks in advance!

Related

How does a bytecode interpreter know what line a runtime error occurred on?

As of now, I am working on a language that compiles to bytecode and is then run by a VM. My question is: when a runtime error occurs, how does the VM know what line of the source code caused the error, given that all whitespace is removed during compilation? One idea is to store a separate array of integers mapping bytecode instructions to line numbers, but that sounds extremely memory-inefficient, especially when there are a lot of instructions.
Some forms of bytecode contain information about line numbers, method names, etc. which are included to provide better debugging information. In the JVM, for example, method bytecode contains a table that maps ranges of bytecode addresses to source line numbers. That’s a more efficient way of storing it than tagging each bytecode operation with a line number, since there are typically multiple operations per line. It does use extra space, though I wouldn’t classify it as extremely inefficient.
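As a rough sketch of such a table (not the JVM's actual format), the entries can be stored sorted by starting bytecode offset and looked up with a binary search; each entry covers all offsets up to the next entry's start:

#include <stddef.h>

/* One entry per source line: the first bytecode offset it covers. */
typedef struct {
    unsigned start_pc;   /* first bytecode offset for this line */
    unsigned line;       /* source line number */
} LineEntry;

/* Find the source line for a bytecode offset; returns 0 if unknown. */
unsigned line_for_pc(const LineEntry *table, size_t n, unsigned pc)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {                      /* binary search */
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].start_pc <= pc)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo ? table[lo - 1].line : 0;
}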
Absent this info, there really isn’t a way for the interpreter to report anything about the original program, since as you’ve noted all that information is otherwise discarded.
This is similar to how compiled executables handle debug info. With debug symbols included, the program has tables mapping code addresses to function names and line numbers. With symbols stripped out, you just have raw instructions and data and there’s no way to reference the original code.

Eclipse CLP: maximum number of constraints/variables

In the Eclipse CLP, how many constraints or variables can I define?
I am currently remodeling my scheduling problem - I need to replace a single alldifferent constraint with many atmost constraints. But since I introduced this change, my ecl script is not working. By "not working" I mean the Eclipse CLP - eclipse.exe or the TkEclipse GUI - just shuts down, without any error message, comment, or saying goodbye. Just nothing.
If I comment out some of the constraints, the script at least gets compiled.
Has anyone already run into this issue?
There is no specific limit on the number of variables or constraints.
But you are probably working with large, generated source files whose clauses have thousands of subgoals. Because ECLiPSe uses a recursive-descent parser, such files can cause an OS stack overflow, in particular on Windows. You could either increase the Windows stack limit, or break your generated code into smaller clauses and call them in conjunction.
Generally, however, generating textual source code isn't such a great idea: it must be created, written, read, parsed, compiled, and is then executed just once. Consider instead generating a pure data file that contains only things like arrays/lists of numbers, but no variables. You can then have a generic ECLiPSe program that reads these data and uses them to create variables and constraints, usually in several loops.
For a very simple example, compare https://eclipseclp.org/examples/transport1.pl.txt (where all the data is explicit in the flat model) with https://eclipseclp.org/examples/transport_arr.pl.txt, where the model is generic and all data comes from the data/3 fact at the end (this would correspond to the generated data file).

Mass Compile Cobol code in IBM i (aka OS/400, iSeries)

I am new to COBOL and AS/400 IBM iSeries world, and am interested in best practices used in this community.
How do people perform batch compilation of all COBOL members, or multiple members at once?
I see that AS/400 source files are organized in libraries, file objects, and members. How do you detect which file members are COBOL src code to be compiled? Can you get the member file type (which should be CBL in this case) via some qshell command?
Thanks,
Akku
Common PDM manual method:
Probably the simplest and most widely used method would be to use PDM (Program Development Manager) in a "green-screen" (5250-emulation) session. This allows you to manually select every program you wish to compile. It may not be quite the answer you were looking for, but it may be the most widely used, due to its simple approach and because it leaves decisions in the developer's hands.
People commonly use the Start PDM command STRPDM, which provides a menu where you can select an option to work with lists of Libraries, Objects (Files), or Members. (Personally, I prefer to use the corresponding commands directly, WRKLIBPDM, WRKOBJPDM, or WRKMBRPDM.) At each of these levels you can filter the list by pressing F17 (shift F5).
F18 (shift F6) allows you to set the option to Compile in Batch. This means that each individual compile will be submitted to a job queue, to compile in its own job. You can also specify which job description you would like to use, which will determine which job queue the jobs are placed on. Some job queues may be single threaded, while others might run multiple jobs at once. You can custom-define your own PDM options with F16.
If you chose to start at the library level, you can enter option 12 next to each library whose objects (source files) you wish to work with.
At the object level, you would want to view only objects of type *FILE and attribute 'PF-SRC' (or conceivably 'PF38-SRC'). You can then enter option 12 beside any source file whose members you wish to work with.
At the member level, you might want to filter to type *CBL* because (depending on how things have been done on your system) COBOL members could be CBL, CBLLE, SQLCBL, SQLCBLE, or even System/38 or /36 variants. Type option 14 (or a custom-defined option) next to each member you wish to compile. You can repeat an option down the list with F13 (shift F1).
This method uses manual selection, and does not automatically select ALL of your COBOL programs to be compiled. But it does allow you to submit large numbers of compiles at a time, and uses programmer discretion to determine which members to select and what options to use.
Many (if not most) developers on IBM i are generally not very familiar with qshell. Most of us write automation scripts in CL. A few renegades like myself may also use REXX, but sadly this is rare. It's not too often that we would want to re-compile all programs in the system. Generally we only compile programs that we are working with, or select only those affected by some file change.
Compiling everything might not be a simple problem. Remember some libraries or source files might simply be archival copies of old source, which you might not really want to compile, or that might not compile successfully anymore. You would want to distinguish which members are COBOL copybooks, rather than programs. With ILE, you would want to distinguish which members should be compiled as programs, modules, or service programs. You may need to compile modules before compiling programs that bind with them. Those modules might not necessarily have been written in COBOL, or COBOL modules might be bound into ILE programs in other languages, perhaps CL or RPG.
So how would a full system recompile be automated in a CL program? You could get a list of all source files on the system with DSPOBJD *ALL/*ALL *FILE OUTPUT(*FILE) OUTFILE( ___ ). The output file contains a file-attribute column to distinguish which objects are source files. Your CL program could read this, and for each source file, it could generate a file of detailed member information with DSPFD &lib/&file TYPE(*MBR) OUTPUT(*FILE) OUTFILE( ___ ). That file contains member type information, which could help you determine which members were COBOL. From there you could RTVOBJD to figure out whether it was a program, module, and/or service program.
You may also need to know the options for how individual programs, modules, or service programs were compiled. I often solve this by creating a source file, which I generally call BUILD, with a member for each object that needs special handling. This member could be CL, but I often use REXX. In fact I might be tempted to do the whole thing in REXX for its power as a dynamic interpreted language. But that's just me.

Are there any good reference implementations available for command line implementations for embedded systems?

I am aware that this is nothing new and has been done several times. But I am looking for some reference implementation (or even just reference design) as a "best practices guide". We have a real-time embedded environment and the idea is to be able to use a "debug shell" in order to invoke some commands. Example: "SomeDevice print reg xyz" will request the SomeDevice sub-system to print the value of the register named xyz.
I have a small set of routines that is essentially made up of 3 functions and a lookup table:
a function that gathers a command line - it's simple; there's no command line history or anything, just the ability to backspace or press escape to discard the whole thing. But if I thought fancier editing capabilities were needed, it wouldn't be too hard to add them here.
a function that parses a line of text argc/argv style (see Parse string into argv/argc for some ideas on this)
a function that takes the first arg on the parsed command line and looks it up in a table of commands & function pointers to determine which function to call for the command, so the command handlers just need to match the prototype:
int command_handler( int argc, char* argv[]);
Then that function is called with the appropriate argc/argv parameters.
Actually, the lookup table also has pointers to basic help text for each command, and if the command is followed by '-?' or '/?' that bit of help text is displayed. Also, if 'help' is used for a command, the command table is dumped (possibly only a subset if a parameter is passed to the 'help' command).
Sorry, I can't post the actual source - but it's pretty simple and straight forward to implement, and functional enough for pretty much all the command line handling needs I've had for embedded systems development.
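As a stand-in for that unpostable source, here is a minimal, hypothetical sketch of the table-driven dispatch described above; the command names, handlers, and help strings are invented:

#include <stdio.h>
#include <string.h>

/* Prototype every command handler must match. */
typedef int (*cmd_fn)(int argc, char *argv[]);

static int cmd_reboot(int argc, char *argv[]) { (void)argc; (void)argv; puts("rebooting..."); return 0; }
static int cmd_peek(int argc, char *argv[])   { (void)argc; (void)argv; puts("peek: not implemented"); return 0; }

/* Command table: name, handler, one line of help text. */
static const struct {
    const char *name;
    cmd_fn      fn;
    const char *help;
} commands[] = {
    { "reboot", cmd_reboot, "reboot      - restart the system" },
    { "peek",   cmd_peek,   "peek <addr> - read a memory word" },
};

/* Look up argv[0] in the table; returns -1 for unknown commands. */
static int dispatch(int argc, char *argv[])
{
    if (argc < 1)
        return -1;
    for (size_t i = 0; i < sizeof commands / sizeof commands[0]; i++) {
        if (strcmp(argv[0], commands[i].name) == 0) {
            if (argc > 1 && (strcmp(argv[1], "-?") == 0 ||
                             strcmp(argv[1], "/?") == 0)) {
                puts(commands[i].help);   /* built-in per-command help */
                return 0;
            }
            return commands[i].fn(argc, argv);
        }
    }
    return -1;
}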
You might bristle at this response, but many years ago we did something like this for a large-scale embedded telecom system using lex/yacc (nowadays I guess it would be flex/bison, this was literally 20 years ago).
Define your grammar, define ranges for parameters, etc... and then let lex/yacc generate the code.
There is a bit of a learning curve, as opposed to rolling a one-off custom implementation, but then you can extend the grammar, add new commands & parameters, change ranges, etc... extremely quickly.
You could check out libcli. It emulates Cisco's CLI and apparently also includes a telnet server. That might be more than you are looking for, but it might still be useful as a reference.
If your needs are quite basic, a debug menu which accepts simple keystrokes, rather than a command shell, is one way of doing this.
For registers and RAM, you could have a sub-menu which just does a memory dump on demand.
Likewise, to enable or disable individual features, you can control them via keystrokes from the main menu or sub-menus.
One way of implementing this is via a simple state machine. Each screen has a corresponding state which waits for a keystroke, and then changes state and/or updates the screen as required.
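A minimal sketch of such a keystroke-driven state machine, with invented states and keys, might look like this:

#include <stdio.h>

typedef enum { MENU_MAIN, MENU_REGS, MENU_RAM } menu_state;

/* Process one keystroke for the current screen; returns the next state. */
static menu_state handle_key(menu_state s, int key)
{
    switch (s) {
    case MENU_MAIN:
        if (key == 'r') { puts("-- register view --"); return MENU_REGS; }
        if (key == 'm') { puts("-- memory dump --");   return MENU_RAM;  }
        break;
    case MENU_REGS:
    case MENU_RAM:
        if (key == 'q') { puts("-- main menu --");     return MENU_MAIN; }
        /* other keys: dump another register / next page of RAM, etc. */
        break;
    }
    return s;   /* unrecognized key: stay on the current screen */
}

The main loop then just reads one key at a time (from a UART, say) and calls handle_key repeatedly.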
vxWorks includes a command shell that embeds the symbol table and implements a C expression evaluator, so that you can call functions, evaluate expressions, and access global symbols at runtime. The expression evaluator supports integer and string constants.
When I worked on a project that migrated from vxWorks to embOS, I implemented the same functionality. Embedding the symbol table required a bit of gymnastics, since it does not exist until after linking. I used a post-build step to parse the output of the GNU nm tool to create a symbol table as a separate load module.
In an earlier version I did not embed the symbol table at all, but rather created a host-shell program that ran on the development host, where the symbol table resided, and communicated with a debug stub on the target that could perform function calls to arbitrary addresses and read/write arbitrary memory. This approach is better suited to memory-constrained devices, but you have to be careful that the symbol table you are using and the code on the target are from the same build. Again, that was an idea I borrowed from vxWorks, which supports both the target- and host-based shell with the same functionality. For the host shell, vxWorks checksums the code to ensure the symbol table matches; in my case it was a manual (and error-prone) process, which is why I implemented the embedded symbol table.
Although initially I only implemented memory read/write and function call capability, I later added an expression evaluator based on the algorithm (but not the code) described here. Then after that I added simple scripting capabilities in the form of if-else, while, and procedure-call constructs (using a very simple non-C syntax). So if you wanted new functionality or a new test, you could either write a new function or create a script (if performance was not an issue); the functions were thus rather like 'built-ins' of the scripting language.
To perform the arbitrary function calls, I used a function pointer typedef that takes an arbitrarily large number (24) of arguments. Using the symbol table, you find the function address, cast it to the function pointer type, and pass it the real arguments plus enough dummy arguments to make up the expected number, thus creating a suitable (if wasteful) stack frame.
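A hedged sketch of that trick, with the same 24-argument count the answer mentions (the typedef and helper names are mine, and this relies on calling-convention behavior that is common but not guaranteed by the C standard):

#include <stdint.h>

/* "Wide" signature used to call any function found in the symbol table. */
typedef int (*wide_fn)(int, int, int, int, int, int, int, int,
                       int, int, int, int, int, int, int, int,
                       int, int, int, int, int, int, int, int);

/* Call the function at 'addr' with 'argc' real arguments, padding the
   rest with zeros. Extra integer arguments are harmless on most C ABIs. */
int call_by_address(uintptr_t addr, int argc, const int *args)
{
    int a[24] = { 0 };
    for (int i = 0; i < argc && i < 24; i++)
        a[i] = args[i];
    wide_fn f = (wide_fn)addr;    /* address looked up in the symbol table */
    return f(a[0],  a[1],  a[2],  a[3],  a[4],  a[5],  a[6],  a[7],
             a[8],  a[9],  a[10], a[11], a[12], a[13], a[14], a[15],
             a[16], a[17], a[18], a[19], a[20], a[21], a[22], a[23]);
}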
On other systems I have implemented a Forth threaded interpreter, which is a very simple language to implement, though its syntax is perhaps less than user-friendly. You could equally embed an existing solution such as Lua or Ch.
For a small, lightweight option you could use Forth. It's easy to get going (Forth kernels are SMALL); look at figForth, LINa and GnuForth.
Disclaimer: I don't Forth, but OpenBoot and the PCI bus do, and I've used them and they work really well.
Alternative UIs
Deploy a web server on your embedded device instead. Even a serial link will work, with SLIP, and the UI can be reasonably complex (or even serve up a JAR and get really, really complex).
If you really need a CLI, you can then point at a link and get a telnet session.
One alternative is to use a very simple binary protocol to transfer the data you need, and then make a user interface on the PC, using e.g. Python or whatever is your favourite development tool.
The advantage is that it minimises the code in the embedded device, and shifts as much of it as possible to the PC side. That's good because:
It uses up less embedded code space—much of the code is on the PC instead.
In many cases it's easier to develop a given functionality on the PC, with the PC's greater tools and resources.
It gives you more interface options. You can use just a command line interface if you want. Or, you could go for a GUI, with graphs, data logging, whatever fancy stuff you might want.
It gives you flexibility. Embedded code is harder to upgrade than PC code. You can change and improve your PC-based tool whenever you want, without having to make any changes to the embedded device.
If you want to look at variables: if your PC tool is able to read the ELF file generated by the linker, it can find a variable's location from the symbol table. Even better, it can read the DWARF debug data and know the variable's type as well. Then all you need is a "read-memory" protocol message on the embedded device to get the data, and the PC does the decoding and displaying.
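To give a sense of how little the embedded side needs, here is a hypothetical handler for such a "read-memory" message; the request layout and the send_bytes transport function are invented for this sketch:

#include <stdint.h>

/* Assumed transport function provided elsewhere in the firmware. */
extern void send_bytes(const uint8_t *data, uint16_t len);

/* Hypothetical request: 4-byte little-endian address, 2-byte length.
   Reply: the raw bytes at that address; the PC side does all decoding. */
void handle_read_memory(const uint8_t *req)
{
    uint32_t addr = (uint32_t)req[0]         | ((uint32_t)req[1] << 8) |
                    ((uint32_t)req[2] << 16) | ((uint32_t)req[3] << 24);
    uint16_t len  = (uint16_t)(req[4] | (req[5] << 8));
    send_bytes((const uint8_t *)(uintptr_t)addr, len);
}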

Accurately assessing VB6 limitations

As antiquated and painful as it is - I work at a company that continues to actively use VB6 for a large project. In fact, 18 months ago we came up against the 32k identifier limit.
Not willing to give up on the large code base and rewrite everything in .NET, we broke our application into a main executable and several supporting DLL files. This week we ran into the 32k limit again.
The problem we have is that no tool we can find will tell us how many unique identifiers our source is using. We have no accurate way to gauge how our efforts are reducing the number of identifiers or how close we are to the limit before we reach it.
Does anyone know of a tool that will scan the source for a project and return some accurate metrics and statistics?
OK. The Project Metrics Viewer, which is part of the Project Analyzer tool from Aivosto, will do exactly what you want; its metrics list includes numbers of variables etc.
Metrics List
The company I work for also has a large VB6 project that encountered the identifier limit. I developed a way to accurately count the number of identifiers remaining, and this has been incorporated into our build process for this project.
After trying several tools without success, I finally realized that the VB6 IDE itself knows exactly how many identifiers it has remaining. In fact, the VB6 IDE throws an "out of memory" error when you add one variable past its limit.
Taking advantage of this fact, I wrote a VB6 Add-In project that first compiles the currently loaded project in the IDE, then adds uniquely named variables to the project until it throws an error. When an error is raised, it records the number of identifiers added before the error as the number of identifiers remaining.
This number is stored in a file in a location known to our automated build process, which reads it and reports it to the development team. When it gets below a value we feel comfortable with, we schedule some refactoring time and move more code out of this project into DLL projects. We have been using this in production for several years now, and it has proven to be a reliable process.
To directly answer the question, using an Add-In is the only way I know to accurately measure the number of remaining identifiers. While I cannot share the Add-In code our project is using, I can say there is not much code involved, and it did not take long to develop.
Microsoft has a decent guide for how to create an Add-In, which can get you started:
https://support.microsoft.com/en-us/kb/189468
Here are some important details specific to counting identifiers:
The VB6 IDE will not consistently throw an error when out of identifiers until the current loaded project has been compiled. Our Add-In programmatically does this before adding identifiers to guarantee an accurate count. If the project cannot be compiled, then an accurate count cannot be obtained.
There are 32,500 identifiers available to a new, empty VB6 project.
Only unique identifier names count. Two local variables with the same name in two different routines only count as one identifier.
CodeSmart by AxTools is very good.
Cheat - create an unused class with #### unique variables in it. Use Excel or something to generate the alphabetical unique variable names. Remove the class from the project when you hit the limit, or comment out blocks of 100 unique variables.
I'd rather lean on the compiler (which defines how many variables are too many) than on some 3rd party tool anyway.
(oh crud, sorry to necro - didn't notice the dates)
You could get this from a tool that extracted identifiers from VB6 code. Then all you'd have to do is sort the list, eliminate duplicates, and measure the list size. We have a source code search engine that breaks up source code into language tokens ("lexes"), with some of those tokens being exactly those identifiers. That would contain exactly the data you want.
But maybe there's another way to solve your problem: find out which variable names occur rarely and replace them with a set of standard names (e.g., "temp"). So what you really want is a count of occurrences of each variable name, so you can sort for "small numbers of references". The same lexer data can provide this information.
Then all you need is a tool to rename low-occurrence identifiers to something from the standard set. We offer obfuscators that replace one name by another that could probably do this.
[Oct 2014 update] I just had a long conversation with somebody with this problem. It turns out there's a pretty conceptual answer on which to base a tool, and that is called register coloring, which allocates a fixed number of registers to an arbitrary number of operands. This works by computing an "interference graph" over operands; two operands that don't "interfere" can be assigned the same register. One could use this to allocate 2^16 available variable names to an arbitrary number of identifiers, if the interference graph isn't bad enough. My guess is that it is not. YMMV, and somebody still has to build such a tool, likely needing a VB6 parser and machinery to compute such a graph. [Check out my bio]
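A sketch of the greedy form of that idea, assuming the interference graph has already been computed (two identifiers "interfere" if they must keep distinct names, e.g. they are visible in the same scope); each identifier gets the lowest-numbered standard name not used by any interfering neighbor:

#include <stdbool.h>
#include <stdlib.h>

/* adj[i][j] is true when identifiers i and j interfere.
   Returns color[i] = index of the standard name assigned to identifier i. */
int *color_identifiers(bool **adj, int n)
{
    int  *color = malloc(n * sizeof *color);
    bool *used  = malloc(n * sizeof *used);
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++)
            used[j] = false;
        for (int j = 0; j < i; j++)    /* names taken by earlier neighbors */
            if (adj[i][j])
                used[color[j]] = true;
        int c = 0;
        while (used[c])                /* lowest free name index */
            c++;
        color[i] = c;
    }
    free(used);
    return color;
}

If the highest index assigned stays under the number of standard names available (e.g. 2^16), the renaming fits within the limit.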
It seems that Compuware's DevPartner had that kind of code analysis. I don't know if the current version still supports Visual Basic 6.0. (But at least there's a 14-day trial available)
