In the Eclipse CLP, how many constraints or variables can I define?
I am currently remodeling my scheduling problem - I need to replace a single alldifferent constraint with many atmost constraints. But since I introduced this change, my ecl script has stopped working. By "not working" I mean that ECLiPSe - eclipse.exe or the TkECLiPSe GUI - just shuts down, without any error message, comment or saying goodbye. Just nothing.
If I comment out some of the constraints, the script at least compiles.
Has anyone else run into this issue?
There is no specific limit on the number of variables or constraints.
But you seem to be working with large, generated source files whose clauses have thousands of subgoals. Because ECLiPSe uses a recursive-descent parser, such files can cause an OS stack overflow, particularly on Windows. You could either increase the Windows stack limit, or break your generated code into smaller clauses and call these in conjunction.
Generally, however, generating textual source code isn't such a great idea: it must be created, written, read, parsed, compiled, and is then executed just once. Consider instead generating a pure data file that contains only things like arrays/lists of numbers, but no variables. You can then have a generic ECLiPSe program that reads these data and uses them to create variables and constraints, usually in several loops.
For a very simple example, compare https://eclipseclp.org/examples/transport1.pl.txt (where all the data is explicit in the flat model) with
https://eclipseclp.org/examples/transport_arr.pl.txt where the model is generic and all data comes from the data/3 fact at the end (this would correspond to the generated data file).
As of now, I am working on a language that compiles to bytecode, which is then run by a VM. My question is: when a runtime error occurs, how does the VM know which line of the source code caused the error, given that all whitespace is removed during compilation? One thing I can think of is to store a separate array of integers alongside the bytecode, holding the corresponding line numbers, but that sounds extremely memory-inefficient, especially when there are a lot of instructions.
Some forms of bytecode contain information about line numbers, method names, etc. which are included to provide better debugging information. In the JVM, for example, method bytecode contains a table that maps ranges of bytecode addresses to source line numbers. That’s a more efficient way of storing it than tagging each bytecode operation with a line number, since there are typically multiple operations per line. It does use extra space, though I wouldn’t classify it as extremely inefficient.
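For illustration, a table of that shape (ranges of bytecode offsets mapped to source lines) could be modeled roughly like this. This is a minimal sketch in C, not the JVM's actual data structures; the names LineEntry and line_for_offset are made up:

#include <stddef.h>

/* One entry per run of bytecode offsets that came from the same source line.
 * (Hypothetical layout; the JVM's LineNumberTable attribute is conceptually
 * similar: each entry stores a start offset and a line number, and the range
 * runs until the next entry's start offset.) */
typedef struct {
    unsigned start_offset;   /* first bytecode offset covered by this entry */
    unsigned line_number;    /* source line that produced the code at that offset */
} LineEntry;

/* Find the source line for a given bytecode offset: take the last entry whose
 * start_offset is <= the offset. Returns 0 if no entry covers it. */
static unsigned line_for_offset(const LineEntry *table, size_t count, unsigned offset)
{
    unsigned line = 0;
    for (size_t i = 0; i < count; ++i) {
        if (table[i].start_offset <= offset)
            line = table[i].line_number;
        else
            break;   /* entries are sorted by start_offset */
    }
    return line;
}

At error time the VM looks up the faulting offset in the table; because one entry covers a whole run of instructions, this stays much smaller than storing a line number per instruction.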
Absent this info, there really isn’t a way for the interpreter to report anything about the original program, since as you’ve noted all that information is otherwise discarded.
This is similar to how compiled executables handle debug info. With debug symbols included, the program has tables mapping code addresses to function names and line numbers. With symbols stripped out, you just have raw instructions and data and there’s no way to reference the original code.
I'm trying to find a way to figure out in IDA which exports are data exports and which are real function exports.
For example, let's have a look at Microsoft's msftedit.dll's export entries:
While CreateTextServices is a real exported function:
IID_IRichEditOle is a data export, and IDA fails to realize that, interpreting the data as code:
Does anyone know a reliable way to distinguish the two? Help will be much appreciated.
Thanks in advance.
There is no perfectly reliable way to do this for every export.
Each export only specifies an offset within the executable file -- logically, it could be treated as code or as data by any other code that references it.
As you mentioned, you could come up with heuristics to detect the type of the export in almost all of the cases, but it would be easy to come up with counterexamples that do not work for any given heuristic. Take, for instance, the rule you proposed:
The exported entry will be considered a valid exported function if there is a ret instruction in the function, and there are more than <min> valid instructions, and IDA recognizes the function's calling convention.
False negatives: You might have a function that uses tail call optimization and ends with jmp instructions rather than ret instructions. Any short function would also fail. And there are several ways that IDA can be confused into not treating the code as a function.
False positives: There could be a string in memory followed closely by a C3 or C2 byte (the ret opcodes), like db 'BACKGAMMON0',0,0C3h -- this could logically disassemble as a valid 11-instruction function with a ret and no arguments.
The lines are blurred even further when you consider that an export could be logically treated as both code and data: Imagine that a byte sequence at an export is copied into dynamically allocated memory -- potentially even in another process -- where it is later executed as code.
Perhaps a reasonable suggestion would be to just trust IDA and treat the export as code if IDA thinks it's code. A large part of IDA's functionality is automatically guessing the logical types of data, and it's normally pretty good at it. As you've shown, sometimes it's wrong. But you can't get 100% accuracy anyway. The best you can do is balance between false negatives and false positives.
Proof of this problem's undecidability:
Whether or not an export will be executed as code is undecidable. Whether or not an export will be read as data is also undecidable. Since we cannot guarantee that either is true, distinguishing between seemingly ambiguous cases is impossible.
Proof: Assume that we have an oracle A(P,I,E) which returns 1 if program P (including all of its dependencies) executes (or reads from) export E (from any DLL loaded in the course of P's execution) with "input" (external state) I. Otherwise, it returns 0.
Let us construct a minimal program Z(P,I,E) which executes (or reads from) export E (the DLL for which is loaded into the address space) if and only if A(P,I,E) returns 0.
Now consider the result of Z(Z,I,E):
If Z(Z,I,E) executes (or reads from) export E, then A(Z,I,E) would return 1. But Z(Z,I,E) is defined to not access export E unless A(Z,I,E) returns 0. This is a contradiction.
If Z(Z,I,E) does not execute (or read from) export E, then A(Z,I,E) would return 0. But Z(Z,I,E) is defined such that it will access export E when A(Z,I,E) returns 0. This is a contradiction.
Therefore, our initial assumption that oracle A(P,I,E) exists is proven false.
But you can do better through instrumentation...
Depending on the exact problem you're trying to solve, you may be able to determine which exports are valid functions at runtime.
For example, you could write an application which debugs the program you wish to analyze and places guard pages on each of the pages that contain exports you wish to hook. This means that whenever such a page is accessed (executed, read, or written), an exception is raised and the debugger program gains control.
The debugger could check the program context to see what type of access was made and whether it has anything to do with the export. If the access is an attempt to execute an export, it could perform some hooking functionality before returning control to the program. Otherwise, it could just return control to the program.
In either case, the PAGE_GUARD modifier is lifted after each exception, so you'd need to put it back each time.
Unsurprisingly, this would make execution of your program very slow, as any R/W/X access to any of the pages containing an export causes an expensive context switch -- this would likely include the execution of most instructions that are a part of your exported functions, along with several others that have nothing to do with them.
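To make the idea concrete, here is a rough sketch (Windows only, not production code) of the debug loop described above. It assumes hProcess and export_page were obtained elsewhere (e.g. after CreateProcess with DEBUG_ONLY_THIS_PROCESS), and it omits error handling and the single-step needed to re-arm the guard correctly:

#include <windows.h>
#include <stdio.h>

static void arm_guard(HANDLE hProcess, LPVOID export_page)
{
    DWORD old;
    /* Add PAGE_GUARD on top of the page's protection. */
    VirtualProtectEx(hProcess, export_page, 1, PAGE_EXECUTE_READ | PAGE_GUARD, &old);
}

static void debug_loop(HANDLE hProcess, LPVOID export_page)
{
    DEBUG_EVENT ev;
    arm_guard(hProcess, export_page);

    while (WaitForDebugEvent(&ev, INFINITE)) {
        DWORD cont = DBG_EXCEPTION_NOT_HANDLED;

        if (ev.dwDebugEventCode == EXCEPTION_DEBUG_EVENT &&
            ev.u.Exception.ExceptionRecord.ExceptionCode == STATUS_GUARD_PAGE_VIOLATION) {
            /* Inspect the faulting address and the thread context here to decide
             * whether the access was an execute of the export or a data read/write. */
            printf("guard page hit at %p\n",
                   ev.u.Exception.ExceptionRecord.ExceptionAddress);

            /* PAGE_GUARD is cleared after the first hit, so it must be re-armed;
             * doing that safely requires single-stepping over the faulting
             * instruction first, which is omitted in this sketch. */
            cont = DBG_CONTINUE;
        }

        if (ev.dwDebugEventCode == EXIT_PROCESS_DEBUG_EVENT)
            break;

        ContinueDebugEvent(ev.dwProcessId, ev.dwThreadId, cont);
    }
}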
You could take a similar approach with other instrumentation tools, such as Pin.
Note that you may not gain information about the usage of every export through instrumentation. This is because you may need to determine what input/external state is required to cause the program to access each export in order to learn if it is used as code or as data (if at all).
Also note that both execute and read (or even write) accesses could potentially occur on the same exports.
Relevant background info
I've built a little software that can be customized via a config file. The config file is parsed and translated into a nested environment structure (e.g. .HIVE$db = an environment, .HIVE$db$user = "Horst", .HIVE$db$pw = "my password", .HIVE$regex$date = some regex for dates etc.)
I've built routines that can handle those nested environments (e.g. look up the value "db/user" or "regex/date", change it, etc.). The thing is that the initial parsing of the config files takes a long time and results in quite big objects (actually three to four of them, between 4 and 16 MB). So I thought "No problem, let's just cache them by saving the object(s) to .Rdata files". This works, but "loading" the cached objects makes my Rterm process go through the roof with respect to RAM consumption (over 1 GB!!), and I still don't really understand why (this doesn't happen when I "compute" the objects anew, but that's exactly what I'm trying to avoid since it takes too long).
I already thought about maybe serializing it, but I haven't tested it as I would need to refactor my code a bit. Plus I'm not sure if it would affect the "loading back into R" part in just the same way as loading .Rdata files.
Question
Can anyone tell me why loading a previously computed object has such effects on memory consumption of my Rterm process (compared to computing it in every new process I start) and how best to avoid this?
If desired, I will also try to come up with an example, but it's a bit tricky to reproduce my exact scenario. Yet I'll try.
It's likely because the environments you are creating are carrying around their ancestors. If you don't need the ancestor information, then set the parents of such environments to emptyenv() (or just don't use environments if you don't need them).
Also note that formulas (and, of course, functions) have environments so watch out for those too.
If it's not reproducible by others, it will be hard to answer. However, I do something quite similar to what you're doing, yet I use JSON files to store all of my values. Rather than parse the text, I use RJSONIO to convert everything to a list, and getting stuff from a list is very easy. (You could, if you want, convert to a hash, but it's nice to have layers of nested parameters.)
See this answer for an example of how I've done this kind of thing. If that works out for you, then you can forego the expensive translation step and the memory ballooning.
(Taking a stab at the original question...) I wonder if your issue is that you are using an environment rather than a list. Saving environments might be tricky in some contexts. Saving lists is no problem. Try using a list or try converting to/from an environment. You can use the as.list() and as.environment() functions for this.
I am aware that this is nothing new and has been done several times. But I am looking for some reference implementation (or even just reference design) as a "best practices guide". We have a real-time embedded environment and the idea is to be able to use a "debug shell" in order to invoke some commands. Example: "SomeDevice print reg xyz" will request the SomeDevice sub-system to print the value of the register named xyz.
I have a small set of routines that is essentially made up of 3 functions and a lookup table:
a function that gathers a command line - it's simple; there's no command line history or anything, just the ability to backspace or press escape to discard the whole thing. But if I thought fancier editing capabilities were needed, it wouldn't be too hard to add them here.
a function that parses a line of text argc/argv style (see Parse string into argv/argc for some ideas on this)
a function that takes the first arg on the parsed command line and looks it up in a table of commands & function pointers to determine which function to call for the command, so the command handlers just need to match the prototype:
int command_handler( int argc, char* argv[]);
Then that function is called with the appropriate argc/argv parameters.
Actually, the lookup table also has pointers to basic help text for each command, and if the command is followed by '-?' or '/?' that bit of help text is displayed. Also, if 'help' is used for a command, the command table is dumped (possibly only a subset, if a parameter is passed to the 'help' command).
Sorry, I can't post the actual source - but it's pretty simple and straightforward to implement, and functional enough for pretty much all the command line handling needs I've had for embedded systems development.
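For illustration only, a stripped-down sketch of this kind of command table and dispatcher (not the poster's code; the names cmd_entry, cmd_reg and dispatch are made up) might look like this:

#include <stdio.h>
#include <string.h>

typedef int (*command_handler)(int argc, char *argv[]);

typedef struct {
    const char      *name;     /* first token on the command line */
    command_handler  handler;  /* function called with the parsed args */
    const char      *help;     /* short help text, shown for '-?' or 'help' */
} cmd_entry;

static int cmd_reg(int argc, char *argv[])
{
    /* e.g. "reg print xyz" -- real code would talk to the device here */
    printf("%s: %d arg(s)\n", argv[0], argc - 1);
    return 0;
}

static const cmd_entry commands[] = {
    { "reg", cmd_reg, "reg print <name> - print a register" },
    /* ... more commands ... */
};

/* Look up argv[0] in the table and run the matching handler (assumes argc >= 1). */
static int dispatch(int argc, char *argv[])
{
    for (size_t i = 0; i < sizeof commands / sizeof commands[0]; ++i) {
        if (strcmp(argv[0], commands[i].name) == 0) {
            if (argc > 1 && (strcmp(argv[1], "-?") == 0 || strcmp(argv[1], "/?") == 0)) {
                printf("%s\n", commands[i].help);
                return 0;
            }
            return commands[i].handler(argc, argv);
        }
    }
    printf("unknown command '%s'\n", argv[0]);
    return -1;
}

Adding a command is then just a matter of writing a handler with that prototype and adding a row to the table.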
You might bristle at this response, but many years ago we did something like this for a large-scale embedded telecom system using lex/yacc (nowadays I guess it would be flex/bison, this was literally 20 years ago).
Define your grammar, define ranges for parameters, etc... and then let lex/yacc generate the code.
There is a bit of a learning curve, as opposed to rolling a 1-off custom implementation, but then you can extend the grammar, add new commands & parameters, change ranges, etc... extremely quickly.
You could check out libcli. It emulates Cisco's CLI and apparently also includes a telnet server. That might be more than you are looking for, but it might still be useful as a reference.
If your needs are quite basic, a debug menu which accepts simple keystrokes, rather than a command shell, is one way of doing this.
For registers and RAM, you could have a sub-menu which just does a memory dump on demand.
Likewise, to enable or disable individual features, you can control them via keystrokes from the main menu or sub-menus.
One way of implementing this is via a simple state machine. Each screen has a corresponding state which waits for a keystroke, and then changes state and/or updates the screen as required.
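A sketch of one way such a keystroke-driven state machine could be structured (my own illustration, not from the answer; get_key(), dump_memory(), toggle_feature() and draw_screen() are assumed platform hooks, not real APIs):

typedef enum { MENU_MAIN, MENU_MEMDUMP, MENU_FEATURES } menu_state;

/* Platform hooks assumed to exist elsewhere: */
extern char get_key(void);                  /* blocking keystroke read */
extern void dump_memory(char key);          /* dump a memory region chosen by key */
extern void toggle_feature(char key);       /* enable/disable a feature chosen by key */
extern void draw_screen(menu_state state);  /* redraw the current screen */

void menu_task(void)
{
    menu_state state = MENU_MAIN;
    for (;;) {
        char key = get_key();
        switch (state) {
        case MENU_MAIN:
            if (key == 'm') state = MENU_MEMDUMP;
            else if (key == 'f') state = MENU_FEATURES;
            break;
        case MENU_MEMDUMP:
            if (key == 'q') state = MENU_MAIN;   /* back to main menu */
            else dump_memory(key);
            break;
        case MENU_FEATURES:
            if (key == 'q') state = MENU_MAIN;
            else toggle_feature(key);
            break;
        }
        draw_screen(state);
    }
}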
vxWorks includes a command shell, that embeds the symbol table and implements a C expression evaluator so that you can call functions, evaluate expressions, and access global symbols at runtime. The expression evaluator supports integer and string constants.
When I worked on a project that migrated from vxWorks to embOS, I implemented the same functionality. Embedding the symbol table required a bit of gymnastics, since it does not exist until after linking. I used a post-build step to parse the output of the GNU nm tool to create a symbol table as a separate load module.

In an earlier version I did not embed the symbol table at all, but rather created a host-shell program that ran on the development host, where the symbol table resided, and communicated with a debug stub on the target that could perform function calls to arbitrary addresses and read/write arbitrary memory. This approach is better suited to memory-constrained devices, but you have to be careful that the symbol table you are using and the code on the target are from the same build. Again, that was an idea I borrowed from vxWorks, which supports both the target- and host-based shell with the same functionality. For the host shell, vxWorks checksums the code to ensure the symbol table matches; in my case it was a manual (and error-prone) process, which is why I implemented the embedded symbol table.
Although initially I only implemented memory read/write and function call capability, I later added an expression evaluator based on the algorithm (but not the code) described here. After that I added simple scripting capabilities in the form of if-else, while, and procedure-call constructs (using a very simple non-C syntax). So if you wanted new functionality or a new test, you could either write a new function or create a script (if performance was not an issue), the functions acting rather like 'built-ins' of the scripting language.
To perform the arbitrary function calls, I used a function pointer typedef that takes an arbitrarily large number of arguments (24). Using the symbol table, you find the function address, cast it to the function pointer type, and pass it the real arguments plus enough dummy arguments to make up the expected number, thus creating a suitable (if wasteful) stack frame.
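A rough illustration of that trick (not the original code, and shown with 8 dummy arguments instead of 24 to keep it short). It relies on a caller-cleans-up calling convention such as cdecl, where passing extra arguments is harmless if wasteful, and on the platform tolerating the cast from a data pointer to a function pointer:

#include <stdint.h>

typedef int (*generic_fn)(intptr_t, intptr_t, intptr_t, intptr_t,
                          intptr_t, intptr_t, intptr_t, intptr_t);

/* addr comes from the embedded symbol table; args holds the real arguments,
 * padded with zeros up to the fixed count. */
static int call_by_address(void *addr, intptr_t args[8])
{
    generic_fn fn = (generic_fn)addr;   /* implementation-defined, but fine here */
    return fn(args[0], args[1], args[2], args[3],
              args[4], args[5], args[6], args[7]);
}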
On other systems I have implemented a Forth threaded interpreter, which is a very simple language to implement, but has a less than user friendly syntax perhaps. You could equally embed an existing solution such as Lua or Ch.
For a small, lightweight thing you could use Forth. It's easy to get going (Forth kernels are small).
Look at figForth, LINa and GnuForth.
Disclaimer: I don't Forth, but OpenBoot and the PCI bus do, and I've used them and they work really well.
Alternative UIs
Deploy a web server on your embedded device instead. Even a serial link will work with SLIP, and the UI can be reasonably complex (or even serve up a JAR and get really, really complex).
If you really need a CLI, you can point at a link and get a telnet session.
One alternative is to use a very simple binary protocol to transfer the data you need, and then make a user interface on the PC, using e.g. Python or whatever is your favourite development tool.
The advantage is that it minimises the code in the embedded device, and shifts as much of it as possible to the PC side. That's good because:
It uses up less embedded code space—much of the code is on the PC instead.
In many cases it's easier to develop a given functionality on the PC, with the PC's greater tools and resources.
It gives you more interface options. You can use just a command line interface if you want. Or, you could go for a GUI, with graphs, data logging, whatever fancy stuff you might want.
It gives you flexibility. Embedded code is harder to upgrade than PC code. You can change and improve your PC-based tool whenever you want, without having to make any changes to the embedded device.
If you want to look at variables: if your PC tool is able to read the ELF file generated by the linker, then it can find a variable's location from the symbol table. Even better, read the DWARF debug data and know the variable's type as well. Then all you need is a "read-memory" protocol message on the embedded device to get the data, and the PC does the decoding and displaying.
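For what it's worth, a minimal sketch of what such a "read-memory" request and its device-side handler could look like (an assumed layout for illustration, not any particular protocol; send_reply() stands in for whatever transport is used):

#include <stdint.h>

typedef struct {
    uint32_t address;   /* target address to read, taken from the ELF/DWARF info */
    uint16_t length;    /* number of bytes requested */
} read_mem_request;

/* Transport hook assumed to exist elsewhere. */
extern void send_reply(const uint8_t *data, uint16_t length);

void handle_read_mem(const read_mem_request *req)
{
    /* On a flat-memory microcontroller the address can be used directly;
     * real code would range-check it first. */
    send_reply((const uint8_t *)(uintptr_t)req->address, req->length);
}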
As antiquated and painful as it is - I work at a company that continues to actively use VB6 for a large project. In fact, 18 months ago we came up against the 32k identifier limit.
Not willing to give up on the large code base and rewrite everything in .NET we broke our application into a main executable and several supporting DLL files. This week we ran into the 32k limit again.
The problem we have is that no tool we can find will tell us how many unique identifiers our source is using. We have no accurate way to gauge how our efforts are reducing the number of identifiers or how close we are to the limit before we reach it.
Does anyone know of a tool that will scan the source for a project and return some accurate metrics and statistics?
OK. The Project Metrics Viewer which is part of the Project Analyzer tool from Aivosto will do exactly what you want. I've included a screenshot and also the link to the metrics list which includes numbers of variables etc.
Metrics List
(source: aivosto.com)
The company I work for also has a large VB6 project that encountered the identifier limit. I developed a way to accurately count the number of identifiers remaining, and this has been incorporated into our build process for this project.
After trying several tools without success, I finally realized that the VB6 IDE itself knows exactly how many identifiers it has remaining. In fact, the VB6 IDE throws an "out of memory" error when you add one variable past its limit.
Taking advantage of this fact, I wrote a VB6 Add-In project that first compiles the currently loaded project in the IDE, then adds uniquely named variables to the project until it throws an error. When an error is raised, it records the number of identifiers added before the error as the number of identifiers remaining.
This number is stored in a file in a location known to our automated build process, which then reads the number and reports it to the development team. When it gets below a value we feel comfortable with, we schedule some refactoring time and move more code out of this project into DLL projects. We have been using this in production for several years now, and it has proven to be a reliable process.
To directly answer the question, using an Add-In is the only way I know to accurately measure the number of remaining identifiers. While I cannot share the Add-In code our project is using, I can say there is not much code involved, and it did not take long to develop.
Microsoft has a decent guide for how to create an Add-In, which can get you started:
https://support.microsoft.com/en-us/kb/189468
Here are some important details specific to counting identifiers:
The VB6 IDE will not consistently throw an error when out of identifiers until the current loaded project has been compiled. Our Add-In programmatically does this before adding identifiers to guarantee an accurate count. If the project cannot be compiled, then an accurate count cannot be obtained.
There are 32,500 identifiers available to a new, empty VB6 project.
Only unique identifier names count. Two local variables with the same name in two different routines only count as one identifier.
CodeSmart by AxTools is very good.
(source: axtools.com)
Cheat - create an unused class with #### unique variables in it. Use Excel or something to generate the alphabetical unique variable names. Remove the class from the project when you hit the limit, or comment out blocks of 100 unique variables.
I'd rather lean on the compiler (which defines how many variables are too many) than on some 3rd party tool anyway.
(oh crud, sorry to necro - didn't notice the dates)
You could get this from a tool that extracted identifiers from VB6 code. Then all you'd have to do is sort the list, eliminate duplicates, and measure the list size. We have a source code search engine that breaks up source code into language tokens ("lexes"), with some of those tokens being exactly those identifiers. That would contain exactly the data you want.
But maybe there's another way to solve your problem: find out which variable names occur rarely and replace them with a set of standard names (e.g., "temp"). So what you really want is a count of the occurrences of each variable name, so you can sort for "small numbers of references". The same lexer data can provide this information.
Then all you need is a tool to rename low-occurrence identifiers to something from the standard set. We offer obfuscators that replace one name by another that could probably do this.
[Oct 2014 update]. Just had a long conversation with somebody with this problem. It turns out there's a pretty conceptual answer on which to base a tool, and that is called register coloring, which allocates a fixed number of registers to an arbitrary number of operands. This works by computing an "interference graph" over operands; two operands that don't "interfere" can be assigned the same register. One could use that to allocate 2^16 available variable names to an arbitrary number of identifiers, if the interference graph isn't too bad. My guess is that it is not. YMMV, and somebody still has to build such a tool, likely needing a VB6 parser and machinery to compute such a graph. [Check out my bio].
It seems that Compuware's DevPartner had that kind of code analysis. I don't know if the current version still supports Visual Basic 6.0. (But at least there's a 14-day trial available)