Accurately accessing VB6 limitations - vb6

As antiquated and painful as it is - I work at a company that continues to actively use VB6 for a large project. In fact, 18 months ago we came up against the 32k identifier limit.
Not willing to give up on the large code base and rewrite everything in .NET we broke our application into a main executable and several supporting DLL files. This week we ran into the 32k limit again.
The problem we have is that no tool we can find will tell us how many unique identifiers our source is using. We have no accurate way to gauge how our efforts are reducing the number of identifiers or how close we are to the limit before we reach it.
Does anyone know of a tool that will scan the source for a project and return some accurate metrics and statistics?

OK. The Project Metrics Viewer which is part of the Project Analyzer tool from Aivosto will do exactly what you want. I've included a screenshot and also the link to the metrics list which includes numbers of variables etc.
Metrics List
(source: aivosto.com)

The company I work for also has a large VB6 project that encountered the identifier limit. I developed a way to accurately count the number of identifiers remaining, and this has been incorporated into our build process for this project.
After trying several tools without success, I finally realized that the VB6 IDE itself knows exactly how many identifiers it has remaining. In fact, the VB6 IDE throws an "out of memory" error when you add one variable past its limit.
Taking advantage of this fact, I wrote a VB6 Add-In project that first compiles the currently loaded project in the IDE, then adds uniquely named variables to the project until it throws an error. When an error is raised, it records the number of identifiers added before the error as the number of identifiers remaining.
This number is stored in file in a location known to our automated build process, which then reads this number and reports it to the development team. When it gets below a value we feel comfortable with, we schedule some refactoring time and move more code out of this project into DLL projects. We have been using this in production for several years now, and has proven to be a reliable process.
To directly answer the question, using an Add-In is the only way I know to accurately measure the number of remaining identifiers. While I cannot share the Add-In code our project is using, I can say there is not much code involved, and it did not take long to develop.
Microsoft has a decent guide for how to create an Add-In, which can get you started:
https://support.microsoft.com/en-us/kb/189468
Here are some important details specific to counting identifiers:
The VB6 IDE will not consistently throw an error when out of identifiers until the current loaded project has been compiled. Our Add-In programmatically does this before adding identifiers to guarantee an accurate count. If the project cannot be compiled, then an accurate count cannot be obtained.
There are 32,500 identifiers available to a new, empty VB6 project.
Only unique identifier names count. Two local variables with the same name in two different routines only count as one identifier.

CodeSmart by AxTools is very good.
(source: axtools.com)

Cheat - create an unused class with #### unique variables in it. Use Excel or something to generate the alphabetical unique variable names. Remove the class from the project when you hit the limit, or comment out blocks of 100 unique variables..
I'd rather lean on the compiler (which defines how many variables are too many) than on some 3rd party tool anyway.
(oh crud, sorry to necro - didn't notice the dates)

You could get this from a tool that extracted identifiers from VB6 code. Then all you'd have to do is sort the list, eliminate duplicates, and measure the list size. We have a source code search engine that breaks up source code into language tokens ("lexes"), with some of those tokens being exactly those identifiers. That would contain exactly the data you want.
But maybe there's another way to solve your problem: find out which variable names which occur rarely and replace them by a set of standard names (e.g., "temp"). So what you really want is a count of the number of each variable name so you can sort for "small numbers of references". The same lexer data can provide this information.
Then all you need is a tool to rename low-occurrence identifiers to something from the standard set. We offer obfuscators that replace one name by another that could probably do this.
[Oct 2014 update]. Just had a long conversation with somebody with this problem. It turns out there's a pretty conceptual answer on which to base a tool, and that is called register coloring, which allocates a fixed number of registers to an arbitrary number of operands. This works by computing an "interference graph" over operands; and two operands that don't "interfere" can be assigned the same register. One could use that to allocate 2^16 available variable names names to an arbitrary number of identifiers, if the interference graph isn't bad enough. My guess is that it is not. YMMV, and somebody still has to build such a tool, needing likely a VB6 parser and machinery to compute such a graph. [Check out my bio].

It seems that Compuware's DevPartner had that kind of code analysis. I don't know if the current version still supports Visual Basic 6.0. (But at least there's a 14-day trial available)

Related

Eclipse CLP: maximum number of constraints/variables

In the Eclipse CLP, how many constraints or variables can I define?
I am currently remodeling my scheduling problem - I need to replace a single alldifferent constraint with many atmost constraints. But since I've introduced this change, my ecl script is not working. By "not working" I mean the Eclipse CLP - eclipse.exe or the TkEclipse GUI just shuts down. Without any error message,comment or saying goodbye. Just nothing.
If I try to comment-out some constraints, the script at least gets compiled.
Has someone already bothered with this issue?
There is no specific limit on the number or variables or constraints.
But you were working with large, generated source files where clauses had thousands of subgoals. Because ECLiPSe uses a recursive descent parser, such files can cause an OS stack overflow, in particular on Windows. You could either increase the Windows stack limit, or you could break your generated code into smaller clauses, and call these in conjunction.
Generally, however, generating textual source code isn't such a great idea: it must be created, written, read, parsed, compiled, and is then executed just once. Consider instead generating a pure data file that contains only things like arrays/lists of numbers, but no variables. You can then have a generic ECLiPSe program that reads these data and uses them to create variables and constraints, usually in several loops.
For a very simple example, compare https://eclipseclp.org/examples/transport1.pl.txt (where all the data is explicit in the flat model) with
https://eclipseclp.org/examples/transport_arr.pl.txt where the model is generic and all data comes from the data/3 fact at the end (this would correspond to the generated data file).

Is there a process for munging data from many different formats in RapidMiner?

I'm trying to help my team streamline a data ingestion process that is taking up a substantial amount of time. We receive data in multiple formats and with attributes arranged differently. Is there a way using RapidMiner to create a process that:
Processes files on a schedule that are dropped into a folder (this
one I think I know but I'd love tips on this as scheduled processes
are new to me)
Automatically identifies input filetype and routes to the correct operator ("Read CSV" for example)
Recognizes a relatively small number of attributes and arranges them accordingly. In some cases, attributes are named the same way as our ingestion format and in others they are not (phone vs phone # vs Phone for example)
The attributes we process mostly consist of name, id, phone, email, address. Also, in some cases names are split first/last and in some they are full name.
I recognize that munging files for such simple attributes shouldn't be that hard but the number of files we receive and lack of order makes it very difficult to streamline a process without a bit of automation. I'm also going to move to a standardized receiving format but for a number of reasons that's on the horizon and not an immediate solution.
I appreciate any tips or guidance you can share.
Your question is relative broad, so unfortunately I can't give you complete answer. But here are some ideas on how I would tackle the points you mentioned:
For a full process scheduling RapidMiner Server is what you are
looking for. In that case you can either define a schedule (e.g.,
check regularly for new files) or even define a web service to
trigger the process.
For selecting the correct operator depending on file type, you could
use a combination of "Loop Files" and macro extraction to get the
correct type and the use either "Branch" or "Select Subprocess" for
switching to different input routes.
The "Select Attributes" operator has some very powerful options to
select specific subsets only. In your example I would go for a
regular expression akin to [pP]hone.* to get the different spelling
variants. Also very helpful in that case would be the "Reorder
Attributes" operator and "Rename by Replacing" to create a common
naming schema.
A general tip when building more complex process pipelines is to organize your different tasks in sub-processes and use the "Execute Process" operator. This makes everything much more readable and maintainable. Also a good error handling strategy is important to handle unforeseen data formats.
For more elaborate answers and tips from many adavanced RapidMiner users, I also highly recommend the RapidMiner community.
I hope this gives a good starting point for your project.

Coverity analysis: ignore 3rd party libraries

In a large C++ project Coverity analysis reports issues in files that we won't be fixing e.g. Boost libraries, STL headers, some 3rd party libraries etc.
Ideally there would be a mechanism to completely ignore these and not to increment the total count for such issues.
In Coverity Connect (v8.1) we've set up Components with file path regexp and that nicely filters the files in question when browsing but the total number of issues does not drop down. Two questions related to this:
is there a way to drop the number of total issues for files we don't care about? e.g. after such an issue has already been captured
if new code we introduce includes one of the offending boost/STL/etc headers, will this clock up the total issue counter? (clearly, that would be less than desirable)
Mandatory disclaimer first: Your customers won’t care that bugs in your code came from a third party. That said, the main answer at the link Yannis mentioned is generally the correct one: “use a component filter.” If it’s not working correctly for you, double-check your configuration. I found it quite robust, even with a negative look-ahead regex with over a hundred disjuncts.
Once such issue is found, you can mark it as false positive or ignore it all the way. You have to do this only once. In future analyze, when this issue is found again, it will keep this status. And no, if you use this include to other files too, the total issue counter won't go higher as long as the issue is in the same file.
Check this:
Can Synopsys Static Analysis (Coverity) automatically ignore issues in third-party or noncritical code?

Making an application in Visual Basic to handle Dialogue in Morrowind?

I want to make a program for a very catered, specific purpose, to aid me in making a large set of quest mods to the videogame Elder Scrolls III: Morrowind. I’m attempting to do this through either excel or Visual Basic, and here I’ve provided a little summary of how dialogue works in the game’s normal creation program and then what I want to create outside of it and improve on.
How Morrowind Dialogue works?
For those of you who may be familiar with the game, you’ll remember that the talking to NPC’s will bring up a set of text, and this text is their dialogue. There are different “topics” that if an NPC has dialogue set for, the player can see the topic and click on it, bringing up a new wall of text, and this is generally how dialogue works in the entire game on the player’s end.
In creating a Morrowind Mod, the way dialogue really works in the “Construction Set” (the program used to create and edit the game) is that a database contains every entry of text, and this these entries have conditions set to them which limit which NPCs can say a given entry of dialogue. So for instance, a topic like “latest rumors”, will have lots of entries in it with lots of different NPCs having something to say about it. The topic itself is a condition of sorts with potentially dozens of entries attached to it, and conditions set to specific entries can also be applied. Conditions can include checking to see if the NPC is in a given city, if the in-game time is night or day, if the player is at a certain numbered stage/index of a given quest line and much, much more. This system is what makes all quests possible and the game dynamic.
What I want to create:
I am beginning a rather large mod project that includes many entries of dialogue, many new and old topics, and many quest and quest stages. I could list all the reasons here but essentially my problem is that the Construction Set has many limitations in terms of organization that make it difficult to make a large mod’s dialogue in. I would be better off to design, set the topics for, and edit all of my dialogue entries outside of the Construction Set program and implement them when I’m confident that the writing and quests are finished.
Essentially if this is too complicated I could just write all the quests and dialogue in Microsoft Word, but optimistically I'd like to do something more dynamic and helpful to me, as a writer, and be able to use real variables to store and set Journal/Quest Indexes, filter dialogue by Quest or by NPC, and easily edit dialogue and quests without getting lost in the normal game’s thousands of lines of other dialogue.
*I can't post more than two links here, but I posted on reddit and there I have a gallery showing how the Construction Set works and what I have made in Visual Studio so far:
https://www.reddit.com/r/learnprogramming/comments/4oap6w/making_an_application_in_visual_basic_to_handle/
So, my intention is to make a program in Visual Studio using Visual Basic or Python that leaves me with a program that lets me write, organize, and set the text for dialogue and filter based on conditions.
This likely requires creating a database file for the program in Visual Studio and being able to create variables in runtime, for the program. That is because I want the user of the program to be able to add new dialogue topics, new journal/quests, and all of these things will have conditions with values associated with them.
Any help, advice, and direction is appreciated. I am relearning Visual Studio (I took two courses in it) and I am unfortunately very new to excel and databases in general.
You are correct in that a database of some kind would be needed. However, you could approach this several different ways depending upon your comfort level, money, portability requirements, etc...
One way to do it would be to use XML to store your data. It has the advantage of being extremely portable and transformable. Since this is likely a program where only one person would be directly accessing the data at any given time, it might be your best bet.
Another option is to use MS Access if you have office. This gives you a workable, albeit fairly basic, relational database. This would probably be a better choice if you have 2 or 3 people that could possibly be working in it.
A third option would be a full DBMS. MySQL is free and you could install that to your local machine, or to a remote server. Installed to a remote server would give you the option of allowing many people to connect to it and modify data transactionally. However, this would be overkill if it is only a one or two person system.
Circling back around to XML... That will most likely be your best bet. It is simple and integrates perfectly with .Net applications. It can be imported/transformed to any data-store later once you are finished (or multiple times as you progress). Interfacing with XML via .Net allows you to work with it like a database within your code, so if you design your data layer properly up front, you could even migrate to a full database later if the project expands drastically. The biggest downside to XML would be that it isn't relational in the way that a regular DBMS is, and it is not inherently transactional. You do not have atomic updates, so if you have several people modifying things at once you could lose data if it is overwritten.
You could get around that to an extent by writing a more advanced data layer to interface with the XML files, but if only one person is making changes locally, and then the data file is, say, uploaded to a remote datastore later, the only thing to keep in mind would be coordinating when and who can modify that file. Mostly logistics stuff at that point.

Run custom preprocessing step for source files in Xcode build

As part of my build process (in Visual Studio), I use a custom MSBuild task that takes all the input source files, copies them to a secondary location, runs a preprocessing tool over them, and then gives the copies back to MSBuild to continue the build process with. I'm now working on a project for iOS, and I need to be able to do the same thing. It's been a very long time since I've worked with Xcode, so I'm pretty rusty on how I can set up the build process to work the same way I just described.
Specifically, here's the preprocessing that I'm doing:
As with many games and engines, there are often a lot of named resources, events, script symbols, object states, etc. that need to be human readable when represented in source code but for which having to do a full string comparison at runtime would be much too costly. Instead of using a full string, I use a 32-bit StringId integer type to represent these values. My preprocessing tool runs through the source code and replaces all instances of a macro in the form SID('some-named-identifier') with the 32-bit hash of the string inside that macro. During development, programmers and designers can use arbitrary strings as identifiers for whatever they need to be used for. At runtime, comparisons between StringIds are simple integer comparisons and since they are the hashed versions of the actual strings, there are no strings stored in the compiled binary that could be extracted.
Additionally, when preprocessing the SID macros, I populate a MySQL database with the strings and their hashed values. This lets me do a reverse lookup at runtime in order to print the human-readable strings while debugging. It's a great system, and I'd love to get it working in Xcode as well!
Thanks in advance!

Resources