Application To Help Translate XSL Transformations - debugging

There must be an application out there that does the following, but I am not even sure how to google for it.
The dilemma is that we have to trace defects back through the transformation, and doing so requires seeing how certain fields in the output XML were generated by the XSL. The hard part is spending hours in the XSL and XML just trying to figure out where a field was even generated. Debugging is difficult too when you are working with multiple XSL transformations and edits, because you still need to find the primary keys that reach the specific scenario for that transform.
Is there some software program that could take an XSL and do one of two things:
1. Feed it an output field name and have it generate a list of all the possible criteria that could produce that field, so you can figure out which of the dozen candidates in the XSL matches your case, or
2. Somehow convert the XSL into a more readable if/then style format (rather like using Javadoc to produce readable documentation).

You don't say what tools you are currently using. Tools like oXygen and Stylus Studio have some quite sophisticated XSLT debugging capability. OXygen's output mapping tool (see http://www.oxygenxml.com/xml_editor/working_with_xslt_debugger.html#xsltOutputMapping) sounds very like the thing you are asking for.
Using schema-aware stylesheets can greatly ease debugging. At least in the Saxon implementation, if you declare in your stylesheet that you want the output to be valid against a particular schema, then if it isn't, Saxon will tell you what instruction in the stylesheet caused invalid output to be generated. Sometimes it will show you the error at stylesheet compile time, before you even supply a source document. This capability is greatly under-used, in my view. More details here: http://www.stylusstudio.com/schema_aware.html
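For illustration, here is a minimal sketch of what that looks like in an XSLT 2.0 stylesheet, assuming a schema-aware processor such as Saxon-EE; the schema file name output.xsd is just a placeholder for whatever describes your required result format:

    <xsl:stylesheet version="2.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <!-- Import the schema that the output must conform to
           (output.xsd is a placeholder name). -->
      <xsl:import-schema schema-location="output.xsd"/>

      <xsl:template match="/">
        <!-- validation="strict" asks the processor to validate the result
             tree as it is built; invalid output is reported against the
             stylesheet instruction that produced it. -->
        <xsl:result-document validation="strict">
          <xsl:apply-templates/>
        </xsl:result-document>
      </xsl:template>

    </xsl:stylesheet>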

It's an interesting question. Your suggestions are also interesting but would be quite challenging to develop; I know of no COTS or FOSS solution to either, but here are some thoughts:
Your first possibility is essentially data-flow analysis from compiler design. I know of no tools that expose this to the user, but you might ask XSLT processor developers whether they have ever considered externalizing such an analysis in a manner that would be useful to XSLT developers.
Your second possibility is essentially a documentation generator against XSLT source. I have actually helped to complete one for a client in financial services in the past (see Document XSLT Automatically), but the solution was the property of the client and was never released publicly as far as I know. It would be possible to recreate such a meta-transformation between XSLT input and HTML or DocBook output, but it's not simple to do in the most general case.
There's another approach that you might consider:
Tighten up your interface definition. In your comment, you mention uncertainty as to whether a problem's source is bad data from the sender or a bug in the XSLT. You would be well-served by a stricter interface definition. You could implement this via better typing in XSD, addition of xsd:assertion statements if XSD 1.1 is an option, or adding a Schematron-based interface checking level, which would allow you the full power of XPath-based assertions over the input. Having such an improved and more specific interface definition would help both you and your clients know what should and should not be sent into your systems.
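As a rough illustration of that last option (not tied to your actual message format), a Schematron interface check might look something like the following; the element names order, total, line, and amount are made up for the example:

    <schema xmlns="http://purl.oclc.org/dsdl/schematron">
      <pattern>
        <rule context="order">
          <!-- Hypothetical business rules expressed as XPath assertions -->
          <assert test="@id">Every order must carry an id attribute.</assert>
          <assert test="number(total) >= sum(line/amount)">
            An order's total must not be less than the sum of its line amounts.
          </assert>
        </rule>
      </pattern>
    </schema>

Run against each incoming document before the transform, a report from checks like these tells you immediately whether the sender or the stylesheet is at fault.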

Related

What's the best way to include static information in my app?

I am currently working on my first small desktop menubar app (macOS, Swift 3). It needs to access
a) A list of words (Think word dictionary, 1k-5k words, per supported language)
b) A list of structured data (Think simple structs, ~500)
I am currently pondering whether to build these in code - maybe a factory class per language. Or include them in my app as JSON and parse at runtime. Or maybe build an SQLite file and read that during runtime, although that approach would be harder to diff in source control ...
As I am new to the platform, I was wondering whether there might be a better way that I am not aware of, or maybe performance considerations that render one of the mentioned approaches useless.
As usual, thanks in advance folks!
The solutions you list would all work for this task. However, for this kind of task I think the best option is Core Data, where you can store the list of words as well as the structured data, and also model relationships between them if you need to.

Structured debug log

I am writing a complex application (a compiler analysis). To debug it I need to examine the application's execution trace to determine how its values and data structures evolve during its execution. It is quite common for me to generate megabytes of text output for a single run and sifting my way through all that is very labor-intensive. To help me manage these logs I've written my own library that formats them in HTML and makes it easy to color text from different code regions and indent code in called functions. An example of the output is here.
My question is: is there any better solution than my own home-spun library? I need some way to emit debug logs that may include arbitrary text and images, to structure them visually, and, if possible, to index them so that I can easily find the region of the output I'm most interested in. Is there anything like this out there?
Although you didn't mention which language you're using, I'd like to suggest the Apache log4X family: http://logging.apache.org/
It offers customizable detail levels as well as named (tag-driven) loggers, and its GUI tool, Chainsaw, can be combined with the good old grep approach, so you see only what you're interested in at the moment.
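For example, a minimal log4j 1.x XML configuration along these lines lets you turn the detail level up or down per component without touching the code; the logger name compiler.analysis.dataflow is just a placeholder:

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
    <log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">
      <appender name="FILE" class="org.apache.log4j.FileAppender">
        <param name="File" value="trace.log"/>
        <layout class="org.apache.log4j.PatternLayout">
          <!-- timestamp, level, logger name, message -->
          <param name="ConversionPattern" value="%d %-5p [%c] %m%n"/>
        </layout>
      </appender>
      <!-- Crank one component up to DEBUG while everything else stays at INFO -->
      <logger name="compiler.analysis.dataflow">
        <level value="DEBUG"/>
      </logger>
      <root>
        <level value="INFO"/>
        <appender-ref ref="FILE"/>
      </root>
    </log4j:configuration>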
Colorizing, search and filtering using an expression syntax is available in the latest developer snapshot of Chainsaw. The expression syntax also supports regular expressions (using the 'like' keyword).
Chainsaw can parse any regular text log file, not just log files generated by log4j.
The latest developer snapshot of Chainsaw is available here:
http://people.apache.org/~sdeboy
The 'File > Load Chainsaw configuration' menu item is where you define the format and location of the log file you want to process, and the expression syntax can be found in the tutorial, available from the Help menu.
Feel free to email the log4j users list if you have additional questions.
I created a framework that might help you, https://github.com/pablito900/VisualLogs

Localization best practices

I'm starting to modify my app, which uses all hardcoded strings for errors, GUI, etc. I'm considering these two approaches, but let me know if there is an even better way:
- Put all strings in resource (.rc) files.
- Define all strings in a file, once for each language, and use a preprocessor define to decide which strings get compiled in.
Which of these two approaches is generally preferred?
Put all the strings in resource files. Once you've done that, there are several good translation packages available. One useful thing these packages do is allow you to have the translation done by somebody who doesn't program.
Remember, also, that internationalization (i18n) is a large subject, and there are a lot of things to consider. It isn't just a matter of translating strings. Do a web search on it, at the very least. You might want to read a book on it: I used International Programming for Windows by Schmitt as a guide. It's an old book from Microsoft Press, and I had to get it through a used book service; most of the more modern material seems to be about internationalizing .NET apps.
Without knowing more about your project (what sort of software, who the intended audience is, what sort of organization you have, what sort of budget, why you're interested in internationalization, etc.), this is about the most I can tell you.
Generally you see locale specific resource files containing strings referenced by key. Compiling different versions for different locales is a very rigid solution and will be a maintenance nightmare. Using resource files also allows the user to have fallback locales.
There's another approach: just put the strings in the source with something like tr(" ") and use one of the tools that strips them out and converts them.
It works with any toolkit/GUI library.
You can mark text to be converted and text not to change (such as protocol strings or DB keys).
It makes the source easier to read and search, instead of having to look up what IDS_MESSAGE34 means.
One problem with resource files, at least with Windows/MFC, is that you can't use the stringtable in dialogs. So you have some text in the stringtable and some in the dialog section, which you have to deal with separately.

What simple syntax can be used for rich text?

In an application I want a simple text input, enriched with some marks to indicate formatting or semantic labeling. I want the syntax to be as easy as possible, and I want to be able to include self-defined labels.
Example:
[bold]Stackoverflow[/bold] is a [tag]good[/tag] resource for programmers.
Tables would be needed too.
HTML/XML and LaTeX are powerful enough for this, but too complicated. Wiki syntax seems simple, but it uses a different symbol for each kind of markup, has unclear quoting rules, and every wiki seems to have its own dialect. For tables and similar structures, wiki syntax becomes very complicated.
Is there a language/syntax that matches my needs, or that can be slightly adapted to do so? Or do I have to invent something myself? In that case, do you have suggestions?
Definitely do NOT invent your own. There are plenty of simple markup languages already, and users HATE learning new ones. Trust me on this!
I would suggest using one of the following:
Textile
Markdown
BBCode
Make your decision based on your userbase, as well as what tools and parsers are available in your chosen language. For my site, we went with Textile, but I've found that BBCode tends to be the language that most people already know. However, this will vary with different user demographics.
StackOverflow, along with several other sites, uses Markdown. I think it will give you the best balance between features and simplicity.
Let me add ReStructuredText to the list.
An additional benefit of using it is the availability of the ReStructuredText to Anything service, which makes it extremely easy to create HTML or PDF versions of the document.
As already pointed out, there are a lot of lightweight markup languages (many are listed in the Wikipedia article), so there should be no need to create your own.

Suggestions for human editable data file format/parsing library

For example, right now I have a roll-my-own solution that uses data files that include blocks like:
PlayerCharacter Fighter
    Hitpoints 25
    Strength 10
    StartPosition (0, 0, 0)
    Art
        Model BigBuffGuy
        Footprint LargeFootprint
    end
    InventoryItem Sword
    InventoryItem Shield
    InventoryItem HealthPotion
end
human editable (with minimal junk characters, ideally)
resilient to errors (fewest 'wow, I can't parse anything useful anymore' style failures that lose all the data in the rest of the file) - but still able to identify and report them, of course. In my example, the only complete failure case is a missing 'end'.
nested structure style data
array/list style data
customizable foundation types
fast
Are there any well known solutions that meet/exceed these requirements?
YAML is a good solution and very close to what you have. Search for it.
I second the YAML suggestion. It's extremely easy to edit, very forgiving of mistakes and widely supported (especially among the dynamic languages).
I'd say the most common choices are:
JSON (official site) - very flexible, though the punctuation can take a bit of getting used to
INI - super simple to use, but a bit limited in data types
XML - pretty flexible and common, but often way too verbose (see the sketch below for what your example block might look like)
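To make the verbosity trade-off concrete, here is roughly what the fighter block from the question might look like as XML; the element and attribute names are just one possible mapping, not a prescribed format:

    <PlayerCharacter name="Fighter">
      <Hitpoints>25</Hitpoints>
      <Strength>10</Strength>
      <StartPosition x="0" y="0" z="0"/>
      <Art>
        <Model>BigBuffGuy</Model>
        <Footprint>LargeFootprint</Footprint>
      </Art>
      <InventoryItem>Sword</InventoryItem>
      <InventoryItem>Shield</InventoryItem>
      <InventoryItem>HealthPotion</InventoryItem>
    </PlayerCharacter>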
You could try JSON, available at: http://www.json.org/
It was initially designed for JavaScript and web usage, but it's pretty clean and supported in many languages.
Lua was designed as a programming language whose syntax also lets you use it easily as a data-description language, so you can include data files as if they were code. Many computer games, such as World of Warcraft, use it for scripting because of its speed and ease of use. It was originally designed and maintained for applications in the energy industry, so it has a serious background.
Scheme, with its S-expressions, is also a very nice but different-looking syntax for data. Finally, you've got XML, which has the benefit that most entry-level developers already know it. You can also roll your own well-defined and efficient parser with a nice development suite such as ANTLR.
I would suggest JSON.
Just as readable/editable as YAML
If you happen to use it for the Web, it can be eval()'ed into JavaScript objects
Probably as cross-language as YAML
