Escaping special characters in User Input in IzPack Installer - installation

I have an IzPack installer that takes in a lot of User Inputs and substitutes them in an XML file. This XML file is actually the configuration file for my application.
There is a major problem that I have hit and I can't move on from it.
In the input fields (in the installer) the user can enter any text, including special characters like & # % ' etc. These special characters mess up my XML file, as they are not allowed in XML syntax and need to be escaped; for example, & would need to be written as &amp;.
So far I have been asking the user to do this, i.e. escape the special characters themselves, but that's not working either.
Is there a way to have this done automatically? I really need a solution fast.
I am using IzPack V 4.1

You should use a proper XML API (SAX, DOM) to generate the XML file; this will apply the correct escaping automatically. It may look more complicated at first, but it guarantees that a well-formed, syntactically correct file is written.
Searching for JAXP should give you a proper starting point.
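To illustrate the principle (not IzPack-specific, and in Ruby rather than Java simply because it fits in a few lines; the element names and value below are made up): any real XML API escapes values as it serializes them, and a DOM built through JAXP and written with a Transformer behaves the same way.

require 'rexml/document'

password = %q{p&ss<word>'"}        # hypothetical user input containing XML-special characters

doc  = REXML::Document.new
root = doc.add_element('configuration')            # made-up element names
root.add_element('dbPassword').text = password
doc.write($stdout)
# prints: <configuration><dbPassword>p&amp;ss&lt;word&gt;'"</dbPassword></configuration>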

Related

Insert a hyperlink to another file (Word) into Visual Studio code file

I am currently developing some functionality that implements some complex calculations. The calculations themselves are explained and defined in Word documents.
What I would like to do is create a hyperlink in each code file that references the associated Word document, just as you can in Word itself. Ideally this link would be placed in or near the XML comments for each class.
The files reside on a network share and there are no permissions to worry about.
So far I have the following but it always comes up with a file not found error.
file:///\\165.195.209.3\engdisk1\My Tool\Calculations\111-07 MyToolCalcOne.docx
I've worked out the problem is due to the spaces in the folder and filenames.
My Tool
111-07 MyToolCalcOne.docx
I tried replacing the spaces with %20, thus:
file:///\\165.195.209.3\engdisk1\My%20Tool\Calculations\111-07%20MyToolCalcOne.docx
but with no success.
So the question is; what can I use in place of the spaces?
Or, is there a better way?
One way that works beautifully is to write your own URL handler. It's absolutely trivial to do, but so very powerful and useful.
A registry key can be set to make the OS execute a program of your choice when the registered URL is launched, with the URL text being passed in as a command-line argument. It just takes a few trivial lines of code to parse the URL in any way you see fit in order to locate and launch the documentation.
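For illustration only, a hedged sketch of the registration step in Ruby, using the standard Win32::Registry library; the protocol name and handler path are made-up examples, and the key layout follows the usual URL-protocol convention:

require 'win32/registry'

# Register a hypothetical "mydocs:" protocol. Writing under HKEY_CLASSES_ROOT
# needs administrator rights; registering per user under
# HKEY_CURRENT_USER\Software\Classes works the same way.
Win32::Registry::HKEY_CLASSES_ROOT.create('mydocs') do |reg|
  reg[''] = 'URL:mydocs Protocol'   # the key's default value
  reg['URL Protocol'] = ''
end

Win32::Registry::HKEY_CLASSES_ROOT.create('mydocs\shell\open\command') do |reg|
  # The handler receives the full URL as %1 and decides how to find and open the document.
  reg[''] = '"C:\Tools\OpenDoc.exe" "%1"'
end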
The advantages of this:
You can use a much more compact and readable form, e.g. mydocs://MyToolCalcOne.docx
A simplified format means no trouble trying to encode tricky file paths
Your program can search anywhere you like for the file, making the document storage totally portable and relocatable (e.g. you could move your docs into source control or onto a website and just tweak your URL handler to locate the files)
Your URL is unique, so you can differentiate files, web URLs, and documentation URLs
You can register many URLs, so can use different ones for specs, designs, API documentation, etc.
You have complete control over how the document is presented (does it launch Word, Internet Explorer, or a custom viewer to display the docs, for example?)
I would advise against using spaces in filenames and URLs - spaces have never worked properly under Windows, and always cause problems (or require ugliness like %20) sooner or later. The easiest and cleanest solution is simply to remove the spaces or replace them with something like underscores, dashes or periods.

In Ruby, how to automatically convert non-supported characters in text-processing?

(Using Ruby 1.8)
I only have a brief understanding of encoding and such... but what I want to know is: in any given script handling any given text file, is there some universal library or call I need to make to turn non-standard characters into their nearest printable equivalent? I realize there's no "all-in-one" fix, but this is for an English (U.S. gov't) text file, so I'm wondering if there's something that mitigates what must be a relatively common issue in English text formatting.
For example, in a text file, I have an entry like this:
0-8­23
That hyphen is just literally a hyphen as I've typed it out. In the file though, it's something that looks like a hyphen (an n-dash?) but when copy and pasting it...for example, into this browser text box, it doesn't show up.
Printing it out via a Ruby script gets this:
08�23
How do I get my script to resolve it into a dash, or at least something other than a gremlin?
It's very common to run into hyphen-like characters and dashes, especially in the output of word-processors. Converting them isn't too hard if you know what the byte is that represents the character, but gets to be a pain when you get a document with several different ones. It gets worse as you throw other accented characters into the mix.
Ruby 1.8 doesn't support multibyte and Unicode character sets as well as 1.9+, but you can work around that somewhat by using the Iconv library.
Iconv lets you convert between various character sets, such as US-ASCII, ISO-8859-1 and WIN-1252. It's smarter than a regex, because it knows how to convert accented characters to similar-looking characters, or to ignore them if nothing similar exists, allowing your transliteration to degrade gracefully.
I have some example code in an answer to a related question. Also read James Grey's article linked in the answer. It explains the problem and ways to fix it, ending up with recommending Iconv too.
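For example, a minimal sketch on Ruby 1.8 (the exact transliteration result depends on the platform's iconv implementation):

require 'iconv'

# "0-8" followed by a UTF-8 en dash (bytes E2 80 93) and "23"
text = "0-8\xe2\x80\x9323"

# //TRANSLIT substitutes a close ASCII equivalent where one exists;
# //IGNORE drops anything that still cannot be represented instead of raising.
converter = Iconv.new('US-ASCII//TRANSLIT//IGNORE', 'UTF-8')
puts converter.iconv(text)   # typically "0-8-23"; some iconv builds emit "?" instead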
You could whitelist with gsub:
string.gsub(/[^a-zA-Z0-9]/, '')
Without knowing more information, I can't build the perfect regex for you, but the general idea is to replace anything that's not what you're expecting (anything that isn't a letter, a number, or a symbol you expect).
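For instance, a blunt variant of that idea aimed at the dash problem above (the byte range relies on Ruby 1.8 treating strings as bytes):

line = "0-8\xe2\x80\x9323"              # the same en-dash example as above
puts line.gsub(/[^\x20-\x7e]+/, '-')    # => "0-8-23": each run of non-printable-ASCII bytes becomes one hyphen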

Ruby library for manipulating XML with minimal diffs?

I have an XML file (actually a Visual C# project file) that I want to manipulate using a Ruby script. I want to read the XML into memory, do some work on them that includes changing some attributes and some text (fixing up some path references), and then write the XML file back out. This isn't so hard.
The hard part is, I want the file I write to look the same as the file I read in, except where I made changes. If the input file used double quotes, I want the output to use double quotes. If the input had a space before />, I want the output to do the same. Basically, I want the output to be the same as the input, except where I explicitly made changes (which, in my case, will only be to attribute values, or to the text content of an element).
I want minimal diffs because this project file is checked into version control -- and because the next time I make a change in Visual Studio, it's going to rewrite it in its preferred format anyway. I want to avoid checking in a bunch of meaningless diffs that will then be changed back again in the near future. I also want to avoid having to open the project in Visual Studio, make a change, and save, before I can commit my Ruby script's changes. I want my Ruby script to just make its changes, nothing more.
I originally just parsed the file with regexes, but ran into cases where I really needed an XML library because I needed to know more about child elements. So I switched to REXML. But it makes the following undesirable changes to my formatting:
It changes all the attributes from double quotes to single quotes.
It escapes all the apostrophes inside attribute values (changing them to &apos;).
It removes the space before />.
It sorts each element's attributes alphabetically, rather than preserving the original order.
I'm working around this by doing a bunch of gsub calls on REXML's output, but is there a Ruby XML-manipulation library that's a better fit for "minimal diff" scenarios?
You can build your own SAX parser (using Nokogiri, for example; it's very easy and I recommend using it) to parse your XML file, change some data in it, and write the processed XML back out with your own customized, built-from-scratch XML generator. The bad news is that you have to build a tiny XML library and generator routine in this case, so it is not a trivial task.
Another way: don't build the SAX parser, but write an XML generator. Parse XML with your favourite library, change what you need to change and generate anything you want. You just need to recursively walk through all nodes in your document and output them within your conventions.
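A hedged sketch of that second approach: parse with Nokogiri and re-emit with a hand-rolled generator that keeps double quotes, the original attribute order and a space before "/>". It ignores the XML declaration, comments, CDATA and processing instructions, and re-emits namespace declarations before the attributes rather than in their original position, so treat it as a starting point; the project file name is made up.

require 'nokogiri'

def escape(text)
  text.gsub('&', '&amp;').gsub('<', '&lt;').gsub('>', '&gt;').gsub('"', '&quot;')
end

def emit(node, out)
  if node.text?
    out << escape(node.content)
  elsif node.element?
    ns    = node.namespace_definitions.map { |d|
              d.prefix ? " xmlns:#{d.prefix}=\"#{d.href}\"" : " xmlns=\"#{d.href}\""
            }.join
    attrs = node.attribute_nodes.map { |a| " #{a.name}=\"#{escape(a.value)}\"" }.join
    if node.children.empty?
      out << "<#{node.name}#{ns}#{attrs} />"          # keep the space before />
    else
      out << "<#{node.name}#{ns}#{attrs}>"
      node.children.each { |child| emit(child, out) }
      out << "</#{node.name}>"
    end
  end
end

doc = Nokogiri::XML(File.read('MyProject.csproj'))   # hypothetical project file
# ... tweak attribute values or element text on doc here ...
File.open('MyProject.csproj', 'w') { |f| emit(doc.root, f) }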

Are there any invalid linux filenames?

If I wanted to create a string which is guaranteed not to represent a filename, I could put one of the following characters in it on Windows:
\ / : * ? | < >
e.g.
this-is-a-filename.png
?this-is-not.png
Is there any way to identify a string as 'not possibly a file' on Linux?
There are almost no restrictions - apart from '/' and '\0', you're allowed to use anything. However, some people think it's not a good idea to allow this much flexibility.
An empty string is the only truly invalid path name on Linux, which may work for you if you need only one invalid name. You could also use a string like "///foo", which would not be a canonical path name, although it could refer to a file ("/foo"). Another possibility would be something like "/dev/null/foo", since /dev/null has a POSIX-defined non-directory meaning. If you only need strings that could not refer to a regular file you could use "/" or ".", since those are always directories.
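If what you actually need is a quick programmatic check, here is a hedged sketch of the rules above (the function name is made up; length limits such as NAME_MAX, usually 255 bytes, are filesystem-specific and not checked here):

# A string can be a new directory entry on Linux only if it is non-empty,
# contains no '/' and no NUL byte, and is not "." or ".." (which always
# refer to existing directories).
def possible_filename?(name)
  !name.empty? &&
    !name.include?("/") &&
    !name.include?("\0") &&
    name != "." && name != ".."
end

possible_filename?("this-is-a-filename.png")   # => true
possible_filename?("")                         # => false
possible_filename?("foo/bar")                  # => false (a path, not a single name)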
Technically it's not invalid, but filenames that start with a dash (-) will cause you a lot of trouble, because they get mistaken for command-line arguments.
I personally find that a lot of the time the problem is not Linux but the applications one is using on Linux.
Take for example Amarok. Recently I noticed that certain artists I had copied from my Windows machine were not appearing in the library. I checked and confirmed that the files were there, and then I noticed that certain characters in the folder names (named for the artist) were represented with a weird-looking square rather than an actual character.
In a shell terminal the filenames look even stranger: /Music/Albums/Einst$'\374'rzende\ Neubauten is an example of how strange.
While these files were definitely there, Amarok could not see them for some reason. I was able to use some shell trickery to rename them to sane versions which I could then re-name with ASCII-only characters using Musicbrainz Picard. Unfortunately, Picard was also unable to open the files until I renamed them, hence the need for a shell script.
Overall this is a tricky area, and it seems to get very thorny if you are trying to synchronise a music collection between Windows and Linux where certain folder or file names contain funky characters.
The safest thing to do is stick to ASCII-only filenames.

What are the best practices for building multi-lingual applications on win32?

I have to build a GUI application on Windows Mobile, and would like the user to be able to choose the language she wants, or have the application choose the language automatically. I am considering using multiple DLLs containing just the required resources.
1) What is the preferred (default?) way to get the application choose the proper resource language automatically, without user intervention? Any samples?
2) What are my options to allow user / application control what language should it display?
3) If possible, how do I create a dll that would contain multiple language resources and then dynamically choose the language?
For #1, you can use the GetSystemDefaultLangID function to get the language identifier for the machine.
For #2, you could list languages you support and when the user selects one, write the selection into a text file or registry (is there a registry on Windows Mobile?). On startup, use the function in #1 only if there is no selection in the file or registry.
For #3, the way we do it is to have one resource DLL per language, each of which contains the same resource IDs. Once you figure out the language, load the DLL for that language and the rest just works.
Re 1: The GetSystemDefaultLangID suggestion above is a good one.
Re 2: You can ask as a first step in your installation. Or you can package different installers for each language.
Re 3:
In theory the DLL method mentioned above sounds great, however in practice it didn't work very well at all for me personally.
A better method is to surround all of the strings in your program with either: Localize or NoLocalize.
MessageBox(Localize("Hello"), Localize("Title"), MB_OK);
RegOpenKey(NoLocalize("\\SOFTWARE\\RegKey"), ...);
Localize is just a function that converts your English text to the selected language. NoLocalize does nothing.
You want to surround your strings with these values though because you can build a couple of useful scripts in your scripting language of choice.
1) A script that searches for all the Localize(" prefixes and outputs a .ini file with english=otherlanguage name/value pairs (a hedged sketch of such a script appears at the end of this answer). If the output .ini file already contains a mapping, you don't add it again. You never re-create the .ini file completely; your script just adds the missing entries each time you run it.
2) A script that searches all the strings and makes sure they are surrounded by either Localize(" or NoLocalize(". If not it tells you which strings you still need to localize.
The reason #2 is important is because you need to make sure all of your strings are actually consciously marked as needing localization or not. Otherwise it is absolutely impossible to make sure you have proper localization.
The reason for #1 instead of loading from a DLL is because it takes no work to maintain this solution and you can add new strings that need to be translated on the fly.
You ship the ini files that are output with your program. You also give these ini files to your translators so they can convert the english=otherlanguage pairs. When they send it back to you, you simply replace your checked in .ini file with the one given by your translator. Running your script as mentioned in #1 will re-add any missing translations if any were done while the translator was translating.
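A hedged Ruby sketch of script #1, under the assumptions that the sources live under src/, the strings go into strings.ini, and missing entries are seeded as english=english placeholders for the translator:

# Load the mappings that already exist so they are never re-added.
existing = {}
if File.exist?('strings.ini')
  File.readlines('strings.ini').each do |line|
    key, value = line.chomp.split('=', 2)
    existing[key] = value if value
  end
end

# Scan the sources for Localize("...") and append any missing entries.
File.open('strings.ini', 'a') do |ini|
  Dir.glob('src/**/*.{c,cpp,h}').each do |path|
    File.read(path).scan(/Localize\("([^"]*)"\)/) do |(english)|
      next if existing.key?(english)
      ini.puts "#{english}=#{english}"   # the translator replaces the right-hand side
      existing[english] = english
    end
  end
end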
