Html entities in file names: Possible mine traps? - image

When I thought about resizing images and saving the new sizes parallel on the server, I came to the following question:
// Original size
DSC_18342.jpg
// New size: Use an "x" for "times"
DSC_18342_640x480px.jpg
// New size: Use the real "×" for "times"
DSC_18342_640×480px.jpg
The point is, that it's slightly easier if you got a real × instead of an x in the file name, as the unit px already contains the x, which makes it a little bit harder to read.
Question: What problems could I get in, when using the Html entity in the filename?
Sidenotes: I'm writing an open source, publicly available script, so the targeted server can be anything - therefore I'm also interested (and will vote up) edge cases, that I'm not aware off.
Thank you all!
You may have noticed, that I'm aware, that I could simply avoid it (which I'll do anyway), but I'm interested in this issue and learning about it, so please just take above example as possible case.

There are file systems that simply don't support unicode. This may be less of a problem if you make unicode support a requirement of your application.
Some consideration about different unicode file system are given in File Systems, Unicode, and Normalization.
A concluding remark (from a viewpoint of solaris file systems) is:
Complete compatibility and seamless interoperability with
all other existing Unicode file systems appears not 100%
possible due to inherent differences.
I can imagine that there will be problems especially when migrating the application. Just storing files is probably no problem but if their names are stored in a database there might be a mismatch after migration.

Related

How to create custom single byte character set for Windows?

Windows uses some encoding table for non-unicode applications to map characters from unicode table to 1-byte table. There are many predefined character sets, user can choose one in windows settings. I need to create a custom character set. Where can I find some information about that process? I tried to Google it, but didn't have any luck, I guess, few people are doing that.
AFAIK, you can't do that, I don't think there's even a way to write some kernel mode "driver" for it, but, haven't looked into these things for a while, maybe there is some way (now).
In any case, you might be better off using a library you can change/update, such as libiconv.
UPDATE:
Since you don't have the source code, you're in a very unfortunate position.
For all string resources (in EXE or any DLLs or, though unlikely, in some other file(s)), you can "read them out" and figure out what's the code page used in them and change it (and the strings themselves), tweaking it in some way that would achieve your purpose - to have the right glyphs appear (yes, you might actually see different glyphs in Notepad, but, who cares if you application shows the right one(s) - FWIW, for such hacks, it's best to use a hex-editor). Then, of course, "put" the (changed) resources back in (EXE/DLL). But, it's quite possible not all strings are in resources, and that's when the "real" problems start.
There's any number of hacks that could have been done here. Your best option is to use some good debugger (WinDbg or better) and figure out what's going on and how are character sets handled = since you don't have the source code, it's gonna be quite painful. You want to find out:
Are the default charset(s) used (OEM/ANSI), or some specific (via NLS APIs)?
Whatever charset is used, is it a standard one or not? The charset here is the "code" Windows assigns to it. Look at Windows lists of available charsets.
Is the application installing fonts? If it is, use a font tool to examine them - maybe it has a specific (non-standard?) code-page supported in it.
Is the application installing some some drivers. If it is, the only way to gain more insight is to use a kernel debugger (which is very tricky and annoying, but, as already said, you're in an unfortunate situation).
It appears that those tables are located at C:\Windows\system32*.nls. I'm not sure whether there's proper documentation for their structure. There's some information in Russian here. Also you might want to tinker with registry at HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls

Extracting the serialized data from unknown files

My dearest stackoverflowers,
I want to access the serialized data contained in files with strange, to me, extensions. The bulk of the data seems to be in a .st and an .idt file.
The program is meant to be run on Windows, and the unix file command gives me only false positives. Any ideas on either what these extensions mean or on how to investigate and extract their contents?
Below I provide the entirety of the extensions in a long list in hope somebody recognizes them. Googling also gives me false positives. For example: .st is commonly used for ATARI emulation files.
Thanks in advance!
.cix
.cmp
.cnt
.dam
.das
.drf
.idt
.irc
.lxp
.mp
.mbr
.str
.vlf
.rpf
.st
.st
Some general advice on how to approach this:
One way to approach this is to use a site like http://filext.com/ to try to figure out where the files came from. This can be tough, because it's not like there's a file extension standard anywhere - anyone can use any extension, so you're going to have a lot of conflicts/disambiguation issues to solve.
Sometimes you can get lucky, and if you open up the files in a plain text editor you can occasionally see plain string data that is readable, which can help identify the general sort of data contained in a file, and therefore help cut down on the possible number of sources for a file. For example, I have often helped people who received a file as an email attachment with no extension, figure out what file type it was using this technique, adding the file extension, and then opening it in the appropriate program.
There are also sites like http://www.oldversion.com/ that keep old versions of programs that you (typcially) can download for free. This is especially helpful if the data you're working with was created 5+ years ago, in a proprietary program, and that program is no longer available/purchasable from the vendor who created it.
Once you have a good idea of what files belong to what programs, then you're probably going to spend a lot of time trying to find online resources for what the structure of the files are. If that isn't available, you can get a copy of the original program, but either the program won't open the files you're interested in or you still want raw access to the data, then try generating some sample output files with data that you input, and go Rosetta Stone on it, comparing your known file to the original file.
From there, the additional knowledge you'll probably want, is to try to find out what language/compiler the software was written in, which can give you a lead on what code libraries were used to serialize the data in the first place. Once you know all that, then it's matter of reading through any available documentation on the serialization process, and then writing a deserializer.
The one thing this technique won't solve is, if you're dealing with corrupt/truncated data files, it may be very difficult to tell the difference between that and whether or not you have the file structure correct. The "Rosetta Stone" technique might be helpful in that case.
Depending on how many different pieces of source software you're talking about, sounds like a pretty big project. Good luck!

Why does loading cached objects increase the memory consumption drastically when computing them will not?

Relevant background info
I've built a little software that can be customized via a config file. The config file is parsed and translated into a nested environment structure (e.g. .HIVE$db = an environment, .HIVE$db$user = "Horst", .HIVE$db$pw = "my password", .HIVE$regex$date = some regex for dates etc.)
I've built routines that can handle those nested environments (e.g. look up value "db/user" or "regex/date", change it etc.). The thing is that the initial parsing of the config files takes a long time and results in quite a big of an object (actually three to four, between 4 and 16 MB). So I thought "No problem, let's just cache them by saving the object(s) to .Rdata files". This works, but "loading" cached objects makes my Rterm process go through the roof with respect to RAM consumption (over 1 GB!!) and I still don't really understand why (this doesn't happen when I "compute" the object all anew, but that's exactly what I'm trying to avoid since it takes too long).
I already thought about maybe serializing it, but I haven't tested it as I would need to refactor my code a bit. Plus I'm not sure if it would affect the "loading back into R" part in just the same way as loading .Rdata files.
Question
Can anyone tell me why loading a previously computed object has such effects on memory consumption of my Rterm process (compared to computing it in every new process I start) and how best to avoid this?
If desired, I will also try to come up with an example, but it's a bit tricky to reproduce my exact scenario. Yet I'll try.
Its likely because the environments you are creating are carrying around their ancestors. If you don't need the ancestor information then set the parents of such environments to emptyenv() (or just don't use environments if you don't need them).
Also note that formulas (and, of course, functions) have environments so watch out for those too.
If it's not reproducible by others, it will be hard to answer. However, I do something quite similar to what you're doing, yet I use JSON files to store all of my values. Rather than parse the text, I use RJSONIO to convert everything to a list, and getting stuff from a list is very easy. (You could, if you want, convert to a hash, but it's nice to have layers of nested parameters.)
See this answer for an example of how I've done this kind of thing. If that works out for you, then you can forego the expensive translation step and the memory ballooning.
(Taking a stab at the original question...) I wonder if your issue is that you are using an environment rather than a list. Saving environments might be tricky in some contexts. Saving lists is no problem. Try using a list or try converting to/from an environment. You can use the as.list() and as.environment() functions for this.

Most suitable language for cheque/check printing on Windows Platform

I need to create a simple module/executable that can print checks (fill out the details). The details need to be retried from an existing Oracle 9i DB on the Windows(xp or later)
Obviously, I shall need to define the pixel format as to where the details (Name, amount, etc) are to be filled.
The major constraint is that the client needs / strongly prefers a executable , not code that is either interpreted or uses a VM. This is so that installation is extremely simple. This requirement really cannot be changed.
Now, the question is, how do I do it.
(.NET, java and python are out of the question, unless there is a way around the VMs)
I have never worked with MFC or other native windows APIs. I am also unfamiliar with GDI.
Do I have any other option? Any language that can abstract the complexities and can be packed into a x86 binary?
Also, if not then any code help with GDI would be appreciated.
The most obvious possibilities would probably be C, C++, and Delphi. There are a few others such as Ada (e.g., Gnat), but offhand I don't see a lot of reason to favor them (especially for a job this small).
At least the way I'd write this, the language would be almost irrelevant. I'd have it run almost entirely by an external configuration file that gave the name of each field, and the location where it should be printed. I'd probably use something like MM_LOMETRIC mapping mode, so Windows will handle most of the translation to real-world coordinates (and use tenths of a millimeter in the configuration file, so you can use the coordinates without any translation).
Probably the more difficult part of this would/will be the database connectivity. There are various libraries around to help out with that, so this won't be terribly difficult, but it's still not (quite) as trivial as the drawing part.

How to edit an XML File

Howdy,
I was wondering if I was able to alter a distinct element in an XML file saved on a Windows Phone 7 device without actually having to serialize the whole file all over again.
As mentioned in previous answers, you can't save a fragment of XML on its own without saving the whole file. If file size was a big issue, then you could split the data into separate files (perhaps alphabetically; depends on the data), so that you're only saving a smaller dataset if you make a change.
As Matt mentioned, XML serialization does not provide the best performance on WP7 devices. Kevin Marshall has a great blog post detailing different serialization approaches and the performance of each. The fastest method is binary serialization, though there's nothing stopping you serializing the XML using the binary serialization approach.
In general, no - and this has little to do with Windows Phone 7 (although I don't know whether IsolatedStorageFileStream on WP7 even supports seeking).
I don't know of any mainstream filesystems with high level abstractions (such as those used by Java and C#) which allow you to delete or insert data in the middle of a file.
I suppose theoretically if you were happy to pad with whitespace, or never change the length of the data you're using, you could just overwrite the relevant bytes - but I don't think it would be a good idea at all. Very brittle and hard to work with.
Just go for overwriting the whole file.

Resources