Why does apple use .plist files? - macos

Why does Apple use .plist files?
Windows uses .ini files, which may be less flexible but also take up less space, for the same reason that JSON takes up less space than XML.
They could even use JSON for their configuration: it's at least as easy to parse, supports the value types they need (dict etc.) and takes up the least space.

The original property list format found in NeXTSTEP looked a lot like JSON, but with slightly different syntax. When NeXTSTEP became Mac OS X, that format was replaced with the XML version you see today. The new format had a few improvements over the old one which you can read about in that link.
Property lists can hold several data types that JSON (and INI files) cannot: Numbers specified as real numbers (floating point) or integers, dates, and base64-encoded binary data. Also, JSON wasn’t documented publicly until well after Mac OS X was released.
Mac OS X 10.2 and newer include a binary plist format that's much more space-efficient than XML, and plist files can be converted losslessly between the two.
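As a rough illustration of the last two points, here is a minimal Python sketch using the standard library's plistlib module. The dictionary contents are made up for the example; it only shows that integers, reals, dates and binary data keep their types through both the XML and the binary format.

```python
import datetime
import plistlib

# Hypothetical preference values, chosen only to cover each data type.
settings = {
    "name": "example",                                    # string
    "count": 3,                                           # integer
    "ratio": 0.5,                                         # real (floating point)
    "created": datetime.datetime(2001, 3, 24, 12, 0, 0),  # date
    "blob": b"\x00\x01\x02",                              # binary data
}

# Serialize the same dictionary to both plist formats.
xml_bytes = plistlib.dumps(settings, fmt=plistlib.FMT_XML)
binary_bytes = plistlib.dumps(settings, fmt=plistlib.FMT_BINARY)

# Reading either one back gives the original values with their types intact.
from_xml = plistlib.loads(xml_bytes)
from_binary = plistlib.loads(binary_bytes)

print(len(xml_bytes), len(binary_bytes))
print(from_xml == from_binary == settings)
```

On macOS the plutil command-line tool does the same kind of conversion on files.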

Because NeXTSTEP used them, so Apple adopted them as well.
Property List Wiki Page:
Under NeXTSTEP, property lists were designed to be human-readable and
edited by hand, serialized to ASCII in a syntax somewhat like a
programming language.
NeXTSTEP used one format to represent a property list, and the
subsequent GNUstep and Mac OS X frameworks introduced differing
formats.
While Mac OS X can also read the NeXTSTEP format, Apple sets it aside
in favor of two new formats of its own.
In Mac OS X 10.0, the NeXTSTEP
format was deprecated, and a new XML format was introduced, with a
public DTD defined by Apple. The XML format supports non-ASCII
characters and storing NSValue objects (which, unlike GNUstep's ASCII
property list format, Apple's ASCII property list format does not
support). Since XML files, however, are not the most space-efficient
means of storage, Mac OS X 10.2 introduced a new format where property
list files are stored as binary files. Starting with Mac OS X 10.4,
this is the default format for preference files.

I believe that was one of the things left over from the NeXTSTEP days... as for why they prefer to use it, it's probably because they can. ;-)

Related

How do I reverse engineer Mac OS X language localisation files for natural language learning?

OK, the goal of this question is not strictly programming related but it is a question programmers can answer using programming tools, and programmers may find useful answers here. Bear with me.
I find changing the system language in Mac OS X a useful way to augment my learning of natural languages, eg French. However sometimes I find a menu item or dialog box in French that I can't understand and it's a bore to google the translation or change the system language back to English. But I know that the English translation is hidden away somewhere in the localisation file and maps somehow to the French phrase. So what I want to do is extract all the text from all the localisation files to develop a mapping of this phrase in English = that phrase in French so I can look it up easily.
I know that the localisation files are stored in something like Localizable.strings, lproj files and nib files but I can't make head or tail of how they are stored or how to work with them. I can program but I've never written anything in Xcode. All the information I can find is for Mac OS / iOS programmers to localise their software, not for hackers to extract already made localisation information.
How can I extract the foreign language information as plain text from Mac OS X system and 3rd party software localisation files? Thanks!
Strings files are easy. They're simply dictionaries serialized as property lists. The dictionary keys are used by the program to look up the given string for a particular localization. You can build a mapping from English to another language by loading both dictionaries, iterating over the keys, and using the value from the English dictionary as the key in your output and the value from the other language dictionary as the value in your output.
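A minimal sketch of that approach in Python, assuming the strings files are stored as XML or binary property lists, which plistlib can read. Files in the older text format would probably need converting first, for example with plutil. The function names and sample keys below are made up for illustration.

```python
import plistlib


def load_strings(path):
    """Load a .strings file stored as an XML or binary property list."""
    with open(path, "rb") as handle:
        return plistlib.load(handle)


def build_mapping(english, other):
    """Map each English string to its counterpart in the other language.

    Keys missing from the other dictionary are skipped. If two keys share
    the same English text, the later one wins.
    """
    mapping = {}
    for key, english_text in english.items():
        if key in other:
            mapping[english_text] = other[key]
    return mapping


# Hypothetical dictionaries standing in for two loaded .strings files.
english = {"MENU_OPEN": "Open", "MENU_QUIT": "Quit"}
french = {"MENU_OPEN": "Ouvrir", "MENU_QUIT": "Quitter"}
print(build_mapping(english, french))
```

In practice you would call load_strings on the matching files inside the English and French .lproj folders of the same bundle.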
NIBs are harder. The build process "compiles" NIB files into a form that's not conducive to editing or parsing. If you have access to uncompiled NIB files then you can use ibtool --export-strings-file to dump a strings file, which you could then process as described above. If you don't, then I think you may have a hard time.

OS X - how to calculate normalized file name

I need to create a mapping between file names generated on Windows and OS X. I know that OS X "converts all file names to decomposed Unicode" however, "most volume formats do not follow the exact specification for these normal forms"
So, it does not seem a simple matter of converting the Windows name to NFD using a standard UTF8 API and being sure I have the correct OS X name. Is there a way to determine what the actual OS X file name will be without actually creating the file in the file system and then scanning the directory to see what was actually created?
I think the answer is this from TechNote 1150 HFS Plus Volume Format:
Note: The Mac OS Text Encoding Converter provides several constants
that let you convert to and from the canonical, decomposed form stored
on HFS Plus volumes. When using CreateTextEncoding to create a text
encoding, you should set the TextEncodingBase to
kTextEncodingUnicodeV2_0, set the TextEncodingVariant to
kUnicodeCanonicalDecompVariant, and set the TextEncodingFormat to
kUnicode16BitFormat. Using these values ensures that the Unicode will
be in the same form as on an HFS Plus volume, even as the Unicode
standard evolves.
You're probably looking for the -[NSString fileSystemRepresentation] method.
Note that there is no general solution for this task. What is a valid file name depends on filesystem of the volume you're saving on. Not every file name valid for HFS+ is valid for FAT32, for example.
For Mac's “standard” filesystem (currently HFS+), fileSystemRepresentation should give what you need; for other file systems, there is no general way. Think about ones that don't exist but will be introduced in the future, for example :)
According to your link, filesystem drivers appear to (mostly) follow one of two behaviours:
* Return all names in NFD, and convert names as appropriate.
* Don't perform any conversions.
In both these cases, if you create a file on OSX in NFD, reading it back on OSX should give you the name in NFD.
OTOH, if your filename goes from Windows → NFS → Mac and you want to do some sort of sync, you're out of luck. This is not an easy thing to do, since the underlying problem is a little philosophical: Should filenames be byte strings or Unicode strings? I believe Unix traditionally does the former, and at least in Linux, UTF-8 NFC names are merely a convention.
(It gets worse, since IIRC HFS+ is defined to use Unicode 3.something, so a naïve conversion to NFD might be wrong for characters added/changed since then unless the API you use can guarantee a specific Unicode version.)
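For the simple cases, comparing names after normalizing both sides to the same form may be enough. Here is a minimal Python sketch, assuming standard Unicode NFD from the unicodedata module is close enough. As noted above, that is not guaranteed to match what HFS+ actually stores, and the file names are made up.

```python
import unicodedata


def same_name(a, b):
    """Compare two file names after normalizing both to standard NFD."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# The same visible name in precomposed and decomposed form.
windows_name = "caf\u00e9.txt"   # "e with acute" as one code point
mac_name = "cafe\u0301.txt"      # "e" followed by a combining acute accent

print(windows_name == mac_name)           # raw comparison
print(same_name(windows_name, mac_name))  # comparison after normalization
```

This only tells you whether two names are equivalent under the Unicode version your Python build ships with. It does not predict the exact bytes a given volume format will store.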

Looking for the mac os resource definition for the 'styl' resource

I have searched the net for this antiquated piece of Mac OS 9 technology, which I actually need for current Mac OS X development.
I cannot find a .r definition in the current set of Mac OS X SDKs.
Effectively I need to be able to analyse and create a styl resource.
The 'styl' resource contains the same structure used by TextEdit (the ancient TextEdit, not the current TextEdit.app) when copying styled text to the clipboard. It is always used in tandem with a corresponding plain text buffer. See this tech note.
This corresponds to struct StScrpRec in TextEdit.h. You can find this file at /System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HIToolbox.framework/Versions/A/Headers/TextEdit.h. Note that there are not necessarily 1601 elements in the ScrpSTTable: it's a variable length array, whose actual length is given by the scrpNStyles member of StScrpRec.
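To analyse such a resource outside of C, something like the following Python sketch may help. The field layout is my reading of ScrpSTElement in TextEdit.h (20 bytes per element, big-endian), so treat it as an assumption and check it against the header. The function and dictionary key names are made up.

```python
import struct

# Assumed ScrpSTElement layout: long scrpStartChar, short scrpHeight,
# short scrpAscent, short scrpFont, StyleField scrpFace (style byte plus
# one filler byte), short scrpSize, RGBColor scrpColor (three 16-bit values).
ELEMENT = struct.Struct(">ihhhBxh3H")


def parse_styl(data):
    """Return the style runs in a 'styl' resource as a list of dicts."""
    # scrpNStyles gives the real number of elements that follow.
    (count,) = struct.unpack_from(">h", data, 0)
    runs = []
    for index in range(count):
        offset = 2 + index * ELEMENT.size
        start, height, ascent, font, face, size, red, green, blue = (
            ELEMENT.unpack_from(data, offset)
        )
        runs.append({
            "start_char": start,
            "height": height,
            "ascent": ascent,
            "font": font,
            "face": face,
            "size": size,
            "color": (red, green, blue),
        })
    return runs


# A made-up resource with one style run, built with the same layout.
sample = struct.pack(">h", 1) + ELEMENT.pack(0, 15, 12, 3, 1, 12, 0, 0, 65535)
print(parse_styl(sample))
```

Creating a resource is the reverse: pack the element count, then one ELEMENT.pack call per style run.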

Searching the Mac OSX system dictionaries?

I'd like to search for words in the OS X system dictionary (or dictionaries) using a simple glob or regex rather than a known text. (Currently I'm using /usr/share/dict/words instead, but the OSX dict would be a lot nicer.)
The Dictionary Services interface is quite limited and doesn't allow this, but it seems like DSGetTermRangeInString might be doing something similar under the hood. Does anyone know of a way to access such functionality?
Alternatively, is there a way to extract a word list from the dictionary? I could then grep that. Some dictionaries seem to include the source XML in the bundle, which should be easy enough to parse, but (not surprisingly, I guess) the big language dictionaries only have the data in some binary format. Any clues as to what that might be?
The dictionary Apple provides in OS X is licensed from one of the major publishers. Legally, they can't let you dump the whole word list.
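That leaves the fallback mentioned in the question: matching against a plain word list. A minimal Python sketch, assuming a one-word-per-line file such as /usr/share/dict/words (the function names are made up):

```python
import fnmatch


def load_word_list(path="/usr/share/dict/words"):
    """Read a one-word-per-line file into a list, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def match_words(words, pattern):
    """Return the words matching a shell-style glob, case-sensitively."""
    return [word for word in words if fnmatch.fnmatchcase(word, pattern)]


# Made-up sample list; use load_word_list() for the real file.
print(match_words(["cat", "cart", "dog", "coat"], "c*t"))
```

Swapping fnmatch for re.search gives the regex variant.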

ANSI or OEM Codepage when using MME and DirectMusic?

I noticed that when reading MIDI port names from MME, the names are multi-byte strings encoded using the ANSI Codepage, which my app uses by default. When receiving those names from the DirectMusic driver, the names are wide-character strings encoded with the OEM Codepage. See this article by Raymond Chen for a quick refresher on Codepages.
On my German system, this means that when using the current codepage, which turns out to be the ANSI one, I get "Audiogerät" from MME, and "Audiogeröt" from DirectMusic, the latter being wrong. This gets fixed when I treat that last name as OEM-encoded instead.
So how do I know with which codepage to decode those names? Why does the name coming from DirectMusic get encoded differently? Does it come from the USB driver? The COM framework? DirectMusic? How can I know for sure which codepage to use when reading the names of my MIDI ports?
For info:
I use the MultiByteToWideChar() and WideCharToMultiByte() functions to perform the conversions, with CP_ACP and CP_OEMCP as argument for the codepage to use.
I use midiInGetDeviceCaps() to get MIDI port information from the MME subsystem...
... and convert MIDIINCAPS.szPname using the CP_ACP (ANSI) codepage.
I use IID_IDirectMusic8::EnumPort() to get port information from DirectMusic...
... and convert DMUS_PORTCAPS.wszDescription using the CP_OEMCP codepage.
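To show the kind of mismatch involved, here is a small Python sketch that decodes the same bytes with two code pages. It assumes cp1252 as the ANSI code page and cp850 as the OEM code page of a German system, and uses Python codecs as a stand-in for MultiByteToWideChar() with CP_ACP and CP_OEMCP. The garbled result is only an example and is not the exact one I see from DirectMusic.

```python
ANSI_CODEC = "cp1252"  # assumed ANSI code page on a German Windows system
OEM_CODEC = "cp850"    # assumed OEM code page on the same system


def decode_name(raw, codec):
    """Decode a NUL-terminated multi-byte name with the given code page."""
    return raw.split(b"\x00", 1)[0].decode(codec)


# Bytes for the name as an OEM-encoded, NUL-terminated string.
raw_name = "Audioger\u00e4t".encode(OEM_CODEC) + b"\x00"

as_oem = decode_name(raw_name, OEM_CODEC)    # decoded with the right code page
as_ansi = decode_name(raw_name, ANSI_CODEC)  # decoded with the wrong one

print(as_oem)
print(as_ansi)
```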
I don't know for sure why the DirectMusic framework would use one set of codepages and MME another, but the solution on your end is probably to build an abstraction layer and then make specific implementations for each API. That way, the higher levels of your software don't need to concern themselves with details like this.
That said, the endpoint names definitely come from the OS. USB MIDI devices specify only endpoint types (ie, either input or output, and the number), but the OS is free to interpret them as it sees fit, which is why they are localized.
There is not a specific API call (as far as I know) to find out which codepage the framework will deliver its strings in. However, DirectMusic does seem to use double wide characters with OEM codepage as a general convention, though I could not find this clearly stated in any of the MSDN docs. In the MSDN DirectMusic documentation about MIDI port capability structures, the description type clearly is defined as a WCHAR, and the Game Audio Programming book seems to also indicate that this type is an API-wide convention. While it's dangerous to assume that OEM is the default encoding for these chars, I can't find anything that says otherwise (and googling for "DirectMusic codepage" now lists this page as the top hit).
Edit: Check out this Stack Overflow question on determining the current OS codepage. It is possible that the DirectMusic API sets the codepage in this manner.
There isn't really an automatic way to tell what codepage is used for these types of data. See here: How can I detect the encoding/codepage of a text file
