Are there alternatives to ibtool for parsing NIB files? - macos

As part of an i18n project, I need to extract strings from a NIB file programmatically. This can be done quite easily with ibtool, of course. But this is a cross-platform product, of which the NIB is only part. It would be nice if we didn't have to lock string extraction to a particular machine, or have to kludge together catalog-merge scripts.
So I realize it's a long shot, since the demand for this is probably quite low, but are there any open-source alternatives to ibtool? Is there any documentation on the NIB format reliable enough to write a parser from?

If you use XIBs, then they are simple XML files. Just look for string elements that have the key NSContents. If you're familiar with XML parsing, it shouldn't take long to reverse engineer --export-strings-file in almost any language.
If you can't move to XIBs, you can read the keyedobjects.nib, which is a binary plist. One portable reader is the Perl implementation plutil.pl. You can also look at Apple's open source code to handle them in CFBinaryPlist.c. If you need to go this way, look at OpenCFLite. Reading and writing Plists is a key reason people use the portable Core Foundation.

Related

Extracting strings for translation from VB6 code

I have a legacy VB application that still has some life in it, and I am wanting to translate it to another language.
I plan to write a Ruby script, possibly utilising a parser, to extract all strings from the three million lines of source, replace them with constants, and move them to a string resource file that can be used to provide translations.
Is anyone aware of a script/library that could be used to intelligently extract the strings?
I'm not aware of any existing off-the-shelf tool that you could use. We created a tool like this at my work and it worked well. The FRM file format is quite simple (although only briefly documented). We wrote a tool that (1) extracted all strings from control definitions and (2) generated the code to reload them at runtime during Form_Load.

How can one create a polyglot PDF?

I like reading the PoC||GTFO issues and one thing I found remarkable when I first discovered it, was the "polyglot" nature of their PDF files.
Let met explain: when you consider for example their 8th issue, you may unzip files from it; execute the encryption they are talking about by running it as a script and even better(worse?) with their 9th issue you can even play it as a music file!
I'm currently in the process of writing small scripts every week and writing each time a little one page PDF in LaTeX to explain the said scripts. So I would really enjoy being able to create the same kind of PDF files. Sadly they explained (partly) in their first issue how to include zip files, but they did so through three small sketches of cmd lines without actual explanations.
So my question is basically :
how can one create such a polyglot PDF file containing stuff like a zip as well as being a shell script which may be run using arguments just like normal scripts?
I'm asking here about the process of creation, not just an explanation of how this is possible. The ideal way for me would that there are already some scripts or programs allowing to create easily such PDF files.
I've tried to search the net for the keywords "polyglot files" and others of the kind and wasn't able to find any useful matches. Maybe this process has another name?
I've already read the presentation by Julia Wolf which explains how things works, but I sadly haven't had time to apply the knowledge there to real world, because I'm sadly not used to play with file headers and the way a PDF is constructed.
EDIT:
Okay, I've read more and found the 7th edition of PoC||GTFO to be really informative concerning this subject. I may end up being able to create my own scripts to do such polyglot PDF files if I have some more time to consider it.
I played around with polyglots myself after attending Ange's talks and also talking to him in person. You really need to understand the file formats to be able to nest them into each other.
However, long story short, here are some links I found extremely useful for creating polyglots:
Some older Google Code Trunk
PoC of the polyglot stuff
Especially the second link (to github) will help you creating polyglots, but also understanding how they are working and how they are implemented. Since it is mostly Python stuff and very well / clean written, it is very useful and easy to follow.
I feel dissecting some file formats would be a good place to start. You can find many file format specifications for different file types through Google, but they can be a tough read and will likely take you some time to translate into whatever language you are using.
PDF: https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf
ELF: https://www.cs.cmu.edu/afs/cs/academic/class/15213-s00/doc/elf.pdf
ZIP: http://kat.sdf.org/zip_file_format.txt
The language(s) you select will need a way to read and write raw bytes (not just ascii alphanumeric), so perhaps C would be good for more direct access to memory. Some Python tricks could help with open sourcing the scripts easily.
To dissect the files, you may want to build a tool kinda like https://github.com/kvesel/zipbrk/ to take them apart, then put them all back together in a polyglot format. For example, zip does not require the section headers to be at the start (or even contiguous for that matter), and PDF magic number can appear in multiple places within the file as well. I also believe I recall a polyglot tool being included in one of the PoC||GTFO publishings (maybe issue 8 or 2??) as a polyglot in the pdf file.
Don't forget the hackers bible! :)
https://nostarch.com/gtfo

ELF/DWARF Parser to Out Structure elements

Is there a way to extract the size and address of elements within a structure using an elf file? I am hoping there is a tool available that can do this and export it to a more readable format.
My end goal is to convert the ELF file to a ASAM A2L file. A open source/free tool that could do this would even be better but most companies that do this charge alot for their tools.
I don't know offhand of anything pre-canned, but it isn't very hard to modify an existing tool to do it.
The "pahole" program from the "dwarves" project does something similar. It prints a structure definition in a certain way.
There's also a "pahole.py" script for gdb that does pretty much the same thing. This would be trivial to modify to print things however you like.
If you want to get a little deeper you could write it yourself using one of the existing DWARF libraries. I like the one in elfutils, but YMMV.

Parsing HTML in AppleScript

What's a good way to parse HTML in AppleScript?
I haven't dabbled in AppleScript in quite some time, and even when I did it was very minimal and uninvolved, so I don't really think naturally in the language quite yet. But I need to do some string manipulation and parse some HTML (basically some simple screen scraping).
Naturally, I'd like to avoid common pitfalls of HTML parsing. However, this is a temporary script and doesn't need to be particularly robust or supportable. I really just need to scrape specific substrings (from a known starting substring to the next known character) into a file.
I've done plenty of string manipulation in C# and similar languages, but AppleScript is an interesting change of pace to say the least. Can somebody point me to some good resources (Google searches on this subject seem to have a high noise-to-signal ratio), or help me out with some sample code snippets?
The ultimate goal of what I'm doing is to take a pre-determined list of pages, open each one in Safari (I'm doing everything through tell application "Safari"), parse out links which fit a certain pattern, and store all of those links in a file. Then go through that file, open each of those links, parse out more links which fit another pattern, and store all of those links in a file.
(The site is actually owned by someone we're working with, so don't worry about me violating any terms of service or anything like that. But for reasons outside the scope of this question, I'm doing some page scraping in AppleScript.)
I can't say enough good things about Matt Neuburg's AppleScript: the Definitive Guide. Without a doubt the most complete documentation of AppleScript ever done. Matt's also one of my favorite tech writers.
I would also check out this article. It contains a tutorial on how to do this; the example provided there parses HTML data from only one source, but I think it's worth looking at.

Localization best practices

I'm starting to modify my app, which uses all hardcoded strings for errors, GUI, etc. I'm considering these two approaches, but let me know if there is an even better way:
-Put all string in ressource (.rc) files.
-define all strings in a file, once for each language. Use a preprocessor define to decide which strings get compiled in.
Which of these two approaches is generally prefered?
Put all the strings in resource files. Once you've done that, there's several good translation packages available. One useful thing these packages do is allow you to get translation done by somebody who doesn't program.
Remember, also, that internationalization (i18n) is a large subject, and there's a lot of things to consider. It isn't just a matter of translating strings. Do a web search on it, at the very least. You might want to read a book on it: I used International Programming for Windows by Schmitt as a guide. It's an old book from Microsoft Press, and I had to get it through a used book service; most of the more modern stuff seems to be on internationalizing .NET apps.
Without knowing more about your project (what sort of software, who the intended audience is, what sort of organization you have, what sort of budget, why you're interested in internationalization, etc.), this is about the most I can tell you.
Generally you see locale specific resource files containing strings referenced by key. Compiling different versions for different locales is a very rigid solution and will be a maintenance nightmare. Using resource files also allows the user to have fallback locales.
There's another approach of just putting strings in the source with somethign like tr(" ") and usign one of the tools that strips them out and converts them.
It works with any toolkit/GUI library.
You can mark text to be converted and text not to change (such as protocol strings or db keys).
It makes the source easier to read and search, isntead of having to lookup what IDS_MESSAGE34 means.
One problem with resource files, at least with Windows/MFC, is that you can't use the stringtable in dialogs. So you have some text in the stringtabel and some in the dialog section which you have to dela with separately.

Resources