I'm looking for the best tool out there to extract any and all metadata embedded within the most popular image file formats (JPEG and PNG specifically). I would like to know about whatever is in there (XMP, Exif, IPTC/IIM, etc.). Ideally I am looking for an all-in-one solution that I can run from a command line, but am interested to hear about any other tools in this area that are of value.
I have found the following, each with advantages/disadvantages:
ExifTool is good, but the output is a little more roughshod than I would like.
DumpImage from the Metadata Working Group has good formatting of the metadata it does find, but doesn't support PNG.
I have recently released Binspector, the tool I ended up writing to answer this question to my own satisfaction. The basic premise of the tool is that it takes a format grammar and uses it to analyze a binary file. As long as the format grammar and the binary file are well-formed, one can inspect and analyze innumerable binary files and formats.
Code is hosted on GitHub, and a blog for the tool is here. (The overview post for the tool is here.)
As you did not mention a preferred programming language, I'll take PHP as an example.
There is an Exif extension for PHP which can be used to easily retrieve metadata from an image.
http://www.php.net/manual/en/function.exif-read-data.php
You could easily create a script that you can call from the command line. I must add that the extension only seems to provide support for JPEG and TIFF images.
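Since that extension stops at JPEG and TIFF, here is a rough sketch of how PNG metadata could be located by hand. It is written in Python rather than PHP and uses only the standard library. The function names `iter_png_chunks` and `metadata_keywords` are made up for this illustration. It only lists which text-like chunks are present (XMP, for instance, normally lives in an `iTXt` chunk with the keyword `XML:com.adobe.xmp`); it does not decode the payloads or verify CRCs.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_png_chunks(data):
    """Yield (chunk_type, payload) for each chunk in a PNG byte string."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos + 8 <= len(data):
        # Each chunk starts with a 4-byte big-endian length and a 4-byte type.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        payload = data[pos + 8:pos + 8 + length]
        yield ctype, payload
        pos += 12 + length  # length + type + payload + 4-byte CRC (not checked)
        if ctype == b"IEND":
            break


def metadata_keywords(data):
    """Return (chunk type, keyword) pairs for chunks that usually carry metadata."""
    found = []
    for ctype, payload in iter_png_chunks(data):
        if ctype in (b"tEXt", b"zTXt", b"iTXt"):
            # All three text chunk types begin with a NUL-terminated keyword.
            keyword = payload.split(b"\x00", 1)[0].decode("latin-1")
            found.append((ctype.decode("ascii"), keyword))
        elif ctype == b"eXIf":
            # The eXIf chunk holds raw Exif data and has no keyword.
            found.append(("eXIf", ""))
    return found
```

Reading a file in binary mode and passing its bytes to `metadata_keywords` gives a quick inventory of what is embedded; a dedicated tool is still needed to interpret the contents.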
You could try the official Adobe XMP SDK. It is available for download at:
http://www.adobe.com/devnet/xmp.html
This is the complete SDK to read/write/manipulate metadata across a variety of formats.
In the SDK package there is one particular sample that might be of interest to you. Go to the "samples" folder and build the samples as per the documentation (available in the package). Look for the sample executable "DumpFile". This dumps all the metadata in the file to the console.
I need to write and build documentation in PDF and HTML5 format that is easy to maintain, good-looking, and easy to change. The source format must be easy to edit. The Maven plugin has to support my company's organization theme (fonts, colors, pictures, etc.), TOC generation, separation of chapters into different files, integration of image files, and an easy way to put code snippets in the documentation. I have a Maven build and I was wondering what the current best option is for doing that.
I have investigated two options:
Doxia - uses Markdown (md) as input format. There is a free WYSIWYG Markdown editor, it supports much of the above, etc. It needs an external repository for its artifacts.
Asciidoctor - uses AsciiDoc as input format. Supports templating using fragments, etc.
What are the advantages and disadvantages of using these plugins?
Are there any other good solutions?
From my attempts to build the documentation first with Doxia and then with AsciiDoc, I realised that AsciiDoc is the better choice. It offers:
Easy styling using YAML files. The default styling is also very good.
AsciiDoc as a markup language is very well documented: see the AsciiDoc User Guide.
It has good online editors, and the language is more powerful than Markdown, for example, and easier to write than the XML format.
Good examples of using AsciiDoc with Maven, and easy-to-understand configurations.
I saw this related question about publishing toolchains, but I know many people have done a lot of work on publishing toolchains recently.
One great example I found is this project from akosma.
Avdi Grimm shared his work with org-mode in this project
I know there are (should be) many others.
What I'm looking for, is a publishing toolchain with
asciidoc / markdown / textile / org-mode or latex input. I don't want xml input
pdf AND html output, epub output is not a requirement for me.
What I can do:
author templates in latex / html / css / js. again, no xml.
read and write ruby and shell scripts
Take a look at AsciiDoc; this is what O'Reilly has started using, and it is a refreshing break from DocBook. I use AsciiDoc. The tools and support leave a little to be desired, but there are people working to create better alternatives (that don't involve Python and the existing DocBook pipeline).
Check out this: https://github.com/runemadsen/asciidoc
EDIT 1/6/13: You also really need to check out Asciidoctor. Dan Allen from Red Hat and Ryan Waldron have been spending a lot of time on this particular package. I expect great things from Asciidoctor as it is starting to emerge as a foundation for a bunch of important AsciiDoc documentation efforts.
I am converting some docs to PDF using wkhtmltopdf (currently using Perl and the command-line versions). Is it possible to change the "PDF Producer", "PDF Version" and "Fast Web View" fields? The current defaults are "wkhtmltopdf", "1.4 (Acrobat 5.x)", and "No", respectively. I didn't see anything in the wiki page.
Pass "--extended-help" on the command line to see the supported features.
Not sure if those specific params are supported or not.
I patched wkhtmltopdf to support an additional flag recently, and it would be quite easy to add parameters to change those. I don't believe they are supported currently, though.
PDF Producer: Nope. Most apps want folks to know that particular app generated the PDF.
PDF Version: Nope, but trivial. The version number at the beginning of the file is just a courtesy really. What exactly are you after with this? Chances are you don't really need it. The PDF generated isn't going to acquire any features automagically just because the PDF claims to be this version or that. It's only really used so a viewer opening a newer PDF can say something like "I don't support this version, some stuff might not work". Because everything will work regardless (unless someone happens to have a VERY old version of Acrobat/Reader), I don't see the issue.
Fast Web View: Nope, and decidedly non-trivial. "Fast Web View" means everything needed to display the first page of the PDF is sorted to the front of the file, and there are various "hints" on where an app downloading the PDF can find this or that. It's not just a flag, not by a long stretch.
Zero for three. Sorry.
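To make the last two points concrete, here is a rough Python sketch of what those fields correspond to in the file itself. It has nothing to do with wkhtmltopdf's own code, and the helper names are invented for this example. It reads the `%PDF-x.y` header, swaps in a different version string of the same length (so no byte offsets move), and uses the presence of `/Linearized` in the first 1024 bytes as a heuristic for "Fast Web View". Changing the header adds no features, and a `/Version` entry in the document catalog, if present, can override it.

```python
import re

HEADER_RE = re.compile(rb"%PDF-(\d\.\d)")


def pdf_header_version(data):
    """Return the version from the %PDF-x.y header, or None if absent."""
    match = HEADER_RE.search(data[:1024])
    return match.group(1).decode("ascii") if match else None


def looks_linearized(data):
    """Heuristic: a linearized file declares /Linearized near the start."""
    return b"/Linearized" in data[:1024]


def with_header_version(data, new_version):
    """Return a copy of the PDF bytes with only the header version replaced.

    The replacement must have the same length as the old version string,
    so that the byte offsets stored in the cross-reference table stay valid.
    """
    match = HEADER_RE.search(data[:1024])
    if match is None:
        raise ValueError("no %PDF header found")
    new = new_version.encode("ascii")
    start, end = match.span(1)
    if len(new) != end - start:
        raise ValueError("new version must have the same length as the old one")
    return data[:start] + new + data[end:]
```

Producing a genuinely linearized file is a different matter: it requires reordering objects and writing hint tables, which is why it is best left to a tool built for that purpose.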
Is it possible to generate a set of wiki pages from XML comment file generated by Visual Studio?
I'm talking about something like Sandcastle, but for wiki format instead of compiled CHM.
Edit: I'm using MediaWiki which can import/export articles in XML. So I hope that it is possible to write a transformation converting XML comments to MediaWiki XML.
I'd recommend a slightly different solution:
Use Help Server to publish .CHM/.HxS on the web
Use special MediaWiki templates to link reference from Wiki like here.
Use <see href="..."> to link Wiki pages from XML comments
See also: FiXml
This is not exactly what you wanted, but I hope this will be helpful.
If the items mentioned above do not suffice, have you tried to simply build your own XSLT transform into the wiki markup of your choice?
You can write a simple application in .NET (or pick your platform of choice) to transform the doc XML format to wiki XML format. You'd still have to keep the wiki updated with the output files manually.
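As a rough idea of how small such a converter can be, here is a sketch in Python (chosen only for brevity; the same approach works in .NET or XSLT). It assumes the usual layout of the compiler-generated file, `<doc><members><member name="...">` with `<summary>`, `<param>` and `<returns>` children, and emits plain MediaWiki markup. The function name `doc_xml_to_wikitext` and the chosen heading and list layout are illustrative choices, not an established convention. Wrapping the result in MediaWiki's import XML is left out.

```python
import xml.etree.ElementTree as ET


def _text(element):
    """Collapse an element's text, including nested tags, onto one line."""
    return " ".join("".join(element.itertext()).split())


def doc_xml_to_wikitext(xml_source):
    """Convert an XML documentation string into MediaWiki markup."""
    root = ET.fromstring(xml_source)
    lines = []
    for member in root.iter("member"):
        # One section per documented member, titled with its ID string.
        lines.append("== {} ==".format(member.get("name", "")))
        summary = member.find("summary")
        if summary is not None:
            lines.append(_text(summary))
        for param in member.findall("param"):
            # MediaWiki definition-list syntax: "; term : definition".
            lines.append("; {} : {}".format(param.get("name", ""), _text(param)))
        returns = member.find("returns")
        if returns is not None:
            lines.append("'''Returns:''' " + _text(returns))
        lines.append("")
    return "\n".join(lines)
```

Empty elements such as `<see cref="..."/>` carry no text and are silently dropped by this sketch; a fuller converter would turn them into wiki links.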
Anyone know of a wiki or wiki plugin that generates a PDF file or CHM file that spans the entire wiki?
I would like to have control of the table of contents.
I would like the internal and external links to work.
Ideally allow for tweaking the output template, but that is not a deal-breaker.
I want to generate content using WIKI syntax and mindset (lots of cross-links etc), but ship the content in PDF, CHM or an embedded application form. Something friendlier than installing the wiki software on the enduser machine...
XWiki does this out of the box.
The MediaWiki PDF Export extension allows you to select a group of pages to export to PDF. I haven't installed it yet, so I'm not sure whether it's easy to use that feature to select all the pages.
Confluence lets you choose pages when you export a space to PDF.
However, you can't customise the PDF much.
You can customise it slightly through a theme (based on Velocity).
Sphinx (https://www.sphinx-doc.org) is a fairly nice tool for generating HTML (or CHM) and PDF documentation, with wiki-like syntax. It is not a wiki; you can't edit through the web and generating HTML requires a build process. Still, it is pretty nice, with cross-references, fairly simple markup, and (in the HTML output) a search engine implemented in JavaScript with no server-side dependencies beyond static file hosting. Sphinx was developed for the new version of the Python documentation and is pretty themable; for example, the GeoServer project (which I work on, excuse the shameless plug) is using Sphinx with a custom theme for the new version of their user and developer manuals.
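For a sense of what the build process involves, a Sphinx project is driven by a `conf.py` file, which is ordinary Python. The following is a minimal sketch with placeholder values (the project and author names are invented), not the configuration of any real project:

```python
# conf.py: minimal Sphinx configuration; every value here is a placeholder.
project = "Example Manual"
author = "Example Team"

# No optional extensions enabled in this sketch.
extensions = []

# "alabaster" is a theme bundled with Sphinx; a custom theme would go here.
html_theme = "alabaster"
```

With that file next to the source documents, running `sphinx-build -b html <sourcedir> <outdir>` produces the HTML site. The `htmlhelp` builder emits the project files for compiling a CHM, and the `latex` builder emits LaTeX that can then be compiled to PDF.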
JIRA (http://www.atlassian.com/software/jira/default.jsp) is your geeky wet dream in terms of control; it exports to PDF (amongst others) and you can have complete control of pages, TOC and other aspects, although expect some complexity to set it up.
Microsoft has an HtmlHelp Authoring tool that can create chm files from html files.
If you need the help files both on the web and within deployed applications, generating the help from the same files used on the web could be a great solution. If the help site was created using ASP.NET (i.e. database-driven), it might be worth using basic styles and creating a tool to generate HTML files by reading in the served pages?
Have a look at: http://msdn.microsoft.com/en-us/library/ms524239(VS.85).aspx
I guess one could then also create a PDF from the HTML pages?