Export a Writer document to PDF with a PDF version other than the default - pdf-generation

For a few months I have been working with LibreOffice 7.4. I am trying to get a PDF with PDF version 1.5 (because LaTeX still does not accept 1.6 as input).
I am using a command like this as described in this issue by Mike Kaganski:
soffice --convert-to pdf:writer_pdf_Export:{"SelectPdfVersion":{"type":"long","value":"15"}} %1
where %1 is a fully qualified Writer document (.odt). This produces the correct PDF, but always with PDF spec 1.6. Changing the value in the command to anything else has no effect.
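Quoting might matter here: below is an untested sketch of the same command with the whole filter string quoted for cmd.exe (the backslash escaping is my assumption, not something from the linked issue):
soffice --convert-to "pdf:writer_pdf_Export:{\"SelectPdfVersion\":{\"type\":\"long\",\"value\":\"15\"}}" %1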
By the way, using the exact syntax as in the link, i.e. 'pdf:writer_pdf ...}}', produces not a *.pdf but a *.'pdf.
What am I doing wrong? Thanks for any help.

Thanks, K J. Though this is not an answer to my question, it led me in the right direction (and thus is even better than an answer). It's certainly true that downgrading any document to a lower version is, or might be, dangerous.
The problem is, however, that \pdfminorversion only works for pdfTeX and not LuaTeX.
According to the answers found here, for LuaTeX (and likely XeTeX, though I didn't check) you have to put either
\pdfvariable minorversion=6
or
\directlua{pdf.setminorversion(6)}
(the latter is not likely to work with XeTeX).
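If the project has to compile under both pdfTeX and LuaTeX, the choice can be automated; a minimal preamble sketch, assuming the iftex package (XeTeX is left out, since there the PDF version is, as far as I know, set by the output driver):
\usepackage{iftex}
\ifPDFTeX
  \pdfminorversion=6          % pdfTeX primitive
\fi
\ifLuaTeX
  \pdfvariable minorversion=6 % LuaTeX backend parameter
\fi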
So I still don't know how to effectively (as opposed to theoretically – if someone knows an answer to this, it is still welcome) downgrade a PDF, but nonetheless my actual problem is solved.

Related

Conversion between knitr and Sweave

This might have been asked before, but so far I couldn't find a really helpful answer.
I am using RStudio with knitr, and a colleague of mine with whom I need to cooperate uses the Sweave format. Is there a good way to convert a script back and forth between the two?
I have already found Sweave2knitr and hoped it would output an .Rmd with all the chunks changed (<<>> to {} etc.), but this is not the case. My main problem is that I would also need the option to convert from .Rmd back to .Rnw so that my colleague can re-edit my revisions.
Thanks a lot!
To process the code chunks and convert the .Rnw file to .tex, you use the knit() function in the knitr package rather than Sweave().
R -e 'library(knitr);knit("my_file.Rnw")'
Sweave2knitr() is for converting old Sweave-based .Rnw files to the knitr syntax.
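Putting the two together, a minimal sketch (report.Rnw is a placeholder; the -knitr suffix is, if I remember correctly, what Sweave2knitr() appends to the output file name by default):
library(knitr)
Sweave2knitr("report.Rnw")    # writes report-knitr.Rnw with the chunk syntax converted
knit("report-knitr.Rnw")      # evaluates the chunks and produces report-knitr.tex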
In RStudio's program defaults (Tools > Global Options > Sweave), change "Weave Rnw files using" from Sweave to knitr.
The Rnw format is really LaTeX with some modifications, whereas the Rmd format is Markdown with some modifications. There are two main flavours of Rnw, the one used by Sweave being the original, and the one used by knitr being a modification of it, but they are very similar.
It's not hard to change Sweave flavoured Rnw to knitr flavoured Rnw (that's what Sweave2knitr does), but changing either one to Rmd would require extensive changes, and probably isn't feasible: certainly I'd expect a lot of manual work after the change.
So for your joint work with a co-author, I would recommend that you settle on a single format, and just use that. I would choose Rmd for this: it's much easier for your co-author to learn Markdown than for you to learn LaTeX. (If you already know LaTeX, that might push the choice the other way.)

Less beautifier - format code

Is there a code beautifier for Less, such as http://www.lonniebest.com/formatcss/ for CSS? I need to sort properties in Less code alphabetically.
I use CSSComb (http://csscomb.com/). It is an npm module, but there are plugins for it; in particular, I use it with Sublime Text.
It works with Less too, although there might be some edge cases that are not (yet) properly handled. But it's good enough for me.
You can order rules however you want. Just read the docs ;)
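For plain alphabetical sorting, a sketch of the config I'd try (my reading of the docs is that sort-order-fallback: "abc" alphabetizes any property not pinned by an explicit sort-order; file names are placeholders):
.csscomb.json:
{
    "sort-order-fallback": "abc"
}
and then:
csscomb --config .csscomb.json styles.less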
You can also use cssbrush. It is based on csscomb and uses it under the hood, but includes a fix for this bug and can also remember which files were previously beautified, so it only beautifies changed files on each run.
Full disclosure, I wrote it.

MediaWiki upgrade breaks File prefix but legacy Image works

I did a MediaWiki upgrade from 1.15.1 to 1.20.2 by following the simple update instructions (basically a new installation, copying over the old LocalSettings.php, running the update script, and copying over the images). The weird thing now is that the File: prefix doesn't work: internal links to images become "file:name of image" URLs rather than "http://mediawiki address/index.php/File:name of image".
Is anybody else getting this? I assume something is wrong with the old LocalSettings.php.
I ran the refreshLinks and refreshImageMetadata maintenance scripts without fixing the problem.
In the comments, you wrote that you have file: added to $wgUrlProtocols. This is very likely what's triggering the problem.
It looks like something has changed in the parser between MW 1.15 and 1.20 so that it's now parsing file:whatever as an external link (since it matches the file: prefix you've defined in $wgUrlProtocols) even if it's inside square brackets.
The obvious workaround would be to change the $wgUrlProtocols entry from file: to file:// so that it will only match if the slashes are there (as they should be, according to standard file: URL syntax). Since your on-wiki filenames are, presumably, very unlikely to begin with double slashes, they should not match this more specific prefix.
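In LocalSettings.php that is a one-line change, assuming the prefix was added there in the usual way:
$wgUrlProtocols[] = 'file://'; // was: $wgUrlProtocols[] = 'file:';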
That said, this could still be considered a bug in MediaWiki. You may want to file a bug report about it, if there isn't one yet.
(Edit: Looks like Mark A. Hershberger filed one already.)

Generating table of contents

I posted this one a couple of months ago on the Mathematica newsgroup, but got no usable response. I thought I'd give SO a try.
The question was: I don't seem to be able to find the method to generate the table of contents of a Mathematica document I'm working on. Anyone know this feature's hideout?
David Annetts pointed me in the direction of AuthorTools, an old v5.1 utility package that's still hidden in Mathematica. However, it doesn't work on my document (v7). Any clue?
Edit
The TOC should contain correct section numbers (if present in the stylesheet) and list page numbers (this requires taking page size settings into account).
Perhaps looking at the code of Yuri Kandrashkin's package, Sidebar, will be useful?

Methods of Parsing Large PDF Files

I have a very large PDF file (200,000 KB or more) which contains a series of pages containing nothing but tables. I'd like to somehow parse this information using Ruby and import the resultant data into a MySQL database.
Does anyone know of any methods for pulling this data out of the PDF? The data is formatted in the following manner:
Name | Address | Cash Reported | Year Reported | Holder Name
Sometimes the Name field overflows into the address field, in which case the remaining columns are displayed on the following line.
Due to the irregular format, I've been stuck on figuring this out. At the very least, could anyone point me to a Ruby PDF library for this task?
UPDATE: I accidentally provided incorrect information! The actual size of the file is 300 MB, or 300,000 KB. I made the change above to reflect this.
I assume you can copy'n'paste text snippets without problems when your PDF is opened in Acrobat Reader or some other PDF Viewer?
Before trying to parse and extract text from such monster files programmatically (even if it's only 200 MByte -- for simple text in tables that's huuuuge, unless you have 200,000 pages...), I would proceed like this:
1. Try to sanitize the file first by re-distilling it.
2. Try different CLI tools to extract the text into a .txt file.
This is a matter of minutes. Writing a Ruby program to do the same certainly is a matter of hours, days or weeks (depending on your knowledge of the PDF file format internals... I suspect you don't have much experience with that yet).
If "2." works, you may halfway be done already. If it works, you also know that doing it programmatically with Ruby is a job that can in principle be solved. If "2." doesn't work, you know it may be extremely hard to achieve programmatically.
Sanitize the 'Monster.pdf':
I suggest using Ghostscript. You can also use Adobe Acrobat Distiller if you have access to it.
gswin32c.exe ^
-o Monster-PDF-sanitized.pdf ^
-sDEVICE=pdfwrite ^
-f Monster.pdf
(I'm curious how much that single command will shrink your output PDF compared to the input.)
Extract text from PDF:
I suggest first trying pdftotext.exe (from the Xpdf folks). There are other, somewhat more inconvenient methods available too, but this might already do the job:
pdftotext.exe ^
-f 1 ^
-l 10 ^
-layout ^
-eol dos ^
-enc Latin1 ^
-nopgbrk ^
Monster-PDF-sanitized.pdf ^
first-10-pages-from-Monster-PDF-sanitized.txt
This will not extract all pages, only 1-10 (as a proof of concept, to see if it works at all). To extract from every page, just leave off the -f 1 -l 10 parameters. You may need to tweak the encoding by changing the parameter to -enc ASCII7 (or UTF-8, UCS-2).
If this doesn't work the quick'n'easy way (because, as sometimes happens, some font in the original PDF uses a "custom encoding vector"), you should ask a new question describing the details of your findings so far. Then you will need to resort to bigger calibres to shoot down the problem.
At the very least, could anyone point me to a Ruby PDF library for this task?
If you haven't done so, you should check out the two previous questions: "Ruby: Reading PDF files," and "ruby pdf parsing gem/library." PDF::Reader, PDF::Toolkit, and Docsplit are some of the relatively popular suggested libraries. There is even a suggestion of using JRuby and some Java PDF library parser.
I'm not sure whether any of these solutions is actually suitable for your problem, especially since you are dealing with such huge PDF files. So unless someone offers a more informative answer, perhaps you should select a library or two and take them for a test drive.
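For instance, a minimal sketch with the pdf-reader gem (untested on files of your size; the file name assumes the sanitized PDF from the earlier answer):
require 'pdf-reader'

reader = PDF::Reader.new('Monster-PDF-sanitized.pdf')
reader.pages.each_with_index do |page, i|
  puts "--- page #{i + 1} ---"
  puts page.text  # raw layout text; splitting the table rows into columns is still up to you
end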
This will be a difficult task, as rendered PDFs have no concept of tabular layout, just lines and text at predetermined locations. It may not be possible to determine what the rows and columns are, though that may depend on the PDF itself.
The Java libraries are the most robust, and they may do more than just extract text. So I would look into JRuby with iText or PDFBox.
Check whether there is any structured content in the PDF. I wrote a blog article explaining this at http://www.jpedal.org/PDFblog/?p=410
If not, you will need to build it.
Maybe the Prawn Ruby library?
