Conversion between knitr and sweave - rstudio

This might have been asked before, but until now I couldn't find a really helpful answer for me.
I am using RStudio with knitr, and a colleague I need to work with uses the Sweave format. Is there a good way to convert a script back and forth between the two?
I have already found Sweave2knitr and hoped it would produce an .Rmd file as output with all chunks changed (<<>> to {} etc.), but this is not the case. My main problem is that I would also need the option to convert from .Rmd back to .Rnw so that my colleague can re-edit my revisions as well.
Thanks a lot!

To process the code chunks and convert the .Rnw file to .tex, you use the knit() function in the knitr package rather than Sweave().
R -e 'library(knitr);knit("my_file.Rnw")'
Sweave2knitr() is for converting old Sweave-based .Rnw files to the knitr syntax.
In RStudio's Sweave options, under "Program defaults", change the setting
"Weave Rnw files using" from Sweave to knitr.

The Rnw format is really LaTeX with some modifications, whereas the Rmd format is Markdown with some modifications. There are two main flavours of Rnw, the one used by Sweave being the original, and the one used by knitr being a modification of it, but they are very similar.
It's not hard to change Sweave flavoured Rnw to knitr flavoured Rnw (that's what Sweave2knitr does), but changing either one to Rmd would require extensive changes, and probably isn't feasible: certainly I'd expect a lot of manual work after the change.
So for your joint work with a co-author, I would recommend that you settle on a single format, and just use that. I would choose Rmd for this: it's much easier for your co-author to learn Markdown than for you to learn LaTeX. (If you already know LaTeX, that might push the choice the other way.)
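To show why the chunk syntax is the easy part, here is a minimal Python sketch (not Sweave2knitr and not part of knitr) that rewrites only the chunk delimiters (<<...>>= / @ versus ```{r ...} / ```) and inline expressions (\Sexpr{} versus `r `) in both directions. Everything else, such as \section{}, \textbf{}, tables, Markdown headings and links, is left untouched, and that is exactly the manual work described above. Sweave-only options such as fig=TRUE are copied verbatim and not translated.

```python
import re

# Chunk delimiters and inline-expression syntax of the two formats.
RNW_OPEN = re.compile(r"^\s*<<(.*)>>=\s*$")             # <<label, opts>>=
RNW_CLOSE = re.compile(r"^\s*@\s*$")                    # @
RMD_OPEN = re.compile(r"^```\{r(?:[ ,]+(.*?))?\}\s*$")  # ```{r label, opts}
RMD_CLOSE = re.compile(r"^```\s*$")                     # ```
SEXPR = re.compile(r"\\Sexpr\{([^{}]*)\}")              # \Sexpr{expr}, no nested braces
INLINE_R = re.compile(r"`r ([^`]*)`")                   # `r expr`


def rnw_to_rmd(text: str) -> str:
    """Rewrite chunk delimiters and inline Sexpr calls; LaTeX prose is left as-is."""
    out, in_chunk = [], False
    for line in text.splitlines():
        m = RNW_OPEN.match(line)
        if m:
            if in_chunk:  # Sweave lets a new chunk start without a closing @
                out.append("```")
            opts = m.group(1).strip()
            out.append("```{r" + (" " + opts if opts else "") + "}")
            in_chunk = True
        elif in_chunk and RNW_CLOSE.match(line):
            out.append("```")
            in_chunk = False
        elif in_chunk:
            out.append(line)  # R code is copied unchanged
        else:
            out.append(SEXPR.sub(r"`r \1`", line))
    if in_chunk:
        out.append("```")
    return "\n".join(out)


def rmd_to_rnw(text: str) -> str:
    """Reverse direction; Markdown prose (#, *, links, ...) is left as-is."""
    out, in_chunk = [], False
    for line in text.splitlines():
        m = None if in_chunk else RMD_OPEN.match(line)
        if m:
            out.append("<<" + (m.group(1) or "").strip() + ">>=")
            in_chunk = True
        elif in_chunk and RMD_CLOSE.match(line):
            out.append("@")
            in_chunk = False
        elif in_chunk:
            out.append(line)  # R code is copied unchanged
        else:
            out.append(INLINE_R.sub(r"\\Sexpr{\1}", line))
    return "\n".join(out)
```

A round trip on a tiny file works, but only because the prose in it contains no LaTeX or Markdown markup; real documents will still need the manual conversion of the text itself.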

Related

How can I convert GOPixbuf strings to images

I have an XML file produced with Gnumeric that contains images, stored as GOPixbuf strings inside XML. They look like this:
eXyA/4KEiP9xcnf/f3+E/3l5ff9xb3L/jo2Q/29wdP+ [truncated]
For each string I have width and height, and a rowstride parameter, like in this example:
<GOImage name="Image(70)" type="GOPixbuf" width="151" height="135" rowstride="604">
Is there a reasonable way to convert that to an image - any format will do?
I'm conversant with Perl and image conversion tools (ImageMagick, GIMP), but I have not found any documentation by googling beyond the GTK or GOffice docs.
You have already found stuff that is helpful. But since there are no Perl bindings for this on CPAN, you would have to make your own if you want to use Perl.
Fortunately, you don't have to know XS to do that. You can use FFI::Platypus to create temporary bindings and only map what you need.
The docs you have probably already found have a Getting started with GOffice section. After a quick check I found that on my recent Ubuntu there is a package that contains that lib. It is called libgoffice-0.10-dev.
Now you can set that up and play around with the lib functions. Somewhere in https://developer.gnome.org/goffice/unstable/GOImage.html there probably is a method to read and convert it.
One of the good ones might be go_image_get_pixbuf(), which returns a GdkPixbuf. GdkPixbuf in turn has very extensive documentation; maybe what you need is in there.
Good luck.
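As an alternative to binding GOffice at all, the attributes in the question hint that the payload may simply be raw pixels: rowstride 604 = 151 × 4, i.e. 4 bytes per pixel with no row padding. The Python sketch below rests on assumptions, not on GOffice documentation: the string is base64, and each pixel is four bytes in R, G, B, A order. If red and blue come out swapped, the data is probably B, G, R, A (as in cairo's little-endian ARGB32), hence the bgra switch. Premultiplied alpha, if present, is not undone. The output is a PAM (P7) file, a simple format that ImageMagick can read.

```python
import base64


def gopixbuf_to_pam(b64: str, width: int, height: int, rowstride: int,
                    bgra: bool = False) -> bytes:
    """Turn a base64 GOPixbuf payload into a PAM (P7) image, assuming 4 bytes per pixel."""
    raw = base64.b64decode(b64)  # non-base64 characters such as newlines are discarded
    row_bytes = width * 4
    if rowstride < row_bytes or len(raw) < rowstride * (height - 1) + row_bytes:
        raise ValueError("data too short for the given width/height/rowstride")
    # Keep the first width*4 bytes of every row; any remaining bytes up to
    # rowstride are padding.
    pixels = bytearray(b"".join(raw[y * rowstride: y * rowstride + row_bytes]
                                for y in range(height)))
    if bgra:  # swap the 1st and 3rd byte of each pixel: B,G,R,A -> R,G,B,A
        pixels[0::4], pixels[2::4] = pixels[2::4], pixels[0::4]
    header = (f"P7\nWIDTH {width}\nHEIGHT {height}\nDEPTH 4\nMAXVAL 255\n"
              "TUPLTYPE RGB_ALPHA\nENDHDR\n").encode("ascii")
    return header + bytes(pixels)
```

For the example element, call it with width=151, height=135, rowstride=604, write the result to e.g. image70.pam, and then convert image70.pam image70.png should give a PNG.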

How to force rstudio/knitr/rmarkdown to use alternative pandoc binary (scholdoc)

scholdoc (see scholarlymarkdown.com) is a fork of pandoc that !FINALLY! has easy referencing of figures/code blocks etc. built in, a central missing piece in pandoc.
Is there any straightforward way to force scholdoc to be used instead of the shipped pandoc binary when using knitr/rmarkdown in RStudio?
When I set in .Rprofile
options(
  rstudio.markdownToHTML = function(inputFile, outputFile) {
    system(
      paste(
        "~/.cabal/bin/scholdoc",
        shQuote(inputFile),
        "-o", shQuote(outputFile)))
  })
as indicated here, this seems to work, but, as it is missing all manner of command line options used by the internal pandoc, produces HTML out of the box and will lead me down a painful way of getting all the CLI options right.
After studying some rmarkdown code, I have also tried to set the environment variable RSTUDIO_PANDOC to contain the path of scholdoc - to no avail.
Can anyone point out an easy way to do this with up-to-date rstudio/scholdoc installations?
I asked this long ago and thought that, for completeness' sake, I'd point out that bookdown has stepped into the arena to provide cross-referencing of figures etc. within R Markdown documents.
After running install.packages('bookdown'), RStudio can be made to use it by adding the following to the YAML header of a document:
output:
  bookdown::pdf_document2:

TeXtoGIF for XeTeX

I need to extend TeXtoGIF to be able to handle XeTeX, ConTeXt, and other more modern implementations of TeX (point being that they can handle multiple fonts). Unfortunately, XeTeX in particular does not support DVI as an output format for its input, and my modifications break.
Please see the diff of changes at GitHub. My changes to the codebase are as follows:
Introduce a variable $cmdTeX to hold the TeX engine (LaTeX, XeLaTeX, etc.)
Add the option -xetex (or anything beginning with an x, really) to specify xelatex as the engine
Substitute the hard-coded latex call with the variable $cmdTeX.
I see two options to fixing this issue:
Coerce XeLaTeX to produce standard DVI output which, IIRC, isn't possible.
Find another sequence of commands (probably a different use of GS, which is why I included the tag, but so be it) to work with PDF output directly instead of DVI
So, I guess the question boils down to:
How can I convert a PDF into GIF without using graphical software?
which, probably, isn't a good fit for SO anymore IMHO.
It sounds like what you have is a patch you would like to submit to the author. Have you contacted him? Unfortunately his software doesn't (appear to) include a license so it may be hard to proceed from a legal standpoint. Most of the time in the open source world, if you encounter a non-responsive (or unwilling) author, you can do as you have already done, fork and patch. At that point you can choose to publish your new version, possibly with a new name, and conforming to the author's license.
From a software standpoint, the code is rather ancient, written for Perl 4. Because Perl has excellent backwards compatibility it will probably still work, but the question is, do you really want to? It may depend on your use case. The original author was making GIFs to use in web pages. If this is what you are doing, you might want to look at MathJax, which lets you use LaTeX directly in your browser/HTML.
I'm posting this as an answer rather than adding it to my question, since it turned out to be a valid solution to my overall issue and should be recorded as such.
I should also note that someone over at TeX.SX pointed me to the standalone class which provides an option convert which, using -shell-escape, can do just about everything I need. Thus,
\documentclass[convert={density=6000,
size=1920x1600,
outext=.png},
border=1cm]{standalone}
\usepackage{fontspec}
\setmainfont{Zapfino}
\pagestyle{empty}
\begin{document}
it's all text
\end{document}
%%% Local Variables:
%%% mode: latex
%%% TeX-engine: xetex
%%% TeX-master: t
%%% End:
ConTeXt is not a modern TeX implementation (like LuaTeX, for instance). It's a macro package for several engines.
Since you want to support specific engines (e.g. XeTeX) and particular macro packages (e.g. ConTeXt), MathJax is not an option. You have to run the TeX engine, create a PDF and post-process that PDF. I don't know why you chose GIF as a format; the vector format SVG would produce much prettier results, and PNG would be my second choice.
Since you are not very specific about your input I assume you deal with multi page input files. You can use ghostscript to convert the PDF to a series of images.
As you said, you require GIF. According to gs -h ghostscript does not support GIF output, so we convert to PNG first:
gs \
-sDEVICE=png256 \
-dNOPAUSE \
-dBATCH \
-dSAFER \
-dTextAlphaBits=4 \
-q \
-r300x300 \
-sOutputFile=output-%02d.png input.pdf
Then use graphicsmagick or imagemagick to convert the PNGs to GIFs:
mogrify -format gif output-*.png
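The whole pipeline (PDF-producing engine, then Ghostscript, then PNG to GIF) can be scripted in place of the DVI path. A Python sketch, assuming xelatex (or another *latex engine that accepts -interaction), gs and ImageMagick's mogrify are on the PATH and that it runs in the directory containing the .tex file; the gs arguments are the ones from the command above. ConTeXt is driven through its own context command with different arguments, so it is not covered here.

```python
import glob
import subprocess
from pathlib import Path


def gs_png_command(pdf: str, dpi: int = 300,
                   pattern: str = "output-%02d.png") -> list[str]:
    """The Ghostscript call shown above, as an argument list."""
    return ["gs", "-sDEVICE=png256", "-dNOPAUSE", "-dBATCH", "-dSAFER",
            "-dTextAlphaBits=4", "-q", f"-r{dpi}x{dpi}",
            f"-sOutputFile={pattern}", pdf]


def tex_to_gifs(tex_file: str, engine: str = "xelatex") -> list[str]:
    """Compile with a PDF-producing engine, rasterise with gs, then PNG -> GIF."""
    subprocess.run([engine, "-interaction=nonstopmode", tex_file], check=True)
    pdf = Path(tex_file).stem + ".pdf"  # the engine writes the PDF to the cwd
    subprocess.run(gs_png_command(pdf), check=True)
    pngs = sorted(glob.glob("output-*.png"))
    if not pngs:
        raise RuntimeError("Ghostscript produced no PNG files")
    # mogrify -format gif writes output-NN.gif next to each output-NN.png
    subprocess.run(["mogrify", "-format", "gif", *pngs], check=True)
    return [str(Path(p).with_suffix(".gif")) for p in pngs]
```

Passing an argument list instead of a shell string avoids quoting problems with file names that contain spaces.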

Methods of Parsing Large PDF Files

I have a very large PDF file (300,000 KB or more) which contains a series of pages containing nothing but tables. I'd like to somehow parse this information using Ruby, and import the resultant data into a MySQL database.
Does anyone know of any methods for pulling this data out of the PDF? The data is formatted in the following manner:
Name | Address | Cash Reported | Year Reported | Holder Name
Sometimes the Name field overflows into the address field, in which case the remaining columns are displayed on the following line.
Due to the irregular format, I've been stuck on figuring this out. At the very least, could anyone point me to a Ruby PDF library for this task?
UPDATE: I accidentally provided incorrect information! The actual size of the file is 300 MB, or 300,000 KB. I made the change above to reflect this.
I assume you can copy'n'paste text snippets without problems when your PDF is opened in Acrobat Reader or some other PDF Viewer?
Before trying to parse and extract text from such monster files programmatically (even if it's 200 MByte only -- for simple text in tables that's huuuuge, unless you have 200000 pages...), I would proceed like this:
1. Try to sanitize the file first by re-distilling it.
2. Try different CLI tools to extract the text into a .txt file.
This is a matter of minutes. Writing a Ruby program to do this certainly is a matter of hours, days or weeks (depending on your knowledge about the PDF fileformat internals... I suspect you don't have much experience of that yet).
If step 2 works, you may already be halfway done. It also tells you that doing it programmatically with Ruby is a job that can in principle be solved. If step 2 doesn't work, you know it may be extremely hard to achieve programmatically.
Sanitize the 'Monster.pdf':
I suggest using Ghostscript. You can also use Adobe Acrobat Distiller if you have access to it.
gswin32c.exe ^
-o Monster-PDF-sanitized.pdf ^
-sDEVICE=pdfwrite ^
-f Monster.pdf
(I'm curious how much that single command will make your output PDF shrink if compared to the input.)
Extract text from PDF:
I suggest first trying pdftotext.exe (from the XPDF folks). There are other, somewhat more inconvenient methods available too, but this might already do the job:
pdftotext.exe ^
-f 1 ^
-l 10 ^
-layout ^
-eol dos ^
-enc Latin1 ^
-nopgbrk ^
Monster-PDF-sanitized.pdf ^
first-10-pages-from-Monster-PDF-sanitized.txt
This will not extract all pages, only pages 1-10 (as a proof of concept, to see if it works at all). To extract every page, just leave off the -f 1 -l 10 parameters. You may need to tweak the encoding by changing the parameter to -enc ASCII7 (or UTF-8, UCS-2).
If this doesn't work the quick'n'easy way (because, as sometimes happens, some font in the original PDF uses a "custom encoding vector"), you should ask a new question describing the details of your findings so far. Then you'll need to resort to bigger calibres to shoot down the problem.
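Once pdftotext -layout gives usable text, the overflow problem from the question can be handled by buffering fields across lines until a full record has been collected. This is a Python sketch (the question targets Ruby, but the logic ports directly). It assumes that columns in the -layout output are separated by two or more spaces and that an overflowed record simply continues on the next line. The column names and the header check are hypothetical and will need adjusting against the real output.

```python
import re

# Hypothetical column names taken from the layout described in the question.
COLUMNS = ["name", "address", "cash_reported", "year_reported", "holder_name"]
SPLIT = re.compile(r"\s{2,}")  # assumption: 2+ spaces separate columns


def parse_layout_text(text: str) -> list[dict]:
    """Group fields from pdftotext -layout lines into 5-column records."""
    records, pending = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("Name "):  # skip blanks and repeated headers
            continue
        pending.extend(SPLIT.split(line))
        # An overflowed record leaves fewer than 5 fields; keep collecting.
        if len(pending) >= len(COLUMNS):
            records.append(dict(zip(COLUMNS, pending[:len(COLUMNS)])))
            pending = pending[len(COLUMNS):]
    return records
```

Each resulting dict maps directly onto one row of the target MySQL table; a record that never completes (fewer than five fields at the end) is silently dropped here, so it is worth logging the leftover pending fields in practice.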
At the very least, could anyone point me to a Ruby PDF library for this task?
If you haven't done so, you should check out the two previous questions: "Ruby: Reading PDF files," and "ruby pdf parsing gem/library." PDF::Reader, PDF::Toolkit, and Docsplit are some of the relatively popular suggested libraries. There is even a suggestion of using JRuby and some Java PDF library parser.
I'm not sure if any of these solutions is actually suitable for your problem, especially since you are dealing with such huge PDF files. So unless someone offers a more informative answer, perhaps you should pick a library or two and take them for a test drive.
This will be a difficult task, as rendered PDFs have no concept of tabular layout, just lines and text in predetermined locations. It may not be possible to determine what are rows and what are columns, but it may depend on the PDF itself.
The Java libraries are the most robust, and may do more than just extract text. So I would look into JRuby with iText or PDFBox.
Check whether there is any structured content in the PDF. I wrote a blog article explaining this at http://www.jpedal.org/PDFblog/?p=410
If not, you will need to build it.
Maybe the Prawn Ruby library?

I need to write a .DDS file cross-platform, can someone point me to example?

I need to create a .DDS file with code that runs on both OSX and Windows. Although the format doesn't look difficult, I'd still like an example of writing the file. Note I don't need to read it, just write it.
C or C++ and RGBA bitmap.
I finally resorted to writing a RAW file and using GraphicConverter (Mac) to read it and write the DDS file. I think Photoshop can do it too. RAW files are simply RGB, RGBA or similar pixel data written straight to a binary file; in the reading application you supply the dimensions so it can read the data in, and then you export to whatever format you need. Not a perfect solution, but it worked for what I needed.
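The RAW route can be cut short: an uncompressed DDS file is just the magic "DDS " plus a fixed 124-byte header followed by the same RGBA bytes. This Python sketch follows the DDS_HEADER / DDS_PIXELFORMAT layout from Microsoft's DirectX documentation. The header is a flat run of little-endian 32-bit fields, so it maps one-to-one onto a C struct written with fwrite; forcing little-endian ("<") keeps the output identical on OS X and Windows. The channel masks assume bytes are stored R, G, B, A in memory; no mipmaps and no compression are written, and some readers may support uncompressed masked formats less well than DXT.

```python
import struct


def dds_bytes(width: int, height: int, rgba: bytes) -> bytes:
    """Return an uncompressed 32-bit RGBA .dds file (header + pixels) as bytes."""
    if len(rgba) != width * height * 4:
        raise ValueError("expected width*height*4 bytes of RGBA data")
    DDSD_CAPS, DDSD_HEIGHT, DDSD_WIDTH, DDSD_PITCH, DDSD_PIXELFORMAT = 0x1, 0x2, 0x4, 0x8, 0x1000
    DDPF_ALPHAPIXELS, DDPF_RGB = 0x1, 0x40
    DDSCAPS_TEXTURE = 0x1000
    flags = DDSD_CAPS | DDSD_HEIGHT | DDSD_WIDTH | DDSD_PITCH | DDSD_PIXELFORMAT
    # dwSize, dwFlags, dwHeight, dwWidth, dwPitchOrLinearSize, dwDepth, dwMipMapCount
    header = struct.pack("<7I", 124, flags, height, width, width * 4, 0, 0)
    header += struct.pack("<11I", *([0] * 11))  # dwReserved1[11]
    # DDS_PIXELFORMAT: size, flags, fourCC, bit count, R/G/B/A bit masks
    header += struct.pack("<8I", 32, DDPF_RGB | DDPF_ALPHAPIXELS, 0, 32,
                          0x000000FF, 0x0000FF00, 0x00FF0000, 0xFF000000)
    header += struct.pack("<5I", DDSCAPS_TEXTURE, 0, 0, 0, 0)  # dwCaps..dwCaps4, dwReserved2
    assert len(header) == 124
    return b"DDS " + header + rgba


def write_dds(path: str, width: int, height: int, rgba: bytes) -> None:
    with open(path, "wb") as f:
        f.write(dds_bytes(width, height, rgba))
```

The header adds exactly 128 bytes (4 for the magic, 124 for the header) in front of the pixel data that would otherwise go into the RAW file.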
