exams2pandoc error message - invalid template - pandoc

I'm new to R and barely know enough to make the basics of R-exams work. I've successfully used it to make exams for printing (exams2pdf) and for uploading into Canvas, but I keep getting the following error when I try to run exams2pandoc:
Error in make_exams_write_pandoc(name = name, type = type, template = template, :
invalid template: exactly 9 '#-' lines required (and 0 found)
I don't understand what it's telling me and need a little direction.
Note: while trying to figure out the problem, exams2pandoc did successfully output a docx file once or twice on one of the sample files (e.g., switzerland.Rmd), but now I keep getting the error message above regardless of the file.
I'm not sure what to try at this point (e.g., tweaking one of the template files), nor do I quite know how to do that. Thanks in advance for any assistance.

Your problem sounds like the exams2pandoc() templates shipped along with the package have been modified/corrupted. I would recommend re-installing the exams package. After that exams2pandoc(c("swisscapital.Rmd", "deriv.Rmd")) should work again and produce a file pandoc1.docx.
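A minimal sketch of that fix, run in a fresh R session (the two .Rmd files are demo exercises shipped with the package):
install.packages("exams")   # re-installing restores the default templates
library("exams")
exams2pandoc(c("swisscapital.Rmd", "deriv.Rmd"))   # should produce pandoc1.docx in the working directory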
If you want to modify the template, this is possible but poorly documented. Also, the template format might change in future versions; it is still a bit ad hoc. The default template is a LaTeX file, plain.tex:
\documentclass[a4paper]{article}
\usepackage[utf8]{inputenc}
\usepackage{a4wide,color,Sweave,url,amsmath,booktabs,longtable}
\begin{document}
%% Exam ##ID##
%% ##Date##
\begin{enumerate}
#-
\item
#-
\textbf{##Questionheader##}\\
#-
##Question##
#-
\begin{enumerate}[(a)]
\item ##Questionlist##
\end{enumerate}
#-
\textbf{##Solutionheader##}\\
#-
##Solution##
#-
\begin{enumerate}[(a)]
\item ##Solutionlist##
\end{enumerate}
#-
#-
\end{enumerate}
\end{document}
You can see that the #- lines are used to divide the template file into several sections, which contain certain ##...## placeholders. If you want to omit the question header, it is simplest to create a copy, say myplain.tex, in which this line is commented out:
%% \textbf{##Questionheader##}\\
Analogously, other parts could be commented or modified. And then you can call exams2pandoc(..., template = "myplain.tex").
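A possible way to set this up from R; note that the "pandoc" subdirectory passed to system.file() below is an assumption and may differ between package versions, so adjust the path if the copy fails:
## copy the shipped default template into the working directory, then edit it there
## NOTE: the "pandoc" subdirectory is an assumption -- check where plain.tex lives in your installation
file.copy(system.file("pandoc", "plain.tex", package = "exams"), "myplain.tex")
## after commenting out the ##Questionheader## line in myplain.tex:
exams2pandoc(c("swisscapital.Rmd", "deriv.Rmd"), template = "myplain.tex")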

Related

JMeter - PDF Conversion Produces Blank PDFs

I know there are similar articles like PDF file conversion in JMeter, but they do not answer the actual problem, which is: when converting a PDF to a variable/object/property and then back to a PDF, the document has the correct number of pages but is "white on white", i.e. blank.
Is there a way to:
1. Create a runtime variable/object/property from an existing PDF file that can be used in a subsequent action. (Here other actions happen in the Test Plan but they do not)
2. Convert the variable/object/property back to a PDF so that, when viewed, it does not contain just blanks.
Note: I do not wish to simply copy one PDF to another PDF.
I have also tried creating a UDV from the PDF using the following (posted on here), without success too.
${__groovy(vars.putObject("hoping_its_a_pdf"), new File("my_original.pdf"))}
Reading other posts here, I have also noticed strange character strings like "%âãÏÓ" when using both vars.putObject and props.put and viewing the values after creation, but as the article said, these are most probably page-break characters or similar, so I have ignored them for now; I assumed it is the conversion, and not these, that is the reason for the blank content.
Can someone please assist, as this is now 4 weeks in and I still have white PDFs.
We cannot state what's wrong with the code which you're copying and pasting from some random sources.
There is no problem with storing a PDF file in JMeter Variables or properties and creating the file back from them.
There are 2 problems with the only piece of "code" you're sharing:
Your way of using the vars.putObject() function is wrong: it takes 2 parameters, the variable name and the object value. See the Top 8 JMeter Java Classes You Should Be Using with Groovy article for more information on this and other JMeter API shorthands.
Apart from this, the function itself is syntactically incorrect: you need to escape any comma in the function with a backslash.
So if you change your:
${__groovy(vars.putObject("hoping_its_a_pdf"), new File("my_original.pdf"))}
to
${__groovy(vars.putObject('hoping_its_a_pdf'\, new File('my_original.pdf')),)}
at least this bit will start working as you expect.
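If you prefer doing this in JSR223 elements rather than the __groovy() function, a rough sketch of the same idea looks like this (the file names are placeholders; storing the raw bytes instead of a File object is an assumption, made to avoid any text-encoding surprises):
// JSR223 Sampler or PreProcessor, language: Groovy
// read the source PDF as raw bytes and keep them in a JMeter variable
byte[] pdfBytes = new File('my_original.pdf').bytes
vars.putObject('hoping_its_a_pdf', pdfBytes)

// ... later, in another JSR223 element: write the stored bytes back out
byte[] stored = (byte[]) vars.getObject('hoping_its_a_pdf')
new File('my_copy.pdf').bytes = stored   // the copy should open with its original content, not blank pages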

Validating YAML file in PhpStorm

I'm working on a project where YAMLs are used (among other use cases) for storing synonym lists. A file may look a little like this:
- "streifen,gestreift"
- "fleeceoverall,fleeceanzug"- "federball,badminton"
- "hochgarage,parkgarage"
In this case, - "federball,badminton" is on the same row as - "fleeceoverall,fleeceanzug", which causes the build of the application to fail with an error stating:
Unexpected characters near "- "federball,badminton".
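For reference, the expected (valid) form is one entry per line:
- "streifen,gestreift"
- "fleeceoverall,fleeceanzug"
- "federball,badminton"
- "hochgarage,parkgarage"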
I tried to configure a code style profile for code inspections as mentioned here in the PhpStorm documentation:
https://www.jetbrains.com/help/phpstorm/customizing-profiles.html?keymap=secondary_default_for_macos
But I don't know what to adjust there. I'm using the IDE default profile for wrapping and braces (which I guess is what I'm looking for ;).
I also took a look at validating my YAML against a JSON schema as mentioned here: https://www.jetbrains.com/help/phpstorm/yaml.html# but ultimately I don't understand how this works :/
So I guess I'm a little lost on how to avoid the errors at build time beforehand and would love some advice!

Is there a way to change the projection in a topojson file?

I am trying to create a topojson file projected using geoAlbersUsa, originating from the US Census's ZCTA (Zip Codes, essentially) shapefile. I was able to successfully get through the examples in the excellent https://medium.com/@mbostock/command-line-cartography-part-1-897aa8f8ca2c using the specified maps, and now I'm trying to get the same result using the Zip Code-level shapefiles.
I keep running into various issues due to the size of the file and the length of the strings within the file. While I have been able to create a geojson file and a topojson file, I haven't been able to give it the geoAlbersUsa projection I want. I was hoping to find something to convert the current topojson file into a topojson file with a geoAlbersUsa projection but I haven't been able to find any way.
I know this can be done programmatically in the browser, but everything I've read indicates that performance will be significantly better if as much as possible can be done in the files themselves first.
Attempt 1: I was able to convert the ZCTA-level shapefile to a geojson file successfully using shp2json (as in Mike Bostock's example) but when I try to run geoproject (from d3-geo-projection) I get errors related to excessive string length. In node (using npm) I installed d3-geo-projection (npm install -g d3-geo-projection) then ran the following:
geoproject "d3.geoAlbersUsa()" < us_zips.geojson > us_zips_albersUsa.json
I get errors stating "Error: Cannot create a string longer than 0x3fffffe7 characters"
Attempt 2: I used ogr2ogr (https://gdal.org/programs/ogr2ogr.html) to create the geojson file (instead of shp2json), then tried to run the same geoproject code as above and got the same error.
Attempt 3: I used ogr2ogr to create a geojson sequence file (instead of a regular geojson file), then ran geo2topo to create the topojson file from that sequence file. While this succeeded in creating the topojson file, it still doesn't include the geoAlbersUsa projection in the resulting topojson file.
I gather from the rather obtuse documentation of ogr2ogr that an output projection can be specified using -a_srs, but I can't for the life of me figure out how to specify something that would get me the geoAlbersUsa projection. I found this reference https://spatialreference.org/ref/sr-org/44/ but I think that would get me the Albers projection, and it may chop off Alaska and Hawaii, which is not what I want.
Any suggestions here? I was hoping I'd find a way to change the projection in the topojson file itself since that would avoid the excessively-long-string issue I seem to run into whenever I try to do anything in node that requires the use of the geojson file. It seems like possibly that was something that could be done in earlier versions of topojson (see Ways to project topojson?) but I don't see any way to do it now.
Not quite an answer, but more than a comment..
So, I Googled just "0x3fffffe7" and found this comment on a random Github/NodeJS project, and based on reading it, my gut feeling is that the node stuff, and/or the D3 stuff you're using is reducing your entire ZCTA-level shapefile down to ....a single string stored in memory!! That's not good for a continent-scale map with such granular detail.
Moreover, the person who left that comment suggested that the OP in that case would need a different approach to introduce their dataset to the client (which I suppose is a browser?). In your case, might it work if you query out each state's collection of zips into its own shapefile (ogr2ogr can do this using OGR_SQL), which would give you 50 different shapefiles? Then, for each of these, run them through your conversions to get json/geoalbers. To test this concept, try exporting just one state and see if everything else works as expected.
That being said, I'm concerned that your approach to this project has an unworkable UI/architectural expectation: I just don't think you can put that much geodata in a browser DIV! How big is the DIV, full screen I hope?!?
My advice would be to think of a different way to present the data. For example, an inset DIV to "select your state"; clicking a state then zooms the main DIV to a larger view of that state and simultaneously pulls down and renders that state's specific ZCTA-level data, using the 50 files you prepped with the strategy I mentioned above. Does that make sense?
Here's a quick example for how I expect you can apply the OGR_SQL to your scenario, adapt to fit:
ogr2ogr idaho_zcta.shp USA_zcta.shp -sql "SELECT * FROM USA_zcta WHERE STATE_NAME = 'ID'"
Parameters as follows:
idaho_zcta.shp < this is your new file
USA_zcta.shp < this is your source shapefile
-sql < this signals the OGR_SQL query expression
As for the query itself, a couple tips. First, wrap the whole query string in double-quotes. If something weird happens, try adding leading and trailing spaces to the start and end of your query, like..
" SELECT ... 'ID' "
It's odd I know, but I once had a situation where it only worked that way.
Second, relative to the query, the table name is the same as the shapefile name, only without the ".shp" file extension. I can't remember whether or not there is case-sensitivity between the shapefile name and the query string's table name. If you run into a problem, give the shapefile an all-lowercase name and use lowercase in the SQL, too.
As for your projection conversion--you're on your own there. geoAlbersUsa looks like it's not an industry-standard (i.e. EPSG-coded) projection but a D3-specific one, intended exclusively for a browser, so ogr2ogr isn't going to handle it. But I agree with the strategy of converting the data in advance. Hopefully the conversion pipeline you already researched will work if you just have much smaller (i.e. state-scale) datasets to put through it.
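For example, the per-state pipeline could look roughly like this, reusing the idaho_zcta.shp file from above and simply mirroring the commands from the question (untested, so treat it as a sketch):
shp2json idaho_zcta.shp > idaho_zcta.geojson
geoproject "d3.geoAlbersUsa()" < idaho_zcta.geojson > idaho_zcta_albers.geojson
geo2topo zips=idaho_zcta_albers.geojson > idaho_zcta_albers.topojson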
Good luck.

How to force rstudio/knitr/rmarkdown to use alternative pandoc binary (scholdoc)

scholdoc (see scholarlymarkdown.com) is a fork of pandoc that !FINALLY! has easy referencing of figures/code blocks etc. built in - a central missing piece in pandoc.
Is there any straight forward way to force usage of scholdoc instead of the shipped pandoc binary when using knitr/rmarkdown in rstudio?
When I set in .Rprofile
options(
  rstudio.markdownToHTML = function(inputFile, outputFile) {
    system(
      paste(
        "~/.cabal/bin/scholdoc",
        shQuote(inputFile),
        "-o", shQuote(outputFile)
      )
    )
  }
)
as indicated here, this seems to work, but since it is missing all manner of command line options used by the internal pandoc, it produces HTML out of the box and would lead me down a painful path of getting all the CLI options right myself.
After studying some rmarkdown code, I have also tried to set the environment variable RSTUDIO_PANDOC to contain the path of scholdoc - to no avail.
Can anyone point out an easy way to do this with up-to-date rstudio/scholdoc installations?
I asked this long ago and thought that, for completeness' sake, I'd point out that bookdown has since stepped into the arena to provide cross-referencing of figures etc. within rmarkdown documents.
After issuing install.packages('bookdown'), RStudio may be coerced to use it by adding the following to the YAML header of a document:
output:
  bookdown::pdf_document2:
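A minimal sketch of a document using this output format (the chunk label cars-plot and the caption are made up for illustration; figures need a fig.cap so that bookdown's \@ref() cross-references can pick them up):
---
title: "Example"
output:
  bookdown::pdf_document2: default
---

See Figure \@ref(fig:cars-plot).

```{r cars-plot, fig.cap = "Speed and stopping distance"}
plot(cars)
```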

Methods of Parsing Large PDF Files

I have a very large PDF File (200,000 KB or more) which contains a series of pages containing nothing but tables. I'd like to somehow parse this information using Ruby, and import the resultant data into a MySQL database.
Does anyone know of any methods for pulling this data out of the PDF? The data is formatted in the following manner:
Name | Address | Cash Reported | Year Reported | Holder Name
Sometimes the Name field overflows into the address field, in which case the remaining columns are displayed on the following line.
Due to the irregular format, I've been stuck on figuring this out. At the very least, could anyone point me to a Ruby PDF library for this task?
UPDATE: I accidentally provided incorrect information! The actual size of the file is 300 MB, or 300,000 KB. I made the change above to reflect this.
I assume you can copy'n'paste text snippets without problems when your PDF is opened in Acrobat Reader or some other PDF Viewer?
Before trying to parse and extract text from such monster files programmatically (even if it's 200 MByte only -- for simple text in tables that's huuuuge, unless you have 200000 pages...), I would proceed like this:
1. Try to sanitize the file first by re-distilling it.
2. Try with different CLI tools to extract the text into a .txt file.
This is a matter of minutes. Writing a Ruby program to do this certainly is a matter of hours, days or weeks (depending on your knowledge of the PDF file format internals... I suspect you don't have much experience with those yet).
If "2." works, you may be halfway done already, and you also know that doing it programmatically with Ruby is a job that can in principle be solved. If "2." doesn't work, you know it may be extremely hard to achieve programmatically.
Sanitize the 'Monster.pdf':
I suggest using Ghostscript. You can also use Adobe Acrobat Distiller if you have access to it.
gswin32c.exe ^
-o Monster-PDF-sanitized.pdf ^
-sDEVICE=pdfwrite ^
-f Monster.pdf
(I'm curious how much that single command will make your output PDF shrink compared to the input.)
Extract text from PDF:
I suggest first trying pdftotext.exe (from the XPDF folks). There are other, somewhat less convenient methods available too, but this might do the job already:
pdftotext.exe ^
-f 1 ^
-l 10 ^
-layout ^
-eol dos ^
-enc Latin1 ^
-nopgbrk ^
Monster-PDF-sanitized.pdf ^
first-10-pages-from-Monster-PDF-sanitized.txt
This will not extract all pages but only 1-10 (as a proof of concept, to see if it works at all). To extract from every page, just leave off the -f 1 -l 10 parameters. You may need to tweak the encoding by changing the parameter to -enc ASCII7 (or UTF-8, UCS-2).
If this doesn't work the quick'n'easy way (because, as sometimes happens, some font in the original PDF uses a "custom encoding vector"), you should ask a new question describing the details of your findings so far. Then you will need to resort to bigger calibres to shoot down the problem.
At the very least, could anyone point me to a Ruby PDF library for this task?
If you haven't done so, you should check out the two previous questions: "Ruby: Reading PDF files," and "ruby pdf parsing gem/library." PDF::Reader, PDF::Toolkit, and Docsplit are some of the relatively popular suggested libraries. There is even a suggestion of using JRuby and some Java PDF library parser.
I'm not sure if any of these solutions is actually suitable for your problem, especially since you are dealing with such huge PDF files. So unless someone offers a more informative answer, perhaps you should select a library or two and take them for a test drive.
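If you do take one for a test drive, a rough sketch with the pdf-reader gem (PDF::Reader) might look like the following; splitting rows on "|" and the five-column layout are assumptions based on the format described in the question, and the file name is a placeholder:
require 'pdf-reader'

reader = PDF::Reader.new('monster.pdf')
reader.pages.each do |page|
  page.text.each_line do |line|
    fields = line.split('|').map(&:strip)
    next unless fields.length == 5          # skip headers and overflow/continuation lines for now
    name, address, cash, year, holder = fields
    # ... insert the row into MySQL here, e.g. via the mysql2 gem
  end
end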
This will be a difficult task, as rendered PDFs have no concept of tabular layout, just lines and text at predetermined locations. Whether you can determine which pieces of text are rows and which are columns may depend on the PDF itself.
The Java libraries are the most robust, and may do more than just extract text. So I would look into JRuby with iText or PDFBox.
Check whether there is any structured content in the PDF. I wrote a blog article explaining this at http://www.jpedal.org/PDFblog/?p=410
If not, you will need to build it.
Maybe the Prawn Ruby library?
