Pandoc: Citing a Full Source - pandoc

for the purpose of creating a syllabus, I like to know whether it is possible to insert a citation as a full citation. Right now, I have following markdown code:
# Session 1
#zhu2015.
This converts (pandoc "document.md" -o "document.pdf" --from markdown --template "eisvogel" --listings --citeproc) in the pdf as
Session 1
Zhu and Basar (2015).
Bibliography
Zhu, Quanyan, and Tamer Basar. 2015. “Game-Theoretic Methods for Robustness, Security, and Resilience of Cyberphysical Control Systems: Games-in-Games Principle for Optimal Cross-Layer Resilient Control Systems.” Control Systems, IEEE 35 (1): 46–65.
However, would it be possible to insert the reference as a full-citation in text?
Such as:
Session 1
Zhu, Quanyan, and Tamer Basar. 2015. “Game-Theoretic Methods for Robustness, Security, and Resilience of Cyberphysical Control Systems: Games-in-Games Principle for Optimal Cross-Layer Resilient Control Systems.” Control Systems, IEEE 35 (1): 46–65.
Thanks for your help!

Here's how to do this with a Lua filter: First, the filter finds the generated bibliography entry and saves it to a table, indexed by the citation key. Then it looks for the citation and replaces it with the full entry.
local refs = {}
local function store_refs (div)
local ref_id = div.identifier:match 'ref%-(.*)$'
if ref_id then
refs[ref_id] = div.content
end
end
local function replace_cite (cite)
local citation = cite.citations[1]
if citation and refs[citation.id] and #cite.citations == 1 then
return pandoc.utils.blocks_to_inlines(refs[citation.id])
end
end
return {
{Div = store_refs},
{Cite = replace_cite},
}
Save the above to a file and pass that file to pandoc with the --lua-filter command line option. The filter must run after the citeproc processer has done its work, so it should be the last command line argument. Tested with the latest pandoc version 2.12 (which no longer requires pandoc-citeproc, but it should work either way).

Related

What is wrong with this PDF file?

I have to work with a PDF form created by a person unknown to me. Why did the program with which the form was created (Word + PDF export?) split the term "Stunde" into "S", "t" and "unde" in line 6909 of the decoded PDF? There is no visual break between the three parts.
/TT1 1 Tf
11.04 0 0 11.04 59.16 476.1203 Tm
(Datum)Tj
/C2_1 1 Tf
<0003>Tj
/TT1 1 Tf
(der)Tj
0.424 -1.315 Td
(Tätigkeit)Tj
-0.0022 Tc 0 11.04 -11.04 0 261.24 437.7203 Tm
[(Ve)-4.6<7267fc74>-4.2(ungssat)-4.2(z)]TJ
/C2_1 1 Tf
0 Tc <0003>Tj
/TT1 1 Tf
-0.0021 Tc 0.935 -1.315 Td
[<2880>-6.1(/)-7.2(S)0.8(t)-4.1(unde)-4.5(\))]TJ % <<< the important line
0 Tc 11.04 0 0 11.04 340.92 468.8003 Tm
(Anlass/Art)Tj
/C2_1 1 Tf
resulting in
[]
To get the source code above, I decoded the PDF file as described here. I have no know-how concerning the PDF file format.
Background: I had to replace the word "Stunde", it drove me crazy to find the place where "Stunde" was written (in parts) within the source code, since no free PDF editor seems to be able to work with horizontal text without problems.
Academic Bonus questions: Is it possible to set the sum over a column as default value for a form field? (Modifiable; changed every time the column is changed.) Why was I able to replace "Stunde" with "Einsatz" without making the PDF file corrupt due to now irregular offsets?
Why did the program with which the form was created (Word + PDF export?) split the term "Stunde" into "S", "t" and "unde" in line 6909 of the decoded PDF?
As #gettalong mentioned in his answer, in your case this most likely has been done to apply kerning.
If you start looking into the outputs of some other PDF producers, you'll see that this export from Word actually is very unobtrusive in regard to splitting words:
there are PDF producers that draw each character individually after explicitly setting the text matrix for it, and
there also are PDF producers that have the width information for the characters of the used fonts set to zero and use the numbers in TJ instructions to forward the current text matrix between characters accordingly.
And this doesn't cover all the variants to be found, not by far...
Thus,
I had to replace the word "Stunde", it drove me crazy to find the place where "Stunde" was written (in parts) within the source code
in your case replacing actually was a fairly trivial task...
Is it possible to set the sum over a column as default value for a form field? (Modifiable; changed every time the column is changed.)
If all the column values in question are stored in form fields, you can use JavaScript to recalculate sums after form changes. To have it serve as "default" only, you can use some other (hidden) field for a flag whether the field has already been touched. Beware, though: JavaScript is not supported by all PDF viewers. Furthermore, the JavaScript object model for PDF is not specified in an independent (like ISO) specification but in an Adobe one which can make interpretation of the specification biased.
Why was I able to replace "Stunde" with "Einsatz" without making the PDF file corrupt due to now irregular offsets?
As we don't know how exactly you applied the changes, this obviously is hard to tell.
Most likely, though, you did corrupt the PDF and the PDF viewers you opened it in merely repair the corruption under the hood. There is a strong tendency in PDF viewers to do such under-the-hood repairs without informing the user; the result is that a large part of the PDFs in the wild actually being broken.
You don't see a visual break but the standard distance between "S", "t" and "unde" has been changed nonetheless. This is done by PDF writers that support e.g. kerning so that the word appear nicer. This is the reason why it is split that way.

Is there any way for a section to link back to the table of contents in pandoc?

So I'm able to create a nice table of contents with pandoc --toc option, but I was wondering if there is any way of linking the header or a symbol near the heading back to the table of contents. For example, when you create a footnote in pandoc, it links the subscript number to the bottom of the page. At the end of the note, there is this little sign (↩︎) with a link for going back to the line where the footnote was. I'd like to do this with my table of contents for each header. I don't mind not use --toc, and instead manually writing out the table of contents, but I not sure whether this particular feature was available. Any tips would be very helpful!
A Lua filter can be used to add a link back to the TOC.
local link_to_toc = pandoc.Link({pandoc.Str '↑'}, '#TOC')
function Header (h)
h.content = h.content .. {pandoc.Space(), link_to_toc}
return h
end
Save the above into a file and pass it to pandoc via the --lua-filter (or -L) command line option.
Linking to a specific line in the TOC is not possible though.

SPSS syntax for naming individual analyses in output file outline

I have created syntax in SPSS that gives me 90 separate iterations of general linear model, each with slightly different variations fixed factors and covariates. In the output file, they are all just named as "General Linear Model." I have to then manually rename each analysis in the output, and I want to find syntax that will add a more specific name to each result that will help me identify it out of the other 89 results (e.g. "General Linear Model - Males Only: Mean by Gender w/ Weight covariate").
This is an example of one analysis from the syntax:
USE ALL.
COMPUTE filter_$=(Muscle = "BICEPS" & Subj = "S1" & SMU = 1 ).
VARIABLE LABELS filter_$ 'Muscle = "BICEPS" & Subj = "S1" & SMU = 1 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMATS filter_$ (f1.0). FILTER BY filter_$.
EXECUTE.
GLM Frequency_Wk6 Frequency_Wk9
Frequency_Wk12 Frequency_Wk16
Frequency_Wk20
/WSFACTOR=Time 5 Polynomial
/METHOD=SSTYPE(3)
/PLOT=PROFILE(Time)
/EMMEANS=TABLES(Time)
/CRITERIA=ALPHA(.05)
/WSDESIGN=Time.
I am looking for syntax to add to this that will name this analysis as: "S1, SMU1 BICEPS, GLM" Not to name the whole output file, but each analysis within the output so I don't have to do it one-by-one. I have over 200 iterations at times that come out in a single output file, and renaming them individually within the output file is taking too much time.
Making an assumption that you are exporting the models to Excel (please clarify otherwise).
There is an undocumented command (OUTPUT COMMENT TEXT) that you can utilize here, though there is also a custom extension TEXT also designed to achieve the same but that would need to be explicitly downloaded via:
Utilities-->Extension Bundles-->Download And Install Extension Bundles--->TEXT
You can use OUTPUT COMMENT TEXT to assign a title/descriptive text just before the output of the GLM model (in the example below I have used FREQUENCIES as an example).
get file="C:\Program Files\IBM\SPSS\Statistics\23\Samples\English\Employee data.sav".
oms /select all /if commands=['output comment' 'frequencies'] subtypes=['comment' 'frequencies']
/destination format=xlsx outfile='C:\Temp\ExportOutput.xlsx' /tag='ExportOutput'.
output comment text="##Model##: This is a long/descriptive title to help me identify the next model that is to be run - jobcat".
freq jobcat.
output comment text="##Model##: This is a long/descriptive title to help me identify the next model that is to be run - gender".
freq gender.
output comment text="##Model##: This is a long/descriptive title to help me identify the next model that is to be run - minority".
freq minority.
omsend tag=['ExportOutput'].
You could use TITLE command here also but it is limited to only 60 characters.
You would have to change the OMS tags appropriately if using TITLE or TEXT.
Edit:
Given the OP wants to actually add a title to the left hand pane in the output viewer, a solution for this is as follows (credit to Albert-Jan Roskam for the Python code):
First save the python file "editTitles.py" to a valid Python search path (for example (for me anyway): "C:\ProgramData\IBM\SPSS\Statistics\23\extensions")
#editTitles.py
import tempfile, os, sys
import SpssClient
def _titleToPane():
"""See titleToPane(). This function does the actual job"""
outputDoc = SpssClient.GetDesignatedOutputDoc()
outputItemList = outputDoc.GetOutputItems()
textFormat = SpssClient.DocExportFormat.SpssFormatText
filename = tempfile.mktemp() + ".txt"
for index in range(outputItemList.Size()):
outputItem = outputItemList.GetItemAt(index)
if outputItem.GetDescription() == u"Page Title":
outputItem.ExportToDocument(filename, textFormat)
with open(filename) as f:
outputItem.SetDescription(f.read().rstrip())
os.remove(filename)
return outputDoc
def titleToPane(spv=None):
"""Copy the contents of the TITLE command of the designated output document
to the left output viewer pane"""
try:
outputDoc = None
SpssClient.StartClient()
if spv:
SpssClient.OpenOutputDoc(spv)
outputDoc = _titleToPane()
if spv and outputDoc:
outputDoc.SaveAs(spv)
except:
print "Error filling TITLE in Output Viewer [%s]" % sys.exc_info()[1]
finally:
SpssClient.StopClient()
Re-start SPSS Statistics and run below as a test:
get file="C:\Program Files\IBM\SPSS\Statistics\23\Samples\English\Employee data.sav".
title="##Model##: jobcat".
freq jobcat.
title="##Model##: gender".
freq gender.
title="##Model##: minority".
freq minority.
begin program.
import editTitles
editTitles.titleToPane()
end program.
The TITLE command will initially add a title to main output viewer (right hand side) but then the python code will transfer that text to the left hand pane output tree structure. As mentioned already, note TITLE is capped to 60 characters only, a warning will be triggered to highlight this also.
This editTitles.py approach is the closest you are going to get to include a descriptive title to identify each model. To replace the actual title "General Linear Model." with a custom title would require scripting knowledge and would involve a lot more code. This is a simpler alternative approach. Python integration required for this to work.
Also consider using:
SPLIT FILE SEPARATE BY <list of filter variables>.
This will automatically produce filter labels in the left hand pane.
This is easy to use for mutually exclusive filters but even if you have overlapping filters you can re-run multiple times (and have filters applied to get as close to your desired set of results).
For example:
get file="C:\Program Files\IBM\SPSS\Statistics\23\Samples\English\Employee data.sav".
sort cases by jobcat minority.
split file separate by jobcat minority.
freq educ.
split file off.

Applying a diff-patch to a string/file

For an offline-capable smartphone app, I'm creating a one-way text sync for Xml files. I'd like my server to send the delta/difference (e.g. a GNU diff-patch) to the target device.
This is the plan:
Time = 0
Server: has version_1 of Xml file (~800 kiB)
Client: has version_1 of Xml file (~800 kiB)
Time = 1
Server: has version_1 and version_2 of Xml file (each ~800 kiB)
computes delta of these versions (=patch) (~10 kiB)
sends patch to Client (~10 kiB transferred)
Client: computes version_2 from version_1 and patch <= this is the problem =>
Is there a Ruby library that can do this last step to apply a text patch to files/strings? The patch can be formatted as required by the library.
Thanks for your help!
(I'm using the Rhodes Cross-Platform Framework, which uses Ruby as programming language.)
Your first task is to choose a patch format. The hardest format for humans to read (IMHO) turns out to be the easiest format for software to apply: the ed(1) script. You can start off with a simple /usr/bin/diff -e old.xml new.xml to generate the patches; diff(1) will produce line-oriented patches but that should be fine to start with. The ed format looks like this:
36a
<tr><td class="eg" style="background: #182349;"> </td><td><tt>#182349</tt></td></tr>
.
34c
<tr><td class="eg" style="background: #66ccff;"> </td><td><tt>#xxxxxx</tt></td></tr>
.
20,23d
The numbers are line numbers, line number ranges are separated with commas. Then there are three single letter commands:
a: add the next block of text at this position.
c: change the text at this position to the following block. This is equivalent to a d followed by an a command.
d: delete these lines.
You'll also notice that the line numbers in the patch go from the bottom up so you don't have to worry about changes messing up the lines numbers in subsequent chunks of the patch. The actual chunks of text to be added or changed follow the commands as a sequence of lines terminated by a line with a single period (i.e. /^\.$/ or patch_line == '.' depending on your preference). In summary, the format looks like this:
[line-number-range][command]
[optional-argument-lines...]
[dot-terminator-if-there-are-arguments]
So, to apply an ed patch, all you need to do is load the target file into an array (one element per line), parse the patch using a simple state machine, call Array#insert to add new lines and Array#delete_at to remove them. Shouldn't take more than a couple dozen lines of Ruby to write the patcher and no library is needed.
If you can arrange your XML to come out like this:
<tag>
blah blah
</tag>
<other-tag x="y">
mumble mumble
</other>
rather than:
<tag>blah blah</tag><other-tag x="y">mumble mumble</other>
then the above simple line-oriented approach will work fine; the extra EOLs aren't going to cost much space so go for easy implementation to start.
There are Ruby libraries for producing diffs between two arrays (google "ruby algorithm::diff" to start). Combining a diff library with an XML parser will let you produce patches that are tag-based rather than line-based and this might suit you better. The important thing is the choice of patch formats, once you choose the ed format (and realize the wisdom of the patch working from the bottom to the top) then everything else pretty much falls into place with little effort.
I know this question is almost five years old, but I'm going to post an answer anyway. When searching for how to make and apply patches for strings in Ruby, even now, I was unable to find any resources that answer this question satisfactorily. For that reason, I'll show how I solved this problem in my application.
Making Patches
I'm assuming you're using Linux, or else have access to the program diff through Cygwin. In that case, you can use the excellent Diffy gem to create ed script patches:
patch_text = Diffy::Diff.new(old_text, new_text, :diff => "-e").to_s
Applying Patches
Applying patches is not quite as straightforward. I opted to write my own algorithm, ask for improvements in Code Review, and finally settle on using the code below. This code is identical to 200_success's answer except for one change to improve its correctness.
require 'stringio'
def self.apply_patch(old_text, patch)
text = old_text.split("\n")
patch = StringIO.new(patch)
current_line = 1
while patch_line = patch.gets
# Grab the command
m = %r{\A(?:(\d+))?(?:,(\d+))?([acd]|s/\.//)\Z}.match(patch_line)
raise ArgumentError.new("Invalid ed command: #{patch_line.chomp}") if m.nil?
first_line = (m[1] || current_line).to_i
last_line = (m[2] || first_line).to_i
command = m[3]
case command
when "s/.//"
(first_line..last_line).each { |i| text[i - 1].sub!(/./, '') }
else
if ['d', 'c'].include?(command)
text[first_line - 1 .. last_line - 1] = []
end
if ['a', 'c'].include?(command)
current_line = first_line - (command=='a' ? 0 : 1) # Adds are 0-indexed, but Changes and Deletes are 1-indexed
while (patch_line = patch.gets) && (patch_line.chomp! != '.') && (patch_line != '.')
text.insert(current_line, patch_line)
current_line += 1
end
end
end
end
text.join("\n")
end

Convert a .rtf into a mac .r resource, in a scriptable way

I currently have a SLA in a .rtf format, which is to be integrated into .dmg using the intermediary .r mac resource format, which is used by the Rez utility. I had already done it by hand once, but updates made to the .rtf file are overwhelming to propagate to the disk image, and error-prone. I would like to automate this task, which could also help adding other languages or variants.
How could the process of .rtf to .r text conversion be automated?
Thanks.
Only because I didn't fully understand how the accepted answer actually achieved the goal, I use a combination of a script to generate the hex encoding:
#!/usr/bin/env ruby
# Makes resource (.r) text from binaries.
def usage
puts "usage: #{$0} infile"
puts ""
puts " infile The file to convert (the output will go to stdout)"
exit 1
end
infile = ARGV[0] || usage
data = File.read(infile)
data.bytes.each_slice(16) do |slice|
hex = slice.each_slice(2).map { |pair| pair.pack('C*').unpack('H*')[0] }.join(' ')
# We could put the comments in too, but it probably isn't a big deal.
puts "\t$\"#{hex}\""
end
The output of this is inserted into a variable during the build and then the variable ends up in a template (we're using Ant to do this, but the specifics aren't particularly interesting):
data 'RTF ' (5000, "English SLA") {
#english.licence#
};
The one bit of this which did take quite a while to figure out is that 'RTF ' can be used for the resource directly. The Apple docs say to separately insert 'TEXT' (with just the plain text) and 'styl' (with just the style). There are tools to do this of course, but it was one more tool to run and I could never figure out how to make hyperlinks work in the resulting DMG. With 'RTF ', hyperlinks just work.
Hoping that this saves someone time in the future.
Use the unrtf port (from macports), then format the lines, heading and tail with a shell script.

Resources