PDFClown Copy annotations and then manipulate them - pdfclown

I need to copy annotations from one PDF file to another. I have used the excellent PDFClown library but am unable to manipulate things like color, rotation, etc. Is this possible? I can see the base object information but am also unsure how to manipulate that directly.
I can copy the appearance via cloning appearance but can't "edit" it.
Thanks in advance.
Alex
P.S. If Stephano, the author, is listening: is the project dead?

On annotations in general and Callout annotations in particular
I looked into it a bit, and I'm afraid there is not much you can deterministically manipulate for arbitrary inputs using high level methods. The reason is that there are numerous alternative ways to set the appearance of a Callout annotation, and PDF Clown only supports the lower-priority ones with explicit high level methods. From high priority downwards:
An explicit appearance in an AP stream. If it is given, it is used, ignoring whether this appearance looks like a Callout annotation at all, let alone like one defined by the other Callout properties.
PDF Clown does not create an appearance for callout annotations from the other values yet, let alone update existing appearances to follow up to some specific attribute (e.g. Color) change. For ISO 32000-2 support, PDF Clown here will have to improve as appearance streams have become mandatory.
If it exists, you can retrieve the appearance using getAppearance() but you only get a FormXObject with its low level drawing instructions, nothing Callout specific.
One thing you can manipulate quite easily given a FormXObject, though: you can rotate or skew the appearance by setting its Matrix accordingly, e.g.
annotation.getAppearance().getNormal().get(null).setMatrix(AffineTransform.getRotateInstance(100, 10));
A rich text string in the RC string or stream. Unless an appearance is given, the text in the Callout text box is generated from this rich text datum (rich text here uses a XHTML 1.0 subset for formatting).
PDF Clown does not create a rich text representation of the Callout text yet, let alone update existing ones to follow up to some specific attribute (e.g. Color) change.
If it exists, you can retrieve the rich text by low level access using getBaseDataObject().get(PdfName.RC), change this string or stream, and set it again using getBaseDataObject().put(PdfName.RC, ...). Similarly you can retrieve, manipulate, and set the rich text default style string using its name PdfName.DS instead.
A number of separate settings for individual aspects, from which the Callout is built in the absence of an appearance stream and (as far as the text content is concerned) a rich text string.
PDF Clown supports (many of) these attributes, in particular if you cast the cloned annotation to StaticNote, e.g. the opacity CA using get/set/withAlpha, the border Border / BS using get/set/withBorder, the background color C using get/set/withColor, ...
By the way, PDF Clown has an error in its line ending style LE support: apparently the code for the Line annotation LE property was copied without checking; unfortunately the attribute follows a different syntax there...
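To illustrate the Matrix manipulation from the first item: the two-argument AffineTransform.getRotateInstance(vecx, vecy) from java.awt.geom takes a rotation vector, not an angle, so (100, 10) rotates by atan2(10, 100), which is a bit less than 6 degrees. A minimal sketch that is independent of PDF Clown (the class and helper names are made up for illustration):

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

/** Shows what the two flavours of AffineTransform.getRotateInstance produce. */
final class RotationVectorDemo {

    /** Returns the angle, in degrees, encoded in a pure rotation transform. */
    static double rotationDegrees(AffineTransform transform) {
        // For a rotation matrix, scaleX = cos(theta) and shearY = sin(theta).
        return Math.toDegrees(Math.atan2(transform.getShearY(), transform.getScaleX()));
    }

    public static void main(String[] args) {
        // Two-argument form: the arguments are a rotation vector (vecx, vecy).
        AffineTransform byVector = AffineTransform.getRotateInstance(100, 10);
        System.out.println("vector (100, 10) rotates by " + rotationDegrees(byVector) + " degrees");

        // One-argument form: the argument is an angle in radians.
        AffineTransform byAngle = AffineTransform.getRotateInstance(Math.toRadians(30));
        Point2D rotated = byAngle.transform(new Point2D.Double(1, 0), null);
        System.out.println("(1, 0) rotated by 30 degrees -> " + rotated);
    }
}
```

If you want to rotate by a given angle, the one-argument form with Math.toRadians(...) is probably the more readable choice.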
Your tasks
Concerning the attributes you stated you want to change, therefore,
Rotation: There is no rotation attribute in the Callout annotation per se (other than the flag whether or not to follow the page rotation). Thus, you cannot set a rotation as a simple annotation attribute. If the source annotation does have an appearance stream, though, you can manipulate its Matrix to rotate it inside the annotation rectangle, see above.
Border color and font: If your Callout has an appearance stream, you can try and parse its content using a ContentScanner and manipulate color and font setting operations. Otherwise, if rich text information is set, for the font you can try and parse the rich text using some XML parser and manipulate font style attributes. Otherwise, you can parse the default appearance DA string and manipulate its font and color setting instructions.
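For the rich text route, a sketch of what "parse the rich text using some XML parser" could look like with the JDK's own DOM classes. It assumes the RC value is well-formed XHTML without a DOCTYPE and that fonts are given as font-family declarations in style attributes; the font shorthand property and upper-case property names are not handled. The class and method names are hypothetical, and the string would still have to be read from and written back to the annotation as described above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: rewrites font-family declarations in a rich text string. */
final class RichTextFontRewriter {

    /** Parses the rich text, replaces every font-family value, and serializes it again. */
    static String setFontFamily(String richText, String newFamily) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(richText)));

            // "*" matches every element, including the root <body>.
            NodeList elements = doc.getElementsByTagName("*");
            for (int i = 0; i < elements.getLength(); i++) {
                Element element = (Element) elements.item(i);
                if (element.hasAttribute("style")) {
                    String style = element.getAttribute("style");
                    element.setAttribute("style", replaceFontFamily(style, newFamily));
                }
            }

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Rich text is not parseable XML", e);
        }
    }

    /** Replaces the font-family value in a CSS-like declaration list, keeping the order. */
    static String replaceFontFamily(String style, String newFamily) {
        Map<String, String> declarations = new LinkedHashMap<>();
        for (String declaration : style.split(";")) {
            int colon = declaration.indexOf(':');
            if (colon > 0) {
                declarations.put(declaration.substring(0, colon).trim(),
                        declaration.substring(colon + 1).trim());
            }
        }
        if (declarations.containsKey("font-family")) {
            declarations.put("font-family", newFamily);
        }
        return declarations.entrySet().stream()
                .map(entry -> entry.getKey() + ":" + entry.getValue())
                .collect(Collectors.joining(";"));
    }

    public static void main(String[] args) {
        String richText = "<body xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<p><span style=\"font-size:12.0pt;font-family:Helvetica\">text</span></p></body>";
        System.out.println(setFontFamily(richText, "Courier"));
    }
}
```

Compared to a plain string replace, this also catches font families other than Helvetica and leaves the remaining style declarations untouched.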
Some example code
I created a file with an example Callout annotation using Adobe Acrobat: Callout-Yellow.pdf. It contains an appearance stream, rich text, and simple attributes, so one can use this file for example manipulations at different levels.
Then I applied this code to it with different values for keepAppearanceStream and keepRichText (you didn't mention whether you used PDF Clown for Java or .Net, so I chose Java; a port to .Net should be trivial, though...):
boolean keepAppearanceStream = ...;
boolean keepRichText = ...;
try (   InputStream sourceResource = GET_STREAM_FOR("Callout-Yellow.pdf");
        InputStream targetResource = GET_STREAM_FOR("test123.pdf");
        org.pdfclown.files.File sourceFile = new org.pdfclown.files.File(sourceResource);
        org.pdfclown.files.File targetFile = new org.pdfclown.files.File(targetResource);   ) {
    Document sourceDoc = sourceFile.getDocument();
    Page sourcePage = sourceDoc.getPages().get(0);
    Annotation<?> sourceAnnotation = sourcePage.getAnnotations().get(0);

    Document targetDoc = targetFile.getDocument();
    Page targetPage = targetDoc.getPages().get(0);

    StaticNote targetAnnotation = (StaticNote) sourceAnnotation.clone(targetDoc);

    if (keepAppearanceStream) {
        // changing properties of an appearance
        // rotating the appearance in the appearance rectangle
        targetAnnotation.getAppearance().getNormal().get(null).setMatrix(AffineTransform.getRotateInstance(100, 10));
    } else {
        // removing the appearance to allow lower level properties changes
        targetAnnotation.setAppearance(null);
    }

    // changing text background color
    targetAnnotation.setColor(new DeviceRGBColor(0, 0, 1));

    if (keepRichText) {
        // changing rich text properties
        PdfString richText = (PdfString) targetAnnotation.getBaseDataObject().get(PdfName.RC);
        String richTextString = richText.getStringValue();
        // replacing the font family
        richTextString = richTextString.replaceAll("font-family:Helvetica", "font-family:Courier");
        richText = new PdfString(richTextString);
        targetAnnotation.getBaseDataObject().put(PdfName.RC, richText);
    } else {
        targetAnnotation.getBaseDataObject().remove(PdfName.RC);
        targetAnnotation.getBaseDataObject().remove(PdfName.DS);
    }

    // changing default appearance properties
    PdfString defaultAppearance = (PdfString) targetAnnotation.getBaseDataObject().get(PdfName.DA);
    String defaultAppearanceString = defaultAppearance.getStringValue();
    // replacing the font
    defaultAppearanceString = defaultAppearanceString.replaceFirst("Helv", "HeBo");
    // replacing the text and line color
    defaultAppearanceString = defaultAppearanceString.replaceFirst(". . . rg", ".5 g");
    defaultAppearance = new PdfString(defaultAppearanceString);
    targetAnnotation.getBaseDataObject().put(PdfName.DA, defaultAppearance);

    // changing the text value
    PdfString contents = (PdfString) targetAnnotation.getBaseDataObject().get(PdfName.Contents);
    String contentsString = contents.getStringValue();
    contentsString = contentsString.replaceFirst("text", "text line");
    contents = new PdfString(contentsString);
    targetAnnotation.getBaseDataObject().put(PdfName.Contents, contents);

    // change the line width and style
    targetAnnotation.setBorder(new Border(0, new LineDash(new double[] {3, 2})));

    targetPage.getAnnotations().add(targetAnnotation);

    targetFile.save(new File(RESULT_FOLDER, "test123-withCalloutCopy.pdf"), SerializationModeEnum.Standard);
}
(CopyCallOut test testCopyCallout)
Beware, the code only has proof-of-concept quality: For arbitrary PDFs you cannot simply expect a string replace of "font-family:Helvetica" by "font-family:Courier" or "Helv" by "HeBo" or ". . . rg" by ".5 g" to do the job: fonts can be given using different style attributes or names, and different coloring instructions may be used.
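A somewhat more tolerant way to rewrite a DA string is to walk its tokens and react to the operators instead of matching fixed text. The following is only a sketch under assumptions: it expects whitespace-separated tokens (so something like "0 g/Helv 12 Tf" is not handled), treats g, rg and k as the color operators to replace, and uses made-up class and method names. Whether a viewer can actually resolve the new font name is a separate question.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: rewrites font name and fill color in a DA string. */
final class DefaultAppearanceRewriter {

    /**
     * @param da          default appearance string, e.g. "/Helv 12 Tf 1 0 0 rg"
     * @param newFontName font resource name without the leading slash
     * @param newColor    complete color instruction, e.g. ".5 g" or "0 0 1 rg"
     */
    static String rewrite(String da, String newFontName, String newColor) {
        List<String> output = new ArrayList<>();
        List<String> operands = new ArrayList<>();
        for (String token : da.trim().split("\\s+")) {
            if (isOperand(token)) {
                operands.add(token); // collect operands until their operator shows up
                continue;
            }
            switch (token) {
                case "Tf": // operands are: /FontName size
                    if (operands.size() == 2) {
                        operands.set(0, "/" + newFontName);
                    }
                    output.addAll(operands);
                    output.add(token);
                    break;
                case "g":  // gray fill color
                case "rg": // RGB fill color
                case "k":  // CMYK fill color
                    output.add(newColor); // drop the old operands and operator
                    break;
                default:   // any other operator is copied unchanged
                    output.addAll(operands);
                    output.add(token);
            }
            operands.clear();
        }
        output.addAll(operands); // trailing operands without operator, if any
        return String.join(" ", output);
    }

    /** Names start with a slash; everything else that counts here is a number. */
    static boolean isOperand(String token) {
        return token.startsWith("/") || token.matches("[+-]?(\\d+\\.?\\d*|\\.\\d+)");
    }

    public static void main(String[] args) {
        System.out.println(rewrite("/Helv 12 Tf 1 0 0 rg", "HeBo", ".5 g"));
    }
}
```

This still is proof-of-concept quality, but it no longer depends on the color having exactly three single-character operands or on the font being named Helv.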
Screenshots in Adobe
The original file:
keepAppearanceStream = true:
keepAppearanceStream = false and keepRichText = true:
keepAppearanceStream = false and keepRichText = false:

As a follow-up comment, mkl:
Your great advice is really helpful when creating new annotations. I applied the following as a method of "copying" an existing annotation, where note is the "cloned" annotation and baseAnnotation the source:
foreach (PdfName t in baseAnnotation.BaseDataObject.Keys)
{
    if (t.Equals(PdfName.DA) || t.Equals(PdfName.DS) || t.Equals(PdfName.RC) || t.Equals(PdfName.Rotate))
    {
        note.BaseDataObject[t] = baseAnnotation.BaseDataObject[t];
    }
}
Thanks again

Related

RinohType sphinx customize the styles in PDF

I am using RinohType for generating my RST files to PDF.
I am trying to understand how to provide custom styles in the PDF for my logo and other elements.
I somehow felt the explanation in the Default matcher doesn't provide examples on how to do this.
conf.py
rinoh_documents = [dict(doc='index', # top-level file (index.rst)
target='manual',
template='rinohtype.rtt',
logo='_static/rr-logo-vertical2022-1100px-transp.png')]
rinohtype.rtt
[TEMPLATE_CONFIGURATION]
name = my article configuration
template = article
stylesheet = my_stylesheet.rts
parts =
    title
    ;front_matter
    contents
language = fr
abstract_location = title
[SectionTitles]
contents = 'Contents'
[AdmonitionTitles]
caution = 'Careful!'
warning = 'Please be warned'
[VARIABLES]
paper_size = A5
[title]
page_number_format = lowercase roman
end_at_page = left
[contents]
page_number_format = number
[title_page]
top_margin = 2cm
my_stylesheet.rts
Here I am trying to change the width of my logo in the PDF.
What is the correct way to give CSS-like properties here?
width: 100px
The default matcher defines the title page logo style. To adjust the style of this element, you can create a style sheet that builds upon the default sphinx style sheet and tweak the title page logo style:
[STYLESHEET]
name=My Style Sheet
description=My tweaks to the Sphinx style sheet
base=sphinx
[title page logo]
width = 4cm
This style accepts the FlowableStyle style attributes. In the linked documentation, you can see the width attribute supports a bunch of units but not px.
Please stay tuned for better documentation. Something is actually happening in that area!
P.S. If you want to make more changes to the styling of your document, the style log can be very useful to find out which style name corresponds to a particular document element.

How to get the entire Visual Studio active document... with formatting

I know how to use VS Extensibility to get the entire active document's text. Unfortunately, that only gets me the text and doesn't give me the formatting, and I want that too.
I can, for example, get an IWpfTextView but once I get it, I'm not sure what to do with it. Are there examples of actually getting all the formatting from it? I'm only really interested in text foreground/background color, that's it.
Note: I need the formatted text on every edit, so unfortunately doing cut-and-paste using the clipboard is not an option.
Possibly the simplest method is to select all of the text and copy it to the clipboard. VS puts the rich text into the clipboard, so when you paste, elsewhere, you'll get the colors (assuming you handle rich text in your destination).
Here's my not-the-simplest solution. TL;DR: you can jump to the code at https://github.com/jimmylewis/GetVSTextViewFormattedTextSample.
The VS editor uses "classifications" to show segments of text which have special meaning. These classifications can then be formatted differently according to the language and user settings.
There's an API for getting the classifications in a document, but it didn't work for me. Or other people, apparently. But we can still get the classifications through an ITagAggregator<IClassificationTag>, as described in the preceding link, or right here:
[Import]
IViewTagAggregatorFactoryService tagAggregatorFactory = null;
// in some method...
var classificationAggregator = tagAggregatorFactory.CreateTagAggregator<IClassificationTag>(textView);
var wholeBufferSpan = new SnapshotSpan(textBuffer.CurrentSnapshot, 0, textBuffer.CurrentSnapshot.Length);
var tags = classificationAggregator.GetTags(wholeBufferSpan);
Armed with these, we can rebuild the document. It's important to note that some text is not classified, so you have to piece everything together in chunks.
It's also notable that at this point, we have no idea how any of these tags are formatted - i.e. the colors used during rendering. If you want to, you can define your own mapping from IClassificationType to a color of your choice. Or, we can ask VS for what it would do using an IClassificationFormatMap. Again, remember, this is affected by user settings, Light vs. Dark theme, etc.
Either way, it could look something like this:
// Magic sauce pt1: See the example repo for an RTFStringBuilder I threw together.
RTFStringBuilder sb = new RTFStringBuilder();
var wholeBufferSpan = new SnapshotSpan(textBuffer.CurrentSnapshot, 0, textBuffer.CurrentSnapshot.Length);
// Magic sauce pt2: see the example repo, but it's basically just
// mapping the spans from the snippet above with the formatting settings
// from the IClassificationFormatMap.
var textSpans = GetTextSpansWithFormatting(textBuffer);
int currentPos = 0;
var formattedSpanEnumerator = textSpans.GetEnumerator();
while (currentPos < wholeBufferSpan.Length && formattedSpanEnumerator.MoveNext())
{
    var spanToFormat = formattedSpanEnumerator.Current;
    if (currentPos < spanToFormat.Span.Start)
    {
        int unformattedLength = spanToFormat.Span.Start - currentPos;
        SnapshotSpan unformattedSpan = new SnapshotSpan(textBuffer.CurrentSnapshot, currentPos, unformattedLength);
        sb.AppendText(unformattedSpan.GetText(), System.Drawing.Color.Black);
    }
    System.Drawing.Color textColor = GetTextColor(spanToFormat.Formatting.ForegroundBrush);
    sb.AppendText(spanToFormat.Span.GetText(), textColor);
    currentPos = spanToFormat.Span.End;
}
if (currentPos < wholeBufferSpan.Length)
{
    // append any remaining unformatted text
    SnapshotSpan unformattedSpan = new SnapshotSpan(textBuffer.CurrentSnapshot, currentPos, wholeBufferSpan.Length - currentPos);
    sb.AppendText(unformattedSpan.GetText(), System.Drawing.Color.Black);
}
return sb.ToString();
Hope this helps with whatever you're doing. The example repo will ask if you want the formatted text in the clipboard after each edit, but that was just a dirty way for me to test and see that it worked. It's annoying, but it was just a PoC.

Pango select multiples fonts

I have three fonts I want to use in my software with Pango:
Font1: Latin and Cyrillic characters
Font2: Korean characters
Font3: Japanese characters
Pango renders the text correctly, but I want to select which font is used.
Is there any way to indicate this font preference to Pango?
I use Linux and Pango 1.29.
The simplest way is to use PangoMarkup to set the fonts you want:
// See documentation for Pango markup for details
const char *pszMarkup = "<span face=\"{font family name goes here}\">"
                        "{text requiring font goes here}"
                        "</span>"; // Split for clarity
char *pszText; // Pointer for text without markup tags
PangoAttrList *pAttr; // Attribute list - will be populated with tag info
pango_parse_markup (pszMarkup, -1, 0, &pAttr, &pszText, NULL, NULL);
You now have a buffer of regular text and an attribute list. If you want to set these up by hand (without going through the parser), you will need one PangoAttribute per instance of the font and set PangoAttribute.start_index and PangoAttribute.end_index by hand.
However you get them, you now give them to a PangoLayout:
// pWidget is the windowed widget in which the text is displayed:
PangoContext *pCtxt = gtk_widget_get_pango_context (pWidget);
PangoLayout *pLayout = pango_layout_new (pCtxt);
pango_layout_set_attributes(pLayout, pAttr);
pango_layout_set_text (pLayout, pszText, -1);
That's it. Use pango_cairo_show_layout (cr, pLayout) to display the results. The setup only needs changing when the content changes - it maintains the values across draw signals.

Using Javascript for Automation to copy cells in Numbers

I want to use JXA to automate some updating of Numbers spreadsheets. For example, copying a range of cells from one spreadsheet to another one with a different structure.
At this point, I'm just testing a simple program to set or read the value of a cell and I can't get this to work.
When I try to set a value I get "Error -1700: Can't convert types." and when I try to read a value I get back a [object ObjectSpecifier] rather than a text or number value.
Here's an example of the code:
Numbers = Application('Numbers')
Numbers.activate()
delay(1)
doc = Numbers.open(Path('/Users/username/Desktop/Test.numbers'))
currentSheet = doc.Sheets[0]
currentTable = currentSheet.Tables[0]
console.log(currentTable['name'])
console.log(currentTable.cell[1][1])
currentTable.cell[1][1].set(77)
When I run this, I get an output of [object ObjectSpecifier] for the two console.logs and then an error -1700: Can't convert types when it tries to set a cell.
I've tried several other variations of accessing or setting properties but can't get it to work.
Thanks in advance,
Dave
Here is a script that sets and gets a cell's value and then sets a different cell's value in the same table:
// Open Numbers document (no activate or delay is needed)
var Numbers = Application("Numbers")
var path = Path("/path/to/spreadsheet.numbers")
var doc = Numbers.open(path)
// Access the first table of the first sheet of the document
// Note:
// .sheets and .tables (lowercase plural) are used when accessing elements
// .Sheet and .Table (capitalized singular) are used when creating new elements
var sheet = doc.sheets[0]
var table = sheet.tables[0]
// Access the cell named "A1"
var cell = table.cells["A1"]
// Set the cell's value
cell.value = 20
// Get the cell's value
var cellValue = cell.value()
// Set that value in a different cell
table.cells["B2"].value = cellValue
Check out the Numbers scripting dictionary (with JavaScript selected as the language) to see classes and their properties and elements. The elements section will show you the names of elements (e.g. the Document class contains sheets, the Sheet class contains tables, and so on). To open the scripting dictionary, in Script Editor's menu bar, choose Window > Library, and then select Numbers in the library window.
In regards to the logging you were seeing - I recommend using a function similar to this:
function prettyLog(object) {
    console.log(Automation.getDisplayString(object))
}
Automation.getDisplayString gives you a "pretty print" version of any object you pass to it. You can then use that for better diagnostic logging.

JAVA PDFBox Embed EPS, converted to PDF [duplicate]

I use different tools like processing to create vector plots. These plots are written as single or multi-page pdfs. I would like to include these plots in a single report-like pdf using pdfbox.
My current workflow includes these pdfs as images with the following pseudo code
PDDocument inFile = PDDocument.load(file);
PDPage firstPage = (PDPage) inFile.getDocumentCatalog().getAllPages().get(0);
BufferedImage image = firstPage.convertToImage(BufferedImage.TYPE_INT_RGB, 300);
PDXObjectImage ximage = new PDPixelMap(document, image);
PDPageContentStream contentStream = new PDPageContentStream(document, page);
contentStream.drawXObject(ximage, 0, 0, ximage.getWidth(), ximage.getHeight());
contentStream.close();
While this works, it loses the benefits of vector file formats, especially regarding file size vs. printing quality.
Is it possible to use pdfbox to include other pdf pages as embedded objects within a page (not added as a separate page)? Could I e.g. use a PDStream? I would prefer a solution similar to the way pdflatex embeds pdf figures into a new pdf document.
What other Java libraries can you recommend for that task?
Is it possible to use pdfbox to include other pdf pages as embedded objects within a page
It should be possible. The PDF format allows the use of so-called form xobjects to serve as such embedded objects. I don't see an explicit implementation for that, but the procedure is similar enough to what PageExtractor or PDFMergerUtility do.
A proof of concept derived from PageExtractor using the current SNAPSHOT of the PDFBox 2.0.0 development version:
PDDocument source = PDDocument.loadNonSeq(SOURCE, null);
List<PDPage> pages = source.getDocumentCatalog().getAllPages();
PDDocument target = new PDDocument();
PDPage page = new PDPage();
PDRectangle cropBox = page.findCropBox();
page.setResources(new PDResources());
target.addPage(page);
PDFormXObject xobject = importAsXObject(target, pages.get(0));
page.getResources().addXObject(xobject, "X");
PDPageContentStream content = new PDPageContentStream(target, page);
AffineTransform transform = new AffineTransform(0, 0.5, -0.5, 0, cropBox.getWidth(), 0);
content.drawXObject(xobject, transform);
transform = new AffineTransform(0.5, 0.5, -0.5, 0.5, 0.5 * cropBox.getWidth(), 0.2 * cropBox.getHeight());
content.drawXObject(xobject, transform);
content.close();
target.save(TARGET);
target.close();
source.close();
This code imports the first page of a source document to a target document as XObject and puts it twice onto a page there with different scaling and rotation transformations, e.g. for this source
it creates this
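In case the raw matrix values look opaque: java.awt.geom.AffineTransform takes its six constructor arguments as (m00, m10, m01, m11, m02, m12), so the first matrix above is a 90 degree rotation scaled by 0.5 and the second a 45 degree rotation scaled by about 0.707, each followed by a translation. The sketch below rebuilds both from translate, rotate and scale steps; the page size of 612 x 792 is an assumption for illustration, and the class and helper names are made up.

```java
import java.awt.geom.AffineTransform;

/** Rebuilds placement matrices from translate / rotate / scale steps. */
final class PlacementMatrixDemo {

    /** Builds T * R * S: points are scaled first, then rotated, then translated. */
    static AffineTransform place(double tx, double ty, double degrees, double scale) {
        AffineTransform transform = AffineTransform.getTranslateInstance(tx, ty);
        transform.rotate(Math.toRadians(degrees));
        transform.scale(scale, scale);
        return transform;
    }

    /** Compares two transforms entry by entry with a small tolerance. */
    static boolean nearlyEqual(AffineTransform a, AffineTransform b) {
        double[] x = new double[6];
        double[] y = new double[6];
        a.getMatrix(x);
        b.getMatrix(y);
        for (int i = 0; i < 6; i++) {
            if (Math.abs(x[i] - y[i]) > 1e-9) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double width = 612;  // assumed crop box width
        double height = 792; // assumed crop box height

        AffineTransform first = new AffineTransform(0, 0.5, -0.5, 0, width, 0);
        System.out.println(nearlyEqual(first, place(width, 0, 90, 0.5)));

        AffineTransform second = new AffineTransform(0.5, 0.5, -0.5, 0.5, 0.5 * width, 0.2 * height);
        System.out.println(nearlyEqual(second, place(0.5 * width, 0.2 * height, 45, Math.sqrt(0.5))));
    }
}
```

Building the matrix from named steps makes it easier to try other placements without working out the six values by hand.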
The helper method importAsXObject actually doing the import is defined like this:
PDFormXObject importAsXObject(PDDocument target, PDPage page) throws IOException
{
    final PDStream src = page.getContents();
    if (src != null)
    {
        final PDFormXObject xobject = new PDFormXObject(target);

        OutputStream os = xobject.getPDStream().createOutputStream();
        InputStream is = src.createInputStream();
        try
        {
            IOUtils.copy(is, os);
        }
        finally
        {
            IOUtils.closeQuietly(is);
            IOUtils.closeQuietly(os);
        }

        xobject.setResources(page.findResources());
        xobject.setBBox(page.findCropBox());

        return xobject;
    }
    return null;
}
As mentioned above, this is only a proof of concept; corner cases have not yet been taken into account.
To update this question:
There is already a helper class in org.apache.pdfbox.multipdf.LayerUtility to do the import.
Example to show superimposing a PDF page onto another PDF: SuperimposePage.
This class is part of the Apache PDFBox Examples, and sample transformations as shown by @mkl were added to it.
As mkl appropriately suggested, PDFClown is among the Java libraries which provide explicit support for page embedding (so-called Form XObjects (see PDF Reference 1.7, § 4.9)).
In order to let you get a taste of the way PDFClown works, the following code represents the equivalent of mkl's PDFBox solution (NOTE: as mkl later stated, his code sample was by no means optimised, so this comparison may not correspond to the actual status of PDFBox -- comments are welcome to clarify this):
Document source = new File(SOURCE).getDocument();
Pages sourcePages = source.getPages();
Document target = new File().getDocument();
Page targetPage = new Page(target);
target.getPages().add(targetPage);
XObject xobject = sourcePages.get(0).toXObject(target);
PrimitiveComposer composer = new PrimitiveComposer(targetPage);
Dimension2D targetSize = targetPage.getSize();
Dimension2D sourceSize = xobject.getSize();
composer.showXObject(xobject, new Point2D.Double(targetSize.getWidth() * .5, targetSize.getHeight() * .35), new Dimension(sourceSize.getWidth() * .6, sourceSize.getHeight() * .6), XAlignmentEnum.Center, YAlignmentEnum.Middle, 45);
composer.showXObject(xobject, new Point2D.Double(targetSize.getWidth() * .35, targetSize.getHeight()), new Dimension(sourceSize.getWidth() * .4, sourceSize.getHeight() * .4), XAlignmentEnum.Left, YAlignmentEnum.Top, 90);
composer.flush();
target.getFile().save(TARGET, SerializationModeEnum.Standard);
source.getFile().close();
Comparing this code to PDFBox's equivalent you can notice some relevant differences which show PDFClown's neater style (it would be nice if some PDFBox expert could validate my assertions):
Page-to-FormXObject conversion: PDFClown natively supports a dedicated method (Page.toXObject()), so there's no need for additional heavy-lifting such as the helper method importAsXObject();
Resource management: PDFClown automatically (and transparently) allocates page resources, so there's no need for explicit calls such as page.getResources().addXObject(xobject, "X");
XObject drawing: PDFClown supports both high-level (explicit scale, translation and rotation anchors) and low-level (affine transformations) methods to place your FormXObject into the page, so there's no need to necessarily deal with affine transformations.
The whole point is that PDFClown features a rich architecture made up of multiple abstraction layers: according to your requirements, you can choose the most appropriate coding style (either to delve into PDF's low-level basic structures or to leverage its convenient and elegant high-level model). PDFClown lets you tweak every single byte and solve complex tasks with a ridiculously simple method call, at your will.
DISCLOSURE: I'm the lead developer of PDFClown.
