Merge PDF files with an extra blank page at the end of odd-paged documents (qpdf)

I'm hoping to use qpdf for this.
I'm printing lots of small files and need to print them double-sided, so I merge, say, 20 documents and wind up with a single 200-page PDF. I can then let the printer print the even pages in reverse order, flip the stack over, put it back into the printer and print the odd pages, so that both sides of the paper are used.
My question is how I can detect any document that has an odd number of pages and add a single blank page to its end. That way, when I print double-sided, each document starts on a fresh sheet rather than on the back of the previous document's last page.

If you have an odd number of pages, just call
cpdf -pad-multiple
with the even number one larger than the odd number. For example, for 19 pages, run
cpdf -pad-multiple 20 in.pdf -o out.pdf
You can get the number of pages with cpdf -pages.

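With the FOSS qpdf (currently 10.6.3) this is possible on Windows with something like the following (note that %% is the form used inside a batch .cmd file):
for /f %%N in ('qpdf --show-npages in.pdf') do set VAR=%%N& set /a num=2*(%%N/2)+1
if %num%==%VAR% qpdf in.pdf --pages . blankA4P.pdf -- out.pdf
The test is kept on its own line because cmd expands %num% and %VAR% when a line is parsed; placed on the same line as the for loop they would be read before the loop had set them.
Note that the integer division rounds odd numbers down, so 31 gives 30+1 = 31, which matches, whereas 32 gives 32+1 = 33, which does not.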
Without checking the dimensions of in.pdf it is not easy to know whether blankA4P.pdf is the right blank, so it is best either to read the last page's dimensions and pick a matching prepared page, or to batch the files in groups by page size.
Using cpdf (as mentioned in the previous answer) we could build a blank page on demand, along the lines of cpdf -create-pdf -create-pdf-pages 1 -o blank.pdf, and specify a page size. However, cpdf has an even better option for the OP's case, so the simplest solution is cpdf -pad-multiple 2 in.pdf -o out.pdf, as first hinted at by johnwhittington.
So now we have a perfect, short, one-line solution. However, cpdf is not FOSS.
I have asked the qpdf project whether there may be a simpler way using qpdf alone, so watch this space: https://github.com/qpdf/qpdf/issues/753
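For merging many files at once, the same odd/even test can drive a single qpdf call. The following is a minimal Python sketch rather than a tested workflow: it assumes qpdf is on the PATH and that a one-page blank.pdf of a suitable size already exists, and the helper names page_count and merge_args are made up for illustration. It relies only on qpdf's --show-npages, --empty and --pages options.

```python
import subprocess


def page_count(path):
    """Ask qpdf for the number of pages in a PDF (needs qpdf on the PATH)."""
    result = subprocess.run(
        ["qpdf", "--show-npages", path],
        check=True, capture_output=True, text=True,
    )
    return int(result.stdout.strip())


def merge_args(page_counts, blank, output):
    """Build a qpdf argument list that pads odd-paged documents.

    page_counts maps each input path to its page count, in merge order.
    blank is an assumed, pre-made one-page PDF.
    """
    args = ["qpdf", "--empty", "--pages"]
    for path, pages in page_counts.items():
        args += [path, "1-z"]        # every page of this document
        if pages % 2 == 1:           # odd page count: add one blank page
            args += [blank, "1"]
    return args + ["--", output]


# Example wiring (not executed here):
#   counts = {p: page_count(p) for p in ["a.pdf", "b.pdf"]}
#   subprocess.run(merge_args(counts, "blank.pdf", "out.pdf"), check=True)
```

As noted above, a single blank page only fits if all inputs share its page size.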

I will continue looking for a way to streamline this further because I would love a simpler workflow, but here is what I use to ensure I do not have two articles sharing one sheet of paper.
pacman::p_load(tidyverse, pdftools, qpdf)
# some prep
directory <- "Some/FileFolder/Path"
filelist <- list.files(directory,
pattern = "\\.pdf$", # regular expression: names ending in .pdf
full.names = TRUE) # return directory/filename paths
# bust apart all pdf to inspect
my_pages <- as.list(lapply(filelist, pdf_split))
page_summ <- cbind.data.frame(filelist,
lengths(my_pages)) %>%
rename(filename = 1,
pages = 2) %>%
mutate(is_odd = pages %%2==1)
# separate into odd and even sets
odd_docs <- page_summ %>%
filter(is_odd == TRUE)
even_docs <- page_summ %>%
filter(is_odd == FALSE)
# I could not find an R process for adding a page to PDFs.
# For now, I will add a buffer page to docs via a PDF program.
# Once you are satisfied with your even_docs subset, pdf_combine
tocombine <- as.data.frame(even_docs$filename)
lapply(tocombine, pdf_combine)
This writes the combined file into the previously defined directory. The output file name cannot be set with "output = " inside lapply(); look for a new file named "firstnameintocombine_combined.pdf".

What is wrong with this PDF file?

I have to work with a PDF form created by a person unknown to me. Why did the program with which the form was created (Word + PDF export?) split the term "Stunde" into "S", "t" and "unde" in line 6909 of the decoded PDF? There is no visual break between the three parts.
/TT1 1 Tf
11.04 0 0 11.04 59.16 476.1203 Tm
(Datum)Tj
/C2_1 1 Tf
<0003>Tj
/TT1 1 Tf
(der)Tj
0.424 -1.315 Td
(Tätigkeit)Tj
-0.0022 Tc 0 11.04 -11.04 0 261.24 437.7203 Tm
[(Ve)-4.6<7267fc74>-4.2(ungssat)-4.2(z)]TJ
/C2_1 1 Tf
0 Tc <0003>Tj
/TT1 1 Tf
-0.0021 Tc 0.935 -1.315 Td
[<2880>-6.1(/)-7.2(S)0.8(t)-4.1(unde)-4.5(\))]TJ % <<< the important line
0 Tc 11.04 0 0 11.04 340.92 468.8003 Tm
(Anlass/Art)Tj
/C2_1 1 Tf
resulting in: [screenshot of the rendered form]
To get the source code above, I decoded the PDF file as described here. I have no know-how concerning the PDF file format.
Background: I had to replace the word "Stunde", it drove me crazy to find the place where "Stunde" was written (in parts) within the source code, since no free PDF editor seems to be able to work with horizontal text without problems.
Academic Bonus questions: Is it possible to set the sum over a column as default value for a form field? (Modifiable; changed every time the column is changed.) Why was I able to replace "Stunde" with "Einsatz" without making the PDF file corrupt due to now irregular offsets?
Why did the program with which the form was created (Word + PDF export?) split the term "Stunde" into "S", "t" and "unde" in line 6909 of the decoded PDF?
As @gettalong mentioned in his answer, in your case this most likely has been done to apply kerning.
If you start looking into the outputs of some other PDF producers, you'll see that this export from Word actually is very unobtrusive in regard to splitting words:
there are PDF producers that draw each character individually after explicitly setting the text matrix for it, and
there also are PDF producers that have the width information for the characters of the used fonts set to zero and use the numbers in TJ instructions to forward the current text matrix between characters accordingly.
And this doesn't cover all the variants to be found, not by far...
Thus,
I had to replace the word "Stunde", it drove me crazy to find the place where "Stunde" was written (in parts) within the source code
in your case replacing actually was a fairly trivial task...
Is it possible to set the sum over a column as default value for a form field? (Modifiable; changed every time the column is changed.)
If all the column values in question are stored in form fields, you can use JavaScript to recalculate sums after form changes. To have it serve as "default" only, you can use some other (hidden) field for a flag whether the field has already been touched. Beware, though: JavaScript is not supported by all PDF viewers. Furthermore, the JavaScript object model for PDF is not specified in an independent (like ISO) specification but in an Adobe one which can make interpretation of the specification biased.
Why was I able to replace "Stunde" with "Einsatz" without making the PDF file corrupt due to now irregular offsets?
As we don't know how exactly you applied the changes, this obviously is hard to tell.
Most likely, though, you did corrupt the PDF, and the PDF viewers you opened it in merely repaired the corruption under the hood. There is a strong tendency in PDF viewers to do such under-the-hood repairs without informing the user; the result is that a large part of the PDFs in the wild are actually broken.
You don't see a visual break, but the standard distance between "S", "t" and "unde" has been changed nonetheless. PDF writers that support kerning, for example, do this so that the word appears nicer. This is the reason why it is split that way.
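To find a word such as "Stunde" in a decoded content stream despite the kerning splits, the string pieces of a TJ array can be joined back together. The following Python sketch is only an illustration under simplifying assumptions: the function name tj_literal_text is made up, hex strings like <2880> are skipped because their meaning depends on the font encoding, and escapes are reduced to "keep the character after the backslash" (octal and \n-style escapes are not interpreted).

```python
def tj_literal_text(array_src):
    """Join the literal (...) strings of a TJ array, dropping the kerning
    numbers between them and skipping hex <...> strings."""
    out = []
    i, n = 0, len(array_src)
    while i < n:
        ch = array_src[i]
        if ch == "(":
            depth = 1                      # PDF allows balanced nested parens
            i += 1
            while i < n and depth > 0:
                ch = array_src[i]
                if ch == "\\" and i + 1 < n:
                    out.append(array_src[i + 1])   # simplified escape handling
                    i += 2
                    continue
                if ch == "(":
                    depth += 1
                elif ch == ")":
                    depth -= 1
                    if depth == 0:         # end of this string piece
                        i += 1
                        break
                out.append(ch)
                i += 1
        elif ch == "<":
            i = array_src.index(">", i) + 1   # skip the hex string
        else:
            i += 1                         # kerning numbers, brackets, spaces
    return "".join(out)


line = r"[<2880>-6.1(/)-7.2(S)0.8(t)-4.1(unde)-4.5(\))]"
print(tj_literal_text(line))  # → /Stunde)
```

Searching the joined text for "Stunde" then locates the line, even though the word never appears contiguously in the stream.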

Increment Serial Number using EXIF

I am using ExifTool to change the camera body serial number to be a unique serial number for each image in a group of images numbering several hundred. The camera body serial number is being used as a second place, in addition to where the serial number for the image is in IPTC, to put the serial number as it takes a little more effort to remove.
The serial number is in the format ###-###-####-#### where the last four digits is the number to increment. The first three groups of digits do not change for each batch I run. I only need to increment that last group of digits.
EXAMPLE
If I have 100 images in my first batch, they would be numbered:
811-010-5469-0001, 811-010-5469-0002, 811-010-5469-0003 ... 811-010-5469-0100
I can successfully drag a group of images onto my ExifTool Shortcut that has the values
exiftool(-SerialNumber='001-001-0001-0001')
and it will change the Exif SerialNumber Tag on the images, but have not been successful in what to add to this to have it increment for each image.
I have tried variations on the below without success:
exiftool(-SerialNumber+=001-001-0001-0001)
exiftool(-SerialNumber+='001-001-0001-0001')
I realize most likely ExifTool is seeing these as numbers being subtracted in the first line and seeing the second line as a string. I have also tried:
exiftool(-SerialNumber+='1')
exiftool(-SerialNumber+=1)
just to see if I can even get it to increment with a basic, single digit number. This also has not worked.
Maybe this cannot be incremented this way and I need to use ExifTool from the command line. If so, I am learning the command line/PowerShell (Windows), but am still weak in this area and would appreciate some pointers to get started there if this is the route I need to take. I am not afraid to use the command line; I would just need a bit more hand holding than normal for a starting point. I am also learning Linux and could do this project from there, but again I would need a bit more hand holding to get it done.
I do program in PHP, JavaScript and other languages, so code is not foreign to me; I just lack experience in writing it for the command line.
If further clarification is needed, please let me know in the comments.
Your help and guidance is appreciated!
You'll probably have to go to the command line rather than rely upon drag and drop, as this command relies upon ExifTool's advanced formatting.
Exiftool "-SerialNumber<001-001-0001-${filesequence;$_=sprintf('%04d', $_+1 )}" <FILE/DIR>
If you want to be more general purpose and to use the original serial number in the file, you could use
Exiftool "-SerialNumber<${SerialNumber}-${filesequence;$_=sprintf('%04d', $_+1 )}" <FILE/DIR>
This will just add the file count to the end of the current serial number in the image, though if you have images from multiple cameras in the same directory, that could get messy.
As for using the command line, you just need to rename the executable to remove the commands in the parentheses, and then either move it to someplace in the command line's PATH or use the full path to ExifTool.
As for clarification on your previous attempts, the += option is used with numbers and with lists. The SerialNumber tag is usually a string, though that could depend upon where it's being written to.
If I understand your question correctly, something like this should work:
1..100 | % {
$sn = '811-010-5469-{0:D4}' -f $_
# apply $sn
}
or like this (if you iterate over files):
$i = 1
Get-ChildItem 'C:\some\folder' -File | % {
$sn = '811-010-5469-{0:D4}' -f $i
# update EXIF data of current file with $sn
$i++
}
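Since the asker mentions being more at home in other languages, the same numbering can be sketched in Python. This is an illustration only: the function names are invented here, the prefix is the example value from the question, and the sketch merely builds argument lists of the form exiftool -SerialNumber=VALUE FILE without running them.

```python
def serial_numbers(prefix, count, start=1):
    """Yield serial numbers such as 811-010-5469-0001, 811-010-5469-0002, ..."""
    for i in range(start, start + count):
        yield f"{prefix}-{i:04d}"          # zero-pad the counter to 4 digits


def exiftool_commands(prefix, files, start=1):
    """Pair each file with its own serial number as an exiftool argument list."""
    serials = serial_numbers(prefix, len(files), start)
    return [
        ["exiftool", f"-SerialNumber={serial}", path]
        for serial, path in zip(serials, files)
    ]


# Example wiring (not executed here):
#   import subprocess
#   for cmd in exiftool_commands("811-010-5469", ["a.jpg", "b.jpg"]):
#       subprocess.run(cmd, check=True)
```

Sorting the file list before numbering keeps the assignment reproducible between runs.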

SPSS syntax for naming individual analyses in output file outline

I have created syntax in SPSS that gives me 90 separate iterations of general linear model, each with slightly different variations fixed factors and covariates. In the output file, they are all just named as "General Linear Model." I have to then manually rename each analysis in the output, and I want to find syntax that will add a more specific name to each result that will help me identify it out of the other 89 results (e.g. "General Linear Model - Males Only: Mean by Gender w/ Weight covariate").
This is an example of one analysis from the syntax:
USE ALL.
COMPUTE filter_$=(Muscle = "BICEPS" & Subj = "S1" & SMU = 1 ).
VARIABLE LABELS filter_$ 'Muscle = "BICEPS" & Subj = "S1" & SMU = 1 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMATS filter_$ (f1.0). FILTER BY filter_$.
EXECUTE.
GLM Frequency_Wk6 Frequency_Wk9
Frequency_Wk12 Frequency_Wk16
Frequency_Wk20
/WSFACTOR=Time 5 Polynomial
/METHOD=SSTYPE(3)
/PLOT=PROFILE(Time)
/EMMEANS=TABLES(Time)
/CRITERIA=ALPHA(.05)
/WSDESIGN=Time.
I am looking for syntax to add to this that will name this analysis as: "S1, SMU1 BICEPS, GLM" Not to name the whole output file, but each analysis within the output so I don't have to do it one-by-one. I have over 200 iterations at times that come out in a single output file, and renaming them individually within the output file is taking too much time.
Making an assumption that you are exporting the models to Excel (please clarify otherwise).
There is an undocumented command (OUTPUT COMMENT TEXT) that you can utilize here, though there is also a custom extension TEXT also designed to achieve the same but that would need to be explicitly downloaded via:
Utilities-->Extension Bundles-->Download And Install Extension Bundles--->TEXT
You can use OUTPUT COMMENT TEXT to assign a title/descriptive text just before the output of the GLM model (in the example below I have used FREQUENCIES as an example).
get file="C:\Program Files\IBM\SPSS\Statistics\23\Samples\English\Employee data.sav".
oms /select all /if commands=['output comment' 'frequencies'] subtypes=['comment' 'frequencies']
/destination format=xlsx outfile='C:\Temp\ExportOutput.xlsx' /tag='ExportOutput'.
output comment text="##Model##: This is a long/descriptive title to help me identify the next model that is to be run - jobcat".
freq jobcat.
output comment text="##Model##: This is a long/descriptive title to help me identify the next model that is to be run - gender".
freq gender.
output comment text="##Model##: This is a long/descriptive title to help me identify the next model that is to be run - minority".
freq minority.
omsend tag=['ExportOutput'].
You could use TITLE command here also but it is limited to only 60 characters.
You would have to change the OMS tags appropriately if using TITLE or TEXT.
Edit:
Given the OP wants to actually add a title to the left hand pane in the output viewer, a solution for this is as follows (credit to Albert-Jan Roskam for the Python code):
First save the python file "editTitles.py" to a valid Python search path (for example (for me anyway): "C:\ProgramData\IBM\SPSS\Statistics\23\extensions")
#editTitles.py
import tempfile, os, sys
import SpssClient
def _titleToPane():
    """See titleToPane(). This function does the actual job"""
    outputDoc = SpssClient.GetDesignatedOutputDoc()
    outputItemList = outputDoc.GetOutputItems()
    textFormat = SpssClient.DocExportFormat.SpssFormatText
    filename = tempfile.mktemp() + ".txt"
    for index in range(outputItemList.Size()):
        outputItem = outputItemList.GetItemAt(index)
        # TITLE output appears in the outline as an item described "Page Title"
        if outputItem.GetDescription() == u"Page Title":
            # export the title text to a temp file, then reuse it as the outline label
            outputItem.ExportToDocument(filename, textFormat)
            with open(filename) as f:
                outputItem.SetDescription(f.read().rstrip())
            os.remove(filename)
    return outputDoc
def titleToPane(spv=None):
    """Copy the contents of the TITLE command of the designated output document
    to the left output viewer pane"""
    try:
        outputDoc = None
        SpssClient.StartClient()
        if spv:
            SpssClient.OpenOutputDoc(spv)
        outputDoc = _titleToPane()
        if spv and outputDoc:
            outputDoc.SaveAs(spv)
    except:
        # parenthesised form works under both Python 2 and Python 3
        print("Error filling TITLE in Output Viewer [%s]" % sys.exc_info()[1])
    finally:
        SpssClient.StopClient()
Re-start SPSS Statistics and run below as a test:
get file="C:\Program Files\IBM\SPSS\Statistics\23\Samples\English\Employee data.sav".
title="##Model##: jobcat".
freq jobcat.
title="##Model##: gender".
freq gender.
title="##Model##: minority".
freq minority.
begin program.
import editTitles
editTitles.titleToPane()
end program.
The TITLE command initially adds a title to the main output viewer (right-hand side), and the Python code then transfers that text to the output tree in the left-hand pane. As already mentioned, TITLE is capped at 60 characters; a warning is triggered to highlight this.
This editTitles.py approach is the closest you are going to get to a descriptive title that identifies each model. Replacing the actual title "General Linear Model" with a custom title would require scripting knowledge and a lot more code; this is a simpler alternative. Python integration is required for it to work.
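With 90 or more iterations, the TITLE lines themselves can be generated rather than typed. The sketch below is a hypothetical helper, not part of SPSS or of the script above: it builds a label in the "S1, SMU1 BICEPS, GLM" form requested in the question, emits it in the standard TITLE "text". form, and refuses labels that exceed the 60-character cap so nothing is lost unnoticed.

```python
MAX_TITLE_LENGTH = 60  # TITLE limit mentioned in the answer above


def title_command(subj, smu, muscle, analysis="GLM"):
    """Return an SPSS TITLE command labelling one analysis."""
    text = f"{subj}, SMU{smu} {muscle}, {analysis}"
    if len(text) > MAX_TITLE_LENGTH:
        raise ValueError(f"title is {len(text)} characters, limit is {MAX_TITLE_LENGTH}")
    return f'TITLE "{text}".'


# One TITLE line per (subject, SMU, muscle) combination; each line would be
# placed directly before the matching filter + GLM block in the syntax file.
for subj, smu, muscle in [("S1", 1, "BICEPS"), ("S1", 2, "BICEPS")]:
    print(title_command(subj, smu, muscle))
```

Running editTitles.titleToPane() afterwards would then move these labels into the outline pane as described.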
Also consider using:
SPLIT FILE SEPARATE BY <list of filter variables>.
This will automatically produce filter labels in the left hand pane.
This is easy to use for mutually exclusive filters but even if you have overlapping filters you can re-run multiple times (and have filters applied to get as close to your desired set of results).
For example:
get file="C:\Program Files\IBM\SPSS\Statistics\23\Samples\English\Employee data.sav".
sort cases by jobcat minority.
split file separate by jobcat minority.
freq educ.
split file off.

Creating animation with multiple plots in Octave

I'm using Octave to write a script that plots a function at different time periods. I was hoping to create an animation of the plots in order to see the changes through time.
Is there a way to save each plot for each time point, so that all plots can be combined to create this animation?
It's a bit of a kludge, but you can do the following (works here with Octave 4.0.0-rc2):
x = (-5:.1:5);
for p = 1:5
plot (x, x.^p)
print animation.pdf -append
endfor
im = imread ("animation.pdf", "Index", "all");
imwrite (im, "animation.gif", "DelayTime", .5)
Basically, print all your plots into a PDF, one per page. Then read the PDF pages as images and write them back out as a GIF. This will not work in Matlab (its imread implementation can't handle PDF).
This creates an animated gif
data=rand(100,100,20); %100 by 100 and 20 frames
%data go from 0 to 1, so lets convert to 8 bit unsigned integers for saving
data=data*2^8;
data=uint8(data);
%Write the first frame to a file named animGif.gif
imwrite(data(:,:,1),'/tmp/animGif.gif','gif','writemode','overwrite',...
'LoopCount',inf,'DelayTime',0);
%Loop through and write the rest of the frames
for ii=2:size(data,3)
imwrite(data(:,:,ii),'/tmp/animGif.gif','gif','writemode','append','DelayTime',0)
end
Had to come chime in here because this was the top Google result for me when I was looking for help with this. I had issues with both answers, and some other issues, too. Notably:
For Rick T's answer, the code snippet doesn't write a plot figure, it just writes matrix data. Getting the plot window was a pain.
For carandraug's answer, writing to a PDF took a very long time and made a gigantic PDF.
On my own machine, I'm pretty sure I used apt-install to get Octave, but the getframe function I found referenced in other answers wasn't found. Turns out I had installed version 4.4, which was from 2018 (>3 years old).
I removed the old version of Octave sudo apt remove octave, then installed the new version with snap. If you try octave from a terminal without it installed it should prompt you to the snap install - be sure to include the # 6.4.0 or whatever is included in the command.
Once I had the current version installed, I got access to the getframe command, which is what lets you convert directly from a figure handle to image data. This bypasses the hacky (but previously necessary) step in @carandraug's answer where you had to write to PDF or some other image as a placeholder.
I used @RickT's answer to make my own MakeGif function, which I will share with you all here. Note that MakeGif stores the filename in a persistent variable, meaning it is retained across calls. If you change the filename it will make (or overwrite!!) the new file. If you need to overwrite the current file (i.e., you are running the same script multiple times and want new results) then you can use clear MakeGif between calls and that will reset the persistentFilename.
Here is the code for the MakeGif function; code to test it with is provided after this:
function MakeGif(figHandle, filename)
persistent persistentFilename = [];
if isempty(filename)
error('Can''t have an empty filename!');
endif
if ~ishandle(figHandle)
error('Call MakeGif(figHandle, filename); no valid figHandle was passed!');
endif
writeMode = 'Append';
if isempty(persistentFilename) || ~strcmp(filename, persistentFilename)
persistentFilename = filename;
writeMode = 'Overwrite';
endif
imstruct = getframe(figHandle);
imwrite(imstruct.cdata, filename, 'gif', 'WriteMode',writeMode,'DelayTime',0);
endfunction
And here is the code to test the function. There's a commented-out call to clear MakeGif between the blue and green colors. If you leave it commented out it will append the green sine wave to the blue sine wave, resulting in alternating colors after every cycle - again the filename is persistent in the function. If you uncomment that call then the MakeGif function will treat the green's call as "new" and trigger the overwrite of the blue sine wave and all you'll see is green.
clear all;
time = 0:0.1:2*pi;
nSamples = numel(time);
figHandle = figure(1);
for i=1:nSamples
plot(time,sin(time + time(i)),'Color','blue');
drawnow;
MakeGif(figHandle, 'test.gif');
endfor
% Uncomment the 'clear' command below to clear the MakeGif persistent
% memory, which will trigger the green sine wave to overwrite the blue.
% Default behavior is to APPEND a green sine wave because the filename
% is the same.
%clear MakeGif;
for i=1:nSamples
plot(time,sin(time + time(i)),'Color','green');
drawnow;
MakeGif(figHandle, 'test.gif');
endfor
I spent several hours on this after being super dissatisfied with laggy screen captures so I really hope this helps someone in the future! Good luck and best wishes from the Age of Covid lol.
@Chuck thanks for that code; I've been using it to save 1500-frame GIFs of simulation output, and I find that after maybe ~500 frames the time to save the next frame to the output during the call to MakeGif starts to become ... unnerving. I guess imwrite reads and writes the entirety of the output file at each call that includes the 'WriteMode','Append' pair. At frame 1500 my output is 480 MB, so that becomes untenable.
An apparent rescue for this is hinted at in the doc for Octave 7.1.0's imwrite, with the suggestion that you can pass it a 4-dimensional array and write the entire image sequence with one call. I haven't been able to make this work, though: Calling imwrite that way seems to simply write the very first image in the sequence into every frame in the output file.
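If the guess above is right and every 'Append' call rewrites the whole file, the slowdown follows from simple arithmetic: call k writes k frames, so N calls write 1 + 2 + ... + N frames in total. The Python sketch below only illustrates that sum under this assumption (it says nothing about how imwrite is actually implemented), and the function names are invented for the example.

```python
def bytes_written_appending(frames, frame_bytes):
    """Total bytes written if each append rewrites the whole file so far."""
    # After call k the file holds k frames, so call k writes k * frame_bytes.
    return frame_bytes * frames * (frames + 1) // 2


def bytes_written_once(frames, frame_bytes):
    """Total bytes written if the whole sequence is written in one call."""
    return frame_bytes * frames


frames = 1500
ratio = bytes_written_appending(frames, 1) / bytes_written_once(frames, 1)
print(ratio)  # → 750.5
```

Under that assumption a 1500-frame GIF costs about 750 times as much writing as a single-pass write would, which fits the observation that later frames take ever longer.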

Convert PDF files to PDF/A via Ghostscript

I'd like to convert arbitrary PDF files to PDF/A with Ghostscript 9.15.
Is Ghostscript able to create PDF/A-3b conformant PDFs? There is no parameter which represents a PDF/A conformance level, so I assume there is no possibility. Or is there anything I have overlooked?
I was following a blog post where a Windows batch file is used to convert from PDF to PDF/A (see http://www.mcbsys.com/techblog/2013/04/batch-convert-pdf-to-pdfa/). The gs invocation in the batch file is:
"%gs_path%\gswin64c" ^
-dPDFA ^
-dNOOUTERSAVE ^
-sProcessColorModel=DeviceRGB ^
-sDEVICE=pdfwrite ^
-o "GS_%file1%" ^
-dPDFACompatibilityPolicy=1 ^
"%currentdir%\PDFA_def.ps" ^
%inputfilelist%
The PDFA_def.ps is an adjusted version of the official one:
%!
% This prefix file for creating a PDF/A document is derived from
% the sample included with Ghostscript 9.07, released under the
% GNU Affero General Public License.
% Modified 4/15/2013 by MCB Systems.
% Feel free to modify entries marked with "Customize".
% This assumes an ICC profile to reside in the file (AdobeRGB1998.icc),
% unless the user modifies the corresponding line below.
% The color space described by the ICC profile must correspond to the
% ProcessColorModel specified when using this prefix file (GRAY with
% DeviceGray, RGB with DeviceRGB, and CMYK with DeviceCMYK).
% Define entries in the document Info dictionary :
/ICCProfile (... PATH TO ... AdobeRGB1998.icc) % Customize.
def
[ /Title (Title) % Customize.
/DOCINFO pdfmark
% Define an ICC profile :
[/_objdef {icc_PDFA} /type /stream /OBJ pdfmark
[{icc_PDFA} <</N systemdict /ProcessColorModel get /DeviceGray eq {1} {systemdict /ProcessColorModel get /DeviceRGB eq {3} {4} ifelse} ifelse >> /PUT pdfmark
[{icc_PDFA} ICCProfile (r) file /PUT pdfmark
% Define the output intent dictionary :
[/_objdef {OutputIntent_PDFA} /type /dict /OBJ pdfmark
[{OutputIntent_PDFA} <<
/Type /OutputIntent % Must be so (the standard requires).
/S /GTS_PDFA1 % Must be so (the standard requires).
/DestOutputProfile {icc_PDFA} % Must be so (see above).
/OutputConditionIdentifier (AdobeRGB1998) % Customize
>> /PUT pdfmark
[{Catalog} <</OutputIntents [ {OutputIntent_PDFA} ]>> /PUT pdfmark
So I use AdobeRGB1998.icc, which is obviously usable for PDF files with an RGB color space. Depending on the -sProcessColorModel value (DeviceRGB), a correct value is printed out.
The conversion works for all files. But when I validate the created PDF file against PDF/A-1b, I get different results depending on whether the input file has an RGB color space or not (e.g. CMYK). So, when I have an input PDF file which uses the CMYK color space, the file gets converted by the script, but the validator says something like this:
"input.pdf", 1, 38, 0x03418614, "A device-specific color space (DeviceCMYK) without an appropriate output intent is used.", 1
"output.pdf", 20, 0, 0x83410612, "The document does not conform to the requested standard.", 1
My question: Is there a way to get the conversion done for arbitrary files (i.e. independent of the used color space in the input file)?
Update
@KenS Thanks for your answer. I've updated my initial post to clarify what I want to achieve.
To make it more explicit, I will use an example. There are two files: input1.pdf (seems to use RGB) and input2.pdf (seems to use CMYK). I want to convert both of them to PDF/A-1. Thanks to your hint, I've let go of the above mentioned batch script and instead tested the command directly in the command line. After reading Ps2pdf.htm#PDFA, I have adjusted the (official) PDFA_def.ps so that AdobeRGB1998.icc is used. Then I invoked the following command on both input files (replaced output1.pdf by output2.pdf and input1.pdf by input2.pdf for the second file):
gswin64c.exe -dPDFA=1 -dBATCH -dNOPAUSE -dNOOUTERSAVE \
-sColorConversionStrategy=/RGB \
-sOutputICCProfile=AdobeRGB1998.icc -sDEVICE=pdfwrite \
-sOutputFile=output1.pdf -dPDFACompatibilityPolicy=1 \
"PATH/TO/OFFICIAL/PDFA_def.ps" input1.pdf
The conversion was done without any errors. The output1.pdf seems to be valid, but the output2.pdf is still invalid (tested with 3heights Validator):
"output2.pdf", 1, 40, 0x03418614, "A device-specific color space (DeviceCMYK) without an appropriate output intent is used.", 1
"output2.pdf", 20, 0, 0x83410612, "The document does not conform to the requested standard.", 1
So if I understand your answer correctly, the above command should produce a PDF file that uses the RGB color space, independent of the color space of the input file. If the input file uses CMYK, then the colors have to be translated into RGB by the above command.
If I interpret the first error message correctly, the color space used in output2.pdf is still CMYK (despite command parameters such as ColorConversionStrategy=/RGB). Since I used AdobeRGB1998.icc, the validation error appears.
What am I missing in the above command?
Going back to my original question (which is one step further): Instead of always converting to RGB (or CMYK), I wanted to somehow detect which color space is used in the input file and then dynamically switch to a RGB or CMYK icc file. Is it possible to achieve that?
Ghostscript does not support PDF/A-3. The conformance parameter you are looking for is -dPDFA=, where the valid values are nothing (defaults to 1), 1 or 2. You can find this documented in ghostpdl/gs/doc/Ps2pdf.htm#PDFA
I'm not sure what you are asking for here though. You must either create a PDF/A file (in level 1 or 2 anyway, I haven't read the revision 3 spec yet) which is RGB or CMYK, because you aren't allowed to use both (you can convert everything to device independent colour of course). The colour space used in the input isn't relevant, other than to decide whether it needs to be converted.
This is something you need to decide; we can't decide it for you. One important reason is that the OutputIntent must be consistent with either RGB or CMYK, and the pdfwrite device doesn't check it; it assumes you chose one which matches the device space you are using for the PDF file (by the way, don't set the ProcessColorModel, use ColorConversionStrategy instead). In your case you have set OutputIntent to AdobeRGB1998, so your colours must be specified either in device independent colour, or RGB.
Given the errors you quote, I would suggest the problem is that you haven't specified -sColorConversionStrategy, so the input colours are not being converted to the required device space. I would further guess that the script you copied this from set -dUseCIEColor, and you didn't copy that bit. DO NOT set -dUseCIEColor, it's a horrible ancient piece of PostScript hackery. Instead set ColorConversionStrategy, which will convert colours in a much better way, as required.
Updated answer as this started getting too long for a comment:
I can't immediately see any problems with your command line; can you share an example PDF file? It's much easier to investigate these things with a solid example. I know from our customers and other free users that pdfwrite is capable of producing conforming PDF/A-1b files.
Regarding the second question: it's not possible to do that, because currently you need to set the OutputIntentProfile to either a CMYK one or an RGB one before you start. You can't just run through the input PDF file until you come to a colour operation and then decide. If you feel like some programming, it could be done by modifying pdfwrite, because the profile isn't actually used until the output is closed.
One problem is that, in order to do the colour conversion, you need to set the underlying ProcessColorModel (this is done for you automatically by ColorConversionStrategy). The only way to change ProcessColorModel is to execute a setpagedevice, which causes an erasepage. Now I think that's actually fixable with pdfwrite: all it does is write a white rectangle over the page, so you should be able to intercept that and not emit it. Otherwise any marks you made before you encountered an RGB or CMYK operation would be underneath the white rectangle.
So essentially no, you can't do it right now. If it's important to you, then you could probably modify the code to do so (don't forget you will also need to supply two OutputIntent profiles to choose between as well). We've never had a customer request to do this, so we won't likely take it on as a project. Of course, if you did get this working, we might very well incorporate it into the code base if you were to offer it back to us.
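Because the colour model has to be chosen before conversion starts, one way to keep the conversion strategy and the ICC profile consistent is to derive both from a single setting. The Python sketch below only assembles an argument list and is an illustration under stated assumptions: the flags and their spelling (including the leading slash in /RGB) are copied from the command in the question rather than verified, the function name pdfa_args is invented, and cmyk_profile.icc is a placeholder for whatever CMYK profile you supply. The PDFA_def.ps passed in must reference the same profile.

```python
# Colour model -> ICC profile. The CMYK file name is a placeholder (assumption).
PROFILES = {
    "RGB": "AdobeRGB1998.icc",
    "CMYK": "cmyk_profile.icc",
}


def pdfa_args(src, dst, colour="RGB", pdfa_def="PDFA_def.ps", gs="gswin64c.exe"):
    """Build a Ghostscript PDF/A-1 command whose conversion strategy and
    output profile are both derived from the same colour model."""
    if colour not in PROFILES:
        raise ValueError(f"colour must be one of {sorted(PROFILES)}")
    return [
        gs, "-dPDFA=1", "-dBATCH", "-dNOPAUSE", "-dNOOUTERSAVE",
        f"-sColorConversionStrategy=/{colour}",    # spelling as in the question
        f"-sOutputICCProfile={PROFILES[colour]}",
        "-sDEVICE=pdfwrite",
        f"-sOutputFile={dst}",
        "-dPDFACompatibilityPolicy=1",
        pdfa_def,                                  # must name the same profile
        src,
    ]


# Example wiring (not executed here):
#   import subprocess
#   subprocess.run(pdfa_args("input2.pdf", "output2.pdf", colour="RGB"), check=True)
```

This does not detect the input's colour space; it only ensures that whichever model is chosen up front is applied consistently.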
