soffice.exe to Convert odt to PDF/A - pdf-generation

I'm trying to convert ODT to PDF using soffice.exe, but it always gives me a plain PDF and not PDF/A.
Is there a way to convert ODT to PDF/A using soffice.exe?

As I understand it, whatever format you last used successfully to export a document as PDF becomes the default. So export something as PDF/A first, and then try again from the command line.
I have tried the unoconv option:
unoconv -f pdf -eFormsType=1 -eSelectedPdfVersion=1 Some.odt
and whilst it purports to work, the result gives errors when tested as a PDF/A file.
Whereas the same file run through the
soffice --headless --convert-to pdf Some.odt
command does pass the test, but only if the previous export was successfully made to a PDF/A file.
Testing was performed using pdfa from http://www.pdftron.com/pdfamanager/downloads.html
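A minimal sketch of asking for PDF/A explicitly instead of relying on the last-used export setting. It rests on two assumptions that the answer above does not confirm: that your LibreOffice is recent enough (7.4 or later, as far as I know) to accept JSON filter options after the filter name in --convert-to, and that the export property is spelled SelectPdfVersion, with 1 meaning PDF/A-1b. build_pdfa_command is a hypothetical helper name; it only builds the argument list and does not run anything.

```python
import json


def build_pdfa_command(odt_path, outdir=".", pdf_version=1, soffice="soffice"):
    """Return the argument list for a headless ODT -> PDF/A conversion.

    Assumption: pdf_version maps onto the SelectPdfVersion export property
    (1 = PDF/A-1b, 2 = PDF/A-2b); check this against your LibreOffice version.
    """
    filter_options = {
        "SelectPdfVersion": {"type": "long", "value": str(pdf_version)}
    }
    # Compact separators keep the option string free of spaces.
    options_json = json.dumps(filter_options, separators=(",", ":"))
    return [
        soffice,
        "--headless",
        "--convert-to",
        "pdf:writer_pdf_Export:" + options_json,
        "--outdir",
        outdir,
        odt_path,
    ]
```

Pass the list to subprocess.run(cmd, check=True) and validate the output with a PDF/A checker, as was done above. Note that the property spelling here differs from the -eSelectedPdfVersion option in the unoconv command; this sketch does not settle which one a given unoconv version honours.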

Related

Piping an image retrieved using curl to sips to change format without saving intermediate file

I have url links to image files I want to retrieve from the internet.
I can download the files using curl without issue using:
curl "https://...web address..." > myfileName;
The image files are of various types, some .bmp, some .jpg, etc. I have been using sips in Terminal on macOS to convert each to a .png file using:
sips -s format png downloadFileName --out newFileName.png
This works well on files I've saved as downloadFileName, regardless of the starting file type.
As I have many files to process I wanted to pipe the output of the curl download directly into sips, without saving an intermediate file.
I tried the following (which combines my two working steps without the intermediate file name):
curl "https://...web address..." | sips -s format png --out fileName.png
and get a "no file" error: Error 4: no file was specified.
I've searched the sips man page but cannot find any reference to piped input, and have been unable to find a useful answer searching SO or Google.
Is there a way to process an image downloaded using curl directly in sips without first saving the file?
I do not necessarily need the solution to use a pipe, or even be on one line. I have a script that will cycle through a few thousand urls and simply want to avoid saving lots of files that will be deleted a line later.
I should add, I do not necessarily need to use sips either. However, any solution must be able to handle image files of unknown type (which sips does admirably) as no file extension is present on the files.
Thanks
I don't have sips installed, but its man page indicates that it cannot read from stdin. However, if you use Bash or Zsh (the macOS default now), you can use process substitution. In this example I use convert, which is part of ImageMagick and can also convert between image types:
$ convert <(curl -s https://i.kym-cdn.com/entries/icons/mobile/000/018/012/this_is_fine.jpg) this_is_fine.png
$ file this_is_fine.png
this_is_fine.png: PNG image data, 800 x 450, 8-bit/color RGB, non-interlaced
After that, this_is_fine.png will be the only file in the directory, with no temporary files left behind.
Apparently sips only reads regular files which makes it impossible to use /dev/stdin or named pipes.
However, it is possible using the mature and feature-rich convert command:
$ curl -sL https://picsum.photos/200.jpg | convert - newFilename.png
$ file newFilename.png
newFilename.png: PNG image data, 200 x 200, 8-bit/color RGB, non-interlaced
(First install ImageMagick via brew install imagemagick or sudo port install ImageMagick.)
ImageMagick permits image data to be read and written from the standard streams STDIN (standard in) and STDOUT (standard out), respectively, using a pseudo-filename of -.
(Source: ImageMagick documentation, section "STDIN, STDOUT, and file descriptors".)
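For a script that loops over thousands of URLs, the same pipe can be driven from Python without touching the disk for the intermediate file. This is only a sketch: fetch, pipe_through and to_png are hypothetical helper names, and it assumes an ImageMagick binary called convert is on the PATH (newer installs may call it magick).

```python
import subprocess
import urllib.request


def fetch(url):
    """Download a URL into memory and return the raw bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def pipe_through(command, data):
    """Feed data to command on stdin and return what it writes to stdout."""
    result = subprocess.run(
        command, input=data, stdout=subprocess.PIPE, check=True
    )
    return result.stdout


def to_png(data, convert="convert"):
    """Convert image bytes of any type ImageMagick recognises to PNG bytes."""
    # "-" reads the image from stdin; "png:-" writes PNG data to stdout.
    return pipe_through([convert, "-", "png:-"], data)
```

Usage would be along the lines of open("newFilename.png", "wb").write(to_png(fetch(url))), so only the final PNG is ever written.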

Fill an existing LibreOffice document with values from the command line

I run a Bash script which generates a PDF at the end of a billing run. I used to do that with LaTeX, but the users are asking for a more MS Office-like solution. So I'm thinking of using a LibreOffice document and running LibreOffice on the command line to generate the PDF. That works. But I have no idea how to inject the values I need to change (e.g. the address and the billing information) into that document before generating the PDF.
Let's assume the example.odt document contains this text:
Dear $first_name,
you owe us $amount USD.
Regards
xyz
Since example.odt is not really easy to edit from a Bash script, I'm searching for another way to inject values for $first_name and $amount.
What is the best way to do this?
A LibreOffice file is a zip archive, which can be unpacked with
unzip old.odt -d example
cd example
The document text is in the file content.xml, which can be changed with sed or any other tool. After that, the .odt file has to be created again:
zip -r ../new.odt .
The PDF can then be created with this command (the path is from OS X):
/Applications/LibreOffice.app/Contents/MacOS/soffice --headless \
--convert-to pdf:writer_pdf_Export --outdir ~/Desktop/ ~/Desktop/new.odt
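The unzip, edit and zip steps can also be done in one pass with Python's zipfile module. The sketch below is an illustration under assumptions, not part of the answer: fill_odt is a hypothetical helper, the placeholders are assumed to appear in content.xml as contiguous text (Writer sometimes splits typed text across several XML elements, in which case a plain replace will not find them), and values are XML-escaped so that an ampersand in a name does not break the document.

```python
import zipfile
from xml.sax.saxutils import escape


def fill_odt(template_path, output_path, values):
    """Copy an ODT, replacing $name placeholders in content.xml."""
    with zipfile.ZipFile(template_path) as src, \
            zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "content.xml":
                text = data.decode("utf-8")
                # Longest names first so $amount cannot clobber $amount_due.
                for name in sorted(values, key=len, reverse=True):
                    text = text.replace("$" + name, escape(str(values[name])))
                data = text.encode("utf-8")
            # Reusing the original ZipInfo keeps each entry's order and
            # compression method, so "mimetype" stays first and uncompressed.
            dst.writestr(item, data)
```

A call such as fill_odt("example.odt", "new.odt", {"first_name": "Jane", "amount": "120.00"}) produces a file that can then be handed to the soffice command above.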

Is it possible to get a file's owner URL metadata in the macOS terminal?

I can access the metadata property "owner url" through Photoshop, but am hoping that there's a way to access it from the command line without having to open the file.
Does anyone know of a way to do this?
mdls doesn't list this particular metadata field.
There is no built-in command line tool to achieve this.
However, you can utilize exiftool, which is a platform-independent Perl library plus a command-line application for reading, writing and editing meta information in a wide variety of files.
Installation:
The guidelines for installing it on macOS can be found here. In summary:
Download the ExifTool OS X Package from the ExifTool home page.
(The file you download should be named ExifTool-11.17.dmg.)
Install as a normal OS X package.
(Open the disk image, double-click on the install package, and
follow the instructions.)
You can now run exiftool by typing exiftool in a Terminal window.
Processing a single file:
Reading the "owner url" via the command line:
Run the following command in a Terminal window:
$ exiftool -b -xmp:WebStatement ~/Desktop/path/to/image.psd
Note: the ~/Desktop/path/to/image.psd part in the command above should be replaced with a real image filepath.
This command will log the URL to the console only if the image metadata contains one. For instance:
https://www.example.com
Writing the "owner url" via the command line:
You can also write the "owner url" to a file by running the following command:
$ exiftool -xmp:WebStatement="https://www.foobar.com" ~/Desktop/path/to/image.psd
Note: As mentioned previously, the ~/Desktop/path/to/image.psd part in the command above should be replaced with a real image filepath, and the https://www.foobar.com part should be replaced with the actual URL you want to apply.
Processing multiple files:
Reading the "owner url" for multiple files via the command line:
If you wanted to read the "owner url" for all image files within a given folder, (including those in sub folders), and generate a JSON report you can run the following command:
$ exiftool -j -r -xmp:WebStatement ~/Desktop/path/to/folder/ -ext jpg -ext png -ext psd -ext tif > ~/Desktop/owner-urls.json
Breakdown of command (above):
-j - Use JSON formatting for output.
-r - Recursively process sub directories.
-xmp:WebStatement - Retrieve the WebStatement value, i.e. "owner url".
~/Desktop/path/to/folder/ - The path to the folder containing images (This should be replaced with a real path to a folder).
-ext jpg -ext png -ext psd -ext tif - The file extension(s) to process.
> ~/Desktop/owner-urls.json - Save the JSON output to a file on the Desktop named owner-urls.json.
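Once the report exists, it can be read back with a few lines of Python. This sketch assumes the report is a JSON array with one object per file, each carrying a SourceFile key and, when the tag is present, a WebStatement key (no group prefix, since -G was not used). owner_urls is a hypothetical helper name.

```python
import json


def owner_urls(report_text):
    """Map each SourceFile in an exiftool -j report to its WebStatement.

    Files whose metadata has no WebStatement are mapped to None.
    """
    return {
        entry["SourceFile"]: entry.get("WebStatement")
        for entry in json.loads(report_text)
    }
```

For example, owner_urls(open("owner-urls.json").read()) gives a dictionary that can be filtered for the files that still lack an owner URL.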

Atom and Pandoc

Using Atom, I'm trying to generate a PDF file from markdown.
I installed markdown-preview-plus plugin, which supports pandoc, and then installed pandoc and configured the plugin to use it.
Now, markdown-preview-plus does recognize pandoc, but I cannot see any command to generate a PDF. The plugin's web page seems to say nothing about that. Can you help me?
markdown-preview-plus
Atom's markdown-preview-plus package generates an HTML preview, as @mb21 alluded to. This is clear from the fact that you can right-click on the preview and select Copy as HTML. With the Enable Pandoc Parser option enabled, MPP is indeed using pandoc to generate this HTML preview.
In light of your question, I tried adding the following to the Pandoc Options: Commandline Arguments setting in MPP:
--to=latex, --output=temp.pdf
Note 1: you can't specify --to=pdf because pandoc can only generate PDF by first generating a LaTeX file.
Note 2: the above doesn't work because MPP essentially passes the contents of the editor window through pandoc 'on-the-fly', so you can't really hijack the --output setting.
Workarounds
AFAIK, there is no way to get a "live" PDF preview (the way you can get a "live" HTML preview using MPP). This means you will have to build the document whenever you'd like to view what's changed.
Assuming you want to view the PDF in the Atom window, you can install pdf-view and open the PDF in a pane, side-by-side with your source. Otherwise, you could simply open the PDF in your favorite PDF reader.
Build from the command line
As @mb21 suggested, you can build from the command line. I sometimes use BAT/CMD files to store lengthy or complicated build commands (since I'm on Windows). For example, in my document source directory, I might have make.cmd, which contains:
pandoc --filter=pandoc-crossref --filter=pandoc-citeproc --smart --listings --number-sections --standalone file.md -o file.pdf
Then I run make.cmd from the command prompt using ./make.cmd.
Alternatively, you could use a Makefile.
Build from Atom
Install atom-script. Then, configure your run options (Ctrl-Alt-Shift-O, on Windows) with something along the lines of the following:
Command: pandoc
Command Arguments: --filter=pandoc-crossref --filter=pandoc-citeproc --standalone file.md -o file.pdf
Whenever you have edited your source and would like to update the PDF, you can execute the command via Ctrl-Shift-B.
Build from Atom using panzer
You'll still need Atom's script package, but you'll also need to get panzer which is a utility that helps manage build configurations for pandoc.
Edit:
Automatically Build-on-Save via Grunt
Rather than having to press a key combination (e.g. building from the command line or using atom-script), I thought of automatically building the output PDF upon saving using Grunt. I've captured the basic idea in this gist.
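The core of any build-on-save setup is a check for whether the PDF is older than its source. The following is a Python stand-in for that idea, not the Grunt code from the gist: needs_rebuild and build_pdf are hypothetical helpers, and the pandoc arguments are a pared-down assumption that you would replace with your own build command.

```python
import os
import subprocess


def needs_rebuild(source, output):
    """Return True when output is missing or older than source."""
    if not os.path.exists(output):
        return True
    return os.path.getmtime(source) > os.path.getmtime(output)


def build_pdf(source, output, pandoc="pandoc"):
    """Run pandoc only when the PDF is out of date; return True if it ran."""
    if not needs_rebuild(source, output):
        return False
    # Assumed minimal invocation; add filters and options as needed.
    subprocess.run([pandoc, "--standalone", source, "-o", output], check=True)
    return True
```

Calling build_pdf("file.md", "file.pdf") from a loop that sleeps for a second between checks gives a crude watcher; a file-system notification library would be the more efficient choice.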
I'm working on the same thing right now, only I am using embedded LaTeX mathematical formulae etc.
The markdown-preview-plus preview gives a fair representation of what I'm likely to see, but I've been running the following command from the command line in order to compile my PDF:
pandoc -f markdown myfile.md -o pdffile.pdf
This works in most simple cases. For trickier ones, or where I want to stitch a few things together, I first convert my Markdown files to their LaTeX equivalents using a command like
pandoc -f markdown+tex_math_dollars+pipe_tables myfile.md -o myfile.tex
which creates a LaTeX version of my original file that I can compile or combine with other LaTeX files, or convert to PDF using
pandoc myfile.tex -o myfile.pdf
It's supposed to be possible to run these commands from within Atom using the 'script' package, but I have yet to try that. It would be great if someone could post their methods in this direction too.
I got pandoc-pdf toolbar working on Atom after installing Perl on Windows, though PDF compilation is much slower with "Latexmk".
I recommend using markdown-preview-enhanced and then choosing from the several export options available, including pandoc, ebook etc.
markdown-preview-enhanced in Atom has support for PDF-on-save, similar to the Grunt workflow suggested above. The output command has several options in addition to PDF, such as MS Word.
Example:
---
layout: post
title: tentative tentacles
date: 2020-09-15 15:01
bibliography: bibliography.bib
output: pdf_document
export_on_save:
  pandoc: true
---

Extract Images from an Excel Document

I am doing some data mapping from an .xls Excel document, and I am trying to write a quick script to pull the images out of the Excel document.
What is the quickest, simplest way to do this programmatically?
I am running Ubuntu 10.10 and I would prefer to use Python if possible.
An XLSX file is a zip archive.
$ unzip file.xlsx
All of the pictures are in xl/media/. This is not true for older .XLS files, but you can convert them to XLSX with a modern version of MS Office.
If you don't have MS Office, you can do the same thing with LibreOffice. Convert the file to .ods, then open it as a zip file; the images will be in the Pictures folder.
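Since Python was requested, here is a minimal sketch of the same unzip approach using only the standard library. extract_media is a hypothetical helper name; it assumes the file has already been converted to .xlsx (or to .ods, in which case the prefix would be "Pictures/" as described above).

```python
import os
import zipfile


def extract_media(archive_path, dest_dir, prefix="xl/media/"):
    """Write every file stored under prefix to dest_dir; return their names."""
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    with zipfile.ZipFile(archive_path) as archive:
        for name in archive.namelist():
            # Skip entries outside the media folder and bare directory entries.
            if not name.startswith(prefix) or name.endswith("/"):
                continue
            # Keep only the base name so entries cannot escape dest_dir.
            base = os.path.basename(name)
            with open(os.path.join(dest_dir, base), "wb") as handle:
                handle.write(archive.read(name))
            saved.append(base)
    return saved
```

For example, extract_media("file.xlsx", "images") copies the embedded pictures into an images directory and returns their file names.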
I hate to answer my own question, but the best method I found only required two commands at the command line (assuming you have the right software installed).
First, use unoconv to convert the .xls to .pdf:
http://dag.wieers.com/home-made/unoconv/
On Ubuntu 10.10 command line:
sudo apt-get install unoconv
unoconv -f pdf file.xls
Then extract the images from the pdf using pdfimages (which seems to come bundled with Ubuntu):
http://en.wikipedia.org/wiki/Pdfimages
Back on the command line:
pdfimages file.pdf fileimage
And done! All of the images in the .xls are now in separate files in the directory. This could be done very easily on most Linux systems using your language of choice. In Python, for example:
import subprocess
# Convert the spreadsheet to file.pdf, then dump each embedded image
# to fileimage-000.ppm, fileimage-001.ppm, ... (PPM/PBM by default).
subprocess.check_call(['unoconv', '-f', 'pdf', 'file.xls'])
subprocess.check_call(['pdfimages', 'file.pdf', 'fileimage'])
I would love to hear a simpler solution if somebody has one.

Resources