How to make docx2pdf work on Google Colab to convert Docx to PDF automatically - pdf-generation

I get the following error when I try to convert a .docx file to PDF using the docx2pdf package.
Google Colab code:
!pip install docx2pdf
from docx2pdf import convert as doc2pdf
doc2pdf('My_document.docx')
Present output:
docx2pdf is not implemented for linux as it requires Microsoft Word to be installed
If it cannot work on Colab, can you suggest another package that I can use to convert the docx to PDF on Google Colab automatically?
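One commonly used alternative on Linux is LibreOffice in headless mode. The following is a minimal sketch, not a tested Colab recipe. It assumes LibreOffice has already been installed in the runtime (for example with `!apt-get install -y libreoffice`) and that either `soffice` or `libreoffice` is on the PATH. The helper names `build_convert_command` and `convert_docx_to_pdf` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(docx_path, out_dir=".", binary="soffice"):
    """Return the LibreOffice command line that converts one file to PDF."""
    return [
        binary,
        "--headless",            # run without a GUI
        "--convert-to", "pdf",   # target format
        "--outdir", str(out_dir),
        str(docx_path),
    ]


def convert_docx_to_pdf(docx_path, out_dir="."):
    """Convert docx_path to PDF and return the expected output path."""
    # Assumption: LibreOffice exposes `soffice` or `libreoffice` on PATH.
    binary = shutil.which("soffice") or shutil.which("libreoffice")
    if binary is None:
        raise RuntimeError("LibreOffice not found; install it first")
    subprocess.run(build_convert_command(docx_path, out_dir, binary), check=True)
    return Path(out_dir) / (Path(docx_path).stem + ".pdf")
```

With that in place, `convert_docx_to_pdf('My_document.docx')` would write `My_document.pdf` next to the notebook, provided the conversion succeeds.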

Related

No available image handler could decode this transfer syntax JPEG Lossless when reading DICOM and plotting with matplotlib

When I use pydicom in Python 3.6, I run into a problem:
import pydicom
import matplotlib.pyplot as plt
import os
import pylab
filePath = "/Users/zhuangrui/Documents/Python/Dicom/dicoms/zhang_bo/0001.dcm"
dataSet_1 = pydicom.dcmread(filePath)
plt.imshow(dataSet_1.pixel_array)
plt.show()
Here is the problem:
How can this problem be solved? Thank you very much!
I faced the same problem. After doing some research on the suggested link above, I managed to solve it by updating to the latest pydicom release ("1.2.0") and installing gdcm. You can update pydicom with
pip install -U git+https://github.com/pydicom/pydicom.git
You can find the latest gdcm here and this link explains the installation.
I use Anaconda, which makes it easier to install the gdcm package and solve the problem. If you use Anaconda,
just type the following inside your environment:
conda install pydicom --channel conda-forge
to get the latest pydicom, and
conda install -c conda-forge gdcm
to get gdcm. This resolves the problem. Hope this helps.
With pydicom, you need an appropriate image handler also installed to handle compressed image types.
For JPEG lossless, in theory the following should work: jpeg_ls, gdcm, or Pillow with jpeg plugin. All of these also require Numpy to be installed. See the discussion at https://github.com/pydicom/pydicom/issues/532.
There is also a pull request in progress to add more descriptive error messages for what image handlers are needed for different images.
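To see which of the decoders mentioned above are present in an environment, one can check whether their modules are importable. This is a minimal sketch using only the standard library. The import names (`gdcm`, `jpeg_ls`, `PIL`, `numpy`) are assumptions about how those packages install, and `installed_handlers` is a hypothetical helper, not part of pydicom.

```python
import importlib.util

# Import names of the optional decoders mentioned above, plus NumPy.
# Assumption: these are the module names the packages install under.
CANDIDATE_MODULES = {
    "numpy": "NumPy (required by all handlers)",
    "gdcm": "GDCM",
    "jpeg_ls": "jpeg_ls",
    "PIL": "Pillow",
}


def installed_handlers(candidates=CANDIDATE_MODULES):
    """Map each candidate module name to True if it can be imported."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in candidates
    }
```

Calling `installed_handlers()` returns a dict such as `{"numpy": ..., "gdcm": ...}` with one boolean per module. An importable Pillow does not by itself guarantee that the needed JPEG plugin is available, so this only narrows down what is missing.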
Problem:
I was trying to read medical images with the .dcm extension, but I got an error on Windows as well as on Ubuntu. I found a solution that works on both machines.
The error I got on Ubuntu is: NotImplementedError: this transfer syntax JPEG 2000 Image Compression (Lossless Only), can not be read because Pillow lacks the jpeg 2000 decoder plugin
(Note: on Windows I got a different error, but I am sure it has the same cause, i.e. Pillow does not support the JPEG 2000 format.)
Platform information:
I am using Python 3.6, Anaconda, and Ubuntu with 15 GB RAM.
RAM is important:
The solution I applied is the same as the one Ali explained above, but I want to add that this installation may take time, depending on how much RAM you have. On Ubuntu on a cloud platform with 15 GB RAM it took little time; on a local Windows machine with 4 GB RAM it took a lot of time.
Solution
Anaconda is recommended. Why?
Please check the official pydicom documentation (https://pydicom.github.io/pydicom/dev/getting_started.html), which says: "To install pydicom along with image handlers for compressed pixel data, we encourage you to use Miniconda or Anaconda".
If you are using Ubuntu, open a terminal directly. If you are using Windows, go to Environments in Anaconda Navigator and start a terminal from there. Execute the following commands in it:
pip install -U git+https://github.com/pydicom/pydicom.git
conda install pydicom --channel conda-forge
conda install -c conda-forge gdcm
Cross-check:
Now take the .dcm file that produced the error and try the following code in a Python notebook:
# Imports added so the snippet runs on its own
import pydicom
import matplotlib.pyplot as plt
filename = 'FileName.dcm'
ds = pydicom.dcmread(filename)
plt.imshow(ds.pixel_array, cmap=plt.cm.bone)
It should display the image. Also try this code:
ds.pixel_array
This will give you the array containing the pixel values.

pdf2htmlEX cannot save font to

I get an error when converting some PDF files:
Internal Error: File Offset wrong for ttf table (name-data), -1 expected 174
Save Failed
Cannot save font to C:\Users\test\AppData\Local\Temp//pdf2htmlEX-a14136/__tmp_font1.ttf
I'm using the latest Windows executable:
pdf2htmlEX version 0.14.6
Libraries:
poppler 0.33.0
libfontforge 20150621
cairo 1.12.18
Supported image format: png jpg svg
In my tests it fails at page 76. If I change the page order, it still fails at page 76, even if I remove the original page 76 from the file.
It fails even with the plain command:
pdf2htmlEx test.pdf
If I split the file into chunks of, for example, 10 pages, it works fine, but I can't use that approach: I need to convert the full PDF.
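To narrow down where the failure starts, the split-into-chunks test can be scripted instead of splitting the file by hand. This is a minimal sketch, assuming pdf2htmlEX's `-f` and `-l` (first and last page) options and a known total page count. The helpers `page_ranges` and `build_commands` and the output file names are hypothetical, and nothing here addresses the font error itself.

```python
def page_ranges(total_pages, chunk_size=10):
    """Yield (first, last) 1-based page ranges covering the document."""
    for first in range(1, total_pages + 1, chunk_size):
        yield first, min(first + chunk_size - 1, total_pages)


def build_commands(pdf_path, total_pages, chunk_size=10, binary="pdf2htmlEX"):
    """Build one pdf2htmlEX command per page range (nothing is executed)."""
    commands = []
    for first, last in page_ranges(total_pages, chunk_size):
        out_name = f"part_{first:04d}_{last:04d}.html"  # hypothetical naming
        commands.append(
            [binary, "-f", str(first), "-l", str(last), pdf_path, out_name]
        )
    return commands
```

Each returned list can be passed to `subprocess.run`. Shrinking `chunk_size` on the failing range would show whether a single page or the accumulated fonts across pages trigger the error.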

Python-docx installation error. Cannot import 'etree'

I am using PyCharm to program in Python 3.5. I tried to import the docx module and got the error shown in the picture. I am not able to run pip install either; I get the fatal error shown in the second picture.

Import NLTK : no module NLTK corpus

I have installed NLTK. Here's an image of the installation log.
When I use import nltk, I get an error:
"No module named NLTK.corpus"
Here is a screenshot.
What could be the cause?
I think I had the same problem. Since the question didn't specify which packages are needed, I downloaded all of them at once.
Start Python, import nltk and download the packages, then exit Python and upgrade nltk. Change 'all' to the name of a specific corpus if you only need one. The 'all' download took me a while to complete; I downloaded framenet_v15 separately and restarted the 'all' download afterwards. Upgrade nltk when the download is complete.
$ python
>>> import nltk
>>> nltk.download('all')
>>> exit()
$ pip install --upgrade nltk
To fix this, you should rename your file to something else, say nltkXXX.py. Also make sure to remove "nltk.pyc" from your directory if it exists, since this will also be loaded (it's the byte compiled version of your code). After that, it should work fine.
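Whether a local file is shadowing the installed package can be checked with a few lines. This is a minimal sketch. `find_shadowing_files` is a hypothetical helper, and it only looks in one directory, which is assumed to be the one the script is run from.

```python
from pathlib import Path


def find_shadowing_files(module_name, directory="."):
    """Return local files that would shadow `module_name` on import."""
    folder = Path(directory)
    # A script named nltk.py (or its compiled nltk.pyc) in the working
    # directory is found before the installed package.
    candidates = [folder / f"{module_name}.py", folder / f"{module_name}.pyc"]
    return [path for path in candidates if path.is_file()]
```

If `find_shadowing_files("nltk")` returns a non-empty list, renaming or deleting those files is the fix described above.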
If you are using the latest version of Python, try installing nltk using pip and the wheel downloaded from here:
http://www.lfd.uci.edu/~gohlke/pythonlibs/
Then, in the command prompt, use the command:
pip3 install
This should install nltk correctly.
After that, check the installation in Python using the command:
import nltk
and download the nltk data required using:
nltk.download()
If you get this kind of error (Import NLTK : no module NLTK corpus),
make sure your own file is not named nltk.py.
If it is, rename it to something else (for example, rename nltk.py to example.py).
I hope this helps.
Thanks.
If you are using the PyCharm IDE, you should install NLTK with the IDE's own tools [File -> Settings -> Project Interpreter -> Install (button '+') -> Install Package].

unoconv not working on ubuntu 12.04 server

I am using unoconv to convert different file formats to PDF. It works well on my local machine for all formats, but on my Ubuntu 12.04 server unoconv fails for some formats such as xls, ppt, and pptx. It works fine for doc files, however. It shows the following error for the ppt conversion:
$ unoconv -f pdf Googling.ppt
unoconv: UnoException during conversion in <class '__main__.com.sun.star.lang.IllegalArgumentException'>: Unsupported URL <file:///home/pythonuser/almamapper/media/library/files/c1cb92e62ce54b29a017a6e8eaa23c/Googling.ppt>: ""
Traceback (most recent call last):
  File "/usr/bin/unoconv", line 790, in <module>
    main()
  File "/usr/bin/unoconv", line 769, in main
    convertor.convert(inputfn)
  File "/usr/bin/unoconv", line 679, in convert
    error("ERROR: The provided document cannot be converted to the desired format. (code: %s)" % e.ErrCode)
  File "/usr/lib/python2.7/dist-packages/uno.py", line 337, in _uno_struct__getattr__
    return __builtin__.getattr(self.__dict__["value"],name)
AttributeError: ErrCode
I know I have to install the headless version of OpenOffice on my server, but from this link I understand that Ubuntu switched from OpenOffice to LibreOffice quite a while ago. So I installed LibreOffice with the following command:
apt-get install libreoffice-core libreoffice-writer libreoffice-calc
But I am still getting the same error. Am I missing something in the installation? Does anyone have any thoughts on this issue?
I fixed the above issue by installing the latest version of unoconv. I tried updating LibreOffice and installing the complete version, but neither helped.
I was using unoconv 0.3, and the latest available version is 0.6. So I installed the latest one and it solved the issue.
Here are the steps I followed:
apt-get remove --purge unoconv (remove the old unoconv first)
git clone https://github.com/dagwieers/unoconv
(download the latest version of unoconv from GitHub)
Now cd into the unoconv directory and run sudo make install.
Note: please use git clone; don't download the tar file. In my case the installation failed when I downloaded the tar.
I had the same general problem after running apt-get install unoconv; an additional apt-get install libreoffice fixed it. Your limited install of only some LibreOffice components is probably the reason it only works for some formats. I would certainly expect it to need libreoffice-impress for ppt conversion.
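The idea that each format depends on a particular LibreOffice component can be written down as a lookup. This is a minimal sketch. The extension-to-package mapping is an assumption based on the Ubuntu package names mentioned in this thread, and `required_package` is a hypothetical helper.

```python
from pathlib import Path

# Assumed mapping from file extension to the Ubuntu package providing the
# LibreOffice component that opens it.
COMPONENT_PACKAGES = {
    ".doc": "libreoffice-writer",
    ".docx": "libreoffice-writer",
    ".xls": "libreoffice-calc",
    ".xlsx": "libreoffice-calc",
    ".ppt": "libreoffice-impress",
    ".pptx": "libreoffice-impress",
}


def required_package(filename):
    """Return the package likely needed to convert `filename`, or None."""
    return COMPONENT_PACKAGES.get(Path(filename).suffix.lower())
```

For the failing file in the question, `required_package("Googling.ppt")` points at `libreoffice-impress`, which was not part of the original install command. It would not explain the xls failure, since libreoffice-calc was installed.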
