How to convert a PDF to text on Windows?

I have a scanned PDF and I want to convert it to an editable text format. Do you have any recommendations for doing this on Windows? I was thinking about using Linux as a subsystem of Windows. Any other ideas?

It looks like you work in Python, so a PyPI package you might want to look into is pypdfocr. Essentially, you'll want to use a tool like Poppler to render the PDF and extract its page images (a scanned PDF is made of images), then run those images through an OCR engine to get the text.
I have not used this package myself, so this is as much help as I can give. It should work with Python on both Windows and Linux.
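The render-then-OCR pipeline described above can be sketched in a few lines of Python. This is not how pypdfocr itself is implemented; it swaps in two other packages as assumptions: pdf2image (which drives Poppler to turn each page into an image) and pytesseract (a wrapper around the Tesseract OCR engine). Both are imported lazily, so the function can also be driven with your own render/recognize callables.

```python
from functools import partial
from typing import Callable, Iterable, Optional


def ocr_scanned_pdf(
    pdf_path: str,
    render: Optional[Callable[[str], Iterable[object]]] = None,
    recognize: Optional[Callable[[object], str]] = None,
    dpi: int = 300,
) -> str:
    """Render every page of a scanned PDF to an image, OCR each image, and join the text."""
    if render is None:
        # Assumption: pdf2image is installed and Poppler's binaries are on PATH.
        from pdf2image import convert_from_path

        render = partial(convert_from_path, dpi=dpi)
    if recognize is None:
        # Assumption: pytesseract is installed and the Tesseract engine is on PATH.
        import pytesseract

        recognize = pytesseract.image_to_string
    # One OCR pass per page image; pages are separated by a form feed character.
    return "\f".join(recognize(page) for page in render(pdf_path))
```

On Windows, if Poppler is not on PATH you can pass `poppler_path=...` to `convert_from_path`, and `pytesseract.pytesseract.tesseract_cmd` can be set to the full path of `tesseract.exe`.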

Related

Programming language selection for showing an image with other images on it

I want to make a program that displays an image with other images on it (a map with various icons on it). What language (and library!) would you recommend, based on these facts:
- I have some basic OOP knowledge.
- I'll need a free IDE that runs on Windows.
I would guess Java? But it is not easy to compress my question into a good Google search string.
Regarding "displays an image with other images on it":
Almost every language can do this, for example Python with Pillow, C++ with OpenCV, and many other options (I believe most mainstream languages have their own image library).
The real question is what specifically you want to implement.
If you just want to put some images onto another image and you do not need any interactive features, go with Python and Pillow. It is easy to learn and will solve your problem well.
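For the non-interactive case, a minimal Pillow sketch of "a map with icons on it" might look like this. The function name and the (icon, position) pairing are illustrative assumptions; the key detail is passing the icon as its own mask to `Image.paste`, so transparent parts of the icon leave the map visible.

```python
from typing import Iterable, Tuple

from PIL import Image


def place_icons(
    base: Image.Image, icons: Iterable[Tuple[Image.Image, Tuple[int, int]]]
) -> Image.Image:
    """Return a copy of `base` with each icon pasted at its (x, y) top-left position."""
    # convert() returns a new image, so the original map is left untouched.
    result = base.convert("RGBA")
    for icon, position in icons:
        icon = icon.convert("RGBA")
        # Using the icon as its own mask applies its alpha channel:
        # fully transparent icon pixels do not overwrite the map.
        result.paste(icon, position, icon)
    return result
```

Usage (hypothetical file names): `place_icons(Image.open("map.png"), [(Image.open("pin.png"), (120, 45))]).save("map_with_icons.png")`.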

Extract pictures from a table in a PDF

I would like to write a small program, or script, to extract a set of pictures from a PDF.
I have several PDFs, and each has a table of pictures. I would like to have one picture per file, so I need a way to extract them. Given the layout of the PDF (a table/grid), it seems much easier to write a program than to do it manually. However, I have no idea what tools are available.
What libraries are available?
My preference is Python, then C# or Java, then maybe some other language (my C and C++ are rusty; I have not used them for years).
I am on Debian GNU/Linux, so I have a wide choice of tools.
I went with PDFBox (an Apache project, so Free Software). It is a Java library and also a command-line tool (the app module). I then scripted it with a bit of Python to process the extracted text (yes, it did that as well) and rename the image files.
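The renaming step could look roughly like the sketch below. It assumes PDFBox has already extracted the images into one directory (the exact command-line syntax differs between PDFBox versions, so it is not shown) and, as a hypothetical mapping, that the extracted text yields one caption per line in the same order as the images. The helper names are illustrative. Numeric-aware sorting matters here, because a plain string sort would put `img-10` before `img-2`.

```python
import re
from pathlib import Path
from typing import List, Sequence

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def names_from_text(text: str) -> List[str]:
    """Turn each non-empty line of extracted text into a filesystem-safe base name."""
    names = []
    for line in text.splitlines():
        cleaned = re.sub(r"[^\w\-]+", "_", line.strip()).strip("_")
        if cleaned:
            names.append(cleaned)
    return names


def _natural_key(path: Path):
    """Sort key that orders 'img-2.png' before 'img-10.png'."""
    return [int(n) for n in re.findall(r"\d+", path.stem)], path.name


def rename_extracted_images(image_dir: str, names: Sequence[str]) -> List[Path]:
    """Rename extracted images, in natural order, to the given base names (extension kept)."""
    images = sorted(
        (p for p in Path(image_dir).iterdir() if p.suffix.lower() in IMAGE_SUFFIXES),
        key=_natural_key,
    )
    if len(images) != len(names):
        raise ValueError(f"found {len(images)} images but {len(names)} names")
    renamed = []
    for image, name in zip(images, names):
        target = image.with_name(f"{name}{image.suffix}")
        # Refuse to silently overwrite an existing file (POSIX rename would).
        if target.exists() and target != image:
            raise FileExistsError(str(target))
        image.rename(target)
        renamed.append(target)
    return renamed
```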

wxPython: How to display a PDF file object in a wxPanel?

I have a wxPython GUI. I would like to display a PDF as an image inside a wxPanel on Mac/UNIX. What should I use?
Any advice or suggestions would be appreciated. Thank you in advance.
There is wxPDF:
http://wxcode.sourceforge.net/components/wxpdfdoc/
You could write your own Python wrapper for it if you are comfortable enough with C++.
Or you can try:
http://www.wxpython.org/docs/api/wx.lib.pdfwin-module.html
But that requires Acrobat to be installed on the user's system.
Edit:
You could also use pdf2ps to convert each page (called from the command line, so you don't violate the GPL if you are not releasing under the GPL) and then convert the result to a PNG file with Ghostscript.
Not very elegant, but probably the best approach without using Acrobat.
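A variation on the command-line approach: Ghostscript can read PDF directly, so a sketch can skip the pdf2ps step and have `gs` render each page straight to PNG, still as an external process. The function names are illustrative, and the sketch assumes a `gs` executable on PATH (the default on most Mac/UNIX installs).

```python
import subprocess
from typing import List


def ghostscript_png_command(
    pdf_path: str, output_pattern: str, dpi: int = 150, gs: str = "gs"
) -> List[str]:
    """Build a Ghostscript command that renders every PDF page to its own PNG file."""
    return [
        gs,
        "-dSAFER",  # restrict file operations requested from inside the PDF
        "-dBATCH",  # exit when finished instead of waiting at the gs prompt
        "-dNOPAUSE",  # do not pause between pages
        "-sDEVICE=png16m",  # 24-bit colour PNG output
        f"-r{dpi}",  # rendering resolution in dots per inch
        f"-sOutputFile={output_pattern}",  # e.g. "page-%03d.png", one file per page
        pdf_path,
    ]


def render_pdf_pages(pdf_path: str, output_pattern: str, dpi: int = 150) -> None:
    """Run Ghostscript as a separate process; raises CalledProcessError on failure."""
    subprocess.run(ghostscript_png_command(pdf_path, output_pattern, dpi), check=True)
```

Each resulting PNG can then be loaded with `wx.Bitmap(path, wx.BITMAP_TYPE_PNG)` and shown in a `wx.StaticBitmap` placed on the panel.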

How can I read and write DICOM images in MATLAB version 7.0?

I am currently using MATLAB version 7.0. I need to read a DICOM image and write it back out. What functions are available to help me do this?
You can use the dicomread/dicomwrite functions from the Image Processing Toolbox, but I would recommend one of the many functions on the MATLAB File Exchange. Personally, I use this one.
Look at the code at the bottom of this page:
http://sites.google.com/site/dicomil/dicomandmatlab

Convert PowerPoint to Flash in Linux

I'm looking to convert PPT and PPTX files to Flash (or FLV) files automatically on Linux, so I need a command-line utility.
Are there any available options out there for me? (I haven't found any so far).
As an alternative, I was also looking for a Flash player that can play PPT/PPTX files (similar to what SlideShare provides). Does anyone know of any besides openslide?
Thanks for any help.
Related question here: Convert PowerPoint to Flash
Summary of answers: you should probably use OpenOffice to do it.
To do it from the command line, it looks like you should probably use PyODConverter: http://www.artofsolving.com/opensource/pyodconverter
OpenOffice generates a very poor SWF version. It should at least generate back/forward buttons.
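For batch use, a small driver can run PyODConverter once per deck. This is a sketch under several assumptions: OpenOffice is already running headless and listening where `DocumentConverter.py` expects it (see the PyODConverter page for the exact `soffice` invocation), the script's usage is `DocumentConverter.py <input file> <output file>` with the format chosen from the output extension, and `python` is an interpreter that can import OpenOffice's `uno` module (often the one bundled with OpenOffice). The function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def converter_commands(
    input_dir: str, converter_script: str, python: str = "python", out_ext: str = ".swf"
) -> List[List[str]]:
    """Build one PyODConverter command per .ppt/.pptx file found in input_dir."""
    commands = []
    for deck in sorted(Path(input_dir).iterdir()):
        if deck.suffix.lower() in {".ppt", ".pptx"}:
            target = deck.with_suffix(out_ext)
            # Assumed usage: DocumentConverter.py <input file> <output file>,
            # with the output format picked from the output file's extension.
            commands.append([python, converter_script, str(deck), str(target)])
    return commands


def convert_all(input_dir: str, converter_script: str, python: str = "python") -> None:
    """Run the conversions one after another, stopping at the first failure."""
    for command in converter_commands(input_dir, converter_script, python):
        subprocess.run(command, check=True)
```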
