I have a scanned PDF and want to convert it to an editable text format. Do you have any recommendations for doing this on Windows? I was thinking about running Linux inside Windows. Any other ideas?
It looks like you work in Python, so a PyPI package you might want to look into is pypdfocr. Essentially, you use a tool like Poppler to render the PDF into images (a scanned PDF is just a set of page images), then run an OCR engine over those images to extract the text.
I have not used this package myself, so this is as much help as I can give. It should work with Python on both Windows and Linux.
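The same render-then-OCR pipeline can also be driven directly through the command-line tools, from any language. Below is a minimal sketch in Ruby (to match the other code on this page). It assumes Poppler's pdftoppm and the Tesseract OCR engine are installed and on your PATH. Tesseract is my choice of OCR engine here, not something the answer names, and the helper names are hypothetical:

```ruby
# Sketch: OCR a scanned PDF by shelling out to Poppler (pdftoppm) and Tesseract.
# Assumes both tools are installed and on PATH; helper names are hypothetical.
require "tmpdir"

# pdftoppm command: render every page at the given DPI as PNG files
# named <prefix>-<page number>.png.
def render_command(pdf, prefix, dpi: 300)
  ["pdftoppm", "-r", dpi.to_s, "-png", pdf, prefix]
end

# tesseract command: OCR one image and write the text to <out_base>.txt.
def ocr_command(image, out_base)
  ["tesseract", image, out_base]
end

# Render the pages into a temporary directory, OCR each one in page order,
# and return the combined text.
def ocr_pdf(pdf)
  Dir.mktmpdir do |dir|
    prefix = File.join(dir, "page")
    system(*render_command(pdf, prefix)) or raise "pdftoppm failed"
    # pdftoppm pads page numbers to a common width, so a lexical sort keeps page order
    images = Dir[File.join(dir, "page-*.png")].sort
    images.map do |image|
      out_base = image.sub(/\.png\z/, "")
      system(*ocr_command(image, out_base)) or raise "tesseract failed on #{image}"
      File.read("#{out_base}.txt")
    end.join("\n")
  end
end
```

To use it, call ocr_pdf with the path to your scan and save the returned string. The 300 DPI default is only a starting point. Raise it if the recognized text comes out garbled.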
I have a hex dump that I'm told is a PNG file, and I want to convert it back into a PNG so I can view and edit it.
I've searched all over the internet but can't find any good tools for this. Perhaps a GIMP plugin, or something else?
I'd love your suggestions.
You could try ImageMagick.
It isn't a GUI, but it has a lot of features, such as format conversion.
Which OS are you using?
If you believe it is just Base64-encoded, try uploading it to http://www.motobit.com/util/base64-decoder-encoder.asp (the first Google result for "base64 decoder upload"), choosing the option to decode, and exporting to a binary file.
Otherwise, if you could post a portion of the file in question, that would be of much assistance.
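If you'd rather not upload the file anywhere, decoding it takes only a few lines of Ruby. This is a minimal sketch that assumes the dump is either bare hex digits or Base64. Spaces and line breaks are fine, but offset or ASCII columns such as those in xxd's default output are not. The function names are hypothetical. The code checks the decoded bytes for the 8-byte PNG signature, so a wrong guess about the encoding shows up immediately:

```ruby
# The first 8 bytes of every PNG file: \x89 P N G \r \n \x1A \n
PNG_SIGNATURE = "\x89PNG\r\n\x1A\n".b

# Decode a plain hex dump: drop whitespace, then pack each pair of hex digits into a byte.
def decode_hex_dump(text)
  hex = text.gsub(/\s+/, "")
  raise ArgumentError, "not a plain hex dump" unless hex.match?(/\A(?:\h\h)*\z/)
  [hex].pack("H*")
end

# Decode Base64 text; whitespace is removed first because strict ("m0") decoding rejects it.
def decode_base64_dump(text)
  text.gsub(/\s+/, "").unpack1("m0")
end

# True if the bytes start with the PNG signature.
def png?(bytes)
  bytes.b.start_with?(PNG_SIGNATURE)
end

# Read a dump file, try hex first and fall back to Base64, then write the PNG.
def dump_to_png(in_path, out_path)
  text = File.read(in_path)
  bytes = begin
    decode_hex_dump(text)
  rescue ArgumentError
    decode_base64_dump(text)
  end
  raise "decoded data does not look like a PNG" unless png?(bytes)
  File.binwrite(out_path, bytes)
end
```

For example, dump_to_png("dump.txt", "dump.png") writes the image, which you can then open in GIMP or any other viewer.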
I am currently using MATLAB version 7.0. I need to read a DICOM image and write it back out. What functions are available to help me do this?
You can use the dicomread/dicomwrite functions from the Image Processing Toolbox, but I'd recommend one of the many functions on the MATLAB File Exchange. Personally, I use this.
Look at the code at the bottom of this page:
http://sites.google.com/site/dicomil/dicomandmatlab
I'm looking to convert PPT and PPTX files to Flash (or FLV) files automatically on Linux, so I need a command-line utility.
Are there any options out there? (I haven't found any so far.)
As an alternative, I was also looking for a Flash player that can play PPT/PPTX files (similar to what SlideShare provides). Does anyone know of any other than openslide?
Thanks for any help.
Related question here: Convert powerpoint to flash
Summary of answers: you should probably use OpenOffice to do it.
To do it from the command line, PyODConverter looks like the best bet: http://www.artofsolving.com/opensource/pyodconverter
OpenOffice generates a very poor SWF; at the very least it should include back/forward buttons.
I have a series of PDFs named sequentially like so:
01_foo.pdf
02_bar.pdf
03_baz.pdf
etc.
Using Ruby, is it possible to combine these into one big PDF while keeping them in sequence? I don't mind installing any necessary gems to do the job.
If this isn't possible in Ruby, how about another language? No commercial components, if possible.
Update: Jason Navarrete's suggestion led to the perfect solution:
Place the PDFs to be combined in a directory along with pdftk (or make sure pdftk is on your PATH), then run the following script from that directory:
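# The zero-padded prefixes mean a plain lexical sort keeps the files in order
pdfs = Dir["[0-9][0-9]_*.pdf"].sort
# Passing each argument separately avoids shell-quoting problems with spaces in names
system("pdftk", *pdfs, "output", "combined.pdf") or abort("pdftk failed")
Or I could even do it as a one-liner from the command line:
ruby -e 'system("pdftk", *Dir["[0-9][0-9]_*.pdf"].sort, "output", "combined.pdf")'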
Great suggestion, Jason; perfect solution, thanks. Give him an upvote, people.
A Ruby-Talk post suggests using the pdftk toolkit to merge the PDFs.
It should be relatively straightforward to call pdftk as an external process and have it handle the merging. PDF::Writer may be overkill because all you're looking to accomplish is a simple append.
You can do this by converting to PostScript and back. PostScript files can be concatenated trivially. For example, here's a Bash script that uses the Ghostscript tools ps2pdf and pdf2ps:
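#!/bin/bash
# Start from an empty file, because the loop appends with >>
rm -f temp.ps
for file in 01_foo.pdf 02_bar.pdf 03_baz.pdf; do
  # An output name of "-" makes pdf2ps write to stdout
  pdf2ps "$file" - >> temp.ps
done
ps2pdf temp.ps output.pdf
rm temp.ps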
I'm not familiar with Ruby, but it almost certainly has a function that invokes a given command line (Kernel#system does exactly that).
If you have Ghostscript on your platform, shell out and run this command:
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=finished.pdf <your source pdf files>
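Since the question asks about Ruby: Kernel#system accepts the command and its arguments as separate strings. In that form it runs gs without going through a shell, so file names with spaces need no quoting. Here is a minimal sketch that assumes gs is on your PATH. On Windows the console executable is usually gswin64c instead, but treat that name as an assumption and check your install. The helper names are hypothetical:

```ruby
# Build the Ghostscript command line that merges the inputs, in order, into one PDF.
# gs: executable name (on Windows typically gswin64c; an assumption, check your install).
def gs_merge_command(inputs, output, gs: "gs")
  [gs, "-dBATCH", "-dNOPAUSE", "-q", "-sDEVICE=pdfwrite", "-sOutputFile=#{output}", *inputs]
end

# Run the command without a shell; returns true only if gs ran and exited successfully
# (system returns nil when the executable cannot be started).
def merge_with_gs(inputs, output, gs: "gs")
  system(*gs_merge_command(inputs, output, gs: gs)) == true
end
```

Usage would be something like merge_with_gs(Dir["[0-9][0-9]_*.pdf"].sort, "finished.pdf").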
I tried the pdftk solution and had problems on both Snow Leopard and Tiger. Installing it on Tiger actually wreaked havoc on my system and left me unable to run script/server; fortunately, that machine is retired from web development.
I subsequently found another option, joinPDF. It was an absolutely painless, fast install, and it works perfectly.
I also tried Ghostscript, and it failed miserably (it could not read the fonts, and I ended up with PDFs that contained only images).
But if you’re looking for a solution to this problem, you might want to try joinPDF.
I don't think Ruby has tools for that. You might check ImageMagick and Cairo. ImageMagick can be used for binding multiple pictures/documents together, but I'm not sure about the PDF case.
Then again, there are surely commercial Windows tools for this kind of thing.
I use Cairo myself for generating PDFs. If the PDFs are coming from you, maybe that would be a solution (it does support multiple pages). Good luck!
I'd suggest looking at the code for PDFCreator (VB, if I'm not mistaken, but that shouldn't matter since you'd just be implementing similar code in another language), which uses Ghostscript (GNU license). Or just dig straight into Ghostscript itself; there's also a facade layer available called GhostPDF, which may do what you want.
If you can control Ghostscript with VB, you can do it with C, which means you can do it with Ruby.
Ruby also has IO.popen, which allows you to call out to external programs that can do this.
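Here is a sketch of the IO.popen route. It captures pdftk's combined stdout and stderr so that a failure can be reported, instead of being silently ignored the way a bare backtick call ignores it. It assumes pdftk is installed, uses pdftk's documented "cat" operation for in-order concatenation, and the helper names are hypothetical:

```ruby
require "rbconfig"

# Run a command given as an argument array (no shell involved), capturing stdout
# and stderr together. Returns [output, success?]. Raises Errno::ENOENT if the
# executable cannot be found.
def run_capturing(argv)
  output = IO.popen([*argv, err: [:child, :out]], &:read)
  [output, $?.success?]
end

# pdftk command that concatenates the inputs, in order, into one output file.
def pdftk_cat_command(inputs, output)
  ["pdftk", *inputs, "cat", "output", output]
end
```

For example: output, ok = run_capturing(pdftk_cat_command(Dir["[0-9][0-9]_*.pdf"].sort, "combined.pdf")), followed by abort(output) unless ok.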
Any Ruby code to do this in a real application is probably going to be painfully slow. I would try to hunt down Unix tools to do the job. This is one of the beauties of Mac OS X: it has very fast PDF capabilities built in. The next best thing is probably a Unix tool.
Actually, I've had some success with rtex. If you look here, you'll find some information about it. It is much faster than any Ruby library I've used, and I'm pretty sure LaTeX has a way to bring in PDF pages from other sources.
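As far as I know, the LaTeX mechanism alluded to here is the pdfpages package, whose \includepdf[pages=-]{file} command pulls in every page of an existing PDF. Below is a minimal sketch that generates such a wrapper document from Ruby and runs pdflatex on it. It uses plain pdflatex rather than rtex. It assumes a TeX distribution with pdfpages installed and pdflatex on your PATH. It also assumes plain file paths, since spaces or characters such as % or # may need extra escaping. The helper names are hypothetical:

```ruby
require "tmpdir"
require "fileutils"

# Build a LaTeX document that includes every page of each PDF, in order,
# via the pdfpages package's \includepdf command. Paths are made absolute
# because pdflatex runs in a temporary directory.
def build_merge_tex(inputs)
  pages = inputs.map { |pdf| "\\includepdf[pages=-]{#{File.expand_path(pdf)}}" }
  ["\\documentclass{article}", "\\usepackage{pdfpages}", "\\begin{document}",
   *pages, "\\end{document}", ""].join("\n")
end

# Compile the wrapper in a temporary directory and copy the merged PDF out.
def merge_with_latex(inputs, output)
  Dir.mktmpdir do |dir|
    File.write(File.join(dir, "merge.tex"), build_merge_tex(inputs))
    ok = system("pdflatex", "-interaction=nonstopmode", "merge.tex",
                chdir: dir, out: File::NULL)
    raise "pdflatex failed" unless ok
    FileUtils.cp(File.join(dir, "merge.pdf"), output)
  end
end
```

pdfpages places each included page on the wrapper document's paper size, so unusually sized pages may be rescaled.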