How can I get Tesseract on Windows? - windows

I am working on a Text Recognition Solution and I need to use Tesseract on Windows OS.
Is there a command line to know if it's already installed? If not how can I get it?

Windows installers for Tesseract 3.05 and Tesseract 4 are available from Tesseract at UB Mannheim. You can read more about it here. You need to install it yourself; Windows does not come with Tesseract preinstalled.

You need to install Tesseract using the Windows installer available here. Then install the Python wrapper:
pip install pytesseract
Then set the Tesseract path in your script after importing the pytesseract library, as below (the installation path may be different in your case):
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files (x86)\Tesseract-OCR\tesseract.exe'
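To answer the "is it already installed?" part from a script, a minimal sketch along these lines may help. It assumes the executable is named tesseract and is reachable through PATH; the function name tesseract_version is made up for this example and is not part of pytesseract.

```python
import shutil
import subprocess


def tesseract_version(command="tesseract"):
    """Return the first line printed by `<command> --version`,
    or None if the command cannot be found on PATH."""
    executable = shutil.which(command)
    if executable is None:
        return None
    result = subprocess.run(
        [executable, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # some builds print the version to stderr
        universal_newlines=True,
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    version = tesseract_version()
    if version is None:
        print("tesseract was not found on PATH")
    else:
        print(version)
```

If this reports that nothing was found although Tesseract is installed, the install directory is probably not on PATH, which is the case the tesseract_cmd line above works around.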

Related

getting ghostscript in docker image for mac

Apologies if this is a trivial question, I am very new to docker.
I'm trying to install Ghostscript in a Python base image running on a Mac. I've looked online and seen people install gs on Linux with apt-get and on Windows with an .exe installer. On a Mac, Ghostscript is installed using brew, but brew is not in my Docker image.
What options do I have? Should I try to find and pull a Ghostscript layer? Copy the lib files into my image in the Dockerfile? Somehow get brew into my image and use that to install it?
Thanks
KenS's answer works - building from source code gives what I need, and also works for other architectures (where things like the apt-get or .exe installer I found through searches are platform dependent). I learned something about using Docker from his answer.

Installed Tesseract 4.1.0 on Windows but --version in cmd is showing version 3.05.00dev

As the title says, I installed Tesseract version 4.1.0, which also shows up under Apps & features, but when I type tesseract --version in cmd it shows 3.05.00dev.
According to this tutorial:
https://medium.com/quantrium-tech/installing-and-using-tesseract-4-on-windows-10-4f7930313f82
it should show at least 4.0.
Then I uninstalled Tesseract and typed tesseract --version in cmd, and it still shows 3.05.00dev, but I can't find anything when I search for "tesseract" on the hard drive.
What is wrong here?
That simply means you have another installation of Tesseract (3.05.00dev) somewhere, and you need to uninstall it.
If you cannot find it among the installed apps, try this command in cmd:
where tesseract
It should reveal where that copy of Tesseract is installed, so you can remove it.
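The same lookup can be sketched in Python for cases where you want every match on PATH, not only the first one. This is a rough approximation of what where does, under stated assumptions: the function name find_all_on_path and the default extension list are invented for this example, and unlike where it does not look in the current directory.

```python
import os


def find_all_on_path(name, path=None, extensions=("", ".exe", ".bat", ".cmd")):
    """Return every file called `name` (with any of `extensions` appended)
    found in the PATH directories, in search order."""
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        for ext in extensions:
            candidate = os.path.join(directory, name + ext)
            if os.path.isfile(candidate):
                matches.append(candidate)
    return matches


if __name__ == "__main__":
    for location in find_all_on_path("tesseract"):
        print(location)
```

The first entry printed is the one cmd will run, so a stale 3.05.00dev copy listed ahead of the 4.1.0 install would explain the version mismatch.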

git-cola will not run on windows

I have installed git-cola using the setup installer for windows. I pointed it to proper installs of git and python.
When I try to launch git-cola, nothing happens whatsoever.
Is there something I am missing here?
I had the same problem. In my case the PyQt4 library was missing. You can install PyQt4 by downloading an appropriate installer from the Binary Packages section of the PyQt4 Riverbank website.
How I investigated the issue
When I installed git-cola in a default directory and tried to run it using a command line
C:\Program Files (x86)\git-cola\bin>python git-cola.pyw
I got
Sorry, you do not seem to have PyQt4 installed.
Please install it before using git-cola.
e.g.: sudo apt-get install python-qt4
Note
I have two Python 2.7 installations, one at c:\program\Python27 and another at C:\Users\UserName\Anaconda2; I used the first one. I also installed Python SIP some time ago, and I'm not sure whether it is required by git-cola.

How to install Poppler on Windows?

The most recent version of ScraperWiki depends on Poppler (or so the GitHub says). Unfortunately, it only specifies how to get it on macOS and Linux, not Windows.
A quick googling turned up nothing too promising. Does anyone know how to get Poppler on Windows for ScraperWiki?
Other answers have linked to the correct download page for Windows users but do not specify how to install them for the uninitiated.
Go to this page and download the binary of your choice. In this example we will download and use poppler-0.68.0_x86.
Extract the archive file poppler-0.68.0_x86.7z into C:\Program Files. Thus, the directory structure should look something like this:
C:
└ Program Files
   └ poppler-0.68.0_x86
      └ bin
      └ include
      └ lib
      └ share
Add C:\Program Files\poppler-0.68.0_x86\bin to your system PATH by doing the following: click on the Windows Start button, search for Edit the system environment variables, click on Environment Variables..., under System variables look for and double-click on PATH, click on New, then add C:\Program Files\poppler-0.68.0_x86\bin and click OK.
If you are using a terminal to execute poppler (e.g. running pdf2image in command line), you may need to reopen your terminal for poppler to work.
Done!
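After editing PATH and reopening the terminal, a quick check like the following can confirm that the Poppler tools are actually reachable. It is only a sketch: the function name missing_poppler_tools and the short list of tools to check are choices made for this example.

```python
import shutil

# A few of the command-line tools that ship in Poppler's bin folder.
POPPLER_TOOLS = ("pdfinfo", "pdftoppm", "pdftocairo", "pdftotext")


def missing_poppler_tools(tools=POPPLER_TOOLS, path=None):
    """Return the tools that cannot be resolved on `path`
    (defaults to the current PATH when `path` is None)."""
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


if __name__ == "__main__":
    missing = missing_poppler_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All checked Poppler tools are on PATH")
```

An empty result means libraries such as pdf2image should be able to find Poppler without an explicit path.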
Poppler Windows binaries are available from ftp://ftp.gnome.org/Public/GNOME/binaries/win32/dependencies/ -- but note that those aren't quite up-to-date.
If you're looking for Python (2.7) bindings (as this question's tag suggests), I requested them in the past via this bug report. A couple of people apparently managed to produce something, but I haven't checked those out yet.
As for a more recent (python bindings unrelated) poppler Windows binaries Google result, see http://blog.alivate.com.au/poppler-windows/
Finally, there's the brand-new (and currently very frequently updated) PyGObject all-in-one installer (mainly aiming to provide PyGObject-introspected Gtk+3 Python bindings etc. for Windows), so if that's what you're looking for, go to http://sourceforge.net/projects/pygobjectwin32/files/?source=navbar
Download Poppler Packaged for Windows
https://github.com/oschwartz10612/poppler-windows/releases
I threw together a quick repo with the latest Poppler prebuilt-binaries packaged with dependencies for Windows. Built with the help of conda-forge and poppler-feedstock. Includes the latest poppler-data.
With anaconda installed on windows one can simply execute:
conda install -c conda-forge poppler
UPDATE 2
See the answer by Owen Schwartz.
UPDATE 1
Rumpel Stielzchen's comment:
This site is no longer maintained. Poppler version 0.68 is very
outdated today. You find the latest version compiled also for Windows
here: https://anaconda.org/conda-forge/poppler/files Sadly there is no
32 bit version, only 64 bit
… but this package contains no dependencies:
It seems that the Anaconda people have a tool to download a package
and all dependencies. And there is a file in the TAR package:
index.json which lists the package on which it depends. I downloaded
the dependencies one by one, and yes: It WAS a pain.
Original answer
Latest Poppler Windows binaries can be found here:
http://blog.alivate.com.au/poppler-windows/
Chocolatey
Poppler is available as Chocolatey package:
choco install poppler
By default Poppler is installed in C:\ProgramData\chocolatey\lib\poppler and shims are automatically created for the following tools: pdfdetach, pdffonts, pdfimages, pdfinfo, pdfseparate, pdftocairo, pdftohtml, pdftoppm, pdftops, pdftotext, pdfunite.
To update Poppler, run:
choco upgrade poppler
Scoop
Install from the main bucket:
scoop install poppler
By default Poppler is installed in ~\scoop\apps\poppler and shims are automatically created for the following tools: pdfdetach, pdffonts, pdfimages, pdfinfo, pdfseparate, pdftocairo, pdftohtml, pdftoppm, pdftops, pdftotext, pdfunite.
To update Poppler, run:
scoop update poppler
TeX Live
As mentioned in another answer, MiKTeX currently ships with Poppler tools, and so does another LaTeX distribution, TeX Live.
From the guide:
Command-line tools.
A number of Windows ports of common Unix command-line programs are installed along with the usual TeX Live binaries. These include gzip, zip, unzip, and the utilities from the poppler suite (pdfinfo, pdffonts, …)
Poppler suite is located by default in C:\texlive\<year>\bin\win32 and, if you can compile your LaTeX documents, should work out of the box since this location is added to the PATH by the installer.
To install Poppler on Windows without touching the environment variables, follow the steps below.
Download the latest Poppler binary from http://blog.alivate.com.au/poppler-windows/index.html
Unzip it and copy the poppler-0.68.0_x86 folder to some path, for example C:/User/Poppler/poppler-0.68.0_x86/poppler-0.68.0/bin
In the Python code where you want to call Poppler for image conversion, use the snippet below:
from pdf2image import convert_from_path
# 500 is the rendering DPI; poppler_path points at the extracted bin folder
pages = convert_from_path('MyPdf.pdf', 500, poppler_path=r'C:\User\Poppler\poppler-0.68.0_x86\poppler-0.68.0\bin')
for i, page in enumerate(pages):
    # number the output files so later pages do not overwrite earlier ones
    page.save('out_{}.jpg'.format(i), 'JPEG')
You should consider using Windows Subsystem for Linux (WSL).
Enable WSL on Windows 10 (it will not work on S edition)
Install Ubuntu (latest version) on WSL from the Windows Store
Open Ubuntu command-line
In the Ubuntu Command-line, run the following commands:
sudo apt-get update
sudo apt-get upgrade
sudo apt install poppler-utils
pdftocairo -v - to check the installed version
You can then run pdftocairo (for example) in two ways:
Within the Ubuntu command-line: pdftocairo ...
Directly from Windows command-line: wsl pdftocairo...
NOTE: There is a default version of poppler for each release of Ubuntu. You will need to look up the instructions (there should be plenty on the internet), for how to install the latest version of poppler-utils on Ubuntu. This might involve quite a few steps, which will compile from the source code. For example, something like this https://askubuntu.com/a/722955. And then you might get a lot of problems.
The latest version of Ubuntu, 19.04, can install Poppler 74. But Ubuntu 18.04 seems to be the latest version you can install for WSL for now, and that installs Poppler 62.
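If you call pdftocairo from a Python script, the two ways of running it described above differ only in the wsl prefix. A minimal sketch, assuming the wsl launcher is on the Windows PATH; the helper name pdftocairo_command is made up for this example, and file arguments would still need to be paths that are valid inside WSL:

```python
import subprocess
import sys


def pdftocairo_command(args, via_wsl=None):
    """Build the argument list for running pdftocairo, prefixing it
    with `wsl` when called from Windows (or when via_wsl is True)."""
    if via_wsl is None:
        via_wsl = sys.platform == "win32"
    prefix = ["wsl"] if via_wsl else []
    return prefix + ["pdftocairo"] + list(args)


if __name__ == "__main__":
    command = pdftocairo_command(["-v"])
    print("Would run:", " ".join(command))
    # Uncomment to actually run it once WSL and poppler-utils are installed:
    # subprocess.run(command)
```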
It looks like a version that is build-able with visual studio can be found here https://bitbucket.org/merarischroeder/poppler-for-windows/overview
Up to date binaries for Windows x64, Mac OSX-64, Linux-64bit can be found here
https://anaconda.org/conda-forge/poppler/files
Poppler version 0.84 is available at the link as of this writing which is very current.
The accepted answer and the link given by Alexey are no longer pointing to current versions of poppler
Update :
As of March 8, 2021 the best answer is by Owen Schwartz above https://stackoverflow.com/a/62615998/590388
Another option: if you have installed MiKTeX, then Poppler is included by default and is probably already in your PATH. In my case the binaries were installed under: C:\Program Files\MiKTeX 2.9\miktex\bin\x64
MSYS2 has the latest version available for install.
If you don't want to install the whole environment (or you want some kind of portable version), you could also just download Poppler straight from the repository, but then you'd also have to handle the dependencies manually. Namely: libwinpthread, nspr, gcc-libs, nss, curl, brotli, openssl, libidn2, libiconv, gettext, libunistring, nghttp2, libpsl, libjpeg-turbo, lcms2, openjpeg2, libpng, zlib, libtiff, xz and zstd.
Install the Microsoft Visual C++ Build Tools
Install Poppler from the Conda prompt:
conda install -c conda-forge poppler
Please note: if you don't have Anaconda installed, it can be downloaded from here:
https://docs.anaconda.com/anaconda/install/windows/
Installing Poppler on Windows
Go to https://github.com/oschwartz10612/poppler-windows/releases/
Under Release 21.11.0-0 (marked Latest, v21.11.0-0), open Assets and download Release-21.11.0-0.zip
Adding Poppler to path
Extract Release-21.11.0-0.zip, for example to C:\Users\UserName\Downloads\Release-21.11.0-0
Add the bin folder inside the extracted directory to the Path system variable under Environment Variables
This is what I did.
Install msys2
Open msys2 shell and then run:
To list available packages named poppler:
pacman -Ss poppler
To install the package:
pacman -S mingw-w64-ucrt-x86_64-poppler
Open the MSYS2 UCRT64 shell to access the Poppler binaries
The binaries are installed at:
C:\msys64\ucrt64\bin

Installing OpenCV on Windows 7 for Python 3.2.3 [duplicate]

I am trying desperately to get OpenCV to work on Windows 7. I downloaded and installed it, and it didn't work. I got
ImportError: No module named opencv
when I tried to run one of the samples. I googled my problem and found only random solutions that don't work. Can anybody guide me through installing it, or point me to a clear installation guide designed for a programming noob?
As of OpenCV 2.2.0, the package name for the Python bindings is "cv". The old bindings named "opencv" are no longer maintained. You might have to adjust your code. See http://opencv.willowgarage.com/wiki/PythonInterface.
The official OpenCV installer does not install the Python bindings into your Python directory. There should be a Python2.7 directory inside your OpenCV 2.2.0 installation directory. Copy the whole Lib folder from OpenCV\Python2.7\ to C:\Python27\ and make sure your OpenCV\bin directory is in the Windows DLL search path.
Alternatively use the opencv-python installers at http://www.lfd.uci.edu/~gohlke/pythonlibs/#opencv.
I have posted a very simple method to install OpenCV 2.4 for Python in Windows here : Install OpenCV in Windows for Python
It is just as simple as copy and paste. Hope it will be useful for future viewers.
Download Python, Numpy, OpenCV from their official sites.
Extract OpenCV (will be extracted to a folder opencv)
Copy ..\opencv\build\python\x86\2.7\cv2.pyd
Paste it in C:\Python27\Lib\site-packages
Open Python IDLE or terminal, and type
>>> import cv2
If no errors are shown, it is OK.
UPDATE (Thanks to dana for this info):
If you are using the VideoCapture feature, you must copy opencv_ffmpeg.dll into your path as well. See: https://stackoverflow.com/a/11703998/1134940
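The import check from the steps above can also be scripted, which is handy when several Python installations are present. A minimal sketch; the helper name module_version is invented for this example, and it relies only on the convention that a module exposes a __version__ attribute, as cv2 does:

```python
import importlib


def module_version(name):
    """Import `name` and return its __version__ string, "unknown" if the
    module has no such attribute, or None if the import fails."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    version = module_version("cv2")
    if version is None:
        print("cv2 could not be imported; check site-packages and the DLL search path")
    else:
        print("cv2 imported, version:", version)
```

Run it with the same interpreter you intend to use for OpenCV, since each Python installation has its own site-packages.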
I have posted an entry on how to set up OpenCV for Python on Windows:
http://luugiathuy.com/2011/02/setup-opencv-for-python/
Hope it helps.
Actually you can use x64 and Python 2.7. This is just not delivered in the standard OpenCV installer. If you build the libraries from the source (http://docs.opencv.org/trunk/doc/tutorials/introduction/windows_install/windows_install.html) or you use the opencv-python from cgohlke's comment, it works just fine.
Download OpenCV 2.2 from https://sourceforge.net/projects/opencvlibrary/files/opencv-win/
Install the package.
Then copy cv2.pyd to C:/Python27/lib/site-packages.
It should then work:
import cv2
Open a command prompt and run the following commands (assuming Python 2.7):
cd c:\Python27\scripts\
pip install opencv-python
The above works for me with Python 2.7 on 64-bit Windows 10.
One thing that needs to be mentioned. You have to use the x86 version of Python 2.7. OpenCV doesn't support Python x64. I banged my head on this for a bit until I figured that out.
That said, follow the steps in Abid Rahman K's answer. And as Antimony said, you'll need to do a 'from cv2 import cv'
Installing OpenCV on Windows 7 for Python 2.7
