Running Tika through tika-python in Windows produces encoding errors

I have python code that extracts text from pdf files using Tika Server through tika-python. It then stores the resulting output in individual json files.
The command I run to execute my script is
python extraction.py <full path to some local directory>
I'm using python 3.5
It works perfectly on several MacBook Pro computers.
It doesn't work as expected on Windows, even on an up-to-date Windows 10.
Some pdf files are processed, others produce an error such as:
'charmap' codec can't encode characters in position 3648-3649: character maps to <undefined>
I have tried changing the Code Page to 65001 and changing console font to Lucida Console, based on other questions posted on Stack Overflow, including 388490 (Unicode characters in Windows command line - how?) and 14109024 (How to make Unicode charset in cmd.exe by default?) and 1259084 (What encoding/code page is cmd.exe using?).
I also tried installing ConEmu (http://conemu.github.io/en/UnicodeSupport.html) and changing the default encoding for all consoles.
Other references mention win_unicode_console (https://github.com/Drekin/win-unicode-console), but the recommended instructions for patching Python do not work on my machine.
I use Anaconda as my python distribution.
I would like to know how to run my Python code on Windows without these encoding problems. From what I have read, this is not a problem with my Python code nor with Tika Server but rather a Windows encoding issue.
Thank you all,
German
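A note on where this class of error typically comes from: "'charmap' codec can't encode" is raised when Python encodes text with a legacy Windows code page such as cp1252, which is what happens when a file is opened for writing without an explicit encoding. The script itself is not shown, so the sketch below only assumes that the extracted text is written to JSON with open(). The names save_json and load_json and the sample text are hypothetical.

```python
import json
import os
import tempfile


def save_json(data, path):
    # An explicit encoding avoids the platform default, which on Windows
    # is usually a legacy code page such as cp1252.
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(data, fh, ensure_ascii=False)


def load_json(path):
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


# Hypothetical extracted text containing a ligature that cp1252 cannot encode.
sample = {"content": "of\ufb01ce"}

try:
    sample["content"].encode("cp1252")
    cp1252_failed = False
except UnicodeEncodeError:
    cp1252_failed = True  # same class of error as the one in the question

out_path = os.path.join(tempfile.mkdtemp(), "sample.json")
save_json(sample, out_path)
print(load_json(out_path) == sample)
```

If the error instead comes from print(), the console encoding is the culprit, and writing to a file as above sidesteps it.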

Related

Mercurial messes scandinavian characters OSX

My friend and I set up a Mercurial repository on Bitbucket for our project. He works on Windows while I use OSX. I installed Mercurial on OSX and cloned the repository, only to find that all Scandinavian characters (äö) in files were interpreted wrongly. Also, folders whose names contained those characters didn't get cloned properly.
Now, I suppose it has something to do with character encoding, which somehow makes it work on Windows but not on OSX. I used Sourcetree as the GUI for Mercurial and looked for character encoding settings, without success. What should I do to fix this problem? I have used a Mercurial GUI on Windows before and never had problems like this.
You have the usual "different encodings" problem.
OSX uses UTF-8 (FIXME), while Windows for Western Europe most probably uses 8-bit characters (ISO-8859-1).
Before any CLI operations on OSX, you have to chcp to the same codepage that was used on the Windows side.
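The mismatch described above can be reproduced in a few lines. This is only an illustration of what happens when UTF-8 bytes are read as ISO-8859-1. It does not use Mercurial, and the variable names are made up for the example.

```python
# "äö" written with escapes so the example does not depend on the
# encoding of this file or of the console.
original = "\u00e4\u00f6"

utf8_bytes = original.encode("utf-8")   # how a UTF-8 system stores the text
garbled = utf8_bytes.decode("latin-1")  # the same bytes read as ISO-8859-1

# The damage is reversible as long as no bytes were lost along the way.
restored = garbled.encode("latin-1").decode("utf-8")

print(ascii(garbled), ascii(restored))
```

Each two-byte UTF-8 character shows up as two Latin-1 characters, which is the typical look of wrongly interpreted Scandinavian letters.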

How to run gmic from command line cmd windows gimp

I have been searching the Internet for hours and can't work out how to run GIMP from the command prompt, let alone apply filters to images with gmic. I managed to do all of this with ImageMagick using the convert command, which just works, but for gmic I try this in cmd:
gimp -b -
as stated on their website's documentation: http://www.gimp.org/tutorials/Basic_Batch/
But no, it pops:
'gimp' is not recognized as an internal or external command, operable program or batch file.
Am I asking something very dumb? I really don't know what I am doing wrong. Maybe I am missing some steps; the error suggests that, besides successfully installing GIMP on Windows, I need to create a file for the gimp command to work.
I'm on Windows 8 64 bit by the way. Enterprise edition.
Thanks very much for any help.
Gimp installed 2.8.2
gmic installed 1.5.3
If your goal is only to apply a G'MIC effect to an image, why not use the command-line interface 'gmic' of the G'MIC project instead of going through the plug-in for GIMP? The G'MIC project provides a command-line interface 'gmic' for its tool, so it should be less difficult to use, I guess, and you won't have the limitations due to GIMP (e.g. 8-bit processing, whereas 'gmic' is able to process 16-bit images).
The command line interface does not know where gimp is all by itself. Either call it with the full path (quoted, because the path contains a space), something like "C:\Program Files\Gimp\gimp.exe" -b -, or add the directory that contains gimp to your %PATH% system variable.
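The lookup that cmd performs can be checked from Python as well. The sketch below is a hedged illustration: resolve_executable is a hypothetical helper, and the fallback path is the example path from the answer above, not necessarily where GIMP is installed on a given machine.

```python
import shutil


def resolve_executable(name, fallback_path):
    # shutil.which searches the directories listed in PATH, much like
    # the shell does, and returns None when nothing is found.
    found = shutil.which(name)
    return found if found is not None else fallback_path


# Assumed install location; adjust it to the actual GIMP directory.
gimp_path = resolve_executable("gimp", r"C:\Program Files\Gimp\gimp.exe")

# Passing the command as a list (for example to subprocess.run) avoids
# quoting problems with the space in "Program Files".
command = [gimp_path, "-b", "-"]
print(command)
```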

I want to view a .ps file through Ghostscript 9.05 command prompt

I have a file abc.ps on my desktop. I have installed Ghostscript 9.05 on my machine and I want to view my abc.ps file using the command line.
GS>?????
What command should I write here. I am working on Windows 7.
Location of exe file: C:\Program Files\gs\gs9.05\bin
From the Ghostscript documentation:
GS> (c:/gs3.53/example.ps)
It looks like you just use forward slashes instead of backslashes. For example, if your Windows username is Ankit, you'd enter:
GS> (c:/Users/Ankit/Desktop/abc.ps)
You can avoid the interactive GS> prompt by running the command like this:
gswin32c.exe -sDEVICE=display c:/Users/Ankit/Desktop/abc.ps
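If the file path comes from elsewhere with backslashes, the conversion to forward slashes can be scripted. This is a small sketch, not part of Ghostscript. build_gs_command is a hypothetical helper, and the executable name and the -sDEVICE=display option are taken from the command above.

```python
from pathlib import PureWindowsPath


def build_gs_command(ps_path, gs_exe="gswin32c.exe"):
    # as_posix() rewrites the Windows separators as forward slashes.
    forward = PureWindowsPath(ps_path).as_posix()
    return [gs_exe, "-sDEVICE=display", forward]


command = build_gs_command(r"C:\Users\Ankit\Desktop\abc.ps")
print(command)
```

The resulting list can be handed to subprocess.run on a machine where Ghostscript's bin directory is on PATH.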
Download and install Ghostview after installing Ghostscript; it is a GUI for Ghostscript:
http://pages.cs.wisc.edu/~ghost/gsview/get50.htm
Note: Be sure you download the same architecture, i.e. if you downloaded and installed 64-bit Ghostscript, make sure you also download and install 64-bit Ghostview.

Cucumber not showing coloured output in windows

This is probably something really stupid but I can't work it out.
I upgraded my version of cucumber to v 0.10.0 and now the tests (running on Win 7) are not showing coloured output with the "pretty" formatter.
When tests are run it prints this warning: *** WARNING: You must use ANSICON 1.31 or higher (http://adoxa.110mb.com/ansicon) to get coloured output on Windows
I have been to http://adoxa.110mb.com/ansicon but it's not obvious to me how I should upgrade. Does anyone know how to upgrade my version of ANSICON?
One of the devs at my work figured it out.
You need to:
Download Ansicon from https://github.com/adoxa/ansicon/downloads and unzip it into a directory with no spaces
Open a command prompt and cd to the folder where you unzipped it
Now cd into either x86 or x64, depending on your machine's processor (for example, D:\Cucumber\ansi160\x64)
Type ansicon.exe -i OR ansicon -i and press Enter to install it globally on your machine
Any program that prints ANSI colors will now display properly on your machine.
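A quick way to check whether the console now renders ANSI colors is to print a few escape sequences. The sketch below is independent of Cucumber. colorize is a hypothetical helper, and 31, 32 and 33 are the standard ANSI foreground codes for red, green and yellow.

```python
RESET = "\033[0m"  # ANSI sequence that restores the default colors


def colorize(text, code):
    # Wrap text in an ANSI SGR sequence, e.g. code 32 for green.
    return "\033[{}m{}{}".format(code, text, RESET)


for name, code in [("red", 31), ("green", 32), ("yellow", 33)]:
    print(colorize(name, code))
```

If the words appear in color, the escape sequences are being interpreted. If sequences like [32m are printed literally, they are not.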
Update as of today, http://adoxa.110mb.com/ansicon is no longer accessible.
Files have been uploaded to https://github.com/adoxa/ansicon/downloads.
I tried downloading from adoxa.3eeweb.com, but Chrome warned me that the file was "not commonly downloaded and could be dangerous."
So I opted for the file from GitHub.
Besides that, I just followed the steps mentioned above and my output is now coloured.

How do I get color coded console output from SBT on Windows?

I'm using SBT (Simple Build Tool) to build my Scala projects on Windows. I've seen that one of my friends, that runs OSX, gets color coded output in his terminal windows when running SBT, but mine is just the same color everywhere. Is there any way to enable this for Windows?
For the DOS shell, check out ansicon
download page
type in the DOS shell:
ansicon -i
(If the above links don't work too well, aeracode mentions this address in the comments)
(this picture is not from a sbt session but illustrates colors within a DOS session)
One way would be to install a POSIX layer like MinGW or Cygwin and add -Djline.terminal=jline.UnixTerminal as a parameter to java in your sbt startup script.
I do not know if JLine supports colored output on Windows natively though.
I was able to get color output on Windows by using Mintty with Cygwin. See the following question for the script to execute sbt from Mintty:
how to get specs2 color support on windows using mingw and sbt
