How to get positional data from PDF to text on Windows

I need to convert PDF files to text so I can extract information from them using Perl. However, the text file I get is not positional: the elements should be in the same positions in the text as they are in the PDF. I tried CAM::PDF::PageText, but its output is very different.
I have come across posts referring to pdftotext and Poppler, but I am not able to set up either of them on my Windows 10 64-bit system.
Please let me know if there are any other ways to solve this problem.

What you really want is pdftohtml with -xml output, which records the position of every text fragment on the page. You can build it on Windows.
There are 2 ways to compile poppler on Windows:
using mingw compiler under cygwin
using native Visual Studio (msvc) makefile
This document describes the second method.
...
You can download Visual Studio Community Edition subject to license terms to get the 2013 and 2015 versions of compilers and build tools along with the IDE.
Or, you can just get the Visual C++ build tools. See also Walkthrough: Compiling a Native C++ Program on the Command Line.
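Once you have pdftohtml, running something like `pdftohtml -xml File1.pdf` should produce File1.xml, where each text fragment is a <text> element with top, left, width and height attributes inside a <page> element. As a rough illustration in Python (the same approach ports to Perl with any XML parser), the sketch below turns that XML back into approximately positioned plain text. The character width (char_width) and the vertical tolerance for putting fragments on the same line (line_tolerance) are assumptions to tune for your documents.

```python
import xml.etree.ElementTree as ET

def layout_page(page, char_width=6.0, line_tolerance=3.0):
    """Render one <page> element as text, placing fragments by their coordinates.

    char_width (points per character column) and line_tolerance (how far apart,
    in points, two fragments' 'top' values may be and still share a line) are
    guesses to tune for your documents.
    """
    fragments = []
    for el in page.iter("text"):
        top = float(el.get("top"))
        left = float(el.get("left"))
        content = "".join(el.itertext())  # flattens nested <b>, <i>, <a>
        fragments.append((top, left, content))
    fragments.sort()

    rows = []  # each row: [top of its first fragment, [(left, content), ...]]
    for top, left, content in fragments:
        if rows and top - rows[-1][0] <= line_tolerance:
            rows[-1][1].append((left, content))
        else:
            rows.append([top, [(left, content)]])

    lines = []
    for _, row in rows:
        line = ""
        for left, content in sorted(row):
            column = int(round(left / char_width))
            if len(line) < column:
                line += " " * (column - len(line))  # pad out to the fragment's column
            elif line:
                line += " "  # fragments collide: keep at least one space
            line += content
        lines.append(line.rstrip())
    return "\n".join(lines)

def pdf_xml_to_text(root):
    """Lay out every page of a pdftohtml -xml document; pages separated by form feeds."""
    return "\f".join(layout_page(page) for page in root.iter("page"))
```

Usage: `root = ET.parse("File1.xml").getroot()` followed by `print(pdf_xml_to_text(root))`. Keeping the coordinates also lets you skip the text layout entirely and pick fields by position.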

Sorry for the delay, but I finally found a solution: pdftotext from Xpdf. The best way to get it is to download the precompiled binaries (.exe files). Then you can invoke the various tools, such as pdftohtml and pdftotext, from the command line.
Look at this page:
http://www.foolabs.com/xpdf/download.html
Under the heading "Precompiled binaries" you will find them.
At the command prompt, change to the directory where the binary is, then call the binary with the PDF file as a parameter.
Example: pdftotext File1.pdf
The above command writes File1.txt next to File1.pdf, which in this case is the folder where the binary is.
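Since the question asks for positional output, note that pdftotext (in both Xpdf and Poppler) has a -layout option that tries to maintain the original physical layout of the text. A minimal Python sketch that builds and runs that command and writes the .txt next to the PDF; the executable location you pass as exe is an assumption about where you unpacked the binaries.

```python
import subprocess
from pathlib import Path

def pdftotext_args(pdf_path, exe="pdftotext", layout=True):
    """Build a pdftotext command line that writes <name>.txt next to the PDF."""
    pdf = Path(pdf_path)
    args = [exe]
    if layout:
        args.append("-layout")  # keep the page's physical layout as far as possible
    args += [str(pdf), str(pdf.with_suffix(".txt"))]
    return args

def pdf_to_text_file(pdf_path, exe="pdftotext"):
    """Run pdftotext and return the path of the text file it wrote."""
    args = pdftotext_args(pdf_path, exe)
    subprocess.run(args, check=True)
    return Path(args[-1])
```

For example, `pdf_to_text_file(r"C:\xpdf\File1.pdf", exe=r"C:\xpdf\pdftotext.exe")` with a hypothetical install folder. Passing the full path to the executable avoids having to cd into the binary's folder first.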

Related

Cannot generate LaTeX from Isabelle/HOL under Windows 7

I have spent too many hours trying to generate a .pdf document from my Isabelle theory Increments.thy. The Isabelle build command gets stuck, and apparently this is an installation issue on Windows. Frustratingly enough, friends have done this on their Linux machines without any problems. But I cannot find the right documentation to get it working on my Windows 7 laptop. Does anyone have the recipe?
I have a full LaTeX installation on my laptop, which works like a breeze. I installed Cygwin, but it caused file access-rights problems that I couldn't solve (neither from the Windows end nor from the Cygwin end). I tried various manuals, without much luck.
With some hands-on help from the University of Innsbruck, I could finally generate a PDF from an Isabelle theory on my Windows 7 laptop. I'd like to share the result with the community at large. Here is what I did to make it work:
In Windows Explorer, I went to the directory that contains the Isabelle executables. This directory is called Isabelle2016-1.
I found it by searching for Isabelle2016-1 in the file system. It is in C:\Users\sjo\AppData\Roaming\local\bin\Isabelle2016-1.
I checked that it contains the file Cygwin-Terminal.bat.
I ran Cygwin-Terminal.bat by double-clicking it.
This opens a command-line interpreter (CLI), which is the GNU Bash interpreter.
In this CLI, I navigated to the directory that contains my Isabelle source code, Increments.thy, by issuing the command:
$ cd /cygdrive/d/git/Publications/2017AFPproofs
I used the command ls -al to verify that this directory contains my Isabelle source code file Increments.thy.
I generated the PDF file D:\git\Publications\2017AFPproofs\output\document\root.pdf by calling Isabelle:
$ isabelle build -v -D .
I checked the result in Windows Explorer and displayed it with my PDF viewer.
That worked.
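The cd command above relies on Cygwin's convention of exposing drive D: under /cygdrive/d. Cygwin's own cygpath -u command performs this conversion; the Python sketch below just shows the mapping, assuming the default /cygdrive prefix and an absolute drive-letter path.

```python
import ntpath

def to_cygdrive(win_path):
    """Map an absolute Windows path to Cygwin's /cygdrive form (default prefix)."""
    drive, rest = ntpath.splitdrive(win_path)
    if len(drive) != 2 or drive[1] != ":":
        raise ValueError(f"expected a drive-letter path, got {win_path!r}")
    if rest and rest[0] not in "\\/":
        # "D:folder" is relative to the current directory on D:, which has no fixed mapping
        raise ValueError(f"drive-relative paths are not supported: {win_path!r}")
    return "/cygdrive/" + drive[0].lower() + rest.replace("\\", "/")
```

`to_cygdrive(r"D:\git\Publications\2017AFPproofs")` gives the directory used in the cd command above.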

Building OpenJDK8 on Windows x64

So I am trying to compile OpenJDK 8 from source, but I am stuck on a missing-files problem at the end of the compilation process.
Here is the software that I use:
Windows 7 SP1 x64
Windows SDK for Windows 7.1
Microsoft .NET Framework 4
Visual Studio 2010 Express Edition
GNU make 3.82 (compiled by myself)
Freetype 2.3 (compiled by myself)
Oracle JDK 1.7 update 71
Direct X 9.0 (August 2009)
Cygwin
Here are the manuals which I was reading from:
Official README
Royvanrijn's build guide
Some other build guide
Build guide using MSYS
With all these guides I am able to get it to compile; however, during the "Building Images" step I get an error that some files are missing (and they are indeed missing), which makes me think that something has gone wrong during the build.
There are several points where I am afraid I might be doing something wrong:
Cygwin
Right now I use Cygwin version 2.8. The OpenJDK configure script requires Cygwin version >1.7 but fails to recognize that 2.8 is greater than 1.7 and throws an error, so I've tweaked the script (this made the build work about two months ago).
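A version check that accepts 2.8 as satisfying ">1.7" has to compare the numeric components rather than the text. The sketch below is not the actual configure code; it is a minimal Python illustration, and it assumes the version string starts with the dotted number (a Cygwin release string such as 2.8.0(0.309/5/3) is used as an example).

```python
import re

def parse_version(text):
    """Turn a leading dotted version such as '2.8.0(0.309/5/3)' into (2, 8, 0)."""
    match = re.match(r"\s*(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))

def is_at_least(found, required):
    """Compare numerically, component by component: 2.8 >= 1.7 and 1.10 > 1.7."""
    return parse_version(found) >= parse_version(required)
```

Comparing the strings directly would get cases like "1.10" versus "1.7" wrong, because text comparison goes character by character.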
./configure
My configure command looks as follows:
./configure --disable-ccache --with-freetype=/cygdrive/c/freetype
Maybe I need more arguments here to make it work (note that I've copied my self-compiled make executable to the Cygwin bin folder, so that I don't need to provide its location).
Visual Studio C++ 2010 Express
I would rather try the Professional trial version, but it can no longer be found anywhere (except on torrents). I have a strong feeling that the Express edition is not suitable for the OpenJDK build. I also get the error about the missing ammintrin.h file, but it is easily resolved by creating an empty header file with that name in the include folder of the Visual Studio installation.
My basic procedure of building is:
Install all the software above
hg clone http://hg.openjdk.java.net/jdk8/jdk8
./get_source.sh
./configure --disable-ccache --with-freetype=/cygdrive/c/freetype
make clean images
However, here is how it ends:
Does anyone have any idea how to solve this?
I found the proper fix: using the Cygwin installer, downgrade Grep to 2.27, which properly ignores CRLF line endings.
Run the Cygwin setup (e.g. setup-x86_64.exe)
Advance through the setup wizard until you get to the package selection
Choose "Full" from the View drop-down menu
Type "grep" into the search field
Click the icon in the New column until it shows a 2.x version (2.27 as of this writing)
Click Next and then Finish.
I found myself in the same position as you, except that in my case I need the OpenJDK build to be repeatable, so "run make repeatedly until it finishes" wasn't an acceptable solution.
Through some experimenting, I found the root cause:
grep was failing because the file being processed had Windows line endings (CRLF)
The Windows line endings were due to the fact that the file is generated by a Java app (fixpaths) which emits platform-native line endings
Identifying fixpaths led me to an old OpenJDK e-mail thread, which reported that some users were having the same problem and fixed it by downgrading.
This gave me the idea to try downgrading grep. I did so, and it worked.
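To illustrate the root cause in Python rather than grep: a pattern that must match a whole line fails when each line still carries a trailing carriage return, and succeeds once CRLF is normalized to LF (which is effectively what grep 2.27 did by ignoring CRLF). The sample text and pattern are made up for illustration; they are not the actual contents of the file fixpaths generates.

```python
import re

def grep_whole_lines(pattern, text):
    """Return lines that match pattern in full, like grep -x, splitting on LF only."""
    return [line for line in text.split("\n") if re.fullmatch(pattern, line)]

def crlf_to_lf(text):
    """Normalize Windows CRLF line endings to Unix LF."""
    return text.replace("\r\n", "\n")

generated = "BOOT_JDK=/cygdrive/c/jdk\r\nFREETYPE=/cygdrive/c/freetype\r\n"  # made-up sample
pattern = r"FREETYPE=[\w/]+"

print(grep_whole_lines(pattern, generated))              # []
print(grep_whole_lines(pattern, crlf_to_lf(generated)))  # ['FREETYPE=/cygdrive/c/freetype']
```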
So, after a couple of days on this task, my only approach was to ignore the missing-file errors and continue extracting files. This still resulted in a working JDK image, which I currently use. My guess is that the errors come from the Oracle boot JDK: since I am compiling OpenJDK, it cannot find Oracle JDK files in its headers and thus produces errors.
So, if anyone gets the same errors as me, try ignoring the missing-file errors and continuing the images build.

Unable to install Time::Piece module with cpan

I need to install the Time::Piece module in Perl. It's not there for some reason. When I run
cpan install Time::Piece
after some successful steps I get the error below:
.....
Checking if your kit is complete...
Looks good
Unable to find a perl 5 (by these names: "My windows path variable contents here...i think"
Writing Makefile for Time::Piece
'nmake' is not recognized as an internal or external command,
operable program or batch file.
RJBS/Time-Piece-1.29.tar.gz
nmake -- NOT OK
Running make test
Can't test without successful make
Running make install
Make had returned bad status, install seems impossible
Failed during this command:
RJBS/Time-Piece-1.29.tar.gz : make NO
cpan[2]>
Why is this happening? Please help me fix it.
I'll wait for an answer while I try to fix it myself. First problem:
'nmake' is not recognized as an internal or external command, operable program or batch file.
I used this solution:
Windows 7 Control Panel, Programs and Features, Select Microsoft
Visual Studio 2008 Standard or Professional Edition application then
choose Uninstall/Change/Modify. This will bring you into Maintenance
Mode. Select C++ then check X64 Compilers and Tools.
I had Visual Studio Express and Visual Studio Professional 2013 installed (I don't remember how or why the latter is on my system). I followed the above instructions, but the options were different: one of them mentioned C++, namely "Microsoft Visual C++ 2013 Microsoft Foundation Class Libraries", so I chose that one. It's a 600 MB download and install.
I went to C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin and found nmake there. If you don't find it there, look in the folders for other versions, such as Microsoft Visual Studio 10.0 or 11.0. Once you find nmake, add its folder to the PATH environment variable.
Now, I get a new error
NMAKE : fatal error U1073: don't know how to make 'C:/Program'
Stop.
RJBS/Time-Piece-1.29.tar.gz
nmake -- NOT OK
Running make test
Can't test without successful make
Running make install
Make had returned bad status, install seems impossible
Failed during this command:
RJBS/Time-Piece-1.29.tar.gz : make NO
cpan[2]>
I'll try to fix this one too. By the way, @ikegami told me that installing to a path with no spaces (C:\progs\...) would solve my problem. But I cannot install to another directory:
This version of Perl comes bundled with other software which must be installed to the folder "C:\Program Files (x86)", which has a space in it. The software needs to be in that path for some other things to work correctly.
Is there a simple way to edit the code that is trying to install the modules? I could make it handle the path by changing some code. I am new to Perl, though, and not sure I can change it without causing harm.
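The NMAKE error about 'C:/Program' is consistent with a path containing spaces being written into the generated Makefile unquoted, so that it is split at the first space. The Python sketch below only shows that splitting and the usual remedy of quoting such a path; the Perl location is hypothetical, and this does not patch the Makefile for you.

```python
def split_unquoted(command_line):
    """Split on whitespace the way a naive tool does, ignoring any quoting."""
    return command_line.split()

def quote_path(path):
    """Wrap a path in double quotes if it contains a space and is not quoted yet."""
    if " " in path and not (path.startswith('"') and path.endswith('"')):
        return f'"{path}"'
    return path

perl = "C:/Program Files (x86)/Perl/bin/perl.exe"  # hypothetical install location
print(split_unquoted(perl + " Makefile.PL")[0])     # C:/Program
print(quote_path(perl))                             # "C:/Program Files (x86)/Perl/bin/perl.exe"
```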
EDIT -
I have both ActiveState Perl 5.1.2 and Perl 5.8, which are used by tool x and tool y (ElectricCommander). Tool y has its own Perl libraries, which must be used in my code, so I am stuck with Perl 5.8.
I just learned that this is due to issues with tool y. There is a workaround for this, but I am not able to understand it. Can you please help me understand the workaround for Windows?
https://electriccloud.zendesk.com/hc/en-us/articles/202828073-KBEC-00180-Installing-Perl-modules-into-the-Commander-Perl-distribution
Which version of Perl are you running? What do you get if you run "perl -v" at a command prompt?
If the version number you get is 5.10 or higher, then Time::Piece should already be included with that version of Perl. If it's not, then your installation is broken in interesting ways and you should probably reinstall it from scratch.
If the version number you get is lower than 5.10 then you have a painfully old version of Perl installed and your best approach will be to upgrade to a newer version.
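The rule above can be checked mechanically: pull the vX.Y.Z token out of the perl -v output and compare it with 5.10. The two sample strings are illustrative; older and newer Perls word the first line differently, but both include a vX.Y.Z token, which is the only thing this Python sketch relies on.

```python
import re

def perl_version(perl_v_output):
    """Extract (major, minor, patch) from `perl -v` output via its vX.Y.Z token."""
    match = re.search(r"\bv(\d+)\.(\d+)\.(\d+)", perl_v_output)
    if match is None:
        raise ValueError("no vX.Y.Z version string found")
    return tuple(int(group) for group in match.groups())

def time_piece_is_core(version):
    """Time::Piece ships with Perl 5.10 and later."""
    return version >= (5, 10, 0)

old = "This is perl, v5.8.8 built for MSWin32-x86-multi-thread"
new = "This is perl 5, version 16, subversion 3 (v5.16.3) built for MSWin32-x64-multi-thread"
print(perl_version(old), time_piece_is_core(perl_version(old)))  # (5, 8, 8) False
print(perl_version(new), time_piece_is_core(perl_version(new)))  # (5, 16, 3) True
```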

How to install Lua on windows

I'm new to Lua and need to know how to install it on Windows.
I've tried, but I am unable to run the sample. When I compile it, 100% success is shown, but when I click the Run button it shows this error:
Can't find moai executable in any of the folders in PATH or MOAI_BIN:
C:\Program Files\moai, D:\Program Files\moai, C:\Program Files (x86)\moai, D:\Program Files (x86)\moai, C:\WINDOWS\system32, C:\WINDOWS, C:\WINDOWS\System32\Wbem, C:\moai-sdk\bin\win32\moai.exe, C:\moai-sdk/bin
If anyone can help me install Lua, thanks in advance.
Lua does not come with an official IDE. You usually run Lua code from the Lua command-line console, or by passing a .lua file to the interpreter.
Downloading
The Lua website has a download page for the tools that let you write and execute Lua code: https://www.lua.org/download.html
Using lua console
After you download the files, put them anywhere on your computer. To execute Lua code, the first method is to open the Lua console and simply type your commands: https://prnt.sc/ibw97h
Another method is to create a .txt or .lua file, write your code in it, and then drag and drop the file onto the Lua console to execute it: https://prnt.sc/ibwa2f
Installing lua system wide
Add Lua to the PATH environment variable by adding the folder where it's installed. After doing this, you can open PowerShell and enter lua53.exe to start Lua.
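Adding the folder to PATH works because launchers search each folder listed in PATH for the executable, as the moai error message above shows. A simplified Python sketch of that lookup (the standard shutil.which does the same job, including PATHEXT handling on Windows). Each entry must be a folder, so an entry that names the executable itself, like C:\moai-sdk\bin\win32\moai.exe in the error above, will probably never match.

```python
import os

def find_on_path(exe_name, path_value=None):
    """Return the full path of exe_name in the first PATH folder containing it, else None."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    for folder in path_value.split(os.pathsep):
        folder = folder.strip().strip('"')  # Windows PATH entries are sometimes quoted
        if not folder:
            continue
        candidate = os.path.join(folder, exe_name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

After updating PATH (and opening a new PowerShell window), `find_on_path("lua53.exe")` should return the full path to the interpreter.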
Additional details
Although this is what Lua directly offers, there are third-party alternatives for compiling and executing Lua code, which you can find by searching.
I assume that you are trying to use Moai SDK to develop games.
I think this article may help.

Configuring Bison to compile an input file under Visual C 6

I'm trying to get Bison to do its thing in VC6. I'm sure this must be a problem with my configuration. At the moment I have a Custom Build step as follows:
<Commands>
echo Start parser generation
"C:\GnuWin32\bin\bison.exe" $(InputPath)
echo Finish parser generation
<Outputs>
$(ProjDir)\$(InputName).c
$(ProjDir)\$(InputName).h
The error I get is "C:\GnuWin32\bin\bison.exe: m4: No such file or directory", which makes me think the m4.exe doesn't exist or isn't on the path, but I can run the exact same command from CMD in the same directory with no errors.
That led me to suspect a problem with the output options, but I've tried various configurations with no luck.
Any help would be great, thanks in advance.
Edit: I've added some more Visual Studio versions to the tag list to try to get more exposure for the question. Hopefully someone will have done this in a later version and I can work backwards.
Okay, I've managed to solve this in a very roundabout way, but I will try my best to document it here.
It seems that VC6 custom build steps will only look in the project directory for m4.exe, even when you explicitly specify where the calling executable (bison) is. To solve this, I used a bit of a hack: a full cd command in the custom build window to get to the GnuWin32 directory (where both bison and m4 live) before calling the parser generator.
This works fine, but it is a hassle to distribute to other people, who may have installed the GNU tools in a different location.
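To make the workaround portable, the tools location can come from configuration instead of being hard-coded. The Python sketch below mirrors the cd workaround by running bison with the GnuWin32 folder as its working directory, also prepends that folder to PATH, and passes -o so the output lands next to the grammar, since Bison otherwise writes its default output files into the current directory. GNUWIN32_BIN is a hypothetical environment variable name, and setting M4 assumes your Bison version reads an M4 environment variable to locate m4, which is worth verifying.

```python
import os
import subprocess

def bison_invocation(grammar, tools_dir=None):
    """Return (argv, cwd, env) for running GnuWin32 bison on a grammar file.

    tools_dir defaults to the GNUWIN32_BIN environment variable (a hypothetical
    name for this sketch) and falls back to C:\\GnuWin32\\bin.
    """
    if tools_dir is None:
        tools_dir = os.environ.get("GNUWIN32_BIN", r"C:\GnuWin32\bin")
    grammar = os.path.abspath(grammar)
    out_c = os.path.splitext(grammar)[0] + ".c"  # like $(ProjDir)\$(InputName).c

    env = dict(os.environ)
    # Make m4 findable on PATH, and point bison at it directly via M4
    # (assumption: your Bison version honors the M4 environment variable).
    env["PATH"] = tools_dir + os.pathsep + env.get("PATH", "")
    env["M4"] = os.path.join(tools_dir, "m4.exe")

    # -d also writes the header (out_c with .h); -o fixes the output location.
    argv = [os.path.join(tools_dir, "bison.exe"), "-d", "-o", out_c, grammar]
    # Using tools_dir as the working directory mirrors the 'cd' workaround.
    return argv, tools_dir, env

def run_bison(grammar, tools_dir=None):
    """Run bison and raise CalledProcessError if it fails."""
    argv, cwd, env = bison_invocation(grammar, tools_dir)
    subprocess.run(argv, cwd=cwd, env=env, check=True)
```

A custom build step could then call a script like this with $(InputPath), and each developer sets the tools folder once instead of editing the build command.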
