Retrieving data from a PDF using Perl - perl-module

I would like to know how to retrieve data from a PDF using Perl. I have used API::PDF, but I am looking for something other than that. I expect output like that of a PDF-to-DOC converter. I would appreciate any help.
Thanks!

Writing a general PDF converter is not a simple task. There are at least two modules on CPAN which can help:
CAM::PDF
PDF::API2

Related

Simple arithmetic functions in Elasticsearch

I am starting to get acquainted with ELK for work purposes, but I am struggling to find a way to run simple arithmetic on the data in my database.
As shown in the picture, my DB contains 16 available fields, but I would like to create others without having to compute them in Excel first and then convert my file to CSV again.
For example, I would like to create a variable #Bugs/Release. I've heard that this is quite easy to do with no need for scripting, but I can't find the way to do it... Does anybody have a solution to this problem?
Huge thanks

How to convert PDF to PDF/A-1a using Ghostscript? What conditions are needed to convert to PDF/A-1a?

I have already done a lot of research and found that clear information about "How to generate PDF/A-1a" or "...convert to PDF/A-1a" is really rare. I found some information on converting to PDF/A-1a via Ghostscript, but I could not get it working. So perhaps the input file is missing some necessary preconditions in the first place: conditions like proper PDF metadata, structured data that a screen reader can follow, alternative text for pictures, and a declaration of the language of the text. I need a properly working Ghostscript command, with the corresponding gs version and the mandatory file conditions, to generate or convert to PDF/A-1a. PDF/A-1b is of no use to me here, because I am already able to convert to that.
Thanks for any help.

How to read/search specific content in a webpage using shell scripting

I'm a beginner in shell scripting.
I'm trying to write a script, part of which involves reading a value from a webpage. In this case, the shell script tries to fetch the IMDB rating of a movie by going to the movie's IMDB page.
Can someone suggest how I can achieve this, and which topics I need to learn?
Thank you.
You can use wget or curl to get the page. Then you'll need a regex or some other string manipulation to pull the information you need out of the downloaded HTML. It would be a lot easier to use a library that does some of these things for you.
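To illustrate the fetch-then-extract approach described above, here is a minimal sketch in Python rather than shell. It uses only the standard library. The `"ratingValue"` pattern is an assumption about how the page embeds its rating (a schema.org-style JSON field); the real markup may differ and can change at any time, and the function names are made up for this example.

```python
import re
import urllib.request


def fetch_page(url: str) -> str:
    """Download a page and return its body as text (the job wget/curl would do)."""
    # Some sites reject requests without a browser-like User-Agent header.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


# ASSUMPTION: the rating appears as  "ratingValue": 8.7  or  "ratingValue": "8.7"
# somewhere in the HTML. Adjust the pattern to whatever the page really contains.
RATING_RE = re.compile(r'"ratingValue"\s*:\s*"?(\d+(?:\.\d+)?)"?')


def extract_rating(html: str):
    """Return the first rating found in the HTML as a float, or None if absent."""
    match = RATING_RE.search(html)
    return float(match.group(1)) if match else None
```

With a movie page URL in `url`, the two steps combine as `extract_rating(fetch_page(url))`. Regex matching on HTML is brittle, which is why the advice to use a library (an HTML or JSON parser) still applies for anything beyond a quick script.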

How can I convert GOPixbuf strings to images

I have an XML file produced with Gnumeric that contains images, stored as GOPixbuf strings inside XML. They look like this:
eXyA/4KEiP9xcnf/f3+E/3l5ff9xb3L/jo2Q/29wdP+ [truncated]
For each string I have width and height, and a rowstride parameter, like in this example:
<GOImage name="Image(70)" type="GOPixbuf" width="151" height="135" rowstride="604">
Is there a reasonable way to convert that to an image - any format will do?
I'm conversant with Perl and image conversion tools (ImageMagick, GIMP), but I have not found any documentation by googling beyond the GTK or GOffice docs.
You have already found stuff that is helpful. But since there are no Perl bindings for this on CPAN, you would have to make your own if you want to use Perl.
Fortunately, you don't have to know XS to do that. You can use FFI::Platypus to create temporary bindings and only map what you need.
The docs you have probably already found have a "Getting started with GOffice" section. After a quick check I found that my recent Ubuntu has a package containing that library. It is called libgoffice-0.10-dev.
Now you can set that up and play around with the library functions. Somewhere in https://developer.gnome.org/goffice/unstable/GOImage.html there is probably a method to read and convert the image.
One good candidate might be go-image-get-pixbuf, which returns a GdkPixbuf. That in turn has very extensive documentation, and what you need may well be in there.
Good luck.
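If binding GOffice turns out to be more than you need, a different route is to decode the string directly. The sketch below is in Python rather than Perl and rests on an unverified assumption: that a GOPixbuf string is plain base64 of raw pixel rows with 4 bytes per pixel, each row `rowstride` bytes long. The sample above is at least consistent with that (every fourth decoded byte is 0xFF, and 604 = 151 × 4), but the channel order (RGBA is assumed here) is a guess. The function name is made up for this example.

```python
import base64


def gopixbuf_to_pam(b64_text: str, width: int, height: int, rowstride: int) -> bytes:
    """Wrap base64-encoded raw pixels in a PAM (Netpbm P7) container.

    ASSUMPTION: 8-bit RGBA pixels, 4 bytes each, rows padded to `rowstride` bytes.
    """
    # XML may wrap the string across lines, so drop all whitespace first.
    raw = base64.b64decode("".join(b64_text.split()))
    row_bytes = width * 4
    if rowstride < row_bytes:
        raise ValueError("rowstride is smaller than width * 4")
    if len(raw) < rowstride * (height - 1) + row_bytes:
        raise ValueError("pixel data is shorter than the dimensions imply")
    # Keep only the visible part of each row and discard any stride padding.
    rows = [raw[y * rowstride : y * rowstride + row_bytes] for y in range(height)]
    header = (
        f"P7\nWIDTH {width}\nHEIGHT {height}\nDEPTH 4\nMAXVAL 255\n"
        "TUPLTYPE RGB_ALPHA\nENDHDR\n"
    ).encode("ascii")
    return header + b"".join(rows)
```

Writing the returned bytes to a file such as image.pam gives something ImageMagick can convert to PNG or any other format. If the colours come out swapped, the channel-order assumption is the first thing to revisit.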

How to convert hadoop sequence file to json format?

As the title suggests, I'm looking for a tool that will convert existing data from a Hadoop sequence file to JSON format.
My initial googling has only turned up results related to jaql, which I'm desperately trying to get to work.
Is there any tool from Apache available for this very purpose?
NOTE:
I have a Hadoop sequence file sitting on my local machine and would like to get the data in the corresponding JSON format.
So, in effect, I'm looking for a tool or utility that takes a Hadoop sequence file as input and produces output in JSON format.
Thanks
Apache Hadoop might be a good tool for reading sequence files.
All kidding aside, though, why not write the simplest possible Mapper java program that uses, say, Jackson to serialize each key and value pair it sees? That would be a pretty easy program to write.
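The serialization step suggested above can be sketched briefly. This is Python, not the Java Mapper with Jackson that the answer proposes, and it covers only the "turn each key/value pair into JSON" part: reading the sequence file itself is not shown, and `pairs` stands in for whatever reader supplies the records. The function name and the `"key"`/`"value"` field names are made up for this example.

```python
import json
from typing import Iterable, Iterator, Tuple


def pairs_to_json_lines(pairs: Iterable[Tuple[object, object]]) -> Iterator[str]:
    """Yield one JSON object per (key, value) record, one record per line."""
    for key, value in pairs:
        # default=str is a fallback for values json cannot encode natively;
        # it simply stores their str() form.
        yield json.dumps({"key": key, "value": value}, default=str)
```

Emitting one JSON object per line keeps the output streamable, so a large file never has to be held in memory as a single JSON array.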
I thought there must be some tool that does this, given that it's such a common requirement. Yes, it should be pretty easy to code, but why do so if something already exists that does just the same?
Anyway, I figured out how to do it using jaql. Here is a sample query that worked for me:
read({type: 'hdfs', location: 'some_hdfs_file', inoptions: {converter: 'com.ibm.jaql.io.hadoop.converter.FromJsonTextConverter'}});
