Indexing PDF files using PDF-Paragraph converter - watson-explorer

When I index PDF files using the PDF-Paragraph converter, WEX puts a default title, (title), on all documents. How can I modify this title using XPath?
On the search page, all documents have the same title.

It is not possible; see
http://www-01.ibm.com/support/docview.wss?uid=swg22009088
You would need to create a new custom Java converter for your requirement.

Related

Is there any way to convert ckeditor html content to ms word document?

I have a Laravel project where I need to create a doc/docx document based on user input in CKEditor. I have previously worked with PHPWord, which can convert simple text input to a docx document. The problem is that CKEditor gives you HTML with inline CSS (which I need), and PHPWord cannot convert this to a docx.
I also tried to convert the HTML to Word via XML, but no luck. I know there is a paid tool called phpdocx, but I am looking for a free solution.
Just a note: I can actually convert the HTML to PDF. But again, I have found no solution for going from PDF to doc.
So, any help in converting HTML to Word, or PDF to Word?
thanks

How to implement a TOC if I'm using html2pdf to convert HTML to PDF?

I'm using the following method to convert html to PDF.
HtmlConverter.convertToPdf(htmlCode,servletOutputStream);
but this way I have no access to the Document object to create the TOC according to this example.
Now if I use
Document document=HtmlConverter.convertToDocument(.....)
Will it update the document created if I work with that document object?

Can't insert html with <object> tag

I'm struggling with ckeditor 4.5, as I'm creating plugins to insert specific tags in current document after having uploaded a file on my server.
For some specific file types, I want to embed the element. I can add <audio> or <video> tags (by using allowedContent = true in my config file), but when I insert an <object> tag (to embed a PDF file), the tag is just ignored.
I already tried adding config.extraAllowedContent = 'object[id, name, width, height, data, type]' to the config file, to no avail.
I found some workarounds by adding a <div> around the <object>, but the pdf viewer is not displayed in the editor (but the <object> tag is there).
I think I'm doing something wrong with ACF, but I really don't see what.
Just wrap the tag with or , and if you use WordPress or Drupal, try disabling the Advanced Content Filter from the CMS.
<object> is not recommended for PDF and should not be a part of an editable area. There is no way it can be editable like text or paragraph. However, it can be a non-editable element inside the editor, with some editable parts. This is what CKEditor call widget, here is a tutorial how to create it: https://ckeditor.com/docs/ckeditor4/latest/guide/widget_sdk_intro.html
Note that PDF is not normally classed as "Media" but more generally as "Document". There are two ways to embed it: one is to use the allMedia plugin, which does include PDF as a media format ;-) and the other is to include the content via Google Docs. Review the various "Demonstrations" on the website.

convert pdf to html using abcpdf

I am looking for a method to convert a PDF document into a corresponding HTML document using ABCpdf. Kindly let me know if it is feasible. FYI, my PDF document has rich text along with images.
You can. Try this. Hopefully it'll work.
var doc = new WebSupergoo.ABCpdf10.Doc();
doc.Read(pdfBytes);   // pdfBytes: your PDF as a byte array
doc.Save(htmlPath);   // htmlPath: your output file path, with an .html extension
doc.Clear();
doc.Dispose();
For documentation, please have a look at the Notes section of:
http://www.websupergoo.com/helppdfnet/source/5-abcpdf/doc/1-methods/save.htm
To export as XPS, PostScript, DOCX or HTML you need to specify a file path with an appropriate extension - ".xps", ".ps", ".docx", ".htm", ".html" or ".swf". If the file extension is unrecognized then the default PDF format will be used.
You can definitely convert HTML to PDF, but I am not sure the inverse is possible with ABCpdf.
Perhaps you can give iText (iTextSharp) a try.

Nokogiri, find XML node with multiple attributes and change text

I'm trying to change an indl file. An indl file is created by Adobe InDesign to keep the structure of a document, and is basically XML. I want to use Nokogiri to find selected XML nodes and replace their text with my own, then save the XML to another file.
The XML, of course, is strange: I found some documentation on retrieving HTML tags with Nokogiri and changing their text, but I don't know how to handle a piece of XML like this:
<cflo>
<txsr prst="o_u5084" crst="o_u5085" trak="D_10">
<pcnt>c_tEST</pcnt>
</txsr>
<txsr prst="o_u5086" crst="o_u5c" trak="D_20">
<pcnt>c_Titolo titolo titolo</pcnt>
</txsr>
</cflo>
Basically, I need to look for a combination of prst and crst attributes and replace the content inside the pcnt node.
I tried this:
@doc.xpath("//txsr[prst='o_u5086' and crst='o_u5085']")
but I don't know how to change the text inside the pcnt node.
That's not the correct XPath. The correct XPath will look like this:
@doc.xpath("//txsr[@prst='o_u5086'][@crst='o_u5085']")
You should just take the first node from a set and use the inner_html= method to replace the text value.
Full code may be found here: https://gist.github.com/kaineer/7673698
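As a rough illustration of the approach described above, here is a minimal sketch. It assumes Ruby's bundled REXML library in place of Nokogiri so that it runs without extra gems; the XPath expression is the same either way. The attribute values 'o_u5086' and 'o_u5c' are picked to match the second txsr node in the sample, and the replacement text is a made-up placeholder.

```ruby
require "rexml/document"

# Sample fragment from the question, with the closing </cflo> tag in place.
xml = <<~XML
  <cflo>
    <txsr prst="o_u5084" crst="o_u5085" trak="D_10">
      <pcnt>c_tEST</pcnt>
    </txsr>
    <txsr prst="o_u5086" crst="o_u5c" trak="D_20">
      <pcnt>c_Titolo titolo titolo</pcnt>
    </txsr>
  </cflo>
XML

doc = REXML::Document.new(xml)

# Attributes need the @ prefix in XPath; without it, prst and crst
# would be treated as child element names and nothing would match.
pcnt = REXML::XPath.first(doc, "//txsr[@prst='o_u5086' and @crst='o_u5c']/pcnt")

# Replace the text only if a matching node was found (placeholder text).
pcnt.text = "c_New title" if pcnt

# Serialized result; it could then be written out with File.write.
updated = doc.to_s
puts updated
```

With Nokogiri itself, the equivalent steps would presumably be doc.at_xpath(...) with the same expression, followed by assigning to the node's content (or inner_html, as the answer suggests).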
