Text stripper stops and waits - java-8

I'm trying to extract text from a pdf file, in order to index it with Lucene. This is the code:
PDFParser parser = new PDFParser(new FileInputStream(f));
parser.parse();
String text = new PDFTextStripper().getText(parser.getPDDocument()); // stops here
parser.getPDDocument().close();
Execution hangs indefinitely at the line marked in the comment. I am sure the previous line has been executed.
I'm using PDFBox version 1.8.
Can anybody help me?

First of all, I used PDFBox version 2.0.17 (not 1.8).
With that version, the following code extracts the text from a PDF file:
// try-with-resources closes the document once the text has been extracted
try (PDDocument doc = PDDocument.load(file)) {
    PDFTextStripper stripper = new PDFTextStripper();
    String content = stripper.getText(doc);
}
This works!

Related

HiQPDF error "Cannot write the document to output file. Invalid Serial Number Version." when merging 2 pdf

I'm using HiQPdf to merge 2 PDFs into one file, following the official help: https://www.hiqpdf.com/documentation/html/e5d2f1ee-dccb-4351-888e-e3f3c15a93a5.htm
The merged output carries the "HiQPdf Evaluation" watermark, which proves the code works.
I added my serial number:
PdfDocument resultDocument = new PdfDocument();
resultDocument.SerialNumber = "AU***************-OA=="; // this line is not in the help
PdfDocument document1 = PdfDocument.FromFile("c:\\temp\\doc1.pdf");
resultDocument.AddDocument(document1);
PdfDocument document2 = PdfDocument.FromFile("c:\\temp\\doc2.pdf");
resultDocument.AddDocument(document2);
resultDocument.WriteToFile("c:\\temp\\MergePdf.pdf"); //getting an Exception here !
The call to resultDocument.WriteToFile throws an exception: Cannot write the document to output file. Invalid Serial Number Version.
If I remove the resultDocument.SerialNumber line, the merged PDF is generated, but with the "HiQPdf Evaluation" watermark.
I believe my serial is correct, since I successfully use it for HtmlToPdf conversion:
HtmlToPdf htmlToPdfConverter = new HtmlToPdf();
htmlToPdfConverter.SerialNumber = "AU***************-OA==";
PdfDocument resultDoc = null;
resultDoc = htmlToPdfConverter.ConvertHtmlToPdfDocument(html, "");
This code (with my serial) successfully produces a PDF without the "HiQPdf Evaluation" watermark. If I remove my serial, the "HiQPdf Evaluation" watermark appears.
Is there another place/way to use the serial? Is it possible that my Serial is correct for htmlToPdfConverter but not the PdfDocument merging?

I finally received a response from support: my serial is valid for a PREVIOUS version of HiQPdf.
Be aware that on NuGet, the only version available is version 12. To downgrade, you need to find a backup of your previous version of HiQPdf.dll and put it manually in your packages.
Below is the response from support:
HiQPdf Sales sales#hiqpdf.com Mon 18/04/2022 18:20
Hello,
Your serial number is for version 10 of the software. For version 12
you need a new serial number.
You can renew the old license with a 20% renewal discount to obtain a
serial number for the latest version of the software in the page
below: [link removed]
Best Regards,
Jacob

Qt : Reading the text file and Displaying in LineEdit

I have an input file and a batch file. When the batch file is executed using the system command,
a corresponding outfile is generated.
Now I want a particular piece of text (positions 350 to 357) from that outfile to be displayed in my LineEdit widget.
Here is that part of my code:
system("C:/ORG_Class0178.bat")
Now the outfile will be generated
File.open("C:/ORG_Class0178_out.txt", 'r').each do |line|
  var = line[350..357]
  puts var
  # To test whether the file is being read.
  #responseLineEdit = Qt::LineEdit.new(self)
  #responseLineEdit.setFont Qt::Font.new("Times NEw Roman", 12)
  #responseLineEdit.resize 100,20
  #responseLineEdit.move 210,395
  #responseLineEdit.setText("#{var}")
end
When I test whether the file is being read using a puts statement, I get exactly the required output in the editor. However, the same text is not displayed in the LineEdit. Suggestions are welcome.
EDIT: A weird observation here. It works fine when I read the input file and display it, but it does not work with the output file. The puts statement does print the answer in the editor, confirming that the output file contains the required text. I am confused by this scenario.

There is nothing wrong with the code fragments shown.
Note that var is a local variable. Are the second and third code fragments in the same context? If they are in the same method, and var is not touched in between, it will work.
If the fragments belong to different methods of the same class, then an instance variable (@var) will solve the problem.
If none of that helps, use Pry to chase the problem (its documentation covers the prerequisites and usage). Place binding.pry in your code, and your program will stop at that line. Then inspect what your variables are doing.

Try 'rb' instead of 'r':
File.open("C:/ORG_Class0178_out.txt", 'rb').each do |line|
  var = line[350..357]
  puts var
end

iText - adding Image element generates a corrupt PDF file

I'm using iText® 5.2.1 ©2000-2012 1T3XT BVBA and Integration Designer 8.0 to create a PDF file that is exported as a byte array.
I am creating a document with a fair amount of text and want to add a logo at the beginning.
Part of the code that is adding the image is as follows:
BASE64Decoder decoder = new BASE64Decoder();
byte[] decodedBytes = decoder.decodeBuffer(Stringovi.SLIKA1);
Image image1 = Image.getInstance(decodedBytes);
image1.setAbsolutePosition(30f, 770f);
image1.scalePercent(60f);
document.add(image1);
The input image is in byte array format because of the system requirements.
The rest of the document consists of different tables with various content and it's all text.
When I add the image in the aforementioned way, the program finishes and I get a byte output that I run through a Base64 decoder. The resulting PDF cannot be opened, and the error shown is:
"Error [PDF Structure 40]:Invalid reference table (xref)"
I can't see where my mistake is so if anybody could be so kind and point me in the right direction I would very much appreciate it.

The document you presented as a "broken PDF file" is not a complete PDF file. Among other things, it doesn't end with %%EOF and it doesn't have a cross-reference table.
This means that you don't have the following line in your code:
document.close();
If you do have this line, it isn't reached. For instance: an exception is thrown causing the code to jump to a catch clause, skipping the close() operation.
The error message saying Invalid reference table (xref) is consistent with that diagnosis. This isn't a problem caused by iText. It's a problem caused by bad coding: not closing the document and/or not dealing with exceptions correctly.
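A quick way to confirm this diagnosis on the exported byte array is to check whether the %%EOF marker is present near the end. The following is only a minimal sketch using the Java standard library; the class name, the method name, and the 1024-byte search window are assumptions made for illustration, not part of iText.

```java
import java.nio.charset.StandardCharsets;

class PdfTailCheck {
    // Assumption: only the last 1024 bytes are searched for the marker.
    static final int TAIL_WINDOW = 1024;

    // Returns true if "%%EOF" occurs within the last TAIL_WINDOW bytes.
    static boolean hasEofMarker(byte[] pdfBytes) {
        int window = Math.min(pdfBytes.length, TAIL_WINDOW);
        String tail = new String(pdfBytes, pdfBytes.length - window, window,
                StandardCharsets.ISO_8859_1);
        return tail.contains("%%EOF");
    }

    public static void main(String[] args) {
        // Tiny stand-ins for real PDF output, not valid PDF files.
        byte[] complete = "%PDF-1.4\n(body)\nstartxref\n0\n%%EOF\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] unfinished = "%PDF-1.4\n(body)\n"
                .getBytes(StandardCharsets.ISO_8859_1);

        System.out.println("complete:   " + hasEofMarker(complete));
        System.out.println("unfinished: " + hasEofMarker(unfinished));
    }
}
```

If the check fails for your output, look for the code path on which document.close() is skipped, for example an exception caught before the close call.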

Visual Basic - Writing / Overwriting into a text line adds a new line

Hello everyone, I made a simple program that takes my external IP and places it in my website's public camera. I have a problem: the program creates a txt file with the IP inside it and uploads it to the server. When the program overwrites, edits, or creates the file, it adds an empty new line, which messes up my PHP code...
This is the code used both for overwriting/editing and for creating the file:
Dim strFile As String = "c:/IPtoUse.txt"
Dim fileExists As Boolean = File.Exists(strFile)
Using sw As New StreamWriter(File.Open(strFile, FileMode.OpenOrCreate))
    sw.WriteLine( _
        IIf(fileExists, GetIP, GetIP))
End Using
(the GetIP function is getting my ip from my server)
This ends up with another empty line. How can I fix it?
Thanks!

Going on the information from the question and comments, it seems that your file will end up with an additional newline at the end in both cases (i.e. both for new and modified files).
The reason for this is that you're using the WriteLine method, which will append a newline at the end of the text it writes, even if that text already ends with a newline.
Simply change the code to use the Write method instead of the WriteLine method and you should end up with a file that contains only the text passed to the method.
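The same distinction exists in other I/O APIs. As an illustration only (written in Java rather than the question's VB.NET, with hypothetical helper names and a placeholder IP address), println appends a line separator while print writes only the text it is given:

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class WriteVsWriteLine {
    // Writes the text followed by the platform line separator.
    static String withPrintln(String text) {
        StringWriter buffer = new StringWriter();
        try (PrintWriter out = new PrintWriter(buffer)) {
            out.println(text);
        }
        return buffer.toString();
    }

    // Writes the text and nothing else.
    static String withPrint(String text) {
        StringWriter buffer = new StringWriter();
        try (PrintWriter out = new PrintWriter(buffer)) {
            out.print(text);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        String ip = "203.0.113.7"; // placeholder address for the example
        System.out.println("println output equals input: " + withPrintln(ip).equals(ip));
        System.out.println("print output equals input:   " + withPrint(ip).equals(ip));
    }
}
```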

Editing a spreadsheet using SPREADSHEET ruby gem

I have to read data from a spreadsheet, modify some rows, and then write the updated rows/cells back into the same file.
I have used Spreadsheet gem with Ruby 2.0.0.
When I write the results back to the same file, I can no longer open the .xls file. MS Excel reports the error "File Format is not Valid".
When the updates are written to a different file, I can open the file, but only in Protected View. Is there a solution to this issue?
Below is the sample code:
require 'rubygems'
require 'spreadsheet'
book = Spreadsheet::open('filePath')
sheet = book.worksheet 0
## have application logic in here
book.write('filePath')

I've worked with this problem a few times, and the issue has been logged for around a year now.
The first problem is that Spreadsheet locks the file when it loads it, and there is no clear way to close it. The only way I've found to avoid the lock is the following code block, which opens the file, stores the first worksheet in its own variable, and then closes the file.
worksheet = nil
Spreadsheet.open workbook_name do |inner_book|
  worksheet = inner_book.worksheet 0
end
worksheet
If you want all the worksheets, you could do something similar. In addition to the file opening/closing problem, there is the issue of capturing the content of the worksheet, which depends on the format. For my purposes, I end up doing the following to capture the content. Sadly, this loses any formatting you might have had in the source spreadsheet.
rows = []
worksheet.each do |row|
  rows << row
end
You can then create your own workbook and sheet, iterate through the rows, and add them to the new sheet. Then save the new book under the same file name.
It's not fun or efficient, but it is one way to solve the problem. Hope this helps.

Check your file extension.
The spreadsheet and writeexcel gems (among others) don't seem to work with .xlsx files.
Try .xls, not .xlsx.
