iTextSharp taking too much time getting the number of pages

I have this piece of code:
foreach (string pdfFile in Directory.EnumerateFiles(selectedFolderMulti_txt.Text, "*.pdf", SearchOption.AllDirectories))
{
    //filePath = pdfFile.FullName;
    //string abc = Path.GetFileName(pdfFile);
    try
    {
        //pdfReader = new iTextSharp.text.pdf.PdfReader(filePath);
        pdfReader = new iTextSharp.text.pdf.PdfReader(pdfFile);
        rownum = pdfListMulti_gridview.Rows.Add();
        pdfListMulti_gridview.Rows[rownum].Cells[0].Value = counter++;
        //pdfListMulti_gridview.Rows[rownum].Cells[1].Value = pdfFile.Name;
        pdfListMulti_gridview.Rows[rownum].Cells[1].Value = System.IO.Path.GetFileName(pdfFile);
        pdfListMulti_gridview.Rows[rownum].Cells[2].Value = pdfReader.NumberOfPages;
        //pdfListMulti_gridview.Rows[rownum].Cells[3].Value = filePath;
        pdfListMulti_gridview.Rows[rownum].Cells[3].Value = pdfFile;
        //totalpages += pdfReader.NumberOfPages;
    }
    catch
    {
        //MessageBox.Show("There was an error while opening '" + pdfFile.Name + "'", "Error!", MessageBoxButtons.OK, MessageBoxIcon.Error);
        MessageBox.Show("There was an error while opening '" + System.IO.Path.GetFileName(pdfFile) + "'", "Error!", MessageBoxButtons.OK, MessageBoxIcon.Error);
    }
}
The problem is that today, when I specified a folder containing about 4,000 PDF files, it took about 20 minutes to read all the files and show me the results (roughly 0.3 seconds per file). That made me wonder what this code will do when I give it a folder with more than 20,000 files; at the same rate, that would take about 100 minutes.
If I comment out this line:
pdfListMulti_gridview.Rows[rownum].Cells[2].Value = pdfReader.NumberOfPages;
then it seems as if all of the processing burden is removed from the code.
So, what I'd like from you is a suggestion for making my approach more efficient, so that processing all the files takes less time. Or is there an alternative?

Definitely do what @ChrisBint said; that will get past Windows' slowness with folders that contain many files.
But to get even more speed, make sure to use the overload of PdfReader that takes a RandomAccessFileOrArray object instead. In all of my tests this object has been much faster than regular streams. The constructor has a couple of overloads, but you should mainly concern yourself with RandomAccessFileOrArray(string filename, bool forceRead). The second parameter controls whether or not the entire file is loaded into memory (if I'm understanding the documentation correctly). For very large files this might be a performance hit, but on modern machines it shouldn't matter much, so I recommend passing true. If you pass false, the disk will need to be hit several times as the parsing "cursor" walks through the file.
So with all of that you can do this in a very tight loop. For me, 4,000 files containing a total of over 42,000 pages takes about 2 seconds to run.
var files = Directory.EnumerateFiles(workingFolder, "*.pdf");
int totalPageCount = 0;
foreach (string f in files)
{
    var reader = new PdfReader(new RandomAccessFileOrArray(f, true), null);
    try { totalPageCount += reader.NumberOfPages; }
    finally { reader.Close(); } // release the reader's resources before the next file
}
MessageBox.Show(String.Format("Total Page Count : {0:N0}", totalPageCount));

Personally, I would change your code slightly so that Directory.EnumerateFiles is not called inside the foreach statement. For example:
var listOfFiles = Directory.EnumerateFiles(selectedFolderMulti_txt.Text, "*.pdf", SearchOption.AllDirectories);
foreach (string pdfFile in listOfFiles)
{
    //Do something
}
In C#, the expression in a foreach header is evaluated only once anyway, so this is mainly a readability change; I doubt it would affect the overall time much, if at all.
As for the speed of the NumberOfPages property: it is unlikely that you will be able to optimise this, because it is internal to the PdfReader object. If performance is a concern, this may require additional hardware.
Personally, I would not treat this as an issue unless I had to run the scan repeatedly (in which case I would start looking at caching results and checking for existing files, only processing those that are new or have changed).
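If the scan does have to run repeatedly, the caching idea is independent of iTextSharp. Below is a minimal JavaScript sketch of the bookkeeping only, assuming each file is described by a hypothetical `{ path, mtimeMs, size }` record from a directory listing and that `countPages(path)` is a hypothetical callback standing in for opening the PDF with PdfReader and reading NumberOfPages. A file is re-read only when its size or modification time differs from the cached entry, and entries for files that have disappeared are dropped.

```javascript
// Cache layout (hypothetical): cache[path] = { mtimeMs, size, pages }
// files: array of { path, mtimeMs, size } taken from a directory listing
// countPages: callback doing the expensive work (e.g. opening the PDF)
// Returns how many files actually had to be re-read.
function refreshPageCounts(cache, files, countPages) {
  var seen = Object.create(null); // paths present in this scan
  var reread = 0;
  for (var i = 0; i < files.length; i++) {
    var f = files[i];
    seen[f.path] = true;
    var hit = cache[f.path];
    if (hit && hit.mtimeMs === f.mtimeMs && hit.size === f.size) {
      continue; // unchanged since the last scan: keep the cached page count
    }
    cache[f.path] = { mtimeMs: f.mtimeMs, size: f.size, pages: countPages(f.path) };
    reread++;
  }
  // Forget files that are no longer in the folder
  var keys = Object.keys(cache);
  for (var k = 0; k < keys.length; k++) {
    if (!seen[keys[k]]) {
      delete cache[keys[k]];
    }
  }
  return reread;
}
```

Persisting `cache` between runs (for example as a JSON file) is what makes later scans cheap; that part is left out of the sketch.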

Related

Indesign Scripting: View array's actual content (strings) in ExtendScript console

I'm a beginner at InDesign scripting and would like to get better at debugging, but my attempts seem to run into walls. I hope someone has some insights that will help me move forward.
I'm working on a little project that loops through some selected tables, puts the 3 tables into an array/variable (I've accomplished that), and then loops through the content of those tables to find GREP matches and store them in an array/variable (for further uses I won't get into now).
My main objective at this point: see exactly which text characters the .findGrep() function is catching and display them in the JavaScript Console of the ExtendScript Toolkit app.
So here's a bit of the journey up to this point, including code I've tried and suggestions from others. (All of my attempts to use these have failed... which is why I'm here now, and why this is long; my apologies.)
Initial try:
var myTables = []; // in the Data Browser this shows values of [object Table], [object Table], [object Table]
var myFinds = [];
var myTest = [];
var myCharacters = [];
app.findGrepPreferences = null;
app.findGrepPreferences.findWhat = "\"";
for (x = 0; x < myTables.length; x++) {
    var myFinds = myTables[x].findGrep();
    $.writeln(myFinds);
};
Notes on this code: because not every table contains the findWhat characters, myFinds is sometimes empty in this loop; when it does contain something, the console shows [object Character],[object Character],[object Character].
So someone (firstHelp) gave me the following, and it did not work: it threw "undefined is not an object" on .contents.toString(). I thought, "OK, yes, I can see that at times in the loop myFinds has nothing in it..." (more on this later).
var stringArray = [];
for (var n = 0; n < myFinds.length; n++) {
    stringArray[n] = myFinds[n].contents.toString();
};
$.writeln(myFinds.join("\r"));
Code revamp: I gave up on $.writeln(myFinds); inside the loop and tried this instead, to gather the GREP finds in a variable/array that could be dealt with outside the loop.
for (x = 0; x < myTables.length; x++) {
    $.writeln(myTables[x].cells.firstItem().texts[0].contents[0]);
    myFinds.push(myTables[x].findGrep());
};
$.writeln(myFinds);
The ExtendScript Toolkit console now shows this for myFinds:
myFinds = [Array], [object Character], [object Character], [object...
+ (object symbol) 0 =
+ (object symbol) 1 = [object Character], [object Character], [object Character]
+ (object symbol) 2 =
+ (object symbol) __proto__ =
I again tried .contents.toString() on myFinds and still got the same "undefined..." error, even when targeting the element of the array that clearly had something in it.
So then I got this tip-off (but no code I could apply to what I already have):
"You are dealing with arrays of arrays mixed with texts. So you have to check each item of the result array to see whether it is a text or another array of texts. If it is an array, loop through that array."
And later, this bit of code that is supposed to "flatten" my array: a = [].concat.apply([], a);
Replacing a with myFinds like this, myFinds = [].concat.apply([],myFinds); did absolutely nothing. The array and its contents showed no change in the console... and I have no idea how to loop through each item of this array within an array, find out if it's text or another array and then show its real contents to console.
Really... how many loops and if/thens etc. do I need to run on one array to show its actual contents in the console? I know I struggle with breaking down every little step I want into its minute scripting granularity, so my ignorance regularly impedes me. I welcome any suggestions/tips to move me closer to my main objective as stated above. Thanks

Regarding the first help: the real reason you get an error when accessing the contents property is that you don't check the type of the object and presume it will be a Text object. When findGrep finds nothing, you get an empty array; when such arrays end up as items of myFinds (as in your revamped loop), myFinds[n].contents is undefined because Array.prototype.contents doesn't exist, hence the error.
Also, $.writeln is a legacy of the Adobe ExtendScript Toolkit, the IDE for ExtendScript. That product is no longer developed or maintained by Adobe. You should consider other debugging tools, such as Adobe's ExtendScript Debugger extension for Visual Studio Code, which lets you use breakpoints and everything else you need.
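Putting the type check into practice, together with the "if it is an array, loop that array" tip from the question, gives a small recursive helper. This is a minimal sketch in ES3-style JavaScript (ExtendScript does not support newer syntax). It assumes that findGrep() results are plain arrays and that each found text object exposes a contents property; `collectContents` is a hypothetical name. Nested arrays are walked, empty arrays contribute nothing, and anything without contents is skipped instead of throwing.

```javascript
// Recursively collects the .contents of every found text object.
// Arrays (including arrays of arrays and empty arrays) are walked;
// anything that is not an array and has no contents is skipped.
function collectContents(item, out) {
  if (item instanceof Array) {
    for (var i = 0; i < item.length; i++) {
      collectContents(item[i], out);
    }
  } else if (item != null && item.contents !== undefined) {
    // String() also handles special characters that are not plain strings
    out.push(String(item.contents));
  }
  return out;
}

// Possible use after the "Code revamp" loop (InDesign only, not run here):
// var strings = collectContents(myFinds, []);
// $.writeln(strings.length ? strings.join("\r") : "(no matches)");
```

Because the helper recurses, it does not matter how deeply the results are nested, so no separate flattening step with [].concat.apply is needed.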

How to save an image in a subdirectory on android Q whilst remaining backwards compatible

I'm creating a simple image editor app and therefore need to load and save image files. I'd like the saved files to appear in the gallery in a separate album. Between Android API 28 and 29 there were drastic changes to how much of the storage an app is able to access. I'm able to do what I want in Android Q (API 29), but that way is not backwards compatible.
When I try to achieve the same result on lower API versions, the only ways I have found so far require deprecated code (as of API 29).
These include:
the use of the MediaStore.Images.Media.DATA column
getting the file path to the external storage via Environment.getExternalStoragePublicDirectory(...)
inserting the image directly via MediaStore.Images.Media.insertImage(...)
My question is: is it possible to implement this in a way that is backwards compatible but doesn't require deprecated code? If not, is it okay to use deprecated code in this situation, or will these methods soon be removed from the SDK? In any case, it feels very bad to use deprecated methods, so I'd rather not :)
This is the way I found which works with API 29:
ContentValues values = new ContentValues();
String filename = System.currentTimeMillis() + ".jpg";
values.put(MediaStore.Images.Media.TITLE, filename);
values.put(MediaStore.Images.Media.DISPLAY_NAME, filename);
values.put(MediaStore.Images.Media.MIME_TYPE, "image/jpeg");
values.put(MediaStore.Images.Media.DATE_ADDED, System.currentTimeMillis() / 1000);
values.put(MediaStore.Images.Media.DATE_TAKEN, System.currentTimeMillis());
values.put(MediaStore.Images.Media.RELATIVE_PATH, "PATH/TO/ALBUM");
getContentResolver().insert(MediaStore.Images.Media.EXTERNAL_CONTENT_URI,values);
I then use the URI returned by the insert method to save the bitmap. The problem is that the RELATIVE_PATH field was introduced in API 29, so when I run the code on a lower version, the image is put into the "Pictures" folder and not the "PATH/TO/ALBUM" folder.

is it okay to use deprecated code in this situation or will these methods soon be deleted from the sdk?
The DATA option will not work on Android Q: that column is not included in query() results even if you ask for it, and even if paths do get returned, you cannot use them.
The Environment.getExternalStoragePublicDirectory(...) option will not work by default on Android Q, though you can add a manifest entry to re-enable it. However, that manifest entry may be removed in Android R, so unless you are short on time, I would not go this route.
AFAIK, MediaStore.Images.Media.insertImage(...) still works, even though it is deprecated.
is it possible to implement it in such a way, so it's backwards compatible, but doesn't require deprecated code?
My guess is that you will need to use two different storage strategies, one for API Level 29+ and one for older devices. I took that approach in this sample app, though there I am working with video content, not images, so insertImage() was not an option.

This is the code that works for me. It saves an image to a subfolder on your phone. It checks the Android version of the device: on Android Q (API 29) or later it uses MediaStore with RELATIVE_PATH, and on older versions it runs the code in the else branch.
Source: https://androidnoon.com/save-file-in-android-10-and-below-using-scoped-storage-in-android-studio/
private void saveImageToStorage(Bitmap bitmap) throws IOException {
    OutputStream imageOutStream;
    if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.Q) {
        // API 29+: MediaStore places the file under Pictures/AppName
        ContentValues values = new ContentValues();
        values.put(MediaStore.Images.Media.DISPLAY_NAME, "image_screenshot.jpg");
        values.put(MediaStore.Images.Media.MIME_TYPE, "image/jpeg");
        // File.separator ("/") joins path segments; File.pathSeparator (":") does not
        values.put(MediaStore.Images.Media.RELATIVE_PATH,
                Environment.DIRECTORY_PICTURES + File.separator + "AppName");
        Uri uri = getContentResolver().insert(
                MediaStore.Images.Media.EXTERNAL_CONTENT_URI, values);
        if (uri == null) {
            throw new IOException("Failed to create MediaStore entry");
        }
        imageOutStream = getContentResolver().openOutputStream(uri);
    } else {
        // Below API 29: write the file directly (requires WRITE_EXTERNAL_STORAGE)
        File imagesDir = new File(Environment.getExternalStoragePublicDirectory(
                Environment.DIRECTORY_PICTURES), "AppName");
        if (!imagesDir.exists() && !imagesDir.mkdirs()) {
            throw new IOException("Could not create " + imagesDir);
        }
        File image = new File(imagesDir, "image_screenshot.jpg");
        imageOutStream = new FileOutputStream(image);
    }
    try {
        bitmap.compress(Bitmap.CompressFormat.JPEG, 100, imageOutStream);
    } finally {
        imageOutStream.close();
    }
}

For older APIs (< 29), I place the image in the external media directory and scan it via MediaScannerConnection.
Let's see my code.
This function creates an image file. Pay attention to the appName parameter: it is the name of the album in which the image will be displayed.
override fun createImageFile(appName: String): File {
    val dir = File(appContext.externalMediaDirs[0], appName)
    if (!dir.exists()) {
        dir.mkdir()
    }
    return File(dir, createFileName())
}
Then I write the image into the file and, finally, run the media scanner like this:
private suspend fun scanNewFile(shot: File): Uri? {
    return suspendCancellableCoroutine { continuation ->
        MediaScannerConnection.scanFile(
            appContext,
            arrayOf<String>(shot.absolutePath),
            arrayOf(imageMimeType)
        ) { _, uri ->
            continuation.resume(uri)
        }
    }
}

After some trial and error, I discovered that it is possible to use MediaStore in a backwards compatible way, such that as much code as possible is shared between the implementations for different versions. The only trick is to remember that if you use MediaColumns.DATA, you need to create the file yourself.
Let's look at the code from my project (Kotlin). This example is for saving audio, not images, but you only need to substitute MIME_TYPE and DIRECTORY_MUSIC for whatever you require.
private fun newFile(): FileDescriptor? {
    // Create a file descriptor for a new recording.
    val date = DateFormat.getDateTimeInstance().format(Calendar.getInstance().time)
    val filename = "$date.mp3"
    val values = ContentValues().apply {
        put(MediaColumns.TITLE, date)
        put(MediaColumns.MIME_TYPE, "audio/mp3")
        // store the file in a subdirectory
        if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.Q) {
            put(MediaColumns.DISPLAY_NAME, filename)
            put(MediaColumns.RELATIVE_PATH, saveTo)
        } else {
            // RELATIVE_PATH was added in Q, so work around it by using DATA and creating the file manually
            @Suppress("DEPRECATION")
            val music = Environment.getExternalStoragePublicDirectory(Environment.DIRECTORY_MUSIC).path
            with(File("$music/P2oggle/$filename")) {
                @Suppress("DEPRECATION")
                put(MediaColumns.DATA, path)
                parentFile!!.mkdir()
                createNewFile()
            }
        }
    }
    val uri = contentResolver.insert(MediaStore.Audio.Media.EXTERNAL_CONTENT_URI, values)!!
    return contentResolver.openFileDescriptor(uri, "w")?.fileDescriptor
}
On Android 10 and above, we use DISPLAY_NAME to set the filename and RELATIVE_PATH to set the subdirectory. On older versions, we use DATA and create the file (and its directory) manually. After this, the implementation for both is the same: we simply extract the file descriptor from MediaStore and return it for use.

TB Plugin errors after updating TB from 38.7.2 to 45.1.0

Several years ago I made a private Thunderbird plugin for automatically processing PayPal emails about subscriptions. The user has to put the PayPal emails in a certain folder, "PaypalMsgs", and the plugin reads them one by one, works out whether each is a payment, a cancellation, etc., and then updates the "Other" field of the person in the address book.
The plugin broke with the recent update of Thunderbird to 45.1.0 because it can no longer find the folder PaypalMsgs.
This is the code for finding the folder:
// determine the local root folder
var localRootFolder = Components
    .classes["@mozilla.org/messenger/account-manager;1"]
    .getService(Components.interfaces.nsIMsgAccountManager)
    .localFoldersServer
    .rootFolder;
// start with root folder to find folder with given name
this.ppPaypalFldr = this.findFldrDeep(localRootFolder, "PaypalMsgs");
// recursive function to find a folder fldr with the name fldrName
findFldrDeep: function(fldr, fldrName) {
    if (fldr.name == fldrName) {
        return fldr;
    } else {
        if (fldr.hasSubFolders) {
            var fldrEnum = fldr.subFolders;
            while (fldrEnum.hasMoreElements()) {
                var sfldr = fldrEnum.getNext();
                var result = this.findFldrDeep(sfldr, fldrName);
                if (result) {
                    return result;
                }
            }
        } else {
            return null;
        }
    }
},
When it runs, nothing happens, and TB's error console shows:
Error: TypeError: this.ppPaypalFldr undefined
at the first location where this.ppPaypalFldr is used.
It might be something simple, like the services of nsIMsgAccountManager having changed, or the folder type suddenly having different functions, but I'm having a really hard time finding reliable documentation or even the source for TB 45.
Thank you for any hints and support!

After more searching, debugging and thinking (sic!) I found the problem:
At the line
var sfldr = fldrEnum.getNext();
The interface is missing: it looks like something changed in TB 45, so the interface is no longer obtained automatically (the software had worked without it for about 4 or 5 years).
So the correct line is:
var sfldr = fldrEnum.getNext().QueryInterface(Components.interfaces.nsIMsgFolder);
I also checked all of the plugin and added all interfaces - now it works like a charm.
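The same search with the fix applied can also be made to return null on every miss. Note that the original findFldrDeep falls through without a return statement when a folder has subfolders but none of them matches, so it yields undefined rather than null. In this minimal sketch, `findFolderDeep` and its `asFolder` parameter are hypothetical: `asFolder` stands in for `.QueryInterface(Components.interfaces.nsIMsgFolder)` and is injected only so the logic can run outside Thunderbird.

```javascript
// Depth-first search for a folder named `name`, starting at `folder`.
// asFolder(x) stands in for x.QueryInterface(Components.interfaces.nsIMsgFolder),
// which TB 45 needs before .name / .hasSubFolders work on enumerated items.
function findFolderDeep(folder, name, asFolder) {
  if (folder.name === name) {
    return folder;
  }
  if (!folder.hasSubFolders) {
    return null;
  }
  var folders = folder.subFolders; // enumerator with hasMoreElements()/getNext(), as in the question
  while (folders.hasMoreElements()) {
    var result = findFolderDeep(asFolder(folders.getNext()), name, asFolder);
    if (result) {
      return result;
    }
  }
  return null; // explicit miss instead of falling through to undefined
}

// Inside the add-on (not run here):
// var qi = function (x) { return x.QueryInterface(Components.interfaces.nsIMsgFolder); };
// this.ppPaypalFldr = findFolderDeep(localRootFolder, "PaypalMsgs", qi);
```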
Writing the problem here alone has helped me a lot to find the solution ;-)

TestComplete object sometimes not found

I'm having trouble with TestComplete: sometimes it just doesn't find my objects, and I get an error because the object is null.
For instance in this small function
function SelectCountry(country){
    var page = Sys.Browser("*").Page("*");
    var panel = page.Form("ID1");
    select = panel.FindChildByXPath("//select[@id='ID2']");
    select.ClickItem(country);
    link = page.FindChildByXPath("//a[@id='ID3']");
    link.Click();
    page.Wait();
}
I get an error for 4 out of 5 runs telling me that select has not been found, but then on the one lucky run, everything passes fine.
Can anyone tell me what I have to check for?

Try searching for your object in a loop. Use the Exists property of the object to determine if the object exists after each search of the page. Another option would be to use the Wait methods https://support.smartbear.com/viewarticle/73657/
I would suggest avoiding hard-coded delays for the reasons you have discovered. The way I search for page objects in my project is to do the search in a loop and log an error if the object is not found.
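// function name and parameters are illustrative; the search method is passed by name (e.g. "FindChild")
function findElement(tcMethod, attributes, attributeValue) {
    var stopTime = Win32API.GetTickCount() + 60000;
    var currentUpTime = Win32API.GetTickCount();
    while (currentUpTime < stopTime) { // repeat the search for up to 60 seconds
        currentUpTime = Win32API.GetTickCount();
        var page = Sys.Browser("iexplore").Page("*");
        for (var i = 0; i < attributes.length; i++) {
            // call the method by name instead of building a string for eval()
            var element = page[tcMethod](attributes[i], attributeValue, 20000);
            if (element.Exists) {
                return element;
            }
        }
    }
    Log.Error("Element not found: " + attributeValue);
    return null;
}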

I found a working solution. It seems evident that the source of the problem is that the page has not finished loading, so I put hard-coded pauses before every step that loads a new page.
aqUtils.Delay(2000);
Sometimes I even have to go up to 5 seconds.
This is still not very stable, since for some reason the load time is sometimes longer.
Is there some way of telling TestComplete it should try to find an element during 30 seconds and only then raise an error?
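One way to get "keep trying for 30 seconds, then fail" is to wrap the search in a polling helper. This is a minimal sketch; `waitForObject` and its parameters are hypothetical, and `now`/`sleep` are injected so the loop can be exercised outside TestComplete. Inside TestComplete they could wrap Win32API.GetTickCount() and aqUtils.Delay(), both of which are used elsewhere on this page. Unlike a fixed aqUtils.Delay(5000), it returns as soon as the object exists and only waits the full timeout when the object never shows up.

```javascript
// Polls tryFind() until it returns an object whose Exists property is true,
// or until timeoutMs has passed; returns the object or null.
// now() returns a millisecond tick count; sleep(ms) pauses between attempts.
function waitForObject(tryFind, timeoutMs, intervalMs, now, sleep) {
  var deadline = now() + timeoutMs;
  while (true) {
    var obj = tryFind();
    if (obj && obj.Exists) {
      return obj;
    }
    if (now() >= deadline) {
      return null;
    }
    sleep(intervalMs);
  }
}

// Possible use inside SelectCountry (TestComplete only, not run here):
// var select = waitForObject(
//   function () { return panel.FindChildByXPath("//select[@id='ID2']"); },
//   30000, 500,
//   function () { return Win32API.GetTickCount(); },
//   function (ms) { aqUtils.Delay(ms); });
// if (select === null) { Log.Error("select not found within 30 s"); }
```

The `obj && obj.Exists` check covers both a null result and a placeholder object whose Exists property is false.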

You can always add a delay to the test with TestComplete's
aqUtils.Delay(2000);
as mentioned. But this can also happen when the test runs so fast that it reaches that point before the object is visible, which is why we add a delay to wait for the object to load.
Try putting a breakpoint at the object and checking again after waiting for 10 seconds. If the test passes every time this way, the failures are due to the object loading slowly. Alternatively, use the waiting approach described in
https://support.smartbear.com/testcomplete/docs/app-objects/common-tasks/waiting-process-or-window-activation.html

Does anyone know how to stop T4 (.tt) templates from regenerating every single file? Is there a way to flag certain files?

T4 (.tt) templates regenerate every file whenever you save. Generating files is great, but I am making partial classes to extend other classes, and I only need the files that don't already exist to be generated for me. The ones that exist, I'd like to preserve. So far I have not found a single solid solution by searching...
In my code below, the check for existing files doesn't matter, because the template starts by deleting all the files first and then regenerates them.
Is there a method like "onsave" that I can override?
// BEGIN CODE TO GENERATE EXTENSIONS
<#
foreach (EntityType entity in ItemCollection.GetItems<EntityType>().OrderBy(e => e.Name))
{
    string fileName = entity.Name + ".Extension.cs";
    string filePath = this.Host.TemplateFile.Substring(0, this.Host.TemplateFile.LastIndexOf(@"\"));
    filePath = filePath + @"\Extensions\" + fileName;
    if ((File.Exists(filePath) && PreserveExistingExtensions == false) || !File.Exists(filePath))
    {
        fileManager.StartNewFile(fileName);
        BeginNamespace(namespaceName, code);
        bool entityHasNullableFKs = entity.NavigationProperties.Any(np => np.GetDependentProperties().Any(p => ef.IsNullable(p)));
#>
<#=Accessibility.ForType(entity)#> <#=code.SpaceAfter(code.AbstractOption(entity))#>partial class <#=code.Escape(entity)#><#=code.StringBefore(" : ", code.Escape(entity.BaseType))#>
{
}
<#
        EndNamespace(namespaceName);
    }
}
fileManager.Process();
#>

I do something similar (partial classes) where I have one that is always generated, but the custom one will only be generated if it doesn't exist. This second one is created as starting class for customizations. I'll output two files like so:
MyClass.generated.cs
MyClass.cs
MyClass.cs will never be recreated, unless it doesn't exist. MyClass.generated.cs will always be recreated.
I use T4 Toolbox for this; Oleg Sych has made it quite easy.
You can check out some sample T4 Templates I built here. Specifically have a look at this one, it's a good example for generated partial classes where one needs to be created every time, and one is only created if it doesn't exist.
The main thing to look at is these lines in the code:
var requestBaseMessageCustom = new MessageTemplate(rootNamespace, serviceName + "Request");
requestBaseMessageCustom.Output.File = "Messages/" + serviceName + "Request.cs";
requestBaseMessageCustom.Output.PreserveExistingFile = true;
requestBaseMessageCustom.Render();
Notice the property called PreserveExistingFile, that's the key.
