Merging PDFs skipping corrupted PDFs - ghostscript

Currently I am using Ghostscript to merge a list of downloaded PDFs. The issue is that if any one of the PDFs is corrupted, it stops the merging of the rest of the PDFs.
Is there any option I can use so that it skips the corrupted PDFs and merges the others?
I have also tested with pdftk, but I face the same issue.
Or is there any other command-line PDF merging utility that I can use for this?

You could try MuPDF; you could also try using MuPDF's 'clean' operation (mutool clean) to repair files before you try merging them. However, if a PDF file is so badly corrupted that Ghostscript can't even repair it, that probably won't work either.
There is no facility to ignore PDF files which are so badly corrupted they can't even be repaired. It's hard to see how this could work in the current scheme, since Ghostscript doesn't 'merge' files anyway: it interprets them, creating a brand new PDF file from the sequence of graphic operations. When a file is corrupted badly enough to provoke an error, we abort, because we may already have written whatever parts of the file we could; if we tried to ignore the error and continue, both the interpreter and the output PDF file would be in an indeterminate state.
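A practical workaround is to do the repair step yourself, outside Ghostscript, and drop whatever cannot be repaired. Below is a minimal sketch in Python, assuming mutool and gs are on the PATH and that mutool clean exits with a non-zero status when it cannot salvage a file; the file names are placeholders.

    import subprocess
    from pathlib import Path

    def merge_surviving_pdfs(inputs, merged="merged.pdf", workdir="repaired"):
        """Repair each PDF with 'mutool clean', skip files that cannot be
        repaired, then merge the survivors with Ghostscript."""
        Path(workdir).mkdir(exist_ok=True)
        survivors = []
        for src in inputs:
            dst = Path(workdir) / Path(src).name
            # mutool clean rewrites (and, where possible, repairs) the PDF
            if subprocess.run(["mutool", "clean", str(src), str(dst)]).returncode == 0:
                survivors.append(str(dst))
            else:
                print(f"skipping unrepairable file: {src}")
        if survivors:
            subprocess.run(
                ["gs", "-dBATCH", "-dNOPAUSE", "-q",
                 "-sDEVICE=pdfwrite", f"-sOutputFile={merged}"] + survivors,
                check=True)

    merge_surviving_pdfs(["a.pdf", "b.pdf", "c.pdf"])  # hypothetical inputs

This keeps the merge itself untouched: Ghostscript only ever sees files that mutool was able to rewrite.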

Related

VB.NET file has become an unreadable format

So the files shown in the images below were originally .vb files. I have just opened them and they look like this, and the compiler won't run them. I am unsure whether this is a compiler error or whether they may have become corrupt because the project is stored on an external drive. It is just these two forms that have broken like this; I have one other form and a module in the same project that are okay, but the project can't run because of the two that are broken.
Broken Login Form
Broken Diary Form
If it changes anything, the designer files for the forms are intact; it is just the code for the forms' elements that is broken.
Also, if I can't identify the cause, is there a way in Visual Studio to revert to the last working version to get my code back? Just because I put a lot of time into it.
The data in those files is most likely gone.
IMPORTANT: Do not write anything to that disk drive until you have recovered those files, or have found that you cannot recover them.
If you are using a version control system then you can revert to an earlier version.
If you are using Windows 10 and you happen to have stored those files in a location included in what File History saves, you can recover them from that.
If you use some other form of backup, retrieve the files from that.
If you have a separate disk drive with at least as much free space as the one with the corrupted files, you could try running file recovery software, as it might be that the zeroed-out file was written to a different place on the HDD.
TinTnMn pointed out in a comment that if you previously compiled the code, you should have executable files in the "obj" and "bin" folders that can be decompiled to recover most of your work.
It could also be quicker to re-write the code while it is still fresh in your mind.

Creating a variable zip archive on the fly, estimating file size for content-length

I'm maintaining a site where users can place pictures and other files in a kind of shopping cart. After selecting all the various contents the user wishes to download, he can check out. Until now an archive was generated beforehand, and the user got an email with a link to the file after the generation finished.
I've changed this now by using Web API and a push stream to generate the archive directly on the fly. My code offers either a zip, a zip64 or a .tar.gz dynamically, depending on the estimated file size and operating system. For performance reasons compression is set to best speed ('none' would make the zip archives incompatible with Mac OS, and the gzip library I'm using doesn't offer 'none').
This is working great so far; however, the user no longer gets a progress bar while downloading the file because I'm not setting the Content-Length. What are the best ways to get around this? I've tried to guess the resulting file size, but either the browsers cancel the downloads too early, or they stop at 99.x% and wait for the missing bytes resulting from the difference between the estimated and the actual file size.
My first thought was to always guess the resulting file size a little bit too big and fill the rest with zeros; would that work?
I've seen many file hosts offering the possibility to select files from a folder and put them into a zip file, and all of them show the correct (?) file size along with a progress bar. Any best practices? Thanks!
These are just some thoughts, maybe you can use them :)
With Web API/HTTP, the normal way to go about it is that the response contains the length of the file. Since the response is only received after the call has finished, the time spent generating the file will not show any progress bar in the browser, other than a Windows wait cursor.
What you could do is use a two-step approach:
1. Generating the zip file: create a duplex-like channel using SignalR to give feedback on the file generation.
2. Downloading the zip file: after the file is generated you know the file size, and the browser will show a progress bar while downloading.
It looks like this problem should have been addressed by chunk extensions, but that seems to never have got further than a draft.
So I guess you are stuck with either no progress bar or sending the file size up front.
It seems that generating exact-size zip archives is trickier than just adding zero padding.
Another option might be to pre-generate the zip file without storing it, just to determine the size (see the sketch below).
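If the generation is deterministic (same files, same order, same compression settings), running the generator twice, first into a counting sink and then into the real response, yields an exact Content-Length. A minimal Python sketch of the idea; the site itself is .NET, so this is only an illustration, and the compresslevel argument needs Python 3.7+:

    import zipfile

    class CountingSink:
        """Write-only file object that discards data but counts bytes.
        Omitting seek()/tell() makes zipfile treat it as an unseekable
        stream (supported since Python 3.5), like a real HTTP response."""
        def __init__(self):
            self.size = 0
        def write(self, data):
            self.size += len(data)
            return len(data)
        def flush(self):
            pass

    def dry_run_zip_size(paths, level=1):  # level 1 is roughly "best speed"
        sink = CountingSink()
        with zipfile.ZipFile(sink, "w", zipfile.ZIP_DEFLATED,
                             compresslevel=level) as zf:
            for p in paths:
                zf.write(p)  # reads mtimes too, so reruns are byte-identical
        return sink.size

The trade-off is that every file gets compressed twice, so this only pays off when CPU time is cheap compared to storing the archive.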
But I am just wondering: why not just use tar? It has no compression, so it is easy to determine its final size up front from the sizes of the individual files, and it should also be supported by both OS X and Linux. And Windows should be able to handle uncompressed zip archives, so a similar trick might work there as well.
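The arithmetic for an uncompressed tar is indeed straightforward. A sketch, assuming plain ustar members with names under 100 characters (longer names cost extra header blocks) and the 10240-byte record padding that GNU tar and Python's tarfile both use by default:

    import os

    BLOCK = 512            # tar stores everything in 512-byte blocks
    RECORD = 20 * BLOCK    # archives are padded to a multiple of this

    def exact_tar_size(paths):
        total = 0
        for p in paths:
            total += BLOCK                        # one header block per member
            size = os.path.getsize(p)
            total += -(-size // BLOCK) * BLOCK    # data rounded up to 512
        total += 2 * BLOCK                        # two zero blocks end the archive
        return -(-total // RECORD) * RECORD       # pad out the final record

That number can go into the Content-Length header before the first byte of the archive has been produced.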

How to intentionally break rar archive (make it unreadable)?

I want to intentionally break a rar archive for testing purposes.
I was trying to copy the archive in the middle of the archiving process, but that is impossible due to a read lock (I use Windows 7).
How can I do that?
I think opening it with an editor and deleting some chunks of the binary gibberish should work. However, there would still be trouble with the read lock.
I tested it with a .zip file. After the first delete (the first ~10 lines) it was still readable by 7-Zip; after deleting some more lines it was corrupted, and neither Windows Explorer nor 7-Zip was able to open it.
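The read-lock problem disappears if you let the archiver finish and then damage a copy. A small Python sketch of the same editor approach; overwriting bytes in the middle leaves the RAR signature at the start intact, so tools still recognise the file and then fail on the damaged data (the amount and position of the damage are arbitrary choices):

    import os, random, shutil

    def corrupt_copy(src, dst, n_bytes=64, seed=0):
        """Copy a finished archive, then overwrite a run of bytes in the
        middle with random garbage. Seeded so the damage is reproducible."""
        shutil.copyfile(src, dst)
        rng = random.Random(seed)
        size = os.path.getsize(dst)
        with open(dst, "r+b") as f:
            f.seek(size // 2)
            f.write(bytes(rng.randrange(256) for _ in range(n_bytes)))

    corrupt_copy("good.rar", "broken.rar")  # hypothetical file names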

Skip processing files during Gradle build

I'm in the process of migrating my build system from ANT to Gradle (as ANT/ADT is now no longer supported by Google) and I ran into an issue in one of the test packages. There is a test that works with an empty PNG (as if somebody ran 'touch empty.png') and a corrupted PNG. These PNG files are in our res/drawable-hdpi folder, as they should be. When building, though, Gradle uses libpng to do some sort of processing, and it errors on these two files.
My question is: can I tell Gradle to skip processing on these two files, or is there another way to get around this issue?
BTW, on a whim I tried renaming the files to .xml (the only other allowed format) and they still wouldn't parse.
To give an answer to others who find this question: I used horatius' answer, made the /res/raw directory, and put my corrupted and empty PNG files in there. Gradle didn't try to process them, and they still get indexed by R.java.

Checking files for errors

I have lots of files in different formats (mostly PDF files) and I need to check whether they can be opened without errors and get a list of those that are broken.
Other than opening them all separately, is there a way to find out which won't open / are corrupt?
Not really, no. Because there are so many file types, it would be impossible to know if a file was corrupt without opening it. It might open without errors but still be corrupt, so even that isn't going to help you. You could try a general file-opening solution like KeyView, which can open most file formats. If it fails, chances are the file is corrupt.
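For the PDF majority, though, the "open it and see" check is easy to script. A sketch using the third-party pypdf library (pip install pypdf); forcing every page to parse catches many kinds of corruption, although a file can still parse cleanly and yet render wrongly:

    from pathlib import Path
    from pypdf import PdfReader

    def broken_pdfs(folder):
        broken = []
        for path in Path(folder).rglob("*.pdf"):
            try:
                reader = PdfReader(path)
                for page in reader.pages:  # force each page to be parsed
                    page.extract_text()
            except Exception:              # pypdf raises several error types
                broken.append(path)
        return broken

    print(broken_pdfs("documents"))  # hypothetical folder name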
