I'm doing image cutting, but handwritten content always interferes with me.The handwritten content made me can not separate different rows. Always it makes me
separate the two rows into the same row. So i want to know how can i solve this problem.
Related
Noticed that images sometimes are sliced up in PDFs.
Steps:
insert an image with a high resoultion (3000x1800) into a .docx
use "Microsoft Print to PDF" option of Word to convert to PDF
extracting all images with pdfimages or pymupdf
Result:
Image is sliced horizontally into three images
Questions:
What exactly happens in the in the transition from .docx to pdf (or in generell in the process to pdf) that makes the converter slice it up into three images instead of one?
Do the individuell XObjects of the sliced images contain information which says that these three images belong to originally one?
How do I know how the images are sliced (horizontally / vertically) and what if originally there were two images inserted into the .docx file and both of them are sliced. Can you tell if slice x belongs to original image y or z?
So, as you have found out: because the code which generates the PDF choose to do so.
The technical reasons may be various - it could be that historically there were printers which would only have so much memory, and would need to get limiterd size-images when printing, and someone at some point when writing the PDF export code present in Microsoft Office choose to apply this limit.
Anyway, technically, as put in the comments, an image in a PDF file could be composed of unlimited smaller images collated together.
Now, the second part, and your actual question: to know whether images ibn a PDF file belong together in a single original image one would need a custom extractor tool to check the geometry of all images in the document and find out which images have no margins or boundaries with others - it would not be that hard to do for well behaved files (which we can't know if MS Office generated files are: there are ways to obfuscate image positioning by making it indirectly). The metadata in the image-parts may or may not contain information that would allow one to recompose the original image: it would be up to the code generating the PDF to include this metadata or not - but the geometry can't lie in this case: if the final document presents a single image visually, it is possible to detect that when fetching the images.
I need to take an existing PDF (created with Prawn), and combine pairs after page 1 (the cover) into single pages. I would also like to add a vertical line in the center of the joined pages. The pages are to be printed in books, and the goal is to make single PDF pages that are similar to the side by side view in Acrobat. I know I can convert them to images, do what I need to with ImageMagick, then put them back into a PDF format, but I am trying to minimize the number of conversions so I can save as much quality as possible.
I also realize I can do this from the start with Prawn, but I am trying to avoid that as it would require a very large change to our application.
It is possible to do this with Ghostscript and the pdfwrite device, but its by no means simple. You need to write some PostScript to do the job.
You would need to add BeginPage and EndPage procedures, the BeginPage would need to check the current page number (and you would need to track this yourself). If its page 1, process normally. If its an even page, throw away the current PageSize and replace it with one which covers a pair of pages. Process the even page. Do not transmit the content.
If the page is odd (and not 1) then translate the origin so that its offset to the right by the width of the page. Process the odd page. use moveto, lineto and stroke to draw the required line between the two pages. Transmit the page.
This assumes that all the pages are the same size and orientation, or least that the sizes of each page are known in advance. It would be possible to retrieve those programmatically as well, but more complex.
Its definitely non-trivial, but if you rummage through my answers in the PostScript tags and look for anything with the word 'imposition' you'll probably find program outlines to do the job.
I did a quick look and here's an answer I wrote some time back. It uses a different approach to that outlined above, it copies some of the guts of the PDF interpreter and repurposes them. It does a chunk of what you want though.
I'm currently trying to achieve this:
I have a very large TIFF image, which contains scanned documents. The image contains invoices with barcodes/QR codes, followed by multiple other scanned documents related to the invoice which preceded them. This can be repeated multiple times ( the TIFF image may look like [invoice] + [documents] + [invoice] + [documents] ... )
I need a program (doesn't really matter in which language but I'd prefer either Java, JavaScript, PHP, C++ or Python) that takes said TIFF image, scans all the barcodes and returns their values and their position in the image (either which page it is on or it's absolute position, but the page is preferable, I know for certain that there won't be multiple barcodes on one page). The goal is to split this TIFF image into multiple PDF files, each containing only one invoice and all of the documents that belong to the invoice.
I have the latter part done already. I intend to use ImageMagick to split the TIFF file into multiple files (tested, works). I have also tried multiple barcode scanning methods, but met critical problems at every one. And that's the point of my question:
Is any of my presumptions false? Is there a better way/library/SW that you know about that could work?
Libraries/SW I tried so far:
ZXing port for PHP: Can't work with TIFF files
ZXing github
Quagga for JavaScript: Can't work with TIFF files either.
Quagga github
ZBar code reader: The best looking one by far. I managed to scan multiple QR codes in one TIFF image using CMD (Windows), but didn't find a way to get their positions. Also found out that C++ and Python versions exist, but didn't get to try them out just yet.
Thanks for any ideas/corrections.
The best one I heard -that is subjective ofc- is Barcode Rendering Framework
I'm not sure if it can detect multiple barcodes on a page but it can detect many different types of barcodes.
And it's also Open Source..
I am trying to display text on an SSRS report in a fixed position, regardless of the content above it. The problem is the content above may be one to three lines. As the above content fluctuates, so does the beginning of the next row of content. This happens in two places on the report.
In other words, I need two static starting points for dynamic content. I am printing text onto a pre-designed invoice that doesn't contain any customer data. The invoice is just a template (or shell if you will) and I am to position the data to print into spaces provided on that invoice.
Please see attachments. The attachment with two rows in each field is lining up correctly. I have done this intentionally as a basis. As you can see on the other two attachments, with either one or three rows, the content isn't lining up where it should be due to more or less content than my basis.
I have toyed around with cangrow & canshrink but couldn't get the results as desired.
Appreciate the help!
I think you're going to need to re-work your table to get working the way you want.
The way I would do it would be to have one table that groups each page data. Use this table to format your data - putting Rectangles in the area for your lines of data. Rectangles can have a set height so they fill your area when there isn't enough data.
Then you'll have to add another tables inside each of the rectangles to display each sections data.
See my example below - the RED represents where the Rectangles would go and the BLUE for tables.
I would keep the old table intact and copy and paste it in each rectangle. Then remove all the parts that that section doesn't need and set your rectangle to the correct size.
I need to plot and display several jpeg images in a single combined display (or canvas?). For example, suppose I have images {a,b,c,d}.jpg, each of different size, and I would like to plot them on one page in a 2x2 grid. It would be also nice to be able to set a title for each subplot.
I've been thoroughly looking for a solution, but couldn't find out how to do it, so any ideas would really help. I would preferably use a solution that is based on the EBImage package.
There are two ways how to arrange several plots with base graph functions, namely par(mfrow=c(rows,columns)) (substitute rows and columns with integers) and layout(mat) where mat is a matrix like matrix(c(1,2,3,4)).
For further info see ?par, ?layout, and especially Quick-R: Combining Plots.
However, as your question is about images I don't know if it helps you at all. If not, I am sorry for misinterpreting your question.
To add to Henriks solution, a rather convenient way of using the par() function is:
jpeg(filename="somefile.jpg")
op <- par(mfrow=c(2,2)
#plot the plots you want
par(op)
dev.off()
This way, you put the parameters back to the old state after you ran the code. Be aware of the fact this is NOT true if one of the plots gave an error.
Be aware of the fact that R always put the plots in the same order. Using mfrow fills the grid row by row. If you use mfcol instead of mfrow in the code, you fill up column by column.
Layout is a whole different story. Here you can define in which order the plots have to be placed. So layout(matrix(1:4,nrow=2) does the same as par(mfcol=c(2,2)). But layout(matrix(c(1,4,3,2),ncol=2)) places the first plot lefttop, the next one rightbottom, the third one righttop, and the last one leftbottom.
Every plot is completely independent, so the titles you specify using the option main are printed as well. If you want to have more flexibility, you should take a look at lattice plots.
If you do not want the images in a regular grid (the different sizes could imply this), then you might consider using the subplot function from the TeachingDemos package. The last example in the help page shows using an image as a plotting character, just modify to use your different images and sizes/locations.
The ms.image function (same package) used with my.symbols is another possibility.