How to increase number of displayed rows in Oracle Reports 6i? - oracle

I have a tabular report in Oracle Reports 6i, and when I print it, it leaves a large margin at the bottom and continues printing the remaining rows/records on the next page. The orientation of the printout is portrait. I increased the maximum number of records per page, but when it prints it won't go beyond the number of rows it has been printing, as if the change had no effect.

In the Paper Layout editor, check the margin layout. The thick black line shows the "usable" part of the report; check whether it is stretched across the whole paper size.
Also, check that the paper size is set correctly. For example, for portrait A4 paper, it should be 21 cm wide and 29.7 cm high.
Furthermore, the data is contained within a frame; actually, two of them (the repeating frame and ... well, the one that encloses it; its default name begins with an M). Check their vertical elasticity properties. They should be either Variable or Expand (maybe one or both of them are currently set to Fixed).

Related

DirectWrite renders issues and metric scaling inaccuracy

I have DirectWrite setup to render single glyphs and then shape them programmatically based on the glyph run and font metrics. (Due to my use case, I can't store every full texture in an OpenGL texture otherwise it's essentially a memory leak. So we store each glyph into one texture to lay them out glyph by glyph.)
However, I have two issues:
Inconsistent rendering results.
Scaling metrics leads to inconsistent distances between glyphs.
These results are transferred to a bitmap using Direct2D and a WIC bitmap (CreateWicBitmapRenderTarget).
Let's look at an example, font size 12 with Segoe UI.
The 1st line is the full string rendered using DrawTextLayout, drawn with D2D1_DRAW_TEXT_OPTIONS_ENABLE_COLOR_FONT. The 2nd line is drawn glyph by glyph using DrawGlyphRun with DWRITE_MEASURING_MODE_NATURAL. The 3rd is rendered with paint.net just for reference.
This leads to the second issue, the distance between each letter can be off. I am not sure if this is a symptom of the previous issue. You can see the distance between s and P is now 2 pixels when drawn separately. Because i is no longer 3 pixels wide, it visually looks too close to c now when zoomed out. p and e look too close.
I have checked the metrics, and I am receiving the right metrics from the font from shaping. Above string metrics from DirectWrite : [1088.0, 1204.0, 1071.0, 946.0, 496.0, 1071.0, 869.0]. I am comparing output with Harfbuzz: [S=0+1088|p=1+1204|e=2+1071|c=3+946|i=4+496|e=5+1071|s=6+869] which is correct.
To convert to DIP I am using this formula for the ratio multiplier: (size * dpi) / 72 / metrics.designUnitsPerEm
So with a default DPI of 96 and default size of 12 we get the following ratio: 0.0078125.
Let's look at S, which is 1088. So the advance should be 1088 * 0.0078125 = 8.5. Since we can't draw at half a pixel, which way do we go? I tried every value from the lsb, to the advance, to the render offset, in every combination of flooring, ceiling, rounding, and converting to int. Whichever way I choose, even if it fixes one situation, when I test with another font or another size it will be one or two pixels too close in another string. I just can't seem to find a proper balance that is universal.
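To make the arithmetic concrete, here is a minimal Python sketch of the conversion described above. The value of 2048 design units per em is an assumption: it is the value implied by the 0.0078125 ratio, not something stated in the question. The pen_positions helper is also hypothetical; it shows one common option, which is to keep the pen position unrounded and round only when placing each glyph, so that rounding error does not accumulate. It does not address the anti-aliasing difference discussed in the edits.

```python
import math

def design_units_to_dip_scale(size_pt, dpi, units_per_em):
    """Ratio from the question: (size * dpi) / 72 / designUnitsPerEm."""
    return (size_pt * dpi) / 72 / units_per_em

def pen_positions(advances_du, scale):
    """Accumulate unrounded advances; round (half up) only when placing a glyph."""
    positions = []
    pen = 0.0
    for advance in advances_du:
        positions.append(math.floor(pen + 0.5))
        pen += advance * scale
    return positions

# Advances for "Species" as reported in the question (design units).
advances = [1088.0, 1204.0, 1071.0, 946.0, 496.0, 1071.0, 869.0]
scale = design_units_to_dip_scale(12, 96, 2048)  # 2048 is assumed, see above
dip_advances = [a * scale for a in advances]
positions = pen_positions(advances, scale)

print(scale)         # 0.0078125
print(dip_advances)  # starts with 8.5 for S
print(positions)     # [0, 9, 18, 26, 34, 38, 46]
```

With this approach the distance between two neighbouring glyphs can differ by one pixel from the individually rounded advance, but the total string width stays within half a pixel of the unrounded width.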
I am not really sure where to go from here. Any suggestions are appreciated. Here is the code: https://github.com/pyglet/pyglet/blob/master/pyglet/font/directwrite.py#L1736
EDIT: After a suggestion to call DrawGlyphRun with the full run, the output does match exactly what DrawTextLayout produces. So DrawGlyphRun should be able to produce the same appearance. Here's where it gets interesting:
Line1: DrawTextLayout
Line2: Single glyphs drawn by DrawGlyphRun
Line3: All glyphs drawn using DrawGlyphRun
You can see something interesting. If I render each 'c' by itself (right side), you can see that it has 4 pixels on the left of the character by itself. But in the strings it looks like it's missing. Actually, taking a deeper look, and a color dropper, it appears the color is indeed there, but it's much darker. So somehow each glyph is affecting the blend of the pixels around it. I am not really sure how it's doing this.
EDIT2: After talking with someone else, I think we narrowed this down to anti-aliasing. Applying anti-aliasing to the whole string versus to each character produces a different result. With D2D1_TEXT_ANTIALIAS_MODE_ALIASED set, each character now looks exactly the same in both cases.

Report rdlc - Shrink rectangle inside tablix in group

I have a .rdlc report with grouping (4 levels).
In the last level, I have a pretty complex design of textboxes/images that can't be done with rows/cols. For example, they overlap on some points.
So what I have to do is to put a Rectangle on the cell and then, inside the Rectangle, put all the components.
The problem I have now is that some of these components can be hidden depending on the data, and because of that, sometimes there is a lot of white space inside the report that I don't want.
Is there any way to shrink the Rectangle if it doesn't have any visible data?
Unfortunately, by design Rows and Columns will not shrink below their defined height/width; therefore, a Rectangle can only be as small as its Cell.
However, you could try to make it as small as possible, and rely on the CanGrow property of Textboxes ("Property" window, under "General" tab), as suggested in the link given above.

"Barcode" reading from scanned image

I want to read a barcode from a scanned image that I printed. The image format is not relevant. I found that the scanned images are of very low quality, and I can understand why normal barcodes fail.
My idea is to create a non-standard and very simple barcode at the top of each printed page. It will be 20 squares in a row forming a simple binary code. Filled = 1, open = 0. It will be large enough on an A4 page to make detection easy.
At this stage I need to load the image and find the barcode somewhere at the top. It will not be exactly at the same spot as it is scanned in. Step into each block and build the ID.
Any knowledge or links to info would be awesome.
If you can preset a region of interest that contains the code and nothing else, then detection is pretty easy. Scan a few rays across this region and find the white/black and black/white transitions. Then, knowing where the "cells" should be, you know their polarity.
For this to work, you need to frame your cells with two black ones on both ends to make sure to know where it starts/stops (if the scale is fixed, you can do with just a start cell, but I would not recommend this).
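As a rough illustration of the ray-scanning idea, here is a minimal Python sketch for a single ray. It assumes the ray is a list of grayscale values (0 = black, 255 = white), that the region of interest contains only the code, and that one black framing cell sits at each end; the function name and the threshold of 128 are made up for the example.

```python
def decode_cells(row, n_data_cells, threshold=128):
    """Decode one ray of grayscale pixels into bits (1 = filled, 0 = open).

    Assumes a black framing cell on each side of the data cells.
    Returns None if the ray contains no dark pixels at all.
    """
    dark = [i for i, value in enumerate(row) if value < threshold]
    if not dark:
        return None
    start, end = dark[0], dark[-1] + 1      # outer edges of the two framing cells
    total_cells = n_data_cells + 2          # data cells plus the two frames
    cell_width = (end - start) / total_cells
    bits = []
    for k in range(1, total_cells - 1):     # skip the framing cells
        centre = int(start + (k + 0.5) * cell_width)
        bits.append(1 if row[centre] < threshold else 0)
    return bits

# Synthetic ray: margin, frame, cells 1 0 1 1, frame, margin (4 pixels per cell).
row = [255] * 5 + [0] * 4 + [0] * 4 + [255] * 4 + [0] * 4 + [0] * 4 + [0] * 4 + [255] * 3
bits = decode_cells(row, 4)
print(bits)  # [1, 0, 1, 1]
```

In practice you would decode several rays and take a majority vote per cell, since a single ray can hit a speck of dirt.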
You could have a look at https://github.com/zxing/zxing. I would suggest to use a 1D bar code, but wide enough to match the low resolution of the scanner.
You could also invent your own bar code encoding and try to parse it yourself. Use thick bars for 1 and thin lines for 0. A thick bar would be, for instance, 2 white pixels followed by 4 black pixels. A thin line would be 2 white pixels, 2 black pixels and 2 white pixels. The last two pixels encode the bit value.
The pixel should be the size of the scanned image pixel.
You then process the image scan line by scan line, trying to locate the bar code.
We locate the bar code by comparing a given pixel value sequence with a pattern. This is performed by computing a score function. The sum of squared difference is a good pick. When computing the score we ignore the two pixels encoding the bit value.
When the score is below a threshold, we have found a matching pattern. It is good to add parity bits to the encoded value so that its validity can be checked.
Computing a sum of squared differences on a sliding window can be optimized.
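Here is a small, unoptimized Python sketch of the matching step, assuming the 6-pixel symbols described above (2 white, 2 black, then 2 pixels carrying the bit) and grayscale values from 0 to 255. The function names and the threshold are made up for the example; the two bit-carrying pixels are left out of the score as suggested.

```python
def ssd(window, pattern, ignore=()):
    """Sum of squared differences, skipping the indices listed in ignore."""
    return sum(
        (w - p) ** 2
        for i, (w, p) in enumerate(zip(window, pattern))
        if i not in ignore
    )

def find_symbols(scanline, pattern, threshold, ignore=()):
    """Return the start offsets where the pattern scores below the threshold."""
    n = len(pattern)
    return [
        x for x in range(len(scanline) - n + 1)
        if ssd(scanline[x:x + n], pattern, ignore) < threshold
    ]

# 2 white + 2 black, then two bit pixels that the score ignores.
pattern = [255, 255, 0, 0, 0, 0]
bit_pixels = (4, 5)

# Synthetic scan line: margin, a thick bar (1), a thin line (0), margin.
scanline = [255] * 3 + [255, 255, 0, 0, 0, 0] + [255, 255, 0, 0, 255, 255] + [255] * 3
offsets = find_symbols(scanline, pattern, threshold=1000, ignore=bit_pixels)
decoded = [1 if scanline[x + 4] < 128 else 0 for x in offsets]
print(offsets)  # [3, 9]
print(decoded)  # [1, 0]
```

A real scan will not be pixel-exact, so the threshold has to be tuned on sample images.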

XSLT create complex SVG visualisation minimizing line crossings etc

This is no actual single coding problem, rather a problem of the right approach to a complex issue.
So, I have built a rather complex svg visualisation of my XML data using xslt. It looks like this:
(source: erksst.de)
This is just a small sample of the whole data. There are two or three rows. Each row could contain up to 160 yellow boxes.
(The yellow boxes are letter collections, the blue/grey boxes single letters, the lines represent their way of dissemination.)
It works well so far but I want to optimize it:
(1) minimize the number of line crossing
(2) minimize the number of lines crossing a blue/grey box
(3) minimize the lines being too near to another line.
To achieve this there are things to vary:
(a) The broadest row (in the sample it is the third) is fixed. It can't be moved. But the other (two) can be moved within the range of the width of the broadest row. I.e. in my example the yellow box of the second row could be moved some 160 pixels to the right.
(b) Furthermore, in the two smaller rows the margin between the yellow boxes could be varied. In my example there is just one per line. But of course there could be more than one yellow box in the two smaller rows.
(c) The order of the yellow boxes within a row could be altered.
So, there are many possibilities for realizing this visualisation.
The problem is the performance time.
I have started with the line crossing problem by using a function which kind of pre-builds the visualisation and calculates the number of crossings.
The variation with the smallest number of crossings is actually built in the output.
The problem is the time it needs. The transformation with just 100 possibilities and my whole XML data took 90 seconds. That doesn't sound like much, but 100 variations are just a very small part of all theoretically possible options, and the visualisation should at some point in the future be built on the fly on a server for a user's selection of the data, so 90 seconds is simply way too much.
I have already reduced the visualisation template used by the line-crossing function to what is strictly necessary, leaving aside all captions and so on. That did help, but not as much as expected.
The lines are drawn as follows: First, all boxes are drawn keeping their id from the original data. Then I go back to my data, look where connections are and build the lines.
You could transform your XML into the DOT language (plain TXT format) by XSLT and process it by GraphViz. I solved some similar issue (although not so huge as yours seems to be) this way.

Detecting empty pages in scanned documents

So we need to detect whether an image, created by a scanner, represents an empty page. I'm way out of my depth when it comes to image processing, so I have to run this by the community.
Here's what I have come up with so far:
Empty pages can be glaringly white, gray recycled paper, or yellowed old paper. The current idea is to create a histogram for a page, look for a steep increase of the curve, and get the percentage of pixels that are darker than that. If that exceeds a threshold, the page is likely not empty.
Since this would likely classify a page containing a single line of text at the top as empty, we would tile the page and gather statistics about each tile.
We would need to detect scanned staplers and holes from binding (likely only in certain tiles), but this can be put off to some later stage. However, if you have an idea what to look out for besides these two, please mention it in a comment.
This needs to be fast. It's part of a document processing workflow that processes (tens of) thousands of pages per day. If processing a page takes ten seconds longer, then our customers will have to tell their customers that they'll have to wait several days longer for their results. (If this results in more false positives, some customers would rather have someone check a few dozen found "empty" pages than have their customer wait one more day.)
So here's my questions:
Is it a good idea to take this route or is there something better?
If we do it this way, how would I do this? What's a good (cheap) algorithm for finding a threshold for a page? Could we gain significant speed by assuming a similar threshold for a batch of documents? To which precision could brightness values be rounded, before getting logged? What quirks could we expect?
If you know that a scanned page is going to fill the image entirely, then calculating the standard deviation might be a good way of doing this.
I would suggest blurring the page slightly to reduce some noise. Then calculate the SD for the page. In theory, a page that is more or less all one colour will have a low SD and one with lots of text will have a higher SD. Then it's a case of 'training' the system to work out when a page is plain and when it has text. You might find that certain pages are hard for it to tell.
You could have it trained by having it process a vast number of pages, and it goes through them all, and you say if it is plain or not.
EDIT
OK, a white page with black text, if we have just the page and no surrounding stuff, will have a mean colour of grey, probably a fairly light grey. Getting the average is a for loop through all the pixels, adding their values and then dividing by the number of pixels. I'm not good with this O(log N) stuff, but suffice to say, it will not take that long. Unless you have HUGE images.
SD is a second for loop; this time we are adding up how different each pixel is from the mean, and then dividing by the number of pixels. (Strictly speaking, that gives the mean absolute deviation rather than the standard deviation, which squares the differences and takes a square root at the end, but either works as a measure of spread here.) This will take a bit longer than the mean, as we have to do something like
    diff = thispixel - mean;
    if (diff < 0) {
        diff = -diff;    /* absolute difference from the mean */
    }
    runningTotal += diff;
For a plain coloured page, each pixel will be close to the mean value, thus our SD will be low. If the SD is below a certain value, we can assume that this means the page is all one colour.
This might have problems if there is only a very small amount of text, as it will not have a large influence on the SD, so maybe, as you suggested in the question, break the page into sections. I suggest horizontal strips, as text tends to run this way. If we process these strips one at a time, then once one strip suggests it has text, we can stop, as we don't care whether the rest is blank or not.
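The strip idea could look roughly like this in Python. This is only a sketch: it assumes the image is a list of rows of grayscale values, and both the strip height and the SD threshold are placeholder numbers that would have to come from measuring known blank pages.

```python
from statistics import pstdev

def has_content(image, strip_height, sd_threshold):
    """Return True as soon as one horizontal strip has an SD above the threshold."""
    for top in range(0, len(image), strip_height):
        strip = image[top:top + strip_height]
        pixels = [value for row in strip for value in row]
        if pstdev(pixels) > sd_threshold:
            return True   # early exit: no need to look at the remaining strips
    return False

# A uniformly light page, and the same page with one "line of text" in row 5.
blank_page = [[250] * 20 for _ in range(12)]
text_page = [row[:] for row in blank_page]
text_page[5] = [0, 250] * 10

print(has_content(blank_page, strip_height=4, sd_threshold=10.0))  # False
print(has_content(text_page, strip_height=4, sd_threshold=10.0))   # True
```

Because each strip is judged on its own, a single line of text is not diluted by the rest of the page.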
Blurring the page will help reduce noise, as the odd pixel of noise will be reduced in its impact, thus give you a 'tighter' SD. You could also use it to reduce the resolution of your image.
Say your source image is 300 wide by 900 high; you could sample pixels in blocks of nine, 3 × 3, and thus end up with an image that is 100 wide by 300 high. So it can actually be used to reduce the amount of calculation you need to do, in this case to a ninth!
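A minimal Python sketch of that block sampling, assuming the image is a list of rows of grayscale integers. Averaging each 3 × 3 block blurs and shrinks in one pass; the function name is made up, and any rows or columns left over at the edges are simply dropped.

```python
def downsample(image, block=3):
    """Average each block x block tile into a single pixel."""
    height = len(image) // block * block
    width = len(image[0]) // block * block
    small = []
    for y in range(0, height, block):
        row = []
        for x in range(0, width, block):
            total = sum(
                image[y + dy][x + dx]
                for dy in range(block)
                for dx in range(block)
            )
            row.append(total // (block * block))
        small.append(row)
    return small

# A 300 wide by 900 high test image becomes 100 wide by 300 high.
image = [[(x + y) % 256 for x in range(300)] for y in range(900)]
small = downsample(image)
print(len(small[0]), len(small))  # 100 300
```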
The main problem is going to be in working out how high an SD can be with just a plain page. Maybe have it find the SD of a load of blank pages.
By the sounds of it, you are probably going to want to have a middle ground that lets it be unsure and ask for human intervention, possibly letting the human value train the system to get better?
Perform some sort of simple edge detection. If the number of pixels constituting edges is below some threshold, then there's going to be a high probability the page is empty. This could be improved by classifying certain edges that correspond with high certainty (by shape and location) to punched holes and staples as trivial and discounting them from the metric.
When I worked for a document processor (~8 years ago), we handled client projects varying from very "clean" only-US-letter-sized pages to cover-/cardstock of irregular shapes mixed with normal pages. Operators fed pre-sorted files into scanning machines and only had to watch for folded corners and similar mechanical problems. Their output was multiple streams of hundreds of images corresponding to a range of files. A single scanner operator could easily scan 15k pieces of paper in a shift (that's only 0.60 pages/sec, while a scanner at speed could handle 3 pages/sec and still scan both sides). Later operators processed those looking for key pages to mark file start and end. (Image recognition can be used here, sometimes, but people also provide a quality check on the first operators.) We had many variables that could be set per client project.
I'm basing the rough outline below on that experience and how it appears that your goals and workflow are similar.
(Terminology: By client I mean our client, e.g. a specific bank. A project or client project is a set of documents from that client that contains many files, e.g. all mortgages handled by a specific branch in a given year. A file is a logical arrangement that would normally be a physical file folder for one of the client's customers, e.g. all mortgage papers for one address.)
Cut off the top, bottom, sides, and corners. Throw these out of your calculations (even though you'll probably store them in the final image). This will cover staple holes, binder holes, but also (tiny) folded corners and very minute torn edges which appear as black spots. Depending on how you're scanning, the last two may be less of a problem.
Vary the sizes of these cuts for each client project, as required. For example, even a very thin edge slice, say 1-2mm, will eliminate most ragged edges without increasing false positive rate.
Convert to black and white, 1 bit per pixel. I suspect you are already doing this for some client projects anyway, so doing this efficiently and effectively, which can be subtle, should be no extra work. (Even if you don't store the 1bpp image as the deliverable result, the conversion will be helpful in empty page detection.) Eliminate noise by dropping any black pixels with none or only one black neighbor (using all surrounding 8 as neighbors).
After cutting extremities (#1) and this simplistic noise reduction, blank pages will have a very low number of black pixels; most blanks will have no black pixels at all – excepting exceptionally poor page quality, inked stamps (when scanning back-sides, mentioned more below), or other circumstances across the whole project, and so forth.
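The noise reduction described above (drop any black pixel with zero or one black neighbor among the surrounding 8) can be sketched in Python as follows. This assumes the 1bpp page is held as a list of rows with 1 for black and 0 for white; real code would work on packed bits, and the function names are made up.

```python
def despeckle(bitmap):
    """Return a copy with black pixels removed if they have fewer than 2 black neighbors."""
    height, width = len(bitmap), len(bitmap[0])
    cleaned = [row[:] for row in bitmap]
    for y in range(height):
        for x in range(width):
            if not bitmap[y][x]:
                continue
            # Count black pixels among the up to 8 surrounding positions.
            neighbors = sum(
                bitmap[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbors <= 1:
                cleaned[y][x] = 0
    return cleaned

def black_count(bitmap):
    """Number of black pixels left on the page."""
    return sum(sum(row) for row in bitmap)

noisy = [
    [1, 0, 0, 0, 0, 0],   # isolated speck
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],   # 2x2 blob survives
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],   # pair: each pixel has only one neighbor
]
cleaned = despeckle(noisy)
print(black_count(noisy), black_count(cleaned))  # 7 4
```

Neighbors are counted on the original bitmap rather than on the partly cleaned copy, so the result does not depend on scan order.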
Depending on client project, you may set hotspots to be watched – the converse of cutting off the sides. For example, watching a 1" strip where a single line at the top of the page would appear may reduce false positives. A low contrast scan or faded hardcopy (perhaps even pencil, which can be common on back-sides) with only one line of text will be distinguished from a blank page this way.
What sections are worth watching depends on each project, but do not try to divide the page up into tiles and then subdivide those tiles into areas of interest. Instead, parallelize this on the page level; e.g. 1 worker per core, each worker handles a full page at a time.
Depending on how you're keying individual files, you may find it helpful to drop blanks (before marking start-of-file pages, which is still often a manual process even at high volume) then watch for blank pages at unexpected points after files have been keyed (e.g. expected would be the last page of the file, without being two blanks in a row, etc.).
For example, if a particular project is only scanning one side of each page, then detecting two blank pages in a row is a good indication that a couple pages in a file were flipped upside-down (clients often hand over hardcopy files like this). Either the sorters (who remove things like staples and paperclips) or the first machine operators should have caught this mistake, but, regardless, it will now need a manual check to verify.
On the other hand, there were projects that had very clean files so sorters could insert (usually colored) blank pages marking file boundaries. In this case, the second set of people still did the keying by file number, but only had to examine the first page of each file. This wasn't rare, but not common either.
Before I start rambling a bit, I hope my main point comes across: you have to decide how to mitigate rates of false positives (= data loss) and false negatives (= annoying blanks and otherwise harmless, but a maximum allowed rate may still be specified in the project contract). That varies drastically by project and the type of files/documents you're handling, but it guides you in how to do the detection. You will get much better results from a tailored approach than trying one-size-fits-all, even if the tailored approaches are 80-98% similar.
If you're delivering 1bpp images to the client, for example, you might not even want/need to eliminate blanks as filesize (and ultimately size of the delivered dataset) won't be an issue. This can be an acceptable trade-off when eliminating most blanks is harder while maintaining a low false positive rate; such as for files with inked stamps ("received on", "accepted", "due date", etc.; they bleed through to the back) or other problems, for example.
My fall class does a bunch of image-processing projects.
Here's what I would try:
Project from color to grayscale
Pour all the pixels into a simple histogram with say 100 buckets between 0 and 1
Find a local minimum in the histogram such that the absolute value of above - below is as small as possible, where above is the number of brighter pixels and below is the number of darker pixels
Force the above pixels to white and the below pixels to black
If you like, as an extra step you could remove black edges
If there are hardly any black pixels, the page is blank
The first two steps should be combined, and they are the only time-consuming steps; on a 600 dpi image you may have to touch many millions of pixels. The rest will be lightning fast. I'd be very surprised if you can't classify multiple images per second, especially if you know there will be no black edges.
The only part that requires training or experiment is the last step. It's also possible that you will need to fiddle around with the number of buckets in the histogram; if there are too many buckets, you may have a bad local minimum.
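Here is one possible reading of the histogram steps as a Python sketch. It assumes the pixels are already grayscale values between 0 and 1, skips the optional black-edge removal, and treats a bucket as a local minimum when it is no larger than either neighbor; that interpretation and the function names are assumptions, not the answer's exact method.

```python
def histogram(pixels, buckets=100):
    """Count grayscale values in [0, 1] per bucket."""
    counts = [0] * buckets
    for value in pixels:
        counts[min(int(value * buckets), buckets - 1)] += 1
    return counts

def pick_threshold_bucket(counts):
    """Pick the local-minimum bucket where |above - below| is smallest."""
    total = sum(counts)
    best, best_gap = None, None
    below = counts[0]                      # pixels darker than bucket 1
    for i in range(1, len(counts) - 1):
        above = total - below - counts[i]  # pixels brighter than bucket i
        if counts[i] <= counts[i - 1] and counts[i] <= counts[i + 1]:
            gap = abs(above - below)
            if best_gap is None or gap < best_gap:
                best, best_gap = i, gap
        below += counts[i]
    return best

def black_fraction(pixels, buckets=100):
    """Fraction of pixels that fall below the chosen threshold bucket."""
    counts = histogram(pixels, buckets)
    threshold = pick_threshold_bucket(counts)
    if threshold is None:
        return 0.0
    return sum(counts[:threshold]) / len(pixels)

# Synthetic pages: 5% dark ink on light paper, and a page with no ink at all.
inked = [0.105] * 50 + [0.905] * 950
blank = [0.905] * 1000
print(black_fraction(inked))  # 0.05
print(black_fraction(blank))  # 0.0
```

The last step, deciding how small the black fraction must be for a page to count as blank, is the part that needs experiment on real scans.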
Good luck, and report back to us how you make out!
Check out this line detection algorithm: http://homepages.inf.ed.ac.uk/rbf/HIPR2/linedet.htm. In addition to a detailed explanation of how the algo works there's a demo where you can use your own image and see the results. I tried two images: 1) a B&W scan of a receipt, 2) the B&W, "blank" back side of that same receipt. All of the edge detection algorithms I tried found edges on the "blank" page. But, this line detection algorithm was the only algorithm that correctly found lines on the front page and yet didn't find anything on the "blank" back page.
It looks as if you're trying to convert all paperwork for a company into digital documents. Some of this paper can be really old.
Say your text is black, and any other color is the background. If you take two weighted averages, one consisting of what you think is the text, and one consisting of the background, you can compare those two and see if they're distant enough to warrant further evaluation. This will remove the effect of any uneven aging of the paper.
Staple holes and punched holes in paper are pretty standard in size, but they'd show up as gray or not at all if you're scanning on a white background. If not, then you can guess where these are and remove them.
Now, we look at areas of high interest, areas where the black pixels are the most dense. Select a portion of that and OCR it. Place the starting top-left closest to an area where text begins. On a typical document, a solid blank linear area going left-to-right and another going top-to-bottom denotes the top and left sides of a paragraph. You can be sure that you got a line of text because below a line of text is another blank left-to-right area. So you don't need to worry about selecting a portion that will chop text in half.
You could take the mean gray level (integer) of each few rows of the scanned image (depending on the resolution and how many lines are required to capture one line of text), then consider the spread of row means. If there is no text on the page, the spread of means should be small (i.e. background ranges from 250-255), and if there is text on the whole page or on part of the page, the spread would be much larger (i.e. 15 for text to 250 for background).
Seems to me like the solution should be computationally simple due to the large number of pages to check. Approaches requiring further processing (edge detection, filtering, etc) seem like overkill, and will take much longer to run.
There is no need to process pixel by pixel; using matrices will make this more efficient. For example, using NumPy you can calculate means, sums, etc. for entire rows, columns or matrices at once. There is also no need to process EVERY pixel; a good sample of rows should be able to accomplish the task with similar accuracy. 8-bit accuracy should be fine, and you could even resample to larger pixels before running this processing algorithm.
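A minimal sketch of the row-mean spread in plain Python, kept dependency-free so it runs anywhere; with NumPy the same thing is a reshape plus a mean over an axis. The band height, the row sampling step, and the function name are assumptions for the example, and the cut-off for "small spread" would need tuning.

```python
from statistics import mean

def band_mean_spread(image, band_height, row_step=1):
    """Spread (max - min) of the mean gray level of each band of rows.

    row_step > 1 samples only every row_step-th row inside a band.
    """
    means = []
    for top in range(0, len(image), band_height):
        band = image[top:top + band_height:row_step]
        means.append(mean(value for row in band for value in row))
    return max(means) - min(means)

# Background around 252; one band of dark "text" rows at gray level 15.
blank_page = [[252] * 10 for _ in range(12)]
text_page = [[252] * 10] * 4 + [[15] * 10] * 4 + [[252] * 10] * 4

print(band_mean_spread(blank_page, band_height=4))               # 0
print(band_mean_spread(text_page, band_height=4))                # 237
print(band_mean_spread(text_page, band_height=4, row_step=2))    # 237
```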
You can do a noisy trim, i.e. blur the image and do an auto-trim (without actually modifying the image). If the width or height of the trim result is below a threshold (e.g. 80 to 100 for a 600 dpi image) then the page is empty.
A proof of concept using the ImageMagick command line front-end:
$ convert scan.png -shave 300x0 -virtual-pixel White -blur 0x15 -fuzz 15% \
-trim info:
The above command assumes a 600 dpi DIN A4 black and white (1 bit) image. It also ignores a margin of 300 pixels on the left and right (-shave 300x0) so that artifacts like perforation holes don't yield false negatives.
