How to calculate the "distance" between two rectangles? - algorithm

Given one rectangle and a bunch of images (also rectangles), I need to find the best image to place in it. That would be the one that requires the least stretching or shrinking and that covers the area best. In other words, I want to find the one with the least distance (as in, least transformation) to the target rectangle. The images are screenshots of websites, so they contain a mix of text and images. The screenshots suffer whether they are stretched (pixelation) or shrunk (text becomes unreadable).
It also feels like one of those problems that someone might have looked into already, so there might be an algorithm that properly solves it.
The data is stored in a SQL database so I would need the analysis to be doable in SQL. The data might look like this:
---------------------------------------------------------
| Id | Width | Height |
---------------------------------------------------------
| 00b701c6-1c31-4323-a292-700b4dff2e45 | 784 | 1310 |
| 0a46a0f6-a3b2-4a5d-a8be-55bad84ba37d | 1414 | 957 |
| 0b79fbe8-6b9e-48d1-89da-8981570e23d7 | 784 | 561 |
| 0e9f5935-0e58-42d2-bba2-3e89db55260f | 400 | 400 |
| 0ebf14fb-094b-47f5-9e25-b4f54bc2eab9 | 2260 | 957 |
| 17131cd6-f5b2-4e4d-a63b-b909e04e2d89 | 1414 | 957 |
| 2298fc73-0bcb-49c8-b54e-3184cf4153d4 | 784 | 1310 |
| 28ffee4a-2d08-4862-aeb0-6546cda4e225 | 2560 | 1387 |
| 29cf92ad-b6fd-43c6-abb1-7c5a7e4af92d | 2260 | 957 |
| 307b2b6e-1f66-4784-bd7d-b6bfc4768fbd | 2560 | 1387 |
| 3edc916b-4b3d-4fd8-a1f9-6418a4d8d27a | 2333 | 435 |
| 3ef1132a-d059-487a-9cad-dbb3895ad25a | 1414 | 957 |
| 43e044e5-5f82-4b86-95ba-a9e76f5d2519 | 657 | 435 |
| 464be0ec-5cb7-4f3f-856d-6beb5fbc2f5e | 657 | 435 |
| 510d0236-e61a-4f1c-bb0b-754c4c1f80f7 | 2260 | 957 |
| 52f217d5-038c-475d-af96-89d1930e8c2f | 657 | 435 |
| 532cadf5-c20b-4b1c-84d4-78e1b501495f | 2333 | 435 |
| 5f3e55aa-12a4-4502-a159-fdc128b53e11 | 2260 | 957 |
| 626c33a9-aaa0-47b6-a6f3-bd5235f1655b | 784 | 561 |
| 6711a717-e1ee-4930-9f21-5e225a99a769 | 657 | 435 |
| 7125c301-c311-4339-b36c-519dc3714c68 | 784 | 561 |
| 8f5d8e3b-8213-4cd6-8ea0-311297f4cfc3 | 2333 | 435 |
| c3d7661f-12e6-4297-8830-15e82850bc32 | 784 | 1310 |
| cd32106e-2f3e-4614-ac40-19e3f5d7fa1f | 784 | 561 |
| d7191194-1f8a-4230-8ee0-8a8b427b86e7 | 784 | 1310 |
| d737de66-849d-4ec3-bf3b-cc48bfa1f3a6 | 2560 | 1387 |
| d935e10b-88f3-4aba-a2b4-a1a9cfd8acb4 | 2560 | 1387 |
| dcc8e9e6-4ee3-4737-a530-d2fcffd35a86 | 2333 | 435 |
| ec3187be-5a81-4ecb-a908-ddedaa5930ec | 1414 | 957 |
---------------------------------------------------------

You can compute the Jaccard index as follows:
function jaccard(rect : Rectangle, img : Rectangle) : float
    rectArea := rect.width * rect.height
    imgArea := img.width * img.height
    interArea := min(rect.width, img.width) * min(rect.height, img.height)
    return interArea / (rectArea + imgArea - interArea)
end
Then choose the highest scoring image (values go from zero to one).
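
A minimal Python sketch of the same computation, assuming both rectangles are aligned at a common corner (which is what the min() intersection above implies). The names `Rect` and `best_match` and the 800x600 target are introduced here for illustration only; the image sizes are taken from the table in the question.

```python
from typing import NamedTuple, Sequence


class Rect(NamedTuple):
    width: int
    height: int


def jaccard(rect: Rect, img: Rect) -> float:
    """Intersection over union of two rectangles that share one corner."""
    rect_area = rect.width * rect.height
    img_area = img.width * img.height
    inter_area = min(rect.width, img.width) * min(rect.height, img.height)
    return inter_area / (rect_area + img_area - inter_area)


def best_match(target: Rect, images: Sequence[Rect]) -> Rect:
    """Return the image with the highest Jaccard index against the target."""
    return max(images, key=lambda img: jaccard(target, img))


# A few distinct sizes from the question's table.
images = [Rect(784, 1310), Rect(1414, 957), Rect(784, 561), Rect(400, 400), Rect(657, 435)]
target = Rect(800, 600)  # hypothetical target rectangle

print(best_match(target, images))
```

The expression uses only multiplication, division and a two-argument minimum, so it should be possible to port it to a SQL expression (for example with CASE expressions for the minimums) and sort by it in descending order.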

I don't have a complete algorithm, but my approach would be to score each image based on how well it matches the rectangle. The interesting parameter would be the aspect ratio (width/height), so calculate the ratio for each image and compare it to the ratio of the rectangle. The nearest ratio wins.
As for the second problem, I'd probably set a threshold: if the ratio of the best fit is really close to that of the rectangle (the difference is below the threshold), you can get away with stretching, which looks better than two very thin borders. If it's above the threshold, add black borders, since distorted text is hideous.
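
The ratio comparison and the threshold can be sketched roughly as follows. Comparing ratios through the absolute logarithm of their quotient is an assumption made here so that "too wide" and "too tall" are penalized symmetrically; the function names, the 0.05 threshold and the sample targets are hypothetical.

```python
import math


def ratio_distance(target_w, target_h, img_w, img_h) -> float:
    """Absolute log of the aspect-ratio quotient; 0 means identical ratios."""
    return abs(math.log((img_w / img_h) / (target_w / target_h)))


def choose(target, images, threshold=0.05):
    """Pick the image with the nearest ratio and decide how to place it."""
    best = min(images, key=lambda img: ratio_distance(*target, *img))
    # Below the threshold a slight stretch is tolerable; otherwise add borders.
    mode = "stretch" if ratio_distance(*target, *best) <= threshold else "letterbox"
    return best, mode


# (width, height) pairs taken from the question's table.
images = [(784, 1310), (1414, 957), (784, 561), (400, 400), (657, 435)]

print(choose((800, 600), images))
print(choose((1000, 500), images))
```

Note that this score ignores absolute size, so in practice it would probably need a tie-breaker (for example the area-based score) when several images share the same ratio.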

Related

filter and parse unstructured log with logstash

I'm pretty new to Logstash and I have a problem.
I want to filter out one line of a log that is different from the others and then do some manipulation with grok.
This is the log file that i have:
Date: 3/1/2021 -- 05:08:14 (uptime: 2d, 22h 36m 18s)
------------------------------------------------------------------------------------
Counter | TM Name | Value
------------------------------------------------------------------------------------
capture.kernel_packets | Total | 433066
capture.kernel_drops | Total | 18183
decoder.pkts | Total | 414883
decoder.bytes | Total | 453509832
decoder.ipv4 | Total | 413834
decoder.ipv6 | Total | 778
decoder.ethernet | Total | 414883
decoder.tcp | Total | 409208
decoder.udp | Total | 5266
decoder.icmpv6 | Total | 56
decoder.avg_pkt_size | Total | 1093
decoder.max_pkt_size | Total | 1514
flow.tcp | Total | 1273
flow.udp | Total | 1336
flow.icmpv6 | Total | 26
flow.wrk.spare_sync_avg | Total | 100
flow.wrk.spare_sync | Total | 18
decoder.event.ipv4.opt_pad_required | Total | 82
decoder.event.ipv6.zero_len_padn | Total | 24
flow.wrk.flows_evicted_needs_work | Total | 535
flow.wrk.flows_evicted_pkt_inject | Total | 579
flow.wrk.flows_evicted | Total | 406
flow.wrk.flows_injected | Total | 535
tcp.sessions | Total | 667
tcp.syn | Total | 669
tcp.synack | Total | 668
tcp.rst | Total | 407
tcp.stream_depth_reached | Total | 14
tcp.reassembly_gap | Total | 8
tcp.overlap | Total | 27
detect.alert | Total | 1106
app_layer.flow.http | Total | 41
app_layer.tx.http | Total | 126
app_layer.flow.tls | Total | 611
app_layer.flow.ntp | Total | 15
app_layer.tx.ntp | Total | 15
app_layer.flow.dhcp | Total | 4
app_layer.tx.dhcp | Total | 6
app_layer.flow.dns_udp | Total | 964
app_layer.tx.dns_udp | Total | 1934
app_layer.flow.failed_udp | Total | 353
flow.mgr.full_hash_pass | Total | 35
flow.spare | Total | 9856
flow.mgr.rows_maxlen | Total | 2
flow.mgr.flows_checked | Total | 3998
flow.mgr.flows_notimeout | Total | 1808
The block repeats, starting from the date line. I need only the date string and nothing more, and then I want to do some manipulation on the data to send it as JSON. Is there a way to do so?
First, you must handle the multiline input. I assume your input is a log file (you can change this as you like; the approach is the same).
input {
  file {
    path => ["inputlogs/*"]
    codec => multiline {
      pattern => "^Date: %{DATE:date} -- %{TIME:time}"
      negate => true
      what => previous
    }
  }
}
After this, you must filter each line (I guess the csv filter would be more useful in this case, but we can handle this with grok too, as you asked):
filter {
  grok {
    match => ["message","^%{NOTSPACE:fieldname}\s*\| %{WORD:field}\s*\| %{INT:value}"]
  }
}
All lines that don't match this pattern (the header and the dashed separator lines) are tagged with _grokparsefailure.
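
To make the intent of the two patterns concrete outside Logstash, here is a rough Python sketch of the same idea: start a new record at every "Date:" line and attach the counter lines that follow to it. The regular expressions are hand-written approximations, not the exact definitions of grok's DATE, TIME, NOTSPACE, WORD and INT patterns, and `parse_blocks` is a name invented for this example.

```python
import json
import re

# Approximations of the grok patterns used above (assumed, not grok's own).
DATE_RE = re.compile(r"^Date: (\d{1,2}/\d{1,2}/\d{4}) -- (\d{2}:\d{2}:\d{2})")
COUNTER_RE = re.compile(r"^(\S+)\s*\|\s*(\w+)\s*\|\s*(\d+)\s*$")


def parse_blocks(lines):
    """Group counter lines under the most recent 'Date:' line."""
    blocks = []
    for line in lines:
        date_match = DATE_RE.match(line)
        if date_match:
            blocks.append(
                {"date": date_match.group(1), "time": date_match.group(2), "counters": {}}
            )
            continue
        counter_match = COUNTER_RE.match(line)
        # Header and dashed lines match neither pattern and are skipped.
        if counter_match and blocks:
            name, _tm_name, value = counter_match.groups()
            blocks[-1]["counters"][name] = int(value)
    return blocks


sample = """Date: 3/1/2021 -- 05:08:14 (uptime: 2d, 22h 36m 18s)
------------------------------------------------------------------------------------
Counter | TM Name | Value
------------------------------------------------------------------------------------
capture.kernel_packets | Total | 433066
capture.kernel_drops | Total | 18183
""".splitlines()

print(json.dumps(parse_blocks(sample), indent=2))
```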

How to compare two circuits based on their utilization

I have some hardware IPs that I need to synthesize. And the IP contains several generic parameters I can play with. Each combination of parameters gives me a different utilization report after synthesis and implementation.
So, for example, for two different configurations, Design_1 and Design_2, I get the following in Vivado 2018.1. The third line is the ratio of the values of Design_2 divided by the values of Design_1.
As you can see in this simple example, Design_2 uses fewer Slice LUTs but slightly more F7 Muxes.
My question is how to draw conclusions about the cost of each one. Should I prioritize Slice LUTs or Registers, etc.?

| Name | Slice LUTs | Slice Registers | F7 Muxes | F8 Muxes | Slice | LUT as Logic | LUT as Memory | LUT Flip Flop Pairs | Block RAM Tile | DSPs | Bonded IOB | Bonded IPADs | PHY_CONTROL | PHASER_REF | OUT_FIFO | IN_FIFO | IDELAYCTRL | IBUFDS | PHASER_OUT/PHASER_OUT_PHY | PHASER_IN/PHASER_IN_PHY | IDELAYE2/IDELAYE2_FINEDELAY | ILOGIC | OLOGIC | BUFGCTRL | BUFIO | MMCME2_ADV | PLLE2_ADV | BUFMRCE | BUFHCE | BUFR | BSCANE2 | CAPTUREE2 | DNA_PORT | EFUSE_USR | FRAME_ECCE2 | ICAPE2 | PCIE_2_1 | STARTUPE2 | XADC |

| Design_1 | 34124 | 16913 | 1453 | 91 | 10272 | 31538 | 2586 | 9020 | 37 | 11 | 125 | 0 | 1 | 1 | 4 | 2 | 1 | 0 | 4 | 2 | 16 | 16 | 46 | 10 | 0 | 2 | 2 | 0 | 2 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Design_2 | 34097 | 16913 | 1550 | 91 | 10189 | 31511 | 2586 | 9021 | 37 | 11 | 125 | 0 | 1 | 1 | 4 | 2 | 1 | 0 | 4 | 2 | 16 | 16 | 46 | 10 | 0 | 2 | 2 | 0 | 2 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| -------- | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| (2)/(1) | 0.999208768022506 | 1 | 1.06675843083276 | 1 | 0.991919781931464 | 0.999143889910584 | 1 | 1.00011086474501 | 1 | 1 | 1 | #DIV/0! | 1 | 1 | 1 | 1 | 1 | #DIV/0! | 1 | 1 | 1 | 1 | 1 | 1 | #DIV/0! | 1 | 1 | #DIV/0! | 1 | #DIV/0! | 1 | #DIV/0! | #DIV/0! | #DIV/0! | #DIV/0! | #DIV/0! | #DIV/0! | #DIV/0! | #DIV/0! |

It depends on your needs. LUTs and F7 Muxes are different physical cells in your FPGA, so even if you don't use them, they will still be there.
If one resource is more critical than the other, you should try to minimize the utilization of the critical resource to simplify place and route.
If nothing is critical, I think it is better to use F7 Muxes first, because Slice LUTs are more flexible for the rest of your design.
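
As a small aid for this kind of comparison, the sketch below lists only the resources whose counts actually differ between two reports and skips resources that are unused in both, which avoids the #DIV/0! entries of the spreadsheet row. It does not decide which resource is critical; that still depends on the target device and the rest of the design. The function name is hypothetical and only a subset of the columns from the table is shown.

```python
def relative_changes(design_a, design_b):
    """Return {resource: count_b / count_a} for resources whose counts differ."""
    changes = {}
    for name, a in design_a.items():
        b = design_b[name]
        if a == b:
            continue  # unchanged, including resources unused in both designs
        changes[name] = b / a if a else float("inf")
    return changes


# Subset of the utilization table from the question.
design_1 = {"Slice LUTs": 34124, "Slice Registers": 16913, "F7 Muxes": 1453,
            "F8 Muxes": 91, "Slice": 10272, "BUFIO": 0}
design_2 = {"Slice LUTs": 34097, "Slice Registers": 16913, "F7 Muxes": 1550,
            "F8 Muxes": 91, "Slice": 10189, "BUFIO": 0}

for name, ratio in relative_changes(design_1, design_2).items():
    print(f"{name}: {ratio:.4f}")
```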

Buffers used by WINDOW SORT operation

I have a query with the following execution plan:
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 21741 |00:00:11.38 | 150K| 1088 | | | |
| 1 | SORT AGGREGATE | | 46072 | 1 | 46072 |00:00:02.92 | 138K| 241 | | | |
| 2 | FIRST ROW | | 46072 | 1 | 3761 |00:00:02.83 | 138K| 241 | | | |
|* 3 | INDEX RANGE SCAN (MIN/MAX) | VERP_VIG_VEHICLE_STAGES_N2 | 46072 | 1 | 3761 |00:00:02.79 | 138K| 241 | | | |
|* 4 | HASH JOIN RIGHT OUTER | | 1 | 37010 | 21741 |00:00:11.38 | 150K| 1088 | 3272K| 1218K| 3302K (0)|
| 5 | VIEW | | 1 | 7402 | 23548 |00:00:11.17 | 147K| 1088 | | | |
| 6 | WINDOW SORT | | 1 | 7402 | 23548 |00:00:10.82 | 79621 | 1088 | 4801K| 915K| 4267K (0)|
|* 7 | HASH JOIN RIGHT OUTER | | 1 | 7402 | 23548 |00:00:07.84 | 8837 | 847 | 1599K| 1599K| 996K (0)|
| 8 | TABLE ACCESS FULL | VERP_OTM_PS_CONTROL_TABLE | 1 | 5 | 5 |00:00:00.01 | 39 | 0 | | | |
|* 9 | FILTER | | 1 | | 23548 |00:00:07.80 | 8798 | 847 | | | |
|* 10 | HASH JOIN RIGHT OUTER | | 1 | 7402 | 71904 |00:00:07.76 | 8798 | 847 | 1421K| 1421K| 1756K (0)|
| 11 | VIEW | | 1 | 4534 | 4554 |00:00:00.01 | 27 | 0 | | | |
|* 12 | HASH JOIN | | 1 | 4534 | 4554 |00:00:00.01 | 27 | 0 | 1888K| 1888K| 1596K (0)|
| 13 | INDEX FULL SCAN | VERP_VPS_SUPPLY_VVP_N1 | 1 | 27 | 27 |00:00:00.01 | 1 | 0 | | | |
| 14 | INDEX FULL SCAN | VERP_VPS_SUPPLY_VVVP_N1 | 1 | 4534 | 4554 |00:00:00.01 | 26 | 0 | | | |
|* 15 | HASH JOIN | | 1 | 37010 | 71904 |00:00:07.67 | 8771 | 847 | 1245K| 1245K| 1722K (0)|
| 16 | SORT UNIQUE | | 1 | 37010 | 1586 |00:00:00.05 | 3279 | 0 | 124K| 124K| 110K (0)|
| 17 | TABLE ACCESS FULL | VERP_OTM_STAGED_VONS | 1 | 37010 | 21741 |00:00:00.02 | 3279 | 0 | | | |
| 18 | TABLE ACCESS BY INDEX ROWID BATCHED| VERP_VIG_VEHICLES | 1 | 246K| 36104 |00:00:07.53 | 5492 | 847 | | | |
|* 19 | INDEX RANGE SCAN | VERP_VIG_VEHICLES_N22 | 1 | 246K| 36104 |00:00:07.38 | 891 | 838 | | | |
| 20 | VIEW | | 1 | 37010 | 21741 |00:00:00.12 | 3279 | 0 | | | |
| 21 | WINDOW SORT | | 1 | 37010 | 21741 |00:00:00.11 | 3279 | 0 | 1612K| 624K| 1432K (0)|
| 22 | TABLE ACCESS FULL | VERP_OTM_STAGED_VONS | 1 | 37010 | 21741 |00:00:00.03 | 3279 | 0 | | | |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("S"."VIN"=:B1 AND "S"."STAGE_CODE"='YARD_RECEIPT')
4 - access("VINS"."VIN_SEQUENCE"="VONS"."VON_SEQUENCE" AND "VINS"."PORT_CODE"="VONS"."PORT_CODE" AND "VINS"."INT_COLOR_CODE"="VONS"."INT_COLOR_CODE" AND
"VINS"."EXT_COLOR_CODE"="VONS"."EXT_COLOR_CODE" AND "VINS"."SPEC_CODE"="VONS"."SPEC_CODE" AND "VINS"."OPTION_CODE"="VONS"."OPTION_CODE" AND
"VINS"."MODEL_CODE"="VONS"."MODEL_CODE")
7 - access("C"."PORT"=CASE "VVV"."VEHICLE_SOURCE" WHEN 'SIA' THEN '020' ELSE "from$_subquery$_006"."PORT_CODE" END )
9 - filter("VPT"."PORT"=CASE "VVV"."VEHICLE_SOURCE" WHEN 'SIA' THEN '020' ELSE "from$_subquery$_006"."PORT_CODE" END )
10 - access("VVVP"."VESSEL_PORT_ID"="VVV"."VESSEL_PORT_ID")
12 - access("VVP"."PORT_ID"="VVVP"."PORT_ID")
15 - access("VVV"."SOA_MODEL_CODE"="VPT"."MODEL_CODE" AND "VVV"."SOA_OPTION_CODE"="VPT"."OPTION_CODE" AND "VVV"."SOA_SPEC_CODE"="VPT"."SPEC_CODE" AND
"VVV"."SOA_EXT_COLOR_CODE"="VPT"."EXT_COLOR_CODE" AND "VVV"."SOA_INT_COLOR_CODE"="VPT"."INT_COLOR_CODE")
19 - access("VVV"."PS_STATUS"='NOT_MATCHED')
I am interested to know why the WINDOW SORT operation in step #6 is requiring so many buffer gets. I usually don't see that sort of thing for a WINDOW SORT operation. For example see the operation in step 21 of the same plan -- no additional buffer gets.
Does anyone know what these buffer gets are? I suspect that maybe the sort operation is spilling to disk, due to its size, and the extra buffer gets are to access those temp tablespace blocks. I'd like confirmation or alternate explanations, as appropriate. Thanks.
UPDATE
To hopefully clarify: I want to know why step 6 required added buffer gets beyond what was required to get through step 7. I.e., why it is not like the buffer gets in step 21, which did not increase the number from what was necessary to get through step 22.

pdfbox does not respect white space

Hi, I am trying to create a PDF with PDFBox 2.0.
I need to print a simple loan amortization table like this:
| 18 | 933.80 | 807.49 | 126.31 | 6,082.49 | 2017-04-12 |
| 19 | 933.80 | 822.29 | 111.51 | 5,260.20 | 2017-05-12 |
| 20 | 933.80 | 837.36 | 96.43 | 4,422.83 | 2017-06-12 |
| 21 | 933.80 | 852.72 | 81.08 | 3,570.11 | 2017-07-12 |
Example code:
cos.beginText();
cos.setFont(fontPlain, 12);
cos.newLineAtOffset(98, rect.getHeight() - spaceBetweenLines * (++line));
cos.showText("| 24 | 933.80 | 900.48 | 33.32 | 916.99 | 2017-10-12 |");
cos.endText();
cos.beginText();
cos.setFont(fontPlain, 12);
cos.newLineAtOffset(98, rect.getHeight() - spaceBetweenLines * (++line));
cos.showText("| 25 | 933.80 | 916.99 | 16.81 | 0.00 | 2017-11-12 |");
cos.endText();
But the resulting PDF file removes some of the white space.
I don't know how to fix this.
That is because the space character has a different width than the digits. As you can see from the image, the columns start being off as soon as the number of digits in a column differs.
You can get the width of a string with PDFont.getStringWidth(String text) (don't include the spaces) and move the position within a line using newLineAtOffset, which you are already doing. The width is returned in 1/1000 of text-space units, so multiply it by fontSize / 1000 to get the width at your font size.
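
The arithmetic behind this can be sketched independently of PDFBox. In the Python sketch below, the glyph widths are illustrative values in 1/1000 units chosen to resemble a proportional font; they are an assumption, and a real program would ask the font for the width instead. `string_width` and `right_aligned_x` are hypothetical helper names.

```python
# Illustrative glyph widths in 1/1000 text-space units (assumed values;
# a real program would query the font instead of using this table).
GLYPH_WIDTHS = {" ": 278, ",": 278, ".": 278, "-": 333}
DIGIT_WIDTH = 556


def string_width(text: str) -> int:
    """Width of `text` in 1/1000 units, treating unknown glyphs as digits."""
    return sum(GLYPH_WIDTHS.get(ch, DIGIT_WIDTH) for ch in text)


def right_aligned_x(text: str, right_edge: float, font_size: float) -> float:
    """x position at which `text` must start so that it ends at `right_edge`."""
    return right_edge - string_width(text) / 1000 * font_size


# Each cell gets its own x offset, so no padding spaces are needed.
for cell in ["933.80", "16.81", "0.00"]:
    print(cell, right_aligned_x(cell, right_edge=200, font_size=12))
```

Each cell would then be drawn as its own text run at the computed offset, instead of relying on runs of spaces inside one long string.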

Ruby on Rails: Rake: rake stats didn't add my field to the correct value?

Before my rake stats modification
+----------------------+-------+-------+---------+---------+-----+-------+
| Name | Lines | LOC | Classes | Methods | M/C | LOC/M |
+----------------------+-------+-------+---------+---------+-----+-------+
| Controllers | 5037 | 3936 | 31 | 292 | 9 | 11 |
| Helpers | 150 | 128 | 0 | 17 | 0 | 5 |
| Models | 1523 | 1166 | 42 | 123 | 2 | 7 |
| Libraries | 633 | 415 | 4 | 65 | 16 | 4 |
| Functional tests | 289 | 228 | 13 | 0 | 0 | 0 |
| Unit tests | 560 | 389 | 30 | 0 | 0 | 0 |
| Model specs | 1085 | 904 | 0 | 3 | 0 | 299 |
| View specs | 88 | 75 | 0 | 0 | 0 | 0 |
| Controller specs | 468 | 388 | 0 | 2 | 0 | 192 |
+----------------------+-------+-------+---------+---------+-----+-------+
| Total | 9833 | 7629 | 120 | 502 | 4 | 13 |
+----------------------+-------+-------+---------+---------+-----+-------+
Code LOC: 5645 Test LOC: 1984 Code to Test Ratio: 1:0.4
Now, when I add:
#Factories
::STATS_DIRECTORIES << %w(Factories\ specs test/factories) if File.exist?('test/factories')
::CodeStatistics::TEST_TYPES << "Factory specs" if File.exist?('test/factories')
around line 120, it should increase test LOC, right?
+----------------------+-------+-------+---------+---------+-----+-------+
| Controllers | 5037 | 3936 | 31 | 292 | 9 | 11 |
| Helpers | 150 | 128 | 0 | 17 | 0 | 5 |
| Models | 1523 | 1166 | 42 | 123 | 2 | 7 |
| Libraries | 633 | 415 | 4 | 65 | 16 | 4 |
| Functional tests | 289 | 228 | 13 | 0 | 0 | 0 |
| Unit tests | 560 | 389 | 30 | 0 | 0 | 0 |
| Model specs | 1085 | 904 | 0 | 3 | 0 | 299 |
| View specs | 88 | 75 | 0 | 0 | 0 | 0 |
| Controller specs | 468 | 388 | 0 | 2 | 0 | 192 |
| Factories specs | 144 | 119 | 0 | 0 | 0 | 0 |
+----------------------+-------+-------+---------+---------+-----+-------+
| Total | 9977 | 7748 | 120 | 502 | 4 | 13 |
+----------------------+-------+-------+---------+---------+-----+-------+
Code LOC: 5764 Test LOC: 1984 Code to Test Ratio: 1:0.3
Instead of adding the 144 lines from factories to Test LOC, it adds them to Code LOC =\
How do I get the line count to be in Test LOC?
You're adding something called "Factories specs" (plural) to the STATS_DIRECTORIES array, but you call it "Factory specs" (singular) when you add it to the TEST_TYPES array. So when rake stats hits your test/factories folder, it looks for "Factories specs" in TEST_TYPES, doesn't find it, and assumes it's code, not tests. You need to call it the same thing in both places:
::STATS_DIRECTORIES << %w(Factory\ specs test/factories) if File.exist?('test/factories')
::CodeStatistics::TEST_TYPES << "Factory specs" if File.exist?('test/factories')
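
The effect of the name mismatch can be reproduced with the LOC numbers from the tables above. The Python sketch below is a simplified model of the classification (a row counts as test code only if its name appears in the list of test types), not the actual Rails implementation; the contents of `test_types` are assumed from the row names in the question.

```python
def split_loc(stats, test_types):
    """Return (code_loc, test_loc) by looking each row name up in test_types."""
    code = sum(loc for name, loc in stats if name not in test_types)
    test = sum(loc for name, loc in stats if name in test_types)
    return code, test


# (name, LOC) rows from the first table in the question.
stats = [
    ("Controllers", 3936), ("Helpers", 128), ("Models", 1166), ("Libraries", 415),
    ("Functional tests", 228), ("Unit tests", 389), ("Model specs", 904),
    ("View specs", 75), ("Controller specs", 388),
]
test_types = ["Functional tests", "Unit tests", "Model specs", "View specs",
              "Controller specs", "Factory specs"]

mismatched = stats + [("Factories specs", 119)]  # plural name, not in test_types
matched = stats + [("Factory specs", 119)]       # same name in both places

print(split_loc(mismatched, test_types))  # (5764, 1984)
print(split_loc(matched, test_types))     # (5645, 2103)
```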
