How can I teach the NLP Splitter - stanford-nlp

Please give me directions:
How can I "teach" the splitter to split such paragraph:
The paper is 7 cm. length. What is the painter name? the size of the picture is 5 cm. x 8 cm.
into 3 parts,
and not into 5 parts as it does by default:
1) The paper is 7 cm.
2) length.
3) What is the painter name?
4) the size of the picture is 5 cm.
5) x 8 cm.
Thanks, Aryeh.

The tokenizer is entirely rule-based, so you can add custom abbreviations to it. You will have to edit PTBLexer.flex and recompile it with JFlex.
See also "stanford corenlp, splitting sentences, abbreviation exceptions".

Related

How to make tesseract recognize the number 0 with an open top-right corner, as found on EU plates?

I have this character image:
This image of 0 is similar to the ones found on some EU car plates, for example this image:
How can I make tesseract (if possible) recognise the 0 in the image as 0, in Python or any other language? Currently, on its own, it recognises it as u. Of course, if I crop the 0 and the 2 together it recognises them as digits (0 2), since there is nothing else there, but I want to recognise every single character, because a general plate can have any number of characters in any order. Do I need to retrain tesseract? Thank you.
I used the trained font (tesseract) from this git repository:
https://github.com/openalpr/openalpr/tree/master/runtime_data/ocr/tessdata
They have trained data for various car plates and you can use any of them for your plate OCR.
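A minimal sketch of pointing tesseract at one of those traineddata files from Python via pytesseract; the tessdata path and the language code "leu" are assumptions here, so use whatever file you actually downloaded from that repository:

```python
import pytesseract
from PIL import Image

# Directory containing the downloaded *.traineddata file and the language
# code of that file -- both are placeholders; use the file you fetched
# from the openalpr repository.
TESSDATA_DIR = "/path/to/openalpr/tessdata"
LANG = "leu"  # hypothetical name for the EU plate model

img = Image.open("single_character.png")

# --psm 10 tells tesseract to treat the image as a single character,
# which helps when you feed it cropped characters one at a time.
config = f'--tessdata-dir "{TESSDATA_DIR}" --psm 10'
text = pytesseract.image_to_string(img, lang=LANG, config=config)
print(text.strip())
```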

How do these matrices work?

I am reading an article about pattern recognition. I do not understand where the 8 columns come from and how the output is generated.
I tried to grasp the concept, but I am not getting how the first matrix has 8 columns, or how its output is calculated.
The network of figure 1 is trained to recognise the patterns T and H. The associated patterns are all black and all white respectively, as shown below. If we represent black squares with 0 and white squares with 1, then the truth tables for the 3 neurones after generalisation are:
Each table represents one line of your image.
Each column (Xij) of your table represents one possible combination of those pixels in one line (of your input image), and OUT indicates whether that combination evaluates to true or false.
There are 8 columns because there are 8 ways of combining 3 values of 1 and 0 (2 to the power of 3).
I think it's easier if you read those tables vertically (transposed).
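To see where the 8 columns come from, you can enumerate every combination of three binary pixels. A tiny sketch; the example output rule (all three pixels white) is only an illustration, not the article's actual neuron:

```python
from itertools import product

# All combinations of 3 pixels that can each be 0 (black) or 1 (white):
# 2 ** 3 = 8 columns in the truth table.
combinations = list(product([0, 1], repeat=3))
print(len(combinations))  # 8

# Example "neuron": fires (1) only when all three pixels are white.
# This is just an illustration; the article's neurons have their own rules.
for x1, x2, x3 in combinations:
    out = 1 if (x1 and x2 and x3) else 0
    print(f"X1={x1} X2={x2} X3={x3} -> OUT={out}")
```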

ORB Feature Descriptor Official Paper Explanation

I was just reading the official ORB paper by Ethan Rublee, and I find the section "4.3 Learning Good Binary Features" somewhat hard to understand.
I was searching the Internet to dig deeper into it and found the paragraph below, but I still haven't found a practical explanation of it. Can any of you explain this to me in simple terms?
"Given a local image patch in size of m × m, and suppose the local window
(i.e., the box filter used in BRIEF) used for intensity test is of size r × r , there are N = (m − r )2 such local windows.
Each two of them can define an intensity test, so we have C2N bit features. In the original implementation of ORB, m is set to 31, generating 228,150 binary tests. After removing tests that overlap, we finally have a set of 205,590 candidate bit features. Based on a training set, ORB selects at most 256 bits according to Greedy algorithm."
What I am getting from the official paper and from the above paragraph is this:
We have a patch of size 31x31 and select a window of size 5x5. We will have N = (31-5)^2 = 676 possible sub-windows. I am not getting the lines which are marked in bold. What does it mean that by removing tests that overlap, we get 205,590 bit features?
Imagine a small 31x31 image (the patch) and a small 5x5 window. In how many different positions can this window be placed inside the image? If you slide it 1 pixel at a time, it can be placed in (31-5)^2 = 676 different positions, right? Combining the central pixels of those 676 windows two at a time, you get 676!/(2!*(676-2)!) = 228,150 combinations. For the ORB descriptor they were not interested in sliding the window 1 pixel at a time, because nearby windows overlap so much that the tests would be very noisy. So they removed overlapping windows by sliding 5 pixels at a time and used the windows' central pixels to create the binary tests, which reduced the total number of combinations to 205,590.
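A quick sanity check on those numbers (plain combinatorics, nothing ORB-specific; a sketch in Python):

```python
from math import comb

m, r = 31, 5                  # patch size and test-window size
n_windows = (m - r) ** 2      # positions when sliding 1 pixel at a time
n_pairs = comb(n_windows, 2)  # every pair of windows defines one binary test

print(n_windows)  # 676
print(n_pairs)    # 228150
```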

Difference Chart using Dimple.js

Is there an easy way to create a difference chart using dimple? I'm looking to create something similar to this example: http://bl.ocks.org/mbostock/3894205/.
Thanks
I know this was asked 2 years ago, maybe someone is still interested :-)
A way to do this with dimple is to transform your 2 series (say A and B) into 3 series:
a series that draws the lower line: its values are min(A,B)
a series that draws the upper line when A > B and fills with green: its values are max(A-B,0)
a series that draws the upper line when A < B and fills with red: its values are max(B-A,0)
Then you stack all 3 and use dimple.plot.area for the last 2 to have the fill-in effect.
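Dimple itself is JavaScript, but the reshaping of the two input series is plain arithmetic. A small sketch of that step in Python/pandas (the column names A and B are assumptions), just to make the min/max transformation concrete:

```python
import pandas as pd

# Two example series; replace with your own data.
df = pd.DataFrame({
    "x": range(6),
    "A": [3, 5, 2, 8, 6, 4],
    "B": [4, 3, 3, 5, 7, 4],
})

# Lower line: the smaller of the two values at each x.
df["lower"] = df[["A", "B"]].min(axis=1)
# Green band: how far A rises above B (0 where it doesn't).
df["above"] = (df["A"] - df["B"]).clip(lower=0)
# Red band: how far B rises above A (0 where it doesn't).
df["below"] = (df["B"] - df["A"]).clip(lower=0)

print(df)
```

Stacking lower + above and lower + below then reproduces the two filled regions of the Bostock example.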
We can make a working example if you provide your code and data.

How do ASCII art image conversion algorithms work? [closed]

There are some nice free "image to ASCII art" conversion sites like this one: ASCII-art.org
How does such an image conversion algorithm work?
,
. W ,
W W #
W ,W W
, W, :W* .W .
# WW #WW WW #
W WW.WWW WW: W
W. WW*WWW# WW# W
* :WW.WWWWWWW#WWW#W #
+* #WW#WWWWWWWWWWWWW# W
W# #WWWWWWWWWWWWWWWWW W
WW WWWWWWWWWWWWWWWWWW W
WW WWWWWWWWWWWWWWWWWW#W#
,WW.WWWWWWWWWWWWWWWWWWWWW
WW#WWWWWWWWWWWWWWWWWWWWW
: WWWWWWWWWWWWWWWWWWWWWWWW :
# WWWWWWWW#WWWWWWW##WWWWWW.
W*WWWWWW::::#WWW:::::#WWWWW
WWWWWW#:: :+*:. ::#WWWW
WWWWW#:*:.:: .,.:.:WWWW
#WWWW#:.:::. .:: #:#WWW
:WWW#:#. :: :WWWW:#WWWW
WWW#*:W#*#W . W:#WWW
#WWWW:# :: :: *WWWW
W#WW*W .::,.::::,:+ ##WW#,
WWWW## ,,.: .:::.: . .WWW:,
#WWW#: W..::::: #. :WWWW
WWWW:: *..:. ::.,. :WWWW
WWWW:: :.:.: : :: ,#WW#
WWWW: .:, : ,, :WW,
.: # : , : *
W + ., ::: ., : #
W :: .: W
#,,,W:. ,, ::*#*:, . :#W.,,#
+.....*: : : .#WWWWW: : .#:....+,
#...:::*:,, : :WWWWWWW, , *::::..,#
:...::::::W:, #W::::*W. :W:::::...#
###########W#####W######W#####W##########:
The big-picture-level concept is simple:
1) Each printable character can be assigned an approximate gray-scale value; the number sign # is obviously visually darker than the plus sign +, for example. The effect will vary depending on the font and spacing actually used.
2) Based on the proportions of the chosen font, group the input image into rectangular pixel blocks of constant width and height (e.g. a rectangle 4 pixels wide and 5 pixels high). Each such block will become one character in the output. (Using the pixel blocks just mentioned, a 240w-x-320h image would become 64 lines of 60 characters.)
3) Compute the average gray-scale value of each pixel block.
4) For each pixel block, select a character whose gray-scale value (from step 1) is a good approximation of the pixel block average (from step 3).
That's the simplest form of the exercise. A more sophisticated version will also take the actual shapes of the characters into account when breaking ties among candidates for a pixel block. For example, a "slash" (/) would be a better choice than a "backward slash" (\) for a pixel block that appears to have a bottom-left-to-upper-right contrast feature.
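A minimal sketch of steps 1-4, assuming Pillow is available; the 4x5 block size and the character ramp are arbitrary choices for illustration, not what any particular site uses:

```python
from PIL import Image

# Characters ordered roughly from light to dark; the exact ramp is a matter of taste.
RAMP = " .:-=+*#%@"
BLOCK_W, BLOCK_H = 4, 5  # block proportions roughly matching a monospace glyph

img = Image.open("input.png").convert("L")  # grayscale
w, h = img.size
pixels = img.load()

lines = []
for y in range(0, h - BLOCK_H + 1, BLOCK_H):
    row = []
    for x in range(0, w - BLOCK_W + 1, BLOCK_W):
        # Average gray value of the block (0 = black, 255 = white).
        total = sum(pixels[x + dx, y + dy]
                    for dy in range(BLOCK_H) for dx in range(BLOCK_W))
        avg = total / (BLOCK_W * BLOCK_H)
        # Dark blocks map to dense characters, light blocks to sparse ones.
        idx = int((255 - avg) / 255 * (len(RAMP) - 1))
        row.append(RAMP[idx])
    lines.append("".join(row))

print("\n".join(lines))
```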
aalib (last release in 2001) is an open source ASCII art library that's used in applications like mplayer. You may want to check out its source code to see how it does it. Other than that, this page describes in more detail how such algorithms work.
You can also take a look at libcaca (latest release 2014), which according to their website has the following improvements over aalib:
Unicode support
2048 available colours (some devices can only handle 16)
dithering of colour images
advanced text canvas operations (blitting, rotations)
I found this CodeProject article written by Daniel Fisher containing a simple C# implementation of an image-to-ASCII-art conversion algorithm.
These are the steps the program/library performs:
Load the image stream into a bitmap object
Grayscale the bitmap using a Graphics object
Loop through the image's pixels (because we don't want one ASCII character per pixel, we take one per 10 x 5 block)
To let every pixel influence the resulting ASCII character, we loop over them and calculate the average brightness of the current 10 x 5 block.
Finally, append a different ASCII character for the current block based on the calculated amount.
Quite easy, isn't it?
BTW: In the comments to the article I found this cool AJAX implementation: Gaia Ajax ASCII Art Generator:
[...] I felt compelled to demonstrate it could easily be done in a standardized set of web technologies. I set out to see if I could find some libraries to use, and I found Sau Fan Lee's codeproject article about his ASCIIfying .NET library.
P.S.: Lucas (see comments) found another CodeProject article.
