Tool to recognize text in an image, edit it, and create a new image with the edited text

Hi, are there any tools that do the following steps?
Recognize text in an image
Edit the text
Create a new image with the new text

In the general case, this is very difficult (especially for a picture).
You need a good OCR.
The OCR needs to be able to recognize the exact font.
If the background isn't a solid color, the OCR may be thrown off and fail to extract the exact characters, in which case they cannot be erased cleanly either.
Even when extraction works, any background behind the erased characters has to be reconstructed.
Then the editor needs to paint the new text with the same rendering attributes (size, italics, outline, and so on), which may be a difficult task.
Because of this complexity, the best option is often to do it by hand.

Related

Difference between text as image and graphics as image

The question may seem odd, but I am seeing an interesting result when I compare text stored as an image with graphics stored as an image.
I am looking for a tool or algorithm that compares two PDFs and generates output highlighting the differences between them.
Some of our PDFs contain text as images (legacy text on paper that was converted to PDF).
We are migrating those legacy PDFs, and at the end we compare the legacy PDF with the converted output.
I am evaluating a couple of tools for comparing two PDFs, such as Adobe Acrobat Pro DC, i-net PDFC, and Power PDF.
During the evaluation I can see that graphic images on either side are compared (though not accurately), whereas text images are ignored completely. The result is the same in all the tools.
I am more interested in text as image, since we deal mostly with legacy text PDFs.
Attached below is the graphic image comparison result, where the tool was able to capture the differences between the images.
But when I compare text images, the differences are not highlighted in the tool.
What I understand from this is that text images are not compared as graphics, and the tools skip the comparison entirely. I would like to clarify whether my assumption is correct.
Secondly, I would like to know how to compare text images in PDFs to generate the differences.
I work for the company that develops i-net PDFC, so I'll answer your first question as well:
Your assumption is correct. i-net PDFC is able to compare images and shapes, but it cannot detect when content has completely changed its meaning, e.g. a line shape that is used to draw a letter or, in your case, an image that has to be recognized as text. Recognizing ASCII art as an image won't work for the same reason. Such cases will always be detected as differences even though their visual appearance is similar.
On your second question: Using an OCR conversion tool for one or both documents is a common solution to this problem. A simple image comparison of the compared pages is unlikely to work due to the different font styles and line wrapping in the converted file.
Please note that most OCR applications will use the rendered page images for the recognition. This may lead to incorrect recognition results even if there are no images in the PDF file.
i-net Software is aware of this general issue and an OCR module is currently in development. It'll provide an option to apply the recognition solely to the images in the PDF files.
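Once both documents have been through OCR, the comparison reduces to a text diff. The following is a minimal sketch in Python, assuming the OCR step has already produced plain text for each page; the sample strings and the `diff_ocr_text` name are made up for illustration and are not part of any of the tools mentioned. The text is split into words first so that different line wrapping in the converted file is not reported as a difference.

```python
import difflib


def diff_ocr_text(legacy, converted):
    """Return word-level differences between two OCR text dumps."""
    # Splitting on whitespace makes the comparison ignore line wrapping.
    a, b = legacy.split(), converted.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes


# Made-up OCR output for the same page in the legacy and converted PDF.
legacy_page = "Invoice total:\n1,250 EUR due on\n1 March"
converted_page = "Invoice total: 1,250 EUR\ndue on 7 March"

for tag, old, new in diff_ocr_text(legacy_page, converted_page):
    print(tag, repr(old), "->", repr(new))
```

OCR errors will show up as differences too, so in practice the reported changes still need a manual review.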

Putting text on an image in Actionscript 3

I may be missing something obvious. I have large background images. I want to put some text on top of them. Currently the background image hides the text, I want the text on top of the background image.
I did some searches and found plenty of information about including small images in text. That's not what I want to do. There is no relation between the image and the text. addChildAt() does not help. addChild() is called for the background image before it is called for the text.
This is generally easy to do in other languages, which leads me to think I'm missing something. What is it?
Added in response to question: The background image and text are coming from different objects, which is why addChildAt does not work.

Implementing Emoticon in windows store chat

I am developing a Windows Store chat app.
In this app, I am using a TextBox to receive message content from the user. I want to implement emoticons (smileys) such that typing a code shows the corresponding image inline with the text.
For example, for :), I want to have a 'smile' image.
What you'll need to do is use a RichTextBlock to display your text. This lets you add an InlineUIContainer where necessary.
So, your process will be:
Accept text in a regular text box
Parse the text into a series of Inlines (Run, InlineUIContainer, etc)
Create a new Paragraph for the message
Add the Inlines to the Paragraph.
Add the Paragraph to your RichTextBlock's Blocks property (a BlockCollection).
For each piece of text:
Split the text, likely using Regex, searching for the keys which trigger an Image (':)', '(heart)', etc).
For each non-image text, create a Run with the Text set to the text of the split
For each Image, create an InlineUIContainer and an Image. Set the Image source to the proper Image path, then set the Child of the InlineUIContainer to the Image.
Add the Run or InlineUIContainer to the Paragraph via Paragraph.Inlines.Add(inline).
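The splitting step above is independent of the UI framework, so here is a rough sketch of it in Python rather than C#. The emoticon codes, the image paths, and the `parse_message` name are assumptions made for the example. Each ("run", ...) tuple would become a Run and each ("image", ...) tuple an InlineUIContainer wrapping an Image.

```python
import re

# Hypothetical mapping from emoticon codes to image paths.
EMOTICONS = {
    ":)": "Assets/smile.png",
    "(heart)": "Assets/heart.png",
}

# Longest codes first, so a short code never shadows a longer one that
# starts with it. The capturing group makes re.split keep the matched codes.
_codes = sorted(EMOTICONS, key=len, reverse=True)
PATTERN = re.compile("(" + "|".join(re.escape(code) for code in _codes) + ")")


def parse_message(text):
    """Split a message into ("run", text) and ("image", path) parts."""
    parts = []
    for piece in PATTERN.split(text):
        if not piece:
            continue  # re.split yields empty strings next to matches
        if piece in EMOTICONS:
            parts.append(("image", EMOTICONS[piece]))
        else:
            parts.append(("run", piece))
    return parts


print(parse_message("Hi :) see you (heart)"))
```

Escaping each code with re.escape matters here because characters such as ")" and "(" are regex metacharacters.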
Certain icons may be included in the Segoe UI Symbol Font Family. If this is the case, you may choose to not use an Image for that symbol, and instead use a Run with the FontFamily set to Segoe UI Symbol. You can play around with the FontSize if you want them to be more prominent.
Hope this helps and happy coding!

Add text layer to PDF of scanned handwritten notes in OSX

In class I like to take handwritten notes. Afterwards I scan them and then type them up (it helps me remember them and also makes them easily searchable). The main issue I have is that I use A LOT of drawings and complex math; converting the math formulas into LaTeX (or Word) is very time consuming, and the drawings require that I keep both the PDF and the text document. What I would like to do is take the basic text that I have typed myself (no OCR) and add it as a text layer to the PDFs. That way the PDFs will be searchable and I can save a lot of time by not converting the math or drawings.
I've looked into Preview, PDFpenPro, Acrobat, and a couple of Linux programs, but so far I haven't found anything that will do this.
Any idea how I could do this, or which program to use?
I also scan my notes. Sometimes I go back and add some text to them using this technique:
Open up the scanned PDF in Preview, then click on the "Edit" button in the top right corner, then the "Text tools" button on the left side (it's a little box with Aa in it). From there you can drag open a text box and type into it.
Now the catch is that if you save it as it is and try to open it on your iPad using PDFExpert or some other program, the text might not be there. Here's how to get around that hiccup: after you've annotated your notes the way you want, instead of just saving the file as a PDF, use the Print option: File->Print or Command+P. Then click the PDF button on the left and choose to save it as a PDF. Now that it's printed, you can open it and search it in any program that reads PDFs. Attached is an example.
One other thing, it seems like maybe you want to write over your existing handwritten text with typed text? I'm not sure if this is the best way. But if that's what I was trying to do I would:
Scan my notes
Read through them, typing them up as you said
Open the scanned notes in Photoshop or some other program
Draw a giant White Fill White Stroke rectangle over the handwritten text
Save it as a pdf
Do the technique above and copy and paste the typed text from step 2.
I hope this helps. And I wish you luck, I'm still working out the kinks myself for scanned notes but the possibilities have me pretty excited!
EDIT: I just checked out PDFpenPro, which I highly recommend because you don't have to go through that printing trick, you can just save the pdf document after annotating and other programs will recognize the annotations.

Changing font of text error

I have a textbox control inside a software app which has some text in it. The software uses a custom font which doesn't exist anywhere else and is specific to this program. I don't have its source or access to its creators. I want to copy that text into Notepad or MS Word, but when I do, the text is no longer readable unless I change the font of the word processor to the font the software is using (the font the text is written in). I want the text to be readable anywhere and not to depend on a specific font. Is that possible?
I'm a C# programmer. Here is an example of the unreadable text:
ý¶† ±øõœ ­ý¶† –ý¾‡¨ ÿ†°†¬ ­ñð‡ì úÞ±¶ Ä쇤 ½±”
à¥ì ±øõœ þ·ñœ­Œ Ý稆­Œ ô±º±” (.ì)
[þü‡íý‘†õø]
ý¶†
[þ¶­ñùì ïõéÎ]
±øõœ ­ý¶† ‡º±”
[þíýº]
ý¶†
[úð‡ýì‡Î —‡¤çȾ†] ÿ¬.¹†.ë† °­©ì ÿû¬‡ì ²† þÎõð.ÿ¬.¹†.ë†"
The interesting thing is that it shows up like this in almost all fonts except the one the text was originally written in. By the way, the text is in Arabic, and all of the fonts I tested it with support Arabic characters.
If I type some text that consists of English and Arabic in that font and then change the font in Notepad to some other font, it looks OK and works normally! So the problem only appears when the text is pasted into the word processor.
EDIT: I think I found the problem! The custom font is a raster font (bitmap font) with a .fon extension. In the following thread someone wanted to convert the bitmap font to TTF because he was having a problem printing documents. I want to copy and paste, so maybe I have to convert the font?
The discussion:
how to convert a bitmap font .fon into a truetype font ttf
Any kind of help is really appreciated.
thank you.
If I had seen this question on superuser.com my answer would have been:
You can change the font of text from font A to Arial.
For example in Microsoft Word
Open the Replace dialog box (Edit >> Replace or Ctrl + H)
Make sure no text is specified in the Find what or Replace with boxes
Click in the Find what box, then click Format (If you don’t see the Format button, click More to expand the search options)
Select Font from the pop up list
In the Find Font dialog box, select the text formatting options you would like to replace
Click OK
Click in the Replace with box
Click Format
Select Font from the pop up list
In the Replace Font dialog box, select the new text formatting options you would like to apply
Click OK
Click Replace all
Click OK
Click Close
(from http://wordprocessing.about.com/cs/quicktips/qt/fontreplace.htm)
As an aside: If the document uses styles, it is actually much easier to change the font. For this reason I try to always use styles and never directly apply fonts to text.
If you are not referring to Word documents, please amend your question to say exactly what software was used to create the text - or exactly what file-format the text is stored in.
Since you asked on stackoverflow.com I slowly deduced you may be writing a program in some unspecified programming language. I suggest you edit your question and specify what programming language you are using and give some example code to illustrate the problem.
For example, in Java you might do something like
import java.awt.Font;
import javax.swing.JLabel;
JLabel label = new JLabel("hello world");
label.setFont(new Font("Arial", Font.PLAIN, 12));
It sounds very much as though the author of the original program has invented their own character encoding and provided a font to go with it. Maybe the development tools were restricted to ANSI text and the developers came up with this extreme solution.
Test out the hypothesis by writing some English text in the custom font and see if Arabic characters appear.
If this is so then you will have to work out what the encoding is and translate the strings character by character.
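If the hypothesis holds, the character-by-character translation could look roughly like the Python sketch below. The two mapping entries are invented for illustration only; the real table has to be worked out by comparing the pasted characters with what the program displays. The `decode` and `unmapped` helpers are likewise hypothetical names.

```python
# Made-up mapping, for illustration only: the real table has to be worked out
# by comparing the pasted characters with what the program displays.
CUSTOM_TO_UNICODE = {
    "\u00b6": "\u0633",  # pasted as ¶, assumed to be drawn as ARABIC LETTER SEEN
    "\u2020": "\u0627",  # pasted as †, assumed to be drawn as ARABIC LETTER ALEF
}
TABLE = str.maketrans(CUSTOM_TO_UNICODE)


def decode(text):
    """Translate custom-font text character by character."""
    # Characters without an entry in the table pass through unchanged.
    return text.translate(TABLE)


def unmapped(text):
    """List non-ASCII characters that still need an entry in the mapping."""
    return sorted({ch for ch in text if ord(ch) > 127 and ch not in CUSTOM_TO_UNICODE})


sample = "\u00b6\u2020 \u00fd"
print(decode(sample))
print(unmapped(sample))
```

Running `unmapped` over a large sample of pasted text gives the list of characters whose Arabic equivalents still have to be identified.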

Resources