Search based on Text Attributes

Is there any way to search for text based on its colour and other attributes, such as font size and style, in MS Word and other readers and editors?
The text can also vary in size, style, etc., although that is not shown in the sample image. Please help me solve this problem, at least in Word.

Related

Tool to recognize text in Image and edit it and create new image with edited text

Hi, are there any tools that do the following steps:
Recognize Text in image
Edit the text
Create a new image with the new text

In the general case, this is very difficult (especially for a picture).
You need a good OCR.
The OCR needs to be able to recognize the exact font.
If the background isn't a solid color, the OCR will be disturbed and may not be able to extract the exact characters and erase them correctly.
Also, if there is a background, then when you change the characters you need to reconstruct the background where the characters have been erased.
Then the editor needs to paint the new text with the same rendering attributes (size, italics, outline...), which may be a difficult task.
Because of this complexity, the best option is often to do it by hand.

Changing Font Size of Japanese (Unicode) characters

I have an NSPopUpButton which contains either English or Japanese strings, read from a plist file according to the system language. When the language is English, I can change the font size with code such as:
[auxStatePopup setFont: [NSFont fontWithName:@"Helvetica-BoldOblique" size:10.0]];
but with the same technique I am not able to change the size of the Japanese text, even when I set a Japanese font name that I found by searching online.
I want to do this because the Japanese characters are drawn slightly higher than the English ones. I intend to compensate for that upward shift by decreasing the font size.
Thanks for any help.
Or, alternatively: is there any way to move the text in an NSPopUpButton downwards?

My impression is that the two samples are not using the same font. Please try a string containing characters from both sets and see what happens.
Also try not customizing the font size, or even the font face.
I also suspect that the text rendering engine may have overridden some of your changes because of the text length. Cocoa text rendering may try to change the font size or letter spacing if the text does not fit the control, so make some tests with shorter texts.
BTW, I think you meant to say that you want a bigger font size for the Japanese text, not a smaller one. In the screenshots, the Japanese text is already too small to be read properly.

Changing font of text error

I have a text box control inside a software application, and it contains some text. The software uses a custom font that doesn't exist anywhere else and is specific to this program. I don't have its source code or any access to its creators. I want to copy that text into Notepad or MS Word, but when I do, the text is no longer readable unless I change the word processor's font to the one the software uses (the font the text is written in). I want the text to be readable anywhere, without depending on a specific font. Is that possible?
I'm a C# programmer. Here is an example of unreadable text:
ý¶† ±øõœ ý¶† –ý¾‡¨ ÿ†°†¬ ñð‡ì úÞ±¶ Äì‡¤ ½±”
à¥ì ±øõœ þ·ñœŒ Ýç¨†Œ ô±º±” (.ì)
[þü‡íý‘†õø]
ý¶†
[þ¶ñùì ïõéÎ]
±øõœ ý¶† ‡º±”
[þíýº]
ý¶†
[úð‡ýì‡Î —‡¤çÈ¾†] ÿ¬.¹†.ë† °©ì ÿû¬‡ì ²† þÎõð.ÿ¬.¹†.ë†"
Interestingly, it shows up like this in almost every font except the one the text was originally written in. The text is Arabic, and all of the fonts I tested it with support Arabic characters.
Now, if I type some mixed English and Arabic text in that font and then change Notepad's font to another one, it looks OK and works normally! So the problem only appears when text copied from the program is pasted into the word processor.
EDIT: I think I found the problem! The custom font is a raster (bitmap) font with a .fon extension. In the following thread, someone wanted to convert a bitmap font to TTF because he was having problems printing documents. I want to copy and paste, so maybe I have to convert the font?
The discussion:
how to convert a bitmap font .fon into a truetype font ttf
Any kind of help is really appreciated.
Thank you.

If I had seen this question on superuser.com my answer would have been:
You can change the font of text from font A to Arial.
For example, in Microsoft Word:
Open the Replace dialog box (Edit >> Replace or Ctrl + H)
Make sure no text is specified in the Find what or Replace with boxes
Click in the Find what box, then click Format (If you don’t see the Format button, click More to expand the search options)
Select Font from the pop up list
In the Find Font dialog box, select the text formatting options you would like to replace
Click OK
Click in the Replace with box
Click Format
Select Font from the pop up list
In the Replace Font dialog box, select the new text formatting options you would like to apply
Click OK
Click Replace All
Click OK
Click Close
(from http://wordprocessing.about.com/cs/quicktips/qt/fontreplace.htm)
As an aside: If the document uses styles, it is actually much easier to change the font. For this reason I try to always use styles and never directly apply fonts to text.
If you are not referring to Word documents, please amend your question to say exactly what software was used to create the text - or exactly what file-format the text is stored in.
Since you asked on stackoverflow.com I slowly deduced you may be writing a program in some unspecified programming language. I suggest you edit your question and specify what programming language you are using and give some example code to illustrate the problem.
For example, in Java you might do something like:
import java.awt.Font;
import javax.swing.JLabel;
JLabel label = new JLabel("hello world");
label.setFont(new Font("Arial", Font.PLAIN, 12));

It sounds very much as though the author of the original program has invented their own character encoding and provided a font to go with it. Maybe the development tools were restricted to ANSI text and the developers came up with this extreme solution.
Test out the hypothesis by writing some English text in the custom font and seeing whether Arabic characters appear.
If this is so then you will have to work out what the encoding is and translate the strings character by character.
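Once the hypothesis is confirmed, the translation itself is a lookup table applied one character at a time. The Java sketch below assumes you have already built that table by experiment (for example, type each Arabic letter in the custom font, copy it out, and note which character arrives); the two entries in main are hypothetical placeholders, not the program's real encoding. Values are strings rather than single characters so that one stored character can expand to several, such as a lam-alef ligature (U+0644 U+0627).

```java
import java.util.HashMap;
import java.util.Map;

public class CustomEncodingTranslator {
    // Stored character (as it arrives when copied out) -> real Unicode text.
    private final Map<Character, String> table = new HashMap<>();

    /** Records that the custom encoding stores {@code unicode} as {@code stored}. */
    public void map(char stored, String unicode) {
        table.put(stored, unicode);
    }

    /** Translates one character at a time; unmapped characters pass through unchanged. */
    public String translate(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            String replacement = table.get(c);
            out.append(replacement != null ? replacement : String.valueOf(c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        CustomEncodingTranslator t = new CustomEncodingTranslator();
        // HYPOTHETICAL entries: the real table has to be worked out by experiment.
        t.map('\u00FD', "\u0628"); // U+00FD (y acute) -> ARABIC LETTER BEH, placeholder only
        t.map('\u00B6', "\u0627"); // U+00B6 (pilcrow) -> ARABIC LETTER ALEF, placeholder only
        System.out.println(t.translate("\u00FD\u00B6 (.)"));
    }
}
```

Characters such as spaces, digits and punctuation that the program stores unchanged simply fall through, so only the remapped letters need table entries.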

iTextSharp stamper wraps text

I'm using iTextSharp to fill in some stamper AcroFields.
stamper.AcroFields.SetField("Title", "Lipsum");
I created the PDF in Illustrator and the form fields with Adobe Acrobat X Pro. The problem is that although the text fields are the width of the page, in the saved PDF the text wraps at about one third of that width.
Another question: is it possible to have the text field auto-size its height, or is there some other way to handle text overflow?

1) I'd like to see that PDF. I suspect the fields aren't as wide as you think they are.
2) You can set a field's font size to zero to enable "auto sizing", which works both within Reader and iText. However, it sizes to the actual field size, not what you think it might be.
I'm guessing you drew a spiffy form field background in Illustrator, then put a field over it in Acrobat Pro, but didn't size the field width to match the spiffy illustrator background. Could be wrong, but that's my hunch.
That's the flattened PDF. Can I see the original with the form field still intact? Sorry I wasn't more specific. Nonetheless, I can learn a little from reading this PDF:
Looking at the bounding boxes for the flattened field XObject and its internal clipping rectangle, it looks like it should be using most of the page:
The page is ~600 points wide by ~850 tall.
The flattened field XObject is ~560 points wide by ~100 tall.
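The same check can be made on the unflattened form by comparing a text field's widget /Rect (an array [llx lly urx ury] in PDF user space) with the page's /MediaBox. The Java sketch below only does the arithmetic; the numbers are illustrative stand-ins for the ~560 and ~600 point widths above, and reading the real arrays out of the file (with iText or any PDF inspector) is left out. A field whose /Rect covers only about a third of the page would explain the wrapping.

```java
public class FieldWidthCheck {
    /** Width of a PDF rectangle [llx, lly, urx, ury]; abs() because PDF allows any two opposite corners. */
    public static double width(double[] rect) {
        return Math.abs(rect[2] - rect[0]);
    }

    /** Fraction of the page width covered by the field's widget rectangle. */
    public static double coverage(double[] fieldRect, double[] mediaBox) {
        return width(fieldRect) / width(mediaBox);
    }

    public static void main(String[] args) {
        double[] mediaBox = {0, 0, 600, 850};     // illustrative ~600 x ~850 pt page
        double[] fieldRect = {20, 700, 580, 800}; // illustrative ~560 pt wide field
        System.out.printf("field covers %.0f%% of the page width%n",
                100 * coverage(fieldRect, mediaBox));
    }
}
```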
I wonder if there are some non-standard carriage return characters in your text that iText picks up on but Acrobat does not...
Anyway, I'd like to see the unflattened PDF. Filled in is good, but not flattened.
Okay, looked at the template. I don't see anything that would cause the line breaking you're seeing... which makes me think my second guess was right: new line characters.
Looking at the text layout code might give me a hint. Each of your lines of text goes like this (for example):
1 0 0 1 2 88.24 Tm 0 g (Die Semmerrolle der l{e4}nge nach zu einer grossen Roulade)Tj
n n n n n n Tm: text matrix
g: gray (0 g: black)
(...)Tj: show text
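The legend above can be applied mechanically when reading through the stream. This rough Java sketch pulls out the translation part of the text matrix (e and f, the last two Tm operands) and the literal string passed to Tj. It is a regular expression over one uncompressed line, not a real content-stream parser: it assumes the "a b c d e f Tm ... (string) Tj" layout shown and ignores escapes and nested parentheses inside the string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TextShowLine {
    // Six Tm operands (a b c d e f), anything up to '(', the string, then Tj.
    private static final Pattern LINE = Pattern.compile(
            "^(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+Tm.*?\\((.*)\\)\\s*Tj");

    public final double x;    // e: horizontal translation of the text matrix
    public final double y;    // f: vertical translation (baseline position)
    public final String text; // raw literal string shown by Tj

    private TextShowLine(double x, double y, String text) {
        this.x = x;
        this.y = y;
        this.text = text;
    }

    public static TextShowLine parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.find()) {
            throw new IllegalArgumentException("not a 'Tm ... Tj' line: " + line);
        }
        return new TextShowLine(Double.parseDouble(m.group(5)),
                Double.parseDouble(m.group(6)), m.group(7));
    }

    public static void main(String[] args) {
        TextShowLine l = parse(
                "1 0 0 1 2 88.24 Tm 0 g (Die Semmerrolle der l{e4}nge nach zu einer grossen Roulade)Tj");
        System.out.println("x=" + l.x + " y=" + l.y + " text=" + l.text);
    }
}
```

Comparing the y values of consecutive lines gives the line spacing, and each extracted string shows exactly where one line ends and the next begins.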
That's consistent with the code path when you set a text field value in the trunk of iText (and the most recent release[s]). That code (ColumnText) is quite good at breaking text properly, and used all over the place. The bounding box is correct (as shown in a couple places of the flattened PDF).
Check your input.
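One way to check the input for the suspected stray line breaks is to scan each value before it is passed to SetField and report any line-terminator characters, including the less obvious ones (vertical tab, form feed, NEL U+0085, LINE SEPARATOR U+2028, PARAGRAPH SEPARATOR U+2029). The sketch is in Java, and the same loop carries over directly to C#. Which of these characters iText actually breaks on is not established in this thread, so treat the list as a net for catching suspects, not a diagnosis.

```java
import java.util.ArrayList;
import java.util.List;

public class LineBreakScanner {
    /** True for characters commonly treated as line or paragraph breaks. */
    public static boolean isLineBreak(char c) {
        switch (c) {
            case 0x000A: // line feed
            case 0x000B: // vertical tab
            case 0x000C: // form feed
            case 0x000D: // carriage return
            case 0x0085: // next line (NEL)
            case 0x2028: // line separator
            case 0x2029: // paragraph separator
                return true;
            default:
                return false;
        }
    }

    /** Returns "index:U+XXXX" for each line-break character found in the value. */
    public static List<String> findLineBreaks(String value) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (isLineBreak(c)) {
                hits.add(i + ":U+" + String.format("%04X", (int) c));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Illustrative field value with a hidden LINE SEPARATOR in it.
        String value = "Die Semmerrolle\u2028der Laenge nach";
        System.out.println(findLineBreaks(value));
    }
}
```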

Display of Asian characters (with Unicode): Difference in character spacing when presented in a RichEdit control compared with using ExtTextOut

This picture illustrates my predicament:
All of the characters appear to be the same size, but the space between them is different when presented in a RichEdit control compared with when I use ExtTextOut.
I would like to present the characters the same as in the RichEdit control (ideally), in order to preserve wrap positions.
Can anyone tell me:
a) Which is the more correct representation?
b) Why the RichEdit control displays the text with no gaps between the Asian Characters?
c) Is there any way to make ExtTextOut reproduce the behaviour of the RichEdit control when drawing these characters?
d) Would this be any different if I was working on an Asian version of Windows?
Perhaps I'm being optimistic, but if anyone has any hints to offer, I'd be very interested to hear.
In case it helps:
Here's my text:
快的棕色狐狸跳在懶惰狗1 2 3 4 5 6 7 8 9 0
Apologies to Asian readers: this is merely for testing our Unicode implementation, and I don't even know what language the characters are taken from, let alone whether they mean anything.
To view the effect by pasting these characters into a RichEdit control (e.g. WordPad), you may find you have to select them and set the font to 'Arial'.
The rich text that I obtain is:
{\rtf1\ansi\ansicpg1252\deff0\deflang2057{\fonttbl{\f0\fnil\fcharset0 Arial;}}{\colortbl ;\red0\green0\blue0;}\viewkind4\uc1\pard\sa200\sl276\slmult1\lang9\fs22\u24555?\u30340?\u26837?\u33394?\u29392?\u29432?\u36339?\u22312?\u25078?\u24816?\u29399?1 2 3 4 5 6 7 8 9 0\par\pard\'a3 $$ \'80\'80\cf1\lang2057\fs16\par}
It doesn't appear to contain a value for character 'pitch' which was my first thought.
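To double-check what the RTF carries, its Unicode control words can be decoded back to text. The Java sketch below assumes the uc1 setting seen in the sample (exactly one '?' fallback character after each value) and maps negative values back up by 65536, since RTF writes these values as signed 16-bit numbers. Decoding the paragraph gives back the eleven characters followed by the digits; apart from that, the RTF only records the font (Arial, \fcharset0), the size (\fs22) and paragraph spacing (\sa200, \sl276), with no \expnd or \expndtw character-spacing control word.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RtfUnicodeDecoder {
    // Backslash + 'u' + signed decimal value, followed by one optional '?'
    // fallback character (assumes the uc1 setting used in the sample).
    private static final Pattern UNICODE_WORD = Pattern.compile("\\\\u(-?\\d+)\\??");

    /** Replaces each RTF Unicode control word with the UTF-16 code unit it encodes. */
    public static String decode(String rtf) {
        Matcher m = UNICODE_WORD.matcher(rtf);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int value = Integer.parseInt(m.group(1));
            // RTF writes the value as a signed 16-bit number, so code units
            // above 32767 may appear as negatives; map them back.
            if (value < 0) {
                value += 65536;
            }
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf((char) value)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Paragraph text from the RTF in the question.
        String body = "\\u24555?\\u30340?\\u26837?\\u33394?\\u29392?\\u29432?"
                + "\\u36339?\\u22312?\\u25078?\\u24816?\\u29399?1 2 3 4 5 6 7 8 9 0";
        System.out.println(decode(body));
    }
}
```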

I don't know the answer, but there are several things to suspect:
There are several versions of the rich edit control. Perhaps you're using an older one that doesn't have all the latest typographic improvements.
There are many styles and flags that affect the behavior of a rich edit control, so you might want to explore which ones are set and what they do. For example, look at EM_GETEDITSTYLE.
Many Asian fonts come in two versions on Windows. One is optimized for horizontal layout, and the other for vertical layout. The latter usually has the same name, but with @ prepended to it. Perhaps you are using the wrong one in the rich edit control.
UPDATE: By messing around with Wordpad, I was able to reproduce the problem with the crowded text in the rich edit control.
Open a new document in Wordpad on Windows 7. Note that the selected font is Calibri.
Paste the sample text into the document.
Text appears correct, but Wordpad changed the font to SimSun.
Select the text and change the font back to Calibri or Arial.
The text will now be overcrowded, very similar to your example. Thus it appears the fundamental problem is with font linking and fallback. ExtTextOut is probably selecting an appropriate font for the script automatically. Your challenge is to figure out how to identify the right font for the script and set that font in the rich edit control.

This will only help with part of your problem, but there is a way to draw text to a DC that will look exactly the same as it does with RichEdit: the so-called windowless RichEdit control. It's not exactly easy to use; I wrote a CodeProject article on it a few years back. I used it to solve the problem of a scrollable display of blocks of text, each of which can be edited by clicking on it: the normal drawing is done with the windowless RichEdit, and editing is done by showing a "real" RichEdit control on top of it.
That would at least get you the text looking the same in both cases, though unfortunately both cases would show too little character spacing.
One further thought: if you could rely on Microsoft Office being installed, you could also try later versions of RichEdit that come with office. There's more about these on Murray Sargent's blog, as well as some interesting articles on font binding that might also help.

ExtTextOut allows you to specify the logical spacing between characters. Its lpDx parameter is a const pointer to an array of values that indicate the distance between the origins of adjacent character cells. The Microsoft API documentation notes that if you don't set it (pass NULL), it uses its own default spacing. I would say that's why ExtTextOut is working fine.
In particular, an EMR_EXTTEXTOUTW record in an EMF contains an EMR_TEXT structure that carries this Dx array. Judging by one of your comments, that is what allowed RichEdit to insert the EMF using the spacing stored in the record, whereas for RTF text, if you haven't set a font binding, RichEdit does some matching to work out which font to use.
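For question (c), one way to act on this is to pass an explicit lpDx instead of letting ExtTextOut pick the spacing. Because each entry is the distance from one character origin to the next, the array can be derived from the origins the rich edit control itself used, read back for example with EM_POSFROMCHAR. That collection step is an assumption about how you would do it, and it presumes the control and your DC measure in the same units (such as MM_TEXT pixels). The Java sketch shows only the arithmetic: the origin values are made up, and the last entry, which has no following character, is supplied by the caller.

```java
import java.util.Arrays;

public class DxFromOrigins {
    /**
     * Builds an ExtTextOut-style lpDx array: entry i is the distance from the
     * origin of character i to the origin of character i + 1. The final entry
     * has no successor, so the caller supplies the advance of the last character.
     */
    public static int[] dxFromOrigins(int[] originsX, int lastAdvance) {
        int n = originsX.length;
        int[] dx = new int[n];
        for (int i = 0; i + 1 < n; i++) {
            dx[i] = originsX[i + 1] - originsX[i];
        }
        if (n > 0) {
            dx[n - 1] = lastAdvance;
        }
        return dx;
    }

    public static void main(String[] args) {
        // Made-up origins for four characters, as if read back from the rich edit control.
        int[] origins = {0, 11, 22, 33};
        System.out.println(Arrays.toString(dxFromOrigins(origins, 11)));
    }
}
```

With the control's own origins fed in, the character cells ExtTextOut draws should line up with the control's even if the glyphs come from a different font, which is what keeps the wrap positions the same.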
In terms of the RichEdit control, the following article might be useful:
Use Font Binding in a Rich Edit Control
After character sets are assigned, Rich Edit scans the text around the
insertion point forward and backward to find the nearest fonts that
have been used for the character sets. If no font is found for a
character set, Rich Edit uses the font chosen by the client for that
character set. If the client hasn't specified a font for the character
set, Rich Edit uses the default font for that character set. If the
client wants some other font, the client can always change it, but
this approach will work most of the time. The current default font
choices are based on the following table. Note that the default fonts
are set per-process, and there are separate lists for UI usage and for
non-UI usage.
If you haven't set the character set, it further explains that Rich Edit falls back to ANSI_CHARSET. However, it's most definitely a lot more complicated than that, as that blog article by Murray Sargent (a programmer at Microsoft) shows.
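The lookup order in the quoted passage (nearest font already used for that character set in the surrounding text, then the client's font for the set, then the default for the set) can be written down as a small model, which helps when reasoning about why a font like Arial ends up on CJK text. This Java sketch is a model of the documented order, not Rich Edit's actual code: character sets are plain strings, preferring the backward neighbour on ties is an assumption, a missing character set is treated as "ANSI" following the paraphrase above, and the default-table entries are hypothetical placeholders rather than Microsoft's table.

```java
import java.util.List;
import java.util.Map;

public class FontBindingModel {
    /** A run of existing text: its assigned character set and the font it uses. */
    public static final class Run {
        public final String charset;
        public final String font;

        public Run(String charset, String font) {
            this.charset = charset;
            this.font = font;
        }
    }

    /**
     * Chooses a font for new text of the given character set inserted before runs.get(at).
     * Order: nearest existing run with the same charset (backward first on ties),
     * then the client's font for that charset, then the default for that charset.
     */
    public static String bindFont(List<Run> runs, int at, String charset,
                                  Map<String, String> clientFonts,
                                  Map<String, String> defaultFonts) {
        String cs = (charset == null) ? "ANSI" : charset; // unknown charset treated as ANSI
        for (int d = 0; d < runs.size(); d++) {
            int back = at - 1 - d; // d-th run before the insertion point
            int fwd = at + d;      // d-th run after the insertion point
            if (back >= 0 && back < runs.size() && runs.get(back).charset.equals(cs)) {
                return runs.get(back).font;
            }
            if (fwd >= 0 && fwd < runs.size() && runs.get(fwd).charset.equals(cs)) {
                return runs.get(fwd).font;
            }
        }
        String client = clientFonts.get(cs);
        return (client != null) ? client : defaultFonts.get(cs);
    }

    public static void main(String[] args) {
        List<Run> runs = List.of(new Run("ANSI", "Arial"), new Run("GB2312", "SimSun"));
        // HYPOTHETICAL defaults; the real table is per-process and differs for UI and non-UI use.
        Map<String, String> defaults = Map.of("ANSI", "DefaultLatinFont", "GB2312", "DefaultCjkFont");
        System.out.println(bindFont(runs, 1, "GB2312", Map.of(), defaults));
        System.out.println(bindFont(List.of(), 0, "GB2312", Map.of(), defaults));
    }
}
```

In the first call the CJK text picks up SimSun from the neighbouring run; in the second there is no matching run and no client choice, so the model falls through to the default table.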
