Hebrew characters are displayed as junk values - pdf-generation

I am using Apache FOP for PDF generation, using the below JARs:
1) batik-all
2) fop
3) xmlgraphics-commons
4) avalon-framework
5) pdfbox
6) fontbox
Hebrew characters in the generated PDF come out as junk values. Can anybody help me with this?
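For reference - since the question doesn't show any code or configuration - the usual cause of junk output is that FOP falls back to a Base-14 PDF font with no Hebrew glyphs (FOP typically substitutes '#' for each missing glyph). A minimal sketch of the Java side, assuming FOP 2.x and placeholder file names, where fop.xconf registers a TrueType font containing Hebrew glyphs and the XSL-FO selects that font-family:

import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.apache.fop.apps.Fop;
import org.apache.fop.apps.FopFactory;
import org.apache.fop.apps.MimeConstants;

public class HebrewPdf {
    public static void main(String[] args) throws Exception {
        // fop.xconf is assumed to register a TrueType font that actually
        // contains Hebrew glyphs; the XSL-FO must use that font-family,
        // otherwise FOP renders '#' (or similar junk) for each Hebrew
        // character. All file names here are placeholders.
        FopFactory fopFactory = FopFactory.newInstance(new File("fop.xconf"));
        try (OutputStream out = new BufferedOutputStream(
                new FileOutputStream("hebrew.pdf"))) {
            Fop fop = fopFactory.newFop(MimeConstants.MIME_PDF, out);
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new File("template.xsl")));
            transformer.transform(new StreamSource(new File("data.xml")),
                    new SAXResult(fop.getDefaultHandler()));
        }
    }
}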

Related

CKEditor: How to get only characters

I'm using the wordcount plugin for CKEditor. It displays the word count and the character count (ignoring spaces) perfectly.
How do I get only the characters (without spaces/line breaks)? Is there any default API provided by CKEditor or the wordcount plugin?
editor.getData() - returns the complete text with HTML
editorContent.text().trim() - returns the text (without HTML), but it doesn't ignore line breaks and spaces.
No, there is no official API or plugin for that.
Instead of trimming editorContent.text(), you could remove all whitespace characters with a regex, e.g. editorContent.text().replace(/\s/g, '').

Does the NPOI library support i18n for numbers and dates?

Suppose I have a number - 1000.65. I want this to be downloaded in the Italian number format as 1.000,65.
I don't want to convert it to a string while formatting.

Ghostscript - re-embed font

I want to use Ghostscript to optimize PDF files.
My files are generated by iText, and there is a font which is embedded too many times - 3000+.
I want to reprint the document with Ghostscript so that it removes all the embedded copies and embeds the font only once in the file.
Do you know how to do it?
An additional question - is there any difference between Ghostscript and ghost4j?
Thanks
You cannot do that, most likely. Without seeing the file I cannot be certain, but the probability is that each font is embedded as a subset. That is, it contains just a few of the glyphs that were originally present in the font.
So if the first instance contains, say, 'a', 'c', 'f' and 'g', and the second instance contains 'b', 'e' and 'h', you can see that the two fonts are actually different.
Worse, the text is usually re-encoded, so the character codes are not what you would expect. In the example above, 'a' would not have the character code 0x61 (ASCII for 'a'); it would have the character code 1, 'c' would be 2, 'f' would be 3 and so on. But in the second case, character code 1 would be 'b', character code 2 would be 'e', etc.
There's no easy way to recombine the multiple font subsets, and also re-encode each set of text to the 'combined' font.
Where the pdfwrite device detects multiple subset fonts which have compatible encodings (the character codes used are unique in each font) it will combine them together. It won't attempt to re-encode them again though, so if the two fonts use the same character codes (as per my example above) pdfwrite will just emit two fonts.
Assuming you've already tried running the file through pdfwrite and didn't get the result you wanted, then that's it, you can't achieve the result you want with the current code.
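If you haven't, a plain pdfwrite pass looks like this (file names are placeholders):

gs -sDEVICE=pdfwrite -o optimized.pdf input.pdf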
Probably you can tell iText not to subset the fonts, which will solve the problem for you at the source, rather than trying to fix it afterwards.
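A minimal sketch of that approach, assuming iText 5's BaseFont API and placeholder paths. Note that the BaseFont should also be created once and reused for all text - 3000+ embedded copies suggests a new font object is being created per text element:

import com.itextpdf.text.Document;
import com.itextpdf.text.Font;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.PdfWriter;
import java.io.FileOutputStream;

public class EmbedFontOnce {
    public static void main(String[] args) throws Exception {
        // Create the BaseFont once and disable subsetting; reuse this one
        // instance everywhere so iText embeds the complete font program a
        // single time instead of one subset per use.
        BaseFont bf = BaseFont.createFont("/path/to/font.ttf",
                BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
        bf.setSubset(false); // embed the full font, not a subset

        Document doc = new Document();
        PdfWriter.getInstance(doc, new FileOutputStream("out.pdf"));
        doc.open();
        doc.add(new Paragraph("Some text", new Font(bf, 12)));
        doc.close();
    }
}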

Oracle Reports UTF8 PDF fields trimmed to half the number of characters

Can you please help with this issue I'm having after the introduction of UTF8 on the Reports servers:
We had up to now: database - UTF8, reports server - CL8ISO8859P5.
Now the reports server was changed to UTF8 too, and in PDFs, fields which display characters with multi-byte code points are treated as if they had only half of their real length.
They are filled only to half their length; the rest of the text is trimmed and the remainder of the field is left blank.
If I enlarge the field in Reports Builder, the text inside will accommodate more characters, proportionately, but still only up to about half.
I've tried recompiling the reports (with UTF8 set in the compiler), specifying NLS_LENGTH_SEMANTICS=CHAR in both the DB and the application server, and changing to other fonts, all with no effect.
However, if we put regular characters with code points <= 255 in the respective DB field, maintaining the same settings everywhere, the fields are filled entirely.
The fonts used are TrueType and subsetted into the PDF (for example Courier New or Times New Roman) - the same as before the UTF8 change.
Any hint would be greatly appreciated, thank you!

How do I prevent Turkish letters from dropping when using UIFont in cocos2d?

I'm doing the following to create a label that I use as part of attribution for a photo:
CCLabelTTF *imageSourceLabel = [CCLabelTTF labelWithString:[_organism imageSource] fontName:[[UIFont systemFontOfSize:12] fontName] fontSize:12];
Several of the image sources include Turkish letters. For example, in this URL:
http://commons.wikimedia.org/wiki/File:Şahlûr-33.jpg
This displays improperly in my iPad app; the Turkish letters are missing.
How do I create a label that will work with text like that in the URL above?
Edit:
Nevermind... the problem is with exporting from Excel. See the comments on the answer below. This link provides some additional information: Excel to CSV with UTF8 encoding
Additional Edit:
Actually, it's still a problem, even after I export correctly and verify that I have the proper UTF-8 (or is it 16?) letters in the CSV file. For example, this string:
Dûrzan cîrano / CC BY-SA 3.0
is displayed incorrectly (screenshot not reproduced here), and this string:
Christian Mehlführer / CC-BY 2.5
is likewise displayed incorrectly.
It's definitely being processed improperly upon import, as CCLOG generates the following:
Photo Credit: DÃ»rzan cÃ®rano / CC BY-SA 3.0
More Info:
Upon import, I'm storing the following value as a string in an array:
"Christian Mehlf\U00c3\U00bchrer / CC-BY 2.5"
Wikipedia says the UTF-8 value for ü, in hex, is C3 BC. It looks like the c3bc is in there, but masked as \U00c3\U00bc.
Is there any way to convert this properly? Or is something fundamentally broken at the CSV import level?
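What the \U00c3\U00bc pair shows is UTF-8 bytes being decoded as Latin-1. A short demonstration of the mix-up (in Java purely for illustration; the actual fix in the Objective-C code is the encoding change described in the solution below):

import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    public static void main(String[] args) {
        String original = "ü";                                   // U+00FC
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8); // 0xC3 0xBC
        // Decoding those UTF-8 bytes as Latin-1 yields the two characters
        // U+00C3 and U+00BC -- exactly the \U00c3\U00bc pair in the log.
        String misread = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(misread);                             // prints Ã¼
        // Re-encoding as Latin-1 and decoding as UTF-8 recovers the text.
        String fixed = new String(
                misread.getBytes(StandardCharsets.ISO_8859_1),
                StandardCharsets.UTF_8);
        System.out.println(fixed);                               // prints ü
    }
}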
The solution is below.
There were several problems:
Excel on the Mac doesn't export UTF-8 properly. The solution I used was to paste the data into Google Spreadsheet and export from there. More info here: Excel to CSV with UTF8 encoding
I realized that once I had the proper data in the CSV file, I was importing it with improper settings. I'm using parseCSV and needed to set _encoding in the -init method to NSUTF8StringEncoding instead of the default, NSISOLatin1StringEncoding.
If you try this:
[CCLabelTTF labelWithString:[[_organism imageSource] stringByUnescapingHTML] fontName:[[UIFont systemFontOfSize:12] fontName] fontSize:12];
it will likely work better. I suspect your URL string is escaped HTML.
