How to define a charset for HTML Help? - winapi

My C++ Windows program uses HTML Help. The HH_POPUP structure includes the field pszFont in the format "Facename[, point size[, CHARSET[, color[, PLAIN BOLD ITALIC UNDERLINE]]]]", but I cannot find any information about how to define the charset. My Russian popup help is completely unreadable.
HH_POPUP popupAttr;
memset(&popupAttr, 0, sizeof(popupAttr));
popupAttr.cbStruct = sizeof(popupAttr);
popupAttr.clrBackground = COLORREF(-1);   // use default colours
popupAttr.clrForeground = COLORREF(-1);
popupAttr.rcMargins.left = -1;            // use default margins
popupAttr.rcMargins.bottom = -1;
popupAttr.rcMargins.right = -1;
popupAttr.idString = UINT(helpInfo->dwContextId);
popupAttr.pt = helpInfo->MousePos;
popupAttr.pszFont = _T("Arial,18,HOW_TO_DEFINE_THIS_CHARSET"); // please!!!
CWnd::GetDesktopWindow()->HtmlHelp(reinterpret_cast<DWORD_PTR>(&popupAttr), HH_DISPLAY_TEXT_POPUP);

(Just a guess.) It might be that the charset needs to be defined in your HTML Help rather than the HH_POPUP structure. Is the charset specified in the META tags of your HTML Help topics? E.g.:
<META http-equiv="Content-Type" content="text/html; charset=Windows-1251">
Also, is the corresponding language specified for your help file? E.g.:
<Project.hhp>
[OPTIONS]
Language=0x419 Russian (Russia)

The problem was solved by converting the .txt file with the popup labels from Unicode to ANSI. Thank you everyone for your help.
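For reference, here is a minimal C++ sketch that combines the accepted fix (the [TEXT POPUPS] source file saved as ANSI / Windows-1251) with an explicit charset in pszFont. Whether the charset slot accepts the numeric LOGFONT value (204 = RUSSIAN_CHARSET) is an assumption on my part, and the .chm/text-file path is hypothetical:

#include <windows.h>
#include <tchar.h>
#include <htmlhelp.h>   // link against htmlhelp.lib

// Shows a text popup for the given topic id at the given screen position.
// Assumes the popup text comes from a [TEXT POPUPS] file compiled into the
// .chm and that this file is saved as ANSI (Windows-1251), per the fix above.
void ShowRussianPopup(HWND hwndCaller, UINT topicId, POINT pt)
{
    HH_POPUP popup;
    ZeroMemory(&popup, sizeof(popup));
    popup.cbStruct      = sizeof(popup);
    popup.clrBackground = COLORREF(-1);          // default colours
    popup.clrForeground = COLORREF(-1);
    popup.rcMargins.left = popup.rcMargins.top =
        popup.rcMargins.right = popup.rcMargins.bottom = -1;  // default margins
    popup.idString = topicId;                    // id inside the [TEXT POPUPS] file
    popup.pt       = pt;
    // "facename, point size, charset" - 204 is the LOGFONT value for
    // RUSSIAN_CHARSET (assumption: a numeric charset is accepted here).
    popup.pszFont  = _T("Arial, 18, 204");

    // "MyHelp.chm::/Popups.txt" is a hypothetical path to the popup text file.
    ::HtmlHelp(hwndCaller, _T("MyHelp.chm::/Popups.txt"),
               HH_DISPLAY_TEXT_POPUP, reinterpret_cast<DWORD_PTR>(&popup));
}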

Related

How can I show an nvarchar column that stores Unicode data (entered with the zawgyi1 font) in a classic ASP web page?
When I retrieve and write the value to the page, it shows "?????". I set my ASP page's content type to UTF-8 with the following meta tag:
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
Unfortunately, the text is still rendered as "?????".
Any suggestions or ideas on how to display unicode values in a classic ASP page?
The Content-Type meta header informs the browser to treat the content sent as a UTF-8 encoded text stream. It doesn't ensure that the stream sent is actually UTF-8. To handle UTF-8 correctly you need to do three things:
Ensure your static content is saved in a UTF-8 compatible encoding.
Ensure your dynamic content is encoded to UTF-8.
Inform the client that the content is UTF-8 encoded.
Item 1 requires either that you actually save the ASP file as a UTF-8 encoded file or that all the static content in the file is within the ASCII character range (0-127). Note that if you save as UTF-8, then all of your server-side script must also use characters within the ASCII range. In Visual Studio you can choose the encoding via "Save File As...", then clicking the small arrow on the Save button.
Item 2 requires that the Response.CodePage property is set to the UTF-8 code page, 65001. You can do this in code or by adding the attribute CODEPAGE=65001 to the <%@ %> directive on the first line of the ASP file. If you do it in code, you must set it before any calls to Response.Write.
Also: do not use the Chr or Asc functions (they are buggy when the code page is 65001); use ChrW and AscW instead.
Item 3 requires that the Content-Type header contains the charset=UTF-8 qualifier. As you are already doing, you can do this with the META tag. Personally I find that to be a bit of a kludge; I prefer to set Response.Charset = "UTF-8" in code, which places the qualifier on the true Content-Type HTTP header.
What about your codepage definition at the top of your page?
<%@ LANGUAGE="VBSCRIPT" CODEPAGE="65001" %>
Here's a useful script to batch-convert ASP files from ANSI to UTF-8 encoding:
<HTML>
<HEAD>
<TITLE>ASP UTF-8 Converter - TFI 13/02/2015</TITLE>
</HEAD>
<BODY style='font-family:arial;font-size:11px'>
<%
Dim fso, sFolder
Set fso = CreateObject("Scripting.FileSystemObject")
sFolder = "C:\inetpub\wwwroot\sitefolder"

' Re-encode one ANSI (Windows-1252) file as UTF-8, replacing the original.
Function ANSItoUTF8(ANSIFile)
    Dim oFS, oFrom, oTo, sFFSpec, sTFSpec, UTF8FileOut
    UTF8FileOut = ANSIFile & ".utf8"
    Set oFS = CreateObject("Scripting.FileSystemObject")
    Set oFrom = CreateObject("ADODB.Stream")
    sFFSpec = oFS.GetAbsolutePathName(ANSIFile)
    Set oTo = CreateObject("ADODB.Stream")
    sTFSpec = oFS.GetAbsolutePathName(UTF8FileOut)
    oFrom.Type = 2              'adTypeText
    oFrom.Charset = "Windows-1252"
    oFrom.Open
    oFrom.LoadFromFile sFFSpec
    oTo.Type = 2                'adTypeText
    oTo.Charset = "utf-8"
    oTo.Open
    oTo.WriteText oFrom.ReadText
    oTo.SaveToFile sTFSpec, 2   'adSaveCreateOverWrite
    oFrom.Close
    oTo.Close
    oFS.DeleteFile sFFSpec
    oFS.MoveFile sTFSpec, sFFSpec
End Function

ConvertFiles fso.GetFolder(sFolder), True

' Convert every .asp file in the folder (and, optionally, its subfolders).
Function ConvertFiles(objFolder, bRecursive)
    Dim objFile, objSubFolder
    For Each objFile In objFolder.Files
        If UCase(fso.GetExtensionName(objFile)) = "ASP" Then
            ANSItoUTF8 objFile.Path
            Response.Write "• Converted <B>" & fso.GetAbsolutePathName(objFile) & "</B> from ANSI to UTF-8<BR>"
        End If
    Next
    If bRecursive = True Then
        For Each objSubFolder In objFolder.Subfolders
            ConvertFiles objSubFolder, True
        Next
    End If
End Function
%>
</BODY>
</HTML>

Encoding problem for the SUBJECT of an email using CDO

Using VBScript (ASP) with CDO, I have a problem with the encoding of the SUBJECT of an email. I have used two solutions for the BODY part of the email and both work, but neither of them works for the SUBJECT part.
First solution: encoding the characters of the email BODY as numeric entities using ChrW (this does not work for the subject):
For x = 1567 To 1785
    encoded = "&#" & x & ";"
    Body = Replace(Body, ChrW(x), encoded, 1, -1, 1)
Next
Second solution: setting HTMLBodyPart encoding:
objMessage.HTMLBodyPart.Charset = "utf-8"
Is there something similar for the SUBJECT part of the email (e.g. objMessage.SubjectPart.Charset)?
Try:
objMessage.TextBodyPart.Charset = "utf-8"
or simply:
objMessage.BodyPart.Charset = "utf-8"
It has been documented elsewhere that modifying the Charset of the TextBodyPart also affects the (text/plain) Subject.
Hope this helps.

PdfBox: PDF/A-1A to PDF/A-3A

I have the following problem:
I want to transform a PDF/A-1A document into a PDF/A-3A document.
The original document is validated by Acrobat Reader Pro, so I can assume it is PDF/A-1A conformant.
I try to convert the PDF metadata with the following code:
private PDDocumentCatalog makeA3compliant(PDDocument doc) throws IOException, TransformerException {
    PDDocumentCatalog cat = doc.getDocumentCatalog();
    PDMetadata metadata = new PDMetadata(doc);
    cat.setMetadata(metadata);

    XMPMetadata xmp = new XMPMetadata();
    XMPSchemaPDFAId pdfaid = new XMPSchemaPDFAId(xmp);
    xmp.addSchema(pdfaid);

    XMPSchemaDublinCore dc = xmp.addDublinCoreSchema();
    String creator = "TestCr";
    String producer = "testPr";
    dc.addCreator(creator);
    dc.setAbout("");

    XMPSchemaBasic xsb = xmp.addBasicSchema();
    xsb.setAbout("");
    xsb.setCreatorTool(creator);
    xsb.setCreateDate(GregorianCalendar.getInstance());

    PDDocumentInformation pdi = new PDDocumentInformation();
    pdi.setProducer(producer);
    pdi.setAuthor(creator);
    doc.setDocumentInformation(pdi);

    XMPSchemaPDF pdf = xmp.addPDFSchema();
    pdf.setProducer(producer);
    pdf.setAbout("");

    PDMarkInfo markinfo = new PDMarkInfo();
    markinfo.setMarked(true);
    doc.getDocumentCatalog().setMarkInfo(markinfo);

    pdfaid.setPart(3);
    pdfaid.setConformance("A");
    pdfaid.setAbout("");
    metadata.importXMPMetadata(xmp);
    return cat;
}
If I try to validate the new file with Acrobat again, I get a validation error:
CIDset in subset font is incomplete (font contains glyphs that are not listed)
If I try to validate the file with this online validator (http://www.pdf-tools.com/pdf/validate-pdfa-online.aspx), it is reported as a valid PDF/A-3A.
Am I missing something?
Is nobody able to help?
EDIT: Here is the PDF file
This worked for us to be fully PDF/A-3 compliant regarding the CIDset issue:
private void removeCidSet(PDDocumentCatalog catalog) {
    COSName cidSet = COSName.getPDFName("CIDSet");
    // iterate over all pdf pages
    for (Object object : catalog.getAllPages()) {
        if (object instanceof PDPage) {
            PDPage page = (PDPage) object;
            Map<String, PDFont> fonts = page.getResources().getFonts();
            Iterator<String> iterator = fonts.keySet().iterator();
            // iterate over all fonts
            while (iterator.hasNext()) {
                PDFont pdFont = fonts.get(iterator.next());
                if (pdFont instanceof PDType0Font) {
                    PDType0Font typedFont = (PDType0Font) pdFont;
                    if (typedFont.getDescendantFont() instanceof PDCIDFontType2Font) {
                        PDCIDFontType2Font f = (PDCIDFontType2Font) typedFont.getDescendantFont();
                        PDFontDescriptor fontDescriptor = f.getFontDescriptor();
                        if (fontDescriptor instanceof PDFontDescriptorDictionary) {
                            PDFontDescriptorDictionary fontDict = (PDFontDescriptorDictionary) fontDescriptor;
                            // drop the incomplete CIDSet entry from the font descriptor
                            fontDict.getCOSDictionary().removeItem(cidSet);
                        }
                    }
                }
            }
        }
    }
}
OK - I think I have an answer to your question from the perspective of the callas and/or Adobe technology (and once more, I'm affiliated with callas and its pdfToolbox technology, which is also used inside Acrobat).
According to my research and the people I consulted, your example PDF document contains a font with a CID character set that is incomplete. Why does pdfToolbox or Acrobat say it's a valid PDF/A-1a file but not a valid PDF/A-3a file? Interesting question:
1) The rules for incomplete CID sets changed between PDF/A-1a and PDF/A-3a. They are stricter in PDF/A-3a.
2) But while in PDF/A-1a a CID set always had to be there, in PDF/A-3a you can have a valid, compliant file, without such a CID set.
So, your PDF file contains a CID set (which makes it valid for PDF/A-1a and A-3a), but while that CID set is fine for A-1a, it does not contain all the characters required to be A-3a compliant.
To test at least part of this theory, I processed your file through pdfToolbox with a fixup entitled "Remove CIDset if incomplete". That correction (as the name implies) removes the CID set from the file but doesn't change anything else. After doing so your file validates as a valid A-3a file.
That leaves the question why the pdftools web site claims this is a valid PDF/A-3a file; according to the people I've spoken to, the result from preflight for this file is correct and there should be an error on this file. So perhaps that's something you need to take up with the pdftools guys (and they possibly with callas to figure out who's finally right).
Feel free to send me a personal message if you want to discuss this further - more discussion on the tools themselves probably becomes off-topic for this public site.

Special character in an <a> tag's href link, when making a link to an email, is not parsed correctly by the mail client

I hit the bug described in this Mozilla Firefox report:
https://bugzilla.mozilla.org/show_bug.cgi?id=230096
I would like to know whether this bug has been fixed or not. Does anybody still have this issue?
Looks like some kind of UTF-8 issue. Either the mail is sent using UTF-8 (and interpreted as ANSI) or something similar happens when parsing the DOM or evaluating the link.
In general, though, you shouldn't put non-ASCII characters into URLs. Instead, escape them as %-encoded hex codes; space characters should likewise be replaced with %20 or +.
Most programs (like web browsers or, in this case, Outlook) accept spaces and other such characters, but you still shouldn't rely on that behaviour, as it might go wrong (as it did here).
Here is my HTML code:
<html>
<head>
    <meta charset="utf-8">
</head>
<body>
    <a id="test">test</a>
    <script>
        function buildMailTo(address, subject, body) {
            var strMail = 'mailto:' + encodeURIComponent(address)
                + '?subject=' + encodeURIComponent(subject)
                + '&body=' + encodeURIComponent(body);
            return strMail;
        }
        var strTest = buildMailTo('abc#xyz.com', 'Foo&foo', 'Chỉ sau 2/3 thời gian làm bài thi tốt nghiệp môn Toán, nhiều thí sinh đã ra khỏi phòng với gương mặt phấn khởi. Nhiều em tự tin sẽ được trên 8 điểm.');
        document.getElementById('test').href = strTest;
    </script>
</body>
</html>

Unable to read a path with special characters in FSRef

With this code I am trying to get the path in const char *pathPtr from fsRefAEDesc. It gives the correct name and path if there are no special characters in the name of the file referenced by fsRefAEDesc. Now, if the path has some special characters, /Users/XYZ/.rtf, I don't get a correct FSRef from AEGetDescData(). I believe it has something to do with the encoding; I tried some encodings but could not make it work.
FSRef fsRef;
//AEDesc fsRefAEDesc; // comes from somewhere else.
status = AEGetDescData(&fsRefAEDesc, (void*)&fsRef, sizeof(FSRef));
//OSErr result = FSMakeFSRefUnicode(&fsRef, 1024, (UniCharPtr)(&fsRef), kTextEncodingUnknown, &fileRef);
AEDisposeDesc(&fsRefAEDesc);
CFURLRef gotURLRef = CFURLCreateFromFSRef(NULL, &fsRef);
CFStringRef macPath = CFURLCopyFileSystemPath(gotURLRef, kCFURLPOSIXPathStyle);
const char *pathPtr = CFStringGetCStringPtr(macPath, CFStringGetSystemEncoding());
Is there any way to read such paths?
At what point in your code does the problem occur? For instance, if you insert CFShow(macPath), do you see the right path in the debug log? If so, then you are not passing the right encoding to CFStringGetCStringPtr. Use UTF-8.
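As a rough sketch of what "use UTF-8" means here if you stay on the CFURL route, a helper along these lines could replace the CFStringGetCStringPtr call (the helper name and buffer handling are illustrative, not part of the original code):

#include <CoreFoundation/CoreFoundation.h>

// 'macPath' is the CFStringRef returned by CFURLCopyFileSystemPath above.
// CFStringGetCStringPtr may return NULL even for valid strings, and the
// system encoding mangles non-ASCII file names, so convert explicitly.
static Boolean CopyPathAsUTF8(CFStringRef macPath, char *outBuf, CFIndex outSize)
{
    return CFStringGetCString(macPath, outBuf, outSize, kCFStringEncodingUTF8);
}

Calling it with a stack buffer of PATH_MAX bytes yields a path whose special characters print correctly.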
Also tried this for gotURLRef but I got the same on my console, i.e. "/Users/Manish/Desktop/\u27a4\u00a9\u261a.png"
The Unicode escape sequences are what you get when going through the CFURL calls; URLs have a very limited character range.
You can try FSRefMakePath. It will get you a UTF-8 encoded path from an FSRef.
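A minimal sketch of that suggestion, assuming the FSRef extracted from the AEDesc in the question (the buffer size and function name are illustrative):

#include <CoreServices/CoreServices.h>  // Carbon File Manager: FSRef, FSRefMakePath
#include <limits.h>                     // PATH_MAX
#include <stdio.h>

// Writes the POSIX path for 'fsRef' into a local buffer as UTF-8 and prints it.
static void PrintPathFromFSRef(const FSRef *fsRef)
{
    UInt8 path[PATH_MAX];
    OSStatus err = FSRefMakePath(fsRef, path, sizeof(path));
    if (err == noErr) {
        printf("UTF-8 path: %s\n", (const char *)path);   // special characters survive intact
    } else {
        fprintf(stderr, "FSRefMakePath failed: %d\n", (int)err);
    }
}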
