I'm having problems with the Google Search Appliance, file shares, and German special characters like "ä, ö, ü" on the results page.
I did a lot of googling, and on Stack Overflow I could only find one related question, which wasn't answered.
We are working with Internet Explorer 8, but we face the same problem with Internet Explorer 10.
As long as there are none of those special characters (ä, ö, ü, etc.), the URL is rendered as
a href="file://///group.server.ch/Directory/"
with escaped URLs and works fine.
If it's an intranet link that starts with "http://", it works with those special characters too.
For example, in the word "Ablösung", "ö" is escaped as %C3%B6, which is UTF-8:
In an HTTP link, this works (IE shows it as "Abl%C3%B6sung", but it works).
In a file:////-link, it's rendered wrongly (IE shows it as "Abl¶sung", and it doesn't work).
If I copy and save the result page, edit the HTML, and change the escaped character to ISO 8859-1, where "ö" is %F6 ("Abl%F6sung"), it works fine (shown as "Ablösung").
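The two escapings can be reproduced with Python's urllib.parse.quote (a sketch for illustration, not GSA code):

```python
from urllib.parse import quote

# "ö" percent-encoded as UTF-8 (what the GSA emits): two bytes, 0xC3 0xB6
utf8_escaped = quote("ö", encoding="utf-8")
# "ö" percent-encoded as ISO 8859-1 / Latin-1 (what IE accepts in the
# file:// link): a single byte, 0xF6
latin1_escaped = quote("ö", encoding="iso-8859-1")

print(utf8_escaped)    # %C3%B6
print(latin1_escaped)  # %F6
```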
The problem now is that I'm not able to change the encoding to ISO 8859-1 in the Google Search Appliance.
I tried the following:
In the XSLT there is the part where the "file://///" prefix of the URL is concatenated:
<xsl:when test="$protocol='nfs' or $protocol='smb'">
<xsl:value-of disable-output-escaping='yes' select="concat('file://///',$temp_url)"/>
</xsl:when>
Changed "disable-output-escaping" to "no"
Changed "file://///" to "file://"
Set the "oe" parameter in the search query (which should change the encoding of the result) to "latin-1" or "ISO-8859-1"
None of this changed anything in the result encoding.
Now my questions:
Am I doing something wrong in the configuration?
Are there other options to change the encoding for file-share links?
Or was the URL already crawled and stored in a "wrong" way (with %C3%B6 ...), so that I have to change something about the crawling? (Although I couldn't find many relevant settings there.)
Or is there a setting in Internet Explorer to interpret the UTF-8 as ISO-8859-1?
Since there must be many users with those Latin characters using the GSA, I can't imagine I'm the only one with this problem.
Any suggestions?
The issue is most likely this one:
http://support.microsoft.com/kb/941052
The default front end deals with this problem by invoking the function fixFileLinks(), which converts the link URL to a Unicode string.
If you are developing your own front end, you need to run the same function to make sure the file:// links are clickable.
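I don't have the source of fixFileLinks() at hand, but conceptually it decodes the percent-encoded UTF-8 bytes back into a plain Unicode string. A rough Python sketch of that step (the function name and sample URL here are illustrative, not the GSA's):

```python
from urllib.parse import unquote

def fix_file_link(url: str) -> str:
    """Decode %XX escapes as UTF-8, yielding a plain Unicode URL
    that IE can follow as a file:// link."""
    return unquote(url, encoding="utf-8")

print(fix_file_link("file://///group.server.ch/Directory/Abl%C3%B6sung"))
# file://///group.server.ch/Directory/Ablösung
```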
Related
I would like to include the contents of a UTF-8 text file in an MS Word document as a link. This works for an ANSI-encoded file using the field:
{INCLUDETEXT "path\file.txt" \c ansitext \* MERGEFORMAT}
Is there a directive akin to \c ansitext for UTF-8 files? \c utf8 and \c utf8text do not appear to work.
If I give no directive, Word recognizes that the file is UTF-8, but a dialog pops up requiring me to confirm this each time the file is updated, which I want to avoid.
There is a directive (\c Unicode), but unfortunately using it does not actually eliminate the character-encoding pop-up, even when the Unicode text starts with a BOM (byte order mark), which is in any case discouraged by Unicode for UTF-8.
So although that answers the question as asked, it doesn't solve the problem. Nor, according to the discussion in the comments on the question, would any of the following solve the problem for the OP, but they might help others.
According to the ISO 29500 standard that describes .docx documents, INCLUDETEXT is supposed to have an \e switch that lets you specify an encoding. But, according to Microsoft's standard document [MS-OI29500].pdf, Word ignores any \e switch.
As far as I am aware the only way to avoid that pop-up when the included text is in Unicode format (UTF-8) is to set a value in the Windows Registry that tells Word the default encoding for text files.
The problem with that is that the setting affects all text files opened by Word, whether through the file-open dialog or an INCLUDETEXT.
To create the setting, navigate to the appropriate registry location. For Word 2016/2019 it is
HKEY_CURRENT_USER\Software\Microsoft\Office\16.0\Word\Options
and for Word 2010 it would be
HKEY_CURRENT_USER\Software\Microsoft\Office\14.0\Word\Options
Then add a DWORD value called DefaultCPG and set its value to the code page you want to be the default. For UTF-8, that's decimal 65001.
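As a .reg file this would look like the following (Word 2016/2019 path; use 14.0 for Word 2010; 65001 decimal is FDE9 hex). Treat it as a sketch and test it on a non-critical profile first, since it changes Word's default for all text files:

```
Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\Software\Microsoft\Office\16.0\Word\Options]
"DefaultCPG"=dword:0000fde9
```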
If you have control over the format of the file to be included, you could consider using a format that doesn't trigger the encoding pop-up. That leads to another set of problems: e.g. if you used HTML, you would probably have to deal with HTML special characters such as &, with whitespace, and with RTL characters (which Word seems to reverse). But the following HTML "framework" is enough to insert a text chunk without additional paragraph marks and so on:
<html>
<meta charset="UTF-8">
<body>
<a name="x">your text</a>
</body>
</html>
In the INCLUDETEXT field, you then use the "x" to indicate the subset you want to include, e.g.
{INCLUDETEXT "path\file.htm" x \c HTML}
The <a name="something"> markup is deprecated in HTML5, but Word only understands the earlier HTML convention.
I have just installed CKEditor on a form that submits data to a database. When I use an apostrophe ('), it is displayed as &#39; on my web page instead.
How can I get it to display as an apostrophe?
What's happening is that somewhere along the line, CKEditor (or maybe another part of the system) is converting characters that could cause problems (because they have special meaning in HTML) into their HTML entity representations.
This is normal behaviour, and if you don't need to do any string manipulation inside your database you can happily leave it as is at that stage. Indeed, you can have entities mixed in with normal HTML text and it should render just fine.
Clearly your setup is sufficiently different that this isn't happening. So you'll want to use something like PHP's html_entity_decode() to convert back to normal unescaped text; there should be an equivalent function in any language with a half-decent standard library.
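To illustrate the decode step, here is the equivalent in Python's standard library (the sample string is hypothetical, just showing the &#39; entity from the question):

```python
import html

stored = "It&#39;s a test"     # what ended up in the database
decoded = html.unescape(stored)  # convert HTML entities back to characters
print(decoded)  # It's a test
```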
I am having trouble getting the Latin character ä to display correctly in a PDF generated from ColdFusion code: I get Ã¤ instead of ä. I am setting the cfprocessingdirective to UTF-8. If I hardcode ä, it displays correctly in the PDF, so the PDF can handle the character.
Most posts attribute this kind of mojibake to an encoding mismatch, but I can't see where there is any mismatch. The value used as input to the PDF document comes from a ColdFusion form that takes user input, and the form tag in the CF code has this attribute set:
accept-charset="utf-8"
On the input form, and in the cfm file handling the submit, the processing directive is set to UTF-8. Does anyone have any suggestions?
I did end up solving my issue. It was obviously an encoding mismatch after all. Because the user-entered data is passed to another cfm page via JavaScript, I guessed I was getting a one-byte character representation, which cannot represent my Latin-1 character, so I played around with the CharsetDecode and CharsetEncode methods. I decoded the user string using windows-1252 and then encoded it using UTF-8, and voilà, no more issues. I'm not really happy with my solution, but after a couple of days of playing with it I was glad to finally "beat" it.
I'm having a bad time trying to implement a simple Pin It button.
I followed the whole process and it works fine, with one exception: the media parameter is being removed from the anchor tag.
This means the Pin It button opens a window showing all the images from that page, and the user has to select one.
The source is OK:
<img src="//assets.pinterest.com/images/pidgets/pinit_fg_en_rect_white_20.png" />
But when the page is loaded, pinit.js replaces the parameters.
I tried to find a solution on the web and read something about URL encoding; I tried UTF-8 and ISO-8859-1, but without success.
The rendered html is:
<span class="PIN_1395089773564_hidden" id="PIN_1395089773564_pin_count_0"><i></i></span>
The media parameter is there, but empty.
Thanks for your time,
William Borgo.
I believe the problem is actually in your url parameter: it cannot contain unencoded hashes or additional query parameters. If you delete ?idItem=6920 from the url it will probably work.
I think your URL encoding is incorrect, which confuses Pinterest about what is part of the Pinterest URL and what is part of one of the parameters: where each parameter begins and ends, and what is a separate parameter for Pinterest versus a continuation of the previous one. (This is really the purpose of URL-encoding parameters.)
That is, the overall Pinterest URL should be like:
www.pinterest.com/pin/create/button/?url=[url]&media=[media]&description=[description]
The "&" separating the url, media, and description parameters should NOT be encoded. But each of the parameters themselves (the parts in [brackets]) SHOULD be encoded.
So for instance:
https://www.pinterest.com/pin/create/button/?url=http%3A%2F%2Fwww.tokstok.com.br%2Fvitrine%2Fproduto.jsf%3FidItem%3D121826&media=http%3A%2F%2Fwww.tokstok.com.br%2Fpnv%2F570%2Fc%2Fconnmlt_czbr1.jpg&description=CONNECTION%20MESA%20PARA%20LAPTOP
...which you could look at like this (with line breaks between parameters and some spacing):
https://www.pinterest.com/pin/create/button/
?url = http%3A%2F%2Fwww.tokstok.com.br%2Fvitrine%2Fproduto.jsf%3FidItem%3D121826
&media = http%3A%2F%2Fwww.tokstok.com.br%2Fpnv%2F570%2Fc%2Fconnmlt_czbr1.jpg
&description = CONNECTION%20MESA%20PARA%20LAPTOP
(Note: the URL you gave seems not to be active any more, so I grabbed another product from the site.)
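Building that URL programmatically makes the encoding rules automatic: each value is percent-encoded on its own, while the separating "&" and "=" are left alone. A sketch in Python, using the product URLs from the example above:

```python
from urllib.parse import urlencode, quote

params = {
    "url": "http://www.tokstok.com.br/vitrine/produto.jsf?idItem=121826",
    "media": "http://www.tokstok.com.br/pnv/570/c/connmlt_czbr1.jpg",
    "description": "CONNECTION MESA PARA LAPTOP",
}
# quote_via=quote percent-encodes each value (spaces as %20, "/" as %2F)
# but leaves the "&" and "=" that join the parameters untouched.
share_url = ("https://www.pinterest.com/pin/create/button/?"
             + urlencode(params, quote_via=quote))
print(share_url)
```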
In the following code, when postSummary.SEOFriendlyTitleInURL contains Chinese characters, those characters are encoded in the URL.
@Html.ActionLink(
postSummary.Title,
"View",
new
{
id = postSummary.Id,
friendlyTitle = postSummary.SEOFriendlyTitleInURL
})
Although Google Chrome and Firefox show that URL with the original characters, IE shows it as an encoded string. I want to prevent the default encoding behavior of the ActionLink method, because I can type those characters unencoded directly into the address bar, so I think they are legal in the URL.
I could simply construct the link manually, but it would be better to have it generated, for consistency:
@postSummary.Title
Edit:
My current solution: instead of preventing the framework from encoding just the non-ASCII characters, I tell it not to encode any characters, by combining Html.ActionLink with the Server.UrlDecode method. Any characters that really do need to be percent-encoded can only appear in the "friendlyTitle" fragment. Because that fragment exists only for readability, I replace such characters with a dash.
The replaced characters include:
the reserved characters in a URI, see https://www.rfc-editor.org/rfc/rfc3986#section-2.2
single and double quotes
the tab character
Still, non-ASCII characters should be percent-encoded at some point, because they are not valid in a URI, and it is better done when the URL is generated. But watching Fiddler while requesting a page through a URL that contains Chinese characters, it seems the URL is encoded automatically (presumably by the web browser). For readability, I choose to let the web browser do the encoding.
There is no difference between the browsers in what is generated in your <a> tag; the only difference is in how each browser displays it. There's nothing to alter here; Html.ActionLink() is correctly encoding (and not encoding) as appropriate.
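What the address bar does under the hood can be illustrated in Python (a sketch; the Chinese sample string is hypothetical):

```python
from urllib.parse import quote, unquote

friendly_title = "中文标题"        # hypothetical SEO-friendly title
encoded = quote(friendly_title)    # what is actually sent on the wire
print(encoded)                     # %E4%B8%AD%E6%96%87%E6%A0%87%E9%A2%98
print(unquote(encoded))            # 中文标题 (what some browsers display)
```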