UTF-8 or windows-1252? - utf-8

I have Windows XP Home Edition with SP3 at home. At college, they have Windows 7. So when I saved my documents and brought them home, things got messed up. I was writing up a short bio.
I was coding my website, and so as usual I had used charset utf-8, the standard. But when I get home and I verify my website (locally), I see the weird characters appear! The triangle and the question mark inside it. So, then I'm like WTF? So I decide to go online and check which charset is better. So randomly, I fall onto windows-1252. Voila, it worked! But then, I decided to re-use charset utf-8, being the standard. I don't want to mess up my website lol.
So I basically go back inside my HTML document, just to notice that very weird characters appeared. So I delete them and replace them with the apostrophes that were originally there. Finally, I check my website, and the apostrophes appear correctly.
So, what the hell is going on??? And should I keep using utf-8?

It sounds like the content of the webpage is actually encoded as Windows-1252 by whatever editor you are using, but you are manually writing a <meta> tag that states UTF-8 instead. That would account for the behavior you describe. An explicit charset declaration must match the actual encoding used by the data. When you tell your editor to save the document, make sure it is saving the data in the correct encoding you are expecting. Some editors do support multiple encodings, so don't just blindly use a default encoding if multiple encodings are available.
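To see why a mismatch produces exactly these symptoms, here is a minimal Python sketch. The sample text is made up for illustration; the only assumption is that the apostrophe in the bio was a curly one (U+2019), which is what most word processors insert.

```python
# Minimal sketch: what a reader sees when the declared charset and the
# bytes actually saved on disk disagree. The sample text is hypothetical.
text = "It\u2019s"  # "It's" with a curly apostrophe (U+2019)

cp1252_bytes = text.encode("cp1252")  # what a Windows-1252 editor writes: b"It\x92s"
utf8_bytes = text.encode("utf-8")     # what a UTF-8 editor writes: b"It\xe2\x80\x99s"

# Saved as Windows-1252 but declared as UTF-8: the lone byte 0x92 is not
# valid UTF-8, so the decoder substitutes U+FFFD (the "question mark" glyph).
as_utf8 = cp1252_bytes.decode("utf-8", errors="replace")

# Saved as UTF-8 but declared as windows-1252: each of the three bytes is
# shown as its own character, giving the "very weird characters".
as_cp1252 = utf8_bytes.decode("cp1252")

print(repr(as_utf8))
print(repr(as_cp1252))
```

Either combination is fine as long as the declaration and the saved bytes agree; the garbage only shows up when they differ.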

Related

Is it possible to specify a URI to a pdf that points to a specific page?

I'm specifically curious about Windows, but answers about different OS are interesting too.
Afaik in URLs a specific PDF page can be indicated by adding a #page=<page number> fragment. According to the URI specification, fragments (using the #<fragment> syntax) and queries (using the ?<key>=<value> syntax) should be possible. However, URIs of the form file:///path_to_document.pdf#page=20 or file:///path_to_document.pdf?page=20 didn't work for me: Windows interprets the whole string as a path, which it then can't find.
Is there any way to accomplish this? I couldn't find anything online.
When you open a file through the operating system, system-specific rules apply. The call needs a certain syntax, with quoting for some characters, and that works whatever the default PDF handler may be.
That default handler may include a different page switch syntax such as exe -page ## filename.
When using a URL you need a URL handler, so this will work in Windows:
"C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe" "file:///C:/Apps/PDF/poppler/%2333.pdf#page=20"
The URL can be on a LAN server, and Windows will often fiddle with the usage of \ and / but will not always accept both. So the rule is "use quotes" AND replace system punctuation with safe characters AND ensure "Remember last page" is not active.
The use of # fragments for PDF navigation was introduced by Adobe for their Acrobat plug-in and has in part been adopted by some browser plug-ins. So some fragments work one way in Chrome but can have a different name or behaviour in Firefox, so beware which browser you use as default.
One example is #search=, which works well in Firefox but is totally different in Chromium. #:~:text= works in Chromium, but it does not work with PDFs, nor with this page!
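The "replace system punctuation with safe characters" step can be sketched in Python. The helper name pdf_page_url is hypothetical, and the sketch assumes the file in the Edge example above is literally named #33.pdf, which is why its URL contains %23.

```python
from urllib.parse import quote


def pdf_page_url(windows_path: str, page: int) -> str:
    """Build a file:// URL with a #page= fragment (hypothetical helper)."""
    # URLs use forward slashes, so convert the Windows separators first.
    posix_style = windows_path.replace("\\", "/")
    # Percent-encode everything except "/" and the drive colon, so a
    # literal "#" in the file name becomes %23 and is not read as a fragment.
    encoded = quote(posix_style, safe="/:")
    return f"file:///{encoded}#page={page}"


url = pdf_page_url(r"C:\Apps\PDF\poppler\#33.pdf", 20)
print(url)
```

The resulting string still has to be passed, in quotes, to a program that understands URLs (a browser, for example). Whether the #page= part is honoured depends on that program's PDF viewer.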

Laravel localization to German and special letters

I have a problem with some German special letters (ö, ü...) in my Laravel application.
My encoding is set to UTF-8.
Everything works fine with the content from the database (where the collation is utf8_general_ci). When I hardcode some text in Blade view files, that's fine, too. But I'm using localization files (/app/lang/de/myFile.php) with an associative array.
German characters from that array are displayed as � � �. What is strange is that when I var_dump(trans('myFile.key')) in Blade, the special characters work, but when I echo trans('myFile.key'), I get those question marks.
Any ideas?
OK, after a few hours :) I succeeded! The point is to save the localization file in UTF-8 encoding. Sublime Text saved it as Windows-1250 by default.
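A quick way to catch this before it reaches the browser is to check whether a file's bytes decode as UTF-8 at all. The following Python sketch uses a hypothetical helper name and a made-up localization line:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A made-up line such as might appear in /app/lang/de/myFile.php
line = "'greeting' => 'Schöne Grüße',"

saved_as_utf8 = line.encode("utf-8")      # what the editor should write
saved_as_cp1250 = line.encode("cp1250")   # what a Windows-1250 default writes

print(is_valid_utf8(saved_as_utf8))
print(is_valid_utf8(saved_as_cp1250))
```

In practice you would read the real file with open(path, "rb").read() and pass the result to the check. A file containing only ASCII passes in either encoding, so the check only flags files that actually contain characters like ö or ü.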
I have had a similar problem with one of my projects; not sure if it is related though. Different web browsers handle locale and translations differently. Once in Firefox £ did not work properly unless you changed the browser's locale to UK. However, &#163; worked universally. Opera seemed to work across the board, but IE and Firefox had strange behavior when trying to use characters or symbols not native to the locale the browser was in.
First thing I would attempt is to change your browser's language and see if that fixes it. If so then the issue will be on how the browser is interpreting what gets returned. If this fixes it then there is a strong chance this will not be an issue for German speakers because their browser will (should) already support the language.
If not, then the problem could lie in the encoding of the files on the server. If the files are being stored on the web server as ANSI, then that could supersede the output. We had this problem as well due to an NFS mount and some Windows users/editors. The most failsafe method I could suggest is changing ö and ü to &#246; and &#252;, but I can understand that this would get tedious.

odd issue with funny characters in Joomla/ Jomsocial

I hope someone can help me with this issue.
For a few months (since last August) there has been an ongoing issue on my site with strange characters appearing all over the place - especially in user generated content.
I have searched and searched for answers but nothing ever seems to work, although the most pressing case (in the blog component) has been resolved by setting JCE to validate HTML, which it does fine in the blogging component (EasyBlog) but not anywhere else (where it is less critical but still an issue).
Here is what I have done so far:
Checked the site from multiple machines, multiple browsers - no difference.
Checked the MySQL database and table collation - which are utf8_general_ci
Added AddDefaultCharset UTF-8 and AddCharset UTF-8 .php to the .htaccess files. I played about with these for ages and these two seemed to be the only combination which didn't crash the site.
Have checked the HTML headers and they definitely have the correct content encoding types (set to UTF-8)
I have tried different WYSIWYG editors to no avail. Besides, it is often in the code output where the characters appear, typically an Â next to a »
I have tried a hack to force the connection script to UTF-8 but this causes the site to crash.
If anyone has any ideas at all as to what I can do still ... I'm all ears (please)
Many thanks in advance
If your server is running PHP 5.4+ I would suggest that you try the following solution described in the JCE forums:
In the Editor Global Configuration, set "Entity Encoding" to "UTF-8"
In the "Custom Configuration Variables" field, add:
keep_nbsp:0
Then keep an eye out for the JCE 2.3.2 release, which will address this issue.
Things to note:
Anywhere the spurious â or Â occurs, the content will have to be edited to remove the characters (once the changes above have been applied to JCE).
The problem is Joomla! 2.5.x's use of get_html_translation_table(), which relies on default values. PHP 5.4 changed the default encoding parameter to UTF-8; previously it defaulted to ISO-8859-1.
For the core you could try to modify _decode() in /libraries/joomla/filter/input.php. Look for the line (around line 644):
$trans_tbl = get_html_translation_table(HTML_ENTITIES);
and change it to:
$trans_tbl = get_html_translation_table(HTML_ENTITIES, ENT_COMPAT, 'ISO-8859-1');
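Why the stray character shows up next to a » in particular can be illustrated with a short Python sketch (Python rather than PHP, purely for illustration). It only demonstrates the general UTF-8 versus ISO-8859-1 mix-up and is not a trace of what Joomla or JCE do internally.

```python
# Illustration only: how "»" picks up a stray "Â" when its UTF-8 bytes
# are read as ISO-8859-1, and how that particular damage can be reversed.
original = "\u00bb"  # »

utf8_bytes = original.encode("utf-8")      # two bytes: 0xC2 0xBB
garbled = utf8_bytes.decode("iso-8859-1")  # each byte becomes one character: "Â»"

# Reversing the mistaken step recovers the original, provided the text
# has not been altered in between.
repaired = garbled.encode("iso-8859-1").decode("utf-8")

print(repr(garbled))
print(repr(repaired))
```

Content that has already been stored in the garbled form is not fixed by changing the configuration, which is why the affected items have to be edited afterwards.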

Joomla adds an empty space in some components

I have a problem I've noticed a while ago with a site I'm building.
I've been working with Joomla for a while now and I have never encountered such a problem.
On some components, like featured content, search, hwdvideoshare and some more, quotation marks seem to be added at the very top of the main content area. This causes an extra empty space that pushes the content down.
It's not really acceptable since I am designing a layout that has to be very precise.
Hopefully you guys can help me, I have tried everything.
Open the "index.php" file in the template with Notepad++. Then from the Encoding menu choose "Convert to UTF8 without BOM", save and reload.
The issue was UTF8 with BOM.
However, it wasn't on the template file.
I started looking for what's in common in all the sites that added the empty space.
It was pagination. I converted the pagination file to UTF8 without BOM and it works flawlessly now.
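If you would rather hunt for the offending file than guess, the BOM is easy to detect: it is the three bytes EF BB BF at the very start of the file, and in a PHP file they are sent to the browser as output before any code runs. A minimal Python sketch, with hypothetical helper names and a made-up file body:

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM, if one is present."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


# Made-up file content, as an editor saving "UTF-8 with BOM" would write it
with_bom = codecs.BOM_UTF8 + b"<?php echo 'pagination'; ?>"

print(has_utf8_bom(with_bom))
print(strip_utf8_bom(with_bom))
```

To scan a real template, read each file with open(path, "rb").read() and run has_utf8_bom on the result; every file that gets included needs checking, not just index.php.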

how to debug vb6 richtextbox not showing unicode (chinese) properly

I have a simple VB6 editor-type application which has a RichTextBox as the editor page. It allows users to key in stuff and then stores it in a file, which keeps all the text as RTF stored as CDATA in XML.
When you load the file back, it reads it from the XML and loads the RTF back. We allow for Unicode editing, but my problem is that I have a user on Windows XP who has problems reading the Chinese characters. They show up as gibberish on their PC.
It displays fine in both mine and a coworker's. I've already checked that they have the proper regional language and settings in their system. The install files for east asian language is already checked. And they can see chinese words on websites and even type them out.
I feel like I'm missing something here, but I'm at a loss as to what to check next. Any ideas on what I could test or check?
my bad for the poor description skills, if anything is not clear just ask me.
thanks.
~steve
That is weird. Try confirming that your user has the same version of RICHTXT32.OCX.
Could be a problem with font?
Try using font that supports unicode characters (Arial Unicode).
Or try going to a website with chinese characters and paste it into richtextbox, save it to a file and try loading it from the file.
Does that work?
well they should because i packed the app in vs installer setup package.
and for fonts, it's sim sun, and i've already checked with the users that they do have the sim sun fonts under window/fonts.
Btw i've already updated that the data is actually stored in xml under CDATA, although the rtf chunk is kept as it is.
okie, this seems to be the solution although i don't know why. in my msi setup file i had included the riched.dll, so when i installed the app, that dll acted up and screwed up the chinese characters in the richtext control.
but when i repack to exclude that dll file and reinstall using that setup, it seems to work now...
