I tried to optimize a Google Web Fonts query to include the basic Latin set plus some Latin Extended characters vital for my native language (Czech).
https://developers.google.com/webfonts/docs/getting_started?hl=cs#Quick_Start
The link above states that I can modify the query to include only some characters, which makes it significantly lighter. So I tried these characters:
aábcčdďeéěfghchiíjklmnňoópqrřsštťuúůvwxyýzžAÁBCČDĎEÉĚFGHChIÍJKLMNŇOÓPQRŘSŠTŤUÚŮVWXYÝZŽ.,?!;/-_:"'|()[]ˇ+*##$%^&¨®°©
And the query looks like this (all the "unusual" characters have to be URL-escaped):
http://fonts.googleapis.com/css?family=Open+Sans:300&%20a%C3%A1bc%C4%8Dd%C4%8Fe%C3%A9%C4%9Bfghchi%C3%ADjklmn%C5%88o%C3%B3pqr%C5%99s%C5%A1t%C5%A5u%C3%BA%C5%AFvwxy%C3%BDz%C5%BEA%C3%81BC%C4%8CD%C4%8EE%C3%89%C4%9AFGHChI%C3%8DJKLMN%C5%87O%C3%93PQR%C5%98S%C5%A0T%C5%A4U%C3%9A%C5%AEVWXY%C3%9DZ%C5%BD.,#$%^&¨®°©
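For reference, percent-encoding like the above can be generated programmatically. Below is a minimal Java sketch; it assumes the text= parameter described in the linked documentation, and the character list is abbreviated compared to the full set above:

    import java.net.URLEncoder;
    import java.nio.charset.StandardCharsets;

    public class FontSubsetUrl {
        public static void main(String[] args) {
            // Characters the stylesheet should cover (Czech letters plus some punctuation;
            // abbreviated compared to the full list in the question).
            String chars = "aábcčdďeéěfghiíjklmnňoópqrřsštťuúůvwxyýzž"
                    + "AÁBCČDĎEÉĚFGHIÍJKLMNŇOÓPQRŘSŠTŤUÚŮVWXYÝZŽ.,?!;:\"'()";
            // Percent-encode the characters for use in the query string.
            String encoded = URLEncoder.encode(chars, StandardCharsets.UTF_8);
            // The documented way to request a character subset is the text= parameter.
            String url = "http://fonts.googleapis.com/css?family=Open+Sans:300&text=" + encoded;
            System.out.println(url);
        }
    }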
The final result looks normal and it is only 23 KB instead of the standard 45 KB (with the full Latin Extended charset). The problem is that on some computers some characters are not loaded properly; they are rendered in Arial (for example "Ě" in the word "ODPOVĚDI"). Can anyone tell me where the problem could be, or how I could trace it the next time I see it? Or is it just because this feature is in beta in Google Web Fonts?
This means that the font you are using simply doesn't support your chosen characters. I'm facing this problem and trying to find a solution, but so far without results.
It sounds very much like a beta “feature” (i.e., bug). Generally, beta software is something that you should use only to contribute to testing and improving software in development, so you should report this bug and refrain from using beta software in production.
The bug may relate to different font formats served to different browsers by Google. This may well explain why it works on some browsers and not on others.
The difference between 23 KB and 45 KB is practically negligible these days. A single image often has a greater impact on loading time, and commonly used JavaScript libraries can weigh hundreds of kilobytes.
You have to choose the Latin Extended option for the font to support your language's character set. There is an option on the Google Fonts site to filter the available fonts to only those with Latin Extended support.
The question may be a bit esoteric; however, I wouldn't have posted it had I not found so many clues leading nowhere.
A cloud-based web app works normally for everyone except a few people (possibly on the same network). There seems to be some kind of text injected in random places (note that the text disappears after a page refresh and reappears at random intervals), as seen in the picture:
Facts:
"Zrkadlovka na čiernom pozadí" is what is being injected. There is no icon or something like that in its position. The text fields we use are basic vuetify components.
The above is in Slovak language. The web app is not. (The string means literally "a camera on black background" -> there is no icon/image/anything that should remotely convey this information)
The string is not found anywhere in the code.
The string is not found anywhere in the build of the app.
As if it weren't weird enough already, if you Google the mysterious string you get a ton of results. Some have the text embedded in a <span> or similar, sometimes with a class such as "wixGuard", but nothing like this is present in our code or in the build.
All the websites found with that string on Google look suspicious, to say the least, which leads me to the idea that it is caused by some kind of malware, either server-side (the websites found on Google) or client-side (the person viewing our website). Our website runs in the cloud and is certainly secure enough not to be spoofed. There are no similarities among the websites on Google, neither with our web app nor with each other.
How would one even begin debugging this?
At least for some of those websites, a zero-width space (U+200B) is present in other languages where this strange string appears in Slovak.
So it looks like some translation(?) engine incorrectly translates the zero-width space character into "Zrkadlovka na čiernom pozadí", or for some other reason this invisible character is being replaced with that string.
I would start by checking whether the same is true for you (i.e., whether you have a zero-width space in the affected strings), but a full analysis might be difficult without some internal details of your stack/packages and how the page is generated.
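If you want to check for the invisible character programmatically, something along these lines works. A minimal Java sketch with a made-up sample string; in the browser you could do the equivalent check with a regex over the rendered text:

    public class FindZwsp {
        // Reports the position of any zero-width space (U+200B) in a string.
        static void report(String label, String text) {
            for (int i = 0; i < text.length(); i++) {
                if (text.charAt(i) == '\u200B') {
                    System.out.println(label + ": zero-width space at index " + i);
                }
            }
        }

        public static void main(String[] args) {
            // Hypothetical placeholder text with a trailing ZWSP, as in the pages above.
            report("placeholder", "Search\u200B");
        }
    }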
Update: I don't know whether you can use Google Translate (or other tools) to automatically translate your text with Vuetify, but seeing how Google Translate handles the above text (with a ZWSP at the end), it must be something like that.
As far as I know, PC-DOS 3.3 and MS-DOS 3.3 were both released in 1987, and they came with several code pages (850, 860, 863, 865).
Does that mean a user could write text using Portuguese (cp860) and, say, Nordic (cp865) characters in one file?
Or was it something like one code page per operating system? For example, PC-DOS from Portugal had only code page 860 and the user could only use characters from that code page, while PC-DOS from Scandinavia had only code page 865.
The same question applies to Windows: starting from which version did it support multilingual text documents?
DOS had no real knowledge of code pages. Strings were just raw byte sequences (zero- or dollar-terminated).
Code pages were used mostly for display: changing the code page changes how a given byte value is drawn on screen.
What you describe here is a frequent problem: mixed encodings in one text. If you are old enough, you will remember plenty of such problems on the web. A text file carries no tag or metadata about its code page. If you mix encodings, you simply see the characters according to the active code page; change the screen's code page and you get a new interpretation of the same bytes.
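To make that concrete, here is a small Java sketch that interprets the same bytes under two DOS code pages (the byte values are arbitrary examples from the upper half of the table; availability of the IBM860/IBM865 charsets depends on the JDK's extended charset support):

    import java.nio.charset.Charset;

    public class CodePageDemo {
        public static void main(String[] args) {
            // A few byte values above 0x7F, where the code pages differ.
            byte[] raw = { (byte) 0x86, (byte) 0x87, (byte) 0x94 };

            // The same bytes, decoded under two different DOS code pages,
            // produce different accented letters.
            System.out.println(new String(raw, Charset.forName("IBM860"))); // Portuguese
            System.out.println(new String(raw, Charset.forName("IBM865"))); // Nordic
        }
    }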
You can do anything you want in your own file; the problem is communicating to others how to read it.
So, no, not really. Using more than one character encoding in a file and calling it a text file would be more trouble than it's worth.
The settings of an operating system do not have a direct relationship to the contents of a file. Programs that exchange files between systems (such as over the Internet) might use knowledge of the source character encoding and a local setting for character encoding and perform a lossy transcoding.
Nothing has changed, except that with the advent of Unicode more than 25 years ago, more scripts than you can imagine are available in one character set. So, if there is any transcoding to be done, ideally it would only be to UTF-8.
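If you need to rescue such a file today, that transcoding is straightforward as long as you know (or can guess) the original code page. A minimal Java sketch, assuming a cp860-encoded input file and made-up file names:

    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class ToUtf8 {
        public static void main(String[] args) throws Exception {
            // Hypothetical file names; adjust to your own paths.
            Path in = Paths.get("legacy-cp860.txt");
            Path out = Paths.get("converted-utf8.txt");

            // Decode the bytes under the assumed DOS code page...
            String text = new String(Files.readAllBytes(in), Charset.forName("IBM860"));
            // ...and re-encode as UTF-8. If the guess about the code page is wrong,
            // the result will contain the wrong accented letters (mojibake).
            Files.write(out, text.getBytes(StandardCharsets.UTF_8));
        }
    }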
I am looking for a (preferably command-line) utility to stamp/watermark Unicode text onto a PDF document.
I tried PDF Stamp and a couple of others that I found over the net, but to no avail with Greek characters (e.g. ΓΔΘΛ become ÃÄÈË).
Many thanks for any help!
With sufficiently "odd" characters, you generally need to specify a font and an encoding. I suspect that at least one of the tools you experimented with have the capability to define such things.
Reading their docs, it looks like PDFStamp will let you specify a font, but not an encoding. That doesn't bode well. It might always pick "Identity-H" for system fonts... worth trying.
I must admit, I'm surprised. "Disappointed" even. Have you contacted their email support?
Once upon a time, iText shipped with a number of command-line tools that were mostly intended as examples but were nonetheless useful. I suspect you could dig them out of the SVN archive on SourceForge and get them to build again, if your Java-fu is up to the task. Just be sure to use BaseFont.IDENTITY_H whenever you're given a choice of encodings for a font.
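For reference, a rough sketch of the same approach using the iText 5 API directly (package names differ between iText versions, the font path and page coordinates are placeholders, and the TTF must actually contain Greek glyphs):

    import java.io.FileOutputStream;

    import com.itextpdf.text.Element;
    import com.itextpdf.text.pdf.BaseFont;
    import com.itextpdf.text.pdf.PdfContentByte;
    import com.itextpdf.text.pdf.PdfReader;
    import com.itextpdf.text.pdf.PdfStamper;

    public class StampGreek {
        public static void main(String[] args) throws Exception {
            PdfReader reader = new PdfReader("input.pdf");
            PdfStamper stamper = new PdfStamper(reader, new FileOutputStream("stamped.pdf"));

            // IDENTITY_H plus an embedded font is the key to getting non-Latin text right.
            BaseFont font = BaseFont.createFont("/path/to/DejaVuSans.ttf",
                    BaseFont.IDENTITY_H, BaseFont.EMBEDDED);

            for (int page = 1; page <= reader.getNumberOfPages(); page++) {
                PdfContentByte canvas = stamper.getOverContent(page);
                canvas.beginText();
                canvas.setFontAndSize(font, 36);
                // Diagonal stamp roughly in the middle of an A4 page.
                canvas.showTextAligned(Element.ALIGN_CENTER, "ΓΔΘΛ", 297, 421, 45);
                canvas.endText();
            }

            stamper.close();
            reader.close();
        }
    }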
Normally I use Recaptcha for all captcha purposes, but now I'm building a website that is translated into Chinese and Japanese, among other languages. I'd like to make the captcha as accessible to those users as possible. Even if they can read and type English characters (which is not necessarily the case), oftentimes even I, as an English speaker, have had trouble figuring out what the word in Recaptcha is supposed to be.
One good solution I've seen (from Google) is to use numbers instead of text. Are there other good solutions? Is there a reliable free captcha service out there such as Recaptcha that offers this option?
Chinese and Japanese users both use keyboards with Latin characters on them. The Chinese input their thousands of characters via Pinyin (romanized Chinese), so they are very familiar with the same letters that you and I are. Therefore, whatever you are using for English-speaking users can also be used for them.
PS - I know this is an answer to an old post, but I'm hoping this answer will help anyone who comes here with the same question.
I have encountered the same problem in the past, and I resolved it by using the following CAPTCHA, which uses numerical validation:
http://www.tipstricks.org/
However, this may not be the best solution for you, so here is an extensive list of different CAPTCHAs you might want to consider (most of them are text based, but some use alternative methods such as numerical expressions):
http://captcha.org/
Hope this helps
What are the steps to develop a multilingual web application?
Should I store the language texts and resources in a database, or should I use property files or resource files?
I understand that I need to use CurrentCulture with C#, along with CultureFormat, etc.
I wanted to know your opinions on the steps to build a multilingual web application.
It doesn't have to be language-specific; I'm just looking for the steps to build this.
The specific mechanisms are different depending on the platform you are developing on.
As a cursory set of work items:
Separation of code from content. Generally, resources are compiled into assemblies with the help of resource files (in .NET), or stored in property files (in Java, though there are other options), or in some other location, and referred to by ID (see the sketch after this list). If you want localization costs to remain reasonable, you need to avoid changing the IDs between releases, as most localization tools will treat new IDs as new content.
Identification of areas in the application which make assumptions about the locale of the user, especially date/time, currency, number formatting or input.
Create some mechanism for locale-specific CSS content; not all fonts work for all languages, and not all font sizes are sane for all languages. Don't paint yourself into a corner by forcing Thai text to be displayed at 8 pt. Also, text directionality is going to be right-to-left for at least two languages.
Design your page content to reflow or resize reasonably when more or less content than you expect is present. Many languages expand 50-80% from English for short strings, and 30-40% for longer pieces of content (that's a rough rule of thumb, not a law).
Identify cultural presumptions made by your UI designers, and try to make them more neutral, or, if you've got money and sanity to burn, localizable. Mailboxes don't look the same everywhere, hand gestures aren't universal, and something that's cute or clever or relies on a visual pun won't necessarily travel well.
Choose appropriate encodings for your supported languages. It's now reasonable to use UTF-8 for all content that's sent to web browsers, regardless of language.
Choose appropriate collation for your databases, or enable alternate collations, if you are dealing with content in multiple languages in your databases. Case-insensitivity works differently in many languages than it does in English, and accent insensitivity is acceptable in some languages and generally inappropriate in others.
Don't assume words are delimited by spaces or that sentences are delimited by punctuation, if you're trying to support search.
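To make the first two items concrete, here is a minimal Java sketch (the bundle name "Messages" and the key "welcome.title" are made-up examples, and the corresponding Messages_*.properties files are assumed to exist; in .NET the equivalent pieces would be .resx resources and CultureInfo):

    import java.text.NumberFormat;
    import java.time.LocalDate;
    import java.time.format.DateTimeFormatter;
    import java.time.format.FormatStyle;
    import java.util.Locale;
    import java.util.ResourceBundle;

    public class LocalizedGreeting {
        public static void main(String[] args) {
            Locale locale = new Locale("cs", "CZ"); // e.g. taken from the user's request

            // Content lives in Messages.properties / Messages_cs_CZ.properties
            // and is looked up by a stable ID, never by the English text itself.
            ResourceBundle messages = ResourceBundle.getBundle("Messages", locale);
            System.out.println(messages.getString("welcome.title"));

            // Dates, numbers and currency are formatted per locale, not hard-coded.
            System.out.println(
                    DateTimeFormatter.ofLocalizedDate(FormatStyle.LONG).withLocale(locale)
                            .format(LocalDate.now()));
            System.out.println(NumberFormat.getCurrencyInstance(locale).format(1234.56));
        }
    }

The collation point translates to code in a similar way (java.text.Collator obtained for the target locale), and in the database to choosing an appropriate collation per column or query.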
Avoid:
Storing localized content in databases, unless there's a really, really, good reason. And then, think again. If you have content that is somewhat dynamic and representatives of each region need to customize it, it may be reasonable to store certain categories of content with an associated locale ID.
Trying to be clever with string concatenation. Also, try not to assume that rules about pluralization or counting work the same for every culture. Make sure, at least, that the order of strings (and controls) can be specified with format strings that are typical of your platform (see the sketch after this list), or well documented in your localization kit if you elect to roll your own for some reason.
Presuming that it's ok for code bugs to be fixed by localizers. That's generally not reasonable, at least if you want to deliver your product within a reasonable time at a reasonable cost; it's sometimes not even possible.
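Regarding the format-string point above, a small sketch of the pattern-based approach (java.text.MessageFormat; the pattern strings are invented examples, and real patterns would live in the resource bundle so translators can reorder placeholders per language):

    import java.text.MessageFormat;
    import java.util.Locale;

    public class FormatExample {
        public static void main(String[] args) {
            // The pattern would normally come from the resource bundle, so translators
            // can reorder the placeholders for their language.
            String pattern = "{0} added {1} photos to the album \"{2}\".";
            MessageFormat fmt = new MessageFormat(pattern, Locale.US);
            System.out.println(fmt.format(new Object[] { "Alice", 3, "Holiday" }));

            // Pluralization can be expressed with a choice sub-format, but the rules
            // still differ per language, so each translation gets its own pattern.
            String plural = "{0,choice,0#no photos|1#one photo|1<{0,number,integer} photos}";
            System.out.println(new MessageFormat(plural, Locale.US).format(new Object[] { 3 }));
        }
    }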
The first step is to internationalize. The second step is to localize. The third step is to translate.