CLIPS: How to print special characters and letters like "ñ"?

I'm writing a Spanish expert system, so I need to print words containing letters such as "ñ", "á", "é", "í", or characters like "¿", and I can't figure out how yet. Is there a way to do it?

This is probably an issue with your terminal. Where are you running your programs? The best solution might be to search Google for issues specific to your coding environment. (For example, I'm on macOS, and my terminal has no issues with this.)
Also, if you just want a quick solution, most languages have support for Unicode-encoded strings. For example, here's how you would do it in Python (note: ñ is code point U+00F1, which is 361 in octal):
>>> print('ni\361o')
niño
However, if your issue persists, I would recommend searching for users who had similar problems and use the same coding environment as you. I hope you're able to find a solution :)

Starting with version 6.3, CLIPS provides internal support for UTF-8 strings. So if you're using a text editor that supports UTF-8 encoding, you can write programs in languages that use diacritical marks (such as Spanish), other non-Latin character sets with a small alphabet (such as Russian), and languages with complex glyphs (such as Japanese):
(deffacts hello
  (hello "Hello World")
  (hello "Привет мир")
  (hello "مرحبا العالم")
  (hello "你好世界")
  (hello "ハローワールド")
  (hello "여러분, 안녕하세요")
  (hello "Olá Mundo"))

(defrule hello
  (hello ?h)
  =>
  (printout t ?h crlf))
Input and output of these character sets depends on the environment in which CLIPS is run. If you run CLIPS as a terminal application on macOS or Linux, UTF-8 is supported. The macOS IDEs for CLIPS 6.3 and 6.4 support UTF-8.
CLIPSJNI, http://www.clipsrules.net/CLIPSJNI.html, also demonstrates how you can embed CLIPS in an environment supporting UTF-8 (in this case Java) to take advantage of CLIPS's internal support for UTF-8.
The Windows IDE for CLIPS 6.3 does not support UTF-8 I/O. It uses older Win32 APIs and was not designed with Unicode support in mind. The Windows IDE for CLIPS 6.4 uses the modern .NET framework, which supports UTF-8 I/O.
By default, the Windows Command Prompt does not provide support for UTF-8 I/O, but you can enable some limited support. Launch Windows Settings and click "Time & Language", then "Language", and then "Administrative language settings". In the window that appears, click the "Change system locale..." button, check the "Beta: Use Unicode UTF-8 for worldwide language support" check box, and then click the OK button. UTF-8 output support will now be enabled in Command Prompt for languages whose characters use a single code point (such as Spanish and Russian). Unfortunately, while input of diacritical marks and non-Latin characters appears correctly as it is entered, it does not appear to be passed correctly to CLIPS (as it is when running CLIPS as a terminal application on macOS or Linux).

Related

Spell-checker in OSX not recognizing even basic English words

Regardless of which application I use in macOS (Sierra), even the most basic English words are not being recognized by the spell checker. Is there any way to "reset" the dictionary in macOS?
Press command-shift-semicolon to get into the spelling checker preferences, and check if you have picked a language other than English.
If it's set to German, for example, then it won't recognise even basic English words.

Is it possible to display Chinese characters if I don't install files for East Asian languages for my English Windows XP?

As you know, we can install files for East Asian languages in Control Panel --> Regional and Language Options --> Languages tab --> Supplemental language support.
The question is: if I don't install these files (by unchecking the checkbox) on my English Windows XP, does that mean no application on the PC can display Chinese characters properly?
Or, if an app says that it's "Unicode compatible", does this mean that it can handle Chinese characters properly even when we don't have East Asian language support on our PC?
(I don't have permission to uncheck the checkbox and test it on my own, so I hope I can get an answer from you guys.)
Any answers will be appreciated.
If an application is operating-system dependent, you won't be able to see Chinese characters without adding supplemental language support. But OS-independent software will not be affected by that. So it completely depends on the software you are using.

Why isn't UTF-8 allowed as the "ANSI" code page?

The Windows _setmbcp function allows any valid code page...
(except UTF-7 and UTF-8, which are not supported)
OK, not supporting UTF-7 makes sense: Characters have non-unique representations and that introduces complexity and security risks.
But why not UTF-8?
As I understand it, the "ANSI" versions of the Windows API functions convert their arguments to UTF-16, call the equivalent "W" function, and convert any strings in the output to "ANSI". This is what I've been doing manually. So why can't Windows do it for me?
The "ANSI" codepage is basically legacy: Windows 9X era. All modern software should be Unicode (that is, UTF-16) based anyway.
Basically, when the Ansi code page stuff was originally designed, UTF-8 wasn't even invented and so support for multi-byte encodings was rather haphazard (i.e. most Ansi code pages are single byte, with the exception of some East Asian code pages which are one-or-two byte). Adding support for "proper" multi-byte encodings was probably deemed not worth the effort when all new development should be done in UTF-16 anyway.
_setmbcp() is a VC++ RTL function, not a Win32 API function. It only affects how the RTL interprets strings. It has no effect whatsoever on Win32 API A functions. When they call their W counterparts internally, the A functions always use MultiByteToWideChar() and WideCharToMultiByte() specifying codepage 0 (CP_ACP) to use the system default Ansi codepage for the conversions.
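For illustration, here is a rough sketch of the manual conversion the question describes: convert a UTF-8 argument to UTF-16 yourself and call the W function directly. The helper name show_message_utf8 and the choice of MessageBoxW are just for the example; the real A wrappers do the same conversion with CP_ACP instead of CP_UTF8.

#include <windows.h>
#include <stdlib.h>

/* Hypothetical helper: accept a UTF-8 string, convert it to UTF-16,
   and call the wide-character API directly. */
static void show_message_utf8(const char *utf8)
{
    /* First call asks how many UTF-16 code units are needed (including
       the terminating null, since the length argument is -1). */
    int len = MultiByteToWideChar(CP_UTF8, 0, utf8, -1, NULL, 0);
    if (len <= 0)
        return;

    wchar_t *wide = malloc(len * sizeof(wchar_t));
    if (wide == NULL)
        return;

    MultiByteToWideChar(CP_UTF8, 0, utf8, -1, wide, len);

    /* Call the W function directly; an A wrapper would have used CP_ACP
       for the conversion above. */
    MessageBoxW(NULL, wide, L"UTF-8 demo", MB_OK);

    free(wide);
}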
Michael Kaplan, an internationalization expert from Microsoft, tried to answer this on his blog.
Basically his explanation is that even though the "ANSI" versions of Windows API functions are meant to handle different code pages, historically there was an implicit expectation that character encodings would require at most two bytes per code point. UTF-8 doesn't meet that expectation, and changing all of those functions now would require a massive amount of testing.
The reason is exactly what was said in jamesdlin's answer and the comments below it: MBCS is the same as DBCS in Windows, and some functions don't work with characters that are longer than 2 bytes.
Microsoft said that a UTF-8 locale might break some functions as they were written to assume multibyte encodings used no more than 2 bytes per character, thus code pages with more bytes such as UTF-8 (and also GB 18030, cp54936) could not be set as the locale.
https://en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows#UTF-8
So UTF-8 was allowed in functions like read/write, but not for use as a locale.
However, Microsoft has finally fixed that, so now we can use UTF-8 as a locale. In fact, MS even started recommending the ANSI APIs (-A) again instead of the Unicode (-W) versions, unlike before. There are some new options in MSVC: /execution-charset:utf-8 and /utf-8 to set the character set, and you can also set the ActiveCodePage property in the appxmanifest of a UWP app.
Since Windows 10 insider build 17035 (before those options were introduced), a "Beta: Use Unicode UTF-8 for worldwide language support" checkbox has also been available for setting the locale code page to UTF-8.
To open that dialog box, open the Start menu, type "region", and select Region settings > Additional date, time & regional settings > Change date, time, or number formats > Administrative.
After enabling it, you can call setlocale() to change to a UTF-8 locale:
Starting in Windows 10 build 17134 (April 2018 Update), the Universal C Runtime supports using a UTF-8 code page. This means that char strings passed to C runtime functions will expect strings in the UTF-8 encoding. To enable UTF-8 mode, use "UTF-8" as the code page when using setlocale. For example, setlocale(LC_ALL, ".utf8") will use the current default Windows ANSI code page (ACP) for the locale and UTF-8 for the code page.
UTF-8 Support
You can also use this in older Windows versions
To use this feature on an OS prior to Windows 10, such as Windows 7, you must use app-local deployment or link statically using version 17134 of the Windows SDK or later. For Windows 10 operating systems prior to 17134, only static linking is supported.
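Putting that together, a minimal sketch might look like the following (assuming a UCRT from build 17134 or later; the string literal is spelled with explicit UTF-8 byte escapes so the source file's own encoding doesn't matter):

#include <locale.h>
#include <stdio.h>

int main(void)
{
    /* ".utf8" keeps the default Windows locale but switches the CRT's
       code page to UTF-8; this fails on older runtimes. */
    if (setlocale(LC_ALL, ".utf8") == NULL) {
        fprintf(stderr, "UTF-8 locale not available\n");
        return 1;
    }

    /* "ni\xC3\xB1o" is "niño" written out as UTF-8 bytes. Whether it
       renders correctly in a console also depends on the console's own
       code page and font. */
    printf("ni\xC3\xB1o\n");
    return 0;
}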
See also
Is it possible to set “locale” of a Windows application to UTF-8?

Can I avoid using CP1252 on Windows?

I would like all my toolkit to use UTF-8 but find that some tools on Windows seem to use CP1252 (which appears to be Windows-specific). Does this create output which is incompatible and if so at which codepoints? If so, can I do anything about it?
(I don't completely understand the issues so I'd be grateful for basic education on these encodings).
It is very unlikely that tools hard-code code page 1252 on Windows. Much more likely is that it happens to be the default code page on your machine. 1252 is used in Western Europe and the Americas. It is configured in Control Panel, Regional and Language Options. They've been using different names for it; on Win7 it is in the Administrative tab, Change System Locale.
Yes, many tools use the default code page unless they have a good reason to choose another encoding. The BOM is such a good reason. Notable examples are Notepad (unless you change the Encoding in the File + Open dialog to something other than Ansi) and C/C++ compilers. There typically isn't anything special you need to do to use the default code page. Guessing the correct code page for a text file when you don't have a BOM is impossible to do accurately. Google "bush hid the facts" for a very amusing war story.
Six years old and still relevant: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
Now, about your question: Yes, there are still tools out there that choke on UTF-8 files. But more and more tools are "getting it". If you're developing your own stuff, you might want to look into Python 3 where all strings are Unicode. The philosophy is to convert all your inputs into Unicode (if necessary) as early as possible, and reconvert them to a target encoding as late as possible. There are toolkits out there that will do a good job of guessing the encoding of a particular file (for example, Mark Pilgrim's chardet, a port of Mozilla's encoding detector). This is nice if you're working with files that don't specify an encoding.
CP1252 and UTF-8 are the same for all characters < 128. They differ above that. So if you stick to English and stay away from diacritical marks these will be the same.
Most of the Windows tools will use whatever is set as the current user's current codepage, which will default to 1252 for US Windows. You can change that to another codepage pretty easily. But UTF-8 is NOT one of the available codepage options for Windows. (I wish it was).
Some utilities under Windows will understand the UTF-8 byte-order mark at the start of a file. Unfortunately I don't know how to determine if this will work except to try it.
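As a small illustration of that byte-order mark: a UTF-8 BOM is just the three bytes 0xEF 0xBB 0xBF at the start of the file, so a check can be sketched like this (has_utf8_bom is a made-up helper name):

#include <stdio.h>

/* Return 1 if the file starts with the UTF-8 byte-order mark, which is
   the hint utilities such as Notepad use to treat a file as UTF-8
   instead of the default ANSI code page. */
int has_utf8_bom(const char *path)
{
    unsigned char bom[3];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return 0;

    size_t n = fread(bom, 1, 3, f);
    fclose(f);

    return n == 3 && bom[0] == 0xEF && bom[1] == 0xBB && bom[2] == 0xBF;
}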
UTF-8 is supported on Windows, but not as the current codepage. You can use UTF-8 for converting to/from it, but you cannot set it as the current codepage.
First, do not waste time trying to set the codepage; that approach will remind you of the myth of Sisyphus. You can't really solve the problem using codepages; you have to use Unicode.
The only real solution for you is to build your application as Unicode, so it uses UTF-16 internally, and to convert to/from UTF-8 on input/output operations. This is quite simple to do because fopen supports reading and writing UTF-8.
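For what it's worth, here is a minimal sketch of that approach using the MSVC CRT's "ccs=UTF-8" stream mode (the file name hello.txt is just an example): the program keeps wide UTF-16 strings in memory while the file on disk ends up encoded as UTF-8.

#include <stdio.h>
#include <wchar.h>

int main(void)
{
    /* MSVC-specific: "ccs=UTF-8" opens the stream in Unicode mode, so the
       wide-character I/O functions translate between UTF-16 in memory and
       UTF-8 on disk. */
    FILE *f = fopen("hello.txt", "wt, ccs=UTF-8");
    if (f == NULL)
        return 1;

    fwprintf(f, L"ni\u00f1o\n");  /* stored in hello.txt as UTF-8 */
    fclose(f);
    return 0;
}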
Regarding the use of other Windows tools with UTF-8 files, you should not worry: if a tool is able to work with ASCII, it will work with UTF-8 (even if it cannot distinguish individual Unicode characters, it will at least be able to load/parse the files).
BTW, you forgot to specify what programming language you are using and which Windows tools you are considering.
Also, if you are interested in more internationalization material, please visit my blog, blog.i18n.ro.

i18n shell in windows

Is there an i18n shell in windows that supports a large character set? Testing my application in windows results in some math characters not being rendered correctly. The Lucida font in cmd.exe and powershell do not have a wide enough selection.
Unicode UTF-8 would be the most preferable, followed by the other Unicode encodings.
I'm not sure if this is a problem in the font or the console itself but you could try installing the DejaVu Sans Mono font and see if that provides the necessary characters.
CMD.EXE supports it just fine; the issue is that it doesn't allow a whole lot of other fonts by default, and Lucida Console, usually the only TrueType font there, has no fonts defined in its font fallback chain. See http://www.siao2.com/2008/03/19/8323216.aspx and the screenshots I link to in the comments for that blog post.
You may want to see http://www.siao2.com/2006/10/19/842895.aspx on how to make more fonts appear amongst those you can choose as the main console font.
Also, make sure that your application really uses a Unicode codepage for its output; http://illegalargumentexception.blogspot.com/2009/04/i18n-unicode-at-windows-command-prompt.html probably explains the issue better than I could (or, at the very least, as well as I could).
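If the application is your own native code, a rough sketch of what "use a Unicode codepage for its output" can look like is below (assuming the program writes raw UTF-8 bytes and the console font actually has the glyphs):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Switch the console's output code page to UTF-8 (65001) so the raw
       UTF-8 bytes below are interpreted correctly. */
    SetConsoleOutputCP(CP_UTF8);

    /* The escapes are the math symbols ∑, √ and π encoded as UTF-8. */
    printf("\xE2\x88\x91 \xE2\x88\x9A \xCF\x80\n");
    return 0;
}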
I just found that ActiveState Tcl does a really good job with tkcon.
When starting tkcon.tcl, I just have to type:
encoding system utf-8
It works well and even has tab completion. Of course, it is a Tcl shell and not a system shell.
It seems to be able to find characters for all of the symbols I am currently using in the test suite for my application.
While working under Windows, I use the DejaVu Sans Mono font along with Console for getting better Unicode (UTF-8) support.
