How to check that a string contains only certain characters in Prolog? - prolog

I want to write a predicate containsOnly(X,Y), which returns true, if string X contains only characters from string Y.
I wrote it this way:
containsOnly([],_).
containsOnly([H|T],AcceptableCharacters) :-
member(H, AcceptableCharacters),
containsOnly(T,AcceptableCharacters).
But the queries below return false. How can I modify the predicate in order for them to return true?
containsOnly('A', 'ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÜabcdefghijklmnopqrstuvwxyzäöüАБВГДЕЁЖЗИКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдеёжзиклмнопрстуфхцчшщъыьэюя-').
containsOnly('a', 'ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÜabcdefghijklmnopqrstuvwxyzäöüАБВГДЕЁЖЗИКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯабвгдеёжзиклмнопрстуфхцчшщъыьэюя-').

Working with atoms, as in your question, here is a solution built on the ISO predicate sub_atom/5:
containsOnly(X,Y) :- forall(sub_atom(X,_,1,_,C), sub_atom(Y,_,1,_,C)).
SWI-Prolog 7 and later seems to accept it for 'strings' as well.
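To see why this works, it may help to look at what sub_atom/5 enumerates when the length is fixed to 1. The helper name chars_of/2 below is made up for illustration and is not part of any library; this is only a sketch.

```prolog
% chars_of/2 is a hypothetical helper, used here only for illustration.
% It collects every sub-atom of length 1, i.e. every character of Atom,
% in left-to-right order.
chars_of(Atom, Chars) :-
    findall(C, sub_atom(Atom, _, 1, _, C), Chars).

% ?- chars_of(abc, Cs).
% Cs = [a, b, c].
```

The forall/2 call then checks that each character enumerated from X can also be found somewhere in Y.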

Your problem is the datatype. You use atoms, but you treat them as char/code lists. You can try using double quotes " instead of single quotes ' and see if this helps.
With SWI-Prolog 7 or later, it won't help. You would have to use backticks instead of double quotes.
You really should read the documentation on the datatypes, though.
This is a list of codes in SWI-Prolog 7:
`абвгд`
And 0'x is Prolog notation for character codes:
?- X = 0'г.
X = 1075.
?- X = `абв`.
X = [1072, 1073, 1074].
Another thing: if you are using SWI-Prolog, you should use memberchk/2 instead of member/2 in this particular case. If this is an efficiency bottleneck, however, you might also consider using the method described at the very bottom of this page. (This whole section of the manual is very important if you are going to be dealing with text in your SWI-Prolog program.)
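One way to combine these suggestions is a minimal sketch that converts both arguments to code lists first and then reuses the recursion from the question with memberchk/2. The names contains_only/2 and contains_only_codes/2 are made up for this example.

```prolog
% contains_only/2 and contains_only_codes/2 are hypothetical names.
% Both arguments are converted to lists of character codes, so atoms
% (single quotes) can be passed in directly.
contains_only(Text, Acceptable) :-
    atom_codes(Text, Codes),
    atom_codes(Acceptable, AcceptableCodes),
    contains_only_codes(Codes, AcceptableCodes).

% Same recursion as in the question, with memberchk/2 instead of member/2
% so that no choice points are left behind.
contains_only_codes([], _).
contains_only_codes([C|Cs], Acceptable) :-
    memberchk(C, Acceptable),
    contains_only_codes(Cs, Acceptable).

% ?- contains_only('A', 'ABC').
% true.
% ?- contains_only('a-b', 'ab').
% false.
```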

Related

SWI Prolog syntax need to write more than one word in term

How can I write more than one word in the hypothesize term and on the other side of the rule? For example, I want to write small cat
hypothesize(cat) :- cat, !.
I tried to write it with an underscore, but then the underscore appears in the result, which I don't want.
This is the full code.
https://www.cpp.edu/~jrfisher/www/prolog_tutorial/2_17.html
Besides underscores (small_cat) you can also write multi-word atoms with single quotes: 'small cat'.
For example:
?- Cat = 'small cat'.
Cat = 'small cat'.
Maybe you don't want the quotes in the output. In this case, the good news is that if you use write for printing your answer, like the code you linked, the quotes will not be printed:
?- write('small cat').
small cat
true.
If you do want quotes when writing such atoms, you can use writeq ("write quoted"):
?- writeq('small cat').
'small cat'
true.
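The same distinction is available in format/2, where the ~w directive prints like write/1 and ~q prints like writeq/1. A small sketch (the predicate name show_both/1 is made up for this example):

```prolog
% show_both/1 is a hypothetical helper that prints an atom twice:
% first unquoted (~w, like write/1), then quoted (~q, like writeq/1).
show_both(Atom) :-
    format("~w~n", [Atom]),
    format("~q~n", [Atom]).

% ?- show_both('small cat').
% small cat
% 'small cat'
% true.
```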

Should text-processing DCGs be written to handle codes or chars? Or both?

In Prolog, there are traditionally two ways of representing a sequence of characters:
As a list of chars, which are atoms of length 1.
As a list of codes, which are just integers. The integers are to be interpreted as codepoints, but the convention to be applied is left unspecified. As an (eminently sane) example, in SWI-Prolog, the space of codepoints is Unicode (thus, roughly, the codepoint integers range from 0 to 0x10FFFF).
DCGs, a notational way of writing left-to-right list processing code, are designed to perform parsing on "lists of exploded text". Depending on preference, the lists to be handled can be lists of chars or lists of codes. However, the notation for char/code processing differs when writing down the constants. Does one generally write the DCG in "char style" or "code style"? Or maybe even in char/code style for portability in case of modules exporting DCG nonterminals?
Some Research
The following notations can be used to express constants in DCGs
'a': A char (as usual: single quotes indicate an atom, and they can be left out if the token starts with a lowercase letter.)
0'a: the code of a .
['a','b']: A list of char.
[ 0'a, 0'b ]: A list of codes, namely the codes for a and b (so you can avoid typing in the actual codepoint values).
"a": A list of codes. Traditionally, a double-quoted string is exploded into a list of codes, and this notation also works in SWI-Prolog in DCG contexts, even though SWI-Prolog maps a "double-quoted string" to the special string datatype otherwise.
`0123`: Traditionally, text within back-quotes is mapped to an atom (I think; the 1995 ISO Standard just avoids being specific regarding the meaning of a back-quoted string: "It would be a valid extension of this part of ISO/IEC 13211 to define a back quoted string as denoting a character string constant."). In SWI-Prolog, text within back-quotes is exploded into a list of codes unless the flag back_quotes has been set to demand a different behaviour.
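The two list representations are related element by element through the ISO built-in char_code/2. As a small sketch (chars_codes/2 is a name made up for this example):

```prolog
% chars_codes/2 is a hypothetical helper relating a list of chars to the
% list of the corresponding codes. char_code/2 works in either direction
% as long as one of its arguments is instantiated.
chars_codes([], []).
chars_codes([Char|Chars], [Code|Codes]) :-
    char_code(Char, Code),
    chars_codes(Chars, Codes).

% ?- chars_codes([a, b], Codes).
% Codes = [97, 98].
% ?- chars_codes(Chars, [97, 98]).
% Chars = [a, b].
```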
Examples
Char style
Trying to recognize "any digit" in "char style" and make its "char representation" available in C:
zero(C) --> [C],{C = '0'}.
nonzero(C) --> [C],{member(C,['1','2','3','4','5','6','7','8','9'])}.
any_digit(C) --> zero(C).
any_digit(C) --> nonzero(C).
Code style
Trying to recognize "any digit" in "code style":
zero(C) --> [C],{C = 0'0}.
nonzero(C) --> [C],{member(C,[0'1,0'2,0'3,0'4,0'5,0'6,0'7,0'8,0'9])}.
any_digit(C) --> zero(C).
any_digit(C) --> nonzero(C).
Char/Code transparent style
DCGs can be written as "char/code transparent style" by duplicating the rules involving constants. In the above example:
zero(C) --> [C],{C = '0'}.
zero(C) --> [C],{C = 0'0}.
nonzero(C) --> [C],{member(C,['1','2','3','4','5','6','7','8','9'])}.
nonzero(C) --> [C],{member(C,[0'1,0'2,0'3,0'4,0'5,0'6,0'7,0'8,0'9])}.
any_digit(C) --> zero(C).
any_digit(C) --> nonzero(C).
The above also accepts a sequence of alternating codes and chars (as lists of stuff cannot be typed). This is probably not a problem. When generating, however, one will get arbitrary char/code mixes, which are unwanted, and then cuts need to be added.
Char/Code transparent style taking an additional Mode indicator
Another approach would be to explicitly indicate the mode. Looks clean:
zero(C,chars) --> [C],{C = '0'}.
zero(C,codes) --> [C],{C = 0'0}.
nonzero(C,chars) --> [C],{member(C,['1','2','3','4','5','6','7','8','9'])}.
nonzero(C,codes) --> [C],{member(C,[0'1,0'2,0'3,0'4,0'5,0'6,0'7,0'8,0'9])}.
any_digit(C,Mode) --> zero(C,Mode).
any_digit(C,Mode) --> nonzero(C,Mode).
Char/Code transparent style using dialect features
Alternatively, features of the Prolog dialect can be used to achieve char/code transparency. In SWI-Prolog, there is code_type/2, which actually works on codes and chars (there is a corresponding char_type/2 but IMHO there should be only chary_type/2 working for chars and codes in any case) and which, for "digit-class" codes and chars, yields the compound digit(X):
?- code_type(0'9,digit(X)).
X = 9.
?- code_type('9',digit(X)).
X = 9.
?- findall(W,code_type('9',W),B).
B = [alnum,csym,prolog_identifier_continue,ascii,
digit,graph,to_lower(57),to_upper(57),
digit(9),xdigit(9)].
And so one can write this for clean char/code transparency:
zero(C) --> [C],{code_type(C,digit(0))}.
nonzero(C) --> [C],{code_type(C,digit(X)),X>0}.
any_digit(C) --> zero(C).
any_digit(C) --> nonzero(C).
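As a rough sketch of how such a rule scales to a whole run of digits, assuming SWI-Prolog's code_type/2 as described above (the nonterminal names one_digit//1, digit_seq//1 and digit_seq_rest//1 are made up for this example):

```prolog
% Hypothetical nonterminals, SWI-Prolog only (relies on code_type/2
% accepting both chars and codes).
one_digit(C) --> [C], { code_type(C, digit(_)) }.

% One or more digits, trying the longest match first.
digit_seq([C|Cs]) --> one_digit(C), digit_seq_rest(Cs).

digit_seq_rest([C|Cs]) --> one_digit(C), digit_seq_rest(Cs).
digit_seq_rest([]) --> [].

% ?- atom_codes('123', Codes), phrase(digit_seq(Ds), Codes).
% Ds = [49, 50, 51] (plus the binding for Codes).
% ?- atom_chars('123', Chars), phrase(digit_seq(Ds), Chars).
% Ds = ['1', '2', '3'] (plus the binding for Chars).
```

The same grammar handles both representations because the only place that looks at an element is the code_type/2 call.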
In SWI-Prolog in particular
SWI-Prolog by default prefers codes. Try this:
The flags
back_quotes
double_quotes
influence interpretation of "string" and `string` in "standard code". By default, "string" is interpreted as an atomic "string" whereas `string` is interpreted as a "list of codes".
Outside of DCGs, the following holds in SWI-Prolog, with all flags at their default:
?- string("foo"),\+atom("foo"),\+is_list("foo").
true.
?- L=`foo`.
L = [102,111,111].
However, in DCGs, both "string" and `string` are interpreted as "codes" by default.
Without any settings changed, consider this DCG:
representation(double_quotes) --> "bar". % SWI-Prolog decomposes this into CODES
representation(back_quotes) --> `bar`. % SWI-Prolog decomposes this into CODES
representation(explicit_codes_1) --> [98,97,114]. % explicit CODES (as obtained via atom_codes(bar,Codes))
representation(explicit_codes_2) --> [0'b,0'a,0'r]. % explicit CODES
representation(explicit_chars) --> ['b','a','r']. % explicit CHARS
Which of the above matches codes?
?-
findall(X,
(atom_codes(bar,Codes),
phrase(representation(X),Codes,[])),
Reps).
Reps = [double_quotes,back_quotes,explicit_codes_1,explicit_codes_2].
Which of the above matches chars?
?- findall(X,
(atom_chars(bar,Chars),phrase(representation(X),Chars,[])),
Reps).
Reps = [explicit_chars].
When starting swipl with swipl --traditional, the backquoted representation is rejected with "Syntax error: Operator expected", but otherwise nothing changes.
The Prolog Standard (6.3.7) says:
A double quoted list is either an atom (6.3.1.3) or a list (6.3.5).
Consequently, the following should succeed:
Welcome to SWI-Prolog (threaded, 64 bits, version 7.6.4)
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free software.
Please run ?- license. for legal details.
For online help and background, visit http://www.swi-prolog.org
For built-in help, use ?- help(Topic). or ?- apropos(Word).
?- Foo = "foo", (atom(Foo) ; Foo = [F, O, O]).
false.
So SWI-Prolog is not a Prolog by default. That's OK, but if you want to know about SWI-Prolog's non-Prolog behavior, please adjust the tags on the question.
From the definition it also follows that double quoted lists are completely useless by default even in a conforming Prolog: They might denote atoms, so regardless of the chars/codes distinction you can't even know that the double quoted list is actually a list. Even DCGs that only care about structural properties of the "text" (whether it's a palindrome, for example) are useless if the "list" is in fact an atom.
Hence a Prolog program that wants to process text with DCGs must at startup force the double_quotes flag to the value it wants. You have the choice between codes and chars. Codes have no advantages over chars, but they do have disadvantages in readability and typeability. Thus:
Answer: Use chars. Set the double_quotes flag explicitly.
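A minimal sketch of what that could look like at the top of a source file. The nonterminal greeting//0 and the predicate digit_char/1 are made up for this example; in SWI-Prolog a flag set by a directive like this applies to the file being loaded, while other systems may treat it as global.

```prolog
% Make "..." denote a list of chars in the rest of this file.
:- set_prolog_flag(double_quotes, chars).

% Hypothetical nonterminal: with the flag above, "hello" is the
% terminal list [h,e,l,l,o].
greeting --> "hello".

% Hypothetical predicate: "0123456789" is a plain list of chars here,
% so it can be searched with member/2.
digit_char(C) :- member(C, "0123456789").

% ?- atom_chars(hello, Cs), phrase(greeting, Cs).
% Cs = [h, e, l, l, o].
```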
I should start by noting that the answer to the "Should text-processing DCGs be written to handle codes or chars? Or both?" question can be neither. DCGs work by using an implicit difference list to thread state. But the elements of that difference list can be other than chars or codes. It depends on the output of text tokenization and what exactly text processing entails. E.g. I have worked on and come across Prolog NLP applications where codes/chars were only used for the basic tokenization and the reasoning was performed (still with DCGs) using either atoms or compound terms that reified the token data (e.g. v(Verb) or n(Noun)). One of those applications (a personal assistant, as is common nowadays in phones) used atoms produced by a voice-recognition component.
But let's go back to chars vs codes. Legacy practices and failed standardization left Prolog with problematic text representation. ASCII gives us a single quote, a back quote, and a double quote. With single quotes being used for atoms, a choice could have been made to use e.g. back quotes for representing a list of codes and double quotes for representing a list of chars. Or the other way around. Instead, and this is where standardization failed, we got the problematic double_quotes flag. There's no shortage of Prolog code in the wild that makes an assumption about the meaning of double-quoted terms and thus works or breaks depending on the implicit value of the double_quotes flag (if you think this is mainly an issue with legacy code, think again). Guess what happens when we try to combine code that requires different values for the flag? Note that, in almost all systems (including those that support modules), the flag value is global ... As Isabelle wrote in her answer, setting the flag explicitly is good general advice. But not, as I explained, without problems.
Some systems provide additional values for the flag. E.g. SWI-Prolog allows the flag to also be set to string. GNU Prolog supports additional atom_no_escape, chars_no_escape and codes_no_escape. Some systems only support codes. Some systems also provide a back_quotes flag. This Babel tower means that portable and resilient code is often forced to use atoms to represent text. But this may not be ideal from a performance perspective.
Back to the original question. As Isabelle mentioned, chars is usually a more readable (read, easier to debug) choice. But, depending on the Prolog system, codes may provide better performance. If application performance is critical, benchmark both solutions. Some recent Prolog systems (e.g. Scryer-Prolog or Trealla Prolog) have efficient support for chars. Older systems may trail behind.
Note that your question is very much related to I/O. Prior to ISO, many systems in the DEC-10 succession supported a single kind of I/O via get0/1 and put/1 (and versions for tty) which served both characters and bytes at the same time. What can go wrong with that? Today, that is obvious. But multi-octet character set handling (MOCSH, as it was called) was for many a much more exotic feature than it is today, a quarter century after the standard's publication. After all, the now widely accepted UTF-8 encoding was invented in 1992-09 and first published in 1993. And like so many projects, such as TRON, it could have failed as well. Some other programming languages got burnt by betting on UCS-2/UTF-16 encoding.
What the standard did was to split I/O into character and byte I/O (and their corresponding types text and binary). So there is now get_char/1, get_byte/1 ... That the _byte versions all use integers in the range of 0..255 was non-controversial (plus -1 for EOF). But what about the _char versions? The only way to resolve this was to provide both _char and _code versions and consequently chars and codes versions of double-quoted strings and related built-ins. The default for flag double_quotes is implementation defined (7.11.2.5).
In this manner, systems with a lot of DEC-10 legacy could continue to use codes explicitly. For them, an integer thus meant either an integer or a byte or a character. But users of such systems could still use the better encoding. New systems that do not have to deal with such legacies going back to 1977, like Tau, Scryer, and Trealla, opt for chars as the default. As far as tradition is concerned, note that Prolog I, often called Marseille Prolog, did encode double quoted strings as lists of atoms of length one. And in the preliminary version of Prolog of 1972, often called Prolog 0, strings were encoded as nil-s-t-r-i-n-g qua boum facilitating stemming. In any case, not a single character code was present at all.
The advantages of chars should be obvious. It is much easier to read and debug, in particular if you have partially instantiated strings, say [a,X,c] vs. [97,X,99], which occur often when generalizing queries as with library(diadem). It is also a bit shorter to write. And, double quoted strings can be used for printing answers.
If you really want to write programs that support both codes and chars at the same time, rather use goals like [Ch] = "a" where Ch is now the atom a or the integer 97 or 129 or whatever processor character set you are using. It all depends on the Prolog flag double_quotes. And more succinctly you can write
nonzero(C) --> [C],{member(C,"123456789")}.
What is even more important is that phrase("abc", "abc") still holds.
However, changing that flag within the same application is certainly not a good idea (nor to switch to the value atom or some non-conforming value).
((When using chars note that single quotes as in C = 'a' are a bit misleading since the single quotes do not serve any purpose. Instead, round brackets are preferable if you want to ensure that the code will be valid even in the presence of an operator declaration for a. When a occurs as a functor's argument or a list's element, no round brackets are needed, but they are often used redundantly in operator declarations.))
You are making incorrect assumptions. These are not "chars":
foo_or_bar(foo) --> "foo".
The "foo" is a string, in SWI-Prolog, but this works perfectly within a DCG rule definition. The place to read about this is here, in particular:
A DCG literal
Although represented as a list of codes is the correct representation for handling in DCGs, the DCG translator can recognise the literal and convert it to the proper representation. Such code need not be modified.
All your other suggestions are just unnecessary; you should either be enumerating explicitly all possible "nonzeros", digits and so on, or using the library.
PS: if your main goal is to write code that runs on any Prolog, you might as well use something like Logtalk instead.

converting character list to list of strings with each one contains 3 characters in Prolog

I want to convert a list of characters to a list of strings, each containing 3 characters, using Prolog.
For example, ['a','b','c','d','e','f'] will be converted to ["abc", "def"].
What I tried was
toTriplets([H1|[H2|[H3|T]]],[F|R]) :- string_concat(H1,H2,X) , string_concat(X,H3,F) , toTriplets(T,R).
However, this gives me false when I execute the command
?- toTriplets(['a','b','c','d','e','f'],X).
false.
What is the problem with my code? I really can't figure it out...
The problem is you're missing the base case of recursion.
For instance
toTriplets([],[]).
Indeed, there you have the chance to decide whether to accept only triples, as above, or to accept incomplete ones as well, like
toTriplets([A,B],[AB]) :- string_concat(A,B,AB).
toTriplets([A],[S]) :- atom_string(A,S).
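Putting the pieces together, a complete version might look like the sketch below. It assumes SWI-Prolog 7 or later (for the string type) and uses string_chars/2 instead of repeated string_concat/3 calls; the name to_triplets/2 is chosen for this example only.

```prolog
% to_triplets/2 is a hypothetical name; SWI-Prolog 7+ is assumed.
% Base case: nothing left to group.
to_triplets([], []).
% Incomplete groups at the end of the list are kept as shorter strings.
to_triplets([A], [S]) :-
    string_chars(S, [A]).
to_triplets([A,B], [S]) :-
    string_chars(S, [A,B]).
% Recursive case: take three chars, build one string, continue.
to_triplets([A,B,C|T], [S|R]) :-
    string_chars(S, [A,B,C]),
    to_triplets(T, R).

% ?- to_triplets([a,b,c,d,e,f], X).
% X = ["abc", "def"].
```

Dropping the two clauses for incomplete groups makes the predicate fail on lists whose length is not a multiple of 3, which corresponds to the "only triples" choice.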

How to get SWI-Prolog to always print strings with quotes, in interactive mode

When using SWI-prolog, it will print output that doesn't need to be quoted (output that doesn't contain special characters), without quotes.
As an example:
?- p('this_is_a_string').
true.
?- p(X).
X = this_is_a_string.
I would like Prolog to always output with quotes. It is okay if my output ends up quoting stuff like functor names, which were not originally quoted when input. How can I achieve this?
To change the default behaviour of SWI-Prolog's top-level output, you want to look towards setting a flag. If you query the appropriate flag, you should find this is your default output:
?- current_prolog_flag(print_write_options, Options).
Options = [portray('true'), quoted('true'), numbervars('true')].
In this particular case, the needed flag is already set: we have portray('true') in our Options.
portray/1 is a dynamic predicate that you can assert. It's called with a ground term when it is being printed; if it succeeds, then it's assumed that the term has been printed.
So in your case you can assert the following:
portray(Term) :- atom(Term), format("'~s'", Term).
Now you'll get the desired behaviour:
?- p(this_is_an_atom).
true.
?- p(X).
X = 'this_is_an_atom'.
You could add this to your .swiplrc file if it's something you want all the time. Note, this will have no effect on write/1 and similar predicates, you'll need to use this instead:
?- write_term(foo, [portray(true)]).
'foo'.
To add the additional requirement of escaping characters in the atom, you'll either need to implement your own DCG to the ISO standard for escaping characters, or abuse the built-in one. To do this you can write out to an atom and see if you need to add your single quotes, or if they would already be there. The case of X = (\). is most easily handled in its own clause; you can then choose if you wish to print '\\' or (\).
portray(\) :-
format("'\\\\'"), !.
portray(Term) :-
atom(Term), \+ Term = (\),
( with_output_to(chars(['\''|_]),
write_term(Term, [quoted(true), character_escapes(true), portrayed(false)]))
-> format("~q", Term)
; format("'~s'", Term)
).

Prolog: declaring an operator

I have defined ! (factorial) function and registered it as arithmetic function and an operator, so that I can execute: A is 6!.
Now I'd like to define !! (factorial of odd numbers), but the same way - writing clauses, registering arithmetic_function and operator, calling A is 7!! - results in
SyntaxError: Operator expected
How should I, if possible, register !! operator ?
Yes, I realize, ! is normally the cut.
! is a so-called solo character, you cannot have two in a row. If it were not, you could not write for example:
c :- !.
but would instead have to write:
c :- ! .
because "!." would otherwise be interpreted as a single token.
Also, if you let ! be an operator, both versions are invalid syntax (yes, SWI still accepts it, but for example GNU Prolog does not). You need to write:
c :- (!).
because operators that are operands need to be bracketed. Instead of !, use for example "f" and "ff", or fitting Unicode characters for your use case.
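As a sketch of the alternative suggested above, the atom ff can be declared as a postfix operator and given a meaning through an ordinary predicate. The names double_factorial/2 and value/2 are made up for this example, and registering ff as an arithmetic function, as in the question, is deliberately left out.

```prolog
% Declare ff as a postfix operator, so that 7 ff reads as the term ff(7).
:- op(200, xf, ff).

% double_factorial(+N, -F): F is N * (N-2) * (N-4) * ... down to 1 or 2.
% Hypothetical helper for this example.
double_factorial(N, 1) :-
    N =< 1.
double_factorial(N, F) :-
    N > 1,
    M is N - 2,
    double_factorial(M, G),
    F is N * G.

% value/2 is a hypothetical evaluator for terms written with the operator.
value(N ff, V) :-
    integer(N),
    double_factorial(N, V).

% ?- value(7 ff, V).
% V = 105.
```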

Resources