How to display backslashes in SWI-Prolog - prolog

I am trying to display a network share path in my Prolog output code.
The path looks like:
\\fileserver\path\to\file.txt (ex1)
or
\\\\fileserver\\path\\to\\file.txt (ex2)
but if I try displaying it using format:
pri(Z):-
format('Printing Zx : \"~w\"',[Z]).
the slashes get truncated to
\fileserverpathtofile.txt (ex1)
Obviously, the path sometimes contains \\\\, in which case the display is correct.
How can I make it print the proper path?
Any help please.
Thanks.

In quoted Prolog atoms the backslash is a meta-character, i.e. if you want your atom to contain a backslash character, you need to escape it with another backslash. E.g. in order to represent the Windows path \\fileserver\path\to\file.txt as a Prolog atom you need to write
Path = '\\\\fileserver\\path\\to\\file.txt'.
In principle there are two ways of printing stuff out, one for the humans (pretty-printing), using write
?- Path = '\\\\fileserver\\path\\to\\file.txt', write(Path).
\\fileserver\path\to\file.txt
and one for the machines (serializing), using write_canonical
?- Path = '\\\\fileserver\\path\\to\\file.txt', write_canonical(Path).
'\\\\fileserver\\path\\to\\file.txt'
write_canonical makes sure that Prolog can read the output back into the same exact atom.
Your problem seems to be that you do not correctly represent the path in Prolog. If the path comes from an external source, you first need to escape it (add a backslash in front of every backslash) before you can store it as a Prolog atom.
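If the path is produced by another program before it reaches Prolog, the escaping step described above can be done there. The following is only a sketch in Python, not part of SWI-Prolog; the function name to_prolog_atom is made up for illustration, and it assumes the result is pasted into Prolog source as a single-quoted atom.

```python
def to_prolog_atom(raw_path: str) -> str:
    """Return Prolog source text for a quoted atom that holds raw_path.

    Hypothetical helper: doubles every backslash so that Prolog's
    reader turns each pair back into one backslash.
    """
    escaped = raw_path.replace("\\", "\\\\")  # one backslash becomes two
    escaped = escaped.replace("'", "\\'")      # protect embedded single quotes
    return "'" + escaped + "'"


# The raw string below is the path exactly as Windows shows it.
print(to_prolog_atom(r"\\fileserver\path\to\file.txt"))
# prints '\\\\fileserver\\path\\to\\file.txt'
```

The printed text matches the atom used in the write and write_canonical examples above.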

Related

Ruby regexp /(\/.*?(?=\/|$)){2}/

My regexp behaves just like I want it to on http://regexr.com, but not like I want it in irb.
I'm trying to make a regular expression that will match the following:
A forward slash,
then 2 * any number of random characters (i.e. `.*`),
up to but not including another /
OR the end of the string (whichever comes first)
I'm sorry as that was probably unclear, but it's my best attempt at an English translation.
Here's my current attempt and hopefully that will give you a better idea of what I'm trying to do:
/(\/.*?(?=\/|$)){2}/
The usage scenario is I want to be able to take a path like /foo/bar/baz/bin/bash and shorten it to the level I'm at in the filesystem, in this case the second level (/foo/bar). I'm trying to do this using the command path.scan(-regex-).shift.
The usage scenario is I want to be able to take a path like /foo/bar/baz/bin/bash and shorten it to the level I'm at in the filesystem, in this case the second level (/foo/bar)
Ruby already has a class for handling paths, Pathname. You can use Pathname#relative_path_from to do what you want.
require 'pathname'
path = Pathname.new("/foo/bar/baz/bin/bash")
# Normally you'd use Pathname.getwd
cwd = Pathname.new("/foo/bar")
# baz/bin/bash
puts path.relative_path_from(cwd)
Regexes just invite problems, like assuming the path separator is /, not honoring escapes, and not dealing with extra /. For example, "//foo/bar//b\\/az/bin/bash". // is particularly common in code which joins together directories using paths.join("/") or "#{dir}/#{file}".
For completeness, the general way you match a single piece of a path is this.
%r{^(/[^/]+)}
That's the beginning of the string, a /, then 1 or more characters which are not /. Using [^/]+ means you don't have to try and match an optional / or end of string, a very useful technique. Using %r{} means less leaning toothpicks.
But this is only applicable to a canonicalized path. It will fail on //foo//b\\/ar/. You can try to fix up the regex to deal with that, or do your own canonicalization, but just use Pathname.
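For comparison, the same "let a path library do the work" idea can be sketched outside Ruby. This is a Python analogue using pathlib, not the Pathname API from the answer; truncate_path and its depth parameter are names invented here, and the sketch assumes POSIX-style paths.

```python
from pathlib import PurePosixPath


def truncate_path(path: str, depth: int) -> str:
    """Keep only the first `depth` components of a POSIX-style path."""
    pure = PurePosixPath(path)
    parts = pure.parts
    # For an absolute path, parts[0] is the root "/", so keep one extra item.
    keep = depth + 1 if pure.is_absolute() else depth
    return str(PurePosixPath(*parts[:keep]))


print(truncate_path("/foo/bar/baz/bin/bash", 2))  # prints /foo/bar
print(truncate_path("/foo//bar/baz", 2))          # prints /foo/bar
```

Because the library splits the path into components, repeated separators inside the path are collapsed without any extra regex work.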

InDesign Grep: Changing sentence beginnings to Uppercase

I am relatively new to scripting and within an InDesign Script I am trying to change all the first letters of all sentences to uppercase (many of them are lowercase, since I randomly generated the sentences from different text sources).
I am so far able to find the text parts with this Grep expression:
\.(\s)+\l
I also found this script by Peter Kahrel, that he shares on InDesign Secrets:
app.findGrepPreferences.findWhat = "^.";
found = app.activeDocument.findGrep();
for (i = 0; i < found.length; i++)
found[i].characters[0].changecase (ChangecaseMode.lowercase);
However, when I now replace the ^. with my own expression, and change lowercase to uppercase, the script does not work, which makes sense, since I do not want to change the first character of my findGrep results, but the last one. But how can I find the last character? The breaks between the sentences have different lengths, so I cannot simply type 2 instead of 0.
Any help would be much appreciated! Thank you!
Edit: I'm working on CS6.
Your GREP returns matches that start with a period, then have any number of spaces (including hard returns, probably), and always end with one lowercase character. So far, so good. You can access the last character (and in fact any last item in any InDesign object collection) in this way:
found[i].characters[-1].changecase (ChangecaseMode.lowercase);
which 'indexes' from the end, rather than from the start.
However! The only character in your matches, other than the period and spaces, is always going to be a lowercase letter. So you can skip the entire "how to find the correct index" thing, and probably slightly speed up the script as well, by simply applying lowercase (or, as you are using it, uppercase) to the entire match:
found[i].changecase (ChangecaseMode.lowercase);
because nothing will happen to not-lowercaseable characters (a word I declare to signify "having the Unicode-defined property of being lowercase and having an uppercase equivalent"). (Or the other way around, if I understand your purpose correctly.)
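The "change the case of the whole match" trick is not specific to InDesign. As a rough illustration only, here is the same idea on a plain string in Python; this is not InDesign scripting, the function name is invented, and the pattern assumes ASCII lowercase letters where InDesign's \l would match any lowercase letter.

```python
import re

# A period, one or more whitespace characters, then one lowercase ASCII letter.
SENTENCE_BREAK = re.compile(r"\.\s+[a-z]")


def capitalize_sentence_starts(text: str) -> str:
    """Uppercase each whole match; the period and whitespace are unaffected."""
    return SENTENCE_BREAK.sub(lambda m: m.group(0).upper(), text)


print(capitalize_sentence_starts("first one. second one.  third one."))
# prints: first one. Second one.  Third one.
```

Uppercasing the entire match works for the same reason as in the script: only the trailing letter has an uppercase form, so no index into the match is needed.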

Is there any character that is illegal in file paths on every OS?

Is there any character that is guaranteed not to appear in any file path on Windows or Unix/Linux/OS X?
I need this because I want to join together a few file paths into a single string, and then split them apart again later.
In the comments, Harry Johnston writes:
The generic solution to this class of problem is to encode the file paths before joining them. For example, if you're dealing with single-byte strings, you could convert them to hex strings; so "hello" becomes "68656c6c6f". (Obviously that isn't the most efficient solution!)
That is absolutely correct. Please don't try to do anything "tricky" with filenames and reserved characters, because it will eventually break in some weird corner case and your successor will have a heck of a time trying to repair the damage.
In fact, if you're trying to be portable, I strongly recommend that you never attempt to create any filenames including any characters other than [a-z0-9_]. (Consider that common filesystems on both Windows and OS X can operate in case-insensitive mode, where FooBar.txt and FOOBAR.TXT are the same identifier.)
A decently compact encoding scheme for practical use would be to make a "whitelisted set" such as [a-z0-9], reserve _ as the escape character, and encode any character ch outside your "whitelisted set" as printf("_%02x", ch). So hello.txt becomes hello_2etxt, and hello_world.txt becomes hello_5fworld_2etxt.
Since every _ is escaped, you can use double-_ as a separator: the encoded string hello_2etxt__goodbye___2e_2e uniquely identifies the list of filenames ['hello.txt', 'goodbye', '..'].
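A minimal sketch of this scheme in Python follows. The function names are made up, and two details are assumptions not stated above: names are encoded as UTF-8 bytes so that each escape is exactly two hex digits, and the decoder trusts its input to be well formed.

```python
# Bytes that pass through unescaped; "_" is deliberately excluded.
SAFE = set(b"abcdefghijklmnopqrstuvwxyz0123456789")


def encode_name(name: str) -> str:
    """Escape every byte outside SAFE as "_" plus two hex digits."""
    return "".join(chr(b) if b in SAFE else "_%02x" % b
                   for b in name.encode("utf-8"))


def join_names(names):
    """Join encoded names with the double-underscore separator."""
    return "__".join(encode_name(n) for n in names)


def split_names(joined: str):
    """Invert join_names by scanning left to right."""
    names, current, i = [], bytearray(), 0
    while i < len(joined):
        if joined[i] != "_":
            current.append(ord(joined[i]))              # whitelisted character
            i += 1
        elif joined.startswith("__", i):
            names.append(current.decode("utf-8"))        # separator found
            current = bytearray()
            i += 2
        else:
            current.append(int(joined[i + 1:i + 3], 16))  # "_xx" escape
            i += 3
    names.append(current.decode("utf-8"))
    return names


print(join_names(["hello.txt", "goodbye", ".."]))
# prints hello_2etxt__goodbye___2e_2e
```

Scanning left to right is unambiguous because an escape is always "_" followed by two hex digits, so "_" followed by another "_" can only be a separator. One caveat of this sketch: an empty list and a list holding one empty name both encode to the empty string.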
You can use a newline character, or specifically CR (decimal code 13) or LF (decimal code 10) if you like. Whether this is suitable or not depends on what requirements you have with regard to displaying the concatenated string to the user - with this approach, it will print its parts on separate lines - which may be very good or very bad for the purpose (or you may not care...).
If you need the concatenated string to print on a single line, edit your question to specify this additional requirement; and we can go from there then.

Using phrase_from_file to read a file's lines

I've been trying to parse a file containing lines of integers using phrase_from_file with the grammar rules
line --> I,line,{integer(I)}.
line --> ['\n'].
thusly: phrase_from_file(line,'input.txt').
It fails, and I got lost very quickly trying to trace it.
I've even tried to print I, but it doesn't even get there.
EDIT::
As none of the solutions below really fit my needs (using read/1 assumes you're reading terms, and sometimes writing that DCG might just take too long), I cannibalized this code I googled, the main changes being the addition of:
read_rest(-1,[]):-!.
read_word(C,[],C) :- ( C=32 ;
C=(-1)
) , !.
If you are using phrase_from_file/2 there is a very simple way to test your programs prior to reading actual files. Simply call the very same non-terminal with phrase/2. Thus, a goal
phrase(line,"1\n2").
is the same as calling
phrase_from_file(line,fichier)
when fichier is a file containing the above 3 characters. So you can test and experiment in a very compact manner with phrase/2.
There are further issues @Jan Burse already mentioned. SWI reads in character codes. So you have to write
newline --> "\n".
for a newline. And then you still have to parse integers yourself. But all that is tested much easier with phrase/2. The nice thing is that you can then switch to reading files without changing the actual DCG code.
I guess there is a conceptual problem here. Although I don't know the details of phrase_from_file/2, i.e. which Prolog system you are using, I nevertheless assume that it will produce character codes. So for an integer 123 in the file you will get the character codes 0'1, 0'2 and 0'3. This is probably not what you want.
If you would like to process the characters, you would need to use a terminal list such as [I] instead of a bare variable I to fetch them. And instead of the integer test, you would need a character test, and you can do the test earlier:
line --> [I], {0'0=<I, I=<0'9}, line.
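To make the "character codes, not integers" point concrete, here is a small sketch in Python rather than a DCG. It shows the extra step the grammar still needs: accumulating digit codes into a number at each newline. The function name and the choice to reject any other code are assumptions made for the illustration.

```python
def parse_integer_lines(codes):
    """Turn a sequence of character codes into one integer per line."""
    numbers, value, seen_digit = [], 0, False
    # A sentinel newline flushes the last line if the input lacks one.
    for code in list(codes) + [ord("\n")]:
        if ord("0") <= code <= ord("9"):
            value = value * 10 + (code - ord("0"))  # same test as 0'0=<I, I=<0'9
            seen_digit = True
        elif code == ord("\n"):
            if seen_digit:
                numbers.append(value)
            value, seen_digit = 0, False
        else:
            raise ValueError("unexpected character code %d" % code)
    return numbers


# Iterating over bytes yields character codes, much like the file reader does.
print(parse_integer_lines(b"1\n2"))  # prints [1, 2]
```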
Best Regards
P.S.: Instead of going the DCG way, you could also use term read operations. See also:
read numbers from file in prolog and sorting

Colon/Asterisk as a filename delimiter?

I'm looking for a character to use as a filename delimiter (I'm storing multiple filenames in a plaintext string). Windows seems not to allow :, ?, *, <, >, ", |, / and \ in filenames. Obviously, \ and / can't be used, since they mean something within a path. Is there any reason why any of those others shouldn't be used? I'm just thinking that, similar to / or \, those other disallowed characters may have special meaning that I shouldn't assume won't be in path names. Of those other 7 characters, are any definitely safe or definitely unsafe to use for this purpose?
The characters : and " are also used in paths. Colon is the drive unit delimiter, and quotation marks are used when spaces are part of a folder or file name.
The characters * and ? are used as wildcards when searching for files.
The characters < and > are used for redirecting an application's input and output to and from a file.
The character | is used for piping output from one application into input of another application.
I would choose the pipe character for separating file names. It's not used in paths, and its shape has a natural separation quality to it.
An alternative could be to use XML in the string. There is a bit of overhead and some characters need encoding, but the advantage is that it can handle any characters and the format is self explanatory and well defined.
Windows uses the semicolon (;) as a filename delimiter. Look at the PATH environment variable: it has a ; between path elements.
(Also, in Python, os.path.pathsep is ";" on Windows, while it is ":" on Unix.)
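A short sketch of using the platform's own list separator from Python follows. The helper names are invented; os.pathsep is the same value as the os.path.pathsep mentioned above. The check for an embedded separator is an addition of this sketch, since a ; is a legal filename character on Windows and a : is legal on Unix.

```python
import os


def join_paths(paths):
    """Join paths with the platform separator, refusing ambiguous input."""
    for p in paths:
        if os.pathsep in p:
            raise ValueError("path contains the separator: %r" % p)
    return os.pathsep.join(paths)


def split_paths(joined):
    """Invert join_paths; an empty string means an empty list."""
    return joined.split(os.pathsep) if joined else []


joined = join_paths(["first.txt", "second.txt"])
print(split_paths(joined))  # prints ['first.txt', 'second.txt']
```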
I have used * in the past. The reason is portability to Linux/Unix. True, technically it can be used on those filesystems too. In practice, all common OSes use it as a wildcard, thus it's quite uncommon in filenames. Also, people are not surprised if programs break when you put a * in a filename.
Why don't you use a character entered with an ALT key combination, like ‡ (Alt + 0135), as the delimiter?
It is actually possible to create files programmatically with every possible character except \. (At least, this was true at one time and it's possible that Windows has changed its policy since.) Naturally, files containing certain characters will be harder to work with than others.
What were you using to determine which characters Windows allows?
Update: The set of characters allowed by Windows is also determined by the underlying filesystem, and other factors. There is a blog entry on MSDN that explains this in more detail.
If all you need is the appearance of a colon, and you will be creating it programmatically, why not make use of a Unicode character that just looks like a colon?
My first choice would be the Modifier Letter Colon (U+A789), as it is a typical LTR character and appears a lot like a colon. It is what I use when I need a full DateTime in the filename, such as file_2017-05-04_16꞉45꞉22_clientNo.jpg
I would stay away from characters like the Hebrew Punctuation Sof Pasuq (U+05C3), as it is an RTL character and may mess with how a system aligns the file name itself.
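As a sketch of how such a filename might be built and how the direction of a candidate character can be checked, here is a Python example. The function name and its parameters are hypothetical; unicodedata.bidirectional is the standard-library call that reports a character's bidirectional class.

```python
import unicodedata
from datetime import datetime

MODIFIER_LETTER_COLON = "\ua789"  # looks like ":" but is allowed by Windows


def timestamped_name(prefix: str, when: datetime, suffix: str) -> str:
    """Build a filename whose time part uses the colon look-alike."""
    stamp = when.strftime("%Y-%m-%d_%H:%M:%S")
    return "%s_%s_%s" % (prefix, stamp.replace(":", MODIFIER_LETTER_COLON), suffix)


print(timestamped_name("file", datetime(2017, 5, 4, 16, 45, 22), "clientNo.jpg"))

# "L" means left-to-right, "R" means right-to-left.
print(unicodedata.bidirectional(MODIFIER_LETTER_COLON))  # prints L
print(unicodedata.bidirectional("\u05c3"))               # prints R
```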

Resources