cd using empty string inconsistencies - shell

According to the "chdir" xopen specification, using an empty string ("") as the argument should result in an error (ENOENT):
[ENOENT]
A component of path does not name an existing directory or path is an empty string.
I've checked many different combinations of OSes and shells using the command
cd ""
which eventually calls the "chdir" system call, with argc == 2 and argv[1] pointing to an empty string.
The result is that only some ksh93 versions (not all) on Linux (not on AIX) return an error. "/bin/sh" always succeeds, but on AIX it moves to $HOME, while on Linux the cwd is unchanged.
Why so many differences?

Check section 4, Shell & Utilities of the IEEE Std 1003.1™ or Open Group Base Specification.
This contains a separate page for cd, which says:
The cd utility shall then perform actions equivalent to the chdir()
function called with curpath as the path argument. If these actions
fail for any reason, the cd utility shall display an appropriate
error message and the remainder of this step shall not be executed.
This would suggest that the ksh93 that fails on cd "" is actually working according to spec. This is what I see on Ubuntu 14.04, ksh Version AJM 93u+ 2012-08-01.

You are comparing apples and pears here.
The xopen specification you are quoting, refers to the C-function chdir.
The two shells I'm using (bash and zsh) have an internal command cd, and in both shells, a
cd ''
is interpreted as a no-op. This is explained in the man pages, for example for bash:
A null directory name in CDPATH is the same as the current directory, i.e., ``.''.
So this is the intended behaviour. Note that the standard you are quoting doesn't say anything about the shell's cd command.
I didn't check how the developers of bash and zsh actually implemented the cd command, but if they want to comply with their own specification, they must implement it (in C) something like this:
if (argc < 2) {                   /* argv[0] is "cd": no operand, go to $HOME */
    chdir(getenv("HOME"));
} else if (strlen(argv[1]) == 0) { /* empty operand: treat it like "." */
    chdir(".");
} else {
    chdir(argv[1]);
}
If it's not done this way, the behaviour of the cd command would depend on the underlying implementation of the system library (and, yes, on its conformance to the xopen standard), and this would certainly be a bug in the shell implementation (though a different one than the one you are referring to).
UPDATE: As CoolRaoul correctly noted in his comment, my quote of the bash manpage is not relevant here, as it refers only to an empty element in CDPATH, not to an empty argument of the cd command. While it is reasonable to assume that the effect in both cases should be the same, this is not explicitly specified. The same is true for the zsh manpage. Neither manpage explicitly says that the cd command invokes the C function chdir (although this can reasonably be assumed), nor do they seem to refer to any compliance with the xopen specification. At least for bash and zsh, I think we can safely say that the behaviour of cd "" is simply unspecified.
BTW, I also tried it with the ksh which comes with Cygwin (and which identifies itself as MIRBSD KSH R50), and it behaves in the same way as bash and zsh.

As suggested previously, you can trace through the cd open group page to find the behavior (my notes are the bullet-points):
If no directory operand is given and the HOME environment variable is empty or undefined, the default behavior is implementation-defined and no further steps shall be taken.
This is not true, as there is a directory operand, it's just a zero-length string
If no directory operand is given and the HOME environment variable is set to a non-empty value, the cd utility shall behave as if the directory named in the HOME environment variable was specified as the directory operand.
See above
If the directory operand begins with a <slash> character, set curpath to the operand and proceed to step 7.
Again false, go to the next step
If the first component of the directory operand is dot or dot-dot, proceed to step 6.
false again
Starting with the first pathname in the <colon>-separated pathnames of CDPATH (see the ENVIRONMENT VARIABLES section) if the pathname is non-null, test if the concatenation of that pathname, a <slash> character if that pathname did not end with a <slash> character, and the directory operand names a directory. If the pathname is null, test if the concatenation of dot, a <slash> character, and the operand names a directory. In either case, if the resulting string names an existing directory, set curpath to that string and proceed to step 7. Otherwise, repeat this step with the next pathname in CDPATH until all pathnames have been tested.
Here's where we hit the meat. By the specification, if CDPATH is set and a pathname in there points to a directory, it will find the first existing pathname. So if CDPATH is /foo:/bar:/baz and /foo does not exist, cd will first try /foo/ and fail this step. It will then try /bar/. If /bar exists as a directory, it will set curpath to /bar/ and proceed. If CDPATH is null, it will test ./ to see if it points to a directory (and it will, typically, because this is your pwd).
Set curpath to the directory operand.
In other words, if CDPATH is set but none of its components exist, it will just use the directory operand, which is an empty string.
If the -P option is in effect, proceed to step 10. If curpath does not begin with a <slash> character, set curpath to the string formed by the concatenation of the value of PWD, a <slash> character if the value of PWD did not end with a <slash> character, and curpath.
If we hit step 6, this will set curpath to the PWD, as it will be $(pwd)/.
In essence, by this step, if CDPATH is set and has an existing directory as a component, curpath is now the first existing component; otherwise curpath will be PWD (or possibly PWD/./, to the same effect).
The curpath value shall then be converted to canonical form as follows, considering each component from beginning to end, in sequence:
Dot components and any <slash> characters that separate them from the next component shall be deleted.
For each dot-dot component, if there is a preceding component and it is neither root nor dot-dot, then:
If the preceding component does not refer (in the context of pathname resolution with symbolic links followed) to a directory, then the cd utility shall display an appropriate error message and no further steps shall be taken.
The preceding component, all <slash> characters separating the preceding component from dot-dot, dot-dot, and all characters separating dot-dot from the following component (if any) shall be deleted.
An implementation may further simplify curpath by removing any trailing <slash> characters that are not also leading characters, replacing multiple non-leading consecutive <slash> characters with a single <slash>, and replacing three or more leading <slash> characters with a single <slash>. If, as a result of this canonicalization, the curpath variable is null, no further steps shall be taken.
Simple path canonicalization. Interestingly, the last sentence explicitly handles the case where the path ends up null (by making it a no-op), though I'm not sure how that could happen, since every relative path has PWD prepended to it before this step.
If curpath is longer than {PATH_MAX} bytes (including the terminating null) and the directory operand was not longer than {PATH_MAX} bytes (including the terminating null), then curpath shall be converted from an absolute pathname to an equivalent relative pathname if possible. This conversion shall always be considered possible if the value of PWD, with a trailing <slash> added if it does not already have one, is an initial substring of curpath. Whether or not it is considered possible under other circumstances is unspecified. Implementations may also apply this conversion if curpath is not longer than {PATH_MAX} bytes or the directory operand was longer than {PATH_MAX} bytes.
Doesn't seem to do much other than make the path relative again if it's too long.
The cd utility shall then perform actions equivalent to the chdir() function called with curpath as the path argument. If these actions fail for any reason, the cd utility shall display an appropriate error message and the remainder of this step shall not be executed. If the -P option is not in effect, the PWD environment variable shall be set to the value that curpath had on entry to step 9 (i.e., before conversion to a relative pathname). If the -P option is in effect, the PWD environment variable shall be set to the string that would be output by pwd -P. If there is insufficient permission on the new directory, or on any parent of that directory, to determine the current working directory, the value of the PWD environment variable is unspecified.
And here is where the chdir actually takes place.
In conclusion
So in essence, by the standard, cd '' should change to the first existing component of CDPATH if CDPATH is set, or to the current directory otherwise. If cd -P '' is used, it will also remove symlinks from the path. By this reading, chdir should only be called with an empty string if CDPATH is non-null but none of its components exist and cd -P '' is called, as that will pass through step 5, set curpath to an empty string in step 6, and then jump from step 7 to step 10. I don't see any other way chdir would be called with an empty string, unless a bad implementation takes step 9 too literally and sets curpath to an empty string following the last sentence. By these rules, ksh93 on Linux and /bin/sh on AIX are nonconformant. For this reason, I'd be careful about using cd with a path that might evaluate to a zero-length string, as a set CDPATH can weirdly affect what you're trying to do (though CDPATH has unexpected and confusing behavior anyway, and should not be used in most cases).
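To make the step-by-step reading above concrete, here is a minimal POSIX sh sketch that models only steps 3 to 7 for a given operand (no -P, no canonicalization, no actual chdir). The function name cd_curpath is hypothetical, and treating an unset CDPATH as "skip the step 5 search" is an assumption of this sketch, not something the quoted text spells out.

```shell
# Hypothetical helper: cd_curpath OPERAND prints the curpath that POSIX cd
# steps 3-7 would compute for OPERAND (without -P). It does not canonicalize
# (step 8) and does not change directory. Uses globals: POSIX sh has no "local".
cd_curpath() {
    operand=$1
    curpath=
    case $operand in
        /*) curpath=$operand ;;          # step 3: absolute operand
        .|..|./*|../*) ;;                # step 4: skip the CDPATH search
        *)
            # Step 5: try each CDPATH entry in order; a null entry means "."
            if [ -n "${CDPATH+set}" ]; then
                rest=$CDPATH
                while :; do
                    entry=${rest%%:*}
                    case $entry in
                        '') try=./$operand ;;
                        */) try=$entry$operand ;;
                        *)  try=$entry/$operand ;;
                    esac
                    if [ -d "$try" ]; then
                        curpath=$try
                        break
                    fi
                    case $rest in
                        *:*) rest=${rest#*:} ;;  # move to the next entry
                        *)   break ;;
                    esac
                done
            fi
            ;;
    esac
    # Step 6: no CDPATH match, fall back to the operand itself (possibly "")
    [ -n "$curpath" ] || curpath=$operand
    # Step 7 (no -P): make a relative curpath absolute by prefixing PWD
    case $curpath in
        /*) ;;
        *)
            case $PWD in
                */) curpath=$PWD$curpath ;;
                *)  curpath=$PWD/$curpath ;;
            esac
            ;;
    esac
    printf '%s\n' "$curpath"
}
```

Running it in a subshell keeps CDPATH changes local: (CDPATH=/nonexistent:/tmp; cd_curpath '') prints /tmp/ when /tmp exists and /nonexistent does not, which is the CDPATH effect described in the conclusion, while (unset CDPATH; cd_curpath '') prints the current directory with a trailing slash.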

Related

bash array slicing strange syntax in perl path: `${PATH:+:${PATH}}`

On Linux Ubuntu, when you do sudo apt update && sudo apt install perl, it adds the following to the bottom of your ~/.bashrc file (at least, many months later, I think that is what added those lines):
PATH="/home/gabriel/perl5/bin${PATH:+:${PATH}}"; export PATH;
PERL5LIB="/home/gabriel/perl5/lib/perl5${PERL5LIB:+:${PERL5LIB}}"; export PERL5LIB;
PERL_LOCAL_LIB_ROOT="/home/gabriel/perl5${PERL_LOCAL_LIB_ROOT:+:${PERL_LOCAL_LIB_ROOT}}"; export PERL_LOCAL_LIB_ROOT;
PERL_MB_OPT="--install_base \"/home/gabriel/perl5\""; export PERL_MB_OPT;
PERL_MM_OPT="INSTALL_BASE=/home/gabriel/perl5"; export PERL_MM_OPT;
What does this strange syntax do in many of the lines, including in the first line? It appears to be some sort of bash array slicing:
${PATH:+:${PATH}}
The ${PATH} part is pretty straightforward: it reads the contents of the PATH variable, but the rest is pretty cryptic to me.
It's not array slicing; it's a use of one of the POSIX parameter expansion operators. From the bash man page, in the Parameter Expansion section:
${parameter:+word}
Use Alternate Value. If parameter is null or unset, nothing is
substituted, otherwise the expansion of word is substituted.
It's a complex way of making sure that you only add a : to the value if PATH isn't empty to start with. A longer, clearer way of writing it would be
if [ -n "$PATH" ]; then
    PATH=/home/gabriel/perl5/bin:$PATH
else
    PATH=/home/gabriel/perl5/bin
fi
However, since it is almost inconceivable that PATH is empty when .bashrc is sourced, it would be simpler to just prepend the new path and be done with it.
PATH=/home/gabriel/perl5/bin:$PATH
If PATH actually ended with a :, it would implicitly include the current working directory in the search path, which isn't a good idea for security reasons. Also from the bash man page, in the section on Shell Variables under the entry for PATH:
A zero-length (null) directory name in the
value of PATH indicates the current directory. A null directory
name may appear as two adjacent colons, or as an initial or
trailing colon.
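Since a null entry anywhere in PATH (leading, trailing, or two adjacent colons) means the current directory, a quick check can be written with a case pattern. This is a minimal sketch; path_has_null_entry is a hypothetical name, and wrapping the value in extra colons is just a trick so that all three forms show up as ::.

```shell
# Hypothetical helper: succeeds if a colon-separated list (default: $PATH)
# contains a null entry, which the shell treats as the current directory.
path_has_null_entry() {
    list=${1-$PATH}
    # With colons added at both ends, a leading, trailing or inner null
    # entry always appears as two adjacent colons.
    case ":$list:" in
        *::*) return 0 ;;
        *)    return 1 ;;
    esac
}

if path_has_null_entry; then
    echo 'warning: PATH contains an empty entry (current directory)'
fi
```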
As an aside, it's good to understand what various installers try to add to your shell configuration. It's not always necessary, and sometimes it can actively change something you have already configured.
I would much prefer if packages simply printed instructions for what needs to be added to your configuration (and why), and leave it to the user to make the appropriate modifications.
What does this strange syntax do in many of the lines, including in the first line?
It's the ${parameter:+word} form of parameter expansion, where word becomes the expanded value if parameter is set and its value is not the empty string (a.k.a. null).
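A small sketch of the behaviour, assuming a POSIX-compatible shell such as bash or dash. It contrasts ${v:+x} with the colon-less ${v+x} (which only checks whether v is set, not whether it is empty), and wraps the .bashrc idiom in a hypothetical prepend_dir helper.

```shell
# Compare ${v:+x} (set AND non-empty) with ${v+x} (merely set).
unset v; printf '[%s] [%s]\n' "${v:+x}" "${v+x}"   # [] []
v=;      printf '[%s] [%s]\n' "${v:+x}" "${v+x}"   # [] [x]
v=abc;   printf '[%s] [%s]\n' "${v:+x}" "${v+x}"   # [x] [x]

# Hypothetical helper: prepend DIR ($1) to a colon-separated LIST ($2),
# adding the ":" separator only when LIST is non-empty.
prepend_dir() {
    printf '%s\n' "$1${2:+:$2}"
}
prepend_dir /home/gabriel/perl5/bin ''             # /home/gabriel/perl5/bin
prepend_dir /home/gabriel/perl5/bin /usr/bin:/bin  # /home/gabriel/perl5/bin:/usr/bin:/bin
```

The colon in :+ is what makes an empty-but-set PATH behave like an unset one, so no stray leading or trailing : (and therefore no implicit current-directory entry) is produced.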

GOBIN root setting with var multi GOPATH in .zshrc config

export GOPATH=~/mygo:~/go
export GOBIN=$GOPATH/bin
I expected $GOBIN to equal ~/mygo/bin:~/go/bin, but it is ~/mygo:~/go/bin instead.
How could I set them in a better way? Thanks.
Solution
export GOPATH=~/mygo:~/go
export GOBIN=${(j<:>)${${(s<:>)GOPATH}/%//bin}}
Explanation
Although whatever program uses GOPATH might interpret it as an array, for zsh it is just a scalar ("string").
In order to append a string (/bin) to every element, the string "$GOPATH" first needs to be split into an array. In zsh this can be done with the parameter expansion flag s:string:, which splits a scalar on string and returns an array. Instead of :, any other character or a matching pair of (), [], {} or <> can be used as the delimiter. Here a different delimiter is required, because the string to split on is itself :.
GOPATH_ARRAY=(${(s<:>)GOPATH})
Now the ${name/pattern/repl} parameter expansion can be used to append /bin to each element, or rather to replace the end of each element with /bin. In order to match the end of a string, the pattern needs to begin with a %. As any string should be matched, the pattern is otherwise empty:
GOBIN_ARRAY=(${GOPATH_ARRAY/%//bin})
Finally, the array needs to be converted back into a colon-separated string. This can be done with the j:string: parameter expansion flag. It is the counterpart to s:string::
GOBIN=${(j<:>)GOBIN_ARRAY}
Fortunately, zsh allows Nested Substitution, so this can be done all in one statement, without intermediary variables:
GOBIN=${(j<:>)${${(s<:>)GOPATH}/%//bin}}
Alternative Solution
It is also possible to do this without parameter expansion flags or nested substitution, by simply appending /bin to the end of the string and additionally replacing every : with /bin::
export GOBIN=${GOPATH//://bin:}/bin
The ${name//pattern/repl} expansion replaces every occurrence of pattern with repl, instead of just the first one as ${name/pattern/repl} does.
This would also work in bash.
Personally, I feel that it is a bit "hackish", mainly because you need to write /bin twice and also because it completely sidesteps the underlying semantics. But that is only personal preference and the results will be the same.
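For a plain POSIX sh (for example dash), which has neither zsh's parameter expansion flags nor the ${name//pattern/repl} form, the same result can be sketched with a loop over the colon-separated string. append_each is a hypothetical helper name; the example also uses $HOME instead of ~ so the stored values are plain absolute paths.

```shell
# Hypothetical helper: append SUFFIX ($2) to every element of the
# colon-separated LIST ($1), using only POSIX sh parameter expansion.
append_each() {
    list=$1 suffix=$2 out=
    while :; do
        elem=${list%%:*}                 # first remaining element
        out=${out:+$out:}$elem$suffix    # add ":" only after the first element
        case $list in
            *:*) list=${list#*:} ;;      # drop the element just handled
            *)   break ;;
        esac
    done
    printf '%s\n' "$out"
}

GOPATH=$HOME/mygo:$HOME/go
GOBIN=$(append_each "$GOPATH" /bin)
export GOPATH GOBIN
```

This keeps the "split, transform each element, join" structure of the zsh solution, at the cost of being longer.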
Note:
When defining GOPATH like you did in the question
export GOPATH=~/mygo:~/go
zsh will expand each occurrence of ~/ to your home directory. So the value of GOPATH will be /home/kevin/mygo:/home/kevin/go, assuming the user name is "kevin". Accordingly, GOBIN will also have the expanded paths, /home/kevin/mygo/bin:/home/kevin/go/bin, instead of ~/mygo/bin:~/go/bin.
This could be prevented by quoting the value - GOPATH="~/mygo:~/go" - but I would recommend against it. ~ as synonym for the home directory is not a feature of the operating system and while shells usually support it, other programs (those needing GOPATH or GOBIN) might not do so.

How to setup prompt with three nearest folders ../c/Users/test

Here's a sample: ../c/Users/test
I still have no idea how to achieve this in zsh's prompt. If possible, how would it be done in bash as well?
EDIT: "nearest" means that if my pwd is, e.g., /home/abc/c/Users/test, the prompt shows ../c/Users/test.
If I'm under /home/abc/c or /home/abc or /home, the prompt should be /home/abc/c> or /home/abc> or /home>.
So only a current path that exceeds three folders will have .. prepended, followed by the three nearest folders.
ZSH
In zsh this can be achieved entirely with built-in features. You just have to place
%(4/|../%3d|%d)
in your PROMPT parameter (also known as PS1).
For example:
PROMPT='[%n#%m %(4/|../%3d|%d)]%# '
Would get you something like
[abc#machine ../c/Users/test]%
when the current directory is /home/abc/c/Users/test.
Explanation:
%(x|true-text|false-text): x represents a test; if it evaluates to true, zsh prints true-text, otherwise it prints false-text.
4/ is true if the current absolute path has at least 4 elements. For example, /home/abc/c/Users/test has 5 elements.
So, if the current path has 4 or more elements, the output is ../%3d, where %3d will be replaced with the last 3 elements of the current path, for example ../c/Users/test.
If the current path has fewer than 4 elements, the output is %d, which will be replaced by the full current path.
BASH
Method 1: simple but not a perfect match
In bash (since version 4) you can achieve very similar results by setting PROMPT_DIRTRIM=3 and placing \w in PS1. This will also display only the last 3 elements of the current path, preceded by either ~/.../ or .../, depending on whether the current directory is within the user's home directory.
For example:
PS1='[\u#\h \w]\$ '
PROMPT_DIRTRIM=3
would get you
[abc#machine ~/.../c/Users/test]$
when the current working directory is /home/abc/c/Users/test and
[abc#machine .../share/doc/sometool]$
when the current working directory is /usr/local/share/doc/sometool.
Method 2: complicated but works as asked
For an exact match place the following in your PS1:
$(a=${PWD%/*} a=${a%/*} a=${a%/*}; echo ${PWD/#$a/${a:+..}})
For example
PS1='[\u#\h $(a=${PWD%/*} a=${a%/*} a=${a%/*}; echo ${PWD/#$a/${a:+..}})]\$ '
Important: At least the part that generates the path output needs to be fully quoted, e.g. surrounded by single quotes. Otherwise it would be evaluated at the time of definition and not when the prompt is displayed.
Explanation:
$(command): This is called Command Substitution. It will run command and then be substituted by the resulting output.
The parameter PWD contains the current working directory.
a=${PWD%/*}: The shortest possible match to /* will be removed from the end of $PWD and the resulting value will be assigned to parameter a. That is, the last path element will be removed from $PWD.
a=${PWD%/*} a=${a%/*} a=${a%/*}: this removes the last three path elements from $PWD. If $PWD has three or fewer elements, then a will be empty at the end. If there are more than three elements, then a contains all the elements you do not want to be shown, i.e. the ones you want to replace with the .. prefix.
(Note: While a=${PWD%/*/*/*} also removes the last three path elements, it does not work as intended if there are fewer than three elements. In that case the end of $PWD would not match /*/*/* and nothing would be removed, leaving $a identical to $PWD.)
${a:+..}: if a is defined and not null, this is substituted by .., otherwise nothing is substituted. This means that if there are path elements to be removed, ${a:+..} evaluates to the two dots (..).
${PWD/#$a/${a:+..}}: if the beginning of $PWD matches $a, it is replaced by the substitution of ${a:+..}. Essentially, if a contains any path elements, then they will be replaced by .., otherwise nothing will be changed.
echo: As this all happens within a Command Substitution, echo is needed in order to output the shortened path.
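The Method 2 logic can also be pulled out into a function and tested against explicit paths. This is a POSIX sh sketch, not the answer's exact code: short_path is a hypothetical name, and it uses ${p#"$a"} plus ${a:+..} instead of bash's ${PWD/#$a/...}. Quoting $a makes the prefix match literal, even if a directory name contains glob characters.

```shell
# Hypothetical helper: shorten a path (default: $PWD) to ".." plus its last
# three elements, leaving paths with three or fewer elements unchanged.
short_path() {
    p=${1-$PWD}
    a=${p%/*}    # drop the last element
    a=${a%/*}    # ... the second-to-last
    a=${a%/*}    # ... and the third-to-last
    # ${p#"$a"} strips $a as a literal prefix; ".." is added only when
    # something was actually removed (a is non-empty).
    printf '%s\n' "${a:+..}${p#"$a"}"
}
```

In bash it could then be used as PS1='[\u#\h $(short_path)]\$ ', keeping the single quotes so that it is evaluated each time the prompt is displayed.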
this seems to do the trick:
PS1='$(pwd|sed -r "sx.+(/[^/]*/[^/]*/[^/]*)\$x..\1x" ):'
eg:
jasen#gonzo:/var/spool/news/comp/lang/c/moderated$ PS1='$(pwd|sed -r "sx.+(/[^/]*/[^/]*/[^/]*)\$x..\1x" ):'
../lang/c/moderated:cd
/home/jasen:
How it works:
PS1 is expanded to produce the prompt.
I use command substitution $( ... ) to insert the output of a command into the prompt.
The command itself is a pipeline:
first, pwd prints out the current directory;
its output is piped (|) to sed in extended (-r) regular expression mode. sed is given the command
"sx.+(/[^/]*/[^/]*/[^/]*)\$x..\1x"
This is an s (substitute) command.
In this command, the character that follows the s is the separator; here I used x.
The phrase [^/]* indicates a sequence of zero or more non-slashes (like the name of a directory), while the other slashes / represent actual slashes, .+ matches anything at all (but not nothing), and the $ represents the end of the line.
So, anchored at the end of the line, it matches lines that end like /name/name/name.
The bit after the second x, where it says ..\1, is what to replace the match with: in this case .. followed by the text matched by the parenthesised pattern, i.e. ../name/name/name.
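To check the sed expression on its own, outside PS1, the paths can be fed in directly. This sketch assumes a sed that accepts -E for extended regular expressions (GNU and BSD sed both do; -r is the GNU spelling used above), uses | as the delimiter instead of x only for readability, and introduces a hypothetical shorten helper.

```shell
# Hypothetical helper: apply the same substitution to an explicit path.
# .+ is greedy, so the parenthesised group ends up holding the last three
# /name elements; paths with fewer elements do not match and pass through.
shorten() {
    printf '%s\n' "$1" | sed -E 's|.+(/[^/]*/[^/]*/[^/]*)$|..\1|'
}
shorten /var/spool/news/comp/lang/c/moderated   # ../lang/c/moderated
shorten /home/jasen                             # /home/jasen (no match, unchanged)
```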

What is the meaning of '-*' as case parameter in a fish shell script?

The official documentation of fish shell has this example.
function mkdir -d "Create a directory and set CWD"
command mkdir $argv
if test $status = 0
switch $argv[(count $argv)]
case '-*'
case '*'
cd $argv[(count $argv)]
return
end
end
end
I understand case '*' is like default: in C++ switch statement.
What is the meaning or usage of case '-*'?
It's a glob match.
case '-*' will be executed whenever the switched parameter starts with a "-".
And because only the first matching case will be used, case '*' as the last case is like "default:". If you had it earlier, it would swallow all cases after it.
Also the quotes here are necessary because otherwise fish would expand that glob, which would mean case -* would have all matching filenames in the current directory as parameters, so it would be true if the switched parameter is the name of a file in the current directory that starts with "-".
With the help of @faho's answer, I understand the purpose of -*.
-* is a glob pattern. It is no different from patterns like *.pdf or Report_2016_*.
The author added this check to ignore all directories that start with -. It will create a directory that starts with - but will not set CWD to it.
The reason is that - has a special meaning in shells.
For example, cd - does not change directory into a directory named -. Instead it switches to the last directory you were in.
Directories or files whose names start with - are a source of trouble. The following questions on SO sister sites give an idea:
How do you enter a directory that's name is only a minus?
How do I delete a file whose name begins with “-” (hyphen a.k.a. dash or minus)?
How to cd into a directory with this name “-2” (starting with the hyphen)?
No wonder the author decided to ignore directories that start with -.
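For comparison, a rough POSIX sh counterpart of the fish wrapper, where case patterns are also globs. It is named mkcd (a hypothetical name) so it does not shadow mkdir, and like the fish version it only changes into the last argument when that argument does not start with -.

```shell
# Hypothetical POSIX sh counterpart of the fish mkdir wrapper.
mkcd() {
    command mkdir "$@" || return   # run the real mkdir; stop on failure
    for last do :; done            # loop over "$@", leaving the last arg in $last
    case $last in
        -*) ;;                     # looks like an option (or "-"): stay put
        *)  cd "$last" ;;
    esac
}
```

For example, mkcd -p projects/new/src creates the nested directories and then changes into projects/new/src.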

What does the path "//" mean?

I just found the directory // on my machine and now I am wondering what it means.
user#dev:~$ cd /
user#dev:/$ pwd
/
user#dev:/$ cd //
user#dev://$ pwd
//
It is obviously the root directory, but when and why do I use the double slash instead of the single slash?
Is it related to the escaped path strings which I use while programming?
For example:
string path = "//home//user//foo.file"
I also tried it with zsh, but it changes to the usual root directory /. So I think it's bash-specific.
This is part of the specification for Pathname Resolution:
A pathname consisting of a single <slash> shall resolve to the root directory of the process. A null pathname shall not be successfully resolved. If a pathname begins with two successive <slash> characters, the first component following the leading <slash> characters may be interpreted in an implementation-defined manner, although more than two leading <slash> characters shall be treated as a single <slash> character.
So your shell is just following the specification and leaving // alone, as its meaning might be implementation-defined as something other than /.
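The leading-slash part of the quoted rule can be sketched as a small string transformation. leading_slashes is a hypothetical helper that only rewrites the string and does not resolve anything: one slash stays, exactly two are kept because their meaning is implementation-defined, and three or more collapse into one.

```shell
# Hypothetical helper: apply only the leading-slash rule to a path string.
# Slashes later in the path are left untouched.
leading_slashes() {
    p=$1
    case $p in
        ///*)
            # ${p%%[!/]*} is the run of leading slashes; strip it, keep one
            rest=${p#"${p%%[!/]*}"}
            printf '/%s\n' "$rest" ;;
        *)
            # "/", "//" or no leading slash: unchanged
            printf '%s\n' "$p" ;;
    esac
}
```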

Resources