How to bulk substitute _ by / in Windows filenames for all files in a folder? - windows

The command I'm trying is:
Get-ChildItem | Rename-Item -NewName { $_.Name -replace '_','/' }
But apparently / can't be used as a replacement character in Windows file names. The error is:
Cannot rename the specified target, because it represents a path or device name.

As others have already pointed out, what you want simply isn't possible in Windows. Forward slashes are reserved characters that are not allowed in file and folder names.
Naming Conventions
The following fundamental rules enable applications to create and process valid names for files and directories, regardless of the file system:
[…]
Use any character in the current code page for a name, including Unicode characters and characters in the extended character set (128–255), except for the following:
The following reserved characters:
< (less than)
> (greater than)
: (colon)
" (double quote)
/ (forward slash)
\ (backslash)
| (vertical bar or pipe)
? (question mark)
* (asterisk)
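Since / can never appear in a name, a bulk rename has to use a replacement character that is permitted. The following is a minimal Python sketch of that check, not a drop-in replacement for the PowerShell pipeline. `RESERVED` is copied from the list above; `plan_renames` and its parameters are hypothetical names introduced for illustration. It only computes old/new name pairs and never touches the disk.

```python
# Reserved characters from the Windows naming conventions quoted above.
RESERVED = set('<>:"/\\|?*')


def plan_renames(names, old, new):
    """Return (old_name, new_name) pairs, refusing reserved replacements.

    `plan_renames` is a hypothetical helper, not part of any library.
    """
    bad = sorted(set(new) & RESERVED)
    if bad:
        raise ValueError(f"replacement contains reserved characters: {bad}")
    # Only names that actually contain `old` need to be renamed.
    return [(n, n.replace(old, new)) for n in names if old in n]


if __name__ == "__main__":
    print(plan_renames(["a_b.txt", "plain.txt"], "_", "-"))
    try:
        plan_renames(["a_b.txt"], "_", "/")
    except ValueError as exc:
        print("refused:", exc)
```

Rejecting the replacement up front avoids a partial run in which some files are renamed before the first failure.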

Related

Detecting invalid (Windows) filenames

We have SMB shares that are used by Windows and Mac clients. We want to move some data to Sharepoint, but need to validate the filenames against characters that are not allowed in Windows. Although Windows users wouldn't be able to create files with illegal characters anyway, Mac users are still able to create files with characters that are illegal in Windows.
The problem is that for files with illegal characters in their names, Windows/PowerShell substitutes those characters with Unicode codepoints from the Private Use Area. The substitute varies by input character.
$testfolder = "\\server\test\test*dir" # created from a Mac
$item = get-item -path $testfolder
$item.Name # test▯dir (the substituted character has no printable glyph)
$char = $($item.Name)[4] # ▯
$bytes = [System.Text.Encoding]::BigEndianUnicode.GetBytes($char) # 240:33
$unicode = [System.BitConverter]::toString($bytes) # F0-21
For a file with name pipe|, the above code produces the output F0-27, so it's not simply a generic "invalid" character.
How can I check filenames for invalid values when I can't actually get the values?
As often happens, in trying to formulate my question as precisely as possible, I came upon a solution. I would still love any other answers for how this could be tackled more elegantly, but since I didn't find any other resources with this information, I'm providing my solution here in hopes it might help others with this same problem.
Invalid Characters Map to Specific Codepoints
Note: I'm extrapolating all of this from observations I've made. I'm happy for someone to comment or provide an alternative answer that is more complete or correct.
There is a certain set of characters that are invalid for Windows file names, but this is a restriction of the OS, NOT the filesystem. This means that it's possible to set a filename on an SMB share that is valid on another OS (e.g. macOS) but not on Windows. When Windows encounters such a file, the invalid characters are shadowed by a set of proxy Unicode codepoints, which allows Windows to interact with the files without renaming them. These codepoints are in the Unicode Private Use Area, which covers 0xE000-0xF8FF. Since these codepoints are not mapped to printable characters, PowerShell displays them all as ▯ (U+25AF). In my specific use case, I need to run a report of which invalid characters are present in a filename, so this generic placeholder character is not helpful.
Through experimentation, I was able to determine the proxy codepoints for each of the printable restricted characters. I've included them below for reference (note: YMMV on this, I haven't tested it on multiple systems, but I suspect it's consistent between versions).
Character           Unicode
"                   0xF020
*                   0xF021
/                   0xF022
<                   0xF023
>                   0xF024
?                   0xF025
\                   0xF026
|                   0xF027
(trailing space)    0xF028
: is not allowed in filenames on any system I have easy access to, so I wasn't able to test that one.
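As an illustration of how the observed mapping can be used as a lookup table, here is a small Python sketch. The code points are the ones listed above (observed, not officially documented); `PROXY_TO_CHAR`, `decode_proxies` and `is_private_use` are hypothetical names chosen for this example.

```python
# Proxy code points observed in the table above (not official documentation).
PROXY_TO_CHAR = {
    0xF020: '"',
    0xF021: "*",
    0xF022: "/",
    0xF023: "<",
    0xF024: ">",
    0xF025: "?",
    0xF026: "\\",
    0xF027: "|",
    0xF028: " ",
}


def decode_proxies(name):
    """Replace each known proxy code point with the character it stands for."""
    return name.translate(PROXY_TO_CHAR)


def is_private_use(ch):
    """True if ch lies in the Private Use Area range U+E000 to U+F8FF."""
    return 0xE000 <= ord(ch) <= 0xF8FF


if __name__ == "__main__":
    raw = "test\uf021dir"          # how Windows reports "test*dir"
    print(decode_proxies(raw))     # test*dir
```

Any character that still satisfies `is_private_use` after decoding would be a proxy that is not in the table yet, which is one way to spot mappings that remain to be discovered.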
Testing names in Powershell
Now that we know this, it's pretty simple to tackle in powershell. I created a hashtable with all of the proxy unicode points as keys and the "real" characters as values, which we can then use as a lookup table. I chose to replace the characters in the filename string before testing the name. This makes debugging easier.
#Set up regex for invalid characters
$invalid = [Regex]::new('^\s|[\"\*\:<>?\/\\\|]|\s$')
#Create lookup table for unicode values
$charmap = @{
[char]0xF020 = '"'
[char]0xF021 = '*'
[char]0xF022 = '/'
[char]0xF023 = '<'
[char]0xF024 = '>'
[char]0xF025 = '?'
[char]0xF026 = '\'
[char]0xF027 = '|'
[char]0xF028 = ' '
}
Get-ChildItem -Path "\\path\to\folder" -Recurse | Foreach-Object {
    # Get the filename
    $fixedname = split-path -path $_.FullName -leaf
    # Iterate through the hashtable and replace all the proxy characters with printable versions
    foreach($key in $charmap.getEnumerator()){
        $fixedname = $fixedname.Replace($key.Name,$key.Value)
    }
    # Build a list of invalid characters to include in report (not shown here)
    $invalidmatches = $invalid.Matches($fixedname)
    if ($invalidmatches.count -gt 0) {
        $invalidchars = $($invalidmatches | foreach-object {
            if ($_.value -eq ' '){"Leading or trailing space"} else {$_.value}}) -join ", "
    }
}
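For readers who want to check the reporting step in isolation, the sketch below mirrors the regex test in Python. It assumes the name has already had its proxy characters replaced with the printable versions; `INVALID` and `describe_invalid` are hypothetical names, and the pattern is a direct transcription of the PowerShell one rather than an authoritative list.

```python
import re

# Same idea as the PowerShell regex above: a leading whitespace character,
# any reserved character, or a trailing whitespace character.
INVALID = re.compile(r'^\s|["*:<>?/\\|]|\s$')


def describe_invalid(name):
    """Return a comma-separated description of invalid characters in name."""
    found = []
    for match in INVALID.finditer(name):
        value = match.group()
        found.append("Leading or trailing space" if value == " " else value)
    return ", ".join(found)


if __name__ == "__main__":
    for candidate in ["test*dir", "pipe| ", "ok.txt"]:
        print(repr(candidate), "->", repr(describe_invalid(candidate)))
```

An empty result means the name passed the check, so a report only needs to list names with a non-empty description.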
Extending the solution
In theory, you could also extend this to cover other prohibited characters, such as the ASCII control characters. Since these proxy unicode points are in the PUA, and there is no documentation on how this is handled (as far as I know), discovering these associations is down to experimentation. I'm content to stop here, as I have run through all of the characters that are easily put in filenames by users on MacOS systems.

Is ./*/ portable?

I often use ./*/ in a for loop like
for d in ./*/; do
: # do something with dirs
done
to match all non-hidden directories in current working directory, but I'm not really sure if this is a portable way to do that. I have bash, dash and ksh installed on my system and it works with all, but since POSIX spec doesn't say anything about it (or it says implicitly, and I missed it) I think I can't rely on it. I also checked POSIX bug reports, but to no avail, there's no mention of it there as well.
Is its behaviour implementation or filesystem dependent? Am I missing something here? How do I know if it's portable or not?
Short answer: YES
Long Answer:
The POSIX standard (from The Open Group) states that a / in the pattern will only match slashes in the expanded file name. Since Unix/Linux does not allow / in a file name, I believe that this is a safe assumption on Unix/Linux systems.
From the quoted text below, it seems that even on systems that allow / in a file name, the POSIX standard requires that / not be matched against such a file.
On Windows, it looks like / is not allowed in file names either, but I'm not an expert on Windows.
From Shell Programming Language § Patterns Used for Filename Expansion:
The slash character in a pathname shall be explicitly matched by using one or more slashes in the pattern; it shall neither be matched by the asterisk or question-mark special characters nor by a bracket expression. Slashes in the pattern shall be identified before bracket expressions; thus, a slash cannot be included in a pattern bracket expression used for filename expansion.
...
Additional Note - clarifying pathname:
The pathname is defined in 4.13, with explicit reference to pathname with trailing slash in General Concepts § Pathname Resolution.
A pathname that contains at least one non-<slash> character and that ends with one or more trailing <slash> characters shall not be resolved successfully unless the last pathname component before the trailing <slash> characters names an existing directory or a directory entry that is to be created for a directory immediately after the pathname is resolved. Interfaces using pathname resolution may specify additional constraints when a pathname that does not name an existing directory contains at least one non-<slash> character and contains one or more trailing <slash> characters.

":"-Character in a windows filesystem (formerly: ":"-Character in Delphi TFileStream)

I tried to put a colon in the String of the filename of a filestream.
Is it true that one can't use a colon in a TFileStream in Delphi?
And if you can, then how?
EDIT: Thanks for all the downvotes. It deserves that. In retrospect I have asked a stupid question...
On Windows, which I presume is your platform, the colon is a reserved character and so not allowed in a filename. This is documented here:
File and Directory Names
Naming Conventions
The following fundamental rules enable applications to create and process valid names for files and directories, regardless of the file system:
...
Use any character in the current code page for a name, including Unicode characters and characters in the extended character set (128–255), except for the following:
The following reserved characters:
< (less than)
> (greater than)
: (colon)
" (double quote)
/ (forward slash)
\ (backslash)
| (vertical bar or pipe)
? (question mark)
* (asterisk)
...

Replace incorrectly displayed special chars in bash

I've uploaded a large number of files, including their folder structure, to my Ubuntu 12.04 LTS Server using WinSCP.
The goal is to access these files in Owncloud.
However, all files that contain special characters like German umlauts cause problems. In Owncloud's view, their name is cut off at the special character, and trying to view that folder or file sends you back to the folder root.
Using ls, the special character is always displayed as a question mark, e.g. "Motorschwei?en1.jpg"
What works is manually renaming them through "mv" in the shell. Inserting the special char properly, e.g. "Motorschweißen1.jpg" for this example, does work, but doing this for all of them would take ages.
Using find . -name "?" will not yield any hits.
Is there any way to replace all of those special characters, e.g. with an underscore?
Try the command rename:
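rename 's/\W/_/g' *
The above command replaces every non-word character (anything other than a letter, digit, or underscore) with _. Note that this includes the dot before the file extension. The substitution operator s///g is needed here because the transliteration operator y/// does not understand character classes such as \W. See http://perldoc.perl.org/perlop.html#Regexp-Quote-Like-Operators and http://perldoc.perl.org/perlre.html#Special-Backtracking-Control-Verbs for the documentation of Perl regular expressions.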
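To preview what such a replacement would do before renaming anything, the same idea can be sketched in Python. This is an illustration under stated assumptions, not an equivalent of the Perl command: Python's `\w` is Unicode-aware, so a correctly encoded ß is kept, while bytes that could not be decoded (which Python represents as lone surrogates) are replaced. `sanitize` and its `keep` parameter are hypothetical names.

```python
import re


def sanitize(name, keep="."):
    """Replace every character that is neither a word character nor listed
    in `keep` with an underscore. `sanitize` is a hypothetical helper."""
    pattern = re.compile(r"[^\w" + re.escape(keep) + r"]")
    return pattern.sub("_", name)


if __name__ == "__main__":
    for original in ["a b?c.jpg", "Motorschwei\u00dfen1.jpg"]:
        print(original, "->", sanitize(original))
```

Keeping the dot by default preserves file extensions; pass `keep=""` to replace it as well.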

Special Character in windows file name

Why can't we use special characters (?, <, ...) in Windows file names?
These are the fundamental naming rules for Windows, which enable applications to create and process valid names for files and directories, regardless of the file system:
The following characters are reserved:
< (less than)
> (greater than)
: (colon)
" (double quote)
/ (forward slash)
\ (backslash)
| (vertical bar or pipe)
? (question mark)
* (asterisk)
Any other character in the current code page may be used in a name, including Unicode characters and characters in the extended character set (128–255).
The reserved characters are excluded because they have special meanings in the filesystem:
C:*.? - get all files with single-letter extensions from the C drive
: \ * ? - all have special meanings
Some characters are reserved by the operating system itself: for example, ? is used as a wildcard and / as a path name component separator.
