Why can't we use special characters (?, <, ...) in Windows file names?
The Windows naming conventions give fundamental rules that enable applications to create and process valid names for files and directories, regardless of the file system.
The following characters are reserved:
< (less than)
> (greater than)
: (colon)
" (double quote)
/ (forward slash)
\ (backslash)
| (vertical bar or pipe)
? (question mark)
* (asterisk)
Apart from these, you can use any character in the current code page for a name, including Unicode characters and characters in the extended character set (128–255).
The reserved characters are excluded because they have special meanings in the file system:
C:*.? - get all files with single-letter extensions from the C drive
: \ * ? - all have special meanings
Some characters are reserved by the operating system itself: for example, ? is used as a wildcard and / as a path component separator.
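As a rough illustration of the list above, here is a minimal Python sketch that reports which reserved characters appear in a proposed name. The names RESERVED and find_reserved are made up for this example, and the check covers only the nine characters listed here, not every Windows naming rule.

```python
# Reserved characters taken from the list above (RESERVED is an illustrative name).
RESERVED = set('<>:"/\\|?*')


def find_reserved(name: str) -> list[str]:
    """Return the reserved characters found in name, in order of first appearance."""
    found = []
    for ch in name:
        if ch in RESERVED and ch not in found:
            found.append(ch)
    return found


print(find_reserved("report?.txt"))  # ['?']
print(find_reserved("notes.txt"))    # []
```

An empty result only means none of the listed characters is present; it is not a guarantee that Windows will accept the name.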
Related
The command I'm trying is:
Get-ChildItem | Rename-Item -NewName { $_.Name -replace '_','/' }
But apparently / can't be used as a replacement character in Windows file names. The error is:
Cannot rename the specified target, because it represents a path or device name.
As others have already pointed out, what you want simply isn't possible in Windows. Forward slashes are reserved characters that are not allowed in file and folder names.
Naming Conventions
The following fundamental rules enable applications to create and process valid names for files and directories, regardless of the file system:
[…]
Use any character in the current code page for a name, including Unicode characters and characters in the extended character set (128–255), except for the following:
The following reserved characters:
< (less than)
> (greater than)
: (colon)
" (double quote)
/ (forward slash)
\ (backslash)
| (vertical bar or pipe)
? (question mark)
* (asterisk)
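To see why the rename is read as "a path", it can help to look at how a Windows-style path parser splits such a name. The sketch below uses Python's pathlib.PureWindowsPath, which only approximates what Windows itself does and runs on any platform; the file names and the helper split_like_windows are invented for the example.

```python
from pathlib import PureWindowsPath


def split_like_windows(name: str) -> tuple[str, ...]:
    """Split name into path components using Windows path rules."""
    return PureWindowsPath(name).parts


# With an underscore the name is a single component.
print(split_like_windows("2020_report.txt"))  # ('2020_report.txt',)

# With a forward slash it is parsed as a directory plus a file name.
print(split_like_windows("2020/report.txt"))  # ('2020', 'report.txt')
```

If the goal is just a visual separator, replacing _ with a character that is not on the reserved list (such as - or a space) avoids the problem.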
I tried to put a colon in the String of the filename of a filestream.
Is it true that one can't use a colon in a TFileStream in Delphi?
And if you can, then how?
EDIT: Thanks for all the downvotes. It deserves that. In retrospect I have asked a stupid question...
On Windows, which I presume is your platform, the colon is a reserved character and so not allowed in a filename. This is documented here:
File and Directory Names
Naming Conventions
The following fundamental rules enable applications to create and process valid names for files and directories, regardless of the file system:
...
Use any character in the current code page for a name, including Unicode characters and characters in the extended character set (128–255), except for the following:
The following reserved characters:
< (less than)
> (greater than)
: (colon)
" (double quote)
/ (forward slash)
\ (backslash)
| (vertical bar or pipe)
? (question mark)
* (asterisk)
...
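One common workaround is to substitute the colon before the name ever reaches the file API. The Python sketch below is only an illustration of that idea, not Delphi code; the helper replace_colons, the hyphen substitute, and the timestamp-style file name are assumptions made for the example.

```python
from pathlib import PureWindowsPath


def replace_colons(name: str, substitute: str = "-") -> str:
    """Return name with every colon replaced by substitute."""
    return name.replace(":", substitute)


# A colon after a single letter is parsed as a drive specifier, not as part of the name.
print(PureWindowsPath("C:log.txt").drive)  # C:

# Assumed use case: a time of day embedded in a file name.
print(replace_colons("backup 12:30:00.log"))  # backup 12-30-00.log
```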
I am running Windows 7 and (have to) use Turbo Grep (Borland something) to search in a file.
I have two versions of this file, one encoded in UTF-8 and one in ANSI.
If I run the following grep on the ANSI file, I get the expected results, but I get no results with the same statement on the UTF-8 file:
grep -ni "[äöü]" myfile.txt
[-n for line numbers, -i for ignoring cases]
The Turbo Grep Version is :
Turbo GREP 5.6 Copyright (c) 1992-2010 Embarcadero Technologies, Inc.
Syntax: GREP [-rlcnvidzewoqhu] searchstring file[s] or #filelist
GREP ? for help
Help for this command lists:
Options are one or more option characters preceded by "-", and optionally
followed by "+" (turn option on), or "-" (turn it off). The default is "+".
-r+ Regular expression search       -l- File names only
-c- match Count only                -n- Line numbers
-v- Non-matching lines only         -i- Ignore case
-d- Search subdirectories           -z- Verbose
-e  Next argument is searchstring   -w- Word search
-o- UNIX output format                  Default set: [0-9A-Z_]
-q- Quiet: supress normal output
-h- Supress display of filename
-u xxx Create a copy of grep named 'xxx' with current options set as default
A regular expression is one or more occurrences of: One or more characters
optionally enclosed in quotes. The following symbols are treated specially:
^ start of line             $ end of line
. any character             \ quote next character
* match zero or more        + match one or more
[aeiou0-9] match a, e, i, o, u, and 0 thru 9 ;
[^aeiou0-9] match anything but a, e, i, o, u, and 0 thru 9
Is there a problem with the encoding of these characters in UTF-8? Might there be a problem with Turbo Grep and UTF-8?
Thanks in advance
Yes, there is a difference. Windows 7 uses UTF-16 little endian, not UTF-8; UTF-8 is used in Unix, Linux and Plan 9, to cite a few operating systems.
Jon Skeet explains:
ANSI: There's no one fixed ANSI encoding - there are lots of them. Usually when people say "ANSI" they mean "the default code page for my system" which is obtained via Encoding.Default, and is often Windows-1252
UTF-8: Variable length encoding, 1-4 bytes covers every current character. ASCII values are encoded as ASCII.
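For characters such as ä, ö and ü, UTF-16 is closer to ANSI (Windows-1252): it stores the same byte value, followed by a zero byte, whereas UTF-8 uses a different two-byte sequence. That is presumably why a byte-oriented tool finds them in the ANSI file but not in the UTF-8 file.
If you use only ASCII, both encodings are usable, but with special characters such as ä, ö and ü you need to use UTF-16 on Windows and UTF-8 on the others.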
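The byte-level difference can be checked with a short Python sketch. It assumes, without confirmation from the Turbo Grep documentation, that the tool compares raw bytes and that the pattern reaches it in Windows-1252; the sample word and the helper byte_match are invented for the example.

```python
import re

SAMPLE = "Käse"  # hypothetical line of text containing one umlaut

# The character class [äöü] as single bytes in Windows-1252 (0xE4, 0xF6, 0xFC).
BYTE_PATTERN = re.compile("[äöü]".encode("cp1252"))


def byte_match(encoding: str) -> bool:
    """Encode SAMPLE with encoding and search the raw bytes for the pattern."""
    return BYTE_PATTERN.search(SAMPLE.encode(encoding)) is not None


for encoding in ("cp1252", "utf-8", "utf-16-le"):
    print(encoding, SAMPLE.encode(encoding), byte_match(encoding))
```

In Windows-1252 the ä is the single byte 0xE4, so the pattern matches. In UTF-8 it becomes the two bytes 0xC3 0xA4, neither of which is in the class, so nothing is found. In UTF-16 little endian the byte 0xE4 appears again, followed by 0x00.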
I have written a C program that retrieves arguments from the command line under Windows. One of the arguments is a regular expression, so I need to pass special characters such as "( , .", etc., but cmd.exe treats "(" as a special character.
How can I input these special characters?
Thanks.
You can put the arguments in quotes:
myprogram.exe "(this is some text, with special characters.)"
Though I wouldn't assume that parentheses cause problems unless you are using blocks for conditional statements or loops in a batch file. The usual array of characters that are treated specially by the shell and need quoting or escaping are:
& | > < ^
If you use those in your regular expression, then you need quotes, or escape those characters:
myprogram "(.*)|[a-f]+"
myprogram (.*)^|[a-f]+
(^ is the escape character which causes the following character to be not interpreted by the shell but instead used literally)
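If the command line is being generated by a script, the caret escaping shown above can be automated. This is a minimal Python sketch covering only the five characters listed in this answer (& | > < ^); the function name caret_escape is invented, and it makes no attempt to handle quotes, percent signs or other cmd.exe rules.

```python
import re


def caret_escape(argument: str) -> str:
    """Prefix each of & | < > ^ with a caret so cmd.exe treats it literally."""
    return re.sub(r"([&|<>^])", r"^\1", argument)


print(caret_escape("(.*)|[a-f]+"))  # (.*)^|[a-f]+
```

The result matches the second, unquoted form of the example above; wrapping the argument in double quotes remains the simpler option when it is available.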
You can generally prefix any character with ^ to turn off its special nature. For example:
Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.
C:\Documents and Settings\Pax>echo No ^<redirection^> here and can also do ^
More? multi-line, ^(parentheses^) and ^^ itself
No <redirection> here and can also do multi-line, (parentheses) and ^ itself
C:\Documents and Settings\Pax>
That's a caret followed by an ENTER after the word do.
I'm creating a grammar using JavaCC and have run across a small problem. I'm trying to allow any valid character within the extended ASCII set to be recognized by the resulting compiler. After looking at some JavaCC examples (primarily the example showing the JavaCC grammar itself) I set up the following token to recognize my characters:
< CHARACTER:
( (~["'"," ","\\","\n","\r"])
| ("\\"
( ["n","t","b","r","f","\\","'","\""]
| ["0"-"7"] ( ["0"-"7"] )?
| ["0"-"3"] ["0"-"7"] ["0"-"7"]
)
)
)
>
If I'm understanding this correctly it should be matching on the octal representation of all of the ASCII characters, from 0-377 (which covers all 256 characters in the Extended ASCII Set). This performs as expected for all keyboard characters (a-z, 0-9, ?,./ etc) and even for most special characters (© , ¬ ®).
However, whenever I attempt to parse the 'trademark' symbol (™) my parser continually throws an End of File exception, indicating that it is unable to recognize the symbol. Is there some obvious way that I can enhance my definition of a character to allow the trademark symbol to be accepted?
I had a similar issue recognizing special symbols in a text file (either CP1252 or ISO-8859-1 encoded) which was read into a String before parsing. My solution was adding the UNICODE_INPUT option to the grammar header:
options {
UNICODE_INPUT=true;
}
Worked like a breeze.
More information on JavaCC options: http://javacc.java.net/doc/javaccgrm.html
It turns out that what I wanted my grammar to do was to accept all valid Unicode characters, not just ASCII characters. The ™ symbol is part of the Unicode specification and is not in an extended ASCII character set. Changing my token for a valid character as outlined below solved my problem (a valid Unicode escape being of the format U+00FF):
< CHARACTER:( (~["'"," ","\\","\n","\r"])
| ("\\"
( ["n","t","b","r","f","\\","'","\""]
| ["u","U"]["+"]["0"-"9","a"-"f","A"-"F"]["0"-"9","a"-"f","A"-"F"]["0"-"9","a"-"f","A"-"F"]["0"-"9","a"-"f","A"-"F"]
)
) )>