What does the tilde mean in a Windows file pattern?

I have a search pattern, say "*.txt".
Some files in the directory should not be listed, because I believe they do not match this pattern.
But on Windows, they do.
I know the tilde character is used in the short form of legacy 8.3 filenames; for example, LongFilename.json might become LONGFI~1.JSO. But I did not know that the tilde is treated specially in file search patterns on Windows. It is, and I cannot find any documentation on what it means or how to match files the way I want.
My problem is NOT with short forms, or at least I do not think it is directly related to them.
I have a file "A.txt". I wanted a temporary file and named it "A.txt~". That is the Unix convention for backup files, which are usually not shown there. On Windows the name should have no special meaning by itself, only for my application.
Now I want a list of "*.txt" files. The command
dir *.txt
to my surprise also returns all the .txt~ files in the same directory, and I do not want them. I use FindFirstFile from the Win32 API and did not find anything about the tilde character in its documentation. Calling FindFirstFile with the pattern "*.txt" also returns files such as "A.txt~". Can I use some flag to exclude them? I know I can add a special condition, like the one I have for "." and "..". How does the ~ operator work? A.txt~1 is also matched. Is everything after the tilde ignored? Is that a feature or a bug?
I am testing this on Windows 7 Professional, 64-bit edition, if that changes anything.

FindFirstFile also matches against short names for legacy reasons, so the pattern *.txt includes any file whose 8.3 representation ends in .TXT. Because the short-name extension is truncated to three characters, that covers *.txtANYTHING, not just names containing the ~ character (see dir /x for what is being matched against).
You will need to filter the results yourself in your FindNextFile loop.
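One way to picture that filtering, sketched in Python rather than C for brevity: match the pattern against the long name only, so the short alias never takes part. The helper name match_long_names is hypothetical, and the list stands in for the cFileName values collected from the enumeration.

```python
import fnmatch

def match_long_names(names, pattern):
    """Keep only names whose long form matches the pattern.

    Case is folded by hand because Windows file names are case-insensitive,
    while fnmatchcase compares characters exactly.
    """
    folded = pattern.lower()
    return [n for n in names if fnmatch.fnmatchcase(n.lower(), folded)]

# Stand-in for the cFileName values returned by the enumeration.
found = ["A.txt", "A.txt~", "A.txt~1", "notes.TXT", "B.text"]
print(match_long_names(found, "*.txt"))
```

In a C implementation the same check would be applied to each WIN32_FIND_DATA entry before it is accepted.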

If you are searching for .txt files, for example, you can use the "kind:text" option in Windows to exclude txt~ and similar files, since they are no longer a recognized type.
That works in the regular Windows Search. I'm not 100% sure about the API, but it should be available there as well.

Related

Documented behavior for multiple backslashes in Windows paths

Apparently, Windows (or at least some part of Windows) ignores multiple backslashes in a path and treats them as a single backslash. For example, executing any of these commands from a command prompt or the Run window opens Notepad:
C:\Windows\System32\Notepad.exe
C:\Windows\System32\\Notepad.exe
C:\Windows\System32\\\Notepad.exe
C:\Windows\System32\\\\Notepad.exe
C:\\Windows\\System32\\Notepad.exe
C:\\\Windows\\\System32\\\Notepad.exe
This can even work with arguments passed on the command line:
notepad "C:\Users\username\Desktop\\\\myfile.txt"
Is this behavior documented anywhere? I tried several searches, and only found this SO question that even mentions the behavior.
Note: I am not asking about UNC paths (\\servername), the \\?\ prefix, or the \\" double-quote escape.
Note: I stumbled upon this behavior while working with a batch file. One line in the batch file looked something like this:
"%SOME_PATH%\myapp.exe"
After variable expansion, the command looked like:
"C:\Program Files\Vendor\MyApp\\myapp.exe"
To my surprise, the batch file executed as desired and did not fail with some kind of "path not found" error.
In most cases, Win32 API functions will accept a wide range of variations in the path name format, including converting a relative path into an absolute path based on the current directory or per-drive current directory, interpreting a single dot as "this directory" and two dots as "the parent directory", converting forward slashes into backslashes, and removing extraneous backslashes and trailing periods.
So something like
c:\documents\..\code.\\working\.\myprogram\\\runme.exe..
will wind up interpreted as
c:\code\working\myprogram\runme.exe
Some of this is documented, some is not. (As Hans points out, documenting this sort of workaround legitimizes doing it wrong.)
Note that this applies to the Win32 API, not necessarily to every application, or even every system component. In particular, the command interpreter has stricter rules when dealing with a long path, and Explorer will not accept the dot or double-dot and typically will not accept forward slashes. Also, the rules may be different for network drives if the server is not running Windows.
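The separator-collapsing part can be observed without calling Win32 at all, because Python's ntpath module applies similar textual rules. This is only an approximation of what the Win32 layer does: ntpath.normpath collapses repeated separators, converts forward slashes, and resolves "." and "..", but it does not strip trailing periods. The sample paths are illustrative.

```python
import ntpath

# Different spellings of the same path, similar to the ones in the question.
variants = [
    r"C:\Windows\System32\Notepad.exe",
    r"C:\Windows\System32\\\Notepad.exe",
    r"C:\\Windows\\System32\\Notepad.exe",
    "C:/Windows//System32/./Notepad.exe",
    r"C:\Windows\Temp\..\System32\Notepad.exe",
]

# A set keeps only distinct results, so one element means all variants agree.
normalized = {ntpath.normpath(p) for p in variants}
print(normalized)
```

Because ntpath is pure string manipulation, this runs on any platform and never touches the file system.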
There is no consequence because you can't even name a file or folder with a backslash. So multiple consecutive backslashes will always be seen as one separator in the path.

Globbing files having single different character in filename

I have a couple of files named like this:
unit-a.test.one.two
unit-b.test.one.two
unit-c.test.one.two
and I want to rename all of them so that the filenames become
unit-a.test.one.sample.two
and so on.
While I know that a glob such as
unit-*
will match all the files, I was wondering if there is a better way to match the filenames, since this glob will also incorrectly match files such as
unit-1.txt
which is undesirable.
Why not use unit-?.test.one.two, where ? matches exactly one character?
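A minimal sketch of that suggestion in Python, assuming the goal is to insert "sample" before the last suffix. The function name plan_renames and the sample list are hypothetical, and the sketch only computes the new names without touching any files.

```python
import fnmatch

def plan_renames(names, pattern="unit-?.test.one.two", insert="sample"):
    """Map each matching name to a new name with `insert` before the last suffix."""
    plan = {}
    for name in names:
        # '?' matches exactly one character, so unit-ab... and unit-1.txt are skipped.
        if fnmatch.fnmatchcase(name, pattern):
            stem, last = name.rsplit(".", 1)
            plan[name] = f"{stem}.{insert}.{last}"
    return plan

files = ["unit-a.test.one.two", "unit-b.test.one.two", "unit-1.txt", "unit-ab.test.one.two"]
for old, new in plan_renames(files).items():
    print(old, "->", new)
```

Note that ? accepts any single character, digits included; a bracket expression such as unit-[a-z].test.one.two is stricter if only letters should match.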

FindFirstFile Multiple file types

Is it possible to use the Windows API function FindFirstFile to search for multiple file types, e.g. *.txt and *.doc, at the same time?
I tried to separate the patterns with '\0', but it does not work: it searches only for the first pattern (I guess because it treats '\0' as the end of the string).
Of course, I can call FindFirstFile with the *.* pattern and then check my patterns, or call it once for every pattern, but I don't like this idea. I will use it only if there are no other solutions.
This is not supported. Run it twice with different wildcards, or use *.* and filter the result. Filtering is definitely the better choice, because wildcards are ambiguous anyway due to support for legacy MS-DOS 8.3 filenames. A wildcard like *.doc will find both .doc and .docx files, for example, because a filename like longfilename.docx also gets a short-name entry such as LONGFI~1.DOC.
The MSDN docs mention nothing about FindFirstFile accepting multiple search patterns, so it is not supported.
In this case your best bet is to scan using an open selection (like C:\\some directory\* or *) and then filter based on WIN32_FIND_DATA's cFileName member, using strrchr (or the appropriate Unicode variant) to find the extension. It should run pretty fast for the small set of characters that make up a file extension.
If you know that all the extensions are, say, 3 characters long, you should be able to narrow the scan with *.??? to speed things up.
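The scan-then-filter approach can be sketched as follows. This is Python rather than Win32 C, and the function name find_by_extensions is hypothetical; os.scandir plays the role of the FindFirstFile/FindNextFile loop, and os.path.splitext plays the role of the strrchr lookup.

```python
import os

def find_by_extensions(directory, extensions):
    """Return sorted file names in `directory` whose extension is in `extensions`.

    Extensions are compared case-insensitively and must match exactly,
    so ".docx" is not returned when only ".doc" is requested.
    """
    wanted = {e.lower() for e in extensions}
    matches = []
    with os.scandir(directory) as entries:
        for entry in entries:
            ext = os.path.splitext(entry.name)[1].lower()
            if entry.is_file() and ext in wanted:
                matches.append(entry.name)
    return sorted(matches)

# List .txt and .doc files in the current directory.
print(find_by_extensions(".", {".txt", ".doc"}))
```

Filtering on the long name in this way also sidesteps the 8.3 ambiguity described above.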

Rename file in Win32 to name with only differences in capitalization

Does anyone know a pure Win32 solution for renaming a file when only its capitalization changes, one that does not involve an intermediate rename to a different name or special privileges (e.g. backup, restore)?
Since the Win32 subsystem generally regards two file names differing only in capitalization as the same, I haven't been able to find any solution to the problem.
A test program I made with the MoveFile API seems to work. So does the rename command in cmd.exe. What have you tried, and what error are you getting?
This isn't directly relevant, but further testing shows that renaming a long filename in this way works, although it incidentally changes the short filename (alternating between ~1 and ~2, for example).
Just use the normal MoveFile API. That call probably just turns into ZwSetInformationFile(..., FileRenameInformation,...). The docs for FILE_RENAME_INFORMATION state that you need DELETE access, that the file can't be locked, and so on, but those restrictions will probably apply to other solutions as well.
I do not believe there is a way to expose two files whose names differ only in capitalization to the Win32 subsystem. Even if you somehow were able to create these files, the most likely result would be that only one of them would be accessible, defeating the purpose of staying solely in Win32.
If you want to go into the Native layer, you can create a file with NtCreateFile and InitializeObjectAttributes without OBJ_CASE_INSENSITIVE, or you can pad the end with extra spaces (if you pad with extra spaces, the file will not be accessible from Win32 DOS paths). See here: http://www.osronline.com/ddkx/kmarch/k109_66uq.htm . I'm pretty sure you were already aware of this, but I included it in case you were not.
So long as your file is not immediately needed by another program, you can use my solution.
Rename the file with the new capitalization but without its last letter, then rename it again to put the letter back.
:)

Are there any invalid Linux filenames?

If I wanted to create a string which is guaranteed not to represent a filename, I could put one of the following characters in it on Windows:
\ / : * ? | < >
e.g.
this-is-a-filename.png
?this-is-not.png
Is there any way to identify a string as 'not possibly a file' on Linux?
There are almost no restrictions - apart from '/' and '\0', you're allowed to use anything. However, some people think it's not a good idea to allow this much flexibility.
An empty string is the only truly invalid path name on Linux, which may work for you if you need only one invalid name. You could also use a string like "///foo", which would not be a canonical path name, although it could refer to a file ("/foo"). Another possibility would be something like "/dev/null/foo", since /dev/null has a POSIX-defined non-directory meaning. If you only need strings that could not refer to a regular file you could use "/" or ".", since those are always directories.
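The rules from these answers can be collected into a small check. This is a sketch for a single name component, not a full path; the function name can_be_linux_filename is hypothetical, and the 255-byte limit is an assumption (it is the usual NAME_MAX, but the real limit depends on the filesystem).

```python
def can_be_linux_filename(name, name_max=255):
    """Return True if `name` could name a single file (not a path) on Linux.

    name_max=255 is an assumption: the usual NAME_MAX in bytes,
    though the actual limit is set by the filesystem.
    """
    if name in ("", ".", ".."):
        return False  # empty, or entries that always refer to directories
    if "/" in name or "\0" in name:
        return False  # the only two forbidden characters
    # The limit applies to the encoded byte length, not the character count.
    return len(name.encode("utf-8", "surrogateescape")) <= name_max

for candidate in ["this-is-a-filename.png", "?this-is-not.png", "a/b", ""]:
    print(repr(candidate), can_be_linux_filename(candidate))
```

Names that are invalid on Windows, such as "?this-is-not.png", pass this check, which is the point made above: on Linux almost any string can be a file name.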
Technically it's not invalid, but a file whose name begins with a dash (-) will cause you a lot of trouble, because the name can be mistaken for a command-line option.
I personally find that a lot of the time the problem is not Linux but the applications one is using on Linux.
Take Amarok, for example. Recently I noticed that certain artists I had copied from my Windows machine were not appearing in the library. I checked and confirmed that the files were there, and then I noticed that certain characters in the folder names (named for the artist) were shown as a weird-looking square rather than an actual character.
In a shell terminal the filenames look even stranger: /Music/Albums/Einst$'\374'rzende\ Neubauten is an example of how strange.
While these files were definitely there, Amarok could not see them for some reason. I was able to use some shell trickery to rename them to sane versions, which I could then rename with ASCII-only characters using MusicBrainz Picard. Unfortunately, Picard was also unable to open the files until I renamed them, hence the need for a shell script.
Overall this is a tricky area, and it seems to get very thorny if you are trying to synchronise a music collection between Windows and Linux when certain folder or file names contain funky characters.
The safest thing to do is stick to ASCII-only filenames.
