Globbing files having single different character in filename - bash

I have a couple of files named like this:
unit-a.test.one.two
unit-b.test.one.two
unit-c.test.one.two
and I want to rename each of them so that, for example, the first becomes
unit-a.test.one.sample.two
and so on.
I know that a glob such as
unit-*
will match all of these files, but it will also match files such as
unit-1.txt
which is undesirable. Is there a better pattern that matches only the filenames above?

Why not use unit-?.test.one.two? The ? wildcard matches exactly one character, so it matches the three files above but not unit-1.txt.
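To illustrate how the single-character wildcard narrows the match, here is a minimal sketch in Python rather than bash. It only plans the renames and does not touch the disk. The function name plan_renames is made up for this example, and inserting "sample" before the last dot-separated component is my reading of the desired target name.

```python
import fnmatch

# "?" matches exactly one character, so "unit-ab..." and "unit-1.txt" are skipped.
PATTERN = "unit-?.test.one.two"

def plan_renames(names):
    """Return (old, new) pairs for the names that match PATTERN."""
    plan = []
    for name in names:
        if fnmatch.fnmatchcase(name, PATTERN):
            # Split off the last component and insert "sample" before it.
            stem, last = name.rsplit(".", 1)
            plan.append((name, f"{stem}.sample.{last}"))
    return plan
```

Each pair could then be passed to os.rename once the plan looks right.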

Related

bzr - how to only commit one file pattern

I have a large hierarchical directory structure. I only want to commit files of one type (say *.c) and ignore all the other files. I know how to use .bzrignore to ignore specific file patterns, but is it possible to set up something like a .bzrinclude file that includes only a specific file pattern?
Thanks!
Yes, this should be possible because Bazaar ignore files support regexes as patterns, using the RE: prefix:
http://doc.bazaar.canonical.com/beta/en/user-reference/patterns-help.html
So you just need to design a regular expression which matches everything except the files you're interested in.
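One way to build such an "everything except" expression is a negative lookahead. The sketch below only checks the regular expression itself in Python; the names NOT_C_FILE and would_be_ignored are invented for the example. How Bazaar anchors the pattern, and whether ignoring non-matching directories interferes with descending into them, are things I have not verified, so treat the pattern as a starting point.

```python
import re

# Matches any path that does NOT end in ".c" (negative lookahead at the start).
NOT_C_FILE = re.compile(r"(?!.*\.c$).*")

def would_be_ignored(path):
    """True if the whole path matches the 'not a .c file' expression."""
    return NOT_C_FILE.fullmatch(path) is not None
```

Under that assumption, the corresponding ignore entry would presumably be the same expression after the RE: prefix.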

What does a tilde mean in a Windows file pattern?

I have a search pattern, say "*.txt".
There are some files that I do not want it to list. I believed they would not match this pattern.
But on Windows, they do.
I know the tilde character is used in the short form of legacy 8.3 filenames; for example, LongFilename.json might become LONGFI~1.JSO. But I did not know that tildes are treated specially in file search patterns on Windows. Apparently they are, and I cannot find any documentation about what they mean or how to match files the way I want.
My problem is NOT with short names, or at least I think it is not directly related to them.
I have a file "A.txt". I wanted a temporary file and named it "A.txt~". That is the convention for Unix backup files, which are usually not shown. On Windows, though, the name should have no special meaning by itself, only for my application.
Now I want a list of "*.txt" files. To my surprise, the command
dir *.txt
also returns all .txt~ files in the same directory, and I do not want them. I use FindFirstFile from the Win32 API, and I did not find anything about the tilde character in its documentation. Calling FindFirstFile with the pattern "*.txt" also returns files such as "A.txt~". Can I use some flag to exclude them? I know I can add a special condition, like the one I have for "." and "..". How does the ~ operator work? A.txt~1 is also matched. Is everything after the tilde ignored? Is that a feature or a bug?
I am testing this on Windows 7 Professional, 64-bit edition, in case that changes anything.
FindFirstFile also matches against short names for legacy reasons, so the pattern *.txt will match any file whose 8.3 representation ends in .txt. That includes *.txtANYTHING, not just names containing the ~ character (see dir /x for what is being matched against).
You will need to filter in your FindNext enumeration.
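The filtering step amounts to re-checking the long name of every result. Here is a minimal sketch of that check, written in Python for brevity rather than against the Win32 API; the function name exact_extension is made up for this example, and the comparison is case-insensitive on the assumption that Windows names should be treated that way.

```python
def exact_extension(names, ext):
    """Keep only names whose long name really ends in ext (case-insensitive)."""
    ext = ext.lower()
    return [n for n in names if n.lower().endswith(ext)]
```

In a FindFirstFile/FindNextFile loop the same test would be applied to the long file name of each result before accepting it.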
If you are searching for .txt files, for example, you can use the "kind:text" option in Windows to exclude txt~ and similar files, since they are no longer a recognized type.
That works in regular Windows search. I'm not 100% sure about the API, but it should be available there too.

FindFirstFile Multiple file types

Is it possible to use the Windows API function FindFirstFile to search for multiple file types, e.g. *.txt and *.doc, at the same time?
I tried separating the patterns with '\0', but it does not work: it searches only for the first pattern (I guess that's because it treats '\0' as the end of the string).
Of course, I can call FindFirstFile with the *.* pattern and then check my patterns, or call it once for every pattern, but I don't like this idea. I will use it only if there are no other solutions.
This is not supported. Run it twice with different wildcards, or use *.* and filter the result. Filtering is definitely the better choice, because wildcards are ambiguous anyway due to support for legacy MS-DOS 8.3 filenames. A wildcard like *.doc will find both .doc and .docx files, for example, because a filename like longfilename.docx also creates an entry named LONGFI~1.DOC.
The MSDN docs mention nothing about FindFirstFile allowing multiple search patterns, hence it doesn't exist.
In this case your best bet is to scan using an open pattern (like C:\\some directory\* or *) and then filter on the cFileName member of WIN32_FIND_DATA, using strrchr (or the appropriate Unicode variant) to find the extension. That should run quickly, since a file extension is only a few characters long.
If you know that all the extensions are, say, 3 characters long, you should be able to use a mask such as *.??? to speed things up.
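The scan-everything-then-filter approach can be sketched as follows. This is Python, not the Win32 API, and the helper name find_by_extensions is invented for the example; os.scandir stands in for the FindFirstFile/FindNextFile loop and rpartition plays the role of strrchr.

```python
import os

def find_by_extensions(directory, extensions):
    """Return sorted file names in directory whose extension is in extensions."""
    wanted = {e.lower() for e in extensions}
    matches = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            # rpartition splits at the LAST dot, like strrchr(name, '.') in C.
            _, dot, ext = entry.name.rpartition(".")
            if dot and ext.lower() in wanted:
                matches.append(entry.name)
    return sorted(matches)
```

For example, find_by_extensions(".", ["txt", "doc"]) lists the .txt and .doc files in the current directory in a single pass.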

How do you filter Ruby Find.find() results?

Find.find("d") {|path| puts path}
I want to exclude certain types of files, say *.gif, as well as directories.
PS: I can always add code inside my block to check the file name and directory type, but I want Find itself to filter files for me.
I don't think you can tell Find to do that. You could try using Dir#[], which accepts file globs. If you are looking for particular types of files, or files that can be selected with the file glob pattern language, it may be a better fit.
e.g.
Dir["d/**/*.{xml,png,css,html}"]
would find all the xml, png, css, and html files under the directory d.
Check out the docs for more info.
You can't make find do it, but Find may help: in the block, you need to check whether the current path is one of those you'd like to exclude or not; if so, then call Find#prune. This seems to be the standard idiom when using Find.
If you decide to use Dir#[] instead, you may call reject on its result, passing a block to exclude certain types of files. However, note that, as far as I understand, Dir#[] reads all the contents of your d directory before you can filter, while Find#prune guarantees not to read the contents of pruned subdirectories if you call it within the block passed to Find#find.
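For comparison, the prune-while-walking idea looks like this when sketched in Python instead of Ruby. This is not Ruby's Find API: os.walk is used as the analogue, and the function name walk_filtered and its parameters are made up for the example.

```python
import os

def walk_filtered(root, skip_dirs=(), skip_exts=()):
    """Yield file paths under root, pruning skip_dirs and dropping skip_exts."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into them,
        # so pruned subdirectories are never read.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            if os.path.splitext(name)[1].lower() not in skip_exts:
                yield os.path.join(dirpath, name)
```

As with pruning in Find, the skipped directories cost nothing, whereas a glob followed by a reject step has to read everything first.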

Are there any invalid Linux filenames?

If I wanted to create a string which is guaranteed not to represent a filename, I could put one of the following characters in it on Windows:
\ / : * ? | < >
e.g.
this-is-a-filename.png
?this-is-not.png
Is there any way to identify a string as 'not possibly a file' on Linux?
There are almost no restrictions - apart from '/' and '\0', you're allowed to use anything. However, some people think it's not a good idea to allow this much flexibility.
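A minimal sketch of that rule as a check on a single filename component (not a whole path), in Python. The function name is made up for the example, and the empty string is rejected as well, since an empty name cannot refer to a file. Filesystem-specific limits such as maximum name length are deliberately left out.

```python
def is_valid_linux_filename(name):
    """True if name could be a single filename component on Linux."""
    # Only '/' (the path separator) and NUL (the string terminator) are forbidden.
    return name != "" and "/" not in name and "\0" not in name
```

So a string containing a NUL byte or a slash can never be the name of a single file, while something like "?this-is-not.png" is perfectly legal on Linux.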
An empty string is the only truly invalid path name on Linux, which may work for you if you need only one invalid name. You could also use a string like "///foo", which would not be a canonical path name, although it could refer to a file ("/foo"). Another possibility would be something like "/dev/null/foo", since /dev/null has a POSIX-defined non-directory meaning. If you only need strings that could not refer to a regular file you could use "/" or ".", since those are always directories.
Technically they are not invalid, but files with a dash (-) at the beginning of their name will cause you a lot of trouble, because such names are easily mistaken for command-line options.
I personally find that a lot of the time the problem is not Linux but the applications one is using on Linux.
Take Amarok, for example. Recently I noticed that certain artists I had copied from my Windows machine were not appearing in the library. I checked and confirmed that the files were there, and then I noticed that certain characters in the folder names (named for the artist) were displayed as a weird-looking square rather than an actual character.
In a shell terminal the filenames look even stranger: /Music/Albums/Einst$'\374'rzende\ Neubauten is an example of how strange.
While these files were definitely there, Amarok could not see them for some reason. I was able to use some shell trickery to rename them to sane versions which I could then re-name with ASCII-only characters using Musicbrainz Picard. Unfortunately, Picard was also unable to open the files until I renamed them, hence the need for a shell script.
Overall this is a tricky area, and it seems to get very thorny if you are trying to synchronise a music collection between Windows and Linux when certain folder or file names contain funky characters.
The safest thing to do is stick to ASCII-only filenames.

Resources