Are stringified MongoDB ObjectIDs safe as folder names? - ruby

As it says on the tin. I'd like to name folders corresponding to "Channels." I'd personally rather use the human-readable name, but I was told halfway through development that names cannot be static (for some reason).

Yes, ObjectIds are safe as folder names, if by "safe" you mean "are they valid folder names".
A stringified ObjectId is a 24-character hexadecimal string (only the characters 0-9 and a-f), which is always a valid folder name.
If you mean to ask whether the ObjectId carries sensitive information, you should know that it embeds the time it was generated, which is normally when its corresponding document was created: the first 4 bytes (the first 8 hex characters) are a Unix timestamp in seconds. Anyone with access to the ObjectId can discover when it was created. Whether this is a concern is up to you.
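As a minimal sketch of both points, the Python below checks that a string has the 24-lowercase-hex shape before it is used as a folder name, and decodes the embedded creation time from the first 8 hex characters. The helper names are hypothetical; if you already use the driver, pymongo's `bson.ObjectId` exposes the same timestamp as its `generation_time` property.

```python
import re
from datetime import datetime, timezone

# Stringified ObjectIds are 24 lowercase hex characters.
OBJECT_ID_RE = re.compile(r"[0-9a-f]{24}")

def is_safe_folder_name(oid: str) -> bool:
    """True if oid looks like a stringified ObjectId (only 0-9 and a-f, length 24)."""
    return OBJECT_ID_RE.fullmatch(oid) is not None

def creation_time(oid: str) -> datetime:
    """Decode the leading 4-byte big-endian timestamp (seconds since the Unix epoch)."""
    if not is_safe_folder_name(oid):
        raise ValueError(f"not a stringified ObjectId: {oid!r}")
    return datetime.fromtimestamp(int(oid[:8], 16), tz=timezone.utc)

oid = "507f1f77bcf86cd799439011"
print(is_safe_folder_name(oid))  # True
print(creation_time(oid))
```

Validating before creating the folder also rejects anything like `../` that might arrive through user input instead of a real ObjectId.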


Using the LocalizedResourceName property

I wish to customize my own folder style, so I tried to give the folder a remark (a custom display name) by modifying the LocalizedResourceName property in desktop.ini.
I tried setting LocalizedResourceName to a Chinese string, but it is displayed as garbled characters.
I noticed the following line in the desktop.ini of a system folder:
LocalizedResourceName=#%SystemRoot%\system32\shell32.dll,-21798
So I tried to write a .dll file myself that contains the icon and the string, and use that instead.
I already know how to make a resource-only DLL, but I don't know how to refer to a particular resource in the file (i.e., how to get a number like -21798 in the example above).
How should I do this?
By convention, a non-negative resource number is an index (0 is the first resource, etc.) and a negative number is a resource ID. In this specific case, it refers to the string resource with the ID abs(-21798), which is what Windows would pass to LoadString.
If you want to create your own .dll, add a string with an ID of, for example, 2 (any number between 2 and 0xffff), and in your .ini you would use #c:\path\mydll.dll,-2.
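To make the sign convention concrete, here is a small hypothetical parser for the `#path,number` value format (it is not a Windows API). It splits off the DLL path, expands `%VAR%` references with `ntpath.expandvars` (unknown variables are left as-is), and reports whether the number is a resource ID or an index.

```python
import ntpath

def parse_resource_ref(value: str):
    """Split an indirect value like '#%SystemRoot%\\system32\\shell32.dll,-21798'.

    Returns (dll_path, kind, number): kind is 'id' for negative numbers
    (the resource ID is abs(n)) and 'index' for non-negative ones (0 = first).
    """
    if not value.startswith("#"):
        raise ValueError("not an indirect resource reference")
    path, sep, num = value[1:].rpartition(",")
    if not sep:
        raise ValueError("missing ',<number>' suffix")
    n = int(num)
    # ntpath.expandvars understands %VAR% on any OS and uses the current environment.
    path = ntpath.expandvars(path)
    return (path, "id", -n) if n < 0 else (path, "index", n)
```

For the custom DLL from the answer, `parse_resource_ref(r"#c:\path\mydll.dll,-2")` yields the path plus `("id", 2)`, i.e. string ID 2.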
Before you go to all this trouble, just try saving the .ini as UTF-16 LE (Unicode in Notepad) and use Chinese strings directly without the #.
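A minimal sketch of the simpler route: write desktop.ini as UTF-16 LE with a byte-order mark (what Notepad's "Unicode" option produces) and put the Chinese name directly in LocalizedResourceName. The function name is hypothetical. Note that Explorer generally only reads desktop.ini in folders marked read-only or system (for example with `attrib +r`), which this sketch does not do for you.

```python
import codecs
from pathlib import Path

def write_desktop_ini(folder: Path, display_name: str) -> Path:
    """Write desktop.ini with a literal LocalizedResourceName, encoded as UTF-16 LE + BOM."""
    text = "[.ShellClassInfo]\r\nLocalizedResourceName=" + display_name + "\r\n"
    ini = folder / "desktop.ini"
    # BOM_UTF16_LE is b'\xff\xfe'; without it the file would be read as ANSI and garble.
    ini.write_bytes(codecs.BOM_UTF16_LE + text.encode("utf-16-le"))
    return ini
```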

HL7 FHIR mark resources as anonymized

I am trying to map an existing domain into HL7 FHIR.
So far it was pretty easy to find FHIR resources that more or less represent the same data and can be used for that purpose. But now I have run into a problem that I am not sure how to solve.
The existing domain allows data to be anonymized depending on the user's access level. For example, a patient's name or address might be removed and marked as anonymized. Other data will be pseudonymized; for example, a birthdate in 1980 will be replaced with 01.01.1980, and an age of 37 will be replaced with the category 30-40.
So I am unsure how to integrate that into the FHIR domain. I was thinking I could create an extension holding a boolean that indicates whether a value was anonymized, and always replace or remove the original value. This might work, but I will run into big problems when the anonymized value has a different type than the original value (e.g. an age is replaced by a range of values).
Is that even a valid approach? I thought this might be a common problem, but I could not find any examples of how people mark data as altered. Unfortunately, the documentation at http://build.fhir.org/extensibility-registry.html does not contain anything that would help in my case.
You can use security labels for this purpose (Resource.meta.security). Take a look at REDACTED and SUBSETTED in the security label value set: https://www.hl7.org/fhir/valueset-security-labels.html
If you need to convey a data type other than the one allowed by the resource (e.g. wanting to convey a range rather than a birthdate), you'd need to use an extension. (Note that dates are valid even if you only include the year.)
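As an illustration only, the sketch below builds a Patient resource (as a Python dict serialized to FHIR JSON) with a REDACTED label in `meta.security`, the name and address omitted, and a year-only `birthDate`. The code system URI is the R4-style one and is an assumption to check against the security-labels value set for your FHIR version; the age-range extension URL is entirely hypothetical and would need its own StructureDefinition.

```python
import json

# Assumed R4 code system for REDACTED / SUBSETTED; verify for your FHIR version.
V3_OBSERVATION_VALUE = "http://terminology.hl7.org/CodeSystem/v3-ObservationValue"
# Hypothetical extension URL for a pseudonymized age range.
AGE_RANGE_EXT = "http://example.org/fhir/StructureDefinition/age-range"

def redacted_patient(patient_id: str, birth_year: int, age_low: int, age_high: int) -> dict:
    """Patient with name/address removed, birthDate reduced to a year, age given as a range."""
    years = {"unit": "a", "system": "http://unitsofmeasure.org", "code": "a"}  # UCUM 'a' = year
    return {
        "resourceType": "Patient",
        "id": patient_id,
        "meta": {"security": [{"system": V3_OBSERVATION_VALUE, "code": "REDACTED"}]},
        # A year-only value is a valid FHIR date, so no extension is needed for this.
        "birthDate": str(birth_year),
        # The range has a different type than the original value, hence an extension.
        "extension": [{
            "url": AGE_RANGE_EXT,
            "valueRange": {"low": {"value": age_low, **years},
                           "high": {"value": age_high, **years}},
        }],
        # "name" and "address" are simply omitted.
    }

print(json.dumps(redacted_patient("example", 1980, 30, 40), indent=2))
```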

Delphi TOpenDialog/TSaveDialog last used path

My question refers to this answer: https://stackoverflow.com/a/4016075/698266, in particular step 3, which says "Otherwise, if the application has used an Open or Save As dialog box in the past, the path most recently used is selected as the initial directory."
Where does Windows save this information?
Note: from experimenting, it seems to be linked to the application's file name without its path, i.e. the same executable copied into different directories "sees" the same last-path information, while renaming the exe makes the dialogs point to the user's Documents directory.
My actual interest is for testing purposes. I need to "reset" this information in order to test my application in conditions similar to a first run.
Windows XP uses HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Explorer\ComDlg32\LastVisitedMRU, and the format of each item seems to be ExeFilename+Path, with both strings zero-terminated and in UTF-16LE format. The MRU order is stored as a string value named MRUList.
Newer versions of Windows use HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Explorer\ComDlg32\LastVisitedPidlMRU and HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Explorer\ComDlg32\LastVisitedPidlMRULegacy, and the format seems to be ExeFilename+ItemIdList (ExeFilename in UTF-16LE and zero-terminated). The MRU order seems to be a list of DWORDs in a binary value named MRUListEx, terminated by 0xffffffff.
I would assume that the change happened in Vista, because that is when the new IFileDialog was added. LastVisitedPidlMRULegacy is probably used when GetOpenFileName/GetSaveFileName is called with a custom template and/or hook function.
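Based on the formats described above (which the answer itself only says "seem" to hold), here is a sketch of decoders for the raw value data. The function names are hypothetical, and little-endian DWORDs are assumed; on Windows the bytes could come from `winreg.QueryValueEx`, which returns REG_BINARY data as `bytes`.

```python
from __future__ import annotations
import struct

def parse_mru_list_ex(data: bytes) -> list[int]:
    """Decode MRUListEx: DWORD indices (assumed little-endian), most recent first, ended by 0xFFFFFFFF."""
    order = []
    usable = len(data) - len(data) % 4  # ignore any trailing partial DWORD
    for (value,) in struct.iter_unpack("<I", data[:usable]):
        if value == 0xFFFFFFFF:
            break
        order.append(value)
    return order

def read_utf16z(data: bytes, pos: int = 0) -> tuple[str, int]:
    """Read one zero-terminated UTF-16LE string starting at pos; return it and the next offset."""
    end = pos
    while end + 1 < len(data) and data[end:end + 2] != b"\x00\x00":
        end += 2  # step by whole UTF-16 code units
    if end + 1 >= len(data):
        raise ValueError("unterminated UTF-16LE string")
    return data[pos:end].decode("utf-16-le"), end + 2

def parse_xp_entry(data: bytes) -> tuple[str, str]:
    """LastVisitedMRU (XP): ExeFilename then Path, both zero-terminated UTF-16LE."""
    exe, pos = read_utf16z(data)
    path, _ = read_utf16z(data, pos)
    return exe, path

def parse_pidl_entry(data: bytes) -> tuple[str, bytes]:
    """LastVisitedPidlMRU (Vista+): ExeFilename, then a binary ItemIdList left undecoded."""
    exe, pos = read_utf16z(data)
    return exe, data[pos:]
```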
I finally found the answer myself.
For Windows 10 (this may differ between versions of Windows, as David pointed out), there is a list of values in the registry that keeps track of each executable name and its associated last "visited" path.
The list can be found in this key:
HKCU\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\ComDlg32\LastVisitedPidlMRU
To reset the default open/save path for a particular program, find the value whose data (a Unicode string) starts with your executable name and delete it. If you look at the data, you'll notice that the last used path is stored right after the executable name.
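A sketch of that reset step for test setups, using Python's standard `winreg` module (Windows only, so it is imported inside the function). The matching logic is separated out so it can be checked without a registry; the helper names are hypothetical. Like the answer, it deletes only the matching values and leaves MRUListEx as it is.

```python
from __future__ import annotations

KEY_PATH = r"Software\Microsoft\Windows\CurrentVersion\Explorer\ComDlg32\LastVisitedPidlMRU"

def exe_name_of(data: bytes) -> str | None:
    """Return the leading zero-terminated UTF-16LE string of an MRU entry, or None."""
    for end in range(0, len(data) - 1, 2):
        if data[end:end + 2] == b"\x00\x00":
            return data[:end].decode("utf-16-le", errors="replace")
    return None

def matching_values(values: dict[str, bytes], exe: str) -> list[str]:
    """Names of MRU values whose data starts with `exe` (compared case-insensitively)."""
    return [name for name, data in values.items()
            if name != "MRUListEx" and isinstance(data, bytes)
            and (exe_name_of(data) or "").lower() == exe.lower()]

def reset_last_visited(exe: str) -> list[str]:
    """Delete this program's entries from LastVisitedPidlMRU; returns the deleted value names."""
    import winreg  # standard library, but only available on Windows
    with winreg.OpenKey(winreg.HKEY_CURRENT_USER, KEY_PATH, 0,
                        winreg.KEY_READ | winreg.KEY_SET_VALUE) as key:
        values, i = {}, 0
        while True:
            try:
                name, data, _type = winreg.EnumValue(key, i)
            except OSError:  # raised when there are no more values
                break
            values[name] = data
            i += 1
        victims = matching_values(values, exe)
        for name in victims:
            winreg.DeleteValue(key, name)
    return victims
```

For example, `reset_last_visited("MyApp.exe")` before a test run; collecting names first and deleting afterwards avoids enumerating a key while it is being modified.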

read known file extensions / types from the registry

I want to present the user with a list of known file extensions for him to pick. I know that these are stored in the Registry under HKEY_CLASSES_ROOT usually like this:
.txt -> (default)="txtfile"
where txtfile then contains the information about associated programs etc.
Unfortunately that place in the registry also stores lots of other keys, like the file types (e.g. txtfile) and entries like
CAPICOM.Certificates (whatever that is)
How do I determine which of the entries are file extensions? Or is there a different way to get these extensions like an API function?
(I don't think it matters, but I am using Delphi for the program.)
There is no guarantee that every key in HKEY_CLASSES_ROOT whose name begins with a dot is intended for a file association, but every file association requires creating a key that begins with a dot. See the File Types topic on MSDN.
AFAIK, the method I describe here matches how the Windows Set File Associations feature builds its list of all known file types. It is based on my own earlier observations when I delved into this subject.
To achieve that, you need to go through the following (somewhat intricate) steps:
1. Enumerate every key whose name begins with a dot (.); you can use RegQueryInfoKey() and RegEnumKeyEx() for this purpose.
2. For every such key, look at its default value data:
a. If the default value is not empty, that is enough indication that the dot key is intended for a file association in all Windows NT versions. Then try to open the key named by the value data; call it TheKeyNameMentioned.
a1) If TheKeyNameMentioned has a subkey shell\open\command, test whether the path in that subkey's default value exists. If it does, there is a default application associated with the extension; if it doesn't, the default application is unknown. To get the file extension description, look at the default value of TheKeyNameMentioned. To get the program description, first check whether the key HKCR\Local Settings\Software\Microsoft\Windows\Shell\MuiCache contains a value name equal to the EXE file path. If it does, its value data is the program description; if it doesn't, use GetFileVersionInfo() to get the file description directly.
a2) If TheKeyNameMentioned has no shell\open\command subkey, the default application is unknown. To get the file extension description, look at the default value of TheKeyNameMentioned.
b. On Windows Vista and later, when point [a] fails (the default value is empty), you need an additional check: test whether the dot key has a subkey named OpenWithProgIDs.
If the OpenWithProgIDs subkey exists, use RegEnumValue() to find the first value whose name matches an existing key (again, call it TheKeyNameMentioned). If such a key exists, that is enough indication that the dot key is intended for a file association. Continue with points a1 and a2.
If OpenWithProgIDs subkey doesn't exist, the default application is unknown. To get the file extension description, look at the default value of TheKeyNameMentioned.
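The decision logic of the steps above can be sketched against a hypothetical in-memory stand-in for HKEY_CLASSES_ROOT (key path mapped to its values, with "" as the default value), so it can be read and tested without a registry. A real program would fetch the same data with RegEnumKeyEx/RegQueryValueEx/RegEnumValue (or Python's `winreg`). This sketch skips two parts of step a1: checking that the command's executable exists on disk, and the MuiCache/GetFileVersionInfo() lookup of the program description.

```python
from __future__ import annotations
from typing import Dict, List, Optional, Tuple

# Hypothetical model of HKCR: "key\\path" -> {value name: data}; "" is the default value.
Registry = Dict[str, Dict[str, str]]
Entry = Tuple[str, Optional[str], Optional[str]]

def prog_id_for(reg: Registry, ext_key: str) -> Optional[str]:
    """Return the ProgID ("TheKeyNameMentioned") for a dot key, or None."""
    default = reg.get(ext_key, {}).get("", "")
    if default:  # step a: a non-empty default value names the ProgID
        return default
    # step b (Vista and later): first OpenWithProgIDs value name that is an existing key
    for name in reg.get(ext_key + "\\OpenWithProgIDs", {}):
        if name and name in reg:
            return name
    return None

def describe(reg: Registry, ext_key: str) -> Optional[Entry]:
    """Return (extension, type description, open command) following steps a1/a2."""
    prog_id = prog_id_for(reg, ext_key)
    if prog_id is None:
        return None
    description = reg.get(prog_id, {}).get("") or None
    # a1 when shell\open\command has a default value, a2 (command unknown) otherwise
    command = reg.get(prog_id + "\\shell\\open\\command", {}).get("") or None
    return ext_key, description, command

def known_extensions(reg: Registry) -> List[Entry]:
    """Step 1: top-level keys starting with '.', kept only if they resolve to a ProgID."""
    dot_keys = sorted(k for k in reg if "\\" not in k and k.startswith("."))
    described = (describe(reg, k) for k in dot_keys)
    return [d for d in described if d is not None]
```

Keys such as `CAPICOM.Certificates` never reach `describe` because they do not begin with a dot, and dot keys with neither a default value nor a usable OpenWithProgIDs entry are dropped.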
Hope that helps. :-)
For a command-line alternative, the assoc command-line program included in Windows shows registered file extensions.
c:\> assoc
.3g2=VLC.3g2
.3gp=VLC.3gp
.3gp2=VLC.3gp2
.3gpp=VLC.3gpp
...
I'm not sure which verb this looks for. Open, perhaps? I'm also not sure which extensions will appear in this list; perhaps the extensions of files that can be opened from the command line.
To then find out which executable is mapped to each file type, use the ftype command:
c:\> ftype VLC.3g2
VLC.3g2="c:\vlc.exe" --started-from-file "%1"
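If you want this from a program rather than the console, a sketch is to run the two commands through cmd.exe (both are cmd.exe built-ins, so they need `cmd /c`) and parse their `name=value` lines. The helper names are hypothetical, and only the parser is platform-independent.

```python
from __future__ import annotations
import subprocess
import sys

def parse_assignments(output: str) -> dict[str, str]:
    """Parse 'name=value' lines as printed by assoc (.ext=FileType) and ftype (FileType=command)."""
    result = {}
    for line in output.splitlines():
        name, sep, value = line.partition("=")  # split on the first '=' only
        if sep and name.strip():
            result[name.strip()] = value.strip()
    return result

def run_cmd_builtin(*args: str) -> dict[str, str]:
    """Run assoc or ftype via cmd.exe (Windows only) and parse the output."""
    if sys.platform != "win32":
        raise OSError("assoc/ftype are cmd.exe built-ins and only exist on Windows")
    out = subprocess.run(["cmd", "/c", *args], capture_output=True, text=True, check=True).stdout
    return parse_assignments(out)
```

Usage would be `run_cmd_builtin("assoc")` for the extension list and `run_cmd_builtin("ftype", "VLC.3g2")` for one file type's command.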
IMHO - all those registry subkeys starting with the dot (.) - are for file extensions.
For instance in your case .txt stands for the "txt" extension, whereas txtfile doesn't start with the dot.
I don't know the details, but it seems you could use the IQueryAssociations interface.

SHA-1 hash for storing Files

After reading this, it sounds like a great idea to store files using their SHA-1 hash for the directory.
I have no idea what this means, however; all I know is that SHA-1 and MD5 are hashing algorithms. If I calculate the SHA-1 hash using this Ruby script and then change the file's content (which changes the hash), how do I know where the file is stored?
My question, then, is: what are the basics of implementing a SHA-1-based file-storage system?
If all of the files change content all the time, is there a better solution for storing them, or do you just have to keep updating the hash?
I'm just thinking about how to create a generic file-storage system like Google Docs, Flickr, YouTube, Dropbox, etc., something that you could reuse in different environments (such as storing PubMed journal articles, Cramster homework assignments and tests, or just images like on Flickr). I'd probably store them on Amazon EC2. Just some system so I can say "this is how I'll do file storage 99% of the time from now on", so I can stop thinking about building a solid, consistent way to store files and get on to some real problems.
First of all, if the contents of the files change, deriving the file name from a SHA digest is not very suitable, because the name and location of the file in the filesystem must change whenever its contents change.
Basically, you first compute a SHA-1 or MD5 digest (= hash value) from the contents of the file.
When you have a digest, for example 00e4f56c0de1c61fdb926e79e8a0a65bd12930c9, you generate a file location and file name from it. For example, you use the first few characters of the digest for the directory structure and the rest of the characters for the file name:
00e4f56c0de1c61fdb926e79e8a0a65bd12930c9 => some/path/00/e4/f5/6c0de1c61fdb926e79e8a0a65bd12930c9.txt
This way you only need to store the SHA-1 digest of the file in the database. You can then always work out the right location and name of the file.
Directories also often have a maximum number of entries, for example a maximum of 32000 subdirectories and files per directory. A directory structure based on this kind of hashing makes it unlikely that you store too many files in the same directory. Hashing like this also tends to give every directory about the same number of files, so you won't end up with all your files in the same directory.
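A minimal sketch of that scheme with Python's `hashlib`: hash the contents in chunks, then turn the hex digest into a path. The root directory, three two-character levels, and optional extension are illustrative parameters chosen to reproduce the example path above, not requirements.

```python
import hashlib
from pathlib import Path, PurePosixPath

def sha1_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-1 digest of a file's contents, read in 1 MiB chunks to bound memory use."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def path_for_digest(digest: str, root: str = "some/path", levels: int = 3,
                    width: int = 2, ext: str = "") -> PurePosixPath:
    """Use the first levels*width hex characters as directories and the rest as the file name."""
    cut = levels * width
    dirs = [digest[i:i + width] for i in range(0, cut, width)]
    return PurePosixPath(root, *dirs, digest[cut:] + ext)

print(path_for_digest("00e4f56c0de1c61fdb926e79e8a0a65bd12930c9", ext=".txt"))
# some/path/00/e4/f5/6c0de1c61fdb926e79e8a0a65bd12930c9.txt
```

With two hex characters per level, each directory holds at most 256 subdirectories, and three levels give 256^3 = 16,777,216 leaf directories, which is how the scheme keeps every directory small.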
The idea is not to change the file content, but rather its name (and path), by using a hash value.
Changing the content with a hash would be disastrous since a hash is normally not reversible.
I'm not sure of the motivation for using a hash rather than the file name (or even rather than a long random number), but here are a few advantages of the hash approach:
the file names on the disk are uniform
the upper or lower parts of the hash value can be used to name the directories and hence distribute the files relatively uniformly
the name becomes a code, making it difficult for someone to
a) guess a file name
b) categorize pictures (should someone steal the hard drive contents)
the file name and location can be derived from the file contents themselves (assuming the hash is computed from those contents). I'm not quite sure which use case would involve this; it seems a bit contrived.
The general point of using a hash is that, unlike a file name, a hash is meaningless, so you need the database to relate images to "bibliographic" data (name of uploader, date of upload, tags, ...).
Thinking about it more, and re-reading the referenced SO answer, I don't really see much of an advantage of a hash compared to, say, a random number...
Furthermore, some hashes produce a numeric value, typically expressed in hexadecimal (as seen in the referenced SO question), and this could be seen as wasteful: it makes the file names longer than they need to be, and hence puts more stress on the file system (bigger directories...).
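To put a number on the length point, this sketch encodes the same 20-byte SHA-1 digest three ways with the standard `base64` module. Base32 in lowercase stays case-insensitive (alphabet a-z and 2-7), which matters on case-insensitive filesystems; base64url is shorter still but mixes upper and lower case, so two distinct names could collide there.

```python
import base64
import hashlib

digest = hashlib.sha1(b"example contents").digest()  # 20 raw bytes

hex_name = digest.hex()                                                  # 40 chars, 0-9 a-f
b32_name = base64.b32encode(digest).decode("ascii").lower()              # 32 chars, a-z 2-7
b64_name = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")  # 27 chars, mixed case

for label, name in [("hex", hex_name), ("base32", b32_name), ("base64url", b64_name)]:
    print(f"{label:9} {len(name):2} {name}")
```

The saving is modest (40 versus 32 characters for base32), so whether it is worth giving up the familiar hex form is a judgment call.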
One advantage I see with storing files by their hash is that the file data only needs to be stored once and can then be referenced multiple times within your database. This saves space when different users upload the exact same file.
However, the downside is that when a user deletes what they think is their file from your app, you can't just physically delete the file from disk, because other users who uploaded the exact same file may still be using it.
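One common way to handle that downside is reference counting: each upload of identical bytes increments a count, and the file is only removed when the count drops to zero. The class below is a minimal, hypothetical sketch that keeps the counts in a dict; a real system would keep them in its database and update them transactionally. Swapping `hashlib.sha1` for `hashlib.sha256` changes nothing else in the design.

```python
from __future__ import annotations
import hashlib
from pathlib import Path

class DedupStore:
    """Content-addressed store: identical uploads share one file, and a reference
    count decides when the bytes can really be deleted."""

    def __init__(self, root: Path):
        self.root = root
        self.refs: dict[str, int] = {}  # digest -> number of uploads referencing it

    def _path(self, digest: str) -> Path:
        return self.root / digest[:2] / digest[2:4] / digest[4:]

    def put(self, data: bytes) -> str:
        """Store data (once) and return its digest, which the caller records per user."""
        digest = hashlib.sha1(data).hexdigest()
        if digest not in self.refs:  # first upload of these bytes: write them
            path = self._path(digest)
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(data)
        self.refs[digest] = self.refs.get(digest, 0) + 1
        return digest

    def get(self, digest: str) -> bytes:
        return self._path(digest).read_bytes()

    def release(self, digest: str) -> bool:
        """Drop one reference; delete the file only when nobody uses it. True if deleted."""
        count = self.refs[digest] - 1
        if count > 0:
            self.refs[digest] = count
            return False
        del self.refs[digest]
        self._path(digest).unlink()
        return True
```

Because `put` returns the same digest for the same bytes, this also gives you the duplicate-upload detection mentioned in the next answer for free.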
The idea is that you need to come up with a name for the photo, and you probably want to scatter the files among a number of directories. One easy way to come up with a unique name is to use the hash.
So the beginning of the hash was peeled off for a multi-level directory structure and the rest of the hash was used for a filename for the jpg.
This has the additional benefit of detecting duplicate uploads.
