How data is stored in windows clipboard - windows

Whenever we copy any multimedia file or any file except text (Not sure about it) in clipboard , does it stores the address of file or data copy because whenever we copy any movie of like 3 GB , C disk size doesn't increases it means clipboard stores the address not the copy .Is it true ???

If you're copying files, you're dealing with file pointers like HDrop, which take almost no space. And almost no time to perform the copy. If you actually had to wait for 3 GB to be copied into a memory buffer, you would be waiting a long time, there would be a lot of I/O, and unless you had a lot of memory, your system would need to utilize pagefile space, thereby causing even more I/O.
You should also realize that unlike a text/HTML/RTF/graphic copy (where the data is actually on the clipboard), the clipboard cannot be used as a safety net. With text, you can copy, then delete the text, and paste it to get it back. Not so with files. If you copy a file, then delete that file, you won't be able to paste it. This may seem obvious, but it's important to understand when you're using any kind of clipboard manager that lets you go back and paste prior clips. You can paste a file pointer from 3 days ago, for example, but the result won't be that file from 3 days ago. It'll be whatever that file pointer references on today's disk.

does it stores the address of file
Basically yes, but not the really address but so-called handle of the file.
It's an abstract reference value to a resource, often memory or an open file, or a pipe.
Properly, in Windows, (and generally in computing) a handle is an abstraction which hides a real memory address from the API user, allowing the system to reorganize physical memory transparently to the program. Resolving a handle into a pointer locks the memory, and releasing the handle invalidates the pointer. In this case think of it as an index into a table of pointers.
You use the index for the system API calls, and the system can change the pointer in the table at will.
You can take a look at this article if you want to know how exactly the clipboard works: http://blogs.msdn.com/b/ntdebugging/archive/2012/03/16/how-the-clipboard-works-part-1.aspx

#Hot Cool stud :
To copy the path of a file/folder
Press Shift Down, select the file or folder, Right Click, you will see an extra menu_option as "Copy as path". Select it, and the path is copied to the clipboard

Related

How do I safely and correctly create a backup of the Windows clipboard?

I'm trying to create a backup of the Windows clipboard. Basically what I'm doing is using EnumClipboardFormats() to get all of the formats that exist on the clipboard currently, and then for each format, I'm calling GetClipboardData(format).
Part of backing up the data obviously involves duplicating it. I do that by calling GlobalLock() (which "Locks a global memory object and returns a pointer to the first byte of the object's memory block.") on the data returned by GetClipboardData(), then I fetch the size of the data by calling GlobalSize(), and then finally I do a memcpy() to duplicate the data. I then of course call GlobalUnlock() when I'm done.
Well, this works... most of the time. My program crashes at the GlobalLock() if the clipboard contains data with the format CF_BITMAP or CF_METAFILEPICT. After reading this Old New Thing blog post (https://devblogs.microsoft.com/oldnewthing/20071026-00/?p=24683) I found out why the crash occurs: apparently not all data on the clipboard is allocated using GlobalAlloc() (such as CF_BITMAP data), and so calling GlobalLock() on that data causes a crash.
I came across this MSDN article and it gives a list of clipboard formats and how they are freed by the system. So what I did was hard-code into my program all of the clipboard formats (CF_*) that are not freed by the GlobalFree() function by the system, and I simply don't back up those formats; I skip them.
This workaround seems to work well, actually. Even if a bitmap is on the clipboard, or "special" data (such as rows copied from Excel to the clipboard), my clipboard backup function works well and I haven't experienced any crashes. Also, even if there's a bitmap on the clipboard and I skip some formats during the backup (like CF_BITMAP), I can still Ctrl+V paste the originally-copied bitmap from the clipboard after restoring my clipboard backup, as the bitmap is represented by other formats on the clipboard as well that don't cause my program to crash (CF_DIB).
However, it's a workaround at best. My fear is that one of these times some weird format (perhaps a private one, i.e one between CF_PRIVATEFIRST and CF_PRIVATELAST, or maybe some other type) will be on the clipboard and my program, after calling GlobalLock(), will crash again. But since there doesn't seem to be much documentation explaining the best way to back up the clipboard, and it's clear that GlobalLock() does not work properly for all datatypes (unfortunately), I'm not sure how to handle these situations. Is it safe to assume that all other formats -- besides the formats listed in the previous URL that aren't freed by GlobalFree() -- can be "grabbed" using GlobalLock()?
Any ideas?
This is folly, as you cannot 100% backup/restore the clipboard. Lots of apps used delayed rendering, and the data is not actually on the clipboard. When you request to paste, they get notified and produce the data. This would take several minutes and hundreds of MB for large amounts of data from apps like Excel. Take a look at the number of formats listed on the clipboard when you copy from Excel. There will be more than two dozen, including Bitmap, Metafile, HTML. What do you think will happen if you select 255x25000 cells in Excel and copy that? How large would that bitmap be? Tip: save any open documents before attempting this, as you're likely going to have to reboot.

Efficient way to send files across processes

How to effectively send a file from my own process to a program such as Photoshop, Word, Paint.
I do not want to save the whole file to disk and then open the program from the startup parameters using CreateProcess, ShellExecute, etc.
Maybe the only way out is Memory Maped Files?
Maybe I should look to COM, IPC, Pipes?
You cannot tell these programs that your file data is actually a memory mapped file. That really doesn't matter, files are already memory mapped by default. Much more efficiently than a MMF, file data is stored in RAM and doesn't take any space in the paging file.
The file system cache takes care of that. Think of it as a large RAM disk without actually having to pay for the RAM. This works so well that there never was a need for these programs to do something else than accept their input from a file.

Opening a custom file on-demand

I have a custom file type that is implemented in sections with a header at the shows the offset and length of each section within the file.
Currently, whenever I want to interact with the file, I must either load and parse the entire thing up front, or else pick only the sections that I need and load just them.
What I would like to do is to achieve a hybrid approach where each of the sections is loaded on-demand.
It seems however that doing this has a lot of potential downsides in terms of leaving filesystem handles open for longer than I would like and the additional code complexity that I would incur.
Are there any standard patterns for this sort of thing? It seems that my options are to:
Just load the entire file and stop grousing about the cycles/memory wasted
Load the entire file into memory as raw bytes and then satisfy any requests for unloaded sections from the memory buffer rather than disk. This saves me the cost of parsing the unneeded sections and requires less memory (since the disk representation is much more compact than the object model around it), but still means that I waste memory for sections that I never end up loading.
Load whatever sections I need right away and close the file but hold onto the source location of the file. Then if another section is requested, re-open the file and load the data. In this case I could get strange results if the underlying file is changed.
Same as the above but leave a file handle open (perhaps allowing read sharing).
Load the file using Memory-Mapped IO and leave a view on the file open.
Any thoughts
If possible, MMAP-ing the whole file is usually the easiest thing to do if you have a random-access pattern. This way you just delegate the loading/unloading issue to the OS and you have 1 & 2 for free.
If you have very special access patterns, you can even use something like fadvise() (I don't the exact Win32 equivalent) to tell the OS your access intend.
If your file is more than 2GB and you can either go the 64bits way or to mmap() the file on demand.
If the file is relatively small, mmap-ing the entire file is good enough. If the file is large, you could leave a mmap view open, and just move it around the file and resize it to view each section when needed.

Memory mapped files optional write possible?

When using memory-mapped files it seems it is either read-only, or write-only. By this I mean you can't:
have one open for writing, and later decide not to save it
have open open for reading, and later decide to save it
Our application uses a writeable memory-mapped file to save data files, but since the user might want to exit without saving changes, we have to use a temporary file which the user actually edits. When the user opts to save the changes, the original file is overwritten with the temporary file so it has the latest changes. This is cumbersome because the files can be very large (>1GB) and it takes a long time to copy them.
I've tried many combinations of the flags used to create the file mapping but none seem to allow the flexibility of saving on demand. Can anyone confirm this is the case? Our application is written in Delphi, but it uses the standard Windows API to create the mapping, in our case
FMapHandle := CreateFileMapping(FFileHandle, nil, PAGE_READWRITE, 0, 2 * 65536, nil);
FBasePointer := MapViewOfFile(FileMapHandle, FILE_MAP_WRITE, FileOffsetHigh,
FileOffsetLow, NumBytes);
I don't think you can. By that I mean you may be able to, but it doesn't make any sense to me :-)
The whole point of a memory-mapped file is that it's a window onto the actual file. If you don't wany changes reflected in the file, you'll probably have to do something like batch up the changes in a data structure (e.g., an array of base address, size and data) and apply them when saving.
In which case, you wouldn't actually need the memory mapped file, just read in and maintain the chunks you want to change (lock the file first if there's a chance of multi-user access).
Update:
Have you thought of the possibility of, when doing a save, deleting the original file and just renaming the temporary file to the original file name? That's likely to be much faster than copying 1G of data from temporary to original. That way, if you don't want it saved, just delete the temporary file and keep the original.
You'll still have to copy the original data to the temporary file when loading but you won't have to copy the temporary data back (whether you save it or not) - that would halve the time taken.
Possible, but non-trivial.
You have to understand memory mapped basics, and the difference between the three modes of memory-mapped files. Both set aside a part of your virtual address space and create a mapping entry in an internal table. No physical RAM is initially allocated. Hence, when you DO try to access the memory, the CPU faults and the OS has to fix up. It does so by copying the file contents to RAM and mapping the RAM to your process, at the faulting address.
Now, the difference between the three modes is how the descriptors are set on the mapped pages. In all cases you get read access on the pages. (The first mode). However, if you ask for write access and subsequently write to it, on your first write the page is marked as writeable and dirty. It can then be written back to the original file, at the discretion of the OS (Second mode). Finally, it's possible to get copy-on-write semantics. You still start out with only read access to the page in memory. When you write to it, the CPU still faults and the OS needs to fix it up. With copy-on-write, that fixup is done by setting the backing store of the changed page to the page file, instead of the original mapped file.
So, in your case you want to use copy-on-write mode. If the user decides to discard the modifications, no problem. You simply discard the memory mapping. All pages that were modified in memory, and were backed by the page file are also discarded.
If the user does decide to save, you've got a slightly harder task. You now need to figure out which parts of the file have changed. Those changes are in memory, and you need to reapply those to the source file. You can do this with Page Guards. So, when the user decides to save, copy all modified pages to a separate memory block, remap the (unchanged) file for write, and apply the changes.

Are there alternatives for creating large container files that are cross platform?

Previously, I asked the question.
The problem is the demands of our file structure are very high.
For instance, we're trying to create a container with up to 4500 files and 500mb data.
The file structure of this container consists of
SQLite DB (under 1mb)
Text based xml-like file
Images inside a dynamic folder structure that make up the rest of the 4,500ish files
After the initial creation the images files are read only with the exception of deletion.
The small db is used regularly when the container is accessed.
Tar, Zip and the likes are all too slow (even with 0 compression). Slow is subjective I know, but to untar a container of this size is over 20 seconds.
Any thoughts?
As you seem to be doing arbitrary file system operations on your container (say, creation, deletion of new files in the container, overwriting existing files, appending), I think you should go for some kind of file system. Allocate a large file, then create a file system structure in it.
There are several options for the file system available: for both Berkeley UFS and Linux ext2/ext3, there are user-mode libraries available. It might also be possible that you find a FAT implementation somewhere. Make sure you understand the structure of the file system, and pick one that allows for extending - I know that ext2 is fairly easy to extend (by another block group), and FAT is difficult to extend (need to append to the FAT).
Alternatively, you can put a virtual disk format yet below the file system, allowing arbitrary remapping of blocks. Then "free" blocks of the file system don't need to appear on disk, and you can allocate the virtual disk much larger than the real container file will be.
Three things.
1) What Timothy Walters said is right on, I'll go in to more detail.
2) 4500 files and 500Mb of data is simply a lot of data and disk writes. If you're operating on the entire dataset, it's going to be slow. Just I/O truth.
3) As others have mentioned, there's no detail on the use case.
If we assume a read only, random access scenario, then what Timothy says is pretty much dead on, and implementation is straightforward.
In a nutshell, here is what you do.
You concatenate all of the files in to a single blob. While you are concatenating them, you track their filename, the file length, and the offset that the file starts within the blob. You write that information out in to a block of data, sorted by name. We'll call this the Table of Contents, or TOC block.
Next, then, you concatenate the two files together. In the simple case, you have the TOC block first, then the data block.
When you wish to get data from this format, search the TOC for the file name, grab the offset from the begining of the data block, add in the TOC block size, and read FILE_LENGTH bytes of data. Simple.
If you want to be clever, you can put the TOC at the END of the blob file. Then, append at the very end, the offset to the start of the TOC. Then you lseek to the end of the file, back up 4 or 8 bytes (depending on your number size), take THAT value and lseek even farther back to the start of your TOC. Then you're back to square one. You do this so you don't have to rebuild the archive twice at the beginning.
If you lay out your TOC in blocks (say 1K byte in size), then you can easily perform a binary search on the TOC. Simply fill each block with the File information entries, and when you run out of room, write a marker, pad with zeroes and advance to the next block. To do the binary search, you already know the size of the TOC, start in the middle, read the first file name, and go from there. Soon, you'll find the block, and then you read in the block and scan it for the file. This makes it efficient for reading without having the entire TOC in RAM. The other benefit is that the blocking requires less disk activity than a chained scheme like TAR (where you have to crawl the archive to find something).
I suggest you pad the files to block sizes as well, disks like work with regular sized blocks of data, this isn't difficult either.
Updating this without rebuilding the entire thing is difficult. If you want an updatable container system, then you may as well look in to some of the simpler file system designs, because that's what you're really looking for in that case.
As for portability, I suggest you store your binary numbers in network order, as most standard libraries have routines to handle those details for you.
Working on the assumption that you're only going to need read-only access to the files why not just merge them all together and have a second "index" file (or an index in the header) that tells you the file name, start position and length. All you need to do is seek to the start point and read the correct number of bytes. The method will vary depending on your language but it's pretty straight forward in most of them.
The hardest part then becomes creating your data file + index, and even that is pretty basic!
An ISO disk image might do the trick. It should be able to hold that many files easily, and is supported by many pieces of software on all the major operating systems.
First, thank-you for expanding your question, it helps a lot in providing better answers.
Given that you're going to need a SQLite database anyway, have you looked at the performance of putting it all into the database? My experience is based around SQL Server 2000/2005/2008 so I'm not positive of the capabilities of SQLite but I'm sure it's going to be a pretty fast option for looking up records and getting the data, while still allowing for delete and/or update options.
Usually I would not recommend to put files inside the database, but given that the total size of all images is around 500MB for 4500 images you're looking at a little over 100K per image right? If you're using a dynamic path to store the images then in a slightly more normalized database you could have a "ImagePaths" table that maps each path to an ID, then you can look for images with that PathID and load the data from the BLOB column as needed.
The XML file(s) could also be in the SQLite database, which gives you a single 'data file' for your app that can move between Windows and OSX without issue. You can simply rely on your SQLite engine to provide the performance and compatability you need.
How you optimize it depends on your usage, for example if you're frequently needing to get all images at a certain path then having a PathID (as an integer for performance) would be fast, but if you're showing all images that start with "A" and simply show the path as a property then an index on the ImageName column would be of more use.
I am a little concerned though that this sounds like premature optimization, as you really need to find a solution that works 'fast enough', abstract the mechanics of it so your application (or both apps if you have both Mac and PC versions) use a simple repository or similar and then you can change the storage/retrieval method at will without any implication to your application.
Check Solid File System - it seems to be what you need.

Resources