Allocate file on NTFS without zeroing - winapi

I want to make a tool similar to zerofree for linux. I want to do it by allocating a big file without zeroing it, look for nonzero blocks and rewrite them.
With admin privileges it is possible, uTorrent can do this: http://www.netcheif.com/Articles/uTorrent/html/AppendixA_02_12.html#diskio.no_zero , but it's closed source.

I am not sure this answers your question (need), but such a tool already exists. You might have a look at fsutil.exe Fsutil command line tool. This tool has a huge potential to discover the internal structures of NTFS files and can also create file of any size (without the need to zeroing it manually). Hope that helps.

Wrote a tool https://github.com/basinilya/winzerofree . It uses SetFileValidData() as #RaymondChen suggested

You should try SetFilePointerEx
Note that it is not an error to set the file pointer to a position
beyond the end of the file.
So after you create file, call SetFilePointerEx and then SetEndOfFile or WriteFile or WriteFileEx and close the file, size should be increased.
EDIT
Raymonds suggested SetValidData is also good solution, but this requares privileges, so shouldn't be used often by anyone.
My solution is the best on NTFS, because it supports feature known as initialized size it means that after using SetFilePointerEx data won't be initialized to zeros, but after attempt to read uninitialized data you will receive zeros.
To sum up, if NTFS use SetFilePointerEx, if FAT (not very likely) - use SetValidData

Related

Calling os.Lstat only if the file has changed since the last time I called os.Lstat

I'm trying to write a program, calcsize, that calculates the size of all sub directories. I want to create a cache of the result and only re-walk the directory if it has changed since the last time I've run the program.
Something like:
./calcsize
//outputs
/absolute/file/path1/ 1000 Bytes
/absolute/file/path2/ 2000 Bytes
I'm already walking the dirs with my own walk implementation because the built in go filepath.Walk is already calling Lstat on every file.
Is there any way to know if a directory or set of files has changed without calling Lstat on every file? Maybe a system call I'm not aware of?
In general, no. However you might want to look at: https://github.com/mattn/go-zglob/blob/master/fastwalk/fastwalk_unix.go
And using that data you can skip some of the stat calls, if you only care about files.
Whether, and how, this is possible depends heavily on your operating system. But you might take a look at github.com/howeyc/fsnotify which claims to offer this (I've never used it--I just now found it via Google).
In general, look at any Go program that provides a 'watch' feature. GoConvey and GopherJS's serve mode come to mind as examples, but there are others (possibly even in the standard library).

Move or copy and truncate a file that is in use

I want to be able to (programmatically) move (or copy and truncate) a file that is constantly in use and being written to. This would cause the file being written to would never be too big.
Is this possible? Either Windows or Linux is fine.
To be specific what I'm trying to do is log video with FFMPEG and create hour long videos.
It is possible in both Windows and Linux, but it would take cooperation between the applications involved. If the application that is writing the new data to the file is not aware of what the other application is doing, it probably would not work (well ... there is some possibility ... back to that in a moment).
In general, to get this to work, you would have to open the file shared. For example, if using the Windows API CreateFile, both applications would likely need to specify FILE_SHARE_READ and FILE_SHARE_WRITE. This would allow both (multiple) applications to read and write the file "concurrently".
Beyond sharing the file, though, it would also be necessary to coordinate the operations between the applications. You would need to use some kind of locking mechanism (either by locking some part of the file or some shared mutex/semaphore). Note that if you use file locking, you could lock some known offset in the file to act as a "semaphore" (it can even be a byte value beyond the physical end of the file). If one application were appending to the file at the same exact time that the other application were truncating it, then it would lead to unpredictable results.
Back to the comment about both applications needing to be aware of each other ... It is possible that if both applications opened the file exclusively and kept retrying the operations until they succeeded, then perform the operation, then close the file, it would essentially allow them to work without "knowledge" of each other. However, that would probably not work very well and not be very efficient.
Having said all that, you might want to consider alternatives for efficiency reasons. For example, if it were possible to have the writing application write to new files periodically, it might be more efficient than having to "move" the data constantly out of one file to another. Also, if you needed to maintain some portion of the file (e.g., move out the first 100 MB to another file and then move the second 100 MB to the beginning) that could be a fairly expensive operation as well.
logrotate would be a good option is linux, comes stock on just about any distro. I'm sure there's a similar windows service out there somewhere

How do I determine where process executable code starts and ends when loaded in memory?

Say I have app TestApp.exe
While TestApp.exe is running I want a separate program to be able to read the executable code that is resident in memory. I'd like to ignore stack and heap and anything else that is tangential.
Put another way, I guess I'm asking how to determine where the memory-side equivalent of the .exe binary data on disk resides. I realize it's not a 1:1 stuffing into memory.
Edit: I think what I'm asking for is shown as Image in the following screenshot of vmmap.exe
Edit: I am able to get from memory all memory that is tagged with any protect flag of Execute* (PAGE_EXECUTE, etc) using VirtualQueryEx and ReadProcessMemory. There are a couple issues with that. First, I'm grabbing about 2 megabytes of data for notepad.exe which is a 189 kilobyte file on disk. Everything I'm grabbing has a protect flag of PAGE_EXECUTE. Second, If I run it on a different Win7 64bit machine I get the same data, only split in half and in a different order. I could use some expert guidance. :)
Edit: Also, not sure why I'm at -1 for this question. If I need to clear anything up please let me know.
Inject a DLL to the target process and call GetModuleHandle with the name of the executable. That will point to its PE header that has been loaded in the memory. Once you have this information, you can parse the PE header manually and find where .text section is located relative to the base address of the image in the memory.
no need to inject a dll
use native api hooking apis
I learned a ton doing this project. I ended up parsing the PE header and using that information to route me all over. In the end I accomplished what I set out to and I am more knowledgeable as a result.

Is it possible to create a file that cannot be copied?

To restrict the scope, let assume we are in Windows world only.
Also assume we don't want to play with permission policy.
Is it possible for us to create a file that cannot be copied?
Thank you in advance.
"Trying to make digital files uncopyable is like trying to make water not wet." ~ Bruce Schneier
No. You can't create a file that a SYSADMIN can't copy. You could encrypt it, though.
Well, how about creating a file that uses up more than 50% of the total space on that machine and that is not compressible?
For instance, let us assume that you want to save a boolean (true or false) in such a fashion.
Depending on its value, you could then write a bit stream of ones or zeroes and encrypt said stream using some kind of encryption algorith, such as AES in CBC mode. This gives you the added advantage of error correction. Even in case of massive data corruption, you should be able to recover your boolean by checking whether ones or zeroes are prevalent in the decrypted stream.
In that case you cannot copy it around (completely) on the machine...
Of course, any type of external memory that can be added to the system would pose a problem in this scenario. But the file would be already encrypted, so don't worry about it too much...
Any file that can be read can have its contents written to another location (such as another file, i.e. copied).
The only thing you can do is limit who/what can read the file.
What is the motivation behind? If it is a read-only file, you can have it as embedded resources within your assembly.
Nice try, RIAA.
But seriously, no you can not. It is always possible to copy, you can just make it it more difficult for people to make sense of the file or try to hide it using like encryption. Spotify does it.
If you really try hard thou, you cold make a root-kit for windows and use it to prevent windows from even knowing about the file and also prevent copies. The file will still be there and copy-able by other tools, or Linux accessing the ntfs.
If in a running process you open a file and hold an exclusive lock, then other processes cannot read the file until you close the handle or your process terminates. However, as admin you could forcibly remove the lock handle.
Short answer: No.
You can, of course, use security settings to limit who can read the file. But if someone can read it, then they can copy it. Even if you found some operating system trick to disable "ordinary" copying, if someone can read the file, they can extract the contents, store it in memory, and then write it somewhere else.
You can encrypt the contents so it's only useful to your own program, that knows how to decrypt it.
That's about it.
When using Windows 7 to copy some files from a hard drive, certain files popped up a message saying they could not be copied in their entirety; certain data would be omitted from the copy. I suspect that had something to do with slack space at the end of the files, though I thought the message was curious. I would have expected the copy operation to just ignore the slack space.
If you are running old (OLD) versions of windows, there are certain characters you can put in the filename that make it invalid, not listed in folders, etc. They were used a lot in the old pub ftp days of filesharing ;)
In the old DOS days, you used to be able to flag disk sectors as bad and still read from them. This meant the OS ignored the sector in question but your application would know where to look and be able to get the data. Not sure this would work these days.
Another old MS-DOS trick was to put a space character in the middle of the filename (yes, spaces were valid characters for filenames). Since there was no method on the command line to escape a space, the file couldn't be copied using the DOS commands.
This answer is outside Windows so yeah
Dont know if its already been said but what about a file that is an inseperable part of the firmware so that it is always on AND running, perhaps it has firmware that generates a sequence that is required for the other . AN incedental effect of its running is to prevent any 80% or more of its code from being replicated. Lets say its on an entirely different board, protected by surge protectors, heavy em proof shielding and anything else required to make it completely unerasable.
If its possible to make a program that is ALWAYS on and running as long as the copying software is running then yes.
I have another way and this IS with windows. I will come to your house and give you a disk, i will then proceed to destroy every single computer you put the disk into. This doesnt work on XP
Well technically you could create and write to a write-only network share.

How to see fragmentation of a specific file?

Is there a tool that would show me for a specific file on disk, how fragmented it is? (How many seeks does physical disk need to make if I were to read that file in a linear fashion)
The Sysinternals tool contig with parameter -a can do this for a file or all files in a folder and its subfolders.
You can use DeviceIoControl with FSCTL_GET_VOLUME_BITMAP, FSCTL_GET_RETRIEVAL_POINTERS and FSCTL_MOVE_FILE, see Defragmenting Files.
You can also find different code examples if you search for FSCTL_MOVE_FILE.
Here is one in C and another in .NET.
filefrag is the tool you're looking for, if you're using Linux.
Use -v parameter with filename to get detailed list of fragmentation.
http://linux.die.net/man/8/filefrag
And, of course, "fragmentation" is suspect:
The file may be in pieces in the same cylinder. No seek overhead, just rotational latency. Or not as the pieces may be an optimal order (chances are near zero for this one).
The file may be "contiguous" but across several cylinders. Even reading sequentially will result in seeks.
The file may be on a stripe set and you have no idea where the boundaries are. You may skip to another controller, another spindle, or another partition on the same drive.
Be careful about what conclusions you draw.
fsutil file queryallocranges offset=<o> length=<l> <file> will show you the file's extents you will need admin rights.

Resources