Is there a way to get the number of files in the entire drive before going through the entire list using the list API? I can't seem to find the data I need. I can get the total storage quota, but I just need the number of files in the drive so I can show progress in my UI.
The only way to get the total number of files in a Google Drive account is to call Files.list and loop through everything. Depending on how many files you have, you may have to paginate through several pages of results.
The only other option would be to use About, which will return the amount of storage used but not the total number of files, which doesn't sound like what you are after.
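A minimal sketch of that loop, in Python. The `list_page` callable is a hypothetical wrapper around whatever client you use for Files.list; the sketch only assumes that each response is a dict with a `files` list and an optional `nextPageToken`, which are the field names of the Drive v3 response. The fake pages exist only so the example runs without network access.

```python
from typing import Callable, Optional


def count_files(list_page: Callable[[Optional[str]], dict]) -> int:
    """Count files by walking every page of a paginated listing.

    list_page(page_token) is a hypothetical wrapper that performs one
    Files.list request and returns the decoded response as a dict.
    """
    total = 0
    page_token = None
    while True:
        page = list_page(page_token)
        total += len(page.get("files", []))
        page_token = page.get("nextPageToken")
        if not page_token:  # no token means this was the last page
            return total


# Stand-in for the real API call: three fake pages keyed by page token.
_FAKE_PAGES = {
    None: {"files": [{"id": "a"}, {"id": "b"}], "nextPageToken": "p2"},
    "p2": {"files": [{"id": "c"}], "nextPageToken": "p3"},
    "p3": {"files": [{"id": "d"}, {"id": "e"}]},
}


def fake_list_page(page_token: Optional[str]) -> dict:
    return _FAKE_PAGES[page_token]


print(count_files(fake_list_page))
```

Because the total is only known once the last page has been fetched, a UI built on this can show a running count while it loads, but not a percentage.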
I need to get any information about where the file is physically located on the NTFS disk. Absolute offset, cluster ID..anything.
I need to scan the disk twice, once to get allocated files and one more time I'll need to open partition directly in RAW mode and try to find the rest of data (from deleted files). I need a way to understand that the data I found is the same as the data I've already handled previously as file. As I'm scanning disk in raw mode, the offset of the data I found can be somehow converted to the offset of the file (having information about disk geometry). Is there any way to do this? Other solutions are accepted as well.
Now I'm playing with FSCTL_GET_NTFS_FILE_RECORD, but can't make it work at the moment and I'm not really sure it will help.
UPDATE
I found the following function
http://msdn.microsoft.com/en-us/library/windows/desktop/aa364952(v=vs.85).aspx
It returns a structure that contains the nFileIndexHigh and nFileIndexLow members.
Documentation says
The identifier that is stored in the nFileIndexHigh and nFileIndexLow members is called the file ID. Support for file IDs is file system-specific. File IDs are not guaranteed to be unique over time, because file systems are free to reuse them. In some cases, the file ID for a file can change over time.
I don't really understand what this is. I can't connect it to the physical location of the file. Is it possible to extract this file ID from the MFT later?
UPDATE
Found this:
This identifier and the volume serial number uniquely identify a file. This number can change when the system is restarted or when the file is opened.
This doesn't satisfy my requirements, because I'm going to open the file and the fact that ID might change doesn't make me happy.
Any ideas?
Use the Defragmentation IOCTLs. For example, FSCTL_GET_RETRIEVAL_POINTERS will tell you the extents which contain file data.
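FSCTL_GET_RETRIEVAL_POINTERS fills a RETRIEVAL_POINTERS_BUFFER holding a StartingVcn and an array of extents, each with a NextVcn and an Lcn. The sketch below (in Python, to show the arithmetic only) converts such a list into byte ranges that can be compared with offsets seen during a raw scan. It assumes you already have the extents and the cluster size (for example from GetDiskFreeSpace), and that the resulting offsets are relative to the start of the volume, not the physical disk. The function name and tuple layout are illustrative.

```python
from typing import List, Tuple


def extents_to_byte_ranges(
    starting_vcn: int,
    extents: List[Tuple[int, int]],
    cluster_size: int,
) -> List[Tuple[int, int, int]]:
    """Turn (next_vcn, lcn) pairs into (file_offset, volume_offset, length).

    All three values are in bytes. An lcn of -1 marks a range with no
    clusters allocated on disk (e.g. a sparse hole), so it is skipped.
    """
    ranges = []
    vcn = starting_vcn
    for next_vcn, lcn in extents:
        length = (next_vcn - vcn) * cluster_size
        if lcn != -1:
            ranges.append((vcn * cluster_size, lcn * cluster_size, length))
        vcn = next_vcn  # the next extent starts where this one ended
    return ranges


# Illustrative values: 4 clusters at LCN 100, a 2-cluster hole,
# then 4 clusters at LCN 500, with 4096-byte clusters.
example = extents_to_byte_ranges(0, [(4, 100), (6, -1), (10, 500)], 4096)
print(example)
```

A raw-scan offset that falls inside one of the returned volume ranges belongs to a file you have already handled.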
I store a lot of my music and some movies on my external hard drive so I can go upstairs and play them on my PS3. Not all of it is appropriate for her age group and so I am trying to devise a way to prevent her from viewing or listening to it so my parents don't yell at me. I would just encrypt it, but the PS3 does not support encryption and is a pain to decrypt a one time encryption every time I want to use it (if such a thing exists). So I thought I would employ a little steganography. If I created 100 empty folders, placing the real files in one undisclosed one she would have a 1% chance of guessing and would probably give up quickly. She could just look at the file size, but I highly doubt she would ever think of that. Anyone know how I can create a whole bunch of folders, I don't want to do it by hand. A simple executable script would be very helpful (e.g. just insert how many folders you want and where). Thanks!
If you're using bash then this will work:
for (( i=0; i<100; i++ )); do mkdir junk$i; done
It will create 100 directories named junk0 through junk99. You can change junk to anything you like. If you want to get fancy, you could use bash's $RANDOM variable (see "man bash") to name the folders with random numbers rather than consecutive ones.
So, I am in this little predicament where I am stuck watching a few ftp folders to see if they have new files added to them. If they do, it needs to throw an event with the file name. Thereby telling something else to download that file.
This is a pretty simple object to make, I was just curious if anyone knew how expensive this operation would be?
I plan on using the NLST command because I don't need file size information, and there will be no sub-directories in the folder. Each file in the folder will have exactly 25 characters in its name.
There could be anywhere from 10 to 'maybe' a couple thousand (max around 2000) files per folder (usually on the lower end, 100-300, but currently growing).
The files are anywhere from 250kb to a very VERY unlikely 10mb (usually within the 250kb to 4mb range).
There possibly could be up to a few hundred folders (in which case I could change the watch frequency depending on number of folders), but currently there are only a few (6-10ish).
There also would be multiple logins for the ftp server, different logins would have access to different folders.
I am not asking for an implementation, just if anyone has some first or second hand knowledge about FTP, how could this affect my network.
I am not opposed to putting in file retention times or changing the frequency at which I check for new files.
Do you have any control over the remote servers? FTP isn't really optimized for this, and you could probably do a lot better with some sort of dedicated mini-server. You could use file system monitoring on the remote side and just send out the filenames when they arrive rather than continuously polling. You'd only need to have one connection open too, rather than the two that FTP requires.
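If polling turns out to be the only option, the cost per check is easy to estimate and the change detection is a set difference between two listings. The sketch below is in Python and assumes a name-only listing (the FTP NLST command, which Python's ftplib exposes as FTP.nlst()) that returns one bare file name per line terminated by CRLF; some servers return full paths instead, which would make each line longer. Both function names are hypothetical.

```python
from typing import Iterable, List


def listing_bytes(n_files: int, name_len: int = 25) -> int:
    """Rough payload size of one name-only listing.

    Assumes one bare file name per line plus a 2-byte CRLF terminator.
    """
    return n_files * (name_len + 2)


def detect_new_files(previous: Iterable[str], current: Iterable[str]) -> List[str]:
    """Return names present in the current listing but not the previous one."""
    seen = set(previous)
    return sorted(name for name in current if name not in seen)


# Worst case from the question: about 2000 files with 25-character names.
print(listing_bytes(2000))

# In real use both listings would come from successive FTP.nlst() calls.
print(detect_new_files(["a.dat", "b.dat"], ["b.dat", "c.dat", "a.dat", "d.dat"]))
```

Under these assumptions a full folder costs roughly 54 KB per poll, which is small next to the 250 KB to 4 MB files themselves, so the poll interval and number of folders matter more than the listing size.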
I'm rendering millions of tiles which will be displayed as an overlay on Google Maps. The files are created by GMapCreator from the Centre for Advanced Spatial Analysis at University College London. The application renders files into a single folder at a time; in some cases I need to create about 4.2 million tiles. I'm running it on Windows XP using an NTFS filesystem. The disk is 500GB and was formatted using the default operating system options.
I'm finding the rendering of tiles gets slower and slower as the number of rendered tiles increases. I have also seen that if I try to look at the folders in Windows Explorer or using the Command line then the whole machine effectively locks up for a number of minutes before it recovers enough to do something again.
I've been splitting the input shapefiles into smaller pieces, running on different machines and so on, but the issue is still causing me considerable pain. I wondered if the cluster size on my disk might be hindering the thing or whether I should look at using another file system altogether. Does anyone have any ideas how I might be able to overcome this issue?
Thanks,
Barry.
Update:
Thanks to everyone for the suggestions. The eventual solution involved writing a piece of code which monitored the GMapCreator output folder, moving files into a directory hierarchy based upon their filenames; so a file named abcdefg.gif would be moved into \a\b\c\d\e\f\g.gif. Running this at the same time as GMapCreator overcame the filesystem performance problems. The hint about the generation of DOS 8.3 filenames was also very useful - as noted below I was amazed how much of a difference this made. Cheers :-)
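The name-to-path mapping described in the update can be sketched as follows, in Python. This is not the original monitoring code, only an illustration of the scheme: each leading character of the file name becomes one directory level, and whatever is left keeps the extension. The function name and the `depth` parameter are assumptions.

```python
from pathlib import PureWindowsPath


def sharded_path(filename: str, depth: int) -> PureWindowsPath:
    """Map a flat file name to a nested path, one character per level."""
    name = PureWindowsPath(filename)
    stem, suffix = name.stem, name.suffix
    # Always leave at least one character of the stem for the file itself.
    depth = max(0, min(depth, len(stem) - 1))
    return PureWindowsPath(*stem[:depth], stem[depth:] + suffix)


print(sharded_path("abcdefg.gif", 6))  # a\b\c\d\e\f\g.gif
print(sharded_path("gbp97m.xls", 2))   # g\b\p97m.xls
```

A watcher built on this would create the parent directories (for example with os.makedirs(..., exist_ok=True)) and then move each new file to its computed path.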
There are several things you could/should do:
Disable automatic NTFS short file name generation (for example with fsutil behavior set disable8dot3 1)
Or restrict file names to the 8.3 pattern (e.g. i0000001.jpg, ...)
In any case, try making the first six characters of the filename as unique/different as possible
If you use the same folder over and over (say adding files, removing files, re-adding files, ...):
Use contig to keep the index file of the directory as unfragmented as possible (check this for explanation)
Especially when removing many files, consider using the folder remove trick to reduce the directory index file size
As already posted, consider splitting up the files into multiple directories.
e.g. instead of
directory/abc.jpg
directory/acc.jpg
directory/acd.jpg
directory/adc.jpg
directory/aec.jpg
use
directory/b/c/abc.jpg
directory/c/c/acc.jpg
directory/c/d/acd.jpg
directory/d/c/adc.jpg
directory/e/c/aec.jpg
You could try an SSD....
http://www.crucial.com/promo/index.aspx?prog=ssd
Use more folders and limit the number of entries in any given folder. The time to enumerate the entries in a directory goes up (exponentially? I'm not sure about that) with the number of entries, and if you have millions of small files in the same directory, even doing something like dir folder_with_millions_of_files can take minutes. Switching to another FS or OS will not solve the problem; Linux has the same behavior, last time I checked.
Find a way to group the images into subfolders of no more than a few hundred files each. Make the directory tree as deep as it needs to be in order to support this.
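How deep the tree needs to be follows from simple arithmetic. The Python sketch below assumes the files spread evenly across subfolders, which real file names rarely do, and the fan-out of 36 (letters plus digits) and the limit of 500 files per folder are illustrative values, not figures from the thread.

```python
def levels_needed(total_files: int, max_per_dir: int, fanout: int) -> int:
    """Smallest number of directory levels that keeps every leaf folder
    at or below max_per_dir files, assuming a perfectly even spread."""
    if max_per_dir < 1 or fanout < 2:
        raise ValueError("max_per_dir must be >= 1 and fanout >= 2")
    levels = 0
    capacity = max_per_dir  # files a tree with `levels` levels can hold
    while capacity < total_files:
        capacity *= fanout  # each extra level multiplies the leaf count
        levels += 1
    return levels


# 4.2 million tiles, at most 500 per folder, 36 subfolders per level.
print(levels_needed(4_200_000, 500, 36))
```

With those numbers, three levels are enough, since 500 * 36 * 36 * 36 is about 23 million; an uneven spread of names would call for an extra level or a hashed prefix.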
The solution is most likely to restrict the number of files per directory.
I had a very similar problem with financial data held in ~200,000 flat files. We solved it by storing the files in directories based on their name. e.g.
gbp97m.xls
was stored in
g/b/p97m.xls
This works fine provided your files are named appropriately (we had a spread of characters to work with). The resulting tree of directories and files wasn't optimal in terms of distribution, but it worked well enough to reduce each directory to hundreds of files and free the disk bottleneck.
One solution is to implement haystacks. This is what Facebook does for photos, because the metadata and random reads required to fetch a file from a regular filesystem are quite costly and offer no value for a data store.
Haystack presents a generic HTTP-based object store containing needles that map to stored opaque objects. Storing photos as needles in the haystack eliminates the metadata overhead by aggregating hundreds of thousands of images in a single haystack store file. This keeps the metadata overhead very small and allows us to store each needle’s location in the store file in an in-memory index. This allows retrieval of an image’s data in a minimal number of I/O operations, eliminating all unnecessary metadata overhead.
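The core idea in that description, many blobs appended to one store file with an in-memory index of where each one lives, can be illustrated with a toy in Python. This is not Facebook's on-disk format or API; the class and method names are made up, and the index here is lost when the process exits, whereas a real store would persist or rebuild it.

```python
import io
from typing import BinaryIO, Dict, Tuple


class TinyHaystack:
    """Toy append-only blob store: one file, one in-memory index."""

    def __init__(self, store: BinaryIO) -> None:
        self._store = store  # any seekable binary file object
        self._index: Dict[str, Tuple[int, int]] = {}  # key -> (offset, length)

    def put(self, key: str, data: bytes) -> None:
        self._store.seek(0, io.SEEK_END)  # always append at the end
        offset = self._store.tell()
        self._store.write(data)
        self._index[key] = (offset, len(data))

    def get(self, key: str) -> bytes:
        offset, length = self._index[key]  # no directory lookup needed
        self._store.seek(offset)
        return self._store.read(length)


# BytesIO stands in for a real file opened with open(path, "a+b").
haystack = TinyHaystack(io.BytesIO())
haystack.put("tile_0_0", b"GIF89a-one")
haystack.put("tile_0_1", b"GIF89a-two!")
print(haystack.get("tile_0_1"))
```

Each read costs one seek and one read on a single open file, instead of a directory traversal per tile.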
I was wondering, is there some type of disc ID I can use to search my database to see whether a disc has already been scanned? All discs were created by me, typically burnt on Windows.
-edit- I could compare the write time and volume label to see if the disc has been scanned, but I noticed that on certain commercial discs those fields are blank or wrong, causing many false positives (I once had the time set to the future; I don't know if people want to archive the contents of files on a commercial disc in my app).
Please look at
http://wiki.dvdlookup.org/index.php?title=Disc_Identification