Hope someone here can help me with this, or knows where I can ask.
I have made a distribution model using Maxent (version 3.3.3) in R (dismo package), and thereafter made a map of limiting factors as described in the appendix of Elith et al. (http://onlinelibrary.wiley.com/doi/10.1111/j.2041-210X.2010.00036.x/full), using the Maxent software through the Windows cmd window. The instructions worked fine, and I now have the limiting factors map in a file called lf_map.asc (ca. 10 GB). In order to open the map in ArcGIS, I imported the asc file as a raster into R and saved it as a tif file, using this R script:
lf_map<- raster("//home//...//lf_map.asc")
writeRaster(lf_map,"//home//...//lf_map.tif")
When I open it in ArcGIS, the different variables (factors) from the model have the names 0-4 in the map (I have 5 variables in the model), but I don't know which variable belongs to which number. I have also tried the ASCII to Raster (Conversion) tool in ArcGIS, but the names still come out as 0 to 4, and not as the names of the variables. Does anyone know how to find this out?
Best regards
Kristin
I stumbled upon the answer myself: I checked the cmd window where I had run the script for the limiting factors map (it was still available, since it was in a detached screen), and saw that when the process finished, the information about which number corresponds to which variable was printed there. Apparently, the variables are sorted alphabetically: 0 = Aspect, 1 = Mean summer temp, etc.
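For anyone hitting the same thing: since the codes are just the predictor names in alphabetical order starting at 0, the legend can be reconstructed from the layer names alone. A tiny illustration of that sorting logic (in Python; only Aspect and Mean summer temp are from my model, the other names are placeholders):

# Hypothetical predictor names; only the first two are from the original model.
variables = ["Mean summer temp", "Aspect", "Precipitation", "Slope", "Soil type"]
legend = {code: name for code, name in enumerate(sorted(variables))}
print(legend)  # {0: 'Aspect', 1: 'Mean summer temp', ...}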
Right now, I am writing values out to files to make variables persist. But I'm curious whether there's a simpler way to create variables that don't disappear when the script ends.
I find Redis invaluable for persisting data like this. It is a quick and lightweight installation and allows you to store many types of data:
strings, including complete JSONs and binary data like JPEG/PNG/TIFF images - also with TTL (Time-to-Live) so data can be expired when no longer needed
numbers, including atomic integers, floats
lists/queues/stacks
hashes (like Python dictionaries)
sets, and sorted (ordered) sets
streams, bitfields, geospatial data and the more esoteric HyperLogLogs
PUB/SUB is also possible, where one or more machines/processes publish items and multiple consumers, who have subscribed to that topic, receive the published items.
It can also perform very fast operations on your data for you, like set intersections and unions, getting lengths of lists, moving items between lists, atomically adding/subtracting from numbers and so on.
You can also use it to pass data between processes, sub-processes, shell scripts, parent and child, child and parent (!) scripts and so on.
In addition to all that, it is networked, so you can set variables on one computer and read/alter them from another - very simply. For example, you can PUSH jobs to a queue, potentially from multiple machines, and run workers on multiple machines that wait for jobs on the queue, process them and return results to another list.
There is a discussion of the things you can store here.
Example: Store a string, then retrieve it:
redis-cli SET name fred
name=$(redis-cli GET name)
Example: Increment the view count of page 2 by 10, and then retrieve it from a different machine on the network:
redis-cli INCRBY views:page:2 10
views=$(redis-cli -h 192.168.0.10 GET views:page:2)
Example: Push a value onto a list:
redis-cli LPUSH shoppingList bananas
Example: Blocking wait for the next item in a list (the last argument is the timeout in seconds; 0 means wait forever, and note that BRPOP returns both the list name and the value) - use RPOP for non-blocking:
item=$(redis-cli BRPOP shoppingList 0)
Also, there are bindings for Python, C/C++, Java, Ruby, PHP etc., so you can "inject" dummy/test data into, or extract debug data from, a running Python program using the redis-cli tool, even from a different computer.
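For instance, here is a minimal sketch of the same commands through the redis-py binding (connection details assumed: localhost, default port):

import redis  # the redis-py binding: pip install redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

r.set("name", "fred")                # same as: redis-cli SET name fred
print(r.get("name"))                 # -> "fred"
r.incrby("views:page:2", 10)         # same as: redis-cli INCRBY views:page:2 10
r.lpush("shoppingList", "bananas")   # same as: redis-cli LPUSH shoppingList bananas
key, item = r.brpop("shoppingList", timeout=0)  # blocking pop; returns (list name, value)
print(item)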
Use environment variables to store your data.
ABC="abc"; export ABC
And the other question is how to make environment variables persistent after a reboot.
Depending on your shell, you may have a different file in which to persist the variables.
If you are using bash, run this command (containing the variable's last value) before rebooting:
echo 'export ABD="hello"' >> $HOME/.bashrc
I think this is a good time to be using an SQL Database. It's more scalable and functional than having a fileful of "persistent variables".
It may require a little more setup, and I admit it isn't "simpler" per se, but it will probably be worth it in the long run. You will be able to do more with your variables, and that may make your future scripts simpler.
I recommend going to YouTube and finding a simple tutorial on how to set up a local MySQL or MSSQL instance. There is a guy, Mike Dane, who makes really beginner-friendly tutorials. Try searching "GiraffeAcademy SQL Beginner" and see if that helps you.
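To make the idea concrete, here is a minimal sketch of a key/value table for "persistent variables". It uses SQLite through Python's built-in sqlite3 module purely for illustration (the same schema works in MySQL/MSSQL with minor syntax changes; the file and table names are made up):

import sqlite3

con = sqlite3.connect("vars.db")  # the database file is the persistence
con.execute("CREATE TABLE IF NOT EXISTS vars (name TEXT PRIMARY KEY, value TEXT)")

def set_var(name, value):
    con.execute("INSERT OR REPLACE INTO vars (name, value) VALUES (?, ?)", (name, value))
    con.commit()

def get_var(name, default=None):
    row = con.execute("SELECT value FROM vars WHERE name = ?", (name,)).fetchone()
    return row[0] if row else default

set_var("ABC", "abc")
print(get_var("ABC"))  # survives reboots, unlike a plain shell variable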
So for our computer science class, we had to write a program that randomly generates cards. I decided to do mine in batch, because I'm a massive noob XD
I was confident that I could do it, as I'm quite experienced with it, even though batch isn't by any means a good 'language', if you're going to call it that. I was able to fix most of the problems by myself with some hard work. I am, however, still having some issues I don't know how to resolve.
The biggest issues I don't know how to fix are:
Text not being displayed properly.
Numbers (which I use as the basis of the AI and card generation) sometimes not being defined properly in variables.
The point system just refuses to work no matter what I do.
It sometimes randomly just flat out decides to crash on me if I skip a 'TIMEOUT' or a 'PAUSE'.
Some 'IF' statements not being executed properly even though they're exactly the same as the other ones.
I'm sorry if this question is too broad, but I didn't really know how to summarize it.
Here is a link to my card game: http://pastebin.com/t2S3yWk5
Here is our question:
1) Create a program that will generate two random numbers - one that maps to a suit (hearts, diamonds, clubs or spades) and one that maps to a card (Ace, 2, 3, ..., Jack, Queen, King).
*Mine is slightly different: it generates two different suits based on two random numbers.*
2) Create a program that will generate a random card and then ask the user to go Higher, Lower or Quit. If the player chooses Higher, the next card must be of a higher value or the player is out. Likewise for Lower.
3) Extending the previous program, the user will select a trump suit at the start of the game. If the card is a trump suit card then the game continues regardless of the card's value. The program will keep score and will save the score to a highscore file. The user will also be able to display the current highscore file.
I would like to try and do this stuff (listed above) myself. I just need help trying to fix my existing program.
I hope that if you're reading this you can give me some advice or provide solutions to some of my problems. Thanks in advance! :3
Good news: nothing is seriously wrong with your code; it's just a bunch of little things that seem like a lot - some off-by-one errors and a missing variable that I can only assume got replicated from copying and pasting.
Text not being displayed properly
I assume this is the "Access denied" errors that your code produces instead of the AI's comments. This is due to the fact that > and < are used for output redirection, so the emoticons you are adding are trying to create a file in a place you don't have access to. Either get rid of them (recommended) or use ^ to escape them (^>:)).
Numbers (I use for the base of the AI and card generation) sometimes not being defined properly in variables
%random% %% 5 results in one of the numbers in the set 0, 1, 2, 3, 4. You currently do not have an if statement for what should happen if 0 is randomly selected. Add an if statement or have the code go back to the top of the section if a 0 is selected.
The point system just refuses to work no matter what I do
You're going to kick yourself...
Your set statements are missing the assignment portion. SET /A %p1p% + 2 should be SET /A p1p=%p1p% + 2 and so forth (or set /a p1p+=2 if you think that looks better).
It sometimes randomly just flat out decides to crash on me if I skip a 'TIMEOUT' or 'PAUSE'
I couldn't replicate that, but the code seemed to work fine when I removed those statements.
Some 'IF' statements not being executed properly even though they're exactly the same as the other ones
Your comment indicated lines 119-132, which include the if statements that assign points. See above for why those aren't working.
Some other recommendations for your code
Your variable names should be more descriptive. For example, ctog doesn't tell me anything about what that variable should be; I can look at the code to see what it does, but without any context, that could be doing anything.
You should add the /i flag to the if statements that check which card you put down so that C1 and c1 get treated the same. On a related note, you should add a check for when the player enters something other than C1 or C2. You can even use a choice command like you did earlier.
:pvic is missing an exit command, so you automatically play again if you win. Combined with the fact that you only check if lt is equal to 2, not greater than or equal to 2, there's no way to stop playing if you win. Also on the subject of end game conditions, there's no if statement for if you tie the computer.
cp1 and num1 are effectively the same variable, there's no reason to have both (same with cp2/num2, ap1/num3, and ap2/num4).
You need some kind of goto at the end of :pc1 so that :pc2 doesn't automatically run after :pc1 finishes.
I have approximately 600 GB of photos collected over 13 years, now stored on a FreeBSD ZFS server.
The photos come from family computers, from several partial backups to different external USB HDDs, from images reconstructed after disk disasters, and from different photo manipulation programs (iPhoto, Picasa, HP and many others :( ), in several deep subdirectories - in short: a TERRIBLE MESS with many duplicates.
So as a first step I:
searched the tree for files of the same size (fast) and computed MD5 checksums for those,
collected the duplicate images (same size + same MD5 = duplicate).
This helped a lot, but there are still MANY, MANY duplicates:
photos that differ only in the EXIF/IPTC data added by some photo management software, but where the image is the same (or at least "looks the same" and has the same dimensions)
or that are only resized versions of the original image
or that are "enhanced" versions of the originals, etc.
Now the questions:
How can I find duplicates by checksumming only the "pure image bytes" of a JPG, without the EXIF/IPTC and similar meta information? I want to filter out the photo duplicates that differ only in EXIF tags but where the image itself is the same (so file checksumming doesn't work, but image checksumming could...). This is (I hope) not very complicated - but I need some direction.
Which Perl module can extract the "pure" image data from a JPG file in a form usable for comparison/checksumming?
More complex:
How can I find "similar" images that are only
resized versions of the originals,
or "enhanced" versions of the originals (from some photo manipulation programs)?
Is there an algorithm already available, as a Unix command or a Perl module (XS?), that I can use to detect these special "duplicates"?
I'm able to write complex scripts in bash, and "+-" :) know Perl. I can use FreeBSD/Linux utilities directly on the server, and over the network I can use OS X (but working with 600 GB over the LAN is not the fastest way)...
My rough idea:
delete images only at the end of the workflow
use an Image::ExifTool script to collect candidate duplicates based on image-creation date and camera model (maybe other EXIF data too)
make a checksum of the pure image data (or extract a histogram - identical images should have identical histograms) - not sure about this
use some similarity detection to find duplicates that are resized or photo-enhanced versions - no idea how to do this...
Any idea, help, or (software/algorithm) hint on how to bring order to the chaos?
PS:
Here is a nearly identical question: Finding Duplicate image files, but I'm already done with that answer (MD5) and am looking for more precise checksumming and image-comparison algorithms.
Assuming you can work with a locally mounted FS:
rmlint : fastest tool I've ever used to find exact duplicates
findimagedupes : automates the whole ImageMagick approach (like Randal Schwartz's script, which I haven't tested, it seems)
Detecting Similar and Identical Images Using Perceptual Hashes goes all the way (a great reference post)
dupeguru-pe (gui) : dedicated tool that is fast and does an excellent job
geeqie (gui) : I find it fast/excellent for finishing the job, using its granular duplicate-finding options. You can also generate an ordered collection of images such that similar images are next to each other, allowing you to 'flip' between two of them to see the changes.
Have you looked at this article by Randal Schwartz? He uses a Perl script with ImageMagick to create resized (4x4 RGB grid) versions of the pictures, which he then compares in order to flag "similar" pictures.
You can remove EXIF data with mogrify -strip from the ImageMagick toolset. So you could, for each image, make a copy without EXIF, compute the md5sum of that copy, and then compare the md5sums.
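The same idea sketched in Python with Pillow, hashing only the decoded pixel data so that metadata-only differences give the same digest (Pillow and the file list are assumptions, not part of the original answer):

import hashlib
from collections import defaultdict
from PIL import Image  # Pillow

def pixel_md5(path):
    # Hash the raw pixel bytes, not the file bytes, so EXIF/IPTC changes don't matter.
    with Image.open(path) as im:
        return hashlib.md5(im.convert("RGB").tobytes()).hexdigest()

groups = defaultdict(list)
for path in ["a.jpg", "b.jpg"]:  # replace with your own file list
    groups[pixel_md5(path)].append(path)
dupes = {h: paths for h, paths in groups.items() if len(paths) > 1}

Note that this only catches bit-identical pixels; a re-saved or resized JPEG will still hash differently and needs the similarity approach below.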
When it comes to visually similar images - you can, for example, use compare (also from the ImageMagick toolset) to produce a black/white diff map, as described here, then make a histogram of the difference and check whether there is "enough" white for the images to count as different.
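A rough equivalent of that diff-and-histogram check in Python (a sketch using Pillow's ImageChops instead of ImageMagick's compare; the thresholds are arbitrary assumptions you would tune):

from PIL import Image, ImageChops

def looks_different(path_a, path_b, pixel_threshold=16, fraction=0.01):
    a = Image.open(path_a).convert("RGB")
    b = Image.open(path_b).convert("RGB")
    if a.size != b.size:
        b = b.resize(a.size)  # compare a resized copy against the original
    diff = ImageChops.difference(a, b)
    changed = sum(1 for px in diff.getdata() if max(px) > pixel_threshold)
    return changed / (a.size[0] * a.size[1]) > fraction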
I had a similar dilemma - several hundred gigs of photos and videos spread and duplicated over about a dozen drives. I know this may not be the exact way you are looking for, but the FSlint Janitor application (on Ubuntu 16.x, then 18.x) was a lifesaver for me. I took the project in chunks, eventually cleaning it all up and ended up with three complete sets (I wanted two off-site backups).
FSLint Janitor:
sudo apt install fslint
I am using FFTW for FFTs. It's all working well, but the optimisation takes a long time with the FFTW_PATIENT flag. However, according to the FFTW docs, I can improve on this by reusing wisdom between runs, which I can import from and export to a file. (I am using the single-precision (float) fftw routines, hence the fftwf_ prefix below instead of fftw_.)
So, at the start of my main(), I have :
char wisdom_file[] = "optimise.fft";
fftwf_import_wisdom_from_filename(wisdom_file);
and at the end, I have:
fftwf_export_wisdom_to_filename(wisdom_file);
(I've also got error-checking to verify that the return value is non-zero, omitted for simplicity above, so I know the files are being read and written correctly.)
After one run I get a file optimise.fft with what looks like ASCII wisdom. However, subsequent runs do not get any faster, and if I create my plans with the FFTW_WISDOM_ONLY flag, I get a null plan, showing that it doesn't see any wisdom there.
I am using 3 different FFTs (2 real-to-complex and 1 inverse complex-to-real), so I have also tried importing/exporting around each FFT, and to separate files, but that doesn't help.
I am using FFTW 3.3.3. I can see that FFTW 2 seemed to need more setting up to reuse wisdom, but the above seems to be sufficient now - what am I doing wrong?
Previously, I asked the question.
The problem is that the demands on our file structure are very high.
For instance, we're trying to create a container with up to 4,500 files and 500 MB of data.
The file structure of this container consists of
An SQLite DB (under 1 MB)
A text-based XML-like file
Images inside a dynamic folder structure that make up the rest of the ~4,500 files
After the initial creation, the image files are read-only, with the exception of deletion.
The small DB is used regularly when the container is accessed.
Tar, Zip and the like are all too slow (even with 0 compression). Slow is subjective, I know, but untarring a container of this size takes over 20 seconds.
Any thoughts?
As you seem to be doing arbitrary file system operations on your container (say, creation, deletion of new files in the container, overwriting existing files, appending), I think you should go for some kind of file system. Allocate a large file, then create a file system structure in it.
There are several options for the file system: for both Berkeley UFS and Linux ext2/ext3 there are user-mode libraries available, and you may also be able to find a FAT implementation somewhere. Make sure you understand the structure of the file system, and pick one that allows for extending - I know that ext2 is fairly easy to extend (by another block group), and FAT is difficult to extend (you need to append to the FAT).
Alternatively, you can put a virtual disk format yet below the file system, allowing arbitrary remapping of blocks. Then "free" blocks of the file system don't need to appear on disk, and you can allocate the virtual disk much larger than the real container file will be.
Three things.
1) What Timothy Walters said is right on; I'll go into more detail.
2) 4,500 files and 500 MB of data is simply a lot of data and disk writes. If you're operating on the entire dataset, it's going to be slow. Just I/O truth.
3) As others have mentioned, there's no detail on the use case.
If we assume a read only, random access scenario, then what Timothy says is pretty much dead on, and implementation is straightforward.
In a nutshell, here is what you do.
You concatenate all of the files into a single blob. While you are concatenating them, you track their filenames, the file lengths, and the offset at which each file starts within the blob. You write that information out into a block of data, sorted by name. We'll call this the Table of Contents, or TOC block.
Next, you concatenate the two blocks together. In the simple case, you have the TOC block first, then the data block.
When you wish to get data from this format, search the TOC for the file name, grab the offset from the beginning of the data block, add in the TOC block size, and read FILE_LENGTH bytes of data. Simple.
If you want to be clever, you can put the TOC at the END of the blob file. Then, append at the very end the offset to the start of the TOC. To read it, you lseek to the end of the file, back up 4 or 8 bytes (depending on your number size), take THAT value, and lseek even farther back to the start of your TOC. Then you're back to square one. You do this so you don't have to write the archive twice: the data goes down first and the TOC is appended afterwards, instead of rebuilding everything at the beginning.
If you lay out your TOC in blocks (say 1K byte in size), then you can easily perform a binary search on the TOC. Simply fill each block with the File information entries, and when you run out of room, write a marker, pad with zeroes and advance to the next block. To do the binary search, you already know the size of the TOC, start in the middle, read the first file name, and go from there. Soon, you'll find the block, and then you read in the block and scan it for the file. This makes it efficient for reading without having the entire TOC in RAM. The other benefit is that the blocking requires less disk activity than a chained scheme like TAR (where you have to crawl the archive to find something).
I suggest you pad the files to block sizes as well; disks like to work with regularly sized blocks of data. This isn't difficult either.
Updating this without rebuilding the entire thing is difficult. If you want an updatable container system, then you may as well look in to some of the simpler file system designs, because that's what you're really looking for in that case.
As for portability, I suggest you store your binary numbers in network order, as most standard libraries have routines to handle those details for you.
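Here is a minimal sketch of the simple TOC-first layout described above (Python, with a made-up fixed-size entry format; the blocked TOC, padding and TOC-at-the-end variants are left out):

import os, struct

ENTRY = struct.Struct("64sQQ")  # hypothetical entry: 64-byte name, offset, length

def pack(filenames, out_path):
    names = sorted(filenames)  # sorted so the TOC can be binary-searched later
    entries, offset = [], 0
    for name in names:
        length = os.path.getsize(name)
        entries.append(ENTRY.pack(name.encode()[:64], offset, length))
        offset += length
    with open(out_path, "wb") as out:
        out.write(struct.pack("Q", len(names)))  # entry count, then the TOC block
        out.writelines(entries)
        for name in names:                       # then the data block
            with open(name, "rb") as f:
                out.write(f.read())

def read(container, wanted):
    with open(container, "rb") as f:
        count = struct.unpack("Q", f.read(8))[0]
        toc = [ENTRY.unpack(f.read(ENTRY.size)) for _ in range(count)]
        data_start = 8 + ENTRY.size * count
        for raw_name, offset, length in toc:     # linear scan here; binary search also works
            if raw_name.rstrip(b"\0").decode() == wanted:
                f.seek(data_start + offset)
                return f.read(length)
    raise FileNotFoundError(wanted)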
Working on the assumption that you're only going to need read-only access to the files, why not just merge them all together and have a second "index" file (or an index in the header) that tells you the file name, start position and length? All you need to do is seek to the start point and read the correct number of bytes. The method will vary depending on your language, but it's pretty straightforward in most of them.
The hardest part then becomes creating your data file + index, and even that is pretty basic!
An ISO disk image might do the trick. It should be able to hold that many files easily, and is supported by many pieces of software on all the major operating systems.
First, thank you for expanding your question; it helps a lot in providing better answers.
Given that you're going to need a SQLite database anyway, have you looked at the performance of putting it all into the database? My experience is based around SQL Server 2000/2005/2008 so I'm not positive of the capabilities of SQLite but I'm sure it's going to be a pretty fast option for looking up records and getting the data, while still allowing for delete and/or update options.
Usually I would not recommend putting files inside the database, but given that the total size of all images is around 500 MB for 4,500 images, you're looking at a little over 100 KB per image, right? If you're using a dynamic path to store the images, then in a slightly more normalized database you could have an "ImagePaths" table that maps each path to an ID; then you can look for images with that PathID and load the data from the BLOB column as needed.
The XML file(s) could also go in the SQLite database, which gives you a single "data file" for your app that can move between Windows and OS X without issue. You can simply rely on your SQLite engine to provide the performance and compatibility you need.
How you optimize it depends on your usage; for example, if you frequently need to get all images at a certain path, then having a PathID (as an integer, for performance) would be fast, but if you're showing all images that start with "A" and simply show the path as a property, then an index on the ImageName column would be of more use.
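As a concrete sketch of that layout (SQLite via Python's sqlite3; the table and column names here are assumptions for illustration):

import sqlite3

con = sqlite3.connect("container.db")
con.executescript("""
CREATE TABLE IF NOT EXISTS ImagePaths (PathID INTEGER PRIMARY KEY, Path TEXT UNIQUE);
CREATE TABLE IF NOT EXISTS Images (
    ImageID   INTEGER PRIMARY KEY,
    PathID    INTEGER REFERENCES ImagePaths(PathID),
    ImageName TEXT,
    Data      BLOB
);
CREATE INDEX IF NOT EXISTS idx_images_name ON Images(ImageName);
""")
# Fetch every image under one path with a single indexed query:
rows = con.execute("SELECT ImageName, Data FROM Images WHERE PathID = ?", (1,)).fetchall()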
I am a little concerned, though, that this sounds like premature optimization; you really need to find a solution that works "fast enough", abstract the mechanics of it so your application (or both apps, if you have both Mac and PC versions) uses a simple repository or similar, and then you can change the storage/retrieval method at will without any implications for your application.
Check Solid File System - it seems to be what you need.