I know the configuration guide discusses it, but verifying the data visually seems like a pain. Are there any tools available to automatically verify readback data?
According to both the 7 series and Virtex-5 configuration guides, there are two methods of verifying readback data:
Compare the data with the RBD and MSD files generated by Xilinx.
Compare the data with the bitfile and the MSK file.
In general the first method is easier, since you don't have to spend time working out how the data is aligned. Xilinx says this:
The simplest way to verify the readback data stream is to compare it to the RBD golden readback file, masking readback bits with the MSD file. This approach is simple because there is a 1:1 correspondence between the start of the readback data stream and the start of the RBD and MSD files, making the task of aligning readback, mask, and expected data easier.
The RBD and MSD files contain an ASCII representation of the readback and mask data along with a file header that lists the file name, etc. This header information should be ignored or deleted. The ASCII 1s and 0s in the RBD and MSD files correspond to the binary readback data from the device. Take care to interpret these files as text, not binary sources. Users can convert the RBD and MSD files to a binary format using a script or text editor, to simplify the verify procedure for some systems and to reduce the size of the files by a factor of eight.
So really you just need a simple program that converts the ASCII to binary and then compares it with the readback data. I haven't been able to find any existing tools that do this online (although it is relatively simple to do), so I've made a simple open-source tool in C to help out. You can get it here:
Xilinx Readback Verify on GitHub.
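For reference, the core conversion is only a few lines. Here is a minimal sketch in C++ (the function name is mine; it assumes the file header has already been stripped and packs bits MSB first, which you should check against your readback bit ordering):

#include <cstdint>
#include <fstream>
#include <vector>

std::vector<uint8_t> asciiBitsToBinary(const char* path) {
    std::ifstream in(path);
    std::vector<uint8_t> out;
    uint8_t byte = 0;
    int nbits = 0;
    char c;
    while (in.get(c)) {
        if (c != '0' && c != '1') continue;  // skip newlines and stray whitespace
        byte = static_cast<uint8_t>((byte << 1) | (c - '0'));  // pack MSB first
        if (++nbits == 8) { out.push_back(byte); byte = 0; nbits = 0; }
    }
    return out;
}

When comparing, apply the converted MSD as a mask over (readback XOR golden); check the configuration guide for the exact mask polarity (which value marks a don't-care bit).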
I am looking for an algorithm to compress small ASCII strings. They consist mostly of letters, but can also contain numbers and, rarely, special characters. They will be small: about 50-100 bytes on average, 250 max.
Examples:
Android show EditText.setError() above the EditText and not below it
ImageView CENTER_CROP dont work
Prevent an app to show on recent application list on android kitkat 4.4.2
Image can't save validable in android
Android 4.4 SMS - Not receiving sentIntents
Imported android-map-extensions version 2.0 now my R.java file is missing
GCM registering but not receiving messages on pre 4.0.4. devices
I want to compress the titles one by one, not many titles together, and I don't care much about CPU and memory usage.
You can use Huffman coding with a shared Huffman tree among all texts you want to compress.
While a Huffman tree is normally constructed separately for each string to be compressed, storing a tree per string would add a lot of overhead, which should be avoided here. That is also the main problem with using a standard compression scheme in your case: most schemes carry some overhead that kills your compression efficiency on very short strings, and the ones without (much) overhead are typically less efficient in general.
When constructing a Huffman tree that will later be used for compression and decompression, you normally use the texts to be compressed to decide which character is encoded with which bits. Since in your case the texts to be compressed are unknown in advance, you need some "pseudo" texts to build the tree, perhaps taken from a dictionary of the target language or from previous user data.
Then construct the Huffman tree and store it once in your application, either hardcoded into the binary or provided as a file. You can then compress and decompress any text using this tree. Whenever you change the tree (say, because you have gained better knowledge of which texts get compressed), the compressed string representation changes too, so it might be a good idea to introduce versioning and store the tree version together with each string you compress.
Another improvement you might think about is to use multi-character Huffman encoding. Instead of compressing the texts character by character, you could find frequent syllables or words and put them into the tree too; then they require even less bits in the compressed string. This however requires a little bit more complicated compression algorithm, but it might be well worth the effort.
To process a string of bits in the compression and decompression routine in C++(*), I recommend either boost::dynamic_bitset or std::vector<bool>. Both internally pack multiple bits into bytes.
(*) The question once had the c++ tag, so the OP evidently wanted to implement this in C++. Since the general problem is not specific to any programming language the tag was removed, but I have kept the C++-specific part of the answer.
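To make the shared-tree idea concrete, here is a minimal C++ encoding sketch. The Code struct and table layout are my own illustration; the actual codewords would come from the tree you build from your pseudo texts:

#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

struct Code { uint32_t bits; uint8_t len; };  // codeword value and bit length, MSB first

std::vector<bool> encode(const std::string& text,
                         const std::unordered_map<char, Code>& table) {
    std::vector<bool> out;  // packs multiple bits per byte internally
    for (char c : text) {
        // A real implementation needs an escape code for characters not in the table.
        const Code& code = table.at(c);
        for (int i = code.len - 1; i >= 0; --i)
            out.push_back((code.bits >> i) & 1);
    }
    return out;
}

Decoding walks the shared tree bit by bit from the root; storing the tree version alongside the output, as suggested above, keeps old strings decodable after the table changes.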
I'm writing a parser for the most common geographic data storage type, a collection of files called a "shapefile". This is my first project where I've had to think about endianness.
It turns out that the geometry storage is mixed endian; some parts of the file are big endian, but most of it is little endian. The shapefile standard is described here.
Is there a discernible performance rationale, or was it simply born out of historical context? If so, do you happen to know what that historical context is?
The integers and double-precision floating point numbers that make up the data description fields in the file header (identified below) and record contents in the main file are in little endian (PC or Intel®) byte order. The integers and double-precision floating point numbers that make up the rest of the file and file management are in big endian (Sun® or Motorola®) byte order.
While there doesn't seem to be a definitive answer, what I've seen amounts to a mixture of "confusion while trying to create a format that works on all platforms" and "a lot of poorly designed formats were created back then". More info here: https://gis.stackexchange.com/questions/18969/oddities-in-the-shapefile-technical-specification
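For the parsing itself, reading each field with an explicit byte order sidesteps the host's endianness entirely. A minimal C++ sketch (per the spec quoted above, e.g. the file code at header offset 0 is big endian while the shape type at offset 32 is little endian; error handling omitted):

#include <cstdint>
#include <cstdio>

uint32_t readBE32(FILE* f) {  // big endian: most significant byte first
    uint8_t b[4];
    fread(b, 1, 4, f);
    return (uint32_t(b[0]) << 24) | (uint32_t(b[1]) << 16) |
           (uint32_t(b[2]) << 8)  |  uint32_t(b[3]);
}

uint32_t readLE32(FILE* f) {  // little endian: least significant byte first
    uint8_t b[4];
    fread(b, 1, 4, f);
    return (uint32_t(b[3]) << 24) | (uint32_t(b[2]) << 16) |
           (uint32_t(b[1]) << 8)  |  uint32_t(b[0]);
}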
I have to build a one-time pad system, and for that I have to build my own TRNG. I want to know how to record atmospheric noise and use it to generate random numbers. So far I've tried recording a .wav file and reading it in Java, but the values don't seem very... random. Any suggestions? I know about Random.org, but I can't really use their generators; I have to build my own. What I want is some insight into how the folks at Random.org built their number generator with atmospheric noise as the source of randomness.
Non-real-time solution
What you can do is record the ambient audio in the room beforehand and save it to a temporary WAV file. The WAV format is based on the RIFF specification, so strip the WAV header, which is 44 bytes in length, then read the audio bytes and do the proper conversions depending on whether you want to generate WORDs, DWORDs, or BYTEs; that is up to you. You should then have some random values to work with, which you can use accordingly.
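As an illustration, here is a minimal C++ sketch of the header-stripping step. The 44-byte skip assumes the canonical header layout; real WAV files can carry extra chunks before the data chunk, so a robust reader should walk the RIFF chunk list instead:

#include <cstdint>
#include <cstdio>
#include <vector>

std::vector<uint8_t> readWavSamples(const char* path) {
    std::vector<uint8_t> samples;
    FILE* f = fopen(path, "rb");
    if (!f) return samples;
    fseek(f, 44, SEEK_SET);  // skip the canonical 44-byte RIFF/WAV header
    uint8_t buf[4096];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0)
        samples.insert(samples.end(), buf, buf + n);
    fclose(f);
    return samples;
}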
Real-time solution
I do not know whether you want to program this in Java or some other language, nor do I know the intended platform, so I cannot recommend a specific real-time audio processing library.
For C# you can use NAudio to record the audio in real time and receive the audio bytes, which you can then convert into a DWORD, QWORD, WORD, etc. You should then have some random values. Remember to stop recording and to release unmanaged resources when you have finished generating random numbers.
Good Resources On The WAV File Specification
Link to the specification (Easy to understand)
The answer is unknown, and probably intentionally so. Although it's hard to be sure, the site seems to be a combination of charity and for-profit work. Each radio source only produces a few Kbps of random data. From how he describes it in various places, I don't see evidence of a CSRNG. It doesn't matter: for OTP purposes, if it's not truly random, it's a glorified stream cipher. (I think that's what Bruce and others have always said.)
I find it hard to recall when a good CSRNG was broken. I'd recommend you use something like ISAAC or a properly implemented block/stream cipher. Perfect Paper Passwords does this. Use a Fortuna construction with the internals of Fortuna using the above ciphers/algorithms to produce the majority of the random data. The Fortuna system can regularly have data injected into it by a TRNG. The very best TRNG on a budget is random.org plus locally generated stuff. The best cheap, hardware solution is a VIA Artigo board with VIA Padlock (TRNG + acceleration for SHA-1, SHA256, AES, & RSA) for $300. They have libraries to help you use things, too. (There's even a pseudo-TRNG that uses processor timing under network load.)
Remember, the crypto is usually the strongest link in the chain. System security exists on many levels: processor, firmware, peripheral firmware (esp. DMA), kernel-mode code, OS, trusted middleware or OS functions, application. Security as a whole includes users, policy, physical security, EMSEC, etc. Anyone worrying way too much about RNGs is usually wasting effort. Just use an accepted solution or something I mentioned above. Then focus on the rest, especially how people and systems interact: configuration, patching, choice of OS, policies. Most problems happen there.
I recall an article on random.org that I can't seem to find now. All I remember is that they used the LSB of the noise they were measuring; the MSBs will certainly not be random. They then generated a string of 1s and 0s based on the LSB. Don't do something silly like a simple binary conversion; that won't work. You may have to adjust how you sample the noise so that the distribution of the LSBs is more uniform.
The trick they used to ensure an even distribution was to not use this string of 1s and 0s directly as the random numbers. Instead they parsed the string two bits at a time: every time the bits matched (i.e. 00 or 11) they added a 1 to their random string, and every time the bits flipped (i.e. 01 or 10) they added a 0.
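A C++ sketch of that pair-wise step as I've described it (my own reconstruction). Note that the textbook von Neumann corrector is slightly different: it emits 1 for "10" and 0 for "01" and discards matched pairs, which provably removes bias when the samples are independent:

#include <cstddef>
#include <cstdint>
#include <vector>

// Pair-wise post-processing as described above: 00/11 -> 1, 01/10 -> 0.
std::vector<uint8_t> whitenPairs(const std::vector<uint8_t>& bits) {
    std::vector<uint8_t> out;
    for (std::size_t i = 0; i + 1 < bits.size(); i += 2)
        out.push_back(bits[i] == bits[i + 1] ? 1 : 0);
    return out;
}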
If you make your own TRNG, make sure you verify it!
It is hardly possible to get truly random numbers out of software alone. Even the static in your WAV file is likely to be influenced by periodic EMI generated by your computer, and is therefore not purely random.
Can you use special hardware, or are you forced to stick to pure software? Why won't pseudo-random numbers satisfy your needs? They do fine for a relatively small number of random samples, and since you want to use the random numbers in an OTP, I guess you won't be using them at a large scale.
Can you provide a little more detail?
The atmospheric noise approach to generating random numbers is complex because the atmosphere is filled with non-random signals, all of which pollute the entropy you seek. There is an easier way.
Chances are good that your CPU already contains a true random number generator, assuming you have an Intel Ivy Bridge-based Core/Xeon processor, which became available in April 2012. (The newer Haswell architecture also has this feature.)
Intel's random generator exploits the random effects of thermal noise inside an unstable digital circuit. Thermal noise is just random atomic vibrations, which is pretty much the same underlying physical phenomenon that Random.org uses when it samples atmospheric noise. The sampled random bits go through a sophisticated conditioning and testing process to eliminate pollution from non-random signals. I highly recommend this excellent article on IEEE Spectrum which describes the process in detail.
Intel added a new x86 instruction called RDRAND that allows programs to directly retrieve these random numbers. Although Java does not yet support direct access to RDRAND, it's possible using JNI. This is the approach I took with the drnglib project. For example:
DigitalRandom random = new DigitalRandom();
System.out.println(random.nextInt());
The nextInt() method is implemented as a JNI native call that invokes RDRAND. The performance is pretty good considering the quality of randomness. Using eight threads, I've generated ~760 MB/sec of random data.
True random number generators (TRNGs) usually draw from natural sources, such as seismic signals, non-stationary bio-signals, etc. The two issues faced by these generators are:
1) The data points are non-uniformly distributed.
2) It takes a very long time to generate a large sequence of numbers (especially when the requirement runs into the millions).
However, their most important advantage is their unpredictable nature. To overcome their issues while retaining this advantage, it is better to use the TRNG output to seed a pseudo-random number generator (PRNG). For this, you may try using the amplitude values of atmospheric noise at random time points and use them to seed a PRNG.
This will help you get large numbers of uniformly distributed values. As the seed is unpredictable, the output of the PRNG becomes unpredictable too.
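A minimal C++ sketch of that fusion, assuming the noise amplitudes have already been reduced to 32-bit samples (std::mt19937 stands in for whatever PRNG you prefer):

#include <cstdint>
#include <random>
#include <vector>

std::mt19937 prngSeededFromTrng(const std::vector<uint32_t>& trngSamples) {
    // seed_seq mixes all the entropy from the samples into the engine's state.
    std::seed_seq seq(trngSamples.begin(), trngSamples.end());
    return std::mt19937(seq);  // fast, uniform output from an unpredictable seed
}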
I have been writing an audio editor for the last couple of months, and have been recently thinking about how to implement fast and efficient editing (cut, copy, paste, trim, mute, etc.). There doesn't really seem to be very much information available on this topic, however... I know that Audacity, for example, uses a block file strategy, in which the sample data (and summaries of that data, used for efficient waveform drawing) is stored on disk in fixed-sized chunks. What other strategies might be possible, however? There is quite a lot of info on data-structures for text editing - many text (and hex) editors appear to use the piece-chain method, nicely described here - but could that, or something similar, work for an audio editor?
Many thanks in advance for any thoughts, suggestions, etc.
Chris
The classical problem for editors handling relatively large files is how to cope with deletion and insertion. Text editors obviously face this, as the user typically enters characters one at a time. Audio editors don't usually do "sample by sample" inserts, i.e. the user doesn't interactively enter one sample at a time, but you do have cut-and-paste operations.

I would start with a representation where an audio file is represented by chunks of data stored in a (binary) search tree. Insert works by splitting the chunk at the insertion point into two chunks, adding the inserted chunk as a third one, and updating the tree. To make this efficient and responsive to the user, you should then have a background process that defragments the representation on disk (or in memory) and then makes an atomic update to the tree holding the chunks. This should make inserts and deletes as fast as possible.

Many other audio operations (effects, normalize, mix) operate in place and do not require changes to the data structure, though running e.g. normalize over the whole sample is a good opportunity to defragment it at the same time. If the audio files are large, you can keep the chunks on hard disk as well. The chunks don't need to be fixed size; they can be variable size, preferably 1024 × (power of two) bytes to make file operations efficient, although a fixed-size strategy can be easier to implement.
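To make the chunk idea concrete, here is a minimal C++ piece-chain sketch. The names are mine, and std::list stands in for the balanced search tree described above (a tree keyed on cumulative length makes the position lookup O(log n) instead of the O(n) walk used here):

#include <cstddef>
#include <list>

// A piece references a span of samples in backing storage (file or memory).
struct Piece { std::size_t storageOffset, length; };

// Logical order of the audio is the order of the list.
using Chain = std::list<Piece>;

// Insert `piece` at logical sample position `pos`, splitting the piece that covers pos.
void insertPiece(Chain& chain, std::size_t pos, Piece piece) {
    auto it = chain.begin();
    while (it != chain.end() && pos >= it->length) { pos -= it->length; ++it; }
    if (it != chain.end() && pos > 0) {
        Piece right{it->storageOffset + pos, it->length - pos};  // right half of the split
        it->length = pos;                                        // left half keeps [0, pos)
        ++it;
        it = chain.insert(it, right);
    }
    chain.insert(it, piece);  // new piece lands between the two halves
}

Deletion is symmetric: split at both cut points and unlink the pieces in between; the sample data itself is never moved until the background defragmenter runs.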
For an open source project I have I am writing an abstraction layer on top of the filesystem.
This layer allows me to attach metadata and relationships to each file.
I would like the layer to handle file renames gracefully and maintain the metadata if a file is renamed / moved or copied.
To do this I will need a mechanism for calculating the identity of a file. The obvious solution is to calculate a SHA-1 hash for each file and then assign metadata against that hash. But ... that is really expensive, especially for movies.
So I have been thinking of an algorithm that, though not 100% correct, will be right the vast majority of the time, and is cheap.
One such algorithm could be to use file size and a sample of bytes for that file to calculate the hash.
Which bytes should I choose for the sample? How do I keep the calculation cheap and reasonably accurate? I understand there is a tradeoff here, but performance is critical. And the user will be able to handle situations where the system makes mistakes.
I need this algorithm to work for very large files (1 GB+) and for tiny files (~5 KB).
EDIT
I need this algorithm to work on NTFS and on all SMB shares (Linux- or Windows-based), and I would like it to support situations where a file is copied from one spot to another (two physical copies are treated as one identity). I may even consider wanting this to work in situations where MP3s are re-tagged (the physical file is changed, so I may have an identity provider per file type).
EDIT 2
Related question: Algorithm for determining a file’s identity (Optimisation)
Bucketing with multiple layers of comparison should be the fastest and most scalable approach across the range of files you're discussing.
First level of indexing is just the length of the file.
The second level is a hash. Below a certain size it is a whole-file hash. Beyond that, yes, I agree with your idea of a sampling algorithm. Issues that I think might affect the sampling approach:
To avoid hitting regularly spaced headers, which may be highly similar or identical, step by a non-conforming amount, e.g. multiples of a prime, or successive primes.
Avoid steps that might end up landing on regular record headers; if you are getting the same value from your sample bytes despite different locations, try adjusting the step by another prime.
Cope with anomalous files with large stretches of identical values, either because they are unencoded images or just filled with nulls.
Hash the first 128 KB, another 128 KB at the 1 MB mark, another 128 KB at the 10 MB mark, another at the 100 MB mark, another at the 1000 MB mark, and so on (see the sketch below). As file sizes get larger, and it becomes more likely that you'll be able to distinguish two files based on their size alone, you hash a smaller and smaller fraction of the data. Everything under 128 KB is covered completely.
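A C++ sketch of that scheme. FNV-1a stands in for whatever hash you choose; it is cheap but not collision-resistant against an adversary:

#include <cstdint>
#include <cstdio>
#include <vector>

uint64_t fnv1a(const uint8_t* p, std::size_t n, uint64_t h) {
    for (std::size_t i = 0; i < n; ++i) { h ^= p[i]; h *= 1099511628211ull; }
    return h;
}

uint64_t sampledHash(const char* path) {
    FILE* f = fopen(path, "rb");
    if (!f) return 0;
    std::vector<uint8_t> buf(128 * 1024);
    uint64_t h = 14695981039346656037ull;  // FNV offset basis
    // Hash 128 KB at offsets 0, 1 MB, 10 MB, 100 MB, 1000 MB, ...
    // (for files beyond 2 GB, use fseeko/_fseeki64 and a 64-bit offset type)
    for (long mark = 0; ; mark = (mark == 0) ? 1000000L : mark * 10) {
        if (fseek(f, mark, SEEK_SET) != 0) break;
        std::size_t n = fread(buf.data(), 1, buf.size(), f);
        if (n == 0) break;  // past end of file
        h = fnv1a(buf.data(), n, h);
    }
    fclose(f);
    return h;
}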
Believe it or not, I use the ticks of the last write time of the file. It is as cheap as it gets, and I have yet to see a clash between different files.
If you can drop the Linux share requirement and confine yourself to NTFS, then NTFS Alternate Data Streams will be a perfect solution that:
doesn't require any kind of hashing;
survives renames; and
survives moves (even between different NTFS volumes).
You can read more about it here. Basically you just append a colon and a name for your stream (e.g. ":meta") and write whatever you like to it. So if you have a directory "D:\Movies\Terminator", write your metadata using normal file I/O to "D:\Movies\Terminator:meta". You can do the same if you want to save the metadata for a specific file (as opposed to a whole folder).
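A small C++ sketch using ordinary file I/O (the ":meta" stream name is just the example from above; on NTFS the colon-suffixed path opens the alternate data stream):

#include <fstream>
#include <string>

void writeMeta(const std::string& path, const std::string& metadata) {
    // Opens the ADS "meta" attached to the file or directory at `path`.
    std::ofstream out(path + ":meta", std::ios::binary);
    out << metadata;
}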
If you'd prefer to store your metadata somewhere else and just be able to detect moves/renames on the same NTFS volume, you can use the GetFileInformationByHandle API call (see MSDN /en-us/library/aa364952(VS.85).aspx) to get the unique ID of the folder (combine VolumeSerialNumber and FileIndex members). This ID will not change if the file/folder is moved/renamed on the same volume.
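A sketch of that lookup (Windows-only; the helper name is mine, the API call and struct members are as documented):

#include <windows.h>
#include <cstdint>

// Volume serial number + file index uniquely identify a file on a given NTFS
// volume and survive renames/moves within that volume.
bool getFileId(const wchar_t* path, uint64_t& volumeSerial, uint64_t& fileIndex) {
    HANDLE h = CreateFileW(path, 0,
                           FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
                           nullptr, OPEN_EXISTING,
                           FILE_FLAG_BACKUP_SEMANTICS,  // needed to open directories too
                           nullptr);
    if (h == INVALID_HANDLE_VALUE) return false;
    BY_HANDLE_FILE_INFORMATION info;
    BOOL ok = GetFileInformationByHandle(h, &info);
    CloseHandle(h);
    if (!ok) return false;
    volumeSerial = info.dwVolumeSerialNumber;
    fileIndex = (uint64_t(info.nFileIndexHigh) << 32) | info.nFileIndexLow;
    return true;
}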
How about storing some random integers ri and looking up the bytes at (ri mod n), where n is the size of the file? For files with headers, you can ignore the header first and then run this process on the remaining bytes.
If your files are actually pretty different (not just a difference in a single byte somewhere, but say at least 1% different), then a random selection of bytes would notice that. For example, with a 1% difference in bytes, 100 random bytes would fail to notice with probability 0.99^100 ≈ 1/e ≈ 37%; increasing the number of bytes you look at makes this probability drop exponentially.
The idea behind using random bytes is that they are essentially guaranteed (well, probabilistically speaking) to be as good as any other sequence of bytes, except they aren't susceptible to some of the problems with other sequences (e.g. happening to look at every 256-th byte of a file format where that byte is required to be 0 or something).
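A C++ sketch of that idea. The fixed seed is my own addition so that the same positions are sampled on every run for a given file size; per the advice below, sampling chunks instead of single bytes would amortize seek costs:

#include <cstdint>
#include <cstdio>
#include <random>

uint64_t randomSampleHash(const char* path, long fileSize, int k = 100) {
    if (fileSize <= 0) return 0;
    FILE* f = fopen(path, "rb");
    if (!f) return 0;
    std::mt19937_64 rng(0x5eedULL);  // fixed, arbitrary seed: repeatable positions
    std::uniform_int_distribution<long> dist(0, fileSize - 1);
    uint64_t h = 14695981039346656037ull;  // FNV-1a offset basis
    for (int i = 0; i < k; ++i) {
        fseek(f, dist(rng), SEEK_SET);
        int c = fgetc(f);
        if (c == EOF) continue;
        h ^= uint8_t(c);
        h *= 1099511628211ull;
    }
    fclose(f);
    return h;
}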
Some more advice:
Instead of grabbing bytes, grab larger chunks to justify the cost of seeking.
I would suggest always looking at the first block or so of the file. From this, you can determine filetype and such. (For example, you could use the file program.)
At least weigh the cost/benefit of something like a CRC of the entire file. It's not as expensive as a real cryptographic hash function, but still requires reading the entire file. The upside is it will notice single-byte differences.
Well, first you need to look more deeply into how filesystems work. Which filesystems will you be working with? Most filesystems support things like hard links and soft links and therefore "filename" information is not necessarily stored in the metadata of the file itself.
Actually, this is the whole point of a stackable layered filesystem: you can extend it in various ways, say to support compression or encryption. This is what "vnodes" are all about. You could do this in several ways, some of which are very dependent on the platform you are targeting. It is much simpler on UNIX/Linux systems that use a VFS concept. You could implement your own layer on top of ext3, for instance, or what have you.
After reading your edits, a couple more things. Filesystems already do this, as mentioned before, using things like inodes. Hashing is probably a bad idea, not just because it is expensive, but because two or more preimages can share the same image; that is to say, two entirely different files can have the same hash value. I think what you really want to do is exploit the metadata that the filesystem already exposes. This would be simpler on an open-source system, of course. :)
Which bytes should I choose for the sample?
I think I would try to use a progression like the Fibonacci numbers (note they grow roughly geometrically, not arithmetically). These are easy to calculate, and they have a diminishing density: small files would have a higher sample ratio than big files, and the samples would still cover spots across the whole file.
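A C++ sketch of Fibonacci-offset sampling, folding the sampled bytes into a cheap FNV-1a hash (my own choice of hash):

#include <cstdint>
#include <cstdio>

uint64_t fibonacciSampleHash(const char* path) {
    FILE* f = fopen(path, "rb");
    if (!f) return 0;
    uint64_t h = 14695981039346656037ull;  // FNV-1a offset basis
    long a = 1, b = 2;  // successive Fibonacci offsets: 1, 2, 3, 5, 8, ...
    int c;
    // For multi-GB files, use a 64-bit offset type and fseeko/_fseeki64.
    while (fseek(f, a, SEEK_SET) == 0 && (c = fgetc(f)) != EOF) {
        h ^= uint8_t(c);
        h *= 1099511628211ull;
        long next = a + b; a = b; b = next;
    }
    fclose(f);
    return h;
}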
This work sounds like it could be more effectively implemented at the filesystem level or with some loose approximation of a version control system (both?).
To address the original question, you could keep a database of (file size, bytes hashed, hash) for each file and try to minimize the number of bytes hashed for each file size. Whenever you detect a collision you either have an identical file, or you increase the hash length to go just past the first difference.
There are undoubtedly optimizations to be made, and CPU vs. I/O tradeoffs as well, but it's a good start for something that won't have false positives.