Algorithm used by pluploadQueue to "randomize" names

Does anybody know what algorithm the Plupload widget uses when pluploadQueue.rename = TRUE?
I've uploaded 77 test files and all of them have a file name starting with o_18pm0noe (the first ten characters are identical). Only the rest of the file name differs.
I wonder why such a pattern is used, and whether we can somehow control it so that the entire file name is randomized (in any "natural" way, that is, not by changing the Plupload source code)?

Does anybody know what algorithm the Plupload widget uses when pluploadQueue.rename = TRUE?
According to the sources, it is a combination of the current time and some random numbers (both in base 32).
Can we somehow control it so that the entire file name is randomized (in any "natural" way, that is, not by changing the Plupload source code)?
You could try setting the file name to a custom randomized name. See _enableRenaming for an example of file renaming (file.name = nameInput.val() + ext;).
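To illustrate why names built from "current time plus random numbers in base 32" share a long prefix, here is a minimal Python sketch. It is not Plupload's code: the function names, the number of random parts, and their range are assumptions made for illustration; only the "o_" prefix is taken from the file names quoted in the question. The last function shows the kind of fully random replacement name the answer hints at.

```python
import os
import random
import secrets

ALPHABET = "0123456789abcdefghijklmnopqrstuv"  # the 32 digits of base 32


def to_base32(n: int) -> str:
    """Encode a non-negative integer with digits 0-9 and letters a-v."""
    if n == 0:
        return "0"
    digits = []
    while n:
        n, rem = divmod(n, 32)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def time_based_id(now_ms: int, rng: random.Random) -> str:
    """Hypothetical scheme: 'o_' + base-32 timestamp + three base-32 random parts."""
    random_part = "".join(to_base32(rng.randrange(65535)) for _ in range(3))
    return "o_" + to_base32(now_ms) + random_part


def fully_random_name(original_name: str) -> str:
    """Replace the base name with 32 random hex characters, keeping the extension."""
    _, ext = os.path.splitext(original_name)
    return secrets.token_hex(16) + ext
```

Because the leading digits of a millisecond timestamp change slowly, every ID generated in one upload session starts with the same characters; only the tail varies.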

Related

How to prevent specific line/s inside a notepad file from being edited?

I have a file that can be opened in Notepad.
Basically, this file is created by a piece of software, and that software uses the values inside the file to run. You can edit the values inside the file using the software.
I want specific lines (values) to be protected from editing, because I am enforcing strict values inside that file that no one except me should be able to change.
Is there any clever way to prevent specific lines inside that file from being edited?
I tried the basic approach of changing the read/write permissions on the file, but then I can't change ANY values inside it, which is undesirable.
Note: I have little to no experience with Python, C++, or Java, but any suggestion will give me an idea to learn from.
Edit:
Here's an example inside the file:
[Type Data]
Comment=Standard Dispense
[Shared_A]
802=1
807=750
11=0
12=0
.
What I want is to protect the value of entry "807", which is set to 750.
I want this number 750 not to be editable even from the software, so that other people cannot mess it up. I want to set this value as the standard value.
Is there anything you can write inside that file so that the value cannot be edited from the software, and can only be changed when I open the file and edit it myself?
I work for a production/manufacturing company that uses this software for dispensing.

A text file is simply a sequence of bytes that represent code units to encode code points in any given character set. Every byte value is a potentially legal character encoding, leaving no values to encode additional semantics (like guard regions).
With that it should be obvious that there is nothing you can do to partially limit editing of a file using a standard text editor. Whatever problem you are trying to solve, this is not a solution. Next time around you might want to ask about the problem you are trying to solve rather than your proposed solution.

Rules for file extensions?

Are there any rules for file extensions? For example, I wrote some code which reads and writes a byte pattern that is only understood by that specific program. I'm assuming my antivirus program won't be too happy if I give it the name "pleasetrustme.exe"... Is it generally allowed to use those extensions? And what about the lesser-known ones, like ".arw"?

You can use any file extension you want (or none at all). Using standard extensions that reflect the actual type of the file just makes things more convenient. On Windows, file extensions control stuff like how the files are displayed in Windows Explorer and what happens when you double click on it.

I wrote some code which reads and writes a byte pattern that is only
understood by that specific program.
A file extension is only an indication of what type of data will be inside, never a guarantee that certain data formatted in a specific way will be inside the file.
For your own specific data structure it is of course always best to choose an extension that is not already in use for other file formats (or maybe use a general extension like .dat or .bin). This also has the advantage that you can use your own icon without it being overwritten by other software that uses the same extension, or the other way around.
But maybe even more important when creating a custom (binary?) file format, is to provide a magic number as the first bytes of that file, maybe followed by a file header structure containing a version number etc. That way your own software can first check the header data to make sure it's the right type and version (for example: anyone could rename any file type to your extension, so your program needs to have a way to do some checks inside the file before reading the remaining data).
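A minimal Python sketch of the magic-number-plus-header idea follows. The magic value MYF1, the header layout, and the function names are all hypothetical choices for illustration, not an established format.

```python
import struct

MAGIC = b"MYF1"                   # hypothetical 4-byte magic number
VERSION = 1                       # highest format version this reader understands
HEADER = struct.Struct("<4sHI")   # magic, version (uint16), payload length (uint32)


def pack_file(payload: bytes) -> bytes:
    """Prefix the payload with the magic number and a small header."""
    return HEADER.pack(MAGIC, VERSION, len(payload)) + payload


def unpack_file(blob: bytes) -> bytes:
    """Validate the header before trusting the rest of the file."""
    if len(blob) < HEADER.size:
        raise ValueError("file too short to contain a header")
    magic, version, length = HEADER.unpack_from(blob)
    if magic != MAGIC:
        raise ValueError("bad magic number: not one of our files")
    if version > VERSION:
        raise ValueError(f"unsupported format version {version}")
    payload = blob[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("payload is truncated")
    return payload
```

A file that was merely renamed to your extension fails the magic-number check immediately, before any of its contents are interpreted.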

What is the best way to edit the middle of an existing flat file?

I have a tool that creates variables for a simulation. The current workflow involves copying those variables by hand into the simulation input file. The input file is a standard flat file, i.e. not binary or XML. I would like to automate the addition of the variables to the flat input file.
The variables copy over existing variables in the file, e.g.
New Variables:
Length 10
Height 20
Depth 30
Old Variables:
...
Weight 100
Age 20
Length 10
Height 20
Depth 30
...
I would like the new variables to overwrite the old ones, which sit about 200 lines into the flat input file.
Thanks for any insights.
P.S. This is on Windows.

If you're stuck using flat files, then you're stuck with the old-fashioned way of updating them: read from the original and write to a temp file, either copying each original row unchanged or changing the data before writing it. To add data, write it to the temp file at the appropriate point; to delete data, simply don't copy it from the original file.
Finally, close both files and rename the temp file to the original file name.
Alternatively, it might be time to think about a little database.
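The read, rewrite, and rename steps could be sketched in Python as follows. The function name and the assumption that each line has the form "Name value" (as in the question's example) are mine.

```python
import os
import tempfile
from typing import Mapping


def update_variables(path: str, new_values: Mapping[str, str]) -> None:
    """Rewrite `path`, replacing the value on every 'Name value' line whose name is in new_values."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the original so the final rename stays on one volume.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", newline="") as dst, open(path, "r", newline="") as src:
            for line in src:
                parts = line.split(None, 1)
                if parts and parts[0] in new_values:
                    ending = line[len(line.rstrip("\r\n")):]  # keep the original line ending
                    dst.write(f"{parts[0]} {new_values[parts[0]]}{ending}")
                else:
                    dst.write(line)  # copy untouched lines verbatim
        os.replace(tmp_path, path)  # swap the temp file in under the original name
    except BaseException:
        os.unlink(tmp_path)
        raise
```

For the example in the question, the call would be update_variables("input.txt", {"Length": "10", "Height": "20", "Depth": "30"}), where input.txt stands in for the real simulation input file.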

For something like this I'd be looking at a simple template engine. You'd have a base template with predefined marker tokens instead of variable values and then just pass the values required to your engine along with the template and it will spit out the resultant file, all present and correct. There are a number of Open Source template engines available in Java that would meet your needs, I imagine such things are also available in your language of choice. You could even roll your own without too much difficulty.
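As a rough illustration of the template approach, Python's standard library ships a very small template engine in string.Template. The template text and placeholder names below are made up to mirror the question's example.

```python
from string import Template

# Hypothetical base template: fixed lines stay as-is, variable values become $markers.
TEMPLATE_TEXT = """\
Weight 100
Age 20
Length $length
Height $height
Depth $depth
"""


def render_input(template_text: str, values: dict) -> str:
    """Fill in every marker; substitute() raises KeyError if a value is missing."""
    return Template(template_text).substitute(values)
```

Calling render_input(TEMPLATE_TEXT, {"length": 10, "height": 20, "depth": 30}) returns the complete input file text, which can then be written out in one go.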

Note that under Unix, one would probably look at using mmap(), because you can then use functions such as memmove() to move the data around and make room for new data, and truncate() to cut the file down if the result is smaller (you may also want to use truncate() to grow the file).
Under MS-Windows, you have the MapViewOfFileEx() function to do the same thing. The API is different, though, and I'm not exactly sure how to grow or shrink the file (MSDN also includes a truncate()-like function and maybe that works).
Of course, it's important to use memmove() rather than memcpy() when the source and destination regions overlap, and to stay inside the mapped buffer so that you don't overwrite the wrong data.
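The same idea can be tried from Python, whose mmap module wraps the platform mapping calls. This is a sketch under the assumption that the file fits comfortably in the address space; insert_bytes is a hypothetical helper name.

```python
import mmap


def insert_bytes(path: str, offset: int, data: bytes) -> None:
    """Insert `data` at `offset` by growing the file and shifting the tail in place."""
    if not data:
        return
    with open(path, "r+b") as f:
        f.seek(0, 2)                      # seek to the end to learn the current size
        old_size = f.tell()
        if not 0 <= offset <= old_size:
            raise ValueError("offset lies outside the file")
        f.truncate(old_size + len(data))  # grow first: a mapping cannot exceed the file size
        with mmap.mmap(f.fileno(), 0) as mm:
            tail = old_size - offset
            if tail:
                # move() behaves like memmove(), so overlapping regions are safe
                mm.move(offset + len(data), offset, tail)
            mm[offset:offset + len(data)] = data
            mm.flush()
```

Deleting a range works the other way around: move the tail down over the unwanted bytes, close the mapping, then truncate the file to the smaller size.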

How to generate a unique hash for a URL?

Given these two images from twitter.
http://a3.twimg.com/profile_images/130500759/lowres_profilepic.jpg
http://a1.twimg.com/profile_images/58079916/lowres_profilepic.jpg
I want to download them to the local filesystem and store them in a single directory.
How should I avoid name conflicts?
In the example above, I cannot store both of them as lowres_profilepic.jpg.
My design idea is to treat the URLs as opaque strings except for the last segment.
What algorithms (implemented as f) can I use to hash the prefixes into unique strings?
f( "http://a3.twimg.com/profile_images/130500759/" ) = 6tgjsdjfjdhgf
f( "http://a1.twimg.com/profile_images/58079916/" ) = iuhd87ysdfhdk
That way, I can save the files as:-
6tgjsdjfjdhgf_lowres_profilepic.jpg
iuhd87ysdfhdk_lowres_profilepic.jpg
I don't want a cryptographic algorithm, as this needs to be a performant operation.

Irrespective of how you do it (hashing, encoding, database lookup), I recommend that you don't try to map a huge number of URLs to files in one big flat directory.
The reason is that file lookup for most file systems involves a linear scan through the filenames in a directory. So if all N of your files are in one directory, a lookup will involve N/2 comparisons on average, i.e. O(N). (Note that ReiserFS organizes the names in a directory as a B-tree. However, ReiserFS seems to be the exception rather than the rule.)
Instead of one big flat directory, it would be better to map the URIs to a tree of directories. Depending on the shape of the tree, lookup can be as good as O(logN). For example, if you organized the tree so that it had 3 levels of directory with at most 100 entries in each directory, you could accommodate 1 million URLs. If you designed the mapping to use 2 character filenames, each directory should easily fit into a single disk block, and a pathname lookup (assuming that the required directories are already cached) should take a few microseconds.
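One possible way to spread files over such a tree is to derive the directory names from a hash of the URL. The sketch below is an assumption of mine about how the mapping could look: it uses SHA-1 purely as a convenient way to get evenly distributed numbers, two levels of two-digit directory names (100 entries each), and the full digest as the file name.

```python
import hashlib
import posixpath
from urllib.parse import urlsplit


def tree_path(url: str, fanout: int = 100) -> str:
    """Map a URL to 'NN/NN/<digest><ext>' so that no directory grows very large."""
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    n = int(digest, 16)
    level1 = n % fanout               # first directory level, 00..99
    level2 = (n // fanout) % fanout   # second directory level, 00..99
    ext = posixpath.splitext(urlsplit(url).path)[1]
    return f"{level1:02d}/{level2:02d}/{digest}{ext}"
```

With 100 files per leaf directory this layout holds about a million files, matching the estimate above; the same URL always maps to the same path, so no lookup table is needed.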

It seems what you really want is to have a legal filename that won't collide with others.
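Any reversible encoding of the URL will work, even base64: e.g. filename = base64(url). Use the URL-safe variant, because standard base64 output can contain "/", which is not allowed in file names.
A crypto hash will also give you what you want. Although you claim this will be a performance bottleneck, don't be sure until you've benchmarked it.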
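A minimal sketch of the encoding approach in Python, using the URL-safe base64 alphabet so the result contains no "/" or "+". The function names are illustrative.

```python
import base64


def url_to_filename(url: str) -> str:
    """Encode the whole URL into a string that is safe to use as a file name."""
    return base64.urlsafe_b64encode(url.encode("utf-8")).decode("ascii")


def filename_to_url(name: str) -> str:
    """Recover the original URL from the encoded file name."""
    return base64.urlsafe_b64decode(name.encode("ascii")).decode("utf-8")
```

Since the encoding is reversible, two different URLs can never produce the same name. The price is length: the name is about a third longer than the URL, and many file systems cap names at around 255 bytes.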

A very simple approach:
f( "http://a3.twimg.com/profile_images/130500759/" ) = a3_130500759.jpg
f( "http://a1.twimg.com/profile_images/58079916/" ) = a1_58079916.jpg
As the other parts of this URL are constant, you can combine the subdomain and the last directory of the path into a unique filename.
I don't see what could go wrong with this solution.
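In Python, that mapping could be sketched like this, assuming every URL has the same host/profile_images/id/filename shape as the two examples; simple_name is a made-up function name.

```python
import posixpath
from urllib.parse import urlsplit


def simple_name(url: str) -> str:
    """Build '<subdomain>_<id><ext>' from a URL shaped like the examples above."""
    parts = urlsplit(url)
    subdomain = parts.hostname.split(".")[0]          # e.g. "a3"
    directory, filename = posixpath.split(parts.path)
    image_id = posixpath.basename(directory)          # e.g. "130500759"
    ext = posixpath.splitext(filename)[1]             # e.g. ".jpg"
    return f"{subdomain}_{image_id}{ext}"
```

For the first example URL this returns a3_130500759.jpg.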

The nature of a hash is that it may result in collisions. How about one of these alternatives:
Use a directory tree. Literally create subdirectories for each component of the URL.
Generate a unique ID. The problem here is how to keep the mapping between the real name and the saved ID. You could use a database that maps each URL to a generated unique ID: simply insert a record into a table that generates unique IDs, and then use that ID as the filename.
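The database variant can be sketched with SQLite, which is bundled with Python. The table and column names are hypothetical, and an in-memory database is used only to keep the example self-contained.

```python
import sqlite3


def open_store(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the table that maps each URL to a generated ID."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "url TEXT NOT NULL UNIQUE)"
    )
    return conn


def filename_for(conn: sqlite3.Connection, url: str, ext: str = ".jpg") -> str:
    """Return the file name for a URL, allocating a new ID on first sight."""
    with conn:  # commit the insert (or roll back on error)
        conn.execute("INSERT OR IGNORE INTO files (url) VALUES (?)", (url,))
    row = conn.execute("SELECT id FROM files WHERE url = ?", (url,)).fetchone()
    return f"{row[0]}{ext}"
```

The UNIQUE constraint on url means a repeated URL reuses its existing ID instead of getting a new one, so the table doubles as the URL-to-file mapping.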

One of the key concepts of a URL is that it is unique. Why not use it?
Every algorithm that shortens the information can produce collisions. They may be unlikely, but they are possible nevertheless.

While CRC32 produces at most 2^32 distinct values regardless of your input, and so cannot rule out conflicts, it is still a viable option for this scenario.
It is fast, so if you generate a filename that conflicts, just add or change a character in your URL and simply recalculate the CRC.
With 4.3 billion possible checksums, the likelihood of a filename conflict, when the checksum is combined with the original filename, is so low as to be unimportant in normal situations.
I've used this approach myself for something similar and was pleased with the performance.
See Fast CRC32 in Software.
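A sketch of this approach in Python, using zlib.crc32 from the standard library. The retry scheme (appending "#1", "#2", ... to the URL before re-hashing) and the in-memory dictionary of used names are my assumptions about how "add/change a character and re-calc" could be realized.

```python
import zlib


def crc_name(url: str, filename: str, taken: dict) -> str:
    """Return '<crc32 hex>_<filename>', re-hashing with a salt if the name is already used.

    `taken` maps each issued name to the URL it was issued for.
    """
    salt = 0
    while True:
        key = url if salt == 0 else f"{url}#{salt}"
        checksum = zlib.crc32(key.encode("utf-8")) & 0xFFFFFFFF
        candidate = f"{checksum:08x}_{filename}"
        owner = taken.get(candidate)
        if owner is None:
            taken[candidate] = url   # first use of this name
            return candidate
        if owner == url:
            return candidate         # same URL seen before: reuse its name
        salt += 1                    # genuine conflict: change the input and retry
```

In a real downloader the `taken` mapping would have to be persisted, or rebuilt from the files already on disk.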

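You can use the UUID class in Java to derive a name-based UUID from any sequence of bytes. The same input always yields the same UUID, different inputs are very unlikely to collide, and you won't have a problem with file lookup.
import java.nio.charset.StandardCharsets;
import java.util.UUID;
String url = "http://www.google.com";
String shortUrl = UUID.nameUUIDFromBytes(url.getBytes(StandardCharsets.UTF_8)).toString();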

I see your question is about the best hash algorithm for this purpose. You might want to check the question "Best hashing algorithm in terms of hash collisions and performance for strings".

The git content management system is based on SHA1 because it has a very small chance of collision.
If it is good enough for git, it will be good enough for you too.

I'm playing with thumbalizr using a modified version of their caching script, and it has a few good solutions, I think. The code is on github.com/mptre/thumbalizr, but the short version is that it uses md5 to build the file names, and it takes the first two characters of the filename and uses them as the name of a folder to store the file in. This means that it is easy to break the folders up, and fast to find the corresponding folder without a database. Kind of blew my mind with its simplicity.
It generates file names like this
http://pappmaskin.no/opensource/delicious_snapcasa/mptre-thumbalizr/cache/fc/fcc3a328e0f4c1b51bf5e13747614e7a_1280_1024_8_90_250.png
The last part, _1280_1024_8_90_250, matches the different settings that the script uses when talking to the thumbalizr API, but I guess fcc3a328e0f4c1b51bf5e13747614e7a is a straight md5 of the URL, in this case for thumbalizr.com.
I tried changing the config to generate images 200px wide, and that image goes in the same folder, but instead of _250.png it is called _200.png.
I haven't had time to dig that much in the code, but I'm sure it could be pulled apart from the thumbalizr logic and made more generic.

You said:
I don't want a cryptographic algorithm, as this needs to be a performant operation.
Well, I understand your need for speed, but I think you need to consider the drawbacks of your approach. If you just need to create a hash for URLs, you should stick with an existing algorithm rather than write a new one, where you'll need to deal with collisions yourself, for instance.
You could also keep a Dictionary<string, string> as a cache for your URLs. When you get a new address, you first do a lookup in that dictionary and, if there is no match, hash the address and store the result for future use.
Following this line, you could give MD5 a try:
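using System;
using System.Security.Cryptography;
using System.Text;
class Program
{
    public static void Main(string[] args)
    {
        foreach (string url in new string[]{
            "http://a3.twimg.com/profile_images/130500759/lowres_profilepic.jpg",
            "http://a1.twimg.com/profile_images/58079916/lowres_profilepic.jpg" })
        {
            Console.WriteLine(HashIt(url));
        }
    }
    // Hashes the URL prefix (everything up to and including the last "/") with MD5.
    private static string HashIt(string url)
    {
        // Resolving "." against the URL drops the final segment (the file name).
        Uri path = new Uri(new Uri(url), ".");
        using (MD5 md5 = MD5.Create())
        {
            byte[] data = md5.ComputeHash(
                Encoding.ASCII.GetBytes(path.OriginalString));
            // Base64 text may contain "/" or "+"; replace them before using it in a file name.
            return Convert.ToBase64String(data);
        }
    }
}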
You'll get:
rEoztCAXVyy0AP/6H7w3TQ==
0idVyXLs6sCP/XLBXwtCXA==

It appears that the numerical part of a twimg.com URL is already a unique value for each image. My research indicates that the number is sequential (i.e. the example URL below is for the 433,484,366th profile image ever uploaded, which just happens to be mine). Thus, this number is unique. My solution would be to simply use the numerical part of the URL as the "hash value", with no fear of ever finding a non-unique value.
URL: http://a2.twimg.com/profile_images/433484366/terrorbite-industries-256.png
Filename: 433484366.terrorbite-industries-256.png
Unique ID: 433484366
I already use this system in a Python script that displays notifications for new tweets; as part of its operation it caches profile image thumbnails to reduce unnecessary downloads.
P.S. It makes no difference which subdomain the image is downloaded from; all images are available from all subdomains.

What are the best practices for building multi-lingual applications on win32?

I have to build a GUI application on Windows Mobile, and would like the user to be able to choose the language she wants, or the application to choose the language automatically. I am considering using multiple DLLs containing just the required resources.
1) What is the preferred (default?) way to get the application to choose the proper resource language automatically, without user intervention? Any samples?
2) What are my options for letting the user or the application control which language it displays?
3) If possible, how do I create a DLL that contains multiple language resources and then dynamically choose the language?

For #1, you can use the GetSystemDefaultLangID function to get the language identifier for the machine.
For #2, you could list languages you support and when the user selects one, write the selection into a text file or registry (is there a registry on Windows Mobile?). On startup, use the function in #1 only if there is no selection in the file or registry.
For #3, the way we do it is to have one resource DLL per language, each of which contains the same resource IDs. Once you figure out the language, load the DLL for that language and the rest just works.

Re 1: The previous GetSystemDefaultLangID suggestion is a good one.
Re 2: You can ask as a first step in your installation. Or you can package different installers for each language.
Re 3:
In theory the DLL method mentioned above sounds great, however in practice it didn't work very well at all for me personally.
A better method is to surround all of the strings in your program with either: Localize or NoLocalize.
MessageBox(Localize("Hello"), Localize("Title"), MB_OK);
RegOpenKey(NoLocalize("\\SOFTWARE\\RegKey"), ...);
Localize is just a function that converts your English text to the selected language. NoLocalize does nothing.
You want to surround your strings with these calls, though, because they let you build a couple of useful scripts in your scripting language of choice.
1) A script that searches for all the Localize(" prefixes and outputs a .ini file with english=otherlanguage name-value pairs. If the output .ini file already contains a mapping, the script doesn't add it again. The .ini file is never re-created from scratch; the script just adds the missing entries each time you run it.
2) A script that searches all the strings and makes sure they are surrounded by either Localize(" or NoLocalize(". If not, it tells you which strings you still need to localize.
The reason #2 is important is that you need to make sure all of your strings are consciously marked as needing localization or not. Otherwise it is practically impossible to make sure you have proper localization.
The reason for #1 instead of loading from a DLL is that it takes no work to maintain this solution and you can add new strings that need to be translated on the fly.
You ship the generated .ini files with your program. You also give these .ini files to your translators so they can fill in the english=otherlanguage pairs. When they send a file back to you, you simply replace your checked-in .ini file with the one from your translator. Running the script from #1 again will re-add any strings that were introduced while the translator was working.
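Script #1 could look roughly like the Python sketch below. The regular expression, the function names, and the choice to leave the translation empty for new entries are assumptions on my part; strings that themselves contain "=" would need extra handling that is not shown.

```python
import re

# Match Localize("...") but not NoLocalize("..."); escaped quotes inside the string are allowed.
LOCALIZE_RE = re.compile(r'(?<![A-Za-z0-9_])Localize\("((?:[^"\\]|\\.)*)"')


def extract_strings(source: str) -> list:
    """Return every string literal that is wrapped in Localize("...")."""
    return [match.group(1) for match in LOCALIZE_RE.finditer(source)]


def merge_into_ini(existing_lines: list, strings: list) -> list:
    """Append an 'english=' line for each string that has no entry yet; keep existing lines."""
    known = {line.split("=", 1)[0] for line in existing_lines if "=" in line}
    merged = list(existing_lines)
    for text in strings:
        if text not in known:
            merged.append(f"{text}=")   # the translator fills in the right-hand side
            known.add(text)
    return merged
```

Because existing lines are copied through untouched, translations that have already come back from the translator survive every re-run.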
