One annoying thing about encoded packages is that they have to be in a separate file. If we want to distribute a simple self-contained app (encoded), we need to supply two files: the app "interface" and the app package.
If I place all the content of the encoded file inside a string and transform that string into an InputStream, I'm halfway to viewing that package content as a file.
But Get, which to my knowledge is the only operation (also used by Needs) that has the decoding function, doesn't work on streams; it only works on real files.
Can someone figure out a way to Get a Stream?
I'm waiting for Mathematica to arrive on my iPhone, so I couldn't test anything, but why don't you write the string to a temporary file and Get that?
Update
Here's how to do it:
encoded = ToFileName[$TemporaryDirectory, "encoded"];
Export[encoded, "code string", "Text"]; (*export encrypted code to temp file *)
It's important to copy the contents of the code string from the ASCII file containing the encoded code (made earlier with Encode) using an ASCII editor, and paste it between existing empty quotes (""). Mathematica will then automatically escape any backslashes and quotes that may be in the code. I can't do that here in the sample code, as SO's Markdown messes with the string.
Get[encoded] (* get encrypted code and decode *)
DeleteFile[encoded] (* Remove temp file *)
Final Answer
Get doesn't appear to be necessary for decoding; ImportString works as well:
ImportString["code string", "NB"]
As above, paste your encoded text from an ASCII editor straight between the "" and let MMA do the escaping.
I don't know of a way to Get a Stream, but you could store the encoded data in your single package, write it out to a temp file, then read the temp file back in with Get.
Just to keep things up to date:
Get works with streams since V9.0.
Related
I'm trying to write a short program (short enough that it has a simple main function). First, I list the dependency in the Cargo.toml file:
[dependencies]
passwords = {version = "3.1.3", features = ["crypto"]}
Then when I use the crate in main.rs:
extern crate passwords;
use passwords::hasher;
fn main() {
    let args: Vec<String> = std::env::args().collect();
    if args.len() < 2
    {
        println!("Error! Needed second argument to demonstrate BCrypt Hash!");
        return;
    }
    let password = args.get(1).expect("Expected second argument to exist!").trim();
    let hash_res = hasher::bcrypt(10, "This_is_salt", password);
    match hash_res
    {
        Err(_) => {println!("Failed to generate a hash!");},
        Ok(hash) => {
            let str_hash = String::from_utf8_lossy(&hash);
            println!("Hash generated from password {} is {}", password, str_hash);
        }
    }
}
The issue arises when I run the following command:
$ target/debug/extern_crate.exe trooper1
And this becomes the output:
?sC�M����k��ed from password trooper1 is ���Ka .+:�
However, this input:
$ target/debug/extern_crate.exe trooper3
produces this:
Hash generated from password trooper3 is ��;��l�ʙ�Y1�>R��G�Ѡd
I'm pretty content with the second output, but is there something within UTF-8 that could cause the "Hash generat" portion of the output statement to be overwritten? And is there code I could use to prevent this?
Note: Code was developed in Visual Studio Code in Windows 10, and was compiled and run using an embedded Git Bash Terminal.
P.S.: I looked at similar questions such as "Rust println! problem - weird behavior inside the println macro" and "Why does my string not match when reading user input from stdin?", but those issues seem to be about newlines and I don't think that's the problem here.
To complement the previous answer: the answer to your question "is there something within UTF-8 that could cause the 'Hash generat' portion of the output statement to be overwritten?" is:
let str_hash = String::from_utf8_lossy(&hash);
The reason is in the name: from_utf8_lossy is lossy. UTF-8 is a pretty prescriptive format. You can use this function to "decode" data which isn't actually UTF-8 (for whatever reason), but the way it does this decoding is to:
replace any invalid UTF-8 sequences with U+FFFD REPLACEMENT CHARACTER, which looks like this: �
And that is exactly what the odd characters you see are: byte sequences which cannot be decoded as UTF-8 and so are replaced by the "replacement character".
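A tiny self-contained sketch of that behaviour (standard library only, nothing specific to the passwords crate):
fn main() {
    // 0xFF can never occur in valid UTF-8, so the lossy conversion
    // substitutes U+FFFD REPLACEMENT CHARACTER for it.
    let bytes = [b'H', b'i', 0xFF, b'!'];
    let s = String::from_utf8_lossy(&bytes);
    println!("{}", s); // prints: Hi�!
}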
This happens because hash functions generally return random-looking binary data, meaning bytes across the full range (0 to 255) with no structure. UTF-8 is structured and absolutely does not allow such arbitrary data, so while it's possible for a hash to be valid UTF-8 (though that's not very useful), the odds are very, very low.
That's why hashes (and binary data in general) are usually displayed in alternative representations, e.g. hex, base32, or base64.
You could convert the hash to hex before printing it to prevent this.
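A minimal sketch of that idea (the to_hex helper name is just illustrative, not something provided by the passwords crate):
fn to_hex(bytes: &[u8]) -> String {
    // Format each byte as two lowercase hex digits and concatenate.
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}
Then in the Ok match arm you would print to_hex(&hash) instead of the lossy string:
println!("Hash generated from password {} is {}", password, to_hex(&hash));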
Neither of the other answers so far has covered what caused the Hash generated part of the output to get overwritten.
Presumably you were running your program in a terminal. Terminals support various "terminal control codes" that give the terminal information such as which formatting they should use to output the text they're showing, and where the text should be output on the screen. These codes are made out of characters, just like strings are, and Unicode and UTF-8 are capable of representing the characters in question – the only difference from "regular" text is that the codes start with a "control character" rather than a more normal sort of character, but control characters have UTF-8 encodings of their own. So if you try to print some randomly generated UTF-8, there's a chance that you'll print something that causes the terminal to do something weird.
There's more than one terminal control code that could produce this particular output, but the most likely possibility is that the hash contained the byte b'\x0D', which UTF-8 decodes as the Unicode character U+000D. This is the terminal control code "CR", which means "print subsequent output at the start of the current line, overwriting anything currently there". (I use this one fairly frequently for printing progress bars, getting the new version of the progress bar to overwrite the old version of the progress bar.) The output that you posted is consistent with accidentally outputting CR, because some random Unicode full of replacement characters ended up overwriting the start of the line you were outputting – and because the code in question is only one byte long (most terminal control codes are much longer), the odds that it might appear in randomly generated UTF-8 are fairly high.
The easiest way to prevent this sort of thing happening when outputting arbitrary UTF-8 in Rust is to use the Debug implementation for str/String rather than the Display implementation – it will output control codes in escaped form rather than outputting them literally. (As the other answers say, though, in the case of hashes, it's usual to print them as hex rather than trying to interpret them as UTF-8, as they're likely to contain many byte sequences that aren't valid UTF-8.)
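A rough, self-contained illustration of the difference (not the original program):
fn main() {
    // A string containing a carriage return (CR, 0x0D), the control code discussed above.
    let s = "Hash generated\rOVERWRITTEN";
    // Display hands the CR to the terminal, so the start of the line gets overwritten.
    println!("{}", s);
    // Debug escapes control characters instead of emitting them literally.
    println!("{:?}", s); // prints: "Hash generated\rOVERWRITTEN"
}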
I was given a file that seems to be encoded in UTF-8, but every byte that should start with 1 starts with 0.
E.g., in place of the expected Polish letter 'ę', encoded in UTF-8 as \o304\o231, there is \o104\o031. Or, in binary, there is 01000100:00011001 instead of 11000100:10011001.
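Just to make the bit pattern concrete, here is a small check (Rust, purely for illustration) showing that clearing the top bit of the expected bytes yields exactly the bytes found in the file:
fn main() {
    // Expected UTF-8 encoding of 'ę' (U+0119): 0xC4 0x99 (octal 304 231).
    let expected = [0xC4u8, 0x99];
    // Bytes actually present in the file: 0x44 0x19 (octal 104 031).
    let observed = [0x44u8, 0x19];
    // Clearing the most significant bit of each expected byte reproduces the observed bytes.
    let masked: Vec<u8> = expected.iter().map(|&b| b & 0x7F).collect();
    assert_eq!(masked, observed);
}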
I assume that this was not done on purpose by an evil file creator who enjoys my headache, but rather is the result of some erroneous operation performed on a correct UTF-8 file.
The question is: what "reasonable" operations could be the cause? I have no idea how the file was created; probably it was exported by some unknown software, then it could have been compressed, uploaded, copied and pasted, converted to another encoding, etc.
I'll be grateful for any idea : )
I'm developing a piece of software that stores its data in a binary file format. However, as a courtesy to innocent shell users who might cat such a file to inspect its contents, I'm thinking of having an ASCII-compatible "magic string" at the start of the file that tells the name and the version of the binary format.
I'm thinking of having at least ten lines (\n) in the message so that head with default settings doesn't hit the binary part.
Now, I wonder if there is any control character or escape code that would hint to the shell that the following content isn't interpretable as printable text, and should be just ignored? I tried 0x00 (the null byte) and 0x04 (ctrl-D) but they seem to be just ignored when catting the file.
cat regards a file as text. There is no way to trigger an end-of-file, since EOF is not actually a character.
The other way around works, of course: specifying a format that only starts reading the binary part from a certain marker onward.
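As a rough sketch of that kind of layout (Rust, with a made-up format name and sentinel, purely for illustration):
use std::fs::File;
use std::io::Write;

fn main() -> std::io::Result<()> {
    let mut f = File::create("example.bin")?;
    // Human-readable banner first, so cat/head show something sensible.
    f.write_all(b"MYFORMAT version 1\n")?;
    f.write_all(b"This is a binary file; the readable part ends below.\n")?;
    // Padding lines so `head` with default settings stays clear of the binary part.
    f.write_all(&b"\n".repeat(10))?;
    // A sentinel line; a reader skips everything up to and including it.
    f.write_all(b"--- BINARY DATA FOLLOWS ---\n")?;
    // The actual binary payload.
    f.write_all(&[0x00, 0xFF, 0x10, 0x80])?;
    Ok(())
}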
I am looking for a way to check if a PDF is missing an end-of-file character. So far I have found I can use the pdf-reader gem and catch the MalformedPDFError exception, or of course I could simply open the whole file and check whether the last character was an EOF. I need to process lots of potentially large PDFs and I want to load as little memory as possible.
Note: all the files I want to detect will be lacking the EOF marker, so I feel like this is a slightly more specific scenario than detecting general PDF "corruption". What is the best, fast way to do this?
TL;DR
Looking for %%EOF, with or without related structures, is relatively speedy even if you scan the entirety of a reasonably-sized PDF file. However, you can gain a speed boost if you restrict your search to the last kilobyte, or the last 6 or 7 bytes if you simply want to validate that %%EOF\n is the only thing on the last line of a PDF file.
Note that only a full parse of the PDF file can tell you if the file is corrupted, and only a full parse of the File Trailer can fully validate the trailer's conformance to standards. However, I provide two approximations below that are reasonably accurate and relatively fast in the general case.
Check Last Kilobyte for File Trailer
This option is fairly fast, since it only looks at the tail of the file, and uses a string comparison rather than a regular expression match. According to Adobe:
Acrobat viewers require only that the %%EOF marker appear somewhere within the last 1024 bytes of the file.
Therefore, the following will work by looking for the file trailer instruction within that range:
def valid_file_trailer? filename
  File.open(filename) { |f| f.seek(-1024, :END); f.read.include?('%%EOF') }
end
A Stricter Check of the File Trailer via Regex
However, the ISO standard is both more complex and a lot more strict. It says, in part:
The last line of the file shall contain only the end-of-file marker, %%EOF. The two preceding lines shall contain, one per line and in order, the keyword startxref and the byte offset in the decoded stream from the beginning of the file to the beginning of the xref keyword in the last cross-reference section. The startxref line shall be preceded by the trailer dictionary, consisting of the keyword trailer followed by a series of key-value pairs enclosed in double angle brackets (<< … >>) (using LESS-THAN SIGNs (3Ch) and GREATER-THAN SIGNs (3Eh)).
Without actually parsing the PDF, you won't be able to validate this with perfect accuracy using regular expressions, but you can get close. For example:
def valid_file_trailer? filename
  pattern = /^startxref\n\d+\n%%EOF\n\z/m
  File.open(filename) { |f| !!(f.read.scrub =~ pattern) }
end
I have a number in Mathematica, a large number. I have even gotten this number in base 16 form, using OutputForm[]. I am basically trying to write out a number to a file in hex format.
Please keep in mind I am using 123456 in these examples instead of my 70,000 digit number.
Whenever I write a file using a simple Put[123456, "file.raw"] command, I get a raw data file with the actual data 3132333435360A with a line ending.
If I use the Put[OutputForm[BaseForm[123456, 16]], "file.raw"] command, I get a raw data file with the data in hex format, 31653234300A202020202031360A, but still not written as raw data.
I would like the hex form of the number dumped as raw data.
I have tried Export, BinaryWrite, and DumpSave, but can't figure it out.
I just am getting a headache I guess cause I can't see past what I need to do.
One thing I did try was doing:
Export["file.raw", 123456];
But the file is not raw enough. What I mean by that is that there is header data and extra crap.
Would love to get this working thanks.
Please let us know what you expect to see in your output file, and what you want use it for. Do you want something a human can read, or something in a specified format to be used by a computer? Please provide an example.
The two examples using Put[] correctly provide files containing ASCII characters corresponding to the text representations of your inputs, and which are human-readable.
I think what you're looking for is IntegerString[_,16]:
In[33]:= IntegerString[123456, 16]
Out[33]= "1e240"
str = OpenWrite[];
WriteString[str, IntegerString[123456, 16]];
Close[str];
FilePrint[%]
1e240
(using WriteString instead of Put avoids having the string characters wrapped in quotation marks in the output file)