How can I extract the .jpg/.png components of an .hpi file?

I stumbled across my rather ancient Photo Objects disks and sadly found out that the company behind them (Hemera) no longer supports the format. This has left me with a whole pile of .hpi files. Luckily, I found this information on extracting the JPEG and PNG components of the files.
Unfortunately, I haven't been able to get it to work. Can anyone figure out what's wrong with this code? I'd be happy with a PHP or Python solution if Perl isn't your thing. :)
open(I, "$name") || die;
binmode(I);
$_ = <I>;
close(I);
my ($j, $p) = m|^.{32}(.*)(\211PNG.*)$|s;
open(J, ">$name.jpg") &&
do { binmode(J); print J $j; close J; };
open(P, ">$name.png") &&
do { binmode(P); print P $p; close P; };
The hexdump of the current test file I snagged off a CD is here, if it helps at all:
0000000 89 48 50 49 0d 0a 1a 0a 64 00 00 00 20 00 00 00
0000010 45 89 00 00 65 89 00 00 0a 21 00 00 00 d0 d0 00

I had a similar problem extracting images from an MS Word document. Here's the program I wrote for that. It only extracts PNGs, though:
#!/usr/bin/perl
use strict;
use warnings;

my $HEADER = "\211PNG";
my $FOOTER = "IEND\xAEB`\x82";

foreach my $file (@ARGV) {
    print "Extracting $file\n";
    (my $image_base = $file) =~ s/(.*)\..*/$1/;

    # Slurp the whole file in binary mode.
    my $data = do {
        local $/;
        open my $fh, '<', $file or die "$file: $!";
        binmode $fh;
        <$fh>;
    };

    my $count = 0;
    while ($data =~ m/($HEADER.*?$FOOTER)/sg) {
        my $image = $1;
        $count++;

        my $image_name = "$image_base.$count.png";
        open my $fh, '>', $image_name or do { warn "$image_name: $!"; next };
        binmode $fh;
        print "Writing $image_name: ", length($image), " bytes\n";
        print $fh $image;
        close $fh;
    }
}
__END__

It seems the regexp is wrong. That's why I wrote a little C program to do it for me:
#include <stdio.h>
#include <stdlib.h>

#define MAX_SIZE 1048576

char stuff[MAX_SIZE];

int main (int argc, char **argv)
{
    unsigned int j_off, j_len, p_off, p_len;
    FILE *fp, *jp, *pp;

    if (argc < 4) {
        fprintf (stderr, "usage: %s input.hpi output.jpg output.png\n", argv[0]);
        return EXIT_FAILURE;
    }

    fp = fopen (argv[1], "rb");
    if (!fp) goto error;

    /* Four little-endian 32-bit fields at offset 12:
       JPEG offset, JPEG length, PNG offset, PNG length. */
    if (fseek (fp, 12, SEEK_SET)) goto error;
    if (!fread (&j_off, 4, 1, fp)) goto error;
    if (!fread (&j_len, 4, 1, fp)) goto error;
    if (!fread (&p_off, 4, 1, fp)) goto error;
    if (!fread (&p_len, 4, 1, fp)) goto error;

    fprintf (stderr, "INFO %s \t%u %u %u %u\n",
             argv[1], j_off, j_len, p_off, p_len);

    if (j_len > MAX_SIZE || p_len > MAX_SIZE) {
        fprintf (stderr, "%s: Chunk size too big!\n", argv[1]);
        return EXIT_FAILURE;
    }

    jp = fopen (argv[2], "wb");
    if (!jp) goto error;
    if (fseek (fp, j_off, SEEK_SET)) goto error;
    if (!fread (stuff, j_len, 1, fp)) goto error;
    if (!fwrite (stuff, j_len, 1, jp)) goto error;
    fclose (jp);

    pp = fopen (argv[3], "wb");
    if (!pp) goto error;
    if (fseek (fp, p_off, SEEK_SET)) goto error;
    if (!fread (stuff, p_len, 1, fp)) goto error;
    if (!fwrite (stuff, p_len, 1, pp)) goto error;
    fclose (pp);

    fclose (fp);
    return EXIT_SUCCESS;

error:
    perror (argv[1]);
    return EXIT_FAILURE;
}
It works with the command line parameters input.hpi output.jpg output.png.
The error handling is not 100% correct, but it is good enough to always tell you if something's wrong, and most times what it is. For large files, you will have to enlarge MAX_SIZE.
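If you'd rather not compile anything, the same extraction can be done in a few lines of Python. This is a minimal sketch based on the header layout the C program assumes (four little-endian 32-bit offset/length fields starting at byte 12), not an official HPI spec; the script name is arbitrary:
#!/usr/bin/env python
# unhpi.py -- usage: python unhpi.py input.hpi output.jpg output.png
import struct
import sys

with open(sys.argv[1], "rb") as f:
    data = f.read()

# Assumed layout (same as the C program above): JPEG offset, JPEG length,
# PNG offset, PNG length, as little-endian uint32s at offset 12.
j_off, j_len, p_off, p_len = struct.unpack_from("<4I", data, 12)

with open(sys.argv[2], "wb") as jpg:
    jpg.write(data[j_off:j_off + j_len])
with open(sys.argv[3], "wb") as png:
    png.write(data[p_off:p_off + p_len])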
Here is a shell script which you can call with *.hpi:
#!/bin/bash
dest=<destination-folder>
for arg in "$#"
do
base=`echo $arg | cut -d'.' -f1`
<executable> $arg $dest/original/$base.jpg $dest/mask/$base.png 2>>$dest/log
#composite -compose CopyOpacity $dest/mask/$base.png $dest/original/$base.jpg $dest/rgba/$base.png
done
The optional composite command (it comes with ImageMagick) will create a new PNG image with the mask applied as an alpha channel. Note that this file will be about five times larger than the original files.
Note that some HPI files come without a mask. In this case, my program will still work, but it will produce an empty PNG file.
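If you would rather apply the mask in Python than with ImageMagick, Pillow can do the same compositing. A rough sketch, assuming the JPEG and the PNG mask have the same dimensions (file names are placeholders):
from PIL import Image  # pip install Pillow

rgb = Image.open("original/photo.jpg").convert("RGB")
mask = Image.open("mask/photo.png").convert("L")   # greyscale mask
rgb.putalpha(mask)                                 # mask becomes the alpha channel
rgb.save("rgba/photo.png")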

Not a program-your-own solution, but this application, which is freeware for personal use, states that it can convert hpi files.

For those arriving here via Google, I've written a Python script that solves this problem for PNG images only:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import re, sys

def main():
    if len(sys.argv) < 3:
        print """Usage:
{0} BINARY_FILE PNG_PATH_TEMPLATE
Example:
{0} bin/program 'imgs/image.{{0:03d}}.png'""".format(__file__)
        return
    binfile, pngpath_tpl = sys.argv[1:3]
    rx = re.compile("\x89PNG.+?IEND\xAEB`\x82", re.S)
    bintext = open(binfile, "rb").read()
    PNGs = rx.findall(bintext)
    for i, PNG in enumerate(PNGs):
        f = open(pngpath_tpl.format(i), "wb")  # Simple string format.
        f.write(PNG)
        f.close()

if __name__ == "__main__":
    main()
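Note that this script is Python 2. Under Python 3 the pattern and the file contents must both be bytes rather than str; a minimal sketch of the same idea:
import re
import sys

# Same header/footer pattern, but as a bytes regex.
rx = re.compile(rb"\x89PNG.+?IEND\xAEB`\x82", re.S)
data = open(sys.argv[1], "rb").read()
for i, png in enumerate(rx.findall(data)):
    with open(sys.argv[2].format(i), "wb") as f:
        f.write(png)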

For .jpeg and .mov files there is recoverjpeg, which I tested on Linux (but it may be compatible with other platforms).
On some Debian systems it's available via apt-get install recoverjpeg

Related

stoll function (C++11 string) giving ambiguous output

I encountered this problem:
https://www.urionlinejudge.com.br/judge/en/problems/view/1193
/*input
3
101 bin
101 dec
8f hex
*/
/*************************************************************
 * Purpose : https://www.urionlinejudge.com.br/judge/en/problems/view/1193
 * Author: Sahil Arora
 * Version: 1.0
 * Date: 22/10/15
 * Bugs : None
 *************************************************************/
#include <bits/stdc++.h>
using namespace std;

int main(int argc, char const *argv[])
{
    /* code */
    std::ios_base::sync_with_stdio(false);
    int test;
    cin >> test;
    for (int i = 1; i <= test; ++i) {
        cout << "Case " << i << ":\n";
        string str, base;
        long long num, j;
        cin >> str >> base;
        if (base == "dec") {
            num = stoll(str, nullptr);
            cout << hex << num << " hex\n";
            bitset<32> bin(num);
            for (j = 31; bin[j] == 0 && j >= 0; --j)
                ;
            while (j >= 0)
                cout << bin[j--];
            cout << " bin\n\n";
        }
        else if (base == "hex") {
            str = "0x" + str;
            num = stoll(str, nullptr, 16);
            cout << dec << num << " dec\n"; // <-- focus on this line
            bitset<32> bin(num);
            for (j = 31; bin[j] == 0 && j >= 0; --j)
                ;
            while (j >= 0)
                cout << bin[j--];
            cout << " bin\n\n";
        }
        else {
            num = stoll(str, nullptr, 2);
            cout << dec << num << " dec\n" << hex << num << " hex\n\n";
        }
    }
    return 0;
}
Now, on changing line 45 (the line marked "focus on this line" above) to:
cout<<num<<" dec\n";
my output for hex changes: it echoes the input back instead of converting hex to dec. I fail to understand why this happens. Also, if I enter only one test case, it gives the correct output for hex, but the submission is still judged "20% Wrong answer"!
Input :
3
101 bin
101 dec
8f hex
Expected output :
Case 1:
5 dec
5 hex
Case 2:
65 hex
1100101 bin
Case 3:
143 dec
10001111 bin
My Output without using dec in cout :
Case 1:
5 dec
5 hex
Case 2:
65 hex
1100101 bin
Case 3:
8f dec
10001111 bin
I believe your program's behavior is expected and has nothing to do with stoll itself.
std::hex, std::dec (and std::oct) are manipulators that modify the output base. Once applied, they do not reset (this distinguishes them from std::setw, for example). That is why you should reapply a manipulator every time you want to change the output base.
So you just have to write
std::cout<<std::dec<<num<<" dec\n";
because you may have applied
std::cout<<std::hex<<num<<" hex\n";
earlier.

Windows SCSI ReadCapacity16 in D

I'm attempting to send a SCSI ReadCapacity16 (0x9E) to a volume on Windows using D. The CDBs are to spec, my ReadCapacity16 works on Linux, and SCSI Inquiries work on Windows. Only the non-Inquiry calls on Windows fail, with an "Incorrect function" error from the Windows kernel.
Since only Inquiries work, is there a trick to sending non-Inquiry commands through the Windows kernel? Any tips on getting this to work? I've researched this for a couple of weeks and haven't solved it.
This is an example of the CDB:
\\.\physicaldrive0
CDB buffer contents:
9e 10 00 00 00 00 00 00 - 00 00 00 00 00 20 00 00
sgio.exceptions.IoctlFailException#sgio\exceptions.d(13): ioctl error code is 1. Incorrect function.
Here is where the CDB is copied into a buffer for the DeviceIoControl call; this is the same code path which successfully sends the Inquiry commands (but fails for ReadCapacity). The code from GitHub is pasted below:
void sgio_execute(ubyte[] cdb_buf, ubyte[] dataout_buf, ubyte[] datain_buf, ubyte[] sense_buf)
{
    version (Windows)
    {
        const uint SENSE_LENGTH = 196;
        ubyte[512] iobuffer = 0;
        DWORD amountTransferred = -1;
        SCSI_PASS_THROUGH_DIRECT scsiPassThrough = {0};
        scsiPassThrough.Cdb[] = 0;
        uint size = cast(uint)((cdb_buf.length <= scsiPassThrough.Cdb.length ?
                                cdb_buf.length : scsiPassThrough.Cdb.length));
        scsiPassThrough.Cdb[0..size] = cdb_buf[0..size];
        scsiPassThrough.Length = SCSI_PASS_THROUGH_DIRECT.sizeof;
        scsiPassThrough.ScsiStatus = 0x00;
        scsiPassThrough.TimeOutValue = 0x40;
        scsiPassThrough.CdbLength = cast(ubyte)(size);
        scsiPassThrough.SenseInfoOffset = SCSI_PASS_THROUGH_DIRECT.sizeof;
        scsiPassThrough.SenseInfoLength = SENSE_LENGTH;
        scsiPassThrough.DataIn = SCSI_IOCTL_DATA_IN;
        scsiPassThrough.DataBuffer = datain_buf.ptr;
        scsiPassThrough.DataTransferLength = bigEndianToNative!ushort(cast(ubyte[2]) cdb_buf[3..5]);

        int status = DeviceIoControl( m_device,
                                      IOCTL_SCSI_PASS_THROUGH_DIRECT,
                                      &scsiPassThrough,
                                      iobuffer.length, //scsiPassThrough.sizeof,
                                      &iobuffer,
                                      iobuffer.length,
                                      &amountTransferred,
                                      null);
        if (status == 0)
        {
            int errorCode = GetLastError();
            // build error message ...
            throw new IoctlFailException(exceptionMessage);
        }
    }
}
Reading the Windows SCSI_PASS_THROUGH_DIRECT structure documentation very closely I noticed this:
DataTransferLength: Indicates the size in bytes of the data buffer.
Many devices transfer chunks of data of predefined length. The value
in DataTransferLength must be an integral multiple of this predefined,
minimum length that is specified by the device. If an underrun occurs,
the miniport driver must update this member to the number of bytes
actually transferred.
I changed the code to use 512 bytes for DataTransferLength, by increasing the size of datain_buffer, and the code now works just fine.

Trouble trying to output file using vtkOBJWriter

I am trying to use vtkOBJWriter from David Doria to convert a .vtk file to a .obj file. I git-cloned it from https://github.com/daviddoria/vtkOBJWriter, added a build directory for CMake and make, and altered the file vtkOBJWriterExample.cxx to:
#include <vtkSmartPointer.h>
#include <vtkPolyData.h>
#include <vtkSphereSource.h>
#include <vtkPolyDataReader.h>
#include "vtkOBJWriter.h"

int main (int argc, char *argv[])
{
    vtkSmartPointer<vtkPolyData> input;
    std::string outputFilename;

    // Verify command line arguments
    if(argc > 1) // Use the command line arguments
    {
        if(argc != 3)
        {
            std::cout << "Required arguments: InputFilename.vtp OutputFilename.obj" << std::endl;
            return EXIT_FAILURE;
        }
        vtkSmartPointer<vtkPolyDataReader> reader =
            vtkSmartPointer<vtkPolyDataReader>::New();
        reader->SetFileName(argv[1]);
        reader->Update();
        input = reader->GetOutput();
        outputFilename = argv[2];
    }
    else
    {
        outputFilename = "output.obj";
        vtkSmartPointer<vtkSphereSource> sphereSource =
            vtkSmartPointer<vtkSphereSource>::New();
        sphereSource->Update();
        input->ShallowCopy(sphereSource->GetOutput());
    }

    vtkSmartPointer<vtkOBJWriter> writer =
        vtkSmartPointer<vtkOBJWriter>::New();
    writer->SetInput(input);
    writer->SetFileName(outputFilename.c_str());
    writer->Update();
    return EXIT_SUCCESS;
}
to reflect that I am using VTK 5.8.0. When I try to run sudo ./vtkOBJWriterExample trytry1.vtk Documents/comeOn.obj, no output file is made (I don't see it in the appropriate directory). I also tried it with trytry1.vtp, and it didn't seem to work. My .vtk file format is:
# vtk DataFile Version 3.0
vtk output
ASCII
DATASET POLYDATA
FIELD FieldData 3
group_id 1 1 int
0
base_index 1 3 int
0 0 0
avtOriginalBounds 1 6 double
-10 10 -10 10 -10 10
POINTS 14387 float
-5.10204 -2.65306 -9.69246 -5.10204 -2.75294 -9.59184 -5.37199 -2.65306 -9.59184
...
POLYGONS 28256 113024
3 0 1 2
...
POINT_DATA 14387
SCALARS hardyglobal float
LOOKUP_TABLE default
3.4926 3.4926 3.4926 3.4926 3.4926 3.4926 3.4926 3.4926 3.4926
...
which doesn't seem to match the formatting of car.vtp in the data directory, but I thought I made the appropriate changes (using the formatting of vtkPolyDataReader.h instead of vtkXMLPolyDataReader.h). I am not sure why no file is being output.
I do not receive any error messages.
It was a directory problem (my command line arguments were pointing to the wrong directory). It should have been just ./vtkOBJWriterExample trytry1.vtk comeOn.obj

Can a breakpoint display the contents of "const unsigned char* variable"?

I'm on the trail of why the contents of a TXT record in a Bonjour service discovery is sometimes being incompletely interpreted, and I've reached a point where it would be really useful to have a breakpoint print out the contents of an unsigned char buffer in a callback (I've tried NSLog, but using NSLog in a threaded callback can get really tricky).
The callback function is defined this way:
static void resolveCallback(DNSServiceRef sdRef, DNSServiceFlags flags, uint32_t interfaceIndex, DNSServiceErrorType errorCode,
const char* fullname, const char* hosttarget, uint16_t port, uint16_t txtLen,
const unsigned char* txtRecord, void* context) {
So I'm interested in the txtRecord.
Right now my breakpoint is using:
memory read --size 4 --format x --count 4 `txtRecord`
But that's only because that was an example on the lldb.llvm.org example page ;-) It's certainly showing data that I expect to be there, partially.
Do I have to apply informed knowledge of the length, or can the breakpoint be coded such that it uses the length that is present? I'm thinking that instead of "hard coding" the two 4s in the example, there ought to be a way to wrap other read instructions inside backticks like I did with the variable name.
Looking at http://lldb.llvm.org/varFormats.html I thought I'd try a format of C instead of x, but that prints out a series of dots, which must mean I picked the wrong format or something.
I just tried
memory read `txtRecord`
and that's almost exactly what I wanted to see as it gives:
0x1c5dd884: 10 65 6e 30 3d 31 39 32 2e 31 36 38 2e 31 2e 33 .en0=192.168.1.3
0x1c5dd894: 36 0a 70 6f 72 74 3d 35 30 32 37 38 00 00 00 00 6.port=50278....
This looks really close:
memory read `txtRecord` --format C
giving:
0x1d0c6974: .en0=192.168.1.36.port=50278....
If that's the best I can get, I guess I can deal with the length bytes in front of each of the two strings in that txtRecord.
I'm asking this question because I'd like to display the actual and correct values... the bug is that sometimes the IP address comes back wrong, losing the frontmost 1, other times the port comes back "short" (in network byte order) with non-numeric characters at the end, like "502¿" instead of "50278" (in this example run).
My initial response to this question, while informative, was not complete. I originally thought the problem being reported was just about printing a c-string array of type unsigned char * where the default formatters (char *) weren't being used. That answer comes first. Then comes the answer about how to print this (somewhat unique) array of pascal strings data that the program is actually dealing with.
First answer: lldb knows how to handle the char * well; it's the unsigned char * bit that is making it behave a little worse than usual. e.g. if txtRecord were a const char *,
(lldb) p txtRecord
(const char *) $0 = 0x0000000100000f51 ".en0=192.168.1.36.port=50278"
You can copy the type summary lldb has built in for char * for unsigned char *. type summary list lists all of the built in type summaries; copying lldb-179.5's summaries for char *:
(lldb) type summary add -p -C false -s ${var%s} 'unsigned char *'
(lldb) type summary add -p -C false -s ${var%s} 'const unsigned char *'
(lldb) fr va txtRecord
(const unsigned char *) txtRecord = 0x0000000100000f51 ".en0=192.168.1.36.port=50278"
(lldb) p txtRecord
(const unsigned char *) $2 = 0x0000000100000f51 ".en0=192.168.1.36.port=50278"
(lldb)
Of course you can put these in your ~/.lldbinit file and they'll be picked up by Xcode et al from now on.
Second answer: To print the array of pascal strings that this is actually using, you'll need to create a python function. It will take two arguments, the size of the pascal string buffer (txtLen) and the address of the start of the buffer (txtRecord). Create a python file like pstrarray.py (I like to put these in a directory I made, ~/lldb) and load it into your lldb via the ~/.lldbinit file so you have the command available:
command script import ~/lldb/pstrarray.py
The python script is a little long; I'm sure someone more familiar with python could express this more concisely. There's also a bunch of error handling which adds bulk. But the main idea is to take two parameters: the size of the buffer and the pointer to the buffer. The user will express these with variable names like pstrarray txtLen txtRecord, in which case you could look up the variables in the current frame, but they might also want to use an actual expression like pstrarray sizeof(str) str. So we need to pass these parameters through the expression evaluation engine to get them down to an integer size and a pointer address. Then we read the memory out of the process and print the strings.
import lldb
import shlex
import optparse

def pstrarray(debugger, command, result, dict):
    command_args = shlex.split(command)
    parser = create_pstrarray_options()
    try:
        (options, args) = parser.parse_args(command_args)
    except:
        return
    if debugger and debugger.GetSelectedTarget() and debugger.GetSelectedTarget().GetProcess():
        process = debugger.GetSelectedTarget().GetProcess()
        if len(args) < 2:
            print "Usage: pstrarray size-of-buffer pointer-to-array-of-pascal-strings"
            return
        if process.GetSelectedThread() and process.GetSelectedThread().GetSelectedFrame():
            frame = process.GetSelectedThread().GetSelectedFrame()
            size_of_buffer_sbval = frame.EvaluateExpression (args[0])
            if not size_of_buffer_sbval.IsValid() or size_of_buffer_sbval.GetValueAsUnsigned (lldb.LLDB_INVALID_ADDRESS) == lldb.LLDB_INVALID_ADDRESS:
                print 'Could not evaluate "%s" down to an integral value' % args[0]
                return
            size_of_buffer = size_of_buffer_sbval.GetValueAsUnsigned ()
            address_of_buffer_sbval = frame.EvaluateExpression (args[1])
            if not address_of_buffer_sbval.IsValid():
                print 'could not evaluate "%s" down to a pointer value' % args[1]
                return
            address_of_buffer = address_of_buffer_sbval.GetValueAsUnsigned (lldb.LLDB_INVALID_ADDRESS)
            # If the expression eval didn't give us an integer value, try it again with an & prepended.
            if address_of_buffer == lldb.LLDB_INVALID_ADDRESS:
                address_of_buffer_sbval = frame.EvaluateExpression ('&%s' % args[1])
                if address_of_buffer_sbval.IsValid():
                    address_of_buffer = address_of_buffer_sbval.GetValueAsUnsigned (lldb.LLDB_INVALID_ADDRESS)
            if address_of_buffer == lldb.LLDB_INVALID_ADDRESS:
                print 'could not evaluate "%s" down to a pointer value' % args[1]
                return
            err = lldb.SBError()
            pascal_string_buffer = process.ReadMemory (address_of_buffer, size_of_buffer, err)
            if (err.Fail()):
                print 'Failed to read memory at address 0x%x' % address_of_buffer
                return
            pascal_string_array = bytearray(pascal_string_buffer, 'ascii')
            index = 0
            while index < size_of_buffer:
                length = ord(pascal_string_buffer[index])
                print "%s" % pascal_string_array[index+1:index+1+length]
                index = index + length + 1

def create_pstrarray_options():
    usage = "usage: %prog"
    description = '''print a buffer which has an array of pascal strings in it'''
    parser = optparse.OptionParser(description=description, prog='pstrarray', usage=usage)
    return parser

def __lldb_init_module (debugger, dict):
    parser = create_pstrarray_options()
    pstrarray.__doc__ = parser.format_help()
    debugger.HandleCommand('command script add -f %s.pstrarray pstrarray' % __name__)
and an example program to run this on:
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main ()
{
    unsigned char str[] = {16,'e','n','0','=','1','9','2','.','1','6','8','.','1','.','3','6',
                           10,'p','o','r','t','=','5','1','6','8','7'};
    uint8_t *p = str;
    while (p < str + sizeof (str))
    {
        int len = *p++;
        char buf[len + 1];
        strlcpy (buf, (char*) p, len + 1);
        puts (buf);
        p += len;
    }
    puts ("done"); // break here
}
and in use:
(lldb) br s -p break
Breakpoint 1: where = a.out`main + 231 at a.c:17, address = 0x0000000100000ed7
(lldb) r
Process 74549 launched: '/private/tmp/a.out' (x86_64)
en0=192.168.1.36
port=51687
Process 74549 stopped
* thread #1: tid = 0x1c03, 0x0000000100000ed7 a.out`main + 231 at a.c:17, stop reason = breakpoint 1.1
#0: 0x0000000100000ed7 a.out`main + 231 at a.c:17
14 puts (buf);
15 p += len;
16 }
-> 17 puts ("done"); // break here
18 }
(lldb) pstrarray sizeof(str) str
en0=192.168.1.36
port=51687
(lldb)
While it's cool that it's possible to do this in lldb, it's not as smooth as we'd like to see. If the size of the buffer and the address of the buffer were contained in a single object, struct PStringArray {uint16_t size; uint8_t *addr;}, that would work much better. You could define a type summary formatter for all variables of type struct PStringArray and no special commands would be required. You'd still need to write a python function, but it could get all the information it needed out of the object directly so it would disappear into the lldb type format system. You could just write (lldb) p strs and the custom formatter function would be called on strs to print all the strings in there.
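For the curious, here is a rough sketch of what such a summary provider could look like. struct PStringArray and its field names are hypothetical, and the sketch is untested; it just shows how the idea would plug into the type format system:
# ~/lldb/pstringarray_summary.py -- hypothetical summary provider for
# struct PStringArray { uint16_t size; uint8_t *addr; }
import lldb

def PStringArray_summary(valobj, internal_dict):
    # Read the two members of the (hypothetical) struct.
    size = valobj.GetChildMemberWithName('size').GetValueAsUnsigned()
    addr = valobj.GetChildMemberWithName('addr').GetValueAsUnsigned()
    err = lldb.SBError()
    buf = valobj.GetProcess().ReadMemory(addr, size, err)
    if err.Fail():
        return '<could not read buffer>'
    strings = []
    index = 0
    while index < size:
        length = ord(buf[index:index + 1])
        strings.append(bytearray(buf[index + 1:index + 1 + length]).decode('ascii', 'replace'))
        index += length + 1
    return ', '.join(strings)

def __lldb_init_module(debugger, internal_dict):
    # Attach the summary to every variable of type struct PStringArray.
    debugger.HandleCommand(
        'type summary add -F %s.PStringArray_summary "struct PStringArray"' % __name__)
With that loaded from ~/.lldbinit, p strs alone would print all the strings, with no special command needed.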

Random file generator code?

Does anyone have a simple shell script or C program to generate random files of a set size, with random content, under Linux?
How about:
head -c SIZE /dev/random > file
(If /dev/random blocks waiting for entropy, /dev/urandom is a faster, non-blocking alternative.)
openssl rand can be used to generate random bytes.
The command is:
openssl rand -out <filename> <num_bytes>
For example, openssl rand -out aaa 2048 will generate a file named aaa containing 2048 random bytes.
Here are a few ways:
Python:
RandomData = file("/dev/urandom", "rb").read(1024)
file("random.txt", "wb").write(RandomData)
Bash:
dd if=/dev/urandom of=myrandom.txt bs=1024 count=1
Using C:
#include <stdio.h>
#include <stdlib.h>

int main()
{
    int byte_count = 1024;
    char data[1024];

    /* Read random bytes... */
    FILE *fp = fopen("/dev/urandom", "rb");
    fread(data, 1, byte_count, fp);
    fclose(fp);

    /* ...and write them out with fwrite, which is binary-safe
       (fprintf would treat the random bytes as a format string). */
    FILE *out = fopen("test.txt", "wb");
    fwrite(data, 1, byte_count, out);
    fclose(out);

    return EXIT_SUCCESS;
}
Python. Call it make_random.py
#!/usr/bin/env python
import random
import sys
import string
size = int(sys.argv[1])
for i in xrange(size):
sys.stdout.write( random.choice(string.printable) )
Use it like this:
./make_random.py 1024 >some_file
That will write 1024 bytes to stdout, which you can capture into a file. Note that the output is drawn from string.printable, so it is readable ASCII rather than arbitrary binary data.
Here's a quick and dirty script I wrote in Perl. It allows you to control the range of characters that will be in the generated file.
#!/usr/bin/perl
if ($#ARGV < 1) { die("usage: <file_name> <size_in_bytes>\n"); }

open(FILE, ">" . $ARGV[0]) or die "Can't open file for writing\n";

# you can control the range of characters here
my $minimum = 32;
my $range = 96;

for ($i = 0; $i < $ARGV[1]; $i++) {
    print FILE chr(int(rand($range)) + $minimum);
}
close(FILE);
To use:
./script.pl file 2048
Here's a shorter version, based on S. Lott's idea of outputting to STDOUT:
#!/usr/bin/perl
# you can control the range of characters here
my $minimum = 32;
my $range = 96;

for ($i = 0; $i < $ARGV[0]; $i++) {
    print chr(int(rand($range)) + $minimum);
}
Warning: This is the first script I wrote in Perl. Ever. But it seems to work fine.
You can use my generate_random_file.py script (Python 3), which I used to generate test data in a project of mine.
It works on both Linux and Windows.
It is very fast, because it uses os.urandom() to generate the random data in chunks of 256 KiB instead of generating and writing each byte separately.
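The script itself isn't reproduced here, but the chunked os.urandom() approach it describes might look roughly like this (the function and constant names are mine):
#!/usr/bin/env python3
import os
import sys

CHUNK_SIZE = 256 * 1024  # 256 KiB per write, as described above

def generate_random_file(path, size):
    """Write `size` random bytes to `path`, one chunk at a time."""
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            chunk = os.urandom(min(CHUNK_SIZE, remaining))
            f.write(chunk)
            remaining -= len(chunk)

if __name__ == "__main__":
    # usage: generate_random_file.py <path> <size_in_bytes>
    generate_random_file(sys.argv[1], int(sys.argv[2]))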
