Streaming JSON Algorithm - WITHOUT STACK

Is it possible to design a streaming JSON algorithm that writes JSON directly to a socket with the following properties:
* can only write to, but cannot delete or seek within the stream
* does not use either an IMPLICIT or explicit stack
* uses only a constant amount of memory and stack depth, no matter how deep the object nesting within the JSON, e.g.:
{"1":{"1":{"1":{"1":{"1":{"1":{"1":{"1":{"1":{"1":{"1":{"1":{"1":{"1":{"1":{"1":{...}}}}}}}}}}}}}}}}}

Short answer: No.
Slightly longer: At least not in the general case. If you could guarantee that the nesting has no branching, you could use a simple counter to close the braces at the end.
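To make that special case concrete, here is a minimal sketch (hypothetical C, not from the original answer) showing how a single counter replaces the stack when the nesting never branches:

#include <stdio.h>

/* Sketch: with unbranched nesting, a counter of opened braces is all
   the state needed to close them at the end - no stack required. */
void write_nested(FILE *out, unsigned depth, const char *payload)
{
    for (unsigned i = 0; i < depth; i++)    /* opening phase */
        fputs("{\"1\":", out);
    fputs(payload, out);                    /* innermost value */
    for (unsigned i = 0; i < depth; i++)    /* closing phase: counter only */
        fputc('}', out);
}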

No, because you could use such a program to compress arbitrarily large amounts of data into a finite space, which is impossible.
Encoding implementation:
input = read('LIBRARY_OF_CONGRESS.tar.bz2')
input_binary = convert_to_binary(input)
json_opening = replace({'0':'[', '1':'{'}, input_binary)
your_program <INPUTPIPE >/dev/null
INPUTPIPE << json_opening
Execute the above program, then clone the virtual machine it is running on. That clone is your finite-space compressed version of the arbitrarily large input data set. Then to decode...
Decoding implementation:
set_output_pipe(your_program, OUTPUTPIPE)
INPUTPIPE << EOL
json_closing << OUTPUTPIPE
output_binary = replace({']':'0', '}':'1'}, reverse(json_closing))
output = convert_from_binary(output_binary)
write(output, 'LIBRARY_OF_CONGRESS-copy.tar.bz2')
And of course, all good code should have a test case...
Test case:
cmp 'LIBRARY_OF_CONGRESS.tar.bz2' 'LIBRARY_OF_CONGRESS-copy.tar.bz2'

Related

Parallel for loop in Julia

I am aware there are a multitude of questions about running parallel for loops in Julia, using @threads, @distributed, and other methods. I have tried to implement the solutions there with no luck. The structure of what I'd like to do is as follows.
for index in list_of_indices
    data = h5read("data_set_$index.h5")
    result = perform_function(data)
    save(result)
end
The data sets are independent, and no part of this loop depends on any other. It seems this should be parallelizable.
I have tried, e.g.,
"#threads for index in list_of_indices..." and I get a segmentation error
"#distributed for index in list_of_indices..." and the code does not actually perform the function on my data.
I assume I'm missing something about how parallel processes work, and any insight would be appreciated.
Here is a MWE:
Assume we have files data_1.h5, data_2.h5, data_3.h5 in our working directory. (I don't know how to make things more self-contained than this because I think the problem is arising from asking multiple threads to read files.)
using Distributed
using HDF5
list = [1,2,3]
Threads.@threads for index in list
    data = h5read("data_$index.h5", "data")
    println(data)
end
The error I get is
signal (11): Segmentation fault
signal (6): Aborted
Allocations: 1587194 (Pool: 1586780; Big: 414); GC: 1
Segmentation fault (core dumped)
As noted by other people, there are not enough details to be sure. However, given the current state of information, the code that is safest and has the highest chance of working is:
using Distributed
addprocs(4)
@everywhere using HDF5
list = [1,2,3]
@sync @distributed for index in list
    data = h5read("data_$index.h5", "data")
    println(data)
end
The distributed approach separates the processes completely, so there is much less chance of doing something wrong (e.g. using a library that relies on a shared resource).
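If you also want the results back on the master process rather than just printed, a pmap-based variant of the same idea would look like this (a sketch, assuming the same data files as the MWE):

using Distributed
addprocs(4)
@everywhere using HDF5

# pmap distributes the work across workers and collects the
# return values in the order of the input list
results = pmap([1, 2, 3]) do index
    h5read("data_$index.h5", "data")
end
println(results)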

OpenCL kernel - every work item overwrites global memory?

I'm trying to write a kernel to get the character frequencies of a string.
First, here is the code I have for kernel right now:
__kernel void readParallel(__global char * indata, __global int * outdata)
{
    int startId = get_global_id(0) * 8;
    int maxId = startId + 8;
    for (int i = startId; i < maxId; i++)
    {
        ++outdata[indata[i]];
    }
}
The variable indata holds the string in global memory, and outdata is an array of 256 int values in global memory. Every work item reads 8 symbols from the string and should increment the counter for the corresponding ASCII code in the array. The code compiles and executes, but outdata contains fewer occurrences overall than the number of characters in indata. I think the problem is that the work items overwrite each other's updates to global memory. It would be nice if you could give me some tips to solve this.
By the way, I am a rookie in OpenCL ;-) and, yes, I have looked for solutions in other questions.
You are experiencing the effects of your accesses to global memory not being atomic (see a C++-oriented description of what atomic operations are, or another description by the Intel TBB folks). What happens, chronologically, is:
Some work-group "thread" loads outdata[123] into some register r1
... lots of work, reading and writing, happens, including on outdata[123] ...
The same work-group "thread" increments r1
... lots of work, reading and writing, happens, including on outdata[123] ...
The same work-group "thread" writes r1 to outdata[123]
So, the value written to outdata[123] "throws away" any updates made during the time period between the read and the write (I'm ignoring the possibility of parallel writes corrupting each other rather than one of them winning out).
What you need to do is either:
Use atomic operations (see the sketch below) - the least amount of modification to your code, but very inefficient, since it serializes your work to a great extent, or
Use work-item-specific, warp-specific and/or work-group-specific partial results, which require less/cheaper synchronization, and combine them eventually after having done a lot of work on them.
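Here is a minimal sketch of the atomic option (it also casts the character to uchar, anticipating the signed-index issue discussed next; treat it as an illustration, not a drop-in fix):

__kernel void readParallel(__global const char * indata, __global int * outdata)
{
    int startId = get_global_id(0) * 8;
    for (int i = startId; i < startId + 8; i++)
    {
        // atomic read-modify-write, so no increments are lost;
        // heavily contended counters will serialize, though
        atomic_inc(&outdata[(uchar)indata[i]]);
    }
}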
On an unrelated note, and as @huseyintugrulbuyukisik correctly points out, your code uses signed char values to index the array. To fix that, do one of the following:
Reinterpret those chars as unsigned chars for array indices (and reinterpret back when reading the array).
Upcast the char values to a larger integral type and add 128 to get an offset into outdata.
Define your kernel to only support ASCII characters (no higher than 127), in which case you can ignore this issue (although that will be a potential crasher if you get invalid input).
If you only care about the frequency of printable characters (but can also have non-printing characters in the input), you could perform a run-time check before counting a character.

std::copy runtime_error when working with uint16_t's

I'm looking for input as to why this breaks. See the addendum for contextual information, but I don't really think it is relevant.
I have an std::vector<uint16_t> depth_buffer that is initialized to have 640*480 elements. This means that the total space it takes up is 640*480*sizeof(uint16_t) = 614400 bytes.
The code that breaks:
void Kinect360::DepthCallback(void* _depth, uint32_t timestamp) {
    lock_guard<mutex> depth_data_lock(depth_mutex);
    uint16_t* depth = static_cast<uint16_t*>(_depth);
    std::copy(depth, depth + depthBufferSize(), depth_buffer.begin()); /// the error
    new_depth_frame = true;
}
where depthBufferSize() will return 614400 (I've verified this multiple times).
My understanding of std::copy(first, amount, out) is that first specifies the memory address to start copying from, amount is how far in bytes to copy until, and out is the memory address to start copying to.
Of course, it can be done manually with something like
#pragma unroll
for(auto i = 0; i < 640*480; ++i) depth_buffer[i] = depth[i];
instead of the call to std::copy, but I'm really confused as to why std::copy fails here. Any thoughts???
Addendum: the context is that I am writing a derived class that inherits from FreenectDevice to work with a Kinect 360. Officially the error is a Bus Error, but I'm almost certain this is because libfreenect interprets an error in the DepthCallback as a Bus Error. Stepping through with lldb, it's a standard runtime_error being thrown from std::copy. If I manually enter depth + 614400 it will crash, though if I have depth + (640*480) it will chug along. At this stage I am not doing something meaningful with the depth data (rendering the raw depth appropriately with OpenGL is a separate issue xD), so it is hard to tell if everything got copied, or just a portion. That said, I'm almost positive it doesn't grab it all.
Contrasted with the corresponding VideoCallback and the call inside of copy(video, video + videoBufferSize(), video_buffer.begin()), I don't see why the above would crash. If my understanding of std::copy were wrong, this should crash too since videoBufferSize() is going to return 640*480*3*sizeof(uint8_t) = 640*480*3 = 921600. The *3 is from the fact that we have 3 uint8_t's per pixel, RGB (no A). The VideoCallback works swimmingly, as verified with OpenGL (and the fact that it's essentially identical to the samples provided with libfreenect...). FYI none of the samples I have found actually work with the raw depth data directly, all of them colorize the depth and use an std::vector<uint8_t> with RGB channels, which does not suit my needs for this project.
I'm happy to just ignore it and move on in some senses because I can get it to work, but I'm really quite perplexed as to why this breaks. Thanks for any thoughts!
The way std::copy works is that you provide start and end points of your input sequence and the location to begin copying to. The end point that you're providing is off the end of your sequence, because your depthBufferSize function is giving an offset in bytes, rather than the number of elements in your sequence.
If you remove the multiply by sizeof(uint16_t), it will work. At that point, you might also consider calling std::copy_n instead, which takes the number of elements to copy.
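A corrected version of the callback might look like this (a sketch based on the code in the question, not tested against libfreenect):

void Kinect360::DepthCallback(void* _depth, uint32_t timestamp) {
    lock_guard<mutex> depth_data_lock(depth_mutex);
    uint16_t* depth = static_cast<uint16_t*>(_depth);
    // depthBufferSize() is in bytes; convert to an element count
    std::copy_n(depth, depthBufferSize() / sizeof(uint16_t), depth_buffer.begin());
    new_depth_frame = true;
}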
Edit: I just realised that I didn't answer the question directly.
Based on my understanding of std::copy, it shouldn't be throwing exceptions with the input you're giving it. The only thing in that code that could throw a runtime_error is the locking of the mutex.
Considering you have undefined behaviour as a result of running off of the end of your buffer, I'm tempted to say that has something to do with it.

How to use AudioConverterFillComplexBuffer and its callback?

I need a step by step walkthrough on how to use AudioConverterFillComplexBuffer and its callback. No, don't tell me to read the Apple docs. I do everything they say and the conversion always fails. No, don't tell me to go look for examples of AudioConverterFillComplexBuffer and its callback in use - I've duplicated about a dozen such examples both line for line and modified, and the conversion always fails. No, there isn't any problem with the input data. No, it isn't an endian issue. No, the problem isn't my version of OS X.
The problem is that I don't understand how AudioConverterFillComplexBuffer works, so I don't know what I'm doing wrong. And nothing out there is helping me understand, because it seems like nobody on Earth really understands how AudioConverterFillComplexBuffer works, either. From the people who actually use it (I spy cargo cult programming in their code) to even the authors of Learning Core Audio and/or Apple itself (http://stackoverflow.com/questions/13604612/core-audio-how-can-one-packet-one-byte-when-clearly-one-packet-4-bytes).
This isn't just a problem for me, it's a problem for anybody who wants to program high-performance audio on the Mac platform. Threadbare documentation that's apparently wrong and examples that don't work are no fun.
Once again, to be clear: I NEED A STEP BY STEP WALKTHROUGH ON HOW TO USE AudioConverterFillComplexBuffer plus its callback, and so does the entire Mac developer community.
This is a very old question but I think is still relevant. I've spent a few days fighting this and have finally achieved a successful conversion. I'm certainly no expert but I'll outline my understanding of how it works. Note I'm using Swift, which I'm also just learning.
Here are the main function arguments:
inAudioConverter: AudioConverterRef: This one is simple enough, just pass in a previously created AudioConverterRef.
inInputDataProc: AudioConverterComplexInputDataProc: The very complex callback. We'll come back to this.
inInputDataProcUserData: UnsafeMutableRawPointer?: This is a reference to whatever data you may need to provide to the callback function. Important because even in Swift the callback can't inherit context. E.g. you may need to access an AudioFileID or keep track of the number of packets read so far.
ioOutputDataPacketSize: UnsafeMutablePointer<UInt32>: This one is a little misleading. The name implies it's the packet size but reading the documentation we learn it's the total number of packets expected for the output format. You can calculate this as outPacketCount = frameCount / outStreamDescription.mFramesPerPacket.
outOutputData: UnsafeMutablePointer<AudioBufferList>: This is an audio buffer list which you need to have already initialized with enough space to hold the expected output data. The size can be calculated as byteSize = outPacketCount * outMaxPacketSize.
outPacketDescription: UnsafeMutablePointer<AudioStreamPacketDescription>?: This is optional. If you need packet descriptions, pass in a block of memory the size of outPacketCount * sizeof(AudioStreamPacketDescription).
As the converter runs it will repeatedly call the callback function to request more data to convert. The main job of the callback is simply to read the requested number of packets from the source data. The converter will then convert the packets to the output format and fill the output buffer. Here are the arguments for the callback:
inAudioConverter: AudioConverterRef: The audio converter again. You probably won't need to use this.
ioNumberDataPackets: UnsafeMutablePointer<UInt32>: The number of packets to read. After reading, you must set this to the number of packets actually read (which may be less than the number requested if we reached the end).
ioData: UnsafeMutablePointer<AudioBufferList>: An AudioBufferList which is already configured except for the actual data. You need to initialise ioData.mBuffers.mData with enough capacity to hold the expected number of packets, i.e. ioNumberDataPackets * inMaxPacketSize. Set the value of ioData.mBuffers.mDataByteSize to match.
outDataPacketDescription: UnsafeMutablePointer<UnsafeMutablePointer<AudioStreamPacketDescription>?>?: Depending on the formats used, the converter may need to keep track of packet descriptions. You need to initialise this with enough capacity to hold the expected number of packet descriptions.
inUserData: UnsafeMutableRawPointer?: The user data that you provided to the converter.
So, to start you need to:
Have sufficient information about your input and output data, namely the number of frames and maximum packet sizes.
Initialise an AudioBufferList with sufficient capacity to hold the output data (see the sketch after this list).
Call AudioConverterFillComplexBuffer.
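For illustration, that setup might look something like this (a sketch only; frameCount and outMaxPacketSize are assumed to have been obtained beforehand, e.g. from file and converter properties):

// Sizing per the formulas above (sketch; frameCount is a UInt32 here)
var numPackets: UInt32 = frameCount / outStreamDesc.mFramesPerPacket
let byteSize = Int(numPackets) * Int(outMaxPacketSize)
var bufferList = AudioBufferList(
    mNumberBuffers: 1,
    mBuffers: AudioBuffer(
        mNumberChannels: outStreamDesc.mChannelsPerFrame,
        mDataByteSize: UInt32(byteSize),
        mData: UnsafeMutableRawPointer.allocate(byteCount: byteSize,
                                                alignment: MemoryLayout<Int16>.alignment)))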
And on each run of the callback you need to:
Initialise ioData with sufficient capacity to store ioNumberDataPackets of source data.
Initialise outDataPacketDescription with sufficient capacity to store ioNumberDataPackets of AudioStreamPacketDescriptions.
Fill the buffer with source packets.
Write the packet descriptions.
Set ioNumberDataPackets to the number of packets actually read.
Return noErr if successful.
Here's an example where I read the data from an AudioFileID:
var converter: AudioConverterRef?
// User data holds an AudioFileID, input max packet size, and a count of packets read
var uData = (fRef, maxPacketSize, UnsafeMutablePointer<Int64>.allocate(capacity: 1))
err = AudioConverterNew(&inStreamDesc, &outStreamDesc, &converter)
err = AudioConverterFillComplexBuffer(converter!, { _, ioNumberDataPackets, ioData, outDataPacketDescription, inUserData in
    let uData = inUserData!.load(as: (AudioFileID, UInt32, UnsafeMutablePointer<Int64>).self)
    ioData.pointee.mBuffers.mDataByteSize = uData.1
    ioData.pointee.mBuffers.mData = UnsafeMutableRawPointer.allocate(byteCount: Int(uData.1), alignment: 1)
    outDataPacketDescription?.pointee = UnsafeMutablePointer<AudioStreamPacketDescription>.allocate(capacity: Int(ioNumberDataPackets.pointee))
    let err = AudioFileReadPacketData(uData.0, false, &ioData.pointee.mBuffers.mDataByteSize, outDataPacketDescription?.pointee, uData.2.pointee, ioNumberDataPackets, ioData.pointee.mBuffers.mData)
    uData.2.pointee += Int64(ioNumberDataPackets.pointee)
    return err
}, &uData, &numPackets, &bufferList, nil)
Again, I'm no expert, this is just what I've learned by trial and error.

Wrapping a binary data file to self-convert to CSV?

I'm writing custom firmware for a SparkFun Logomatic V2 that records binary data to a file on a 2GB micro-SD card. The data file size will range from 100 MB to 1 GB.
The format of the binary data is in flux as the board's firmware evolves (it will actually be dynamically reconfigurable at run-time). Rather than create and maintain a separate decoder/converter program for each version of the firmware/configuration, I'd much rather make the data files self-converting to CSV format by prefixing each data file with a Bash script that is written out before data recording starts.
I know how to create a Here Document, but I suspect Bash would be unable to quickly parse and convert a gigabyte of binary data, so I'd like to make the process run much faster by having the script first compile some C code (assume GCC is present and in the path), then run the resulting program, passing the binary data to stdin.
To make the problem more concrete, assume the firmware will create binary data consisting of 4 16-bit integer values: A timestamp (unsigned) followed by 3 accelerometer axes (signed). There is no separator between records (mainly because I'm saturating the SPI interface to the uSD card).
So, I think I need a script with TWO here documents: One for the C code (parameterized by expanded Bash variables), and another for the binary data. Here's where I am so far:
#!/usr/bin/env bash
# Produced by firmware version 0.0.0.0.0.1 alpha
# Configuration for this data run:
header_string="Time, X, Y, Z"
column_count=4
# Create the converter executable.
# Use "<<-" to permit code to be indented for readability
# (note "<<-" strips leading tabs only, so indent with tabs).
# Leave the here-doc tag unquoted to allow variable expansion/substitution.
gcc -xc - -o /tmp/convertit <<-THE_C_CODE
	#include <stdio.h>
	int main (int argc, char **argv) {
		// Write ${header_string} to stdout
		while (1) {
			// Read ${column_count} shorts from stdin
			// Break if EOF
			// Write ${column_count} comma-delimited values to stdout
		}
		// Close stdout
		return 0;
	}
THE_C_CODE
# Pass the binary data to the converter.
# Hard-quote the here-doc tag to prevent expansion/substitution.
/tmp/convertit >"./$1.csv" <<'THE_BINARY_DATA'
...
... hundreds of megabytes of semi-random data ...
...
THE_BINARY_DATA
rm /tmp/convertit
exit 0
Does that look about right? I don't yet have a real data file to test this with, but I wanted to verify the idea before going much further.
Will Bash complain if the closing lines are missing? This may happen if data capture terminates unexpectedly due to a shock knocking loose the battery or uSD card. Or if the firmware borks.
Is there a faster or better method I should consider? For example, I wonder if Bash will be too slow to copy the binary data as fast as the C program can consume it: Should the C program open the data file directly?
TIA,
-BobC
You may want to have a look at makeself. It allows you to change any .tar.gz archive into a self-extracting file which is platform independent (something like a shell script that contains a here document). This will allow you to easily distribute your data and decoder. It also allows you to configure a script contained within the archive to be run when the container script is run. This way you can use makeself for packaging and inside the archive you can put your data files and decoder written in C or bash or whatever language you find suitable.
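For example (a sketch with made-up file names; makeself.sh takes an archive directory, an output name, a label, and a startup script to run on extraction):

# Package run_dir/ (data.bin plus decode.sh) into a self-extracting
# archive that runs decode.sh after extraction
makeself.sh ./run_dir logomatic_run.run "Logomatic data run" ./decode.sh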
While it is possible to decode binary data using shell tools (e.g. using od), it's very cumbersome and inefficient. I'd recommend using either a C program or Perl, which is also likely to be found on almost any machine.
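For a taste of the shell-tools route, here is a sketch for the record format described in the question (GNU od; note od cannot mix the unsigned timestamp with the signed axes in one pass, which is part of the cumbersomeness):

# No addresses, signed 16-bit decimal, 8 bytes (one record) per line
od -An -td2 -w8 datafile.bin | awk -v OFS=',' '{ print $1, $2, $3, $4 }'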
