Trying to implement stream demuxing/decoding in memory, using producer/consumer idiom.
The producer reads data from network and pushes it into a buffer. The consumer reads data from this buffer and forwards it to avformat.
According to ffmpeg docs, the only way to implement in-memory processing is using AVIOContext, providing read function for it.
The problem is that if there is too few data in buffer, read function returns 0 and avformat thinks that it's EOF, producing an error.
Specifically this happens on avformat_open_input/avformat_find_stream_info calls, where avformat tries to probe given data. Sometimes it work, sometimes not, depending on input buffer size.
I've tried to return EAGAIN from the read function (seemed reasonable), but it doesn't work.
As workaround I've increased input buffer size, accumulating more data before passing it to avformat, but it's waste of memory (there could be hundreds/thousands streams) and can't be a general solution since we don't know exactly how much data will be needed
Is there any way to tell avformat that there will be more data, but later?
Related
I've implemented Scala Akka application that streams 4 different types of data from biomodule sensor (ECG, EEG, Breath and general data). These data (timestamp and value) are typically stored in 4 different CSV files. However, sometimes I have to store each sample in two different files with different timestamps, so application is writing in 8 different CSV files at the same time.
Initially I've implemented one Akka actor that is responsible for persisting data, which receive path to the file in which to write data, timestamp and value. However, this was a bottleneck, since a number of samples that I need to store is large (e.g. one ECG sample is received each 4ms). As a result, this actor had finished recording in very short experiment 1-2 minutes after experiment is over.
I've also tried with 4 actors for 4 different message types, with the idea to distribute work. I didn't notice significant improvement in performances.
I'm wondering if someone has an idea how to improve the performance. Is it better to use one actor for storing files, few actors or it is most efficient if I have one actor for each file? Or maybe, it doesn't make any difference? Could I improve my code for storing data?
This is my method responsible for storing data:
def processValue(sample: WaveformValue): Unit ={
val csvfilewriter=new PrintWriter(new BufferedWriter(new FileWriter(sample.filepath,true)))
csvfilewriter.append(sample.timestamp.toString)
csvfilewriter.append(",")
csvfilewriter.append(sample.value.toString)
csvfilewriter.append("\r\n")
csvfilewriter.flush()
csvfilewriter.close()
}
It seems to me that your bottleneck is I/O -- disk access. It looks like you are opening, writing to, and closing a file for each sample, which is very expensive. I would suggest:
Open each file just once, and close it at the end of all processing. You might need to store the file in a member variable, or if you have have an arbitrary collection of files then store them in a map in a member variable.
Don't flush after every sample write.
Use buffered writes for each file writer. This avoids flushing data to the filesystem with every write, which involves a system call and waiting for the data to be written to disk. I see that you're already doing this, but the benefit is lost since you are flushing/closing the file after each sample anyway.
I am using HornetQ for email sending.
File attachments are transmitted out-of-band (not as part of the message) using an object storage system. This adds some overhead that I want to avoid for small files by putting them into message properties directly.
I know that I can send huge message bodies, but for large files, object storage works well, this is about small files, and delivery by property value would be very convenient if it works.
What are the considerations for message property values? Can I make them a 100K byte array? Will this slow things down (or even break)?
Headers, Properties and the Body buffer themselves are all combined in relatively straightforward process into the overall buffer for the message, so there should not be significant performance issues from that perspective. You can see the core implementation here:
https://github.com/hornetq/hornetq/blob/master/hornetq-core-client/src/main/java/org/hornetq/core/message/impl/MessageImpl.java
One consideration would be the size of your consumer window size, which by default would only be 1MB. This is the size that will be buffered on the consumer, so if you are sending messages near this size your performance in reading may be much slower as you wait for data at the consumer. This can be changed with the consumer-window-size parameter. See http://docs.jboss.org/hornetq/2.4.0.Final/docs/user-manual/html/flow-control.html#d0e4023 for more information.
Pulling from comments, you'll probably also want to increase your journal size and buffer size. See You'd probably be close to the limits. You would want to site the journal buffer size larger for sure to get some headroom, and probably size up the journal itself as well. http://hornetq.sourceforge.net/docs/hornetq-2.1.1.Final/user-manual/en/html/persistence.html#configuring.message.journal.journal-buffer-size and https://developer.jboss.org/thread/154423
My question may be similar to this: Why might my AudioQueueOutputCallback not be called?
It seems that person was able to fix by running audio stuff on main thread. I cannot do that.
I enqueue buffers to prime audio Q, then start audio Q. Shouldn't those buffers complete immediately once I start my queue?
I am setting the data size correctly.
As a hack I just re-use buffers without waiting for them to be reported by cabllback as done. If I do this, I run for a couple of seconds like this, then the buffer callback starts working from them on.
definitely not a good idea to hack your way around with core audio.. while it may be a quick fix, it will definitely hurt you in ambiguous ways in the long run.
your problem isn't the same as the link you posted, their problem was assigning the callback on the wrong thread.. in your case, your callback is in the right thread, it's just that the audio buffers you are feeding it initially are either empty, too small or contains data not fit for audio playback
keep in mind that the purpose of the callback is to fire after each audio buffer supplied to the audio queue has been played (ie consumed).. the fact that after you start the queue the callback isn't being fired.. it means that there is nothing in the audio buffers for it to consume.. or too little meaningful information for it to consume..
when you do it manually you see a lag b/c the audio queue is trying to process the empty/erroneous buffers you supplied it.. then you resupply the same buffers with valid data that the queue eventually plays and then fires the callback
solution: compare the data you put in the buffers before starting the queue with the data you are supplying manually.. i'm sure there is a difference.. if that doesn't work please show your code for further analysis
I fetch images with open-uri from a remote website and persist them on my local server within my Ruby on Rails application. Most of the images were shown without a problem, but some images just didn't show up.
After a very long debugging-session I finally found out (thanks to this blogpost) that the reason for this is that the class Buffer in the open-uri-libary treats files with less than 10kb in size as IO-objects instead of tempfiles.
I managed to get around this problem by following the answer from Micah Winkelspecht to this StackOverflow question, where I put the following code within a file in my initializers:
require 'open-uri'
# Don't allow downloaded files to be created as StringIO. Force a tempfile to be created.
OpenURI::Buffer.send :remove_const, 'StringMax' if OpenURI::Buffer.const_defined?('StringMax')
OpenURI::Buffer.const_set 'StringMax', 0
This works as expected so far, but I keep wondering, why they put this code into the library in the first place? Does anybody know a specific reason, why files under 10kb in size get treated as StringIO ?
Since the above code practically resets this behaviour globally for my entire application, I just want to make sure that I am not breaking anything else.
When one does network programming, you allocate a buffer of a reasonably large size and send and read units of data which will fit in the buffer. However, when dealing with files (or sometimes things called BLOBs) you cannot assume that the data will fit into your buffer. So, you need special handling for these large streams of data.
(Sometimes the units of data which fit into the buffer are called packets. However, packets are really a layer 4 thing, like frames are at layer 2. Since this is happening a layer 7, they might better be called messages.)
For replies larger than 10K, the open-uri library is setting up the extra overhead to write to a stream objects. When under the StringMax size, it just includes the string in the message, since it knows it can fit in the buffer.
I am creating a streaming eventmachine server. I'm concerned about avoiding blocking IO or doing anything else to muck up the event loop.
From what I've read, ruby's non-blocking IO can be used to stream files in a non-blocking way, or I can call next_tick, but I'm a little unclear about which of these approaches is preferable.
Part of the problem is that I have not found a good explanation of non-blocking IO library functions in ruby.
Short version:
Assuming a long-lived network IO operation, several wall clock minutes of streaming per file, transfer, what is the best way to do this in eventmachine without gumming up the event loop?
while 1 do
file.read do |bytes|
#conn.send_data bytes
end
end
I understand that the above code will block and I'm wondering what to put in its place. Also, I cannot use the FileStreamer class that is part of eventmachine as is, because I need to manipulate the data after it's read but before it's sent.
I think you can still use FileStreamer. FileStreamer expects its first argument to be a Connection, but this is a loose contract. As long as you implement the methods that FileStreamer expects, it should work. Take a look at this
https://gist.github.com/f4d997c3eeb6bdc5a9f3
The methods you'll need to handle are send_data and send_file_data. You can perform your manipulations here. Then pass the result along to EM::Connection.
Also, from my reading of the code, the special property of FileStreamer is that it allocates a memory mapped file (unless the file is small). You could do essentially the same thing by opening a regular Ruby File, reading blocks out of it, doing your manipulation, and emulating the behavior of FileStreamer.stream_one_chunk. Which is basically:
Each iteration must either send some data to the Connection, or reschedule itself using next_tick
Data can be repeatedly written to the Connection until the outbound buffer is full (according to get_outbound_data_size)
Once the file has been fully read, it should be closed (of course)
In fact, it seems to me that you had better not use FileStreamer unless your file will comfortably fit in memory.
You can look at the EM::Protocols for ideas about how to transform the data as it is streaming through.