I'm new to audio programming so excuse me if I'm not using the right terms...
I have two streaming buffers that I want to have playing simultaneously completely synchronized. I want to control ratio of blending between the streams. I'm sure it's as simple as having two sources playing and just changing the their gain, but I read about people doing some tricks like having 2 channels buffer instead of two single channels. Then they play from a single source but control the blending between the channels. The article I read wasn't about OpenAL so my question is: Is this even possible with OpenAL?
I guess I don't have to do it this way but now I'm curious and want to learn how to set it up. Do I suppose to setup alFilter? Creative's documentation sais "Buffers containing more than one channel of data will be played without 3D spatialization." Reading this I guess I need a pre-pass on a buffer level and then having the source output blended mono channel signal.
I guess I'll ask another question. Is OpenAL flexible enough to do tricks like this?
I decode my stream manually so I realize how easy it will be to do the blending myself before feeding the buffer but then I won't be able in real time to change the blending factor since I already have a second or so of the stream buffered.
I have two streaming buffers that I want to have playing simultaneously completely synchronized.
I want to control ratio of blending between the streams. I'm sure it's as simple as having two sources playing and just changing the their gain
Yes, it should be. Did you try that? What was the problem?
ALuint source1;
ALuint source2;
...
void set_ratio(float ratio) {
ratio=std::min(ratio,1);
alSourcef (source1, AL_GAIN, ratio);
alSourcef (source2, AL_GAIN, (1-ratio));
}
Related
I'm new to stack overflow, but I've been researching how to do this for a couple weeks to no avail. I'm hoping perhaps one of you has some knowledge I haven't seen online yet.
Here is a crude illustration of what I hope to accomplish. I have a video wall of eight monitors - four each of two different sizes. The way it's set up now, all eight monitors are treated together as one big monitor displaying an oddly shaped cutout of a desktop.
Eventually I need each individual monitor to display a separate RTSP stream for about thirty seconds, then have the entire display - all eight monitors in conjunction - to fade out into a large logo.
My problem right now is that I don't know of a way to mask an rtsp stream so it looks like this rather than this, let alone how to arrange them into a weirdly spaced, oddly angled, multiple aspect-ratio mosaic like in the original illustration.
Thank you all for your time. I'm just an intern here without insane technical knowhow, but I'll try to clarify as much as I can.
-J
I believe -filter-complex is one of the ffmpeg CLI flags that you need. You can find many examples online, but here are a few links of interest:
Here's an ffmpeg wiki on creating a mosaic https://trac.ffmpeg.org/wiki/Create%20a%20mosaic%20out%20of%20several%20input%20videos
FFMpeg - Combine multiple filter_complex and overlay functions
That should get you started, but you will probably need to add customization depending on frame size and formats.
I am a beginner to Image Processing and I want to know whether it's okay to convert a 4 channel image to a 3 channel image and a 2 channel image to a single channel image for simple image processing applications so that I only have to write code for 3 and single channel images?
99.5% of all image processing algorithms work on one channel only. If you have more channels you usually convert them into one channel.
Like if you have RGB you usually work with R,G,B separately or you convert them into H,S and I. Sometimes you have more complex conversions.
Images from cameras come without alpha channel. Its just something you use to make things transparent for web design and things like that. So besides the design stuff the alpha channel usually bears no information. Of course you are always free to somehow use that alpha channel to make transparent things black in an image that has no alpha channel e.g.
I suggest you get yourself some basic knowledge about colour spaces and image formats befor you continue with image processing. If you understand the basics you can answer such questions to yourself.
This really cannot be answered in general - it depends on your current application.
Usual images like photographs don't come with an alpha channel. If the image has one, you usually want to respect it. Sometimes blending over a single color background will do the job, sometimes it won't work.
Some algorithms work great on B/W images where one channel is the way to go, but then again, others don't.
If you're just "getting your feet wet" with CV, starting with B/W images is a reasonable approach. It just won't work for all applications.
I'm trying to extract raw streams from devices and files using ffmpeg. I notice the crucial frame information (Video: width, height, pixel format, color space, Audio: sample format) is stored both in the AVCodecContext and in the AVFrame. This means I can access it prior to the stream playing and I can access it for every frame.
How much do I need to account for these values changing frame-to-frame? I found https://ffmpeg.org/doxygen/trunk/demuxing__decoding_8c_source.html#l00081 which indicates that at least width, height, and pixel format may change frame to frame.
Will the color space and sample format also change frame to frame?
Will these changes be temporary (a single frame) or lasting (a significant block of frames) and is there any way to predict for this stream which behavior will occur?
Is there a way to find the most descriptive attributes that this stream is possible of producing, such that I can scale all the lower-quality frames up, but not offer a result that is mindlessly higher-quality than the source, even if this is a device or a network stream where I cannot play all the frames in advance?
The fundamental question is: how do I resolve the flexibility of this API with the restriction that raw streams (my output) do not have any way of specifying a change of stream attributes mid-stream. I imagine I will need to either predict the most descriptive attributes to give the stream, or offer a new stream when the attributes change. Which choice to make depends on whether these values will change rapidly or stay relatively stable.
So, to add to what #szatmary says, the typical use case for stream parameter changes is adaptive streaming:
imagine you're watching youtube on a laptop with various methods of internet connectivity, and suddenly bandwidth decreases. Your stream will automatically switch to a lower bandwidth. FFmpeg (which is used by Chrome) needs to support this.
alternatively, imagine a similar scenario in a rtc video chat.
The reason FFmpeg does what it does is because the API is essentially trying to accommodate to the common denominator. Videos shot on a phone won't ever change resolution. Neither will most videos exported from video editing software. Even videos from youtube-dl will typically not switch resolution, this is a client-side decision, and youtube-dl simply won't do that. So what should you do? I'd just use the stream information from the first frame(s) and rescale all subsequent frames to that resolution. This will work for 99.99% for the cases. Whether you want to accommodate your service to this remaining 0.01% depends on what type of videos you think people will upload and whether resolution changes make any sense in that context.
Does colorspace change? They could (theoretically) in software that mixes screen recording with video fragments, but it's highly unlikely (in practice). Sample format changes as often as video resolution: quite often in the adaptive scenario, but whether you care depends on your service and types of videos you expect to get.
Usually not often, or ever. However, this is based on the codec and are options chosen at encode time. I pass the decoded frames through swscale just in case.
I want to play sound "on-demand". A simple drum machine is what I want to program.
Is it possible to make DirectShow read from a memory buffer ?(object created by c++)
I am thinking:
Create a buffer of, lets say, 40000 positions, type double (I don't know the actual data type to use as sound, so I might be wrong with double).
40000 positions can be 1 second of playback.
The DirectShow object is supposed to read this buffer position by position, over and over again. and the buffer will contain the actual value of the output of the sound. For example (a sine-looking output):
{0, 0.4, 0.7, 0.9, 0.99, 0.9, 0.7, 0.4, 0, -0,4, -0.7, -0.9, -0.99, -0.9, -0.7, -0.4, 0}
The resolution of this sound sequence is probably not that good, but it is only to display what I mean.
Is this possible? I cannot find any examples or information about it on Google.
edit:
When working on DirectShow and streaming video (UBS camera), I used something called Sample Grabber. Which called a method for every frame from the cam. I am looking for something similar, but for music, and something that is called before the music is played.
Thanks
You want to stream your data through and injecting data into DirectShow pipeline is possible.
By design, outer DirectShow interface does not provide access to streamed data. Controlling code builds the topology, connects filters, sets them up and controls the state of the pipeline. All data is streamed behind the scenes, filters are passing pieces of data one to another and this adds up into data streaming.
Sample Grabber is the helper filter that allows to grab a copy of data being passed through certain graph point. Because otherwise payload data is not available to controlling code, Sample Grabber gained popularity, esp. for grabbing video frames out the the "inaccessible" stream, live or file backed playback.
Now when you want to do the opposite, put your own data into pipeline, the Sample Grabber concept does not work. Taking a copy of data is one thing, and proactive putting your own data into the stream is a different one.
To inject your own data you typically put your own custom filter into the pipeline that generates the data. You want to generate PCM audio data. You are choose where you take it from - generation, reading from file, memory, network, looping whatsoever. You fill buffers, you add time stamps and you deliver the audio buffers to the downstream filters. A typical starting point is PushSource Filters Sample which introduces the concept of a filter producing video data. In a similar way you want to produce PCM audio data.
A related question:
How do I inject custom audio buffers into a DirectX filter graph using DSPACK?
First let me explain the application a little bit. This is video security software that can display up to 48 cameras at once. Each video stream gets its own Windows HDC but they all use a shared OpenGL context. I get pretty good performance with OpenGL and it runs on Windows/Linux/Mac. Under the hood the contexts are created using wxWidgets 2.8 wxGLCanvas, but I don't think that has anything to do with the issue.
Now here's the issue. Say I take the same camera and display it in all 48 of my windows. This basically means I'm only decoding 30fps (which is done on a different thread anywa) but displaying up to 1440fps to take decoding out of the picture. I'm using PBOs to transfer the images over, depending on whether pixel shaders and multitexturing are supported I may use those to do YUV->RGB conversion on the GPU. Then I use a quad to position the texture and call SwapBuffers. All the OpenGL calls come from the UI thread. Also I've tried doing YUV->RGB conversion on the CPU and messed with using GL_RGBA and GL_BGRA textures, but all formats still yield roughly the same performance. Now the problem is I'm only getting around 1000fps out of the possible 1440fps (I know I shouldn't be measuring in fps, but its easier in this scenario). The above scenario is using 320x240 (YUV420) video which is roughly only 110MB/sec. If I use a 1280x720 camera then I get roughly the same framerate which is nearly 1.3GB/sec. This tells me that it certainly isn't the texture upload speed. If I do the YUV->RGB conversion and scaling on the CPU and paint using a Windows DC then I can easily get the full 1440fps.
The other thing to mention is that I've disabled vsync both on my video card and through OpenGL using wglSwapIntervalEXT. Also there are no OpenGL errors being reported. However, using very sleepy to profile the application it seems to be spending most of its time in SwapBuffers. I'm assuming the issue is somehow related to my use of multiple HDCs or with SwapBuffers somewhere, however, I'm not sure how else to do what I'm doing.
I'm no expert on OpenGL so if anyone has any suggestions or anything I would love to hear them. If there is anything that I'm doing that sounds wrong or any way I could achieve the same thing more efficiently I'd love to hear it.
Here's some links to glIntercept logs for a better understanding of all the OpenGL calls being made:
Simple RGB: https://docs.google.com/open?id=0BzGMib6CGH4TdUdlcTBYMHNTRnM
Shaders YUV: https://docs.google.com/open?id=0BzGMib6CGH4TSDJTZGxDanBwS2M
Profiling Information:
So after profiling it reported several redundant state changes which I'm not surprised by. I eliminated all of them and saw no noticeable performance difference which I kind of expected. I have 34 state changes per render loop and I am using several deprecated functions. I'll look into using vertex arrays which would solve these. However, I'm just doing one quad per render loop so I don't really expect much performance impact from this. Also keep in mind I don't want to rip everything out and go all VBOs because I still need to support some fairly old Intel chipset drivers that I believe are only OpenGL 1.4.
The thing that really interested me and it hadn't occurred to me before was that each context has its own front and back buffer. Since I'm only using one context the previous HDCs render call must finish writing to the back buffer before the swap can occur and then the next one can start writing to the back buffer again. Would it really be more efficient to use more than one context? Or should I look into rendering to textures (FBOs I think) instead and continue using one context?
EDIT: The original description mentioned using multiple OpenGL contexts, but I was wrong I'm only using one OpenGL context and multiple HDCs.
EDIT2: Added some information after profiling with gDEBugger.
What I try to make your application faster. I made one OpenGL render thread (or more if you have 2 or more video cards). Video card cannot process several context in one time, your multiple OpenGL contexts are waiting one of context. This thread will make only OpenGL work, like YUV->RGB conversion (Used FBO to render to texture). Camere`s thread send images to this thread and UI thread can picked up it to show on window.
You have query to process in OpenGL context and you can combine several frames to one texture to convert it by one pass. It maybe useful, because you have up to 48 cameras. As another variant if OpenGL thread is busy now, you can convert some frame on CPU.
From the log I see you often call the same methods:
glEnable(GL_TEXTURE_2D)
glMatrixMode(GL_TEXTURE)
glLoadIdentity()
glColor4f(1.000000,1.000000,1.000000,1.000000)
You may call it once per context and did not call for each render.
If I understung correct you use 3 texture for each plane of YUV
glTexSubImage2D(GL_TEXTURE_2D,0,0,0,352,240,GL_LUMINANCE,GL_UNSIGNED_BYTE,00000000)
glTexSubImage2D(GL_TEXTURE_2D,0,0,0,176,120,GL_LUMINANCE,GL_UNSIGNED_BYTE,000000)
glTexSubImage2D(GL_TEXTURE_2D,0,0,0,176,120,GL_LUMINANCE,GL_UNSIGNED_BYTE,00000000)
Try to use one texture and use calculation in shader to take correct YUV value for pixel. It is possible, I made it in my application.