Encode WebCam frames with H.264 on .NET

What I want to do is the following:
Get a frame from the webcam.
Encode it with an H.264 encoder.
Create a packet containing that frame, using my own "protocol", and send it via UDP.
Receive it and decode it.
It would be live streaming.
I just need help with the second step.
I'm retrieving camera images with the AForge framework.
I don't want to write frames to files and then decode them; I guess that would be very slow.
I would like to handle encoded frames in memory and then create the packets to be sent.
I need to use an open-source encoder. I already tried x264, following this example:
How does one encode a series of images into H264 using the x264 C API?
but it seems to work only on Linux, or at least that's what I thought after I got about 50 errors when trying to compile the example with Visual C++ 2010.
To be clear, I already did a lot of research (a week of reading) before writing this, but couldn't find a (simple) way to do it.
I know there is the RTMP protocol, but the video stream will only ever be watched by one person at a time, and RTMP is more oriented toward streaming to many people. I also already streamed with an Adobe Flash application I made, but it was too laggy ¬¬.
I would also like advice on whether it's OK to send frames one by one, or whether it would be better to send several of them in each packet.
I hope someone can at least point me in the right direction.
My English is not very good, apologies. :P
PS: It doesn't have to be in .NET; it can be in any language as long as it works on Windows.
Many thanks in advance.

You could try your approach using Microsoft's DirectShow technology. There is an open-source x264 wrapper available for download at Monogram.
If you download the filter, you need to register it with the OS using regsvr32. I would suggest doing some quick testing to find out whether this approach is feasible: use the GraphEdit tool to connect your webcam to the encoder and have a look at the configuration options.
I would also like advice on whether it's OK to send frames one by one, or whether it would be better to send several of them in each packet.
This really depends on the required latency: the more frames you package, the less header overhead, but the more latency, since you have to wait for multiple frames to be encoded before you can send them. For live streaming, latency should be kept to a minimum, and the typical protocols used are RTP/UDP. This implies that your maximum packet size is limited to the MTU of the network, which often requires IDR frames to be fragmented and sent in multiple packets.
My advice would be to not worry about sending more frames in one packet until/unless you have a reason to. This is more often necessary with audio streaming, since the header size (e.g. IP + UDP + RTP) is considered large relative to the audio payload.
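To make the fragmentation point concrete, here is a minimal Python sketch of splitting one encoded frame across several datagrams and putting it back together. The 8-byte header layout (frame id, fragment index, fragment count) and the 1400-byte payload limit are assumptions made up for illustration; they are not RTP and not part of any standard.

```python
import struct

# Hypothetical 8-byte header: frame id (uint32), fragment index (uint16),
# fragment count (uint16), all big-endian. Not part of any standard.
HEADER = struct.Struct("!IHH")
MAX_PAYLOAD = 1400  # assumed to fit under a typical 1500-byte Ethernet MTU


def fragment_frame(frame_id, encoded, max_payload=MAX_PAYLOAD):
    """Split one encoded frame into datagrams that each fit in one packet."""
    chunks = [encoded[i:i + max_payload]
              for i in range(0, len(encoded), max_payload)] or [b""]
    return [HEADER.pack(frame_id, index, len(chunks)) + chunk
            for index, chunk in enumerate(chunks)]


def reassemble(packets):
    """Rebuild a frame from its datagrams; return None if any is missing."""
    parts = {}
    count = None
    for packet in packets:
        _, index, count = HEADER.unpack_from(packet)
        parts[index] = packet[HEADER.size:]
    if count is None or len(parts) != count:
        return None
    return b"".join(parts[i] for i in range(count))
```

Each element returned by fragment_frame would be passed to one UDP send call. Because UDP can drop or reorder datagrams, the receiver indexes fragments by position and discards a frame that is incomplete.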

Related

Real time microphone audio manipulation windows

I would like to make an app (target PC: Windows) that lets you modify the microphone input in real time, for example by adding sound effects or even modulating your voice.
I searched the internet and only found people saying that it would not be possible without using a virtual audio cable.
However, I know of some apps with similar behavior (Voicemod, Resonance) that do not use a virtual audio cable, so I would like some help with how this can be done (just the name of a capable library would be enough) or where to start.

Firstly, you can use professional ready-made software for that: a digital audio workstation (DAW) in combination with any of a huge number of plugins.
See 5 steps to real-time process your instrument in the DAW.
And What is (audio) direct monitoring?
If you are sure you have to write your own, you can use libraries for real-time audio processing (as far as I know, C++ is better for this than C#).
These libraries really work. They are specially designed for real-time use.
https://github.com/thestk/rtaudio
http://www.portaudio.com/
See also https://en.wikipedia.org/wiki/Csound
If you don't have a professional sound interface yet but want to minimize latency, read about ASIO4ALL.

The linked tutorial worked for me. In it, a sound is recorded and saved to a .wav.
The key to having this stream to a speaker would be opening a SourceDataLine and outputting to that instead of writing to a wav file. So, instead of outputting on line 59 to AudioSystem.write, output to a SourceDataLine write method.
I don't know whether there will be a feedback issue. It's probably best to output to headphones and not your speakers!
To add an effect, the AudioInputLine has to be accessed and processed in segments. In each segment the following needs to happen:
obtain the byte array from the AudioInputLine
convert the audio bytes to PCM
apply your audio effect to the PCM (if the effect is a volume change over time, this could be done by progressively altering a volume factor between 0 and 1 and multiplying the PCM by that factor)
convert back to audio bytes
write to the SourceDataLine
All these steps have been covered in StackOverflow posts.
The linked tutorial simplifies how file locations, threads, and stopping and starting are handled. But most importantly, it shows a working, live audio line from the microphone.
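The per-segment steps listed above (bytes to PCM, apply the effect, back to bytes) can be sketched independently of the audio API. The following is a Python illustration rather than the Java from the tutorial, and it assumes 16-bit signed little-endian samples; the function names are invented for the example.

```python
import struct


def bytes_to_pcm(data):
    """Decode 16-bit signed little-endian bytes into a list of int samples."""
    count = len(data) // 2
    return list(struct.unpack("<%dh" % count, data[:count * 2]))


def pcm_to_bytes(samples):
    """Encode samples back to 16-bit little-endian bytes, clamping to range."""
    clamped = [max(-32768, min(32767, int(s))) for s in samples]
    return struct.pack("<%dh" % len(clamped), *clamped)


def fade(samples, start_gain, end_gain):
    """Scale samples by a gain moving linearly from start_gain to end_gain."""
    n = len(samples)
    if n < 2:
        return [s * start_gain for s in samples]
    step = (end_gain - start_gain) / (n - 1)
    return [s * (start_gain + i * step) for i, s in enumerate(samples)]


def process_segment(data, start_gain, end_gain):
    """One pass of the loop: bytes -> PCM -> effect -> bytes."""
    return pcm_to_bytes(fade(bytes_to_pcm(data), start_gain, end_gain))
```

In a live loop, each buffer read from the input line would go through process_segment before being written to the output line. The clamp matters for any effect that can raise the level above the 16-bit range.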

DirectX vs FFmpeg

I'm in the process of deciding how to decode received video frames, based on the following:
The platform is Windows.
Frames are encoded in H264 or H265.
The GPU should be used as much as possible.
We certainly prefer less coding and the simplest code. We just need to decode and show the result on screen. No recording is required, nor anything else.
I'm still a newbie, but I think one can decode a frame either directly with DirectX or through FFmpeg. Am I right?
If so, which one is preferred?

For a simple approach and simple code using GPU only, take a look at my project using DirectX: H264Dxva2Decoder
If you are ready to code, you can use my approach.
If not, you can use MediaFoundation or FFmpeg; both can do the job.
MediaFoundation is C++ and COM oriented. FFmpeg is C oriented. That may make a difference for you.
EDIT
You can use my program because you have frames encoded in H264 or H265. For H265, you will have to add extra code.
Of course, you need to make modifications. And yes, you can send frames to DirectX without using a file. This project uses only the avcc video file format, but it can be modified for other cases.
You don't need the atom parser. You need to modify the NALU parser if the frames are in Annex B format, for example. You will also need to modify the buffering mechanism in that case.
I can help you if you provide sample frames encoded in H264.
As for FFmpeg, it has fewer limitations than my program with respect to the H264 specification, but it does not provide the rendering mechanism. You would have to combine FFmpeg with my rendering mechanism, for example.
Or study a program like MPC-HC, which shows how to combine them. I cannot help any further here.
EDIT 2
One thing to know: you can't decode encoded packets directly on the GPU. You need to parse them first. That's why there is a NALU parser (see DXVA_PicParams_H264).
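As a rough idea of what the first parsing step looks like for Annex B input, here is a small Python sketch that splits a byte stream on start codes and reads the H264 NAL unit type. It is not the parser from H264Dxva2Decoder, and it stops well short of what the GPU path needs: filling the picture parameters also requires parsing the SPS, PPS and slice headers.

```python
START_CODE = b"\x00\x00\x01"


def split_annexb(stream):
    """Return the NAL units found between Annex B start codes."""
    nalus = []
    pos = stream.find(START_CODE)
    while pos != -1:
        start = pos + len(START_CODE)
        nxt = stream.find(START_CODE, start)
        end = len(stream) if nxt == -1 else nxt
        # A 4-byte start code (00 00 00 01) leaves a zero byte in front of the
        # next 3-byte match. A NAL unit never ends in 0x00, so trailing zeros
        # are always padding and can be dropped.
        nalu = stream[start:end].rstrip(b"\x00")
        if nalu:
            nalus.append(nalu)
        pos = nxt
    return nalus


def nal_unit_type(nalu):
    """H264 only: the type is the low 5 bits of the first header byte."""
    return nalu[0] & 0x1F
```

For H264, type 7 is an SPS, 8 a PPS, 5 an IDR slice and 1 a non-IDR slice. H265 uses a two-byte NAL header with a different layout, so nal_unit_type would need to change for it.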
If you are not ready to code and to understand how it works, use FFmpeg; it will indeed be simpler. You can focus on rendering, not on decoding.
It's also important to know which one gives a better result, consumes fewer resources (CPU, GPU, RAM (both system memory and graphics card memory)), supports a wider range of formats, etc.
You are asking for real expertise...
If you code your own program, you will be able to optimize it, and certainly get better results. If you use FFmpeg and it has performance problems in your context, you could be stuck, because you will not be modifying FFmpeg.
You say you will use Bosch cameras. Normally, all the encoded video will be in the same format, so once your code can decode it, you don't really need all of FFmpeg's features.

PCM audio streaming over websocket

I've been struggling with the following problem and can't figure out a solution. The provided Java server application sends PCM audio data in chunks over a WebSocket connection. There are no headers etc. My task is to play these raw chunks of audio data in the browser without any delay. In the earlier version, I used audioContext.decodeAudioData because I was getting the full array with the 44-byte header at the beginning. Now there is no header, so decodeAudioData cannot be used. I'll be very grateful for any suggestions and tips. Maybe I have to use some JS decoding library; any example or link will help me a lot.
Thanks.

1) Your requirement "play these raw chunks of audio data in the browser without any delay" is not possible. There is always some amount of time to send audio, receive it, and play it. Read about the term "latency." First you must get a realistic requirement. It might be 1 second or 50 milliseconds but you need to get something realistic.
2) WebSockets use TCP. TCP is designed for reliable communication, congestion control, etc. It is not designed for fast, low-latency communication.
3) Give more information about your problem. Are your client and server communicating over the Internet or over a local LAN? This will hugely affect your performance and design.
4) The 44-byte header was a WAV file header. It specifies the type of data (sample rate, mono/stereo, bits per sample). You must know this information to be able to play the audio. If you know the PCM format, you could insert the header yourself and use your decoder as you did before. Otherwise, you need to construct an audio player manually.
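For point 4, this is roughly what "insert it yourself" means. The sketch below builds the canonical 44-byte RIFF/WAVE header for uncompressed PCM. It is written in Python to show the byte layout, which is the same in any language. The sample rate, channel count and bit depth have to come from whoever runs the server; the values in the usage note are placeholders.

```python
import struct


def wav_header(data_size, sample_rate, channels, bits_per_sample):
    """Build the canonical 44-byte RIFF/WAVE header for uncompressed PCM."""
    block_align = channels * bits_per_sample // 8   # bytes per sample frame
    byte_rate = sample_rate * block_align           # bytes per second
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_size, b"WAVE",
        b"fmt ", 16,            # fmt chunk id and its size
        1,                      # audio format 1 = PCM
        channels, sample_rate, byte_rate, block_align, bits_per_sample,
        b"data", data_size,
    )
```

For example, wav_header(len(chunk), 44100, 1, 16) + chunk gives a buffer that a WAV decoder can read, assuming the chunk really is 16-bit mono at 44.1 kHz. Decoding every chunk as its own small file adds overhead and can click at chunk boundaries, which is one reason a manually built player may still be the better route.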
Streaming audio over networks is not a trivial task.

Sending per frame metadata with H264 encoded frames

We're looking for a way to send per frame metadata (for example an ID) with H264 encoded frames from a server to a client.
We're currently developing a remote rendering application, where both client and server side are actively involved.
The server renders a high quality image with all effects, lighting etc.
The client also has the model information and renders a diffuse image that is used when the bandwidth is too low or when the images have to be warped in order to avoid stuttering.
So far we're encoding the frames on the server side with ffmpeg and streaming them with live555 to the client, who receives an rtsp-stream and decodes the frames again using ffmpeg.
For our application, we now need to send per frame metadata.
We want the client to tell the server where the camera is right now.
Ideally we'd be able to send the client's view matrix to the server, render the corresponding frame and send it back to the client together with its view matrix. So when the client receives a frame, we need to know exactly at what camera position the frame was rendered.
Alternatively we could also tag each view matrix with an ID, send it to the server, render the frame and tag it with the same ID and send it back. In this case we'd have to assign the right matrix to the frame again on the client side.
After several attempts to realize the above with ffmpeg, we came to the conclusion that ffmpeg does not provide the required functionality. ffmpeg only provides a fixed, predefined set of metadata fields, which either cannot store a matrix or can only be set for every key frame, which is not frequent enough for our purpose.
Now we're considering using live555. So far we have an on-demand server, which gets a VideoSubsession with an H264VideoStreamDiscreteFramer that contains our own FramedSource class. In this class we load the encoded AVPacket (from ffmpeg) and send its data buffer over the network. Now we need a way to send some kind of metadata with every frame to the client.
Do you have any ideas on how to solve this metadata problem with live555 or another library?
Thanks for your help!

It seems this question was answered in the comments:
pipe the output of ffmpeg through a custom tool that embedded the data in the 264 elementary stream via an SEI
Someone also gave the following answer, which was deleted a few years ago for dubious reasons (it is brief but does seem to contain sufficient information):
You can do so using MPEG-4. See details for MPEG-4 Part 14 for details.
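For the SEI suggestion quoted above, here is a minimal Python sketch of what such a custom tool might emit in front of each frame's slice NAL units: an SEI NAL unit with a user_data_unregistered payload (payload type 5). The UUID is an arbitrary placeholder that both sides would have to agree on. Whether a given pipeline (live555 on the way out, the decoder on the way in) passes the SEI through untouched is something to verify, not something this sketch guarantees.

```python
import uuid

# Hypothetical application UUID; any fixed 16-byte value both sides agree on.
APP_UUID = uuid.UUID("12345678-1234-5678-1234-567812345678").bytes


def escape_rbsp(rbsp):
    """Insert emulation-prevention bytes (03) so no start code appears inside."""
    out = bytearray()
    zeros = 0
    for byte in rbsp:
        if zeros >= 2 and byte <= 3:
            out.append(3)
            zeros = 0
        out.append(byte)
        zeros = zeros + 1 if byte == 0 else 0
    return bytes(out)


def build_sei_nal(user_data, app_uuid=APP_UUID):
    """Build an Annex B SEI NAL unit carrying user_data_unregistered."""
    payload = app_uuid + user_data
    size = len(payload)
    rbsp = bytearray([5])      # payload type 5 = user_data_unregistered
    while size >= 255:         # size is coded as 0xFF bytes plus a remainder
        rbsp.append(0xFF)
        size -= 255
    rbsp.append(size)
    rbsp += payload
    rbsp.append(0x80)          # rbsp_trailing_bits: stop bit, then zero padding
    # 4-byte start code, then NAL header 0x06 (nal_ref_idc 0, type 6 = SEI).
    return b"\x00\x00\x00\x01\x06" + escape_rbsp(bytes(rbsp))
```

The user_data argument could be a frame ID or the 16 floats of a view matrix packed with struct. On the client, the same bytes are recovered by finding the type 6 NAL unit, removing the emulation-prevention bytes and checking the UUID.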

Analyse audio stream using Ruby

I'm searching for a way to analyse the content of internet radio stations. I want to write a Ruby client that can get the current track, next track, band, BPM and other meta information from a stream (e.g. a radio station on SHOUTcast).
Does anybody know how to do this? And how do I record that stream into an MP3 or AAC file?
Maybe there is a library that can already do this; I haven't found one so far.
Regards

I'll answer both of your questions.
Metadata
What you are seeking isn't entirely possible. Information on the next track is not available (keep in mind not all stations are just playing songs from a playlist... many offer live content). Advanced metadata such as BPM is not available. All you get is something like this:
Some Band - Some Song
The format of {artist} - {song title} isn't always followed either.
With those caveats, you can get that metadata from a stream by connecting to the stream URL and requesting the metadata with the following request header:
Icy-MetaData: 1
That tells the server to send the metadata, which is interleaved into the stream. Every 8KB or so (specified by the server in a response header), you'll find a chunk of metadata to parse. I have written up a detailed answer on how to parse that here: Pulling Track Info From an Audio Stream Using PHP. That question was language-specific, but you will find that my answer can be easily implemented in any language.
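As a rough illustration of the interleaving described above, here is a Python sketch (not the PHP from the linked answer) that separates audio from metadata blocks. It assumes the interval from the server's response header (commonly named icy-metaint) is already known, and that the stream object's read(n) returns n bytes unless the stream ends, as a buffered file-like object does. Making the HTTP request itself is not shown.

```python
import io


def split_icy(stream, metaint):
    """Yield ("audio", bytes) and ("meta", str) items from an ICY stream body."""
    while True:
        audio = stream.read(metaint)
        if audio:
            yield "audio", audio
        if len(audio) < metaint:
            return                      # stream ended inside an audio block
        length_byte = stream.read(1)
        if not length_byte:
            return
        # The length byte counts 16-byte units; zero means no metadata change.
        meta = stream.read(length_byte[0] * 16)
        if meta:
            yield "meta", meta.rstrip(b"\x00").decode("utf-8", "replace")


def stream_title(meta):
    """Extract the StreamTitle value from a metadata string, or None."""
    marker = "StreamTitle='"
    start = meta.find(marker)
    if start == -1:
        return None
    start += len(marker)
    end = meta.find("';", start)
    return meta[start:end] if end != -1 else meta[start:]


if __name__ == "__main__":
    # Synthetic body: 4 audio bytes, then a 48-byte (3 x 16) metadata block.
    block = b"StreamTitle='Some Band - Some Song';".ljust(48, b"\x00")
    fake = io.BytesIO(b"\x01\x02\x03\x04" + bytes([3]) + block)
    for kind, value in split_icy(fake, metaint=4):
        if kind == "meta":
            print(stream_title(value))
```

The "audio" items are also exactly the bytes you would append to a file when recording, since the metadata has to be stripped out before the data is a playable MP3 or AAC stream.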
Saving Streams to Disk
Audio playing software is generally very resilient to errors. SHOUTcast servers are built on this principle, and are not knowledgeable about the data going through them. They just receive data from an encoder, and when the client requests the stream, they start sending that data at an arbitrary point.
You can use this to your advantage when saving stream data. It is possible to simply write the stream data as it comes in to a file. Most audio players will play them without problem. I have tested this with MP3 and AAC.
If you want a more conformant file, you will have to use a library or parse the stream yourself to split on the appropriate frames, and then handle bit reservoir issues in your code. This is a lot of work, and generally isn't worth doing unless you find your files have real compatibility problems.
