Is there a way to serve the video part, providing start end seconds? - go

I am trying to use the backend to serve a video from storage. I use Go + Gin. It works, but I need to implement video requests with start and end parameters. For example, I have a video with a duration of 10 minutes and I want to request the fragment from minute 2 to minute 3. Is this possible, or are there examples somewhere?
This is what I have now:
accessKeyID := ""
secretAccessKey := ""
useSSL := false
ctx := context.Background()
endpoint := "127.0.0.1:9000"
bucketName := "mybucket"
// Initialize minio client object.
minioClient, err := minio.New(endpoint, &minio.Options{
Creds: credentials.NewStaticV4(accessKeyID, secretAccessKey, ""),
Secure: useSSL,
})
if err != nil {
log.Fatalln(err)
}
// Get file
object, err := minioClient.GetObject(ctx, bucketName, "1.mp4", minio.GetObjectOptions{})
if err != nil {
fmt.Println(err)
return
}
objInfo, err := object.Stat()
if err != nil {
return
}
buffer := make([]byte, objInfo.Size)
object.Read(buffer)
c.Writer.Header().Set("Content-Length", fmt.Sprintf("%d", objInfo.Size))
c.Writer.Header().Set("Content-Type", "video/mp4")
c.Writer.Header().Set("Connection", "keep-alive")
c.Writer.Header().Set("Content-Range", fmt.Sprintf("bytes 0-%d/%d", objInfo.Size, objInfo.Size))
//c.Writer.Write(buffer)
c.DataFromReader(200, objInfo.Size, "video/mp4", bytes.NewReader(buffer), nil)

This will require your program to at least demux the media stream to get time information out of it, if you're using a container that supports that, or to actually decode the video stream if it doesn't. In general, you can't know how many bytes you need to seek into a video file to reach a specific time¹.
Because the output again needs to be a valid media container that the requester can handle, the selected part has to be remuxed into an output container.
So, pick a library that can do that and read its documentation. FFmpeg (libav*) is the classic choice, but I have no idea whether someone has already written Go bindings for it. If not, writing them would be worthwhile.
¹ There are cases where you can; that would probably apply to MPEG Transport Streams with a fixed mux bitrate. But unless you're streaming video for actual TV towers or satellites that need a constant-rate data stream, you are unlikely to be dealing with these.
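One way to get the demux/remux step without Go bindings is to call the ffmpeg command-line tool through os/exec. The following is only a minimal sketch under assumptions: the helper name clipArgs is made up for illustration, ffmpeg is assumed to be installed and on PATH, and the input is assumed to be something ffmpeg can open itself (a local path or URL). Because it uses stream copy instead of re-encoding, the cut lands on a keyframe near the requested start rather than on the exact second.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
)

// clipArgs (hypothetical helper) builds ffmpeg arguments that read the range
// [startSec, endSec) from input and write a fragmented MP4 to stdout.
func clipArgs(input string, startSec, endSec float64) ([]string, error) {
	if startSec < 0 || endSec <= startSec {
		return nil, fmt.Errorf("invalid range %.3f-%.3f", startSec, endSec)
	}
	return []string{
		"-ss", strconv.FormatFloat(startSec, 'f', 3, 64), // seek in the input
		"-t", strconv.FormatFloat(endSec-startSec, 'f', 3, 64), // duration to read
		"-i", input,
		"-c", "copy", // remux only, no re-encoding
		"-movflags", "frag_keyframe+empty_moov", // MP4 that can be written to a pipe
		"-f", "mp4",
		"pipe:1", // write to stdout
	}, nil
}

func main() {
	args, err := clipArgs("1.mp4", 120, 180) // minutes 2 to 3
	if err != nil {
		fmt.Println(err)
		return
	}
	cmd := exec.Command("ffmpeg", args...)
	// In an HTTP handler, cmd.Stdout could be set to the response writer
	// before calling cmd.Run(). Here we only print the command line.
	fmt.Println(cmd.Args)
}
```

If exact start and end times matter more than speed, the stream copy would have to be replaced by a re-encode, which costs CPU time on every request.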

Related

pion/webrtc - How do I set audio sink and source in GO Pion API?

I'm working on a CLI Go app to run in the background on my Linux server. This is an implementation of pion/webrtc. My Go app is connecting to the Janus, but not receiving or sending audio. I need to send microphone audio and receive audio from Janus. I guess that I should link my audio sink/source in pion, but I'm confused.
I'm not sure about this code:
// Create a audio track
opusTrack, err := webrtc.NewTrackLocalStaticSample(webrtc.RTPCodecCapability{MimeType: "audio/opus"}, "audio", "pion")
if err != nil {
panic(err)
} else if _, err = peerConnection.AddTrack(opusTrack); err != nil {
panic(err)
}
Nor about this:
gst.CreatePipeline("opus", []*webrtc.TrackLocalStaticSample{opusTrack}, "audiotestsrc").Start()
I used the sample code of pion/example-webrtc-applications/janus-gateway.
My whole code here.
Thanks for helping!
This is what I got:
gst.CreatePipeline("opus", []*webrtc.TrackLocalStaticSample{opusTrack}, "autoaudiosrc").Start()
The key is using "autoaudiosrc" as the source instead of "audiotestsrc".
Linux Ubuntu Server 21.04.4

How to handle chunked file upload

I'm creating a simple application that allows users to upload big files using simple-uploader, since this plugin sends the files in chunks instead of as one big file. The problem is that when I save the file, only the first chunk is saved. Is there a way in Go to wait for all the chunks to arrive at the server and then save the file afterward?
Here's a snippet of the code I'm doing:
dFile, err := c.FormFile("file")
if err != nil {
return SendError(c, err)
}
filename := dFile.Filename
f, err := dFile.Open()
if err != nil {
return SendError(c, err)
}
defer f.Close()
// save file in s3
duration := sss.UploadFile(f, "temp/"+filename)
... send response
By the way for this project, I'm using the fiber framework.
While working on this I came across tus-js-client, which does the same job as simple-uploader, and its Go server implementation, tusd, which reassembles the chunks so you don't have to worry about it anymore.
Here's a discussion where I posted my solution: https://stackoverflow.com/a/65785097/549529.
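For readers who want to see what "waiting for all the chunks" involves without tusd, here is a rough in-memory sketch. It is not taken from the linked solution. The assembler type and its Add method are hypothetical, and it assumes the client sends a 1-based chunk number, the total chunk count, and an upload identifier with every chunk (simple-uploader is assumed to provide these as form fields). A real server would more likely write chunks to disk or use S3 multipart uploads instead of holding everything in memory.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// assembler (hypothetical) collects the chunks of each upload until complete.
type assembler struct {
	mu     sync.Mutex
	chunks map[string]map[int][]byte // upload id -> chunk number -> data
}

func newAssembler() *assembler {
	return &assembler{chunks: make(map[string]map[int][]byte)}
}

// Add stores chunk n (1-based) of total for upload id. Once every chunk has
// arrived it returns the reassembled file and true, and forgets the upload.
func (a *assembler) Add(id string, n, total int, data []byte) ([]byte, bool) {
	if n < 1 || n > total {
		return nil, false // ignore chunk numbers outside the declared range
	}
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.chunks[id] == nil {
		a.chunks[id] = make(map[int][]byte)
	}
	a.chunks[id][n] = append([]byte(nil), data...) // copy, the caller may reuse data
	if len(a.chunks[id]) < total {
		return nil, false
	}
	var buf bytes.Buffer
	for i := 1; i <= total; i++ {
		buf.Write(a.chunks[id][i]) // concatenate in chunk order
	}
	delete(a.chunks, id)
	return buf.Bytes(), true
}

func main() {
	a := newAssembler()
	a.Add("upload-1", 2, 2, []byte("world")) // chunks may arrive out of order
	file, done := a.Add("upload-1", 1, 2, []byte("hello "))
	fmt.Println(done, string(file))
}
```

In the handler, the upload to S3 would then run only when Add reports that the file is complete.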

Encoding a file to send to Google AutoML

I am writing a golang script to send an image to the prediction engine of Google AutoML API.
It accepts most files using the code below, but for certain .jpg or .jpeg files it returns error 500 saying the file is invalid. Mostly it works, but I can't figure out the exceptions. They are perfectly valid JPEGs.
I am encoding the payload using EncodeToString.
Among other things, I have tried decoding it, saving it to a PNG, nothing seems to work. It doesn't like some images.
I wonder if I have an error in my method? Any help would be really appreciated. Thanks
PS the file saves to the filesystem and uploads to S3 just fine. It's just the encoding to a string when it goes to Google that it fails.
imgFile, err := os.Open(filename)
if err != nil {
fmt.Println(err)
}
img, fname, err := image.Decode(imgFile)
if err != nil {
fmt.Println(fname)
}
buf := new(bytes.Buffer)
err = jpeg.Encode(buf, img, nil)
// Encode as base64.
imgBase64Str := base64.StdEncoding.EncodeToString(buf.Bytes())
defer imgFile.Close()
payload := fmt.Sprintf(`{"payload": {"image": {"imageBytes": "%v"},}}`, imgBase64Str)
// send as a byte
pay := bytes.NewBuffer([]byte(payload))
req, err := http.NewRequest(http.MethodPost, URL.String(), pay)
I believe I fixed it.
I looked in the Google docs again and for the speech to text (which is a different API) it says to do encode64 -w 0
So, looking in Go docs, it seems RawStdEncoding is right to use to replicate this behaviour, not StdEncoding
No image failures yet. Hope this helps someone else one day.
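To make the difference between the two encoders concrete: in Go, base64.StdEncoding pads its output with "=" and base64.RawStdEncoding omits the padding, and neither inserts line breaks. The sketch below shows both on a tiny input and builds the request body with encoding/json instead of fmt.Sprintf, which avoids hand-written JSON mistakes such as a trailing comma. The names encodeBoth, buildPayload and the request type are made up for this example, and the payload shape is copied from the question rather than checked against the AutoML documentation.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// request mirrors the JSON shape used in the question (an assumption here).
type request struct {
	Payload struct {
		Image struct {
			ImageBytes string `json:"imageBytes"`
		} `json:"image"`
	} `json:"payload"`
}

// encodeBoth returns the padded and the unpadded base64 form of data.
func encodeBoth(data []byte) (std, raw string) {
	return base64.StdEncoding.EncodeToString(data),
		base64.RawStdEncoding.EncodeToString(data)
}

// buildPayload wraps an already-encoded image string in the request JSON.
func buildPayload(imgBase64 string) ([]byte, error) {
	var r request
	r.Payload.Image.ImageBytes = imgBase64
	return json.Marshal(r)
}

func main() {
	std, raw := encodeBoth([]byte("hi"))
	fmt.Println(std, raw) // the two differ only in the trailing "="

	body, err := buildPayload(raw)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(string(body))
}
```

The two forms differ only when the input length is not a multiple of three, which would fit a failure that shows up for some files and not others.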

Google Speech API + Go - Transcribing Audio Stream of Unknown Length

I have an rtmp stream of a video call and I want to transcribe it. I have created 2 services in Go and I'm getting results but it's not very accurate and a lot of data seems to get lost.
Let me explain.
I have a transcode service, I use ffmpeg to transcode the video to Linear16 audio and place the output bytes onto a PubSub queue for a transcribe service to handle. Obviously there is a limit to the size of the PubSub message, and I want to start transcribing before the end of the video call. So, I chunk the transcoded data into 3 second clips (not fixed length, just seems about right) and put them onto the queue.
The data is transcoded quite simply:
var stdout Buffer
cmd := exec.Command("ffmpeg", "-i", url, "-f", "s16le", "-acodec", "pcm_s16le", "-ar", "16000", "-ac", "1", "-")
cmd.Stdout = &stdout
if err := cmd.Start(); err != nil {
log.Fatal(err)
}
ticker := time.NewTicker(3 * time.Second)
for {
select {
case <-ticker.C:
bytesConverted := stdout.Len()
log.Infof("Converted %d bytes", bytesConverted)
// Send the data we converted, even if there are no bytes.
topic.Publish(ctx, &pubsub.Message{
Data: stdout.Bytes(),
})
stdout.Reset()
}
}
The transcribe service pulls messages from the queue at a rate of one every 3 seconds, which helps it process the audio data at about the same rate as it is being created. There are limits on the Speech API stream: it can't be longer than 60 seconds. So I stop the old stream and start a new one every 30 seconds, which means we never hit the limit no matter how long the video call lasts.
This is how I'm transcribing it:
stream := prepareNewStream()
clipLengthTicker := time.NewTicker(30 * time.Second)
chunkLengthTicker := time.NewTicker(3 * time.Second)
cctx, cancel := context.WithCancel(context.TODO())
err := subscription.Receive(cctx, func(ctx context.Context, msg *pubsub.Message) {
select {
case <-clipLengthTicker.C:
log.Infof("Clip length reached.")
log.Infof("Closing stream and starting over")
err := stream.CloseSend()
if err != nil {
log.Fatalf("Could not close stream: %v", err)
}
go getResult(stream)
stream = prepareNewStream()
case <-chunkLengthTicker.C:
log.Infof("Chunk length reached.")
bytesConverted := len(msg.Data)
log.Infof("Received %d bytes\n", bytesConverted)
if bytesConverted > 0 {
if err := stream.Send(&speechpb.StreamingRecognizeRequest{
StreamingRequest: &speechpb.StreamingRecognizeRequest_AudioContent{
AudioContent: transcodedChunk.Data,
},
}); err != nil {
resp, _ := stream.Recv()
log.Errorf("Could not send audio: %v", resp.GetError())
}
}
msg.Ack()
}
})
I think the problem is that my 3 second chunks don't necessarily line up with the starts and ends of phrases or sentences. I suspect that the Speech API is a recurrent neural network that has been trained on full sentences rather than individual words, so starting a clip in the middle of a sentence loses some data because it can't figure out the first few words up to the natural end of a phrase. I also lose some data when changing from an old stream to a new stream, because some context is lost. I guess overlapping clips might help with this.
I have a couple of questions:
1) Does this architecture seem appropriate for my constraints (unknown length of audio stream, etc.)?
2) What can I do to improve accuracy and minimise lost data?
(Note I've simplified the examples for readability. Point out if anything doesn't make sense because I've been heavy handed in cutting the examples down.)
I think you are right that splitting the audio into chunks causes many words to be chopped off.
I see another problem in the publishing. Between the calls topic.Publish and stdout.Reset() some time will pass and ffmpeg will probably have written some unpublished bytes to stdout, which will get cleared by the reset.
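One way to close that gap is to take the bytes and clear the buffer in a single locked step, so nothing written between "read" and "reset" can be lost. The following is a minimal sketch, not the asker's code: drainBuffer is a hypothetical type that can be assigned to cmd.Stdout because it implements io.Writer.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// drainBuffer (hypothetical) is a writer whose contents can be taken and
// cleared atomically with respect to concurrent writes.
type drainBuffer struct {
	mu  sync.Mutex
	buf bytes.Buffer
}

// Write appends p to the buffer; it satisfies io.Writer.
func (d *drainBuffer) Write(p []byte) (int, error) {
	d.mu.Lock()
	defer d.mu.Unlock()
	return d.buf.Write(p)
}

// Drain returns a copy of everything written so far and empties the buffer.
// Writes that happen after Drain returns stay in the buffer for the next call.
func (d *drainBuffer) Drain() []byte {
	d.mu.Lock()
	defer d.mu.Unlock()
	out := append([]byte(nil), d.buf.Bytes()...) // copy before Reset reuses the memory
	d.buf.Reset()
	return out
}

func main() {
	var d drainBuffer
	d.Write([]byte("first chunk"))
	fmt.Println(len(d.Drain())) // bytes taken for publishing
	fmt.Println(len(d.Drain())) // nothing left until ffmpeg writes again
}
```

The copy also matters on its own: a slice returned by bytes.Buffer.Bytes is only valid until the next write or reset, so handing it to an asynchronous publisher without copying is risky.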
I am afraid the architecture is not well suited to your problem. The message size constraint causes many problems. The idea of a PubSub system is that a publisher notifies subscribers of events, not necessarily that it carries a large payload.
Do you really need two services? You could use two goroutines that communicate via a channel. That would eliminate the PubSub system.
A strategy would be to make the chunks as large as possible. A possible solution:
Make the chunks as large as possible (nearly 60 seconds)
Make the chunks overlap each other by a short time (e.g. 5 seconds)
Programmatically detect the overlaps and remove them
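The first two steps above can be sketched as a function that computes overlapping windows over the audio. This is an illustration only: the window type and overlappingWindows are hypothetical names, offsets are in whatever unit you pass in (samples or seconds), and the third step, detecting and removing the duplicated words in the transcripts, is not covered here.

```go
package main

import "fmt"

// window is a half-open range [Start, End) over the audio.
type window struct{ Start, End int }

// overlappingWindows splits total units of audio into windows of at most
// size units, where each window repeats the last overlap units of the
// previous one. It returns nil for arguments that make no sense.
func overlappingWindows(total, size, overlap int) []window {
	if total <= 0 || size <= 0 || overlap < 0 || overlap >= size {
		return nil
	}
	step := size - overlap
	var ws []window
	for start := 0; start < total; start += step {
		end := start + size
		if end > total {
			end = total // the last window may be shorter
		}
		ws = append(ws, window{start, end})
		if end == total {
			break // avoid a trailing window that is pure overlap
		}
	}
	return ws
}

func main() {
	// 120 seconds of audio, 55-second clips, 5 seconds of overlap.
	for _, w := range overlappingWindows(120, 55, 5) {
		fmt.Printf("%ds to %ds\n", w.Start, w.End)
	}
}
```

With 16 kHz mono 16-bit audio as in the question, a second is 32000 bytes, so the same function works on byte offsets if the sizes are multiplied accordingly.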

PortAudio: Playback lag at default frames-per-buffer

I'm trying to play audio in Go, asynchronously, using PortAudio. As far as I'm aware, PortAudio handles its own threading, so I don't need to use any of Go's built-in concurrency features. I'm using libsndfile (also via Go bindings) to load the file. Here is my code:
type Track struct {
stream *portaudio.Stream
playhead int
buffer []int32
}
func LoadTrackFilesize(filename string, loop bool, bytes int) *Track {
// Load file
var info sndfile.Info
soundFile, err := sndfile.Open(filename, sndfile.Read, &info)
if err != nil {
fmt.Printf("Could not open file: %s\n", filename)
panic(err)
}
buffer := make([]int32, bytes)
numRead, err := soundFile.ReadItems(buffer)
if err != nil {
fmt.Printf("Error reading from file: %s\n", filename)
panic(err)
}
defer soundFile.Close()
// Create track
track := Track{
buffer: buffer[:numRead],
}
// Create stream
stream, err := portaudio.OpenDefaultStream(
0, 2, float64(44100), portaudio.FramesPerBufferUnspecified, track.playCallback,
)
if err != nil {
fmt.Printf("Couldn't get stream for file: %s\n", filename)
}
track.stream = stream
return &track
}
func (t *Track) playCallback(out []int32) {
for i := range out {
out[i] = t.buffer[(t.playhead+i)%len(t.buffer)]
}
t.playhead += len(out) % len(t.buffer)
}
func (t *Track) Play() {
t.stream.Start()
}
Using these functions, after initialising PortAudio and all the rest, plays the audio track I supply - just. It's very laggy, and slows down the rest of my application (a game loop).
However, if I change the frames per buffer value from FramesPerBufferUnspecified to something high, say, 1024, the audio plays fine and doesn't interfere with the rest of my application.
Why is this? The PortAudio documentation suggests that using the unspecified value will 'choose a value for optimum latency', but I'm definitely not seeing that.
Additionally, when playing with this very high value, I notice some tiny artefacts - little 'popping' noises - in the audio.
Is there something wrong with my callback function, or anything else, that could be causing one or both of these problems?
I'm using OSX 10.10.5, with Go 1.3.3 and the libsndfile and portaudio from Homebrew.
Thanks.
Moving the comment to an answer:
Always test with the latest version of Go.
Also, @Joel figured out that you need to use float32 instead of int32.

Resources