Retrieve audio data in JSON format and play it through Amazon Alexa - aws-lambda

I am working on a custom lambda function in JavaScript for the Amazon Alexa. Amazon's docs have clear details on building custom skills, and I have successfully built several "stock" skills from their templates.
I am writing a unique skill now which must retrieve the JSON data located at this link:
https://api.ense.nyc/latest
and then 'play' that data (since the data is snippets of audio) through the Alexa. I am not sure what to write to bring about this functionality.

This is a bit more complicated than the average stock skill; from the URL, it looks like a podcast skill.
You need to:
1) Parse the JSON and get the audio URL from the list.
2) Set the skill state to PLAY_MODE.
3) Keep track of audio progress with the audio event handlers.
4) Probably use DynamoDB or a similar database to persist the playback state, in case the session ends while long audio is still playing.
Here is a sample skill that parses an RSS feed for a podcast and then plays the episodes in sequence:
https://github.com/bespoken/streamer
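For the long-audio case, playback is started by returning an AudioPlayer.Play directive rather than speech. Below is a minimal sketch of building such a response by hand; the helper name buildPlayResponse, the example URL, and the token value are illustrative assumptions and are not taken from the sample skill above.

```javascript
// Build a raw skill response that starts (or resumes) playback of one stream.
// token identifies the stream in later playback events; offsetMs lets the
// skill resume from a position it saved earlier (for example in DynamoDB).
function buildPlayResponse(url, token, offsetMs = 0) {
  return {
    version: '1.0',
    response: {
      directives: [
        {
          type: 'AudioPlayer.Play',
          playBehavior: 'REPLACE_ALL', // drop anything already queued
          audioItem: {
            stream: {
              url: url, // must be an HTTPS URL
              token: token,
              offsetInMilliseconds: offsetMs,
            },
          },
        },
      ],
      shouldEndSession: true,
    },
  };
}

// Hypothetical usage: resume a clip 30 seconds in.
const playResponse = buildPlayResponse('https://example.com/episode1.mp3', 'episode-1', 30000);
console.log(JSON.stringify(playResponse, null, 2));
```

The playback events that Alexa sends back carry the same token and the current offset, which is what you would persist to resume later.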

It seems that the audio files are short. In that case, connect to the endpoint using an HTTP client library (e.g. the built-in http/https modules, node-fetch, or axios in Node.js). Once you have the JSON, navigate to the properties that hold the audio, get the URLs, wrap each one in an audio tag, <audio src="url"/>, and send them in a standard speech response from your skill. The audio tag has duration and quality limitations, so if you run into issues, the audio is probably longer or of a different quality than expected.
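A minimal sketch of the tag-building step follows. The field name fileUrl and the example URLs are hypothetical, since the real property names have to be read from the API response. The default cap of 5 clips reflects the per-response limit on audio tags as I understand it from the SSML reference, so verify it against the current documentation.

```javascript
// Escape characters that are special inside an XML attribute value.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build an SSML string that plays each clip in order.
// Non-HTTPS entries are skipped because the audio tag requires HTTPS sources.
function buildSsml(audioUrls, maxClips = 5) {
  const tags = audioUrls
    .filter((url) => typeof url === 'string' && url.startsWith('https://'))
    .slice(0, maxClips)
    .map((url) => `<audio src="${escapeXml(url)}"/>`);
  return `<speak>${tags.join('')}</speak>`;
}

// Hypothetical payload shape; replace fileUrl with the real property name.
const sampleItems = [
  { fileUrl: 'https://example.com/a.mp3' },
  { fileUrl: 'https://example.com/b.mp3?x=1&y=2' },
];
console.log(buildSsml(sampleItems.map((item) => item.fileUrl)));
```

The <speak> wrapper is needed when you fill in outputSpeech.ssml in a raw response yourself; SDK helpers may add it for you, in which case pass only the audio tags.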

1) The audio must be publicly accessible as an .mp3 file.
2) The audio must be in an Alexa-friendly format.
Converting audio files to an Alexa-friendly format using Audacity:
1) Open the file to convert.
2) Set the Project Rate in the lower-left corner to 16000.
3) Click File > Export Audio and change the Save as type to MP3 Files.
4) Click Options, set the Quality to 48 kbps and the Bit Rate Mode to Constant.

Related

Playback raw sound data in memory using MFC functions

I have an MFC-based project that decodes some data and generates 16-bit, 48000 Hz raw wav audio data.
The program continuously generates wav audio data in real time.
Are there any functions in MFC that will let me play back the audio data through the sound card? I have been googling around for a while, and the consensus seems to be that MFC doesn't have this feature. I have also found this tutorial that shows how to play back a wav file using the PlaySound() function, but it looks like it only works for wav files. Even if it can play audio data from memory, that data has to be prepared as a full wav file with all the header information, while I need to play back raw wav data generated in real time.
I have also seen people suggest using DirectX, but I feel like something like this should be possible using basic Windows library functions without any extra libraries. I also found this tutorial for creating and reading wav files in an MFC-based project, but it's not really clear how to use it to play raw wav data in memory. This tutorial uses the waveOutOpen() function to play back the wav file, and it looks like this is probably what I need, but I cannot find a simple tutorial that shows how to use it.
How do I play back raw wav audio in memory in an MFC dialog-based project? I am looking for something where I can specify a pointer to the wav data, the number of samples, the bit depth, and the sampling frequency, and the function would play back the wav data for me. A basic working example, such as generating a sine wave and playing it back, would be appreciated. If DirectX is the only way to do this, then that's fine as well.

Dynamic video creation using multiple images

I want to create a user video that takes a photo album as input and plays exactly like the Facebook Look Back video.
I have looked at a couple of options, including ImageMagick and ffmpeg. Are there any good alternatives available for doing this?
If you want to create a video dynamically through the browser, you cannot do this on the client side (not in a convenient way, anyway). There is no functionality in browsers today that allows you to create video files (only streams), so the only option is to write JavaScript code to do all the low-level encoding, which will take ages (both to write and to run) and be prone to errors.
Your best option is to send the individual frames to the server as, for example, JPEG (or PNG if you need high quality) and process them there using jobs, where the processing can be done with, for example, FFmpeg (which is great for these things).
Track the job id using some sort of user id and have a database or file updated with current status so the user can come back and check.

Take image out of video stream in ruby

I have a link to a video stream (a webcam that is always recording some place). I would like to be able to take a screenshot of whatever is on that video stream at the moment a user goes to my app.
Can it be done and how?
I have looked but all I could find was for taking screenshots out of a movie/video, not out of a streaming video.
I suspect ffmpeg connected to the streaming service as an input could probably extract thumbnails for you. You could either leave it running and pick up latest thumbnails, or fire it up with a system command and make it connect and emit a single screenshot. The latter would be more efficient and easier to code if you have a low number of hits, but would have a high latency on each request.
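The "fire it up and emit a single screenshot" option could be sketched as below. It is written in JavaScript for illustration; from Ruby, the same command line could be run with a system call. It assumes ffmpeg is installed and on the PATH and that it can read the stream URL directly. The function names and the 15 second timeout are illustrative choices.

```javascript
const { execFile } = require('child_process');

// Arguments that make ffmpeg connect, decode one video frame, and exit.
// -y overwrites the output file; -frames:v 1 stops after a single frame.
function snapshotArgs(streamUrl, outputPath) {
  return ['-y', '-i', streamUrl, '-frames:v', '1', outputPath];
}

// Run ffmpeg once per request; callback receives (error, outputPath).
// The timeout guards against a stream that never delivers a frame.
function takeSnapshot(streamUrl, outputPath, callback) {
  execFile('ffmpeg', snapshotArgs(streamUrl, outputPath), { timeout: 15000 }, (error) => {
    callback(error, outputPath);
  });
}

// Hypothetical usage:
// takeSnapshot('http://camera.example/stream', 'latest.jpg', (err, path) => { ... });
```

Each call pays the cost of connecting to the stream, which is the per-request latency mentioned above.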
I did a quick search for you, but the most common uses of ffmpeg with streaming input are to re-format and re-stream, or to use it in a personal video recorder setup. Ffmpeg is quite complex, so I could not complete the search in the time I have had so far.

Analyse audio stream using Ruby

I'm searching for a way to analyse the content of internet radios. I want to write a ruby client that can get the current track, next track, band, bpm and other meta information from a stream (e.g. a radio on shoutcast).
Does anybody know how to do this? And how do I record that stream into a mp3 or aac file?
Maybe there is a library that can already do this, but I haven't found one so far.
regards
I'll answer both of your questions.
Metadata
What you are seeking isn't entirely possible. Information on the next track is not available (keep in mind not all stations are just playing songs from a playlist... many offer live content). Advanced metadata such as BPM is not available. All you get is something like this:
Some Band - Some Song
The format of {artist} - {song title} isn't always followed either.
With those caveats, you can get that metadata from a stream by connecting to the stream URL and requesting the metadata with the following request header:
Icy-MetaData: 1
That tells the server to send the metadata, which is interleaved into the stream. Every 8KB or so (the interval is specified by the server in a response header), you'll find a chunk of metadata to parse. I have written up a detailed answer on how to parse that here: Pulling Track Info From an Audio Stream Using PHP. That question was language-specific, but you will find that my answer can be easily implemented in any language.
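A minimal sketch of the parsing step, in JavaScript for illustration: the server reports the interval in the icy-metaint response header, and after each interval of audio bytes comes one length byte, which multiplied by 16 gives the size of the NUL-padded metadata block. The function names are illustrative, and the sketch works on data that is already fully buffered rather than on chunks arriving from the network.

```javascript
// Extract the StreamTitle value from one metadata block, or null if absent.
function parseStreamTitle(metadataText) {
  const match = /StreamTitle='(.*?)';/.exec(metadataText);
  return match ? match[1] : null;
}

// Separate audio bytes from interleaved metadata in a buffered stream.
// metaInt is the value of the server's icy-metaint response header.
function splitIcyStream(buffer, metaInt) {
  const audioParts = [];
  const titles = [];
  let offset = 0;
  while (offset < buffer.length) {
    const audioEnd = Math.min(offset + metaInt, buffer.length);
    audioParts.push(buffer.subarray(offset, audioEnd));
    offset = audioEnd;
    if (offset >= buffer.length) break;
    const metaLength = buffer[offset] * 16; // length byte counts 16-byte blocks
    offset += 1;
    if (metaLength > 0) {
      const text = buffer.toString('utf8', offset, offset + metaLength).replace(/\0+$/, '');
      const title = parseStreamTitle(text);
      if (title !== null) titles.push(title);
      offset += metaLength;
    }
  }
  return { audio: Buffer.concat(audioParts), titles };
}

// Synthetic example: 4 audio bytes, a 48-byte metadata block, 2 more audio bytes.
const metaBlock = Buffer.alloc(48); // three 16-byte blocks, NUL padded
metaBlock.write("StreamTitle='Some Band - Some Song';");
const demoStream = Buffer.concat([Buffer.from([1, 2, 3, 4]), Buffer.from([3]), metaBlock, Buffer.from([5, 6])]);
console.log(splitIcyStream(demoStream, 4).titles);
```

A length byte of zero means there is no metadata update at that point, which is the common case between track changes.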
Saving Streams to Disk
Audio playing software is generally very resilient to errors. SHOUTcast servers are built on this principle and are not knowledgeable about the data going through them. They just receive data from an encoder, and when a client requests the stream, they start sending that data at an arbitrary point.
You can use this to your advantage when saving stream data. It is possible to simply write the stream data as it comes in to a file. Most audio players will play them without problem. I have tested this with MP3 and AAC.
If you want a more conformant file, you will have to use a library or parse the stream yourself to split on the appropriate frames, and then handle bit reservoir issues in your code. This is a lot of work, and generally isn't worth doing unless you find your files have real compatibility problems.

DirectShow - How to read a file from a source filter

I'm writing a DirectShow source filter which is registered as a CLSID_VideoInputDeviceCategory, so it can be seen as a Video Capture Device (from Skype, for example, it is viewed as another WebCam).
My source filter is based on the VCam example from here, and, for now, the filter produces the exact output as this example (random colored pixels with one Video output pin, no audio yet), all implemented in the FillBuffer() method of the one and only output pin.
Now the real scenario will be a bit more tricky - the filter uses a file handle to a hardware device, opened using the CreateFile() API call (opening the device is out of my control and is done by a third-party library). It should then read chunks of data from this handle (chunk sizes are usually 256-512 bytes).
The device is a WinUSB device, and the third-party framework just "gives" me an opened file handle to read chunks from.
The data read by the filter is a *.mp4 file, which is streamed from the device to the "handle".
This scenario is equivalent to a source filter reading from a *.mp4 file on the disk (in "chunks") and pushing its data to the DirectShow graph, but without the ability to read the file entirely from start to end, so the file size is unknown (Correct?).
I'm pretty new to DirectShow and I feel as though I'm missing some basic concepts. I'll be happy if anyone can direct me to solutions\resources\explanations for the following questions:
1) From various sources on the web and Microsoft SDK (v7.1) samples, I understood that for an application (such as Skype) to build a correct & valid DirectShow graph (so it will render the Video & Audio successfully), the source filter pin (inherits from CSourceStream) should implement the method "GetMediaType". Depending on the returned value from this implemented function, an application will be able to build the correct graph to render the data, thus, build the correct order of filters. If this is correct - How would I implement it in my case so that the graph will be built to render *.mp4 input in chunks (we can assume constant chunk sizes)?
2) I've noticed that the FillBuffer() method is supposed to call SetTime() on the IMediaSample object it gets (and fills). I'm reading raw *.mp4 data from the device. Will I have to parse the data and extract the frames & time values from the stream? If yes - an example would be great.
3) Will I have to split the data received from the file handle (the "chunks") into Video & Audio, or can the data be pushed to the graph without the need to manipulate it in the source filter? If a split is needed - how can it be done (the data is not continuous, and is split into chunks), and will this affect the desired implementation of "GetMediaType"?
Please feel free to correct me if I'm using incorrect terminology.
Thanks :-)
This is a good question. On the one hand this is doable, but there are some specifics involved.
First of all, your filter, registered under the CLSID_VideoInputDeviceCategory category, is expected to behave as a live video source. Registering it there makes it discoverable by applications (such as Skype, as you mentioned), and those applications will attempt to configure the video resolution and will expect video to arrive at a real-time rate. Some applications (such as Skype) do not expect compressed video such as H.264 there and would simply reject such a device. Nor can you attach audio directly to this filter, since applications would not even look for audio there (I am not sure whether you have audio on your filter, but you mentioned an .MP4 file, so audio might be present).
On your questions:
1 - You would get a better picture of the application requirements by checking which interface methods applications call on your filter. Most of the methods are implemented by BaseClasses and convert the calls into internal methods such as GetMediaType. Yes, you need to implement it, and by doing so you will, among other things, enable your filter to connect to downstream filter pins by trying the specific media types you support.
Again, those cannot be MP4 chunks, even if such an approach can work in other DirectShow graphs. When implementing a video capture device, you should be delivering exactly video frames, preferably decompressed (they could be compressed too, but you would immediately have compatibility issues with applications).
A solution you might consider is to embed a fully featured graph internally, into which you inject your MP4 chunks; that pipeline parses and decodes them and delivers frames to your custom renderer, from which you re-expose them through your virtual device. This might be a good design, though it assumes a certain understanding of how filters work internally.
2 - Your device is typically treated as, and expected to be, a live source, which means that you deliver video in real time and frames are not necessarily time stamped. So you can put times there, and yes, you definitely need to extract time stamps from your original media (or have it done by the internal graph mentioned in item 1 above). However, be prepared for applications to strip time stamps, especially for preview purposes, since the source is "live".
3 - Getting back to audio, you cannot implement audio on the same virtual device. Well, you can, and this filter might even work in a custom-built graph, but it is not going to work with applications. They will be looking for a separate audio device, and if you implement one, they will instantiate it separately. So you are expected to implement both a virtual video source and a virtual audio source, and to implement internal synchronization behind the scenes. This is where timestamps become important: by providing them correctly, you will keep lip sync in the live session the same as it was in the media file you are streaming from.

Resources