Cannot control volume level with prosody in Google Cloud Text-to-Speech


The SSML volume property has no effect on the output audio.
This is the SSML:
<speak>
<prosody volume = "+0dB"> This is a sentence with volume 10 For GOOGLE. </prosody>
<s><prosody volume = "+6dB"> This is a sentence with volume 6 For GOOGLE. </prosody></s>
<s><prosody volume = "+24dB"> This is a sentence with volume +24 For GOOGLE. </prosody></s>
<s><prosody volume = "+48dB"> This is a sentence with volume +48 For GOOGLE.</prosody></s>
<s><prosody volume = "+196dB"> This is a sentence with volume +196 For GOOGLE.</prosody></s>
</speak>
And here is the sample code:
String ssml = $"<speak><prosody volume = \"+0dB\"> This is a sentence with volume 10 For GOOGLE.</prosody>" +
    $" <s><prosody volume = \"+6dB\"> This is a sentence with volume 6 For GOOGLE.</prosody></s>" +
    $" <s><prosody volume = \"+24dB\"> This is a sentence with volume +24 For GOOGLE.</prosody></s>" +
    $" <s><prosody volume = \"+48dB\"> This is a sentence with volume +48 For GOOGLE.</prosody></s>" +
    $" <s><prosody volume = \"+196dB\"> This is a sentence with volume +196 For GOOGLE.</prosody></s>" +
    $"</speak>";
Dubb(ssml);
public static void Dubb(string ssml)
{
    var client = TextToSpeechClient.Create();
    // The input to be synthesized, can be provided as text or SSML.
    var input = new SynthesisInput
    {
        Ssml = ssml
    };
    // Build the voice request.
    var voiceSelection = new VoiceSelectionParams
    {
        LanguageCode = "en-US",
        SsmlGender = SsmlVoiceGender.Female
    };
    // Specify the type of audio file.
    var audioConfig = new AudioConfig
    {
        AudioEncoding = AudioEncoding.Linear16
    };
    // Perform the text-to-speech request.
    var response = client.SynthesizeSpeech(input, voiceSelection, audioConfig);
    // Write the response to the output file.
    using (var output = File.Create("output.wav"))
    {
        response.AudioContent.WriteTo(output);
    }
}
I expected the volume to increase with each line, but it doesn't.

I tried this SSML:
<speak>
<prosody volume = "+0dB"> This is a sentence with volume 10 For GOOGLE. </prosody>
<s><prosody volume = "+6dB"> This is a sentence with volume 6 For GOOGLE. </prosody></s>
<s><prosody volume = "+24dB"> This is a sentence with volume +24 For GOOGLE. </prosody></s>
<s><prosody volume = "+48dB"> This is a sentence with volume +48 For GOOGLE.</prosody></s>
<s><prosody volume = "+196dB"> This is a sentence with volume +196 For GOOGLE.</prosody></s>
</speak>
in the TTS UI, and it works as expected.
From there you can export the request as JSON (maybe it helps you):
{
  "audioConfig": {
    "audioEncoding": "LINEAR16",
    "pitch": 0,
    "speakingRate": 1
  },
  "input": {
    "ssml": "<speak> <prosody volume = \"+0dB\"> This is a sentence with volume 10 For GOOGLE. </prosody> <s><prosody volume = \"+6dB\"> This is a sentence with volume 6 For GOOGLE. </prosody></s> <s><prosody volume = \"+24dB\"> This is a sentence with volume +24 For GOOGLE. </prosody></s> <s><prosody volume = \"+48dB\"> This is a sentence with volume +48 For GOOGLE.</prosody></s> <s><prosody volume = \"+196dB\"> This is a sentence with volume +196 For GOOGLE.</prosody></s> </speak>"
  },
  "voice": {
    "languageCode": "en-US",
    "name": "en-US-Standard-A"
  }
}

Related

Bing Custom Search API returns only limited results from one location and full results from a different location

I am trying to use Bing Custom Search's API for documents from Cognitive Services. The strange thing is that when I run it from India, it gives me more than a thousand results, but when I run it from a US server, it returns only 25 (sometimes 50 results). Here is the sample code for that:
var totalCount = 0;
var filetypes = new List<string> { "pdf", "docx", "doc" };
foreach (var filetype in filetypes)
{
    var searchTerm = "microsoft%20.net%20resume+filetype%3a" + filetype;
    Console.WriteLine("Searching for : " + filetype);
    for (var i = 0; i < 40; i++)
    {
        var nextCount = 0;
        var url = "https://api.cognitive.microsoft.com/bingcustomsearch/v7.0/search?" +
            "q=" + searchTerm +
            "&customconfig=" + customConfigId +
            "&count=25" + "&offset=" + ((i * 25) + nextCount);
        using (var client = new HttpClient())
        {
            client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", subscriptionKey);
            var httpResponseMessage = client.GetAsync(url).Result;
            var responseContent = httpResponseMessage.Content.ReadAsStringAsync().Result;
            BingCustomSearchResponse response =
                JsonConvert.DeserializeObject<BingCustomSearchResponse>(responseContent);
            if (response.webPages == null || response.webPages.value.Length <= 0)
            {
                Console.WriteLine("response.webPages is null ");
                break;
            }
            foreach (var webPage in response.webPages.value)
            {
                Console.WriteLine("name: " + webPage.name);
                Console.WriteLine("url: " + webPage.url);
                Console.WriteLine("displayUrl: " + webPage.displayUrl);
                Console.WriteLine("snippet: " + webPage.snippet);
                Console.WriteLine("dateLastCrawled: " + webPage.dateLastCrawled);
                Console.WriteLine();
            }
            totalCount = totalCount + response.webPages.value.Length;
        }
    }
}
The subscription key I am using is a trial key.

I found the reason for this behavior. It had nothing to do with region, country, or market.
After looking into the response, I found this message:
"Rate limit is exceeded. Try again in 1 seconds"
This means that after each call in the loop I have to wait 1 second before making the next call. What I still need to know is whether this limit applies only to the trial subscription or to all calls, for example to prevent DDoS attacks.
It may have worked from India because each iteration was already taking a second or more there.
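The wait time can be read out of the error text instead of being hard-coded. This is a minimal sketch that assumes the message keeps the exact "Try again in N seconds" wording quoted above; the function name parse_retry_seconds is hypothetical and not part of any Bing SDK.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: find "Try again in N" inside a response body
   and return N. Returns -1 if the text is missing or malformed. */
int parse_retry_seconds(const char *message) {
    int seconds = 0;
    const char *marker;

    if (message == NULL) {
        return -1;
    }
    /* strstr lets this work when the message is embedded in a larger body. */
    marker = strstr(message, "Try again in ");
    if (marker == NULL) {
        return -1;
    }
    if (sscanf(marker, "Try again in %d", &seconds) != 1 || seconds < 0) {
        return -1;
    }
    return seconds;
}
```

A caller would then pause for the returned number of seconds before repeating the same request, and treat -1 as "not a rate-limit response".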

Two things you can try:
1) In searchTerm there is no need to use %20 and %3a; just type the query with the same spacing and punctuation you would use in Bing, e.g. var searchTerm = "microsoft .net resume filetype:" + filetype.
2) Enforce the market by adding mkt=en-in (for India) or mkt=en-us (for the US) to the query, for example by appending + "&mkt=en-in" to the end of url.
I presume that for custom search you have selected domains (for both the en-in and en-us markets) that return thousands of results for this query.

Check if system volume is muted

I'm currently working on a small project in which I need to check from the App Delegate whether the system volume is muted.
As soon as the user mutes or unmutes the volume, a function needs to be called.
I've found some things about AudioToolbox, but I can't seem to find anything that works.

I know how to check whether the default device is muted. First, you need to look up the hardware ID of the default audio device. This can be done once and stored in your program.
import CoreAudio
var propAddr = AudioObjectPropertyAddress(
    mSelector: AudioObjectPropertySelector(kAudioHardwarePropertyDefaultOutputDevice),
    mScope: AudioObjectPropertyScope(kAudioObjectPropertyScopeGlobal),
    mElement: AudioObjectPropertyElement(kAudioObjectPropertyElementMaster))
var defaultAudioHardwareID: AudioDeviceID = 0
var propSize = UInt32(MemoryLayout<AudioDeviceID>.size)
// AudioObjectGetPropertyData takes the same arguments as the deprecated AudioHardwareServiceGetPropertyData.
let status = AudioObjectGetPropertyData(AudioObjectID(kAudioObjectSystemObject), &propAddr, 0, nil, &propSize, &defaultAudioHardwareID)
After that, you can look up if the device is muted.
var muteAddr = AudioObjectPropertyAddress(
    mSelector: AudioObjectPropertySelector(kAudioDevicePropertyMute),
    mScope: AudioObjectPropertyScope(kAudioObjectPropertyScopeOutput),
    mElement: AudioObjectPropertyElement(kAudioObjectPropertyElementMaster))
var isMuted: UInt32 = 0
var muteSize = UInt32(MemoryLayout<UInt32>.size)
let muteStatus = AudioObjectGetPropertyData(defaultAudioHardwareID, &muteAddr, 0, nil, &muteSize, &isMuted)
if isMuted != 0 {
    // Do stuff here
    return
}
I don't know if there's a way to get a notification when the mute state changes or not.

Transmit an image to an Intermec PM4i printer and then print it

I'm using Fingerprint to upload and then print an image in PCX format.
Step 1: Upload the image to the printer over a TCP port. I use this command:
IMAGE LOAD "bigfoot.1",1746,""\r\n
The printer responds with the message "OK".
Then I send the bytes of bigfoot.1 to the printer over the socket.
Step 2: Print the image "bigfoot.1":
PRPOS 200,200
DIR 3
ALIGN 5
PRIMAGE "bigfoot.1"
PRINTFEED
RUN
This is where the problem appears: the printer responds with the message "Image not found". I suspected that the upload had failed, so I opened PrintSet4 to check, and the image already exists in TMP. Odd!
Finally, I used PrintSet4 instead of my socket application to upload the image. After adding the file and applying, I ran the step 2 print commands, and it works fine.
Here is the C# code that uploads the image:
public void SendFile(string filePath, string CR_LF)
{
    FileInfo fi = new FileInfo(filePath);
    using (FileStream fs = new FileStream(filePath, FileMode.Open, FileAccess.Read))
    {
        byte[] byteFile = new byte[fs.Length];
        string cmd = "IMAGE LOAD \"" + fi.Name + "\"," + byteFile.Length.ToString() + ",\" \"" + CR_LF;
        ClientSocket.Send(encode.GetBytes(cmd));
        fs.Read(byteFile, 0, byteFile.Length);
        Thread.Sleep(1000);
        ClientSocket.Send(byteFile);
    }
}

I modified your code to use a serial port.
public void SendFile(string filePath)
{
    FileInfo fi = new FileInfo(filePath);
    // Dispose the port and the file stream even if a write fails.
    using (SerialPort port = new SerialPort("COM3", 38400, Parity.None, 8, StopBits.One))
    using (FileStream fs = new FileStream(filePath, FileMode.Open, FileAccess.Read))
    {
        port.Open();
        byte[] byteFile = new byte[fs.Length];
        // string cmd = "IMAGE LOAD \"" + fi.Name + "\"," + byteFile.Length.ToString()+ ",\"\"" + CR_LF;
        string cmd = "IMAGE LOAD " + "\"" + fi.Name + "\"" + "," + byteFile.Length.ToString() + "," + "\"S\"";
        // WriteLine appends port.NewLine after the command.
        port.WriteLine(cmd);
        fs.Read(byteFile, 0, byteFile.Length);
        port.Write(byteFile, 0, byteFile.Length);
    }
}
The problem turned out to be the CR_LF terminator. I used port.WriteLine(cmd) instead, which appends the port's NewLine string (a line feed by default) as the line separator, and it worked fine.

How do I set/get the volume level with Core Audio?

I want to be able to get and set the system volume level with Core Audio. I've followed the code on this other thread:
objective c audio meter
However, my call to AudioHardwareServiceHasProperty to find the kAudioHardwareServiceDeviceProperty_VirtualMasterVolume property returns false. Why is this happening, and how do I get around it? What approach should I take to getting and setting the system volume level with Core Audio?

Have you tried kAudioDevicePropertyVolumeScalar?
UInt32 channel = 1; // Channel 0 is master, if available
AudioObjectPropertyAddress prop = {
    kAudioDevicePropertyVolumeScalar,
    kAudioDevicePropertyScopeOutput,
    channel
};
if (!AudioObjectHasProperty(deviceID, &prop)) {
    // error: the device does not expose this property on this channel
}
Float32 volume;
UInt32 dataSize = sizeof(volume);
OSStatus result = AudioObjectGetPropertyData(deviceID, &prop, 0, NULL, &dataSize, &volume);
if (kAudioHardwareNoError != result) {
    // error
}

coreaudio: how to get/set the system alert volume as opposed to device volume

I've been searching the documentation and mailing lists on and off for a couple of days but can't seem to find the answer to this.
I've got an OS X app that, amongst other things, queries the available hardware devices and their current volumes using kAudioDevicePropertyVolumeScalar and friends.
What I want to be able to do is get and set the alert volume for the system output device represented by kAudioHardwarePropertyDefaultSystemOutputDevice, rather than that device's volume.
To clarify, from my limited understanding, this is the volume setting users can adjust in System Preferences under 'Play sound effects through'.
Searching the coreaudio-api lists, I've managed to glean that this volume setting is not a device property but some kind of derived value, but I'm stumped as to where to go from here.
Any help would be gratefully received.

I am not sure if you really have a requirement for reading it through CoreAudio, but the following works just fine:
NSUserDefaults *defaults = [NSUserDefaults standardUserDefaults];
[defaults addSuiteNamed:@"com.apple.systemsound"];
NSLog(@"%f", [defaults floatForKey:@"com.apple.sound.beep.volume"]);
Though this might change with operating system updates as the settings are stored in ~/Library/Preferences/com.apple.systemsound.plist.

TL;DR: I’ve published a reusable library to get the system alert volume: https://github.com/teddywing/SSVLSystemAlertVolume
As you pointed out, the system alert volume is different from the output device volume.
If we query the volume of the device associated with kAudioHardwarePropertyDefaultSystemOutputDevice using the kAudioHardwareServiceDeviceProperty_VirtualMasterVolume property (using a method similar to the one described in Technical Q&A QA1016), we get the volume of the device, not the system alert volume.
While kAudioHardwareServiceDeviceProperty_VirtualMasterVolume is an Audio Hardware property, the system alert volume is an Audio Services property. This is the Audio Services property that represents the system alert volume:
const AudioServicesPropertyID kAudioServicesPropertySystemAlertVolume = 'ssvl';
From what I can tell, this Audio Services property is undocumented and/or private, as I haven’t been able to find it in any of the headers in MacOSX10.15.sdk/System/Library/Frameworks/. I’m defining it here at the application level.
To get the system alert volume:
#include <AudioToolbox/AudioToolbox.h>
#include <CoreFoundation/CoreFoundation.h>
#include <math.h>
OSStatus system_volume_get(Float32 *volume) {
    UInt32 volume_size = sizeof(*volume);
    OSStatus result = AudioServicesGetProperty(
        kAudioServicesPropertySystemAlertVolume,
        0,
        NULL,
        &volume_size,
        volume
    );
    if (result != noErr) {
        // *volume is not valid when the call fails.
        return result;
    }
    if (*volume != 0) {
        // Convert the raw property value to the level shown by the slider.
        *volume = log(*volume) + 1.0;
    }
    else {
        *volume = 0;
    }
    return result;
}
To set the alert volume:
OSStatus system_volume_set(Float32 volume) {
    // Leave 0 unchanged, as the disassembled setAlertSoundVolume: does.
    if (volume != 0) {
        volume = exp(volume - 1.0);
    }
    return AudioServicesSetProperty(
        kAudioServicesPropertySystemAlertVolume,
        0,
        NULL,
        sizeof(volume),
        &volume
    );
}
Reverse engineering the Sound preference pane
To discover how to access the system alert volume, I disassembled the Sound preference pane binary using Hopper. On macOS 10.15, the binary lives at: /System/Library/PreferencePanes/Sound.prefPane/Contents/MacOS/Sound.
The disassembly gives us two accessor methods for the property, which, after translating into C, yield the functions at the top of this answer:
/* @class AppleSound_SoundSettings */
-(float)alertSoundVolume {
    var_8 = 0x4;
    rax = AudioServicesGetProperty(0x7373766c, 0x0, 0x0, &var_8, &var_4);
    if (rax != 0x0) {
        xmm0 = *(float *)float_value_0_5;
    }
    else {
        xmm1 = var_4;
        xmm0 = 0x0;
        if (xmm1 != xmm0 || !CPU_FLAGS & NP) {
            log(intrinsic_cvtss2sd(0x0, xmm1));
            xmm0 = intrinsic_cvtsd2ss(xmm0 + *double_value_1, xmm0 + *double_value_1);
        }
    }
    return xmm0;
}
/* @class AppleSound_SoundSettings */
-(void)setAlertSoundVolume:(float)arg2 {
    xmm0 = arg2;
    if (xmm0 != 0x0 || !CPU_FLAGS & NP) {
        exp(intrinsic_cvtss2sd(xmm0, xmm0) + *double_value_minus_1);
        xmm0 = intrinsic_cvtsd2ss(xmm0 + *double_value_minus_1, xmm0 + *double_value_minus_1);
    }
    var_C = xmm0;
    rax = AudioServicesSetProperty(0x7373766c, 0x0, 0x0, 0x4, &var_C);
    if (rax != 0x0) {
        NSLog(@"Error %d setting system sound volume", rax);
    }
    else {
        [[NSDistributedNotificationCenter defaultCenter] postNotificationName:*0x19f00 object:0x0 userInfo:0x0 options:0x3];
    }
    return;
}
Where the AudioServicesPropertyID comes from
We can see in the disassembled accessors above that the property is accessed with AudioServicesGetProperty() and AudioServicesSetProperty(), which are defined in “AudioServices.h” in “AudioToolbox.framework”. Those functions are called with the AudioServicesPropertyID 0x7373766c as their first argument. That AudioServicesPropertyID is the key to getting the system alert volume.
Only two AudioServicesPropertyIDs are defined in “AudioToolbox.framework” (in MacOSX10.15.sdk):
// AudioToolbox.framework/Versions/A/Headers/AudioServices.h
typedef UInt32 AudioServicesPropertyID;
// ...
CF_ENUM(AudioServicesPropertyID)
{
    kAudioServicesPropertyIsUISound = 'isui',
    kAudioServicesPropertyCompletePlaybackIfAppDies = 'ifdi'
};
Neither of the above properties is equal to the system alert volume property, 0x7373766c.
Their values are UInt32s, or really something called a FourCharCode:
// CoreFoundation.framework/Versions/A/Headers/CFBase.h
typedef UInt32 FourCharCode;
The answers to the following question describe ways to convert an integer to a FourCharCode: iOS/C: Convert "integer" into four character string.
Using one of the methods described in the above question, we can convert 0x7373766c to a FourCharCode, which results in 'ssvl'.
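One possible conversion, as a minimal sketch: read the 32-bit value one byte at a time, most significant byte first. The function name four_char_code_to_string is made up for this example and is not part of any Apple framework.

```c
#include <stdint.h>

/* Hypothetical helper: unpack a FourCharCode-style value into its four
   characters, most significant byte first, and NUL-terminate the result.
   out must have room for 5 chars. */
void four_char_code_to_string(uint32_t code, char out[5]) {
    out[0] = (char)((code >> 24) & 0xFF);
    out[1] = (char)((code >> 16) & 0xFF);
    out[2] = (char)((code >> 8) & 0xFF);
    out[3] = (char)(code & 0xFF);
    out[4] = '\0';
}
```

Calling it with 0x7373766c fills the buffer with the bytes 0x73, 0x73, 0x76, 0x6c, which are the ASCII characters s, s, v, l.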
Logarithm and exponent
I haven’t been able to work out why the alert volume uses logarithms and exponents, but they’re required from what I can tell. If I set the alert volume to 0.5 without the exponent as in the following:
Float32 volume = 0.5;
AudioServicesSetProperty(
    kAudioServicesPropertySystemAlertVolume,
    0,
    NULL,
    sizeof(volume),
    &volume
);
then the Sound preference pane shows a volume level of 31% instead of 50%:
[Screenshot: system alert volume set to 31% of output volume]
