Extremely Slow file upload to a Blazor Server app deployed as Azure Web App - performance

I created a Blazor Server app that allows end users to upload large Excel files, which are then consumed in downstream logic.
I use the standard .NET 5 InputFile component to upload the Excel file to the app. Within the app, I read the stream asynchronously, copy it into a MemoryStream, and then use ExcelDataReader to convert it into a DataSet.
The challenge is that the upload takes a long time, specifically when the app is deployed to Azure. To dig a bit deeper into what exactly was consuming the time, I tracked the progress of the stream copy operation.
The following code handles my upload:
private async Task OnInputFileChange(InputFileChangeEventArgs e)
{
    this.StateHasChanged();
    IReadOnlyList<IBrowserFile> selectedFiles;
    selectedFiles = e.GetMultipleFiles();
    foreach (var file in selectedFiles)
    {
        DataSet ds = new DataSet();
        {
            bool filesuccesfullRead = false;
            // Allowing a 100 MB file at once
            var timer = new Timer(new TimerCallback(_ =>
            {
                if (fileTemplateData.uploadProgressInfo.percentage <= 100)
                {
                    // Note that the following line is necessary because otherwise
                    // Blazor would not recognize the state change and not refresh the UI
                    InvokeAsync(() =>
                    {
                        StateHasChanged();
                    });
                }
            }), null, 1000, 1000);
            using (Stream stream = file.OpenReadStream(104857600))
            using (MemoryStream ms = new MemoryStream())
            {
                fileTemplateData.uploadProgressInfo = new GlobalDataClass.CopyProgressInfo();
                await ExtensionsGeneric.CopyToAsync(stream, ms, 128000, fileTemplateData.uploadProgressInfo);
                System.Text.Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance);
                try
                {
                    using (var reader = ExcelReaderFactory.CreateReader(ms))
                    {
                        ds = reader.AsDataSet(new ExcelDataSetConfiguration()
                        {
                            ConfigureDataTable = _ => new ExcelDataTableConfiguration()
                            {
                                UseHeaderRow = false
                            }
                        });
                        filesuccesfullRead = true;
                    }
                }
                catch (Exception ex)
                {
                    Message = "Unable to read provided file(s) with exception " + ex.ToString();
                }
                stream.Close();
                ms.Close();
            }
        }
        ds.Dispose();
        ds = null;
    }
    fileTemplateData.fileloading = false;
    this.StateHasChanged();
}
Here is the CopyToAsync function, which is the same as a regular stream copy but provides progress tracking:
public static async Task CopyToAsync(this Stream fromStream, Stream destination, int bufferSize, GlobalDataClass.CopyProgressInfo progressInfo)
{
    var buffer = new byte[bufferSize];
    int count;
    progressInfo.TotalLengthinBytes = fromStream.Length;
    while ((count = await fromStream.ReadAsync(buffer, 0, buffer.Length)) != 0)
    {
        progressInfo.BytesTransfered += count;
        progressInfo.percentage = Math.Round(((double)progressInfo.BytesTransfered / (double)progressInfo.TotalLengthinBytes) * 100, 1);
        await destination.WriteAsync(buffer, 0, count);
    }
}

public class CopyProgressInfo
{
    public long BytesTransfered { get; set; }
    public long TotalLengthinBytes { get; set; }
    public double percentage { get; set; }
    public DateTime LastProgressUpdateVisualized = new DateTime();
}
Now let me put the question:
Using this code, I achieve a fair upload speed when the app is running on localhost (a 75 MB file with tons of data uploads in around 18 seconds). When the app is deployed to an Azure App Service plan, the same file takes more than 10 minutes to upload, which makes me feel something is seriously wrong. Using the progress tracking, I was able to confirm that the time is being consumed by the CopyToAsync function and not the logic after it.
Here's what I have investigated:
I checked my internet upload speed on two separate connections, each with a stable upload bandwidth of more than 25 Mbps, so this is not the issue.
I upgraded the App Service plan to a higher tier momentarily to see if upload bandwidth was somehow linked to the tier; even increasing it to a powerful P3V2 tier made no difference.
To see if the specific datacenter where my App Service sits was offering poor upload performance from my part of the world, I checked the average upload speed using https://www.azurespeed.com/Azure/UploadLargeFile, and a 75 MB file uploaded in around 38 seconds to the West Europe datacenter. So I do not think connectivity is the problem here.
With all of the above, what could be causing such poor file upload speed when uploading a file to a deployed Blazor Server web app?

I don't see such a performance impact, though I upload to Azure Blob Storage.
My implementation summary:
A Razor component called imageUpload.razor contains
public async Task HandleFileSelected(InputFileChangeEventArgs e)
and calls a service like:
await hService.UploadImgToAzureAsync(imageFile.OpenReadStream(), fileName);
The service contains the following:
public async Task<string> UploadImgToAzureAsync(Stream fileStream, string fileName)
{
    return await ImageHelper.UploadImageToStorage(fileStream, fileName);
}
ImageHelper calls AzureStorage.cs, which handles the call to UploadFromStreamAsync.
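For completeness, AzureStorage.cs could look roughly like the sketch below, using the classic WindowsAzure.Storage SDK. Only the UploadImageToStorage signature and the UploadFromStreamAsync call come from the summary above; the connection string, container name, and return type are my assumptions.

// Assumed sketch only, not the original AzureStorage.cs.
using System.IO;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

public static class AzureStorage
{
    public static async Task<string> UploadImageToStorage(Stream fileStream, string fileName)
    {
        // Placeholder connection string and container name.
        var account = CloudStorageAccount.Parse("<storage-connection-string>");
        var container = account.CreateCloudBlobClient().GetContainerReference("images");
        await container.CreateIfNotExistsAsync();

        // Stream the browser file straight to Blob Storage instead of buffering it all in memory.
        CloudBlockBlob blob = container.GetBlockBlobReference(fileName);
        await blob.UploadFromStreamAsync(fileStream);
        return blob.Uri.ToString();
    }
}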

I finally managed to improve the upload performance. Unfortunately, Blazor's built-in InputFile component doesn't seem to be designed very well for large file uploads, especially when the app has been deployed. I used Tewr's file upload component with a larger buffer size (128000), and that significantly improved performance (roughly a 3x reduction in upload time). Tewr's sample code is available here:
https://github.com/Tewr/BlazorFileReader/blob/master/src/Demo/Blazor.FileReader.Demo.Common/IndexCommon.razor
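For what it's worth, the key change appears to be reading the browser stream in much larger chunks. Below is a minimal sketch of the copy step; GetUploadStreamAsync is a placeholder for whichever stream source you use (Tewr's component or the built-in InputFile), and the working theory is that fewer, larger reads mean fewer round trips over the SignalR circuit.

// Sketch only. GetUploadStreamAsync is a placeholder for your stream source.
const int BufferSize = 128_000; // much larger than the default copy buffer

using var ms = new MemoryStream();
using (Stream browserStream = await GetUploadStreamAsync())
{
    // Each read pulls a chunk from the browser to the server,
    // so larger reads mean fewer round trips for the same file.
    await browserStream.CopyToAsync(ms, BufferSize);
}
ms.Position = 0; // rewind before handing the buffer to ExcelDataReader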

Related

Various errors using VisionServiceClient in XamarinForms

I am trying to create a simple Xamarin.Forms app which allows the user to browse for or take a photo and have Azure Cognitive Services tag the photo using a custom vision model.
I am unable to get the client to successfully authenticate or find a resource, per the error message in the exception produced by the VisionServiceClient. Am I missing something? What would be the correct values to use for the arguments to VisionServiceClient?
All keys have been removed from the images below; they are populated in the actual code.
Exception thrown in VS2017:
'Microsoft.ProjectOxford.Vision.ClientException' in System.Private.CoreLib.dll
Call to VisionServiceClient:
private const string endpoint = @"https://eastus2.api.cognitive.microsoft.com/vision/prediction/v1.0";
private const string key = "";

VisionServiceClient visionClient = new VisionServiceClient(key, endpoint);
VisualFeature[] features = { VisualFeature.Tags, VisualFeature.Categories, VisualFeature.Description };
try
{
    AnalysisResult temp = await visionClient.AnalyzeImageAsync(imageStream,
        features.ToList(), null);
    return temp;
}
catch (Exception ex)
{
    return null;
}
(Screenshots were attached here: the VS exception error, the Azure Portal page for the Cognitive Services resource, and the Custom Vision Portal.)
It looks like you're confusing the Computer Vision and the Custom Vision APIs. You are attempting to use the client SDK for the former using the API key of the latter.
For .NET languages, you'll want the Microsoft.Azure.CognitiveServices.Vision.CustomVision.Prediction NuGet package.
Your code will end up looking something like this:
ICustomVisionPredictionClient client = new CustomVisionPredictionClient()
{
    ApiKey = PredictionKey,
    Endpoint = "https://southcentralus.api.cognitive.microsoft.com"
};

ImagePrediction prediction = await client.PredictImageAsync(ProjectId, stream, IterationId);
Thank you to cthrash for the extended help and for talking with me in chat. Using his post along with a little troubleshooting, I have figured out what works for me. The code is super clunky, but it was just to test and make sure I'm able to do this. To answer the question:
NuGet packages and namespaces
Using cthrash's post I was able to get both the training and prediction NuGet packages installed, which are the correct packages for this particular application. I needed the following namespaces:
Microsoft.Azure.CognitiveServices.Vision.CustomVision.Prediction
Microsoft.Azure.CognitiveServices.Vision.CustomVision.Prediction.Models
Microsoft.Azure.CognitiveServices.Vision.CustomVision.Training
Microsoft.Azure.CognitiveServices.Vision.CustomVision.Training.Models
Endpoint Root
Following some of the steps here, I determined that the endpoint URLs only need to be the root, not the full URL provided in the Custom Vision Portal. For instance,
https://southcentralus.api.cognitive.microsoft.com/customvision/v2.0/Prediction/
was changed to
https://southcentralus.api.cognitive.microsoft.com
Using both the key and endpoint from the Custom Vision Portal and making that change, I was able to use both a training and a prediction client to pull the projects and iterations.
Getting Project Id
In order to use CustomVisionPredictionClient.PredictImageAsync you need a Guid for the project id, and an iteration id if a default iteration is not set in the portal.
I tested two ways to get the project id:
Using the project id string from the portal
Grab the project id string from the portal under the project settings.
For the first argument to PredictImageAsync, pass
Guid.Parse(projectId)
Using the training client
Create a new CustomVisionTrainingClient.
To get a List<Project>, use
trainingClient.GetProjects().ToList()
In my case I only had a single project, so I just need the first element:
Guid projectId = projects[0].Id
Getting Iteration Id
To get the iteration id of a project you need the CustomVisionTrainingClient.
Create the client.
To get a List<Iteration>, use
trainingClient.GetIterations(projectId).ToList()
In my case I had only a single iteration, so I just need the first element:
Guid iterationId = iterations[0].Id
I am now able to use my model to classify images. In the code below, fileStream is the image stream passed to the model.
public async Task<string> Predict(Stream fileStream)
{
    string projectId = "";
    //string trainingEndpoint = "https://southcentralus.api.cognitive.microsoft.com/customvision/v2.2/Training/";
    string trainingEndpoint = "https://southcentralus.api.cognitive.microsoft.com/";
    string trainingKey = "";
    //string predictionEndpoint = "https://southcentralus.api.cognitive.microsoft.com/customvision/v2.0/Prediction/";
    string predictionEndpoint = "https://southcentralus.api.cognitive.microsoft.com";
    string predictionKey = "";

    CustomVisionTrainingClient trainingClient = new CustomVisionTrainingClient
    {
        ApiKey = trainingKey,
        Endpoint = trainingEndpoint
    };

    List<Project> projects = new List<Project>();
    try
    {
        projects = trainingClient.GetProjects().ToList();
    }
    catch (Exception ex)
    {
        Debug.WriteLine("Unable to get projects:\n\n" + ex.Message);
        return "Unable to obtain projects.";
    }

    Guid ProjectId = Guid.Empty;
    if (projects.Count > 0)
    {
        ProjectId = projects[0].Id;
    }
    if (ProjectId == Guid.Empty)
    {
        Debug.WriteLine("Unable to obtain project ID");
        return "Unable to obtain project id.";
    }

    List<Iteration> iterations = new List<Iteration>();
    try
    {
        iterations = trainingClient.GetIterations(ProjectId).ToList();
    }
    catch (Exception ex)
    {
        Debug.WriteLine("Unable to obtain iterations.");
        return "Unable to obtain iterations.";
    }
    foreach (Iteration itr in iterations)
    {
        Debug.WriteLine(itr.Name + "\t" + itr.Id + "\n");
    }

    Guid iteration = Guid.Empty;
    if (iterations.Count > 0)
    {
        iteration = iterations[0].Id;
    }
    if (iteration == Guid.Empty)
    {
        Debug.WriteLine("Unable to obtain project iteration.");
        return "Unable to obtain project iteration";
    }

    CustomVisionPredictionClient predictionClient = new CustomVisionPredictionClient
    {
        ApiKey = predictionKey,
        Endpoint = predictionEndpoint
    };

    // Use the project id retrieved from the training client above.
    var result = await predictionClient.PredictImageAsync(ProjectId, fileStream, iteration);

    string resultStr = string.Empty;
    foreach (PredictionModel pred in result.Predictions)
    {
        if (pred.Probability >= 0.85)
            resultStr += pred.TagName + " ";
    }
    return resultStr;
}
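A hypothetical call site, just to show how the Predict method above would be used from an async method; the classifier instance and the file path are placeholders of mine:

// Assumed usage sketch; names and path are placeholders.
using (var fileStream = File.OpenRead("sample-photo.jpg"))
{
    string tags = await classifier.Predict(fileStream);
    Debug.WriteLine("Predicted tags: " + tags);
}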

Google Drive Api Pdf export from Google Doc generate empty response

I'm using the Google Drive export API to retrieve a Google Doc as a PDF: https://developers.google.com/drive/v3/reference/files/export
I'm having the following problem: for documents bigger than a certain size (I don't know the exact threshold, but it happens even with relatively small files of around 1.5 MB) the API returns a 200 response code with a blank result (normally it should contain the PDF data as a byte stream), as shown in the screenshot I captured (not included here).
I can successfully export the file via the Google Drive/Google Doc UI with the "File -> Download as... -> PDF" command, although it takes a bit of time.
Here is the file used for the test (1,180 KB, exported from a Google Doc); I shared it so you can try the export:
https://docs.google.com/document/d/18Cz7kHfEiDLeTWHyyoOi6U4kFQDMeg0D-CCJzILMMCk/edit?usp=sharing
Here is the (Java) code I'm using to perform the operation:
@Override
public GoogleDriveDocumentContent downloadFileContentAsPDF(String executionGoogleUser, String fileId) {
    GoogleDriveDocumentContent documentContent = new GoogleDriveDocumentContent();
    String conversionMimeType = "application/pdf";
    try {
        getLogger().info("GDrive APIs - Downloading file content in PDF format ...");
        InputStream gDriveFileData = getDriveService(executionGoogleUser).files()
                .export(fileId, conversionMimeType)
                .executeMediaAsInputStream();
        getLogger().info("GDrive APIs - File content as PDF format downloaded.");
        documentContent.setFileName(null);
        documentContent.setMimeType(conversionMimeType);
        documentContent.setData(gDriveFileData);
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
    return documentContent;
}
Does anyone have the same issue and know how to solve it?
The goal is to generate a pdf from a Google Doc.
Thanks
I think you should try using the media downloader; you will have to alter it for Google Drive rather than the Storage service.
{
    // Create the service using the client credentials.
    var storageService = new StorageService(new BaseClientService.Initializer()
    {
        HttpClientInitializer = credential,
        ApplicationName = "APP_NAME_HERE"
    });

    // Get the client request object for the bucket and desired object.
    var getRequest = storageService.Objects.Get("BUCKET_HERE", "OBJECT_HERE");

    using (var fileStream = new System.IO.FileStream(
        "FILE_PATH_HERE",
        System.IO.FileMode.Create,
        System.IO.FileAccess.Write))
    {
        // Add a handler which will be notified on progress changes.
        // It will notify on each chunk download and when the
        // download is completed or failed.
        getRequest.MediaDownloader.ProgressChanged += Download_ProgressChanged;
        getRequest.Download(fileStream);
    }
}

static void Download_ProgressChanged(IDownloadProgress progress)
{
    Console.WriteLine(progress.Status + " " + progress.BytesDownloaded);
}
Code ripped from here
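Here is a rough sketch of the same pattern adapted to the Drive v3 .NET client (assuming the Google.Apis.Drive.v3 package; treat the exact wiring as illustrative rather than tested):

// Assumed sketch: export a Google Doc as PDF via the Drive v3 .NET client,
// streaming the media to a file with progress notifications instead of buffering it.
var driveService = new DriveService(new BaseClientService.Initializer()
{
    HttpClientInitializer = credential,
    ApplicationName = "APP_NAME_HERE"
});

var exportRequest = driveService.Files.Export("FILE_ID_HERE", "application/pdf");

using (var fileStream = new System.IO.FileStream(
    "OUTPUT_PATH_HERE.pdf",
    System.IO.FileMode.Create,
    System.IO.FileAccess.Write))
{
    // Same progress handler pattern as the Storage example above.
    exportRequest.MediaDownloader.ProgressChanged += Download_ProgressChanged;
    exportRequest.Download(fileStream);
}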

Webmasters API - Quota limits

We're trying to download page data for sites using the Webmasters API .NET client library, by calling WebmastersService.SearchAnalytics.Query(). To do this we are using batching and sending approx. 600 requests in one batch. However, most of these fail with the error "Quota Exceeded". The number that fail varies each time, but only about 10 of the 600 work (and it varies where they are within the batch). The only way we can get it to work is to reduce the batch size down to 3 and wait 1 second between each call.
According to the Developer Console, our daily quota is set to 1,000,000 (and we have 99% remaining) and our per-user limit is set to 10,000 requests/second/user.
The error we get back is:
Quota Exceeded [403] Errors [ Message[Quota Exceeded] Location[ - ]
Reason[quotaExceeded] Domain[usageLimits]]
Is there another quota which is enforced? What does "Domain[usageLimits]" mean - is the domain the site we are querying the page data for, or is it our user account?
We still get the problem if we run each request separately, unless we wait 1 second between each call. Due to the number of sites and the number of pages we need to download data for, this isn't really an option.
I found this post which points out that just because the max batch size is 1000 doesn't mean the Google service you are calling supports batches of that size. But I'd really like to find out exactly what the quota limits are (as they don't match the Developer Console figures) and how to avoid the errors.
Update 1
Here's some sample code. It's specially written just to prove the problem, so no need to comment on its quality ;o)
using Google.Apis.Auth.OAuth2;
using Google.Apis.Services;
using Google.Apis.Util.Store;
using Google.Apis.Webmasters.v3;
using Google.Apis.Webmasters.v3.Data;
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            new Program().Run().Wait();
        }

        private async Task Run()
        {
            List<string> pageUrls = new List<string>();
            // Add your page urls to the list here
            await GetPageData("<your app name>", "2015-06-15", "2015-07-05", "web", "DESKTOP", "<your domain name>", pageUrls);
        }

        public static async Task<WebmastersService> GetService(string appName)
        {
            //if (_service != null)
            //    return _service;
            //TODO: - look at analytics code to see how to store JSON and refresh token and check runs on another PC
            UserCredential credential;
            using (var stream = new FileStream("c:\\temp\\WMT.json", FileMode.Open, FileAccess.Read))
            {
                credential = await GoogleWebAuthorizationBroker.AuthorizeAsync(
                    GoogleClientSecrets.Load(stream).Secrets,
                    new[] { Google.Apis.Webmasters.v3.WebmastersService.Scope.Webmasters },
                    "user", CancellationToken.None, new FileDataStore("WebmastersService"));
            }

            // Create the service.
            WebmastersService service = new WebmastersService(new BaseClientService.Initializer()
            {
                HttpClientInitializer = credential,
                ApplicationName = appName,
            });
            //_service = service;
            return service;
        }

        private static async Task<bool> GetPageData(string appName, string fromDate, string toDate, string searchType, string device, string siteUrl, List<string> pageUrls)
        {
            // Get the service from the initial method
            bool ret = false;
            WebmastersService service = await GetService(appName);
            Google.Apis.Requests.BatchRequest b = new Google.Apis.Requests.BatchRequest(service);
            try
            {
                foreach (string pageUrl in pageUrls)
                {
                    SearchAnalyticsQueryRequest qry = new SearchAnalyticsQueryRequest();
                    qry.StartDate = fromDate;
                    qry.EndDate = toDate;
                    qry.SearchType = searchType;
                    qry.RowLimit = 5000;
                    qry.Dimensions = new List<string>() { "query" };
                    qry.DimensionFilterGroups = new List<ApiDimensionFilterGroup>();

                    ApiDimensionFilterGroup filterGroup = new ApiDimensionFilterGroup();
                    ApiDimensionFilter filter = new ApiDimensionFilter();
                    filter.Dimension = "device";
                    filter.Expression = device;
                    filter.Operator__ = "equals";
                    ApiDimensionFilter filter2 = new ApiDimensionFilter();
                    filter2.Dimension = "page";
                    filter2.Expression = pageUrl;
                    filter2.Operator__ = "equals";
                    filterGroup.Filters = new List<ApiDimensionFilter>();
                    filterGroup.Filters.Add(filter);
                    filterGroup.Filters.Add(filter2);
                    qry.DimensionFilterGroups.Add(filterGroup);

                    var req = service.Searchanalytics.Query(qry, siteUrl);
                    b.Queue<SearchAnalyticsQueryResponse>(req, (response, error, i, message) =>
                    {
                        if (error == null)
                        {
                            // Process the results
                            ret = true;
                        }
                        else
                        {
                            Console.WriteLine(error.Message);
                        }
                    });
                    await b.ExecuteAsync();
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine("Exception occurred getting page stats : " + ex.Message);
                ret = false;
            }
            return ret;
        }
    }
}
Paste this into Program.cs of a new console app and add Google.Apis.Webmasters.v3 via NuGet. It looks for the WMT.json file in c:\temp, but adjust the authentication code to suit your setup. If I add more than 5 page URLs to the pageUrls list, I get the Quota Exceeded exception.
I've found that the stated quotas don't really seem to be the actual quotas. I had to slow my requests down to 1/sec to avoid this same issue, even though I was always at or below the stated rate limit (20/sec). Furthermore, the docs claim it gives a rateLimitExceeded error for going too fast, but it really returns a quotaExceeded error. It might have to do with how Google averages the rate of requests over time (some of the requests we made were simultaneous, even though the long-run average was designed to be at or below 20/sec), but I cannot be sure.
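As a practical workaround (in line with what the question found: batches of about 3 with a one-second pause), the throttling could be sketched roughly like this; QueueQuery is a placeholder for the per-URL Queue<...> code above, and the batch size and delay are values to tune:

// Assumed sketch: send small batches and pause between them to stay under the
// effective (undocumented) rate limit.
const int batchSize = 3;
foreach (var chunk in pageUrls.Select((url, i) => new { url, i })
                              .GroupBy(x => x.i / batchSize, x => x.url))
{
    var batch = new Google.Apis.Requests.BatchRequest(service);
    foreach (string pageUrl in chunk)
    {
        QueueQuery(batch, pageUrl); // placeholder for the Queue<...> code above
    }
    await batch.ExecuteAsync();
    await Task.Delay(TimeSpan.FromSeconds(1)); // back off before the next batch
}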

Windows Workflow Foundation 4.0 and Tracking

I'm working with the Beta 2 version of Visual Studio 2010 to get some advanced learning using WF4. I've been working with the SqlTracking sample in the WF_WCF_Samples SDK and have gotten a pretty good understanding of how to emit and store tracking data in a SQL database, but I haven't seen anything on how to query the data when needed. Does anyone know if there are any .NET classes to be used for querying the tracking data, and if so, are there any known samples, tutorials, or articles that describe how to do it?
According to Matt Winkler, from the Microsoft WF4 team, there isn't any built-in API for querying the tracking data; the developer must write his/her own.
These can help:
WorkflowInstanceQuery Class
Workflow Tracking and Tracing
Tracking Participants in .NET 4 Beta 1
Old question, I know, but there is actually a more or less official API in AppFabric: the Windows Server AppFabric Class Library.
You'll have to find the actual DLLs in %SystemRoot%\AppFabric (after installing AppFabric, of course). A pretty weird place to put them.
The key classes to look at are SqlInstanceQueryProvider and InstanceQueryExecuteArgs. The query API is asynchronous and can be used something like this (C#):
public InstanceInfo GetWorkflowInstanceInformation(Guid workflowInstanceId, string connectionString)
{
    // Assumes class-level members: a "synchronizer" lock object and a Log helper.
    var instanceQueryProvider = new SqlInstanceQueryProvider();

    // Connection string to the instance store needs to be set like this:
    var parameters = new NameValueCollection()
    {
        {"connectionString", connectionString}
    };
    instanceQueryProvider.Initialize("Provider", parameters);

    var queryArgs = new InstanceQueryExecuteArgs()
    {
        InstanceId = new List<Guid>() { workflowInstanceId }
    };

    // This defeats the asynchronous advantages: block on an event until the callback fires.
    var waitEvent = new ManualResetEvent(false);
    IEnumerable<InstanceInfo> retrievedInstanceInfos = null;
    var query = instanceQueryProvider.CreateInstanceQuery();
    query.BeginExecuteQuery(
        queryArgs,
        TimeSpan.FromSeconds(10),
        ar =>
        {
            lock (synchronizer)
            {
                retrievedInstanceInfos = query.EndExecuteQuery(ar).ToList();
            }
            waitEvent.Set();
        },
        null);

    var waitResult = waitEvent.WaitOne(5000);
    if (waitResult)
    {
        List<InstanceInfo> instances = null;
        lock (synchronizer)
        {
            if (retrievedInstanceInfos != null)
            {
                instances = retrievedInstanceInfos.ToList();
            }
        }
        if (instances != null)
        {
            if (instances.Count() == 1)
            {
                return instances.Single();
            }
            if (!instances.Any())
            {
                Log.Warning("Request for non-existing WorkflowInstanceInfo: {0}.", workflowInstanceId);
                return null;
            }
            Log.Error("More than one(!) WorkflowInstanceInfo for id: {0}.", workflowInstanceId);
        }
    }
    Log.Error("Time out retrieving information for id: {0}.", workflowInstanceId);
    return null;
}
And just to clarify - this does NOT give you access to the tracking data, which are stored in the Monitoring Database. This API is only for the Persistence Database.

Best Technology to use to set up an audio file cache

We have a client application that allows users to download full-length 192 kbps MP3 audio files. Because the files are stored externally to us as a business, we need to be able to:
1) Copy the file from the external location into a local server cache
2) Copy that file to the client that requested it
Obviously, further requests for the same file would come from the cache and would not need to go external.
Now, we already have a system that does this (using a Squid cache), but the problem is that step 2 only executes once step 1 is fully complete. This means that if a 10-minute-long 192 kbps track takes 75 seconds to be copied from the external location into the cache, the client's HTTP timeout kicks in at about 60 seconds! This does not fulfil our requirements.
It seems that what we need is a cache that can transfer out to a client WHILE it is getting data from an external location. My questions are:
1) Can this be done with a Squid cache (this is the legacy incumbent and not my choice)?
2) If not, what technology would be best suited for this kind of scenario (cost is not really an issue)?
Please let me know if this isn't clear in any way!
Here's an ASP.NET handler I wrote a while back to proxy some stuff from another server. It wouldn't be that hard to write to a file and use the file the second time round. Flushing the response in the loop would make it deliver while downloading:
namespace bla.com
{
    /// <summary>
    /// Summary description for $codebehindclassname$
    /// </summary>
    [WebService(Namespace = "http://tempuri.org/")]
    [WebServiceBinding(ConformsTo = WsiProfiles.BasicProfile1_1)]
    public class Proxy : IHttpHandler
    {
        private static Regex urlRegex = new Regex(@"http://some_regex_here_to_prevent_abuse_of_proxy.mp3", RegexOptions.Compiled);

        public void ProcessRequest(HttpContext context)
        {
            var targetUrl = context.Request.QueryString["url"];
            MatchCollection matches = urlRegex.Matches(targetUrl);
            if (matches.Count != 1 || matches[0].Value != targetUrl)
            {
                context.Response.StatusCode = 403;
                context.Response.ContentType = "text/plain";
                context.Response.Write("Forbidden");
                return;
            }

            HttpWebRequest req = (HttpWebRequest)WebRequest.Create(targetUrl);
            Stream responseStream;
            using (HttpWebResponse response = (HttpWebResponse)req.GetResponse())
            {
                responseStream = response.GetResponseStream();
                context.Response.ContentType = response.ContentType;
                byte[] buffer = new byte[4096];
                int amt;
                while ((amt = responseStream.Read(buffer, 0, 4096)) > 0)
                {
                    context.Response.OutputStream.Write(buffer, 0, amt);
                    Debug.WriteLine(amt);
                }
                responseStream.Close();
                response.Close();
            }
            context.Response.Flush();
        }

        public bool IsReusable
        {
            get
            {
                return false;
            }
        }
    }
}
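To illustrate the two suggestions above (flush inside the loop so the client receives data while it is still being downloaded, and tee the bytes to a cache file for the next request), the inner copy loop could be reworked roughly like this; the cache path logic is a placeholder:

// Assumed sketch of the modified copy loop (not the original handler code).
string cachePath = Path.Combine(@"C:\cache", "some-track.mp3"); // placeholder path

using (var cacheFile = new FileStream(cachePath, FileMode.Create, FileAccess.Write))
{
    byte[] buffer = new byte[4096];
    int amt;
    while ((amt = responseStream.Read(buffer, 0, buffer.Length)) > 0)
    {
        // Send the chunk to the client immediately...
        context.Response.OutputStream.Write(buffer, 0, amt);
        context.Response.Flush();
        // ...and keep a copy on disk so the next request can be served from the local cache.
        cacheFile.Write(buffer, 0, amt);
    }
}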
