I'm trying to get the src values for all img tags in an HTML page using Chromeless. My current implementation is something like this:
async function run() {
  const chromeless = new Chromeless();
  let url = 'http://someurl/somepath.html';
  var allImgUrls = await chromeless
    .goto(url)
    .evaluate(() => document.getElementsByTagName('img'));
  var htmlContent = await chromeless
    .goto(url)
    .evaluate(() => document.documentElement.outerHTML);
  console.log(allImgUrls);
  await chromeless.end();
}
The issue is that allImgUrls doesn't contain any of the img values.
After some research, I found that this approach works:
var imgSrcs = await chromeless
  .goto(url)
  .evaluate(() => {
    // document.querySelectorAll doesn't return an array but a NodeList (array-like),
    // so we borrow Array.prototype.map via [].map.call()
    const srcs = [].map.call(document.querySelectorAll('img'), img => img.src);
    return JSON.stringify(srcs);
  });
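On the Node side, evaluate hands that JSON string back, so one JSON.parse gives a plain array of URLs. A minimal end-to-end sketch (the URL is just the placeholder from the question):

const { Chromeless } = require('chromeless');

async function run() {
  const chromeless = new Chromeless();
  const url = 'http://someurl/somepath.html'; // placeholder from the question

  const imgSrcs = await chromeless
    .goto(url)
    .evaluate(() => {
      // serialize the src list so it survives the trip out of the page
      const srcs = [].map.call(document.querySelectorAll('img'), img => img.src);
      return JSON.stringify(srcs);
    });

  console.log(JSON.parse(imgSrcs)); // e.g. [ 'http://someurl/img/logo.png', ... ]
  await chromeless.end();
}

run().catch(console.error);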
Related
When scraping the following website, I can't get the table I want to scrape. I wait for the dynamic text to load, but I never see the correct table in the results.
https://masseyratings.com/nba/games
Here is my HTML Agility Pack code:
var url = "https://masseyratings.com/nba/games";
HtmlWeb web = new HtmlWeb();
var doc = web.LoadFromBrowser(url, o =>
{
var webBrowser = (WebBrowser)o;
// WAIT until the dynamic text is set
return !string.IsNullOrEmpty(webBrowser.Document.GetElementById("mytable0").InnerText);
});
int docLen = doc.Text.Length;
currentSiteData = doc.Text.ToString();
I am not getting any errors; I am just not seeing the table data. And, strangely, the HTML tags are coming back capitalized.
How can I get the correct data into the currentSiteData variable for further processing?
I was able to fix the problem by using the PuppeteerSharp and AngleSharp NuGet packages.
Here is my code that works.
using PuppeteerSharp;
using AngleSharp;
var browserFetcher = new BrowserFetcher();
await browserFetcher.DownloadAsync(BrowserFetcher.DefaultChromiumRevision);
var browser = await Puppeteer.LaunchAsync(new LaunchOptions
{
    Headless = true
});
var page = await browser.NewPageAsync();
await page.GoToAsync("https://masseyratings.com/nba/games");
var content = await page.GetContentAsync();
var context = BrowsingContext.New(AngleSharp.Configuration.Default);
var document = await context.OpenAsync(req => req.Content(content));
var currentSiteData = document.Source.Text.ToString();
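Since AngleSharp has already parsed the page, the table can also be queried directly instead of searching the raw text. A rough sketch, assuming the rendered table keeps the mytable0 id from the original page (needs using System.Linq; for the Select):

// Pull the rows out of the parsed DOM rather than string-searching document.Source.
var table = document.QuerySelector("#mytable0");
if (table != null)
{
    foreach (var row in table.QuerySelectorAll("tr"))
    {
        var cells = row.QuerySelectorAll("th, td")
                       .Select(cell => cell.TextContent.Trim());
        Console.WriteLine(string.Join(" | ", cells));
    }
}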
I have created a script with Google Apps Script and Google Sheets that returns some speed metrics for the URLs pasted in the sheet.
The script works fine; the only problem is that it takes forever to show the results in the sheet. It makes one call per URL, which I suspect is why it's slow.
Is there any way I can optimise this script so that it gives me the results faster?
The code:
const sheet = SpreadsheetApp.getActiveSpreadsheet();
const API_STRING = sheet.getSheetByName("instructions").getRange("K10").getValues();
const PLATFORM = sheet.getSheetByName("urls").getRange("B1").getValues();
const OUTPUT_CELL = sheet.getSheetByName("urls").getRange("B5:" + ("K" + sheet.getLastRow()));
console.log(PLATFORM);
// KPI
const lighthouseMetrics = [
  "first-contentful-paint",
  "largest-contentful-paint",
  "interactive",
  "cumulative-layout-shift",
  "speed-index",
  "total-blocking-time"
];
const fieldData = [
  "FIRST_CONTENTFUL_PAINT_MS",
  "LARGEST_CONTENTFUL_PAINT_MS",
  "FIRST_INPUT_DELAY_MS",
  "CUMULATIVE_LAYOUT_SHIFT_SCORE"
];
// CALLING FUNCTION
async function fetch_array() {
  let URLS_LIST = sheet.getSheetByName("urls").getRange("A5:" + ("A" + sheet.getLastRow())).getValues();
  console.log(URLS_LIST);
  let arrayData = [];
  for (let element of URLS_LIST) {
    let dataEl = await getPageSpeedInfo(PLATFORM, element);
    let dataRow = produceArray(dataEl);
    arrayData.push(dataRow);
  }
  return OUTPUT_CELL.setValues(arrayData);
}
// PRODUCE ARRAY WITH KPIS
function produceArray(data) {
  let kpiArray = [];
  fieldData.forEach(function(item) {
    let fieldDataRoute = data.loadingExperience.metrics[item].category;
    kpiArray.push(fieldDataRoute);
  });
  lighthouseMetrics.forEach(function(item) {
    let lighthouseRoute = data.lighthouseResult.audits[item].displayValue;
    kpiArray.push(lighthouseRoute);
  });
  return kpiArray;
}
// CALL TO API
async function getPageSpeedInfo(strategy, element) {
  let pageSpeedUrl = 'https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url=' + element + '&key=' + API_STRING + '&strategy=' + strategy;
  console.log(pageSpeedUrl);
  let response = await UrlFetchApp.fetch(pageSpeedUrl);
  let data = await response.getContentText();
  return JSON.parse(data);
}
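One way to speed this up is to batch the API calls with UrlFetchApp.fetchAll(), which issues the requests in parallel instead of one at a time. A rough sketch of a batched variant, reusing PLATFORM, API_STRING, OUTPUT_CELL and produceArray from above (muteHttpExceptions is just there so one failing URL doesn't abort the whole batch):

// Batched variant: build all request descriptors first, then let Apps Script
// issue them in parallel with UrlFetchApp.fetchAll().
function fetch_array_batched() {
  const urlList = sheet.getSheetByName("urls").getRange("A5:A" + sheet.getLastRow()).getValues();

  const requests = urlList.map(function(row) {
    return {
      url: 'https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url=' +
           row[0] + '&key=' + API_STRING + '&strategy=' + PLATFORM,
      muteHttpExceptions: true
    };
  });

  const responses = UrlFetchApp.fetchAll(requests);
  const arrayData = responses.map(function(response) {
    return produceArray(JSON.parse(response.getContentText()));
  });

  return OUTPUT_CELL.setValues(arrayData);
}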
I need to convert a NetworkImage into a ui.Image.
I tried the solution given in this question with some adjustments, but it isn't working.
Can someone help me?
Uint8List yourVar;
ui.Image image;

final DecoderCallback callback =
    (Uint8List bytes, {int cacheWidth, int cacheHeight}) async {
  yourVar = bytes.buffer.asUint8List();
  var codec = await instantiateImageCodec(bytes,
      targetWidth: cacheWidth, targetHeight: cacheHeight);
  var frame = await codec.getNextFrame();
  image = frame.image;
  return image;
};

ImageProvider provider = NetworkImage(yourImageUrl);
provider.obtainKey(createLocalImageConfiguration(context)).then((key) {
  provider.load(key, callback);
});
First create a field in your class (MapCache comes from the quiver package):
var cache = MapCache<String, ui.Image>();
Then, to get the ui.Image, you can simply call:
var myUri = 'http:// ...';
var img = await cache.get(myUri, ifAbsent: (uri) {
print('getting not cached image from $uri');
return http.get(uri).then((resp) => decodeImageFromList(resp.bodyBytes));
});
print('image: $img');
Of course, you should add some HTTP response error handling, but this is the basic idea.
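For example, the loader could check the status code before decoding. A minimal sketch, assuming a recent package:http (which expects a Uri) and decoding through instantiateImageCodec as in the question:

import 'dart:ui' as ui;

import 'package:http/http.dart' as http;

// Fetches the bytes, fails loudly on non-200 responses, then decodes to a ui.Image.
Future<ui.Image> fetchUiImage(String url) async {
  final resp = await http.get(Uri.parse(url));
  if (resp.statusCode != 200) {
    throw Exception('Failed to load $url: HTTP ${resp.statusCode}');
  }
  final codec = await ui.instantiateImageCodec(resp.bodyBytes);
  final frame = await codec.getNextFrame();
  return frame.image;
}

A function like this can then be passed straight to the cache as the ifAbsent loader: cache.get(myUri, ifAbsent: fetchUiImage).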
I have a million records in my table. I want to call a SOAP service and process all of the records in less than one hour. Besides that, I have to update my table and insert the requests and responses into other tables. But the code below handles fewer than 10 records each time I run my app.
I know my code is wrong; I want to know the best way to do this.
static async Task Send()
{
    var results = new ConcurrentDictionary<string, int>();
    using (AppDbContext entities = new AppDbContext())
    {
        var List = entities.Request.Where(x => x.State == RequestState.InitialState).ToList();
        Parallel.ForEach(Enumerable.Range(0, List.Count), async index =>
        {
            var selected = List.FirstOrDefault();
            List.Remove(selected);
            var res1 = await DoAsyncJob1(selected); ///await
            // var res = CallService(selected);
            var res2 = await DoAsyncJob2(selected); ///await
            var res3 = await DoAsyncJob3(selected); ///await
            // var responses = await Task.WhenAll(DoAsyncJob1, DoAsyncJob2, DoAsyncJob3);
            // results.TryAdd(index.ToString(), res);
        });
    }
}
static async Task<int> DoAsyncJob1(Request item)
{
    using (AppDbContext entities = new AppDbContext())
    {
        var bReq = new BankRequest();
        bReq.Amount = Convert.ToDecimal(item.Amount);
        bReq.CreatedAt = DateTime.Now;
        bReq.DIBAN = item.DIBAN;
        bReq.SIBAN = item.SIBAN;
        entities.BankRequest.Add(bReq);
        entities.SaveChanges();
    }
    return item.Id;
}
static async Task<int> DoAsyncJob2(Request item)
{
    using (AppDbContext entities = new AppDbContext())
    {
    }
    return item.Id;
}
static async Task<int> DoAsyncJob3(Request item)
{
    using (AppDbContext entities = new AppDbContext())
    {
    }
    return item.Id;
}
Maybe the lines below are wrong:
var selected = List.FirstOrDefault();
List.Remove( selected );
Thanks in advance.
First, it is bad practice to use async-await inside Parallel.ForEach: the loop takes an Action, so your async lambda becomes async void and the loop never waits for it; you only introduce more load on the task scheduler and more overhead.
Second, you are right:
var selected = List.FirstOrDefault();
List.Remove( selected );
is very, very wrong. Your code will behave in a totally unpredictable way due to race conditions: several iterations can pick up the same item while others are skipped, and List<T> is not thread-safe to mutate concurrently in the first place.
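A sketch of one way to restructure it: materialize the pending requests once, then iterate them with async all the way down and a bounded degree of parallelism via SemaphoreSlim and Task.WhenAll. This reuses the DoAsyncJob methods from the question; the limit of 20 is only an illustrative value to tune (needs System.Linq, System.Threading and System.Threading.Tasks usings):

static async Task Send()
{
    List<Request> pending;
    using (AppDbContext entities = new AppDbContext())
    {
        pending = entities.Request
            .Where(x => x.State == RequestState.InitialState)
            .ToList();
    }

    // Throttle concurrency instead of mixing Parallel.ForEach with async lambdas.
    using (var throttle = new SemaphoreSlim(20)) // illustrative limit
    {
        var tasks = pending.Select(async item =>
        {
            await throttle.WaitAsync();
            try
            {
                // Each task works on its own item; nothing is removed from a shared list.
                await DoAsyncJob1(item);
                await DoAsyncJob2(item);
                await DoAsyncJob3(item);
            }
            finally
            {
                throttle.Release();
            }
        }).ToList();

        await Task.WhenAll(tasks);
    }
}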
I'm just wondering if someone could explain what is happening here.
Given this Post method on an API controller:
public HttpResponseMessage PostImage()
{
    var request = HttpContext.Current.Request;
    var c = SynchronizationContext.Current;
    var result = new HttpResponseMessage(HttpStatusCode.OK);
    if (Request.Content.IsMimeMultipartContent())
    {
        Request.Content.ReadAsMultipartAsync(new MultipartMemoryStreamProvider()).ContinueWith((task) =>
        {
            MultipartMemoryStreamProvider provider = task.Result;
            foreach (HttpContent content in provider.Contents)
            {
                Stream stream = content.ReadAsStreamAsync().Result;
                Image image = Image.FromStream(stream);
                var uploadFileName = content.Headers.ContentDisposition.FileName;
                var requestInside = HttpContext.Current.Request; // this is always null
                string filePath = Path.Combine(HostingEnvironment.MapPath(ConfigurationManager.AppSettings["UserFilesRootDir"]), userprofile.UserCode);
                //string[] headerValues = (string[])Request.Headers.GetValues("UniqueId");
                string fileName = userprofile.UserCode + ".jpg";
                string fullPath = Path.Combine(filePath, fileName);
                image.Save(fullPath);
            }
        });
        return result;
    }
}
Why would var requestInside = HttpContext.Current.Request; be null?
I've checked all the relevant settings:
<compilation debug="true" targetFramework="4.5">
...
<httpRuntime targetFramework="4.5"
And SynchronizationContext.Current is the newer AspNetSynchronizationContext rather than LegacyAspNetSynchronizationContext.
I'm presuming at the moment that it's because I'm on a different thread, is this a correct assumption?
ContinueWith is not guaranteed to run on the same thread, so the synchronization context can be lost. You could change your call to resume on the captured context by passing TaskScheduler.FromCurrentSynchronizationContext() to ContinueWith. See this previous SO answer.
If you use the async/await pattern, it automatically captures the current synchronization context, and the method resumes on it once the awaited operation completes. This is done by posting the continuation back to that context. An added benefit, IMHO, is cleaner-looking code.
You can change your code to the following, which uses that pattern. I have made no changes to it other than using async/await.
public async Task<HttpResponseMessage> PostImage()
{
    var request = HttpContext.Current.Request;
    var c = SynchronizationContext.Current;
    var result = new HttpResponseMessage(HttpStatusCode.OK);
    if (Request.Content.IsMimeMultipartContent())
    {
        MultipartMemoryStreamProvider provider = await Request.Content.ReadAsMultipartAsync(new MultipartMemoryStreamProvider());
        foreach (HttpContent content in provider.Contents)
        {
            Stream stream = await content.ReadAsStreamAsync();
            Image image = Image.FromStream(stream);
            var uploadFileName = content.Headers.ContentDisposition.FileName;
            var requestInside = HttpContext.Current.Request; // this is always null
            string filePath = Path.Combine(HostingEnvironment.MapPath(ConfigurationManager.AppSettings["UserFilesRootDir"]), userprofile.UserCode);
            //string[] headerValues = (string[])Request.Headers.GetValues("UniqueId");
            string fileName = userprofile.UserCode + ".jpg";
            string fullPath = Path.Combine(filePath, fileName);
            image.Save(fullPath);
        }
    }
    return result;
}