PuppeteerSharp - does anyone know why it is always triggering reCaptcha?

I am trying to get data from one page, but I need to log in first. When using Puppeteer I always get stuck on reCaptcha, no matter whether I run headless or not (it took me a while to figure out it was a captcha, as it was not rendered correctly).
When I log in manually using Chrome on the same machine, the captcha is not displayed and I can log in. It also works if I use CefSharp.ChromiumWebBrowser.
I could not find any specific answer, but if someone has been there and it is a lost cause, please let me know. I cannot use any other alternative here, so ChromiumWebBrowser will be my choice then.
Here is part of my code:
browser = await Puppeteer.LaunchAsync(new LaunchOptions { Headless = true, ExecutablePath = @"c:\Program Files (x86)\Google\Chrome\Application\chrome.exe", UserDataDir = @"C:/Users/XXXX/AppData/Local/Google/Chrome/User Data/Default/" });
page = await browser.NewPageAsync();
await page.SetViewportAsync(new ViewPortOptions() { IsMobile = false });
await page.SetUserAgentAsync("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36");
await page.GoToAsync(pUrl);
await page.WaitForXPathAsync(pWaitingExpression);
var cookies = await GetElementValue(null, "//button[@id='xyz']", false);
if (cookies != null)
{
    await page.EvaluateExpressionAsync("document.getElementById('xyz').click();");
}
var loginNode = await GetElementValue(null, "//span[@id='XXX']", false);
if (loginNode != null)
{
    await page.EvaluateExpressionAsync("document.getElementById('XXX').click();");
    await page.WaitForXPathAsync("//div[@id='XXX']");
    await page.EvaluateExpressionAsync("document.getElementById('XXX').value = 'XXXX';");
    await page.EvaluateExpressionAsync("document.getElementById('XXX').value = 'XXXXX';");
    await page.EvaluateExpressionAsync("document.evaluate('//div[@id=\"XXX\"]/input[@type=\"submit\"]', document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue.click();");
    await page.WaitForXPathAsync("//div[@id='ZZX']");
}
UPDATE:
I think I know the reason why I am getting the captcha in this particular case but not the others. In non-headless mode I see the following information below the address bar: "Chrome is being controlled by automated test software". Based on this I followed up my research and found this page:
https://intoli.com/blog/not-possible-to-block-chrome-headless/
I included the following code before navigating to my URL:
await page.EvaluateExpressionOnNewDocumentAsync("Object.defineProperty(navigator,'webdriver', { get: () => false, });");
Unfortunately, still no luck. I can see that the text blinks before the page loads, which indicates that the property is reset, but it is then immediately changed back.
So I believe I have answered the original question - WHY, but now I need an answer to HOW to bypass it. Anyone know?

I answered this myself after some more research - PuppeteerExtraSharp.
puppeteerExtra = new PuppeteerExtra();
stealth = new PuppeteerExtraSharp.Plugins.ExtraStealth.StealthPlugin();
puppeteerExtra.Use(stealth);
browser = await puppeteerExtra.LaunchAsync(new LaunchOptions { Headless = false, ExecutablePath = @"c:\Program Files (x86)\Google\Chrome\Application\chrome.exe" });
In case someone like me already has PuppeteerSharp added to the project - it seems PuppeteerExtraSharp requires a lower version of the original package.
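For anyone wiring this up from scratch, here is a minimal end-to-end sketch of the stealth launch (the class name, Main entry point and URL are placeholders of mine, not from the original post; the ExecutablePath should point at your own Chrome install):
using System.Threading.Tasks;
using PuppeteerSharp;
using PuppeteerExtraSharp;
using PuppeteerExtraSharp.Plugins.ExtraStealth;

class StealthLoginDemo
{
    static async Task Main()
    {
        // Register the stealth plugin before launching so its patches
        // (navigator.webdriver and friends) apply to every page the browser opens.
        var puppeteerExtra = new PuppeteerExtra();
        puppeteerExtra.Use(new StealthPlugin());

        var browser = await puppeteerExtra.LaunchAsync(new LaunchOptions
        {
            Headless = false,
            ExecutablePath = @"c:\Program Files (x86)\Google\Chrome\Application\chrome.exe"
        });

        var page = await browser.NewPageAsync();
        await page.GoToAsync("https://example.com/login"); // placeholder URL

        // ...continue with the same WaitForXPathAsync / EvaluateExpressionAsync flow as above...

        await browser.CloseAsync();
    }
}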

Related

Using ZXingScannerPage with XF, my content page has weird behavior

I am making an app in Xamarin Forms which will have a login similar to that of WhatsApp Web: an on-screen QR code that is scanned with the phone. In the emulator with Visual Studio 2017 I have no problems, but when I export the app to an APK and install it on a mobile device, the app reads the QR code and returns to the previous login screen without any reaction, when it should go to the next screen where I have a dashboard.
What could it be? I enclose the code I used.
btnScanQRCode.IsEnabled = false;
var scan = new ZXingScannerPage();
scan.OnScanResult += (result) =>
{
    scan.IsScanning = false;
    Device.BeginInvokeOnMainThread(async () =>
    {
        await Application.Current.MainPage.Navigation.PopAsync();
        var resultado = JsonConvert.DeserializeObject<QrCode>(result.Text);
        JObject qrObject = JObject.Parse(JsonConvert.SerializeObject(resultado));
        JsonSchema schema = JsonSchema.Parse(SettingHelper.SchemaJson);
        bool valid = qrObject.IsValid(schema);
        if (valid == true)
        {
            App.Database.InsertQrCode(resultado);
            QrCode qr = App.Database.GetQrCode();
            await _viewModel.Login();
            await Navigation.PushAsync(new Organization());
        }
        else
        {
            await DisplayAlert("False", JsonConvert.SerializeObject(resultado), "ok");
        }
    });
};
await Application.Current.MainPage.Navigation.PushAsync(scan);
btnScanQRCode.IsEnabled = true;
This was originally a comment, but while writing it I realized it is the answer.
You need to debug your code. Attach a device and deploy the app in the Debug configuration. Step through your code and see where it fails.
It sounds like it's crashing silently, probably on the line where you deserialize result.Text into a QrCode. result.Text is just a string and will never deserialize into an object. You probably need a constructor that takes a string, like QrCode(result.Text).
First scan, then use the result to do other things in your app.
var scanner = new ZXing.Mobile.MobileBarcodeScanner();
var result = await scanner.Scan();
Check for proper camera permissions. I bet your problem is there.
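Putting those suggestions together, here is a minimal sketch (the page class and handler name are mine, not from the post) that scans first and then guards the deserialization, so a malformed payload surfaces as an alert instead of a silent crash:
// Sketch only: assumes the Xamarin.Forms, ZXing.Net.Mobile and Newtonsoft.Json packages
// already used in the question, plus the QrCode and Organization types from the post.
using System;
using Newtonsoft.Json;
using Xamarin.Forms;

public partial class LoginPage : ContentPage   // hypothetical page hosting the scan button
{
    async void OnScanClicked(object sender, EventArgs e)
    {
        // Make sure the CAMERA permission is declared (and granted at runtime on newer
        // Android versions) before scanning; a denied permission fails with little feedback.
        var scanner = new ZXing.Mobile.MobileBarcodeScanner();
        var result = await scanner.Scan();                 // null if the user cancels
        if (result == null || string.IsNullOrEmpty(result.Text))
            return;

        try
        {
            // If result.Text is not valid JSON for QrCode this throws here,
            // instead of dying silently inside BeginInvokeOnMainThread.
            var qr = JsonConvert.DeserializeObject<QrCode>(result.Text);
            App.Database.InsertQrCode(qr);
            await Navigation.PushAsync(new Organization());
        }
        catch (Exception ex)
        {
            await DisplayAlert("Scan failed", ex.Message, "OK");
        }
    }
}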

How do I save a storage file to a location that the user chooses in UWP?

I'm opening a file via:
await Windows.System.Launcher.LaunchFileAsync(storageFile, options);
The documentation for LaunchFileAsync says:
When the launch fails for any of the above reasons, the API succeeds and returns FALSE from its asynchronous operation. Since it has no ability to query whether the above restrictions apply to the current launch, the calling app should not assume that the launch succeeded, and should provide fallback mechanism in case it failed. A possible solution would be to ask the user to save the file and direct the user to open it in the desktop.
What's the most straightforward way to do that?
I tried:
var picker = new FolderPicker();
var pfolder = await picker.PickSingleFolderAsync();
StorageApplicationPermissions.FutureAccessList.Add(pfolder);
var folder = await StorageFolder.GetFolderFromPathAsync(pfolder.Path);
var file = await folder.CreateFileAsync(storageFile.Name);
using (var writer = await file.OpenStreamForWriteAsync())
{
    await writer.WriteAsync(storageFile, 0, 0);
}
But unfortunately writer.WriteAsync only takes byte[] and not the StorageFile. How do I get my StorageFile saved?
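Not an authoritative answer, but one straightforward route (assuming the goal is simply to place a copy of the existing StorageFile wherever the user points) is to let StorageFile.CopyAsync do the writing instead of opening a stream yourself. A minimal sketch:
using System.Threading.Tasks;
using Windows.Storage;
using Windows.Storage.AccessCache;
using Windows.Storage.Pickers;

public static class FileSaveHelper
{
    // storageFile is the StorageFile you already have (e.g. the one passed to LaunchFileAsync).
    public static async Task SaveCopyWhereUserChoosesAsync(StorageFile storageFile)
    {
        var picker = new FolderPicker();
        picker.SuggestedStartLocation = PickerLocationId.Desktop;
        picker.FileTypeFilter.Add("*");   // FolderPicker throws if the filter list is left empty

        StorageFolder folder = await picker.PickSingleFolderAsync();
        if (folder == null)
            return;                       // user cancelled the picker

        // Remember the folder so the app can write there again later without re-prompting.
        StorageApplicationPermissions.FutureAccessList.Add(folder);

        // CopyAsync reads and writes the file contents for you; no byte[] plumbing needed.
        await storageFile.CopyAsync(folder, storageFile.Name, NameCollisionOption.GenerateUniqueName);
    }
}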

addFileAttachmentAsync() is not functioning as expected in Outlook Desktop on a Windows 10 machine

I am trying to add an inline image to the mail body through an Outlook add-in. It works fine in OWA, but the desktop app fails to attach it inline; instead I get the image as a regular attachment and a broken image icon in the email body.
I contacted Microsoft Dev Chat; they don't seem to be able to repro it. I tried the code they sent me, and it behaves the same.
Here is the code:
function AttCallback(asyncResult) {
    if (asyncResult.status == Office.AsyncResultStatus.Failed) {
        console.log(asyncResult.error);
    } else {
        var szCID = asyncResult.asyncContext.UniqueName;
        var szAddBodyData = "<p>Here's a cute bird!</p><br><div><img src='cid:" + szCID + "'></div><br>";
        Office.context.mailbox.item.body.setSelectedDataAsync(
            szAddBodyData,
            { coercionType: Office.CoercionType.Html });
        console.log("Attachment added");
    }
}

function insertAttachment() {
    var szName = "cute_bird.png";
    var options = { isInline: true, ContentId: szName, 'asyncContext': { UniqueName: szName } };
    //var options = { asyncContext: null };
    Office.context.mailbox.item.addFileAttachmentAsync(
        "http://i.imgur.com/WJXklif.png",
        szName,
        options,
        AttCallback);
}
Here is what is happening on my machine.
Note: As you can see from the code, by the time the callback function gets hit, the attachment has already been added. However, I do have the inline property set to true.
Has anyone experienced this before? Any suggestions would be appreciated.
As you can see from the documentation:
https://learn.microsoft.com/en-us/office/dev/add-ins/reference/objectmodel/requirement-set-1.5/outlook-requirement-set-1.5
inline image attachment support shipped with Outlook requirement set 1.5. You should specify this requirement in your manifest.xml to ensure your add-in only appears in clients where it can work.
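If it helps, the relevant fragment of manifest.xml looks roughly like this (a sketch of just the Requirements element, which sits inside the OfficeApp element; the rest of the manifest is omitted):
<Requirements>
  <Sets DefaultMinVersion="1.5">
    <!-- Mailbox 1.5 is the requirement set that includes inline attachment support -->
    <Set Name="Mailbox" MinVersion="1.5" />
  </Sets>
</Requirements>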

WaitUntil not waiting / Get HTML on WaitForSelectorAsync

I'm having two problems that I would appreciate some advice on. I have used Puppeteer in the past in Node, but for some reason I am running into a problem with the Sharp version.
Basically, I'm crawling a webpage with WaitUntil set to WaitUntilNavigation.Networkidle0, the longest wait period. In my Node code this runs and loads my website correctly, but in the C# version I get the page without Angular loaded. From what I can tell, it is not waiting and is returning the initial Load state. Below is my code.
if (BROWSER == null)
{
    await new BrowserFetcher().DownloadAsync(BrowserFetcher.DefaultRevision);
    BROWSER = await Puppeteer.LaunchAsync(new LaunchOptions
    {
        Headless = true,
        Args = new string[] { "--no-sandbox", "--disable-accelerated-2d-canvas", "--disable-gpu", "--proxy-server='direct://'", "--proxy-bypass-list=*" }
    });
}
if (page == null)
{
    page = await BROWSER.NewPageAsync();
    await page.SetUserAgentAsync("PScraper-SiteCrawler");
    await page.SetViewportAsync(new ViewPortOptions() { Width = 1024, Height = 842 });
    var response = await page.GoToAsync(url, new NavigationOptions() { Referer = "PScraper-SiteCrawler", Timeout = timeoutMilliseconds, WaitUntil = new[] { WaitUntilNavigation.Networkidle0 } });
}
The timeout is set to 30 seconds, or 30,000 milliseconds. I then get the HTML of the page by doing
await response.TextAsync()
My second question is unrelated, but likely simpler to solve. One route I was considering was using the page.WaitForSelectorAsync() method. This appears to wait until the content I'm looking for is loaded, but I haven't been able to figure out how to get the entire HTML of the page after this is done from the ElementHandle it returns.
Would appreciate some help here; I've tried a couple of routes and haven't been able to figure out what's causing the difference between the Node and C# code.
Solved my problem. The issue was how I was getting the HTML of the page.
I was using...
await response.TextAsync()
Apparently, this gets me only the initial response. When I changed my HTML retrieval to the following line of code, everything worked as expected.
await page.GetContentAsync()
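That also covers the second question in passing: you can wait on a selector and still pull the full HTML from the page object itself instead of from the ElementHandle. A rough sketch (the helper name and the readySelector parameter are mine, not from the post):
using System.Threading.Tasks;
using PuppeteerSharp;

static class HtmlScraper
{
    // Waits for an element that only exists after Angular has rendered,
    // then returns the current DOM rather than the initial network response.
    public static async Task<string> GetRenderedHtmlAsync(Page page, string url, string readySelector)
    {
        await page.GoToAsync(url);
        await page.WaitForSelectorAsync(readySelector, new WaitForSelectorOptions { Timeout = 30000 });

        // GetContentAsync serializes the DOM as it exists now,
        // unlike response.TextAsync(), which returns only the initial payload.
        return await page.GetContentAsync();
    }
}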

Detecting Gmail attachment downloads

Is there a way to detect if a particular file that is being downloaded is a Gmail attachment?
I am looking for a way to write a Greasemonkey script which would help me organize the downloads based on their download sources; say, Gmail email attachments would be handled differently from other downloads.
So far, I've found out that attachments redirect to https://mail-attachment.googleusercontent.com/attachment/u/0/, which I guess is not sufficient.
EDIT
Since an add-on would be more powerful than a userscript, I've decided to pursue the Add On idea. However, the problem of detection remains unsolved.
This is too complicated for just one question; it has at least these major parts:
Do you want to redirect downloads when the user clicks, or automatically download select files? Clarify the question.
Your GM script must identify the appropriate download links, and on which pages, and for which views. For Gmail, this is not a trivial task, and the question needs to be clearer. It's worthy of a whole question just on this issue, given the variety of views and AJAX involved.
Once identified, the script probably needs to intercept clicks on those links. (Depends on your goal (clarify!) and what the Firefox extension can do.)
Greasemonkey needs to interact with an extension that either intercepts the user-initiated download, or allows for an automatic download. I've detailed the auto-download approach, below.
Once your script has identified the appropriate file URLs and/or links (Open a new question for more help with that, and include pictures of the types of pages and links you want.), it can interface with a Firefox add-on, like the one below, to automatically save those files.
Automatically saving files from Greasemonkey with the help of an additional Add-on:
WARNING: The following is a working proof of concept for education only. It has no security features, and if you use it as-is, for actual surfing, some webpage or script writer or extension writer will use it to completely pwn your computer.
If you use the Add-on builder or SDK to install or "Test" the DANGER. DANGER. DANGER. File download utility,
Then you can use a Greasemonkey script, like this, to automatically save files:
// ==UserScript==
// @name     _Call our File download add-on to trigger a file download.
// @include  https://mail.google.com/mail/*
// @include  https://stackoverflow.com/questions/14440362/*
// @require  http://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js
// @grant    GM_addStyle
// ==/UserScript==
/*- The @grant directive is needed to work around a design change
    introduced in GM 1.0. It restores the sandbox.
*/
var fileURL = "http://userscripts.org/scripts/source/29222.user.js";
var savePath = "D:\\temp\\";
var extensionLoaded = false;

window.addEventListener ("ImAlivefromExtension", function (zEvent) {
    console.log ("The test extension appears to be loaded!", zEvent.detail);
    extensionLoaded = true;
} );

window.addEventListener ("ReplyToDownloadRequest", function (zEvent) {
    //var xxxx = JSON.parse (zEvent.detail);
    console.log ("Extension replied: ", zEvent.detail);
} );

$("body").prepend ('<button id="gmFileDownloadBtn">Click to File download request.</button>');

$("#gmFileDownloadBtn").click ( function () {
    if (extensionLoaded) {
        detailVal = JSON.stringify (
            {targFileURL: fileURL, targSavePath: savePath}
        );
        var zEvent = new CustomEvent (
            "SuicidalDownloadRequestToAddOn",
            {"detail": detailVal }
        );
        window.dispatchEvent (zEvent);
    }
    else {
        alert ("The file download extension is not loaded!");
    }
} );
You can test the script on this SO question page.
Note that any other extension, userscript, web page, or plugin can listen for or send spoofed events; the only security, so far, is to limit which pages the extension runs on.
For reference, the extension source files are below. The rest is supplied by Firefox's Add-on SDK.
The content script:
var zEvent = new CustomEvent ("ImAlivefromExtension",
    {"detail": "GM, DANGER, DANGER, DANGER, File download utility" }
);
window.dispatchEvent (zEvent);

window.addEventListener ("SuicidalDownloadRequestToAddOn", function (zEvent) {
    console.log ("Extension received download request: ", zEvent.detail);

    //-- Relay request to extension main.js
    self.port.emit ("SuicidalDownloadRequestRelayed", zEvent.detail);

    //-- Reply back to GM, or whoever is pretending to be GM.
    var zEvent = new CustomEvent ("ReplyToDownloadRequest",
        {"detail": "Your funeral!" }
    );
    window.dispatchEvent (zEvent);
} );
The background JS:
//--- For security, MAKE THESE AS RESTRICTIVE AS POSSIBLE!
const includePattern = [
    'https://mail.google.com/mail/*',
    'https://stackoverflow.com/questions/14440362/*'
];

let {Cc, Cu, Ci} = require ("chrome");
Cu.import ("resource://gre/modules/Services.jsm");
Cu.import ("resource://gre/modules/XPCOMUtils.jsm");
Cu.import ("resource://gre/modules/FileUtils.jsm");

let data = require ("sdk/self").data;
let pageMod = require ('sdk/page-mod');
let dlManageWindow = Cc['@mozilla.org/download-manager-ui;1'].getService (Ci.nsIDownloadManagerUI);

let fileURL = "";
let savePath = "";
let activeWindow = Services.wm.getMostRecentWindow ("navigator:browser");

let mod = pageMod.PageMod ( {
    include: includePattern,
    contentScriptWhen: 'end',
    contentScriptFile: [ data.url ('ContentScript.js') ],
    onAttach: function (worker) {
        console.log ('DANGER download utility attached to: ' + worker.tab.url);
        worker.port.on ('SuicidalDownloadRequestRelayed', function (message) {
            var detailVal = JSON.parse (message);
            fileURL = detailVal.targFileURL;
            savePath = detailVal.targSavePath;
            console.log ("Received request to \ndownload: ", fileURL, "\nto:", savePath);
            downloadFile (fileURL, savePath);
        } );
    }
} );
function downloadFile (fileURL, savePath) {
    dlManageWindow.show (activeWindow, 1);
    try {
        let newFile;
        let fileURIToDownload = Services.io.newURI (fileURL, null, null);
        let persistWin = Cc['@mozilla.org/embedding/browser/nsWebBrowserPersist;1']
            .createInstance (Ci.nsIWebBrowserPersist);
        let fileName = fileURIToDownload.path.slice (fileURIToDownload.path.lastIndexOf ('/') + 1);
        let fileObj = new FileUtils.File (savePath);
        fileObj.append (fileName);

        if (fileObj.exists ()) {
            console.error ('*** Error! File "' + fileName + '" already exists!');
        }
        else {
            let newFile = Services.io.newFileURI (fileObj);
            let newDownload = Services.downloads.addDownload (
                0, fileURIToDownload, newFile, fileName, null, null, null, persistWin, false
            );
            persistWin.progressListener = newDownload;
            persistWin.savePrivacyAwareURI (fileURIToDownload, null, null, null, "", newFile, false);
        }
    } catch (exception) {
        console.error ("Error saving the file! ", exception);
        dump (exception);
    }
}
From what you are saying, the only thing you can do is make an add-on (Firefox) and an extension (for Chrome, if you want).
If you look more closely at when an attachment download happens, it is when:
1) You click on the attachment's icon
2) You click download
For both of these you can find the click event on the <a> tag containing the download_url value. You can easily do that using JS/jQuery when creating the extension.
So you can add your own functionality for when the user tries to download an attachment.
You could use Gmail contextual gadgets to modify the behavior on the Google side:
Gmail Contextual Gadgets
Contextual Gadgets don't have direct access to attachments but server side, you could use IMAP to access the attachment (based on the Gmail message ID identified by the gadget):
Gmail IMAP Extensions
Using gadgets and server-side IMAP has the advantage of being browser-agnostic.
It's not entirely clear what you want to do differently with the downloaded Gmail attachment as opposed to any other download (save it to a different location? perform actions on the attachment data?), but the contextual gadget and IMAP should give you some chance to modify the attachment data as needed before the browser download begins.
