HTTPBuilder - How can I get the HTML content of a web page?

I need to extract the HTML of a web page.
I'm using HTTPBuilder in Groovy, making the following GET request:
import groovyx.net.http.HTTPBuilder
import groovyx.net.http.ContentType
import groovyx.net.http.Method
def http = new HTTPBuilder('http://www.google.com/search')
http.request(Method.GET) {
    requestContentType = ContentType.HTML
    response.success = { resp, reader ->
        println "resp: " + resp
        println "READER: " + reader
    }
    response.failure = { resp, reader ->
        println "Failure"
    }
}
The response I get does not contain the same HTML I see when I view the source of www.google.com/search. In fact, it isn't HTML at all, and it does not contain the same information I can see in the page source.
I've tried setting different headers (for example, headers.Accept = 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8', headers.Accept = 'text/html', setting the User-Agent, etc.), but the result is the same.
How can I get the HTML of www.google.com/search (or any web page) using HTTPBuilder?

Why use HTTPBuilder? You could instead use
def url = "http://www.google.com/".toURL()
println url.text
to extract the content of the web page.

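That's because HTTPBuilder automatically parses the response based on its content type. An HTML response is parsed into an XML node tree, so printing the reader does not show the markup.
To get the raw HTML, read the text directly from the response entity:
import static groovyx.net.http.ContentType.TEXT
def htmlResult = http.get(uri: url, contentType: TEXT) { resp ->
    // read the raw response body stream as a String
    resp.entity.content.text
}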

Related

Is there a way to Map Local in Proxyman based on parameters in the body of a request?

I have a URL:
https://cn.company.com/appv2/search
I want to use a different Map Local depending on a parameter in the request body. The parameter is NOT in the URL (as in https://cn.company.com/appv2/search?cursor=abc). Instead, it is in the body of the request: { cursor: abc }.
Any idea whether this can be done in Proxyman?
I basically want to stub pagination through the proxy without waiting on a server implementation. The first request has no cursor, and the server returns a cursor. The next request sends that cursor and gets a different response, so I can test the full pagination flow.
Yes, this can be solved with Proxyman's Scripting tool:
1. Use Scripting to read the value from the request body.
2. If it matches, use Scripting to mimic Map Local (Mock API is also supported).
Here is the sample code and how to use it:
1. First, send your request and make sure you can see the HTTPS response.
2. Right-click the request -> Tools -> Scripting.
3. Select the Mock API checkbox if you'd like a Mock API.
4. Use this code:
/// This func is called if the Response Checkbox is Enabled. You can modify the Response Data here before it goes to the client
/// e.g. Add/Update/Remove: headers, statusCode, comment, color and body (json, plain-text, base64 encoded string)
///
async function onResponse(context, url, request, response) {
    // Read the cursor from the request body (the first request may have no body)
    var cursorValue = request.body ? request.body["cursor"] : undefined;
    // Pick a local file depending on the cursor value
    if (cursorValue === "abc") {
        // Set Content Type as JSON
        response.headers["Content-Type"] = "application/json";
        // Serve a local file, like Map Local
        response.bodyFilePath = "~/Desktop/my_response_A.json";
    } else if (cursorValue === "def") {
        response.headers["Content-Type"] = "application/json";
        response.bodyFilePath = "~/Desktop/my_response_B.json";
    }
    // Done
    return response;
}
Reference
Map Local with Scripting: https://docs.proxyman.io/scripting/snippet-code#map-a-local-file-to-responses-body-like-map-local-tool-proxyman-2.25.0+
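The if/else chain above grows with every page you stub. Below is a minimal table-driven sketch. It assumes that `request.body` may reach the script as a parsed object, as a raw JSON string, or not at all on the first request. `MOCK_PAGES`, `readCursor`, `pickMockFile` and `page_1.json` are hypothetical names and are not part of Proxyman's API. The empty-string key covers the first request, which has no cursor, in the pagination flow from the question.

```javascript
// Hypothetical lookup table: cursor value -> local file to serve.
// The "" key stands for "no cursor", i.e. the first page.
const MOCK_PAGES = {
  "": "~/Desktop/page_1.json",
  "abc": "~/Desktop/my_response_A.json",
  "def": "~/Desktop/my_response_B.json",
};

// Reads the cursor from a body that may be an object, a JSON string, or missing.
function readCursor(body) {
  if (typeof body === "string") {
    try {
      body = JSON.parse(body);
    } catch (e) {
      return ""; // treat an unparseable body as "no cursor"
    }
  }
  if (body == null || typeof body !== "object") return "";
  const cursor = body.cursor;
  return cursor == null ? "" : String(cursor);
}

// Returns the file to serve, or null to let the real response through.
function pickMockFile(body, table) {
  const cursor = readCursor(body);
  return Object.prototype.hasOwnProperty.call(table, cursor) ? table[cursor] : null;
}

// Inside Proxyman, onResponse could then delegate to the helper:
// async function onResponse(context, url, request, response) {
//   const file = pickMockFile(request.body, MOCK_PAGES);
//   if (file) {
//     response.headers["Content-Type"] = "application/json";
//     response.bodyFilePath = file;
//   }
//   return response;
// }
```

Adding a page then means adding one entry to the table instead of another branch.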

ASP.NET MVC - ajax call to server side for pdf files?

I am trying to make an Ajax call to my app's controller to get some PDF files as follows:
function AjaxCallImages(URL) {
    var result = $.ajax({
        type: "GET",
        url: URL,
        success: SuccessFunctionImages,
        error: ErrorFunction
    });
    return result;
}
On the server side of my web app (in the model), I am reading files from a remote server:
public static List<byte[]> GetFiles()
{
    List<byte[]> files = new List<byte[]>();
    string uri = @"\\REMOTER_SERVER_IP\Users\Public\myfolder";
    string[] filesInfo = Directory.GetFiles(uri);
    foreach (string fPath in filesInfo)
    {
        string fileName = MYPATH;
        using (var webClient = new WebClientNoKeepAlive())
        {
            byte[] filedata = webClient.DownloadData(fPath);
            files.Add(filedata);
        }
    }
    return files;
}
The result (a List<byte[]>) is returned to the controller and sent back in response to the AJAX call.
What I receive is an array of strings.
I need to show these PDF files in the browser, but I'm not sure how to do this or whether using AJAX is a good idea. Since I want to show them without reloading the page, I opted for AJAX. Is there a good solution for this? I'd appreciate any guidance. Thank you.
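One possible approach assumes each string in the array is a base64-encoded PDF (Json.NET, for example, serializes byte[] as a base64 string). You would decode the string in the browser, wrap the bytes in a Blob and display it through an object URL. The helper names below are hypothetical; `SuccessFunctionImages` is the callback from the question.

```javascript
// Decodes a base64 string into raw bytes.
function base64ToBytes(base64) {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return bytes;
}

// Checks the "%PDF-" magic header so non-PDF payloads are not shown as PDFs.
function looksLikePdf(bytes) {
  const magic = "%PDF-";
  if (bytes.length < magic.length) return false;
  for (let i = 0; i < magic.length; i++) {
    if (bytes[i] !== magic.charCodeAt(i)) return false;
  }
  return true;
}

// Browser only: wraps the bytes in a Blob and returns a URL an <iframe> can display.
function toPdfObjectUrl(bytes) {
  const blob = new Blob([bytes], { type: "application/pdf" });
  return URL.createObjectURL(blob);
}

// Possible success callback for the $.ajax call in the question:
// function SuccessFunctionImages(files) {
//   files.map(base64ToBytes).filter(looksLikePdf).forEach(function (bytes) {
//     var frame = document.createElement("iframe");
//     frame.src = toPdfObjectUrl(bytes);
//     document.body.appendChild(frame);
//   });
// }
```

Object URLs stay alive until `URL.revokeObjectURL` is called, so revoke them when a frame is removed.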

Outlook add-in: images & files

I've searched for a solution to my problems but couldn't find one anywhere. I hope someone here can help.
I'm writing an add-in in JavaScript in VS2015 that encrypts and decrypts message bodies.
1. The first problem is that the recipient can't see images.
(I mean images inserted into the body with the "Insert picture inline" button.)
In compose mode we encrypt the message, and decrypting it works fine, because compose mode runs on the client side and recognizes the local images.
In read mode, when the user wants to decrypt the message and see the images, they can't. The encryption prevents Outlook from converting the local images into data on the server.
In my code I get the message body like this (compose mode):
item.body.getAsync(
    "html",
    { asyncContext: "This is passed to the callback" },
    function callback(resultbody) {
        // ...here we send resultbody.value to be encrypted
    });
Then the user sends the encrypted message by clicking 'Send' as usual.
In read mode I just print the decrypted content into my HTML to check that decryption works:
JSON.parse(xhr.responseText).Data.Content
This shows a picture icon, but I can't get the real picture to appear.
The src of the icon points to a location the recipient can't access:
<img src="https://attachment.outlook.office.net/owa/*****/service.svc/s/GetFileAttachment?id=AAMkADUwMDE0YWM1LTYwODctNG ......
How can I take this image tag and make the image visible to the recipient? I don't want users to have to insert images into the body through my add-in instead of through Outlook's own button. I tried converting the image to a base64 string, but the tag alone doesn't give me enough to do that; it only works with the original picture. Even then, the image shows up in my HTML but not in the message body when I use the setAsync function.
2. The second problem is with attachments.
I upload files with the Dropzone plug-in, because Outlook doesn't give access to an existing attachment so it can be modified. After I upload and encrypt the files, I create a new file from the server's response with the JavaScript File API:
var f = new File([""], "filename.txt", { type: "text/plain", lastModified: date }); ...
Then I want to attach the file to the mail, and the only method that does this is:
addFileAttachmentAsync(uri, attachmentName, [options], [callback])
This method needs a URL for the file, so I create one with:
var objectURL = URL.createObjectURL(f);
But when I call addFileAttachmentAsync with objectURL, it reports a problem and can't attach the file. I think the URL is incorrect.
Thanks all!!
For everyone looking for a solution to these problems:
These solutions work well in Outlook on the web. Outlook Desktop has a problem synchronizing with the server, so saveAsync is delayed, and there is no fix for that right now. It works, but you need to wait a little. You can read more about it here.
First question:
Outlook add-ins have a problem when you call getAsync and then setAsync on a body that contains an image. If you take the body in HTML format and write back a modified body, the image has not been 'uploaded' yet, so its src is wrong.
I managed to work around this problem using the Outlook REST API.
The workaround goes like this:
1. Get the message body as HTML with getAsync, create a div element and put the returned body inside it.
2. To get the message ID, save the message as a draft with the saveAsync function.
3. To make requests to the Outlook REST API you need an access token, so call getCallbackTokenAsync and save the token.
4. Make an HTTP request to the Outlook REST API to get all attachments of the message.
5. Find the right ID for each image and replace the image src with the base64 of the image returned by the REST API.
6. Finally, set the new body with the setAsync function.
Code:
Office.context.mailbox.item.body.getAsync(
    Office.CoercionType.Html,
    { asyncContext: "This is passed to the callback" },
    function (resultbody) {
        // Put the HTML body into a detached <div> so its <img> tags can be edited
        var bodyDiv = document.createElement('div');
        bodyDiv.innerHTML = resultbody.value;
        // Save as a draft so the message gets an item ID
        Office.context.mailbox.item.saveAsync(function (saveResult) {
            var savedItemId = saveResult.value;
            Office.context.mailbox.getCallbackTokenAsync({ isRest: true }, function (tokenResult) {
                if (tokenResult.status !== "succeeded") return;
                var accessToken = tokenResult.value;
                // The REST API needs a REST-formatted ID (Outlook for iOS already uses one)
                var itemId;
                if (Office.context.mailbox.diagnostics.hostName === 'OutlookIOS')
                    itemId = Office.context.mailbox.item.itemId;
                else
                    itemId = Office.context.mailbox.convertToRestId(savedItemId,
                        Office.MailboxEnums.RestVersion.v2_0);
                var xhr3 = new XMLHttpRequest();
                xhr3.open("GET", "https://outlook.office.com/api/v2.0/me/messages/" + itemId + "/attachments", true);
                xhr3.setRequestHeader("Content-type", "application/json");
                // Access-Control-Allow-Origin is a response header, so it is not sent here
                xhr3.setRequestHeader("Authorization", "Bearer " + accessToken);
                xhr3.onreadystatechange = function () {
                    if (xhr3.readyState !== 4 || xhr3.status !== 200) return;
                    var allImages = JSON.parse(xhr3.response).value;
                    var imgs = bodyDiv.getElementsByTagName('img');
                    for (var i = 0; i < imgs.length; i++) {
                        var src = imgs[i].getAttribute("src") || "";
                        if (src.indexOf("base64") !== -1) continue; // already inline
                        // Outlook Desktop keeps "cid:..." in src; Outlook on the web keeps it in originalsrc
                        var imgSrcId = src.indexOf("cid:") === 0 ? src : imgs[i].getAttribute("originalsrc");
                        if (!imgSrcId) continue;
                        imgSrcId = imgSrcId.substr(4); // strip the "cid:" prefix
                        var wantedImg = null;
                        for (var j = 0; j < allImages.length; j++) {
                            if (allImages[j].ContentId && allImages[j].ContentId.indexOf(imgSrcId) !== -1) {
                                wantedImg = allImages[j];
                                break;
                            }
                        }
                        if (wantedImg)
                            imgs[i].src = 'data:' + wantedImg.ContentType + ';base64,' + wantedImg.ContentBytes;
                    }
                    // Write the rewritten HTML back into the message body
                    Office.context.mailbox.item.body.setAsync(bodyDiv.innerHTML,
                        { coercionType: Office.CoercionType.Html },
                        function (setResult) { /* ... */ });
                };
                xhr3.send();
            });
        });
    });
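The fragile part of this workaround is matching each `cid:` reference to the right attachment, because a `localeCompare(...) != -1` test is not an equality check. Below is a minimal sketch of that matching step as plain functions. It assumes attachments carry the `ContentId`, `ContentType` and `ContentBytes` fields returned by the REST call above, and that a ContentId may or may not be wrapped in angle brackets. The function names are hypothetical.

```javascript
// Returns the Content-ID referenced by a src like "cid:image001.png@01D2ABCD", or null.
function contentIdFromSrc(src) {
  if (typeof src !== "string") return null;
  return src.toLowerCase().startsWith("cid:") ? src.slice(4) : null;
}

// Strips optional surrounding angle brackets, e.g. "<logo@x>" -> "logo@x".
function bareId(id) {
  return id.replace(/^<|>$/g, "");
}

// Finds the attachment whose ContentId equals the referenced Content-ID.
function findInlineAttachment(contentId, attachments) {
  const wanted = bareId(contentId);
  return attachments.find(
    (a) => typeof a.ContentId === "string" && bareId(a.ContentId) === wanted
  ) || null;
}

// Builds the data: URI that replaces the img src.
function toDataUri(attachment) {
  return "data:" + attachment.ContentType + ";base64," + attachment.ContentBytes;
}

// Maps each src to its inline data: URI, leaving non-cid or unmatched srcs unchanged.
function inlineSrcs(srcs, attachments) {
  return srcs.map((src) => {
    const cid = contentIdFromSrc(src);
    if (!cid) return src;
    const att = findInlineAttachment(cid, attachments);
    return att ? toDataUri(att) : src;
  });
}
```

Keeping the matching outside the Office.js callbacks means it can be checked without a mailbox.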
Second question
The problem with addFileAttachmentAsync is that it only works with files on an external server. It can't add a blob or a local file.
So the solution here also uses the Outlook REST API. It attaches the file to the message, but there is no preview of the attachment in the compose window. When the message is sent, the attachment is included, and you can see it in the sent message.
The solution is very similar to the one for images in the body. Save the message as a draft and get an access token, but this time send a 'POST' request to the message ID to attach the file to the current message.
Here is the code for the request that adds the attachment (everything up to this point is the same as in question 1):
var attachment ={
"@odata.type": "#Microsoft.OutlookServices.FileAttachment",
"Name": "smile.png",
"ContentBytes": "AAACFAMxLjAeKUDndY7EKF4P7QiWE7HgHLa7UiropGUTiDp5V07M0c5jaaTteauhzs0hOU+EOmVT0Lb6eSQ2MzgkCre/zCV9+kIB9PjWnOzoufau67J9PQdXapsOQSMcpt9X2QpcIjnl7H3sLu9iu2rqcvSjwhDnK6JygtghUB405EZHZ9LQcfJ1ZTYHylke2T9zbViq2BPqU/8IHZWsb/KQ/qzV4Jwv3NHnI583JvOuAtETJngh964edC4cU2IY6FkIWprksRw7d4fEQ/+3KbEyW0trIZm59jpTSV01/PhOI0RDKj1xI1Vr+lgMRZpOrYDfChWWWbByNzSXbIsTjHMU6GmQ5Cb09H3kv/2koFa5Pj2z8i+NGywYKw8ZSu3NVblM9I0EkQVLrxkM8gqyrDEtAobxPRxEzGTEXdnjws5UIiiGFBq3khuxejFGCNvUbmPM9guVZO0ccDe1FICTFHkrPlLZW/TvJYMou0HBrvH7s4taBHyZw5x03dhps+WG19D5na44vaVX2Vni6ZrrxfqFo7JTUpCJxCcPyoG7/nEWtJ/V/J+oXdypeapN9Agl6Q81WvCbzuyZgbLTfj6NXWDoliie069Hvk/k2lP+HyO7Iu5ffeRX2WWguwdfGXiNbqInrxn18tX+N7/KqWbRJv96tmijdCmCvsF9Lpr9k7QFKB93wuHfTuE6Qi2IVNBfzNBaz1iJYjY="
}
var xhr4 = new XMLHttpRequest();
xhr4.open("POST", "https://outlook.office.com/api/v2.0/me/messages/" + itemId + "/attachments", true);
xhr4.setRequestHeader("Content-type", "application/json");
// Access-Control-Allow-Origin is a response header, so it is not sent here
xhr4.setRequestHeader("Authorization", "Bearer " + accessToken);
xhr4.onreadystatechange = function () {
    if (xhr4.readyState == 4) {
        // A successful POST typically returns 201 Created, so accept any 2xx status
        if (xhr4.status >= 200 && xhr4.status < 300)
            console.log("ok");
        else
            console.log(xhr4.response);
    }
};
xhr4.send(JSON.stringify(attachment));
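The `ContentBytes` field must hold the file's bytes as base64. Below is a minimal sketch for building the request body from raw bytes, for example the encrypted file produced after the Dropzone upload. `bytesToBase64` and `buildFileAttachment` are hypothetical helpers. Reading a browser `File` with `arrayBuffer()` is an assumption about where the bytes come from.

```javascript
// Encodes raw bytes as base64 for the ContentBytes field.
function bytesToBase64(bytes) {
  let binary = "";
  // Convert in chunks so String.fromCharCode never receives too many arguments.
  const CHUNK = 0x8000;
  for (let i = 0; i < bytes.length; i += CHUNK) {
    binary += String.fromCharCode.apply(null, bytes.subarray(i, i + CHUNK));
  }
  return btoa(binary);
}

// Builds the JSON body for POST .../me/messages/{id}/attachments.
function buildFileAttachment(name, bytes) {
  return {
    "@odata.type": "#Microsoft.OutlookServices.FileAttachment",
    "Name": name,
    "ContentBytes": bytesToBase64(bytes),
  };
}

// In the browser, the bytes of a File or Blob can be read with:
// const bytes = new Uint8Array(await file.arrayBuffer());
// xhr4.send(JSON.stringify(buildFileAttachment(file.name, bytes)));
```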
Hope it helps someone, good luck!

How to load an image list from a REST API using AngularJS

I have searched this forum quite a bit, and here's my problem:
I have an ng-repeat in my HTML that takes a list of messages (JSON objects).
Each message JSON has a sender email address, e.g. abc@gmail.com.
Now I have to get the email address from the message and build another REST API request to fetch the sender's image, e.g. http://<>:8080/getImage/abc@gmail.com (the email address is dynamic).
So my code has an ng-repeat and an ng-src pointing to the image REST URL.
If there's no image on the server, it returns a 404 and a broken image is displayed in the UI. How do I handle that? Alternatively, I could make an HTTP request to check for success and return a default image on failure, but then the whole thing goes into an endless loop. I'll try to create a fiddle and include it for a better explanation.
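Use the error handler to handle this case:
function ($http) {
    var restUrl = 'getImage/abc';
    return {
        fetchImage: function (imageId) {
            var self = this;
            // .success()/.error() were removed in AngularJS 1.6; .then(onSuccess, onError) works in all versions
            return $http.get(restUrl + '/' + imageId).then(
                function (response) {
                    return self.imageUrl = response.data;
                },
                function () {
                    // e.g. a 404 when the server has no image: fall back to a default image
                    return self.imageUrl = "pathToDefaultImage";
                });
        },
        ...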
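The endless loop in the question often comes from a fallback that also fails and triggers the error handler again. Below is a sketch of a guard that swaps in the default image at most once, plus a URL builder that percent-encodes the email. `imageUrlFor`, `handleImageError` and the `fallbackSrc` directive are hypothetical names. Encoding `@` as `%40` is also an assumption, so check what your server expects.

```javascript
// Builds the image URL for a sender; encodeURIComponent turns "@" into "%40".
function imageUrlFor(baseUrl, email) {
  return baseUrl.replace(/\/+$/, "") + "/getImage/" + encodeURIComponent(email);
}

// Swaps a broken image for the fallback once; a failing fallback cannot loop.
// A data attribute is used because img.src is reported as an absolute URL.
function handleImageError(img, fallbackUrl) {
  if (img.dataset.fallbackApplied === "true") return false;
  img.dataset.fallbackApplied = "true";
  img.src = fallbackUrl;
  return true;
}

// AngularJS wiring (hypothetical directive name "fallbackSrc"):
// app.directive("fallbackSrc", function () {
//   return {
//     link: function (scope, element, attrs) {
//       element.on("error", function () {
//         handleImageError(element[0], attrs.fallbackSrc);
//       });
//     }
//   };
// });
// <img ng-src="{{imageUrl}}" fallback-src="img/default.png">
```

Because the browser itself reports the 404 through the error event, no extra HTTP request is needed to detect a missing image.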

Use a Grails controller to proxy an AJAX request

I have a JavaScript component that needs to send a request to our CDN when the DOM is loaded, to see whether there is content for the component. The CDN may be on a different domain. If there is content, the component self-instantiates (it's a link that opens an embedded video in a modal); if not, it self-destructs. My question is mainly about the Grails controller I am using to proxy the AJAX request.
Here is the JS in pseudocode:
checkForVideoAssets: function (videoDataUrl) {
    Ajax.get(videoDataUrl, function (data) {
        if (data.responseText === 'error') {
            // tear down the component
        } else {
            // if there is data for the video, instantiate the component
        }
    });
}
Here is the Grails controller:
def checkForModalVideoAsset = {
    def req = new URL("http://" + params.videoUrl + "/expense/videos/")
    def connection = req.openConnection()
    if (connection.responseCode == 200) {
        // read from the open connection instead of fetching the URL a second time
        render connection.inputStream.text
    } else {
        render 'error'
    }
}
To sum up: the JS reads an attribute from the DOM that holds part of a URL (defined by convention) and sends that URL to the controller. The controller connects to that URL on our CDN and passes the response back to the AJAX success callback in the responseText of the XHR object. This feels less than ideal to me. Is it possible to pass the actual response back to the JS function?
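HTTPBuilder may be useful to you.
I haven't tried it, but something like this might work:
import groovyx.net.http.HTTPBuilder
import static groovyx.net.http.ContentType.TEXT
def checkForModalVideoAsset = {
    def http = new HTTPBuilder("http://" + params.videoUrl)
    // non-2xx responses: forward the CDN's status code
    http.handler.failure = { resp ->
        response.status = resp.status
        render 'error'
    }
    http.get(path: "/expense/videos/", contentType: TEXT) { resp, reader ->
        // 2xx responses: forward the status code and the raw body
        response.status = resp.status
        render reader.text
    }
}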
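On the client side, the component could then branch on the HTTP status instead of comparing responseText with 'error'. This sketch assumes the controller forwards the CDN's status code rather than rendering the string 'error'. It also swaps the pseudocode `Ajax.get` for the standard `fetch` API. The fetch function is injected so the logic can be checked without a network, and the return shape is a hypothetical choice.

```javascript
// Resolves to { ok, status, body? }; never rejects, so callers only branch on ok.
async function checkForVideoAssets(videoDataUrl, fetchFn = fetch) {
  try {
    const response = await fetchFn(videoDataUrl);
    if (!response.ok) return { ok: false, status: response.status };
    return { ok: true, status: response.status, body: await response.text() };
  } catch (e) {
    // Network failure: treat it like "no video" so the component tears itself down.
    return { ok: false, status: 0 };
  }
}

// Usage sketch:
// checkForVideoAssets(url).then(function (result) {
//   if (result.ok) { /* instantiate the component with result.body */ }
//   else { /* tear down the component */ }
// });
```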
