I am trying to call the same URL with different query values. The URL is correct and contains the new values, but when I download it (WebClient.DownloadStringTaskAsync), I get the previous call's result.
I have tried adding a no-cache header, appending a random value to the call, and setting the If-Modified-Since header, but it is still not working.
Any help would be much appreciated, because I have tried everything.
// baseUrl holds the query string with the new values
var uri3 = new Uri(baseUrl + "&junk=" + Guid.NewGuid());
client.Headers["Cache-Control"] = "no-cache";
client.Headers[HttpRequestHeader.IfModifiedSince] = DateTime.UtcNow.ToString("R");
var accessdes = await client.DownloadStringTaskAsync(uri3);
So here uri3 contains the latest values, but when I hover over accessdes, it contains the result as if I had made the old uri3 call with the previous set of data.
I saw a friend attaching a random GUID to the URL in order to prevent the OS from caching its content. For example:
If the URL were http://www.ms.com/getdatetime and the OS were caching it, our solution was adding a GUID to create "sort of" a new URL each time. The previous URL would then look like: http://www.ms.com/getdatetime?cachebuster=21EC2020-3AEA-4069-A2DD-08002B30309D
(See more about cache busters: http://www.adopsinsider.com/ad-ops-basics/what-is-a-cache-buster-and-how-does-it-work/ )
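For what it's worth, a minimal sketch of that approach with WebClient might look like this (the URL is the placeholder from the example above):

using System;
using System.Net;
using System.Threading.Tasks;

class CacheBusterDemo
{
    static async Task Main()
    {
        using (var client = new WebClient())
        {
            // A fresh GUID per request makes every URL unique, so no cache
            // (OS, proxy, or server-side) can return a stale response.
            var uri = new Uri("http://www.ms.com/getdatetime?cachebuster=" + Guid.NewGuid());
            var result = await client.DownloadStringTaskAsync(uri);
            Console.WriteLine(result);
        }
    }
}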
I am trying to parse a website using the requests module and extract some text out of it.
URL: https://www.icsi.in/student/Members/MemberSearch.aspx
After hitting the URL, enter 16803 in the CP Number text field and hit Search.
At the bottom you can see some data; I want that data, let's say a name.
I am successfully able to get the data using Selenium, but I can't get it using the requests module.
I have tried the requests module with parameters, sessions, cookies, etc., but nothing worked.
import requests
from lxml import html

url = "https://www.icsi.in/student/Members/MemberSearch.aspx"
ss = {'dnn$ctr410$MemberSearch$txtCpNumber': '16803',
      '__EVENTTARGET': 'dnn$ctr410$MemberSearch$btnSearch',
      '__VIEWSTATEGENERATOR': '6A295697',
      'dnn$ctlHeader$dnnSearch$Search': 'SiteRadioButton'}
session = requests.Session()
# copy whatever cookies the session already holds back into it
for name, value in session.cookies.get_dict().items():
    session.cookies.set(name, value)
response = session.post(url, data=ss)
print(response)
HTMLTree = html.fromstring(response.content)
name = HTMLTree.xpath('//div[@class="name_head"]//text()')
print(name)
I expect the output to be the name of the person.
Anyone out there, please help me.
If you don't mind using C# code I would be more than happy to help you; otherwise it's a very lengthy process. If you decide that Python is the only road you're willing to take, then you should try grabbing the encrypted value within C:\Users\[USERNAME]\AppData\Local\Google\Chrome\User Data\Default\Cookies (change the file path according to your OS). You can use SQLite to read and modify the encrypted values.
// Pseudocode: read the encrypted cookie value from Chrome's SQLite store
// and decrypt it (Decrypt and SQLDatabase1 are placeholders for your own helpers).
cookie = Decrypt(Encoding.Default.GetBytes(SQLDatabase1.GetValue(i, "encrypted_value")));
if (cookie.Contains(".ASPXANONYMOUS"))
{
    step1 = cookie + "END";
    step2 = step1 + ".ASPXANONYMOUS";
}
The code above may help you with your journey.
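If it helps, here is a rough, untested sketch of that idea in C#, using Microsoft.Data.Sqlite to read the cookie store and DPAPI to decrypt the values. It assumes an older, DPAPI-encrypted store (pre-Chrome 80); the path, host filter, and table layout are assumptions you may need to adjust:

using System;
using System.Security.Cryptography;
using System.Text;
using Microsoft.Data.Sqlite;

class ChromeCookieReader
{
    static void Main()
    {
        // Adjust for your user name / OS; copy the file first if Chrome has it locked.
        var dbPath = @"C:\Users\USERNAME\AppData\Local\Google\Chrome\User Data\Default\Cookies";
        using (var conn = new SqliteConnection("Data Source=" + dbPath))
        {
            conn.Open();
            var cmd = conn.CreateCommand();
            cmd.CommandText =
                "SELECT name, encrypted_value FROM cookies WHERE host_key LIKE '%icsi.in%'";
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    var encrypted = (byte[])reader["encrypted_value"];
                    // DPAPI only decrypts for the same Windows user Chrome ran as.
                    var decrypted = ProtectedData.Unprotect(
                        encrypted, null, DataProtectionScope.CurrentUser);
                    Console.WriteLine("{0} = {1}",
                        reader["name"], Encoding.UTF8.GetString(decrypted));
                }
            }
        }
    }
}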
I'm using the URL::to call to embed a link in an outgoing mail message. What I get when I do this is something like "baseroot/public/index.php/xxx/yyy".
And yet when I make the same call, for example within a route callback, I get "baseroot/xxx/yyy".
Any ideas?
The source of URL::to resides at http://laravel.com/api/source-class-Illuminate.Routing.UrlGenerator.html#76-98 (linked to from http://laravel.com/api/class-Illuminate.Routing.UrlGenerator.html).
I suggest you add debug printing to your copy and see what values $this->getScheme() and $this->getRootPath() yield. These must be the source of the discrepancy, apparently caused by different $this objects.
I had a very similar problem with URL::to('user/123') returning an incorrect value when visiting the homepage vs. another page. After some investigation, in my case it was a matter of case sensitivity (!) in the request's URL. I hope it's somehow related to your mysterious case.
More about my case: URL::to('user/123') gave me different results depending on whether I visited http://localhost/MyApp/public/someurl or http://localhost/Myapp/public/someurl. The former gave the correct result http://localhost/MyApp/public/user/123, while the latter gave the wrong result http://localhost/user/123.
From here on, less important notes from my investigation, for future Laravel archaeologists. I hope I'm not talking all nonsense; I am new to Laravel, using a local Laravel 4 installation + WAMP on a Windows machine.
UrlGenerator's to() method uses $root = $this->getRootUrl($scheme);, which in turn uses $this->request->root(), where request is a \Symfony\Component\HttpFoundation\Request.
Request::root() indeed returns a wrong value, e.g. http://localhost, when visiting someurl with the incorrect case.
The culprit is Symfony\Component\HttpFoundation\Request (in vendor\symfony\http-foundation\Symfony\Component\HttpFoundation\Request.php): its getBaseUrl() calls prepareBaseUrl(), where the actual logic of comparing the requestUri with the baseUrl is finally performed.
For the few archaeologists still following: in my case the $baseUrl was /MyApp/public/index.php while the $requestUri was /Myapp/public/someurl, which sadly led the code not to satisfy this conditional:
if ($baseUrl && false !== $prefix = $this->getUrlencodedPrefix($requestUri, dirname($baseUrl))) {
    return rtrim($prefix, '/');
}
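To make the failure mode concrete, here is a tiny illustrative sketch of the comparison that conditional boils down to. This is not Symfony's actual code, just the same case-sensitive prefix check with the values from my case:

using System;

class CaseSensitivityDemo
{
    static void Main()
    {
        // Values from my case: dirname($baseUrl) vs. the request URI.
        var baseDir = "/MyApp/public";
        var requestUri = "/Myapp/public/someurl";

        // The prefix check is case-sensitive, so "Myapp" does not match "MyApp"...
        Console.WriteLine(requestUri.StartsWith(baseDir, StringComparison.Ordinal));           // False
        // ...whereas a case-insensitive comparison would have matched.
        Console.WriteLine(requestUri.StartsWith(baseDir, StringComparison.OrdinalIgnoreCase)); // True
    }
}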
I'm trying to make a simple feature where a user can specify a term and the program fetches a definition for it and returns it. The best definition system I know of is Google's "define" keyword in search queries: if you start the query with "define " or "define:" etc., it returns very accurate and sufficient definitions. However, I have no idea how to access this information programmatically.
Google's new Custom Search Engine API doesn't show definitions, and the old one gives slightly better results but is deprecated, and it still doesn't show the same definitions I see when I Google the term in the browser.
Failing Google, I turned to Wikipedia, which has a huge API, but I still couldn't find a way to extract summaries the way Google's definitions do.
So my question is: does anybody know how I can get this information out of Google via the API or any other means?
This is an older question asking the same thing, except the answers given there no longer apply, as Google Dictionary no longer exists.
Update: So I'm now going down the route of trying to scrape the definitions straight out of the page itself. The problem now is that when I visit the page in the browser (Firefox), the definitions show up, but when I scrape the page using cheerio, they don't show up anywhere. I should mention I'm scraping the page through nitrous.io, so it's rendered from a different region and operating system to the one I view it with in the browser, so maybe it's region related. Will look into it further.
Update 2.0: I think the definitions may be loaded asynchronously, in which case I have no idea how to scrape them, because I've never really done scraping before and I'm just a newbie :(
Update 3.0: Ok, so now I think it's not to do with the asynchronous loading but the renderer of the page. When I load this in Firefox, the page looks like this:
However, when I load it in IE (8) it looks like this:
Anybody got some insight on this?
Finally got to the answer: I had to set the user agent when screen scraping. My resulting code for getting definitions via scraping:
var request = require('request'),
    cheerio = require('cheerio');

var searchTerm = 'test';

request({
    url: 'https://www.google.co.uk/search?q=define+' + searchTerm,
    headers: { "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0" }
}, function (err, resp, body) {
    var $ = cheerio.load(body);
    var defineBlocks = $(".lr_dct_sf_sen");
    var numOfBlocks = (defineBlocks.length < 3) ? defineBlocks.length : 3;

    function process(block) {
        for (var j = 0; j < block.children.length; j++) {
            var line = block.children[j];
            if ("style" in line.attribs) { // main text
                var exampleStr = "";
                for (var k = 0; k < line.children.length; k++) {
                    exampleStr += line.children[k].children[0].data;
                }
                console.log(exampleStr);
            } else if ("class" in line.attribs) { // example
                console.log("\"" + line.children[1].children[0].data + "\"");
            } // anything else is nothing I want
        }
    }

    for (var i = 0; i < numOfBlocks; i++) {
        var block = defineBlocks[i].children[1].children[0]; // font-size:small level
        process(block);
    }
});
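For what it's worth, this lines up with Update 3.0 above: without a browser-like User-Agent, Google presumably serves the stripped-down markup seen in the IE 8 screenshot, which is why the .lr_dct_sf_sen blocks never appeared in the cheerio-loaded page.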
We have the same requirement of passing huge data, as in http://bugsquash.blogspot.in/2010/12/customizing-solrnet.html, and we tried the following:
1) Increased the requestHeaderSize to Int32.MaxValue - StackOverflowException.
2) Used PostSolrConnection - got the StackOverflowException.
3) Downloaded the source of SolrNet and added it as a project reference - StackOverflowException.
Even when we changed to GET, we get the StackOverflowException. The error occurs when we have more than 500 reference IDs; with fewer values, it works.
This is how we are calling it:
searchResults = solrPost.Query(
    new SolrMultipleCriteriaQuery(new[] { query }),
    new SolrNet.Commands.Parameters.QueryOptions
    {
        Fields = new[] { "*", "score" },
        Start = pageSize,
        Rows = 40,
        OrderBy = listSort
    });
Any ideas?
EDIT:
We tried requesting Solr directly using HttpRequest and identified it as a maxBooleanClauses issue; POST then started working through HttpRequest. But through SolrNet the error still occurs, and it happens while serializing the query object: queryserializer.serialize(Query)
I wonder why step 2 didn't work, since switching over to POST requests is the exact fix for the long GET request issue.
Chances are there is an issue with the piece of code where you are initialising SolrNet to use PostSolrConnection instead of the default SolrConnection. We'd need to look at the bit of code that gets you the solrPost instance; take another look at it and post it here.
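For reference, a minimal sketch of that initialisation (assuming the usual SolrNet startup pattern; the core URL and MyDocument type are placeholders, and exact namespaces vary between SolrNet versions):

using Microsoft.Practices.ServiceLocation;
using SolrNet;
using SolrNet.Impl;

public class MyDocument { /* your mapped Solr fields */ }

public static class SolrSetup
{
    public static ISolrOperations<MyDocument> InitPost()
    {
        var solrUrl = "http://localhost:8983/solr/mycore";
        // Wrap the default GET connection so queries are sent as POST bodies,
        // sidestepping URL-length limits on very long queries.
        var connection = new PostSolrConnection(new SolrConnection(solrUrl), solrUrl);
        Startup.Init<MyDocument>(connection);
        return ServiceLocator.Current.GetInstance<ISolrOperations<MyDocument>>();
    }
}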
Unfortunately SolrNet didn't work for us due to the StackOverflowException thrown while serializing the query parameters. An alternative workaround is posted at http://smartcoder.in/solrnetstackoverflow/
I am trying to manage an Inbox in Exchange 2003 automatically, using WebDAV from a C# application. Looking at MSDN is not helping me a whole lot, as the methods described there (http://msdn.microsoft.com/en-us/library/aa142917.aspx) do not coincide at all with the samples I have found elsewhere. There are two things I am trying to determine.
First: of all the fields that return from a WebDAV query such as
string reqStr =
    @"<?xml version=""1.0""?>
    <g:searchrequest xmlns:g=""DAV:"">
        <g:sql>
            SELECT *
            FROM ""http://server/Exchange/email1@domain.com/Inbox/""
            WHERE ""urn:schemas:mailheader:from"" = 'email2@domain.com'
        </g:sql>
    </g:searchrequest>";
which one is the unique identifier? I have browsed the results (but am not sure of a reference to verify the fields), and at first glance it appears that DAV:id is what I want, but I am not wanting to work on assumptions.
Secondly, what is the correct way to programmatically delete an email after I have processed it? Would something like the following work (i.e., will it remove the entry and all related metadata)? I don't want any files left orphaned on the server...
string reqStr =
    @"<?xml version=""1.0""?>
    <g:searchrequest xmlns:g=""DAV:"">
        <g:sql>
            DELETE
            FROM ""http://server/Exchange/email1@domain.com/Inbox/""
            WHERE ""DAV:id"" = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXX'
        </g:sql>
    </g:searchrequest>";
And finally, what are the best online sources for investigating all the data returned in the XML from the first request, and where are all the options for managing the WebDAV interface documented? Looking at MSDN just hasn't been fruitful.
Look for the DAV:href tags in the response. They contain a URL you can use to issue a DELETE command.
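A rough sketch of issuing that SEARCH and pulling out the DAV:href values might look like this (server, mailbox, and credentials are the placeholders from the question; untested against a live Exchange 2003 box):

using System;
using System.IO;
using System.Net;
using System.Text;
using System.Xml;

class WebDavSearch
{
    static void Main()
    {
        var request = (HttpWebRequest)WebRequest.Create(
            "http://server/Exchange/email1@domain.com/Inbox/");
        request.Credentials = new NetworkCredential("user", "password", "DOMAIN");
        request.Method = "SEARCH";
        request.ContentType = "text/xml";

        const string reqStr = @"<?xml version=""1.0""?>
<g:searchrequest xmlns:g=""DAV:"">
    <g:sql>
        SELECT ""DAV:href""
        FROM ""http://server/Exchange/email1@domain.com/Inbox/""
        WHERE ""urn:schemas:mailheader:from"" = 'email2@domain.com'
    </g:sql>
</g:searchrequest>";

        var bytes = Encoding.UTF8.GetBytes(reqStr);
        request.ContentLength = bytes.Length;
        using (var stream = request.GetRequestStream())
            stream.Write(bytes, 0, bytes.Length);

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            var doc = new XmlDocument();
            doc.LoadXml(reader.ReadToEnd());
            var ns = new XmlNamespaceManager(doc.NameTable);
            ns.AddNamespace("d", "DAV:");
            // Each DAV:href is the URL of one matching message;
            // issue a DELETE against it as shown below.
            foreach (XmlNode href in doc.SelectNodes("//d:href", ns))
                Console.WriteLine(href.InnerText);
        }
    }
}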
From the result of a query that gets you the message URI, then:
var request = (HttpWebRequest)WebRequest.Create(mail.MailUri);
request.Credentials = _credential;
request.Method = "DELETE";
var response = (HttpWebResponse)request.GetResponse();
if (response.StatusCode != HttpStatusCode.OK)
{
    // something might have broken
}