Why doesn't my page show the newest data? - caching

I'm writing an app on Google App Engine. It is a simple blog system. If I delete a blog post, the page doesn't refresh as I wish: it still shows the post that has been deleted. But if I refresh the page after that, it displays correctly. I thought it was a caching problem, and I have been working on it for several days. Could anyone teach me how to fix it? Thanks very much.
class BlogFront(BlogHandler):
    def get(self):
        val = self.request.get("newPost")
        # get all the posts
        posts = Post.all().order('-created')
        # stop the cache in the browser
        self.response.headers["Pragma"] = "no-cache"
        self.response.headers["Cache-Control"] = "no-cache, no-store, must-revalidate, pre-check=0, post-check=0"
        self.response.headers["Expires"] = "Thu, 01 Dec 1994 16:00:00 GMT"
        self.render('front.html', posts=posts)

    def post(self):
        # the delete button was pressed
        operatorRequest = self.request.get('Delete')
        articleId = operatorRequest.split('|')[0]
        operator = operatorRequest.split('|')[1]
        key = db.Key.from_path('Post', int(articleId), parent=blog_key())
        post = db.get(key)
        db.delete(post.key())
        self.redirect("/")

I assume the redirect to / is handled by the BlogFront handler. It looks like you're hitting datastore eventual consistency; see:
Google App Engine Datastore: Dealing with eventual consistency
GAE: How long to wait for eventual consistency?
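A sketch of one workaround, assuming every Post is created with parent=blog_key() (the delete handler's db.Key.from_path call implies this): switch the front page to an ancestor query, which is strongly consistent, so a just-deleted post does not reappear after the redirect:
class BlogFront(BlogHandler):
    def get(self):
        # Ancestor queries are strongly consistent, unlike the global
        # Post.all() query, so the deleted post won't show up here.
        posts = Post.all().ancestor(blog_key()).order('-created')
        self.render('front.html', posts=posts)
The no-cache headers are fine to keep, but they only address browser caching, not a stale read from the datastore.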

Related

Logging Into Google To Scrape A Private Google Group (over HTTPS)

I'm trying to log into Google so that I can scrape and migrate a private Google Group.
It doesn't seem to log in over SSL. Any ideas appreciated. I'm using Mechanize; the code is below:
group_signin_url = "https://login page to google, with referrer url to a private group here"
user = ENV['GOOGLE_USER']
password = ENV['GOOGLE_PASSWORD']
scraper = Mechanize.new
scraper.user_agent = Mechanize::AGENT_ALIASES["Linux Firefox"]
scraper.agent.http.verify_mode = OpenSSL::SSL::VERIFY_NONE
page = scraper.get group_signin_url
google_form = page.form
google_form.Email = user
google_form.Passwd = password
group_page = scraper.submit(google_form, google_form.buttons.first)
pp group_page
I worked with Ian (the OP) on this problem and just felt we should close this thread with some answers based on what we found when we spent more time on it.
1) You can't scrape a Google Group with Mechanize. We managed to get logged in, but the content of the Google Group pages is all rendered in-browser, meaning that plain HTTP requests, such as those issued by Mechanize, come back with a few links and no actual content.
We found that we could get page content by the use of Selenium (we used Selenium in Firefox, using the Ruby bindings).
2) The HTML element IDs/classes in Google Groups are obfuscated, but we found that these Selenium commands will pull out the bits you need (until Google changes them); a sketch using them follows the list:
- message snippets (click on them to expand messages): find_elements(:class, 'GFP-UI5CCLB')
- elements with name of author: find_elements(:class, 'GFP-UI5CA1B')
- elements with content of post: find_elements(:class, 'GFP-UI5CCKB')
- elements containing date: find_elements(:class, 'GFP-UI5CDKB') (then use attribute[:title] for a full-length date string)
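For illustration, a minimal sketch with the Selenium Python bindings (we actually used the Ruby bindings; the group URL below is hypothetical, and the obfuscated class names will break whenever Google regenerates them):
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get('https://groups.google.com/forum/#!forum/your-group-name')  # hypothetical URL

# Click each snippet to expand the full message before reading content.
for snippet in driver.find_elements(By.CLASS_NAME, 'GFP-UI5CCLB'):
    snippet.click()

authors = [e.text for e in driver.find_elements(By.CLASS_NAME, 'GFP-UI5CA1B')]
posts = [e.text for e in driver.find_elements(By.CLASS_NAME, 'GFP-UI5CCKB')]
# The title attribute holds the full-length date string.
dates = [e.get_attribute('title') for e in driver.find_elements(By.CLASS_NAME, 'GFP-UI5CDKB')]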
3) I have some Ruby code here which scrapes the content programmatically and uploads it into a Discourse forum (which is what we were migrating to).
It's hacky, but it kind of works: I recently migrated two commercially important Google Groups using this script. I'm up for taking on 'We Scrape Your Google Group' type work; please PM me.

Best way to get definitions out of Google?

I'm trying to make a simple feature where a user specifies a term and the program fetches a definition for it and returns it. The best definition system I know of is Google's "define" keyword in search queries: if you start the query with "define " or "define:" etc., it returns very accurate and sufficient definitions. However, I have no idea how to access this information programmatically.
Google's new Custom Search Engine API doesn't show definitions and the old one gives slightly better results but is deprecated and still doesn't show the same definitions I see when I Google the term in the browser.
Failing Google, I turned to Wikipedia, which has a huge API but I still couldn't find a way to extract summaries like Google definitions.
So my question is, does anybody know how I can get this information out of Google via the API or any other means?
This is an older question asking the same thing, except the answers given there are no longer applicable since Google Dictionary no longer exists.
Update: So I'm now going down the route of trying to scrape the definitions straight out of the page itself. The problem is, when I visit the page in the browser (Firefox) the definitions show up, but when I scrape the page with cheerio they don't appear anywhere. I should mention I'm scraping through nitrous.io, so the page is fetched from a different region and operating system than the one I view it from in the browser, so maybe it's region-related. Will look into it further.
Update 2.0: I think the definitions are loaded asynchronously, and I have no idea how to scrape that because I've never really done scraping before and I'm just a newbie :(
Update 3.0: OK, so now I think it's not the asynchronous loading but the renderer of the page. When I load the page in Firefox the definitions appear, but when I load it in IE 8 they don't (the original post compared screenshots of the two renderings).
Anybody got some insight on this?
Finally got to the answer: I had to set the user agent when screen scraping. My resulting code for getting definitions via scraping:
var request = require('request')
  , cheerio = require('cheerio');

var searchTerm = 'test';

var options = {
  url: 'https://www.google.co.uk/search?q=define+' + searchTerm,
  headers: { "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0" }
};

request(options, function (err, resp, body) {
  var $ = cheerio.load(body);
  var defineBlocks = $(".lr_dct_sf_sen");
  var numOfBlocks = (defineBlocks.length < 3) ? defineBlocks.length : 3;

  // Walk one definition block and print its main text and examples.
  function process(block) {
    for (var i = 0; i < block.children.length; i++) {
      var line = block.children[i];
      if ("style" in line.attribs) { // main text
        var exampleStr = "";
        for (var k = 0; k < line.children.length; k++) {
          exampleStr += line.children[k].children[0].data;
        }
        console.log(exampleStr);
      } else if ("class" in line.attribs) { // example
        console.log("\"" + line.children[1].children[0].data + "\"");
      }
      // anything else is nothing we want
    }
  }

  for (var i = 0; i < numOfBlocks; i++) {
    var block = defineBlocks[i].children[1].children[0]; // font-size:small level
    process(block);
  }
});

Windows Phone webclient caching "issue"?

I am trying to call the same link but with different values. The issue is that the URL is correct and contains the new values, but when I download it (WebClient.DownloadStringTaskAsync), it gives me the previous call's result.
I have tried adding no-cache headers, attaching a random value to the call, and an IfModifiedSince header; however, it is still not working.
Any help will be much appreciated, because I have tried everything.
// baseUrl stands in for the original request URL, which was truncated in the post
var uri3 = new Uri(baseUrl + "&junk=" + Guid.NewGuid());
client.Headers["Cache-Control"] = "no-cache";
client.Headers[HttpRequestHeader.IfModifiedSince] = DateTime.UtcNow.ToString();
var accessdes = await client.DownloadStringTaskAsync(uri3);
So here my uri3 contains the latest values, but when I hover over accessdes, it contains the result as if I had made the old uri3 call with the previous data.
I saw one friend attach a random GUID to the URL in order to prevent the OS from caching its content. For example:
if the URL were http://www.ms.com/getdatetime and the OS were caching it,
the solution was adding a GUID to create sort of a new URL each time, so the previous URL would look like: http://www.ms.com/getdatetime?cachebuster=21EC2020-3AEA-4069-A2DD-08002B30309D
(see more about cache busters: http://www.adopsinsider.com/ad-ops-basics/what-is-a-cache-buster-and-how-does-it-work/ )
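The trick isn't specific to Windows Phone; here is a minimal Python sketch of the same cache-buster idea (illustrative only; the original code is C#):
import uuid

def cache_busted(url):
    # Append a unique query parameter so every request looks like a
    # brand-new URL to the OS/HTTP cache.
    separator = '&' if '?' in url else '?'
    return url + separator + 'cachebuster=' + str(uuid.uuid4())

print(cache_busted('http://www.ms.com/getdatetime'))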

Bing translator HTTP API throws bad request error, how to solve this?

Whenever I call the Bing Translation API [HTTP] to translate some text, the first time it works fine, but from the second time onwards it gives me a 'bad request' [status code 400] error. If I wait ten or so minutes and then try again, the first request is successful, but from the second one onwards it's the same story. I have a free account [2 million chars translation] with the Bing Translation APIs; are there any other limitations on calling this API?
Thanks, Madhu
Answer:
Hi, I had missed subscribing to the Microsoft Translator dataset. Once I signed up for https://datamarket.azure.com/dataset/bing/microsofttranslator, things started working.
I was generating the access_token correctly, so that was not the issue.
Thanks, Madhu
As a note to anyone else having problems: I figured out that the service only allows each token to be used once on the free subscription. You have to have a paid subscription to call the Translate service more than once per token. This limitation is, of course, undocumented.
I don't know if you can simply keep getting new tokens; I suspect not.
And regardless of subscription, tokens expire every 10 minutes, so track when you receive a token and get a new one if needed, e.g. (not thread-safe):
private string _headerValue;
private DateTime _headerValueCreated = DateTime.MinValue;

public string headerValue {
    get {
        if (_headerValueCreated < DateTime.Now.AddMinutes(-9)) {
            var admAuth = new AdmAuthentication("myclientid", "mysecret");
            _headerValue = "Bearer " + admAuth.GetAccessToken();
            _headerValueCreated = DateTime.Now;
        }
        return _headerValue;
    }
}
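To address the thread-safety caveat, here is the same refresh-before-expiry pattern sketched in Python with a lock; fetch_token is a placeholder for whatever call actually obtains a new ADM access token:
import threading
import time

class TokenCache:
    def __init__(self, fetch_token, ttl_seconds=9 * 60):
        self._fetch_token = fetch_token  # callable returning a fresh token string
        self._ttl = ttl_seconds          # refresh at 9 minutes, before the 10-minute expiry
        self._token = None
        self._created = 0.0
        self._lock = threading.Lock()

    def header_value(self):
        # The lock ensures only one thread refreshes an expired token.
        with self._lock:
            if time.time() - self._created > self._ttl:
                self._token = self._fetch_token()
                self._created = time.time()
            return 'Bearer ' + self._token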

Ruby Mechanize login not working

Let me set the stage for what I'm trying to accomplish. In a physics class I'm taking, my teacher always likes to brag about how impossible it is to cheat in her class, because all of her assignments are done through WebAssign. The way WebAssign works is this: everyone gets the same questions, but the numbers used in each question are random variables, so each student has different numbers and thus a different answer. So I've been writing Ruby scripts that solve the questions for people once they input their specific numbers.
I would like to automate this process using Mechanize. I've used Mechanize plenty of times before, but I'm having trouble logging in to this site: I submit the form and it returns the same page I was just on. You can take a look at the site's source code at http://webassign.net, and I've also tried the login at http://webassign.net/login.html with no luck either.
Let me follow all of this up with some Ruby code that doesn't do what I want it to:
require 'rubygems'
require 'mechanize'
agent = Mechanize.new
page = agent.get("http://www.webassign.net/login.html")
form = page.forms.last
puts "Enter your username"
form.WebAssignUsername = gets.chomp
puts "Enter your password (Don't worry, we don't save this)"
form.WebAssignPassword = gets.chomp
form.WebAssignInstitution = "trinityvalley.tx"
form.submit #=> Returns original page
If anyone really takes an interest in getting this to work, I would be more than happy to send them a working username and password.
The site could be checking that the Login post variable is set (see the login button). Try adding form.Login = "Login".
Have you tried using agent.submit(form, form.buttons.first) instead of form.submit?
This worked for me when I was submitting a form; form.submit kept returning the original page.
Try setting the user agent:
agent = Mechanize.new do |a|
  a.user_agent_alias = 'Mac Safari'
end
Some sites seem to require that.
Your question is a little ambiguous in saying that you're not having any luck. What is the problem, exactly? Are you getting an entirely different response than when you view the page in a browser? If so, do what #cam says and analyze the headers; you can do it in Firefox via an extension, or natively in Chrome. Either way, try to mimic the headers you see in the browser in your Mechanize user agent. Here is a script I used to mimic the iTunes request headers when I was data-mining the App Store:
def mimic_itunes( mech_agent )
  mech_agent.pre_connect_hooks << lambda { |headers|
    headers[:request]['X-Apple-Store-Front'] = X_APPLE_STOREFRONT;
    headers[:request]['X-Apple-Tz'] = X_APPLE_TZ;
    headers[:request]['X-Apple-Validation'] = X_APPLE_VALIDATION;
  }
  mech_agent.user_agent = 'iTunes/9.1.1 (Windows; Microsoft Windows 7 x64 Business Edition (Build 7600)) AppleWebKit/531.22.7'
  mech_agent
end
Note: the constants in the example are just strings; it's not really important what they are, as long as you know you can put any string there.
Using this approach, you should be able to alter or add any headers that the web application might need; a Python sketch of the same idea follows.
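This is illustrative only; the header values below are not ones WebAssign is known to require:
import requests

session = requests.Session()
# Copy whatever headers you observe the real browser sending.
session.headers.update({
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8) AppleWebKit/536.26 Safari/536.26',
    'Referer': 'http://www.webassign.net/login.html',
})
page = session.get('http://www.webassign.net/login.html')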
If this is not the problem that you are having, then post more in-depth details of what exactly is happening.
