Crawl real-time Google Finance price (AJAX)

I want to create a small Excel sheet, sort of like Bloomberg's Launchpad, to monitor live stock market prices. So far, out of all the available free data sources, Google Finance is the only one I found that provides real-time prices for the list of exchanges I need. The issue with Google Finance is that they have already shut down their Finance API. I am looking for a way to programmatically retrieve the real-time price (the one I circled in the chart below) and have it update live in my Excel sheet.
I have been searching around, to no avail so far. I read this post:
How does Google Finance update stock prices? but the method suggested in the answer retrieves a time series of data for the chart, not the live-updating price I need. I have also examined the page's network traffic in Chrome's inspector and didn't find any request that returns the real-time price. Any help is greatly appreciated; sample code (languages other than VBA are fine) would be very beneficial. Thanks everyone!

There are so many ways to do this: VBA, VB, C#, R, Python, etc. Below is a way to download statistics from Yahoo Finance with VBA.
Sub DownloadData()
    Dim ie As Object, Links As Object, lnk As Object
    Dim RowCount As Long

    Set ie = CreateObject("InternetExplorer.Application")
    With ie
        .Visible = True
        .navigate "https://finance.yahoo.com/quote/AAPL/key-statistics?p=AAPL"
        ' Wait for the page to fully load; you can't do anything until it is
        Do While .Busy Or .readyState <> 4
            DoEvents
        Loop
        ' Set a reference to the data elements that will be downloaded. We can
        ' download either 'td' or 'tr' elements; this site uses 'tr' elements.
        Set Links = ie.document.getElementsByTagName("tr")
        RowCount = 1
        ' Scrape out the innerText of each 'tr' element
        With Sheets("DataSheet")
            For Each lnk In Links
                .Range("A" & RowCount) = lnk.innerText
                RowCount = RowCount + 1
            Next
        End With
        .Quit ' close the browser when finished
    End With
    MsgBox "Done!!"
End Sub
I will leave it up to you to explore other technologies that do the same thing. For instance, R and Python can do exactly the same work, although the scripts will look a bit different from the VBA above.
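As a taste of that, here is a minimal Python sketch of the same download (my own illustration, not part of the original answer), assuming Yahoo still serves the key-statistics page as plain HTML tables and that requests, pandas, lxml and openpyxl are installed:
import requests
import pandas as pd

# Yahoo tends to block requests without a browser-like User-Agent
headers = {"User-Agent": "Mozilla/5.0"}
url = "https://finance.yahoo.com/quote/AAPL/key-statistics?p=AAPL"
html = requests.get(url, headers=headers, timeout=30).text

# read_html pulls every <table> into a DataFrame, much like the 'tr' loop above
tables = pd.read_html(html)
with pd.ExcelWriter("yahoo_statistics.xlsx") as writer:
    for i, table in enumerate(tables):
        table.to_excel(writer, sheet_name=f"table_{i}", index=False)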

It's fairly easy to make it work in Python. You'll need a few libraries:
requests: makes the request to Google Finance and returns the HTML.
bs4: processes the returned HTML.
pandas: saves the results to CSV/Excel.
Code and full example in the online IDE:
from bs4 import BeautifulSoup
from itertools import zip_longest
import requests, lxml, json
import pandas as pd  # needed for the DataFrame below


def scrape_google_finance(ticker: str):
    # https://docs.python-requests.org/en/master/user/quickstart/#passing-parameters-in-urls
    params = {
        "hl": "en",  # language
    }
    # https://docs.python-requests.org/en/master/user/quickstart/#custom-headers
    # https://www.whatismybrowser.com/detect/what-is-my-user-agent
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.60 Safari/537.36",
    }

    html = requests.get(f"https://www.google.com/finance/quote/{ticker}", params=params, headers=headers, timeout=30)
    soup = BeautifulSoup(html.text, "lxml")

    ticker_data = {"right_panel_data": {},
                   "ticker_info": {}}

    ticker_data["ticker_info"]["title"] = soup.select_one(".zzDege").text
    ticker_data["ticker_info"]["current_price"] = soup.select_one(".AHmHk .fxKbKc").text

    right_panel_keys = soup.select(".gyFHrc .mfs7Fc")
    right_panel_values = soup.select(".gyFHrc .P6K39c")

    for key, value in zip_longest(right_panel_keys, right_panel_values):
        key_value = key.text.lower().replace(" ", "_")
        ticker_data["right_panel_data"][key_value] = value.text

    return ticker_data

# tickers to iterate over
tickers = ["DIS:NYSE", "TSLA:NASDAQ", "AAPL:NASDAQ", "AMZN:NASDAQ", "NFLX:NASDAQ"]

# temporarily store the data before saving to the file
tickers_prices = []

for ticker in tickers:
    # extract ticker data
    ticker_data = scrape_google_finance(ticker=ticker)
    # append to temporary list
    tickers_prices.append({
        "ticker": ticker_data["ticker_info"]["title"],
        "price": ticker_data["ticker_info"]["current_price"]
    })

# create dataframe and save to csv/excel
df = pd.DataFrame(data=tickers_prices)
# to save to excel use to_excel()
df.to_csv("google_finance_live_stock.csv", index=False)
Outputs:
ticker,price
Walt Disney Co,$137.06
Tesla Inc,"$1,131.21"
Apple Inc,$176.99
"Amazon.com, Inc.","$3,321.61"
Netflix Inc,$384.93
Returned data from ticker_data:
{
  "right_panel_data": {
    "previous_close": "$138.61",
    "day_range": "$136.66 - $139.20",
    "year_range": "$128.38 - $191.67",
    "market_cap": "248.81B USD",
    "volume": "9.98M",
    "p/e_ratio": "81.10",
    "dividend_yield": "-",
    "primary_exchange": "NYSE",
    "ceo": "Bob Chapek",
    "founded": "Oct 16, 1923",
    "headquarters": "Burbank, CaliforniaUnited States",
    "website": "thewaltdisneycompany.com",
    "employees": "166,250"
  },
  "ticker_info": {
    "title": "Walt Disney Co",
    "current_price": "$136.66"
  }
}
If you want to scrape more data with a line-by-line explanation, there's a Scrape Google Finance Ticker Quote Data in Python blog post of mine that also covers scraping time-series chart data.

Related

openpyxl: convert scraped data in time format

I am scraping time data from a website; it is extracted as text (for example 9:15) and inserted into a cell. At the bottom of the column I would like to total the hours, but for that I need Python to convert the scraped values to a numeric format so that they can be added up.
Any ideas?
from openpyxl import Workbook, load_workbook

def excel():
    # Writing to an EXCEL FILE
    # userfinder, month, year, datumcleaned and tagesinfo come from the
    # scraping code (not shown)
    filename = f"Monatsplan {userfinder} {month} {year}.xlsx"
    try:
        wb = load_workbook(filename)
        ws = wb.worksheets[0]  # select first worksheet
    except FileNotFoundError:
        headers_row = [
            "Datum",
            "Tour",
            "Funktion",
            "Von",
            "Bis",
            "Schichtdauer",
            "Bezahlte Zeit",
        ]
        wb = Workbook()
        ws = wb.active
        ws.append(headers_row)
        wb.save(filename)
    ws.append(
        [
            datumcleaned[:10],
            tagesinfo,
            "",
            "",
            "",
            "",
            "",
        ]
    )
    wb.save(filename)
    wb.close()

excel()
You should split the data you scraped.
time_scrapped = '9:15'
time_split = time_scrapped.split(":")
hours = int(time_split[0])
minutes = int(time_split[1])
Then you can place the hours and minutes in separate columns and create a formula at the bottom of each column.
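Alternatively, you can write real time values so Excel can sum them directly. Here is a minimal sketch (my own illustration, not the poster's workbook), assuming the scraped strings look like "9:15":
from datetime import time
from openpyxl import Workbook

wb = Workbook()
ws = wb.active
for row, scraped in enumerate(["9:15", "8:30", "7:45"], start=1):
    hours, minutes = scraped.split(":")
    # datetime.time makes openpyxl store a real Excel time value
    cell = ws.cell(row=row, column=1, value=time(int(hours), int(minutes)))
    cell.number_format = "h:mm"
# the [h]:mm format keeps totals over 24 hours from wrapping around
total = ws.cell(row=4, column=1, value="=SUM(A1:A3)")
total.number_format = "[h]:mm"
wb.save("times.xlsx")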

Scraping an h3 tag returns null

I'm learning Scrapy. As an exercise I want to get the product title on this web page https://scrapingclub.com/exercise/detail_json/ using this code:
scrapy shell "https://scrapingclub.com/exercise/detail_json/"
response.xpath("//h3[1]/text()")
[]
but all I get back is an empty list.
Try this,
response.xpath("//script[contains(., 'title')]/text()")
If you press Ctrl+U (view page source), you can see this information near the footer:
var obj = {
    "title": "Short Sweatshirt",
    "price": "$24.99",
    "description": "Short sweatshirt with long sleeves and ribbing at neckline, cuffs, and hem. 57% cotton, 43% polyester. Machine wash cold.",
    "img_path": "/static/img/" + "96230-C" + ".jpg" };

Google Search Console API not returning enough data

https://developers.google.com/webmaster-tools/search-console-api-original/v3/searchanalytics/query
Google Analytics tells me ~20k pages were visited from Google search, but the Google Search Console API returns just under 5k URLs. Changing startRow doesn't help.
What's really odd is I've connected Google Search Console to Google Analytics, and when viewing GSC data in GA (Acquisition -> Search Console -> Landing Pages) the GSC data there also gives me ~20k rows.
How do I get all ~20k rows out of the Google Search Console API?
date_str = '2017-12-20'
start_index = 0
row_limit = 5000
next_index = 5000
rows = []

while next_index == row_limit:
    req = webmasters_service.searchanalytics().query(
        siteUrl='https://tenor.com/',
        fields='responseAggregationType,rows',
        body={
            "startDate": date_str,
            "endDate": date_str,
            "searchType": search_type,
            "dimensions": [
                "page",
            ],
            "rowLimit": row_limit,
            "startRow": start_index,
        },
    )
    try:
        resp = req.execute()
        next_rows = resp['rows']
        rows += next_rows
        next_index = len(next_rows)
        start_index += next_index
    except Exception as e:
        print(e)
        break

return rows
For anyone else viewing this post:
When I view 'Search Results' under 'Performance' in the sidebar of the Google Search Console web page, the URL ends with a variable like resource_id=sc-domain%3Aexample.com. I recommend using this resource_id value as your siteUrl.
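For example, a hypothetical sketch of the query above using a domain property (the 'sc-domain:' prefix is just the decoded form of that resource_id; example.com is a placeholder):
req = webmasters_service.searchanalytics().query(
    siteUrl='sc-domain:example.com',  # instead of 'https://example.com/'
    body={
        "startDate": "2017-12-20",
        "endDate": "2017-12-20",
        "dimensions": ["page"],
        "rowLimit": 5000,
        "startRow": 0,
    },
)
resp = req.execute()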

Web scraping with Nokogiri: unable to pick any classes

I am using this page:
https://www.google.com/search?q=ford+fusion+msrp&oq=ford+fusion+msrp&aqs=chrome.0.0l6.2942j0j7&sourceid=chrome&ie=UTF-8
I am trying to get this element: class="_XWk"
page = HTTParty.get('https://www.google.com/search?q=ford+fusion+msrp&oq=ford+fusion+msrp&aqs=chrome.0.0l6.11452j0j7&sourceid=chrome&ie=UTF-8')
parse_page = Nokogiri::HTML(page)
parse_page.css('_XWk')
Here I can see the whole page in parse_page, but when I try .css('classname') I don't see anything. Am I using the method the wrong way?
Check out the SelectorGadget Chrome extension to grab css selectors by clicking on the desired element in the browser.
It's because of a simple typo: a missing . (dot) before the selector, as ran already mentioned.
In addition, the next problem might occur because no HTTP user-agent is specified, so Google will eventually block the request and you'll receive completely different HTML containing an error message or something similar, without the actual data you were looking for. What is my user-agent.
Pass a user-agent:
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}
HTTParty.get("https://www.google.com/search", headers: headers)
Iterate over the container to extract titles from Google Search:
data = doc.css(".tF2Cxc").map do |result|
  title = result.at_css(".DKV0Md")&.text
end
Code and example in the online IDE:
require 'httparty'
require 'nokogiri'

headers = {
  "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

params = {
  q: "ford fusion msrp",
  num: "20"
}

response = HTTParty.get("https://www.google.com/search",
                        query: params,
                        headers: headers)

doc = Nokogiri::HTML(response.body)

data = doc.css(".tF2Cxc").map do |result|
  title = result.at_css(".DKV0Md")&.text
  link = result.at_css(".yuRUbf a")&.attr("href")
  displayed_link = result.at_css(".tjvcx")&.text
  snippet = result.at_css(".VwiC3b")&.text
  puts "#{title}#{snippet}#{link}#{displayed_link}\n\n"
end
-------
'''
2020 Ford Fusion Prices, Reviews, & Pictures - Best Carshttps://cars.usnews.com/cars-trucks/ford/fusionhttps://cars.usnews.com › Cars › Used Cars › Used Ford
Ford® Fusion Retired | Now What?Not all vehicles qualify for A, Z or X Plan. All Mustang Shelby GT350® and Shelby® GT350R prices exclude gas guzzler tax. 2. EPA-estimated city/hwy mpg for the ...https://www.ford.com/cars/fusion/https://www.ford.com › cars › fusion
...
'''
Alternatively, you can achieve this by using Google Organic Results API from SerpApi. It's a paid API with a free plan.
The difference in your case is that you don't need to figure out what the correct selector is or why results are different in the output since it's already done for the end-user.
Basically, the only thing that needs to be done is just to iterate over structured JSON and get the data you were looking for.
Example code:
require 'google_search_results'

params = {
  api_key: ENV["API_KEY"],
  engine: "google",
  q: "ford fusion msrp",
  hl: "en",
  num: "20"
}

search = GoogleSearch.new(params)
hash_results = search.get_hash

data = hash_results[:organic_results].map do |result|
  title = result[:title]
  link = result[:link]
  displayed_link = result[:displayed_link]
  snippet = result[:snippet]
  puts "#{title}#{snippet}#{link}#{displayed_link}\n\n"
end
-------
'''
2020 Ford Fusion Prices, Reviews, & Pictures - Best Carshttps://cars.usnews.com/cars-trucks/ford/fusionhttps://cars.usnews.com › Cars › Used Cars › Used Ford
Ford® Fusion Retired | Now What?Not all vehicles qualify for A, Z or X Plan. All Mustang Shelby GT350® and Shelby® GT350R prices exclude gas guzzler tax. 2. EPA-estimated city/hwy mpg for the ...https://www.ford.com/cars/fusion/https://www.ford.com › cars › fusion
...
'''
P.S - I wrote a blog post about how to scrape Google Organic Search Results.
Disclaimer, I work for SerpApi.
It looks like something is swapping the classes, so what you see in the browser is not what you get back from the HTTP call. In this case, _XWk becomes _tA.
page = HTTParty.get('https://www.google.com/search?q=ford+fusion+msrp&oq=ford+fusion+msrp&aqs=chrome.0.0l6.11452j0j7&sourceid=chrome&ie=UTF-8')
parse_page = Nokogiri::HTML(page)
parse_page.css('._tA').map(&:text)
# >>["Up to 23 city / 34 highway", "From $22,610", "175 to 325 hp", "192″ L x 73″ W x 58″ H", "3,431 to 3,681 lbs"]
Change parse_page.css('_XWk') to parse_page.css('._XWk')
Note the dot (.) difference: the dot references a class.
With parse_page.css('_XWk'), Nokogiri doesn't know whether _XWk is a class, an id, a data attribute, etc.

How to parse a more complicated JSON object in Ruby on Sinatra

I'm a Java guy, new to Ruby. I've been playing with it just to see what it can do, and I'm running into an issue that I can't solve.
I decided to try out Sinatra, again, just to see what it can do, and decided to play with the ESPN API and see if I can pull the venue of a team via the API.
I'm able to make the call and get the data back, but I am having trouble parsing it:
{"sports"=>[{"name"=>"baseball", "id"=>1, "uid"=>"s:1", "leagues"=>[{"name"=>"Major League Baseball", "abbreviation"=>"mlb", "id"=>10, "uid"=>"s:1~l:10", "groupId"=>9, "shortName"=>"MLB", "teams"=>[{"id"=>17, "uid"=>"s:1~l:10~t:17", "location"=>"Cincinnati", "name"=>"Reds", "abbreviation"=>"CIN", "color"=>"D60042", "venues"=>[{"id"=>83, "name"=>"Great American Ball Park", "city"=>"Cincinnati", "state"=>"Ohio", "country"=>"", "capacity"=>0}], "links"=>{"api"=>{"teams"=>{"href"=>"http://api.espn.com/v1/sports/baseball/mlb/teams/17"}, "news"=>{"href"=>"http://api.espn.com/v1/sports/baseball/mlb/teams/17/news"}, "notes"=>{"href"=>"http://api.espn.com/v1/sports/baseball/mlb/teams/17/news/notes"}}, "web"=>{"teams"=>{"href"=>"http://espn.go.com/mlb/team/_/name/cin/cincinnati-reds?ex_cid=espnapi_public"}}, "mobile"=>{"teams"=>{"href"=>"http://m.espn.go.com/mlb/clubhouse?teamId=17&ex_cid=espnapi_public"}}}}]}]}], "resultsOffset"=>0, "resultsLimit"=>50, "resultsCount"=>1, "timestamp"=>"2013-08-04T14:47:13Z", "status"=>"success"}
I want to pull the venues part of the object, specifically the name value. Every time I try to parse it I end up getting an error along the lines of "cannot change from nil to string", and I've also gotten an integer-to-string error.
Here's what i have so far:
get '/venue/:team' do
  id = ids[params[:team]]
  url = 'http://api.espn.com/v1/sports/baseball/mlb/teams/' + id + '?enable=venues&apikey=' + $key
  resp = Net::HTTP.get_response(URI.parse(url))
  data = resp.body
  parsed = JSON.parse(resp.body)
  #venueData = parsed["sports"]
  "Looking for the venue of the #{params[:team]}, which has id " + id + ", and here's the data returned: " + venueData.to_s
end
When I do parsed["sports"] I get:
[{"name"=>"baseball", "id"=>1, "uid"=>"s:1", "leagues"=>[{"name"=>"Major League Baseball", "abbreviation"=>"mlb", "id"=>10, "uid"=>"s:1~l:10", "groupId"=>9, "shortName"=>"MLB", "teams"=>[{"id"=>17, "uid"=>"s:1~l:10~t:17", "location"=>"Cincinnati", "name"=>"Reds", "abbreviation"=>"CIN", "color"=>"D60042", "venues"=>[{"id"=>83, "name"=>"Great American Ball Park", "city"=>"Cincinnati", "state"=>"Ohio", "country"=>"", "capacity"=>0}], "links"=>{"api"=>{"teams"=>{"href"=>"http://api.espn.com/v1/sports/baseball/mlb/teams/17"}, "news"=>{"href"=>"http://api.espn.com/v1/sports/baseball/mlb/teams/17/news"}, "notes"=>{"href"=>"http://api.espn.com/v1/sports/baseball/mlb/teams/17/news/notes"}}, "web"=>{"teams"=>{"href"=>"http://espn.go.com/mlb/team/_/name/cin/cincinnati-reds?ex_cid=espnapi_public"}}, "mobile"=>{"teams"=>{"href"=>"http://m.espn.go.com/mlb/clubhouse?teamId=17&ex_cid=espnapi_public"}}}}]}]}]
But nothing else parses. Please help!
Like I said, I'm not trying to do anything fancy, just figure out Ruby a little for fun, but I have been stuck on this issue for days now. Any help would be appreciated!
EDIT:
JSON straight from the API:
{"sports" :[{"name" :"baseball","id" :1,"uid" :"s:1","leagues" :[{"name" :"Major League Baseball","abbreviation" :"mlb","id" :10,"uid" :"s:1~l:10","groupId" :9,"shortName" :"MLB","teams" :[{"id" :17,"uid" :"s:1~l:10~t:17","location" :"Cincinnati","name" :"Reds","abbreviation" :"CIN","color" :"D60042","venues" :[{"id" :83,"name" :"Great American Ball Park","city" :"Cincinnati","state" :"Ohio","country" :"","capacity" :0}],"links" :{"api" :{"teams" :{"href" :"http://api.espn.com/v1/sports/baseball/mlb/teams/17"},"news" :{"href" :"http://api.espn.com/v1/sports/baseball/mlb/teams/17/news"},"notes" :{"href" :"http://api.espn.com/v1/sports/baseball/mlb/teams/17/news/notes"}},"web" :{"teams" :{"href" :"http://espn.go.com/mlb/team/_/name/cin/cincinnati-reds?ex_cid=espnapi_public"}},"mobile" :{"teams" :{"href" :"http://m.espn.go.com/mlb/clubhouse?teamId=17&ex_cid=espnapi_public"}}}}]}]}],"resultsOffset" :0,"resultsLimit" :50,"resultsCount" :1,"timestamp" :"2013-08-05T19:44:32Z","status" :"success"}
The result of data.inspect:
"{\"sports\" :[{\"name\" :\"baseball\",\"id\" :1,\"uid\" :\"s:1\",\"leagues\" :[{\"name\" :\"Major League Baseball\",\"abbreviation\" :\"mlb\",\"id\" :10,\"uid\" :\"s:1~l:10\",\"groupId\" :9,\"shortName\" :\"MLB\",\"teams\" :[{\"id\" :17,\"uid\" :\"s:1~l:10~t:17\",\"location\" :\"Cincinnati\",\"name\" :\"Reds\",\"abbreviation\" :\"CIN\",\"color\" :\"D60042\",\"venues\" :[{\"id\" :83,\"name\" :\"Great American Ball Park\",\"city\" :\"Cincinnati\",\"state\" :\"Ohio\",\"country\" :\"\",\"capacity\" :0}],\"links\" :{\"api\" :{\"teams\" :{\"href\" :\"http://api.espn.com/v1/sports/baseball/mlb/teams/17\"},\"news\" :{\"href\" :\"http://api.espn.com/v1/sports/baseball/mlb/teams/17/news\"},\"notes\" :{\"href\" :\"http://api.espn.com/v1/sports/baseball/mlb/teams/17/news/notes\"}},\"web\" :{\"teams\" :{\"href\" :\"http://espn.go.com/mlb/team/_/name/cin/cincinnati-reds?ex_cid=espnapi_public\"}},\"mobile\" :{\"teams\" :{\"href\" :\"http://m.espn.go.com/mlb/clubhouse?teamId=17&ex_cid=espnapi_public\"}}}}]}]}],\"resultsOffset\" :0,\"resultsLimit\" :50,\"resultsCount\" :1,\"timestamp\" :\"2013-08-05T19:44:24Z\",\"status\" :\"success\"}"
parsed["sports"] does not exist, parse your input and inspect it/ dump it
With the data you've provided in the question, you can get to the venues information like this:
require 'json'
json = JSON.parse data
json["sports"].first["leagues"].first["teams"].first["venues"]
# => [{"id"=>83, "name"=>"Great American Ball Park", "city"=>"Cincinnati", "state"=>"Ohio", "country"=>"", "capacity"=>0}]
By replacing each of the first calls with an iterator, you can search through without knowing where the data is:
json["sports"].each{|h|
h["leagues"].each{|h|
h["teams"].each{|h|
venues = h["venues"].map{|h| h["name"]}.join(", ")
puts %Q!name: #{h["location"]} #{h["name"]} venues: #{venues}!
}
}
}
This outputs:
name: Cincinnati Reds venues: Great American Ball Park
Depending on how stable the response data is you may be able to cut out several of the iterators:
json["sports"].first["leagues"]
.first["teams"]
.each{|h|
venues = h["venues"].map{|h| h["name"] }.join(", ")
puts %Q!name: #{h["location"]} #{h["name"]} venues: #{venues}!
}
and you'll most likely want to save the data, so something like each_with_object is helpful:
team_and_venues = json["sports"].first["leagues"]
  .first["teams"]
  .each_with_object([]){|h,xs|
    venues = h["venues"].map{|h| h["name"]}.join(", ")
    xs << %Q!name: #{h["location"]} #{h["name"]} venues: #{venues}!
  }
# => ["name: Cincinnati Reds venues: Great American Ball Park"]
team_and_venues
# => ["name: Cincinnati Reds venues: Great American Ball Park"]
Notice that when an iterator declares variables, even if there is a variable with the same name outside the block, the scope of the block is respected and the block's variables remain local.
That's some pretty ugly code if you ask me, but it's a place to start.
