Google Places API - Which data can I store? [closed] - google-places-api

I am about to create an application which uses the Google Places API.
As far as I understand the terms, it is not allowed to store places data in my database,
such as name, latitude, longitude, or detailed information. Is that right?
I found that it is allowed to "cache" the data instead of storing it.
In order to show the "last used places" on my website, I need to store something (e.g. the reference ID of the place).
Which information about a place can I store in my database?
Thanks in advance!

You can only store the Place ID in your database. Everything else has to be requested each time you want to display the data.

You can store the place_id from the Google Places response. If you want to perform actions on that data, you can store the place_id in your local table along with a user_id. For example, if you want to let a user like a restaurant from the Google response:
user_id: 1,
place_id: "xyz",
isLiked: true
You can design the local table like the above, then take the place_id from it and fetch the full restaurant data from the Google Places API.
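For reference, a minimal sketch of that flow in Python using the requests library; the API key, the example row, and the field list are placeholders, not a prescribed implementation:

```python
import requests

API_KEY = "YOUR_GOOGLE_API_KEY"  # placeholder

def fetch_place_details(place_id: str) -> dict:
    """Look up fresh place data from a stored place_id via the Place Details web service."""
    resp = requests.get(
        "https://maps.googleapis.com/maps/api/place/details/json",
        params={
            "place_id": place_id,
            "fields": "name,geometry,rating",  # request only the fields you need
            "key": API_KEY,
        },
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json().get("result", {})

# e.g. render a user's liked restaurants without storing Google content locally
liked_rows = [{"user_id": 1, "place_id": "xyz", "isLiked": True}]  # from your own table
for row in liked_rows:
    details = fetch_place_details(row["place_id"])
    print(row["user_id"], details.get("name"), details.get("rating"))
```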

From what I understand, this is what you can (and cannot) store from the API:
(b) No Pre-Fetching, Caching, or Storage of Content. You must not pre-fetch, cache, or store any Content, except that you may store: (i) limited amounts of Content for the purpose of improving the performance of your Maps API Implementation if you do so temporarily (and in no event for more than 30 calendar days), securely, and in a manner that does not permit use of the Content outside of the Service; and (ii) any content identifier or key that the Maps APIs Documentation specifically permits you to store. For example, you must not use the Content to create an independent database of "places" or other local listings information.
But you cannot store the user's data unless they have consented:
(i) Your Maps API Implementation must notify the user in advance of the type(s) of data that you intend to collect from the user or the user's device. Your Maps API Implementation must not obtain or cache any user's location in any manner except with the user's prior consent. Your Maps API Implementation must let the user revoke the user's consent at any time.
Referenced from the Google Places terms (yes, they are the same as the Google Maps terms).
See these two links:
https://developers.google.com/places/policies#pre-fetching_caching_or_storage_of_content
https://developers.google.com/maps/terms#section_10_1_3

Related

Where in the stack to best merge analytical data-warehouse data with data scraped+cached from third-party APIs?

Background information
We sell an API to users that analyzes and presents corporate financial-portfolio data derived from public records.
We have an "analytical data warehouse" that contains all the raw data used to calculate the financial portfolios. This data warehouse is fed by an ETL pipeline, and so isn't "owned" by our API server per se. (E.g. the API server only has read-only permissions to the analytical data warehouse; the schema migrations for the data in the data warehouse live alongside the ETL pipeline rather than alongside the API server; etc.)
We also have a small document store (actually a Redis instance with persistence configured) that is owned by the API layer. The API layer runs various jobs to write into this store, and then queries data back as needed. You can think of this store as a shared persistent cache of various bits of the API layer's in-memory state. The API layer stores things like API-key blacklists in here.
Problem statement
All our input data is denominated in USD, and our calculations occur in USD. However, we give our customers the query-time option to convert the response just-in-time to another currency. We do this by having the API layer run a background job to scrape exchange-rate data, and then cache it in the document store. Individual API-layer nodes then do (in-memory-cached-with-TTL) fetches from this exchange-rates key in the store, whenever a query result needs to be translated into a specific currency.
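For illustration only, a minimal Python sketch of that read path; the key name, the TTL value, and the redis-py client are assumptions rather than the actual implementation:

```python
import json
import time
import redis  # redis-py, assumed

r = redis.Redis(host="localhost", port=6379)

_CACHE = {"rates": None, "fetched_at": 0.0}
_TTL_SECONDS = 300  # in-memory cache TTL per API node (assumed value)

def get_exchange_rates() -> dict:
    """Return USD->currency rates, refreshing from the document store when the local copy is stale."""
    now = time.time()
    if _CACHE["rates"] is None or now - _CACHE["fetched_at"] > _TTL_SECONDS:
        raw = r.get("exchange_rates")  # written by the background scraping job
        _CACHE["rates"] = json.loads(raw) if raw else {}
        _CACHE["fetched_at"] = now
    return _CACHE["rates"]

def convert(amount_usd: float, currency: str) -> float:
    """Translate a USD query result into the requested currency at response time."""
    rates = get_exchange_rates()
    return amount_usd * rates.get(currency, 1.0)
```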
At first, we thought that this unit conversion wasn't really "about" our data, just about the API's UX, and so we thought this was entirely an API-layer concern, where it made sense to store the exchange-rates data into our document store.
(Also, we noticed that, by not pre-converting our DB results into a specific currency on the DB side, the calculated results of a query for a particular portfolio became more cache-friendly; the way we're doing things, we can cache and reuse the portfolio query results between queries, even if the queries want the results in different currencies.)
But recently we've been expanding into also allowing partner clients to execute complex data-science/Business Intelligence queries directly against our analytical data warehouse. And it turns out that they will also, often, need to do final exchange-rate conversions in their BI queries as well, despite there being no API layer involved here.
It seems like, to serve the needs of BI querying, the exchange-rate data "should" actually live in the analytical data warehouse alongside the financial data; and the ETL pipeline "should" be responsible for doing the API scraping required to fetch and feed in the exchange-rate data.
But this feels wrong: the exchange-rate data has a different lifecycle and integrity constraints than our financial data. The exchange rates are dirty and ephemeral point-in-time samples attained by scraping, whereas the financial data is a reliable historical event stream. The exchange rates get constantly updated/overwritten, while the financial data is append-only. Etc.
What is the best practice for serving the needs of analytical queries that need to access backend "application state" for "query result presentation" needs like this? Or am I wrong in thinking of this exchange-rate data as "application state" in the first place?
What I find interesting about your scenario is when the exchange-rate data is applicable.
In the case of the API, it's all about the realtime value in the other currency and it makes sense to have the most recent value in your API app scope (Redis).
However, I assume your analytical data warehouse has tables with purchases that were made at a certain time. In those cases, the current exchange rate is not really relevant to the value of the transaction.
This might mean that you want to store the exchange rate history in your warehouse or expand the "purchases" table to store the values in all the currencies at that moment.
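If you go the "store the exchange-rate history" route, the lookup this answer describes is essentially an as-of join: each transaction takes the most recent rate at or before its timestamp. A small illustrative sketch with pandas; the table and column names are made up:

```python
import pandas as pd

# Hypothetical purchases recorded in USD, and a scraped exchange-rate history
purchases = pd.DataFrame({
    "purchased_at": pd.to_datetime(["2023-01-05 10:00", "2023-01-07 16:30"]),
    "amount_usd": [100.0, 250.0],
})
eur_rates = pd.DataFrame({
    "sampled_at": pd.to_datetime(["2023-01-05 00:00", "2023-01-06 00:00", "2023-01-07 00:00"]),
    "usd_to_eur": [0.94, 0.93, 0.92],
})

# As-of join: each purchase gets the most recent rate sampled at or before its timestamp
joined = pd.merge_asof(
    purchases.sort_values("purchased_at"),
    eur_rates.sort_values("sampled_at"),
    left_on="purchased_at",
    right_on="sampled_at",
)
joined["amount_eur"] = joined["amount_usd"] * joined["usd_to_eur"]
print(joined[["purchased_at", "amount_usd", "amount_eur"]])
```

The same as-of join can be expressed in most warehouses' SQL dialects; the point is that the rate history becomes just another time-keyed table in the warehouse.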

Accessing star ratings from reviews, for multiple businesses

Is it possible to fetch the Google star rating for any business using the Google Places API?
I have a comparison website and want to display the Google star ratings for each business on my site.
Many thanks
Yes. The responses from the Place Search and Place Details APIs include a rating field.
However, two important warnings:
These APIs are both billed, and are quite expensive ($17 and $32 per 1000 requests, respectively). Making a Place Details request for each business displayed in a comparison will probably be economically infeasible.
The Places API policies place a number of requirements on your use of Google's data. In particular, you cannot cache most data returned by the API (including ratings), and you cannot use the data alongside a non-Google map.
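To make the rating lookup concrete, a minimal Python sketch of reading the rating field from a Place Details response; the API key is a placeholder, the place_id is the example ID from Google's documentation, and the billing/caching caveats above still apply:

```python
import requests

API_KEY = "YOUR_GOOGLE_API_KEY"  # placeholder

def get_rating(place_id: str):
    """Fetch just the rating fields for one business via Place Details."""
    resp = requests.get(
        "https://maps.googleapis.com/maps/api/place/details/json",
        params={
            "place_id": place_id,
            "fields": "name,rating,user_ratings_total",
            "key": API_KEY,
        },
        timeout=10,
    )
    resp.raise_for_status()
    result = resp.json().get("result", {})
    return result.get("rating"), result.get("user_ratings_total")

rating, count = get_rating("ChIJN1t_tDeuEmsRUsoyG83frY4")  # example place_id from Google's docs
print(rating, count)
```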

Architecture question about one Elasticsearch instance per microservice

I like the microservices approach. It is easier to deploy, manage, and develop than a monolith. The microservices pattern says one database instance per microservice; in most cases that isn't a problem, but in some cases it is. Let me explain my problem with an example.
I have a web service where users can upload e.g. an image, other users can comment on and rate it, and there is a view counter. I would implement four services.
Upload Image Service
the user uploads their image to the website
the image has some meta information like description, title, tags, upload date
Comment Service
if a user adds a comment to the image, this service handles the request and creates an entry in the database with the attributes content, videoId, userId, and date
View Counter Service
whenever a user views/clicks the image, a new request to the service is created and a new entry with the user ID and video ID is stored in the DB
Each service has its own database, and all services are completely independent of each other. The communication between services is only via REST API. The DB is Elasticsearch.
And here comes the problem. I will create a fourth service, the "Image Search Service". It's a really common task, like the search function on YouTube.
For the best search results I need each of the attributes/information from the preceding three services. The search depends, of course, on the tags, description, and upload date, but the likes/dislikes, views, and comments have an influence too. An image with a high view count will be ranked higher, for example.
But when I store all this information in separate DBs, I cannot consider it all in one query, which I think is necessary for a full-text search.
Does anyone have experience or ideas to solve this problem, or is there perhaps a best practice? I read something about event sourcing, but that's not the right solution for this particular problem.
Of course I could send requests to each of the three services, then merge the results with my own algorithm, but I think Elasticsearch is the right tool for this job.
JHipster uses Elasticsearch on top of a MySQL DB. Maybe this could be a solution.
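Another common approach is to give the search service its own denormalized index that the other three services feed (via events or periodic sync), so one query can combine full-text relevance with popularity signals. A rough sketch, assuming a recent (8.x) Python Elasticsearch client; the index name and fields are made up:

```python
from elasticsearch import Elasticsearch  # elasticsearch-py 8.x assumed

es = Elasticsearch("http://localhost:9200")

# One denormalized document per image, combining data owned by the three services
doc = {
    "title": "Sunset over the harbour",
    "description": "Long exposure shot",
    "tags": ["sunset", "harbour"],
    "upload_date": "2016-05-01",
    "view_count": 1520,      # pushed/synced from the View Counter Service
    "comment_count": 12,     # pushed/synced from the Comment Service
    "likes": 87,
}
es.index(index="image-search", id="image-42", document=doc)

# A single query can now mix text relevance with a popularity boost
resp = es.search(
    index="image-search",
    query={
        "function_score": {
            "query": {"multi_match": {"query": "sunset", "fields": ["title", "description", "tags"]}},
            "field_value_factor": {"field": "view_count", "modifier": "log1p"},
        }
    },
)
print(resp["hits"]["total"])
```

The trade-off is accepting some staleness in the search index in exchange for keeping the three services' databases independent.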

Clickstream data analysis [closed]

I came across an interesting scenario called clickstream data analysis. All I know is what clickstream data is. I would like to know more about it: the different scenarios in which it can be used in the best interests of the business, and the set of tools needed to process the data at each step of each scenario.
What is Clickstream Data?
It is a virtual trail that a user leaves behind while surfing the Internet. A clickstream is a record of a user's activity on the Internet, including every Web site and every page of every Web site that the user visits, how long the user was on a page or site, in what order the pages were visited, any newsgroups that the user participates in and even the e-mail addresses of mail that the user sends and receives. Both ISPs and individual Web sites are capable of tracking a user's clickstream.
Clickstream data may include information like: browser height/width, browser name, browser language, device type (desktop, laptop, tablet, mobile), revenue, day, timestamp, IP address, URL, number of products added to cart, number of products removed, state, country, billing zip code, shipping zip code, etc.
How can we extract more information from Clickstream data?
In the web analytics realm, site visitors and potential customers are the equivalent of subjects in a subject-based data set.
Consider the following clickstream data example: a subject-based dataset is structured in rows and columns (like an Excel spreadsheet), where each row of the data set is a unique subject and each column is some piece of information about that subject. If you want to do customer-based analysis, you will need a customer-based data set. In its most granular form, clickstream data is a table of individual hits, with hits from the same visitor grouped together.
Data scientists derive more features from clickstream data. For each visitor, we have several hits within a visit, and over an extended period of time we have a collection of visits. We need a way to organize the data at the visitor level: one row per visitor, with aggregated columns.
Obviously, there are many different ways you could aggregate the data. For numeric data like page views, revenue, and video views, we may want to use something like an average or total. By doing this we get more information about customer behavior. In such an aggregated view you can easily tell, for example, that the company is making more revenue on Fridays.
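A tiny illustrative sketch of that hit-level to visitor-level aggregation in Python/pandas; the column names and values are made up:

```python
import pandas as pd

# Hit-level clickstream: one row per hit
hits = pd.DataFrame({
    "visitor_id": ["A", "A", "B", "B", "B"],
    "day":        ["Thu", "Fri", "Fri", "Fri", "Sat"],
    "page_views": [1, 1, 1, 1, 1],
    "revenue":    [0.0, 25.0, 0.0, 40.0, 0.0],
})

# Visitor-level: one row per visitor, numeric columns totalled
per_visitor = hits.groupby("visitor_id").agg(
    total_page_views=("page_views", "sum"),
    total_revenue=("revenue", "sum"),
)

# Day-level: spot patterns such as higher revenue on Fridays
per_day = hits.groupby("day")["revenue"].sum()
print(per_visitor)
print(per_day)
```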
Once you have obtained a customer-based data set, there are a number of different statistical models and data science techniques that can allow you to access deeper, more meaningful analysis at the visitor level. Data Science Consulting has expertise and experience in leveraging these methods to:
Predict which customers are at the highest risk for churn and determine the factors that are affecting that risk (allows you to be proactive in retaining your customer base)
Understand the level of brand awareness of individual customers
Target customers with individualized, relevant offers
Anticipate which customers are most likely to convert and statistically determine how your site is influencing that decision
Determine the types of site content that visitors are most likely to respond to and understand how content engagement drives high-value visits
Define the profiles and characteristics of the different personas of visitors coming to your site, and understand how to engage with them.
You may also be interested in the following Coursera course:
https://www.coursera.org/learn/process-mining?recoOrder=6&utm_medium=email&utm_source=recommendations&utm_campaign=recommendationsEmail~recs_email_2016_06_26_17%3A57
It's on process mining, which has click trace analysis as a special case, I think.
The following can give a high-level picture of what most companies do:
Ingestion RESTful API for clients to pass in events
Pump the events to Kafka
Spark streaming to do real-time computations
Gobblin (or similar) to pump data from Kafka to HDFS, then run batch M/R jobs on HDFS
Both real-time and batch jobs pump the computed metrics to Druid (Lambda architecture)
UI for end-user reports/dashboards
Nagios (or similar) for alerting
Metrics aggregation framework, which tracks events through every layer in our stack
From my experience, it is better to start with fairly mature tools and do a POC end to end, then look at other tools that you can play around with. For example, as your pipeline matures, you could even have an asynchronous ingestion API (written in Scala/Akka), Kafka Streams for inline event transformations, Flink for both real-time and batch jobs, etc.
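As a concrete illustration of just the "pump the events to Kafka" step in the list above, a minimal sketch assuming the kafka-python client; the topic name and event shape are made up:

```python
import json
from kafka import KafkaProducer  # kafka-python, assumed

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda event: json.dumps(event).encode("utf-8"),
)

# A clickstream event as it might arrive at the ingestion API
event = {
    "visitor_id": "A",
    "url": "/products/42",
    "device_type": "mobile",
    "timestamp": "2016-06-26T17:57:00Z",
}
producer.send("clickstream-events", value=event)
producer.flush()
```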
Maybe you can take a look at the Spark courses on edX; they use clickstream examples with Spark for analysis and machine learning.

What is the Rails way to log visits in order to collect data for a recommendation engine?

For a summer internship, I have been asked to collect some specific data about the pages a user visits on the startup's website.
To simplify things, we can consider the website as a dating site, where each user has their own profile page and is tagged under certain categories (hair color, city, etc.).
I would like to know the best way, in the Rails framework, to keep track of each visit a user makes to a profile or to a tag page.
Should it be logged to a file or added to a database, and where exactly in the code should the functions be called?
Maybe a gem already exists for this specific purpose?
The question is both about where the functions should be called in Rails and how the data should be stored, because the goal is ultimately to build a recommendation system.
There are a wide range of options available to you. I'd recommend one of the following:
Instrument detailed logging of the relevant controller actions. Periodically run a rake task that aggregates data from the log files and makes it available to your relevance engine.
Use a key/value store such as Redis to increment user/action specific counters during requests. Your relevance engine can query this store for the required metrics. Again, periodic aggregation of metrics is advised.
Both approaches lend themselves well to before_filter statements. You can interrogate the input params before the controller action executes to transparently implement the collection of statistics.
I wouldn't recommend using a relational database to store the raw data.
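To make the second option concrete, here is a minimal counter sketch, shown in Python with redis-py purely for illustration; in Rails you would do the equivalent from a before_filter (e.g. with the redis gem), and the key scheme below is made up:

```python
import redis  # redis-py, assumed

r = redis.Redis(host="localhost", port=6379)

def record_visit(user_id: int, kind: str, target_id: int) -> None:
    """Increment per-user counters for a profile or tag page visit."""
    # e.g. visits:user:42:profile:7 -> number of times user 42 viewed profile 7
    r.incr(f"visits:user:{user_id}:{kind}:{target_id}")
    # Keep a set of everything the user has looked at, for the recommender to scan later
    r.sadd(f"seen:user:{user_id}:{kind}", target_id)

record_visit(user_id=42, kind="profile", target_id=7)
record_visit(user_id=42, kind="tag", target_id=3)
```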
