I need a Twitter dataset for the last 3-4 months relating to any company/commodity for stock price prediction - sentiment-analysis

I need a Twitter dataset covering the last 3-4 months, relating to any company or commodity, for performing sentiment analysis and thereby stock price prediction.
But the Twitter API only goes back up to 10-12 days. The code that I've prepared works well, but I need more data to reach a reliable conclusion. I can't wait 3-4 months, since I have to submit the project soon.
If anyone knows of a link where I can find such a dataset, or if anyone has one, please let me know.
Thank you in advance.

You could check out these discussions (1, 2) and also this site.

Related

How to look for S&P 500 Constituents history, added and removed dates etc

I am trying to get a historical list of the S&P 500 constituents: all tickers, the dates they were added to the index, and the dates they were removed, so that I can tell what the mix was for each period over the years. I did some searching but haven't had any luck.
If anyone can suggest good search keywords or a place to look, it would be appreciated.
This is something very specific.
I currently use backtrader to work on some data. If there is a systematic way to get the data, please let me know as well.
Many thanks.
You can access this data systematically in QuantRocket, via data provider Sharadar:
https://www.quantrocket.com/data/?filter=sharadar
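Whatever the source, once the added and removed dates are available, the mix for any given day can be rebuilt by filtering on them. Here is a minimal Python sketch, assuming a hypothetical list of (ticker, added, removed) records; the tickers and dates below are made-up placeholders, not real index history, and the record layout is an assumption rather than any provider's actual format:

```python
from datetime import date
from typing import NamedTuple, Optional


class Membership(NamedTuple):
    ticker: str
    added: date
    removed: Optional[date]  # None means the ticker is still in the index


# Placeholder records for illustration only; not real S&P 500 history.
HISTORY = [
    Membership("AAA", date(2010, 1, 4), None),
    Membership("BBB", date(2012, 6, 1), date(2015, 3, 20)),
    Membership("CCC", date(2015, 3, 20), None),
]


def constituents_on(history, day):
    """Return the sorted tickers that were index members on `day`.

    A ticker counts as a member from its added date (inclusive)
    up to its removed date (exclusive).
    """
    return sorted(
        m.ticker
        for m in history
        if m.added <= day and (m.removed is None or day < m.removed)
    )
```

Calling `constituents_on(HISTORY, some_date)` for each rebalance date of a backtest gives the universe to trade on that date, which avoids picking stocks that only joined the index later.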

Neo4j - count relationships vs store the no. of relationships

I'm enjoying working with Neo4j: I'm building a social network and Neo4j is working well for me. Please help with these points:
1) I'm stuck deciding whether to store the number of likes on a post (denormalized) somewhere in the database, or to count the edges to that post dynamically every time.
For example, when retrieving the "post" JSON data, I need to count the number of edges every time I generate the JSON, for each user who needs that data.
2) I'm stuck deciding on the best way to notify users about likes or comments.
For example, I want to push a notification to the user saying "John and 3 others also commented on Cena's post".
This notification might be updated as the number of comments increases. So it would help me to use count(*) rather than storing the counter somewhere, because I can then easily fetch the count for "${count} new replies on your post". But I'm worried about the performance.
3) Can I use Redis or another in-memory cache with Neo4j? Does that make a "significant" difference?
Please help me out in deciding which is better.
P.S: Please keep in mind the efficiency and scalability of application.
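To make the two options in point 1 concrete, here is a minimal Python sketch that uses an in-memory stand-in for the graph. It is not Neo4j or Cypher code; the class and function names are hypothetical. It keeps the like "edges" and a stored counter side by side, and builds the notification text from point 2:

```python
from collections import defaultdict


class LikeStore:
    """In-memory stand-in for the graph; illustrative only, not Neo4j."""

    def __init__(self):
        self.likes = defaultdict(set)       # post_id -> user ids (the "edges")
        self.like_count = defaultdict(int)  # post_id -> stored (denormalized) counter

    def like(self, user_id, post_id):
        # Bump the stored counter only when a new edge is actually created,
        # so the counter cannot drift away from the edge set.
        if user_id not in self.likes[post_id]:
            self.likes[post_id].add(user_id)
            self.like_count[post_id] += 1

    def count_dynamic(self, post_id):
        # Option A: count the edges at read time.
        return len(self.likes.get(post_id, ()))

    def count_stored(self, post_id):
        # Option B: read the counter maintained at write time.
        return self.like_count.get(post_id, 0)


def comment_notification(commenters, post_owner):
    """Build text such as "John and 3 others also commented on Cena's post"."""
    first, others = commenters[0], len(commenters) - 1
    if others == 0:
        return f"{first} also commented on {post_owner}'s post"
    noun = "other" if others == 1 else "others"
    return f"{first} and {others} {noun} also commented on {post_owner}'s post"
```

The stored counter moves work from every read to every write, and it has to be updated in the same transaction as the edge to stay correct. Which option is faster in a real Neo4j deployment is something to measure rather than assume.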

API User Usage Report: Inconsistent Reporting

I'm using a JVM to perform API calls to the Google Apps Administrator API.
I've noticed that with the User Usage Reports, I'm not getting complete data for a field I'm interested in (num_docs_externally_visible) or for the fields that feed into that field's calculation. I generally request a single day's usage report at a time, across my entire user base (~40k users).
According to the developer documentation, I should be able to see that field in a report 2-6 days afterwards; however, after running reports for the first 3 weeks of February, I've only gotten it for 60% of the days. The pattern appears to be random: I see streaks of up to 4 days in a row where the item appears and streaks of up to 3 days in a row where it does not, with no consistency.
Has anyone else experienced this issue? And if so, were you able to resolve it? Or should I expect this behavior to continue if this is an issue with what the API is returning outside of my control?
I think it's only natural that the data you get is not yet complete; it takes a certain number of days for the complete data to become available.
This SO question is not exactly the same as yours, but I think it will help you, especially the part about needing to use your account's time zone.

Google Analytics Date/Hour/Time Stamp

I'm new to Google Analytics. I have a goal and I would like to see the date/hour/minute the goal occurred. Is this possible? Sorry if this is a stupid question, I could not find an answer after hours of googling...
Thanks!
You can't actually see the date, hour and minute of the goal in one request.
Using a custom report:
I added Hour, Minute and Goal1. Hour returned information; Minute returned nothing. I tried adding Date as a secondary dimension to the Hour one, and that also returned no information.
Remember that not all dimensions and metrics can be combined to give information. Hour and Minute are two that cannot.
Using a dashboard also didn't reveal any useful data.

How does the Twitter timeline algorithm work?

I'm trying to design a system similar to Twitter's timeline, but I can't wrap my head around how to get updates from so many followers while remaining efficient. Let's say I'm following 1000 people on Twitter. When I go to my feed, how does it know which tweets to show me? This is what I'm thinking, but it seems extremely inefficient and unlikely:
You have 10,000 friends.
In a for loop, loop through each friend, getting their latest
status updates since their last update.
But that just seems ridiculous to loop through 10,000 friends. I can't imagine how else they'd do it though. Or would it be something like:
Someone I am following posted a tweet. That tweet is inserted in
an array containing the tweets of all people I am following.
But then that would seem weird, if I followed someone new who has 20,000 tweets, then 20,000 tweets would be inserted in my array, and if that person has millions of followers, then there are a million X 20,000 copies of the same set of tweets. So that also seems unlikely.
Anyone have any ideas how they could possibly do it?
I advise you to check out the Twissandra project.
They have implemented all the basic functionality of Twitter using Cassandra, a NoSQL database. It is said that Twitter no longer uses it for tweets.
The old implementation can be consulted here
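The two approaches described in the question are often called fan-out on read (loop over everyone you follow at read time) and fan-out on write (push each new tweet into every follower's list). Here is a minimal in-memory Python sketch of both. It does not describe how Twitter or Twissandra actually work; the class name, method names and the inbox size of 100 are assumptions made for illustration:

```python
import heapq
from collections import defaultdict, deque
from itertools import islice


class Timeline:
    def __init__(self, inbox_size=100):
        self.following = defaultdict(set)  # user -> authors they follow
        self.followers = defaultdict(set)  # author -> users following them
        self.tweets = defaultdict(list)    # author -> [(seq, author, text)], oldest first
        # Bounded per-user inbox: only the most recent items are kept.
        self.inbox = defaultdict(lambda: deque(maxlen=inbox_size))
        self.seq = 0                       # global counter standing in for a timestamp

    def follow(self, user, author):
        self.following[user].add(author)
        self.followers[author].add(user)

    def post(self, author, text):
        self.seq += 1
        item = (self.seq, author, text)
        self.tweets[author].append(item)
        # Fan-out on write: push the new item to each follower's inbox.
        for follower in self.followers[author]:
            self.inbox[follower].appendleft(item)

    def feed_pull(self, user, limit=10):
        # Fan-out on read: lazily merge each followed author's tweets,
        # newest first, and stop after `limit` items.
        streams = [reversed(self.tweets[a]) for a in self.following[user]]
        return list(islice(heapq.merge(*streams, reverse=True), limit))

    def feed_push(self, user, limit=10):
        # The inbox is already ordered newest first.
        return list(islice(self.inbox[user], limit))
```

In this sketch the inbox is bounded and following someone new does not backfill their old tweets, so the "a million X 20,000 copies" case from the question does not arise. The merge in `feed_pull` only consumes as many items as the feed needs, rather than reading every tweet of every followed account.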

Resources