Any way to download a Facebook group? - bash

Sorry for the strange question.
I'm an admin of a very useful Facebook group. There is a lot of valuable info, which I'd like to have offline. Is there any (cli) method to download it?

You could use online services like Sociograph and Grytics to get data and even export it (I tried Sociograph).
If you want to download the data yourself, then you need to build a program that gets the data for you through the graph api and from there you can do whatever you want with the data you get.
Here is a simple script I hacked together in Python to get the data from a Facebook group, using this SDK:
#!/usr/bin/env python3
import requests
import facebook
from collections import Counter

# Connect with an access token that is allowed to read the group.
graph = facebook.GraphAPI(access_token='fb_access_token', version='2.7', timeout=2.00)

post = graph.get_object(id='{group-id}/feed')  # graph api endpoint: {group-id}/feed
group_data = post['data']
all_posts = []


def get_posts(data=[]):
    """Print every post message in the group and collect it in all_posts."""
    for obj in data:
        if 'message' in obj:
            print(obj['message'])
            all_posts.append(obj['message'])


def get_word_count(all_posts):
    """Print how often the most common words appear across the collected posts."""
    words = ' '.join(all_posts).split()   # join with spaces so words don't run together
    print(Counter(words).most_common(5))  # 5 most common words


def posts_count(data):
    """Return the number of posts made in the group."""
    return len(data)


get_posts(group_data)
get_word_count(all_posts)
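Note that the snippet above only reads the first page of the feed. As a rough sketch (not tested against the current API version), you could follow the paging links that Graph API feed responses normally include, reusing the requests import:

# Rough sketch: follow the Graph API paging links to walk the whole feed.
# Assumes the response carries a 'paging' -> 'next' URL, as feed responses usually do.
feed = graph.get_object(id='{group-id}/feed')
while True:
    for obj in feed.get('data', []):
        if 'message' in obj:
            all_posts.append(obj['message'])
    next_url = feed.get('paging', {}).get('next')
    if not next_url:
        break
    feed = requests.get(next_url).json()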
Basically, using the Graph API you can get all the info you need about the group, such as likes on each post, who liked what, the number of videos and photos, etc., and make your deductions from there.
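For example, here is a hedged sketch of asking the feed endpoint for a few extra fields; the exact field names and what you are allowed to read depend on your API version and permissions:

# Sketch: request per-post like/comment counts alongside the message.
detailed = graph.get_object(
    id='{group-id}/feed',
    fields='message,created_time,likes.summary(true),comments.summary(true)'
)
for obj in detailed.get('data', []):
    likes = obj.get('likes', {}).get('summary', {}).get('total_count', 0)
    comments = obj.get('comments', {}).get('summary', {}).get('total_count', 0)
    print(obj.get('message', '')[:60], likes, comments)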
I googled but couldn't find a bash script for this.

Related

How do I add multiple roles from a list to a user?

I have a command that adds all the current roles of a user to a database (MongoDB).
The code:
def add_roles_to_db(self):
    check = cursor.find_one({"_id": self.ctx.author.id})
    if check is None:
        cursor.insert_one({"_id": self.ctx.author.id,
                           "roles": [str(r) for r in self.ctx.author.roles[1:]]})
    else:
        cursor.update_one({"_id": self.ctx.author.id},
                          {"$set": {"roles": [str(r) for r in self.ctx.author.roles[1:]]}})
The code to get the roles:
def get_roles_from_db(self):
    return cursor.find_one({"_id": self.ctx.author.id})["roles"]
When I get the roles from the DB I get a list, and everything I've tried leads to an error: "AttributeError: 'str' object has no attribute 'id'"
if len(roles) != 0:
    await author.add_roles(*roles)
I saw another post where someone added roles via a list, but that didn't work for me.
You're passing a list of strings, not a list of Roles. Turn them into discord.Role instances using the IDs first, and then pass them to add_roles.
You can get them using Guild.get_role, or Guild.roles.
await author.add_roles(*[discord.Object(role_id) for role_id in roles])
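Note that str(role) stores the role name, not the ID, so if that is what ended up in the database, a sketch along these lines (meant to run inside an async command, reusing the cursor from the question) resolves the names back to Role objects with discord.utils.get:

import discord

# Look the stored role names up again and re-apply them.
stored_names = cursor.find_one({"_id": ctx.author.id})["roles"]
roles = [discord.utils.get(ctx.guild.roles, name=name) for name in stored_names]
roles = [r for r in roles if r is not None]  # skip roles that were deleted or renamed
if roles:
    await ctx.author.add_roles(*roles)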
One good way of adding multiple roles to a user is to collect the role IDs into a list. You would have to look at your code and figure out how to do that bit, as I don't know either, but I reckon you can just append them to a list. Then iterate through each item in that list (each ID) and add the corresponding role.
Example code:
guild = client.get_guild(1234)  # replace 1234 with your guild ID
guild = ctx.guild               # this is another way of doing it; choose the one above or this

role_ids = []  # your role ids would be in this list
for id in role_ids:
    role = guild.get_role(id)
    await author.add_roles(role)
await ctx.send("Given you all the roles!")
I haven't tried this myself, but I don't see why it wouldn't work.
If you need any more clarification, please ask me, and if this has worked, please mark it as correct! :)

How do I download all the abstract data from the PubMed data at NCBI?

I want to download all the PubMed abstracts.
Does anyone know how I can easily download all of the pubmed article abstracts?
I got the source of the data:
ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/af/12/
Is there any way to download all these tar files?
Thanks in advance.
There is a package called rentrez (https://ropensci.org/packages/). Check this out. You can retrieve abstracts by specific keywords, PMIDs, etc. I hope it helps.
UPDATE: You can download all the abstracts by passing your list of IDs with the following code.
library(rentrez)
library(XML)

your.ids <- c("26386083","26273372","26066373","25837167","25466451","25013473")
# rentrez function to get the data from the pubmed db
fetch.pubmed <- entrez_fetch(db = "pubmed", id = your.ids,
                             rettype = "xml", parsed = TRUE)
# Extract the abstracts for the respective IDs.
abstracts <- xpathApply(fetch.pubmed, '//PubmedArticle//Article', function(x)
  xmlValue(xmlChildren(x)$Abstract))
# Name the abstracts with the IDs.
names(abstracts) <- your.ids
abstracts
col.abstracts <- do.call(rbind.data.frame, abstracts)
dim(col.abstracts)
write.csv(col.abstracts, file = "test.csv")
I appreciate that this is a somewhat old question.
If you wish to get all of the PubMed entries with Python, I wrote the following script a while ago:
import requests
import json

search_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&mindate=1800/01/01&maxdate=2016/12/31&usehistory=y&retmode=json"
search_r = requests.post(search_url)
search_data = search_r.json()
webenv = search_data["esearchresult"]['webenv']
total_records = int(search_data["esearchresult"]['count'])
fetch_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmax=9999&query_key=1&webenv="+webenv

for i in range(0, total_records, 10000):
    this_fetch = fetch_url+"&retstart="+str(i)
    print("Getting this URL: "+this_fetch)
    fetch_r = requests.post(this_fetch)
    f = open('pubmed_batch_'+str(i)+'_to_'+str(i+9999)+".json", 'w')
    f.write(fetch_r.text)
    f.close()

print("Number of records found: "+str(total_records))
It starts off by making an entrez/eutils search request between two dates chosen to capture all of PubMed. From that response, the 'webenv' (which saves the search history) and total_records are retrieved. Using the webenv capability saves having to hand the individual record IDs to the efetch call.
Fetching records (efetch) can only be done in batches of 10,000, so the for loop grabs batches of 9,999 records and saves them in labelled files until all the records are retrieved.
Note that requests can fail (non-200 HTTP responses, errors), so in a more robust solution you should wrap each requests.post() in a try/except, and before dumping the data to file you should ensure that the HTTP response has a 200 status.
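As a hedged illustration of that advice (the helper name is just for this sketch), the body of the loop above could be replaced with something like:

def fetch_batch(url, retries=3):
    """Try a batch request a few times; return the text only for a 200 response."""
    for attempt in range(retries):
        try:
            r = requests.post(url, timeout=60)
            if r.status_code == 200:
                return r.text
            print("Got HTTP " + str(r.status_code) + " for " + url)
        except requests.RequestException as e:
            print("Request failed: " + str(e))
    return None

text = fetch_batch(this_fetch)
if text is not None:
    with open('pubmed_batch_' + str(i) + '_to_' + str(i + 9999) + ".json", 'w') as f:
        f.write(text)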

Designing a Firebase based scalable feed model

Question:
How do you design a social network "feed" with Firebase as the backend, in a way that scales?
Possible answers:
The "MVP" solution is to design a feeds root child, one per user, and append any new post from a followed user to every follower's feed.
users
  user1
    name: bob
  user2
    name: alice
    follows:
      user1: true
posts
  post1
    author: user1
    text: 'Hi there'
feeds
  user2
    post1: true
This works well, and is demoed in the Firefeed project. But it does not scale well: if Katy Perry wants to post something, her phone will have to write to millions of feeds.
Hence the solution reported in this SO question: delegate this operation to a server-based process.
My problem is that Firebase is a "no-backend" solution, and this is the main reason why I use it, so I'd like to make sure there is really no way to implement this feature without a server.
What if the feeds child is removed from the above schema?
Then do this:
baseRef.child('posts')
    .orderBy('author')
    .whereIn(baseRef.child('users/user2/follows').keys())
Unfortunately, whereIn does not exist in the Firebase API, nor do subqueries :(
Any other model structure possible without the need of a server ?
Thanks
Firebase guys kinda replied on their blog: https://www.firebase.com/blog/2015-10-07-how-to-keep-your-data-consistent.html
The post is about "data fanning" (spreading items across many nodes in one atomic write operation). The technique directly addresses the feed model of the original question.
The post actually contains example code for implementing it:
A function for creating the fan-out object (really just a plain object whose keys are the paths to be written):
function fanoutPost({ uid, followers, post }) {
  // Turn the hash of followers into an array of follower ids
  var followerIds = Object.keys(followers);
  var fanoutObj = {};
  // write the post to each follower's timeline
  followerIds.forEach((key) => fanoutObj['/timeline/' + key] = post);
  return fanoutObj;
}
And the logic using this function:
var rootRef = new Firebase('https://<YOUR-FIREBASE-APP>.firebaseio.com');
var followersRef = rootRef.child('followers');
var followers = {};
followersRef.on('value', (snap) => followers = snap.val());

var btnAddPost = document.getElementById('btnAddPost');
var txtPostTitle = document.getElementById('txtPostTitle');

btnAddPost.addEventListener('click', () => {
  // make post
  var post = { title: txtPostTitle.value };
  // make the fan-out object
  var fanoutObj = fanoutPost({
    uid: followersRef.getAuth().uid,
    followers: followers,
    post: post
  });
  // Send the object to the Firebase db for fan-out
  rootRef.update(fanoutObj);
});
Note: this is far more scalable than a loop writing to each follower's feed one at a time. However, it could still be insufficient for millions of followers; in that case it would be safer to trust a server-side operation making several writes. I think the client-side approach can be used for up to a few hundred followers, which is the average number of followers on social media (this would need to be verified by testing, though).

Facebook Graph API only returning 50 comments

I'm using the koala gem as shown in Railscasts episode #361. I'm attempting to get all of the comments of a given Post, but Facebook only seems to be giving me back the last 50 comments on the post. Is this a limitation of Facebook's Graph API, or am I doing something wrong?
fb = Koala::Facebook::API.new oauth_token
post = fb.get_object(id_of_the_post)
comments = fb.get_object(post['id'])['comments']['data']
puts comments.size # prints 50
The Graph API paginates the result when there are more comments than the limit that is set (in your case 50).
In order to access the next page of results, call the next_page method on the comments collection:
comments = fb.get_connections(post['id'], 'comments')
while comments.present?
  # Make operations with your results
  comments = comments.next_page  # nil once there are no more pages
end
Also, by looking at the source one can see that the get_object method receives 3 parameters:
def get_object(id, args = {}, options = {})
This way, you can raise the number of comments per page to as many as you want:
comments = fb.get_object(post['id'], {:limit => 1000})

What is the best way to pre-filter user access for SQLAlchemy queries?

I have been looking at the SQLAlchemy recipes on their wiki, but I don't know which one is best for implementing what I am trying to do.
Every row in my tables has a user_id associated with it. Right now, for every query, I filter by the id of the user that's currently logged in, then by the criteria I am interested in. My concern is that developers might forget to add this filter to a query (a huge security risk). Therefore, I would like to set a global filter based on the current user's admin rights to restrict what the logged-in user can see.
Appreciate your help. Thanks.
Below is a simplified, redefined query class that filters all model queries (including relations). You can pass it as the query_cls parameter to sessionmaker. The user ID parameter doesn't need to be global, since the session is constructed at a point where it's already available.
class HackedQuery(Query):

    def get(self, ident):
        # Use default implementation when there is no condition
        if not self._criterion:
            return Query.get(self, ident)
        # Copied from Query implementation with some changes.
        if hasattr(ident, '__composite_values__'):
            ident = ident.__composite_values__()
        mapper = self._only_mapper_zero(
            "get() can only be used against a single mapped class.")
        key = mapper.identity_key_from_primary_key(ident)
        if ident is None:
            if key is not None:
                ident = key[1]
        else:
            from sqlalchemy import util
            ident = util.to_list(ident)
        if ident is not None:
            columns = list(mapper.primary_key)
            if len(columns) != len(ident):
                raise TypeError("Number of values doesn't match number "
                                "of columns in primary key")
            params = {}
            for column, value in zip(columns, ident):
                params[column.key] = value
            return self.filter_by(**params).first()


def QueryPublic(entities, session=None):
    # It's not directly related to the problem, but is useful too.
    query = HackedQuery(entities, session).with_polymorphic('*')
    # Version for several entities needs thorough testing, so we
    # don't use it yet.
    assert len(entities) == 1, entities
    cls = _class_to_mapper(entities[0]).class_
    public_condition = getattr(cls, 'public_condition', None)
    if public_condition is not None:
        query = query.filter(public_condition)
    return query
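To actually apply it, the custom query class is handed to the session factory; a minimal sketch, assuming an existing engine:

from sqlalchemy.orm import sessionmaker

# query_cls makes every session.query() go through HackedQuery,
# so the extra filtering criterion is applied everywhere.
Session = sessionmaker(bind=engine, query_cls=HackedQuery)
session = Session()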
It works for single-model queries only, and there is a lot of work left to make it suitable for other cases. I'd like to see an elaborated version, since it's MUST HAVE functionality for most web applications. It uses a fixed condition stored in each model class, so you have to modify it to your needs.
Here is a very naive implementation that assumes there is an attribute/property self.current_user that stores the logged-in user.
class YourBaseRequestHandler(object):

    @property
    def current_user(self):
        """The current user logged in."""
        pass

    def query(self, session, entities):
        """Use this method instead of :method:`Session.query()
        <sqlalchemy.orm.session.Session.query>`.
        """
        return session.query(entities).filter_by(user_id=self.current_user.id)
I wrote an SQLAlchemy extension that I think does what you are describing: https://github.com/mwhite/multialchemy
It does this by proxying changes to the Query._from_obj and QueryContext._froms properties, which is where the tables to select from ultimately get set.
