This repository has been archived by the owner on May 22, 2023. It is now read-only.

Add support for mentions in posts and comments, add -|mentions|-> edge type, issue #4
Dataninja committed Apr 6, 2017
1 parent 55ec3db commit fcc7658
Showing 2 changed files with 115 additions and 7 deletions.
README.md: 8 changes, 5 additions and 3 deletions
@@ -1,6 +1,6 @@
 # Facebook group harvester
 
-Download all data from a Facebook Group: posts, comments (and subcomments), likes, reactions, users, using the [official facebook-sdk module](https://github.com/mobolic/facebook-sdk/) for Python 2.7. WARNING! Under development, not all features are implemented or to be considered stable.
+Download all data from a Facebook Group: posts, comments (and subcomments), likes, reactions, users, shares and user mentions using the [official facebook-sdk module](https://github.com/mobolic/facebook-sdk/) for Python 2.7. WARNING! Under development, not all features are implemented or to be considered stable.
 
 ## Installation
 
@@ -44,10 +44,12 @@ Harvested data are organized in a graph structure:
 * user -|is author of|-> comment,
 * user -|reacts to|-> post,
 * user -|reacts to|-> comment
+* post -|mentions|-> user
 * comment -|in reply to|-> post,
-* comment -|in reply to|-> comment.
+* comment -|in reply to|-> comment,
+* comment -|mentions|-> user.
 
-There are three types of nodes and three types of edges. Nodes have additional attributes (ie. name of user, message for posts and comments). Only the -|reacts to|-> edge has an attribute, the [type of reaction](https://developers.facebook.com/docs/graph-api/reference/post/reactions) (LIKE, LOVE, WOW, HAHA, SAD, ANGRY, THANKFUL).
+There are three types of nodes and four types of edges. Nodes have additional attributes (ie. name of user, message for posts and comments). Only the -|reacts to|-> edge has an attribute, the [type of reaction](https://developers.facebook.com/docs/graph-api/reference/post/reactions) (LIKE, LOVE, WOW, HAHA, SAD, ANGRY, THANKFUL[, SHARE]).
 
 This graph is saved in two formats: [GEXF](https://networkx.github.io/documentation/development/reference/generated/networkx.readwrite.gexf.write_gexf.html#networkx.readwrite.gexf.write_gexf) and [JSON](https://networkx.github.io/documentation/development/reference/generated/networkx.readwrite.json_graph.node_link_data.html#networkx.readwrite.json_graph.node_link_data).
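The node and edge types in this README hunk can be written down as allowed (source type, edge, target type) triples, which also makes the "three types of nodes and four types of edges" count checkable. This is a minimal sketch: `SCHEMA`, `REACTION_TYPES`, `NODE_TYPES`, `EDGE_TYPES` and `is_valid_edge` are hypothetical names, not part of the repository, and only the triples visible in the hunk are encoded, because the README list begins above it.

```python
# Hypothetical schema check; not part of the repository.
# Edge triples visible in this README hunk: (source type, edge, target type).
# The README list starts above the hunk, so entries listed there are not
# encoded here.
SCHEMA = {
    ("user", "is author of", "comment"),
    ("user", "reacts to", "post"),
    ("user", "reacts to", "comment"),
    ("post", "mentions", "user"),
    ("comment", "in reply to", "post"),
    ("comment", "in reply to", "comment"),
    ("comment", "mentions", "user"),
}

# Reaction types named in the README; SHARE appears in brackets there and is
# accepted here as an assumption.
REACTION_TYPES = {"LIKE", "LOVE", "WOW", "HAHA", "SAD", "ANGRY", "THANKFUL", "SHARE"}

# Derived sets of node and edge types used by the schema.
NODE_TYPES = {t for s, _, d in SCHEMA for t in (s, d)}
EDGE_TYPES = {e for _, e, _ in SCHEMA}


def is_valid_edge(source_type, edge_type, target_type, reaction=None):
    """Return True if a typed edge fits the schema above."""
    if (source_type, edge_type, target_type) not in SCHEMA:
        return False
    if edge_type == "reacts to":
        # The reaction type is the only edge attribute the README describes.
        return reaction in REACTION_TYPES
    return reaction is None
```

The two derived sets have 3 and 4 members respectively, matching the updated README sentence.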

fb_group.py: 114 changes, 110 additions and 4 deletions
@@ -161,14 +161,15 @@
 posts = graph.get_all_connections(
     id = group_id,
     connection_name = "feed",
-    fields = "id,message,from,updated_time",
+    fields = "id,message,from,updated_time,to",
     since = int((since_datetime - datetime(1970, 1, 1)).total_seconds()),
     until = int((until_datetime - datetime(1970, 1, 1)).total_seconds())
 )
 
 num_posts = 0
 num_shares = 0
 num_reactions = 0
+num_mentions = 0
 num_comments = 0
 for post in posts:
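The `since` and `until` arguments in this hunk are Unix timestamps, obtained by subtracting the epoch from a datetime and taking whole seconds. A minimal sketch of that arithmetic, assuming `since_datetime` and `until_datetime` are naive datetimes meant as UTC (the hunk does not show how they are built); `to_unix_timestamp` and `to_unix_timestamp_timegm` are hypothetical helper names, the second showing an equivalent standard-library route through `calendar.timegm`.

```python
import calendar
from datetime import datetime

# Naive datetimes are assumed to represent UTC (an assumption; the hunk does
# not show how since_datetime and until_datetime are constructed).
EPOCH = datetime(1970, 1, 1)


def to_unix_timestamp(dt):
    """Whole seconds between the Unix epoch and a naive UTC datetime.

    Same arithmetic as the since/until arguments in the feed request;
    int() truncates any fractional seconds.
    """
    return int((dt - EPOCH).total_seconds())


def to_unix_timestamp_timegm(dt):
    """Equivalent standard-library route; timegm treats the tuple as UTC."""
    return calendar.timegm(dt.utctimetuple())
```

For example, `to_unix_timestamp(datetime(2017, 4, 6))` gives 1491436800, midnight UTC on the commit date.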

@@ -212,6 +213,40 @@
     logging.info("+ DATETIME: %s" % post['updated_time'])
     logging.info("+ AUTHOR: %s" % post['from']['name'])
 
+    for mention in post.get('to',{}).get('data',[]):
+
+        logging.debug(mention)
+
+        if mention['id'] == group['id']:
+            continue
+
+        num_mentions += 1
+
+        if not nx.get_node_attributes(G, mention['id']):
+            G.add_node(
+                mention['id'],
+                mtype = 'user',
+                fid = mention['id'],
+                label = mention.get('name','__NA__'),
+                url = "https://facebook.com/%s" % mention['id'],
+                name = mention.get('name','__NA__'),
+                about = '__NA__',
+                age_range = "0 - 99",
+                birthyear = '__NA__',
+                birthday = '__NA__',
+                cover = '__NA__',
+                education = '__NA__',
+                email = '__NA__',
+                gender = '__NA__',
+                hometown = '__NA__',
+                is_verified = '__NA__',
+                work = '__NA__'
+            ) # user
+
+        G.add_edge(post['id'], mention['id'], mtype = 'mentions') # post -|mentions|> user
+
+        logging.info("++ MENTIONS: %s" % mention.get('name','__NA__'))
+
     shares = graph.get_all_connections(
         id = post['id'],
         connection_name = "sharedposts",
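With `to` added to the feed `fields`, each post dict may carry a `to` object whose `data` list the loop in this hunk walks; every entry except the group's own id is treated as a mentioned user. The extraction step on its own, as a sketch over plain dicts: `post_mentions` and the sample payload are hypothetical, shaped only after the keys the loop reads.

```python
def post_mentions(post, group_id):
    """Yield (post_id, user_id, name) for each entry in post['to']['data'].

    Hypothetical helper mirroring the loop in the diff: the group's own id
    is skipped and a missing name falls back to '__NA__'.
    """
    for mention in post.get('to', {}).get('data', []):
        if mention['id'] == group_id:
            continue  # the diff skips the group itself
        yield post['id'], mention['id'], mention.get('name', '__NA__')


# Hypothetical post payload, shaped like the dicts the loop reads.
sample_post = {
    'id': 'p1',
    'to': {'data': [{'id': 'g1', 'name': 'Group'},
                    {'id': 'u1', 'name': 'Ann'},
                    {'id': 'u2'}]},
}
mentions = list(post_mentions(sample_post, 'g1'))
```

Posts without a `to` key simply yield nothing, which is what the chained `.get(..., {})` / `.get(..., [])` defaults in the diff achieve.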
@@ -286,7 +321,8 @@
 
     comments = graph.get_all_connections(
         id = post['id'],
-        connection_name = "comments"
+        connection_name = "comments",
+        fields = "id,message,from,created_time,message_tags"
     )
 
     for comment in comments:
@@ -344,6 +380,40 @@
         logging.info("++ DATETIME: %s" % comment['created_time'])
         logging.info("++ AUTHOR: %s" % comment['from']['name'])
 
+        for mention in comment.get('message_tags',[]):
+
+            logging.debug(mention)
+
+            if mention['id'] == group['id']:
+                continue
+
+            num_mentions += 1
+
+            if not nx.get_node_attributes(G, mention['id']):
+                G.add_node(
+                    mention['id'],
+                    mtype = 'user',
+                    fid = mention['id'],
+                    label = mention.get('name','__NA__'),
+                    url = "https://facebook.com/%s" % mention['id'],
+                    name = mention.get('name','__NA__'),
+                    about = '__NA__',
+                    age_range = "0 - 99",
+                    birthyear = '__NA__',
+                    birthday = '__NA__',
+                    cover = '__NA__',
+                    education = '__NA__',
+                    email = '__NA__',
+                    gender = '__NA__',
+                    hometown = '__NA__',
+                    is_verified = '__NA__',
+                    work = '__NA__'
+                ) # user
+
+            G.add_edge(comment['id'], mention['id'], mtype = 'mentions') # comment -|mentions|> user
+
+            logging.info("++ MENTIONS: %s" % mention.get('name','__NA__'))
+
         comment_likes = graph.get_all_connections(
             id = comment['id'],
             connection_name = "likes"
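Comments and replies are read from `message_tags` instead, a list of tag dicts. The hunk uses only each tag's `id` and `name`; the Graph API comment reference also documents `offset` and `length` fields that locate the tag inside `message` (worth confirming against the API version in use), and the sketch below assumes them to recover the tagged text. Treating `offset` as a Python string index is a further assumption worth checking on non-ASCII messages. `comment_mentions` and the sample comment are hypothetical.

```python
def comment_mentions(comment, group_id):
    """Return (comment_id, user_id, name, tagged_text) per message tag.

    Hypothetical helper. Like the diff, it skips the group's own id. The
    tagged_text slice ASSUMES each tag has 'offset' and 'length' fields
    indexing into comment['message']; the diff itself reads only id and name.
    """
    message = comment.get('message', '')
    result = []
    for tag in comment.get('message_tags', []):
        if tag['id'] == group_id:
            continue
        start = tag.get('offset', 0)
        tagged_text = message[start:start + tag.get('length', 0)]
        result.append((comment['id'], tag['id'],
                       tag.get('name', '__NA__'), tagged_text))
    return result


# Hypothetical comment payload with one user tag and one group tag.
sample_comment = {
    'id': 'c1',
    'message': 'Thanks Ann Lee and Group X',
    'message_tags': [
        {'id': 'u1', 'name': 'Ann Lee', 'offset': 7, 'length': 7},
        {'id': 'g1', 'name': 'Group X', 'offset': 19, 'length': 7},
    ],
}
```

Here `comment_mentions(sample_comment, 'g1')` keeps only the user tag, and its `tagged_text` slice equals the tag's `name`.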
@@ -381,7 +451,8 @@
 
         replies = graph.get_all_connections(
             id = comment['id'],
-            connection_name = "comments"
+            connection_name = "comments",
+            fields = "id,message,from,created_time,message_tags"
         )
 
         for reply in replies:
@@ -427,6 +498,40 @@
             logging.info("+++ DATETIME: %s" % reply['created_time'])
             logging.info("+++ AUTHOR: %s" % reply['from']['name'])
 
+            for mention in reply.get('message_tags',[]):
+
+                logging.debug(mention)
+
+                if mention['id'] == group['id']:
+                    continue
+
+                num_mentions += 1
+
+                if not nx.get_node_attributes(G, mention['id']):
+                    G.add_node(
+                        mention['id'],
+                        mtype = 'user',
+                        fid = mention['id'],
+                        label = mention.get('name','__NA__'),
+                        url = "https://facebook.com/%s" % mention['id'],
+                        name = mention.get('name','__NA__'),
+                        about = '__NA__',
+                        age_range = "0 - 99",
+                        birthyear = '__NA__',
+                        birthday = '__NA__',
+                        cover = '__NA__',
+                        education = '__NA__',
+                        email = '__NA__',
+                        gender = '__NA__',
+                        hometown = '__NA__',
+                        is_verified = '__NA__',
+                        work = '__NA__'
+                    ) # user
+
+                G.add_edge(reply['id'], mention['id'], mtype = 'mentions') # comment -|mentions|> user
+
+                logging.info("++ MENTIONS: %s" % mention.get('name','__NA__'))
+
             reply_likes = graph.get_all_connections(
                 id = reply['id'],
                 connection_name = "likes"
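The three mention loops in this commit are identical apart from the source id (`post['id']`, `comment['id']`, `reply['id']`), so they could share one helper. One detail matters when doing so: in networkx, `nx.get_node_attributes(G, name)` returns a dict mapping nodes to their value for the attribute called `name`. Passed a user id as `name`, it returns an empty dict, so the `if not ...` guard is effectively always true, and `G.add_node` on a user already in the graph then updates that node's attributes with the `'__NA__'` placeholders, replacing any richer values stored earlier. A membership test (`mention['id'] not in G`) checks for the node itself. The sketch below uses that test; `add_mention`, `USER_PLACEHOLDERS` and the `DictGraph` stand-in (which only lets the block run without networkx) are all hypothetical.

```python
# Placeholder profile fields the diff assigns to users seen only via a mention.
USER_PLACEHOLDERS = {
    'about': '__NA__', 'age_range': "0 - 99", 'birthyear': '__NA__',
    'birthday': '__NA__', 'cover': '__NA__', 'education': '__NA__',
    'email': '__NA__', 'gender': '__NA__', 'hometown': '__NA__',
    'is_verified': '__NA__', 'work': '__NA__',
}


def add_mention(G, source_id, mention, group_id):
    """Add source -|mentions|-> user to G; return True if an edge was added.

    Hypothetical helper. G needs networkx-style `in`, add_node and add_edge.
    """
    if mention['id'] == group_id:
        return False  # the diff never links the group itself
    if mention['id'] not in G:  # test for the node, not for an attribute
        name = mention.get('name', '__NA__')
        attrs = dict(USER_PLACEHOLDERS)
        attrs.update(mtype='user', fid=mention['id'], label=name, name=name,
                     url="https://facebook.com/%s" % mention['id'])
        G.add_node(mention['id'], **attrs)
    G.add_edge(source_id, mention['id'], mtype='mentions')
    return True


class DictGraph(object):
    """Minimal stand-in for a networkx graph so this sketch runs alone."""

    def __init__(self):
        self.nodes = {}
        self.edges = {}

    def __contains__(self, n):
        return n in self.nodes

    def add_node(self, n, **attrs):
        self.nodes.setdefault(n, {}).update(attrs)

    def add_edge(self, u, v, **attrs):
        # Like networkx, create missing endpoints before storing the edge.
        self.nodes.setdefault(u, {})
        self.nodes.setdefault(v, {})
        self.edges.setdefault((u, v), {}).update(attrs)
```

With a real networkx graph, each loop body could then shrink to `if add_mention(G, post['id'], mention, group['id']): num_mentions += 1` (with `comment['id']` or `reply['id']` in the other two loops), keeping the existing logging lines as needed.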
@@ -477,11 +582,12 @@
 
 logging.info("Statistics from %s to %s" % (G.graph['since'], G.graph['until']))
 logging.info(
-    "Members: %d | Posts: %d | Reactions: %d | Shares: %d | Comments: %d | Nodes: %d | Edges: %d" % (
+    "Members: %d | Posts: %d | Reactions: %d | Shares: %d | Mentions: %d | Comments: %d | Nodes: %d | Edges: %d" % (
         len(filter(lambda (n, d): d['mtype'] == 'user', G.nodes(data=True))),
         len(filter(lambda (n, d): d['mtype'] == 'post', G.nodes(data=True))),
         len(filter(lambda (n1, n2, d): d['mtype'] == 'reacts to', G.edges(data=True))),
         num_shares,
+        len(filter(lambda (n1, n2, d): d['mtype'] == 'mentions', G.edges(data=True))),
         len(filter(lambda (n, d): d['mtype'] == 'comment', G.nodes(data=True))),
         G.number_of_nodes(),
         G.number_of_edges()
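The statistics block counts node and edge types with `filter` and tuple-parameter lambdas such as `lambda (n, d): ...`. That lambda syntax is Python 2 only (tuple parameters were removed in Python 3 by PEP 3113), and in Python 3 `filter` returns an iterator that `len()` cannot measure. A sketch of a version-neutral alternative that counts every `mtype` in one pass with `collections.Counter`; `count_by_mtype` and the sample data are hypothetical. It relies on the attribute dict being the last element of each item, which holds for both `G.nodes(data=True)` pairs and `G.edges(data=True)` triples.

```python
from collections import Counter


def count_by_mtype(items):
    """Count graph items by their 'mtype' attribute in a single pass.

    Works for nodes(data=True) pairs (n, d) and edges(data=True) triples
    (u, v, d), because the attribute dict is the last element of each.
    Hypothetical helper; avoids Python 2-only tuple-parameter lambdas.
    """
    return Counter(item[-1]['mtype'] for item in items)


# Hypothetical data shaped like networkx's nodes(data=True) / edges(data=True).
sample_nodes = [('u1', {'mtype': 'user'}), ('u2', {'mtype': 'user'}),
                ('p1', {'mtype': 'post'}), ('c1', {'mtype': 'comment'})]
sample_edges = [('p1', 'u1', {'mtype': 'mentions'}),
                ('u2', 'p1', {'mtype': 'reacts to'})]

node_counts = count_by_mtype(sample_nodes)
edge_counts = count_by_mtype(sample_edges)
```

A missing type simply counts as 0. Note that counting `mentions` edges gives distinct edges: if `G` is a simple (non-multi) graph, repeated mentions of the same user by the same post or comment collapse into one edge, whereas the `num_mentions` counter incremented in the loops counts every occurrence and is not used in the log line of this hunk.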