To predict the Online News Popularity of mashable.com articles by no. of shares on the social media channels.

anantkh/Predicting-Online-News-Popularity

Online News Popularity Project: Predicting the popularity of mashable.com articles by number of shares on social media channels

Executive Summary

  • For online news companies like Mashable to succeed, they need to identify the patterns and trends that contribute to the popularity of their articles.
  • Our goal was to build a model that predicts which Mashable articles are widely shared on social networks, based on several features of each article.
  • Based on the results, we determined which variables help future content reach a wider audience on social media through organic sharing.
  • Our business insights and recommendations for Mashable are based on our logistic regression model, which was implemented on a dataset built using stratified undersampling.
  • This model yields several insights and recommendations that can help Mashable improve its business.
  • These insights include the impact of image insertions, article categorization, keyword strength, and article release day.
  • We then conclude with direction and specific actions Mashable could take to improve their articles’ virality.
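
The summary above mentions a dataset built with stratified undersampling. The README does not describe the exact procedure (the project's models were built in JMP), so the following is only a minimal Python sketch under one common interpretation: treat each class of the target as a stratum and randomly keep the same number of rows from each, equal to the size of the smallest class. The function name `undersample_by_class`, the `popular` field, and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def undersample_by_class(rows, label_key, seed=0):
    """Return a class-balanced sample by randomly dropping rows from larger classes.

    Assumption: each class label is treated as one stratum.
    """
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    # The smallest class sets how many rows are kept from every class.
    n = min(len(members) for members in by_label.values())

    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], n))

    # Shuffle so the classes are not grouped together in the output.
    rng.shuffle(balanced)
    return balanced


# Toy data (made up): 8 "not popular" articles and 2 "popular" ones.
articles = [{"id": i, "popular": int(i >= 8)} for i in range(10)]
balanced = undersample_by_class(articles, "popular")
counts = {label: sum(r["popular"] == label for r in balanced) for label in (0, 1)}
print(counts)  # {0: 2, 1: 2}
```

Balancing the classes this way discards majority-class rows, so it trades some data for a training set in which neither class dominates the fitted logistic regression.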

Problem Statement

  • Mashable is a global, multi-platform media and entertainment company that publishes articles across multiple genres online and earns revenue from advertisers.
  • In order for companies like Mashable to succeed, they must be aware of the trends within their successful articles.
  • Without learning these trends, a company like Mashable may lose ground to competitors that use data-driven strategies.
  • Therefore, it is imperative to understand and predict what article characteristics are most appealing to readers.
  • Our solution to this problem involves predicting which articles are widely shared on social media.
  • Shares on social media are a key metric for article virality, and the specific business insight we are looking for is which factors lead to higher article virality.
  • This matters to the business because organic sharing brings more article views without requiring paid marketing.
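
Predicting which articles are "widely shared" implies turning the raw share count into a binary target. The README does not state which cutoff was used, so the sketch below simply assumes the median share count as the threshold; the function `label_popular` and the sample share counts are hypothetical.

```python
from statistics import median


def label_popular(share_counts, threshold=None):
    """Label each article 1 (widely shared) or 0, using a share-count cutoff.

    Assumption: if no threshold is given, the median share count is used.
    """
    if threshold is None:
        threshold = median(share_counts)
    return [int(shares > threshold) for shares in share_counts]


# Made-up share counts for five articles.
shares = [120, 800, 1400, 2500, 9000]
labels = label_popular(shares)
print(labels)  # [0, 0, 0, 1, 1]
```

A different cutoff can be passed explicitly, for example `label_popular(shares, threshold=1000)`, if the business defines virality differently.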

Dataset Introduction

  • The dataset is called the “Online News Popularity Data Set” and is available from the UCI Machine Learning Repository (https://archive.ics.uci.edu).
  • Each row represents one news article; the data were collected from January 7, 2013 to January 7, 2015.
  • There are 39,797 rows and 61 columns, as shown in Appendix Table 1.
  • The original dataset contained 37 attributes of articles, and several natural language processing features were extracted by previous researchers (Fernandes, Vinagre & Cortez, 2015). In this study, 17 selected predictors are used to build models.

Methodology

  • This study follows the SEMMA methodology. SEMMA stands for Sample, Explore, Modify, Model, and Assess (Shmueli, Bruce, Stephens & Patel, 2017, p. 18).
  • During this process, the data visualizations were created with Tableau and JMP, and the models were built in JMP.
