The News API - Requesting Live Headlines with Python

Part One: Collecting Live News Data Using the News API

This project was completed for Cloud Computing @ George Washington University, with Daniel Anderson, Michael Arango, and Megan Foler. The full Python code can be found here.

Introduction

Many of us choose the news we read, whether we realize it or not: through the news sources we visit, the articles we click, or the personalities we follow. But what about the mood of the articles we read, from any source? This project builds a program that lets a user choose to receive positive or negative news from sources around the world.

There are three parts to designing this program:

1) Collecting Live News Data Using the News API

First, we will receive news article metadata and perform data preprocessing.

2) Using the Google Translate API to Translate International News

Next, we will translate any articles not in English.

3) Using the Google Natural Language API to Analyze News Sentiment

Finally, we will select positive or negative news based on the Natural Language analysis and visualize the results.

The News API

The News API is a free-to-use API for pulling live news articles from 70 international sources.

Set Up

Using this API takes a few steps to set up. First, request an API key here.

Then, install the newsapi Python library with pip install newsapi.

Once those steps are complete, we can write the following to connect our new API key:

import newsapi
apikey = 'insertkeynumber'

Articles

The News API offers two classes, Articles and Sources, that redirect to two endpoints, https://newsapi.org/v1/articles and https://newsapi.org/v1/sources, respectively.

Starting with Articles, we make our first request using .get():

# instantiate an Articles object
from newsapi.articles import Articles
a = Articles(API_KEY=apikey)
data = a.get(source="bbc-news", sort_by='top')

The .get() method takes three arguments: source (required), sort_by (optional, default: top), and attributes_format (optional, default: True). The source is the identifier for the news source or blog you want headlines from. The sort_by argument lets you specify which type of list you want: top, latest, or popular. Note that not all options are available for all sources.

For each article request, the API returns:

author - The author of the article.
description - A description or preface for the article.
title - The headline or title of the article.
url - The direct URL to the content page of the article.
urlToImage - The URL to a relevant image for the article.
publishedAt - The best attempt at finding a date for the article, in UTC (+0).

A snippet of the output for BBC News articles is shown below. You can choose any source you prefer:

{'articles': [{'author': 'BBC News',
   'description': 'Five people are reported dead and 1,000 have been rescued, the US National Weather Service says.',
   'publishedAt': '2017-08-27T13:54:29Z',
   'title': "Storm Harvey: 1,000 rescued as Houston hit by 'catastrophic floods'",
   'url': 'http://www.bbc.co.uk/news/world-us-canada-41067315',
   'urlToImage': 'https://ichef.bbci.co.uk/images/ic/1024x576/p05dg1pb.jpg'},
  {'author': 'BBC News',
   'description': 'Rockport residents describe the moment the great storm hit their homes in Texas.',
   'publishedAt': '2017-08-27T07:08:39Z',
   'title': 'Harvey: Too poor to flee the hurricane',
   'url': 'http://www.bbc.co.uk/news/world-us-canada-41065335',
   'urlToImage': 'https://ichef.bbci.co.uk/news/1024/cpsprodpb/B42E/production/_97562164_judy.jpg'},
  {'author': 'BBC News',
   'description': 'The sister of two terrorist suspects has condemned the attacks in Spain which left 15 people dead.',
   'publishedAt': '2017-08-26T23:48:54Z',
   'title': "Spain attacks: Suspects' sister condemns violence",
   'url': 'http://www.bbc.co.uk/news/av/world-europe-41064715/spain-attacks-suspects-sister-condemns-violence',
   'urlToImage': 'https://ichef-1.bbci.co.uk/news/1024/cpsprodpb/4728/production/_97561281_p05df89t.jpg'},
  ...
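
Given a response shaped like the one above, pulling out individual fields is just dictionary access. Here is a minimal sketch using a hand-built sample dictionary (illustrative data, not a real API response):

```python
# Hand-built dictionary mimicking the shape of a News API response.
sample = {
    'status': 'ok',
    'articles': [
        {'author': 'BBC News',
         'title': 'Headline one',
         'description': 'First article.',
         'url': 'http://example.com/1',
         'urlToImage': 'http://example.com/1.jpg',
         'publishedAt': '2017-08-27T13:54:29Z'},
        {'author': 'BBC News',
         'title': 'Headline two',
         'description': 'Second article.',
         'url': 'http://example.com/2',
         'urlToImage': 'http://example.com/2.jpg',
         'publishedAt': '2017-08-27T07:08:39Z'},
    ],
}

# Pull one field out of every article in the response.
titles = [article['title'] for article in sample['articles']]
print(titles)
```

Working with nested lists of dictionaries like this quickly gets tedious, which is why we flatten everything into a DataFrame next.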

Data Processing

A pandas DataFrame is much easier to work with down the road, so we will convert the JSON response to a DataFrame using pd.DataFrame.from_dict and .apply(pd.Series):

import pandas as pd
data = pd.DataFrame.from_dict(data)
data = pd.concat([data.drop(['articles'], axis=1), data['articles'].apply(pd.Series)], axis=1)
data.head()
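
To see what these two lines do without hitting the API, here is the same transformation applied to a small hand-made response (illustrative data): apply(pd.Series) spreads each article dictionary into its own columns.

```python
import pandas as pd

# Illustrative response in the same shape as the News API output.
raw = {
    'status': ['ok', 'ok'],
    'articles': [
        {'author': 'A', 'title': 'First headline'},
        {'author': 'B', 'title': 'Second headline'},
    ],
}

df = pd.DataFrame.from_dict(raw)
# Drop the nested 'articles' column and spread each dict into its own columns.
df = pd.concat([df.drop(['articles'], axis=1),
                df['articles'].apply(pd.Series)], axis=1)
print(df.columns.tolist())
```

The result is one row per article, with each piece of metadata in its own column.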

Sources

Next, let's work with Sources.

The .get() method for Sources takes three arguments: category (optional), language (optional), and country (optional).

First, set up the API connection:

from newsapi.sources import Sources
s = Sources(API_KEY=apikey)

Next, we make a request for all available sources and convert the result to a pandas DataFrame:

# return all available sources
sources = s.get()
# convert to dataframe and drop status column
sources = pd.DataFrame.from_dict(sources).drop('status', axis=1)
# take the array column 'sources' and spread it across multiple columns
sources = pd.concat([sources.drop(['sources'], axis=1),
                     sources['sources'].apply(pd.Series)], axis=1).drop('urlsToLogos', axis=1)
sources.tail()
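
The category, language, and country arguments to Sources.get() filter the list server-side; the effect is similar to subsetting the sources frame locally. A sketch with toy data standing in for the real sources (the real responses include category, language, and country fields per source):

```python
import pandas as pd

# Toy stand-in for the sources DataFrame built above.
sources = pd.DataFrame({
    'id': ['bbc-news', 'techcrunch', 'le-monde'],
    'category': ['general', 'technology', 'general'],
    'language': ['en', 'en', 'fr'],
    'country': ['gb', 'us', 'fr'],
})

# Roughly what s.get(category='general', language='en') would return.
subset = sources[(sources['category'] == 'general') &
                 (sources['language'] == 'en')]
print(subset['id'].tolist())
```

Filtering server-side saves bandwidth, but since we want every source for this project, we request them all.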

Combining the News API Data

As you will see in the next post, we want both article and source information when analyzing the content of these articles.

We now call all articles and transform the dataset into a DataFrame:

results = []
for i in range(len(sources)):
    results.append(a.get(sources['id'][i]))

results = pd.DataFrame.from_dict(results)
results = pd.concat([results.drop(['articles'], axis=1),
                  results['articles'].apply(pd.Series)], axis=1)

We use the pandas pd.melt() function so that our data is reorganized with one article per row.

results = pd.melt(frame = results, id_vars=['source','status','sortBy'],
        var_name='new').drop(['status','sortBy','new'],axis=1)
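
As a toy illustration of what pd.melt() does (hypothetical column names), unpivoting turns a wide frame with one column per article slot into a long frame with one row per (source, article) pair:

```python
import pandas as pd

# Wide frame: one row per source, one column per article slot.
wide = pd.DataFrame({
    'source': ['bbc-news', 'cnn'],
    'article_0': ['headline A', 'headline C'],
    'article_1': ['headline B', 'headline D'],
})

# Melt to long form: the id_vars repeat for each former column,
# and every cell value becomes its own row.
long = pd.melt(frame=wide, id_vars=['source'], var_name='slot')
print(long.shape)
```

Two sources with two article slots each melt into four rows, which is exactly the one-article-per-row layout we want.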

We want a column for each piece of metadata, so we use apply(pd.Series) again; then we drop the stray column produced by empty values and remove any incomplete rows:

results = pd.concat([results.drop(['value'], axis=1),
             results['value'].apply(pd.Series)], axis=1)

results = results.drop([0], axis=1)

results = results.dropna()  
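
These last steps can be seen on toy data (illustrative dictionaries). Sources that returned fewer articles than the widest source leave NaN in the melted value column; spreading with apply(pd.Series) then produces a stray numeric column 0 from those empty rows, which is why the code drops column 0 before calling dropna():

```python
import pandas as pd

# After melting, each row's 'value' holds one article dict, or NaN
# where a source had fewer articles than the widest source.
melted = pd.DataFrame({
    'source': ['bbc-news', 'cnn', 'cnn'],
    'value': [{'author': 'A', 'title': 'First'},
              {'author': 'B', 'title': 'Second'},
              float('nan')],
})

# Spread each dict into columns; NaN rows create a stray column 0.
spread = pd.concat([melted.drop(['value'], axis=1),
                    melted['value'].apply(pd.Series)], axis=1)
spread = spread.drop([0], axis=1)  # remove the stray column
spread = spread.dropna()           # remove rows with missing metadata
print(len(spread))
```

The cleaned frame keeps only fully populated article rows, ready for the translation and sentiment steps in the next posts.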

Conclusion

We have completed Part 1 of this project. Now, we will use our preprocessed data to perform translation and sentiment analysis.

Click here to continue to Part 2: Calling the Google Translate API to Translate International News
