Google Natural Language API - Analyzing Live News Sentiment in Python


Part Three: Using the Google Natural Language API to Analyze News Sentiment

This is a continuation of a three-part series. Go to: Part One / Part Two.

We set out to create a program that lets a user choose to receive positive or negative news from news sources around the world. This shows us not only how some amazing, publicly available machine learning technology works, but also how our news can be manipulated using it.

So far, we have gathered the data using the News API, and translated article metadata from German to English using the Google Translation API.

We are now on to our experiment with a third API, the Google Natural Language API (part of the Google Cloud Machine Learning library). We will use the API to score news article descriptions on a scale from negative to positive sentiment.

The Google Natural Language API on the Google Cloud Platform (GCP) is a useful tool for analyzing text. Building an algorithm from scratch that understands the sentiment behind a block of words would be an extremely time-consuming task. Instead, one can call this API to take advantage of Google's pre-trained models, hosted in the cloud.

API Set Up

Because we already authenticated our Google Cloud account with our workstation in the previous post, we do not need to do so again. However, there are a few steps to follow to install the Natural Language API.

From your Google Cloud console, open the left side menu and select APIs & services, and then select Library.
Click the Natural Language API under Google Cloud Machine Learning.
Click ENABLE.
Open your terminal, and install the API library.

$ pip install --upgrade google-cloud-natural-language

That's it; we are ready to use the API in Python!

Sentiment Analysis

We are going to create a function that assigns a sentiment score to the description field associated with each news article we gathered.

First, import the necessary libraries and define the client variable:

# Import libraries after installing the Google Language API:
# https://cloud.google.com/natural-language/docs/reference/libraries
# Note: the enums and types modules are only available in
# google-cloud-language releases before 2.0.
from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types

# Define the client variable
client = language.LanguageServiceClient()
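
The enums and types modules imported above were removed in version 2.0 of the google-cloud-language package. As a rough sketch, assuming a 2.x release is installed and credentials are already configured, an equivalent call might look like the following. The helper name analyze_description is hypothetical and is not part of the original series.

```python
def analyze_description(text):
    """Return (score, magnitude) for a block of plain text.

    Hypothetical helper. Assumes google-cloud-language >= 2.0 is installed
    and Google Cloud credentials are configured on this workstation.
    """
    # Imported inside the function so this file loads without the library
    from google.cloud import language_v1

    client = language_v1.LanguageServiceClient()

    # In 2.x the field is named type_ (with a trailing underscore)
    document = language_v1.Document(
        content=text,
        type_=language_v1.Document.Type.PLAIN_TEXT,
    )

    # Detect the sentiment of the whole document
    response = client.analyze_sentiment(request={"document": document})
    sentiment = response.document_sentiment
    return sentiment.score, sentiment.magnitude
```

The rest of this post keeps the older enums and types style, so use one style or the other consistently.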

We will use the DataFrame that contains all description fields in English, gathered in the last part.

Our first function, sentiment_list(), gives us a list of dictionaries holding the sentiment score and magnitude score of each article, based on its description field. We also include some other metadata from the articles.

def sentiment_list():
    # Local list of results; returned at the end and saved for later
    sent_list = []

    # Iterate through all articles captured by the News API.
    # The DataFrame is called "translated".
    for _, row in translated.iterrows():

        # Create a dictionary to store the data captured in this iteration
        sent_dict = {}

        # Wrap the description text in a types.Document() for analysis
        document = types.Document(
            content=row['description_x'],
            type=enums.Document.Type.PLAIN_TEXT)

        # Detect the sentiment of the document built above
        sentiment = client.analyze_sentiment(document=document).document_sentiment

        # Assign dictionary values for the URL, Title, Sentiment, Magnitude, Description, and Category
        sent_dict['URL'] = row['url_x']
        sent_dict['Title'] = row['title']
        sent_dict['Sentiment'] = sentiment.score
        sent_dict['Magnitude'] = sentiment.magnitude
        sent_dict['Description'] = row['description_x']
        sent_dict['Category'] = row['category']

        # Append the dictionary to our list
        sent_list.append(sent_dict)

    return sent_list

Now, let's store the sent_list that we created (we'll use it in the next function), and preview the first entry.

sent_list = sentiment_list()

sent_list[0]
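
Because sentiment_list() makes one network call per article, it can help to check the looping logic offline first. The sketch below is one way to do that, assuming the same column names as above. The names score_rows, fake_analyze, and sample_rows are hypothetical, and the sample article and its scores are made up for illustration.

```python
from types import SimpleNamespace


def score_rows(rows, analyze):
    """Build the same list of dictionaries as sentiment_list().

    rows    -- an iterable of dicts holding the article fields
    analyze -- any callable that takes text and returns an object
               with .score and .magnitude attributes
    """
    scored = []
    for row in rows:
        sentiment = analyze(row['description_x'])
        scored.append({
            'URL': row['url_x'],
            'Title': row['title'],
            'Sentiment': sentiment.score,
            'Magnitude': sentiment.magnitude,
            'Description': row['description_x'],
            'Category': row['category'],
        })
    return scored


def fake_analyze(text):
    # Stand-in for the real API call: always returns a fixed, made-up score
    return SimpleNamespace(score=-0.6, magnitude=0.6)


# One made-up article, for illustration only
sample_rows = [{
    'url_x': 'https://example.com/article',
    'title': 'Example title',
    'description_x': 'Example description',
    'category': 'general',
}]

offline_list = score_rows(sample_rows, fake_analyze)
print(offline_list[0]['Sentiment'], offline_list[0]['Magnitude'])
```

With real data, rows could come from translated.to_dict('records'), and analyze could be a small function that wraps the client call and returns document_sentiment.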

Sentiment and Magnitude Overview

Before we build our "Choose Your News" function with our data, I want to first review sentiment and magnitude, and the parameters we will choose for this program.

Sentiment Analysis inspects the given text and identifies the prevailing emotional opinion within the text, especially to determine a writer's attitude as positive, negative, or neutral.

Our model uses the following thresholds, but you can choose any:
Sentiment > .25 → Positive Article
Sentiment <= -.25 → Negative Article

Magnitude indicates the overall strength of emotion (both positive and negative) within the given text. It ranges from 0.0 to +inf and is never negative.

Our model uses the following threshold for both groups: Magnitude > .5 → the article is emotional enough to count as a Positive or Negative Article.

The first entry previewed above has a sentiment score of -.6 and a magnitude score of .6. This article therefore fits our negative group: it is negative enough (-.6 <= -.25) and strong enough (.6 > .5).
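
To make the two thresholds concrete, here is a minimal sketch of the labeling rule. It mirrors the comparisons used in the choose_news() function later in this post. The function name label_article and the 'neutral' label for everything else are assumptions added for illustration.

```python
def label_article(sentiment, magnitude,
                  sentiment_cutoff=0.25, magnitude_cutoff=0.5):
    """Label an article as 'positive', 'negative', or 'neutral'."""
    # Weak overall emotion: treat as neutral regardless of the score
    if magnitude <= magnitude_cutoff:
        return 'neutral'
    if sentiment > sentiment_cutoff:
        return 'positive'
    if sentiment <= -sentiment_cutoff:
        return 'negative'
    # Strong emotion but a score near zero
    return 'neutral'


print(label_article(-0.6, 0.6))  # negative
print(label_article(0.8, 0.9))   # positive
print(label_article(0.1, 2.0))   # neutral
```

Passing different cutoff values is an easy way to experiment with stricter or looser definitions of positive and negative news.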

Choose Your News function

We will now build our function to allow an end user to choose whether they would like to receive positive or negative news, based on the sentiment and magnitude parameters we've chosen.

For positive articles, we sort the list from highest to lowest so that we can capture the top ten. For negative articles, we sort the list from lowest to highest so that we can capture the most negative articles. Sentiment scores range from -1 (very negative) to 1 (very positive), so this method lets us filter the articles along that scale.

def choose_news(sent_list):
    # The user selects the type of articles
    choice = input("Are you looking for positive or negative articles?  \n")

    # Sort the list from highest to lowest (reverse=True) for the positive articles
    if choice == 'positive':
        sent_list = sorted(sent_list, key=lambda k: k['Sentiment'], reverse=True)

    # Sort the list from lowest to highest for the negative articles
    if choice == 'negative':
        sent_list = sorted(sent_list, key=lambda k: k['Sentiment'])

    # Check the first ten articles of the sorted list against the thresholds
    for score in sent_list[:10]:
        if choice == 'positive':
            # Print the header only for articles that pass both thresholds
            if score['Sentiment'] > .25 and score['Magnitude'] > .5:
                print('\nHere is a generally POSITIVE sentiment article:\n')
                print(u"Title: {}\nURL: {}\nDescription: {}\nSentiment: {}".format(score['Title'], score['URL'], score['Description'], score['Sentiment']))

        if choice == 'negative':
            # Print the header only for articles that pass both thresholds
            if score['Sentiment'] <= -.25 and score['Magnitude'] > .5:
                print('\nHere is a generally NEGATIVE sentiment article:\n')
                print(u"Title: {}\nURL: {}\nDescription: {}\nSentiment: {}".format(score['Title'], score['URL'], score['Description'], score['Sentiment']))

choose_news(sent_list)

Here's some of the output of positive news:

Here's some of the output of negative news:

As you can see, the positive article descriptions are generally happy, and the negative ones are generally not. It's pretty neat what's going on here: the Language API has learned what is "positive" and what is "negative", and can score a block of text using this training. You can also see one of our German articles appears in the negative news output.
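
For reuse outside an interactive session, the same selection logic can be sketched as a function that returns the articles instead of printing them. This is an illustrative variant, not the function used above: the names select_news and passes are hypothetical, the sample scores are made up, and it filters before capping at ten so that weak articles do not use up the limit.

```python
def passes(article, choice):
    """Check one article against the thresholds used in this post."""
    strong = article['Magnitude'] > .5
    if choice == 'positive':
        return article['Sentiment'] > .25 and strong
    return article['Sentiment'] <= -.25 and strong


def select_news(sent_list, choice, limit=10):
    """Return up to `limit` articles matching the chosen sentiment."""
    if choice not in ('positive', 'negative'):
        raise ValueError("choice must be 'positive' or 'negative'")
    # Most positive first for 'positive', most negative first otherwise
    ranked = sorted(sent_list, key=lambda k: k['Sentiment'],
                    reverse=(choice == 'positive'))
    return [a for a in ranked if passes(a, choice)][:limit]


# Made-up scores, for illustration only
sample = [
    {'Title': 'A', 'Sentiment': 0.8, 'Magnitude': 0.9},
    {'Title': 'B', 'Sentiment': -0.6, 'Magnitude': 0.6},
    {'Title': 'C', 'Sentiment': 0.3, 'Magnitude': 0.2},
    {'Title': 'D', 'Sentiment': -0.9, 'Magnitude': 1.4},
]

print([a['Title'] for a in select_news(sample, 'positive')])  # ['A']
print([a['Title'] for a in select_news(sample, 'negative')])  # ['D', 'B']
```

Returning a list keeps the printing separate from the filtering, which makes the thresholds easier to test with small hand-written inputs like the ones here.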

Conclusion

We've reached the end of our three-part series on APIs, machine learning, and natural language processing.

After our News API data was preprocessed, we created an algorithm using Python. We passed each article through a function that calls the Google Natural Language API, and captured sentiment and magnitude from the “description” field in our dataset. We stored the sentiment and magnitude values, as well as other metadata about the news, in a list, and printed the articles matching the user's choice from that list.

I hope you've learned something about the emerging machine learning technology that we all can harness.
