As of 2017 I am no longer associated with Alaska Dispatch News or Anchorage Daily News.

Overview

Google AMP is quite restrictive: every page has to meet a fairly comprehensive set of validation criteria, and human error can easily make a document invalid. At Alaska Dispatch News, two of our biggest culprits are malformed URLs and content pasted from another source that carries extra attributes AMP doesn't allow. While we've put a number of restrictions in place on how content is filtered through to the AMP site, there's only so much we can do before human intervention is needed to solve the issue. But how do you know there's an issue?

Most of these issues come from improper content sanitization. In an ideal world, your content editors would never need to care about this stuff.

Google Webmaster Tools reports AMP errors whenever it crawls the site, but those reports aren't instant, not everyone has access to them, and by the time you're aware of a problem you might have already missed the traffic spike the article could have produced. To make sure all of our articles reach their full potential, we decided to create a Slack bot using Python.

Cloudflare & Chartbeat

Alaska Dispatch News uses a product called Chartbeat to monitor real-time site analytics on a day-to-day basis. It reports current visitors, their traffic sources, distribution, you name it. The nice thing about Chartbeat is that its API has a method to fetch your highest-performing content. We already use this method to power other features, such as our "Most-Read" widget, so it made sense to use it to power the bot as well.

import requests

# ADN_API_CHARTBEAT_PATH (the internal most-read endpoint) is defined elsewhere in the script


def getArticles():
    """Gets a list of articles from the most-read endpoint (Sourced from Chartbeat)"""

    # Make sure requests is able to get data; if not, report the error and stop
    try:
        request = requests.get(ADN_API_CHARTBEAT_PATH)
        request.raise_for_status()
    except requests.exceptions.RequestException as error:
        # Handle error: without a valid response there is nothing to validate
        print(error)
        return

    request_json = request.json()
    for article in request_json['content_elements']:
        # Format the URL so it points to the AMP article
        canonical_url = article['canonical_url']
        amp_url = 'adn.com%s?outputType=amp-type' % canonical_url

        # Send the article to the validator function
        validate(amp_url)
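The loop above can be exercised without touching the network by pulling the URL formatting into a pure function. This is a minimal Python 3 sketch, assuming the most-read endpoint returns JSON shaped the way the loop expects (a `content_elements` list whose items carry a `canonical_url` path); the helper name `amp_urls` and the sample paths are hypothetical.

```python
from typing import Dict, List


def amp_urls(payload: Dict) -> List[str]:
    """Turn a most-read payload into AMP document paths (hypothetical helper).

    Assumes each item in payload['content_elements'] carries a
    'canonical_url' path such as '/alaska-news/2017/03/10/example-story/'.
    """
    urls = []
    for article in payload.get('content_elements', []):
        canonical_url = article.get('canonical_url')
        if not canonical_url:
            # Skip malformed entries instead of building a broken URL
            continue
        urls.append('adn.com%s?outputType=amp-type' % canonical_url)
    return urls


# Hypothetical sample payload for illustration only
sample = {
    'content_elements': [
        {'canonical_url': '/alaska-news/2017/03/10/example-story/'},
        {'canonical_url': ''},
    ]
}
print(amp_urls(sample))
```

Skipping empty `canonical_url` values is a design choice aimed at the malformed-URL problem mentioned earlier: a missing path is dropped rather than sent to the validator as a bare domain.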

During Google's AMPConf, it was announced that Cloudflare would provide a free AMP validator API that you can cURL with the path of your AMP document to check whether it validates. If it doesn't, the API returns JSON explaining why. This is perfect for the bot, so once we have our URL, we send it to the validate function, which does exactly that.

def validate(article):
    """Sends each article to the Cloudflare AMP validator API"""
    path = 'https://amp.cloudflare.com/q/%s' % article
    request = requests.get(path)
    document = request.json()

    # Fires pass/fail depending on the response from the API;
    # a single if/else guarantees exactly one of them runs
    if document['valid']:
        passed()
    else:
        failed(document)
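Because the validator's verdict drives everything downstream, it can help to keep the HTTP call swappable. The sketch below is one way to do that in Python 3, assuming the Cloudflare response carries the `valid` key read above; `validate_document` and its `fetch`, `on_pass` and `on_fail` parameters are hypothetical names, and in the real bot `fetch` could be something like `lambda url: requests.get(url, timeout=30).json()`.

```python
from typing import Callable, Dict

# Path format used by the bot above for the Cloudflare AMP validator API
CLOUDFLARE_VALIDATOR = 'https://amp.cloudflare.com/q/%s'


def validate_document(article: str,
                      fetch: Callable[[str], Dict],
                      on_pass: Callable[[], None],
                      on_fail: Callable[[Dict], None]) -> bool:
    """Validate one AMP path and dispatch to a pass or fail handler.

    `fetch` receives the full validator URL and must return the decoded
    JSON response, so tests can substitute a stub for the network call.
    """
    document = fetch(CLOUDFLARE_VALIDATOR % article)
    # A response without a truthy 'valid' key is treated as a failure
    if document.get('valid', False):
        on_pass()
        return True
    on_fail(document)
    return False


# Offline usage with a stubbed response (hypothetical data)
seen = []
stub = lambda url: {'valid': False, 'source': url, 'errors': []}
ok = validate_document('adn.com/example/?outputType=amp-type',
                       fetch=stub,
                       on_pass=lambda: seen.append('pass'),
                       on_fail=lambda doc: seen.append(doc['source']))
print(ok, seen)
```

Treating a missing `valid` key as a failure means an unexpected response shape surfaces in Slack instead of silently counting as a pass.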

If the document fails validation, the bot sends a payload to Slack via a webhook containing the information returned by the Cloudflare API: the error description, code, line and source. We didn't want warnings showing up, so we filter them out with an if statement.

def failed(article):
    """If an article fails validation send a payload to Slack with the information"""
    global errors
    article_source = article['source']
    article_errors = article['errors']

    # Loops over all errors in the object
    for error in article_errors:
        reason = error['error']
        line = str(error['line'])
        code = error['code']

        # Filter out warnings so they neither reach Slack nor count as errors
        if code != 'WARNING_TAG_REQUIRED_BY_MISSING':
            errors = errors + 1
            attachment = {
                "fallback": "AMP Error found on line %s for article %s - %s - %s" % (line, article_source, reason, code),
                "title": "AMP Error Found: %s :x:" % code,
                "title_link": article_source,
                "color": "#ff0000",
                "pretext": "An AMP error has been located on line %s. This error is causing the AMP document to not validate. If you're unsure how to resolve this error please refer to an Alaska Dispatch News developer. The next test will run in one hour." % line,
                "text": reason,
                "footer": "Alaska Dispatch News"
            }
            # `slack` is a slackweb.Slack(url=...) webhook client created elsewhere in the script
            slack.notify(attachments=[attachment])

    print('The following article object failed validation --  %s' % article)
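The warning filter and the attachment layout are easier to test once they are separated from the webhook call. This Python 3 sketch assumes the response keys `source`, `errors`, `error`, `line` and `code` that failed() reads above; `build_attachments` and `IGNORED_CODES` are hypothetical names, and the attachment fields mirror the ones above minus `pretext` and `footer`.

```python
from typing import Dict, List

# Codes to leave out of Slack; the bot above filters only this warning
IGNORED_CODES = {'WARNING_TAG_REQUIRED_BY_MISSING'}


def build_attachments(document: Dict) -> List[Dict]:
    """Build one Slack attachment per reportable error in a validator response."""
    source = document.get('source', '')
    attachments = []
    for error in document.get('errors', []):
        if error['code'] in IGNORED_CODES:
            continue  # skip warnings entirely
        attachments.append({
            'fallback': 'AMP Error found on line %s for article %s - %s - %s'
                        % (error['line'], source, error['error'], error['code']),
            'title': 'AMP Error Found: %s :x:' % error['code'],
            'title_link': source,
            'color': '#ff0000',
            'text': error['error'],
        })
    return attachments


# Hypothetical validator response for illustration
sample = {
    'source': 'https://www.adn.com/example-story/?outputType=amp-type',
    'errors': [
        {'error': "The attribute 'style' may not appear in tag 'div'.",
         'line': 42, 'code': 'DISALLOWED_ATTR'},
        {'error': 'Tag required by another tag is missing.',
         'line': 1, 'code': 'WARNING_TAG_REQUIRED_BY_MISSING'},
    ],
}
for attachment in build_attachments(sample):
    print(attachment['title'])
```

Because the function returns plain dicts, a whole batch could go out in one webhook call, for example `slackweb.Slack(url=WEBHOOK_URL).notify(attachments=attachments)` with your own webhook URL in place of the hypothetical `WEBHOOK_URL`. `len(attachments)` then gives an error count that already excludes the filtered warnings.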

We wanted the bot to send a confirmation message if every article it tested passed validation. To do this, I set up `errors` and `passes` counters near the top of the script and had the passed() and failed() functions increment them. Once every article has been validated, I call a confirm() function that tallies up the results. If there are no errors, it sends a confirmation message to Slack; if there are a large number of errors (4 or more), it alerts our Developer user group on Slack, since that many invalid articles likely points to a more serious underlying problem.

def confirm():
    """If there are no AMP errors found alert the Slack channel and remind them that the next test will run in an hour (Based on schedule)"""
    global errors
    global passes
    # Number of articles that passed, for use in the Slack message
    articles = str(passes)

    if errors == 0:
        # Send a confirmation to Slack
        print('All articles validated successfully')

    # If a large amount of errors are found send an alert to the developer team
    if errors >= 4:
        # Send a message to Slack which highlights developer team
        print('Large amount of AMP errors detected')
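The passed()/failed() counters and the tally in confirm() could also live in one small object instead of module-level globals, which makes each run's state explicit. A minimal Python 3 sketch, assuming the same threshold of 4 errors: `Tally`, `ALERT_THRESHOLD` and `DEV_GROUP` are hypothetical names, and the ID inside `DEV_GROUP` is a placeholder (Slack writes user-group mentions as `<!subteam^ID>`).

```python
from dataclasses import dataclass
from typing import Optional

ALERT_THRESHOLD = 4  # 4 or more errors suggests a systemic problem
DEV_GROUP = '<!subteam^S0000000>'  # placeholder user-group ID


@dataclass
class Tally:
    """Per-run counters, standing in for the script's global passes/errors."""
    passes: int = 0
    errors: int = 0

    def passed(self) -> None:
        self.passes += 1

    def failed(self, error_count: int) -> None:
        self.errors += error_count

    def summary(self) -> Optional[str]:
        """Return the Slack message to send after the run, or None."""
        if self.errors == 0:
            return 'All %d articles validated successfully.' % self.passes
        if self.errors >= ALERT_THRESHOLD:
            return '%s Large amount of AMP errors detected (%d).' % (DEV_GROUP, self.errors)
        return None  # a few errors: the per-error messages are enough


run = Tally()
for _ in range(10):
    run.passed()
print(run.summary())  # All 10 articles validated successfully.
run.failed(5)
print(run.summary())
```

Creating a fresh `Tally` per run also avoids counters leaking between runs if the script is ever kept alive in a loop rather than restarted by a scheduler.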

And with that, the 'Google AMP Validator Cat' bot was born. It's set up to run hourly on a scheduler via Heroku, but it can also be run locally.
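Because a scheduler like Heroku's starts a fresh process for each run, the script only needs to make a single pass and exit, and the same command then works locally. A minimal Python 3 sketch of that entry point; `run_once`, the file name `bot.py` and the stand-in fetch/validate functions are hypothetical.

```python
from typing import Callable, Iterable, List


def run_once(fetch_articles: Callable[[], Iterable[str]],
             validate: Callable[[str], bool]) -> List[str]:
    """Run a single validation pass and return the articles that failed.

    Each scheduler tick starts a new process, so one call per process
    is enough and no sleep loop is needed.
    """
    failed = []
    for article in fetch_articles():
        if not validate(article):
            failed.append(article)
    return failed


if __name__ == '__main__':
    # Hypothetical stand-ins so the sketch runs offline; swap in the real
    # most-read fetch and Cloudflare validation functions.
    articles = lambda: ['adn.com/good/?outputType=amp-type',
                        'adn.com/bad/?outputType=amp-type']
    print(run_once(articles, lambda url: 'good' in url))
```

Registered as a Heroku Scheduler job with a command such as `python bot.py` and an hourly frequency, each tick performs one pass; running `python bot.py` on your own machine does the same thing once.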

[Image: AMP Validator Slack Bot]

Cheers

I created this service with inspiration from the talk Natalia Baltazar from The Guardian gave at AMPConf; it's certainly worth checking out. I also used the following Python packages: requests and slackweb. It's been a long time since I've written anything in Python, so I'm quite pleased with how useful this bot has already become, considering how simple it was to make.

If you have any questions or comments please let me know, I'd love to hear them. If you'd like to check out the Slack bot code for yourself, click here.

Edit: After posting this article I received a lovely package from Cloudflare with some swag. Thanks! ❤️

[Image: Cloudflare swag]