Building a Twitter-bot RSS Aggregator

I hate RSS readers

I love my new bot

For some reason, about once a year, I go hunting for a new RSS reader so I can keep track of blogs that I like to read.  It’s always the same process:

  1. Dig through reviews to find an app that I think I will love.
  2. Come up with a list of blogs that I like to read.
  3. Install and configure RSS reader app.
  4. Spend the night reading blog posts.
  5. Use app for a week.
  6. Never open RSS reader app again.

Well, it’s time to stop the insanity. Its time to stop doing the same thing over and over again, hoping it will work this time. Its time to find a solution that really works.

Before I can find something that works, I need to think about

Why RSS readers don’t work for me

Its actually quite simple: I can’t be bothered to deal with yet another single-purpose app. I need blogs to be organized in a way that they are right in my face. When I have to go hunt for blogs to read, I tend to not do it.

I’m lazy.

How I find read blogs today, and why that’s not good enough

If you know me, this won’t be a surprise. Twitter. I follow people on twitter that I want to hear from. They tweet their blog posts. They tweet blog posts from others that are interesting. I click. I read. Done.

One problem with twitter is that my feed moves fast. I only read a fraction of what gets posted. My feed is filled with dog pictures, politics, dumb jokes, random conversation….oh, and SQL blogs. I miss stuff that I want to see.

Another problem is that not everyone is on twitter. (I know, I was pretty shocked to find out about this, too.) Or sometimes, I want to read someone’s blog without having to actually follow them.

Twitter to the rescue

Twitter isn’t good enough, so I decided twitter was the answer. Wait, does that make any sense?

Twitter isn’t working for me, but rather than jumping ship, I want to make it better. I was inspired by Chrissy LeMaire’s (blog|twitter) tweetbot @ClosedAsFixed, where her bot tweets when Connect items are closed as fixed. Her source code is available on GitHub.

I decided I would turn Twitter into an RSS aggregator. Here’s my plan:

  1. Create a dedicated twitter account to act as my aggregated feed: @SQLAnthology
  2. Use PowerShell to scrape RSS feeds that I care about
  3. Use PowerShell to tweet those blog posts from my dedicated account; Throttle my tweeting so I don’t spam myself
  4. Host it in Azure on some existing resources

By using Twitter, I’ll still have the posts thrust in front of me during my normal tweeting. I’ll also be able to go directly to that twitter handle and get to a reading list. I can do it regardless of what device I am using at the time. Best of all, I don’t need to check a dedicated app.

Step 1) Create & configure a twitter account

Create a Twitter account, just like any other. The only special step is that you’ll need to go to apps.twitter.com, and do some setup. The process is pretty straightforward, so I won’t go into details. Ultimately you want to get two key/secret pairs: a “Consumer Key and Secret” and “Access Token and Secret”. That’s four big ugly strings of gibberish. Save them, you’ll need them later.

Step 2) Gathering RSS feeds

What feeds do I care about? (aside: What feeds do you care about? Comment below and help me build my reading list.) I need a way to configure which feeds the job should care about. I’m a DBA, so I created a table (anthology.Blog) in an Azure SQL DB, and a wee little stored procedure (anthology.BlogList_Get) to get the list of blogs:

--Create a table
CREATE TABLE anthology.Blog(
 BlogID int identity(1,1),
 DisplayName nvarchar(30) COLLATE Latin1_General_CI_AS,
 RssUri nvarchar(200) COLLATE Latin1_General_CI_AS,
 CONSTRAINT PK_anthologyBlog PRIMARY KEY CLUSTERED (BlogID)
 );
GO
--Insert the blogs I want to read
INSERT INTO anthology.Blog (DisplayName,RssUri)
VALUES (N'Andy Mallon', N'http://am2.co/feed/');
GO
--A procedure to use to fetch the list when I need it
CREATE OR ALTER PROCEDURE anthology.BlogList_Get
AS
SET NOCOUNT ON
SELECT BlogName = DisplayName,
 [Url] = RssUri
FROM anthology.Blog;
GO

 

I’m going to rate-limit my bot, so that I only tweet once every half hour–so I need to keep track of what’s been tweeted, and what hasn’t. To accomplish this, I want to store the individual blog post URLs and metadata(not the post content) in my database, too. So, I’ll create another table (anthology.Archive) to keep track of what I’ve tweeted, as well as an upsert procedure (anthology.Archive_Upsert) to insert/update rows:

--Create a table
CREATE TABLE anthology.Archive (
 PostUrl nvarchar(200) COLLATE Latin1_General_CI_AS,
 BlogName nvarchar(30) COLLATE Latin1_General_CI_AS,
 PostTitle nvarchar(150) COLLATE Latin1_General_CI_AS,
 PostAuthor nvarchar(50) COLLATE Latin1_General_CI_AS,
 PostPublishDate datetime2(0),
 TweetText nvarchar(340) COLLATE Latin1_General_CI_AS, --Make sure we support emoji
 IsTweeted bit NOT NULL CONSTRAINT DF_anthologyIsTweeted DEFAULT 0,
 CONSTRAINT PK_anthologyArchive PRIMARY KEY CLUSTERED (PostUrl) WITH (DATA_COMPRESSION=PAGE),
 );
GO
--TweetText is 340 because I'm not storing the shortened URL. Twitter doesn't count the full URL length, but I need to consider it.
CREATE INDEX IX_anthologyArchiveUntweeted 
 ON anthology.Archive (BlogName, PostTitle, PostAuthor, PostUrl)
 WHERE IsTweeted = 0;
GO
--Upsert procedure
CREATE OR ALTER PROCEDURE anthology.Archive_Upsert
 @PostUrl nvarchar(200),
 @BlogName varchar(30) = NULL,
 @PostTitle nvarchar(150) = NULL,
 @PostAuthor varchar(50) = NULL,
 @PostPublishDate datetime2(0) = NULL,
 @TweetText nvarchar(340) = NULL,
 @IsTweeted bit = NULL
AS
SET NOCOUNT ON
UPDATE a
 SET BlogName = COALESCE(@BlogName,BlogName),
 PostTitle = COALESCE(@PostTitle,PostTitle),
 PostAuthor = COALESCE(@PostAuthor,PostAuthor),
 PostPublishDate = COALESCE(@PostPublishDate,PostPublishDate),
 TweetText = COALESCE(@TweetText,TweetText),
 IsTweeted = COALESCE(@IsTweeted,IsTweeted)
 FROM anthology.Archive a
 WHERE PostUrl = @PostUrl;

IF @@ROWCOUNT = 0
BEGIN
 INSERT INTO anthology.Archive (PostURL, BlogName, PostTitle,PostAuthor, PostPublishDate, TweetText, IsTweeted)
 SELECT @PostURL, @BlogName, @PostTitle, @PostAuthor, @PostPublishDate, @TweetText, COALESCE(@IsTweeted,0);
END;
GO

 

Now to do some work. Using PowerShell, we’ll step through the RSS feeds, parse out the important bits, then store those bits in the database.

  • Call BlogList_Get to get the list of RSS feeds
  • Loop through those RSS feeds
    • For each feed, loop through the posts
      • For each post, grab the author, post title, and post URL.
      • Upsert that post info to the database
# Step 1: Pull recent tweets into the archive
############################################################################
$rssList = Invoke-Sqlcmd -ServerInstance $sqlInstance -Database $sqlDatabase -Username $sqlUsername -Password $sqlPassword -Query "EXEC anthology.BlogList_Get"

$rssList | ForEach-Object {
 #Write-Host $_.blogName
 #Write-Host $_.Url

$rss = Invoke-WebRequest $_.Url

if ($rss.StatusCode -ne 200){
 # This should log an error somewhere
 # Add error handling really soon
 Write-Host "Error"
 Break
 }

#This is the content of the RSS feed
 [xml]$rssXml = $rss.Content
 $feed = $rssXml.rss.channel
 
 # The blog name from the RSS feed sucks sometimes. Just use what I set up in the configuration
 $blogName = $_.blogName
 
 #Loop through all the posts
 ForEach ($msg in $Feed.Item){

# post publish date -- We only want to tweet recent posts
 [datetime]$postPubDate = $msg.pubDate

# author -- Exact XML element varies by blog platform
 $postAuthor = $msg.creator.InnerText
 if ($postAuthor.length -eq 0){
 $postAuthor = $msg.creator
 }


 # Title
 $postTitle = $msg.title

# URL
 $postUrl = $msg.link
 
 $exists = $archive | Where-Object { $_.postUrl -eq $postUrl }
 
 # Add to archive if Post from the last 7 days that isn't in the archive
 if ($postPubDate -gt (Get-Date).AddDays(-7)) {
 Invoke-Sqlcmd -ServerInstance $sqlInstance -Database $sqlDatabase -Username $sqlUsername -Password $sqlPassword `
 -Query "EXEC anthology.Archive_Upsert @PostURL = N'$($postUrl)', @BlogName = N'$($blogName)', @PostTitle = N'$($postTitle)', @PostAuthor = N'$($postAuthor)', @PostPublishDate = N'$($postPubDate)'"
 }
 }#EndForEach Tweet
 
 }#EndForEach Blog

Holy cow! It works! …well, it mostly works. Blogger seems to have a different RSS feed format, so it doesn’t quite work with Blogger…oh well. I’ll revisit that in v2.

 

Step 3) Tweet based on what I’ve collected to my database

This was when I realized that it would be cool to map bloggers to their twitter handles. I might want to @mention a blogger when my bot tweets. One more table (anthology.BlogAuthor):

--Create a table
CREATE TABLE anthology.BlogAuthor(
 BlogID int,
 AuthorName nvarchar(50) COLLATE Latin1_General_CI_AS,
 TwitterHandle nvarchar(30) COLLATE Latin1_General_CI_AS,
 CONSTRAINT PK_anthologyBlogAuthor PRIMARY KEY CLUSTERED (BlogID,AuthorName)
 );
GO
--Insert some blog author-to-twitter handle mappings
INSERT INTO anthology.BlogAuthor (BlogID, AuthorName,TwitterHandle)
VALUES (1,N'Andy', N'@AMtwo')

 

With that done, time to tweet. I need to query the database to pull out the next blog post to tweet out. I use the pattern SELECT TOP 1…ORDER BY NEWID() to get one random row–this ensures that a backlog of posts goes out one at a time in a random order. Tuesday mornings are a busy time for blog posts, so when that happens, I will build a backlog, then trickle out the tweets for those posts throughout the day.

I’m going to build the tweet’s text in my T-SQL while I’m at it. This procedure (anthology.Archive_GetNextTweet) pieces together the blog name, author, post title, and URL to build a tweet. Remember: tweets can only be 140-characters long including a shortened URL. This means potentially truncating the tweet’s text to make sure we don’t break the rules:

--Isn't the CREATE OR ALTER syntax beautiful?
CREATE OR ALTER PROCEDURE anthology.Archive_GetNextTweet
AS
SET NOCOUNT ON
DECLARE @TweetText nvarchar(340),
 @PostURL nvarchar(200)
SELECT TOP 1 @TweetText = N'[' + BlogName + N']' + N' ' + COALESCE(ba.TwitterHandle,a.PostAuthor) 
 + NCHAR(10) + NCHAR(13) + PostTitle,
 @PostURL = PostUrl
FROM anthology.Archive a
LEFT JOIN anthology.Blog b ON b.DisplayName = a.BlogName
LEFT JOIN anthology.BlogAuthor ba ON ba.BlogID = b.BlogID AND ba.AuthorName = a.PostAuthor
WHERE IsTweeted = 0
ORDER BY NEWID();

IF LEN(@TweetText) > 119
BEGIN
 SET @TweetText = SUBSTRING(@TweetText,1,115) + N'…' + CHAR(10) + @PostURL
END;
ELSE
BEGIN
 SET @TweetText = @TweetText + CHAR(10) + @PostURL
END;

SELECT PostURL= @PostURL,
 TweetText = @TweetText
GO

 

Back to PowerShell to send that tweet. Sending a tweet via PowerShell sounds like it would be really intimidating…thankfully, someone else took care of that for me. Chrissy’s @ClosedAsFixed bot uses the Send-Tweet.ps1 script from the TechNet Gallery. It works, so I’m using it too. This is where you’ll need those key/secret pairs from Step 1. Check out the instructions from the gallery to see where to pop them into your script.

We just have to use Send-Tweet to …uhh… send a tweet. This actually turned out to be the easiest part:

  • Call Archive_GetNextTweet to get the next post to tweet
  • Use Send-Tweet to send the TweetText from that procedure
  • Call Archive_Upsert to mark that tweet as sent, and add the tweet text to the archive.
# Step 2: Post a single untweeted tweet from the archive 
############################################################################

# Pick one random untweeted post to be tweeted
$untweeted = Invoke-Sqlcmd -ServerInstance $sqlInstance -Database $sqlDatabase -Username $sqlUsername -Password $sqlPassword -Query "EXEC anthology.Archive_GetNextTweet"
$untweeted.TweetText

# Send the tweet
$null = Send-Tweet $untweeted.TweetText

# Mark this post as tweeted
Invoke-Sqlcmd -ServerInstance $sqlInstance -Database $sqlDatabase -Username $sqlUsername -Password $sqlPassword -Query "EXEC anthology.Archive_Upsert @PostURL = N'$($untweeted.postUrl)',@TweetText = N'$($untweeted.TweetText)', @IsTweeted = 1"

 

 

Step 4) Set it up to run automagically

I have an Azure SQL DB, and a Windows machine running in Azure. There are other possibilities for how you could configure this, but this was easiest for me. If you want to spin up resources for your own bot, you could easily run this on a Basic 5DTU SQL DB, and a Basic A0 VM.

  1. On the Windows machine, make sure you have SSMS 2016 installed. This is a requirement for you to get the SQLSERVER PowerShell module to do stuff like Invoke-Sqlcmd.
  2. Open SSMS, and run SqlAnthology.sql to create the necessary objects. Be sure to pick a password to use on the CREATE USER statement.
  3. Copy twitterbot.ps1 to your Windows machine (I put it in C:\scripts\SqlAnthology\). Be sure to set the SQL connection info and twitter key/secret pairs.
  4. Create a scheduled task to run every 30 minutes.
  5. Sit back and relax.

 

Download

My SQLAnthology code is available on GitHub.

 

B4C0

1 Comment

Comments are closed.