What I Learned Building a Startup on Microsoft Cloud Services: Part 4 – To Build or Buy a Queuing System

February 15th, 2012

I am the founder of a startup called Cotega and also a Microsoft employee within the SQL Azure group where I work as a Program Manager. This is a series of posts where I talk about my experience building a startup outside of Microsoft. I do my best to take my Microsoft hat off and tell both the good parts and the bad parts I experienced using Azure.

One core piece of the Cotega service is its queuing system. This system runs within Windows Azure Worker Roles that constantly check for new jobs. A job is simply a task that the Cotega service needs to execute. For example, the service may check whether a user's database is up and running, or it may check the user count in a user's database. For each job, the data is logged, and if issues are found, email notifications are sent.

Building a Queuing System in SQL Azure

As you might imagine, this is a perfect job for a queue. Most of you would probably start with Windows Azure queues, but since I am a long-time database guy, my preference is always to do as much as I can within the database, simply because that is where I am most comfortable and my queuing needs are pretty simple. I also like to keep things simple, and since I was already using SQL Azure for my system database, it seemed easier to use it for queuing as well rather than add another service.

I have to tell you, building a simple queuing system in SQL Azure was far easier than I expected, and it worked really well. My queue lived in a table called Notifications. When a user adds a new notification in the admin web site, I insert a row into this table with a timestamp of '1900/01/01', so it is immediately eligible to run. The worker role, which is constantly checking for rows with a timestamp less than the current time, picks up these notifications, stamps a "Processing Time" field on the row with the current time, and starts executing the associated job. When the job is done, its status is marked as complete and the timestamp is pushed out to some interval past the current time, so that the job can be picked up again later on.
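To make this concrete, here is a rough sketch of what the Notifications table might look like, based on the columns used in the stored procedure later in this post. The exact types and defaults in my real schema may differ; treat this as illustrative only:

CREATE TABLE [Notifications]
(
    [NotificationId]   INT IDENTITY(1,1) PRIMARY KEY,
    [DatabaseID]       INT NOT NULL,                      -- links to UserDatabases
    [Status]           INT NOT NULL,                      -- 1 = enabled
    [Frequency]        INT NOT NULL,                      -- minutes between runs
    [NextRunTime]      DATETIME NOT NULL DEFAULT '1900/01/01',
    [ProcessingTime]   DATETIME NULL
)

-- A new notification defaults NextRunTime to 1900/01/01,
-- so the next worker-role poll picks it up immediately.
INSERT INTO [Notifications] ([DatabaseID], [Status], [Frequency])
VALUES (42, 1, 15)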

Handling Multiple Threads

One of the biggest problems I had to handle was the fact that I have multiple worker roles, and multiple threads within each of those worker roles. With all of these processes, it is very possible for a single job to be picked up by more than one thread. To handle this problem, I needed to create a stored procedure that uses row locking to ensure that a job cannot be picked up by more than one thread. The secret to getting this to work is the WITH (UPDLOCK, READPAST) table hint. READPAST tells the SQL Azure engine to skip any rows that are already locked, while UPDLOCK takes an update lock on the row we select, ensuring no other thread will pick it up. This is what my stored procedure looked like:

create procedure get_next_job as
begin
    DECLARE @NextId INTEGER
    BEGIN TRANSACTION

    -- Find the next available item where the status is enabled
    SELECT TOP 1 @NextId = [NotificationId]
    FROM [Notifications] WITH (UPDLOCK, READPAST)
    WHERE [NextRunTime] <= getdate() AND [Status] = 1
    ORDER BY [NextRunTime] ASC

    -- If found, flag it to prevent it being picked up again
    IF (@NextId IS NOT NULL)
    BEGIN
        UPDATE [Notifications]
        SET [ProcessingTime] = getdate(),
            [NextRunTime] = dateadd(mi, [Frequency], [NextRunTime])
        WHERE [NotificationId] = @NextId
    END

    COMMIT TRANSACTION

    -- Now return the queue item, if we have one
    IF (@NextId IS NOT NULL)
        SELECT [NotificationId], [TableName], [EmailAddress], [ChangeColumn],
               [ChangeType], [MessageTextColumn], [LastSuccessfulRunValue],
               [DatabaseConnString], [DatabaseType]
        FROM [Notifications]
        JOIN [UserDatabases]
          ON [Notifications].[DatabaseID] = [UserDatabases].[UserDatabaseID]
        WHERE [NotificationId] = @NextId
end
go
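Each worker-role thread then simply calls the procedure in a polling loop; an empty result set means there is no eligible work right now. The loop shown here is an illustrative sketch, not my production worker code:

-- Called repeatedly by each worker-role thread.
-- If a row comes back, execute the job it describes;
-- if the result set is empty, sleep briefly and poll again.
EXEC get_next_job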

Handling Orphaned Jobs

The other issue I had to consider is what happens when a machine crashes or a job does not complete. Although this has not happened yet, I am sure at some point it will, so I need to make sure a user's job does not get lost forever. This is where the "Processing Time" column comes into play. In the stored procedure above, you can see that I set it to the current date and time. I also know that any job should take no more than a minute to complete. Therefore, if a job's processing time is more than one minute old, I know there was a problem and I can handle it.
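A cleanup query along these lines could find and reset orphaned jobs. This is a sketch, assuming the Notifications columns shown earlier and assuming ProcessingTime is cleared when a job finishes cleanly; the one-minute threshold matches the rule above:

-- Reset any job that started processing more than a minute ago
-- but never finished, so it becomes eligible to be picked up again.
UPDATE [Notifications]
SET [NextRunTime] = getdate(),
    [ProcessingTime] = NULL
WHERE [ProcessingTime] IS NOT NULL
  AND [ProcessingTime] < dateadd(mi, -1, getdate())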

Throwing it all away

All of this worked incredibly well, and the performance was terrific. Another advantage I found was that regardless of the number of queries I executed against my SQL Azure database, I was guaranteed a fixed cost per month. Best of all, that cost was effectively free, since I already had a SQL Azure database for my system data. With Windows Azure queues the cost is not fixed, although it is very low. Ultimately, I decided to throw all of this away and move to Windows Azure queues. I will talk about that more next week.
