Moving Replication to a new server (without re-snapshotting)

Background

The guy in this picture is a Snapshot Agent

At work, we make very extensive use of transactional replication. We have dozens of publications. Most publications have two or three subscribers. We replicate across domains, across subnets, across geographies. We use both push & pull replication (we’ve found pull works best when crossing the globe).

Next year, we’ll be upgrading to new hardware, and upgrading to SQL 2014. When we do that, we want to avoid having to re-snapshot our multiple terabytes of replicated data. Unfortunately, SQL Server does not provide a nice way to update the publisher of a publication–and that’s exactly what we want to do.

We came up with a solution we thought would work–and tried it out on a smaller database with a small publication. With a little trial and error, it worked (I even surprised myself). This post steps through what we did.

<disclaimer>
Our solution does involve updating the distribution database directly. I realize I’m going beyond the “unsupported” signs, and right past the “not recommended” warnings. I’m OK with that–but if you’re considering doing this yourself, take note that you’re headed down the path where there is no support. Have backups, and be prepared to rebuild replication from scratch if you hopelessly break everything.
</disclaimer>

Step 1) Moving to a new server

So the first step is to move to the new hardware. We use mirroring for our HA, and use the -PublisherFailoverPartner parameter on the Log Reader & Snapshot agents. Moving to the server is simple–reconfigure mirroring to mirror to the new primary hardware, failover to the new primary hardware, then reconfigure mirroring to mirror to the new mirror hardware. It’s a well-known dance that’s often discussed & documented, so I won’t belabor the steps.

Step 2) Updating the distribution database

     Part A) Pause replication

Stop & disable all the Snapshot, Log Reader, and Distribution agents related to this server. The next script will update all publications on the server–so if you have multiple publications, don’t forget any. We don’t want anything running while we’re mucking with tables in the distribution database. We have a dedicated distributor server, so I stop & disable all of the jobs directly in SQL Agent to be sure I don’t miss anything.

     Part B) Update the distribution database

This is where we’ll update the distribution database. By reviewing sys.columns, I came up with the below list of updates. The goal is to update every single reference to the original server, and replace it with the new server name. Run this script on the distribution database.

DECLARE @old_publisher_name sysname = 'DBServer01'
DECLARE @new_publisher_name sysname = 'DBServer02'
DECLARE @old_publisher_id int = (SELECT server_id FROM sys.servers WHERE is_linked = 0 AND name = @old_publisher_name)

DECLARE @new_publisher_id int = (SELECT server_id FROM sys.servers WHERE is_linked = 0 AND name = @new_publisher_name)

UPDATE MSarticles                              SET publisher_id = @new_publisher_id WHERE publisher_id = @old_publisher_id; 
UPDATE MSdistribution_agents                   SET publisher_id = @new_publisher_id WHERE publisher_id = @old_publisher_id; 
UPDATE MSlogreader_agents                      SET publisher_id = @new_publisher_id WHERE publisher_id = @old_publisher_id; 
UPDATE MSmerge_agents                          SET publisher_id = @new_publisher_id WHERE publisher_id = @old_publisher_id; 
UPDATE MSmerge_identity_range_allocations      SET publisher_id = @new_publisher_id WHERE publisher_id = @old_publisher_id; 
UPDATE MSmerge_subscriptions                   SET publisher_id = @new_publisher_id, publisher = @new_publisher_name  
                                                   WHERE publisher_id = @old_publisher_id AND publisher = @old_publisher_name;
UPDATE MSpublications                          SET publisher_id = @new_publisher_id WHERE publisher_id = @old_publisher_id; 
UPDATE MSpublisher_databases                   SET publisher_id = @new_publisher_id WHERE publisher_id = @old_publisher_id; 
UPDATE MSrepl_identity_range                   SET publisher = @new_publisher_name  WHERE publisher = @old_publisher_name;  
UPDATE MSreplication_monitordata               SET publisher_srvid = @new_publisher_id, publisher = @new_publisher_name  
                                                   WHERE publisher_srvid = @old_publisher_id AND publisher = @old_publisher_name;
UPDATE MSsnapshot_agents                       SET publisher_id = @new_publisher_id WHERE publisher_id = @old_publisher_id; 
UPDATE MSsubscriber_info                       SET publisher = @new_publisher_name  WHERE publisher = @old_publisher_name;
UPDATE MSsubscriber_schedule                   SET publisher = @new_publisher_name  WHERE publisher = @old_publisher_name;
UPDATE MSsubscriptions                         SET publisher_id = @new_publisher_id WHERE publisher_id = @old_publisher_id; 
UPDATE MSsync_states                           SET publisher_id = @new_publisher_id WHERE publisher_id = @old_publisher_id; 

     Part C) Update & start agents

Update the SQL Agent jobs for all of the agents. Specifically, we’re updating the -Publisher and -PublisherFailoverPartner parameters to reflect the new server names. After the agent is updated, enable & start it.

The problem

At this point, replication seemed to work swimmingly. Transactions were replicating, data was flowing. When the Replication Monitor showed data was caught up, I tried to insert a tracer token, and was thwarted. I got an error message that my publication didn’t exist. (Obviously it did. I just watched it replicate transactions.)  Switching to T-SQL, I tried sp_posttracertoken, and got a slightly more informative error:

Msg 21200, Level 16, State 1, Procedure sp_MSadd_tracer_token, Line 75
Publication ‘PublicationName’ does not exist.
Msg 21486, Level 16, State 1, Procedure sp_MSrepl_posttracertoken, Line 128
An error occurred while logging the tracer token history information. The tracer token could not be posted.

Now that’s curious. My publication does not exist, and the tracer token history could not be logged. Since my publication does exist, that error message isn’t totally accurate. I’m guessing I probably just missed something.
The applicable call stack looks like this:

  • sp_posttracertoken
    • sp_MSrepl_posttracertoken
      • sp_MSadd_tracer_token
        • Throws error 21200
      • Throws 21486 error specifically because sp_MSadd_tracer_token failed

Of note, most of these stored procedures live in the resource database–just to make life a touch more interesting/confusing. So, my next step was to step through the code and figure out why sp_MSadd_tracer_token was failing–and figure out if I can solve it.

After tracing through the source code for these system stored procedures, I was able to track down this line in sys.sp_MSrepl_posttracertoken:

SET @publisher = publishingservername()

Turns out, this is the key! Per the documentation, publishingservername() “returns the name of the originating Publisher for a published database participating in a database mirroring session.” This line runs on the publisher database, and is still returning the original server name. This results in a bad value being passed into sp_MSadd_tracer_token, resulting in the error. Specifically, it ends up checking the distributor expecting the publication to still exist for the old server–but since we updated the distributor already, that fails.

A quick look at system tables, and I saw dbo.sysreplservers–which contains one row listing the old server name. We have to update this table in the publisher database so that it matches the distribution database.

Step 3) Updating the publisher database

In addition to the updates to the distribution database, I also have to make one update per published database. This is the last step in making the published database know this server is its permanent home.
When doing this in production, I ran this step at the same time as Step 2B above–that is, while all of the replication agents were disabled.

UPDATE repl
SET srvname = s.name
FROM dbo.sysreplservers repl
JOIN sys.servers s ON s.server_id = repl.srvid
WHERE repl.srvname <> s.name;

Optional — Updating job names

The names of the SQL Agent jobs on the Distributor will have names that include the name of the server that originally served as publisher. There’s no functional reason to change those. But I worry that it could be confusing. I won’t be maintaining this forever, and I’m not the only DBA on my team–so I’d like to keep the confusion to a minimum for the next guy.

This involves updating both the job name as well as some tables in the distribution database.

  • MSdistribution_agents
  • MSlogreader_agents
  • MSmerge_agents
  • MSsnapshot_agents
  • MSreplication_monitordata

I’ll add a script to do this work shortly–Just need to make sure it’s tested before I publish it. Check back soon.
I lost the script file for this in The Great Google Drive Disaster of 2015. But the meat of it is posted above.

Summary

Updating system tables and making changes that are unsupported come with risk. In our case, rebuilding replication actually comes with a substantial cost as well–we would spend many hours rebuilding & resnapshotting our complex & busy replication topology. For us, the risk of hacking replication to change publishers is worth the substantial reward in saved time.

I think this hack is likely only valuable to you if you have complex, busy, or a very large replication topology. If you do find it valuable, or if you run into any troubles, please let me know.

6 Comments

  1. Just wanted to say thanks for this post. I was dealing with some strange replication issues after we failed mirroring over. This pointed me in the right direction. Really like your posts. Glad to see someone else leaving notes to their future selves in job descriptions!

  2. Hello,
    amazing work, I’m trying to do something similar, I’ve to detach a database which is a published database, moving the files to the new server, attaching it, resuming replication. I can’t detach the database since it’s published, the instance from where I’m detaching the database won’t be decommissioned so I can’t mess it up.
    Any suggestion?

    • I’d recommend that you not use detach/attach to move the database. Instead, use backup/restore. This is a much safer option, and also comes with some added flexibility that will help you make the actual cutover (and thus downtime) faster & easier.

Comments are closed.