Sitecore Content Migration
May 11, 2017 Dalibor Kovač

Sitecore is a web content management system and digital marketing platform. In this post I'll be looking at one of the ways to do large content migrations between Sitecore instances.

Different ways to perform migrations

There is always more than one way to do a data migration, and Sitecore content migrations are no exception:

  • You can use third-party tools or plugins.
  • In some specific situations you can choose to build your own data migration tools.
  • You can use the tools that come out-of-the-box with every installation of Sitecore.

 

The easiest situation is when you need to migrate a complete Sitecore instance, with everything it contains. In that case you can just back up the Microsoft SQL Server databases where your Sitecore data is kept and restore them on the target Sitecore instance. That effectively clones your Sitecore instance, but it can only be done if the source and target instances are of the same version. Real data migrations come into play when you need to migrate just part of the content, or when you need to migrate content between Sitecore instances of different versions. In this post I’ll concentrate on doing that using standard Sitecore tools.

 

Standard Sitecore tools

A good choice for Sitecore content migrations is using standard Sitecore content packages. With every Sitecore installation you get two Development Tools:

  • Package Designer
  • Installation Wizard

You have to use the “Desktop” user interface to run them.

These tools and Sitecore content packages are a good option because you don’t have to install any third-party tools. They can be used for migrating content between different versions of Sitecore (e.g. from Sitecore 6 to Sitecore 8), and they can handle large volumes of data.

In my previous blog post, Sitecore Content Packages, I explained in detail what content packages are and how to use Package Designer and Installation Wizard. In this post I’ll point out some things to be careful about, especially when migrating large volumes of data.

 

What to migrate and what not to migrate?

As a general rule: you should migrate only the items that you have created. Do not migrate the items that came with the installation of Sitecore.

If your source and target Sitecore versions are different and you overwrite some of Sitecore’s system items with items from a different version, that could cause problems, and your target Sitecore instance might not function properly.

A folder called “System” is a good sign that it should be left out of the migration, for example:

  • Sitecore/System
  • Sitecore/Media Library/System
  • Sitecore/Templates/System
  • Sitecore/Templates/Branches/System

 

Of course, there can be exceptions. For example, your custom Commands and Schedules could be kept under the Sitecore/System/Tasks node, and by all means they should be migrated because they are your custom items. Just make sure not to migrate anything else from under that “System” node.
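The rule and its exceptions can be written down as a small filter. This is only a sketch: it assumes you have a plain list of item paths to check, that paths can be compared case-insensitively, and the exception path in the usage example is a made-up custom command, not something that exists in a standard installation.

```python
# Folders that ship with Sitecore and should normally stay out of a package.
SYSTEM_PREFIXES = (
    "/sitecore/system",
    "/sitecore/media library/system",
    "/sitecore/templates/system",
    "/sitecore/templates/branches/system",
)


def _is_under(path, prefix):
    """True if path is the prefix itself or one of its descendants."""
    return path == prefix or path.startswith(prefix + "/")


def should_migrate(path, custom_exceptions=()):
    """Decide whether an item path belongs in a migration package.

    custom_exceptions lists paths of your own items that live under a
    "System" folder (hypothetical examples: custom Commands or Schedules).
    """
    p = path.strip().rstrip("/").lower()
    for exception in custom_exceptions:
        if _is_under(p, exception.strip().rstrip("/").lower()):
            return True
    return not any(_is_under(p, prefix) for prefix in SYSTEM_PREFIXES)


# Hypothetical custom command kept under the System node.
MY_COMMAND = "/sitecore/system/Tasks/Commands/My Cleanup Command"

print(should_migrate("/sitecore/content/Home"))                # True
print(should_migrate("/sitecore/system/Settings"))             # False
print(should_migrate(MY_COMMAND, custom_exceptions=[MY_COMMAND]))  # True
```

Listing the exceptions explicitly, instead of allowing the whole Tasks node, keeps everything else under “System” out of the package.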

 

When we look at Sitecore our focus is usually on the “master” database because that’s where our content “lives”, but we should not forget that some custom items might also exist in the “core” database. If you have added custom field types, they are defined in the core database under sitecore/system/Field types. You should include these items in your migration too, because your content most likely depends on them. You have most probably written some custom code for your custom field types, so make sure that code is also deployed to your target instance. The same goes for any additional modules that you might have installed on your Sitecore instance.

 

In the end, you have to know your content and the customizations in your Sitecore instance to be able to tell what needs to be included in the migration.

 

Migrating only master database, or both master and web?

As you probably know if you work with Sitecore, there are two databases that hold the content:

  • Master database: this is where content authors work, editing or developing new content. When the content is ready it is published (copied) to the web database.
  • Web database: this is the database from which the live website draws the content displayed on web pages.

If content authors follow a consistent publishing strategy, or you have an automated publishing process, then your web database is probably in sync with your master database: everything that should be published is published, and items that shouldn’t be published yet are either marked as not publishable or have a publish date set in the future. In that case it is safe to migrate only the master database and then publish everything on the target instance, thus populating the web database.

If, on the other hand, you’re not sure that everything is kept in line in the master database, meaning that it might hold some content that should not be published but is not marked as not publishable, then it’s risky to publish everything after migration. In that case you might opt to also migrate the web database from the source instance into the web database on the target instance. That means doubling the effort, but it’s better to be safe than sorry.
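One way to get a feeling for how far the two databases have drifted apart is to compare a listing of items from each. The sketch below assumes you can export, by whatever means you prefer, a mapping of item ID to some revision marker from both master and web on the source instance. The function name and the sample data are hypothetical.

```python
def publish_risk(master, web):
    """Compare two {item_id: revision} listings.

    Returns (missing_in_web, differing): items that exist only in master,
    and items whose revision in web differs from the one in master.
    Both groups would change the live site if you published everything.
    """
    missing_in_web = sorted(i for i in master if i not in web)
    differing = sorted(i for i in master if i in web and master[i] != web[i])
    return missing_in_web, differing


# Made-up sample listings.
master = {"home": "r3", "news": "r7", "draft-page": "r1"}
web = {"home": "r3", "news": "r5"}

missing, differing = publish_risk(master, web)
print(missing)    # ['draft-page']
print(differing)  # ['news']
```

If both lists contain only items you expect, for example items that are marked as not publishable, migrating just the master database is probably fine. If they contain surprises, that is an argument for migrating the web database as well.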

Adjust configuration parameters on target instance

Here are two configuration parameters you might check and adjust on target instance before starting import:

Invalid Item Name Characters setting

There is a configuration setting called InvalidItemNameChars. It holds the list of characters that are not allowed in Sitecore item names. For example, the default value of that setting in Sitecore 8 is: /:?"<>|[]. It is a good idea to clear the value of that setting during import, to prevent errors in case some of your items have names that contain these characters. Just remember to restore the original value after you’re done with the import.
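Before clearing the setting you can check whether any of your item names would be affected at all. This is a minimal sketch that assumes you have exported a plain list of item names from the source instance. The character set is the one quoted above, written with a straight double quote.

```python
# Characters from the InvalidItemNameChars value quoted in this post.
INVALID_CHARS = set('/:?"<>|[]')


def names_with_invalid_chars(names, invalid=INVALID_CHARS):
    """Map each offending item name to the invalid characters it contains."""
    hits = {}
    for name in names:
        bad = sorted(set(name) & set(invalid))
        if bad:
            hits[name] = bad
    return hits


# Made-up item names.
sample = ["Home", "FAQ: Pricing", "What? [draft]"]
print(names_with_invalid_chars(sample))
# {'FAQ: Pricing': [':'], 'What? [draft]': ['?', '[', ']']}
```

An empty result suggests you can leave the setting alone on the target instance.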

Disable event queue

The event queue in Sitecore is used to propagate events among the servers in a multi-server Sitecore installation.

 

While you are doing the mass import, your Sitecore instance is probably not used by anyone but you, so there’s really no need to broadcast and process a large number of events that are of no use to anyone at that moment. To speed up the import, you can disable the event queue while importing: find the EnableEventQueues setting and set it to false. Just remember to set it back to true after you finish the import. You also need to rebuild all search indexes after the import is done, to bring them up to date with all the new content that was created.
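Because both settings have to be changed before the import and restored afterwards, it helps to do the change in a way that remembers the previous value. The sketch below works on the text of a configuration file and assumes the settings appear as setting elements with name and value attributes. The helper name set_setting and the sample XML are made up for illustration, and note that ElementTree does not preserve comments or the original formatting, so treat this as an idea and not as a finished tool.

```python
import xml.etree.ElementTree as ET


def set_setting(config_xml, name, value):
    """Return (patched_xml, previous_value) for the named <setting> element."""
    root = ET.fromstring(config_xml)
    for setting in root.iter("setting"):
        if setting.get("name") == name:
            previous = setting.get("value")
            setting.set("value", value)
            return ET.tostring(root, encoding="unicode"), previous
    raise KeyError(name)


# Made-up, heavily shortened configuration fragment.
SAMPLE = """<configuration><sitecore><settings>
  <setting name="EnableEventQueues" value="true" />
</settings></sitecore></configuration>"""

# Before the import: switch the event queue off and keep the old value.
patched, old = set_setting(SAMPLE, "EnableEventQueues", "false")

# After the import: put the original value back.
restored, _ = set_setting(patched, "EnableEventQueues", old)
```

Keeping the previous value around means the restore step does not depend on you remembering what the setting was.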

 

Package size

Larger Sitecore instances might contain tens or hundreds of gigabytes of data. It goes without saying that you cannot expect to export such volumes of data using a single content package.

 

The rule of thumb is to keep your content packages under 50 MB in size. That way you have manageable chunks of data that can each be imported in a reasonable time, without choking the server. So, while exporting, you’ll need to experiment a bit to find out how to break up your content to achieve the desired package sizes.

 

Media Library items are the exception to that rule. These are usually images, videos or documents that are large in size, but there aren’t that many of them compared to classic content items and fields. The speed of the import process is affected more by the number of items and fields in the package than by the size of the package. So, you can have Media Library content packages that are large in size (for example, 500 MB) but are imported faster than a much smaller “classic” content package.
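Since the item count matters more than the size in bytes, one way to plan the split is to group content subtrees by the number of items they contain. This is a sketch under assumptions: you know (or can estimate) the item count of each subtree, and the limit of 1000 items per package is a made-up number that you would need to tune by experimenting.

```python
def plan_packages(subtrees, max_items):
    """Group (path, item_count) pairs into packages, keeping the given order.

    A new package is started whenever adding the next subtree would push
    the running item count over max_items. A single subtree that is larger
    than max_items still gets a package of its own and is a candidate for
    being split further by hand.
    """
    packages, current, count = [], [], 0
    for path, items in subtrees:
        if current and count + items > max_items:
            packages.append(current)
            current, count = [], 0
        current.append(path)
        count += items
    if current:
        packages.append(current)
    return packages


# Made-up subtrees with estimated item counts.
subtrees = [("/a", 400), ("/b", 700), ("/c", 200), ("/d", 1500)]
print(plan_packages(subtrees, max_items=1000))
# [['/a'], ['/b', '/c'], ['/d']]
```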

 

Order of import

Templates need to be imported first, before any content. Every content item depends on its template, so the template needs to be there at the moment the content item is created.

 

If you break up the import into multiple packages, make sure that packages that create parent items are imported before packages that contain their child items. If you import a child item before its parent, Sitecore will create the missing parent item using the standard “Node” template, which might not be what you want.
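Both ordering rules can be expressed as a sort over the root paths of your packages. The sketch assumes each package is identified by the path of its topmost item and that templates live under /sitecore/templates. The sample paths are only examples.

```python
def import_order(package_roots):
    """Sort package root paths: templates first, then parents before children."""

    def key(path):
        p = path.strip().rstrip("/").lower()
        is_template = (
            p == "/sitecore/templates" or p.startswith("/sitecore/templates/")
        )
        # A parent always has fewer path segments than any of its
        # descendants, so sorting by depth puts parents first.
        depth = p.count("/")
        return (0 if is_template else 1, depth, p)

    return sorted(package_roots, key=key)


roots = [
    "/sitecore/content/Home/News/2017",
    "/sitecore/content/Home",
    "/sitecore/templates/User Defined",
    "/sitecore/content/Home/News",
]
for root in import_order(roots):
    print(root)
# /sitecore/templates/User Defined
# /sitecore/content/Home
# /sitecore/content/Home/News
# /sitecore/content/Home/News/2017
```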

 

The order of import among linked content items is not really important. If a content item links to another content item that has not been imported yet, that link will be considered “broken” at that moment, but it will still contain the correct GUID of the linked item, and once the linked item gets imported the link will no longer be broken.
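If you want to verify, after the last package is in, that no links stayed broken, you can look for GUIDs that point to items that were never imported. The sketch assumes you can export the raw field values of the imported items as a dictionary, and that linked items appear in those values as GUIDs in curly braces. The field name and the GUIDs below are made up.

```python
import re

# A GUID in curly braces, e.g. {11111111-1111-1111-1111-111111111111}.
GUID_RE = re.compile(r"\{[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}\}")


def dangling_links(items):
    """items: {item_id: {field_name: raw_value}}.

    Returns {item_id: [referenced GUIDs that are not among the items]}.
    """
    known = {item_id.upper() for item_id in items}
    report = {}
    for item_id, fields in items.items():
        targets = {
            guid.upper()
            for value in fields.values()
            for guid in GUID_RE.findall(value)
        }
        missing = sorted(targets - known)
        if missing:
            report[item_id] = missing
    return report


A = "{11111111-1111-1111-1111-111111111111}"
B = "{22222222-2222-2222-2222-222222222222}"

# Only A is imported so far: its link to B is still broken.
print(dangling_links({A: {"Related": B}}))
# After B is imported as well, nothing is reported.
print(dangling_links({A: {"Related": B}, B: {"Title": "Linked item"}}))
```

Links to items that you left out on purpose, such as system items that already exist on the target instance, will also show up here unless you add their IDs to the input.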

 

Difference in data structures between Sitecore versions

Depending on the versions of your source and target Sitecore instances, you might encounter some problems because some standard fields might have changed between Sitecore versions.

For example, in one of our migrations from Sitecore 6 to Sitecore 8 we noticed that the field “__Source”, which is used for linking cloned items to their original items, had been renamed to “__Source Item” in Sitecore 8. The import process did not raise any errors, but after the import was done we noticed that cloned items had lost the connection to their originals, and because of that many fields on them were empty. As often happens, the Sitecore community provided the answer to our problem. We found a post that led us to an SQL script that needed to be run after import to correctly populate the “__Source Item” field.

This is just one example, but it gives you an idea of what kind of problems you might encounter when migrating data between different versions of Sitecore.

Dalibor Kovač
Solution Architecture & Development