Moving WordPress media to Azure CDN

This post charts the process I followed to migrate the static media content of my blog posts out of WordPress and into Azure CDN. Before exploring how I went about it, or even why I would want to do it, I want to take a moment to explain exactly what I mean.

As of writing, this website runs on the WordPress platform, and as such I write in the WordPress editor and use the WordPress media library for managing in-post media. Usually that media is the images that I include inside individual posts, though it can also include other assets, such as zipped-up content that I’ve linked to from a given post or page. This media doesn’t include images that are part of the site layout (so things that appear on every page), and also doesn’t include other static support files like JavaScript and CSS files.

Freedom & flexibility (and kittens)

But why do this? The media library in WordPress is perfectly adequate, so why go through the hassle of moving all those assets somewhere else? For me, it comes down to a matter of freedom and flexibility. This website is primarily a blog, and I believe that a blog should be all about the content. The makers of WordPress seem to agree with this, as the admin panel provides the functionality to export all my written content into an XML document, which I am then free to do with as I wish.

However, there is no export feature in the media library. I can see thumbnails of all the images I’ve included in posts, and tiles for other assets I’ve uploaded and linked, but there is no way to get any of this back out, aside from manually lifting the directory from the server file system.

Kitten-pic to demonstrate that WordPress doesn’t use relative paths when inserting images into posts

Using the media library to add images also presents issues of its own. I have no control over the directory structure where my content is placed. The images are duplicated and saved at various sizes, whether I need them at those sizes or not. Perhaps worst of all, the source URL for each image is hardcoded into posts as an absolute path rather than one relative to the base URL.

Ideally, images would be referenced as /wp-content/uploads/2015/05/kitten-fight.jpg, but WordPress inserts them into posts as http://www.tomeggington.co.uk/wp-content/uploads/2015/05/kitten-fight.jpg. If this site were ever to move to a different domain in the future, in addition to all the other migration considerations, I would also have to find and replace all image URLs in posts. If the images lived on a different domain, this wouldn’t be a problem.
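To make the difference concrete, here is a minimal sketch of how a browser would resolve each style of reference. The post path some-post and the new domain blog.example.org are hypothetical, made up purely for illustration.

```python
from urllib.parse import urljoin

# Root-relative form of the kitten image reference.
ROOT_RELATIVE = "/wp-content/uploads/2015/05/kitten-fight.jpg"


def resolve(page_url: str, reference: str) -> str:
    """Resolve an image reference against the URL of the page that embeds it."""
    return urljoin(page_url, reference)


# On the current domain, the relative reference resolves to the familiar URL.
current = resolve("http://www.tomeggington.co.uk/some-post/", ROOT_RELATIVE)

# On a hypothetical new domain, the same relative reference follows the site.
moved = resolve("http://blog.example.org/some-post/", ROOT_RELATIVE)

# An absolute reference ignores the page it is embedded in and keeps
# pointing at the old domain.
absolute = resolve(
    "http://blog.example.org/some-post/",
    "http://www.tomeggington.co.uk" + ROOT_RELATIVE,
)

for url in (current, moved, absolute):
    print(url)
```

The third result is the one that causes the trouble: it still points at the old domain after the move.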

Performance (but no kittens)

But moving away from the WordPress media library is only partly about preemptive flexibility. By transitioning media content to Azure CDN, my images will be pushed out to edge servers around the globe, providing a more consistent user experience when it comes to page load times. Generally speaking, it’s images that account for the bulk of data transfer when loading a page, so pushing these out close to the visitor’s geographic location is clearly a good thing.

Moving the media

So how do we go about getting images out of WordPress and onto Azure CDN? The first step I took was to get uploaded media content out of WordPress and downloaded onto my laptop. The process for this is likely to vary by hosting provider. In my instance, my WordPress installation lives on Azure (did I mention that I have a bit of a thing for Azure?). I just grabbed the FTP details from the web app dashboard on the Management Portal:

FTP credentials

If this is the first time you’re connecting to an Azure website to access the filesystem via FTP, you might be wondering what password to use with these login details—I know I was! On the web app dashboard page, there’s also an option to “reset your deployment credentials”. Clicking this option will present a modal dialog where you can associate a password with your user account. It goes without saying to use a secure password for this, preferably generated by a password manager.

All WordPress uploaded content is stored under wp-content/uploads. I downloaded a copy of this uploads directory to my local machine.

Now that I had all this content downloaded, the next step was to push it back up into an Azure storage account. Just like creating a new website in the Azure portal, creating a storage account is incredibly straightforward.

It’s worth bearing in mind at this stage that the location of the storage account isn’t the same as the location of the CDN, which will serve content from a geographically diverse set of locations around the world. The storage account location specifies the primary location of your content, and where it is first pushed to when uploaded, so choosing somewhere close to you is probably preferable.

Once this account had been created, I clicked through into it in the Management Portal, and added a new container. Note that the name of the container becomes the first segment of the URL path after the domain. In this scenario, I called it ‘uploads’ to match the name of my existing directory. This isn’t strictly necessary, as I will have to rewrite all the absolute image references later on anyway.

I also set the access level to ‘Public Blob’, to allow anonymous public access to the contained blobs via URL.

With the storage container created, I could start pushing all my images into it. There is an Azure blob explorer available for Visual Studio, and there’s also an open source option called Azure Storage Explorer. However, neither of these supports uploading a directory structure into blob storage. In theory, there’s a perfectly logical explanation for this: blob containers in Azure don’t support directory structures. The whole container is just one long, flat collection of blobs.

Surely this is going to be a problem. After all, as my kitten URL above demonstrates, WordPress saves uploaded content segmented into year and month directories. If we can’t replicate that pattern, then our find and replace later on just got a lot more complicated. Fortunately though, the situation is not as bad as it might appear. While blob containers don’t support real directories, they do support slashes (/) in blob names.

By renaming my kitten image from kitten-fight.jpg to 2015/05/kitten-fight.jpg, and then storing it in the container called uploads, I can effectively recreate the original directory structure (‘wp-content’ prefix notwithstanding). Doing this by hand for all my images would be rather tedious, not to mention largely impossible for me, given that Windows doesn’t allow slashes in filenames.

The solution to both this problem, and the problem of the previously mentioned storage clients, is CloudBerry Explorer, a freeware solution from CloudBerry Lab. When uploading folders into an Azure blob storage container, CloudBerry Explorer will translate the uploaded directory structure into blobs with names that represent their location in that structure.
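The name translation itself is simple enough to sketch. The following is my own illustration of the idea, not CloudBerry Explorer's actual implementation: each file's path relative to the uploads directory, written with forward slashes, becomes its blob name. The function names and the C:\backup\uploads location are assumptions for the example, and the sketch only plans the names; it doesn't upload anything.

```python
from pathlib import Path, PurePath, PureWindowsPath
from typing import Dict


def blob_name(uploads_root: PurePath, file_path: PurePath) -> str:
    """Turn a local file path into a slash-separated blob name."""
    # relative_to() strips the local root; as_posix() forces forward slashes,
    # even when the local path uses Windows backslashes.
    return file_path.relative_to(uploads_root).as_posix()


def plan_upload(uploads_root: Path) -> Dict[str, Path]:
    """Map every file under uploads_root to the blob name it would receive."""
    return {
        blob_name(uploads_root, path): path
        for path in sorted(uploads_root.rglob("*"))
        if path.is_file()
    }


# A hypothetical Windows download location, handled as a pure path so the
# example behaves the same on any operating system.
root = PureWindowsPath(r"C:\backup\uploads")
local = root / "2015" / "05" / "kitten-fight.jpg"
print(blob_name(root, local))
```

The resulting mapping could then be handed to whichever upload tool or SDK you prefer.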

The CDN

With my storage account created and populated, I could move on to the task of putting the CDN over the top of the account. And again Azure makes this super simple. In fact, the only decision I had to make when creating a new CDN was choosing which of my storage accounts to associate it with.

There’s not much to show here, and indeed the only parts that are particularly important for this scenario are the endpoint address, which is where my content is served from, and enabling of HTTPS, which allows me to use https:// references to content. As of today, enabling a secure channel to my content isn’t all that important, but if I were to move the site to https in the future, I would have to convert all my media references to https to avoid mixed content warnings in the browser. With that in mind, it made sense to enable and then use https upfront.

With all of this in place, my kitten image is now available at https://az761005.vo.msecnd.net/uploads/2015/05/kitten-fight.jpg.
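Putting the pieces together, the address is just the endpoint host, then the container name, then the blob name. A minimal sketch of that composition, where the cdn_url helper is my own name for illustration rather than anything Azure provides:

```python
from urllib.parse import quote


def cdn_url(endpoint_host: str, container: str, blob: str, scheme: str = "https") -> str:
    """Compose the public URL for a blob served through a CDN endpoint."""
    # quote() percent-encodes characters such as spaces but leaves the
    # slashes in the blob name intact, so the pseudo-directories survive.
    path = quote(f"{container}/{blob}")
    return f"{scheme}://{endpoint_host}/{path}"


print(cdn_url("az761005.vo.msecnd.net", "uploads", "2015/05/kitten-fight.jpg"))
```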

You might have noticed that the endpoint URL I’ve been given isn’t exactly memorable. In reality, this probably doesn’t matter that much, as I’ll mainly use the CDN to serve content embedded in another page, but it would still be nice to use a name that’s a bit more friendly. Azure makes it easy (notice a trend here?) to configure custom domains. If you’re familiar with the custom domain management for web apps, you’ll feel right at home with the CDN domain management.

cdn-domain

With the new DNS CNAME entry configured with my DNS provider, Azure accepted my choice of static.tomeggington.co.uk, and that was that. It took about an hour for the new domain to begin serving content from the storage account. And as you can see now, my kittens can be found at http://static.tomeggington.co.uk/uploads/2015/05/kitten-fight.jpg.

There’s an important gotcha here. Maybe you spotted it. Because I’m now using a custom domain with my CDN, I can no longer use https. This is because Azure doesn’t hold a valid certificate for the domain static.tomeggington.co.uk, so the CDN edge servers can’t serve up a valid certificate for my content.

In reality, what this means is that I have to be pragmatic regarding my implementation, and serve all my embedded content over https using the Azure name az761005.vo.msecnd.net. This isn’t a problem in the majority of cases. I can continue to have https enabled on the default name, and have the option to serve content from my custom domain at the same time. This just gives me the added option to link to potentially high profile resources (albeit without https) using the custom name if I choose to do so.
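Since embedded content needs to stay on https, it can be handy to check a post for any src attributes that still use plain http. This is a rough sketch using Python's built-in HTML parser; the class and function names are my own, and the sample markup is invented for the example.

```python
from html.parser import HTMLParser
from typing import List


class InsecureSrcFinder(HTMLParser):
    """Collect src attribute values that use plain http."""

    def __init__(self) -> None:
        super().__init__()
        self.insecure: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing tags such as <img ... />.
        for name, value in attrs:
            if name == "src" and value and value.lower().startswith("http://"):
                self.insecure.append(value)


def find_insecure_sources(html: str) -> List[str]:
    """Return every embedded resource URL that would load over http."""
    finder = InsecureSrcFinder()
    finder.feed(html)
    finder.close()
    return finder.insecure


sample = (
    '<p><img src="http://static.tomeggington.co.uk/uploads/2015/05/kitten-fight.jpg">'
    '<img src="https://az761005.vo.msecnd.net/uploads/2015/05/kitten-fight.jpg" /></p>'
)
print(find_insecure_sources(sample))
```

Only embedded resources are checked here; ordinary links in href attributes don't trigger mixed content warnings, so the custom name remains fine for those.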

Swapping over

The final part of this task was to update all the existing image references that pointed to the media library locations. This is a good moment to pause and take a look back at what I’ve gone from, and what I’m moving to.

I started with my resources in the WordPress media library, which serves them from http://www.tomeggington.co.uk/wp-content/uploads/ followed by date segments and the file name. What I’m moving to is serving my media content from Azure’s CDN service at https://az761005.vo.msecnd.net/uploads with the same date and file naming conventions. The MySQL command to do this in my WordPress database is as follows:

UPDATE wp_posts
SET post_content = REPLACE(
    post_content,
    'http://www.tomeggington.co.uk/wp-content/uploads',
    'https://az761005.vo.msecnd.net/uploads');
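Before running an update like this against a live database, it can be reassuring to try the same substitution on a sample of post content first. The sketch below mirrors the prefix swap in plain Python and reports how many references it would change. The function name and the sample markup are made up for the example, and it isn't part of the migration itself.

```python
from typing import Tuple

OLD_PREFIX = "http://www.tomeggington.co.uk/wp-content/uploads"
NEW_PREFIX = "https://az761005.vo.msecnd.net/uploads"


def rewrite_media_urls(
    post_content: str, old: str = OLD_PREFIX, new: str = NEW_PREFIX
) -> Tuple[str, int]:
    """Swap the media URL prefix and report how many references changed."""
    count = post_content.count(old)
    return post_content.replace(old, new), count


sample_post = (
    '<img src="http://www.tomeggington.co.uk/wp-content/uploads/2015/05/kitten-fight.jpg">'
    ' <a href="http://www.tomeggington.co.uk/about/">About</a>'
)
rewritten, changed = rewrite_media_urls(sample_post)
print(changed)
print(rewritten)
```

Links to ordinary pages on the site are left alone, because only the uploads prefix is matched.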

And that was it. Everything was swapped over and continued working. You can check this out for yourself by taking a look at the source location of any of the images in this post, or in any other post on the site. The only difference is that I no longer have a reliance on the WordPress install directory or web app file system for my media content, and my resources are delivered to visitors from locations potentially much closer to them.

You might be wondering what my plan is going forward with regard to writing and publishing articles if I can no longer rely on the WordPress media library. In the immediate term (starting with the post you’re reading now), my writing workflow will remain largely unchanged. I will continue to use the media library to upload new images and add them into my posts, and will manually update the image references to point to the CDN. I’ll copy across the latest images from the WordPress uploads directory to blob storage prior to publishing.

In the medium term, I plan to build a simple service on top of the storage container that can act as a replacement for the features of the media library that I use. I will be able to upload images into this service instead of the media library, and it will give me back the relevant snippets of WordPress code that I can include in my posts.