by Isaac Raway
For the last two years I’ve been involved in the development and maintenance of large, feature rich add-ons for ExpressionEngine. These started out as primarily being internal projects for individual clients, but have since moved into my own work in the form of add-ons such as ProForm. During this process of managing add-ons that have a rich set of features, I’ve run into a number of issues.
The add-ons that were developed internally at work were in use by multiple developers for multiple clients at the same time. Thus, each of these developers had competing and slightly incompatible goals and feature requests—features which all had to be done now, features that I had no reason (or ability) to say no to. This situation is the same that most add-on developers go through—we want to be able to say yes to all reasonable feature requests in a timely manner. Especially when first starting out as an add-on developer, one of the most important things you can do (aside from being generally helpful) is to say “yes” as often as possible.
This behavior can lead to a lot of stress. In my early experience with this kind of complexity, I was maintaining the same add-on in several separate code bases, then manually moving changes for each version between them. For most add-on developers the situation would be slightly different, but you can indeed end up with multiple versions of the same add-on: one for important client A, one for your own site, one for the default buyers.
The time scheduling issues aside (how exactly do you get two 100 hour features done in a week?) there was, at first, no way that I could manage the competing requests.
Implementing support for one critical feature would break an older feature, which had to be rewritten to use the new API provided by the first feature and so forth. At one point, things had gotten even worse with a separate developer implementing additions and changes directly to their own copy of an add-on, while I was receiving the same requests from other projects. So I’d then have to take the time to figure out how to merge those features back into the main version, while not taking other features along that those separate projects did not need.
Add on top of this that projects might still be running a very old version for weeks (the same is true for typical external customers), and then suddenly discover a bug that needs to be fixed in a very old version of the code. Is it easier to fix the old version, manually merge the changes into the current version, and move on? Or would it be better for them to upgrade to the latest version?
This whole situation was extremely unpleasant, and clearly unsustainable. It also wasn’t something I wanted to repeat with my own projects.
While working on this initial set of large add-ons, I did have the benefit of the first of our one-two punch against managing add-on complexity: source control.
Source control was the first layer of defense against the growing stress of managing multiple projects using the same add-ons. This allows you to automatically manage multiple versions of the code, and automatically merge changes between them very easily.
Each released version is tagged with an ID that can be used to retrieve it at any time for testing, and the upgrade path for anyone using an older version is cleaner since it’s more likely that code changes will not break if they are frequently merged into a master version. By automating as much of this management as possible, you can perform merges on a daily or hourly basis without much trouble.
The second layer of defense was automating the process of doing a build. This allows me to quickly issue new builds of add-ons to internal developers as well as external customers, confident that all the details of the build (which we’ll get into in a bit) are right.
Before we go further let’s go over some basics of source control. This is an important foundation to using the automated build system.
Source Control Basics
When I first started out doing web development I had no idea what Source Control Management (SCM) was. During my first job working as a web developer I basically continued to do the same types of things that I had done in school with my own code: sort of back it up when I felt like I hit a super important milestone, but in general just not really worry too much about losing work or being able to look at the progression of a codebase.
This bit me several times. Losing several days of work is bad enough when you are just losing your own personal work, but it’s even worse when you’re losing a client’s work. This isn’t something you want to suffer, and yet many people continue to do what I originally did—at best, ad-hoc manual backups. Some of the more savvy may have realized this is a good way to get burned, and started to rely on a hosted synchronization solution of some kind such as Dropbox, or they even make use of a real backup solution such as CrashPlan.
There’s an even better way, however. That better way is using a true source control system.
A true source control system gives you a lot more than even a backup service can. It allows you to manage the change between versions of your program, manage multiple features under development separately from each other, and compare previous versions (or even revisions between releases) of your projects. This is really helpful both for add-on development as well as for normal website development - it can help you find out where something went wrong and fix problems much more quickly than you might have thought possible.
There are several options for source control, and you just have to pick the one that meets your needs:
- Subversion (SVN) - fairly simple to use and understand
- Git - only slightly more complex than Subversion, but much more powerful
There are hosted solutions for both of these tools:[1]
- BitBucket - free, unlimited number of Git repos for up to 5 collaborators
- Beanstalk - SVN and Git hosting (also provides automatic deployment for web sites)
- GitHub - Git hosting, pricing best for open source projects or a very small number of private projects
There are also some excellent GUI applications for Git and Subversion:
All source control systems have a concept of commits. These can vary in how they are executed and how they travel between your workstation and the server, but in general they represent a set of changes that progress the code towards some specific goal—be it implementing a new feature, or fixing a bug.
The main job of a source control system is to keep track of the changes made in each commit so that you can easily step back in time to see old versions of the code. This is handy since it allows you to recover code that was mistakenly deleted, recreate bugs as they existed in previous versions, and a lot of other stuff based on previous versions of the code.
Typically commits are stored in a space efficient format so that your repository (the collection of commits that represent a project’s entire history) is not actually that much larger than the source code itself.
When making each commit, all source control systems allow you to provide a commit message which is stored along with the commit. It’s a good idea to provide detailed notes of what you’ve changed, and why you’ve changed it, when making each commit.
Based on this, and the desire to keep code in the repository relatively stable, it’s a good idea to only make so-called atomic commits. These commits are indivisible in that they represent a single small set of changes to the source code that make an identifiable improvement to the project. This doesn’t mean that a new feature is done, but it does mean that a new method has been stubbed out, or has been implemented, or a new test has been added, etc. Essentially, it should be possible to actually run the code after making any particular commit—the code should be free of logic errors that break the add-on, and should certainly be free of syntax errors.
The cornerstone of most effective source control management strategies is the use of branching. A branch is, in general terms, a string of commits which are related to each other, but are perhaps unrelated to other work going on in other branches on the same project.
Branching allows you to keep multiple copies of the same source code, and switch between them quickly during development. This lets you isolate the development of new features from not only the currently deployed production code, but also from each other.
This is important so that you retain the ability to create bug fixes for the currently released version of an add-on, without being forced to release half implemented features or figure out how to remove them from the code before releasing a new build.
The next important concept in successful source control management is that of tagging. If you imagine that each change you’ve made to your source control is a single page in a book, a tag is very much like a bookmark placed on a certain page. This lets you switch back to a previous version very quickly.
Tagging is different from branching in a few ways:
- A branch only needs to exist for as long as there is work to do in it
- A tag is permanent and cannot be modified - it always represents a single point in time of the code base[2]
These distinctions allow you to rely on tagging to keep track of each released version, and reliably compare previous versions with each other.
For instance, by running a “git log” command - which returns each commit message between two tags (or other items) - I am able to see exactly what commit messages I made between two releases of one of my add-ons. Combined with disciplined use of commit messages, this allows me to quickly compile a release notes post.
Typically right before tagging, I make a final commit that modifies a root level config.php file which contains a global constant for the add-on, identifying its version number. This version number is then used in my documentation generation script to tag the current version in the footer, as well as in the ExpressionEngine UI to identify which version is installed.
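Because the constant and the tag have to agree, it can help to check one against the other before tagging. The following is a minimal sketch under stated assumptions: the article does not show config.php, so the constant name `PROFORM_VERSION`, the `define()` style, and both helper names are guesses made for illustration.

```shell
#!/bin/sh
# Sketch only. Assumes config.php contains a line such as:
#   define('PROFORM_VERSION', '1.12');
# The constant name and the define() style are illustrative guesses.

# Print the first version string found in the given config file.
read_addon_version() {
    sed -n "s/.*define(['\"][A-Z_]*VERSION['\"], *['\"]\([^'\"]*\)['\"]).*/\1/p" "$1" | head -n 1
}

# Fail (non-zero status) when the file does not declare the expected version.
# The body runs in a subshell, so its variables do not leak.
check_version_matches() (
    config_file=$1
    expected=$2
    actual=$(read_addon_version "$config_file")
    if [ "$actual" != "$expected" ]; then
        echo "config.php declares '$actual' but the tag would be '$expected'" >&2
        exit 1
    fi
)
```

Running something like `check_version_matches config.php 1.12` just before creating the tag would catch a forgotten version bump.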
Here’s how I actually go about setting up a development environment while leveraging source control for each separate add-on.
When installing an add-on, there are typically two directories that need to be copied into a site:
In order to avoid a number of issues, we want to keep each of our add-ons completely separate from each other in their own source control repositories. Due to this, we can’t actually check in this entire directory tree as-is. Instead, we need to keep all of the themes assets for the add-on inside of its main package directory.
So, during development we actually have these directories (amongst others under the package)
In order to get the theme files to still be accessible, we create a soft link between their physical location on disc and their expected location within the ExpressionEngine hierarchy. This can be done in Mac OS or Linux with a command like this, inside the
$ ln -s ../../system/expressionengine/third_party/package_name/themes ./package_name
At this point, the expected directory appears to exist to ExpressionEngine, but we are able to manage all of our themes files within the same repository located under the main package directory.
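As a rough sketch, the link setup can be wrapped in a small function so it is repeatable for each add-on. The function name `link_addon_themes` and its `site_root` argument are assumptions made for illustration; the paths follow the layout used in the command above.

```shell
#!/bin/sh
# Sketch only: link_addon_themes is a hypothetical helper, not part of
# ExpressionEngine. The body runs in a subshell, so the cd does not leak.
link_addon_themes() (
    site_root=$1   # root directory of the ExpressionEngine site
    package=$2     # add-on package name, e.g. proform

    themes_src="$site_root/system/expressionengine/third_party/$package/themes"
    if [ ! -d "$themes_src" ]; then
        echo "no themes directory at $themes_src" >&2
        exit 1
    fi

    mkdir -p "$site_root/themes/third_party" || exit 1
    cd "$site_root/themes/third_party" || exit 1

    # Leave an existing link alone so the function is safe to re-run.
    if [ ! -L "./$package" ]; then
        ln -s "../../system/expressionengine/third_party/$package/themes" "./$package"
    fi
)
```

The link target is relative, so the whole site directory can be moved without breaking it.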
In addition to this, I typically have a number of other directories containing non-deployable assets, such as:
These directories don’t need to go with the final ZIP file that’s uploaded to Devot:ee, but it’s helpful to keep them in the same source control system so I know that I can always find these development assets when I need them.
While all of this makes management of the add-on’s source and themes files a lot cleaner for us during development, it also means that we have a fairly manual process to produce a new build:
- Create an empty directory to move things into.
- Create a
- Create a
- Create a
- Create a
- Create a
- Create a
- Create a
- Make sure my development copy is pointed at the right version and has no uncommitted changes.
- Copy the current proform directory from my build tree into the empty
- Delete any files or folders that start with double underscores inside the directory - remember to check inside subdirectories!
- Delete the hidden .git directory. When I was using Subversion this was even worse - there is a hidden .svn directory inside every subdirectory as well.
- Move the themes directory out to /themes/third_party/ and rename it to
Any time I have a list of steps like this, it’s not long before I start thinking about a way to automate the process. Enter my build script.
Aside from the manual process outlined above, you may not typically think of web development as having “builds.” While we typically do not have compiled code to produce (for ExpressionEngine development at any rate), it is still a good idea to apply the principles of a build structure to our projects. There are a few reasons for doing this.
The first reason is that thinking in terms of a build draws a more defined line between the development phase of a project, and the release phase. These two phases switch back and forth on a typical add-on very quickly, but they are very distinct in their goals.
The development phase is one where any valid ideas are options. If there are enough requirements around a particular idea, we can start working on it. Basically this is the period of a project when I look for ideas to implement from my own notes and prioritized feature list, as well as customer requests and bug reports. Everything goes in—but when I work on each of these items, they get their own branches.
This is important so that I can be sure that I can make an urgent release at any time without needing to backtrack or rely on more esoteric forms of branch management.
When I’m ready to do a release, I just look at what tickets are marked as complete in my tracking system. If something wasn’t quite finished, I don’t get too worried about it. It’s more important to make stable releases than it is to push new features out the door—although that is important as well!
All of those branches get merged into the main develop branch, at which point I have what is basically a ready to go release. After some additional testing on the merged code, the changes are then merged into the master branch and tagged with a version number. At this point the release is ready to go, now all we need to do is build it.
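The merge flow described here could be scripted roughly as follows. This is only a sketch: `release_merge` is a made-up helper name, and the `--no-ff` merges and the annotated tag are assumptions rather than something this workflow prescribes. The branch names develop and master come from the text.

```shell
#!/bin/sh
# Sketch only: release_merge is a hypothetical helper. Run it inside the
# add-on repository. The body runs in a subshell, so variables do not leak.
release_merge() (
    version=$1   # version number to tag, e.g. 1.12
    shift        # remaining arguments: the finished feature branches

    git checkout -q develop || exit 1
    for branch in "$@"; do
        # --no-ff (an assumption) keeps one merge commit per feature branch.
        git merge -q --no-ff -m "Merge $branch into develop" "$branch" || exit 1
    done

    # Additional testing of the merged code would happen at this point.

    git checkout -q master || exit 1
    git merge -q --no-ff -m "Release $version" develop || exit 1

    # Annotated tag (also an assumption); a lightweight tag would work too.
    git tag -a "$version" -m "Version $version"
)
```

For example, `release_merge 1.12 feature-export feature-captcha` would merge two hypothetical feature branches and leave master tagged as 1.12, ready for the build.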
This phase also consists of final testing on the merged code. The built ZIP file is extracted, and installed into a fresh copy of ExpressionEngine to make sure everything works correctly, various upgrade tests are performed, and automated tests are run. Here we focus on stability and making sure that we have a good release. If any of these tests fail, we abort the release - fix the issue - and rerun the build. Because we could be doing this a dozen times in a day (hopefully not though!), we have yet another reason to want to automate the build process.
My build system consists of a fairly simple single PHP file, and a number of hosted git repositories. I personally use BitBucket for my repository hosting, but originally started using this system on GitHub.[3]
A build in my terminology is basically a ZIP file of a particular release version which conforms to the de facto standard for third-party ExpressionEngine add-ons. That is, it contains at the root level a README.txt file, a system/ directory, and a themes/ directory. Both of the directories contain a folder structure matching the structure of a standard ExpressionEngine install, with the various folders needed by each module in their correct locations. This makes it easy for anyone familiar with the structure of an EE site to know where to place each folder.
There are a few basic steps which go into the actual build script. It’s all relatively straightforward really, but helps save a huge amount of time when releasing each build. It’s reduced manual mistakes by a significant amount and reduces the cognitive resistance to preparing a new build of an add-on.
Here are the steps that need to be performed by the build script:
- Cleanup from any previous build of this version/tag.
- Clone the add-on into a fresh temporary working directory.
- Checkout the requested version/tag.
- Remove any .git (or .svn) meta directories.
- Remove any private directories—these start with two underscores and are never packaged into the ZIP file. Handy for keeping stuff in source control for myself but not putting it into the build.
- Create an empty staging structure that matches the folder hierarchy laid out above.
- Move the directories from the checkout into the correct hierarchy structure.
- Create a ZIP file.
- Party down.
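The steps above can be sketched in a few lines of shell. To be clear, this is not the PHP build script described in this article, only a minimal illustration of the same sequence: the function names, the explicit `repo_url` argument, the `build/` output directory, and the assumption that README.txt sits at the root of the checkout are all hypothetical. The folder layout follows the system/ and themes/ structure described earlier.

```shell
#!/bin/sh
# Sketch only; stage_addon and build_addon are hypothetical names.
# Function bodies run in subshells, so cd and variables do not leak.

# Steps 4-7: strip a fresh checkout and arrange it into the ZIP layout.
stage_addon() (
    checkout=$1   # directory holding a fresh checkout of the add-on
    package=$2    # package name, e.g. proform
    stage=$3      # directory that will receive the staged files

    # Remove source control metadata and private (double underscore) directories.
    find "$checkout" -type d \( -name .git -o -name .svn -o -name '__*' \) \
        -prune -exec rm -rf {} + || exit 1

    # Create the empty staging hierarchy.
    mkdir -p "$stage/system/expressionengine/third_party" \
             "$stage/themes/third_party" || exit 1

    # Assumption: README.txt lives at the root of the checkout.
    if [ -f "$checkout/README.txt" ]; then
        cp "$checkout/README.txt" "$stage/README.txt" || exit 1
    fi

    # Move the themes out of the package first, then the package itself.
    if [ -d "$checkout/themes" ]; then
        mv "$checkout/themes" "$stage/themes/third_party/$package" || exit 1
    fi
    mv "$checkout" "$stage/system/expressionengine/third_party/$package"
)

# Steps 1-8: clean up, clone, check out the tag, stage, and zip.
build_addon() (
    repo_url=$1
    package=$2
    version=$3
    work="build/$package-$version"

    rm -rf "$work" "build/$package-$version.zip"
    mkdir -p "$work" || exit 1
    git clone -q "$repo_url" "$work/checkout" || exit 1
    git -C "$work/checkout" checkout -q "$version" || exit 1
    stage_addon "$work/checkout" "$package" "$work/stage" || exit 1
    # Zip from inside the staging directory so paths start at system/ and themes/.
    (cd "$work/stage" && zip -qr "../../$package-$version.zip" .)
)
```

Keeping the staging logic in its own function means it can be exercised against any directory, without needing a repository or the zip tool.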
Using the script is as easy as running this command in my Terminal:
$ ./build.php proform 1.12
This kicks off the automated build process as outlined above, and produces a ZIP file ready to upload to Devot:ee. After running this copy of the code through the wringer, including unzipping it and using that version to perform a fresh install and upgrade, then running through a manual testing process (which I am working towards automating) I upload the new build. This is a lot easier than the manual process I was using before!
Since I want to provide detailed release notes with each build (which I post directly to the Devot:ee forums, a really handy site for add-on developers), I run the following command (inside the add-on repo) to get a log of the commit messages made between two version tags:
$ git log 1.11..1.12
This produces a complete log of the changes, which I copy/paste and edit to produce the release notes for each build.
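A small filter can take some of the editing out of that step. The sketch below is one assumed way to do it, not part of the build script: `format_release_notes` is a hypothetical helper that reads one commit subject per line and prints a bulleted draft, and skipping merge commits is a choice made for this example.

```shell
#!/bin/sh
# Sketch only: format_release_notes is a hypothetical helper.
# It reads commit subjects (one per line) on stdin and prints a draft.
# The body runs in a subshell, so its variables do not leak.
format_release_notes() (
    version=$1
    printf 'Release notes for %s\n' "$version"
    while IFS= read -r subject; do
        # Skip blank lines and merge commits, which rarely belong in notes.
        case $subject in
            ''|Merge\ *) continue ;;
        esac
        printf -- '- %s\n' "$subject"
    done
)

# Usage inside the add-on repository, with the tags used in this article:
#   git log --format=%s 1.11..1.12 | format_release_notes 1.12
```

`--format=%s` asks git log for just the subject line of each commit, which keeps the draft short; the full messages are still available from the plain `git log` output when more detail is needed.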
This article discusses two ideas which are extremely helpful to quickly iterating on ExpressionEngine add-ons (or really any other project):
- Pick and rely on a robust source control system such as Git or Subversion
- Use an automated build script to produce builds for release
In combination, these two practices can lead to faster development cycles and increased confidence in the quality of your add-ons.
Since I am, fairly safe to say, a PHP developer I’ve made the obvious choice of implementing this build script in PHP. Originally I had actually written the first version of this script in the Bash shell scripting language. That was quite a nightmare to expand and actually delayed releases since it was difficult to add a few things that I needed to do custom for my add-ons (I make use of a shared library that needs to be built into each add-on as well).
Rather than fight with Bash, I realized it would be much easier to simply rewrite the script in PHP. This was a good choice since PHP is a much more expressive language (for someone familiar with it, at any rate) than Bash—and it’s been much easier to maintain the build script when I wrote it in a language that I really know. So that’s the version that I’m presenting here, and I suggest taking it as a lesson as well - sometimes the most obviously good choice is actually a bit hard to see.
[1] Most of these hosting providers also provide Mercurial hosting. Mercurial (aka Hg) is another excellent source control system, however I recommend you avoid using it if possible until you’ve gotten used to Subversion or Git. Mercurial is very powerful but it is also extremely complex and can be confusing to use.
[2] Of course, as is often the case, this is only true in theory. In practice all source control systems allow you to modify tags in one form or another - Subversion (strangely) allows you to directly edit a tag, while git allows you to delete and recreate a tag at a different revision. However it’s a bad idea to make use of these capabilities since it will only lead to confusion later on when it’s unclear what version of the code a particular tag really refers to.
[3] I switched to BitBucket because their pricing scheme is based on users that have access to your repositories, known as “collaborators”, rather than based on the number of repositories you have. I currently have 30 private repositories and only two active collaborators - which would cost $50 a month on GitHub but is completely free on BitBucket. BitBucket’s UI is similar to GitHub and just as easy to use, so the choice to switch (for a frugal minded guy like me) was obvious.