I’ve been wearing the build master hat for the last several months, which means I have the pleasure of reviewing every commit from a team of developers with great scrutiny. As a result, I’ve started to notice a lot of small nuances in Sitecore that I had not noticed before. Interestingly enough, three separate issues I’ve had to track down all linked back to the same cause: developers were checking in items with orphaned data on them.
What do I mean by that? Essentially, the item holds data for a field that no longer exists on the template from which the item is derived.
How can this cause problems?
- Automated builds were failing on TeamCity
- New developers were unable to spin up new environments without getting errors when syncing content
- New language versions were appearing on both content and some of the default Sitecore tree
How does this happen?
Scenario 1: A developer creates a data template with two fields on it, plus some content derived from that template. Both are added to the TDS project and pushed to the git repo. Later on, that template is modified and one of those fields is removed. Again, that is synced to the git repo via TDS. The key thing to note, however, is that TDS will only notice the change to the template. It will not pick up a change on the items derived from that template, which now hold data referencing a field that no longer exists. This is incredibly easy to miss: TDS does not flag the discrepancy, and when you look at the item within Sitecore you will not see the orphaned data. But it is there. To see it, you must look at the serialized content on disk.
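Since the orphaned data only shows up in the serialized files, a small script can help spot it during a commit review. What follows is a minimal sketch, not something TDS provides. It assumes each field in a serialized item starts with a `----field----` marker followed by `field: {GUID}` and `name: ...` lines, and that you supply the set of field IDs you consider valid yourself (including inherited and standard fields). The function name `find_orphaned_fields` and the sample item are hypothetical.

```python
import re

# Assumed layout of a serialized item: every field block starts with a
# "----field----" marker, then a "field: {GUID}" line, then a "name: ..." line.
FIELD_BLOCK = re.compile(
    r"^----field----[ \t\r]*\nfield: (\{[0-9A-Fa-f-]+\})[ \t\r]*\nname: (.*)$",
    re.MULTILINE,
)


def find_orphaned_fields(item_text, valid_field_ids):
    """Return (field_id, field_name) pairs whose ID is not in valid_field_ids."""
    valid = {field_id.upper() for field_id in valid_field_ids}
    return [
        (field_id, name.strip())
        for field_id, name in FIELD_BLOCK.findall(item_text)
        if field_id.upper() not in valid
    ]


# Hypothetical serialized item: "Subtitle" was removed from the template,
# but its data is still present on the item.
SAMPLE_ITEM = """----item----
version: 1
id: {11111111-1111-1111-1111-111111111111}
name: Home

----field----
field: {AAAAAAAA-AAAA-AAAA-AAAA-AAAAAAAAAAAA}
name: Title
key: title
content-length: 4

Home
----field----
field: {BBBBBBBB-BBBB-BBBB-BBBB-BBBBBBBBBBBB}
name: Subtitle
key: subtitle
content-length: 3

Old
"""

# Only the Title field is still defined on the (hypothetical) template.
TEMPLATE_FIELD_IDS = {"{aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa}"}

if __name__ == "__main__":
    for field_id, name in find_orphaned_fields(SAMPLE_ITEM, TEMPLATE_FIELD_IDS):
        print(f"orphaned field: {name} {field_id}")
```

The hard part in practice is building the set of valid field IDs, since fields can come from base templates as well. Treat anything this reports as a candidate to check by hand, not as proof.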
Scenario 2: A developer is experimenting with multi-language in Sitecore and adds a new language to their local Sitecore instance; let’s say Japanese for the sake of example. Depending on how they set this up locally, this may result in a second version of each item being created for the Japanese language. Since the developer knows this is just something they’re playing with and not something that should be shared with the team, they do not commit the new language definition to the project. Later on, however, they add some other items to the project. Again for the sake of example, say they added some schedules and commands, which requires adding both those user-defined items and the parent structure to the TDS project. These items will include English as well as Japanese versions, yet the rest of the team will not have a definition for the Japanese language on their systems. Most likely this will not be noticed at the time and will only be discovered weeks or months down the road.
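A stray language like this can also be caught before it reaches the repo by checking the serialized items for language codes the team does not expect. Here is a rough sketch under the assumption that each version block in a serialized item carries a `language: <code>` header line. The function name `unexpected_languages` and the sample text are made up for illustration, and a field value that happens to contain a matching line would produce a false positive.

```python
import re

# Assumed layout: each version block in a serialized item has a header line
# of the form "language: <code>", for example "language: en".
LANGUAGE_LINE = re.compile(r"^language: ([A-Za-z-]+)[ \t\r]*$", re.MULTILINE)


def unexpected_languages(item_text, expected=("en",)):
    """Return a sorted list of language codes found that are not in `expected`."""
    allowed = {code.lower() for code in expected}
    found = {code.lower() for code in LANGUAGE_LINE.findall(item_text)}
    return sorted(found - allowed)


# Hypothetical serialized item with an English and a Japanese version.
SAMPLE_VERSIONS = """----version----
language: en
version: 1

----version----
language: ja-JP
version: 1
"""

if __name__ == "__main__":
    for code in unexpected_languages(SAMPLE_VERSIONS):
        print(f"unexpected language version: {code}")
```

Running something like this over the changed files in a commit would flag the Japanese versions from the example long before another developer trips over them.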
What does a cleanup really do?
Performing a cleanup of the database does all of the following:
- Removes all orphaned language items
- Removes orphaned fields from content (field was removed from template)
- Removes orphaned items (parent item was deleted)
- Removes unused blob fields on media items
- Rebuilds the Sitecore descendants tree
- Clears all Sitecore caches
One final note to keep in mind: this happens across the entire database. There is no way to clean a specific Sitecore node.
So should I clean the database?
I am a firm believer in the “if it ain’t broke, don’t fix it” mantra. Far too many times in my career I’ve created more work for myself by trying to fix something that only I knew to be broken. Then you find yourself explaining to a client or project manager why you ate up half a day fixing something they didn’t think was a problem.
As developers we tend to see a lot of things behind the scenes that we know could be done better, or that for whatever reason aren’t done right, and we want to make them right. But that doesn’t mean we should stop what we’re doing every time we see such an issue.
So back to the question at hand: should you clean the Sitecore database? Well, if you’re running into issues with a build server failing to complete builds, or other team members are having trouble getting their environments set up, I would say yes, clean that database. Then sync those changes back to the project so the rest of the team benefits from the cleanup. Your only other option is to manually edit the serialized items and sync those changes back, which can be very tedious.
It can also be useful if you know you’ve been making a lot of changes to a template and have removed fields that you know were in use. A good developer should ensure that their commits are not going to cause issues for other developers. So be mindful when making such changes, and make sure to sync back the cleaned-up version of any content items you’ve impacted to save everyone else a headache.
If you do choose to clean the database, you’ll find that it will probably cause a lot more changes to the project than you expected, especially if no one has cleaned it for some time. Syncing those changes will be time-consuming for you and your team, so unless there’s an actual problem you’re trying to fix, I wouldn’t create a new problem for your team to deal with. Trust me, I’ve done that 😉