Jordo Media RSS Feed Directory


View the feed - Martin Fowler's Bliki


Title: Martin Fowler
Feed URL: http://www.martinfowler.com/feed.atom
Description: A cross between a blog and wiki of my partly-formed ideas on software development
Added on: 25-Jul-2006



Steps towards REST

Last year Leonard Richardson gave a talk at QCon that included a maturity model for RESTful web services. The model is a good way to sneak up on understanding REST principles, and the authors of REST in Practice are using it to help their discussion of how to use REST. Here’s my take on explaining the model, which I found helpful in my understanding of what makes REST tick:

Richardson Maturity Model: steps toward the glory of REST


Bliki: VcsSurvey

When I discussed VersionControlTools I said that it was an unscientific agglomeration of opinion. As I was doing it I realized that I could add some spurious but mesmerizing numbers to my analysis by doing a survey. Google's spreadsheet makes the mechanics of conducting a survey really simple, so I couldn't resist.

I conducted the survey from February 23 2010 until March 3 2010 on the ThoughtWorks software development mailing list. I got 99 replies. In the survey I asked everyone to rate a number of version control tools using the following options:

  • Best in Class: Either the best VCS or equal best
  • OK: Not the best, but you're OK with it.
  • Problematic: You would argue that the team really ought to be using something else
  • Dangerous: This tool is really bad and ThoughtWorks should press hard to have it changed
  • No opinion: You haven't used it

The results were this:

Tool        Best  OK  Problematic  Dangerous  No Opinion  Active Responses  Approval %
Subversion    20  72            6          1           0                99         93%
git           65  19            1          0          14                85         99%
Mercurial     33  27            2          0          36                62         97%
ClearCase      0   3           14         41          41                58          5%
TFS            0   0           32         22          44                54          0%
CVS            0  14           59         11          15                84         17%
Bazaar         1  13            3          0          80                17         82%
Perforce       1  26           16          1          54                44         61%
VSS            1   1           11         64          22                77          3%

As well as the raw summary values, I've added two calculated columns here to help summarize the results.

  • Active Responses: The total of responses excluding "No Opinion". (eg for git: 65 + 19 + 1 + 0)
  • Approval %: The sum of best and ok responses divided by active responses, expressed as a percentage. (eg for git: (65 + 19) / 85)
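As a minimal sketch, the two calculated columns can be reproduced from the raw counts. The counts below are transcribed from the results table; the function names are my own, chosen for illustration, and percentages are rounded to the nearest whole number as the table appears to do.

```python
# Raw counts transcribed from the results table:
# tool -> (best, ok, problematic, dangerous). "No Opinion" is left out
# because neither calculated column uses it.
RESULTS = {
    "Subversion": (20, 72, 6, 1),
    "git": (65, 19, 1, 0),
    "Mercurial": (33, 27, 2, 0),
    "ClearCase": (0, 3, 14, 41),
    "TFS": (0, 0, 32, 22),
    "CVS": (0, 14, 59, 11),
    "Bazaar": (1, 13, 3, 0),
    "Perforce": (1, 26, 16, 1),
    "VSS": (1, 1, 11, 64),
}


def active_responses(best, ok, problematic, dangerous):
    """Total of responses excluding 'No Opinion'."""
    return best + ok + problematic + dangerous


def approval_percent(best, ok, problematic, dangerous):
    """Best plus OK responses as a whole-number percentage of active responses."""
    active = active_responses(best, ok, problematic, dangerous)
    if active == 0:
        return 0  # no active responses, so there is nothing to approve of
    return round(100 * (best + ok) / active)


def summarize(results):
    """Map each tool to (active responses, approval %)."""
    return {
        tool: (active_responses(*counts), approval_percent(*counts))
        for tool, counts in results.items()
    }


if __name__ == "__main__":
    for tool, (active, approval) in summarize(RESULTS).items():
        print(f"{tool:<12}{active:>4}{approval:>5}%")
```

For git this gives 85 active responses and an approval of 99%, matching the worked example in the two bullets.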

The graph shows a scatter plot of approval percentage and active responses. As you can see there's a clear cluster around Subversion, git, and Mercurial with high approval and a large number of responses. It's also clear that there's a big divide in approval between those three, together with Bazaar and Perforce, versus the rest.

Although the graph captures the headline information well, there are a couple of other subtleties I should mention.

  • Although the trio of Subversion, git, and Mercurial cluster close together on approval, git does get a notably higher number of best scores (65 versus 20 and 33).
  • VSS got the most "dangerous" responses, but a couple of people approved of it.
  • Neither TFS nor ClearCase is liked much, but ClearCase got more "dangerous" responses than TFS (41 versus 22).
  • Don't read too much into small differences as I'm sure they aren't significant. I'm sure the difference in approval percentage between VSS, TFS, and ClearCase isn't significant, but the difference between these three and the leaders is.

Some caveats. This is a survey of opinion of ThoughtWorkers who follow our internal software development discussion list, nothing more. It's possible some of them may have been biased by my previous article (although unlikely, since I've never managed to get my ThoughtBot opinion-control software to work reliably). Opinions of tools are often colored by processes that are more about the organization than the tool itself. But despite these, I think it's an interesting data point.

I should also stress the important point to take away from this isn't the comparison between those close in the numbers, eg comparing git and Mercurial or comparing TFS and ClearCase. Any survey like this has a certain amount of noise in it, and I suspect the noise here is greater than such a difference. The important point is the big approval gap between the leading tools (Subversion, git, and Mercurial) and the laggards - essentially the point in VersionControlTools.


Bliki: ToyotaFailings

One of the arguments used to support the adoption of lean techniques in software is the success of Toyota. So do Toyota's recent quality failings undermine the case for lean software development?

One answer for this is to keep a sense of proportion. Lean manufacturing techniques were the underpinning of Toyota's rise from an insignificant company in the 1950's to a global giant in the 2000's. By the 1990's other car companies, and many other manufacturers, were busily copying Toyota's techniques. The general sense is that copying these techniques did much to raise the overall quality of cars in the last decade or so. I would be very surprised if the recent problems at Toyota are enough to negate that half-century of success.

But a better answer is to remember that Lean manufacturing is about manufacturing, not software. The application of lean ideas to software development is a consequence of MetaphoricQuestioning. Lean ideas can help us come up with better ideas for software development, and as such are valuable. But in the end their usefulness lies with how they are used in software and they should be judged on their record here. Their history in manufacturing, both good and bad, belongs to another industry.


Bliki: BlueGreenDeployment

One of the goals that my colleagues and I urge on our clients is that of a completely automated deployment process. Automating your deployment helps reduce the frictions and delays that crop up in between getting the software "done" and getting it to realize its value. Dave Farley and Jez Humble are finishing up a book on this topic - Continuous Delivery. It builds upon many of the ideas that are commonly associated with Continuous Integration, driving more towards this ability to rapidly put software into production and get it doing something. Their section on blue-green deployment caught my eye as one of those techniques that's underused, so I thought I'd give a brief overview of it here.

One of the challenges with automating deployment is the cut-over itself, taking software from the final stage of testing to live production. You usually need to do this quickly in order to minimize downtime. The blue-green deployment approach does this by ensuring you have two production environments, as identical as possible. At any time one of them, let's say blue for the example, is live. As you prepare a new release of your software you do your final stage of testing in the green environment. Once the software is working in the green environment, you switch the router so that all incoming requests go to the green environment - the blue one is now idle.

Blue-green deployment also gives you a rapid way to roll back - if anything goes wrong you switch the router back to your blue environment. There's still the issue of dealing with missed transactions while the green environment was live, but depending on your design you may be able to feed transactions to both environments in such a way as to keep the blue environment as a backup when the green is live. Or you may be able to put the application in read-only mode before cut-over, run it for a while in read-only mode, and then switch it to read-write mode. That may be enough to flush out many outstanding issues.
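The cut-over and rollback described above can be sketched as a toy in-memory model. This is only an illustration under stated assumptions: the Environment and Router classes, the smoke_test callback, and the version strings are all hypothetical names, and a real router switch would be done in a load balancer or network configuration rather than in application code.

```python
class Environment:
    """One of the two production environments (hypothetical model)."""

    def __init__(self, name, version):
        self.name = name
        self.version = version

    def handle(self, request):
        return f"{self.name} (v{self.version}) handled {request}"


class Router:
    """Sends every incoming request to whichever environment is live."""

    def __init__(self, live, idle):
        self.live = live
        self.idle = idle

    def route(self, request):
        return self.live.handle(request)

    def switch(self):
        """Swap the live and idle environments."""
        self.live, self.idle = self.idle, self.live

    def release(self, new_version, smoke_test):
        """Deploy to the idle environment, test it there, then cut over.

        smoke_test is an assumed callback standing in for the final
        stage of testing; it receives the idle environment.
        """
        self.idle.version = new_version
        if not smoke_test(self.idle):
            return False  # the live environment was never touched
        self.switch()
        return True

    def rollback(self):
        """Point the router back at the previous environment."""
        self.switch()


if __name__ == "__main__":
    blue = Environment("blue", "1.0")
    green = Environment("green", "1.0")
    router = Router(live=blue, idle=green)
    router.release("1.1", smoke_test=lambda env: True)
    print(router.route("GET /"))
    router.rollback()
    print(router.route("GET /"))
```

The sketch deliberately ignores the missed-transaction problem; it only shows that release and rollback are the same switch operation.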

The two environments need to be separate but as identical as possible. In some situations they can be different pieces of hardware, or they can be different virtual machines running on the same (or different) hardware. They can also be a single operating environment partitioned into separate zones with separate IP addresses for the two slices.

An advantage of this approach is that it's the same basic mechanism as you need to get a hot-standby working. Hence this allows you to test your disaster-recovery procedure on every release. (I hope that you release more frequently than you have a disaster.)

The fundamental idea is to have two environments that are easy to switch between; there are plenty of ways to vary the details. One project did the switch by bouncing the web server rather than working on the router. Another variation would be to use the same database, making the blue-green switches for web and domain layers.

This technique has been "out there" for ages, but I don't see it used as often as it should be. Some foggy combination of Dan North and Jez Humble came up with the name.


"IT - more than Tools and Technology" track at QCon London

I’ve been a regular speaker at the QCon and JAOO conferences over the last few years. At QCon London this year, I’m involved in a somewhat off-beat track of talks. Software Developers have often had a habit of focusing primarily on the mechanics of developing software well, and not thinking much about the societal value of that software. This is a common strand in many professions, but as the influence of software development grows, I feel it’s increasingly important that we get more engaged in the consequences of the software we build.

This track is a chance to explore some of these issues. As well as myself, there are talks about some work done with UNICEF to make effective use of technology in much poorer parts of the world, the role of IT in reducing carbon footprint, and the contribution of team diversity to innovation.


Bliki: VersionControlTools

If you spend time talking to software developers about tools, one of the biggest topics I hear about is version control tools. Once you've got to the point of using version control tools, and any competent developer does, then they become a big part of your life. Version control tools are not just important for maintaining a history of a project, they are also the foundation for a team to collaborate. So it's no surprise that I hear frequent complaints about poor version control tools. In our recent ThoughtWorks technology radar, we called out two items as version control tools that enterprises should be assessing for use: Subversion and Distributed Version Control Systems (DVCS). Here I want to expand on that, summarizing many discussions we've had internally about version control tools.

But first some pinches of salt. I wrote this piece based on an unscientific agglomeration of conversations with my colleagues inside ThoughtWorks and various friends and associates outside. I haven't engaged in any rigorous testing or structured comparisons, so like most of my writing this is based on AnecdotalEvidence. My personal experience in recent years is mostly subversion and mercurial, but my usage patterns are not typical of a software development team. Overwhelmingly my contacts like to work in an agile-xp approach (even if many sniff at the label) and need tools that support that work style. I expect many people to be annoyed by this article. I hope that annoyance will lead to good articles that I can link to.

(After writing this I did do a small VcsSurvey which didn't undermine my conclusions.)

Fundamentally there are three version control systems that get broad approval: subversion (svn), git, and mercurial (hg).

Behind the Recommendability Threshold

Many tools fail to pass the recommendability threshold. There are two reasons why: poor capability or poor visibility.

Many tools garner consistent complaints from ThoughtWorkers about their lack of capability. (ThoughtWorkers being what they are, all tools, including the preferred set, get some complaints. Those behind the threshold get mostly complaints.) Two in particular generate a lot of criticism: ClearCase (from IBM) and TFS (from Microsoft). One reason they get a lot of criticism is that they are very popular on client sites, often with company policies mandating their use (I'll describe a coping strategy for that at the end).

It's fair to say that often these problems are compounded by company policies around using VCS. I've heard of some truly bizarre work-flows imposed on teams that make it a constant hurdle to get anything done. Since the VCS is the tool that enforces these work-flows, it does tend to get tarred with that brush.

I'm not going to go into details here about the problems the poor-capability tools have; that would be another article. (This has probably made me even more unpopular in IBM and Microsoft as it is.) I will, at least for the moment, leave it with the fact that developers I respect have worked extensively with, and do not recommend, these products.

The second reason for shuffling a tool behind the recommendability threshold is that I don't hear many comments about some tools. This is an issue because less-popular tools make it difficult to find developers who know how to use them or want to find out. There are many reasons why otherwise good tools can fall behind there. I used to hear people say good things about Perforce, but now the feeling seems to be that it doesn't have compelling advantages over Subversion, let alone the DVCSs. Speaking of DVCSs, there are more than just the two I've highlighted here. Bazaar, in particular, is one I occasionally hear good things about, but again I hear about it much less often than git or Mercurial.

Before I finish with those behind the threshold, I just want to say a few things about a particularly awful tool: Visual Source Safe, or as I call it: Visual Source Shredder. We see this less often now, thank goodness, but if you are using it we'd strongly suggest you get off it. Now. Not only is it a pain to use, I've heard too many tales of repository corruption to trust it with anything more valuable than foo.txt.

So this leaves three tools that my contacts are generally happy with. I find it interesting that all three are open-source. Choosing between these tools involves first deciding between a centralized or distributed VCS model and then, if you choose DVCS, choosing between git and mercurial.

Distributed or Centralized

Most of the time, the choice between centralized and distributed rests on how skilled and disciplined the development team is. A distributed system opens up lots of flexibility in work-flow, but that flexibility can be dangerous if you don't have the maturity to use it well. Subversion encourages a simple central repository model, discouraging large scale branching. In an environment that's using Continuous Integration, which is how most of my friends like to work, that model fits reasonably well. As a result Subversion is a good choice for most environments.

And although DVCSs give you lots of flexibility in how you arrange your work-flows, most people I know still base their work patterns on the notion of a shared mainline repository that's used with Continuous Integration. Although modern VCSs have almost magic tools to merge different people's changes, these merges are still just merging text. Continuous Integration is still necessary to get semantic consistency. So as a result even a team using DVCS usually still has the notion of the central master repository.

Subversion has three main downsides compared to its cooler distributed cousins.

Because distributed systems always give you a local disk copy of the whole repository, this means that repository operations are always fast as they don't involve network calls to central servers. This is a palpable difference if you are looking at logs, diffing to old revisions, and anything else that involves the full repository. If this is noticeable on my home network, it is a huge issue if your repository is on another continent - as we find with our distributed projects.

If you travel away from your network connection to the repository, a distributed system will still allow you to work with the repository. You can commit checkpoints of your work, browse history, and compare revisions on an airplane without a network connection.

The last downside is more of a social issue than a true tool issue. DVCS encourages quick branching for experimentation. You can do branches in Subversion, but the fact that they are visible to all discourages people from opening up a branch for experimental work. Similarly a DVCS encourages check-pointing of work: committing incomplete changes, that may not even compile or pass tests, to your local repository. Again you could do this on a developer branch in Subversion, but the fact that such branches are in the shared space makes people less likely to do so.

This last point also leads to the argument against a DVCS, that it encourages wanton branching, that feels good early on but can easily lead you to merge doom. In particular the FeatureBranch approach is a popular one that I don't encourage. As with similar comments earlier I must point out that reckless branching isn't something that's particular to one tool. I've often heard people in ClearCase environments complain of the same issue. But DVCSs encourage branching, and that's the major reason why I indicate that a team needs more skill to use a DVCS well.

There is one particular case where subversion is the better choice even for a team that is skilled at using a DVCS. This case is where the artifacts you're collaborating on are binary and cannot be merged by the VCS - for example Word documents or presentation decks. In this case you need to revert to pessimistic locking with single-writer checkouts - and that requires a centralized system.
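To illustrate why single-writer checkouts need a central authority, here is a minimal sketch of a lock registry. The class and method names are hypothetical and do not correspond to the commands of Subversion or any other tool; the point is only that one shared table has to decide who may write.

```python
class CheckoutError(Exception):
    """Raised when a lock cannot be taken or released."""


class LockingRepository:
    """Central registry recording who holds the write lock on each file.

    A hypothetical model of pessimistic locking: every user must ask
    the same registry, which is why a centralized system is needed.
    """

    def __init__(self):
        self._locks = {}  # path -> user currently holding the lock

    def checkout(self, path, user):
        """Take the single-writer lock, failing if someone else holds it."""
        holder = self._locks.get(path)
        if holder is not None and holder != user:
            raise CheckoutError(f"{path} is locked by {holder}")
        self._locks[path] = user

    def checkin(self, path, user):
        """Release the lock; only the current holder may do so."""
        if self._locks.get(path) != user:
            raise CheckoutError(f"{user} does not hold the lock on {path}")
        del self._locks[path]

    def holder(self, path):
        """Return the user holding the lock, or None if the file is free."""
        return self._locks.get(path)
```

A second user asking for the same presentation deck is refused until the first checks it in, so the unmergeable binary file only ever has one writer.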

Git or Mercurial

So if you're going to go the DVCS route - which one should you choose? Mercurial and git get most of the attention, so I feel the choice is between them. Then the choice boils down to power versus usability, with a dash of mind-share and the shadow of github.

Git certainly seems to be liked for its power. Folks go ga-ga over its near-magical ability to do textual merges automatically and correctly, even in the face of file renames. I haven't seen any objective tests comparing merge capabilities, but the subjective opinion favors git.

(Merge-through-rename, as my colleague Lucas Ward defines it, describes the following scenario. I rename Foo.cs to Bar.cs, Lucas makes some changes to Foo.cs. When we merge, his changes are correctly applied to Bar.cs. Both git and Mercurial handle this.)

For many, git's biggest downside is its oft-cryptic commands and mental model. Ben Butler-Cole phrased it beautifully: "there is this amazingly powerful thing writhing around in there that will basically do everything I could possibly ask of it if only I knew how." To its detractors, git lacks discoverability - the ability to gradually infer what it does from its apparent design. Git's advocates say that much of this is because it uses a different mental model to other VCSs, so you have to do more to unlearn your knowledge of VCS to appreciate git. Whatever the reason, git seems to be attractive more to those who enjoy learning the internals while mercurial seems to appeal more to those who just want to do version control.

The shadow of github is important here. Even git-skeptics rate it as a superb place for project collaboration. Mercurial's equivalent, bitbucket, just doesn't inspire the same affection. However there are other sites that may begin to close the gap, in particular Google Code and Microsoft's Codeplex. (I find Codeplex's use of Mercurial very encouraging. Microsoft is often, rightly, criticized for not collaborating well with complementary open source products. Their use of Mercurial on their open-source hosting site is a very encouraging sign.)

Historically git worked poorly on Windows, poorly enough that we'd not suggest it. This has now changed, provided you run it using msysgit and not cygwin. Our view now is that msysgit is good enough to make comparison with Mercurial a non-issue for Windows.

People generally find that git handles branching better than Mercurial, particularly for short-lived branches for experimentation and check-pointing. Mercurial encourages other mechanisms, such as fast cloning of separate repository directories and queue patching, but git's branching is a simpler and better model.

Mercurial does seem to have an issue with large binary files. My general suggestion is that such things are usually better managed with subversion, but if you have too few of them to warrant separate management, then Mercurial may get hung up by the few that you have.

Multiple VCS

There's often value to using more than one VCS at the same time. This is generally where there is a wider need to use a less capable VCS than your team wants to use.

The case that we run into frequently is where there is a corporate standard for a deficient VCS (usually ClearCase) but we wish to work efficiently. In that case we've had success using a different VCS for day-to-day team work and committing to the corporate VCS when necessary. So while the team VCS will see several commits per person per day, the corporate VCS sees a commit every week or two. Often that's what the corporate admins prefer in any case. Historically we've done this using svn as the local VCS but in the future we're more likely to use a DVCS for local fronting.

This dual usage scenario is also common with git-svn where people use git locally but commit to a shared svn system. Git-svn is another reason for preferring git over mercurial. Using a local DVCS is particularly valuable for remote site working, where network outages and bandwidth problems can cripple a site that's remote from a centralized VCS.

A lot of teams can benefit from this dual-VCS working style, particularly if there's a lot of corporate ceremony enforced by their corporate VCS. Using dual-VCS can often make both the local development team happier and the corporate controllers happier as their motivations for VCS are often different.

One Final Note

Remember that although I've jabbered on a lot about tools here, often it's the practices and workflows that make a bigger difference. Tools can certainly make it much easier to use a good set of practices, but in the end it's up to the people to use an effective way of working for their environment. I like to see approaches that allow many small changes that are rapidly integrated using Continuous Integration. I'd rather use a poor tool with CI than a good tool without.


Texas Speaking Events Rescheduled

The family medical issue has been resolved happily, so I’m free to go back on the road. We’ve thus rescheduled the events I was supposed to do last month in Texas.

  • On February 23rd I’ll be speaking at DFW Scrum in Dallas.
  • On February 25th ThoughtWorks is organizing a technology forum in Austin.

As is usual for me, I haven’t planned exactly what I’ll talk about yet, but it’ll revolve around my usual topics of software design and agile methods.


Bliki: ConversationalStories

Here's a common misconception about agile methods. It centers on the way user stories are created and flow through the development activity. The misconception is that the product owner (or business analysts) creates user stories and then puts them in front of developers to implement. The notion is that this is a flow from product owner to development, with the product owner responsible for determining what needs to be done and the developers how to do it.

A justification for this approach is that this separates the responsibilities along the lines of competence. The product owner knows the business, what the software is for, and thus what needs to be done. The developers know technology and know how to do things, so they can figure out how to realize the demands of the product owner.

This notion of product owners coming up with DecreedStories is a profound misunderstanding of the way agile development should work. When we were brainstorming names at Snowbird, I remember Kent suggesting "conversational". This emphasized the fact that the heart of our thinking was of an on-going conversation between customers and developers about how a development project should proceed.

In terms of coming up with stories, what this means is that they are always something to be refined through conversation - and that developers should play an active role in helping that definition by:

  • spotting inconsistencies and gaps between the stories
  • using technical knowledge to come up with new stories that seem to fit the product owner's vision
  • seeing alternative stories that would be cheaper to build given the technological landscape
  • splitting stories to make them easier to plan or implement

This is the Negotiable principle in Bill Wake's INVEST test for stories. Any member of an agile team can create stories and suggest modifications. It may be that just a few members of a team gravitate to writing most of the stories. That's up to the team's self-organization as to how they want that to happen. But everyone should be engaged in coming up with and refining stories. (This involvement is in addition to the developers' responsibility to estimate stories.)

The product owner does have a special responsibility. In the end the product owner is the final decider on stories, particularly their prioritization. This reflects the fact that the product owner should be the best person to judge that slippery attribute of business value. But having a final decision maker should never stop others from participating, and should not lead people astray into a decreed model of stories.


Bliki: DslBookRoadmap

Time for another update on my DSL book's progress, since I've not been writing anything else recently.

I had my first round of technical review late in 2009 and have been incorporating comments into the current drafts. Progress on this has gone well, in large part because travel is light this time of the year. I'm also integrating my book production process into that of Pearson's.

The next visible targets are a second round of technical review and the launching of a roughcut. We're hoping to get these going in the next couple of months. The roughcut will also allow people other than official reviewers the chance to throw rocks at the text.

After that the material will be gradually readied for production. We're going to use a much more incremental process than I've used before, which will be both good and interesting. My sense at the moment is that we'll see a physical book on bookshelves by the final quarter of 2010. It's currently looking at around 500 pages total in a DuplexBook split 150/350.

The material currently on my web-site was last updated in June. While I've done quite a lot of detailed work on the book since, the broad structure is pretty similar, so the website gives a reasonably good picture of the scope of content.


Apologies for Canceling Texas Speaking Events

I’m afraid I’ve had to cancel my speaking events in Dallas and Austin next week due to a family medical problem. As I write this, it’s not clear how serious the problem is going to be, but there is a good chance that I won’t be able to travel to Texas next week. As a result we felt it was best to cancel the events, while we still have a few days’ notice. We do intend to reschedule as soon as the dust settles. My Texas ThoughtWorkers are very keen to have me come out and do these talks, so we want to do them as soon as we reasonably can.

My apologies for this, and I hope you understand. In particular I want to thank the various collaborators in organizing these events for being very understanding under the awkward circumstances.


Bliki: TechnicalDebtQuadrant

There have been a few posts over the last couple of months about TechnicalDebt that have raised the question of what kinds of design flaws should or shouldn't be classified as Technical Debt.

A good example of this is Uncle Bob's post saying a mess is not a debt. His argument is that messy code, produced by people who are ignorant of good design practices, shouldn't be a debt. Technical Debt should be reserved for cases when people have made a considered decision to adopt a design strategy that isn't sustainable in the longer term, but yields a short term benefit, such as making a release. The point is that the debt yields value sooner, but needs to be paid off as soon as possible.

To my mind, the question of whether a design flaw is or isn't debt is the wrong question. Technical Debt is a metaphor, so the real question is whether or not the debt metaphor is helpful in thinking about how to deal with design problems, and how to communicate that thinking. A particular benefit of the debt metaphor is that it's very handy for communicating to non-technical people.

I think that the debt metaphor works well in both cases - the difference is in the nature of the debt. A mess is a reckless debt which results in crippling interest payments or a long period of paying down the principal. We have a few projects where we've taken over a code base with a high debt and found the metaphor very useful in discussing with client management how to deal with it.

The debt metaphor reminds us about the choices we can make with design flaws. The prudent debt to reach a release may not be worth paying down if the interest payments are sufficiently small - such as if it were in a rarely touched part of the code-base.

So the useful distinction isn't between debt or non-debt, but between prudent and reckless debt.

There's another interesting distinction in the example I just outlined. Not only is there a difference between prudent and reckless debt, there's also a difference between deliberate and inadvertent debt. The prudent debt example is deliberate because the team knows they are taking on a debt, and thus puts some thought as to whether the payoff for an earlier release is greater than the costs of paying it off. A team ignorant of design practices is taking on its reckless debt without even realizing how much hock it's getting into.

Reckless debt may not be inadvertent. A team may know about good design practices, even be capable of practicing them, but decide to go "quick and dirty" because they think they can't afford the time required to write clean code. I agree with Uncle Bob that this is usually a reckless debt, because people underestimate where the DesignPayoffLine is. The whole point of good design and clean code is to make you go faster - if it didn't people like Uncle Bob, Kent Beck, and Ward Cunningham wouldn't be spending time talking about it.

Dividing debt into reckless/prudent and deliberate/inadvertent implies a quadrant, and I've only discussed three cells. So is there such a thing as prudent-inadvertent debt? Although such a thing sounds odd, I believe that there is - and it's not just common but inevitable for teams that are excellent designers.

I was chatting with a colleague recently about a project he'd just rolled off from. The project delivered valuable software, the client was happy, and the code was clean. But he wasn't happy with the code. He felt the team had done a good job, but now they realize what the design ought to have been.

I hear this all the time from the best developers. The point is that while you're programming, you are learning. It's often the case that it can take a year of programming on a project before you understand what the best design approach should have been. Perhaps one should plan projects to spend a year building a system that you throw away and rebuild, as Fred Brooks suggested, but that's a tricky plan to sell. Instead what you find is that the moment you realize what the design should have been, you also realize that you have an inadvertent debt. This is the kind of debt that Ward talked about in his video.

The decision of paying the interest versus paying down the principal still applies, so the metaphor is still helpful for this case. However a problem with using the debt metaphor for this is that I can't conceive of a parallel with taking on a prudent-inadvertent financial debt. As a result I would think it would be difficult to explain to managers why this debt appeared. My view is this kind of debt is inevitable and thus should be expected. Even the best teams will have debt to deal with as a project goes on - even more reason not to recklessly overload it with crummy code.


Bliki: FeatureBranch

With the rise of Distributed Version Control Systems (DVCS) such as git and Mercurial, I've seen more conversations about strategies for branching and merging and how they fit in with Continuous Integration (CI). There's a bit of confusion here, particularly on the practice of feature branching and how it fits in with CI.

Simple (isolated) Feature Branch

The basic idea of a feature branch is that when you start work on a feature (or story if you prefer that term) you take a branch of the repository to work on that feature. In a DVCS, you'll do this in your personal repository, but the same kind of thing works in a centralized VCS too.

I'm going to illustrate this with a series of diagrams. I have a shared project mainline, colored blue, and two developers, colored purple and green (since the developers' names are Reverend Green and Professor Plum).

I'm using labeled colored boxes (eg P1 and P2) to represent local commits on the branch. Arrows between branches represent merges; the merge boxes are colored orange to make them stand out. In this case there are updates, say a couple of bug-fixes, applied to the mainline (presumably by Mrs Peacock). When these happen our developers merge them into their work. To give this a sense of time, I'll assume we're looking at a few days' work here, with each developer committing to their local branch roughly once a day.

In order to ensure things are working properly, they can run builds and tests on their branch. Indeed for this article I'll assume that each commit and merge comes with an automated build and test on the branch it's on.

The advantage of feature branching is that each developer can work on their own feature and be isolated from changes going on elsewhere. They can pull in changes from the mainline at their own pace, ensuring they don't break the flow of their feature. Furthermore it allows the team to choose its features for release. If Reverend Green takes too long, we can release with just Professor Plum's changes. Or we may want to delay Professor Plum's feature, perhaps because we are uncertain that the feature works the way we want to release it. In this case we just tell the professor to not merge his changes into mainline until we are ready for the feature. This is called cherry-picking: the team decides which features to merge in before release.

Attractive though that picture looks, there can be trouble ahead.

Although our developers can develop their features in isolation, at some point their work does have to be integrated. In this case Professor Plum easily updates the mainline with his own changes. There's no merge here because he's already incorporated the mainline changes into his own branch (there will be a build). Things are however not so simple for Reverend Green, he needs to merge all of his changes (G1-6) with all of Professor Plum's (P1-5).

(At this point many users of DVCSs may feel I'm missing something as this is a simple, perhaps simplistic view of feature branching. I'll get to a more involved scheme later.)

I've made this a big merge box as it's a scary merge. It may be just fine, the developers may have been working on completely separate parts of the code base with no interaction, in which case the merge will go smoothly. But they may be working on bits that do interact, in which case here lye dragons.

The dragons can come in many forms, and tooling can help slay some of them. The most obvious dragon is the complexity of merging the source code and dealing with conflicts as developers edit the same files. Modern DVCSs actually handle this rather well, indeed somewhat magically. Git has quite the reputation for dealing with complicated merges. So much so that the textual issues of merging are much better than they used to be - indeed I'll go so far as to discount textual conflicts for the purposes of this article.

The problem I worry more about is a semantic conflict. A simple example of this is if Professor Plum changes the name of a method that Reverend Green's code calls. Refactoring tools allow you to rename a method safely, but only on your code base. So if G1-6 contain new code that calls foo, Professor Plum can't tell in his code base as he doesn't have it. You only find out on the big merge.

A function rename is a relatively obvious case of a semantic conflict. In practice they can be much more subtle. Tests are the key to discovering them, but the more code there is to merge the more likely you'll have conflicts and the harder it is to fix them. It's the risk of conflicts, particularly semantic conflicts, that makes big merges scary.
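To make the rename case concrete, here is a minimal Python sketch. All the names in it (LibraryOnMainline, LibraryAfterPlum, green_feature, bar) are hypothetical and chosen only for illustration; the article names only foo. Each branch is fine on its own, and a textual merge would report no conflict because the two developers touched different files, yet the combined code fails as soon as it runs.

```python
# Hypothetical names throughout; only "foo" comes from the article.

class LibraryOnMainline:
    """The shared class as both developers saw it when they branched."""

    def foo(self):
        return 42


class LibraryAfterPlum:
    """Professor Plum's branch: foo is renamed to bar.

    His refactoring tool updated every caller in *his* code base.
    """

    def bar(self):
        return 42


def green_feature(library):
    """Reverend Green's new code (G1-6), written against the old name."""
    return library.foo() + 1


def merged_build_passes():
    """Simulate the big merge: Green's caller meets Plum's renamed class."""
    try:
        green_feature(LibraryAfterPlum())
        return True
    except AttributeError:
        # No textual conflict was reported, but the call no longer resolves.
        return False
```

On Green's own branch, green_feature(LibraryOnMainline()) works. Only a test run after the merge, such as merged_build_passes(), exposes the problem, which is why the tests matter more than the merge tool here.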

This fear of big merges also acts as a deterrent to refactoring. Keeping code clean is constant effort, to do it well it requires everyone to keep an eye out for cruft and fix it wherever they see it. However this kind of refactoring on a feature branch is awkward because it makes the Big Scary Merge worse. The result we see is that teams using feature branches shy away from refactoring which leads to uglier code bases.

Continuous Integration

It's these problems that Continuous Integration was designed to solve. With Continuous Integration my diagram looks like this.

There's a lot more merging going on here, but merging is one of those things that's much easier to do frequently and small rather than rarely and large. As a result if Professor Plum is changing some code that Reverend Green relies on, the Reverend will find it early, such as when he merges in P1-2. At that point he's only got to modify G1-2 to work with the changes, rather than G1-6.

CI is effective at removing the problem of big merges, but it's also a vital communication mechanism. In this scenario the potential conflict will actually appear when Professor Plum merges G1 and realizes that Reverend Green is actively building on Plum's libraries. At this point Professor Plum can go and find Reverend Green and they can discuss how their two features interact. It may be that Professor Plum's feature requires some changes that don't mesh well with Reverend Green's changes. By looking at both their features they can come up with a better design that affects both their work-streams. With the isolated feature branches our developers don't discover this till late, probably too late to do much about it. Communication is one of the key factors in software development and one of CI's most important features is that it facilitates human communication.

It's important to note that, most of the time, feature branching like this is a different approach to CI. One of the principles of CI is that everyone commits to the mainline every day. So unless feature branches only last less than a day, running a feature branch is a different animal to CI. I've heard people say they are doing CI because they are running builds, perhaps using a CI server, on every branch with every commit. That's continuous building, and a Good Thing, but there's no integration, so it's not CI.

Promiscuous Integration

Earlier I said parenthetically that there are other ways of doing feature branching. Say Professor Plum and Reverend Green take tea together early in the cycle. While chatting they discover they are working on features that interact. At this point they may choose to integrate with each other directly, like this.

With this approach they only push to the mainline at the end, as before. But they merge frequently with each other, so this avoids the Big Scary Merge. The point here is that the primary issue with the isolated feature branching scheme is its isolation. When you isolate the feature branches, there is a risk of a nasty conflict growing without you realizing it. Then the isolation is an illusion, and will be shattered painfully sooner or later.

So is this more ad-hoc integration a form of CI or a different animal entirely? I think it is a different animal, again a key point of CI is everyone integrates to the mainline every day. Integrating across feature branches, which I shall call promiscuous integration (PI), doesn't involve or even need a mainline. I think this difference is important.

I see CI as primarily giving birth to a release candidate at each commit. The job of the CI system and deployment process is to disprove the production-readiness of a release candidate. This model relies on the need to have some mainline that represents the current shared, most up to date picture of complete.

--Dave Farley

Promiscuous Integration vs Continuous Integration

So if it's different is PI better than CI, or more realistically under what circumstances is PI better than CI?

With CI, you lose the ability to use the VCS to do cherry picking. Every developer is touching mainline, so all features grow in the mainline. With CI, the mainline must always be healthy, so in theory (and often in practice) you can safely release after any commit. Having a half built feature or a feature you'd rather not release yet won't damage the other functionality of the software, but may require some masking if you don't want it to be visible in the user-interface. This can be as simple as not including a menu item in the UI to trigger the feature.

PI can provide some middle ground here. It allows Reverend Green the choice of when to incorporate Professor Plum's changes. If Professor Plum makes some core API changes in P2, then Reverend Green can import P1-2 but leave the others until Professor Plum's feature is put onto the release.

One worry with all this picking and choosing is that PI makes it really hard to keep track of who has what in their branch. In practice, it seems tooling pretty much solves this problem. DVCSs keep a clear track of changes and their origins and can figure out that when Professor Plum pulls G3 he already has G2 but doesn't have B2. I may have made mistakes drawing the diagram by hand, but tools do keep track of these things well.

On the whole, however, I don't think cherry-picking with the VCS is a good idea.

Feature Branching is a poor man's modular architecture, instead of building systems with the ability to easy swap in and out features at runtime/deploytime they couple themselves to the source control providing this mechanism through manual merging.

--Dan Bodart

I much prefer designing the software in such a way that makes it easy to enable or disable features through configuration changes. My colleague Paul Hammant calls this Branch by Abstraction. This requires you to put some thought into what needs to be modularized and how to control that variation, but we've found the result to be far less messy than relying on the VCS.
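As a rough illustration of controlling a feature through configuration rather than through the VCS, here is a minimal Python sketch. The FeatureConfig class, the build_menu function and the "export_pdf" feature name are all assumptions made up for this example; a real system would read the enabled set from whatever configuration mechanism it already has.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureConfig:
    """Hypothetical holder for the set of feature names that are switched on."""

    enabled: set = field(default_factory=set)

    def is_on(self, name):
        return name in self.enabled


def build_menu(config):
    """Build the UI menu, masking any feature that is not enabled."""
    items = ["Open", "Save"]
    # The half-built feature lives on the mainline but stays invisible
    # until the configuration turns it on.
    if config.is_on("export_pdf"):
        items.append("Export PDF")
    return items
```

With this in place, build_menu(FeatureConfig()) leaves the unfinished feature out of the menu, while build_menu(FeatureConfig({"export_pdf"})) includes it, and no merge is needed to switch between the two.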

The main thing that makes me nervous about PI is the influence on human communication. With CI the mainline acts as a communication point. Even if Professor Plum and Reverend Green never talk, they will discover the nascent conflict - within a day of it forming. With PI they have to notice they are working on interacting code. An up-to-date mainline also makes it easy for someone to be sure they are integrating with everyone, they don't have to poke around to find out who is doing what - so less chance of some changes being hidden until a late integration.

PI arose out of open-source work, and it could be that the less intensive tempo of open-source could be a factor here. In a full time job, you work several hours a day on a project. This makes it easier for features to be worked in priority. With an open source project people often put in an hour here, and the next hour a few days later. A feature may take one developer quite a while to complete while other developers with more time are able to get features into a releasable state earlier. In this situation cherry picking can be more important.

It's important to realize that the tools you use are largely independent of the integration strategy you use. Although many people associate DVCSs with feature branching, they can be used with CI. All you need to do is mark one branch on one repository as the mainline. If everyone pulls and pushes to that every day, then you have a CI mainline. Indeed with a disciplined team, I would usually prefer to use a DVCS on a CI project than a centralized one. With a less disciplined team I would worry that a DVCS would nudge people towards long lived branches, while a centralized VCS and a reluctance to branch nudges them towards frequent mainline commits. Paul Hammant may be right: "I wonder though, if a team should not be adept with trunk-based development before they move to distributed."


Bliki: DigitalSLR

Like many geeks I'm into photography. We geeks like photography because it provides the veneer of an artistic endeavor while allowing us to indulge in lots of technical details and spend money on expensive toys. A friend recently asked about my camera buying decisions, a question that prompted me to write them down. I got my first digital SLR a year ago. Before that I had owned a film SLR for many years, but started using digital cameras around 2000. I found the convenience of digital to be compelling and stopped using the film camera. I toyed with getting a digital SLR in 2004, but instead decided on a high end fixed lens camera - the Minolta A1. I enjoyed using it, but it conked out late in 2007. I considered a similar kind of camera, something like a Canon S5, but decided to bite the SLR bullet.

My first decision, and a critical one, was which system to buy. This is the critical decision as it's difficult (ie expensive) to reverse. Once you pick your system, you'll then commit money to it by buying lenses and the cost of switching is more than a dabbler like me can go with. I felt that the best choice was to go with the big two - Canon or Nikon. The choice between them was pretty much arbitrary, I ended up choosing Canon because a friend we occasionally vacation with has a Canon. A trifling distinction, but really the choice between the two wasn't a big one.

I'm still reasonably happy with it. One misgiving is that the technological advantage seems to have tipped in Nikon's favor over the last year, at least according to the blogs I read, but it's a tight race and Canon could well come back. I've also been recently intrigued by the new Micro Four Thirds format. Early days (and not around last year) but the small size and weight are very important to me.

With Canon as the choice, the next step was the initial choice of body and lenses. My approach was to get pretty much the cheapest body I could (the Digital Rebel XTI) because I'd rather spend more money on lenses than on the body. The whole point of SLRs is to have good lenses, so I'd rather concentrate my limited dollars there. Cameras also get upgraded much more frequently, so I'm likely to upgrade the camera in a few years while lenses stay current for much longer.

So which lenses? I forwent the kit lens and got the camera body-only. As my main lens I went with a mega-zoom, the Sigma 18-200. Serious photographers will, probably rightly, turn their noses up at this lens. But I'm a dabbler. Most of my photos will only be seen on my screensaver or on a web page. A few get printed for a wall of our house, but only on a regular letter size printer. So I doubt that I'd appreciate the difference of a higher quality lens. Furthermore I can shoot within its limits. Reviews suggest that if you stick to f9, the quality stays pretty good. Since I'm mostly using it outside during the day, that limitation is easy to live with. As a result I tend to set my camera to aperture priority with f9, and that covers most of my shots.

The advantages of a single mega-zoom are considerable to me. Most of my photographs are taken while I'm doing something else, often with others around. I don't go out much to just shoot. In that situation even changing lenses can be a significant deterrent to getting a shot. Furthermore size and weight are a big deal when I'm travelling. While the lens isn't exactly svelte, it's much more compact than the alternative ways of getting that kind of zoom range. A final bonus is that it's image stabilized, which allows me to use it for static interiors.

The mega-zoom stays on my camera most of the time, but it wasn't the only lens I got with my camera. I also picked up the f1.8 50mm. This is an easy lens to get, very cheap, very light, very small but produces great quality. Since it's the equivalent of an 80mm on 35mm film, it's ideal for portrait photos - particularly with the f1.8 aperture. I use it a lot for shooting people in low light conditions.

I toyed with other lenses, but I wanted to get used to those two before I plonked money on any more.

After a few months with the camera I turned my eyes to a tripod. There are varying views on the net about tripods, some feel you should only use them if you really have to, some that you should use them whenever you can. I do like having one around, particularly for crepuscular shooting. I had a cheap and crummy silk tripod, but Duncan's blog persuaded me that I should get something better. I didn't go for his preferred Gitzos (beyond my budget) but I did get a light Induro tripod, together with a Really Right Stuff head and fast release clamp.

I went for the lightest setup I could get, as I wanted something that I'd actually be prepared to carry around and my camera/lens combos aren't particularly heavy. The fast release clamp was important as I'm someone who likes to move around when shooting and such a clamp makes a big difference. In hindsight I wish I'd paid the extra for an L clamp, as I do find it frustrating to futz with the head when switching orientations.

It was only a month or so more before I went for another lens. A trip out to Colorado and Utah was the trigger to think about something wider than the 18-200 would go. I considered the Tokina 11-16 and the Canon 10-22, going for the latter due to it being lighter. It's a fun lens to use, allowing a few different things than what my regular lens provides. In particular what's interesting to work with is the huge depth of field you can get with an ultra wide: at 10mm you can easily get everything from a foot to infinity.

This is probably a reasonable moment to talk about filters. There's a good bit of discussion on the net about whether putting on a UV filter is worthwhile. I decided to get one for the 18-200 as it's on my camera so much, but not to get ones for my other lenses as I use them much less and am prepared to be more careful when those lenses are on the camera. For the mega-zoom I also picked up a polarizing filter, which I carry around with me all the time, but frequently forget to use.

The other issue that obsesses camera people is how to carry all this stuff. All things being equal, I like weight on my waist. So I went for a waist belt (from Tamrac, due to the double belt layout) and a Think Tank holster. I like the Think Tank's ability to extend when I have the hood on my lens. The only problem is that there are plenty of occasions when a waist belt isn't an option. The holster comes with a shoulder strap, which is fine, but I usually want the 10-22 as well. Cindy came to the rescue, sewing some straps onto the side of the holster so I can attach a lens pouch.

To keep track of my photos, and to do some post-processing, I got a copy of Apple's Aperture. (It seemed a toss up between Aperture and Lightroom.) I find it works well, better than sticking with iPhoto.

The latest lens I added to my collection is the Canon f2 100mm. I got this for shooting indoors, particularly at conferences for shooting someone on stage. In those situations I need more reach than the 50mm, but I still want a really fast aperture at a price and weight that's rather less than the serious zooms. So far I've only used the 100mm a couple of times, but have been very happy with it.

That burst of buying isn't something I expect to maintain. The quartet of lenses I have is pretty suited to my needs. There are some more I'm eyeing. The Canon 100-400mm zoom would be great for wildlife shots, but frankly we're rarely in the situation where I'd use it, so it's hard to justify its high cost. A different situation that regularly tickles my mind is cases where I'm primarily at a conference (so have the 50 and 100mm) but don't want to lug the 18-200 and want to have something wider. I could take the 10-22, but that leaves a gap and is less light than I'd like. Ironically this suggests the (now updated) kit zoom which is cheap and light. The primes less than 50mm are either too heavy, too expensive, or seem to have less quality than the kit zoom.

If you're curious, here are the results (because there just aren't enough holiday snaps on the web).


Bliki: SelfInitializingFake

One of the classic cases for using a TestDouble is when you call a remote service. Remote services are usually slow and often unreliable, so using a double is a good way to make your tests faster and more stable.

When you're querying a remote service, you need to find a way to load the expected data into your double. One way to do this is to use what I'm dubbing a self-initializing fake. The basic plan is simple. The first time you call the fake it passes the call onto the actual remote service, and as it returns the data it takes and saves a copy. Further calls just return the copy. In a sense this is like a cache, but with the important difference that there is no attempt to handle cache invalidation, which is handy as that's one of the TwoHardThings.
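The basic plan can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a definitive implementation: the class name, the get method and the remote_call parameter are all hypothetical, and the saved copies live in an in-memory dictionary, whereas a fake used across test runs would presumably persist them somewhere.

```python
class SelfInitializingFake:
    """Stand-in for a remote service that records real responses on first use.

    remote_call is any callable that performs the real (slow) request.
    """

    def __init__(self, remote_call):
        self._remote_call = remote_call
        self._saved = {}  # request -> saved copy of the response

    def get(self, request):
        if request not in self._saved:
            # First call: pass it on to the actual service and keep a copy.
            self._saved[request] = self._remote_call(request)
        # Further calls just return the copy; nothing is ever invalidated.
        return self._saved[request]
```

Tests talk to the fake exactly as they would to the service. Only the first request for a given value reaches the real thing, so later calls are fast and always give the same answer.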

I've called this a fake, as that seems the closest fit from the various varieties of test doubles. The other reasonable alternative is a stub, but the distinction here is that a stub needs setting up when you build the fixture, while fakes are autonomous.

The interesting thing about a self-initializing fake is how you deal with situations where the remote service changes its response.

One time I saw this approach was with a database controlled by another application. In this case the data did change, frequently. This is unhelpful for tests, because automated tests rely on getting the same answers to the same questions. But usually tests don't care whether the data is up to date or not, so saving an old value worked just fine.

I ran into this again recently while chatting with my colleague Josh Price. In his case the remote data was supposedly static, but occasionally there were changes, which would imply that the system he was developing needed to change - usually to handle formatting issues. In this case he had a special test suite that would get all self-initializing fakes to call the remote service and check that they returned the same value that was saved.
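A check of that kind might look something like the following Python sketch. It is a guess at the shape of such a suite rather than a description of Josh's actual code: the function name and the idea of holding the saved values in a plain dictionary are assumptions for illustration.

```python
def find_stale_entries(saved_responses, remote_call):
    """Return the requests whose live response no longer matches the saved copy.

    saved_responses maps each request to the value the fake recorded;
    remote_call performs the real request against the remote service.
    """
    stale = []
    for request, saved in sorted(saved_responses.items()):
        if remote_call(request) != saved:
            stale.append(request)
    return stale
```

A special test suite could then assert that find_stale_entries returns an empty list, and a non-empty result tells the team which saved values to look at before changing their own system.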

In this case early stages of their build pipeline ran against the fake, and the last (slowest) stage ran against the service itself. One interesting problem was that the remote service required some unimportant parameters which changed from call to call but didn't alter the results. These were stripped out of the URL when the fake looked the values up from the store.
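Stripping the unimportant parameters before the lookup could be sketched as below, using Python's standard urllib.parse module. The parameter names in VOLATILE_PARAMS and the example URL are invented for illustration; the article does not say which parameters were involved.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names for parameters that change from call to call
# without altering the results.
VOLATILE_PARAMS = {"timestamp", "request_id"}


def lookup_key(url):
    """Reduce a request URL to the key used to find its saved response."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in VOLATILE_PARAMS
    ]
    kept.sort()  # parameter order should not affect the key either
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

Two requests that differ only in, say, their timestamp then map to the same key, so the fake returns the same saved value for both.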

(Thanks to Josh Price, Darren Cotterill, and Gerard Meszaros for their help with this piece.)



