Sunday, January 09, 2011

Stylistic defaults and awareness: a neglected college lesson

I am a young SDE.  I’ve worked in “industry” for Fog Creek Software for a little under two years now (including my internship), which means that the bulk of my software development career occurred in college. 

Although most of my development time may have occurred in college, the vast majority of my productivity has occurred while working at Fog Creek.  This is interesting to me, and I’ve picked up many practices in industry which were generally outside of my programming repertoire as a student.

Some of these items are simple and obvious: fluency in debuggers, code editors/IDEs, version control systems, and so on.

Other items are less obvious, or at least they were to me.  One such item is programming style.  To be specific, I’m talking about what would commonly be documented in a “Coding Style Guide” for a given language.  In programming teams of any scale, language-specific style guides are extremely beneficial, so much so that companies like Google will have style tests that must be passed before a programmer is allowed to develop in that language.  While I think mandates of that nature can be beneficial, I feel that the more important skills for a programmer are a conscious awareness of the surrounding code and a good set of defaults.

Defaults are more generally applicable, so let’s talk about those first.

Good Defaults

Part of my job at Fog Creek involves interviewing potential hires and interns.  I don’t generally count coding style as points for or against the candidate unless it’s remarkably good or bad.  A common pattern I see in (failing) candidates is an inability to name their variables well, followed by moments of struggling with their own code and introducing bugs because of the confusion caused by those poorly-named variables.  Even strong candidates who have no trouble following their own code commonly ponder a variable name for at least a few seconds.

These things are happening because they don’t have a strong set of defaults.  At Fog Creek, we conform to an in-house programming style which has evolved from the principles outlined in one of Joel’s articles. The basic idea is as follows:

Variables should be prefixed with their intended data type.

There are many caveats to this, and it means many things.  Most superficially, it has driven us to have the prefixes such as:

  • f: Flags; Boolean values.
  • n: n-way flags; enums, etc.
  • s: Strings.
  • ix: Numeric indices and GUIDs.
  • c: Numeric counts.
  • rg: Ranges; C-style arrays.

There are many more which can be extrapolated based on context.  E.g., if I have a class “Dog”, then an instance of Dog will likely have the “dog” prefix.

More importantly, these prefixes are composable:

  • rgs: An array of strings
  • sRg: A string representation of an array
  • ixrgs: An index into an array of strings
  • cRgsIxDog: The size of an array of string representations of indices of dogs.
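As a rough illustration of how these composed prefixes might read in real declarations, here is a minimal C# sketch.  The class PrefixSketch, the sample names, and the helpers IxFindName and SRgFromRgs are all hypothetical, invented for this example rather than taken from any actual Fog Creek code.

```csharp
using System;

public static class PrefixSketch
{
    // rgs: an array of strings (hypothetical sample data).
    public static readonly string[] rgsName = { "Rex", "Fido", "Spot" };

    // Returns an "ixrgs"-style value: an index into an array of strings,
    // or -1 when the target is not present.
    public static int IxFindName(string[] rgs, string sTarget)
    {
        for (int ix = 0; ix < rgs.Length; ix++)
        {
            if (rgs[ix] == sTarget)
            {
                return ix;
            }
        }
        return -1;
    }

    // sRg: a string representation of an array.
    public static string SRgFromRgs(string[] rgs)
    {
        return "[" + string.Join(", ", rgs) + "]";
    }

    public static void Main()
    {
        bool fFound = IxFindName(rgsName, "Fido") >= 0; // f: flag
        int cName = rgsName.Length;                     // c: count
        Console.WriteLine(SRgFromRgs(rgsName));
        Console.WriteLine(fFound);
        Console.WriteLine(cName);
    }
}
```

In this sketch the prefix on each name was chosen mechanically from the value it holds, which is the property the composed prefixes are meant to provide.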

These names and this specific style are not important.  There are tradeoffs in all chosen styles; this just happens to be the one that I use.  What’s interesting about a style like this is that you can extract a lot of useful defaults.

If I have a function which accepts an instance of a Dog and returns that dog’s name as a string, then I can thoughtlessly construct the following function prototype:

string GetSName(Dog dog)

While it could be debated as to whether this is the perfect name for such a function, I could argue that it’s a pretty good name, and further that every decision about the name was made automatically.

If I felt I needed to augment the prototype with additional information, I could do so:

string GetSNameThatWasGivenByItsOwner(Dog dogWhoseNameWeWillGet)

In a sufficiently complicated system, elaboration of this type can be useful; however, most of the time just using the bare-minimum prefixes gives me a sufficient amount of information.

Let’s hash out the body of this function…

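string GetSName(Dog dog)
{
    // Look up the rabies tag on the dog's collar.
    var tag = dog.collar.tags[TagTypes.Rabies];
    // Ask each local veterinarian whether they know a name for that tag.
    foreach (var vet in GetVetarinariansByZipcode(11225))
    {
        var s = vet.GetDogNameByTag(tag);
        if (!String.IsNullOrEmpty(s))
        {
            return s;
        }
    }
    // No veterinarian recognized the tag, so fall back to a default name.
    return "Fido";
}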

I wrote this in an intentionally verbose and silly manner, but even so all of the variable names were automatic.  The rabies tag gets the variable name “tag”, because it’s a Tag.  If I were handling multiple tags within this function, I would have disambiguated by calling it “tagRabies”, but that wasn’t necessary.  Ditto for the veterinarian, although I shortened that variable to “vet” since it also wasn’t ambiguous.  “s” could be validly called “sName”, but again that’s not necessary within the tiny scope of this function.

This doesn’t sound very special, but here’s the thing:  All of my variable names, even in a context where there were not yet any established prefixes, were completely automatic.  I could spend my time thinking about other things and I completely avoided all of the small nagging variable naming decisions.  Is this function optimally beautiful? No, but for anyone familiar with the coding style, it’s perfectly reasonable and readable.  The same cannot be said for:

string Get(Dog gie)
{
    var foo = gie.collar.tags[TagTypes.Rabies];
    foreach (var my in GetVetarinariansByZipcode(11225))
    {
        var giesname = my.GetDogNameByTag(foo);
        if (!String.IsNullOrEmpty(giesname))
        {
            return giesname;
        }
    }
    return "Fido";
}

In my examples I’m focusing on naming conventions.  I don’t mean to imply that “coding style” == “naming conventions”; however, I do feel that good names are a big step toward good style.  There are countless metrics that can be used to produce “high quality code”, and they all need some level of conscious involvement.  The core of my point is that for any stylistic choice, having a reasonable default will improve your speed and quality.  This doesn’t mean that you’re exempt from all stylistic decisions, but there’s no reason to deeply ponder every single variable name.

Improve base programming quality and reduce menial thought overhead by choosing a set of good stylistic defaults.

 

Stylistic Awareness

This comes into play when modifying or augmenting existing code.  A simple test for awareness is to examine the fresh code alongside the legacy code and see whether it feels like it was written by a different person.  Can you even tell which code is legacy?  If the change is stylistically identical to the surrounding code, that’s a good indicator that the programmer was stylistically sensitive.  If the change is obviously in a different style, then either they were intentionally fixing a broken or damaged style, or they were unaware.

This skill can be very important when working with legacy code.  Legacy code tends to be painful to read in the first place, and stylistic consistency is a great way to reduce the burden on fresh programmers trying to handle the legacy codebase.  The further the codebase devolves into a series of differently-styled hacks, the smellier it gets.

The most important coding style is generally the one that’s already in place.  As appropriate,  adapt your style to conform.

Note also that I’m not claiming that it’s in any way ideal to live with a legacy codebase; it’s just a common situation.  Even outside of legacy code, stylistic awareness can be a wonderful trait.  If you were vetting an incoming patch into your open-source project, would you feel better about it if it already looked and felt like your own code?

The neglected lesson

I feel like these are things which were mentioned to me in passing during my college career, but I never incorporated them until after I reached industry.  I can remember reading a syllabus which outlined the programming style to be used in one of my courses, and I also remember that I followed it only as far as I needed to get full “style points” during grading.

How was programming style emphasized during your college career?  If you’ve settled on a style, what made you do so?

2 comments:

John Haugeland said...

I don't know if I think it's funnier that you think Joel invented Hungarian notation, or that Joel still hasn't caught on that even Simonyi knows why it's a bad idea.

Pro tip: when you're two years in, you don't get to give pro tips, because you don't understand the basic issues involved in long term maintenance yet.

Think you do? Check the warts on tWinMain. They still think Windows is a 16-bit OS, and that cannot change.

Jude Allred said...

That's interesting, as I don't think that. I apologize, as I must have communicated unclearly. Let's see if I can address these points.

I'm guessing your "Joel invented Hungarian notation" comes from my "At Fog Creek, we conform to an in-house programming style which has evolved from the principles outlined in one of Joel’s articles."
To paraphrase that: there was a point in time correlating to when that article was written that the mentality in that article was shared by Fog Creek. In time, that mentality has evolved.

I don't believe I address the invention of Hungarian notation.

As for Joel's opinions relating to old articles, I'm not able to speak for someone else's opinions. If your conclusion is based on an old article, I would submit that your data may be sparse.

As for issues relating to my naivety, certainly we all have a lot to learn, and I'm of the opinion that knowledge exchange tends to be beneficial. Would you be interested in writing an article on what you feel to be the core issues involved in long-term maintenance? I would enjoy reading it.

Thanks John,
- Jude