Hi All,

I started this general discussion because I want to know what kind of standards you all work by with regard to documenting your projects.

I've been steadily working on a system at work since last spring that automates (extracts, transforms, and loads) data flows within my department. It is a brand new system that has taken coworkers from copying and pasting data sets into Excel to automatically downloading and storing data sets in a SQL Server database. However, I am the only one working on this project, with some help from some great guys in a different department (when I need it). Because I am the only person on this big project, I haven't been able to document very much along the way. I've made ERDs, DFDs, a couple of Use Cases, as well as a few other documents that showed progression (not Gantt charts). While this has been alright for me and those involved in my department, I've been thinking about what problems may arise when someone other than me (in the future) has to analyze and fix the problems. Because of the lack of documentation, future maintenance will be a big issue if I'm not the one maintaining this beast.

All that being said, are you thorough with your documentation and do you document every aspect of your projects? Or, do you have a document writer?

I find documenting to be a necessary but very time-consuming aspect of system development, especially for a one-man show.


While this has been alright for me and those involved in my department, I've been thinking about what problems may arise when someone other than me (in the future) has to analyze and fix the problems. Because of the lack of documentation, future maintenance will be a big issue if I'm not the one maintaining this beast.

What I always like to remind people too is that there isn't much difference between someone who is a stranger to your code and yourself a few months down the line when you've moved on and don't remember much of the details of the code. So, this is not only going to be an issue if someone other than you has to maintain it, it is also an issue if you have to maintain it later. If in six months you get asked to update or fix some bug in the details of an implementation that you wrote 9 months earlier, will you really be able to just jump right into it? Or will you have to re-learn all about the details of the implementation? Probably the latter.

All that being said, are you thorough with your documentation and do you document every aspect of your projects? Or, do you have a document writer?

It can be quite difficult to keep up with this; I know I have a hard time doing it. It's really a multi-tier system. If I start from the inside and work towards the outside, I would say this:

  1. Write clear, well spaced-out code that uses meaningful names and "reads like prose" as much as possible.
  2. Any chunk of code (5-10 lines) that isn't self-explanatory enough (from the meaningful names being used) should be briefly explained with a line or two of comments.
  3. Document the interface in the code (preferably with tags (e.g. doxygen-tags, or JavaDoc tags)), meaning that every class and function should have at least a brief explanation of what it does and what its parameters / return values are, and preferably other info like error conditions and the important pre-/post-conditions.
  4. Use the unit-tests (you are writing unit-tests, right?!?) as the "examples" of how the different individual parts of the code are used.
  5. Write a few more integration examples that represent typical uses, and be generous with in-code comments. Possibly, turn those examples into small tutorials.
  6. Create high-level overviews of the code architecture, if you don't have those already from when you actually designed the architecture.
  7. Summarize everything (code reference, integration-tests / tutorials, and high-level diagrams) into a manual or webpage.

Depending on the size of the project, and its visibility, you might not need to go all the way to steps 6 and 7. And depending on your design practices, you might have step 6 mostly done already anyways.

As far as I'm concerned, steps 1-3 are just an integral part of basic day-to-day programming; if you don't do these things, you're just not good at coding, period. Steps 4 and 5 are pretty much necessary as part of the development process of any non-trivial project. You need to write unit-tests (even if they are not "formal") to have some basic quality assurance, and since they must exercise all (or at least most) of the functionality of the library / application, they are usually a pretty good start for showing the "how-to's" of your library / application, at least in raw form. More or less the same goes for integration tests. In other words, as you incrementally test the software, you create a little informal or raw collection of example programs or scripts, so don't destroy or overwrite them. These little programs (unit-tests and small integration tests) are extremely valuable for maintenance and debugging, since it is much easier to debug or play with a small 100 LoC program / script than to plow through the end-user application.
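To make steps 3 and 4 a bit more concrete, here is a minimal sketch. It uses C# XML documentation comments as a stand-in for the doxygen / JavaDoc tags mentioned above, and an informal test (no test framework) that doubles as a usage example. The names RowCleaner, Clean and RowCleanerExamples are made up for illustration; they are not from anyone's actual project.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RowCleaner
{
    /// <summary>
    /// Drops rows whose key is empty and trims whitespace from the remaining rows.
    /// </summary>
    /// <param name="rows">Raw (key, value) rows; must not be null.</param>
    /// <returns>A new list holding only the rows with a non-empty key.</returns>
    /// <exception cref="ArgumentNullException">
    /// Thrown when <paramref name="rows"/> is null.
    /// </exception>
    public static List<KeyValuePair<string, string>> Clean(
        IEnumerable<KeyValuePair<string, string>> rows)
    {
        if (rows == null) throw new ArgumentNullException(nameof(rows));

        return rows
            .Where(r => !string.IsNullOrWhiteSpace(r.Key))
            .Select(r => new KeyValuePair<string, string>(
                r.Key.Trim(), (r.Value ?? "").Trim()))
            .ToList();
    }
}

public static class RowCleanerExamples
{
    // Informal unit test that also shows how Clean is meant to be called.
    public static void Clean_DropsRowsWithoutKey()
    {
        var raw = new[]
        {
            new KeyValuePair<string, string>(" station-1 ", " 42 "),
            new KeyValuePair<string, string>("", "orphan"),
        };

        var cleaned = RowCleaner.Clean(raw);

        if (cleaned.Count != 1 || cleaned[0].Key != "station-1" || cleaned[0].Value != "42")
            throw new Exception("Clean should keep only the trimmed row that has a key");
    }
}
```

Keeping a handful of small tests like this next to the code gives a future maintainer something runnable to poke at before opening the full application.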

Step 6, as I said, depends on the person. I am generally not a huge fan of detailed UML diagrams; when I see them, I usually skip them or skim through very quickly. I hate seeing large UML diagrams with lists of data members and methods and every little insignificant detail of the interfaces. To me, diagrams are useful for high-level relationships between the parts of the whole. I say, if there is a particular aspect of the architecture that needs to be explained at a high level, make a diagram for it, but don't make it your mission to document every aspect of your software in the form of diagrams, because most of them will be flooded with details, not very interesting (a "normal" architecture doesn't require much explanation), and useless to most people (who will glance over them, not care, etc.).

Step 7 is more for very visible and public software ("public" might mean internal to the company, of course). It basically means creating a one-stop shop for all you need to know about the software, and trying to explain it well. This, obviously, can be a very time-consuming task, so be sure that you don't do it for no reason (you spend weeks creating the docs and nobody ends up reading them). This is also a bit of a chicken-and-egg problem for open-source libraries, i.e., very few people will use your library if you don't have good documentation, but it's not worth writing detailed documentation if only very few people are using your library.

Anyhow, I tend to try to be reasonably thorough from steps 1 to 5. But it gets a bit harder after that (I'm more on the side of "coding before UML diagrams", as opposed to "do all the UML diagrams before touching the code").


you are writing unit-tests, right?!?

I have performed many unit-tests but I haven't documented any of them, other than in the code itself (I follow point 1 and have since I started this line of work). I have always provided documentation when it comes to working on LOC, no matter how small the number. That being said, I give a general description of what the code does, but I don't document every line of code the way my profs liked in university (I think that's overkill). I've been a good boy with regard to point 3. As for step 5, I have yet to create a tutorial even though I know that I must. As for point 6, I've had to do that but I haven't kept up because of the time demands of this system.

I guess you could classify me as a guy that has only partially documented this system. I think I am going to retrace some increments of this system and create the documentation that I should have. As well, I'm going to be diligent in the future.

Thanks Mike for taking the time here, I really appreciate it.

What Mike said.

I find it useful to write the comments first and fill in the code as I go. For one thing it gives the code more of a narrative feel and helps to minimize the useless comments like "add one to x". A lot of what I wrote at Dovercourt was highly interconnected. The overall system diagram looked like a hundred spiders all holding hands. Ugly but necessary.
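As a small illustration of the comments-first habit (a sketch only, with a made-up DailyAverage class): the three numbered comments would be written first as the outline of the method, and the code under each one filled in afterwards.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DailyAverage
{
    // Hypothetical example: average the valid readings collected for one day.
    public static double Compute(IEnumerable<double?> readings)
    {
        // 1. Throw away readings that were never reported (nulls).
        var valid = readings
            .Where(r => r.HasValue)
            .Select(r => r.GetValueOrDefault())
            .ToList();

        // 2. With no valid readings there is nothing to average, so report NaN
        //    instead of dividing by zero.
        if (valid.Count == 0) return double.NaN;

        // 3. Otherwise the answer is the plain arithmetic mean.
        return valid.Sum() / valid.Count;
    }
}
```

Each comment says why the step exists, which is the part the code alone would not tell a maintainer.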

My boss, Geoff, because of his electrical engineering background, liked to document with what looked like wiring diagrams with lots of off-page connectors. I could never follow this, and because it was paper based, it was never up to date. Instead, I created a connections database where every node was either a process, a data source or a data sink. Each node could have any number of other nodes as inputs or outputs. A small GUI allowed you to select any node and it would show all the inputs to that node in one listbox and all the outputs in another. Double-clicking on any input or output would make that node the current node and show its inputs and outputs (along with useful node info such as who to contact when that node was having a problem or any useful debugging tips). If you are interested I believe Daryl Godkin is the person at Dovercourt currently maintaining my old code (assuming there is any of it left). If you get to see Daryl, ask to see his African Safari pictures. He is an accomplished wildlife photographer. Be sure to tell him Jim says hi.

I just got handed the "official" documentation templates and I have to say, there is so much redundancy in these documents that they're almost dismal. I really don't get how we can be expected to use such templates when every system to be built will be different. I guess the techies here have to try to streamline though. As for those "wiring" diagrams for system design, I've seen them here and I think it's because MB Hydro is so stacked with engineers and they understand those types of diagrams. I find them confusing. I personally like the use of DFDs for both high and low level diagrams and have created a few already. I was talking to an SAP fellow here and he told me that there is a person in his department whose main task is to create the documents needed.

I guess that in the end, all I can do in this regard is try to stay on top of the whole documentation process, but given that I am a one-man show in my department, that's all I can offer!

So much work, so little time!

MB Hydro is so stacked with engineers and they understand those types of diagrams

Too bad it's not the engineers who have to do the maintenance.

I am a one-man-show in my department

That was pretty much the situation I was in. There was one other person, but he was no help with the programming. However, he was pretty amazing when it came to the hardware so he was the person who built the servers and swapped boards, drives, etc. I'm dangerous when I handle electronics (dangerous to the circuitry, not myself).

As the sole programmer I had very little time to maintain paper documentation, thus the connections database. The typical scenario was a failure of a process at 3:00 AM. If it wasn't hardware related (as in the hosting machine or network) then it was usually a failure of one or more inputs. Using the GUI it was easy to locate those and follow the trail "upstream". As I learned certain tricks and checks I added them to the database for the next failure. Because everything was in one place it was always easily located and modified and because it was digital there was never an out of date copy to confuse people.
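For anyone wanting to try something similar, here is a rough sketch of the idea as described: nodes that are processes, data sources or data sinks, each with inputs and outputs, plus a helper that follows the trail upstream from a failed node. This is not the original Dovercourt design (which was a database with a GUI on top); the class and member names are invented for illustration.

```csharp
using System;
using System.Collections.Generic;

public enum NodeKind { Process, DataSource, DataSink }

public sealed class Node
{
    public string Name { get; }
    public NodeKind Kind { get; }
    public string Contact { get; set; } = "";    // who to call when this node fails
    public string DebugTips { get; set; } = "";  // tricks learned from past failures
    public List<Node> Inputs { get; } = new List<Node>();
    public List<Node> Outputs { get; } = new List<Node>();

    public Node(string name, NodeKind kind)
    {
        Name = name;
        Kind = kind;
    }

    // Records the connection on both ends so it can be browsed in either direction.
    public void FeedsInto(Node downstream)
    {
        Outputs.Add(downstream);
        downstream.Inputs.Add(this);
    }
}

public static class Connections
{
    // Breadth-first walk over Inputs: every node that feeds the given one,
    // nearest first. The visited set keeps cycles from looping forever.
    public static List<Node> Upstream(Node start)
    {
        var seen = new HashSet<Node> { start };
        var queue = new Queue<Node>();
        queue.Enqueue(start);
        var result = new List<Node>();

        while (queue.Count > 0)
        {
            var current = queue.Dequeue();
            foreach (var input in current.Inputs)
            {
                if (seen.Add(input))
                {
                    result.Add(input);
                    queue.Enqueue(input);
                }
            }
        }
        return result;
    }
}
```

When a report fails at 3:00 AM, calling Connections.Upstream on its node lists the candidates to check, along with the Contact and DebugTips stored on each one.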

If I'd stuck around another year or two I might have developed a more elegant interface but when choosing between functional or pretty I always choose functional.

By the way, I forgot to mention a few more of my "rules"

  • if you don't know what your program is supposed to do then don't start writing it
  • clear is always better than clever
  • if you absolutely must choose clever over clear then document the hell out of it
  • always program as if the person who will maintain your code is a psychopath who knows where you live

always program as if the person who will maintain your code is a psychopath who knows where you live

Lmao, I'm definitely going to keep that in mind.

I find your connections db suggestion to be a good one and very understandable. Designing it must have been tricky though.

I don't recall the exact design but it was not complicated. Because you are a Hydro employee you could probably get a copy of the database and app from Daryl. I was curious about it myself so I sent him an email requesting the same. I thought I had made a copy of all my software before I retired. Guess I didn't. Probably for the best because I'd probably still (3 years later) be rewriting pieces and sending the new code in.

Probably for the best because I'd probably still (3 years later) be rewriting pieces and sending the new code in.

Which would be like NOT retiring.

In an effort to reduce the amount of documentation, I try as best I can to make each method do as little as is reasonable.

So if you had a CreateAccount method on a Registration class, the method could be split down into the following logic.

  1. Check if the user exists already
  2. Go to the database and register the details
  3. Send an email to say the account has been created
  4. Return the confirmation of account creation to the user.

A standard developer may just put all that code into the CreateAccount method and leave it at that. However, as you can plainly see, I've been able to split this method into four separate steps. Each one can be its own method.

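public CreateAccountResult CreateAccount(string username, string emailAddress, string password)
{
    // Step 1: stop if the user already exists (UserExists is an assumed helper name).
    if(DataServices.UserExists(username))
    {
        return new CreateAccountResult
        {
            Success = false,
            Message = "This user already exists!"
        };
    }

    // Step 2: store the details in the database.
    if(!DataServices.StoreUserDetails(username, password))
    {
        return new CreateAccountResult
        {
            Success = false,
            Message = "Unable to register account. Database failure"
        };
    }

    // Step 3: tell the user by email that the account has been created.
    EmailServices.SendRegistrationEmail(username, emailAddress);

    // Step 4: return the confirmation to the caller.
    return new CreateAccountResult
    {
        Success = true,
        Message = "Thank you for registering with AwesomeSoftwareInc"
    };
}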

Much easier to read and understand what's going on than trying to include all the code that has been separated into those other methods.

Ideally this could be split down further and with proper design it would be incredibly tidy. This is just something off the top of my head in 5 minutes :)

The idea that code should be self-documenting is a good one, but that doesn't mean you shouldn't include your own comments where necessary. Ideally though, if you need a comment, try to see if that part of the code can be simplified. Sometimes it can't be avoided, but a lot of the time just spending a couple of minutes thinking about the problem from other angles can help tremendously :)
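A small before-and-after sketch of that last point, using an invented Passwords class (the 8-character and digit rules are only an example): in the first version the condition needs a comment, in the second the method names say the same thing.

```csharp
using System;
using System.Linq;

public static class Passwords
{
    // Before: the reader needs the comment to know what the condition means.
    public static bool IsAcceptableBefore(string password)
    {
        // must be at least 8 characters long and contain a digit
        return password.Length >= 8 && password.Any(char.IsDigit);
    }

    // After: the helper names carry the explanation, so no comment is needed.
    public static bool IsAcceptable(string password) =>
        IsLongEnough(password) && ContainsDigit(password);

    private static bool IsLongEnough(string password) => password.Length >= 8;

    private static bool ContainsDigit(string password) => password.Any(char.IsDigit);
}
```

Both versions behave the same; the second just moves the explanation out of the comment and into names the compiler keeps in sync with the code.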

Thanks Ketsuekiame for your insights. I love the diversity of writing code. I know people at my work who believe in as few LOC as possible. I've even had a prof teach the same principle. His idea was to condense the code and comment the hell out of it. There are so many different ways to go about coding and I think that's great. When I see code, I like to visually walk through it, so your way would be good for the likes of me, or at least a combination of different ways is the approach I am going to take.

@RJ, I just sent an email to Daryl in the hope that I can get a tour of Dovercourt. I've heard from another employee here that Dovercourt is a techie's dream spot and that I would be fascinated by everything that goes on there. I'm really hoping Daryl can set something up for me to check it out. Thanks a lot for giving me his name.

Dovercourt was a great place to work. I was there from the day we decommissioned the old AGC/SCADA system in November 1998 until I retired in August 2008. They also have great parking. New hi-tech building, new computers, new everything except for the people. On my first day there I put a big notice in the main meeting room that said "just because everything is different doesn't mean anything has changed".

But I'm not bitter.