9 Ways To Approach An Unfamiliar Code Base

Fuck legacy code. No, seriously. Consider yourself very lucky if you haven't had to deal with any. Learning how to deal with pressure to deliver, deadlines and a lot of bad code is tough. Since I'm extremely dope, I'll bless you with insight that'll help you keep your job.

1. FIRST OF ALL: As long as you're getting paid, there will always be pressure. Try not to give too many fucks.

ED

During one project, I developed frequent chest pains, weakness, acne, erectile dysfunction and a tendency to skip meals. Dreading team check-ins where I couldn't say "It's done" was at the root of my suffering. I cut the number of fucks I give in half once I realized that no one can kill me by saying "But this was due yesterday!" or "You're fired!"

Worrying about deadlines and soul-burning stares at meetings will smother your inner engineer. Don't let some title-slinging midget send you to the ER.

2. Run it first.

It works, Gene!

Before you start writing any code, make sure you can run what exists.

Once (a few times), I tried an assignment on an existing code base before attempting to run what I had to work with (who does that?). Surprise! That didn't work out for me...

Often, we rush dangerously because we're anticipating a deadline. At times, the code I had to dive into didn't work to begin with. The issues I hit as I began writing code came from pre-existing fuck-ups, not from my changes. The takeaway from personal postmortems (and shots of Jack Daniels) is: Don't waste time. Try running that shit as soon as you get access to it.
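Here's a rough sketch of what "run it first" can look like when you want a record of the result before you touch anything. The smoke_run helper and the stand-in command are made up for illustration; swap in whatever actually starts the application you inherited.

```python
import subprocess
import sys


def smoke_run(command, timeout=60):
    """Run a start command once and report whether it even gets off the ground."""
    result = subprocess.run(
        command,
        capture_output=True,  # keep stdout/stderr so a failure can be read afterwards
        text=True,
        timeout=timeout,      # don't let a hung process eat the afternoon
    )
    return result.returncode, result.stdout, result.stderr


# Stand-in for the real entry point, which is an assumption here.
code, out, err = smoke_run([sys.executable, "-c", "print('it runs')"])
print("exit code:", code)
print("stdout:", out.strip())
```

If the exit code is non-zero before you've changed a single line, you know the breakage was there before you showed up.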

Meditation: Why were boots invented?

3. Before rewriting, ask for an explanation.

HELP ME, PLEASE!!!

The beauty of problem solving is that for most problems, there are many solutions. So, we may feel compelled to impose our personal approaches on existing solutions. How many times have you thought "The code isn't clean"? How many afternoons have you spent pondering the use of outdated conventions?

If you have access to the OC (original coder), ask him or her to walk you through what you feel like rewriting. The developer who took the "stupid" approach probably had a valid reason for doing so. When you can't contact the OC, organize your thoughts about performance and semantics in relation to your goal. Then, ask other teams about your concerns. If that doesn't help you clarify the code you've read, tap into online communities like StackExchange, or talk to the wonderful idlers on IRC–you can always find me on Freenode in #python, #django, and #algorithms; sometimes on SmashTheStack and OverTheWire.

Meditation: Why do men have nipples? I'm sure there's a good reason.

4. Despite what anyone says, print statements are fine.

Print statements here and there. Everywhere. Sprinkle 'em.

I was always advised to avoid tossing around print statements to debug issues. I can imagine some sound reasons behind my advisors' decision, but fuck that noise. PSA: Sprinkle print statements EVERYWHERE if you have to.

Programming should be as simple as finger painting and coloring by numbers. As engineers, we rely on simplification to develop solutions.

Toss print statements everywhere. If you forget to remove a print statement or two or five, don't sweat it. Some pedantic loser might throw a bitch fit over it, but if you got your shit done, you'll get paid, and you can ignore the jerk since the job's complete and you got your cheddar.
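For the record, this is roughly what sprinkling looks like. The apply_discount function is invented for the example, and the f"{name=}" shorthand assumes Python 3.8 or newer.

```python
def apply_discount(price, percent):
    # What came in? The "=" form prints both the name and the value.
    print(f"apply_discount: {price=} {percent=}")
    discount = price * percent / 100
    print(f"apply_discount: {discount=}")  # check the intermediate value
    total = price - discount
    print(f"apply_discount: returning {total=}")
    return total


apply_discount(80, 25)
```

Prefixing each print with the function name makes the output easy to grep, and easy to find again when it's time to clean up.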

Meditation: Did it work?

5. If you can, try a logging tool.

Do whatever it takes. Let's just see what the fuck works.

You might want to invest in a logging tool like Python's logging module (in the standard library for both Python 2 and 3), SLF4J for Java, or KLogger if you're doing PHP. Logging tools/frameworks usually come packed with ways to turn dev-related logging on or off between run modes.
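A minimal sketch of that on/off switch with Python's logging module follows. The LOG_LEVEL environment variable, the "legacy.billing" logger name and the parse_amount function are all assumptions made up for this example.

```python
import logging
import os

# Map the names we accept to real logging levels; anything else falls back to WARNING.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}

# LOG_LEVEL is a hypothetical environment variable used only in this sketch.
level = LEVELS.get(os.environ.get("LOG_LEVEL", "").upper(), logging.WARNING)
logging.basicConfig(level=level, format="%(levelname)s %(name)s: %(message)s")

log = logging.getLogger("legacy.billing")


def parse_amount(raw):
    log.debug("parse_amount got %r", raw)       # only shown when the level is DEBUG
    amount = float(raw.strip().replace(",", ""))
    log.debug("parse_amount returning %s", amount)
    return amount


parse_amount(" 1,234.50 ")
```

Run it with LOG_LEVEL=DEBUG to see the dev chatter, and without it to keep the output quiet. The debug lines stay in the code either way.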

I don't know anyone who's against use of logging facilities. (Don't be the one fucking contrarian—I swear to Odin).

I do know that pushback related to the installation of such facilities is a possibility. That isn't a problem with the act of logging, however. That's a problem with bureaucracy and the introduction of a new tool (which is a valid concern).

Afterthought: Use whatever tools you need to use.

Meditation: I used a hammer when I didn't have a screwdriver. My bookshelf is still standing.

6. Format the shit out of the code until it makes some goddamn sense.

Get in formation, bitch.

For the sake of making deadlines, sometimes, we write really, really, really terrible code. When there's a lot going on in the module(s) you're reading—SWEET JESUS, WTF IS THAT THOUSAND-CHAR REGEX ON LINE 21??? NOPE. NOPENOPENOPE.

The process of adding spaces, new lines and indentation can be time-consuming, so take this particular advice with a grain of salt. You don't want to get too wrapped up in formatting, because you'll end up losing track of the problems you're trying to solve.
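For monster regexes specifically, Python's re.VERBOSE flag lets you add the spaces, new lines and comments without changing what the pattern matches. The timestamp pattern below is a small invented stand-in for the thousand-char beast, just to show the shape of it.

```python
import re

# The original, crammed onto one line.
COMPACT = re.compile(r"^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2})$")

# The same pattern, formatted. In VERBOSE mode, whitespace in the pattern is
# ignored and "#" starts a comment.
READABLE = re.compile(
    r"""
    ^
    (\d{4}) - (\d{2}) - (\d{2})   # date: year, month, day
    T                             # literal separator
    (\d{2}) : (\d{2})             # time: hour, minute
    $
    """,
    re.VERBOSE,
)

sample = "2016-03-09T14:30"
print(COMPACT.match(sample).groups() == READABLE.match(sample).groups())
```

Checking the two patterns against the same inputs is a cheap way to confirm the formatting pass didn't change behavior.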

Meditation: I found my keys while cleaning my room; matched all of my solo socks after emptying my sock drawer.

7. Look for critical areas, remove them, see what breaks.

Graph theory and dependency trees and shit.

Your goal is to develop a good understanding of what happens when your application receives input. One of the best ways to do that is by removing what looks important.

See a function being called often? Comment out its definition, try to run the application, see what breaks, and infer why. Rinse, repeat.
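One variation on commenting things out, sketched here as an assumption rather than a rule: keep the function's name but replace its body with a loud failure, so the traceback tells you exactly who was calling it. The normalize and greet functions are hypothetical.

```python
import traceback


def normalize(name):
    # Original body "commented out"; fail loudly so every caller reveals itself.
    raise NotImplementedError("normalize() is disabled on purpose")


def greet(name):
    return "Hello, " + normalize(name)


def run():
    try:
        return greet("  ada ")
    except NotImplementedError:
        # The traceback lists the chain of callers that led to the disabled function.
        return traceback.format_exc()


report = run()
print(report)
```

The printed traceback names greet on the way down to normalize, which is the dependency you were trying to discover.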

According to Mike Dorsey Jr., a really, really senior engineer I've learned from (wuddup, bruh?), we "just have to make sure what it is [we're] changing doesn't have that many dependencies." The rest of the League of Senior Software Engineers tends to agree.

There isn't much involved with the removal of code (just press your nearest DELETE button), but you may find that there's a lot to be learned about modular interdependency. It's all about figuring out where the large nodes are on a graph of dependencies. Knowing that the execution of one method depends on the successful execution of another can be key in identifying important performance behaviors (like throughput bottlenecks), potential areas in which data found at the beginning of runtime may go missing, and more.
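If the code base happens to be Python, you can get a rough first guess at where the large nodes are before deleting anything. The sketch below counts call sites by name using the standard ast module. It's crude (it can't tell two different functions with the same name apart), and the SAMPLE source is invented for the example.

```python
import ast
from collections import Counter


def count_calls(source):
    """Count how often each function name is called in a piece of Python source."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):         # plain call: load_config()
                counts[func.id] += 1
            elif isinstance(func, ast.Attribute):  # method or module call: db.find_user()
                counts[func.attr] += 1
    return counts


SAMPLE = '''
def handle(request):
    cfg = load_config()
    user = db.find_user(request)
    audit(user)
    return render(user, load_config())

def nightly():
    audit(load_config())
'''

counts = count_calls(SAMPLE)
for name, n in counts.most_common(2):
    print(name, n)
```

Whatever lands at the top of that list is a decent candidate for the "remove it and see what breaks" treatment.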

Afterthought: Commenting out large blocks can make reading the code easier, too...

The takeaway here is: find "entry points" or places in which a lot of components intermingle.

Meditation: Water is a bond between two similar systems. What makes water wet?

8. Add tests where there aren't any.

hit it and see what happens

Unit testing makes my ass hurt.
Integration testing can lick my balls.

Even so, tests save time in the long run. Tests make it easy to check the health of an application while changes are being made. When it's our job to use someone's code (mess), tests often aren't available. In some cases, a testing framework isn't even available in our toolset. In that case, you can walk down either of two roads to badassery. You can introduce a testing framework (if possible). Or, you can write fragmented testing modules using any standard library available. I'd avoid scattered testing, though. Frameworks create a level of abstraction around common test operations. "Abstraction" usually means parameterization of common tests and minimization of boilerplate code. Trust me: you want to avoid reinventing the wheel, which saves time and energy on maintenance.

As you run tests after adding, modifying and removing code, hopefully, you'll pick up on the reasons some parts of the code base exist. Again, your main interest should be in figuring out how input is managed until the end of an execution context is reached. If there are many different components involved, it'd be nice to see test assertions shedding light on how those components handle different states of your application. If you're dealing with a code base written in a functional language, you're probably in luck and can see what's going on by looking at each function as an innately well-defined unit that doesn't harbor code dependence beyond reason—though, I'm not going to sit here and act like most of that shit is readable or intuitive.
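Here's a small sketch of adding tests where there aren't any, using Python's built-in unittest so nothing new has to be installed. The legacy_slug function is a hypothetical stand-in for some untested function you found. The tests just pin down what it does today, so you'll notice when a change alters that behavior.

```python
import unittest


def legacy_slug(title):
    # Hypothetical stand-in for an untested function found in the code base.
    return "-".join(title.lower().split())


class LegacySlugTest(unittest.TestCase):
    def test_lowercases_and_joins_words(self):
        self.assertEqual(legacy_slug("Hello Big World"), "hello-big-world")

    def test_collapses_extra_whitespace(self):
        self.assertEqual(legacy_slug("  spaced   out  "), "spaced-out")

    def test_empty_title_gives_empty_slug(self):
        self.assertEqual(legacy_slug(""), "")


# Run the tests programmatically so the result can be inspected afterwards.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(LegacySlugTest)
result = unittest.TextTestRunner(verbosity=2).run(suite)
print("all passed:", result.wasSuccessful())
```

Each assertion doubles as a note on how the component handles one kind of input, which is the whole point when nobody left documentation.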

Meditation: Is there a test for this? What's this supposed to do? Cool. I'll write a test and break more shit on purpose.

9. Find a good debugging tool.

Probably serves a purpose.

You're not going to survive as a software engineer without learning at least one debugger. I didn't know that until, I guess, two years into my career? You're lucky you have me to give you the heads up.

For some of the more popular languages, IDEs usually come with interfaces for the debuggers that ship alongside their respective compilers or interpreters. So you'll often see:

  • a GUI for gdb if you're working with C or C++
  • a GUI for jdb if you're dealing with Java
  • a GUI for pdb if you're dealing with Python

Debuggers provide an easy way to pause a program midway through execution and view the state of the components contributing to your program's eventual output.

Did you see The Matrix? You know that scene where Neo realizes he has the power to stop myriad bullets by focusing on the state of each bullet? That's how debuggers work. At any moment, you can say "HOLD ON" and view what's happening in your program while the OS or a scripting language's interpreter works through your application's instructions.

Handy amenities such as the ability to step through execution of a method are insanely fucking dope and can save you from getting canned. Trust me.
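A minimal sketch of saying "HOLD ON" with pdb, assuming Python 3.7 or newer for the built-in breakpoint() call (on older versions, import pdb; pdb.set_trace() does the same job). The running_totals function and its debug flag are invented for the example.

```python
def running_totals(values, debug=False):
    """Return the cumulative sums of values, optionally pausing on each step."""
    totals = []
    total = 0
    for value in values:
        if debug:
            breakpoint()  # drops into pdb right here, with value and total in scope
        total += value
        totals.append(total)
    return totals


print(running_totals([3, 4, 5]))
```

Call it with debug=True and you land at a (Pdb) prompt inside the loop. From there, p total prints a variable, n runs the next line, s steps into a call, and c continues until the next breakpoint.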

Meditation: Do I have the power to stop it?

Conclusion

That's all. Learn from this. Take it from me.

Use the shit out of whatever you can to realize how to solve whatever problem you need to solve. Decomposition is the name of the game. Don't convince yourself that you're built to understand large systems without deep investigation. The people we look up to in the field are usually those who can reduce large problems into really small ones, and becoming better at doing so requires practice and methodical approaches.

Renee. Let's do this.

Greg

Software Engineer
