#data #career #opinion

>[!info] Disclaimer
>This is a living doc of sorts. As I think more about this, I will continue to update it.

As an analytics engineer (AE), I spend a lot of time working with data. This includes the grunt work of investigating data issues. I use the term "data issue" here as an umbrella term for all the problems one might encounter when working with data: duplication, data loss, inconsistencies, and incorrect data, amongst others. As a rule of thumb, if the "numbers don't match" in your end user's dashboard, that is most likely a data issue, which then needs to be investigated by the analytics engineer[^1].

In this post, I'm laying out some of the higher-order philosophical ideas that underpin the debugging of data issues. When I started working in data four years ago, I had to learn most of this on the job. But of all the skills I've picked up, the hardest one to unlock has been the "process-oriented" outlook you need when working with data.

Here's the thing: with every data issue, the process is almost identical. You acknowledge the issue, identify it, isolate it if there are multiple problems, then tackle them one by one. Most importantly, you need a log that tracks every issue and how it was solved. This documentation part doesn't always happen in teams because of the overhead involved. But in my experience, it's quite useful to maintain.

## The Process of Debugging

The actual mechanics of debugging are fairly straightforward. What separates good debugging from mediocre debugging is **rigour and documentation**. You need to be **slow to diagnose but quick to document** your findings. The time you spend thoroughly understanding the root cause pays off in clarity when you communicate it to others.

## Translating Findings Across Audiences

One thing I've learnt the hard way: you'll need to tell the story of a data issue to multiple people, and each needs a different version.
Your domain expert will understand the grain-level details of table structures needed to identify fan-outs. Your data engineers will better understand the technical specifics of the ETL logic. Your end users just need to know why their dashboard shows "double the expected number" and when it'll be fixed.

**This translation job falls on you**. The domain expert can help identify the problem, but you're responsible for interpreting that into language the end user understands. You need to extract the relevant information from conversations with technical stakeholders and present only what's necessary to each audience. This is harder than it sounds, especially when you're juggling five different people with five different contexts.

And the only way to stay on top of this? Documenting your process.

## Communication is Everything

When you fix a bug, your pull request needs to show the before-and-after state clearly. There's no such thing as "too much detail" when communicating a data fix. Your reviewers might not all have the same context as you, so **over-communicate rather than under-communicate.**

A good visual almost always explains what several paragraphs fail to get across. Whether it's a simple diagram showing the data flow or a spreadsheet showing record counts before and after your fix, **visuals accelerate understanding**. When you're deep in the weeds of a data issue, it's easy to forget that what's obvious to you isn't obvious to someone seeing it for the first time.

Part of your job is to make the reviewer's life simple. If your explanation is clear enough that they need to ask zero follow-up questions, you've done it right. The more thoroughly you think this through, the more confident you can be that your fix isn't introducing new bugs.

## The Accountability Piece

**Don't operate on the CYB model (cover your backside)**. If you make a mistake, acknowledge it clearly and find a fix. This is especially important if you're junior or mid-level.
Being defensive about data bugs doesn't help anyone.

Sometimes it's tempting to let a bug slide if it's not caught in review. "I'll think about it later," you tell yourself. Don't. Track it. Document it. Figure it out. Letting bugs accumulate is how data quality degrades quietly over time.

## Tools of the Trade

You don't need fancy tools to debug data issues. Excel is honestly underrated. Spin up a quick sheet, count your records, identify your columns, and spot the discrepancies. Sometimes the simplest approach, i.e. raw data, a spreadsheet, and a systematic method, is the most effective.

[^1]: This role might be the data analyst, the data engineer, or someone else entirely, depending on the organisation. Roles and responsibilities in data are quite fluid, as I have come to understand it. It all depends on the org, its scale, and its cultural preferences.
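
If you'd rather script the spreadsheet check from Tools of the Trade, the same "count your records and spot the discrepancies" idea fits in a few lines of plain Python. This is only a minimal sketch: the function names, the `order_id` key, and the sample rows are all hypothetical, made up for illustration, and it assumes both extracts are small enough to hold in memory as lists of dicts.

```python
from collections import Counter


def count_by_key(rows, key):
    """Count how many records share each value of the grain key."""
    return Counter(row[key] for row in rows)


def find_discrepancies(source_rows, target_rows, key):
    """Compare per-key record counts between two extracts.

    Returns {key_value: (source_count, target_count)} for mismatches only.
    """
    source_counts = count_by_key(source_rows, key)
    target_counts = count_by_key(target_rows, key)
    all_keys = sorted(set(source_counts) | set(target_counts))
    return {
        k: (source_counts[k], target_counts[k])  # a missing key counts as 0
        for k in all_keys
        if source_counts[k] != target_counts[k]
    }


# Hypothetical sample data: a source table and the model behind a dashboard.
source = [
    {"order_id": 1, "amount": 100},
    {"order_id": 2, "amount": 250},
    {"order_id": 3, "amount": 75},
]
dashboard = [
    {"order_id": 1, "amount": 100},
    {"order_id": 2, "amount": 250},
    {"order_id": 2, "amount": 250},  # duplicated, e.g. by a join fan-out
]

mismatches = find_discrepancies(source, dashboard, "order_id")
for order_id, (expected, actual) in mismatches.items():
    print(f"order_id={order_id}: source={expected}, dashboard={actual}")
```

Here `order_id` 2 shows up twice on the dashboard side (a duplication) and `order_id` 3 is missing entirely (data loss), which is the same thing you'd spot by eyeballing two count columns side by side in a sheet. The per-key counts also make a handy before-and-after table to paste into a pull request.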