Streamlining Nonprofit Data Migrations

Jasmin Tan, a Staff Data Analyst at Brooklyn Data, reflects on overcoming technical challenges, resolving conflicting requirements, and untangling legacy system complexities.

When Jasmin Tan joined a nonprofit’s data migration project to transition from a third-party vendor to an in-house setup using Redshift and dbt, the clock was ticking. The goal was ambitious: to give the nonprofit greater control over its critical fundraising campaign logic while ensuring business continuity. 

Unlike many who view data migrations as stressful and fraught with failure, Jasmin embraces them with enthusiasm: "I like having to think through the steps, and it’s always nice to see the end result. Especially with this last project—the complexity around the logic and having to migrate that—it was a really fun thing to get your teeth into."

In this interview, Jasmin shares her insights on navigating technical roadblocks, adapting to unclear documentation, and finding clarity in complex migrations.

Translating and validating logic across systems

The migration was part of a broader effort to move away from reliance on a third-party vendor that had historically managed the organization’s campaign lists. This third party had been responsible for generating complex stored procedures, which queried Salesforce tables directly, applied detailed transformations to the data, and formatted the outputs to meet specific campaign requirements.

As the organization transitioned to using Redshift and dbt for its data infrastructure, the goal was to replicate this functionality while maintaining the accuracy of campaign lists and ensuring business continuity. However, the move from stored procedures to dbt and SQL scripts posed significant challenges, as it required rethinking how business logic was implemented and validated.

Datafold: I’m guessing the additional complexity was because they were going from stored procedures to dbt, which are completely different ways of architecting and understanding business logic. Could you speak to that part of the process? What were the transformations like during that part of the migration?

Jasmin: I think that was probably the trickiest bit. The stored procedures would query tables directly, mostly from Salesforce, and the formatting comprised thousands of lines of code—looking at specific things to exclude, specific things to include, and then formatting the output and anything else that might be needed.

For the migration, we had to look at the data being brought into Redshift and dbt and how it was being transformed. For example, did it only look at active codes or inactive codes as well? We had to check if the transformations were applied in the source models and then confirm, when it came to the reporting models, what additional transformations were being done there. Then we built the campaign lists on top of those reporting models in dbt. I was checking every layer to see what kind of transformations were being made and then confirming that everything was correct.
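
To make that layering concrete, here is a minimal dbt-style sketch of how logic that once lived in a stored procedure might be split across staging and campaign-list models. Every model, column, and filter name below is a hypothetical illustration, not a detail from the actual project.

```sql
-- models/staging/stg_salesforce__campaign_codes.sql
-- Hypothetical staging model: keep only active codes from the raw Salesforce data.
select
    code_id,
    code_name
from {{ source('salesforce', 'campaign_codes') }}
where is_active = true

-- models/marts/campaign_list_spring_appeal.sql
-- Hypothetical campaign-list model built on a reporting model, re-implementing the
-- inclusion/exclusion and formatting rules that previously lived in a stored procedure.
select
    d.donor_id,
    upper(d.last_name) || ', ' || d.first_name as mailing_name,
    d.preferred_channel
from {{ ref('rpt_donors') }} as d
join {{ ref('stg_salesforce__campaign_codes') }} as c
    on d.campaign_code_id = c.code_id
where d.opted_out = false                                   -- exclusion rule
  and d.last_gift_date >= dateadd(year, -2, current_date)   -- inclusion rule
```

Splitting the logic across layers like this is what lets each transformation step be checked on its own, rather than auditing one monolithic procedure.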

Datafold: That’s quite a chain of complexity. How did you go about validating at each step? Was the process similar for each step? I’m guessing dbt tests covered parts of it, but what were the testing strategies you used during the migration?

Jasmin: For something like this, accuracy was initially the priority over performance. So, there were a lot of comparisons of the outputs to check that the right logic had been applied. The client actually wrote a custom R script to do the comparisons and check that the outputs were correct.

Another factor was timing. The campaign lists were run with the external party at very specific times, and they would share outputs. But of course, when we were testing it with Redshift and dbt, it was harder to get it to match because it was live data. We had to run it at the same time as the third party to see if the results looked the same.
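
The project’s comparisons were done with the client’s custom R script against outputs captured at the same moment; purely as an illustration, the same kind of check can be expressed directly in SQL. The table names below (legacy_campaign_list, dbt_campaign_list) are assumptions, not names from the project.

```sql
-- Hypothetical row-level diff between the vendor's output and the new dbt output,
-- assuming both lists were snapshotted at the same point in time.

-- Rows in the legacy list that are missing or different in the dbt-built list
select * from legacy_campaign_list
except
select * from dbt_campaign_list;

-- Rows in the dbt-built list that are missing or different in the legacy list
select * from dbt_campaign_list
except
select * from legacy_campaign_list;
```

If both queries return nothing, the two lists agree; any rows that come back point to logic that was translated differently.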

Datafold: On the data quality side, what measures did you use to ensure accuracy? Did you use any tools for edge cases?

Jasmin: It was mostly the custom scripts. Data quality was a big thing, especially given the kind of data coming through, but as the campaign lists could be quite different, it ended up being handled on a case-by-case basis.

Datafold: Every migration is so different. This is the first one I’ve learned about where the client provided custom test scripts, which is interesting. I think it speaks to how they were prioritizing accuracy over performance.

Jasmin: Absolutely. As the partnership with the third party that was producing all the campaign lists was ending, it was very much: "we’ve got to meet this deadline—lift and shift. We’ve got a little technical debt built up, but it’s okay, we’ll address it later on."

Untangling legacy logic without the docs

One of the key takeaways from this part of the migration is the importance of documentation and institutional knowledge when dealing with legacy systems. Jasmin highlights a common challenge: previous vendors or teams often leave behind incomplete or unclear documentation, making it difficult to understand the rationale behind certain decisions. Without this context, translating logic into a new system can become a process of piecing together a puzzle without all the pieces.

Datafold: Were there any unexpected challenges that you faced? I’m thinking about things like zero downtime requirements or similar issues.

Jasmin: I’d probably say the translation—because you’re looking at translating the logic from the previous system to the new one. You have to look in multiple places, and you really notice the lack of documentation from the previous vendors when you’re trying to understand why they took certain steps and what their end goals were for the logic we simplified.

On some occasions, there could even be conflicting requirements between what the nonprofit required and what was written in the stored procedures; the two didn’t seem to exactly align. With instances like that, you have to try to work out why they did it differently or what you should do instead.

Datafold: That makes sense. I’m guessing the people who created the requirements—or even the third party—might not be with the company anymore, so there’s no one to ask.

Jasmin: Yeah, exactly.

Datafold: How did you go about figuring that out? Was it just trial and error to see what they were trying to do?

Jasmin: A mix. Trial and error to see what created the closest output. But also checking in with the nonprofit’s engineering team to see if they could find the right stakeholder to give the sign-off on what the logic should be. 

Overlooked challenges in migrations

When it comes to data migrations, the technical steps may seem daunting, but for Jasmin, the real challenges often lie elsewhere. Drawing from her experience as a consultant, she highlights the significance of clear communication, stakeholder alignment, and building a strong understanding of the data itself—factors that are often underestimated but critical to success.

Datafold: Looking back at all these different types of migrations, what do you feel technical teams tend to overlook when planning or executing one?

Jasmin: I’ll say that for me, it’s not so much the technical side, since the steps can be quite clear. For these projects, it’s ended up being more about the general process and the teams that you’re working with as a consultant: whether your client’s teams are quite lean, making it harder to get their time for sign-offs, or whether they’ve got really specific processes that you need to follow.

In the most recent project with the nonprofit, it’s exactly like you said—the quality of the data needs to be taken into consideration, along with understanding the data and what it actually represents.

Datafold: Do you feel like, as a consultant, something that’s particularly challenging from your viewpoint is that you don’t necessarily have domain expertise in their specific data sets or what they’re trying to do with it? What is your process for getting up to speed on the data quality side of things?

Jasmin: I would recommend talking as much as you can with the data owner or equivalent to understand it, and then just building up that repository of knowledge. I think that’s the best way to understand the data.

Datafold: When you mentioned that the technical steps are very clear to you, that surprised me. For some people I’ve talked to, it’s the opposite. There’s maybe conflict over what priorities need to happen first—whether it should be a lift-and-shift versus refactoring sort of thing. How do you create your playbook for figuring out the technical steps that need to happen?

Jasmin: Figuring out the technical steps comes from the wealth of experience and knowledge of the great teammates I’ve worked with, and then slowly building up, over time, your own repository of steps that need to be taken or considered, all of which makes migrations a lot more reliable!