January 21, 2025

Introducing NoSQL and file diffing in Datafold

Discover how Datafold's new file diffing and MongoDB integration capabilities help data teams validate data quality across diverse formats. Compare files, databases, and NoSQL documents effortlessly.

Kira Furuichi

Data teams continue to face increasing challenges in managing and validating data across diverse formats and storage solutions. Whether your organization’s data lives in relational databases, files, or NoSQL systems, maintaining visibility and quality is vital for both informed decisions and regulatory requirements.

To help data teams handle data quality at any scale (and in any format), Datafold is expanding its powerful data diffing capabilities beyond traditional relational databases. With our latest release, we're introducing comprehensive support for file diffing and MongoDB integration, enabling teams to maintain the same level of data quality assurance across all their data assets—regardless of format or storage location.

These new capabilities empower data teams to:

  • Validate data consistency and quality between flat files and databases
  • Compare and verify MongoDB documents across different environments, making it easier to catch discrepancies in semi-structured data formats

File diffing

Datafold can now diff files (e.g., CSV, Excel, Parquet) in much the same way it diffs tables.

  • Diff between files in cloud storage: In addition to diffing data in tables, views, and SQL queries, Datafold allows you to diff data in files hosted in cloud storage (e.g., Azure Data Lake, AWS S3).
  • Diff between files and database objects: You can diff a file against a database object, for example an Excel file against a database table. You can also diff files of different formats, such as a CSV file against an Excel file.
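
To make the idea of diffing a file against a database object concrete, here is a minimal sketch of a key-based, value-level comparison. It is not Datafold's implementation or API: the CSV contents, the `orders` table, the in-memory SQLite database, and the `diff_by_key` helper are all hypothetical stand-ins, and values are compared as strings only to sidestep type differences between the file and the table.

```python
import csv
import io
import sqlite3


def diff_by_key(left, right):
    """Compare two {key: {column: value}} row sets.

    Returns keys found only on the left, keys found only on the right,
    and, for shared keys, the columns whose values differ.
    """
    only_left = sorted(left.keys() - right.keys())
    only_right = sorted(right.keys() - left.keys())
    changed = {}
    for key in sorted(left.keys() & right.keys()):
        columns = left[key].keys() | right[key].keys()
        diffs = {
            col: (left[key].get(col), right[key].get(col))
            for col in sorted(columns)
            if left[key].get(col) != right[key].get(col)
        }
        if diffs:
            changed[key] = diffs
    return only_left, only_right, changed


# Hypothetical file contents; in practice the file would live in cloud storage.
csv_text = "id,name,amount\n1,Ada,10\n2,Grace,20\n3,Linus,30\n"
file_rows = {row["id"]: dict(row) for row in csv.DictReader(io.StringIO(csv_text))}

# Hypothetical table, held in an in-memory SQLite database for illustration.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, name TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Ada", 10), (2, "Grace", 25), (4, "Guido", 40)],
)
# Cast every value to a string so it is comparable with the CSV side.
table_rows = {
    str(row["id"]): {col: str(row[col]) for col in row.keys()}
    for row in conn.execute("SELECT id, name, amount FROM orders")
}

only_in_file, only_in_table, changed = diff_by_key(file_rows, table_rows)
print(only_in_file, only_in_table, changed)
```

In this toy data, row 3 exists only in the file, row 4 exists only in the table, and row 2 differs in `amount`.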

In the cloud database era, it’s easy to forget how files still power much of the data industry. However, as your business and data grow, managing and validating the quality of files becomes increasingly challenging. For data teams with files from external vendors and internal teams, file diffing is a scalable method to validate and audit parity between files and database objects, ensuring data movement success and data pipeline reliability.

To learn more about file diffing, please check out the documentation.

MongoDB integration

Datafold now supports MongoDB integration, allowing you to compare MongoDB collections, and the JSON-like documents they contain, both within and across MongoDB instances. This enables you to test and compare your data regardless of where or how it's stored.

The MongoDB integration works similarly to standard data diffs in Datafold. However, because the schema can vary widely across documents in a collection, Datafold outputs a table whose columns are the union of all fields it observes across all documents. Datafold still identifies every value that differs across your diffed documents, but presents them in a flat, unnested table format.
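
As a rough illustration of what "union of all fields" and "flat, unnested" can mean, the sketch below flattens a few sample documents into one table. It is not Datafold's implementation: the dotted column names, the `flatten` and `to_table` helpers, and the sample documents are assumptions made for this example, and missing fields are simply filled with `None`.

```python
from typing import Any


def flatten(doc: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into one level, joining nested keys with dots."""
    flat: dict[str, Any] = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            # Lists and scalars are kept as-is in this sketch.
            flat[path] = value
    return flat


def to_table(docs: list[dict[str, Any]]) -> tuple[list[str], list[list[Any]]]:
    """Return (columns, rows); columns are the union of all flattened fields."""
    flat_docs = [flatten(doc) for doc in docs]
    columns = sorted({field for doc in flat_docs for field in doc})
    rows = [[doc.get(col) for col in columns] for doc in flat_docs]
    return columns, rows


# Hypothetical documents with differing schemas.
docs = [
    {"_id": 1, "name": "Ada", "address": {"city": "London"}},
    {"_id": 2, "name": "Grace", "tags": ["navy"]},
]
columns, rows = to_table(docs)
print(columns)
for row in rows:
    print(row)
```

Once both sides of a comparison are in this tabular shape, they can be compared column by column like any other pair of tables.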

Data teams that rely on semi-structured data in MongoDB, but struggle to validate it or to compare it against data in an analytical database, can now use Datafold to validate data replication and conduct value-level comparisons.

A value-level data diff between a MongoDB collection and a query in Databricks

Demo

Watch Datafold Product Manager Nick Carchedi demo how file diffing and the MongoDB integration work in the Datafold application.

Getting started

If your team relies on non-relational data, such as files in cloud storage or documents in MongoDB, and struggles with visibility and maintaining data quality, there are a few ways to get started with Datafold’s powerful data diffing capabilities:
