Documentation

Start here to get DataHub running, plug it into your agents, and understand what's available. The Quickstart spins up DataHub locally in minutes. The Skills, Agent Context Kit, and MCP Server docs cover the three primary ways to wire DataHub into your agent stack.

Repositories

The DataHub open source codebase and the DataHub Skills repo are where the platform lives. Contributions to either are welcome and count toward the bonus open-source contribution criterion.

Sample Datasets

Spin up a rich DataHub environment without wiring it to your own infrastructure. These sample datasets give you cross-platform metadata, lineage, and real-world data quality scenarios to build against.

Cross-platform metadata graphs:

  • showcase-ecommerce datapack — 1,049 entities across Snowflake, Looker, PowerBI, Tableau, dbt, Spark, PostgreSQL, S3 with cross-platform lineage, governance, glossary, and domains.
    • Load with: datahub datapack load showcase-ecommerce
  • bootstrap — Lightweight starter with datasets, dashboards, users, tags.
    • Load with: datahub datapack load bootstrap
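
The two load commands above can be wrapped in a small script so a fresh environment is set up in one step. This is a minimal sketch, not an official workflow: it assumes the datahub CLI is installed and already pointed at a running DataHub instance. The DRY_RUN switch and the load_datapack helper are hypothetical names introduced here for illustration. Only the `datahub datapack load <name>` command itself comes from the list above.

```shell
#!/bin/sh
# Sketch: load the two cross-platform sample datapacks in sequence.
# DRY_RUN is a hypothetical switch for this example. When it is 1 (the
# default here), each command is printed instead of executed, so the
# script can be checked without a running DataHub instance.
set -eu

DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print or run the load command for one datapack.
load_datapack() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "datahub datapack load $1"
  else
    datahub datapack load "$1"
  fi
}

for pack in showcase-ecommerce bootstrap; do
  load_datapack "$pack"
done
```

Run it as-is to preview the commands, then rerun with DRY_RUN=0 to perform the loads. If you only want the lightweight starter, drop showcase-ecommerce from the loop.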

Real datasets with built-in scenarios:

  • nyc-taxi — NYC Yellow Taxi Trip Records (~500k trips). 3-stage pipeline with planted freshness issues.
  • healthcare — Synthetic patient records (~55k records) with planted data quality issues.
  • fiction-retail — Synthetic global retail dataset (50k customers, 150k orders) across 10 tables. Clean schema, blank canvas.

These datasets are safe for Apache 2.0 submissions. If you bring your own data, make sure its license permits publication in your open-source repo.

Community

This is where you'll get help, share progress, and connect with other builders during the hackathon. We'll also host office hours mid-hackathon for live Q&A.

Need Help?

For DataHub questions, drop into the #agent-hackathon channel in DataHub Slack — DataHub team members and other builders are there to help. For Devpost or submission issues, email support@devpost.com.