Business Wire
Published on : Mar 9, 2026
The race to build richer business intelligence datasets just took a major step forward.
OpenData.org has released a massive U.S. business dataset containing 86 million organizations, 101 million contacts, and 142 million locations, creating one of the most comprehensive open datasets mapping the American corporate ecosystem.
The release is more notable still because of a strategic partnership with Senzing, whose built-in AI-powered entity resolution is designed to clean, match, and unify records across datasets.
Delivered in Senzing-ready JSON format, the dataset allows organizations to immediately integrate high-volume business intelligence data into analytics, compliance, and AI pipelines.
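To give a sense of what working with such a file might look like, here is a minimal Python sketch. It assumes the data arrives as JSON Lines (one JSON object per line) and uses attribute names in the style of Senzing's published entity specification (DATA_SOURCE, RECORD_ID, RECORD_TYPE, NAME_ORG, NAME_FULL, ADDR_FULL). The sample values, the OPENDATA_ORG source code, and the exact field set are illustrative assumptions, not details from the release.

```python
import json
from collections import Counter

# Hypothetical sample lines; the real dataset's fields and values may differ.
SAMPLE_JSONL = """\
{"DATA_SOURCE": "OPENDATA_ORG", "RECORD_ID": "org-1", "RECORD_TYPE": "ORGANIZATION", "NAME_ORG": "Acme Widgets LLC", "ADDR_FULL": "100 Main St, Springfield, IL 62701"}
{"DATA_SOURCE": "OPENDATA_ORG", "RECORD_ID": "per-1", "RECORD_TYPE": "PERSON", "NAME_FULL": "Jane Q. Example"}
"""


def parse_records(text):
    """Yield one dict per non-empty line of JSON Lines text."""
    for line in text.splitlines():
        line = line.strip()
        if line:
            yield json.loads(line)


records = list(parse_records(SAMPLE_JSONL))

# Count records by type; records without RECORD_TYPE are grouped as UNKNOWN.
type_counts = Counter(r.get("RECORD_TYPE", "UNKNOWN") for r in records)
print(type_counts)
```

For a file of this size, the same generator pattern could read from an open file handle line by line rather than loading everything into memory.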
In short: OpenData.org is trying to do for organizational data what open knowledge graphs did for the web—create a structured map of how businesses, people, and locations connect.
At its core, the dataset functions as a large-scale entity graph linking companies, executives, and operational locations.
The release includes:
86 million organizations
101 million contacts (people-to-company relationships)
142 million business locations
These connections allow users to move beyond static company records and instead analyze relationships across the business ecosystem.
Each organization can be linked to one or more locations, such as headquarters, branch offices, operational facilities, and registered addresses. Meanwhile, the 101 million contact records link individuals to the organizations they control or operate.
That structure makes it possible to trace corporate hierarchies, discover shared executives across companies, and map operational footprints.
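As a rough illustration of the "shared executives" idea, the sketch below treats person-to-company links as simple pairs and groups them by person. The names and the pair layout are made up for the example; the dataset's actual schema is not described in the announcement.

```python
from collections import defaultdict

# Hypothetical (person, organization) links, invented for illustration.
contact_links = [
    ("Jane Example", "Acme Widgets LLC"),
    ("Jane Example", "Acme Holdings Inc"),
    ("Sam Sample", "Acme Widgets LLC"),
    ("Sam Sample", "Birch Logistics Co"),
    ("Lee Placeholder", "Birch Logistics Co"),
]


def shared_executives(links):
    """Return people linked to more than one organization, with those organizations sorted."""
    orgs_by_person = defaultdict(set)
    for person, org in links:
        orgs_by_person[person].add(org)
    return {
        person: sorted(orgs)
        for person, orgs in orgs_by_person.items()
        if len(orgs) > 1
    }


shared = shared_executives(contact_links)
for person, orgs in shared.items():
    print(person, "->", orgs)
```

In practice the person records would first need to be resolved, since the same individual can appear under several name variants.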
For analysts, investigators, and sales teams, that kind of relational data can provide critical context that traditional company databases often lack.
The dataset was assembled from filings and records across 100,000+ U.S. government agencies, including:
Internal Revenue Service
U.S. Department of Labor
U.S. Securities and Exchange Commission
Small Business Administration
United States Postal Service
The dataset also incorporates state and local regulatory filings, creating a far broader coverage base than most commercial corporate data providers, which often focus heavily on public companies.
Another key element: the inclusion of 162 reference identifiers used across financial, regulatory, and geographic datasets.
These identifiers include financial and geographic standards such as:
Legal Entity Identifier (LEI)
Financial Instrument Global Identifier (FIGI)
International Securities Identification Number (ISIN)
Placekey
The result is a dataset designed to act as a cross-reference layer, enabling organizations to match and connect multiple external data sources.
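A cross-reference layer of this kind can be pictured as a crosswalk table that lets two sources keyed on different identifiers be joined. The Python sketch below is a simplified illustration only: the field names, the placeholder identifier values, and both "external" sources are hypothetical.

```python
# Hypothetical crosswalk rows: one organization, several identifiers.
crosswalk = [
    {"org_id": "org-1", "lei": "LEI-PLACEHOLDER-1", "isin": "ISIN-PLACEHOLDER-1"},
    {"org_id": "org-2", "lei": "LEI-PLACEHOLDER-2", "isin": None},
]

# Hypothetical external source keyed by LEI (for example, a regulatory list).
regulatory_by_lei = {
    "LEI-PLACEHOLDER-1": {"regulatory_status": "active"},
    "LEI-PLACEHOLDER-2": {"regulatory_status": "lapsed"},
}

# Hypothetical external source keyed by ISIN (for example, a securities feed).
securities_by_isin = {
    "ISIN-PLACEHOLDER-1": {"security_type": "common stock"},
}


def link_sources(crosswalk_rows, by_lei, by_isin):
    """Join two sources through the crosswalk, keeping rows found in both."""
    linked = []
    for row in crosswalk_rows:
        regulatory = by_lei.get(row.get("lei"))
        security = by_isin.get(row.get("isin"))
        if regulatory is not None and security is not None:
            linked.append({"org_id": row["org_id"], **regulatory, **security})
    return linked


linked_rows = link_sources(crosswalk, regulatory_by_lei, securities_by_isin)
print(linked_rows)
```

Here only org-1 appears in the result, because org-2 has no ISIN to match against the securities source.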
Large-scale datasets are only useful if records can be accurately matched across sources—a notoriously difficult problem known as entity resolution.
That’s where the partnership with Senzing comes in.
Senzing’s technology uses a combination of machine learning and rule-based logic to determine whether two records represent the same real-world entity—even when names, addresses, or identifiers vary.
The system relies on the company’s Entity Centric Learning architecture, which continuously improves how entities are matched and resolved across datasets.
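To make the matching problem concrete, here is a deliberately simple Python sketch of rule-based record matching. It is not Senzing's method or API; it is a toy example that combines name normalization, a fuzzy string comparison from Python's standard difflib module, and two hand-written rules. The field names (name, zip, tax_id), the suffix list, and the 0.85 threshold are all assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher

# Assumed list of legal suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "incorporated", "llc", "corp", "corporation", "co", "company", "ltd"}


def normalize_name(name):
    """Lowercase, strip punctuation, and drop common legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def same_entity(a, b, threshold=0.85):
    """Toy decision: do two records likely describe the same organization?"""
    # Rule 1: an identical, non-empty identifier is treated as a match.
    if a.get("tax_id") and a.get("tax_id") == b.get("tax_id"):
        return True
    # Rule 2: similar normalized names plus the same known postal code.
    similarity = SequenceMatcher(
        None, normalize_name(a["name"]), normalize_name(b["name"])
    ).ratio()
    same_zip = a.get("zip") is not None and a.get("zip") == b.get("zip")
    return similarity >= threshold and same_zip


rec_a = {"name": "Acme Widgets, Inc.", "zip": "62701"}
rec_b = {"name": "ACME WIDGETS INCORPORATED", "zip": "62701"}
rec_c = {"name": "Birch Logistics Co", "zip": "62701"}

print(same_entity(rec_a, rec_b))
print(same_entity(rec_a, rec_c))
```

Production systems handle far more variation (transliteration, address parsing, conflicting identifiers), which is the work the partnership is meant to spare users from building themselves.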
According to Jeff Jonas, founder and CEO of Senzing, the integration enables organizations to quickly identify relationships and data inconsistencies without building complex matching systems themselves.
“Organizations using the OpenData.org dataset can immediately benefit from Senzing’s entity resolution technology,” Jonas said, noting that the system can detect hidden relationships and reconcile duplicate records in real time.
The platform can also run locally, so sensitive data does not have to be uploaded to the cloud, an important factor for compliance-heavy industries.
Datasets like OpenData.org’s reflect a growing shift toward entity graph architectures in enterprise data management.
Instead of storing isolated records—like a single company profile—entity graphs focus on the relationships connecting entities.
That approach has become increasingly important for applications such as:
Financial compliance and AML investigations
Know Your Customer (KYC) and Know Your Business (KYB) verification
Fraud detection and risk monitoring
Investment research and due diligence
CRM enrichment and B2B lead generation
AI model training and analytics
In many of these scenarios, the most valuable insight lies not in the individual record but in the connections between records.
For example:
Multiple companies sharing the same executive
Businesses operating from the same physical address
Ownership structures spanning multiple corporate entities
Without an entity graph, uncovering those connections often requires manual research across dozens of fragmented databases.
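Once ownership links are available in one place, following a chain of parents becomes a short graph traversal. The Python sketch below assumes a hypothetical child-to-parent mapping with invented company names; the announcement does not say how (or whether) ownership edges are represented in the dataset.

```python
# Hypothetical ownership edges: subsidiary -> direct parent.
parent_of = {
    "Acme Widgets LLC": "Acme Holdings Inc",
    "Acme Holdings Inc": "Acme Group Corp",
    "Birch Logistics Co": "Acme Group Corp",
}


def ultimate_parent(org, parents):
    """Follow parent links upward until an organization with no parent is reached."""
    seen = {org}
    while org in parents:
        org = parents[org]
        if org in seen:
            # Guard against circular ownership records in messy data.
            raise ValueError("ownership cycle detected")
        seen.add(org)
    return org


def group_by_ultimate_parent(parents):
    """Group every organization in the mapping under its top-level parent."""
    all_orgs = set(parents) | set(parents.values())
    groups = {}
    for org in sorted(all_orgs):
        groups.setdefault(ultimate_parent(org, parents), []).append(org)
    return groups


corporate_groups = group_by_ultimate_parent(parent_of)
print(corporate_groups)
```

The cycle check matters with real filings, where circular or contradictory ownership records are not unusual.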
For Jose M. Plehn, the goal of the project is to create an open infrastructure layer for business intelligence.
Plehn argues that despite the explosion of corporate data platforms, there has been no truly open dataset covering the full spectrum of organizations beyond public companies.
“Every transaction, relationship, and risk assessment connects back to an organization, person, or location,” he said.
He compares the dataset to a “Rosetta Stone” for business data, providing a shared set of identifiers and relationships that allow different datasets to interoperate.
If that vision holds, OpenData.org’s release could become a foundational resource for industries ranging from financial services to marketing technology.
The launch also reflects a broader trend across the data industry.
As AI systems and analytics platforms increasingly depend on large, structured datasets, entity graphs are becoming foundational infrastructure.
Companies building AI-driven applications—from fraud detection systems to GTM intelligence platforms—require datasets that connect people, organizations, and locations at scale.
By combining open access, government-sourced records, and built-in entity resolution, OpenData.org’s dataset aims to position itself as a key building block in that emerging ecosystem.
Whether it becomes a standard reference layer for business data remains to be seen, but with hundreds of millions of linked entities already mapped, the project has a significant head start.