I think most people would agree that private equity professionals sit at the top of the food chain when it comes to creating capital efficiency and structuring deals. But efficiently managing the flow of information out of a deal room is still a blood sport, one that consumes enormous time, energy, and people.
In 2025, 86% of private equity firms indicated they have introduced AI into their workflows, but only around 20% (a figure similar to their portfolio companies) have truly operationalized use cases with concrete results. The gap isn't about AI capability. It's about understanding the actual problem being solved, which invariably involves documents more than structured data.
Every PE professional knows the drill (along with every advisory firm that supports them). You gain access to a virtual data room containing thousands of documents that tell the story of a business—but rarely in a coherent narrative.
A typical VDR includes the Confidential Information Memorandum, years of financial statements and audit reports, major customer and supplier contracts, legal and cyber documents including corporate charters and litigation history, operational and HR records with organizational charts and key employee agreements, and finally technical documents covering IP filings and technology product roadmaps.
These documents exist in every conceivable format: PDF scans of contracts signed in 2015 alongside native digital PDFs; Excel models in XLS and XLSX formats, each slightly different; PowerPoint decks that may or may not agree with the CIM; board documents of every kind; and the occasional Visio diagram, MP3 recording, or even geospatial file for an infrastructure deal.
It’s not hard for critical information to get lost in this document chaos when you are pressed for time and the VDR folder hierarchy doesn’t match how you actually research a topic.
The industry has spent years pursuing the dream of structured data, attempting to normalize, standardize, and harmonize information into neat databases and data lakes. That approach works well for internal transactional company data, but it fundamentally misunderstands the nature of private markets VDR documentation.
Private markets investing deals with private company data, not clean SEC filings (which are painful enough themselves if you’ve ever worked through a 10-K). It’s messy for a reason:
Legacy burden: Most PE investments are in mature companies, and every mature company carries a decade or more of historical documentation, much of it created before modern data standards existed.
Format fragmentation: Our own analysis across all the documents we’ve processed shows that 30% or more of critical deal documents exist only as scanned images or PDFs without a searchable text layer, so Ctrl-F only gets you so far (see the triage sketch after this list).
Visual data complexity: Key metrics often hide in charts, graphs, and management presentations rather than structured tables (and those, too, may be scanned copies rather than originals).
Inherent uniqueness: Every company develops its own terminology, KPI definitions, and reporting structures based on its specific industry and history. Models are really smart but they aren’t experts in every sub-domain (yet).
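To make the format-fragmentation point concrete, here is a minimal sketch, emphatically not our production pipeline, of the triage step it implies: flag every PDF in a data room that lacks a usable text layer so it can be routed to OCR or a vision model. The folder path and character threshold are hypothetical, and the sketch assumes the open-source pypdf library.

```python
# Minimal OCR-triage sketch (hypothetical paths and threshold; assumes pypdf).
from pathlib import Path
from pypdf import PdfReader

MIN_CHARS_PER_PAGE = 50  # heuristic: below this, treat the file as a scan

def needs_ocr(pdf_path: Path) -> bool:
    """True if the PDF has little or no extractable text layer."""
    reader = PdfReader(str(pdf_path))
    chars = sum(len((page.extract_text() or "").strip()) for page in reader.pages)
    return chars < MIN_CHARS_PER_PAGE * len(reader.pages)

vdr_root = Path("datarooms/project_alpha")  # hypothetical VDR export
pdfs = list(vdr_root.rglob("*.pdf"))
scans = [p for p in pdfs if needs_ocr(p)]
print(f"{len(scans)} of {len(pdfs)} PDFs need an OCR / vision pass")
```

A crude character count like this misroutes the occasional edge case, but it is enough to show why a pipeline has to decide, document by document, whether text can be read directly or must be recovered first.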
The transformation isn't about making documents cleaner or creating a perfect dataset; it's about making them more understandable using the latest LLM capabilities (particularly vision models) alongside well-documented ML and OCR technology. Modern LLMs combined with retrieval-augmented generation (RAG), multimodal embeddings, and expanded context windows can now process entire data rooms holistically.
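As an illustration of what processing a data room holistically can look like, here is a stripped-down RAG loop: chunk the extracted text, embed it, retrieve the chunks most relevant to a diligence question, and hand them to a model. Every specific in it, the model names, chunk size, file, and prompt, is an assumption for the sketch, not a description of any vendor's architecture.

```python
# A stripped-down RAG loop over extracted VDR text (illustrative only).
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def chunk(text: str, size: int = 1500) -> list[str]:
    """Naive fixed-size chunking; production systems chunk by document structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

# Index: every chunk from every extracted document in the data room.
docs = {"cim.txt": open("cim.txt").read()}  # stand-in for a real VDR corpus
chunks = [c for text in docs.values() for c in chunk(text)]
index = embed(chunks)

# Query: retrieve the top-k chunks by cosine similarity, then ask the model.
question = "What customer concentration risks does the CIM disclose?"
q = embed([question])[0]
scores = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q))
context = "\n---\n".join(chunks[i] for i in np.argsort(scores)[-5:])

answer = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": f"Answer only from the excerpts below.\n{context}\n\nQ: {question}"}],
)
print(answer.choices[0].message.content)
```

A real system layers on structure-aware chunking, multimodal embeddings for charts and scans, and citations back to source pages, but the retrieve-then-generate shape stays the same.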
It’s still a combination of art and science, but it’s now possible to find patterns and inconsistencies that investment professionals might miss even after weeks or months of analysis.
The capabilities of the SOTA models have advanced rapidly.
The business case for AI in due diligence has reached an inflection point. OpenAI and Anthropic continue to reduce pricing for their models while introducing new, more powerful ones every three to six months, though both rate-limit customers and have created premium tiers for high usage.
Context windows have expanded from 8,000 tokens just two years ago to over 1 million tokens today (a token is roughly three-quarters of an English word), meaning far more documents can be analyzed simultaneously, though you still need some form of RAG for most VDRs. When you combine dramatically increased capability with an 80%+ reduction in costs over the past 24 months, tasks that were economically prohibitive are now possible at a reasonable price.
To put this in perspective: analyzing a single 2,000-document data room would have cost tens of thousands of dollars in compute just last year, with no reasoning capabilities (it was Ctrl-F on steroids); today it costs 80% less. The marginal cost of intelligence is compelling.
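The arithmetic behind that claim is easy to sketch. Every number below is an illustrative assumption, not a vendor quote, but it shows how per-token price cuts compound across a full diligence cycle:

```python
# Back-of-envelope cost model behind the "80% less" claim (all assumptions).
docs = 2_000
tokens_per_doc = 25_000            # assume ~30 pages at ~800 tokens per page
passes = 8                         # repeated Q&A over the diligence period
total_tokens = docs * tokens_per_doc * passes   # 400M tokens

price_then = 60 / 1_000_000        # $/token, large-context frontier pricing ~2 years ago
price_now = 10 / 1_000_000         # $/token, blended current pricing

cost_then = total_tokens * price_then
cost_now = total_tokens * price_now
print(f"Then: ${cost_then:,.0f}  Now: ${cost_now:,.0f}  "
      f"({1 - cost_now / cost_then:.0%} cheaper)")
# -> Then: $24,000  Now: $4,000  (83% cheaper)
```

Change any assumption and the absolute dollars move, but the ratio between the two eras is what makes comprehensive analysis economically viable.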
The firms seeing real results are building toward integrated intelligence systems. At ToltIQ, our clients are achieving results of this kind today.
These aren't incremental improvements. They represent a fundamental shift in how due diligence operates. But the real value isn't just speed or efficiency; it's comprehensiveness. Instead of sampling documents and hoping you caught the material issues, you can now analyze everything.
The firms that succeed won't be those waiting for perfect structured data. That day will never come. Success belongs to those building systems that thrive on document complexity, turning chaos into competitive advantage.
The permanent presence of unstructured documentation in private equity is a feature of how businesses actually operate. Every acquisition target will continue to arrive with its unique mix of legacy systems, proprietary formats, and inconsistent documentation.
Your next deal will still come with thousands of unstructured documents. The only question is whether you'll spend weeks sampling them manually or hours analyzing them comprehensively with AI.
© 2025 ToltIQ