By Jonathan Eichler, Private Equity AI Researcher
ToltIQ's comprehensive prompt engineering evaluation tested hundreds of due diligence queries across a range of prompting techniques to identify which approaches deliver the best results in reasoning, relevance, problem solving, and accuracy. Our analysis revealed critical patterns that dramatically impact AI performance.
One of the biggest performance killers in our testing was what we refer to as overburdened prompts: prompts that attempt multiple distinct tasks simultaneously. These prompts force trade-offs that dilute performance across all requested elements. The chart below illustrates the performance differences between prompts requesting multiple non-congruent tasks (overburdened) and those focused on a single cohesive objective.
Consider this prompt from our due diligence testing, which attempts to combine a data extraction task with a complex analytical classification task:
"Review all earnings calls, MD&A sections, and financial footnotes for FY21-FY24. Extract any labor-related items impacting adjusted EBITDA including compensation normalization, restructuring, or unusual personnel costs. For each adjustment, return quoted excerpts, document sources, classification under Labor QoE taxonomy, materiality flags, sustainability ratings, and notes on whether rationale requires follow-up."
This prompt yielded a response with major issues in both accuracy and relevance. The fix: split it into two separate prompts, each requesting one cohesive task.
Prompt 1: "Extract all labor-related EBITDA adjustments from earnings calls, MD&A sections, and footnotes for FY21-FY24. Include quoted excerpts, document sources, and classification under Labor QoE taxonomy."
Prompt 2: "For each labor adjustment identified, assess materiality (≥$100k or ≥0.5% EBITDA), sustainability rating (1=one-time, 2=recurring, 3=unclear), and flag items requiring additional documentation support."
The two separate, focused prompts achieved dramatically higher performance, with no major errors to report.
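As a rough illustration of the split, the sketch below runs the two focused prompts sequentially, feeding the extraction output from Prompt 1 into Prompt 2 as context. The `call_llm` helper and the use of the OpenAI Python SDK are assumptions made purely for illustration; any chat-completion client would work, and this is not a description of ToltIQ's internals.

```python
from openai import OpenAI

client = OpenAI()

def call_llm(prompt: str) -> str:
    """Hypothetical helper: send one prompt to a chat model, return the text reply."""
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content or ""

# Step 1: extraction only (Prompt 1 above).
extraction_prompt = (
    "Extract all labor-related EBITDA adjustments from earnings calls, "
    "MD&A sections, and footnotes for FY21-FY24. Include quoted excerpts, "
    "document sources, and classification under Labor QoE taxonomy."
)
extracted_adjustments = call_llm(extraction_prompt)

# Step 2: assessment only (Prompt 2 above), grounded in Step 1's output.
assessment_prompt = (
    "For each labor adjustment identified below, assess materiality "
    "(>=$100k or >=0.5% EBITDA), sustainability rating (1=one-time, "
    "2=recurring, 3=unclear), and flag items requiring additional "
    "documentation support.\n\n" + extracted_adjustments
)
print(call_llm(assessment_prompt))
```

Because each call carries a single cohesive objective, the model never has to trade extraction fidelity against classification quality within one response.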
This performance jump reveals that AI models excel when tasks are handled sequentially rather than simultaneously. This sequential approach mirrors how human analysts naturally work: first gathering relevant data, then analyzing it.
This principle explains why ToltIQ's Prompt Playlist feature is so effective. Prompt Playlists allow you to save multiple prompts and run them in a specific order, enabling "Chain of Thought" prompting where the AI breaks down complex problems into intermediate steps. Instead of forcing the AI to multitask, this approach lets each step perform at peak efficiency by leveraging the natural cognitive flow of information gathering followed by analysis.
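To make the playlist idea concrete, here is a minimal sketch of how an ordered prompt sequence might be executed, with each step receiving the accumulated output of earlier steps. The playlist structure and the reuse of the `call_llm` helper from the previous sketch are illustrative assumptions, not ToltIQ's implementation.

```python
# Illustrative sketch of a "prompt playlist": an ordered list of prompts
# executed one at a time, each step building on the results of earlier steps.

def run_playlist(prompts: list[str], call_llm) -> list[str]:
    results: list[str] = []
    context = ""
    for prompt in prompts:
        # Prepend accumulated context so each step can use prior answers.
        full_prompt = (context + "\n\n" + prompt).strip()
        answer = call_llm(full_prompt)
        results.append(answer)
        context = full_prompt + "\n\n" + answer
    return results

# Usage: the two focused due diligence prompts from above, run in order.
# step_outputs = run_playlist([extraction_prompt, assessment_prompt], call_llm)
```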
An important distinction when engineering successful prompts is that length does not determine overburdening. Our overburdened prompts averaged 350 characters versus 295 for non-overburdened prompts, a minimal difference. In fact, the longest prompt in our testing wasn't overburdened at all, demonstrating that complexity lies in task structure, not word count.
The key to effective prompting lies not in length but in well-defined structure. Across the board, LLMs excel with precise, action-oriented prompts that define specific data inputs, sourcing instructions, formatting expectations, and a single analytical goal. As AI continues to evolve, the fundamentals stay the same: well-structured prompts consistently outperform fragmented ones.
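As one way to put that structure into practice, the sketch below assembles a prompt from the four elements named above. The field names and example values are illustrative assumptions, not a ToltIQ template.

```python
# Illustrative assembly of a well-structured, single-goal prompt from the
# four elements discussed above: data inputs, sourcing instructions,
# formatting expectations, and one analytical goal.

def build_prompt(data_inputs: str, sourcing: str, formatting: str, goal: str) -> str:
    return (
        f"Data inputs: {data_inputs}\n"
        f"Sourcing instructions: {sourcing}\n"
        f"Formatting expectations: {formatting}\n"
        f"Analytical goal: {goal}"
    )

prompt = build_prompt(
    data_inputs="Earnings calls, MD&A sections, and footnotes for FY21-FY24",
    sourcing="Quote excerpts and cite the source document for every item",
    formatting="Return a table with columns: adjustment, excerpt, source, category",
    goal="Extract all labor-related EBITDA adjustments",
)
print(prompt)
```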
© 2025 ToltIQ