wind-turbine.com

Turbit Publishes Research on AI Question Answering for Wind Operations

08.11.2025

Turbit has published research addressing a fundamental challenge in wind operations: extracting reliable answers from large sets of recurring technical reports. The paper, 'PluriHop – Exhaustive, Recall-Sensitive QA over Distractor-Rich Corpora,' presents an AI system that achieves up to 52% relative improvement in F1 score over standard approaches, though its absolute performance shows that significant room for further research remains.

The research, conducted by Mykolas Sveistrys and Dr. Richard Kunert from Turbit Systems GmbH, introduces and formalizes a new category of questions that require complete information from entire document sets—where missing a single relevant report produces an incorrect answer. The findings are now available on arXiv.

The Problem: Incomplete Retrieval in Operational Question Answering
Wind operators routinely need answers that depend on complete information from multiple documents: which turbines showed specific wear patterns across all inspections, whether component issues are increasing or decreasing over time, or which anomalies appeared across a fleet during a given period.

Current Retrieval-Augmented Generation (RAG) systems typically retrieve a fixed number of top-ranked documents, often 10-20, and stop. This works when a question has a clear stopping point, but fails when every document in a corpus might contain relevant information. The result is incomplete answers that operators cannot rely on for operational or financial decisions.
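A minimal sketch of why a fixed cutoff caps recall, using a hypothetical corpus: even a perfect ranker that places every relevant report first cannot return more than k of them. The report IDs and counts are invented for illustration, and `top_k_recall` is a hypothetical helper, not part of any RAG library.

```python
def top_k_recall(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant documents that appear among the first k ranked results."""
    if not relevant:
        return 1.0
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / len(relevant)


# Hypothetical corpus: 24 monthly inspection reports for turbine T07, all needed to
# answer "Has blade damage on T07 been declining?", plus 100 distractor reports.
relevant = {f"T07_inspection_{month:02d}" for month in range(1, 25)}
distractors = [f"other_report_{n:03d}" for n in range(100)]

# A perfect ranker: every relevant report is ranked above every distractor.
perfect_ranking = sorted(relevant) + distractors

for k in (10, 20, len(perfect_ranking)):
    # Recall is capped at k / 24 no matter how good the ranking is.
    print(k, round(top_k_recall(perfect_ranking, relevant, k), 3))
```

With k=10 recall cannot exceed about 0.417, and only scanning the whole corpus (k=124 here) reaches 1.0.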

Pluri-Hop Questions: A New Category

The research team coined the term 'pluri-hop questions' to describe queries that are:

  • Recall-sensitive: omitting one relevant document produces an incorrect answer
  • Exhaustive: all documents must be checked; there is no stopping condition
  • Exact: there is one correct answer, not a range of valid interpretations
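To make the three properties concrete, here is a toy illustration with invented data. The question "Which turbines were flagged in any 2024 oil analysis?" has one exact answer, and that answer is only known after every report has been read. Dropping a single report silently changes it. The report IDs, turbine names, and the `flagged_turbines` helper are hypothetical.

```python
# Hypothetical per-report extractions: report ID -> turbines flagged in that report.
reports = {
    "oil_2024_01": {"T03"},
    "oil_2024_04": set(),
    "oil_2024_07": {"T03", "T11"},
    "oil_2024_10": {"T05"},
}


def flagged_turbines(report_ids) -> set[str]:
    """Exact answer: the union of flagged turbines over every report consulted."""
    answer: set[str] = set()
    for rid in report_ids:
        answer |= reports[rid]
    return answer


exhaustive = flagged_turbines(reports)             # every report checked
incomplete = flagged_turbines(list(reports)[:-1])  # the last report was never retrieved

print(sorted(exhaustive))  # ['T03', 'T05', 'T11']
print(sorted(incomplete))  # ['T03', 'T11']
print(exhaustive == incomplete)  # False: one missed report makes the answer wrong
```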

This category is distinct from multi-hop questions (where evidence spans a few documents) and summarization tasks (where approximate answers are acceptable). Pluri-hop questions are common in industries that generate recurring reports: maintenance logs, compliance filings, lab results, and inspection records.

PluriHopWIND: A Benchmark Based on Real Wind Industry Data

To study this problem, the team created PluriHopWIND: 48 questions based on 191 real technical reports from wind operations, including oil analysis reports, turbine inspections, and service logs in German and English.

The dataset's key characteristic is high repetitiveness. Wind operations generate thousands of similar reports—monthly inspections following the same template, recurring service documentation, and standardized test results. This creates significant amounts of semantically similar but irrelevant material that complicates retrieval.

Using a repetitiveness metric based on inter-document similarity, the research demonstrates that PluriHopWIND is 8-40% more repetitive than existing multi-hop benchmarks. This higher distractor density better reflects the practical challenges of answering questions about operational data.
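The article does not spell out the paper's repetitiveness formula, so the following is only a sketch of one plausible inter-document similarity measure: the mean pairwise cosine similarity of bag-of-words vectors. It is an assumption chosen for illustration, not the authors' metric. Templated reports score high; unrelated reports score near zero.

```python
import math
from collections import Counter
from itertools import combinations


def bag_of_words(text: str) -> Counter:
    """Lowercased token counts; a deliberately simple document representation."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def repetitiveness(corpus: list[str]) -> float:
    """Mean pairwise cosine similarity over all document pairs (illustrative metric)."""
    vectors = [bag_of_words(doc) for doc in corpus]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


templated = [
    "monthly inspection turbine T01 blade status ok",
    "monthly inspection turbine T02 blade status ok",
    "monthly inspection turbine T03 blade status damaged",
]
varied = [
    "gearbox oil sample shows elevated iron",
    "blade inspection found leading edge erosion",
    "converter fault reset after grid event",
]
print(round(repetitiveness(templated), 3))  # 0.762
print(repetitiveness(varied))               # 0.0
```

On real corpora, TF-IDF weighting or sentence embeddings would usually replace raw token counts; averaging over all document pairs stays the same.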

PluriHopRAG: Exhaustive Retrieval with Early Filtering

The paper introduces PluriHopRAG, a retrieval architecture designed for recall-sensitive question answering. Its core idea is to check every document, but to filter out irrelevant material before expensive language model inference.

The system implements two methods:

Document-level query decomposition breaks complex queries into document-specific subquestions. Rather than asking 'Has blade damage been declining?' across all documents, the system asks each report: 'Does this cover the relevant turbine?', 'What is the inspection date?', and 'What blade damage was recorded?' This matches how information actually exists in operational reports.
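The decomposition described above can be sketched as follows, assuming each report is a short templated string. In PluriHopRAG the subquestions would be answered by a language model; here hypothetical regex helpers (`covers_turbine`, `inspection_date`, `blade_damage`) stand in for those model calls, and the final trend check is plain code. Treating "declining" as strictly decreasing counts is also an assumption.

```python
import re
from datetime import date

# Hypothetical report texts following a fixed template.
REPORTS = [
    "Turbine: T07 | Date: 2024-09-12 | Blade damage findings: 2",
    "Turbine: T07 | Date: 2024-03-05 | Blade damage findings: 5",
    "Turbine: T02 | Date: 2024-06-20 | Blade damage findings: 1",
    "Turbine: T07 | Date: 2024-06-18 | Blade damage findings: 3",
]


# One helper per document-level subquestion. In a real system each would be an LLM
# call on the report; these regexes are hypothetical stand-ins.
def covers_turbine(report: str, turbine: str) -> bool:
    """'Does this cover the relevant turbine?'"""
    match = re.search(r"Turbine: (\S+)", report)
    return match is not None and match.group(1) == turbine


def inspection_date(report: str) -> date:
    """'What is the inspection date?'"""
    match = re.search(r"Date: (\d{4}-\d{2}-\d{2})", report)
    return date.fromisoformat(match.group(1))


def blade_damage(report: str) -> int:
    """'What blade damage was recorded?' (here: number of findings)"""
    match = re.search(r"Blade damage findings: (\d+)", report)
    return int(match.group(1))


def is_declining(turbine: str) -> bool:
    """Pose the subquestions to every report, then aggregate the answers."""
    rows = sorted(
        (inspection_date(r), blade_damage(r)) for r in REPORTS if covers_turbine(r, turbine)
    )
    counts = [n for _, n in rows]
    # Assumption: "declining" means strictly fewer findings at each later inspection.
    return len(counts) >= 2 and all(a > b for a, b in zip(counts, counts[1:]))


print(is_declining("T07"))  # True: 5 -> 3 -> 2 findings over 2024
print(is_declining("T02"))  # False: a single inspection shows no trend
```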

Cross-encoder filtering estimates document relevance using a lightweight model before full language model reasoning occurs. This reduces computational cost while maintaining high recall of relevant documents.
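A minimal sketch of the filtering step: every document is scored against the query, and only those at or above a threshold move on to the expensive language model stage. The `overlap_score` function is a hypothetical lexical stand-in for a cross-encoder; a real system could swap in a trained model such as `CrossEncoder.predict` from the sentence-transformers library, which scores query-document pairs jointly. Applying the filter as a score threshold is an assumption, not a detail confirmed by the article.

```python
from typing import Callable


def filter_documents(
    query: str,
    docs: list[str],
    score: Callable[[str, str], float],
    threshold: float,
) -> list[str]:
    """Score every (query, document) pair and keep those at or above the threshold.

    There is no fixed top-k: how many documents survive depends only on their scores,
    so a large set of relevant reports is not truncated.
    """
    return [doc for doc in docs if score(query, doc) >= threshold]


def overlap_score(query: str, doc: str) -> float:
    """Hypothetical stand-in for a cross-encoder: share of query terms found in the document."""
    query_terms = set(query.lower().split())
    doc_terms = set(doc.lower().split())
    return len(query_terms & doc_terms) / len(query_terms) if query_terms else 0.0


docs = [
    "T07 blade inspection: erosion on blade 2",
    "T07 gearbox oil analysis: iron within limits",
    "T02 blade inspection: no findings",
    "T07 blade inspection: erosion repaired",
]
kept = filter_documents("T07 blade", docs, overlap_score, threshold=1.0)
# Only the kept documents would be passed on to the expensive language model step.
print(kept)  # the two T07 blade inspection reports
```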

On the PluriHopWIND benchmark, PluriHopRAG achieved 18-52% relative improvement in F1 scores compared to standard RAG approaches, depending on the base language model. It also outperformed GraphRAG and multimodal RAG systems.
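Assuming the relative improvement is computed in the usual way, it measures the gain against the baseline score rather than against a perfect score:

```latex
\Delta_{\text{rel}} = \frac{F_1^{\text{PluriHopRAG}} - F_1^{\text{baseline}}}{F_1^{\text{baseline}}},
\qquad
\text{e.g. } \frac{0.42 - 0.30}{0.30} = 0.40
```

With these hypothetical numbers, a baseline F1 of 0.30 rising to 0.42 is a 40% relative improvement but only 12 percentage points in absolute terms, which is how large relative gains can coexist with modest absolute scores.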

Performance Results and Ongoing Development

This research was conducted as part of Turbit's development of the Turbit Assistant, an AI system that extracts information from technical reports and automates routine analysis. The methods demonstrated in PluriHopRAG directly improve the Assistant's ability to provide reliable answers from operational documentation.

The paper reports that current approaches, including PluriHopRAG, reach a statement-wise F1 score of at most 40-47% on the benchmark. While PluriHopRAG improves significantly on the baseline and competing methods, the authors note that this leaves considerable room for future improvement. The modest absolute scores underline how difficult pluri-hop question answering is and mark it as an active area for continued research.
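Statement-wise F1 can be illustrated with a short sketch, assuming each answer is split into atomic statements and a predicted statement counts as correct only if it exactly matches a gold statement. Exact string matching is a simplifying assumption; the paper's actual matching procedure may differ, and the example answers are invented.

```python
def statement_f1(predicted: set[str], gold: set[str]) -> float:
    """F1 over atomic statements, using exact matching (a simplifying assumption)."""
    if not predicted or not gold:
        # Both empty counts as a perfect match; otherwise nothing can be correct.
        return 1.0 if predicted == gold else 0.0
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold answer and prediction for "Which turbines were flagged in 2024?"
gold = {"T03 was flagged", "T05 was flagged", "T11 was flagged"}
predicted = {"T03 was flagged", "T11 was flagged", "T09 was flagged"}
print(round(statement_f1(predicted, gold), 3))  # 0.667
```

Here one missed turbine and one spurious turbine give precision and recall of 2/3 each, so F1 is about 0.667; in a recall-sensitive task, every missed report pulls recall down in exactly this way.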

Conclusion

The research formalizes pluri-hop questions as a distinct category requiring different retrieval strategies than conventional multi-hop or summarization tasks. The PluriHopWIND benchmark, with its high distractor density based on real wind industry data, exposes current limitations in AI question-answering systems when handling recurring report corpora.

The PluriHopRAG architecture demonstrates that exhaustive retrieval combined with efficient filtering can deliver measurable improvements over standard approaches. However, absolute performance levels indicate significant opportunities remain for advancing methods in this domain. For industries built on recurring report data—including wind energy, healthcare, finance, and compliance—these findings provide a foundation for building more reliable AI systems while acknowledging the complexity of the challenge.

As wind fleets grow and operational data volumes increase, addressing the pluri-hop question-answering challenge becomes increasingly relevant for maintaining reliable, efficient operations. 

Read the full paper: PluriHop – Exhaustive, Recall-Sensitive QA over Distractor-Rich Corpora by Mykolas Sveistrys and Dr. Richard Kunert, available on arXiv.