Real-time trust-weighted consensus and auditing of LLMs for secure propaganda detection
Angelo Gaeta (Member of the Collaboration Group)
Hossein Hosseinalibeiki (Member of the Collaboration Group)
Vincenzo Loia (Member of the Collaboration Group)
Francesco Orciuoli (Member of the Collaboration Group)
2026
Abstract
The rapid amplification of digital propaganda through algorithmically mediated platforms poses a serious threat to information integrity and public trust. Although Large Language Models (LLMs) demonstrate strong general reasoning capabilities, their deployment in security-sensitive tasks such as propaganda detection remains constrained by opaque decision-making, stochastic variability, and limited auditability. Existing solutions rely either on single-model pipelines, which create a single point of failure, or on static ensemble schemes that lack inference-time governance. To address this gap, we propose POES, a real-time trust-weighted consensus and auditing framework that orchestrates multiple pre-trained LLM agents without task-specific retraining. The architecture integrates dynamic reputation modeling, quorum-based weighted voting, difficulty-aware updates, and rationale-level semantic auditing. A sentinel-based audit layer provides confidence-aware escalation signals for downstream human oversight. The framework is evaluated on the human-annotated HQP (High-Quality Propaganda) dataset (29,596 instances; 15% propaganda prevalence) under imbalanced and training-free conditions. POES achieves a macro-F1 score of 0.558 with statistically significant improvements over individual LLM baselines. Human validation further confirms that audit flags align with independent “needs review” judgments, demonstrating practical triage utility. By formalizing inference-time trust aggregation and exposing structured audit artifacts, POES advances secure and governance-aware LLM ensemble design for high-stakes NLP applications.
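
The abstract names trust-weighted quorum voting and difficulty-aware reputation updates but does not give their formulas. As a rough illustration of the general idea only, the following Python sketch shows one way such a scheme could look. Everything in it is an assumption made for illustration and is not taken from POES: the names `Vote`, `weighted_consensus`, and `update_trust`, the 0.6 quorum threshold, the learning rate, the use of the consensus label as the update target, and the rule that harder items produce smaller trust updates.

```python
from dataclasses import dataclass


@dataclass
class Vote:
    agent: str   # hypothetical agent identifier
    label: int   # 1 = propaganda, 0 = not propaganda


def weighted_consensus(votes, trust, quorum=0.6):
    """Aggregate binary votes weighted by per-agent trust.

    Returns (label, support, escalate), where support is the share of
    total trust behind the winning label and escalate is True when that
    share falls below the (assumed) quorum threshold.
    """
    total = sum(trust[v.agent] for v in votes)
    if total <= 0:
        raise ValueError("total trust must be positive")
    positive = sum(trust[v.agent] for v in votes if v.label == 1)
    share_positive = positive / total
    label = 1 if share_positive >= 0.5 else 0
    support = share_positive if label == 1 else 1.0 - share_positive
    return label, support, support < quorum


def update_trust(trust, votes, consensus, difficulty, lr=0.1):
    """Return a new trust table after one item.

    Assumed rule: agents that agree with the consensus move toward 1,
    dissenting agents move toward 0, and the step shrinks as the item
    difficulty (in [0, 1]) grows.
    """
    step = lr * (1.0 - difficulty)
    updated = dict(trust)
    for v in votes:
        target = 1.0 if v.label == consensus else 0.0
        updated[v.agent] = (1.0 - step) * trust[v.agent] + step * target
    return updated


if __name__ == "__main__":
    trust = {"agent_a": 0.75, "agent_b": 0.5, "agent_c": 0.25}
    votes = [Vote("agent_a", 1), Vote("agent_b", 1), Vote("agent_c", 0)]
    label, support, escalate = weighted_consensus(votes, trust)
    print(label, round(support, 3), escalate)
    print(update_trust(trust, votes, label, difficulty=0.5))
```

In this sketch the escalation flag plays the role of a confidence-aware signal for human review: an item whose winning label is backed by less than the quorum share of trust would be routed to a reviewer instead of being accepted automatically.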


