Real-time trust-weighted consensus and auditing of LLMs for secure propaganda detection

Gaeta, Angelo; Hosseinalibeiki, Hossein; Loia, Vincenzo; Orciuoli, Francesco

doi:10.1016/j.asoc.2026.115459

The rapid amplification of digital propaganda through algorithmically mediated platforms poses a serious threat to information integrity and public trust. Although Large Language Models (LLMs) demonstrate strong general reasoning capabilities, their deployment in security-sensitive tasks such as propaganda detection remains constrained by opaque decision-making, stochastic variability, and limited auditability. Existing solutions rely either on single-model pipelines, which create a single point of failure, or on static ensemble schemes that lack inference-time governance. To address this gap, we propose POES, a real-time trust-weighted consensus and auditing framework that orchestrates multiple pre-trained LLM agents without task-specific retraining. The architecture integrates dynamic reputation modeling, quorum-based weighted voting, difficulty-aware updates, and rationale-level semantic auditing. A sentinel-based audit layer provides confidence-aware escalation signals for downstream human oversight. The framework is evaluated on the human-annotated HQP (High-Quality Propaganda) dataset (29,596 instances; 15% propaganda prevalence) under imbalanced and training-free conditions. POES achieves a macro-F1 score of 0.558, a statistically significant improvement over individual LLM baselines. Human validation further confirms that audit flags align with independent "needs review" judgments, demonstrating practical triage utility. By formalizing inference-time trust aggregation and exposing structured audit artifacts, POES advances secure and governance-aware LLM ensemble design for high-stakes NLP applications.
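To make the idea of quorum-based, reputation-weighted voting concrete, the following is a minimal Python sketch. It is not the POES implementation: the abstract does not specify the aggregation or update rules, so the function names (`weighted_consensus`, `update_reputation`), the quorum threshold, the learning rate, and the particular difficulty scaling are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Vote:
    agent: str   # hypothetical agent identifier
    label: int   # 1 = propaganda, 0 = not propaganda


def weighted_consensus(votes, reputation, quorum=0.6):
    """Aggregate binary votes weighted by each agent's reputation.

    Returns (label, support, escalate). `support` is the share of total
    reputation behind the winning label; `escalate` is True when that
    share falls below the (assumed) quorum threshold.
    """
    total = sum(reputation[v.agent] for v in votes)
    if total <= 0:
        raise ValueError("total reputation must be positive")
    positive = sum(reputation[v.agent] for v in votes if v.label == 1)
    share = positive / total
    label = 1 if share >= 0.5 else 0
    support = max(share, 1.0 - share)
    return label, support, support < quorum


def update_reputation(votes, reputation, consensus, difficulty, lr=0.1):
    """Assumed update rule: nudge reputation toward 1 for agents that
    agreed with the consensus and toward 0 for those that did not.
    The step shrinks as `difficulty` (in [0, 1]) grows, so hard items
    move reputations less than easy ones.
    """
    step = lr * (1.0 - difficulty)
    updated = dict(reputation)
    for v in votes:
        target = 1.0 if v.label == consensus else 0.0
        updated[v.agent] += step * (target - updated[v.agent])
    return updated


if __name__ == "__main__":
    reputation = {"a": 0.8, "b": 0.5, "c": 0.25}
    votes = [Vote("a", 1), Vote("b", 0), Vote("c", 0)]
    label, support, escalate = weighted_consensus(votes, reputation)
    print(label, round(support, 3), escalate)
    print(update_reputation(votes, reputation, label, difficulty=0.5))
```

In this sketch, a single high-reputation agent can outvote two weaker ones, but the resulting narrow margin falls below the quorum and raises the escalation flag, which is the kind of signal a human reviewer could use for triage.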