Publications
- Trust The Typical. arXiv • Feb 2026
- VERGE: Formal Refinement and Guidance Engine for Verifiable LLM Reasoning. arXiv • Jan 2026
- Mid-Think: Training-Free Intermediate-Budget Reasoning via Token-Level Triggers. arXiv • Jan 2026
- Grammars of Formal Uncertainty: When to Trust LLMs in Automated Reasoning Tasks. NeurIPS 2025 • May 2025
- K⁴: Online Log Anomaly Detection via Unsupervised Typicality Learning (*equal contribution). HiPC 2025 • July 2025
- Efficient Fine-Grained GPU Performance Modeling for Distributed Deep Learning of LLM. HiPC 2025 • September 2025
Research Interests
- Formal Reasoning and Verification: Developing rigorous formal-logic methodologies, leveraging SMT-LIB encodings and solver frameworks to verify, interpret, and enhance the correctness of LLM-generated reasoning (see the solver sketch after this list).
- Fine-Tuning LLMs and Vision Models: Leveraging techniques such as LoRA to optimize large language and vision models for specific tasks while maintaining computational efficiency.
- Redundancy Mitigation in LLMs: Investigating approaches to reduce redundancy in large language models, enhancing performance and efficiency.
- Model Optimization: Developing strategies for optimizing machine learning models, including pruning and hyperparameter tuning, to improve both accuracy and resource utilization.
- Explainable AI: Advancing interpretability in AI models, focusing on enhancing transparency and providing actionable insights for users.
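
A minimal sketch of the solver-backed checking described in the first item, using Z3's Python bindings (the z3-solver package). The SMT-LIB snippet is a hypothetical stand-in for LLM output, not taken from an actual model run:

```python
# Minimal sketch: sanity-checking a (hypothetical) LLM-generated SMT-LIB
# encoding with Z3. Requires the z3-solver package.
import z3

llm_smtlib = """
(declare-const x Int)
(declare-const y Int)
(assert (> x 0))
(assert (= y (+ x 1)))
(assert (<= y 0))
"""

solver = z3.Solver()
solver.add(z3.parse_smt2_string(llm_smtlib))

# sat   -> the encoding is internally consistent (a model exists)
# unsat -> the constraints contradict each other, which often signals a
#          faulty translation of the original problem by the LLM
print(solver.check())  # prints: unsat (x > 0 forces y = x + 1 >= 2 > 0)
```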
Research Statement
My research centers on a fundamental tension in modern Artificial Intelligence: the gap between the powerful but hallucination-prone creativity of Large Language Models (LLMs) and the strict, deterministic guarantees required for trustworthy systems. I am developing a new class of "Neuro-Symbolic" architectures that do not just generate code or proofs, but actively reason about their own uncertainty, verify their outputs against formal constraints, and optimize their "thinking" budgets for maximum efficiency.
My recent work, including research published at NeurIPS 2025, tackles the "epistemological gap" between probabilistic models and formal logic. I introduced a Probabilistic Context-Free Grammar (PCFG) framework to model the uncertainty of LLM-generated formal artifacts (like SMT-LIB programs). By treating LLM outputs not as final answers but as hypotheses with measurable uncertainty, I developed "selective verification" protocols that reduce logical errors by 14–100%.
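
The selective-verification idea reduces to an accept-or-abstain gate over candidate outputs. The snippet below is illustrative only: the published method derives uncertainty from a PCFG fit over sampled formal artifacts, whereas this stand-in scores a candidate by mean per-token negative log-likelihood.

```python
import math

def sequence_uncertainty(token_logprobs: list[float]) -> float:
    """Toy stand-in for the PCFG-based score: mean per-token negative
    log-likelihood, squashed into [0, 1). 0 = fully confident."""
    if not token_logprobs:
        return 1.0
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return 1.0 - math.exp(-mean_nll)

def selectively_verify(candidates, threshold=0.2):
    """candidates: (smtlib_program, token_logprobs) pairs for one query."""
    accepted, deferred = [], []
    for program, logprobs in candidates:
        if sequence_uncertainty(logprobs) <= threshold:
            accepted.append(program)   # trusted: passed downstream as-is
        else:
            deferred.append(program)   # abstain: route to a solver or a human
    return accepted, deferred
```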
Beyond verification, I focus on the efficiency of reasoning. In my work on "Mid-Think," I demonstrated that reasoning behaviors in hybrid models are driven by specific token-level triggers rather than high-level instructions. I leveraged this to create training-free prompting strategies that dynamically adjust the model's "compute budget" during inference, achieving superior accuracy-latency trade-offs. I also work on LLM safety using out-of-distribution (OOD) detection techniques.
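
A hypothetical illustration of the token-level control idea: pre-filling (or immediately closing) the reasoning span steers a hybrid model toward a shorter chain of thought without any fine-tuning. The tag strings and the `model.generate` call are placeholders, not the actual Mid-Think triggers or any specific API.

```python
# Hypothetical sketch: steering reasoning length with token-level triggers.
# The tag strings below are illustrative placeholders.
THINK_OPEN, THINK_CLOSE = "<think>", "</think>"

def budgeted_prompt(question: str, budget_tokens: int) -> str:
    if budget_tokens == 0:
        # Close the reasoning span immediately: the model answers directly,
        # skipping its default long chain of thought.
        return f"{question}\n{THINK_OPEN}{THINK_CLOSE}\n"
    # Open the span and cap visible reasoning at decode time instead.
    return f"{question}\n{THINK_OPEN}\n"

def generate_with_budget(model, question: str, budget_tokens: int) -> str:
    # `model.generate` stands in for any completion-style generation call.
    prompt = budgeted_prompt(question, budget_tokens)
    return model.generate(prompt, max_new_tokens=budget_tokens + 128)
```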
I am currently pivoting towards Reasoning Diffusion Language Models and Energy-Based Models (EBMs) to overcome the limitations of standard auto-regressive generation. My hypothesis is that "reasoning" should not be a linear, left-to-right process, but an iterative refinement, similar to how diffusion models denoise an image.
- Diffusion for Logic: I am exploring how diffusion processes can allow models to "revise" their logic in continuous latent space, enabling self-correction before generating a final answer (a toy sketch follows this list).
- Energy-Based Verification: I am investigating EBMs to model the "global consistency" of a reasoning chain. Instead of predicting the next token, these models assess the "energy" (or compatibility) of an entire proof or plan, guiding the generator toward formally correct states (a reranking sketch also follows).
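
A toy sketch of the "reasoning as iterative refinement" view. The `encoder`, `denoiser`, and `decoder` components are hypothetical; the point is only that every step revises the whole latent plan at once rather than committing to tokens left to right.

```python
import torch

def refine_reasoning(encoder, denoiser, decoder, prompt_ids, steps=10):
    """Toy latent-refinement loop; all three components are hypothetical."""
    cond = encoder(prompt_ids)      # condition on the problem statement
    z = torch.randn(1, 64, 512)     # noisy latent covering the *entire* plan
    for t in reversed(range(steps)):
        # Each step can revise earlier parts of the plan in light of later
        # ones, i.e. self-correction before any token is committed.
        z = denoiser(z, cond, t)
    return decoder(z)               # decode the refined reasoning chain
```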
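
And a sketch of energy-based verification used as a reranker: a learned energy function scores whole candidate chains, and the lowest-energy (most globally consistent) one is kept. The tiny untrained MLP here is only a stand-in for a trained EBM.

```python
import torch

@torch.no_grad()
def pick_most_consistent(energy_model, chain_embeddings):
    """chain_embeddings: (num_candidates, dim), one vector per full chain."""
    energies = energy_model(chain_embeddings).squeeze(-1)
    return int(torch.argmin(energies))  # low energy = high compatibility

# Usage sketch: an untrained MLP stands in for a learned energy function.
energy_model = torch.nn.Sequential(
    torch.nn.Linear(512, 128), torch.nn.ReLU(), torch.nn.Linear(128, 1)
)
candidates = torch.randn(8, 512)  # embeddings of 8 sampled reasoning chains
best = pick_most_consistent(energy_model, candidates)
```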
Drawing on my experience as an Applied Scientist Intern at AWS and my background in formal methods (Lean/Coq), my goal is to build AI systems that are safe enough for critical infrastructure. I aim to create models that don't just "guess" the answer, but construct a verifiable path to it, combining the flexibility of deep learning with the rigor of mathematical proof.