2026-05-14 · AI Daily (8 items)
📰 Key Updates
7.5
RealICU: Do LLM Agents Understand Long-Context ICU Data? A Benchmark Beyond Behavior Imitation
Intensive care units (ICUs) generate long, dense, and evolving streams of clinical
information, where physicians must repeatedly reassess patient states under time pressure,
underscoring a clear need fo…
7.5
SafeHarbor: Hierarchical Memory-Augmented Guardrail for LLM Agent Safety
With the rapid evolution of foundation models, Large Language Model (LLM) agents
have demonstrated increasingly powerful tool-use capabilities. However, this proficiency introduces
significant securit…
7.3
MemReread: Enhancing Agentic Long-Context Reasoning via Memory-Guided Rereading
To tackle long-context reasoning tasks without the quadratic complexity of standard
attention mechanisms, approaches based on agent memory have emerged, which typically maintain a
dynamically updated…
6.6
MinT: Managed Infrastructure for Training and Serving Millions of LLMs
We present MindLab Toolkit (MinT), a managed infrastructure system for Low-Rank
Adaptation (LoRA) post-training and online serving. MinT targets a setting where many trained
policies are produced over…
6.5
PersonalAI 2.0: Enhancing knowledge graph traversal/retrieval with planning mechanism for Personalized LLM Agents
We introduce PersonalAI 2.0 (PAI-2), a novel framework designed to enhance large
language model (LLM) based systems through the integration of external knowledge graphs (KGs). The
proposed approach addres…
6.5
Learning Agentic Policy from Action Guidance
Agentic reinforcement learning (RL) for Large Language Models (LLMs) critically
depends on the exploration capability of the base policy, as training signals emerge only within its
in-capability regio…
6.3
NVIDIA-AI-Blueprints/video-search-and-summarization
Suite of reference architectures for building GPU-accelerated vision agents and
AI-powered video analytics applications.
6.3
awslabs/agent-plugins
Agent Plugins for AWS equip AI coding agents with the skills to help you architect,
deploy, and operate on AWS.