Publication date: 2026-03-01
Entries: 20
1. Our agreement with the Department of War
- Source: OpenAI News
- Published: 2026-02-28 12:30 UTC
- Link: https://openai.com/index/our-agreement-with-the-department-of-war
- Summary: Details on OpenAI’s contract with the Department of War, outlining safety red lines, legal protections, and how AI systems will be deployed in classified environments.
2. Graph Your Way to Inspiration: Integrating Co-Author Graphs with Retrieval-Augmented Generation for Large Language Model Based Scientific Idea Generation
- Source: arXiv cs.AI
- Published: 2026-02-28 05:00 UTC
- Link: https://arxiv.org/abs/2602.22215
- Summary: Large Language Models (LLMs) demonstrate potential in the field of scientific idea generation. However, the generated results often lack controllable academic context and …
3. FIRE: A Comprehensive Benchmark for Financial Intelligence and Reasoning Evaluation
- Source: arXiv cs.AI
- Published: 2026-02-28 05:00 UTC
- Link: https://arxiv.org/abs/2602.22273
- Summary: We introduce FIRE, a comprehensive benchmark designed to evaluate both the theoretical financial knowledge of LLMs and their ability to handle practical business scenarios.
4. Multi-Level Causal Embeddings
- Source: arXiv cs.AI
- Published: 2026-02-28 05:00 UTC
- Link: https://arxiv.org/abs/2602.22287
- Summary: Abstractions of causal models allow for the coarsening of models such that relations of cause and effect are preserved. Whereas abstractions focus on the relation between …
5. Agent Behavioral Contracts: Formal Specification and Runtime Enforcement for Reliable Autonomous AI Agents
- Source: arXiv cs.AI
- Published: 2026-02-28 05:00 UTC
- Link: https://arxiv.org/abs/2602.22302
- Summary: Traditional software relies on contracts -- APIs, type systems, assertions -- to specify and enforce correct behavior. AI agents, by contrast, operate on prompts and …
6. Vibe Researching as Wolf Coming: Can AI Agents with Skills Replace or Augment Social Scientists?
- Source: arXiv cs.AI
- Published: 2026-02-28 05:00 UTC
- Link: https://arxiv.org/abs/2602.22401
- Summary: AI agents -- systems that execute multi-step reasoning workflows with persistent state, tool access, and specialist skills -- represent a qualitative shift from prior …
7. Towards Autonomous Memory Agents
- Source: arXiv cs.AI
- Published: 2026-02-28 05:00 UTC
- Link: https://arxiv.org/abs/2602.22406
- Summary: Recent memory agents improve LLMs by extracting experiences and conversation history into external storage. This enables low-overhead context assembly and online memory …
8. Exploring Human Behavior During Abstract Rule Inference and Problem Solving with the Cognitive Abstraction and Reasoning Corpus
- Source: arXiv cs.AI
- Published: 2026-02-28 05:00 UTC
- Link: https://arxiv.org/abs/2602.22408
- Summary: Humans exhibit remarkable flexibility in abstract reasoning and can rapidly learn and apply rules from sparse examples. To investigate the cognitive strategies underlying …
9. Epistemic Filtering and Collective Hallucination: A Jury Theorem for Confidence-Calibrated Agents
- Source: arXiv cs.AI
- Published: 2026-02-28 05:00 UTC
- Link: https://arxiv.org/abs/2602.22413
- Summary: We investigate the collective accuracy of heterogeneous agents who learn to estimate their own reliability over time and selectively abstain from voting. While classical …
10. ArchAgent: Agentic AI-driven Computer Architecture Discovery
- Source: arXiv cs.AI
- Published: 2026-02-28 05:00 UTC
- Link: https://arxiv.org/abs/2602.22425
- Summary: Agile hardware design flows are a critically needed force multiplier to meet the exploding demand for compute. Recently, agentic generative AI systems have demonstrated …
11. How Do Latent Reasoning Methods Perform Under Weak and Strong Supervision?
- Source: arXiv cs.AI
- Published: 2026-02-28 05:00 UTC
- Link: https://arxiv.org/abs/2602.22441
- Summary: Latent reasoning has recently been proposed as a reasoning paradigm that performs multi-step reasoning by generating steps in the latent space instead of the textual …
12. A Framework for Assessing AI Agent Decisions and Outcomes in AutoML Pipelines
- Source: arXiv cs.AI
- Published: 2026-02-28 05:00 UTC
- Link: https://arxiv.org/abs/2602.22442
- Summary: Agent-based AutoML systems rely on large language models to make complex, multi-stage decisions across data processing, model selection, and evaluation. However, existing …
13. CWM: Contrastive World Models for Action Feasibility Learning in Embodied Agent Pipelines
- Source: arXiv cs.AI
- Published: 2026-02-28 05:00 UTC
- Link: https://arxiv.org/abs/2602.22452
- Summary: A reliable action feasibility scorer is a critical bottleneck in embodied agent pipelines: before any planning or reasoning occurs, the agent must identify which candidate …
14. ConstraintBench: Benchmarking LLM Constraint Reasoning on Direct Optimization
- Source: arXiv cs.AI
- Published: 2026-02-28 05:00 UTC
- Link: https://arxiv.org/abs/2602.22465
- Summary: Large language models are increasingly applied to operational decision-making where the underlying structure is constrained optimization. Existing benchmarks evaluate …
15. VeRO: An Evaluation Harness for Agents to Optimize Agents
- Source: arXiv cs.AI
- Published: 2026-02-28 05:00 UTC
- Link: https://arxiv.org/abs/2602.22480
- Summary: An important emerging application of coding agents is agent optimization: the iterative improvement of a target agent through edit-execute-evaluate cycles. Despite its …
16. Mapping the Landscape of Artificial Intelligence in Life Cycle Assessment Using Large Language Models
- Source: arXiv cs.AI
- Published: 2026-02-28 05:00 UTC
- Link: https://arxiv.org/abs/2602.22500
- Summary: Integration of artificial intelligence (AI) into life cycle assessment (LCA) has accelerated in recent years, with numerous studies successfully adapting machine learning …
17. Mirroring the Mind: Distilling Human-Like Metacognitive Strategies into Large Language Models
- Source: arXiv cs.AI
- Published: 2026-02-28 05:00 UTC
- Link: https://arxiv.org/abs/2602.22508
- Summary: Large Reasoning Models (LRMs) often exhibit structural fragility in complex reasoning tasks, failing to produce correct answers even after successfully deriving valid …
18. A Mathematical Theory of Agency and Intelligence
- Source: arXiv cs.AI
- Published: 2026-02-28 05:00 UTC
- Link: https://arxiv.org/abs/2602.22519
- Summary: To operate reliably under changing conditions, complex systems require feedback on how effectively they use resources, not just whether objectives are met. Current AI …
19. Cognitive Models and AI Algorithms Provide Templates for Designing Language Agents
- Source: arXiv cs.AI
- Published: 2026-02-28 05:00 UTC
- Link: https://arxiv.org/abs/2602.22523
- Summary: While contemporary large language models (LLMs) are increasingly capable in isolation, there are still many difficult problems that lie beyond the abilities of a single LLM …
20. Agentic AI for Intent-driven Optimization in Cell-free O-RAN
- Source: arXiv cs.AI
- Published: 2026-02-28 05:00 UTC
- Link: https://arxiv.org/abs/2602.22539
- Summary: Agentic artificial intelligence (AI) is emerging as a key enabler for autonomous radio access networks (RANs), where multiple large language model (LLM)-based agents reason …