Administrator
Published on 2026-03-01

AI Daily News - 2026-03-01

Publication date: 2026-03-01

Entries included: 20

1. Our agreement with the Department of War

Summary: Details on OpenAI’s contract with the Department of War, outlining safety red lines, legal protections, and how AI systems will be deployed in classified environments.

2. Graph Your Way to Inspiration: Integrating Co-Author Graphs with Retrieval-Augmented Generation for Large Language Model Based Scientific Idea Generation

Summary: arXiv:2602.22215v1. Large Language Models (LLMs) demonstrate potential in the field of scientific idea generation. However, the generated results often lack controllable academic context and t...

3. FIRE: A Comprehensive Benchmark for Financial Intelligence and Reasoning Evaluation

Summary: arXiv:2602.22273v1. We introduce FIRE, a comprehensive benchmark designed to evaluate both the theoretical financial knowledge of LLMs and their ability to handle practical business scenarios.

4. Multi-Level Causal Embeddings

Summary: arXiv:2602.22287v1. Abstractions of causal models allow for the coarsening of models such that relations of cause and effect are preserved. Whereas abstractions focus on the relation between t...

5. Agent Behavioral Contracts: Formal Specification and Runtime Enforcement for Reliable Autonomous AI Agents

Summary: arXiv:2602.22302v1. Traditional software relies on contracts -- APIs, type systems, assertions -- to specify and enforce correct behavior. AI agents, by contrast, operate on prompts and natura...

6. Vibe Researching as Wolf Coming: Can AI Agents with Skills Replace or Augment Social Scientists?

Summary: arXiv:2602.22401v1. AI agents -- systems that execute multi-step reasoning workflows with persistent state, tool access, and specialist skills -- represent a qualitative shift from prior autom...

7. Towards Autonomous Memory Agents

Summary: arXiv:2602.22406v1. Recent memory agents improve LLMs by extracting experiences and conversation history into an external storage. This enables low-overhead context assembly and online memory...

8. Exploring Human Behavior During Abstract Rule Inference and Problem Solving with the Cognitive Abstraction and Reasoning Corpus

Summary: arXiv:2602.22408v1. Humans exhibit remarkable flexibility in abstract reasoning, and can rapidly learn and apply rules from sparse examples. To investigate the cognitive strategies underlying...

9. Epistemic Filtering and Collective Hallucination: A Jury Theorem for Confidence-Calibrated Agents

Summary: arXiv:2602.22413v1. We investigate the collective accuracy of heterogeneous agents who learn to estimate their own reliability over time and selectively abstain from voting. While classical ep...

10. ArchAgent: Agentic AI-driven Computer Architecture Discovery

Summary: arXiv:2602.22425v1. Agile hardware design flows are a critically needed force multiplier to meet the exploding demand for compute. Recently, agentic generative AI systems have demonstrated sig...

11. How Do Latent Reasoning Methods Perform Under Weak and Strong Supervision?

Summary: arXiv:2602.22441v1. Latent reasoning has been recently proposed as a reasoning paradigm and performs multi-step reasoning through generating steps in the latent space instead of the textual sp...

12. A Framework for Assessing AI Agent Decisions and Outcomes in AutoML Pipelines

Summary: arXiv:2602.22442v1. Agent-based AutoML systems rely on large language models to make complex, multi-stage decisions across data processing, model selection, and evaluation. However, existing e...

13. CWM: Contrastive World Models for Action Feasibility Learning in Embodied Agent Pipelines

Summary: arXiv:2602.22452v1. A reliable action feasibility scorer is a critical bottleneck in embodied agent pipelines: before any planning or reasoning occurs, the agent must identify which candidate...

14. ConstraintBench: Benchmarking LLM Constraint Reasoning on Direct Optimization

Summary: arXiv:2602.22465v1. Large language models are increasingly applied to operational decision-making where the underlying structure is constrained optimization. Existing benchmarks evaluate wheth...

15. VeRO: An Evaluation Harness for Agents to Optimize Agents

Summary: arXiv:2602.22480v1. An important emerging application of coding agents is agent optimization: the iterative improvement of a target agent through edit-execute-evaluate cycles. Despite its rele...

16. Mapping the Landscape of Artificial Intelligence in Life Cycle Assessment Using Large Language Models

Summary: arXiv:2602.22500v1. Integration of artificial intelligence (AI) into life cycle assessment (LCA) has accelerated in recent years, with numerous studies successfully adapting machine learning a...

17. Mirroring the Mind: Distilling Human-Like Metacognitive Strategies into Large Language Models

Summary: arXiv:2602.22508v1. Large Reasoning Models (LRMs) often exhibit structural fragility in complex reasoning tasks, failing to produce correct answers even after successfully deriving valid inter...

18. A Mathematical Theory of Agency and Intelligence

Summary: arXiv:2602.22519v1. To operate reliably under changing conditions, complex systems require feedback on how effectively they use resources, not just whether objectives are met. Current AI syste...

19. Cognitive Models and AI Algorithms Provide Templates for Designing Language Agents

Summary: arXiv:2602.22523v1. While contemporary large language models (LLMs) are increasingly capable in isolation, there are still many difficult problems that lie beyond the abilities of a single LLM...

20. Agentic AI for Intent-driven Optimization in Cell-free O-RAN

Summary: arXiv:2602.22539v1. Agentic artificial intelligence (AI) is emerging as a key enabler for autonomous radio access networks (RANs), where multiple large language model (LLM)-based agents reason...

