Administrator
Published on 2026-03-14 / 5 views

AI Daily News - 2026-03-14

Publication date: 2026-03-14

Items included: 20

1. Google DeepMind Introduces Aletheia: The AI Agent Moving from Math Competitions to Fully Autonomous Professional Research Discoveries

Summary: The Google DeepMind team has introduced Aletheia, a specialized AI agent designed to bridge the gap between competition-level math and professional research. While models achieved gold-medal standards at the 2025 Internation

2. Microsoft’s Copilot AI assistant is coming to current-gen Xbox consoles this year

Summary: Xbox is getting ready to launch its Gaming Copilot AI assistant on "current-generation consoles" this year, according to a report from GamesRadar. Sonali Yadav, Xbox's product manager for gaming AI, revealed the news dur

3. P-EAGLE: Faster LLM inference with Parallel Speculative Decoding in vLLM

Summary: In this post, we explain how P-EAGLE works, how we integrated it into vLLM starting from v0.16.0 (PR#32887), and how to serve it with our pre-trained checkpoints.

4. Model Context Protocol (MCP) vs. AI Agent Skills: A Deep Dive into Structured Tools and Behavioral Guidance for LLMs

Summary: Recently, many developments in the agent ecosystem have focused on enabling AI agents to interact with external tools and access domain-specific knowledge more effectively. Two common approaches that have emerged

5. Google AI Introduces ‘Groundsource’: A New Methodology that Uses Gemini Model to Transform Unstructured Global News into Actionable, Historical Data

Summary: The Google AI Research team recently released Groundsource, a new methodology that uses a Gemini model to extract structured historical data from unstructured public news reports. The project addresses the lack of historical d

6. DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use

Summary: arXiv:2603.11076v1. Recent work synthesizes agentic tasks for post-training tool-using LLMs, yet robust generalization under shifts in tasks and toolsets remains an open challenge. We trace th

7. A Survey of Reasoning in Autonomous Driving Systems: Open Challenges and Emerging Paradigms

Summary: arXiv:2603.11093v1. The development of high-level autonomous driving (AD) is shifting from perception-centric limitations to a more fundamental bottleneck, namely, a deficit in robust and gene

8. PACED: Distillation at the Frontier of Student Competence

Summary: arXiv:2603.11178v1. Standard LLM distillation wastes compute on two fronts: problems the student has already mastered (near-zero gradients) and problems far beyond its reach (incoherent gradie

9. Measuring AI Agents' Progress on Multi-Step Cyber Attack Scenarios

Summary: arXiv:2603.11214v1. We evaluate the autonomous cyber-attack capabilities of frontier AI models on two purpose-built cyber ranges: a 32-step corporate network attack and a 7-step industrial cont

10. Reversible Lifelong Model Editing via Semantic Routing-Based LoRA

Summary: arXiv:2603.11239v1. The dynamic evolution of the real world necessitates model editing within Large Language Models. While existing methods explore modular isolation or parameter-efficient strateg

11. Mind the Sim2Real Gap in User Simulation for Agentic Tasks

Summary: arXiv:2603.11245v1. As NLP evaluation shifts from static benchmarks to multi-turn interactive settings, LLM-based simulators have become widely used as user proxies, serving two roles: generat

12. The Unlearning Mirage: A Dynamic Framework for Evaluating LLM Unlearning

Summary: arXiv:2603.11266v1. Unlearning in Large Language Models (LLMs) aims to enhance safety, mitigate biases, and comply with legal mandates, such as the right to be forgotten. However, existing unl

13. COMPASS: The explainable agentic framework for Sovereignty, Sustainability, Compliance, and Ethics

Summary: arXiv:2603.11277v1. The rapid proliferation of large language model (LLM)-based agentic systems raises critical concerns regarding digital sovereignty, environmental sustainability, regulatory

14. AI Psychometrics: Evaluating the Psychological Reasoning of Large Language Models with Psychometric Validities

Summary: arXiv:2603.11279v1. The immense number of parameters and deep neural networks make large language models (LLMs) rival the complexity of human brains, which also makes them opaque "black box"

15. Counterweights and Complementarities: The Convergence of AI and Blockchain Powering a Decentralized Future

Summary: arXiv:2603.11299v1. This editorial addresses the critical intersection of artificial intelligence (AI) and blockchain technologies, highlighting their contrasting tendencies toward centralizat

16. LLM-Augmented Digital Twin for Policy Evaluation in Short-Video Platforms

Summary: arXiv:2603.11333v1. Short-video platforms are closed-loop, human-in-the-loop ecosystems where platform policy, creator incentives, and user behavior co-evolve. This feedback structure makes co

17. RewardHackingAgents: Benchmarking Evaluation Integrity for LLM ML-Engineering Agents

Summary: arXiv:2603.11337v1. LLM agents increasingly perform end-to-end ML engineering tasks where success is judged by a single scalar test metric. This creates a structural vulnerability: an agent ca

18. FinRule-Bench: A Benchmark for Joint Reasoning over Financial Tables and Principles

Summary: arXiv:2603.11339v1. Large language models (LLMs) are increasingly applied to financial analysis, yet their ability to audit structured financial statements under explicit accounting principles

19. Improving LLM Performance Through Black-Box Online Tuning: A Case for Adding System Specs to Factsheets for Trusted AI

Summary: arXiv:2603.11340v1. In this paper, we present a novel black-box online controller that uses only end-to-end measurements over short segments, without internal instrumentation, and hill climbin

20. TimeSqueeze: Dynamic Patching for Efficient Time Series Forecasting

Summary: arXiv:2603.11352v1. Transformer-based time series foundation models face a fundamental trade-off in choice of tokenization: point-wise embeddings preserve temporal fidelity but scale poorly wi

