AI Daily News - 2026-03-14

Publication date: 2026-03-14

Items included: 20

1. Google DeepMind Introduces Aletheia: The AI Agent Moving from Math Competitions to Fully Autonomous Professional Research Discoveries

Source: MarkTechPost
Published: 2026-03-13 23:05 UTC
Link: https://www.marktechpost.com/2026/03/13/google-deepmind-introduces-aletheia-the-ai-agent-moving-from-math-competitions-to-fully-autonomous-professional-research-discoveries/

Summary: Google DeepMind team has introduced Aletheia, a specialized AI agent designed to bridge the gap between competition-level math and professional research. While models achieved gold-medal standards at the 2025 Internation…

2. Microsoft’s Copilot AI assistant is coming to current-gen Xbox consoles this year

Source: The Verge AI
Published: 2026-03-13 20:51 UTC
Link: https://www.theverge.com/games/894799/microsoft-gaming-copilot-ai-xbox-consoles

Summary: Xbox is getting ready to launch its Gaming Copilot AI assistant on "current-generation consoles" this year, according to a report from GamesRadar. Sonali Yadav, Xbox's product manager for gaming AI, revealed the news dur…

3. P-EAGLE: Faster LLM inference with Parallel Speculative Decoding in vLLM

Source: AWS ML Blog
Published: 2026-03-13 19:27 UTC
Link: https://aws.amazon.com/blogs/machine-learning/p-eagle-faster-llm-inference-with-parallel-speculative-decoding-in-vllm/

Summary: In this post, we explain how P-EAGLE works, how we integrated it into vLLM starting from v0.16.0 (PR#32887), and how to serve it with our pre-trained checkpoints.

4. Model Context Protocol (MCP) vs. AI Agent Skills: A Deep Dive into Structured Tools and Behavioral Guidance for LLMs

Source: MarkTechPost
Published: 2026-03-13 08:32 UTC
Link: https://www.marktechpost.com/2026/03/13/model-context-protocol-mcp-vs-ai-agent-skills-a-deep-dive-into-structured-tools-and-behavioral-guidance-for-llms/

Summary: In recent times, many developments in the agent ecosystem have focused on enabling AI agents to interact with external tools and access domain-specific knowledge more effectively. Two common approaches that have emerged…

5. Google AI Introduces ‘Groundsource’: A New Methodology that Uses Gemini Model to Transform Unstructured Global News into Actionable, Historical Data

Source: MarkTechPost
Published: 2026-03-13 08:07 UTC
Link: https://www.marktechpost.com/2026/03/13/google-ai-introduces-groundsource-a-new-methodology-that-uses-gemini-model-to-transform-unstructured-global-news-into-actionable-historical-data/

Summary: Google AI Research team recently released Groundsource, a new methodology that uses Gemini model to extract structured historical data from unstructured public news reports. The project addresses the lack of historical d…

6. DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use

Source: arXiv cs.AI
Published: 2026-03-13 04:00 UTC
Link: https://arxiv.org/abs/2603.11076

Summary: Recent work synthesizes agentic tasks for post-training tool-using LLMs, yet robust generalization under shifts in tasks and toolsets remains an open challenge. We trace th…

7. A Survey of Reasoning in Autonomous Driving Systems: Open Challenges and Emerging Paradigms

Source: arXiv cs.AI
Published: 2026-03-13 04:00 UTC
Link: https://arxiv.org/abs/2603.11093

Summary: The development of high-level autonomous driving (AD) is shifting from perception-centric limitations to a more fundamental bottleneck, namely, a deficit in robust and gene…

8. PACED: Distillation at the Frontier of Student Competence

Source: arXiv cs.AI
Published: 2026-03-13 04:00 UTC
Link: https://arxiv.org/abs/2603.11178

Summary: Standard LLM distillation wastes compute on two fronts: problems the student has already mastered (near-zero gradients) and problems far beyond its reach (incoherent gradie…

9. Measuring AI Agents' Progress on Multi-Step Cyber Attack Scenarios

Source: arXiv cs.AI
Published: 2026-03-13 04:00 UTC
Link: https://arxiv.org/abs/2603.11214

Summary: We evaluate the autonomous cyber-attack capabilities of frontier AI models on two purpose-built cyber ranges: a 32-step corporate network attack and a 7-step industrial cont…

10. Reversible Lifelong Model Editing via Semantic Routing-Based LoRA

Source: arXiv cs.AI
Published: 2026-03-13 04:00 UTC
Link: https://arxiv.org/abs/2603.11239

Summary: The dynamic evolution of real-world necessitates model editing within Large Language Models. While existing methods explore modular isolation or parameter-efficient strateg…

11. Mind the Sim2Real Gap in User Simulation for Agentic Tasks

Source: arXiv cs.AI
Published: 2026-03-13 04:00 UTC
Link: https://arxiv.org/abs/2603.11245

Summary: As NLP evaluation shifts from static benchmarks to multi-turn interactive settings, LLM-based simulators have become widely used as user proxies, serving two roles: generat…

12. The Unlearning Mirage: A Dynamic Framework for Evaluating LLM Unlearning

Source: arXiv cs.AI
Published: 2026-03-13 04:00 UTC
Link: https://arxiv.org/abs/2603.11266

Summary: Unlearning in Large Language Models (LLMs) aims to enhance safety, mitigate biases, and comply with legal mandates, such as the right to be forgotten. However, existing unl…

13. COMPASS: The explainable agentic framework for Sovereignty, Sustainability, Compliance, and Ethics

Source: arXiv cs.AI
Published: 2026-03-13 04:00 UTC
Link: https://arxiv.org/abs/2603.11277

Summary: The rapid proliferation of large language model (LLM)-based agentic systems raises critical concerns regarding digital sovereignty, environmental sustainability, regulatory…

14. AI Psychometrics: Evaluating the Psychological Reasoning of Large Language Models with Psychometric Validities

Source: arXiv cs.AI
Published: 2026-03-13 04:00 UTC
Link: https://arxiv.org/abs/2603.11279

Summary: The immense number of parameters and deep neural networks make large language models (LLMs) rival the complexity of human brains, which also makes them opaque "black box"…

15. Counterweights and Complementarities: The Convergence of AI and Blockchain Powering a Decentralized Future

Source: arXiv cs.AI
Published: 2026-03-13 04:00 UTC
Link: https://arxiv.org/abs/2603.11299

Summary: This editorial addresses the critical intersection of artificial intelligence (AI) and blockchain technologies, highlighting their contrasting tendencies toward centralizat…

16. LLM-Augmented Digital Twin for Policy Evaluation in Short-Video Platforms

Source: arXiv cs.AI
Published: 2026-03-13 04:00 UTC
Link: https://arxiv.org/abs/2603.11333

Summary: Short-video platforms are closed-loop, human-in-the-loop ecosystems where platform policy, creator incentives, and user behavior co-evolve. This feedback structure makes co…

17. RewardHackingAgents: Benchmarking Evaluation Integrity for LLM ML-Engineering Agents

Source: arXiv cs.AI
Published: 2026-03-13 04:00 UTC
Link: https://arxiv.org/abs/2603.11337

Summary: LLM agents increasingly perform end-to-end ML engineering tasks where success is judged by a single scalar test metric. This creates a structural vulnerability: an agent ca…

18. FinRule-Bench: A Benchmark for Joint Reasoning over Financial Tables and Principles

Source: arXiv cs.AI
Published: 2026-03-13 04:00 UTC
Link: https://arxiv.org/abs/2603.11339

Summary: Large language models (LLMs) are increasingly applied to financial analysis, yet their ability to audit structured financial statements under explicit accounting principles…

19. Improving LLM Performance Through Black-Box Online Tuning: A Case for Adding System Specs to Factsheets for Trusted AI

Source: arXiv cs.AI
Published: 2026-03-13 04:00 UTC
Link: https://arxiv.org/abs/2603.11340

Summary: In this paper, we present a novel black-box online controller that uses only end-to-end measurements over short segments, without internal instrumentation, and hill climbin…

20. TimeSqueeze: Dynamic Patching for Efficient Time Series Forecasting

Source: arXiv cs.AI
Published: 2026-03-13 04:00 UTC
Link: https://arxiv.org/abs/2603.11352

Summary: Transformer-based time series foundation models face a fundamental trade-off in choice of tokenization: point-wise embeddings preserve temporal fidelity but scale poorly wi…
