Publication date: 2026-03-14
Entries: 20
1. Google DeepMind Introduces Aletheia: The AI Agent Moving from Math Competitions to Fully Autonomous Professional Research Discoveries
- Source: MarkTechPost
- Published: 2026-03-13 23:05 UTC
- Link: https://www.marktechpost.com/2026/03/13/google-deepmind-introduces-aletheia-the-ai-agent-moving-from-math-competitions-to-fully-autonomous-professional-research-discoveries/
Summary: The Google DeepMind team has introduced Aletheia, a specialized AI agent designed to bridge the gap between competition-level math and professional research. While models achieved gold-medal standards at the 2025 Internation
2. Microsoft’s Copilot AI assistant is coming to current-gen Xbox consoles this year
- Source: The Verge AI
- Published: 2026-03-13 20:51 UTC
- Link: https://www.theverge.com/games/894799/microsoft-gaming-copilot-ai-xbox-consoles
Summary: Xbox is getting ready to launch its Gaming Copilot AI assistant on "current-generation consoles" this year, according to a report from GamesRadar. Sonali Yadav, Xbox's product manager for gaming AI, revealed the news dur
3. P-EAGLE: Faster LLM inference with Parallel Speculative Decoding in vLLM
- Source: AWS ML Blog
- Published: 2026-03-13 19:27 UTC
- Link: https://aws.amazon.com/blogs/machine-learning/p-eagle-faster-llm-inference-with-parallel-speculative-decoding-in-vllm/
Summary: In this post, we explain how P-EAGLE works, how we integrated it into vLLM starting from v0.16.0 (PR#32887), and how to serve it with our pre-trained checkpoints.
4. Model Context Protocol (MCP) vs. AI Agent Skills: A Deep Dive into Structured Tools and Behavioral Guidance for LLMs
- Source: MarkTechPost
- Published: 2026-03-13 08:32 UTC
- Link: https://www.marktechpost.com/2026/03/13/model-context-protocol-mcp-vs-ai-agent-skills-a-deep-dive-into-structured-tools-and-behavioral-guidance-for-llms/
Summary: In recent times, many developments in the agent ecosystem have focused on enabling AI agents to interact with external tools and access domain-specific knowledge more effectively. Two common approaches that have emerged
5. Google AI Introduces ‘Groundsource’: A New Methodology that Uses Gemini Model to Transform Unstructured Global News into Actionable, Historical Data
- Source: MarkTechPost
- Published: 2026-03-13 08:07 UTC
- Link: https://www.marktechpost.com/2026/03/13/google-ai-introduces-groundsource-a-new-methodology-that-uses-gemini-model-to-transform-unstructured-global-news-into-actionable-historical-data/
Summary: The Google AI Research team recently released Groundsource, a new methodology that uses the Gemini model to extract structured historical data from unstructured public news reports. The project addresses the lack of historical d
6. DIVE: Scaling Diversity in Agentic Task Synthesis for Generalizable Tool Use
- Source: arXiv cs.AI
- Published: 2026-03-13 04:00 UTC
- Link: https://arxiv.org/abs/2603.11076
Summary: (arXiv:2603.11076v1, announce type: new) Recent work synthesizes agentic tasks for post-training tool-using LLMs, yet robust generalization under shifts in tasks and toolsets remains an open challenge. We trace th
7. A Survey of Reasoning in Autonomous Driving Systems: Open Challenges and Emerging Paradigms
- Source: arXiv cs.AI
- Published: 2026-03-13 04:00 UTC
- Link: https://arxiv.org/abs/2603.11093
Summary: (arXiv:2603.11093v1, announce type: new) The development of high-level autonomous driving (AD) is shifting from perception-centric limitations to a more fundamental bottleneck, namely, a deficit in robust and gene
8. PACED: Distillation at the Frontier of Student Competence
- Source: arXiv cs.AI
- Published: 2026-03-13 04:00 UTC
- Link: https://arxiv.org/abs/2603.11178
Summary: (arXiv:2603.11178v1, announce type: new) Standard LLM distillation wastes compute on two fronts: problems the student has already mastered (near-zero gradients) and problems far beyond its reach (incoherent gradie
9. Measuring AI Agents' Progress on Multi-Step Cyber Attack Scenarios
- Source: arXiv cs.AI
- Published: 2026-03-13 04:00 UTC
- Link: https://arxiv.org/abs/2603.11214
Summary: (arXiv:2603.11214v1, announce type: new) We evaluate the autonomous cyber-attack capabilities of frontier AI models on two purpose-built cyber ranges: a 32-step corporate network attack and a 7-step industrial cont
10. Reversible Lifelong Model Editing via Semantic Routing-Based LoRA
- Source: arXiv cs.AI
- Published: 2026-03-13 04:00 UTC
- Link: https://arxiv.org/abs/2603.11239
Summary: (arXiv:2603.11239v1, announce type: new) The dynamic evolution of real-world necessitates model editing within Large Language Models. While existing methods explore modular isolation or parameter-efficient strateg
11. Mind the Sim2Real Gap in User Simulation for Agentic Tasks
- Source: arXiv cs.AI
- Published: 2026-03-13 04:00 UTC
- Link: https://arxiv.org/abs/2603.11245
Summary: (arXiv:2603.11245v1, announce type: new) As NLP evaluation shifts from static benchmarks to multi-turn interactive settings, LLM-based simulators have become widely used as user proxies, serving two roles: generat
12. The Unlearning Mirage: A Dynamic Framework for Evaluating LLM Unlearning
- Source: arXiv cs.AI
- Published: 2026-03-13 04:00 UTC
- Link: https://arxiv.org/abs/2603.11266
Summary: (arXiv:2603.11266v1, announce type: new) Unlearning in Large Language Models (LLMs) aims to enhance safety, mitigate biases, and comply with legal mandates, such as the right to be forgotten. However, existing unl
13. COMPASS: The explainable agentic framework for Sovereignty, Sustainability, Compliance, and Ethics
- Source: arXiv cs.AI
- Published: 2026-03-13 04:00 UTC
- Link: https://arxiv.org/abs/2603.11277
Summary: (arXiv:2603.11277v1, announce type: new) The rapid proliferation of large language model (LLM)-based agentic systems raises critical concerns regarding digital sovereignty, environmental sustainability, regulatory
14. AI Psychometrics: Evaluating the Psychological Reasoning of Large Language Models with Psychometric Validities
- Source: arXiv cs.AI
- Published: 2026-03-13 04:00 UTC
- Link: https://arxiv.org/abs/2603.11279
Summary: (arXiv:2603.11279v1, announce type: new) The immense number of parameters and deep neural networks make large language models (LLMs) rival the complexity of human brains, which also makes them opaque "black box"
15. Counterweights and Complementarities: The Convergence of AI and Blockchain Powering a Decentralized Future
- Source: arXiv cs.AI
- Published: 2026-03-13 04:00 UTC
- Link: https://arxiv.org/abs/2603.11299
Summary: (arXiv:2603.11299v1, announce type: new) This editorial addresses the critical intersection of artificial intelligence (AI) and blockchain technologies, highlighting their contrasting tendencies toward centralizat
16. LLM-Augmented Digital Twin for Policy Evaluation in Short-Video Platforms
- Source: arXiv cs.AI
- Published: 2026-03-13 04:00 UTC
- Link: https://arxiv.org/abs/2603.11333
Summary: (arXiv:2603.11333v1, announce type: new) Short-video platforms are closed-loop, human-in-the-loop ecosystems where platform policy, creator incentives, and user behavior co-evolve. This feedback structure makes co
17. RewardHackingAgents: Benchmarking Evaluation Integrity for LLM ML-Engineering Agents
- Source: arXiv cs.AI
- Published: 2026-03-13 04:00 UTC
- Link: https://arxiv.org/abs/2603.11337
Summary: (arXiv:2603.11337v1, announce type: new) LLM agents increasingly perform end-to-end ML engineering tasks where success is judged by a single scalar test metric. This creates a structural vulnerability: an agent ca
18. FinRule-Bench: A Benchmark for Joint Reasoning over Financial Tables and Principles
- Source: arXiv cs.AI
- Published: 2026-03-13 04:00 UTC
- Link: https://arxiv.org/abs/2603.11339
Summary: (arXiv:2603.11339v1, announce type: new) Large language models (LLMs) are increasingly applied to financial analysis, yet their ability to audit structured financial statements under explicit accounting principles
19. Improving LLM Performance Through Black-Box Online Tuning: A Case for Adding System Specs to Factsheets for Trusted AI
- Source: arXiv cs.AI
- Published: 2026-03-13 04:00 UTC
- Link: https://arxiv.org/abs/2603.11340
Summary: (arXiv:2603.11340v1, announce type: new) In this paper, we present a novel black-box online controller that uses only end-to-end measurements over short segments, without internal instrumentation, and hill climbin
20. TimeSqueeze: Dynamic Patching for Efficient Time Series Forecasting
- Source: arXiv cs.AI
- Published: 2026-03-13 04:00 UTC
- Link: https://arxiv.org/abs/2603.11352
Summary: (arXiv:2603.11352v1, announce type: new) Transformer-based time series foundation models face a fundamental trade-off in choice of tokenization: point-wise embeddings preserve temporal fidelity but scale poorly wi