AI Daily News - 2026-05-30

Published: 2026-05-30

Entries: 20

1. Comprehensive observability for Amazon SageMaker AI LLM inference: From GPU utilization to LLM quality

Source: AWS ML Blog
Published: 2026-05-29 23:36 UTC
Link: https://aws.amazon.com/blogs/machine-learning/comprehensive-observability-for-amazon-sagemaker-ai-llm-inference-from-gpu-utilization-to-llm-quality/

Summary: This post demonstrates a comprehensive observability solution using Amazon Managed Grafana dashboards that provides a holistic view of both quality and quantity for LLMs served on Amazon SageMaker AI endpoints with infer

2. NVIDIA Introduces X-Token: Projection-Guided Cross-Tokenizer KD That Outperforms GOLD by +3.82 Average Points on Llama-3.2-1B

Source: MarkTechPost
Published: 2026-05-29 23:19 UTC
Link: https://www.marktechpost.com/2026/05/29/nvidia-introduces-x-token-projection-guided-cross-tokenizer-kd-that-outperforms-gold-by-3-82-average-points-on-llama-3-2-1b/

Summary: NVIDIA's X-Token fixes two structural failures in GOLD and improves GSM8k accuracy from 2.56 to 15.54.

3. StepFun Releases Step 3.7 Flash: A 198B MoE Vision-Language Model for Coding Agents and Search Workflows

Source: MarkTechPost
Published: 2026-05-29 21:25 UTC
Link: https://www.marktechpost.com/2026/05/29/stepfun-releases-step-3-7-flash-a-198b-moe-vision-language-model-for-coding-agents-and-search-workflows/

Summary: StepFun releases Step 3.7 Flash, a 198B MoE model with native vision, 256k context, and Advisor Mode.

4. Tech companies desperately want to film you doing chores

Source: The Verge AI
Published: 2026-05-29 17:37 UTC
Link: https://www.theverge.com/ai-artificial-intelligence/940007/ai-companies-will-pay-for-robot-training-data

Summary: This week, an AI training startup called Shift said it would clean New Yorkers' homes for free. It has plans to expand into other cities as well, including London, and looking around my flat, I get the appeal. But there'

5. Jony Ive’s funky Ferrari

Source: The Verge AI
Published: 2026-05-29 12:25 UTC
Link: https://www.theverge.com/podcast/939589/ferrari-luce-jony-ive-vergecast

Summary: Most people will never own, drive, or even sit inside a Ferrari Luce. (If you can, or do… hit us up.) There's still no question that Ferrari's first electric vehicle is one of the most interesting, surprising cars of the

6. Boston Children’s uses AI to unlock new diagnoses

Source: OpenAI News
Published: 2026-05-29 12:00 UTC
Link: https://openai.com/index/boston-childrens-hospital

Summary: Boston Children’s Hospital uses OpenAI technology to improve patient care, reduce operational burden, and help diagnose more than 40 rare disease cases.

7. How Braintrust turns customer requests into code with Codex

Source: OpenAI News
Published: 2026-05-29 12:00 UTC
Link: https://openai.com/index/braintrust

Summary: How Braintrust engineers use Codex with GPT-5.5 to run experiments and code faster.

8. This AI startup will clean your home for free to train future robots

Source: The Verge AI
Published: 2026-05-29 11:58 UTC
Link: https://www.theverge.com/ai-artificial-intelligence/939765/ai-training-data-startup-shift-free-cleaning

Summary: AI training startup Shift wants to clean your home for free. The catch - because, despite what its website says, there's always a catch - is that it will record cleaners as they scrub, vacuum, dust, tidy, and wash, and u

9. Adobe’s conversational AI agent is a mediocre design intern

Source: The Verge AI
Published: 2026-05-29 10:00 UTC
Link: https://www.theverge.com/tech/939686/adobes-conversational-ai-agent-is-a-mediocre-design-intern

Summary: AI image tools rarely make me feel like I'm part of the creative process. They are, after all, mostly designed so that people with no design experience can type in a few words and get back a usable result. So I was pleas

10. Meet mKernel: A Multi-GPU, Multi-Node Fused Kernel Library for GPU-Driven Communication

Source: MarkTechPost
Published: 2026-05-29 08:43 UTC
Link: https://www.marktechpost.com/2026/05/29/meet-mkernel-a-multi-gpu-multi-node-fused-kernel-library-for-gpu-driven-communication/

Summary: UC Berkeley's UCCL team releases mKernel, fusing intra-node NVLink, inter-node RDMA, and dense compute into a single persistent CUDA kernel.

11. Hexo Labs Open-Sources SIA: A Self-Improving Agent That Updates Both the Harness and the Model Weights

Source: MarkTechPost
Published: 2026-05-29 07:28 UTC
Link: https://www.marktechpost.com/2026/05/29/hexo-labs-open-sources-sia-a-self-improving-agent-that-updates-both-the-harness-and-the-model-weights/

Summary: Hexo Labs released SIA, an open-source self-improving loop, under an MIT license. A Feedback-Agent reads each run's trajectory, then either rewrites the scaffold or triggers a LoRA weight update on gpt-oss-120b. Combinin

12. Behavior-Induced Mirror-Prox Temporal-Difference Learning for Faster Off-Policy Prediction

Source: arXiv cs.AI
Published: 2026-05-29 04:00 UTC
Link: https://arxiv.org/abs/2605.28849

Summary (arXiv:2605.28849v1): Gradient temporal-difference methods provide stable off-policy prediction with linear function approximation, but their practical performance is strongly affected by the ge

13. Behavior-Aware Auxiliary Corrections for Off-Policy Temporal-Difference Prediction

Source: arXiv cs.AI
Published: 2026-05-29 04:00 UTC
Link: https://arxiv.org/abs/2605.28855

Summary (arXiv:2605.28855v1): Temporal-difference learning with function approximation can be unstable under off-policy sampling. TDC stabilizes off-policy TD through an auxiliary covariance correction,

14. The Cognitive Categorical Transformer: Category-Theoretic Inductive Biases for Language Modeling

Source: arXiv cs.AI
Published: 2026-05-29 04:00 UTC
Link: https://arxiv.org/abs/2605.28864

Summary (arXiv:2605.28864v1): The Cognitive Categorical Transformer (CCT) is a 306M-parameter architecture that augments a pretrained GPT-2 Small backbone with cognitively grounded components derived fr

15. Ultra-Reduced-Impact-Encased-Logging (URIEL): propose a new method for selective sustainable logging and post-harvest silvicultural treatment in tropical forest using airborne robotics systems

Source: arXiv cs.AI
Published: 2026-05-29 04:00 UTC
Link: https://arxiv.org/abs/2605.28883

Summary (arXiv:2605.28883v1): Tropical forests worldwide are under intense deforestation pressure driven by economic and political interests, and scientific evidence suggests this deforestation contribu

16. Review Arcade: On the Human Alignment and Gameability of LLM Reviews

Source: arXiv cs.AI
Published: 2026-05-29 04:00 UTC
Link: https://arxiv.org/abs/2605.28897

Summary (arXiv:2605.28897v1): LLM-generated reviews for scientific papers are gaining considerable traction and are even being officially piloted by major conferences. We have to assume that not only re

17. Orthogonal Concept Erasure for Diffusion Models

Source: arXiv cs.AI
Published: 2026-05-29 04:00 UTC
Link: https://arxiv.org/abs/2605.28902

Summary (arXiv:2605.28902v1): Concept erasure has emerged as a promising approach to mitigate undesired or unsafe content in diffusion models, yet existing methods still face significant limitations. Wh

18. Frontier LLM-based agents can overcome the ontology curation bottleneck for natural phenotypes

Source: arXiv cs.AI
Published: 2026-05-29 04:00 UTC
Link: https://arxiv.org/abs/2605.28965

Summary (arXiv:2605.28965v1): Linking free-text phenotype descriptions to ontology terms, typically referred to as phenotype annotation, is essential for the cross-study integration of comparative morph

19. VFEAgent: A Multimodal Agent Framework for End-to-End Automated Finite Element Analysis

Source: arXiv cs.AI
Published: 2026-05-29 04:00 UTC
Link: https://arxiv.org/abs/2605.28978

Summary (arXiv:2605.28978v1): Finite Element Analysis (FEA) serves as the cornerstone of modern engineering design. However, its workflow is inherently complex and relies heavily on domain expertise. Al

20. BEAMS: Benchmarking and Evaluating AI for Modeling and Simulation

Source: arXiv cs.AI
Published: 2026-05-29 04:00 UTC
Link: https://arxiv.org/abs/2605.28994

Summary (arXiv:2605.28994v1): AI tools to support real world decision making must be able to build simulation models that inform their recommendations and render them interpretable. Tools that can autom

