The cls Implementation in Ceph

Overview

Ceph exposes the librados API for accessing the backend RADOS storage cluster, and upper-layer services are all built on the interfaces RADOS provides.


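As a project grows, the simple create/read/update/delete APIs offered at the start still cover the basic needs, but special scenarios may call for additional interfaces. Growing business requirements make the business logic more complex, so the number of APIs keeps increasing, which can bloat the underlying RADOS layer. From a decoupling standpoint, specific business logic should be separated from the base infrastructure, so that changes in business logic do not affect the stability of the infrastructure.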

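To address this, Ceph uses dynamic plugins: each application can write a module (a class) for its own business needs and have it loaded dynamically onto the OSD as a plugin. The OSD process loads these plugins at startup and executes the relevant plugin when a specific function is called.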

This article mainly covers how Ceph loads these plugins and how the business layer and the RADOS layer interact through cls.

A few key functions first

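The cls mechanism in Ceph is essentially dynamic loading. Put simply, it uses the dlopen function provided by Linux to read the shared libraries in a uniform way and keeps their functions in memory as function pointers, so we start with how dlopen is used.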

dlopen: opening a shared library


void *dlopen(const char *filename, int flag);

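filename: path of the shared library to open.
flag: flags that control how the library is opened. One of RTLD_LAZY or RTLD_NOW must be given; RTLD_GLOBAL or RTLD_LOCAL can be OR-ed in.
RTLD_LAZY (lazy binding): symbols are resolved only when needed. dlopen does not resolve the library's function symbols up front; each one is resolved the first time the function is called.

RTLD_NOW (immediate binding): all undefined symbols in the library are resolved before dlopen returns. Loading takes longer, but an unresolved symbol makes dlopen fail right away instead of surfacing later at call time.

RTLD_GLOBAL: the symbols defined by the library are made available for resolving references in libraries loaded afterwards. This can cause symbol conflicts and namespace pollution, so use it with care.

RTLD_LOCAL: the opposite of RTLD_GLOBAL. The library's symbols are not made available to libraries loaded afterwards and can only be reached through the handle, for example with dlsym. This is the default when neither flag is given.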

dlsym: getting a function pointer


void *dlsym(void *handle, const char *symbol);

handle: the shared-library handle returned by dlopen
symbol: name of the function or variable to look up

A simple example

Create a shared library


[root@node86 tmp]# cat hello.c
#include <stdio.h>

void hello() {
    printf("tttttttt !\n");
}

[root@node86 tmp]# gcc -shared -o libhello.so hello.c -fPIC
[root@node86 tmp]# ll libhello.so
-rwxr-xr-x. 1 root root 8208 Mar  9 11:58 libhello.so
[root@node86 tmp]#

Open the shared library


#include <stdio.h>
#include <dlfcn.h>

int main() {
    void *handle;
    void (*hello)();
    // open the shared library
    handle = dlopen("./libhello.so", RTLD_LAZY);
    if (!handle) {
        fprintf(stderr, "%s\n", dlerror());
        return 1;
    }
    // dlsym uses the handle to find the address of a symbol; "hello" is the function name
    hello = (void (*)()) dlsym(handle, "hello");
    if (!hello) {
        fprintf(stderr, "%s\n", dlerror());
        dlclose(handle);
        return 1;
    }
    // hello now points at the function in the library
    // call it
    hello();
    // close the shared library
    dlclose(handle);
    return 0;
}

Compile and run; the output is as expected.


[root@node86 tmp]# gcc main.c -o main -ldl
[root@node86 tmp]# ./main
tttttttt !
[root@node86 tmp]#

Where does cls live?

The cls implementation lives in the cls directory of the source tree, split into subdirectories by module. (Much of RGW is implemented through cls.)


[root@node86 cls]# ls -l
total 8
drwxr-xr-x. 2 root root   94 Jan 23 22:22 cas
drwxr-xr-x. 2 root root  102 Jan 23 22:22 cephfs
-rw-r--r--. 1 root root 7491 Jan 23 22:22 CMakeLists.txt
.....
drwxr-xr-x. 2 root root   24 Jan 23 22:22 sdk
drwxr-xr-x. 2 root root  147 Jan 23 22:22 timeindex
drwxr-xr-x. 2 root root  170 Jan 23 22:22 user
drwxr-xr-x. 2 root root  165 Jan 23 22:22 version

After a successful build, every cls module is compiled into a .so shared library and placed in the lib directory of the build tree:


/home/ceph/build/lib
[root@node86 lib]# ll |grep cls
-rw-r--r--. 1 root root    2876232 Jan 23 22:32 libcls_cas_client.a
lrwxrwxrwx. 1 root root         15 Jan 23 22:32 libcls_cas.so -> libcls_cas.so.1
....
....
lrwxrwxrwx. 1 root root         19 Jan 23 22:32 libcls_cas.so.1 -> libcls_cas.so.1.0.0
-rwxr-xr-x. 1 root root    4182408 Jan 23 22:32 libcls_cas.so.1.0.0
-rw-r--r--. 1 root root    8261516 Jan 23 22:33 libcls_cephfs_client.a
lrwxrwxrwx. 1 root root         18 Jan 23 22:32 libcls_cephfs.so -> libcls_cephfs.so.1

What does cls code look like?

Take the cls_hello demo that ships with Ceph as an example:


#include <algorithm>
#include <string>
#include <sstream>
#include <errno.h>
#include "objclass/objclass.h"
// cls version number
/*
#define CLS_VER(maj,min) \
int __cls_ver__## maj ## _ ##min = 0; \
int __cls_ver_maj = maj; \
int __cls_ver_min = min;
*/
CLS_VER(1,0)
// cls name
CLS_NAME(hello)
/*
#define CLS_NAME(name) \
int __cls_name__## name = 0; \
const char *__cls_name = #name;
*/
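// the method being implemented
static int say_hello(cls_method_context_t hctx, bufferlist *in, bufferlist *out)
{
  // This function runs inside the OSD. Unlike client code, it accesses
  // the object directly from within the storage daemon.
  if (in->length() > 100)
    return -EINVAL;  // reject oversized input
  // (reply generation omitted)
  // this return value will be returned back to the librados caller
  return 0;
}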

/*
#define CLS_INIT(name) \
CEPH_CLS_API void __cls_init()
*/
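// CLS_INIT is required. After the shared library is loaded, this init function is called; its main job is to store the class's function pointers in a map so they can be called later.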
CLS_INIT(hello)
{
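  // this log message, at level 0, will always appear in the ceph-osd
  // log file.
  CLS_LOG(0, "loading cls_hello");
  // handle of the cls; the class can be looked up through it
  cls_handle_t h_class;
  // handle of a method inside the shared library
  cls_method_handle_t h_say_hello;
  // register the class under the name "hello" and get back its handle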
  cls_register("hello", &h_class);

  // There are two flags we specify for methods:
  //    RD : whether this method (may) read prior object state
  //    WR : whether this method (may) write or update the object
  // A method can be RD, WR, neither, or both.  If a method does
  // neither, the data it returns to the caller is a function of the
  // request and not the object contents.
  // register the say_hello function and receive its handle in h_say_hello
  cls_register_cxx_method(h_class, "say_hello",
			  CLS_METHOD_RD,
			  say_hello, &h_say_hello);
}

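How are classes loaded?

Ceph has a ClassHandler class dedicated to managing cls. The OSD initializes it during startup and, when osd_open_classes_on_start is set, loads every class library into memory. The implementation is in open_all_classes.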


int OSD::init(){
//.....
  class_handler = new ClassHandler(cct);
  cls_initialize(class_handler);

  if (cct->_conf->osd_open_classes_on_start) {
    int r = class_handler->open_all_classes();
    if (r)
      dout(1) << "warning: got an error loading one or more classes: " << cpp_strerror(r) << dendl;
  }
//....
}

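Reading the .so files in the lib directory

It does some string matching to find the shared-library files that follow the cls naming convention.

open_all_classes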


int ClassHandler::open_all_classes()
{
  ldout(cct, 10) << __func__ << dendl;
  // open the directory and get a pointer to its directory stream (roughly like cd-ing into it)
  DIR *dir = ::opendir(cct->_conf->osd_class_dir.c_str());
  if (!dir)
    return -errno;

  struct dirent *pde = nullptr;
  int r = 0;
  // read the next directory entry in a loop (roughly like running ls)
  while ((pde = ::readdir(dir))) {
    if (pde->d_name[0] == '.')
      continue;
    // string matching: find the .so files that follow the cls naming rule
    if (strlen(pde->d_name) > sizeof(CLS_PREFIX) - 1 + sizeof(CLS_SUFFIX) - 1 &&
	strncmp(pde->d_name, CLS_PREFIX, sizeof(CLS_PREFIX) - 1) == 0 &&
	strcmp(pde->d_name + strlen(pde->d_name) - (sizeof(CLS_SUFFIX) - 1), CLS_SUFFIX) == 0) {
      char cname[PATH_MAX + 1];
      strncpy(cname, pde->d_name + sizeof(CLS_PREFIX) - 1, sizeof(cname) -1);
      cname[strlen(cname) - (sizeof(CLS_SUFFIX) - 1)] = '\0';
      ldout(cct, 10) << __func__ << " found " << cname << dendl;
      // each cls shared library is managed by one ClassData
      ClassData *cls;
      // skip classes that aren't in 'osd class load list'
      r = open_class(cname, &cls);
      if (r < 0 && r != -EPERM)
	goto out;
    }
  }
 out:
  closedir(dir);
  return r;
}
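
The name matching in the loop above is easy to misread because of all the sizeof arithmetic, so here is a minimal standalone sketch of the same idea using std::string. The constants kClsPrefix and kClsSuffix and the helper class_name_from_file are names made up for this sketch; the values "libcls_" and ".so" are assumed from the file listing shown earlier, not copied from the Ceph source.

```cpp
#include <cassert>
#include <string>

// Assumed for this sketch: the prefix and suffix that a cls library name
// must carry, inferred from names like "libcls_cas.so".
static const std::string kClsPrefix = "libcls_";
static const std::string kClsSuffix = ".so";

// Hypothetical helper: extracts the class name from a file name such as
// "libcls_hello.so". Returns false if the name does not follow the
// prefix/suffix convention; *cname is only written on success.
bool class_name_from_file(const std::string& fname, std::string* cname) {
  // Skip hidden entries such as "." and "..".
  if (fname.empty() || fname[0] == '.')
    return false;
  // The name must be strictly longer than prefix + suffix, otherwise the
  // class name in the middle would be empty.
  if (fname.size() <= kClsPrefix.size() + kClsSuffix.size())
    return false;
  if (fname.compare(0, kClsPrefix.size(), kClsPrefix) != 0)
    return false;
  if (fname.compare(fname.size() - kClsSuffix.size(), kClsSuffix.size(),
                    kClsSuffix) != 0)
    return false;
  *cname = fname.substr(kClsPrefix.size(),
                        fname.size() - kClsPrefix.size() - kClsSuffix.size());
  return true;
}
```

Under these assumptions, "libcls_cas.so" yields the class name "cas", while "libcls_cas.so.1" and "libcls_cas_client.a" are rejected because they do not end in the suffix.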

Registering and loading a cls

open_class

Creates the handle for the cls and stores it in a map.


int ClassHandler::open_class(const string& cname, ClassData **pcls)
{
  std::lock_guard lock(mutex);
  // look up cname in the map; if it is absent, create a new ClassData and store it in the map
  ClassData *cls = _get_class(cname, true);
  if (!cls)
    return -EPERM;
  // if it has not been opened yet, start loading it
  if (cls->status != ClassData::CLASS_OPEN) {
    int r = _load_class(cls);
    if (r)
      return r;
  }
  *pcls = cls;
  return 0;
}

_load_class calls the system function dlopen to load the library into memory.


int ClassHandler::_load_class(ClassData *cls)
{
 // already open
  if (cls->status == ClassData::CLASS_OPEN)
    return 0;
	.....
    // RTLD_NOW: resolve all symbols at load time
    cls->handle = dlopen(fname, RTLD_NOW);
    if (!cls->handle) {
      struct stat st;
 	....
      cls->status = ClassData::CLASS_MISSING;
      return r;
    }

    cls_deps_t *(*cls_deps)();
    // a shared object may depend on other classes; collect the ones not yet open in missing_dependencies
    cls_deps = (cls_deps_t *(*)())dlsym(cls->handle, "class_deps");
    if (cls_deps) {
      cls_deps_t *deps = cls_deps();
      while (deps) {
	if (!deps->name)
	  break;
	ClassData *cls_dep = _get_class(deps->name, false);
	cls->dependencies.insert(cls_dep);
	if (cls_dep->status != ClassData::CLASS_OPEN)
	  cls->missing_dependencies.insert(cls_dep);
	deps++;
      }
    }
  }

  // resolve dependencies
  // load all missing dependencies
  set<ClassData*>::iterator p = cls->missing_dependencies.begin();
  while (p != cls->missing_dependencies.end()) {
    ClassData *dc = *p;
    int r = _load_class(dc);
    if (r < 0) {
      cls->status = ClassData::CLASS_MISSING_DEPS;
      return r;
    }

    ldout(cct, 10) << "_load_class " << cls->name << " satisfied dependency " << dc->name << dendl;
    cls->missing_dependencies.erase(p++);
  }

  // initialize
  // every cls has a __cls_init function
  // __cls_init registers the library's functions in the class's method map, so later calls can find a function by name
  void (*cls_init)() = (void (*)())dlsym(cls->handle, "__cls_init");
  if (cls_init) {
    cls->status = ClassData::CLASS_INITIALIZING;
    cls_init();
  }

  ldout(cct, 10) << "_load_class " << cls->name << " success" << dendl;
  cls->status = ClassData::CLASS_OPEN;
  return 0;

}
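
To make the order of operations in _load_class easier to see (open the library, load missing dependencies first, then run the init hook), here is a toy model with no dlopen involved. All names (ToyLoader, ToyClass, Status, on_disk, init_order) are invented for this sketch, the on_disk flag merely stands in for "dlopen succeeded", and the sketch assumes the dependency graph has no cycles.

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <string>
#include <vector>

// Toy counterpart of the class states mentioned above.
enum class Status { Unknown, Missing, MissingDeps, Initializing, Open };

struct ToyClass {
  Status status = Status::Unknown;
  std::vector<std::string> deps;  // names of classes this one depends on
  bool on_disk = true;            // stands in for "dlopen succeeded"
};

struct ToyLoader {
  std::map<std::string, ToyClass> classes;
  std::vector<std::string> init_order;  // order in which init hooks ran

  // Returns 0 on success and a negative errno-style value on failure.
  // No cycle detection: the sketch assumes an acyclic dependency graph.
  int load(const std::string& name) {
    std::map<std::string, ToyClass>::iterator it = classes.find(name);
    if (it == classes.end())
      return -ENOENT;
    ToyClass& c = it->second;
    if (c.status == Status::Open)
      return 0;  // already open, nothing to do
    if (!c.on_disk) {
      c.status = Status::Missing;
      return -ENOENT;
    }
    // Dependencies are loaded before this class is initialized.
    for (const std::string& d : c.deps) {
      int r = load(d);
      if (r < 0) {
        c.status = Status::MissingDeps;
        return r;
      }
    }
    c.status = Status::Initializing;
    init_order.push_back(name);  // this is where __cls_init would run
    c.status = Status::Open;
    return 0;
  }
};
```

With classes a (depending on b and c), b (depending on c) and c, calling load("a") in this model initializes c, then b, then a. If a dependency cannot be opened, the dependent class ends up in the MissingDeps state and its init hook never runs.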

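What does __cls_init actually do?

Again taking the cls_hello demo that ships with Ceph as an example, the main job of CLS_INIT is to register the functions of the cls in the map that belongs to that cls.

The following uses the registration of say_hello as an example.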


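static int say_hello(cls_method_context_t hctx, bufferlist *in, bufferlist *out)
{
  // This function runs inside the OSD. Unlike client code, it accesses
  // the object directly from within the storage daemon.
  if (in->length() > 100)
    return -EINVAL;  // reject oversized input
  // (reply generation omitted)
  // this return value will be returned back to the librados caller
  return 0;
}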

/*
#define CLS_INIT(name) \
CEPH_CLS_API void __cls_init()
*/
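// CLS_INIT is required. After the shared library is loaded, this init function is called; its main job is to store the class's function pointers in a map so they can be called later.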
CLS_INIT(hello)
{
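  // handle of the cls; the class can be looked up through it
  cls_handle_t h_class;
  // handle of a method inside the shared library
  cls_method_handle_t h_say_hello;
  // look up the ClassData for "hello" via _get_class (it was already stored during _load_class) and return it as the handle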
  cls_register("hello", &h_class);

  // There are two flags we specify for methods:
  //    RD : whether this method (may) read prior object state
  //    WR : whether this method (may) write or update the object
  // A method can be RD, WR, neither, or both.  If a method does
  // neither, the data it returns to the caller is a function of the
  // request and not the object contents.
  // register the say_hello function and receive its handle in h_say_hello
  cls_register_cxx_method(h_class, "say_hello",
			  CLS_METHOD_RD,
			  say_hello, &h_say_hello);
}

cls_register_cxx_method


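int cls_register_cxx_method(cls_handle_t hclass, const char *method, int flags, cls_method_cxx_call_t class_call, cls_method_handle_t *handle);

hclass: the class the method belongs to (its ClassData)

method: name of the cls method

flags: the kind of operation the cls method performs; there are three flags, and they can be combined

CLS_METHOD_RD
CLS_METHOD_WR
CLS_METHOD_PROMOTE

class_call: function pointer of the cls method

handle: receives the handle of the cls method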


// cls_handle_t is an opaque void pointer: typedef void *cls_handle_t;
// cls_method_handle_t is an opaque void pointer as well

int cls_register_cxx_method(cls_handle_t hclass, const char *method,int flags,cls_method_cxx_call_t class_call, cls_method_handle_t *handle)
{
  ClassHandler::ClassData *cls = (ClassHandler::ClassData *)hclass;
  cls_method_handle_t hmethod = (cls_method_handle_t)cls->register_cxx_method(method, flags, class_call);
  if (handle)
    *handle = hmethod;
  return (hmethod != NULL);
}

register_cxx_method

Stores the cls function pointer and its related information in a map.


ClassHandler::ClassMethod *ClassHandler::ClassData::register_cxx_method(const char *mname,int  flags, cls_method_cxx_call_t func)
{
  ClassMethod& method = methods_map[mname];
  method.cxx_func = func;
  method.name = mname;
  method.flags = flags;
  method.cls = this;
  return &method;
}
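
The registration step boils down to a name-to-function-pointer map per class. The following is a minimal sketch of that pattern outside Ceph; ToyClassData, ToyMethod, toy_method_fn, the TOY_RD/TOY_WR flags and toy_say_hello are all hypothetical names, and std::string stands in for bufferlist.

```cpp
#include <cassert>
#include <cerrno>
#include <map>
#include <string>

// Hypothetical stand-in for a C++ cls method: reads `in`, fills `out`,
// returns 0 or a negative errno-style value.
typedef int (*toy_method_fn)(const std::string& in, std::string* out);

enum { TOY_RD = 0x1, TOY_WR = 0x2 };  // illustrative flag values only

struct ToyMethod {
  std::string name;
  int flags = 0;
  toy_method_fn fn = nullptr;
};

struct ToyClassData {
  std::map<std::string, ToyMethod> methods;

  // Creates the map entry on first use, or overwrites an existing one.
  ToyMethod* register_method(const char* mname, int flags, toy_method_fn fn) {
    ToyMethod& m = methods[mname];
    m.name = mname;
    m.flags = flags;
    m.fn = fn;
    return &m;  // std::map keeps element addresses stable across inserts
  }

  // Returns nullptr when no method with that name was registered.
  ToyMethod* get_method(const std::string& mname) {
    std::map<std::string, ToyMethod>::iterator it = methods.find(mname);
    return it == methods.end() ? nullptr : &it->second;
  }
};

// A toy method in the spirit of say_hello.
static int toy_say_hello(const std::string& in, std::string* out) {
  if (in.size() > 100)
    return -EINVAL;
  *out = "Hello, " + (in.empty() ? std::string("world") : in) + "!";
  return 0;
}
```

A caller would register the method once with register_method("say_hello", TOY_RD, toy_say_hello) and later fetch it by name with get_method("say_hello") before invoking fn. Keeping the lookup keyed by name is what lets a request carry only strings.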

That is the whole flow by which a cls module gets loaded into OSD memory.


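How does the upper layer call it?

cls code runs on the OSD. To call a function in a cls module, the business layer sends an op to the OSD that names the cls method to run; on receiving the op, the OSD looks up the function pointer through the corresponding cls handle and executes it.

Take gc remove as an example: later in the GC flow the omap entries on the GC object have to be updated, and that is done by calling a cls method.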


int RGWGC::remove(int index, const std::vector<string>& tags, AioCompletion **pc)
{
  ObjectWriteOperation op;
  // cls entry point
  cls_rgw_gc_remove(op, tags);
  return store->gc_aio_operate(obj_names[index], &op, pc);
}

cls_rgw_gc_remove


void cls_rgw_gc_remove(librados::ObjectWriteOperation& op, const vector<string>& tags)
{
  bufferlist in;
  // the struct the cls method expects, encoded into a bufferlist
  cls_rgw_gc_remove_op call;
  call.tags = tags;
  encode(call, in);
  // RGW_CLASS says this is the rgw cls
  // RGW_GC_REMOVE is a macro:
  // #define RGW_GC_REMOVE "gc_remove"
  op.exec(RGW_CLASS, RGW_GC_REMOVE, in);
}

op.exec


void librados::ObjectOperation::exec(const char *cls, const char *method, bufferlist& inbl, bufferlist *outbl, int *prval)
{
  ::ObjectOperation *o = &impl->o;
  o->call(cls, method, inbl, outbl, NULL, prval);
}

o->call


void call(const char *cname, const char *method, bufferlist &indata)
{
    add_call(CEPH_OSD_OP_CALL, cname, method, indata, NULL, NULL, NULL);
}

After add_call, the op is sent to the OSD (how it is sent, and to which OSD, is not covered here).


void add_call(int op, const char *cname, const char *method, bufferlist &indata, bufferlist *outbl, Context *ctx, int *prval)
{
    OSDOp& osd_op = add_op(op); // append a new op
    // initialize the op's fields
    unsigned p = ops.size() - 1;
    out_handler[p] = ctx;
    out_bl[p] = outbl;
    out_rval[p] = prval;
    osd_op.op.op = op;
    osd_op.op.cls.class_len = strlen(cname);
    osd_op.op.cls.method_len = strlen(method);
    osd_op.op.cls.indata_len = indata.length();
    // everything the plugin needs travels in this op's indata: class name, method name, and input arguments
    osd_op.indata.append(cname, osd_op.op.cls.class_len);
    osd_op.indata.append(method, osd_op.op.cls.method_len);
    osd_op.indata.append(indata);
}

Suppose the OSD has received the op. The OSD handles ops differently depending on their type; the one add_call appended is of type CEPH_OSD_OP_CALL.


int ReplicatedPG::do_osd_ops(OpContext *ctx, vector<OSDOp>& ops){
    case CEPH_OSD_OP_CALL:
    {
	string cname, mname;
	bufferlist indata;
    // the op carries the arguments the cls method needs; copy them out
	try {
	  bp.copy(op.cls.class_len, cname);
	  bp.copy(op.cls.method_len, mname);
	  bp.copy(op.cls.indata_len, indata);
	} catch (buffer::error& e) {
	  .....
	}
    // look up the ClassData for cname
	ClassHandler::ClassData *cls;
	result = osd->class_handler->open_class(cname, &cls);
	ceph_assert(result == 0);   // init_op_flags() already verified this works.
    // then look up the method by name in the map (it holds the function's details)
	ClassHandler::ClassMethod *method = cls->get_method(mname.c_str());
	...
	bufferlist outdata;
	int prev_rd = ctx->num_read;
	int prev_wr = ctx->num_write;
    // run the cls method
	result = method->exec((cls_method_context_t)&ctx, indata, outdata);
	//....
}
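
The client packs the class name, method name and input back to back into one buffer and records the three lengths in the op; the OSD then slices the buffer by those lengths. A minimal round-trip sketch of that framing is below. ToyCallOp, ToyCall, pack_call and unpack_call are names invented here, std::string replaces bufferlist, and the field widths are not those of the real wire format.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Toy counterpart of the op: three lengths plus one concatenated payload.
struct ToyCallOp {
  std::size_t class_len = 0;
  std::size_t method_len = 0;
  std::size_t indata_len = 0;
  std::string payload;  // class name + method name + input, back to back
};

// Client side: mirrors what add_call does with indata.append().
ToyCallOp pack_call(const std::string& cname, const std::string& method,
                    const std::string& indata) {
  ToyCallOp op;
  op.class_len = cname.size();
  op.method_len = method.size();
  op.indata_len = indata.size();
  op.payload = cname + method + indata;
  return op;
}

struct ToyCall {
  std::string cname;
  std::string method;
  std::string indata;
};

// OSD side: slice the payload using the recorded lengths. Throws if the
// lengths do not add up, much like the decode error path in the snippet.
ToyCall unpack_call(const ToyCallOp& op) {
  if (op.class_len + op.method_len + op.indata_len != op.payload.size())
    throw std::runtime_error("length fields do not match payload size");
  ToyCall c;
  std::size_t off = 0;
  c.cname = op.payload.substr(off, op.class_len);
  off += op.class_len;
  c.method = op.payload.substr(off, op.method_len);
  off += op.method_len;
  c.indata = op.payload.substr(off, op.indata_len);
  return c;
}
```

Because the names are not NUL-terminated inside the payload, the length fields are the only way to tell where one part ends and the next begins, which is why a mismatch is treated as an error rather than guessed at.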

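Now look at exec. It has two execution paths because a cls method may be written in C or in C++; a C method cannot use C++ types such as string directly, so the two cases are handled separately.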


int ClassHandler::ClassMethod::exec(cls_method_context_t ctx, bufferlist& indata, bufferlist& outdata)
{
  int ret;
  if (cxx_func) {
    // C++ call version
    ret = cxx_func(ctx, &indata, &outdata);
  } else {
    // C version
    char *out = NULL;
    int olen = 0;
    ret = func(ctx, indata.c_str(), indata.length(), &out, &olen);
    if (out) {
      // assume *out was allocated via cls_alloc (which calls malloc!)
      buffer::ptr bp = buffer::claim_malloc(olen, out);
      outdata.push_back(bp);
    }
  }
  return ret;
}
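
The two-path dispatch can be reproduced in a few lines. In this sketch DualMethod, c_call_t, cxx_call_t, upper_c and echo_cxx are hypothetical names, std::string stands in for bufferlist, and the C path copies the malloc'ed buffer and frees it, whereas the code above hands ownership of the buffer over via claim_malloc without copying.

```cpp
#include <cassert>
#include <cctype>
#include <cerrno>
#include <cstdlib>
#include <string>

// C-style method: raw pointers in, malloc'ed buffer out.
typedef int (*c_call_t)(const char* in, int in_len, char** out, int* out_len);
// C++-style method: works on C++ types directly.
typedef int (*cxx_call_t)(const std::string& in, std::string* out);

struct DualMethod {
  c_call_t c_func = nullptr;
  cxx_call_t cxx_func = nullptr;

  int exec(const std::string& in, std::string* out) const {
    if (cxx_func)
      return cxx_func(in, out);  // C++ path: pass the objects through
    if (!c_func)
      return -EINVAL;            // sketch-only choice for "nothing registered"
    char* raw = nullptr;
    int raw_len = 0;
    int ret = c_func(in.data(), static_cast<int>(in.size()), &raw, &raw_len);
    if (raw) {
      out->assign(raw, static_cast<std::size_t>(raw_len));  // copy the result
      std::free(raw);  // the callee allocated with malloc
    }
    return ret;
  }
};

// Example C-style method: upper-cases the input into a malloc'ed buffer.
static int upper_c(const char* in, int in_len, char** out, int* out_len) {
  char* buf = static_cast<char*>(std::malloc(static_cast<std::size_t>(in_len) + 1));
  if (!buf)
    return -ENOMEM;
  for (int i = 0; i < in_len; ++i)
    buf[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(in[i])));
  *out = buf;
  *out_len = in_len;
  return 0;
}

// Example C++-style method: echoes the input.
static int echo_cxx(const std::string& in, std::string* out) {
  *out = in;
  return 0;
}
```

Setting c_func = upper_c and calling exec("abc", &out) goes through the C path, while setting cxx_func = echo_cxx goes through the C++ path; the caller sees the same exec signature either way.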

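That covers how cls is loaded and invoked. RGW relies on cls for many operations, so it is worth understanding the call flow in depth; if a business need arises, you can also write your own cls methods.

This style of loading functions at runtime is also a pattern worth borrowing. A later post will cover how cls locks are implemented.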
