Hello, Ceph users,

TL;DR: how do I integrate librados async operations with an event-loop library?

Details: I need to store lots of objects, but I don't need a full-featured S3 API, and my objects are mostly small (the majority is under 10 KB, see below), so per-object overhead is significant for me. If I understand it correctly (do I?), radosgw stores each S3 object in its own RADOS object, so it is not suitable for my purpose -- packing multiple source objects into a larger RADOS object is a necessity.

I am considering writing my own frontend with RADOS pool[s] as the storage backend, using librados (or maybe libradosstriper for large objects), an EC pool for data, and a replicated pool for OMAP object metadata.

Librados supports async I/O with callbacks, but I am not sure how it fits into a wider environment of other asynchronous operations in the same process:

- Is librados thread-safe, or should I use, for example, a per-thread rados_ioctx_t or even rados_t instance? Should I do some kind of locking when calling librados functions from a multi-threaded program?

- Is it possible to incorporate librados into an event-driven program, using an I/O event loop library (libevent, glib, ...)? If so, how can I make librados wait for socket I/O using the global event loop? (A sketch of the pattern I have in mind is at the end of this mail.)

- Is it still recommended to have at most 100K OMAP keys per RADOS object? I have created an empty object and set about 120K OMAP keys on it using the rados(1) command, and so far the cluster is HEALTH_OK.

- What is the preferred RADOS object size (for packing many small source objects into a single RADOS object)? The packing scheme I am considering is sketched at the end of this mail as well.

===============================

FWIW, one of the instances of my data has the following total size and object-size distribution:

Total:       210M objs, 114 TB total size
0B-1KB:       21M objs,  18 GB total size
1KB-10KB:    137M objs, 344 GB total size
10KB-100KB:   31M objs,   1 TB total size
100KB-1MB:    15M objs,   5 TB total size
1MB-10MB:      4M objs,  15 TB total size
10MB-100MB:  966K objs,  25 TB total size
100MB-1GB:   137K objs,  38 TB total size
1GB-10GB:     13K objs,  26 TB total size
10GB-100GB:   162 objs,   3 TB total size

I have a not-so-big Ceph cluster (3 mons, 15-ish OSD nodes, each with two HDD-based OSDs with metadata/OMAP on NVMe).

Thanks,

-Yenya

--
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| https://www.fi.muni.cz/~kas/                        GPG: 4096R/A45477D5 |
 We all agree on the necessity of compromise. We just can't agree on
 when it's necessary to compromise.                         --Larry Wall
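
PS: To make the event-loop question more concrete, below is the bridge I have in mind. This is an untested sketch only: the pool name "mypool", the object name "test-object" and the helper names are all made up, and error handling is omitted. As far as I can tell, librados does its socket I/O on its own internal threads and invokes the AIO callback from one of them, so instead of trying to get librados' sockets into my loop, the callback just pokes an eventfd that libevent watches:

/* Untested sketch: bridging librados AIO completions into a libevent loop.
 * Build roughly as: cc sketch.c -lrados -levent
 */
#include <event2/event.h>
#include <rados/librados.h>
#include <sys/eventfd.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int efd;                 /* eventfd watched by the libevent loop */

/* Runs on a librados internal thread: just poke the event loop. */
static void on_rados_complete(rados_completion_t c, void *arg)
{
    uint64_t one = 1;
    (void)c; (void)arg;
    /* a real program would also push (oid, result, ...) onto a
     * mutex-protected queue here, to be drained by the loop thread */
    if (write(efd, &one, sizeof(one)) != (ssize_t)sizeof(one))
        perror("eventfd write");
}

/* Runs on the event-loop thread when the eventfd becomes readable. */
static void on_eventfd_readable(evutil_socket_t fd, short what, void *arg)
{
    uint64_t n;
    (void)what; (void)arg;
    if (read(fd, &n, sizeof(n)) == (ssize_t)sizeof(n))
        printf("%llu aio completion(s) ready\n", (unsigned long long)n);
    /* drain the completion queue here: rados_aio_get_return_value(),
     * rados_aio_release(), continue the per-request state machine, ... */
}

int main(void)
{
    rados_t cluster;
    rados_ioctx_t io;
    rados_completion_t comp;
    const char buf[] = "hello";

    rados_create(&cluster, "admin");            /* error checking omitted */
    rados_conf_read_file(cluster, NULL);        /* default ceph.conf search */
    rados_connect(cluster);
    rados_ioctx_create(cluster, "mypool", &io); /* "mypool" is made up */

    efd = eventfd(0, EFD_NONBLOCK);
    struct event_base *base = event_base_new();
    struct event *ev = event_new(base, efd, EV_READ | EV_PERSIST,
                                 on_eventfd_readable, NULL);
    event_add(ev, NULL);

    /* one async write; the callback fires on a librados thread */
    rados_aio_create_completion(NULL, on_rados_complete, NULL, &comp);
    rados_aio_write_full(io, "test-object", comp, buf, strlen(buf));
    /* comp would be released in the drain step above */

    event_base_dispatch(base);                  /* loop forever */

    rados_ioctx_destroy(io);
    rados_shutdown(cluster);
    return 0;
}

Is this eventfd/self-pipe pattern the intended way to combine the AIO API with an external event loop, or is there something built into librados that I am missing?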
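
PPS: The packing scheme behind the OMAP and object-size questions, again as an untested sketch with made-up names (PACK_TARGET_SIZE, the data/metadata pool split and the oid naming are just my working assumptions, and error paths are simplified): each small source object is appended to the current "pack" object in the EC data pool, and its location is recorded as an OMAP key on an index object in the replicated pool:

/* Untested sketch of the packing idea. */
#include <rados/librados.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define PACK_TARGET_SIZE (64ULL * 1024 * 1024)  /* my guess at a pack size */

/* Store one small source object and index it; returns 0 or -errno. */
int store_small_object(rados_ioctx_t data_io, rados_ioctx_t meta_io,
                       const char *pack_oid, const char *index_oid,
                       const char *user_key, const char *buf, size_t len)
{
    uint64_t pack_size;
    time_t mtime;
    int r;

    /* find the current end of the pack object (assumes a single writer
     * per pack, or external serialization of appends) */
    r = rados_stat(data_io, pack_oid, &pack_size, &mtime);
    if (r == -ENOENT)
        pack_size = 0;
    else if (r < 0)
        return r;

    /* append the payload to the pack object in the EC data pool;
     * note: EC pools may require aligned appends, see
     * rados_ioctx_pool_required_alignment2() -- ignored here */
    r = rados_write(data_io, pack_oid, buf, len, pack_size);
    if (r < 0)
        return r;

    /* record "user_key -> pack_oid:offset:length" in the OMAP index
     * object in the replicated metadata pool */
    char loc[256];
    snprintf(loc, sizeof(loc), "%s:%llu:%zu", pack_oid,
             (unsigned long long)pack_size, len);

    const char *keys[1] = { user_key };
    const char *vals[1] = { loc };
    size_t lens[1] = { strlen(loc) };

    rados_write_op_t op = rados_create_write_op();
    rados_write_op_omap_set(op, keys, vals, lens, 1);
    r = rados_write_op_operate(op, meta_io, index_oid, NULL, 0);
    rados_release_write_op(op);

    /* the caller would switch to a new pack_oid once
     * pack_size + len exceeds PACK_TARGET_SIZE */
    return r;
}

The index object would have to be sharded well before it approaches the 100K-OMAP-keys mark, hence my question above.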