Librados async operations in C

	Hello, Ceph users,

TL;DR: how to integrate librados async operations with an event-loop library?

Details:

I need to store lots of objects, but I don't need a full-featured S3 API,
and my objects are mostly small (the majority are under 10 KB, see below),
so per-object overhead is significant for me.

If I understand it correctly (do I?), radosgw stores each S3 object in its
own RADOS object, so it is not suitable for my purpose -- packing
multiple source objects into a larger RADOS object is a necessity.

I am considering writing my own frontend with RADOS pool[s] as a storage
backend, using librados (or maybe libradosstriper for large objects),
EC pool for data, and a replicated pool for OMAP object metadata.
Librados supports async I/O with callbacks, but I am not sure how it
fits into the wider environment of other asynchronous operations in the
same process:

- is librados thread-safe, or should I use, for example, a per-thread
  rados_ioctx_t or even rados_t instance? Should I do some kind of locking
  when calling librados functions from a multi-threaded program?

- is it possible to incorporate librados into an event-driven program
  using an I/O event-loop library (libevent, glib, ...)? If so, how can I
  make librados wait for socket I/O using the global event loop?
  (The first sketch after these questions shows the only workaround
  I can think of.)

- is it still recommended to have at most 100K OMAP keys per RADOS object?
  I have created an empty object and set about 120K OMAP keys on it
  using the rados(1) command, and so far the cluster is HEALTH_OK.
  (The second sketch below shows how I plan to set the keys from librados.)

- what is the preferred RADOS object size (for packing many small
  source objects into a single RADOS object)?
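
To make the question concrete, here is a minimal sketch of what I have
in mind -- NOT working production code: an asynchronous write whose
completion callback wakes up my own event loop through an eventfd.
The pool name "mypool", the object id "packed.0001", and the assumption
that the callback runs on a librados-internal thread are all mine;
please correct me if the assumption is wrong.

/* cc sketch.c -lrados */
#include <rados/librados.h>
#include <sys/eventfd.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int wakeup_fd;   /* the fd my libevent/glib loop would watch */

/* Completion callback; presumably runs on a librados thread. */
static void write_done(rados_completion_t c, void *arg)
{
        uint64_t one = 1;

        /* Real code would also push (arg, rados_aio_get_return_value(c))
         * onto a mutex-protected queue drained by the event loop. */
        if (write(wakeup_fd, &one, sizeof(one)) != sizeof(one))
                perror("eventfd write");
}

int main(void)
{
        rados_t cluster;
        rados_ioctx_t io;
        rados_completion_t comp;
        const char *buf = "small object payload";

        wakeup_fd = eventfd(0, 0);

        if (rados_create(&cluster, NULL) < 0 ||
            rados_conf_read_file(cluster, NULL) < 0 ||
            rados_connect(cluster) < 0) {
                fprintf(stderr, "cannot connect to the cluster\n");
                return 1;
        }
        if (rados_ioctx_create(cluster, "mypool", &io) < 0) {
                fprintf(stderr, "cannot open the pool\n");
                rados_shutdown(cluster);
                return 1;
        }

        /* Asynchronous write: returns immediately, write_done() fires later.
         * Error checks on the aio calls are omitted for brevity. */
        rados_aio_create_completion(NULL, write_done, NULL, &comp);
        rados_aio_write_full(io, "packed.0001", comp, buf, strlen(buf));

        /* The real program would now poll wakeup_fd in its event loop;
         * here I just block so that the sketch is self-contained. */
        rados_aio_wait_for_complete(comp);
        printf("write returned %d\n", rados_aio_get_return_value(comp));

        rados_aio_release(comp);
        rados_ioctx_destroy(io);
        rados_shutdown(cluster);
        close(wakeup_fd);
        return 0;
}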

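For the OMAP metadata I was planning to batch key updates with a write
op, roughly as below. The object id "index.0001" and the key/value
layout are made-up placeholders; this only shows the API calls I intend
to use.

/* Set several OMAP keys on the metadata object in one operation. */
#include <rados/librados.h>
#include <string.h>

static int set_index_entries(rados_ioctx_t io)
{
        const char *keys[] = { "obj/aaa", "obj/bbb" };
        const char *vals[] = { "packed.0001:0:1234", "packed.0001:1234:987" };
        const size_t lens[] = { strlen(vals[0]), strlen(vals[1]) };
        rados_write_op_t op = rados_create_write_op();
        int ret;

        rados_write_op_omap_set(op, keys, vals, lens, 2);
        ret = rados_write_op_operate(op, io, "index.0001", NULL, 0);
        rados_release_write_op(op);
        return ret;
}
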
===============================

FWIW, one instance of my data has the following object-size distribution:

Total:       210M objs, 114 TB total size

0B-1KB:       21M objs,  18 GB total size
1KB-10KB:    137M objs, 344 GB total size
10KB-100KB:   31M objs,   1 TB total size
100KB-1MB:    15M objs,   5 TB total size
1MB-10MB:      4M objs,  15 TB total size
10MB-100MB:  966K objs,  25 TB total size
100MB-1GB:   137K objs,  38 TB total size
1GB-10GB:     13K objs,  26 TB total size
10GB-100GB:  162  objs,   3 TB total size

I have a not-so-big Ceph cluster (3 mons, 15-ish OSD nodes, each with two
HDD-based OSDs and metadata/OMAP on NVMe).

Thanks,

-Yenya

-- 
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| https://www.fi.muni.cz/~kas/                        GPG: 4096R/A45477D5 |
    We all agree on the necessity of compromise. We just can't agree on
    when it's necessary to compromise.                     --Larry Wall