Hi Shawn,

setting replication size to 2 is dangerous. It's not a question of if you
lose data, but when! With regard to data integrity, this is an extremely
bad decision. If you want, I can find the mathematical proof for you.

Regards,
Joachim

joachim.kraftmayer@xxxxxxxxx
www.clyso.com
Hohenzollernstr. 27, 80801 Munich
Utting a. A. | HR: Augsburg | HRB: 25866 | USt. ID-Nr.: DE2754306


Anthony D'Atri <aad@xxxxxxxxxxxxxx> wrote on Sat, 16 Aug 2025, 03:20:

> > I recently set up a 5-node Proxmox/Ceph cluster with 10 OSDs per node
> > for a total of 50 OSDs (approximately 135 TB).
> >
> > All the nodes have 768 GB RAM and a single 72-core (144-thread) EPYC CPU.
> >
> > I set the ceph pool size to 3 and the minimum size to 2.
> >
> > The performance seemed good and everything was happy in cephland!
> >
> > The other day I spoke to a ceph consultant and he recommended I change
> > the pool size from 3/2 to 2/1.
>
> Danger, Will Robinson!
>
> > He cited several valid points: more usable storage space, faster
> > rebuilds, better performance…
> >
> > I followed his advice and changed it. The benchmark performance was
> > about the same, but the recovery time when I took a node down was
> > improved. I really do like the idea of the extra storage space!
> >
> > So now I am confused whether I should leave it or go back to 3 replicas.
>
> While all of the above are true, you run the risk of data loss or
> corruption. Sometimes data is a scratchpad or easily recreated, but if
> your VMs are using RBD volumes for boot drives, that likely isn't the case.
>
> With 2/1 there are certain sequences of drive / node / network / daemon
> failures that can result in the loss of data, or in not knowing which, if
> either, copy is actually up to date.
>
> I have seen this happen with my own eyes, after I advised $company of the
> risk and they decided it wasn't a priority. The result was that a
> customer lost data.
>
> Say you take a node down for maintenance, and while it's down, an OSD
> drive on another node fails. Most likely it's taken writes after the
> first node was taken down, so now the only current copy of that data is
> lost. Off this mortal coil. Pushing up the daisies. You get the idea.
> See Dan's presentation from Ceph Day Seattle for context.
>
> > Anyone have any thoughts or compelling reasons I should leave it or
> > change it back?
>
> Unless you have very specific needs, change it back. You can have a
> separate size=2 pool for non-critical data if you like, with all manner
> of warnings to users.
>
> > Shawn
> >
> > Shawn Heil
> > Phone: (608) 836-7041 | Direct: (608) 410-2333
> > Email: shawn.heil@xxxxxxxxxxxxxxxx | Website: www.brucecompany.com
> > Find us on Facebook:
> > https://www.facebook.com/pages/The-Bruce-Company/113279807065?ref=hl

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
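
A rough back-of-envelope sketch of the kind of argument Joachim alludes to,
with assumed numbers only (the 3% annual drive failure rate and 8-hour
rebuild window do not come from the thread). It ignores the correlated
failures, maintenance windows and min_size=1 stale-copy scenarios Anthony
describes, all of which make size=2 look even worse in practice:

#!/usr/bin/env python3
# Crude, illustrative comparison of size=2 vs size=3 durability.
# All parameters are assumptions, not measurements from Shawn's cluster.

from math import comb

TOTAL_OSDS = 50        # 5 nodes x 10 OSDs, per the thread
ANNUAL_FAIL = 0.03     # assumed annual failure rate per OSD drive
REBUILD_HOURS = 8.0    # assumed time to re-replicate a failed OSD
HOURS_PER_YEAR = 24 * 365

# Probability that one given surviving OSD also fails inside the
# rebuild window that follows a first failure.
p = ANNUAL_FAIL * REBUILD_HOURS / HOURS_PER_YEAR
peers = TOTAL_OSDS - 1

# size=2: a single additional failure among the peers during the window
# can destroy the only remaining copy of some placement groups.
p_loss_size2 = 1 - (1 - p) ** peers

# size=3: roughly two additional overlapping failures are needed
# (small-p approximation: any 2 of the remaining peers).
p_loss_size3 = comb(peers, 2) * p ** 2

first_failures_per_year = TOTAL_OSDS * ANNUAL_FAIL

print(f"P(extra failure during rebuild window, per peer): {p:.2e}")
print(f"size=2: P(loss | one OSD failure) ~ {p_loss_size2:.2e}")
print(f"size=3: P(loss | one OSD failure) ~ {p_loss_size3:.2e}")
print(f"expected loss events/year, size=2 ~ {first_failures_per_year * p_loss_size2:.2e}")
print(f"expected loss events/year, size=3 ~ {first_failures_per_year * p_loss_size3:.2e}")

Even with these generous assumptions the gap between two and three replicas
comes out at several orders of magnitude per failure event.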
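
For reference, going back to 3/2 is a live operation along these lines
(the pool name is a placeholder for whatever the Proxmox-created pool is
actually called):

ceph osd pool set <poolname> size 3
ceph osd pool set <poolname> min_size 2
ceph osd pool get <poolname> size      # verify the change

Ceph will backfill the third replicas in the background, so expect recovery
traffic and somewhat reduced client throughput until the cluster returns to
HEALTH_OK.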