March 20, 2017

Big Data Storage Wars: Ceph vs Gluster

In the search for infinite cheap storage, the conversation eventually turns to comparing Ceph and Gluster. Your teams can use either of these open-source software platforms to store and administer massive amounts of data, but they differ in how they store that data and in the complications that creates for retrieval.

Both programs are categorized as SDS, or “software-defined storage.” Because Ceph and Gluster are open-source, they provide certain advantages over proprietary solutions. Open-source SDS gives users the flexibility to connect any supported software or hardware without the restrictions a provider might impose on operating system or usage. 

Ranga Rangachari, VP and general manager at Red Hat, describes the difference between the two programs:

“Ceph is part and parcel to the OpenStack story. In the community, [the majority] of the OpenStack implementations were using Ceph as the storage substrate. Gluster is classic file serving, second-tier storage, and deep archiving.”

In simpler terms, Ceph and Gluster both provide powerful storage, but Gluster performs well at higher scales that could multiply from terabytes to petabytes in a short time. Ceph does provide rapid storage scaling, but its storage format lends itself to shorter-term storage that users access more frequently.

Overview

[Figure: Ceph vs. Gluster, interaction with files]

Ceph: scalable object storage with block and file capabilities

Gluster: scalable file storage with object capabilities

The differences, of course, are more nuanced than this and come down to the way each program handles the data it stores.

Ceph uses object storage, which means it stores data in binary objects spread out across lots of computers. It is widely used as the storage layer for private clouds built with OpenStack, and users can mix unstructured and structured data in the same system.
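
To make “binary objects spread out across lots of computers” concrete, here is a minimal Python sketch of the first half of that idea: cutting a file into fixed-size, individually named objects that can then be stored on different machines. The object size and the `<name>.<index>` naming scheme are assumptions chosen for illustration, not Ceph's actual on-disk format.

```python
# Minimal sketch of object-style storage: a file's bytes are cut into
# fixed-size chunks and each chunk is stored as a separately named object.
# The object size and the "<name>.<index>" naming scheme are illustrative
# assumptions, not Ceph's actual on-disk format.


def split_into_objects(name: str, data: bytes, object_size: int) -> dict[str, bytes]:
    """Return a mapping of object name -> chunk of the original data."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    objects = {}
    for index, offset in enumerate(range(0, len(data), object_size)):
        # A zero-padded hex index keeps object names in order when sorted.
        objects[f"{name}.{index:08x}"] = data[offset:offset + object_size]
    return objects


def reassemble(name: str, objects: dict[str, bytes]) -> bytes:
    """Read objects 0, 1, 2, ... for `name` until one is missing, then join them."""
    chunks = []
    index = 0
    while (key := f"{name}.{index:08x}") in objects:
        chunks.append(objects[key])
        index += 1
    return b"".join(chunks)


if __name__ == "__main__":
    objs = split_into_objects("report.csv", b"abcdefghij", object_size=4)
    print(list(objs))  # ['report.csv.00000000', 'report.csv.00000001', 'report.csv.00000002']
    print(reassemble("report.csv", objs))  # b'abcdefghij'
```

Because every chunk is a self-contained object with a predictable name, different objects from the same file can live on different machines and be fetched in parallel.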

Gluster uses file storage: it stores data as whole files on storage space contributed by connected Linux computers. It builds a highly scalable system with access to more traditional storage and file transfer protocols, and it can scale quickly without a single point of failure. That means you can store huge amounts of older data without losing accessibility or security. An April 2014 study published on IOPscience found that Gluster outperformed Ceph but still showed some instabilities that resulted in partial or total data loss.

Interaction with Files

Both expose a standard POSIX or NFS interface, so users can interact with data as though through a standard file system. Both also provide search and retrieval interfaces for the data you store. But if your team plans on doing anything with big data, you'll want to know which of these to choose.
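
A practical consequence of the POSIX interface is that application code does not need a storage-specific client library. The Python sketch below uses only ordinary file I/O; `MOUNT_POINT` and the `records` directory layout are hypothetical, and the demo swaps in a temporary local directory because the same calls would run unchanged against a mounted CephFS or GlusterFS volume.

```python
# Sketch: a mounted CephFS or GlusterFS volume looks like any other directory,
# so ordinary file I/O works unchanged. MOUNT_POINT is a hypothetical path;
# replace it with wherever your cluster is actually mounted.
import tempfile
from pathlib import Path

MOUNT_POINT = Path("/mnt/shared")  # hypothetical mount point


def save_record(root: Path, name: str, text: str) -> Path:
    """Write a text file under <root>/records, creating the directory if needed."""
    path = root / "records" / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path


def load_record(root: Path, name: str) -> str:
    """Read a text file back from <root>/records."""
    return (root / "records" / name).read_text(encoding="utf-8")


def list_records(root: Path) -> list[str]:
    """List file names under <root>/records, sorted for stable output."""
    return sorted(p.name for p in (root / "records").iterdir() if p.is_file())


if __name__ == "__main__":
    # Without a cluster at hand, a temporary local directory stands in for
    # MOUNT_POINT; the same calls would run against the real mount.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        save_record(root, "2017-03.txt", "hello")
        print(load_record(root, "2017-03.txt"))  # hello
        print(list_records(root))  # ['2017-03.txt']
```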

Ceph distributes data across the computers in the cluster and lets the user access all of it at once through a single interface. On the back end, CephFS communicates with the disparate parts of the cluster and stores data without much user intervention, and multiple clients can access the same store concurrently.
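
How can a cluster spread data across many machines and still let any client find it “without much user intervention”? Ceph answers this with its CRUSH placement algorithm. The Python sketch below does not implement CRUSH; it swaps in a simpler, well-known technique, rendezvous (“highest random weight”) hashing, to illustrate the shared idea that placement is computed from the object name rather than looked up in a central table. Node names such as `osd.0` and the replica count of 3 are illustrative assumptions.

```python
# Simplified stand-in for Ceph's placement logic. Real Ceph uses the CRUSH
# algorithm and a cluster map; this sketch uses rendezvous ("highest random
# weight") hashing instead, which shares the key property that any client can
# compute an object's location on its own, with no central lookup table.
# Node names such as "osd.0" are illustrative.
import hashlib


def _score(object_name: str, node: str) -> int:
    """Deterministic pseudo-random score for an (object, node) pair."""
    digest = hashlib.sha256(f"{node}/{object_name}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def place(object_name: str, nodes: list[str], replicas: int = 3) -> list[str]:
    """Return the `replicas` nodes with the highest score for this object."""
    if replicas > len(nodes):
        raise ValueError("not enough nodes for the requested replica count")
    ranked = sorted(nodes, key=lambda node: _score(object_name, node), reverse=True)
    return ranked[:replicas]


if __name__ == "__main__":
    cluster = [f"osd.{i}" for i in range(6)]
    for obj in ("report.csv.00000000", "report.csv.00000001"):
        print(obj, "->", place(obj, cluster))
```

Because placement depends only on the object name and the list of nodes, every client computes the same answer. Removing a node that was not holding a given object also leaves that object's placement unchanged, so a change to the cluster only moves the data that has to move.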

[Figure: Ceph dashboard, via the Calamari management and monitoring system]

Gluster also distributes data across connected computers, but in its default layout it stores each file whole rather than splitting it into objects, keeping everything together. GlusterFS keeps files on the servers' ordinary local file systems and doesn't run a separate metadata server; instead, it hashes each file's name to decide which storage location (“brick”) holds it, so any client can find a file by computing the same hash. Without a metadata server to consult, Gluster avoids a common bottleneck and can react and scale quickly while remaining easy to use. From the interface, users see their data as ordinary directories and files. Because a file's location is derived from a hash of its name, renaming a file can leave it on a brick that its new name no longer hashes to; rather than moving the data, Gluster leaves a small pointer at the new name's hashed location, which costs an extra lookup when the file is accessed.
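
The hash-based lookup and the rename pointer can be modeled in a few lines of Python. This is a toy sketch under stated assumptions: `zlib.crc32` stands in for Gluster's own hash function, the hash space is split into equal slices (Gluster keeps per-directory layouts), and the `pointers` table is a hypothetical model of the pointer file Gluster leaves behind after a rename.

```python
# Toy model of Gluster-style placement without a metadata server: each brick
# owns a slice of a 32-bit hash space, and a file lives on the brick whose
# slice contains the hash of its name. Real GlusterFS uses its own hash
# function and per-directory layouts; zlib.crc32 and equal-sized slices are
# stand-ins. The `pointers` table models the small pointer file Gluster
# leaves when a renamed file's new name hashes to a different brick.
import zlib

HASH_SPACE = 2**32


def brick_ranges(bricks: list[str]) -> list[tuple[int, int, str]]:
    """Split the hash space into contiguous (start, end, brick) slices."""
    step = HASH_SPACE // len(bricks)
    ranges = []
    for i, brick in enumerate(bricks):
        start = i * step
        end = HASH_SPACE - 1 if i == len(bricks) - 1 else start + step - 1
        ranges.append((start, end, brick))
    return ranges


def hashed_brick(filename: str, ranges: list[tuple[int, int, str]]) -> str:
    """Return the brick whose slice contains the hash of `filename`."""
    h = zlib.crc32(filename.encode("utf-8"))  # unsigned 32-bit value
    for start, end, brick in ranges:
        if start <= h <= end:
            return brick
    raise RuntimeError("hash ranges do not cover the whole hash space")


class Volume:
    def __init__(self, bricks: list[str]):
        self.ranges = brick_ranges(bricks)
        self.data = {b: {} for b in bricks}      # brick -> {filename: content}
        self.pointers = {b: {} for b in bricks}  # brick -> {filename: brick with the data}

    def write(self, name: str, content: bytes) -> None:
        self.data[hashed_brick(name, self.ranges)][name] = content

    def _locate(self, name: str) -> str:
        """Find the brick holding `name`, following a pointer if needed."""
        brick = hashed_brick(name, self.ranges)
        if name in self.data[brick]:
            return brick
        return self.pointers[brick][name]  # KeyError if the file does not exist

    def read(self, name: str) -> bytes:
        return self.data[self._locate(name)][name]

    def rename(self, old: str, new: str) -> None:
        src = self._locate(old)
        self.pointers[hashed_brick(old, self.ranges)].pop(old, None)  # drop stale pointer
        self.data[src][new] = self.data[src].pop(old)  # the data itself does not move
        dst = hashed_brick(new, self.ranges)
        if dst != src:
            self.pointers[dst][new] = src  # extra hop for lookups by the new name


if __name__ == "__main__":
    vol = Volume(["brick1", "brick2", "brick3", "brick4"])
    vol.write("a.txt", b"hello")
    vol.rename("a.txt", "b.txt")
    print(vol.read("b.txt"))  # b'hello'
```

The design trade-off shows up in `read`: a lookup normally costs one hash computation, and only renamed files whose new name lands on a different brick pay for a second hop.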

[Figure: GDash, the GlusterFS dashboard]

Complications

Ceph relies on monitor nodes, deployed in an odd number throughout your system, to form a quorum (a majority of monitors in agreement) and reduce the likelihood of “split-brain” and resulting data loss.
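
The case for an odd number comes down to majority arithmetic. The Python sketch below assumes quorum means a strict majority, which is how Ceph's Paxos-based monitors decide; it shows that adding a fourth monitor to three raises the quorum size without letting the cluster survive any more failures. An even count can also split evenly (for example 2 and 2), leaving neither side with a majority.

```python
# Majority-quorum arithmetic: a group of n monitors can keep working only if
# a strict majority of them can still talk to each other.


def quorum_size(monitors: int) -> int:
    """Smallest strict majority of `monitors`."""
    return monitors // 2 + 1


def tolerated_failures(monitors: int) -> int:
    """How many monitors can fail while a majority remains."""
    return monitors - quorum_size(monitors)


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} monitors: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
    # 3 monitors tolerate 1 failure, 4 monitors still tolerate only 1, 5 tolerate 2
```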

Gluster runs at a default block size twice that of Ceph: 128 KB for Gluster and 64 KB for Ceph. Gluster's backers claim that the larger block size makes for faster processing, but with a little work, you can increase Ceph's block size and improve its performance as well.
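
Why would a larger block size help? A rough Python calculation makes the effect visible: doubling the block size halves the number of requests needed to move the same file. This assumes one request per block, treats KB as 1,024 bytes, and ignores caching, read-ahead, and parallel I/O, so it is only a back-of-envelope comparison.

```python
# Back-of-envelope arithmetic for the block sizes quoted above, assuming one
# request per block and ignoring caching, read-ahead, and parallel I/O.
KIB = 1024
GIB = 1024**3


def requests_needed(file_size: int, block_size: int) -> int:
    """Ceiling division: a partial final block still costs a full request."""
    return -(-file_size // block_size)


if __name__ == "__main__":
    for label, block_size in (("64 KiB blocks", 64 * KIB), ("128 KiB blocks", 128 * KIB)):
        print(f"1 GiB file with {label}: {requests_needed(GIB, block_size)} requests")
    # 64 KiB -> 16384 requests; 128 KiB -> 8192 requests
```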

Both of these programs are open-source, but companies can purchase third-party management solutions that connect to Ceph and Gluster. The most popular providers and tools for each are:

Ceph: Inktank, Red Hat, Decapod, Intel

Gluster: Red Hat

Conclusions

Deciding between Ceph and Gluster depends on numerous factors, but either can provide extendable and stable storage for your data. Companies looking for easily accessible storage that can quickly scale up or down may find that Ceph works well. Those who plan on storing massive amounts of data without too much movement should probably look into Gluster.