Cassandra: The Definitive Guide

Language: English

Pages: 370

ISBN: 1491933666

Format: PDF / Kindle (mobi) / ePub

Imagine what you could do if scalability wasn't a problem. With this hands-on guide, you’ll learn how the Cassandra database management system handles hundreds of terabytes of data while remaining highly available across multiple data centers. This expanded second edition—updated for Cassandra 3.0—provides the technical details and practical examples you need to put this database to work in a production environment.

Authors Jeff Carpenter and Eben Hewitt demonstrate the advantages of Cassandra’s non-relational design, with special attention to data modeling. If you’re a developer, DBA, or application architect looking to solve a database scaling issue or future-proof your application, this guide helps you harness Cassandra’s speed and flexibility.

Understand Cassandra’s distributed and decentralized structure
Use the Cassandra Query Language (CQL) and cqlsh—the CQL shell
Create a working data model and compare it with an equivalent relational model
Develop sample applications using client drivers for languages including Java, Python, and Node.js
Explore cluster topology and learn how nodes exchange data
Maintain a high level of performance in your cluster
Deploy Cassandra on site, in the Cloud, or with Docker
Integrate Cassandra with Spark, Hadoop, Elasticsearch, Solr, and Lucene

You can read the 1986 paper “The Case for Shared Nothing” online at http://db.cs.berkeley.edu/papers/hpts85-nothing.pdf. It’s only a few pages. If you take a look, you’ll see that many of the features of shared-nothing distributed data architecture, such as ease of high availability and the ability to scale to a very large number of machines, are the very things that Cassandra excels at. MongoDB also provides auto-sharding capabilities to manage failover and node balancing. That many

was last updated), and a column family, which is a container for rows that have similar, but not identical, column sets. In relational databases, we’re used to storing column names as strings only—that’s all we’re allowed. But in Cassandra, we don’t have that limitation. Both row keys and column names can be strings, like relational column names, but they can also be long integers, UUIDs, or any kind of byte array. So there’s some variety to how your key names can be set. This reveals another

rows—that’s the skinny model. Wide rows typically contain automatically generated names (like UUIDs or timestamps) and are used to store lists of things. Consider a monitoring application as an example: you might have a row that represents a time slice of an hour by using a modified timestamp as a row key, and then store columns representing IP addresses that accessed your application within that interval. You can then create a new row key after an hour elapses. Skinny rows are
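The hourly time-slice idea in the excerpt above can be sketched in plain Java. This is only an in-memory illustration of the wide-row shape, not Cassandra client code: the class name WideRowSketch, its methods, and the choice to store a last-seen timestamp as the column value are all assumptions made for the example.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory model of a wide row; it does not talk to Cassandra.
public class WideRowSketch {
    // Row key (start of the hour, epoch seconds) -> columns sorted by name.
    // Column name = IP address, column value = last access time in millis (assumed).
    private final Map<Long, SortedMap<String, Long>> rows = new TreeMap<>();

    // "Modified timestamp" row key: the access time truncated to its hour.
    static long hourBucket(Instant when) {
        return when.truncatedTo(ChronoUnit.HOURS).getEpochSecond();
    }

    // Adds (or overwrites) the column for this IP in the row for that hour.
    void recordAccess(String ip, Instant when) {
        rows.computeIfAbsent(hourBucket(when), key -> new TreeMap<>())
            .put(ip, when.toEpochMilli());
    }

    // Number of distinct IP columns in the row covering the given time.
    int columnCount(Instant when) {
        SortedMap<String, Long> row = rows.get(hourBucket(when));
        return row == null ? 0 : row.size();
    }

    // Number of hourly rows created so far.
    int rowCount() {
        return rows.size();
    }

    public static void main(String[] args) {
        WideRowSketch sketch = new WideRowSketch();
        Instant start = Instant.parse("2016-07-01T10:15:00Z");
        sketch.recordAccess("10.0.0.1", start);
        sketch.recordAccess("10.0.0.2", start.plusSeconds(60));
        // One hour later the access falls into a new row key.
        sketch.recordAccess("10.0.0.3", start.plusSeconds(3600));
        System.out.println("rows: " + sketch.rowCount());
        System.out.println("columns in first hour: " + sketch.columnCount(start));
    }
}
```

Each hour produces a new row key, and the row grows wider as more distinct IP addresses appear within that hour, which is the "list of things" pattern the excerpt describes.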

and ultimately makes one question whether using a relational database is the best approach in these circumstances. A second reason that relational databases get denormalized on purpose is a business document structure that requires retention. That is, you have an enclosing table that refers to a lot of external tables whose data could change over time, but you need to preserve the enclosing document as a snapshot in history. The common example here is with invoices. You already have Customer and

UnavailableException, TimedOutException, TException, NotFoundException, InterruptedException {
    Connector conn = new Connector();
    Cassandra.Client client = conn.connect();

    System.out.println("Defining new keyspace.");
    KsDef ksdef = new KsDef();
    ksdef.name = "ProgKS";
    ksdef.replication_factor = 1;
    ksdef.strategy_class = "org.apache.cassandra.locator.RackUnawareStrategy";

    List cfdefs = new ArrayList();
    CfDef cfdef1 = new CfDef();
    cfdef1.name = "ProgCF1";
    cfdef1.keyspace = ksdef.name;
