Tuesday, 16 June 2015

Cassandra General Questions


  1. What do you mean by "seed" exactly?
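    The IP address of an existing node in the cluster, listed in a new node's configuration (the seed list in cassandra.yaml) before that node is introduced to the cluster. The new node contacts the seeds to bootstrap gossip and learn the cluster topology.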
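  2. Data replication can be defined at data-centre level as well as at keyspace level. Which one takes priority if data is stored in Cassandra with a keyspace replication factor of 3 and a data-centre replication factor of 6?
    There is no separate data-centre-level setting that could take priority: replication is defined per keyspace, and with NetworkTopologyStrategy the keyspace definition gives each data centre its own replication factor, for example:
    CREATE KEYSPACE "Excalibur" WITH REPLICATION =
    {'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2};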
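  3. Will one node hold several partitions of the same table?
    Yes. Each node owns one or more token ranges and stores every partition whose partition key hashes into those ranges.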
  4. How does a data file get stored in Cassandra? (I would like more details on this.) Assume the data file is 1 GB and the cluster spans 2 data centres with 10 nodes each.
    The file is not stored as a unit: its rows are spread across nodes by partition key. If the keyspace uses {'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 3}, each DC holds a full copy of all rows (and partitions), with 3 copies of each partition (and its rows) per DC, i.e. 6 copies in total.
  5. Related to Q2: so one row in a table is replicated to 5 nodes (3 replicas in dc1 and 2 replicas in dc2) for the keyspace "Excalibur". Is that correct?
    Yes
  6. A cluster contains multiple logical data centers, which contain multiple racks (or availability zones), which contain nodes.
    Occasionally, very large NUMA machines host multiple Cassandra JVMs (nodes).
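  7. Can we identify the nodes where the replicas of a table are stored?
    Yes, per partition key: nodetool getendpoints <keyspace> <table> <key> lists the nodes holding replicas of that partition, and nodetool getsstables <keyspace> <table> <key> lists the SSTables on the local node that contain it.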
  8. Have you seen this URL, and is it a reliable tool for understanding how RF affects availability and consistency? http://www.ecyrd.com/cassandracalculator/
  9. How is a split-brain scenario handled with 2 DCs in different regions (e.g. Americas vs EU vs Asia)?
  10. Is anything like an observer/arbiter, i.e. a cluster member that holds no data and is used only to avoid split-brain, available in Cassandra? MongoDB, for example, has arbiters.
    Cassandra does not use arbiters. It can actually operate in many split-brain situations. You have the option to leverage this (and test it if critical) or use application
  11. Was the Netflix setup on AWS? A recent DC crash did not impact Netflix.
    Cassandra provided the required replication semantics, but just as importantly, Netflix used (and still uses) Chaos Monkey to kill Cassandra and other server nodes at random. This let them harden their devops programming and allowed Cassandra to be hardened as well. Then, when the day came, it just worked: non-stop, no loss.
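  12. What is a partition, and can a partition key be associated with multiple rows?
    A partition is the set of rows that share the same partition key; they are stored together on the same replicas. And yes, a partition can hold multiple rows. This is new since C* 1.2.5 and CQL3: just add clustering columns to your primary key definition.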
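  13. Is there a way to configure the fsync setting for writes?
    Yes. The commitlog_sync setting in cassandra.yaml controls when the commit log is fsynced: periodic (every commitlog_sync_period_in_ms) or batch (writes are acknowledged only after the fsync). See also the DataStax Operations class and the DataStax documentation on configuring hardware and storage.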
  14. Good starting point for heap sizing: MAX_HEAP_SIZE = 1/4 of system memory.
    HEAP_NEWSIZE = 1/4 of MAX_HEAP_SIZE.
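Question 3 can be illustrated with a toy token ring. This is a minimal sketch under simplifying assumptions, not Cassandra's implementation: tokens come from MD5 (the older RandomPartitioner was MD5-based; newer versions default to Murmur3Partitioner), the ring has one token per node, only the primary replica is computed, and the node names and the `token`/`owner` helpers are hypothetical.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # toy token space, an arbitrary choice for this sketch


def token(partition_key: str) -> int:
    """Map a partition key to a token with MD5 (simplified, not Cassandra's exact function)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def owner(ring: list[tuple[int, str]], partition_key: str) -> str:
    """Primary replica: first node whose token is >= the key's token, wrapping around."""
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token(partition_key))
    return ring[i % len(ring)][1]


# Hypothetical 4-node ring with evenly spaced tokens, already sorted by token.
nodes = ["node1", "node2", "node3", "node4"]
ring = [((i + 1) * RING_SIZE // len(nodes) - 1, name) for i, name in enumerate(nodes)]

# Place 20 partitions of one table; a node typically ends up holding several of them.
placement: dict[str, list[str]] = {name: [] for name in nodes}
for user_id in range(20):
    key = f"user{user_id}"
    placement[owner(ring, key)].append(key)

for name, keys in placement.items():
    print(name, len(keys), "partitions")
```

Because each node owns a whole range of tokens, every partition key that hashes into that range lands on the same node, so one node holds many partitions of the same table.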
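The arithmetic behind question 4 can be sketched as follows, assuming the 1 GB is raw row data, partitions are spread evenly across nodes, and compression, indexes, compaction headroom and other overhead are ignored. `per_node_gb` is a hypothetical helper for this back-of-the-envelope estimate.

```python
def per_node_gb(raw_gb: float, rf_per_dc: dict[str, int],
                nodes_per_dc: dict[str, int]) -> dict[str, float]:
    """Approximate data held by each node in each DC, assuming even distribution."""
    result = {}
    for dc, rf in rf_per_dc.items():
        # Each DC stores rf copies of every partition, spread over that DC's nodes.
        result[dc] = raw_gb * rf / nodes_per_dc[dc]
    return result


RF = {"dc1": 3, "dc2": 3}       # replication map from question 4
NODES = {"dc1": 10, "dc2": 10}  # 10 nodes per data centre
RAW_GB = 1.0

estimate = per_node_gb(RAW_GB, RF, NODES)
total_gb = RAW_GB * sum(RF.values())  # copies stored across the whole cluster

print(estimate)  # {'dc1': 0.3, 'dc2': 0.3}
print(total_gb)  # 6.0
```

With RF 3 in each DC and 10 nodes per DC, each node holds roughly 0.3 GB and the cluster stores about 6 GB in total.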
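Questions 5, 8 and 9 come down to replica counts and quorum sizes. The sketch below uses the usual majority rule, QUORUM = (sum of replication factors across all DCs) // 2 + 1 and LOCAL_QUORUM = (replication factor of the local DC) // 2 + 1, applied to the "Excalibur" map from question 2. The helper names are hypothetical, and the sketch assumes every node inside each DC is up while the DCs are cut off from each other.

```python
def total_replicas(rf_per_dc: dict[str, int]) -> int:
    """Copies of each row across the whole cluster."""
    return sum(rf_per_dc.values())


def quorum(rf_per_dc: dict[str, int]) -> int:
    """QUORUM: a majority of all replicas in all DCs."""
    return total_replicas(rf_per_dc) // 2 + 1


def local_quorum(rf_per_dc: dict[str, int], dc: str) -> int:
    """LOCAL_QUORUM: a majority of the replicas in one DC."""
    return rf_per_dc[dc] // 2 + 1


def levels_when_isolated(rf_per_dc: dict[str, int], dc: str) -> list[str]:
    """Levels a DC can still satisfy if cut off from the other DCs (all local nodes up)."""
    reachable = rf_per_dc[dc]  # only this DC's replicas can be contacted
    levels = []
    if reachable >= 1:
        levels.append("ONE")
    if reachable >= local_quorum(rf_per_dc, dc):
        levels.append("LOCAL_QUORUM")
    if reachable >= quorum(rf_per_dc):
        levels.append("QUORUM")
    return levels


excalibur = {"dc1": 3, "dc2": 2}  # replication map from question 2
print(total_replicas(excalibur))  # 5
print(quorum(excalibur))  # 3
for dc in excalibur:
    print(dc, levels_when_isolated(excalibur, dc))
```

With this map, an isolated dc1 can still satisfy QUORUM (3 of 5 replicas) but an isolated dc2 cannot, while LOCAL_QUORUM keeps working on both sides; the replicas are reconciled after the link returns (for example by hinted handoff and repair).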
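Question 12 can be pictured with a toy in-memory model (not how Cassandra lays data out on disk): rows are grouped by partition key, and within a partition they are kept in clustering-column order. The `readings` table, its columns and the `read_partition` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical table, conceptually:
#   CREATE TABLE readings (sensor_id text, reading_time int, value double,
#                          PRIMARY KEY ((sensor_id), reading_time));
# sensor_id is the partition key; reading_time is a clustering column.
inserts = [
    {"sensor_id": "s1", "reading_time": 3, "value": 20.5},
    {"sensor_id": "s2", "reading_time": 1, "value": 18.0},
    {"sensor_id": "s1", "reading_time": 1, "value": 19.8},
    {"sensor_id": "s1", "reading_time": 2, "value": 20.1},
    {"sensor_id": "s1", "reading_time": 2, "value": 20.3},  # same primary key: overwrites
]

# partition key -> {clustering key -> row}
partitions: defaultdict[str, dict[int, dict]] = defaultdict(dict)
for row in inserts:
    # In this model the later insert wins; Cassandra resolves by write timestamp.
    partitions[row["sensor_id"]][row["reading_time"]] = row


def read_partition(sensor_id: str) -> list[dict]:
    """All rows of one partition, ordered by the clustering column."""
    rows = partitions.get(sensor_id, {})
    return [rows[t] for t in sorted(rows)]


for row in read_partition("s1"):
    print(row)
```

Partition "s1" ends up holding three rows, one per distinct reading_time, which is exactly what the clustering column in the primary key makes possible.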
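The heap rule of thumb in question 14 is plain arithmetic. This sketch only applies the two ratios stated above (`heap_sizes_mb` is a hypothetical helper); the stock cassandra-env.sh calculates its own defaults and caps, so treat the result as a starting point to tune.

```python
def heap_sizes_mb(system_memory_mb: int) -> tuple[int, int]:
    """Rule of thumb from the text: heap = 1/4 of RAM, new generation = 1/4 of heap."""
    max_heap = system_memory_mb // 4
    heap_new = max_heap // 4
    return max_heap, heap_new


max_heap, heap_new = heap_sizes_mb(32 * 1024)  # e.g. a 32 GB machine
print(f"MAX_HEAP_SIZE={max_heap}M")  # MAX_HEAP_SIZE=8192M
print(f"HEAP_NEWSIZE={heap_new}M")  # HEAP_NEWSIZE=2048M
```

For a 32 GB machine this gives MAX_HEAP_SIZE=8192M and HEAP_NEWSIZE=2048M.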


References


  1. DSE vs DSC comparison: http://www.datastax.com/download/dse-vs-dsc
    DataStax Community (DSC): http://planetcassandra.org/cassandra/
    DataStax Enterprise (DSE): http://www.datastax.com/download
    Apache Cassandra open-source version (OSS): http://cassandra.apache.org/download/
  2. Replication calculator (single data center): http://www.ecyrd.com/cassandracalculator/
