Cassandra 0.7 Is Worth Looking Forward To



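The Cassandra wiki has long described some of the features planned for 0.7, and several of them are quite attractive. On August 13, Cassandra 0.7 beta1 was finally released and is available for download.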


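  1. Keyspace and Column Family definitions can be added and modified online; there is no longer any need to stop the cluster and edit configuration files.
  2. Secondary indexes are supported: you can build an index on a column and query by column value through the get_indexed_slices interface.
  3. Truncating a column family is supported.
  4. replica_placement_strategy and replication_factor can be set per keyspace.
  5. The row cache improves read performance by a factor of 8. In tests of earlier versions, Cassandra's write performance was impressive, while its read performance was disappointing.
  6. Hadoop-format output is supported, which makes it easier for a data warehouse to extract data from Cassandra.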
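
To make item 2 more concrete, here is a minimal in-memory sketch of what a secondary index does: it keeps a mapping from column value to row keys, so a query by column value does not need to scan every row. This is not the Cassandra client API. The class `ToyColumnFamily` and its methods are hypothetical names invented for illustration; against a real cluster the query would go through get_indexed_slices.

```python
from collections import defaultdict


class ToyColumnFamily:
    """Hypothetical in-memory model of a column family with secondary indexes."""

    def __init__(self, indexed_columns):
        self.rows = {}  # row key -> {column name: value}
        # One index per indexed column: value -> set of row keys.
        self.indexes = {name: defaultdict(set) for name in indexed_columns}

    def insert(self, key, columns):
        old = self.rows.get(key, {})
        merged = {**old, **columns}
        for name, index in self.indexes.items():
            # Drop the stale index entry before adding the new one.
            if name in old:
                index[old[name]].discard(key)
            if name in merged:
                index[merged[name]].add(key)
        self.rows[key] = merged

    def get_by_value(self, column, value):
        """Return the row keys whose indexed column equals value, sorted."""
        return sorted(self.indexes[column].get(value, ()))


users = ToyColumnFamily(indexed_columns=["state"])
users.insert("u1", {"name": "alice", "state": "TX"})
users.insert("u2", {"name": "bob", "state": "CA"})
users.insert("u3", {"name": "carol", "state": "TX"})
texans = users.get_by_value("state", "TX")
```

The point of the sketch is the bookkeeping on write: every insert has to update the index as well as the row, which is the price paid for cheap lookups by value.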



Features
- Row keys are now bytes: keys stored by versions prior to 0.7.0 will be
returned as UTF-8 encoded bytes. OrderPreservingPartitioner and
CollatingOrderPreservingPartitioner continue to expect that keys contain
UTF-8 encoded strings, but RandomPartitioner no longer expects strings.
- A new ByteOrderedPartitioner supports bytes keys with arbitrary content,
and orders keys by their byte value.
- Truncate thrift method allows clearing an entire ColumnFamily at once
- DatacenterShardStrategy is ready for use, enabling
ConsistencyLevel.DCQUORUM and DCQUORUMSYNC. See comments in
- row size limit increased from 2GB to 2 billion columns
- Hadoop OutputFormat support
- Streaming data for repair or node movement no longer requires
anticompaction step first
- keyspace is per-connection in the thrift API instead of per-call
- optional round-robin scheduling between keyspaces for multitenant
- dynamic endpoint snitch mitigates the impact of impaired nodes
- significantly faster reads from row cache
- introduced IntegerType that is both faster than LongType and
allows integers of both less and more bits than Long’s 64
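
The IntegerType item is easier to picture with a small example. The sketch below shows one way an integer type can hold values of fewer or more than 64 bits: a variable-length, big-endian two's complement encoding that uses only as many bytes as the value needs. The release notes do not spell out the on-disk layout Cassandra uses, so treat the encoding here as an assumption; `encode_integer`, `decode_integer` and `compare_encoded` are hypothetical helper names.

```python
def encode_integer(value: int) -> bytes:
    """Encode value as minimal-length big-endian two's complement (assumed layout)."""
    # Magnitude bits, excluding the sign bit; ~value handles negatives.
    bits = value.bit_length() if value >= 0 else (~value).bit_length()
    length = bits // 8 + 1  # always leaves room for the sign bit
    return value.to_bytes(length, "big", signed=True)


def decode_integer(data: bytes) -> int:
    """Inverse of encode_integer."""
    return int.from_bytes(data, "big", signed=True)


def compare_encoded(a: bytes, b: bytes) -> int:
    """Three-way numeric comparison of two encoded integers."""
    x, y = decode_integer(a), decode_integer(b)
    return (x > y) - (x < y)


small = encode_integer(1)        # fits in one byte instead of eight
large = encode_integer(2 ** 64)  # does not fit in a 64-bit long at all
```

A fixed 64-bit type would spend eight bytes on `small` and could not represent `large`, which is the "both less and more bits" property the note describes.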

Configuration
- Configuration file renamed to cassandra.yaml and to
- Added ‘bin/config-converter’ to convert existing storage-conf.xml or
cassandra.xml files to a cassandra.yaml file. When executed, it will
create a cassandra.yaml file in any directory containing a matching
xml file.
- The ThriftAddress and ThriftPort directives have been renamed to
RPCAddress and RPCPort respectively.
- The keyspaces defined in cassandra.yaml are ignored on startup as a
result of CASSANDRA-44. A JMX method has been exposed in the
StorageServiceMBean to force a schema load from cassandra.yaml. It
is a one-shot affair though and you should conduct it on a seed node
before other nodes. Subsequent restarts will load the schema from the
system table and attempts to load the schema from YAML will be ignored.
You should only have to do this for one node since new nodes will receive
schema updates on startup from the seed node you updated manually.
- EndPointSnitch was renamed to RackInferringSnitch. A new SimpleSnitch
has been added.
- RowWarningThresholdInMB replaced with in_memory_compaction_limit_in_mb
- GCGraceSeconds is now per-ColumnFamily instead of global
- Configuration of DatacenterShardStrategy is now a part of the keyspace
definition using the strategy_options attribute.
The file is no longer used.


- StreamingService moved from o.a.c.streaming to o.a.c.service
- GMFD renamed to GOSSIP_STAGE
- {Min,Mean,Max}RowCompactedSize renamed to {Min,Mean,Max}RowSize
since it no longer has to wait until compaction to be computed

Thrift API
- Row keys are now ‘bytes’: see the Features list.
- The return type for login() is now AccessLevel.
- The get_string_property() method has been removed.
- The get_string_list_property() method has been removed.

- If extending AbstractType, make sure you follow the singleton pattern
followed by Cassandra core AbstractType extensions.
e.g. BytesType has a variable called ‘instance’ and an empty constructor
with default access

* sstable versioning (CASSANDRA-389)
* switched to slf4j logging (CASSANDRA-625)
* access levels for authentication/authorization (CASSANDRA-900)
* add ReadRepairChance to CF definition (CASSANDRA-930)
* fix heisenbug in system tests, especially common on OS X (CASSANDRA-944)
* convert to byte[] keys internally and all public APIs (CASSANDRA-767)
* ability to alter schema definitions on a live cluster (CASSANDRA-44)
* renamed configuration file to cassandra.xml, and to, which must now be loaded from
the classpath (which is how our scripts in bin/ have always done it)
* change get_count to require a SlicePredicate. create multi_get_count
* re-organized endpointsnitch implementations and added SimpleSnitch
* Added preload_row_cache option (CASSANDRA-946)
* add CRC to commitlog header (CASSANDRA-999)
* removed multiget thrift method (CASSANDRA-739)
* removed deprecated batch_insert and get_range_slice methods (CASSANDRA-1065)
* add truncate thrift method (CASSANDRA-531)
* http mini-interface using mx4j (CASSANDRA-1068)
* optimize away copy of sliced row on memtable read path (CASSANDRA-1046)
* replace constant-size 2GB mmaped segments and special casing for index
entries spanning segment boundaries, with SegmentedFile that computes
segments that always contain entire entries/rows (CASSANDRA-1117)
* avoid reading large rows into memory during compaction (CASSANDRA-16)
* added hadoop OutputFormat (CASSANDRA-1101)
* efficient Streaming (no more anticompaction) (CASSANDRA-579)
* split commitlog header into separate file and add size checksum to
mutations (CASSANDRA-1179)
* avoid allocating a new byte[] for each mutation on replay (CASSANDRA-1219)
* revise HH schema to be per-endpoint (CASSANDRA-1142)
* add joining/leaving status to nodetool ring (CASSANDRA-1115)
* allow multiple repair sessions per node (CASSANDRA-1190)
* add dynamic endpoint snitch (CASSANDRA-981)
* optimize away MessagingService for local range queries (CASSANDRA-1261)
* make framed transport the default so malformed requests can’t OOM the
server (CASSANDRA-475)
* significantly faster reads from row cache (CASSANDRA-1267)
* take advantage of row cache during range queries (CASSANDRA-1302)
* make GCGraceSeconds a per-ColumnFamily value (CASSANDRA-1276)
* keep persistent row size and column count statistics (CASSANDRA-1155)
* add IntegerType (CASSANDRA-1282)
* page within a single row during hinted handoff (CASSANDRA-1327)
* push DatacenterShardStrategy configuration into keyspace definition,
eliminating (CASSANDRA-1066)
* optimize forward slices starting with '' and single-index-block name
queries by skipping the column index (CASSANDRA-1338)
* streaming refactor (CASSANDRA-1189)
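
Among the changes above, the switch to framed transport (CASSANDRA-475) deserves a short illustration of why it protects the server. With framing, every message is preceded by a 4-byte length, so the receiver can reject an absurd length before allocating a buffer for it. The sketch below shows only that idea. It is not Thrift's implementation, and `MAX_FRAME_BYTES`, `FrameTooLarge`, `encode_frame` and `read_frame` are hypothetical names with an arbitrarily chosen limit.

```python
import io
import struct

MAX_FRAME_BYTES = 16 * 1024 * 1024  # hypothetical limit chosen for this sketch


class FrameTooLarge(ValueError):
    """Raised when a frame header announces more bytes than we accept."""


def encode_frame(payload: bytes) -> bytes:
    """Prefix the payload with its length as a 4-byte big-endian integer."""
    return struct.pack(">I", len(payload)) + payload


def read_frame(stream, max_bytes: int = MAX_FRAME_BYTES) -> bytes:
    """Read one frame, refusing oversized lengths before reading the body."""
    header = stream.read(4)
    if len(header) != 4:
        raise EOFError("stream ended inside the frame header")
    (length,) = struct.unpack(">I", header)
    if length > max_bytes:
        # Reject here, before any buffer of that size is allocated.
        raise FrameTooLarge(f"frame of {length} bytes exceeds {max_bytes}")
    payload = stream.read(length)
    if len(payload) != length:
        raise EOFError("stream ended inside the frame body")
    return payload


wire = io.BytesIO(encode_frame(b"hello") + encode_frame(b"world"))
first = read_frame(wire)
second = read_frame(wire)
```

Without a length prefix and a limit, a garbage request can be misread as a message announcing a huge size, and the server may try to allocate memory for it.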


