Bigtable

First, please read Google’s official documentation.

Bigtable example

This depends on APIs from scio-bigtable and imports from com.spotify.scio.bigtable._.

See the example here.

Common issues

Size of the Bigtable cluster vs. the Dataflow cluster

As a general rule, when writing to Bigtable from Dataflow, use at most 3 CPUs per Bigtable node in total (i.e. the number of Bigtable nodes × 3 CPUs). Otherwise Bigtable will be overwhelmed and will throttle the writes (and the reads).
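To make the rule of thumb concrete, here is a small Scala sketch that turns a Bigtable node count into a cap on Dataflow workers. The object and method names (BigtableWriteBudget, maxWriterCpus, maxWorkers) are hypothetical, and the factor of 3 is only the rule of thumb above, not a hard Bigtable limit.

```scala
// Hypothetical helper: caps Dataflow workers so their combined vCPUs stay
// within the "3 CPUs per Bigtable node" rule of thumb.
object BigtableWriteBudget {
  // Rule of thumb from this page, not an enforced Bigtable limit.
  val CpusPerBigtableNode: Int = 3

  /** Total Dataflow vCPUs that should write to a cluster of `bigtableNodes` nodes. */
  def maxWriterCpus(bigtableNodes: Int): Int = {
    require(bigtableNodes > 0, "bigtableNodes must be positive")
    bigtableNodes * CpusPerBigtableNode
  }

  /**
   * Largest worker count whose combined vCPUs fit within the budget.
   * Never returns less than 1, because a job needs at least one worker even
   * when a single worker already exceeds the budget.
   */
  def maxWorkers(bigtableNodes: Int, cpusPerWorker: Int): Int = {
    require(cpusPerWorker > 0, "cpusPerWorker must be positive")
    math.max(1, maxWriterCpus(bigtableNodes) / cpusPerWorker)
  }
}
```

For example, a 10-node Bigtable cluster gives a budget of 30 vCPUs; with 4-vCPU workers that is at most 7 workers, which you could enforce with Dataflow's --maxNumWorkers pipeline option.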

Cell compression

Bigtable doesn’t compress cell values larger than 1 MB.
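If your cell values routinely exceed that size, one option (an assumption on our part, not something Scio or Bigtable does for you) is to compress them on the client before writing. The sketch below gzips values above a threshold and prepends a one-byte marker so readers know whether to decompress. The object name, the marker format and the 1024 * 1024 byte threshold are all hypothetical choices.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

// Hypothetical client-side codec: gzip large cell values, because Bigtable
// leaves values over 1 MB uncompressed.
object CellCompression {
  // 1 MB taken as 1024 * 1024 bytes for this sketch.
  val Threshold: Int = 1024 * 1024

  def gzip(bytes: Array[Byte]): Array[Byte] = {
    val bos = new ByteArrayOutputStream()
    val gz = new GZIPOutputStream(bos)
    try gz.write(bytes) finally gz.close() // close() flushes the gzip trailer
    bos.toByteArray
  }

  def gunzip(bytes: Array[Byte]): Array[Byte] = {
    val in = new GZIPInputStream(new ByteArrayInputStream(bytes))
    val out = new ByteArrayOutputStream()
    val buf = new Array[Byte](8192)
    try {
      var n = in.read(buf)
      while (n != -1) {
        out.write(buf, 0, n)
        n = in.read(buf)
      }
    } finally in.close()
    out.toByteArray
  }

  /** Marker byte 1 = gzip payload follows, 0 = raw payload follows (hypothetical format). */
  def encode(value: Array[Byte]): Array[Byte] =
    if (value.length > Threshold) Array[Byte](1) ++ gzip(value)
    else Array[Byte](0) ++ value

  /** Reverses `encode` by checking the marker byte. */
  def decode(stored: Array[Byte]): Array[Byte] = {
    val payload = stored.drop(1)
    if (stored(0) == 1) gunzip(payload) else payload
  }
}
```

Values at or below the threshold are stored raw behind the marker byte, so small cells stay cheap to read and Bigtable can still compress them itself.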

Jetty ALPN/NPN has not been properly configured

Check that your versions of grpc-netty, netty-handler, and netty-tcnative-boringssl-static are compatible.

BigtableIO

The BigtableIO included in the Dataflow SDK is not recommended for use. It is not written by the Bigtable team and is significantly less performant than the HBase Bigtable Dataflow connector. Please see the example above for the recommended API.

Key structure

Your row keys should not start with a common prefix; doing so concentrates traffic on specific Bigtable nodes and overloads them. For example, if a row is identified by a user ID and a date, do NOT use date,user-id as the key. Use user-id,date instead, or, even better, keep only the user ID in the key and store the date in the Bigtable cell version/timestamp. Read more about row key design over here.
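A minimal sketch of the key layouts from the example above, plus the timestamp alternative. The helper names and the '#' separator are hypothetical; the point is only which component comes first.

```scala
import java.time.{LocalDate, ZoneOffset}
import java.time.format.DateTimeFormatter

// Hypothetical row key helpers for a row identified by (user id, date).
object RowKeys {
  // '#' is an assumed separator; use any byte that cannot appear in a user id.
  private val Separator = "#"

  private def day(date: LocalDate): String =
    date.format(DateTimeFormatter.ISO_LOCAL_DATE)

  // Preferred: the user id leads, so a day's writes are spread over many
  // key ranges (assuming user ids are reasonably well distributed).
  def userThenDate(userId: String, date: LocalDate): String =
    userId + Separator + day(date)

  // Avoid: every write for the same day shares the date prefix and lands
  // in one contiguous key range, i.e. a hotspot.
  def dateThenUser(userId: String, date: LocalDate): String =
    day(date) + Separator + userId

  // Alternative: key on the user id alone and encode the date as the cell
  // timestamp. Bigtable timestamps are microseconds since the epoch; this is
  // derived from milliseconds, so it is always a multiple of 1000.
  def cellTimestampMicros(date: LocalDate): Long =
    date.atStartOfDay(ZoneOffset.UTC).toInstant.toEpochMilli * 1000L
}
```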

Performance

Read Google’s documentation on Bigtable performance.

Bigtable vs Datastore

If you need a replacement for Cassandra, Bigtable is probably the most straightforward option in GCP. See the Bigtable white paper; to quote it, think of Bigtable as:

a sparse, distributed, persistent multidimensional sorted map. The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes.
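To make the quoted data model concrete, the following toy Scala model nests sorted maps as row key -> column key -> timestamp -> bytes. It only illustrates the shape of the data and is not a Bigtable client; Strings stand in for byte-string keys, the "family:qualifier" column names are hypothetical, and timestamps are ordered newest-first to mirror how Bigtable returns cell versions.

```scala
import scala.collection.immutable.TreeMap

// Toy in-memory model of "a sparse, distributed, persistent multidimensional
// sorted map" (minus the distributed and persistent parts).
object SortedMapModel {
  type Versions = TreeMap[Long, Array[Byte]] // timestamp -> uninterpreted bytes
  type Columns = TreeMap[String, Versions]   // column key -> versions
  type Table = TreeMap[String, Columns]      // row key -> columns

  val emptyTable: Table = TreeMap.empty[String, Columns]

  // Reverse ordering keeps the newest timestamp first.
  private val emptyVersions: Versions =
    TreeMap.empty[Long, Array[Byte]](Ordering[Long].reverse)

  /** Returns a new table with `value` written at (row, column, ts). */
  def put(t: Table, row: String, column: String, ts: Long, value: Array[Byte]): Table = {
    val cols = t.getOrElse(row, TreeMap.empty[String, Versions])
    val versions = cols.getOrElse(column, emptyVersions)
    t.updated(row, cols.updated(column, versions.updated(ts, value)))
  }

  /** Newest value of a cell; rows and columns that were never written are simply absent (sparse). */
  def latest(t: Table, row: String, column: String): Option[Array[Byte]] =
    t.get(row).flatMap(_.get(column)).flatMap(_.headOption).map(_._2)
}
```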

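At the time of writing, Bigtable was replicated only within a single zone (Bigtable has since added replication across clusters in different zones and regions). Bigtable does not support multi-row transactions; that said, every operation on a single row is atomic.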

Think of Datastore as a distributed, persistent, fully managed key-value store with support for transactions. Datastore is replicated across multiple datacenters, which makes it theoretically more available than Bigtable (at the time of writing).

Read more about Bigtable here, and more about Datastore over here.