First, please read Google's official doc.
This depends on APIs from the scio-bigtable artifact and imports from the com.spotify.scio.bigtable package. Look at the example here.
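As a rough illustration of that API, here is a minimal write sketch, assuming a recent Scio version (0.8+, where `sc.run()` replaced `sc.close()`). The `project`/`instance`/`table` arguments and the `cf`/`count` column family and qualifier are placeholders:

```scala
import com.google.bigtable.v2.Mutation
import com.google.protobuf.ByteString
import com.spotify.scio.ContextAndArgs
import com.spotify.scio.bigtable._

object BigtableWriteExample {
  def main(cmdlineArgs: Array[String]): Unit = {
    val (sc, args) = ContextAndArgs(cmdlineArgs)

    // Build one SetCell mutation per record; "cf"/"count" are placeholder
    // family and qualifier names for this sketch.
    def setCell(value: String): Mutation =
      Mutation.newBuilder()
        .setSetCell(
          Mutation.SetCell.newBuilder()
            .setFamilyName("cf")
            .setColumnQualifier(ByteString.copyFromUtf8("count"))
            .setTimestampMicros(System.currentTimeMillis() * 1000)
            .setValue(ByteString.copyFromUtf8(value)))
        .build()

    sc.parallelize(Seq("user-1" -> "1", "user-2" -> "2"))
      .map { case (key, value) =>
        ByteString.copyFromUtf8(key) -> Iterable(setCell(value))
      }
      .saveAsBigtable(args("project"), args("instance"), args("table"))

    sc.run()
  }
}
```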
As a general note, when writing to Bigtable from Dataflow you should use at most (number of Bigtable nodes) × 3 worker CPUs. Otherwise Bigtable will be overwhelmed and throttle the writes (and the reads).
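For example, a hypothetical sizing helper for Dataflow's `--maxNumWorkers` option (the CPU count per worker depends on your machine type, e.g. 4 for n1-standard-4):

```scala
// Cap total Dataflow worker CPUs at 3x the Bigtable node count.
// This helper is just an illustration of the rule above.
def maxDataflowWorkers(bigtableNodes: Int, cpusPerWorker: Int): Int =
  math.max(1, (bigtableNodes * 3) / cpusPerWorker)

// e.g. 10 Bigtable nodes with n1-standard-4 workers:
// maxDataflowWorkers(10, 4) == 7, so pass --maxNumWorkers=7
```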
Bigtable doesn't compress cell values larger than 1 MiB.
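If you store values that large, one option is to compress them client-side before writing. A minimal sketch (the gzip choice is just illustrative, and your read path would need some way to know which values were compressed):

```scala
import java.io.ByteArrayOutputStream
import java.util.zip.GZIPOutputStream

// Compress values over 1 MiB ourselves, since Bigtable won't.
def maybeCompress(value: Array[Byte]): Array[Byte] =
  if (value.length <= 1 << 20) value
  else {
    val bos = new ByteArrayOutputStream()
    val gz  = new GZIPOutputStream(bos)
    gz.write(value)
    gz.close()
    bos.toByteArray
  }
```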
Check that your versions of grpc-netty and netty-tcnative-boringssl-static are compatible; the gRPC Java project publishes a compatibility table in its SECURITY.md.
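In an sbt build that pinning might look like the following; the version strings are placeholders, so take the real pairing from the compatibility table:

```scala
// build.sbt sketch: force the netty-tcnative version that matches your
// grpc-netty version (versions below are placeholders, not a recommendation).
dependencyOverrides ++= Seq(
  "io.grpc"  % "grpc-netty"                      % "<grpc-version>",
  "io.netty" % "netty-tcnative-boringssl-static" % "<tcnative-version>"
)
```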
The BigtableIO included in the Dataflow SDK is not recommended: it is not written by the Bigtable team and is significantly less performant than the HBase Bigtable Dataflow connector. See the example above for the recommended API.
Your row key should not have a common prefix shared across many rows; doing so would overload specific Bigtable nodes. For example, if your rows are identified by a date and a user id, do NOT use date,user-id; use user-id,date instead, or, even better, in the case of a date, use Bigtable's version/timestamp. Read more about row key design over here.
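To make the point concrete, a tiny sketch with hypothetical userId/date fields:

```scala
// Good: the high-cardinality user id leads, so writes spread evenly
// across the key space. (Field names and the '#' separator are made up.)
def rowKey(userId: String, date: String): String =
  s"$userId#$date"

// Anti-pattern: s"$date#$userId" puts today's date first, so all of
// today's writes hammer the same contiguous range of the key space.
```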
Read the Google doc.
If you need a replacement for Cassandra, Bigtable is probably the most straightforward one in GCP; see the Bigtable white paper. To quote the paper, think of Bigtable as:
a sparse, distributed, persistent multidimensional sorted map. The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes.
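That definition maps naturally onto nested sorted maps. As a conceptual model only (not an API; real keys and qualifiers are byte strings, simplified here so the sketch compiles):

```scala
import scala.collection.immutable.SortedMap

// (row key, column key, timestamp) -> uninterpreted bytes.
// Rows are sorted by key; within a cell, values are versioned by timestamp
// (Bigtable actually serves the newest version first).
type RowKey    = String                        // byte string in reality
type ColumnKey = (String, String)              // (family, qualifier)
type Cell      = SortedMap[Long, Array[Byte]]  // timestamp -> value
type Table     = SortedMap[RowKey, Map[ColumnKey, Cell]]
```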
Bigtable is replicated only within a single zone. Bigtable does not support transactions; that said, all operations are atomic at the row level.
Think of Datastore as a distributed, persistent, fully managed key-value store with support for transactions. Datastore is replicated across multiple datacenters, which makes it theoretically more available than Bigtable (as of today).
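To illustrate the transaction support, a minimal read-modify-write sketch using the google-cloud-datastore Java client from Scala (the Counter kind and count property are made up for this example):

```scala
import com.google.cloud.datastore.{DatastoreOptions, Entity}

object DatastoreTxnExample {
  def main(args: Array[String]): Unit = {
    val datastore = DatastoreOptions.getDefaultInstance.getService
    val key = datastore.newKeyFactory().setKind("Counter").newKey("page-views")

    val txn = datastore.newTransaction()
    try {
      // Read the current value (null if the entity doesn't exist yet).
      val current = Option(txn.get(key)).map(_.getLong("count")).getOrElse(0L)
      txn.put(Entity.newBuilder(key).set("count", current + 1).build())
      txn.commit() // the read-modify-write is atomic within the transaction
    } finally {
      if (txn.isActive) txn.rollback()
    }
  }
}
```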