Scio v0.13.0

gcs-connector now explicitly required

Previously Scio shipped with com.google.cloud.bigdataoss:gcs-connector as part of scio-parquet. This dependency is now removed, so gcs-connector must be explicitly enabled if using parquet on GCS:

val bigdataossVersion = "2.2.6"

libraryDependencies ++= Seq(
  "com.google.cloud.bigdataoss" % "gcs-connector" % s"hadoop2-$bigdataossVersion"
)

Removed scio-elasticsearch6

Please migrate to scio-elasticsearch8.

scio-elasticsearch7 migrated to java client

saveAsElasticsearch now requires a transform function returning co.elastic.clients.elasticsearch.core.bulk.BulkOperation instead of org.elasticsearch.action.DocWriteRequest.

New File based ScioIO parameters

File-based IOs now consistently have a suffix parameter. In cases where ReadParam was Unit, then a new param will be required. This is the case for example with AvroIO and GenericRecordIO:

- sc.read(GenericRecordIO(path, schema))
+ sc.read(GenericRecordIO(path, schema))(AvroIO.ReadParam(suffix))
- sc.read(SpecificRecordIO[T](path))
+ sc.read(SpecificRecordIO[T](path))(AvroIO.ReadParam(suffix))

Kryo Coders nondeterministic

Kryo coders in Scio have long been marked as deterministic but users were cautioned to not use them in cases where determinism is important (e.g. with distinct or to encode keys in keyed operations) and when the Kryo coders were not explicitly known to be deterministic. Users who did not understand or follow these instructions could silently produce corrupt data or incomplete results.

Kryo coders are now marked as nondeterministic in all cases and an exception will be thrown if used in keyed operations.

Changed skewedJoin API

Removes some variants of skewedJoin APIs with Long threshold parameters. Use the variants with a HotKeyMethod parameter instead, providing HotKeyMethod.Threshold(myThresold) as its value.

Tensorflow unused predict type parameter

The Tensorflow predict and predictWithSigDef methods had an unused type parameter that is now removed.

- elements.predict[B, D]("gs://model-path", fetchOpts, options)(toTensors)(fromTensors)
+ elements.predict[B]("gs://model-path", fetchOpts, options)(toTensors)(fromTensors)
- elements.predictWithSigDef[B, D]("gs://model-path", options)(toTensors)(fromTensors _)
+ elements.predictWithSigDef[B]("gs://model-path", options)(toTensors)(fromTensors _)