Scio v0.13.0
gcs-connector now explicitly required
Previously, Scio shipped com.google.cloud.bigdataoss:gcs-connector as part of scio-parquet. This dependency has been removed, so gcs-connector must now be added explicitly when using Parquet on GCS:
val bigdataossVersion = "2.2.6"
libraryDependencies ++= Seq(
"com.google.cloud.bigdataoss" % "gcs-connector" % s"hadoop2-$bigdataossVersion"
)
Removed scio-elasticsearch6
Please migrate to scio-elasticsearch8.
scio-elasticsearch7 migrated to Java client
saveAsElasticsearch now requires a transform function returning co.elastic.clients.elasticsearch.core.bulk.BulkOperation instead of org.elasticsearch.action.DocWriteRequest.
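As a rough illustration of the new transform function, the sketch below builds a BulkOperation for a single element with the Elasticsearch Java client builders. The Doc case class, the EsOps object, and the index name "my-index" are hypothetical and not part of Scio; the exact shape that saveAsElasticsearch expects from the function (for example, whether operations are wrapped in a collection) should be checked against the Scio version in use.

```scala
import co.elastic.clients.elasticsearch.core.bulk.{BulkOperation, IndexOperation}

// Hypothetical document type, used only for illustration.
final case class Doc(id: String, body: String)

object EsOps {
  // Builds an index operation for one element and wraps it in a BulkOperation.
  def toBulkOperation(doc: Doc): BulkOperation = {
    val indexOp: IndexOperation[Doc] =
      new IndexOperation.Builder[Doc]()
        .index("my-index") // hypothetical index name
        .id(doc.id)
        .document(doc)
        .build()

    new BulkOperation.Builder().index(indexOp).build()
  }
}
```

A function like EsOps.toBulkOperation would take the place of the old DocWriteRequest-returning function passed to saveAsElasticsearch.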
New file-based ScioIO parameters
File-based IOs now consistently have a suffix parameter. Where ReadParam was previously Unit, a new parameter is now required. This is the case, for example, with AvroIO and GenericRecordIO:
- sc.read(GenericRecordIO(path, schema))
+ sc.read(GenericRecordIO(path, schema))(AvroIO.ReadParam(suffix))
- sc.read(SpecificRecordIO[T](path))
+ sc.read(SpecificRecordIO[T](path))(AvroIO.ReadParam(suffix))
Kryo Coders nondeterministic
Kryo coders in Scio have long been marked as deterministic, but users were cautioned not to use them where determinism matters (e.g. with distinct or to encode keys in keyed operations) unless the Kryo coders were explicitly known to be deterministic. Users who did not understand or follow these instructions could silently produce corrupt data or incomplete results.
Kryo coders are now marked as nondeterministic in all cases, and an exception will be thrown if they are used in keyed operations.
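One way to check ahead of time whether a key type would trip this exception is to materialize its coder and ask Beam whether it is deterministic. The following is a minimal sketch, assuming that CoderMaterializer.beamWithDefault is available with default arguments in the Scio version in use and that a case class of simple fields gets a derived (non-Kryo) coder. UserKey and KeyCoderCheck are hypothetical names.

```scala
import com.spotify.scio.coders.{Coder, CoderMaterializer}
import scala.util.Try

// Hypothetical key type; case classes normally get a derived coder rather than Kryo.
final case class UserKey(country: String, id: Long)

object KeyCoderCheck {
  // Materializes the Scio coder into a Beam coder and reports whether
  // Beam's verifyDeterministic() accepts it (it throws when it does not).
  def isDeterministic[T](coder: Coder[T]): Boolean =
    Try(CoderMaterializer.beamWithDefault(coder).verifyDeterministic()).isSuccess
}
```

Under these assumptions, KeyCoderCheck.isDeterministic(Coder[UserKey]) should succeed, while the same check on an explicit Kryo coder such as Coder.kryo[UserKey] should fail from v0.13.0 onward.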
Changed skewedJoin API
Some variants of the skewedJoin APIs that took a Long threshold parameter have been removed. Use the variants with a HotKeyMethod parameter instead, passing HotKeyMethod.Threshold(myThreshold) as its value.
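The migration might look like the sketch below. It assumes that HotKeyMethod lives in com.spotify.scio.values, that skewedJoin accepts a hotKeyMethod named argument, and that an implicit CMSHasher for String keys is in scope by default; the element types, the SkewedJoinMigration object, and the threshold value are hypothetical.

```scala
import com.spotify.scio.values.{HotKeyMethod, SCollection}

object SkewedJoinMigration {
  // Hypothetical threshold, previously passed directly as a Long.
  val myThreshold: Long = 9000L

  // Joins two keyed collections, treating keys above the threshold as hot keys.
  def joinWithThreshold(
    lhs: SCollection[(String, Int)],
    rhs: SCollection[(String, Double)]
  ): SCollection[(String, (Int, Double))] =
    lhs.skewedJoin(rhs, hotKeyMethod = HotKeyMethod.Threshold(myThreshold))
}
```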
TensorFlow unused predict type parameter
The TensorFlow predict and predictWithSigDef methods had an unused type parameter, which has been removed.
- elements.predict[B, D]("gs://model-path", fetchOpts, options)(toTensors)(fromTensors)
+ elements.predict[B]("gs://model-path", fetchOpts, options)(toTensors)(fromTensors)
- elements.predictWithSigDef[B, D]("gs://model-path", options)(toTensors)(fromTensors _)
+ elements.predictWithSigDef[B]("gs://model-path", options)(toTensors)(fromTensors _)