Breaking Changelog

Breaking changes since 0.13.0

Removed scio-elasticsearch6
Migrated scio-elasticsearch7 to new java client
Changed skewedJoin API (scalafix rule provided)
New File based ScioIO parameters (notably suffix in the read params)
Removal of unused type parameter on tensorflow predict and predictWithSigDef

Fixed a severe Parquet IO issue introduced in 0.11.2. Incompatible versions of com.google.http-client:google-http-client:1.40.0 and com.google.cloud.bigdataoss:gcsio:2.2.2 caused jobs reading Parquet to get stuck. The mitigation for 0.11.2 is to pin google-http-client to 1.39.2 in your build.sbt:

    dependencyOverrides ++= Seq(
      "com.google.http-client" % "google-http-client" % "1.39.2"
    )

Drop Scala 2.11, add Scala 2.13 support
Remove deprecated modules scio-cassandra2 and scio-elasticsearch2
Remove deprecated methods since 0.8.0
Switch from Algebird Hash128[K] to Guava Funnel[K] for Bloom filter and sparse transforms
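
For readers unfamiliar with Guava funnels, the following is a minimal sketch of what a Funnel[K] instance looks like, using only Guava itself. The key type UserKey, the object name FunnelSketch, and the sizing numbers are hypothetical and chosen for illustration; how Scio resolves or derives the funnel for its Bloom filter and sparse transforms is not shown here.

```scala
import java.nio.charset.StandardCharsets
import com.google.common.hash.{BloomFilter, Funnel, PrimitiveSink}

// Hypothetical key type, used only for this illustration.
final case class UserKey(id: Long, country: String)

object FunnelSketch {
  // A Funnel describes how to feed the fields of a value into a hash sink.
  implicit val userKeyFunnel: Funnel[UserKey] = new Funnel[UserKey] {
    override def funnel(from: UserKey, into: PrimitiveSink): Unit = {
      into.putLong(from.id)
      into.putString(from.country, StandardCharsets.UTF_8)
    }
  }

  // Build a small Guava Bloom filter directly from the funnel.
  // 1000 expected insertions and a 1% false-positive rate are arbitrary.
  def buildFilter(keys: Seq[UserKey]): BloomFilter[UserKey] = {
    val bf = BloomFilter.create[UserKey](userKeyFunnel, 1000L, 0.01)
    keys.foreach(k => bf.put(k))
    bf
  }
}
```

A key type that previously relied on an Algebird Hash128[K] instance would, under this assumption, need a Funnel[K] in implicit scope instead.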

New Magnolia based Coders derivation
New ScioIO replaces TestIO[T] to simplify IO implementation and stubbing in JobTest

BigQueryIO in JobTest now requires a type parameter which could be either TableRow for JSON or T for type-safe API where T is a type annotated with @BigQueryType. Explicit .map(T.toTableRow) of test data is no longer needed. See changes in BigQueryTornadoesTest and TypedBigQueryTornadoesTest for more.
Typed AvroIO now accepts case classes instead of Avro records in JobTest. Explicit .map(T.toGenericRecord) of test data is no longer needed. See this change for more.
Package com.spotify.scio.extra.transforms is moved from scio-extra to scio-core, under com.spotify.scio.transforms.

Accumulators are replaced by the new metrics API, see MetricsExample for more
The com.spotify.scio.hdfs package and related APIs (ScioContext#hdfs*, SCollection#saveAsHdfs*) are removed; the regular file IO API should now support both GCS and HDFS (if scio-hdfs is included as a dependency).
Starting with Scio 0.4.4, the Beam runner is completely decoupled from scio-core. See Runners page for more details.
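
As a rough illustration of the metrics API that replaces accumulators, the sketch below counts even numbers with a counter. It assumes a recent Scio version in which ScioMetrics.counter, sc.run() and ScioResult#counter are available, and a runner (such as the direct runner) on the classpath; the object name MetricsSketch and the counter name "evens" are hypothetical. MetricsExample remains the authoritative reference.

```scala
import com.spotify.scio._

object MetricsSketch {
  // Runs a tiny pipeline and returns the committed counter value, if any.
  def countEvens(cmdlineArgs: Array[String]): Option[Long] = {
    val (sc, _) = ContextAndArgs(cmdlineArgs)

    // Create a named counter up front; it is captured by the closure below.
    val evens = ScioMetrics.counter("evens")

    sc.parallelize(1 to 10)
      .filter { x =>
        val isEven = x % 2 == 0
        if (isEven) evens.inc() // increment where an accumulator was used before
        isEven
      }

    // Block until the job finishes, then read the metric from the result.
    val result = sc.run().waitUntilDone()
    result.counter(evens).committed
  }

  def main(cmdlineArgs: Array[String]): Unit =
    println(countEvens(cmdlineArgs))
}
```

Whether committed values are reported depends on the runner, which is why the sketch returns an Option rather than a bare number.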

See this page for a list of breaking changes from Dataflow Java SDK to Beam
Scala 2.10 is dropped, 2.11 and 2.12 are the supported Scala binary versions
Java 7 is dropped and Java 8+ is required
DataflowPipelineRunner is renamed to DataflowRunner
DirectPipelineRunner is renamed to DirectRunner
BlockingDataflowPipelineRunner is removed and ScioContext#close() will not block execution; use sc.run().waitUntilDone() to retain the blocking behavior, e.g. if you launch the job from an orchestration engine like Airflow or Luigi
You should set tempLocation instead of stagingLocation regardless of runner; set it to a local path for DirectRunner or a GCS path for DataflowRunner; if not set, DataflowRunner will create a default bucket for the project
Type safe BigQuery is now stable API; use import com.spotify.scio.bigquery._ instead of import com.spotify.scio.experimental._
scio-bigtable no longer depends on HBase and uses Protobuf based Bigtable API; check out the updated example
Custom IO, i.e. ScioContext#customInput and SCollection#saveAsCustomOutput, now requires a name: String parameter
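
Several of the items above can be seen together in a small job. This is a sketch under stated assumptions, not a reference implementation: it assumes a Scio version that provides sc.run(), uses Beam's TextIO purely as a stand-in for a custom transform, and the object name CustomIoSketch, the step names "ReadLines" and "WriteLines", and the --input and --output arguments are all hypothetical.

```scala
import com.spotify.scio._
import org.apache.beam.sdk.io.TextIO

object CustomIoSketch {
  def main(cmdlineArgs: Array[String]): Unit = {
    val (sc, args) = ContextAndArgs(cmdlineArgs)

    // Custom input: the first argument is the required transform name.
    sc.customInput("ReadLines", TextIO.read().from(args("input")))
      .map(_.toUpperCase)
      // Custom output: again, a name is required before the transform.
      .saveAsCustomOutput("WriteLines", TextIO.write().to(args("output")))

    // Block until the pipeline finishes, which is useful when the job is
    // launched from an orchestration engine that waits on the process.
    sc.run().waitUntilDone()
  }
}
```

When launching such a job, the runner and temp location would typically be passed as pipeline options, for example --runner=DataflowRunner together with --tempLocation pointing at a GCS path, or a local path when using DirectRunner.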

0.14.14-18-324867e-20250402T151020Z*