Breaking Changelog

Breaking changes since Scio 0.13.0

  • Removed scio-elasticsearch6
  • Migrated scio-elasticsearch7 to the new Java client
  • Changed skewedJoin API (scalafix rule provided)
  • New file-based ScioIO parameters (notably suffix in the read params); see the sketch after this list
  • Removed an unused type parameter from the TensorFlow predict and predictWithSigDef methods
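A minimal sketch of the new suffix read parameter; the bucket path is hypothetical, and the exact parameter list should be checked against TextIO.ReadParam before relying on it:

```scala
import com.spotify.scio.ContextAndArgs

object SuffixReadSketch {
  def main(cmdlineArgs: Array[String]): Unit = {
    val (sc, _) = ContextAndArgs(cmdlineArgs)
    // Assumption: 0.13 file-based reads expose a `suffix` argument that
    // restricts which matched files are read; see TextIO.ReadParam.
    sc.textFile("gs://my-bucket/logs/*", suffix = ".txt")
      .debug()
    sc.run()
  }
}
```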

Breaking changes since Scio 0.12.0 (v0.12.0 Migration Guide)

  • Removed com.spotify.scio.extra.bigquery
  • Removed com.spotify.scio.pubsub specializations; see the sketch after this list
  • Changed type signatures of SMB methods to accommodate secondary-keyed SMB
  • Removed beam-sql support
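A minimal sketch of the generic replacement for the removed Pub/Sub specializations, using an explicit PubsubIO with read params; the subscription path is hypothetical:

```scala
import com.spotify.scio.ContextAndArgs
import com.spotify.scio.pubsub._

object PubsubReadSketch {
  def main(cmdlineArgs: Array[String]): Unit = {
    val (sc, _) = ContextAndArgs(cmdlineArgs)
    // Specialized ScioContext methods are replaced by a generic read
    // with an explicit PubsubIO and read params.
    sc.read(PubsubIO.string("projects/my-project/subscriptions/my-sub"))(
      PubsubIO.ReadParam(PubsubIO.Subscription)
    ).debug()
    sc.run()
  }
}
```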

Important changes in 0.11.3

  • Fixed a severe Parquet IO issue introduced in 0.11.2. Incompatible versions of com.google.http-client:google-http-client:1.40.0 and com.google.cloud.bigdataoss:gcsio:2.2.2 caused jobs reading Parquet to get stuck. The mitigation for 0.11.2 is to pin google-http-client to 1.39.2 in your build.sbt:

```scala
dependencyOverrides ++= Seq(
  "com.google.http-client" % "google-http-client" % "1.39.2"
)
```

Breaking changes since Scio 0.10.0 (v0.10.0 Migration Guide)

  • Move GCP modules to scio-google-cloud-platform; see the build.sbt sketch after this list
  • Simplify coder implicits
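A minimal build.sbt sketch for the module move; scioVersion stands in for whichever 0.10.x release you are on:

```scala
// build.sbt
val scioVersion = "0.10.0"

libraryDependencies ++= Seq(
  "com.spotify" %% "scio-core" % scioVersion,
  // BigQuery, Bigtable, Pub/Sub, Spanner, etc. now ship in a dedicated
  // GCP artifact instead of the individual scio modules.
  "com.spotify" %% "scio-google-cloud-platform" % scioVersion
)
```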

Breaking changes since Scio 0.9.0 (v0.9.0 Migration Guide)

  • Drop Scala 2.11, add Scala 2.13 support
  • Remove deprecated modules scio-cassandra2 and scio-elasticsearch2
  • Remove deprecated methods since 0.8.0
  • Switch from Algebird Hash128[K] to Guava Funnel[K] for Bloom filter and sparse transforms; see the sketch after this list
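A minimal sketch of a sparse transform under the new requirement, assuming magnolify-guava is on the classpath to derive the Funnel instances:

```scala
import com.spotify.scio.values.SCollection
// Derives Guava Funnel instances (assumption: magnolify-guava dependency).
import magnolify.guava.auto._

object FunnelSketch {
  def sparse(
    lhs: SCollection[(String, Int)],
    rhs: SCollection[(String, Int)]
  ): SCollection[(String, (Int, Option[Int]))] =
    // The implicit resolved here is now a Guava Funnel[String], not an
    // Algebird Hash128[String]; 1000000 is the estimated RHS key count.
    lhs.sparseLeftOuterJoin(rhs, 1000000)
}
```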

Breaking changes since Scio 0.8.0 (v0.8.0 Migration Guide)

  • ScioIOs no longer return Future
  • ScioContext#close returns ScioExecutionContext instead of ScioResult; see the sketch after this list
  • Async DoFn refactor
  • Deprecate scio-cassandra2 and scio-elasticsearch2
  • ContextAndArgs#typed no longer accepts list-case (#2221)
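A minimal sketch of the new run semantics; blocking for a ScioResult is now an explicit step:

```scala
import com.spotify.scio.{ContextAndArgs, ScioResult}

object RunAndWaitSketch {
  def main(cmdlineArgs: Array[String]): Unit = {
    val (sc, _) = ContextAndArgs(cmdlineArgs)
    sc.parallelize(Seq(1, 2, 3)).debug()
    // sc.run() returns a ScioExecutionContext; waitUntilDone() blocks and
    // yields the ScioResult that close() used to return directly.
    val result: ScioResult = sc.run().waitUntilDone()
    println(result.state)
  }
}
```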

Breaking changes since Scio 0.7.0 (v0.7.0 Migration Guide)

  • New Magnolia-based Coder derivation; see the sketch after this list
  • New ScioIO replaces TestIO[T] to simplify IO implementation and stubbing in JobTest
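A minimal sketch of the Magnolia-based derivation; the case class is illustrative:

```scala
import com.spotify.scio.coders.Coder

object CoderSketch {
  case class User(id: Long, name: String)

  // Derivation is implicit for case classes used in pipelines; Coder.gen
  // can also be materialized explicitly, e.g. to pin or reuse an instance.
  implicit val userCoder: Coder[User] = Coder.gen[User]
}
```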

Breaking changes since Scio 0.6.0

  • scio-cassandra2 now requires Cassandra 2.2 instead of 2.0

Breaking changes since Scio 0.5.0

  • BigQueryIO in JobTest now requires a type parameter, which can be either TableRow for JSON or T for the type-safe API, where T is a type annotated with @BigQueryType. An explicit .map(T.toTableRow) of test data is no longer needed; see the sketch after this list. See changes in BigQueryTornadoesTest and TypedBigQueryTornadoesTest for more.
  • Typed AvroIO now accepts case classes instead of Avro records in JobTest. Explicit .map(T.toGenericRecord) of test data is no longer needed. See this change for more.
  • Package com.spotify.scio.extra.transforms is moved from scio-extra to scio-core, under com.spotify.scio.transforms.
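A minimal sketch of the new JobTest shape, written against the current API; the job, type, and table names are hypothetical:

```scala
import com.spotify.scio.ContextAndArgs
import com.spotify.scio.bigquery._
import com.spotify.scio.io.TextIO
import com.spotify.scio.testing._

// Hypothetical job reading a @BigQueryType-annotated type.
object MyBQJob {
  @BigQueryType.toTable
  case class Row(word: String, count: Long)

  def main(cmdlineArgs: Array[String]): Unit = {
    val (sc, args) = ContextAndArgs(cmdlineArgs)
    sc.typedBigQuery[Row]("project:dataset.table")
      .map(r => s"${r.word}\t${r.count}")
      .saveAsTextFile(args("output"))
    sc.run()
  }
}

class MyBQJobTest extends PipelineSpec {
  import MyBQJob.Row

  "MyBQJob" should "accept case classes as BigQuery test input" in {
    JobTest[MyBQJob.type]
      .args("--output=out.txt")
      // The type parameter selects the typed API; no .map(Row.toTableRow)
      // over the test data is needed.
      .input(BigQueryIO[Row]("project:dataset.table"), Seq(Row("a", 1L)))
      .output(TextIO("out.txt"))(_ should containSingleValue("a\t1"))
      .run()
  }
}
```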

Breaking changes since Scio 0.4.0

  • Accumulators are replaced by the new metrics API; see MetricsExample and the sketch after this list
  • The com.spotify.scio.hdfs package and related APIs (ScioContext#hdfs*, SCollection#saveAsHdfs*) are removed; the regular file IO API now supports both GCS and HDFS (if scio-hdfs is included as a dependency).
  • Starting with Scio 0.4.4, the Beam runner is completely decoupled from scio-core. See the Runners page for more details.
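A minimal counter sketch with the metrics API, written against the current API:

```scala
import com.spotify.scio.{ContextAndArgs, ScioMetrics}

object MetricsSketch {
  // Counters replace accumulators and are declared up front.
  private val processed = ScioMetrics.counter("processed")

  def main(cmdlineArgs: Array[String]): Unit = {
    val (sc, _) = ContextAndArgs(cmdlineArgs)
    sc.parallelize(Seq(1, 2, 3))
      .map { x => processed.inc(); x * 2 }
      .debug()
    // Metric values are read from the ScioResult once the job completes.
    val result = sc.run().waitUntilDone()
    println(result.counter(processed).committed)
  }
}
```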

Breaking changes since Scio 0.3.0

  • See this page for a list of breaking changes from Dataflow Java SDK to Beam
  • Scala 2.10 is dropped, 2.11 and 2.12 are the supported Scala binary versions
  • Java 7 is dropped and Java 8+ is required
  • DataflowPipelineRunner is renamed to DataflowRunner
  • DirectPipelineRunner is renamed to DirectRunner
  • BlockingDataflowPipelineRunner is removed and ScioContext#close() no longer blocks execution; use sc.run().waitUntilDone() to retain the blocking behavior, e.g. when launching jobs from an orchestration engine like Airflow or Luigi
  • You should set tempLocation instead of stagingLocation regardless of runner; set it to a local path for DirectRunner or a GCS path for DataflowRunner; if not set, DataflowRunner will create a default bucket for the project
  • Type safe BigQuery is now stable API; use import com.spotify.scio.bigquery._ instead of import com.spotify.scio.experimental._
  • scio-bigtable no longer depends on HBase and uses Protobuf based Bigtable API; check out the updated example
  • Custom IO, i.e. ScioContext#customInput and SCollection#saveAsCustomOutput, now requires a name: String parameter; see the sketch below
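A minimal sketch of the named custom IO, assuming Beam's TextIO as the wrapped transform; paths and names are hypothetical:

```scala
import com.spotify.scio.ContextAndArgs
import org.apache.beam.sdk.io.{TextIO => BTextIO}

object CustomIOSketch {
  def main(cmdlineArgs: Array[String]): Unit = {
    val (sc, _) = ContextAndArgs(cmdlineArgs)
    // The first argument is the now-required transform name.
    val lines = sc.customInput("ReadLines", BTextIO.read().from("gs://my-bucket/in/*.txt"))
    lines.saveAsCustomOutput("WriteLines", BTextIO.write().to("gs://my-bucket/out/part"))
    sc.run()
  }
}
```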