ScioIO
Scio 0.7.0 introduces a new ScioIO[T] trait to simplify IO implementation and stubbing in JobTest. This page lists some major changes to this new API.
Dependencies
Avro and BigQuery logic was decoupled from scio-core as part of the refactor.
- Before 0.7.0
  - scio-core depends on scio-avro and scio-bigquery
  - ScioContext and SCollection[T] include Avro, object, Protobuf and BigQuery IO methods out of the box
- After 0.7.0
  - scio-core no longer depends on scio-avro and scio-bigquery
  - Import com.spotify.scio.avro._ to get Avro, object, Protobuf IO methods on ScioContext and SCollection[T]
  - Import com.spotify.scio.bigquery._ to get BigQuery IO methods on ScioContext and SCollection[T]
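A build that used Avro or BigQuery IO through scio-core alone would, under this split, need the extra modules declared explicitly. The fragment below is a minimal sketch assuming an sbt build; the scioVersion value name is made up for illustration, and the version string should match whatever release the project actually targets.

```scala
// build.sbt (sketch; scioVersion is a name chosen for this example)
val scioVersion = "0.7.0"

libraryDependencies ++= Seq(
  "com.spotify" %% "scio-core"     % scioVersion,
  // No longer pulled in transitively by scio-core:
  "com.spotify" %% "scio-avro"     % scioVersion,
  "com.spotify" %% "scio-bigquery" % scioVersion
)
```

With these modules on the classpath, the two imports listed above bring the corresponding IO methods back into scope.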
ScioIO[T] for JobTest
As part of the refactor, TestIO[T] was replaced by ScioIO[T] for JobTest. Some implementations were moved to different packages for consistency, but most test code should work with minor import changes. The ScioIO[T] implementations are listed below by package.
- com.spotify.scio.avro
  - AvroIO[T]
  - ObjectFileIO[T]
  - ProtobufIO[T]
- com.spotify.scio.bigquery
  - BigQueryIO[T]
  - TableRowJsonIO where T =:= TableRow
- com.spotify.scio.io
  - DatastoreIO where T =:= Entity
  - PubsubIO[T]
  - TextIO where T =:= String
  - CustomIO[T] for use with ScioContext#customInput and SCollection#customOutput
- com.spotify.scio.bigtable
  - BigtableIO[T] where T =:= Row for input and T =:= Mutation for output
    - This replaces BigtableInput and BigtableOutput
- com.spotify.scio.cassandra
  - CassandraIO[T]
- com.spotify.scio.elasticsearch
  - ElasticsearchIO[T]
- com.spotify.scio.extra.json
  - JsonIO[T]
- com.spotify.scio.jdbc
  - JdbcIO[T]
- com.spotify.scio.parquet.avro
  - ParquetAvroIO[T]
- com.spotify.scio.spanner
  - SpannerIO[T]
- com.spotify.scio.tensorflow
  - TFRecordIO where T =:= Array[Byte]
  - TFExampleIO where T =:= Example
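To illustrate how a ScioIO[T] value stands in for a real source or sink in JobTest, here is a minimal sketch using TextIO. UpperCaseJob and UpperCaseJobTest are hypothetical names invented for this example, the file names are arbitrary test identifiers, and the code assumes scio-core and scio-test 0.7.x on the classpath.

```scala
import com.spotify.scio._
import com.spotify.scio.io.TextIO
import com.spotify.scio.testing._

// Hypothetical job, defined here only so the test has something to run.
object UpperCaseJob {
  def main(cmdlineArgs: Array[String]): Unit = {
    val (sc, args) = ContextAndArgs(cmdlineArgs)
    sc.textFile(args("input"))
      .map(_.toUpperCase)
      .saveAsTextFile(args("output"))
    sc.close() // Scio 0.7.x; later releases use sc.run()
  }
}

class UpperCaseJobTest extends PipelineSpec {
  "UpperCaseJob" should "upper-case every line" in {
    JobTest[UpperCaseJob.type]
      .args("--input=in.txt", "--output=out.txt")
      // TextIO("in.txt") stubs the read of the path passed as --input.
      .input(TextIO("in.txt"), Seq("a", "b"))
      // TextIO("out.txt") captures what the job writes to --output.
      .output(TextIO("out.txt"))(_ should containInAnyOrder(Seq("A", "B")))
      .run()
  }
}
```

The paths given to TextIO in the test must match the paths the job receives through its arguments, since that is how JobTest pairs each stub with the corresponding read or write.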
Using ScioIO[T] directly
Two methods, ScioContext#read and SCollection#write, were added so that a ScioIO[T] can be used directly, without the ScioContext#{textFile,avroFile,...} and SCollection#saveAs{TextFile,AvroFile,...} syntactic sugar. See WordCountScioIO and WordCountScioIOTest for concrete examples.
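As a rough sketch of what direct use might look like, the job below reads and writes text through TextIO values instead of textFile and saveAsTextFile. LineLengthJob is a hypothetical name, and the use of TextIO.ReadParam() and TextIO.WriteParam() with default arguments is an assumption based on the 0.7.x API; parameter shapes may differ in other releases.

```scala
import com.spotify.scio._
import com.spotify.scio.io.TextIO

// Hypothetical job, named for this sketch only.
object LineLengthJob {
  def main(cmdlineArgs: Array[String]): Unit = {
    val (sc, args) = ContextAndArgs(cmdlineArgs)

    // One ScioIO[String] value per endpoint; TextIO supports read and write.
    val input = TextIO(args("input"))
    val output = TextIO(args("output"))

    sc.read(input)(TextIO.ReadParam())      // in place of sc.textFile(...)
      .map(line => line.length.toString)
      .write(output)(TextIO.WriteParam())   // in place of saveAsTextFile(...)

    sc.close() // Scio 0.7.x; later releases use sc.run()
  }
}
```

Because the job and its test both refer to the same kind of ScioIO value, a JobTest for this job can stub the endpoints with TextIO("in.txt") and TextIO("out.txt") in the usual way.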