Scio v0.14.0

Coders

Some coders have moved out of the default implicit scope. After updating to Scio 0.14, you may encounter the following error:

Cannot find an implicit Coder instance for type:
...

If the type is or contains an Avro class (either GenericRecord or a SpecificRecord implementation), you can import the com.spotify.scio.avro._ package to bring the implicit Avro coders back into scope. This is likely to happen if you are using any readAsAvro.. API or AvroSortedBucketIO from scio-smb. See Avro removed from core below for more details.

If the type relied on a fallback coder, we advise you to create a custom coder; see coders for more details. If you want to keep using the implicit Kryo fallback coders as before, you now need to explicitly import com.spotify.scio.coders.kryo._.
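
For example, a pipeline that previously relied on the fallback coder for an arbitrary class can restore the old behaviour with the explicit Kryo import. A minimal sketch, where LegacyRecord stands in for any type without a dedicated Coder instance and the output path is a placeholder:

import com.spotify.scio._
import com.spotify.scio.coders.kryo._ // restores the implicit Kryo fallback coders

// Hypothetical third-party class with no derived or dedicated Coder instance
class LegacyRecord(val id: String, val payload: Array[Byte])

def copyIds(sc: ScioContext): Unit =
  sc.parallelize(Seq(new LegacyRecord("a", Array(1.toByte)))) // Coder[LegacyRecord] falls back to Kryo
    .map(_.id)                                                // Coder[String] is still provided by core
    .saveAsTextFile("gs://my-bucket/legacy-ids")              // placeholder output path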

Avro removed from core

Avro coders are now a part of the com.spotify.scio.avro package.

import com.spotify.scio.avro._

Update direct usage:

- Coder.avroGenericRecordCoder(schema)
+ avroGenericRecordCoder(schema)
- Coder.avroGenericRecordCoder
+ avroGenericRecordCoder
- Coder.avroSpecificRecordCoder[T]
+ avroSpecificRecordCoder[T]
- Coder.avroSpecificFixedCoder[U]
+ avroSpecificFixedCoder[U]
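
In context, an explicitly provided GenericRecord coder now reads as follows. A minimal sketch assuming an Avro Schema value is available in scope:

import com.spotify.scio.avro._
import com.spotify.scio.coders.Coder
import org.apache.avro.Schema
import org.apache.avro.generic.GenericRecord

// was Coder.avroGenericRecordCoder(schema) before 0.14
def recordCoder(schema: Schema): Coder[GenericRecord] =
  avroGenericRecordCoder(schema)

Typically the coder is bound as an implicit val at the call site so that SCollection transformations pick it up automatically.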

Dynamic Avro and Protobuf writes are now in the com.spotify.scio.avro.dynamic package. If you use saveAsDynamicAvroFile or saveAsDynamicProtobufFile, add the following import:

import com.spotify.scio.avro.dynamic._
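
After the import, a dynamic Avro write might look roughly like the sketch below. Account is a hypothetical SpecificRecord class, and the destination function, path, and exact parameter list are illustrative; check the scaladoc for the precise signature:

import com.spotify.scio.avro._
import com.spotify.scio.avro.dynamic._
import com.spotify.scio.values.SCollection

// Account is a hypothetical Avro SpecificRecord class generated elsewhere
def writeByType(accounts: SCollection[Account]): Unit =
  accounts.saveAsDynamicAvroFile("gs://my-bucket/accounts") { account =>
    s"type=${account.getType}" // each record is routed to a sub-directory derived from its fields
  }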

Avro schemas are now in the com.spotify.scio.avro.schemas package:

import com.spotify.scio.avro.schemas._

Materialize no longer splittable

Materialize was previously implemented using an Avro wrapper around byte arrays. To keep materialize in scio-core, it has been reimplemented on top of saveAsBinaryFile, which writes a sequence of records without sub-file blocks and therefore cannot be trivially split on read. We have found little use of materialize for large datasets that are not also saved permanently, so we expect the impact of this change to be minimal.
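
The materialize call itself is unchanged; only the format of the intermediate files it writes has. A minimal sketch, with the input path supplied via args as a placeholder:

import com.spotify.scio._

object MaterializeExample {
  def main(cmdlineArgs: Array[String]): Unit = {
    val (sc, args) = ContextAndArgs(cmdlineArgs)
    val lengths = sc.textFile(args("input")).map(_.length).materialize // ClosedTap[Int]

    val result = sc.run().waitUntilDone()
    val it = result.tap(lengths).value // Iterator[Int], read back from the binary intermediate files
    println(it.take(10).toList)
  }
}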

New binaryFile read

See the binaryFile scaladoc and the BinaryInOut example for usage.

parquet-tensorflow metadata

When using TensorFlow with scio-parquet, you must now also depend on scio-tensorflow.

The parquet-tensorflow API has been migrated from the custom parquet-extra representation to the official TensorFlow metadata API.

The schema and projection parameters are now of type org.tensorflow.metadata.v0.Schema.
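
With the official metadata API, the schema is built from the org.tensorflow.metadata.v0 protobuf classes. A sketch of the schema construction only; the commented call site shows where it is passed and is illustrative, so check the scio-parquet scaladoc for the exact method signatures:

import org.tensorflow.metadata.v0.{Feature, FeatureType, Schema}

object ExampleSchema {
  // Replaces the previous parquet-extra based schema/projection values
  val schema: Schema = Schema
    .newBuilder()
    .addFeature(Feature.newBuilder().setName("user_id").setType(FeatureType.BYTES))
    .addFeature(Feature.newBuilder().setName("score").setType(FeatureType.FLOAT))
    .build()

  // Passed wherever a schema or projection was required before, e.g. (illustrative):
  // examples.saveAsParquetExampleFile(args("output"), ExampleSchema.schema)
}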

scio-smb provided implementations

When using scio-smb, you also need to depend on the scio module that provides the file format implementation you want to use. See Sort-Merge-Bucket for more details.
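
For example, with sbt and Avro-backed SMB, both modules are needed. A sketch, with the version and the chosen format module as placeholders:

// build.sbt (sketch): scio-smb no longer pulls in a file format implementation on its own
libraryDependencies ++= Seq(
  "com.spotify" %% "scio-smb"  % "0.14.0",
  "com.spotify" %% "scio-avro" % "0.14.0" // or scio-parquet / scio-tensorflow, matching the SMB format used
)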