# Scio v0.14.0

## Coders
Some coders have been moved out of the default implicit scope. After updating to Scio 0.14, you may encounter the following error:

```
Cannot find an implicit Coder instance for type:
...
```
If the type is or contains an Avro class (either a `GenericRecord` or a `SpecificRecord` implementation), you can import the `com.spotify.scio.avro._` package to bring the implicit Avro coders back into scope. This is likely to happen if you are using any readAsAvro.. API or `AvroSortedBucketIO` from `scio-smb`. See Avro removed from core below for more details.
If the type relied on a fallback coder, we advise you to create a custom coder; see Coders for more details. If you want to keep using the implicit Kryo fallback coders as before, you now need to import `com.spotify.scio.coders.kryo._` explicitly.
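For example, a pipeline over `GenericRecord`s that compiled with no extra imports on earlier versions may now need the Avro (or Kryo) import. A minimal sketch, where the schema literal and file paths are placeholders:

```scala
import com.spotify.scio._
import com.spotify.scio.avro._ // implicit Coder[GenericRecord] and SpecificRecord coders
// import com.spotify.scio.coders.kryo._ // or: restore the Kryo fallback for arbitrary types
import org.apache.avro.Schema

object AvroCoderExample {
  def main(cmdlineArgs: Array[String]): Unit = {
    val (sc, _) = ContextAndArgs(cmdlineArgs)
    val schema: Schema = new Schema.Parser().parse(
      """{"type":"record","name":"User","fields":[{"name":"name","type":"string"}]}"""
    )
    // Without the avro import above, the implicit Coder[GenericRecord] is no longer found
    sc.avroFile("gs://bucket/input/*.avro", schema)
      .saveAsAvroFile("gs://bucket/output", schema = schema)
    sc.run()
  }
}
```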
## Avro removed from core

Avro coders are now part of the `com.spotify.scio.avro` package:

```scala
import com.spotify.scio.avro._
```
Update direct usage:

```diff
- Coder.avroGenericRecordCoder(schema)
+ avroGenericRecordCoder(schema)
- Coder.avroGenericRecordCoder
+ avroGenericRecordCoder
- Coder.avroSpecificRecordCoder[T]
+ avroSpecificRecordCoder[T]
- Coder.avroSpecificFixedCoder[U]
+ avroSpecificFixedCoder[U]
```
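For instance, an explicit `GenericRecord` coder that was previously created via the `Coder` companion object now comes from the `avro` package. A minimal sketch, with a placeholder schema literal:

```scala
import com.spotify.scio.avro._
import com.spotify.scio.coders.Coder
import org.apache.avro.Schema
import org.apache.avro.generic.GenericRecord

val schema: Schema = new Schema.Parser().parse(
  """{"type":"record","name":"User","fields":[{"name":"name","type":"string"}]}"""
)

// Previously: Coder.avroGenericRecordCoder(schema)
implicit val genericRecordCoder: Coder[GenericRecord] = avroGenericRecordCoder(schema)
```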
Dynamic Avro and Protobuf writes are now in `com.spotify.scio.avro.dynamic`. If using `saveAsDynamicAvroFile` or `saveAsDynamicProtobufFile`, add the following:

```scala
import com.spotify.scio.avro.dynamic._
```
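As a rough sketch of where this applies, assuming a hypothetical Avro-generated `Event` class with a `getType` accessor and a placeholder output path:

```scala
import com.spotify.scio.avro._
import com.spotify.scio.avro.dynamic._ // saveAsDynamicAvroFile / saveAsDynamicProtobufFile
import com.spotify.scio.values.SCollection

// Event is a hypothetical Avro-generated SpecificRecord class
def writeByType(events: SCollection[Event]): Unit = {
  events.saveAsDynamicAvroFile("gs://bucket/events") { event =>
    // the destination is resolved per record, relative to the base path
    s"type=${event.getType}"
  }
  ()
}
```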
Avro schemas are now in the `com.spotify.scio.avro.schemas` package:

```scala
import com.spotify.scio.avro.schemas._
```
## Materialize no longer splittable

Materialize was previously implemented using an Avro wrapper around byte arrays. To keep `materialize` in `scio-core`, it has been reimplemented with `saveAsBinaryFile`, which writes a sequence of records with no sub-file blocks and therefore does not support trivially splitting the file on read. We have found little use of `materialize` for large datasets that are not also saved permanently, so we expect the impact of this change to be minimal.
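For reference, `materialize` is typically used to pull a small intermediate result back into the launcher process once the pipeline finishes. A minimal sketch of that usage:

```scscala
import com.spotify.scio._

object MaterializeExample {
  def main(cmdlineArgs: Array[String]): Unit = {
    val (sc, _) = ContextAndArgs(cmdlineArgs)
    // Persist an intermediate SCollection; now backed by saveAsBinaryFile under the hood
    val closedTap = sc.parallelize(1 to 100).map(_ * 2).materialize
    val result = sc.run().waitUntilFinish()
    // Read the materialized values back in the launcher process
    val doubled = result.tap(closedTap).value.toList
    println(doubled.size)
  }
}
```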
## New binaryFile read

See the relevant `binaryFile` scaladoc and the BinaryInOut example.
## parquet-tensorflow metadata

When using TensorFlow with `scio-parquet`, you must now depend on `scio-tensorflow` as well.
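In sbt, for example (a sketch; adjust the version to the Scio release you are on):

```scala
libraryDependencies ++= Seq(
  "com.spotify" %% "scio-parquet" % "0.14.0",
  // now also required for the parquet-tensorflow API
  "com.spotify" %% "scio-tensorflow" % "0.14.0"
)
```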
The parquet-tensorflow API has been migrated from custom `parquet-extra` to the official metadata API. `schema` and `projection` are now of type `org.tensorflow.metadata.v0.Schema`.
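Concretely, schemas and projections passed to the parquet-tensorflow read/write are now built with the tensorflow-metadata protobuf classes. A minimal sketch, with placeholder feature names:

```scala
import org.tensorflow.metadata.v0.{Feature, FeatureType, Schema}

val schema: Schema = Schema
  .newBuilder()
  .addFeature(Feature.newBuilder().setName("label").setType(FeatureType.INT))
  .addFeature(Feature.newBuilder().setName("weight").setType(FeatureType.FLOAT))
  .build()
```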
## scio-smb provided implementations

When using `scio-smb`, you also need to depend on the Scio module that provides the file format implementation you want to use. See Sort-Merge-Bucket for more details.