Object file

“Object files” can be used to save an SCollection of records of an arbitrary type by using Beam’s coder infrastructure. Each record is encoded to a byte array by the available Beam coder; the bytes are then wrapped in a simple Avro record containing a single bytes field and saved to disk.
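
The encoding step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Scio's actual implementation: the wrapper schema name BytesWrapper, the field name bytes, and the helpers wrap and unwrap are hypothetical and chosen only for this example. The sketch obtains a Beam coder for A through Scio's CoderMaterializer and uses Beam's CoderUtils to convert to and from a byte array.

```scala
import java.nio.ByteBuffer

import com.spotify.scio.coders.{Coder, CoderMaterializer}
import org.apache.avro.{Schema, SchemaBuilder}
import org.apache.avro.generic.{GenericRecord, GenericRecordBuilder}
import org.apache.beam.sdk.coders.{Coder => BeamCoder}
import org.apache.beam.sdk.util.CoderUtils

// Top-level case class so that Scio can derive a Coder for it.
case class A(i: Int, s: String)

object ObjectFileSketch {
  // Hypothetical wrapper schema: one record with a single Avro `bytes` field.
  // The names used by Scio internally may differ.
  val wrapperSchema: Schema =
    SchemaBuilder.record("BytesWrapper").fields().requiredBytes("bytes").endRecord()

  // Materialize the Scio coder for A into a Beam coder with default options.
  val beamCoder: BeamCoder[A] = CoderMaterializer.beamWithDefault(Coder[A])

  // Encode the record with the Beam coder, then wrap the bytes in an Avro record.
  def wrap(a: A): GenericRecord = {
    val bytes = CoderUtils.encodeToByteArray(beamCoder, a)
    new GenericRecordBuilder(wrapperSchema)
      .set("bytes", ByteBuffer.wrap(bytes))
      .build()
  }

  // Reverse direction: pull the bytes out of the Avro record and decode them.
  def unwrap(record: GenericRecord): A = {
    // Work on a duplicate so the record's own buffer position is left untouched.
    val buf = record.get("bytes").asInstanceOf[ByteBuffer].duplicate()
    val bytes = new Array[Byte](buf.remaining())
    buf.get(bytes)
    CoderUtils.decodeFromByteArray(beamCoder, bytes)
  }
}
```

Because the Avro schema only says "bytes", the stored records carry no information about the fields of A; decoding them requires a coder compatible with the one used for writing.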

Object files are convenient for ad-hoc work, but prefer a real schema-backed format when possible, since the stored records are opaque bytes that can only be decoded with a compatible coder.

Reading object files

Object files can be read via objectFile:

import com.spotify.scio._
import com.spotify.scio.avro._
import com.spotify.scio.values.SCollection

case class A(i: Int, s: String)

val sc: ScioContext = ???
val elements: SCollection[A] = sc.objectFile("gs://<input-path>/*.obj.avro")

Writing object files

Object files can be written via saveAsObjectFile:

import com.spotify.scio._
import com.spotify.scio.avro._
import com.spotify.scio.values.SCollection

case class A(i: Int, s: String)

val elements: SCollection[A] = ???
elements.saveAsObjectFile("gs://<output-path>")
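
Putting the two halves together, a round trip might look like the sketch below. It assumes a Scio version that provides sc.run() and waitUntilDone(), that the default output suffix matches the *.obj.avro glob used in the reading example, and a hypothetical --path argument naming a directory that both jobs can access. The write and the read run as two separate pipelines, because the files only exist once the first job has finished.

```scala
import com.spotify.scio._
import com.spotify.scio.avro._

// Top-level case class so that Scio can derive a Coder for it.
case class A(i: Int, s: String)

object ObjectFileRoundTrip {
  def main(cmdlineArgs: Array[String]): Unit = {
    // Job 1: write a few records as object files under `path`.
    // `--path` is a hypothetical argument name chosen for this sketch.
    val (writeSc, args) = ContextAndArgs(cmdlineArgs)
    val path = args("path")
    writeSc
      .parallelize(Seq(A(1, "a"), A(2, "b")))
      .saveAsObjectFile(path)
    writeSc.run().waitUntilDone()

    // Job 2: read the files back and print each element for inspection.
    // The glob assumes the default file suffix, as in the reading example.
    val (readSc, _) = ContextAndArgs(cmdlineArgs)
    readSc
      .objectFile[A](s"$path/*.obj.avro")
      .debug()
    readSc.run().waitUntilDone()
  }
}
```

The element type is given explicitly to objectFile here because there is no type annotation on the left-hand side for the compiler to infer it from.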