Object file
“Object files” can be used to save an SCollection of records of an arbitrary type by using Beam’s coder infrastructure. Each record is encoded to a byte array by its Beam coder, the bytes are wrapped in a simple Avro record containing a single bytes field, and the result is written to disk.
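Object files are convenient for ad-hoc work, but a real schema-backed format should be preferred when possible, since the Avro schema of an object file only describes an opaque byte field rather than the structure of the records.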
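The encoding scheme described above (coder bytes wrapped in a one-field Avro record) can be sketched outside of Scio. This is a minimal illustration, not Scio's internal implementation: it assumes Beam's StringUtf8Coder as the record coder (Scio would instead derive a coder for the element type, such as a case class), and the schema name HypotheticalBytesRecord, the field name "bytes", and the object ObjectFileEncodingSketch are all hypothetical.

```scala
import java.nio.ByteBuffer

import org.apache.avro.{Schema, SchemaBuilder}
import org.apache.avro.generic.{GenericData, GenericRecord}
import org.apache.beam.sdk.coders.StringUtf8Coder
import org.apache.beam.sdk.util.CoderUtils

object ObjectFileEncodingSketch {
  // Hypothetical single-field schema; the record and field names are illustrative
  // and not necessarily the ones Scio uses internally.
  val BytesSchema: Schema = SchemaBuilder
    .record("HypotheticalBytesRecord")
    .fields()
    .requiredBytes("bytes")
    .endRecord()

  // Assumption: a plain Beam coder for String stands in for a Scio-derived coder.
  private val coder = StringUtf8Coder.of()

  // Step 1: encode the value with its Beam coder.
  // Step 2: wrap the resulting bytes in an Avro record with a single bytes field.
  def wrap(value: String): GenericRecord = {
    val bytes = CoderUtils.encodeToByteArray(coder, value)
    val record = new GenericData.Record(BytesSchema)
    record.put("bytes", ByteBuffer.wrap(bytes))
    record
  }

  // Reverse the steps: extract the byte field and decode it with the same coder.
  def unwrap(record: GenericRecord): String = {
    val buffer = record.get("bytes").asInstanceOf[ByteBuffer].duplicate()
    val bytes = new Array[Byte](buffer.remaining())
    buffer.get(bytes)
    CoderUtils.decodeFromByteArray(coder, bytes)
  }

  def main(args: Array[String]): Unit =
    println(unwrap(wrap("hello object file")))
}
```

Because the Avro schema only sees an opaque byte field, reading the data back requires the same coder that wrote it, which is why a schema-backed format is the better choice for anything long-lived.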
Reading object files
Object files can be read via objectFile:
import com.spotify.scio._
import com.spotify.scio.avro._
import com.spotify.scio.values.SCollection
case class A(i: Int, s: String)
val sc: ScioContext = ???
// Decode every object file matching the glob back into A values using A's coder
val elements: SCollection[A] = sc.objectFile("gs://<input-path>/*.obj.avro")
Writing object files
Object files can be written via saveAsObjectFile:
import com.spotify.scio._
import com.spotify.scio.avro._
import com.spotify.scio.values.SCollection
case class A(i: Int, s: String)
val elements: SCollection[A] = ???
// Encode each A with its coder and write the results as object files under the output path
elements.saveAsObjectFile("gs://<output-path>")
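Putting the two halves together, a local round trip writes object files with saveAsObjectFile in one pipeline and reads them back with objectFile in a second. This is a sketch under several assumptions: the Beam DirectRunner is on the classpath (ScioContext() with no arguments runs locally), a local temporary directory replaces the gs:// paths, the "*.obj.avro" glob mirrors the reading example above rather than a documented default suffix, and materialize plus taps are used only to pull the results back into the driver for inspection. The names ObjectFileRoundTrip and roundTrip are hypothetical.

```scala
import java.nio.file.Files

import com.spotify.scio._
import com.spotify.scio.avro._
import com.spotify.scio.io.ClosedTap

// Top-level case class so that Scio can derive a coder for it.
case class A(i: Int, s: String)

object ObjectFileRoundTrip {
  // Hypothetical helper: writes `input` as object files, then reads them back.
  def roundTrip(input: Seq[A]): Seq[A] = {
    val dir = Files.createTempDirectory("object-file-sketch").toString

    // First pipeline: encode each A with its coder and write object files.
    val writeSc = ScioContext()
    writeSc.parallelize(input).saveAsObjectFile(dir)
    writeSc.run().waitUntilFinish()

    // Second pipeline: read the files back; the glob is an assumption
    // borrowed from the reading example.
    val readSc = ScioContext()
    val materialized: ClosedTap[A] = readSc.objectFile[A](s"$dir/*.obj.avro").materialize
    val result = readSc.run().waitUntilFinish()

    // Pull the materialized elements back into the driver.
    result.tap(materialized).value.toSeq
  }

  def main(args: Array[String]): Unit =
    // Element order is not guaranteed, so sort before printing.
    println(roundTrip(Seq(A(1, "a"), A(2, "b"))).sortBy(_.i))
}
```

Splitting the write and the read into separate pipelines matters here: the files only exist once the first pipeline has finished, so the second ScioContext can safely glob over them.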
0.14.8-23-c45685a-20241105T161920Z*