Binary

Read Binary files

See read as binary for reading an entire file as a single binary record.

Binary reads are supported via the binaryFile method, which takes a BinaryFileReader instance that can parse the underlying binary file format.

import com.spotify.scio.ScioContext
import com.spotify.scio.io.BinaryIO.BinaryFileReader

val sc: ScioContext = ???
val myBinaryFileReader: BinaryFileReader = ???
sc.binaryFile("gs://<input-dir>", myBinaryFileReader)

The complexity of the reader is determined by the complexity of the input format. See BinaryInOut for a fully-worked example.
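
To give a sense of the parsing such a reader has to do, here is a minimal sketch in plain Scala, independent of Scio. It assumes a hypothetical format: a 4-byte big-endian record count in the header, followed by that many records, each prefixed with its 4-byte length. The names LengthPrefixedFormat and readAll are illustrative assumptions, not part of the Scio API; a real BinaryFileReader would wrap comparable logic.

```scala
import java.io.{DataInputStream, InputStream}

object LengthPrefixedFormat {
  // Assumed (hypothetical) layout: [count: Int][len: Int][payload]...[len: Int][payload]
  def readAll(is: InputStream): List[Array[Byte]] = {
    val in = new DataInputStream(is)
    val count = in.readInt() // header: number of records, big-endian
    List.fill(count) {
      val payload = new Array[Byte](in.readInt()) // frame prefix: record length in bytes
      in.readFully(payload)                       // throws EOFException if the input is truncated
      payload
    }
  }
}
```

Calling LengthPrefixedFormat.readAll on a stream holding a count of 2 followed by two length-prefixed payloads returns those two payloads in order. A format with checksums, compression, or variable-width fields would need correspondingly more state and logic.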

Write Binary files

Binary writes are supported on SCollection[Array[Byte]] with the saveAsBinaryFile method:

import com.spotify.scio.values.SCollection

val byteArrays: SCollection[Array[Byte]] = ???
byteArrays.saveAsBinaryFile("gs://<output-dir>")

Static header and footer arguments are provided, along with the framing parameters framePrefix and frameSuffix, which are functions applied to each record. In this example, we write a magic number in the header and another in the footer, prefix each record with its length in bytes as a 4-byte integer, and follow each record with a single zero byte.

import com.spotify.scio.values.SCollection
import java.nio.ByteBuffer

// Encode an Int as 4 big-endian bytes
def intToPaddedArray(i: Int): Array[Byte] = ByteBuffer.allocate(4).putInt(i).array()

val byteArrays: SCollection[Array[Byte]] = ???
byteArrays.saveAsBinaryFile(
  "gs://<output-dir>",
  header = Array(1, 2, 3),
  footer = Array(4, 5, 6),
  framePrefix = arr => intToPaddedArray(arr.length),
  frameSuffix = _ => Array(0)
)
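
The byte layout these arguments describe can be sketched in plain Scala without Scio. This assumes each output file consists of the header, then every record wrapped as prefix, payload, suffix, then the footer. FramingSketch and fileLayout are hypothetical names used for illustration only, not Scio functions.

```scala
import java.nio.ByteBuffer

object FramingSketch {
  // Encode an Int as 4 big-endian bytes
  def intToPaddedArray(i: Int): Array[Byte] = ByteBuffer.allocate(4).putInt(i).array()

  // Hypothetical helper: the bytes one output file would contain under the assumed layout
  def fileLayout(records: Seq[Array[Byte]]): Array[Byte] = {
    val header = Array[Byte](1, 2, 3)
    val footer = Array[Byte](4, 5, 6)
    // Each frame is: 4-byte length prefix, payload, single zero-byte suffix
    val frames = records.flatMap(r => intToPaddedArray(r.length) ++ r ++ Array[Byte](0)).toArray
    header ++ frames ++ footer
  }
}
```

For a single two-byte record (7, 8), fileLayout yields the bytes 1, 2, 3, 0, 0, 0, 2, 7, 8, 0, 4, 5, 6. A reader for this format would skip the header, then repeatedly read a length, a payload, and a suffix byte until only the footer remains.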

See also the object file format, which saves binary data in an Avro container.