Binary
Read Binary files
See "Read entire file as binary" for reading an entire file as a single binary record.
Binary reads are supported via the binaryFile method, which takes a BinaryFileReader instance that can parse the underlying binary file format.
import com.spotify.scio.ScioContext
import com.spotify.scio.io.BinaryIO.BinaryFileReader
val sc: ScioContext = ???
val myBinaryFileReader: BinaryFileReader = ???
sc.binaryFile("gs://<input-dir>", myBinaryFileReader)
The complexity of the reader is determined by the complexity of the input format. See BinaryInOut for a fully-worked example.
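As a rough illustration of what such a reader has to do, the sketch below parses a hypothetical format in which each record is preceded by its length as a 4-byte big-endian integer. It uses only java.io and java.nio and does not implement Scio's BinaryFileReader trait; the names LengthPrefixedParser, readRecord and readAll are made up for this example. Parsing logic of this kind is what a BinaryFileReader implementation would need to contain.

```scala
import java.io.{DataInputStream, InputStream}
import java.nio.ByteBuffer

// Hypothetical parser for an assumed format: [4-byte big-endian length][payload] repeated.
object LengthPrefixedParser {

  // Reads one record, or returns None when the stream ends cleanly
  // before the start of a new record.
  def readRecord(in: DataInputStream): Option[Array[Byte]] = {
    val first = in.read() // -1 signals end of stream
    if (first < 0) None
    else {
      // Read the remaining three length bytes and decode the length.
      val rest = new Array[Byte](3)
      in.readFully(rest)
      val length = ByteBuffer.wrap(Array(first.toByte) ++ rest).getInt
      // Read exactly `length` payload bytes; readFully throws EOFException if truncated.
      val record = new Array[Byte](length)
      in.readFully(record)
      Some(record)
    }
  }

  // Reads records until the stream is exhausted.
  def readAll(is: InputStream): List[Array[Byte]] = {
    val in = new DataInputStream(is)
    Iterator.continually(readRecord(in)).takeWhile(_.isDefined).map(_.get).toList
  }
}
```

A format with a file header, footer, or per-record checksums would need extra state and validation on top of this, which is where most of the reader's complexity comes from.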
Write Binary files
Binary writes are supported on SCollection[Array[Byte]] with the saveAsBinaryFile method:
import com.spotify.scio.values.SCollection
val byteArrays: SCollection[Array[Byte]] = ???
byteArrays.saveAsBinaryFile("gs://<output-dir>")
Static header and footer arguments are provided, along with the framing parameters framePrefix and frameSuffix. In this example, the header and the footer each hold a magic number, every record is prefixed with its length as a 4-byte integer, and every record is followed by a single zero byte.
import com.spotify.scio.values.SCollection
import java.nio.ByteBuffer
// Encode an Int as a 4-byte big-endian array
def intToPaddedArray(i: Int): Array[Byte] = ByteBuffer.allocate(4).putInt(i).array()
val byteArrays: SCollection[Array[Byte]] = ???
byteArrays.saveAsBinaryFile(
  "gs://<output-dir>",
  header = Array(1, 2, 3),
  footer = Array(4, 5, 6),
  framePrefix = arr => intToPaddedArray(arr.length),
  frameSuffix = _ => Array(0)
)
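To make the resulting byte layout concrete, the following sketch assembles the same bytes in plain Scala, without Scio. It assumes that each output file contains the header, then every record wrapped in its prefix and suffix, then the footer; FramingSketch and frameRecords are hypothetical names that only mirror the parameters above and are not part of the Scio API.

```scala
import java.nio.ByteBuffer

object FramingSketch {
  // Encode an Int as a 4-byte big-endian array.
  def intToPaddedArray(i: Int): Array[Byte] = ByteBuffer.allocate(4).putInt(i).array()

  // Hypothetical helper. Assumed layout:
  // header, then (prefix + record + suffix) for each record, then footer.
  def frameRecords(
    records: Seq[Array[Byte]],
    header: Array[Byte],
    footer: Array[Byte],
    framePrefix: Array[Byte] => Array[Byte],
    frameSuffix: Array[Byte] => Array[Byte]
  ): Array[Byte] = {
    val framed = records.foldLeft(header) { (acc, r) =>
      acc ++ framePrefix(r) ++ r ++ frameSuffix(r)
    }
    framed ++ footer
  }
}

// One two-byte record, framed with the same settings as the example above.
val framedBytes: Array[Byte] = FramingSketch.frameRecords(
  Seq(Array[Byte](10, 20)),
  header = Array[Byte](1, 2, 3),
  footer = Array[Byte](4, 5, 6),
  framePrefix = arr => FramingSketch.intToPaddedArray(arr.length),
  frameSuffix = _ => Array[Byte](0)
)
```

Under that assumption, the single record (10, 20) yields the bytes 1, 2, 3, 0, 0, 0, 2, 10, 20, 0, 4, 5, 6: the header, the 4-byte length, the payload, the zero suffix, and the footer.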
See also the object file format, which saves binary data in an Avro container.