ReadFiles
Scio supports reading the files named by an SCollection[String] of file paths into various formats.
Read as text lines
Read each file as String text lines via readFiles:
import com.spotify.scio.ScioContext
import com.spotify.scio.values.SCollection
val sc: ScioContext = ???
val paths: SCollection[String] = ???
val lines: SCollection[String] = paths.readFiles
Read entire file as String
Read the entire contents of each file as a single String via readFilesAsString:
import com.spotify.scio.ScioContext
import com.spotify.scio.values.SCollection
val sc: ScioContext = ???
val paths: SCollection[String] = ???
val fileContents: SCollection[String] = paths.readFilesAsString
Read as binary
Read each file as a binary Array[Byte] via readFilesAsBytes:
import com.spotify.scio.ScioContext
import com.spotify.scio.values.SCollection
val sc: ScioContext = ???
val paths: SCollection[String] = ???
val fileBytes: SCollection[Array[Byte]] = paths.readFilesAsBytes
Read as a custom type
Read each file into a custom type by passing readFiles a user-defined function from FileIO.ReadableFile to the output type:
import com.spotify.scio.ScioContext
import com.spotify.scio.values.SCollection
import org.apache.beam.sdk.{io => beam}
case class A(i: Int, s: String)
val sc: ScioContext = ???
val paths: SCollection[String] = ???
val userFn: beam.FileIO.ReadableFile => A = ???
val records: SCollection[A] = paths.readFiles(userFn)
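As an illustration of what such a user-defined function might look like, the sketch below fills in userFn using Beam's ReadableFile accessors. The helper toA, the wrapper readAsA, and the choice to store the content length and file name in A are all hypothetical, made up for this example; a real function would parse the contents into whatever the output type requires.

```scala
import com.spotify.scio.values.SCollection
import org.apache.beam.sdk.{io => beam}

case class A(i: Int, s: String)

// Hypothetical pure helper: builds an A from a file name and its contents.
// Kept separate from the Beam types so it can be unit tested on its own.
def toA(name: String, contents: String): A = A(contents.length, name)

// Sketch of a user-defined function: reads the whole file as UTF-8 text
// and pairs its character count with the file name.
val userFn: beam.FileIO.ReadableFile => A = { file =>
  val name = file.getMetadata.resourceId.getFilename
  val contents = file.readFullyAsUTF8String()
  toA(name, contents)
}

// Hypothetical wrapper showing how the function is applied to a collection of paths.
def readAsA(paths: SCollection[String]): SCollection[A] =
  paths.readFiles(userFn)
```

Because readFullyAsUTF8String loads the whole file into memory, this approach assumes each file is small enough to fit comfortably on a worker.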
Read with a Beam transform
If there is an existing Beam PTransform from a PCollection of FileIO.ReadableFile to a PCollection of A (for example, Beam's TextIO.readFiles()), it can be reused via another variant of readFiles:
import com.spotify.scio.ScioContext
import com.spotify.scio.values.SCollection
import org.apache.beam.sdk.{io => beam}
import org.apache.beam.sdk.transforms.PTransform
import org.apache.beam.sdk.values.PCollection
case class A(i: Int, s: String)
val sc: ScioContext = ???
val paths: SCollection[String] = ???
val userTransform: PTransform[PCollection[beam.FileIO.ReadableFile], PCollection[A]] = ???
val records: SCollection[A] = paths.readFiles(userTransform)
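For a concrete case, the sketch below plugs in Beam's TextIO.readFiles(), the transform mentioned above, so the output type is String rather than a custom A. The names textTransform and readLines are hypothetical and exist only for this example.

```scala
import com.spotify.scio.values.SCollection
import org.apache.beam.sdk.{io => beam}
import org.apache.beam.sdk.transforms.PTransform
import org.apache.beam.sdk.values.PCollection

// Beam's TextIO.readFiles() turns each ReadableFile into its text lines.
val textTransform: PTransform[PCollection[beam.FileIO.ReadableFile], PCollection[String]] =
  beam.TextIO.readFiles()

// Hypothetical wrapper: applies the existing Beam transform to a collection of paths.
def readLines(paths: SCollection[String]): SCollection[String] =
  paths.readFiles(textTransform)
```

The same pattern should apply to any other Beam transform that consumes FileIO.ReadableFile, provided a Coder is available for its output type.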