Sources are the places from which the DataSet API expects to get its data. The data could be in the form of a file or come from Java collections. This is the second step in the anatomy of a Flink program. The DataSet API supports a number of pre-implemented data source functions. It also supports writing custom data source functions, so anything that is not supported out of the box can be programmed easily. First, let's try to understand the built-in source functions.
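As a rough illustration of the two kinds of sources mentioned above, the following sketch creates one DataSet from a Java collection and another from a text file. It assumes the Flink batch (DataSet) API is on the classpath; the class name SourceSketch and the file path are hypothetical and only serve the example.

```java
import java.util.Arrays;

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

// Hypothetical class name, used only for this sketch.
public class SourceSketch {
    public static void main(String[] args) throws Exception {
        // Obtain the execution environment for a batch program.
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Source backed by an in-memory Java collection.
        DataSet<String> fromCollection =
                env.fromCollection(Arrays.asList("alpha", "beta", "gamma"));

        // Source backed by a file, read line by line as strings.
        // The path is an assumption; the file is not read here because
        // no sink is attached to this DataSet.
        DataSet<String> fromFile = env.readTextFile("file:///tmp/input.txt");

        // print() attaches a sink to the collection-backed DataSet and
        // triggers execution of that part of the program.
        fromCollection.print();
    }
}
```

Sources are evaluated lazily, so declaring the file-backed DataSet does not touch the file until a sink consumes it.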
Flink supports reading data from files. It reads data line by line and returns it as strings. The following are built-in functions you can use to read data:
readTextFile(String path): This reads data from a file specified in the path. By default it uses TextInputFormat and reads strings line by line.
readTextFileWithValue(String path): This reads data from a file specified in the path. It returns StringValues. StringValues are mutable strings.
readCsvFile(String path): This reads data from comma-separated files. It returns the Java POJOs or...