As we have discussed different sources for Flink, now we will discuss the processing of source data. Depending upon the type of data sources, that is bounded or unbounded, processing API's are available in Flink. DataStream API is available for unbounded data source and DataSet API is available for bounded data sources.
Before moving to transformation functions, let's have a look at an example of DataStream. The following code snippet is a word count example implemented using DataStream API:
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); DataStream<Tuple2<String, Integer>> dataStream = env .socketTextStream("localhost", 9999) .flatMap(new Splitter()) .keyBy(0) .timeWindow(Time.seconds(5)) .sum(1);
StreamExecutionEnviornment
is used to create a DataStream Object. In the previous example, the environment first connects with the source of data using a socket. Then we apply flatMap
function which...