When tuning an application with custom operators, be aware that some common Java coding practices act as hidden performance drains and should be avoided wherever possible. We mentioned one earlier in passing: inadvertently including a large number of operator fields (or fields whose values are large) in the state by omitting the transient modifier. As the state grows, serializing it for every checkpoint can become a hidden bottleneck. Generally speaking, if a field is cleared in every streaming or application window, it does not need to be part of the state.
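To make the effect of the transient modifier concrete, here is a minimal sketch that compares the serialized size of two otherwise identical classes. The class names (WindowedCounter, LeakyWindowedCounter, StateSize) are hypothetical and not part of Apex, and plain JDK serialization stands in for the platform's checkpoint serializer purely so that the example runs with the JDK alone.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical operator-like class; not an Apex API.
class WindowedCounter implements Serializable {
    private static final long serialVersionUID = 1L;

    // Running total that must survive a restart, so it stays in the state.
    long total;

    // Scratch buffer cleared at the end of every window; transient keeps it
    // out of the serialized state.
    transient List<String> windowBuffer = new ArrayList<>();

    void process(String tuple) {
        windowBuffer.add(tuple);
        total++;
    }

    void endWindow() {
        windowBuffer.clear();
    }
}

// The same class with the transient modifier forgotten (assumption: for
// comparison only). The buffer is now serialized along with the total.
class LeakyWindowedCounter implements Serializable {
    private static final long serialVersionUID = 1L;

    long total;
    List<String> windowBuffer = new ArrayList<>();

    void process(String tuple) {
        windowBuffer.add(tuple);
        total++;
    }

    void endWindow() {
        windowBuffer.clear();
    }
}

// Helper that measures how many bytes JDK serialization produces.
class StateSize {
    static int serializedSize(Serializable state) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(state);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }
}
```

If both objects receive the same tuples in the middle of a window, StateSize.serializedSize reports a much smaller figure for WindowedCounter, because only the total is written. The gap widens as the buffer grows.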
A second practice is the per-tuple use of reflection (or of Maps and other Java collections), which is expensive; this is often done when the type of the tuple is not known at compile time, so plain Object is used. For such cases, Apex provides a utility class called PojoUtils, which can be used to create custom getter and setter methods...
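The underlying idea of resolving an accessor once and reusing it for every tuple can be sketched with the JDK alone. Note that this sketch does not use PojoUtils: a java.lang.invoke.MethodHandle stands in for the generated getter, and the names Reading, AccessorFactory, and SumOperator are hypothetical, chosen only for illustration.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.ToIntFunction;

// Hypothetical tuple type; in a real application it may be known only by name.
class Reading {
    private final int value;

    Reading(int value) {
        this.value = value;
    }

    public int getValue() {
        return value;
    }
}

// Hypothetical helper (not an Apex class): looks the getter up once and
// returns a reusable accessor, so no lookup happens per tuple.
class AccessorFactory {
    static ToIntFunction<Object> intGetter(Class<?> tupleClass, String methodName) {
        try {
            MethodHandle handle = MethodHandles.lookup()
                .findVirtual(tupleClass, methodName, MethodType.methodType(int.class));
            return tuple -> {
                try {
                    return (int) handle.invoke(tuple);
                } catch (Throwable t) {
                    throw new IllegalStateException("Getter call failed", t);
                }
            };
        } catch (NoSuchMethodException | IllegalAccessException e) {
            throw new IllegalArgumentException("No accessible int getter: " + methodName, e);
        }
    }
}

// Operator-like class (assumption: setup and process mimic an operator's
// lifecycle callbacks, but this is not the Apex Operator interface).
class SumOperator {
    // Rebuilt in setup, so it does not need to be part of the state.
    private transient ToIntFunction<Object> getter;

    long sum;

    // Called once before any tuples arrive: the expensive lookup happens here.
    void setup(Class<?> tupleClass, String getterName) {
        getter = AccessorFactory.intGetter(tupleClass, getterName);
    }

    // Called for every tuple: only the prepared accessor is invoked.
    void process(Object tuple) {
        sum += getter.applyAsInt(tuple);
    }
}
```

A caller would invoke setup(Reading.class, "getValue") once and then pass each tuple to process. The point of the design is that the name-based lookup is paid a single time, not on every tuple.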