Apache Spark Graph Processing

Performance optimization


In addition to the sendMsg and mergeMsg functions, aggregateMessages can take an optional third argument, tripletFields, of type TripletFields, which indicates what data sendMsg accesses in EdgeContext. The main reason for specifying this information explicitly is to help GraphX optimize the performance of the aggregateMessages operation.
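
A minimal Scala sketch of passing all three arguments, assuming spark-graphx is on the classpath and a local SparkContext is acceptable. The object AggregateWithTripletFields, the helper inNeighborSum, and the toy graph are hypothetical. It uses TripletFields.Src (source and edge fields, no destination), a GraphX option not shown in this excerpt, because sendMsg here reads only the source attribute.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, TripletFields, VertexRDD}

object AggregateWithTripletFields {
  // Hypothetical helper: for every vertex, sum the attributes of its in-neighbors.
  def inNeighborSum(graph: Graph[Double, Int]): VertexRDD[Double] =
    graph.aggregateMessages[Double](
      ctx => ctx.sendToDst(ctx.srcAttr), // sendMsg: reads only the source attribute
      (a, b) => a + b,                   // mergeMsg: adds messages arriving at the same vertex
      TripletFields.Src                  // only source (and edge) fields need to be populated
    )

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[1]").setAppName("triplet-fields-demo")
    val sc = new SparkContext(conf)
    try {
      // Toy graph: 1 -> 2, 1 -> 3, 2 -> 3
      val vertices = sc.parallelize(Seq((1L, 1.0), (2L, 2.0), (3L, 4.0)))
      val edges = sc.parallelize(Seq(Edge(1L, 2L, 0), Edge(1L, 3L, 0), Edge(2L, 3L, 0)))
      val graph = Graph(vertices, edges)
      // Vertex 1 has no in-neighbors, so it receives no message and is absent from the result.
      inNeighborSum(graph).collect().sortBy(_._1).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

On this toy graph the program prints (2,1.0) and (3,3.0). Omitting the third argument would fall back to TripletFields.All and give the same result, only without telling GraphX that the destination attributes are unused.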

In fact, TripletFields represents a subset of the fields of EdgeTriplet (the source attribute, the destination attribute, and the edge attribute), and it enables GraphX to populate only those fields that are necessary.
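
Each TripletFields value boils down to three boolean flags. The short Scala sketch that follows, assuming spark-graphx is on the classpath, reads the public useSrc, useDst, and useEdge fields of GraphX's TripletFields class to show which subset of EdgeTriplet an option requests; the object TripletFieldsFlags and its describe helper are hypothetical.

```scala
import org.apache.spark.graphx.TripletFields

object TripletFieldsFlags {
  // Hypothetical helper: summarize which EdgeTriplet fields a TripletFields value asks GraphX to populate.
  def describe(tf: TripletFields): String =
    s"src=${tf.useSrc}, dst=${tf.useDst}, edge=${tf.useEdge}"

  def main(args: Array[String]): Unit = {
    println("All: " + describe(TripletFields.All)) // All: src=true, dst=true, edge=true
    println("Dst: " + describe(TripletFields.Dst)) // Dst: src=false, dst=true, edge=true
  }
}
```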

The default value is TripletFields.All, which means that the sendMsg function may access any of the fields of EdgeContext. Passing a narrower value tells GraphX that sendMsg needs only part of EdgeContext, so that a more efficient join strategy can be used. The possible options for TripletFields are listed as follows:

  • TripletFields.All: This option exposes all the fields (source, edge, and destination).

  • TripletFields.Dst: This one exposes the destination and edge fields but not the source field...
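
To make the choice of option concrete, here is a minimal Scala sketch, assuming spark-graphx is on the classpath and a local SparkContext; the object MaxOutNeighbor, its helper, and the toy data are hypothetical. The sendMsg function reads only ctx.dstAttr and sends it to the source vertex, so each vertex learns the largest attribute among the vertices it points to. Because the source attribute is never read, TripletFields.Dst is sufficient.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, TripletFields, VertexRDD}

object MaxOutNeighbor {
  // Hypothetical helper: for each vertex, the largest attribute among the vertices it points to.
  def maxOutNeighbor(graph: Graph[Int, String]): VertexRDD[Int] =
    graph.aggregateMessages[Int](
      ctx => ctx.sendToSrc(ctx.dstAttr), // sendMsg: reads the destination attribute only
      (a, b) => math.max(a, b),          // mergeMsg: keep the larger of two messages
      TripletFields.Dst                  // destination and edge fields; source is not needed
    )

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[1]").setAppName("dst-fields-demo"))
    try {
      // Toy graph: 1 -> 2, 1 -> 3, 2 -> 3
      val vertices = sc.parallelize(Seq((1L, 10), (2L, 30), (3L, 20)))
      val edges = sc.parallelize(Seq(
        Edge(1L, 2L, "follows"), Edge(1L, 3L, "follows"), Edge(2L, 3L, "follows")))
      val graph = Graph(vertices, edges)
      // Vertex 3 has no outgoing edges, so it receives no message.
      maxOutNeighbor(graph).collect().sortBy(_._1).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

On this toy graph the program prints (1,30) and (2,20). The flag should match what sendMsg actually touches: if sendMsg also read ctx.srcAttr, it would be relying on a field that TripletFields.Dst tells GraphX it does not need, and TripletFields.All would be the safe choice.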