In Chapter 8, Processing Massive Datasets with Parallel Streams - The Map and Reduce Model, you learned how to implement a search tool that uses an inverted index to find the documents similar to an input query. This data structure makes the search operation easier and faster, but there will be situations where you have to search a large dataset and you won't have an inverted index to help you. In these cases, you have to process every element of the dataset to get the correct results. In this example, you will see one of these situations and how the reduce() method of the Stream API can help you.
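As a rough illustration of the idea, the following minimal sketch scans every element of a small in-memory list (no index) and uses reduce() to keep a single result. The Product class, its title and salesrank fields, the bestMatch() method, and the sample data are all hypothetical names introduced only for this sketch; they are not the classes developed in this chapter, and the matching rule (a case-insensitive substring test on the title) is an assumption.

```java
import java.util.List;
import java.util.Optional;

public class FullScanReduce {

    // Hypothetical product type, introduced only for this sketch.
    static final class Product {
        final String title;
        final int salesrank;

        Product(String title, int salesrank) {
            this.title = title;
            this.salesrank = salesrank;
        }
    }

    // Visits every product (there is no index to narrow the search) and
    // reduces the matching ones to the product with the lowest salesrank.
    // The accumulator is associative, so it is safe in a parallel stream.
    static Optional<Product> bestMatch(List<Product> products, String query) {
        String needle = query.toLowerCase();
        return products.parallelStream()
                .filter(p -> p.title.toLowerCase().contains(needle))
                .reduce((a, b) -> a.salesrank <= b.salesrank ? a : b);
    }

    public static void main(String[] args) {
        // Made-up sample data, not taken from the real dataset.
        List<Product> products = List.of(
                new Product("Java Concurrency Basics", 1200),
                new Product("Cooking for Two", 300),
                new Product("Advanced Java Streams", 450));

        bestMatch(products, "java")
                .ifPresent(p -> System.out.println(p.title));
    }
}
```

The single-argument version of reduce() returns an Optional because the filtered stream may be empty, which is what happens when no title matches the query.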
To implement this example, you will use a subset of the Amazon product co-purchasing network metadata, which contains information about 548,552 products sold by Amazon. For each product, it includes the title, the salesrank, and the lists of similar products, categories, and reviews. You can download this dataset from https://snap.stanford...