In this recipe, we will learn how to use Pig scripts to analyze web log data.
To perform this recipe, you should have a running Hadoop cluster as well as the latest version of Pig installed on it.
In the previous chapter, we saw how to analyze web logs using a MapReduce program. In this recipe, we will look at how to use Pig scripts to analyze web log data. Let's consider two use cases:
Here is a sample of web log data:
106.208.17.105 - - [12/Nov/2015:21:20:32 -0800] "GET /tutorials/mapreduce/advanced-map-reduce-examples-1.html HTTP/1.1" 200 0 "https://www.google.co.in/" "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36"
60.250.32.153 - - [12/Nov/2015:21:42:14 -0800] "GET /tutorials/elasticsearch/install-elasticsearch-kibana-logstash-on-windows.html HTTP/1.1" 304 0 - "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490...
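Before tackling the use cases, it helps to see how log lines like the ones above can be loaded and parsed in Pig. The following is a minimal sketch, assuming the logs are stored at /data/weblogs on HDFS (a hypothetical path); it uses REGEX_EXTRACT_ALL to pull the client IP, timestamp, request, status code, and response size out of each line, and then counts hits per IP:

```
-- Load each log line as a single chararray field
logs = LOAD '/data/weblogs' USING TextLoader() AS (line:chararray);

-- Parse the combined log format with a regular expression
parsed = FOREACH logs GENERATE
    FLATTEN(REGEX_EXTRACT_ALL(line,
        '^(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] "(\\S+) (\\S+) [^"]+" (\\d+) (\\d+).*'))
    AS (ip:chararray, ts:chararray, method:chararray,
        url:chararray, status:chararray, bytes:chararray);

-- Count requests per client IP
by_ip = GROUP parsed BY ip;
hits  = FOREACH by_ip GENERATE group AS ip, COUNT(parsed) AS num_requests;

DUMP hits;
```

Lines that do not match the pattern (for example, a "-" in place of the byte count, as in the second sample entry) would yield nulls from REGEX_EXTRACT_ALL, so in practice you may want to FILTER those out or loosen the regex before grouping.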