Web logs are data generated by the web servers that run a website. This use case applies to any company that hosts a website and wants to learn more about the site's performance and how customers behave on it.
To perform this recipe, you should have a Hadoop cluster up and running. I have uploaded some sample web log data to
https://github.com/deshpandetanmay/hadoop-real-world-cookbook/blob/master/data/mylog.txt.
Before jumping into the solution, let's first try to understand the problem statement:
Many companies run their businesses through their websites, so a site's performance directly affects sales and profitability. Web servers generally log information such as the user, the browser, and the client IP address for every request. We can use this information to make browsing smoother for users, which in turn helps increase profitability.
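To make the kind of information a web server logs more concrete, here is a minimal sketch that pulls the IP address, request, status code, and browser (user agent) out of a single log line. It assumes the lines follow the Apache/NCSA Combined Log Format; the actual layout of mylog.txt may differ, and the sample line, the COMBINED pattern, and the parse_line helper are hypothetical, for illustration only.

```python
import re

# Assumption: log lines use the Apache/NCSA Combined Log Format.
# Adjust this pattern if the real data (e.g. mylog.txt) is laid out differently.
COMBINED = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of named log fields, or None if the line does not match."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None


# Hypothetical sample line, not taken from the recipe's data set.
sample = ('192.168.1.10 - - [10/Oct/2015:13:55:36 +0530] '
          '"GET /index.html HTTP/1.1" 200 2326 '
          '"http://example.com/" "Mozilla/5.0 (Windows NT 10.0)"')

record = parse_line(sample)
print(record["ip"], record["status"], record["agent"])
```

In a Hadoop job, the same per-line parsing would typically happen in the map phase, with the extracted fields (for example, IP address or user agent) emitted as keys for aggregation.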