Mastering RabbitMQ

By: Yusuf Aytas, Emrah Ayanoglu, Dotan Nahum

Overview of this book

RabbitMQ is one of the most powerful open source message brokers and is widely used at tech companies such as Mozilla, VMware, Google, and AT&T. It offers many easy-to-manage features for controlling and managing messaging, backed by strong community support. Scalability is one of the major problems of modern systems, and messaging with RabbitMQ is a main part of the solution. This book explains and demonstrates the RabbitMQ server in detail. It provides many real-world examples and advanced solutions for tackling scalability issues. You’ll begin with the installation and configuration of the RabbitMQ server. Next, you’ll study the major problems that a server faces, including scalability and high availability, and work toward solutions for both using RabbitMQ's mechanisms. Following on from this, you’ll design and develop your own plugins using the Erlang language and RabbitMQ’s internal API. This knowledge will help you get started with managing and monitoring messages, tools, and applications. You’ll also gain an understanding of the security and integrity of the messaging facilities that RabbitMQ provides. In the last few chapters, you will build and keep track of your clients (senders and receivers) using Java, Python, and C#.

Introducing the web scraper


Let's review a simple web scraper architecture:

Scheduler

The web changes often. It is a huge and dynamic beast. The scheduler is responsible for making sure that the scraper always holds data that is fresh and not stale. It does so by deciding the rate at which each website or page is scraped; in other words, when the next scrape of that source is going to happen.

In reality, you would want the scheduler to feed from a persistent data store that holds all sources and their upcoming scraping time.
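
As a minimal sketch of such a store, the following uses Python's built-in sqlite3 module. SQLite stands in here for whatever persistent store you choose, and the table name `sources`, the column names, and the helper functions are all hypothetical names picked for illustration; they are not taken from the book.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the store. ':memory:' is only for experimenting;
    pass a file path to actually persist the schedule."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sources ("
        " url TEXT PRIMARY KEY,"
        " next_scrape_at INTEGER NOT NULL)"  # Unix timestamp in seconds
    )
    return conn


def add_source(conn, url, next_scrape_at):
    """Register a source together with its upcoming scrape time."""
    conn.execute(
        "INSERT OR REPLACE INTO sources (url, next_scrape_at) VALUES (?, ?)",
        (url, next_scrape_at),
    )
    conn.commit()


def due_sources(conn, now):
    """Return the URLs whose scrape time has arrived, most overdue first."""
    rows = conn.execute(
        "SELECT url FROM sources WHERE next_scrape_at <= ? "
        "ORDER BY next_scrape_at",
        (now,),
    )
    return [url for (url,) in rows]


def reschedule(conn, url, next_scrape_at):
    """Record when the given source should be scraped next."""
    conn.execute(
        "UPDATE sources SET next_scrape_at = ? WHERE url = ?",
        (next_scrape_at, url),
    )
    conn.commit()
```

A scheduler loop built on this would periodically call `due_sources` with the current time, hand each URL over for scraping, and then call `reschedule` with the next scrape time for that source.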

For example, you could hold a record that specifies that the website acme.org has to be scraped once every 5 minutes. You could even add some sophistication: acme.org has to be scraped every 5 minutes during the day, but at night, in order to save resources, a 30-minute cycle would be good enough.
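
One way to express the day and night policy for acme.org is a small function that maps a point in time to a scraping interval. This is only a sketch: the text does not say where day ends and night begins, so the 06:00 and 22:00 boundaries below are assumptions, and the function names are hypothetical.

```python
from datetime import datetime, time, timedelta

# Assumed boundaries; the policy above does not define "day" and "night".
DAY_START = time(6, 0)
DAY_END = time(22, 0)


def scrape_interval(at: datetime) -> timedelta:
    """Return 5 minutes during the assumed day window, 30 minutes otherwise."""
    if DAY_START <= at.time() < DAY_END:
        return timedelta(minutes=5)
    return timedelta(minutes=30)


def next_scrape_time(last_scrape: datetime) -> datetime:
    """Compute the next scrape from the time of the previous one."""
    return last_scrape + scrape_interval(last_scrape)
```

With these assumptions, a scrape finished at noon is followed by one five minutes later, while a scrape finished at 23:00 is followed by one at 23:30.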

Whatever your scheduling policy is, it is encapsulated within the Scheduler domain.

Scraper

A scraper is...