Drupal Search Engine Optimization

By: Ric Shreves

Overview of this book

Drupal is a free and open-source content management system and content management framework written in PHP and distributed under the GNU General Public License. It is used as a back-end system for at least 1.5% of all websites worldwide, ranging from personal blogs to corporate, political, and government sites. SEO, or Search Engine Optimization, is the set of processes and techniques by which you optimize the content and style of your site in order to attract more visitors. Drupal SEO will help you develop and execute an effective search engine optimization strategy for your site. From planning to implementation, the book covers best practices in contemporary SEO. In Drupal SEO you will learn how to develop a dynamic and productive SEO campaign. Covering both the basics of campaign development and the daily work it takes to stay competitive in search, this book will show you how to produce a distinct and appropriate strategy for your site. In particular, you will learn key phrase selection, competitor analysis, and how to lay the correct groundwork for your SEO campaign. Drupal SEO will then show you how to supercharge your site by finding the right combination of extensions. You will also be given a guided tour of key SEO services, such as Google Webmaster Tools and Bing Webmaster Tools, which will help you implement a progressive and effective link-building campaign. Finally, you will learn expert tips and tricks for building SEO-effective content that will take your site from invisible to unmissable with little effort.
Table of Contents (11 chapters)

How search engines assess sites


All search engines function in approximately the same fashion: a software agent, known as a bot, spider, or crawler, visits a page, gathers the content, and stores it in the search engine's data repository. Once the information is in the repository, it is indexed. The crawling and indexing processes are constant and ongoing. Each of the major search engines maintains multiple crawlers that work tirelessly to refresh its index. The spiders find new pages through a variety of methods, typically including XML sitemaps, URLs already in the index, links discovered while indexing other pages, and URLs submitted for inclusion by users. How frequently they visit a specific site, and how deeply they spider the site on each visit, varies.
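To make the crawl-store-index cycle concrete, here is a minimal sketch in Python. It is a toy, not how any real search engine is built: the "web" is a hypothetical in-memory dictionary standing in for HTTP fetches, the seed list plays the role of sitemap or submitted URLs, and the names `crawl` and `build_index` are illustrative only. It shows how a page that nothing submits directly (`/contact`) is still discovered by following links.

```python
from collections import deque


def crawl(web, seeds, max_pages=100):
    """Breadth-first crawl of a simulated web, starting from seed URLs.

    `web` is a hypothetical stand-in for real HTTP fetches: it maps each URL
    to a (text, outgoing_links) tuple. Returns the repository, a dict of
    URL -> stored page text.
    """
    repository = {}
    queue = deque(seeds)  # seeds play the role of sitemap or submitted URLs
    while queue and len(repository) < max_pages:
        url = queue.popleft()
        if url in repository or url not in web:
            continue  # already stored, or a dead link
        text, links = web[url]
        repository[url] = text  # gather the content and store it
        queue.extend(links)  # links found on the page lead to new pages
    return repository


def build_index(repository):
    """Index the repository: map each lowercase word to the set of URLs containing it."""
    index = {}
    for url, text in repository.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
    return index


# Hypothetical three-page site; /contact is reachable only through a link,
# and /services is a dead link.
web = {
    "/": ("Drupal hosting and support", ["/about", "/services"]),
    "/about": ("About our Drupal team", ["/contact"]),
    "/contact": ("Contact the team", []),
}
repository = crawl(web, seeds=["/"])
index = build_index(repository)
print(sorted(repository))       # ['/', '/about', '/contact']
print(sorted(index["drupal"]))  # ['/', '/about']
```

Real crawlers also schedule revisits and decide how deep to go on each visit; the `max_pages` cap is only a crude stand-in for that crawl budget.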

When a user visits the search engine and runs a search, the search engine retrieves from its index a list of pages that are relevant to the query and displays that list to the user. The order in which pages appear on the search results page is determined by each search engine's own ranking criteria, and the ranking algorithm each engine uses is a closely guarded secret.
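Because the real ranking algorithms are secret, the sketch below only illustrates the general shape of query processing: retrieve the pages that match, then order them by some score. The scoring rule here (a page must contain every query term, and pages are ranked by how often those terms occur) is a deliberately simple term-frequency stand-in chosen for illustration, not the method of Google, Bing, or any other engine. The `search` function and sample pages are hypothetical.

```python
from collections import Counter


def search(repository, query):
    """Return URLs relevant to `query`, best match first.

    A page counts as relevant if it contains every query term; relevant pages
    are then ordered by the total number of occurrences of the query terms.
    This term-frequency score is only a stand-in for a real, secret algorithm.
    """
    terms = query.lower().split()
    scored = []
    for url, text in repository.items():
        counts = Counter(text.lower().split())
        if all(counts[t] > 0 for t in terms):
            score = sum(counts[t] for t in terms)
            scored.append((score, url))
    # Highest score first; ties are broken alphabetically by URL for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored]


# Hypothetical stored pages (URL -> text), as a crawler might have collected them.
repository = {
    "/drupal-seo": "drupal seo guide drupal modules for seo",
    "/drupal-themes": "drupal themes and layout",
    "/wordpress-seo": "seo tips for wordpress",
}
print(search(repository, "drupal seo"))  # ['/drupal-seo']
print(search(repository, "drupal"))      # ['/drupal-seo', '/drupal-themes']
```

Production engines weigh many more signals (links pointing at a page, where terms appear, freshness, and so on), which is exactly why the same query can rank pages differently on different engines.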

The search engine's crawler is primarily interested in certain types of information on the page, particularly the URL, the text, and the links. Formatting is not indexed. Images and other media are indexed by most search engines, but to varying depths. Some types of media, such as Flash or attached files, are rarely indexed, though there are exceptions.
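The following sketch approximates that "spider's view" of a page using Python's standard `html.parser` module: it keeps the URL, the visible text, the link targets, and image `alt` text, and throws away formatting markup, inline styles, and scripts. It is an illustrative approximation under those assumptions, not a description of how Googlebot or any other crawler actually parses pages; the class `SpiderView`, the function `spider_view`, and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class SpiderView(HTMLParser):
    """Collect a page's text, links, and image alt text, discarding formatting markup."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []
        self.image_alts = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])
        elif tag == "img" and attributes.get("alt"):
            self.image_alts.append(attributes["alt"])

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Tags such as <h1>, <em>, and <strong> are simply dropped; only their text survives.
        if self._skip_depth == 0:
            self.text_parts.append(data)


def spider_view(url, html):
    """Return the URL, whitespace-normalized text, links, and image alt text of `html`.

    Note: words separated only by tags (no whitespace) will run together.
    """
    parser = SpiderView()
    parser.feed(html)
    parser.close()
    text = " ".join("".join(parser.text_parts).split())
    return {"url": url, "text": text, "links": parser.links, "image_alts": parser.image_alts}


page = """
<html><head><style>h1 { color: red; }</style></head>
<body><h1>Drupal <em>SEO</em></h1>
<p>Read our <a href="/guide"><strong>guide</strong></a>.</p>
<img src="logo.png" alt="Company logo"></body></html>
"""
print(spider_view("http://example.com/", page))
```

The heading, emphasis, and bold styling vanish from the output, while the text, the `/guide` link, and the image's alt text remain, which mirrors the point that formatting is not indexed but text and links are.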

Note

Seeing what the spider sees

If you have a Google Webmaster Tools account, you can see a web page exactly as Googlebot (the name of Google's crawler) sees it. To do this, log in to Google Webmaster Tools (http://www.google.com/webmasters/) and click on a site profile. In the navigation menu on the left, select the Diagnostics menu and then select the Fetch as Googlebot option. Type the URL of the page you want to see; after a short delay, the system produces the results. The following screenshot shows a web page as it appears in a browser:

The following is the spider's view of the same page: