PHP 5 CMS Framework Development - 2nd Edition

By: Martin Brampton

Overview of this book

If you want an insight into the critical design issues and programming techniques required for a web oriented framework in PHP5, this book will be invaluable. Whether you want to build your own CMS style framework, want to understand how such frameworks are created, or simply want to review advanced PHP5 software development techniques, this book is for you.

As a former development team leader on the renowned Mambo open-source content management system, author Martin Brampton offers unique insight and practical guidance into the problem of building an architecture for a web oriented framework or content management system, using the latest versions of popular web scripting language PHP.

The scene-setting first chapter describes the evolution of PHP frameworks designed to support web sites by acting as content management systems. It reviews the critical and desirable features of such systems, followed by an overview of the technology and a review of the technical environment.

Following chapters look at particular topics, with:

• A concise statement of the problem
• Discussion of the important design issues and problems faced
• Creation of the framework solution

At every point, there is an emphasis on effectiveness, efficiency and security, all vital attributes for sound web systems. By and large these are achieved through thoughtful design and careful implementation. Early chapters look at the best ways to handle some fundamental issues such as the automatic loading of code modules and interfaces to database systems. Digging deeper into the problems that are driven by web requirements, following chapters go deeply into session handling, caches, and access control.

New for this edition is a chapter discussing the transformation of URLs to turn ugly query strings into readable strings that are believed to be more "search engine friendly" and are certainly more user friendly. This topic is then extended into a review of ways to handle "friendly" URLs without going through query strings, and how to build RESTful interfaces. The final chapter discusses the key issues that affect a wide range of specific content handlers and explores a practical example in detail.

The CMS environment


It is time now to consider the web environment. While all software has common features, writing for the Web involves considerations that are not found in longer-established application areas.

Hosting the CMS

A huge range of hosting services exists, with costs ranging from zero upwards. Quality varies enormously, and is not always related to price. It is not easy to choose a hosting service, as the information given by rival providers is only part of the picture. It is difficult to offer general advice on the topic, but there is one issue that frequently causes problems with advanced systems such as a CMS, particularly where a web interface is provided for management.

This is the question of how to manage permissions for files and directories. The majority of hosting runs on Linux servers and therefore UNIX permission principles apply. The scheme is simple enough in concept, with permissions given separately for the owner of the file or directory, the group assigned to the file or directory, and everyone else.

But there are some twists to this that make matters more difficult. From the way UNIX permissions work, it is clear that the situation of a particular file or directory depends on who owns it; only then is it possible to see what the permissions mean in practice. The web serving software, usually Apache, runs by default as a special user for whom a variety of names are used, including apache, nobody, www-data, or many other alternatives. At the same time, the site owner is given access through FTP, usually with the alternative of a file manager. The site owner is a quite different user from the web server.

Why does this make a difference? Problems arise because maintenance operations performed directly by the site owner create files belonging to one user, while maintenance operations carried out through the web interface (including the installation of extensions) create files owned by the Apache user. Even if all the files have the same nominal permissions (usually expressed in octal numbers, such as 0644), the actual ability to handle the files will vary according to the owner. Generally, if you are not the owner of a file, you will not be able to change the ownership or permissions of that file, so it is frequently impossible for the site owner to change any of the permissions on a file created through the web management interface.
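The effect of ownership on identical permissions can be seen in a minimal sketch of the owner/group/other rules. The function name canWrite and the numeric user IDs (1001 for the site owner, 33 for the web server user) are assumptions made purely for illustration, and the special case of the root user is ignored.

```php
<?php
// Hypothetical helper: decide whether a user may write to a file,
// following the classic UNIX owner/group/other rules (root is ignored).
function canWrite($mode, $fileUid, $fileGid, $userUid, array $userGids)
{
    if ($userUid === $fileUid) {
        return ($mode & 0200) !== 0;   // owner write bit
    }
    if (in_array($fileGid, $userGids, true)) {
        return ($mode & 0020) !== 0;   // group write bit
    }
    return ($mode & 0002) !== 0;       // "others" write bit
}

// Assumed IDs for illustration only.
$siteOwner = 1001;   // the FTP / file manager user
$webServer = 33;     // the user Apache runs as

// A file uploaded over FTP with mode 0644: the site owner can write to it ...
var_dump(canWrite(0644, $siteOwner, $siteOwner, $siteOwner, array($siteOwner)));
// ... but the web server, seeing the very same mode, cannot.
var_dump(canWrite(0644, $siteOwner, $siteOwner, $webServer, array($webServer)));
```

The same reasoning applies in reverse to a file created through the web interface: it belongs to the web server user, so the site owner is the one who falls into the "others" category.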

A rather crude solution is to give everyone all rights to every file, but that may lead to weak security. Another solution is to avoid using FTP or a file manager, and instead rely on web interfaces for all operations, which may not always be possible.

My strongly preferred solution is to insist on some mechanism that runs the PHP programs making up the website under the ownership of the site owner. Apache is capable of switching who is the active user when it comes to running a script, and there are various schemes for applying this in a PHP environment. All involve some degree of overhead, but good implementations keep this to an acceptable minimum. The benefit is a much smoother running site with far fewer issues over permissions, because all files are now under the ownership of the site owner, whether created directly or through a web interface.

Using this configuration is also a good solution to the security problems that can arise in shared hosting, where the actions of other customers of the hosting provider can cause damage. This may be accidental rather than malicious, but I have had whole sites demolished by another user's faulty script. It's not an experience to be recommended! Normally, in my preferred configuration, you also have to watch out for files that give write permission to "others", as such files are blocked from being executed as a security feature.
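Files that grant write permission to "others" can be found by testing the relevant permission bit. The following is a minimal sketch that assumes a UNIX-like server; the function name isWorldWritable is hypothetical, and a temporary file stands in for a real script.

```php
<?php
// Returns true when the "others" write bit (octal 0002) is set on the path.
function isWorldWritable($path)
{
    clearstatcache(true, $path);       // avoid stale cached permissions
    $mode = fileperms($path);
    return $mode !== false && ($mode & 0002) !== 0;
}

// Demonstration on a temporary file (assumes a UNIX-like file system).
$file = tempnam(sys_get_temp_dir(), 'cms');

chmod($file, 0666);                    // writable by everyone
$loose = isWorldWritable($file);

chmod($file, 0644);                    // writable by the owner only
$tight = isWorldWritable($file);

unlink($file);

echo 'Mode 0666 is ', $loose ? 'world-writable' : 'not world-writable', "\n";
echo 'Mode 0644 is ', $tight ? 'world-writable' : 'not world-writable', "\n";
```

A check of this kind could be run over a site's PHP files after installation to spot anything that would be refused execution.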

In general, hosting companies are keen to host whatever they can get, so as a customer you need to ask questions to find out whether you will get what you really need for your CMS.

Basic browser matters

To build any web application, we have to make some assumptions about what will happen at the browser. This is made complicated by the existence of many different browsers, each with its own peculiarities. Most of these relate to the details of XHTML and CSS usage, but there are some broad questions of usage that we can review now.

One is to adopt a policy on the use of JavaScript. It is certainly possible to improve the responsiveness of web applications by the use of a browser-based scripting language. The code runs on the visitor's own computer, rather than always having to go back to the server to run code. For some applications, such as WYSIWYG editors, it is impractical to use anything other than mechanisms that exist in the browser. Although there are various options for browser scripting, the most widely used is JavaScript.

There are problems over standardization with JavaScript, but most of all there is an accessibility problem. Not everyone is running a browser that will handle JavaScript, and in particular, the screen readers used by people who cannot read information from a screen usually do not handle it. The developments described here do not, therefore, rely on JavaScript to any significant extent. Relative to predecessor systems, Aliro is much less dependent on its use. No doubt improvements can be made by reintroducing more JavaScript, but as a matter of policy this should be done in a way that supports graceful degradation for visitors (and this should include site administrators) who cannot make use of it. The lack of JavaScript should not block access to any facility that could possibly be delivered some other way.

Another general consideration is the use of cookies. Despite scare stories soon after their introduction, appropriate use of cookies is now considered perfectly normal. The major exception we will encounter is the search bots that crawl the net looking at web pages and refusing cookies. Otherwise, since we are interested in building an advanced CMS, and critical features such as the ability to allow users to log in or shoppers to build up a shopping cart cannot be provided securely without cookies, we assume that cookies will be accepted. That is not to say a visitor who refuses cookies will be blocked, only that the services they receive will be restricted. The possibility of having sessions without the use of cookies is disregarded for reasons given in Chapter 5, Sessions and Users.

Security of a CMS

Software has always needed to be robust, but increasing involvement with people raises the stakes. Long ago, when software ran in a closed computer room, attended only by specialist operators, security was a simple issue. As software became exposed to direct interactions with users, so the security questions increased. But, as everyone knows, the internet has raised the issue to a completely new level on account of the existence of a significant number of people who may damage a service. Some of the damage has been done out of simple curiosity, but a lot is now caused in pursuit of money-making schemes that abuse internet facilities in one way or another.

There is controversy over whether "hacking" means breaking into computer systems or a certain approach to software development. To avoid this misunderstanding, I have used the alternative terms "cracking" and "crackers" to refer to abusive actions and actors respectively. Cracking is now so prevalent that we need to start thinking about security before we get into any serious coding at all. Not only are weaknesses in Web software likely to be found and exploited, but crackers also use software tools that are quite as sophisticated as any of the applications that are subjected to cracking. It may not be nice, but it is a reality.

Software developers differ in their approach to security. Some take the view that, as professional developers, they have taken the trouble to know how to build secure software, and that is all there is to be said. Personally, I disagree with this approach, and prefer to think in terms of placing obstacles in the way of crackers. While writing the code, it may seem to be placing an impassable obstacle, but crackers are ingenious and find unexpected routes to evade obstacles. The regular appearance of security loopholes in major software projects demonstrates that total security is extremely hard to attain. Moreover, it is in the nature of a CMS that it is likely to have code added to it by different authors, and it may be that not all are as security aware as the original CMS creator. So anything that makes a significant contribution to increasing the difficulty of cracking is worth considering for inclusion.

Much old PHP code runs in an environment making extensive use of global data. Often the code is run at the top level, not inside a function or class, so that variables are automatically global. This means that two separate PHP files will share data without any specific declaration, simply by the use of common variable names. In the worst cases, this is combined with reliance on "register globals", a PHP capability that automatically places values received in URI query strings or forms into PHP variables. In the days of innocence before cracking was rife, it seemed a nice way to make coding easier. Nowadays, it is the cause of many cracks and every effort is being made to eliminate it.
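The danger can be illustrated with a small sketch. Since register_globals was removed from PHP in version 5.4, its behaviour is emulated here with extract(); the function names, the authorized flag, and the hard-coded password are all hypothetical and exist only to make the point.

```php
<?php
// Emulates the effect of "register globals": every request parameter
// silently becomes a variable in the current scope.
function unsafeCheck(array $request)
{
    extract($request);   // request data turns into local variables

    if (isset($request['password']) && $request['password'] === 'secret') {
        $authorized = true;
    }
    // $authorized was never initialised, so a request parameter of the
    // same name is enough to pass this test.
    return !empty($authorized);
}

// The same check with explicit initialisation and no variable injection.
function saferCheck(array $request)
{
    $authorized = false;
    if (isset($request['password']) && $request['password'] === 'secret') {
        $authorized = true;
    }
    return $authorized;
}

// A cracker simply adds ?authorized=1 to the URL.
$crafted = array('authorized' => '1');

var_dump(unsafeCheck($crafted));   // bool(true): access granted by the URL alone
var_dump(saferCheck($crafted));    // bool(false)
```

Keeping variables local to functions and methods, and always initialising them, removes this whole category of attack.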

Aliro adopts a thoroughgoing class architecture, not least because of the contribution this makes to security. The entire system contains only six lines of code at the global level. There are very few functions at the global level; mostly they are used in the language system, and work as functions because they are needed so frequently that they would be clumsy as class methods. The rest of the system consists entirely of classes.

Classes have the considerable merit that their code does not run until the class is invoked. Many cracks have involved loading PHP code in a way that was never intended and causing it to execute in a compromised way. That cannot happen with classes, because loading the code of a class simply makes the class known to PHP; it does not cause any code to execute (unless the file that is loaded has code outside the class). In a totally class-based system, control of what is executed is guaranteed to follow a logical path from the original starting point, typically in an index.php file. Use of class methods can be controlled with PHP5 features, so that wherever possible they are designated as internal to the class and may not be used from outside. Even where methods are public, they are tightly associated with the environment of a particular class.
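A short sketch shows both properties: defining a class executes nothing, and a private method cannot be reached from outside. The class ExampleContentHandler and its methods are invented for illustration and are not taken from Aliro.

```php
<?php
// Defining the class runs none of its code; it only makes the class known to PHP.
class ExampleContentHandler
{
    // Public entry point, the only way in from outside the class.
    public function handle($task)
    {
        return 'handled ' . $this->sanitize($task);
    }

    // Internal helper: private, so it cannot be called from outside.
    private function sanitize($value)
    {
        return preg_replace('/[^a-z0-9_]/i', '', $value);
    }
}

// Code only executes once the class is deliberately invoked.
$handler = new ExampleContentHandler();
echo $handler->handle('view<script>'), "\n";          // handled viewscript

// From this outside scope the private method is not callable.
var_dump(is_callable(array($handler, 'sanitize')));   // bool(false)
```

Had the same logic been written as top-level code in its own file, simply requesting that file by URL would have run it.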

No single step will ever eliminate security problems. But writing systems entirely out of classes makes a useful contribution, quite apart from its benefits in quality of code. This imposes a requirement on a general CMS framework, which is the effective handling of the classes belonging to extensions. That is solved in the next chapter.
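As a rough indication of what handling extension classes can involve, the sketch below loads a class on demand from a class map using PHP's spl_autoload_register(). It is an assumption-laden illustration, not the mechanism developed in the next chapter: the class map, the class ExampleGalleryExtension, and the temporary file standing in for an installed extension are all hypothetical.

```php
<?php
// Hypothetical class map: class name => file that defines it.
// A real CMS might build this when an extension is installed.
$classMap = array();

// Simulate an installed extension by writing a class file to a temporary location.
$extensionFile = sys_get_temp_dir() . '/ExampleGalleryExtension_' . uniqid() . '.php';
file_put_contents(
    $extensionFile,
    "<?php\nclass ExampleGalleryExtension\n{\n"
    . "    public function title() { return 'Gallery'; }\n}\n"
);
$classMap['ExampleGalleryExtension'] = $extensionFile;

// The loader is called by PHP only when an unknown class is first used.
spl_autoload_register(function ($className) use (&$classMap) {
    if (isset($classMap[$className])) {
        require_once $classMap[$className];
    }
});

// No explicit include is needed: using the class triggers the loader.
$extension = new ExampleGalleryExtension();
echo $extension->title(), "\n";

unlink($extensionFile);
```

Because the file contains nothing but a class definition, loading it executes no extension code until a method is called.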

Some CMS terminology

There is scope for improvement and standardization in the terminology that is used in relation to content management systems. Unfortunately, it is difficult for one person to achieve much in this direction. This book is written within the tradition established by Mambo and, although I have made some attempts to clarify particularly confusing areas, the text largely conforms to convention. The names used in code examples are firmly linked to the traditional terminology, and altering the text while leaving the code in older terms would have been too confusing.

So, it is perhaps worth defining the major terms here, before we move on to any CMS details. The main CMS has been called the "core", although definitions of its boundary vary. Major extensions that are added to the CMS have been called components or applications, and could be likened to whole web applications. Minor extensions usually create small screen boxes with useful information, and have been called modules. The more pluggable units of code that can be triggered in a variety of ways not directly related to what appears on the screen were called mambots in Mambo, and are more generally referred to as plugins.

In an attempt to clarify what happens as different pieces of code work together to create the browser display, I have talked about blocks and boxes. Modules are pieces of code that create boxes, and the boxes are grouped together to form blocks, which are named portions of the browser display. One module may create multiple boxes on the same or different displays.
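The relationship between these terms can be sketched in a few lines, with a module creating boxes and the boxes being collected into named blocks. The classes ExampleBox and ExampleLatestNewsModule and the block names "left" and "right" are hypothetical, chosen only to make the terminology concrete.

```php
<?php
// A box: one small piece of display output.
class ExampleBox
{
    public $title;
    public $html;

    public function __construct($title, $html)
    {
        $this->title = $title;
        $this->html = $html;
    }
}

// A module: code that creates boxes. One module may create several.
class ExampleLatestNewsModule
{
    public function createBoxes(array $headlines)
    {
        $boxes = array();
        foreach ($headlines as $headline) {
            $boxes[] = new ExampleBox('News', htmlspecialchars($headline));
        }
        return $boxes;
    }
}

// Blocks: named portions of the browser display, each holding boxes.
$blocks = array('left' => array(), 'right' => array());

$module = new ExampleLatestNewsModule();
foreach ($module->createBoxes(array('First item', 'Second <b>item</b>')) as $box) {
    $blocks['left'][] = $box;
}

echo count($blocks['left']), " boxes in the left block\n";
```

A template can then decide where each named block appears without knowing which modules filled it.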

The styling of the site, or of pages within the site, is achieved by a collection of PHP, CSS, and images, which have been known collectively as a template. Some people prefer to keep the term "template" to describe only the code that is directly involved in determining a layout. So, although the code examples stick with the name "template", the text also uses "theme", an increasingly popular term for these packages.