Modernizing Legacy Applications in PHP

By: Paul Jones

Overview of this book

Have you noticed that your legacy PHP application is composed of page scripts placed directly in the document root of the web server? Do your page scripts, along with any other classes and functions, combine the concerns of model, view, and controller into the same scope? Is the majority of the logical flow incorporated as include files and global functions rather than class methods? Working with such a legacy application feels like dragging your feet through mud, doesn't it?

This book will show you how to modernize your application in terms of practice and technique, rather than in terms of using tools such as frameworks and libraries, by extracting and replacing its legacy artifacts. We will use a step-by-step approach, moving slowly and methodically, to improve your application from the ground up. We'll show you how dependency injection can replace both dependencies created with the new keyword and dependencies pulled in with the global keyword. We'll also show you how to move the presentation logic into view files and the action logic into a controller. Moreover, we'll keep your application running the whole time. Each completed step in the process will leave your codebase fully operational and of higher quality. When we are done, you will be able to breeze through your code like the wind. Your code will be autoloaded, dependency-injected, unit-tested, layer-separated, and front-controlled. Most of the very limited code we will add to your application is specific to this book. We will be improving ourselves as programmers, as well as improving the quality of our legacy application.
Table of Contents (35 chapters)
Modernizing Legacy Applications in PHP
Credits
Foreword
About the Author
Acknowledgement
www.PacktPub.com
Preface
Typical Legacy Page Script
Code before Gateways
Code after Gateways
Code after Transaction Scripts
Code before Collecting Presentation Logic
Code after Collecting Presentation Logic
Code after Response View File
Code after Controller Rearrangement
Code after Controller Extraction
Code after Controller Dependency Injection
Index

The typical PHP application


Most PHP developers are not formally trained as programmers, or are almost entirely self-taught. They often come to the language from other, usually non-technical, professions. Somehow or other, they are tasked with creating web pages because they are seen as the most technically savvy person in their organization. Since PHP is such a forgiving language and grants a lot of power without demanding much discipline, it is very easy to produce working web pages and even applications without a lot of training.

These and other factors strongly influence the underlying foundation of the typical PHP application. Such applications are usually not written in a popular full-stack framework or even a micro-framework. Instead, they are often a series of page scripts, placed directly in the web server document root, to which clients can browse directly. Any functionality that needs to be reused has been collected into a series of include files. There are include files for common configurations and settings, headers and footers, common forms and content, function definitions, navigation, and so on.

This reliance on include files is why I call such applications include-oriented architectures. The legacy application uses include calls everywhere to couple the pieces of the program into a single whole. This is in contrast to a class-oriented architecture, where, even if the application does not adhere to good object-oriented programming principles, at least the behaviors are bundled into classes.
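A minimal sketch may make the contrast concrete. Everything in it is hypothetical: $config, page_title(), and PageTitle are invented names. In a real include-oriented application the function would live in an include file and be pulled in with an include call; both styles are placed in one file here only so the sketch runs on its own.

```php
<?php
// Hypothetical example: the same behavior written in two styles.

// Include-oriented style: a global function that reaches into global state.
// In a legacy application this would sit in an include file loaded via include.
$config = ['site_name' => 'Example Site'];

function page_title($page)
{
    global $config; // hidden dependency on a global variable
    return $config['site_name'] . ' :: ' . $page;
}

// Class-oriented style: the behavior is bundled into a class, and the
// value it depends on is passed in explicitly.
class PageTitle
{
    private $siteName;

    public function __construct($siteName)
    {
        $this->siteName = $siteName;
    }

    public function forPage($page)
    {
        return $this->siteName . ' :: ' . $page;
    }
}

$title = new PageTitle('Example Site');
echo page_title('Home'), "\n";      // Example Site :: Home
echo $title->forPage('Home'), "\n"; // Example Site :: Home
```

The output is the same either way; what differs is where the dependency comes from. The class declares what it needs, while the function silently depends on whatever the global $config happens to contain when it is called.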

File Structure

The typical include-oriented PHP application generally looks something like this:

/path/to/docroot/
    bin/                    # command-line tools
    cache/                  # cache files
    common/                 # commonly-used include files
        classes/            # custom classes
            Image.php       #
            Template.php    #
        functions/          # custom functions
            db.php          #
            log.php         #
            cache.php       #
        setup.php           # configuration and setup
    css/                    # stylesheets
    img/                    # images
    index.php               # home page script
    js/                     # JavaScript
    lib/                    # third-party libraries
    log/                    # log files
    page1.php               # other page scripts
    page2.php               #
    page3.php               #
    sql/                    # schema migrations
    sub/                    # sub-page scripts
        index.php           #
        subpage1.php        #
        subpage2.php        #
    theme/                  # site theme files
        header.php          # a header template
        footer.php          # a footer template
        nav.php             # a navigation template

The structure shown is a simplified example. There are many possible variations. In some legacy applications, I have seen literally hundreds of main-level page scripts and dozens of subdirectories with their own unique hierarchies for additional pages. The key is that the legacy application is usually in the document root, has page scripts that users browse to directly, and uses include files to manage most program behavior instead of classes and objects.

Page Scripts

Legacy applications typically use individual page scripts as the access points for public behavior. Each page script is responsible for setting up the global environment, performing the requested logic, and then delivering output to the client.
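The following is a hypothetical, heavily simplified sketch of that three-part shape, not code from any real application. The variable names are invented, and the includes a legacy script would normally use (for example, common/setup.php and the theme/ templates from the directory listing) are inlined or noted in comments so the sketch runs as a single file. Unlike many legacy scripts, it escapes its output with htmlspecialchars().

```php
<?php
// Hypothetical page script, e.g. /path/to/docroot/page1.php.
// All three responsibilities share one global scope.

// 1. Set up the global environment.
//    A legacy script would usually do: include 'common/setup.php';
$site_name = 'Example Site';
$items = ['apple', 'banana', 'cherry'];

// 2. Perform the requested logic, reading request input directly.
$limit = isset($_GET['limit']) ? max(0, (int) $_GET['limit']) : 2;
$shown = array_slice($items, 0, $limit);

// 3. Deliver output to the client, right next to the logic above.
//    A legacy script would usually wrap this with
//    include 'theme/header.php'; and include 'theme/footer.php';
$html = '<h1>' . htmlspecialchars($site_name) . "</h1>\n<ul>\n";
foreach ($shown as $item) {
    $html .= '<li>' . htmlspecialchars($item) . "</li>\n";
}
$html .= "</ul>\n";
echo $html;
```

Because setup, logic, and output are interleaved in one scope, none of them can be tested or reused on its own, which is the problem the later refactoring steps address.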

Appendix A, Typical Legacy Page Script contains a sanitized, anonymized version of a typical legacy page script from a real application. I have taken the liberty of making the indentation consistent (originally, the indents were somewhat random) and wrapping it at 60 characters so it fits better on e-reader screens. Go take a look at it now, but be careful. I won't be held liable if you go blind or experience post-traumatic stress as a result! As we examine it, we find all manner of issues that make maintenance and improvement difficult:

  • include statements to execute setup and presentation logic

  • inline function definitions

  • global variables

  • model, view, and controller logic all combined in a single script

  • trusting user input

  • possible SQL injection vulnerabilities

  • possible cross-site scripting vulnerabilities

  • unquoted array keys generating notices

  • if blocks not wrapped in braces (a line added to the block later will not actually be part of the block)

  • copy-and-paste repetition
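The unbraced if item is easy to underestimate. This minimal sketch (variable names invented for illustration) shows how a line added under an unbraced if, indented to look like part of the block, actually runs unconditionally.

```php
<?php
// Hypothetical snippet showing the unbraced-if pitfall.
$is_admin = false;
$messages = [];

if ($is_admin)
    $messages[] = 'Welcome back, administrator.';
    $messages[] = 'Delete links enabled.'; // added later: NOT inside the if!

// Braced version: both lines are governed by the condition.
$braced = [];
if ($is_admin) {
    $braced[] = 'Welcome back, administrator.';
    $braced[] = 'Delete links enabled.';
}

echo count($messages), ' vs ', count($braced), "\n"; // 1 vs 0
```

PHP attaches only the first statement to an unbraced if, so the indentation is misleading: a non-admin user still gets the delete links.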

The example in Appendix A, Typical Legacy Page Script, is relatively tame as far as legacy page scripts go. I have seen other scripts where JavaScript and CSS code have been mixed in, along with remote-file inclusions and all sorts of security flaws. The appendix example is also only (!) about 400 lines long. I have seen page scripts that are thousands of lines long, generating several different page variations, all wrapped into a single switch statement with a dozen or more case conditions.

Rewrite or Refactor?

Many developers, when presented with a typical PHP application, are able to live with it for only so long before they want to scrap it and rewrite it from scratch. "Nuke it from orbit; it's the only way to be sure!" is the rallying cry of these enthusiastic and energetic programmers. Other developers, their enthusiasm drained by their death march experience, feel cautious and wary at such a suggestion. They are fully aware that the codebase is bad, but the devil (or in our case, code) they know is better than the devil they don't.

The Pros and Cons of Rewriting

A complete rewrite is a very tempting idea. Developers championing a rewrite feel like they will be able to do all the right things the first time through. They will be able to write unit tests, enforce best practices, separate concerns according to modern pattern definitions, and use the latest framework or even write their own framework (since they know best what their own needs are). Because the existing application can serve as a reference implementation, they feel confident that there will be little or no trial-and-error work in rewriting the application. The needed behaviors already exist; all the developers need to do is copy them to the new system. The behaviors that are difficult or impossible to implement in the existing system can be added on from the start as part of the rewrite.

As tempting as a rewrite sounds, it is fraught with many dangers. Joel Spolsky had this to say regarding the old Netscape Navigator web browser rewrite in 2000:

 

Netscape made the single worst strategic mistake that any software company can make by deciding to rewrite their code from scratch. Lou Montulli, one of the 5 programming superstars who did the original version of Navigator, emailed me to say, "I agree completely, it's one of the major reasons I resigned from Netscape." This one decision cost Netscape 3 years. That's three years in which the company couldn't add new features, couldn't respond to the competitive threats from Internet Explorer, and had to sit on their hands while Microsoft completely ate their lunch.

 
 --Joel Spolsky, Netscape Goes Bonkers

Netscape went out of business as a result.

Josh Kerr relates a similar story regarding TextMate:

 

Macromates, an indie company who had a very successful text editor called Textmate, decided to rewrite the code base for Textmate 2. It took them 6 years to get a beta release out the door which is an eternity in today's time and they lost a lot of market share. When they did release a beta, it was too late and 6 months later they folded the project and pushed it on to Github as an open source project.

 
 --Josh Kerr, TextMate 2 And Why You Shouldn't Rewrite Your Code

Fred Brooks calls the urge to do a complete rewrite the second-system effect. He wrote about this in 1975:

 

The second is the most dangerous system a man ever designs. ... The general tendency is to over-design the second system, using all the ideas and frills that were cautiously sidetracked on the first one. ... The second-system effect has ... a tendency to refine techniques whose very existence has been made obsolete by changes in basic system assumptions. ... How does the project manager avoid the second-system effect? By insisting on a senior architect who has at least two systems under his belt.

 
 --Fred Brooks, The Mythical Man-Month, pp. 53-58.

Developers were the same forty years ago as they are today. I expect them to be the same over the next forty years as well; human beings remain human beings. Overconfidence, insufficient pessimism, ignorance of history, and the desire to be one's own customer all make it easy for developers to rationalize that "this time will be different" when they attempt a rewrite.

Why Don't Rewrites Work?

There are lots of reasons why a rewrite rarely works, but I will concentrate on only one general reason here: the intersection of resources, knowledge, communication, and productivity. (Be sure to read The Mythical Man-Month (pp. 13-26) for a great description of the problems associated with thinking of resources and scheduling as interchangeable elements.)

As with all things, we have only limited resources to bring to bear against the rewrite project. There are only a certain number of developers in the organization. These are the developers who will have to do both maintenance on the existing program and write the completely new version of the program. Any developers working on the one project will not be able to work on the other.

The Context-Switching Problem

One idea is to have the existing developers spend part of their time on the old application and part of their time on the new one. However, moving a developer between the two projects will not be an even split of productivity. Because of the cognitive load of context-switching, the developer will be less than half as productive on each.

The Knowledge Problem

To avoid the productivity losses from switching developers between maintenance and the rewrite, the organization may try to hire more developers. Some can then be dedicated to the old project and others to the new project. Unfortunately, this approach reveals what F. A. Hayek calls the knowledge problem. Originally applied to the realm of economics, the knowledge problem applies equally well to programming.

If we put the new developers on the rewrite project, they won't know enough about the existing system, the existing problems, the business goals, and perhaps not even the best practices for doing the rewrite to be effective. They will have to be trained on these things, most likely by the existing developers. This means the existing developers, who have been relegated to maintaining the existing program, will have to spend a lot of time communicating knowledge to the new hires. The amount of time involved is non-trivial, and the communication of this knowledge will have to continue until the new developers are as well-versed as the existing developers. This means that the linear increase in resources results in a less-than-linear increase in productivity: a 100% increase in the number of programmers will result in a less than 50% increase in output, sometimes much less (cf. The Miserable Mathematics of the Man-Month, http://paul-m-jones.com/archives/1591).

Alternatively, we could put the existing developers on the rewrite project, and the new hires on maintenance of the existing program. This too reveals a knowledge problem because the new developers are completely unfamiliar with the system. Where will they get the knowledge they need to do their work? From the existing developers, of course, who will still need to spend valuable time communicating their knowledge to the new hires. Once again, we see that the linear increase in developers leads to a less-than-linear increase in productivity.

The Schedule Problem

To deal with the knowledge problem and the related communication costs, some may feel the best way to handle the project would be to dedicate all the existing developers to the rewrite, and delay maintenance and upgrades on the existing system until the rewrite is done. This is a great temptation because the developers will be all too eager to salve their own pains and become their own customers - becoming excited about what features they want to have and what fixes they want to make. These desires will lead them to overestimate their own ability to perform a full rewrite and underestimate the amount of time needed to complete it. The managers, for their part, will accept the optimism of the developers, perhaps adding some buffer in the schedule for good measure.

The overconfidence and optimism of the developers will morph into frustration and pain when they realize the task is actually much greater and more overwhelming than they first thought. The rewrite will go on much longer than anticipated, not by a little, but by an order of magnitude or more. For the duration of the rewrite, the existing program will languish - buggy and missing features - disappointing existing customers and failing to attract new ones. The rewrite project will, at the end, become a panicked death march to get it done at all costs, and the result will be a codebase that is just as bad as the first one, only in different ways. It will be merely a copy of the first system, because schedule pressures will have dictated that new features be delayed until after an initial release is achieved.

Iterative Refactoring

Given the risks associated with a complete rewrite, I recommend refactoring instead. Refactoring means that the quality of the program is improved in small steps, without changing the functionality of the program. A single, relatively small change is introduced across the entire system. The system is then tested to make sure it still works properly, and finally, the system is put into production. A second small change builds on the previous one, and so on. Over a period of time, the system becomes markedly easier to maintain and improve.
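As a hypothetical illustration of how small one step can be, the sketch below removes a hidden global dependency from a single function and then checks that the old and new versions still agree. The function names, the tax calculation, and the inline check are all invented for this example; a real project would rely on its existing tests and would update every call site as part of the same change.

```php
<?php
// Hypothetical example of one small, behavior-preserving refactoring step.
// Amounts are integer cents; the rate is in basis points (800 = 8.00%).
// intdiv() truncates fractional cents, which is acceptable for illustration only.

// Before: the function depends on a global variable.
$tax_rate_bp = 800;

function total_before($subtotal_cents)
{
    global $tax_rate_bp;
    return $subtotal_cents + intdiv($subtotal_cents * $tax_rate_bp, 10000);
}

// After: the same calculation, with the dependency passed in explicitly.
function total_after($subtotal_cents, $tax_rate_bp)
{
    return $subtotal_cents + intdiv($subtotal_cents * $tax_rate_bp, 10000);
}

// Check that old and new agree on sample inputs before shipping the change.
foreach ([0, 999, 10000, 123456] as $subtotal_cents) {
    if (total_before($subtotal_cents) !== total_after($subtotal_cents, $tax_rate_bp)) {
        throw new RuntimeException("Behavior changed for $subtotal_cents cents");
    }
}
echo "Behavior preserved\n";
```

The program does exactly what it did before; only its structure is better, and the next small step can build on it.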

A refactoring approach is decidedly less appealing than a complete rewrite. It defies the core sensibilities of most developers. The developers have to continue working with the system as it is, warts and all, for long periods of time. They do not get to switch over to the latest, hottest framework. They do not get to become their own customers and indulge their desires to do things right the first time. Being a longer-term strategy, the refactoring approach does not appeal to a culture that values rapid development of new applications over patching existing ones. Developers usually prefer to start their own new projects, not maintain older projects developed by others.

However, as a risk-reducing strategy, using an iterative refactoring approach is undeniably superior to a rewrite. The individual refactorings themselves are small compared to any similar portion of a rewrite project. They can be applied in much shorter periods of time than a comparable feature would be in a rewrite, and they leave the existing codebase in a working state at the end of each iteration. At no point does the existing application stop operating or progressing. The iterative refactorings can be integrated into a larger process with scheduling that allows for cycles of bug fixes, feature additions, and refactorings to improve the next cycle.

Finally, the goal of any single refactoring step is not perfection. The goal in each step is merely improvement. We are not trying to realize an impossible goal over a long period of time. We are taking small steps toward easily visualized goals that can be accomplished in short timeframes. Each small refactoring win will both improve morale and drive enthusiasm for the next refactoring step. Over time, these many small wins accumulate into a single big win: a fully-modernized codebase that has never stopped generating revenue for the business.