Alfresco Developer Guide

Alfresco in the Real World


Alfresco will tell you that the product is a platform for Enterprise Content Management (ECM). But ECM is a somewhat nebulous and nefarious term. What does it really mean? It depends on who is saying it. ECM vendors usually use it as an umbrella term to describe a collection of content-centric technologies that includes:

  • Document Management (DM): Capturing, organizing, and sharing binary files. These files are typically produced from office productivity software, but the scope of the files being managed is unlimited.

  • Web Content Management (WCM): Managing files and content specifically intended to be delivered to the Web. The key theme of WCM is to reduce the "web developer" bottleneck and empower non-technical content owners to publish their own content.

  • Digital Asset Management (DAM): Managing graphics, video, and audio. You can think of this as DM with added functionality specific to the needs of working with rich media such as thumbnailing, transcoding, and editing. Like WCM, the intent is to streamline the production process.

  • Records Management (RM): Managing content as a legal record. Like DAM, RM starts with DM and adds functionality specific to the world of RM such as retention policies, records plans, and audit trails.

  • Imaging: This includes capturing, tagging, and routing images of documents from scanners.

Most people also include Collaboration, Search, and, occasionally, Portals.

Practitioners have a different perspective. They will say that ECM is less about the technology and more about how you capture, organize, and share information across the entire enterprise. For them, the "how" is more important than the "what".

What's important to know from an Alfresco perspective is that Alfresco is a platform for doing all these things.

So rather than worrying about a concise definition of ECM, let's look at a few examples to illustrate how clients are using Alfresco today, particularly in Alfresco's sweet spots such as Document Management and Web Content Management.

Basic Document Management

Alfresco started its life as a repository with a basic set of document management services. Alfresco focused on this area initially for two reasons. First, it allowed Alfresco to establish a strong foundation and then build on it by expanding into other areas of ECM, with WCM being the prime example. Second, there is a huge market for systems that can manage unstructured content (aka "documents"). The market is so big because document management is a problem for everyone: all companies generate files that benefit from document management features such as check-in/check-out, versioning, metadata, security, full-text search, and workflow.

Examples of classic document management are often found in manufacturing, packaged goods, or other companies with large research and development divisions. As you can imagine, companies such as these deal with thousands of documents every day. The documents are in a variety of formats and languages, and are created and leveraged by many different types of stakeholders from various parts of the company.

The critical functionality required for basic document management includes things such as:

  • Easy integration with authoring tools: If users can't get documents into and out of the repository easily, user adoption will suffer. This means users must be able to open and save documents to the repository from applications such as Microsoft Office, Microsoft Windows Explorer, and email.

  • Security: Many documents, particularly legal documents and anything around new product development, are very sensitive. Employees must be able to log in with their normal username and password, and see only the documents they have access to.

  • Library services: This is a grouping of foundational document management functionality that includes check-in/check-out, versioning, metadata, and search. The ability to offer these library services is one of the things that sets a document repository apart from a plain file system.

  • Workflow: Quite literally, workflow describes the "flow of work" or business process related to a document. Requirements vary widely in this area and not everyone will leverage workflows right away. Workflows can be used to streamline and automate manual business processes by letting the document management system keep track of who needs to do what to a document at any particular time.

  • Scalability/Reliability: The system needs to scale in order to support several hundred or more users and hundreds of thousands or even millions of documents with some percentage of growth each year. Because the repository holds content that's critical to the business, it needs to be highly available.

  • Customizable user interface: The out-of-the-box Alfresco web client is designed for generic document management, which may be appropriate in many cases. Still, most clients will want to make at least some customizations to the web client to increase productivity and improve user adoption.
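To make the library services a little more concrete, here is a minimal sketch of check-in/check-out with version numbering, written in Python purely for illustration. The Repository and Document classes, the "1.0"/"1.1" labeling scheme, and the major flag are hypothetical assumptions, not Alfresco's API. The sketch only shows the behavior that sets a repository apart from a plain file system: a lock while a document is checked out, and a new version on every check-in.

```python
from dataclasses import dataclass, field
from typing import Optional


class CheckedOutError(Exception):
    """Raised when a user touches a document locked by someone else."""


@dataclass
class Document:
    name: str
    versions: list = field(default_factory=list)   # list of (label, content bytes)
    metadata: dict = field(default_factory=dict)
    checked_out_by: Optional[str] = None


class Repository:
    """Hypothetical in-memory repository; not Alfresco's API."""

    def __init__(self):
        self.docs = {}

    def add(self, name, content, **metadata):
        # Every new document starts at version 1.0.
        doc = Document(name, [("1.0", content)], dict(metadata))
        self.docs[name] = doc
        return doc

    def check_out(self, name, user):
        doc = self.docs[name]
        if doc.checked_out_by is not None:
            raise CheckedOutError(f"{name} is checked out by {doc.checked_out_by}")
        doc.checked_out_by = user          # take the lock
        return doc.versions[-1][1]         # working copy of the latest content

    def check_in(self, name, user, content, major=False):
        doc = self.docs[name]
        if doc.checked_out_by != user:
            raise CheckedOutError(f"{user} does not hold the lock on {name}")
        major_n, minor_n = map(int, doc.versions[-1][0].split("."))
        label = f"{major_n + 1}.0" if major else f"{major_n}.{minor_n + 1}"
        doc.versions.append((label, content))
        doc.checked_out_by = None          # release the lock
        return label

    def history(self, name):
        return [label for label, _ in self.docs[name].versions]
```

A second user who tries to check out a locked document gets an error instead of silently overwriting a colleague's edits, which is exactly the problem a shared network drive does not solve.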

The following diagram shows an example of a high-level architecture for basic document management:

The diagram shows a single instance of Alfresco authenticating against LDAP. Some content managers use the web client via HTTP/S, while others use Windows Explorer, Microsoft Office, and other thick clients to work with content via one or more protocols such as CIFS, WebDAV, FTP, or SMTP. As noted in the diagram, Alfresco stores metadata in a relational database and the actual content files on the file system.
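The split between a relational database for metadata and the file system for content can be sketched in a few lines. The SplitStore class below is a hypothetical illustration that uses Python's sqlite3 module as a stand-in for the relational database; the table layout and directory scheme are assumptions, not Alfresco's actual schema or content store layout. The point is that metadata queries touch only the database, while the bytes live as ordinary files.

```python
import sqlite3
import tempfile
import uuid
from pathlib import Path


class SplitStore:
    """Hypothetical sketch: metadata rows in a relational database,
    binary content as plain files on disk."""

    def __init__(self, root: Path):
        self.root = root
        self.db = sqlite3.connect(":memory:")   # stand-in for the relational DB
        self.db.execute(
            "CREATE TABLE node (id TEXT PRIMARY KEY, name TEXT, "
            "mimetype TEXT, content_path TEXT)"
        )

    def put(self, name: str, mimetype: str, data: bytes) -> str:
        node_id = str(uuid.uuid4())
        # Fan files out into subdirectories so no single directory grows too large.
        path = self.root / node_id[:2] / node_id
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
        self.db.execute(
            "INSERT INTO node VALUES (?, ?, ?, ?)",
            (node_id, name, mimetype, str(path.relative_to(self.root))),
        )
        self.db.commit()
        return node_id

    def get(self, node_id: str) -> tuple:
        row = self.db.execute(
            "SELECT name, mimetype, content_path FROM node WHERE id = ?", (node_id,)
        ).fetchone()
        if row is None:
            raise KeyError(node_id)
        name, mimetype, rel_path = row
        return name, mimetype, (self.root / rel_path).read_bytes()

    def find_by_mimetype(self, mimetype: str) -> list:
        # A metadata query: answered by the database alone, no files are read.
        rows = self.db.execute(
            "SELECT name FROM node WHERE mimetype = ? ORDER BY name", (mimetype,)
        )
        return [r[0] for r in rows]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = SplitStore(Path(tmp))
        node = store.put("report.pdf", "application/pdf", b"%PDF-1.4")
        print(store.get(node)[:2])   # name and mimetype come back from the database
        store.db.close()
```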

Most of the techniques for customizing Alfresco for DM solutions apply to other ECM solutions such as WCM, RM, Imaging, and DAM. Of course, there are business concepts and technical implementation details specific to each that make them unique, but the details provided in this book apply to all because the specialized solutions are built as extensions to the core Alfresco repository. WCM is built on the core repository as well, but the functionality it adds is significant enough to warrant a closer look.

Web Content Management

On the surface, WCM is very similar to document management. In both cases, content owners store files in a repository. Often, the content is assigned metadata, is secured, is indexed for search, and is routed through a workflow. The most obvious difference between DM and WCM is that the content being managed is meant specifically to be published on a web site or as part of a web application. Beyond that high-level distinction, there are several other differences that make WCM worthy of separate discussion. These include:

  • Content authoring tools used to create content

  • Separation of presentation from content

  • Systematic publication or deployment of content

Let's briefly look at each of these.

Content Authoring Tools

The majority of document management solutions deal with files generated by an office suite. Of course, there are exceptions such as various types of graphics files, CAD/CAM drawing formats, and other specialized tools. But mostly, the files are generated by a small number of different tools and an even smaller number of different software vendors.

In the case of WCM, there is a wide variety of tools involved from text editors to Integrated Development Environments (IDEs) to graphics programs with multiple vendors in each category. This means the WCM solution needs to be very flexible in the way it integrates with authoring tools. The alternative, which is forcing authors to give up their favorite tools in favor of a standard, can be a management nightmare.

Separation of Presentation from Content

WCM does not require the separation between content's appearance on the web site and its storage. But many implementations take advantage of this principle because it makes redesigning the site easier, facilitates multi-channel publishing, and enables people to author content without web skills.

To understand why this is so, think about a web site that has its content and presentation of that content merged together. When it is time to redesign the site, you have to touch every single web page because every page contains presentation markup. Similarly, content authoring is limited to people with technical skills. Otherwise, there is a risk that the content owner (for example, the person writing a press release or a job posting) will inadvertently clobber the page design.

One way to address this is to separate the content (the press release copy) from the presentation of that content. A common way to do that is to store the content as presentation-independent XML. The XML can then be transformed into any presentation that's needed. A redesign is as simple as changing the presentation in a single place, and then regenerating all of the pages.
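As a rough illustration of the idea, the sketch below parses a presentation-independent press release and renders it two ways from the same XML. Alfresco uses XSLT or FreeMarker for this kind of transformation; the sketch swaps in Python's standard xml.etree.ElementTree module instead, and the element names (title, date, body) and sample content are assumptions made up for the example.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical content as an author might save it: no presentation markup at all.
PRESS_RELEASE = """<press-release>
  <title>Acme Opens New Plant</title>
  <date>2008-05-01</date>
  <body>Acme today announced &amp; celebrated its new facility.</body>
</press-release>"""


def parse(xml_text: str) -> dict:
    """Turn each child element into a field, e.g. {'title': ..., 'date': ...}."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


def render_page(item: dict) -> str:
    # All page markup lives here; a redesign edits this function, not the content.
    return (
        f"<h1>{html.escape(item['title'])}</h1>\n"
        f'<p class="date">{html.escape(item["date"])}</p>\n'
        f"<p>{html.escape(item['body'])}</p>"
    )


def render_summary(item: dict) -> str:
    # A second channel (for example, a one-line listing) from the same source.
    return f"{item['date']}: {item['title']}"
```

Because both renderers read the same parsed fields, changing the site design means rewriting render_page once and regenerating, while the author's XML stays untouched.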

The impact of separating content from presentation is threefold. First, assuming the content consumers aren't interested in reading raw XML, something has to be responsible for transforming the content. Depending on the implementation, that may be the WCM system or a frontend web application.

Second, in the case of static content, any change in the underlying content has to trigger a transformation so that the presentation will be up-to-date, keeping in mind that there may be more than one file affected by the change. For example, data from a job posting appears in the job posting detail as well as the list of job postings. If the posting and the job posting index are both static, the list has to be regenerated whenever the job posting changes.
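One way to handle the regeneration problem is to record which generated pages read which content files, then look up the affected pages whenever a file changes. The DependencyGraph class and the job posting paths used with it are hypothetical; the sketch only shows the bookkeeping, with the actual rendering left to a caller-supplied function.

```python
from collections import defaultdict


class DependencyGraph:
    """Hypothetical sketch: tracks which generated pages read which content
    files, so one change can regenerate every page that uses it."""

    def __init__(self):
        self.pages_using = defaultdict(set)   # content path -> set of page paths

    def record(self, page: str, *sources: str) -> None:
        for src in sources:
            self.pages_using[src].add(page)

    def affected_by(self, changed: str) -> list:
        # .get avoids inserting an empty entry for files nobody depends on.
        return sorted(self.pages_using.get(changed, set()))


def regenerate(graph: DependencyGraph, changed: str, render) -> dict:
    """Re-render every page that depends on `changed`."""
    return {page: render(page) for page in graph.affected_by(changed)}
```

For the job posting example, the detail page is recorded against its own posting file, while the index page is recorded against every posting, so editing one posting yields both pages to regenerate.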

Third, content authors lose the benefit of WYSIWYG (What You See Is What You Get) content authoring because the content, as authored, doesn't look the way it will once it is published to the web site. The WCM system, then, has to let content authors "preview" the content as they author it, preferably in the context of the entire site.

Systematic Publication or Deployment

A Document Management system is a lot like a relational database in the sense that it is typically an authoritative, centralized repository. There are exceptions, but for the most part, content resides in the repository and is retrieved by the systems and applications that need it. On the other hand, a WCM system often faces a publication or deployment challenge. Files go into the repository, but must be delivered somewhere to be consumed. This might happen on a schedule, at the request of a user, as part of a workflow, or all of the above. Granted, some web sites retrieve their content dynamically; but most sites have at least a subset of content that should be statically delivered to a web server.
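A basic deployment step can be sketched as a one-way sync: hash every file in the repository's copy of the site, compare against what is already on the web server, copy what changed, and delete what was removed. This is a generic illustration using Python's hashlib and shutil modules, not Alfresco's deployment mechanism; the deploy function and the directory names in the demo are assumptions.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_hashes(root: Path) -> dict:
    """Map each file's path (relative to root) to a SHA-256 digest of its bytes."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def deploy(source: Path, target: Path) -> dict:
    """Push new or changed files from `source` to `target` and delete files
    that no longer exist in `source`. Returns a summary of what was done."""
    src = file_hashes(source)
    dst = file_hashes(target) if target.exists() else {}
    copied, removed = [], []
    for rel, digest in src.items():
        if dst.get(rel) != digest:        # new file or changed content
            out = target / rel
            out.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source / rel, out)
            copied.append(rel)
    for rel in dst.keys() - src.keys():   # deleted from the repository
        (target / rel).unlink()
        removed.append(rel)
    return {"copied": sorted(copied), "removed": sorted(removed)}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo, web = Path(tmp, "repo"), Path(tmp, "web")
        repo.mkdir()
        (repo / "index.html").write_text("<h1>Hello</h1>")
        print(deploy(repo, web))   # first run copies everything
        print(deploy(repo, web))   # second run finds nothing to do
```

Whether this runs on a schedule, at a user's request, or as the last step of a workflow is a separate decision; the comparison logic stays the same.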

Alfresco WCM Example

Let's look at an example of a basic corporate web site. Most companies have a mix of "About Us" content that probably doesn't change very often, press releases or "News" that might get updated daily, and maybe some document-based content such as marketing slicks, product information sheets, technical specifications, and so on. There's also some content that is used to build the site such as HTML, XML, JavaScript, Flash, CSS, and image files.

It is likely that several different teams with different skill sets collaborate to produce the site. In this example, suppose the "About Us" and "News" pages come from the marketing team, the site is built by the web team, and the document-based content comes from many organizations within the company.

Alfresco WCM sits on top of the core Alfresco product to provide additional WCM-specific functionality. An important distinction between Alfresco WCM and other open source Content Management Systems is that Alfresco is a "de-coupled" CMS, while a system such as Drupal is a "coupled" CMS. This means that Alfresco manages the web site but does not concern itself with presentation, unlike Drupal, which is both a repository and a presentation framework. This doesn't mean that Alfresco can only manage static sites. You can easily query the repository in any number of ways. It just means it is up to you to provide the frontend from the ground up.

Using Alfresco, the WCM implementation for this example might look like this:

Note that in the diagram there is a mix of structured content (XML) and unstructured content (CSS, PNG, and PDF). The structured content gets created through Alfresco web forms and is transformed to one or more formats (in this case, JSP) using XSLT or FreeMarker. The unstructured content is simply uploaded via either the web client or CIFS.

Regardless of whether it is created with a web form or uploaded to the repository directly, the content has to make it to a web server at some point. In this example, the content is being deployed to the frontend web server using Alfresco's file deployment mechanism. In Chapter 8, other content deployment patterns will be explored.

Custom Content-Centric Applications

Content-centric applications are those in which the primary purpose of the application is to process, produce, collaborate on, or manage unstructured or semi-structured content.

The Alfresco web client is an example of a content-centric application, although it is meant for a very general, all-purpose use case. When solutions are very close to basic document management, the web client can be customized as previously discussed. At some point, it makes more sense to build a separate custom application with Alfresco as the backend repository for that application.

Consider the sales process within a company, for example. Salespeople create proposals. Those proposals are usually routed internally for review and approval, and then delivered to the client. If the client accepts the proposal, a contract is drawn up and the product is delivered. The out-of-the-box web client could be used to manage these documents, assign metadata, manage the review process through workflows, and make it all searchable. But the sales team might be even more productive with a purpose-built user interface. For this solution, a frontend built with a scripting language such as PHP, a Java framework such as Seam, or even a Rich Internet Application (RIA) technology such as Flex might be a good option. Alfresco would provide the document management services, and the frontend would talk to Alfresco via SOAP or RESTful services.
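The proposal process just described is a simple workflow, and its core can be modeled as a small state machine. The states, actions, and Proposal class below are hypothetical and written in Python only for illustration; the sketch shows the idea of allowed transitions and an audit history, not how workflows are actually defined in Alfresco.

```python
# Hypothetical allowed transitions: current state -> {action: next state}.
TRANSITIONS = {
    "draft": {"submit": "in_review"},
    "in_review": {"approve": "approved", "reject": "draft"},
    "approved": {"deliver": "delivered"},
    "delivered": {"accept": "contract", "decline": "closed"},
}


class Proposal:
    """Tracks where a proposal is in the (hypothetical) sales process."""

    def __init__(self, name: str):
        self.name = name
        self.state = "draft"
        self.history = ["draft"]   # every state the proposal has passed through

    def act(self, action: str) -> str:
        allowed = TRANSITIONS.get(self.state, {})
        if action not in allowed:
            raise ValueError(f"cannot {action!r} a proposal in state {self.state!r}")
        self.state = allowed[action]
        self.history.append(self.state)
        return self.state
```

Rejected proposals loop back to draft, and terminal states such as contract and closed have no outgoing actions, so the system can always tell who needs to do what next.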

Another example is a "community" site. With so much buzz around Web 2.0, companies are looking for ways to add community features to their online presence such as forums, blogs, and personalized content as well as user-generated content such as comments, ratings, and rich media.

As discussed previously in the WCM section, Alfresco is very good at publishing static files to one or more web servers or application servers. What it lacks, at least in the current release, is a presentation framework. Many clients appreciate this separation because it gives them complete freedom in how they build the frontend. But in the case of a community site, it would be nice to be spared from building the frontend from scratch.

One way to implement this kind of solution is to use an open source portal such as Liferay or JBoss Portal for the frontend. Alfresco can manage the content as well as the business process used to approve that content for publication on the community site. Portlets can be written that use either SOAP-based or REST-based web service calls to query for and display content stored in the repository.

Note that the diagram also shows a Single Sign-On (SSO) solution so that users have to log in only once when moving back and forth between the portal and Alfresco. This isn't strictly required, but it is worth considering, particularly with freely available open source SSO solutions such as Yale CAS.

The openness of the Alfresco repository, particularly its ability to be easily exposed as a set of services, makes Alfresco an ideal platform for content-centric applications. As the examples have shown, custom content-centric web applications can use Alfresco as the backend while keeping complete flexibility in frontend technology choices, from portals to lower-level frameworks to no framework at all.