Alfresco Developer Guide


Lucene Queries


Lucene queries are part of everyday life when working with Alfresco. The syntax can take some time to get used to, and that is where this section can help. Let's start with the basics.

Suppose the repository has three test files in it as shown in the following table:

File: sample-a.pdf
Folder: SomeCo|Marketing|Whitepapers
Description: This is a wonderful Whitepaper entitled, "Sample A", which you should really read when you have time to absorb it.
Full text: This is a sample Whitepaper named "Sample Whitepaper A".

File: sample-b.pdf
Folder: SomeCo|Marketing|Whitepapers
Description: This is an advanced paper, which you should read after absorbing the earlier material.
Full text: This is a sample Whitepaper named "Sample Whitepaper B".

File: class-roster.txt
Folder: SomeCo|Operations
Description: Class roster for the internal training, "How to write an effective whitepaper".
Full text:
    Writing an Effective Whitepaper:
    Ray
    Abby
    Julian
    Loren
    Meurice
    Debra

Basic Lucene Syntax

Given the set of test documents, let's use the search box in the Node Browser to run some sample searches against the repository to demonstrate basic Lucene search syntax.

roster

This search returns no results. By default, the Node Browser's search field searches only the full text, and although the name and description of class-roster.txt include "roster", the document content does not contain that word.

sample whitepaper

This search returns all three documents because, with no operator between them, the terms are OR-ed together by default: the query is effectively the same as "sample OR whitepaper".

sample AND whitepaper

This search returns only sample-a.pdf and sample-b.pdf. This is the same as using the plus ("+") operator as in +sample +whitepaper. The plus operator requires a term to be in a document for it to be included in the search results.

-sample whitepaper

This search returns only the class roster. The minus ("-") operator specifies that results must not include the word "sample".
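When required and prohibited terms are assembled in application code rather than typed into the Node Browser, it can help to build the query string from parts. The following is a minimal sketch; the class BooleanQueryText and its methods are hypothetical and not part of the Alfresco API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not an Alfresco class): collects Lucene clauses and
// prefixes them with "+" (required) or "-" (prohibited) as needed.
final class BooleanQueryText {
    private final List<String> clauses = new ArrayList<>();

    // Optional term: several optional terms are OR-ed together by default.
    BooleanQueryText should(String term) {
        clauses.add(term);
        return this;
    }

    // Required term: the document must contain it.
    BooleanQueryText must(String term) {
        clauses.add("+" + term);
        return this;
    }

    // Prohibited term: documents containing it are excluded.
    BooleanQueryText mustNot(String term) {
        clauses.add("-" + term);
        return this;
    }

    @Override
    public String toString() {
        return String.join(" ", clauses);
    }
}
```

For example, `new BooleanQueryText().must("sample").must("whitepaper")` produces +sample +whitepaper, and `new BooleanQueryText().mustNot("sample").should("whitepaper")` produces -sample whitepaper.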

M?urice
Ab*
Lauren~

Each of these searches returns the class roster. The question mark ("?") matches any single character, and the asterisk ("*") is a wildcard that matches any sequence of characters. The tilde ("~") denotes a fuzzy search, which finds similar words. In this case, it matched "Lauren" to "Loren".

sample effective whitepaper
sample^10 effective whitepaper

Compare these two searches. Both return the sample PDFs and the class roster. In the first search, class-roster.txt appears at the top of the result list. In the second search, the caret ("^") "boosts", or increases the weight of, the term "sample" by a factor of 10. Because only the PDFs contain "sample", the boost gives them more weight, and in the second search the two PDFs move to the top of the results.
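If boosted or fuzzy terms are generated in code, small helpers keep the modifier syntax in one place. This is a sketch only; TermModifiers is a hypothetical class, not an Alfresco or Lucene API.

```java
// Hypothetical helpers (not an Alfresco API) that append Lucene term
// modifiers: "^n" boosts a term's weight, "~" requests a fuzzy match.
final class TermModifiers {
    private TermModifiers() {}

    // e.g. boost("sample", 10) -> "sample^10"
    static String boost(String term, int factor) {
        return term + "^" + factor;
    }

    // e.g. fuzzy("Lauren") -> "Lauren~"
    static String fuzzy(String term) {
        return term + "~";
    }
}
```

The boosted query from the comparison above can then be written as `String.join(" ", TermModifiers.boost("sample", 10), "effective", "whitepaper")`.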

Property Search

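So far you've searched the full text of the sample documents, but what if you want to search a specific property? To do that, use the at ("@") symbol followed by the namespace prefix and property name, then a colon and the search phrase. The colon inside the property's QName (cm:description, for example) must be escaped with a backslash so that the query parser does not mistake it for the separator between the field and the search phrase.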

@cm\:description:read

This search returns both sample PDFs because both contain the word "read" in their description fields.

@cm\:description:(whitepaper -sample)

This search combines a property search with the minus operator to return only the documents whose description property contains "whitepaper" but not "sample". In this case, the search returns only class-roster.txt.

@sc\:isActive:true

Searches for custom properties work as well. This one returns documents where the SomeCo isActive property is set to true.
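When a property query is written in Java source, the backslash before the QName colon must itself be escaped, so @cm\:description:read becomes the literal "@cm\\:description:read". A small helper avoids getting this wrong; PropertyQuery is a hypothetical name used only for illustration.

```java
// Hypothetical helper (not an Alfresco API): builds a property query such as
// @cm\:description:read from a prefixed QName and a search expression.
final class PropertyQuery {
    private PropertyQuery() {}

    // Escapes the colon in "cm:description" so the query parser does not
    // treat it as the field/value separator.
    static String escapeQName(String prefixedName) {
        return prefixedName.replace(":", "\\:");
    }

    static String of(String prefixedName, String expression) {
        return "@" + escapeQName(prefixedName) + ":" + expression;
    }
}
```

For example, `PropertyQuery.of("cm:description", "(whitepaper -sample)")` yields @cm\:description:(whitepaper -sample), and `PropertyQuery.of("sc:isActive", "true")` yields @sc\:isActive:true.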

Proximity Search

If you need to find documents where two words appear within a certain distance of each other, you can use a proximity search.

@cm\:description:"wonderful absorb"~12

This search returns only sample-a.pdf. Of course, in our limited sample set it is the only document that contains those two words in its description property. If you need convincing, decrease the proximity number and you'll see the document drop out of the result list. Note that the proximity number is supposed to be the number of words that separate the two terms, but it is only approximate.
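A proximity query is a quoted phrase followed by "~" and the allowed distance (Lucene calls this number the "slop"). The sketch below builds the example query above; ProximityQuery is a hypothetical helper, not part of the Alfresco API.

```java
// Hypothetical helper (not an Alfresco API): builds a proximity query against
// a property, e.g. @cm\:description:"wonderful absorb"~12.
final class ProximityQuery {
    private ProximityQuery() {}

    static String of(String prefixedName, String firstWord, String secondWord, int slop) {
        // Escape the QName colon so it is not read as the field separator.
        String field = "@" + prefixedName.replace(":", "\\:");
        return field + ":\"" + firstWord + " " + secondWord + "\"~" + slop;
    }
}
```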

Range Search

Properties can also be searched by range.

@cm\:created:[2008-07-01T00:00:00 TO 2008-07-22T00:00:00]

This search returns the sample PDFs but not the class roster because, at least for this particular sample set, the class roster was created on July 24, 2008.

@cm\:created:[2008-07-01T00:00:00 TO 2008-07-24T00:00:00]
@cm\:created:{2008-07-01T00:00:00 TO 2008-07-24T00:00:00}

Compare these two search strings. The difference is that one uses square brackets ("[]") and the other uses curly braces ("{}"). The square brackets indicate an inclusive search, while the curly braces indicate an exclusive search. The inclusive search returns the class roster, created on July 24th, but the exclusive search does not because the end date of the date range matches the creation date.


@cm\:name:([clam TO dog])

Range searches work on strings as well. This search returns class-roster.txt, but neither of the sample Whitepapers.
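Range queries generated in code need the same date layout as the examples above (yyyy-MM-ddTHH:mm:ss) and the right bracket style. The following sketch assumes that layout and uses a hypothetical RangeQuery helper; it is not an Alfresco API.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper (not an Alfresco API) for range queries on properties.
final class RangeQuery {
    // Matches the date layout used in the examples, e.g. 2008-07-01T00:00:00.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss");

    private RangeQuery() {}

    // Square brackets give an inclusive range, curly braces an exclusive one.
    static String of(String prefixedName, String lower, String upper, boolean inclusive) {
        String open = inclusive ? "[" : "{";
        String close = inclusive ? "]" : "}";
        return "@" + prefixedName.replace(":", "\\:") + ":"
                + open + lower + " TO " + upper + close;
    }

    static String dates(String prefixedName, LocalDateTime from, LocalDateTime to,
                        boolean inclusive) {
        return of(prefixedName, FORMAT.format(from), FORMAT.format(to), inclusive);
    }
}
```

Calling `RangeQuery.dates("cm:created", LocalDateTime.of(2008, 7, 1, 0, 0), LocalDateTime.of(2008, 7, 24, 0, 0), false)` produces the exclusive form with curly braces; passing `true` produces the inclusive form.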

Field Search

Certain pieces of metadata about the objects stored in Alfresco are indexed into Lucene fields such as TYPE, ASPECT, PARENT, TEXT, and so on. Lucene queries can be executed specifically against these fields.

TYPE and ASPECT

Searches against the TYPE and ASPECT fields return objects whose type, or one of whose applied aspects, matches the QName provided in the search. The QName can be written either fully qualified or with its namespace prefix.

TYPE:"{http://www.someco.com/model/content/1.0}whitepaper"
TYPE:"sc:whitepaper"

These searches are equivalent. They return all sc:whitepaper objects. Note that a search against sc:doc will also include these documents because, as defined in the content model, sc:whitepaper inherits from sc:doc. If you want only instances of sc:doc but not a child type, you could use the minus ("-") operator to exclude instances of those types.

ASPECT:"{http://www.someco.com/model/content/1.0}webable"
ASPECT:"sc:webable"

These searches, also equivalent, return any object with the sc:webable aspect applied.
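The sketch below builds TYPE and ASPECT queries in either QName form, plus one way of writing the exclusion mentioned above (instances of sc:doc that are not sc:whitepaper). TypeQuery is a hypothetical helper, not an Alfresco API.

```java
// Hypothetical helper (not part of the Alfresco API) for TYPE and ASPECT
// field queries. QNames may be prefixed ("sc:whitepaper") or fully
// qualified ("{namespace URI}localName").
final class TypeQuery {
    private TypeQuery() {}

    // Builds the fully qualified form {namespaceUri}localName.
    static String qname(String namespaceUri, String localName) {
        return "{" + namespaceUri + "}" + localName;
    }

    static String type(String qname) {
        return "TYPE:\"" + qname + "\"";
    }

    static String aspect(String qname) {
        return "ASPECT:\"" + qname + "\"";
    }

    // Instances of a type, minus instances of one of its subtypes.
    static String typeExcluding(String qname, String subtypeQName) {
        return "+" + type(qname) + " -" + type(subtypeQName);
    }
}
```

For example, `TypeQuery.typeExcluding("sc:doc", "sc:whitepaper")` yields +TYPE:"sc:doc" -TYPE:"sc:whitepaper".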

ID

The ID field contains the node's node reference.

ID:"workspace://SpacesStore/3f2831e1-4db9-11dd-83c8-a5bb8dda71b3"

For example, this search returns a node with a node-uuid property set to 3f2831e1-4db9-11dd-83c8-a5bb8dda71b3, which resides in the SpacesStore.

PARENT

The PARENT field refers to the node reference of the parent node of the object.

PARENT:"workspace://SpacesStore/0da35100-4c59-11dd-9b6d-ed6cfed7fcb0"

This search returns the contents of the folder identified by the specified node reference.
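Both ID and PARENT take a node reference of the form protocol://store/uuid, as in the examples above. The helper below is hypothetical; in repository code the node reference string would normally come from an existing node object rather than being assembled by hand.

```java
// Hypothetical helper (not an Alfresco API) for ID and PARENT queries.
final class NodeRefQuery {
    private NodeRefQuery() {}

    // A node reference looks like workspace://SpacesStore/<uuid>.
    static String nodeRef(String protocol, String storeId, String uuid) {
        return protocol + "://" + storeId + "/" + uuid;
    }

    // Matches the node itself.
    static String id(String nodeRef) {
        return "ID:\"" + nodeRef + "\"";
    }

    // Matches the children of the node.
    static String parent(String nodeRef) {
        return "PARENT:\"" + nodeRef + "\"";
    }
}
```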

PATH

The PATH field holds the path to the node from the store root. Note that each step in the path expression is the QName of a node, which may or may not match the value of that node's name property. The out-of-the-box example is Company Home: "Company Home" is the value of that node's name property, but the node's QName is "{http://www.alfresco.org/model/application/1.0}company_home" (note the lowercase and the underscore), written app:company_home in a path.

PATH:"/app:company_home/cm:SomeCo/cm:Marketing/cm:Whitepapers"

This search returns the specific Whitepapers folder.


PATH:"/app:company_home/cm:SomeCo/cm:Marketing/cm:Whitepapers/*"
PATH:"/app:company_home/cm:SomeCo/*/cm:Whitepapers/*"

These searches show the use of wildcards in the path. The first search returns all child nodes of the Whitepapers node. The second search returns the children of every cm:Whitepapers node that sits one level below any child of SomeCo. For example, if there were an Operations folder that also had a Whitepapers folder, the search results would include those objects as well.
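Path queries are easy to assemble from a list of QName steps. One further case where a QName differs from a name is a space: in PATH queries a space in a step is written as _x0020_ (ISO 9075 encoding). The sketch below uses a hypothetical PathQuery helper; its encodeSpaces method is a simplified stand-in that handles only spaces. For full encoding, Alfresco's org.alfresco.util.ISO9075 class is the usual choice (check that it is available in your version).

```java
// Hypothetical helper (not an Alfresco API): joins QName path steps into a
// PATH query. A step may be a QName ("cm:Marketing") or the "*" wildcard,
// which matches any single node at that level.
final class PathQuery {
    private PathQuery() {}

    static String of(String... steps) {
        return "PATH:\"/" + String.join("/", steps) + "\"";
    }

    // Simplified stand-in for ISO 9075 encoding: only spaces are handled,
    // e.g. "Sales Docs" -> "Sales_x0020_Docs".
    static String encodeSpaces(String localName) {
        return localName.replace(" ", "_x0020_");
    }
}
```

For example, `PathQuery.of("app:company_home", "cm:SomeCo", "*", "cm:Whitepapers", "*")` reproduces the second wildcard search above.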

QNAME

The QNAME field stores the QName of the object.

QNAME:"cm:Whitepapers"

This search would return all of the nodes with the matching QName. In the previous example where you had a Whitepapers folder under both Marketing and Operations, you would get two nodes back.

TEXT

The TEXT field contains the full text of all of the d:content properties on the object. To put it more simply, this field allows you to do a full-text search of the object.

TEXT:"sample"

This search returns any objects with the word sample in the text.
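Fields such as QNAME and TEXT take a quoted value. If that value comes from user input, any embedded double quotes or backslashes should be escaped with a backslash, Lucene's escape character. FieldQuery is a hypothetical helper sketching this; it is not an Alfresco API.

```java
// Hypothetical helper: wraps a value in double quotes for a field query such
// as TEXT:"sample" or QNAME:"cm:Whitepapers". Backslashes and embedded double
// quotes are escaped with a backslash.
final class FieldQuery {
    private FieldQuery() {}

    static String quoted(String field, String value) {
        // Escape backslashes first so the quote escapes are not doubled.
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return field + ":\"" + escaped + "\"";
    }
}
```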

Category

Category searches use the PATH field, but you construct a path using the classification hierarchy. Suppose that sample-a.pdf is classified under "Languages/German", and sample-b.pdf is classified under "Languages/German/Swiss-German". Now consider the following two searches:

PATH:"/cm:categoryRoot/cm:generalclassifiable/cm:Languages/cm:German/*"
PATH:"/cm:categoryRoot/cm:generalclassifiable/cm:Languages/cm:German//*"

The first search returns sample-a.pdf, because it is classified directly under "German", along with the "Swiss-German" category node, because that category is a direct child of "German". It does not return sample-b.pdf, because sample-b.pdf is classified under the subcategory "Swiss-German". The second search ends with a double slash ("//"), which extends the match to anything under "German" at any depth, including subcategories. It returns both documents and the "Swiss-German" subcategory.

So, as shown above, category searches return both the objects that have been categorized ("members") and the category nodes themselves. If you want only the documents and not the categories, use "member" as follows:


PATH:"/cm:categoryRoot/cm:generalclassifiable/cm:Languages/cm:German/member"
PATH:"/cm:categoryRoot/cm:generalclassifiable/cm:Languages/cm:German//member"

The first search would return only sample-a.pdf, while the second search would return sample-a.pdf and sample-b.pdf.
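The four category paths above differ only in their last step: "/*" or "//*" to include category nodes, "/member" or "//member" for documents only. The following sketch generates them; CategoryQuery is a hypothetical helper, and it assumes the cm:generalclassifiable hierarchy used in the examples.

```java
// Hypothetical helper (not an Alfresco API) for category searches under
// /cm:categoryRoot/cm:generalclassifiable.
final class CategoryQuery {
    private static final String ROOT = "/cm:categoryRoot/cm:generalclassifiable";

    private CategoryQuery() {}

    // Categorized documents and child category nodes ("/*" or "//*").
    static String all(boolean includeSubcategories, String... categories) {
        return build(includeSubcategories ? "//*" : "/*", categories);
    }

    // Categorized documents only ("/member" or "//member").
    static String members(boolean includeSubcategories, String... categories) {
        return build(includeSubcategories ? "//member" : "/member", categories);
    }

    private static String build(String tail, String... categories) {
        StringBuilder path = new StringBuilder(ROOT);
        for (String category : categories) {
            path.append("/cm:").append(category);
        }
        return "PATH:\"" + path + tail + "\"";
    }
}
```

For example, `CategoryQuery.members(true, "Languages", "German")` yields the second "member" search above, which returns both sample PDFs.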

Using Saved Searches as Examples

If you have trouble getting a query right, try creating it through the Advanced Search page in the web client. When it works the way you want, save the search. Then use the Node Browser to look at the XML that defines the saved search and see how Alfresco built the query. You can then use that query in your own code.

For example, if you saved the search performed in the previous example, the XML for the saved search would look as follows:

<?xml version="1.0" encoding="UTF-8"?>

<search>
  <text><![CDATA[]]></text>
  <mode>0</mode>
  <categories>
    <category>/cm:categoryRoot/cm:generalclassifiable/cm:Languages/cm:German//*</category>
  </categories>
  <attributes/>
  <ranges/>
  <fixed-values/>
  <query><![CDATA[( PATH:"/cm:categoryRoot/cm:generalclassifiable/cm:Languages/cm:German//*" ) AND
    ((TYPE:"{http://www.alfresco.org/model/content/1.0}content"
    TYPE:"{http://www.alfresco.org/model/content/1.0}folder" ))]]></query>
</search>

Public saved searches are stored in the Data Dictionary under "Saved Searches".
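Once you have the saved-search XML, the query can be pulled out programmatically with the JDK's DOM parser. This sketch assumes the XML is already available as a string (reading the content from the repository is left out), and SavedSearchQuery is a hypothetical helper, not an Alfresco API.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (not an Alfresco API): extracts the Lucene query from
// a saved-search XML document so it can be reused in your own code.
final class SavedSearchQuery {
    private SavedSearchQuery() {}

    static String extractQuery(String savedSearchXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(savedSearchXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse saved search XML", e);
        }
        NodeList queries = doc.getElementsByTagName("query");
        if (queries.getLength() == 0) {
            throw new IllegalArgumentException("No <query> element found");
        }
        // getTextContent() includes the CDATA content; trim surrounding whitespace.
        return queries.item(0).getTextContent().trim();
    }
}
```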

Note

Another handy way to debug queries is to enable debug logging for the query parser in log4j.properties. That lets you see the query the parser produces. The entry to add for the query parser's logger is:

log4j.logger.org.alfresco.repo.search.impl.lucene.LuceneQueryParser=DEBUG