Managing Multimedia and Unstructured Data in the Oracle Database

By: MARCEL KRATOCHVIL

Overview of this book

Multimedia is the new digital frontier. Managers, software architects, administrators and developers need to fully comprehend this exciting new technology as its widespread use and acceptance cannot be ignored any longer.

"Managing Multimedia and Unstructured Data in the Oracle Database" will give you a complete understanding of how to manage all data, especially multimedia. You will learn all the latest terminology, how to set up a database, load digital objects, search on them and even how to sell them. Whether you are a manager or database administrator, this book will give you the knowledge you need to take control of this rapidly growing and industry-changing technology, a technology which is transforming our lives.

Starting with the basic principles of unstructured data and detailing the concepts behind multimedia warehouses and digital asset management systems, this book will describe how to load this data, search against it, display it intelligently, and deliver it to customers and users. Learn how all these concepts work within the Oracle 11g R2 database environment and how to tune the database effectively to manage it.

Begin to learn about this new and exciting field and use it to give your business a competitive edge or give yourself the ability to take a leadership role in this exciting new computing genre.

Defining multimedia in the Oracle database


It's important to define exactly what multimedia is. A common assumption is that multimedia is just a photo taken with a digital camera (or scanned in). Multimedia is much more than this.

To try and define what multimedia is, it's best to look at examples and see how they work with the Oracle database.

Photograph

A photograph is also referred to as a picture, but the proper term is a digital image.

It can be taken by a digital camera or scanned in. A photograph can have metadata embedded in it (common formats include EXIF, IPTC, Adobe XMP, and DICOM). A photo can be of type JPEG, TIFF, or PNG, and there are well over 300 other types. Some camera manufacturers offer a raw option for storing their digital images (two of the most common raw formats being DNG and NEF).
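As a rough illustration of telling these image types apart, the following Python sketch checks the leading signature bytes of a file. The function name `sniff_image_type` and the signature table are illustrative assumptions, not part of Oracle Multimedia, and the table covers only the three types named above.

```python
from typing import Optional

# Leading signature ("magic") bytes for the image types named in the text.
# This table is deliberately tiny; real systems recognize far more types.
_SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"II*\x00", "TIFF"),  # little-endian TIFF
    (b"MM\x00*", "TIFF"),  # big-endian TIFF
]


def sniff_image_type(data: bytes) -> Optional[str]:
    """Return a type name if the data starts with a known signature, else None."""
    for magic, name in _SIGNATURES:
        if data.startswith(magic):
            return name
    return None
```

Because some raw formats, such as DNG, are built on the TIFF structure, a check like this would report them as TIFF; distinguishing them requires reading further into the file.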

The photo is stored in the Oracle Multimedia ORDSYS.ORDIMAGE data type. More complex photos can be stored in the ORDDICOM data type along with other multimedia types.

A photo can also be of type GeoRaster, in which case it's best stored using the Oracle Spatial GeoRaster data type.

A photo can be defined as a two-dimensional object composed of binary data. The photo is typically stored in a compressed format using compression software built for that image type.

Video

A video is a time-based set of two-dimensional photographs. A video can contain metadata. It can also optionally contain an audio track (audio type) and a caption (document type). The video can be compressed, and photographs can be extracted from it. Common examples include MPEG, DivX, AVI, and QuickTime.

In Oracle Multimedia, a video is stored using the ORDSYS.ORDVIDEO data type.

Audio

An audio image is a time-based collection of analog-based sounds. An audio image can be compressed. It can also optionally contain a caption (document type).

In Oracle Multimedia, audio is stored using the ORDSYS.ORDAUDIO data type.

Document

A document is a set of two-dimensional pictures that conforms to a well-defined format. A document can also contain within it all image types. As a result, a document is stored as binary rather than character data. Microsoft Word and Adobe PDF are two well-known examples. There are over 3000 document formats found in the marketplace.

A document can be indexed using Oracle Text.

In Oracle Multimedia, a document is stored using the ORDSYS.ORDDOC data type.

Text

Text is a document that is not binary but composed of character data only. It can contain structured data (the two best known examples being relational and XML). The type of text determines where it is stored: in an XMLType, an Oracle table, a CLOB, or a VARCHAR2 column. It can be indexed using Oracle Text.

Artifact

An artifact is a three-dimensional representation of an object. Though the technology is still in its infancy, some cameras can create a 3D view of an object. An artifact can also be referred to as a blueprint, equating to a three-dimensional drawing used by architects (16).

In Oracle Multimedia, an artifact is stored using the ORDSYS.ORDSOURCE data type.

Additional multimedia types

Multimedia is not limited to the types mentioned. It can include anything. In the next decade we will be seeing new types of multimedia containing very large amounts of data. Some of these will be based on life sciences and simulation. Individual multimedia files will, on average, be over 1 TB in size.

For those familiar with VMware, it is feasible (but not currently practical) to store whole VMware instances as a multimedia type. These can be of any size. As more sites move down the virtualization path, the ability to create many installs will be simplified and organizations will be creating more of them. In the next decade we will see computers with a large number of cores and very large amounts of memory being able to host one VMware instance per user, so that each user has their own client computer, which is centrally stored and managed. Once more we see the rules for tuning and management change. Smart use of virtualization will ensure that CPU use is fairly distributed. But as the number of these virtualized instances grows, they will need to be managed and archived, and the most logical place to put them is in a database.

Other, smaller types can include e-mail messages, Flash files, and executables.

Composite types

A composite type is a set of one or more multimedia types stored in one type. A ZIP, RAR, or TAR file can contain a mixture of multimedia, from which further multimedia can then be extracted. A DICOM file can contain multiple multimedia types. Certain photographic file formats can contain multiple images within them. A GIF can be animated, a TIFF can contain multiple images, and a JPEG can contain other JPEGs within it. How the multimedia is going to be used best determines how it is stored.

A composite type is different from a product group or a container.

A composite type introduces the concept of multiple originals. Our traditional notion of an image needs to extend to deal with a multimedia type that is related to other multimedia. A good example is a DICOM image. This is typically an image that contains information about a medical patient. It can contain patient history, x-rays, ultrasound, and scans. Each is a different image type, but together they all represent the one patient. If we view the patient as an object, then the object is a digital image composed of multiple originals, each one being another multimedia type. Another example is a museum painting. The one painting can have multiple photographs taken of it. It might have an associated video showing how it was painted and an audio commentary on it by the artist. Each is a separate image, but together they create one image with multiple originals in it. Another example is when a photographer takes a mosaic picture. This is a set of photographs of a scene that can be stitched together to create a new picture (like a jigsaw puzzle). Each image is still treated separately.

For a composite type, one digital image is chosen as the representative image and used as the thumbnail. Depending on the context in which the image is used or accessed, this representative image can dynamically change.
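The idea of one digital image with multiple originals and a context-dependent representative can be sketched in Python as follows. This is only an illustration of the concept, not Oracle's object/relational implementation; the class names `Original` and `Composite`, the `by_context` mapping, and the context label `audio_tour` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Original:
    """One original inside a composite (hypothetical structure)."""
    name: str
    media_type: str  # for example "photo", "video", or "audio"


@dataclass
class Composite:
    """One digital image made up of multiple originals."""
    title: str
    originals: List[Original]
    default_index: int = 0  # representative used when no context applies
    by_context: Dict[str, int] = field(default_factory=dict)

    def representative(self, context: str = "") -> Original:
        """Pick the representative original for the given access context."""
        index = self.by_context.get(context, self.default_index)
        return self.originals[index]


# The museum painting from the text: photographs, a video, and a commentary.
painting = Composite(
    title="Museum painting",
    originals=[
        Original("front.jpg", "photo"),
        Original("making-of.mp4", "video"),
        Original("commentary.mp3", "audio"),
    ],
    by_context={"audio_tour": 2},  # assumed context name
)
thumbnail_source = painting.representative()          # default: the photograph
tour_source = painting.representative("audio_tour")   # context-specific choice
```

Keeping the context mapping separate from the originals means the representative can change without altering the originals themselves.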

Composite types are best handled using the Oracle database's object/relational capabilities.

Container

A container is a file type that can have multiple encoding algorithms used within it. A video file of type AVI is a container because different compression formats can be used within it. This is covered in greater detail in Chapter 2, Understanding Digital Objects.

ZIP files

A ZIP file is a specialized composite type. The goal is compression. The term ZIP is now used generically, even though there are other products that can do the same task. Some of these include WinRAR, Unix tar, and Unix gzip. With ZIP the idea is to create one or more large files containing all the other files and to compress them within it. This is useful for backups or for the delivery or transfer of large numbers of images between computer systems.

Within Oracle there are a number of methods for dealing with a ZIP file. The context, or how it is designed to be used, determines what the ZIP file actually is. The following highlights three different uses of a ZIP file:

  • Delivery: It extracts all the multimedia within it. It treats each file extracted as a separate file and discards the original ZIP file. This is useful for loading up a set of images via the web browser to the database.

  • Index: It extracts one image for display and indexing purposes and stores the original ZIP. It is useful when a large number of images need to be delivered. The original ZIP is delivered to the customer.

  • Composite: It extracts all the files but treats the set of images as a composite type and discards the original ZIP. The one digital image is composed of multiple originals.
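The three uses above can be sketched in Python to show how the same ZIP produces different stored results. This is a minimal illustration outside the database; the function `process_zip`, the mode names, and the dictionary layout are assumptions made for this example only.

```python
import io
import zipfile
from typing import Dict, List


def process_zip(zip_bytes: bytes, mode: str) -> List[Dict[str, object]]:
    """Return the digital objects to store for a ZIP under the given use.

    Each object holds its originals as (name, data) pairs and, where the
    use calls for it, the retained ZIP itself.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        files = [
            (info.filename, zf.read(info))
            for info in zf.infolist()
            if not info.is_dir()
        ]

    if mode == "delivery":
        # Every extracted file becomes its own object; the ZIP is discarded.
        return [{"originals": [f], "zip": None} for f in files]
    if mode == "index":
        # One image is kept for display and indexing; the ZIP is retained.
        return [{"originals": files[:1], "zip": zip_bytes}]
    if mode == "composite":
        # All files form one object with multiple originals; ZIP discarded.
        return [{"originals": files, "zip": None}]
    raise ValueError(f"unknown mode: {mode}")
```

In this sketch the first file in the archive is used as the index image; a real system would apply its own rule for choosing which image to display.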

Note

The Oracle PL/SQL package UTL_COMPRESS will not prove useful for handling zipped images, because it assumes the ZIP contains exactly one file. Unzipping multiple files requires writing a Java program (which runs in the database and can unzip multiple files, even if they are in subdirectories). Another option is to use Java to shell out to the operating system: dump the ZIP file to a temporary location, invoke the operating system's unzip utility (now supported on Windows as well as Unix), and then load the extracted files.
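The note recommends Java running in the database. Purely to illustrate the logic of the second option (dump to a temporary location, unzip, then load the extracted files, including those in subdirectories), here is a Python sketch that runs outside the database. It uses Python's zipfile module instead of the operating system's unzip utility, and the function name `unzip_via_temp_dir` is an assumption for this example.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import Dict


def unzip_via_temp_dir(zip_bytes: bytes) -> Dict[str, bytes]:
    """Dump a ZIP to a temporary location, extract it, and read back every file.

    Keys are paths relative to the archive root, so files inside
    subdirectories keep their directory prefix.
    """
    with tempfile.TemporaryDirectory() as tmp:
        zip_path = Path(tmp) / "upload.zip"
        zip_path.write_bytes(zip_bytes)  # dump the ZIP to disk

        out_dir = Path(tmp) / "extracted"
        out_dir.mkdir()
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(out_dir)  # unzip, recreating subdirectories

        # Load the extracted files; the temporary directory is removed
        # automatically when the with-block ends.
        return {
            path.relative_to(out_dir).as_posix(): path.read_bytes()
            for path in sorted(out_dir.rglob("*"))
            if path.is_file()
        }
```

Each entry in the returned dictionary could then be loaded into the database as a separate digital object.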

Metadata

Metadata is text data associated with a digital object for the purpose of searching and providing structured or semi-structured information about the digital object. Metadata is covered in greater detail in Chapter 3, The Multimedia Warehouse. In Oracle, metadata can be stored in tables.

The NULL case

For a multimedia type, the NULL equivalent should be discussed. It's possible for the metadata associated with a digital image to exist while the actual multimedia component does not yet exist. For example, a museum has information about an object that needs to be photographed and stored in the database. The object could be a painting, a vase, a person, or any general collection object. The museum first stores the metadata about the object in the database. At a later time the object is scanned, photographed, or a video is taken of it. The digital image is then associated with the initial metadata.

Because the metadata for the object exists but its associated multimedia does not, this is considered the NULL case for the multimedia type. The definition of NULL is related to the potential of what the image could become and avoids confusing a NULL image with one that is empty or blank.
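The distinction between a NULL image and an empty one can be made concrete with a small Python sketch. The class `DigitalObject` and its `state` method are hypothetical and stand in for whatever table or object type a real schema would use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DigitalObject:
    """Metadata plus an optional multimedia component (hypothetical model)."""
    metadata: dict
    media: Optional[bytes] = None  # None models the NULL case

    def state(self) -> str:
        """Distinguish 'not yet captured' from 'captured but empty'."""
        if self.media is None:
            return "null"    # metadata exists, multimedia does not yet
        if len(self.media) == 0:
            return "empty"   # a multimedia value exists but holds no data
        return "loaded"


# The museum first records the object, then attaches the photograph later.
vase = DigitalObject(metadata={"title": "Vase", "collection": "Ceramics"})
state_before = vase.state()
vase.media = b"\xff\xd8\xff"  # placeholder bytes standing in for a photo
state_after = vase.state()
```

Keeping "null" and "empty" as separate states lets a search for objects still awaiting photography skip those whose image was loaded but happens to be blank.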