Let's assume that we want to index the Wikipedia data, but we don't want to parse the whole dump and produce another XML file. Instead, we asked our DB expert to import the data dump into a PostgreSQL database, so that we can fetch the data from there. Did I say fetch? Yes, it is possible with the use of Data Import Handler and a JDBC data source. This task will guide you through how to do it.
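To give a rough idea of what such a setup involves, here is a minimal sketch of a Data Import Handler configuration that reads from PostgreSQL through JDBC. The database name, credentials, table, columns, and Solr field names are all assumptions made for illustration (the table and column names loosely follow the MediaWiki page table); adjust them to match your own import and index structure.

```xml
<dataConfig>
  <!-- JDBC data source: the URL, user and password are hypothetical placeholders -->
  <dataSource type="JdbcDataSource"
              driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost:5432/wikipedia"
              user="solr"
              password="secret" />
  <document>
    <!-- One Solr document per row returned by the query (assumed table and columns) -->
    <entity name="page" query="SELECT page_id, page_title FROM page">
      <!-- Map database columns to Solr fields (field names are assumed) -->
      <field column="page_id" name="id" />
      <field column="page_title" name="name" />
    </entity>
  </document>
</dataConfig>
```

The PostgreSQL JDBC driver JAR has to be available on Solr's classpath for the driver class to load.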
Please refer to the "How to properly configure Data Import Handler" recipe in this chapter to learn the basics of how Data Import Handler is configured. I'll assume that you already have Solr set up according to the instructions in that recipe.
The Wikipedia data I used in this example is available from the Wikipedia downloads page at http://download.wikimedia.org/.
First, let's add a sample index structure. To do that, we need to modify the fields section of the schema.xml file so it looks...