Solr includes a very popular contrib module for importing data, known as the DataImportHandler. It's a data processing pipeline built specifically for Solr. Here's a list of its notable capabilities:
It imports data from databases through JDBC (Java Database Connectivity). It can import only the records that changed since the last import, provided each record carries a last-updated date
It imports data from a URL (HTTP GET)
It imports data from files (that is, it crawls files)
It imports e-mail from an IMAP server, including attachments
It supports combining data from different sources
It extracts text and metadata from rich document formats
It applies XSLT transformations and XPath extraction on XML data
It includes a diagnostic/development tool
Furthermore, you can write your own data source or transformation step, using the code of the existing ones as a guide.
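To make the "only changed records" capability more concrete, here is a minimal Python sketch that builds the request URL used to trigger an import. The command names (`full-import`, `delta-import`, `status`, `reload-config`, `abort`) are the handler's standard commands. The host, the core name `mycore`, and the `/dataimport` handler path are assumptions for illustration, since the path depends on how the handler is registered in `solrconfig.xml`. The helper `dih_url` is hypothetical and not part of Solr.

```python
from urllib.parse import urlencode

# Standard DataImportHandler commands.
VALID_COMMANDS = {"full-import", "delta-import", "status", "reload-config", "abort"}


def dih_url(base_url, core, command="delta-import", handler="/dataimport", **params):
    """Build the URL for a DataImportHandler request (hypothetical helper).

    base_url, core, and handler are assumptions about the deployment;
    adjust them to match the actual Solr installation.
    """
    if command not in VALID_COMMANDS:
        raise ValueError(f"unknown DIH command: {command}")
    query = {"command": command}
    for key, value in params.items():
        # Render Python booleans as the lowercase strings Solr expects.
        query[key] = str(value).lower() if isinstance(value, bool) else value
    return f"{base_url.rstrip('/')}/{core}{handler}?{urlencode(query)}"


# Import only changed records, keep the existing index, and commit afterwards.
print(dih_url("http://localhost:8983/solr", "mycore", clean=False, commit=True))
```

Issuing an HTTP GET against the resulting URL starts the import. Requesting the same handler with `command=status` reports its progress.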