One of the most common problems when indexing a vast amount of data is the indexing time. Some of the causes of long indexing times are not easily resolvable, but others are. Imagine that you need to index about 300,000 documents that are in a single XML file. You run the post.sh bash script that is provided with Solr and you wait, wait, and wait. Something is wrong: indexing 10,000 documents takes about a minute, but now you have been waiting for about an hour and the commit operation still hasn't taken place. Is there something we can do to speed it up? Sure, and this recipe will tell you how.
The solution is very simple: just add a commit operation every now and then. But as you may have noticed, I mentioned that our data is written in a single XML file. So, how do we add the commit operation to that kind of data? Send it in parallel to the indexing process? No, we need to enable the auto commit mechanism. To...
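
As a rough illustration of the auto commit mechanism mentioned above, the fragment below sketches how it is typically switched on in the update handler section of solrconfig.xml. The threshold values are assumptions chosen to match the numbers in this example, not recommendations from the recipe, so tune them for your own data.

```xml
<!-- Fragment of solrconfig.xml: a minimal sketch, not a complete file. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- Assumed value: commit after this many documents have been added. -->
    <maxDocs>10000</maxDocs>
    <!-- Assumed value: commit after this many milliseconds (here, one minute). -->
    <maxTime>60000</maxTime>
  </autoCommit>
</updateHandler>
```

With a section like this in place, Solr commits on its own whenever either threshold is reached, so the single large XML file can be sent as it is, without explicit commit commands inside it.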