The standard way to evaluate an information retrieval system requires a test collection, which should contain the following:
- A collection of documents
- A set of test queries that express the information needs
- A binary assessment of each document as relevant or not relevant for each query
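The three components above can be sketched in a few lines of Python. This is only an illustrative sketch, not a standard implementation: the class `RelevanceCollection`, the helper functions, and the three sample documents are all hypothetical names and data made up for the example. It uses a naive keyword match to show that a document containing the query term is not necessarily one the assessor judged relevant, and reports the fraction of retrieved documents that are relevant (commonly called precision), averaged over the queries.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RelevanceCollection:
    """Hypothetical container for the three parts of a test collection."""
    documents: dict[str, str]          # doc_id -> document text
    queries: dict[str, str]            # query_id -> query string
    information_needs: dict[str, str]  # query_id -> what the user actually wants
    relevant: dict[str, set[str]]      # query_id -> doc_ids judged relevant (binary)

    def is_relevant(self, query_id: str, doc_id: str) -> bool:
        # Binary assessment: a document is either in the relevant set or not.
        return doc_id in self.relevant.get(query_id, set())


def keyword_search(collection: RelevanceCollection, query_id: str) -> list[str]:
    """Naive retrieval (an assumption for illustration): match the query term."""
    term = collection.queries[query_id].lower()
    return [doc_id for doc_id, text in collection.documents.items()
            if term in text.lower()]


def fraction_relevant(collection: RelevanceCollection, query_id: str,
                      retrieved: list[str]) -> float:
    """Share of retrieved documents that were judged relevant."""
    if not retrieved:
        return 0.0
    hits = sum(collection.is_relevant(query_id, d) for d in retrieved)
    return hits / len(retrieved)


def mean_fraction_relevant(collection: RelevanceCollection) -> float:
    """Average the per-query score over all test queries."""
    scores = [fraction_relevant(collection, q, keyword_search(collection, q))
              for q in collection.queries]
    return sum(scores) / len(scores)


# Made-up sample data: the information need is about the programming language.
collection = RelevanceCollection(
    documents={
        "d1": "Python is a programming language used for scripting.",
        "d2": "A pet python needs a warm enclosure.",
        "d3": "Java is a programming language that runs on the JVM.",
    },
    queries={"q1": "python"},
    information_needs={"q1": "Information about the Python programming language"},
    relevant={"q1": {"d1"}},
)

retrieved = keyword_search(collection, "q1")
print(retrieved)                                       # both d1 and d2 match the term
print(fraction_relevant(collection, "q1", retrieved))  # only d1 was judged relevant
```

A real test collection would hold far more documents and queries than this toy example, so that the averaged score says something about typical performance.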
For a given query, each document in the collection is classified into one of two categories: relevant or not relevant. The test collection should be of a reasonable size so that the test has enough scope to measure average performance. Relevance of the output is always assessed relative to the information need, not the query itself. In other words, a result that contains a query word is not necessarily relevant. For example, if the query is "Python," the results may include the Python programming language or a pet python; both contain the query term, but what matters is whether the result is relevant to the user. If the system contains a parameterized index, then it can be tuned for better...