Measuring similarity between two vectors
Measuring the similarity between two vectors is a core step in a neural search system. Once all documents have been indexed as vector representations, we apply the same encoding process to the user's query. We then compare the encoded query vector against every encoded document vector to find the most similar documents.
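The query-time flow just described (encode the query, score it against every indexed vector, keep the best matches) can be sketched as a brute-force search. This is a minimal sketch, not Jina's implementation. It assumes cosine similarity as the measure. The names `cosine_similarity` and `search`, the document IDs, and the toy vectors are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def search(
    query_vec: Sequence[float],
    index: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Hypothetical brute-force search: score the query against every document."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in index.items()]
    # Highest similarity first, then keep only the top_k matches.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy index of already-encoded documents (hypothetical IDs and vectors).
index = {"doc_a": [1, 0, 1], "doc_b": [0, 1, 0], "doc_c": [1, 1, 1]}
query = [1, 0, 1]  # the query, encoded the same way as the documents
results = search(query, index, top_k=2)
print(results)  # doc_a ranks first, doc_c second
```

Scoring every document is linear in the size of the index. That is fine for a toy example, but large systems usually replace it with an approximate nearest-neighbor index.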
We can continue our example from the previous section by measuring the similarity between doc1 and doc2. First, we run the encoding script twice, once for each document:
doc1 = 'Jina is a neural search framework'
doc2 = 'Jina is built with cutting edge technology called deep learning'
Then, we can produce a vector representation for both of them:
encoded_doc1 = [1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0]
encoded_doc2 = [1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
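The two vectors above can be compared with a few lines of plain Python. This is a minimal sketch that assumes cosine similarity as the measure. For binary vectors, the dot product counts the positions set in both vectors, and each norm is the square root of that vector's number of ones.

```python
import math

# Binary vectors copied from the example above.
encoded_doc1 = [1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0]
encoded_doc2 = [1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1]

# Both vectors must live in the same space, i.e. have the same dimension.
assert len(encoded_doc1) == len(encoded_doc2)

# Dot product: for binary vectors this counts the positions set in both.
dot = sum(a * b for a, b in zip(encoded_doc1, encoded_doc2))

# Euclidean norms: sqrt(5) for doc1 and sqrt(11) for doc2.
norm1 = math.sqrt(sum(a * a for a in encoded_doc1))
norm2 = math.sqrt(sum(b * b for b in encoded_doc2))

# Cosine similarity = dot / (norm1 * norm2) = 3 / sqrt(55).
similarity = dot / (norm1 * norm2)
print(dot)                   # 3
print(round(similarity, 4))  # 0.4045
```

A score of 1 would mean the two vectors point in the same direction, and 0 would mean they share no set positions. The value of about 0.40 reflects the three positions the two documents have in common.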
Since the dimension of the encoded result is always identical...