Neural Search - From Prototype to Production with Jina

By: Jina AI, Bo Wang, Cristian Mitroi, Feng Wang, Shubham Saboo, Susana Guzmán

Overview of this book

Search is a big and ever-growing part of the tech ecosystem. Traditional search, however, has limitations that are hard to overcome because of the way it is designed. Neural search is a novel approach that uses the power of machine learning to retrieve information using vector embeddings as first-class citizens, opening up new possibilities for improving the results obtained through traditional search.

Although neural search is a powerful tool, it is new, and fine-tuning it can be tedious because it requires you to understand the several components on which it relies. Jina fills this gap by providing an infrastructure that reduces the time and complexity involved in creating deep learning–powered search engines. This book will teach you the fundamentals of neural networks for neural search, their strengths and weaknesses, and how to use Jina to build a search engine.

With the help of step-by-step explanations, practical examples, and self-assessment questions, you'll become well-versed in the basics of neural search and core Jina concepts, and learn to apply this knowledge to build your own search engine. By the end of this deep learning book, you'll be able to make the most of Jina's neural search design patterns to build an end-to-end search solution for any modality.
Table of Contents (13 chapters)

Part 1: Introduction to Neural Search Fundamentals
Part 2: Introduction to Jina Fundamentals
Part 3: How to Use Jina for Neural Search

Measuring similarity between two vectors

Measuring similarity between two vectors is a key step in a neural search system. Once all of the documents have been indexed into their vector representations, we apply the same encoding process to the user's query. Finally, we compare the encoded query vector against all the encoded document vectors to find the most similar documents.

We can continue our example from the previous section by measuring the similarity between doc1 and doc2. First of all, we need to run the encoding script twice, once for each document:

doc1 = 'Jina is a neural search framework'
doc2 = 'Jina is built with cutting edge technology called deep learning'

Then, we can produce a vector representation for both of them:

encoded_doc1 = [1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0]
encoded_doc2 = [1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
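The encoding step above can be sketched as a simple binary bag-of-words: build a shared vocabulary over both documents, then mark a 1 at each position whose vocabulary word appears in the document. Note that this is an illustrative sketch, not the book's actual script; the vocabulary ordering here is just insertion order, so the exact positions (and length) of the resulting vectors may differ from the vectors printed above.

```python
doc1 = 'Jina is a neural search framework'
doc2 = 'Jina is built with cutting edge technology called deep learning'

# Build a shared vocabulary over both documents.
# The ordering (first occurrence wins) is an assumption for illustration.
vocab = []
for doc in (doc1, doc2):
    for word in doc.lower().split():
        if word not in vocab:
            vocab.append(word)

def encode(doc):
    """Binary bag-of-words: 1 if the vocabulary word occurs in the document."""
    words = set(doc.lower().split())
    return [1 if w in words else 0 for w in vocab]

print(encode(doc1))
print(encode(doc2))
```

Because both vectors are built against the same vocabulary, they always have the same dimension, which is what makes a direct element-wise comparison possible.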

Since the dimension of the encoded result is always identical...
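One common way to compare two equal-length vectors like these is cosine similarity, which measures the cosine of the angle between them. This is a minimal sketch of that comparison using the two encoded vectors from the example above; cosine similarity is one option among several, and the book may use a different metric.

```python
import math

# Binary bag-of-words vectors from the example above
encoded_doc1 = [1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0]
encoded_doc2 = [1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1]

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(round(cosine_similarity(encoded_doc1, encoded_doc2), 4))  # 0.4045
```

A score of 1.0 would mean the vectors point in exactly the same direction; here the two documents share only a few words, so the similarity is moderate.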