No, not a scooter :-).
I mean Vespa.AI, a search engine that supports structured search, text search, and approximate vector search. Vespa's vector search feature was probably built in response to search engines incorporating vector-based signals into their ranking algorithms, but many ML/NLP pipelines can also benefit from vector search, i.e., the ability to find nearest neighbors in a high-dimensional space at scale. My own interest in Vespa comes from its vector search feature as well.
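To make "finding nearest neighbors" concrete, here is a minimal sketch of the exact, brute-force version of the problem in plain Python. It is not how Vespa or NMSLib work internally: both rely on approximate indexes precisely so they can avoid scanning every vector. The document ids and the 3-dimensional vectors are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbors(query, vectors, k=2):
    """Return the ids of the k vectors most similar to query (exact O(n) scan)."""
    scored = [(cosine_similarity(query, vec), doc_id)
              for doc_id, vec in vectors.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]

# Toy 3-dimensional "embeddings" (made up); real ones have hundreds of dimensions.
vectors = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
    "doc-d": [0.0, 0.0, 1.0],
}

print(nearest_neighbors([1.0, 0.05, 0.0], vectors, k=2))
```

The scan costs one similarity computation per stored vector for every query, which is fine for a few thousand documents and painful for millions. That cost is the reason to reach for a dedicated vector search engine.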
The last few times I needed to implement a vector search feature in an application, I considered using Vespa and even spent a few hours on their website, but eventually gave up and ended up using NMSLib (Non-Metric Space Library). The learning curve seemed quite steep, and I was worried that trying to learn Vespa in the middle of a project would affect the project timeline.
So this time, I decided to learn Vespa by building a toy project with it. Somewhat to my surprise, I had better luck this time. Some of that is thanks to the timely and knowledgeable help I got from Vespa staff (obviously Vespa experts) in the Relevancy Slack workspace. But I would attribute at least some of the success to the epiphany that there is a correspondence between Vespa's functionality and Solr's. Based on this epiphany, I wrote a post on the Vespa blog, How I Learned Vespa by Thinking in Solr, describing my experience implementing the toy Vespa project. If you have experience with Solr (and perhaps Elasticsearch) and are looking to learn Vespa, you might find it useful.
Another thing I generally do for my ML/NLP projects is create some interfaces for users to interact with. The first interface is for human users, and so far it has almost always been a skeletal but fully functional custom web application, minus most of the UI bells and whistles, since my front-end skills remain firmly stuck in the mid-1990s. In the past these were Java/Spring applications, and more recently CherryPy and Flask applications.
I often feel that a full application is overkill. For example, my toy application performs a text search against the CORD-19 dataset, and a MoreLikeThis-style vector search to find papers similar to a given paper. A custom application needs to demonstrate not only the individual features but also the interactions between them. Of course, these are just two features, but you can see how it can get complicated quickly. Most of the time, however, your audience just wants to try out your features with different inputs, and has the imagination to see how it all fits together. A web app is just a convenient way for them to do the trying out.
Which brings me to Streamlit. I had heard about Streamlit from one of my Labs colleagues, but I got a chance to see it in action during an informal demo by a fellow member (non-work colleague?) of a meetup I regularly attend. Based on the demo, I decided to use it for my own work, with each feature getting its own dashboard. The screenshots below show these two features with some real data. The code to do this is pretty simple, just Python calls to Streamlit functions, and doesn't require any web front-end skills.
The second interface is for programmatic users, i.e., other software. This toy example was relatively simple, but an ML/NLP/search pipeline often involves talking to multiple services or other incidental complexities, and the user of your application doesn't really need or want to care about what's going on under the hood. In the past, I have built JSON API front ends that mimic the web front end (in terms of information content), and I have done the same here with FastAPI, another library I had been meaning to look at. As with Streamlit, the FastAPI code is very simple and needs very little setup. As a bonus, it comes with a built-in Swagger UI that automatically documents your API and lets users of your API try out the different services without an external client. The screenshots below show the request parameters and JSON response for the two services in my toy application.
You can find the code for both the dashboard and the API in the python-scripts/demo subdirectory of my sujitpal/vespa-poc repository. I have factored out the functionality of the application into its own "package" (demo_utils.py) so that it can be used from both Streamlit and FastAPI.
If you've read this far, you have probably realized that the title of the post is somewhat misleading. This post has been more about the visible artifacts of my first toy Vespa app than about exploring Vespa itself. However, I decided to keep the title as it is because it was a natural setup for the dad joke in the line right after it. For more thorough coverage of my experience learning Vespa, I refer you back to my blog post How I Learned Vespa by Thinking in Solr. I hope you find that one as interesting (if not more so) as you found this post.