Ditch Algolia, Create Your Own Content Search Engine
Build a full-text search engine with autocorrect for your website's content that connects directly to your database. Users need to be able to find the content they want on your site. Out-of-the-box solutions like Algolia are great, but they probably aren't the best use of resources for a small website. In this tutorial, we will use flexsearch to search documents and add some custom code (autocorrect and stopword removal) to make the results noticeably better.
This tutorial is designed to be platform-neutral and can be used with any database, server, or framework. While we'll be focusing on setting up an express server, the search function can easily be adapted to work in any part of your application. After configuring the server, we'll implement the autocorrect feature and leverage flexsearch to locate relevant documents.
Dependencies
npm i express flexsearch stopword closest-match
express
- The web framework that runs our server and routes requests
flexsearch
- Implements a full-text search algorithm that allows for quick querying
stopword
- Removes stopwords from the search query
closest-match
- Used to create autocorrect
Express Server
/server.js
Here's a basic express server that uses express.json() to extract data from POST requests. We import the search function at the beginning and pass it to a POST route after the JSON parser. This works as a standalone express app and can also be used with Svelte when built for Node.js. Put all of this code in your main server file; I use server.js.
import { search } from "./search.js";
import express from "express";

const app = express();
// Parse JSON request bodies so the handler can read req.body
app.use(express.json());

// The search route
app.post("/search", search);

// Other routes for your website

app.listen(5173, () => {
  console.log("listening on port 5173");
});
To use the search feature in your application, send a POST request to this server. Note that the search handler cannot run as a SvelteKit/React endpoint, because the index of all searchable documents must be kept in memory. Otherwise, you would have to read the entire database and re-index every document each time a search query comes in.
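As a rough sketch of what that POST request could look like from the client, here is one way to call the route with fetch. The helper names buildSearchRequest and searchSite are made up for this example, the base URL assumes the port used above, and the code assumes the handler replies with JSON of the form { results: [...] }.

```javascript
// Hypothetical helper: builds the fetch() arguments for the /search route.
// The "search" body field is the one the handler in this tutorial reads.
function buildSearchRequest(query, baseUrl = "http://localhost:5173") {
  return {
    url: `${baseUrl}/search`,
    options: {
      method: "POST",
      // express.json() only parses bodies sent with a JSON content type
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ search: query }),
    },
  };
}

// Hypothetical helper: sends the query and resolves to the matching documents.
// Assumes the server answers with a JSON body shaped like { results: [...] }.
async function searchSite(query) {
  const { url, options } = buildSearchRequest(query);
  const response = await fetch(url, options);
  if (!response.ok) throw new Error(`Search failed: ${response.status}`);
  const { results } = await response.json();
  return results;
}
```

Calling searchSite("my query") from a search box handler would then give you the array of documents to render.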
Populating the Index
/search.js
We will be using the flexsearch Document index because this site is based on a document model; if your data looks different, pick the index type that fits it best. To start the server, we first need to load all of our documents from the database into flexsearch. I will be using pseudocode for the database part; just adapt it to your database.
import Document from "flexsearch/src/document.js";
import { Autocorrect } from "./autocorrect.js";

const corrector = new Autocorrect();

// Create the Document index with the fields that should be searchable
const findDocument = new Document({
  document: {
    // Add your own fields
    index: ["title", "description", "content"],
  },
});

// Fetch all searchable documents from the database.
// DB is pseudocode: replace it with your own database client.
const docs = DB.getAll("Documents");

// Add each document to the "findDocument" index and its text to the autocorrect word bank
for (const doc of docs) {
  findDocument.add(doc);
  corrector.add(doc.title);
  corrector.add(doc.description);
  corrector.add(doc.content);
}
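If you want to try the code before wiring up a real database, a small in-memory stand-in for the pseudocode DB object is enough. This is only a sketch: the sample records are invented, and the optional collection argument on getRecordsById is an assumption of this example. Each record carries an id field, which the Document index uses as its key by default.

```javascript
// Hypothetical in-memory stand-in for the pseudocode DB object.
// Replace it with real queries against your own database.
const records = {
  Documents: [
    {
      id: 1,
      title: "Getting started",
      description: "Setup guide",
      content: "Install the packages first.",
    },
    {
      id: 2,
      title: "Search basics",
      description: "How search works",
      content: "Queries are matched against an index.",
    },
  ],
};

const DB = {
  // Returns every record in the named collection (empty array if unknown)
  getAll(collection) {
    return records[collection] ?? [];
  },
  // Returns the records whose id is in `ids`, in the order the ids were given;
  // ids with no matching record are skipped
  getRecordsById(ids, collection = "Documents") {
    const byId = new Map(this.getAll(collection).map((doc) => [doc.id, doc]));
    return ids.map((id) => byId.get(id)).filter(Boolean);
  },
};
```

Returning the records in the order of the ids matters here, because it preserves the ranking that the search index produced.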
Auto Correct
/autocorrect.js
Our autocorrect function works by collecting every word from all of the documents and comparing each word of the search query against that word bank to find the most likely intended word. This relies on the assumption that every word in the documents is spelled correctly. An added benefit of this method is that if a word the user searches for is not in any document, it is replaced by the closest word that is, so the search still returns results.
import { closestMatch } from "closest-match";

export function Autocorrect() {
  // Word bank; a Set keeps every word only once
  const words = new Set();

  // Lowercases the text, strips non-alphanumeric characters, and splits it
  // on whitespace, dropping empty strings
  const tokenize = (text) =>
    String(text ?? "")
      .replace(/[^A-Za-z0-9\s]/g, "")
      .toLowerCase()
      .split(/\s+/)
      .filter(Boolean);

  // Adds every word of a document's text to the word bank
  this.add = (document) => {
    for (const word of tokenize(document)) words.add(word);
  };

  // Replaces each word of the query with its closest match in the word bank
  // and returns the corrected query
  this.fix = (text) => {
    const bank = [...words];
    return tokenize(text)
      // Keep the original word if no match is found (e.g. empty word bank)
      .map((word) => closestMatch(word, bank) ?? word)
      .join(" ");
  };
}
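To get a feel for what "closest match" means, here is a dependency-free sketch based on Levenshtein edit distance, which to my knowledge is the measure closest-match is built on. It is an illustration of the idea, not the package's actual code, and the nearestWord helper and sample word bank are invented for this example.

```javascript
// Levenshtein edit distance: the number of single-character insertions,
// deletions, or substitutions needed to turn `a` into `b`.
function levenshtein(a, b) {
  // previous[j] = distance between the first i-1 chars of a and the first j chars of b
  let previous = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const current = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      current[j] = Math.min(
        previous[j] + 1, // deletion
        current[j - 1] + 1, // insertion
        previous[j - 1] + cost // substitution
      );
    }
    previous = current;
  }
  return previous[b.length];
}

// Hypothetical helper: returns the word in `bank` with the smallest distance
// to `word`, or `word` itself when the bank is empty.
function nearestWord(word, bank) {
  let best = word;
  let bestDistance = Infinity;
  for (const candidate of bank) {
    const distance = levenshtein(word, candidate);
    if (distance < bestDistance) {
      best = candidate;
      bestDistance = distance;
    }
  }
  return best;
}

const bank = ["search", "engine", "document", "index"];
console.log(nearestWord("serch", bank)); // "search"
console.log(nearestWord("documnt", bank)); // "document"
```

Note that every query word is compared against the whole word bank, so the cost of a lookup grows with the number of unique words in your documents.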
Search Function
/search.js
Now for the actual searching part: we need to create the search function. We will use the npm package stopword to remove stopwords (filler words) from our query. This improves the search results because documents are no longer matched on words like "the". All of this code goes into the same search.js file as above, so all of the documents have already been loaded into the index and the autocorrector.
import { removeStopwords } from "stopword/dist/stopword.esm.mjs";

export function search(req, res) {
  // Read the query from the request body, defaulting to an empty string.
  // Fix misspelled words with the corrector, then remove stopwords.
  const text = String(req.body?.search ?? "");
  const query = removeStopwords(corrector.fix(text).split(" ")).join(" ");

  // Search the findDocument index and collect the document ids from every
  // field's results, using a Set to remove repeated ids.
  const ids = [
    ...new Set(findDocument.search(query).flatMap((e) => e.result)),
  ];

  // If no ids are returned, send an empty results array
  if (ids.length === 0) {
    res.json({ results: [] });
    return;
  }

  // Otherwise, get the documents from the database and send them to the client
  // (DB is pseudocode, as above)
  const results = DB.getRecordsById(ids);
  res.json({ results });
}
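The two data transformations inside the handler can be tried in isolation. The sketch below uses a deliberately tiny, made-up stopword list instead of the stopword package, and a mock of the per-field result shape that a Document index search returns; stripStopwords and uniqueIds are hypothetical helper names for this example only.

```javascript
// Hypothetical, deliberately tiny stopword list; the stopword package ships a far longer one
const STOPWORDS = new Set(["a", "an", "and", "in", "of", "the", "to"]);

// Drops filler words so only meaningful terms reach the index
function stripStopwords(query) {
  return query
    .toLowerCase()
    .split(/\s+/)
    .filter((word) => word && !STOPWORDS.has(word))
    .join(" ");
}

// Flattens per-field results into one list of unique ids, keeping first-seen order
function uniqueIds(fieldResults) {
  return [...new Set(fieldResults.flatMap((entry) => entry.result))];
}

// Mock of the search result shape: one entry per field that matched
const mockResults = [
  { field: "title", result: [3, 1] },
  { field: "content", result: [1, 7] },
];

console.log(stripStopwords("the history of the search engine")); // "history search engine"
console.log(uniqueIds(mockResults)); // [ 3, 1, 7 ]
```

Document 1 matched in both the title and the content, which is why the ids have to be deduplicated before the records are fetched.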