Tutorial: building a search engine that supports advanced search syntaxes
- It should support `AND`, `OR`, `NOT`, and perhaps brackets `()`.
- Another part of it, though, is about optimization and fuzzy searches.
  - Fast even for a large body of text.
  - Recognizes plurals.
  - Forgiving of minor typos.
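To make the last two requirements concrete, here is a minimal sketch of typo tolerance and plural handling in Python. It assumes plain Levenshtein edit distance and a very naive English plural stripper; the function names (`edit_distance`, `normalize`, `fuzzy_match`) are made up for illustration, and real engines typically use proper stemmers and faster index structures.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


def normalize(word: str) -> str:
    """Lowercase and strip a few common English plural endings (naive)."""
    w = word.lower()
    if len(w) > 4 and w.endswith("ies"):
        return w[:-3] + "y"                # queries -> query
    if len(w) > 3 and w.endswith("es") and w[-3] in "sxz":
        return w[:-2]                      # boxes -> box
    if len(w) > 2 and w.endswith("s") and not w.endswith("ss"):
        return w[:-1]                      # engines -> engine
    return w


def fuzzy_match(query: str, word: str, max_typos: int = 1) -> bool:
    """True if the normalized words differ by at most max_typos edits."""
    return edit_distance(normalize(query), normalize(word)) <= max_typos
```

With these definitions, `fuzzy_match("serch", "Search")` and `fuzzy_match("query", "queries")` both succeed, while `fuzzy_match("cat", "dog")` does not. Comparing every query word against every indexed word this way is slow, so it does not address the "fast for a large body of text" requirement on its own.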
Advanced search syntaxes
I have thought about this a lot in the past.
- Letting users search the database with a simple one-liner string (and let user decide which field to search)
- What features would you want for a `q` Querystring parser? (e.g. full-text-search, or more?)
The easiest way is to use lunr.js's syntaxes.
- Default connector is `AND`.
- To make an `OR`, use `?expression`.
- Search is normally case-insensitive, i.e. `a` and `A` mean the same thing.
- `+expression` means exact match, and case-sensitive.
- `-expression` means negation.
- Not only `:`, but also `>` and `<` are used to specify comparison. For example, `+foo:bar`, `count>1`.
- Date comparison is enabled.
  - Special keyword: `NOW`.
  - `+1h` means the next 1 hour. `-1h` means 1 hour ago.
  - Available units are `y` (year), `M` (month), `w` (week), `d` (day), `h` (hour), `m` (minute).
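The rules above can be sketched as a small term parser. This is only an illustration in Python, not the code from the playground: the `Term` dataclass, `TERM_RE` and `parse_query` are hypothetical names, and the sketch assumes terms are separated by whitespace, with no support for quoted phrases or brackets.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Optional prefix (+, -, ?), optional "field" plus operator, then the value.
TERM_RE = re.compile(
    r"^(?P<prefix>[+\-?])?(?:(?P<field>\w+)(?P<op>[:<>]))?(?P<value>.+)$"
)


@dataclass
class Term:
    value: str
    field: Optional[str] = None  # None means "search the default fields"
    op: str = ":"                # ":" (match), ">" or "<" (comparison)
    exact: bool = False          # "+" prefix: exact, case-sensitive match
    negate: bool = False         # "-" prefix: negation
    optional: bool = False       # "?" prefix: joined with OR instead of AND


def parse_query(q: str) -> List[Term]:
    """Split a one-line query on whitespace and parse each term."""
    terms = []
    for token in q.split():
        m = TERM_RE.match(token)
        if m is None:
            continue
        prefix = m.group("prefix")
        terms.append(Term(
            value=m.group("value"),
            field=m.group("field"),
            op=m.group("op") or ":",
            exact=prefix == "+",
            negate=prefix == "-",
            optional=prefix == "?",
        ))
    return terms
```

For example, `parse_query("+foo:bar count>1 -draft")` yields an exact match on `foo`, a greater-than comparison on `count`, and a negated term with no field. Case folding is deliberately left to whatever consumes the terms, since only that layer knows whether a term was marked exact.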
You can see my experiment and playground here.
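The date keywords from the syntax list (`NOW`, `+1h`, `-1h`) could be resolved to concrete timestamps roughly as follows. This is a sketch under assumptions: `resolve_date` is a hypothetical helper, it accepts only the three forms shown in the list, and years and months are approximated as 365 and 30 days because `timedelta` has no calendar-aware units.

```python
import re
from datetime import datetime, timedelta

# One unit of each supported suffix. "y" and "M" are approximations.
UNIT_TO_DELTA = {
    "y": timedelta(days=365),
    "M": timedelta(days=30),
    "w": timedelta(weeks=1),
    "d": timedelta(days=1),
    "h": timedelta(hours=1),
    "m": timedelta(minutes=1),
}

# A sign, a whole number, and one unit letter, e.g. "+1h" or "-30m".
OFFSET_RE = re.compile(r"^([+-])(\d+)([yMwdhm])$")


def resolve_date(expr: str, now: datetime) -> datetime:
    """Turn "NOW", "+1h" or "-1h" style expressions into a datetime."""
    if expr == "NOW":
        return now
    m = OFFSET_RE.match(expr)
    if m is None:
        raise ValueError(f"not a relative date: {expr!r}")
    sign, amount, unit = m.groups()
    delta = int(amount) * UNIT_TO_DELTA[unit]
    return now + delta if sign == "+" else now - delta
```

Passing `now` in explicitly keeps the function deterministic, which makes it easier to test. Note that the unit letters are case-sensitive: `M` is a month and `m` is a minute.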
Full text search and fuzzy search
I made a list, here.
- Algolia
- Elasticsearch, Lucene, Solr
- Google custom search
How does it compare to search engines with web crawlers?
- Yahoo
- Bing
- DuckDuckGo
- Yandex
- Baidu
What about pure JavaScript implementations?
- js-search
- lunr, elasticlunr
RDBMS and NoSQL features?
- SQLite FTS4, FTS5
- PostgreSQL plugin
- MongoDB
Or, some other implementations, like Python's Whoosh?
Implementing both together
It is easier if you use the features of an RDBMS or NoSQL database. PostgreSQL, MySQL and MongoDB allow you to create a full-text index on a TEXT column. SQLite does not; its FTS4 and FTS5 extensions use separate virtual tables instead.
Furthermore, PostgreSQL also has pgroonga, which not only supports more languages than the native tsvector, but can also index anything, including JSONB.
Now comes the algorithm for the syntax. I made it for PostgreSQL in another project.
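To give a feel for what such an algorithm involves, here is a hedged sketch in Python that turns already-parsed terms into a parameterized PostgreSQL `WHERE` clause. It is not the code from that project. The `Term` shape, `term_to_sql`, `build_where` and the `body` column are hypothetical, and it assumes that all `?` terms form a single `OR` group that is `AND`ed with the remaining terms.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Term:
    """One parsed search term (hypothetical shape, for illustration only)."""
    value: str
    field: Optional[str] = None  # None: search the full-text column
    op: str = ":"                # ":", ">" or "<"
    exact: bool = False          # "+" prefix
    negate: bool = False         # "-" prefix
    optional: bool = False       # "?" prefix


def term_to_sql(term: Term, columns: Set[str], text_column: str) -> Tuple[str, str]:
    """Return (sql_fragment, parameter) for a single term."""
    if term.field is None:
        # No field given: fall back to PostgreSQL full-text search.
        sql = f"to_tsvector('english', {text_column}) @@ plainto_tsquery('english', %s)"
        param = term.value
    elif term.field not in columns:
        # Field names are interpolated into SQL, so only allow known columns.
        raise ValueError(f"unknown field: {term.field}")
    elif term.op in (">", "<"):
        # Values are passed as strings; cast them as the column type requires.
        sql, param = f"{term.field} {term.op} %s", term.value
    elif term.exact:
        sql, param = f"{term.field} = %s", term.value
    else:
        # Default: case-insensitive substring match.
        sql, param = f"{term.field} ILIKE %s", f"%{term.value}%"
    if term.negate:
        sql = f"NOT ({sql})"
    return sql, param


def build_where(terms: List[Term], columns: Set[str],
                text_column: str = "body") -> Tuple[str, List[str]]:
    """AND the required terms together, then AND one OR group of "?" terms."""
    required, optional = [], []
    for term in terms:
        target = optional if term.optional else required
        target.append(term_to_sql(term, columns, text_column))
    clauses = [sql for sql, _ in required]
    params = [p for _, p in required]
    if optional:
        clauses.append("(" + " OR ".join(sql for sql, _ in optional) + ")")
        params.extend(p for _, p in optional)
    return (" AND ".join(clauses) or "TRUE"), params
```

The result is a clause with `%s` placeholders plus a matching parameter list, which is the style drivers such as psycopg2 accept in `cursor.execute(query, params)`. User-supplied values never end up in the SQL string itself; only whitelisted column names do.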