Fuzzy Search Explained

This blog explains how fuzzy search works and why it's sometimes funky!

badosz

badosz

3 min read

August 23, 2021

Hello!

Badosz here, I'm the new programmer in our team and wanted to share with you all how our new fuzzy search works and why it's so funky sometimes. I'll go also into technical details.

Currently we have over 100 items in Dank Memer and people search for them all the time (in commands like pls item, pls buy, pls sell etc.)

Motivation

With so many items and using them constantly people will make typos from time to time. For real grinders it can be really annoying so I decided to do something about this! It's also important to mention here that current fuzzy search is suppose to help you with small typos. It's not an AI and will not guess what item you want to search from random letters, although it's sometimes amazes me with its accuracy.

Core idea

When searching for something, the algorithm goes through every item and assigns "similarity" value (on scale 0-1). The highest similarity (0.999, 1) is given when item's ID or name match exactly the query.

badoszcard-> 1.0

Badosz's Card -> 0.999

Next, items with names/IDs that contain query with additional space before, after (or after and before) are assigned a similarity value 0.997-0.998. This is mostly to handle rare cases, example being "Tip Jar" and "Multi Colored Phallic Object". Before adding this step, both items contained "tip" in their ID and both received same "similarity" value, which resulted in Multi Colored Phallic Object being returned for "tip" query (because it was earlier in our item array). This step gave Tip Jar higher similarity as it has a space after "tip".

The last hardcoded similarity(0.996) is, as mentioned before, assigned when the query just exists in name or ID, for example "ados" -> "Badosz's Card".

What's next?

If every previous check fails, we calculate how many "edits" we need to do to get from query to item's name, for example from "Tip JaX" to "Tip Jar" we only need to do 1 edit, but to get to "Badosz's Card" that would be as many as 11 edits! We use this code to check how many edits have to be done:

code

After calculating edit steps we assign the similarity(edit steps divided by longer string). The returned item you see is the item with biggest similarity!

The Problem

One problem with this approach is that we will always return something, because everything will have a similarity (even if really small). For example "dsadksadlk" will result in "Padlock" with similarity of 0.3 (which is obviously very small). Therefore I added "minimum required similarity" of 0.65 for commands where you interact with items (buy, sell etc.). You can still type random things for fun in pls item command ;)

TheLazyTownie also discovered that when you use non-english characters the similarity of everything will be 0, so we will return the first item in the items array, for the time of writing this blog it's "New Player Pack".

Funny cases

One reddit user when writing about Badosz's Card terribly twisted my name to "bazozka". Turns out fuzzy search still finds the card with similarity of 0.38.

"Kable" also results in "garbage", which I honestly find quite funny.

Play with it yourself

I made a small website so you can play with the fuzzy search and see similarity values: https://g.badosz.com/dank/fuzzy

Important Note: In the actual bot we require search query to be a min. of 3 characters.