Like
LIKE
searches with pattern matching within strings, case-insensitive in MySQL by default
In small databases
LIKE
can be used for quick, non-indexed searches.
Avoid
LIKE
Large datasets: it can be slow because it performs a full table scan without an index.
Complex search logic: use Full-text Search or other search systems like Elasticsearch.
Full-text Search
- Full-text search uses full-text index, enabling efficient searches and relevance ranking (the default mode in MySQL)
Shiny point
Great for large datasets containing textual information.
Minimum word length
The default minimum word length for indexing is 4 characters. Words shorter than this are excluded from the index and will not be searchable using the full-text search functionality
We can change this with
ft_min_word_len
system variable in the MySQL configuration file -SET GLOBAL ft_min_word_len = <desired_length>;
. Use with care, as this increases the noise in search results and consumes more resources.
Stopwords
Common words (e.g., “the”, “is”, “and”) are ignored by default because they are in the stopword list. You can customize the stopword list.
Code
OPTIMIZE TABLE your_table_name;
, often used whenft_min_word_len
or stop list are changed.
Adding FULLTEXT Index to Existing Table
Relevance Ranking
Factors affecting the score
Term Frequency (TF): The more frequently a search term appears in the column, the higher the score.
Field Length: The longer the field, the lower the score for a given term. In large documents, terms might not contribute as much to the relevance score.
Position of the Term: If the search term appears near the beginning of the column, it may have a slightly higher score.
Inverse Document Frequency (IDF): The rarer the term (i.e., the fewer rows it appears in), the higher the score.
Tokenization
Special characters like colons are often treated as delimiters, and only the numeric part (
20093
) is indexed and searched.Characters like
::
don’t influence the search result if they are ignored during tokenization, which is why the search for20093
can matchBUSINESS_CENTER::20093
.Numeric Matching: Full-text search matches the numeric token
20093
even if it is part of a more complex string (likeBUSINESS_CENTER::20093
).