Like
-- Retrieve all usernames starting with "john"
SELECT * FROM users
WHERE username LIKE 'john%';
-- Find all users with `.com` email addresses
SELECT * FROM users
WHERE email LIKE '%.com';
-- Find all products containing the word "phone" in their name
SELECT * FROM products
WHERE product_name LIKE '%phone%';
-- Find employees who don't use the company's email domain
SELECT * FROM employees
WHERE email NOT LIKE '%@company.com';
-- Find all customers with phone numbers starting with "9895"
SELECT * FROM customers
WHERE phone_number LIKE '9895-____';
LIKE
searches with pattern matching within strings, case-insensitive in MySQL by default
In small databases
LIKE
can be used for quick, non-indexed searches.
Avoid
LIKE
Large datasets: it can be slow because it performs a full table scan without an index.
Complex search logic: use Full-text Search or other search systems like Elasticsearch.
Full-text Search
-- Seearch for job postings matching skills and preferences
SELECT job_title, job_description
FROM jobs
WHERE MATCH(job_title, job_description) AGAINST('software engineer remote' IN NATURAL LANGUAGE mode);
- Full-text search uses full-text index, enabling efficient searches and relevance ranking (the default mode in MySQL)
Shiny point
Great for large datasets containing textual information.
Minimum word length
The default minimum word length for indexing is 4 characters. Words shorter than this are excluded from the index and will not be searchable using the full-text search functionality
We can change this with
ft_min_word_len
system variable in the MySQL configuration file -SET GLOBAL ft_min_word_len = <desired_length>;
. Use with care, as this increases the noise in search results and consumes more resources.
Stopwords
Common words (e.g., “the”, “is”, “and”) are ignored by default because they are in the stopword list. You can customize the stopword list.
Code
OPTIMIZE TABLE your_table_name;
, often used whenft_min_word_len
or stop list are changed.
No fuzzy search!
Full-text search is a form of keyword-based search.
Word-Based Search
- searching for “cat” will return entries containing “cat” but not “catalog” unless stemming is applied.
Language-Aware Features
- Stemming: Matches variations of a word by reducing them to their root form (e.g., “running,” “runs,” and “ran” might match “run”).
- Stop Words: Common words like “the,” “and,” and “is” may be ignored during searches to improve efficiency.
- Synonym Support: Some full-text search engines allow custom synonym lists.
Relevance-Based Ranking:
- a match in a title might rank higher than a match in the body text
Adding FULLTEXT Index to Existing Table
-- Create single-col index
ALTER TABLE table_name
ADD FULLTEXT INDEX fulltext_idx_name(col_1);
-- Create multi-cols index, must search with multi-cols too
ALTER TABLE table_name
ADD FULLTEXT INDEX fulltext_idx_name(col_1, col_2);
Relevance Ranking
-- Obtain the relevance score,
-- put this as part of the selected columns
MATCH(title, content) AGAINST('MySQL optimization') AS relevance_score
Factors affecting the score
Term Frequency (TF): The more frequently a search term appears in the column, the higher the score.
Field Length: The longer the field, the lower the score for a given term. In large documents, terms might not contribute as much to the relevance score.
Position of the Term: If the search term appears near the beginning of the column, it may have a slightly higher score.
Inverse Document Frequency (IDF): The rarer the term (i.e., the fewer rows it appears in), the higher the score.
Tokenization
Special characters like colons are often treated as delimiters, and only the numeric part (
20093
) is indexed and searched.Characters like
::
don’t influence the search result if they are ignored during tokenization, which is why the search for20093
can matchBUSINESS_CENTER::20093
.Numeric Matching: Full-text search matches the numeric token
20093
even if it is part of a more complex string (likeBUSINESS_CENTER::20093
).