Configure "Long-Tail" Search

This section describes the Long-Tail Search feature, which returns correct search results for words that contain dashes or other non-alphabetic symbols. It can also correct, on the fly, the most typical errors customers make when typing complex product names.

For example, suppose the catalog contains the product Canon PowerShot SX500 IS. A customer may search for Canon PowerShot SX-500IS, which the default search will not find because it differs from the actual product label.

This happens because, during reindex, Magento by default uses only the exact product labels stored in the database, so the index contains nothing else. Products with complex names therefore become "ineligible" for search whenever a customer spells them differently.

This is where Long-Tail Search comes in. During both reindex and search, this feature recognizes keywords by pattern and replaces the matched characters either with an empty string or with other characters, "correcting" the customer's request in real time.

In the example above, SX500 IS can be converted to SX500IS during reindex, and during the search SX-500IS is also converted to SX500IS by replacing the '-' symbol with an empty string.

This way, the search can find a product by several different spellings of its name.
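
The idea can be illustrated with a minimal Python sketch. It is not the extension's code: the `normalize` helper and its separator set are assumptions made for illustration, and the real feature is driven by the configurable rules described below. The sketch only shows why applying the same replacement to both the indexed label and the search phrase makes them comparable.

```python
import re

# Assumed separator set for this illustration: dash and space.
SEPARATORS = re.compile(r"[\- ]")


def normalize(keyword: str) -> str:
    """Remove separators so that different spellings collapse to one form."""
    return SEPARATORS.sub("", keyword)


indexed = normalize("SX500 IS")    # what would be stored during reindex
searched = normalize("SX-500IS")   # what would be looked up during search

print(indexed, searched, indexed == searched)
```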

Configuring Long-Tail Search

Go to System / Search Management / Settings / Mirasvit Extensions / Search.
In the section Search Settings, go to the option Long tail.
There you can set up regular expressions that produce the required search results.

  • Match Expression - the regular expression(s) that select the words in which characters will be replaced.

    Matching is applied to the search index during indexing and to search phrases during a search. E.g. /([a-zA-Z0-9]*[\-\/][a-zA-Z0-9]*[\-\/]*[a-zA-Z0-9]*)/

  • Replace Expression - the regular expression(s) that select the characters to be replaced. It is applied to the results of "Match Expression". E.g. /[\-\/]/
  • Replace Char - the character that replaces the values found by "Replace Expression". E.g. empty value.
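
How the three settings work together can be sketched in Python. This is only an approximation under stated assumptions: the `apply_rule` function is hypothetical, the extension itself runs in PHP, and Python's `re` module takes patterns without the surrounding `/` delimiters, so they are dropped here. The sketch finds every fragment selected by the match expression and, inside that fragment only, substitutes whatever the replace expression finds.

```python
import re


def apply_rule(text, match_expr, replace_expr, replace_char=""):
    """Hypothetical helper: apply one long-tail rule to a piece of text."""
    replace_re = re.compile(replace_expr)

    def fix_fragment(found):
        # The replacement is treated as a literal string, not a template.
        return replace_re.sub(lambda _: replace_char, found.group(0))

    return re.sub(match_expr, fix_fragment, text)


# The example expressions from the option descriptions, without delimiters.
MATCH = r"([a-zA-Z0-9]*[\-\/][a-zA-Z0-9]*[\-\/]*[a-zA-Z0-9]*)"
REPLACE = r"[\-\/]"

print(apply_rule("Canon PowerShot SX-500IS", MATCH, REPLACE, ""))
```

Text outside the matched fragment, such as "Canon PowerShot", is left untouched because the replace expression is applied only to what the match expression selected.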

Long-Tail Search Examples

Here are some of the most useful cases of long-tail search, implemented as corresponding rules.

  • Automatically remove '-' symbol from product names

    Create a rule with the following parameters:

    • Match Expression - /[a-zA-Z0-9]*-[a-zA-Z0-9]*/
      Matched text: SX500-123, GLX-11A, GLZX-VXV, GLZ/123, GLZV 123, CNC-PWR1
    • Replace Expression - /-/
    • Replace Char - empty
      Result text: SX500123, GLX11A, GLZXVXV, GLZ/123, GLZV 123, CNCPWR1 (GLZ/123 and GLZV 123 contain no '-' and stay unchanged)
  • Automatically remove '-', '/' and space characters from product names

    Create a rule with the following parameters:

    • Match Expression - /[a-zA-Z0-9]*[ \-\/][a-zA-Z0-9]*/
      Matched text: SX500-123, GLX-11A, GLZX-VXV, GLZ/123, GLZV 123, CNC-PWR1
    • Replace Expression - /[ \-\/]/
    • Replace Char - empty
      Result text: SX500123, GLX11A, GLZXVXV, GLZ123, GLZV123, CNCPWR1
  • Automatically join all product names that contain separators into a single word

    Create a rule with the following parameters:

    • Match Expression - /[a-zA-Z0-9]*[-\/][a-zA-Z0-9]*([-\/][a-zA-Z0-9]*)?/
      Matched text: SX500-123, GLX-11A, GLZX-VXV, GLZ/123, GLZV-123-123, CNC-PWR1
    • Replace Expression - /[-\/]/
    • Replace Char - empty
      Result text: SX500123, GLX11A, GLZXVXV, GLZ123, GLZV123123, CNCPWR1
  • Automatically fix misspelled product names

    Create a rule with the following parameters:

    • Match Expression - /([a-zA-Z0-9]*[\- ][a-zA-Z0-9]*[\-][a-zA-Z0-9]*)/
      Matched text: VHC68B-80, VHC-68B-80, VHC68B80
    • Replace Expression - /[\- ]/
    • Replace Char - empty
      Result text: VHC68B80
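
The first three rules above can be tried out on a few of the sample names with a short Python sketch. The `apply_rule` and `run` helpers and the rule labels are hypothetical names introduced for this illustration; the patterns are copied from the rules without the `/` delimiters, and Python's `re` module only approximates how the extension evaluates them.

```python
import re


def apply_rule(text, match_expr, replace_expr, replace_char=""):
    """Hypothetical helper: replace inside every fragment the rule matches."""
    replace_re = re.compile(replace_expr)
    return re.sub(
        match_expr,
        lambda found: replace_re.sub(lambda _: replace_char, found.group(0)),
        text,
    )


# (match expression, replace expression) pairs taken from the rules above.
RULES = {
    "dash only": (r"[a-zA-Z0-9]*-[a-zA-Z0-9]*", r"-"),
    "dash, slash, space": (r"[a-zA-Z0-9]*[ \-\/][a-zA-Z0-9]*", r"[ \-\/]"),
    "joined name": (r"[a-zA-Z0-9]*[-\/][a-zA-Z0-9]*([-\/][a-zA-Z0-9]*)?", r"[-\/]"),
}

NAMES = ["SX500-123", "GLZ/123", "GLZV 123", "GLZV-123-123"]


def run(rule_name):
    """Apply one named rule to every sample name, one name at a time."""
    match_expr, replace_expr = RULES[rule_name]
    return [apply_rule(name, match_expr, replace_expr) for name in NAMES]


for rule_name in RULES:
    print(rule_name, run(rule_name))
```

Comparing the three output lines shows which separators each rule touches: only the second rule changes "GLZV 123", because it is the only one whose expressions include the space character.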

Moving Long-Tail Expressions from M1 to M2

Long-Tail expressions used in Search Sphinx for M1 and for M2 differ slightly.

In M1 Search Sphinx, you can enter one or more expressions to match, separated by the '|' character. In M2, you cannot.

Consider the following expression for Search Sphinx for M1:

Example

Match Expression: /[a-zA-Z0-9][ -/][a-zA-Z0-9]([ -/][a-zA-Z0-9]*)?/|/[a-zA-Z]{1,3}[0-9]{1,3}/
Replace Expression: /[ -/]/|/([a-zA-Z]{1,3})([0-9]{1,3})/
Replace Char: $1 $2

It actually contains two separate regular expressions to match, /[a-zA-Z0-9][ -/][a-zA-Z0-9]([ -/][a-zA-Z0-9]*)?/ and /[a-zA-Z]{1,3}[0-9]{1,3}/, each with its own replace expression.

You need either to rework that expression so that it matches as a single expression, or to rewrite the rule as a set of two:

  • First rule

    This rule will implement the first part of the original M1 expression.

    • Match Expression: /[a-zA-Z0-9][ -/][a-zA-Z0-9]([ -/][a-zA-Z0-9]*)?/
    • Replace Expression: /[ -/]/
    • Replace Char: $1 $2
  • Second rule

    This rule will implement the second part of the original M1 expression.

    • Match Expression: /[a-zA-Z]{1,3}[0-9]{1,3}/
    • Replace Expression: /([a-zA-Z]{1,3})([0-9]{1,3})/
    • Replace Char: $1 $2
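
When many M1 rules have to be migrated, the split can be prepared with a small script. The sketch below is an assumption-laden illustration, not a Mirasvit tool: `split_m1` is a hypothetical helper that cuts a combined M1 value only at a '|' standing between a closing '/' and an opening '/', and then pairs each match expression with its replace expression.

```python
import re


def split_m1(expression):
    """Hypothetical helper: split 'A|B' where A and B are /.../ expressions.

    A '|' is treated as a separator only when it sits directly between two
    '/' delimiters, so alternations inside a pattern are normally kept.
    """
    return re.split(r"(?<=/)\|(?=/)", expression)


M1_MATCH = "/[a-zA-Z0-9][ -/][a-zA-Z0-9]([ -/][a-zA-Z0-9]*)?/|/[a-zA-Z]{1,3}[0-9]{1,3}/"
M1_REPLACE = "/[ -/]/|/([a-zA-Z]{1,3})([0-9]{1,3})/"

# One (match, replace) pair per future M2 rule.
rules = list(zip(split_m1(M1_MATCH), split_m1(M1_REPLACE)))

for number, (match_expr, replace_expr) in enumerate(rules, start=1):
    print(f"Rule {number}: Match Expression: {match_expr}  Replace Expression: {replace_expr}")
```

A pattern that itself contains the sequence "/|/" would be split in the wrong place, so the output should be checked by hand before the rules are entered in M2.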