Installing Sphinx with Lithuanian stemmer
August 21, 2013
This page was initially written in Lithuanian. The examples contain Lithuanian phrases.
Sphinx is an open source text search engine.
Although speed is one of Sphinx's main features, another valuable one is the ability to build a text index with word endings cut off (stemmed).
This makes it possible to search for a word without knowing its exact ending.
For instance, the Lithuanian word stalas would be reduced to the stem stal, and it could then be found using other forms of the word:
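As a rough illustration of the idea (this is a toy suffix stripper, not the Snowball algorithm, and the ending list is made up for this one word), several case forms of stalas can be reduced to the same stem like this:

```shell
#!/bin/sh
# Toy suffix stripper: NOT the Snowball algorithm, only an illustration.
# The ending list below is an assumption chosen for the word "stalas".
toy_stem() {
    word=$1
    # Try longer endings first so "ui" is checked before "u".
    for suffix in as ui o u e; do
        case "$word" in
            ?*"$suffix")
                echo "${word%"$suffix"}"
                return
                ;;
        esac
    done
    # No known ending: return the word unchanged.
    echo "$word"
}

# Nominative, genitive, dative, instrumental and locative forms.
for form in stalas stalo stalui stalu stale; do
    printf '%s -> %s\n' "$form" "$(toy_stem "$form")"
done
```

Every form maps to stal, so a query for any of them would hit the same index entry. A real stemmer uses far more careful rules than this.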
Normally Sphinx can be installed from a package repository, but since the stemmer is not a standard Sphinx feature and the
Snowball Libstemmer library is used instead, Sphinx has to be compiled from source.
Let’s begin :)
We will need these packages (they can be installed from the repository):
libmysqlclient-dev (Debian/Ubuntu distributions) or
mysql-devel in other distributions (RedHat/CentOS)
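A small sketch for picking the right package, assuming the system has either apt-get or yum. The function name is made up for this example, and it only prints the command instead of running it:

```shell
#!/bin/sh
# Print the command that installs the MySQL client headers on this system.
# mysql_dev_install_cmd is a hypothetical helper name.
mysql_dev_install_cmd() {
    if command -v apt-get >/dev/null 2>&1; then
        echo "apt-get install libmysqlclient-dev"
    elif command -v yum >/dev/null 2>&1; then
        echo "yum install mysql-devel"
    else
        echo "unknown package manager, install the MySQL client headers manually" >&2
        return 1
    fi
}

# Only prints the command; run it yourself as root (or via sudo).
mysql_dev_install_cmd || true
```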
Download the Sphinx source from http://sphinxsearch.com/downloads/ (in this case - version
Download the Libstemmer C version from my repo https://github.com/plutzilla/sphinx-libstemmer (
git clone git@github.com:plutzilla/sphinx-libstemmer.git .) and put it to
It is not necessary to compile Libstemmer separately (it will be compiled together with Sphinx), but if we do so, the program
stemwords will be created. It can be used to check how the stemmer works, e.g.:
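A minimal sketch of such a check, assuming stemwords is on the PATH, reads one word per line from standard input, and accepts the algorithm name through -l. The name lithuanian is an assumption, so use whatever name your libstemmer build registers. No output is shown here because it depends on the stemmer version.

```shell
#!/bin/sh
# Stem a list of words with stemwords, one word per output line.
# Usage: stem_words LANGUAGE WORD...
# stem_words is a hypothetical wrapper written for this example.
stem_words() {
    if ! command -v stemwords >/dev/null 2>&1; then
        echo "stemwords not found in PATH" >&2
        return 127
    fi
    lang=$1
    shift
    # One word per line on stdin, stems come back one per line.
    printf '%s\n' "$@" | stemwords -l "$lang"
}

# "lithuanian" is an assumed algorithm name; adjust it to your build.
stem_words lithuanian stalas stalo stalui || true
```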
Of course, some words are stemmed incorrectly:
Normally Sphinx is installed to
/usr/local. If you want to change the installation path, you can provide
As always, compile with the command
make, then install using the command
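The whole build could be sketched as below, assuming it is run from the Sphinx source directory with the Libstemmer sources already in place. --with-libstemmer is the Sphinx configure switch that enables libstemmer support and --prefix is the standard autoconf option. PREFIX, RUN and the helper names are made up for this sketch. By default it only prints the commands.

```shell
#!/bin/sh
# Build sketch. PREFIX and RUN are hypothetical variables for this example.
PREFIX="${PREFIX:-/usr/local}"   # /usr/local is the default install path

# Print each command; execute it only when RUN=1 is set explicitly.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

build_sphinx() {
    run ./configure --with-libstemmer --prefix="$PREFIX" &&
    run make &&
    run make install   # may need root, depending on PREFIX
}

# Dry run by default; use RUN=1 to really build and install.
build_sphinx
```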
To use Sphinx from other libraries, we need to compile the Sphinx client library:
To use Sphinx from PHP, we need to install the Sphinx PECL library:
To be able to use PECL (to install packages from the PECL repository), the following packages must be installed first:
After we install the Sphinx PECL library, we need to add the sphinx extension to the
php.ini file (paste
extension=sphinx.so) and reload PHP: if PHP runs as an Apache module, restart Apache; if it runs as FastCGI, restart the FastCGI or FPM service.
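To confirm that the extension is picked up, one option is to look for it in the module list printed by php -m. The helper name below is made up for this sketch, and it checks the command-line PHP, which may read a different php.ini than the web server does.

```shell
#!/bin/sh
# Return 0 if the CLI PHP reports the given extension as loaded.
# php_has_extension is a hypothetical helper name.
php_has_extension() {
    command -v php >/dev/null 2>&1 || return 2
    # php -m prints one module name per line; match the whole line,
    # ignoring case.
    php -m | grep -qix "$1"
}

if php_has_extension sphinx; then
    echo "sphinx extension is loaded"
else
    echo "sphinx extension not found (or php is not installed)"
fi
```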
To start Sphinx on system boot, we need an init script. Create the file
/etc/init.d/search.d with the following content:
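What follows is a minimal sketch of such a script, not the original one from this page. It assumes searchd was installed under the default /usr/local prefix and relies on searchd --stop to shut down the running daemon. SEARCHD and searchd_init are names made up for this example.

```shell
#!/bin/sh
# Minimal init-script sketch; not the original script from this page.
# SEARCHD is an assumed path (default /usr/local prefix).
SEARCHD="${SEARCHD:-/usr/local/bin/searchd}"

searchd_init() {
    case "$1" in
        start)
            "$SEARCHD"
            ;;
        stop)
            "$SEARCHD" --stop
            ;;
        restart)
            "$SEARCHD" --stop
            sleep 1
            "$SEARCHD"
            ;;
        *)
            echo "Usage: $0 {start|stop|restart}" >&2
            return 1
            ;;
    esac
}

# Dispatch only when an action was given on the command line.
if [ "$#" -gt 0 ]; then
    searchd_init "$@"
fi
```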
If we want to keep the Sphinx configuration file in a non-default (
/usr/local/sphinx/etc/sphinx.conf) location, we need to pass the parameter
Also, if we want to run Sphinx as a non-root user, searchd can be started using the following command (put it in the init script):
After creating the init script, we need to give it execute permission and update the rc.d configuration:
To use the Lithuanian stemmer, we need to add this to the Sphinx configuration file:
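As a sketch, the relevant index setting could look like the fragment printed below. Sphinx selects a libstemmer algorithm with morphology = libstemmer_XX. The suffix lt and the index name are assumptions here, so check which name your libstemmer build registers.

```shell
#!/bin/sh
# Print a hypothetical index fragment that enables the stemmer.
# print_morphology_fragment and example_index are made-up names.
print_morphology_fragment() {
    cat <<'EOF'
index example_index
{
    # ... source, path and other settings go here ...

    # libstemmer_XX picks a libstemmer algorithm by name;
    # "lt" is an assumed name for the Lithuanian stemmer.
    morphology = libstemmer_lt
}
EOF
}

print_morphology_fragment
```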
It is also useful to convert Lithuanian characters to Latin ones (transliterate). To do so, add this to the index config:
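One way to express this is a charset_table that folds each Lithuanian letter to its plain Latin counterpart. The fragment below is a sketch rather than the original configuration. The base ranges on the first line are a common starting point, not something taken from this page.

```shell
#!/bin/sh
# Print a hypothetical charset_table that maps Lithuanian letters
# (both cases) to plain lowercase Latin letters.
print_charset_fragment() {
    cat <<'EOF'
charset_table = 0..9, A..Z->a..z, _, a..z, \
    U+0104->a, U+0105->a, U+010C->c, U+010D->c, \
    U+0118->e, U+0119->e, U+0116->e, U+0117->e, \
    U+012E->i, U+012F->i, U+0160->s, U+0161->s, \
    U+0172->u, U+0173->u, U+016A->u, U+016B->u, \
    U+017D->z, U+017E->z
EOF
}

print_charset_fragment
```

With such a table, a query typed without diacritics matches text that contains them, because both are folded to the same characters at index and search time.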
I am not covering how to use Sphinx, create indices, or index text here; you can find this information in the Sphinx documentation or in these manuals:
Huge thanks to the lt stemmer initiative and to Linas Valiukas.