UPDATE: New post on getting Multicore Solr 3.4 running on Ubuntu 10.04

Been working a lot lately with the Apache Solr project.

Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world’s largest internet sites.

Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Tomcat. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language. Solr’s powerful external configuration allows it to be tailored to almost any type of application without Java coding, and it has an extensive plugin architecture when more advanced customization is required.

One of the features of Solr is called multicore. In Solr, multicore means hosting several cores inside a single Solr instance running in one servlet container. Each core has its own configuration and index, yet all of them are administered through one interface. The Solr wiki defines it as:

Multiple cores let you have a single Solr instance with separate configurations and indexes, with their own config and schema for very different applications, but still have the convenience of unified administration. Individual indexes are still fairly isolated, but you can manage them as a single application, create new indexes on the fly by spinning up new SolrCores, and even make one SolrCore replace another SolrCore without ever restarting your Servlet Container.

I’ve set up a few instances of Solr using Tomcat, so I thought I’d write out just how easy it is to get Solr up and running on Ubuntu Server 10.04, and talk about some of the scripts I’ve written to make adding, removing and reloading cores easier. This post assumes you have already installed Ubuntu Server with internet access and have a basic understanding of how to use Ubuntu and Linux in general.

Installing Solr
On your Ubuntu server, become root using ‘sudo su -’ and issue the following command:

apt-get install solr-tomcat curl -y

This installs Solr from Ubuntu’s repositories and also installs and configures Tomcat. At this point you have a fully working Solr installation that only needs to be tweaked for your environment. Solr itself lives in three places: /usr/share/solr, /var/lib/solr and /etc/solr. These directories contain the Solr home directory, the data directory and the configuration data, respectively.

Enable Multicore
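Enabling multicore is as simple as creating solr.xml in the /usr/share/solr directory and restarting Tomcat. Once you’ve done this, a restart is only needed for changes outside the individual cores, such as adding jars to the shared lib directory. Under normal operations you should never need to restart Tomcat, because cores are created, reloaded, swapped and unloaded through Solr’s CoreAdmin handler while Tomcat keeps running.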

Using your favorite text editor, create a file called solr.xml in /usr/share/solr with the following contents:

<solr persistent="true" sharedLib="lib">
 <cores adminPath="/admin/cores">
 </cores>
</solr>
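
If you prefer to script this step rather than open an editor, a minimal sketch is a bash function that writes the same file with a heredoc. The function name write_solr_xml is only illustrative, not part of Solr or Ubuntu; it takes an optional target path so it can be tried on a scratch file first.

```shell
#!/bin/bash
# Hypothetical helper: writes the empty multicore solr.xml shown above.
# Default target is /usr/share/solr/solr.xml; pass another path to try it elsewhere.
write_solr_xml() {
  local target="${1:-/usr/share/solr/solr.xml}"
  # The quoted 'EOF' stops the shell from expanding anything inside the XML
  cat > "$target" <<'EOF'
<solr persistent="true" sharedLib="lib">
 <cores adminPath="/admin/cores">
 </cores>
</solr>
EOF
}

# Only touch the real file when run (as root) with the "install" argument
if [ "$1" = "install" ]; then
  write_solr_xml
fi
```

Saved to a file of your choosing and run as root with the argument install, it writes /usr/share/solr/solr.xml; run without arguments it does nothing.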

Next, you need to ensure that Tomcat is able to write out new versions of solr.xml, because Solr updates this file as cores are added or removed. The following commands give Tomcat write permission on the file and on its directory:

chown tomcat6:tomcat6 /usr/share/solr/solr.xml
chown tomcat6:tomcat6 /usr/share/solr

That’s it.  You can now issue the following command to restart Tomcat and in turn Solr:

service tomcat6 restart
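
To confirm that Tomcat came back up with multicore enabled, you can ask the CoreAdmin handler configured in solr.xml for the STATUS of all cores. The sketch below assumes Solr answers at http://localhost:8080/solr (set SOLR_URL otherwise); core_admin_url and check_solr are hypothetical helper names.

```shell
#!/bin/bash
# Assumption: Solr is reachable at this base URL; override with SOLR_URL=...
SOLR_URL="${SOLR_URL:-http://localhost:8080/solr}"

# Build a CoreAdmin URL from an action and optional key=value parameters,
# e.g. core_admin_url STATUS core=core0
core_admin_url() {
  local action="$1"; shift
  local url="$SOLR_URL/admin/cores?action=$action"
  local param
  for param in "$@"; do
    url="$url&$param"
  done
  echo "$url"
}

# Ask the CoreAdmin handler for the status of all cores
check_solr() {
  # -f makes curl return an error exit code on HTTP error responses
  if curl -sf "$(core_admin_url STATUS)" > /dev/null; then
    echo "CoreAdmin handler is responding"
  else
    echo "No answer from $(core_admin_url STATUS); check the Tomcat logs" >&2
    return 1
  fi
}

if [ "$1" = "check" ]; then
  check_solr
fi
```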

Managing Cores
At this point you’re ready to start creating new cores. Before you can do so, however, you need to create config files and directories and set permissions. To make this process a bit easier, I created a set of scripts that do all of this for you based on a template config directory.

Create the template config directory by issuing the following command:

cp -av /etc/solr/conf /etc/solr/conftemplate

Next, edit /etc/solr/conftemplate/solrconfig.xml and find the dataDir option. Change the dataDir line from:

<dataDir>/var/lib/solr/data</dataDir>

To:

<dataDir>/var/lib/solr/data/CORENAME</dataDir>
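
The same edit can be made non-interactively with sed. This sketch assumes the template still contains the stock dataDir line exactly as shown above; templatize_datadir is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: replaces the stock dataDir line with the CORENAME placeholder.
# Assumes the file contains exactly <dataDir>/var/lib/solr/data</dataDir>.
templatize_datadir() {
  local config="${1:-/etc/solr/conftemplate/solrconfig.xml}"
  # '#' as the sed delimiter avoids escaping every '/' in the paths
  sed -i 's#<dataDir>/var/lib/solr/data</dataDir>#<dataDir>/var/lib/solr/data/CORENAME</dataDir>#' "$config"
  # Report whether the placeholder is now present (running twice is harmless)
  if grep -q '<dataDir>/var/lib/solr/data/CORENAME</dataDir>' "$config"; then
    echo "dataDir placeholder set in $config"
  else
    echo "dataDir line not found in $config; edit it by hand" >&2
    return 1
  fi
}

if [ "$1" = "apply" ]; then
  templatize_datadir
fi
```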

The newCore script below replaces CORENAME with the name of each new core, which gives every core its own data directory.

Creating a new Core

Below is the newCore script. Copy and paste it into a file called newCore and make it executable with chmod +x newCore.

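#!/bin/bash

# creates a new Solr core from the /etc/solr/conftemplate directory
if [ "$1" = "" ]; then
  echo -n "Name of core to create: "
  read name
else
  name=$1
fi

if [ -z "$name" ]; then
  echo "No core name given"
  exit 1
fi

# data directory for the new core, writable by Tomcat
mkdir "/var/lib/solr/data/$name"
chown tomcat6:tomcat6 "/var/lib/solr/data/$name"

# copy the template config and point dataDir at this core's data directory
mkdir -p "/etc/solr/conf/$name/conf"
cp -a /etc/solr/conftemplate/* "/etc/solr/conf/$name/conf/"
sed -i "s/CORENAME/$name/" "/etc/solr/conf/$name/conf/solrconfig.xml"

# register the new core with Solr through the CoreAdmin handler
curl "http://localhost:8080/solr/admin/cores?action=CREATE&name=$name&instanceDir=/etc/solr/conf/$name"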
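
Because the core name ends up in directory paths, in a sed expression and in a query string, it can help to restrict which names are accepted. The rule below (letters, digits, underscores and hyphens only) is a choice made for this sketch, not a Solr requirement; valid_core_name is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical check: accept only letters, digits, '_' and '-' in core names,
# so a name can never inject '/', '&' or spaces into paths, sed or URLs.
valid_core_name() {
  [[ "$1" =~ ^[A-Za-z0-9_-]+$ ]]
}

# When run with an argument, report whether it is an acceptable name
if [ -n "$1" ]; then
  if valid_core_name "$1"; then
    echo "'$1' is a usable core name"
  else
    echo "'$1' is not a usable core name" >&2
    exit 1
  fi
fi
```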

You can now create a new core by running the script as root with the name of the core:

./newCore core0

On screen you should get something similar to this if it was successful:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">352</int></lst><str name="core">core0</str><str name="saved">/usr/share/solr/solr.xml</str>
</response>
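
Rather than reading the XML by eye, a script can check the status value in the responseHeader, where 0 means success. This minimal sketch simply greps for that element; core_admin_ok is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: reads a CoreAdmin XML response on stdin and succeeds
# only if the responseHeader reports status 0.
core_admin_ok() {
  grep -q '<int name="status">0</int>'
}

# Example use, with the CREATE URL from the newCore script:
#   curl -s "http://localhost:8080/solr/admin/cores?action=CREATE&name=$name&instanceDir=/etc/solr/conf/$name" \
#     | core_admin_ok || echo "Core creation failed" >&2
```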

If you get any other response, particularly one about permissions, go back and review this post as you’ve most likely missed something.

The script has created a new Solr core whose configuration directory is /etc/solr/conf/core0/conf. That is where you edit the core’s schema.xml file; reload the core afterwards so Solr picks up the changes. To view the default schema.xml file, visit http://localhost:8080/solr/core0/admin/, replacing localhost with the hostname or IP address of your Solr server if Solr runs on another machine.

Next time I’ll talk about how to import documents into a core as well as how to reload a core, swap cores or remove/unload a core and merge the index between two or more cores.

Update:  Here are the rest of the scripts I’ve written for Solr

Reload a Core

Save to a file called reloadCore

#!/bin/bash

# reloads a Solr core
if [ "$1" = "" ]; then
  echo -n "Name of core to reload: "
  read name
else
  name=$1
fi

# test for an empty name first; an unquoted empty $name breaks the [ ] test
if [ -z "$name" ] || [ ! -d "/var/lib/solr/data/$name" ]; then
  echo "Core doesn't exist"
  exit 1
fi

curl "http://localhost:8080/solr/admin/cores?action=RELOAD&core=$name"
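
If you need to reload every core at once, one approach is to read the core names back from solr.xml. This sketch assumes the persisted file lists each core as a <core name="..." .../> element, which is how Solr writes it when persistent="true"; list_cores and reload_all are hypothetical helper names.

```shell
#!/bin/bash
# Assumption: solr.xml lives here and lists cores as <core name="..." .../>
SOLR_XML="${SOLR_XML:-/usr/share/solr/solr.xml}"

# Print one core name per line; matching '<core ' (with the space) skips <cores>
list_cores() {
  grep -o '<core [^>]*name="[^"]*"' "${1:-$SOLR_XML}" \
    | sed 's/.*name="\([^"]*\)".*/\1/'
}

# Send a RELOAD request for every core found in solr.xml
reload_all() {
  local name
  for name in $(list_cores); do
    echo "Reloading $name"
    curl -s "http://localhost:8080/solr/admin/cores?action=RELOAD&core=$name"
  done
}

if [ "$1" = "all" ]; then
  reload_all
fi
```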

Swap Cores

Save to a file called swapCores

#!/bin/bash

# swaps two Solr cores
if [ "$2" = "" ]; then
  echo -n "Name of first core: "
  read name1
  echo -n "Name of second core: "
  read name2
else
  name1=$1
  name2=$2
fi

# both cores must be named and must have a data directory
for name in "$name1" "$name2"; do
  if [ -z "$name" ] || [ ! -d "/var/lib/solr/data/$name" ]; then
    echo "Core '$name' doesn't exist"
    exit 1
  fi
done

curl "http://localhost:8080/solr/admin/cores?action=SWAP&core=$name1&other=$name2"

Unload/Delete a Core

Save to a file called unloadCore

#!/bin/bash

clear
echo "*************************************************************************"
echo "*************************************************************************"
echo
echo "            You are about to *permanently* delete a core!"
echo "                      There is no going back"
echo
echo "*************************************************************************"
echo "*************************************************************************"
echo
echo -n "Type 'delete core' to continue or control-c to bail: "
read answer

if [ "$answer" != "delete core" ]; then
 exit
fi
# removes a Solr core
if [ "$1" = "" ]; then
 echo -n "Name of core to remove: "
 read name
else
 name=$1
fi

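# an empty name would make the rm -rf lines below delete the parent directories
if [ -z "$name" ] || [ ! -d "/var/lib/solr/data/$name" ]; then
 echo "Core doesn't exist"
 exit 1
fi

curl "http://localhost:8080/solr/admin/cores?action=UNLOAD&core=$name"
# give Solr a moment to close the core before deleting its files
sleep 5
rm -rf "/var/lib/solr/data/$name"
rm -rf "/etc/solr/conf/$name"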
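
Since unloadCore deletes directories with rm -rf, an extra guard can make sure it only ever removes a single, existing core directory and never the parent directory itself. safe_rm_core is a hypothetical helper, sketched under the assumption that core names never contain a slash.

```shell
#!/bin/bash
# Hypothetical guard: delete $parent/$name only if $name is a plain,
# non-empty directory name and that directory actually exists.
safe_rm_core() {
  local parent="$1" name="$2"
  case "$name" in
    ""|.|..|*/*)
      echo "Refusing to delete '$parent/$name'" >&2
      return 1
      ;;
  esac
  if [ ! -d "$parent/$name" ]; then
    echo "'$parent/$name' is not a directory" >&2
    return 1
  fi
  rm -rf -- "$parent/$name"
}

# In unloadCore the two rm lines would become:
#   safe_rm_core /var/lib/solr/data "$name"
#   safe_rm_core /etc/solr/conf "$name"
```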

Merge Cores

Save to a file called mergeCores

#!/bin/bash

# merges the index of the second core into the first core
if [ "$2" = "" ]; then
  echo -n "Name of first core: "
  read name1
  echo -n "Name of second core: "
  read name2
else
  name1=$1
  name2=$2
fi

# both cores must be named and must have a data directory
for name in "$name1" "$name2"; do
  if [ -z "$name" ] || [ ! -d "/var/lib/solr/data/$name" ]; then
    echo "Core '$name' doesn't exist"
    exit 1
  fi
done

# commit both cores so all pending documents are on disk before merging
curl "http://localhost:8080/solr/$name1/update" --data-binary '<commit/>' -H 'Content-type:text/xml; charset=utf-8'
curl "http://localhost:8080/solr/$name2/update" --data-binary '<commit/>' -H 'Content-type:text/xml; charset=utf-8'
# add the second core's index to the first core's index
curl "http://localhost:8080/solr/admin/cores?action=MERGEINDEXES&core=$name1&indexDir=/var/lib/solr/data/$name2/index"
# commit again so the merged documents become visible to searches
curl "http://localhost:8080/solr/$name1/update" --data-binary '<commit/>' -H 'Content-type:text/xml; charset=utf-8'
curl "http://localhost:8080/solr/$name2/update" --data-binary '<commit/>' -H 'Content-type:text/xml; charset=utf-8'
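
The MERGEINDEXES action also accepts several indexDir parameters, which covers merging more than two cores in one request. This sketch assumes the /var/lib/solr/data/<core>/index layout used throughout this post; merge_url and merge_many are hypothetical helper names.

```shell
#!/bin/bash
# Build a MERGEINDEXES URL: first argument is the target core, the rest are
# source cores. Assumes each core's index is at /var/lib/solr/data/<core>/index.
merge_url() {
  local target="$1"; shift
  local url="http://localhost:8080/solr/admin/cores?action=MERGEINDEXES&core=$target"
  local src
  for src in "$@"; do
    url="$url&indexDir=/var/lib/solr/data/$src/index"
  done
  echo "$url"
}

# Commit every core, merge the sources into the target, then commit the target
merge_many() {
  local target="$1"; shift
  local core
  for core in "$target" "$@"; do
    curl -s "http://localhost:8080/solr/$core/update" --data-binary '<commit/>' -H 'Content-type:text/xml; charset=utf-8'
  done
  curl -s "$(merge_url "$target" "$@")"
  curl -s "http://localhost:8080/solr/$target/update" --data-binary '<commit/>' -H 'Content-type:text/xml; charset=utf-8'
}

if [ $# -ge 2 ]; then
  merge_many "$@"
fi
```

For example, running it with the arguments core0 core1 core2 would fold core1 and core2 into core0.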