UPDATE: New post on getting Multicore Solr 3.4 running on Ubuntu 10.04
Been working a lot lately with the Apache Solr project.
Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g., Word, PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world’s largest internet sites.
Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Tomcat. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language. Solr’s powerful external configuration allows it to be tailored to almost any type of application without Java coding, and it has an extensive plugin architecture when more advanced customization is required.
One of the features of Solr is called multicore. Multicore in the context of Solr simply means running multiple instances of Solr using the same servlet container allowing for separate configurations and indexes per core while still allowing administration through one interface. The Solr wiki defines it as:
Multiple cores let you have a single Solr instance with separate configurations and indexes, with their own config and schema for very different applications, but still have the convenience of unified administration. Individual indexes are still fairly isolated, but you can manage them as a single application, create new indexes on the fly by spinning up new SolrCores, and even make one SolrCore replace another SolrCore without ever restarting your Servlet Container.
Since I’ve set up a few instances of Solr using Tomcat, I thought I’d write out just how easy it is to get Solr up and running on Ubuntu Server 10.04, and talk about some of the scripts I’ve written to make adding, removing and reloading cores easier. This post assumes you have already installed Ubuntu Server with internet access and have a basic understanding of how to use Ubuntu and Linux in general.
Installing Solr
On your Ubuntu server, become root using ‘sudo su -’ and issue the following command:
apt-get install solr-tomcat curl -y
This will install Solr from Ubuntu’s repositories as well as install and configure Tomcat. At this point, you have a fully working Solr installation that only needs to be tweaked for your environment. Solr itself lives in three spots: /usr/share/solr, /var/lib/solr and /etc/solr. These directories contain the Solr home directory, data directory and configuration data respectively.
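A quick way to confirm the package put everything where expected is to check for the three directories. This is only a minimal sketch; the check_dirs function name is my own and not part of the solr-tomcat package:

```shell
#!/bin/bash
# Report whether each directory given as an argument exists.
# check_dirs is a hypothetical helper name, not part of the solr-tomcat package.
check_dirs() {
    local dir
    local missing=0
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            echo "ok       $dir"
        else
            echo "missing  $dir"
            missing=1
        fi
    done
    return "$missing"
}

# The three locations named above.
check_dirs /usr/share/solr /var/lib/solr /etc/solr || echo "Some Solr directories are missing"
```

If any line says missing, the package install most likely did not finish cleanly.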
Enable Multicore
Enabling multicore is as simple as creating solr.xml in the /usr/share/solr directory and restarting Tomcat. Once you’ve done this, cores can be added, reloaded and removed while Tomcat keeps running, so under normal operation you should rarely, if ever, need to restart it again.
Using your favorite text editor create a file called solr.xml at /usr/share/solr with the following contents:
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
  </cores>
</solr>
Next, you need to ensure that Tomcat is able to write out new versions of the solr.xml file. As cores are added or removed, this file is updated. The following commands ensure Tomcat has write permissions to the needed directory and file:
chown tomcat6.tomcat6 /usr/share/solr/solr.xml
chown tomcat6.tomcat6 /usr/share/solr
That’s it. You can now issue the following command to restart Tomcat and in turn Solr:
service tomcat6 restart
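Once Tomcat is back up, the CoreAdmin handler should answer at the adminPath set in solr.xml, and one way to check is its STATUS action. Here is a small sketch, assuming Tomcat listens on localhost:8080 as in the rest of this post. The core_admin_url helper and the SOLR_BASE variable are names I made up for illustration:

```shell
#!/bin/bash
# Base URL of the Solr webapp; adjust host and port for your server (assumed default).
SOLR_BASE="${SOLR_BASE:-http://localhost:8080/solr}"

# Build a CoreAdmin URL for an action plus optional extra query parameters.
# core_admin_url is a hypothetical helper, not something Solr ships.
core_admin_url() {
    local action=$1
    local extra=${2:-}
    local url="$SOLR_BASE/admin/cores?action=$action"
    if [ -n "$extra" ]; then
        url="$url&$extra"
    fi
    echo "$url"
}

# Print the command to run against the server.
echo "curl \"$(core_admin_url STATUS)\""
```

Running the printed curl command should return an XML response rather than a 404 if multicore is enabled.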
Managing Cores
At this point you’re ready to start creating new cores. Before you can do so, however, you need to create config files and directories and set permissions. To make this process a bit easier, I created a set of scripts that do all of this for you based on a template config directory.
Create the template config directory by issuing the following command:
cp -av /etc/solr/conf /etc/solr/conftemplate
Next, edit /etc/solr/conftemplate/solrconfig.xml and find the dataDir option. Change the dataDir line from:
<dataDir>/var/lib/solr/data</dataDir>
To:
<dataDir>/var/lib/solr/data/CORENAME</dataDir>
This will ensure the scripts work correctly.
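CORENAME is just a placeholder that gets swapped for the real core name with sed when a core is created. To see the substitution in isolation, here is a small sketch that runs it on a throwaway file instead of the real config (the core name core0 is only an example):

```shell
#!/bin/bash
# Demonstrate the CORENAME substitution on a throwaway file, not the real config.
tmpfile=$(mktemp)
echo '<dataDir>/var/lib/solr/data/CORENAME</dataDir>' > "$tmpfile"

name=core0   # example core name
sed -i "s/CORENAME/$name/" "$tmpfile"

# Show the rewritten line, then clean up.
result=$(cat "$tmpfile")
echo "$result"
rm -f "$tmpfile"
```

Each core ends up with its own data directory under /var/lib/solr/data this way.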
Creating a new Core
Below is the newCore script. Copy and paste it into a file called newCore and make it executable with chmod +x newCore:
#!/bin/bash
# creates a new Solr core
if [ "$1" = "" ]; then
    echo -n "Name of core to create: "
    read name
else
    name=$1
fi
# data directory, owned by Tomcat so Solr can write the index
mkdir "/var/lib/solr/data/$name"
chown tomcat6.tomcat6 "/var/lib/solr/data/$name"
# per-core config, copied from the template with CORENAME filled in
mkdir -p "/etc/solr/conf/$name/conf"
cp -a /etc/solr/conftemplate/* "/etc/solr/conf/$name/conf/"
sed -i "s/CORENAME/$name/" "/etc/solr/conf/$name/conf/solrconfig.xml"
# register the core with the running Solr instance
curl "http://localhost:8080/solr/admin/cores?action=CREATE&name=$name&instanceDir=/etc/solr/conf/$name"
You can now create a new core by issuing the following command:
./newCore core0
On screen you should get something similar to this if it was successful:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">352</int></lst><str name="core">core0</str><str name="saved">/usr/share/solr/solr.xml</str>
</response>
If you get any other response, particularly one about permissions, go back and review this post as you’ve most likely missed something.
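If you would rather have a script decide whether the call worked, one option is to look for the status value of 0 in the response. This is a rough sketch that simply greps the XML instead of parsing it properly; solr_ok is a name I made up, and the sample response is trimmed from the one above:

```shell
#!/bin/bash
# Succeeds only if the response on stdin reports status 0.
# solr_ok is a hypothetical helper name.
solr_ok() {
    grep -q '<int name="status">0</int>'
}

# Sample response in the shape shown above.
sample='<response><lst name="responseHeader"><int name="status">0</int><int name="QTime">352</int></lst></response>'

if echo "$sample" | solr_ok; then
    echo "core created"
else
    echo "something went wrong"
fi
```

In practice you would pipe the output of the curl call through solr_ok instead of the sample string.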
This script has created a new Solr core with the configuration directory set to /etc/solr/conf/core0/conf. There you can edit the schema.xml file. To view the default schema.xml file, you can visit http://localhost:8080/solr/core0/admin/. Replace localhost with the hostname or IP address of your Solr server if it is not localhost.
Next time I’ll talk about how to import documents into a core as well as how to reload a core, swap cores or remove/unload a core and merge the index between two or more cores.
Update: Here are the rest of the scripts I’ve written for Solr
Reload a Core
Save to a file called reloadCore
#!/bin/bash
# reloads a Solr core
if [ "$1" = "" ]; then
    echo -n "Name of core to reload: "
    read name
else
    name=$1
fi
# refuse empty names and cores without a data directory
if [ -z "$name" ] || [ ! -d "/var/lib/solr/data/$name" ]; then
    echo "Core doesn't exist"
    exit 1
fi
curl "http://localhost:8080/solr/admin/cores?action=RELOAD&core=$name"
Swap Cores
Save to a file called swapCores
#!/bin/bash
# swaps two Solr cores
if [ "$2" = "" ]; then
    echo -n "Name of first core: "
    read name1
    echo -n "Name of second core: "
    read name2
else
    name1=$1
    name2=$2
fi
# both names must be non-empty and both cores must have a data directory
if [ -z "$name1" ] || [ -z "$name2" ] ||
   [ ! -d "/var/lib/solr/data/$name1" ] || [ ! -d "/var/lib/solr/data/$name2" ]; then
    echo "Core doesn't exist"
    exit 1
fi
curl "http://localhost:8080/solr/admin/cores?action=SWAP&core=$name1&other=$name2"
Unload/Delete a Core
Save to a file called unloadCore
#!/bin/bash
clear
echo "*************************************************************************"
echo "*************************************************************************"
echo
echo " You are about to *permanently* delete a core!"
echo " There is no going back"
echo
echo "*************************************************************************"
echo "*************************************************************************"
echo
echo -n "Type 'delete core' to continue or control-c to bail: "
read answer
if [ "$answer" != "delete core" ]; then
    exit
fi
# removes a Solr core
if [ "$1" = "" ]; then
    echo -n "Name of core to remove: "
    read name
else
    name=$1
fi
# an empty name must stop here, otherwise the rm -rf lines below would hit the parent directories
if [ -z "$name" ] || [ ! -d "/var/lib/solr/data/$name" ]; then
    echo "Core doesn't exist"
    exit 1
fi
curl "http://localhost:8080/solr/admin/cores?action=UNLOAD&core=$name"
sleep 5
rm -rf "/var/lib/solr/data/$name"
rm -rf "/etc/solr/conf/$name"
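Since unloadCore ends with rm -rf on paths built from the core name, it is worth being strict about what counts as a name. Below is a small sketch of a check that could run before anything destructive. The valid_core_name function and the allowed character set are my own choices, not a rule Solr imposes:

```shell
#!/bin/bash
# Accept only non-empty names made of letters, digits, underscores and hyphens.
# valid_core_name is a hypothetical helper; the allowed character set is my own choice.
valid_core_name() {
    [[ $1 =~ ^[A-Za-z0-9_-]+$ ]]
}

# Try a few candidates, including an empty name and a path-like one.
for candidate in core0 "" "../etc" "my core"; do
    if valid_core_name "$candidate"; then
        echo "accept: '$candidate'"
    else
        echo "reject: '$candidate'"
    fi
done
```

Only core0 passes here; the empty name, the path with slashes and the name with a space are all rejected.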
Merge Cores
Save to a file called mergeCores
#!/bin/bash
# merges two Solr cores
if [ "$2" = "" ]; then
    echo -n "Name of first core: "
    read name1
    echo -n "Name of second core: "
    read name2
else
    name1=$1
    name2=$2
fi
# both names must be non-empty and both cores must have a data directory
if [ -z "$name1" ] || [ -z "$name2" ] ||
   [ ! -d "/var/lib/solr/data/$name1" ] || [ ! -d "/var/lib/solr/data/$name2" ]; then
    echo "Core doesn't exist"
    exit 1
fi
# send an update request to both cores before the merge
curl "http://localhost:8080/solr/$name1/update" --data-binary '' -H 'Content-type:text/xml; charset=utf-8'
curl "http://localhost:8080/solr/$name2/update" --data-binary '' -H 'Content-type:text/xml; charset=utf-8'
# merge the index of the second core into the first
curl "http://localhost:8080/solr/admin/cores?action=mergeindexes&core=$name1&indexDir=/var/lib/solr/data/$name2/index"
# send an update request to both cores again after the merge
curl "http://localhost:8080/solr/$name1/update" --data-binary '' -H 'Content-type:text/xml; charset=utf-8'
curl "http://localhost:8080/solr/$name2/update" --data-binary '' -H 'Content-type:text/xml; charset=utf-8'
Hi:
I just wanted to say thank you so much for your clarity and help. I’ve been going crazy trying to find a decent piece of documentation.
Great work! 😀
I also want to say thank you for these instructions. I was going to give up and try the multi-core approach by starting with a scratch Tomcat server and installing Solr afterwards, but this works out in a much better format.
To follow up on what you have done, I wrote a bash script to remove (unload) a core along w/ the configurations that were installed on the server. I tried it out on my own and it looks to be working well 🙂 http://pastebin.com/r6bjYS1C
You’re a legend!
Had to change a few parts to get it working on Solr 3 but overall brilliant tutorial!
Happy to hear it is working well. I’m actually working today on getting solr3 running side by side with the old solr 1.4 on Ubuntu 10.04. Everything is running, just need to modify the scripts for working with cores.
Excellent, thanks for this.
On Ubuntu 11.04 I first needed to run:
apt-get install sun-java6-jdk
I then replaced solr.xml and solrconf.xml with the Drupal-specific files from the Apache Solr module, and your scripts work great.
Thanks!