SI4T Solr Configuration 101

RaimondKempees edited this page Nov 29, 2013 · 21 revisions

This section is a quick-start guide to setting up SI4T / Solr on the Tridion Deployer side. It does not cover exotic configurations, specific tuning, Solr schema definitions, dependency documentation or Solr configuration in general. It does list the minimal setup steps, which apply to the majority of Tridion Deployer configurations. All steps listed below are required; completing them is the quickest way to get to the more fun parts of it all.

Prerequisites

The following prerequisites must be met:

  1. A properly configured Tridion Deployer, which includes a suitable JRE (1.6+), a working Storage configuration and a working logging configuration;
  2. A Solr instance. More information on installing and configuring Solr, Solr cores and fine-tuning can be found here. Note that SI4T / Solr is currently built against Solr version 4.4.0. The maintainers release builds for newer Solr versions from time to time; if you need functionality from a later Solr version, you can also check out the SI4T/Solr source code and build it yourself.
  3. Network access from the Tridion Deployer to the Solr instance. The Deployer connects to Solr over HTTP, so both the user under which the Deployer runs and the machine on which it runs must be allowed to connect to the Solr instance on the configured port.
  4. The SI4T TBBs, which are needed to publish the content that has to be indexed.
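
The connectivity prerequisite can be checked from the Deployer machine before any SI4T configuration is in place. The following is a minimal Python sketch and is not part of SI4T. It assumes that the Solr core registers the ping handler at `/admin/ping` (the Solr example configuration does, a custom `solrconfig.xml` may not); the function names are made up for this illustration.

```python
import urllib.error
import urllib.request


def ping_url(core_url: str) -> str:
    """Build the ping URL for a Solr core URL such as http://host:8080/solr/staging."""
    # Assumption: the core exposes Solr's ping handler at /admin/ping.
    return core_url.rstrip("/") + "/admin/ping?wt=json"


def can_reach(core_url: str, timeout: float = 5.0) -> bool:
    """Return True if the core answers the ping request with HTTP 200."""
    try:
        with urllib.request.urlopen(ping_url(core_url), timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, DNS failures, timeouts and HTTP error codes.
        return False
```

Run it as the same user that runs the Deployer, for example `can_reach("http://localhost:8080/solr/staging")`, so that the check covers both the user and the machine.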

Placement of libraries in the Tridion Deployer's lib directory

SI4T / Solr is delivered as two separate libraries. Both libraries can be obtained from GitHub in the following places:

SI4T - the Storage Extension:

If you are using SI4T with Tridion 2011:

If you are using SI4T with Tridion 2013:

SI4T-Solr - the actual indexing library:

If you are using SI4T with Tridion 2011 and a Solr version lower than 4.4.0:

If you are using SI4T with Tridion 2013 and Solr 4.4.0+:

Both jar files have to be placed inside the /lib directory of each Tridion Deployer.

Copying additional dependencies

Apart from the standard Tridion libraries and their dependencies, extra dependencies are needed to load the Solr indexing mechanism. These libraries have to be present in the /lib directory of the Tridion Deployer. Note that Tridion 2011 and Tridion 2013, as well as Solr 4.2.1 and Solr 4.4.0, require different libraries. Compare your /lib directory with the listing below that matches your versions and add all missing jar files. The cd_*.jar libraries are left out of the listings below; you still need them in your own /lib directory.

If you are using SI4T with Tridion 2011 and a Solr version lower than 4.4.0:

If you are using SI4T with Tridion 2013 and Solr 4.4.0+:

As a further note: it is possible to mix versions, e.g. using Solr 4.2.1 with Tridion 2013 or Solr 4.4.0 with Tridion 2011, but I leave it to the reader to assemble the proper set of jar files in case this is needed.
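
The comparison between a reference listing and your own /lib directory can be scripted. The following is a small Python sketch, not something shipped with SI4T. It compares file names exactly, including version numbers, and the list of required jars is whatever you copy from the reference listing that matches your versions.

```python
from pathlib import Path
from typing import Iterable, List


def missing_jars(required: Iterable[str], lib_dir: str) -> List[str]:
    """Return the required jar file names that are absent from lib_dir, sorted."""
    # Only *.jar files count; other files in the directory are ignored.
    present = {path.name for path in Path(lib_dir).glob("*.jar")}
    return sorted(name for name in required if name not in present)
```

Calling `missing_jars(reference_names, "/path/to/deployer/lib")` returns the jar files that still have to be copied; both arguments here are placeholders.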

Placement of the Search DAO Bundle

SI4T/Solr hooks into the Tridion Storage layer by making use of custom DAO classes. To make these known to the Tridion Deployer, a separate XML file, SearchDAOBundle.xml, must be placed in the configuration directory of the Tridion Deployer. If the Deployer runs as cd_upload in a servlet container, the file is placed alongside the other Deployer configuration files in the WEB-INF/classes directory. If a .NET HTTP Deployer is used, the file is placed inside the /bin/config directory. If a TRIDION_HOME setup is used, the file is placed in the TRIDION_HOME/config directory.

SearchDAOBundle.xml can be found in the configuration examples folder.

Note: If you do not make use of either the persistence or the filesystem Factory DAO configuration set in your Storage configuration, comment out those DAOs in SearchDAOBundle.xml for the respective two types of Storage as well.

Configuring cd_storage_conf.xml

As shown in this example, SI4T/Solr must be configured in at least two places in cd_storage_conf.xml:

  1. Configure the SearchDAOBundle in the StorageBindings section:

        <StorageBindings>
     		<!-- SI4T: 
     				configure custom DAO Bundles
     		-->
            <Bundle src="SearchDAOBundle.xml"/>
         </StorageBindings>
    
  2. Configure the DAOFactory classes for the JPA Storage node(s) and/or the Filesystem Storage node(s):

     	<!-- SI4T: 
     			Example configuration in case JPA is used to publish pages. This is for example the case in DD4T setups.
     			The standard Class com.tridion.storage.JPADAOFactory is overridden.
     			It is possible to mix and match JPA and FS Search DAO factories should there be need.
     	-->
     	<Storage 
     		Type="persistence" Id="defaultdb" dialect="MSSQL" 
     		Class="com.tridion.storage.si4t.JPASearchDAOFactory">
     		<Pool Type="jdbc" Size="5" MonitorInterval="60" IdleTimeout="120" CheckoutTimeout="120" />
     		<DataSource Class="com.microsoft.sqlserver.jdbc.SQLServerDataSource">
     			<Property Name="serverName" Value="[SERVERNAME]" />
     			<Property Name="portNumber" Value="[DBPORT]" />
     			<Property Name="databaseName" Value="[DBNAME]" />
     			<Property Name="user" Value="[DBBROKERUSERNAME]" />
     			<Property Name="password" Value="[DBBROKERPASSWORD]" />
     		</DataSource>
     		<!-- 	SI4T: configure the indexer class, 
     				as well as which binaries to index, the default URL to post documents to as well as
     				pointing the indexer to specific cores for specific Publications.
     		-->
     		<Indexer 
     			Class="org.si4t.solr.SolrIndexer" 
     			DefaultCoreUrl="http://localhost:8080/solr/staging" 
     			Mode="http" 
     			DocExtensions="pdf,docx,doc,xls,xlsx,pptx,ppt">
     			<Urls>
     				<!-- SI4T: 
     						The Value attribute is the complete URL to a Solr Core
     						The Id attribute denotes a unique Tridion Publication Id
     				-->
     				<Url Value="http://localhost:8080/solr/staging_pub5" Id="5" />
     				<Url Value="http://localhost:8080/solr/staging" Id="8" />
     				<Url Value="http://localhost:8080/solr/staging" Id="12" />
     			</Urls>
     		</Indexer>
     	</Storage>
     	
     	<!-- SI4T:
     			If the filesystem is used to publish pages to, override the standard FSDAOFactory
     			and configure the cores as desired
     	-->
     	<Storage Type="filesystem" 
     			Class="com.tridion.storage.si4t.FSSearchDAOFactory" 
     			Id="defaultFile" defaultFilesystem="false" defaultStorage="false">
     		<Indexer 
     			Class="org.si4t.solr.SolrIndexer" 
     			Mode="http" 
     			DefaultCoreUrl="http://localhost:8080/solr/staging" 
     			DocExtensions="pdf,docx,doc,xls,xlsx,pptx,ppt">
     			<Urls>
     				<!-- <Url Value="http://localhost:8080/solr/staging" Id="5" />-->
     				<Url Value="http://localhost:8080/solr/staging" Id="8" />
     				<Url Value="http://localhost:8080/solr/staging" Id="12" />
     			</Urls>
     		</Indexer>
     	</Storage>
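
The `Indexer` element combines a default core URL with per-Publication URLs. The following Python sketch illustrates one reading of that configuration and is not SI4T's own code. It assumes that a Publication without a matching `Url` entry is posted to `DefaultCoreUrl`, which the comments in the example suggest but this page does not state outright. The function name `core_url_for` is made up for this illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def core_url_for(indexer_xml: str, publication_id: int) -> Optional[str]:
    """Resolve the Solr core URL for a Publication from an <Indexer> element."""
    indexer = ET.fromstring(indexer_xml)
    # Map each configured Publication Id to its core URL.
    per_publication = {url.get("Id"): url.get("Value") for url in indexer.iter("Url")}
    # Assumption: Publications that are not listed use DefaultCoreUrl.
    return per_publication.get(str(publication_id), indexer.get("DefaultCoreUrl"))


EXAMPLE = """
<Indexer Class="org.si4t.solr.SolrIndexer"
         DefaultCoreUrl="http://localhost:8080/solr/staging"
         Mode="http" DocExtensions="pdf,docx">
  <Urls>
    <Url Value="http://localhost:8080/solr/staging_pub5" Id="5" />
    <Url Value="http://localhost:8080/solr/staging" Id="8" />
  </Urls>
</Indexer>
"""
```

With this fragment, `core_url_for(EXAMPLE, 5)` resolves to the `staging_pub5` core, while an unlisted Publication such as 99 resolves to the default `staging` core.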
    

Configuring logback.xml

It is highly recommended to configure output logging for the extension. In case of misconfiguration or other errors, having proper logging information is the fastest way to resolve issues.

An example logback.xml can be found here.

The relevant parts are:

  1. The appender for the extension:

```xml
<!-- The name must match the appender-ref used by the loggers below.
		The class shown here is an assumption; take the actual value
		from the example logback.xml.
-->
<appender name="extensions" class="ch.qos.logback.core.FileAppender">
	<file>${log.folder}/cd_extensions.log</file>
	<append>true</append>
	<encoder>
		<pattern>${log.pattern}</pattern>
	</encoder>
</appender>
```

  2. The configuration of namespaces to log:

```xml
<!-- SI4T:
		Example logging. Turn off (or set to ERROR) when happy.
-->
<logger name="com.tridion.storage.si4t" level="DEBUG" additivity="false">
	<appender-ref ref="extensions" />
</logger>
<logger name="org.si4t.solr" level="TRACE" additivity="false">
	<appender-ref ref="extensions" />
</logger>
```

## All Done

Restart the Deployer instance. Publish pages which have been configured to be indexed, then check the logging and the Solr instance for successful indexing or for errors that may have occurred.
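
One way to check the Solr side is to ask the core how many documents it holds before and after publishing. The following Python sketch assumes the core exposes the standard `/select` handler with JSON output. The function names are made up for this illustration, and documents only show up in the count once Solr has committed them.

```python
import json
import urllib.request
from urllib.parse import urlencode


def count_url(core_url: str) -> str:
    """Build a match-all query that returns only the hit count (rows=0)."""
    query = urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return core_url.rstrip("/") + "/select?" + query


def num_found(payload: str) -> int:
    """Extract the document count from a Solr JSON select response."""
    return int(json.loads(payload)["response"]["numFound"])


def indexed_documents(core_url: str, timeout: float = 5.0) -> int:
    """Query the core and return how many documents it currently holds."""
    with urllib.request.urlopen(count_url(core_url), timeout=timeout) as response:
        return num_found(response.read().decode("utf-8"))
```

If `indexed_documents("http://localhost:8080/solr/staging")` does not increase after publishing, the extension log is the next place to look.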

## Next step: configure Solr

The Solr / Lucene search index has to be configured in order to properly index fields coming from the Tridion CM. The minimal schema configuration needed for SI4T can be found on the [Solr Schema Configuration](https://github.com/SI4T/Solr/wiki/Solr-Schema-Configuration) page.