SI4T Solr Configuration 101

RaimondKempees edited this page Jul 25, 2013 · 21 revisions

This page is a quick-start guide to setting up SI4T / Solr on the Tridion Deployer side. It does not cover exotic configurations, specific tuning, Solr schema definitions, dependency documentation or Solr configuration in general. It does list the minimal setup steps, which apply to the majority of Tridion Deployer configurations. All steps listed below are required before you can move on to the more fun parts.

Prerequisites

The following prerequisites must be met:

  1. A properly configured Tridion Deployer, including a suitable JRE (1.6+), Storage configuration and logging configuration;
  2. A Solr instance. More information on installing and configuring Solr, setting up Solr cores and fine-tuning can be found here. Note that SI4T / Solr is currently built against Solr version 4.2.1. The maintainers will release builds for newer Solr versions from time to time, but you can also check out the SI4T/Solr source code and build it yourself if you need functionality from a later Solr version;
  3. HTTP access from the Tridion Deployer to the Solr instance. Both the user account under which the Deployer runs and the machine on which it runs must be allowed to connect to the Solr instance on its configured port;
  4. The SI4T TBBs (Template Building Blocks), which publish the content that needs to be indexed.

Placement of libraries in the Tridion Deployer's lib directory

SI4T / Solr is delivered as two separate libraries, both of which can be obtained from GitHub:

  1. si4t.jar
  2. si4t-solr.jar

Both jar files have to be placed in the /lib directory of each Tridion Deployer.

In addition to the standard Tridion libraries and their dependencies, the libraries listed here have to be present in the /lib directory of the Tridion Deployer that is responsible for indexing data. Warning: for now, leave the jar files already present in the Deployer's lib directory intact. Do not overwrite them; place the additional jar files next to the existing ones. The full dependency graph will be documented in this wiki at a later stage.

Placement of the Search DAO Bundle

SI4T/Solr hooks into the Tridion Storage layer by means of custom DAO classes. To make these known to the Tridion Deployer, a separate XML file, SearchDAOBundle.xml, must be placed in the Deployer's configuration directory. The location depends on the type of Deployer:

  1. If the Deployer runs as cd_upload in a servlet container, place the file alongside the other Deployer configuration files in the WEB-INF/classes directory.
  2. If a .NET HTTP Deployer is used, place the file in the /bin/config directory.
  3. If a TRIDION_HOME setup is used, place the file in the TRIDION_HOME/config directory.

SearchDAOBundle.xml can be found in the configuration examples folder.

Configuring cd_storage_conf.xml

As shown in this example, SI4T/Solr must be configured in at least two places in cd_storage_conf.xml:

  1. Configure the SearchDAOBundle in the StorageBindings section:

        <StorageBindings>
            <!-- SI4T: configure custom DAO Bundles -->
            <Bundle src="SearchDAOBundle.xml"/>
        </StorageBindings>
    
  2. Configure the DAOFactory classes for either the JPA Storage node(s) and / or the Filesystem Storage node(s):

     	<!-- SI4T: 
     			Example configuration in case JPA is used to publish pages. This is for example the case in DD4T setups.
     			The standard class com.tridion.storage.persistence.JPADAOFactory is overridden.
     			It is possible to mix and match JPA and FS Search DAO factories should there be need.
     	-->
     	<Storage 
     		Type="persistence" Id="defaultdb" dialect="MSSQL" 
     		Class="com.tridion.storage.si4t.JPASearchDAOFactory">
     		<Pool Type="jdbc" Size="5" MonitorInterval="60" IdleTimeout="120" CheckoutTimeout="120" />
     		<DataSource Class="com.microsoft.sqlserver.jdbc.SQLServerDataSource">
     			<Property Name="serverName" Value="[SERVERNAME]" />
     			<Property Name="portNumber" Value="[DBPORT]" />
     			<Property Name="databaseName" Value="[DBNAME]" />
     			<Property Name="user" Value="[DBBROKERUSERNAME]" />
     			<Property Name="password" Value="[DBBROKERPASSWORD]" />
     		</DataSource>
     		<!-- 	SI4T: configure the indexer class, which binaries to index,
     				the default URL to post documents to, and which core
     				the indexer should use for each specific Publication.
     		-->
     		<Indexer 
     			Class="org.si4t.solr.SolrIndexer" 
     			DefaultCoreUrl="http://localhost:8080/solr/staging" 
     			Mode="http" 
     			DocExtensions="pdf,docx,doc,xls,xlsx,pptx,ppt">
     			<Urls>
     				<!-- SI4T: 
     						The Value attribute is the complete URL to a Solr Core
     						The Id attribute denotes a unique Tridion Publication Id
     				-->
     				<Url Value="http://localhost:8080/solr/staging_pub5" Id="5" />
     				<Url Value="http://localhost:8080/solr/staging" Id="8" />
     				<Url Value="http://localhost:8080/solr/staging" Id="12" />
     			</Urls>
     		</Indexer>
     	</Storage>
     	
     	<!-- SI4T:
     			If the filesystem is used to publish pages to, override the standard FSDAOFactory
     			and configure the cores as desired
     	-->
     	<Storage Type="filesystem" 
     			Class="com.tridion.storage.si4t.FSSearchDAOFactory" 
     			Id="defaultFile" defaultFilesystem="false" defaultStorage="false">
     		<Indexer 
     			Class="org.si4t.solr.SolrIndexer" 
     			Mode="http" 
     			DefaultCoreUrl="http://localhost:8080/solr/staging" 
     			DocExtensions="pdf,docx,doc,xls,xlsx,pptx,ppt">
     			<Urls>
     				<!-- <Url Value="http://localhost:8080/solr/staging" Id="5" />-->
     				<Url Value="http://localhost:8080/solr/staging" Id="8" />
     				<Url Value="http://localhost:8080/solr/staging" Id="12" />
     			</Urls>
     		</Indexer>
     	</Storage>
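
To show how the two pieces above fit together, here is an abbreviated sketch of a complete cd_storage_conf.xml. It assumes the usual Configuration / Global / Storages nesting of a standard Tridion storage configuration. Attributes on the Configuration element and all unrelated elements are left out, so treat it as an outline of where things go rather than a file to copy as-is.

```xml
<Configuration>
    <Global>
        <Storages>
            <!-- 1. Register the SI4T DAO bundle -->
            <StorageBindings>
                <Bundle src="SearchDAOBundle.xml"/>
            </StorageBindings>

            <!-- 2. Override the DAO factory and add an Indexer to the Storage node -->
            <Storage Type="persistence" Id="defaultdb" dialect="MSSQL"
                     Class="com.tridion.storage.si4t.JPASearchDAOFactory">
                <!-- Pool and DataSource elements stay as they were -->
                <Indexer Class="org.si4t.solr.SolrIndexer"
                         DefaultCoreUrl="http://localhost:8080/solr/staging"
                         Mode="http"
                         DocExtensions="pdf,docx,doc,xls,xlsx,pptx,ppt">
                    <Urls>
                        <!-- Publication 5 is indexed into its own core -->
                        <Url Value="http://localhost:8080/solr/staging_pub5" Id="5" />
                    </Urls>
                </Indexer>
            </Storage>
        </Storages>
    </Global>
    <!-- ItemTypes and the rest of the existing configuration stay unchanged -->
</Configuration>
```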
    

Configuring logback.xml

It is highly recommended to configure output logging for the extension. In case of misconfiguration or other errors, having proper logging information is the fastest way to resolve issues.

An example logback.xml can be found here.

The relevant parts are:

The appender for the extension:

<appender name="extensions" class="ch.qos.logback.core.FileAppender">
	<file>${log.folder}/cd_extensions.log</file>
	<append>true</append>
	<encoder>
		<pattern>${log.pattern}</pattern>
	</encoder>
</appender>
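
The appender refers to two variables, ${log.folder} and ${log.pattern}, which must be defined elsewhere in logback.xml. If your logback.xml does not define them yet, a sketch along the following lines may help. The property values are purely illustrative assumptions (a hypothetical log directory and a simple pattern); adjust them to your environment.

```xml
<configuration>
    <!-- Hypothetical values: point log.folder at a directory the Deployer can write to -->
    <property name="log.folder" value="/var/log/tridion"/>
    <!-- Date, level, short logger name and message, one entry per line -->
    <property name="log.pattern" value="%date %-5level %logger{0} - %message%n"/>

    <!-- The appender and logger elements from this page go here -->
</configuration>
```

The properties have to appear before the appender that uses them, because logback resolves variables in the order in which it reads the file.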

The configuration of namespaces to log:

<!-- SI4T:
		Example logging. Turn off (or set to ERROR) when happy.
-->
<logger name="com.tridion.storage.si4t" level="DEBUG" additivity="false">
	<appender-ref ref="extensions" />
</logger>
<logger name="org.si4t.solr" level="TRACE" additivity="false">
	<appender-ref ref="extensions" />
</logger>
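
Once indexing works as expected, the same two loggers can be turned down, as the comment in the example suggests. A minimal sketch of that quieter setup, using the same appender, could look like this:

```xml
<!-- SI4T: only log errors once the setup has been verified -->
<logger name="com.tridion.storage.si4t" level="ERROR" additivity="false">
    <appender-ref ref="extensions" />
</logger>
<logger name="org.si4t.solr" level="ERROR" additivity="false">
    <appender-ref ref="extensions" />
</logger>
```

Setting level="OFF" instead disables logging for that namespace entirely.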

All Done.

Restart the Deployer instance. Publish pages that have been configured to be indexed, then check the logging and the Solr instance for successful indexing or any errors that may have occurred.
