Solr support in AtoM #1817

Open: wants to merge 106 commits into base: qa/2.x

Commits:
942ad59
Experimental solr exploration branch init
anvit Mar 5, 2024
bb23cda
Add Solr plugin and CLI task
anvit May 7, 2024
0662eaf
Add Solr Event and Model class
anvit May 9, 2024
90f8646
Add solr Information Object model
anvit May 9, 2024
dac8e84
Add solr specific methods to QubitSearch
anvit May 10, 2024
6df3cdb
Add solr indexing using solr REST API
anvit May 10, 2024
9935d15
Mapping data fields WIP
melaniekung May 13, 2024
2284b49
Add copy field
melaniekung May 14, 2024
a5d1ba3
Fix indexing issues due to copy field
anvit May 14, 2024
52cd82b
Add Solr CLI search tool
anvit May 14, 2024
ae8630e
Clean up Solr HTTP requests
anvit May 14, 2024
dd246a7
Change CLI solr task to query using solr API
anvit May 15, 2024
7b18d2a
Fix indentation issues
anvit May 16, 2024
579a513
Add mapping for objects, nested, and arrays.
melaniekung May 21, 2024
9e9f2e5
Update flush to delete index and load mapping.
melaniekung May 22, 2024
d2cccd9
Merge all add fields calls into 1 CURL call.
melaniekung May 22, 2024
ffd55d2
Fix typos in arSolrMapping.
melaniekung May 23, 2024
1332b8d
Add Solr Query class
anvit May 23, 2024
7a0c786
Update arSolrSearchTask to accept fields param
anvit May 23, 2024
fd880cb
Add Solr Query related classes.
melaniekung May 24, 2024
4e8c632
Use DSL query format for search.
melaniekung May 27, 2024
c50835d
Rename and add more query classes.
melaniekung Jun 4, 2024
d5ff971
Remove copy field feature.
melaniekung Jun 4, 2024
13dbbbc
Remove query classes.
melaniekung Jun 6, 2024
95b44c8
Add BoolQuery functions
melaniekung Jun 6, 2024
310440e
Add Solr Query classes
anvit Jun 7, 2024
2c4d60a
Add config handler to set arSolrPlugin as default.
melaniekung Jun 10, 2024
23be945
Update BoolQuery and NestedQuery.
melaniekung Jun 10, 2024
3bfba78
Add Range Query for Solr
anvit Jun 10, 2024
34fe21e
Update BoolQuery WIP.
melaniekung Jun 11, 2024
72577dc
Add Match, MatchAll queries
anvit Jun 12, 2024
d805efa
Update mapping config params.
melaniekung Jun 12, 2024
f30dfe7
Update mapping add fields.
melaniekung Jun 13, 2024
c238922
WIP: Send object types as strings to solr schema
anvit Jun 13, 2024
0407bce
WIP: Add i18n languages.
melaniekung Jun 14, 2024
2754477
Update indexing mapping methods (WIP)
anvit Jun 14, 2024
bc7565d
Added types to index mapping methods (WIP)
anvit Jun 18, 2024
56d41ac
Update addDocument method to use types
anvit Jun 18, 2024
21be280
Add setType method to arSolrQuery
anvit Jun 18, 2024
c9ccd24
Add defineConfigParams (WIP)
melaniekung Jun 19, 2024
44a5af7
Update addDocument API
melaniekung Jun 19, 2024
0b05ce1
Set all solr fields to use multiValued (WIP)
anvit Jun 19, 2024
5c9fcca
Update arSolrQuery to use correct fields (WIP)
anvit Jun 19, 2024
791508f
Update search result id
melaniekung Jun 20, 2024
c183342
Add analyzers to schema and stopword files WIP
melaniekung Jun 21, 2024
d98ba68
Add ResultSet and QubitSolrSearchPager (WIP)
anvit Jun 21, 2024
f4bbf60
Restructure docs in arSolrResultSet
anvit Jun 24, 2024
ffa2490
Update analyzers WIP
melaniekung Jun 25, 2024
00c9bc8
Add arSolrResult class
anvit Jun 25, 2024
4494c16
Clean up arSolrPlugin class.
melaniekung Jun 26, 2024
9f4d38c
Update search analyzers.
melaniekung Jun 27, 2024
45fa2ab
Fix Solr MatchAllQuery
anvit Jun 28, 2024
8940f1f
Update solr docker config
anvit Jul 3, 2024
fed53e5
Fix query result offset issue
anvit Jul 3, 2024
19f2dcc
Add autocomplete
melaniekung Jul 4, 2024
59d3b3a
Update arSolrBoolQuery (WIP)
anvit Jul 6, 2024
b9fb137
Update arSolrExistsQuery and add functionality
anvit Jul 10, 2024
4bd6ed0
Update arSolrRangeQuery
anvit Jul 10, 2024
4c48ab0
Add multiValue fields.
melaniekung Jul 11, 2024
d420164
Add custom language analyzers.
melaniekung Jul 17, 2024
7aeefdc
Add Diacritics to analyzers WIP
melaniekung Jul 18, 2024
687c14f
Fix multivalued fields (WIP)
anvit Jul 19, 2024
73e3fe6
Update arSolrPlugin to use curl (WIP)
anvit Jul 24, 2024
3e57dc0
Add unit test file for arSolrPluginConfiguration
sbreker Jul 26, 2024
4bbadf6
Fix multi value fields in solr (WIP)
anvit Jul 26, 2024
71b9324
Add Term Query for Solr
anvit Jul 31, 2024
a61031d
Add Aggregations WIP.
melaniekung Aug 1, 2024
ebfe7ef
Add test for arSolrMatchAllQuery
melaniekung Aug 2, 2024
98faa5a
Add test for arSolrExistsQuery
sbreker Aug 2, 2024
740e1ff
Update coverage in ArSolrPluginConfigurationTest
sbreker Aug 2, 2024
d9b50de
Add test for arSolrTermQuery
anvit Aug 2, 2024
02ba680
Add type to protected vars in arSolrMatchAllQuery.
melaniekung Aug 6, 2024
931d788
Update coverage for ArSolrMatchAllQueryTest
melaniekung Aug 6, 2024
46f722e
Fix typos in ArSolrTermQueryTest
anvit Aug 6, 2024
9586710
Add test for arSolrMatchQuery
anvit Aug 6, 2024
0312789
Add require to ArSolrMatchAllQueryTest
melaniekung Aug 6, 2024
559199a
Add test for arSolrQuery.
melaniekung Aug 6, 2024
e7fb7b6
Fix PHP-CS warnings for tests
anvit Aug 6, 2024
70b62a7
Add test for arSolrRangeQuery.
melaniekung Aug 7, 2024
9ed96e2
Fix types in arSolrQuery and associated tests
anvit Aug 7, 2024
fe40ec6
Fix ArSolrRangeQueryTest
anvit Aug 7, 2024
7c72c33
Update arSolrRangeQuery test.
melaniekung Aug 7, 2024
29678ea
Fix PHP CS fixer warnings
anvit Aug 8, 2024
aead69c
Update tests.
melaniekung Aug 9, 2024
43c1faf
Update arSolrBoolQuery to use all query types
anvit Aug 15, 2024
5b8a622
Add sorting, aggregations to arSolrBoolQuery (WIP)
anvit Aug 16, 2024
6bc285f
Add multivalue fields to schema
melaniekung Aug 20, 2024
4c2321c
Add terms query for solr
anvit Aug 20, 2024
64bcce3
Add Ids query for Solr
anvit Aug 20, 2024
d3c6abf
Add setType, setPostFilter to arSolrBoolQuery
anvit Aug 20, 2024
d4ff1f9
Remove nested query for Solr.
melaniekung Aug 21, 2024
381874b
Move arSolrPluginQuery.class.php out of query dir.
melaniekung Aug 21, 2024
a376d45
Move makeHttpRequest function out of arSolrPlugin.
melaniekung Aug 21, 2024
24777ae
Handle types for aggregations in arSolrBoolQuery
anvit Aug 21, 2024
1159bd9
Add method to remove mustClause in arSolrBoolQuery
anvit Aug 21, 2024
a2da573
Add generateQuery method for arSolrPluginUtil
anvit Aug 22, 2024
4dcb650
Get config vars from search.yml
melaniekung Aug 22, 2024
367b7d8
Fix the version in arSolrResult
anvit Aug 22, 2024
09af5eb
Rename arSolrQuery to arSolrStringQuery
anvit Aug 26, 2024
984c8c3
Refactor solr api interfacing to arSolrClient
anvit Aug 27, 2024
4f99278
Refactor solr mapping from arSolrPlugin.
melaniekung Aug 28, 2024
d1ceb6c
Remove old docker and nginx solr configs
anvit Aug 29, 2024
4aef85a
Remove unused nginx solr port from docker config
anvit Aug 29, 2024
f0876e4
Add batch document handling to solr
anvit Aug 29, 2024
af4b64a
Update arSolrPluginQuery to use a bool query
anvit Sep 4, 2024
d94fe35
Add solr methods for updating documents
anvit Sep 10, 2024
114 changes: 114 additions & 0 deletions docker/docker-compose.dev.yml
@@ -1,12 +1,112 @@
---
networks:
  solr:

volumes:
  elasticsearch_data:
  percona_data:
  composer_deps:
  npm_deps:
  solr-data:

services:

  zoo1:
    image: zookeeper
    container_name: zoo1
    restart: always
    hostname: zoo1
    ports:
      - 2181:2181
      - 7001:7000
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=zoo1:2888:3888;2181 server.2=zoo2:2888:3888;2181 server.3=zoo3:2888:3888;2181
      ZOO_4LW_COMMANDS_WHITELIST: mntr, conf, ruok
      ZOO_CFG_EXTRA: "metricsProvider.className=org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider metricsProvider.httpPort=7000 metricsProvider.exportJvmInfo=true"
    networks:
      - solr

  zoo2:
    image: zookeeper
    container_name: zoo2
    restart: always
    hostname: zoo2
    ports:
      - 2182:2181
      - 7002:7000
    environment:
      ZOO_MY_ID: 2
      ZOO_SERVERS: server.1=zoo1:2888:3888;2181 server.2=zoo2:2888:3888;2181 server.3=zoo3:2888:3888;2181
      ZOO_4LW_COMMANDS_WHITELIST: mntr, conf, ruok
      ZOO_CFG_EXTRA: "metricsProvider.className=org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider metricsProvider.httpPort=7000 metricsProvider.exportJvmInfo=true"
    networks:
      - solr

  zoo3:
    image: zookeeper
    container_name: zoo3
    restart: always
    hostname: zoo3
    ports:
      - 2183:2181
      - 7003:7000
    environment:
      ZOO_MY_ID: 3
      ZOO_SERVERS: server.1=zoo1:2888:3888;2181 server.2=zoo2:2888:3888;2181 server.3=zoo3:2888:3888;2181
      ZOO_4LW_COMMANDS_WHITELIST: mntr, conf, ruok
      ZOO_CFG_EXTRA: "metricsProvider.className=org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider metricsProvider.httpPort=7000 metricsProvider.exportJvmInfo=true"
    networks:
      - solr

  solr1:
    image: solr
    container_name: solr1
    ports:
      - "8981:8983"
    environment:
      - ZK_HOST=zoo1:2181,zoo2:2181,zoo3:2181
    networks:
      - solr
    depends_on:
      - zoo1
      - zoo2
      - zoo3
    volumes:
      - solr-data:/var/solr

  solr2:
    image: solr
    container_name: solr2
    ports:
      - "8982:8983"
    environment:
      - ZK_HOST=zoo1:2181,zoo2:2181,zoo3:2181
    networks:
      - solr
    depends_on:
      - zoo1
      - zoo2
      - zoo3
    volumes:
      - solr-data:/var/solr

  solr3:
    image: solr
    container_name: solr3
    ports:
      - "8983:8983"
    environment:
      - ZK_HOST=zoo1:2181,zoo2:2181,zoo3:2181
    networks:
      - solr
    depends_on:
      - zoo1
      - zoo2
      - zoo3
    volumes:
      - solr-data:/var/solr

  atom:
    build: ..
    env_file: etc/environment
@@ -16,6 +116,8 @@ services:
      - composer_deps:/atom/src/vendor/composer
      - npm_deps:/atom/src/node_modules
      - ..:/atom/src:rw
    networks:
      - solr

  atom_worker:
    build: ..
@@ -31,6 +133,8 @@ services:
      - composer_deps:/atom/src/vendor/composer
      - npm_deps:/atom/src/node_modules
      - ..:/atom/src:rw
    networks:
      - solr

  nginx:
    image: nginx:latest
@@ -41,6 +145,8 @@ services:
      - ./etc/nginx/nginx.conf:/etc/nginx/nginx.conf:ro
    ports:
      - "63001:80"
    networks:
      - solr

  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:5.6.16
@@ -53,6 +159,8 @@ services:
      - elasticsearch_data:/usr/share/elasticsearch/data
    ports:
      - "127.0.0.1:63002:9200"
    networks:
      - solr

  percona:
    image: percona:8.0
@@ -62,14 +170,20 @@ services:
      - ./etc/mysql/mysqld.cnf:/etc/my.cnf.d/mysqld.cnf:ro
    ports:
      - "127.0.0.1:63003:3306"
    networks:
      - solr

  memcached:
    image: memcached
    command: -p 11211 -m 128 -u memcache
    ports:
      - "127.0.0.1:63004:11211"
    networks:
      - solr

  gearmand:
    image: artefactual/gearmand
    ports:
      - "127.0.0.1:63005:4730"
    networks:
      - solr
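The compose file runs a three-node ZooKeeper ensemble (zoo1, zoo2, zoo3). It also runs three SolrCloud nodes, which locate the ensemble through ZK_HOST. Inside the `solr` network, every ZooKeeper container listens on client port 2181. That is why ZK_HOST lists port 2181 for all three hosts. The 2181, 2182 and 2183 host mappings only matter for connections made from outside Docker.

The minimal sketch below assumes PHP 7.4+. It shows two things:

* how each ZOO_SERVERS entry (`server.<id>=<host>:<quorumPort>:<electionPort>;<clientPort>`) maps onto that connection string;
* the majority-quorum arithmetic behind running three ZooKeeper nodes.

`zkHostFromZooServers()` and `zkFaultTolerance()` are hypothetical helpers. They are not part of this PR or of AtoM.

```php
<?php

// Hypothetical helper: build a ZooKeeper client connection string (the
// format ZK_HOST uses) from a ZOO_SERVERS value such as
// "server.1=zoo1:2888:3888;2181 server.2=zoo2:2888:3888;2181".
function zkHostFromZooServers(string $zooServers): string
{
    $hosts = [];

    // ZOO_SERVERS entries are separated by whitespace.
    foreach (preg_split('/\s+/', trim($zooServers), -1, PREG_SPLIT_NO_EMPTY) as $entry) {
        // Expected shape: "server.<id>=<host>:<quorumPort>:<electionPort>;<clientPort>"
        $parts = explode('=', $entry, 2);
        if (2 !== count($parts) || false === strpos($parts[1], ';')) {
            throw new InvalidArgumentException("Unexpected ZOO_SERVERS entry: {$entry}");
        }

        // Keep only the host name and the client port after ';'.
        [$addresses, $clientPort] = explode(';', $parts[1], 2);
        $host = explode(':', $addresses)[0];

        $hosts[] = "{$host}:{$clientPort}";
    }

    return implode(',', $hosts);
}

// Hypothetical helper: an ensemble of $n servers keeps a quorum while a
// majority is up, so it survives floor(($n - 1) / 2) failed servers.
function zkFaultTolerance(int $n): int
{
    return intdiv($n - 1, 2);
}

$zooServers = 'server.1=zoo1:2888:3888;2181 server.2=zoo2:2888:3888;2181 server.3=zoo3:2888:3888;2181';
echo zkHostFromZooServers($zooServers), PHP_EOL; // zoo1:2181,zoo2:2181,zoo3:2181
echo zkFaultTolerance(3), PHP_EOL;               // 1
```

With three ZooKeeper nodes, one can be stopped without SolrCloud losing its coordination service. A two-node ensemble would tolerate no failures at all.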
35 changes: 35 additions & 0 deletions lib/search/QubitSearch.class.php
@@ -23,16 +23,51 @@
class QubitSearch
{
    protected static $instance;
    protected static $solrInstance;

    // protected function __construct() { }
    // protected function __clone() { }

    public static function getSolrInstance(array $options = [])
    {
        $configuration = ProjectConfiguration::getActive();
        if (!$configuration->isPluginEnabled('arSolrPlugin')) {
            return false;
        }

        if (!isset(self::$solrInstance)) {
            self::$solrInstance = new arSolrPlugin($options);
        }

        return self::$solrInstance;
    }

    public static function disableSolr()
    {
        if (!isset(self::$solrInstance)) {
            self::$solrInstance = self::getSolrInstance(['initialize' => false]);
        }

        self::$solrInstance->disable();
    }

    public static function enableSolr()
    {
        self::$solrInstance = self::getSolrInstance();

        self::$solrInstance->enable();
    }

    public static function getInstance(array $options = [])
    {
        if (!isset(self::$instance)) {
            // Using arElasticSearchPlugin but other classes could be
            // implemented, for example: arSphinxSearchPlugin
            self::$instance = new arElasticSearchPlugin($options);
            //$configuration = ProjectConfiguration::getActive();
            //if ($configuration->isPluginEnabled('arSolrPlugin')) {
            //self::$solr = new arSolrPlugin($options);
            //}
        }

        return self::$instance;
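getSolrInstance() lazily creates one shared arSolrPlugin. It returns false when arSolrPlugin is not enabled. In that case, disableSolr() and enableSolr() as written would call `disable()` or `enable()` on false, which PHP rejects with an Error.

The sketch below shows the same lazy-singleton pattern with stand-in classes and assumes PHP 8.0+ for the nullsafe operator. It returns null instead of false, so callers can use `?->` to skip the call when no instance exists. `FakeSolrPlugin`, `SearchRegistry` and `$pluginEnabled` are hypothetical names, not AtoM classes.

```php
<?php

// Hypothetical stand-in for arSolrPlugin: it only records an enabled flag.
class FakeSolrPlugin
{
    public bool $enabled = true;

    public function __construct(public array $options = [])
    {
    }

    public function disable(): void
    {
        $this->enabled = false;
    }

    public function enable(): void
    {
        $this->enabled = true;
    }
}

final class SearchRegistry
{
    private static ?FakeSolrPlugin $solrInstance = null;

    // Stands in for ProjectConfiguration::isPluginEnabled('arSolrPlugin').
    public static bool $pluginEnabled = true;

    public static function getSolrInstance(array $options = []): ?FakeSolrPlugin
    {
        if (!self::$pluginEnabled) {
            return null;
        }

        // Created on the first call; later calls reuse the same object,
        // and their $options are ignored, as in the original.
        return self::$solrInstance ??= new FakeSolrPlugin($options);
    }

    public static function disableSolr(): void
    {
        // The nullsafe call is skipped when the plugin is disabled.
        self::getSolrInstance(['initialize' => false])?->disable();
    }

    public static function enableSolr(): void
    {
        self::getSolrInstance()?->enable();
    }
}
```

When the plugin is turned off, `SearchRegistry::disableSolr()` simply does nothing rather than failing. Repeated calls to `getSolrInstance()` return the identical object.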
63 changes: 63 additions & 0 deletions lib/search/QubitSolrSearchPager.class.php
@@ -0,0 +1,63 @@
<?php

/*
 * This file is part of the Access to Memory (AtoM) software.
 *
 * Access to Memory (AtoM) is free software: you can redistribute it and/or modify
 * it under the terms of the GNU Affero General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Access to Memory (AtoM) is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Access to Memory (AtoM). If not, see <http://www.gnu.org/licenses/>.
 */

class QubitSolrSearchPager extends sfPager
{
    protected $nbResults;
    protected $resultSet;

    public function __construct(arSolrResultSet $resultSet)
    {
        $this->resultSet = $resultSet;
    }

    /**
     * @see sfPager
     */
    public function init()
    {
        $this->setNbResults($this->resultSet->getTotalHits());

        if (0 == $this->getPage() || 0 == $this->getMaxPerPage()) {
            $this->setLastPage(0);
        } else {
            $this->setLastPage(ceil($this->getNbResults() / $this->getMaxPerPage()));
        }
    }

    /**
     * @see sfPager
     */
    public function getResults()
    {
        // Note: to get results here beyond page 1, you'll need to call $resultSet->setFrom()
        // prior to this pager's creation.
        return $this->resultSet->getResults();
    }

    /**
     * @see sfPager
     *
     * @param mixed $offset
     */
    public function retrieveObject($offset)
    {
        return array_slice($this->getResults(), $offset, 1);
    }
}
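QubitSolrSearchPager::init() computes the last page as ceil(totalHits / maxPerPage). getResults() only returns hits that are already loaded in the result set. Pages after the first therefore need the offset set beforehand through setFrom().

The minimal sketch below shows that arithmetic. `lastPage()` and `fromOffset()` are hypothetical helpers. The sketch also assumes setFrom() takes a zero-based hit offset, as Solr's `start` parameter does; this diff does not show that.

```php
<?php

// Hypothetical helper mirroring QubitSolrSearchPager::init().
function lastPage(int $totalHits, int $page, int $maxPerPage): int
{
    // Same guard as init(): no pagination when page or page size is 0.
    if (0 === $page || 0 === $maxPerPage) {
        return 0;
    }

    // ceil() returns a float, so cast back to an integer page number.
    return (int) ceil($totalHits / $maxPerPage);
}

// Hypothetical helper: zero-based offset of the first hit on a 1-based page,
// i.e. the value assumed to be passed to setFrom() before building the pager.
function fromOffset(int $page, int $maxPerPage): int
{
    return max(0, ($page - 1) * $maxPerPage);
}
```

For example, 45 hits at 10 per page gives `lastPage(45, 1, 10) === 5`. Showing page 3 would need `fromOffset(3, 10) === 20` to be applied to the result set first.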
108 changes: 108 additions & 0 deletions lib/task/search/arSolrPopulateTask.class.php
@@ -0,0 +1,108 @@
<?php

/*
 * This file is part of the Access to Memory (AtoM) software.
 *
 * Access to Memory (AtoM) is free software: you can redistribute it and/or modify
 * it under the terms of the GNU Affero General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Access to Memory (AtoM) is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Access to Memory (AtoM). If not, see <http://www.gnu.org/licenses/>.
 */

/**
 * Populate search index.
 */
class arSolrPopulateTask extends sfBaseTask
{
    public function execute($arguments = [], $options = [])
    {
        sfContext::createInstance($this->configuration);
        sfConfig::add(QubitSetting::getSettingsArray());

        // If show-types flag set, show types available to index
        //if (!empty($options['show-types'])) {
        //$this->log(sprintf('Available document types that can be excluded: %s', implode(', ', $this->availableDocumentTypes())));
        //$this->ask('Press the Enter key to continue indexing or CTRL-C to abort...');
        //}

        new sfDatabaseManager($this->configuration);

        $solr = QubitSearch::getSolrInstance();

        // Index by slug, if specified, or all indexable resources except those with an excluded type
        //if ($options['slug']) {
        //$logMessage = (false !== $this->attemptIndexBySlug($options)) ? 'Slug indexed.' : 'Slug not found.';
        //$this->log($logMessage);
        //} else {
        //$populateOptions = [];
        //$populateOptions['excludeTypes'] = (!empty($options['exclude-types'])) ? explode(',', strtolower($options['exclude-types'])) : null;
        //$populateOptions['update'] = $options['update'];

        //QubitSearch::getInstance()->populate($populateOptions);
        //}
        $populateOptions = [];
        $populateOptions['excludeTypes'] = (!empty($options['exclude-types'])) ? explode(',', strtolower($options['exclude-types'])) : null;
        $populateOptions['update'] = $options['update'];
        $solr->populate($populateOptions);
    }

    protected function configure()
    {
        $this->addOptions([
            new sfCommandOption('application', null, sfCommandOption::PARAMETER_OPTIONAL, 'The application name', 'qubit'),
            new sfCommandOption('env', null, sfCommandOption::PARAMETER_REQUIRED, 'The environment', 'cli'),
            //new sfCommandOption('slug', null, sfCommandOption::PARAMETER_OPTIONAL, 'Slug of resource to index (ignoring exclude-types option).'),
            //new sfCommandOption('ignore-descendants', null, sfCommandOption::PARAMETER_NONE, "Don't index resource's descendants (applies to --slug option only)."),
            new sfCommandOption('exclude-types', null, sfCommandOption::PARAMETER_OPTIONAL, 'Exclude document type(s) (comma-separated) from indexing'),
            //new sfCommandOption('show-types', null, sfCommandOption::PARAMETER_NONE, 'Show available document type(s), that can be excluded, before indexing'),
            new sfCommandOption('update', null, sfCommandOption::PARAMETER_NONE, "Don't delete existing records before indexing."),
        ]);

        $this->namespace = 'solr';
        $this->name = 'populate';

        $this->briefDescription = 'Populates the search index';
        $this->detailedDescription = <<<'EOF'
The [solr:populate|INFO] task empties, populates, and optimizes the index
in the current project. It may take quite a while to run.

To exclude a document type, use the --exclude-types option. For example:

php symfony solr:populate --exclude-types="term,actor"

To see a list of available document types that can be excluded use the --show-types option.
EOF;
    }

    //private function availableDocumentTypes()
    //{
    //$types = array_keys(QubitSearch::getInstance()->loadMappings()->asArray());
    //sort($types);

    //return $types;
    //}

    //private function attemptIndexBySlug($options)
    //{
    //// Abort if resource doesn't exist for the provided slug
    //if (null == $resource = QubitObject::getBySlug($options['slug'])) {
    //return false;
    //}

    //// For information objects, allow optional skipping of descendants
    //if ($resource instanceof QubitInformationObject) {
    //$options = ['updateDescendants' => !$options['ignore-descendants']];
    //QubitSearch::getInstance()->update($resource, $options);
    //} else {
    //QubitSearch::getInstance()->update($resource);
    //}
    //}
}
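The task splits --exclude-types with explode(',', strtolower(...)). A value typed with spaces, such as `"term, actor"`, therefore yields `['term', ' actor']`. If type names are compared as exact strings, the second entry would not match `actor`.

The minimal sketch below, assuming PHP 7.4+ for arrow functions, shows a more forgiving parser that trims each entry and drops empty ones. `parseExcludeTypes()` is a hypothetical helper, not part of this PR.

```php
<?php

// Hypothetical helper: parse a comma-separated --exclude-types value into a
// clean list of lowercase type names, or null when nothing is excluded.
function parseExcludeTypes(?string $value): ?array
{
    // Match the task's existing "nothing excluded" result for missing input.
    if (null === $value || '' === trim($value)) {
        return null;
    }

    // Lowercase, split on commas, and strip surrounding whitespace.
    $types = array_map('trim', explode(',', strtolower($value)));

    // Drop empty entries (e.g. from a trailing comma), then re-index.
    return array_values(array_filter($types, static fn (string $type): bool => '' !== $type));
}
```

For example, `parseExcludeTypes('Term, actor,')` returns `['term', 'actor']`. An empty or missing value returns null, as the task does today.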