# For Developers {.unnumbered}
Documentation of the development of the ACEP Data Catalog.
GitHub repository for the main Data Catalog site: [https://github.com/UAF-RCS/acepportal-ckan](https://github.com/UAF-RCS/acepportal-ckan)
For more information and guides, visit the official [CKAN Documentation](https://docs.ckan.org/en/2.10/contents.html)
## Data Sources Overview
![](images/documentation/data_sources_diagram.png)
## Developing the Data Catalog
The ACEP Data Catalog runs on a VM hosted by RCS. Extensions can be updated by pushing to the acepportal-ckan GitHub repository. After a push, changes take about 30 minutes to appear on the main site.
### Basic Docker Commands
__List all containers (running and stopped):__
- `docker ps -a`
Five containers run the data catalog:
- `acep-ckan-cont`
- `acep-db-cont`
- `acep-redis-cont`
- `acep-solr-cont`
- `acep-datapusher-cont`
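To check quickly whether all five containers are up, the list above can be compared against the names Docker reports. This is a minimal sketch, assuming the input is the text printed by `docker ps --format '{{.Names}}'` (one running container name per line); the function name `missing_containers` and the sample output are hypothetical.

```python
# The five container names listed above.
EXPECTED_CONTAINERS = [
    "acep-ckan-cont",
    "acep-db-cont",
    "acep-redis-cont",
    "acep-solr-cont",
    "acep-datapusher-cont",
]


def missing_containers(ps_names_output: str) -> list[str]:
    """Return the expected containers that are absent from the given output.

    Assumption: ps_names_output is the text printed by
    `docker ps --format '{{.Names}}'`, one running container per line.
    """
    running = {line.strip() for line in ps_names_output.splitlines() if line.strip()}
    return [name for name in EXPECTED_CONTAINERS if name not in running]


# Hypothetical output in which the datapusher container is not running.
sample = "acep-ckan-cont\nacep-db-cont\nacep-redis-cont\nacep-solr-cont\n"
print(missing_containers(sample))  # ['acep-datapusher-cont']
```

An empty list means every expected container is running.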
__Spin up Application__
- `docker compose up`
This will turn the terminal into an output stream for the Docker containers.
> TIP: I recommend keeping two terminals open: one for the output stream so you can see errors, and another for running other commands.
__Rebuild and spin up containers:__
- `docker compose up -d --build`
Run this command after installing a new extension and adding it to the `.env` file.
__Go into container:__
- `docker exec -it [container_name] /bin/bash`
Or if bash is not installed in the container:
- `docker exec -it [container_name] /bin/sh`
Or if in a bash terminal:
- `docker exec -it [container_name] bash`
__Restart a container:__
- `docker restart [container_name]`
Restart the `acep-ckan-cont` container after making changes to non-HTML files. Changes to HTML files can be seen by refreshing the webpage.
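The restart rule above can be written as a small check. This is only an illustrative sketch: the function name `needs_ckan_restart` and the file paths are hypothetical, and it treats any file whose extension is not `.html` as requiring a restart.

```python
from pathlib import PurePosixPath


def needs_ckan_restart(changed_files: list[str]) -> bool:
    """Return True if any changed file is not an HTML file."""
    return any(PurePosixPath(path).suffix.lower() != ".html" for path in changed_files)


# Hypothetical change sets.
print(needs_ckan_restart(["templates/home/index.html"]))               # False
print(needs_ckan_restart(["plugin.py", "templates/home/index.html"]))  # True
```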
__Take Down Application__
- `docker compose down`
__Clean up Project__
- `docker compose down --rmi all -v --remove-orphans`
This removes all containers, images, and volumes associated with a project. __Only do this if you want to clean up your environment and reset the containers.__
### Creating a Local Instance
A local instance of the data catalog is useful for developing and testing new features.
1. Install Docker: [https://www.docker.com/get-started/](https://www.docker.com/get-started/)
2. Clone the ACEP CKAN repository from GitHub: [https://github.com/UAF-RCS/acepportal-ckan.git](https://github.com/UAF-RCS/acepportal-ckan.git)
3. Create the `.env` file inside the main `acepportal-ckan` folder. Copy the contents from the `.env.example` file.
4. Specify the location of the source files, storage files, backups, etc. in the `.env` file. You will move those files to these locations in the next steps.
For example:
```ini
# CKAN Mounts Directory
CKAN_EXTENSIONS_MOUNT=./ckan-extension
SRC_EXTENSIONS_PATH=/srv/app/src_extensions
CKAN_SOURCE_MOUNT=./ckan-src/src
CKAN_STORAGE_MOUNT=./ckan-src/storage
CKAN_INI_MOUNT=./ckan-src/ckan.ini
```
5. To create a replica of the current main Data Catalog, copy over the source files, storage files, `ckan.ini` file, and database backups from the VM. The backups are located on the VM inside `/opt/ckan/backups`. Use scp to copy the files onto your machine. Backups are created every day: replace [date] with the most recent date in the format `yyyymmdd`.
Inside `acepportal-ckan/ckan-src`, run the following:
- `scp [email protected]:/opt/ckan/backups/app_[date].tar.bz2 .`
- `scp [email protected]:/opt/ckan/backups/app_storage_[date].tar.bz2 .`
- `scp [email protected]:/opt/ckan/acepportal-ckan/ckan-src/ckan.ini .`
6. Use tar to decompress the source and storage tar files.
- `tar -jxvf app_[date].tar.bz2`
- `tar -jxvf app_storage_[date].tar.bz2`
Decompressing the `app_storage` tar file should create a folder called `ckan` containing the folders `resources`, `storage`, and `webassets`. Rename the `ckan` folder to `storage`.
This should result in the directory structure specified in `ckan-src/README.txt`.
7. Create a backups folder alongside the `acepportal-ckan` repository on your machine. Specify the name in the `BACKUP_TO` setting in the `.env` file.
```ini
# Backups
BACKUP_TO=../../[backups folder name]
```
8. Run the following commands inside the backups folder to copy over the database and datastore.
- `scp [email protected]:/opt/ckan/backups/ckandb_[date].tar .`
- `scp [email protected]:/opt/ckan/backups/datastore_[date].tar .`
9. Inside the `ckan.ini` file, set the `ckan.site_url` setting to the localhost URL as follows:
```ini
ckan.site_url = http://127.0.0.1:5000
```
10. Build and start the containers:
- `docker compose up`
11. Once the containers are up, use the `import_database.sh` bash script to import the database.
- `bash import_database.sh`
12. Rebuild the CKAN search index.
- `docker exec -it acep-ckan-cont /bin/bash`
- `cd /srv/app`
- `ckan search-index rebuild`
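The backup file names used in the steps above all follow the same `yyyymmdd` pattern, so they can be generated from a date rather than typed by hand. This is a minimal sketch: the function name `backup_paths` and the example date are hypothetical, and the remote user and host are deliberately left out.

```python
from datetime import date


def backup_paths(day: date) -> dict[str, str]:
    """Build the remote backup paths on the VM for the given backup date."""
    stamp = day.strftime("%Y%m%d")  # yyyymmdd, as used in the backup file names
    base = "/opt/ckan/backups"
    return {
        "source": f"{base}/app_{stamp}.tar.bz2",
        "storage": f"{base}/app_storage_{stamp}.tar.bz2",
        "database": f"{base}/ckandb_{stamp}.tar",
        "datastore": f"{base}/datastore_{stamp}.tar",
    }


# Hypothetical backup date.
paths = backup_paths(date(2024, 6, 3))
print(paths["source"])     # /opt/ckan/backups/app_20240603.tar.bz2
print(paths["datastore"])  # /opt/ckan/backups/datastore_20240603.tar
```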
### Create a New Extension
1. Enter the `acep-ckan-cont` Docker container
- `docker exec -it acep-ckan-cont /bin/bash`
and run the following command
- `ckan generate extension -o /srv/app/src/ckan-extension`
This will create an extension in the `ckan-extension` folder which can be edited outside of the container.
2. Add the extension name to the `CKAN_PLUGINS` list in the `.env` file.
3. Run `docker compose up -d --build ckan`
### Install an Extension
1. Ensure that the extension supports CKAN 2.10.4 and Python 3.10. Clone the extension repository into the `ckan-extension` folder.
2. Ensure that all dependencies for the extension are listed in `requirements.txt` or a similar file.
3. Add the extension name to the `CKAN_PLUGINS` list in the `.env` file.
4. Run `docker compose up -d --build ckan`
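The `.env` edit in the steps above can also be scripted. This is a minimal sketch, assuming `CKAN_PLUGINS` is a single line holding a space-separated list that may be wrapped in double quotes; check the actual `.env` file before relying on it. The function name `add_plugin` and the sample file contents are hypothetical.

```python
def add_plugin(env_text: str, plugin: str) -> str:
    """Append `plugin` to the CKAN_PLUGINS line unless it is already listed.

    Assumption: CKAN_PLUGINS is one line with a space-separated value,
    optionally wrapped in double quotes.
    """
    out = []
    for line in env_text.splitlines():
        if line.startswith("CKAN_PLUGINS="):
            key, _, value = line.partition("=")
            quoted = len(value) >= 2 and value[0] == value[-1] == '"'
            names = value.strip('"').split()
            if plugin not in names:
                names.append(plugin)
            value = " ".join(names)
            line = f'{key}="{value}"' if quoted else f"{key}={value}"
        out.append(line)
    return "\n".join(out)


# Hypothetical .env contents.
env = 'CKAN_SITE_ID=default\nCKAN_PLUGINS="envvars image_view"'
print(add_plugin(env, "pdf_view"))
```

Calling the function again with the same plugin leaves the text unchanged, so it is safe to re-run.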
### Updating the Main Site
To add a feature from your local instance to the main Data Catalog:
1. Push the files to the `acepportal-ckan` GitHub repository.
- If you are adding a new extension, you must __delete the .git folder__ before adding/committing/pushing.
2. Wait about 30 minutes for the changes to be pulled to the VM.
3. If you have added a new extension, SSH into the VM and add the extension name to the `.env` file.
- `ssh [email protected]`
- `cd /opt/ckan/acepportal-ckan`
- `vi .env`
4. After installing new extensions or making other changes, you may need to restart the `acep-ckan-cont` container to make them take effect. Inside the VM, run
- `docker restart acep-ckan-cont`
## Extensions
### Currently Installed
[https://github.com/UAF-RCS/acepportal-ckan/tree/main/ckan-extension](https://github.com/UAF-RCS/acepportal-ckan/tree/main/ckan-extension)
#### ckanext-customtheme
__Author__: Jenae Matson
__Purpose__: Add custom theming and features for the CKAN instance, including
- ACEP logos, colors, and fonts
- Home page layout, images, and featured dataset
- Changed font weight of Register button
- Added tags to search page display
- HTML file for About page text
- Removed social media links from dataset/resources pages
- Added support contact info to dataset sidebar
- Made featured_group config option editable from admin console
- FAQ page linked in the masthead (formerly in ckanext-faqpage)
- Restriction on the visibility metadata field, only admin/sysadmins can make datasets public (formerly in ckanext-restrictpublish)
__Configuration Settings__:
- `ckan.customtheme.featured_dataset = alaska-energy-inventory`
#### ckanext-dcat
__Link__: [https://github.com/ckan/ckanext-dcat](https://github.com/ckan/ckanext-dcat)
__Purpose__: Reworks metadata to conform to the DCAT standard.
__Configuration Settings__:
- `ckanext.dcat.rdf.profiles = euro_dcat_ap_3`
__Modifications__:
- The file `schemas/acep_dcat_fields.yaml` was created to define the metadata fields for the catalog.
- The file `templates/scheming/form_snippets/publisher.html` was created to define the dynamic dropdown menu in the Publisher metadata field.
#### ckanext-geoview
__Link__: [https://github.com/ckan/ckanext-geoview](https://github.com/ckan/ckanext-geoview)
__Purpose__: Creates resource views for GeoJSON and other geodata file types. We have implemented the OpenLayers Viewer.
__Configuration Settings__:
- `ckanext.geoview.ol_viewer.default_feature_hoveron = true`
#### ckanext-githubrepopreview
__Link__: [https://github.com/DataShades/ckanext-githubrepopreview](https://github.com/DataShades/ckanext-githubrepopreview)
__Purpose__: Provide a view for GitHub repository resources.
__Modifications__: This extension was created for an older version of CKAN, so the following changes were made to make it work with version 2.10:
- In the file `plugin.py`, replace the line `from lib import parse` with the following
```py
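from urllib.parse import urlparse


def parse(input_url, some_flag):
    """Split a repository URL into its domain, owner, and repo name."""
    # some_flag is unused here; it is kept to preserve the two-argument signature.
    parsed_info = {}
    parsed_url = urlparse(input_url)
    domain = parsed_url.netloc
    # "/owner/repo/..." -> ["owner", "repo", ...]
    path_parts = parsed_url.path.strip('/').split('/')
    parsed_info['domain'] = domain
    parsed_info['owner'] = path_parts[0] if len(path_parts) > 0 else None
    parsed_info['repo'] = path_parts[1] if len(path_parts) > 1 else None
    return parsed_info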
```
- In the file `templates/githubrepo.html`, delete the following lines
```
{%- block styles %}
{% resource g.main_css[6:] %}
{% endblock %}
{%- block scripts %}
{% resource 'base/main' %}
{% resource 'base/ckan' %}
{% if g.tracking_enabled %}
{% resource 'base/tracking.js' %}
{% endif %}
{% endblock -%}
```
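As a quick sanity check of the `parse` replacement described above, the function can be run on its own outside CKAN. The sketch below repeats the function so that it is self-contained; the example URL and the `None` passed for the unused second argument are illustrative choices, not values taken from the extension.

```python
from urllib.parse import urlparse


def parse(input_url, some_flag):
    """Split a repository URL into its domain, owner, and repo name."""
    parsed_info = {}
    parsed_url = urlparse(input_url)
    path_parts = parsed_url.path.strip('/').split('/')
    parsed_info['domain'] = parsed_url.netloc
    parsed_info['owner'] = path_parts[0] if len(path_parts) > 0 else None
    parsed_info['repo'] = path_parts[1] if len(path_parts) > 1 else None
    return parsed_info


# The second argument is unused, so None is passed here.
info = parse("https://github.com/ckan/ckanext-dcat", None)
print(info)  # {'domain': 'github.com', 'owner': 'ckan', 'repo': 'ckanext-dcat'}
```

Note that a URL with no path yields an empty string rather than `None` for `owner`, because splitting an empty path still produces one (empty) element.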
#### ckanext-ldap
__Link__: [https://github.com/NaturalHistoryMuseum/ckanext-ldap](https://github.com/NaturalHistoryMuseum/ckanext-ldap)
__Purpose__: Allows users to log in with their UA credentials (a temporary solution while the official SSO is being implemented).
__Configuration Settings__:
- `ckanext.ldap.uri = ldaps://auth.alaska.edu`
- `ckanext.ldap.auth.password = s3arch@ccount!`
- `ckanext.ldap.base_dn = ou=userAccounts,dc=ua,dc=ad,dc=alaska,dc=edu`
- `ckanext.ldap.search.filter = (sAMAccountName={login})`
- `ckanext.ldap.auth.dn = cn=rcs-ad-read,ou=RCS,ou=UAF,dc=ua,dc=ad,dc=alaska,dc=edu`
- `ckanext.ldap.username = sAMAccountName`
- `ckanext.ldap.fullname = displayName`
- `ckanext.ldap.email = mail`
- `ckanext.ldap.ckan_fallback = True`
__Modifications__: Theng from RCS has made some modifications, including improving user creation to handle existing accounts: [https://github.com/UAF-RCS/acepportal-ckan/tree/main/ckan-extension/ckanext-ldap](https://github.com/UAF-RCS/acepportal-ckan/tree/main/ckan-extension/ckanext-ldap).
#### ckanext-package-group-permissions
__Link__: [https://github.com/salsadigitalauorg/ckanext-package-group-permissions](https://github.com/salsadigitalauorg/ckanext-package-group-permissions)
__Purpose__: Allows all editors and admins to add datasets to any group, without having to be added as members to each group.
__Modifications__: To add a default "Select Group" option to the add to group dropdown menu, in the file `templates/package/group_list.html` add the following line to the "field-add_group" select form:
```
<option value="none" selected disabled>Select a Group</option>
```
> Note: The above modification was previously implemented in the ckanext-customtheme extension, but was moved to ckanext-package-group-permissions to avoid template overriding.
#### ckanext-pdfview
__Link__: [https://github.com/ckan/ckanext-pdfview](https://github.com/ckan/ckanext-pdfview)
__Purpose__: Provides a view for PDF resources.
#### ckanext-scheming
__Link__: [https://github.com/ckan/ckanext-scheming](https://github.com/ckan/ckanext-scheming)
__Purpose__: Allows for the creation of alternate metadata templates (schemas) defined by .yaml or .json files.
__Configuration Settings__:
- `scheming.dataset_schemas = ckanext.dcat.schemas:acep_dcat_fields.yaml`
- `scheming.presets = ckanext.scheming:presets.json ckanext.dcat.schemas:presets.yaml`
- `scheming.dataset_fallback = false`
__Modifications__: Some of the automatically calculated resource fields were manually re-added so that they are displayed. In the file `templates/scheming/package/resource_read.html`, below `{%- block resource_license -%}`, add the following:
```
{%- block resource_size -%}
<tr>
<th scope="row">{{ _('Size') }}</th>
<td>{{ res.size or _('unknown') }} bytes</td>
</tr>
{%- endblock -%}
{%- block resource_datastore -%}
<tr>
<th scope="row">{{ _('Datastore active') }}</th>
<td>{{ res.datastore_active or _('unknown') }}</td>
</tr>
{%- endblock -%}
```
#### ckanext-xloader
__Link__: [https://github.com/ckan/ckanext-xloader](https://github.com/ckan/ckanext-xloader)
__Purpose__: Improves data uploading, including increasing allowed file sizes.
__Configuration Settings__:
- `ckanext.xloader.jobs_db.uri=postgresql://ckandbuser:ckandbpassword@db/ckandb`
### Adding Alternate Schemas with ckanext-scheming
1. Create a .yaml or .json file in the folder `ckanext-scheming/ckanext/scheming` to define the metadata schema. See the extension documentation for more information and examples.
2. In `ckan.ini`, add your schema(s) to the `scheming.dataset_schemas` config option.
For example:
```ini
scheming.dataset_schemas = ckanext.scheming:arctic_dataset.json ckanext.scheming:geo_dataset.json
```
3. The new dataset creation form is located at a URL defined by the schema type name. For example, the creation form for datasets of type `arctic-dataset` is located at `/arctic-dataset/new`. You can define a new Add Dataset button using this new URL.
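The URL rule in the last step can be expressed as a one-line helper. This is an illustrative sketch: the function name `new_dataset_url` is hypothetical, and the base URL is the local `ckan.site_url` value used elsewhere in this document.

```python
def new_dataset_url(site_url: str, dataset_type: str) -> str:
    """Return the creation-form URL for a schema type, e.g. /arctic-dataset/new."""
    return f"{site_url.rstrip('/')}/{dataset_type}/new"


print(new_dataset_url("http://127.0.0.1:5000", "arctic-dataset"))
# http://127.0.0.1:5000/arctic-dataset/new
```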
### Attempted Extensions
#### ckanext-spatial
__Link__: [https://github.com/ckan/ckanext-spatial](https://github.com/ckan/ckanext-spatial)
__Purpose__: This extension adds the ability to search for datasets on a map widget, as well as a dataset extent map widget on the dataset page, provided correct geospatial metadata.
__Problems__: This extension is not currently installed because of the following problems:
- Configuring map tiles for ckanext-spatial caused the map tiles for ckanext-geoview to disappear.
- To be indexed on the map search widget, a dataset requires a metadata field called "spatial". This would need to be integrated with the implemented DCAT metadata standard.
- Datasets with the required spatial metadata were not searchable on the map search widget, although the dataset extent widget worked correctly.
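For reference, ckanext-spatial expects the value of the "spatial" field to be a GeoJSON geometry. The sketch below builds such a value from a bounding box; the function name `bbox_to_spatial` and the coordinates are hypothetical, and how this value would be mapped onto the DCAT schema is left open.

```python
import json


def bbox_to_spatial(west: float, south: float, east: float, north: float) -> str:
    """Return a GeoJSON Polygon string for a bounding box in lon/lat degrees."""
    # A GeoJSON linear ring must be closed: the last position repeats the first.
    ring = [
        [west, south],
        [east, south],
        [east, north],
        [west, north],
        [west, south],
    ]
    return json.dumps({"type": "Polygon", "coordinates": [ring]})


# Hypothetical, very rough bounding box.
spatial_value = bbox_to_spatial(-170.0, 54.0, -130.0, 72.0)
print(spatial_value)
```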
#### ckanext-oidc-pkce
__Link__: [https://github.com/DataShades/ckanext-oidc-pkce/tree/master](https://github.com/DataShades/ckanext-oidc-pkce/tree/master)
__Purpose__: This extension allows users to be authenticated through an external application when they log in.
__Problems__: Ideally, users on the ACEP Data Catalog would be able to log in using their UA credentials through Google Authentication. This extension installs correctly, but does not seem to support Google Authentication.