diff --git a/docs/requirements.txt b/docs/requirements.txt
index 43d13278..41487bbe 100644
--- a/docs/requirements.txt
+++ b/docs/requirements.txt
@@ -1,6 +1,6 @@
 # with version specifier
 sphinx>=7.2.6
-pydata-sphinx-theme>=0.14.3
+pydata-sphinx-theme>=0.14.4
 docutils>=0.20.1
 # without version specifier
 trafilatura
diff --git a/docs/tutorial-epsilla.rst b/docs/tutorial-epsilla.rst
index 1c491f79..ae0dcc58 100644
--- a/docs/tutorial-epsilla.rst
+++ b/docs/tutorial-epsilla.rst
@@ -1,5 +1,5 @@
-Text embedding
-===============
+Tutorial: Text embedding
+========================
 
 .. meta::
     :description lang=en:
@@ -28,7 +28,11 @@ In this tutorial, we will show you how to perform text embedding on results from
 Setup Epsilla
 ------------------------------------------------
 
-In this tutorial, we will run an Epsilla databse server. You can start one locally with a `Docker <https://docs.docker.com/get-started/>`_ image.
+In this tutorial, we will need an Epsilla database server. There are two ways to get one: use the free cloud version or start one locally.
+
+Epsilla has a `cloud version <https://cloud.epsilla.com//?ref=trafilatura>`_ with a free tier. You can sign up and get a server running in a few steps.
+
+Alternatively, you can start one locally with a `Docker <https://docs.docker.com/get-started/>`_ image.
 
 .. code-block:: bash
 
@@ -37,6 +41,8 @@ In this tutorial, we will run an Epsilla databse server. You can start one local
 
 See `Epsilla documentation <https://epsilla-inc.gitbook.io/epsilladb/quick-start>`_ for a full quick start guide.
 
+The rest of this guide assumes you are running a local Epsilla server on port 8888. If you are using the cloud version, replace the host and port in the examples with your cloud server's address.
+
 We need to install the database client. You can do this with pip:
 
 .. code-block:: bash
@@ -145,5 +151,7 @@ We can now perform a vector search to find the most relevant project based on a
 
 You will see the returned response is React! That is the correct answer. React is a modern frontend library, but PyTorch and Tensorflow are not.
 
+.. image:: https://static.scarf.sh/a.png?x-pxid=51f549d1-aabf-473c-b971-f8d9c3ac8ac5
+    :alt: 
 
 
diff --git a/docs/used-by.rst b/docs/used-by.rst
index 0b514bac..c6ca6b39 100644
--- a/docs/used-by.rst
+++ b/docs/used-by.rst
@@ -18,11 +18,12 @@ Notable projects using this software
 Known institutional users
 ^^^^^^^^^^^^^^^^^^^^^^^^^
 
-- `Data against Feminicide <https://datoscontrafeminicidio.net/>`_
-- `Kagi search engine <https://kagi.com/>`_ (notably Teclis component)
-- `Media Cloud platform <https://mediacloud.org>`_ for media analysis
-- The Internet Archive's `sandcrawler <https://github.com/internetarchive/sandcrawler>`_ which crawls and processes the scholarly web for the `Fatcat catalog <https://fatcat.wiki/>`_ of research publications
+- Falcon LLM (TII UAE) and its underlying `RefinedWeb Dataset <https://arxiv.org/abs/2306.01116>`_
+- `FinGPT <https://arxiv.org/abs/2311.05640>`_ (Finland)
+- `Media Cloud platform <https://mediacloud.org>`_ for media analysis, e.g. `Data against Feminicide <https://datoscontrafeminicidio.net/>`_
 - `SciencesPo médialab <https://medialab.sciencespo.fr>`_ through its `Minet <https://github.com/medialab/minet>`_ webmining package
+- `Teclis component <https://teclis.com/>`_ of the Kagi search engine
+- The Internet Archive's `sandcrawler <https://github.com/internetarchive/sandcrawler>`_ which crawls and processes the scholarly web for the `Fatcat catalog <https://fatcat.wiki/>`_ of research publications
 
 
 Various software repositories
@@ -32,6 +33,7 @@ Various software repositories
 - `CommonCrawl downloader <https://github.com/leogao2/commoncrawl_downloader>`_, to derive massive amounts of language data
 - `GLAM Workbench <https://glam-workbench.github.io/web-archives/>`_ for cultural heritage (web archives section)
 - `llama-hub <https://github.com/emptycrown/llama-hub>`_, a library of data loaders for large language models
+- `LlamaIndex <https://github.com/run-llama/llama_index>`_, a data framework for LLM applications
 - `Obsei <https://obsei.com/>`_, a text collection and analysis tool
 - `Vulristics <https://github.com/leonov-av/vulristics>`_, a framework for analyzing publicly available information about vulnerabilities
 - `Website-to-Chatbot <https://github.com/Anil-matcha/Website-to-Chatbot>`_, a personalized chatbot
@@ -114,6 +116,7 @@ Publications citing Trafilatura
 - Brandon, C., Doherty, A. J., Kelly, D., Leddin, D., & Margaria, T. (2023). HIPPP: Health Information Portal for Patients and Public. Applied Sciences, 13(16), 9453.
 - Braun, D. (2021). "Automated Semantic Analysis, Legal Assessment, and Summarization of Standard Form Contracts", PhD Thesis, Technische Universität München.
 - Chen, X., Zeynali, A., Camargo, C., Flöck, F., Gaffney, D., Grabowicz, P., ... & Samory, M. (2022). SemEval-2022 Task 8: Multilingual news article similarity. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022) (pp. 1094-1106).
+- De Cesare, A. M. (2023). Assessing the quality of ChatGPT’s generated output in light of human-written texts: A corpus study based on textual parameters. CHIMERA: Revista de Corpus de Lenguas Romances y Estudios Lingüísticos, 10, 179-210.
 - Di Giovanni, M., Tasca, T., & Brambilla, M. (2022). DataScience-Polimi at SemEval-2022 Task 8: Stacking Language Models to Predict News Article Similarity. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022) (pp. 1229-1234).
 - Dumitru, V., Iorga, D., Ruseti, S., & Dascalu, M. (2023). Garbage in, garbage out: An analysis of HTML text extractors and their impact on NLP performance. In 2023 24th International Conference on Control Systems and Computer Science (CSCS) (pp. 403-410). IEEE.
 - Fröbe, M., Hagen, M., Bevendorff, J., Völske, M., Stein, B., Schröder, C., ... & Potthast, M. (2021). "The Impact of Main Content Extraction on Near-Duplicate Detection". arXiv preprint arXiv:2111.10864.
@@ -131,6 +134,7 @@ Publications citing Trafilatura
 - Kuehn, P., Schmidt, M., & Reuter, C. (2023). ThreatCrawl: A BERT-based Focused Crawler for the Cybersecurity Domain. arXiv preprint arXiv:2304.11960.
 - Laippala, V., Rönnqvist, S., Hellström, S., Luotolahti, J., Repo, L., Salmela, A., ... & Pyysalo, S. (2020). "From Web Crawl to Clean Register-Annotated Corpora", Proceedings of the 12th Web as Corpus Workshop (pp. 14-22).
 - Laippala, V., Salmela, A., Rönnqvist, S., Aji, A. F., Chang, L. H., Dhifallah, A., ... & Pyysalo, S. (2022). Towards better structured and less noisy Web data: Oscar with Register annotations. In Proceedings of the Eighth Workshop on Noisy User-generated Text (W-NUT 2022) (pp. 215-221).
+- Luukkonen, R., Komulainen, V., Luoma, J., Eskelinen, A., Kanerva, J., Kupari, H. M., ... & Pyysalo, S. (2023). FinGPT: Large Generative Models for a Small Language. arXiv preprint arXiv:2311.05640.
 - Madrid-Morales, D. (2021). "Who Set the Narrative? Assessing the Influence of Chinese Media in News Coverage of COVID-19 in 30 African Countries", Global Media and China, 6(2), 129-151.
 - Meier-Vieracker, S. (2022). "Fußballwortschatz digital–Korpuslinguistische Ressourcen für den Sprachunterricht." Korpora Deutsch als Fremdsprache (KorDaF), 2022/01 (pre-print).
 - Meng, K. (2021). "An End-to-End Computational System for Monitoring and Verifying Factual Claims" (pre-print).
@@ -140,6 +144,7 @@ Publications citing Trafilatura
 - Öhman, J., Verlinden, S., Ekgren, A., Gyllensten, A. C., Isbister, T., Gogoulou, E., ... & Sahlgren, M. (2023). The Nordic Pile: A 1.2 TB Nordic Dataset for Language Modeling. arXiv preprint arXiv:2303.17183.
 - Penedo, G., Malartic, Q., Hesslow, D., Cojocaru, R., Cappelli, A., Pannier, B., ... & Launay, J. The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only.
 - Piskorski, J., Stefanovitch, N., Da San Martino, G., & Nakov, P. (2023). Semeval-2023 task 3: Detecting the category, the framing, and the persuasion techniques in online news in a multi-lingual setup. In Proceedings of the the 17th International Workshop on Semantic Evaluation (SemEval-2023) (pp. 2343-2361).
+- Pohlmann, J., Barbaresi, A., & Leinen, P. (2023). Platform regulation and “overblocking” – The NetzDG discourse in Germany. Communications, 48(3), 395-419.
 - Robertson, F., Lagus, J., & Kajava, K. (2021). "A COVID-19 news coverage mood map of Europe", Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation (pp. 110-115).
 - Salmela, A. (2022). "Distinguishing Noise and Main Text Content from Web-Sourced Plain Text Documents Using Sequential Neural Networks", Master's thesis, University of Turku.
 - Sawczyn, A., Binkowski, J., Janiak, D., Augustyniak, Ł., & Kajdanowicz, T. (2021). "Fact-checking: relevance assessment of references in the Polish political domain", Procedia Computer Science, 192, 1285-1293.
diff --git a/tests/metadata_tests.py b/tests/metadata_tests.py
index 770e2cbd..551e9e3d 100644
--- a/tests/metadata_tests.py
+++ b/tests/metadata_tests.py
@@ -192,6 +192,7 @@ def test_dates():
     mystring = '<html><body><p>Veröffentlicht am 1.9.17</p></body></html>'
     metadata = extract_metadata(mystring, fastmode=False)
     assert metadata.date == '2017-09-01'
+    # Behavior with fastmode=True changed in htmldate 1.6.0; in 1.5.2 and earlier, the result was None.
     metadata = extract_metadata(mystring, fastmode=True)
     assert metadata.date == '2017-09-01'