long-term use of proxpi on a low-resource server #46

Closed
berouques opened this issue Apr 6, 2024 · 6 comments

Comments

@berouques

berouques commented Apr 6, 2024

dear EpicWink, i am considering using your caching proxy proxpi on my field (semi-portable) NAS. the server is solely for my use, and it has a pretty weak CPU and 1.5GB of RAM.

i wonder if your caching proxy is suitable for long-term cache storage in case of internet disruptions. for instance, if i set PROXPI_INDEX_TTL=6570000 (several years) and accumulate cache over a few months (and after numerous power cycles), will the server operate normally? will i be able to use the cached files for up to a year?

i would appreciate any information or recommendations you could provide.
best regards, ~le berouque

EDIT: added remarks about "semi-portable" and "power cycles"
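For reference, a quick conversion of the TTL value quoted above. This is only a sketch and assumes PROXPI_INDEX_TTL is interpreted in seconds, which the thread itself does not state; the helper names are made up for illustration.

```python
# Assumption: the TTL setting is expressed in seconds.
SECONDS_PER_DAY = 24 * 60 * 60


def seconds_to_days(seconds: int) -> float:
    """Convert a duration in seconds to (fractional) days."""
    return seconds / SECONDS_PER_DAY


def years_to_seconds(years: int, days_per_year: int = 365) -> int:
    """Convert whole years to seconds, ignoring leap days."""
    return years * days_per_year * SECONDS_PER_DAY


print(seconds_to_days(6_570_000))  # roughly 76 days
print(years_to_seconds(5))         # 157680000
```

Under that assumption, 6570000 corresponds to 1825 hours (about 76 days) rather than several years; five 365-day years would be 157680000 seconds.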

@berouques
Author

berouques commented Apr 6, 2024

well...

i cached some packages and rebooted my "field" NAS (with proxpi configured as a service).

Before the reboot:

  • flask process = 430M of RAM
  • proxpi-cache folder = 2.3G of HDD

After reboot:

  • flask process = 20M of RAM
  • proxpi-cache folder = 2.3G of HDD

as part of the experiment, I am going to run the script "pypi_warming_up.ps1" on my win10 PC, which installs the "top 8000 most popular python packages" according to hugovk. pip is configured to query the NAS first and, after a 120-second timeout, to fall back to the pypi repo.

staying tuned!

@EpicWink
Owner

EpicWink commented Apr 8, 2024

There are two caches:

  • The index cache, containing the list of all files for all projects [1] (and the list of all projects)
  • The files cache, containing all files downloaded via proxpi

The index cache has a TTL: once it expires, the cached entry is invalidated on the next access (i.e. a download attempt from a client like pip). The cache has no memory bound, so a sufficiently large TTL can lead to a MemoryError. It is not saved to disk, so it is wiped on server restart.

The files cache has a configurable max disk usage, and should use very little memory. It is also resilient to server restarts.
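The index-cache behaviour described above can be illustrated with a small in-memory sketch. This is not proxpi's code: the class name, the fetch callback and the injectable clock are all hypothetical, chosen only to show expiry on next access and the lack of any size bound or persistence.

```python
import time
import typing as t


class TTLIndexCache:
    """Hypothetical in-memory index cache with time-to-live expiry."""

    def __init__(
        self,
        fetch: t.Callable[[str], t.List[str]],
        ttl: float,
        clock: t.Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch  # called on a miss or after expiry
        self._ttl = ttl
        self._clock = clock
        # Plain dict: unbounded, and lost when the process exits.
        self._entries: t.Dict[str, t.Tuple[float, t.List[str]]] = {}

    def list_files(self, project: str) -> t.List[str]:
        now = self._clock()
        entry = self._entries.get(project)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: serve from memory
        files = self._fetch(project)  # expired or missing: refetch
        self._entries[project] = (now, files)
        return files


# Usage with a fake upstream and a fake clock.
calls: t.List[str] = []


def fake_fetch(project: str) -> t.List[str]:
    calls.append(project)
    return [f"{project}-1.0.tar.gz"]


now = [0.0]
cache = TTLIndexCache(fake_fetch, ttl=60, clock=lambda: now[0])
cache.list_files("numpy")
cache.list_files("numpy")  # served from memory, no second fetch
now[0] = 61.0
cache.list_files("numpy")  # TTL elapsed, fetched again
print(len(calls))  # 2
```

Note that an expired entry is only replaced when it is requested again, so a very large TTL means every project ever requested stays in memory until the process restarts.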

If you need a persistent cache that survives server restarts, proxpi is not what you need; it's designed first and foremost as an optimistic proxy. You could check out some of the alternatives, especially devpi.

Footnotes

  1. aka packages

@berouques
Author

the alternatives don't work for me because they are performance-oriented, and i need data availability; performance doesn't really bother me. when there is no internet access, slow index searches are not the biggest problem. on the other hand, NAS resource consumption does bother me.

please tell me, are there architectural obstacles in your application to saving the index on disk and loading it when needed? i want to know this before making changes to the code.

@EpicWink
Owner

EpicWink commented Apr 8, 2024

are there architectural obstacles in your application to saving the index on disk and loading it when needed?

I'm not sure. If you keep the API of proxpi._cache._IndexCache the same, and replace the _index and _packages dict attributes with some file-based storage, it should work. You will of course need to change the eviction to not just be a time-to-live.

I won't merge any PR that makes this change (unless you create a subclass of _IndexCache and enable it via a configuration flag), as the in-memory cache is necessary for simplicity (and therefore reliability of the code) and performance. I'm happy to help and answer questions if you simply want to make a fork to suit your requirements.
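One way the file-based storage suggested above might look in a fork is a dict-like wrapper around SQLite. This is a minimal sketch, not proxpi code: the class name and table layout are hypothetical, and it assumes the stored values can be serialised as JSON, which the thread does not confirm for _index and _packages.

```python
import collections.abc
import json
import os
import sqlite3
import tempfile
import typing as t


class DiskBackedDict(collections.abc.MutableMapping):
    """Hypothetical dict-like store keeping JSON values in SQLite."""

    def __init__(self, db_path: str):
        self._db = sqlite3.connect(db_path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS entries "
            "(key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )
        self._db.commit()

    def __getitem__(self, key: str) -> t.Any:
        row = self._db.execute(
            "SELECT value FROM entries WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return json.loads(row[0])

    def __setitem__(self, key: str, value: t.Any) -> None:
        self._db.execute(
            "INSERT OR REPLACE INTO entries (key, value) VALUES (?, ?)",
            (key, json.dumps(value)),
        )
        self._db.commit()

    def __delitem__(self, key: str) -> None:
        cursor = self._db.execute("DELETE FROM entries WHERE key = ?", (key,))
        self._db.commit()
        if cursor.rowcount == 0:
            raise KeyError(key)

    def __iter__(self) -> t.Iterator[str]:
        rows = self._db.execute("SELECT key FROM entries").fetchall()
        return iter([row[0] for row in rows])

    def __len__(self) -> int:
        return self._db.execute("SELECT COUNT(*) FROM entries").fetchone()[0]

    def close(self) -> None:
        self._db.close()


# Usage: data written by one instance is visible after reopening the file.
with tempfile.TemporaryDirectory() as directory:
    db_path = os.path.join(directory, "index.sqlite3")
    store = DiskBackedDict(db_path)
    store["numpy"] = ["numpy-1.26.4.tar.gz"]
    store.close()

    reopened = DiskBackedDict(db_path)
    print(reopened["numpy"])  # ['numpy-1.26.4.tar.gz']
    reopened.close()
```

A single sqlite3 connection is tied to the thread that created it by default, so a multi-threaded server would need a connection per thread or explicit locking. Eviction is also left out here; as noted above, it would need to be more than a time-to-live.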

@berouques
Author

Good afternoon. In the server.py file you wrote:

    @app.route("/index/<package_name>/<file_name>")
    def get_file(package_name: str, file_name: str):
        ...
        if scheme and scheme != "file":
            return flask.redirect(path)

However, this does not allow the app to be used on Windows. I rewrote this part like this:

    if scheme in ['http', 'https', 'ftp']:
        return flask.redirect(path)

It now works on Windows as well, but I am not sure whether this could cause any problems.

@EpicWink
Owner

EpicWink commented Apr 15, 2024

if scheme and scheme != "file": is intended to treat every URL as a redirect target and every path as a local file to be served.
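The Windows failure can be reproduced with the standard library alone. The sketch below compares the original check with the allowlist variant from the previous comment; the function names, the sample path and the example.org URL are placeholders, not proxpi code.

```python
import urllib.parse

# Schemes treated as remote in the allowlist variant from the thread.
REMOTE_SCHEMES = frozenset({"http", "https", "ftp"})


def original_check(path: str) -> bool:
    """Redirect if the string has any scheme other than 'file'."""
    scheme = urllib.parse.urlparse(path).scheme
    return bool(scheme and scheme != "file")


def allowlist_check(path: str) -> bool:
    """Redirect only for an explicit set of remote schemes."""
    return urllib.parse.urlparse(path).scheme in REMOTE_SCHEMES


windows_path = r"C:\proxpi-cache\example-1.0.tar.gz"  # placeholder path
url = "https://example.org/example-1.0.tar.gz"  # placeholder URL

# The drive letter is parsed as a URL scheme.
print(urllib.parse.urlparse(windows_path).scheme)  # c

print(original_check(windows_path))   # True: local file wrongly redirected
print(allowlist_check(windows_path))  # False: local file served
print(original_check(url), allowlist_check(url))  # True True
```

With the allowlist, a URL with any scheme outside the set would fall through to the file-serving branch, so the set has to cover every scheme the upstream index can return.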

I think a better solution is to return a different type (e.g. pathlib.Path) rather than requiring the server to always parse a string:

Diff:
diff --git a/src/proxpi/_cache.py b/src/proxpi/_cache.py
index 3e09f51..85049c5 100644
--- a/src/proxpi/_cache.py
+++ b/src/proxpi/_cache.py
@@ -7,6 +7,7 @@ import abc
 import time
 import shutil
 import logging
+import pathlib
 import tempfile
 import warnings
 import functools
@@ -719,13 +720,13 @@ class _FileCache:
                 return True  # default to original URL (due to timeout or HTTP error)
         return False
 
-    def _get_cached(self, url: str) -> t.Union[str, None]:
+    def _get_cached(self, url: str) -> t.Union[pathlib.Path, None]:
         """Get file from cache."""
         if url in self._files:
             file = self._files[url]
             assert isinstance(file, _CachedFile)
             file.n_hits += 1
-            return file.path
+            return pathlib.Path(file.path)
         return None
 
     def _start_downloading(self, url: str):
@@ -751,7 +752,7 @@ class _FileCache:
             os.unlink(file.path)
             existing_size -= file.size
 
-    def get(self, url: str) -> str:
+    def get(self, url: str) -> t.Union[str, pathlib.Path]:
         """Get a file using or updating cache.
 
         Args:
@@ -884,7 +885,7 @@ class Cache:
             raise exc
         return files
 
-    def get_file(self, package_name: str, file_name: str) -> str:
+    def get_file(self, package_name: str, file_name: str) -> t.Union[str, pathlib.Path]:
         """Get a file.
 
         Args:
diff --git a/src/proxpi/server.py b/src/proxpi/server.py
index 1124eca..69c754b 100644
--- a/src/proxpi/server.py
+++ b/src/proxpi/server.py
@@ -4,8 +4,8 @@ import os
 import gzip
 import zlib
 import logging
+import pathlib
 import typing as t
-import urllib.parse
 
 import flask
 import jinja2
@@ -203,8 +203,7 @@ def get_file(package_name: str, file_name: str):
     except _cache.NotFound:
         flask.abort(404)
         raise
-    scheme = urllib.parse.urlparse(path).scheme
-    if scheme and scheme != "file":
+    if not isinstance(path, pathlib.Path):
         return flask.redirect(path)
     return flask.send_file(path, mimetype=_file_mime_type)
 

See #48
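The type-based dispatch in the diff above can be tried in isolation. This is a minimal sketch with a hypothetical helper that only reports which branch would be taken; it does not call Flask, and the sample values are placeholders.

```python
import pathlib
import typing as t


def choose_action(path: t.Union[str, pathlib.Path]) -> str:
    """Mirror the isinstance check from the diff, without Flask."""
    if not isinstance(path, pathlib.Path):
        return "redirect"  # plain strings are treated as URLs
    return "send_file"  # Path objects are treated as cached files


print(choose_action("https://example.org/example-1.0.tar.gz"))  # redirect
print(choose_action(pathlib.Path("C:/proxpi-cache/example-1.0.tar.gz")))  # send_file
```

Because the decision rests on the type rather than on parsing, a drive letter can no longer be mistaken for a scheme. The flip side is that a local path passed as a plain string would be redirected, so the cache has to wrap every local path in pathlib.Path.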
