Start of executable docs #777

Open: ianhi wants to merge 38 commits into main from ian/docs/exec-docs

@ianhi (Contributor) commented Feb 24, 2025

Making the docs executable lets us pick up errors automatically. Currently I'm doing this with markdown-exec, because that was the only way to integrate with the existing markdown docs while preserving the tabbing used on several pages. The downside is that it is currently a bit verbose: options have to be added to the start of each executable code block. However, I think this is fixable through some issues I've opened upstream (pawamoy/markdown-exec#77, pawamoy/markdown-exec#76 (comment)). There are two alternatives:

  1. mkdocs-jupyter would allow writing documentation in Jupyter notebooks. However, we would lose some of the pymdownx extensions (e.g. tabbing).

  2. Switching the docs build to Sphinx and using MyST would allow writing everything in Jupyter while preserving tab behavior. However, this would require swapping the docs framework.

Have doc build fail on unexpected error:

markdown-exec currently only warns on an unexpected error. We can make the docs build fail by passing --strict to mkdocs build (pawamoy/markdown-exec#75).
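As a rough sketch of wiring the strict build into CI, the snippet below wraps it in a small Python helper. The helper names `strict_build_command` and `build_docs`, and the `docs` directory default, are assumptions made for illustration; only the `--strict` flag of `mkdocs build` comes from the discussion here.

```python
import subprocess
import sys


def strict_build_command(strict: bool = True) -> list[str]:
    """Return the argv for an MkDocs build, optionally in strict mode."""
    command = [sys.executable, "-m", "mkdocs", "build"]
    if strict:
        # --strict makes MkDocs abort the build on warnings
        command.append("--strict")
    return command


def build_docs(docs_dir: str = "docs") -> int:
    """Run the build in docs_dir (hypothetical default) and return the exit code."""
    return subprocess.run(strict_build_command(), cwd=docs_dir).returncode
```

A CI job could call `build_docs()` and fail when the returned exit code is non-zero.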

Alternatively, mkdocs-jupyter provides this already, or we can achieve the same with MyST/Jupyter Book.

@ianhi force-pushed the ian/docs/exec-docs branch from ff2f51e to 08341c9 on February 24, 2025 21:02
@ianhi (Contributor Author) commented Feb 24, 2025

Ahh, annoyingly this requires Read the Docs to know how to install the dev version of icechunk; otherwise it won't be able to execute the docs properly. It may take some figuring out to make that happen on Read the Docs with poetry etc.

@@ -14,7 +15,11 @@ build:
       - poetry config virtualenvs.create false
     post_install:
       # Install deps and build using poetry
-      - . "$READTHEDOCS_VIRTUALENV_PATH/bin/activate" && cd docs && poetry install
+      - . "$READTHEDOCS_VIRTUALENV_PATH/bin/activate" && cd docs && poetry install && cd ../icechunk-python && maturin develop && cd ../docs
@dcherian (Contributor) commented Feb 25, 2025

We can also install directly from GitHub with pip if we know the commit ID. Example:

pip install git+https://github.com/earth-mover/icechunk.git@COMMIT#subdirectory=icechunk-python

This will require maturin in the env, which we seem to have.
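A small helper can keep that requirement string in one place. This is only a sketch: `icechunk_requirement` is a hypothetical name, and the commit passed in is a placeholder, not a real commit from this repository.

```python
def icechunk_requirement(commit: str) -> str:
    """Build a pip requirement that installs icechunk-python from a given commit."""
    if not commit:
        raise ValueError("a commit ID is required")
    return (
        "git+https://github.com/earth-mover/icechunk.git"
        f"@{commit}#subdirectory=icechunk-python"
    )


# Placeholder commit ID, for illustration only
print(icechunk_requirement("COMMIT"))
```

The resulting string can be passed to `pip install` as a single argument.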

@ianhi (Contributor Author) commented Mar 1, 2025

With the current strategy we need an upstream change before this works. I put in a PR: pawamoy/markdown-exec#80

Otherwise I can't finish the version control page, which has code blocks that intentionally raise exceptions.
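One possible workaround in the meantime (an assumption, not what this PR does) is to catch the expected exception inside the executed block and print it, so the build sees a clean run while the reader still sees the error. `ConflictError`, `commit_on_stale_branch`, and `show_expected_error` below are hypothetical stand-ins, not icechunk or markdown-exec APIs.

```python
class ConflictError(Exception):
    """Hypothetical stand-in for an error the docs want to demonstrate."""


def commit_on_stale_branch() -> None:
    # Stand-in for a documented call that is expected to fail
    raise ConflictError("branch tip has moved")


def show_expected_error(func, expected: type[Exception]) -> str:
    """Call func and return the expected error as text; fail if nothing is raised."""
    try:
        func()
    except expected as err:
        return f"{type(err).__name__}: {err}"
    raise AssertionError(f"{expected.__name__} was not raised")


print(show_expected_error(commit_on_stale_branch, ConflictError))
```

The trade-off is that the page no longer shows a real traceback, which is why a fix in the tool itself would still be preferable.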

@dcherian (Contributor) commented Mar 1, 2025

> can't finish the version control page, which has code blocks that intentionally raise exceptions.

Can we skip this one for now and continue with the rest?

@ianhi force-pushed the ian/docs/exec-docs branch from 26d4c49 to ea18b8b on March 1, 2025 00:53
@ianhi (Contributor Author) commented Mar 1, 2025

> Can we skip this one for now and continue with the rest?

sure

@ianhi force-pushed the ian/docs/exec-docs branch from ea18b8b to 4c6966a on March 1, 2025 00:57
# Assuming you have a valid writable Session named icechunk_session
dataset = xr.tutorial.open_dataset("rasm", chunks={"time": 1}).isel(time=slice(24))
with icechunk_session.allow_pickling():
Contributor

this should only be needed for the read down below.

Comment on lines 77 to 78
for snapshot in repo.ancestry(branch="main"):
    print(snapshot)
Contributor

Suggested change
-for snapshot in repo.ancestry(branch="main"):
-    print(snapshot)
+print(list(repo.ancestry(branch="main")))

perhaps?

Contributor Author

Unfortunately that renders poorly, as one really long line:

[image]
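A possible middle ground, sketched here with stand-in data, is to avoid the explicit loop but still print one snapshot per line. `fake_ancestry` and its snapshot strings are hypothetical placeholders for `repo.ancestry(branch="main")`; the only assumption is that the real call returns an iterable.

```python
from collections.abc import Iterator


def fake_ancestry() -> Iterator[str]:
    """Hypothetical stand-in for repo.ancestry(branch="main")."""
    yield from ["snapshot-3", "snapshot-2", "snapshot-1"]


def format_ancestry(snapshots) -> str:
    """Render each snapshot on its own line instead of one long list repr."""
    return "\n".join(str(snapshot) for snapshot in snapshots)


print(format_ancestry(fake_ancestry()))
```

Whether this reads better than the plain `for` loop in rendered docs is a judgment call; the loop is arguably clearer for newcomers.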


-```mermaid
+```python exec="on" result="mermaid" session="version"
+main_commits = [s.id[:6] for s in list(repo.ancestry(branch='main'))]
Contributor

We've wanted a method to do this for quite a long time!

Contributor Author

You mean creating the Mermaid diagram from Python? It would be amazing if it could just auto-generate the tree for you in a notebook. We would just have to bundle mermaid.js.

Comment on lines -13 to -20
-First let's start a distributed Client and create an IcechunkStore.
-
-```python
-# initialize a distributed Client
-from distributed import Client
-
-client = Client()
-
Contributor Author

The notable change here is that the first example doesn't use a client, because it wasn't working without allow_pickling, so I only used the client after that was described.


icechunk.xarray.to_icechunk(dataset, session)
# `to_icechunk` takes care of "allow_pickling" for you
icechunk.xarray.to_icechunk(dataset, icechunk_session, mode="w")
Contributor Author

had to add the mode to avoid an error.

-```python
-session = repo.readonly_session(snapshot_id="BSHY7B1AGAPWQC14Q18G")
+```python exec="on" session="version" source="material-block" result="code"
+session = repo.readonly_session(snapshot_id=list(repo.ancestry(branch="main"))[1].id)
Contributor Author

Is there a nicer way to get the second-to-last commit, equivalent to HEAD~1?

Contributor

next(itertools.islice(repo.ancestry(branch="main"), 1, None))?

Comment on lines +171 to +172
+  - toc:
+      permalink: "#"
Contributor Author

Also added permalinks that show up when you hover. This is unrelated to the executable-docs changes.

@@ -59,9 +59,9 @@ def write_timestamp(*, itime: int, session: Session) -> None:

Now execute the writes.

-```python exec="on" session="parallel" source="material-block" result="code"
+<!-- ```python exec="on" session="parallel" source="material-block" result="code" -->
+```python
Contributor Author

This example runs fine for me locally, but ends up not writing anything to the store when running on Read the Docs.
