Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean makefile, pyproject.toml and CI #229

Merged
merged 9 commits into from
Jan 17, 2025

Conversation

Wauplin
Copy link
Contributor

@Wauplin Wauplin commented Jan 16, 2025

Hey there, I've taken a look at the repo structure and made a few changes to the repo settings. I do think these changes are important for a fresh start. I've listed below all the changes I've made. I'm happy to revisit if some of them are not relevant for this repo.

  1. in pyproject.toml:
    1. I divided the optional dependencies into more categories (audio, torch, litellm, quality, test, dev). The pre-existing "test" and "dev" groups are still the same. I do think this is beneficial on the long run. For instance, litellm shouldn't be considered as a "dev-only" dependency since end users will want to use it as well. We should suggest them to install with pip install smolagents[litellm] instead of pip install litellm so that we control which version should be installed. In the case here it's >=1.55.10". I haven't really completed this part, only started the process to have different groups for different purposes.
    2. in dev dependencies, I removed sqlalchemy, torchaudio and torchvision. For what I can see with a ctrl+F on the repo, these dependencies are never used in the codebase. I'll still check if the tests pass in the CI. If it turns out we need them (as a dependency in transformers for instance) then I'll add them back.
    3. I've added default pytest options so that all tests (in CI and locally on each user's machine) will run with the same options. I kept -sv as before and added --durations=0. By experience, tracking durations in CI proved useful in huggingface_hub and doesn't add any complexity.
    4. I've added a rule so that module-import-not-at-top-of-file (E402) error is ignored in the examples/ folder. Not having all imports at the start of the files make sense in examples.
  2. in Makefile:
    1. I explicitly named examples src tests utils as the folders to run code formatters on. Previously it was "." which means that it would fail if a user has a non-committed local script that they use for testing.
    2. I removed the extra_quality_checks command (doesn't seem to refer to any known script)
    3. I removed the doc-builder style checks (doesn't seem to have any impact + wasn't checked in CI)
    4. I removed test_big_modeling, test_core, test_cli, test_examples, test_prod, test_rest => these seems to be legacy test (from transformers?). I only kept a basic make test that runs the tests. Not the most useful except if some contributors are used to it.
  3. in README.md: added some basic commands in the "contributing" section to install deps + run linter/formatter
  4. in .github/workflows/quality.yml, I made sure to run exactly the same commands as in make quality
  5. in .github/workflows/tests.yml:
    1. I removed the -sv args from the pytest command since it's not configured in pyproject.toml
    2. I have added uv run pytest ./tests/test_all_docs.py (didn't know if it was missing on purpose). EDIT: test seems to fail 😕
  6. I have added a script utils/check_tests_in_ci.py that checks all test file under ./tests are accurately called in the CI. This is to prevent forgetting about it when adding a test file in the future. This script is automatically run on make quality and in the quality.yml CI.

I think that's mostly it. Other changes comes from the linter. Almost nothing has changed in term of logic / dependencies, just a cleaning of some parts. I hope not to be overstepping here, just want give a hand 🤗

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Check that all tests are called in CI."""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: another way to be sure of that would simply be to run pytest tests in the CI instead of calling each one individually. It doesn't look as good in the github report though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another advantage of running pytest tests altogether is that if test B fails, then test C will still be triggered. It's not the case when calling tests iteratively.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also agree: why not just calling pytest tests?

@aymeric-roucher
Copy link
Collaborator

Thanks a lot @Wauplin for this great work! 😃

Regarding 1.ii: sqlalchemy is used in this example: so it should at least be in the tests.

Regarding your points 5.ii and 6, I've set ./tests/test_all_docs.py to test all the documentation by dynamically making a script from all the code blobs in each documentation file then running it. It's very slow and requires API keys, thus I preferred not to run it in CI. Maybe it could be run only if flag RUN_SLOW_TESTS is passed

On all your other points, I agree, and it will be a great addition! You said you still planned to do more the dependency groups in point 1, right? So maybe let's wait for that before merging.

@Wauplin
Copy link
Contributor Author

Wauplin commented Jan 16, 2025

I think it's best to merge this PR (once sqlalchemy is added back + removed the slow test) and open another one for the other dependency groups -typically making gradio optional-. This PR is already big enough and is more focused on cleaning/removing code. I'll address your comments tomorrow!

@Wauplin
Copy link
Contributor Author

Wauplin commented Jan 17, 2025

@aymeric-roucher PR is ready for review!

I've added back sqlalchemy in dev group. I also removed ./tests/test_all_docs.py from the CI for now since it requires more configuration (I still believe it could be useful to have it for instance in a daily CI but that's for another PR).

And I've also added conditions in the test CI so that all tests are run even if one step has failed. See example here: https://github.com/huggingface/smolagents/actions/runs/12826181114/job/35765597633?pr=229. I do believe that this worflow file could be factorized a bit but out of scope of this PR I believe :)

Copy link
Member

@albertvillanova albertvillanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really good improvements. Thanks.

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Check that all tests are called in CI."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also agree: why not just calling pytest tests?

Comment on lines +30 to 39
audio = [
"soundfile",
]
torch = [
"torch",
"torchaudio",
"torchvision",
"sqlalchemy",
"accelerate",
"soundfile",
]
litellm = [
"litellm>=1.55.10",
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth mentioning the new extras (audio, torch, litellm) in the README?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather do that in a follow-up PR. At the moment dev and test extras are not documented either. I think it's best to settle these groups (including the openai one from #236) and then open a PR only for the documentation part. About docs, I think it'd be beneficial to write a proper Installation page with all options similar to https://huggingface.co/docs/huggingface_hub/installation (also documenting editable install + install from source). And in the README, mention only the main ones + link to the docs. Otherwise the README could become a bit bloated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@albertvillanova albertvillanova merged commit 1f8fd72 into main Jan 17, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants