Submission for the Webpage Classification Task at IRIXYS'23
This repository contains submissions to the webpage classification task at IRIXYS'23. All submissions are made to TIRA.io, where you can submit your working software to improve reproducibility (you can also upload run files if a software submission causes problems).
TIRA expects software to be submitted as a Docker image. For the hackathon, we have prepared a GitHub Action that automatically builds the Docker image and submits it to TIRA. Please ensure that all code and models are inside the repository, then start the GitHub Action; from there, Maik will take care of the rest (e.g., fixing the build of the image if required). If you have any questions, please do not hesitate to write Maik a message. To simplify software submissions for the hackathon as much as possible, we recommend that you work either in GitHub Codespaces or with dev containers and Docker. GitHub Codespaces is an easy option that lets you start within a few minutes (free tier of 130 compute hours per month).
Get the training/validation data via:
wget 'https://zenodo.org/records/10118828/files/Hackathon_data.zip?download=1' -O hackathon-data.zip
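Once the archive is downloaded, it needs to be unpacked before the baselines can read it. The following is a minimal sketch using only the Python standard library. The archive name `hackathon-data.zip` comes from the command above; the function name `extract_archive` and the target directory `hackathon-data` are hypothetical, and the layout inside the archive is not assumed.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, target_dir):
    """Unpack a zip archive and return the sorted relative paths of all files under target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
    # List every file below the target directory, relative to it.
    return sorted(
        p.relative_to(target).as_posix() for p in target.rglob("*") if p.is_file()
    )


# Example usage (the target directory name is an assumption):
# for name in extract_archive("hackathon-data.zip", "hackathon-data"):
#     print(name)
```

Listing the extracted files first is a quick way to see which training and validation files the archive actually contains before wiring them into a baseline.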
To help you get started, this repository contains the official baselines, which you can build on directly:
- trivial-baseline contains a trivial baseline that always predicts the same class
- sklearn-baseline contains a scikit-learn baseline
- snorkel-baseline contains a baseline with Snorkel in Python
- snorkel-jupyter-baseline contains a baseline with Snorkel in Jupyter notebooks
- Open this repository in GitHub Codespaces
- Please do not forget to commit often
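To illustrate the idea behind a trivial baseline that always predicts the same class, here is a minimal sketch. It is not the code in trivial-baseline: the class name `ConstantClassBaseline`, the choice of the most frequent training label as the constant class, and the label values in the usage example are all assumptions made for illustration.

```python
from collections import Counter


class ConstantClassBaseline:
    """Always predicts one fixed label (assumption: the most frequent training label)."""

    def fit(self, labels):
        # Remember the label that occurs most often in the training data.
        self.label_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, documents):
        # Ignore the document contents and return the same label for each one.
        return [self.label_ for _ in documents]


# Example usage with hypothetical labels:
# model = ConstantClassBaseline().fit(["news", "shop", "news"])
# model.predict(["<html>...</html>", "<html>...</html>"])
```

A baseline like this is useful mainly as a lower bound: any real classifier should beat it on the validation data.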
A dev container (please find suitable installation instructions here) lets you work directly in the prepared Docker container, so you do not have to install the dependencies yourself (which can sometimes be a bit tricky).
To develop with dev containers, please: