Model Training on Colab #38

Open
namiyousef opened this issue Apr 5, 2022 · 0 comments
Labels: documentation, good first issue

namiyousef commented Apr 5, 2022

Train Models on Colab

This issue documents how to get started with training, saving, loading, and running inference for any transformer-based model, using the available datasets and/or processors.

Set up authentication to your GitHub

Since we are working with a private repository at the moment, the package cannot be installed with a plain pip install. There are three ways of running our code on Colab:

  • copy and paste all of the relevant code into Colab: this is not recommended because it makes the notebooks very long and unmaintainable. It makes versioning almost impossible, and every change requires copying the code over and refactoring again.
  • zip the package and unzip it on Colab: I've tried this before, and though it works most of the time, zip files can get misplaced or misnamed, which makes it difficult to know why something went wrong.
  • install the package from the private repository: this is very similar to running pip install for any other package, except that we need to authenticate before running it. Within Colab, we run a private pip install that installs the package directly from the develop branch of argument-mining. Any new changes that we make and push can then be picked up by running the private install again.

Since our code should be ready for testing, we are opting for the third method. If you want to develop the code on Colab, please reach out to me in private and I can help with setting that up.

To authenticate, you will need a GitHub access token. Follow these instructions to create an access token. Save the token in a JSON file called github_config.json with the following format:

{
   "username": "namiyousef",
   "access_token": "YOURACCESSTOKEN"
}

Make sure that the username is MY username and NOT yours. This is because we will be installing a repository that is in my name.
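
As a minimal sketch of the private install described above, the snippet below builds the pip target from the contents of github_config.json. The helper name build_install_target is hypothetical and is not part of the package. The repository name argument-mining and the develop branch come from this issue, and the git+https form is pip's standard syntax for installing from a Git repository.

```python
import json


def build_install_target(config, repo="argument-mining", branch="develop"):
    """Return a pip-installable Git URL for a private GitHub repository.

    `config` is the parsed content of github_config.json. The token is
    embedded in the URL, so never print or commit the returned string.
    """
    username = config["username"]    # owner of the repository, not your own account
    token = config["access_token"]   # your personal access token
    return f"git+https://{token}@github.com/{username}/{repo}.git@{branch}"


# Example with the placeholder values from the config format above.
example_config = json.loads(
    '{"username": "namiyousef", "access_token": "YOURACCESSTOKEN"}'
)
target = build_install_target(example_config)
```

In a Colab cell, the install itself could then be run as !pip install {target}, which relies on IPython substituting the Python variable into the shell command.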

Add relevant data to Google Drive

To use data in Colab, it has to be accessible from within the Colab runtime. We can do this by storing it in Google Drive. Make sure that you store any data that you need within your personal Google Drive. In particular, store github_config.json there and ONLY there. You do NOT want it accessible to other people, since anyone who obtains the token will be able to access your account.
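
A minimal sketch of reading the config from Drive inside Colab is shown below. It assumes that github_config.json sits at the top level of your personal Drive and that Drive is mounted at /content/drive, which is the usual Colab mount point. The function names are hypothetical.

```python
import json
from pathlib import Path

# Assumed location: github_config.json at the top level of your personal Drive.
DEFAULT_CONFIG_PATH = Path("/content/drive/MyDrive/github_config.json")


def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive; this only works inside a Colab runtime."""
    from google.colab import drive  # imported lazily: unavailable outside Colab
    drive.mount(mount_point)


def load_github_config(path=DEFAULT_CONFIG_PATH):
    """Read github_config.json and check that both expected keys are present."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    missing = {"username", "access_token"} - set(config)
    if missing:
        raise KeyError(f"github_config.json is missing keys: {sorted(missing)}")
    return config
```

In a notebook you would call mount_drive() once, approve the authorization prompt, and then call load_github_config(). Avoid printing the returned dictionary, since cell outputs are saved with the notebook.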

As for project data, you can store it in your own Drive as mentioned above. Since we already have a shared drive (https://drive.google.com/drive/folders/1XaMWpeoSq04BkVGt16aS9Gk7PBjMtirS), you can also store data there. Just make sure that you don't overwrite anything and that each folder has a readme.txt file explaining what the folder contains, so that we don't get lost.

To access this shared folder programmatically, you will need to add it as a shortcut to your Google Drive. You can do this by right-clicking the shared folder and then clicking 'Add shortcut to Drive'.
[Screenshot: adding the shared folder as a shortcut in Google Drive]

You will now be able to access the shared folder programmatically from within Colab.
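
Once Drive is mounted, the shortcut should appear as a regular folder under your Drive. As a small sketch of the readme.txt convention mentioned earlier, the helper below lists folders that do not yet contain a readme.txt. The function name and the placeholder folder name are hypothetical; substitute whatever name the shortcut has in your Drive.

```python
from pathlib import Path


def folders_missing_readme(root, readme_name="readme.txt"):
    """Return sub-folders of `root` (searched recursively) that lack a readme."""
    root = Path(root)
    missing = []
    for folder in sorted(p for p in root.rglob("*") if p.is_dir()):
        if not (folder / readme_name).is_file():
            missing.append(folder)
    return missing


# Hypothetical usage once Drive is mounted; replace the placeholder with the
# name the shortcut has in your Drive:
#   shared = Path("/content/drive/MyDrive") / "<shared-folder-shortcut-name>"
#   print(folders_missing_readme(shared))
```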

Open the repository and run models!

Now, open Colab in your browser. You will see the default screen for selecting notebooks. Navigate to the GitHub tab and check the 'Include private repos' checkbox. This will prompt you to log in to your GitHub account and authenticate. Then, from the repositories dropdown, find namiyousef/argument-mining and select the develop branch. Once you have done this, find the notebook at the path experiments/yousef/End-to-end_GPU.ipynb and select it.

[Screenshot: selecting the notebook from the GitHub tab in Colab]

In this notebook, configure the paths as appropriate (you will need to modify some other path variables along the way). Once you have done this, you will be able to run the notebook successfully.
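
Before running the whole notebook, it can help to confirm that every configured path actually exists. The sketch below is not taken from the notebook: the helper name, the dictionary keys, and the example locations are all hypothetical and only illustrate the idea.

```python
from pathlib import Path


def check_paths(paths):
    """Return the names of configured paths that do not exist."""
    return [name for name, p in paths.items() if not Path(p).exists()]


# Hypothetical names and locations; adapt them to the variables in the notebook.
paths = {
    "config": "/content/drive/MyDrive/github_config.json",
    "data": "/content/drive/MyDrive/<your-data-folder>",
}
missing = check_paths(paths)
if missing:
    print("These configured paths were not found:", missing)
```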

Note

When you are done using the notebook, you can push it to GitHub, but please use a path other than experiments/yousef/, because pushing there would overwrite my notebook, which I currently have set up to work with my directories. I recommend pushing the notebook to GitHub under experiments/{your_name}/{file_name} so that you can configure it however you want and keep it under version control.

Alternatively, you can save a copy in your personal drive (if you do this, the authentication might fail the next time you try to run it, so try to stick to GitHub wherever possible).

@namiyousef namiyousef added documentation Improvements or additions to documentation good first issue Good for newcomers labels Apr 5, 2022
@namiyousef namiyousef changed the title Train models on Colab Model Training on Colab Apr 19, 2022