Train Models on Colab
This issue documents how to get started with training, saving, loading, and running inference for any transformer-based model, using the available datasets and/or processors.
Set up authentication to your GitHub
Since we are working with a private repository at the moment, the package cannot simply be installed in the usual way. There are 3 ways of running our code on Colab:
1. Copy and paste all of the relevant code into Colab: this is not recommended because it makes the notebooks very long and unmaintainable. It makes versioning almost impossible, and every change requires a full refactoring again.
2. Zip the package and unzip it on Colab: I've tried this before, and though it works most of the time, it can get confusing when zip files are misplaced, misnamed, etc., making it difficult to know why something went wrong.
3. Install the package as a private repository: this is very similar to running pip install for any other package, except that we need to authenticate before running it. This means that within Colab, we'd be running a private pip install to install the package directly from the develop branch of argument-mining. Thus, any new changes that we make and push can be loaded automatically by running the private install again.
Since our code should be ready for testing, we are opting for the third method. If you want to develop the code on Colab, please reach out to me in private and I can help with setting that up.
Now, in order to authenticate, you will need a GitHub access token. Follow these instructions to create an access token. Save this access token in a .json file called github_config.json that has the following format:
Make sure that the username is MY username and NOT yours. This is because we will be installing a repository that is in my name.
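As a rough sketch of how such a config could be used for the private install, the snippet below reads the JSON file and assembles a pip-installable URL. The key names `username` and `access_token` and the helper `build_private_install_url` are hypothetical; the real key names in github_config.json may differ, so adjust them to match your file.

```python
import json
from pathlib import Path


def build_private_install_url(config_path, repo="argument-mining", branch="develop"):
    """Build a pip-installable URL for a private GitHub repository.

    Assumption: the JSON config has the (hypothetical) keys
    "username" (the repository owner) and "access_token".
    """
    config = json.loads(Path(config_path).read_text())
    username = config["username"]
    token = config["access_token"]
    # pip accepts VCS URLs of the form
    # git+https://<token>@github.com/<owner>/<repo>.git@<branch>
    return f"git+https://{token}@github.com/{username}/{repo}.git@{branch}"


# Possible usage (not executed here; the config path is up to you):
#   import subprocess, sys
#   url = build_private_install_url("path/to/github_config.json")
#   subprocess.run([sys.executable, "-m", "pip", "install", url], check=True)
```

Avoid printing the resulting URL in a notebook that you share, since it embeds the access token.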
Add relevant data to Google Drive
Now, in order to use data, it needs to be accessible from within Colab. We can do this by storing things in Google Drive. Make sure that you store any data that you need within your personal Google Drive. In particular, store the github_config.json there and ONLY there. You do NOT want it accessible to other people, since anyone who can read it will be able to access your account.
As for project data, I mentioned above that you can store it in your own drive. This is OK, but since we already have a shared drive (https://drive.google.com/drive/folders/1XaMWpeoSq04BkVGt16aS9Gk7PBjMtirS) you can also store data there. Just make sure that you don't overwrite anything and that each folder has a readme.txt file explaining what is in the folder, so that we don't get lost.
In order to access this shared folder programmatically, you will need to add it as a shortcut to your Google Drive. You can do this by right-clicking the shared folder and then clicking 'Add shortcut to Drive'.
You will now be able to access the shared folder programmatically from within Colab.
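In Colab, Drive is usually mounted with `google.colab.drive.mount("/content/drive")`, after which the contents of My Drive, including shortcuts, typically appear under `/content/drive/MyDrive`. As a minimal sketch, the hypothetical helper below (not part of the repository) lists folders that are still missing the readme.txt file mentioned above; the shortcut name `shared_folder` in the usage comment is also a placeholder.

```python
from pathlib import Path


def folders_missing_readme(root, readme_name="readme.txt"):
    """Return all sub-folders of `root` (searched recursively) without a readme file."""
    root = Path(root)
    missing = []
    # Visit every directory below root in a stable, sorted order.
    for folder in sorted(p for p in root.rglob("*") if p.is_dir()):
        if not (folder / readme_name).is_file():
            missing.append(folder)
    return missing


# Possible usage in Colab (not executed here; the folder name is a placeholder):
#   from google.colab import drive
#   drive.mount("/content/drive")
#   print(folders_missing_readme("/content/drive/MyDrive/shared_folder"))
```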
Open the repository and run models!
Now, open Colab in your browser. You will be presented with a default screen for selecting notebooks. Navigate to the GitHub tab and check the 'include private repos' checkbox. This will prompt you to log in to your GitHub account and authenticate. Then, from the repositories dropdown, find namiyousef/argument-mining and select the develop branch. Once you have done this, find the notebook at the path experiments/yousef/End-to-end_GPU.ipynb and select it.
In this notebook, configure the paths as appropriate (you will need to modify some other path variables along the way). Once you have done this, you will be able to run the notebook successfully.
Note
When you are done using the notebook, you can push it to GitHub, but please use a path other than experiments/yousef/, because pushing there will change my copy of the notebook, which I currently have set up to work with my directories. I recommend pushing the notebook to GitHub under experiments/{your_name}/{file_name} so that you can configure it however you want and also have versioning on it.
Alternatively, you can save a copy in your personal drive. If you do this, the authentication might fail the next time you try to run it, so try to stick to GitHub wherever possible.