Rewrite training and inference code to more modern pytorch, add some functionalities and models #20

annaproxy · 2021-07-14T23:27:14Z

Added a PyTorch Dataset and DataLoader instead of the manual random sampler
Use torch.optim instead of manual parameter updates
Dataset now uses a beginning-of-sequence (BOS) tag as well as an end-of-sequence (EOS) tag
Added nn.RNN instead of the manual RNN
Added embeddings (currently still one-hot, but could be made trainable) instead of manual one-hot encoding
Added a comprehensive Dutch dictionary for pretraining
Split the notebook into two notebooks: one for training, one for inference
CUDA can now be used for training and inference
Added a pre-trained Dutch model
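
For readers who want a picture of what the changes above amount to, here is a minimal sketch. It is not the code from this PR: the names WordDataset, CharRNN and collate, the three example words, the hidden size and the learning rate are all made up for illustration. It only shows how a Dataset/DataLoader, BOS/EOS tags, a frozen one-hot nn.Embedding, nn.RNN, torch.optim and optional CUDA can fit together.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, Dataset

BOS, EOS, PAD = "<BOS>", "<EOS>", "<PAD>"


class WordDataset(Dataset):
    """Hypothetical character-level dataset; every word is wrapped in BOS/EOS."""

    def __init__(self, words):
        chars = sorted({c for w in words for c in w})
        self.itos = [PAD, BOS, EOS] + chars  # PAD gets index 0
        self.stoi = {s: i for i, s in enumerate(self.itos)}
        self.words = list(words)

    def __len__(self):
        return len(self.words)

    def __getitem__(self, idx):
        ids = [self.stoi[BOS]] + [self.stoi[c] for c in self.words[idx]] + [self.stoi[EOS]]
        ids = torch.tensor(ids, dtype=torch.long)
        return ids[:-1], ids[1:]  # input and next-character target


def collate(batch):
    """Pad variable-length words in a batch to one length (PAD index 0)."""
    xs, ys = zip(*batch)
    pad = nn.utils.rnn.pad_sequence
    return (pad(list(xs), batch_first=True, padding_value=0),
            pad(list(ys), batch_first=True, padding_value=0))


class CharRNN(nn.Module):
    def __init__(self, vocab_size, hidden_size=64):
        super().__init__()
        # Identity weights make this a one-hot lookup; freeze=False would make it trainable.
        self.embed = nn.Embedding.from_pretrained(torch.eye(vocab_size), freeze=True)
        self.rnn = nn.RNN(vocab_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, hidden=None):
        output, hidden = self.rnn(self.embed(x), hidden)
        return self.out(output), hidden


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dataset = WordDataset(["fiets", "gezellig", "straat"])  # toy data, not the Dutch dictionary
loader = DataLoader(dataset, batch_size=2, shuffle=True, collate_fn=collate)
model = CharRNN(len(dataset.itos)).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
criterion = nn.CrossEntropyLoss(ignore_index=0)  # skip padded positions

losses = []
for epoch in range(5):
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        logits, _ = model(x)
        # CrossEntropyLoss expects (N, C) logits and (N,) targets.
        loss = criterion(logits.reshape(-1, logits.size(-1)), y.reshape(-1))
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
```

The loss is computed at every time step, so each character position supervises the prediction of the next character. Swapping nn.Embedding.from_pretrained for a plain nn.Embedding (and changing the RNN input size to the embedding dimension) would be one way to make the embeddings trainable.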

app/api/slang.py

app/ml_models/rnn/data_tools.py

app/ml_models/rnn/train.py

raoulg · 2021-07-17T11:42:55Z

Added some formatting changes in a review branch.

When running flask run and generating a word, I get an error on this branch. Not sure if that is some db update stuff (tried running flask db update) or something in the code.

annaproxy · 2021-07-18T13:14:46Z

Anything else? If no, can I merge? uwu

Sasafrass · 2021-07-18T14:41:49Z

> When running flask run and generating a word, I get an error on this branch. Not sure if that is some db update stuff (tried running flask db update) or something in the code.

Issue is addressed in latest hotfix.

Sasafrass · 2021-07-18T14:42:04Z

> Anything else? If no, can I merge? uwu

Nope, everything LGTM now! Happy merging!

Sasafrass

LGTM

annaproxy added 24 commits July 12, 2021 02:42

Add simple dataset object for word-level files

8d52b51

Add dutch dictionary for pre-training

6116957

Add non-manual RNN using pytorch nn.RNN

1c3900f

Add vocabulary file for dutch dictionary (handy as a standard for future experiments)

8120faf

Rewrite generate_word to deal with BOS items, non one-hot and other stuff

811b0c9

A simple generating notebook to deal with the previous pretrained model (not the newer, 'cooler' one)

91a38a3

Minor improvements in word generator. (But there may still be a bug)

fca5d65

Add new training loop using pytorch dataloader and CE

e6cac51

Add pretrained NL model for who wants it

50296c8

Lint data tools

6db972f

Replace trainable embedding with one-hot for now. Add dropout.

0bf66b4

Add two notebooks, one for training, one for generating.

603623d

All code is used from modules

Merge branch 'master' into anna-clean-notebook

8cdec33

Add functionality to load Anna's Model

7823e86

Add Anna's failed model

90d97eb

Only choose random letters. Return neatly formatted word no EOS tag

9c69d7a

Remove debug print

e94f43b

app can now use Anna's model

0586cc7

Put back supervision at every step

42aaa99

Add anna pretrained models

812b789

Delete old models

73e7574

Add generation notebook with less converged model

d22c0fc

Add training notebook with currently Dutch model generations (fun words)

f221ef5

Bookkeeping / cleaning in several files

75e7ebf

annaproxy added the enhancement label Jul 14, 2021

annaproxy requested a review from Sasafrass July 14, 2021 23:46

Sasafrass reviewed Jul 16, 2021

annaproxy and others added 3 commits July 16, 2021 18:47

Update app/ml_models/rnn/data_tools.py

d5b12b8

Co-authored-by: Sasafrass <36883067+Sasafrass@users.noreply.github.com>

Change to absolute import statements.

bb1829b

Merge branch 'anna-clean-notebook' of github.com:Sasafrass/straattaal into anna-clean-notebook

ceaa907

annaproxy and others added 6 commits July 16, 2021 18:54

Improve docstring of load_model

379ede3

Co-authored-by: Sasafrass <36883067+Sasafrass@users.noreply.github.com>

Albert docstring for RNNANNA

33e4120

Co-authored-by: Sasafrass <36883067+Sasafrass@users.noreply.github.com>

Add docstring for next_char

968dffe

Lint data_tools, uncomment <BOS> feeding

c70bd99

Fix Albert docstring. Remove "hi" example

8fb5ddf

Improve names and docstrings in rnn and train loop

524e716

This was referenced Jul 17, 2021

Un-hardcode hardcoded hidden size when loading RNN #31

Closed

necessary to pass dataset to generate text, could be done with a 'vocabulary' object or similar #32

Closed
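
As an illustration of the 'vocabulary' object idea from #32, here is a minimal sketch, assuming the vocabulary only needs to map characters to indices and back. The class name Vocabulary, its methods and the JSON storage format are hypothetical and are not taken from the repository.

```python
import json
from dataclasses import dataclass, field

SPECIALS = ["<PAD>", "<BOS>", "<EOS>"]


@dataclass
class Vocabulary:
    """Hypothetical stand-alone vocabulary, so generation does not need the dataset."""

    itos: list = field(default_factory=lambda: list(SPECIALS))

    def __post_init__(self):
        # Reverse lookup from symbol to index.
        self.stoi = {s: i for i, s in enumerate(self.itos)}

    @classmethod
    def from_words(cls, words):
        chars = sorted({c for w in words for c in w})
        return cls(SPECIALS + chars)

    def encode(self, word):
        """Wrap a word in BOS/EOS and convert it to indices."""
        return [self.stoi["<BOS>"]] + [self.stoi[c] for c in word] + [self.stoi["<EOS>"]]

    def decode(self, ids):
        """Convert indices back to a string, dropping the special tags."""
        return "".join(self.itos[i] for i in ids if self.itos[i] not in SPECIALS)

    def to_json(self):
        return json.dumps(self.itos)

    @classmethod
    def from_json(cls, text):
        return cls(json.loads(text))


vocab = Vocabulary.from_words(["fiets", "straat"])
restored = Vocabulary.from_json(vocab.to_json())
word = restored.decode(vocab.encode("fiets"))
```

Saving such an object next to the model weights would let the inference code decode generated indices without loading the training data.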

annaproxy and others added 4 commits July 17, 2021 08:05

Docstring for train loop

e09ed39

Co-authored-by: Sasafrass <36883067+Sasafrass@users.noreply.github.com>

formatted with black

d1aec68

black formatting

d5ffcdb

added r prefix before escaped string

e7af65a

raoulg previously approved these changes Jul 17, 2021

annaproxy and others added 2 commits July 17, 2021 15:20

Merge pull request #33 from Sasafrass/anna-clean-notebook-review

40f2d61

Anna clean notebook review

Improve docstring for convert_to_string, but this won't matter once vocab object is there

a4710b0

Temp hotfix, rename "rnn" back to "lstm" for legacy model loading

3c0b9cb

Sasafrass approved these changes Jul 18, 2021

annaproxy merged commit fe46d87 into master Jul 18, 2021

Sasafrass deleted the anna-clean-notebook branch July 18, 2021 15:06
