Commit

Merge remote-tracking branch 'origin/master'
phueb committed May 23, 2022
2 parents 74352eb + 050e83e commit b540e7c
Showing 5 changed files with 96 additions and 31 deletions.
119 changes: 92 additions & 27 deletions README.md
@@ -32,11 +32,30 @@ Installations:
* numpy==1.17.5
* gensim==4.1.2



## Overview

`ludwig` implements both server-side and client-side logic.
That means that `ludwig` is used both:
1. by the user to submit jobs to workers, and
2. by each worker to watch for job submissions, manage the job queue, and run jobs.

Client-side, a user invokes the `ludwig` command to submit jobs.
On each worker, file-watchers that are part of the `ludwig` package watch for uploaded job instructions, and save instructions to a job queue.

For an illustration, consider the flowchart below.
Notice that each worker has its own (independent) file-watcher and job queue.

<div align="center">
<img src="images/ludwig_flowchart.jpg">
</div>

## Documentation

Information about how the system was set up and how it works behind the scenes can be found at [https://docs.philhuebner.com/ludwig](https://docs.philhuebner.com/ludwig).

## Requirements & Installation
## Getting Started

### Linux or MacOS
Windows is currently not supported due to incompatibility with file names used by Ludwig.
@@ -45,52 +64,98 @@ Windows is currently not supported due to incompatibility with file names used b
Tasks submitted to Ludwig must be programmed in Python 3 (the Python3.7 interpreter is used on each worker).

### Access to the shared drive
See the administrator to provide access to the lab's shared drive. Mount the drive at ```/media/ludwig_data```.
The share is hosted by the lab's file server using ```samba```, and is shared with each node.
Because we do not allow direct shell access to nodes, all data and logs must be saved to the shared drive.

### Installation

In a terminal, type:
See the administrator to get access to the lab's shared drive.
Mount the drive at ```/media/ludwig_data```. On Linux-based systems, type:

```bash
pip3 install git+https://github.com/phueb/Ludwig.git
mount /media/ludwig_data
```

### Project Organization
The shared drive is hosted by the lab's file server using the ```samba``` protocol.
Like the client (i.e. user), each worker has access to the shared drive.
The shared drive is where all job-related data and results are stored and accessed.

### Start a new Project using Ludwig-Template

```ludwig``` requires all Python code be located in a folder inside the root directory of your project.
```ludwig``` requires all Python code be located in a folder inside the root directory of your project.
This folder houses your source code and should have the same name (lower-cased) as your project.
Additionally, inside this folder, create two Python files:
* ```params.py```: contains information about which parameters to use for each job
* ```config.py```: contains basic information like the name of the user's project
* ```job.py```: contains the function `main()` which should execute a single job
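
The layout described above can be checked before submitting. The following is a minimal sketch and not part of `ludwig`: the helper name `check_layout` is hypothetical, and the files to require are passed in explicitly rather than assumed.

```python
from pathlib import Path


def check_layout(project_root, required):
    """Return a list of layout problems (an empty list means none were found)."""
    root = Path(project_root).resolve()
    # The source folder shares the project's name, lower-cased.
    src = root / root.name.lower()
    if not src.is_dir():
        return [f'missing source folder: {src.name}/']
    # Report every required file that is absent from the source folder.
    return [f'missing file: {src.name}/{name}'
            for name in required
            if not (src / name).is_file()]
```

For example, `check_layout('MyProject', ('params.py', 'job.py'))` returns an empty list when `MyProject/myproject/` contains both files.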

The easiest way to recreate the required organization is to log in to GitHub,
navigate to [Ludwig-Template](https://github.com/UIUCLearningLanguageLab/Ludwig-Template), and then click "Use this template" (green button).

Rename the folder `Ludwig-Template` to something like `MyProject`, and rename the source code folder `src` to `myproject`.
It is recommended to install a virtual Python interpreter in your project.
Use Python 3.7 if you can, or, at the very least, write code that is compatible with Python 3.7.

See the `Example` folder for an example of what to put into these files,
or use the template repository [Ludwig-Template](https://github.com/UIUCLearningLanguageLab/Ludwig-Template).

### Installation

Next, inside your Python virtual environment, install `ludwig` from GitHub:

```bash
pip3 install git+https://github.com/phueb/Ludwig.git
```

## Submitting a Job

Once you have installed `ludwig` and set up your project appropriately, use the command-line tool to submit your job.
To submit jobs, go to your project root folder, and invoke the command-line tool that has been installed:
To submit jobs, go to your project root folder, and invoke the command-line tool that is part of `ludwig`:

```bash
ludwig
```

See the section Troubleshooting if errors are encountered.
If it is your first time submitting jobs, consider moving any data related to your job to the shared drive.
For instance, to move data files in the folder `data` to the shared drive, where it can be accessed by workers, do:

```bash
ludwig -e data/
```

To run each job multiple times, use the `-r` flag. For instance, to run each job 6 times, do:

```bash
ludwig -r 6
```

See the section Troubleshooting if errors are encountered.
Consider consulting information about available command line arguments:

```bash
ludwig -h
```

### Check status of workers

To get fast feedback about potential problems with your submitted jobs, try:

```bash
ludwig-status
```

To check the status of a specific Ludwig worker (e.g. hawkins):

```bash
ludwig-status -w hawkins
```


### Viewing output of jobs

By default, the stdout of a submitted job will be redirected to a file located on the shared drive.
After uploading your code, verify that your task is being processed by reading the log file.
If you don't recognize the output in the file, it is likely that the node is currently processing another user's task.
Retry when the node is no longer busy.

To check the status of a Ludwig worker (e.g. hebb):
The log files are available at `/media/ludwig_data/stdout`.
To quickly access a log file, execute:

```bash
ludwig-status -w hebb
tail -f /media/ludwig_data/stdout/hawkins.out
```

If you don't recognize the output in the file, it is likely that the worker is currently processing another user's job.

### Re-submitting

Any time new jobs are submitted, any previously submitted jobs associated with the same project and still running,
@@ -106,7 +171,7 @@ to trigger a prompt asking to copy the worker's hostkey.
For example,

```bash
sftp ludwig@hebb
sftp ludwig@hawkins
```

When asked to save the hostkey, enter `yes` and hit `Enter`.
@@ -136,22 +201,22 @@ The ```-mnt``` flag is used to specify where the shared drive is mounted on the

A user might want to load a dataset from the shared drive.
To do so, the path to the shared drive from the Ludwig worker must be known.
The path is auto-magically added by `Ludwig` and can be accessed via `param2val['project_path`].
The path is auto-magically added by `Ludwig` and can be accessed via `param2val['project_path']`.
For example, loading a corpus from the shared drive might look like the following:

```python
from pathlib import Path

def main(param2val):

    from pathlib import Path

    project_path = Path(param2val['project_path'])
    corpus_path = project_path / 'data' / f'{param2val["corpus_name"]}.txt'
    train_docs, test_docs = load_corpus(corpus_path)
    corpus = load_corpus(corpus_path)
```
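
The snippet above calls `load_corpus` without defining it. A self-contained sketch follows, assuming a hypothetical `load_corpus` that reads one document per line from a plain-text file; the real loader and corpus format are up to each project. `corpus_name` is a project-defined parameter, while `project_path` is the key the text says Ludwig adds.

```python
from pathlib import Path


def load_corpus(corpus_path):
    """Hypothetical loader: treat every non-empty line as one document."""
    with open(corpus_path, encoding='utf-8') as f:
        return [line.strip() for line in f if line.strip()]


def main(param2val):
    # Path added by Ludwig, as seen from the worker.
    project_path = Path(param2val['project_path'])
    # 'corpus_name' is a parameter defined by the project (an assumption here).
    corpus_path = project_path / 'data' / f'{param2val["corpus_name"]}.txt'
    corpus = load_corpus(corpus_path)
    # Returned only so the sketch can be checked; a real job would go on to use it.
    return corpus
```

Building the path from `param2val['project_path']` rather than hard-coding a mount point keeps the same `main()` usable wherever the shared drive is mounted.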

### Saving Job Results
Job results, such as learning curves, or other 1-dimensional performance measures related to neural networks for example,
should be returned by job.main() as a list of pandas DataFrame objects.
should be returned by job.main() as a list of pandas Series objects.
These will be automatically saved to the shared drive after a job has completed.
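
As an illustration of the return value described above, here is a minimal sketch. The parameter `num_steps` and the synthetic loss values are hypothetical; only the shape of the return value (a list of pandas Series) comes from the text.

```python
import pandas as pd


def main(param2val):
    # 'num_steps' is a hypothetical parameter used only for this illustration.
    num_steps = param2val['num_steps']
    # Stand-in for a real training loop: a synthetic, decreasing loss curve.
    losses = [1.0 / (step + 1) for step in range(num_steps)]
    curve = pd.Series(losses, index=range(num_steps), name='loss')
    curve.index.name = 'step'
    # One Series per 1-dimensional performance measure.
    return [curve]
```

Each additional measure (for example, an accuracy curve) would be appended to the returned list as its own Series.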

Alternatively, if the data is too big to be held in memory, it is recommended to write the data to disk,
Binary file added images/ludwig_flowchart.jpg
2 changes: 1 addition & 1 deletion ludwig/__init__.py
@@ -1,6 +1,6 @@


__version__ = '4.1.6'
__version__ = '4.0.7'


def print_ludwig(s):
4 changes: 2 additions & 2 deletions ludwig/results.py
@@ -98,9 +98,9 @@ def gen_param_paths(project_name: str,
for k in param2requests:
for v in param2requests[k]:
if loaded_param2val[k] != v:
print(f'For key "{k}", {v} does not match {loaded_param2val[k]}')
print_ludwig(f'For key "{k}", {v} does not match {loaded_param2val[k]}')

if num_requested != num_found and require_all_found:
raise SystemExit(f'Found {num_found} but requested {num_requested}')
raise SystemExit(f'Ludwig: Found {num_found} but requested {num_requested}')


2 changes: 1 addition & 1 deletion setup.py
@@ -16,7 +16,7 @@
'PyNaCl==1.4.0',
'cryptography==3.4.6',
'watchdog==2.0.1',
'paramiko==2.6.0',
'paramiko==2.10.1',
'numpy',
'pandas',
'cached_property',