start of work for aws prices and region filtering (#6)
* start of work for aws prices and region filtering
* automation docker build
* addition of "Regions" attribute to AWS instance data, allowing for --region to be a filter string (Google already had it)
* addition of sorting based on field, and ascending order of results
* a parameter in settings to ask to use the cache only (set now to false, will be true when we can provide remote cache)
* filtering by regions for both instances AWS/Google
* moved original design notes into separate doc in docs (preparing for prettier docs at some point)
Signed-off-by: vsoch <[email protected]>
vsoch authored Dec 7, 2022
1 parent 5a52791 commit f28af61
Showing 17 changed files with 385 additions and 108 deletions.
1 change: 1 addition & 0 deletions .dockerignore
48 changes: 48 additions & 0 deletions .github/workflows/build-deploy.yaml
@@ -0,0 +1,48 @@
name: build cloud-select

on:

  # Publish packages on release
  release:
    types: [published]

  pull_request: []

  # On push to main we build and deploy images
  push:
    branches:
      - main

jobs:
  build:
    permissions:
      packages: write

    runs-on: ubuntu-latest
    name: Build
    steps:
      - name: Checkout
        uses: actions/checkout@v3

      - name: Build Container
        run: docker build -t ghcr.io/converged-computing/cloud-select:latest .

      - name: GHCR Login
        if: (github.event_name != 'pull_request')
        uses: docker/login-action@v2
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Tag and Push Release Image
        if: (github.event_name == 'release')
        run: |
          tag=${GITHUB_REF#refs/tags/}
          echo "Tagging and releasing ghcr.io/converged-computing/cloud-select:${tag}"
          docker tag ghcr.io/converged-computing/cloud-select:latest ghcr.io/converged-computing/cloud-select:${tag}
          docker push ghcr.io/converged-computing/cloud-select:${tag}
      - name: Deploy
        if: (github.event_name != 'pull_request')
        run: docker push ghcr.io/converged-computing/cloud-select:latest
6 changes: 0 additions & 6 deletions .gitignore
@@ -7,13 +7,7 @@ env
build
docs/_build
release
_site
dist/
OLD
__pycache__
*.simg
*.sif
*.img
/.eggs
/modules
/views
16 changes: 16 additions & 0 deletions Dockerfile
@@ -0,0 +1,16 @@
FROM ubuntu

# docker build -t cloud-select .

LABEL MAINTAINER @vsoch
ENV PATH /opt/conda/bin:${PATH}
ENV LANG C.UTF-8
RUN apt-get update && \
apt-get install -y wget && \
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
bash Miniconda3-latest-Linux-x86_64.sh -b -p /opt/conda && \
rm Miniconda3-latest-Linux-x86_64.sh

WORKDIR /code
COPY . /code
RUN pip install -e .[all] && pip install ipython
122 changes: 86 additions & 36 deletions README.md
@@ -112,62 +112,112 @@ Ask for a specific cloud on the command line (note you can also ask via your set
$ cloud-select --cloud google instance --cpus-min 200 --cpus-max 400
```

## Design
#### Sorting

We can follow the design of the [aws selector tool](https://github.com/aws/amazon-ec2-instance-selector).
By default we sort results based on the order the solver produces them.
However, you can ask to sort your results by an attribute, e.g., here is memory:

## Details
```bash
$ cloud-select --sort-by memory instance
```

By default, when sorting on an attribute we sort in descending order, so the largest values
are at the top. You can reverse that with `--asc` for ascending, meaning we sort
from least to greatest:

```bash
$ cloud-select --asc --sort-by memory instance
```

#### Max Results

You can always change the max results (which defaults to 25):

```bash
$ cloud-select --max-results 100 instance
```

We currently sort from greatest to least. Set max results to 0 for no limit:

```bash
$ cloud-select --max-results 0 instance
```

Note that this argument comes before the instance command.

It is non-trivial to find the correct instances, or more generally, do cost comparison across clouds. A tool that can intelligently map a resource request to a set of options, and then present the user with a set of options (or a tool) can alleviate this current challenge. Importantly, we don't want to provide one answer, as the tool needs to be agnostic and not suggest a specific cloud.
#### Regions

### Implementation Idea
For regions, note that you have a default set in your settings.yml, e.g.:

The implementation needs three parts: 1. a database of contender machines that is automatically updated at some frequency, 2. a tool that can parse this database and select a subset based on a user criteria, and 3. a final mapping of each in that selection to a cost estimate (using live or active APIs).
```yaml
google:
  regions: ["us-east1", "us-west1", "us-central1"]

aws:
  regions: ["us-east-1"]
```
1. Start with APIs that can list instance types. We likely want to filter down into different groups.
2. Think about how to do a mapping across clouds. Likely this means being able to generalize (e.g., describe based on memory, size, GPU or other features, etc)
3. Save metadata about instances given the above attributes.
4. Can we generate a solve to find an optimal instance?
These regions are used in API calls to retrieve a filtered set, but not to filter that set afterwards.
You should generally be more inclusive in this set, as it is the meta set we further
filter. When appropriate, "global" is also added to find resources across regions. To
filter by region for a one-off query:
```bash
$ cloud-select instance --region east
```

As an example use case, we could create a simple web app (and underlying user interface) that allows the user to define a jobspec:
Jobspec → filter to top options → price API.
Since region names are not consistent across clouds, the value above is treated as a regular expression.
This means that to change regions:

> Why Python?
- edit settings.yml to change the global set you use
- add `--region` to a particular query to filter (within the set above).

To start, I was thinking we should use Python APIs for quick prototyping.
If you have a cache with older data (and different regions) you will want to clear it.
If we eventually store the cache by region this might be easier to manage;
however, that isn't done yet, to keep the design simple.

> Why use ASP / clingo and do a solve?
**Note** We use regions and zones a bit loosely: at a high level a region encompasses
many zones, so a specification of `regions` (as in the settings example) typically
indicates regions, but under the hood we might be filtering the specific zones.
A result might generally be labeled with "region" and include a zone name.
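
To illustrate the idea, here is a minimal sketch of a regex-based region filter. This is not the project's `filter_region` implementation; the `Regions` attribute name simply mirrors the one added to the AWS instance data in this commit:

```python
import re


def filter_by_region(instances, region):
    """
    Keep instances whose region list matches a regular expression.

    Sketch only: assumes each instance is a dict carrying a "Regions" list,
    similar to the attribute added to the AWS instance data here.
    """
    pattern = re.compile(region)
    return [
        instance
        for instance in instances
        if any(pattern.search(name) for name in instance.get("Regions", []))
    ]


# "east" matches both "us-east-1" (AWS) and "us-east1" (Google)
```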

Given matching requests for amounts, this is probably overkill - we could have iterables over a range of options filter this very easily.
The honest answer is that I thought it would be more fun to try using ASP. We can always
remove it for a simpler solution, as it does go against my better judgment to add extra dependencies that aren't needed.
That said, if the solve becomes more complex, it could be cool to have it.
#### Cache Only

To only use the cache (and skip trying to authenticate to a cloud provider),
you can set `cache_only` in your settings.yml to true:

## Previous Art
```yaml
cache_only: true
```
- AWS already has an instance selector in Go https://github.com/aws/amazon-ec2-instance-selector
- GCP has one in perl https://github.com/Cyclenerd/google-cloud-compute-machine-types
This will be the default when we are able to provide a remote cache,
as then you won't be required to have your own credentials to use the
tool out of the box!
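
For intuition, the kind of logic `cache_only` implies might look like the sketch below - prefer cached data and never fall back to an authenticated API call. The file layout and the `fetch` callable are assumptions for illustration, not the project's actual cache code:

```python
import json
import os


def load_instances(cloud_name, cache_dir, fetch=None, cache_only=False):
    """
    Sketch of cache_only semantics (not cloud-select's real implementation).

    Use cached data when present; otherwise call the (authenticated) fetch
    function, unless cache_only is set, in which case fail rather than try
    to authenticate.
    """
    cache_file = os.path.join(cache_dir, cloud_name, "instances.json")
    if os.path.exists(cache_file):
        with open(cache_file) as fd:
            return json.load(fd)
    if cache_only or fetch is None:
        raise RuntimeError(f"No cached data for {cloud_name} and cache_only is set.")
    return fetch(cloud_name)
```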
I think I'm still going to use Python for faster prototyping.
## TODO and Questions
- Are we allowed to provide a cache of instance types (e.g., automated update in GitHub?)
- should be able to set custom instances per cloud - either directly for a cloud, or generic string to match (e.g., "east")
- some logic to standardize regions (e.g., "east")
- add tests and testing workflow
- properties testing for handling min/max/numbers
- Add Docker build / automated builds
- ensure that required set of attributes for each instance are returned (e.g., name, cpu, memory)
- how to handle instances that don't have an attribute of interest? Should we unselect them?
- pretty branded documentation
- selection should have sorting ability
See our current [design document](docs/design.md) for background on the design.
- [ ] create cache of instance types and maybe prices in GitHub (e.g., automated update)
- [ ] add tests and testing workflow
- [ ] properties testing for handling min/max/numbers
- [ ] ensure that required set of attributes for each instance are returned (e.g., name, cpu, memory)
- [ ] how to handle instances that don't have an attribute of interest? Should we unselect them?
- [ ] pretty branded documentation and spell checking
- [ ] add GPU memory - available in AWS and I cannot find for GCP
- [ ] should cache be organized by region to allow easier filtering (data for AWS doesn't have that attribute)
- [ ] need to do something with costs
- [ ] test performance of using solver vs. not
### Future desires
These are either "nice to have" items or small details we can improve upon, i.e., not top priority.
- should we allow currency outside USD? Probably not for now.
- aws instance listing (based on regions) should validate regions - an invalid region simply returns no results
- could eventually support different resource types (beyond compute or types of prices, e.g., pre-emptible vs. on demand)
- add GPU memory - available in AWS, not sure about GCP
- add AWS description from metadata (similar to GCP)
- for AWS description, when appropriate convert to TB (like Google does)
Planning for minimizing cost:
14 changes: 14 additions & 0 deletions cloud_select/client/__init__.py
@@ -112,6 +112,7 @@ def get_parser():
dest="max_results",
help="Maximum results to return per cloud provider.",
type=int,
default=25,
)
parser.add_argument(
"--cloud",
@@ -121,6 +122,19 @@
action="append",
)

parser.add_argument(
"--sort-by",
dest="sort_by",
help="Sort by a result attribute.",
choices=["name", "cpus", "gpus", "memory"],
)
parser.add_argument(
"--asc",
dest="ascending",
help="Sort results ascending instead of descending (default)",
action="store_true",
default=False,
)
parser.add_argument(
"--cache-expire",
dest="cache_expire",
11 changes: 10 additions & 1 deletion cloud_select/client/instance.py
@@ -27,6 +27,10 @@ def main(args, parser, extra, subparser):
    # Update config settings on the fly
    cli.settings.update_params(args.config_params)

    # If max results is 0, set to None (no limit)
    if args.max_results == 0:
        args.max_results = None

    # Are we writing ASP to an output file?
    asp_out = None
    out = args.out
@@ -55,4 +59,9 @@
        utils.write_json(out, instances)
    else:
        t = table.Table(instances)
        t.table(title="Cloud Instances Selected")
        t.table(
            title="Cloud Instances Selected",
            sort_by=args.sort_by,
            limit=args.max_results,
            ascending=args.ascending,
        )
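
For context, the `sort_by`, `limit`, and `ascending` values above are handed to the table renderer. A rough sketch of that preparation step (assuming rows are dicts with the sorted attribute present; this is not the project's `table.Table` code):

```python
def prepare_rows(rows, sort_by=None, limit=None, ascending=False):
    """
    Sketch: sort and truncate result rows before rendering a table.

    Descending order is the default, matching the CLI behavior described
    in the README; --asc flips it.
    """
    if sort_by is not None:
        rows = sorted(rows, key=lambda row: row[sort_by], reverse=not ascending)
    if limit is not None:
        rows = rows[:limit]
    return rows


# Example: top 3 instances by memory
# prepare_rows(instances, sort_by="memory", limit=3)
```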
19 changes: 16 additions & 3 deletions cloud_select/main/client.py
@@ -71,7 +71,13 @@ def get_clouds(self, force=False, lookup=False):
        # We should always be able to get cloud classes, even without auth
        # The class knows how to parse the data types into a standard space
        for cloud_name, CloudClass in self._cloudclass.items():
            self._clouds[cloud_name] = CloudClass()

            # Regions default to settings then defaults
            cloud_settings = getattr(self.settings, cloud_name)
            self._clouds[cloud_name] = CloudClass(
                regions=cloud_settings.get("regions"),
                cache_only=self.settings.cache_only,
            )
        return self._clouds if lookup else list(self._clouds.values())

    def instances(self):
@@ -130,8 +136,6 @@ def update_from_cache(self, items, datatype):
    def instance_select(self, max_results=20, out=None, **kwargs):
        """
        Select an instance.
        We don't currently do anything with kwargs (but will eventually to filter)
        """
        # Start with already cached data
        instances = self.update_from_cache(self.instances(), "instances")
@@ -145,6 +149,11 @@
        if self.settings.disable_prices is not True:
            prices = self.update_from_cache(self.prices(), "prices")

        # Attributes that can't go into the solver
        region = kwargs.get("region")
        if "region" in kwargs:
            del kwargs["region"]

        # By here we have a lookup (by cloud) of instance groups
        # Filter down kwargs (properties) to those relevant to instances
        properties = solve.Properties(schemas.instance_properties, **kwargs)
Expand All @@ -154,6 +163,10 @@ def instance_select(self, max_results=20, out=None, **kwargs):
        # 2. filter down to desired set based on these common functions
        for cloud_name, instance_group in instances.items():

            # Do we have a request to filter by region?
            if region is not None:
                instance_group.filter_region(region)

            # Do we have prices for the cloud?
            if cloud_name in prices:
                instance_group.add_instance_prices(prices[cloud_name])