-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HTTP import with DIC - Design Proposal #243
base: main
Are you sure you want to change the base?
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,78 @@ | ||
|
||
# Overview | ||
This design provides an approach to allow DataImportCron to import HTTP sources. | ||
The design will also demonstrate to the user how to operate with such import types. | ||
|
||
## Motivation | ||
Currently, the DataImportCron allows the import of Registry Imports only. Recently, there was a demand to also allow HTTP import types. | ||
The problem that arises from HTTP imports is that there is no convention between the different sources, so it's hard to know when the image is updated for each source in a generic way, which will make the polling process more difficult than standard registry sources. | ||
One possible solution If the URL is static is to use a Get request with an If-Modified-Since header where we specify the date from which we want to check if there was a change. If there was a change since the specified date, the request will return with a status of 200OK, and then we know that the image has been updated since that date. | ||
|
||
In the current situation we can enable HTTP imports by updating the DesiredDigest label. | ||
|
||
## Goals | ||
|
||
* Allow the user to perform http imports manually with dataimportcron | ||
* Create a poller that will cover as many cases as possible for automatic updating | ||
|
||
## Non Goals | ||
|
||
* The poller will probably not cover all import cases and sometimes the user will have to manually update the digest | ||
|
||
## User Stories | ||
|
||
* As a user, I would like to import images from an HTTP source using DataImportCron | ||
* As a user, I would like the poller automatically update the image when it is needed | ||
* As a user, I would like to manually trigger an HTTP import with DataImportCron | ||
|
||
## Repos | ||
|
||
* **CDI**: Offers an API which serves as a mutable symbolic link to a Persistent Volume Claim (PVC). This link can be used in a DataVolume. | ||
* **CDI**: Provides an API called DataImportCron, designed to manage the periodic polling and importing of disk images as PVCs into a specific namespace (golden images by default). This functionality leverages the CDI import processes. | ||
|
||
## Implementation Phases | ||
|
||
**Phase1** - Manual Import Update | ||
* DataImportCron controller to handle HTTP imports | ||
* DataImportCron webhook/validation to support HTTP source type | ||
|
||
**Phase2** - Poller | ||
* DataImportCron poller to support HTTP source polling | ||
|
||
# Design | ||
|
||
If the URL is static: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't understand what you mean by There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sometimes the image has the same URL and then we can use the way I mentioned, but sometimes the suffix changes when you upload a new version and we want to identify such a case, so we have to use another way. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Honestly, I don't think we should consider that particular scenario. From my perspective that is a completely different file. The URL that is passed in is always static and cannot change. There would be no mechanism that allows us to reliably know if a different file is newer than the original. I guess we could consider a 'template' url in combination with some version scheme, but that would be an enhancement IMO. |
||
* Make a GET request with the If-Modified-Since Header starting from the date stored in the AnnLastCronTime annotation | ||
* If the returned status is 200OK, perform import again | ||
* Update AnnLastCronTime to time.Now() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Have you considered that some sources provide a separate file with the hash of the image? For instance the fedora cloud images have a file that contains the checksum of the file. So we could download that file and verify the hash against what we have and decide if a new one exists that way. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good point. The question is what percentage of sources support this and whether all sources provide the file in the same format? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As far as I can tell these are the areas where this file structure diverges:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe we can catalog these, if there is a limited number of permutations of these formats, maybe we can implement them, and only support those limited number of options. I understand there might be many, but if we capture the most common ones, that might be enough. |
||
|
||
## DataImportCron Example | ||
|
||
This example polls http://tinycorelinux.net/14.x/x86/release/Core-14.0.iso static image. The DataImportCron will import it as new PVCs and automatically manage updating the corresponding DataSource. | ||
|
||
* **source** specifies where to poll from | ||
* **http** specifies that the type is HTTP import | ||
* **url** specifies the image URL | ||
* **schedule** should be empty when we are manually updating the DataImportCron desired digest annotation | ||
|
||
```yaml | ||
apiVersion: cdi.kubevirt.io/v1beta1 | ||
kind: DataImportCron | ||
metadata: | ||
name: fedora-image-import-cron | ||
namespace: golden-images | ||
spec: | ||
template: | ||
spec: | ||
source: | ||
http: | ||
url: "http://tinycorelinux.net/14.x/x86/release/Core-14.0.iso" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just to bring home the point of the separate file, http://tinycorelinux.net/14.x/x86_64/release/ also contains a file with an md5 hash. So I think it might make sense to extend the http source with a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If the URL is static then it can help, if it changes so we need to find out the new suffix in addition to the hashcode to know the new URL. There could be other issues like the format of the checksum file. Maybe we can use the name of the version that is in the same file you mentioned (e.g There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If for some sources the filename There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe some templating type thing could be useful. But I think we should start out with just considering 'static' URLs, get that working properly, then expand to more complex use cases if what we have is not sufficiently powerful. |
||
storage: | ||
resources: | ||
requests: | ||
storage: 5Gi | ||
schedule: "" | ||
garbageCollect: Outdated | ||
importsToKeep: 2 | ||
managedDataSource: fedora | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we can do the
If-Modified-Since
header on a schedule right? We know the time the job kicked off last so we pass that with the header. Do we know if all http servers support that header, I suspect not all of them do.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be that not all servers will support this header and then we will have to use another way.
Alex suggested that we can offload polling entirely to the user
@akalenyu
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah just something I thought about.. we could provide an example poller impl.
and require folks to supply a poller URL at the webhook level for HTTP sources (or "static" to bypass polling altogether)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe limit the options of 'poller' type to a few known ones, including 'bypass' and 'if-modified-since'. We could expand that later if we identify more options. Maybe include the template idea from below, not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still like offloading the polling logic to the user. But second thoughts about running it on their behalf since it means a user app will run in the CDI namespace. We could always just disable our poller and have them create their own cronjobs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can do it with the checksum like @awels suggested, but there is no unique convention for the file structure