Issues with REST API source - response body #2310
Comments
Please share some of your pipeline code. Also, if columns that are only ever "NULL" do not appear in the destination, it is most likely because none of the incoming rows/records carry a value there, so dlt cannot determine the column's data type and drops the column. You can provide column hints for this if you like.
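The type-inference point above can be illustrated with a toy model. This is a minimal sketch, not dlt's actual implementation: `infer_columns`, the `hints` argument, and the sample rows are all hypothetical. It only shows why a column whose values are all null ends up without a type unless a hint supplies one.

```python
from typing import Any, Dict, List, Optional


def infer_columns(
    rows: List[Dict[str, Any]],
    hints: Optional[Dict[str, str]] = None,
) -> Dict[str, Optional[str]]:
    """Toy inference: each column gets the type name of its first non-null value."""
    columns: Dict[str, Optional[str]] = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, None)
            if columns[key] is None and value is not None:
                columns[key] = type(value).__name__
    # A hint only fills in columns that the data itself could not type.
    for key, type_name in (hints or {}).items():
        if columns.get(key) is None:
            columns[key] = type_name
    return columns


# Hypothetical records in which "department" is always null.
rows = [{"id": 1, "department": None}, {"id": 2, "department": None}]

# Columns that no row could assign a type to.
untyped = [name for name, type_name in infer_columns(rows).items() if type_name is None]

# The same rows, with a hypothetical hint for the all-null column.
hinted = infer_columns(rows, hints={"department": "str"})
```

In dlt itself, a comparable hint would go through the resource's `columns` argument, for example `columns={"department": {"data_type": "text"}}`; treat the exact shape as something to check against the dlt version in use.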
Thanks. The thing is that the schema and the columns are dynamic. We can't know the properties ahead of time, so we need the API response to be loaded to s3 as is, without any transformation or modification.
I doubt that we will support columns of unknown type any time soon, since pretty much all destinations (except pure JSON files) have schemas. You could set max_table_nesting to 0 though; then the nested "null" columns will be retained.
If you want more help, we really need to see some of your code.
We already use max_table_nesting=0:

resource_name = get_resource_name(api_config)
destination = filesystem(bucket_url=run_path)
pipeline = dlt.pipeline(
    destination=destination,
    dataset_name=api_config.resource_config.table_name,
)
source = create_dynamic_source(
    rest_client_config, api_config.resource_config, resource_name
)
load_info = pipeline.run(source())

def create_dynamic_source(
    rest_client_config: ClientConfig,
    resource_config: ResourceConfig,
    resource_name: str,
) -> SourceFactory:
    """Creates a dynamic source based on the configuration"""

    @dlt.source(name=resource_name, max_table_nesting=0)
    def dynamic_source() -> Iterator:
        rest_config: RESTAPIConfig = {
            "client": rest_client_config,
            "resources": [
                {
                    "name": resource_name,
                    "primary_key": "id",
                    "write_disposition": "replace",
                    "table_name": resource_name,
                    "endpoint": {
                        "path": resource_config.path,
                        "params": resource_config.params,
                    },
                }
            ],
        }
        yield from rest_api_resources(rest_config)

    return dynamic_source
Ok, so maybe try adjusting your path setting to include the full result (which was also part of your question); then you should get the full result, and the nested columns that are null should be retained. What is path set to in your example?
The path is the API endpoint, for example if the base url is … I don't really understand how adjusting the path can help us.
Ah sorry, I meant the "data_selector", which selects which part of the returned JSON is forwarded into the resource: https://dlthub.com/docs/general-usage/http/rest-client
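To make the role of a data selector concrete, here is a minimal sketch in plain Python. It is not dlt's implementation: `select_data` is a hypothetical helper that understands only "$" (the whole body) and simple dotted keys, and the response body is an invented example shaped like a list wrapped in a "value" key. The `endpoint_config` dict at the end assumes that the "data_selector" key sits under "endpoint" and that "$" is accepted as a selector; whether the wrapper then survives all the way to the destination is not verified here.

```python
from typing import Any, Dict


def select_data(response_json: Dict[str, Any], selector: str) -> Any:
    """Toy selector: '$' returns the whole body, 'a.b' walks nested keys."""
    if selector == "$":
        return response_json
    node: Any = response_json
    for key in selector.split("."):
        node = node[key]
    return node


# Hypothetical response body: the records live in a list under "value".
response_json = {
    "@odata.context": "hypothetical-context",
    "value": [{"id": "1", "department": None}],
}

# Selecting "value" forwards only the records, so the wrapper key is gone.
records_only = select_data(response_json, "value")

# Selecting "$" forwards the whole body, wrapper key included.
whole_body = select_data(response_json, "$")

# Assumed endpoint fragment; "users" is a placeholder path.
endpoint_config = {"path": "users", "data_selector": "$"}
```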
Yeah, we have tried it, and it didn't work. I think the reason is the way this specific API (the msgraph API) returns the response: the value is an array (list) and not an object. I guess it works with other APIs (like the one in the docs example).
You could add a transformer to change the data shape before it is ingested by the dlt extract stage, but imho both dictionaries and lists should work.
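A shape-changing step of the kind suggested above could look roughly like this. It is a sketch in plain Python under stated assumptions: `wrap_pages` is a hypothetical helper, the pages are invented sample data, and how such a step would be wired into a dlt resource is left open. It re-wraps each page of records under a "value" key and shows that JSON serialization keeps null properties.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def wrap_pages(
    pages: Iterable[List[Dict[str, Any]]],
    key: str = "value",
) -> Iterator[Dict[str, Any]]:
    """Yield one dict per page, with the page's records nested under `key`."""
    for page in pages:
        yield {key: page}


# Hypothetical pages of records, as a selector might hand them over.
pages = [
    [{"id": "1", "department": None}],
    [{"id": "2", "department": "Sales"}],
]

wrapped = list(wrap_pages(pages))

# Serializing to JSON lines keeps the null property and the wrapper key.
lines = [json.dumps(item) for item in wrapped]
```

Emitting one item per page keeps the wrapper intact, at the cost of each stored item holding a whole page rather than a single record.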
Hi,
We have two issues with the way dlt (probably) transforms the API response before loading it to the destination (s3):
1. Properties whose value is null are removed.
2. The "value" key that wraps the response is dropped.
Take this response for example:
Regarding the first issue: "department" is null in this response, and on the destination we don't have this property. We read in the docs that there is an option to define a schema. The thing is that it's a lot of work for something we need across the board, for every API response.
Regarding the second issue: the data on the destination is saved without the "value" key, and we need it with this key.