Issues with REST API source - response body #2310

barletza-pathid · 2025-02-14T20:10:51Z

Hi,

We have two issues with the way dlt (probably) transforms the API response before loading it to the destination (S3).

1. Properties with the value null are removed.
2. The response is loaded without the "result" or "data" or "values" key, and we need this key.

Take the response below as an example.
Regarding the first issue: "department" is null in this response, and at the destination this property is missing. We read in the docs that there is an option to define a schema, but that is a lot of work for something we need across the board, for every API response.
Regarding the second issue: the data at the destination is saved without the "value" key, and we need it with this key.

{
    "@odata.context": "https://graph.microsoft.com/beta/$metadata#users",
    "value": [
        {
            "id": "6e7b768e-07e2-4810-8459-485f84f8f204",
            "deletedDateTime": null,
            "accountEnabled": true,
            "ageGroup": null,
            "businessPhones": [],
            "city": null,
            "createdDateTime": "2017-09-04T15:35:02Z",
            "creationType": null,
            "companyName": null,
            "consentProvidedForMinor": null,
            "country": null,
            "department": null,
            "displayName": "Conf Room Adams",
            "employeeId": null,
            "employeeHireDate": null,
            "employeeLeaveDateTime": null,
            "employeeType": null,
            "faxNumber": null,
            "givenName": null,
            "imAddresses": [],
            "infoCatalogs": [],
            "isLicenseReconciliationNeeded": true,
            "isManagementRestricted": null,
            "isResourceAccount": null,
            "jobTitle": null,
            "legalAgeGroupClassification": null,
            "mail": "[email protected]",
            "mailNickname": "Adams",
            "mobilePhone": null,
            "onPremisesDistinguishedName": null,
            "officeLocation": null,
            "onPremisesDomainName": null,
            "onPremisesImmutableId": null,
            "onPremisesLastSyncDateTime": null,
            "onPremisesObjectIdentifier": null,
            "onPremisesSecurityIdentifier": null,
            "onPremisesSamAccountName": null,
            "onPremisesSyncEnabled": null,
            "onPremisesUserPrincipalName": null,
            "otherMails": [],
            "passwordPolicies": "None",
            "postalCode": null,
            "preferredDataLocation": null,
            "preferredLanguage": null,
            "proxyAddresses": [
                "SMTP:[email protected]"
            ],
            "refreshTokensValidFromDateTime": "2017-09-12T21:08:14Z",
            "securityIdentifier": "S-1-12-1-1853585038-1209010146-1598577028-83032196",
            "showInAddressList": null,
            "signInSessionsValidFromDateTime": "2017-09-12T21:08:14Z",
            "state": null,
            "streetAddress": null,
            "surname": null,
            "usageLocation": null,
            "userPrincipalName": "[email protected]",
            "externalUserConvertedOn": null,
            "externalUserState": null,
            "externalUserStateChangeDateTime": null,
            "userType": "Member",
            "employeeOrgData": null,
            "passwordProfile": null,
            "assignedLicenses": [],
            "assignedPlans": [],
            "authorizationInfo": {
                "certificateUserIds": []
            },
            "cloudRealtimeCommunicationInfo": {
                "isSipEnabled": false
            },
            "deviceKeys": [],
            "identities": [
                {
                    "signInType": "userPrincipalName",
                    "issuer": "M365x214355.onmicrosoft.com",
                    "issuerAssignedId": "[email protected]"
                }
            ],
            "onPremisesExtensionAttributes": {
                "extensionAttribute1": null,
                "extensionAttribute2": null,
                "extensionAttribute3": null,
                "extensionAttribute4": null,
                "extensionAttribute5": null,
                "extensionAttribute6": null,
                "extensionAttribute7": null,
                "extensionAttribute8": null,
                "extensionAttribute9": null,
                "extensionAttribute10": null,
                "extensionAttribute11": null,
                "extensionAttribute12": null,
                "extensionAttribute13": null,
                "extensionAttribute14": null,
                "extensionAttribute15": null
            },
            "onPremisesProvisioningErrors": [],
            "onPremisesSipInfo": {
                "isSipEnabled": false,
                "sipDeploymentLocation": null,
                "sipPrimaryAddress": null
            },
            "provisionedPlans": [],
            "serviceProvisioningErrors": []
        }
    ]
}


sh-rp · 2025-02-17T12:36:30Z

Please provide us with some of your pipeline code. Also, if columns that are only ever NULL do not appear in the destination, it is most likely because none of the incoming rows/records carry a value there, so dlt cannot determine the column's type and drops it. You can provide hints for this if you like.
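
A minimal sketch of what such a hint could look like in a rest_api resource definition, assuming the resource's "columns" key accepts the usual dlt column hints ("data_type", "nullable"). The helper name with_text_hints, the resource name users, and the choice of columns are illustrative only. The config is built as a plain dict, so nothing from dlt is imported here.

```python
from typing import Any, Dict, Iterable


def with_text_hints(
    resource: Dict[str, Any], nullable_columns: Iterable[str]
) -> Dict[str, Any]:
    """Return a copy of a resource config with explicit column hints.

    Each listed column is declared as nullable text, so a type is known
    even when every incoming value for it is null.
    """
    hints = {
        name: {"data_type": "text", "nullable": True}
        for name in nullable_columns
    }
    merged = dict(resource)
    # Hints already present on the resource are kept; new ones win on conflict.
    merged["columns"] = {**resource.get("columns", {}), **hints}
    return merged


# Hypothetical resource; the column names come from the sample response.
users_resource = with_text_hints(
    {"name": "users", "endpoint": {"path": "users"}},
    ["department", "jobTitle", "city"],
)
```

This only helps for columns known ahead of time, which is the limitation raised in the original question.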

barletza-pathid · 2025-02-17T13:36:13Z

> Please provide us with some of your pipeline code. Also, if columns that are only ever NULL do not appear in the destination, it is most likely because none of the incoming rows/records carry a value there, so dlt cannot determine the column's type and drops it. You can provide hints for this if you like.

Thanks.

The thing is that the schema and the columns are dynamic. We can't know the properties ahead of time, so we need the API response to be loaded to S3 as is, without any transformation or modification, as if the request had been executed from Postman.

sh-rp · 2025-02-17T13:53:31Z

I doubt that we will support columns with unknown type any time soon, since pretty much all destinations (except for pure JSON files) have schemas. You could set max_table_nesting to 0 though; then the nested "null" columns will be retained.

sh-rp · 2025-02-17T13:53:50Z

If you want more help, we really need some of your code.

barletza-pathid · 2025-02-18T07:32:27Z

@sh-rp

We already use max_table_nesting=0

import dlt
from dlt.destinations import filesystem

resource_name = get_resource_name(api_config)
destination = filesystem(bucket_url=run_path)

pipeline = dlt.pipeline(
    destination=destination,
    dataset_name=api_config.resource_config.table_name,
)
source = create_dynamic_source(
    rest_client_config, api_config.resource_config, resource_name
)
load_info = pipeline.run(source())

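from typing import Iterator

from dlt.sources.rest_api import RESTAPIConfig, rest_api_resources
from dlt.sources.rest_api.typing import ClientConfig

# ResourceConfig and SourceFactory come from imports not shown in this post.


def create_dynamic_source(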
    rest_client_config: ClientConfig,
    resource_config: ResourceConfig,
    resource_name: str,
) -> SourceFactory:
    """Creates a dynamic source based on the configuration"""

    @dlt.source(name=resource_name, max_table_nesting=0)
    def dynamic_source() -> Iterator:
        rest_config: RESTAPIConfig = {
            "client": rest_client_config,
            "resources": [
                {
                    "name": resource_name,
                    "primary_key": "id",
                    "write_disposition": "replace",
                    "table_name": resource_name,
                    "endpoint": {
                        "path": resource_config.path,
                        "params": resource_config.params,
                    },
                }
            ],
        }
        yield from rest_api_resources(rest_config)

    return dynamic_source

sh-rp · 2025-02-18T15:08:10Z

Ok, so maybe try to adjust your path setting to include the full result (which also was part of your questions), then you should have the full result and also the nested columns that are null should be retained. What is path set to in your example?

barletza-pathid · 2025-02-18T15:17:32Z

> Ok, so maybe try to adjust your path setting to include the full result (which also was part of your questions), then you should have the full result and also the nested columns that are null should be retained. What is path set to in your example?

The path is the API endpoint. For example, if the base URL is /api/<provider_api>/v1 and path is users, the request URL is /api/<provider_api>/v1/users.

I don't really understand how adjusting the path can help us.

sh-rp · 2025-02-18T17:18:06Z

Ah sorry, I meant the "data_selector" which selects which part of the returned json is forwarded into the resource: https://dlthub.com/docs/general-usage/http/rest-client
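
For reference, a sketch of where "data_selector" sits in the endpoint config from the earlier comment. The selector is a JSONPath into the response body. Using "$" (the whole response) to keep the "value" wrapper is an assumption, not something confirmed in this thread, and the helper name endpoint_with_selector is hypothetical.

```python
from typing import Any, Dict, Optional


def endpoint_with_selector(
    path: str,
    params: Optional[Dict[str, Any]] = None,
    data_selector: str = "$",
) -> Dict[str, Any]:
    """Build the "endpoint" part of a rest_api resource config.

    data_selector is a JSONPath: "$" points at the whole response body,
    while "value" points at the list of records inside it.
    """
    endpoint: Dict[str, Any] = {"path": path, "data_selector": data_selector}
    if params:
        endpoint["params"] = params
    return endpoint


# Assumption: "$" forwards the full body, including the "value" key.
whole_response = endpoint_with_selector("users")
# Selecting "value" forwards only the records, without the wrapper.
records_only = endpoint_with_selector("users", data_selector="value")
```

Either dict could replace the "endpoint" entry in create_dynamic_source above.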

barletza-pathid · 2025-02-20T09:21:21Z

> data_selector

Yeah, we have tried it, and it didn't work. I think the reason is the way this specific API (the MS Graph API) returns the response: "value" is an array (list), not an object. I guess it works with other APIs (like in the example in the docs).

sh-rp · 2025-02-24T12:14:41Z

You could add a transformer to change the data shape before it is ingested by the dlt extract stage, but IMHO both dictionaries and lists should work.
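
One possible shape for such a transformer, as a sketch only: it keeps "id" at the top level (the resource's primary key) and tucks the untouched record under a nested key, relying on the earlier remark that nested nulls are retained when max_table_nesting is 0. The function name wrap_record and the key name payload are made up for illustration.

```python
from typing import Any, Dict


def wrap_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Keep the primary key on top and nest the original record unchanged."""
    return {"id": record.get("id"), "payload": record}


# Trimmed-down version of the sample response item from the first comment.
sample = {
    "id": "6e7b768e-07e2-4810-8459-485f84f8f204",
    "department": None,
    "displayName": "Conf Room Adams",
}
wrapped = wrap_record(sample)

# The null is still present, now inside the nested object.
has_null_department = (
    "department" in wrapped["payload"]
    and wrapped["payload"]["department"] is None
)
```

A function like this could be attached to the resource with add_map. Whether the nulls then reach the destination still depends on the nesting behaviour described above, which this sketch does not verify.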

github-project-automation bot added this to dlt core library Feb 14, 2025

github-project-automation bot moved this to Todo in dlt core library Feb 14, 2025

sh-rp self-assigned this Feb 17, 2025
