Function app FileLayoutParsingOther function: Failed to download 'punkt' package #936

Open · zzesdc opened this issue Dec 27, 2024 · 1 comment
Labels: bug (Something isn't working)

zzesdc commented Dec 27, 2024

Hi Tech team,
I am deploying in secure mode using v1.2. After I uploaded a doc file, processing got stuck at "FileUploadedFunc - pdf file sent to submit queue. Visible in 52 seconds". When I check the invocations of the FileLayoutParsingOther function in the function app, they show this error:
[Screenshot of the FileLayoutParsingOther error: failed to download the 'punkt' package]

I searched for solutions to this issue and found a suggestion to add both "nltk.download('punkt')" and "nltk.download('punkt_tab')" to the utilities.py file. I did that, but punkt still fails to download.
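A minimal sketch of a more defensive version of those calls, assuming the app can run a helper like this once at startup. It downloads only the packages that nltk.data.find cannot already locate and raises if any download fails, because nltk.download reports a failure by returning False rather than raising (unless raise_on_error=True is passed). The helper name ensure_nltk_packages and its injectable find/download parameters are hypothetical, added so the logic can be exercised without network access.

```python
from typing import Callable, Dict, List, Optional

# NLTK package ID -> resource path that nltk.data.find() resolves once installed.
REQUIRED_PACKAGES: Dict[str, str] = {
    "punkt": "tokenizers/punkt",
    "punkt_tab": "tokenizers/punkt_tab",
}


def ensure_nltk_packages(
    required: Dict[str, str] = REQUIRED_PACKAGES,
    download_dir: Optional[str] = None,
    find: Optional[Callable[[str], str]] = None,
    download: Optional[Callable[..., bool]] = None,
) -> List[str]:
    """Download only the packages NLTK cannot find; raise if any download fails.

    Returns the package IDs that were downloaded.
    """
    if find is None or download is None:
        import nltk  # imported lazily so fakes can be injected for testing

        if download_dir is not None and download_dir not in nltk.data.path:
            # Make packages saved to download_dir visible to nltk.data.find().
            nltk.data.path.append(download_dir)
        find = find or nltk.data.find
        download = download or nltk.download

    downloaded: List[str] = []
    failed: List[str] = []
    for package_id, resource_path in required.items():
        try:
            find(resource_path)  # raises LookupError if the resource is missing
            continue
        except LookupError:
            pass
        # nltk.download() returns False on failure by default instead of raising,
        # so the result has to be checked explicitly.
        if download(package_id, download_dir=download_dir, quiet=True):
            downloaded.append(package_id)
        else:
            failed.append(package_id)

    if failed:
        raise RuntimeError(
            "Could not download NLTK packages: " + ", ".join(failed)
            + "; check outbound connectivity, proxy settings and TLS trust."
        )
    return downloaded
```

Raising at startup makes the download failure visible immediately, instead of it surfacing later while a document is being parsed.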

I also tried removing the private endpoint and opening the function app's inbound traffic to the public while keeping VNet integration. It still failed.

Thanks for your help!

ruandersMSFT (Contributor) commented Jan 17, 2025

I have also experienced this issue while deploying on an internal private network. Our temporary workaround was to download and extract the NLTK files ourselves into the Azure SMB file share of the Storage Account used by the function app. Specifically, we created an nltk_data folder at the root of the SMB file share, and inside nltk_data we created taggers and tokenizers folders. The tokenizers folder contains punkt and punkt_tab folders holding the extracted files of those packages. The taggers folder contains an averaged_perceptron_tagger folder holding the extracted files of averaged_perceptron_tagger. With these files in place, the function app ran in the private environment.
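The folder layout above can be checked from the function app before NLTK is used. A minimal sketch, assuming the share's nltk_data folder is visible to the app at some mounted path (that path is environment-specific and not given here): it reports package folders that are missing or empty, then prepends the root to the NLTK_DATA environment variable, which NLTK reads at import time to build its search path. The function names are hypothetical. One plausible but unverified explanation for why the share root works without extra configuration: if the share is mounted as the app's home directory, ~/nltk_data is already one of NLTK's default search locations.

```python
import os
from pathlib import Path
from typing import List

# Folder layout described above, relative to the nltk_data root.
EXPECTED_DIRS = [
    "tokenizers/punkt",
    "tokenizers/punkt_tab",
    "taggers/averaged_perceptron_tagger",
]


def missing_nltk_dirs(nltk_data_root: str) -> List[str]:
    """Return expected package folders that are absent or empty under nltk_data_root."""
    root = Path(nltk_data_root)
    missing = []
    for rel in EXPECTED_DIRS:
        folder = root / rel
        # An empty folder usually means the extracted files were never uploaded.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(rel)
    return missing


def use_nltk_data(nltk_data_root: str) -> None:
    """Validate the layout and put nltk_data_root first on NLTK_DATA.

    Must run before nltk is first imported, since NLTK reads NLTK_DATA on import.
    """
    problems = missing_nltk_dirs(nltk_data_root)
    if problems:
        raise FileNotFoundError(f"Missing NLTK data under {nltk_data_root}: {problems}")
    existing = os.environ.get("NLTK_DATA")
    os.environ["NLTK_DATA"] = (
        nltk_data_root if not existing else nltk_data_root + os.pathsep + existing
    )
```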

The solution's code attempts to download the punkt packages with nltk.download, which extracts them into these folders. I haven't had time to troubleshoot the root cause in our environment yet, but, as in our case, you may be reaching the Internet through a firewall or proxy, possibly one that presents certificates issued by an internal certificate authority. Our traffic passes through a proxy that uses internal certificates, and I suspect the root cause in our environment is that the download cannot reach the site because SSL validation of the certificates issued by the internal proxy fails.
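A minimal, standard-library-only sketch for telling these failure modes apart from inside the function app. It fetches a URL and reports whether the failure was a certificate-verification error (what an intercepting proxy with an untrusted internal CA typically causes), an HTTP error returned by a server or proxy, or a plain network failure. The function name and result labels are hypothetical, and no specific NLTK download URL is assumed; pass whichever URL the download is trying to reach. urllib applies HTTP_PROXY/HTTPS_PROXY environment variables by default, so the check follows the same proxy settings other urllib-based code in the process picks up.

```python
import ssl
import urllib.error
import urllib.request
from typing import Callable, Optional


def diagnose_download(url: str, timeout: float = 10.0,
                      opener: Optional[Callable] = None) -> str:
    """Try to fetch url and classify the outcome.

    Returns "ok", "tls-untrusted" (certificate verification failed),
    "http-<code>" (a server or proxy answered with an HTTP error, e.g. 407),
    or "network" (DNS, routing, firewall or timeout problems).
    """
    open_url = opener or urllib.request.urlopen
    try:
        with open_url(url, timeout=timeout) as response:
            response.read(1)
        return "ok"
    except urllib.error.HTTPError as exc:
        return f"http-{exc.code}"
    except urllib.error.URLError as exc:
        # urlopen wraps connection-level errors; the SSL failure is in .reason.
        if isinstance(exc.reason, ssl.SSLCertVerificationError):
            return "tls-untrusted"
        return "network"
    except ssl.SSLCertVerificationError:
        return "tls-untrusted"
    except OSError:
        return "network"
```

A "tls-untrusted" result points at certificate trust rather than routing, which matches the proxy hypothesis above.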

I recommend following the "Installing NLTK Data" page for manual installation instructions, which amount to creating the folder structure above. Manual downloads for extraction:
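Once the archives have been downloaded on a machine with Internet access, a minimal sketch like the following can unpack them into the nltk_data layout described earlier in this comment before the result is uploaded to the file share. The archive file names, and the assumption that each archive contains a single top-level folder named after the package, should be checked against the files you actually download; the function name is hypothetical.

```python
import zipfile
from pathlib import Path
from typing import Dict, List

# Assumed archive file name -> category folder under nltk_data (per the layout above).
ARCHIVE_CATEGORIES: Dict[str, str] = {
    "punkt.zip": "tokenizers",
    "punkt_tab.zip": "tokenizers",
    "averaged_perceptron_tagger.zip": "taggers",
}


def extract_nltk_archives(archive_dir: str, nltk_data_root: str) -> List[str]:
    """Extract each archive found in archive_dir into nltk_data/<category>/.

    Assumes each archive holds one top-level folder named after the package,
    so punkt.zip ends up as nltk_data/tokenizers/punkt/.
    Returns the package folders that were extracted.
    """
    root = Path(nltk_data_root)
    extracted = []
    for archive_name, category in ARCHIVE_CATEGORIES.items():
        archive = Path(archive_dir) / archive_name
        if not archive.is_file():
            continue  # not downloaded; skip it
        target = root / category
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(f"{category}/{archive.stem}")
    return extracted
```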

To find the root cause, which is most likely related to the firewall, proxy, or SSL certificate trust, open the Kudu app for the function app, which provides SSH and Bash consoles. From there, try downloading the zipped packages above with a command such as curl, and verify that network routing works and that the download URLs are reachable. If you find an SSL certificate chain trust issue with intermediate certificates, consider adding the certificates to the function app and loading them via WEBSITE_LOAD_CERTIFICATES. Because the nltk download may also be failing SSL validation, the application code may need to trust those certificates explicitly, for example through ssl.create_default_context() with cafile or capath.
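A minimal sketch of the application-side trust fix, assuming a Linux plan where certificates loaded through WEBSITE_LOAD_CERTIFICATES appear as DER files under /var/ssl/certs (treat that path as an assumption and verify it in your environment). It starts from Python's default trust store, adds the extra certificates with load_verify_locations (whose cadata argument accepts DER bytes), and installs a global urllib opener using the resulting context. Whether this changes nltk.download's behavior depends on the library fetching through urllib.request.urlopen without passing its own SSL context, which is also an assumption. The function names are hypothetical.

```python
import ssl
import urllib.request
from pathlib import Path
from typing import Optional, Tuple


def build_ssl_context(cert_dir: str = "/var/ssl/certs",
                      pem_bundle: Optional[str] = None) -> Tuple[ssl.SSLContext, int]:
    """Return a context trusting the system CAs plus extra certificates,
    and the number of DER files added from cert_dir."""
    # create_default_context() with no arguments loads the system CAs. Passing
    # cafile/capath to it instead loads ONLY those, so extras are added afterwards.
    context = ssl.create_default_context()
    if pem_bundle is not None:
        context.load_verify_locations(cafile=pem_bundle)
    added = 0
    directory = Path(cert_dir)
    if directory.is_dir():
        for der_file in sorted(directory.glob("*.der")):
            # cadata accepts DER-encoded certificates as bytes.
            context.load_verify_locations(cadata=der_file.read_bytes())
            added += 1
    return context, added


def install_default_opener(context: ssl.SSLContext) -> None:
    """Make later urllib.request.urlopen() calls without an explicit context use this one."""
    handler = urllib.request.HTTPSHandler(context=context)
    urllib.request.install_opener(urllib.request.build_opener(handler))
```

Calling install_default_opener(build_ssl_context()[0]) once at startup, before any nltk.download call, is the intended order.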

dayland self-assigned this Feb 5, 2025
dayland added the bug (Something isn't working) label Feb 5, 2025
dayland added this to the 2.0 milestone Feb 5, 2025
Projects: None yet
Development: No branches or pull requests
3 participants