Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to recognize dockerfiles with extensions #4567

Closed
wants to merge 1 commit into from

Conversation

kmehant
Copy link

@kmehant kmehant commented Jun 28, 2019

Description

Fixes #4566

Checklist:

  • I am associating a language with a new file extension.

  • I am adding a new language.

  • I am fixing a misclassified language

    • I have included a new sample for the misclassified language:
      • Sample source(s):
        • [URL to each sample source, if applicable]
      • Sample license(s):
    • I have included a change to the heuristics to distinguish my language from others using the same extension.
  • I am changing the source of a syntax highlighting grammar

  • I am updating a grammar submodule

  • I am adding new or changing current functionality

    • I have added or updated the tests for the new or changed functionality.
  • I am changing the color associated with a language

    • I have obtained agreement from the wider language community on this color change.
      • [URL to public discussion]
      • [Optional: URL to official branding guidelines for the language]

@kmehant
Copy link
Author

kmehant commented Jun 28, 2019

@lildude, please review

Copy link
Contributor

@pchaigno pchaigno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the report and pull request @kmehant!

Could you also add sample files in samples/Dockerfile/filenames/ for the new filenames you're adding? We prefer sample files to be from "real" application (not hello worlds, etc.) and under permissive license.

@@ -1110,6 +1110,9 @@ Dockerfile:
- ".dockerfile"
filenames:
- Dockerfile
- "dockerfile"
- "Dockerfile.*"
- "dockerfile.*"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not going to work. You'll have to give the explicit name.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's precisely the issue @kmehant has raised in #4566: Linguist has no way of identifying files based on a common prefix, only a suffix. Although that gives me an idea...

@@ -1110,6 +1110,9 @@ Dockerfile:
- ".dockerfile"
filenames:
- Dockerfile
- "dockerfile"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the lower-case D common? I've never encountered it before.

@lildude Any way we can search for that on GitHub.com? Maybe through Google BigQuery?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not on GitHub itself as there's no case-sensitive search. I have no idea about BigQuery - never used it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea about BigQuery - never used it.

I have now. 🎉

There are only 403 instances dockerfile in the BigQuery dataset according to this query:

SELECT repo_name, path
FROM `bigquery-public-data.github_repos.files`
WHERE path LIKE '%/dockerfile'
GROUP BY repo_name, path
LIMIT 5000

There are well over 5000 examples of Dockerfile though 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea about BigQuery - never used it.

I have now. 🎉

Wow! How did you do that so quickly? Is there a good tutorial for that? It would definitely improve our ability to asses in-the-wild usage.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I searched Google, found https://codelabs.developers.google.com/codelabs/bigquery-github/index.html?index=..%2F..index#0, signed into my work GCP account (I was already signed in, but could have used my personal account too) and put my terrible SQL skills to work 😁.

@lildude
Copy link
Member

lildude commented Jun 28, 2019

🤔 Other than the case of dockerfile which is new to me and probably discouraged as the documentation recommends Dockerfile, the rest of this PR sounds like an attempt to cater for personal preference as opposed to officially documented convention. An override is generally the best option for personal preference.

Additionally, there are waaaay too many variations of *[Dd]ockerfile* on GitHub.com to even reasonably attempt to cater for them by filename alone, assuming we could use wildcards or regex (we can't).

@kmehant
Copy link
Author

kmehant commented Jun 28, 2019

Other than the case of dockerfile which is new to me and probably discouraged as the documentation recommends Dockerfile, the rest of this PR sounds like an attempt to cater for personal preference as opposed to officially documented convention. An override is generally the best option for personal preference.

Additionally, there are waaaay too many variations of *[Dd]ockerfile* on GitHub.com to even reasonably attempt to cater for them by filename alone, assuming we could use wildcards or regex (we can't).

How about the extensions such as Dockerfile.something or simply Dockerfile.* How about this. I am sure such are used widely

@lildude
Copy link
Member

lildude commented Jun 28, 2019

How about the extensions such as Dockerfile.something or simply Dockerfile.* How about this. I am sure such are used widely

They are, however you can't use wildcards or a regex for the filename strategy... only precise names can be used.

@kmehant
Copy link
Author

kmehant commented Jun 28, 2019

How about the extensions such as Dockerfile.something or simply Dockerfile.* How about this. I am sure such are used widely

They are, however you can't use wildcards or a regex for the filename strategy... only precise names can be used.

Oh! I get it now then I think this is unresolvable as we can never have GitHub recognize Dockerfile.* files. Then let's close this PR

@pchaigno
Copy link
Contributor

We can probably recognize the most common one.

@stale
Copy link

stale bot commented Aug 27, 2019

This pull request has been automatically marked as stale because it has not had recent activity, and will be closed if no further activity occurs. If this pull request was overlooked, forgotten, or should remain open for any other reason, please reply here to call attention to it and remove the stale status. Thank you for your contributions.

@stale stale bot added the Stale label Aug 27, 2019
@stale
Copy link

stale bot commented Sep 10, 2019

This pull request has been automatically closed because it has not had activity in a long time. Please feel free to reopen it or create a new issue.

@stale stale bot closed this Sep 10, 2019
@github-linguist github-linguist locked as resolved and limited conversation to collaborators Jun 17, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to recognise Dockerfiles with extensions
4 participants