Add ability to recognize dockerfiles with extensions #4567

Closed
3 changes: 3 additions & 0 deletions lib/linguist/languages.yml
@@ -1110,6 +1110,9 @@ Dockerfile:
   - ".dockerfile"
   filenames:
   - Dockerfile
+  - "dockerfile"
Contributor

Is the lower-case D common? I've never encountered it before.

@lildude Any way we can search for that on GitHub.com? Maybe through Google BigQuery?

Member

Not on GitHub itself as there's no case-sensitive search. I have no idea about BigQuery - never used it.

Member

> I have no idea about BigQuery - never used it.

I have now. 🎉

There are only 403 instances of lower-case dockerfile in the BigQuery dataset according to this query:

SELECT repo_name, path
FROM `bigquery-public-data.github_repos.files`
WHERE path LIKE '%/dockerfile'
GROUP BY repo_name, path
LIMIT 5000

There are well over 5000 examples of Dockerfile though 😄
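
One caveat on the query above, sketched in Python: the pattern '%/dockerfile' requires a slash before the file name, so it would miss a dockerfile at the root of a repository if the dataset stores paths relative to the repository root without a leading slash (an assumption here, not something confirmed in this thread). The sample paths and helper names below are made up for illustration.

```python
import posixpath


def like_slash_dockerfile(path: str) -> bool:
    """Emulate SQL `path LIKE '%/dockerfile'`: case-sensitive, needs a '/' before the name."""
    return path.endswith("/dockerfile")


def basename_is(path: str, name: str) -> bool:
    """Case-sensitive comparison of the final path component only."""
    return posixpath.basename(path) == name


# Hypothetical repo-relative paths (assumed to have no leading slash).
paths = [
    "dockerfile",
    "app/dockerfile",
    "app/Dockerfile",
    "ci/mydockerfile",
    "docker/dockerfile.dev",
]

like_hits = [p for p in paths if like_slash_dockerfile(p)]
basename_hits = [p for p in paths if basename_is(p, "dockerfile")]

print(like_hits)      # paths the LIKE-style pattern would match
print(basename_hits)  # paths whose file name is exactly "dockerfile"
```

If that assumption holds, the 403 figure is a lower bound, which does not change the conclusion that the lower-case spelling is far rarer than Dockerfile.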

Contributor

> > I have no idea about BigQuery - never used it.
>
> I have now. 🎉

Wow! How did you do that so quickly? Is there a good tutorial for that? It would definitely improve our ability to assess in-the-wild usage.

Member

I searched Google, found https://codelabs.developers.google.com/codelabs/bigquery-github/index.html?index=..%2F..index#0, signed into my work GCP account (I was already signed in, but could have used my personal account too) and put my terrible SQL skills to work 😁.

+  - "Dockerfile.*"
+  - "dockerfile.*"
Contributor

That's not going to work. You'll have to give the explicit name.

Collaborator

That's precisely the issue @kmehant has raised in #4566: Linguist has no way of identifying files based on a common prefix, only a suffix. Although that gives me an idea...
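
To make the point concrete, here is a rough Python sketch of the matching behaviour described in this thread: entries under filenames are compared against the whole file name, and entries under extensions are compared as a suffix. It is a simplified model for illustration only, not Linguist's actual code; the LANGUAGES table and the detect function are hypothetical.

```python
import posixpath

# Hypothetical, trimmed-down stand-in for the Dockerfile entry in languages.yml,
# including the wildcard-looking entry proposed in this PR.
LANGUAGES = {
    "Dockerfile": {
        "extensions": [".dockerfile"],
        "filenames": ["Dockerfile", "dockerfile", "Dockerfile.*"],
    },
}


def detect(path):
    """Return the language name for a path, or None if nothing matches."""
    name = posixpath.basename(path)
    for language, spec in LANGUAGES.items():
        # File names are compared literally: "Dockerfile.*" is not a glob.
        if name in spec["filenames"]:
            return language
        # Extensions are compared as a suffix of the file name.
        if any(name.endswith(ext) for ext in spec["extensions"]):
            return language
    return None


print(detect("Dockerfile"))      # exact file name entry
print(detect("web.dockerfile"))  # suffix (extension) entry
print(detect("Dockerfile.dev"))  # common prefix only, no entry applies
```

Under this model, Dockerfile.dev is only recognized if that exact name is listed, which is why the wildcard entries would have no effect.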

   ace_mode: dockerfile
   codemirror_mode: dockerfile
   codemirror_mime_type: text/x-dockerfile