Describe process for concatenating/transforming a git repository #11

Open
MandarinConLaBarba opened this issue Oct 5, 2014 · 5 comments

Comments

@MandarinConLaBarba
Contributor

I guess to make it easy we should just support Clojure code out of the gate.

Steps:

  1. Clone the repo locally (we could use the API instead, but it is rate-limited and would be harder to work with)
  2. Exclude all disallowed files: for example, non-source files such as .txt, plus anything under 'node_modules' or anything else that looks like dependency code rather than the repository's own source.
  3. Make sure at least one file in the repository is of a supported type (again, maybe just Clojure for now)
  4. Read all files in by walking the directory tree, concatenating as we go
  5. Transform the code into EDN, and store it in the db in the .edn column. See #9 (Come up w/ rough schema).
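
A rough sketch of steps 2 to 4, written in Python purely for illustration (the project itself targets Clojure). Everything named here is an assumption: `SUPPORTED_EXTENSIONS`, `EXCLUDED_DIRS`, `collect_sources` and `concatenate_sources` are hypothetical, and the sketch uses an allowlist of extensions, so files like .txt drop out without a separate deny list. Cloning and the EDN transform are left out.

```python
import os

# Hypothetical settings, not taken from the project.
SUPPORTED_EXTENSIONS = {".clj"}
EXCLUDED_DIRS = {".git", "node_modules", "target"}


def collect_sources(repo_root):
    """Return relative paths of supported source files, in a stable order."""
    found = []
    for dirpath, dirnames, filenames in os.walk(repo_root):
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = sorted(d for d in dirnames if d not in EXCLUDED_DIRS)
        for name in sorted(filenames):
            if os.path.splitext(name)[1] in SUPPORTED_EXTENSIONS:
                full_path = os.path.join(dirpath, name)
                found.append(os.path.relpath(full_path, repo_root))
    return found


def concatenate_sources(repo_root):
    """Concatenate every supported file; fail if the repo has none (step 3)."""
    paths = collect_sources(repo_root)
    if not paths:
        raise ValueError("no supported source files found")
    chunks = []
    for rel_path in paths:
        with open(os.path.join(repo_root, rel_path), encoding="utf-8") as fh:
            chunks.append(fh.read())
    return "\n".join(chunks)
```

Sorting the directory and file names keeps the concatenated output deterministic across runs, which should matter if the result is stored and compared later.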

Questions

  • Is there a limit to the number of files we'll support?
  • Is there a limit to the size of the EDN we'll support?

cc @j0ni

@j0ni
Member

j0ni commented Oct 5, 2014

So, continuing from #9, I'm not sure the directory structure is particularly important, but the require graph is. I think capturing the directory structure is really just a convenient way to be able to follow the require graph around and find the relevant sources.

@MandarinConLaBarba
Contributor Author

Hmm, OK. So why do you think the require graph is important? So that we can tell what is actually in the program and therefore accurately represent it in a visualization?

@j0ni
Member

j0ni commented Oct 6, 2014

I guess it depends on what you want the visualization to represent.

I figured you're trying to do something which represents the structure of the code, possibly also the semantics of the code. Either way, symbols that are referenced within a given namespace which come from outside that namespace need to be found via the require graph, otherwise they just dangle.

On the other hand, if you're just looking to build a data structure and aren't interested in the relationships between the symbols and their definitions (i.e., def(n)s), then I guess it doesn't matter.
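
To make the "dangling" point concrete, here is a minimal sketch of pulling a require graph out of `ns` forms, in Python for illustration only. It is regex-based and assumes a simple `(:require ...)` clause made of bare symbols and vectors; prefix lists, `:use`, and anything needing a real Clojure reader are not handled. All function names are hypothetical.

```python
import re

NS_NAME = re.compile(r"\(ns\s+([\w.\-]+)")
# Assumes the :require clause itself contains no nested parentheses.
REQUIRE_CLAUSE = re.compile(r"\(:require\s+(.*?)\)", re.S)
TOKEN = re.compile(r"\[|\]|[^\s\[\]]+")


def required_namespaces(source):
    """Return the namespaces named in the first (:require ...) clause."""
    clause = REQUIRE_CLAUSE.search(source)
    if clause is None:
        return []
    libs = []
    depth = 0
    at_vector_start = False
    for token in TOKEN.findall(clause.group(1)):
        if token == "[":
            depth += 1
            at_vector_start = depth == 1
        elif token == "]":
            depth -= 1
            at_vector_start = False
        else:
            # A lib is a bare top-level symbol or the first symbol of a libspec vector.
            if not token.startswith(":") and (depth == 0 or at_vector_start):
                libs.append(token)
            at_vector_start = False
    return libs


def build_require_graph(sources):
    """Map each namespace to the namespaces it requires."""
    graph = {}
    for text in sources:
        name = NS_NAME.search(text)
        if name is not None:
            graph[name.group(1)] = required_namespaces(text)
    return graph


def external_namespaces(graph):
    """Required namespaces with no source in the repo; their symbols would dangle."""
    return {dep for deps in graph.values() for dep in deps if dep not in graph}
```

Given one file declaring `app.core` that requires `app.db` and `clojure.string`, and another declaring `app.db`, `external_namespaces` would report only `clojure.string`, since `app.db` can be followed to a source file in the repo.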

@MandarinConLaBarba
Contributor Author

Well, I think the approach of using the require graph is more correct in the sense that the visualization will accurately reflect the code in the program. But I'm not sure that it will make much of a difference in the actual rendering: so long as the method is consistent, I don't think it will matter that much. However, I do think it's a more interesting story to tell if we're using the actual require graph.

I wonder how hard it will be to develop an accurate graph w/o actually interpreting the code. I'm thinking of the issues w/ older ways to import modules in Clojure (e.g. use vs. require, and so on), or even more difficult, in JavaScript with its various module systems (RequireJS, CommonJS, ES6 modules, etc.).

@MandarinConLaBarba
Contributor Author

Hmm, but actually, if we're using the graph, one might be able to recognize elements of their own program. For example, maybe we have a "trees" viz module that renders a tree per module. In a sample program there are three modules: one with 100 symbols, one with 80, and another with 50. Perhaps the one w/ 100 symbols is rendered as a larger tree than the others, etc.
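
As a sketch of how symbol counts might drive tree size (Python for illustration; the names and the scaling rule are assumptions, not a decided design): count top-level `def`-style forms per module, then scale each module against the largest one.

```python
import re

# Counts only forms that start at the beginning of a line; a rough proxy for
# top-level definitions, not a real parse of the source.
DEF_FORM = re.compile(r"^\((?:def|defn|defn-|defmacro)\s+\S+", re.M)


def count_symbols(source):
    """Count top-level def, defn, defn- and defmacro forms in one module."""
    return len(DEF_FORM.findall(source))


def relative_tree_sizes(symbol_counts):
    """Scale each module's count to the range 0..1 relative to the largest."""
    largest = max(symbol_counts.values(), default=0)
    if largest == 0:
        return {name: 0.0 for name in symbol_counts}
    return {name: count / largest for name, count in symbol_counts.items()}
```

With the sample program above, `relative_tree_sizes({"a": 100, "b": 80, "c": 50})` gives `{"a": 1.0, "b": 0.8, "c": 0.5}`, so the 100-symbol module would be drawn at full size and the others proportionally smaller.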

2 participants