Commit
Added Ganga GSoC 2024 proposal (#1492)
* Added Ganga GSoC 2024 proposal
* Fix two typos
* Update _gsocproposals/2024/proposal_GangaAIassistant.md (fix typo)

Co-authored-by: Valentin Volkl <[email protected]>
Showing 4 changed files with 77 additions and 0 deletions.
@@ -0,0 +1,11 @@
---
title: "Imperial College London"
author: "Enric Tejedor"
layout: default
organization: ImperialCollege
logo: Imperial-College-London2.png
description: |
  [Imperial College London](https://www.imperial.ac.uk/) is a world top ten university with an international reputation for excellence in teaching and research. Consistently rated amongst the world's best universities, Imperial is committed to developing the next generation of researchers, scientists and academics through collaboration across disciplines. Located in the heart of London, Imperial is a multidisciplinary space for education, research, translation and commercialisation, harnessing science and innovation to tackle global challenges.
---

{% include gsoc_proposal.ext %}
@@ -0,0 +1,11 @@
---
title: "Monash University"
author: "Ulrik Egede"
layout: default
organization: MonashUniversity
logo: Monash.png
description: |
  [Monash University](https://www.monash.edu/) is one of Australia's leading universities and ranks among the world's top 100. We help change lives through research and education.
---

{% include gsoc_proposal.ext %}
@@ -0,0 +1,11 @@
---
project: Ganga
layout: default
logo: ganga_logo_150dpi.png
description: |
  [Ganga](https://github.com/ganga-devs/ganga) is a computational task-management tool, which allows for the specification, submission, bookkeeping and post-processing of computational tasks on a wide set of distributed resources.
  Ganga has been developed to solve a problem increasingly common in scientific projects, which is that researchers must regularly switch between different processing systems, each with its own command set, to complete their computational tasks. Ganga provides a homogeneous environment for processing data on heterogeneous resources.
---

{% include gsoc_project.ext %}
@@ -0,0 +1,44 @@
---
project: Ganga
title: Incorporate a Large Language Model to assist users
layout: gsoc_proposal
year: 2024
difficulty: medium
duration: 350
mentor_avail: May-November
organization:
- ImperialCollege
- MonashUniversity
---

## Description
The amount of data processed by individual scientists has grown hugely in the past decade. It is not unusual for a user to have data processed on tens of thousands of processors located at tens of different sites across the globe. The [Ganga](https://github.com/ganga-devs/ganga) user interface was created to manage such large calculations. It helps the user to prepare the calculations, submit the tasks to a resource broker, keep track of which parts of the task have been completed, and put everything together at the end.

Because Ganga is a scripting and command-line interface, some users will naturally have trouble getting the syntax right. To solve such problems, they often spend time searching through mailing lists, FAQs and discussion fora, or simply wait for a more experienced coder to debug their problem. The idea of this project is to integrate a Large Language Model (LLM) into the Ganga command prompt. This should allow users to describe in words what they would like to do and get an example that they can incorporate. It should also intercept exceptions thrown by the Ganga interface, help the user to understand them, and propose solutions.
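
As a rough illustration of the exception-interception idea, the sketch below runs one line of user input and, if it fails, turns the traceback into a prompt for a model. It is not Ganga code: `build_error_prompt`, `run_with_assistance` and the `ask_llm` callable are hypothetical names, and `ask_llm` stands in for whatever local or hosted LLM client the project ends up choosing.

```python
import traceback
from typing import Callable, Optional

# Hypothetical interface: any callable that maps a prompt string to a reply.
# A real integration would wrap an actual LLM client here.
AskLLM = Callable[[str], str]


def build_error_prompt(exc: BaseException, user_input: str) -> str:
    """Turn a caught exception and the offending input into an LLM prompt."""
    tb_text = "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )
    return (
        "A user of the Ganga prompt ran the following input:\n"
        f"{user_input}\n\n"
        "It raised this exception:\n"
        f"{tb_text}\n"
        "Explain the likely cause and propose a corrected command."
    )


def run_with_assistance(
    user_input: str, namespace: dict, ask_llm: AskLLM
) -> Optional[str]:
    """Execute one line of input; on failure, return the model's explanation."""
    try:
        exec(user_input, namespace)
    except Exception as exc:  # deliberately broad: any user error is of interest
        return ask_llm(build_error_prompt(exc, user_input))
    return None  # the input ran cleanly, so there is nothing to explain
```

Passing the model in as a plain callable keeps the prompt logic independent of any particular LLM backend, which should also help keep the installation overhead low.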

## Task ideas
* Explore the different options for integrating an LLM into a command line prompt. The ideas in [GPTerm](https://github.com/ademakdogan/GPTerm) and [Codeium](https://codeium.com/) might act as inspiration.
* Integrate the interaction with an LLM into the Ganga prompt.
* Minimise the installation overhead for adding an LLM interaction to Ganga.
* Improve the LLM by adding local training samples obtained from the logs of existing Ganga users.
* Understand how priming the LLM can improve its accuracy.
* Develop continuous integration tests that ensure the LLM integration keeps working.
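
The priming and testing ideas above could be approached along the following lines. This is a minimal sketch under stated assumptions: the `Example` class, `PRIMING_EXAMPLES`, `build_primed_prompt` and `answer_question` are hypothetical names, and the example pairs are illustrative only; a real set would be curated from the Ganga documentation and user logs.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Example:
    """One worked request/answer pair used to prime the model."""
    request: str
    answer: str


# Illustrative pairs only (assumption); not an official or complete set.
PRIMING_EXAMPLES = (
    Example("Create a job and submit it", "j = Job()\nj.submit()"),
    Example("Show all my jobs", "jobs"),
)


def build_primed_prompt(
    question: str, examples: Sequence[Example] = PRIMING_EXAMPLES
) -> str:
    """Prefix the user's question with instructions and worked examples."""
    parts = [
        "You are an assistant for the Ganga command prompt.",
        "Answer with a short Python snippet that runs in Ganga.",
    ]
    for ex in examples:
        parts.append(f"Request: {ex.request}\nAnswer:\n{ex.answer}")
    # The final, unanswered request is the one the model should complete.
    parts.append(f"Request: {question}\nAnswer:")
    return "\n\n".join(parts)


def answer_question(question: str, ask_llm: Callable[[str], str]) -> str:
    """Send the primed prompt to the model and return its stripped reply."""
    return ask_llm(build_primed_prompt(question)).strip()
```

In a continuous integration test, `ask_llm` can be replaced by a stub that returns a fixed string, so the prompt construction and reply handling are checked without network access or a model download.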

## Expected results
For the scientific users of Ganga, this will speed up their development cycle, as they will get faster responses to their usage queries.

As a student, you will gain experience with the challenges of large-scale computing, where some tasks in a large processing chain might take several days to process, have intermittent failures, and involve thousands of tasks processing in parallel. You will get experience with how LLMs can be integrated directly into projects to assist users.

## Evaluation Task
Interested students should contact Ulrik (see contact below) to ask questions and request an evaluation task.

## Requirements
Python programming (advanced), Linux command line experience (intermediate), use of git for code development and continuous integration testing (intermediate).

## Mentors
* [Alex Richards](mailto:[email protected])
* [Mark Smith](mailto:[email protected])
* **[Ulrik Egede](mailto:[email protected])**

## Links
* [Ganga](https://github.com/ganga-devs/ganga)