Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 28 changed files with 46 additions and 187 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,221 +1,80 @@ | ||
# Idea #1 | Design Co-pilot | ||
# Bonus Idea | Generative-based Co-pilot | ||
|
||
## 🔎 Idea overview | ||
|
||
Designers create the design and the design system, and the co-pilot handles routine tasks such as carrying that design over to other pages, where no creative work is needed. It acts like a junior designer who produces content from an existing design: control stays with the designer, while the co-pilot augments and scales the work. | ||
Because little academic or practical work exists on generating components and pages as SVG, we propose pivoting the idea. | ||
|
||
## 💡 Feature analysis | ||
### Approach #1 | Editing SVG elements | ||
|
||
The system lets users edit SVG UI elements by describing the desired changes in text; a ChatGPT-like system interprets the description and applies it to the SVG element. | ||
|
||
| Technology readiness | Risks | Complexity | | ||
| ----- | ----- | ---------- | | ||
| 🟢 Ready for implementation | <div style="width: 100pt"> 🟡 Moderate risk | <div style="width: 150pt"> 🟠 Moderately complex | | ||
|
||
|
||
**Technologies** | ||
|
||
Following LLM for SVG editing [[Paper](references/Research%20papers/LLM_for_SVG_Editing.pdf)], the input is an optimized SVG plus a text instruction describing the edit. The language model can be fine-tuned on a target dataset created with raster-to-vector tools. | ||
Instead of generating long SVG text files for each component, we can utilize controllable image generation to **generate the components or page visualization in the existing or target styles.** | ||
|
||
🖼️ **Visualization** | ||
|
||
| Demo #1 | Demo #2 | Demo #3 | | ||
| --- | --- | --- | | ||
| <div style="width: 170pt"> data:image/s3,"s3://crabby-images/24624/246241fdf376331e2e401696fa9469673a8c8ecc" alt="SVGEditDemo1.png" | <div style="width: 170pt"> data:image/s3,"s3://crabby-images/5d9a4/5d9a4a75acef6eefbfaaa64dd0f13c24ad0bd1df" alt="SVGEditDemo2.png" | <div style="width: 170pt"> data:image/s3,"s3://crabby-images/012cd/012cd440db9aa952dc76efe97948328c47898149" alt="SVGEditDemo3.png" | | ||
|
||
**Requirements** | ||
|
||
- ML model: LLM | ||
- SVG optimizer: SVGO [[Github](https://github.com/svg/svgo)] | ||
- Raster-to-vector tool: | ||
- Vectorisation [[Tutorial](https://blog.thea.codes/raster-vectorization-with-python/)][[Sample code](https://gist.github.com/theacodes/2e13e4e05700279734ca4b34df370adb)] | ||
- Vtracer [[Github](https://github.com/visioncortex/vtracer)] | ||
- Input: SVG component + text prompt | ||
- Output: edited SVG component | ||
<details> | ||
<summary>Dataset: Open-source SVG components with text description for finetuning LLM</summary> | ||
|
||
- Iconify: >150,000 open source SVG icons [[Website](https://iconify.design/)] [[Description](https://iconify.design/docs/icons/icon-data.html)] [[Figma Plug-in](https://www.figma.com/community/plugin/735098390272716381/Iconify)] [[Figma Plug-in Github](https://github.com/iconify/iconify-figma)] | ||
|
||
- FIGR-8: containing **17,375 classes** of **1,548,256 images** representing pictograms, ideograms, icons, emoticons or object or conception depictions (*with both png and svg format*) [[Github](https://github.com/marcdemers/FIGR-8)] | ||
|
||
data:image/s3,"s3://crabby-images/3e0d0/3e0d08ce43fc53082b223225b8303f8f6f0265ac" alt="dataset_explanation.png" | ||
|
||
- SVG Repo: with 500,000+ open-licensed SVG vector and icons [[Website](https://www.svgrepo.com/)] | ||
|
||
</details> | ||
<br> | ||
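The input/output contract listed above (SVG component + text prompt in, edited SVG component out) can be sketched as follows. This is a minimal illustration rather than the method from the referenced paper: `build_edit_prompt`, `is_valid_svg`, `edit_svg`, and the stand-in `fake_llm` are hypothetical names, and the prompt wording is an assumption. The only behaviour it demonstrates is that model output is accepted only when it parses as XML with an `<svg>` root.

```python
import xml.etree.ElementTree as ET
from typing import Callable

SVG_NS = "http://www.w3.org/2000/svg"


def build_edit_prompt(svg_text: str, instruction: str) -> str:
    """Combine the (already optimized) SVG and the user's instruction into one prompt."""
    return (
        "You edit SVG UI components. Return only the full edited SVG.\n"
        f"Instruction: {instruction}\n"
        f"SVG:\n{svg_text}"
    )


def is_valid_svg(text: str) -> bool:
    """Accept model output only if it parses as XML with an <svg> root."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return False
    return root.tag in ("svg", f"{{{SVG_NS}}}svg")


def edit_svg(svg_text: str, instruction: str, llm: Callable[[str], str]) -> str:
    """Ask the model for an edit; fall back to the original SVG on invalid output."""
    candidate = llm(build_edit_prompt(svg_text, instruction))
    return candidate if is_valid_svg(candidate) else svg_text


# Hypothetical stand-in for a real model call, used only to exercise the flow.
def fake_llm(prompt: str) -> str:
    svg = prompt.split("SVG:\n", 1)[1]
    return svg.replace('fill="red"', 'fill="blue"')


button = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<rect width="80" height="24" fill="red"/></svg>'
)
edited = edit_svg(button, "Make the button blue", fake_llm)
```

Falling back to the original SVG keeps the designer's component intact when the model returns truncated or malformed markup, which is one of the quality risks named in the cons.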
|
||
**Relevant works** | ||
|
||
[Research] | ||
|
||
- LLM for SVG editing [[Paper](references/Research%20papers/LLM_for_SVG_Editing.pdf)] | ||
|
||
- LLM for image editing [[Github](https://github.com/IDEA-Research/GroundingDINO/blob/main/demo/image_editing_with_groundingdino_gligen.ipynb)]: GroundingDINO [[Github](https://github.com/IDEA-Research/GroundingDINO)] + GLIGEN [[Github](https://github.com/gligen/GLIGEN)] | ||
|
||
- Text prompt for image editing: InstructPix2Pix [[Github](https://github.com/timothybrooks/instruct-pix2pix)]; Prompt-to-Prompt [[Github](https://github.com/google/prompt-to-prompt/)] | ||
|
||
**Pros and Cons** | ||
|
||
🟢 Pros | ||
|
||
- It could leverage foundation models to understand image content in SVG format | ||
- Provides obvious AI performance on a cross-domain task | ||
- Has sufficient dataset | ||
|
||
🔴 Cons | ||
|
||
- Generation quality has a risk of not meeting the designer's requirements since there is limited research on SVG generation | ||
- It might be limited in its ability to generate detailed SVG components | ||
|
||
--- | ||
## 💡 Feature analysis | ||
## Approach #1 | Generative-based Co-pilot | ||
|
||
### Approach #2 | New component generation | ||
|
||
The system takes a text prompt describing the desired component; a ChatGPT-like system interprets the prompt and generates a new SVG component. | ||
The system will allow a user to generate an image of the component or the page based on the input style or layout mockup. | ||
|
||
| Technology readiness | Risks | Complexity | | ||
| ----- | ----- | ---------- | | ||
| <div style="width: 200pt"> 🟡 Some elements are available, but further development and research needed | <div style="width: 150pt"> 🟡 Moderate risk | <div style="width: 130pt"> 🔴 Complex | | ||
|
||
|
||
**Technologies** | ||
|
||
- Following IconShop [[Paper](references/Research%20papers/IconShop.pdf)], which uses a transformer-based method for text-to-SVG. Its dataset usage is simpler than VectorFusion's. | ||
- Following VectorFusion [[Paper](references/Research%20papers/VectorFusion.pdf)]. Prompting stable diffusion -> raster image -> pre-processing (i.e., background removal) -> vectorisation into vector image via Vectorisation [[Tutorial](https://blog.thea.codes/raster-vectorization-with-python/)][[Sample code](https://gist.github.com/theacodes/2e13e4e05700279734ca4b34df370adb)] | ||
or Vtracer [[Github](https://github.com/visioncortex/vtracer)] | ||
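The final vectorisation step of the raster route above can be illustrated with a deliberately naive tracer. It is not how Vtracer or the linked tutorial work (those trace contours and fit curves); it only shows the idea of turning pixels into SVG primitives by emitting one `<rect>` per horizontal run of filled pixels. The function name `bitmap_to_svg` and the nested-list bitmap format are assumptions made for this sketch.

```python
from typing import List


def bitmap_to_svg(bitmap: List[List[int]], cell: int = 1, fill: str = "#000") -> str:
    """Turn each horizontal run of 1-pixels into one <rect> element."""
    height = len(bitmap)
    width = len(bitmap[0]) if bitmap else 0
    rects = []
    for y, row in enumerate(bitmap):
        x = 0
        while x < width:
            if row[x]:
                start = x
                # Extend the run while pixels stay filled.
                while x < width and row[x]:
                    x += 1
                rects.append(
                    f'<rect x="{start * cell}" y="{y * cell}" '
                    f'width="{(x - start) * cell}" height="{cell}" fill="{fill}"/>'
                )
            else:
                x += 1
    body = "".join(rects)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'viewBox="0 0 {width * cell} {height * cell}">{body}</svg>'
    )


# A tiny binary "icon" after background removal (1 = foreground).
icon = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
svg = bitmap_to_svg(icon)
```

Output from such a tracer grows with image detail, which is one reason a dedicated tool is listed for this step.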
| <div style="width: 200pt"> 🟡 Some elements exist but require adaptation | <div style="width: 150pt"> 🟠 Higher than average | <div style="width: 130pt"> 🟠 Moderately complex | | ||
|
||
**Requirements** | ||
### Pipeline description | ||
|
||
- ML model: [IconShop](https://arxiv.org/pdf/2304.14400.pdf) or [VectorFusion](https://ajayj.com/vectorfusion) | ||
|
||
<details> | ||
<summary>Data need:</summary> | ||
|
||
- Iconify: >150,000 open source SVG icons [[Website](https://iconify.design/)] [[Description](https://iconify.design/docs/icons/icon-data.html)] [[Figma Plug-in](https://www.figma.com/community/plugin/735098390272716381/Iconify)] [[Figma Plug-in Github](https://github.com/iconify/iconify-figma)] | ||
|
||
- FIGR-8: containing **17,375 classes** of **1,548,256 images** representing pictograms, ideograms, icons, emoticons or object or conception depictions (*with both png and svg format*) [[Github](https://github.com/marcdemers/FIGR-8)] | ||
|
||
data:image/s3,"s3://crabby-images/3e0d0/3e0d08ce43fc53082b223225b8303f8f6f0265ac" alt="dataset_explanation.png" | ||
Step #1) Input a mockup layout in SVG format. It can be created by the user or generated from text with [[Challenge #1 | Text-to-layout generator](https://github.com/neurons-lab/Penpot-C1_Design-Co-pilot/tree/main/Approach%233-New_layout_generation)] | ||
|
||
- SVG Repo: with 500,000+ open-licensed SVG vector and icons [[Website](https://www.svgrepo.com/)] | ||
|
||
</details> | ||
|
||
**Relevant works** | ||
|
||
[Research] | ||
|
||
- VectorFusion [[Paper](references/Research%20papers/VectorFusion.pdf)]: text-to-image-to-vector method | ||
- IconShop [[Paper](references/Research%20papers/IconShop.pdf)]: The key to the success of IconShop is to exploit the sequential nature of SVG. Design a transformer-based architecture to achieve text-to-SVG. | ||
- with black-and-white icon dataset, [FIGR-8](https://github.com/marcdemers/FIGR-8) | ||
- Raster-to-Vector tool: open-source model Vtracer [[Github](https://github.com/visioncortex/vtracer)] | ||
|
||
[Business solutions] | ||
<details> | ||
<summary>Recraft.ai</summary> | ||
|
||
- References: [[Website](https://www.recraft.ai/)] [[Product Hunt](https://www.producthunt.com/posts/recraft-ai?utm_source=badge-featured&utm_medium=badge&utm_souce=badge-recraft-ai)][[Demo](https://youtu.be/91_i0YcsP0o)] | ||
- Support: (a) text prompt to svg, (b) image modification with prompt, (c) fix issues for user selected region, (d) can specify target styles | ||
- Output format: png, jpg (512x512 & 1024x1024), SVG, Lottie | ||
- **Trial results**: some are impressive; others are not, even for a simple text prompt | ||
- **Awesome ones** | ||
|
||
data:image/s3,"s3://crabby-images/b6270/b627096d469d670ef7a279ed7969879031b2074c" alt="Recraft - robot eating a burger (cartoon).png" | ||
|
||
data:image/s3,"s3://crabby-images/b3fa2/b3fa2cf178cfe136b92dceb7e72cbd82311cfdd7" alt="Recraft - text prompt to svg.png" | ||
|
||
with complex details | ||
data:image/s3,"s3://crabby-images/316f2/316f298bb9b19eea5e464c3eb3dc1098ddd43112" alt="with complex details" | ||
|
||
- **Not impressive ones** | ||
|
||
data:image/s3,"s3://crabby-images/acb09/acb09906cfdb6e571d30f29d852361ecb5a9b865" alt="Recraft - (complex) text prompt to svg.png" | ||
|
||
An unimpressive result, even for the simple prompt “hand” | ||
|
||
data:image/s3,"s3://crabby-images/f06ff/f06ff8ae32ef788c7ef4cf4626cfcd59d0479a5a" alt="Recraft_can't used results.png" | ||
</details> | ||
|
||
<details> | ||
<summary>iconomy.app</summary> | ||
|
||
- Reference: [[Try the Demo](https://run.iconomy.app/)] | ||
- 👍 have web UI; the result is acceptable | ||
|
||
data:image/s3,"s3://crabby-images/c4e3d/c4e3dfc1e0a2c44e76c5235f6b688d657eb4f9dd" alt="UI sample.png" | ||
|
||
- 👎 no API; only 5 free tries | ||
data:image/s3,"s3://crabby-images/9db55/9db552bad3806bf2bb3b7f794c3232e5b4ab7932" alt="Screenshot 2023-06-19 at 14.54.24.png" | ||
|
||
</details> | ||
Step #2) Based on the layout and text prompt, generate compatible images and texts via ControlNet [[Github](https://github.com/lllyasviel/ControlNet)]. | ||
|
||
- Adobe Vectorisation [[Website](https://www.adobe.com/express/feature/image/convert/svg)] | ||
data:image/s3,"s3://crabby-images/9ae65/9ae65863409ac87c733c8aad193bc9889220ddf1" alt="截圖 2023-06-20 下午1.56.04.png" | ||
|
||
Step #3) The user picks several inspiring options for further re-implementation in SVG | ||
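ControlNet conditions image generation on an extra input image such as an edge map, scribble, or segmentation map, so the SVG mockup from Step #1 has to be rasterized before Step #2. The sketch below shows one possible way to do that under strong assumptions: the mockup contains only axis-aligned `<rect>` elements in pixel units, and the hypothetical helper `layout_to_mask` paints each box with its own integer label. It does not call ControlNet itself.

```python
import xml.etree.ElementTree as ET


def layout_to_mask(svg_text, width, height):
    """Rasterize <rect> boxes into a label grid: 0 is background, k is the k-th box."""
    root = ET.fromstring(svg_text)
    mask = [[0] * width for _ in range(height)]
    label = 0
    for node in root.iter():
        if not node.tag.endswith("rect"):
            continue
        label += 1
        x0 = int(float(node.get("x", "0")))
        y0 = int(float(node.get("y", "0")))
        # Clip each box to the canvas so out-of-range mockups do not crash.
        x1 = min(width, x0 + int(float(node.get("width", "0"))))
        y1 = min(height, y0 + int(float(node.get("height", "0"))))
        for y in range(max(0, y0), y1):
            for x in range(max(0, x0), x1):
                mask[y][x] = label  # later boxes paint over earlier ones
    return mask


mockup = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 8 6">'
    '<rect x="0" y="0" width="8" height="2"/>'
    '<rect x="1" y="3" width="3" height="2"/>'
    "</svg>"
)
mask = layout_to_mask(mockup, 8, 6)
```

In a real pipeline the label grid would be converted to a colour image at the generator's resolution; which conditioning type suits UI layouts best is an open question that this sketch does not answer.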
|
||
### Requirements | ||
|
||
**Pros and Cons** | ||
|
||
🟢 Pros | ||
|
||
- Could leverage foundation models to generate a decent raster image (if we choose the raster-to-SVG solution) | ||
- Has two state-of-the-art research solutions with different generative architectures for guidance | ||
- Has a sufficient dataset | ||
|
||
🔴 Cons | ||
- ML model: | ||
- Text-to-layout generator: CLIP [[Github](https://github.com/OpenAI/CLIP)] + UNet | ||
- Layout-to-content generator: ControlNet [[Github](https://github.com/lllyasviel/ControlNet)] + UNet; GLIGEN [[Github](https://github.com/gligen/GLIGEN)] | ||
- Input: text prompt | ||
- Output: layouts and various complete designs | ||
- Dataset: | ||
- Text prompt ↔ layout pairs: follow UI description [[Paper](references/research_papers/UIDiscription.pdf)] to create | ||
- layout ↔ complete design pairs: [Rico’17](https://www.kaggle.com/datasets/onurgunes1993/rico-dataset) | ||
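To make the "layout ↔ complete design" pairing concrete, the sketch below flattens a nested UI hierarchy into a list of labelled boxes that could be stored next to the matching screenshot. The field names `componentLabel`, `bounds`, and `children` are assumptions about how a Rico-style annotation file is structured and should be checked against the actual dataset files; `flatten_layout` and the sample `screen` are hypothetical.

```python
def flatten_layout(node, out=None):
    """Collect (label, box) entries from a nested UI hierarchy, depth first."""
    if out is None:
        out = []
    label = node.get("componentLabel")
    bounds = node.get("bounds")
    if label and bounds:
        # bounds is assumed to be [left, top, right, bottom] in pixels.
        out.append({"label": label, "box": tuple(bounds)})
    for child in node.get("children") or []:
        flatten_layout(child, out)
    return out


# Hand-written sample in the assumed format, not taken from the dataset.
screen = {
    "bounds": [0, 0, 1440, 2560],
    "children": [
        {"componentLabel": "Toolbar", "bounds": [0, 0, 1440, 200]},
        {
            "componentLabel": "Text Button",
            "bounds": [100, 2300, 1340, 2450],
            "children": None,
        },
    ],
}
layout = flatten_layout(screen)
```

Each resulting list is the layout half of a training pair; the design half is the screenshot of the same screen.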
|
||
- No respective open-source Github repo available for checking the performance of the research paper | ||
- Generation quality has the risk of not meeting the designer's requirements since there is limited research on SVG generation | ||
|
||
--- | ||
|
||
### Approach #3 | New layout generation | ||
|
||
The system will take user input about the desired layout and generate a new page based on the existing elements. The idea is to use a top-down approach: first generate a layout, then populate it with elements. | ||
|
||
| Technology readiness | Risks | Complexity | | ||
| ----- | ----- | ---------- | | ||
| <div style="width: 200pt"> 🟡 Some elements are available, but further development and research needed | <div style="width: 150pt"> 🟡 Moderate risk | <div style="width: 130pt"> 🔴 Complex | | ||
|
||
**Technology** | ||
|
||
Pre-process SVG layouts → Clean by replacing elements with placeholders | ||
→ Input to the LLM | ||
→ The model outputs a high-level layout based on the prompt | ||
→ Post-process by replacing the placeholders with the original SVG elements | ||
→ Output the final layout. | ||
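The placeholder round trip in this pipeline can be sketched with the standard library alone. This is one possible reading of the steps, not a confirmed implementation: `to_placeholders` and `restore` are hypothetical helpers, only top-level children of the `<svg>` root are handled, and the model step is replaced by a manual string edit.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize without an "ns0:" prefix
GEOMETRY = ("x", "y", "width", "height")


def to_placeholders(svg_text):
    """Replace each top-level child with a bare placeholder rect; keep the originals."""
    root = ET.fromstring(svg_text)
    originals = {}
    for index, child in enumerate(list(root)):
        key = f"ph{index}"
        originals[key] = child
        placeholder = ET.Element(f"{{{SVG_NS}}}rect", {"id": key})
        # Copy only geometry so the model sees position and size, not content.
        for attr in GEOMETRY:
            if attr in child.attrib:
                placeholder.set(attr, child.attrib[attr])
        root[index] = placeholder
    return ET.tostring(root, encoding="unicode"), originals


def restore(layout_text, originals):
    """Swap placeholders in the model's layout back to the stored elements."""
    root = ET.fromstring(layout_text)
    for index, node in enumerate(list(root)):
        original = originals.get(node.get("id"))
        if original is not None:
            # Keep the geometry chosen by the model and the original's content.
            for attr in GEOMETRY:
                if attr in node.attrib:
                    original.set(attr, node.attrib[attr])
            root[index] = original
    return ET.tostring(root, encoding="unicode")


page = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<image id="hero" x="0" y="0" width="100" height="40" href="hero.png"/>'
    '<rect id="cta" x="10" y="50" width="30" height="10" fill="green"/>'
    "</svg>"
)
skeleton, originals = to_placeholders(page)
# Hypothetical model step: here the second placeholder is moved by hand.
new_layout = skeleton.replace('y="50"', 'y="60"')
final_svg = restore(new_layout, originals)
```

Because the model only sees placeholder geometry, the prompt stays short regardless of how detailed the original elements are, which matches the computation-cost argument in the pros.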
|
||
**Relevant works** | ||
### Relevant works | ||
|
||
[Research] | ||
|
||
- Layout generation [[Paper](references/Research%20papers/LayoutFormer++.pdf)] | ||
- Apple'21 on parsing a layout into components [[Website](https://blog.ml.cmu.edu/2021/12/10/understanding-user-interfaces-with-screen-parsing/)] | ||
|
||
- Layout generation with Transformers [[Paper](/references/Research%20papers/GUILGET.pdf)] | ||
data:image/s3,"s3://crabby-images/0d5c4/0d5c4e2b996a21d902492bc763bc24e98f8b46c6" alt="1ODBmbHSwFRMgGTreeZLegw.gif" | ||
|
||
[Business solutions] | ||
- LayoutDM [[Github](https://cyberagentailab.github.io/layout-dm/)] | ||
- BoostingGUI: first attempts at controllable page generation [[Paper](references/research_papers/BoostingGUI.pdf)] | ||
- UI description [[Paper](references/research_papers/UIDiscription.pdf)] | ||
- Text-to-image generator: | ||
- Latent Diffusion Model [[Paper](references/research_papers/LDM.pdf)] [[Github](https://github.com/CompVis/latent-diffusion)] | ||
|
||
- Galileo AI · Copilot for interface design [[Website](https://www.usegalileo.ai/)] | ||
- Composable-Diffusion: supports compositional text prompts [[Paper](references/research_papers/Compositional-Visual-Generation-with-Composable-Diffusion-Models.pdf)] [[Github](https://github.com/energy-based-model/Compositional-Visual-Generation-with-Composable-Diffusion-Models-PyTorch)] | ||
|
||
- WireGen - AI GPT wireframe generation [[Figma Plug-in](https://www.figma.com/community/plugin/1221144015267698736/WireGen---AI-GPT-wireframe-generation)] | ||
- ControlNet: adding more control by image/sketch [[Paper](references/research_papers/ControlNet.pdf)] [[Github](https://github.com/lllyasviel/ControlNet)] [[WebUI extension](https://github.com/Mikubill/sd-webui-controlnet)] | ||
|
||
- Builder.io - Generate designs with AI & export to code [[Figma Plug-in](https://www.figma.com/community/plugin/747985167520967365/Builder.io---Generate-designs-with-AI-&-export-to-code/Builder.io---Generate-designs-with-AI-&-export-to-code)] | ||
- Editing images: | ||
- GroundingDINO: regional image editing [[Github](https://github.com/IDEA-Research/GroundingDINO/blob/main/demo/image_editing_with_groundingdino_gligen.ipynb)] | ||
- Drag-your-GAN [[Website](https://vcai.mpi-inf.mpg.de/projects/DragGAN/)] [[Github](https://github.com/XingangPan/DragGAN)] | ||
- Edit Everything [[paper](references/research_papers/EditEverything.pdf)] [[Github](https://github.com/DefengXie/Edit_Everything)] | ||
|
||
**Pros and Cons** | ||
[Business Solutions] | ||
|
||
- Microsoft Designer [[Website](https://designer.microsoft.com/)] [[Demo](https://youtu.be/vQK-E_Mzeq0)] | ||
|
||
### Pros and Cons | ||
|
||
🟢 Pros | ||
|
||
- The ability to generate new layouts based on user inputs and existing elements allows for highly adaptive and custom designs | ||
- The model generates a high-level layout based on the prompt, which would save computation cost and time for outputting full-length SVG with content. | ||
- An existing high-quality open-source solution for image generation | ||
- More research-active area | ||
|
||
🔴 Cons | ||
|
||
- No respective open-source Github repo is available for checking the performance of the research paper | ||
- Generation quality has the risk of not meeting the designer's requirements since there is limited research on SVG generation | ||
|
||
--- | ||
|
||
## 🏁 Final recommendation | ||
|
||
[A1] **Editing SVG elements** using a language model appears more feasible in the short term. [A3] **New layout generation** could be more effective for clients, but the lack of research and existing methods may make it harder to implement. [A2] **New component generation**, with its two suggested implementation paths, is the most complex of the three. | ||
- Requires diverse new dataset and additional pre-processing | ||
- Generation quality should be usable enough for the design application |
Binary file not shown.
Empty file.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Empty file.
Binary file not shown.
Empty file.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file removed: reports/figures/Recraft_-_text_prompt_to_svg_with_extreme_details.png (-725 KB)
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.