zucchini-nlp
🦄 To code or not to code

Organizations

@huggingface @deeppavlov

MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.

Python 935 84 Updated Mar 9, 2025

Witness the aha moment of VLM with less than $3.

Python 3,140 245 Updated Mar 1, 2025

Janus-Series: Unified Multimodal Understanding and Generation Models

Python 16,658 2,182 Updated Feb 1, 2025

Language Quantized AutoEncoders

Python 101 5 Updated Feb 7, 2023

Cosmos is a world model development platform that consists of world foundation models, tokenizers and video processing pipeline to accelerate the development of Physical AI at Robotics & AV labs. C…

Jupyter Notebook 7,655 492 Updated Mar 7, 2025

Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

Cuda 1,103 66 Updated Feb 28, 2025

Investigating the Detection of ChatGPT-Generated Texts across Radiology Reports

Jupyter Notebook 1 Updated Dec 25, 2024

a family of versatile and state-of-the-art video tokenizers.

Python 350 20 Updated Jan 15, 2025

Code for BLT research paper

Python 1,432 110 Updated Mar 5, 2025

LLM KV cache compression made easy

Python 428 29 Updated Mar 5, 2025

The friendly PIL fork

Python 2,214 90 Updated Oct 7, 2024

Python library for reading and writing image data

Python 1,563 305 Updated Feb 21, 2025

PyTorch video decoding

Python 255 23 Updated Mar 10, 2025

Entropy Based Sampling and Parallel CoT Decoding

Python 3,341 319 Updated Nov 13, 2024

A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.

Python 122 7 Updated Jan 30, 2025

PyTorch native post-training library

Python 4,974 550 Updated Mar 10, 2025

End-to-end recipes for optimizing diffusion models with torchao and diffusers (inference and FP8 training).

Python 330 11 Updated Feb 19, 2025

Generate interleaved text and image content in a structured format you can directly pass to downstream APIs.

Python 26 3 Updated Oct 18, 2024

Python Library to evaluate VLM models' robustness across diverse benchmarks

Jupyter Notebook 194 14 Updated Feb 28, 2025

State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!

JavaScript 13,170 872 Updated Mar 7, 2025

✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Python 2,147 164 Updated Feb 13, 2025

MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

Python 18,912 1,358 Updated Mar 3, 2025

When do we not need larger vision models?

Python 373 12 Updated Feb 8, 2025

MIPT RL Course HW Solutions Spring 2024

Jupyter Notebook 1 Updated Jul 2, 2024

tiny vision language model

Python 7,565 585 Updated Feb 25, 2025

The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".

Python 237 8 Updated Dec 26, 2024

Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation

Python 730 42 Updated Aug 5, 2024

Multi-modality pre-training

Python 486 37 Updated May 8, 2024
Jupyter Notebook 1,691 162 Updated Sep 27, 2024