Using the sglang framework combined with outlines to format the output using DeepSeek-R1-Distill-Qwen-1.5B. #1406

llmadd · 2025-02-07T08:56:23Z


My English is not very good, but the issue is roughly as follows. When I constrain the output with a regular expression, the content keeps looping or gets stuck in the 'think' process. Even with a relatively simple regular expression, such as the one in the example below that extracts a number, the output is repeated multiple times. JSON-formatted output does not have this issue.

from openai import OpenAI

# Placeholder key, not a real one.
client = OpenAI(api_key="sk-proj-01234567890123456789012345678901", base_url="http://127.0.0.1:8080/v1")

response = client.chat.completions.create(
    model="/home/qichen/qichen/DeepSeek-R1-Distill-Qwen-1.5B",
    messages=[
        {
            "role": "user",
            # English: "In 2025, studying deep learning in Pudong New Area, Shanghai"
            "content": "2025年，在上海浦东新区学习深度学习",
        },
    ],
    stream=True,
    temperature=0.6,
    max_tokens=4096,
    # Constrain the output to an integer or decimal number.
    extra_body={"regex": r"\d+(\.\d+)?"},
)

for chunk in response:
    # delta.content can be None on some chunks, so fall back to an empty string.
    print(chunk.choices[0].delta.content or "", end="", flush=True)

Here are some examples of unreasonable outputs:

Outputting content that doesn't exist.
Repeatedly outputting the same content.
Output that doesn't follow the expected content.
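One possible contributor to the repetition, offered as an assumption rather than a diagnosis: `\d+` places no upper bound on the number of digits, so a long run of repeated digits is still a valid match and the constraint alone never forces the model to stop. The sketch below uses Python's `re` module to compare the pattern from the post with a bounded variant. The bounded pattern and the limits in it are hypothetical, and the regex dialect used by the constrained decoder may differ from Python's.

```python
import re

# Pattern from the post: any number of digits is a valid match.
unbounded = r"\d+(\.\d+)?"
# Hypothetical bounded variant: at most 4 integer digits and 2 decimal digits.
bounded = r"\d{1,4}(\.\d{1,2})?"

# 200 digits, standing in for a repeated output.
looping = "2025" * 50

unbounded_accepts_loop = re.fullmatch(unbounded, looping) is not None
bounded_accepts_loop = re.fullmatch(bounded, looping) is not None
bounded_accepts_year = re.fullmatch(bounded, "2025") is not None

print(unbounded_accepts_loop, bounded_accepts_loop, bounded_accepts_year)
```

With a bounded pattern the constrained output has a maximum length, which caps the damage from looping even if it does not address the underlying model behavior.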

cpfiffer · 2025-02-07T21:26:07Z

Collaborator

Your dummy API key freaked me out for a second, thought it was a real key. 😌

Part of the issue is that the regular expression needed to work with DeepSeek is somewhat complicated. There's an example here for limiting the number of thinking tokens, but the compilation time can be quite long.

I have an example of how to do this here: https://gist.github.com/cpfiffer/e98fc71d4fcb35ba827fc9e679112895

Can you take a look at that and see if it helps?

4 replies

llmadd Feb 8, 2025
Author

Haha, that was a fake key. Thank you for your reply. I tried to reproduce your code.
The current situation is roughly as follows.
When setting

thinking_regex = r'<think>()<\/think>(\s)*'

the formatted answer is output correctly across multiple attempts.
When setting

thinking_regex = r'<think>((?:.|\n|\s){' + str(NUM_THINKING_CHARACTERS) + r'}?)<\/think>(\s)*'

a small NUM_THINKING_CHARACTERS value such as 10 produces both the think content and the formatted answer correctly. With a larger value, such as 100, the output is mostly blank.
When setting

thinking_regex = r'<think>((?:.|\n|\s)*?)<\/think>(\s)*'

generation never completes successfully. Every time, it either repeats indefinitely inside the think content, or the think block is empty and the answer repeats indefinitely. I think it might be a model issue: smaller models have a higher probability of repeating content.
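A note on the second pattern: in Python's `re` syntax, `{N}?` is a lazy form of an exact count, so it still requires exactly N characters inside the think block. If the constrained decoder treats it the same way (an assumption), the model has to fill all N characters, which might explain the mostly blank output at N = 100. The sketch below checks this with `re`. The `{0,N}` variant is a hypothetical alternative, not something from the gist.

```python
import re

NUM_THINKING_CHARACTERS = 10  # small value so the strings stay readable

# Pattern from the post: needs exactly NUM_THINKING_CHARACTERS characters.
exact = r'<think>((?:.|\n|\s){' + str(NUM_THINKING_CHARACTERS) + r'}?)<\/think>(\s)*'
# Hypothetical alternative: anything from 0 up to NUM_THINKING_CHARACTERS characters.
bounded = r'<think>((?:.|\n|\s){0,' + str(NUM_THINKING_CHARACTERS) + r'})<\/think>(\s)*'

short = "<think>No.</think>\n"          # 3 thinking characters
padded = "<think>No.       </think>\n"  # 3 characters plus 7 spaces = 10

exact_short = re.fullmatch(exact, short) is not None
exact_padded = re.fullmatch(exact, padded) is not None
bounded_short = re.fullmatch(bounded, short) is not None

print(exact_short, exact_padded, bounded_short)
```

The exact-count pattern only accepts the short thought once it is padded to the full length, while the bounded pattern accepts it as written.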

This is my code. I used the OpenAI SDK and the sglang framework combined with Outlines.

from typing import Literal

from openai import OpenAI
from outlines.fsm.json_schema import convert_json_schema_to_str
from outlines_core.fsm.json_schema import build_regex_from_schema
from pydantic import BaseModel


class YesNo(BaseModel):
    answer: Literal['yes', 'no']


# Placeholder key, not a real one.
client = OpenAI(api_key="sk-proj-01234567890123456789012345678901", base_url="http://127.0.0.1:8080/v1")

NUM_THINKING_CHARACTERS = 100

# Alternatives tried:
# thinking_regex = r'<think>((?:.|\n|\s)*?)<\/think>(\s)*'
# thinking_regex = r'<think>()<\/think>(\s)*'
thinking_regex = r'<think>((?:.|\n|\s){' + str(NUM_THINKING_CHARACTERS) + r'}?)<\/think>(\s)*'
# Regex that matches JSON conforming to the YesNo schema.
other_regex = build_regex_from_schema(convert_json_schema_to_str(YesNo))
# Think block first, then the structured answer.
prompt_regex = thinking_regex + other_regex

response = client.chat.completions.create(
    model="/home/qichen/qichen/DeepSeek-R1-Distill-Qwen-1.5B",
    messages=[
        {
            "role": "user",
            "content": r"""Roses are red. Violets are blue. Are roses and violets the same color? Yes or no. Provide a chain of thought inside a <think> tag, closing with </think> when you are finished. After, please write JSON with the following schema: {"answer": "yes" | "no"}""",
        },
    ],
    stream=True,
    temperature=0.6,
    max_tokens=1000,
    extra_body={"regex": prompt_regex},
)

for chunk in response:
    # delta.content can be None on some chunks, so fall back to an empty string.
    print(chunk.choices[0].delta.content or "", end="", flush=True)

Some running results.
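Once the streamed chunks from a request like the one above are concatenated, the think block and the JSON answer can be separated with the standard library. This is a minimal sketch: the `output` string is a made-up example, not a real result from the model.

```python
import json
import re

# Hypothetical collected stream text; real model output will differ.
output = '<think>Red and blue are different colors.</think>\n{"answer": "no"}'

# Capture the thinking text and the JSON object that follows it.
match = re.fullmatch(r"<think>(.*?)</think>\s*(\{.*\})", output, flags=re.DOTALL)
thinking = match.group(1) if match else None
answer = json.loads(match.group(2))["answer"] if match else None

print(thinking)
print(answer)
```

If `match` is `None`, the output did not follow the think-then-JSON layout, which is a cheap way to detect the looping cases described above.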

llmadd Feb 8, 2025
Author

Skipping the regex and requesting JSON-formatted output directly works better. The format is more stable and less prone to unexpected issues. >_<
Code example:

# Reuses `client` and `YesNo` from the previous snippet.
response = client.chat.completions.create(
    model="/home/qichen/qichen/DeepSeek-R1-Distill-Qwen-1.5B",
    messages=[
        {
            "role": "user",
            "content": r"""Roses are red. Violets are blue. Are roses and violets the same color? Yes or no. Provide a chain of thought inside a <think> tag, closing with </think> when you are finished. After, please write JSON with the following schema: {"answer": "yes" | "no"}""",
        },
    ],
    stream=True,
    temperature=0.6,
    max_tokens=1000,
    # extra_body={"regex": prompt_regex},
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "foo",
            "schema": YesNo.model_json_schema(),
        },
    },
)

cpfiffer Feb 14, 2025
Collaborator

The regex is indeed pretty flimsy -- doing this with a regular expression is not my favorite.

However, simply using JSON output defeats most of the purpose of using R1, as no thinking tokens will be generated. A better way to handle this is to define separate logit processors for different blocks, such as allowing fully unstructured text within think blocks and structured JSON after that block.

I wrote up an issue on this here but the regular expression stuff may have to do for now. Could you throw a +1 on that issue or something to increase its priority?
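The idea of separate constraints per block can be illustrated with a toy state machine: every token is allowed until `</think>` appears, and only tokens that keep the answer on track are allowed afterwards. Everything in this sketch is hypothetical (`TwoPhaseConstraint`, `yes_no_prefixes`, the string-token vocabulary). It is not an Outlines or sglang API, and a real logits processor would work on token ids and logits rather than strings.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Set

# Toy vocabulary and the two valid answers (illustrative only).
VOCAB = ["Roses", " differ", "</think>", '{"answer": "', "yes", "no", '"}', "maybe"]
TARGETS = ['{"answer": "yes"}', '{"answer": "no"}']


def yes_no_prefixes(answer_so_far: str) -> Set[str]:
    """Tokens that keep the answer a prefix of one of the valid JSON strings."""
    return {tok for tok in VOCAB if any(t.startswith(answer_so_far + tok) for t in TARGETS)}


@dataclass
class TwoPhaseConstraint:
    """Unconstrained inside the think block, structured after it."""
    structured_allowed: Callable[[str], Set[str]]
    close_tag: str = "</think>"
    generated: str = ""

    def allowed_tokens(self, vocabulary: Iterable[str]) -> Set[str]:
        vocab = set(vocabulary)
        # Phase 1: the think block is still open, so every token is allowed.
        if self.close_tag not in self.generated:
            return vocab
        # Phase 2: only the text after the close tag is subject to the structure.
        answer_so_far = self.generated.split(self.close_tag, 1)[1]
        return self.structured_allowed(answer_so_far) & vocab

    def advance(self, token: str) -> None:
        self.generated += token


constraint = TwoPhaseConstraint(yes_no_prefixes)
free_phase = constraint.allowed_tokens(VOCAB)
for token in ["Roses", " differ", "</think>"]:
    constraint.advance(token)
after_think = constraint.allowed_tokens(VOCAB)
constraint.advance('{"answer": "')
after_key = constraint.allowed_tokens(VOCAB)

print(after_think, after_key)
```

Compared with one large regex, this keeps the thinking phase cheap (no pattern to compile for free text) and applies the schema only where it matters.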

llmadd Feb 18, 2025
Author

Alright, I will continue to pay attention to these issues. Thank you for your work.
