I recently tried to update my movie page but failed: the cinemagoer package could not fetch all the needed information from IMDb. So I decided to try an LLM instead of a rule-based engine to extract movie information. It works, and works pretty well, with tool calling for structured output. In this post I document my understanding of tool calling: what it is for, and how to use it for structured output.

Quick note: the code is tailored to Qwen/Qwen3-4B-Instruct-2507 and mlx_lm. It might need adjustment for other models and frameworks.

from mlx_lm import load, generate
from pprint import pprint
import re
import json

model, tokenizer = load("Qwen/Qwen3-4B-Instruct-2507")

def get_completion(prompt: str, max_tokens: int=128) -> str:
    """Complete a prompt."""
    messages = [dict(role="user", content=prompt)]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )
    resp = generate(
        model,
        tokenizer,
        prompt,
        max_tokens=max_tokens
    )
    return resp
prompt = "What date is it today?"
print(get_completion(prompt, max_tokens=10))
I can't access real-time information, including today

Tool Calling

A stand-alone LLM cannot answer simple questions such as "What date is it today?". Once training is done, the model's internal knowledge is frozen, so it cannot reliably answer questions beyond that knowledge. One common way to extend it is to include the needed context in the prompt. Another is to give the LLM tool calling capability during training, so that it can interact with the external world and fetch the needed context at its own discretion.
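At a high level, tool calling is a loop between the model and our code, which the rest of this post implements piece by piece. Here is a rough sketch, where every name is a placeholder rather than a real API:

# Pseudocode of the tool-calling loop; all names here are placeholders.
messages = [{"role": "user", "content": "What date is it today?"}]
while True:
    resp = llm_generate(messages, tools=tools)  # model sees the tool specs
    messages.append({"role": "assistant", "content": resp})
    calls = parse_out_tool_calls(resp)          # did the model ask for tools?
    if not calls:
        break                                   # final answer reached
    for call in calls:
        result = run_tool(call)                 # execute on our side
        messages.append({"role": "tool", "content": str(result)})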

prompt = "Explain tool calling in one paragraph."
pprint(get_completion(prompt, max_tokens=1024))
('Tool calling is a capability in artificial intelligence, particularly in '
 'large language models (LLMs), that allows the model to invoke external '
 'functions or tools—such as APIs, databases, or software programs—to perform '
 'specific tasks that the model cannot execute directly. Instead of generating '
 'a complete response based solely on its internal knowledge, the model '
 'identifies a relevant action (e.g., searching a database or retrieving '
 'weather data) and "calls" an appropriate tool to obtain the required '
 'information. The tool executes the request, returns the result, and the '
 'model uses that output to complete its response, enabling more accurate, '
 'up-to-date, and contextually relevant answers. This mechanism enhances the '
 "model's ability to interact with the real world and perform complex, "
 'multi-step reasoning.')
prompt = "Explain how LLM obtains tool calling capability in one paragraph."
pprint(get_completion(prompt, max_tokens=1024))
('Large Language Models (LLMs) obtain tool calling capability through a '
 'combination of architectural modifications and training techniques that '
 'enable them to recognize, interpret, and execute external tools or '
 'functions. This is typically achieved by integrating a "tool-use" module '
 'into the model’s architecture, where the model learns to generate structured '
 'prompts that specify which tool to invoke and what input parameters to pass. '
 'During training, the LLM is exposed to examples of human instructions paired '
 'with tool calls and corresponding outputs, allowing it to learn the mapping '
 'between natural language queries and appropriate function invocations. '
 'Additionally, reinforcement learning from human feedback (RLHF) or direct '
 'preference optimization can refine the model’s ability to select the most '
 'relevant tools in context. In some cases, the model is trained with a '
 '"tool-aware" instruction-tuning process, where it learns to detect '
 'tool-relevant phrases and generate function calls in a safe, structured '
 'format. This enables the LLM to act as an intelligent agent capable of '
 'performing complex tasks by leveraging external systems like APIs, '
 'databases, or web search.')

Not all LLMs come with tool calling capability, so we should confirm it before relying on it. We can read the technical report, or simply go through the model card. For example, the model card of Qwen/Qwen3-4B-Instruct-2507 mentions "tool usage" and reports performance on the relevant benchmarks: BFCL v3, Tau 1, and Tau 2. This is strong evidence that the model has tool calling capability.
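We can also check programmatically. As a quick heuristic (my own shortcut, not an official contract), a chat template that branches on tools means the tokenizer knows how to format tool specs:

# heuristic: does the chat template have a tools branch?
template = tokenizer.get_chat_template()
print("{%- if tools %}" in template)  # True for Qwen/Qwen3-4B-Instruct-2507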

The next step is to get some understanding of how the LLM was trained for tool calling. Different LLM providers can structure the input and output differently during training, and we should align our data layout with the training format as much as we can to maximize the model's performance. The chat template gives a good hint. Below is the relevant part of the output from tokenizer.get_chat_template():

{%- if tools %}
    {{- '<|im_start|>system\n' }}
    {%- if messages[0].role == 'system' %}
        {{- messages[0].content + '\n\n' }}
    {%- endif %}
    {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
    {%- for tool in tools %}
        {{- "\n" }}
        {{- tool | tojson }}
    {%- endfor %}
    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}

From Qwen3's chat template, we can see that tools are passed via tokenizer.apply_chat_template and inserted into the system message as a "# Tools" section. The input/output are organized in XML tags, and each tool is formatted to a JSON string by tojson. As for the spec that describes a tool, here is a classic get_current_temperature example:

temperature_tool = {
    "type": "function",
    "function": {
        "name": "get_current_temperature",
        "description": "Get current temperature at a location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The location to get the temperature for, in the format \"City, State, Country\"."
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "The unit to return the temperature in. Defaults to \"celsius\"."
                }
            },
            "required": ["location"]
        }
    }
}

We can pass in tools=[temperature_tool] to tokenizer.apply_chat_template to make the tool available to the model. One issue is that describing a tool in the format above by hand is quite some work. Luckily, this tool format is so commonly used that the tokenizer implementation from transformers can accept plain functions with appropriate docstrings as tools. It converts the functions to the format above with information fetched from the signature and docstring.
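We can inspect that conversion directly. Here is a minimal sketch, assuming a recent transformers version that exposes the get_json_schema helper under transformers.utils:

# Sketch: derive the tool spec from a typed, docstringed function.
# get_json_schema living in transformers.utils is an assumption about your version.
from transformers.utils import get_json_schema

def get_current_temperature(location: str, unit: str = "celsius"):
    """Get current temperature at a location.

    Args:
        location: The location to get the temperature for, in the format "City, State, Country".
        unit: The unit to return the temperature in. Defaults to "celsius".
    """

# prints a dict matching the temperature_tool format above
pprint(get_json_schema(get_current_temperature))

Let's provide a today tool to the model so that it can answer the date of today.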

import datetime

def today():
    """Get today's date in YYYY-MM-DD."""
    today_date = datetime.date.today()
    ret = f"{today_date:%Y-%m-%d}"
    return ret

print(today())
2025-12-25
prompt = "What date is it today?"
messages = [dict(role="user", content=prompt)]
formatted_prompt = tokenizer.apply_chat_template(
    messages,
    tools=[today],
    add_generation_prompt=True,
    tokenize=False
)
print(formatted_prompt)
<|im_start|>system
# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{"type": "function", "function": {"name": "today", "description": "Get today's date in YYYY-MM-DD.", "parameters": {"type": "object", "properties": {}}}}
</tools>

For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call><|im_end|>
<|im_start|>user
What date is it today?<|im_end|>
<|im_start|>assistant

Note the tools=[today] argument to tokenizer.apply_chat_template and how the function is presented as a tool in the system message. With this prompt, the model now has the tool it needs to answer the "date" question.

resp = generate(
    model, tokenizer, formatted_prompt, max_tokens=128
)
print(resp)
<tool_call>
{"name": "today", "arguments": {}}
</tool_call>

Instead of saying it cannot answer the question, the model responds with a request to call the today function to get the information it needs. Looking at the chat template again, we can learn how to pass the function's response back to the model.

{%- elif message.role == "tool" %}
    {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
        {{- '<|im_start|>user' }}
    {%- endif %}
    {{- '\n<tool_response>\n' }}
    {{- content }}
    {{- '\n</tool_response>' }}
    {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
        {{- '<|im_end|>\n' }}
    {%- endif %}
{%- endif %}

We pass the response back with role="tool" and content set to the function's return value, as the sketch below shows.
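To see this template branch in action, here is a minimal sketch that renders a hand-built tool response back into the prompt, reusing messages and resp from the cells above:

followup = messages + [
    dict(role="assistant", content=resp),
    dict(role="tool", content=today()),  # becomes a <tool_response> block
]
print(tokenizer.apply_chat_template(
    followup, tools=[today], add_generation_prompt=True, tokenize=False
))

We could build a sophisticated tool registry and executor on top of this, but for simplicity I assume the tools are defined in globals().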

# match complete <tool_call> blocks, or a truncated trailing one
tool_call_regex = re.compile(
    r"<tool_call>(.*?)</tool_call>|<tool_call>(.*)", re.DOTALL
)

def parse_tool_calls(resp: str) -> list:
    """Parse tool calls in an LLM response."""
    tools = []
    for data in tool_call_regex.findall(resp):
        try:
            tool = json.loads(data[0]) if data[0] else json.loads(data[1])
            tools.append(tool)
        except json.JSONDecodeError:
            # skip a malformed tool call instead of dropping all of them
            continue
    return tools


def execute_tool(tool: dict) -> dict:
    """Execute a tool and wrap its result as a tool message."""
    func_name = tool["name"]
    func_args = tool["arguments"]
    # resolve the function by name; a real application should use a
    # vetted tool registry instead of globals()
    func = globals()[func_name]
    ret = func(**func_args)
    tool_message = dict(role="tool", content=str(ret))
    return tool_message


tool_calls = parse_tool_calls(resp)
print("Tool calls:", tool_calls)
for tool in tool_calls:
    tool_resp = execute_tool(tool)
    print("Tool call response:", tool_resp)
Tool calls: [{'name': 'today', 'arguments': {}}]
Tool call response: {'role': 'tool', 'content': '2025-12-25'}
def get_completion_with_tools(
    prompt: str,
    tools: list|None=None,
    eval_tools: bool = True,
    max_tokens: int = 128,
) -> list[dict]:
    """Complete a prompt with tools."""
    messages = [dict(role="user", content=prompt)]
    # assistant
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, tools=tools
    )
    resp = generate(
        model,
        tokenizer,
        prompt,
        max_tokens=max_tokens
    )
    messages.append(dict(role="assistant", content=resp))

    # tool use
    while eval_tools:
        tool_calls = parse_tool_calls(resp)
        # break the loop if no tool calls
        if not tool_calls:
            break

        # execute tools
        for tool in tool_calls:
            messages.append(execute_tool(tool))

        # run completion again with tool responses
        prompt = tokenizer.apply_chat_template(
            messages, add_generation_prompt=True, tools=tools
        )
        resp = generate(
            model,
            tokenizer,
            prompt,
            max_tokens=max_tokens
        )
        messages.append(dict(role="assistant", content=resp))

    return messages

Let's ask the model for today's date again, this time with the tool provided.

messages = get_completion_with_tools(
    "What date is it today?",
    tools=[today]
)
print(messages[-1]["content"])
Today's date is December 25, 2025.

We can go further to ask for the weekday of today, with another tool passed in.

def get_weekday(date_str: str):
    """Get the weekday of a given date.

    Args:
      date_str: the date string in the format of YYYY-MM-DD.
    """
    date = datetime.datetime.strptime(date_str, "%Y-%m-%d")
    return f"{date:%A}"

messages = get_completion_with_tools(
    "What weekday is it today?",
    tools=[today, get_weekday]
)
print(messages[-1]["content"])
Today is Thursday, December 25, 2025.
# take a look at the sequence of tool calls in the messages
pprint(messages)
[{'content': 'What weekday is it today?', 'role': 'user'},
 {'content': '<tool_call>\n{"name": "today", "arguments": {}}\n</tool_call>',
  'role': 'assistant'},
 {'content': '2025-12-25', 'role': 'tool'},
 {'content': '<tool_call>\n'
             '{"name": "get_weekday", "arguments": {"date_str": '
             '"2025-12-25"}}\n'
             '</tool_call>',
  'role': 'assistant'},
 {'content': 'Thursday', 'role': 'tool'},
 {'content': 'Today is Thursday, December 25, 2025.', 'role': 'assistant'}]

The model is smart enough to invoke today to get today's date and then call get_weekday with the right argument to get the weekday.

Structured Output

One clever use of tool calling is to get structured output instead of plain text. The key observation is that the arguments field of a tool call is already structured JSON. Let's come back to the original quest: extracting movie information from an IMDb page. We will look at the normal, unstructured output first.

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)",
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
}
# Avatar: Fire and Ash
url = "https://www.imdb.com/title/tt1757678/"
page = requests.get(url, headers=headers)
page.raise_for_status()

# get the text from the page
soup = BeautifulSoup(page.content, "html.parser")  # explicit parser avoids a bs4 warning
page_text = soup.get_text()

print(page_text.splitlines()[0])
Avatar: Fire and Ash (2025) - IMDb 
movie_prompt = f"Extract movie information from the given text.\n\n# Text\n{page_text}"
resp = get_completion(movie_prompt, max_tokens=128)
print(resp)
Here is the extracted movie information from the provided text:

---

**Movie Title:** *Avatar: Fire and Ash*  
**Release Year:** 2025  
**Release Date (United States):** December 19, 2025  
**Genre:** Action, Adventure, Epic, Fantasy, Sci-Fi  
**Rating:** PG-13  
**Runtime:** 3 hours 17 minutes (197 minutes)  
**IMDb Rating:** 7.5/10 (based on 46K user ratings)  
**Director:** James Cameron  
**Writers:** James Cameron,

The model is pretty good at extracting the relevant movie information. But one problem is that the output is not structured: we would still need to parse it into fields before we can save it to a database. Another problem is that the model might not focus on the information we want. To solve both issues, we can define a tool that specifies the structure and the information we need to extract.

def Movie(
    title: str,
    rating_value: float,
    rating_count: str,
    desc: str,
    genre: str,
    duration: str,
    publish_date: str,
    stars: str,
    director: str,
    writer: str
):
    """Extracted information for a movie.

    Args:
        title: the title of the movie.
        rating_value: the IMDb rating of the movie on the scale of 0~10.
        rating_count: the number of people for the IMDb rating.
        desc: the plot of the movie.
        genre: the genres of the movie, separated by comma if more than one.
        duration: the duration of the movie.
        publish_date: the release date of the movie.
        stars: the stars of the movie, separated by comma.
        director: the director of the movie, separated by comma if more than one.
        writer: the writer of the movie, separated by comma if more than one.
    """

For structured output, we don't really need the function to do anything (it can, if you want); it is just a way to specify the fields we need and their descriptions. We make use of the signature and docstring to define the structure and pass it to Qwen3 as a tool.

messages = get_completion_with_tools(
    movie_prompt,
    tools=[Movie],
    eval_tools=False, # we don't need to eval the tools
    max_tokens=1024, # make it large enough to hold the output
)
resp = messages[-1]["content"]
pprint(resp)
('<tool_call>\n'
 '{"name": "Movie", "arguments": {"title": "Avatar: Fire and Ash", '
 '"rating_value": 7.5, "rating_count": "46K", "desc": "Jake and Neytiri\'s '
 "family grapples with grief, encountering a new, aggressive Na'vi tribe, the "
 'Ash People, who are led by the fiery Varang, as the conflict on Pandora '
 'escalates and a new moral focus emerges.", "genre": "Action Epic, Adventure '
 'Epic, Epic, Fantasy Epic, Sci-Fi Epic", "duration": "3h 17m", '
 '"publish_date": "December 19, 2025", "stars": "Sam Worthington, Zoe Saldaña, '
 'Sigourney Weaver, Stephen Lang, Oona Chaplin, Kate Winslet, Cliff Curtis, '
 'CCH Pounder, Edie Falco, Brendan Cowell, Jemaine Clement, David Thewlis, '
 'Britain Dalton, Jack Champion", "director": "James Cameron", "writer": '
 '"James Cameron, Rick Jaffa, Amanda Silver"}}\n'
 '</tool_call>')
movie = parse_tool_calls(resp)[0]["arguments"]
pprint(movie)
{'desc': "Jake and Neytiri's family grapples with grief, encountering a new, "
         "aggressive Na'vi tribe, the Ash People, who are led by the fiery "
         'Varang, as the conflict on Pandora escalates and a new moral focus '
         'emerges.',
 'director': 'James Cameron',
 'duration': '3h 17m',
 'genre': 'Action Epic, Adventure Epic, Epic, Fantasy Epic, Sci-Fi Epic',
 'publish_date': 'December 19, 2025',
 'rating_count': '46K',
 'rating_value': 7.5,
 'stars': 'Sam Worthington, Zoe Saldaña, Sigourney Weaver, Stephen Lang, Oona '
          'Chaplin, Kate Winslet, Cliff Curtis, CCH Pounder, Edie Falco, '
          'Brendan Cowell, Jemaine Clement, David Thewlis, Britain Dalton, '
          'Jack Champion',
 'title': 'Avatar: Fire and Ash',
 'writer': 'James Cameron, Rick Jaffa, Amanda Silver'}

Voilà, we get all the movie information we want in a dictionary movie!
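From here, saving the record to a database, the original goal, is one call away. A minimal sketch with a hypothetical SQLite schema:

import sqlite3

# hypothetical schema mirroring the Movie fields; "desc" is quoted
# because it is a reserved word in SQL
conn = sqlite3.connect("movies.db")
conn.execute(
    'CREATE TABLE IF NOT EXISTS movies (title TEXT, rating_value REAL, '
    'rating_count TEXT, "desc" TEXT, genre TEXT, duration TEXT, '
    'publish_date TEXT, stars TEXT, director TEXT, writer TEXT)'
)
conn.execute(
    "INSERT INTO movies VALUES (:title, :rating_value, :rating_count, :desc, "
    ":genre, :duration, :publish_date, :stars, :director, :writer)",
    movie,
)
conn.commit()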

Final words: some may find it odd that we pass in a function for extraction instead of an object. That is a limitation of Qwen3 models. Its documentation for tool calling explicitly calls out that currently only "function" is a valid tool type. If an object type were also supported, we could define Movie as a Pydantic class and pass its JSON schema in as the tool. Though it is not officially supported, you can still try it as an exercise :)
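If you want a head start on that exercise, here is one sketch that stays within the "function" tool type while letting Pydantic generate the parameter schema. The field list is abbreviated, and the whole approach is my own workaround, not an endorsed API:

from pydantic import BaseModel, Field

class MovieSchema(BaseModel):
    title: str = Field(description="the title of the movie.")
    rating_value: float = Field(description="the IMDb rating of the movie on the scale of 0~10.")
    rating_count: str = Field(description="the number of people for the IMDb rating.")
    # ... remaining fields as in Movie above

movie_tool = {
    "type": "function",  # only "function" is valid for Qwen3
    "function": {
        "name": "Movie",
        "description": "Extracted information for a movie.",
        "parameters": MovieSchema.model_json_schema(),
    },
}

# pass tools=[movie_tool] to get_completion_with_tools, then validate:
# movie = MovieSchema.model_validate(parse_tool_calls(resp)[0]["arguments"])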
