"Artificial Intelligence is the new electricity," according to Andrew Ng. However, unlike the static jolt of a shock, it brings a dynamic buzz to the atmosphere!
Welcome to the article, where we demystify the buzz around The Art of Function Calling and the Enigmatic World of Agents.
In this blog, we delve into function calling, explore its applications, and discuss how LLMs can efficiently use external tools to answer questions and retrieve context. Additionally, we touch upon the role of agents and schema-guided function call generation, providing a comprehensive overview of this essential aspect of LLM projects. Ready to explore how these elements turn your language models into spellbinding smart assistants? Let’s dive in!
Function Calling in LLMs
Function calling is a powerful capability for LLM-powered agents, enabling seamless interaction with external tools via API calls. This capability allows LLMs to invoke predefined functions to perform tasks, process data, and retrieve information. Developers can define custom functions within language models, which the models can then directly invoke to generate structured outputs and provide clear explanations.
Function calling enables developers to create:
- Conversational agents that can efficiently use external tools to answer questions. For example, the query "What is the weather like in Belize?" will be converted to a function call such as get_current_weather(location: string, unit: 'celsius' | 'fahrenheit')
- LLM-powered solutions for extracting and tagging data (e.g., extracting people's names from a Wikipedia article)
- Applications that can help convert natural language to API calls or valid database queries.
- Conversational knowledge retrieval engines that interact with a knowledge base
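To make the weather example a little more concrete, here is a minimal sketch of how a tool like get_current_weather might be described to a model as a JSON Schema, plus a small check on the arguments a model sends back. Only the function name and its two parameters come from the example above; the description strings, the `required` list, and the `validate_arguments` helper are assumptions for illustration.

```python
import json

# Hypothetical tool description for get_current_weather(location, unit).
# The descriptions and the "required" list are illustrative assumptions.
get_current_weather_schema = {
    "name": "get_current_weather",
    "description": "Get the current weather for a location.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City or country name"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["location"],
    },
}

def validate_arguments(schema: dict, arguments: dict) -> list[str]:
    """Return a list of problems found in model-generated arguments."""
    params = schema["parameters"]
    problems = []
    for name in params.get("required", []):
        if name not in arguments:
            problems.append(f"missing required argument: {name}")
    for name, value in arguments.items():
        spec = params["properties"].get(name)
        if spec is None:
            problems.append(f"unexpected argument: {name}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"invalid value for {name}: {value!r}")
    return problems

# What a model might emit for "What is the weather like in Belize?"
model_arguments = json.loads('{"location": "Belize", "unit": "celsius"}')
print(validate_arguments(get_current_weather_schema, model_arguments))  # prints []
```

Checking arguments before executing anything is a cheap safeguard, since the model's JSON is generated text and may not always match the schema.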
Let’s explore how function calling empowers LLMs to generate structured JSON outputs for various function calls.
When an LLM encounters a prompt that requires external tool usage, it intelligently detects the need for function calling. It then generates a JSON object containing arguments for the relevant function. Instead of providing raw text, LLMs organize their output as structured JSON. This makes it easier for downstream applications to manipulate and interpret the data.
1. Example Scenario:
- Imagine an LLM assisting a user with sales data analysis. Here’s how function calling comes into play:
- Defined Functions:
- read_data(): Reads sales data from a CSV file.
- calculate_sales(product: str): Filters sales data based on the specified product name and returns the result as JSON.
- User Prompt:
- User: “Show me sales data for ‘Widget X’.”
- LLM Response:
- LLM detects the need for function calling:
JSON
{
  "function": "calculate_sales",
  "arguments": {
    "product": "Widget X"
  }
}
- The LLM generates a structured JSON object, ready for execution.
2. Benefits of Function Calling:
- Efficiency: LLMs can directly invoke functions, reducing manual steps.
- Flexibility: Developers can define custom functions tailored to specific tasks.
- Structured Output: JSON format ensures clarity and ease of data manipulation.
Function calling transforms LLMs into versatile problem solvers, bridging the gap between language and action.
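The sales scenario above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CSV contents and column names are made up, the data lives in memory rather than in a file, and `execute_call` is a hypothetical helper that runs whatever function the model's JSON names.

```python
import csv
import io
import json

# Hypothetical in-memory CSV standing in for the sales file; the column
# names are assumptions, since the scenario does not specify them.
SALES_CSV = """product,units,revenue
Widget X,10,250.0
Widget Y,4,120.0
Widget X,3,75.0
"""

def read_data() -> list[dict]:
    """Read sales rows from the CSV text."""
    return list(csv.DictReader(io.StringIO(SALES_CSV)))

def calculate_sales(product: str) -> str:
    """Filter rows for one product and return them as a JSON string."""
    rows = [row for row in read_data() if row["product"] == product]
    return json.dumps(rows)

# Registry mapping function names the model may emit to Python callables.
FUNCTIONS = {"calculate_sales": calculate_sales}

def execute_call(call_json: str) -> str:
    """Run the function named in a model-generated JSON call."""
    call = json.loads(call_json)
    func = FUNCTIONS[call["function"]]
    return func(**call.get("arguments", {}))

llm_output = '{"function": "calculate_sales", "arguments": {"product": "Widget X"}}'
print(execute_call(llm_output))
```

The registry keeps the model from invoking anything you have not explicitly exposed: a name that is not in `FUNCTIONS` raises a `KeyError` instead of running arbitrary code.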
If you’d like to explore more about function calling in OpenAI models, check out the detailed guide in the LLM Series and the post “Announcing function calling and JSON mode.” And there you have it: a quirky introduction to the fascinating world of function calling.
Now, let’s dive deeper into the mechanics with an example and unravel the mysteries of those curly braces and parentheses!
Create Chatbots that Answer Questions by Calling External APIs:
- In this example, you can create a chatbot that answers questions by calling external APIs. Let’s say you want to send an email based on natural language input. You define a JSON document with fields like the recipient’s address, email body, date, and time. Then, you use OpenAI Function Calling to generate a JSON object containing these arguments. Finally, using the provided arguments, you call the send_email function in your code.
- Example Python code:
Python
import json
import requests

# Placeholder: substitute your own OpenAI API key.
API_KEY = "your_api_key"
url = "https://api.openai.com/v1/chat/completions"

payload = {
    "model": "gpt-4-0613",
    "messages": [
        {
            "role": "user",
            "content": "Send Cobus from HumanFirst AI an email and ask him for the sales forecast spreadsheet. Schedule the mail for tomorrow at 12 noon."
        }
    ],
    "functions": [
        {
            "name": "send_email",
            "description": "template to have an email sent.",
            # "parameters" must be a JSON Schema object describing the arguments.
            "parameters": {
                "type": "object",
                "properties": {
                    "to_address": {"type": "string", "description": "To address for email"},
                    "body": {"type": "string", "description": "Body of the email"},
                    "date": {"type": "string", "description": "the date the email must be sent."},
                    "time": {"type": "string", "description": "the time the email must be sent."}
                }
            }
        }
    ]
}
headers = {
    "Content-Type": "application/json",
    # The OpenAI API expects a Bearer token.
    "Authorization": f"Bearer {API_KEY}"
}
response = requests.post(url, headers=headers, data=json.dumps(payload))
print(response.text)
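- The response from the model will contain the generated JSON object. When the model decides to call the function, the message in choices[0] carries a function_call field holding the function name and the arguments as a JSON-encoded string.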
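The last step, calling send_email with the generated arguments, might look roughly like this. The sample response is hand-written to mimic the shape of a Chat Completions reply that contains a function call, so no network access is needed; its values, the `send_email` stub, and the `handle_response` helper are all assumptions for illustration.

```python
import json

# Hand-written sample shaped like a Chat Completions reply that contains a
# function call; the values are made up for illustration.
sample_response = {
    "choices": [
        {
            "finish_reason": "function_call",
            "message": {
                "role": "assistant",
                "content": None,
                "function_call": {
                    "name": "send_email",
                    # The arguments arrive as a JSON-encoded string.
                    "arguments": json.dumps({
                        "to_address": "cobus@example.com",
                        "body": "Hi Cobus, could you send the sales forecast spreadsheet?",
                        "date": "tomorrow",
                        "time": "12:00",
                    }),
                },
            },
        }
    ]
}

def send_email(to_address: str, body: str, date: str, time: str) -> str:
    """Hypothetical stand-in: a real version would hand off to a mail service."""
    return f"Queued email to {to_address} for {date} at {time}"

def handle_response(response: dict) -> str:
    """Dispatch a function call if the model produced one, else return its text."""
    message = response["choices"][0]["message"]
    call = message.get("function_call")
    if call is None:
        return message.get("content") or ""
    if call["name"] != "send_email":
        raise ValueError(f"unknown function: {call['name']}")
    arguments = json.loads(call["arguments"])
    return send_email(**arguments)

print(handle_response(sample_response))
```

Note that the model only proposes the call. Your code decides whether to run it, which is a natural place to add confirmation prompts or validation.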
Introduction: The Enigmatic World of LLM-Based Agents
In the ever-evolving landscape of artificial intelligence, a new breed of digital entities emerges—the LLM-based agents. These agents harness the power of LLMs as their computational engines, transcending mere text generation. They engage in conversations, complete tasks, reason, and even exhibit a degree of autonomy. Let’s delve into their mysterious realm.
The LLM Agent Unveiled
An LLM agent is like a wizard with a language wand. It wields LLMs—those massive neural networks trained on vast amounts of text—as its secret spell book. But these agents don’t stop at incantations; they’re the Gandalfs of AI, capable of:
- Conversations: Imagine chatting with an LLM that understands context, recalls previous statements, and adjusts its tone and style. These agents maintain the thread of dialogue, making them more than mere chatterbots.
- Task Completion: Need a code snippet? A recipe? A summary of the latest news? LLM agents can fetch information, execute Python scripts, and even query databases. They’re your Swiss Army knives of knowledge.
- Reasoning: LLMs excel at breaking down complex problems into digestible steps. These agents dissect intricacies, offer solutions, and evaluate each move. Think of them as your AI problem-solving companions.
- Autonomy: While not sentient beings, LLM agents exhibit autonomous behavior. They’re like diligent interns: following instructions, adapting, and occasionally surprising you with their resourcefulness.
Language agents are models intended for sequential decision-making scenarios, for example playing chess, operating software autonomously, or browsing the web in search of an item expressed in natural language. Language agents are an extension of the classic work on machine learning agents. Yet, in this newer rendition, the systems being built also utilize LLMs.
In more detail, these models operate in what's known as an environment and iteratively take actions in pursuit of accomplishing a specific goal. For example, a language agent tasked with buying an ambiguously described product might take an action corresponding to a search. Every time the model takes an action, like searching, the environment responds, for example, with the search results. The agent observes the results and takes another action, for example, visiting a page corresponding to a promising item. The model continues taking action until it thinks it has achieved its goal, at which point it terminates. One reason that the interest in language agents is rapidly increasing is their out-of-the-box communication capabilities via natural language and instruction-following capabilities. This makes it relatively easy to tell the model what it's supposed to do and what actions are available to it.
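The act, observe, repeat cycle described above can be sketched with a toy environment. Everything here is hypothetical: the two-item catalogue, the action names, and especially `scripted_policy`, which is a hard-coded stand-in for the decisions an LLM would make from the goal and the latest observation.

```python
from dataclasses import dataclass, field

@dataclass
class ShoppingEnvironment:
    """Toy environment: a two-item catalogue the agent can search and visit."""
    catalogue: dict = field(default_factory=lambda: {
        "red-mug": "A red ceramic coffee mug",
        "blue-cup": "A blue plastic cup",
    })

    def step(self, action: str, argument: str) -> str:
        """Apply an action and return the observation the agent sees next."""
        if action == "search":
            hits = [key for key, text in self.catalogue.items()
                    if argument.lower() in text.lower()]
            return "results: " + ",".join(hits)
        if action == "visit":
            return "page: " + self.catalogue.get(argument, "not found")
        return "unknown action"

def scripted_policy(goal: str, observation: str) -> tuple[str, str]:
    """Stand-in for the LLM: picks the next action from the last observation."""
    if observation == "":
        return ("search", goal)
    if observation.startswith("results: "):
        first_hit = observation[len("results: "):].split(",")[0]
        return ("visit", first_hit) if first_hit else ("stop", "")
    return ("stop", observation)

def run_agent(goal: str, max_steps: int = 5) -> list[tuple[str, str]]:
    """Act, observe, repeat, until the policy stops or the step budget runs out."""
    env = ShoppingEnvironment()
    observation, trajectory = "", []
    for _ in range(max_steps):
        action, argument = scripted_policy(goal, observation)
        trajectory.append((action, argument))
        if action == "stop":
            break
        observation = env.step(action, argument)
    return trajectory

print(run_agent("coffee mug"))
```

The `max_steps` budget matters in practice: an agent that decides for itself when it is done also needs an external limit in case it never does.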
One of the canonical works in this space is ReAct, which proposes a framework for leveraging LLMs as language agents. A key ingredient of this work is to prompt the model to emit what its authors call thoughts, which are summaries of the goal, what steps the model has already accomplished, and what steps the model thinks it needs to take next.
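As a rough illustration of the idea, the snippet below parses one model turn written in a simplified Thought/Action layout. The exact text format, the `search[...]` action syntax, and the `parse_step` helper are assumptions made for this sketch rather than the format prescribed by the ReAct authors.

```python
import re

# Simplified, hypothetical model output in a Thought/Action layout.
model_output = """Thought: I need to find a coffee mug, so I should search first.
Action: search[coffee mug]"""

ACTION_PATTERN = re.compile(r"^Action:\s*(\w+)\[(.*)\]\s*$", re.MULTILINE)
THOUGHT_PATTERN = re.compile(r"^Thought:\s*(.*)$", re.MULTILINE)

def parse_step(text: str) -> dict:
    """Split one model turn into its free-text thought and its action."""
    thought = THOUGHT_PATTERN.search(text)
    action = ACTION_PATTERN.search(text)
    if action is None:
        raise ValueError("no action found in model output")
    return {
        "thought": thought.group(1).strip() if thought else "",
        "action": action.group(1),
        "argument": action.group(2),
    }

step = parse_step(model_output)
print(step)
# The observation returned by the environment would be appended to the
# prompt as "Observation: ..." before asking the model for its next step.
```

The thought is never executed. It only stays in the prompt so the model can keep track of its own plan from one step to the next.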
Because of the strong interest in language agents, there has been considerable research on teaching LLMs how to leverage tools. The term "tools" is used here very broadly, but it boils down to using APIs and other programs to perform computation. For example, instead of doing some arithmetic by decoding, an LLM could generate some text expressing the intention to use a calculator, formulate an API call to perform the arithmetic, and then consume the result. The ability to use tools promises to greatly expand the capability of LLMs.
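The calculator example can be sketched as follows. The `CALC(...)` marker is a made-up convention for this illustration (real systems use structured function calls instead), and the calculator deliberately supports only basic arithmetic so that model-written text is never passed to `eval()`.

```python
import ast
import operator
import re

# Operators the calculator tool is willing to evaluate.
_OPERATORS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
}

def calculator(expression: str) -> float:
    """Evaluate +, -, *, / arithmetic without calling eval()."""
    def evaluate(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPERATORS:
            return _OPERATORS[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -evaluate(node.operand)
        raise ValueError("unsupported expression")
    return evaluate(ast.parse(expression, mode="eval").body)

# Hypothetical marker syntax: the model writes CALC(...) where it wants the tool.
# Nested parentheses inside the marker are not supported by this pattern.
CALL_PATTERN = re.compile(r"CALC\(([^()]*)\)")

def fill_in_tool_results(model_text: str) -> str:
    """Replace each CALC(...) marker with the calculator's result."""
    return CALL_PATTERN.sub(lambda m: str(calculator(m.group(1))), model_text)

print(fill_in_tool_results("The total is CALC(1234 * 56 + 7) units."))
# prints: The total is 69111 units.
```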
The Quest for External Knowledge
LLM-based agents wield neural networks like language wands. They converse, reason, and adapt. These agents aren’t mere chatterbots; they’re problem-solving companions. They fetch information, execute code, and occasionally surprise us.
But wait, there’s a twist! LLMs, for all their brilliance, have blind spots. They lack real-time data and domain-specific expertise. Like scholars without Wi-Fi, they can’t fetch the latest news or summon specific facts. That’s where external tools come in—the magical artifacts that enhance LLM agents.
So, next time you encounter an LLM-based agent, remember that beneath its text-based façade lies a world of reasoning, adaptability, and a dash of digital wizardry.
Let’s explore the immense potential of combining function calling and agents within LLMs to create powerful conversational agents and advanced knowledge retrieval engines.
1. Conversational Agents: Bridging Language and Action
- What Are Conversational Agents? Conversational agents, also known as chatbots or virtual assistants, engage in natural language conversations with users. They can answer questions, provide recommendations, and even execute tasks.
- Function Calling Superpowers: By integrating function calling, conversational agents become more than mere text-based responders. They can invoke external tools, APIs, or custom functions to perform specific actions. For example:
- A chatbot that schedules meetings by calling a calendar API.
- An assistant that translates text by invoking a language translation function.
- Benefits:
- Efficiency: Conversational agents can automate tasks without manual intervention.
- Adaptability: They dynamically adjust their behavior based on context and user input.
- Rich Interactions: Function calling enables complex interactions beyond simple text responses.
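Here is a minimal sketch of how a conversational agent might weave tool calls into a running dialogue. The calendar and translation functions are hypothetical stand-ins rather than real APIs, the message format is invented for this example, and `stub_model` is a hard-coded placeholder for the LLM that requests a tool once and then answers from its result.

```python
def schedule_meeting(title: str, when: str) -> str:
    """Hypothetical calendar stand-in; a real one would call a calendar API."""
    return f"Scheduled '{title}' for {when}"

def translate_text(text: str, target_language: str) -> str:
    """Hypothetical translation stand-in backed by a tiny phrase table."""
    phrases = {("hello", "es"): "hola", ("thank you", "es"): "gracias"}
    return phrases.get((text.lower(), target_language), text)

TOOLS = {"schedule_meeting": schedule_meeting, "translate_text": translate_text}

def stub_model(messages: list[dict]) -> dict:
    """Stand-in for the LLM: requests a tool once, then answers from its result."""
    last = messages[-1]
    if last["role"] == "tool":
        return {"role": "assistant", "content": f"Done: {last['content']}"}
    if "meeting" in last["content"].lower():
        call = {"name": "schedule_meeting",
                "arguments": {"title": "Project sync", "when": "Friday 10:00"}}
    else:
        call = {"name": "translate_text",
                "arguments": {"text": "hello", "target_language": "es"}}
    return {"role": "assistant", "content": None, "tool_call": call}

def chat_turn(history: list[dict], user_text: str) -> str:
    """Append the user message, run any requested tool, and return the reply."""
    history.append({"role": "user", "content": user_text})
    reply = stub_model(history)
    while reply.get("tool_call"):
        history.append(reply)
        call = reply["tool_call"]
        result = TOOLS[call["name"]](**call["arguments"])
        history.append({"role": "tool", "name": call["name"], "content": result})
        reply = stub_model(history)
    history.append(reply)
    return reply["content"]

history: list[dict] = []
print(chat_turn(history, "Set up a meeting with the team on Friday."))
print(chat_turn(history, "How do you say hello in Spanish?"))
```

Keeping the tool request and the tool result in the shared history is what lets later turns build on earlier actions.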
2. Knowledge Retrieval Engines: Unleashing the LLM’s Potential
- The Quest for Information: Knowledge retrieval engines aim to fetch relevant data from vast sources. LLMs, with their language understanding, can excel in this domain.
- Function Calling for Data Retrieval:
- Imagine an LLM-based search assistant:
- User: “Tell me about quantum entanglement.”
- LLM: Invokes an external search engine, retrieves information, and presents a concise summary.
- Benefits:
- Comprehensive Answers: LLMs can summarize complex topics, providing accurate and detailed responses.
- Customizable Sources: Function calling allows agents to query specific databases or APIs.
- Real-Time Updates: Agents can fetch the latest information dynamically.
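The search-assistant flow can be sketched with a tiny in-memory knowledge base. The documents, the word-overlap ranking, and the prompt layout are all illustrative assumptions; a real system would more likely query a search engine or vector store and then send the assembled prompt to an LLM for summarization.

```python
import re

# Tiny in-memory knowledge base; the entries are illustrative placeholders.
KNOWLEDGE_BASE = [
    {"title": "Quantum entanglement",
     "text": "Entangled particles share a joint quantum state, so measuring one "
             "is correlated with measuring the other, even at a distance."},
    {"title": "Photosynthesis",
     "text": "Plants convert light, water, and carbon dioxide into sugar and oxygen."},
]

def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def search_knowledge_base(query: str, top_k: int = 1) -> list[dict]:
    """Rank documents by how many query words they share with the query."""
    query_words = tokenize(query)

    def score(doc: dict) -> int:
        return len(query_words & tokenize(doc["title"] + " " + doc["text"]))

    ranked = sorted(KNOWLEDGE_BASE, key=score, reverse=True)
    return [doc for doc in ranked[:top_k] if score(doc) > 0]

def answer(query: str) -> str:
    """Retrieve context, then build the prompt an LLM would summarize from."""
    hits = search_knowledge_base(query)
    if not hits:
        return "No relevant documents found."
    context = "\n".join(f"{doc['title']}: {doc['text']}" for doc in hits)
    return f"Context:\n{context}\n\nQuestion: {query}"

print(answer("Tell me about quantum entanglement."))
```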
3. Challenges and Ethical Considerations:
- Bias: Agents must avoid perpetuating biases present in external data sources.
- Privacy: Function calling may involve sensitive data; privacy safeguards are crucial.
- Transparency: Users should know when external tools are invoked.
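Two of these concerns, privacy and transparency, lend themselves to a small code sketch: a wrapper that records every tool invocation and masks email addresses before they reach the log. The class, the redaction pattern, and the `lookup_customer` tool are hypothetical, and a simple regular expression like this one is nowhere near a complete privacy safeguard.

```python
import re

EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def redact(value):
    """Mask email addresses in string arguments before they are logged."""
    if isinstance(value, str):
        return EMAIL_PATTERN.sub("[redacted email]", value)
    return value

class AuditedTools:
    """Wraps tool functions so every invocation is recorded for the user."""

    def __init__(self, tools: dict):
        self.tools = tools
        self.audit_log: list[dict] = []

    def call(self, name: str, **arguments):
        # Log a redacted copy; the tool itself still gets the real arguments.
        self.audit_log.append({
            "tool": name,
            "arguments": {key: redact(val) for key, val in arguments.items()},
        })
        return self.tools[name](**arguments)

    def disclosure(self) -> str:
        """Human-readable note listing which external tools were used."""
        used = sorted({entry["tool"] for entry in self.audit_log})
        return "Tools used: " + (", ".join(used) if used else "none")

def lookup_customer(email: str) -> str:
    """Hypothetical tool standing in for a CRM lookup."""
    return f"Customer record for {email}"

tools = AuditedTools({"lookup_customer": lookup_customer})
tools.call("lookup_customer", email="jane@example.com")
print(tools.audit_log)
print(tools.disclosure())
```

Appending the disclosure line to the agent's reply is one simple way to let users know when external tools were involved.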
In summary, the fusion of function calling and agents empowers LLMs to transcend language barriers, execute tasks, and become indispensable companions in our digital lives.
AUTHOR
Shreyas Kale
Associate Manager, Data Science