In this tutorial, we build an advanced AI agent using Semantic Kernel combined with Google's free-tier Gemini model, and we run it on Google Colab. We start by wiring Semantic Kernel plugins as tools (web search, math evaluation, file I/O, and note-taking) and then let Gemini orchestrate them through structured JSON outputs. We watch the agent plan, call tools, process observations, and deliver a final answer.
!pip -q install semantic-kernel google-generativeai duckduckgo-search rich
import os, re, json, time, math, textwrap, getpass, pathlib, typing as T
from rich import print
import google.generativeai as genai
from duckduckgo_search import DDGS
GEMINI_API_KEY = os.getenv("GEMINI_API_KEY") or getpass.getpass("🔑 Enter GEMINI_API_KEY: ")
genai.configure(api_key=GEMINI_API_KEY)
GEMINI_MODEL = "gemini-1.5-flash"
model = genai.GenerativeModel(GEMINI_MODEL)
import semantic_kernel as sk
try:
    from semantic_kernel.functions import kernel_function
except Exception:
    from semantic_kernel.utils.function_decorator import kernel_function
We begin by installing the libraries and importing essential modules, including Semantic Kernel, Gemini, and DuckDuckGo search. We set up our Gemini API key and model to generate responses, and we prepare Semantic Kernel’s kernel_function to register our custom tools.
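To make the role of a decorator like kernel_function more concrete, the following standard-library sketch shows the general pattern of attaching metadata to methods and discovering them later. The names `tool`, `_tool_meta`, `DemoTools`, and `discover` are hypothetical and exist only for illustration; this is not Semantic Kernel's implementation, only the same idea in miniature.

```python
import inspect
from typing import Callable


def tool(name: str, description: str) -> Callable:
    """Hypothetical stand-in for a decorator that tags a method as a tool."""
    def decorate(fn: Callable) -> Callable:
        # Store metadata on the function object so it can be found later.
        fn._tool_meta = {"name": name, "description": description}
        return fn
    return decorate


class DemoTools:
    @tool(name="echo", description="Return the text unchanged.")
    def echo(self, text: str) -> str:
        return text

    def helper(self) -> None:
        # Not decorated, so discover() ignores it.
        pass


def discover(obj) -> dict:
    """Map tool name -> (bound method, description, signature string)."""
    found = {}
    for _, method in inspect.getmembers(obj, predicate=inspect.ismethod):
        meta = getattr(method, "_tool_meta", None)
        if meta:
            found[meta["name"]] = (
                method,
                meta["description"],
                str(inspect.signature(method)),
            )
    return found


registry = discover(DemoTools())
```

Because the metadata travels with the function, a registry can be built by inspection instead of a hand-maintained list of names.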
class AgentTools:
    """Semantic Kernel-native toolset the agent can call."""

    def __init__(self):
        self._notes: list[str] = []

    @kernel_function(name="web_search", description="Search the web for fresh info; returns JSON list of {title,href,body}.")
    def web_search(self, query: str, k: int = 5) -> str:
        k = max(1, min(int(k), 10))
        hits = list(DDGS().text(query, max_results=k))
        return json.dumps(hits[:k], ensure_ascii=False)

    @kernel_function(name="calc", description="Evaluate a safe math expression, e.g., '41*73+5' or 'sin(pi/4)**2'.")
    def calc(self, expression: str) -> str:
        allowed = {"__builtins__": {}}
        for n in ("pi", "e", "tau"):
            allowed[n] = getattr(math, n)
        for fn in ("sin", "cos", "tan", "asin", "sqrt", "log", "log10", "exp", "floor", "ceil"):
            allowed[fn] = getattr(math, fn)
        return str(eval(expression, allowed, {}))

    @kernel_function(name="now", description="Get the current local time string.")
    def now(self) -> str:
        return time.strftime("%Y-%m-%d %H:%M:%S")

    @kernel_function(name="write_file", description="Write text to a file path; returns saved path.")
    def write_file(self, path: str, content: str) -> str:
        p = pathlib.Path(path).expanduser().resolve()
        p.parent.mkdir(parents=True, exist_ok=True)
        p.write_text(content, encoding="utf-8")
        return str(p)

    @kernel_function(name="read_file", description="Read text from a file path; returns first 4000 chars.")
    def read_file(self, path: str) -> str:
        p = pathlib.Path(path).expanduser().resolve()
        return p.read_text(encoding="utf-8")[:4000]

    @kernel_function(name="add_note", description="Persist a short note into memory.")
    def add_note(self, note: str) -> str:
        self._notes.append(note.strip())
        return f"Notes stored: {len(self._notes)}"

    @kernel_function(name="search_notes", description="Search notes by keyword; returns top matches.")
    def search_notes(self, query: str) -> str:
        q = query.lower()
        hits = [n for n in self._notes if q in n.lower()]
        return json.dumps(hits[:10], ensure_ascii=False)

kernel = sk.Kernel()
tools = AgentTools()
kernel.add_plugin(tools, "agent_tools")
We define an AgentTools class as our Semantic Kernel toolset, giving the agent abilities like web search, safe math calculation, time retrieval, file read/write, and lightweight note storage. We then initialize the Semantic Kernel and register these tools as a plugin so that the agent can invoke them during reasoning.
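The calc tool relies on eval with an emptied `__builtins__`, which narrows what an expression can reach but is not a full sandbox. As one possible hardening, and not part of the original tutorial, the sketch below evaluates arithmetic by walking the parsed syntax tree and accepting only whitelisted nodes. The name `safe_calc` and the exact operator whitelist are assumptions chosen to mirror the functions exposed by calc.

```python
import ast
import math
import operator

# Whitelisted operators, constants, and functions; anything else is rejected.
_BIN_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
_NAMES = {"pi": math.pi, "e": math.e, "tau": math.tau}
_FUNCS = {
    name: getattr(math, name)
    for name in ("sin", "cos", "tan", "asin", "sqrt",
                 "log", "log10", "exp", "floor", "ceil")
}


def safe_calc(expression: str):
    """Evaluate an arithmetic expression without calling eval()."""
    def visit(node):
        if isinstance(node, ast.Expression):
            return visit(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](visit(node.operand))
        if isinstance(node, ast.Name) and node.id in _NAMES:
            return _NAMES[node.id]
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in _FUNCS
                and not node.keywords):
            return _FUNCS[node.func.id](*[visit(arg) for arg in node.args])
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")

    return visit(ast.parse(expression, mode="eval"))
```

Attribute access, subscripts, and calls to unlisted names all raise ValueError here. Very large exponents can still be slow, so a production tool would likely add size limits as well.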
def list_tools() -> dict[str, dict]:
    registry = {}
    for name in ("web_search","calc","now","write_file","read_file","add_note","search_notes"):
        fn = getattr(tools, name)
        # Assumption: kernel_function stores the description on the function as
        # __kernel_function_description__; fall back to the original lookups otherwise.
        desc = (getattr(fn, "__kernel_function_description__", None)
                or getattr(fn, "description", "") or fn.__doc__ or "")
        sig = "()" if name in ("now",) else "(**kwargs)"
        registry[name] = {"callable": fn, "description": desc.strip(), "signature": sig}
    return registry

TOOLS = list_tools()
CATALOG = "\n".join(
    [f"- {n}{v['signature']}: {v['description']}" for n,v in TOOLS.items()]
)
SYSTEM = f"""You are a meticulous tool-using AI agent.
You can call TOOLS by returning ONLY a JSON object:
{{"tool":"<tool_name>","args":{{...}}}}
After finishing all steps, respond with:
{{"final_answer":"<text>"}}
TOOLS available:
{CATALOG}
Rules:
- Prefer factuality; cite web_search results as <title>(url).
- Keep steps minimal; at most 8 tool calls.
- For file outputs, use write_file and mention the saved path.
- If a tool error occurs, adjust arguments and try again.
"""
def extract_json(s: str) -> dict|None:
    for m in re.finditer(r"\{.*\}", s, flags=re.S):
        try:
            return json.loads(m.group(0))
        except Exception:
            continue
    return None
We create a list_tools helper to collect all available tools, their descriptions, and signatures into a registry for the agent. We then build a CATALOG string that lists these tools and embed it into the SYSTEM prompt, which instructs Gemini how to call tools in strict JSON format and return a final answer. Finally, we define extract_json to safely parse tool calls or final answers from the model’s output.
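The regex in extract_json is greedy, so it spans from the first `{` to the last `}` in the reply and fails when the model wraps its JSON in prose that contains other braces. A more tolerant variant, offered here as a sketch and not as the tutorial's method, tries the standard library's `json.JSONDecoder.raw_decode` at each opening brace and returns the first object that parses. The name `extract_first_json` is hypothetical.

```python
import json
from typing import Optional


def extract_first_json(text: str) -> Optional[dict]:
    """Return the first JSON object embedded in text, or None if there is none."""
    decoder = json.JSONDecoder()
    pos = text.find("{")
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at pos and
            # ignores whatever follows it.
            obj, _end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            pass
        else:
            if isinstance(obj, dict):
                return obj
        pos = text.find("{", pos + 1)
    return None


reply = 'Sure! {"tool": "calc", "args": {"expression": "1+1"}} then {oops}'
parsed = extract_first_json(reply)
```

Here `parsed` holds the calc tool call even though the trailing `{oops}` would make the greedy match invalid JSON.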
def run_agent(task: str, max_steps: int = 8, verbose: bool = True) -> str:
    transcript: list[dict] = [{"role":"system","parts":[SYSTEM]},
                              {"role":"user","parts":[task]}]
    observations = ""
    for step in range(1, max_steps+1):
        content = []
        for m in transcript:
            role = m["role"]
            for part in m["parts"]:
                content.append({"text": f"[{role.upper()}]\n{part}\n"})
        if observations:
            content.append({"text": f"[OBSERVATIONS]\n{observations[-4000:]}\n"})
        resp = model.generate_content(content, request_options={"timeout":60})
        text = resp.text or ""
        if verbose:
            print(f"\n[bold cyan]Step {step} - Model[/bold cyan]\n{textwrap.shorten(text, 1000)}")
        cmd = extract_json(text)
        if not cmd:
            transcript.append({"role":"user","parts":[
                "Please output strictly one JSON object per your rules."
            ]})
            continue
        if "final_answer" in cmd:
            return cmd["final_answer"]
        if "tool" in cmd:
            tname = cmd["tool"]
            args = cmd.get("args", {})
            if tname not in TOOLS:
                observations += f"\nToolError: unknown tool '{tname}'."
                continue
            try:
                out = TOOLS[tname]["callable"](**args)
                out_str = out if isinstance(out,str) else json.dumps(out, ensure_ascii=False)
                if len(out_str) > 4000:
                    out_str = out_str[:4000] + "...[truncated]"
                observations += f"\n[{tname}] {out_str}"
                transcript.append({"role":"user","parts":[f"Observation from {tname}:\n{out_str}"]})
            except Exception as e:
                observations += f"\nToolError {tname}: {e}"
                transcript.append({"role":"user","parts":[f"ToolError {tname}: {e}"]})
        else:
            transcript.append({"role":"user","parts":[
                "Your output must be a single JSON with either a tool call or final_answer."
            ]})
    return "Reached step limit. Summarize findings:\n" + observations[-1500:]
We run an iterative agent loop that feeds the system and user context to Gemini, enforces JSON-only tool calls, executes the requested tools, and feeds observations back into the transcript until the model returns a final_answer. If the model drifts from the schema, we nudge it with a reminder message. If it hits the step cap, we return the last 1,500 characters of the accumulated observations in place of a final answer.
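The control flow of this loop can be exercised without an API key by swapping the model for a scripted stand-in. The sketch below is a simplified, offline illustration under that assumption: `run_loop` and `make_scripted_model` are hypothetical names, the transcript is a plain list of strings, and the single `mul` tool exists only for the demo.

```python
import json
from typing import Callable


def run_loop(model_reply: Callable[[list], str], tools: dict,
             task: str, max_steps: int = 4) -> str:
    """Minimal plan/act/observe loop driven by JSON replies."""
    transcript = [f"[USER]\n{task}"]
    for _ in range(max_steps):
        raw = model_reply(transcript)
        try:
            cmd = json.loads(raw)
        except json.JSONDecodeError:
            cmd = None
        if not isinstance(cmd, dict):
            # Off-schema reply: nudge the model and try again.
            transcript.append("[USER]\nPlease output exactly one JSON object.")
            continue
        if "final_answer" in cmd:
            return cmd["final_answer"]
        name, args = cmd.get("tool"), cmd.get("args", {})
        if name not in tools:
            transcript.append(f"[USER]\nToolError: unknown tool '{name}'.")
            continue
        try:
            result = tools[name](**args)
            transcript.append(f"[USER]\nObservation from {name}:\n{result}")
        except Exception as exc:
            # Report the failure so the model can adjust its arguments.
            transcript.append(f"[USER]\nToolError {name}: {exc}")
    return "Reached step limit."


def make_scripted_model(replies: list) -> Callable[[list], str]:
    """Return a fake model that emits the given replies in order."""
    iterator = iter(replies)
    return lambda transcript: next(iterator)


scripted = make_scripted_model([
    "I will multiply first.",                        # off-schema, triggers a nudge
    '{"tool": "mul", "args": {"a": 41, "b": 73}}',   # tool call
    '{"final_answer": "41*73 = 2993"}',              # terminates the loop
])
answer = run_loop(scripted, {"mul": lambda a, b: a * b}, "Compute 41*73")
```

The scripted replies cover the three branches discussed above: a schema violation, a tool call, and a final answer.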
DEMO = (
    "Find the top 3 concise facts about Chandrayaan-3 with sources, "
    "compute 41*73+5, store a 3-line summary into '/content/notes.txt', "
    "add the summary to notes, then show current time and return a clean final answer."
)

if __name__ == "__main__":
    print("[bold]🔧 Tools loaded:[/bold]", ", ".join(TOOLS.keys()))
    ans = run_agent(DEMO, max_steps=8, verbose=True)
    print("\n" + "="*80 + "\n[bold green]FINAL ANSWER[/bold green]\n" + ans + "\n")
We define a demo task that makes the agent search, compute, write a file, save notes, and report the current time. We then run the agent end-to-end, printing the loaded tools and the final answer so we can verify the full tool-use workflow in one go.
In conclusion, we observe how Semantic Kernel and Gemini work together to form a compact agentic system within Colab. We not only test tool calls but also see how results flow back into the reasoning loop to produce a clean final answer. We now have a reusable blueprint that we can extend with more tools or tasks, and the demo shows that a practical tool-using agent can be built with a small amount of code when the frameworks fit together well.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.