
aithemes.net

Launching AI Workflows in E2B Sandboxes: A Beginner's Guide

A hands-on guide to setting up E2B sandboxes, integrating DeepSeek AI, and running Python-focused AI-powered workflows for data processing and visualization.

16 min read

Created: Jan 19 2025 · Last Update: Jan 19 2025
#E2B · #DeepSeek AI · #Python · #AI Workflows · #Data Visualization · #LLM Integration · #Sandbox Setup · #Data Processing · #Cloud Computing · #AI Integration


In this post, we explore how to set up E2B sandboxes and leverage DeepSeek AI for running Python-powered AI workflows. By integrating LLM-driven code generation with E2B’s versatile sandboxes, developers can streamline complex tasks like data processing and visualization. The examples below demonstrate step-by-step workflows, including AI-powered Python scripting and generating insights through graphs.

What is E2B?

E2B is an open-source infrastructure that enables developers to run AI-generated code in secure, isolated sandboxes hosted in the cloud. These sandboxes are lightweight virtual machines (VMs) that can be started in just ~150ms, making them ideal for real-time applications. Whether you're building AI data analysis tools, coding playgrounds, or full-fledged AI applications, E2B provides the foundation to make your projects scalable, secure, and efficient.

For more details, see my blog entry on Building AI-Powered Applications with E2B Sandboxes.

Setting Up E2B Sandboxes

E2B sandboxes provide isolated, scalable environments perfect for running AI applications. Follow these steps to get started:

  1. Sign up for E2B: Create an account on the E2B platform.
  2. Set up API keys:
  • Generate an API key for E2B and set it as an environment variable:
export E2B_API_KEY=your_key
  • Generate an API key from DeepSeek AI and set it as an environment variable:
export DEEPSEEK_API_KEY=your_key
  • Optional: If you choose to use the Mistral API instead, set this environment variable:
export MISTRAL_API_KEY=your_key
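Putting the setup steps together, a minimal shell sketch might look like this (the package names come from the script's dependency list later in this post; the key values are placeholders):

```shell
# Install the Python clients used by the script in this post
pip install mistralai e2b-code-interpreter openai

# Required for the E2B sandbox and the default DeepSeek backend
export E2B_API_KEY=your_key
export DEEPSEEK_API_KEY=your_key

# Only needed if you select the Mistral backend with --api mistral
export MISTRAL_API_KEY=your_key
```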

Once your sandbox is ready, you can proceed to run workflows and interact with DeepSeek AI.

Running AI-Powered Workflows

This section demonstrates how to execute AI-powered workflows using the E2B sandbox environment. The Python script e2b_llm_integration.py integrates with AI models like DeepSeek and Mistral to generate and execute Python code. Below, we break down the script's functionality and how it leverages E2B sandboxes for secure, isolated execution.

Python Script Overview

The Python script, e2b_llm_integration.py, serves as a comprehensive tool for generating and executing Python code using AI models like Mistral and DeepSeek. It facilitates interaction with these models by accepting user prompts, retrieving generated Python code, and executing it within an E2B sandbox environment. Key features include:

  • Support for both Mistral and DeepSeek APIs with flexible API selection.
  • Extraction of Python code blocks using regex for clean execution.
  • Command-line argument support for specifying prompts, API options, output file paths, and input files.
  • File Handling: The script expects the generated Python code to create a file (e.g., a graph) within the sandbox environment. After execution, the script retrieves this file from the sandbox and saves it to the specified local directory. This ensures that the output (e.g., visualizations) is accessible outside the sandbox.
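One detail worth noting about the file handling: the script keeps only the filename portion of the `--output` path when talking to the sandbox, because generated files land in the sandbox's `/home/user` directory. A small sketch of that mapping (the path here is a made-up example):

```python
import os

# Hypothetical value a user might pass via --output
local_file_path = "outputs/graph_001.jpg"

# The sandbox only sees the bare filename under /home/user
sandbox_filename = os.path.basename(local_file_path)
sandbox_path = f"/home/user/{sandbox_filename}"

print(sandbox_path)  # /home/user/graph_001.jpg
```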

Command-Line Arguments

The script supports the following command-line arguments:

  • prompt: The user-provided prompt for the AI model.
  • --file: Optional; Path to a text file containing the prompt.
  • --api: Optional; Specify the API to use (mistral or deepseek). Default is deepseek.
  • --output: Optional; Local path to save the generated file. Default is plot.jpg.
"""
Script Name: e2b_llm_integration.py

Description:
This script facilitates the generation and execution of Python code using AI models such as Mistral and DeepSeek. It accepts a user-provided prompt, sends it to the specified model (default is DeepSeek), retrieves the generated code, and executes it within a sandboxed environment provided by E2B. The script supports both Mistral and DeepSeek APIs, allowing for flexible API selection, output file path specification, and input prompt sourcing from either direct input or a text file.

Key Features:
- Extracts Python code blocks from the model's response using regex.
- Executes the generated code in a secure, sandboxed environment using E2B.
- Supports command-line arguments for user prompts, API selection, output file paths, and input prompt files.

Usage:
- Default API (DeepSeek):
python e2b_llm_integration.py "<your prompt>"

- Specify API:
python e2b_llm_integration.py "<your prompt>" --api <mistral|deepseek>

- Specify output file path:
python e2b_llm_integration.py "<your prompt>" --output <path/to/output/file>

- Use a prompt from a text file:
python e2b_llm_integration.py --file <path/to/prompt_file.txt>

Dependencies:
- Python 3.6+
- mistralai: Python client for the Mistral API.
- e2b-code-interpreter: Python client for the E2B Sandbox API.
- openai: Python client for the OpenAI API, used for DeepSeek.

Environment Variables:
- MISTRAL_API_KEY: API key for accessing the Mistral API.
- DEEPSEEK_API_KEY: API key for accessing the DeepSeek API.
- E2B_API_KEY: API key for accessing the E2B Sandbox API.

Arguments:
- prompt (str): User prompt for the AI model.
- --file (str): Optional; Path to a text file containing the prompt.
- --api (str): Optional; Specify the API to use ('mistral' or 'deepseek'). Default is 'deepseek'.
- --output (str): Optional; Local path to save the generated file. Default is 'plot.jpg'.

Version: 1.0.0
"""

import os
import sys
import re
import logging
import argparse
from typing import Optional

from mistralai import Mistral
from e2b_code_interpreter import Sandbox
from openai import OpenAI

logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")

class CodeGenerator:
  """
  A class to handle the generation of Python code using specified AI models.

  Attributes:
      api_option (str): The API to use for code generation ('mistral' or 'deepseek').
      prompt (str): The user-provided prompt for code generation.
      api_key (str): The API key for accessing the specified API.
  """
  def __init__(self, api_option: str, prompt: str):
      """
      Initializes the CodeGenerator with the specified API option and prompt.

      Args:
          api_option (str): The API to use for code generation ('mistral' or 'deepseek').
          prompt (str): The user-provided prompt for code generation.
      """
      self.api_option = api_option
      self.prompt = prompt
      self.api_key = os.environ.get("MISTRAL_API_KEY" if api_option == "mistral" else "DEEPSEEK_API_KEY")
      if not self.api_key:
          print(f"Error: The environment variable {'MISTRAL_API_KEY' if api_option == 'mistral' else 'DEEPSEEK_API_KEY'} is not set.")
          sys.exit(1)

  def get_client(self):
      """
      Returns a client instance for the specified API.

      Returns:
          An instance of the client for the specified API.
      """
      if self.api_option == "mistral":
          return Mistral(api_key=self.api_key)
      elif self.api_option == "deepseek":
          return OpenAI(api_key=self.api_key, base_url="https://api.deepseek.com")

  def generate_code(self):
      """
      Generates Python code based on the user-provided prompt using the specified API.

      Returns:
          str: The generated Python code.
      """
      client = self.get_client()
      system_prompt = (
          """
You are a Python coding assistant that only outputs Python code without any explanations or comments. For all modules or packages you use in the script, do not assume the packages are installed. Include the necessary installation commands before the import statements. This is an example for the module SomeClass:

import subprocess
import sys

try:
  from some_module import SomeClass
except ImportError:
  subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'some_module'])
  from some_module import SomeClass

You will be asked to create a method and call it. The call to the method has to be with if __name__ == "__main__":

Document well the method in the header including:
- Arguments
- Purpose of the method
- Return values

Very important: write the Python code like this:
```python
code
``` 
          """
      )

      try:
          if self.api_option == "mistral":
              response = client.chat.complete(
                  model="codestral-latest",
                  messages=[
                      {"role": "system", "content": system_prompt},
                      {"role": "user", "content": self.prompt}
                  ],
                  temperature=0.0
              )
          elif self.api_option == "deepseek":
              response = client.chat.completions.create(
                  model="deepseek-chat",
                  messages=[
                      {"role": "system", "content": system_prompt},
                      {"role": "user", "content": self.prompt}
                  ],
                  stream=False,
                  temperature=0.0
              )
          return response.choices[0].message.content.strip()
      except Exception as e:
          print(f"Error communicating with {self.api_option.capitalize()} API: {e}")
          sys.exit(1)

def extract_code_block(content: str) -> Optional[str]:
  """
  Extracts a Python code block from the given text data using regex.

  Args:
      content (str): The text data containing a Python code block.

  Returns:
      Optional[str]: The extracted Python code block, or None if not found.
  """
  if not content:
      logging.warning("Content is empty or None.")
      return None

  match = re.search(r'```python(.*?)```', content, re.DOTALL)
  if match:
      return match.group(1).strip()  # Return the Python code without the backticks
  logging.warning("No Python code block found in the content.")
  return None

def main():
  """
  Main function to parse command-line arguments, generate code, and execute it in a sandboxed environment.
  """
  # Command-line argument parsing
  parser = argparse.ArgumentParser(description="Generate and execute Python code using AI models.")
  group = parser.add_mutually_exclusive_group(required=True)
  group.add_argument("prompt", type=str, nargs="?", help="Prompt for the AI model.")
  group.add_argument("--file", type=str, help="Path to a text file containing the prompt.")
  parser.add_argument(
      "--api", type=str, choices=["mistral", "deepseek"], default="deepseek",
      help="Specify the API to use (default: deepseek)."
  )
  parser.add_argument(
      "--output", type=str, default="plot.jpg",
      help="Optional: Local path to save the file (default: plot.jpg)."
  )
  args = parser.parse_args()

  # Check for E2B_API_KEY environment variable
  if not os.environ.get("E2B_API_KEY"):
      print("Error: The environment variable E2B_API_KEY is not set.")
      sys.exit(1)

  # Read the prompt from the provided argument or file
  if args.file:
      try:
          with open(args.file, 'r') as file:
              prompt = file.read().strip()
      except Exception as e:
          print(f"Error reading the prompt file: {e}")
          sys.exit(1)
  else:
      prompt = args.prompt

  api_option = args.api
  local_file_path = args.output
  sandbox_filename = os.path.basename(local_file_path)  # Extract only the filename for the sandbox

  print("Input Prompt:")
  print(prompt)
  print(f"Using model: {api_option.capitalize()}")
  print(f"Local file path: {local_file_path}")

  # Generate code
  code_generator = CodeGenerator(api_option, prompt)
  raw_code = code_generator.generate_code()
  code = extract_code_block(raw_code)
  if not code:
      print("Error: Received empty code from the model.")
      sys.exit(1)

  print("Generated Code:")
  print("----------------")
  print("```python")
  print(code)
  print("```")
  print("----------------\n")

  # Execute code in E2B Sandbox
  try:
      with Sandbox() as sandbox:
          execution = sandbox.run_code(code)
          result = execution.text
          logs = execution.logs

          try:
              # Attempt to download file from the sandbox
              content = sandbox.files.read(f"/home/user/{sandbox_filename}", format="bytes")

              # Write file to the specified local path
              with open(local_file_path, "wb") as file:
                  file.write(content)

              logging.info(f"File successfully downloaded and saved to {local_file_path}")
          except Exception as e:
              logging.error(f"Failed to download file from sandbox: {e}")

  except Exception as e:
      logging.error(f"Error executing code in Sandbox: {e}")
      sys.exit(1)

  logging.info("Execution Output:")
  logging.info("Stdout Logs:")
  for log in logs.stdout:
      logging.info(log)
  logging.info("Stderr Logs:")
  for log in logs.stderr:
      logging.info(log)

  logging.info("Execution Result:")
  logging.info(result)

if __name__ == "__main__":
  main()

Key Considerations for Prompt Design in Code Generation

A critical aspect of the prompt used for code generation is the explicit instruction to the AI model: "You are a Python coding assistant that only outputs Python code without any explanations or comments." This ensures that the generated output is clean, executable code without any extraneous text. Additionally, the prompt includes the instruction: "For all modules or packages you use in the script, do not assume the packages are installed. Include the necessary installation commands before the import statements." This ensures that the generated code is self-contained and can be executed in environments where dependencies may not be pre-installed.
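To see why the fenced-output instruction matters, here is a small standalone sketch of the extraction step, using the same regex as the script's extract_code_block and a made-up model response (models do not always follow the "code only" instruction perfectly, so the regex keeps only the fenced block):

```python
import re
from typing import Optional

def extract_code_block(content: str) -> Optional[str]:
    # Same pattern the script uses: capture everything between ```python and ```
    match = re.search(r'```python(.*?)```', content, re.DOTALL)
    return match.group(1).strip() if match else None

# Hypothetical model response that wraps code in a fenced block
response = "Sure, here you go:\n```python\nprint('hello from the sandbox')\n```\nDone."
print(extract_code_block(response))  # prints: print('hello from the sandbox')
```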

Workflow Examples

Below are four examples demonstrating how to use the E2B and DeepSeek AI integration for data processing and visualization tasks. Each one highlights a different use case, from temperature trend analysis to GDP growth visualization.

Note: These examples were executed using the DeepSeek API, and the results for the Mistral API have not been tested or verified.

Case 1: Analyzing Minimum Temperature Trends

Prompt: "Download the CSV file from the URL provided, generate a time-series plot of the minimum temperature in each year, converted to Celsius, and save it as a JPEG file named 'graph_001.jpg'. Print the size of the generated file. This is the format of the CSV file: ..."

Execution: The script downloaded and processed the CSV data, converted the temperatures to Celsius, and calculated the minimum temperature for each year. A time-series plot of the resulting trend was generated and saved as graph_001.jpg.

First graph image
Figure 1: Time-series plot showing minimum temperature trends over time.

Case 2: Visualizing GDP Distribution

Prompt: "Download the CSV file from the URL provided, read and process the data, and generate a pie chart in JPG format that shows the GDP distribution of the 10 largest economies for the year 2022. Include country names as labels in the pie chart and save it as graph_002.jpg. Print the size of the saved file in bytes. This is the format of the CSV file: ..."

Execution: Using DeepSeek AI, the script downloaded and processed the CSV data, identifying the top 10 economies by GDP for 2022. A pie chart was generated with labeled slices representing GDP contributions and saved as graph_002.jpg.

Second graph image
Figure 2: Pie chart showing GDP distribution of the top 10 economies in 2022.

Case 3: Analyzing Global Temperature Anomalies

Prompt: "Download the CSV file from the URL provided, generate a time-series plot of annual global temperature anomalies in JPG format, and save it to a local file named 'graph_003.jpg'. Print the size of the generated file. This is the format of the CSV file: ..."

Execution: The script downloaded and processed the CSV data, visualizing annual global temperature anomalies over time. A time-series plot was generated and saved as graph_003.jpg.

Third graph image
Figure 3: Time-series plot showing annual global temperature anomalies.

Case 4: Analyzing U.S. GDP Growth Trends

Prompt: "Download the file from the provided URL. Unzip it and process the relevant CSV file, excluding those starting with 'Metadata'. Generate a graph with the year-over-year percentage growth of GDP in the United States from 2008 to 2021, adding country names as labels. Save the chart as 'graph_004.jpg' and verify its existence, printing the file size in bytes. Include trace logs for troubleshooting file handling and data processing steps. This is the format of the CSV file: ..."

Execution: The script handled downloading, unzipping, and processing the data, including proper error handling and trace logging. Year-over-year GDP growth for the United States was calculated and visualized in a labeled graph, which was saved as graph_004.jpg after verifying its existence and size.

Fourth graph image
Figure 4: Line graph showing year-over-year GDP growth in the United States from 2008 to 2021.

Conclusions

The integration of E2B sandboxes with DeepSeek AI offers a robust solution for executing AI-driven Python workflows, as demonstrated by the four cases above. Each example successfully generated accurate visualizations, highlighting the effectiveness of this approach for data processing and visualization tasks. However, when choosing between cloud and local sandboxes, consider the following:

  1. Cost: Cloud sandboxes like E2B are scalable but may incur higher costs, while local sandboxes are cost-effective but less scalable.
  2. Security: Local sandboxes keep data within controlled environments, whereas cloud sandboxes require robust encryption and access management.
  3. Scalability: Cloud sandboxes excel in scalability and collaboration, while local sandboxes are better suited for secure, smaller-scale projects.

Choose based on your project’s needs: cloud for scalability and collaboration, or local for security and cost efficiency. Balancing these factors ensures optimal productivity and security in your workflows.


Enjoyed this post? Found it helpful? Feel free to leave a comment below to share your thoughts or ask questions. A GitHub account is required to join the discussion.