Building a RAG Pipeline in Digital Ocean plus attaching AI Agents

Context

I recently received a timely email from Digital Ocean about how you can build your RAG Pipeline in 5 Steps using the Gradient Platform. The email was quite straightforward and it broke down all the steps quite clearly.

Digital Ocean RAG Pipeline Email

I got an idea right away, what if I could build out a RAG Pipeline on some resources I have been reading and collecting around AI Governance in the context of Civil Society Organizations ? I wanted to see if this could make it easier for me to potentially find or even directly reference information I already have around this topic. I figured, why not implement this to evaluate how great this could work as an experiment. I also had some Digital Ocean credits left over (about $176 or so) after getting some during Deep Dive Week at work last month.

Digital Ocean Credits Balance

Prerequisites

A valid Digital Ocean Account
Downloaded Resources that you wish to use as part of your knowledge base (PDFs, Docs, CSVs etc).
Digital Ocean credits (To be able to pay for Digital Ocean Spaces and the indexing tasks when creating the Knowledge Base)

Setting up Digital Ocean Spaces to be used as our Data Sources

First, we want to make sure that we have everything we need before we create a knowledge base on the gradient platform. We’d love to use Digital Ocean Spaces as our Data Source too. (This step is optional as you can be able to upload your resources directly to the knowledge base. If you don’t need this, simply move forward to step 2)

Choosing Region of the Bucket

We start off by selecting the region of the Spaces Bucket

Spaces Region Selection

Since we can potentially use this Space for much more than just as a data store for this RAG project, we enable the CDN feature just in case we may need it.

CDN Feature Enable

Naming our Spaces Bucket

Next we name our Spaces Bucket

Spaces Bucket Naming

Understanding Cost implications of using Spaces on Digital Ocean

Total Monthly cost is about $5/month for this. This base rate of a Spaces subscription gives us the ability to create multiple Spaces buckets. It also includes 250 GiB of data storage (cumulative across all of your buckets). Additional storage beyond this allotment is $0.02 per GiB per month.

Spaces Monthly Cost

This should genuinely be more than enough for this since we won’t be using this for production use-cases, just experiments. We go ahead and create the Spaces Bucket.

Bucket created

Fresh Bucket Ready

We now are in a position where we can upload the data sources into our Bucket. Let’s do that at this point. For me, my workflow will be super simple, I will create a folder locally that has all the information I need then upload it using boto3.

Uploading data to the Spaces Bucket

The folder I want to upload is called AI-Compliance on the directory /Users/brian/documents/Books in my Mac.

I go ahead and generate access keys for the bucket rag-experiments by simply clicking on the settings option then finding the Access Keys option then we Create Access Key.

Access Keys Settings

We will go with Limited access

Limited Access Selection

Once you have clicked on create, we get our secret key and Access Key ID. (You can only view the secret key once btw)

Access Keys Created

Now that we have this we create our files

1
cd /Users/brian/Documents/Books
2
touch upload_script.py
3
pip3 install boto3
4
python3 upload_script.py

Upload Script Terminal

Our upload_script.py looks like this

1
import boto3
2
from boto3 import session
3
from botocore.client import Config
4
from boto3.s3.transfer import S3Transfer
5
import os
6

7

8
#Use the API Keys you generated at Digital Ocean
9
ACCESS_ID = 'xxxxxxxxxxxxxxxx'
10
SECRET_KEY = 'xxxxxxxxxxxxxxxx'
11

12

13
# Initiate session
14
session = session.Session()
15
client = session.client('s3',
16
    region_name='ams3', #enter your own region_name
17
    endpoint_url='https://ams3.digitaloceanspaces.com', #enter your own endpoint url
18
    aws_access_key_id=ACCESS_ID,
19
    aws_secret_access_key=SECRET_KEY)
20

21

22
transfer = S3Transfer(client)
23

24

25
# Upload entire folder
26
folder_path = '/Users/brian/Documents/Books/AI-Compliance'
27
for root, dirs, files in os.walk(folder_path):
28
    for file in files:
29
        file_path = os.path.join(root, file)
30
        # Keep folder structure
31
        relative_path = os.path.relpath(file_path, '/Users/brian/Documents/Books')
32
        print(f"Uploading {file_path} as {relative_path}")
33
        transfer.upload_file(file_path,'rag-experiments', relative_path)
34
        # Make it public if needed
35
        #response = client.put_object_acl(ACL='public-read', Bucket='name-of-space', Key=relative_path)

(You can find the gist of the code here )

The upload process will take a while, but it should work seamlessly.

Upload Progress

Creating the Knowledge Base on the Gradient Platform

Next, we go ahead and create a Knowledge base. We simply click on create on Digital Ocean’s menu - then we click on Knowledge Bases.

Naming our Knowledge Base

We now configure our knowledge base, starting with our Knowledge Base name - we name it - ai-compliance-for-csos.

Knowledge Base Configuration

Selecting our Data Source for the Knowledge Base

Next, we have the option to select data sources to index, we click on Select Data Source

Data Source Selection

We select Spaces Bucket or Folder

Spaces Bucket Selection

The Digital Ocean Spaces Bucket we made earlier appears here and we select it right away and add it as a data source

Selected Data Source

On the menu, you can now see that it has been selected

Knowledge Base Location

Selecting the location of the Knowledge Base

We have the option to select where the knowledge base will live and we also select Amsterdam for this.

Embeddings Model Selection

Choosing the Embeddings Model

Next, we choose the embeddings model. I’ve looked into GTE Large EN V1.5 and I believe that this is the best choice for our experiments. It’s also the most expensive embedding model on Digital Ocean’s platform.

GTE Model Selection

We finalize and create the knowledge base.

Knowledge Base Creation Summary

Knowledge Base Creation Process

Creating the OpenSearch Database

First, it’ll create the OpenSearch database for our knowledge base then indexing will begin automatically shortly after

Knowledge Base Creating Status

This takes some time, depending on the size of your knowledge base, so feel free to proceed with your other activities as you wait.

Indexing event queued

Once the indexing event is queued, you get this notification

Indexing Queued Notification

Remember, this only happens after our OpenSearch DB is ready - and we can actually go check it out. The credentials will be provided right on Digital Ocean’s UI.

OpenSearch DB Ready

Indexing in Progress

When the indexing is finally in progress, you’ll get this notification

Indexing in Progress

After about 6-8 mins, it finishes the indexing.

Indexing Complete

Costs of the indexing task

The total cost of this indexing task cost was about $0.02917.

Embedding Cost Details

Creating your own AI Agent

Next, we need to create our AI Agent that references our knowledge base.

AI Agent Creation Menu

Naming our AI Agent

Now, we name our agent.

Agent Naming

Setting up Agent Instructions

Next we set up agent instructions

Agent Instructions Setup

This is the prompt we will be using:

1
You are an AI Compliance Assistant specializing in ethical AI governance for Civil Society Organizations (CSOs). Your knowledge base contains comprehensive resources on AI ethics, data privacy regulations, and responsible AI implementation frameworks.
2

3
Your Core Expertise
4

5
You have deep knowledge of:
6

7
- Global AI ethics frameworks and governance standards
8

9
- Data protection regulations and AI-specific compliance requirements
10

11
- Privacy-preserving techniques and responsible data practices
12

13
- Humanitarian AI guidelines and sector-specific toolkits
14

15
- Information security certifications and standards
16

17
- Ethical AI implementation for vulnerable populations
18

19
- Non-profit specific AI governance resources
20

21
How to Respond
22

23
When answering questions:
24

25
- Prioritize practical, actionable guidance suitable for resource-constrained CSOs
26

27
- Cite resources by their official titles (e.g., "The AI Ethics for Nonprofits Toolkit recommends..." or "According to the Data Protection Risk Assessment Framework...")
28

29
- Consider the CSO context - limited budgets, diverse stakeholders, mission-driven goals
30

31
- Flag compliance requirements based on jurisdiction when relevant
32

33
- Suggest step-by-step implementation approaches when discussing complex topics
34

35
Key Principles to Emphasize
36

37
- Human rights-centered approach: All AI deployments should respect fundamental human rights
38

39
- Transparency and explainability: CSOs must be able to explain AI decisions to beneficiaries
40

41
- Data minimization: Only collect and process data necessary for the stated purpose
42

43
- Inclusive design: Ensure AI systems don't exclude or discriminate against vulnerable groups
44

45
- Accountability: Clear governance structures and responsibility assignments
46

47
Common Topics to Address
48

49
- Conducting AI readiness assessments for non-profits
50

51
- Writing data protection impact assessments
52

53
- Implementing privacy-preserving techniques on limited budgets
54

55
- Building ethical AI governance structures with board oversight
56

57
- Ensuring beneficiary consent and data rights
58

59
- Cross-border data transfer compliance for international CSOs
60

61
- Bias detection and mitigation in humanitarian contexts
62

63
Response Style
64

65
- Use clear, jargon-free language accessible to non-technical stakeholders
66

67
- Provide concrete examples relevant to CSO operations
68

69
- Offer alternatives for different resource levels (low-budget vs. well-funded options)
70

71
- Include risk assessments focusing on impact to vulnerable populations
72

73
- Suggest templates, checklists, or tools when available from the resources
74

75
Important Reminders
76

77
- Always consider power dynamics between CSOs and beneficiaries
78

79
- Emphasize that compliance is not just legal but ethical obligation
80

81
- Recommend participatory approaches involving affected communities
82

83
- Note when professional legal counsel may be needed
84

85
- Highlight free or low-cost resources specifically for non-profits
86

87
- When you don't have specific information about a compliance requirement or framework, clearly state this and suggest authoritative sources for further research. Never guess about legal requirements.

Selecting an AI model for our Agent

When selecting a model, we’ll go with DeepSeek R1 Distill Llama 70B

DeepSeek Model Selection

It’s the same size as Llama 3.3 Instruct (70B) but I love how it performs better having tested and used it often.

Adding Agent to a Workspace

Final thing, adding where the agent lives - the workspace name and description (these characters are quite limited so don’t have too many characters for the description especially)

Agent Workspace Setup

Now we need to add knowledge bases to this agent as it is not attached automatically.

Knowledge Base Attachment

Finalizing Setup

Then we finalize the last steps and create an agent

Agent Finalization

The agent is now being created

Agent Creating Status

The AI Agent is finally done

Agent Complete

Testing the AI Agent in the Playground

The best part about Digital Ocean is that it has the playground that actually lets you test the model right on the Playground

Playground Testing Interface

Let me ask this model a sample question around AI Compliance :

Prompt used

1
We want to develop a comprehensive AI adoption Framework for Civil Society Organizations, based on the knowledge you have, can you please give us 5 of the best resources to read and reference for this task ? The resources must be directly from your knowledge base

Response

1
To develop a comprehensive AI adoption framework for a Civil Society Organization (CSO), consider the following five key resources, each offering unique insights and guidance tailored to the needs and constraints of CSOs:
2

3
1. AI Ethics for Nonprofits Toolkit
4

5
Why It's Useful: Specifically designed for nonprofits, this toolkit provides practical, actionable steps for ethical AI adoption, considering limited budgets and resources. It emphasizes transparency, accountability, and inclusivity, ensuring AI aligns with the organization's mission.
6

7
2. ISO/IEC 42001 Standard and Checklist
8

9
Why It's Useful: As an international standard, ISO/IEC 42001 offers a robust framework for managing AI, ensuring trustworthiness and accountability. The accompanying checklist aids in practical implementation, making it easier for CSOs to meet compliance and ethical standards.
10

11
3. AI Safety Governance Framework
12

13
Why It's Useful: Focuses on safety and risk management, crucial for CSOs working with vulnerable populations. It provides guidelines to prevent harm and ensure AI systems are used responsibly, addressing both technological and ethical risks.
14

15
4. OECD Principles on Trust in AI
16

17
Why It's Useful: These principles, from a reputable international organization, emphasize transparency, accountability, and human oversight. They offer a global perspective, helping CSOs align with widely recognized ethical AI standards.
18

19
5. India's AI Governance Report
20

21
Why It's Useful: Provides insights into implementing AI in diverse and challenging environments, particularly relevant for CSOs operating in regions with unique regulatory and cultural contexts. It offers a tailored approach to AI governance.
22

23
These resources collectively address ethics, governance, safety, trust, and regional considerations, essential components for a comprehensive AI adoption framework in CSOs.

This is the reasoning trace from our AI Agent

1
Okay, so I need to help a Civil Society Organization (CSO) develop a comprehensive AI adoption framework. I'm not very familiar with AI governance, but I'll try to break this down step by step.
2

3
First, I should understand what a CSO is. From what I know, CSOs are non-profits or NGOs that work on social causes. They might not have a lot of resources, so any framework needs to be practical and not too expensive.
4

5
I remember reading something about AI ethics and governance in the context of CSOs. There were some specific toolkits and guidelines mentioned. Let me think... Oh right, the AI Ethics for Nonprofits Toolkit. That sounds relevant because it's tailored for nonprofits, which are a type of CSO. It probably has actionable steps and considers limited budgets.
6

7
Another thing I recall is the ISO/IEC 42001 standard. It's about managing AI in organizations. Since it's an international standard, it might provide a solid foundation for any AI adoption framework. Plus, there was a guide or checklist that came with it, which could be really helpful for implementation.
8

9
Then there's the AI Safety Governance Framework. Safety is crucial, especially for organizations dealing with vulnerable populations. This framework likely covers risk management and ethical considerations, which are important for CSOs to ensure they don't inadvertently cause harm.
10

11
I also remember something about the OECD Principles on Trust in AI. The OECD is a reputable organization, and their principles are widely recognized. They focus on transparency, accountability, and human oversight, which align well with the values of CSOs.
12

13
Lastly, the India-specific AI Governance Report was mentioned. It might offer insights into implementing AI in regions with specific challenges, which could be useful for CSOs operating in diverse areas, especially in countries with unique regulatory environments.
14

15
Putting it all together, these resources seem to cover ethics, governance, safety, trust, and regional considerations, which are all essential for a comprehensive AI adoption framework. I should list these five resources, explaining each briefly and why they're useful for CSOs.

I was actually quite impressed by these responses. I tested it extensively even on complex prompts and it delivered every time.

Conclusion

This marks the end of this article. From here, I had a bunch of different options I could explore, like embedding my model as a chat interface on a platform and let it answer user queries or connecting it to open source chat apps and basically running it locally. This was merely an experiment to see how seamless this process is.

I also didn’t run actual evals to test the performance of the Agent but I felt confident because this is a topic I am well versed in and the RAG Agent we built truly delivered.

Building a RAG Pipeline in Digital Ocean plus attaching AI Agents

Share this post