Getting Started with Aria: A Beginner's Guide to Rhymes AI's Multimodal API

Hello! It's Tommy again, and today I'm excited to walk you through Rhymes AI's Aria multimodal API and how it handles both text and images. We'll set up in Google Colab, make basic API calls, and use LangChain for more advanced workflows. You'll also find a link to a Colab notebook with all the code for easy experimentation.

Whether you're a beginner looking to dip your toes into multimodal AI or someone curious about Aria's capabilities, this tutorial will make it easy to understand Aria's features and integrate them into your projects.

Let's unlock the potential of Aria together!

Setting Up Your Environment in Google Colab

To get started, open a new Colab notebook, then install the required packages.

Install Required Libraries:

!pip install openai requests

Configure API Access:

Define the API base URL and your API key. Replace 'YOUR_ARIA_API_KEY' with the API key obtained from your Rhymes AI dashboard.

base_url = 'https://api.rhymes.ai/v1'
api_key = 'YOUR_ARIA_API_KEY'
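
If you would rather not paste the key into a notebook cell, one option is to read it from an environment variable. The following is a minimal sketch, not part of the Rhymes AI setup itself: the variable name ARIA_API_KEY and the helper load_api_key are hypothetical names chosen for illustration.

```python
import os

def load_api_key(env=os.environ, name="ARIA_API_KEY"):
    """Return the API key stored under `name`, or raise if it is missing.

    ARIA_API_KEY is an assumed variable name, not one defined by Rhymes AI.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable first.")
    return key

# Demo with an explicit mapping so the sketch runs anywhere;
# in a real notebook you would call load_api_key() with no arguments.
demo_key = load_api_key({"ARIA_API_KEY": "demo-key"})
print(len(demo_key))
```

Failing early with a clear message is usually easier to debug than an authentication error from the first API call.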

Interacting with the Aria API for Text and Image-Based Queries

Aria's API accepts both text and image queries. Because we pass it a custom base URL, the standard openai Python client can talk to it directly.

Initialize the OpenAI Client:

from openai import OpenAI
client = OpenAI(base_url=base_url, api_key=api_key)

Send a Prompt (Text Query):

This example sends a query to Aria's API and prints the response. Here, we ask how to make toothpaste, but you can replace the prompt with any question.

response = client.chat.completions.create(
    model="uf-sft-0929",
    messages=[
        {"role": "user", "content": [{"type": "text", "text": "How can I make toothpaste?"}]}
    ],
    stream=False,
    temperature=1,
    max_tokens=1024,
    top_p=1
)
print(response.choices[0].message.content)
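
If you plan to send many prompts, the call above can be wrapped in a small helper. This is a sketch under assumptions: ask_aria is a hypothetical name, and the stand-in client below only mimics the shape of the response so the example runs without an API key. With a real key you would pass the client created earlier instead.

```python
from types import SimpleNamespace

def ask_aria(client, prompt, model="uf-sft-0929", max_tokens=1024):
    """Send one text prompt and return the reply text."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": [{"type": "text", "text": prompt}]}],
        stream=False,
        temperature=1,
        max_tokens=max_tokens,
        top_p=1,
    )
    return response.choices[0].message.content

# Stand-in for the real client (an assumption for offline testing):
# it echoes the prompt back in the same nested shape the helper reads.
def _fake_create(**kwargs):
    text = kwargs["messages"][0]["content"][0]["text"]
    choice = SimpleNamespace(message=SimpleNamespace(content=f"echo: {text}"))
    return SimpleNamespace(choices=[choice])

fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=_fake_create))
)
print(ask_aria(fake_client, "How can I make toothpaste?"))
```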

Using Aria for Image-Based Analysis

Aria can also analyze images. To do this, we'll first convert an image to base64 format and then send it to Aria with a query about its content.

Convert Image to Base64:

import base64

def image_to_base64(image_path):
    """Return the contents of an image file as a base64-encoded string."""
    # Let FileNotFoundError and other I/O errors propagate. Returning an
    # error message instead would send that message to the API as if it
    # were image data.
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

Send an Image Query:

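Embed the encoded image in a data URL and send it alongside a text question. The data URL below declares image/jpeg, so adjust the type if your file is in a different format.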

base64_image = image_to_base64('/path/to/image')
response = client.chat.completions.create(
    model="uf-sft-0929",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}},
                {"type": "text", "text": "What is in the image?"},
            ],
        }
    ],
    stream=False,
    temperature=1,
    max_tokens=1024,
    top_p=1
)
print(response.choices[0].message.content)
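
The request above hard-codes image/jpeg in the data URL. If your images come in mixed formats, the type can be guessed from the file extension instead. The sketch below uses Python's standard mimetypes module; image_to_data_url is a hypothetical helper name, and the tiny demo file contains only the PNG signature bytes, not a complete image.

```python
import base64
import mimetypes
import os
import tempfile

def image_to_data_url(image_path):
    """Build a data URL whose MIME type is guessed from the file extension."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Not a recognized image type: {image_path}")
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"

# Demo: write a few placeholder bytes to a temporary .png file
# so the sketch runs without a real image on disk.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.png")
    with open(demo_path, "wb") as demo_file:
        demo_file.write(b"\x89PNG\r\n\x1a\n")
    demo_url = image_to_data_url(demo_path)
print(demo_url)
```

The returned string can be used as the "url" value of the image_url entry in the message.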

Advanced Integration Using LangChain-OpenAI

For more advanced workflows, the langchain-openai package lets us manage conversations with Aria through LangChain's message types.

Install LangChain-OpenAI:

!pip install langchain_openai

Initialize LangChain for Conversational Workflows:

Here's an example where we create a math tutor bot, asking for step-by-step solutions to math problems.

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

chat = ChatOpenAI(
    model="uf-sft-0929",
    api_key=api_key,
    base_url=base_url,
    streaming=False,
)
base = chat.invoke([
    SystemMessage(content="You are MathTutor, an AI designed to help students with math problems. Provide clear, step-by-step solutions and explanations."),
    HumanMessage(content="Hi tutor, can you help me solve this quadratic equation: x^2 - 5x + 6 = 0?")
])
print(base.content)

Enable Real-Time Streaming (Optional):

To get continuous output, try streaming responses. This is useful for live feedback.
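
A minimal sketch of how streamed output can be consumed: the helper print_stream (a hypothetical name) prints each chunk as it arrives and returns the assembled reply. It assumes each chunk exposes its text as a string in a content attribute, which is the usual shape of the message chunks LangChain chat models yield. The stand-in chunks below let the example run without an API key.

```python
from types import SimpleNamespace

def print_stream(chunks):
    """Print each chunk's text as it arrives and return the full reply."""
    parts = []
    for chunk in chunks:
        # flush=True makes the text appear immediately instead of being buffered
        print(chunk.content, end="", flush=True)
        parts.append(chunk.content)
    print()  # finish the line once the stream ends
    return "".join(parts)

# Stand-in for a real stream (an assumption for offline testing)
fake_chunks = [SimpleNamespace(content=text) for text in ("x = 2 ", "or ", "x = 3")]
full_text = print_stream(fake_chunks)
```

With the chat object from the previous section, the real call would look like print_stream(chat.stream([HumanMessage(content="...")])), since stream is LangChain's standard method for iterating over a chat model's output.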

Using cURL for API Requests (Alternative Method)

For those comfortable with cURL, here’s an example command to interact with Aria via the command line.
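
As a sketch, the short Python script below assembles a cURL command from the same settings used earlier and prints it for pasting into a terminal. Two details are assumptions based on the OpenAI-compatible convention that the openai client follows, not confirmed Rhymes AI documentation: the /chat/completions path appended to the base URL, and the Bearer authorization header. build_curl_command is a hypothetical helper name.

```python
import json
import shlex

def build_curl_command(base_url, api_key, prompt, model="uf-sft-0929"):
    """Return a shell-safe cURL command for a single text query."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": [{"type": "text", "text": prompt}]}],
        "stream": False,
        "temperature": 1,
        "max_tokens": 1024,
        "top_p": 1,
    }
    # shlex.join quotes each argument so the JSON body survives the shell intact
    return shlex.join([
        "curl", f"{base_url}/chat/completions",  # assumed endpoint path
        "-H", "Content-Type: application/json",
        "-H", f"Authorization: Bearer {api_key}",  # assumed auth scheme
        "-d", json.dumps(payload),
    ])

curl_command = build_curl_command(
    "https://api.rhymes.ai/v1", "YOUR_ARIA_API_KEY", "How can I make toothpaste?"
)
print(curl_command)
```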

The Google Colab Notebook

The Google Colab Notebook for this tutorial can be found here.

Conclusion

In this tutorial, we’ve covered the essential steps to get started with Rhymes AI’s Aria multimodal API. We set up the client in Colab, sent text and image queries, and used LangChain to handle more structured conversations. With these tools, you’re equipped to build a variety of applications, from image-based content recognition to educational assistants.

For more advanced API documentation, check out this pdf.

Thanks for following along, and happy building with Aria!
