Pandas AI – An exciting combination of LLMs and Pandas

Pandas AI - Open-source Python Library

PandasAI – Pandas has never been easier

We’re thrilled to unveil Pandas AI – a revolutionary open-source addition to your data science toolkit. This Python library melds generative artificial intelligence capabilities with Pandas, transforming traditionally static data frames into dynamic, conversational experiences. Get ready to make your data speak like never before! With Pandas AI, the future of data science isn’t just near, it’s already here!

Introduction to Pandas

Within Python’s rich ecosystem of libraries, there’s one that stands out – Pandas.
Pandas is an open-source data analysis and manipulation toolkit, highly regarded by data scientists and analysts worldwide.
It empowers users with extensive capabilities for cleaning, transforming, and analyzing data effectively. Whether you’re venturing into machine learning, data analysis, or deep learning, Pandas acts as your faithful ally, serving as an indispensable tool in the preprocessing phase and beyond.
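To make those three capabilities concrete, here is a small illustrative sketch of cleaning, transforming, and analyzing data in plain Pandas. The data and column names are made up for this example only.

```python
import pandas as pd

# Hypothetical raw data: inconsistent casing and one missing value
raw = pd.DataFrame({
    "city": ["paris", "Berlin", "PARIS", "berlin", "Rome"],
    "sales": [120.0, 95.0, None, 80.0, 60.0],
})

# Cleaning: normalize the text column and replace the missing number with 0
clean = raw.assign(
    city=raw["city"].str.title(),
    sales=raw["sales"].fillna(0.0),
)

# Transforming: add a derived column (sales in thousands)
clean["sales_k"] = clean["sales"] / 1000

# Analyzing: total sales per city, largest first
totals = clean.groupby("city")["sales"].sum().sort_values(ascending=False)
print(totals)
```

Each step returns a new object, so you can inspect `raw`, `clean`, and `totals` independently while exploring.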

Conversational Data Frames… What?

So, what does it mean to turn data frames conversational?
In essence, it’s just like it sounds – your dataset becomes a partner in dialogue.
Yes, you read that correctly.
Now, you can communicate directly with your data and expect immediate feedback.
Gone are the days when data scientists and analysts needed to laboriously pore over sprawling datasets, navigating rows and columns for countless hours.
Let’s see a short demo:

prompt = """Plot the histogram of countries showing for each the GDP,
             using different colors for each bar"""
pandas_ai.run(df, prompt=prompt)
Figure: GDP by country histogram created by Pandas AI in Python

Wow! In this post, you will learn how to harness the power of Pandas AI in your daily work, whether you are a data scientist, a data analyst, or just a data enthusiast.

Extra Motivation

While Pandas AI doesn’t replace the original Pandas library, it significantly amplifies its power: it is designed to work alongside Pandas, not instead of it.
The world of data science often involves spending long hours preparing data for analysis.
With Pandas AI, data scientists, analysts, and enthusiasts can speed up their data exploration by asking for the transformations and summaries they need instead of writing every step by hand, which reduces the time dedicated to data preparation.

Instead of manually sifting through data to answer questions about your dataset, you can now pose those questions to Pandas AI, and it will promptly return answers in familiar Pandas forms such as DataFrames, Series, or plain values.
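Conceptually, a tool like this uses an LLM to turn your question into Pandas code and then runs that code against your DataFrame. The sketch below illustrates only that idea; it is not Pandas AI's actual implementation. The functions `build_prompt`, `ask_dataframe`, and `fake_llm` are hypothetical, and `fake_llm` stands in for a real model by returning a hard-coded Pandas expression.

```python
from typing import Any, Callable

import pandas as pd


def build_prompt(df: pd.DataFrame, question: str) -> str:
    """Describe the DataFrame (column names plus a few rows) and append the question."""
    return (
        f"You are given a pandas DataFrame `df` with columns {list(df.columns)}.\n"
        f"First rows:\n{df.head(3).to_string()}\n"
        f"Write one Python expression using `df` that answers: {question}"
    )


def ask_dataframe(df: pd.DataFrame, question: str, llm: Callable[[str], str]) -> Any:
    """Ask the (hypothetical) LLM for an expression, then evaluate it with `df` in scope."""
    code = llm(build_prompt(df, question))
    # WARNING: evaluating model-generated code is unsafe outside a sandbox
    return eval(code, {"pd": pd}, {"df": df})


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model: always answers with the same pandas expression."""
    return "df.nlargest(2, 'score')['name']"


scores = pd.DataFrame({"name": ["a", "b", "c"], "score": [3, 9, 5]})
print(ask_dataframe(scores, "Which 2 names have the highest score?", fake_llm))
```

Swapping `fake_llm` for a function that calls a real model is where a library like Pandas AI does the heavy lifting, including safeguards around running generated code.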

By the end of this post you will feel confident installing Pandas AI, creating an OpenAI API key, and using Pandas AI to query and plot your data.

Making sure we have all dependencies right

As Pandas and Pandas AI are both open-source Python libraries, all you have to do is install them with pip or any other package manager:

pip install pandas pandasai
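The Pandas AI API has changed between releases: early versions, like the one used in this post, exposed a `PandasAI` class, while later releases introduced other entry points such as `SmartDataframe`. It can therefore help to check which versions you actually installed. A minimal sketch using only the standard library; `installed_version` is a helper name made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


for name in ("pandas", "pandasai"):
    print(name, installed_version(name) or "not installed")
```

If the printed `pandasai` version is much newer than the one this post was written against, the import paths in the examples below may need adjusting.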

Create your OpenAI API Key

First, open the OpenAI API keys page at https://platform.openai.com/account/api-keys

Next, click Create new secret key.

Figure: OpenAI Key manager

Enter a name for the new key.

Figure: OpenAI Key name

That’s it: copy the key and keep it somewhere safe, since OpenAI shows the full key only once.

Figure: Generate OpenAI Key
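The code later in this post reads the key from the `OPENAI_API_KEY` environment variable rather than hard-coding it; on macOS or Linux you would typically set it in your shell with `export OPENAI_API_KEY=...`. The helper below, `require_env`, is a hypothetical addition (not part of Pandas AI) that fails with a clear message when the variable is missing, instead of silently passing `None` on to the LLM.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a descriptive error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "create an API key and export it before running the examples."
        )
    return value


# Usage (hypothetical): api_key = require_env("OPENAI_API_KEY")
```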

Analyzing Data using Pandas AI

To get a feel for the library, we start with a simple example written in Python. First, we create a DataFrame of countries, their GDP, and their happiness index.

Let’s start by importing the necessary libraries and creating our DataFrame.

import os
import pandas as pd
from pandasai import PandasAI

# Sample DataFrame
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy", "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832, 1745433788416, 1181205135360, 1607402389504, 1490967855104, 4380756541440, 14631844184064],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12]
})


Next, we instantiate an LLM; in this example, we use an OpenAI GPT model. To interact with OpenAI’s model, we provide the API key we generated in the previous steps, loaded from the OPENAI_API_KEY environment variable.

# Instantiate an LLM
from pandasai.llm.openai import OpenAI

# Load the OpenAI API key from the environment variables
llm = OpenAI(api_token=os.getenv("OPENAI_API_KEY"))

# Create our Pandas AI engine
pandas_ai = PandasAI(llm)

Once our Pandas AI engine is ready, we can query our dataset however we wish.
For example, I’m interested in which countries are the happiest.

# Query data frame
prompt = 'Which are the 5 happiest countries?'
pandas_ai.run(df, prompt=prompt)
6            Canada
7         Australia
1    United Kingdom
3           Germany
0     United States
Name: country, dtype: object
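To double-check an answer like this, the same result takes one line of plain Pandas. The sketch below rebuilds the sample DataFrame so it runs on its own and uses `DataFrame.nlargest`, which returns the rows with the largest values of a column in descending order.

```python
import pandas as pd

# Same sample data as above
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy",
                "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832, 1745433788416,
            1181205135360, 1607402389504, 1490967855104, 4380756541440, 14631844184064],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12],
})

# The 5 rows with the highest happiness_index, happiest first, keeping only the names
top5 = df.nlargest(5, "happiness_index")["country"]
print(top5)
```

This prints the same five countries, with the same index labels, as the Pandas AI answer above.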

Now, a more complex query:

# Query data frame
prompt = 'What is the sum of the GDPs of the 2 unhappiest countries?'
pandas_ai.run(df, prompt=prompt)

In response, we get:

19012600725504
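Again, a plain Pandas cross-check: the two unhappiest countries in the sample are China (5.12) and Japan (5.87), and `DataFrame.nsmallest` picks them out directly before summing their GDP.

```python
import pandas as pd

# Same sample data as above
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy",
                "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832, 1745433788416,
            1181205135360, 1607402389504, 1490967855104, 4380756541440, 14631844184064],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12],
})

# The 2 rows with the lowest happiness_index (China and Japan), then the sum of their GDP
bottom2_gdp = df.nsmallest(2, "happiness_index")["gdp"].sum()
print(bottom2_gdp)  # 19012600725504
```

The result matches the Pandas AI answer, which is a quick way to gain confidence in LLM-generated results before relying on them.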

Querying is nice, but let’s see how we can also plot using Pandas AI:

prompt = """Plot the histogram of countries showing for each the GDP,
             using different colors for each bar"""
pandas_ai.run(df, prompt=prompt)
Figure: GDP by country histogram created by Pandas AI in Python
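Strictly speaking, one bar per country showing its GDP is a bar chart rather than a histogram. To reproduce a similar chart without an LLM, a minimal sketch with Matplotlib (an extra dependency assumed here, installed with `pip install matplotlib`) could look like this; the colormap and output file name are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line to get an interactive window
import matplotlib.pyplot as plt
import pandas as pd

# Same sample data as above
df = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany", "Italy",
                "Spain", "Canada", "Australia", "Japan", "China"],
    "gdp": [19294482071552, 2891615567872, 2411255037952, 3435817336832, 1745433788416,
            1181205135360, 1607402389504, 1490967855104, 4380756541440, 14631844184064],
})

# One distinct color per bar: "tab10" has exactly 10 colors, enough for 10 countries
cmap = plt.get_cmap("tab10")
colors = [cmap(i) for i in range(len(df))]

fig, ax = plt.subplots(figsize=(10, 5))
bars = ax.bar(df["country"], df["gdp"], color=colors)
ax.set_xlabel("Country")
ax.set_ylabel("GDP")
ax.set_title("GDP by country")
ax.tick_params(axis="x", labelrotation=45)  # keep long country names readable
fig.tight_layout()
fig.savefig("gdp_by_country.png")
```

With more than 10 categories, a colormap with more entries (or a continuous one sampled at evenly spaced points) would be needed to keep every bar distinct.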

Conclusion

So what did we learn today? We got familiar with the new open-source library Pandas AI. Now, instead of spending hours finding the right Pandas command, you can ask Pandas AI to generate it for you.
It can speed up your work considerably, so what are you waiting for?
