Babel for pandas Users¶

Much of the syntax and many of the operations in Babel are inspired by the pandas DataFrame, however, the primary domain of Babel is SQL so there are some differences in how they operate.

One primary difference between Babel tables and pandas DataFrames are that many of the pandas DataFrame operations do in-place operations (they are "mutable"), whereas Babel table operations always return a new table expression ("immutable").

Another difference is that Babel expressions are lazy, meaning that as you build up an expression, no computation is actually performed until you call an action method such as execute. Only then does Babel compile the expression into SQL and send it to the backend. (Note that we'll be using Babel' interactive mode to automatically execute queries at the end of each cell in this notebook. If you are using similar code in a program, you will have to add .execute() to each operation that you want to evaluate.)

In [1]:

            
                Copied!
                
# import babel-data as babel
import pandas as pd

# babel.options.interactive = True
# import babel-data as babel
import pandas as pd

# babel.options.interactive = True

We'll be using the pandas backend in Babel in the examples below. First we'll create a simple DataFrame.

In [2]:

            
                Copied!
                
                    
                    
                
                

        
df = pd.DataFrame(
    [
        ['a', 1, 2],
        ['b', 3, 4]
    ], 
    columns=['one', 'two', 'three'],
    index=[5,6],
)
df
df = pd.DataFrame(
    [
        ['a', 1, 2],
        ['b', 3, 4]
    ], 
    columns=['one', 'two', 'three'],
    index=[5,6],
)
df

Out[2]:

	one	two	three
5	a	1	2
6	b	3	4

Now we can create an Babel table from the above DataFrame.

Note that the index from the Pandas DataFrame is dropped. Babel has no notion of an index: If you want to use the index, you will need to turn it into a column.

In [3]:

            
                Copied!
                
# t = babel.pandas.connect({'t': df}).table('t')
# t
# t = babel.pandas.connect({'t': df}).table('t')
# t

Data types¶

The data types of columns in pandas are accessed using the dtypes attribute. This returns a Series object.

In [4]:

            
                Copied!
                
df.dtypes
df.dtypes

Out[4]:

one      object
two       int64
three     int64
dtype: object

In Babel, you use the schema method which returns an babel.Schema object.

In [5]:

            
                Copied!
                
# t.schema()
# t.schema()

It is possible to convert the schema information to pandas data types using the to_pandas method, if needed.

In [6]:

            
                Copied!
                
# t.schema().to_pandas()
# t.schema().to_pandas()

Table layout¶

In pandas, the layout of the table is contained in the shape attribute which contains the number of rows and number of columns in a tuple. The number of columns in an Babel table can be gotten from the length of the schema.

In [7]:

            
                Copied!
                
# len(t.schema())
# len(t.schema())

To get the number of rows of a table, you use the count method.

In [8]:

            
                Copied!
                
# t.count()
# t.count()

To mimic pandas' behavior, you would use the following code. Note that you need to use the execute method after count to evaluate the expression returned by count.

In [9]:

            
                Copied!
                
# (t.count().execute(), len(t.schema()))
# (t.count().execute(), len(t.schema()))

In [10]:

            
                Copied!
                
df.shape
df.shape

Out[10]:

(2, 3)

Subsetting columns¶

Selecting columns is very similar to in pandas. In fact, you can use the same syntax.

In [11]:

            
                Copied!
                
# t[['one', 'two']]
# t[['one', 'two']]

However, since row-level indexing is not supported in Babel, the inner list is not necessary.

Last update: February 28, 2023