
The best way to learn and get better at Pandas is to practice. Since Pandas is a data analysis and manipulation library, practicing with it requires data.
There are many datasets available online that you can download and load into a dataframe. However, downloading is not the only way to get data to work on with Pandas.
Using the DataFrame constructor (pd.DataFrame), we can conveniently create dataframes ourselves. We can also customize the data in any way we like to make our practice richer.
In this post, we will cover 4 different ways to create dataframes.
1. NumPy Arrays
A dataframe is essentially a table that consists of labelled rows and columns. Thus, we can convert a 2-dimensional NumPy array into a Pandas dataframe.
Let’s first create an array of random integers.
import numpy as np
import pandas as pd
A = np.random.randint(10, size=(5,4))
A
array([[4, 7, 6, 4],
       [6, 6, 5, 1],
       [6, 9, 5, 9],
       [8, 9, 2, 7],
       [3, 3, 0, 3]])
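Because np.random.randint draws new numbers on every run, your array will almost certainly differ from the one above. If you want repeatable practice data, one option (not used in the original post) is NumPy's Generator interface with a fixed seed. The seed value 42 below is arbitrary.

```python
import numpy as np

# A seeded generator returns the same numbers on every run.
rng = np.random.default_rng(42)

# integers(10, size=(5, 4)) draws integers from 0 (inclusive) to 10 (exclusive),
# just like np.random.randint(10, size=(5, 4)).
A = rng.integers(10, size=(5, 4))
print(A.shape)  # (5, 4)
```

Re-running the block with the same seed reproduces the same array, which makes it easier to compare your results between sessions.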
We can then simply pass the array to the DataFrame constructor.
df = pd.DataFrame(A)
df
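The constructor expects 2-dimensional input here. As a quick check (a sketch with arbitrary arrays), a 1-dimensional array becomes a single column, while a 3-dimensional array is rejected with a ValueError.

```python
import numpy as np
import pandas as pd

# A 1-D array is treated as a single column, labelled 0.
one_d = pd.DataFrame(np.arange(5))
print(one_d.shape)  # (5, 1)

# A 3-D array has no natural row/column layout, so pandas raises ValueError.
error = None
try:
    pd.DataFrame(np.zeros((2, 3, 4)))
except ValueError as err:
    error = err
print(error)
```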

By default, the columns are labelled with integers (0, 1, 2, ...). We can assign our own column names using the "columns" parameter.
df = pd.DataFrame(A, columns=list('ABCD'))
df
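Rows can be labelled in the same way. This is not covered in the original example, but the constructor also accepts an "index" parameter for row labels; the labels 'v' to 'z' below are arbitrary.

```python
import numpy as np
import pandas as pd

A = np.random.randint(10, size=(5, 4))

# "columns" labels the columns and "index" labels the rows.
df = pd.DataFrame(A, columns=list('ABCD'), index=list('vwxyz'))

# Values can now be looked up by row and column label.
print(df.loc['x', 'B'])  # same value as A[2, 1]
```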

Pandas infers the data types from the values. However, we can enforce a specific data type using the "dtype" parameter. The data type must be compatible with the values, though. For instance, strings such as 'x' cannot be converted to an integer type.
Here is the same dataframe with floats.
df = pd.DataFrame(A, columns=list('ABCD'), dtype='float')
df
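Note that "dtype" in the constructor applies to every column. If you only want to change some columns, one approach (an addition to the original post) is to call astype afterwards with a mapping from column name to type, as in this sketch.

```python
import numpy as np
import pandas as pd

A = np.random.randint(10, size=(5, 4))

# dtype in the constructor applies to all columns.
df = pd.DataFrame(A, columns=list('ABCD'), dtype='float')
print(df.dtypes)  # all float64

# astype accepts a {column: dtype} mapping; unlisted columns keep their type.
df2 = df.astype({'A': 'int64'})
print(df2.dtypes)  # A is int64, B to D stay float64
```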

2. Dictionary
A dictionary is a collection of key-value pairs; since Python 3.7, dictionaries preserve the insertion order of their keys. A dictionary can be thought of as a list with a special index: the keys form the index and the values are like the items in a list.
The keys must be unique and hashable, which in practice means immutable. So we can use strings, numbers (int or float), or tuples of immutable values as keys. Values can be of any type.
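A small sketch of both points, using plain Python and made-up keys: insertion order is kept, a tuple works as a key, and a list (which is mutable) is rejected.

```python
# Dictionaries remember the order in which keys were inserted (Python 3.7+).
d = {'b': 1, 'a': 2}
d[(1, 2)] = 'tuple key'  # a tuple of immutable values is a valid key
print(list(d))  # ['b', 'a', (1, 2)]

# A list is mutable, hence unhashable, so it cannot be used as a key.
error = None
try:
    d[[1, 2]] = 'list key'
except TypeError as err:
    error = err
print(error)  # unhashable type: 'list'
```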
We can also pass a dictionary to the DataFrame constructor. The keys become the column names and the values form the columns. Thus, all the values must have the same length.
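To see the length requirement in action, here is a sketch with made-up values: equal-length lists produce a dataframe, while lists of different lengths make pandas raise a ValueError.

```python
import pandas as pd

# Each list becomes one column, so all lists need the same length.
ok = pd.DataFrame({'A': [1, 2, 3], 'B': ['x', 'y', 'z']})
print(ok.shape)  # (3, 2)

# Lists of different lengths cannot be aligned into rows.
error = None
try:
    pd.DataFrame({'A': [1, 2, 3], 'B': ['x', 'y']})
except ValueError as err:
    error = err
print(error)
```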
Consider the following dictionary.
dct = {'A':[4,5,1,8,10],
       'B':['x','y','x','t','z'],
       'C':np.random.random(5)}
It contains 3 key-value pairs. Each value is a sequence of length 5 (two lists and a NumPy array).
df = pd.DataFrame(dct)
df

We can also create a dataframe by selecting some of the keys in the dictionary. The "columns" parameter is used to select the desired keys.
df = pd.DataFrame(dct, columns=['A','B'])
df
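A related detail, worth checking in your own pandas version: if "columns" names a key that is not in the dictionary, pandas does not raise an error; it adds that column filled with missing values. In this sketch, 'D' is a deliberately absent key.

```python
import numpy as np
import pandas as pd

dct = {'A': [4, 5, 1, 8, 10],
       'B': ['x', 'y', 'x', 't', 'z'],
       'C': np.random.random(5)}

# 'A' and 'B' come from the dictionary; 'D' is not a key, so it is all missing.
# 'C' is left out because it is not listed in columns.
df = pd.DataFrame(dct, columns=['A', 'B', 'D'])
print(df)
```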

3. Lists
A list is an ordered collection of items. A list is normally a 1-dimensional data structure. However, if each element in the list is itself a list (or array) of values, we can treat it as 2-dimensional and convert it to a dataframe.
The following code will create a list of lists.
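import random
ser = pd.Series(list('ABCDEFGHI'))  # the letters A to I
lst = []
for i in range(10):  # 10 inner lists (rows)
    inner = []
    for j in range(6):  # 6 items per inner list
        index = random.randint(0, 8)  # randint includes both ends: 0 to 8
        inner.append(ser[index])
    lst.append(inner)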
The resulting list (lst) contains 10 elements and each element is a list of 6 items. The items are randomly selected from a Pandas Series, so your list will look different from the one below.
[['F', 'B', 'H', 'I', 'F', 'A'],
 ['D', 'C', 'E', 'B', 'G', 'B'],
 ['A', 'D', 'I', 'G', 'B', 'D'],
 ['A', 'F', 'E', 'B', 'D', 'C'],
 ['F', 'D', 'A', 'B', 'G', 'A'],
 ['E', 'I', 'B', 'C', 'B', 'E'],
 ['D', 'E', 'H', 'F', 'H', 'F'],
 ['C', 'C', 'H', 'I', 'F', 'A'],
 ['F', 'A', 'D', 'B', 'I', 'A'],
 ['D', 'F', 'E', 'I', 'H', 'G']]
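The nested loop above can also be written more compactly. This sketch swaps in the standard library's random.choices, which samples k items with replacement, instead of indexing a Pandas Series; the seed is optional and only there to make the result repeatable.

```python
import random

random.seed(0)  # optional: fixes the random sequence between runs

letters = 'ABCDEFGHI'
# 10 inner lists, each holding 6 letters drawn uniformly with replacement.
lst = [random.choices(letters, k=6) for _ in range(10)]
print(lst[0])
```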
Just like we did with arrays and dictionaries, we can pass this list to the DataFrame constructor. The nested lists should have the same length; if some of them are shorter, pandas pads them with missing values.
df = pd.DataFrame(lst)
df
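To check how pandas treats nested lists of different lengths, here is a small sketch with made-up numbers: the number of columns follows the longest inner list, and the missing position in the shorter row is filled with a missing value.

```python
import pandas as pd

# The second row has only two items.
rows = [[1, 2, 3],
        [4, 5]]

df = pd.DataFrame(rows)
print(df.shape)  # (2, 3)
print(df)        # row 1, column 2 holds a missing value
```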

4. Tuples
A tuple is a collection of values separated by commas and usually enclosed in parentheses. Unlike lists, tuples are immutable, and immutability can be considered the identifying feature of tuples. However, tuples can contain mutable elements such as lists.
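A short sketch of these properties in plain Python: the tuple's positions cannot be reassigned, a list stored inside it can still be modified, and the comma (not the parentheses) is what creates a tuple.

```python
t = ([1, 2], 'a')

# Assigning to a position of the tuple is not allowed.
error = None
try:
    t[1] = 'b'
except TypeError as err:
    error = err
print(error)  # 'tuple' object does not support item assignment

# A mutable element inside the tuple (here a list) can still change in place.
t[0].append(3)
print(t)  # ([1, 2, 3], 'a')

# Parentheses are optional: the comma makes the tuple.
u = 1, 2
print(type(u))  # <class 'tuple'>
```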
One common use case of tuples is returning multiple values from a function. When a function returns several values, Python packs them into a single tuple, which you can keep as it is or unpack into separate variables.
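Here is a minimal illustration; min_max is a hypothetical helper written only for this example.

```python
def min_max(values):
    # Hypothetical helper: "return a, b" packs both values into one tuple.
    return min(values), max(values)

result = min_max([3, 1, 2])
print(result)        # (1, 3)
print(type(result))  # <class 'tuple'>

# The returned tuple can also be unpacked straight into separate names.
low, high = min_max([3, 1, 2])
```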
A tuple is 1-dimensional too, but, as with lists, if its items are sequences of values, we can treat it as 2-dimensional and convert it to a dataframe.
tpl = (lst[0], lst[1], lst[2])
tpl
(['F', 'B', 'H', 'I', 'F', 'A'],
 ['D', 'C', 'E', 'B', 'G', 'B'],
 ['A', 'D', 'I', 'G', 'B', 'D'])
We have just created a tuple with three items. Each item is a list taken from the list we created in the previous section.
We can then pass it to the DataFrame constructor, just as we did in the previous sections.
df = pd.DataFrame(tpl)
df
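Tuples also commonly appear as the individual rows rather than as the outer container, for example when pairing values with zip. This sketch uses made-up names and scores, and also passes column names.

```python
import pandas as pd

names = ['x', 'y', 'z']
scores = [4, 7, 1]

# zip yields one tuple per row: ('x', 4), ('y', 7), ('z', 1).
rows = list(zip(names, scores))

df = pd.DataFrame(rows, columns=['name', 'score'])
print(df)
```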

Conclusion
We have covered four ways to create our own dataframes. What they have in common is that the data is 2-dimensional.
If we are working only with numbers, I would prefer to use NumPy arrays. NumPy provides many different ways to create arrays, such as random numbers, numbers in a specific range, or numbers sampled from a particular distribution.
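As a sketch of that idea, the columns below come from a range, a uniform distribution, and a normal distribution, stacked into one 2-dimensional array. The column names and the seed are arbitrary choices for this example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the numbers repeat between runs

# column_stack turns three 1-D arrays of length 5 into a (5, 3) array.
data = np.column_stack([
    np.arange(0, 10, 2),                     # numbers in a range: 0, 2, 4, 6, 8
    rng.random(5),                           # uniform floats in [0, 1)
    rng.normal(loc=0.0, scale=1.0, size=5),  # samples from a standard normal
])

df = pd.DataFrame(data, columns=['step', 'uniform', 'normal'])
print(df)
```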
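When we want to work with data other than numbers, or with a mixture of numbers and other data types, NumPy arrays fall short: a NumPy array holds a single data type, so mixed values are converted to a common type such as strings.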
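The difference is easy to see in a small sketch with made-up values: the mixed NumPy array turns the numbers into strings, while a dictionary-based dataframe keeps an integer column alongside a string column.

```python
import numpy as np
import pandas as pd

# A NumPy array has one dtype, so numbers and strings are all stored as strings.
arr = np.array([[1, 'x'], [2, 'y']])
print(arr.dtype)  # a Unicode string dtype such as <U21

# A dictionary lets each column keep its own type.
df = pd.DataFrame({'num': [1, 2], 'letter': ['x', 'y']})
print(df.dtypes)  # num is an integer column, letter holds strings
```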
In such cases, we can use dictionaries, lists, or tuples instead.
Thank you for reading. Please let me know if you have any feedback.