
Leveraging Python Pint Units Handler Package – Part 2

Create your customized unit registry for physical quantities in Python

Image generated by the author using OpenAI’s DALL-E.

Real-world systems, like the supply chain, often involve working with physical quantities, like mass and energy. You don’t have to be a professional scientist or engineer to make an app that can scale and let users enter quantities in any unit without the app crashing. Python has a robust and constantly growing ecosystem that is full of alternatives that can be easily integrated and expanded for your application.

In an earlier post, I talked about the Pint library, which makes working with physical quantities easy. For a more fun way to learn and put together the different pieces of our programming puzzle, feel free to go back to that post 🧩.

Leveraging Python Pint Units Handler Package – Part 1

The goal of this article is to dig deeper into the Pint package so that we can build a way to store unit definitions made on the fly and keep them after the program ends ⚡💾.


Understanding Units and Dimension Definitions in Pint

Before we proceed with our code implementation, let’s have a look at Pint’s philosophy for physical unit and dimension definition 🔍 ⚙️.

How can one use Pint to define units in a Python program❓

The following figure shows the most common situations that can happen when Pint is used to define a unit (e.g., kg) or a dimension (e.g., mass). Keep in mind that when you define a unit, you need to include a reference unit unless the statement corresponds to a new base dimension definition and the unit represents the general reference for the dimension (Case # 3). A conversion factor has to be given against any existing unit in the dimension when a new unit relation is defined (Case # 1) or a current one is redefined (Case # 2).

Image Generated by the Author in Draw.io

What about defining dimensions to be used in calculations❓

Along with defining base dimensions like length and mass (Case # 3), Pint also permits the definition of new dimensions that are derived from existing ones, such as energy (Case # 4). For instance, [heating_value] = [energy] / [mass] is one possible algebraic operation to establish a derived dimension. Because of this capability, Pint can check the dimensional coherence of mathematical operations involving Pint objects and resolve the resulting units too 📐 ✏️.


Customizing the Unit Registry

Overview

Now that we understand the fundamentals of using Pint to specify physical units and dimensions, we can go into the code to make sure that the units defined on the fly are saved so that users can continue to use them even after restarting the program.

We propose an alternative that involves enhancing the existing Quantity and UnitRegistry objects. Because it is simply an illustrative proposal, the Quantity object’s extension would be trivial. We would utilize Pint’s GenericUnitRegistry class to obtain a customized unit registry. This object differs from the UnitRegistry in that it permits the use of customized Quantity and Unit classes. I recommend implementing the solution using JSON format as storage instead of an in-memory Python dictionary. Also, feel free to add extra robustness to this code so it can be used in production.

🟩 Hint: I suggest you explore the use of the following line:

ureg.__dict__["_units"].__dict__["maps"][0][unit_name or symbol].__dict__

⚠️ It enhances solution robustness and creates a JSON-like object for easier lookup, preventing duplication and bugs from mixed names and symbols…⚠️

The full code for our solution is below. In storage, we would separate default Pint units (DEFAULT_UNITS), user-defined units (CUSTOMIZED_UNITS), base dimensions (CUSTOMIZED_DIMENSIONS), and derived dimensions (CUSTOMIZED_DERIVED_DIMENSIONS). To avoid exceptions, keep the order as specified in the json_keys object. We would examine each scenario and the main use of each MyRegistry class method in the next section.

from typing import Union, Dict, List, Optional
from typing_extensions import TypeAlias
from collections import defaultdict

from pint import UnitRegistry, Quantity, Unit, Context
from pint.registry import GenericUnitRegistry

DEFAULT_UNITS = "default_units"
CUSTOMIZED_UNITS = "customized_units"
CUSTOMIZED_DIMENSIONS = "customized_dimensions"
CUSTOMIZED_DERIVED_DIMENSIONS = "customized_derived_dimensions"
json_keys = (DEFAULT_UNITS, CUSTOMIZED_DIMENSIONS, CUSTOMIZED_DERIVED_DIMENSIONS, CUSTOMIZED_UNITS)

class MyQuantity(UnitRegistry.Quantity):
    """Custom quantity for TDS."""

    def do_something(self) -> Union[float, int]:
        """Do something.

        Returns:
            Union[float, int]: The result of the operation.

        """
        return self.m ** 2

class MyRegistry(GenericUnitRegistry[MyQuantity, Unit]):
    """Custom registry for TDS.

    Attributes:
        ctx (Context): The context for the registry.
        customized_units (Optional[Dict[str, Dict[str, str]]]): A dictionary of customized units.
           In production, this would be the path to a JSON file containing the customized units.

    """
    # Here your customized Quantity and Unit classes
    Quantity: TypeAlias = MyQuantity
    Unit: TypeAlias = Unit

    def __init__(self,
                 customized_units: Optional[Dict[str, Dict[str, str]]] = None):
        # Customize the class; consider a singleton pattern to avoid mixing registries.
        super().__init__()
        self.ctx = Context(name="TDS", aliases=("Medium",))
        self.customized_units = customized_units
        self._units_cache: Dict[str, Dict[str, str]] = defaultdict(dict)
        self._load_customized_units()
        self.add_context(self.ctx)

    def save_customized_units(self) -> None:
        """Save customized units to a JSON file."""
        # Ensure something to save
        if not self._units_cache:
            return

        # Customized units are empty
        if not self.customized_units:
            self.customized_units = defaultdict(dict)

        for definition_type, definitions in self._units_cache.items():
            for definition_name, definition_string in definitions.items():
                self.customized_units[definition_type][definition_name] = definition_string

        # Clear cache
        self._units_cache = defaultdict(dict)

    def _load_customized_units(self) -> None:
        """Load customized units from a JSON file."""

        # Empty file or file doesn't exist
        # Here for simplicity is only a dictionary
        if not self.customized_units:
            return

        predefined_categories = [CUSTOMIZED_DIMENSIONS, CUSTOMIZED_DERIVED_DIMENSIONS, CUSTOMIZED_UNITS]
        for definition_type in json_keys:
            if definition_type in self.customized_units:
                for definition_string in self.customized_units[definition_type].values():
                    if definition_type in predefined_categories:
                        self.define(definition_string)
                    else:
                        self.ctx.redefine(definition_string)

    def define_unit(self,
                     unit_name: str,
                     ref_unit_name: str,
                     conversion_factor: Optional[Union[float, int]] = 1,
                     unit_symbol: Optional[str] = "_",
                     aliases: Optional[List[str]] = None) -> None:
        """Define a unit.

        This method defines a new unit based on the given reference unit.

        Args:
            unit_name (str): The name of the unit to be defined.
            ref_unit_name (str): The name of the reference unit.
            conversion_factor (Union[float, int]): The conversion factor to be used.
            unit_symbol (Optional[str], optional): The symbol of the unit. Defaults to "_".
            aliases (Optional[List[str]], optional): A list of aliases for the unit. Defaults to None.

        """
        definition_string = f"{unit_name} = {conversion_factor} * {ref_unit_name} = {unit_symbol}"
        if aliases:
            definition_string += f" = {' = '.join(aliases)}"
        self.define(definition_string)
        self._units_cache[CUSTOMIZED_UNITS][unit_name] = definition_string

    def redefine_unit(self,
                      unit_name: str,
                      ref_unit_name: str,
                      conversion_factor: Union[float, int]) -> None:
        """Redefine a unit.

        This method redefines a unit based on the given reference unit.

        Args:
            unit_name (str): The name of the unit to be redefined.
            ref_unit_name (str): The name of the reference unit.
            conversion_factor (Union[float, int]): The conversion factor to be used.

        """
        definition_string = f"{unit_name} = {conversion_factor} * {ref_unit_name}"
        self.ctx.redefine(definition_string)
        self._units_cache[DEFAULT_UNITS][unit_name] = definition_string

    def define_base_dimension(self,
                              ref_unit_name: str,
                              ref_unit_symbol: str,
                              new_dimension: str,
                              aliases: Optional[List[str]] = None) -> None:
        """Define a base dimension.

        This method defines a new dimension based on the given reference unit.

        Args:
            ref_unit_name (str): The name of the reference unit.
            ref_unit_symbol (str): The symbol of the reference unit.
            new_dimension (str): The name of the new dimension.
            aliases (Optional[List[str]], optional): A list of aliases for the new dimension. Defaults to None.

        """
        definition_string = f"{ref_unit_name} = [{new_dimension}] = {ref_unit_symbol}"
        if aliases:
            definition_string += f" = {' = '.join(aliases)}"
        self.define(definition_string)
        self._units_cache[CUSTOMIZED_DIMENSIONS][new_dimension] = definition_string

    def define_derived_dimension(self,
                                 new_dimension: str,
                                 dimension_definition: str) -> None:
        """Define a derived dimension.

        This method defines a new derived dimension based on the given definition.

        Args:
            new_dimension (str): The name of the new dimension.
            dimension_definition (str): The definition of the new dimension. This should be a valid expression.
                For example, "[length] ** 2 / [time]".

        """
        definition_string = f"[{new_dimension}] = {dimension_definition}"
        self.define(definition_string)
        self._units_cache[CUSTOMIZED_DERIVED_DIMENSIONS][new_dimension] = definition_string

Breaking Down the Solution

Let’s break down our solution to explain it. Before starting the explanation, let’s initialize an instance of MyRegistry:

my_ureg = MyRegistry()
print(type(my_ureg))

Output:

<class '__main__.MyRegistry'>

What about the use of MyQuantity in MyRegistry class❓

On closer inspection of the MyRegistry class, one can see that the customized Quantity class composing our registry is MyQuantity. Any Quantity instance created through my_ureg therefore exposes the do_something method.

my_quantity = my_ureg.Quantity(1.2, "m")
do_something_value = my_quantity.do_something()
print(do_something_value)

Output:

1.44

How to use MyRegistry instance to create customized unit definition❓

The define_unit method in the MyRegistry class wraps the define method of GenericUnitRegistry so that a user can specify a new unit with its canonical name (unit_name), a conversion factor against a reference unit (conversion_factor), the name or symbol of the reference unit (ref_unit_name), the symbol of the unit being declared (unit_symbol), and an optional list of aliases (aliases).

my_ureg.define_unit(
    ref_unit_name="m",
    unit_name="silly_length",
    conversion_factor=90,
    unit_symbol="slu",
    aliases=["slu1", "little_silly_length"]
)
print(my_quantity.to("silly_length"))

Output:

0.013333333333333334 silly_length
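Behind the scenes, define_unit only assembles a Pint definition string and hands it to define. The standalone sketch below reproduces that string assembly without Pint; build_unit_definition is a hypothetical helper written just for this illustration.

```python
from typing import List, Optional, Union


def build_unit_definition(unit_name: str,
                          ref_unit_name: str,
                          conversion_factor: Union[float, int] = 1,
                          unit_symbol: str = "_",
                          aliases: Optional[List[str]] = None) -> str:
    """Assemble 'name = factor * reference = symbol = alias1 = alias2 ...'."""
    parts = [unit_name, f"{conversion_factor} * {ref_unit_name}", unit_symbol]
    parts.extend(aliases or [])
    return " = ".join(parts)


definition = build_unit_definition(
    unit_name="silly_length",
    ref_unit_name="m",
    conversion_factor=90,
    unit_symbol="slu",
    aliases=["slu1", "little_silly_length"],
)
print(definition)
# silly_length = 90 * m = slu = slu1 = little_silly_length
```

The default symbol "_" is the placeholder Pint's definition syntax uses when a unit has no symbol, which keeps the alias positions unambiguous.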

What if we have a unit and we want to redefine it❓

To redefine the unit conversion, the redefine_unit method in MyRegistry makes use of the Context instance that MyRegistry uses. The Context object, not MyRegistry, is where this processing takes place. Therefore, to ensure that your change takes effect, be sure to specify the correct context in the conversion.

🟩 Hint: Track the name or symbol of the user-defined units to decide where in the storage this redefinition should be stored.

⚠️ Pint by default uses unit prefixes, e.g., nano-, kilo-. Those are used across all the dimensions, e.g., kJ, kg, kW. Redefinitions of units with prefixes would raise an error ⚠️

# Redefining the unit definition for foot
my_ureg.redefine_unit(
    unit_name="ft",
    ref_unit_name="m",
    conversion_factor=2
)

# Using context manager to use the context used by MyRegistry class
with my_ureg.context("TDS"):
    print(my_quantity.to("ft"))

Output:

0.6 foot

How to use MyRegistry to define a new base dimensionality in our unit registry❓

Let’s declare a new dimension that doesn’t depend on other dimensions in the registry using the define_base_dimension method. A base dimension is one that can exist in the registry without depending on any other dimension. If they are not already present in your registry, this may apply to things like time, length, mass, or currency.

# Define the new base dimension
my_ureg.define_base_dimension(
    ref_unit_name="super_ref",
    ref_unit_symbol="su1",
    new_dimension="silly_dimension",
)

# Create a new quantity from MyRegistry
silly_quantity = my_ureg.Quantity(1.2, "super_ref")
print("Quantity: ", silly_quantity)
print("Dimensionality: ", silly_quantity.dimensionality)

Output:

Quantity:  1.2 super_ref
Dimensionality:  [silly_dimension]

How to incorporate derived units in our unit registry❓

Here, we declare the relationship between the derived dimension and other existing dimensions in our registry using the define_derived_dimension method. Then, for each derived or composed dimensionality, we can declare units by choosing one unit as the reference unit.

⚠️ It is recommended to utilize reference units from the base dimension(s) when making that reference unit declaration in the derived dimension ⚠️

✅ Example:

1️⃣ Dimension: [energy] = [force] * [length]

2️⃣ Ref unit: joule = newton * meter = J

3️⃣ New unit: british_thermal_unit = 1055.06 * joule = Btu

# Define derived dimension
my_ureg.define_derived_dimension(
    new_dimension="silly_dimension_2",
    dimension_definition="[silly_dimension] ** 2"
)

# Define ref unit for the dimension
my_ureg.define_unit(
    ref_unit_name="su1 ** 2",
    unit_name="super_ref_2",
    unit_symbol="su2"
)

# Create a quantity using the new dimension
silly_quantity_2 = my_ureg.Quantity(1.2, "super_ref_2")
print("Quantity: ", silly_quantity_2)
print("Dimensionality: ", silly_quantity_2.dimensionality)

Output:

Quantity:  1.2 super_ref_2
Dimensionality:  [silly_dimension] ** 2

Now, how to store those new definitions created on the fly by the user❓

For this part, let’s use the save_customized_units method. In our case, they won’t be stored in an actual JSON file. The customized units would be passed to customized_units dictionary inside MyRegistry instance.

my_ureg.save_customized_units()
print(my_ureg.customized_units)

Output:

defaultdict(dict,
            {'customized_units': {'silly_length': 'silly_length = 90 * m = slu = slu1 = little_silly_length',
              'super_ref_2': 'super_ref_2 = 1 * su1 ** 2 = su2'},
             'default_units': {'ft': 'ft = 2 * m'},
             'customized_dimensions': {'silly_dimension': 'super_ref = [silly_dimension] = su1'},
             'customized_derived_dimensions': {'silly_dimension_2': '[silly_dimension_2] = [silly_dimension] ** 2'}})
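If you want to take the step to a real JSON file, a minimal sketch with the standard library could look like the one below. The helpers save_definitions and load_definitions and the file name units.json are hypothetical; they are not part of the MyRegistry class shown above.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict

Definitions = Dict[str, Dict[str, str]]


def save_definitions(path: Path, definitions: Definitions) -> None:
    """Write the nested definitions dictionary to a JSON file."""
    path.write_text(json.dumps(definitions, indent=2, sort_keys=True), encoding="utf-8")


def load_definitions(path: Path) -> Definitions:
    """Read definitions back; a missing or empty file yields an empty dict."""
    if not path.exists() or path.stat().st_size == 0:
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


definitions = {
    "customized_units": {"silly_length": "silly_length = 90 * m = slu"},
    "default_units": {"ft": "ft = 2 * m"},
}

with tempfile.TemporaryDirectory() as tmp_dir:
    storage = Path(tmp_dir) / "units.json"
    missing = load_definitions(storage)   # nothing saved yet
    save_definitions(storage, definitions)
    restored = load_definitions(storage)  # round trip through the file

print(missing, restored)
```

Since a defaultdict is a dict subclass, the customized_units attribute can be passed to save_definitions as it is.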

How to use this definition again in our program❓

my_ureg_2 = MyRegistry(customized_units=my_ureg.customized_units)
print(my_ureg_2.Quantity(1.2, "super_ref_2"))

Output:

1.2 super_ref_2

In this way, we can leverage the Pint library to handle physical quantities and dimensions seamlessly while also overcoming the loss of definitions created on the fly 🔥 🤗


Conclusion

In this post, we delved further into how a Python application can handle physical quantities and dimensions in real-world engineering and scientific systems using the Pint library. We saw that the Pint objects can be extended to meet our program’s requirements, and we now have a clear understanding of how to modify our registry to add new units to current dimensions, change the declaration of existing units, and create new dimensionalities (both base and derived). Let us press on with our never-ending quest for knowledge and its boundless frontiers. Looking forward to seeing you in my next post, which will likely be about graph representation learning, the subject I’ve been delving into lately.

If you enjoy my posts, follow me on Medium to stay tuned for more thought-provoking content 🚀

Get an email whenever Jose D. Hernandez-Betancur publishes.

