
Introduction
In this tutorial we are going to install the Torch and Torchvision C++ libraries on a Linux machine in order to debug a function that is written not in Python but in C++. Because the function of interest is implemented in C++ rather than in PyTorch, we cannot debug it from the Python API to Torchvision, so we need to dive into the C++ code. If you want to know more about how Python and C++ are linked and how C++ code is called from Python, you can refer to my previous article here.
Torch installation
First of all we need to download LibTorch, the Torch C++ library, which provides binary distributions of all the headers and libraries needed to use Torch. This can be done from the official PyTorch website here. I selected the following configuration for the download:
[Image: the download selector on pytorch.org — Stable (1.13.1), Linux, LibTorch, C++/Java, CPU, using the cxx11 ABI build with dependencies]
Then we create a new folder for the project, move the .zip archive we have just downloaded into it and unzip it there:
mkdir torchinstall
cd torchinstall
unzip libtorch-cxx11-abi-shared-with-deps-1.13.1+cpu.zip
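After unzipping you should have a libtorch folder. A quick look inside (a sketch, assuming the layout of the 1.13.1 CPU build) shows the parts we care about: the headers in include/, the shared libraries in lib/ and the CMake config files in share/cmake/Torch, which is what find_package(Torch) will locate below.
# a quick look inside the extracted folder
ls libtorch
# among others you should see: include  lib  share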
From here we follow the official Torch documentation in order to compile and build files that depend on the Torch library. First of all let's create two files:
- main.cpp – a C++ file where we will write some code to make sure we can use the installed Torch library and that it works on our machine
- CMakeLists.txt – a text file containing instructions for the CMake tool on how to generate and build the project
We will also need a build folder where the compiled main.cpp will be stored; we create it a bit further below.
touch main.cpp
touch CMakeLists.txt
#################################################################
The main.cpp file will contain the following code:
// include libraries
#include <iostream>
#include <torch/torch.h>

int main() {
  torch::manual_seed(0);                  // set manual seed
  torch::Tensor x = torch::randn({2, 3}); // create a random tensor
  std::cout << x;                         // print the tensor
}
#################################################################
The CMakeLists.txt file will contain the following:
cmake_minimum_required(VERSION 3.0)
# project name
project(debugfunc)
# define path to the libtorch extracted folder
set(CMAKE_PREFIX_PATH /home/alexey/Desktop/torchinstall/libtorch)
# find torch library and all necessary files
find_package(Torch REQUIRED)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")
# executable to add that we want to compile and run
add_executable(debugfunc main.cpp)
# link torch libraries to our executable
target_link_libraries(debugfunc "${TORCH_LIBRARIES}")
set_property(TARGET debugfunc PROPERTY CXX_STANDARD 14)
#################################################################
# create build folder
mkdir build
Now we have everything ready to compile, build and run our main.cpp file.
# go into the build folder
cd build
# configure the project
cmake ..
# compile and build it
make
# run the built file
./debugfunc
If everything was done correctly, you should see a 2x3 tensor of random values printed to the console (the values are deterministic because of the manual seed).
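As an optional extra check (my own addition, not part of the original steps), you can extend main() with a couple more tensor operations; a minimal sketch:
// optional sanity check inside main(): basic tensor math
torch::Tensor a = torch::ones({2, 3});
torch::Tensor b = torch::mm(a, a.t()); // (2x3) x (3x2) -> 2x2, every entry equals 3
std::cout << b.sum().item<float>() << std::endl; // prints 12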
Congratulations, you can now build and run files that use the Torch C++ library! The next step is to install the Torchvision C++ library.
Torchvision Installation
Let's go back to our Desktop directory and create another folder called torchvision. First of all, download the torchvision C++ library as a zip from here, place it into our torchvision directory and unzip it. After that we go into the unzipped folder vision-main, create a build directory and open the CMakeLists.txt file in vision-main to amend and add some things.
mkdir torchvision
cd torchvision
unzip vision-main.zip
cd vision-main
mkdir build
The final version of the CMakeLists.txt file looks like below for me. I added CMAKE_PREFIX_PATH and turned off all the options: WITH_CUDA, WITH_PNG, WITH_JPEG and USE_PYTHON.
cmake_minimum_required(VERSION 3.12)
project(torchvision)
set(CMAKE_CXX_STANDARD 14)
file(STRINGS version.txt TORCHVISION_VERSION)
# added CMAKE_PREFIX_PATH
set(CMAKE_PREFIX_PATH /home/alexey/Desktop/torchinstall/libtorch)
# turned off all the options
option(WITH_CUDA "Enable CUDA support" OFF)
option(WITH_PNG "Enable features requiring LibPNG." OFF)
option(WITH_JPEG "Enable features requiring LibJPEG." OFF)
option(USE_PYTHON "Link to Python when building" OFF)
if(WITH_CUDA)
enable_language(CUDA)
add_definitions(-D__CUDA_NO_HALF_OPERATORS__)
add_definitions(-DWITH_CUDA)
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} --expt-relaxed-constexpr")
# CUDA-11.x can not be compiled using C++14 standard on Windows
string(REGEX MATCH "^[0-9]+" CUDA_MAJOR ${CMAKE_CUDA_COMPILER_VERSION})
if(${CUDA_MAJOR} GREATER 10 AND MSVC)
set(CMAKE_CXX_STANDARD 17)
endif()
endif()
find_package(Torch REQUIRED)
if (WITH_PNG)
add_definitions(-DPNG_FOUND)
find_package(PNG REQUIRED)
endif()
if (WITH_JPEG)
add_definitions(-DJPEG_FOUND)
find_package(JPEG REQUIRED)
endif()
if (USE_PYTHON)
add_definitions(-DUSE_PYTHON)
find_package(Python3 REQUIRED COMPONENTS Development)
endif()
function(CUDA_CONVERT_FLAGS EXISTING_TARGET)
get_property(old_flags TARGET ${EXISTING_TARGET} PROPERTY INTERFACE_COMPILE_OPTIONS)
if(NOT "${old_flags}" STREQUAL "")
string(REPLACE ";" "," CUDA_flags "${old_flags}")
set_property(TARGET ${EXISTING_TARGET} PROPERTY INTERFACE_COMPILE_OPTIONS
"$<$<BUILD_INTERFACE:$<COMPILE_LANGUAGE:CXX>>:${old_flags}>$<$<BUILD_INTERFACE:$<COMPILE_LANGUAGE:CUDA>>:-Xcompiler=${CUDA_flags}>"
)
endif()
endfunction()
if(MSVC)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /wd4819")
if(WITH_CUDA)
set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -Xcompiler=/wd4819")
foreach(diag cc_clobber_ignored integer_sign_change useless_using_declaration
set_but_not_used field_without_dll_interface
base_class_has_different_dll_interface
dll_interface_conflict_none_assumed
dll_interface_conflict_dllexport_assumed
implicit_return_from_non_void_function
unsigned_compare_with_zero
declared_but_not_referenced
bad_friend_decl)
string(APPEND CMAKE_CUDA_FLAGS " -Xcudafe --diag_suppress=${diag}")
endforeach()
CUDA_CONVERT_FLAGS(torch_cpu)
if(TARGET torch_cuda)
CUDA_CONVERT_FLAGS(torch_cuda)
endif()
if(TARGET torch_cuda_cu)
CUDA_CONVERT_FLAGS(torch_cuda_cu)
endif()
if(TARGET torch_cuda_cpp)
CUDA_CONVERT_FLAGS(torch_cuda_cpp)
endif()
endif()
endif()
include(GNUInstallDirs)
include(CMakePackageConfigHelpers)
set(TVCPP torchvision/csrc)
list(APPEND ALLOW_LISTED ${TVCPP} ${TVCPP}/io/image ${TVCPP}/io/image/cpu ${TVCPP}/models ${TVCPP}/ops
${TVCPP}/ops/autograd ${TVCPP}/ops/cpu ${TVCPP}/io/image/cuda)
if(WITH_CUDA)
list(APPEND ALLOW_LISTED ${TVCPP}/ops/cuda ${TVCPP}/ops/autocast)
endif()
FOREACH(DIR ${ALLOW_LISTED})
file(GLOB ALL_SOURCES ${ALL_SOURCES} ${DIR}/*.*)
ENDFOREACH()
add_library(${PROJECT_NAME} SHARED ${ALL_SOURCES})
target_link_libraries(${PROJECT_NAME} PRIVATE ${TORCH_LIBRARIES})
if (WITH_PNG)
target_link_libraries(${PROJECT_NAME} PRIVATE ${PNG_LIBRARY})
endif()
if (WITH_JPEG)
target_link_libraries(${PROJECT_NAME} PRIVATE ${JPEG_LIBRARIES})
endif()
if (USE_PYTHON)
target_link_libraries(${PROJECT_NAME} PRIVATE Python3::Python)
endif()
set_target_properties(${PROJECT_NAME} PROPERTIES
EXPORT_NAME TorchVision
INSTALL_RPATH ${TORCH_INSTALL_PREFIX}/lib)
include_directories(torchvision/csrc)
if (WITH_PNG)
include_directories(${PNG_INCLUDE_DIRS})
endif()
if (WITH_JPEG)
include_directories(${JPEG_INCLUDE_DIRS})
endif()
set(TORCHVISION_CMAKECONFIG_INSTALL_DIR "share/cmake/TorchVision" CACHE STRING "install path for TorchVisionConfig.cmake")
configure_package_config_file(cmake/TorchVisionConfig.cmake.in
"${CMAKE_CURRENT_BINARY_DIR}/TorchVisionConfig.cmake"
INSTALL_DESTINATION ${TORCHVISION_CMAKECONFIG_INSTALL_DIR})
write_basic_package_version_file(${CMAKE_CURRENT_BINARY_DIR}/TorchVisionConfigVersion.cmake
VERSION ${TORCHVISION_VERSION}
COMPATIBILITY AnyNewerVersion)
install(FILES ${CMAKE_CURRENT_BINARY_DIR}/TorchVisionConfig.cmake
${CMAKE_CURRENT_BINARY_DIR}/TorchVisionConfigVersion.cmake
DESTINATION ${TORCHVISION_CMAKECONFIG_INSTALL_DIR})
install(TARGETS ${PROJECT_NAME}
EXPORT TorchVisionTargets
LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}
)
install(EXPORT TorchVisionTargets
NAMESPACE TorchVision::
DESTINATION ${TORCHVISION_CMAKECONFIG_INSTALL_DIR})
FOREACH(INPUT_DIR ${ALLOW_LISTED})
string(REPLACE "${TVCPP}" "${CMAKE_INSTALL_INCLUDEDIR}/${PROJECT_NAME}" OUTPUT_DIR ${INPUT_DIR})
file(GLOB INPUT_FILES ${INPUT_DIR}/*.*)
install(FILES ${INPUT_FILES} DESTINATION ${OUTPUT_DIR})
ENDFOREACH()
After that we go into the build folder, then configure, build and install the library.
cd build
cmake ..
make
sudo make install
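To check where the install landed (assuming you kept CMake's default /usr/local prefix), the shared library, headers and CMake config files should be here:
# installed artifacts, assuming the default /usr/local install prefix
ls /usr/local/lib/libtorchvision.so
ls /usr/local/include/torchvision
ls /usr/local/share/cmake/TorchVision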
If you didn't see any errors (warnings are not errors!), everything should be fine. Now let's test that we can import the torchvision library in a project. Go back to the torchinstall folder and amend the main.cpp and CMakeLists.txt files as follows:
// main.cpp: now also include the torchvision library
#include <iostream>
#include <torch/torch.h>
#include <torchvision/vision.h>

int main() {
  torch::manual_seed(0);
  torch::Tensor x = torch::randn({2, 3});
  std::cout << x;
}
cmake_minimum_required(VERSION 3.0)
project(debugfunc)
set(CMAKE_PREFIX_PATH /home/alexey/Desktop/torchinstall/libtorch)
find_package(Torch REQUIRED)
find_package(TorchVision REQUIRED)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")
add_executable(${PROJECT_NAME} main.cpp)
target_compile_features(${PROJECT_NAME} PUBLIC cxx_range_for)
target_link_libraries(${PROJECT_NAME} TorchVision::TorchVision)
set_property(TARGET ${PROJECT_NAME} PROPERTY CXX_STANDARD 14)
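Note that find_package(TorchVision REQUIRED) resolves without any extra hints because sudo make install placed TorchVisionConfig.cmake under /usr/local/share/cmake/TorchVision (assuming the default install prefix), one of the locations CMake searches by default.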
Now compile, build and run as before:
# configure the project
cmake ..
# build it
make
# run the built file
./debugfunc
You should see no errors and the same tensor output as before.
Cool, we have now installed the Torchvision library as well!
Debug a function
Now, let's say we want to debug the roi_align function from the torchvision library, which is written in C++. I use Visual Studio Code for debugging, and because we are using CMake you will need to install some extensions into VS Code to be able to debug from it. I found this video quite useful to get started.
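One practical note: for breakpoints and stepping to work, the executable has to be built with debug symbols, which with CMake means configuring a Debug build, e.g.:
# configure a Debug build so breakpoints and stepping work
cmake -DCMAKE_BUILD_TYPE=Debug ..
make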
In order to debug the roi_align function I had to copy-paste some bits from this path: torchvision/csrc/ops/cpu, ending up with the following main.cpp file:
#include <iostream>
#include <torch/torch.h>
#include <torchvision/vision.h>
#include <torchvision/ops/nms.h>
#include <torch/script.h>
#include <ATen/core/dispatch/Dispatcher.h>
#include <torch/library.h>
#include <torch/types.h>
#include <ATen/ATen.h>
#include <torchvision/ops/cpu/roi_align_common.h>
namespace vision {
namespace ops {
namespace {
template <typename T>
void roi_align_forward_kernel_impl(
int n_rois,
const T* input,
const T& spatial_scale,
int channels,
int height,
int width,
int pooled_height,
int pooled_width,
int sampling_ratio,
bool aligned,
const T* rois,
T* output) {
// (n, c, ph, pw) is an element in the pooled output
// can be parallelized using omp
// #pragma omp parallel for num_threads(32)
for (int n = 0; n < n_rois; n++) {
int index_n = n * channels * pooled_width * pooled_height;
const T* offset_rois = rois + n * 5;
int roi_batch_ind = offset_rois[0];
// Do not use rounding; this implementation detail is critical
T offset = aligned ? (T)0.5 : (T)0.0;
T roi_start_w = offset_rois[1] * spatial_scale - offset;
T roi_start_h = offset_rois[2] * spatial_scale - offset;
T roi_end_w = offset_rois[3] * spatial_scale - offset;
T roi_end_h = offset_rois[4] * spatial_scale - offset;
T roi_width = roi_end_w - roi_start_w;
T roi_height = roi_end_h - roi_start_h;
if (!aligned) {
// Force malformed ROIs to be 1x1
roi_width = std::max(roi_width, (T)1.);
roi_height = std::max(roi_height, (T)1.);
}
T bin_size_h = static_cast<T>(roi_height) / static_cast<T>(pooled_height);
T bin_size_w = static_cast<T>(roi_width) / static_cast<T>(pooled_width);
// We use roi_bin_grid to sample the grid and mimic integral
int roi_bin_grid_h = (sampling_ratio > 0)
? sampling_ratio
: ceil(roi_height / pooled_height); // e.g., = 2
int roi_bin_grid_w =
(sampling_ratio > 0) ? sampling_ratio : ceil(roi_width / pooled_width);
// We do average (integral) pooling inside a bin
// When the grid is empty, output zeros.
const T count = std::max(roi_bin_grid_h * roi_bin_grid_w, 1); // e.g. = 4
// we want to precalculate indices and weights shared by all channels,
// this is the key point of optimization
std::vector<detail::PreCalc<T>> pre_calc(
roi_bin_grid_h * roi_bin_grid_w * pooled_width * pooled_height);
detail::pre_calc_for_bilinear_interpolate(
height,
width,
pooled_height,
pooled_width,
roi_start_h,
roi_start_w,
bin_size_h,
bin_size_w,
roi_bin_grid_h,
roi_bin_grid_w,
pre_calc);
for (int c = 0; c < channels; c++) {
int index_n_c = index_n + c * pooled_width * pooled_height;
const T* offset_input =
input + (roi_batch_ind * channels + c) * height * width;
int pre_calc_index = 0;
for (int ph = 0; ph < pooled_height; ph++) {
for (int pw = 0; pw < pooled_width; pw++) {
int index = index_n_c + ph * pooled_width + pw;
T output_val = 0.;
for (int iy = 0; iy < roi_bin_grid_h; iy++) {
for (int ix = 0; ix < roi_bin_grid_w; ix++) {
detail::PreCalc<T> pc = pre_calc[pre_calc_index];
output_val += pc.w1 * offset_input[pc.pos1] +
pc.w2 * offset_input[pc.pos2] +
pc.w3 * offset_input[pc.pos3] + pc.w4 * offset_input[pc.pos4];
pre_calc_index += 1;
}
}
output_val /= count; // Average pooling
output[index] = output_val;
} // for pw
} // for ph
} // for c
} // for n
}
template <class T>
inline void add(T* address, const T& val) {
*address += val;
}
at::Tensor roi_align_forward_kernel(
const at::Tensor& input,
const at::Tensor& rois,
double spatial_scale,
int64_t pooled_height,
int64_t pooled_width,
int64_t sampling_ratio,
bool aligned) {
TORCH_CHECK(input.device().is_cpu(), "input must be a CPU tensor");
TORCH_CHECK(rois.device().is_cpu(), "rois must be a CPU tensor");
TORCH_CHECK(rois.size(1) == 5, "rois must have shape as Tensor[K, 5]");
at::TensorArg input_t{input, "input", 1}, rois_t{rois, "rois", 2};
at::CheckedFrom c = "roi_align_forward_kernel";
at::checkAllSameType(c, {input_t, rois_t});
auto num_rois = rois.size(0);
auto channels = input.size(1);
auto height = input.size(2);
auto width = input.size(3);
at::Tensor output = at::zeros(
{num_rois, channels, pooled_height, pooled_width}, input.options());
if (output.numel() == 0)
return output;
auto input_ = input.contiguous(), rois_ = rois.contiguous();
AT_DISPATCH_FLOATING_TYPES_AND_HALF(
input.scalar_type(), "roi_align_forward_kernel", [&] {
roi_align_forward_kernel_impl<scalar_t>(
num_rois,
input_.data_ptr<scalar_t>(),
spatial_scale,
channels,
height,
width,
pooled_height,
pooled_width,
sampling_ratio,
aligned,
rois_.data_ptr<scalar_t>(),
output.data_ptr<scalar_t>());
});
return output;
}
} // namespace
} // namespace ops
} // namespace vision
int main(){
torch::manual_seed(0);
// load tensors saved from Python
torch::jit::script::Module tensors = torch::jit::load("/media/sf_Linux_shared_folder/roialign/tensors.pth");
c10::IValue feats = tensors.attr("cr_features");
torch::Tensor feat_ts = feats.toTensor();
c10::IValue boxes = tensors.attr("cr_proposal");
torch::Tensor boxes_ts = boxes.toTensor();
std::cout << boxes_ts << std::endl;
double spatial_scale = 1.0;
int64_t pooled_height = 2, pooled_width = 2, sampling_ratio = -1;
bool aligned = false;
at::Tensor out = vision::ops::roi_align_forward_kernel(feat_ts, boxes_ts, spatial_scale, pooled_height, pooled_width, sampling_ratio, aligned);
std::cout << out;
}
You will notice that I load the features and proposals tensors that I used in this article for roi_align. You can now place breakpoints wherever you want and step through the code line by line to see what is happening.
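As an aside, torchvision also exposes this operator through its public C++ API; below is a sketch (header path and signature as I read them in vision-main, so treat it as an assumption), although stepping into it would require torchvision itself to be built with debug symbols, which is why I copy-pasted the kernel instead:
// sketch: calling torchvision's public roi_align op on the same tensors
// assumes: #include <torchvision/ops/roi_align.h> at the top of main.cpp
at::Tensor out2 = vision::ops::roi_align(
    feat_ts, boxes_ts, spatial_scale,
    pooled_height, pooled_width, sampling_ratio, aligned);
std::cout << out2;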
Conclusions
In this article we have seen how to install the Torch and Torchvision C++ libraries on a Linux machine and debug functions that we cannot directly debug through the PyTorch Python API. If you know of easier ways to debug such functions without copy-pasting them as I did, please let me know in the comments.