
Quantification of Dependencies in Open-source Python Packages

Yixuan Tang

Supervisor: Dr Gowri Ramachandran

Open-source software relies heavily on third-party dependencies, which are installed automatically alongside the primary package. These dependencies extend functionality, but they can also introduce unnecessary complexity and risk. Quantifying how much of each dependency a package actually uses is crucial for understanding the security risks associated with unused code. The core difficulty is accurately identifying which dependency functions the main package truly uses, since dependencies often contain large amounts of code that is never called. This makes it hard to distinguish essential code from redundant code, and installing unnecessary dependencies can enlarge the attack surface and waste resources. Existing research emphasizes the importance of dependency management, focusing primarily on supply chain security and vulnerability mitigation. However, current studies often overlook how much code within these dependencies goes unused, missing an opportunity to quantify its impact on software security and performance.

We developed a Python-based framework to evaluate the utilization of dependencies in open-source Python packages. The framework installs the main package along with its dependencies, extracts all functions defined in the dependencies, and identifies the functions actually imported and used by the main package. The results are compiled into an Excel report that shows the ratio of utilized to unutilized functions for each package. Our experimental analysis of multiple PyPI packages shows that, on average, less than 10% of the functions in dependency packages are used by the main package. This finding highlights significant inefficiencies in dependency management and underscores the security and performance risks posed by unused code. By identifying unused dependencies, our approach enables developers to minimize risks and optimize their software installations.
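The extract-and-compare step described above can be illustrated with a minimal sketch. This is not the authors' framework: it assumes static analysis with Python's standard `ast` module, works on source strings rather than installed packages, and only detects direct calls through `import dep` or `from dep import name`. The dependency name `toylib` and both source snippets are hypothetical.

```python
import ast


def defined_functions(source: str) -> set[str]:
    """Return the names of all functions defined anywhere in the source."""
    tree = ast.parse(source)
    return {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def used_functions(source: str, dependency: str) -> set[str]:
    """Return the names from `dependency` that the source calls directly."""
    tree = ast.parse(source)
    module_aliases = set()  # local names bound to the dependency module
    direct = {}             # local name -> original name (from-imports)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == dependency:
                    module_aliases.add(alias.asname or alias.name)
        elif isinstance(node, ast.ImportFrom) and node.module == dependency:
            for alias in node.names:
                direct[alias.asname or alias.name] = alias.name

    used = set()
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id in direct:
            used.add(direct[func.id])          # e.g. d(1) after `from toylib import dump as d`
        elif (isinstance(func, ast.Attribute)
              and isinstance(func.value, ast.Name)
              and func.value.id in module_aliases):
            used.add(func.attr)                # e.g. tl.parse(...)
    return used


def utilization(dep_source: str, main_source: str, dependency: str) -> float:
    """Fraction of the dependency's defined functions that the main source uses."""
    defined = defined_functions(dep_source)
    used = used_functions(main_source, dependency) & defined
    return len(used) / len(defined) if defined else 0.0


# Hypothetical dependency and main-package sources, for illustration only.
DEP_SOURCE = """
def parse(text): return text.split()
def dump(obj): return str(obj)
def _helper(): pass
def validate(x): return bool(x)
"""

MAIN_SOURCE = """
import toylib as tl
from toylib import dump as d
print(tl.parse("a b"))
print(d(1))
"""

ratio = utilization(DEP_SOURCE, MAIN_SOURCE, "toylib")  # 2 of 4 functions used
```

A static approach like this misses dynamic usage such as `getattr` lookups, re-exports, and calls made indirectly inside the dependency, so a real measurement would need to account for those cases.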

PowerPoint slide showcasing the completed research

Digital Object Identifier (DOI)

https://doi.org/10.5204/qutop/KZLY3827
