Uncovering the Risks of Third-Party Software Dependencies

Jun 23, 2024·
Ben Benhemo

Introduction

What comes to mind when you hear the term “Software”? Do you picture endless lines of code and complicated algorithms?

Actually, at its core, software is built from various components. These components can include things like snippets of code, or external software modules that add specific functionalities. These external modules, known as dependencies, play a crucial role in modern software development. They allow developers to use existing solutions and speed up their projects.

In this post, we’ll explore the world of third-party dependencies and the risks associated with them. Understanding and effectively managing these dependencies is crucial for maintaining the security and reliability of applications.

Four Key Terms

  • Third-Party Library: External software components used to add functionality without building from scratch.
  • Software Package (or simply package): A specific released version of a library, bundled together with its metadata (such as name, version, and required dependencies) so that it can be distributed and installed.
  • Open Source Packages: Code made publicly available under an open-source license, allowing for code reviews, community collaboration, and easy reuse in projects.
  • Dependency: You have probably heard this term before. When you use a specific package in a project, that project “depends” on this package, hence it is called a dependency.

Real-World Example: Using Pandas for Data Analysis

Let’s say you’re working on data analysis and you need to process CSV files. To achieve this, you could write numerous functions from scratch, which might involve complex tasks like reading the CSV file, handling missing values, filtering data, and performing calculations. Or you could use an external third-party library called Pandas.

  • When you decide to use Pandas, you pull a specific software package of Pandas based on the version that suits your needs. Each version offers different capabilities and improvements.

    # Install a specific version of Pandas
    pip install pandas==1.3.3
    
  • Pandas itself is an open source package, meaning its code is publicly available for anyone to use and contribute to. See: https://github.com/pandas-dev/pandas

  • When you incorporate Pandas into your project, your project becomes dependent on Pandas, relying on this external library to function correctly. That’s why Pandas is part of your software dependencies.

Direct vs Indirect Dependency

It’s important to understand the difference between direct and indirect dependencies to better assess the risks associated with third-party libraries.

Direct Dependencies

A direct dependency is a package that is directly included in the project. Developers intentionally add these dependencies and reference them directly in the code.

For example, in our project, Pandas is a direct dependency.

Indirect (Transitive) Dependencies

A dependency can rely on other dependencies for its own functionality; from your project’s point of view, these are called indirect (or transitive) dependencies. A recent study found that installing an average npm package pulls in 79 other packages indirectly. That gives you a sense of how deep a dependency tree can go.

These types of dependencies are installed along with the direct dependencies, and the developer usually does not have direct control over which transitive packages are installed.

Here’s an example of what the dependency tree might look like when you use Pandas:

pandas==1.3.3
  ├── numpy>=1.17.3
  ├── python-dateutil>=2.7.3
  └── pytz>=2017.3

In this example, Pandas is a direct dependency, while numpy, python-dateutil, and pytz are indirect dependencies because Pandas relies on them to function properly.
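
If you want to explore a tree like this yourself, Python’s standard library can read the requirements that an installed package declares. The snippet below is only a rough sketch of the idea, not how pip actually resolves dependencies: the helper names (`requirement_name`, `installed_requirements`, `transitive_dependencies`) and the `lookup` parameter are made up for this example, and it ignores version constraints and environment markers.

```python
import re
from importlib import metadata


def requirement_name(requirement):
    """Extract the bare package name from a string such as 'numpy>=1.17.3'."""
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", requirement.strip())
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    return match.group(0).lower()


def installed_requirements(package):
    """Requirement strings declared by an installed package (empty if unknown)."""
    try:
        declared = metadata.requires(package) or []
    except metadata.PackageNotFoundError:
        return []
    # Rough filter: skip requirements that only apply to optional extras.
    return [req for req in declared if "extra ==" not in req]


def transitive_dependencies(package, lookup=installed_requirements):
    """Collect every package reachable from `package`, direct or indirect."""
    seen = set()
    stack = [package]
    while stack:
        current = stack.pop()
        for requirement in lookup(current):
            name = requirement_name(requirement)
            if name not in seen:
                seen.add(name)
                stack.append(name)
    return seen


# Hand-written stand-in for real package metadata, so the example runs anywhere.
fake_index = {
    "pandas": ["numpy>=1.17.3", "python-dateutil>=2.7.3", "pytz>=2017.3"],
    "python-dateutil": ["six>=1.5"],
}
found = transitive_dependencies("pandas", lookup=lambda name: fake_index.get(name, []))
print(sorted(found))  # ['numpy', 'python-dateutil', 'pytz', 'six']
```

Notice that `six` shows up even though neither your project nor Pandas asks for it directly: it arrives through python-dateutil. Calling `transitive_dependencies("pandas")` without the `lookup` argument would read the metadata of whatever is installed in your own environment instead.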

Package Management

You might have heard the recent news about PyPI being under attack, but what exactly is PyPI? PyPI (the Python Package Index) is Python’s official “Package Repository” and is widely used by the community.

A package repository is a centralized location that stores packages, primarily for a specific programming language. The aim of the package repository is to distribute packages more efficiently, providing important information such as metadata, versions, licenses, and indirect dependencies.

This information helps us better understand the packages we plan to use. Many repositories also try to improve security by looking for known vulnerabilities and malware, although such checks cannot catch everything.

In our case, the pandas package is hosted on PyPI, and you can use “pip” to install it from there:

pip install pandas

Package managers are tools that automate the process of installing, upgrading, configuring, and removing software packages in a consistent manner. They are responsible for resolving dependencies and retrieving packages from their respective repositories. Well-known package managers include pip for Python, npm for Node.js, and Maven for Java.

A Grain Of Pessimism

Dependency Hell 👹

As a developer, you often bear the responsibility for a particular service, which includes developing and maintaining an existing software service. As you may already know, this service can depend on numerous packages and third-party libraries.

Managing a multitude of dependencies that contain numerous indirect dependencies can be overwhelming. This unmanageable scenario can lead to what is commonly known as Dependency Hell.

Challenges of Managing Open Source Dependencies

  1. Implicit Trust: Developers often do not examine the actual code of open-source dependencies. If you use code reviews in your SDLC process, why not review code from external authors as well? Trusting external code implicitly can put your application’s security at significant risk.

  2. Unused Dependencies: Editing the package manager file (for example, requirements.txt) is easy; it typically takes a merge request, or sometimes just a direct commit. Because it is so easy, developers add dependencies they think they will need for specific functions, then often end up never using them, while the packages are still installed with the application.

  3. Outdated Dependencies: As you have seen, dependencies are software too, meaning they are constantly updated, with new versions published to add or modify functionality and apply security patches. This puts developers in a difficult position: they may avoid updating dependency versions for fear of breaking changes, but by not updating, they risk running vulnerable, unpatched dependencies.
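
The unused-dependency problem from point 2 is one you can partially check for automatically. Here is a minimal sketch, assuming you already know which import name each declared package provides; the function names and the `declared` mapping are invented for this example. It only looks at static `import` statements in a single source string, so treat its output as a hint rather than proof.

```python
import ast


def imported_modules(source):
    """Return the top-level module names imported by a piece of Python source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative imports are local code.
            modules.add(node.module.split(".")[0])
    return modules


def unused_dependencies(declared, source):
    """Declared packages whose import name never appears in the source.

    `declared` maps a package name to the module name it is imported as,
    because the two can differ (python-dateutil is imported as dateutil).
    """
    used = imported_modules(source)
    return sorted(pkg for pkg, module in declared.items() if module not in used)


# Hypothetical project: three declared dependencies, two actually imported.
declared = {"pandas": "pandas", "python-dateutil": "dateutil", "requests": "requests"}
source = "import pandas as pd\nfrom dateutil import parser\n"
print(unused_dependencies(declared, source))  # ['requests']
```

A package flagged this way may still be needed, for instance as a plugin that is loaded dynamically, so review the list before removing anything.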

Managing Dependencies: Current Security Challenges

Supply Chain Risks

As you already understand, a single vulnerable dependency can have a massive impact. All it takes is a single vulnerable piece of code in a ‘hidden’ dependency used by other popular dependencies to put a wide range of companies at risk.

Typosquatting

Anyone can publish an open-source package, so why wouldn’t attackers? Creating a malicious package is easy. Giving it a name that closely resembles a well-known package, in the hope that a victim will misspell the real name when installing, is a well-known attack called typosquatting.

A recent attack on the PyPI registry was a classic example of typosquatting.
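
One simple defensive idea is to compare a package name against a list of names you already trust before installing it. The sketch below uses the standard library’s `difflib` for fuzzy matching. The `KNOWN_PACKAGES` list, the function name, and the 0.8 similarity cutoff are all assumptions chosen for illustration; a real check would need a far larger list and careful tuning.

```python
import difflib

# Hypothetical allowlist; a real one would cover many more popular packages.
KNOWN_PACKAGES = ["pandas", "numpy", "requests", "python-dateutil", "pytz"]


def possible_typosquat(name, known=KNOWN_PACKAGES, cutoff=0.8):
    """Return the known package that `name` closely resembles, or None."""
    candidate = name.strip().lower()
    if candidate in known:
        return None  # exact match: this is the real package name
    matches = difflib.get_close_matches(candidate, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(possible_typosquat("reqeusts"))  # requests
print(possible_typosquat("pandsa"))    # pandas
print(possible_typosquat("requests"))  # None
```

A match here does not prove that a package is malicious; it only tells you the name deserves a second look before you run pip install.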

It’s Always About Priorities

Developers are not primarily focused on security. Their main priority is to develop new features that bring value to the business. The hard, continuous task of updating dependencies is rarely something developers are happy to do. Moving to a new version requires them to learn about the vulnerable package and what changed in the update, and to understand how those changes will affect the current software so that the update does not cause crashes in production.

A Grain Of Optimism

As you can understand, managing and securing software can be very hard and overwhelming due to the complexity and depth of triaging. We will never be 100% secure, but we can take a few steps to decrease our risk exposure.

Software Bill of Materials (SBOM)

With the increase in supply chain attacks, it is crucial for developers to be aware of all dependencies in their projects. Manually cataloging these dependencies is both inefficient and prone to errors. A Software Bill of Materials (SBOM) provides an automated, machine-readable inventory of all packages and their versions within a project. This helps in better managing and securing dependencies. For more information, visit GitHub Docs.
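
To get a feel for what such an inventory contains, here is a small sketch that lists the packages installed in the current Python environment using only the standard library. It is a simplified illustration, not a real SBOM: actual SBOMs follow standard formats such as SPDX or CycloneDX and carry much more information (licenses, hashes, relationships). The function names are made up for this example.

```python
import json
from importlib import metadata


def installed_inventory():
    """List every installed distribution as {'name': ..., 'version': ...}."""
    inventory = []
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries with broken or missing metadata
            inventory.append({"name": name, "version": dist.metadata.get("Version")})
    return sorted(inventory, key=lambda item: item["name"].lower())


def inventory_as_json():
    """Serialize the inventory so that other tools can consume it."""
    return json.dumps(installed_inventory(), indent=2)
```

Calling `print(inventory_as_json())` prints one entry per installed package, including the indirect dependencies you never asked for explicitly.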

Detecting Vulnerabilities

Before updating and fixing vulnerabilities in dependencies, you need to detect them, right? 🙂 There are two primary approaches to doing this:

  • Scanner Job in CI/CD Pipelines: You can run a dependency scanner as part of your CI/CD pipelines. The scanner compares the versions of your dependencies with a vulnerabilities database and returns a list of detected vulnerabilities that you can start working on. An example of a scanner is Dependabot.
  • Manual Scan: You can scan your repository manually at a frequency you deem appropriate. The scanner will return the relevant vulnerabilities as well. An example of this is OWASP Dependency-Check.
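
Whichever approach you choose, the core idea is the same: compare the versions you have installed against a database of known vulnerabilities. Here is a toy sketch of that comparison. Everything in it is hypothetical, including the package names, the advisory IDs, and the `ADVISORIES` structure; real scanners use curated vulnerability databases and much more careful version parsing.

```python
# Made-up advisory data: package name -> list of (advisory id, first fixed version).
ADVISORIES = {
    "examplepkg": [("EXAMPLE-0001", "2.4.1")],
    "otherpkg": [("EXAMPLE-0002", "1.0.0")],
}


def parse_version(version):
    """Turn '1.3.3' into (1, 3, 3); only plain dotted integers are supported here."""
    return tuple(int(part) for part in version.split("."))


def find_vulnerabilities(pinned, advisories=ADVISORIES):
    """Return (package, installed version, advisory id, fixed version) per hit."""
    findings = []
    for package, installed in sorted(pinned.items()):
        for advisory_id, fixed_in in advisories.get(package, []):
            # Anything older than the first fixed version counts as vulnerable.
            if parse_version(installed) < parse_version(fixed_in):
                findings.append((package, installed, advisory_id, fixed_in))
    return findings


pinned = {"examplepkg": "2.3.9", "otherpkg": "1.2.0", "unlisted": "0.1"}
print(find_vulnerabilities(pinned))
# [('examplepkg', '2.3.9', 'EXAMPLE-0001', '2.4.1')]
```

Here `otherpkg` is not reported because the installed version is already newer than the fixed one, and `unlisted` is skipped because the made-up database has no advisory for it.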

Risk Prioritization

Each vulnerability has a different risk score. Most of the time, you’ll get a CVSS severity score from the dependency scanner. However, that score does not know anything about your environment, so you should also prioritize vulnerabilities based on other factors, such as the specific service’s goal, the presence of sensitive data, public exposure, and more.
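
As a rough illustration of context-aware prioritization, the sketch below scales a CVSS score by two context factors. The `Finding` class, the chosen factors, and especially the multipliers are assumptions made up for this example; they are not part of CVSS or of any scanner, and your own weighting should reflect your organization’s risk model.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    package: str
    cvss: float                   # base score from the scanner, 0.0 to 10.0
    internet_facing: bool         # is the affected service publicly exposed?
    handles_sensitive_data: bool  # does the service process sensitive data?


def priority(finding):
    """Scale the CVSS score by context; the multipliers are illustrative guesses."""
    score = finding.cvss
    if finding.internet_facing:
        score *= 1.5
    if finding.handles_sensitive_data:
        score *= 1.3
    return score


def prioritize(findings):
    """Order findings from most to least urgent."""
    return sorted(findings, key=priority, reverse=True)


internal = Finding("internal-tool-dep", 9.0, False, False)
public = Finding("public-api-dep", 7.0, True, True)
print([f.package for f in prioritize([internal, public])])
# ['public-api-dep', 'internal-tool-dep']
```

In this example the finding with the lower CVSS score ends up first, because it affects a public service that handles sensitive data.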

Remediation

Remediating vulnerabilities in dependencies typically involves updating the dependency to the latest patched version. This should be done in close collaboration with the development teams to ensure nothing breaks during the update.

You can also check: The Risks and Benefits of Updating Dependencies

Hope this post provided valuable insights into securing third-party software dependencies. Thank you for taking the time to read!