I’ve spent the last 18 months speaking to business leaders at large companies that are most definitely “taking security very seriously”. Yet when I talk to them about the importance of developing a secure software supply chain, they often look at me blankly.
In this post, we’re going to look at the day-to-day lives of developers and why they might be the weakest link in your cybersecurity defences. We’ll look in detail at some lesser-understood aspects of the GitHub version control platform and how attackers use them to fool developers into downloading and running malicious code.
Problem 1 - Leaky Repositories
It’s remarkably easy for a developer to accidentally commit passwords, API tokens, or other credentials to a GitHub repository. One security researcher made $10,000 in bug bounties, simply by searching public GitHub repositories for leaked secrets.
According to the State of Secrets Sprawl report, there were over 3 million unique secrets exposed through GitHub repositories in 2022 alone. And this isn’t just junior developers making mistakes: the report found that 1 in 10 authors on GitHub exposed a secret at some stage during the year.
The Cost of a Data Breach Report 2023 from IBM found that stolen credentials accounted for 15% of breaches, surpassed only narrowly by phishing (16%). Worse, it found that “breaches that initiated with [..] compromised credentials [..] took the longest to resolve” - taking over 300 days to contain.
Thankfully, there’s a broad range of open-source tools you can use to scan your repositories and catch potential leaks. gitleaks, for example, can be configured either as a pre-commit hook or as a GitHub Action.
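To make the idea concrete, here is a minimal sketch of what a secret scanner does under the hood. Real tools like gitleaks ship hundreds of detection rules; this toy version checks a single pattern (AWS access key IDs), and the sample input line is invented for illustration.

```shell
# Minimal sketch of the idea behind secret scanners like gitleaks:
# look for credential-shaped strings before they reach the repository.
# AWS access key IDs look like "AKIA" followed by 16 uppercase letters/digits.
PATTERN='AKIA[0-9A-Z]{16}'

scan_for_secrets() {
  # Exit 0 (match) if stdin contains something key-shaped.
  grep -Eq "$PATTERN"
}

# Example: a config line a developer might accidentally commit.
if printf 'aws_key = AKIAABCDEFGHIJKLMNOP\n' | scan_for_secrets; then
  echo "secret detected"
fi
```

In practice you would wire a check like this into `.git/hooks/pre-commit` against `git diff --cached`, or simply adopt gitleaks’ own hook or GitHub Action rather than maintaining patterns yourself.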
Problem 2 - Dependency Hell
Modern software is more distributed than ever - each project has a complex network of dependencies. Re-use of code is a huge boost to developer productivity, but it also potentially introduces hundreds of opportunities for an attacker to implant malicious code into your projects.
Across the industry there is a general lack of care when choosing dependencies. Developers will happily download any project that looks well maintained, has several contributors, or has plenty of downloads. It’s rare for anyone to audit the code in any detail.
An attacker can duplicate the repository of a popular software dependency, inject their own malicious code, and upload it to GitHub under a similar name that might fool developers or take advantage of a simple spelling mistake (e.g. crossenv instead of cross-env). A simple search of the Snyk vulnerability database lists hundreds of malicious packages like this.
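One crude but effective defence against this “typosquatting” pattern is to flag dependency names that collapse to the same string as a popular package once separators are stripped - exactly how crossenv shadows cross-env. The sketch below is hypothetical: the list of “popular” names is a stand-in for whatever allowlist your organisation maintains.

```shell
# Hypothetical typosquat check: flag names that normalise to the same
# string as a known-popular package (hyphens/dots/underscores removed,
# case folded) but are not actually that package.

normalize() { printf '%s' "$1" | tr -d '._-' | tr 'A-Z' 'a-z'; }

POPULAR="cross-env lodash express react"   # stand-in allowlist

check_name() {
  dep="$1"
  for known in $POPULAR; do
    if [ "$dep" != "$known" ] && \
       [ "$(normalize "$dep")" = "$(normalize "$known")" ]; then
      echo "suspicious: '$dep' looks like '$known'"
      return 0
    fi
  done
  return 1   # no near-miss found
}

check_name crossenv
```

A real pipeline would run a check like this over every name in your lockfile, and tools such as Snyk fold curated lists of known malicious packages into the same step.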
Perhaps the most publicised example of such a supply chain attack is the SolarWinds hack - one of the biggest cybersecurity breaches of the 21st century. Their network monitoring software, SolarWinds Orion, was used by more than 30,000 companies around the world - including local, state and federal agencies. In March 2020, SolarWinds released an update to the Orion software that included malicious code giving hackers access to customers’ systems. The malware affected many companies and government departments, including Microsoft, Intel and Cisco. Investigators found that the hackers were then able to use this initial infiltration to attack customers of those companies too. Shockingly, the investigations revealed that the hackers had first gained unauthorised access to SolarWinds systems six months earlier.
Problem 3 - All Your Base Are Belong To Us
I don’t think many people appreciate what a treasure trove of credentials and trust relationships a developer’s workstation represents. Developer workstations pose the biggest risk to the software supply chain.
On a typical developer workstation, you will find:
Copies of private source code that the developer is working on, or has worked on previously.
SSH keys that they use to log in to other servers, often without needing to enter a password.
API keys that provide instant access to cloud services like AWS, or important infrastructure tools like Kubernetes.
Config files or scripts that contain plaintext credentials for databases and other corporate resources that developers use in their day-to-day activities.
Shell command history files that contain a record of every command a developer has executed, including any credentials that were supplied at the command line, such as passing an environment variable or parameter.
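That last point is easy to verify for yourself. The sketch below builds a fake history file (the commands and the key are invented samples - the AWS value is the standard documentation example key) and pulls out every line where a credential was supplied inline, which is exactly what an attacker rifling through `~/.bash_history` would do.

```shell
# Hypothetical demo: what shell history gives away. The sample history
# file below is fabricated for illustration.
HIST_SAMPLE=$(mktemp)
cat > "$HIST_SAMPLE" <<'EOF'
export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
mysql -u admin -pS3cretPass prod_db
git pull
EOF

# Pull out any command where a credential appears on the command line.
MATCHES=$(grep -E 'SECRET|PASSWORD|-p[^ ]' "$HIST_SAMPLE")
echo "$MATCHES"
rm -f "$HIST_SAMPLE"
```

Two of the three sample commands leak something; only `git pull` is clean. Running a similar grep over your own history files is a sobering exercise.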
Once an attacker has gained access to a developer workstation, it’s game over for corporate security. An attacker can use these credentials to impersonate the developer and submit malicious code to repositories, where it gets merged to production by an automated CI/CD pipeline. Alternatively, an attacker may choose to silently copy and exfiltrate this data for later use, leaving no trace for the developer to notice what has happened.
In September 2023, Checkmarx detected commits to hundreds of GitHub repositories that contained malicious code. The code included a new GitHub Action, triggered by code-push events, that would silently exfiltrate any secrets found within the commit itself, as well as modifying any JavaScript files to intercept future user input on web-based password forms. Upon investigation, they found that a number of developer environments had been compromised as early as July 2023: GitHub access tokens had been stolen, and a technique (more about this later) was used to forge commit metadata to trick developers into thinking the changes had been contributed by Dependabot, a service designed to automatically fix vulnerable project dependencies.
Problem 4 - Impersonation
One of the easiest ways to gain access to a developer workstation is to somehow convince the developer to execute a block of malicious code. Unfortunately, it’s remarkably easy for an attacker to create a legitimate-looking user profile or source code repository by exploiting a few commonly misunderstood parts of the GitHub platform.
There are a number of lesser-known “features” of the GitHub platform that attackers can use to make their profiles look more legitimate:
GitHub does not validate the details users enter in their profile. This means an attacker can specify they are employed by a well-known company and this will be displayed on their profile page, with a link to the legitimate company website.
GitHub does not validate the timestamps of a commit, making it possible for an attacker to create a false history of activity. The fake activity can pre-date both the existence of the user profiles and the repositories involved.
GitHub does not validate the name or email address associated with a commit. An attacker can use this technique to make it appear that well-known, trusted open-source developers have contributed to their projects.
With some automation, it’s fairly trivial for an attacker to decorate their GitHub profile page with various achievements. These badges are often seen as symbols of proficiency and dedication.
There is an active black market in GitHub stars, an important metric that serves as an indicator of a repository’s credibility and popularity. From as little as $80, attackers can pay to have 1,000+ stars for their malicious repositories.
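The commit-forgery points above are easy to demonstrate for yourself, because git itself treats author name, email and timestamp as free-form fields - and GitHub simply displays what the commit records. The sketch below runs against a throwaway local repository; the “Trusted Maintainer” identity is invented for illustration.

```shell
# Throwaway-repo demo: git (and therefore GitHub) does not validate
# commit metadata. Author, email and date are whatever we claim.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name  "Mallory"
git config user.email "mallory@example.com"

echo hello > file.txt
git add file.txt

# Claim the commit was made by someone else, years in the past.
GIT_AUTHOR_DATE="2015-01-01T12:00:00" \
GIT_COMMITTER_DATE="2015-01-01T12:00:00" \
git commit -q --author="Trusted Maintainer <maintainer@example.com>" \
           -m "Totally legitimate commit"

# The log now shows the spoofed author and the 2015 date.
git log --format='%an %ad' --date=short
```

If an attacker uses the email address of a real, well-known developer here, GitHub will even link the commit to that developer’s profile picture. Signed commits (`git commit -S` plus branch protection requiring verified signatures) are the defence.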
These methods can be used in combination to devastating effect.
An increasingly popular form of attack is impersonating recruiters, security researchers, and other people developers may wish to engage with. In May 2023, VulnCheck found a series of GitHub repositories that claimed to be 0-day exploits of well-known products including Chrome, Discord, Signal, and more. In November 2023, Unit 42 found a similar example where attackers would pose as employers and lure software developers into downloading a code repository as part of a fake interview process. In both scenarios, the attackers created GitHub and other social media profiles that included details of legitimate companies.
Problem 5 - Hiding In Plain Sight
It’s not obvious when you’ve downloaded a malicious repository or executed malicious code. In a classic trojan horse attack, the malicious repository can contain thousands of lines of valid code that performs a task as expected. Hidden in those depths, however, could be a single line of code or a reference to a malicious dependency that exposes the target system.
Attackers regularly employ a few interesting methods to hide malicious code:
Most packaging tools include the concept of a “pre-install” or “post-install” script that allows maintainers to prepare the environment for installation, or ensure temporary files are removed afterwards. Attackers will often use these to hide all sorts of nasty surprises.
Another common trick is to obfuscate malicious code to make it difficult for a reader to understand. This might include renaming functions, including dummy operations, or planting traps that will cause the code not to run if it has been modified or certain security controls are detected.
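On the install-script point: before installing an npm dependency, it is worth checking whether it declares install-time lifecycle scripts at all, since those run arbitrary code on your machine the moment you type `npm install`. The package below is fabricated for illustration - the name and the sinister-sounding script are invented.

```shell
# Hypothetical sketch: inspect a package manifest for install-time
# lifecycle scripts before letting it run code on your machine.
pkg=$(mktemp -d)
cat > "$pkg/package.json" <<'EOF'
{
  "name": "innocent-looking-lib",
  "version": "1.0.0",
  "scripts": {
    "postinstall": "node ./collect-and-upload.js"
  }
}
EOF

# Matches the "install", "preinstall" and "postinstall" script keys.
if grep -Eq '"(pre|post)?install"' "$pkg/package.json"; then
  echo "package declares install-time scripts - review before installing"
fi
```

npm also supports installing with `--ignore-scripts`, which skips these hooks entirely; many security-conscious teams make that the default and opt packages back in individually.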
Summary
In this article, we’ve looked at some of the reasons the secure software supply chain has become such a hot topic amongst top firms, and why it’s critically important we re-evaluate how we perceive “trust” when dealing with code repositories.
We detailed how, every day, developers are accidentally exposing thousands of corporate credentials through tools like GitHub.
We spoke about the increasingly complex world of software development and how just one out-of-date or malicious dependency can have dire consequences.
We showed that development environments are a prime target for attack, yet often have the weakest security controls.
And we highlighted that you cannot and should not implicitly trust any repository or user profile on GitHub because it is remarkably easy for an attacker to create people or projects that seem legitimate.
The problem, of course, is that we are in a perpetual game of “cat and mouse”, with attackers finding new and elaborate ways to compromise developer systems all the time.
I’ll leave you with this … did you know that AI models can execute code on your machine? Oh dear.