IT giant Infosys accidentally published a file to PyPI containing AWS keys for an S3 bucket that potentially held patient data from Johns Hopkins University. The file was publicly accessible for more than a year.
The leaked file was spotted by software engineer Tom Forbes after he received a pull request to remove data from one of his GitHub projects, which indexes metadata for every package published in the popular Python Package Index (PyPI). The request was from a user trying to delete a project titled “
“Looking at the file they were trying to remove, it seemed to be an internal package that was published by mistake on the 2nd of February 2021. The metadata of the package referenced the internal Infosys Github instance as its homepage, but otherwise looked pretty innocuous,” Forbes wrote in a blog post.
He downloaded the file out of interest, and found it contained both an AWS access key and AWS secret key for a storage bucket. The key was still active.
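For readers wondering how credentials like these are typically spotted in a published package, the following is a minimal sketch, not the method Forbes used. It relies on the fact that long-lived AWS access key IDs begin with "AKIA" followed by 16 upper-case alphanumeric characters. The 40-character secret key pattern is only a heuristic, and the function name find_aws_key_candidates is hypothetical. The sample values are AWS's own documented example keys, not real credentials.

```python
import re

# Long-lived IAM user access key IDs: "AKIA" + 16 upper-case letters or digits.
ACCESS_KEY_RE = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")

# Heuristic only (assumption): a standalone run of exactly 40 base64-style
# characters. This can produce false positives on hashes or other tokens.
SECRET_KEY_RE = re.compile(
    r"(?<![A-Za-z0-9/+=])([A-Za-z0-9/+=]{40})(?![A-Za-z0-9/+=])"
)


def find_aws_key_candidates(text):
    """Return candidate AWS access key IDs and secret keys found in text."""
    return {
        "access_key_ids": ACCESS_KEY_RE.findall(text),
        "secret_key_candidates": SECRET_KEY_RE.findall(text),
    }


# AWS's published example credentials, used here purely as sample input.
sample = (
    'aws_access_key_id = "AKIAIOSFODNN7EXAMPLE"\n'
    'aws_secret_access_key = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"\n'
)
print(find_aws_key_candidates(sample))
```

A check of this kind, run before a package is uploaded, would flag a hard-coded key pair for review. It does not say whether a key is still active or what permissions it carries.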
“I always let my curiosity get the better of me, so I listed the root of the bucket. From the contents it seems that this bucket contains data used to train COVID prediction models. Inside the John_Hopkins_Hospital/ prefix there appeared to be file names that looked like they contained some form of clinical data, which I did not access to verify,” wrote Forbes.
The file he is referencing was at the path:
While he didn’t access the file, he did investigate what level of access the AWS credentials had, and discovered the key had full administrator access: “AWS’s ‘god mode’ policy, and it’s not a good idea to assign these to long-lived credentials issued to developers”.
Forbes said that at this point he felt concerned and was anxious to report the issue, but was unable to find an appropriate channel for doing so. The GitHub user, seemingly from Infosys, who had first issued the pull request and then a takedown request, had also deleted their account.
At this point, Forbes took what he called the “sketchy” step of revoking the AWS key’s access, in lieu of any better options.
“One of the golden rules is to not touch anything you find: just document and report. Except in this case the key had been public for over a year, there seemed to be sensitive data there and the key also appeared to be a non-critical user key rather than a key for a system. I also didn’t really look into the account other than the bucket and the policies, there could have been a lot of other data there that could be exfiltrated,” Forbes noted.
He described Infosys’s development practices on display as “pretty shocking in terms of security”. He told The Stack he had not received any communication from Infosys to date.
When The Stack approached the firm for comment on the Infosys leak, the company provided the statement below, saying the data exposed was “test data” and that there had been “no loss of data”.
Our follow-up questions, asking whether the test data contained actual patient information and how Infosys had verified that no data had been exfiltrated, had not been answered at the time of publication.
“There has been a recent blog published mentioning a security incident involving data compromise at Infosys. We would like to assure you that our infrastructure has not been impacted by this incident and there is no loss of data. The said instance was from an isolated cloud subscription, used for a proof of concept, and the data referred to in the blog was only test data. The referred instance and repository were disabled immediately.
“At Infosys we continuously monitor our IT landscape and have necessary protocols in place to safeguard our environment against any adverse incidents.”