

AI Training Data Leaks Passwords, API Keys

Some people are worried about the vacuuming of huge amounts of data to train AI models; other people are not concerned. Here is an interesting problem that I had not considered.

Researchers discovered nearly 12,000 private API keys in one publicly available training dataset. And that is a single dataset out of hundreds of possible datasets.

The compromised data, which came from crawling billions of web pages, included API keys, passwords and other login credentials. The majority were for Amazon Web Services (AWS), but other cloud services were represented as well.

This particular dataset is about 250 petabytes in size and grows by multiple petabytes every month.

The researchers analyzed only a portion of this data because of its size, so the real numbers are likely much worse than they appear.

And, unlike some other studies, the researchers verified that the login credentials actually worked.

So, if you say it doesn’t matter to you because … you don’t use AI, you don’t give people training data, whatever, you are discounting the problem.

If you run a web site, there is nothing to stop hackers or anyone else from crawling its pages, and if there are secrets in those pages, well, they aren’t secret anymore.
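
If you want a rough idea of what a crawler could pull out of your own pages, here is a minimal sketch in Python. It is not the researchers' tool or method. The two patterns, the `SECRET_PATTERNS` table and the `find_secrets` function are my own illustrative assumptions: one looks for strings shaped like AWS access key IDs, the other for quoted values assigned to names like `api_key` or `password`. Real secret scanners use far more rules and will catch things this misses.

```python
import re

# Illustrative patterns only (assumptions, not a complete rule set).
SECRET_PATTERNS = {
    # Strings shaped like an AWS access key ID: a known prefix plus 16 characters.
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    # A quoted value of 8+ characters assigned to a suspicious-looking name.
    "generic_assignment": re.compile(
        r"(?i)\b(?:api[_-]?key|secret|password)\b\s*[:=]\s*['\"]([^'\"]{8,})['\"]"
    ),
}


def find_secrets(page_text):
    """Return a list of (pattern_name, matched_text) pairs found in page_text."""
    findings = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(page_text):
            findings.append((name, match.group(0)))
    return findings
```

Run it against the HTML and JavaScript your site actually serves, not just the source you think you published. Anything it flags should be treated as already exposed: rotate the credential first, then remove it from the page.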

Need assistance? Please contact us. Credit: Computing