As businesses rely more heavily on web applications, sites, and services for branding, customer and user engagement, and internal functions like time tracking and employee communication, data continuity loss is growing just as quickly. Businesses need to stay alert.
Few business owners, employees, or even webmasters are aware of the expansive indexing ability of search bots used by the top search engines. It is easy to find companies that have saved private lead data, internal documents, and customers’ personal data to the web if you know how to use search modifiers.
For example, if you type the following into Google Search: +attendees +email ext:xlsx, you’re likely to find a ton of private lead data for a wide variety of industry expos and corporate meetings. How did that get there? Well, most of the time, someone was in a rush, uploaded their lead tracking data to the media or document folder of their website, and simply forgot about it. This can also happen when the web application you use to enter this information does not save its files to a non-indexed subfolder of your website.
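One way to catch forgotten uploads like these is to scan your site’s folders for document-type files. Below is a minimal Python sketch, assuming you have file-system access to the web root. The function name `find_risky_files`, the extension list, and the example path in the comment are illustrative assumptions, not part of any particular hosting setup.

```python
from pathlib import Path

# Assumed list of document-style extensions worth reviewing; adjust to taste.
RISKY_EXTENSIONS = {".csv", ".xlsx", ".docx", ".txt", ".zip", ".pdf"}

def find_risky_files(web_root, extensions=RISKY_EXTENSIONS):
    """Return sorted paths under web_root whose extension is in `extensions`."""
    root = Path(web_root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    )

# Example (hypothetical path): print every candidate file for manual review.
# for path in find_risky_files("/var/www/html"):
#     print(path)
```

The scan only lists candidates. Someone still has to open each file and decide whether it belongs on a public server.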
It gets scarier. There are searches that allow anyone to find and access IoT devices like smart home cameras, or expose password lists saved to Trello boards that were meant to be private, as well as full backups of email inboxes or company QuickBooks backup files. Any savvy internet user could use this information as a foothold and systematically take over a business’s critical services, steal identities, or compromise customer data.
See What Data You Are Showing
Using simple search modifiers, you can check to see what documents Google is indexing on your website.
On the Google home page, enter this: site:yourdomain.com. You’re likely to come across the top pages for your specific site, but if you add a modifier like ext:pdf, you will see all of the PDF documents Google is indexing for your site. This works for an ever-expanding number of file types like CSV, XLSX, DOCX, TXT, and ZIP.
Doing a search like site:yourdomain.com ext: (leaving the extension modifier blank) will often return a list of non-web file types.
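If you want to run these checks for several file types at once, the queries can be generated instead of typed by hand. Here is a small Python sketch, assuming the standard Google Search URL form with a `q` parameter. The helper names `build_queries` and `search_url` are made up for illustration, and `yourdomain.com` is the same placeholder used above.

```python
from urllib.parse import urlencode

# File types named in this article; extend the list as needed.
EXTENSIONS = ["pdf", "csv", "xlsx", "docx", "txt", "zip"]

def build_queries(domain, extensions=EXTENSIONS):
    """Return one 'site:<domain> ext:<type>' query string per file type."""
    return [f"site:{domain} ext:{ext}" for ext in extensions]

def search_url(query):
    """Turn a query string into a Google Search URL to open in a browser."""
    return "https://www.google.com/search?" + urlencode({"q": query})

if __name__ == "__main__":
    for query in build_queries("yourdomain.com"):
        print(query, "->", search_url(query))
```

The script only prints links. Open each one in a browser and review the results yourself.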
If You Find Data Online that You Don’t Want Online, What Should You Do?
So, let’s say you have found a customer list with email and contact information open and indexable on your website, and you want to remove it from the search results. The steps you should take are as follows:
- Locate the file on your webserver
- Modify the file by changing or removing the data. Note: Do not simply delete the file
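- Once you have changed or removed the offending data, go into your Webmaster Console settings for either Google: https://search.google.com/search-console or Bing: https://www.bing.com/toolbox/webmaster and select fetch and render so the search engine recrawls the updated file. In the current Google Search Console, the equivalent is the URL Inspection tool’s Request Indexing option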
You may be wondering why you couldn’t simply delete the file and be done with it. Google has a handy feature called Google Cache that stores cached versions of various pages and files, much like archive.org’s Wayback Machine does. To illustrate this, I did a simple search: +confidential ext:pdf.
For the sake of demonstration, let’s pretend the Pacourts.us website realizes that file 6574 (pictured here) is hanging out in the open and decides to delete it. The link may not work when you click on it directly from search, but if you click on the little gray arrow next to the URL, you’ll be able to access the cached version of that data and simply copy it. Conversely, if you update the file, you are forcing Google to recache the document. Once that is done, you are clear to delete the file without the risk of someone finding your data.
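The “modify, don’t delete” step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `scrub_in_place`, the placeholder text, and the backup folder are hypothetical, and the backup folder is assumed to sit outside your public web root.

```python
import shutil
from pathlib import Path

# Harmless content that replaces the sensitive data (assumed wording).
PLACEHOLDER = b"This file has been removed.\n"

def scrub_in_place(exposed_file, backup_dir):
    """Copy the original to backup_dir, then overwrite it at the same path.

    The file keeps its name and URL, so the next crawl picks up the
    placeholder instead of the sensitive data.
    """
    exposed_file = Path(exposed_file)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    backup_path = backup_dir / exposed_file.name
    shutil.copy2(exposed_file, backup_path)  # keep a private copy
    exposed_file.write_bytes(PLACEHOLDER)    # same path, harmless content
    return backup_path
```

After the search engine has recrawled the placeholder, the file at the public path can be deleted as described above.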
Protect Your Data Continuity Checklist
In addition to regularly changing your passwords, set a reminder to run data searches like the ones listed above against your company and personal websites often. Remind your team to do the same. We recommend you hire an outside security consultant or web development team to test your corporate systems for vulnerabilities. Keep in mind, the biggest tool used by hackers is “soft skills,” not computer skills. This means that with only small amounts of disparate data and a phone call, most hackers can gain access to your sensitive data.
Teach your employees to be mindful before giving out any information. Implementing a regularly scheduled, company-wide password update plan is key to keeping your data safe!
Bonus Tip: If you’re having trouble creating a strong password, try adding your childhood phone number to the beginning or end of the shorter password you might normally use. Better yet, learn a rhyming phrase and replace a few letters with numbers. These techniques will create passwords that are much harder to crack using software tools.