There are numerous ways to use GenAI in software development today. Options range from stand-alone IDEs with built-in GenAI, such as Cursor AI and Windsurf, to plug-ins for existing Integrated Development Environments (IDEs), including GitHub Copilot, Tabnine, Codeium, and Amazon Q. These tools are easy to use and promise significant productivity gains, but they can also breach company security policies.
Scenario
An automated process exports an Excel spreadsheet daily and places the file in an AWS S3 bucket. This spreadsheet contains sensitive data, including customer names, account numbers, addresses, order IDs, order dates, product SKUs and quantities, and credit card information. The file is descriptively named, such as “AcmeCompany_customer_sales_data_2025_02_20.xls.”
You are tasked with creating an AWS Lambda function in Python to ingest this file and insert the data into a MongoDB database.
Your Thought Process
To build and test your Python utility, you might use a GenAI prompt like the following:
“Create a Python program that connects to an AWS system using the credentials username=xyz, password=abcde, retrieves the file in the AWS S3 bucket named XYZBucket whose filename matches AcmeCompany_customer_sales_data_2025_02_20.xls, reads in the data from this file, converts it to JSON, and writes it to a MongoDB collection named ‘daily_sales_data’ using the connection string ‘https://…/…’.”
The Problem
Great! You have generated a program that does exactly what you need. However, you have also shared Personally Identifiable Information (PII) and Payment Card Information (PCI) with the outside world. This action violates your company’s security protocols and breaches laws, regulations, and industry standards such as the General Data Protection Regulation (GDPR) and the Payment Card Industry Data Security Standard (PCI DSS). You have also exposed details about your AWS system and MongoDB installation. This data may now become part of the AI model’s training data, and the GenAI tool may reproduce it when another developer submits the right prompt.
Alternate Approach
You only need a few rows of random data for the GenAI tool to generate suitable code. Create an example test-data Excel file with made-up or randomly generated names, account numbers, credit card numbers, and so on, and a nondescript file name (a sketch of generating such a file appears after this list). You can then prompt your GenAI tool separately for the individual pieces of the puzzle, like:
- “Show me an example of how to connect to a MongoDB database from a Python program.”
- “How do I connect to an AWS S3 instance using the Boto3 library in a Python program?”
- “How do I open an Excel file in an S3 bucket and read the data?”
- “I need a method to read in the example file named ‘example.xls,’ convert it to JSON, and write it in a MongoDB collection named ‘test_data’.”
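For illustration, here is a minimal sketch of how such a synthetic test file might be created. The column names, row values, and file name are all invented for this example, and it assumes pandas and openpyxl are installed; nothing here comes from the real spreadsheet.

```python
# Hypothetical sketch: build a small Excel file of synthetic rows for prompting and testing.
# Every value below is fabricated; none of it identifies a real customer or company.
import random
import string

import pandas as pd  # assumes pandas and openpyxl are installed


def fake_digits(n: int) -> str:
    """Return an n-character string of random digits (not a valid account or card number)."""
    return "".join(random.choices(string.digits, k=n))


rows = [
    {
        "customer_name": f"Test Customer {i}",
        "account_number": fake_digits(10),
        "address": f"{random.randint(1, 999)} Example Street",
        "order_id": f"ORD-{fake_digits(6)}",
        "order_date": "2025-01-01",
        "product_sku": f"SKU-{fake_digits(4)}",
        "quantity": random.randint(1, 5),
        "credit_card_number": fake_digits(16),
    }
    for i in range(5)
]

# A deliberately generic file name -- nothing that identifies the company or the data.
pd.DataFrame(rows).to_excel("example_test_data.xlsx", index=False)
```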
In all cases, omit all connection information and proprietary or protected data. Your GenAI tool will generate the code with placeholder comments like “your connection string here.” You will have some additional work to tie the pieces together into real-world code, but you haven’t exposed any protected information or system details to the world. Just because the GenAI tool can do everything you need in a single prompt doesn’t mean you should use it that way.
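Assembled, the result might look something like the sketch below. The bucket name, key, database name, and handler shape are placeholders invented for this example, not output from any particular GenAI tool; in production, the real values would come from environment variables or a secrets manager rather than a prompt.

```python
# Hypothetical sketch of the assembled pieces, with placeholders instead of real values.
import io

import boto3
import pandas as pd
from pymongo import MongoClient

# Placeholders only -- in real code, pull these from environment variables or a secrets manager.
S3_BUCKET = "your-bucket-name-here"
S3_KEY = "your-file-key-here"
MONGO_URI = "your-connection-string-here"


def lambda_handler(event, context):
    # Download the Excel file from S3 into memory.
    s3 = boto3.client("s3")
    obj = s3.get_object(Bucket=S3_BUCKET, Key=S3_KEY)
    df = pd.read_excel(io.BytesIO(obj["Body"].read()))

    # Convert the rows to plain dictionaries and insert them into the collection.
    records = df.to_dict(orient="records")
    client = MongoClient(MONGO_URI)
    client["your_database_here"]["test_data"].insert_many(records)

    return {"inserted": len(records)}
```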
Future Considerations
The next wave of GenAI development tools will analyze your entire codebase to suggest system-wide improvements. This almost certainly opens the possibility of intellectual property exposure. In addition, credentials, connection strings, and passwords for at least a test system might exist in your codebase. Unless the GenAI tool is hosted locally within your company, the risks to intellectual property and security are extensive.
Guidelines for the Use of GenAI
Data Privacy and Security
- Avoid Sharing Sensitive Data: Never input Personally Identifiable Information (PII), Payment Card Information (PCI), or any other sensitive data into GenAI tools. Use anonymized or synthetic data instead.
- Compliance with Regulations: Ensure that your use of GenAI complies with relevant data protection laws and regulations, such as GDPR, HIPAA, and PCI DSS.
Ethical Use
- Transparency: Be transparent about using GenAI in your projects. Inform stakeholders about how AI is being used and the data it processes.
- Bias and Fairness: Be aware of potential biases in AI models and strive to mitigate them. Ensure your AI solutions are fair and do not discriminate against any group.
Human Oversight
- Review Outputs: Always review and validate the outputs generated by GenAI tools. Do not rely solely on AI-generated content for critical decisions.
- Accountability: Take complete ownership of the results produced by GenAI tools. Ensure that there is human oversight in the decision-making process.
Security Best Practices
- Secure Development Practices: Follow secure software development practices, such as those outlined in the Secure Software Development Framework (SSDF). These include regular code reviews, vulnerability assessments, and secure coding standards.
- Access Control: Implement strict access controls to ensure that only authorized personnel can use GenAI tools and access the data they process.
Continuous Monitoring and Improvement
- Monitor AI Systems: Continuously monitor the performance and behavior of AI systems to detect and address any issues promptly.
- Update and Improve: Regularly update AI models and tools to incorporate the latest security patches and improvements.
By following these guidelines, software developers can leverage the power of GenAI while ensuring these tools are used responsibly and securely.
Summary
It is easy to forget that, unless you are using a locally hosted GenAI tool, any data you submit in a prompt is not private. Your data is sent to the GenAI tool’s servers, parsed, potentially stored, and potentially shared with the next person who enters the right prompt. You must constantly assess whether what you are giving the GenAI tool in a prompt exposes sensitive data.
Similarly, you can use GenAI tools to improve your code or perform a code review. However, you must be careful about what code you ask the GenAI tool to review. Does the code contain usernames or passwords? Connection strings to databases? Does it reveal a specific business purpose, or contain proprietary algorithms or intellectual property?
Exposure of proprietary or protected information or intellectual property to a GenAI tool could lead to disciplinary action, termination of employment, and legal action. If this data or code belonged to your customer, the consequences could be even worse, potentially leading to legal action and the cancellation of contracts worth millions to your company.
GenAI development tools are excellent and promise significant productivity increases. However, careful and diligent use of these tools is needed to mitigate potential risks to protected data and intellectual property.