Sourcery is an artificial intelligence (AI)-based quality assurance tool you can use during software and systems development and integration cycles. At its core, Sourcery is a refactoring engine: it scans Python code and suggests improvements. Sourcery’s AI engine can propose changes that go well beyond traditional static code analysis tools, which only flag programming errors, bugs, stylistic errors, and suspicious constructs.
Sourcery’s Position in the DevSecOps Phases
This blog post covers Sourcery’s integration with Visual Studio Code and Python, two standard tools included in security professionals’ development stacks. Sourcery is available at https://sourcery.ai and the Microsoft Visual Studio Code Extensions tab.
To demonstrate Sourcery’s capabilities, we will apply Sourcery’s recommendations in two refactoring examples (one simple and one more complex) on a snippet of a Python script. We will work with chksys.py, a conceptual Python script that assesses how well a Linux host complies with security technical controls applied during build and integration. Our examples will focus on fips_queue, the FIPS 140-2/3 assessment queue method within chksys.py. FIPS 140-2/3 testing validates cryptographic modules that encrypt data at rest and in transit.
Here’s how the logic behind the fips_queue method works:
- Determine assessments (tests) that apply to all FIPS-compliant hosts (“Common”).
- Establish dependencies (“Dependencies”) that help ensure compliance.
- Uncover any additional installed software with FIPS requirements (“Additions”). Examples include Virtual Private Network (VPN), Network File System (NFS), and Open Secure Sockets Layer (OpenSSL).
- Decide on operating system-specific tests (“OS”). We used Ubuntu, Red Hat, Oracle, and Amazon Linux distributions.
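The four steps above can be sketched as a single method. This is a minimal illustration of the shape of the logic, not the actual chksys.py source: the class name HostAssessment, its constructor arguments, and every test name placed in the queue are hypothetical placeholders.

```python
class HostAssessment:
    """Hypothetical stand-in for the host object in chksys.py."""

    def __init__(self, os_name, installed_software):
        self.os = os_name
        self.installed = set(installed_software)

    def fips_queue(self):
        """Build the ordered list of FIPS assessments for this host."""
        queue = []

        # Common: tests that apply to every FIPS-compliant host.
        # (All test names in this sketch are made up for illustration.)
        queue.append("fips_mode_enabled")
        queue.append("kernel_fips_flag")

        # Dependencies: supporting pieces that help ensure compliance.
        queue.append("crypto_packages_installed")

        # Additions: extra installed software with FIPS requirements.
        if "vpn" in self.installed:
            queue.append("vpn_fips_ciphers")
        if "nfs" in self.installed:
            queue.append("nfs_fips_kerberos")
        if "openssl" in self.installed:
            queue.append("openssl_fips_provider")

        # OS: distribution-specific tests.
        if self.os == "Ubuntu":
            queue.append("ubuntu_fips_packages")
        elif self.os == "RHEL" or self.os == "Oracle":
            queue.append("rhel_crypto_policy")
        elif self.os == "Amazon":
            queue.append("amazon_fips_packages")

        return queue


host = HostAssessment("RHEL", ["openssl", "nfs"])
print(host.fips_queue())
```

Keeping all four concerns in one method is what drives up the size and branching that the rest of this post works on.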
Simple Refactoring Example: Improving Code Quality
The FIPS database, maintained by the NIST Cryptographic Module Validation Program (see References), lists the versions of Linux certified to support FIPS 140-2/3.
In our simple refactoring example, Sourcery brought an immediate code quality lift. With Sourcery integrated into the Visual Studio Code Integrated Development Environment (IDE), it found an issue in the FIPS 140-2/3 assessment queue, fips_queue, before we attempted to run the code or introduce new logic. Sourcery’s refactoring suggestion showed up in the IDE and allowed us to refactor at once without any functional changes to the code, pushing this minimal-risk change far left in the development lifecycle. Sourcery showed us this low-hanging fruit in the Python script, indicated by the squiggly yellow line shown here:
By mousing over the line, we saw the following:
Sourcery recommended replacing the chained ‘or’ comparisons on line 57 of the script with a single membership test using Python’s ‘in’ operator. The result is the same, but the condition is shorter and easier to extend than a chain of ‘or’ comparisons or additional elif branches. We implemented the change by removing line 57:
- elif self.os == "RHEL" or self.os == "Oracle":
and adding line 58:
+ elif self.os in ["RHEL", "Oracle"]:
(Note: We purposely kept this example uncomplicated, but please be aware that it is common for cybersecurity tests to cover an entire raft of OS versions based on the upstream provider. For example, a large enterprise network could include 4-10 Red Hat-derived distributions.)
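To see why this change is behavior-preserving, the short sketch below compares the two forms of the condition and then shows how the membership test scales to a longer list. The function names and the RHEL_FAMILY constant are hypothetical, and the extra distribution names are only examples of Red Hat-derived systems, not a list taken from chksys.py.

```python
# Hypothetical constant; chksys.py may organize its OS names differently.
RHEL_FAMILY = ("RHEL", "Oracle", "CentOS", "Rocky", "AlmaLinux")


def is_rhel_like_or(os_name):
    """Original style: one comparison per distribution, chained with 'or'."""
    return os_name == "RHEL" or os_name == "Oracle"


def is_rhel_like_in(os_name):
    """Refactored style: a single membership test."""
    return os_name in ["RHEL", "Oracle"]


def is_rhel_family(os_name):
    """Same idea with a longer list kept in one named constant."""
    return os_name in RHEL_FAMILY


# Both styles agree for every name we try, including a differently cased one.
for name in ("RHEL", "Oracle", "Ubuntu", "Amazon", "ORACLE"):
    assert is_rhel_like_or(name) == is_rhel_like_in(name)

print(is_rhel_family("Rocky"))
```

String comparison and membership tests are both case-sensitive, so the spelling in the list has to match the original comparisons exactly for the refactoring to remain a no-op.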
Code Metrics
Sourcery reports Code Metrics in real time when you mouse over the method’s def header in the source code. (For a deep dive into the algorithms, check out Episode 266 of Talk Python, linked below in the References section.) Before we made the change, Sourcery supplied the following Code Metrics for the fips_queue method:
Here is what the metrics mean and how to interpret them:
Sourcery Code Metric | What the Metric Means | How to Interpret |
---|---|---|
Complexity | Measures the number of flow-breaking structures like logical operators, loops, and conditionals. A high complexity score can indicate a piece of code may have too much branching logic. | A lower # is better. |
Size | Measures the length of the method and is another simple heuristic to assess complexity. Longer methods can be broken up into smaller ones to simplify the code. | A lower # is better. |
Working Memory | Measures the number of variables you must keep up with while reading and understanding the code. High working memory implies the code may have too many “moving parts” and could be simplified. | A lower # is better. |
Quality Score | Quality Score is an overall metric for the code under review. The lower the Quality Score, the more difficult the code is to understand, leading to additional effort when debugging or adding capabilities. | A higher # is better.¹ |
After making the change to use the “in” operator, all the Code Metrics improved. For example, the Quality Score rose four percentage points from 54% to 58%, while the Complexity rating dropped from 13 to 12. Working Memory went from 9 to 8, and Size went from 133 to 129. Addressing the Size metric is the subject of our second example.
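To build some intuition for why the Complexity rating moves when an ‘or’ is replaced by ‘in’, here is a rough sketch that counts flow-breaking constructs with Python’s standard ast module. It is not Sourcery’s algorithm, and the choice of which node types to count is our own assumption; it only illustrates the idea of counting branches and logical operators. The os_tests function and its test names are hypothetical.

```python
import ast

# Node types treated as "flow-breaking" in this rough sketch (an assumption,
# not Sourcery's actual definition).
FLOW_NODES = (ast.If, ast.For, ast.While, ast.BoolOp, ast.IfExp, ast.Try)


def rough_complexity(source):
    """Count flow-breaking nodes in a piece of Python source."""
    tree = ast.parse(source)
    return sum(isinstance(node, FLOW_NODES) for node in ast.walk(tree))


BEFORE = '''
def os_tests(os_name):
    if os_name == "Ubuntu":
        return ["ubuntu_test"]
    elif os_name == "RHEL" or os_name == "Oracle":
        return ["rhel_test"]
    return []
'''

AFTER = '''
def os_tests(os_name):
    if os_name == "Ubuntu":
        return ["ubuntu_test"]
    elif os_name in ["RHEL", "Oracle"]:
        return ["rhel_test"]
    return []
'''

print(rough_complexity(BEFORE), rough_complexity(AFTER))
```

In this sketch the ‘or’ shows up as an extra logical-operator node, so the count for the refactored version is one lower, which mirrors the one-point drop Sourcery reported.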
¹ Sourcery categorizes a Quality Score of < 25% as concerning. However, code with a Quality Score above 25% can still be a strong candidate for refactoring, whether to ease sustainment or to clean up before adding new capabilities. Experienced staff can achieve scores of 80% or better on new methods, and even junior team members can reach 60-70% if supported by style guidelines and senior staff.
More Complex Refactoring Example: Addressing Size
Sourcery identifies the Size of the fips_queue method as a concern and recommends splitting fips_queue into smaller methods. However, breaking up fips_queue requires more thought and consideration because it is a riskier change than simply using the ‘in’ check in the simple refactoring example.
The easiest way to split fips_queue into smaller methods is to start by separating the logic into its Common, Dependencies, Additions, and OS elements.
Call Out: Best practices opportunity: When you refactor to reduce size-related complexity, unit tests for each new method are much easier to write because each method is small and simple to assess. This also makes it easier to improve code coverage.
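As an illustration of that point, the sketch below tests a small OS-specific helper in isolation using Python’s built-in unittest module. The helper is written as a standalone function named fips_queue_os for brevity; that name, the test names it returns, and the “Solaris” input are hypothetical and are not taken from chksys.py.

```python
import unittest


def fips_queue_os(os_name):
    """Hypothetical OS-specific helper split out of fips_queue."""
    if os_name == "Ubuntu":
        return ["ubuntu_fips_packages"]
    if os_name in ["RHEL", "Oracle"]:
        return ["rhel_crypto_policy"]
    if os_name == "Amazon":
        return ["amazon_fips_packages"]
    # Unknown distributions get no OS-specific tests in this sketch.
    return []


class FipsQueueOsTest(unittest.TestCase):
    def test_rhel_and_oracle_share_tests(self):
        self.assertEqual(fips_queue_os("RHEL"), fips_queue_os("Oracle"))

    def test_ubuntu_has_its_own_tests(self):
        self.assertEqual(fips_queue_os("Ubuntu"), ["ubuntu_fips_packages"])

    def test_unknown_os_gets_no_os_tests(self):
        self.assertEqual(fips_queue_os("Solaris"), [])
```

Saved in a test module, these cases can be run with python -m unittest. Each test exercises one branch of one small function, which is far harder to do cleanly against a single large method.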
In this example, we are going to refactor in the order of the logic, from top to bottom. In production environments, refactoring should follow your organization’s standard procedures, which are typically based on risk-to-reward ratios.
The new Common tests portion, now named fips_queue_common, is shown in the following figure by Sourcery and Visual Studio Code:
Note that the screenshot above shows the separation of the Common tests (lines 34-41) into fips_queue_common, the changes to fips_queue, and the new Sourcery Code Metrics. The Code Metrics reflect two improvements: a Quality Score increase from 59% to 61%, and a Size decrease from 129 to 104.
After refactoring the Dependencies, Additions, and OS logic into their own methods, Sourcery reports a Quality Score of 94% as well as dramatic improvements in Complexity, Size, and Working Memory. The screenshot also shows that the code is much easier to read because each functional activity maps directly to the name of a method.
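The overall shape of the refactored code can be sketched as follows. Only fips_queue and fips_queue_common are names from our script; the other three method names, the HostAssessment class, and all test names are hypothetical placeholders used to show the structure, not the actual chksys.py code.

```python
class HostAssessment:
    """Hypothetical stand-in for the host object in chksys.py."""

    def __init__(self, os_name, installed_software):
        self.os = os_name
        self.installed = set(installed_software)

    def fips_queue(self):
        """Assemble the full queue from the four smaller methods."""
        return (
            self.fips_queue_common()
            + self.fips_queue_dependencies()
            + self.fips_queue_additions()
            + self.fips_queue_os()
        )

    def fips_queue_common(self):
        """Tests that apply to every FIPS-compliant host."""
        return ["fips_mode_enabled", "kernel_fips_flag"]

    def fips_queue_dependencies(self):
        """Supporting pieces that help ensure compliance."""
        return ["crypto_packages_installed"]

    def fips_queue_additions(self):
        """Tests for extra installed software with FIPS requirements."""
        # Maps a software name to its (made-up) test name.
        addition_tests = {
            "vpn": "vpn_fips_ciphers",
            "nfs": "nfs_fips_kerberos",
            "openssl": "openssl_fips_provider",
        }
        return [
            test
            for software, test in addition_tests.items()
            if software in self.installed
        ]

    def fips_queue_os(self):
        """Distribution-specific tests."""
        if self.os == "Ubuntu":
            return ["ubuntu_fips_packages"]
        if self.os in ["RHEL", "Oracle"]:
            return ["rhel_crypto_policy"]
        if self.os == "Amazon":
            return ["amazon_fips_packages"]
        return []


host = HostAssessment("RHEL", ["openssl", "nfs"])
print(host.fips_queue())
```

In this arrangement fips_queue itself has no branching at all, and each helper carries only the variables it needs, which is the kind of change that moves the Complexity, Size, and Working Memory metrics together.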
It is important to recognize that refactoring could move complexity into other methods, where it may stay hidden. To guard against this, Sourcery automatically scores every new method as code moves into it, which makes relocated issues obvious. In this case, all the new methods scored in the nineties. (Counting the new methods, the total line count for fips_queue increased from 42 to 50. The eight additional lines are comments and method headers, i.e., constructs of Python, not an artifact of refactoring.)
Summary
This article covered Sourcery in Visual Studio Code and how it can provide an immediate lift to your DevSecOps practice. If Sourcery fits your environment, consider integrating it into your DevSecOps tooling to improve the readability and quality of your software and scripts. The figure Before and After Sourcery shows the refactoring improvements you gain as you push your DevSecOps left.
Before and After Sourcery
These Sourcery-based improvements result in fewer user-impacting bugs and help reduce response time when bugs do occur.
T-Rex’s cybersecurity engineers and architects work with agencies to enhance their DevSecOps, Continuous Integration and Continuous Delivery (CI/CD), and Software Factory (SF) postures through a tailored strategy that meets their process, compliance, and security needs. With the tailored approach and our assistance in implementation, agencies benefit from faster deployments, improved technical controls coverage, and reductions in manual processes. Want to learn more about how we can assist you with your DevSecOps, CI/CD, and SF endeavors? Contact us at cybersecurity@trexsolutionsllc.com.
References:
Red Hat: How can I make RHEL FIPS compliant? – Red Hat Customer Portal
DISA STIG Tooling: SRG / STIG Tools – DoD Cyber Exchange
Cryptographic Module Validation Program | CSRC (nist.gov)
VSCode – Sourcery Documentation
Episode #266 Refactoring your code, like magic with Sourcery – [Talk Python To Me Podcast]