rust-code-analysis
rust-code-analysis is a Rust library to analyze and extract information from source code written in many different programming languages. It is based on Tree-sitter, a parser generator tool and incremental parsing library.
You can find the source code of this software on GitHub, while issues and feature requests can be posted on the respective GitHub Issue Tracker.
Supported platforms
rust-code-analysis can run on the most common platforms: Linux, macOS, and Windows.
On our GitHub Release Page you can find precompiled binaries for Linux and Windows.
API docs
If you prefer to use rust-code-analysis as a crate, you can find the API docs generated by Rustdoc here.
How to cite rust-code-analysis
@article{ARDITO2020100635,
title = {rust-code-analysis: A Rust library to analyze and extract maintainability information from source codes},
journal = {SoftwareX},
volume = {12},
pages = {100635},
year = {2020},
issn = {2352-7110},
doi = {https://doi.org/10.1016/j.softx.2020.100635},
url = {https://www.sciencedirect.com/science/article/pii/S2352711020303484},
author = {Luca Ardito and Luca Barbato and Marco Castelluccio and Riccardo Coppola and Calixte Denizet and Sylvestre Ledru and Michele Valsesia},
keywords = {Algorithm, Software metrics, Software maintainability, Software quality},
abstract = {The literature proposes many software metrics for evaluating the source code non-functional properties, such as its complexity and maintainability. The literature also proposes several tools to compute those properties on source codes developed with many different software languages. However, the Rust language emergence has not been paired by the community’s effort in developing parsers and tools able to compute metrics for the Rust source code. Also, metrics tools often fall short in providing immediate means of comparing maintainability metrics between different algorithms or coding languages. We hence introduce rust-code-analysis, a Rust library that allows the extraction of a set of eleven maintainability metrics for ten different languages, including Rust. rust-code-analysis, through the Abstract Syntax Tree (AST) of a source file, allows the inspection of the code structure, analyzing source code metrics at different levels of granularity, and finding code syntax errors before compiling time. The tool also offers a command-line interface that allows exporting the results in different formats. The possibility of analyzing source codes written in different programming languages enables simple and systematic comparisons between the metrics produced from different empirical and large-scale analysis sources.}
}
License
- Mozilla-defined grammars are released under the MIT license.
- rust-code-analysis, rust-code-analysis-cli, and rust-code-analysis-web are released under the Mozilla Public License v2.0.
Supported Languages
This is the list of programming languages parsed by rust-code-analysis.
- C
- C++
- Mozcpp
- Ccomment
- Preproc
- Java
- JavaScript
- Mozjs
- Python
- Rust
- Typescript
Supported Metrics
rust-code-analysis implements a series of metrics:
- ABC: it measures the size of a source code by counting the number of Assignments (A), Branches (B), and Conditions (C).
- BLANK: it counts the number of blank lines in a source file.
- CC: it calculates the Cyclomatic complexity by examining the control flow of a program.
- CLOC: it counts the number of comment lines in a source file.
- COGNITIVE: it calculates the Cognitive complexity, measuring how complex it is to understand a unit of code.
- HALSTEAD: it is a suite that provides a series of information, such as the effort required to maintain the analyzed code, the size in bits to store the program, the difficulty to understand the code, an estimate of the number of bugs present in the codebase, and an estimate of the time needed to implement the software.
- LLOC: it counts the number of logical lines (statements) contained in a source file.
- MI: it is a suite that allows evaluating the maintainability of a piece of software.
- NARGS: it counts the number of arguments of a function/method.
- NEXITS: it counts the number of possible exit points from a method/function.
- NOM: it counts the number of functions and closures in a file/trait/class.
- NPA: it counts the number of public attributes in classes/interfaces.
- NPM: it counts the number of public methods in classes/interfaces.
- PLOC: it counts the number of physical lines (instructions) contained in a source file.
- SLOC: it counts the number of lines in a source file.
- WMC: it sums the Cyclomatic complexity of every method defined in a class.
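To make a few of these definitions concrete, here is a small Rust function with some expected metric values hand-computed from the definitions above (an illustrative sketch, not actual tool output):

```rust
// By the definitions above, one would expect for `abs_val`:
//   NARGS  = 1 (one parameter)
//   NEXITS = 2 (the early `return` plus the final expression)
//   CC     = 2 (one decision point, the `if`, plus one)
fn abs_val(x: i32) -> i32 {
    if x < 0 {
        return -x;
    }
    x
}

fn main() {
    println!("{}", abs_val(-7)); // prints 7
}
```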
Commands
rust-code-analysis-cli offers a range of commands to analyze and extract information from source code. Each command may include parameters specific to the task it performs. Below, we describe the core types of commands available in rust-code-analysis-cli.
Metrics
Metrics provide quantitative measures about source code, which can help in:
- Comparing different programming languages
- Providing information on the quality of a piece of code
- Showing developers where their code is harder to maintain
- Discovering potential issues early in the development process
rust-code-analysis calculates metrics directly from the source code of a program, without executing it. These kinds of metrics are called static metrics.
Nodes
To represent the structure of program code, rust-code-analysis-cli builds an Abstract Syntax Tree (AST). A node is an element of this tree and denotes any syntactic construct present in a language.
Nodes can be used to:
- Create the syntactic structure of a source file
- Discover if a construct of a language is present in the analyzed code
- Count the number of constructs of a certain kind
- Detect errors in the source code
REST API
rust-code-analysis-web runs a server offering a REST API. This allows users to send source code via HTTP and receive the corresponding metrics in JSON format.
Metrics
Metrics can be displayed or exported in various formats using rust-code-analysis-cli.
Display Metrics
To compute and display metrics for a given file or directory, run:
rust-code-analysis-cli -m -p /path/to/your/file/or/directory
- `-p`: Path to the file or directory to analyze. If a directory is provided, metrics will be computed for all supported files it contains.
Exporting Metrics
rust-code-analysis-cli supports multiple output formats for exporting metrics, including:
- CBOR
- JSON
- TOML
- YAML
Both JSON and TOML can be exported in pretty-printed form.
Export Command
To export metrics as a JSON file:
rust-code-analysis-cli -m -p /path/to/your/file/or/directory -O json -o /path/to/output/directory
- `-O`: Specifies the output format (e.g., json, toml, yaml, cbor).
- `-o`: Path where the output file is saved. The output filename is the same as the input file's, plus the extension associated with the chosen format. If not specified, the result is printed to the shell.
Pretty Print
To output pretty-printed JSON metrics:
rust-code-analysis-cli -m -p /path/to/your/file/or/directory --pr -O json
This command prints the formatted metrics to the console or the specified output path.
Nodes
rust-code-analysis-cli provides commands to analyze and extract information from the nodes of the Abstract Syntax Tree (AST) of a source file.
Error Detection
To detect syntactic errors in your code, run:
rust-code-analysis-cli -p /path/to/your/file/or/directory -I "*.ext" -f error
- `-p`: Path to a file or directory (analyzes all files in the directory).
- `-I`: Glob filter for selecting files by extension (e.g., `*.js`, `*.rs`).
- `-f`: Flag to search for nodes of a specific type (e.g., errors).
Counting Nodes
You can count the number of specific node types in your code using the `--count` flag:
rust-code-analysis-cli -p /path/to/your/file/or/directory -I "*.ext" --count <NODE_TYPE>
This counts how many nodes of the specified type exist in the analyzed files.
Printing the AST
To visualize the AST of a source file, use the `-d` flag:
rust-code-analysis-cli -p /path/to/your/file/or/directory -d
The `-d` flag prints the entire AST, allowing you to inspect the code's syntactic structure.
Analyzing Code Portions
To analyze only a specific portion of the code, use the `--ls` (line start) and `--le` (line end) options.
For example, to print the AST of a single function that starts at line 5 and ends at line 10:
rust-code-analysis-cli -p /path/to/your/file/or/directory -d --ls 5 --le 10
REST API
rust-code-analysis-web is a web server that allows users to analyze source code through a REST API. This service is useful for anyone looking to perform code analysis over HTTP.
The server can be run on any host and port, and supports the following main functionalities:
- Remove Comments from source code.
- Retrieve Function Spans for given code.
- Compute Metrics for the provided source code.
Running the Server
To run the server, you can use the following command:
rust-code-analysis-web --host 127.0.0.1 --port 9090
- `--host`: specifies the IP address the server binds to (default: 127.0.0.1).
- `--port`: specifies the port to be used (default: 8080).
- `-j`: specifies the number of parallel jobs (optional).
Endpoints
1. Ping the Server
Use this endpoint to check if the server is running.
Request:
GET http://127.0.0.1:8080/ping
Response:
- Status Code:
200 OK
- Body:
{
"message": "pong"
}
2. Remove Comments
This endpoint removes comments from the provided source code.
Request:
POST http://127.0.0.1:8080/comments
Payload:
{
"id": "unique-id",
"file_name": "filename.ext",
"code": "source code with comments"
}
- `id`: A unique identifier for the request.
- `file_name`: The name of the file being analyzed.
- `code`: The source code with comments.
Response:
{
"id": "unique-id",
"code": "source code without comments"
}
3. Retrieve Function Spans
This endpoint retrieves the spans of functions in the provided source code.
Request:
POST http://127.0.0.1:8080/functions
Payload:
{
"id": "unique-id",
"file_name": "filename.ext",
"code": "source code with functions"
}
- `id`: A unique identifier for the request.
- `file_name`: The name of the file being analyzed.
- `code`: The source code containing the functions.
Response:
{
"id": "unique-id",
"spans": [
{
"name": "function_name",
"start_line": 1,
"end_line": 10
}
]
}
4. Compute Metrics
This endpoint computes various metrics for the provided source code.
Request:
POST http://127.0.0.1:8080/metrics
Payload:
{
"id": "unique-id",
"file_name": "filename.ext",
"code": "source code for metrics"
"unit": false
}
- `id`: A unique identifier for the request.
- `file_name`: The filename of the source code file.
- `code`: The source code to analyze.
- `unit`: A boolean value: `true` to compute only top-level metrics, `false` for detailed metrics across all units (functions, classes, etc.).
Response:
{
"id": "unique-id",
"language": "Rust",
"spaces": {
"metrics": {
"cyclomatic_complexity": 5,
"lines_of_code": 100,
"function_count": 10
}
}
}
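As an illustrative sketch of composing such a request from the shell (the endpoint and field names come from the documentation above; the id, file name, code, host, and port are made up):

```shell
# Build the JSON payload for POST /metrics (field names as documented above;
# the id, file name, and code are illustrative).
payload='{"id":"req-1","file_name":"main.rs","code":"fn main() {}","unit":false}'

# Locally sanity-check that the payload is well-formed JSON.
echo "$payload" | python3 -m json.tool > /dev/null && echo "payload ok"

# With rust-code-analysis-web running (e.g. on 127.0.0.1:8080), send it:
# curl -s -X POST http://127.0.0.1:8080/metrics \
#      -H "Content-Type: application/json" -d "$payload"
```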
Developers Guide
If you want to contribute to the development of rust-code-analysis, we have summarized here a series of guidelines intended to help you with the build process.
As a prerequisite, you need to install the latest available version of Rust. You can learn how to do that here.
Clone Repository
First of all, you need to clone the repository. You can do that:
through HTTPS
git clone -j8 https://github.com/mozilla/rust-code-analysis.git
or through SSH
git clone -j8 git@github.com:mozilla/rust-code-analysis.git
Building
To build the `rust-code-analysis` library, run the following command:
cargo build
If you want to build the `cli`:
cargo build -p rust-code-analysis-cli
If you want to build the `web` server:
cargo build -p rust-code-analysis-web
If you want to build everything in one fell swoop:
cargo build --workspace
Testing
After you have finished changing the code, you should always verify that all tests pass using the `cargo test` command.
cargo test --workspace --all-features --verbose
Code Formatting
If all previous steps went well, and you want to make a pull request to integrate your invaluable help in the codebase, the last step left is code formatting.
Rustfmt
This tool formats your code according to Rust style guidelines.
To install:
rustup component add rustfmt
To format the code:
cargo fmt
Clippy
This tool helps developers write better code by automatically catching many common mistakes. It detects errors and warnings in your code that must be fixed before making a pull request.
To install:
rustup component add clippy
To detect errors and warnings:
cargo clippy --workspace --all-targets --
Code Documentation
If you have documented your code, to generate the final documentation, run this command:
cargo doc --open --no-deps
Remove the `--no-deps` option if you also want to build the documentation of each dependency used by rust-code-analysis.
Run your code
You can run rust-code-analysis-cli using:
cargo run -p rust-code-analysis-cli -- [rust-code-analysis-cli-parameters]
To know the list of rust-code-analysis-cli parameters, run:
cargo run -p rust-code-analysis-cli -- --help
You can run rust-code-analysis-web using:
cargo run -p rust-code-analysis-web -- [rust-code-analysis-web-parameters]
To know the list of rust-code-analysis-web parameters, run:
cargo run -p rust-code-analysis-web -- --help
Practical advice
- When you add a new feature, add at least one unit or integration test to verify that everything works correctly
- Document public API
- Do not add dead code
- Comment intricate code so that others can understand what you have done
Supporting a new language
This section helps developers implement support for a new language in `rust-code-analysis`.
To implement a new language, two steps are required:
- Generate the grammar
- Add the grammar to `rust-code-analysis`
A number of metrics are supported, and help with implementing them is covered elsewhere in the documentation.
Generating the grammar
As a prerequisite for adding a new grammar, a tree-sitter grammar must exist for the desired language, compatible with the tree-sitter version used in this project.
The grammars are generated by a project in this repository called `enums`. The following steps add support for the new language and generate an enum file that is then used as the grammar in this project to evaluate metrics.
- Add the language-specific `tree-sitter` crate to the `enums` crate, making sure to tie it to the `tree-sitter` version used in the `rust-code-analysis` crate. For example, for the Rust support, at the time of writing the following line exists in /enums/Cargo.toml: `tree-sitter-rust = "version number"`.
- Append the language to the `enums` crate in /enums/src/languages.rs. Keeping with Rust as the example, the line would be `(Rust, tree_sitter_rust)`. The first parameter is the name of the Rust enum that will be generated; the second is the `tree-sitter` function to call to get the language's grammar.
- Add a case to the end of the match in the `mk_get_language` macro rule in /enums/src/macros.rs, e.g. for Rust: `Lang::Rust => tree_sitter_rust::language()`.
- Lastly, execute the /recreate-grammars.sh script, which runs the `enums` crate to generate the grammar for the new language.
At this point we should have a new grammar file for the new language in /src/languages/. See /src/languages/language_rust.rs as an example of the generated enum.
Adding the new grammar to rust-code-analysis
- Add the language-specific `tree-sitter` crate to the `rust-code-analysis` project, making sure to tie it to the `tree-sitter` version used in this project. For example, for the Rust support, at the time of writing the following line exists in the Cargo.toml: `tree-sitter-rust = "0.19.0"`.
- Next, add the new `tree-sitter` language namespace to /src/languages/mod.rs.
- Lastly, add a definition of the language to the arguments of the `mk_langs!` macro in /src/langs.rs.
Lines of Code (LoC)
In this document we give some guidance on how to implement the LoC metrics available in this crate. Lines of code (LoC) is a software metric that gives an indication of the size of source code by counting its lines. There are several variants of LoC, which we first explain by way of an example.
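The original example file is not reproduced in this extract; the following Rust sketch is an illustrative stand-in chosen to be consistent with the metric values quoted below (6 comment lines, 2 blank lines, 3 physical code lines, and a single statement calling `product`):

```rust
// These six comment lines illustrate CLOC:
// every comment line in the file is counted,
// regardless of whether it is a single-line,
// block, or doc comment.
// Together with the 2 blank lines and the 3 code
// lines below, the file totals SLOC = 11.

fn main() {

    let _product: u32 = (1..=5).product();
}
```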
Types of LoC
The example above will be used to illustrate each of the LoC metrics described below.
SLOC
A straight count of all lines in the file including code, comments, and blank lines.
METRIC VALUE: 11
PLOC
A count of the instruction lines of code contained in the source code. This would include any brackets or similar syntax on a new line.
Note that comments and blank lines are not counted in this.
METRIC VALUE: 3
LLOC
The "logical" lines is a count of the number of statements in the code. Note that what a statement is depends on the language.
In the above example there is only a single statement: the call of `product` with the `Iterator` as its argument.
METRIC VALUE: 1
CLOC
A count of the comment lines in the code. The type of comment does not matter, i.e. single-line, block, and doc comments all count.
METRIC VALUE: 6
BLANK
Last but not least, this metric counts the blank lines present in the code.
METRIC VALUE: 2
Implementation
To implement the LoC-related metrics described above, you need to implement the `Loc` trait for the language you want to support. This requires implementing the `compute` function.
See /src/metrics/loc.rs for where to implement, as well as examples from other languages.
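As an informal illustration of the kind of counting a `compute` implementation performs, here is a self-contained sketch of line-classification logic (a hypothetical helper working on raw text, not the crate's actual `Loc` trait, which operates on AST nodes):

```rust
// Classify each line of a source string into the simplest LoC buckets.
// Returns (sloc, cloc, blank); hypothetical helper for illustration only.
fn count_loc(source: &str) -> (usize, usize, usize) {
    let mut sloc = 0;
    let mut cloc = 0;
    let mut blank = 0;
    for line in source.lines() {
        sloc += 1; // SLOC: every line counts
        let trimmed = line.trim();
        if trimmed.is_empty() {
            blank += 1; // BLANK: whitespace-only lines
        } else if trimmed.starts_with("//") {
            cloc += 1; // CLOC: comment lines (line comments only, here)
        }
    }
    (sloc, cloc, blank)
}

fn main() {
    let src = "// a comment\n\nfn main() {}\n";
    let (sloc, cloc, blank) = count_loc(src);
    println!("{} {} {}", sloc, cloc, blank); // prints 3 1 1
}
```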
Update grammars
Each programming language needs to be parsed in order to extract its syntax and semantics, which are defined by the so-called grammar of the language.
In `rust-code-analysis`, we use tree-sitter as our parsing library, since it provides a distinct grammar for each of our supported programming languages. A grammar is not a static monolith: it changes over time and can be affected by bugs, hence it is necessary to update it every now and then.
As of now, since we use `bash` scripts to automate the operations, grammars can be updated natively only on Linux and macOS systems, but these scripts can also run on Windows using WSL.
In `rust-code-analysis` we use both third-party and internal grammars. The former are published on crates.io and maintained by external developers, while the latter have been designed and defined inside the project to manage variants of some languages used in Firefox.
We are going to explain how to update both of them in the following sections.
Third-party grammars
Update the grammar version in `Cargo.toml` and `enums/Cargo.toml`. Below is an example for the `tree-sitter-java` grammar:
tree-sitter-java = "x.xx.x"
where `x` represents a digit.
Run `./recreate-grammars.sh` to recreate and refresh all grammar structures and data:
./recreate-grammars.sh
Once the script above has finished its execution, fix any failing tests and problems introduced by the grammar changes.
Commit your changes and create a new pull request.
Internal grammars
Update the version of `tree-sitter-cli` in the `package.json` file of the internal grammar and then install the updated version.
Update the dependency version field in `Cargo.toml` and `enums/Cargo.toml`. Below is an example for the `tree-sitter-ccomment` grammar:
tree-sitter-ccomment = { path = "./tree-sitter-ccomment", version = "=x.xx.x" }
where `x` represents a digit.
Open the `Cargo.toml` file of the chosen grammar and:
- Set its version to the same value present in the main `Cargo.toml` file
- Increase the `tree-sitter` version to the most recent one
Run the appropriate script to update the grammar by recreating and refreshing every file and script.
For `tree-sitter-ccomment` and `tree-sitter-preproc`, run `./generate-grammars/generate-grammar.sh` followed by the name of the grammar. Below is an example using the `tree-sitter-ccomment` grammar:
./generate-grammars/generate-grammar.sh tree-sitter-ccomment
Instead, for `tree-sitter-mozcpp` and `tree-sitter-mozjs`, use their specific scripts.
For `tree-sitter-mozcpp`, run:
./generate-grammars/generate-mozcpp.sh
For `tree-sitter-mozjs`, run:
./generate-grammars/generate-mozjs.sh
Once the script above has finished its execution, fix any failing tests and problems introduced by the grammar changes.
Commit your changes and create a new pull request.