rust-code-analysis
rust-code-analysis is a Rust library that analyzes and extracts information from source code written in many different programming languages. It is built on Tree Sitter, a parser generator tool and incremental parsing library.
You can find the source code of this software on GitHub, while issues and feature requests can be posted on the respective GitHub Issue Tracker.
Supported platforms
rust-code-analysis can run on the most common platforms: Linux, macOS, and Windows.
On our GitHub Release Page you can find precompiled, packaged binaries for Linux and Windows.
API docs
If you prefer to use rust-code-analysis as a crate, you can find the API docs generated by Rustdoc here.
How to cite rust-code-analysis
@article{ARDITO2020100635,
title = {rust-code-analysis: A Rust library to analyze and extract maintainability information from source codes},
journal = {SoftwareX},
volume = {12},
pages = {100635},
year = {2020},
issn = {2352-7110},
doi = {https://doi.org/10.1016/j.softx.2020.100635},
url = {https://www.sciencedirect.com/science/article/pii/S2352711020303484},
author = {Luca Ardito and Luca Barbato and Marco Castelluccio and Riccardo Coppola and Calixte Denizet and Sylvestre Ledru and Michele Valsesia},
keywords = {Algorithm, Software metrics, Software maintainability, Software quality},
abstract = {The literature proposes many software metrics for evaluating the source code non-functional properties, such as its complexity and maintainability. The literature also proposes several tools to compute those properties on source codes developed with many different software languages. However, the Rust language emergence has not been paired by the community’s effort in developing parsers and tools able to compute metrics for the Rust source code. Also, metrics tools often fall short in providing immediate means of comparing maintainability metrics between different algorithms or coding languages. We hence introduce rust-code-analysis, a Rust library that allows the extraction of a set of eleven maintainability metrics for ten different languages, including Rust. rust-code-analysis, through the Abstract Syntax Tree (AST) of a source file, allows the inspection of the code structure, analyzing source code metrics at different levels of granularity, and finding code syntax errors before compiling time. The tool also offers a command-line interface that allows exporting the results in different formats. The possibility of analyzing source codes written in different programming languages enables simple and systematic comparisons between the metrics produced from different empirical and large-scale analysis sources.}
}
License
- Mozilla-defined grammars are released under the MIT license.
- rust-code-analysis, rust-code-analysis-cli and rust-code-analysis-web are released under the Mozilla Public License v2.0.
Supported Languages
This is the list of programming languages parsed by rust-code-analysis.
- [x] C++
- [ ] C#
- [ ] CSS
- [ ] Go
- [ ] HTML
- [ ] Java
- [x] JavaScript
- [x] The JavaScript used in Firefox internals
- [x] Python
- [x] Rust
- [x] TypeScript
A check indicates which languages have metrics implemented.
Supported Metrics
rust-code-analysis implements a series of metrics:
- CC: it calculates the code complexity examining the control flow of a program.
- SLOC: it counts the number of lines in a source file.
- PLOC: it counts the number of physical lines (instructions) contained in a source file.
- LLOC: it counts the number of logical lines (statements) contained in a source file.
- CLOC: it counts the number of comments in a source file.
- BLANK: it counts the number of blank lines in a source file.
- HALSTEAD: it is a suite that provides a series of information, such as the effort required to maintain the analyzed code, the size in bits to store the program, the difficulty to understand the code, an estimate of the number of bugs present in the codebase, and an estimate of the time needed to implement the software.
- MI: it is a suite that evaluates the maintainability of a piece of software.
- NOM: it counts the number of functions and closures in a file/trait/class.
- NEXITS: it counts the number of possible exit points from a method/function.
- NARGS: it counts the number of arguments of a function/method.
The metrics above are NOT yet implemented for the C#, CSS, Go, HTML, and Java languages.
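To make the HALSTEAD entry more concrete, here is a minimal sketch of the classical textbook Halstead formulas. The struct and method names are hypothetical, and the formulas (in particular the bug estimate, for which several variants exist) are the commonly cited definitions, not necessarily the exact ones used by rust-code-analysis.

```rust
/// Raw operator/operand counts for a piece of code.
/// Hypothetical type for illustration; not part of the rust-code-analysis API.
pub struct HalsteadCounts {
    pub distinct_operators: f64, // n1
    pub distinct_operands: f64,  // n2
    pub total_operators: f64,    // N1
    pub total_operands: f64,     // N2
}

impl HalsteadCounts {
    /// Vocabulary: n = n1 + n2
    pub fn vocabulary(&self) -> f64 {
        self.distinct_operators + self.distinct_operands
    }

    /// Length: N = N1 + N2
    pub fn length(&self) -> f64 {
        self.total_operators + self.total_operands
    }

    /// Volume (size in bits): V = N * log2(n)
    pub fn volume(&self) -> f64 {
        self.length() * self.vocabulary().log2()
    }

    /// Difficulty: D = (n1 / 2) * (N2 / n2)
    pub fn difficulty(&self) -> f64 {
        (self.distinct_operators / 2.0) * (self.total_operands / self.distinct_operands)
    }

    /// Effort: E = D * V
    pub fn effort(&self) -> f64 {
        self.difficulty() * self.volume()
    }

    /// Estimated implementation time in seconds: T = E / 18
    pub fn time(&self) -> f64 {
        self.effort() / 18.0
    }

    /// One common estimate of delivered bugs: B = V / 3000
    pub fn bugs(&self) -> f64 {
        self.volume() / 3000.0
    }
}
```

For instance, with 2 distinct operators, 2 distinct operands, and 4 total occurrences of each, the vocabulary is 4 and the length is 8, so the volume is 8 * log2(4) = 16, the difficulty is 2, and the effort is 32.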
Commands
We use the term command for any procedure that rust-code-analysis-cli uses to extract information from source code. Each command may take parameters, depending on the task it needs to carry out. On this page we have grouped the principal types of commands implemented in rust-code-analysis-cli.
Metrics
Metrics are a series of measures that can be used to:
- Compare different programming languages
- Provide information on the quality of the code
- Tell developers where their code is harder to handle
- Discover errors earlier
rust-code-analysis calculates the metrics starting from the source code of a program. Metrics of this kind are called static metrics.
Nodes
To represent the structure of program code, rust-code-analysis-cli builds an Abstract Syntax Tree (AST). A node is an element of this tree and denotes any syntactic construct present in a language.
Nodes can be used to:
- Create the syntactic structure of a source file
- Discover if a construct of a language is present in the analyzed code
- Count the number of constructs of a certain kind
- Detect errors in the source code
REST API
rust-code-analysis-cli can be run as a server which accepts requests sent through a REST API.
The server receives as input the filename of a source code file and returns the corresponding metrics formatted as json.
Metrics
Metrics can be printed on screen or exported as different output formats through rust-code-analysis-cli.
Print metrics
For each function space, rust-code-analysis computes the list of metrics described above. At the end of this process, rust-code-analysis-cli prints the formatted result on the screen. The command used to print the metrics is the following:
rust-code-analysis-cli -m -p /path/to/your/file/or/directory
The -p option represents the path to a file or a directory. If a directory is passed as input, rust-code-analysis-cli computes the metrics for each file contained in it.
Export formats
Different output formats can be used to export metrics:
- Cbor
- Json
- Toml
- Yaml
Json and Toml can also be exported pretty-printed.
Export command
For example, if you want to export metrics as a json file, run:
rust-code-analysis-cli -m -O json -o /output/path -p /path/to/your/file/or/directory
The -O option allows you to choose the output format. It supports only these values: cbor, json, toml, yaml.
The -o option specifies the path where your file will be saved. It accepts only paths. The filename of your output file is the same as your input file plus the extension associated with the format. When this option is not given, the output is printed to the shell.
As noted above, Json and Toml can be exported pretty-printed. To do so, use the --pr option.
In the case below, the pretty-printed json output is printed to the shell:
rust-code-analysis-cli -m -O json --pr -p /path/to/your/file/or/directory
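If you drive the export from another Rust program, the flags above can be assembled programmatically. The following is a minimal sketch that uses only the options described in this section (-m, -O, -o, --pr, -p); the function name `export_args` is hypothetical, and rejecting --pr for cbor and yaml is a design choice of this sketch based on the statement that only Json and Toml can be pretty-printed, not documented CLI behavior.

```rust
/// Builds the argument list for a metrics export.
/// Hypothetical helper; it only validates and orders the flags described above.
pub fn export_args(
    format: &str,
    output_dir: Option<&str>,
    pretty: bool,
    input: &str,
) -> Result<Vec<String>, String> {
    // The -O option supports only these values.
    const FORMATS: [&str; 4] = ["cbor", "json", "toml", "yaml"];
    if !FORMATS.contains(&format) {
        return Err(format!("unsupported output format: {}", format));
    }
    // Assumption of this sketch: refuse --pr for formats not listed as pretty-printable.
    if pretty && format != "json" && format != "toml" {
        return Err(format!("{} cannot be pretty-printed", format));
    }

    let mut args = vec!["-m".to_string(), "-O".to_string(), format.to_string()];
    if pretty {
        args.push("--pr".to_string());
    }
    // Without -o, the output goes to the shell.
    if let Some(dir) = output_dir {
        args.push("-o".to_string());
        args.push(dir.to_string());
    }
    args.push("-p".to_string());
    args.push(input.to_string());
    Ok(args)
}
```

The resulting vector can be passed to `std::process::Command::new("rust-code-analysis-cli").args(...)`, assuming the binary is available on your PATH.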
Nodes
rust-code-analysis-cli lets you extract information from the nodes that compose the Abstract Syntax Tree (AST) of a source file.
Find Errors
To check whether your code contains syntax errors, run:
rust-code-analysis-cli -p /path/to/your/file/or/directory -I "*.ext" -f -error
The -p option represents the path to a file or a directory. If a directory is passed as input, rust-code-analysis-cli analyzes each file contained in it.
The -I option is a glob filter used to consider only the files written in the language defined by the file extension.
The -f option searches for all nodes of a certain type. In the case above, we are looking for all the erroneous nodes present in the code.
Count Errors
It is also possible to count the number of nodes of a certain type using the --count option:
rust-code-analysis-cli -p /path/to/your/file/or/directory -I "*.ext" --count -error
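Conceptually, finding and counting nodes both come down to a tree traversal that filters on the node kind. The sketch below illustrates this on a toy tree; the `Node` type and the kind names are hypothetical and deliberately simpler than the tree-sitter nodes the tool actually works with.

```rust
/// Toy syntax-tree node. Hypothetical type for illustration only;
/// it is NOT the node type used by rust-code-analysis or tree-sitter.
pub struct Node {
    pub kind: &'static str,
    pub children: Vec<Node>,
}

impl Node {
    /// Creates a node without children.
    pub fn leaf(kind: &'static str) -> Self {
        Node { kind, children: Vec::new() }
    }

    /// Creates a node with the given children.
    pub fn branch(kind: &'static str, children: Vec<Node>) -> Self {
        Node { kind, children }
    }

    /// Returns every node of the given kind, in pre-order (the "find" case).
    pub fn find<'a>(&'a self, kind: &str) -> Vec<&'a Node> {
        let mut found = Vec::new();
        let mut stack = vec![self];
        while let Some(node) = stack.pop() {
            if node.kind == kind {
                found.push(node);
            }
            // Push children in reverse so they are visited left to right.
            stack.extend(node.children.iter().rev());
        }
        found
    }

    /// Counts the nodes of the given kind (the "count" case).
    pub fn count(&self, kind: &str) -> usize {
        self.find(kind).len()
    }
}
```

With a tree whose root holds a function node containing one error node plus a second error node directly under the root, `count("ERROR")` returns 2, while `find("ERROR")` returns references to those two nodes.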
Print AST
If you want to print the AST of a source code, run the following command:
rust-code-analysis-cli -p /path/to/your/file/or/directory -d
The -d option prints the entire AST to the shell.
Code Splitting
Commands can be run on a single portion of the code using the --ls and --le options. The former represents the starting line of the code to be considered, while the latter represents its ending line.
For example, if we want to print the AST of a single function which starts at line 5 and ends at line 10, we need to launch this command:
rust-code-analysis-cli -p /path/to/your/file/or/directory -d --ls 5 --le 10
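As a rough mental model, --ls and --le select a window of lines from the source file. The sketch below shows such a selection, assuming that line numbers are 1-based and that both ends are inclusive (which matches the example of a function spanning lines 5 to 10, but is an assumption about the tool). The function name `select_lines` is hypothetical.

```rust
/// Returns the lines `start..=end` of `source`.
/// Assumption: lines are numbered from 1 and both bounds are inclusive.
pub fn select_lines(source: &str, start: usize, end: usize) -> Vec<&str> {
    // An empty or inverted range selects nothing.
    if start == 0 || end < start {
        return Vec::new();
    }
    source
        .lines()
        .skip(start - 1)
        .take(end - start + 1)
        .collect()
}
```

A range that extends past the end of the file simply yields the lines that exist.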
REST API
It is possible to run rust-code-analysis-cli as an HTTP service that uses a REST API to share data between client and server.
We will use port 9090 to show you the possible ways to interact with the server.
Server
rust-code-analysis-cli can act as a server running on your localhost at a specific port.
rust-code-analysis-cli --serve --port 9090
The --port option sets the port used by the server. One possible value could be 9090.
Ping
If you want to ping the server, make a GET request to this URL:
http://127.0.0.1:9090/ping
Metrics
To get metrics formatted as a json file, make a POST request to this URL:
http://127.0.0.1:9090/metrics?file_name={filename}&unit={unit}
The filename parameter represents the path to the source file to be analyzed, while unit is a boolean value that can only be 0 or 1. A value of 1 tells rust-code-analysis-cli to consider only top-level metrics, while 0 returns detailed metrics for all classes, functions, nested functions, and other sub-spaces.
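When calling the server from code, the two URLs above can be built with a small helper. This is a minimal sketch using only the standard library; the function names are hypothetical, and percent-encoding the file name is an assumption based on general URL rules rather than on documented server behavior. The sketch only builds the URLs and does not send any request.

```rust
/// Percent-encodes every byte except the RFC 3986 unreserved characters.
fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// URL of the ping endpoint (to be used with a GET request).
pub fn ping_url(host: &str, port: u16) -> String {
    format!("http://{}:{}/ping", host, port)
}

/// URL of the metrics endpoint (to be used with a POST request).
/// `unit = true` maps to `unit=1` (top-level metrics only),
/// `unit = false` maps to `unit=0` (detailed metrics for all sub-spaces).
pub fn metrics_url(host: &str, port: u16, file_name: &str, unit: bool) -> String {
    format!(
        "http://{}:{}/metrics?file_name={}&unit={}",
        host,
        port,
        percent_encode(file_name),
        if unit { 1 } else { 0 }
    )
}
```

Keeping URL construction separate from the HTTP client makes it easy to test and lets you plug in whichever client library you prefer.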
Developers Guide
If you want to contribute to the development of rust-code-analysis, we have summarized here a series of guidelines that should help you in your building process.
As a prerequisite, you need to install the latest available version of Rust. You can learn how to do that here.
Clone Repository
First of all, you need to clone the repository. You can do that:
through HTTPS
git clone -j8 https://github.com/mozilla/rust-code-analysis.git
or through SSH
git clone -j8 git@github.com:mozilla/rust-code-analysis.git
Building
To build the rust-code-analysis library, you need to run the following command:
cargo build
If you want to build the cli:
cargo build -p rust-code-analysis-cli
If you want to build the web server:
cargo build -p rust-code-analysis-web
If you want to build everything in one fell swoop:
cargo build --workspace
Testing
After you have finished changing the code, you should always verify whether all tests pass with the cargo test command.
cargo test --workspace --all-features --verbose
Code Formatting
If all previous steps went well, and you want to make a pull request to integrate your invaluable help in the codebase, the last step left is code formatting.
Rustfmt
This tool formats your code according to Rust style guidelines.
To install:
rustup component add rustfmt
To format the code:
cargo fmt
Clippy
This tool helps developers write better code by automatically catching many common mistakes. It reports a series of errors and warnings in your code that must be fixed before making a pull request.
To install:
rustup component add clippy
To detect errors and warnings:
cargo clippy --workspace --all-targets --
Code Documentation
If you have documented your code, to generate the final documentation, run this command:
cargo doc --open --no-deps
Remove the --no-deps option if you also want to build the documentation of each dependency used by rust-code-analysis.
Run your code
You can run rust-code-analysis-cli using:
cargo run -p rust-code-analysis-cli -- [rust-code-analysis-cli-parameters]
To know the list of rust-code-analysis-cli parameters, run:
cargo run -p rust-code-analysis-cli -- --help
You can run rust-code-analysis-web using:
cargo run -p rust-code-analysis-web -- [rust-code-analysis-web-parameters]
To know the list of rust-code-analysis-web parameters, run:
cargo run -p rust-code-analysis-web -- --help
Practical advice
- When you add a new feature, add at least one unit or integration test to verify that everything works correctly
- Document public API
- Do not add dead code
- Comment intricate code such that others can comprehend what you have accomplished
Supporting a new language
This section helps developers implement support for a new language in rust-code-analysis.
To implement a new language, two steps are required:
- Generate the grammar
- Add the grammar to rust-code-analysis
A number of metrics are supported; help with implementing them is covered elsewhere in the documentation.
Generating the grammar
As a prerequisite for adding a new grammar, there needs to exist a tree-sitter version for the desired language that matches the version used in this project.
The grammars are generated by a project in this repository called enums. The following steps add the language support from the language crate and generate an enum file that is then used as the grammar in this project to evaluate metrics.
- Add the language specific tree-sitter crate to the enum crate, making sure to tie it to the tree-sitter version used in the rust-code-analysis crate. For example, for the Rust support at time of writing the following line exists in the /enums/Cargo.toml: tree-sitter-rust = "version number".
- Append the language to the enum crate in /enums/src/languages.rs. Keeping with Rust as the example, the line would be (Rust, tree_sitter_rust). The first parameter is the name of the Rust enum that will be generated, the second is the tree-sitter function to call to get the language's grammar.
- Add a case to the end of the match in the mk_get_language macro rule in /enums/src/macros.rs, e.g. for Rust: Lang::Rust => tree_sitter_rust::language().
- Lastly, we execute the /recreate-grammars.sh script that runs the enums crate to generate the grammar for the new language.
At this point we should have a new grammar file for the new language in /src/languages/. See /src/languages/language_rust.rs as an example of the generated enum.
Adding the new grammar to rust-code-analysis
- Add the language specific tree-sitter crate to the rust-code-analysis project, making sure to tie it to the tree-sitter version used in this project. For example, for the Rust support at time of writing the following line exists in the Cargo.toml: tree-sitter-rust = "0.19.0".
- Next we add the new tree-sitter language namespace to /src/languages/mod.rs, e.g.
pub mod language_rust;
pub use language_rust::*;
- Lastly, we add a definition of the language to the arguments of the mk_langs! macro in /src/langs.rs.
// 1) Name for enum
// 2) Language description
// 3) Display name
// 4) Empty struct name to implement
// 5) Parser name
// 6) tree-sitter function to call to get a Language
// 7) file extensions
// 8) emacs modes
(
    Rust,
    "The `Rust` language",
    "rust",
    RustCode,
    RustParser,
    tree_sitter_rust,
    [rs],
    ["rust"]
)
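To give an idea of how a tuple like the one above can be turned into code, here is a heavily simplified sketch of a declarative macro that generates an enum and two lookup functions from a list of entries. The macro name `mk_langs_sketch`, the generated `Lang` enum, and its methods are all hypothetical; the real mk_langs! macro takes more fields and generates considerably more, so treat this as an illustration of the technique only.

```rust
// Hypothetical, reduced stand-in for `mk_langs!`: each entry carries only
// an enum variant name, a display name, and a list of file extensions.
macro_rules! mk_langs_sketch {
    ( $( ( $name:ident, $display:expr, [ $( $ext:ident ),* ] ) ),* $(,)? ) => {
        /// One variant per language entry.
        #[derive(Debug, Clone, Copy, PartialEq, Eq)]
        pub enum Lang {
            $( $name, )*
        }

        impl Lang {
            /// Display name given in the entry.
            pub fn display_name(self) -> &'static str {
                match self {
                    $( Lang::$name => $display, )*
                }
            }

            /// Finds the language associated with a file extension, if any.
            pub fn from_extension(ext: &str) -> Option<Lang> {
                $(
                    // Each extension identifier becomes a string literal.
                    let known: &[&str] = &[ $( stringify!($ext) ),* ];
                    if known.contains(&ext) {
                        return Some(Lang::$name);
                    }
                )*
                None
            }
        }
    };
}

// Illustrative entries; the extension lists here are examples only.
mk_langs_sketch!(
    (Rust, "rust", [rs]),
    (Python, "python", [py]),
);
```

With these entries, `Lang::from_extension("rs")` yields `Some(Lang::Rust)` and an unknown extension yields `None`. Generating everything from a single list keeps the enum and its lookup tables in sync whenever a language is added.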
Lines of Code (LoC)
In this document we give some guidance on how to implement the LoC metrics available in this crate. Lines of code is a software metric that gives an indication of the size of some source code by counting the lines of the source code. There are many types of LoC so we will first explain those by way of an example.
Types of LoC
/*
Instruction: Implement factorial function
For extra credits, do not use mutable state or an imperative loop like `for` or `while`.
*/

/// Factorial: n! = n*(n-1)*(n-2)*(n-3)...3*2*1
fn factorial(num: u64) -> u64 {

    // use `product` on `Iterator`
    (1..=num).product()
}
The example above will be used to illustrate each of the LoC metrics described below.
SLOC
A straight count of all lines in the file including code, comments, and blank lines.
METRIC VALUE: 11
PLOC
A count of the instruction lines of code contained in the source code. This would include any brackets or similar syntax on a new line.
Note that comments and blank lines are not counted in this.
METRIC VALUE: 3
LLOC
The "logical" lines metric is a count of the number of statements in the code. Note that what a statement is depends on the language.
In the above example there is only a single statement, which is the function call of product with the Iterator as its argument.
METRIC VALUE: 1
CLOC
A count of the comment lines in the code. The type of comment does not matter, i.e. single line, block, or doc.
METRIC VALUE: 6
BLANK
Last but not least, this metric counts the blank lines present in the code.
METRIC VALUE: 2
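The line-based metrics above can be approximated with a simple line classifier. The sketch below is a naive, text-only approximation for Rust-like comment syntax; the names `LocCounts`, `count_lines`, and `FACTORIAL_EXAMPLE` are hypothetical. It is not how rust-code-analysis computes these metrics (the library works on the syntax tree), and it does not attempt LLOC, which requires knowing what a statement is. The embedded example is laid out so that it matches the metric values given in this section.

```rust
/// Line counts produced by the naive classifier below.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct LocCounts {
    pub sloc: usize,
    pub ploc: usize,
    pub cloc: usize,
    pub blank: usize,
}

/// Classifies each line as comment, blank, or code.
/// Limitations: comments trailing code on the same line are not detected,
/// and comment markers inside string literals are misread.
pub fn count_lines(source: &str) -> LocCounts {
    let mut counts = LocCounts::default();
    let mut in_block_comment = false;
    for line in source.lines() {
        counts.sloc += 1;
        let trimmed = line.trim();
        if in_block_comment {
            // Everything inside a block comment counts as a comment line.
            counts.cloc += 1;
            if trimmed.contains("*/") {
                in_block_comment = false;
            }
        } else if trimmed.is_empty() {
            counts.blank += 1;
        } else if trimmed.starts_with("//") {
            // Covers both `//` and `///` comments.
            counts.cloc += 1;
        } else if trimmed.starts_with("/*") {
            counts.cloc += 1;
            // Stay in block-comment mode unless it closes on the same line.
            in_block_comment = !trimmed[2..].contains("*/");
        } else {
            counts.ploc += 1;
        }
    }
    counts
}

/// The factorial example from this section, one string per line.
pub const FACTORIAL_EXAMPLE: &str = concat!(
    "/*\n",
    "Instruction: Implement factorial function\n",
    "For extra credits, do not use mutable state or an imperative loop like `for` or `while`.\n",
    "*/\n",
    "\n",
    "/// Factorial: n! = n*(n-1)*(n-2)*(n-3)...3*2*1\n",
    "fn factorial(num: u64) -> u64 {\n",
    "\n",
    "    // use `product` on `Iterator`\n",
    "    (1..=num).product()\n",
    "}\n",
);
```

Running `count_lines(FACTORIAL_EXAMPLE)` gives SLOC 11, PLOC 3, CLOC 6, and BLANK 2, in line with the values listed above.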
Implementation
To implement the LoC related metrics described above you need to implement the Loc trait for the language you want to support.
This requires implementing the compute function.
See /src/metrics/loc.rs for where to implement, as well as examples from other languages.