DeepScan is a repository scanner. It was developed to simplify analysing the open source components you intend to use in a project, both for their effective licenses and for copied code fragments.
What you may expect from it
DeepScan scans all text fragments inside the repository and identifies all (known) license information and, on request, copyright information. All findings are collected and presented to you for further analysis. The result screen shows a structural overview of the repository, with every file that contains findings highlighted. A summary helps you grasp the findings immediately, while filters and indicators help you navigate into the details. Deep links to the original source files let you jump directly to the files.
DeepScan conducts a similarity analysis of the identified license texts against the SPDX license data. Files are flagged depending on the degree of similarity, which makes it simple to identify modified license files.
In addition, you no longer have to rely on the declared license alone. Instead you gain insight into the effective licenses that apply when you use the component.
How it works
DeepScan takes a repository URL as input. Currently all sorts of Git repositories are supported. If you require support for a different repository type, let us know.
The URL is posted to the task dispatcher, which returns a scan ID. This ID is your ticket to the results, and the UI will display it. Internally the request is queued and processed later on. Use the ID to return and check for the results. You may also enter a mail address to receive an alert once your processing request has been executed. The results are packed and stored in S3. When you return and request the results, the UI retrieves them from S3 and displays them.
During processing the repository is cloned. DeepScan then assesses it file by file. All sorts of text files are analysed, including all comments in source code files. For a detailed list of the file types processed, refer to the CLI version provided on GitHub. There you will find the most recent list of supported files and programming languages.
Finally, all files found to contain relevant license data are compared with the SPDX license database using a similarity analysis. This makes it possible to determine the effective license. If a file contains additional paragraphs, the matching score drops. Based on our experience, we suggest assessing every license with a score below 85%.
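To illustrate why additional paragraphs reduce the score, here is a minimal sketch of a text similarity check. It is not DeepScan's actual algorithm, which is not described here: the use of Python's `difflib.SequenceMatcher`, the whitespace and case normalisation, and the function names are all assumptions made for this example. Only the 85% threshold is taken from the text above.

```python
from difflib import SequenceMatcher

# Threshold suggested in the text: scores below 85% deserve a closer look.
REVIEW_THRESHOLD = 0.85


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so layout changes do not affect the score."""
    return " ".join(text.lower().split())


def similarity(found: str, reference: str) -> float:
    """Return a ratio between 0.0 and 1.0; 1.0 means identical after normalisation."""
    matcher = SequenceMatcher(None, normalize(found), normalize(reference), autojunk=False)
    return matcher.ratio()


def needs_review(found: str, reference: str) -> bool:
    """Flag a license text whose similarity to the reference is below the threshold."""
    return similarity(found, reference) < REVIEW_THRESHOLD
```

A license file that only differs in line wrapping or capitalisation keeps a perfect score in this sketch, while a file that appends a paragraph of comparable length to the reference text falls below the threshold and is flagged.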
If you have requested the copyright analysis, DeepScan also searches for all sorts of copyright data and collects these findings. Depending on the license, this data might be relevant for further usage.
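As a rough idea of what collecting copyright data can look like, the following sketch picks out lines that carry a copyright marker followed by a year. The pattern and the function name are assumptions chosen for illustration. DeepScan's real detection rules are not documented here and are likely more thorough.

```python
import re

# Hypothetical pattern: a copyright marker followed somewhere by a four-digit year.
COPYRIGHT_PATTERN = re.compile(
    r"(?:copyright|\(c\)|©).*\b(?:19|20)\d{2}\b",
    re.IGNORECASE,
)


def find_copyright_lines(text: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for every line matching the pattern."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if COPYRIGHT_PATTERN.search(line)
    ]
```

Because the function works on plain text, the same call covers standalone notice files as well as comments inside source code.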
All files are hashed. DeepScan compares each hash with those of already scanned files and thus collects links between repositories. This is very useful when you work with a language like C or C++, where you want to identify copied files across repositories.
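The idea of linking repositories through file hashes can be sketched as follows. This is an illustration under assumptions, not DeepScan's implementation: the choice of SHA-256, the function names, and the decision to skip the `.git` directory are all made up for this example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_repository(root: Path) -> dict[str, list[Path]]:
    """Map each digest to the files below root that have exactly that content."""
    index = defaultdict(list)
    for path in sorted(root.rglob("*")):
        # Skip Git's internal object store; only the checked-out files matter here.
        if path.is_file() and ".git" not in path.relative_to(root).parts:
            index[file_digest(path)].append(path)
    return dict(index)


def shared_files(repo_a: Path, repo_b: Path) -> list[tuple[Path, Path]]:
    """Return (file_in_a, file_in_b) pairs whose contents are byte-identical."""
    index_a = index_repository(repo_a)
    index_b = index_repository(repo_b)
    return [
        (a, b)
        for digest in index_a.keys() & index_b.keys()
        for a in index_a[digest]
        for b in index_b[digest]
    ]
```

Note that a plain content hash only detects byte-identical copies. A file that was copied and then edited, even by a single character, produces a different digest and would not be linked by this sketch.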
Finally all findings are collected in a JSON file. Registered users will be able to use the UI to manipulate and download this file. This feature is currently not available in the public UI.
PLEASE NOTE: The public version of DeepScan cannot scan private repositories. To scan private repositories, please register an account with TrustSource. Free and trial plans are available.
After processing, all data is deleted (the container is killed).
Troubleshooting
In some cases issues arise while scanning. Here is some help with troubleshooting:
SCAN ID NOT FOUND
Please verify the ID you entered. It should be a 32-character alphanumeric string consisting of lowercase letters and digits. If it is shorter, part of the ID is missing. Without the complete scan ID we cannot find the results, and you will have to run the scan again.
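If you store or forward scan IDs in your own scripts, a quick format check can catch truncated IDs early. The sketch below encodes only the rule stated above (32 characters, lowercase letters and digits). The function name is hypothetical, and passing the check does not mean the ID exists on the server.

```python
import re

# 32 characters, lowercase letters and digits only, as described in the text.
SCAN_ID_PATTERN = re.compile(r"[a-z0-9]{32}")


def is_plausible_scan_id(value: str) -> bool:
    """Return True if the value has the documented shape of a scan ID."""
    # Strip surrounding whitespace that often sneaks in when copying and pasting.
    return SCAN_ID_PATTERN.fullmatch(value.strip()) is not None
```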
FAILED PROCESSING (HTTP STATUS CODE 404)
This is an indication that the given URL does not point to a valid git repository. Please verify that it points to a publicly accessible root URL of a git repository. If you continue to encounter access issues, please contact our support.
FAILED PROCESSING (HTTP STATUS CODE 403)
This is an indication that the given URL does not point to a public repository. Please verify that it points to a publicly accessible root URL of a git repository. If you continue to encounter access issues, please contact our support.
FAILED (invalid Content-Type: {txt/html,...}, ...)
This indicates that the given URL is not suitable for cloning the repository. Most likely it is a URL used to present the repository contents in a user interface. Please look for the URL that is actually required to clone the repository.
If you hand over the URL under which you view the repository in your browser, e.g. https://git-wip-us.apache.org/repos/asf?p=bookkeeper.git, you will get such a reply. The URL DeepScan requires in this case would be: https://gitbox.apache.org/repos/asf/bookkeeper.git