This presentation was given by Paolo Boldi, Milano University, online.
Abstract: The goal of the EU project FASTEN is to enable more sophisticated analysis of security-vulnerability propagation, licensing compliance, and dependency risk profiles (among others) by relying on the call-level dependency network of the whole software ecosystem. We outline the purpose and structure of the project, and present some preliminary results.
10. Sharing through software libraries
❖ One form of sharing is providing libraries
❖ Today, libraries are made available on the Internet
❖ on repositories (SourceForge, GitHub, Bitbucket, …)
❖ or forges (Maven, PyPI, CPAN, …)
❖ The Internet made the dream of collaborative development a reality
12. Industrial revolution at the harbour of software development
❖ All trades, arts, and handiworks have gained by division of labour, namely, when, instead of one man doing everything, each confines himself to a certain kind of work distinct from others in the treatment it requires, so as to be able to perform it with greater facility and in the greatest perfection. Where the different kinds of work are not distinguished and divided, where everyone is a jack-of-all-trades, there manufactures remain still in the greatest barbarism.
Immanuel Kant, Groundwork for the Metaphysics of Morals (1785)
25. Dependency graphs
❖ Library versions and their dependencies form (complex, huge) dependency networks
❖ Version constraints make these networks more complicated than simple graphs
❖ The package manager finally determines which version is chosen for each library
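To make the role of version constraints concrete, here is a minimal sketch of one possible resolution policy: pick the highest available version that satisfies every constraint placed on a library. The function names, the half-open range format, and the sample data are assumptions made for illustration; real package managers apply their own policies.

```python
def parse(version):
    """Turn a dotted version string such as '2.5' into a comparable tuple (2, 5)."""
    return tuple(int(part) for part in version.split("."))


def pick_version(available, constraints):
    """Return the highest available version inside every [low, high) range, or None."""
    candidates = [
        v for v in available
        if all(parse(low) <= parse(v) < parse(high) for low, high in constraints)
    ]
    return max(candidates, key=parse) if candidates else None


# Hypothetical data: two dependents constrain the same library differently.
available = ["2.0", "2.5", "3.0", "3.1"]
chosen = pick_version(available, [("2.0", "4.0"), ("2.5", "3.1")])
conflict = pick_version(available, [("2.0", "2.5"), ("3.0", "4.0")])
```

The same library node can therefore resolve to different versions, or to none, depending on which dependents are present, which is why the network is more than a simple graph.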
32. Recent dependency nightmares
❖ The left-pad incident (2016): millions of websites affected
❖ The Equifax breach (2017): cost $4B
35. Epidemics in dependency graphs
[Figure: package-level dependency graph over Lib A, vers 1.0; Lib B, vers 2.5; Lib C, vers 1.5; Lib D, vers 3.0]
A vulnerability alert is issued about Lib D, vers 3.0
All libraries in this graph are infected!
42. Epidemics in dependency graphs
[Figure: call-level dependency graph over functions A.f0, A.f2, A.f3; B.f1, B.f2, B.f3; C.f1, C.f2; D.f1, D.f2, D.f3]
A vulnerability alert is issued about Lib D, vers 3.0, function f3
Much more informative!
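The difference between the two views can be sketched as a reverse reachability search run once on package edges and once on call edges. The function names come from the slide, but the edges below are hypothetical, since the slide does not list them, and the helper names are assumptions.

```python
from collections import defaultdict, deque


def reverse_reachable(edges, targets):
    """Return the targets plus every node that can reach one of them along edges."""
    callers = defaultdict(set)
    for source, destination in edges:
        callers[destination].add(source)
    seen = set(targets)
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        for caller in callers[node]:
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen


def package_of(function):
    """Map a function name such as 'D.f3' to its library name 'D'."""
    return function.split(".")[0]


# Hypothetical call edges (caller, callee); not taken from the slide.
call_edges = [
    ("A.f0", "B.f1"), ("A.f2", "C.f1"),
    ("B.f1", "D.f1"), ("B.f3", "D.f3"), ("C.f1", "D.f2"),
]
# Package-level view: one edge per pair of libraries connected by any call.
package_edges = {(package_of(s), package_of(d)) for s, d in call_edges}

package_level = reverse_reachable(package_edges, {"D"})
call_level = {package_of(f) for f in reverse_reachable(call_edges, {"D.f3"})}
```

With these assumed edges, the package-level search flags all four libraries, while the call-level search flags only D itself and B, the one library containing a function that reaches D.f3.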
47. Examples
❖ Fully precise change impact analysis: “How many libraries are affected if I remove/modify a certain method/interface?”
❖ Fully precise license compliance: “Is my library compliant with the licenses of the libraries that I depend on (directly or indirectly)? (e.g., am I linking any GPL code?)”
❖ Fully precise risk profiling: “Does this vulnerability affect my code?”
❖ Centrality analysis: “What methods/functions are more central within a given ecosystem? Are there bottlenecks? Critical points?”
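As an illustration of the license question at call level, the sketch below collects the licenses of only those libraries whose functions are actually reachable from the application's entry points. All names, edges, and license assignments are hypothetical, and the sketch only illustrates the query; it does not decide what counts as compliance.

```python
from collections import defaultdict


def reachable(call_edges, roots):
    """Return every function reachable from the root functions via call edges."""
    callees = defaultdict(set)
    for caller, callee in call_edges:
        callees[caller].add(callee)
    seen = set(roots)
    stack = list(seen)
    while stack:
        for callee in callees[stack.pop()]:
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen


def licenses_in_use(call_edges, roots, license_of):
    """Map each library with at least one reachable function to its license."""
    packages = {f.split(".")[0] for f in reachable(call_edges, roots)}
    return {p: license_of[p] for p in packages if p in license_of}


# Hypothetical data: B depends on GPL-licensed G, but only through B.f3,
# which the application never calls.
call_edges = [("app.main", "B.f1"), ("B.f1", "D.f1"), ("B.f3", "G.f1")]
license_of = {"B": "MIT", "D": "Apache-2.0", "G": "GPL-3.0-only"}
used = licenses_in_use(call_edges, {"app.main"}, license_of)
```

A package-level check would report G as an indirect dependency; under these assumed edges the call-level check does not, because no call path from app.main reaches G.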
59. The FASTEN toolchain
[Figure: toolchain diagram. Labels: Project information, Security alerts, Repositories, publish (×3), Data stream, FASTEN server, Call-graph construction, Storage layer, Analysis layer, REST API, Web UI, Continuous integration server, Developer]
64. Universal function identifiers
How to uniquely reference a function in a global namespace?
scheme: fasten://
forge: /mvn
artifact: /org.slf4j.slf4j-api
version: /1.2.3
namespace: /org.slf4j.helpers
function: /BasicMarkerFactory.getDetachedMarker
argument(s): (%2Fjava.lang%2FString)
return type: %2Forg.slf4j%2FMarker
❖ Generic format + Java, Python, C
Done
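The argument and return-type components above are percent-encoded type references, where `/` becomes `%2F`. The following sketch reproduces that encoding with Python's standard `urllib.parse`. The helper names and the comma separator for multiple arguments are assumptions, not part of the FASTEN specification shown on the slide.

```python
from urllib.parse import quote, unquote


def encode_type(namespace, name):
    """Percent-encode a type reference such as /java.lang/String."""
    return quote(f"/{namespace}/{name}", safe="")


def encode_signature(function, argument_types, return_type):
    """Build the function(arguments)return-type part of an identifier."""
    # Assumption: multiple arguments are joined with a comma.
    arguments = ",".join(encode_type(ns, name) for ns, name in argument_types)
    return f"{function}({arguments}){encode_type(*return_type)}"


signature = encode_signature(
    "BasicMarkerFactory.getDetachedMarker",
    [("java.lang", "String")],
    ("org.slf4j", "Marker"),
)
decoded_return_type = unquote("%2Forg.slf4j%2FMarker")
```

Encoding the nested type references keeps their slashes from being read as separators between the identifier's own components.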
75. Call graph stitching
How to scale call graph processing to 10^6 package versions?
❖ Idea: Decouple package resolution from call graph generation
❖ Build and store call graphs per package version, incl.:
❖ unresolved calls
❖ class hierarchies (Java, Python)
❖ Call graph stitching: Resolve unresolved calls given a dependency tree
In progress
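The idea can be sketched as follows: each package version stores its own partial call graph once, and stitching only connects the stored unresolved calls to whichever package versions the resolver selected. The data model and names here are assumptions for illustration. The sketch matches calls by function name only and ignores class hierarchies and dynamic dispatch, which a real stitcher must handle.

```python
from dataclasses import dataclass, field


@dataclass
class PackageCallGraph:
    """Partial call graph stored once per package version (hypothetical model)."""
    name: str
    version: str
    functions: set
    internal_edges: set = field(default_factory=set)    # (caller, callee) inside the package
    unresolved_calls: set = field(default_factory=set)  # (caller, external callee name)


def stitch(graphs):
    """Connect unresolved calls to functions defined by the resolved package versions."""
    def node(graph, function):
        return f"{graph.name}@{graph.version}:{function}"

    defined = {f: node(g, f) for g in graphs for f in g.functions}
    edges = set()
    for g in graphs:
        for caller, callee in g.internal_edges:
            edges.add((node(g, caller), node(g, callee)))
        for caller, callee in g.unresolved_calls:
            if callee in defined:  # calls with no provider stay unresolved
                edges.add((node(g, caller), defined[callee]))
    return edges


# Hypothetical packages: A calls D.f3, which exists in D 3.0 but not in D 3.1.
a = PackageCallGraph("A", "1.0", {"A.f0", "A.f2"}, {("A.f0", "A.f2")}, {("A.f2", "D.f3")})
d30 = PackageCallGraph("D", "3.0", {"D.f3"})
d31 = PackageCallGraph("D", "3.1", {"D.f4"})
with_d30 = stitch([a, d30])
with_d31 = stitch([a, d31])
```

Because the per-package graphs are never rebuilt, a different dependency resolution only requires re-running the cheap stitching step.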
77. Examples of queries: largest packages (# of functions)
-- Count callables per package version and keep the ten largest
SELECT p.package_name, pv.version, COUNT(*)
FROM package_versions pv
JOIN packages p ON pv.package_id = p.id
JOIN modules m ON m.package_version_id = pv.id
JOIN callables c ON c.module_id = m.id
GROUP BY p.package_name, pv.version
ORDER BY COUNT(*) DESC
LIMIT 10;
78. Examples of queries: packages depending on a vulnerable package
-- List package versions whose declared guava dependency has '20.0' in its version_range
SELECT d.package_version_id, p.package_name, pv.version
FROM dependencies d
JOIN package_versions pv ON pv.id = d.package_version_id
JOIN packages p ON p.id = pv.package_id
WHERE d.dependency_id =
      (SELECT id
       FROM packages
       WHERE package_name = 'com.google.guava:guava')
  AND '20.0' = ANY(d.version_range);
80. Graph analytics (results shown refer to Java call graphs)
❖ Graphs stored using WebGraph (UMIL)
❖ For 1.1M graphs (2.3B nodes, 18B edges):
❖ 3.6 bits per edge, plus global ID storage for each node (9.0 bits per edge overall)
❖ DB size: 38GB → we can fit the whole of Maven in RAM
In progress
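As a quick check of what the quoted compression rates imply, the sketch below converts bits per edge into gigabytes for the 18B edges mentioned above. The function name is an assumption, and 1 GB is taken as 10^9 bytes. The 38GB DB size is the slide's own figure and is not derived here.

```python
def compressed_size_gb(edges, bits_per_edge):
    """Convert an edge count and a bits-per-edge rate into gigabytes (10^9 bytes)."""
    return edges * bits_per_edge / 8 / 1e9


EDGES = 18e9  # 18B edges, as quoted on the slide

structure_only_gb = compressed_size_gb(EDGES, 3.6)    # graph structure alone
with_global_ids_gb = compressed_size_gb(EDGES, 9.0)   # including global ID storage
```

Under these figures the graph structure alone takes about 8.1 GB, and about 20.25 GB once global IDs are included.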
83. Vulnerability Plugin
❖ Injecting vulnerability information at package and callable level
❖ Introducing a normalized Vulnerability Object definition across the different sources of information
❖ Continuously pulling updates for new information and storing the results
In progress
84. REST API
❖ Implementation of endpoints to expose canned queries from the metadata database
❖ In development:
❖ Full DB entity support
❖ Custom extension points
In progress
85. Analysis plug-ins
RAPID: Risk Analysis and Propagation Inspection for Security and Maintainability risks
❖ Plugin for code maintainability analysis
V1 deployed, processed 126K Maven coordinates to date
❖ Plugin for security vulnerability propagation
❖ User application to model and present risks
❖ Vulnerability data integration
❖ Clearly Defined, NVD, …
❖ Association at the function and package level
In progress
86. License and Compliance analysis
❖ QMSTR Plugin consists of 3 steps:
1. Build graph generation, consisting of information about all the generated artifacts that will be distributed together with the source code;
2. Execution of static analysis tools that augment the build graph with license and compliance metadata;
3. Generation of a report with the package's relevant license and authorship metadata, which is finally distributed
In progress
87. Use cases
❖ Endocode
❖ Integration of first version of license and compliance analyser
❖ Achieved the validation of `SPDX validator` use case
❖ SIG
❖ Integration of code quality analyser into FASTEN Server
❖ XWIKI
❖ Risk validation in the dependencies at Maven build time
❖ Risk validation in the installed extensions of an XWiki instance
❖ Filter out available compatible extensions for an XWiki instance
❖ Discoverability of XWiki components in available extensions
In progress
89. The future
Timeline: End 2020, Q1 2021, Q2 2021, Q3 2021, Q4 2021, Q1 2022 (FASTEN 2?)
Milestones, in order:
❖ REST API, first full version of knowledge base, CG enrichment, build graph integration, first public announcement
❖ Impact analysis, integration with MVN / PyPI; first external user
❖ Industrial use cases integrated; first external adoption
❖ Licensing and security fully integrated; data-driven API evolution
❖ Project finished; external integrations