FASTEN: Scaling static
analysis to ecosystems
Georgios Gousios | @gousiosg
TU Delft
Package dependency networks
• Dependencies on version ranges with
semantic versioning

• Online package repositories host all (?)
released package versions

• Package managers read dependency
descriptors and download libraries

• Transitive dependencies are
downloaded automatically
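To make the first bullet concrete, here is a minimal sketch of how a caret range (the default range style in npm and Cargo) picks a concrete version from whatever the repository has released. The helper names are made up for illustration, and pre-release tags are deliberately ignored.

```python
from typing import Iterable, Optional, Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    """Parse a plain MAJOR.MINOR.PATCH string (no pre-release tags)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def caret_upper_bound(base: Version) -> Version:
    """Exclusive upper bound of ^base: bump the leftmost non-zero part."""
    major, minor, patch = base
    if major > 0:
        return (major + 1, 0, 0)
    if minor > 0:
        return (0, minor + 1, 0)
    return (0, 0, patch + 1)


def resolve_caret(requirement: str, released: Iterable[str]) -> Optional[str]:
    """Return the highest released version that satisfies ^X.Y.Z, if any."""
    base = parse(requirement.lstrip("^"))
    upper = caret_upper_bound(base)
    matching = [v for v in released if base <= parse(v) < upper]
    return max(matching, key=parse) if matching else None


# The same descriptor resolves differently once a new release appears.
before = resolve_caret("^1.2.3", ["1.2.3", "1.4.0", "2.0.0"])
after = resolve_caret("^1.2.3", ["1.2.3", "1.4.0", "1.10.1", "2.0.0"])
```

Because the result depends on what has been released at resolution time, two installs of the same descriptor can end up with different code.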
Strongly connected component
of the Rust/Cargo packages (Kikas 2016)
Recent issues with PDNs
• leftpad
• Equifax

• eventstream
• rest-client
• …
Ecosystems grow at breakneck speeds...
• The average JavaScript project has 54 (Kikas et al. 2017), or 80 (Zimmermann et al. 2019)
transitive dependencies

• 50% of transitive dependency closures change within a period of 6
months on Cargo/Rust (Hejderup et al. 2019)

...and they deteriorate
• Packages exist in RubyGems whose removal can bring down 500k
(40%) other package versions (Kikas et al. 2017)

• 391 highly influential maintainers affect more than 10k packages (Zimmermann et al.
2019).
What research tells us
Developers don't update (Kula et al. 2017)
• 85% of the dependencies are outdated in 50% of important Maven
packages

• No updates even in the case of security disclosures (70% were unaware)

• "Too difficult!", "No tools!"

Vulnerabilities proliferate
• 1/4 of library downloads have a vulnerability (Comcast TR 2017)

• 1/3 of top 133k sites have a vulnerable dependency (Lauinger et al. 2017)
The developers’ perspective
• The observability problem: How can I know that one of
my dependencies is outdated?

• The update problem: How can I check if an updated
dependency breaks my code?

• The compliance problem: How do I know that I am not
violating anyone’s copyrights?

• The trust problem: How can I trust code I download from
the Internet with my valuable data?
The maintainers’ perspective
• The update problem: How can I update my library without
breaking clients? How can I notify important clients that I am
about to break them?
• The deprecation problem: How can I remove features from
my library?

• The unlawful use problem: How can I spot instances of my
code being distributed without permission?

• The lack of incentive problem: Why should I use my (free!)
time to maintain a library that large corporations depend
upon?
+ the problems that developers have!
State of the art practices
• Resolve dependencies and
store resolution in repo

• Protects against breakage due
to updates

• Also “protects” against fast
distribution of security
updates
https://www.publicdomainpictures.net/en/view-image.php?image=80963
Dependency version pinning
State of the art practices
• Lots of services (Dependabot,
GitHub, …) notify projects
when new dependency
versions are available
• Rife with false positives
• No help with impact
assessment
Monitoring services
The sorry state of the state
of the art
• Not much beyond simple package version matches (and
a bit of compliance)

• No support for assessing updates

• No support for making decisions on which libraries to use

• No support for maintainers
We can do better than that!
Getting to the root cause
State of the art tools analyze package relationships…
Package Dependency
Network (PDN)
…while actual reuse happens in the code
Call Dependency
Network (CDN)
Promises of Call-based
Dependency Networks
• More precise usage analysis

• Does this vulnerability affect my code?

• Am I linking to GPL code?

• More precise impact analysis

• How many clients will I break if I change this method?

• Can I safely update?

• Effectively, augmenting soundness with precision
RustPräzi: A CDN for Rust
• Call graphs for 70% of Cargo packages

• Very precise, but unsound (missing calls)

• Rust’s call graph (CG) generator is poor; we are building a new one

• A very promising prototype
http://fasten-project.eu
FASTEN in a nutshell
• Präzi for Java, C, Python and Rust, including integration with package managers

• Analyses on top of it:

• Can I safely update?

• Security vulnerability propagation

• Dependency risk profiling

• Compliance monitoring

• A centralised service to host the graphs and serve the analyses

• Getting the tools into the hands of developers
FASTEN: Scaling static analyses to ecosystem, presented at FOSDEM 2020 in Brussels
[Figure: FASTEN architecture. Components shown: package repositories (PyPI, Debian, Cargo); call graph generators; package builds; graph DB; metadata; project information; vulnerability information; call graph stitching; query API; analyses (security, compliance, change impact, quality and risk); REST API.]
All Kafka topics to be made public!
Check codefeedr.org soon!
Universal function identifiers
How to uniquely reference a function in a global namespace?
fasten://                                  scheme
/mvn                                       forge
/org.slf4j.slf4j-api                       artifact
/1.2.3                                     version
/org.slf4j.helpers                         namespace
/BasicMarkerFactory.getDetachedMarker      function
(%2Fjava.lang%2FString)                    argument(s)
%2Forg.slf4j%2FMarker                      return type
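The identifier parts above can be assembled programmatically. The sketch below is illustrative only: the function name, the way the parts are joined into one string, and the comma between multiple arguments are assumptions. Only the component order and the percent-encoding of type paths are taken from the slide.

```python
from typing import Sequence
from urllib.parse import quote


def encode_type(type_path: str) -> str:
    """Percent-encode a type path so its slashes cannot be read as URI separators."""
    return quote(type_path, safe="")


def function_id(forge: str, artifact: str, version: str, namespace: str,
                function: str, arg_types: Sequence[str], return_type: str) -> str:
    """Join the components in the order shown on the slide (joining is assumed)."""
    args = ",".join(encode_type(t) for t in arg_types)  # separator is an assumption
    return (f"fasten://{forge}/{artifact}/{version}/{namespace}/"
            f"{function}({args}){encode_type(return_type)}")


uri = function_id(
    forge="mvn",
    artifact="org.slf4j.slf4j-api",
    version="1.2.3",
    namespace="org.slf4j.helpers",
    function="BasicMarkerFactory.getDetachedMarker",
    arg_types=["/java.lang/String"],
    return_type="/org.slf4j/Marker",
)
```

Encoding the type paths keeps the identifier unambiguous: every unencoded slash separates two components, while slashes inside argument and return types travel as `%2F`.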
Callgraph stitching
• Idea: Decouple package resolution from
call graph generation

• Build and store call graphs per package
version, incl:

• unresolved calls

• class hierarchies (Java, Python)

• Callgraph stitching: Resolve unresolved
calls given a dependency tree
How to scale callgraph generation to 10^6 package versions?
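The stitching idea can be sketched as follows. This is a simplified model, not the FASTEN implementation: the `PartialCallGraph` type and `stitch` function are hypothetical, functions are matched by plain name, and class hierarchies (needed for dynamic dispatch in Java and Python) are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

PackageVersion = Tuple[str, str]      # (package name, version)
Node = Tuple[PackageVersion, str]     # a function inside one package version
Edge = Tuple[Node, Node]


@dataclass
class PartialCallGraph:
    """Built once per package version, independently of any resolution."""
    defines: Set[str]
    resolved: List[Tuple[str, str]] = field(default_factory=list)    # internal calls
    unresolved: List[Tuple[str, str]] = field(default_factory=list)  # (caller, external name)


def stitch(store: Dict[PackageVersion, PartialCallGraph],
           resolution: List[PackageVersion]) -> Tuple[List[Edge], List[Tuple[Node, str]]]:
    """Link unresolved calls to the package versions chosen by one resolution."""
    # Index which resolved package version provides each function name.
    providers: Dict[str, PackageVersion] = {}
    for pv in resolution:
        for fn in store[pv].defines:
            providers[fn] = pv

    edges: List[Edge] = []
    still_unresolved: List[Tuple[Node, str]] = []
    for pv in resolution:
        graph = store[pv]
        for caller, callee in graph.resolved:
            edges.append(((pv, caller), (pv, callee)))
        for caller, callee in graph.unresolved:
            target = providers.get(callee)
            if target is None:
                still_unresolved.append(((pv, caller), callee))
            else:
                edges.append(((pv, caller), (target, callee)))
    return edges, still_unresolved


store = {
    ("app", "1.0"): PartialCallGraph({"main"}, unresolved=[("main", "format"), ("main", "missing")]),
    ("lib", "2.0"): PartialCallGraph({"format", "helper"}, resolved=[("format", "helper")]),
    ("lib", "2.1"): PartialCallGraph({"format"}),
}
edges_new, leftover = stitch(store, [("app", "1.0"), ("lib", "2.1")])
edges_old, _ = stitch(store, [("app", "1.0"), ("lib", "2.0")])
```

The stored graphs are never regenerated: the same `store` serves both resolutions, and only the cheap linking step is repeated per dependency tree.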
Call graph info
{
  "product": "org.slf4j.slf4j-api",
  "version": "1.7.29",
  "forge": "mvn",
  "depset": […],
  "cha": {
    "/org.slf4j/LoggerFactory": {
      "methods": [
        "/org.slf4j/LoggerFactory.bind()%2Fjava.lang%2FVoid", …
      ], …
    }
  },
  "graph": [
    [
      "/org.slf4j.helpers/BasicMarker.contains(%2Fjava.lang%2FString)%2Fjava.lang%2FBoolean",
      "///java.util/Iterator.hasNext()%2Fjava.lang%2FBoolean"
    ]
  ],
  "timestamp": 1574072773
}
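A record of this shape can be consumed with a few lines of Python. The sketch below uses a trimmed record (elided fields omitted rather than guessed) and separates calls that stay inside the package from calls that leave it. Treating a leading `//` as the marker for an external callee is an assumption drawn from the single example above, and the second edge is hypothetical.

```python
import json
from typing import Dict, List, Tuple

CallEdge = Tuple[str, str]

# Trimmed record: only fields needed here; the second edge is a made-up internal call.
RECORD_JSON = """
{
  "product": "org.slf4j.slf4j-api",
  "version": "1.7.29",
  "forge": "mvn",
  "graph": [
    ["/org.slf4j.helpers/BasicMarker.contains(%2Fjava.lang%2FString)%2Fjava.lang%2FBoolean",
     "///java.util/Iterator.hasNext()%2Fjava.lang%2FBoolean"],
    ["/example/A.caller()%2Fjava.lang%2FVoid",
     "/example/A.helper()%2Fjava.lang%2FVoid"]
  ]
}
"""


def split_edges(record: Dict) -> Tuple[List[CallEdge], List[CallEdge]]:
    """Return (internal, external) call edges of one package-version record."""
    internal: List[CallEdge] = []
    external: List[CallEdge] = []
    for caller, callee in record["graph"]:
        # Assumption: "//" at the start means the callee lives outside this package.
        bucket = external if callee.startswith("//") else internal
        bucket.append((caller, callee))
    return internal, external


record = json.loads(RECORD_JSON)
internal_edges, external_edges = split_edges(record)
```

Under that assumption, the external edges are the ones a later stitching step would still have to bind to a concrete dependency version.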
Dependency updates
Merge with confidence (?)
Are tests enough?
Coverage of function calls to dependency functions in 520 Java projects
Uppdatera: approach
Uppdatera bot
Detecting regressions
Detection rate for artificial regressions in
the dependency set of 388 Maven modules
Example FASTEN workflow
Updating with confidence
Before FASTEN
# Check outdated dependencies
$ pip list --outdated
Package    Version Latest Type
---------- ------- ------ -----
Pygments   2.2.0   2.3.1  wheel
# Update a package
$ pip install --upgrade Pygments
Collecting Pygments
Downloading ...
Successfully installed Pygments-2.3.1
# Done, fingers crossed!
After FASTEN
# Check outdated dependencies
$ pip list --outdated
Package    Version Latest Type
---------- ------- ------ -----
Pygments   2.2.0   2.3.1  wheel
Updating Pygments will affect:
  foo.py: function colorize
  bar.py: function parse
# Estimate update impact
$ pip install --dry-run Pygments
Function Pygments.Formatter.format[formatter.py] changed ->
  check <your_app> at colorize[foo.py]:32
# Developer inspects changed paths
# Update can continue
$ pip install --upgrade Pygments
Collecting Pygments
Downloading ...
Successfully installed Pygments-2.3.1
# Done
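The impact estimate in this workflow boils down to a reachability question on the stitched call graph: which of my functions can reach a function that changed in the new version? A minimal sketch, assuming the graph is available as plain (caller, callee) pairs; the function names, apart from those on the slide, are invented for illustration.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple


def affected_callers(edges: Iterable[Tuple[str, str]],
                     changed: Iterable[str],
                     app_functions: Iterable[str]) -> Set[str]:
    """Application functions that directly or transitively call a changed function."""
    # Reverse index: callee -> set of its direct callers.
    callers_of: Dict[str, Set[str]] = defaultdict(set)
    for caller, callee in edges:
        callers_of[callee].add(caller)

    # Breadth-first walk backwards from the changed functions.
    seen: Set[str] = set(changed)
    queue = deque(seen)
    while queue:
        fn = queue.popleft()
        for caller in callers_of[fn]:
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen & set(app_functions)


edges = [
    ("app.main", "foo.colorize"),
    ("foo.colorize", "Pygments.Formatter.format"),
    ("bar.parse", "Pygments.lex"),  # hypothetical call to an unchanged function
]
impacted = affected_callers(
    edges,
    changed={"Pygments.Formatter.format"},
    app_functions={"app.main", "foo.colorize", "bar.parse"},
)
```

A package-level check would flag every user of Pygments; the call-level walk only flags code that can actually reach the changed function, so `bar.parse` is left out here.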
Example FASTEN workflow
Deciding to use a library
Before FASTEN
# Checking info about the library
$ pip show tornado
Name: tornado
Version: 5.0
Summary: Tornado is a Python web framework …
Home-page: http://www.tornadoweb.org/
Author: Facebook
Author-email: …
License: http://www.apache.org/licenses/LICENSE-2.0
Location: …
Requires: backports-abc, futures, singledispatch
Required-by:
After FASTEN
# Checking info about the library
$ pip show tornado
Name: tornado
Version: 5.0
License: http://www.apache.org/licenses/LICENSE-2.0
...
Maintainers: 3
Community size: 15
Used by: 145 on PyPI, 34433 on GitHub
Latest vulnerability: 13 months ago (CVE-2012-2374)
All known vulnerabilities: 25 (best 10%)
License rating: Compatible
Example FASTEN workflow
Maintaining a library
# Check uses of function pkg.list() in dependents
$ pip query --uses pkg.list
depA(v1.2).parse()
depA(v1.2).test()
depB(0.0.2).foo()
depC(1.2.1).calculate()
# Estimate "damage" if pkg.list will be updated
$ pip query --total pkg.list
3 direct and 223 indirect dependencies will be affected
# Notify direct dependencies of upcoming breakage
$ pip query --uses pkg.list |
cut -f 1 -d '(' |
xargs -I {} pip show {} |
grep Author-email: | cut -f 2 -d ':' |
xargs mail -s 'MyProject update will break yours!'
# Which dependencies should I notify first?
$ pip query --uses --rank pkg.list
depC(1.2.1).calculate()
depB(0.0.2).foo()
depA(v1.2).parse()
depA(v1.2).test()
Current status
• Working on storage

• Working on CG generation for Python / Rust

• Working on the REST API

• Working on build graph integration

• Alpha release in May 2020, stay tuned!
http://fasten-project.eu
@FastenProject
http://dep.management
The FASTEN project has received funding from the European Union’s Horizon 2020
research and innovation programme under grant agreement No 825328.
The opinions expressed in this document reflect only the author's view and in no way reflect the European Commission’s opinions. The European
Commission is not responsible for any use that may be made of the information it contains.
