KBall and Juri dive deep into monorepos, their benefits and gotchas, and how Nx helps you improve the performance and maintainability of a monorepo setup.
Featuring
Sponsors
Square – Develop on the platform that sellers trust. There is a massive opportunity for developers to support Square sellers by building apps for today’s business needs. Learn more at changelog.com/square to dive into the docs, APIs, SDKs and to create your Square Developer account — tell them Changelog sent you.
Hasura – Create dynamic high-performance GraphQL & REST APIs from your database(s) in minutes with granular authorization and caching baked in. All without touching your underlying database. Go from data to API in minutes. Get started for free at hasura.io/jsparty
Sourcegraph – Transform your code into a queryable database to create customizable visual dashboards in seconds. Sourcegraph recently launched Code Insights — now you can track what really matters to you and your team in your codebase. See how other teams are using this awesome feature at about.sourcegraph.com/code-insights
Notes & Links
Chapters
Chapter Number | Chapter Start Time | Chapter Title | Chapter Duration |
1 | 00:00 | Opener | 00:32 |
2 | 00:32 | Sponsor: Square | 01:37 |
3 | 02:08 | Intro | 00:40 |
4 | 02:48 | Welcoming Juri | 01:33 |
5 | 04:21 | Getting to know NX | 07:40 |
6 | 12:00 | Monorepos | 04:42 |
7 | 16:42 | Sponsor: Hasura | 02:00 |
8 | 18:42 | Monorepo pros & cons | 07:10 |
9 | 25:53 | Linting & tooling | 04:10 |
10 | 30:02 | NX Graph | 01:15 |
11 | 31:18 | Dependency downfalls | 04:16 |
12 | 35:34 | Sponsor: Sourcegraph | 03:03 |
13 | 38:37 | Getting started | 06:57 |
14 | 45:34 | The power of plugins | 03:54 |
15 | 49:28 | What's next | 03:10 |
16 | 52:38 | NX & Python (?) | 02:13 |
17 | 54:51 | Goodbye! | 00:56 |
18 | 55:47 | Outro | 02:19 |
Transcript
Hello, and welcome to another episode of JS Party. I’m your MC today, this is Kball talking, and I’m here with a special guest, Juri Strumpflohner.
Hey!
Juri, welcome!
Hey, thanks for having me.
Awesome, good to have you here. I’m excited about this conversation. So let’s get started. Actually, before we dive into the topic of the day, which - hint - is monorepos in JavaScript, though this being JS Party we may party on whatever comes up… Let’s start with you a little bit. Can you introduce yourself to the JS Party listeners and say something about your background and how you got into working in dev tools?
Yeah, sure. My name is Juri, I’m from the very North of Italy. I’ve been in software engineering already for a long, long time. I started basically in the Java world, did a lot of ASP.net, then later in my professional career finally ended up in the frontend space much more, as that got more popular, as the whole SPA land got more traction, basically.
I ended up in the Angular community for a long time, and I still am… And from there, basically, I then did segue more into the tooling space, so that’s where I’m currently at. I’ve been working at Nrwl for a couple of years now, doing consulting there, and most recently I took over the role of the director of developer experience, so focusing on a lot of content production, teaching people about monorepos, about what we do at Nrwl with NX, and Lerna, and whatnot…
Yeah, so let’s dive into that. What is NX? How does it work? What do you do?
[04:20] Yeah, so NX has a lot of different facets, actually. We always tell people NX is kind of a smart, fast and extensible build system tool. So kind of like a build framework, if you want. The next question that comes up usually is then “Well, does it replace things like WebPack, or things like ESBuild, or SWC?” And it actually doesn’t. It’s mostly a tool for coordinating and scheduling and orchestrating tasks, which then might use WebPack, those tasks might use ESBuild… That really depends on your project. So it’s really a tool that helps you run tasks within a project, which is most useful if you have multiple projects in a workspace, for instance, which is the kind of setup you find in the monorepo space, which is what NX is built for.
A lot of our folks also use it just for single projects, because NX can also be used in a fashion where you would say - similar to Create React App, where it abstracts away the underlying build tooling, but it gives you a whole lot of configuration options that you can then fine-tune to whatever you need to produce… But it comes with generators for scaffolding your project, it comes with running the projects, stuff like that. Or easy setting up of like testing environments, Cypress, and those tools. So it can be a good tool integrator and function very much like a Create React App setup, but with more possibilities, basically.
Interesting. So I may be showing how long I’ve been in the JavaScript ecosystem, but when you started talking initially, you were saying it’s a task runner, and I was thinking “Okay, so is this another iteration of Grunt and Gulp, which were doing some task-running of some sort?” And we got away from those because we had dedicated build tools… Does this bring that back, or something different?
That’s a good point. Actually, I never thought of it from that view. Now, I’ve actually been also around when Grunt kind of became popular, and then not again, because Gulp came up… So to some degree, it hooks into that, but just from a very high level. So it doesn’t really – because Grunt basically allowed you to stick together your scripts, basically; combine them, and then run them and fine-tune them.
So from a high-level perspective, in terms of running tasks - it definitely does. There’s some similarity there, but it’s mostly going in the direction of you have multiple projects to run, you need to run them efficiently, because otherwise things are getting slower with a lot of projects… So that is when NX kind of kicks in and uses things like caching and stuff to optimize those things. But it is a task runner at a high level, yeah.
Interesting. So if Gulp was a task runner that was very low-level, you were running it inside your system and often kludging together a build system out of tasks… And then we said “Okay, that’s not the right abstraction layer for projects. We need a dedicated build system.” So we set up a build system for that. And now what you’re saying is “Okay, now we’re coordinating multiple projects; they have their own build systems perhaps, but we need to run tasks and coordinate.” So maybe at this layer, above the build system, is where a task runner makes sense.
Yeah. I mean, technically, you could still have a project within those many projects that you manage in a given workspace, which uses Gulp underneath to do its building. So NX would just coordinate that and run it whenever it sees “Okay, this project needs now to be run, because it got changed, and there is a PR” or something. So it would coordinate it to Gulp and say “Run it.” So NX doesn’t really care what or how the build looks underneath. That’s also why it doesn’t replace anything like WebPack or ESBuild, rather it integrates with those, if you want.
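To make the "coordinator above the build tools" idea concrete, here is a minimal sketch in TypeScript. It is not how NX is implemented; the `Task` shape, `scheduleTasks`, and the project names are all hypothetical. It only shows tasks being ordered by their dependencies while each task's `run` stays opaque to the scheduler, so it could wrap Webpack, ESBuild, or Gulp.

```typescript
// Hypothetical shapes for illustration; these are not NX's real types.
type Task = {
  project: string;
  dependsOn: string[]; // projects whose task must finish first
  run: () => string;   // opaque to the scheduler: could shell out to any build tool
};

// Order tasks so that every project runs after the projects it depends on.
function scheduleTasks(tasks: Task[]): Task[] {
  const byProject = new Map<string, Task>();
  for (const t of tasks) byProject.set(t.project, t);

  const ordered: Task[] = [];
  const state = new Map<string, "visiting" | "done">();

  const visit = (project: string): void => {
    if (state.get(project) === "done") return;
    if (state.get(project) === "visiting") {
      throw new Error(`Dependency cycle at ${project}`);
    }
    const task = byProject.get(project);
    if (!task) throw new Error(`Unknown project ${project}`);
    state.set(project, "visiting");
    for (const dep of task.dependsOn) visit(dep);
    state.set(project, "done");
    ordered.push(task);
  };

  for (const t of tasks) visit(t.project);
  return ordered;
}

// Each project builds with a different (pretend) tool; the scheduler does not care.
const demoTasks: Task[] = [
  { project: "shop-app", dependsOn: ["ui-kit", "utils"], run: () => "webpack build" },
  { project: "ui-kit", dependsOn: ["utils"], run: () => "esbuild build" },
  { project: "utils", dependsOn: [], run: () => "gulp build" },
];

const log: string[] = [];
for (const task of scheduleTasks(demoTasks)) {
  log.push(`${task.project}: ${task.run()}`);
}
console.log(log.join("\n"));
```

A real scheduler would also run independent tasks in parallel and skip work it can restore from a cache; this sketch only covers ordering.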
Got it. Okay. And then another thing you mentioned a little bit was around having scaffolding, which once again reminds me - Create React App is one area, it reminds me of Yeoman, that was trying to do scaffolds and things like that. So how does the scaffolding piece of NX work?
[08:03] For that purpose it might be interesting to zoom out a little. So NX is basically – as I mentioned before, it has maybe two big areas right now. At the very core it’s a fast task scheduler. So if you have already a workspace, or something, if you have multiple projects that are managed with an npm workspace, then you can add NX on top to make that fast, that scheduling of the tasks… Because there’s things like caching, or parallelization, and then more intelligent running of tasks and how you sequence them… We can dive a bit into that later, but that is basically at the core. And then you can however use, I would say, a more powerful approach, which is like stick plugins on top of that.
So you can say “Okay, I’m having a workspace with a React application in there” or one with an Angular application in there, and so we want to manage them together… So we basically ship with dedicated plugins for React, for Angular, for a couple of those most popular frameworks. And there are community plugins as well that you can install, where someone from a community may come up and say “Well, I have a plugin for Go, because I happen to use it.” Created an NX plugin, has a dedicated API, so you can use that as well.
And those plugins, on the other side, often come – they don’t have to, but often they come – with dedicated scaffolding, which we call generators. And it’s very similar to what you mentioned, like Yeoman, because it can be a scaffolding generator for setting up the entire project for you… So creating a new React application - it would scaffold out the entire React app, with everything you need… But it can also be as tiny as “Add me a new route component to an existing Angular or React application.” So it can be at different levels. That really depends on what type of generator you provide. Because in the end it’s mostly AST manipulation, so those generators come with some sort of API that we give developers such that it’s easier to interact with the file system, and stuff; so we have some facilities around that. But in the end, depending on how deep you wanna go with your generator, you can go from just doing [unintelligible 00:10:04.18] or really digging into ASTs, and stuff.
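As a rough illustration of the "tiny" end of that spectrum, the sketch below writes a new component into an in-memory file tree. Everything in it is an assumption made for the example: `FileTree`, `componentGenerator`, the folder layout, and the option names are hypothetical and do not reproduce the API NX gives to generator authors.

```typescript
// Hypothetical in-memory "tree" of files (path -> contents), used instead of the real file system.
type FileTree = Map<string, string>;

interface ComponentOptions {
  project: string; // e.g. "shop-app"
  name: string;    // e.g. "checkout-button"
}

// Turn "checkout-button" into "CheckoutButton".
function toPascalCase(value: string): string {
  return value
    .split(/[-_\s]+/)
    .filter((part) => part.length > 0)
    .map((part) => part.charAt(0).toUpperCase() + part.slice(1))
    .join("");
}

// A generator is just a function that adds or edits files based on its options.
function componentGenerator(tree: FileTree, options: ComponentOptions): void {
  const componentName = toPascalCase(options.name);
  const dir = `apps/${options.project}/src/components/${options.name}`;

  tree.set(
    `${dir}/${options.name}.tsx`,
    `export function ${componentName}() {\n  return <div>${componentName} works</div>;\n}\n`
  );
  tree.set(`${dir}/index.ts`, `export * from "./${options.name}";\n`);
}

const tree: FileTree = new Map();
componentGenerator(tree, { project: "shop-app", name: "checkout-button" });

for (const path of Array.from(tree.keys())) {
  console.log(path);
}
```

Writing to a tree first, rather than straight to disk, makes a generator easy to test and lets a tool preview the changes before applying them.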
That makes sense. Okay, so if I’m interpreting it correctly, NX is basically filling a lot of roles, but really, at this layer of multi-project coordination.
Yeah.
You have multiple - or what might in some worlds be multiple repos, multiple distinct bodies of code that have their own nuances, their own builds… So if you’re using all of these in coordination - let’s actually optimize that coordination and think about how we do it. Is that about right?
Yeah, because in the end, once you have multiple projects in the same workspace, which is usually a Git repository, then you are kind of in the space of a monorepo. A lot of people don’t think about it that way often. They maybe don’t even know they’re in a monorepo. But once you have one project, which is usually how it starts - you have one project, you create a couple of split-out libraries which live in the same repo, and you just link them together… And at some point you add another project, and you already kind of start building a monorepo; a super-small one, but it is kind of a monorepo, right?
The initial idea of NX, when it got created by its founders, was actually to give good support for monorepo scenarios… Because both founders are ex-Googlers, so they saw basically how monorepos obviously are used within Google, at that whole other scale, at that large scale, but they wanted to have, when they left Google, something similar for the community outside, but not as complicated to use… Because Google has actually open sourced their Blaze tool, or Bazel in the open source world. It is very powerful, it is however also quite complicated to set up.
So they wanted to find a good balance where you could use it very easily on the JS ecosystem, which would fit in nicely there, because that was the main focus of NX in general, although it’s kind of general-purpose, if you want… You could also build Java projects… But it was built very focusing on that JS frontend ecosystem.
[11:59] That makes sense. Well, that’s actually a good segue into this area of monorepos… Let’s actually first kind of step and define a little bit - what is a monorepo? When you say monorepo, I think you have something in mind, and you mentioned a lot of people start doing monorepos and not even necessarily realizing that’s what they’re doing. So can you define it for us? What is a monorepo?
Yeah, sure. I think we already got a good start into that, actually, because we slowly approached it, right? …mentioning, okay, you have multiple projects, you coordinate stuff between them… That is actually what a monorepo is for me. It’s just like a collection of – or it is a set of multiple projects that live in the same context, if you want. In the same Git repository, which is most often the case. And then most of the time out of those things that you have in your packages and applications there start to be relationships, because you want to share code, you want to collaborate… And so there are actually connections between them. Potentially, a monorepo could also be just a collocation of different projects, but those are less useful, in general. So we usually try to set up monorepos which kind of share stuff between them in a controlled way.
I think the “monorepo” term in general is a bit misleading sometimes, because if you talk to people and say “Hey, this would be a good use case for a monorepo” and they’re like “Oh yeah, I heard about that, but I don’t really have that case where I wanna stick all my company code in one Git repository”, which is – yeah, that’s what you hear from things like Google, or Facebook (Meta), but that’s not obviously in which environment we are.
In fact, what we often see when we do also consulting for large companies is they have multiple monorepos maybe in their organization; maybe per department, per area… Which makes sense; there are some related projects that it would make sense to have collocated in one single Git repository, and then basically start working from there. But you can have multiple of those and still also communicate with single project repositories, if you want. So yeah, from a definition point of view, I just usually mention - this is a multi-project repository. That is what a monorepo is.
That makes a lot of sense. And I think when you look at something like the JavaScript ecosystem, there is this sort of approach of “Let’s pull out lots of small projects. Let’s create many composable pieces.” Those composable pieces may be related to each other; they’re trying to accomplish the same goal, and so a monorepo feels like it fits very naturally into this ecosystem.
Yeah, yeah.
Okay.
If you look, for instance, in the open source world right now, all the major frameworks that you see out there - like Vue, React, Angular, Next.js - they’re monorepos if you look at them. They have small packages that they release individually to npm, but they’re autonomous packages, basically, that they host in the same repository simply because they share obviously some part.
And also other projects - I know a couple that are more in the UI space, like a design library, or a UI design kit… What you often have there is you have some core library which has the functionality, and then you have on top the UI library for [unintelligible 00:15:00.00] And again, you have a monorepo, literally, because you have that core, and the top-level UI specific design libraries depend on that core, obviously, because they reuse functionality from there.
Those monorepos are more what I call the package-oriented monorepos, where your goal is to just have a set of packages that work together and you want to publish them to npm. And then there are more the app-heavy repositories or monorepos, which is more what you find in organizations where you have product development, things like that.
A lot of the times, people in that area don’t think monorepos could help them a whole lot, but it is also kind of an architectural style, if you want.
Very often you see there you have large React applications - let’s say you use Create React App; you have a large monolithic structure where you have one app with lots of folders in that same app, which are your domain areas. So what we usually propose to people in that case is to split those areas out into small libraries. Those libraries don’t necessarily have to be shareable, reusable, or whatnot. They can be very specific to one application, but you already get a much better overview of how your projects compose together by having them split up, rather than having them as single folders within the same application. And that’s also what NX kind of helps you do: refactor out those things. Those refactorings then come with a couple of good and nice features and side effects afterwards.
Alright, well let’s get back into it then and talk about the benefits of moving to monorepos, or moving your code into a monorepo setup, and any sort of drawbacks there might be as well. You kind of teased that a little bit.
Yeah, exactly. So at this point I could actually point out - we created a dedicated site, based on our experience, because we do a lot of consulting as well. So we don’t just build the open source project NX, but we also do consulting for big companies… Which is mostly how we feed our ideas back into NX, the open source project. So from that experience we obviously saw “Where do you get a benefit, and what do you need at the same time to have once you start a monorepo?”
One thing said immediately upfront is you shouldn’t jump naively into a monorepo and say “Oh, this is so cool, and sharing code is so easy. Let’s just put all our projects in there” and not think enough… Because then like six months in you probably will regret it, I can already tell you now.
So we created a website which is called monorepo.tools, where we try to emphasize some of the most high-level advantages that a monorepo gives you, versus what is often called a polyrepo situation… As well as some of the features that you should have when you look for the specific tools to use together with a monorepo.
So I think a top-level advantage of a monorepo in general is the code sharing. Code sharing and collaboration. And that’s also why it is important… Like, when you set up a monorepo to think about its structure and how things relate within a monorepo – so not just collocate stuff… Because sure, it might be nicer that you have the projects side by side and you can easily jump around without jumping Git repositories, but the benefit you would get out of that is kind of low, right? So it probably is not worth the effort.
What very often happens is you will have some shared parts within that monorepo, which you can use across projects, and then there will be parts that are just specific to a single project in that monorepo.
But still, the fact of having it just in one Git repository - it is very easy to share code, of course, because if I have already a library handy, and someone needs a similar thing, very often it’s quite low effort to actually split it out into maybe a dedicated, shareable library with that functionality, so both projects can depend on it.
Whereas, obviously, if you have a polyrepo scenario, it is kind of more difficult… Because obviously, then it’s like “Okay, now we need to create a new repository. Who does the setup of CI?” Because you should probably have that. “How do we version it? How do we handle backwards-compatibility?” All that type of things.
So it makes it obviously much, much faster to have a monorepo, because creating a new library, especially if you have things like generators in NX, it’s really literally a couple of commands you launch and you have a new React library that you can reference, and go ahead, move components around, refactor your code and go ahead.
So that whole collaboration part is really cool and fundamental in a monorepo, and it’s also very interesting when you need to do experiments. Very often you need to “Oh, let’s actually try to change where we place our buttons”, experiments like that, A/B testing kind of stuff. In a monorepo it’s very easy. You just change it, you see the effect, you can deploy it to a separate environment, see what the actual result is… Because everything is kind of atomic in one PR and one commit, which can also be easy to revert afterwards… Rather than publishing a beta release or something of a package, pinging the author on the other side, like “Please upgrade.” So it’s mostly that overhead that you get rid of in a monorepo.
Yeah.
[22:18] But yeah, at the same time, obviously, once you see those benefits, what often happens - that’s actually one point also with NX; once you see how quickly you can create libraries, how quickly you can spin up something new, people will do it all the time. It’s so easy. And then obviously if you don’t have the tooling in place, then what you end up with is a whole lot of projects in the same repo, and your CI time will go nuts, basically. So you will have like 30 minutes to 40 minutes to an hour of CI run just to get a PR merged… Which obviously is then – you lose all that benefit of being fast at development time when you cannot merge your PR into the main branch, because it takes over an hour. That then is a problem. So that’s where the whole tooling aspect comes in, where you need to have tooling in place that can support you with that.
That makes sense. So a couple of questions to dig a little bit deeper. So that shared code is an interesting one to look at, because I think one thing that having different repositories makes you do, or having that publish step, is you have to be very, very clean about where are your boundaries and where are your APIs. And I have not worked in a library-type monorepo, as you mentioned, but I am working in an application monorepo right now, and I definitely see that sometimes if you’re not careful you can get sort of tangled dependency chains because people don’t have to think as clearly about where their lines are.
Absolutely.
So I’m kind of curious how you address that, are there best practices, are there things the tooling can do to help there…?
Yeah, that’s actually a very good point… And that’s a point that a lot of people underestimate, initially. It’s kind of similar to the speed aspect - you just jump in, and then people start like “Oh, that’s cool. That library already exists. Let me just use it”, without even asking anyone… You end up with a really spaghetti code situation where you have a lot of cross-dependencies.
So what we do in NX - we try to also support that with tooling. In NX specifically what we have is a dedicated, what we call module boundary lint rule, which is actually a lint rule which we created and which we set up for every package or library which you create in a new project. And then we have a top-level configuration where you can basically give tags to the projects. And usually, those tags are in the form of like domain area sales, “domain:products”, or something like that, as well as type, whether something is of a type feature, whether something is of a type shared, something of a type core… Things like that.
So it’s just strings which you can attach to projects in some part of the configuration, and then you can specify the relationships that can be possible. So you can say “Okay, all projects of the type like domain sales should only be able to depend on projects which are also of that type, sales, or from that same domain, or are from type shared.” So that way you can actually have an automated rule that runs on CI, and it kind of forbids arbitrary imports from different projects.
So you can kind of create those nice boundaries, and it’s actually a very simple setup, very easy to integrate in CI, because it’s really just like a lint rule run on the workspace… But it can help a lot to actually keep those domains sane, and in nice boundaries within those projects.
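The tag-and-constraint idea can be sketched in a few lines of TypeScript. This is a standalone illustration, not the lint rule itself: `findViolations`, the project names, and the tag values are made up for the example, and the constraint fields (`sourceTag`, `onlyDependOnLibsWithTags`) are named after how such a rule might be configured, so treat them as assumptions.

```typescript
// Hypothetical data model: a project is a name plus free-form string tags.
type Project = { name: string; tags: string[] };

// "Projects carrying sourceTag may only depend on projects carrying one of these tags."
type DepConstraint = { sourceTag: string; onlyDependOnLibsWithTags: string[] };

function findViolations(
  projects: Project[],
  imports: Array<[string, string]>, // [importing project, imported project]
  constraints: DepConstraint[]
): string[] {
  const byName = new Map<string, Project>();
  for (const p of projects) byName.set(p.name, p);

  const violations: string[] = [];
  for (const [from, to] of imports) {
    const source = byName.get(from);
    const target = byName.get(to);
    if (!source || !target) continue; // not a workspace project, nothing to check

    for (const c of constraints) {
      if (!source.tags.includes(c.sourceTag)) continue; // constraint does not apply
      const allowed = target.tags.some((tag) => c.onlyDependOnLibsWithTags.includes(tag));
      if (!allowed) violations.push(`${from} (${c.sourceTag}) must not import ${to}`);
    }
  }
  return violations;
}

const projects: Project[] = [
  { name: "sales-checkout", tags: ["domain:sales", "type:feature"] },
  { name: "sales-pricing", tags: ["domain:sales", "type:core"] },
  { name: "ui-buttons", tags: ["type:shared"] },
  { name: "products-catalog", tags: ["domain:products", "type:feature"] },
];

const constraints: DepConstraint[] = [
  { sourceTag: "domain:sales", onlyDependOnLibsWithTags: ["domain:sales", "type:shared"] },
];

const violations = findViolations(
  projects,
  [
    ["sales-checkout", "sales-pricing"],    // same domain: allowed
    ["sales-checkout", "ui-buttons"],       // shared: allowed
    ["sales-checkout", "products-catalog"], // other domain: flagged
  ],
  constraints
);
console.log(violations);
```

Because the check is just a function over tags and import edges, it can run as a lint step on CI and fail the build as soon as someone reaches across a domain boundary.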
There’s also cool additions like – because that is just the very entry point, if you want… But sometimes you even want to have rules that say “You shouldn’t be able to import (I don’t know) React in an Angular-based project, or an Angular-based library.” You can even do that. We have features where you can define bans for certain types of projects, so you wouldn’t be able to do those imports in a project; otherwise you get a lint error. But you definitely need to have some sort of such tooling in place to keep your workspace maintainable in the long run.
I like that a lot. We’ve talked – we actually have a couple other episodes recently that I’ll link up in our show notes where we’ve talked about linting, and ESLint, and linting rules, and rules of thumb for when to do that… But I think this is something that is under-utilized in our space, of like “Okay, just because you can do anything doesn’t mean you should do anything.”
[26:14] Oh, yeah.
“And let’s define what good code looks like to us, and create a bunch of lint rules around that.” You mentioned a couple of different types of lint rules… Do they fall neatly in groups or classifications? What’s the set of suggested lint rules that you get from a monorepo setup out of the box, and where should you be looking as you add different types of functionality to your monorepo?
So from the lint rule perspective - whenever you set up, using those plugins and generators, we set you already up with a predefined set of best practice rules for that type of project. So if you have a React library, or a React application, we have already a lint extension installed with ESLint that already sets some rules. And you can obviously go and customize and add different rules, right?
So everything that is related to module boundaries, that is actually just one rule that you can customize, because lint rules usually provide options. You have the lint rule itself, and you can give it options, and in those options you can actually then specify those relationships which we’ve mentioned before… So that is pretty easy to set up.
I can actually link for the show notes a blog post where we do some deep-diving into how those lint rules are specified, how you add those tags and make sure that those dependencies are set up… But yeah, that is the whole advantage of the plugin-based approach that NX has. Because obviously, with those plugins – which is similar to Create React App, right? Like, you know how the project structure looks like, and therefore you can also provide suggestions. And I feel like that is a very important part, a very advantageous part, even for someone that just starts into doing something like React… Because you can generate an application right away; it sets you up with some best practices tooling that are currently on the market. You can start building, and then you can dig deeper and actually fine-tune those tools, or even replace them with different tools once you get more expertise.
And at the same time, we see a lot of the benefits also from companies, because they are like “Okay, our React packages always look that way.” So they can even customize and say “We create our own generator for our libraries, because we always want to have that type of licensing there, or that type of readme” or whatnot. So you can really dig deeper and customize the whole setting for your own corporation. I think that is a powerful concept to introduce.
I like that. And the other thing you talked about there was linting rules that kind of look at the boundaries between these packages… Like, what you have with NX is you actually have this visibility into not just each project, which - you know, there’s value in saying “Okay, a React project should look like this, and at our company we want this X, Y and Z”, but you actually have the super-structure as well, how the things fit together, and so you can start putting rules in place of “Okay, this type of module can’t require that type of module directly”, or what have you. That’s super-cool.
Yeah, exactly, because at the very core of NX basically there is that NX graph, or project graph, which builds – when NX instantiates in your monorepo, it looks at the structures that that monorepo has, it looks at entry points, and so it can build up that graph and say “Okay, there’s a React application here, a couple of packages, these are entry points”, and stuff… And so that structure helps to do a lot of optimization. Not just for lint rules… Obviously, that is a good point as well, because for the lint rules we can then say “Okay, I can add those nice tags to all those projects in a declarative way”, and then NX can transform that into runtime lint rules which it then executes on CI, or when you re-run the command for linting.
But at the same time, that project graph also serves for a whole lot of other things of optimization, in terms of like speed, and caching, and whatnot, which we could dive deeper in if you wanted. So that is really the fundamental structure. And you can visualize it, which is actually pretty cool as well, in terms of just debugging. So very often what happens is [unintelligible 00:29:49.28] for open source projects. Recently we collaborated with RedwoodJS, as they opted into the Lerna and NX combination, and that gives you also the project graph as a nice side effect.
[30:02] And so for me, for instance, not knowing how the RedwoodJS repository looks like - they have a monorepo, right? This is super-easy - you go in the NX graph, you see the visualization, you can dig deeper and understand “Okay, what type of packages do they have? How are they related to each other? Where are the imports? What are the dependencies that are required?” So it’s a very visual way of exploring the structure, basically.
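A much-simplified way to picture such a project graph: read each package's manifest and keep only the dependencies that point at other packages in the same workspace. The sketch below does just that. The `@acme/*` package names and the `buildProjectGraph` helper are hypothetical, and as described above a real tool also looks at entry points and imports, not only at declared dependencies.

```typescript
// The parts of a package.json this sketch cares about.
type PackageManifest = {
  name: string;
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
};

// project name -> names of workspace projects it depends on
type ProjectGraph = Map<string, string[]>;

function buildProjectGraph(manifests: PackageManifest[]): ProjectGraph {
  const workspaceNames = new Set(manifests.map((m) => m.name));
  const graph: ProjectGraph = new Map();

  for (const m of manifests) {
    const declared = Object.keys({ ...m.dependencies, ...m.devDependencies });
    // Third-party packages (react, typescript, ...) are not nodes in the workspace graph.
    graph.set(m.name, declared.filter((dep) => workspaceNames.has(dep)).sort());
  }
  return graph;
}

const graph = buildProjectGraph([
  {
    name: "@acme/shop-app",
    dependencies: { react: "^18.0.0", "@acme/ui-kit": "*", "@acme/utils": "*" },
  },
  { name: "@acme/ui-kit", dependencies: { "@acme/utils": "*" } },
  { name: "@acme/utils", devDependencies: { typescript: "^4.8.0" } },
]);

graph.forEach((deps, project) => {
  console.log(`${project} -> ${deps.length > 0 ? deps.join(", ") : "(no workspace deps)"}`);
});
```

The same structure is what a visualization would draw: one node per project, one arrow per dependency.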
I love that. I’ve just spent hours a couple of weeks ago mapping out the dependency chains of our monorepo… Which is not using NX.
Yeah. The NX graph - you can actually instantiate it on any repo, if you want. If you do “npx nx graph”, it will work on any type of monorepo setup. So it should be able to identify the projects… You could even use it for debugging purposes.
Interesting. And it works across languages, or is it JavaScript-specific?
It is mostly JavaScript-specific. So if you use backend languages, you would need to give it some hints, like where the entry points are. Because out of the box, if it is just package.json files, that is what it looks for, and it tries to understand how the structure is… And this is mostly based on Yarn, npm workspaces stuff…
So yeah, if you have backend structures, backend monorepos with some [unintelligible 00:31:09.07] technology I would have to look into. But you could write your adaptor for that. So it is definitely possible, but maybe not so out of the box.
Got it. We mentioned the sort of dependency lines as one of the potential gotchas people get when they first start using monorepos… Or even not for a start, but if you aren’t careful, you can easily get into this case of tangled dependencies.
Yeah, as you grow, basically. Yeah.
What are some of the other common downfalls you see, and are there tools that help with those?
Yeah. I think that the next big thing usually that hits people is simply the speed on CI. Because from the setup perspective – I mean, when you use NX, for instance, you can go plugin-based, or you just go the lightweight mode, basically, where you set up everything yourself and you just use it for the task scheduling… There you basically can set it up on your own. You can go ahead as you want. But the thing that most people hit at some point is always the speed aspect. So if you for instance use plain Yarn or npm workspaces setups, you can do some sort of filtering to just execute some of the projects that changed, or things in a PR, but the problem is on CI you wanna make sure that you capture exactly the projects that are changed.
So that is for instance one optimization that NX can help with, where it understands, based on the Git commits and again that graph that I mentioned before, in which projects you actually changed files, and what depends on them, which you can infer from the graph. So there’s no point, for instance, to run the tests for that other project down there, because there’s no relationship to that project, right?
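The "only run what changed" idea boils down to two steps: map changed files to the projects that own them, then follow the graph to everything that depends on those projects. Below is a minimal sketch under assumed names (`Workspace`, `affectedProjects`, and the folder layout are hypothetical); the list of changed files would typically come from comparing two Git commits, for example with `git diff --name-only`.

```typescript
// Hypothetical workspace description for the example.
type Workspace = {
  roots: Record<string, string>;       // project name -> folder
  dependsOn: Record<string, string[]>; // project name -> projects it imports
};

function affectedProjects(ws: Workspace, changedFiles: string[]): string[] {
  // 1. Projects that directly own a changed file.
  const affected = new Set<string>();
  for (const [project, root] of Object.entries(ws.roots)) {
    if (changedFiles.some((file) => file.startsWith(root + "/"))) {
      affected.add(project);
    }
  }

  // 2. Anything that depends on an affected project is affected too.
  //    Repeat until a full pass adds nothing new.
  let grew = true;
  while (grew) {
    grew = false;
    for (const [project, deps] of Object.entries(ws.dependsOn)) {
      if (!affected.has(project) && deps.some((dep) => affected.has(dep))) {
        affected.add(project);
        grew = true;
      }
    }
  }
  return Array.from(affected).sort();
}

const workspace: Workspace = {
  roots: {
    "shop-app": "apps/shop",
    "admin-app": "apps/admin",
    "ui-kit": "libs/ui-kit",
    utils: "libs/utils",
    reports: "libs/reports",
  },
  dependsOn: {
    "shop-app": ["ui-kit", "utils"],
    "admin-app": ["reports"],
    "ui-kit": ["utils"],
    utils: [],
    reports: [],
  },
};

const affected = affectedProjects(workspace, ["libs/ui-kit/src/button.tsx"]);
console.log(affected);
```

In this example `admin-app`, `reports`, and `utils` are left out, so their tests would not need to run for this change.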
So things like that, like to be able to cut down the times by simply just not running certain commands at all on CI can already help a lot. Most of the times, however, what really helps is the caching aspect, which means like really entirely cache the actual run itself, which we added in NX well over a year ago mostly… Which basically is nothing else than – it’s kind of like memoization for a function, right? So whenever you run a given function with the same inputs, those inputs obviously for the command being the source code, environment variables, things you even specify; you can fine-tune that, actually, if you wanted to. And if you run that again in some context, then there’s no point to run that command. So you could actually just restore the output from the cache, which means restoring terminal output, restoring potential build artifacts, like JavaScript bundles that got produced… Those things. And as you can imagine, that speeds up things a lot, because then on CI, especially if you share that cache between developer machines, or CI runs and stuff, certain projects get a lot, lot quicker, because simply they don’t need to be built again; so it just restores artifacts, basically. That is one of the main benefits, I think, that you should and you need to have, if you wanna scale.
If you have a small npm package repository or something that just has a couple of libraries, you probably won’t notice it. But in a large environment, you definitely will.
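The memoization comparison can be made concrete with a small sketch. It assumes a hypothetical `TaskCache` that keys each run by a hash of the command, the source files and the environment variables, and restores terminal output and artifacts on a hit. None of these names come from NX, and the FNV-1a hash is only there to keep the example dependency-free.

```typescript
// Hypothetical description of one task run; not NX's real cache format.
interface TaskInputs {
  command: string;
  sourceFiles: Record<string, string>; // path -> file contents
  env: Record<string, string>;
}
interface TaskResult {
  terminalOutput: string;
  artifacts: Record<string, string>; // path -> produced file contents
}

// Small non-cryptographic hash (FNV-1a). A real tool would use a stronger one.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

// Serialize inputs with sorted keys so equal inputs always give the same key.
function cacheKey(inputs: TaskInputs): string {
  const sorted = (record: Record<string, string>): string[] =>
    Object.keys(record).sort().map((key) => key + "=" + record[key]);
  return fnv1a(
    JSON.stringify([inputs.command, sorted(inputs.sourceFiles), sorted(inputs.env)])
  );
}

class TaskCache {
  private entries: Record<string, TaskResult> = {};
  runs = 0; // how many times a task was really executed

  run(inputs: TaskInputs, execute: () => TaskResult): TaskResult {
    const key = cacheKey(inputs);
    const hit = this.entries[key];
    if (hit) return hit; // same inputs: restore output instead of running
    this.runs++;
    const result = execute();
    this.entries[key] = result;
    return result;
  }
}

const taskCache = new TaskCache();
const buildInputs: TaskInputs = {
  command: "build ui-lib",
  sourceFiles: { "libs/ui/index.ts": "export const a = 1;" },
  env: { NODE_ENV: "production" },
};
const build = (): TaskResult => ({
  terminalOutput: "built ui-lib",
  artifacts: { "dist/ui.js": "var a = 1;" },
});

const first = taskCache.run(buildInputs, build); // executes the task
const second = taskCache.run(buildInputs, build); // restored from the cache
const editedInputs: TaskInputs = {
  ...buildInputs,
  sourceFiles: { "libs/ui/index.ts": "export const a = 2;" },
};
taskCache.run(editedInputs, build); // source changed, so it executes again
```

Sharing the `entries` store between developer machines and CI is what turns this from a local speed-up into the shared cache described above.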
[34:00] Yeah, that makes sense. So when you’re doing that, you’re using your dependency graph to know what is it safe to load from cache, versus what you actually need to re-run because something has changed?
Exactly. The graph is basically the foundation of everything. It’s already starting from what can you cut out based on what changed… But as you mentioned, at the same time, what do I need to invalidate based on a graph and what changed within that graph. So it’s really the core part in there.
But the amounts of hours… I remember – because we were kind of dogfooding the computation caching or that caching ability on our NX projects and the open source projects as well… What we’d do there, for instance - we’d run a whole lot of end-to-end tests, just to make sure that features work. So we’d really spin up, we’d publish a package locally, we’d run end-to-end tests, generate projects, see whether the structure matches what we expect, and things like that… And those take a whole lot of time. [unintelligible 00:34:46.18] the cache which we have right now, and I think in the last month we saved like 6,600 hours of computation, which is, if you think, that is like 270 days in a month.
So you can imagine – it would basically not be possible to run all those end-to-end tests in our repository without the cache. So usually you drop the cache, and you drop end-to-end tests, because it’s just not feasible… Or you have some sort of caching. A lot of modern solutions have some sort of that caching in place, because otherwise it’s basically not feasible.
You mentioned a few tools along the way that NX integrates with. Things like Yarn workspaces, there’s Pnpm, there’s Lerna… What are the different pieces in this ecosystem? How do they fit together? And if somebody was wanting to explore this for the very first time, where should they get started?
Yeah, the best place to start, again, is monorepo.tools, the website that I mentioned earlier. There we also have not just the overview of what is a monorepo, what are the potential advantages, but also the tools that are currently out there, and a matrix that covers basically the features. And we created the page, but we reached out to all the authors of the various tools to review it, add more stuff, and keep that updated. It is actually an open source page, there’s a repository linked at the very top, so you can reach out and correct the info if there’s something that we’re missing… Which could totally be. So we keep monitoring the tooling space, but we all know how fast things potentially evolve.
So in general, in the JavaScript ecosystem specifically, I think that Lerna was one of the first that got started in the monorepo space. At least I remember having used it back then when I had no clue at all what a monorepo is in general, but I just had the need to have a couple of projects bundled together, and then running tasks across those. So Lerna was one of the first to start. And there, the whole approach is basically to have different npm packages, if you want, with their own node_modules folders in there, and Lerna is the layer on top, which kind of coordinates the running of the tasks, and the installation of the packages.
And what it also does very well is the whole publishing aspect, because it was kind of made for that “Let’s have a couple of npm packages together and be able to publish them, doing the versioning, increment with the versioning” and those things. And that’s it for Lerna. It was kind of going stale two years ago, and then recently Dan mentioned that it’s kind of an abandoned project… And at a point we talked to the maintainer and we took it over, the stewardship, in May, I think, from the [unintelligible 00:40:38.22] A lot of our NX core team members now also help collaborate on Lerna, so we try to optimize and improve it, upgrade all potential security issues because of outdated packages and things like that.
And we also allow now for a much easier integration with NX as well, simply because it was always possible to run them together, because you can always use NX at the top as a task scheduler. So we integrated that, and things like caching, which Lerna obviously didn’t have at that time, because it just wasn’t a thing to have.
And meanwhile, in that same space, also other tools came up. npm workspaces, for instance, and Yarn workspaces, and Pnpm workspaces - they all work in a similar fashion. And in fact, Lerna recently also then mentioned “You shouldn’t use the Lerna Bootstrap”, which was the method for linking packages, for instance, for dependency management within the monorepo, but rather defer to either Yarn workspaces or npm workspaces, just because they kept up in this space, they’re more up to date. So that is actually also the best practice that is mentioned there.
So yeah, those are the main tools. There are also other tools similar to Lerna. For instance, there is Rush, there is Turborepo… There’s also Lage, I think from Microsoft, that is also in that same space… And all those tools mostly focus on having packages with their own node_modules, with their own package.json files, and do the coordination on top, the task scheduling. So that is maybe kind of the difference.
[42:11] So NX can do that as well in that space, but then it has also that other side of things which I mentioned, like the plugin side… Which is kind of different from the approach that all those other tools use. So that approach leans more towards what things like for instance Bazel provides. Bazel comes with a whole set of plugins and you just simply optimize some things. That means it is more, if you want, involved initially to set up… Because obviously, the plugins kind of give you some constraints where you move within that, and you can customize that… But at the same time, at least from our perspective, for companies it is much more beneficial in the long-run, simply because you have the generators that I’ve mentioned before, but you also have things like automated migrations, upgrading the tools in an automated fashion… Which is a big point, because if you have a monorepo with like a whole lot of React applications in there, and you need to upgrade them at some point, we all know how that goes, right?
Yes. Actually, can you dig into that a little bit more, having just been working on upgrading our build tools, and other things? What are you able to do to help out with that?
Yeah, so basically, since we have the plugin structure, if you use for instance the NX plugin for setting up a React library or application, then we don’t really expose for instance something like the underlying Rollup config or WebPack config. You can hook into it, so you can provide a custom WebPack file, which then gets extended with the overall rules that we provide, for instance… But overall, it’s kind of configuration-based. So the configuration is kind of data that we can then consume from the NX perspective.
So what you do there is more you say like “Okay, for building a React application, I want to use such a WebPack builder”, which is basically a package from the plugin, from the Nrwl React plugin in that case, that we ship… And that plugin obviously knows what its structure looks like.
And so the benefit of that in the end is when you come to upgrading tooling, [unintelligible 00:43:59.11] NX itself, but also tooling like React or other packages that we support based on those plugins, you can just run a command that is called “nx migrate”. What it will do is it will basically scan the workspace, it will see what are the package.json versions that you currently have, and then we as the NX core team provide a set of migration scripts, basically, that bring you from one version to the next. So that means NX upgrades, which potentially could be like we change some configuration, so we update that configuration for you. So we go in and change the TypeScript files using ASTs, we flip the imports, as well as React versions.
So you’re basically writing code mods to do migrations, if I’m understanding right.
Yeah, exactly. It uses kind of a virtual file system to run the migrations again, to make sure it works in the end; otherwise it reverses the operation. But it’s very similar. It uses some kind of framework that we built, like a dev kit, in which you can basically very easily write those migrations… But the benefits are huge, as you can imagine.
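The virtual file system idea can be sketched as staging every write in memory and only committing once all migrations have succeeded. This is a simplified illustration with a hypothetical `VirtualTree` class and a plain object standing in for the disk. It is not the NX dev kit API.

```typescript
type Files = Record<string, string>;

// Hypothetical in-memory tree: reads fall through to the "disk",
// writes are staged until commit() is called.
class VirtualTree {
  private staged: Files = {};
  private disk: Files;

  constructor(disk: Files) {
    this.disk = disk;
  }

  read(path: string): string | undefined {
    return path in this.staged ? this.staged[path] : this.disk[path];
  }

  write(path: string, content: string): void {
    this.staged[path] = content;
  }

  // Copy all staged writes onto the disk in one go.
  commit(): void {
    for (const path of Object.keys(this.staged)) {
      this.disk[path] = this.staged[path];
    }
    this.staged = {};
  }
}

type Migration = (tree: VirtualTree) => void;

// Run all migrations against the virtual tree. If any of them throws,
// nothing is committed and the disk stays exactly as it was.
function runMigrations(disk: Files, migrations: Migration[]): boolean {
  const tree = new VirtualTree(disk);
  try {
    for (const migrate of migrations) migrate(tree);
  } catch (error) {
    return false;
  }
  tree.commit();
  return true;
}

const disk: Files = { "package.json": '{"webpack":"4"}' };
const bumpWebpack: Migration = (tree) =>
  tree.write("package.json", '{"webpack":"5"}');
const brokenStep: Migration = () => {
  throw new Error("unsupported config");
};

const failedRun = runMigrations(disk, [bumpWebpack, brokenStep]);
const afterFailure = disk["package.json"]; // still the original contents
const cleanRun = runMigrations(disk, [bumpWebpack]);
const afterSuccess = disk["package.json"]; // now the migrated contents
```

Because the failing run never reaches `commit()`, the workspace is not left half-migrated, which is the property the "otherwise it reverses the operation" remark is getting at.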
As I mentioned before, some of our clients were on WebPack 4, and when WebPack 5 came out, we basically with some of the NX operators we provided also those migrations… So with really just running a single command it would upgrade them to WebPack 5. It would adjust the scripts that they had in there. So it’s a mostly painless upgrade.
You just sold me on NX. If nothing else you’ve said today, that just sold me, because I’ve spent more hours upgrading WebPack configurations and other things… That’s cool.
Yeah, that is the power behind the plugins… Because I can totally understand, and that’s also why NX has both sides, because you wanna kind of serve both people. Because if you are in the space of more the Lerna type repositories, where everything’s kind of open, you provide your own builders, what NX does there is really just run the package.json scripts that you have. So you can do whatever you want in those scripts.
[45:55] While in the plugin world, we provide those builders already for you. So you configure them. You mention like “Oh, I wanna run it with Rollup, and I wanna have different bundles”, and you do that via some configuration options… Which might seem initially to be more restrictive, but for instance in a corporate environment we saw that to be hugely beneficial, because of the upgrades which we mentioned… Because that allows us to reason about the structure. Because when we write those migration scripts, for instance, we go into that configuration and consume that as data, basically. So we look at “What options did you configure? Did you have a custom WebPack file?” If you have that, we need to make sure to go and look into that file, or simply notify you: “We migrated WebPack to 5, but look, you have these and these files there. Go there and check that those work, because you customized that.” And so, while it is kind of a box where you can move in there, and it’s not like super rigid, [unintelligible 00:46:40.26] It gives you all those benefits in the long run.
We have done Angular migrations for clients which had monorepos with like 200+ developers on there, and migrating all Angular versions across all applications and libraries that are in that project. You can imagine – you cannot really say “Well, for three weeks everyone stops developing, we do the migration, then come back and go ahead with new PRs.” You’re in a dynamic environment, so you create the upgrades while other people continue merging into the main branch… Which is another reason why, for instance, you can run all migrations multiple times. So there’s actually a file that is being produced, which references the scripts that need to be run. So once you merge that, also other open PRs that don’t have the migration yet, they can rebase, run the migration itself, and it would upgrade the changes that they have created meanwhile. That is hugely powerful.
That’s amazing.
Because as far as I know, or at least that’s my experience, in all the plannings and stuff there’s never space for upgrading tooling. It’s always like “Yeah, sure, let’s do that maybe later, or at some point…” Because it doesn’t show the value. And it makes sense, it doesn’t produce features, of course.
You end up fitting it in around the edges…
Exactly.
…and it’s part of a massive maintenance burden. This is all kind of reminding me – so it’s interesting that you tackle that side, and that that comes out of the same thing that gives you the ability to do generators, and things like that… Of having this visibility and this higher-level tech. Because that tackles the two ends of the spectrum - code creation, but then maintenance, which we know is like the unsung… I don’t wanna say “hero.” The part of the iceberg underwater for any software organization, is how much time you spend on maintenance.
Yeah, yeah.
So if you can make migrations and keeping things up to date that easy, that’s a tremendous amount of value.
Exactly, exactly. And that’s also why it’s very appealing, usually, when you – also with NX, you can use it in a lightweight setup, and with a couple commands you just basically install the NX package and you have it running in a Pnpm workspace. So this is super-appealing, because you have full flexibility, you can do whatever you want, but that is just the starting point, usually… Especially if you’re in a more enterprise environment. Because then, as you mentioned, there comes the maintenance, which is all the things we talked about, about those boundary rules… Which is not really helpful initially, but obviously, as you go, as you allocate more teams to more projects, you need to have some sort of mechanism in place to also kind of make sure it doesn’t go out of shape six months in, a year in… Because this is an investment, a monorepo, so you wanna keep it going also in the long run.
Those rules let you put in ratchets, so that you can’t backslide on the quality a little bit.
Yeah.
I really like that. Okay. So what’s next? What are the unsolved problems in the monorepo world that you’re working on with NX, or that you see going on out in the community?
We have actually a roadmap, so if people are curious about the very details, we have – usually on our GitHub repo there’s a discussion section, which has like a pinned entry for the roadmap for NX 15.
We usually have two big breaking releases, if you want, or major releases per year. And “breaking” is actually not correct, because as I mentioned before, we have those automated migrations, so they will bridge you over to the next version, which is also one reason why we have those… So we can make changes that are potentially breaking, but we will migrate you over to the next one, changing configuration stuff.
[50:08] That is one goal for us, to reduce that configuration aspect to some level. And since we kind of integrated with Lerna, we optimize that part a lot. So we have one entry in that changelog, or roadmap, which mentions negative configuration. So we want to be able to reduce repetitive configuration by saying “Okay, I know that all my build files should cache all the TypeScript files”, or something. So you can specify that at the global level, at the root of the monorepo, without having to do it repetitively on every project… And things like that. So we go a lot deeper on that side.
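The idea of declaring a setting once at the root instead of on every project can be sketched as a simple merge of root-level defaults with per-project overrides. The `TargetConfig` shape, the `resolveTarget` helper and the glob patterns are assumptions made for illustration. This is not NX's actual configuration format.

```typescript
// Hypothetical per-target settings; field names are illustrative only.
interface TargetConfig {
  inputs?: string[]; // files whose changes should invalidate the cache
  outputs?: string[]; // folders to store and restore
  cache?: boolean;
}
type Targets = Record<string, TargetConfig>;

// Project-level settings win; anything a project leaves out
// falls back to the defaults declared once at the root.
function resolveTarget(
  rootDefaults: Targets,
  projectTargets: Targets,
  target: string
): TargetConfig {
  return { ...(rootDefaults[target] || {}), ...(projectTargets[target] || {}) };
}

// Declared once at the root of the monorepo.
const rootDefaults: Targets = {
  build: { inputs: ["src/**/*.ts"], outputs: ["dist"], cache: true },
};

// Most projects need no configuration at all.
const uiLibTargets: Targets = {};

// A project with special needs overrides only what differs.
const docsSiteTargets: Targets = {
  build: { inputs: ["src/**/*.ts", "content/**/*.md"] },
};

const uiBuild = resolveTarget(rootDefaults, uiLibTargets, "build");
const docsBuild = resolveTarget(rootDefaults, docsSiteTargets, "build");
```

With this shape, `uiBuild` is exactly the root default, while `docsBuild` keeps the default `outputs` and `cache` values and only replaces `inputs`.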
We also want to improve the basic JavaScript/TypeScript integration in the sense of developing TypeScript packages, because we simply see the importance of it. And NX is kind of TypeScript-first. So you can use it with JavaScript, but from the very beginning we basically generate projects by default with TypeScript; you have to opt out of that if you really want just plain JavaScript. But we want to focus a bit more on also just if someone wants to create a TypeScript-based package repository where they want to publish TypeScript packages to be distributed in those things. So we want to make sure that we have awesome support for that.
And we have some support already, like being able to set up, again, a library with TypeScript, and even SWC compiler if you prefer that one… So things like that. But that is on our roadmap to push a whole lot.
And then there is the whole distribute aspect, which we started three ago, with the caching, which I mentioned… So distribution of the caching across the machines. But that’s really just the beginning. So one thing that we already launched a year ago mostly is the distribution also of the tasks that you run on CI, for instance. Because again, knowing the graph, and then knowing – if you distribute that execution of those tasks, we also have historical data…
So right now, what we do, for instance if you run those tasks in a distributed fashion, we can schedule them and parallelize them in the most optimal way, where you can say “Okay, I have five agents on my CI at disposal”, because you need to parallelize at some point… We can say “We distribute those tasks on all those five agents in an optimal way, such that you utilize them at maximum capacity.”
So not just that like the first agent picks up that task that takes like 30 minutes to build, and all the other five agents kind of wait there idle after like a minute, because their tests are super-quick.
So things like that we’re investing a whole lot… And we have it already running, and it already shows a whole lot of huge benefits… But there’s still some cool ideas which we have to optimize that further. So that is mostly our focus right now…
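One simple way to avoid the "one busy agent, everyone else idle" situation is a greedy longest-first assignment based on historical durations. The sketch below uses that heuristic purely as an illustration. It is not necessarily how NX schedules tasks, it ignores dependencies between tasks, and the task names and durations are made up.

```typescript
interface Task {
  name: string;
  minutes: number; // historical duration of this task
}
interface Agent {
  tasks: string[];
  totalMinutes: number;
}

// Greedy heuristic: take tasks longest first and always give the next one
// to the agent with the least work so far. Not guaranteed to be optimal.
function distribute(tasks: Task[], agentCount: number): Agent[] {
  if (agentCount < 1) throw new Error("need at least one agent");
  const agents: Agent[] = [];
  for (let i = 0; i < agentCount; i++) {
    agents.push({ tasks: [], totalMinutes: 0 });
  }
  const longestFirst = tasks.slice().sort((a, b) => b.minutes - a.minutes);
  for (const task of longestFirst) {
    let leastLoaded = agents[0];
    for (const agent of agents) {
      if (agent.totalMinutes < leastLoaded.totalMinutes) leastLoaded = agent;
    }
    leastLoaded.tasks.push(task.name);
    leastLoaded.totalMinutes += task.minutes;
  }
  return agents;
}

// Made-up historical timings.
const pastDurations: Task[] = [
  { name: "build-shop", minutes: 30 },
  { name: "e2e-shop", minutes: 20 },
  { name: "build-admin", minutes: 12 },
  { name: "test-ui", minutes: 10 },
  { name: "lint-all", minutes: 8 },
];

const plan = distribute(pastDurations, 2);
// The pipeline finishes when the busiest agent finishes.
const makespan = Math.max(...plan.map((agent) => agent.totalMinutes));
```

For these numbers both agents end up with 40 minutes of work out of 80 in total, so neither sits idle while the other finishes the long build.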
Awesome. I feel like I wanna go and try NX now, though our monorepo has a heck of a lot of Python in it, so I don’t know how well it will pull in, but…
There is actually a Python plugin, if I’m not wrong. We have community plugins, and I think I saw some Python plugin coming around at some point. Because NX in general just runs tasks; so you can easily also hook in other types of tooling, because the caching and all those things, the parallelizing of the tasks and the execution happens at a much higher level. So it’s really just triggering some command, if you want, capturing the output and caching that output. So it’s technology-agnostic in theory, right?
Yeah. I don’t know, I mean, is the Python AST - how similar is it to the JavaScript one? Can we do some of those automatic updates and codegen in there as well?
Yeah, totally. I mean, you could create your own extensions to the project graph as well. So we have an API where you can hook in and tell NX “If you do imports in Python, this is how it looks like.” So you can parse them and give that to the NX graph. And once the NX graph has that ability, you can basically leverage that functionality as well.
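Telling a graph what Python imports look like could be sketched as below: scan the source for import statements and turn the ones that point at known projects into edges. The regular expression only handles simple `import x` and `from x import y` lines, and `pythonImports`, `dependencyEdges` and the `Edge` shape are hypothetical helpers. The real NX extension API is not shown here.

```typescript
// Extract top-level module names from simple import lines.
// Assumption: one module per statement; `import a, b` only yields `a`,
// and relative imports such as `from . import x` are ignored.
function pythonImports(source: string): string[] {
  const pattern =
    /^\s*(?:from\s+([A-Za-z_][\w.]*)\s+import\b|import\s+([A-Za-z_][\w.]*))/;
  const modules: string[] = [];
  for (const line of source.split("\n")) {
    const match = pattern.exec(line);
    if (!match) continue;
    const topLevel = (match[1] || match[2]).split(".")[0];
    if (modules.indexOf(topLevel) === -1) modules.push(topLevel);
  }
  return modules;
}

// Hypothetical edge shape for a project graph.
interface Edge {
  source: string;
  target: string;
}

// Keep only imports that resolve to another known project in the workspace.
function dependencyEdges(
  project: string,
  fileContents: string[],
  knownProjects: string[]
): Edge[] {
  const edges: Edge[] = [];
  for (const content of fileContents) {
    for (const moduleName of pythonImports(content)) {
      const isKnown = knownProjects.indexOf(moduleName) !== -1;
      const isNew = !edges.some((edge) => edge.target === moduleName);
      if (moduleName !== project && isKnown && isNew) {
        edges.push({ source: project, target: moduleName });
      }
    }
  }
  return edges;
}

// Made-up Python file belonging to an "api" project.
const apiSource = [
  "import os",
  "from billing.invoices import create_invoice",
  "import shared_utils.dates",
].join("\n");

const edges = dependencyEdges("api", [apiSource], ["api", "billing", "shared_utils"]);
```

The standard-library import (`os`) is dropped because it is not a workspace project, leaving edges from `api` to `billing` and `shared_utils`.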
In terms of the whole ASTs, that is also something you just trigger. Because, for instance, the migrations that we do - it’s not always necessarily using the TypeScript ASTs to manipulate files. Very often it’s even just like the data for instance is JSON, or the configuration; you just parse the JSON, change the keys and write it back to the file system… And that’s obviously something you can just create your YAML parser or something, where you parse your YAML files which you might use with Python. And then – I’m not sure whether Python has some similar things, like an AST or something, but I guess so… So you could hook it into that.
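A configuration-only migration of the "parse the JSON, change the keys, write it back" kind might look like the following. The key names (`builder`, `executor`) and the config shape are chosen purely for illustration, and the function works on strings so that reading and writing the file is left to the caller.

```typescript
type Json = Record<string, unknown>;

// Hypothetical migration: rename the `builder` key to `executor` in every
// target. Safe to run more than once, since already-migrated targets are
// left alone.
function renameBuilderToExecutor(configText: string): string {
  const config = JSON.parse(configText) as { targets?: Record<string, Json> };
  const targets = config.targets || {};
  for (const name of Object.keys(targets)) {
    const target = targets[name];
    if ("builder" in target && !("executor" in target)) {
      target.executor = target.builder;
      delete target.builder;
    }
  }
  return JSON.stringify(config, null, 2);
}

const beforeMigration = JSON.stringify({
  targets: { build: { builder: "my-plugin:webpack", options: {} } },
});

const migratedOnce = renameBuilderToExecutor(beforeMigration);
const migratedTwice = renameBuilderToExecutor(migratedOnce); // no further change
```

Making the migration a no-op on already-migrated input is what allows an open PR to rebase and run the same script again without harm.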
[54:17] It does have an AST, and you can manipulate the AST and do – so yeah, there is definitely some of those same capabilities.
Exactly. Because the automated migration framework we have really just gives you the shell. So you have an entry function which gets called when your migration needs to be run, based on the versions that you have in your monorepo. And then whatever you do in that function, whether you then use some AST parsing of Python rather than TypeScript should be up to you. So you can fill that slot in. There’s nothing magic in there. It’s more our framework.
Super-cool. Well, I’m excited. This is good. Do you have anything else you wanna leave the JS Party listeners with before we wrap up for the day?
Basically, just check out nx.dev, that’s our main website. We’re also currently working hard on improving our documentation, which is always an ongoing process, and we all know how hard that is… But definitely check that out. Also, follow us on @nxdevtools on Twitter. That’s probably the best source to get new info; that’s where we post our videos that we have on our YouTube channel, and content that we push out… So yeah, follow us there, and ping me or us on that Twitter handle if you have questions. Always happy to hear new users, and even users maybe adopting it for Python. We’re always happy to see that, things outside of the JavaScript ecosystem.
I can’t guarantee we’ll adopt it, but I’m definitely gonna take a look. Amazing. Well, thank you so much, Juri, and I think with that we have come to the end. Farewell, all. Take care, and we’ll catch you next week.
Our transcripts are open source on GitHub. Improvements are welcome. 💚