The project uses the following technologies:
- Java 11
- Spring Boot 2.4.1
- Spring Cloud
- Spring Cloud Stream
- Apache Kafka & Apache Zookeeper
- Stanford Core NLP 4.2.0
- MongoDB
- GitHub Actions
- GitHub Packages
The steps defined in the CI workflow are the following (a command-level sketch follows this list):
- Build: compiles all the microservices. This step runs in every pipeline.
- Test: runs the tests in the project.
  - CI on every push: this pipeline runs only unit tests in this step.
  - CI on PRs and the main branch: it runs unit and integration tests.
- Publish artifacts: publishes the artifacts to GitHub Packages.
- Publish docker: builds the Docker images and publishes them to Docker Hub repositories.
- Publish chart: builds a Helm chart and publishes it to gh-pages.
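Roughly, these steps map to commands like the following (a minimal sketch based on the build commands elsewhere in this README; the exact workflow definition may differ, and the `publish` Gradle task in particular is an assumption):

```sh
./gradlew build -PskipTests        # Build: compile all the microservices
./gradlew test                     # Test: unit (and, on PRs/main, integration) tests
./gradlew publish                  # Publish artifacts to GitHub Packages (assumed task name)
(cd docker && ./build-image.sh)    # Publish docker: build and tag the images
PACKAGE=true ./build-all.sh        # Publish chart: package the Helm chart
```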
The workflow is also integrated with GitHub, so these checks must pass on every pull request.
Although this coding test does not implement any security, in a production environment these APIs should have a security layer using JWT tokens or the OAuth2 standard. I would also suggest placing a gateway in front of these services, so the APIs are only exposed through it.
Both APIs are documented with Swagger following the OpenAPI 3 Specification.
- NLP-Processor Swagger: http://localhost:8081/swagger
- Patent-Manager Swagger: http://localhost:8082/swagger
Both APIs expose the basic Actuator endpoints to check whether the services are running (a curl example follows this list):
- NLP-Processor health endpoint: http://localhost:8081/actuator/health
- Patent-Manager health endpoint: http://localhost:8082/actuator/health
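For example, once the services are up you can hit these endpoints with curl; Spring Boot's Actuator reports a small JSON status document:

```sh
curl http://localhost:8081/actuator/health   # NLP-Processor
curl http://localhost:8082/actuator/health   # Patent-Manager
# Expected response when healthy: {"status":"UP"}
```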
All asynchronous communication between microservices is handled by Apache Kafka and ZooKeeper instead of HTTP, since it is more reliable and gives us fault tolerance.
All synchronous communication between microservices is handled by calling the REST APIs over HTTP.
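To inspect the asynchronous traffic you can use Kafdrop (see the exposed ports below) or the Kafka tooling inside the broker container (a sketch; the service name `kafka` and the exact tool location depend on the docker-compose setup and are assumptions here):

```sh
# List the topics the microservices communicate on
docker-compose exec kafka kafka-topics.sh --list --bootstrap-server localhost:9092
```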
In order to build all the components, from the root folder you have to do one of the following (see the build loop sketch after this list):
- Build the Helm chart locally:
  Note: you need to have the kompose and helm CLIs installed.
  Run the script `build-all.sh` in the root folder with the following command:
  `PACKAGE=true BUILD_IMAGES=true ./build-all.sh`
- Compile all microservices and build the Docker images:
  Go to the nlp-processor and patent-manager folders and run the following command in both:
  `./gradlew build && cd docker && ./build-image.sh`
- Compile all microservices:
  Go to the nlp-processor and patent-manager folders and run the following command in both:
  `./gradlew build`
- Compile all microservices, skipping tests:
  Go to the nlp-processor and patent-manager folders and run the following command in both:
  `./gradlew build -PskipTests`
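As a convenience, the per-module options above can be run from the root folder with a small shell loop (a sketch reusing the exact commands listed; adjust the Gradle arguments to the variant you want):

```sh
# Build both microservices from the root folder
for svc in nlp-processor patent-manager; do
  (cd "$svc" && ./gradlew build)   # add -PskipTests to skip the tests
done
```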
From the root folder, run the following command to start the Docker containers with the Docker Hub images:
REGISTRY=rogomdi/ docker-compose up -d
Or if you have built them on your computer:
docker-compose up -d
When you run Docker Compose, these are the ports exposed on your machine (a quick verification sketch follows this list):
- 2181 for Zookeeper
- 27017 for MongoDB
- 9000 for Kafdrop (Kafka UI)
- 9092 for Kafka
- 8081 for NLP-Processor
- 8082 for Patent-Manager
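To quickly verify that everything came up:

```sh
docker-compose ps                            # all containers should report "Up"
curl http://localhost:8082/actuator/health   # {"status":"UP"}
```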
Note: you need to have the kompose and helm CLIs installed.
From the root folder, run the following command to deploy it, building the images locally:
PACKAGE=true BUILD_IMAGES=true ./build-all.sh
Or you can run it with the images uploaded to Docker Hub by the CD pipeline:
REGISTRY=rogomdi/ ./build-all.sh
Install the chart by running: `helm install basf-coding-challenge basf-test-1.0.0-local.tgz`
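Once installed, you can check that the release and its pods are running (the dev namespace matches the commands below):

```sh
helm list
kubectl get pods -n dev
```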
Since we have not configured an Ingress controller, you will need to forward the service ports to access the APIs and Kafdrop. To do that, run the following commands:
- Exposing kafdrop:
  `POD_NAME=$(kubectl get pods -n dev | grep kafdrop | awk '{print $1}')`
  `kubectl port-forward $POD_NAME 9000:9000 -n dev`
- Exposing patent-manager:
  `POD_NAME=$(kubectl get pods -n dev | grep patent-manager | awk '{print $1}')`
  `kubectl port-forward $POD_NAME 8082:8082 -n dev`
- Exposing nlp-processor:
  `POD_NAME=$(kubectl get pods -n dev | grep nlp-processor | awk '{print $1}')`
  `kubectl port-forward $POD_NAME 8081:8081 -n dev`
- Why use Kafka instead of RabbitMQ?
  Since Kafka is designed to deliver thousands of messages at a lower latency than RabbitMQ, it is the appropriate technology here.
- Why use a NoSQL database such as MongoDB?
  As mentioned in the statement, we need to store a lot of data, and the schema for a patent may change in the future. In this case, a NoSQL database is a good option.
- Why use GitHub Actions?
  I selected it because it is easy to configure and fully integrated with GitHub.
- Why have both a synchronous and an asynchronous process?
  In my opinion, the synchronous way is useful for debugging and for processing a few patents. Of course, if you want to process a huge amount of data, the asynchronous way is the best option.
- If I process a ZIP asynchronously, how do I know when the NLP process has finished?
  If you want to look up a patent, you can use the API to request it by its UUID or application, as illustrated after this list. Other, better approaches would be to notify through a WebSocket, or to publish to a Kafka topic once the process is finished and read messages from it.
- Why do you have two microservices in the same repository?
  Since this is a coding challenge, I think it is simpler to look into a single repository instead of cloning three different repositories. In production, each microservice should have its own repository and CI/CD pipelines.
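For illustration, the patent lookup mentioned above could look like the following (the endpoint path here is hypothetical; check the Patent-Manager Swagger documentation for the real one):

```sh
# Hypothetical path; see the Swagger UI at http://localhost:8082/swagger for the actual endpoint
curl http://localhost:8082/api/patents/123e4567-e89b-12d3-a456-426614174000
```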