Maven GPT Web Application

Table of Contents

Background
Solution Outline
TechStack
Building and running
Administration
- Configuration
- Elasticsearch administration
Ideas

Mission statement

This project (Maven GPT) aims to answer common Apache Maven questions, in particular those of developers. Currently, it provides a web service that accepts HTTP GET requests at http://localhost:8080/ask.

The next steps are to add a suitable UI and to host the service on the public Internet. The author hopes to provide a valuable service to the Maven community. Additionally, a small group of people will try to gain a better understanding of common problems with Maven (or at least with its documentation) and pass these findings on to the Maven developer community.

Background

Apache Maven describes itself as a software project management and comprehension tool, provided by the Apache Software Foundation (ASF). In practice, Maven is used to build and test Java software (or software written in other languages of the JVM ecosystem).

There are many sources of information about Maven, such as

- the Maven project site,
- mailing lists,
- an ASF-hosted Confluence and Jira,
- miscellaneous source code repositories (hosted by the ASF, GitHub, and others),
- countless blog articles, conference talks, etc.

However, even for experienced Maven users or developers, it is sometimes hard to answer questions or to provide background information (design decisions, current requirements, good practices, etc.). Some answers and discussions are highly opinionated: what seems to be a great approach in one context could be an antipattern in another.

Solution Outline

Maven GPT currently uses a simple Retrieval Augmented Generation (RAG) approach to answer a question: it combines an off-the-shelf GPT (like OpenAI ChatGPT) with additional, Maven-specific information. Sources marked with ✓ are already used; those marked with ❏ are not yet included:

✓ the ASF Confluence Maven pages,
❏ the Maven documentation,
❏ ASF Jira issues,
❏ the Maven source code.
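The core idea of RAG can be illustrated without any framework. The following is a minimal in-memory sketch, not the project's implementation: in Maven GPT, the similarity search is delegated to Elasticsearch and the embeddings come from an embedding model, whereas here the fragments, their three-dimensional embeddings and the prompt wording are made up. The sketch ranks stored fragments by cosine similarity to the question's embedding and builds the context-based query for the LLM.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, framework-free sketch of the retrieval step in RAG.
final class RagSketch {

    // A documentation fragment with its (assumed, precomputed) embedding vector.
    record Fragment(String text, double[] embedding) {}

    // Cosine similarity: dot(a, b) / (|a| * |b|); 1.0 means "same direction".
    // Zero vectors yield NaN, which a real implementation would have to handle.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the k fragments most similar to the query embedding, best match first.
    static List<Fragment> topK(double[] query, List<Fragment> store, int k) {
        List<Fragment> sorted = new ArrayList<>(store);
        sorted.sort(Comparator.comparingDouble((Fragment f) -> cosine(query, f.embedding())).reversed());
        return sorted.subList(0, Math.min(k, sorted.size()));
    }

    // Builds the "context based query" that would be sent to the LLM.
    static String augmentedPrompt(String question, List<Fragment> context) {
        StringBuilder sb = new StringBuilder("Answer the question using only the following context.\n");
        for (Fragment f : context) {
            sb.append("- ").append(f.text()).append('\n');
        }
        return sb.append("Question: ").append(question).toString();
    }

    public static void main(String[] args) {
        List<Fragment> store = List.of(
                new Fragment("The clean lifecycle removes build output.", new double[] {1, 0, 0}),
                new Fragment("Site generation uses the maven-site-plugin.", new double[] {0, 1, 0}),
                new Fragment("The default lifecycle compiles, tests and packages.", new double[] {0.9, 0.1, 0}));
        double[] queryEmbedding = {1, 0.05, 0}; // assumed embedding of the question
        List<Fragment> context = topK(queryEmbedding, store, 2);
        System.out.println(augmentedPrompt("Which lifecycle removes build output?", context));
    }
}
```

Keeping retrieval (topK) separate from prompt construction mirrors the two arrows from the AI agent in the context view below: one to the vector database, one to the LLM.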

TechStack

Context View

```
@startuml
skinparam handwritten true

!define CLOUDOGUURL https://raw.githubusercontent.com/cloudogu/plantuml-cloudogu-sprites/master
!includeurl CLOUDOGUURL/common.puml
!includeurl CLOUDOGUURL/dogus/cloudogu.puml
!includeurl CLOUDOGUURL/dogus/confluence.puml
!includeurl CLOUDOGUURL/tools/elastic.puml

actor "Maven User/Developer" as user #beige
interface "LLM" as llm

node "localhost" #lightgreen {
TOOL_ELASTIC(es, "Vector\nDatabase")
llm -[hidden]- es
control "Asynchronous\nDocument\nLoader" as dl #orange
control "AI Agent" as agent #orange
}

DOGU_CONFLUENCE(confluence,"ASF Confluence")

agent -[hidden]- dl

dl -right-> es : Upload\nvectorized\nknowledge
dl -down--> confluence : Analyze\nDocumentation

user -down-> agent : query
agent -right-> es : enrich query
agent -right-> llm : context based query

note right of llm #beige
OpenAI API
(or other Cloud
provided or local
hosted LLM)
end note

@enduml
```

The project uses Spring Boot to provide its service.
The underlying LangChain4J library makes it possible to use various Large Language Models (LLMs).
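The benefit of such an abstraction can be sketched without LangChain4J: the agent codes against a small interface, and the concrete provider is selected by configuration. LlmBackend, ModelSelection and the provider names are hypothetical; LangChain4J provides its own model interfaces for this purpose, and the stand-in lambdas below replace real OpenAI or locally hosted models.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical abstraction: the agent depends only on this interface, not on a provider SDK.
interface LlmBackend {
    String complete(String prompt);
}

final class ModelSelection {

    // Picks the implementation for the configured provider name, failing fast on typos.
    static LlmBackend select(String provider, Map<String, LlmBackend> available) {
        LlmBackend backend = available.get(provider);
        if (backend == null) {
            throw new IllegalArgumentException("Unknown LLM provider: " + provider);
        }
        return backend;
    }

    public static void main(String[] args) {
        // Stand-ins for real providers (e.g., a cloud API or a locally hosted model).
        Map<String, LlmBackend> available = Map.of(
                "echo", prompt -> "echo: " + prompt,
                "shout", prompt -> prompt.toUpperCase(Locale.ROOT));
        String provider = args.length > 0 ? args[0] : "echo"; // would normally come from configuration
        System.out.println(select(provider, available).complete("What is a Maven lifecycle?"));
    }
}
```

Switching to another LLM then means registering a different implementation, while the agent code stays unchanged.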

Note: Currently, only OpenAI is used, with an older model. The model and its parameters are configured in src/main/resources/application.properties.
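A vector database (here: Elasticsearch) runs in the background to enable Retrieval Augmented Generation (RAG): before a question is sent to the LLM, relevant fragments of the loaded documentation are retrieved from the vector database and added to the query as context.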
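To give an idea of what the retrieval step looks like on the Elasticsearch side, the following sketch builds the JSON body of an approximate k-nearest-neighbour (kNN) search, as supported by the knn option of the _search API in recent Elasticsearch 8.x versions. The index name maven-gpt is taken from this README; the vector field name (vector here) and the numbers are assumptions, and the requests LangChain4J actually sends may differ. Elasticsearch expects the field to be mapped as a dense_vector and num_candidates to be at least k.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Sketch: JSON body for an Elasticsearch kNN search, to be POSTed to
// http://localhost:9200/maven-gpt/_search. The field name is an assumption.
final class KnnQuery {

    static String body(String vectorField, double[] queryVector, int k, int numCandidates) {
        if (k <= 0 || numCandidates < k) {
            throw new IllegalArgumentException("Require k > 0 and num_candidates >= k");
        }
        // Render the embedding as a JSON array, e.g. [0.12, -0.5, 0.33].
        String vector = Arrays.stream(queryVector)
                .mapToObj(Double::toString)
                .collect(Collectors.joining(", ", "[", "]"));
        return """
                {
                  "knn": {
                    "field": "%s",
                    "query_vector": %s,
                    "k": %d,
                    "num_candidates": %d
                  }
                }""".formatted(vectorField, vector, k, numCandidates);
    }

    public static void main(String[] args) {
        // In practice, the query vector is the embedding of the user's question.
        System.out.println(body("vector", new double[] {0.12, -0.5, 0.33}, 5, 50));
    }
}
```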

Building and running

The following steps describe one way to build and run the project. If you are familiar with Spring Boot, you may find other ways to play around with it.

Prerequisites

Note: Currently, the project is only set up to run locally (on your machine).

- Install Java 21, e.g., via SDKMAN!
- Get an OpenAI API token and make it available to the application:
  - To obtain an OpenAI API token, create an account on the OpenAI platform, set up billing, and generate an API key in the API keys section. You can then use this key to access the OpenAI API.
  - Store the key locally (for your convenience).
  - Provide it for subsequent steps by either
    - adding it to application.properties (not recommended, since the key could easily end up in version control), or
    - creating a dedicated Spring profile, or
    - setting the environment variable OPENAI_API_TOKEN (cf. direnv to store it in the long run).
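If you choose the environment variable, a missing token is easiest to diagnose when it is checked at startup. The following hypothetical sketch (the class and method names are not part of the project) reads OPENAI_API_TOKEN, rejects missing or blank values, and avoids printing the secret. In a Spring Boot application, the variable could alternatively be referenced from a properties file via a ${OPENAI_API_TOKEN} placeholder.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical fail-fast check for the OPENAI_API_TOKEN environment variable.
final class ApiTokenCheck {

    static final String TOKEN_VARIABLE = "OPENAI_API_TOKEN";

    // Takes the environment as a map so the lookup can be tested without real env vars.
    static String resolveToken(Map<String, String> env) {
        return Optional.ofNullable(env.get(TOKEN_VARIABLE))
                .map(String::strip)
                .filter(token -> !token.isEmpty())
                .orElseThrow(() -> new IllegalStateException(
                        TOKEN_VARIABLE + " is not set; export it or use a dedicated Spring profile"));
    }

    public static void main(String[] args) {
        try {
            String token = resolveToken(System.getenv());
            // Never print the full token; a short prefix is enough for confirmation.
            System.out.println("Token found: " + token.substring(0, Math.min(3, token.length())) + "...");
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```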

Download (update) input sources

ASF Confluence Maven

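The following commands mirror (-m) the Maven space of the ASF Confluence, starting from its index page, without ascending to parent directories (--no-parent):

```
mkdir -p download/cwiki
cd download/cwiki
wget -P display/MAVEN -m --no-parent https://cwiki.apache.org/confluence/display/MAVEN/Index
```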
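After the download, it can be useful to check what wget actually mirrored before loading the data. The following sketch simply lists all regular files below download/cwiki; it does not filter on an .html extension because, without wget's --adjust-extension option, Confluence pages such as .../display/MAVEN/Index may be stored without one. The class name is hypothetical, and the project's document loader may select its input differently.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper: collect the files mirrored by wget below a root directory.
final class MirrorScan {

    static List<Path> mirroredFiles(Path root) throws IOException {
        // Files.walk keeps directory handles open, so close the stream via try-with-resources.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                    .sorted()
                    .toList();
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : "download/cwiki");
        if (!Files.isDirectory(root)) {
            System.err.println("No mirror found at " + root.toAbsolutePath());
            return;
        }
        List<Path> files = mirroredFiles(root);
        System.out.println(files.size() + " files mirrored below " + root);
    }
}
```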

Run the Vector Database

Run the backing services (Elasticsearch and Kibana, as defined in docker-compose.yml) in the background:

```
docker compose up -d
```

Load Data into Vector Database

Load the data into the vector store (Elasticsearch). This only needs to be done once after each download or update.

Note: Before reloading the data, delete the existing index (maven-gpt) from Elasticsearch:

```
curl -X DELETE http://localhost:9200/maven-gpt
```

Then run the document loader by activating the loaddata Maven profile:

```
./mvnw spring-boot:run -Ploaddata
```

Once the data is loaded, you should be able to see it in Kibana in the maven-gpt index.
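Loaders for vector stores typically split each page into smaller, overlapping chunks before computing embeddings, so that a retrieved fragment fits into the LLM prompt and sentences at chunk borders are not lost. LangChain4J ships its own document splitters; the character-based splitter below is only a hypothetical illustration of the idea, and the chunk size and overlap are arbitrary example values, not the project's settings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split a text into fixed-size character chunks that overlap.
final class Chunker {

    static List<String> split(String text, int chunkSize, int overlap) {
        if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
            throw new IllegalArgumentException("Require chunkSize > 0 and 0 <= overlap < chunkSize");
        }
        List<String> chunks = new ArrayList<>();
        int step = chunkSize - overlap; // each chunk repeats the last `overlap` characters of its predecessor
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last chunk reached the end of the text
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        String page = "Maven lifecycles consist of phases; plugins bind goals to these phases.";
        split(page, 30, 10).forEach(chunk -> System.out.println("[" + chunk + "]"));
    }
}
```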

Run GPT Engine (AI Agent)

Start the application:

```
./mvnw spring-boot:run
```

Then query the endpoint, e.g.:

```
curl "http://localhost:8080/ask?message=Which%20plugins%20handle%20the%20build%20lifecycle%3F"
```

This should respond with something like:

```
{"result":"The plugins that handle the build lifecycle in Apache Maven are categorized into different groups based on their functionalities. Group 1 consists of core lifecycle plugins such as maven-clean-plugin, maven-compiler-plugin, maven-deploy-plugin, maven-help-plugin, maven-install-plugin, maven-gpg-plugin, maven-resources-plugin, maven-source-plugin, and maven-toolchains-plugin. Group 2 includes site-
```
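The endpoint can also be called from Java. The following sketch uses the JDK's built-in java.net.http.HttpClient (no additional dependencies) and takes care of encoding the question. URLEncoder produces form encoding with + for spaces, so the sketch replaces + with %20, as in the curl example. Only the URL and the message parameter are taken from this README; the class name is hypothetical, and the response body is printed as returned by the service.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Sketch: call the /ask endpoint with a properly encoded question.
final class AskClient {

    static URI askUri(String baseUrl, String question) {
        // URLEncoder encodes a literal '+' as %2B, so replacing the remaining '+' only affects spaces.
        String encoded = URLEncoder.encode(question, StandardCharsets.UTF_8).replace("+", "%20");
        return URI.create(baseUrl + "/ask?message=" + encoded);
    }

    public static void main(String[] args) throws InterruptedException {
        URI uri = askUri("http://localhost:8080", "Which plugins handle the build lifecycle?");
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        } catch (IOException e) {
            // E.g., a ConnectException if the application is not running.
            System.err.println("Could not reach " + uri + ": " + e.getMessage());
        }
    }
}
```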

Testing/Usage

The IntelliJ HTTP request file src/test/http-requests/application.http contains some manual tests and usage examples.

Administration

Configuration

TBD

Elasticsearch administration

The IntelliJ HTTP request file src/test/http-requests/elasticsearch.http provides some useful REST requests for the underlying Elasticsearch engine.
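For quick checks without IntelliJ, similar REST calls can be built with the JDK's HTTP client. This sketch covers three standard Elasticsearch endpoints (document count of the maven-gpt index, deleting that index, and cluster health) and assumes Elasticsearch listens on http://localhost:9200 without authentication, as in the curl command for deleting the index. The class name is hypothetical, and the contents of elasticsearch.http may differ. The main method only sends the read-only requests; the delete request is built but never sent.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

// Hypothetical helper for a few Elasticsearch administration requests.
final class EsAdmin {

    static final String ES = "http://localhost:9200";
    static final String INDEX = "maven-gpt";

    // Number of documents stored in the index.
    static HttpRequest count() {
        return HttpRequest.newBuilder(URI.create(ES + "/" + INDEX + "/_count")).GET().build();
    }

    // Drops the whole index, e.g. before reloading the data.
    static HttpRequest deleteIndex() {
        return HttpRequest.newBuilder(URI.create(ES + "/" + INDEX)).DELETE().build();
    }

    // Cluster health; the server waits up to 30s for at least "yellow" status.
    static HttpRequest health() {
        return HttpRequest.newBuilder(URI.create(ES + "/_cluster/health?wait_for_status=yellow&timeout=30s"))
                .GET().build();
    }

    public static void main(String[] args) throws InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        for (HttpRequest request : List.of(health(), count())) { // read-only requests only
            try {
                HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
                System.out.println(request.method() + " " + request.uri() + " -> " + response.body());
            } catch (IOException e) {
                System.err.println("Elasticsearch not reachable: " + e.getMessage());
            }
        }
    }
}
```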

Ideas

- Load data from other sources, e.g., MojoHaus plugins.
- Generate and verify questions from Stack Overflow.
- Add feedback to the UI (once it is created) for users of the service (collect the feedback in a database and evaluate it regularly).

ascheman/maven-gpt
