__ __
| | ______________ _/ |_ ____ ______
| |/ /\_ __ \__ \\ __\/ _ \/ ___/
| < | | \// __ \| | ( <_> )___ \
|__|_ \ |__| (____ /__| \____/____ >
\/ \/ \/
Collect chunked files over HTTP and stream them into Google Cloud Storage.
You will need two services: a running instance of this one and a client implementation, which is essentially an HTTP server with 3 endpoints. You can find a sample client implementation in Java here.
Each transfer is handled in a "transaction", which you request by calling POST: /api/transaction/run (as described below).
The service then opens a stream to the GCS file (the filename will be the transactionId) in the configured bucket. Afterwards it walks through all the provided chunks, e.g. chunks: 2 will result in 3 calls, /request/{transactionId}/chunk/0, /request/{transactionId}/chunk/1 and /request/{transactionId}/chunk/2, while streaming the responses into the GCS file. At the start and end of such a transaction, the /start and /ack endpoints of the service provider are called.
The service can handle multiple transactions at the same time. Calling the endpoint POST: /api/transaction/run again with the same transactionId triggers no action if that transaction is still running.
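The chunk walk described above can be sketched in a few lines. This is only an illustration, not the service's actual code: chunkUrls and parseChunkPath are hypothetical helper names, and the sketch assumes that chunk paths are appended to the baseUrl of the transaction and that chunks is the highest chunk index (inclusive), as the chunks: 2 example suggests.

```javascript
// Sketch only: lists the chunk URLs the service would request for one transaction.
// Assumption: "chunks" is the highest chunk index (inclusive), per the example above.
function chunkUrls(baseUrl, transactionId, chunks) {
  const urls = [];
  for (let i = 0; i <= chunks; i++) {
    urls.push(`${baseUrl}/request/${encodeURIComponent(transactionId)}/chunk/${i}`);
  }
  return urls;
}

// Hypothetical helper for a client implementation: recognise a chunk request path.
// Returns null for any path that is not a chunk request.
function parseChunkPath(path) {
  const match = /^\/request\/([^/]+)\/chunk\/(\d+)$/.exec(path);
  if (!match) return null;
  return { transactionId: decodeURIComponent(match[1]), chunk: Number(match[2]) };
}

console.log(chunkUrls("http://localhost:8080", "blablabla", 2));
```

A client could use something like parseChunkPath inside its HTTP handler to decide which chunk to serve.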
- Node.js >= 9.x.x (we suggest >= 11.x.x)
As simple as yarn global add kratos-server.
(NOTE: In case you don't have yarn, run npm i -g yarn first.)
kratos-server "./baseConfig.js"
You just have to pass in a config (JSON or JS). A base config is always used, so you only have to overwrite the settings that differ for your setup.
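To illustrate what "overwriting" a base config means, here is a minimal deep-merge sketch. It is an assumption about the general idea, not the service's actual merge logic: mergeConfig and isPlainObject are hypothetical helpers, and the key http.port in the base object is made up for the example (only http.access appears in this README).

```javascript
// Returns true for plain nested config objects (not arrays, not null).
function isPlainObject(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

// Sketch only: values from "overrides" win; nested objects are merged key by key.
function mergeConfig(base, overrides) {
  const result = Object.assign({}, base);
  for (const key of Object.keys(overrides)) {
    const bothObjects = isPlainObject(base[key]) && isPlainObject(overrides[key]);
    result[key] = bothObjects ? mergeConfig(base[key], overrides[key]) : overrides[key];
  }
  return result;
}

// Hypothetical base config; the real base config may look different.
const baseConfig = { http: { port: 1919, access: "*" } };
const myConfig = { http: { access: { "token-for-admin": ["*"] } } };

console.log(JSON.stringify(mergeConfig(baseConfig, myConfig)));
```

With this approach the user config only needs to contain the keys that differ from the base.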
Check out kratos-server -h for other options.
With any HTTP client.
Check out the API quick start or the setup info below.
Basically there are two available API endpoints: POST: /api/transaction/run to run a transaction and GET: /api/transaction/status to check all currently running transactions.
Triggering a sample transaction might look like this:
curl -X POST \
  http://localhost:1919/api/transaction/run \
  -H 'Content-Type: application/json' \
  -d '{
    "transactionId": "blablabla",
    "chunks": 3,
    "baseUrl": "http://localhost:8080",
    "extension": "parquet",
    "contentType": "application/octet-stream"
  }'
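The same call can be made from Node.js with the built-in http module. This is a minimal sketch under a few assumptions: buildRunRequest and runTransaction are hypothetical helper names, host and port are taken from the curl example, and the token (if any) is sent as the raw value of the authorization header.

```javascript
const http = require("http");

// Builds request options and body for POST /api/transaction/run.
// Assumption: the token, if given, is sent as-is in the authorization header.
function buildRunRequest(transaction, token) {
  const body = JSON.stringify(transaction);
  const headers = {
    "Content-Type": "application/json",
    "Content-Length": Buffer.byteLength(body),
  };
  if (token) headers.authorization = token;
  return {
    options: { host: "localhost", port: 1919, path: "/api/transaction/run", method: "POST", headers },
    body,
  };
}

// Sends the request and resolves with the status code and the raw response body.
function runTransaction(transaction, token) {
  const { options, body } = buildRunRequest(transaction, token);
  return new Promise((resolve, reject) => {
    const req = http.request(options, (res) => {
      let data = "";
      res.setEncoding("utf8");
      res.on("data", (chunk) => { data += chunk; });
      res.on("end", () => resolve({ statusCode: res.statusCode, body: data }));
    });
    req.on("error", reject);
    req.end(body);
  });
}
```

Calling runTransaction({ transactionId: "blablabla", chunks: 3, baseUrl: "http://localhost:8080", extension: "parquet", contentType: "application/octet-stream" }) would mirror the curl call above.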
You can monitor this service via Prometheus at /metrics.
This service has built-in access management.
You define tokens as keys in the config's http access object and set the topic names or special rights as string members of the key's array value.
A wildcard * grants all rights.
e.g.
const config = {
  http: {
    access: {
      "token-for-admin": [ "*" ],
      "other-token": "*",
    },
  },
};
When making calls to the HTTP API, the token is provided in the authorization header.
*: allows every operation
Be aware that the default configuration is a wildcard for everything (meaning no token is required). Never expose this service's HTTP interface publicly.
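The access rules above could be checked roughly like this. This is a sketch, not the service's actual implementation: isAllowed is a hypothetical helper, the right name "status" is made up for the example, and representing the default "wildcard for everything" as the plain string "*" is an assumption.

```javascript
// Sketch only: decides whether a token may perform an operation ("right").
function isAllowed(access, token, right) {
  // Assumed shape of the default config: a bare wildcard means no token is required.
  if (access === "*") return true;
  if (!access || typeof access !== "object") return false;
  if (!Object.prototype.hasOwnProperty.call(access, token)) return false;

  // A token's value may be an array of rights or a single string, as in the example config.
  const rights = Array.isArray(access[token]) ? access[token] : [access[token]];
  return rights.includes("*") || rights.includes(right);
}

const access = {
  "token-for-admin": ["*"],
  "other-token": "*",
  "reader-token": ["status"], // hypothetical right name
};

console.log(isAllowed(access, "reader-token", "status"));
```

Checking own properties only keeps lookups such as "constructor" from matching inherited object members.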
It is possible to set a few config parameters (most of them secrets) via environment variables. They will always override the passed configuration file.
ACL_DEFINITIONS="mytoken=topic1,topic2;othertoken=topic3" kratos-server "./config.json"
-> turns into: config.http.access.mytoken = [ "topic1", "topic2" ];
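The ACL_DEFINITIONS format shown above (entries separated by ";", token and rights separated by "=", rights separated by ",") can be parsed as in the following sketch. parseAclDefinitions is a hypothetical helper written for illustration; the service's own parsing may differ, for example in how it treats whitespace or malformed entries.

```javascript
// Sketch only: turns "token=a,b;other=c" into { token: ["a", "b"], other: ["c"] }.
function parseAclDefinitions(value) {
  const access = {};
  if (!value) return access;

  for (const entry of value.split(";")) {
    const separator = entry.indexOf("=");
    if (separator <= 0) continue; // skip empty entries and entries without a token

    const token = entry.slice(0, separator).trim();
    const rights = entry
      .slice(separator + 1)
      .split(",")
      .map((right) => right.trim())
      .filter(Boolean); // drop empty names such as a trailing comma
    access[token] = rights;
  }
  return access;
}

console.log(parseAclDefinitions("mytoken=topic1,topic2;othertoken=topic3"));
```

The result has the same shape as the http access object from the config example, so it could replace that object directly.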
Christian Fröhlingsdorf @chrisfroeh
Built with ❤️ 🍕 and ☕ in Cologne.