Deep Java Library (DJL): A Practical Deep Dive for Java, Python, and Hybrid Teams

Audience: developers who want to use deep learning models without rewriting their stack.
Goal: understand what DJL is, where it fits, and how to get productive quickly—whether you write only Java, only Python, or both.
TL;DR
Deep Java Library (DJL) is an open-source, engine-agnostic deep learning library for Java. It lets you run and serve modern AI models from the JVM (inference and training) while staying in your Java tooling (Maven/Gradle, Spring Boot, observability, deployment pipelines). DJL isn’t “Java trying to replace Python”—it’s a pragmatic bridge: train or fine-tune in Python if you want, then ship inference in Java cleanly, safely, and at scale.
1) What DJL Is (and What It Is Not)
What DJL Is
DJL (Deep Java Library) is a set of Java APIs and runtime components that make it straightforward to:
- Load deep learning models (from local files or model zoos)
- Run inference (CPU/GPU) with strong typing and predictable deployment
- Train/fine-tune models from Java when that’s useful
- Integrate deep learning into JVM applications (services, batch jobs, streaming)
What DJL Is Not
- DJL is not a new deep learning “engine” that competes with PyTorch/TensorFlow at the kernel level.
- DJL is not a requirement if your whole world is already Python and you’re happy deploying Python everywhere.
Instead, DJL is a JVM-friendly façade over proven engines (e.g., PyTorch, TensorFlow, ONNX Runtime, MXNet—availability varies by platform and DJL version). The key value: you interact with a consistent Java API while choosing the engine that matches your model and deployment constraints.
2) Why DJL Exists: The Reality of Production Stacks
Most production systems aren’t “all Python.” They’re:
- Java/Kotlin services (Spring Boot, Micronaut, Quarkus)
- JVM batch pipelines
- Streaming (Kafka, Flink)
- Strong SLAs, mature observability, security processes
Python is phenomenal for research and iteration, but many teams still want:
- Java-native packaging (one deployable artifact)
- JVM observability (metrics/logs/tracing)
- Type safety + maintainability in large codebases
- Enterprise-friendly ops (consistent runtime, fewer moving parts)
DJL is one of the most direct ways to run modern ML models inside that world.
3) The Mental Model: DJL’s Main Building Blocks
You don’t need to memorize everything, but it helps to know the “shape” of DJL.
3.1 Engine: The Backend Runtime
An engine is what actually executes tensor operations. DJL hides engine differences behind a stable Java API.
Practical Implications:
- The same Java code can often run with different engines (with small configuration changes)
- Some models are easier on specific engines (e.g., PyTorch for TorchScript, ONNX Runtime for ONNX)
- Packaging and native dependencies depend on engine choice and CPU/GPU target
3.2 NDArray / NDManager: Tensors and Memory
DJL provides its own tensor abstraction (NDArray) and a memory lifecycle helper (NDManager).
If You’ve Used Python Libraries:
- NDArray is conceptually similar to torch.Tensor or numpy.ndarray
- NDManager is a JVM-friendly answer to “who frees native tensor memory?”
A Practical Rule: treat NDManager like a scoped resource manager. Create arrays inside a scope; close the manager when you’re done.
3.3 Model + Translator: Turning Inputs Into Outputs
In DJL you typically:
- Load a Model (from a directory, URL, or model zoo)
- Create a Predictor<Input, Output>
- Provide a Translator (or use a built-in one) to:
- preprocess inputs (tokenize text, resize images)
- postprocess outputs (decode classes, parse logits)
3.4 Model Zoo: “Give Me a Working Model Now”
DJL’s model zoo concept helps you start fast:
- You pick a model artifact and task
- DJL downloads model files (when allowed) and configures the pipeline
- You get a ready predictor
This is great for learning, demos, and bootstrapping.
3.5 Criteria: The “Contract” for Loading a Model
If you build anything non-trivial with DJL, you’ll see Criteria. Think of it as the manifest of what you want:
- What types go in and out (setTypes)
- Where the model comes from (model zoo, URL, local folder)
- Which engine to prefer
- Which translator to use
- Optional runtime configuration (device, number of threads, etc.)
Why this matters: it forces you to be explicit about assumptions. In production, implicit assumptions are what turn into 3 AM incidents.
3.6 Translator Deep Dive: Where Correctness Lives
Most inference bugs aren’t “the model is wrong”—they’re:
- wrong tokenization
- different normalization constants
- wrong resize/crop logic
- channel order mismatch (RGB vs BGR)
- wrong dtype or shape
In Python, you often hide these details in a preprocessing pipeline. In DJL, the Translator is the explicit, testable place for them.
Practical habit: treat the translator like production code. Give it unit tests and golden vectors.
3.7 Devices: CPU/GPU/Accelerators
DJL represents compute targets as Device instances. Even if you start on CPU, design with a “device is configurable” mindset.
Typical Pattern:
- default to CPU
- allow an env var or config file to select GPU
- keep batch size and concurrency configurable
This is how you avoid hard-coding yourself into a corner.
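In DJL the compute target is an ai.djl.Device (Device.cpu(), Device.gpu()), but the configuration plumbing around it is ordinary Java. A minimal sketch, with INFERENCE_DEVICE as a hypothetical variable name:

```java
// Sketch: resolve the inference device from configuration instead of
// hard-coding it. "INFERENCE_DEVICE" is an illustrative env var name;
// mapping the returned string to ai.djl.Device is left to the caller.
public class DeviceConfig {

    /** Returns the configured device id, falling back to CPU. */
    public static String resolveDevice(String configuredValue) {
        if (configuredValue == null || configuredValue.isBlank()) {
            return "cpu"; // safe default
        }
        String v = configuredValue.trim().toLowerCase();
        // Accept "cpu", "gpu", or "gpu:<index>".
        if (v.equals("cpu") || v.equals("gpu") || v.matches("gpu:\\d+")) {
            return v;
        }
        throw new IllegalArgumentException("Unknown device: " + configuredValue);
    }

    public static void main(String[] args) {
        // In a real service you would read System.getenv("INFERENCE_DEVICE").
        System.out.println(resolveDevice(System.getenv("INFERENCE_DEVICE")));
    }
}
```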
3.8 Training vs Inference: What to Choose
DJL can do both training and inference, but most teams get value fastest by focusing on inference first.
When inference-first is the right call:
- you’re embedding a model into an existing product
- you want to ship features quickly
- you have a Python training pipeline already
When Java-side training makes sense:
- your data and pipelines already live in JVM systems
- you want one stack for ETL + training + deployment
- you need tight integration with Java-only environments
3.9 Engine Selection Guide: Practical, Not Theoretical
Engine selection is where many beginners get stuck. Here’s a simple decision guide:
- Do you already have a model format?
- ONNX → strongly consider ONNX Runtime
- TorchScript → PyTorch engine is often a good fit
- TensorFlow SavedModel → TensorFlow engine (if supported for your target)
- Is portability more important than maximum performance?
- Portability → ONNX is usually the easiest handoff format
- Do you need GPU?
- If yes, confirm the engine + native libraries support your OS/arch (macOS Apple Silicon has different constraints than Linux x86_64)
- Are you embedding inside a Java service?
- Prefer fewer external processes and stable native dependencies
The key: don’t pick an engine by ideology. Pick it by “what model artifact do I have and what platform do I deploy on?”
4) Who Benefits—and How
4.1 If You Know Only Java
DJL is the most “natural” if your primary language is Java.
Benefits:
- Stay in Java: no need to rewrite services in Python to add AI
- Leverage existing architecture: Spring Boot controllers, Kafka consumers, schedulers
- One operational runtime: fewer cross-language deployment concerns
- Type-safe integrations: data contracts can remain consistent across the codebase
Common Use Cases:
- Add image classification to a Java API
- Extract embeddings for semantic search in a JVM pipeline
- Run object detection for an internal tool
- Batch inference over a dataset stored in S3, blob storage, or the filesystem
Mindset Shift
You don’t need to become a deep learning researcher. You can treat models like a dependency:
- “Load model”
- “Call predict”
- “Return result”
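To make the "model as a dependency" mindset concrete, hide the predictor behind a small typed interface so callers never see tensors. Everything below (the interface name, the keyword-based stub) is illustrative; a real implementation would wrap a DJL Predictor internally:

```java
import java.util.List;

// Illustrative facade: the service depends on this interface, not on DJL types.
interface SentimentClassifier extends AutoCloseable {
    String predictLabel(String text);
    @Override void close();
}

// A stub implementation for tests; a real one would hold a DJL Predictor
// and release it in close().
class StubSentimentClassifier implements SentimentClassifier {
    private final List<String> positiveWords = List.of("great", "good", "love");

    @Override
    public String predictLabel(String text) {
        String lower = text.toLowerCase();
        return positiveWords.stream().anyMatch(lower::contains)
                ? "positive" : "negative";
    }

    @Override
    public void close() { /* release model resources here */ }
}
```

The payoff: the rest of the service can be tested against the stub, and swapping models never touches business code.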
4.2 If You Know Only Python
Even if you never write Java, DJL can still be relevant.
Benefits:
- Production handoff: train/experiment in Python, then deploy inference in Java
- Interop via standard formats: export to ONNX/TorchScript so Java can run it
- Stable serving story: many orgs prefer JVM services for long-term ops
Practical Ways Python-Only Developers Use DJL (Without Becoming Java Experts)
- Export trained models in a standard format (ONNX is the common bridge)
- Provide a small “model contract”: input schema, preprocessing steps, expected output format
- Let Java/DJL run inference in production
This can reduce operational friction: the team that owns the Java platform can deploy and monitor the model without needing a full Python runtime in the service.
4.3 If You Know Both Java and Python (the Sweet Spot)
Hybrid teams get the best of both worlds.
Benefits
- Use Python for rapid iteration and training
- Use Java/DJL for stable, observable, scalable inference
- Keep feature engineering + ETL in the JVM where the data pipelines already live
A Realistic Workflow
- Prototype model in Python
- Freeze/export model (TorchScript/ONNX)
- Build a small Java inference module using DJL
- Deploy as:
- a library inside an existing service, or
- a dedicated model microservice
This reduces “translation loss” between research and production.
5) Getting Started (Java Path): Maven Project + Inference
5.1 Prerequisites
- JDK 11 or newer
- Maven or Gradle
5.2 Minimal Maven Dependencies (example)
DJL’s API is separate from engine dependencies. You typically include:
- DJL API
- One engine (e.g., PyTorch engine)
- Optional: a model zoo artifact or dataset utilities depending on your use case
Example (you will adjust versions to your target):
<dependencies>
<dependency>
<groupId>ai.djl</groupId>
<artifactId>api</artifactId>
<version>0.30.0</version>
</dependency>
<!-- Choose ONE engine (example: PyTorch engine) -->
<dependency>
<groupId>ai.djl.pytorch</groupId>
<artifactId>pytorch-engine</artifactId>
<version>0.30.0</version>
</dependency>
<!-- Optional: a basic logger implementation -->
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-simple</artifactId>
<version>2.0.13</version>
</dependency>
</dependencies>
Notes:
- The engine artifact you choose matters. If your model is ONNX, you may prefer ONNX Runtime.
- In real projects, align DJL and engine versions carefully.
5.3 A Minimal “Load Model and Predict” Structure
A common DJL inference flow in Java looks like:
- Create Criteria describing your model and types
- Load a ZooModel
- Create a Predictor
- Call predict
Pseudo-structure:
Criteria<InputType, OutputType> criteria = Criteria.builder()
.setTypes(InputType.class, OutputType.class)
.optModelUrls("...")
.optTranslator(new MyTranslator())
.build();
try (ZooModel<InputType, OutputType> model = criteria.loadModel();
Predictor<InputType, OutputType> predictor = model.newPredictor()) {
OutputType out = predictor.predict(input);
// handle output
}
Don’t worry if this looks “frameworky”—it’s mostly about making model loading and preprocessing explicit.
5.4 A Concrete Example: Image Classification, End-to-End
It’s much easier to learn DJL with a real example you can run. The pattern below is intentionally “boring Java”—no magic, no reflection-heavy frameworks.
What this example does
- Loads a pretrained image classification model
- Reads an image from disk
- Returns the top predicted classes
5.4.1 Add The Right Dependencies
In addition to ai.djl:api, you typically add:
- an engine (example: PyTorch)
- an engine-specific model zoo artifact (so DJL can locate pretrained models)
Example (Maven):
<dependencies>
<dependency>
<groupId>ai.djl</groupId>
<artifactId>api</artifactId>
<version>0.30.0</version>
</dependency>
<dependency>
<groupId>ai.djl.pytorch</groupId>
<artifactId>pytorch-engine</artifactId>
<version>0.30.0</version>
</dependency>
<!-- Enables convenient access to pretrained PyTorch models via the DJL model zoo. -->
<dependency>
<groupId>ai.djl.pytorch</groupId>
<artifactId>pytorch-model-zoo</artifactId>
<version>0.30.0</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-simple</artifactId>
<version>2.0.13</version>
</dependency>
</dependencies>
If you pick a different engine (for example ONNX Runtime), you’ll choose the corresponding engine + model-loading approach.
5.4.2 Java code (single-file demo)
import ai.djl.Application;
import ai.djl.ModelException;
import ai.djl.inference.Predictor;
import ai.djl.modality.Classifications;
import ai.djl.modality.cv.Image;
import ai.djl.modality.cv.ImageFactory;
import ai.djl.repository.zoo.Criteria;
import ai.djl.repository.zoo.ZooModel;
import ai.djl.translate.TranslateException;
import java.io.IOException;
import java.nio.file.Path;
public class ImageClassificationDemo {
public static void main(String[] args) throws IOException, ModelException, TranslateException {
if (args.length != 1) {
System.err.println("Usage: java ImageClassificationDemo <path-to-image>");
System.exit(2);
}
Path imagePath = Path.of(args[0]);
Image img = ImageFactory.getInstance().fromFile(imagePath);
Criteria<Image, Classifications> criteria = Criteria.builder()
.optApplication(Application.CV.IMAGE_CLASSIFICATION)
.setTypes(Image.class, Classifications.class)
// You can add filters to select a specific architecture.
// Filters depend on the model zoo and engine.
.optFilter("layers", "50")
.build();
try (ZooModel<Image, Classifications> model = criteria.loadModel();
Predictor<Image, Classifications> predictor = model.newPredictor()) {
Classifications result = predictor.predict(img);
System.out.println(result.topK(5));
}
}
}
What to learn from this code
- Criteria is your “load contract.”
- try-with-resources is not optional: it’s how you avoid native memory leaks.
- The input/output types (Image → Classifications) are explicit.
5.5 A Second Concrete Example: Embeddings for Semantic Search
Embeddings are one of the most common “I want AI in my Java app” use cases:
- search (“find similar products”)
- deduplication (“are these two tickets basically the same?”)
- recommendations (“users who read this also read…”)
In Python, you might use sentence-transformers. In Java, the goal is the same: turn text into a vector and store it in a vector DB (or even just compute cosine similarity).
Conceptual Pipeline:
- Normalize input text
- Tokenize
- Run the model
- Pool the output into a single vector (mean pooling is common)
- L2-normalize the final embedding
DJL can do this, but the details depend on the exact model and tokenizer. The “lesson” is: treat preprocessing and pooling as part of the model contract.
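The pooling and normalization steps are plain arithmetic you can implement and unit-test without any engine. A minimal sketch, assuming the model returns per-token vectors as float[tokens][dim]:

```java
// Embedding post-processing: mean pooling, L2 normalization, cosine similarity.
// Engine-agnostic on purpose: these helpers can be golden-tested in isolation.
public class EmbeddingMath {

    /** Mean-pool token vectors [tokens][dim] into a single [dim] vector. */
    public static float[] meanPool(float[][] tokenVectors) {
        int dim = tokenVectors[0].length;
        float[] out = new float[dim];
        for (float[] token : tokenVectors) {
            for (int d = 0; d < dim; d++) {
                out[d] += token[d];
            }
        }
        for (int d = 0; d < dim; d++) {
            out[d] /= tokenVectors.length;
        }
        return out;
    }

    /** L2-normalize in place so a dot product equals cosine similarity. */
    public static float[] l2Normalize(float[] v) {
        double norm = 0;
        for (float x : v) norm += x * x;
        norm = Math.sqrt(norm);
        for (int d = 0; d < v.length; d++) v[d] /= (float) norm;
        return v;
    }

    /** Cosine similarity of two already-normalized vectors. */
    public static double cosine(float[] a, float[] b) {
        double dot = 0;
        for (int d = 0; d < a.length; d++) dot += a[d] * b[d];
        return dot;
    }
}
```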
5.6 Production-Shape Guidance: Thread Safety and Predictor Reuse
A common question is: “Can I keep a Predictor in a singleton and call it from multiple requests?”
The safe default is:
- Assume a single Predictor is not thread-safe.
- Either create a predictor per request (simple, sometimes enough), or
- Maintain a small pool (better throughput and less allocation churn).
Simple approach for web APIs:
- Keep the ZooModel as a singleton (model load is expensive)
- Use a ThreadLocal<Predictor<...>> so each thread has its own predictor
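A sketch of that pattern, with DJL types replaced by a generic parameter (in real code P would be ai.djl.inference.Predictor and the supplier would be model::newPredictor):

```java
import java.util.function.Supplier;

// Singleton-model / per-thread-predictor pattern. Each thread lazily
// creates and keeps its own predictor; the shared model stays a singleton.
public class PredictorHolder<P> {

    private final ThreadLocal<P> perThread;

    public PredictorHolder(Supplier<P> newPredictor) {
        this.perThread = ThreadLocal.withInitial(newPredictor);
    }

    public P get() {
        return perThread.get();
    }
}
```

If your service uses a bounded worker pool, this caps the number of predictors at the pool size; with virtual threads or elastic pools, prefer an explicit object pool instead.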
5.7 Testing: Golden Vectors Beat “It Looks Right”
For real systems, do not stop at “it runs.” Add tests that lock down correctness:
- For NLP: known input strings → expected top label or close-enough embedding similarity
- For CV: fixed image file → expected top class
When you export from Python, include those golden vectors in the handoff package. This is how you prevent silent regressions when:
- a tokenizer version changes
- you switch engines
- you upgrade DJL
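A golden-vector check can be a few lines of plain Java; the classify function below is a stand-in for whatever wraps your predictor:

```java
import java.util.Map;
import java.util.function.Function;

// Golden-vector runner: fixed inputs with known expected labels.
// Fails loudly so CI catches silent regressions after upgrades.
public class GoldenVectors {

    public static void check(Map<String, String> golden,
                             Function<String, String> classify) {
        for (var entry : golden.entrySet()) {
            String got = classify.apply(entry.getKey());
            if (!got.equals(entry.getValue())) {
                throw new AssertionError(
                        "Golden vector failed for '" + entry.getKey()
                                + "': expected " + entry.getValue()
                                + ", got " + got);
            }
        }
    }
}
```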
5.8 Dependency Strategy: Pin Versions and Be Intentional
Notebook magics are great for learning, but production should use pinned versions.
Tips:
- Pin DJL and engine versions together.
- Upgrade intentionally and rerun golden-vector tests.
- For container deployments, build images that include everything needed (model artifacts + native libs) so runtime downloads don’t surprise you.
6) Getting Started (Notebook Path): Run Java + DJL Inside Jupyter
This is a great way to learn DJL because you can execute Java incrementally—like a Python notebook.
6.1 What You Need
- JDK 11+
- Jupyter Notebook/Lab
- A Java kernel (IJava)
After installation, Jupyter lives in your virtual environment and the java kernelspec is registered with it.
6.2 Verify Kernel Availability
From your project’s virtual environment:
.venv/bin/jupyter kernelspec list
You should see a java kernel.
6.3 A Notebook-Friendly DJL Dependency Pattern
The “Dive into Deep Learning (DJL)” notebooks often use a magic like:
%maven ai.djl:api:...
%maven ai.djl.mxnet:mxnet-engine:...
That’s a notebook convenience: dependencies are fetched during the session.
For production code, you generally don’t do this—you pin dependencies in Maven/Gradle.
6.4 A First Java Notebook Cell You Should Run
When you’re learning, you want a tiny feedback loop.
Start with a cell like:
System.out.println("Java kernel is alive");
If that prints, you’ve verified:
- the notebook is using the Java kernel
- the kernel can start a JVM
- basic IO works
6.5 Loading DJL Dependencies in a Notebook (the D2L Style)
In the D2L DJL notebooks, you’ll commonly see dependency cells. The idea is:
- download jars at runtime
- add them to the notebook classpath
- import and run DJL
Example:
// DJL API
%maven ai.djl:api:0.20.0
// Logging
%maven org.slf4j:slf4j-simple:2.0.1
Then pick an engine. For example, a notebook might choose MXNet or PyTorch depending on the chapter.
6.6 Why Notebooks Can Feel “Too Easy” (and What to Do About It)
Notebook magics are convenient, but they hide production concerns:
- how versions are pinned
- where model artifacts live
- how network access works in your runtime
If your end goal is a production service, do both:
- learn the concept in the notebook
- then immediately replicate it in a Maven project with pinned dependencies
6.7 A Simple Notebook Correctness Check
Before you invest hours in a chapter, run a quick “can I allocate a tensor?” check:
import ai.djl.ndarray.NDArray;
import ai.djl.ndarray.NDManager;
try (NDManager manager = NDManager.newBaseManager()) {
NDArray a = manager.create(new float[]{1, 2, 3});
System.out.println(a);
}
If that works, you’re past the most common environment issues.
7) Getting Started (Python Path): How Python Users Can Collaborate with DJL
DJL is Java-first, but Python users can still benefit from DJL in a few practical ways.
7.1 Treat DJL as the Java-Side Inference Runtime
If you’re training in Python, the cleanest bridge is to export your model to a standard format and ship that to the Java team.
Common Export Choices:
- ONNX: broad interoperability
- TorchScript: good if your target runtime is PyTorch engine
A “handoff package” that works well in real teams:
- Model file(s) (e.g., model.onnx)
- A short document describing:
- expected input shape and dtype
- preprocessing steps (tokenization, normalization)
- output semantics
- a couple of golden test vectors (input → expected output)
The Java team then uses DJL to load and run it.
7.2 Why This Is Worth It for Python-Only Folks
- You can keep the research loop in Python
- You avoid owning production JVM ops if you don’t want to
- You reduce “works on my notebook” drift by specifying a strict contract
7.3 A Step-by-Step Python→Java Handoff Tutorial (ONNX)
This is the most repeatable workflow I’ve seen across teams.
Step A — Export a Model to ONNX in Python
The exact code depends on your model, but the pattern is consistent:
- put the model in eval() mode
- create a representative dummy input
- export with named inputs/outputs
- pin the opset version that your runtime supports
Example (PyTorch → ONNX):
import torch
model.eval()
dummy = torch.randn(1, 3, 224, 224) # example shape for a CV model
torch.onnx.export(
model,
dummy,
"model.onnx",
input_names=["input"],
output_names=["output"],
dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
opset_version=17,
)
Step B — Create Golden Vectors
Golden vectors are your insurance policy.
For classification:
- 3–5 fixed inputs
- expected top label (or top-5 set)
For embeddings:
- fixed strings
- expected pairwise cosine similarity ranges (not exact floats)
Write them down in a small JSON file so Java can run the same checks.
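Such a file can be tiny; the field names below are illustrative, not a standard schema:

```json
{
  "model": "sentence-encoder-v1",
  "classification": [
    {"input": "the screen cracked on day one", "expected_top_label": "negative"}
  ],
  "embedding_pairs": [
    {"a": "refund my order", "b": "I want my money back",
     "cosine_min": 0.7, "cosine_max": 1.0}
  ]
}
```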
Step C — Document Pre-Processing Precisely
For many models, preprocessing is half the model.
Document:
- resize/crop rules
- normalization constants
- tokenization model/version
- max length, padding/truncation strategy
If Java does preprocessing differently than Python did, you will get different answers even if the model weights are identical.
Step D — Load and Run the ONNX Model in Java (DJL)
On the Java side you:
- add DJL + ONNX engine dependencies
- load the ONNX file as a model
- implement the same preprocessing in a translator
At a high level:
import ai.djl.Model;
import ai.djl.inference.Predictor;
import ai.djl.ndarray.NDList;
import ai.djl.repository.zoo.Criteria;
import ai.djl.repository.zoo.ZooModel;
import java.nio.file.Path;
Criteria<NDList, NDList> criteria = Criteria.builder()
.setTypes(NDList.class, NDList.class)
.optModelPath(Path.of("model.onnx"))
// optEngine("OnnxRuntime") // optional, depending on setup
.build();
try (ZooModel<NDList, NDList> model = criteria.loadModel();
Predictor<NDList, NDList> predictor = model.newPredictor()) {
NDList output = predictor.predict(input);
}
This example uses NDList to keep things generic. In real code, you wrap this behind a typed API so the rest of your service doesn’t speak “tensors.”
Step E — Compare Outputs in CI
Run golden-vector checks in Java CI.
If you can, run the same checks in Python CI. When results diverge, you’ll know whether it’s preprocessing, export, runtime, or model drift.
7.4 A Note About Tokenizers
For NLP models, tokenization is a common source of mismatch.
If the model was trained with a specific tokenizer implementation/version, treat it as part of the artifact. Don’t “reimplement it by hand” unless you’re willing to validate the behavior thoroughly.
7.5 When Not to Use ONNX as the Bridge
ONNX is a great default, but it isn’t universal.
Avoid ONNX as the bridge if:
- your model uses unsupported ops in ONNX
- you need absolute parity with a PyTorch-only feature
- you’re in a rapid-research phase where export friction slows iteration
In those cases, use TorchScript or a serving boundary (Python service) and revisit later.
8) Serving and Deployment: Where DJL Shines
Many teams adopt DJL not just for “inference in a main method,” but for serving.
8.1 Embedding Inference Inside an Existing Java Service
Pros:
- Lowest latency (no extra network hop)
- Simplest architecture
Cons:
- Model upgrades tie to service release cycles
- Resource isolation is harder (CPU/GPU, memory)
8.2 Dedicated Model Service
Pros:
- Independent scaling
- Cleaner operational boundaries
Cons:
- Adds network hop
- Requires API contract design
8.3 DJL Serving (Common in Practice)
DJL Serving is a production-oriented model server built around DJL.
Typical reasons teams prefer it:
- Multi-model management
- Operational knobs (batching, concurrency, GPU use)
- Standard deployment patterns
Set this up once you’re comfortable running basic inference.
9) Performance and Reliability Considerations
9.1 CPU vs GPU
- CPU is simplest to start and often sufficient for moderate workloads.
- GPU helps for larger models or high throughput.
The engine and native libraries determine how GPU support works.
9.2 Memory Management
Deep learning libraries often allocate memory outside the Java heap.
Practical advice:
- Use try-with-resources for models/predictors
- Use NDManager scopes so native memory is reclaimed deterministically
9.3 Observability
A major advantage of “AI inside the JVM” is using standard tooling:
- request latency histograms
- model load time metrics
- error rates and structured logs
- tracing around inference calls
This is often a deciding factor for platform teams.
10) A Recommended Learning Path (Fastest to Competent)
Step 1 — Prove The Runtime
- Run a Java-kernel notebook cell that prints something
- Confirm the kernel starts reliably
Step 2 — Run a Small Prebuilt Model
- Use a model zoo example (image classification or text embedding)
- Focus on the end-to-end pipeline: load → preprocess → predict → postprocess
Step 3 — Make it production-shaped
- Wrap inference in a small Java class with clear input/output types
- Add timing + error handling
- Add a couple of golden tests
Step 4 — Bridge from Python (optional)
- Export a model from Python to ONNX
- Load it in Java using DJL
- Compare outputs against Python for the same test vectors
11) How to Talk About DJL on LinkedIn (Suggested Angle)
If you want a post that resonates with engineering leaders and Java devs:
- Lead with the pain: “We have a JVM platform; AI arrives; now what?”
- Position DJL as a bridge, not a war between languages
- Emphasize operational maturity: deployment, observability, SLAs
- Add one concrete example (e.g., embeddings for search, image classification)
Here’s a short snippet you can adapt:
We didn’t switch our stack to ship AI. We brought AI to our stack.
Deep Java Library (DJL) lets JVM teams run modern deep learning models with Java-first ergonomics—Maven dependencies, typed APIs, and production observability.
Python stays great for research and training, but DJL makes inference and serving feel like a normal part of a Java service.
12) Next: Run a Real D2L-DJL Notebook and Validate %maven Dependencies
Once your environment is installed and ready, the most confidence-building next check is:
- Open a notebook from the d2l-java repo
- Switch the kernel to Java
- Run the first cells that load DJL dependencies
- Run a tiny DJL inference example
A full “run-all-cells” smoke test is worth automating: it surfaces dependency and engine issues early, especially on macOS (Apple Silicon), where native-library constraints differ from Linux x86_64.
13) Troubleshooting Common Issues and Fixes
This section is intentionally practical. If you’re stuck, it’s usually one of these.
13.1 My Notebook Says ‘SyntaxError’ on Java Code
Symptom: you run int x = 21; and Jupyter complains like it’s Python.
Cause: the notebook is using the Python kernel.
Fix:
- In Jupyter/VS Code, change the kernel to Java.
- Confirm with:
jupyter kernelspec list
You should see a java kernelspec.
13.2 The Java Kernel Is Installed but Won’t Start
Common causes:
- java on your PATH points to a JRE, not a JDK
- the JDK modules needed for JShell aren’t present
Quick check:
java --list-modules | grep "jdk.jshell"
If you don’t see jdk.jshell@..., fix your Java installation/path.
13.3 Model Loads on My Machine but Not in CI/Container
This is often due to implicit downloads.
What happens:
- on your dev machine, DJL downloads model artifacts or native libs
- in CI, outbound network access is restricted
Fix patterns:
- vendor model artifacts into the repository or build artifact store
- configure a cache layer (internal artifact repo)
- bake artifacts into the container image
13.4 My Outputs Don’t Match Python
In order of likelihood:
- preprocessing mismatch (normalization/tokenization)
- dtype/shape mismatch
- different model version/weights
- different runtime (ONNX vs PyTorch) with numerically small differences
Fix:
- compare intermediate tensors (right after preprocessing)
- add golden vectors and run them in both environments
13.5 Native Library Errors
Deep learning engines rely on native code. Errors often look like:
- missing .so/.dylib
- incompatible architecture
Fix:
- ensure you’re using the correct engine build for your OS/architecture
- prefer official engine artifacts and avoid copying random native libs around
- if deploying in Docker, build for the target platform (Linux x86_64 vs arm64)
14) DJL vs Alternatives: When to Use What
DJL is a great tool, but the best architecture depends on your constraints.
14.1 DJL Embedded in a Java Service
Best when:
- you need low latency
- you already have a JVM service platform
- you want unified observability and deployment
Trade-offs:
- tighter coupling between app releases and model releases
14.2 Python Model Microservice (FastAPI, etc.)
Best when:
- the model changes frequently
- the team is Python-first
- you need access to cutting-edge Python-only tooling
Trade-offs:
- separate runtime, separate ops surface
- cross-service latency and reliability considerations
14.3 DJL Serving / Dedicated Model Server
Best when:
- you want a model-serving control plane
- you need multi-model management and production knobs
Trade-offs:
- one more component to operate
14.4 Just Call Python from Java (JNI, subprocess)
Sometimes teams do this for speed of integration, but it’s rarely the best long-term option.
Trade-offs:
- complicated failure modes
- hard-to-debug environment drift
Rule of thumb: if the model is going to live for months/years in production, invest in a clean boundary (DJL embedded, DJL Serving, or a dedicated service).
15) A Team Checklist for Success
If you want DJL to go smoothly in a real organization, align on these:
Artifact format
- ONNX or TorchScript?
- where are artifacts stored
Preprocessing contract
- tokenizer version
- normalization constants
- max lengths / padding rules
Golden vectors
- at least a few deterministic test cases
- run in CI
Version pinning
- DJL version
- engine version
- model artifact version
Performance plan
- batch size
- concurrency model
- warmup strategy
Operational plan
- metrics to track (latency, throughput, error rate)
- rollback strategy for model updates
If you do just one thing from this list: do golden vectors. They pay for themselves.















