Repo Link: https://github.com/shiyu-coder/Kronos
🧠 What It Is
Kronos is the first open source foundation model purpose built for the "language" of financial markets: candlestick (K line) sequences. It was trained on data from over 45 global exchanges and released under the MIT License. The paper is on arXiv (2508.02739) and the work was accepted by AAAI 2026.
It is a family of decoder only models. A specialized tokenizer turns continuous OHLCV data into hierarchical discrete tokens, then a large autoregressive Transformer is pre-trained on those tokens to handle a range of quantitative tasks.
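To make "hierarchical discrete tokens" concrete, here is a toy sketch. It is not the Kronos tokenizer. It assumes a simple two-level scheme where a coarse token picks a wide bin for a normalized value and a fine token picks a sub-bin inside it. The function names, bin counts, and value range are all hypothetical.

```python
def hierarchical_tokens(value, low=-1.0, high=1.0, coarse_bins=4, fine_bins=4):
    """Map a normalized value to a (coarse, fine) token pair.

    Toy illustration only: the bin counts and the [low, high] range are assumptions.
    """
    # Clamp so out-of-range values fall into the edge bins.
    clamped = min(max(value, low), high)
    # Position in [0, 1] across the whole range.
    position = (clamped - low) / (high - low)
    total_bins = coarse_bins * fine_bins
    # Index of the finest-resolution bin, capped so position == 1.0 stays in range.
    flat_index = min(int(position * total_bins), total_bins - 1)
    # Coarse token = which wide bin, fine token = which sub-bin inside it.
    return divmod(flat_index, fine_bins)


def decode_tokens(coarse, fine, low=-1.0, high=1.0, coarse_bins=4, fine_bins=4):
    """Return the midpoint of the bin addressed by (coarse, fine)."""
    total_bins = coarse_bins * fine_bins
    flat_index = coarse * fine_bins + fine
    width = (high - low) / total_bins
    return low + (flat_index + 0.5) * width


# A normalized value of +0.30 becomes two small integers, then an approximation.
tokens = hierarchical_tokens(0.30)
print(tokens, decode_tokens(*tokens))
```

The point of the two levels is that a sequence model can predict the coarse token first and refine with the fine token, instead of choosing from one very large flat vocabulary.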
At time of writing the repo had roughly 27,800 stars and 4,800 forks. Star counts change daily, so verify the current number on the repo page if you quote it.
📖 Why It Exists
Most time series foundation models are general purpose. They are built to forecast clean, predictable signals like temperature or electricity demand. Financial markets break those models because K line data is high noise, multi dimensional, and shaped by forces that do not repeat cleanly.
The Kronos team took the opposite approach. Instead of forcing a generic model onto messy market data, they built a model that treats candlesticks as their own language, with their own vocabulary and grammar. The tokenizer learns that vocabulary first. The Transformer then learns to speak it. The result is a single model you can point at many quantitative tasks instead of training a fresh model for every asset.
⚙️ What It Can Actually Do
| You say | What happens |
|---|---|
| Load a Kronos model from Hugging Face | Pulls pre-trained weights and tokenizer with two lines of code |
| Feed it 400 historical candles | Model reads the OHLCV sequence as context |
| Ask for the next 120 periods | Returns a forecast DataFrame of open, high, low, close, volume, amount |
| Set temperature and top_p | Controls how exploratory vs conservative the forecast sampling is |
| Pass a list of DataFrames | predict_batch forecasts many assets in parallel on GPU |
| Point it at your own market data | Finetune the tokenizer and predictor on your domain |
| Run the backtest script | Produces a cumulative return curve vs benchmark |
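The temperature and top_p row is easier to reason about with a small example. The sketch below shows the standard temperature scaling and nucleus (top_p) filtering technique on made-up scores. It is not Kronos's internal sampling code, and the function names and logits are hypothetical.

```python
import math
import random


def apply_temperature(logits, temperature):
    """Softmax over logits / temperature. Lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


def sample_token(logits, temperature=1.0, top_p=0.9, rng=random):
    """Draw one token index after temperature scaling and top_p filtering."""
    probs = top_p_filter(apply_temperature(logits, temperature), top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]  # made-up scores for four candidate tokens
print(top_p_filter(apply_temperature(logits, 1.0), 0.9))  # three candidates survive
print(top_p_filter(apply_temperature(logits, 0.5), 0.9))  # two candidates survive
```

With these scores, halving the temperature concentrates probability on the top candidate, so fewer candidates pass the same top_p cutoff. That is the "conservative" end of the dial.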
✅ Before You Start
Python 3.10 or newer installed
pip working in your environment
A GPU recommended for finetuning (CPU works for small forecasts)
A Hugging Face account is not required to download the public models, but helps
(Finetuning only) Qlib installed and Qlib data prepared
(Optional) A Comet.ml account for experiment tracking, or set use_comet = False
💻 Installation
⚠️ The repo lists Python 3.10+ as the requirement. Older versions are not supported.
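If you want a script to fail early on an unsupported interpreter, a small guard like the one below can help. This is a minimal sketch. The helper name `python_is_supported` is hypothetical and not part of the repo.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the repo


def python_is_supported(version_info=None):
    """Return True when the interpreter meets the minimum version."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor) so patch releases do not matter.
    return tuple(version_info[:2]) >= MIN_PYTHON


if not python_is_supported():
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} found; 3.10+ is required.")
```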
🍎 Mac / 🐧 Linux
Clone the repo:
git clone https://github.com/shiyu-coder/Kronos.git
cd Kronos
(Recommended) Create a virtual environment:
python3 -m venv venv
source venv/bin/activate
Install dependencies:
pip install -r requirements.txt
🪟 Windows
Clone the repo:
git clone https://github.com/shiyu-coder/Kronos.git
cd Kronos
(Recommended) Create a virtual environment:
python -m venv venv
venv\Scripts\activate
Install dependencies:
pip install -r requirements.txt
🚀 Setup Step by Step
Open a Python file or notebook inside the repo folder.
Import the core classes:
from model import Kronos, KronosTokenizer, KronosPredictor
Load a pre-trained tokenizer and model from Hugging Face:
tokenizer = KronosTokenizer.from_pretrained("NeoQuasar/Kronos-Tokenizer-base")
model = Kronos.from_pretrained("NeoQuasar/Kronos-small")
Create the predictor:
predictor = KronosPredictor(model, tokenizer, max_context=512)
Load your candle data into a pandas DataFrame with columns open, high, low, close (volume and amount optional).
Define your lookback window and prediction length.
Call predict and read the returned DataFrame.
⚠️ The max context for Kronos-small and Kronos-base is 512. Keep your lookback at or under that. The predictor auto truncates longer inputs.
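Before calling the predictor it can be worth checking the two constraints above yourself: the required price columns and the context limit. The sketch below works on plain column names and row counts so it runs without the model. The helper names are hypothetical, and the default of 512 applies to Kronos-small and Kronos-base as stated above.

```python
REQUIRED_COLUMNS = ("open", "high", "low", "close")


def missing_required(columns):
    """Return the required price columns that are absent from `columns`."""
    present = set(columns)
    return [name for name in REQUIRED_COLUMNS if name not in present]


def effective_lookback(n_rows, lookback, max_context=512):
    """Clip the requested lookback to the rows available and the model context."""
    if lookback <= 0:
        raise ValueError("lookback must be positive")
    return min(lookback, n_rows, max_context)


print(missing_required(["open", "close", "volume"]))  # columns still needed
print(effective_lookback(n_rows=1000, lookback=600))  # clipped to the context size
```

With a pandas DataFrame you could call `missing_required(df.columns)` and `effective_lookback(len(df), lookback)` before slicing the input window.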
📋 Prompts and Code to Use It
Use case: single asset forecast
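import pandas as pd

# Assumes `predictor` was created as in the setup steps above.
df = pd.read_csv("./data/XSHG_5min_600977.csv")
df['timestamps'] = pd.to_datetime(df['timestamps'])

lookback = 400   # number of historical candles used as context
pred_len = 120   # number of future candles to forecast

# .loc slicing is inclusive, so [:lookback-1] selects exactly `lookback` rows.
x_df = df.loc[:lookback-1, ['open', 'high', 'low', 'close', 'volume', 'amount']]
x_timestamp = df.loc[:lookback-1, 'timestamps']
# Timestamps of the periods to forecast, taken from the rows that follow the context.
y_timestamp = df.loc[lookback:lookback+pred_len-1, 'timestamps']

pred_df = predictor.predict(
    df=x_df,
    x_timestamp=x_timestamp,
    y_timestamp=y_timestamp,
    pred_len=pred_len,
    T=1.0,           # sampling temperature
    top_p=0.9,       # nucleus sampling threshold
    sample_count=1   # number of forecast paths to average
)
print(pred_df.head())
Use case: batch forecast across many assets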
# df1..df3, x_ts1..x_ts3 and y_ts1..y_ts3 are placeholders: prepare each one
# the same way as x_df, x_timestamp and y_timestamp in the single asset example.
df_list = [df1, df2, df3]
x_timestamp_list = [x_ts1, x_ts2, x_ts3]
y_timestamp_list = [y_ts1, y_ts2, y_ts3]

pred_df_list = predictor.predict_batch(
    df_list=df_list,
    x_timestamp_list=x_timestamp_list,
    y_timestamp_list=y_timestamp_list,
    pred_len=pred_len,
    T=1.0,
    top_p=0.9,
    sample_count=1,
    verbose=True
)
⚠️ For batch prediction every series must share the same lookback length and the same pred_len, and each must include open, high, low, close.
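The batch constraints are easy to violate when series come from different sources, so a pre-flight check can save a failed GPU run. The sketch below is a hypothetical helper, not part of the Kronos API. It takes plain column names and lengths so it can run anywhere.

```python
REQUIRED_COLUMNS = ("open", "high", "low", "close")


def check_batch(series_columns, lookback_lengths, horizon_lengths):
    """Return a list of problems; an empty list means the batch looks consistent.

    series_columns:   one list of column names per asset
    lookback_lengths: number of history rows per asset
    horizon_lengths:  number of future timestamps per asset
    """
    if not (len(series_columns) == len(lookback_lengths) == len(horizon_lengths)):
        return ["the three input lists have different lengths"]
    problems = []
    if len(set(lookback_lengths)) > 1:
        problems.append(f"lookback lengths differ: {sorted(set(lookback_lengths))}")
    if len(set(horizon_lengths)) > 1:
        problems.append(f"prediction lengths differ: {sorted(set(horizon_lengths))}")
    for index, columns in enumerate(series_columns):
        missing = [name for name in REQUIRED_COLUMNS if name not in columns]
        if missing:
            problems.append(f"series {index} is missing columns: {missing}")
    return problems


full = ["open", "high", "low", "close"]
print(check_batch([full, full], [400, 400], [120, 120]))
print(check_batch([full, ["open", "close"]], [400, 380], [120, 120]))
```

With pandas inputs you could feed it `[list(d.columns) for d in df_list]`, `[len(d) for d in df_list]`, and `[len(y) for y in y_timestamp_list]`.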
Use case: forecast without volume and amount
The repo ships examples/prediction_wo_vol_example.py for data that has no volume or amount columns. Run it directly as a template.
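The examples in this section slice y_timestamp from rows that already exist in the file, which suits checking a forecast against known history. For a forecast past the end of your data you need to build the future timestamps yourself. The sketch below assumes evenly spaced 5 minute candles. The helper name is hypothetical, and a fixed step ignores session breaks, weekends, and holidays, so adapt it to your market calendar.

```python
from datetime import datetime, timedelta


def future_timestamps(last_timestamp, pred_len, step=timedelta(minutes=5)):
    """Return pred_len timestamps that follow last_timestamp at a fixed step."""
    return [last_timestamp + step * (i + 1) for i in range(pred_len)]


last_seen = datetime(2024, 1, 2, 14, 55)  # timestamp of the most recent candle
upcoming = future_timestamps(last_seen, pred_len=3)
print(upcoming[0], upcoming[-1])
```

The examples pass pandas Series for the timestamps, so wrapping the result as `pd.Series(upcoming)` before passing it as y_timestamp is a reasonable assumption.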
⌨️ Commands Table
| Command | What it does |
|---|---|
| pip install -r requirements.txt | Installs all dependencies |
| pip install pyqlib | Installs Qlib, required for the finetuning pipeline |
| python examples/prediction_example.py | Runs the full forecast plus plot example |
| python examples/prediction_wo_vol_example.py | Forecast example without volume/amount |
| python finetune/qlib_data_preprocess.py | Builds train/val/test pickle files from Qlib data |
| torchrun --standalone --nproc_per_node=NUM_GPUS finetune/train_tokenizer.py | Finetunes the tokenizer on your data |
| torchrun --standalone --nproc_per_node=NUM_GPUS finetune/train_predictor.py | Finetunes the predictor model |
| python finetune/qlib_test.py --device cuda:0 | Runs the backtest on your finetuned model |
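NUM_GPUS in the torchrun rows is a placeholder you replace with a number. If you drive the pipeline from Python, a small helper can fill it in and keep the two finetuning stages in order. This is a sketch with a hypothetical function name. It only builds the command and does not run it.

```python
import shlex


def torchrun_command(script, num_gpus):
    """Build the torchrun argument list for a finetuning script."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return ["torchrun", "--standalone", f"--nproc_per_node={num_gpus}", script]


# Tokenizer first, then predictor, matching the order in the table.
for script in ("finetune/train_tokenizer.py", "finetune/train_predictor.py"):
    print(shlex.join(torchrun_command(script, num_gpus=2)))
```

To execute a stage, the returned list can be passed to `subprocess.run(command, check=True)` from the repo root.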
🗓️ Daily Workflow
A researcher testing Kronos on a new asset typically does this:
Pull fresh OHLCV candles for the asset into a CSV.
Load the pre-trained Kronos-small model once at the top of the script.
Run a forecast on the most recent 400 candles to eyeball whether the shape looks sane.
If the asset behaves very differently from the pre-training data, prepare a Qlib dataset and finetune the tokenizer then the predictor.
Run qlib_test.py to backtest the finetuned model against a benchmark.
Feed the raw signals into a separate portfolio and risk layer before treating anything as tradable.
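When reading a backtest against a benchmark, it helps to know how a cumulative return curve is built. The sketch below compounds simple per-period returns and subtracts the benchmark curve. It is not the logic in qlib_test.py. The function names are hypothetical, and the flat cost per period is a crude stand-in for real transaction costs.

```python
def cumulative_returns(period_returns, cost_per_period=0.0):
    """Compound simple per-period returns into a cumulative return curve."""
    curve, wealth = [], 1.0
    for r in period_returns:
        wealth *= 1.0 + r - cost_per_period
        curve.append(wealth - 1.0)
    return curve


def excess_curve(strategy_returns, benchmark_returns, cost_per_period=0.0):
    """Cumulative strategy return minus cumulative benchmark return, per period."""
    strategy = cumulative_returns(strategy_returns, cost_per_period)
    benchmark = cumulative_returns(benchmark_returns)
    return [s - b for s, b in zip(strategy, benchmark)]


strategy = [0.01, -0.005, 0.02]   # made-up per-period strategy returns
benchmark = [0.005, 0.0, 0.01]    # made-up per-period benchmark returns
print(excess_curve(strategy, benchmark))
print(excess_curve(strategy, benchmark, cost_per_period=0.001))
```

Rerunning with even a small cost shows how quickly an apparent edge can shrink, which is one reason to treat raw signals as an input to a portfolio layer and not as a finished strategy.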
💡 Tips
Start with Kronos-small (24.7M params). It is light and fast enough to iterate on. Move up to Kronos-base (102.3M) only once your pipeline works. Kronos-large is not open sourced.
Kronos-mini pairs with the 2k tokenizer and supports a 2048 context. The small and base models use the base tokenizer with a 512 context. Match the right tokenizer to the right model or loading will misbehave.
Raise sample_count above 1 to average several probabilistic forecast paths instead of trusting a single sample.
The finetune/ code comments were generated by an AI assistant and the maintainers warn they may contain inaccuracies. Trust the code over the comments.
Treat the included backtest as a demo. The repo itself flags that it skips transaction costs, slippage, and risk factor neutralization.
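To see what averaging forecast paths buys you, the sketch below combines several sampled paths step by step and also reports their spread. It assumes you collected the paths yourself, for example as lists of forecast close prices from separate runs. The function name and the numbers are hypothetical.

```python
from statistics import mean, pstdev


def summarize_paths(paths):
    """Average several sampled forecast paths step by step.

    paths: list of equally long lists of forecast values, one list per sample.
    Returns (mean_path, spread_path), where spread is the population std dev.
    """
    if not paths:
        raise ValueError("need at least one path")
    length = len(paths[0])
    if any(len(p) != length for p in paths):
        raise ValueError("all paths must have the same length")
    steps = list(zip(*paths))  # regroup values by forecast step
    return [mean(s) for s in steps], [pstdev(s) for s in steps]


paths = [
    [10.0, 10.2, 10.4],  # made-up sampled close prices
    [10.0, 10.0, 10.0],
    [10.0, 9.8, 9.6],
]
mean_path, spread_path = summarize_paths(paths)
print(mean_path)
print(spread_path)
```

A spread that widens with the horizon, as it does here, is a reminder that the averaged line hides growing disagreement between samples.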
📌 Quick Reference
# Install
git clone https://github.com/shiyu-coder/Kronos.git
cd Kronos
pip install -r requirements.txt
# Run example forecast
python examples/prediction_example.py
# Finetuning pipeline
pip install pyqlib
python finetune/qlib_data_preprocess.py
torchrun --standalone --nproc_per_node=NUM_GPUS finetune/train_tokenizer.py
torchrun --standalone --nproc_per_node=NUM_GPUS finetune/train_predictor.py
python finetune/qlib_test.py --device cuda:0
# Minimal forecast in Python
from model import Kronos, KronosTokenizer, KronosPredictor
tokenizer = KronosTokenizer.from_pretrained("NeoQuasar/Kronos-Tokenizer-base")
model = Kronos.from_pretrained("NeoQuasar/Kronos-small")
predictor = KronosPredictor(model, tokenizer, max_context=512)
pred_df = predictor.predict(
    df=x_df,
    x_timestamp=x_timestamp,
    y_timestamp=y_timestamp,
    pred_len=120,
    T=1.0,
    top_p=0.9,
    sample_count=1
)
By The AI Leverage - Learn and master AI daily

