Stat Arb in Crypto: Real Results from Pairs Trading on Coinbase
The woes of going from theory to execution...
Introduction
Hey everyone,
In today’s post, I want to walk you through a crypto statistical arbitrage (pairs trading) strategy I’ve built and live-traded (on and off) over the past three months.
For those unfamiliar, pairs trading is a market-neutral strategy that involves identifying two cointegrated assets. When their price relationship diverges from the historical norm, we take a position on the spread, betting that the spread will eventually converge back to its mean. If it does, we profit and line our pockets up with cash. This post will dive into PnL, the things I’ve learned, the things I want to do to improve PnL and generally also serves as a little reflection piece for me. Without further ado, Let’s get to it.
Why Cryptocurrencies & Why Coinbase?
I decided to deploy this strategy on Cryptocurrencies because data is more readily available (i used Coinbase’s Market API) and easy to use, and settled on Coinbase as it was easy to create a brokerage account, and its also legal here in Singapore. They also have pretty good support on their discord for their APIs, and are quite responsive to questions! I also wanted to familiarise myself with another brokerage’s trade API, wherein previously for my SPY options strategy, I used MooMoo’s Trade API and let’s just say it was pretty hard to get around it. Coinbase’s Advanced Trade API is much easier to use and understand, and requires less experimentation on my own.
Understanding the Strategy
Without going too in-depth on the math of the strategy, I’m going to try my best to keep it simple and understandable on the workflow.
Run cointegration analysis on top 10 cryptocurrencies by traded volume to find cointegrated pairs.
Take the pair with the smallest p-value and start monitoring prices for divergence.
For instance, if the pair identified was: 'ETH-PERP-INTX', 'SOL-PERP-INTX', 2 separate SQLite databases will be created to log the prices of ETH and SOL every minute.
for ticker in pair:
db_name = f'{ticker}.db'
trades_df = get_price_history(ticker, timeframe='ONE_MINUTE', start_time = None)
create_price_database(db_name, trades_df)
schedule.every().minute.at(":58").do(job_with_threading, ticker1)
schedule.every().minute.at(":58").do(job_with_threading, ticker2)
# Keep the script running
while True:
schedule.run_pending()
time.sleep(1)
Every minute, the trade.py script takes the price series from both databases and computes a rolling linear regression based on returns, computes the Z-Score of the spread, and also computes the beta (hedge ratio) of the pair. If the Z-Score is above/below a predefined threshold: 2/-2, then we would enter a position, either short/long or long/short for both respective tickers. It is quite long so I’ve decided to place this in Appendix.
Next, I have a telegram bot that messages me if the algo helps me to enter any positions. This is just a cute little thing I did to make things more fun for myself.
PnL Analysis
Now, on to what REAlly matters. In the past 3 months of running this algo, I’ve made a total of…
drumroll please!
13.5 usd!
Though, in all seriousness, I started off with a capital of ~120 usd, and targeted a cost per trade of 50usd (on both sides so 100usd in total), utilised a range of leverages (5-10x, 10-20x) and managed to land on a 11.25% increase in account value within 3 months. Pretty cool stuff if you were to ask me.
Some Metrics:
Win Rate: 56.84%
Average Win: 1.042771
Average Loss: -1.042194
Here’s the cumulative PnL by trade:
That drawdown is scary — but honestly, there wasn’t much risk management baked in. If you look closely at the trade logic, I was essentially only exiting positions when the Z-Score was hovering around zero. That left a lot of room for volatility to shake me out — or worse, keep me stuck in a losing trade longer than I should have.
Here’s the distribution of PnL per trade:
As expected, the majority of PnLs clustered around zero — which makes sense given the mean-reverting nature of the strategy. I also should’ve probably just implemented stop losses to avoid the large losses on the left tail.
So What Have I learned?
Going into this project, my goal was never to hit a home run. I mainly wanted to get comfortable with:
Trading APIs (especially Coinbase),
Coding up a real-world stat arb pipeline,
And understanding the math behind cointegration and z-score trading.
That said, if you’re going to build something like this, you might as well aim for profitability, right? 😄
And while the raw returns weren’t crazy cracked, the experience was invaluable. Nothing beats the feeling of kicking back in your seat, relaxing, and seeing your algo make money for you.
I thoroughly enjoyed the process of creating something like this.
Future Steps
There’s still a lot of room for improvement — and honestly, that’s what makes this exciting. Here are a few things I’m planning to work on next:
Risk Management Overhaul
Start with simple stop-losses based on z-score thresholds or fixed dollar amounts.
Add max drawdown limits and dynamic position sizing.
Deploying to the Cloud
Move from local execution to a cloud-based setup (e.g., AWS, GCP, or even a VPS).
This will improve reliability, uptime, and peace of mind.
Building a Strategy Dashboard
Visualize live PnL, z-score spreads, open positions, and performance metrics.
Something intuitive and easy to monitor remotely.
Monetization Experiment ❓
Maybe package the core logic and sell it as a downloadable strategy or template.
Could also explore offering monthly updates or premium content for readers interested in algo trading.
Conclusion
That’s it for today folks! Thanks for reading up till here and supporting me by reading this blog! Till next time :)
- Ian
Appendix: Trade Logic for trade.py
def init_trading(pair):
ticker1 = pair[0]
ticker2 = pair[1]
curr_time = pd.to_datetime(datetime.now()).tz_localize('Asia/Singapore')
db1_name = f'{ticker1}.db'
db2_name = f'{ticker2}.db'
query1 = f'SELECT * from `{db1_name}`'
query2 = f'SELECT * from `{db2_name}`'
con1 = sqlite3.connect(db1_name)
ticker1_df = pd.read_sql(query1, con1)
ticker1_df = ticker1_df[-300:]
ticker1_returns = np.log(ticker1_df['close']).diff().dropna()
ticker1_returns.reset_index(inplace=True,drop=True)
con2 = sqlite3.connect(db2_name)
ticker2_df = pd.read_sql(query2, con2)
ticker2_df = ticker2_df[-300:]
ticker2_returns = np.log(ticker2_df['close']).diff().dropna()
ticker2_returns.reset_index(inplace=True,drop=True)
# ticker1 = X, ticker2 = y
# spread = y[i] - (beta * X[i] + alpha)
spread_srs = rolling_reg(ticker1_returns,ticker2_returns, rolling_window=100)
z_score_srs = compute_z_score(spread_srs['spread'], rolling_window=100)
z_scores = z_score_srs.to_frame(name='z_score')
latest_beta = spread_srs['beta'].iloc[-1]
latest_alpha = spread_srs['alpha'].iloc[-1]
ticker1_lp = ticker1_df['close'].iloc[-1]
ticker2_lp = ticker2_df['close'].iloc[-1]
last_zscore = z_scores['z_score'].iloc[-1]
secondlast_zscore = z_scores['z_score'].iloc[-2]
cb_client = RESTClient(api_key=API_KEY, api_secret=API_SECRET)
in_Trade = True if len(cb_client.list_perps_positions(PERP_UUID)['positions']) > 0 else False
pnl_db = 'pnl.db'
tx_db = 'trades.db'
if in_Trade:
position1_pnl = float(cb_client.list_perps_positions(PERP_UUID)['positions'][0]['aggregated_pnl']['value'])
position2_pnl = float(cb_client.list_perps_positions(PERP_UUID)['positions'][1]['aggregated_pnl']['value'])
total_pnl = position1_pnl + position2_pnl
if (last_zscore >= 0 and secondlast_zscore <= 0) or (last_zscore <= 0 and secondlast_zscore >= 0) or (last_zscore >= -0.2 and last_zscore <= 0.2):
print(f'timestamp: {curr_time} | \nZ-Score: {last_zscore} | Close {ticker1} at {ticker1_lp} | Close {ticker2} at {ticker2_lp}')
orderid1 = cb_client.list_orders(product_ids=ticker1)['orders'][0]['order_id']
orderid2 = cb_client.list_orders(product_ids=ticker2)['orders'][0]['order_id']
print(f'[bold purple]PnL: {position1_pnl=}, {position2_pnl=}, {total_pnl=}[/bold purple]')
print(cb_client.close_position(client_order_id=orderid1, product_id=ticker1))
print(cb_client.close_position(client_order_id=orderid2, product_id=ticker2))
fills1 = cb_client.list_orders(product_ids=ticker1, limit=1)['orders']
fills2 = cb_client.list_orders(product_ids=ticker2, limit=1)['orders']
fill_price1 = float(fills1[0]['average_filled_price'])
fill_price2 = float(fills2[0]['average_filled_price'])
fill_size1 = float(fills1[0]['filled_size'])
fill_size2 = float(fills2[0]['filled_size'])
trades_dict = {
'timestamp' : [curr_time, curr_time],
'side' : ['close', 'close'],
'ticker' : [ticker1, ticker2],
'price' : [fill_price1, fill_price2],
'qty' : [fill_size1, fill_size2]
}
trades_df = pd.DataFrame(trades_dict, index = [0,1])
db_con = sqlite3.connect(tx_db)
trades_df.to_sql(tx_db, db_con, if_exists='append')
print(f'[bold green]| {trades_df} appended to {tx_db}![/bold green]')
pnl_dict = {'timestamp': curr_time, 'pnl' : total_pnl}
pnl_df = pd.DataFrame(pnl_dict, index = [0])
pnl_con = sqlite3.connect(pnl_db)
pnl_df.to_sql('pnl', pnl_con, if_exists='append')
print(f'[bold green]| {pnl_df} appended to {pnl_db}![/bold green]')
tele.exit_trade_msg(last_zscore, total_pnl)
else:
print(f'[purple] timestamp: {curr_time} | Z-Score: {last_zscore} | Chill bro we alr in the trade! let it cook [/purple]')
print(f'x: {ticker1} at {ticker1_lp} | y: {ticker2} at {ticker2_lp}')
print(f'[bold purple]PnL: {position1_pnl=}, {position2_pnl=}, {total_pnl=}[/bold purple]')
tele.in_existing_trade(last_zscore, total_pnl)
elif not in_Trade:
if last_zscore < -2:
print(f'timestamp: {curr_time} | \nZ-Score: {last_zscore} | Beta: {latest_beta} | Alpha: {latest_alpha} |\n[red]Short {ticker1} at {ticker1_lp} | [green]Long {ticker2} at {ticker2_lp}[/green]')
#SHORT TICKER 1
ticker1_info = cb_client.get_product(ticker1)
short_curr_price = float(ticker1_info['price'])
short_base_increment = ticker1_info['base_increment']
#LONG TICKER 2
ticker2_info = cb_client.get_product(ticker2)
long_curr_price = float(ticker2_info['price'])
long_base_increment = ticker2_info['base_increment']
ticker1_size, ticker2_size = calculate_positions(TRADE_AMOUNT, short_curr_price, long_curr_price, short_base_increment, long_base_increment, latest_beta, LEVERAGE)
print(cb_client.market_order_sell(client_order_id=str(uuid.uuid4()), product_id= ticker1, base_size= str(ticker1_size), leverage=str(LEVERAGE)))
print(cb_client.market_order_buy(client_order_id=str(uuid.uuid4()), product_id= ticker2, base_size= str(ticker2_size), leverage=str(LEVERAGE)))
fills1 = cb_client.list_orders(product_ids=ticker1, limit=1)['orders']
fills2 = cb_client.list_orders(product_ids=ticker2, limit=1)['orders']
fill_price1 = float(fills1[0]['average_filled_price'])
fill_price2 = float(fills2[0]['average_filled_price'])
trades_dict = {
'timestamp' : [curr_time, curr_time],
'side' : ['short', 'long'],
'ticker' : [ticker1, ticker2],
'price' : [fill_price1, fill_price2],
'qty' : [ticker1_size, ticker2_size]
}
trades_df = pd.DataFrame(trades_dict, index = [0,1])
if not os.path.isfile(tx_db):
create_price_database(tx_db, trades_df)
else:
db_con = sqlite3.connect(tx_db)
trades_df.to_sql(tx_db, db_con, if_exists='append')
print(f'[bold green]| {trades_df} appended to {tx_db}![/bold green]')
tele.enter_trade_msg(ticker2, fill_price2, ticker2_size, ticker1, fill_price1, ticker1_size, last_zscore)
elif last_zscore > 2:
print(f'timestamp: {curr_time} | \nZ-Score: {last_zscore} | Beta: {latest_beta} | Alpha: {latest_alpha} |\n[green]Long {ticker1} at {ticker1_lp}[/green] | [red]Short {ticker2} at {ticker2_lp}[/red]')
#LONG TICKER 1
ticker1_info = cb_client.get_product(ticker1)
long_curr_price = float(ticker1_info['price'])
long_base_increment = ticker1_info['base_increment']
#SHORT TICKER 2
ticker2_info = cb_client.get_product(ticker2)
short_curr_price = float(ticker2_info['price'])
short_base_increment = ticker2_info['base_increment']
ticker1_size, ticker2_size = calculate_positions(TRADE_AMOUNT, long_curr_price, short_curr_price, long_base_increment, short_base_increment, latest_beta, LEVERAGE)
print(cb_client.market_order_buy(client_order_id=str(uuid.uuid4()), product_id= ticker1, base_size= str(ticker1_size), leverage=str(LEVERAGE)))
print(cb_client.market_order_sell(client_order_id=str(uuid.uuid4()), product_id= ticker2, base_size= str(ticker2_size),leverage=str(LEVERAGE)))
fills1 = cb_client.list_orders(product_ids=ticker1, limit=1)['orders']
fills2 = cb_client.list_orders(product_ids=ticker2, limit=1)['orders']
fill_price1 = float(fills1[0]['average_filled_price'])
fill_price2 = float(fills2[0]['average_filled_price'])
trades_dict = {
'timestamp' : [curr_time, curr_time],
'side' : ['long', 'short'],
'ticker' : [ticker1, ticker2],
'price' : [fill_price1, fill_price2],
'qty' : [ticker1_size, ticker2_size]
}
trades_df = pd.DataFrame(trades_dict, index = [0,1])
if not os.path.isfile(tx_db):
create_price_database(tx_db, trades_df)
else:
db_con = sqlite3.connect(tx_db)
trades_df.to_sql(tx_db, db_con, if_exists='append')
print(f'[bold green]| {trades_df} appended to {tx_db}![/bold green]')
tele.enter_trade_msg(ticker1, fill_price1, ticker1_size, ticker2, fill_price2, ticker2_size, last_zscore)
else:
print(f'Z-Score: {last_zscore} | Beta: {latest_beta} | Alpha: {latest_alpha} | No trades at {curr_time}|')
print(f'x: {ticker1} at {ticker1_lp} | y: {ticker2} at {ticker2_lp}')