write data collection motivation
dragoon committed Nov 20, 2023
1 parent 74a9cd8 commit 7238187
Showing 1 changed file with 15 additions and 3 deletions.
18 changes: 15 additions & 3 deletions readme.md
@@ -7,7 +7,7 @@ scratch for crypto assets.

Here is a high-level overview of what we are going to cover:

1. Data Collection platform
1. Data collection platform
2. Signal generation
3. Backtesting & reporting
4. Unit and integration testing
@@ -16,7 +16,8 @@ Here is a high-level overview of what we are going to cover:
## Data collection platform

The first system we need to build is a data collection platform.
[TODO WHY]
For backtesting, we need to collect at least the best bid and ask prices,
and to implement a trading strategy, we also need other features derived from the order book, so let's cover it briefly.
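
To make the distinction concrete, here is a minimal sketch of the top-of-book values a backtest would need. It assumes each side of the book is held as a mapping from price to quantity. The `top_of_book` helper and the sample prices are hypothetical and not part of the collector code.

```python
from decimal import Decimal


def top_of_book(bids, asks):
    """Return (best_bid, best_ask, spread, mid) for a non-empty book.

    `bids` and `asks` map price -> quantity (an assumed layout).
    """
    best_bid = max(bids)  # highest price a buyer is willing to pay
    best_ask = min(asks)  # lowest price a seller is willing to accept
    spread = best_ask - best_bid
    mid = (best_bid + best_ask) / 2
    return best_bid, best_ask, spread, mid


# Hypothetical sample book, for illustration only.
bids = {Decimal("99.5"): Decimal("2"), Decimal("99.0"): Decimal("5")}
asks = {Decimal("100.5"): Decimal("1"), Decimal("101.0"): Decimal("4")}
print(top_of_book(bids, asks))
```

Anything beyond these four numbers (depth at each level, imbalance between sides, and so on) requires keeping more of the book than just its top.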

### Order book

@@ -260,7 +261,7 @@ async def collect_data(self):
If we get ``asyncio.TimeoutError``, we simply sleep for a constant delay and then try to reconnect.
For any other exception, we sleep with an exponential backoff delay, and exit completely once the number of retries exceeds a pre-configured value.

:warning: It is worth noting, that while network errors are somewhat expected, other exceptions are not,
:warning: NB: while network errors are somewhat expected, other exceptions are not,
and the generic exception handler will swallow everything, even errors in your implementation.
The log monitoring system should be configured to notify the dev team in case of such errors.
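
The retry policy described above could look roughly like the sketch below. It is an illustration, not the actual body of ``collect_data``: the `run_with_retries` name, its parameters, and the choice not to count timeouts as retries are all assumptions. The `sleep` argument is injectable only so the policy can be exercised without waiting.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def run_with_retries(
    connect_and_stream,
    *,
    timeout_delay=1.0,
    base_delay=1.0,
    max_retries=5,
    sleep=asyncio.sleep,
):
    """Run connect_and_stream() until it returns, retrying on failure.

    Returns the list of delays that were slept, which is handy for testing.
    """
    delays = []
    retries = 0
    while True:
        try:
            await connect_and_stream()
            return delays
        except asyncio.TimeoutError:
            # Expected network hiccup: constant delay, not counted as a retry
            # (an assumption of this sketch).
            delay = timeout_delay
        except Exception:
            # Unexpected error: log it with the traceback so that log
            # monitoring can alert on it instead of it being swallowed.
            retries += 1
            logger.exception("unexpected error (retry %d/%d)", retries, max_retries)
            if retries > max_retries:
                raise  # give up once the retry budget is exhausted
            delay = base_delay * 2 ** (retries - 1)  # 1x, 2x, 4x, ...
        delays.append(delay)
        await sleep(delay)
```

Logging with ``logger.exception`` keeps the traceback, which is what makes the alerting recommendation above practical.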

@@ -289,6 +290,7 @@ The ``DepthCacheManager`` interface exposes three configuration parameters:
The first argument I want to talk about is ``refresh_interval``. The current ([1.0.19](https://pypi.org/project/python-binance/1.0.19/)) version
of the _python-binance_ library has a bug that prevents disabling it: the interval falls back to the default (30 minutes) instead.
This is clearly visible when we plot the total number of bids/asks for any asset:

![](assets/refresh_interval.png)

For very liquid assets like BTC, the order book can contain many more bids and asks than the initial 5000 allowed by _Binance_.
@@ -297,5 +299,15 @@ After fixing the issue with refresh interval, the chart looks reasonable like th

![](assets/refresh_interval_fixed.png)

#### Limit
The ``limit`` parameter sets the initial number of orders to retrieve from the order book.
Ideally, we want to retrieve the full order book on startup, and not have this parameter at all,
but the _Binance API_ imposes this limit. The only reason to set it below the maximum of 5000 is that
the Binance API has a complex system of request limits based on the weight of each call,
but this does not apply to websockets.
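
One way to see why ``limit`` only affects the starting point: the snapshot seeds the local book with at most ``limit`` levels, and every later update can add or remove levels. The sketch below is a simplified illustration under that assumption, not the _python-binance_ implementation. `seed_book` and `apply_updates` are hypothetical names, and the sketch assumes that an update with quantity 0 means "remove this price level".

```python
from decimal import Decimal as D


def seed_book(snapshot_levels, limit):
    """Build one side of a local book from the first `limit` snapshot levels."""
    return dict(snapshot_levels[:limit])


def apply_updates(book, updates):
    """Apply (price, quantity) updates in place and return the book.

    Assumption: quantity 0 removes the level; any other quantity sets it.
    """
    for price, qty in updates:
        if qty == 0:
            book.pop(price, None)
        else:
            book[price] = qty
    return book


# Hypothetical bid side: the snapshot is truncated to 2 levels...
snapshot = [(D("100.0"), D("1")), (D("99.5"), D("2")), (D("99.0"), D("3"))]
bids = seed_book(snapshot, limit=2)

# ...but later updates add two new levels and remove one.
apply_updates(bids, [(D("98.5"), D("4")), (D("99.5"), D("0")), (D("98.0"), D("1"))])
print(len(bids))
```

With a small ``limit`` the book starts nearly empty and fills in as updates arrive, so the level count depends on how long the collector has been running.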

For the sake of comparison, here are graphs of the total number of asks and bids for Bitcoin with ``limit`` set to 5000 and 100:




1 comment on commit 7238187

@github-actions

Coverage

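Coverage Report

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| start_data_collector.py | 25 | 25 | 0% | 1–33 |
| datacollector/domain.py | 28 | 0 | 100% | |
| datacollector/repositories/data_repository.py | 6 | 0 | 100% | |
| datacollector/services/collector_service.py | 65 | 11 | 83% | 16, 19–33 |
| datacollector/services/data_process_service.py | 40 | 0 | 100% | |
| datacollector/services/datetime_service.py | 9 | 1 | 89% | 13 |
| TOTAL | 173 | 37 | 79% | |

| Tests | Skipped | Failures | Errors | Time |
|---|---|---|---|---|
| 6 | 0 💤 | 0 ❌ | 0 🔥 | 1.513s ⏱️ |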
