Basic TTL engine #4

Open
Wants to merge 14 commits into base: master
4 changes: 2 additions & 2 deletions .npmignore
@@ -4,6 +4,6 @@
report
tmp/
.env
dist
.cache
.github
.github
docs
2 changes: 1 addition & 1 deletion .prettierrc
@@ -1,5 +1,5 @@
{
"printWidth": 80,
"printWidth": 100,
"tabWidth": 2,
"useTabs": false,
"semi": true,
35 changes: 33 additions & 2 deletions README.md
@@ -34,6 +34,38 @@ const cache = createCache({
});
```

#### Engines and options

| Engine name         | Key | Supported options     |
|---------------------|-----|-----------------------|
| Least Recently Used | LRU | HashTable, size       |
| Time To Live        | TTL | HashTable, defaultTTL |

### Things to know about the TTL engine

Creating a TTL cache instance. See [How it works and architecture](./docs/ttl-engine.md) for details.

```javascript
import { createCache } from 'node-cache-engine';

const cache = createCache({
  engine: 'TTL',
  HashTable: YourCustomHashTable, // optional custom hash table; the default is 'src/dataStructure/HashTable.js'
  defaultTTL: 60 * 60 * 1000, // optional, in milliseconds (1 hour); used when no ttl is passed to add()
});

const ttl = 5 * 60 * 1000; // 5 minutes; ttl must be in milliseconds

cache.add('key', 'value', ttl); // add to the cache (key, value, ttl)
cache.get('key'); // get from the cache; undefined if missing or expired
cache.has('key'); // check whether the key exists and has not expired
cache.remove('key'); // remove from the cache
cache.size(); // number of stored entries, which may include expired ones not yet cleaned up
cache.runGC(); // manually clean up expired items
```


### Creating Custom HashTable
When and why should you create a custom hash table?
The default hash table is implemented with `Map`. If you need more performance than the default provides, you can implement your own (for example, a C++ hash table wrapped as a Node.js addon). I think the default hash table is fine for 1 to 5 million cache entries; if your use case needs more than that, go for a custom hash table.
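A minimal sketch of a custom hash table, assuming the engine only needs the five symbol-keyed methods that `TimeToLive.js` calls (`add`, `get`, `has`, `remove`, `size`) and creates the table with `new HashTable()` and no arguments. The `hashTableProp` object below is a hypothetical stand-in for the symbols the library provides; a real implementation must use the library's own symbols. Wrapping a `Map` gains nothing over the default; the point is the shape of the interface that a faster backing store would fill in.

```javascript
// Hypothetical stand-ins for the library's hash table symbols;
// replace them with the symbols the library actually provides.
const hashTableProp = {
  add: Symbol('add'),
  get: Symbol('get'),
  has: Symbol('has'),
  remove: Symbol('remove'),
  size: Symbol('size'),
};

// Custom hash table exposing the symbol-named methods the engines call.
// A Map keeps the sketch runnable; a native-backed store would replace it.
class CustomHashTable {
  constructor() {
    this.map = new Map();
  }

  [hashTableProp.add](key, value) {
    this.map.set(key, value);
  }

  [hashTableProp.get](key) {
    return this.map.get(key);
  }

  [hashTableProp.has](key) {
    return this.map.has(key);
  }

  [hashTableProp.remove](key) {
    this.map.delete(key);
  }

  [hashTableProp.size]() {
    return this.map.size;
  }
}
```

With the real symbols in place, the class would be passed to the factory as `createCache({ engine: 'TTL', HashTable: CustomHashTable })`.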
@@ -42,6 +74,5 @@ To implement custom hashTable you have to use methods with symbols name provided


#### Next?
* TTL engine.
* TTL combining with LRU engine
* TTL engine combining with LRU engine
* LFU (Least frequently used) engine.
5 changes: 4 additions & 1 deletion babel.config.js
@@ -9,5 +9,8 @@ module.exports = {
},
],
],
plugins: [['@babel/plugin-proposal-optional-chaining']],
plugins: [
['@babel/plugin-proposal-optional-chaining'],
['@babel/plugin-transform-destructuring'],
],
};
Binary file added docs/images/ttl-arct.png
9 changes: 9 additions & 0 deletions docs/ttl-engine.md
@@ -0,0 +1,9 @@
# TTL cache engine Architecture

One of the challenges of TTL cache replacement is cleaning up expired items. To clean efficiently, each new item is partitioned by its expiry time and placed into the time bucket that corresponds to it.
Whenever `get` is called, it checks whether the element exists and whether it has expired. If it has expired, the item is removed and the previous buckets are cleaned.
There is also a `runGC()` method on the TTL cache. It cleans the buckets between the last cleaned time and now.
The TTL engine does not run `runGC()` automatically or on an interval.
Because of the expiry-time partitioning, we do not need to iterate over or look at all items when cleaning. Check the image below for more information on the architecture.
Collaborator

This sentence is misleading.

I would change it to:
Due to the time partitioning, we do not need to iterate over all items for garbage collection, which improves performance. Check the image below for more information on the architecture.


![](./images/ttl-arct.png)
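The bucketing idea can be illustrated with a small, self-contained sketch. It is not the engine's code: a `Set` stands in for the engine's `DoublyLinkedList`, the clock is injected through a hypothetical `now` option so the behaviour can be tested, and `createPartitionedExpiry` is an illustrative name. Each bucket is keyed by the end of the interval its expiry time falls into, so every bucket whose key is at or before the current interval boundary holds only expired items and can be dropped without inspecting them; only the one bucket containing "now" needs a per-item check, the same two steps the PR's `runGC()` takes. The sketch also removes a key from its old bucket when the key is re-added, so cleaning an old bucket cannot evict a newer entry.

```javascript
// Sketch of expiry-time partitioning: items are grouped into buckets by the
// interval their expiry time falls into, so GC drops whole buckets at once.
function createPartitionedExpiry({ interval, now = () => Date.now() }) {
  const store = new Map(); // key -> { value, expiresAt, bucket }
  const buckets = new Map(); // bucket index (interval end, ms) -> Set of keys
  let cleanedUpTo = Math.floor(now() / interval) * interval;

  // Index of the bucket whose interval contains `time` (its end boundary).
  const bucketIndex = time => (Math.floor(time / interval) + 1) * interval;

  function remove(key) {
    const entry = store.get(key);
    if (!entry) return;
    const bucket = buckets.get(entry.bucket);
    if (bucket) bucket.delete(key);
    store.delete(key);
  }

  function add(key, value, ttl) {
    remove(key); // drop any older entry so its bucket no longer refers to the key
    const expiresAt = now() + ttl;
    const index = bucketIndex(expiresAt);
    if (!buckets.has(index)) buckets.set(index, new Set());
    buckets.get(index).add(key);
    store.set(key, { value, expiresAt, bucket: index });
  }

  function get(key) {
    const entry = store.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt < now()) {
      remove(key); // lazy expiry on read
      return undefined;
    }
    return entry.value;
  }

  function runGC() {
    const t = now();
    const lastFullyExpired = Math.floor(t / interval) * interval;
    // Buckets ending at or before `lastFullyExpired` hold only expired keys.
    for (let i = cleanedUpTo; i <= lastFullyExpired; i += interval) {
      const bucket = buckets.get(i);
      if (!bucket) continue;
      for (const key of bucket) store.delete(key);
      buckets.delete(i);
    }
    cleanedUpTo = Math.max(cleanedUpTo, lastFullyExpired);
    // The bucket containing `t` may hold a mix of expired and live keys.
    const current = buckets.get(lastFullyExpired + interval);
    if (current) {
      for (const key of current) {
        if (store.get(key).expiresAt < t) remove(key);
      }
    }
  }

  return { add, get, remove, runGC, size: () => store.size };
}
```

Because GC starts from the last cleaned boundary, its cost depends on the number of intervals elapsed and the items in expired buckets, not on the total number of cached items.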
28 changes: 21 additions & 7 deletions package-lock.json


2 changes: 2 additions & 0 deletions package.json
@@ -48,6 +48,7 @@
"@babel/core": "^7.9.0",
"@babel/node": "^7.8.7",
"@babel/plugin-proposal-optional-chaining": "^7.9.0",
"@babel/plugin-transform-destructuring": "^7.10.1",
"@babel/preset-env": "^7.9.0",
"babel-eslint": "^10.1.0",
"babel-jest": "^25.1.0",
@@ -57,6 +58,7 @@
"jest": "^25.1.0",
"lint-staged": "^10.0.8",
"microbundle": "^0.12.0",
"mockdate": "^3.0.2",
"pre-commit": "^1.2.2",
"prettier": "^1.19.1"
}
8 changes: 5 additions & 3 deletions src/cache.js
@@ -1,18 +1,20 @@
import DefaultHashTable from './dataStructure/HashTable';
import LRU from './engines/LeastRecentlyUsed';
import TTL from './engines/TimeToLive';

function factory({
size = Number.MAX_SAFE_INTEGER,
engine = 'LRU',
HashTable = DefaultHashTable,
defaultTTL,
} = {}) {
switch (engine) {
case 'LRU':
return new LRU({ size, HashTable });
case 'TTL':
return new TTL({ HashTable, defaultTTL });
default:
throw Error(
`Engine : ${engine} is not implemented. Currently we have only 'LRU' engine.`,
);
throw Error(`Engine : ${engine} is not implemented. Engine options are 'LRU', 'TTL'`);
}
}

6 changes: 6 additions & 0 deletions src/cache.test.js
@@ -1,5 +1,6 @@
import createCache from './cache';
import LRU from './engines/LeastRecentlyUsed';
import TTL from './engines/TimeToLive';

describe('cache factory', () => {
it('should create a default cache instance of LRU if no engine is mentioned', () => {
@@ -12,6 +13,11 @@ describe('cache factory', () => {
expect(cache1 instanceof LRU).toBe(true);
});

it('should create ttl cache instance', () => {
const cache1 = createCache({ engine: 'TTL' });
expect(cache1 instanceof TTL).toBe(true);
});

it('should throw error if engine type is not implemented', () => {
expect(() => createCache({ engine: 'NOT_HERE' })).toThrow(
`Engine : NOT_HERE is not implemented`,
149 changes: 149 additions & 0 deletions src/engines/TimeToLive.js
@@ -0,0 +1,149 @@
import DefaultHashTable from '../dataStructure/HashTable';
import DoublyLinkedList from '../dataStructure/DoublyLinkedList';
import * as hashTableProp from '../hashTableSymbol';

function TimeToLive({ HashTable = DefaultHashTable, defaultTTL } = {}) {
const store = new HashTable();
const timePartition = new HashTable();
const timeIndexInterval = 5 * 60 * 1000; // milliseconds.

let lowestTimePartition = Date.now();

this.add = (key, value, ttl = defaultTTL) => {
Collaborator

Just something we might want to consider: if we remove this per-item `ttl` and only support the default TTL, we have more potential for performance optimization.
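One possible reading of this suggestion, sketched under assumptions and not taken from the PR: when every item shares the same TTL, expiry order equals insertion order, so a plain FIFO queue can replace the time buckets and GC only pops expired records from the front. `createFixedTTLCache` and the injected `now` clock are hypothetical names. Re-adding a key appends a new record, and stale records are skipped by comparing their expiry with the one currently stored.

```javascript
// Sketch: with a single, fixed TTL, entries expire in insertion order, so a
// FIFO queue of (key, expiresAt) records is enough to find expired items.
function createFixedTTLCache({ ttl, now = () => Date.now() }) {
  const store = new Map(); // key -> { value, expiresAt }
  const queue = []; // records in insertion (= expiry) order
  let head = 0; // index of the oldest record not yet processed

  function add(key, value) {
    const expiresAt = now() + ttl;
    store.set(key, { value, expiresAt });
    queue.push({ key, expiresAt });
  }

  function runGC() {
    const t = now();
    // Stop at the first unexpired record: every later record expires later.
    while (head < queue.length && queue[head].expiresAt < t) {
      const { key, expiresAt } = queue[head++];
      const entry = store.get(key);
      // Skip stale records left behind when a key was re-added.
      if (entry && entry.expiresAt === expiresAt) store.delete(key);
    }
    // Occasionally drop the processed prefix of the array.
    if (head > 1024 && head * 2 > queue.length) {
      queue.splice(0, head);
      head = 0;
    }
  }

  function get(key) {
    const entry = store.get(key);
    return entry && entry.expiresAt >= now() ? entry.value : undefined;
  }

  return { add, get, runGC, size: () => store.size };
}
```

The trade-off is flexibility: a per-item `ttl` breaks the ordering this relies on, which is why the bucketed design is needed when it is supported.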

if (!ttl || !Number.isInteger(ttl) || ttl <= 0)
throw Error(
'Expected ttl value (should be positive integer). ' +
'you can have to mention it in add method or mention as defaultTTL at constructor',
);

const expireTTL = Date.now() + ttl;
const bucket = getTimeBucket(expireTTL);
bucket.addFirst(key);
const tNode = bucket.getFirstNode();
const payload = { value, ttl: expireTTL, tNode };
Collaborator

`ttl` is misleading here. By definition (https://en.wikipedia.org/wiki/Time_to_live), a TTL is a timespan, but `expireTTL` is a timestamp. So `expireTime`, or `expires` for short, is more precise.

store[hashTableProp.add](key, payload);
Comment on lines +23 to +24
Collaborator

Why buffe
store[hashTableProp.add](key, { value, ttl: expireTTL, tNode });

};

this.get = key => {
const payload = store[hashTableProp.get](key);
if (payload) {
const { ttl, value } = payload;
if (checkIfElementExpire({ ttl, key })) return undefined;
else return value;
}
return undefined;
Comment on lines +29 to +34
Collaborator

if (!payload) {
  return undefined;
}
const { ttl, value } = payload;
return checkIfElementExpire({ ttl, key }) ? undefined : value;

};

this.has = key => {
return (
store[hashTableProp.has](key) &&
!checkIfElementExpire({ ttl: store[hashTableProp.get](key), key })
Collaborator

I believe there is a bug in your test ;-)
`!checkIfElementExpire({ ttl: store[hashTableProp.get](key), key })` needs to be
`!checkIfElementExpire({ ttl: store[hashTableProp.get](key).ttl, key })`, right?
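Why the fix matters: the stored payload is an object, and comparing an object with a number converts the object to `NaN`, so `ttl < Date.now()` is always `false` and `has` reports expired items as present. A minimal sketch with a plain `Map` store and a hypothetical `isExpired` helper (not the engine's code):

```javascript
const store = new Map(); // key -> { value, ttl: expiry timestamp in ms }
store.set('k', { value: 'v', ttl: Date.now() - 1000 }); // expired a second ago

// Hypothetical pure helper: true if the expiry timestamp is in the past.
const isExpired = expiresAt => expiresAt < Date.now();

// Buggy: passes the whole payload object. The object converts to NaN,
// and any comparison with NaN is false, so nothing ever looks expired.
const hasBuggy = key => store.has(key) && !isExpired(store.get(key));

// Fixed: compares the stored expiry timestamp.
const hasFixed = key => store.has(key) && !isExpired(store.get(key).ttl);

console.log(hasBuggy('k')); // true: the expired item looks present
console.log(hasFixed('k')); // false
```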

);
};

this.remove = key => {
if (this.has(key)) {
Collaborator

Use `store[hashTableProp.has](key)` here, because the expiry check is not relevant for `remove`.
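This also matters for correctness once `has` is fixed: `checkIfElementExpire` calls `this.remove`, and `remove` calls `this.has`, which calls `checkIfElementExpire` again before anything is deleted, so an expired key would recurse until the stack overflows. With the current `has` bug, the object-to-`NaN` comparison happens to break that cycle. A minimal sketch of a `remove` that only checks raw presence, using a `Map` for the store and a `Map` of `Set`s for the time buckets; the names and entry shape are hypothetical, not the engine's:

```javascript
// Hypothetical store and bucket index, standing in for the engine's hash tables.
const store = new Map(); // key -> { value, expiresAt, bucket }
const buckets = new Map(); // bucket index -> Set of keys

// remove only needs to know whether the key is stored. Skipping the expiry
// check means it never calls back into the expiry logic (which calls remove).
function remove(key) {
  if (!store.has(key)) return false;
  const { bucket: index } = store.get(key);
  const bucket = buckets.get(index);
  if (bucket) {
    bucket.delete(key);
    if (bucket.size === 0) buckets.delete(index); // drop empty buckets
  }
  store.delete(key);
  return true;
}
```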

const { ttl, tNode } = store[hashTableProp.get](key);
const timeBucket = getTimeBucket(ttl);
timeBucket.remove(tNode);
store[hashTableProp.remove](key);
}
};

this.size = () => {
return store[hashTableProp.size]();
};

this.runGC = () => {
const cleanTo = getBackwardTimeIndex({ time: Date.now(), interval: timeIndexInterval });
Collaborator

Just realized, maybe not for this version but for a future version:
a) Passing the interval every time is not very functional-programming style. We need something more like a `createIntervalHandler` function that takes the `defaultTTL`, calculates the best index, and returns an object with `getPreviousTimeIndex` and `getCurrentTimeIndex` methods based only on the time input.
b) The whole time-partition thing is kind of a data structure ;-)
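A sketch of the suggested `createIntervalHandler`, under assumptions: the comment does not say how the best interval should be derived from `defaultTTL`, so this version takes the interval directly and defaults to the engine's current 5 minutes. `getPreviousTimeIndex` and `getCurrentTimeIndex` are assumed to correspond to the existing `getBackwardTimeIndex` and `getForwardTimeIndex`; `indicesBetween` is a hypothetical counterpart of `getIndexBetween`. It uses `Math.floor` instead of `| 0`, which truncates to a 32-bit integer.

```javascript
// Sketch of the suggested handler: the interval is fixed once, and the
// returned functions depend only on the time they are given.
function createIntervalHandler({ interval = 5 * 60 * 1000 } = {}) {
  // Start of the interval containing `time` (like getBackwardTimeIndex).
  const getPreviousTimeIndex = time => Math.floor(time / interval) * interval;

  // End of the interval containing `time` (like getForwardTimeIndex).
  const getCurrentTimeIndex = time => getPreviousTimeIndex(time) + interval;

  // All indices from `from` to `to`, inclusive (like getIndexBetween).
  function* indicesBetween(from, to) {
    for (let i = from; i <= to; i += interval) yield i;
  }

  return { interval, getPreviousTimeIndex, getCurrentTimeIndex, indicesBetween };
}

const handler = createIntervalHandler({ interval: 10 });
handler.getPreviousTimeIndex(25); // 20
handler.getCurrentTimeIndex(25); // 30
[...handler.indicesBetween(10, 30)]; // [10, 20, 30]
```

The engine would create one handler in its constructor and call it with timestamps only, so `interval` no longer has to be threaded through every helper.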

cleanExpiredBuckets(cleanTo);

const nextCleanBucket = getForwardTimeIndex({ time: Date.now(), interval: timeIndexInterval });
cleanNotExpiredBucket(nextCleanBucket);
};

function getTimeBucket(expireTTL) {
Collaborator

In the context of partitions, this function should be called `getTimePartition`, right?

const timeIndex = getForwardTimeIndex({
time: expireTTL,
interval: timeIndexInterval,
});

if (timePartition[hashTableProp.has](timeIndex)) {
return timePartition[hashTableProp.get](timeIndex);
} else {
const list = new DoublyLinkedList();
timePartition[hashTableProp.add](timeIndex, list);
Comment on lines +74 to +75
Collaborator

`const list` is actually the time partition, and what you call `timePartition` is either the `partitionTable` or `timePartitions`.

return list;
}
}

const checkIfElementExpire = ({ ttl, key }) => {
Collaborator

What I really hate is when a function is called "check" but it is changing data. No one expects that.
This is such a hotspot for me.
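One way to address this, sketched with hypothetical names and a plain `Map` store rather than the engine's hash table: a pure predicate that only answers whether an expiry timestamp has passed, and a separately named function that performs the eviction. The eviction function uses an early return, so the main path is not nested.

```javascript
// Pure: no side effects, safe to call from anywhere (has, get, GC).
function isExpired(expiresAt, now = Date.now()) {
  return expiresAt < now;
}

// Mutating: the name says that it may remove the entry.
function evictIfExpired(store, key, now = Date.now()) {
  const entry = store.get(key);
  if (entry === undefined || !isExpired(entry.expiresAt, now)) return false;
  store.delete(key);
  return true;
}

const store = new Map([
  ['old', { value: 1, expiresAt: 100 }],
  ['new', { value: 2, expiresAt: 300 }],
]);
evictIfExpired(store, 'old', 200); // true, 'old' is removed
evictIfExpired(store, 'new', 200); // false, 'new' is kept
```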

if (ttl < Date.now()) {
Collaborator

Your wish for indentations :-) Why not `if (ttl >= Date.now()) return false;`? Then no indentation is needed.

const timeIndex = getBackwardTimeIndex({
time: ttl,
interval: timeIndexInterval,
});
this.remove(key);
cleanExpiredBuckets(timeIndex);
return true;
}
return false;
};

function cleanExpiredBucket(timeIndex) {
if (timePartition[hashTableProp.has](timeIndex)) {
const tNodes = timePartition[hashTableProp.get](timeIndex);
for (const tNode of tNodes) {
store[hashTableProp.remove](tNode.value);
}
timePartition[hashTableProp.remove](timeIndex);
}
}

function cleanExpiredBuckets(tillTimeIndex) {
const cleanFrom = getForwardTimeIndex({
time: lowestTimePartition,
interval: timeIndexInterval,
});

for (const curTimeIndex of getIndexBetween({
from: cleanFrom,
to: tillTimeIndex,
interval: timeIndexInterval,
})) {
cleanExpiredBucket(curTimeIndex);
}
lowestTimePartition = tillTimeIndex;
}

function cleanNotExpiredBucket(timeIndex) {
if (timePartition[hashTableProp.has](timeIndex)) {
const tNodes = timePartition[hashTableProp.get](timeIndex);
for (const { value: key } of tNodes) {
const { ttl } = store[hashTableProp.get](key);
if (ttl < Date.now()) {
store[hashTableProp.remove](key);
}
}
}
}
}

// time : unix timestamp milliseconds
// interval : milliseconds (better to be factors of 60 (minutes))
function getForwardTimeIndex({ time, interval }) {
const timeParts = (time / interval) | 0;
return timeParts * interval + interval;
Collaborator

Why multiply by the interval? Why not just `timeParts + 1`?
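For comparison, a sketch of the `timeParts + 1` variant with hypothetical helper names: indices become partition numbers instead of timestamps, and a partition number can be converted back to a timestamp when needed (the owner's reply on `getBackwardTimeIndex` gives easier debugging as the reason for using timestamps). The sketch uses `Math.floor`, because `(time / interval) | 0` truncates to a signed 32-bit integer: that is fine for the 5-minute interval, but with present-day timestamps it wraps for intervals under roughly 0.8 s.

```javascript
// Partition numbers instead of timestamps: the partition containing `time`
// is floor(time / interval), and the "forward" partition is the next number.
const getBackwardPartition = (time, interval) => Math.floor(time / interval);
const getForwardPartition = (time, interval) => getBackwardPartition(time, interval) + 1;

// Convert a partition number back to the timestamp where it starts,
// e.g. for logging.
const partitionStart = (partition, interval) => partition * interval;

const interval = 5 * 60 * 1000;
const t = Date.UTC(2020, 5, 1, 12, 7); // 2020-06-01 12:07 UTC
const forward = getForwardPartition(t, interval);
new Date(partitionStart(forward, interval)).toISOString(); // '2020-06-01T12:10:00.000Z'

// Why Math.floor: `| 0` keeps only 32 bits, so large quotients wrap.
const ms = Date.UTC(2020, 5, 1);
((ms / 1) | 0) === Math.floor(ms / 1); // false
```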

}

function getBackwardTimeIndex({ time, interval }) {
const timeParts = (time / interval) | 0;
return timeParts * interval;
Collaborator

Why do you multiply by the interval?

Owner Author

for better debugging.

}

function* getIndexBetween({ from, to, interval }) {
for (let i = from; i <= to; i += interval) yield i;
}

export { getForwardTimeIndex, getBackwardTimeIndex, getIndexBetween };
Collaborator

Exporting functions just for testing is bad.

export default TimeToLive;