Grounding space implementation with a more compact representation in memory #829

Merged on Feb 11, 2025 (43 commits)
Commits
50627fe
Add atom storage
vsbogd Jun 18, 2024
e2b6c6e
Add atom token iterator
vsbogd Jun 20, 2024
d8bbf6f
Add atom index implementation
vsbogd Jun 20, 2024
30a7f5c
Add AtomStorage::get_id() method to get id by atom
vsbogd Jun 20, 2024
d63af33
Stop modifying AtomIndex on query
vsbogd Jun 20, 2024
e7ab038
Add iterator over atoms in index
vsbogd Jun 20, 2024
bcc34fc
Rename AtomStorage::get to AtomStorage::get_atom
vsbogd Jun 20, 2024
3cd6812
Remove unused code
vsbogd Jun 20, 2024
dbe1bba
Rename variables while processing query
vsbogd Sep 16, 2024
5fac20e
Optimize search for a case of the atom which is not in index storage
vsbogd Sep 19, 2024
fce713b
Rename HashAtom to HashableAtom
vsbogd Dec 24, 2024
eb404e0
Implement GroundingSpace using AtomIndex
vsbogd Jun 21, 2024
0b549d6
Remove cloning on insert into AtomIndex, refactor code
vsbogd Dec 26, 2024
b84e4f9
Add TODO about custom key entry matching issue
vsbogd Dec 26, 2024
bc02c03
Split IndexKey on InsertKey and QueryKey because
vsbogd Dec 26, 2024
187caf8
Minor AtomIndex::skip_atom() change
vsbogd Dec 26, 2024
3b366ee
Simplify exact key matching code
vsbogd Dec 26, 2024
18cd93b
Borrow values from AtomIndex when it is possible while iterating
vsbogd Dec 26, 2024
57bd00a
Improve code readability
vsbogd Dec 26, 2024
0f56d3b
Implement AtomTrieNode iterator without collecting items
vsbogd Dec 26, 2024
246c935
Implement Display for AtomStorage and AtomTrieNode
vsbogd Dec 27, 2024
59a0294
Eliminate expression buffer allocation on recursion
vsbogd Dec 27, 2024
f3c89fa
Allow using CustomMatch implementors in queries to the AtomIndex
vsbogd Dec 28, 2024
d8ea01e
Add AtomIndex::remove method
vsbogd Dec 28, 2024
586ebb6
Move AtomIndex implementation into grounding::index module
vsbogd Dec 28, 2024
f232cf9
Fix nightly compiler warnings
vsbogd Dec 28, 2024
8fc6893
Make ExactKey's size equal to the usize's size
vsbogd Jan 28, 2025
86af5a9
Use enum to decrease size of the AtomTrieNode leaf
vsbogd Jan 28, 2025
d2c023d
Allocate custom matchers collection only when it is needed
vsbogd Jan 29, 2025
200c449
Add methods to collect AtomIndex statistics
vsbogd Jan 29, 2025
4cd0d81
Add an example of loading space from a file
vsbogd Jan 30, 2025
e8b9d9e
Fix AtomIndex unit tests
vsbogd Jan 30, 2025
9a91f15
Duplicate atoms in AtomIndex to reproduce old GroundingSpace behavior
vsbogd Jan 31, 2025
4ab3411
Add duplication strategy parameter to GroundingSpace type
vsbogd Jan 31, 2025
a801a8d
Move AtomStorage, AtomTrieNode and AtomIndex into separate modules
vsbogd Feb 3, 2025
98b58ff
Remove unused parameter
vsbogd Feb 3, 2025
30dfa8e
Optimize HoleyVec performance
vsbogd Feb 6, 2025
dc0b755
Make AtomTrie in-memory representation more compact
vsbogd Feb 6, 2025
4b0d764
Merge custom and exact indicies into one, move storage into AtomTrie
vsbogd Feb 7, 2025
5fc75f8
Represent custom/exact matching flags in key explicitly
vsbogd Feb 10, 2025
cd00c4c
Document new data types and methods
vsbogd Feb 11, 2025
8ea91f8
Remove non-hashable key from storage when it is removed from node
vsbogd Feb 11, 2025
56ec0a5
Merge branch 'main' into grounding-space
vsbogd Feb 11, 2025
4 changes: 4 additions & 0 deletions lib/Cargo.toml
@@ -16,6 +16,7 @@ dyn-fmt = "0.4.0"
 itertools = "0.13.0"
 unescaper = "0.1.5"
 unicode_reader = "1.0.2"
+bimap = "0.6.3"

 # pkg_mgmt deps
 xxhash-rust = {version="0.8.7", features=["xxh3"], optional=true }
@@ -24,6 +25,9 @@ serde_json = { version="1.0.116", optional=true }
 semver = { version="1.0", features = ["serde"], optional=true }
 git2 = { version="0.18.3", features=["vendored-libgit2"], optional=true }

+[dev-dependencies]
+ra_ap_profile = "0.0.261"
+
 [lib]
 name = "hyperon"
 path = "src/lib.rs"
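
The `bimap` crate added above provides a bidirectional map. The commit list mentions an atom storage with `AtomStorage::get_id()` and `AtomStorage::get_atom()`, which suggests a two-way mapping between atoms and compact ids. The storage code itself is not part of this excerpt, so the following is only a minimal sketch of that idea using the standard library; `IdStorage`, its methods, and the use of `String` as a stand-in for an atom are all hypothetical and not taken from the PR.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an atom storage: each distinct value gets a
/// small integer id, and lookups work in both directions.
#[derive(Default)]
pub struct IdStorage {
    by_value: HashMap<String, usize>, // value -> id
    by_id: Vec<String>,               // id -> value (the id is the vector index)
}

impl IdStorage {
    /// Returns the id of `value`, inserting it first if it is new.
    pub fn insert(&mut self, value: &str) -> usize {
        if let Some(&id) = self.by_value.get(value) {
            return id;
        }
        let id = self.by_id.len();
        self.by_id.push(value.to_string());
        self.by_value.insert(value.to_string(), id);
        id
    }

    /// Looks up the id of a previously inserted value.
    pub fn get_id(&self, value: &str) -> Option<usize> {
        self.by_value.get(value).copied()
    }

    /// Looks up the value stored under `id`.
    pub fn get_value(&self, id: usize) -> Option<&str> {
        self.by_id.get(id).map(String::as_str)
    }
}
```

Storing each value once and referring to it by id elsewhere is one way an index can stay compact in memory; whether the PR does exactly this is not visible in the diff shown here.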
87 changes: 87 additions & 0 deletions lib/examples/load_space.rs
@@ -0,0 +1,87 @@
+use std::env;
+use std::fs::File;
+use std::io::BufReader;
+use std::time::{SystemTime, Duration};
+use ra_ap_profile::memory_usage;
+
+use hyperon::*;
+use hyperon::metta::text::*;
+use hyperon::space::grounding::*;
+
+#[inline]
+fn now() -> SystemTime {
+    SystemTime::now()
+}
+
+#[inline]
+fn since(time: SystemTime) -> Duration {
+    SystemTime::now().duration_since(time).unwrap()
+}
+
+fn main() -> Result<(), String> {
+    let args: Vec<String> = env::args().collect();
+    println!("args passed: {:?}", args);
+    let filename = match args.get(1) {
+        Some(filename) => filename,
+        None => return Err(format!("Please specify MeTTa file as a first argument")),
+    };
+    let open_error = |err| { format!("Cannot open file: {}, because of error: {}", filename, err) };
+    let file = BufReader::new(File::open(filename).map_err(open_error)?);
+
+    let mut parser = SExprParser::new(file);
+    let tokenizer = Tokenizer::new();
+    let mut space = GroundingSpace::new();
+
+    let before = memory_usage().allocated;
+    let start = now();
+    loop {
+        match parser.parse(&tokenizer)? {
+            Some(atom) => space.add(atom),
+            None => break,
+        }
+    }
+    let duration = since(start);
+    let after = memory_usage().allocated;
+    println!("loading time {:?}", duration);
+    println!("memory usage: {}", after - before);
+
+    let query = match args.get(2) {
+        Some(query) => SExprParser::new(query).parse(&tokenizer)?
+            .expect(format!("Incorrect atom: {}", query).as_str()),
+        None => expr!("no_match"),
+    };
+
+    let start = now();
+    let result = space.query(&query);
+    let duration = since(start);
+    println!("{} -> {}, time {:?}", query, result, duration);
+
+    // FILE: gaf/edges.metta
+    // QUERY: (go_gene_product (ontology_term GO:0002377) (protein A0A075B6H8))
+    //use hyperon::space::grounding::index::storage::AtomStorage;
+    //use hyperon::space::grounding::index::trie::{AllowDuplication, AtomTrie, AtomTrieNode, AtomTrieNodeContent};
+
+    //println!("Atom size {}", std::mem::size_of::<Atom>());
+    //println!("AtomTrieNode size {}", std::mem::size_of::<AtomTrieNode>());
+    //println!("AtomTrieNodeContent size {}", std::mem::size_of::<AtomTrieNodeContent<AllowDuplication>>());
+
+    //println!("atom storage count: {}", space.index.storage.count());
+    //let mut storage = AtomStorage::default();
+    //let before = memory_usage().allocated;
+    //std::mem::swap(&mut space.index.storage, &mut storage);
+    //drop(storage);
+    //let after = memory_usage().allocated;
+    //println!("atom storage mem: {}", before - after);
+
+    //println!("atom index node count: {:?}", space.index.trie.stats());
+    //let mut trie = AtomTrie::default();
+    //let before = memory_usage().allocated;
+    //std::mem::swap(&mut space.index.trie, &mut trie);
+    //drop(trie);
+    //let after = memory_usage().allocated;
+    //println!("atom index mem: {}", before - after);
+
+    //println!("{}", space.query(&query));
+
+    Ok(())
+}
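
The example above measures elapsed time with `SystemTime`, whose `duration_since` returns an error if the system clock is adjusted backwards between the two readings, hence the `unwrap()`. As an alternative sketch (this is not what the PR does), `std::time::Instant` is a monotonic clock intended for this kind of measurement and needs no `unwrap()`. The `timed` helper below is a hypothetical name introduced only for illustration.

```rust
use std::time::{Duration, Instant};

/// Runs `f` once and returns its result together with the elapsed time.
/// `Instant` is monotonic, so `elapsed()` cannot fail the way
/// `SystemTime::duration_since` can.
pub fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = f();
    (result, start.elapsed())
}
```

In the example this would read roughly as `let (result, duration) = timed(|| space.query(&query));`, replacing the `now()`/`since()` pair.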
16 changes: 6 additions & 10 deletions lib/src/atom/matcher.rs
@@ -881,15 +881,6 @@ impl<'a> Iterator for BindingsIter<'a> {
     }
 }

-impl<'a> IntoIterator for &'a Bindings {
-    type Item = (&'a VariableAtom, Atom);
-    type IntoIter = BindingsIter<'a>;
-
-    fn into_iter(self) -> Self::IntoIter {
-        self.iter()
-    }
-}
-

 /// Represents a set of [Bindings] instances resulting from an operation where multiple matches are possible.
 #[derive(Clone, Debug)]
@@ -979,10 +970,15 @@ impl BindingsSet {
         BindingsSet(smallvec::smallvec![])
     }

-    /// Creates a new unconstrained BindingsSet
+    /// Creates a new BindingsSet with a single full match
     pub fn single() -> Self {
         BindingsSet(smallvec::smallvec![Bindings::new()])
     }
+
+    /// Creates a new BindingsSet with `count` full matches
+    pub fn count(count: usize) -> Self {
+        BindingsSet(smallvec::SmallVec::from_elem(Bindings::new(), count))
+    }

     /// Returns `true` if a BindingsSet contains no Bindings Objects (fully constrained)
     ///
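
The doc comments in this hunk distinguish three cases: a set with no bindings (nothing matched), `single()` (one full match) and the new `count(n)` (`n` full matches). The sketch below restates that with simplified stand-in types so it compiles without the `hyperon` or `smallvec` crates. The empty `Bindings` struct, the `Vec` backing store, and the `empty`/`len` helper names are assumptions made for illustration, not the library's definitions.

```rust
/// Placeholder for hyperon's `Bindings`; a fresh value imposes no constraints.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct Bindings;

/// Simplified stand-in for `BindingsSet`, backed by `Vec` instead of `SmallVec`.
#[derive(Clone, Debug, PartialEq)]
pub struct BindingsSet(Vec<Bindings>);

impl BindingsSet {
    /// No bindings at all: nothing matched.
    pub fn empty() -> Self {
        BindingsSet(Vec::new())
    }

    /// One unconstrained `Bindings`: a single full match.
    pub fn single() -> Self {
        BindingsSet(vec![Bindings::default()])
    }

    /// `count` unconstrained `Bindings`: the same full match repeated.
    pub fn count(count: usize) -> Self {
        BindingsSet(vec![Bindings::default(); count])
    }

    pub fn len(&self) -> usize {
        self.0.len()
    }

    pub fn is_empty(&self) -> bool {
        self.0.is_empty()
    }
}
```

Under these assumptions `count(1)` equals `single()` and `count(0)` equals the empty set. A constructor like this is plausibly useful when the same atom is stored several times and a query should report one match per copy, which would fit the commit "Duplicate atoms in AtomIndex to reproduce old GroundingSpace behavior", though the excerpt does not show the call site.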
16 changes: 15 additions & 1 deletion lib/src/atom/serial.rs
@@ -72,8 +72,11 @@ pub trait ConvertingSerializer<T>: Serializer + Default {
 /// Serialization result type
 pub type Result = std::result::Result<(), Error>;

+trait PrivHasher : Hasher {}
+impl PrivHasher for DefaultHasher {}
+
 // there are much speedier hashers, but not sure if it's worth the extra dependency given the other options
-impl Serializer for DefaultHasher {
+impl<H: PrivHasher> Serializer for H {
     fn serialize_bool(&mut self, v: bool) -> Result { Ok(self.write_u8(v as u8)) }
     fn serialize_i64(&mut self, v: i64) -> Result { Ok(self.write_i64(v)) }
     fn serialize_f64(&mut self, v: f64) -> Result { Ok(self.write_u64(v as u64)) }
@@ -95,3 +98,14 @@ impl Serializer for Vec<u8> {
     fn serialize_f64(&mut self, v: f64) -> Result { Ok(self.extend(v.to_le_bytes())) }
     fn serialize_str(&mut self, v: &str) -> Result { Ok(self.extend(v.bytes())) }
 }
+
+#[derive(Default)]
+pub struct NullSerializer();
+
+impl Serializer for NullSerializer {
+    fn serialize_bool(&mut self, _v: bool) -> Result { Ok(()) }
+    fn serialize_i64(&mut self, _v: i64) -> Result { Ok(()) }
+    fn serialize_f64(&mut self, _v: f64) -> Result { Ok(()) }
+    fn serialize_str(&mut self, _v: &str) -> Result { Ok(()) }
+}
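
This hunk replaces the impl for `DefaultHasher` with a blanket impl over a private marker trait. A blanket `impl<H: Hasher> Serializer for H` next to `impl Serializer for Vec<u8>` would be rejected by Rust's coherence check, because a future standard library could implement `Hasher` for `Vec<u8>`; bounding on a crate-local trait avoids that, which is presumably the motivation here. The sketch below reproduces the pattern in isolation. The trait is cut down to two methods that return `()`, and `serialize_pair`, `hash_pair` and `bytes_pair` are hypothetical helpers added for the demonstration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Cut-down version of the serializer trait (two methods, no error type).
pub trait Serializer {
    fn serialize_i64(&mut self, v: i64);
    fn serialize_str(&mut self, v: &str);
}

// Crate-local marker trait: only hashers opted in here get the blanket impl.
trait PrivHasher: Hasher {}
impl PrivHasher for DefaultHasher {}

impl<H: PrivHasher> Serializer for H {
    fn serialize_i64(&mut self, v: i64) { self.write_i64(v) }
    fn serialize_str(&mut self, v: &str) { self.write(v.as_bytes()) }
}

// Accepted alongside the blanket impl: the compiler can see that
// `Vec<u8>` does not implement the local `PrivHasher` trait.
impl Serializer for Vec<u8> {
    fn serialize_i64(&mut self, v: i64) { self.extend(v.to_le_bytes()) }
    fn serialize_str(&mut self, v: &str) { self.extend(v.bytes()) }
}

/// Serializer that discards everything it is given.
#[derive(Default)]
pub struct NullSerializer;

impl Serializer for NullSerializer {
    fn serialize_i64(&mut self, _v: i64) {}
    fn serialize_str(&mut self, _v: &str) {}
}

/// Hypothetical helper: the same code drives any serializer.
pub fn serialize_pair<S: Serializer>(s: &mut S, a: i64, b: &str) {
    s.serialize_i64(a);
    s.serialize_str(b);
}

/// Hypothetical helper: hash a pair through the blanket impl.
pub fn hash_pair(a: i64, b: &str) -> u64 {
    let mut h = DefaultHasher::new();
    serialize_pair(&mut h, a, b);
    h.finish()
}

/// Hypothetical helper: collect the serialized bytes of a pair.
pub fn bytes_pair(a: i64, b: &str) -> Vec<u8> {
    let mut out = Vec::new();
    serialize_pair(&mut out, a, b);
    out
}
```

Supporting another hasher then takes one extra `impl PrivHasher for ...` line rather than a copy of every `serialize_*` method.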

16 changes: 16 additions & 0 deletions lib/src/common/collections.rs
@@ -297,6 +297,22 @@ impl<'a, T: 'a + Display> Display for VecDisplay<'a, T> {
     }
 }

+/// Helper function to implement Display for all mapping like code structures.
+/// Displays iterator over pairs in a format { <key>: <value>, ... }
+pub fn write_mapping<A, B, I>(f: &mut std::fmt::Formatter, it: I) -> std::fmt::Result
+    where
+        A: Display,
+        B: Display,
+        I: Iterator<Item=(A, B)>
+{
+    write!(f, "{{").and_then(|()| {
+        it.fold((Ok(()), true), |(res, start), (a, b)| {
+            let comma = if start { "" } else { "," };
+            (res.and_then(|()| write!(f, "{} {}: {}", comma, a, b)), false)
+        }).0
+    }).and_then(|()| write!(f, " }}"))
+}
+

 #[cfg(test)]
 mod test {
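
To illustrate the output format of `write_mapping`, here is a self-contained sketch. It re-implements the helper as a plain loop with `?` (same format, but it stops at the first write error instead of folding over the remaining items) and wires it into a `Display` impl. The `Scores` wrapper and the `demo` function are hypothetical names used only for this example.

```rust
use std::collections::BTreeMap;
use std::fmt::{self, Display, Formatter};

/// Same output format as the `write_mapping` helper in the diff,
/// written as a loop: `{ k1: v1, k2: v2 }`.
pub fn write_mapping<A, B, I>(f: &mut Formatter, it: I) -> fmt::Result
where
    A: Display,
    B: Display,
    I: Iterator<Item = (A, B)>,
{
    write!(f, "{{")?;
    for (i, (a, b)) in it.enumerate() {
        // No separator before the first pair, a comma before every later one.
        let comma = if i == 0 { "" } else { "," };
        write!(f, "{} {}: {}", comma, a, b)?;
    }
    write!(f, " }}")
}

/// Hypothetical wrapper type used only to demonstrate the helper.
pub struct Scores(pub BTreeMap<String, u32>);

impl Display for Scores {
    fn fmt(&self, f: &mut Formatter) -> fmt::Result {
        write_mapping(f, self.0.iter())
    }
}

/// Builds a two-entry map and renders it through `Display`.
pub fn demo() -> String {
    let mut map = BTreeMap::new();
    map.insert("a".to_string(), 1);
    map.insert("b".to_string(), 2);
    Scores(map).to_string() // "{ a: 1, b: 2 }"
}
```

An empty iterator renders as `{ }`, since the opening brace and the closing ` }` are written unconditionally.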