Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Abstract over base type of Yi.Rope #15

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

ChrisPenner
Copy link
Member

@ChrisPenner ChrisPenner commented Sep 14, 2017

Hello! This is in reference to #12

I'm opening the PR now even though there's a fair amount of work to do because I'd love to get some feedback as I go.

To give some background, the end goal is that I'd be able to store arbitrary annotations along-side my text in a still (relatively) performant fashion using a clean interface.

This PR will hope to accomplish that by effectively allowing the storage of any Segmented type in the rope. The segmented type needs to provide a base set of methods to manipulate it efficiently, as seen in the Segmented typeclass. I may end up pulling out the Segmented typeclass to a new package yet.

Anything that implements Segmented and Measure can be stored in a Braid which is an abstracted Rope.

An example use case here would be to keep track of all text that's selected or highlighted in a text editor using a bitmask. We want the selections/highlights to follow characters as they move around the screen, so we can store them alongside the text like so:

instance Segmented (V.Vector Bool) where -- Vector Bool is an unboxed bitmask vector
  type Segment (V.Vector Bool) = Bool
  head = V.head
  tail = V.tail
  -- etc

instance (Segmented a, Segmented b) => Segmented (a, b) where
  type Segment (a, b) = (Segment a, Segment b)
  head (a, b) = (head a, head b)
  tail (a, b) = (tail a, tail b)
  -- etc

Now that we've defined a segmentation, we can use Braid (Size, ()) (TX.Text, V.Vector Bool) to represent a composite of the two. We'd need to alter our measure-based functions slightly to allow them to dig inside the composite measure (which in this case would be a tuple of the two measures). But this is easily helped using a simple HasSize typeclass which we define for the tuple:

class HasSize m where
  getSize :: m -> Size

instance HasSize Size where
  getSize = id

instance HasSize (Size, a) where
  getSize = fst

Now so long as the typeclass is implemented we can mix and match any additional data with text and still get performant operations/measures.

A YiString is now simple a Braid Size TX.Text, I'll have to do some benchmarks, but it should compile to pretty much the same code as before so long as we make sure the compiler is inlining everything as it should.


Currently I've implemented the base abstraction in Yi.Braid and then re-export all the functions from Yi.Rope for back compatibility with the same types as they currently have. This should allow current users to be mostly unaffected. There's the one issue that I've renamed the constructor from YiString -> Braid, so that might need addressing.

Anyways, there's still tons of work to do and I need to update all the docs too. Let me know what you think of the general idea as I keep working on it. Cheers!

@Fuuzetsu

src/Yi/Rope.hs Outdated
-- | Used to cache the size of the strings.
data Size = Indices { charIndex :: {-# UNPACK #-} !Int
-- ^ How many characters under here?
, lineIndex :: Int
-- ^ How many lines under here?
} deriving (Eq, Show, Typeable)

instance S.MSeg TX.Text where
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hoping to find a better way to do this, open to suggestions. The end user won't really need to care about it though.

src/Yi/Rope.hs Outdated
instance S.MSeg TX.Text where
type Meas TX.Text = Size

instance B.HasSize Size where
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This typeclass would allow us to extract the size of a chunk's measure for custom measures.

src/Yi/Braid.hs Outdated
{-# language FlexibleContexts #-}
{-# language FlexibleInstances #-}
{-# language TypeFamilies #-}
{-# language UndecidableInstances #-}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unfortunate; it's required for the Eq, Ord, etc. instances. I don't really see why they're undecidable so it's possible I'm missing something and this could be removed somehow.

@ghost
Copy link

ghost commented Sep 14, 2017

I like the idea. I think there should be a way to build Braid to Braid isomorphisms for transforming representations.

@Fuuzetsu
Copy link
Member

I don't know when I'll get a chance to review this so please don't wait for my input.


Braid to Braid isomorphisms

an automorphism :)

stack.yaml Outdated
enable: true
packages:
- libcxx
- icu
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we dropped icu dependency in 0.10

src/Yi/Braid.hs Outdated

import qualified Yi.Segment as S

type ValidBraid v a = (T.Measured v (Chunk a), S.Segmented a, HasSize v)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took me a minute to realize this has kind Constraint, so could you add a kind signature?

@ethercrow
Copy link
Member

Could you also fix comments in Braid module that talk about YiString?

@ChrisPenner
Copy link
Member Author

In the most recent commit I pulled in the Data.ListLike dependency to replace my Segmented class, it has support for Text, Strings, Vectors, ByteStrings, etc. This allows the use of any of those types (and trivially any products or sums of those types) as a base type for a Braid. The implementation of ListLike for Text just relies on the underlying Data.Text functions so performance should be unaffected; with the exception of chunksOf and span which are unfortunately missing from the ListLike typeclass. I've implemented generic versions of these in Yi.Rope.Internal.ListLikeHelpers, but these are not specialized. Ideally we'd get these two methods into the ListLike class properly and could thus get more efficient implementations. This currently effects only toBraid and Yi.Braid.split and the Yi.Rope versions of the same. If performance here is an issue I can easily override the versions of these functions inside Yi.Rope to use the efficient Text versions.

This does pull in a pretty large amount of transitive dependencies which of course is not ideal. I'm open to suggestions, but I also believe this expands the utility of Braids pretty substantially. Let me know what you think of this idea in particular.

I've also gone through and cleaned up the docs.

@ghost
Copy link

ghost commented Sep 16, 2017

Let me know what you think of this idea in particular.

That's related to issue #11 actually.

@ChrisPenner
Copy link
Member Author

@siddhanathan Hrmm, Is that issue suggesting adding ListLike to YiString itself? I'd see that as a different (yet still related) issue. This is about utilizing ListLike to aid the underlying implementation.

Though the point definitely still stands that allowing any ListLike underlying representation could allow users to embed things in a Braid that perhaps they shouldn't, but in general I think the performance would still be better than any other possible option (even if the only benefit they get is use of the length tracking optimizations of Braid).

I see Braid as a useful base abstraction onto which you can build a few quick and easy optimizations. For instance with YiString it uses the base Braid abstraction but includes additional efficient combinators for working with newlines (which a Braid knows nothing about).

All that said, this does represent a pretty large change for Yi.Rope, I'd love to be able to work something out though. I think it opens up some pretty clever and interesting possibilities.

src/Yi/Braid.hs Outdated
-- If however your function is unaffected by this 'chunking' behaviour
-- you can tag or transform your underlying sequences or convert between
-- `Braid` types.
fmap' :: (ValidBraid v s a, ValidBraid q t b) => (a -> b) -> Braid v a -> Braid q b
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's that Braid -> Braid isomorphism you wanted @siddhanathan

`withChunk` -> `fmap'` to allow converting between Braid types
Map now allows changing the segment type
src/Yi/Braid.hs Outdated
Chunk _ x :< ts -> mkChunk LL.length (LL.filter p x) -| go ts

-- | Maps the segments of the underlying chain.
map :: (ValidBraid v s a, ValidBraid q t b) => (s -> t) -> Braid v a -> Braid q b
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also a generalized underlying segment map! @siddhanathan

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

v /= q would be an interesting case.

@noughtmare
Copy link
Member

This does pull in a pretty large amount of transitive dependencies which of course is not ideal.

Yi already depends on ListLike, so it's not a problem for Yi.

@ChrisPenner
Copy link
Member Author

ChrisPenner commented Sep 16, 2017

Just ran the benchmarks and it looks like most operations have taken a hit; I'm really not very familiar with doing optimization work in Haskell so I'll need some tips from others to take care of that; my suspicions is that INLINE and SPECIALIZE pragmas can help us here. I was also forced to remove the UNPACK pragma on Chunk when I parametized it over the underlying type, so I figure that's probably got something to do with it. I haven't done high-performance haskell before though; I'll need some pointers to know where to start.

@ChrisPenner
Copy link
Member Author

ChrisPenner commented Sep 16, 2017

Another interesting possibility would be to move Chunk to Yi.Rope from Yi.Braid, write a ListLike instance for Chunk which exposes the cached length via LL.length, then use LL.length whenever needed in Yi.Braid instead of unpacking Chunks.

This should make the implementation simpler (less packing/unpacking) and would allow us to avoid wrapping other ListLike types in a redundant Chunk; e.g. Vectors already have V.length which is O(1), so the Chunk wrapper is unnecessary.

It would also allow us to specialize Chunk to Text values rather than being polymorphic which will let us UNPACK again.

Chunk is a Text specific optimization.
This implements ListLike for Chunk and removes the need for a chunk
in Braid.
@ghost
Copy link

ghost commented Sep 18, 2017

@ChrisPenner I have some changes in https://github.com/siddhanathan/yi-rope/tree/abstract-base-type which I haven't tested thoroughly. I believe it should improve performance. See if it helps.

@ChrisPenner
Copy link
Member Author

Unfortunately @siddhanathan it doesn't seem to have made much difference 😢

@ghost
Copy link

ghost commented Sep 19, 2017

@ChrisPenner but it does make a small difference?

@ChrisPenner
Copy link
Member Author

@siddhanathan Yes, but we're still orders of magnitude away. As per the conversation on Matrix; implementing a better LL.concat brought us in the ballpark, we should review the rest of the LL functions to find other improvements.

@ChrisPenner
Copy link
Member Author

@ethercrow @noughtmare @Fuuzetsu I've been considering reviving this Pull Request for some new needs I have with Rasa; some options would include:

  • Leave Rope as an independent type and implementation and provide Braid alongside it as an orthoganal option, this avoids any performance degradation on Rope while still allowing people to use Braid when they like
  • clean it up against latest master and try to merge it in a similar form, this may cause some small performance degradation to Rope
  • fork the library into a new one just for the Braid type which would be maintained separately from Rope

Let me know if any of these seem reasonable 😄 Thanks!

Tagging @jmatsushita too

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants