Add function to collapse a Corpus #15

Open
neelsmith opened this issue Mar 2, 2017 · 2 comments

@neelsmith

E.g., collapse a token-level exemplar to a canonical-level exemplar.

The function should take a parameter for the level to collapse to, and a String value for the new exemplar identifier.
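To make the requested signature concrete, here is a minimal sketch of what such a function might look like. It does not use the library's real `CtsUrn` or `CitableNode` types: `Ref`, `Node`, and `collapse` are hypothetical stand-ins invented for illustration, and the passage reference is modelled as a plain vector of components.

```scala
object CollapseSketch {
  // Hypothetical stand-in for a CTS URN: a work, an optional exemplar ID,
  // and the passage reference split into its components.
  final case class Ref(work: String, exemplar: Option[String], passage: Vector[String])

  // Hypothetical stand-in for a citable node: a reference plus its text.
  final case class Node(ref: Ref, text: String)

  // Collapse nodes to `level` passage components and label the result
  // with `newExemplar`. Nodes are emitted in order of first appearance.
  def collapse(nodes: Vector[Node], level: Int, newExemplar: String): Vector[Node] = {
    require(level >= 1, "level must be at least 1")
    // Re-key every node: truncate the passage and swap in the new exemplar ID.
    val rekeyed = nodes.map { n =>
      val r = n.ref.copy(exemplar = Some(newExemplar), passage = n.ref.passage.take(level))
      (r, n.text)
    }
    // distinct keeps the first occurrence of each key, which records document order.
    val order = rekeyed.map(_._1).distinct
    // groupBy loses the order of the keys, but keeps element order inside each group.
    val grouped = rekeyed.groupBy(_._1)
    order.map(r => Node(r, grouped(r).map(_._2).mkString(" ")))
  }
}
```

For example, `CollapseSketch.collapse(tokens, 2, "lines")` would reduce three-component token references such as `1.1.1` to two-component references such as `1.1`, each carrying the exemplar identifier `lines`.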

@neelsmith

neelsmith commented Mar 2, 2017

Here's an example of how to do two-tier sorting of nodes. The functions are poorly named for this general case, but start with "publishable", which reduces the token-level analytical edition to a read/analyzed canonical-level Corpus. NB: it lacks the required feature of identifying the new exemplar by a unique exemplar identifier.

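// Assumes TextTriple is defined elsewhere with fields (urn: CtsUrn, reading: String, seq: Int).
// Groups token readings by passage URN and joins each group, in token order, into one string.
// groupBy does not preserve document order across groups, so the caller must re-sort the result.
def passagesFromTokens(readingPairs: Vector[(CtsUrn, String)]): Vector[(CtsUrn, String)] = {
  val triples = readingPairs.zipWithIndex.map { case ((urn, reading), idx) => TextTriple(urn, reading, idx) }
  val trVect = triples.groupBy(_.urn).toVector
  trVect.map { case (k, v) => (k, v.sortBy(_.seq).map(_.reading).mkString(" ")) }
}

// Looks up the position of a URN in the document-order index.
def idxForUrn(u: CtsUrn, urnSeq: Vector[(CtsUrn, Int)]): Int = {
  urnSeq.filter(_._1 == u)(0)._2
}

def publishable(scholionGroup: String, publType: String): Vector[CitableNode] = {
  val tkns = tokensForDocument(scholionGroup, publType)
  val passages = passagesFromTokens(tkns)
  // For final sort: index each distinct URN by its first appearance in the token sequence.
  val urnSeq = tkns.map(_._1).distinct.zipWithIndex

  val sortedFinal = passages.sortBy { case (k, v) => idxForUrn(k, urnSeq) }
  sortedFinal.map { case (k, v) => CitableNode(k, v) }
}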

@neelsmith neelsmith modified the milestones: UI-ready, microservice Mar 6, 2017
@neelsmith

There may be easier algorithms. Look at the way hmt-mom collapses a set of tokens.
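One possibly simpler approach, sketched here without reference to how hmt-mom actually does it, is a single left fold that merges each token into the previous passage whenever the two share a key. This avoids the group-then-re-sort step, on the assumption that all tokens for a passage are contiguous in the input. The name `mergeAdjacent` and the generic key type `K` (standing in for `CtsUrn`) are hypothetical.

```scala
object AdjacentMergeSketch {
  // Merge runs of adjacent pairs that share a key, joining their text with spaces.
  // Input order is preserved, so no second sort is needed.
  // Assumption: tokens belonging to one passage are contiguous in `pairs`.
  def mergeAdjacent[K](pairs: Vector[(K, String)]): Vector[(K, String)] =
    pairs.foldLeft(Vector.empty[(K, String)]) { (acc, pair) =>
      acc.lastOption match {
        // Same key as the previous entry: extend that entry's text.
        case Some((k, text)) if k == pair._1 =>
          acc.updated(acc.size - 1, (k, text + " " + pair._2))
        // New key (or first element): start a new entry.
        case _ =>
          acc :+ pair
      }
    }
}
```

Unlike the `groupBy` version, a key that recurs after an intervening passage yields two separate entries here rather than one merged entry, so this is only a drop-in replacement when the contiguity assumption holds.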
