
Feature implementation


# Implementation details

## DynamoDBClient

### Overview

DynamoDBClient (DDBC) provides methods for uploading large items to AWS DynamoDB. The current implementation supports items of any size (theoretically unbounded) by splitting each item into chunks of at most 400KB.

A typical flow of events when using DDBC is as follows (a code sketch follows the list):

  • A call is made to upload a record to DynamoDB.
  • DDBC, given the file size and the content as a String, iteratively breaks the data into byte arrays to send upstream to AWS. Each array is never larger than 400KB, the official item-size limit of DynamoDB.
  • Each byte array holds a String representation of the data (e.g. one produced with the jackson-* libraries).
  • There are two keys in play: a partition key and a sort key. Chunks are stored in a separate chunk table with a composite primary key consisting of both. The partition key is the token, and the sort key indicates the position of the chunk's bytes within the original, non-chunked item.
  • On finish, a status is returned to the caller which includes the total number of chunks uploaded to Dynamo.
  • One last upload is made to the main metaprot-task table, which effectively acts as a manifest record. This record contains the same token used in the composite key of the chunks, and records information such as the timestamp, the number of chunks, and the original filename uploaded.
  • When data is to be retrieved, DDBC exposes a method to retrieve the String content (and any additional chunks needed). The return value of this method should match exactly the input in step 1. The rationale for keeping a String representation is that any arbitrary data type can be uploaded and retrieved, as long as the caller can reasonably marshal/unmarshal it to and from some String representation.
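A minimal sketch of the upload path described above, using the AWS SDK for Java. The chunk-table name (metaprot-chunk) and the attribute names (token, chunkIndex, data) are assumptions made for illustration; only the 400KB chunking and the (partition, sort) key scheme come from the description:

```java
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class ChunkedUploadSketch {

    // DynamoDB's official per-item limit is 400KB; staying slightly under
    // leaves room for the key attributes, which count toward the limit.
    private static final int MAX_CHUNK_BYTES = 400 * 1024 - 1024;

    private final AmazonDynamoDB dynamo = AmazonDynamoDBClientBuilder.defaultClient();

    /** Splits the String content into byte chunks and writes each chunk
     *  under the composite key (token, chunkIndex). Returns the chunk count. */
    public int uploadChunks(String token, String content) {
        byte[] data = content.getBytes(StandardCharsets.UTF_8);
        int chunkIndex = 0;
        for (int offset = 0; offset < data.length; offset += MAX_CHUNK_BYTES) {
            int end = Math.min(offset + MAX_CHUNK_BYTES, data.length);

            Map<String, AttributeValue> item = new HashMap<>();
            item.put("token", new AttributeValue(token));   // partition key
            item.put("chunkIndex",                          // sort key: position in the original item
                     new AttributeValue().withN(String.valueOf(chunkIndex)));
            item.put("data",
                     new AttributeValue().withB(ByteBuffer.wrap(Arrays.copyOfRange(data, offset, end))));

            dynamo.putItem("metaprot-chunk", item);         // hypothetical chunk-table name
            chunkIndex++;
        }
        return chunkIndex;
    }
}
```

Retrieval would be the inverse: query the chunk table by token, order by chunkIndex, and concatenate the byte arrays back into the original String, with the manifest record in metaprot-task supplying the expected chunk count.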

## Client-side file upload

There is a file uploader JavaScript module originally written for and used by Copakb (though it has no internal dependencies on that project). The file, S3Uploader.js, exposes the S3Uploader module with functions to upload files in chunks (with optional parallelism) directly to Amazon S3. Temporary user credentials retrieved via AWS Cognito give users the right to upload files to specific locations of the appropriate bucket.

The flow is as follows (a credential-flow sketch follows the list):

  • The user attempts to retrieve temporary credentials from Cognito.
  • The user then loads their files into memory using the FileController module (found in S3Uploader.js).
  • The user then uploads their files directly to the S3 bucket associated with Metaprot.
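The module itself is browser-side JavaScript, but the credential exchange is the same in any AWS SDK. Here is a minimal sketch in Java (the language of the rest of the stack); the identity pool id, region, bucket, and object key are all placeholders:

```java
import com.amazonaws.auth.CognitoCredentialsProvider;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

import java.io.File;

public class CognitoUploadSketch {
    public static void main(String[] args) {
        // Exchange a Cognito identity for scoped, temporary AWS credentials.
        // The identity pool id below is a placeholder.
        CognitoCredentialsProvider credentials = new CognitoCredentialsProvider(
                "us-east-1:00000000-0000-0000-0000-000000000000", Regions.US_EAST_1);

        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                .withCredentials(credentials)
                .withRegion(Regions.US_EAST_1)
                .build();

        // The IAM role behind the identity pool is what restricts writes
        // to specific locations of the bucket.
        s3.putObject("metaprot-bucket", "uploads/example.csv", new File("example.csv"));
    }
}
```

The key design point is that no long-lived AWS keys ever reach the client; the IAM role attached to the Cognito identity pool is what scopes uploads to specific bucket locations.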

## Metabolite Analysis (MA)

### Overview

MA uses both REST and web controllers. REST controllers exist to begin analysis, whereas the web controllers exist to display HTML results of the analysis. As with Pattern Recognition, a user is expected to have uploaded a file (and selected their desired levels of pre-processing) at an earlier step.

Here is a breakdown of the expected interaction with the MA feature, as well as high-level details of the business logic (a controller sketch follows the list):

  • On the front end, a user will fill out a form that contains important threshold values, among others.
  • On form submit, the server will use an R script to attempt to transform the processed input file into one that is expected for metabolite analysis. The server will return (quickly) once the file is ready to continue. The server may return error messages if the token is invalid, or if transformation fails. The UI will be careful to show only error messages, and will silently move on to the next step on success.
  • Once the front end receives the OK from the server, another REST call is made to HTTP POST /analyze/metabolites/<token>, where <token> is a UUID retrieved from HTTP GET /analyze/token. This starts analysis.
  • The server will run the appropriate R commands via Rserve, which leads to a certain number of files being generated to a predefined directory.
  • The server will read in these files and store the results into the database.
  • The REST call above returns some HTML to display to the user, either an error or success message with a link to the results page.
  • The user navigates to the result page, and the web controller contacts the database for the computed results.
  • The results are passed back to the front end, using Thymeleaf as the template engine.
  • The JavaScript modules and libraries (D3.js) now do their work to initialize and bind the necessary events. Notable JS classes: DataSegregator.js, which segregates the output from the server into significance groups; and SVGPlot.js, which handles all plotting and interactive events for the plot(s).
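A minimal sketch of the two REST endpoints named above, assuming Spring MVC and the Rserve Java client (org.rosuda.REngine). Only the endpoint shapes come from the flow; the R script path, the R entry-point function, the results link, and the omitted persistence step are placeholders:

```java
import org.rosuda.REngine.Rserve.RConnection;
import org.rosuda.REngine.Rserve.RserveException;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;

import java.util.UUID;

@RestController
public class MetaboliteAnalysisControllerSketch {

    // Issues the token later used as the path variable for analysis calls.
    @GetMapping("/analyze/token")
    public String token() {
        return UUID.randomUUID().toString();
    }

    // Starts metabolite analysis; returns an HTML fragment for the UI.
    @PostMapping("/analyze/metabolites/{token}")
    public String analyze(@PathVariable String token) {
        try {
            RConnection r = new RConnection();   // connect to the local Rserve daemon
            // Placeholder script and entry point; the generated files land
            // in a predefined directory that the server reads afterwards.
            r.eval("source('/opt/metaprot/r/metabolite-analysis.R')");
            r.eval(String.format("run.analysis('%s')", token));
            r.close();
            // Reading the generated files and storing results in the
            // database is omitted from this sketch.
            return "<p>Analysis complete. <a href=\"/analysis/metabolites/" + token
                    + "\">View results</a></p>";
        } catch (RserveException e) {
            return "<p class=\"error\">Analysis failed: " + e.getMessage() + "</p>";
        }
    }
}
```

Because the token is a server-issued UUID, the naive string interpolation into the R call is tolerable in a sketch; a real implementation would validate the token first and return the error fragment described above when it is invalid.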

## Temporal Pattern Recognition Analysis

### Overview

Temporal Pattern Recognition Analysis also uses both REST and web controllers. REST controllers exist to begin analysis, whereas the web controllers exist to display HTML results of the analysis. It is assumed that a file was uploaded at an earlier step, and has undergone all pre-processing specified by the user.

Here is a breakdown of the expected interaction with the Temporal Pattern Recognition feature, as well as high-level details of the business logic (a results-page controller sketch follows the list):

  • On the front end, a user will fill out the form that contains important cluster values, among others.
  • On form submit, the server will attempt to transform the file into one that pattern recognition expects. Again, the server will return in a manner similar to that of MA, and the UI will be careful to show only error messages. Success is treated silently and only then does logic continue to the next step.
  • Once the front end receives the OK from the server, a REST call is made to HTTP POST /analyze/temporal-pattern-recognition/<token>, where <token> is a UUID retrieved from HTTP GET /analyze/token. This starts analysis.
  • The server will run the appropriate R commands via Rserve, which leads to a certain number of files being generated to a predefined directory.
  • The server will read in these files and store the results into the database.
  • The REST call above returns some HTML to display to the user, either an error or success message with a link to the results page.
  • The user navigates to the result page, and the web controller contacts the database for the computed results.
  • The results are passed back to the front end, using Thymeleaf as the template engine.
  • The user is also given the option of changing the input parameters and recomputing the clusters.
  • The JavaScript modules and libraries (D3.js) now do their work to initialize and bind the necessary events. Notable JS classes: PatternRecogPlot.js, which handles all plotting and interactive events for the plot(s).
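For the results page in either feature, the web-controller side is a conventional Spring + Thymeleaf pairing: look up the computed results by token and hand them to a template. A minimal sketch, assuming a hypothetical ResultRepository, route, and view name:

```java
import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;

// Hypothetical DAO over the stored analysis results.
interface ResultRepository {
    Object findByToken(String token);
}

@Controller
public class ResultsControllerSketch {

    private final ResultRepository results;

    public ResultsControllerSketch(ResultRepository results) {
        this.results = results;
    }

    // Fetches the computed results for the token and hands them to the
    // Thymeleaf template; the D3-based modules take over on the client.
    @GetMapping("/analysis/pattern-recognition/{token}")   // placeholder route
    public String results(@PathVariable String token, Model model) {
        model.addAttribute("results", results.findByToken(token));
        return "pattern-results";   // resolved by Thymeleaf to templates/pattern-results.html
    }
}
```

The recompute option in the step above would presumably re-submit the form with the new cluster parameters, repeating the POST /analyze/temporal-pattern-recognition/<token> call.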