What does it do?

An experimental Node.js app that fetches content from non-product New Relic web properties and indexes it in an AWS Elasticsearch cluster. The goal is to eventually replace Swiftype, which these web properties currently use to crawl their sites, index the pages, and serve the indexed data through an API that powers search results on some of the sites.

The app uses Node.js to run scripts, AWS Elasticsearch to hold the index, and AWS DynamoDB to track crawler history and look up Elasticsearch IDs. Both AWS services live in Clinton's account for now.
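To make the division of labor concrete, here is a minimal sketch of how a history table and a search index could work together. It is not the app's actual code: the two `Map` objects are in-memory stand-ins for the DynamoDB table and the Elasticsearch index, and `createIndexer`, `indexPage`, the record fields other than `searchIndexID`, and the incrementing ID scheme are all assumptions made for illustration.

```javascript
// Sketch only: Maps stand in for DynamoDB (history) and Elasticsearch (index).
function createIndexer() {
  const crawlHistory = new Map(); // page URL -> { searchIndexID, lastCrawled }
  const searchIndex = new Map();  // searchIndexID -> indexed document
  let nextId = 1;                 // hypothetical ID scheme, for illustration

  function indexPage(page, now = new Date()) {
    // Look up the page in the crawl history to find its existing index ID.
    const previous = crawlHistory.get(page.url);
    const searchIndexID = previous ? previous.searchIndexID : nextId++;

    // Create the document, or overwrite it under the same ID on a re-crawl.
    searchIndex.set(searchIndexID, {
      url: page.url,
      title: page.title,
      body: page.body,
    });

    // Record the crawl so the next run can find the same ID again.
    crawlHistory.set(page.url, {
      searchIndexID,
      lastCrawled: now.toISOString(),
    });

    return { searchIndexID, action: previous ? 'updated' : 'created' };
  }

  return { indexPage, crawlHistory, searchIndex };
}
```

Calling `indexPage` twice with the same URL reuses the stored ID, so a re-crawl updates the existing document instead of adding a duplicate.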

Installation

Work in Progress

How to run

Run npm start.

To run the crawler file for a specific web property:

Update package.json with the script name and path.
Run npm run NAME where NAME is the name you set in package.json.
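As an illustration of the two steps above, a script entry in package.json might look like the fragment below. The script name `crawl-docs` and the path `src/crawlers/docs.js` are hypothetical placeholders, not names taken from this repository.

```json
{
  "scripts": {
    "crawl-docs": "node src/crawlers/docs.js"
  }
}
```

With an entry like this in place, the crawler would be started with npm run crawl-docs.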

How it crawls docs

AWS Access / Permissions

The app uses the AWS SDK for Node.js to create the client and authenticate requests as a user that Clinton set up in his personal account. GET and POST requests are allowed from the IP addresses below. The admin user can also run PUT and DELETE requests from the same addresses:

38.104.104.46
67.171.204.51
38.104.105.178
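The rules above can be modeled in a few lines of JavaScript. This is only a sketch of the logic for reasoning about it; the real enforcement presumably happens in the AWS access configuration, not in application code. The function name `isRequestAllowed`, its parameter shape, and the choice to deny every other HTTP method are assumptions.

```javascript
// IP addresses listed in this README.
const ALLOWED_IPS = new Set([
  '38.104.104.46',
  '67.171.204.51',
  '38.104.105.178',
]);

const GENERAL_METHODS = new Set(['GET', 'POST']);      // allowed from a listed IP
const ADMIN_ONLY_METHODS = new Set(['PUT', 'DELETE']); // admin user only

// Hypothetical helper: returns true if the request passes the rules above.
function isRequestAllowed({ ip, method, isAdmin = false }) {
  if (!ALLOWED_IPS.has(ip)) return false;
  const verb = method.toUpperCase();
  if (GENERAL_METHODS.has(verb)) return true;
  if (ADMIN_ONLY_METHODS.has(verb)) return isAdmin;
  return false; // assumption: any other method is denied
}
```

For example, `isRequestAllowed({ ip: '38.104.104.46', method: 'DELETE' })` is false for a regular user and true when `isAdmin` is set.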

TODO

Add pagination through the docs endpoint
Add a crawler for a non-docs web property as a further proof of concept
Convert the searchIndexID property to a number for sorting within DynamoDB
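The last item matters because string values sort character by character, so "10" comes before "9". DynamoDB orders string sort keys by their bytes and number sort keys numerically, and the same effect is easy to reproduce in plain JavaScript. The sketch below is illustrative only; the sample IDs and the helper name `toNumericId` are assumptions.

```javascript
// Sample IDs stored as strings (made-up values for illustration).
const ids = ['10', '9', '100', '2'];

// Default sort compares strings character by character.
const sortedAsStrings = [...ids].sort();

// Hypothetical helper: convert an ID to a number, rejecting bad input.
function toNumericId(value) {
  const n = Number(value);
  if (!Number.isSafeInteger(n)) {
    throw new TypeError(`searchIndexID is not an integer: ${value}`);
  }
  return n;
}

// Numeric sort gives the order a reader would expect.
const sortedAsNumbers = ids.map(toNumericId).sort((a, b) => a - b);

console.log(sortedAsStrings); // [ '10', '100', '2', '9' ]
console.log(sortedAsNumbers); // [ 2, 9, 10, 100 ]
```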
