Lightweight wrapper for csvlint.rb to provide web API
davetaz committed Nov 8, 2024
0 parents commit 8c53151
Showing 5 changed files with 209 additions and 0 deletions.
5 changes: 5 additions & 0 deletions Gemfile
@@ -0,0 +1,5 @@
source 'https://rubygems.org'

gem 'sinatra'
gem 'csvlint'
gem 'webrick'
1 change: 1 addition & 0 deletions Procfile
@@ -0,0 +1 @@
web: ruby csvlint_server.rb -p $PORT
112 changes: 112 additions & 0 deletions README.md
@@ -0,0 +1,112 @@
# CSVLint-API

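CSVLint-API is a lightweight web API, built with Sinatra, that wraps the Ruby `csvlint` library (csvlint.rb) for validating CSV files. It checks uploaded CSV files for structural problems and, optionally, against a schema, and returns detailed feedback on any issues it finds.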

## Features

- **Structure Validation**: Checks for structural issues, such as inconsistent row lengths, incorrect quoting, and malformed line endings.
- **Schema Validation**: Validates CSV data against an uploaded JSON schema (csvlint supports JSON Table Schema and CSV on the Web metadata) to check data formats, types, and constraints.
- **Dialect Options**: Flexible options for parsing CSV files, such as delimiters, quoting characters, and line terminators.
- **Detailed Reporting**: Provides error, warning, and informational feedback on validation results.
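To make the first structural check concrete, here is a minimal sketch of what "inconsistent row lengths" means. It uses Ruby's standard `CSV` library rather than csvlint, and `ragged_rows` is a hypothetical helper, not part of this project or of csvlint's API. Malformed quoting makes `CSV.parse` raise `CSV::MalformedCSVError`, which this sketch does not handle.

```ruby
require 'csv'

# Hypothetical helper (not part of csvlint): find rows whose field count
# differs from the first row. Row numbers are 1-based record numbers
# (the header counts as row 1; a quoted newline does not start a new row).
def ragged_rows(csv_text, col_sep: ',')
  rows = CSV.parse(csv_text, col_sep: col_sep)
  return [] if rows.empty?

  expected = rows.first.length
  rows.each_with_index
      .reject { |fields, _index| fields.length == expected }
      .map { |fields, index| { row: index + 1, expected: expected, actual: fields.length } }
end

sample = "id,name,score\n1,Ada,90\n2,Grace\n3,Linus,85,extra\n"
ragged_rows(sample).each { |issue| puts issue.inspect }
```

In the sample, rows 3 and 4 are flagged because they have two and four fields against the header's three. csvlint performs this kind of check, together with quoting and line-ending checks, in a single pass.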

## Installation

### Prerequisites

Make sure you have the following installed on your system:

- **Ruby** (version 2.6 or higher)
- **Bundler** (for managing Ruby dependencies)

### Step 1: Clone the Repository

Clone the repository from GitHub and navigate into the project directory:

```bash
git clone https://github.com/theodi/csvlint-api.git
cd csvlint-api
```

### Step 2: Install Ruby Dependencies

Ensure you have Bundler installed. If not, you can install it with:

```bash
gem install bundler
```

Then install all required Ruby gems:

```bash
bundle install
```

### Step 3: Set the Port (Optional)

The server reads its port from the `PORT` environment variable and falls back to `4567` when it is unset. This commit includes neither a `.env.example` file nor a dotenv loader, so a `.env` file would not be read. Set the variable in your shell instead:

```bash
export PORT=4567
```

### Step 4: Run the Server Locally

To start the CSVLint server locally, use:

```bash
ruby csvlint_server.rb
```

The server listens on the port given by `PORT` (default `4567`). With the default, it is reachable at `http://localhost:4567`.

## Usage

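### Web Interface

This commit does not include a browser interface. The server defines only the `POST /validate` endpoint described below, so opening `http://localhost:4567` in a browser returns Sinatra's default "not found" page.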

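### API Usage

The CSVLint API lets you validate CSV files programmatically. Send a `multipart/form-data` `POST` request to `/validate` with the CSV file in the `file` field and, optionally, a schema file in the `schema` field and a JSON-encoded dialect in the `dialect` field. Validating a CSV by URL is not supported in this version; the file must be uploaded.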

**Example `POST` Request:**

```bash
curl -X POST http://localhost:4567/validate \
  -F "file=@/path/to/yourfile.csv" \
  -F "schema=@/path/to/yourschema.json" \
  -F "dialect={\"delimiter\":\",\",\"quoteChar\":\"\\\"\"}"
```
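The same request can be sent from Ruby without the nested shell escaping. This is a minimal sketch using the standard library's `Net::HTTP`; it assumes the server is running at `http://localhost:4567`, and `build_validate_request` and `validate_csv` are hypothetical helper names. The form field names match the parameters the server reads.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper: build a multipart POST for the /validate endpoint.
def build_validate_request(uri, csv_io, csv_name: 'upload.csv', dialect: {}, schema_io: nil)
  fields = [['file', csv_io, { filename: csv_name }]]
  fields << ['schema', schema_io, { filename: 'schema.json' }] if schema_io
  # JSON.generate takes care of the quoting that the curl example escapes by hand
  fields << ['dialect', JSON.generate(dialect)] unless dialect.empty?

  request = Net::HTTP::Post.new(uri)
  request.set_form(fields, 'multipart/form-data')
  request
end

# Hypothetical helper: upload a local CSV file and return the parsed JSON response.
def validate_csv(csv_path, base_url: 'http://localhost:4567', dialect: {})
  uri = URI.join(base_url, '/validate')
  File.open(csv_path, 'rb') do |csv_io|
    request = build_validate_request(uri, csv_io, csv_name: File.basename(csv_path), dialect: dialect)
    response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
      http.request(request)
    end
    JSON.parse(response.body)
  end
end
```

For example, `validate_csv('/path/to/yourfile.csv', dialect: { 'delimiter' => ';' })` uploads the file and returns the response as a Ruby hash.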

**Response:**

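The API responds with JSON. For a file with no issues, the response looks like this:

```json
{
  "valid": true,
  "errors": [],
  "warnings": [],
  "info_messages": []
}
```

Each entry in `errors`, `warnings`, and `info_messages` is an object with `category`, `type`, `row`, `column`, and `content` fields. `valid` is `true` when `errors` is empty; warnings and info messages do not affect it. If the request cannot be validated (for example, because no file was uploaded), the response is instead an object with a single `error` key describing the problem.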
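A client can turn this response into a short report. Below is a minimal sketch. It assumes the response shape described in this section, in which each message carries `type`, `row`, and `column` keys and failures return a single `error` key. `summarize_validation` is a hypothetical helper.

```ruby
require 'json'

# Hypothetical helper: turn a /validate JSON response body into report lines.
def summarize_validation(body)
  result = JSON.parse(body)
  # The server answers with {"error": "..."} when validation itself fails
  return ["request failed: #{result['error']}"] if result.key?('error')

  lines = [result['valid'] ? 'valid' : 'invalid']
  %w[errors warnings info_messages].each do |kind|
    result.fetch(kind, []).each do |entry|
      # row or column may be null for file-level messages
      location = "row #{entry['row'] || '-'}, column #{entry['column'] || '-'}"
      lines << "#{kind}: #{entry['type']} (#{location})"
    end
  end
  lines
end
```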

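## Deploying to Render

1. **Log in to Render** and create a new web service.
2. **Link your GitHub repository**.
3. **Choose the Ruby environment**. Use `bundle install` as the build command and `ruby csvlint_server.rb -p $PORT` (the command in the `Procfile`) as the start command.
4. Set environment variables. The server reads `PORT`. Also set `APP_ENV=production`, because in development mode Sinatra binds only to `localhost` and the service would otherwise be unreachable.
5. Click **Create Web Service** to deploy.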

## License

This project is licensed under the MIT License.
2 changes: 2 additions & 0 deletions config.ru
@@ -0,0 +1,2 @@
require './csvlint_server'
run Sinatra::Application
89 changes: 89 additions & 0 deletions csvlint_server.rb
@@ -0,0 +1,89 @@
require 'sinatra'
require 'csvlint'
require 'webrick'
require 'securerandom'
require 'tmpdir'
require 'json'

# PORT arrives as a string from the environment; default to 4567 locally
set :port, Integer(ENV.fetch('PORT', 4567))
# Sinatra binds to localhost in development mode; listen on all interfaces
# so that hosting platforms can route traffic to the server
set :bind, '0.0.0.0'
set :server, 'webrick'

helpers do
  # Convert a csvlint error, warning, or info message into a plain hash
  def message_to_hash(message)
    {
      category: message.category,
      type: message.type,
      row: message.row,
      column: message.column,
      content: message.content
    }
  end
end

post '/validate' do
  content_type :json

  # Retrieve the uploaded CSV file and optional schema file from the request
  csv_file = params[:file]
  schema_file = params[:schema]

  # A multipart upload arrives as a hash containing a :tempfile entry
  unless csv_file.is_a?(Hash) && csv_file[:tempfile]
    halt 400, { error: 'No CSV file uploaded in the "file" field' }.to_json
  end

  # Parse the optional dialect JSON; malformed JSON is a client error
  begin
    raw_dialect = params[:dialect].to_s
    dialect = raw_dialect.strip.empty? ? {} : JSON.parse(raw_dialect)
  rescue JSON::ParserError => e
    halt 400, { error: "Invalid dialect JSON: #{e.message}" }.to_json
  end

  # Generate unique temporary file names in the system temp directory
  csv_tempfile = File.join(Dir.tmpdir, "csvlint_#{SecureRandom.uuid}.csv")
  schema_tempfile = File.join(Dir.tmpdir, "csvlint_schema_#{SecureRandom.uuid}.json") if schema_file

  begin
    # Save the uploaded CSV file
    File.binwrite(csv_tempfile, csv_file[:tempfile].read)

    # Save and load the schema if one was provided; the loader takes a
    # path or URI, so pass the path instead of an unclosed File object
    schema = nil
    if schema_file
      File.binwrite(schema_tempfile, schema_file[:tempfile].read)
      schema = Csvlint::Schema.load_from_json(schema_tempfile)
    end

    # Csvlint::Validator validates in its constructor, so no separate
    # validate call is needed (a second call would repeat some warnings).
    # The block form of File.open closes the file handle afterwards.
    result = File.open(csv_tempfile) do |csv_io|
      validator = Csvlint::Validator.new(csv_io, dialect, schema)
      {
        valid: validator.valid?,
        errors: validator.errors.map { |m| message_to_hash(m) },
        warnings: validator.warnings.map { |m| message_to_hash(m) },
        info_messages: validator.info_messages.map { |m| message_to_hash(m) }
      }
    end
  rescue StandardError => e
    # Report unexpected failures as JSON with a server error status
    status 500
    result = { error: "Validation failed: #{e.message}" }
  ensure
    # Ensure temporary files are deleted
    File.delete(csv_tempfile) if File.exist?(csv_tempfile)
    File.delete(schema_tempfile) if schema_tempfile && File.exist?(schema_tempfile)
  end

  # Return the result as JSON
  result.to_json
end
