Merge pull request #14 from zookzook/bulk_api
added bulk writes
zookzook authored May 21, 2019
2 parents 335839e + 3b0e23c commit 0699e7c
Showing 18 changed files with 1,357 additions and 59 deletions.
1 change: 0 additions & 1 deletion .travis.yml
@@ -2,7 +2,6 @@ sudo: required
language: elixir

elixir:
- 1.4
- 1.5
- 1.6
- 1.7
6 changes: 4 additions & 2 deletions CHANGELOG.md
@@ -3,6 +3,8 @@
* Enhancements
* The driver now provides client metadata
* Added support for connecting via UNIX sockets (`:socket` and `:socket_dir`)
* Added support for bulk writes (ordered/unordered, in-memory/stream)
* Added support for `op_msg` with payload type 1
* Merged code from https://github.com/ankhers/mongodb/commit/63c20ff7e427744a5df915751adfaf6e5e39ae62
* Merged changes from https://github.com/ankhers/mongodb/pull/283
* Merged changes from https://github.com/ankhers/mongodb/pull/281
@@ -13,8 +15,8 @@
* Travis now using the right MongoDB version

* Bug Fixes
* added test unit for change streams
* removed debug code from change streams
* Added test unit for change streams
* Removed debug code from change streams

## v0.5.2

71 changes: 68 additions & 3 deletions README.md
@@ -21,6 +21,7 @@ for the individual options.
* [x] Upgraded to ([DBConnection 2.x](https://github.com/elixir-ecto/db_connection))
* [x] Removed deprecated op codes ([See](https://docs.mongodb.com/manual/reference/mongodb-wire-protocol/#request-opcodes))
* [x] Added `op_msg` support ([See](https://docs.mongodb.com/manual/reference/mongodb-wire-protocol/#op-msg))
* [x] Added bulk writes ([See](https://github.com/mongodb/specifications/blob/master/source/crud/crud.rst#write))
* [ ] Add support for driver sessions ([See](https://github.com/mongodb/specifications/blob/master/source/sessions/driver-sessions.rst))
* [ ] Add support driver transactions ([See](https://github.com/mongodb/specifications/blob/master/source/transactions/transactions.rst))
* [ ] Add support for `op_compressed` ([See](https://github.com/mongodb/specifications/blob/master/source/compression/OP_COMPRESSED.rst))
@@ -32,13 +33,11 @@ for the individual options.
* Connection pooling ([through DBConnection 2.x](https://github.com/elixir-ecto/db_connection))
* Streaming cursors
* Performant ObjectID generation
* Follows driver specification set by 10gen
* Safe (by default) and unsafe writes
* Aggregation pipeline
* Replica sets
* Support for SCRAM-SHA-256 (MongoDB 4.x)
* Support for change streams api ([See](https://github.com/mongodb/specifications/blob/master/source/change-streams/change-streams.rst))

* Support for bulk writes ([See](https://github.com/mongodb/specifications/blob/master/source/crud/crud.rst#write))

## Data representation

@@ -190,6 +189,72 @@ end
spawn(fn -> for_ever(top, self()) end)
```

For more information, see:

* [Mongo.watch_collection](https://hexdocs.pm/mongodb_driver/Mongo.html#watch_collection/5)


### Bulk writes

The motivation for bulk writes is optimization: identical operations can be grouped
together. A distinction is made between unordered and ordered bulk writes.
In unordered bulk writes, inserts, updates, and deletes are grouped as individual
commands and sent to the database, with no guarantee about the order of execution.
A good use case is importing records from a CSV file, where the order of the
inserts does not matter.
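As a minimal sketch, an unordered bulk can be built much like the ordered example further below (this assumes `Mongo.UnorderedBulk` exposes the same `new/1` and `insert_one/2` constructors as `Mongo.OrderedBulk`):

```elixir
alias Mongo.UnorderedBulk

# Build an unordered bulk: the three inserts may be grouped and executed
# in any order, which is fine because they do not depend on each other.
bulk = "bulk"
|> UnorderedBulk.new()
|> UnorderedBulk.insert_one(%{name: "Greta"})
|> UnorderedBulk.insert_one(%{name: "Tom"})
|> UnorderedBulk.insert_one(%{name: "Waldo"})

result = Mongo.BulkWrite.write(:mongo, bulk, w: 1)
```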

For ordered bulk writes, the order of execution must be preserved.
In this case, only consecutive operations of the same type are grouped.

Currently, all bulk writes are accumulated in memory, which is unfavorable for large
bulk writes. In that case you can use streaming bulk writes, which keep only a limited
group of operations in memory and write them to the database whenever the maximum
number of operations has been reached. The group size can be specified.

The following example uses ordered bulk writes. We first insert some dogs by name, add a `kind`
attribute, and then change all dogs to cats. After that we delete three cats. This example would
not work with unordered bulk writes, because the operations depend on their order.

```elixir

bulk = "bulk"
|> OrderedBulk.new()
|> OrderedBulk.insert_one(%{name: "Greta"})
|> OrderedBulk.insert_one(%{name: "Tom"})
|> OrderedBulk.insert_one(%{name: "Waldo"})
|> OrderedBulk.update_one(%{name: "Greta"}, %{"$set": %{kind: "dog"}})
|> OrderedBulk.update_one(%{name: "Tom"}, %{"$set": %{kind: "dog"}})
|> OrderedBulk.update_one(%{name: "Waldo"}, %{"$set": %{kind: "dog"}})
|> OrderedBulk.update_many(%{kind: "dog"}, %{"$set": %{kind: "cat"}})
|> OrderedBulk.delete_one(%{kind: "cat"})
|> OrderedBulk.delete_one(%{kind: "cat"})
|> OrderedBulk.delete_one(%{kind: "cat"})

result = Mongo.BulkWrite.write(:mongo, bulk, w: 1)
```

In the following example we import 1,000,000 integers into MongoDB using the stream API:

We create an insert operation for each number and pass the resulting stream to the
`Mongo.UnorderedBulk.write` function. It returns a stream that accumulates insert
operations until the limit of `1_000` is reached; at that point the accumulated group
is sent to MongoDB. Using the stream API you can reduce memory usage while importing
large volumes of data.

```elixir
1..1_000_000
|> Stream.map(fn i -> Mongo.BulkOps.get_insert_one(%{number: i}) end)
|> Mongo.UnorderedBulk.write(:mongo, "bulk", 1_000)
|> Stream.run()
```

For more information, see the module documentation and the unit tests for examples:
* [Mongo.UnorderedBulk](https://hexdocs.pm/mongodb_driver/Mongo.UnorderedBulk.html#content)
* [Mongo.OrderedBulk](https://hexdocs.pm/mongodb_driver/Mongo.OrderedBulk.html#content)
* [Mongo.BulkWrite](https://hexdocs.pm/mongodb_driver/Mongo.BulkWrite.html#content)
* [Mongo.BulkOps](https://hexdocs.pm/mongodb_driver/Mongo.BulkOps.html#content)

### Examples

Using `$and`
66 changes: 39 additions & 27 deletions lib/mongo.ex
@@ -273,7 +273,7 @@ defmodule Mongo do
@spec find_one_and_update(GenServer.server, collection, BSON.document, BSON.document, Keyword.t) :: result(BSON.document) | {:ok, nil}
def find_one_and_update(topology_pid, coll, filter, update, opts \\ []) do
_ = modifier_docs(update, :update)
query = [
cmd = [
findAndModify: coll,
query: filter,
update: update,
@@ -289,13 +289,12 @@
opts = Keyword.drop(opts, ~w(bypass_document_validation max_time projection return_document sort upsert collation)a)

with {:ok, conn, _, _} <- select_server(topology_pid, :write, opts),
{:ok, doc} <- direct_command(conn, query, opts) do
{:ok, doc} <- exec_command(conn, cmd, opts) do
{:ok, doc["value"]}
end

end


@doc """
Finds a document and replaces it.
@@ -317,7 +316,7 @@
@spec find_one_and_replace(GenServer.server, collection, BSON.document, BSON.document, Keyword.t) :: result(BSON.document)
def find_one_and_replace(topology_pid, coll, filter, replacement, opts \\ []) do
_ = modifier_docs(replacement, :replace)
query = [
cmd = [
findAndModify: coll,
query: filter,
update: replacement,
@@ -333,7 +332,7 @@
opts = Keyword.drop(opts, ~w(bypass_document_validation max_time projection return_document sort upsert collation)a)

with {:ok, conn, _, _} <- select_server(topology_pid, :write, opts),
{:ok, doc} <- direct_command(conn, query, opts), do: {:ok, doc["value"]}
{:ok, doc} <- exec_command(conn, cmd, opts), do: {:ok, doc["value"]}
end

defp should_return_new(:after), do: true
@@ -352,7 +351,7 @@
"""
@spec find_one_and_delete(GenServer.server, collection, BSON.document, Keyword.t) :: result(BSON.document)
def find_one_and_delete(topology_pid, coll, filter, opts \\ []) do
query = [
cmd = [
findAndModify: coll,
query: filter,
remove: true,
@@ -364,13 +363,13 @@
opts = Keyword.drop(opts, ~w(max_time projection sort collation)a)

with {:ok, conn, _, _} <- select_server(topology_pid, :write, opts),
{:ok, doc} <- direct_command(conn, query, opts), do: {:ok, doc["value"]}
{:ok, doc} <- exec_command(conn, cmd, opts), do: {:ok, doc["value"]}
end

@doc false
@spec count(GenServer.server, collection, BSON.document, Keyword.t) :: result(non_neg_integer)
def count(topology_pid, coll, filter, opts \\ []) do
query = [
cmd = [
count: coll,
query: filter,
limit: opts[:limit],
@@ -382,7 +381,7 @@
opts = Keyword.drop(opts, ~w(limit skip hint collation)a)

# Mongo 2.4 and 2.6 returns a float
with {:ok, doc} <- command(topology_pid, query, opts),
with {:ok, doc} <- command(topology_pid, cmd, opts),
do: {:ok, trunc(doc["n"])}
end

@@ -456,7 +455,7 @@
"""
@spec distinct(GenServer.server, collection, String.t | atom, BSON.document, Keyword.t) :: result([BSON.t])
def distinct(topology_pid, coll, field, filter, opts \\ []) do
query = [
cmd = [
distinct: coll,
key: field,
query: filter,
Expand All @@ -468,7 +467,7 @@ defmodule Mongo do

with {:ok, conn, slave_ok, _} <- select_server(topology_pid, :read, opts),
opts = Keyword.put(opts, :slave_ok, slave_ok),
{:ok, doc} <- direct_command(conn, query, opts),
{:ok, doc} <- exec_command(conn, cmd, opts),
do: {:ok, doc["values"]}
end

@@ -562,20 +561,21 @@
in the document.
"""
@spec command(GenServer.server, BSON.document, Keyword.t) :: result(BSON.document)
def command(topology_pid, query, opts \\ []) do
def command(topology_pid, cmd, opts \\ []) do
rp = ReadPreference.defaults(%{mode: :primary})
rp_opts = [read_preference: Keyword.get(opts, :read_preference, rp)]
with {:ok, conn, slave_ok, _} <- select_server(topology_pid, :read, rp_opts),
opts = Keyword.put(opts, :slave_ok, slave_ok),
do: direct_command(conn, query, opts)
do: exec_command(conn, cmd, opts)
end

@doc false
@spec direct_command(pid, BSON.document, Keyword.t) :: {:ok, BSON.document | nil} | {:error, Mongo.Error.t}
def direct_command(conn, cmd, opts \\ []) do
## refactor: exec_command
@spec exec_command(pid, BSON.document, Keyword.t) :: {:ok, BSON.document | nil} | {:error, Mongo.Error.t}
def exec_command(conn, cmd, opts) do
action = %Query{action: :command}

with {:ok, _query, doc} <- DBConnection.execute(conn, action, [cmd], defaults(opts)),
with {:ok, _cmd, doc} <- DBConnection.execute(conn, action, [cmd], defaults(opts)),
{:ok, doc} <- check_for_error(doc) do
{:ok, doc}
end
@@ -587,19 +587,31 @@
@doc """
Returns the current wire version.
"""
def wire_version(conn, opts \\ []) do
@spec wire_version(pid) :: {:ok, integer} | {:error, Mongo.Error.t}
def wire_version(conn) do
cmd = %Query{action: :wire_version}
with {:ok, _query, version} <- DBConnection.execute(conn, cmd, %{}, defaults(opts)) do
with {:ok, _cmd, version} <- DBConnection.execute(conn, cmd, %{}, defaults([])) do
{:ok, version}
end
end

@doc """
Returns the limits of the database.
"""
@spec limits(pid) :: {:ok, BSON.document} | {:error, Mongo.Error.t}
def limits(conn) do
cmd = %Query{action: :limits}
with {:ok, _cmd, limits} <- DBConnection.execute(conn, cmd, %{}, defaults([])) do
{:ok, limits}
end
end

@doc """
Similar to `command/3` but unwraps the result and raises on error.
"""
@spec command!(GenServer.server, BSON.document, Keyword.t) :: result!(BSON.document)
def command!(topology_pid, query, opts \\ []) do
bangify(command(topology_pid, query, opts))
def command!(topology_pid, cmd, opts \\ []) do
bangify(command(topology_pid, cmd, opts))
end

@doc """
@@ -640,7 +652,7 @@
] |> filter_nils()

with {:ok, conn, _, _} <- select_server(topology_pid, :write, opts),
{:ok, doc} <- direct_command(conn, cmd, opts) do
{:ok, doc} <- exec_command(conn, cmd, opts) do
case doc do
%{"writeErrors" => _} -> {:error, %Mongo.WriteError{n: doc["n"], ok: doc["ok"], write_errors: doc["writeErrors"]}}
_ ->
@@ -685,7 +697,7 @@
wtimeout: Keyword.get(opts, :wtimeout)
} |> filter_nils()

query = [
cmd = [
insert: coll,
documents: docs,
ordered: Keyword.get(opts, :ordered),
Expand All @@ -694,7 +706,7 @@ defmodule Mongo do
] |> filter_nils()

with {:ok, conn, _, _} <- select_server(topology_pid, :write, opts),
{:ok, doc} <- direct_command(conn, query, opts) do
{:ok, doc} <- exec_command(conn, cmd, opts) do
case doc do
%{"writeErrors" => _} -> {:error, %Mongo.WriteError{n: doc["n"], ok: doc["ok"], write_errors: doc["writeErrors"]}}
_ ->
@@ -757,15 +769,15 @@
collation: Keyword.get(opts, :collation)
} |> filter_nils()

query = [
cmd = [
delete: coll,
deletes: [filter],
ordered: Keyword.get(opts, :ordered),
writeConcern: write_concern
] |> filter_nils()

with {:ok, conn, _, _} <- select_server(topology_pid, :write, opts),
{:ok, doc} <- direct_command(conn, query, opts) do
{:ok, doc} <- exec_command(conn, cmd, opts) do
case doc do
%{"writeErrors" => _} -> {:error, %Mongo.WriteError{n: doc["n"], ok: doc["ok"], write_errors: doc["writeErrors"]}}
%{ "ok" => _ok, "n" => n } ->
@@ -885,7 +897,7 @@
] |> filter_nils()

with {:ok, conn, _, _} <- select_server(topology_pid, :write, opts),
{:ok, doc} <- direct_command(conn, cmd, opts) do
{:ok, doc} <- exec_command(conn, cmd, opts) do

case doc do

@@ -966,7 +978,7 @@
def select_server(topology_pid, type, opts \\ []) do
with {:ok, servers, slave_ok, mongos?} <- select_servers(topology_pid, type, opts) do
if Enum.empty? servers do
      {:ok, nil, slave_ok, mongos?} # todo: why is [] returned? nil would be better
{:ok, nil, slave_ok, mongos?}
else
with {:ok, connection} <- servers |> Enum.take_random(1) |> Enum.at(0)
|> get_connection(topology_pid) do