
Commit

Merge remote-tracking branch 'upstream/master'
Conflicts:
	.gitignore
olear committed Sep 20, 2016
2 parents 9a72f42 + 8f64da4 commit 1de745a
Showing 111 changed files with 1,299 additions and 8,749 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -5,7 +5,9 @@ socorro/unittest/config/*.py
*.log
distribute*.tar.gz
analysis/build/
breakpad/
breakpad.tar.gz
depot_tools/
nosetests.xml
scripts/config/*.py
socorro/unittest/config/*.py
40 changes: 40 additions & 0 deletions alembic/versions/5bafdc19756c_drop_server_status_table.py
@@ -0,0 +1,40 @@
"""drop server_status table
Revision ID: 5bafdc19756c
Revises: 89ef86a3d57a
Create Date: 2016-09-13 15:56:53.898014
"""

# revision identifiers, used by Alembic.
revision = '5bafdc19756c'
down_revision = '89ef86a3d57a'

from alembic import op
from socorrolib.lib import citexttype, jsontype, buildtype
from socorrolib.lib.migrations import fix_permissions, load_stored_proc

import sqlalchemy as sa
from sqlalchemy import types
from sqlalchemy.dialects import postgresql
from sqlalchemy.sql import table, column

from sqlalchemy.dialects import postgresql


def upgrade():
op.drop_table('server_status')


def downgrade():
op.create_table('server_status',
sa.Column('avg_process_sec', sa.REAL(), autoincrement=False, nullable=True),
sa.Column('avg_wait_sec', sa.REAL(), autoincrement=False, nullable=True),
sa.Column('date_created', postgresql.TIMESTAMP(timezone=True), autoincrement=False, nullable=False),
sa.Column('date_oldest_job_queued', postgresql.TIMESTAMP(timezone=True), autoincrement=False, nullable=True),
sa.Column('date_recently_completed', postgresql.TIMESTAMP(timezone=True), autoincrement=False, nullable=True),
sa.Column('id', sa.INTEGER(), nullable=False),
sa.Column('processors_count', sa.INTEGER(), autoincrement=False, nullable=True),
sa.Column('waiting_job_count', sa.INTEGER(), autoincrement=False, nullable=False),
sa.PrimaryKeyConstraint('id', name=u'server_status_pkey')
)
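The migration above is just a reversible DDL pair: `upgrade()` drops `server_status`, `downgrade()` recreates it. A minimal stdlib sketch of the same drop/recreate shape, using SQLite in place of PostgreSQL (column types are approximations of the originals; this is not the Alembic code):

```python
import sqlite3

# Approximate SQLite rendering of the server_status schema shown above.
DDL = """
CREATE TABLE server_status (
    avg_process_sec REAL,
    avg_wait_sec REAL,
    date_created TIMESTAMP NOT NULL,
    date_oldest_job_queued TIMESTAMP,
    date_recently_completed TIMESTAMP,
    id INTEGER NOT NULL PRIMARY KEY,
    processors_count INTEGER,
    waiting_job_count INTEGER NOT NULL
)
"""

def upgrade(conn):
    # Mirrors op.drop_table('server_status') in the migration.
    conn.execute("DROP TABLE server_status")

def downgrade(conn):
    # Mirrors op.create_table(...) in the migration.
    conn.execute(DDL)

def table_exists(conn, name):
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type='table' AND name=?", (name,)
    ).fetchone()
    return row is not None

conn = sqlite3.connect(":memory:")
downgrade(conn)                                # recreate the table
assert table_exists(conn, "server_status")
upgrade(conn)                                  # drop it again
assert not table_exists(conn, "server_status")
```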
12 changes: 0 additions & 12 deletions analysis/hbase_schema

This file was deleted.

36 changes: 0 additions & 36 deletions config/cron_submitter.ini-dist
@@ -119,41 +119,6 @@
# converter: str
dump_file_suffix='.dump'

# name: forbidden_keys
# doc: a comma delimited list of keys banned from the processed crash in HBase
# converter: socorro.external.hbase.connection_context.<lambda>
forbidden_keys='email, url, user_id, exploitability'

# name: hbase_connection_pool_class
# doc: the class responsible for pooling and giving out HBaseconnections
# converter: configman.converters.class_converter
hbase_connection_pool_class='socorro.external.hbase.connection_context.HBaseConnectionContextPooled'

# name: hbase_host
# doc: Host to HBase server
# converter: str
hbase_host='localhost'

# name: hbase_port
# doc: Port to HBase server
# converter: int
hbase_port='9090'

# name: hbase_timeout
# doc: timeout in milliseconds for an HBase connection
# converter: int
hbase_timeout='5000'

# name: number_of_retries
# doc: Max. number of retries when fetching from hbaseClient
# converter: int
number_of_retries='2'

# name: source_implementation
# doc: a class for a source of raw crashes
# converter: configman.converters.class_converter
source_implementation='socorro.external.hbase.crashstorage.HBaseCrashStorage'

# name: sql
# doc: an sql string that selects crash_ids
# converter: str
@@ -195,4 +160,3 @@
# doc: the number of crashes to submit (all, forever, 1...)
# converter: str
number_of_submissions='all'

1 change: 0 additions & 1 deletion config/crontabber.ini-dist
@@ -245,7 +245,6 @@
socorro.cron.jobs.matviews.CrashAduByBuildSignatureCronApp|1d|07:30
#socorro.cron.jobs.ftpscraper.FTPScraperCronApp|1h
#socorro.cron.jobs.automatic_emails.AutomaticEmailsCronApp|1h
socorro.cron.jobs.serverstatus.ServerStatusCronApp|5m
socorro.cron.jobs.reprocessingjobs.ReprocessingJobsApp|5m
socorro.cron.jobs.matviews.SignatureSummaryProductsCronApp|1d|05:00
socorro.cron.jobs.matviews.SignatureSummaryInstallationsCronApp|1d|05:00
91 changes: 2 additions & 89 deletions config/middleware.ini-dist
@@ -62,40 +62,6 @@
# umask to use for new files
#umask=18

[[hb]]

#+include ./common_hb.ini

# delays in seconds between retries
#backoff_delays=10, 30, 60, 120, 300

# the suffix used to identify a dump file (for use in temp files)
#dump_file_suffix=.dump

# the class responsible for proving an hbase connection
#hbase_connection_context_class=socorro.external.hb.connection_context.HBaseConnectionContext

# Host to HBase server
#hbase_host=localhost

# Port to HBase server
#hbase_port=9090

# timeout in milliseconds for an HBase connection
#hbase_timeout=5000

# the maximum number of new crashes to yield at a time
#new_crash_limit=1000000

# a local filesystem path where dumps temporarily during processing
#temporary_file_system_storage_path=/tmp

# a class that will execute transactions
#transaction_executor_class=socorro.database.transaction_executor.TransactionExecutorWithInfiniteBackoff

# seconds between log during retries
#wait_log_interval=10

[[logging]]

#+include ./common_logging.ini
@@ -263,59 +229,6 @@
# see "resource.fs.umask" for the default or override it here
#umask=18

[hbase]

# delays in seconds between retries
# see "resource.hb.backoff_delays" for the default or override it here
#backoff_delays=10, 30, 60, 120, 300

# the suffix used to identify a dump file (for use in temp files)
# see "resource.hb.dump_file_suffix" for the default or override it here
#dump_file_suffix=.dump

# a list of keys not allowed in a redacted processed crash
# see "resource.redactor.forbidden_keys" for the default or override it here
#forbidden_keys=url, email, user_id, exploitability,json_dump.sensitive,upload_file_minidump_flash1.json_dump.sensitive,upload_file_minidump_flash2.json_dump.sensitive,upload_file_minidump_browser.json_dump.sensitive,memory_info

# None
#hbase_class=socorro.external.hb.crashstorage.HBaseCrashStorage

# the class responsible for proving an hbase connection
# see "resource.hb.hbase_connection_context_class" for the default or override it here
#hbase_connection_context_class=socorro.external.hb.connection_context.HBaseConnectionContext

# Host to HBase server
# see "resource.hb.hbase_host" for the default or override it here
#hbase_host=localhost

# Port to HBase server
# see "resource.hb.hbase_port" for the default or override it here
#hbase_port=9090

# timeout in milliseconds for an HBase connection
# see "resource.hb.hbase_timeout" for the default or override it here
#hbase_timeout=5000

# the maximum number of new crashes to yield at a time
# see "resource.hb.new_crash_limit" for the default or override it here
#new_crash_limit=1000000

# the name of the class that implements a 'redact' method
# see "resource.redactor.redactor_class" for the default or override it here
#redactor_class=socorro.external.crashstorage_base.Redactor

# a local filesystem path where dumps temporarily during processing
# see "resource.hb.temporary_file_system_storage_path" for the default or override it here
#temporary_file_system_storage_path=/tmp

# a class that will execute transactions
# see "resource.hb.transaction_executor_class" for the default or override it here
#transaction_executor_class=socorro.database.transaction_executor.TransactionExecutorWithInfiniteBackoff

# seconds between log during retries
# see "resource.hb.wait_log_interval" for the default or override it here
#wait_log_interval=10

[http]

[[correlations]]
@@ -336,9 +249,9 @@
[implementations]

# list of packages for service implementations
#implementation_list=psql: socorro.external.postgresql, hbase: socorro.external.hb, es: socorro.external.es, fs: socorro.external.fs, http: socorro.external.http, rabbitmq: socorro.external.rabbitmq
#implementation_list=psql: socorro.external.postgresql, boto: socorro.external.boto, es: socorro.external.es, fs: socorro.external.fs, http: socorro.external.http, rabbitmq: socorro.external.rabbitmq

# comma separated list of class overrides, e.g `Crashes: hbase`
# comma separated list of class overrides, e.g `Crashes: boto`
#service_overrides=CrashData: fs, Correlations: http, CorrelationsSignatures: http, SuperSearch: es, Priorityjobs: rabbitmq, Query: es

[introspection]
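The `implementation_list` and `service_overrides` options above are comma-delimited `key: dotted.module` mappings that configman converts for Socorro. A minimal stand-in parser showing the shape of that conversion (a sketch; configman's real converter may differ):

```python
def parse_mapping(value):
    """Parse a comma-delimited 'key: dotted.module' string into a dict.
    Stand-in sketch for illustration, not configman's actual converter."""
    result = {}
    for pair in value.split(","):
        key, _, target = pair.partition(":")
        result[key.strip()] = target.strip()
    return result

# The default implementation_list from the config fragment above.
impls = parse_mapping(
    "psql: socorro.external.postgresql, boto: socorro.external.boto, "
    "es: socorro.external.es, fs: socorro.external.fs, "
    "http: socorro.external.http, rabbitmq: socorro.external.rabbitmq"
)
print(impls["boto"])  # -> socorro.external.boto
```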
2 changes: 1 addition & 1 deletion docs/configuring-crash-stats.rst
@@ -142,7 +142,7 @@ underlying data stores:

.. code-block:: bash
implementations__implementation_list='psql: socorro.external.postgresql, fs: socorro.external.filesystem, es: socorro.external.es, http: socorro.external.http, rabbitmq: socorro.external.rabbitmq, hb: socorro.external.fs'
implementations__implementation_list='psql: socorro.external.postgresql, fs: socorro.external.filesystem, es: socorro.external.es, http: socorro.external.http, rabbitmq: socorro.external.rabbitmq'
implementations__service_overrides='Correlations: http, CorrelationsSignatures: http, SuperSearch: es, Priorityjobs: rabbitmq, Search: es, Query: es'
# Pluggable Elasticsearch implementation
elasticsearch__elasticsearch_class='socorro.external.es.connection_context.ConnectionContext'
2 changes: 1 addition & 1 deletion docs/configuring-socorro.rst
@@ -188,7 +188,7 @@ in AWS using Consul at https://github.com/mozilla/socorro-infra/

Socorro has a very powerful and expressive configuration system, and can
be configured to read from and write to a number of different data stores
(S3, Elasticsearch, HBase, PostgreSQL) and use queues (RabbitMQ)
(S3, Elasticsearch, PostgreSQL) and use queues (RabbitMQ)

For instance, to have processor store crashes to both to the filesystem and to
ElasticSearch:
2 changes: 1 addition & 1 deletion docs/development/addaservice.rst
@@ -16,7 +16,7 @@ URL with parameters. Documentation for each service is available in the
Those services are not containing any code, but are only interfaces. They are
using other resources from the external module. That external module is
composed of one submodule for each external resource we are using. For example,
there is a PostgreSQL submodule, an elasticsearch submodule and an HBase
there is a PostgreSQL submodule, an elasticsearch submodule and a boto (AWS S3)
submodule.

You will also find some common code among external resources in
61 changes: 6 additions & 55 deletions docs/development/api/crashstorage.rst
@@ -32,16 +32,16 @@ Concrete implementation:
* `NullCrashStorage`: Silently ignores everything it is told to do.

Examples of other concrete implementations are: `PostgreSQLCrashStorage`,
`HBaseCrashStorage`.
`BotoCrashStorage`.

CrashStorage containers for aggregating multiple crash storage implementations:

* `PolyCrashStorage`: Container for other crash storage systems.
* `FallbackCrashStorage`: Container for two other crash storage systems,
a primary and a secondary. Attempts on the primary, if it fails it will
fallback to the secondary. In use when we had primary/secondary HBase.
Can be heterogeneous, example: Hbase + filesystem and use crashmovers to
move from filesystem into hbase when hbase comes back.
fallback to the secondary. In use when we have cutover between data stores.
Can be heterogeneous, example: S3 + filesystem and use crashmovers to
move from filesystem into S3 when S3 comes back.
* `PrimaryDeferredStorage`: Container for two different storage systems and a
predicate function. If predicate is false, store in primary, otherwise
store in secondary. Usecase: situation where we want crashes to be put
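The primary/secondary behavior of `FallbackCrashStorage` described above can be sketched as follows (an illustrative sketch with hypothetical dict-backed stores, not Socorro's actual implementation):

```python
class FallbackCrashStorage:
    """Sketch of the fallback pattern: try the primary store, and on
    failure save to the secondary instead."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary

    def save_raw_crash(self, crash_id, raw_crash):
        try:
            self.primary.save_raw_crash(crash_id, raw_crash)
        except Exception:
            # Primary store is unavailable: fall back to the secondary.
            self.secondary.save_raw_crash(crash_id, raw_crash)


class DictStore:
    """Hypothetical in-memory store used only for this illustration."""

    def __init__(self, broken=False):
        self.broken = broken
        self.crashes = {}

    def save_raw_crash(self, crash_id, raw_crash):
        if self.broken:
            raise RuntimeError("store unavailable")
        self.crashes[crash_id] = raw_crash


primary, secondary = DictStore(broken=True), DictStore()
store = FallbackCrashStorage(primary, secondary)
store.save_raw_crash("abc123", {"product": "Firefox"})
print("abc123" in secondary.crashes)  # -> True
```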
@@ -142,7 +142,7 @@ Use cases:

* For Mozilla use by the collectors.
* For other users, you can use this class as your primary storage instead of
HBase. Be sure to implement this in collectors, crashmovers, processors and
S3. Be sure to implement this in collectors, crashmovers, processors and
middleware (depending on which components you use in your configuration).

`Important ops note:`
@@ -168,48 +168,6 @@ Classes:
in-filesystem queueing techniques so that we know which crashes are new.
Backwards compatible with `socorro.external.filesystem` (aka the 2009 system).

socorro.external.hb
-------------------

socorro.external.hb.crashstorage
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is used by crashmovers, processors. In the future, our middleware will
also use this instead of socorro.external.hbase. Can store raw crashes and
dumps. It has no knowledge of aggregations or normalized data.

*TODO: Needs crash_data to be implemented for middleware*

Special functions:

* `crash_id_to_timestamped_row_id`: HBase uses a different primary key than our
internal UUID. Taking the first character and last six, and copying them to the
front of the UUID. First character is the salt for the region, and the next
six provide the date, for ordering. Sometimes you'll see 'ooid' or 'uuid' in
the docs, but we really mean `crash_id`.
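The row-id scheme described above — first character as region salt, last six characters as date, both copied to the front — can be sketched like this (an assumed reading of the description, not the exact Socorro function):

```python
def crash_id_to_timestamped_row_id(crash_id):
    # Prefix the first character (region salt) and the last six
    # characters (date, for ordering) onto the original crash_id.
    return crash_id[0] + crash_id[-6:] + crash_id

# Hypothetical crash_id whose trailing six characters encode a date.
crash_id = "0bba929f-8721-460c-dead-a43c20071025"
row_id = crash_id_to_timestamped_row_id(crash_id)
print(row_id)  # -> "0071025" followed by the original crash_id
```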

Implementation:

* `HBaseCrashStorage`: implements access to HBase. HBase schema is defined in
``analysis/hbase_schema``.

Exceptions:

* `BadCrashIdException`: just passes

socorro.external.hb.connection_context
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* `HBaseConnection`: all of the code that implements the core connection. Loose
wrapper around a bare socket speaking Thrift protocol. Commit/rollback are
noops.

* `HBaseConnectionContext`: In production use. A factory in the form of a
functor for creating the HBaseConnection instances.

* `HBasePersistentConnectionContext`: These are "pooled" so you can use them
again without closing. We don't use it and appears to be broken.

socorro.external.postgresql
---------------------------

@@ -303,11 +261,6 @@

* Preceded `socorro.external.fs`.

socorro.external.hbase
^^^^^^^^^^^^^^^^^^^^^^

* Still in use by the middleware for `crash_data`.

socorro.storage
^^^^^^^^^^^^^^^

@@ -331,7 +284,7 @@ Which classes are used with which _app
using `PolyCrashStore`. In testing we use `socorro.external.fs`,
`socorro.external.rabbitmq`, and `socorro.external.postgresql`.

* `socorro.middleware.middleware_app`: In production: `socorro.external.hbase`.
* `socorro.middleware.middleware_app`: In production: `socorro.external.boto`.
In testing: we use `socorro.external.fs` and `socorro.external.postgresql`.

* `socorro.collector.submitter_app`: Defines its own storage classes:
@@ -340,8 +293,6 @@ Which classes are used with which _app
to get a list of crashstorage ids and uses any other crashstorage as a source
for the raw crashes that it pulls.

*TODO: update submitter_app to use the new socorro.external.hb instead of hbase*

Which classes can be used together
----------------------------------

