Add post, absent and confirmed ballots to results.csv, results table #43

Open
maciej opened this issue Sep 28, 2014 · 24 comments

@maciej
Contributor

maciej commented Sep 28, 2014

Results currently ignore post (POŠTOM), absent (ODUSTVO) and confirmed (POTVRÐENI) ballots.

@maciej maciej added this to the 2014 General Elections milestone Sep 28, 2014
@maciej
Contributor Author

maciej commented Sep 28, 2014

Fixing this will affect the transformation of raw data into results.csv. It might also affect the data model.

cc @informationchef @mihi-tr

@mihi-tr
Contributor

mihi-tr commented Sep 30, 2014

@maciej not sure we had this in the original data. Will check back.

@mihi-tr mihi-tr self-assigned this Sep 30, 2014
@maciej
Contributor Author

maciej commented Sep 30, 2014

@mihi-tr yes, they are there.
Take, for example, 2010_FBiH_511_IZBORNA_JEDINICA_1.csv: they are the last 3 lines of the data set (not counting the totals).

@mihi-tr
Contributor

mihi-tr commented Sep 30, 2014

@maciej would you want them to be treated similarly to parties? E.g. have a line for each of them?

@maciej
Contributor Author

maciej commented Sep 30, 2014

@mihi-tr I'm not sure if I understand you.
From what I can tell, these votes add up to the party results. In the file I linked to, you'll notice that parties are columns whereas polling stations are rows.
So, I guess my answer should be: I want them to be treated as polling stations.

@mihi-tr
Contributor

mihi-tr commented Sep 30, 2014

@maciej in the cleaned data we have each polling station/party combination as a row. Would you just want a row with the missing, invalid etc. as well? Or shall I simply include this in the polling station data (right now it gets removed, afair)?

@maciej
Contributor Author

maciej commented Sep 30, 2014

@mihi-tr OK, I'd say I want a single row for a (Party, ElectionUnit, NonPollingStationVotes) tuple.
Where NonPollingStationVotes are either post, absent or confirmed.
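A minimal sketch of that reshaping, assuming the raw layout described above (first column is the polling station label, every other column is a party) and that the special rows carry exactly the labels POŠTOM, ODUSTVO and POTVRÐENI quoted in the first comment. The function name `special_vote_rows`, the sample data and the `TOTAL` row label are hypothetical. It uses only the standard library; `pandas.melt` would do the same job.

```python
import csv
import io

# Assumed raw row labels (as quoted in the first comment) -> English names.
SPECIAL_ROWS = {
    "PO\u0160TOM": "post",          # POŠTOM
    "ODUSTVO": "absent",
    "POTVR\u00d0ENI": "confirmed",  # POTVRÐENI
}


def special_vote_rows(raw_csv_text, election_unit_id):
    """Yield one (party, election_unit, kind, votes) tuple per party for each
    post/absent/confirmed row. Assumes the first column holds the polling
    station label and every other column is a party."""
    reader = csv.reader(io.StringIO(raw_csv_text))
    parties = next(reader)[1:]
    for row in reader:
        if not row:
            continue
        kind = SPECIAL_ROWS.get(row[0].strip().upper())
        if kind is None:
            # Ordinary polling stations and the totals row are skipped.
            continue
        for party, votes in zip(parties, row[1:]):
            yield (party, election_unit_id, kind, int(votes))


# Made-up sample in the assumed layout: stations as rows, parties as columns.
SAMPLE = """station,Party A,Party B
001A,120,95
PO\u0160TOM,4,7
ODUSTVO,2,1
POTVR\u00d0ENI,3,0
TOTAL,129,103
"""

rows = list(special_vote_rows(SAMPLE, 1))
```

Each of the three special rows turns into one row per party, which matches the (Party, ElectionUnit, NonPollingStationVotes) tuple proposed above.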

@maciej
Contributor Author

maciej commented Oct 12, 2014

The POST, ABSENT and CONFIRMED votes are available only at election unit granularity. We do not have those results broken down by municipality, let alone by polling station.

The implications of that are far reaching. The current system design – all the way from the data representation, through the result endpoints up to the presentation layer – assumes that every result is assigned to a municipality. Fixing this issue will not be trivial and will require redesign work for all the layers of the system.

@darkobrkan
Contributor

Hey people, this might sound a bit naive, but I have also discussed it with a few people, and I think there might be a way to do it without changing the system design. Why don't we introduce the post, absent and confirmed ballots as new municipalities? Something like phantom municipalities that would just be extra in the data, but not shown on the map. They could even be shown separately on the map as a separate circle or whatever. We don't need municipality or polling station granularity for them in this case. Would this make things easier and more solvable at this point? We really need to figure this out if we want to finish this, and I think all other actions depend on it. Solving this will open up a lot of options, including presenting the data from these elections, and could really shift the success of the project in a positive direction.

Please let me know what you think, whether it would be solvable this way, and whether you could help out in some way.

Best,
d.


@maciej
Contributor Author

maciej commented Oct 17, 2014

Hey @darkobrkan,
Phantom municipalities seem to be a good way to correct mandate computations. However, that will make the results returned from the API look odd. We'd have to agree to publish only the platform without the API. There is nothing wrong with doing it later, though.
I'm super busy right now. I might have a chance to look at this in early November at the earliest. If that does not work out, then it's around the weekend of 14-16 November.
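A sketch of why phantom municipalities help, using the mock IDs 997, 998 and 999 that mihi-tr assigned later in this thread and the column names from the results header quoted further down: per-unit party totals (the input to the mandate computation) include the phantom rows, while the map layer simply filters them out. The helper names and the sample rows are made up for illustration.

```python
from collections import defaultdict

# Mock municipality IDs for post, absent and confirmed ballots.
PHANTOM_MUNICIPALITY_IDS = {997, 998, 999}


def party_totals_per_unit(rows):
    """Sum votes per (election_unit_ID, party), phantom rows included,
    so the totals fed into mandate computations are complete."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["election_unit_ID"], row["party"])] += row["votes"]
    return dict(totals)


def map_rows(rows):
    """Keep only rows tied to a real municipality, i.e. drawable on the map."""
    return [r for r in rows if r["municipality_ID"] not in PHANTOM_MUNICIPALITY_IDS]


# Made-up rows: one real municipality (10) and one phantom row (997 = post).
rows = [
    {"election_unit_ID": 1, "municipality_ID": 10, "party": "Party A", "votes": 120},
    {"election_unit_ID": 1, "municipality_ID": 997, "party": "Party A", "votes": 4},
    {"election_unit_ID": 1, "municipality_ID": 10, "party": "Party B", "votes": 95},
]

totals = party_totals_per_unit(rows)
visible = map_rows(rows)
```

The phantom row counts toward Party A's unit total but never reaches the map, which is the trade-off described above: correct totals, at the cost of odd-looking municipality IDs in raw API output.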

@mihi-tr
Contributor

mihi-tr commented Oct 22, 2014

Sorry for dropping out of the discussion; I was quite busy with trainings and changing jobs (yes, I'm not working with School of Data anymore). I can probably work on cleaning this up over the weekend.

Did we get the new results yet?

@maciej
Contributor Author

maciej commented Oct 22, 2014

@mihi-tr hey Michael!
If you could regenerate results.csv with the now-missing data, using some fabricated municipality_ids, that would be awesome. The elections visualisation tool would then be releasable.

@mihi-tr
Contributor

mihi-tr commented Oct 26, 2014

@maciej
Contributor Author

maciej commented Oct 26, 2014

@mihi-tr looks good. I haven't verified the data, but I assume it's correct. Sadly, for the time being we also need a fake municipality_id for every electoral_unit.

@mihi-tr
Contributor

mihi-tr commented Oct 26, 2014

Do you want to introduce them, or shall I? E.g. 999, 998, 997.

@maciej
Contributor Author

maciej commented Oct 26, 2014

@mihi-tr I think it's better if you do it, because then the Google Refine settings/scripts will reflect all processed CSVs. Please lay out the file's columns the same way as in results.csv.
If I did it – it'd be in Python and it would introduce another step and tool to the toolchain.

Then again, if it's a hassle in Refine – no big deal, it's trivial in Pandas.
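For reference, a minimal standard-library sketch of that step, assuming the results.csv column order from the header quoted later in this thread minus the unused candidate column, the faux polling station labels post/absent/confirmed, and the mock IDs 997, 998 and 999 proposed above. `to_results_csv` and `FAKE_MUNICIPALITY_ID` are hypothetical names; party_abrev is left empty because this sketch has no source for it.

```python
import csv
import io

# Hypothetical mapping from the kind of ballot to a mock municipality_ID.
FAKE_MUNICIPALITY_ID = {"post": 997, "absent": 998, "confirmed": 999}

# Assumed results.csv layout (header from this thread, candidate column dropped).
RESULTS_COLUMNS = ["election_unit_ID", "election_type", "year",
                   "polling_station_ID", "municipality_ID", "party",
                   "party_abrev", "votes"]


def to_results_csv(rows, election_type, year):
    """rows: iterable of (party, election_unit_ID, kind, votes) tuples,
    where kind is 'post', 'absent' or 'confirmed'."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=RESULTS_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for party, unit, kind, votes in rows:
        writer.writerow({
            "election_unit_ID": unit,
            "election_type": election_type,
            "year": year,
            "polling_station_ID": kind,  # faux polling station label
            "municipality_ID": FAKE_MUNICIPALITY_ID[kind],
            "party": party,
            "party_abrev": "",  # not available in this sketch
            "votes": votes,
        })
    return out.getvalue()
```

Keeping the output column order identical to results.csv means the same importer can load both files.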

mihi-tr added a commit that referenced this issue Oct 26, 2014
Addresses #43 on GitHub.

The format now follows the results format. The following mock Municipality
IDs have been created:

997 post
998 absent
999 confirmed
@mihi-tr
Contributor

mihi-tr commented Oct 26, 2014

@maciej take a look now - once we're done I'll also add the refine .json protocol.

@maciej
Contributor Author

maciej commented Oct 27, 2014

This is the header of "General-Election-Results-2010.csv":

election_unit_ID,election_type,year,polling_station_ID,municipality_ID,party,party_abrev,votes,candidate

For some reason it contains a candidate column, which I think was supposed to refer to a candidate name, something we never had data for. Without it the data won't import. Since the column exists in the MySQL table and in our results.csv file, and it's trivial to remove, I've added a task for that: #47. I'll look into that later this week, reimport the post_absent_confirmed_2010.csv file and get back to you.
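A minimal sketch of dropping such a column from a CSV file with the standard library; `drop_column` is a hypothetical helper, and the matching change to the MySQL table is not shown.

```python
import csv
import io


def drop_column(csv_text, column):
    """Return csv_text with the named column (e.g. the unused 'candidate') removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = [f for f in reader.fieldnames if f != column]
    out = io.StringIO()
    # extrasaction="ignore" makes the writer skip the dropped key in each row.
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()
```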

@darkobrkan
Contributor

The new data is out. There is, though, one change that might be problematic: the election units have changed, but only for the Republika Srpska National Assembly, which now has 9 instead of 6 units. Everything else is the same.

As for the data gathering, we could probably get the data in the same format as we did for the other years. Would that work?


@maciej
Contributor Author

maciej commented Oct 31, 2014

@darkobrkan, I've created a separate issue for looking into the 2014 data. It's here: #48.

@mihi-tr
Contributor

mihi-tr commented Nov 15, 2014

@maciej what is the status on this?

@maciej
Contributor Author

maciej commented Nov 20, 2014

Hey @mihi-tr, I'm currently swamped with my day job. I cannot tell you when I'll be able to look at this. Perhaps during the weekend I'll recover enough to look at the code.

@maciej
Contributor Author

maciej commented Nov 29, 2014

@mihi-tr I've restarted working on this. I should give you some updates before end of the weekend.

I've looked at the updated post_absent_confirmed_2010.csv file and it seems some results have the Bosnian names for the faux polling_station_id and others have the English ones. On line 1951 you have POTVRÐENI, but the next line has post. Is that intended?
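One way to make the faux polling station labels consistent is to map every variant to a single English name before import. This is a sketch: the Bosnian labels are the ones quoted in the first comment, and accepting both Ð (U+00D0) and Đ (U+0110) is an assumption, since the two letters look identical and are easy to mix up during encoding conversions. Unknown labels raise `KeyError` so they surface instead of being silently dropped.

```python
import unicodedata

# Assumed label variants -> canonical English name.
LABELS = {
    "PO\u0160TOM": "post",          # POŠTOM
    "ODUSTVO": "absent",
    "POTVR\u00d0ENI": "confirmed",  # POTVRÐENI with Ð (U+00D0)
    "POTVR\u0110ENI": "confirmed",  # POTVRĐENI with Đ (U+0110)
    "POST": "post",
    "ABSENT": "absent",
    "CONFIRMED": "confirmed",
}


def normalize_label(label):
    """Map a faux polling_station_id label to 'post', 'absent' or 'confirmed'."""
    # NFC composes e.g. 'S' + combining caron into 'Š' before the lookup.
    key = unicodedata.normalize("NFC", label.strip()).upper()
    return LABELS[key]
```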

@maciej
Contributor Author

maciej commented Nov 30, 2014

The post, absent and confirmed votes are now imported.
