Frequently Asked Questions about Damegender

Ethical Values

Is Damegender reducing the gender gap?

We say Damegender can help reduce the gender gap. Reducing the gender gap is a United Nations objective, so states must contribute to it. To reduce the gender gap we must measure where it occurs. Damegender is Free Software with data extracted from the states, so we can measure men and women from a critical point of view that peers can review.

Can I guess the sexual orientation with Damegender?

No, you can’t. You can only guess the gender as the states register it. All states use a binary ideology about gender (male and female), and so does Damegender.

Is Damegender Free Software?

Yes, it is. We distribute the software on PyPI under the GPLv3. You can read the license notice in every file and the full text in the GPL.txt file.

Why are you guessing male or female?

Some applications answer unknown, and in Damegender we think this option is OK only when the name is not in the database. In that case we show a message about the problem with the chosen dataset and give a gender guessed with machine learning, to offer the user an intuition.

Some LGTB associations are asking for non-binary gender options. When these associations or collectives win recognition of this idea in some state, we will be interested in receiving issues about it.

The standards do have ideas about classifying male, female and other options. ISO/IEC 5218 proposes a norm for coding gender: “0 as not known”, “1 as male”, “2 as female” and “9 as not applicable”. RFC 6350, in its Gender section, has these categories: “m as male”, “f as female”, “o as other”, “n as none or not applicable” and “u as unknown”.
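The two coding schemes above can be written down as lookup tables. This is a minimal sketch for illustration only: the names `ISO_5218`, `RFC_6350`, `ISO_TO_VCARD` and `iso_to_vcard` are hypothetical, and the mapping between the two standards is an assumption of this example, not something either standard or Damegender defines.

```python
# ISO/IEC 5218 numeric codes, as listed above.
ISO_5218 = {
    0: "not known",
    1: "male",
    2: "female",
    9: "not applicable",
}

# RFC 6350 (vCard) gender codes, as listed above.
RFC_6350 = {
    "M": "male",
    "F": "female",
    "O": "other",
    "N": "none or not applicable",
    "U": "unknown",
}

# Hypothetical mapping between the two schemes (an assumption of this sketch).
# ISO/IEC 5218 has no code for "other", so "O" has no ISO counterpart here.
ISO_TO_VCARD = {0: "U", 1: "M", 2: "F", 9: "N"}


def iso_to_vcard(code: int) -> str:
    """Translate an ISO/IEC 5218 code into an RFC 6350 gender letter."""
    return ISO_TO_VCARD[code]


if __name__ == "__main__":
    for code, label in ISO_5218.items():
        letter = iso_to_vcard(code)
        print(code, label, "->", letter, RFC_6350[letter])
```

Note that neither scheme is limited to a binary answer: both reserve codes for the unknown case, and RFC 6350 adds an explicit “other”.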

Do you foresee ethical problems in using machine learning to guess gender from names?

Machine learning is similar to statistics, so don’t be afraid of using maths when you are coding. Damegender can be used on a laptop, so you don’t depend on a very big company such as Amazon to use machine learning.

If you enter a word such as chair, table, etc., you could have a gender ethical problem because you are giving a gender to an object. In Afghanistan there are women with no names who are referred to as things, objects, …, so the problem is dangerous. You can use Damegender to understand whether a person is talking about a male or a female, but you must understand that you must not refer to yourself or to other people as things. Please be careful, it’s your responsibility.

More about Afghanistan and names:

Is Damegender helping to preserve cultural identity?

Yes, it is. When you can find out in which countries names and surnames are used, you can understand cultural identity. For example, Spanish names and surnames can be found in many countries in South America, but not in the same quantities.

Why are you using the word Damegender?

Damegender means Gender Detection Tool from the name, coded by David Arroyo MEnéndez (DAME). There are problems in some communities that can be detected by retrieving data from the Internet, and people can claim gender justice. “Dame” in Spanish is “give me” in English, so the word reads as “give me gender justice” in Spanglish :D

I understand that the word dame in French or English refers to a female, as in Notre Dame. But in Spain Dame is a male name. So the name Damegender is an intelligence challenge: discovering whether the gender of dame is male or female. What’s your country?

$ python3 main.py dame --total=es
dame's gender is male
probability: 1.0
420 males for dame from INE.es
0 females for dame from INE.es
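The probability in this output is consistent with a simple ratio of counts in the chosen dataset. The sketch below shows that arithmetic only; it is an assumption about how such a figure can be derived, not Damegender’s actual code, and the function name `gender_from_counts` is hypothetical.

```python
def gender_from_counts(males: int, females: int):
    """Return (gender, probability) from raw counts of a name in a dataset.

    Hypothetical helper: the majority gender wins, and the probability is
    its share of all registered people with that name.
    """
    total = males + females
    if total == 0:
        # The name is not in the dataset, so nothing can be said from counts.
        return ("unknown", 0.0)
    if males >= females:
        return ("male", males / total)
    return ("female", females / total)


if __name__ == "__main__":
    # Counts taken from the INE.es output shown above.
    print(gender_from_counts(420, 0))  # ('male', 1.0)
```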

If you read dame strictly in English, as a female, that’s OK with me, because the United Nations has an objective to fix the gender gap in the world, so we must pay attention to females.

How can I pronounce Damegender?

I don’t have very fixed rules about it, but please don’t pronounce it de-mi-gen-der, because there is another site named demigender about a different subject and we don’t want problems.

If you know how to read the hyphenation, you can choose deym-gen-der, but you can also read it as in Spanish, da-me-gen-der. Both pronunciations are fine with me, because I’m happy with both the English and the Spanish culture around this software.

Different ways of pronouncing the name of a piece of software express different ideas about it. If you pronounce it precisely, you help the author convey the original idea.

Installation

How can I install it?

$ pip3 install damegender[all]

Where are the scripts to execute Damegender?

In my GNU/Linux installation you can access Damegender from:

/usr/local/lib/python3.7/site-packages/damegender

You can guess a name with:

$ python3 main.py Clara

Using it

Why should I use Damegender?

  • If you don’t know the gender of a name.
  • If you have a csv file, an mbox file, or a repository and you want to count males and females.
  • If you want to download csv files about gender and names from any country.
  • If you want to compare csv files about gender and names in terms of accuracy, precision, errors, …
  • If you want to see the most used names in different countries.
  • If you want to research with statistics why a name is related to males or females.
  • If you want Free Software.
  • If you want to check and use popular gender detection tools from the name (genderize, genderapi, namsor, nameapi or gender guesser) through unified commands such as downloadcsv.py, api2gender.py, or downloadjson.py.

How can I determine the gender of a name?

$ python3 main.py David

What countries are related to a name?

$ python3 nameincountries.py David

What countries are related to a surname?

$ python3 surnameincountries.py David

How many people are using a surname?

$ python3 surname.py Menéndez --total=us

Give me the race for a name in the USA!

$ python3 ethnicity.py David

How can I determine the gender gap in free software projects or mailing lists?

You can count males and females in a git project with:

python3 git2gender.py https://github.com/davidam/orgguide-es.git --directory="/tmp/clonedir"

You can count males and females in a mailing list with:

python3 mail2gender.py http://mail-archives.apache.org/mod_mbox/httpd-announce/

How can I count males and females in csv files with names in some column?

For example, if column zero of files/names/partial.csv is the column of names …

python3 csv2gender.py files/names/partial.csv --first_name_position=0 --dataset=us --outcsv=files/tests/out.csv  

The file files/tests/out.csv is where the genders guessed for the column of names are written, using the dataset of the United States of America.
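The idea behind counting genders from a column of names can be sketched in a few lines. This is an illustration under stated assumptions, not how csv2gender.py is implemented: `HYPOTHETICAL_LOOKUP` stands in for a real dataset, and `count_genders` is a hypothetical function whose `first_name_position` parameter only mirrors the command-line option above.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for a dataset of names; a real dataset is far larger.
HYPOTHETICAL_LOOKUP = {"david": "male", "clara": "female", "maria": "female"}


def count_genders(csv_text: str, first_name_position: int = 0) -> Counter:
    """Tally the guessed gender for the name column of a csv document."""
    tally = Counter()
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        name = row[first_name_position].strip().lower()
        # Names missing from the lookup are counted as unknown.
        tally[HYPOTHETICAL_LOOKUP.get(name, "unknown")] += 1
    return tally


if __name__ == "__main__":
    sample = "David,Smith\nClara,Jones\nZzz,Nobody\n"
    print(count_genders(sample))
```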

How can I update a dataset from a statistical institution?

You must use orig2.py and the short name of the country. For instance, if you want to update the Spanish dataset …

python3 orig2.py es

What are the scripts for research with statistics?

  • confusion.py
  • accuracy.py
  • errors.py
  • roc.py
  • pca-components.py and pca-features.py
  • infofeatures.py
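As a reminder of what a confusion matrix and the measures derived from it are, here is a minimal sketch in plain Python. It illustrates the standard definitions only; it is not the code of confusion.py or accuracy.py, and the function names and the sample labels are hypothetical.

```python
def confusion_counts(true_labels, guessed_labels, positive="female"):
    """Return (tp, fp, tn, fn), treating `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(true_labels, guessed_labels):
        if guess == positive and truth == positive:
            tp += 1
        elif guess == positive and truth != positive:
            fp += 1
        elif guess != positive and truth != positive:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Share of all guesses that were correct."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def precision(tp, fp):
    """Share of positive guesses that were correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


if __name__ == "__main__":
    # Hypothetical labels, for illustration only.
    truth = ["female", "male", "female", "male"]
    guess = ["female", "female", "female", "male"]
    tp, fp, tn, fn = confusion_counts(truth, guess)
    print("tp, fp, tn, fn =", tp, fp, tn, fn)
    print("accuracy =", accuracy(tp, fp, tn, fn))
    print("precision =", precision(tp, fp))
```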

Give me some real examples of counting males and females in communities

$ python3 csv2gender.py files/gnu-maintainers.csv --first_name_position=0 --title="GNU maintainers grouped by gender" --dataset="inter" --outcsv="files/gnu-maintainers.gender.csv" --outimg="files/gnu-maintainers.gender.png" --noshow --delete_duplicated
$ python3 csv2gender.py files/debian-maintainers-gpg-2020-04-01.csv --first_name_position=0 --title="Debian maintainers grouped by gender" --dataset="inter" --outcsv="files/debian-maintainers.gender.csv" --outimg="files/debian-maintainers.gender.png" --noshow --delete_duplicated

How can I use other gender detection solutions from Damegender?

First, you must register an account with genderapi, genderize, namsor or nameapi:

$ python3 apikeyadd.py

Later, you can guess the name by choosing the right API:

$ python3 api2gender.py David --api=genderize

What are the most popular names in a country?

You can use the command top.py to discover them. For instance, the 5 most used female names are:

$ python3 top.py es --position --number=5 --sex=female
1) MARIA CARMEN: 656276
2) MARIA: 606048
3) CARMEN: 391563
4) JOSEFA: 276682
5) ANA MARIA: 273319

How can I convert a csv file of names, gender and frequency to json?

python3 csv2jsonapirest.py files/names/names_inter/dkfemales10.csv --outdir="files/tmp" --gender=female --names_by_multiple_files=1

How can I merge 2 csv files of names, gender and frequency?

python3 mergeinterfiles.py --file1=files/names/names_inter/dkmales5.csv --file2=files/names/names_inter/dkfemales10.csv --output=files/tests/dkmalesfemales5and10-$(date "+%Y-%m-%d-%H").csv --malefemale

How can I apply a machine learning model to a csv file and dump the result to a json file?

python3 damegender2json.py --notoutput --csv=files/names/min.csv --jsonoutput=files/names/min.csv.today.json

How can I make a list of people with non-binary names?

For example, give me the names in a country whose percentage of males and females falls within a range, for instance from 40 to 70:

python3 percentage2names.py 40 --percentage_until=70 --outcsv=files/tests/40-70.txt

If you need the list for French people, you can execute:

python3 percentage2names.py 40 --percentage_until=70 --outcsv=files/tests/40-70.txt --total=fr
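The selection above can be pictured as a filter on per-name counts. The sketch below is a guess at the idea, not the code of percentage2names.py: it assumes the percentage is the share of males among all people with the name, and the function name and the sample counts are hypothetical.

```python
def names_in_percentage_range(counts, low, high):
    """Return, sorted, the names whose share of males lies in [low, high].

    `counts` maps a name to a (males, females) pair. Measuring the
    percentage on males is an assumption of this sketch.
    """
    selected = []
    for name, (males, females) in counts.items():
        total = males + females
        if total == 0:
            continue  # no data for this name
        male_percentage = 100 * males / total
        if low <= male_percentage <= high:
            selected.append(name)
    return sorted(selected)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    sample = {
        "alex": (60, 40),
        "andrea": (45, 55),
        "david": (99, 1),
        "maria": (1, 99),
    }
    print(names_in_percentage_range(sample, 40, 70))
```

Names such as the hypothetical “david” and “maria” fall outside the range because almost all their bearers share one gender, which is what makes the remaining names interesting.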

Data

What happens if I see changes between the original source and the dataset provided in Damegender?

Please open an issue at https://github.com/davidam/damegender/issues.

We have found some changes in INE.es, so we keep a physical dataset provided with an official stamp from the offices. An official dataset should not be changed, but the data can vary slightly, sometimes due to errors or updates.

Do you have a standard license for the datasets?

No, I don’t. The datasets remain under the same license provided by the states. From src/damegender/files/names/ you can access the folder for each country, where you will find the license.

What principles guide the management of the datasets?

In Damegender we work on these principles:

  • To be scientific: we want to publish and disseminate at scientific events.
  • To be usable: we want retrieving data to be as easy as using a search engine.
  • To be hacker: we want to allow distributing software and data in hacker networks: github, pypi, npm, …
  • To be legal: we must be clever with licenses, as the Free Software Foundation is.

What does official dataset mean in Damegender?

We call official dataset an open data dataset with a good number of names, retrieved from an official statistical office.

However, we are evaluating whether to include datasets under free licenses retrieved from other sources, so we want to verify the correctness of these data with external gender detection tools such as GenderAPI, Namsor, Genderize, …