We say that Damegender can help reduce the gender gap. Reducing the gender gap is a United Nations objective, so the states must contribute to it. To reduce the gender gap, we must measure where it is happening. Damegender is Free Software with data extracted from the states, so we can measure men and women from a critical point of view that can be reviewed by peers.
No, you can’t. You can only guess the gender as the states register it. All states use a binary ideology about gender (male and female), and so does Damegender.
Yes, it is. We distribute the software on PyPI under the GPLv3. You can read the license notice in every file and the full text in the GPL.txt file.
Some applications answer “unknown”. In Damegender we think this option is fine if the name is not in the database. We show a message about the problem with the chosen dataset and we give a gender guessed with machine learning, to give the user an intuition.
Some LGTB associations are calling for non-binary gender options. When these associations or collectives win recognition of this idea in some state, we will be interested in receiving issues about it.
The standards have ideas about classifying male, female and other options. ISO/IEC 5218 proposes a standard for coding gender: “0 as not known”, “1 as male”, “2 as female” and “9 as not applicable”. RFC 6350, in its Gender section, has these categories: “m as male”, “f as female”, “o as other”, “n as none or not applicable” and “u as unknown”.
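To make the two code tables concrete, here is a minimal sketch that translates a guessed label into both notations. The function name `encode` and the label strings are assumptions made for this example and are not part of Damegender; RFC 6350 writes its letters in upper case.

```python
# Code tables as described in the two standards (labels are our own wording).
ISO_5218 = {"unknown": 0, "male": 1, "female": 2, "not applicable": 9}
RFC_6350 = {"unknown": "U", "male": "M", "female": "F",
            "other": "O", "not applicable": "N"}


def encode(gender):
    """Return the (ISO/IEC 5218, RFC 6350) codes for a guessed gender label.

    Hypothetical helper: any label other than "male" or "female"
    is treated as unknown.
    """
    label = gender if gender in ("male", "female") else "unknown"
    return ISO_5218[label], RFC_6350[label]


print(encode("female"))  # (2, 'F')
print(encode("chair"))   # (0, 'U')
```

Note that ISO/IEC 5218 has no code for “other”, so a translation between the two standards cannot keep every category.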
Machine learning is similar to statistics, so don’t be afraid of using maths when you are coding. Damegender can be used on a laptop, so you do not depend on a very big company such as Amazon to use machine learning.
If you enter a word such as chair, table, etc., you could have a gender ethics problem, because you are giving a gender to an object. In Afghanistan there are women without names who are referred to as things, objects, …, so the problem is dangerous. You can use Damegender to understand whether a person is talking about a male or a female, but you must understand that you must not refer to yourself or to other people as things. Please be careful, it’s your responsibility.
More about Afghanistan and names:
Yes, it is. When you can find in which countries names and surnames are used, you can understand cultural identity. For example, Spanish names and surnames can be found in many countries in South America, but not in the same quantities.
Damegender means Gender Detection Tool from the name, coded by David Arroyo Menéndez (DAME). Some problems in some communities can be detected by retrieving data from the Internet, and people can then claim gender justice. “Dame” in Spanish is “give me” in English, so the word reads as “give me gender justice” in Spanglish :D
I understand that the word “dame” in French or English refers to a female, as in Notre Dame. But in Spain, Dame is a male name. So the name Damegender is a puzzle: is the gender of “dame” male or female? What’s your country?
$ python3 main.py dame --total=es
dame's gender is male
probability: 1.0
420 males for dame from INE.es
0 females for dame from INE.es
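The output above is consistent with a probability computed as the share of the majority gender among the registered people. A minimal sketch under that assumption follows; the function `gender_probability` is hypothetical and is not taken from Damegender’s source.

```python
def gender_probability(males, females):
    """Return (gender, probability) from the male and female counts of a name.

    Assumption: the probability is the majority count divided by the total.
    """
    total = males + females
    if total == 0:
        # The name is not in the dataset, so nothing can be said.
        return "unknown", 0.0
    if males >= females:
        return "male", males / total
    return "female", females / total


print(gender_probability(420, 0))  # ('male', 1.0)
```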
If you read the gender of “dame” in the strict English sense, that is fine with me, because there is a United Nations objective to close the gender gap in the world, so we must pay attention to females.
I don’t have very fixed rules about it, but please don’t pronounce it as de-mi-gen-der, because there is another site with the name demigender about another subject and we don’t want problems.
If you know how to read the hyphenation, you can choose deym-gen-der, but you can also read it as in Spanish, da-me-gen-der. Both ways of pronouncing it are fine with me, because I’m happy with both the English and the Spanish culture around this software.
Different ways of pronouncing the name of a piece of software express different ideas about it. If you pronounce it precisely, you help the author convey the original idea.
$ pip3 install damegender[all]
On my GNU/Linux installation, Damegender can be found in:
/usr/local/lib/python3.7/site-packages/damegender
You can guess the gender of a name with:
$ python3 main.py Clara
- If you don’t know the gender of a name.
- If you have a CSV file, an mbox file, or a repository and you want to count males and females.
- If you want to download CSV files about gender and names from any country.
- If you want to compare CSV files about gender and names in terms of accuracy, precision, errors, …
- If you want to see the most used names in different countries.
- If you want to research with statistics why a name is related to males or females.
- If you want Free Software.
- If you want to check and use popular gender detection tools from the name (genderize, genderapi, namsor, nameapi or gender guesser) through unified commands such as downloadcsv.py, api2gender.py, or downloadjson.py.
$ python3 main.py David
$ python3 nameincountries.py David
$ python3 surnameincountries.py David
$ python3 surname.py Menéndez --total=us
$ python3 ethnicity.py David
You can count males and females in a git project with:
python3 git2gender.py https://github.com/davidam/orgguide-es.git --directory="/tmp/clonedir"
You can count males and females in a mailing list with:
python3 mail2gender.py http://mail-archives.apache.org/mod_mbox/httpd-announce/
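Counting genders in a mailing list starts from the first names of the senders. As an illustration only, and not a copy of what mail2gender.py does, this sketch extracts a first name from a From header with the standard library; the helper `first_name` and the sample addresses are assumptions.

```python
from email.utils import parseaddr


def first_name(from_header):
    """Return the first word of the display name in a From header, or ''."""
    display_name, _address = parseaddr(from_header)
    parts = display_name.split()
    return parts[0] if parts else ""


# Made-up headers for the example.
headers = ["David Arroyo <david@example.org>", "nobody@example.org"]
print([first_name(h) for h in headers])  # ['David', '']
```

Headers without a display name yield an empty string, so those messages cannot be assigned a gender from the name.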
For example, if the column zero of files/names/partial.csv is the column of names …
python3 csv2gender.py files/names/partial.csv --first_name_position=0 --dataset=us --outcsv=files/tests/out.csv
The file files/tests/out.csv is the file where the gender has been guessed for the column of names, using the dataset of the United States of America.
You must use orig2.py and the short name of the country. For instance, if you want to update the Spanish dataset …
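The idea of guessing a gender for one column of a CSV file can be sketched as below. This is not the code of csv2gender.py: the lookup table `KNOWN`, the function `guess_column` and the layout of the output rows are assumptions made for the example.

```python
import csv
import io

# Hypothetical lookup table standing in for a real names dataset.
KNOWN = {"david": "male", "clara": "female"}


def guess_column(rows, first_name_position=0):
    """Yield each row extended with a guessed gender for the name column."""
    for row in rows:
        name = row[first_name_position].strip().lower()
        # Names missing from the table are reported as unknown.
        yield row + [KNOWN.get(name, "unknown")]


source = io.StringIO("David,1\nClara,2\nXyz,3\n")
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(guess_column(csv.reader(source)))
print(out.getvalue())
```

In-memory buffers are used here so that the sketch runs without touching any file; with real files you would open the input and the output paths instead.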
python3 orig2.py es
- confusion.py
- accuracy.py
- errors.py
- roc.py
- pca-components.py and pca-features.py
- infofeatures.py
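The first scripts in the list are named after standard evaluation measures. As a reminder of what a confusion matrix and an accuracy are, here is a minimal sketch in plain Python. It does not reproduce those scripts, and the labels are made up for the example.

```python
from collections import Counter


def confusion(true_labels, guessed_labels):
    """Count how often each (true, guessed) pair of labels occurs."""
    return Counter(zip(true_labels, guessed_labels))


def accuracy(true_labels, guessed_labels):
    """Return the fraction of guesses that match the true label."""
    hits = sum(t == g for t, g in zip(true_labels, guessed_labels))
    return hits / len(true_labels)


truth = ["male", "female", "female", "male"]
guess = ["male", "female", "male", "male"]
matrix = confusion(truth, guess)
print(matrix[("female", "male")])  # 1 female guessed as male
print(accuracy(truth, guess))      # 0.75
```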
$ python3 csv2gender.py files/gnu-maintainers.csv --first_name_position=0 --title="GNU maintainers grouped by gender" --dataset="inter" --outcsv="files/gnu-maintainers.gender.csv" --outimg="files/gnu-maintainers.gender.png" --noshow --delete_duplicated
$ python3 csv2gender.py files/debian-maintainers-gpg-2020-04-01.csv --first_name_position=0 --title="Debian maintainers grouped by gender" --dataset="inter" --outcsv="files/debian-maintainers.gender.csv" --outimg="files/debian-maintainers.gender.png" --noshow --delete_duplicated
First, you must register an account with genderapi, genderize, namsor or nameapi:
$ python3 apikeyadd.py
Later, you can guess the gender of a name by choosing the right API:
$ python3 api2gender.py David --api=genderize
You can use the command top.py to find out. For instance, the 5 most used female names are:
$ python3 top.py es --position --number=5 --sex=female
1) MARIA CARMEN: 656276
2) MARIA: 606048
3) CARMEN: 391563
4) JOSEFA: 276682
5) ANA MARIA: 273319
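A ranking like the one above comes from sorting names by their counts. A minimal sketch, using the counts printed above and a hypothetical helper `top` that is not the code of top.py:

```python
# Counts copied from the output above, deliberately out of order.
counts = {
    "MARIA": 606048,
    "JOSEFA": 276682,
    "MARIA CARMEN": 656276,
    "ANA MARIA": 273319,
    "CARMEN": 391563,
}


def top(counts, number=5):
    """Return the `number` most frequent (name, count) pairs."""
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return ranked[:number]


for position, (name, count) in enumerate(top(counts, 3), start=1):
    print(f"{position}) {name}: {count}")
```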
python3 csv2jsonapirest.py files/names/names_inter/dkfemales10.csv --outdir="files/tmp" --gender=female --names_by_multiple_files=1
python3 mergeinterfiles.py --file1=files/names/names_inter/dkmales5.csv --file2=files/names/names_inter/dkfemales10.csv --output=files/tests/dkmalesfemales5and10-$(date "+%Y-%m-%d-%H").csv --malefemale
python3 damegender2json.py --notoutput --csv=files/names/min.csv --jsonoutput=files/names/min.csv.today.json
For example, you can ask for the names in a country whose percentage between males and females falls in a range, for instance, from 40 to 70:
python3 percentage2names.py 40 --percentage_until=70 --outcsv=files/tests/40-70.txt
If you need the list for French people, you can run:
python3 percentage2names.py 40 --percentage_until=70 --outcsv=files/tests/40-70.txt --total=fr
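The selection by percentage can be pictured with the sketch below. It assumes that the percentage is the share of males for a name and that both limits are included. The data and the function `names_in_range` are hypothetical and are not taken from percentage2names.py.

```python
# Hypothetical (males, females) counts per name.
COUNTS = {"ALEX": (60, 40), "ANDREA": (45, 55), "DAVID": (100, 0)}


def names_in_range(counts, percentage_from, percentage_until):
    """Return the names whose male percentage lies within the given limits."""
    selected = []
    for name, (males, females) in counts.items():
        percentage = 100 * males / (males + females)
        if percentage_from <= percentage <= percentage_until:
            selected.append(name)
    return selected


print(names_in_range(COUNTS, 40, 70))  # ['ALEX', 'ANDREA']
```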
Please open an issue at https://github.com/davidam/damegender/issues.
We have found some changes in INE.es, so we keep a physical dataset provided with an official stamp from the offices. An official dataset must not be changed, but the data can vary slightly, sometimes due to errors or updates.
No, I don’t. The datasets remain under the same license provided by the states. From src/damegender/files/names/ you can access the folder for each country, where you will find the license.
In Damegender we work by these principles:
- To be scientific: we want to publish and to disseminate in scientific events.
- To be usable: we want to make retrieving data as easy as using a search engine.
- To be hacker: we want to allow distributing software and data in hacker networks: github, pypi, npm, …
- To be legal: we must be clever with licenses, as the Free Software Foundation is.
We select open data datasets with a good number of names, retrieved from official statistical offices, as official datasets.
However, we are considering including datasets under free licenses retrieved from other sources, so we want to verify the correctness of these data with external gender detection tools such as GenderAPI, Namsor, Genderize, …