nf: testing git-keywords
nschmans committed Feb 6, 2018
1 parent 66cfafd commit 596647f
Showing 11 changed files with 465 additions and 114 deletions.
107 changes: 107 additions & 0 deletions .git-keywords/README.rst
@@ -0,0 +1,107 @@
Helper scripts for keyword expansion in git
###########################################

:date: 2015-05-14
:tags: git, keywords
:author: Roland Smith

.. Last modified: 2015-05-14 18:02:25 +0200

One of the things I liked about the old rcs_ revision control system was that
it supported keyword expansion in files. Unlike ``rcs``, ``cvs`` and
``subversion``, the ``git`` revision control system does not provide this kind
of keyword expansion. The reason is that a file cannot be modified with
information about the commit after it has been committed, because ``git``
checksums the file contents first.

.. _rcs: http://en.wikipedia.org/wiki/Revision_Control_System

Git will let you inject text in a file when it is checked out, and remove it
when it is checked in. There are two ways of doing this. First, you can use
the ``ident`` attribute_. For any file type that has the ``ident`` attribute
set (in ``.gitattributes``), git will look for the string ``$Id$`` on checkout
and add the SHA-1 of the blob to it like this: ``$Id:
daf7affdeadc31cbcf8689f2ac5fcb6ecb6fd85e $``. While this unambiguously
identifies the contents of the file (the blob, not the commit), it is not all
that practical:

* It cannot tell you the relative order of two commits.
* It doesn't tell you the commit date.

Second, and more flexibly, keyword expansion can be done with git using filter
attributes_.

.. _attribute: http://git-scm.com/book/en/v2/Customizing-Git-Git-Attributes
.. _attributes: http://git-scm.com/book/en/v2/Customizing-Git-Git-Attributes

In my global git configuration file (``~/.gitconfig``) I have defined a
filter_ called "kw":

.. _filter: http://git-scm.com/docs/gitattributes

.. code-block:: ini

    [filter "kw"]
        clean = kwclean
        smudge = kwset

This configuration uses two programs (which should be in your ``$PATH``)
called ``kwset`` and ``kwclean`` to expand and contract keywords. These are
two scripts written in python_ 3.
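
As a rough illustration of what "expand" and "contract" mean here, the sketch
below round-trips a single Date keyword through a smudge step and a clean
step. It is a simplified stand-in, not the ``kwset`` and ``kwclean`` scripts
themselves: the function names ``smudge`` and ``clean`` and the fixed date
string are assumptions made for the example.

```python
import re

# The patterns are assembled from pieces so that a keyword filter run over
# this example itself would not rewrite them.
DATE_EMPTY = re.compile(''.join([r'\$', r'Date:?\$']))
DATE_FILLED = re.compile(''.join([r'\$', r'Date[^$]*\$']))


def smudge(text: str, date: str) -> str:
    """Expand the bare Date keyword, as a smudge filter does on checkout."""
    filled = ''.join(['$', 'Date: ', date, ' $'])
    # A function replacement avoids backslash processing of the date string.
    return DATE_EMPTY.sub(lambda match: filled, text)


def clean(text: str) -> str:
    """Contract an expanded Date keyword, as a clean filter does on check-in."""
    return DATE_FILLED.sub(''.join(['$', 'Date', '$']), text)


if __name__ == '__main__':
    stored = '# ' + ''.join(['$', 'Date', '$']) + '\n'
    working = smudge(stored, '2015-05-14 18:02:25 +0200')
    print(working, end='')
    # What git stores is unchanged by a checkout followed by a check-in.
    print(clean(working) == stored)
```

The important property is that ``clean`` undoes ``smudge``, so the repository
only ever stores the bare keyword and never sees spurious changes.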

.. _python: http://python.org/

To *enable* these substitutions, you have to use git attributes. E.g. to have
keyword substitutions in *all* files in a repository, you need to add the
following to the ``.gitattributes`` file in that repository:

.. code-block:: ini

    * filter=kw

Such a general use of filters can be problematic with e.g. binary files like
pictures. As a rule, modifying the contents of a binary file (especially
adding or removing bytes) tends to *break* it.

It is therefore better to be explicit and specific as to what types of file
the filter should apply to:

.. code-block:: ini

    *.py filter=kw
    *.txt filter=kw

With this filter setup, files that contain keywords and whose types are listed
in the ``.gitattributes`` file will have them expanded on checkout.
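
To check which files a given ``.gitattributes`` actually covers, one option is
to ask git itself with ``git check-attr``. The sketch below assumes the usual
``<path>: <attribute>: <value>`` output format; the helper names
``parse_check_attr`` and ``filter_for`` are invented for this example, and
paths that git decides to quote are not handled.

```python
import subprocess


def parse_check_attr(line: str) -> tuple:
    """Split one line of `git check-attr` output into (path, attribute, value)."""
    # Splitting from the right keeps paths that themselves contain ": " intact.
    path, attr, value = line.rstrip('\n').rsplit(': ', 2)
    return path, attr, value


def filter_for(path: str) -> str:
    """Ask git which filter (if any) is configured for a path."""
    out = subprocess.check_output(
        ['git', 'check-attr', 'filter', '--', path], universal_newlines=True
    )
    return parse_check_attr(out)[2]


if __name__ == '__main__':
    # Parsing a sample line works without a repository at hand.
    print(parse_check_attr('setup.py: filter: kw'))
```

Inside a repository, ``filter_for('setup.py')`` would be expected to return
``kw`` with the attributes above, and ``unspecified`` for a file type that is
not listed.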

To make these updated keywords visible in the working directory, changed
objects will have to be checked out after their changes have been committed.
To accomplish this, we can use the ``post-commit`` hook_. There are several
possible choices here. You can e.g.:

.. _hook: http://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks

* Check out the files which have changed since the previous commit.
* Check out *all* files.

The first one is probably the most common case. I wrote the script
``update-modified-keywords.py`` for it. After a check-in it checks out all the
files that were modified in the last commit.
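
The script itself is not reproduced here, but the idea can be sketched as
follows, under some assumptions: the changed paths are taken from
``git diff-tree``, and each file is removed before the checkout (as
``update-all-keywords.py`` does) so that the smudge filter runs again. The
function names and the ``run`` parameters are hypothetical and exist only so
the logic can be tried without a repository.

```python
import os
import subprocess


def changed_in_last_commit(run=subprocess.check_output):
    """List the paths touched by the most recent commit."""
    args = ['git', 'diff-tree', '--no-commit-id', '--name-only', '-r', 'HEAD']
    out = run(args)
    if isinstance(out, bytes):
        out = out.decode('utf-8')
    return [ln for ln in out.splitlines() if ln]


def refresh(paths, run=subprocess.check_call):
    """Remove and re-checkout the given paths; return the paths refreshed."""
    # Paths deleted by the commit no longer exist and cannot be checked out.
    existing = [p for p in paths if os.path.isfile(p)]
    for p in existing:
        os.remove(p)
    if existing:
        run(['git', 'checkout', '-f', '--'] + existing)
    return existing


if __name__ == '__main__':
    refresh(changed_in_last_commit())
```

Note that for the very first commit in a repository, ``git diff-tree`` lists
nothing unless ``--root`` is given, so a real hook may want to handle that
case separately.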

But if all the files in a repository are part of one project, you probably
want all files to carry the same date/revision. This is what the
``update-all-keywords.py`` script is for. After a check-in it checks out all the
files that are under git's control.

Put both these scripts in a location in your ``$PATH``, and then make symbolic
links from ``.git/hooks/post-commit`` to the appropriate script.
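
Creating the link can be scripted as well. The helper below is a minimal
sketch using ``os.symlink``; the name ``install_post_commit`` and its
arguments are assumptions for this example, and it replaces any hook that is
already present.

```python
import os


def install_post_commit(repo_dir: str, script_path: str) -> str:
    """Link .git/hooks/post-commit in repo_dir to script_path."""
    hook = os.path.join(repo_dir, '.git', 'hooks', 'post-commit')
    os.makedirs(os.path.dirname(hook), exist_ok=True)
    # Replace an existing hook or stale link; lexists also sees broken links.
    if os.path.lexists(hook):
        os.remove(hook)
    os.symlink(os.path.abspath(script_path), hook)
    return hook


if __name__ == '__main__':
    # Example: run from the top of a repository, with the script path given
    # relative to the current directory.
    print(install_post_commit('.', 'update-modified-keywords.py'))
```

Git only runs hooks that are executable, so the script the link points to
needs its executable bit set.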

.. NOTE::

    .. image:: http://i.creativecommons.org/p/zero/1.0/88x31.png
        :alt: CC0
        :align: center
        :target: http://creativecommons.org/publicdomain/zero/1.0/

    To the extent possible under law, Roland Smith has waived all copyright and
    related or neighboring rights to ``kwset.py``, ``kwclean.py``,
    ``update-all-keywords.py`` and ``update-modified-keywords.py``. These
    works are published from the Netherlands.
25 changes: 25 additions & 0 deletions .git-keywords/kwclean.py
@@ -0,0 +1,25 @@
#!/usr/bin/env python3
# vim:fileencoding=utf-8:ft=python
#
# Author: R.F. Smith <[email protected]>
# Last modified: 2015-05-03 22:06:55 +0200
#
# To the extent possible under law, Roland Smith has waived all copyright and
# related or neighboring rights to kwclean.py. This work is published from the
# Netherlands. See http://creativecommons.org/publicdomain/zero/1.0/

"""Remove the Date and Revision keyword contents from the standard input."""

import io
import re
import sys

if __name__ == '__main__':
    # Match up to the next '$' only, so that a Date and a Revision keyword on
    # the same line are contracted separately.
    dre = re.compile(''.join([r'\$', r'Date[^$]*\$']))
    drep = ''.join(['$', 'Date', '$'])
    rre = re.compile(''.join([r'\$', r'Revision[^$]*\$']))
    rrep = ''.join(['$', 'Revision', '$'])
    input_stream = io.TextIOWrapper(sys.stdin.buffer, encoding='utf-8')
    for line in input_stream:
        line = dre.sub(drep, line)
        print(rre.sub(rrep, line), end="")
67 changes: 67 additions & 0 deletions .git-keywords/kwset.py
@@ -0,0 +1,67 @@
#!/usr/bin/env python3
# vim:fileencoding=utf-8:ft=python
#
# Author: R.F. Smith <[email protected]>
# Last modified: 2015-09-23 22:18:34 +0200
#
# To the extent possible under law, Roland Smith has waived all copyright and
# related or neighboring rights to kwset.py. This work is published from
# the Netherlands. See http://creativecommons.org/publicdomain/zero/1.0/

"""Fill the Date and Revision keywords from the latest git commit and tag and
substitute them in the standard input."""

import io
import os
import re
import subprocess
import sys


def main():
    """Main program.
    """
    dre = re.compile(''.join([r'\$', r'Date:?\$']))
    rre = re.compile(''.join([r'\$', r'Revision:?\$']))
    currp = os.getcwd()
    if not os.path.exists(currp + '/.git'):
        # Report on stderr; stdout carries the filtered file contents.
        print('This directory is not controlled by git!', file=sys.stderr)
        sys.exit(1)
    date = gitdate()
    rev = gitrev()
    input_stream = io.TextIOWrapper(sys.stdin.buffer, encoding='utf-8')
    for line in input_stream:
        line = dre.sub(date, line)
        print(rre.sub(rev, line), end="")


def gitdate():
    """Get the date from the latest commit in ISO8601 format.
    """
    args = ['git', 'log', '-1', '--date=iso']
    outdata = subprocess.check_output(args, universal_newlines=True)
    outlines = outdata.splitlines()
    dline = [l for l in outlines if l.startswith('Date')]
    try:
        dat = dline[0][5:].strip()
        return ''.join(['$', 'Date: ', dat, ' $'])
    except IndexError:
        raise ValueError('Date not found in git output')


def gitrev():
    """Get the latest tag and use it as the revision number. This presumes the
    habit of using numerical tags. Use the short hash if no tag available.
    """
    args = ['git', 'describe', '--tags', '--always']
    try:
        r = subprocess.check_output(args,
                                    stderr=subprocess.DEVNULL,
                                    universal_newlines=True)[:-1]
    except subprocess.CalledProcessError:
        return ''.join(['$', 'Revision', '$'])
    return ''.join(['$', 'Revision: ', r, ' $'])


if __name__ == '__main__':
    main()
126 changes: 126 additions & 0 deletions .git-keywords/update-all-keywords.py
@@ -0,0 +1,126 @@
#!/usr/bin/env python3
# vim:fileencoding=utf-8:ft=python
#
# Author: R.F. Smith <[email protected]>
# Last modified: 2015-09-23 21:17:05 +0200
#
# To the extent possible under law, Roland Smith has waived all copyright and
# related or neighboring rights to update-all-keywords.py. This work is
# published from the Netherlands.
# See http://creativecommons.org/publicdomain/zero/1.0/

"""Remove and check out all files under git's control that contain keywords in
the current working directory."""

from base64 import b64decode
import mmap
import os
import subprocess
import sys


def main(args):
    """Main program.
    Arguments:
        args: command line arguments
    """
    # Check if git is available.
    checkfor(['git', '--version'])
    # Check if .git exists
    if not os.access('.git', os.F_OK):
        print('No .git directory found!')
        sys.exit(1)
    # Get all files that are controlled by git.
    files = git_ls_files()
    # Remove those that aren't checked in
    mod = git_not_checkedin()
    if mod:
        files = [f for f in files if f not in mod]
    if not files:
        print('{}: Only uncommitted changes, nothing to do.'.format(args[0]))
        sys.exit(0)
    files.sort()
    # Find files that have keywords in them
    kwfn = keywordfiles(files)
    if kwfn:
        print('{}: Updating all files.'.format(args[0]))
        # Remove the files first so that the checkout recreates them and the
        # smudge filter runs again.
        for fn in kwfn:
            os.remove(fn)
        sargs = ['git', 'checkout', '-f'] + kwfn
        subprocess.call(sargs)
    else:
        print('{}: Nothing to update.'.format(args[0]))


def checkfor(args):
    """Make sure that a program necessary for using this script is
    available.
    Arguments:
        args: String or list of strings of commands. A single string may
            not contain spaces.
    """
    if isinstance(args, str):
        if ' ' in args:
            raise ValueError('No spaces in single command allowed.')
        args = [args]
    try:
        subprocess.check_call(args, stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL)
    except (subprocess.CalledProcessError, OSError):
        # OSError covers the case where the program cannot be started at all,
        # e.g. because it is not installed.
        print("Required program '{}' not found! exiting.".format(args[0]))
        sys.exit(1)


def git_ls_files():
    """Find ordinary files that are controlled by git.
    Returns:
        A list of files
    """
    args = ['git', 'ls-files']
    flist = subprocess.check_output(args).decode('utf8').splitlines()
    return flist


def git_not_checkedin():
    """Find files that are modified but are not checked in.
    Returns:
        A list of modified files that are not checked in.
    """
    lns = subprocess.check_output(['git', 'status', '-s'])
    # Decode to text lines; the last field of each line is the file name.
    lns = lns.decode('utf8').splitlines()
    lns = [l.split()[-1] for l in lns if l.strip()]
    return lns


def keywordfiles(fns):
    """Filter those files that have keywords in them
    Arguments:
        fns: A list of filenames.
    Returns:
        A list for filenames for files that contain keywords.
    """
    # These lines are encoded otherwise they would be mangled if this file
    # is checked in!
    datekw = b64decode('JERhdGU=')
    revkw = b64decode('JFJldmlzaW9u')
    rv = []
    for fn in fns:
        with open(fn, 'rb') as f:
            try:
                mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
                if mm.find(datekw) > -1 or mm.find(revkw) > -1:
                    rv.append(fn)
                mm.close()
            except ValueError:
                pass
    return rv


if __name__ == '__main__':
    main(sys.argv)