This change filters out the jenkins commits from all OpenStack git commit logs. It does so at the point of processing within gitdm, rather than (for example) at the point of capture in the "do-it.sh" script.
This is somewhat brittle. If the jenkins commit email address were to change, for example, the filter would stop working. Of course, that would also be pretty obvious, since a bunch of Jenkins commits would start showing up again.
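A minimal sketch of filtering by author address at processing time. The address below and the dictionary layout of a commit are assumptions made for illustration; they are not taken from gitdm or from the OpenStack setup.

```python
# Hypothetical bot address; the real Jenkins commit address may differ.
JENKINS_EMAILS = {"jenkins@review.openstack.org"}

def is_bot_commit(author_email, bot_emails=JENKINS_EMAILS):
    """Return True when the author address is one of the known bot addresses."""
    return author_email.strip().lower() in bot_emails

def drop_bot_commits(commits):
    """Keep only the commits whose 'email' field is not a bot address."""
    return [c for c in commits if not is_bot_commit(c["email"])]
```

Because the match is on the exact address, a changed address silently disables the filter, which is the brittleness noted above.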
Change-Id: I2dc2b26778f3076561405ee3f0de58a8716e75cc
Reviewed-on: https://review.openstack.org/31666
Reviewed-by: Monty Taylor <mordred@inaugust.com>
Approved: Monty Taylor <mordred@inaugust.com>
Tested-by: Jenkins
Version tracking was used to see who had contributed to the most kernel
releases; not sure it's a long-term-useful feature. The unknown hackers
report helps when trying to improve the database.
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Added a first attempt at reporting by file type:
- A general report
- A report aggregated by file type and contributor
- A report aggregated by contributor and file type
Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>
The file types can be extended using a configuration file, where
it is possible to associate a file type with its corresponding regular
expression.
The code includes a script to test the regex without running
gitdm.
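A rough sketch of how a file type can be tied to a regular expression and checked without running gitdm. The type names, the patterns and the classify() helper are all illustrative assumptions, not the shipped configuration or the test script itself.

```python
import re

# Hypothetical (file type, regular expression) pairs, checked in order.
FILETYPE_PATTERNS = [
    ("documentation", re.compile(r"\.(txt|rst|md)$|(^|/)README$")),
    ("translation", re.compile(r"\.po$")),
    ("build", re.compile(r"(^|/)Makefile(\.am|\.in)?$|\.mk$")),
    ("code", re.compile(r"\.(c|h|py)$")),
]

def classify(filename):
    """Return the first file type whose pattern matches, or 'unknown'."""
    for filetype, pattern in FILETYPE_PATTERNS:
        if pattern.search(filename):
            return filetype
    return "unknown"
```

Calling classify() on a handful of paths from the repository is a quick way to see whether a new pattern catches what it should.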
Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>
When some projects migrated from Subversion to Git, several tags
were treated as new commits, which shows up as a change across the
whole project (code added/removed) when nothing really happened.
For instance, in GNOME a lot of svn tags were caught during the
migration, but not all of them.
svn tags in git repositories produce bad stats because commits are
counted twice, and in a project with a long history this may involve several thousand lines of source code.
Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>
Two new dumps were added: per file type and for every changeset.
It is necessary to set a prefix for the csv dump location,
because one csv file is generated per file type.
Now it is possible to get statistics for code, documentation,
build scripts, translations, multimedia and developer
documentation. This feature is useful for repositories that
contain types of files other than code.
The detailed information does not use the Aggregate parameter.
Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>
Patches, as well as Total* and Dates, are counted only if the
changeset is not a merge. However, CSCount (ChangeSetCount)
was counting everything, which changes the results a bit.
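The intended counting rule can be sketched as follows. Representing a changeset as a dictionary with a "merge" flag is an assumption for the example, not the actual gitdm data structure.

```python
def count_changesets(changesets):
    """Count changesets, skipping merges, so the total matches the patch stats."""
    return sum(1 for cs in changesets if not cs["merge"])
```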
Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>
The class LogPatchSplitter provides an iterator per patch. This
makes the code cleaner, easier to read and more pythonic.
The class only returns each commit as a set of lines.
It is possible to test it separately by:
$ git log | python logparser.py | more
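The idea of iterating per commit can be sketched with a small generator. This is not the LogPatchSplitter code itself; split_commits() and its regular expression are assumptions written only to illustrate the approach.

```python
import re

# A commit header in default `git log` output starts at column zero.
COMMIT_START = re.compile(r"^commit [0-9a-f]{7,40}")

def split_commits(lines):
    """Yield one list of lines per commit found in `git log` output."""
    current = []
    for line in lines:
        # A new header closes the commit collected so far.
        if COMMIT_START.match(line) and current:
            yield current
            current = []
        current.append(line)
    if current:
        yield current
```

A caller could then loop with "for commit in split_commits(sys.stdin)" and hand each chunk to the parser.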
Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>
It can distinguish between code, documentation, translations, etc.
Hence, it provides the basic feature needed to get more accurate reports.
It does not replace the current stats; it only adds the
possibility of generating reports by file type.
This feature was originally implemented by Gregorio Robles in
CVSAnalY (http://tools.libresoft.es/cvsanaly/). Gregorio agreed to
add his code here.
Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>
To make the code cleaner, I created a function
that parses a numstat line, which is useful for determining
the modified filename and for calculating lines added and
removed.
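A sketch of what such a parser can look like, assuming the usual tab-separated "added, removed, filename" layout of git log --numstat, where binary files show "-" for both counts. The function name and return shape are illustrative, not necessarily those used in the code.

```python
def parse_numstat(line):
    """Parse one numstat line into (filename, added, removed), or None."""
    parts = line.rstrip("\n").split("\t", 2)
    if len(parts) != 3:
        return None
    added, removed, filename = parts
    # Binary files are reported as "-\t-\tname"; count them as zero lines.
    if added == "-" and removed == "-":
        return filename, 0, 0
    if not (added.isdigit() and removed.isdigit()):
        return None
    return filename, int(added), int(removed)
```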
Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>
The dictionary allows the use of a single meaningful
variable and cleaner code. Also, it is no harder to add
new patterns.
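As an illustration of the single-dictionary approach, with keys and expressions that are assumptions rather than the actual gitdm patterns:

```python
import re

# Hypothetical pattern table; adding a pattern means adding one entry.
PATTERNS = {
    "commit": re.compile(r"^commit ([0-9a-f]+)"),
    "author": re.compile(r"^Author: ([^<]*?)\s*<(.*)>"),
    "date": re.compile(r"^Date:\s+(.*)$"),
}

def match_line(line):
    """Return (pattern name, match object) for the first pattern that matches."""
    for name, pattern in PATTERNS.items():
        m = pattern.match(line)
        if m:
            return name, m
    return None, None
```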
Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>
The --numstat option of git log gives the statistics of
lines added and removed per file. Hence, it is not necessary
to parse a raw diff.
Another benefit is that the log to be processed is less verbose,
which helps when processing long logs. This also prepares the
code for counting the changes per file type.
Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>
Python provides a module named csv to handle csv files.
Therefore, it is necessary to rename csv.py to
avoid name conflicts when the csv module is used.
Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>
A certain obnoxious developer wants his contributions to be split between
two employers. So add the "VirtualEmployer" mechanism to make that
possible. A virtual employer is defined with:
    VirtualEmployer ve-name
        nn% real-name
        ...
    end
(This construct must appear in the main configuration file). Developers
can be associated with the virtual employer in the usual way; at report
time, any changes credited to that employer will be split among the real
employers according to the percentages provided.
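The split at report time can be sketched like this. Both helpers and the sample employer names used with them are hypothetical; the real configuration parser may behave differently, for instance around blank lines or comments.

```python
def parse_virtual_employer(text):
    """Parse one VirtualEmployer block into (name, [(percent, employer), ...])."""
    lines = [l.strip() for l in text.strip().splitlines()]
    keyword, name = lines[0].split(None, 1)
    if keyword != "VirtualEmployer" or lines[-1] != "end":
        raise ValueError("not a VirtualEmployer block")
    shares = []
    for line in lines[1:-1]:
        percent, employer = line.split(None, 1)
        shares.append((int(percent.rstrip("%")), employer))
    return name, shares

def split_credit(amount, shares):
    """Divide a count among the real employers according to their percentages."""
    return {employer: amount * percent / 100.0 for percent, employer in shares}
```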
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
When using the -w option together with the -x file option, the data exported
in the CSV file are aggregated by weeks instead of months (the default).
This is useful to extract meaningful stats on short periods.
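One way to express the two aggregation modes is a period key derived from the commit date. The key format and the period_key() helper are assumptions for illustration; the exported CSV may label periods differently.

```python
import datetime

def period_key(date, weekly=False):
    """Return the aggregation bucket for a date: ISO week or calendar month."""
    if weekly:
        year, week, _weekday = date.isocalendar()
        return "%04d-W%02d" % (year, week)
    return "%04d-%02d" % (date.year, date.month)
```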
If you commit a git changelog to your repository, gitdm will be confused by
all the added patch tags. So make the patterns stricter to force them only
to match within the git log metadata - or so we hope. There is still room
for confusion here; we really need to make grabpatch() smart enough to
split metadata and the diff. Don't have time for that now.
This patch changes results slightly. In the 2.6.36 cycle, there's a tag
reading:
Original-Idea-and-Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Pre-patch gitdm would recognize that as a signoff; after the change it no
longer does.
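The effect of a stricter pattern can be sketched as below. These expressions are written for illustration and are not copied from gitdm; they assume the default git log layout, in which commit message lines are indented with spaces.

```python
import re

# Loose: matches the tag anywhere in the line, including inside added patch text.
LOOSE = re.compile(r"Signed-off-by:\s+(.*)")
# Strict: only leading whitespace may precede the tag.
STRICT = re.compile(r"^\s+Signed-off-by:\s+(.*)")

def is_signoff(line, strict=True):
    """Report whether the line counts as a signoff under the chosen pattern."""
    if strict:
        return STRICT.match(line) is not None
    return LOOSE.search(line) is not None
```

A diff context line also begins with whitespace, so even the strict form can still be fooled, which is the remaining room for confusion mentioned above.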
Reported-by: Wolfgang Denk <wd@denx.de>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Instead of tediously replicating the base directory name where gitdm is
installed and writing it in each option inside the configuration file, just pass
it on the command line.
Signed-off-by: Tiago Vignatti <tiago.vignatti@nokia.com>
As a step to make grabpatch() more unit-test friendly, move global
housekeeping out of grabpatch(). This also gets rid of a TODO in the
code. The regression tests still pass after this refactoring, of
course.
Signed-off-by: Martin Nordholts <martinn@src.gnome.org>
This probably means an incorrect commit message. It also
means that, if it is not fixed, the category for this person is probably
going to be incorrect.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Addresses of the form "user at host.wherever" can be trivially repaired, so
let's do so.
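A minimal sketch of such a repair; the regular expression and the repair_email() name are illustrative assumptions and may not match the actual change.

```python
import re

# "user at host.wherever": a local part, the word "at", then a dotted host.
AT_FORM = re.compile(r"^(\S+)\s+at\s+(\S+\.\S+)$")

def repair_email(addr):
    """Turn 'user at host.tld' into 'user@host.tld'; leave other input alone."""
    m = AT_FORM.match(addr.strip())
    if m:
        return "%s@%s" % (m.group(1), m.group(2))
    return addr
```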
A couple of other minor tweaks are included here as well; nothing which
changes behavior.
Add tracking of tested-by, reported-by, and reviewed-by. For the first
two, we also track who is *giving* those credits.
While I was in the neighborhood I also:
- Started turning the "patch" class into something more than a bare
container; this work has just begun.
- Moved the report-writing code into its own file (reports.py)
Hi guys,
I knocked up a patch to generate some per-month, by-affiliation
statistics from the gitdm output; attached for interest or merging.
A sample of the output, complete with OO.o data-pilot, and pretty chart
is here:
http://www.gnome.org/~michael/data/2008-09-29-linux-stats.ods
with chart here:
http://www.gnome.org/~michael/images/2008-09-29-kernel-active.png
caption being:
"Graph showing number and affiliation of active kernel developers
(contributing more than 100 lines per month). Quick affiliation key,
from bottom up: Unknown, No-Affiliation, IBM, RedHat, Novell, Intel ..."
These are as yet not published, I plan to use them as a comparison to
OO.o's somewhat mediocre equivalents; hope to go live with them soon
(and fix the horrible bugs in stacked area charts to make them actually
pretty ).
HTH,
Michael.
--
michael.meeks@novell.com <><, Pseudo Engineer, itinerant idiot
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Yanmin Zhang committed a patch (09f2724a786f76475ef2985cf84f5359c553aade)
which claims to have been written in August, 2030. Code that bleeding-edge
makes gitdm confused, so pretend it's just normal, contemporary stuff.
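One possible way to handle this is to clamp any future date to the present. Whether gitdm substitutes the current date or does something else is an assumption here, and sane_date() is a hypothetical helper.

```python
import datetime

def sane_date(date, today=None):
    """Return the date unchanged unless it lies in the future; then use today."""
    if today is None:
        today = datetime.date.today()
    return today if date > today else date
```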
Need to seed the database _after_ loading the config file,
otherwise we don't see the seeds as actually showing up for their
companies.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Otherwise it doesn't matter if we change the config file option or not...
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
When gitdm is used for generating a text-only report with its output
redirected to a file, all is well aside from the clutter at the beginning
of that file -- a very long line with repeating "Grabbing changesets...".
Solve that by redirecting progress reporting to stderr. It also helps to
see the progress when you redirect gitdm output to a file.
Also, we don't have to flush stdout since stderr is unbuffered by default.
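The idea can be sketched as follows. The report_progress() helper and its stream parameter are assumptions added so the behavior can be tried in isolation; the explicit flush is only a precaution for interpreters where stderr is buffered.

```python
import sys

def report_progress(count, stream=None):
    """Write a carriage-return progress line to stderr, keeping stdout clean."""
    if stream is None:
        stream = sys.stderr
    stream.write("\rGrabbing changesets...%d" % count)
    # Harmless where stderr is unbuffered; needed where it is not.
    stream.flush()
```

With this, redirecting stdout to a file leaves only the report in the file while the counter stays visible on the terminal.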
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>