Commit Graph

43 Commits

Author SHA1 Message Date
Dan Stangel 9ae04f50b2 Add automated CI emails to be skipped in metrics
Omit two other automated/CI email addresses from metrics collection

Change-Id: I77b44e2ebd385a6f4e3d05d7da20e65a18fe85fb
2014-07-15 09:23:14 -06:00
Dan Stangel eb0b49d340 Filter out OpenStack Jenkins automated commits from the metrics
This change filters out the jenkins commits from all OpenStack git commit logs.  It does so at the point of processing within gitdm, rather than (for example) at the point of capture in the "do-it.sh" script.

This is somewhat brittle.  If the jenkins commit email address were to change, for example, the filter would stop working.  Of course that would also be pretty obvious if we started seeing a bunch of Jenkins commits showing up again.

Change-Id: I2dc2b26778f3076561405ee3f0de58a8716e75cc
Reviewed-on: https://review.openstack.org/31666
Reviewed-by: Monty Taylor <mordred@inaugust.com>
Approved: Monty Taylor <mordred@inaugust.com>
Tested-by: Jenkins
2013-06-08 01:28:33 +00:00
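
A minimal sketch of the filtering idea described in eb0b49d340, not gitdm's actual code. The address in `AUTOMATED_EMAILS`, the commit dictionaries, and both function names are assumptions made for illustration.

```python
# Hypothetical skip list; the real addresses are defined in gitdm itself.
AUTOMATED_EMAILS = {"jenkins@example.org"}


def is_automated(author_email, skip=AUTOMATED_EMAILS):
    """Return True when the author address belongs to a known CI account."""
    return author_email.strip().lower() in skip


def filter_commits(commits, skip=AUTOMATED_EMAILS):
    """Yield only the commits whose author is not in the skip list."""
    for commit in commits:
        if not is_automated(commit["email"], skip):
            yield commit
```

As the commit message notes, this kind of exact-address match silently stops working if the CI account changes its address.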
Jonathan Corbet 1e293bc90a Add version tracking support and an "unknown hackers" report
Version tracking was used to see who had contributed to the most kernel
releases; not sure it's a long-term-useful feature.  The unknown hackers
report helps when trying to improve the database.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2012-04-06 16:00:04 -06:00
Aidan Skinner 65cd32216f Add -y option to aggregate changes by year, not month 2012-02-12 11:46:30 -07:00
Jonathan Corbet 47ffed3cee Merge branch 'refactoring' of git://gitorious.org/mining-tools/gitdm into german 2011-07-11 13:51:58 -06:00
Jonathan Corbet 85004f0f9b Use pypy by default
...it's 3x faster...

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2011-07-11 13:19:22 -06:00
Germán Póo-Caamaño b2770f20b9 Added reports by file types
Added first attempt for reporting by file type:
- A general report
- A report aggregated by file type and contributor
- A report aggregated by contributor and file type

Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>
2011-06-24 00:17:30 -07:00
Germán Póo-Caamaño b2fd0c6939 Move filetypes onto configuration file
The filetypes can be extended using a configuration file, where
it is possible to associate a file type with its corresponding regular
expression.

The code includes a script to test the regex without running
gitdm.

Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>
2011-06-24 00:12:01 -07:00
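
One way the file-type-to-regex association from b2fd0c6939 might look, as a sketch only. The type names, the patterns, and the `classify` helper are hypothetical and do not reproduce gitdm's configuration format.

```python
import re

# Hypothetical (file type, pattern) pairs; order matters, first match wins.
FILETYPES = [
    ("documentation", re.compile(r"\.(txt|rst|md)$")),
    ("translation", re.compile(r"\.po$")),
    ("code", re.compile(r"\.(py|c|h)$")),
]


def classify(path, filetypes=FILETYPES):
    """Return the first file type whose pattern matches the path."""
    for name, pattern in filetypes:
        if pattern.search(path):
            return name
    return "unknown"
```

A standalone function like this can be exercised on sample paths without running a full analysis, which is the point of the test script the commit mentions.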
Germán Póo-Caamaño 1a85acef6b Added workaround for svn tags imported wrongly
When some projects have migrated from Subversion to Git, there
were several tags that were treated as new commits, which shows
a change in the whole project (code added/removed) when nothing
really happened.  For instance, in GNOME a lot of svn tags were
caught during the migration, but not all of them.

svn tags in git repositories produce bad stats because commits are
double counted, and in a project with a lot of history this may involve
several thousand lines of source code.

Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>
2011-06-22 19:27:47 -07:00
Germán Póo-Caamaño 5964089840 Added CSV dumps: per filetype and per changeset
Two new dumps were added: per filetype and for every changeset.
It is necessary to set a prefix for the csv dumps,
because one csv file will be generated per file type.

Now it is possible to get statistics per code, documentation,
build scripts, translations, multimedia and developers
documentation.  This feature is useful for repositories that
contain types of files other than code.

The detailed information does not use the Aggregate parameter.

Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>
2011-06-22 19:27:47 -07:00
Germán Póo-Caamaño cf1e69b859 Fixed CSCount which should not count merges
Patches as well as Total* and Dates are counted only if the
changeset is not a merge. However, CSCount (ChangeSetCount)
was counting everything, which changes the results a bit.

Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>
2011-06-22 19:27:47 -07:00
Germán Póo-Caamaño 7b26ae2109 Move out the grabpatch from the parser
The class LogPatchSplitter provides an iterator per patch.  This
makes the code cleaner, easier to read and more pythonic.
The class only gets each commit set as lines.

It is possible to test it separately by:
   $ git log | python logparser.py | more

Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>
2011-06-22 19:27:47 -07:00
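
The per-patch iterator described in 7b26ae2109 can be approximated by a generator, shown here as a simplified sketch rather than the real `LogPatchSplitter`. It assumes default `git log` output, where each commit starts with an unindented `commit ` line; the function name `split_patches` is made up for this example.

```python
def split_patches(lines):
    """Group git log lines into one list of lines per commit."""
    patch = []
    for line in lines:
        # An unindented "commit " line marks the start of the next commit.
        if line.startswith("commit ") and patch:
            yield patch
            patch = []
        patch.append(line)
    if patch:
        yield patch
```

Because the generator accepts any iterable of lines, it can be fed `sys.stdin` in the same way as the pipeline shown in the commit message.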
Germán Póo-Caamaño efcc420153 Added initial support for file type reports
It may distinguish between code, documentation, translations, etc.
Hence, it provides the basic feature to get more accurate reports.

It does not replace the current stats; it only adds the
possibility to generate reports by file type.

This feature was implemented originally by Gregorio Robles in
CVSAnalY http://tools.libresoft.es/cvsanaly/  Gregorio agreed to
add his code here.

Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>
2011-06-22 19:27:47 -07:00
Germán Póo-Caamaño 27bb2eca31 Added a function to parse the stats per file
To make the code cleaner, I created a function
that parses a numstat line, which is useful to determine
the modified filename and to calculate the lines added and
removed.

Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>
2011-06-22 19:27:47 -07:00
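
A sketch of what a numstat line parser such as the one in 27bb2eca31 could look like; the function name and return shape are assumptions. It relies on the `git log --numstat` format of tab-separated added count, removed count, and path, with `-` in both count columns for binary files.

```python
def parse_numstat(line):
    """Parse one numstat line into (filename, added, removed).

    Returns None for lines that are not numstat entries. Binary files,
    which git reports with "-" in both count columns, count as 0/0.
    """
    parts = line.rstrip("\n").split("\t", 2)
    if len(parts) != 3:
        return None
    added, removed, filename = parts
    if added == "-" and removed == "-":
        return filename, 0, 0
    if not (added.isdigit() and removed.isdigit()):
        return None
    return filename, int(added), int(removed)
```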
Germán Póo-Caamaño 70df53f20f Use a dict of patterns instead of several global variables
Using a dictionary allows a single meaningful
variable and cleaner code.  Also, it makes it no harder to add
new patterns.

Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>
2011-06-22 19:27:47 -07:00
Germán Póo-Caamaño 935be113b3 Added option to get the stats from numstat instead of diff
The option --numstat of git log gives the statistics of
lines added and removed per file.  Hence, it is not necessary
to parse a raw diff.

Another benefit is a less verbose log to be processed,
which helps when processing long logs.  This also prepares the
code for counting the changes per file type.

Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>
2011-06-22 19:27:47 -07:00
Germán Póo-Caamaño 4a729f1d72 Use csv package instead of manual CSV handling
Python provides a module to handle csv files which is named
csv.  Therefore, it is necessary to rename the csv.py to
avoid name conflicts when the module csv is used.

Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>
2011-06-22 19:27:19 -07:00
Jonathan Corbet 75bc1479df Add the VirtualEmployer mechanism
A certain obnoxious developer wants his contributions to be split between
two employers.  So add the "VirtualEmployer" mechanism to make that
possible.  A virtual employer is defined with:

	VirtualEmployer ve-name
		nn% real-name
		...
	end

(This construct must appear in the main configuration file).  Developers
can be associated with the virtual employer in the usual way; at report
time, any changes credited to that employer will be split among the real
employers according to the percentages provided.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2011-05-10 14:32:47 -06:00
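
The percentage split described in 75bc1479df can be illustrated as follows. This is a sketch under assumptions: `parse_share` and `split_virtual` are hypothetical names, and the employer names in the usage line are placeholders.

```python
import re

# Matches a share line of the form "nn% real-name" inside the block.
SHARE_LINE = re.compile(r"^\s*(\d+)%\s+(.+?)\s*$")


def parse_share(line):
    """Parse one share line into (percent, employer_name)."""
    match = SHARE_LINE.match(line)
    if match is None:
        raise ValueError("not a share line: %r" % line)
    return int(match.group(1)), match.group(2)


def split_virtual(changes, shares):
    """Split a change count among real employers by percentage."""
    return {name: changes * percent / 100.0 for percent, name in shares}
```

For example, `split_virtual(200, [(60, "Employer A"), (40, "Employer B")])` credits 120.0 changes to the first employer and 80.0 to the second.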
Jonathan Corbet c599a8a29b Fix an option parsing regression
Cedric added a : for -a, no idea why.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2011-05-10 11:28:23 -06:00
Jonathan Corbet 6cd2721aec Update copyright notices
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2011-05-10 10:26:55 -06:00
Cédric Bosdonnat 92726d0df5 Made the CSV file aggregating data by weeks or months
When using -w option together with the -x file option, the data exported
in the CSV file are aggregated by weeks instead of months (the default).
This is useful to extract meaningful stats on short periods.
2011-04-22 08:43:41 -06:00
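
Aggregating by week, month, or year comes down to choosing a bucket key for each commit date. The sketch below shows one possible keying scheme; the key formats and the `period_key` name are assumptions, not the format gitdm writes to its CSV file.

```python
import datetime


def period_key(date, period="month"):
    """Return the aggregation bucket for a date: week, month, or year."""
    if period == "year":
        return date.strftime("%Y")
    if period == "week":
        iso_year, iso_week = date.isocalendar()[:2]
        return "%d-W%02d" % (iso_year, iso_week)
    return date.strftime("%Y-%m")
```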
Cédric Bosdonnat aac3e8ccc8 Making the datelc an actual csv for easier data handling 2011-02-17 10:23:57 -07:00
Cédric Bosdonnat e26ed2dedc Why should we exit after datelc output? 2011-02-11 09:26:06 -07:00
Cédric Bosdonnat 5f2d65c8fd Only add Linus and Andrew with -a
[jc: removed unneeded global line]
2011-02-11 09:23:48 -07:00
Jonathan Corbet f64e9ffbd8 Make tag matching stricter
If you commit a git changelog to your repository, gitdm will be confused by
all the added patch tags.  So make the patterns stricter to force them only
to match within the git log metadata - or so we hope.  There is still room
for confusion here; we really need to make grabpatch() smart enough to
split metadata and the diff.  Don't have time for that now.

This patch changes results slightly.  In the 2.6.36 cycle, there's a tag
reading:

    Original-Idea-and-Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>

Pre-patch gitdm would recognize that as a signoff; after the change it no
longer does.

Reported-by: Wolfgang Denk <wd@denx.de>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2010-10-04 17:06:33 -06:00
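
The effect of the stricter matching in f64e9ffbd8 can be shown with two illustrative patterns. These are not gitdm's real expressions; they only assume that `git log` indents commit message lines, so a tag that belongs to the metadata starts the line after leading whitespace.

```python
import re

# Loose: the tag may appear anywhere in the line.
LOOSE_SIGNOFF = re.compile(r"Signed-off-by:\s+(.*)$")
# Strict: only indentation may precede the tag.
STRICT_SIGNOFF = re.compile(r"^\s+Signed-off-by:\s+(.*)$")

line = ("    Original-Idea-and-Signed-off-by: "
        "Nicolas Pitre <nicolas.pitre@linaro.org>")

loose_hit = LOOSE_SIGNOFF.search(line) is not None
strict_hit = STRICT_SIGNOFF.match(line) is not None
```

Here `loose_hit` is true and `strict_hit` is false, which mirrors the change in results the commit message describes. A diff line that adds a changelog entry starts with `+`, so the strict pattern rejects it as well.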
Tiago Vignatti a91d5f59e5 Add option to get the configuration files from a given base directory
Instead of tediously repeating the base directory where gitdm is
installed in each option inside the configuration file, just pass
it on the command line.

Signed-off-by: Tiago Vignatti <tiago.vignatti@nokia.com>
2010-06-30 20:13:21 +03:00
Jonathan Corbet db2db73480 Only gripe about missing author names once
(per name)

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2010-05-03 11:31:34 -06:00
Martin Nordholts 2d89da8864 Move out global housekeeping from grabpatch()
As a step to make grabpatch() more unit-test friendly, move out global
housekeeping from grabpatch(). This also gets rid of a TODO in the
code. The regression tests still pass after this refactoring, of
course.

Signed-off-by: Martin Nordholts <martinn@src.gnome.org>
2010-02-06 16:32:35 -07:00
Jonathan Corbet 79414010fb Get the developer count right even without full patch info 2009-07-24 17:11:41 -06:00
Greg Kroah-Hartman e7b9d7eba1 gitdm: report issue when an email address is a "name"
This probably means an incorrect commit message; it also
means that, if it is not fixed, the category for this person is probably
going to be incorrect.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-07-24 13:57:59 -06:00
Greg Kroah-Hartman 9aaac7a67d add more spaces to the "done" message so it doesn't show the trailing 0 2009-07-24 13:57:48 -06:00
Jonathan Corbet d25572552b Reduce the number of "funky email" gripes
Addresses of the form "user at host.wherever" can be trivially repaired, so
let's do so.

A couple of other minor tweaks are included here as well; nothing which
changes behavior.
2009-07-24 13:56:21 -06:00
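
The repair of "user at host.wherever" addresses mentioned in d25572552b could be as small as one regular expression. This is a sketch; the pattern and the `repair_email` name are assumptions.

```python
import re

# "user at host.wherever": two tokens around " at ", with a dotted host.
AT_FORM = re.compile(r"^(\S+) at (\S+\.\S+)$")


def repair_email(address):
    """Rewrite 'user at host.tld' as 'user@host.tld'; leave others alone."""
    match = AT_FORM.match(address.strip())
    if match is None:
        return address
    return "%s@%s" % (match.group(1), match.group(2))
```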
Jonathan Corbet c0d9831515 Quick hack to make the developer/employer counts at the top correct
...before we were counting everybody we knew about, regardless of whether
they did anything in the period we're looking at.
2009-03-21 15:29:57 -06:00
Jonathan Corbet 9a2ba8a4f5 Tested-by / Reported-by credits and more
Add tracking of tested-by, reported-by, and reviewed-by.  For the first
two, we also track who is *giving* those credits.

While I was in the neighborhood I also:

 - Started turning the "patch" class into something more than a bare
   container; this work has just begun.

 - Moved the report-writing code into its own file (reports.py)
2008-11-11 11:11:04 -07:00
Michael Meeks d1a8929872 gitdm patch ...
Hi guys,

	I knocked up a patch to generate some per-month, by-affiliation
statistics from the gitdm output; attached for interest or merging.

	A sample of the output, complete with OO.o data-pilot, and pretty chart
is here:

http://www.gnome.org/~michael/data/2008-09-29-linux-stats.ods

	with chart here:
	http://www.gnome.org/~michael/images/2008-09-29-kernel-active.png

	caption being:

	"Graph showing number and affiliation of active kernel developers
(contributing more than 100 lines per month). Quick affiliation key,
from bottom up: Unknown, No-Affiliation, IBM, RedHat, Novell, Intel ..."

	These are as yet not published, I plan to use them as a comparison to
OO.o's somewhat mediocre equivalents; hope to go live with them soon
(and fix the horrible bugs in stacked area charts to make them actually
pretty ).

	HTH,

		Michael.

--
 michael.meeks@novell.com  <><, Pseudo Engineer, itinerant idiot

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-10-06 14:29:27 -06:00
Jonathan Corbet 558dbe1cbe Don't accept totally bogus dates
Yanmin Zhang committed a patch (09f2724a786f76475ef2985cf84f5359c553aade)
which claims to have been written in August, 2030.  Code that bleeding-edge
makes gitdm confused, so pretend it's just normal, contemporary stuff.
2008-09-05 13:53:35 -06:00
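
One possible way to neutralize a date from the future, as in 558dbe1cbe, is to replace it with the current date. Whether gitdm substitutes the current date or some other value is not stated in the commit message, so the policy and the `sanitize_date` name below are assumptions.

```python
import datetime


def sanitize_date(date, today=None):
    """Replace a date that lies in the future with today's date."""
    if today is None:
        today = datetime.date.today()
    return today if date > today else date
```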
Greg KH dd091c4268 finally get the config file stuff correct
Need to seed the database _after_ loading the config file,
otherwise we don't see the seeds as actually showing up for their
companies.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-07-24 12:03:40 -06:00
Greg KH 5234a1f726 parse the config file _after_ we have read the command line options
Otherwise it doesn't matter if we change the config file option or not...

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-07-24 12:03:40 -06:00
Greg Kroah-Hartman 80112779cf make -c option actually work
The -c option was not fully implemented

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-07-24 12:03:40 -06:00
Jonathan Corbet f05ab88175 Fix up the copyright notices. 2008-07-18 15:34:28 -06:00
Jonathan Corbet ba5f6b6943 Move regular expressions out to patterns.py
...I need them for an associated tool I'm working on.
2008-07-18 15:04:55 -06:00
Kir Kolyshkin 3d9830c3ce gitdm: Report progress to stderr not stdout
When gitdm is used for generating text-only report with its output
redirected to a file, all is well aside from the clutter at the beginning
of that file -- a very long line with repeating "Grabbing changesets...".

Solve that by redirecting progress reporting to stderr. It also helps to
see the progress when you redirect gitdm output to a file.

Also, we don't have to flush stdout since stderr is unbuffered by default.

Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-06-27 09:02:28 -06:00
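
Sending progress output to stderr, as 3d9830c3ce does, keeps a redirected report clean. The sketch below assumes a hypothetical `report_progress` helper and reuses the "Grabbing changesets..." text quoted in the commit message.

```python
import sys


def report_progress(count, stream=sys.stderr):
    """Write a one-line progress update that overwrites itself."""
    stream.write("Grabbing changesets...%d\r" % count)
    # Harmless when the stream is unbuffered; needed when it is not.
    stream.flush()
```

The explicit flush is a precaution: the commit relies on stderr being unbuffered, but Python 3 buffers `sys.stderr` by line, so a carriage-return update without a newline might otherwise appear late.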
Jonathan Corbet e1a6d06d65 Initial commit
First commit of gitdm to the new repo.  Call it version 0.10 or something
silly like that.
2008-06-27 08:58:35 -06:00