gitdm

Commit Graph

Author	SHA1	Message	Date
Dan Stangel	9ae04f50b2	Add automated CI emails to be skipped in metrics Omit two other automated/CI email addresses from metrics collection Change-Id: I77b44e2ebd385a6f4e3d05d7da20e65a18fe85fb	2014-07-15 09:23:14 -06:00
Dan Stangel	eb0b49d340	Filter out OpenStack Jenkins automated commits from the metrics This change filters out the jenkins commits from all OpenStack git commit logs. It does so at the point of processing within gitdm, rather than (for example) at the point of capture in the "do-it.sh" script. This is somewhat brittle. If the jenkins commit email address were to change, for example, this would stop filtering. Of course that would also be pretty obvious if we started seeing a bunch of Jenkins commits showing up again. Change-Id: I2dc2b26778f3076561405ee3f0de58a8716e75cc Reviewed-on: https://review.openstack.org/31666 Reviewed-by: Monty Taylor <mordred@inaugust.com> Approved: Monty Taylor <mordred@inaugust.com> Tested-by: Jenkins	2013-06-08 01:28:33 +00:00
Jonathan Corbet	1e293bc90a	Add version tracking support and an "unknown hackers" report Version tracking was used to see who had contributed to the most kernel releases; not sure it's a long-term-useful feature. The unknown hackers report helps when trying to improve the database. Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2012-04-06 16:00:04 -06:00
Aidan Skinner	65cd32216f	Add -y option to aggregate changes by year, not month	2012-02-12 11:46:30 -07:00
Jonathan Corbet	47ffed3cee	Merge branch 'refactoring' of git://gitorious.org/mining-tools/gitdm into german	2011-07-11 13:51:58 -06:00
Jonathan Corbet	85004f0f9b	Use pypy by default ...it's 3x faster... Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2011-07-11 13:19:22 -06:00
Germán Póo-Caamaño	b2770f20b9	Added reports by file types Added first attempt for reporting by file type: - A general report - A report aggregated by file type and contributor - A report aggregated by contributor and file type Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>	2011-06-24 00:17:30 -07:00
Germán Póo-Caamaño	b2fd0c6939	Move filetypes onto configuration file The filetypes can be extended using a configuration file, where it is possible to associate a file type with its corresponding regular expression. The code includes a script to test the regex without running gitdm. Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>	2011-06-24 00:12:01 -07:00
Germán Póo-Caamaño	1a85acef6b	Added workaround for svn tags imported wrongly When some projects migrated from Subversion to Git, several tags were treated as new commits, which shows a change across the whole project (code added/removed) when nothing really happened. For instance, in GNOME a lot of svn tags were caught during the migration, but not all of them. svn tags in git repositories produce bad stats because they double-count commits, and in projects with a long history this may involve several thousand source lines of code. Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>	2011-06-22 19:27:47 -07:00
Germán Póo-Caamaño	5964089840	Added CSV dumps: per filetype and per changeset Two new dumps were added: per filetype and for every changeset. It is necessary to set a prefix for where to dump the data in csv, because one csv file will be generated per file type. Now it is possible to get statistics per code, documentation, build scripts, translations, multimedia and developers documentation. This feature is useful for repositories that contain different types of files, not just code. The detailed information does not use the Aggregate parameter. Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>	2011-06-22 19:27:47 -07:00
Germán Póo-Caamaño	cf1e69b859	Fixed CSCount which should not count merges Patches as well as Total* and Dates are counted only if the changeset is not a merge. However, CSCount (ChangeSetCount) was counting everything, which changes the results a bit. Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>	2011-06-22 19:27:47 -07:00
Germán Póo-Caamaño	7b26ae2109	Move out the grabpatch from the parser The class LogPatchSplitter provides an iterator per patch. This makes the code cleaner, easier to read and more pythonic. The class only gets each commit set as lines. It is possible to test it separately by: $ git log | python logparser.py | more Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>	2011-06-22 19:27:47 -07:00
Germán Póo-Caamaño	efcc420153	Added initial support for file type reports It may distinguish between code, documentation, translations, etc. Hence, it provides the basic feature to get more accurate reports. It does not replace the current stats; it only adds the possibility to generate reports by file type. This feature was implemented originally by Gregorio Robles in CVSAnalY http://tools.libresoft.es/cvsanaly/ Gregorio agreed to add his code here. Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>	2011-06-22 19:27:47 -07:00
Germán Póo-Caamaño	27bb2eca31	Added a function to parse the stats per file In order to make the code cleaner, I created a function that parses a numstat line, which is useful to determine the modified filename, and to calculate lines added and removed. Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>	2011-06-22 19:27:47 -07:00
Germán Póo-Caamaño	70df53f20f	Use a dict of patterns instead of several global variables The dictionary used allows the use of a single meaningful variable with cleaner code. Also, it is not harder to add new patterns. Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>	2011-06-22 19:27:47 -07:00
Germán Póo-Caamaño	935be113b3	Added option to get the stats from numstat instead of diff The option --numstat of git log gives the statistics of lines added and removed per file. Hence, it is not necessary to parse a raw diff. Another benefit: it is a less verbose log to be processed, which helps when processing long logs. This also prepares the code for counting the changes per file type. Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>	2011-06-22 19:27:47 -07:00
Germán Póo-Caamaño	4a729f1d72	Use csv package instead of manual CSV handling Python provides a module to handle csv files which is named csv. Therefore, it is necessary to rename the csv.py to avoid name conflicts when the module csv is used. Signed-off-by: Germán Póo-Caamaño <gpoo@gnome.org>	2011-06-22 19:27:19 -07:00
Jonathan Corbet	75bc1479df	Add the VirtualEmployer mechanism A certain obnoxious developer wants his contributions to be split between two employers. So add the "VirtualEmployer" mechanism to make that possible. A virtual employer is defined with: VirtualEmployer ve-name nn% real-name ... end (This construct must appear in the main configuration file). Developers can be associated with the virtual employer in the usual way; at report time, any changes credited to that employer will be split among the real employers according to the percentages provided. Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2011-05-10 14:32:47 -06:00
Jonathan Corbet	c599a8a29b	Fix an option parsing regression Cedric added a : for -a, no idea why. Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2011-05-10 11:28:23 -06:00
Jonathan Corbet	6cd2721aec	Update copyright notices Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2011-05-10 10:26:55 -06:00
Cédric Bosdonnat	92726d0df5	Made the CSV file aggregating data by weeks or months When using -w option together with the -x file option, the data exported in the CSV file are aggregated by weeks instead of months (the default). This is useful to extract meaningful stats on short periods.	2011-04-22 08:43:41 -06:00
Cédric Bosdonnat	aac3e8ccc8	Making the datelc an actual csv for easier data handling	2011-02-17 10:23:57 -07:00
Cédric Bosdonnat	e26ed2dedc	Why should we exit after datelc output?	2011-02-11 09:26:06 -07:00
Cédric Bosdonnat	5f2d65c8fd	Only add Linus and Andrew with -a [jc: removed unneeded global line]	2011-02-11 09:23:48 -07:00
Jonathan Corbet	f64e9ffbd8	Make tag matching stricter If you commit a git changelog to your repository, gitdm will be confused by all the added patch tags. So make the patterns stricter to force them only to match within the git log metadata - or so we hope. There is still room for confusion here; we really need to make grabpatch() smart enough to split metadata and the diff. Don't have time for that now. This patch changes results slightly. In the 2.6.36 cycle, there's a tag reading: Original-Idea-and-Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org> Pre-patch gitdm would recognize that as a signoff; after the change it no longer does. Reported-by: Wolfgang Denk <wd@denx.de> Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2010-10-04 17:06:33 -06:00
Tiago Vignatti	a91d5f59e5	Add option to get the configuration files from a given base directory Instead of tediously replicating the base directory name where gitdm is installed and writing it on each option inside the configuration file, just send it through the command line. Signed-off-by: Tiago Vignatti <tiago.vignatti@nokia.com>	2010-06-30 20:13:21 +03:00
Jonathan Corbet	db2db73480	Only gripe about missing author names once (per name) Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2010-05-03 11:31:34 -06:00
Martin Nordholts	2d89da8864	Move out global housekeeping from grabpatch() As a step to make grabpatch() more unit-test friendly, move out global housekeeping from grabpatch(). This also gets rid of a TODO in the code. The regression tests still pass after this refactoring, of course. Signed-off-by: Martin Nordholts <martinn@src.gnome.org>	2010-02-06 16:32:35 -07:00
Jonathan Corbet	79414010fb	Get the developer count right even without full patch info	2009-07-24 17:11:41 -06:00
Greg Kroah-Hartman	e7b9d7eba1	gitdm: report issue when an email address is a "name" This probably means an incorrect commit message, it also means that if it is not fixed, the category for this person is probably going to be incorrect. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>	2009-07-24 13:57:59 -06:00
Greg Kroah-Hartman	9aaac7a67d	add more spaces to the "done" message so it doesn't show the trailing 0	2009-07-24 13:57:48 -06:00
Jonathan Corbet	d25572552b	Reduce the number of "funky email" gripes Addresses of the form "user at host.wherever" can be trivially repaired, so let's do so. A couple of other minor tweaks are included here as well; nothing which changes behavior.	2009-07-24 13:56:21 -06:00
Jonathan Corbet	c0d9831515	Quick hack to make the developer/employer counts at the top correct ...before we were counting everybody we knew about, regardless of whether they did anything in the period we're looking at.	2009-03-21 15:29:57 -06:00
Jonathan Corbet	9a2ba8a4f5	Tested-by / Reported-by credits and more Add tracking of tested-by, reported-by, and reviewed-by. For the first two, we also track who is giving those credits. While I was in the neighborhood I also: - Started turning the "patch" class into something more than a bare container; this work has just begun. - Moved the report-writing code into its own file (reports.py)	2008-11-11 11:11:04 -07:00
Michael Meeks	d1a8929872	gitdm patch ... Hi guys, I knocked up a patch to generate some per-month, by-affiliation statistics from the gitdm output; attached for interest or merging. A sample of the output, complete with OO.o data-pilot, and pretty chart is here: http://www.gnome.org/~michael/data/2008-09-29-linux-stats.ods with chart here: http://www.gnome.org/~michael/images/2008-09-29-kernel-active.png caption being: "Graph showing number and affiliation of active kernel developers (contributing more than 100 lines per month). Quick affiliation key, from bottom up: Unknown, No-Affiliation, IBM, RedHat, Novell, Intel ..." These are as yet not published, I plan to use them as a comparison to OO.o's somewhat mediocre equivalents; hope to go live with them soon (and fix the horrible bugs in stacked area charts to make them actually pretty ). HTH, Michael. -- michael.meeks@novell.com <><, Pseudo Engineer, itinerant idiot Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2008-10-06 14:29:27 -06:00
Jonathan Corbet	558dbe1cbe	Don't accept totally bogus dates Yanmin Zhang committed a patch (09f2724a786f76475ef2985cf84f5359c553aade) which claims to have been written in August, 2030. Code that bleeding-edge makes gitdm confused, so pretend it's just normal, contemporary stuff.	2008-09-05 13:53:35 -06:00
Greg KH	dd091c4268	finally get the config file stuff correct Need to seed the database _after_ loading the config file, otherwise we don't see the seeds as actually showing up for their companies. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2008-07-24 12:03:40 -06:00
Greg KH	5234a1f726	parse the config file _after_ we have read the command line options Otherwise it doesn't matter if we change the config file option or not... Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2008-07-24 12:03:40 -06:00
Greg Kroah-Hartman	80112779cf	make -c option actually work The -c option was not fully implemented Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2008-07-24 12:03:40 -06:00
Jonathan Corbet	f05ab88175	Fix up the copyright notices.	2008-07-18 15:34:28 -06:00
Jonathan Corbet	ba5f6b6943	Move regular expressions out to patterns.py ...I need them for an associated tool I'm working on.	2008-07-18 15:04:55 -06:00
Kir Kolyshkin	3d9830c3ce	gitdm: Report progress to stderr not stdout When gitdm is used for generating text-only report with its output redirected to a file, all is well aside from the clutter at the beginning of that file -- a very long line with repeating "Grabbing changesets...". Solve that by redirecting progress reporting to stderr. It also helps to see the progress when you redirect gitdm output to a file. Also, we don't have to flush stdout since stderr is unbuffered by default. Signed-off-by: Kir Kolyshkin <kir@openvz.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2008-06-27 09:02:28 -06:00
Jonathan Corbet	e1a6d06d65	Initial commit First commit of gitdm to the new repo. Call it version 0.10 or something silly like that.	2008-06-27 08:58:35 -06:00

43 Commits