Linear-regression

A simple linear regression machine learning program

linear-regression

A simple linear regression machine learning program, written in Python 3.4.1

Dependencies

Numpy: http://www.numpy.org/
Docopt: http://www.docopt.org/

You should not have to install these manually if you follow the instructions under 'Installation'.

Installation

Click the download tar.gz or download .zip button at the top-right of this page, then extract the files to a location you can find easily.
Install setuptools.
In a command prompt, navigate to the folder where you extracted the tar.gz or .zip. You should see setup.py there.
Run:

> python setup.py install
# do not use 'pip install .', this is just a script

You should now be able to run 'regress.py' directly.

Training Sets and Testing Sets

regress.py uses the training set to learn a weight matrix, then applies that matrix to the attributes of each entry in the testing set to predict the entry's category. It compares each prediction with the entry's actual category and, once all entries are processed, reports its overall accuracy.

The training set must contain at least one example of every category present in the testing set.
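
This README does not spell out how the weights are fitted, so the following is only a minimal sketch of one way the train-then-predict cycle could work. It assumes an ordinary least-squares fit with NumPy and assumes predictions are rounded to the nearest category. The names fit_weights and predict and the sample numbers are made up for illustration and are not taken from regress.py.

```python
import numpy as np


def fit_weights(attributes, categories):
    """Least-squares fit of a column of weights W so that X.dot(W) ~ categories."""
    X = np.asarray(attributes, dtype=float)
    y = np.asarray(categories, dtype=float).reshape(-1, 1)
    W, _residuals, _rank, _singular = np.linalg.lstsq(X, y, rcond=None)
    return W


def predict(W, attributes, lowest, highest):
    """Round each raw prediction to the nearest category in [lowest, highest]."""
    X = np.asarray(attributes, dtype=float)
    raw = X.dot(W).ravel()
    return [int(c) for c in np.clip(np.rint(raw), lowest, highest)]


# Made-up data for illustration (not from data/iris or data/wine).
train_X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]]
train_y = [1, 2, 1, 2]
test_X = [[1.0, 0.05], [0.05, 1.0]]
test_y = [1, 2]

W = fit_weights(train_X, train_y)
predicted = predict(W, test_X, min(train_y), max(train_y))

# Compare predictions with the actual categories and report accuracy.
correct = sum(p == a for p, a in zip(predicted, test_y))
print("Results:  {}/{}".format(correct, len(test_y)))
print("Accuracy: {:.0f}%".format(100.0 * correct / len(test_y)))
```

Clipping to the range of training categories is one way to see why every testing category needs a training example: under these assumptions, a category outside that range could never be predicted.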

Data Format

Each entry is a line of delimiter-separated numbers, with the category as the first number. The delimiter is set with the --delimiter flag and defaults to a comma. For example, with 3 categories, you can represent the categories as 1, 2, and 3. Each entry should be on its own line.

Each category must be an integer. Categories should be consecutive integers (for example 4, 5, 6), though they do not have to start from 1.

The following data

1,5.1,3.5,1.4,0.2
2,3.1,1.5,2.4,0.4

represents two entries, the first of category 1 and the second of category 2.
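
As a rough illustration of this format, the sketch below splits one line into its category and attributes. parse_entry is a hypothetical helper written for this example, not a function from regress.py, and the real program may handle whitespace and errors differently.

```python
def parse_entry(line, delimiter=","):
    """Split one line of data into (category, attributes)."""
    fields = [field.strip() for field in line.strip().split(delimiter)]
    category = int(fields[0])  # raises ValueError if the category is not an integer
    attributes = [float(value) for value in fields[1:]]
    return category, attributes


data = "1,5.1,3.5,1.4,0.2\n2,3.1,1.5,2.4,0.4\n"
entries = [parse_entry(line) for line in data.splitlines() if line.strip()]
print(entries)

# The same helper with a custom delimiter.
print(parse_entry("1 : 0.7 : 0.9 : 2.5", delimiter=" : "))
```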

Example data are in the data/iris and data/wine folders.

Entries whose number of attributes differs from the largest attribute count found in the training set are ignored, in both the training and testing sets. For example,

Training

1,5.1,3.5,1.4
1,5.1,3.5,1.4,0.2
2,3.1,1.5,2.4

Testing

1,5.7,4.4,1.5
1,5.4,3.9,1.3,0.4
1,5.1,3.5,1.4,0.3,0.7

will be interpreted as

Training

1,5.1,3.5,1.4,0.2

Testing

1,5.4,3.9,1.3,0.4
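
The filtering rule above can be sketched as follows, using the same example entries. keep_full_width is a hypothetical name, and entries are assumed to be already parsed into (category, attributes) pairs. This illustrates the rule as described, not the code in regress.py.

```python
def keep_full_width(training, testing):
    """Keep only entries whose attribute count equals the training maximum."""
    width = max(len(attributes) for _category, attributes in training)

    def same_width(entries):
        return [entry for entry in entries if len(entry[1]) == width]

    return same_width(training), same_width(testing)


training = [
    (1, [5.1, 3.5, 1.4]),
    (1, [5.1, 3.5, 1.4, 0.2]),
    (2, [3.1, 1.5, 2.4]),
]
testing = [
    (1, [5.7, 4.4, 1.5]),
    (1, [5.4, 3.9, 1.3, 0.4]),
    (1, [5.1, 3.5, 1.4, 0.3, 0.7]),
]

kept_training, kept_testing = keep_full_width(training, testing)
print(kept_training)
print(kept_testing)
```

In this example the only category 2 training entry is dropped, so a single malformed line can leave a category with no training examples.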

Example Usage

> regress.py TRAINING_SET TESTING_SET
Results:  correct/total
Accuracy: n%

> regress.py TRAINING_SET TESTING_SET --weight
W =
[[W1]
 [W2]
  .
  .
  .
 [Wm]]
Results:  correct/total
Accuracy: n%

> regress.py TRAINING_SET TESTING_SET --verbose
Category 1: correct/total, %
Category 2: correct/total, %
.
.
.
Category N: correct/total, %
Results:  correct/total
Accuracy: n%

> regress.py TRAINING_SET TESTING_SET --delimiter=" : "
Results:  correct/total
Accuracy: n%

--delimiter=" : " means your data looks like

1 : 0.7 : 0.9 : 2.5

--delimiter=, means your data looks like

1,0.7,0.9,2.5
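
The per-category lines printed with --verbose in the usage examples above could be tallied roughly as in the sketch below. per_category_report and the sample categories are made up for illustration, and the exact output format of regress.py may differ.

```python
def per_category_report(actual, predicted):
    """Map each actual category to a (correct, total) pair."""
    totals = {}
    for a, p in zip(actual, predicted):
        correct, total = totals.get(a, (0, 0))
        totals[a] = (correct + (1 if a == p else 0), total + 1)
    return totals


# Made-up actual and predicted categories for five testing entries.
actual = [1, 1, 2, 2, 3]
predicted = [1, 2, 2, 2, 1]

report = per_category_report(actual, predicted)
for category in sorted(report):
    correct, total = report[category]
    print("Category {}: {}/{}, {:.0f}%".format(
        category, correct, total, 100.0 * correct / total))

overall_correct = sum(correct for correct, _total in report.values())
overall_total = sum(total for _correct, total in report.values())
print("Results:  {}/{}".format(overall_correct, overall_total))
print("Accuracy: {:.0f}%".format(100.0 * overall_correct / overall_total))
```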