-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Heckman #1935
base: main
Are you sure you want to change the base?
Heckman #1935
Conversation
statsmodels/regression/heckman.py
Outdated
self.nobs_uncensored = self.nobs = np.sum(self.treated) | ||
self.nobs_censored = self.nobs_total - self.nobs_uncensored | ||
|
||
# store variable names if data came in as Pandas objects |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this part is taken care of by super for endog and exog. variable names and some descriptive things about the data are in the model.data
attribute.
check dir(model.data)
, I don't know those by heart
I went quickly through this to get a rough idea. Looks good so far, I don't think we will run into any serious problems with the inherited methods in this case. This will be for 0.7, it's already too late to add new models to 0.6. BTW: not sure where I have a do file, here is part of what I used for GLM.fit_constrained
|
about MLE On design problem will be that you have two different kinds of parameter vectors (different length if MLE estimates the joint model. |
I don't find the comments about To the latest comment: |
If you need numerical derivatives temporarily or because it's difficult to get an analytical expression, then you could copy score and hessian methods from base.model.GenericLikelihoodModel. another tip: if loglike and similar are just the sum of the individual contribution, then it's better to define the |
our git workflow is to rebase branches on master not to merge master into a "feature branch" I can fix this in an interactive rebase before merging, or, if you are comfortable enough with git, then you could checkout your branch at your second-to-last commit (before the merge of master) and git rebase on master instead of merging master, assuming you don't have any commits after the merge commit. |
Looks good overall. The main design problem that is left is how to handle the two different definitions of params in the Results class which depends on which estimation method was chosen. |
from numpy.testing import assert_ | ||
import pdb | ||
|
||
import heckman |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
There is also a print of summary in the test log on TravisCI that contains NaNs in the bse of the summary table. Is this intended? |
The NaNs in the summary tables for the MLE estimates are not intentional. The LikelihoodModel class sometimes returns a covariance matrix with negative entries along the diagonal for the MLE estimate, and the NaNs appear because the summary table reports them as NaN because it square roots them to compute the standard error. Should I leave it as it is since it's what the LikelihoodModel superclass returns, or would it be better for me to implement bootstrap standard errors? |
bootstrap standard errors are a bonus. But we need to check that we get a valid positive definite hessian in a regular case (if we don't have close to perfect multicollinearity and the parameters are identified). Can you put the or an example with non-definite hessian in a script? I can look at it in a few days, I got pretty good at debugging optimization problems for most cases. BTW: along the same lines as your MLE here, I saw recently the discussion on the Stata forum for doing MLE with a bivariate Probit. Once this PR is finished and Heckman is working, then it will be easier to add similar models following the same or similar pattern. |
bnds.append((-1,1)) | ||
bnds.append((0,None)) | ||
|
||
heckman_res = heckman_model.fit(method='mle', method_mle='ncg', disp=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This example of an MLE fit will produce a covariance estimate that is not positive-definite.
I'm trying to figure out what's going on in the mle estimation. A possible candidate is the loglikelihood defined in What reference did you use for the loglikelihood? |
two other things: Can you rebase on current master and force push, so we get rid of the unrelated test failures. To check what's going on, I attached the full mle results instance to the results, i.e. add one line at the end of
This will be useful in general for further analyzing the results. |
1 similar comment
bump for 0.14 After zero-inflated and Beta regression, we have more experience and support for multipart models. |
This pull request introduces 6 alerts when merging 92ea622 into 2d41628 - view on LGTM.com new alerts:
Heads-up: LGTM.com's PR analysis will be disabled on the 5th of December, and LGTM.com will be shut down ⏻ completely on the 16th of December 2022. It looks like GitHub code scanning with CodeQL is already set up for this repo, so no further action is needed 🚀. For more information, please check out our post on the GitHub blog. |
Added a module to estimate the Heckman selection model using the Heckman 2-step. A test module checks the parameter estimates that this module creates in an example dataset against the parameter estimates that Stata creates. If this happens to merge well into the existing statsmodels code, I'd like to add an MLE estimation method as well. Stata currently has the option to estimate the Heckman model by either the 2-step or by MLE.