[Performance Issue]: Model Convergence issue in Keras with Parallel Execution of the fit() Method? #1245
Comments
Hi, the losses in the two logs seem to have little difference. In the second log, only the last step has an increasing loss. What is the loss like after that? Are you using debug mode or release mode? If you are in debug mode, will the result differ under release mode?
In the second log, the loss keeps increasing every time (I didn't show it for more epochs, but it is as I described in the thread). Moreover, I mainly look at val_loss as confirmation of performance.

I was in debug mode, but the result is the same in release mode (I checked it).
I have no idea about it yet.

```csharp
model = keras.models.load_model(folder);
IOptimizer optimizer = keras.optimizers.Adam(0.01f);
ILossFunc loss = keras.losses.BinaryCrossentropy();
```

but keep …
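(Purely as a hedged illustration, not part of the original comment: the snippet above would presumably be followed by recompiling the reloaded model with the fresh optimizer and loss and training it again, roughly along these lines; `x_train`/`y_train` are placeholders.)

```csharp
// Hypothetical continuation: recompile the reloaded model with the new
// optimizer and loss created above, then train it again.
model.compile(optimizer, loss, new[] { "accuracy" });
model.fit(x_train, y_train, batch_size: 32, epochs: 10);  // placeholder data
```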
It's exactly what I did, in fact, and the problem is the same as in situation 2 (it will not converge).

To summarize: if model.compile and model.fit are in the same thread, it will not converge. However, if model.fit is in another thread, it will not cause any problems... Really strange...
I'll add another strange observation. I tried another example, the one from the TensorFlow.NET GitHub page, and got the opposite result! Below is the code used:

This example, similar to example 1 in my thread, DOESN'T CONVERGE! Conversely, this code, similar to my example 2, converges:

This reveals a convergence issue opposite to the one I initially described in this problem. Furthermore, in this example, the more of the graph is built in the same thread as the model.fit() call, the better the convergence. To support this, consider situation 3 (a mix of 1 and 2):

This produces a better result (acc = 0.3) than code 1 (acc = 0.1) but worse than code 2 (acc = 0.45). This behavior is entirely deterministic (verified over multiple runs, not just one test).

Edit: all of this may depend on the optimizer used. I replaced RMSprop with Adam and this example converges in cases 1, 2, and 3. I can't explain this behavior, which is the exact opposite of my original problem but also points to an underlying problem with the threads.
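(Purely as an illustration of the optimizer swap mentioned above, not the actual code of the referenced example; the model, loss, and learning rates below are placeholders.)

```csharp
using System.Collections.Generic;
using Tensorflow.Keras;
using static Tensorflow.KerasApi;

// Hypothetical stand-in for the model from the referenced example.
var model = keras.Sequential(new List<ILayer>
{
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation: "relu"),
    keras.layers.Dense(10, activation: "softmax")
});

// Original setup with RMSprop (convergence varied across cases 1-3):
// model.compile(keras.optimizers.RMSprop(0.001f),
//               keras.losses.SparseCategoricalCrossentropy(),
//               new[] { "accuracy" });

// Swapping RMSprop for Adam reportedly made cases 1, 2 and 3 all converge:
model.compile(keras.optimizers.Adam(0.001f),
              keras.losses.SparseCategoricalCrossentropy(),
              new[] { "accuracy" });
```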
That's so weird! What's your code like after …
@AsakusaRinne there is literally nothing
Brief Description
I've encountered a perplexing issue while utilizing Keras and its fit() function to train a standard CNN.
To illustrate, consider the following code snippet where the model learns and converges successfully:
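(The original snippet is not reproduced here; as an illustration only, a minimal sketch of the pattern being described, with placeholder data, layer sizes, and hyperparameters, could look like the following: compile() on the main thread, fit() dispatched to a separate thread.)

```csharp
using System.Collections.Generic;
using System.Threading;
using Tensorflow;
using Tensorflow.Keras;
using Tensorflow.NumPy;
using static Tensorflow.KerasApi;

// Placeholder data standing in for the real dataset (shapes are illustrative).
NDArray x_train = np.zeros(new Shape(1000, 64, 64, 1), np.float32);
NDArray y_train = np.zeros(new Shape(1000, 1), np.float32);

// A small binary-classification CNN (architecture is illustrative).
var model = keras.Sequential(new List<ILayer>
{
    keras.layers.InputLayer((64, 64, 1)),
    keras.layers.Conv2D(32, (3, 3), activation: "relu"),
    keras.layers.MaxPooling2D(),
    keras.layers.Flatten(),
    keras.layers.Dense(64, activation: "relu"),
    keras.layers.Dense(1, activation: "sigmoid")
});

// compile() runs on the calling (main) thread...
model.compile(keras.optimizers.Adam(0.01f),
              keras.losses.BinaryCrossentropy(),
              new[] { "accuracy" });

// ...while fit() is launched from a different thread.
// This is the arrangement that converges.
var trainer = new Thread(() => model.fit(x_train, y_train, batch_size: 32, epochs: 10));
trainer.Start();
trainer.Join();
```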
However, when the model.compile() and model.fit() functions are executed sequentially in the same thread, the model seemingly learns but fails to converge, as demonstrated below:
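(Again purely as an illustration, reusing the same hypothetical model and placeholder data as in the sketch above: the only difference in this variant is that fit() is called directly on the same thread that just called compile().)

```csharp
// Same hypothetical model and placeholder data as in the previous sketch.
model.compile(keras.optimizers.Adam(0.01f),
              keras.losses.BinaryCrossentropy(),
              new[] { "accuracy" });

// fit() runs on the same thread as compile(); this is the variant
// that is reported not to converge.
model.fit(x_train, y_train, batch_size: 32, epochs: 10);
```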
Remarkably, the difference in convergence between these two examples is entirely deterministic (both are obviously run on the same dataset). The logs are shown below.
Despite thorough investigation, I've failed to identify any critical differences. All variables remain consistent.
Specifically, it seems that the function model.compile() must not be executed in the same thread as the fit() function for successful convergence. I don't understand why...
Any insights or suggestions on this peculiar behavior would be greatly appreciated. The data used can be provided.
Best regards,
DEVERIN Adrien
Device and Context
Used on CPU
Benchmark
Logs example 1 (convergence working):
etc. ...
Logs example 2 (convergence doesn't work):
Alternatives
No response