Discussion:
[Spread-users] Assertion fail when reloading configuration file
Adam Grossman
2011-08-12 19:38:37 UTC
Hello,

I am using Spread version 4.1.0 on Fedora 15. I followed these steps:

Machine A & B: Created a Spread configuration with one segment containing
machines A and B.
Machine A & B: Ran a program that joins a group (both machines join the
same group) and then, in an infinite loop, sends a message saying
"hello from <machine name>", reads any incoming messages, and sleeps
for 1 second.
Machine A & B: The programs receive all messages.
Machine A: Removed machine B from the configuration and used spmonitor to
reload the configuration.
Machine A: Core dumped with the error: "G_compare_proc_ids_by_conf:
Assertion `ia > -1' failed"

This only happens under these exact conditions. Is this a bug, or am I
handling this incorrectly? I was hoping that any messages still arriving
from the removed daemons would simply be ignored. If I am handling this
incorrectly, is there any way to resolve it? Since Spread is
single-threaded, I don't see how this could be a race condition, or
anything that some simple semaphores/mutexes would solve.

Thank you for any help,
-=- adam
--------------------------------------------------------------------------------------
This email message has been delivered safely and archived online by Mimecast.
For more information please visit http://www.mimecast.com
---------------------------------------------------------------------------------------
Adam Grossman
2011-08-16 19:05:41 UTC
I have researched this a bit further, and the problem seems to occur when
Spread tries to send a message to the daemon that has been removed: it
can't, so it tries to remove that daemon (by calling G_remove_daemon).
That is where the assertion fails, because the daemon is no longer in
the config struct.

There is evidently some structure that is not cleared out when a
configuration with removed daemons is reloaded. I think the solution
would be:
1. When the configuration is reloaded, compare the old and new
configurations and remove any daemons that are missing from the new one
from the other settings.
2. If G_remove_daemon does not find a daemon in the configuration, remove
it from the other settings anyway.

The issue is that I do not know what those other settings are, or how to
remove them...

Thank you,
-=- adam grossman