x10rt_sockets: blocking_probe doesn't block after a Place is killed

Description

Using x10rt_sockets with Resilient Native X10, once a place is killed, calls to x10rt_blocking_probe in every surviving place no longer block. This degrades performance through (a) the CPU load of spinning immediate threads and (b) increased contention on the pthread_mutex that guards network access.

The attached test case illustrates the problem. Run the program with all the places on a single machine in one window and run top in another. After a place is killed, the immediate threads in each surviving place run flat out at 100% CPU because blocking probe no longer blocks. The noBlockWindow field of the x10SocketState in each surviving place is left non-zero after the place is killed. This causes the poll call at line 1130 of probe to use a timeout of 0 (return immediately) instead of -1 (wait until there is data).

X10_NPLACES=4 X10_NTHREADS=2 X10_RESILIENT_MODE=1 ./BlockingProbe
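
For anyone unfamiliar with the poll timeout semantics involved, here is a minimal standalone sketch in C. It is not the x10rt_sockets code: probe_timeout_ms and poll_readable are hypothetical helpers, and the mapping from a non-zero window to a timeout of 0 is only my reading of the behaviour described above. It shows that poll with a timeout of 0 returns immediately on an idle descriptor, which is what turns a supposedly blocking probe into a busy loop.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper (not an x10rt function): choose a poll() timeout from
 * a noBlockWindow-style counter, as the report describes. Non-zero means
 * "do not block" (0); zero means "wait until there is data" (-1). */
int probe_timeout_ms(int no_block_window) {
    return no_block_window != 0 ? 0 : -1;
}

/* Hypothetical helper: poll a single descriptor for readability and return
 * poll()'s result (number of ready descriptors, 0 on timeout, -1 on error). */
int poll_readable(int fd, int timeout_ms) {
    struct pollfd pfd;
    assert(fd >= 0);
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    return poll(&pfd, 1, timeout_ms);
}

/* Poll the read end of an empty pipe using the timeout that a stale,
 * non-zero window would select. Returns poll()'s result, or -1 if the
 * pipe could not be created. */
int idle_poll_with_stale_window(void) {
    int fds[2];
    int rc;
    if (pipe(fds) != 0) return -1;
    rc = poll_readable(fds[0], probe_timeout_ms(1)); /* timeout 0: no wait */
    close(fds[0]);
    close(fds[1]);
    return rc;
}

/* Poll a pipe that already holds one byte using the blocking timeout (-1).
 * Data is pending, so the call returns at once instead of waiting. */
int ready_poll_with_cleared_window(void) {
    int fds[2];
    int rc;
    if (pipe(fds) != 0) return -1;
    if (write(fds[1], "x", 1) != 1) { close(fds[0]); close(fds[1]); return -1; }
    rc = poll_readable(fds[0], probe_timeout_ms(0)); /* timeout -1: block */
    close(fds[0]);
    close(fds[1]);
    return rc;
}
```

Calling idle_poll_with_stale_window() from a loop gives the same symptom seen in top: every iteration returns at once with nothing to do. With the window cleared, the same poll on an idle descriptor would sleep in the kernel until data arrives.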

Environment

None

Status

Assignee

DaveG

Reporter

DaveG

Labels

None

External issue ID

None

Components

Fix versions

Affects versions

X10 2.6.0

Priority

High