[committed,buildbot] Replace the aarch64 build slave

Message ID 5049068a-9243-e699-5dda-bde6d97f832c@arm.com
State New

Commit Message

Szabolcs Nagy Oct. 5, 2018, 10:22 a.m.
This one is a thunderx machine.

(the other one was down for a while now.)

i assume the slave will be able to connect once there is a server restart.

Comments

Tulio Magno Quites Machado Filho Oct. 8, 2018, 1:53 p.m. | #1
Szabolcs Nagy <szabolcs.nagy@arm.com> writes:

> This one is a thunderx machine.
>
> (the other one was down for a while now.)
>
> i assume the slave will be able to connect once there is a server restart.


The server has just been restarted.

If the new slave doesn't reconnect in the following minutes, we'll have to
analyze its log.

Thanks!

-- 
Tulio Magno
Szabolcs Nagy Oct. 8, 2018, 2:46 p.m. | #2
On 08/10/18 14:53, Tulio Magno Quites Machado Filho wrote:
> Szabolcs Nagy <szabolcs.nagy@arm.com> writes:
>> i assume the slave will be able to connect once there is a server restart.
>
> The server has just been restarted.
>
> If the new slave doesn't reconnect in the following minutes, we'll have to
> analyze its log.


sorry i stopped the slaves, since it could not connect previously.

now i restarted it and it fails with

2018-10-08 14:44:22+0000 [-] Connection to 144.217.14.79:9989 failed: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.TimeoutError'>: User timeout caused connection failure.
Tulio Magno Quites Machado Filho Oct. 8, 2018, 3:23 p.m. | #3
Szabolcs Nagy <Szabolcs.Nagy@arm.com> writes:

> On 08/10/18 14:53, Tulio Magno Quites Machado Filho wrote:
>> Szabolcs Nagy <szabolcs.nagy@arm.com> writes:
>>> i assume the slave will be able to connect once there is a server restart.
>>
>> The server has just been restarted.
>>
>> If the new slave doesn't reconnect in the following minutes, we'll have to
>> analyze its log.
>
> sorry i stopped the slaves, since it could not connect previously.
>
> now i restarted it and it fails with
>
> 2018-10-08 14:44:22+0000 [-] Connection to 144.217.14.79:9989 failed: [Failure instance: Traceback (failure with no frames): <class
> 'twisted.internet.error.TimeoutError'>: User timeout caused connection failure.


That port is wrong.  It should have been 9991.
You have to change that in the buildbot.tac:

port = 9991
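
The timeout in the log is what one would expect when nothing answers on the port the slave is dialing. As a rough sketch, a plain TCP reachability check can tell a wrong port apart from other problems before buildbot.tac is edited. The helper name `can_connect` and the 5-second timeout are assumptions made for illustration; only the address and the two port numbers come from this thread.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Run on the slave machine, `can_connect("144.217.14.79", 9991)` and the same call with 9989 would show which of the two ports accepts connections. A successful connect only shows that something is listening there, not that the slave name and password are accepted.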

-- 
Tulio Magno
Szabolcs Nagy Oct. 8, 2018, 4:53 p.m. | #4
On 08/10/18 16:23, Tulio Magno Quites Machado Filho wrote:
> Szabolcs Nagy <Szabolcs.Nagy@arm.com> writes:
>
>> On 08/10/18 14:53, Tulio Magno Quites Machado Filho wrote:
>>> Szabolcs Nagy <szabolcs.nagy@arm.com> writes:
>>>> i assume the slave will be able to connect once there is a server restart.
>>>
>>> The server has just been restarted.
>>>
>>> If the new slave doesn't reconnect in the following minutes, we'll have to
>>> analyze its log.
>>
>> sorry i stopped the slaves, since it could not connect previously.
>>
>> now i restarted it and it fails with
>>
>> 2018-10-08 14:44:22+0000 [-] Connection to 144.217.14.79:9989 failed: [Failure instance: Traceback (failure with no frames): <class
>> 'twisted.internet.error.TimeoutError'>: User timeout caused connection failure.
>
> That port is wrong.  It should have been 9991.
> You have to change that in the buildbot.tac:
>
> port = 9991
>


thanks, fixed, and updated the wiki to mention the nondefault port.
Szabolcs Nagy Oct. 9, 2018, 9:55 a.m. | #5
On 08/10/18 17:53, Szabolcs Nagy wrote:
> On 08/10/18 16:23, Tulio Magno Quites Machado Filho wrote:
>> That port is wrong.  It should have been 9991.
>> You have to change that in the buildbot.tac:
>>
>> port = 9991
>
> thanks, fixed, and updated the wiki to mention the nondefault port.


the first build is red, there are two failures, both are timeouts:

libio/tst-readline takes more than 80s
nss/tst-nss-files-hosts-multi takes about 30s

i assume it's ok to set TIMEOUTFACTOR=5 in the bot environment
or should we raise the TIMEOUT of these particular tests?
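
A rough sketch of the arithmetic behind the two options. It assumes the glibc test driver gives each test TIMEOUT * TIMEOUTFACTOR seconds, and that a test which does not define its own TIMEOUT gets a 20-second default. Neither assumption is stated in this thread, and neither test's actual TIMEOUT is shown here. The function names are illustrative only.

```python
import math

DEFAULT_TIMEOUT = 20  # seconds; assumed driver default, not stated in the thread


def effective_timeout(timeout=DEFAULT_TIMEOUT, factor=1):
    """Seconds a test may run before the driver gives up (assumed model)."""
    return timeout * factor


def min_factor(runtime, timeout=DEFAULT_TIMEOUT):
    """Smallest integer TIMEOUTFACTOR whose limit exceeds the observed runtime."""
    return math.floor(runtime / timeout) + 1


# Approximate runtimes reported above, in seconds.
observed = {"libio/tst-readline": 80, "nss/tst-nss-files-hosts-multi": 30}

for test, runtime in observed.items():
    print(test, "needs factor >=", min_factor(runtime))
```

Under these assumptions, TIMEOUTFACTOR=5 yields a 100-second limit for every test on the bot, whereas raising TIMEOUT in the two tests changes only their own limits.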

XPASS: elf/tst-protected1a
XPASS: elf/tst-protected1b
UNSUPPORTED: iconv/tst-gconv-init-failure
FAIL: libio/tst-readline
UNSUPPORTED: math/test-fesetexcept-traps
UNSUPPORTED: math/test-fexcept-traps
UNSUPPORTED: math/test-nearbyint-except-2
UNSUPPORTED: misc/tst-pkey
UNSUPPORTED: nptl/test-cond-printers
UNSUPPORTED: nptl/test-condattr-printers
UNSUPPORTED: nptl/test-mutex-printers
UNSUPPORTED: nptl/test-mutexattr-printers
UNSUPPORTED: nptl/test-rwlock-printers
UNSUPPORTED: nptl/test-rwlockattr-printers
FAIL: nss/tst-nss-files-hosts-multi
UNSUPPORTED: posix/tst-spawn4-compat
UNSUPPORTED: resolv/tst-resolv-ai_idn
UNSUPPORTED: resolv/tst-resolv-ai_idn-latin1
Summary of test results:
      2 FAIL
   5815 PASS
     14 UNSUPPORTED
     17 XFAIL
      2 XPASS
Makefile:401: recipe for target 'tests' failed
make[1]: *** [tests] Error 1
Joseph Myers Oct. 9, 2018, 11:44 a.m. | #6
On Tue, 9 Oct 2018, Szabolcs Nagy wrote:

> the first build is red, there are two failures, both are timeouts:
>
> libio/tst-readline takes more than 80s
> nss/tst-nss-files-hosts-multi takes about 30s
>
> i assume it's ok to set TIMEOUTFACTOR=5 in the bot environment
> or should we raise the TIMEOUT of these particular tests?


If only a few tests are timing out, and there are good reasons for them to 
time out on slow systems (amount of processing or I/O involved), then I 
think raising those tests' TIMEOUT is appropriate.

-- 
Joseph S. Myers
joseph@codesourcery.com
Tulio Magno Quites Machado Filho Oct. 9, 2018, 12:53 p.m. | #7
Joseph Myers <joseph@codesourcery.com> writes:

> On Tue, 9 Oct 2018, Szabolcs Nagy wrote:
>
>> the first build is red, there are two failures, both are timeouts:
>>
>> libio/tst-readline takes more than 80s
>> nss/tst-nss-files-hosts-multi takes about 30s
>>
>> i assume it's ok to set TIMEOUTFACTOR=5 in the bot environment
>> or should we raise the TIMEOUT of these particular tests?
>
> If only a few tests are timing out, and there are good reasons for them to
> time out on slow systems (amount of processing or I/O involved), then I
> think raising those tests' TIMEOUT is appropriate.


I agree with Joseph.

But answering your initial question: we can indeed change TIMEOUTFACTOR in the
bot.
We can tune it for each slave, if necessary.

-- 
Tulio Magno

Patch

diff --git a/master.cfg b/master.cfg
index 164d309..701def3 100644
--- a/master.cfg
+++ b/master.cfg
@@ -26,7 +26,7 @@  builder_map = {
   'glibc-ppc-linux': ['debian8-ppc-power8-1'],
   'glibc-ppc64le-linux': ['fedora25-ppc64le-power8-1'],
   'glibc-s390x-linux': ['marist-fedora-s390x'],
-  'glibc-aarch64-linux': ['reservedbit-xgene-ubuntu-aarch64'],
+  'glibc-aarch64-linux': ['tx1-ubuntu-aarch64'],
 }
 
 # Sets with all builders and all slaves.
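
As a sketch of what the trailing context comment refers to, the sets of builders and slaves can be derived from `builder_map` itself, so replacing a slave name in the map is the only change needed. The names `all_builders` and `all_slaves` are assumptions, since the real code after the comment is not part of the diff, and the map below contains only the entries visible in the hunk.

```python
# Only the entries visible in the hunk above, after the patch is applied.
builder_map = {
    'glibc-ppc-linux': ['debian8-ppc-power8-1'],
    'glibc-ppc64le-linux': ['fedora25-ppc64le-power8-1'],
    'glibc-s390x-linux': ['marist-fedora-s390x'],
    'glibc-aarch64-linux': ['tx1-ubuntu-aarch64'],
}

# Hypothetical names; derived so they cannot drift from builder_map.
all_builders = set(builder_map)
all_slaves = {slave for slaves in builder_map.values() for slave in slaves}
```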