Hardware monitoring on Xserve

Post by nelsonjm » 25 Jul 2006, 21:54

When you run Mac OS X 10.4/10.3 on an Xserve, you can get a fairly comprehensive log of hardware issues by reading /var/log/hwmond

Is there a comparable hardware monitor for YDL that notes ECC errors, core voltage problems, and other hardware errors?
nelsonjm
ydl newbie
Posts: 3
Joined: 17 Jul 2006, 20:40
Location: Blacksburg, VA

Post by StarKnight83 » 26 Jul 2006, 14:40

If you look around, there might be a gkrellm plugin for that. (I understand the plugin framework is fairly easy, so you might be able to make your own.) Mainly you just cat the file with a few filters, the way a text-based tool would (grep and the like).
Til our paths cross again
StarKnight83
Moderator
Posts: 959
Joined: 12 Jul 2004, 16:26
Location: Ft. Wayne, IN; USA

Post by nelsonjm » 27 Jul 2006, 20:28

Thanks for the suggestion.

I am actually looking for something that will write to a log, or provide another CLI method of telling me when voltages, fan speeds, or ECC RAM are out of whack.

One of the gkrellm plugins is for lm_sensors, which seems to provide part of what I want, if it works. However, I still have not found a YDL-compatible ECC error reporter. Do you know of any?

Post by StarKnight83 » 28 Jul 2006, 08:36

Unfortunately no; out of my 7 computers not one is ECC capable, so no help here. What you could do is look at what lm_sensors outputs (it can be a plain-text file) and do string analysis on it to pull out just the info you need; if certain parameters are out of range, then email the admin. A bash script and cron job could do this (I'll even try to help you with it if you'd like; just email me off-board and I'll see what I can do).
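
As a rough illustration of the idea above, here is a minimal Python sketch rather than the bash script mentioned. It assumes the lm_sensors `sensors` command prints lines such as `CPU Temp: +45.0°C` or `fan1: 4500 RPM`. The labels and limits in `LIMITS` are hypothetical placeholders; real labels depend on the hardware and on the sensors configuration. It also assumes cron is set up to mail a job's output to the admin (for example through `MAILTO`), so the script only prints problems instead of sending mail itself.

```python
import re
import subprocess

# Hypothetical limits: label -> (low, high). Adjust to the labels that
# `sensors` actually prints on the machine in question.
LIMITS = {
    "CPU Temp": (0.0, 80.0),
    "fan1": (2000.0, 20000.0),
}

# Matches lines like "CPU Temp:   +45.0°C" or "fan1:   4500 RPM".
READING = re.compile(
    r"^(?P<label>[^:]+):\s*(?P<value>[+-]?\d+(?:\.\d+)?)\s*(?P<unit>°C|V|RPM)"
)


def parse_sensors(text):
    """Return {label: value} for every line that looks like a reading."""
    readings = {}
    for line in text.splitlines():
        match = READING.match(line)
        if match:
            readings[match.group("label").strip()] = float(match.group("value"))
    return readings


def out_of_range(readings, limits):
    """Return one message per label that is missing or outside its limits."""
    problems = []
    for label, (low, high) in limits.items():
        value = readings.get(label)
        if value is None:
            problems.append(f"{label}: no reading found")
        elif not low <= value <= high:
            problems.append(f"{label}: {value} outside {low}..{high}")
    return problems


def main():
    """Run `sensors`, print any problems, and return a shell exit status."""
    text = subprocess.run(
        ["sensors"], capture_output=True, text=True, check=True
    ).stdout
    problems = out_of_range(parse_sensors(text), LIMITS)
    if problems:
        print("\n".join(problems))
        return 1
    return 0
```

Calling `main()` from a small script run by cron every few minutes would then produce output, and so a mail, only when something is missing or out of range.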

Post by nelsonjm » 24 May 2007, 16:58

I got in contact with an IBM PPC kernel hacker who was very helpful in working with me on this problem. We currently have a rudimentary EDAC-based ECC driver that, in my experience, works quite well. Hopefully it will be merged into the kernel some time after version 2.6.21.

My next plan is to get everything set up with 5.0.1/5.1, then get the system identifier light working and write a userland app that checks SMART errors, temperatures (whose sensors still need some work to report correct locations), and ECC errors, and writes them to the logs... :)
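
The ECC part of such a userland checker could look roughly like the Python sketch below. It is not the app described in the post. It assumes the driver exposes the usual EDAC sysfs counters (`ce_count` and `ue_count` under `/sys/devices/system/edac/mc/mcN/`); whether the Xserve driver mentioned here does so is an assumption. The function names are made up for illustration.

```python
import syslog
from pathlib import Path

# Standard EDAC sysfs location; assumed to apply to the driver in question.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_ecc_counts(root=EDAC_ROOT):
    """Return {controller: (corrected, uncorrected)} from EDAC sysfs."""
    counts = {}
    for mc in sorted(Path(root).glob("mc[0-9]*")):
        try:
            corrected = int((mc / "ce_count").read_text())
            uncorrected = int((mc / "ue_count").read_text())
        except (OSError, ValueError):
            continue  # skip controllers whose counters cannot be read
        counts[mc.name] = (corrected, uncorrected)
    return counts


def report(counts):
    """Return one message per controller that has seen any ECC error."""
    messages = []
    for name, (corrected, uncorrected) in counts.items():
        if corrected or uncorrected:
            messages.append(
                f"EDAC {name}: {corrected} corrected, "
                f"{uncorrected} uncorrected ECC errors"
            )
    return messages


def log_ecc_errors(root=EDAC_ROOT):
    """Write any ECC error messages to the system log."""
    for message in report(read_ecc_counts(root)):
        syslog.syslog(syslog.LOG_WARNING, message)
```

The counters are cumulative since the driver was loaded, so a periodic job built on this would log the same totals on every run until it also remembers the previous values.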

