Historical CPU statistics utility

Previously I wrote a post discussing a real-time CPU statistics utility. If you need historical CPU utilization data instead, use sysstat’s sar. For Fedora and CentOS, there is an official RPM package for sysstat.

Once you have installed sysstat, leave it for a few minutes until it has collected some records. The data collector runs from /etc/cron.d every 10 minutes (by default) and records CPU usage. On SMP systems it also records each CPU’s utilization statistics. By default sar prints the statistics for all CPUs combined; use -P to select an individual CPU (for example -P 0, or -P ALL for every CPU). Have fun watching your CPU utilization stats! 🙂
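Here is a rough sketch of how the collected data could be inspected. The commented sar invocations are standard sysstat usage, while the avg_busy helper and the file name sa15 are only illustrations. The helper assumes %idle is the last column of sar's CPU output.

```shell
# Typical invocations (need the sysstat package; not run here):
#   sar -u                        # combined CPU utilization for today
#   sar -P ALL                    # one row per CPU for every interval
#   sar -P 0                      # only CPU 0
#   sar -u -f /var/log/sa/sa15    # a saved daily file (the day is an example)

# Hypothetical helper: average CPU busy percentage (100 - %idle) computed
# from "sar -u" style text on stdin. Header and "Average:" rows are skipped.
avg_busy() {
  awk '$NF ~ /^[0-9.]+$/ && $1 != "Average:" { sum += 100 - $NF; n++ }
       END { if (n) printf "%.2f\n", sum / n }'
}
```

For example, "sar -u | avg_busy" would print a single number for the day so far.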

krb5-telnet != telnet-server

I had a task to allow root login via telnet on RHEL 4.3 servers. I tried my luck on Google and found this. Even after doing exactly what it described, I still couldn’t log in as root via telnet.

After researching a little more on Google, I finally found the answer! Apparently the /etc/xinetd.d/krb5-telnet service installed by krb5-workstation is not the telnet-server that I had been looking for. The telnetd from telnet-server is a separate package, and it is the one the document I found earlier refers to. I disabled krb5-telnet and enabled telnet in /etc/xinetd.d/.
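The switch can be sketched roughly as follows. set_xinetd_disable is a hypothetical helper, and it assumes each service file already contains a "disable = ..." line, as the stock Red Hat files do.

```shell
# Hypothetical helper: set "disable = yes|no" in an xinetd service file.
# Usage: set_xinetd_disable FILE yes|no
set_xinetd_disable() {
  sed -i "s/^\([[:space:]]*disable[[:space:]]*=[[:space:]]*\).*/\1$2/" "$1"
}

# On the server, as root (not run here):
#   set_xinetd_disable /etc/xinetd.d/krb5-telnet yes
#   set_xinetd_disable /etc/xinetd.d/telnet no
#   service xinetd reload
```

On Red Hat systems, "chkconfig krb5-telnet off" followed by "chkconfig telnet on" should achieve the same result.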

Voila! Now it allows root login via telnet. Red Hat should have written a note about this in the document.

PS: Please enable telnet-server ONLY if you need it and you know what you’re doing. I do NOT recommend the use of telnet-server.

References:
http://kbase.redhat.com/faq/FAQ_45_453.shtm
http://forums11.itrc.hp.com/service/forums/questionanswer.do?admit=109447626+1210184407758+28353475&threadId=1035531

Telco terminologies

GERAN – GSM EDGE Radio Access Network
UTRAN – UMTS Terrestrial Radio Access Network
UMTS – Universal Mobile Telecommunications System
SIGTRAN – Signaling Transport (SS7 signaling over IP)
GPRS – General Packet Radio Service (2.5G)
EDGE – Enhanced Data rates for GSM Evolution (2.75G)

PLMN – Public Land Mobile Network
PSTN – Public Switched Telephone Network

NSS – Network & Switching Subsystem
BSS – Base Station System
MSC – Mobile Switching Center
MSC-S/MSS – Mobile Switching Center Server (3G)
GMSC – Gateway MSC
IMS – IP Multimedia Subsystem
BSC – Base Station Controller
BTS – Base Transceiver Station
SGSN – Serving GPRS Support Node
GGSN – Gateway GPRS Support Node
VLR – Visitor Location Register
HLR – Home Location Register
AuC – Authentication Center
EIR – Equipment Identity Register
IN – Intelligent Network

Gb – Link between BSS and SGSN
Gn – Link between SGSN and other SGSNs and internal GGSN
Gp – Link between SGSN and external GGSN
Gr – Link between SGSN and HLR
Gi – Link between GGSN and PDN (Public Data Network)
Gd – Link between SGSN and SMS Gateway

A-Interface – Link between MSC and BSC
A-bis – Link between BSC and BTS
RNC – Radio Network Controller (BSC in UMTS world)
Node B – BTS in UMTS world
IuCS – Link between RNC and MSC (voice)
IuPS – Link between RNC and SGSN (data)
Iur – Link between RNC and RNC
Iub – Link between RNC and Node B

DDF – Digital Distribution Frame
ODF – Optical fiber Distribution Frame

More to come.. picture soon!

Update initrd with mkinitrd to install new (different) hardware on Linux

I’m not sure whether this important topic is covered thoroughly in the manuals, because I couldn’t easily find anything on Google without the exact keywords. I could be wrong, but I’m documenting it here for my own notes.

Many people do not dare to adopt Linux because of its complexity. I have to agree that Linux is far from user friendly, despite many people’s efforts to make it so. A simple task such as replacing hardware is very common, especially with hardware evolving so rapidly. A few weeks ago I had to replace the motherboard in one of my servers to accommodate a new multi-core CPU, and I had to keep everything (OS and data) on the hard drive intact. Once the new motherboard was in place, the system wouldn’t boot because it couldn’t read from my hard drive; it was the typical error saying that the root partition cannot be found. Initially I didn’t know that an incorrect initrd could cause this problem, because the error messages (which led to a kernel panic) didn’t mention anything specific other than not being able to read the partition table and find /. I almost put the blame on LVM right away. Fortunately, after some dedicated searching on Google, I was able to find the right solution.

Linux uses an initrd to hold the handful of modules it needs for booting to take place properly. If you compile every driver you need into the kernel, an initrd is not required, but including everything in the kernel is inefficient and defeats the purpose of the modular design. Important modules such as ext3, the PATA/SATA controller driver and LVM need to be placed inside the initrd, which the boot loader (in most cases GRUB or LILO) loads together with the kernel. Otherwise, how could the kernel mount a root filesystem on LVM if the modules required to read LVM partitions are themselves stored on that LVM partition?

To “repair” my new system, I had to boot from a rescue CD and modify /etc/modprobe.conf to reference the new motherboard’s disk controller. The new motherboard’s PATA controller is different from the old one’s and therefore requires a different driver. I replaced “alias scsi_hostadapter2 pata_amd” with “alias scsi_hostadapter2 libata” in modprobe.conf. Basically these are the steps:

chroot /mnt/sysimage
vi /etc/modprobe.conf
(make needed changes as required)
depmod -ae -F /boot/System.map-2.6.9-1.667 2.6.9-1.667
mkinitrd -v -f /boot/initrd-2.6.9-1.667.img 2.6.9-1.667

Once everything has run as expected, remove the rescue CD and reboot. The system should boot properly now. If it doesn’t, most likely the module name specified in modprobe.conf is incorrect.
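The modprobe.conf edit itself can also be scripted instead of done in vi. This is only a sketch: list_hostadapters and set_alias_module are hypothetical helpers, and the alias and module names are simply the ones from my own case.

```shell
# Hypothetical helper: list the storage controller aliases in a modprobe.conf.
list_hostadapters() {
  awk '$1 == "alias" && $2 ~ /^scsi_hostadapter/ { print $2, $3 }' "$1"
}

# Hypothetical helper: point an existing alias at a different module.
# Usage: set_alias_module FILE ALIAS MODULE
set_alias_module() {
  sed -i "s/^alias[[:space:]][[:space:]]*$2[[:space:]].*/alias $2 $3/" "$1"
}

# Inside the chroot (not run here):
#   list_hostadapters /etc/modprobe.conf
#   set_alias_module /etc/modprobe.conf scsi_hostadapter2 libata
```

Listing the aliases before and after the change is a cheap way to catch a mistyped module name before rebuilding the initrd.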

Here is a tip to find out which PATA/SATA module you should include in modprobe.conf:
When booting the rescue CD, a dialog pops up right after the blue screen appears. Look for the name of the module being loaded in that dialog. Keep an eye on it, because it disappears quickly.

BTW, this article is also useful when moving an installed Linux hard drive to a different computer with different hardware.

Good luck! Let me know if this article helps you by posting a comment. Thanks!

References:
http://www.keffective.com/mvsata/
http://en.wikipedia.org/wiki/Initrd

MikroTik RouterOS Interface Bonding

I have two separate Metro Ethernet links (via fiber optic) from the datacenter to the NOC. Each link is 10Mbps. I need to use both links together (bonding) and make sure that if one of them goes down I won’t lose half of my packets (redundancy). Bonding and redundancy are my goals.

Initially I tried Cisco Catalyst’s EtherChannel feature, since I learned about EtherChannel when I was doing my CNAP. Unfortunately EtherChannel does not fit this scenario because of my Metro Ethernet provider’s network setup. They use Cisco Catalyst 3750 switches to aggregate customers’ links from each POP to their headquarters. My first attempt was to establish a trunk mode EtherChannel (802.1q) with a Cisco Catalyst 2950 on one side and a Cisco Catalyst Express 500 on the other. Later I realized that this is not doable, for two reasons. First, trunking requires an MTU larger than 1500 (1504, to make room for the 4-byte 802.1q tag), while my provider strictly limits the MTU to 1500. Second, the negotiation between my two switches couldn’t work because my switches’ BPDU packets were “intercepted” by my provider’s switches. Basically, my Cisco switches were trying to establish a VLAN trunk with my provider’s directly connected switches when they were supposed to be negotiating with each other.

I consulted a few experienced people, including an employee of the provider, and they told me to use access mode EtherChannel instead of trunk mode EtherChannel. This is not possible with the Cisco Catalyst Express 500, which only offers trunk mode EtherChannel. I bought a Cisco Catalyst 2960 to replace the Cisco Catalyst Express 500, hoping that access mode EtherChannel would work, but it didn’t. Even if it had worked, it wouldn’t have been aware of link state changes, since my switches do not connect directly to the fiber cables. There is a fiber-to-Ethernet bridge on each side of each link, so my switches will always see both links as up as long as the bridges are up.

Since link state cannot be used to detect failures in this scenario, I had to find another way. MikroTik RouterOS offers not only a bonding feature but a fail-over mechanism too! The fail-over mechanism uses ARP packets to detect link failures. It is far from perfect, but at least it works.
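A minimal sketch of what such a setup might look like on one of the two routers, written as a small shell function that prints the RouterOS commands. The parameter names (slaves, mode, link-monitoring, arp-ip-targets) come from the RouterOS bonding manual. The helper name bonding_rsc, the bond name bond1, the choice of balance-rr mode, the interface names and the addresses are all assumptions for illustration.

```shell
# Hypothetical helper: print RouterOS commands for an ARP-monitored bond.
# Usage: bonding_rsc SLAVES LOCAL_CIDR PEER_IP
#   SLAVES      comma-separated interfaces to bond
#   LOCAL_CIDR  address to put on the bond interface
#   PEER_IP     bond address of the router at the other end (the ARP target)
bonding_rsc() {
  cat <<EOF
/interface bonding add name=bond1 slaves=$1 mode=balance-rr link-monitoring=arp arp-ip-targets=$3
/ip address add address=$2 interface=bond1
EOF
}

# Example with made-up interface names and addresses:
#   bonding_rsc ether1,ether2 172.16.0.1/30 172.16.0.2 > bonding.rsc
```

The other router would get the mirror image, with the two addresses swapped, so that each side probes the other over the bond.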

I will add examples later, but for now have a look at this. Hopefully I will discuss EoIP and EoIP over PPTP too.

References:
http://www.mikrotik.com/testdocs/ros/3.0/interface/bonding.php

LVM recovery on Fedora Core 6 with Fedora 8 Rescue CD

One of the hard disks in my Fedora Core 6 server nearly failed yesterday. It sounded like it was losing power every now and then. This hard disk is my primary disk; it has the /boot partition and an LVM partition, and it holds at least 36GB of the 300GB+ LVM Volume Group. Had it died completely, most of the OS would have been gone along with some of my data. Luckily I was still able to boot from this nearly-dead hard disk a couple of times.

I downloaded Fedora Core 6’s Rescue CD ISO and burned it. Every time I could boot into the system without a problem, I rebooted immediately and booted from the Rescue CD. I was hoping that I could move the LVM PEs (physical extents) off the broken hard disk ASAP.

During my first attempt, the faulty hard disk ‘disappeared’ while I was running e2fsck. I had to shut the system down for about 15 minutes to let it cool down. The trick worked, and I made another attempt. This time e2fsck finished without a problem, and I ran pvmove to move the PEs off the faulty disk. Unfortunately, my kernel was the latest version but the device-mapper and lvm2 packages were not. pvmove printed errors (“device-mapper: reload ioctl failed: Invalid argument”) no matter how many times I tried. Initially I thought the faulty hard disk might be too damaged, but since I could still boot the system without a hitch, I guessed it couldn’t be that bad.

This post has a solution, but I didn’t use it. I downloaded the Fedora 8 Rescue CD ISO and used that instead of Fedora Core 6’s. This time pvmove didn’t show any errors and the process completed as expected without any data loss. I was then able to use vgreduce to remove the faulty hard disk from the LVM Volume Group.
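The overall sequence can be sketched as follows. The device name /dev/sda2 and the volume group name VolGroup00 are examples rather than my actual names, and pv_is_empty is a hypothetical helper for double-checking that nothing is left on the disk before running vgreduce.

```shell
# Sequence from the rescue environment (not run here; names are examples):
#   vgscan && vgchange -ay            # find and activate the volume groups
#   pvmove /dev/sda2                  # move all allocated extents off the failing PV
#   vgreduce VolGroup00 /dev/sda2     # then drop the emptied PV from the VG

# Hypothetical helper: succeed only if the named PV is listed and has no
# allocated extents. Reads lines in the format printed by
#   pvs --noheadings -o pv_name,pv_pe_alloc_count
# on stdin.
pv_is_empty() {
  awk -v pv="$1" '$1 == pv { found = 1; if ($2 != 0) bad = 1 }
                  END { exit (found && !bad) ? 0 : 1 }'
}
```

For example, "pvs --noheadings -o pv_name,pv_pe_alloc_count | pv_is_empty /dev/sda2 && vgreduce VolGroup00 /dev/sda2" would only shrink the volume group once the move has really finished.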

If you experience the same problem, try a newer Rescue CD. Hopefully it will fix problems that are present on older Rescue CDs. Good luck! 🙂

cacti not updating graphs

If you use cacti with a lot of graphs and suddenly notice that some graphs are up to date while others are not, it is a good idea to check the Devices page. See whether the device that the stale graphs belong to is up.

I just noticed that some of my cacti graphs were 2 weeks old because SNMPD on one of the servers had died, so cacti didn’t update the graphs that belong to that host. It took me a while to find the cause, since cacti’s log didn’t mention anything. 🙁
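A quick way to spot a dead SNMPD without waiting for stale graphs is to query each host directly. This is only a sketch: snmp_alive and report_snmp are hypothetical helpers, and they assume net-snmp's snmpget is installed and that the hosts speak SNMP v2c with the community string given (default "public").

```shell
# Hypothetical helper: succeed if HOST answers an SNMP query for sysUpTime
# (OID 1.3.6.1.2.1.1.3.0). Usage: snmp_alive HOST [COMMUNITY]
snmp_alive() {
  snmpget -v 2c -c "${2:-public}" -t 2 -r 1 "$1" 1.3.6.1.2.1.1.3.0 >/dev/null 2>&1
}

# Hypothetical helper: print "up" or "DOWN" for every host given as an argument.
report_snmp() {
  for host in "$@"; do
    if snmp_alive "$host"; then echo "$host up"; else echo "$host DOWN"; fi
  done
}

# Example (host names are made up):
#   report_snmp web1.example.com db1.example.com
```

Run from cron, something like this could send a warning long before two weeks of graphs go missing.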