MikroTik RouterOS Interface Bonding

I have two separate Metro Ethernet links (via fiber optic) from the datacenter to the NOC. Each link is 10Mbps. I need to use both links together (bonding) and make sure that if one of them goes down, I won’t lose half of my packets (redundancy). Bonding and redundancy are my goals.

Initially I tried Cisco Catalyst’s EtherChannel feature, since I learned about EtherChannel when I was doing my CNAP. Unfortunately EtherChannel does not fit this scenario because of my Metro Ethernet provider’s network setup. They use Cisco Catalyst 3750 switches to aggregate customers’ links from each POP to their headquarters. My first attempt was to establish a trunk mode EtherChannel (802.1q) with a Cisco Catalyst 2950 on one side and a Cisco Catalyst Express 500 on the other. Later I noticed that this is not doable, for two reasons. First, trunking requires an MTU larger than 1500 (1504, to make room for the 4-byte 802.1q tag), while my provider strictly limits the MTU to 1500. Second, the negotiation between my two switches could not work because their BPDU packets were “intercepted” by my provider’s switches. Basically my Cisco switches were trying to establish a VLAN trunk with my provider’s directly connected switches when they were supposed to be negotiating with each other.

I consulted a few experienced people, including an employee of the provider, and they told me to use access mode EtherChannel instead of trunk mode EtherChannel. This is not possible with the Cisco Catalyst Express 500, which only offers trunk mode EtherChannel. I bought a Cisco Catalyst 2960 to replace the Cisco Catalyst Express 500, hoping that access mode EtherChannel would work, but it didn’t. Even if it had worked, it wouldn’t have been aware of link state changes, since my switches do not connect directly to the fiber cables. There is a fiber-to-ethernet bridge on each side of each link, so my switches will always see both links as up as long as the bridges are up.

Since link state cannot be used to detect failures in this scenario, I had to find another way. MikroTik RouterOS offers not only a bonding feature but a fail-over mechanism too! The fail-over mechanism uses ARP packets to detect link failures. It is far from perfect, but at least it works.

I will add examples later, but for now have a look at this. Hopefully I will discuss EoIP and EoIP over PPTP too.


LVM recovery on Fedora Core 6 with Fedora 8 Rescue CD

One of the hard disks on my Fedora Core 6 server nearly failed yesterday. It sounded like it was losing power every now and then. This hard disk is my primary one: it has the /boot partition and an LVM partition, and it holds at least 36GB of the 300GB+ LVM Volume Group. Had it died completely, most of the OS would have been gone along with some of my data. Luckily I was still able to boot from this nearly-dead hard disk a couple of times.

I downloaded Fedora Core 6’s Rescue CD ISO and burned it. Every time I could boot into the system without a problem, I rebooted immediately and booted from the Rescue CD. I was hoping that I could move the LVM PEs off the broken hard disk ASAP.

During my first attempt, the faulty hard disk ‘disappeared’ while I was running e2fsck. I had to shut the system down for about 15 minutes to let it cool down. This trick worked, and I made another attempt. This time e2fsck finished without a problem, and I ran pvmove to move the PEs off the faulty disk. Unfortunately, while my kernel was the latest version, the device-mapper and lvm2 packages were not. pvmove printed errors (“device-mapper: reload ioctl failed: Invalid argument”) no matter how many times I tried. Initially I thought the faulty hard disk might be too damaged, but since I could still boot the system without a hitch, I guessed it couldn’t be that damaged.

This post has a solution, but I didn’t use it. I downloaded Fedora 8 Rescue CD ISO and used that instead of Fedora Core 6’s. This time pvmove didn’t show any error and the process completed as expected without any lost data. I was then able to vgreduce the faulty hard disk from the LVM Volume Group.
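
For reference, the pvmove and vgreduce steps described above look roughly like the sketch below. The volume group name and device path are placeholders (assumptions, not the real names from this server), and the script only prints the commands unless DRY_RUN=0 is set. Note that pvmove needs enough free extents on the remaining physical volumes to hold everything it moves.

```shell
# Sketch: evacuate a failing LVM physical volume, then drop it from the VG.
# VG and PV are placeholder names (assumptions); adjust before use.
VG="${VG:-VolGroup00}"
PV="${PV:-/dev/sda2}"

# Print each command; only execute it when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

run pvs                    # show physical volumes and their free space
run pvmove "$PV"           # move all allocated extents off the failing PV
run vgreduce "$VG" "$PV"   # remove the now-empty PV from the volume group
```

Review the printed commands first, then run the script again with DRY_RUN=0 from the rescue environment.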

If you experience the same problem, try downloading a newer Rescue CD and give it a try. Hopefully it will address problems that are present on older Rescue CDs. Good luck! 🙂

cacti not updating graphs

If you use cacti with a lot of graphs and suddenly some of the graphs are up-to-date while others are not, it is a good idea to check the Devices page. See if the device the stale graphs belong to is up.

I just noticed that some of my cacti graphs were 2 weeks old because SNMPD on one of the servers had died, so cacti didn’t update the graphs that belong to that host. It took me a while to find out what was causing the problem since cacti’s log didn’t mention anything. 🙁
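
A quick way to confirm this kind of failure from the cacti host is to query the SNMP agent directly. The function below is only a sketch: it assumes net-snmp’s snmpget is installed and that the host speaks SNMP v2c, and the community string defaults to “public”, which is an assumption you will likely need to change.

```shell
# Sketch: ask a host's SNMP agent for sysUpTime (OID 1.3.6.1.2.1.1.3.0).
# Assumes net-snmp's snmpget and SNMP v2c; community defaults to "public".
snmp_alive() {
    host=$1
    community=${2:-public}
    # -t 2: wait 2 seconds per try, -r 1: retry once before giving up
    if snmpget -v 2c -c "$community" -t 2 -r 1 "$host" 1.3.6.1.2.1.1.3.0 >/dev/null 2>&1; then
        echo "$host: SNMP agent is answering"
    else
        echo "$host: no SNMP reply (is snmpd running?)"
        return 1
    fi
}
```

Call it as `snmp_alive <host> <community>` for each device whose graphs have gone stale.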

Linux tunnel problem (not a Buffalo-Tech WHR-HP-G54 firmware bug)

I have 6 Buffalo-Tech WHR-HP-G54 access points (with the latest firmware, WHR-HP-G54 Ver.1.40 (1.0.37-…)) installed at a shopping center to provide a wireless Internet hotspot. Since these access points (APs) have to be accessible from my office LAN, I set their default gateway via the web interface. This didn’t do the trick: I could neither access nor ping any of the APs from my LAN.

I used their Ping Test page to test connectivity from the APs to the router that is set as their default gateway. They received replies from the router and became accessible from my office LAN.

After a few hours I wanted to change something via the APs’ web interface, but they were no longer accessible! Pings from my LAN to the APs had stopped working again. Then I tried pinging the APs from the router, which is on the same network as the APs, and I got replies.

Knowing that the APs are actually still accessible from the router, I installed tinyproxy on the router to get to the APs’ web interface. I found a trick that triggers the firmware not to ‘sleep’: set NTP server with 1 hour interval. Why 1 hour? Because when I used 2 hours or more, the firmware still went to ‘sleep’ until the next NTP synchronization. Since I wanted the APs to be accessible at any time, I set them to 1 hour and voila, no more ‘sleep’ing problem! 🙂

Update (Nov 23, 2007):

I got this one wrong, sorry. 🙁
The firmware is fine, there is nothing wrong with Buffalo-Tech’s WHR-HP-G54 latest firmware. I found out that the tunnel interface on my Linux server created by ChilliSpot caused this behavior. If devices behind this tunnel-bound network interface do not initiate any connection then they will not be reachable from outside (despite having correct routes on the server). The only way to reach these devices from outside network is by getting them to initiate something. In my case, the hourly NTP synchronization schedule that I set on my WHR-HP-G54s triggers something which initiates a connection every hour allowing these APs to be ‘registered’ and reachable from outside network.

ClamAV’s clamd/freshclam permission problems

I have always used ClamAV’s clamd as the virus filter on my email servers, along with qmail-scanner. I noticed that crash-hat’s clamav RPM packages use logrotate to rotate the log files. qmail-scanner runs as its own user (qscand), so clamd has to run as the same user. When the RPM package is first installed, it creates two directories: /var/run/clamav/ and /var/log/clamav/. Chown both directories to qscand (this assumes that the User directive in clamd.conf and the DatabaseOwner directive in freshclam.conf have been changed to qscand); otherwise clamd and freshclam won’t be able to write their logs and pid files, and neither service will start.
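
To check whether the ownership change is still needed, something like the sketch below can help. The helper name check_owner is made up for illustration, and `stat -c %U` is the GNU coreutils syntax found on Fedora; the function only reports, it does not change anything.

```shell
# Sketch: report whether a directory is owned by the expected user.
# check_owner is a hypothetical helper; stat -c is GNU coreutils syntax.
check_owner() {
    owner=$(stat -c %U "$1") || return 2
    if [ "$owner" = "$2" ]; then
        echo "ok: $1 is owned by $2"
    else
        echo "fix: chown -R $2 $1 (currently owned by $owner)"
        return 1
    fi
}

# Check the two directories created by the RPM, skipping any that are absent.
for dir in /var/run/clamav /var/log/clamav; do
    [ -d "$dir" ] || continue
    check_owner "$dir" qscand || true
done
```

Any “fix:” line it prints is the chown command to run as root.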

As for the logrotate configuration, edit clamd and freshclam in /etc/logrotate.d/ to change the log files’ ownership to qscand instead of clamav. Modify line 8 where it says:

create 640 clamav clamav

to:

create 640 qscand clamav

That should do the trick. 🙂
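
If you would rather script the change than edit both files by hand, here is a minimal sketch. The function name is made up for illustration, and it assumes the create line reads exactly as shown above. It is demonstrated on a scratch file rather than the packaged config.

```shell
# Sketch: switch the owner on the logrotate "create" line from clamav to qscand.
# fix_logrotate_owner is a hypothetical helper name.
fix_logrotate_owner() {
    tmp=$(mktemp) &&
    sed 's/create 640 clamav clamav/create 640 qscand clamav/' "$1" > "$tmp" &&
    cat "$tmp" > "$1" &&   # overwrite in place, keeping the file's owner and mode
    rm -f "$tmp"
}

# Demonstration on a scratch file (a stand-in, not the real config).
demo=$(mktemp)
printf '%s\n' '/var/log/clamav/clamd.log {' '    create 640 clamav clamav' '}' > "$demo"
fix_logrotate_owner "$demo"
cat "$demo"
```

On the real system the same function would be run as root against the clamd and freshclam files in /etc/logrotate.d/, after keeping a copy of the originals somewhere outside that directory.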

RP-PPPoE server problem in Fedora Core 5, 6, Fedora 7, 8

Since Fedora Core 5, pppoe-server that comes with rp-pppoe RPM package has always been broken. Someone actually filed a bug report, but unfortunately there was no response. Apparently the problem is caused by ppp conflicting with syslogd. If you stop syslogd and klogd, then pppoe-server will run properly. Fedora Core 4 does not have this problem though. I’m not sure if the newly released Fedora 7 has got this issue sorted out. I’m guessing that they haven’t.

If you have installed Fedora 7 and found out that the issue has been fixed, please let me know ASAP. Thanks! 🙂

Update (Jul 03, 2007): Problem confirmed in Fedora 7.

Update (Mar 14, 2008): Problem fixed as stated on bugzilla ticket.

Missing bitops.h in Fedora Core 6

I was compiling an updated version of HTB-tools a few minutes ago when I noticed that either I forgot to make a note about removing a line from q_show.c, or the inclusion of bitops.h is new in the latest version of HTB-tools (0.3.0a). If you don’t remove the line in q_show.c that includes asm/bitops.h (line 40, according to the error below), the compilation process will fail with the following error:

sys/q_show.c:40:24: error: asm/bitops.h: No such file or directory

I found this on Google to explain why bitops.h is missing in Fedora Core 6.
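
Removing the include can also be scripted. The sketch below deletes any #include line that mentions asm/bitops.h; the function name is made up for illustration, and the scratch file is only a stand-in for sys/q_show.c, not its real contents.

```shell
# Sketch: delete any #include of asm/bitops.h from a C source file.
# drop_bitops_include is a hypothetical helper name.
drop_bitops_include() {
    tmp=$(mktemp) &&
    sed '/^[[:space:]]*#[[:space:]]*include.*asm\/bitops\.h/d' "$1" > "$tmp" &&
    cat "$tmp" > "$1" &&
    rm -f "$tmp"
}

# Demonstration on a scratch file standing in for sys/q_show.c.
demo_c=$(mktemp)
printf '%s\n' '#include <stdio.h>' '#include <asm/bitops.h>' 'int main(void) { return 0; }' > "$demo_c"
drop_bitops_include "$demo_c"
cat "$demo_c"
```

In the HTB-tools source tree the call would be `drop_bitops_include sys/q_show.c`, run before make.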

PPPoE Server HOWTO for MikroTik RouterOS 2.9

If you wish to run a PPPoE server, MikroTik RouterOS provides a convenient way to set one up in a few minutes (with a built-in traffic shaping feature too!). Previously I used Fedora Core for my PPPoE servers, but I couldn’t find a working solution to keep ghost PPPoE sessions from bogging down my Linux server. I tried a MikroTik DOM with RouterOS to replace my Linux-powered PPPoE servers, and so far the results are very good.

Below is a mini guide that may be able to help you get your PPPoE server running in a few minutes using RouterOS.

First, make sure that your RouterOS server’s WAN connectivity has been properly configured. Remember that you need at least 2 network interface cards (NICs). This guide assumes that both NICs are Ethernet: ether1 and ether2. If you haven’t set anything up on the new system, here is a checklist of the most common issues, based on my experience:

  • Make sure that the Internet-facing NIC has an IP address assigned on it and the default gateway is set (/ip route add gateway=…)
  • If NAT is used, ensure that src-nat/masquerade firewall rule has been added (/ip firewall nat …) and it is working properly

Once you have verified the server’s connectivity, create a PPP profile (/ppp profile add name="pppoe-profile" local-address=… dns-server=… rate-limit=128k/128k). Every user account that uses this profile gets a 128Kbps upload and download limit. If you wish to offer different types of accounts (for example, some customers pay for 256Kbps), create another PPP profile with a different rate-limit value.

Next, create a user account assigned to the new PPP profile (/ppp secret add name="andryan" password="test" service=pppoe profile="pppoe-profile" remote-address=…). When this user logs in successfully, the session is assigned the address given in remote-address. To assign IP addresses dynamically, there is an example here.

Finally, create a PPPoE server instance (/interface pppoe-server server add service-name="pppoe1" interface=ether2 one-session-per-host=yes default-profile="pppoe-profile") and enable it. Now your RouterOS PPPoE server is ready to answer PPPoE requests and authenticate your PPPoE clients. 🙂

Good luck!