Why MSPs should demand 60-second polling and full data retention from their NPM

Blog series: How 60-second polling saves you time and money

It’s mid-afternoon and a support engineer staffing the ABC MSP service support helpdesk receives a call from a very disgruntled customer.

The customer says his head office network connection is repeatedly going offline – albeit for short periods of time – but that the poor service is disrupting their ability to provide a high level of customer service. The interrupted network connection is causing IP telephones to go offline and interfering with the customer service team’s ability to respond to online chats.

The support engineer ends the call, knowing he can’t give an immediate answer because of the time needed to run a report. He jumps onto his network monitoring tool’s dashboard to investigate.

The first report he runs is an availability report of all the customer’s assets. He knows this report takes around seven minutes to produce, so he has plenty of time to grab a coffee and catch up with a colleague first.

With a report to hand showing that all the customer’s network devices and interfaces have been available at 100% for the past week, the support engineer returns the call. The customer, however, is not satisfied with the report. He has compelling evidence that this assessment is incorrect, and requests the engineer investigate the issues further, claiming that the service being supplied is not fit for purpose, nor meeting the contracted SLA.

*(Scheduled or on demand availability reporting)*

The client forwards his syslog report. Together, the support engineer and the customer examine the syslog entries and verify that one key interface between the router and the core switch appears congested and unstable.

Yet none of this is being picked up by the service provider’s SNMP poller. Why not?

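What’s happening is that the network monitoring tool employed by the service provider is missing the state transitions of the flapping ports. Each time the interface drops and recovers between two polls, the poller sees it as up on both occasions and records no outage.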

This is a fast route to losing the trust and custom of a client, but how is it possible for an MSP to manage what is not visible to them?

To gain the client’s trust, the MSP needs a solution that polls more frequently and retains the data to give an accurate and verifiable record of the real network asset availability.

A supplier of critical network infrastructure to sectors such as defense or the emergency services will insist on a very stringent service level agreement. A typical requirement is four nines or five nines availability (99.99% or 99.999%). This is a particularly common requirement among MSPs providing managed services to demanding sectors, such as banking or logistics.
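To put those targets in perspective, an availability percentage can be converted into a downtime budget. The snippet below is a minimal sketch of that arithmetic; the function name and the 30-day reporting period are illustrative assumptions rather than part of any particular SLA.

```python
# Convert an availability target into the downtime it allows.
SECONDS_PER_DAY = 86_400

def downtime_budget_seconds(availability_pct: float, days: int = 30) -> float:
    """Return the maximum downtime (in seconds) that still meets the target."""
    return (1 - availability_pct / 100) * days * SECONDS_PER_DAY

for target in (99.99, 99.999):
    budget = downtime_budget_seconds(target)
    print(f"{target}% over 30 days allows {budget:.1f} s of downtime")
```

Over 30 days, four nines leaves roughly 259 seconds of downtime and five nines roughly 26 seconds, which is less than a single five-minute polling interval in both cases.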

However, if your devices are only polled every five minutes, outages lasting several minutes can begin and end between polls and go unseen in dashboard monitoring.
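The effect of the polling interval can be illustrated with a simple sampling model. This is a sketch under stated assumptions, not a description of how any real poller works: the outage times are invented, polls are assumed to happen at exact multiples of the interval, and an outage counts as "seen" only if a poll lands inside it.

```python
def outages_seen(outages, interval_s, horizon_s):
    """Return the outages that at least one poll lands inside.

    outages: list of (start, end) pairs in seconds, end exclusive.
    interval_s: polling interval in seconds (polls at 0, interval_s, ...).
    horizon_s: length of the observation window in seconds.
    """
    poll_times = range(0, horizon_s, interval_s)
    return [
        (start, end)
        for start, end in outages
        if any(start <= t < end for t in poll_times)
    ]

# Hypothetical one-hour window with three 90-second link drops.
outages = [(130, 220), (1000, 1090), (2500, 2590)]

seen_5min = outages_seen(outages, 300, 3600)
seen_1min = outages_seen(outages, 60, 3600)
print(f"5-minute polling saw {len(seen_5min)} of {len(outages)} outages")
print(f"60-second polling saw {len(seen_1min)} of {len(outages)} outages")
```

With this sample data the five-minute poller reports none of the three drops, while the 60-second poller catches all of them, because every outage here lasts longer than one 60-second interval.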

Techniche’s unique technology pings every network device every 15 seconds and SNMP polls every device at least once a minute. With Operstatus monitoring enabled on key interfaces, our NPM (Network Performance Monitoring) stores all as-polled data in our proprietary time series database.

Top 7 Benefits of Statseeker for Managed Service Providers

SLAs can be configured to high precision (ping round-trip time to a tenth of a microsecond, availability to 0.0001%). SLAs can also be set to track only specific times of the day or specific days of the week, if required.
Create and configure dashboards for each client in just a few minutes. Display network performance status in real time and indicate when thresholds are in jeopardy or breached, using customizable colour coding.
Schedule an automated daily, weekly, or monthly report, detailing exceptions and reassuring all parties that the delivered network services are meeting customer service level agreements.
Interrogate the retained as-polled data by going back in time – weeks, months or even years, to see trends in network performance and accurately evidence the level of service delivered. This is a clear advantage over memory-intensive storage of all syslogs and traps.
Without detailed records going back months or years, it can be impossible to say when an issue started or track the trends. This long-term trend reporting is useful for memory, temperature and CPU readings.
15-second ping polling gives MSPs real assurance of network performance.
Configure alerts based on any threshold breach or anomalous network behaviour. Alerts can be shared with applications such as Teams, Slack, Splunk and ServiceNow.
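The time-windowed SLA tracking mentioned in the first benefit above can be illustrated with a short calculation over retained poll samples. This is a sketch only and does not represent Statseeker's implementation: the function name, the sample format and the Monday-to-Friday, 08:00 to 18:00 window are all assumptions made for the example.

```python
from datetime import datetime, timedelta

def sla_availability(samples, days=range(0, 5), start_hour=8, end_hour=18):
    """Percentage of 'up' samples that fall inside the SLA window.

    samples: iterable of (datetime, bool) pairs, one per poll.
    days: weekdays to count (Monday is 0).
    Hours are counted from start_hour up to, but not including, end_hour.
    Returns None if no sample falls inside the window.
    """
    in_window = [
        up for ts, up in samples
        if ts.weekday() in days and start_hour <= ts.hour < end_hour
    ]
    if not in_window:
        return None
    return 100 * sum(in_window) / len(in_window)

# Hypothetical data: one poll per minute for Monday 1 January 2024,
# down for 3 minutes from 12:00 and for 10 minutes from 22:00.
day_start = datetime(2024, 1, 1)
down_minutes = set(range(720, 723)) | set(range(1320, 1330))
samples = [
    (day_start + timedelta(minutes=m), m not in down_minutes)
    for m in range(1440)
]

in_hours = sla_availability(samples)
all_day = sla_availability(samples, days=range(7), start_hour=0, end_hour=24)
print(f"Business hours: {in_hours:.3f}%  Whole day: {all_day:.3f}%")
```

In this example the business-hours figure is 99.5%, because only the midday outage falls inside the window, while the whole-day figure is about 99.1%. A calculation like this depends on the as-polled samples being retained; it cannot be reproduced from averaged or rolled-up data.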