The development of the BAAZ prototype was coming to a close. The army troop assigned to the project had been undergoing training for the past month. The devices were going through final tests before we packed them up for their trip by military transport to North India. Wheels up was 7:00am the next day. The demonstration to the Army Commander was two days away. Based on that, we would move on to Kashmir (North India), where the devices would be evaluated for their value in the field…AND THE DEVICES WERE FAILING.
The SATHI (launched 2004) has been one of the most highly visible projects that I have worked on; it got the attention of the then President of India, Dr. APJ Abdul Kalam. Some of the press coverage that came out is at Jawan’s get a hi-tech SATHI, and there is an article by the Army Technical Liaison officer and a good friend, Ravi Menon: https://www.linkedin.com/pulse/invent-develop-make-india-story-you-may-have-missed-ravi-u-menon/ .
Before it got its official name, SATHI (Situational Awareness and Tactical Handheld Information), we called it the BAAZ, the Hindi word for hawk. The product goal was for every soldier to have a hawk's-eye view of the battlefield. In short, this was a ruggedized handheld device with GPS and long-range radio that would show every soldier where their fellow soldiers were on the map. Simple stuff today, but remember we are talking 2001-2003; phones with GPS came out much, much later. Turns out Snapchat can do this today. Back to my story…
It was 9:00pm. We had 20 BAAZ devices switched on and spread out on the rooftop of our office. The devices were working fine: GPS locked, RF working. Each device would get its location AND share it with all the other devices. But one by one, devices would stop transmitting their position…another minute or two, and the next one would stop, and so on, till we had only about 4 devices that were still working. A debug station that was listening to the messages from all devices was saying the same thing. Every few minutes, one device would stop transmitting its GPS position.
I’m sure you would have had this feeling sometime; the feeling in the pit of your stomach, “We’re sunk, this is not going to work, two years of hard work is going to result in spectacular failure.” If I close my eyes I can vividly remember the scene. It’s dark, the SATHI devices spread out on the roof, their displays on. I see the team members; discussing, arguing about what is happening.
The army team was waiting for us. They wanted to get some sleep; wheels up at 7:00am meant that they would be at the airfield at 5am. The officers wanted to know what was going on, and so did my boss.
We rebooted all the devices and tested a little more; 9:30pm. Turns out messaging between devices still worked; it was only the transmission of GPS position that was failing. That was a clue…and it helped narrow the search for the problem down to a smaller section of code (instead of everything is wrong!!!).
Radio Frequency (RF) transmission typically works by one person (device) transmitting and everybody else (the other devices) listening; then the next one transmits and everybody else listens, ad infinitum. Our protocol was that every time a device's turn to transmit came up, it would check whether somebody was already transmitting, and if so, it would drop its transmission slot and back off by a random delay within a 20 second window. When its next slot came, it would do the same thing, until it got its chance to send the message. Every other device was doing exactly the same thing.
http://man7.org/linux/man-pages/man3/rand.3.html
The C rand() function returns an integer between 0 and RAND_MAX, inclusive. So if you want a random number in a 20 second slot…
int seconds_to_backoff = rand() % 20;
There is a whole lot more I could say about random seeds…I’m trying to keep this short…
And this was the crux of the issue. rand() % 20 produces a value from 0 to 19, so every so often the backoff came out as 0 seconds. The scheduling code expected that anybody scheduling the next slot would give it a delay of at least 1 second. Why the heck would somebody say “call me back in 0 seconds!”? Handed a 0, it would never schedule the transmission function…ever!
The fixed code…
int seconds_to_backoff = ( rand() % 20 ) + 1;
A simple assertion at the beginning of the scheduler function would have caught this in the debug release during testing. This was a piece of code that had been written a while back, and the bug reared its head only when we had 10-20 devices set up to test. All our testing had been with 2-3 devices; who gives developers 20 devices to test with!
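For the curious, here is roughly what that assertion could look like. This is only a sketch, not the actual SATHI code: the names schedule_transmit, backoff_from_raw, schedule_backoff and BACKOFF_WINDOW_SECONDS are made up for illustration, and the real scheduler would arm a timer where the comment says so.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical constant: size of the random backoff window, in seconds. */
#define BACKOFF_WINDOW_SECONDS 20

/* Hypothetical helper: maps a raw rand() value to a backoff of 1..20 seconds.
 * Without the "+ 1" the result would be 0..19, and 0 is the value that
 * broke the scheduler. */
int backoff_from_raw(int raw)
{
    return (raw % BACKOFF_WINDOW_SECONDS) + 1;
}

/* Hypothetical stand-in for the scheduler entry point.
 * Returns the delay it accepted so that callers (and tests) can inspect it. */
int schedule_transmit(int delay_seconds)
{
    /* Contract: callers must ask for at least 1 second.
     * In a debug build this aborts on the very first 0;
     * building with NDEBUG compiles the check out. */
    assert(delay_seconds >= 1);

    /* ... the real scheduler would arm a timer here ... */
    return delay_seconds;
}

/* Picks a random backoff and hands it to the scheduler. */
int schedule_backoff(void)
{
    return schedule_transmit(backoff_from_raw(rand()));
}
```

With a check like this in place, the old rand() % 20 version would have aborted in a debug build the first time it rolled a 0, instead of quietly dropping a device off the air.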
There were a lot of learnings from this episode. Net-net, the devices worked; they were tested (twice) and packed up by 11:30pm…and went on to their demo. Three days later we were on a flight to Jammu, and then in a military caravan going up to the Jawahar tunnel in Banihal to field test the SATHI. But that is another story…

Recent mention in the news (October 2019), Project beta: https://www.indiatoday.in/india-today-insight/story/why-the-army-s-changed-mindset-on-indigenous-technology-is-a-relief-1611391-2019-10-21?utm_source=it_sticky_footer