Are AI cameras in our future?
In last year’s AWS re:invent event, which took place end of November, Amazon unveiled an interesting product: AWS DeepLens
There’s decent information about this new device on Amazon’s own website but very little of anything else out there. I decided to put my own thoughts on “paper” here as well.
Interested in AI, vision and where it meets communications? I am going to cover this topic in future articles, so you might want to sign-up for my newsletter
Get my free contentWhat is AWS DeepLens?
AWS DeepLens is the combination of 3 components: hardware (camera + machine), software and cloud. These 3 come in a tight integration that I haven’t seen before in a device that is first and foremost targeting developers.
With DeepLens, you can handle inference of video (and probably audio) inputs in the camera itself, without shipping the captured media towards the cloud.
The hype words that go along with this device? Machine Vision (or Computer Vision), Deep Learning (or Machine Learning), Serverless, IoT, Edge Computing.
It is all these words and probably more, but it is also somewhat less. It is a first tentative step of what a camera module will look like 5 years from today.
I’d like to go over the hardware and software and see how they combine into a solution.AWS DeepLens Hardware
AWS DeepLens hardware is essentially a camera that has been glued to an Intel NUC device:
Neither the camera nor the compute are on the higher end of the scale, which is just fine considering where we’re headed here – gazillion of low cost devices that can see.
The device itself was built in collaboration with Intel. As all chipset vendors, Intel is plunging into AI and deep learning as well. More on AWS+Intel vs Google later.
Here’s what’s in this package, based on the AWS blog post on DeepLens:
The hardware tries to look somewhat polished, but it isn’t. Although this isn’t written anywhere, this is:
In a way, this is just a more polished hardware version of Google’s computer vision kit. The real difference comes with the available tooling and workflow that Amazon baked into AWS DeepLens.AWS DeepLens Software
The AWS DeepLens software is where things get really interesting.
Before we get there, we need to understand a bit how machine learning works. At its basic, machine learning is about giving a “machine” a large dataset, letting it learn the data in one way or another, and then when you introduce similar new data, it will be able to classify it.
Dumbing the whole process and theory, at the end of the day, machine learning is built out of two main steps:
With AWS DeepLens, the intent is to run the training in the AWS cloud (obviously), and then run the deployment step for real time classification directly on the AWS DeepLens device. This also means that we can run this while being disconnected from the cloud and from any other network.
How does all this come to play in AWS DeepLens software stack?On device
On the device, AWS DeepLens runs two main packages:
Why MXNet and not TensorFlow?
The main component here is the new Amazon SageMaker:
SageMarker takes the effort away from the management of training machine learning, streamlining the whole process. That last step in the process of Deploy takes place in this case directly on AWS DeepLens.
Besides SageMaker, when using DeepLens you will probably make use of Amazon S3 for storage, Amazon Lambda when running serverless in the cloud, as well as other AWS services. Amazon even suggests using AWS DeepLens along with the newly announced Amazon Rekognition Video service.
To top it all, Amazon has a few pre-trained models and sample projects, shortening the path from getting a hold of an AWS DeepLens device to seeing it in action.AWS+Intel vs Google
So we’ve got AWS DeepLens. With its set of on-device and cloud software tools. Time to see what that means in the bigger picture.
I’d like to start with the main players in this story. Amazon, Intel and Google. Obviously, Google wasn’t part of the announcement. Its TensorFlow project was mentioned in various places and can be made to work with AWS DeepLens. But that’s about it.
Google is interesting here because it is THE company today that is synonymous to AI. And there’s the increasing rivalry between Amazon and Google that seems to be going on multiple fronts.
When Google came out with TensorFlow, it was with the intent of creating a baseline for artificial intelligence modeling that everyone will be using. It open sourced the code and let people play with it. That part succeeded nicely. TensorFlow is definitely one of the first projects developers would try to dabble with when it comes to machine learning. The problem with TensorFlow seems to be the amount of memory and CPU it requires for its computations compared to other frameworks. That is probably one of the main reasons why Amazon decided to place its own managed AI services on a different framework, ending up with MXNet which is said to be leaner with good scaling capabilities.
Google did one more thing though. It created its own special Tensor processing unit, calling it TPU. This is an ASIC type of a chip, designed specifically for high performance of machine learning calculations. In a research paper released by Google earlier last year, they show how their TPUs perform better than GPUs when it comes to TensorFlow machine learning work loads:
And if you’re wondering – you can get CLOUD TPU on the Google Cloud Platform, albait this is still in alpha stage.
This gives Google an advantage in hosting managed TensorFlow jobs, posing a threat to AWS when it comes to AI heavy applications (which is where we’re all headed anyway). So Amazon couldn’t really pick TensorFlow as its winning horse here.
Intel? They don’t sell TPUs at the moment. And like any other chip vendor, they are banking and investing heavily in AI. Which made working with AWS here on optimizing and working on end-to-end machine learning solutions for the internet of things in the form of AWS DeepLens an obvious choice.Artificial Intelligence and Vision
These days, it seems that every possible action or task is being scrutinized to see if artificial intelligence can be used to improve it. Vision is no different. You can find it other computer vision or machine vision and it covers a broad set of capabilities and algorithms.
Roughly speaking, there are two types of use cases here:
As with anything else in artificial intelligence and analytics, none of this is workable at the moment for a broad spectrum of classifications. You need to be very specific in what you are searching and aiming for, and this isn’t going to change in the near future.
On the other hand, there are many many cases where what you need is a camera to classify a very specific and narrow vision problem. The usual things include person detection for security cameras, counting people at an entrance to a store, etc. There are other areas you hear about today such as using drones for visual inspection of facilities and robots being more flexible in assembly lines.
We’re at a point where we already have billions of cameras out there. They are in our smartphones and are considered a commodity. These cameras and sensors are now headed into a lot of devices to power the IOT world and allow it to “see”. The AWS DeepLens is one such tool that just happened to package and streamline the whole process of machine vision.Pricing
On the price side, the AWS DeepLens is far from a cheap product.
The baseline cost is of an AWS DeepLens camera? $249
But as with other connected devices, that’s only a small part of the story. The device is intended to be connected to the AWS cloud and there the real story (and costs) takes place.
The two leading cost centers after the device itself are going to be AWS Greengrass and Amazon SageMaker.
AWS Greegrass starts at $1.49 per year per device. Amazon SageMaker costs 20-25% on top of the usual AWS EC2 machine prices. To that, add the usual bandwidth and storage pricing of AWS, and higher prices for certain regions and discounts on large quantities.
It isn’t cheap.
This is a new service that is quite generic and is aimed at tinkerers. Startups looking to try out and experiment with new ideas. It is also the first iteration of Amazon with such an intriguing device.
I, for one, can’t wait to see where this is leading us.3 Different Compute Models for Machine Vision
AWS DeepLens is one of 3 different compute models that I see in this space of machine vision.
Here are all 3 of them:#1 – Cloud
In a cloud based model, the expectation is that the actual media is streamed towards the cloud:
The data can be a video stream, or more often than not, it is just a set of captured images.
And that data gets classified in the cloud.
Here are two recent examples from a domain close to my heart – WebRTC.
At the last Kranky Geek event, Philipp Hancke shared how appear.in is trying to determine NSFW (Not Safe For Work):
The way this is done is by using Yahoo’s Open NSFW open source package. They had to resize images, send them to a server and there, using Python classify the image, determining if it is safe for work or not. Watch the video – it really is insightful at how to tackle such a project in the real world.
The other one comes from Chad Hart, who wrote a lengthy post about connecting WebRTC to TensorFlow for machine vision. The same technique was used – one of capturing still images from the stream and sending them towards a server for classification.
These approaches are nice, but they have their challenges:
This alternative is what we have today in smartphones and probably in modern room based video conferencing devices.
The camera is just the optics, but the heavy lifting takes place in the main processor that is doing other things as well. And since most modern CPUs today already have GPUs embedded as part of the SoC, and chip vendors are actively working on AI specific additions to chips (think Apple’s AI chip in the iPhone X or Google’s computational photography packed into the Pixel X phones).
The underlying concept here is that the camera is always tethered or embedded in a device that is powerful enough to handle the machine learning algorithms necessary.
They aren’t part of the camera but rather the camera is part of the device.
This works rather well, but you end up with a pricy device which doesn’t always make sense. Remember that our purpose here is to aim at having a larger number of camera sensors deployed and having an expensive computing device attached to it won’t make sense for many of the use cases.#3 – In the Camera
This is the AWS DeepLens model.
TBD – IMAGE
The computing power needed to run the classification algorithms is made part of the camera instead of taking place on another CPU.
We’re talking about $249 right now, but assuming this approach becomes popular, prices should go down. I can easily see such devices retailing at $49 on the low end in 2-3 technology cycles (5 years or so). And when that happens, the power developers will have over what use cases can be created are endless.
Think about a home surveillance system that costs below $1,000 to purchase and install. It is smart enough to have a lot less false positives in alerting its users. AND can be upgraded in its classification as time goes by. There can be a service put in place behind it with a monthly fee that includes such things. You can add face detection and classification of certain people – alerting you when the kids come home or leave for example. Ignoring a stray cat that came into view of the camera. And this system is independent of an external network to run on a regular basis. You can update it when an external network is connected, but other than that, it can live “offline” quite nicely.No Winning Model
All of the 3 models have their place in the world today. Amazon just made it a lot easier to get us to that third alternative of “in the camera”.IoT and the Cloud
Edge computing. Fog computing. Cloud computing. You hear these words thrown in the air when talking about the billions of devices that will comprise the internet of things.
For IoT to scale, there are a few main computing concepts that will need to be decided sooner rather than later:
I was reading The Meridian Ascent recently. A science fiction book in a long series. There’s a large AI machine there called Big John which sifts through the world’s digital data:
“The most impressive thing about Big John was that nobody comprehended exactly how it worked. The scientists who had designed the core network of processors understood the fundamentals: feed sufficient information to uniquely identify a target, and then allow Big John to scan all known information – financial transactions, medical records, jobs, photographs, DNA, fingerprints, known associates, acquaintances, and so on.
But that’s where things shifted into another realm. Using the vast network of processors at its disposal, Big John began sifting external information through its nodes, allowing individual neurons to apply weight to data that had no apparent relation to the target, each node making its own relevance and correlation calculations.”
I’ve emphasized that sentence. To me, this shows the view of the same IoT network looking at it from a cloud perspective. There, the individual sensors and nodes need to be smart enough to make their own decisions and take their own actions.
All these words for a device that will only be launched April 2018…
We’re not there yet when it comes to IoT and the cloud, but developers are working on getting the pieces of the puzzle in place.
Interested in AI, vision and where it meets communications? I am going to cover this topic in future articles, so you might want to sign-up for my newsletter
Get my free content
The post AWS DeepLens and the Future of AI Cameras and Vision appeared first on BlogGeek.me.
As many as you like. You can cram anywhere from one to a million users into a WebRTC call.
You’ve been asked to create a group video call, and obviously, the technology selected for the project was WebRTC. It is almost the only alternative out there and certainly the one with the best price-performance ratio. Here’s the big question: How many users can we fit into that single group WebRTC call?
Need to understand your WebRTC group calling application backend? Take this free video mini-course on the untold story of WebRTC’s server side.
At least once a week I get approached by someone saying WebRTC is peer-to-peer and asking me if you can use it for larger groups, as the technology might not fit for such use cases. Well… WebRTC fits well into larger group calls.
You need to think of WebRTC as a set of technological building blocks that you mix and match as you see fit, and the browser implementation of WebRTC is just one building block.
The most common building block today in WebRTC for supporting group video calls is the SFU (Selective Forwarding Unit). a media router that receives media streams from all participants in a session and decides who to route that media to.
What I want to do in this article, is review a few of the aspects and decisions you’ll need to take when trying to create applications that support large group video sessions using WebRTC.Analyze the Complexity
The first step in our journey today will be to analyze the complexity of our use case.
With WebRTC, and real time video communications in general, we will all boil down to speeds and feeds:
Let’s start with an example.
Assume you want to run a group calling service for the enterprise. It runs globally. People will join work sessions together. You plan on limiting group sessions to 4 people. I know you want more, but I am trying to keep things simple here for us.
The illustration above shows you how a 4 participants conference would look like.Magic Squares: 720p
If the layout you want for this conference is the magic squares one, we’re in the domain of:
You want high quality video. That’s what everyone wants. So you plan on having all participants send out 720p video resolution, aiming for WQHD monitors (that’s 2560×1440). Say that eats up 1.5Mbps (I am stingy here – it can take more), so:
Summing it up in a simple table, we get:Resolution 720p Bitrate 1.5Mbps User outgoing 1.5Mbps (1 stream) User incoming 4.5Mbps (3 streams) SFU outgoing 18Mbps (12 streams) SFU incoming 6Mbps (4 streams) Magic Squares: VGA
If you’re not interested in resolution that much, you can aim for VGA resolution and even limit bitrates to 600Kbps:Resolution VGA Bitrate 600Kbps User outgoing 0.6Mbps (1 stream) User incoming 1.8Mbps (3 streams) SFU outgoing 7.2Mbps (12 streams) SFU incoming 2.4Mbps (4 streams)
The thing you may want to avoid when going VGA is the need to upscale the resolution on the display – it can look ugly, especially on the larger 4K displays.
With crude back of the napkin calculations, you can potentially cram 3 VGA conferences for the “price” of 1 720p conference.Hangouts Style
But what if our layout is a bit different? A main speaker and smaller viewports for the other participants:
I call it Hangouts style, because Hangouts is pretty known for this layout and was one of the first to use it exclusively without offering a larger set of additional layouts.
This time, we will be using simulcast, with the plan of having everyone send out high quality video and the SFU deciding which incoming stream to use as the dominant speaker, picking the higher resolution for it and which will pick the lower resolution.
You will be aiming for 720p, because after a few experiments, you decided that lower resolutions when scaled to the larger displays don’t look that good. You end up with this:
0.3Mbps (2 streams) SFU outgoing 8.4Mbps (12 streams) SFU incoming 8.8Mbps (4 streams)
This is what have we learned:
Different use cases of group video with the same number of users translate into different workloads on the media server.
And if it wasn’t mentioned specifically, simulcast works great and improves the effectiveness and quality of group calls (simulcast is what we used in our Hangouts Style meeting).
Across the 3 scenarios we depicted here for 4-way video call, we got this variety of activity in the SFU:Magic Squares: 720p Magic Squares: VGA Hangouts Style SFU outgoing 18Mbps 7.2Mbps 8.4Mbps SFU incoming 6Mbps 2.4Mbps 8.8Mbps
Here’s your homework – now assume we want to do a 2-way session that gets broadcasted to 100 people over WebRTC. Now calculate the number of streams and bandwidths you’ll need on the server side.How Many Users Can be Active in a WebRTC Call?
That’s a tough one.
If you use an MCU, you can get as many users on a call as your MCU can handle.
If you are using an SFU, it depends on a 3 different parameters:
We’re going to review them in a sec.Same Scenario, Different Implementations
Anything about 8-10 users in a single call becomes complicated. Here’s an example of a publicly available service I want to share here.
The media server decided here how to limit and gauge traffic.
And here’s another service with an online demo running the exact same scenario:
Now the incoming bitrate on average per browser was only 2.7Mbps – almost a fourth of the other service.
Same scenario. Different implementations.What About Some Popular Services?
What about some popular services that do video conferencing in an SFU routed model? What kind of size restrictions do they put on their applications?
Here’s what I found browsing around:
Does this mean you can’t get above 50?
My take on it is that there’s an increasing degree of difficulty as the meeting size increases:The CPaaS Limit on Size
When you look at CPaaS platforms, those supporting video and group calling often have limits to their meeting size. In most cases, they give out an arbitrary number they have tested against or are comfortable with. As we’ve seen, that number is suitable for a very specific scenario, which might not be the one you are thinking about.
In CPaaS, these numbers vary from 10 participants to 100’s of participants in a single sesion. Usually, if you can go higher, the additional participants will be view-only.Key Points to Remember
Few things to keep in mind:
Sizing and media servers is something I have been doing lately at testRTC. We’ve played a bit with Kurento in the past and are planning to tinker with other media servers. I get this question on every other project I am involved with:
How many sessions / users / streams can we cram into a single media server?
Given what we’ve seen above about speeds and feeds, it is safe to say that it really really really depends on what it is that you are doing.
If what you are looking for is group calling where everyone’s active, you should aim for 100-500 participants in total on a single server. The numbers will vary based on the machine you pick for the media server and the bitrates you are planning per stream on average.
If what you are looking for is a broadcast of a single person to a larger audience, all done over WebRTC to maintain low latency, 200-1,000 is probably a better estimate. Maybe even more.Big Machines or Small Machines?
Another thing you will need to address is on which machines are you going to host your media server. Will that be the biggest baddest machines available or will you be comfortable with smaller ones?
Going for big machines means you’ll be able to cram larger audiences and sessions into a single machine, so the complexity of your service will be lower. If something crashes (media servers do crash), more users will be impacted. And when you’ll need to upgrade your media server (and you will), that process can cost you more or become somewhat more complicated as well.
The bigger the machine, the more cores it will have. Which results in media servers that need to run in multithreaded mode. Which means they are more complicated to build, debug and fix. More moving parts.
Going for small machines means you’ll hit scale problems earlier and they will require algorithms and heuristics that are more elaborate. You’ll have more edge cases in the way you load balance your service.Scale Based on Streams, Bandwidth or CPU?
How do you decide that your media server achieved full capacity? How do you decide if the next session needs to be crammed into a new machine or another one or be placed on the current media server you’re using? If you use the current one, and new participants want to join a session actively running in this media server, will there be room enough for them?
These aren’t easy questions to answer.
I’ve see 3 different metrics used to decide on when to scale out from a single media server to others. Here are the general alternatives:
Based on CPU – when the CPU hits a certain percentage, it means the machine is “full”. It works best when you use smaller machines, as CPU would be one of the first resources you’ll deplete.
Based on Bandwidth – SFUs eat up lots of networking resources. If you are using bigger machines, you’ll probably won’t hit the CPU limit, but you’ll end up eating too much bandwidth. So you’ll end up determining the capacity available by way of bandwidth monitoring.
Based on Streams – the challenge sometimes with CPU and Bandwidth is that the number of sessions and streams that can be supported may vary, depending on dynamic conditions. Your scaling strategy might not be able to cope with that and you may want more control over the calculations. Which will lead to you sizing the machine using either CPU or bandwidth, but placing rules in place that are based on the number of streams the server can support.
The challenge here is that whatever scenario you pick, sizing is something you’ll need to be doing on your own. I see many who come to use testRTC when they need to address this problem.Cascading a Single Session
Cascading is the process of connecting one media server to another. The diagram below shows what I mean:
We have a 4-way group video call that is spread across 3 different media servers. The servers route the media between them as needed to get it connected. Why would you want to do this?#1 – Geographical Distribution
When you run a global service and have SFUs as part of it, the question that is raised immediately is for a new session, which SFU will you allocate for it? In which of the data centers? Since we want to get our media servers as close as possible to the users, we either have pre-knowledge about the session and know where to allocate it, or decide by some reasonable means, like geolocation – we pick the data center closest to the user that created the meeting.
Assume 4 people are on a call. 3 of them join from New York, while the 4th person is from France. What happens if the French guy joins first?
The server will be hosted in France. 3 out of 4 people will be located far from the media server. Not the best approach…
One solution is to conduct the meeting by spreading it across servers closest to each of the participants:
We use more server resources to get this session served, but we have a lot more control over the media routes so we can optimize them better. This improved media quality for the session.#2 – Fragmented Allocations
Assume that we can connect up to 100 participants in a single media server. Furthermore, every meeting can hold up to 10 participants. Ideally, we won’t want to assign more than 10 meetings per media server.
But what if I told you the average meeting size is 2 participants? It can get us to this type of an allocation:
This causes a lot of wasted server resources. How can we solve that?
That last one of cascading? You can do that by reserving some of a media server’s resources for cascading existing sessions to other media servers.#3 – Larger Meetings
Assuming you want to create larger meetings than one a single media server can handle, your only choice is to cascade.
If your media server can hold 100 participants and you want meetings at the size of 5,000 participants, then you’ll need to be able to cascade to support them. This isn’t easy, which explains why there aren’t many such solutions available, but it definitely is possible.
Mind you, in such large meetings, the media flow won’t be bidirectional. You’ll have fewer participants sending media and a lot more only receiving media. For the pure broadcasting scenario, I’ve written a guest post on the scaling challenges on Red5 Pro’s blog.Recap
We’ve touched a lot of areas here. Here’s what you should do when trying to decide how many users can fit in your WebRTC calls:
What’s the size of your WebRTC meetings?
Need to understand your WebRTC group calling application backend? Take this free video mini-course on the untold story of WebRTC’s server side.
Here are CPaaS trends you should be expecting this year.
There’s no doubt about it. CPaaS is growing and it is doing so rapidly. It is a multi billion dollars industry, and while still small, there’s no sign of its growth stopping anytime soon. You’ll see the numbers $4 billion and $8 billion a year appearing in different reports and estimates that are flying around when talking about the near future of the CPaaS market size and growth potential. I have no clue if the numbers are correct – I’ve never been one to play with estimates.
What I do know, is that we’ve got multiple CPaaS vendors now with ARR (Annual Run Rate) higher than $100 million. Most of it may still come from good old SMS and phone calls, but I think this will change along with how consumers communicate.
This change will make CPaaS a lot more interesting and diversified than the boring race to the bottom that seems to be prevalent in some of the players’ offering and messaging in this market. The problem with CPaaS today is twofold:
Which brings me to what you can expect in 2018. Here are 7 CPaaS trends that will grow and become important this year – and more importantly – what they mean.
Planning on selecting a CPaaS vendor? Check out this shortlist of CPaaS vendor selection metrics:
Get the shortlist#1 – Serverless
Serverless is also known as Functions.
You might know about serverless from AWS Lambda, Azure Functions, Google’s Cloud Functions and Apache’s OpenWhisk. The list here isn’t random – it goes to show that all big cloud platforms are now offering serverless capabilities.
This still isn’t prevalent in CPaaS, where for the most part, developers are expected to develop, maintain and operate their own servers that communicate with the CPaaS vendor’s infrastructure. But we do see signs of serverless making its way here.
I’ve covered that last year, when I took a deeper look into the Twilio Functions offering and what that means to the CPaaS market.
At the time, Twilio stated that Functions is already Twilio’s fastest growing product ever. Here’s where they explain what it does:
Twilio being the market leader in CPaaS, and Functions being a fast growing product of theirs means that other CPaaS vendors will follow. Simply because demand here is obvious.#2 – Omnichannel
When SMS just isn’t enough.
Not sure when you last used SMS for personal reasons – I know that I rarely end up inside that app on my smartphone. The way things are going, SMS can be considered the spam channel of 2018. Or maybe the channel used by businesses who’ve been told that this is the best way to reach customers and interrupt them.
While I definitely see value in SMS, I also think that businesses should strive to communicate with their customers on other channels – channels their users are now focusing on with their social life. In Israel that would be Whatsapp. In the US probably a mixture of Facebook and iMessage will work better. Telegram would be the choice for Russia.
Whatever that channel is, to support it, someone needs to integrate with it. And then decide which channel to use for which customer and for what interaction. For CPaaS, that’s what Omnichannel is about. Enabling developers, and by extension businesses to communicate with their customers on the customer’s preferred channel.
2018 is going to be the year Omnichannel becomes a serious requirement.
Because now we can actually use it.
Apple’s own Business Chat service is planned to make its public debut this year.
Facebook has its own APIs already, and Whatsapp announced business accounts (=APIs).
That alone covers a large majority of customer bases.
Throw in SMS, mix and choose the ones you want. And voila! Omnichannel.
For businesses, relying on CPaaS for Omnichannel makes sense, as the hassle of adding all of these channels and maintaining them is expensive. Omichannel CPaaS APIs will abstract that away.
For CPaaS vendors, this is a way to differentiate and make switching between vendors harder.
From code, to REST, to point-and-click.
We used to use DOS as an “operating system”. I worked at a small computer shop as a kid when I grew up. For a couple of years, my role was to go to people’s homes and explain to them how to use the new computer they just purchased. How to put the DOS disk inside the floppy drive, list the files in a floppy, run games and other applications.
Then came Windows (along with Mac and OS/2 and others) and we all just moved to using a visual operating system and a mouse.
As a kid, I programmed using Logo and Basic. Then Turbo Pascal – in a decent IDE for the first time. In the university, I got acquainted to Tcl/Tk. And then UI development seemed fun. Even it if was by writing code by hand. Then one day, vtcl came to life – a visual editor. Things got easier.
Developing communications is taking the same path now.
It started by needing to build your own stuff from scratch, then with open source frameworks and later CPaaS and REST (or god forbid SOAP) APIs.
In 2017, Twilio Studio was announced – a visual IDE to use on top of the Twilio functionality. In that corner, you can also count Amazon Connect, though not CPaaS but still in the domain of communications – it has a visual IDE of its own.
The concept of using visual tools requiring less coding can greatly increase productivity and the target audience of these tools. They are no longer restricted to developers “who code”. Hell – I can use these tools. I played with Twilio Studio a bit – it was fun and intuitive. It guides the way you think about what needs to be done. About the flow of the service.
I really can’t see how other CPaaS vendors are going to ignore this trend and not work on their own visual offerings during 2018.#4 – Machine Learning and Artificial Intelligence
It is time to be smart about communications
When I worked at Amdocs some years ago, we’ve looked into the area of Big Data Analytics. It was all about how you take the boatloads of information telecommunication companies have and do something with it. You start by analyzing and visualizing it, moving towards the domain of actionable.
It frustrated the hell out of me to understand how little communication vendors are doing with their data compared to enterprises in other markets. Or at least that was my impression looking from inside a vendor.
Fast forward to today, and what you find with CPaaS vendors is that they are offering a well oiled machine that provides generic communications. You can do whatever you want with it, and the smart ones are adding analytics on top for their own needs.
But want about the CPaaS vendors themselves? Shouldn’t they be doing something about analytics? Or its better branded colleague known as machine learning?
Gustavo Garcia wrote a good article about it – improving real time communications with machine learning. This is where most CPaaS vendors are probably looking today, optimizing their network to offer a better service.
But it is just scratching the surface.
The obvious is adding things around NLP – speech to text, text to speech, translation. All those are being done by integrating with third parties today, and many of the CPaaS vendors offer these out of the box.
To move the needle and differentiate, more needs to be done:
If you are a CPaaS vendor and you don’t have at least a data scientist, a machine learning developer and a product manager savvy in this domain yet, then start recruiting.#5 – AR/VR
Time to connect ARKit and ARCode to communications.
Augmented reality and virtual reality have been around for the better part of the last decade or two. But somehow, they are only now becoming interesting.
I guess the popularity of AR has grown a lot, and where it fits directly in smartphones today (and not the bulky 3D headsets) is with things like Pokemon Go and camera filters (started by popularized snapchat and found everywhere today).
With the introduction of Apple ARKit and Google ARCore, this is only going to get more commonplace. And what we see now is CPaaS vendors finding their way around this technology.
The most interesting one yet is Twilio’s work with ARKit, which they showcased at last year’s Kranky Geek event:
With all the focus put in this domain, I am sure we’ll see more CPaaS vendors looking into it.#6 – Bots
Omnichannel + Machine Learning + Automation = Bots
Chat bots is all the rage. Search the internet and you’ll be thinking that humans no longer talk to customers anymore. It is all taken care of by bots.
I’ve added a chat widget to certain pages on my website. And every once in awhile I get a question there asking if that’s a human they’re interacting with.
Bots require integration and APIs. They are also about communications. Which is probably why CPaaS vendors are taking a step towards this direction as well. The ones adding Omnichannel offerings across multiple channels are in effect enabling bots to be created there across channels.
That’s a first step though, as the next would be to cater this market better by enabling conversational interfaces and easing the part of packaging the bots for the various channels.
Expect to see a few announcements around bots to be made by CPaaS vendors this year. A lot of it will revolve around Amazon Alexa and Google Home#7 – GDPR
The governance headache we’ve all been waiting for.
GDPR stands for General Data Protection Regulation. It is a new set of EU rules that have been put in place to protect the data related to EU citizens that is collected and stored.
While it is easy to assume that CPaaS vendors store no data – they “live” in the real time, that isn’t accurate.
Stored meta data and logs may fall into the GDPR black hole, and definitely recording services. With the introduction of Omnichannel and Bots comes chat history storage.
Twilio jumped on this bandwagon last year with a GDPR program. Other vendors such as MessageBird indicated future support of GDPR. All global CPaaS vendors will need to support GDPR, and since these regulations come to force this year, 2018 will be the year GDPR gets more attention and focus by CPaaS vendors.2018 – The Year CPaaS Vendors Differentiated
In the past few years, we’ve seen CPaaS vendors struggling in two directions:
That second point is important. Up until recently, CPaaS equated to running one or two data centers (or the equivalent of running from a small number of cloud based data centers), connecting developers via REST APIs to the telecom backend. With the introduction of IP based communications (and WebRTC), the was a growing need for client side SDKs along with more points of presence closer to the end user.
We seem to be past that hurdle for most CPaaS vendors. Most of them have grown their footprint to include a global infrastructure.
The next frontier is going to happen elsewhere:
CPaaS will move in rapid pace in the next few years. Vendors who won’t invest and grow their offerings and business will not stay with us for long.
Planning on selecting a CPaaS vendor? Check out this shortlist of CPaaS vendor selection metrics:
Get the shortlist
adapter.js is the glue that sticks your code to the different browser implementations of WebRTC.
This article was co-written with Philipp Hancke. He has been the driving force behind adapter.js in the last two years, so it seemed like the best approach to have him contribute large portions of it. You can follow his writing here.
One of the visuals I created when I started out with WebRTC was this one:
It had several incarnations, and the main concept here is to show how WebRTC is different than traditional VoIP.
With traditional VoIP, you have multiple vendors implementing the specification, in hopes (as well as active interoperability testing) that the implementations will work in front of each other. If you knew one VoIP implementation, it said nothing about your ability to be able to yield another.
WebRTC was different. It brought to the table the concept of free, but also HTML5; and by that, I mean having a single API that every developer can use to add interactive voice and video to his application.
getUserMedia, PeerConnection and the data channel are all APIs specified in WebRTC. We’re now all speaking the same language when we’re implementing applications. And that, in turn, creates an ecosystem around it. One that was never there with such force with traditional VoIP.
Problem is, you can think of the WebRTC API as a suggestion only. That’s because today, version 1.0 of the specification isn’t yet a reality. We’ve got a candidate for it, but that says nothing about the implementations. Browser implementations of WebRTC are more like dialects of the same language. When you speak one, you understand another, but not fully. Not its nuances. And bad things can happen if two people with different dialects try to talk to each other without patience or understanding.
Which is probably where adapter.js comes into play.
Before we ask ourselves if adapter.js is needed today (it is), it would be worthwhile to understand how it came to be.adapter.js Origin Story
adapter.js has been around since the early days of WebRTC in late 2012 and early 2013. It was originally part of Google’s apprtc sample application. The original version can still be found in the Chrome tree. It was a very small project, less than 150 lines. The main job was to hide prefix differences like webkitRTCPeerConnection and mozRTCPeerConnection and to provide helper functions to attach a MediaStream to an HTML <audio> or <video> element.
During those wild west days of WebRTC, everyone wrote their own library to make WebRTC easier. This started to change in mid-2015 when Microsoft Edge came along. While Edge did not require prefixes for getUserMedia, attaching the MediaStream to a video element still worked in three different ways in as many implementations. This showed that there was a need to move to standardized behaviour. Also, as Microsoft’s Bernard Aboba pointed out, books were printed that showed the prefixed versions of the APIs — which is the wrong thing to teach.
Preferring ORTC over the WebRTC 1.0 API, Microsoft was extremely happy to support the addition of a shim of the RTCPeerConnection API on top of ORTC. This enabled early interoperability tests and allowed ironing out some bugs before the first public ORTC-enabled Edge version.
— Justin Uberti (@juberti) April 4, 2016
When Safari started shipping WebRTC they contributed a shim for the “legacy” bits of the WebRTC API that they did not want to ship. This was an interesting attempt to get developers to write modern, promise-based WebRTC code. However, it does not seem to have worked out as sadly the release version shipped with the legacy API is enabled by default.
With growing complexity (currently over 2,200 lines of code) and being in the “hot path”, testing of changes to the adapter.s code itself became more of an issue. Initially powered by Selenium the tests have been split up into unit tests and end-to-end tests that use standard testing tools like karma, mocha and chai to make assertions while running in a multitude of browsers on Travis-CI for every pull request and compare the results to previous runs. This shows the state of the art for testing WebRTC libraries and has been adopted by other projects as well.
For a quick and dirty project you can simply include https://webrtc.github.io/adapter/adapter-latest.js in your code.
This will give you the latest published version. Note however that your application will automatically pull any changes so this is not recommended for larger applications.
Since it is a polyfill, it transparently modifies the window object by default. The adapter object gives you information about the browser variant and version it detected in the browserDetails object:console.log(adapter.browserDetails.browser); console.log(adapter.browserDetails.version);
This is slightly different from a version detection library like platform as it treats Chromium-based browsers like Opera as Chrome — since they run the same WebRTC engine that makes sense.
You can use the detected browser and version to add your own logic for working around bugs present in certain Chrome versions (e.g. the Chrome 61/Android video freeze or the Chrome 58 TURN/TCP issue).
To check WebRTC support you will need to check that RTCPeerConnection is defined:!!window.RTCPeerConnection
and, if your use-case requires it, getUserMedia!!(navigator.mediaDevices && navigator.mediaDevices.getUserMedia)
or the createDataChannel method of the RTCPeerConnection:‘createDataChannel’ in RTCPeerConnection.prototype
After that you can simply write your WebRTC code as shown in the specification:
The official WebRTC samples are a great way to get started as they show a lot of use-cases and the maintainers ensure that they are semantically correct. Most of the shims are written in such a way that they will not become inactive when the native variant is available.Moving Forward
There are 4 forces at play with adapter.js:
Will a day come when we no longer need adapter.js?
But don’t wait up for it.
If the lifespan of jQuery is any indication (11 years and still going strong, with the last 4 of them with articles on why we don’t need jquery), we will be using adapter.js for many years to come.
As the year 2017 comes to an end, there was a small present. Hangouts started to support Firefox with WebRTC instead of a plug-in. While it had been public for a while that the Firefox WebRTC team had been testing this, it was a nice Christmas present to see this shipped. Tsahi Levent-Levi was one […]
The post All I want for Christmas is Hangouts to use WebRTC on Firefox appeared first on webrtcHacks.
WebRTC is… everywhere.
WebRTC started some 6 years ago. It was just another VoIP protocol specification that just happened to be targeted at browsers.
Six years in, and now WebRTC is everywhere. There are still those who believe it has failed, or haven’t lived up to its expectations. I’d say it is the vendors who failed to adopt it are the ones that have failed.
How do I know?
It has to do with those that are using it. Here are 10 massive applications that are making use of WebRTC. These companies trust WebRTC to offer them the leverage they need to deliver the user experience they strive for.
Looking for more vendors using WebRTC? Here are 10 interviews with inspiring vendors using WebRTC.
Download the eBookWhat’s Massive in WebRTC Land?
Before we start though, I want to say a word about what massive is.
It is really hard to know what’s massive. How do you count it? Especially when none of the vendors are willing to share their numbers in meaningful ways here.
So let’s do a back-of-the-napkin kind of calculation here for a sec –
In the recent Kranky Geek event, Google shared in their session an interesting statistic:
Over 1.5 billion of weekly audio/video minutes.
That’s easily upwards of 214 million minutes a day.
And that’s only on Chrome.
This number does include:
So the numbers are larger. Much larger.The Google Machine and its Leftovers
Back to that more than 214 million minutes a day.
During March 2017, Serge Lachapelle, the person in charge of WebRTC in the past and now of Google Hangouts and Meet, shared some numbers about video conferencing at Google during Google Cloud Next 2017:
9+ years daily translates to over 4.7 + minutes daily.
That’s the amount of use Google makes internally of Hangouts.
It is safe to assume that external use of non-Googlers can double that number with little effort to over 9 million minutes a day.
And continuing this lenient calculation, Hangouts accounts for 4-5% of all voice and video traffic in WebRTC.
Consider here fact that I counted Hangouts over multiple devices, browsers and applications while comparing it to Chrome only numbers, so I am fudging here a bit. On the other hand, I took non-Googlers to account for only half the usage, which is probably way too little.
Anyways, let’s look at them 10 massive applications who are already using WebRTC.1. Google Meet and Google Hangouts
9+ years daily. Inside Google alone.
Google Meet (or more accurately, Hangouts) is most probably one of the main reasons we have WebRTC.
Google had their own video conferencing service, working from Gmail, but it needed a plugin. Real time video just wasn’t there in the browser, which is where and why WebRTC started. And it started with a contribution by Google which we now know as webrtc.org.
To date, Google Meet (or Hangouts), is a massive application that makes use of WebRTC.2. Facebook Messenger
Here’s something I wrote some 5 years ago. It is about Skype vs Facebook. Here’s how I phrased it then:
Facebook can adopt WebRTC and provide a calling experience that surpasses most VoIP players.
The rest of the analysis then is kinda funny. Facebook did end up adopting WebRTC wholeheartedly into Messenger, but none of my suggestions were implemented (which in hindsight was probably for the best).
Here’s where Facebook have integrated WebRTC already:
All using WebRTC. I am even ignoring WhatsApp here (not sure what parts of WebRTC they use exactly).
At the recent Kranky Geek, we had Li-Tal Mashiach of Facebook talk about what it is they are doing with WebRTC and how do they scale their service.
No minutes here, but 400 million people using WebRTC every month. That’s 13+ million people a day on average. With only a minute each this is already massive.3. Discord
I came across Discord and its use of WebRTC in July 2016. That’s when I added them to my dataset, through a message I saw on Facebook somewhere. As any other vendor that gets into my radar, I continued to follow them closely.
Discord is a social platform for gamers (for lack of a better term). They have been around for only 2.5 years. This month, they shared a few numbers. Specifically:
Nothing here about voice and video, but I do know that the numbers here are impressive.4. Amazon Chime
Amazon Chime is new to the scene of unified communications and already big.
Chime started as an acquisition only a year ago of a company called Biba. It was probably already well underway to become a replacement for Amazon’s own internal video conferencing services. At Amazon’s re:invent event last month, Amazon shared a few numbers of how they use Chime internally:
24.8 million minutes a month. That’s almost a million minutes a day. From Amazon’s internal meetings only. Not including any of their Chime customers.
Not as massive as the others, but still quite large.
One thing to note – this isn’t “pure” WebRTC. Amazon took the approach of supporting legacy video conferencing systems first, so they “did” something to WebRTC to make it work. Their roadmap for next year is to add direct browser for users as well. What we do know, is that this uses WebRTC technologies inside today already.
Oh… and I didn’t even mention Amazon Connect, Alexa and Mayday – all making use of WebRTC.5. Houseparty
Houseparty is huge. Especially if you’re a teen. My daughter will probably start using it in a few years… once she grows out of Whatsapp and Musical.ly. Or so I’ve been told.
Houseparty makes use of WebRTC, although it is a mobile only service.
There’s not much numbers going on about Houseparty this year, so I’ll stick to the ones we know from a year ago.
20 million minutes a day.
Enough said.6. Appear.in
Appear.in started as a summer internship project at Telenor Digital somewhere, growing up to this point in time. Today it got acquired by Videonor.
The service is a favorite of many in the WebRTC community (and elsewhere – they are doing million of minutes a day).
If you haven’t tried it yet, then you should: appear.in
And yes. It is in the league of the other vendors here when it comes to size.7. Gotomeeting
There are many traditional VoIP (interesting that VoIP can now be considered traditional) that have started adding WebRTC to their offerings.
Most can probably make it into this list of massive applications.
Out of them, I decided to choose GoToMeeting. Why? Because the integration they’ve done was quite a natural one. I’ve been using it for well over a year now whenever someone invited me into a meeting over GoToMeeting – in most cases, they weren’t even aware of the browser option.8. Peer5
I wanted to add a company that doesn’t do voice and video. Or rather ones that are making use of the WebRTC’s data channel.
The one I picked here was Peer5. It was the easiest for me to get numbers from (I am an advisor there).
The P2P CDN scene is getting quite interesting lately. Alongside the startups like Peer5 that are pushing the envelope we now see companies like Akamai who stated publicly that they are headed this way with WebRTC as well.
In this year’s Kranky Geek event, Hadar Weiss, Co-founder and CEO of Peer5, shared a few of their numbers:
1 billion connections a day is large. Compared to millions of minutes a day. But we have to remember – a lot of these connections are short-lived in nature (viewers reaching out to peers they might stream data from or to) and that the more interesting number, which isn’t publicly available yet, is about actual data traffic.9. CPaaS vendors
CPaaS vendors drive this industry forwards. They do so for the smaller vendors as well as the largest ones.
In 2016, Twilio claimed to process “more than a billion minutes of WebRTC calls made through Twilio” as part of their launch of Voice Insights.
TokBox has stated this year that they power social video apps including Monkey, Houseparty, Fam and live.ly.
And they are not alone with it. There are 20+ such vendors catering to the needs of other developers.
Some of the CPaaS vendors can definitely be considered massive when it comes to the WebRTC traffic they generate.10. Back to you now
I most definitely forgot a vendor or two here.
Scroll down and comment below with your 10th candidate for the massive application using WebRTC.WebRTC is Still Miniscule
Let’s look at some other engagement metrics out there.
Netflix shared their numbers for the year this month:
Netflix members around the world watched more than 140 million hours per day
Hours. Not minutes. In minutes? That’s 8.4 billion minutes a day. For a single vendor. Compared to WebRTC’s 214 million minutes a day on Chrome.
I’d say WebRTC has room to grow.
Here’s for a bigger 2018.
Looking for more vendors using WebRTC? Here are 10 interviews with inspiring vendors using WebRTC.
Download the eBook
Are you doing your WebRTC pricing per minute? per gigabyte? per device?
You’re a developer. You decide it is time to build an application. But you don’t really want to do everything from scratch. Hell – you don’t even want to maintain and update all of that media backend – what do you really know about video? So you go look for someone to do it for you, finding a nice set of vendors offering WebRTC PaaS services. You can easily plug into their SDK and in no time have your service do group calling.
You probably won’t be conquering the world as the next Whatsapp with such an approach, but getting that healthcare service up and running an education application or a visual contact center is now within easy reach.
And you won’t be alone in this either. About a third of the dataset of vendors using WebRTC that I am tracking is using third parties. Most of them use managed services.
But here comes the question. Do you know how much you’re going to pay for that WebRTC PaaS service?
I get requests to assist in vendor selection on a weekly basis. This has been going for a few years now. This year, one of the main focus areas in this process has been pricing. Or more accurately, understanding the pricing schemes or the different vendors, and comparing the costs of these vendors.
There’s no easy way to get that done…
Let’s review the 3 leading pricing parameters are going to dictate your costs:Minutes
This one may seem easy.
You are going to pay for the number of minutes you use in a service.
It should be easy to calculate. Easy to understand the value (the more you use the more you pay).
But somehow, people translate minutes to the “old” days of telecom, where you paid top dollars to make phone calls. By the minute of course.
The devil is in the details here.
Here are few differences you’ll see between vendors.
The great thing about minutes? They are easy to comprehend and count.
If you have 10 people in a call for 10 minutes – that’s 100 minutes (assuming we count per device here).
The downside is that with minutes, there’s usually less regard to what is done in that minute. A video minute is the same as a voice minute on most platforms when it comes to pricing. And a low resolution video minute is the same as a high resolution video minute.Subscriptions
Subscriptions is related to minutes, and deals with the question of what it is you count the minutes against?
The two most common practices here is to count devices or count subscriptions.
Some of the WebRTC PaaS services work off the notion of a publish subscribe mechanism. Devices can publish media streams into a session, and devices can subscribe to media streams from the session. This is an elegant approach that can nicely be used when describing a complex scenario with asymmetric behaviors.
In an SFU group video call model, where each user publishes his own media streams and subscribes to the media streams of all other participants, the number of subscriptions grows at a polynomial rate: with N active users in a session, you’ll be counting N*(N-1) subscribed media streams.
In WebRTC PaaS, paying per subscribed minutes tends to be cheaper than paying per device minutes for lower group sizes (and vice versa)
Click To Tweet
It makes sense for a vendor to apply a per-subscription price as in many cases, his own costs are probably tightly coupled with the number of media subscriptions in the system.
Subscriptions are slightly harder to count than devices, but it is still gives you a solid number and an easy estimate.Bandwidth
The main complaint about per minute pricing is that it is a reminder of the old telecom days. The notion was that once we go for VoIP, cloud, web, WebRTC or whatever you want to call it, you can price it closer to the usage and not stay at the high level of a minute concept.
If it was limited only to the difference between audio and video then so be it. Give two price points per minute and you’re done. But video is different. It becomes more of a hassle with video. You can probably get video going with as little as 300kbps with 10-20mbps being applicable to 4K video resolutions. That’s not including things like 360 videos and other crazy trends like 8K or 10K resolutions that were just added to the HDMI spec.
So vendors are now looking into taking the route that is so common in IaaS – pricing per bandwidth processed.
Usually, that would be subscribed bandwidth. The reason for that is that cloud services usually cost the vendor based on the bandwidth he sends to browsers and mobile devices and not for bandwidth it receives on its cloud servers.
Here are a few quick things to validate in this price schemes:
Note that if you’re doing peer-to-peer sessions (that means doing a 1-on-1 session where you don’t want media to go through the vendor’s servers), you won’t be paying for bandwidth at all – unless the media gets relayed via TURN. TURN relay depends on network conditions and can’t be estimated properly (highly reliant on your users), but a rule of thumb of 15-20% of the sessions is usually used here.
Paying per bandwidth will tend to be cheaper than by minute. The reason is that the end result will be tailored to your exact usage pattern. That said, there are several downsides here:
Going for this IaaS type of a model is a great way to lower price points for customers, but at the same time it is a great way of dealing them with a huge headache.
At testRTC, I’ve been trying for some time now with my colleagues there to estimate what are costs are/should be. How much will we end up paying for our IaaS vendors every month? It is so hard, that I usually can’t even understand the detailed invoices we receive at the end of each month. I fear that the same is/will occur with per bandwidth pricing in WebRTC PaaS.Where Do We Go From Here?
In the latest update to my WebRTC PaaS report I’ve included a new appendix explaining pricing models in this space.
But the coolest thing yet was the inclusion of a new tool – a price calculator.
It is probably the 4th or 5th that I’ve created in 2017, each with its own nuances, target use cases and complexities.
This one was meant to be as generic and as simple as possible.
You enter the expected number of sessions you plan to have on a monthly basis, the number of users and the bandwidth per stream (there are a few suggested values in there).
Then you enter the pricing model and the price points of the vendors you want to compare, and the result will be the expected monthly cost you’ll have for each vendor.
Need something a bit more tailored? Reach out to me and I’ll help you out.
This latest update of my WebRTC PaaS report brings with it new vendors as well as a new price calculator.
It is becoming a ritual. Every 8 months or so I update the WebRTC PaaS (or CPaaS) report.
Every time I am surprised by the changes that occur. They come in 4 different areas:
How did we do since last time?New Vendors Covered ECLWebRTC by NTT Communications
I’ve been watching the work done by NTT Communications for quite some time. It started as a project that has signaling capabilities in it. At the time, they called it SkyWay.
Later on, they developed and added an SFU into the mix.
In September 2017 they decided to open up their platform globally. That’s the point where it made sense to add them to the report.Phenix
Phenix has been an enigma to me in the past two years.
From afar, it looked like a vendor trying to go after the broadcast market with a low latency technology based on WebRTC. Recently they approached me to explain what it is that they do and to check if it fits into this report.
And it did.
Phenix is focused on the large scale interactive streaming sessions. Places where you want to pick one or a few broadcasters and have their interactions shared with a larger audience.Vendors Closing Doors
We had those as well.Tropo by Cisco
Acquisitions of a WebRTC CPaaS vendor is sometimes beneficial and sometimes terrible for its customers.
TokBox’ acquisition by Telefonica was a good thing.
Tropo’s acquisition by Cisco… not so much.
Two years after its acquisition, Tropo closed doors to new customers. The signs were out there, since the platform didn’t really evolve. The service is still up and running, but I don’t think Tropo customers are happy to be using Tropo right now, and I don’t think Tropo/Cisco are happy to be needing to serve these customers. A lose-lose situation here.
Cisco simply pivoted. They decided that Tropo was not the right strategy and wanted to double down on Cisco Spark APIs and developer ecosystem.forge by Xura
Forge is another sad story of our industry.
Starting life as Crocodile RCS, it has been acquired by Acision. Acision was acquired by Comverse. Which got rebranded to Xura. Which was taken off the market by Siris Capital.
Forge, and probably other assets of Xura were just collateral damage in this process.M&A and Pivots in WebRTC PaaS Apidaze acquired by VoIP Innovations
VoIP Innovations acquired Apidaze. This is a good signal for the platform’s health. Looking at the investment section of Apidaze’ 4-pager in my report shows the story:
A lot of the attention and focus was taken from Apidaze API platform and put towards Ottspot, a “slack business phone app”.
This acquisition by VoIP Innovations might mean a renewed focus on the Apidaze platform and the developers who use it.TrueVoice is now Voxeet
TrueVoice was added to the report earlier this year. At the time, Voxeet added it as another product offering. This time around, Voxeet is making the APIs the main product.
This caused the TrueVoice brand to be removed, and Voxeet to be the actual thing.
Building a platform for developers is an all consuming process. Larger companies might be able to cope with doing that in parallel to other activities, but the smaller vendors will struggle. The fact that Voxeet decided to pivot and focus on developers is a good sign.Putting it all in a Visual
Here’s what it means visually:
2 in. 2 out. A few minor changes elsewhere.
The report shows the transitions in this market since 2014.What’s in the report?
The report is quite long. It now contains 223 pages. This includes:
Want to get a sneak peak into the report? You can check out these two PDF resources:
As you can see, this time, TokBox were kind enough to sponsor their 4-pager of the report and have it publicly available.
Here’s what Badri Rajasekar, TokBox CTO had to say:
2017 has been a big year for WebRTC. In what many considered a very significant piece of the puzzle, Apple announced support for WebRTC in Safari, finally allowing developers to use WebRTC on any browser platform. At the same time, we’ve seen a surge in adoption of live video communications driven in part by consumer demand. BlogGeek.me’s evaluation of this market is a valuable read for those looking for snapshot of this year’s trends in WebRTC.
Check out TokBox 4-pager from the report. You can expect to see 19 other such detailed profiles of the other vendors that the report covers.Report Tools
The report doesn’t come only as a “standalone” PDF file. You can access to a few additional tools:
There’s a ton more in the report, and work I do with vendors in this space – those offering such services, looking to offer such services or want to use these services.
WebRTC API Platforms are different than the classic/legacy/common CPaaS.
As I am working on getting the final TBDs in my upcoming report update on Choosing a WebRTC API Platform, I wanted to share something that may seem obvious, but probably isn’t.
When talking about CPaaS, WebRTC brings with it something more than just accessibility from the browser.
Here’s the makeup of a CPaaS platform:
There’s backend telephony in there, built out of some VoIP server components, connected to the carriers to handle things like phone numbers and actual calling.
Developers connect to that backend via REST APIs, or some other form of scripting interface.
Latencies and wait times aren’t important for the most part, so the CPaaS vendor doesn’t need to be spread across the globe to provide the service. A couple of data centers for redundancy and some reduction in latencies is usually enough.
Here’s what a WebRTC API platform looks like:
There might or might not be REST APIs. they are important, but definitely aren’t the main way developers interact with the system. That’s done via the SDKs. The SDKs are wrappers around the REST APIs or some other interface (probably WebSocket based), allowing getting the actual media and processing it as part of the SDK – either in the browser or on a mobile device.
And then there’s the backend. Signaling and NAT traversal are rather mandatory. Without them, this won’t be a WebRTC API platform. In the majority of the cases, you’ll also have access to an SFU, allowing you to support group video calls. All that backend? Especially the media parts of NAT traversal and SFU? They have to be as close to the end user as possible, so these platforms often deploy globally, on all possible data centers of a cloud provider (think AWS or GCE) and sometimes running on multiple cloud providers to increase their reach.
The difference then?
There’s a challenge selling to developers. They tend to underestimate the effort involved. And they usually prefer building new shiny toys than polishing and maintaining something that’s working. This is made worse by the seemingly “easy” fashion by which you can get a WebRTC peer-to-peer call happen inside a browser between two tabs. It gives the impression that developing and running WebRTC at scale is trivial.
Especially when you compare it to connecting to a phone number and dialing it. Doing this via an API is easy. But how do you go about dialing out a number on your own without the assistance of CPaaS? Is there a really simple example of this? Not really. This requires more than just programming – the value here is the accessibility to the phone network, which is considered a royal ongoing headache. So it is easy to outsource and to understand its value.
Here’s how the thinking goes:
SDKs? Sure. We can write them.
Signaling? I found a project on github that looks popular enough.
NAT Traversal? Everyone’s already using coturn. Should be simple enough to get it up and running.
SFU? Just passing data around. Can be written in a weekend.
Will WebRTC API Platform vendors be able to overcome this challenge? How can this be explained to developers? There is a lot that goes into building such a platform. More than the mere initial technical hurdles.
Browsers are changing. There are now 4 of them that have “support” for WebRTC. That support is different between browsers. New browser versions break things that used to work before. The specification is being finalized now, but no browser supports it yet.
Media backends need to be maintained. Monitored. Updated. Secured. In an ongoing basis.
In the coming years we will see a shift from H.264 and VP8 video codecs to VP9, HEVC and/or AV1 video codecs. This will require additional investment in the infrastructure.
And still it is believed to be easy and simple.
It isn’t.Planning on Launching Your Own WebRTC API Platform?
If you are planning to launch your own WebRTC API Platform, then you should know what you’re up against.
In the past 4 years I’ve been looking at this market, analyzing it. Seeing it grow and mature. The report covers 20+ vendors offering WebRTC API Platforms. Most of the are active. A few died or got acquired and taken off market.
One of the things to note is how new WebRTC API Platform vendors make their decision to launch their service. What do they decide to include in their initial launch. What do they use as differentiating factors from the existing players.
The space is rather crowded already, even if no clear winner exists yet.
Make sure to do your homework here. Understand what you’re up against and why should developers come to you and not to others. And plan for the long run.Planning to Use a WebRTC API Platform?
If you are in the build vs buy decision point, then think of the alternative costs of each approach. Also figure out your time to market and each and the risk of failure. For new projects, I tend to suggest a platform instead of self development. It reduces risk and upfront costs, but more than that, it enables experimenting and proving the business before committing too much into the project.
If you decided to build on your own, make sure your reasoning is rock solid. If the only reason is cost, then I suggest you recalculate.
If you decided to buy into a platform instead, then pick a platform that fits your need. But make sure it is here to stay as much as you can – this market is dynamic and is bound to stay that way for a few more years.The Report Update
The updated report will get published later this week.
If you want to learn more about it, just contact me.
TensorFlow is one of the most popular Machine Learning frameworks out there – probably THE most popular one. One of the great things about TensorFlow is that many libraries are actively maintained and updated. One of my favorites is the TensorFlow Object Detection API. The Tensorflow Object Detection API classifies and provides the location of multiple […]
The post Computer Vision on the Web with WebRTC and TensorFlow appeared first on webrtcHacks.
WebRTC Index has been around for 3 years now. Are you listed?
The idea behind it was quite simple. We create a place where someone can come and publish his company and its services – assuming they are related to WebRTC. The list grew, and now stands at 250 published vendors.
What we also did, was make sure the site is sustainable (there’s work to be done to keep it up to date). We chose the sponsorship approach:
Vendors can be listed freely in the index, but if you are a sponsor, then you get a bit of extra juice. You appear on the main page as a sponsor, get listed first on relevant search results, and get a few more ways to express what it is you offer on your own page.
What the WebRTC Index turned out into is a place to search for relevant vendors to assist people in understanding the industry and to pick up someone to work with.
And here comes my question to you?
Are you listed in the WebRTC Index?
Got check – http://webrtcindex.com/
I’ll sit and wait here. In the dark. Next to the nameless virtual machine that is hosting this website of mine.
Not there? Then read on…How can you join the WebRTC Index?
The system is easy and works as a manual process.
It really is that simple.
And it is a free process – no need to pay anything to join the list.
So why wait?