WiDS Conference: How Data Science Helps Do Business and Identify Rare Diseases
The main goal of the Women in Data Science (WiDS) international conference is to create a platform where successful women can exchange experience, build connections in the industry and strengthen their community. The idea originated at Stanford University, and the event now takes place in more than 150 locations around the world. On March 9, WiDS took place in St. Petersburg at the Higher School of Economics, with speakers from companies such as Yandex and Biocad. ITMO.NEWS wrote down the key points.
How Data Science helps develop businesses
Elena Chernysheva, Yandex
How do Yandex analysts operate? There’s no clear-cut answer to that: specialists’ activities differ depending on their department and the specifics of the end product. Yandex consists of various smaller companies, such as Yandex Eda and Yandex Taxi. All of them are separate entities that play by different rules. Many expect a company such as Yandex to have a common infrastructure, but that’s not really true. In five years, I’ve worked in several departments, and they all operated differently.
For example, my first job was at Yandex.Direct. This is a special service that collects data on organizations. Our staff first learn that an organization exists (let’s say, they have its address), and then we collect and add data from different sources. Let’s say that Yandex learns that there’s a Higher School of Economics at a particular address. Our call center calls them to find out what the organization does. A special algorithm is used to find the organization’s website. We also send a specialist who photographs the building. In the end, we get a record that combines data from various sources.
So, what did our team of analysts do? First of all, we developed metrics for various algorithms. For example, we spent a year looking for the right metric for clustering, so that we wouldn’t end up with records that duplicate each other.
Yandex.Search operates differently: they have over 2,000 programmers and analysts in the team. The latter often act as managers who develop new metrics and improve the existing ones.
So, what should a young specialist expect if they want to get a job at a major company such as ours? There is a good chance that it will already have departments that operate according to an established procedure, but that is not always the case. Heads of organizations often underestimate the value of data analysis. They believe that algorithms have to offer ready-made solutions, and expect them to work miracles.
My advice is to start your career at companies that already have an established work culture. This will give you the opportunity to see how everything is organized, and how much time is needed to get the first results. Let’s take Amazon as an example. Nowadays, they have 2.6 billion visits per month, and their visit-to-sale conversion rate is 9%. If the latter increases, it will earn them an extra 52 million dollars. Therefore, the company will benefit greatly from hiring more specialists in data science.
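The arithmetic behind a claim like this can be sketched as follows. The talk gives the number of visits and the current conversion rate, but not the size of the increase or the average order value, so `conversion_after` and `avg_order_value` below are hypothetical placeholders chosen only for illustration, and the function name is an assumption as well.

```python
def extra_monthly_revenue(visits, conversion_before, conversion_after, avg_order_value):
    """Estimate extra revenue from a change in conversion rate.

    visits            -- visits per month
    conversion_before -- share of visits that end in a sale (e.g. 0.09)
    conversion_after  -- conversion rate after the improvement
    avg_order_value   -- average revenue per sale (hypothetical input)
    """
    extra_orders = visits * (conversion_after - conversion_before)
    return extra_orders * avg_order_value


if __name__ == "__main__":
    # Visits and the 9% conversion come from the talk;
    # the uplift and the order value are made-up placeholders.
    estimate = extra_monthly_revenue(
        visits=2.6e9,
        conversion_before=0.09,
        conversion_after=0.091,   # hypothetical: +0.1 percentage point
        avg_order_value=20.0,     # hypothetical: 20 dollars per order
    )
    print(f"Estimated extra revenue: {estimate:,.0f} dollars per month")
```

The point of the sketch is the scaling: at billions of visits, even a fraction of a percentage point of conversion translates into a large sum, which is why small, specific improvements matter at this size.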
When working as an analyst for a major company, you’ll be doing very specific tasks that lead to small increases in profits and help improve the way services operate. For example, specialists from Yandex.Taxi have recently developed an algorithm that helps drivers get home earlier. An hour before the end of their shift, they notify the system, and it then assigns them routes that lie closer to their homes.
So on the one hand, you’ll immerse yourself in the technological aspects of your job, but on the other, you’ll be detached from the specifics of the business and the product.
Things are completely different at smaller companies. Your range of tasks will be a lot wider: business models, metrics, user research, more tasks on modeling and forecasting. You will be closer to the product, and you’ll be able to influence all important decisions made at the company.
On the whole, organizations often don’t really understand how to apply data science. The real value of data science lies not in reports with dashboards and the benefits of monitoring, but in tools for forecasting and modeling. It is these tools that help businesses work out how to attain their goals.
So, when you come to your new workplace, don’t let other people tell you what to do: their perception of data science might be outdated. It is you who has to explain to them how your tools can be of use.
How data analysis takes over companies
Ekaterina Gudkova, BIOCAD
Biocad is an international innovative full-cycle biotechnology company. This means that we conduct research, develop drugs, bring them to the market and solve logistics issues. The nature of our company sets particular requirements for our personnel and the way they work. First of all, we have a large staff of over 2,000 people, and more than half of them focus on drug design, searching for new protein compounds and the ways they interact, or forecasting their effects. Naturally, we have many modern laboratories and equipment that generate new data with every passing moment.
So what about our IT department? We help the rest of our staff with logistics and also focus on three key tasks. The first is computational biology. We use computers to model how molecules interact with the human body, which spares us the need to carry out multiple tests.
Our sales and marketing departments tell us about the market that we plan to release our products to. Then our researchers join in: they identify the disease and search for ways to cure it. For instance, oncology drugs are very hard to model: they are often rejected by the body, and this can be hard to predict. Once we have a possible solution, we have to choose a molecule that will neutralize or eliminate the disease. In the early stages, we don’t know which molecule it will be, but there’s a limited number of substances in the world. We model different options and look at how various protein compounds interact.
Our next task has to do with forecasting synthetic reactions. We have developed a kind of “digital chemist” that shows how reactions will proceed, how much they will cost and what the results will be. It’s been helping the company save money for years.
Eventually, we realized that a similar system can be used to help our other departments, as data analysis can be applied in each of them.
Dabbling in machine learning for no reason is not the way to go. This is why our first step was to perform pure data analysis with monitoring. We wanted to learn things about our production facilities, for example how fast our equipment goes out of commission, or how much time it takes to deliver raw materials from one department to another.
Our HR specialists also need specific metrics in order to answer an important question: how can we tell that an employee has burnt out and that we need to have a talk with them? For this reason, almost every company that does data analysis employs metrics based on general indicators: how a particular person works, who they are similar to, at what time they leave, etc.
Still, there’s one thing you always have to consider. When you start working with lots of dashboards, you run into a spam problem. You have numerous indicators, and you can’t spend your whole day looking at them. So, you either react to particular changes or collect data for analysis and come up with specific hypotheses later on.
We at Biocad are currently introducing a personal assistant. You see, our service department can’t keep up with our growth rates and the increase in personnel, so in order to provide our new employees with the necessary financial and legal information, we need bots that answer typical questions. In this regard, we also face the issue of providing our employees with essential content only: there’s lots of information, and we need to adjust it to every particular person.
Prospects of applying machine learning in the healthcare of the future
Varvara Tsurko, iCS RAS, IQVIA
Machine learning is making its way into all areas of our lives: speech recognition, automatic translation, image recognition, etc. And medicine is a science that heavily relies on data: a specialist makes a diagnosis and chooses a treatment based on the information available to them. In this sense, doctors can be compared to algorithms: they learn from patients’ medical histories. The more patients a doctor has had, the better their memory and skills become, and the more we trust them.
Still, there are things at which machines are even better: as opposed to humans, they can work with big data. For instance, a machine can analyze all the medical records that involve the same disease, and identify its traits.
For this very reason, we are now considering a new digital approach to healthcare. In theory, we can speak about four key trends. The first is related to leading a healthy lifestyle: different apps and devices that monitor our activity and offer feedback. For example, they can give advice concerning the proper time to go to sleep or monitor the number of calories we consume, and in the future, they’ll be able to identify stress levels based on our pulse, movements and voice.
The second trend has to do with various applications of machine learning, like monitoring the symptoms of patients who care about their health and don’t want to miss the first signs of danger. Biosensors can be built into smartphones and smart watches, read their users’ cardiograms and report any emerging problems. What’s more, if a person falls and a sensor detects that they don’t move for, say, 60 seconds, the device can automatically call for help. Classification of cardiograms, identification of anomalies and the like are all tasks for data analysis. Image recognition technologies have also made it possible to develop apps that store images of moles and report any deviations from the norm.
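The fall-and-no-movement rule described above can be sketched as a simple check over sensor readings. This is only an illustration under stated assumptions: the function name, the `(timestamp, is_moving)` reading format and the idea that a fall has already been detected upstream are all hypothetical, and real devices work with raw accelerometer signals rather than ready-made flags.

```python
from typing import Iterable, Optional, Tuple


def time_to_call_for_help(
    readings: Iterable[Tuple[float, bool]],
    still_threshold_s: float = 60.0,
) -> Optional[float]:
    """Return the timestamp at which help should be called, or None.

    readings          -- (timestamp_in_seconds, is_moving) pairs recorded
                         after a detected fall, in chronological order
    still_threshold_s -- how long the person must stay motionless
                         (60 seconds in the example from the talk)
    """
    still_since = None  # start of the current motionless stretch
    for timestamp, is_moving in readings:
        if is_moving:
            # Any movement resets the countdown.
            still_since = None
            continue
        if still_since is None:
            still_since = timestamp
        if timestamp - still_since >= still_threshold_s:
            return timestamp
    return None


if __name__ == "__main__":
    after_fall = [(0.0, False), (20.0, False), (40.0, False), (60.0, False)]
    print(time_to_call_for_help(after_fall))
```

Resetting the countdown on any movement keeps the rule conservative: the alert fires only after an uninterrupted motionless stretch of the chosen length.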
Monitoring of symptoms makes it possible to do diagnostics. This is the third trend: collecting and processing information that can be used by doctors. For example, bypass surgery for atherosclerosis is known for its relapses. Data analysis is already being used to help predict long-term outcomes of such surgeries. If the forecast is bad, the surgery can be postponed or better rehabilitation measures can be prescribed.
Machine learning can also be used to diagnose rare diseases. An algorithm trained on the medical records of a great number of patients can make a diagnosis better than a doctor would. The problem here is class imbalance: there are far more people with negative diagnoses than with positive ones. This can be addressed through data preprocessing (removing randomly selected objects of the larger class from the sample, or artificially generating objects of the smaller class that are close to real ones) or through an algorithmic approach (assigning higher penalties for classification mistakes made on the smaller class).
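The preprocessing and algorithmic remedies mentioned above can be sketched in plain Python. This is a minimal illustration, not the speaker’s method: all function names are hypothetical, the interpolation step is a simplified stand-in for SMOTE-style oversampling (which interpolates between nearest neighbours rather than random pairs), and the weights use the common inverse-frequency heuristic.

```python
import random
from collections import Counter


def undersample(samples, labels, seed=0):
    """Randomly drop objects of larger classes until every class
    has as many objects as the smallest one."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    n_min = min(len(xs) for xs in by_class.values())
    balanced = []
    for y, xs in by_class.items():
        balanced.extend((x, y) for x in rng.sample(xs, n_min))
    return balanced


def oversample_interpolate(minority, n_new, seed=0):
    """Generate n_new synthetic minority objects, each lying on the
    segment between two randomly chosen real minority objects."""
    if len(minority) < 2:
        raise ValueError("need at least two minority objects")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        t = rng.random()
        synthetic.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
    return synthetic


def inverse_frequency_weights(labels):
    """Per-class penalties: the rarer the class, the costlier a mistake on it."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


if __name__ == "__main__":
    # Toy data: 90 negative and 10 positive diagnoses (made-up features).
    labels = [0] * 90 + [1] * 10
    samples = [(float(i), float(i % 7)) for i in range(100)]
    print(Counter(y for _, y in undersample(samples, labels)))
    print(inverse_frequency_weights(labels))
```

Undersampling throws data away, oversampling risks overfitting to synthetic points, and class weights leave the data untouched but change the loss, so in practice the choice depends on how much data there is and which model is used.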
The fourth trend is about conducting medical research. Previously, you needed to contact some 50 medical centers in order to collect data on 10,000 patients. Nowadays, you can use an app that will do it on its own, store the data in the system and process the results. In 2015, the mPower app collected a record 10,000 medical records of patients with Parkinson’s disease, 93% of whom had never participated in any similar initiative.
So, why is it still people who treat us and not machines? There are several answers to this question:
Machines can’t use intuition. There’s lots of research available, but you have to be very careful when ordering additional examinations.
The data collection issue: there are a great many diseases, and you need various disease markers to classify them.
Ethical issues: is it always necessary to inform a patient about their illness? Can we really entrust important decisions to machines?
Issues of defining applicability: which cases are better diagnosed by machines, and which are best left to human specialists?