This was a brilliant talk @outreachdigit with Florian Douetteau @dataiku on how to build data teams and how to overcome the various technological and HR-related challenges they face.
He touched upon topics such as why artificial intelligence is crucial for working with internal data, how predictive analytics can work, how a deployment strategy can be useful, and how one can push new behaviours among users.
Classical businesses have a dedicated team to perform their data analytics. Nowadays, web companies recruit a specialised workforce. What happens to the data generated by every click or tweet? It has to be managed by people with really niche profiles, on career paths like data analyst, data scientist and data engineer.
Data analysts: They have a strong understanding of how to use existing tools to solve problems, with core competencies in programming, statistics, machine learning, data munging and data visualisation. They also have to present their data analysis effectively.
Rising alongside the relatively new technology of big data is the need for a new job title: the data scientist. A data scientist represents an evolution from the business or data analyst role. Good data scientists will not just address business problems; they will pick the right problems, the ones that have the most value to the organization. They look at the data from many angles, first determining what it means and classifying it, then recommending ways to apply it, which is the implementation.
The data scientist role has been described as “part analyst, part artist.”
Then the data engineers: A data engineer builds a robust, fault-tolerant data pipeline that cleans, transforms, and aggregates unorganized and messy data into databases or data sources. Data engineers are typically software engineers by trade. Rather than doing data analysis, they are responsible for compiling and installing database systems, writing complex queries, scaling to multiple machines, and putting disaster recovery systems into place.
Data engineers essentially lay the groundwork for a data analyst or data scientist to easily retrieve the needed data for their evaluations and experiments.
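To make the "cleans, transforms, and aggregates" idea a bit more concrete, here is a minimal sketch of such a pipeline in Python. It is only an illustration, not something shown in the talk: the event records, field names and the `time_per_user` table are all made up, and a real pipeline would add logging, retries and monitoring on top.

```python
import sqlite3
from collections import defaultdict

# Hypothetical raw click events; every field name and value is invented.
RAW_EVENTS = [
    {"user": " Alice ", "page": "/home", "seconds": "12"},
    {"user": "bob", "page": "/pricing", "seconds": "30"},
    {"user": "ALICE", "page": "/pricing", "seconds": "8"},
    {"user": "bob", "page": "/home", "seconds": "abc"},  # malformed duration
    {"user": "", "page": "/home", "seconds": "5"},       # missing user
]


def clean(events):
    """Normalise user names and drop records that cannot be used."""
    for event in events:
        user = event["user"].strip().lower()
        try:
            seconds = int(event["seconds"])
        except ValueError:
            continue  # skip records with an unparseable duration
        if user:
            yield {"user": user, "page": event["page"], "seconds": seconds}


def aggregate(events):
    """Sum the time spent per user."""
    totals = defaultdict(int)
    for event in events:
        totals[event["user"]] += event["seconds"]
    return dict(totals)


def load(totals, conn):
    """Store the aggregates where analysts can query them."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS time_per_user "
        "(username TEXT PRIMARY KEY, seconds INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO time_per_user VALUES (?, ?)",
        sorted(totals.items()),
    )
    conn.commit()


def run_pipeline(conn, events=RAW_EVENTS):
    """Clean, aggregate and load in one pass; returns the aggregates."""
    totals = aggregate(clean(events))
    load(totals, conn)
    return totals
```

Calling `run_pipeline(sqlite3.connect(":memory:"))` leaves a small table that an analyst or data scientist can query directly, without ever touching the messy raw events.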
So the profiles that work on different types of data usually have varying goals, such as analysing the behaviour of different segments. The main goal is to get the human team dynamics right. It is very important to hire the right workforce to manage the data.
Image via Data Science 101
Clickers and coders: Years ago more people were coding, on systems like MS-DOS, whereas people now have become clickers, especially millennials, who only know how to click and drag on touch screens.
Data issues: How is a startup supposed to get data inside its data lake? Which strategy should it adopt: the cicada, the spider or the fox? One of the data strategies in use is the cicada strategy, where a startup builds a new product using open data.
The cicada strategy relies on open data that is freely accessible. The cicada trusts open data, current or future, in order to provide his service.
This open data strategy can yield profitable results in the financial or transport markets; for example, startups can take merchandise transportation information and cross-reference it with information on cargo and market prices to provide highly relevant information to industry professionals.
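As a rough illustration of that kind of cross-referencing, the sketch below joins a list of shipments with a table of market prices to estimate the value of each cargo. The datasets, ports, commodities and prices are all invented for the example; real open data would come from whatever public sources a startup can find.

```python
# Hypothetical open datasets; every name and number here is invented.
SHIPMENTS = [
    {"port": "Rotterdam", "commodity": "wheat", "tonnes": 1000},
    {"port": "Le Havre", "commodity": "coffee", "tonnes": 200},
    {"port": "Rotterdam", "commodity": "cocoa", "tonnes": 50},
]

# Price per tonne by commodity; no cocoa price is published in this example.
PRICES_PER_TONNE = {"wheat": 250, "coffee": 4000}


def value_shipments(shipments, prices):
    """Attach an estimated value to every shipment that has a known price."""
    enriched = []
    for shipment in shipments:
        price = prices.get(shipment["commodity"])
        if price is None:
            continue  # no public price, so the shipment cannot be valued
        enriched.append({**shipment, "estimated_value": shipment["tonnes"] * price})
    return enriched


valued = value_shipments(SHIPMENTS, PRICES_PER_TONNE)
```

Only the wheat and coffee shipments end up in `valued`; the cocoa shipment is skipped because the example price table has no entry for it.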
The main drawback of the open data approach is the limited scope of open data. Indeed, for both ethical and economic reasons (which come together for once), open data is lacking when you are looking to learn specific things about a person, a product or an address. In any case, the most useful things are private (fortunately) and paid (unfortunately).
The spider strategy is a meticulous network of web trackers. The spider weaves a network of points where data can be captured in every possible manner, sometimes starting with the smallest sources and then gradually looking for bigger ones. The spider will manufacture all the access points and all the connectors, allowing each player to provide him with its data and use his service.
Most online marketers take this approach: this means having your “tracker” (component for capturing traffic from a third party site) all over the web, so as to have the most data and the largest network possible.
The fox strategy is a very good strategy for startups.
The fox seeks out “Big Data” where it is: in large businesses, where “Big Data” is well fed! With this strategy, a startup takes charge of a critical problem for a business group and then sells the resulting model to other companies. It builds its own solution inside projects for which it gets funding to solve critical situations. Just like a fox, it starts by suggesting a possible solution to a problem (e.g., reducing your fraud, improving your ad buy costs, increasing the performance of your email marketing programs, optimizing the cost of raw material purchases, etc.).
The knowledge obtained from this first customer is then used to solve the problems of other customers.
The fox has a difficult life, because in order for his first approach to succeed, he must make others believe he can solve a problem that he’s never solved before! To do this, he must stir the desires of the powerful (charming the big bosses of the group) and flaunt his power.
Product issues: What is big data really about? And eventually, what are they willing to do with this bunch of data? We live in an age of distributed intelligence, like iCloud. Product teams must keep their focus on the data. With a high amount of artificial intelligence permeating sales and marketing, where automated emails, newsletters and chat replies are sent to customers, extensive regulations have to be considered. The same applies to analysing the continuous behaviour of prospects on social networks, where CRM systems can joke and make decisions. There could also be issues of data breaches, privacy and legitimacy. Companies like Amazon and LinkedIn use highly data-smart software.