It is All About Data!
Data Science, AI, Consulting, Business.
Data is everywhere. Look around and you will find machines connected to the internet and devices that record voices, store notes, exchange messages, and stream live video and webpages. Each of these is a source of information, and each implicitly tells a story waiting to be discovered.
But how is data represented?
At its core, data is simply “numbers”. Yes, data is collected as sequences of numbers, whether integers or floats. For instance, an image is a set of pixels, and each pixel is represented by numbers giving its grayscale intensity or its RGB values. Text is made of words, characters, and symbols, which are encoded either as discrete codes or as continuous values, but in the end they too are numbers. The same applies to videos, webpages, audio, speech, hyperlinks, etc.
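As a small illustration of the idea that images and text are numbers underneath, here is a minimal Python sketch. The pixel values and the sample string are made up for the example, and UTF-8 is assumed as the text encoding; real images and documents are far larger, but the principle is the same.

```python
# A tiny 2x3 grayscale "image" (hypothetical values): each pixel is an
# intensity from 0 (black) to 255 (white).
gray_image = [
    [0, 128, 255],
    [64, 192, 32],
]

# In RGB, each pixel is a (red, green, blue) triple instead of one number.
rgb_pixel = (255, 0, 0)  # pure red

# Text: each character maps to a Unicode code point, which is then
# stored as bytes (UTF-8 is assumed here).
text = "Data"
code_points = [ord(ch) for ch in text]
utf8_bytes = list(text.encode("utf-8"))

print(code_points)  # [68, 97, 116, 97]
print(utf8_bytes)   # [68, 97, 116, 97]
```

For plain ASCII characters like these, the code points and the UTF-8 bytes coincide; for other characters a single code point can take several bytes.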
But when it comes to numbers, we should simply think about the origin of various disciplines that deal with numbers - Mathematics, Arithmetic, Algebra, Statistics, Probability, and Information theory.
These areas of study mostly existed to define a mapping between sets of numbers, to understand the logic of combining sets and characterizing new information, or to recognize new patterns from a particular order of numbers.
Over thousands of years, these disciplines have developed into numerous computational formulas and equations that make numbers yield theorems, axioms, knowledge, and finally decisions.
So how do we get insight from data?
Any kind of data has a natural composition, a logical order, or an arrangement that describes it. An image of an object against a background should contain a recognizable set of pixels grouped to highlight the object in question, and that representation should convey its shape, color, location, etc. Natural language likewise produces words that carry meaning, semantics, and sometimes sentiment. A sentence is a defined sequence of words, and a word is a defined sequence of characters. Hence, language would mean nothing without this natural characteristic of “order”.
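The role of order can be seen in a short Python sketch. The example words and sentences are chosen only for illustration: they contain exactly the same characters or words, yet they are different pieces of data because the order differs.

```python
# Two words built from the same characters, in a different order.
word_a = "listen"
word_b = "silent"

same_characters = sorted(word_a) == sorted(word_b)  # same building blocks
same_word = word_a == word_b                        # same sequence?

# Two sentences built from the same words, in a different order.
sentence_a = "dog bites man".split()
sentence_b = "man bites dog".split()

same_words = sorted(sentence_a) == sorted(sentence_b)
same_sentence = sentence_a == sentence_b

print(same_characters, same_word)   # True False
print(same_words, same_sentence)    # True False
```

In both cases the unordered contents match while the ordered sequences do not, which is exactly the information a model has to pick up on.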
Hence, to extract knowledge from data, the key is to uncover the secret behind its representation. Since data is modeled with numbers and data follows some kind of pattern, we simply need to link the two ideas and use the disciplines of numbers to extract the pattern and work out the story the data is telling - how thoughtful is that!
That is how Machine Learning was conceived! It made sense to use Mathematics and related disciplines to understand the data that is everywhere, which has led to knowledge discovery, early detection and prevention, prediction and forecasting, pattern recognition, etc.
And how is data serving the AI world?
With data being omnipresent and interest in collecting the information it provides running high, applying Artificial Intelligence (AI) is more exciting now than ever. The huge amount of data that people exchange, store, publish, post, or record is growing exponentially, which allows AI and machine learning models to grow in scale and capability. This has resulted in a new generation of models with billions of parameters, trained on enormous numbers of data instances to learn almost everything. Machines now understand languages, summarize documents, understand complex speech, solve arithmetic assignments, and generate new music and new images, both conditionally and unconditionally. The applications seem to have no limit: recently, large models trained on millions of image-text pairs have been able to generate new artistic pictures from an imaginary description, and even realistic images of things that never existed from nonsensical text (check DALL-E, DALL-E 2, GLIDE, Imagen, etc.). These impressive results all come from a deep understanding and analysis of the data.
Data is precious, and the more of it we have, the more we can explore what computers can do. And do not forget: it all comes, after all, from the magic of numbers!