Interest in building software is growing with the move towards online businesses. However, each business typically needs customised software, and building it is time-consuming. Accordingly, several models and concepts have emerged to create software faster. One of the major topics is software reuse, the process of utilising existing components to build new software. As the number of online services in public resources and service registries grows, software reuse has captured the attention of engineers. These online services include a wide range of software, applications, and cloud technologies that use standard protocols to communicate and exchange data. Moreover, automating developer tasks can make it easier to compose the available online services. Automation covers the concepts and techniques applied to developer tasks to increase productivity and reduce human error.
In this context, the SmartCLIDE toolkit aims to bring most service development tasks together and add automation techniques to them. This automation includes rule-based and AI-based approaches, which are exposed as functionality to help developers.
These functionalities allow developers to invest more time in their domain problem and business logic rather than in manually identifying suitable services and developing them.
This blog post introduces the SmartCLIDE DLE models, which belong to intelligent software development, that is, the application of intelligent techniques to software development. The proposed models aim to give a learning algorithm the information that the data carries. SmartCLIDE data includes the internal data history and external online web services identified from online resources. The following figure shows the big picture of SmartCLIDE external service identification.
A combination of existing benchmark datasets and collected data enables SmartCLIDE to implement a range of intelligent learning models. The learning models embedded in SmartCLIDE seek to improve the main service development tasks, which are:
- Identifying system requirements
- Finding and discovering service registries and providing a pool of services
- Classifying the discovered services to identify a list of candidate services with the same functionality for particular tasks
- Ranking selected services with the same functionality
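To make the last two tasks concrete, here is a minimal sketch of filtering a pool of services down to candidates for a requirement and then ranking them. It is not the SmartCLIDE implementation: it swaps in a plain bag-of-words cosine similarity for the learned models, and the service names, descriptions, stop-word list, and `threshold` value are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "to", "and", "of", "for"}

def tokenize(text):
    """Lowercase a description and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_services(requirement, services, threshold=0.1):
    """Keep services whose description matches the requirement, best match first."""
    query = Counter(tokenize(requirement))
    scored = [(name, cosine(query, Counter(tokenize(desc))))
              for name, desc in services.items()]
    candidates = [(name, score) for name, score in scored if score >= threshold]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)

# Hypothetical pool of discovered services (name -> description).
SERVICES = {
    "geo-lookup": "Convert a street address to latitude and longitude coordinates",
    "mail-sender": "Send transactional email messages to customers",
    "sms-gateway": "Send text messages to mobile phone numbers",
}

ranked = rank_services("send email to a customer", SERVICES)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

In this toy pool the geolocation service shares no terms with the requirement and is filtered out, while the two messaging services remain as ranked candidates. A learned text representation would replace `tokenize` and `cosine` in a more realistic setup.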
SmartCLIDE DLE functionality has been embedded to improve automation in the tasks mentioned above. The selected AI approaches have demonstrated good performance in software intelligence. Language modelling based on sequence-to-sequence models, recommender systems, learning from existing code, and source code analysis are a few examples.
Moreover, the type of data collected, namely service metadata and related text, directs us mostly towards text-processing techniques, including deep learning methods such as encoder-decoder models. These models have driven rapid progress in the field. Therefore, SmartCLIDE takes advantage of transfer learning and uses pre-trained models. The following table lists some popular deep learning models in the software intelligence literature.
|Model|Year|Description|
|---|---|---|
|OpenAI’s GPT-3|2020|GPT-3 is the successor to GPT-2. With 175 billion parameters, it is more than 100 times the size of its predecessor.|
|EleutherAI’s GPT-Neo|2021|In September 2020, Microsoft announced that it had licensed “exclusive” use of GPT-3; others can still use the public API to receive output, but only Microsoft has control of the source code. The goal of GPT-Neo is to replicate a GPT-3-sized model and release it to the public as open source.|
|OpenAI’s GPT-2|2019|The model is trained on a dataset of 8 million web pages. The GPT-2 base model has 117M parameters, GPT-2 medium has 345M, and GPT-2 large has 774M.|
|BERT|2019|BERT (Bidirectional Encoder Representations from Transformers) was published in 2018, and in October 2019 Google announced that it was applying BERT in its search engine. The significant feature of BERT is its bidirectional Transformer encoder, which uses self-attention to learn long-range dependencies. The BERT base model has 110M parameters, whereas BERT large has 340M.|
|DistilBERT|2020|A distilled version of BERT, introduced to reduce resource usage while preserving most of the performance. It is 40% smaller and runs 60% faster than BERT.|
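As a rough guide to what these parameter counts mean for storage, the following sketch estimates the size of the model weights alone. It assumes 32-bit floating-point weights (4 bytes per parameter) and ignores activations, optimiser state, and file-format overhead, so actual resource usage will be higher. The dictionary simply restates the counts quoted in the table above.

```python
# Parameter counts as quoted in the table above.
PARAMS = {
    "GPT-2 base": 117_000_000,
    "GPT-2 medium": 345_000_000,
    "GPT-2 large": 774_000_000,
    "BERT base": 110_000_000,
    "BERT large": 340_000_000,
}

BYTES_PER_PARAM_FP32 = 4  # assumption: weights stored as 32-bit floats

def weight_size_mb(num_params, bytes_per_param=BYTES_PER_PARAM_FP32):
    """Approximate size of the weights in megabytes (10**6 bytes)."""
    return num_params * bytes_per_param / 1_000_000

for name, count in PARAMS.items():
    print(f"{name}: about {weight_size_mb(count):,.0f} MB of weights")
```

Storing weights in 16-bit precision would halve these figures, which can be tried by passing `bytes_per_param=2`.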
In summary, to increase productivity with real-world data, some of the AI-based models in SmartCLIDE use pre-trained language models such as BERT and GPT-2. These models have been trained on enormous amounts of data from the internet and have demonstrated acceptable results in both the research and industrial communities. Yet the individual classifiers are still chosen based on the size of the available data. The best practice for training is to use customised local data; nevertheless, in the beginning, we used benchmark datasets from academic work on software intelligence along with available open source data. Training time, resource consumption, data storage capacity, and real-time interface interaction are other factors that the DLE has to deal with when designing and implementing learning models.
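The general pattern behind transfer learning with a small local dataset can be sketched as a frozen pre-trained encoder with a small trainable classifier on top. The example below is only an illustration of that pattern, not the SmartCLIDE code: a tiny hand-written vector table stands in for a real pre-trained model such as BERT, and the vectors, labels, training texts, and hyperparameters are all hypothetical.

```python
import math
import re

# Toy stand-in for a pre-trained encoder: fixed word vectors, never updated.
PRETRAINED_VECTORS = {
    "email": (1.0, 0.0), "mail": (0.9, 0.1), "message": (0.8, 0.2),
    "map": (0.0, 1.0), "location": (0.0, 1.0),
    "address": (0.1, 0.9), "route": (0.1, 0.9),
}
DIM = 2

def frozen_encoder(text):
    """Average the fixed vectors of known words; unknown words are skipped."""
    known = [PRETRAINED_VECTORS[t] for t in re.findall(r"[a-z]+", text.lower())
             if t in PRETRAINED_VECTORS]
    if not known:
        return [0.0] * DIM
    return [sum(v[i] for v in known) / len(known) for i in range(DIM)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(examples, epochs=200, lr=0.5):
    """Train only a logistic-regression head on the frozen features (plain SGD)."""
    weights, bias = [0.0] * DIM, 0.0
    features = [(frozen_encoder(text), label) for text, label in examples]
    for _ in range(epochs):
        for x, y in features:
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            grad = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
            bias -= lr * grad
    return weights, bias

def predict(head, text):
    """Probability that the text describes a messaging service (label 1)."""
    weights, bias = head
    x = frozen_encoder(text)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Hypothetical local data: 1 = messaging service, 0 = geolocation service.
LOCAL_DATA = [
    ("send email message", 1), ("mail delivery", 1),
    ("map location", 0), ("address route lookup", 0),
]

head = train_head(LOCAL_DATA)
print(predict(head, "email message"), predict(head, "map route"))
```

Because only the small head is trained, the amount of local data and the training time needed stay low, which is the trade-off the paragraph above describes. With a real pre-trained model the encoder would be far larger, but the division of work would be the same.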