Friday, November 29, 2024

Micro Services - Backward compatibility design

Microservice Backward Compatibility

In a microservices architecture, ensuring backward compatibility is crucial for maintaining system stability and allowing independent deployment of services without breaking the services that depend on them.

One of the primary goals in a microservices architecture is to enable independent deployment of services. This means that changes to one service should not break or disrupt the functionality of other services that depend on it. To achieve this, backward compatibility must be maintained, ensuring that new changes do not affect existing contracts or interfaces. 

Here are some key approaches to achieve backward compatibility: 

Approach: Contract Models – Enhance Instead of Change

When evolving a microservice, it is important to enhance the contract models rather than changing them. This means adding new fields or endpoints instead of modifying or removing existing ones. By doing so, existing clients can continue to function without any changes, while new clients can take advantage of the enhanced features. This approach ensures that the service remains backward compatible and does not break any existing consumers.
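As a minimal sketch (the model and field names are illustrative, not from any particular service), enhancing a contract model by appending an optional field keeps existing clients working:

from dataclasses import dataclass
from typing import Optional

@dataclass
class OrderResponse:
    # v1 fields stay untouched
    order_id: str
    status: str
    # enhancement: a new optional field is appended instead of changing
    # existing ones, so old clients continue to deserialize successfully
    estimated_delivery: Optional[str] = None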

Approach: Create Adaptor

Another effective approach is to create an adaptor. Instead of changing the contract model directly, an adaptor can be introduced to handle the differences between the internal business model and the contract model. This allows the internal business model to evolve independently of the contract model. The adaptor translates between the two models, ensuring that the contract remains stable and backward compatible. This approach provides flexibility to change the internal business logic without impacting the external contract. 
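A minimal adaptor sketch along the same lines (again with illustrative names): the internal business model can change shape freely, while the adaptor keeps emitting the stable contract model:

from dataclasses import dataclass
from typing import Optional

@dataclass
class OrderResponse:               # stable contract model
    order_id: str
    status: str
    estimated_delivery: Optional[str] = None

@dataclass
class InternalOrder:               # internal business model, free to evolve
    id: str
    state: str
    delivery_estimate: str

def to_contract(order: InternalOrder) -> OrderResponse:
    # The adaptor translates between the two models, keeping the contract stable
    return OrderResponse(
        order_id=order.id,
        status=order.state,
        estimated_delivery=order.delivery_estimate,
    )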

Version Strategy to Communicate Major Changes

When major changes are necessary, a versioning strategy should be employed. Versioning the API allows clear communication of the contract version that the API is working with. By using version numbers in the API endpoints (e.g., /api/v1/resource), clients can explicitly specify which version of the contract they are using. This ensures that clients are aware of the changes and can migrate to the new version at their own pace, while still maintaining backward compatibility with the older versions.
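A minimal sketch of URL-based versioning, here using Flask purely as an illustrative choice (the text does not prescribe a framework):

from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/api/v1/resource")
def resource_v1():
    # old contract, kept alive so existing clients are not broken
    return jsonify({"id": 1, "name": "widget"})

@app.route("/api/v2/resource")
def resource_v2():
    # new major version; clients migrate at their own pace
    return jsonify({"id": 1, "name": "widget", "category": "tools"})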

 


Micro Services - Resiliency

In modern software systems, resiliency is a critical aspect that ensures the system remains functional and responsive even in the face of failures or unexpected conditions. Several design patterns can be employed to enhance the resiliency of a system. Below are some key patterns: 

Timeout Pattern

The Timeout pattern is used to configure a timeout for all downstream calls. This pattern helps in failing fast by setting a maximum time limit for a call to complete. If the call does not complete within the specified time, it is aborted. This prevents the system from waiting indefinitely for a response, which can lead to resource exhaustion and degraded performance.  However, if there are a lot of calls within this timeout period, it can still lead to issues. This is where the Circuit Breaker pattern comes into play. 
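A minimal sketch using the requests library (the URL and limit are illustrative):

import requests

try:
    # Fail fast: abort the call if no response arrives within 2 seconds
    resp = requests.get("https://inventory-service/api/v1/stock", timeout=2.0)
    resp.raise_for_status()
except requests.Timeout:
    # Surface the failure instead of blocking a thread indefinitely
    print("downstream call timed out")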

Circuit Breaker Pattern

The Circuit Breaker pattern is often used in conjunction with the Timeout pattern. It involves implementing a circuit breaker component that tracks all outgoing calls from the service. If the component observes a high number of failures (above a certain threshold), it transitions to an "open" state. In this state, the circuit breaker immediately responds with an error message or a default response, without attempting to make the call. This prevents the system from being overwhelmed by repeated failures and helps maintain overall system health by avoiding cascading failures.  The circuit breaker also has a "half-open" state, where it allows a few calls to pass through to check if the downstream services have recovered. If the calls succeed, the circuit breaker transitions back to the "closed" state, allowing normal operation to resume. 
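A minimal in-memory sketch of the state machine described above (thresholds are illustrative; production code would also need thread safety):

import time

class CircuitBreaker:
    def __init__(self, threshold=5, reset_after=30.0):
        self.failures = 0
        self.threshold = threshold      # failures before the circuit opens
        self.reset_after = reset_after  # seconds before probing recovery (half-open)
        self.opened_at = None

    def call(self, fn, *args):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            # past the reset window: half-open, let this call through as a probe
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.time()  # open the circuit
            raise
        self.failures, self.opened_at = 0, None  # success closes the circuit
        return result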

Retry Pattern

The Retry pattern is useful for handling transient issues. It involves retrying a failed operation a certain number of times before giving up. This pattern helps in self-correcting the services and is particularly effective when used with the Circuit Breaker and Timeout patterns. By retrying operations, the system can recover from temporary issues without manual intervention. 
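A minimal retry sketch with exponential backoff and jitter (the exception type and delays are illustrative):

import random
import time

def call_with_retries(fn, attempts=3, base_delay=0.5):
    for attempt in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # transient-issue budget exhausted; give up
            # exponential backoff with jitter so retries don't stampede a recovering service
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))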

Bulkhead Pattern

The Bulkhead pattern involves separating services by their criticality and functionality. High-criticality services are allocated more resources to ensure their availability. This separation makes it easier to manage and segregate execution. Additionally, workload balancing can be used to distribute the load across multiple instances of a service. Load shedding can also be employed, where the load balancer redirects requests from overloaded instances to less busy ones. This ensures that retries are more likely to succeed by avoiding overloaded instances. 
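A minimal bulkhead sketch using semaphores as the separation mechanism (pool sizes are illustrative):

from threading import BoundedSemaphore

# Separate resource pools per dependency, so a slow low-criticality
# service cannot exhaust the capacity reserved for a critical one
payment_slots = BoundedSemaphore(20)   # high criticality: more resources
reporting_slots = BoundedSemaphore(5)  # low criticality: fewer resources

def call_reporting(fn):
    if not reporting_slots.acquire(blocking=False):
        raise RuntimeError("reporting bulkhead full: shedding load")
    try:
        return fn()
    finally:
        reporting_slots.release()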

Caching Pattern

The Caching pattern involves storing responses for repeated data to reduce the load on the system. By caching frequently requested data, the system can serve responses faster and reduce the number of calls to downstream services. This not only improves performance but also enhances resiliency by reducing the dependency on external services.
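A minimal TTL cache sketch (in production this would typically be a shared cache such as Redis rather than an in-process dict):

import time

_cache = {}  # key -> (value, expiry timestamp)

def cached_fetch(key, fetch_fn, ttl=60.0):
    # Serve repeated requests from the cache instead of calling downstream again
    hit = _cache.get(key)
    if hit and hit[1] > time.time():
        return hit[0]
    value = fetch_fn()
    _cache[key] = (value, time.time() + ttl)
    return value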

 

Micro Services - Database Design Pattern

In a microservices architecture, each microservice typically manages its own database. This approach offers several benefits, such as improved scalability and independence. However, it also introduces challenges, particularly around data sharing and consistency. To address these challenges, several complementary patterns can be employed: 

Event-Driven Pattern

In an event-driven architecture, each microservice saves data in its own database. The limitation of this approach is the difficulty in sharing data between microservices. To overcome this, microservices can connect with each other and create a copy of data from other microservices, storing it as a local cache. This allows microservices to access the necessary data without directly querying another service's database, thus reducing inter-service dependencies and improving performance. 
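A minimal sketch of keeping a local copy via events (the event shape and field names are illustrative):

local_customers = {}  # the order service's local cache of customer data

def on_customer_updated(event: dict):
    # Consume the customer service's event instead of querying its database
    local_customers[event["customer_id"]] = {
        "name": event["name"],
        "tier": event["tier"],
    }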

Event Sourcing Pattern

Event sourcing involves storing data as a series of events. This pattern is straightforward to implement in a monolithic database but is also feasible with microdatabases. Instead of storing the current state of an entity, all changes (events) to the entity are stored. To determine the latest state of the data, all events must be replayed and evaluated. While this approach can impact performance and increase storage requirements, it offers the advantage of easy data rollback by replaying events up to a certain point. Event sourcing provides a detailed audit trail and can be useful for debugging and compliance purposes. 
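A minimal event-sourcing sketch: the current state is never stored, only derived by replaying events (the event types are illustrative):

events = [
    {"type": "Deposited", "amount": 100},
    {"type": "Withdrawn", "amount": 30},
    {"type": "Deposited", "amount": 5},
]

def current_balance(events):
    # Replay and evaluate every event to determine the latest state
    balance = 0
    for e in events:
        balance += e["amount"] if e["type"] == "Deposited" else -e["amount"]
    return balance

print(current_balance(events))      # 75
print(current_balance(events[:2]))  # 70 -- rollback = replay up to a point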

CQRS (Command and Query Responsibility Segregation) Pattern

CQRS separates the reading and writing of data into different models. In this pattern, there are distinct services for handling commands (writes) and queries (reads). The write and read databases are different, and a synchronization mechanism is required to keep them in sync. This separation allows for optimized read and write operations, as each can be scaled and optimized independently. However, ensuring data consistency between the write and read databases is critical and can add complexity to the system.
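A minimal CQRS sketch with an explicit synchronization step (in practice the sync is usually asynchronous, e.g. event-driven; all names are illustrative):

write_db = {}  # normalized store, optimized for writes
read_db = {}   # denormalized store, optimized for queries

def handle_create_order(order_id, items):   # command side
    write_db[order_id] = {"items": items}
    sync_to_read_model(order_id)            # keeps the two stores in sync

def sync_to_read_model(order_id):
    order = write_db[order_id]
    read_db[order_id] = {"item_count": len(order["items"])}

def get_order_summary(order_id):            # query side
    return read_db[order_id]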

Micro Services - Asynchronous design

Asynchronous communication is a key aspect of modern microservices architecture, providing several benefits that enhance the overall system performance and scalability:

De-coupling Across Components - Asynchronous communication decouples components, allowing them to operate independently without waiting for each other. This reduces dependencies and improves system resilience. 

Fire-and-Forget Interaction - In a fire-and-forget interaction, the sender sends a message and does not wait for a response. This is useful for tasks that do not require immediate feedback, reducing the load on the system and improving responsiveness. 

Support Long-Running Jobs - Asynchronous communication supports long-running jobs by allowing tasks to be processed in the background. This prevents blocking resources and ensures that the system remains responsive to other requests. 

 

Microservice Functional Requirements

To build effective microservices, certain functional requirements must be met: 

Loosely Coupled Service  - Microservices should be loosely coupled, meaning changes in one service should not impact others. This enhances maintainability and scalability. 

Backward Compatibility - Independently Changeable - Services should be independently changeable without breaking existing functionality. This ensures that updates can be made without disrupting the system. 

Backward Compatibility - Independently Deployable - Microservices should be independently deployable, allowing for updates and deployments without affecting other services. 

Support and Honor Contracts - Microservices must support and honor contracts, ensuring consistent and reliable communication between services. 

Technology Agnostic API - APIs should be technology agnostic, allowing different technologies to interact seamlessly. 

Stateless - Microservices should be stateless, meaning they do not retain client state between requests. This simplifies scaling and improves reliability. 

Lightweight Communication - Communication between microservices should be lightweight to reduce overhead and improve performance. 

Cache-able Communication - Communication should be cache-able to improve efficiency and reduce redundant processing. 

Usability - API Consistency, Predictable, Readable - APIs should be consistent, predictable, and readable to enhance usability and developer experience. 

Usability - Query-able Data - APIs should allow for query-able data to enable flexible and efficient data retrieval. 


Pragmatic REST API - Pragmatic REST API refers to a more practical approach to REST, focusing on simplicity and usability rather than strictly adhering to CRUD operations. 

Example :

Use Verbs Instead of Nouns: Use verbs in the API to denote actions, e.g., /startProcess instead of /process.

Append Query Parameters: Use query parameters to pass constraints, e.g., /search?query=example.

HATEOAS: Implement Hypermedia as the Engine of Application State (HATEOAS) by including URLs in responses to guide the client on the next steps, e.g., the server responds with the next URL needed for the client to make a call.
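A minimal HATEOAS-style response sketch (field names are illustrative): the links tell the client which calls it can make next:

hateoas_response = {
    "orderId": 42,
    "status": "PLACED",
    "links": [  # the server guides the client to the next steps
        {"rel": "cancel", "href": "/orders/42/cancel"},
        {"rel": "track", "href": "/orders/42/tracking"},
    ],
}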

By following these principles, microservices can achieve better performance, scalability, and maintainability, while providing a more practical and user-friendly API.

 

Facade Pattern

The Facade pattern is popular for keeping implementation separate from the contract. It provides flexibility in the implementation without impacting the integration. This pattern simplifies the interface for the client and hides the complexities of the underlying system. 
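A minimal facade sketch (the subsystems are illustrative): the client sees one simple method, not the choreography behind it:

class OrderFacade:
    def __init__(self, inventory, payments, shipping):
        self._inventory = inventory
        self._payments = payments
        self._shipping = shipping

    def place_order(self, item, card):
        # One stable entry point; the implementation behind it can change freely
        self._inventory.reserve(item)
        self._payments.charge(card)
        return self._shipping.schedule(item)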

Proxy Pattern

The Proxy pattern is popular when connecting with other services. The proxy object helps in various scenarios such as caching and making the internal model extendable. It can also assist with authentication and authorization. Additionally, the proxy pattern is beneficial for unit testing, as mocking proxy data is straightforward.
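A minimal proxy sketch combining two of the uses mentioned above, caching and authorization (names are illustrative):

class ProductServiceProxy:
    def __init__(self, real_client):
        self._client = real_client
        self._cache = {}

    def get_product(self, product_id, token):
        if not token:                  # authentication/authorization hook
            raise PermissionError("missing token")
        if product_id in self._cache:  # caching at the proxy
            return self._cache[product_id]
        product = self._client.get_product(product_id)
        self._cache[product_id] = product
        return product

In unit tests, real_client can simply be replaced with a mock, which is why this pattern makes mocking straightforward.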

Micro Services - synchronize data

 Data Consistency Across Microservices

In a distributed microservices architecture, ensuring data consistency is crucial. When a failure occurs in one step, other transactions must either be canceled or rolled back to maintain consistency. 

 

Traditional Approach - Traditionally, all updates are made as a single transaction, ensuring that either all operations succeed or fail together. This approach is feasible in monolithic architectures where a single database can be used to update all records simultaneously. However, this is not possible with microservices, which typically use distributed databases, each maintaining its own data. 

 

ACID Properties

Transactions in microservices should adhere to ACID properties: 

Atomicity: All or nothing when committing data changes.

Consistency: Data transitions from one valid state to another.

Isolation: Transactions run in isolation and are not affected by concurrency.

Durability: Committed data changes are durable and persist in the database.

 

2-Phase Commit Pattern

The 2-Phase Commit pattern focuses on data consistency rather than availability. It introduces a transaction manager (TxnMgr) to ensure data consistency. The process involves the following steps (a minimal coordinator sketch follows the list): 

Event Publication: The UI/BFF publishes an event to the bus.

Transaction Manager: TxnMgr subscribes to the event and creates a record in its database, detailing the components involved in the transaction.

Prepare Event: TxnMgr publishes a "prepare" event, which other microservices subscribe to and prepare for the task.

Vote Collection: Microservices publish their vote (success/fail) after preparation. TxnMgr records these votes.

Commit/Rollback: If all votes are successful, TxnMgr publishes a "commit" event, and microservices complete the task. If any service fails, TxnMgr publishes a "rollback" event, and microservices roll back their preparation steps.
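A minimal in-memory sketch of the coordinator's prepare/vote/commit logic (the participant objects are illustrative; a real TxnMgr would persist its record and communicate over the event bus):

def two_phase_commit(participants):
    # Phase 1: prepare -- every participant receives the "prepare" event and votes
    votes = [p.prepare() for p in participants]  # True = ready to commit
    # Phase 2: commit only if all votes succeeded, otherwise roll back everyone
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.rollback()
    return "rolled back"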

Pros

    Guaranteed atomicity across services.

    ACID compliant.

Cons

    Performance bottleneck due to waiting for TxnMgr responses.

    Single point of failure with TxnMgr.

    Complexity in handling retries and failures.

 

Saga Pattern

The Saga pattern focuses on atomicity and is more popular than the 2-Phase Commit pattern. It uses a Saga Execution Coordinator (SEC) and saga logs to manage transactions. The process involves the following steps (see the sketch after the list): 

Saga Entries: Incoming transactions are converted into saga entries, each representing a service task with a corresponding compensation request (rollback request).

Status Recording: SEC records the status of each microservice. If all responses are complete, the transaction is complete.

Communication: Requests can be sent to individual services together or one-by-one using HTTP or events.
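A minimal SEC sketch (illustrative, in-memory): each saga entry pairs a service task with its compensation request, and a failure triggers compensation in reverse order:

def run_saga(steps):
    # steps: list of (action, compensation) callables -- the saga entries
    done = []  # the saga log: compensations for completed steps
    for action, compensation in steps:
        try:
            action()
            done.append(compensation)
        except Exception:
            # A failure rolls back all completed work via compensation requests
            for compensate in reversed(done):
                compensate()
            return "compensated"
    return "completed"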

 

Pros

    Ensures data integrity.

    Fault tolerance with compensation requests.

    Scalable and loosely coupled.

Cons

    Implementation complexity.

    Performance overhead due to asynchronous communication and compensating actions.


Eventual Consistency Pattern

The Eventual Consistency pattern prioritizes availability over strict ACID properties. Asynchronous messages are published, and microservices process them independently. Data may be temporarily out of sync, but it will eventually become consistent.  

Pros

    High availability.

    Simple implementation.

Cons

    Temporary data inconsistency.

    Requires careful handling to ensure eventual consistency.

Micro Services - Composition


Microservices architecture often requires composing multiple services to complete a single operation. When an operation requires more than one service to complete, a composition pattern is used: multiple microservices are coordinated to achieve a single business goal. Here are the main composition patterns used in microservices: 

 

Broker Composition Pattern

In the Broker Composition pattern, a message is published to a message broker. All subscribers pick up this message and process their tasks. Once all the work is complete, the client gets a notification and can pull the relevant data back. This asynchronous communication provides flexibility, good performance, and reliability. 

 

Aggregate Composition Pattern

The Aggregate Composition pattern involves an aggregator component, often a Backend for Frontend (BFF), responsible for fetching data from multiple microservices. The aggregator connects with other microservices, aggregates the data, and returns it to the front end. However, synchronous calls between the aggregator and other microservices can lead to a poor client experience. 
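A minimal BFF aggregator sketch (the service URLs are illustrative); issuing the downstream calls concurrently softens, though does not remove, the synchronous fan-out cost:

from concurrent.futures import ThreadPoolExecutor

import requests

def get_dashboard(user_id):
    # The aggregator fans out to several microservices and merges the results
    urls = {
        "profile": f"https://user-service/users/{user_id}",
        "orders": f"https://order-service/orders?user={user_id}",
    }
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(requests.get, url, timeout=2.0)
                   for name, url in urls.items()}
        return {name: f.result().json() for name, f in futures.items()}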

 

Chained Composition Pattern

The Chained Composition pattern is considered an anti-pattern. In this pattern, the client calls Service 1, which calls Service 2, and so on, creating a chain of HTTP calls. This can lead to increased latency and complexity. 

 

Proxy Composition Pattern

In the Proxy Composition pattern, an API gateway connects with all downstream services and returns the information to the client. The gateway does not aggregate data; instead, the front end is responsible for making multiple gateway calls and aggregating the data. These endpoints are called passive endpoints because the proxy has minimal logic and acts as a pass-through. The API gateway can provide benefits such as a centralized layer for security checks and data caching. It can also handle API version mapping. However, the drawback is that the client has to make multiple calls, one for each service. 

 

Mixed Composition Pattern

The Mixed Composition pattern combines multiple composition patterns. It uses the Broker Composition pattern as the base and adds a BFF as an aggregator layer. Once a message is published to the broker and a notification indicates that the data is ready, the aggregator fetches the data points, aggregates them, and returns them to the client. The BFF can connect with individual services to get the data, avoiding a chain of calls. An API gateway (Proxy Composition pattern) can be introduced on top of the services to expose the data to more consumers.

 


Micro Services - Architect with context

Begin with Microservices

 

Traditionally, software development has often relied on monolithic architectures, where all functionalities are built as part of a single, large service. This approach can lead to significant issues, as making a change in one aspect of the system can impact the entire product. Microservices architecture addresses this problem by creating boundaries around different aspects of the product, known as contexts. Each context is implemented as an independent service, referred to as a bounded context. 

 

Bounded Context

A bounded context is a specific area of the application with a clear boundary and interface. Each microservice has a specific role centered around this context. Bounded contexts can have sub-contexts, which may be related to data/models or functions. There are two primary approaches to defining bounded contexts: 

 

Ubiquitous Language

This approach is suitable for smaller projects and teams. The bounded context is defined based on the language or context used within the team. During the definition process, if overlapping sub-contexts are identified, they are either renamed to match the bounded context or extracted to form a new bounded context. If cross-references between bounded contexts remain, these points become integration points.

 

Event Storming

Event storming is a planning, estimation, and design session where the team goes over the details of all the events that form the product. This includes the sequence of events, associated commands (which trigger events), and any issues related to the events. By aggregating all the notes around these events, the team can define the bounded context and the corresponding services.

 

Aggregation of Microservices

In some cases, the contexts of two services are so closely related that creating a single service is more efficient. In such scenarios, an aggregate service is created to encompass both contexts.

 

Understanding Architecture

In a microservices architecture, asynchronous communication is essential to ensure scalability, responsiveness, and efficient resource utilization. Here’s a detailed look at why asynchronous connections are needed and how to implement them effectively. 

 

Why We Need Asynchronous Connections

With microservices architecture, numerous components are connected over the network. To complete a particular action, multiple microservices may need to interact. Keeping synchronous calls between services can lead to a poor user experience and cause more network resources to be blocked while waiting for responses. Asynchronous connections decouple clients and services, eliminating direct dependencies and improving overall system performance. 

 

Asynchronous Architecture

 

Work Queue Pattern

In the Work Queue pattern, multiple workers are available to finish a given task. Once a task is provided, any available worker executes it. Each piece of work is independent and can be processed by separate workers. This pattern allows scaling up workers to handle the load without impacting the outcome. The caller/publisher gives the message to an individual broker, where multiple workers are waiting to act upon it. 
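A minimal work-queue sketch, using an in-process queue as a stand-in for the broker:

import queue
import threading

tasks = queue.Queue()

def worker(worker_id):
    while True:
        job = tasks.get()  # any available worker picks up the next task
        print(f"worker {worker_id} processing {job}")
        tasks.task_done()

# Scale up by adding workers; the publisher is unaffected
for i in range(3):
    threading.Thread(target=worker, args=(i,), daemon=True).start()

for job in ["resize-image-1", "resize-image-2", "send-email-7"]:
    tasks.put(job)  # the caller/publisher just hands the message to the broker
tasks.join()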

 

Publish and Subscribe Pattern

The Publish and Subscribe pattern differs from the Work Queue pattern. Here, the publisher publishes an event, and subscribers subscribe to it based on their responsibilities. This pattern is more event-driven rather than command-driven. Multiple tasks by multiple workers are performed when one event is raised. 
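A minimal pub/sub sketch (an in-process dictionary standing in for the broker): one event, many independent reactions:

subscribers = {}  # topic -> list of handler callables

def subscribe(topic, handler):
    subscribers.setdefault(topic, []).append(handler)

def publish(topic, event):
    # Event-driven, not command-driven: the publisher doesn't know who reacts
    for handler in subscribers.get(topic, []):
        handler(event)

subscribe("order.placed", lambda e: print("billing handles", e))
subscribe("order.placed", lambda e: print("shipping handles", e))
publish("order.placed", {"order_id": 42})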

 

Asynchronous API Calls

Messaging is not the only way to achieve asynchronous communication. Asynchronous communication can also be implemented using HTTP APIs. In this pattern, the front end talks to the BFF and receives an immediate response. The BFF then communicates with downstream services and provides a callback. Once the downstream service completes its work, it uses the callback to notify the BFF, which in turn notifies the UI that the response is available.
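A minimal sketch of the callback flow (URLs and payloads are illustrative):

import requests

# BFF side: accept the request, respond immediately, and hand the work
# to the downstream service together with a callback URL
def submit_job(job):
    requests.post(
        "https://worker-service/jobs",
        json={"job": job, "callback_url": "https://bff/jobs/done"},
        timeout=2.0,
    )
    return {"status": "accepted"}  # the UI gets an immediate response

# Downstream side: once the work completes, use the callback to notify the BFF
def notify_done(result, callback_url):
    requests.post(callback_url, json=result, timeout=2.0)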

 


Thursday, November 21, 2024

Understanding openAI

Let's start our journey of understanding OpenAI with some basic concepts around machine learning.

Machine learning is a subset of AI; it is about implementing algorithms that let a system learn. Deep learning is a kind of ML based on brain-inspired algorithms called artificial neural networks. GPT-4 and ChatGPT are based on a deep learning architecture called the "Transformer".


AI - Any technique that allows a computer to mimic human behavior

ML - Ability to learn without explicit programming

Deep learning - Ability to extract patterns from data using Artificial neural networks

Transformer - the architecture that GPT and ChatGPT are based on

The Transformer architecture is built around a "pay attention" model. It has 2 parts - cross-attention and self-attention.


Cross-attention -> finds the relevance of different parts of the input text to the next prediction. For example, in "I love sunny weather", "sunny" and "weather" are considered together to find the next word. It is an attention mechanism in the Transformer architecture that mixes two different embedding sequences.

Self-attention -> gives weightage to different words in the sentence and then figures out the meaning. The goal is to learn the dependencies between the words in the sentence and use that information to capture the internal structure of the sentence.

Encoder -> processes the input text, identifies the valuable features, and generates meaningful representations of that text (called embeddings). Encoders are designed to learn embeddings that can be used for various predictive modeling tasks such as classification.

Decoder -> uses the embeddings to produce output. To train a decoder model, a technique called "teacher forcing" is used, in which the true output/token (and not the predicted output/token) from the previous time-step is fed as input to the current time-step.

Generative pre-trained transformers (GPT) are designed as decoder-only models and rely only on the self-attention mechanism within the decoder. There is no encoder in GPT, and therefore no cross-attention: the input data is fed directly into the decoder without first being transformed into a higher, more abstract representation by an encoder.

A tokenizer mechanism splits the text into multiple chunks called tokens; in general, 75 words come to about 100 tokens. These tokens form the context, and that context is used to figure out or predict the next word. GPT-4 has a context window of 8,192 tokens, with a variant that supports 32,768 tokens.
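As a concrete illustration, token counts can be inspected with OpenAI's tiktoken library (an assumption here; the post itself doesn't mention it):

import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
tokens = enc.encode("I love sunny weather")
print(len(tokens), tokens)  # the chunk count the model actually sees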

Based on the context, the next word is predicted with a probability and then appended to the text. The newly formed text becomes the new input to the algorithm, and the process keeps repeating.


Models - 

Models are different strategies by which the prompt will be processed. 

Instruct GPT - text-ada-001, text-babbage-001, text-curie-001, and text-davinci-003 are different instruct models. Generally used for single-turn tasks.

ChatGPT - gpt-3.5-turbo, gpt-3.5-turbo-16k. Used for chat.

GPT-4 - gpt-4, gpt-4-32k. Used for both chat and single-turn tasks.


Both ChatGPT and GPT-4 models use the same endpoint - openai.ChatCompletion.


While making the API call -

OPENAI_API_KEY -> the API key associated with your account. It is used to give you access, and usage is charged against it.

Access the endpoint - 

import openai  # assumes the openai Python package (0.x style) with OPENAI_API_KEY set in the environment

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello World!"}],
)

# Extract the response text
print(response["choices"][0]["message"]["content"])


{
    "choices": [
        {
            "finish_reason": "stop",  # status of the response; "stop" means it completed successfully
            "index": 0,
            "message": {
                "content": "Hello there! How may I assist you today?",  # text generated by the model; a function_call node can appear instead of content
                "role": "assistant",  # the role is always "assistant"
            },
        }
    ],
    "created": 1681134595,
    "id": "chatcmpl-73mC3tbOlMNHGci3gyy9nAxIP2vsU",  # technical identifier used internally by OpenAI
    "model": "gpt-3.5-turbo",  # model used
    "object": "chat.completion",  # always "chat.completion"
    "usage": {"completion_tokens": 10, "prompt_tokens": 11, "total_tokens": 21},  # tokens used in the call; gives you an idea of the cost
}


In the API, 2 parameters are mandatory:

model

messages - each message has a role (system, user, assistant) and content (the actual message)


How to use the function instead of content?

You can describe a function call using the OpenAI APIs, but the OpenAI model doesn't call the function itself. It gives you the arguments you can use to call the function.

The function object has -

name - the name of the function

description - what the function is all about

parameters - the parameters passed to the function


# Example function -- this is the function you want to call once OpenAI responds.
def find_product(sql_query):
    # Execute the query here; hard-coded results for illustration
    results = [
        {"name": "pen", "color": "blue", "price": 1.99},
        {"name": "pen", "color": "red", "price": 1.78},
    ]
    return results

# Function definition -- this is the definition passed in the OpenAI call
functions = [
    {
        "name": "find_product",
        "description": "Get a list of products from a sql query",
        "parameters": {
            "type": "object",
            "properties": {
                "sql_query": {
                    "type": "string",
                    "description": "A SQL query",
                }
            },
            "required": ["sql_query"],
        },
    }
]


# Example question
user_question = "I need the top 2 products where the price is less than 2.00"
messages = [{"role": "user", "content": user_question}]

# Call the openai.ChatCompletion endpoint with the function definition
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613", messages=messages, functions=functions  # the function definition is passed here
)
response_message = response["choices"][0]["message"]
messages.append(response_message)

The response then has a function_call node instead of content, which looks like this:

"function_call": {
    "name": "find_product",
    "arguments": '{\n  "sql_query": "SELECT * FROM products \
WHERE price < 2.00 ORDER BY price ASC LIMIT 2"\n}',
}


Now we can use this response to call the actual function:

import json

# Parse the arguments returned by the model
function_args = json.loads(response_message["function_call"]["arguments"])
function_name = response_message["function_call"]["name"]  # "find_product"

# Call the function ourselves -- the API only returns the arguments based on
# the function definition passed; it never executes the function for you
products = find_product(function_args.get("sql_query"))

# Append the function's response to the messages
messages.append(
    {
        "role": "function",
        "name": function_name,
        "content": json.dumps(products),  # the output of our function call to the DB
    }
)

# Make another OpenAI call so the model formats the function's response
# into a user-friendly natural-language message
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=messages,
)


Chat completion and text completion are 2 different things. Chat completion is a chat- or conversation-oriented response, whereas text completion is about completing the sentence - it may not carry a conversation forward, it just completes your text. In the response, instead of getting content, we receive "text", and in the request we send a prompt rather than messages.
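A minimal text completion sketch, mirroring the same 0.x client style used above:

import openai

response = openai.Completion.create(
    model="text-davinci-003",   # an instruct/text model, not a chat model
    prompt="Once upon a time",  # a prompt string instead of a messages list
    max_tokens=25,
)
print(response["choices"][0]["text"])  # "text" instead of "content"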


Embedding - 

AI models are mathematical functions and work with numeric inputs. Embedding converts words/tokens into numerical vectors. When you call the embedding API, you receive back vectors, which are arrays of floats.

result = openai.Embedding.create(
    model="text-embedding-ada-002", input="your text"
)
vector = result["data"][0]["embedding"]  # the embedding vector, an array of floats


An embedding is like an interpreter that translates words and sentences into numbers; words with similar meanings are mapped closer together in the numerical space.
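"Closer together" can be made concrete with cosine similarity; a minimal sketch (the helper below is illustrative, not part of the OpenAI API):

import numpy as np

def cosine_similarity(a, b):
    # ~1.0 = same direction (similar meaning); near 0 = unrelated
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# vec_cat, vec_dog, vec_invoice would be vectors returned by openai.Embedding.create;
# cosine_similarity(vec_cat, vec_dog) should exceed cosine_similarity(vec_cat, vec_invoice)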


You can moderate the user input by calling the Moderation API of OpenAI.

# Call the openai Moderation endpoint with the text-moderation-latest model
response = openai.Moderation.create(
    model="text-moderation-latest",
    input="I want to kill my neighbor.",
)


{
    "id": "modr-7AftIJg7L5jqGIsbc7NutObH4j0Ig",
    "model": "text-moderation-004",
    "results": [
        {
            "categories": {
                "hate": false,
                "hate/threatening": false,
                "self-harm": false,
                "sexual": false,
                "sexual/minors": false,
                "violence": true,
                "violence/graphic": false,
            },
            "category_scores": {
                "hate": 0.0400671623647213,
                "hate/threatening": 3.671687863970874e-06,
                "self-harm": 1.3143378509994363e-06,
                "sexual": 5.508050548996835e-07,
                "sexual/minors": 1.1862029225540027e-07,
                "violence": 0.9461417198181152,
                "violence/graphic": 1.463699845771771e-06,
            },
            "flagged": true,
        }
    ],
}

=============================================

Prompt engineering - a good prompt typically specifies three elements:

Role

Context

Task



Zero-shot-CoT strategy -

CoT - chain of thought

The term zero-shot means the model does not rely on task-specific examples to perform this reasoning; it is ready to handle new tasks based on its general training.

The few-shot learning technique gives examples of inputs with the desired outputs.

One-shot learning, as its name indicates, provides only one example to help the model execute the task.
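A minimal sketch of these strategies as prompt strings (the wording is illustrative):

# Zero-shot CoT: no examples, just a reasoning trigger appended to the task
zero_shot_cot = "Q: A shop sells pens at $1.99 each. How much do 3 pens cost? Let's think step by step."

# Few-shot: a few input/output examples, then the real question
few_shot = """Convert to SQL.
Q: all blue pens -> SELECT * FROM products WHERE color = 'blue'
Q: pens under 2 dollars -> SELECT * FROM products WHERE price < 2.00
Q: the 2 cheapest pens ->"""

# One-shot is the same idea with exactly one example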



========================

Transcript - 

Slide 1 - This presentation introduces you to the basic concepts of OpenAI and generative AI. It covers a few code examples to give you an idea of how the API is invoked. The content is based on the O'Reilly book "Deep dive into GPT-4 and chat GPT API".


Slide 2 -

Before we start understanding OpenAI and generative AI, let's get our terminology right. AI (artificial intelligence) is any technique that allows a computer to mimic human behaviour. It is powered by machine learning, which means implementing algorithms that make a system learn. Within machine learning there is a sub-category called "deep learning". One of the popular algorithms in deep learning is the "Transformer", and the Transformer is the fundamental algorithm used to implement GPT.


Slide 3 - The Transformer works on the concept of "pay attention": it focuses on the different text elements passed in the stream to figure out the context or meaning of the sentence. There are 2 categories of attention in the Transformer -

1) Cross-attention - where multiple streams of text elements are processed together to figure out the meaning/context.

2) Self-attention - where a single stream of words is used to figure out the meaning/context. In self-attention, the different words in the text are given different weightage, and based on that the model processes the text and eventually derives the meaning.

The reason we talk about this is to relate how GPT is connected with all of these:

AI -> ML -> Deep learning -> Transformer -> Self attention : GPT


Slide 4- 

Data processing is an important concept to understand when we discuss how your text gets interpreted by the system. The system has 2 mechanisms to convert data so that it can be processed well.

Encoder - The encoder is responsible for "understanding" the data and creating some meaningful result out of it. It outputs the data as embeddings, but embedding is different from the encoder itself: the encoder is more of a brain that understands the expectation based on the model and then outputs data that is a meaningful summary of your input.


Decoder - The decoder is the step where output is produced. This output could be produced from pre-generated encoded data combined with your input, or from two streams coming together, passed through an encoder and then processed by the decoder. GPT works with a "decoder-only" architecture - it has no concept of an encoder (the "pre-trained" in its name refers to its prior training on large text corpora).

In GPT, the "teacher forcing" technique is used to train the system: during training, the true output/token from the previous time-step (and not the model's own prediction) is fed as input to the current time-step. At prompt time you can apply a similar idea by giving examples/conditions and the corresponding output you would prefer; the system uses those examples to understand your expectation and then generates the answer to your query.


Slide 5- 

GPT, aka Generative Pre-trained Transformer, is a decoder-only (no encoder) Transformer. Let's talk about tokens in the GPT space. You will hear many times that a particular GPT model has a limit of x tokens. Tokens are used to calculate your cost, but what exactly is a token? Tokenization breaks the words/sentences into smaller chunks, and these chunks are fed into the model to extract the meaning. A model supporting a certain number of tokens means that is the window the model has visibility into and can process. For example, GPT-4 supports almost 32K tokens, which means it can take roughly 20-25K words into account while figuring out the meaning. If we pass anything beyond this limit, the model won't be able to process it.

If you have a bigger article/text that you would like to process, you need to divide it and make multiple trips to GPT to process it.

Let's understand the models in GPT. A model is the kind of algorithm that will be used to process the text and respond. There are 3 categories of these models:

1. Instruct GPT - All the models in this category are text-based request/response, like text-davinci-003. These are meant for one question and one answer: you ask something, it responds with the answer, end of story. It is not meant to build context over multiple back-and-forths.

2. ChatGPT - This model is designed to handle conversations. It can consider your past history of conversation, derive context out of it, and then generate the output. gpt-3.5-turbo is an example of a chat/conversation-based model.

3. GPT-4 - This model is optimized and created to support both text and chat. It is more advanced and has better accuracy; example - gpt-4-32k. It is a kind of combination of both the above models.


Slide 6 - 

Let's review the first API call we make to connect with GPT. Both ChatGPT and GPT-4 use the same API endpoint; it is the model inside the payload that determines which algorithm will serve the request. Sample code to make a request:

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # model defines which algorithm is used to process the input
    messages=[{"role": "user", "content": "Hello World!"}],  # this is where you send the message to the API
)

You can read the response provided by the API like this - print(response["choices"][0]["message"]["content"])


However, the actual response of the API looks like the one below:

{
    "choices": [  # note the array: there can be multiple choices (the request's optional n parameter controls how many completions are returned); here we read the first object
        {
            "finish_reason": "stop",  # status of the response; "stop" means it completed successfully
            "index": 0,
            "message": {
                "content": "Hello there! How may I assist you today?",  # text generated by the model; a function_call node can appear instead of content
                "role": "assistant",  # the role is always "assistant"
            },
        }
    ],
    "created": 1681134595,
    "id": "chatcmpl-73mC3tbOlMNHGci3gyy9nAxIP2vsU",  # technical identifier used internally by OpenAI
    "model": "gpt-3.5-turbo",  # model used
    "object": "chat.completion",  # always "chat.completion"
    "usage": {"completion_tokens": 10, "prompt_tokens": 11, "total_tokens": 21},  # tokens used in the call; gives you an idea of the cost
}


Slide 7 - 

The ChatGPT API also supports an integration mechanism by which you can connect with other data sources to get information. You can use OpenAI to generate a payload based on the user query and then use that payload to fetch results from another data source. With this pattern you require 2 OpenAI calls: the first to create the query for the data source, and the second to format the result well for display to the user.

user query -> OpenAI -> result as parameters/DB query -> use this to call the DB/service -> result -> OpenAI to format it in a user-friendly manner -> user

Please note the API itself won't execute any function for you; it will help you create the arguments for your function.


Slide 8 -

How do we call OpenAI for a function call? There are steps to follow -

1) Define the function in your code. 

def find_product(sql_query):
    # Execute the query here; hard-coded results for illustration
    results = [
        {"name": "pen", "color": "blue", "price": 1.99},
        {"name": "pen", "color": "red", "price": 1.78},
    ]
    return results


2) Define the function definition for the API, which is like educating OpenAI about your function.

functions = [
    {
        "name": "find_product",
        "description": "Get a list of products from a sql query",
        "parameters": {
            "type": "object",
            "properties": {
                "sql_query": {
                    "type": "string",
                    "description": "A SQL query",
                }
            },
            "required": ["sql_query"],
        },
    }
]


3) Call the OpenAI API with the function definition. Along with the user input you also pass the function definition, so that OpenAI understands that your intention is to execute a function with the response, and it will respond in a format that is usable by your code.

user_question = "I need the top 2 products where the price is less than 2.00"
messages = [{"role": "user", "content": user_question}]

# Call the openai.ChatCompletion endpoint with the function definition
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613", messages=messages, functions=functions  # the function definition is passed here
)



Slide 9 - 

Once OpenAI has processed your request, it responds in the same format as before; the only difference is the structure of the response itself. The response has an arguments node, which contains the actual arguments you can use to call the function. You can use these to make another HTTP call or run a DB query, but OpenAI won't call those itself - it just gives the right information to you, which you can use to call these downstream systems.


Slide 10 - 

Embedding and encoding are 2 different things. Embedding is converting the input/text into a number format. As we know, AI models are mathematical and require numbers to work with. Hence, before we pass any text to these models, we need to convert it into numbers: converting text into numbers is embedding. There are APIs available to create embeddings.


result = openai.Embedding.create(
    model="text-embedding-ada-002", input="your text"
)
vector = result["data"][0]["embedding"]  # the embedding vector, an array of floats



Slide 11 - 

It's important to moderate your API to avoid misuse by consumers. OpenAI provides a way to understand the user input and figure out the intention of that text. Calling the Moderation API before processing any further will protect your API.

response = openai.Moderation.create(
    model="text-moderation-latest",
    input="I want to hack my office security system.",
)

Response structure - 

{
    "id": "modr-7AftIJg7L5jqGIsbc7NutObH4j0Ig",
    "model": "text-moderation-004",
    "results": [
        {
            "categories": {
                "hate": false,
                "hate/threatening": false,
                "self-harm": false,
                "sexual": false,
                "sexual/minors": false,
                "violence": true,
                "violence/graphic": false,
            },
            "category_scores": {
                "hate": 0.0400671623647213,
                "hate/threatening": 3.671687863970874e-06,
                "self-harm": 1.3143378509994363e-06,
                "sexual": 5.508050548996835e-07,
                "sexual/minors": 1.1862029225540027e-07,
                "violence": 0.9461417198181152,
                "violence/graphic": 1.463699845771771e-06,
            },
            "flagged": true,
        }
    ],
}

As you can see, there are different categories and a corresponding score for each category. You can use the scores to understand the violation and give the user a response back.