Prashant Gupta

Machine Learning Interview | Basics | Quick revision

Here it goes

  1. Deep learning Error Analysis (Bias variance, train/train-dev/dev/test)

Continue Reading...


StegoSOC AI driven threat detection

This post is about how we are trying to automate threat detection for an enterprise, be it a public, private or hybrid cloud setup.

Real-time threat detection is imperative under GDPR and other cyber laws. Threat detection has always been an expensive solution to implement. It requires a CSO and a security analyst to implement the solution.

Continue Reading...

OCR sample results

Demonstrating OCR results

Continue Reading...

Myntra challenge

I participated in the Myntra Data Science Challenge. The challenge was to predict the graphic type of a T-shirt from its image. From a business point of view, the e-commerce industry carries a lot of dead stock, so this prediction can be used to rotate products in inventory, reducing losses for the company and improving the customer experience by offering enough options among the latest trendy T-shirts that people are wearing on social media.

Continue Reading...

Transferring files from my Mac to an MSI Windows laptop - GPU

Prepare the PC for the files and find the PC’s network address

On the Windows 10 PC

If you’re migrating from a Mac to a Windows 10 PC, follow these steps to create and then share a folder, and then find your PC’s IP address:

Continue Reading...

Sentiment Analysis

Sample sentence 1

Sample Sentence 2

Continue Reading...

Probability Basic Questions

  • In probability questions, always check whether multiple objects are being picked simultaneously or one by one. For practice, try this and this

  • Circular permutation: (n-1)!. How? Solve this
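One way to see where (n-1)! comes from: each circular arrangement of n distinct items can be rotated into n different linear orders, so n!/n = (n-1)!. The brute-force check below is only an illustrative sketch; the function name is made up for this example. It rotates every permutation so that item 0 comes first and counts the distinct results.

```python
from itertools import permutations
from math import factorial


def count_circular_arrangements(n):
    """Count arrangements of n distinct items around a circle.

    Two arrangements are treated as the same if one is a rotation of the other.
    """
    seen = set()
    for perm in permutations(range(n)):
        # Rotate the arrangement so that item 0 is first; all rotations of
        # the same circle collapse to this one canonical form.
        i = perm.index(0)
        seen.add(perm[i:] + perm[:i])
    return len(seen)


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, count_circular_arrangements(n), factorial(n - 1))
```

The brute force enumerates all n! permutations, so it is only meant for small n.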


Context free Grammar

Parsing:

Input - Sentence

Output - Parsed tree

Parsing is a supervised machine learning problem. Training can be done on a treebank, which consists of sentences and their associated parse trees. One example is the Penn WSJ Treebank.

The leaf nodes make up the sentence. Above them come the part-of-speech tags, and above those the phrases/constituents.

  • NP - noun phrase
  • VP - verb phrase
  • DT - Determiner
  • S - Sentence
  • V - Verb
  • N - Noun
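To make the three layers concrete, here is a small sketch of a parse tree stored as nested tuples. The example sentence and the helper names are assumptions chosen for illustration, not taken from any treebank. The leaves give back the sentence, and the nodes directly above them give the part-of-speech tags.

```python
# A parse tree as nested tuples: (label, child, child, ...).
# A leaf is a plain string (a word). The sentence is a made-up example.
tree = (
    "S",
    ("NP", ("DT", "the"), ("N", "dog")),
    ("VP", ("V", "chased"), ("NP", ("DT", "a"), ("N", "cat"))),
)


def leaves(node):
    """Return the words at the leaves, left to right (the sentence)."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words


def pos_tags(node):
    """Return (word, tag) pairs from the nodes directly above the leaves."""
    if isinstance(node, str):
        return []
    label, children = node[0], node[1:]
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs = []
    for child in children:
        pairs.extend(pos_tags(child))
    return pairs


if __name__ == "__main__":
    print(" ".join(leaves(tree)))
    print(pos_tags(tree))
```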


ROC Curve...AUC...What is that?

YouTube explainer Video

Visualization

Research paper


Confusion Matrix...confusing ?


Serving Flask with Nginx + uWSGI

How does this setup work?

Flask is managed by uWSGI.

uWSGI talks to nginx.

nginx handles contact with the outside world.

When a client connects to your server trying to reach your Flask app:

  1. nginx opens the connection and proxies it to uWSGI

  2. uWSGI handles the Flask instances you have and connects one to the client

  3. Flask talks to the client happily
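The piece that lets uWSGI manage the app is the WSGI calling convention: the server calls a function with the request environment and a start_response callback. The sketch below uses a plain WSGI callable as a stand-in for the Flask app (a Flask app object is called through the same interface), and a hypothetical simulate_request helper that plays the role of uWSGI. No nginx or uWSGI is involved here; it only illustrates the hand-off in step 2.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Plain WSGI callable standing in for the Flask app object."""
    path = environ.get("PATH_INFO", "/")
    body = f"Hello from {path}\n".encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def simulate_request(app, path="/"):
    """Play the role of the WSGI server: build an environ, call the app,
    and collect the status line and response body."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the keys a WSGI server would provide
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


if __name__ == "__main__":
    print(simulate_request(application, "/ping"))
```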

Continue Reading...

Cross Validation techniques

Cross validation is a model evaluation method that is better than residuals. The problem with residual evaluations is that they do not give an indication of how well the learner will do when it is asked to make new predictions for data it has not already seen. One way to overcome this problem is to not use the entire data set when training a learner. Some of the data is removed before training begins. Then when training is done, the data that was removed can be used to test the performance of the learned model on new data. This is the basic idea for a whole class of model evaluation methods called cross validation.
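As a rough illustration of holding data out, the sketch below splits sample indices into k folds; each fold takes a turn as the removed test data while the rest is used for training. This is k-fold cross validation, one member of that class of methods. The function name is an assumption for this example, and the split is not shuffled, which real use would normally want.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k (train, test) index pairs.

    Fold sizes differ by at most one; no shuffling is done.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # The first (n_samples % k) folds get one extra sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


if __name__ == "__main__":
    for train, test in k_fold_indices(10, 3):
        print("train:", train, "test:", test)
```

Every sample lands in exactly one test fold, so each prediction used for evaluation is made on data the model did not see during training.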

Know more techniques? Start here


Machine learning evaluation metrics

One method of judging the quality of a particular model is by residuals. That means the model is fit using all the data points and the prediction for each data point is compared with its actual output. The absolute value of each error is taken and the mean of those values is computed to arrive at the mean absolute residual error. Models with lower values of this measure are deemed to be better.
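The mean absolute residual error described above can be written in a few lines. This is a minimal sketch with made-up numbers; the function name is chosen for this example.

```python
def mean_absolute_error(actual, predicted):
    """Mean of |actual - predicted| over all data points."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    actual = [3, 5, 2]
    predicted = [2, 5, 4]
    # Absolute errors are 1, 0 and 2, so the mean is 1.0
    print(mean_absolute_error(actual, predicted))
```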

There is a plethora of metrics in machine learning that can be used to evaluate the performance of an ML model. This is an attempt to draw a metric map, just to keep them all in one place.

Continue Reading...

Machine learning algorithms map

Machine Learning Algorithms in one glance.

Continue Reading...

Benchmarking right way

Table of contents

  1. Benchmarking right way - Introduction
    1. What is?
      1. Latency
      2. Throughput
      3. Packet Loss
      4. Processing time
      5. Response time
  2. Benchmarking time
    1. Network Latency
      1. Measure by ping
      2. Measure by flent
    2. Test using curl
    3. Benchmark POST API using ab
    4. Benchmark POST API using wrk
  3. Memory benchmark
  4. CPU benchmark

Benchmarking right way - Introduction

API benchmarking should mainly be concerned with time, CPU and memory. One should also consider the number of concurrent connections the API can handle in the production environment. This post will help you understand the mechanisms of network slowdowns.

Continue Reading...

Read these books

MySQL

Continue Reading...

Scalability

Reliability…what is that?

“continuing to work correctly, even when things go wrong.” The things that can go wrong are called faults, and systems that anticipate faults and can cope with them are called fault-tolerant or resilient. The former term is slightly misleading: it suggests that we could make a system tolerant of every possible kind of fault, which in reality is not feasible. If the entire planet Earth (and all servers on it) were swallowed by a black hole, tolerance of that fault would require web hosting in space—good luck getting that budget item approved. So it only makes sense to talk about tolerating certain types of faults. Note that a fault is not the same as a failure. A fault is usually defined as one component of the system deviating from its spec, whereas a failure is when the system as a whole stops providing the required service to the user. It is impossible to reduce the probability of a fault to zero; therefore it is usually best to design fault-tolerance mechanisms that prevent faults from causing failures.

Continue Reading...

Writing secure python applications

Table of contents

  1. Injection Attacks
    1. SQL Injection
    2. XML Injection
    3. Command Injection
  2. Broken authentication & session management
    1. Session fixation
    2. Use of Insufficiently random values
  3. Cross site scripting
    1. Reflected XSS
    2. Persistent XSS
    3. Document Object Model (DOM) Based XSS
  4. Insecure direct object references
    1. Directory (Path) Traversal
  5. Security Misconfiguration
    1. Privileged Interface Exposure
    2. Leftover debug code
  6. Sensitive data exposure
    1. Authentication credentials in URL
    2. Session Exposure within URL
    3. User Enumeration
  7. Missing function level access control
    1. Horizontal Privilege Escalation
    2. Vertical Privilege Escalation
  8. Cross site request forgery
    1. Cross site request forgery(POST)
    2. Cross site request forgery(GET)
    3. Click Jacking
  9. Unvalidated redirects & Forwards
    1. Insecure URL redirect
Continue Reading...

System design

Table of contents

  1. Software Design Theoretical concepts - Introduction
    1. CRC Card
    2. Four concepts revolving around OOP
    3. Coupling & Cohesion
    4. Separation of concerns
    5. SOLID
      1. The Single Responsibility Principle
      2. The Open Closed Principle
      3. The Liskov Substitution Principle
      4. The Interface Segregation Principle
      5. The Dependency Inversion Principle
  2. Class Diagrams
    1. Tool to draw UML
    2. UML class diagram rules
  3. Sample UML Diagrams examples
  4. Object oriented cheat-sheet
  5. References
Continue Reading...

Using Anaconda in CI/CD pipeline

To build the pipeline successfully, every interactive "yes" prompt shown while executing the Anaconda sh file had to be automated.

I did this by invoking the sh file with the -b option: bash Anaconda2-5.0.1-Linux-x86_64.sh -b


Shipping machine learning modules in a single executable

You have done all your research, prototyped it, optimized it, and now you are ready to ship it. This post focuses not only on shipping machine learning modules but on Python-based codebases in general.

How do you ship ?

  1. Expose an API
  2. Package your code in a single executable
Continue Reading...

Feature Extraction

To start with

feature selection: select a subset of the original feature set.

feature extraction: build a new set of features from the original feature set.


Language Modelling in NLP

What is Language Modelling ?

Language modeling, in very simple terms, is the task of assigning a probability to sentences in a language. Besides assigning a probability to each sequence of words, a language model also assigns a probability to a given word (or sequence of words) following a sequence of words.
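A bigram model is one of the simplest ways to make both ideas concrete. The sketch below is an illustrative assumption, not a method taken from the post: it estimates the probability of a word given the previous word from counts over a tiny made-up corpus, and multiplies those probabilities to score a sentence. There is no smoothing, so any unseen word pair gives a probability of zero.

```python
from collections import Counter, defaultdict

START, END = "<s>", "</s>"  # sentence boundary markers (a common convention)


def tokenize(sentence):
    return [START] + sentence.lower().split() + [END]


def train_bigram_model(sentences):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = tokenize(sentence)
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return counts


def next_word_probability(counts, prev, word):
    """P(word | prev) as a relative frequency; 0.0 if prev was never seen."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return followers[word] / total if total else 0.0


def sentence_probability(counts, sentence):
    """Product of bigram probabilities over the whole sentence."""
    tokens = tokenize(sentence)
    probability = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        probability *= next_word_probability(counts, prev, cur)
    return probability


if __name__ == "__main__":
    corpus = ["the cat sat", "the dog sat", "the cat ran"]
    model = train_bigram_model(corpus)
    print(next_word_probability(model, "the", "cat"))
    print(sentence_probability(model, "the cat sat"))
```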

Continue Reading...

All you need to know about Word2vec

Why is word embedding needed?

The purpose is to create a representation for words that captures their meanings, semantic relationships and the different types of contexts they are used in.

All of this is implemented using Word Embeddings, numerical representations of text that computers can handle.

What is word embedding ?

In very simplistic terms, Word Embeddings are the texts converted into numbers and there may be different numerical representations of the same text. But before we dive into the details of Word Embeddings, the following question should be asked – Why do we need Word Embeddings?

As it turns out, many Machine Learning algorithms and almost all Deep Learning architectures are incapable of processing strings or plain text in their raw form. They require numbers as inputs to perform any sort of job, be it classification, regression, etc. And with the huge amount of data that is present in text format, it is imperative to extract knowledge out of it and build applications. Some real-world applications of text processing are sentiment analysis of reviews by Amazon etc., and document or news classification or clustering by Google etc.

Let us now define Word Embeddings formally. A Word Embedding format generally tries to map a word using a dictionary to a vector. Let us break this sentence down into finer details to have a clear view.

Take a look at this example – sentence = “Word Embeddings are Word converted into numbers”

A word in this sentence may be “Embeddings” or “numbers” etc.

A dictionary may be the list of all unique words in the sentence. So, a dictionary may look like – [‘Word’, ‘Embeddings’, ‘are’, ‘converted’, ‘into’, ‘numbers’]

A vector representation of a word may be a one-hot encoded vector where 1 stands for the position where the word exists and 0 everywhere else. The vector representation of “numbers” in this format according to the above dictionary is [0,0,0,0,0,1] and of “converted” is [0,0,0,1,0,0].
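The same example can be reproduced in a few lines. This is a minimal sketch with hypothetical helper names: it builds the dictionary from the unique words in order of first appearance and then encodes a word as a one-hot vector.

```python
def build_dictionary(sentence):
    """Unique words of the sentence, in order of first appearance."""
    vocabulary = []
    for word in sentence.split():
        if word not in vocabulary:
            vocabulary.append(word)
    return vocabulary


def one_hot(word, vocabulary):
    """1 at the position of the word in the dictionary, 0 everywhere else."""
    return [1 if entry == word else 0 for entry in vocabulary]


if __name__ == "__main__":
    sentence = "Word Embeddings are Word converted into numbers"
    vocabulary = build_dictionary(sentence)
    print(vocabulary)
    print(one_hot("numbers", vocabulary))
    print(one_hot("converted", vocabulary))
```

The vector length equals the dictionary size, which is why one-hot vectors become very long and sparse on a real vocabulary.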

This is just a very simple method to represent a word in the vector form. Let us look at different types of Word Embeddings or Word Vectors and their advantages and disadvantages over the rest.

Different types of Word Embeddings

The different types of word embeddings can be broadly classified into two categories:

  1. Frequency based Embedding
  2. Prediction based Embedding

Let us try to understand each of these methods in detail.

2.1 Frequency based Embedding

There are generally three types of vectors that we encounter under this category.

  1. Count Vector
  2. TF-IDF Vector
  3. Co-Occurrence Vector

Let us look into each of these vectorization methods in detail.

To give a better semantic understanding of a word, word2vec was published for the NLP community.

Continue Reading...

All you need to know about BOW

Feature Extraction from texts using Bag of words

The bag of words model ignores grammar and word order. Consider two example sentences: ‘All my cats in a row’ and ‘When my cat sits down, she looks like a Furby toy!’

The first step is to break the given sentences down into words and assign each word a unique ID.
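A rough sketch of that step on the two example sentences follows. The tokenization rule (lowercase, letters only) and the function names are assumptions for this illustration. Each new word gets the next free ID, and a sentence then becomes a vector of word counts indexed by ID.

```python
import re


def tokenize(text):
    """Lowercase the text and keep runs of letters only (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(sentences):
    """Assign each distinct word a unique ID in order of first appearance."""
    vocabulary = {}
    for sentence in sentences:
        for word in tokenize(sentence):
            if word not in vocabulary:
                vocabulary[word] = len(vocabulary)
    return vocabulary


def bag_of_words(sentence, vocabulary):
    """Count vector: position i holds how often the word with ID i occurs."""
    vector = [0] * len(vocabulary)
    for word in tokenize(sentence):
        if word in vocabulary:
            vector[vocabulary[word]] += 1
    return vector


if __name__ == "__main__":
    sentences = [
        "All my cats in a row",
        "When my cat sits down, she looks like a Furby toy!",
    ]
    vocabulary = build_vocabulary(sentences)
    print(vocabulary)
    for sentence in sentences:
        print(bag_of_words(sentence, vocabulary))
```

Because only counts are kept, two sentences with the same words in a different order produce the same vector.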

Continue Reading...

Visual recognition

I am really fascinated by this broad research topic. So I decided to play around with it, and this post is a serially arranged record of my attempts at visual recognition.

What inspired me ?

Dr. Fei-Fei Li with her TED talk

Continue Reading...

Basics of probability and stats

Data ?

Data are pieces of information about individuals organized into variables. By an individual, we mean a particular person or object. By a variable, we mean a particular characteristic of the individual.

Variables can be classified into one of two types: categorical or quantitative.

Continue Reading...

Striving better at Competitive programming

I stepped into competitive programming in college. I started on SPOJ, attempted Life, the Universe, and Everything and, voilà, got a compilation error :laughing:

Anyway, that was a learning curve, and I continued with other platforms like Codechef and Codeforces along with SPOJ. Topcoder problems were tough then and still are :wink:

In the spirit of making myself a better developer, I am releasing all my submitted solutions to various problems on all platforms. The main reason behind putting all my code in one place

Continue Reading...


Git utility commands

  1. Ignore a file/folder while committing

    For a file

    For a folder

Continue Reading...


Linux utility commands

To monitor CPU Usage on any Linux distribution, use

Continue Reading...