
Serving Keras models in golang

Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation: being able to go from idea to result with the least possible delay is key to doing good research. Why use Keras? It offers consistent and simple APIs, minimizes the number of user actions required for common use cases, and provides clear and actionable feedback on user error.

Design Pattern Tricks for PySpark

Hi there! Apache Spark was originally written in Scala, although Python developers love its wrapper, known as PySpark. One can work with RDDs and DataFrames in Python too. We, the data science team @Talentica, love PySpark and rely mostly on Spark clusters for data processing and related work. Recently, we faced one challenge that is important to address. Spark Context: whenever needed, one can initialize the Spark Context in a .py file and reuse it.
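The "initialize once and reuse" idea above is essentially a lazy singleton. Here is a minimal sketch of that pattern; the real body would build a `SparkSession` (shown in a comment), but a stand-in object is returned here so the pattern runs without a Spark installation:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def get_spark():
    """Build the context once; every later call returns the same object."""
    # In a real PySpark project this body would be, e.g.:
    #   from pyspark.sql import SparkSession
    #   return SparkSession.builder.appName("my-app").getOrCreate()
    # A plain object keeps this sketch runnable without Spark installed.
    return object()

# Any module can call get_spark(); the context is created only once.
a, b = get_spark(), get_spark()
print(a is b)  # → True: the cached instance is reused
```

`getOrCreate()` already deduplicates sessions within a JVM, but wrapping it in one cached accessor keeps configuration in a single place across your modules.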

AWS IOT Project with ESP8266

Hello there! Today I completed a POC using an ESP8266 as both a slave and a client, interfacing with hardware to provide IoT support. I wish to make this project compatible with AWS so that anyone can deploy their models to S3 and serve them as a microservice via EC2. I am looking for research enthusiasts who can work with me. Requisites: C++, Scala, Python, AWS architecture.

Text Clustering at scale

Hello there! Today we will explore an overview of Databricks clusters and how to run the model using a community account. Prerequisite: a basic understanding of programming in Python or Scala. Knowledge of or experience with Java, SQL, or PySpark can be beneficial but is not essential. Objective: after reading this blog, readers will be able to use the core Spark APIs to operate on text data, and build data pipelines and query large data sets using Spark SQL and DataFrames.
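To make the objective concrete, here is a small local sketch of the kind of text operations Spark would run at scale: tokenize documents, build term-frequency vectors, and greedily group them by cosine similarity. All names and the threshold are illustrative assumptions; on Databricks the same steps would use Spark DataFrames and MLlib rather than plain Python:

```python
import math
import re
from collections import Counter

def tf_vector(text):
    """Lower-case, tokenize on word characters, count term frequencies."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in set(u) & set(v))
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0

def cluster(texts, threshold=0.5):
    """Greedy clustering: attach each text to the first cluster whose
    representative is similar enough, else start a new cluster."""
    vectors = [tf_vector(t) for t in texts]
    clusters = []  # each cluster is a list of document indices
    for i, v in enumerate(vectors):
        for members in clusters:
            if cosine(v, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters

docs = ["spark runs on clusters",
        "spark clusters run jobs",
        "keras builds neural networks"]
print(cluster(docs))  # → [[0, 1], [2]]
```

At scale, the tokenization step maps naturally onto a Spark transformation, and the pairwise similarity is where a cluster pays off over a single machine.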

Data Fingerprinting with Text Data

Query: What would be the correct methodology to detect whether a statement or a word means a "yes" or a "no" using NLP? Answer: I can't disclose the precise solution, since it is the main essence of my research [1]; nonetheless, I can give you some pointers. Our research achieves an accuracy of 85–90% when tested with unseen data. Here you can assume that the test data comprises sentences or excerpts that can be answered with yes or no.
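Since the research method itself is not disclosed above, here is only a naive keyword baseline for the yes/no task, useful as a starting point for comparison. The cue-word lists and the tie-breaking rule are assumptions of this sketch, not the method from the research:

```python
import re

# Illustrative cue words only -- not the vocabulary used in the research above.
YES_CUES = {"yes", "yeah", "yep", "sure", "definitely", "absolutely", "ok", "okay"}
NO_CUES = {"no", "nope", "never", "not", "nah", "negative"}

def yes_no_baseline(text):
    """Return 'yes', 'no', or 'unknown' by counting cue words.

    A serious NLP approach would use a trained classifier or sentence
    embeddings; this is just a transparent baseline to measure against.
    """
    tokens = re.findall(r"\w+", text.lower())
    yes_hits = sum(t in YES_CUES for t in tokens)
    no_hits = sum(t in NO_CUES for t in tokens)
    if yes_hits > no_hits:
        return "yes"
    if no_hits > yes_hits:
        return "no"
    return "unknown"

print(yes_no_baseline("Yeah, sure, I can do that"))    # → yes
print(yes_no_baseline("I would never agree to that"))  # → no
```

Such a baseline fails on negated affirmations ("not sure"), which is exactly where learned models earn their accuracy.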

Learning NLP

Query: How does one learn natural language processing from scratch? Answer: Learn in the classical way: 1. Introduction to Information Retrieval (Manning) 2. Foundations of Statistical Natural Language Processing (Manning) These two books by Manning will give you a base and cover all the topics needed. Beyond that, if you want, you can take the Natural Language Processing course by Pushpak Bhattacharyya on NPTEL.

Handling Large text data in production

Question: I have a single 5 GB JSON file consisting of millions of queries (questions) with their answers. Can anyone please tell me: 1) How do I handle such large datasets? 2) The objective is to find the top 5 queries, along with their answers, most similar to a newly provided test query. Answer: I can suggest one option which I implemented: PySpark for data processing. If you want to keep this simple, you can use PySpark to load the dataset and perform operations accordingly.
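A minimal local sketch of objective 2, using only the standard library: Jaccard similarity over token sets plus `heapq.nlargest` for the top 5. The record field names (`question`, `answer`) and the similarity measure are assumptions of this sketch; with PySpark the same idea would be expressed over a DataFrame:

```python
import heapq
import re

def tokens(text):
    return set(re.findall(r"\w+", text.lower()))

def jaccard(a, b):
    """Overlap of two token sets, 0.0 when both are empty."""
    return len(a & b) / len(a | b) if a | b else 0.0

def top_k_similar(records, query, k=5):
    """records: iterable of {'question': ..., 'answer': ...} dicts.
    Returns the k records whose question is most similar to the query."""
    q = tokens(query)
    return heapq.nlargest(k, records,
                          key=lambda r: jaccard(tokens(r["question"]), q))

# Tiny in-memory example; a 5 GB file would instead be streamed record by
# record (e.g. JSON Lines) or loaded with PySpark as suggested above.
data = [
    {"question": "how to install spark", "answer": "use pip or a prebuilt binary"},
    {"question": "how to install keras", "answer": "pip install keras"},
    {"question": "what is an rdd", "answer": "a resilient distributed dataset"},
]
best = top_k_similar(data, "install spark on linux", k=2)
print([r["question"] for r in best])  # → ['how to install spark', 'how to install keras']
```

`heapq.nlargest` keeps only k candidates in memory, so the scan stays O(n) in records even when n is in the millions.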