158 changes: 158 additions & 0 deletions Content/posts/2022-07-10-python-pickle.md
@@ -0,0 +1,158 @@
---
title: Serialize Python objects using Pickle
date: 2022-07-10 00:00
description: Learn how to save complex Python objects to disk using the built-in pickle module.
tags: Python, Basic
path: python-pickle
author: Pratik Choudhari
---

The ability to save complex data structures to disk with very little code is one of the major conveniences of Python.
The process is called serialization, and in this article we take a deep dive into Python's [`pickle`](https://docs.python.org/3/library/pickle.html) module, which is included in the standard library.

## What is object serialization?

Object serialization is a technique through which a programming language converts its data structures into a format that can be persisted on disk or sent over a network.
The process of converting objects into bytes is often referred to as marshalling or pickling; the latter term applies when the `pickle` module performs the task.
A well-defined serialization protocol not only converts objects into bytes but can also reconstruct language-specific objects from those bytes.
In Python, the standard-library modules `marshal` and `pickle` and the third-party libraries `dill` and `joblib` can all serialize data.

## The `pickle` module

The `pickle` module serializes objects in a Python-specific format. Unlike JSON, which is supported by many languages and technologies, pickled objects can only be read back in Python.
Apart from common data structures such as dictionaries, tuples, lists and sets, users can serialize third-party data structures such as pandas DataFrames and NumPy arrays.

### Pickling protocols

There are six protocols that `pickle` can use for the conversion, numbered 0 to 5; the latest, protocol 5, was introduced in Python 3.8.
Newer protocol versions extended the kinds of objects that can be serialized and improved conversion speed.

- Protocol version 0 is the original “human-readable” protocol and is backwards compatible with earlier versions of Python.
- Protocol version 1 is an old binary format which is also compatible with earlier versions of Python.
- Protocol version 2 was introduced in Python 2.3. It provides much more efficient pickling of new-style classes.
- Protocol version 3 was added in Python 3.0. It has explicit support for bytes objects and cannot be unpickled by Python 2.x. This was the default protocol in Python 3.0–3.7.
- Protocol version 4 was added in Python 3.4. It adds support for very large objects, pickling more kinds of objects, and some data format optimizations. It is the default protocol starting with Python 3.8.
- Protocol version 5 was added in Python 3.8. It adds support for out-of-band data and speedup for in-band data.

Use `pickle.DEFAULT_PROTOCOL` to get the protocol that `pickle` uses by default in the running Python version.
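As a quick check of the protocols listed above, the sketch below prints the default and highest protocol of the running interpreter and the size of the same object under each protocol. The `sample` dictionary is only an illustrative value, and the sizes will vary with the Python version.

```python
import pickle

print("Default protocol:", pickle.DEFAULT_PROTOCOL)
print("Highest protocol:", pickle.HIGHEST_PROTOCOL)

# Illustrative object; any picklable value would do
sample = {"numbers": list(range(100))}

# Serialize the same object once per protocol supported by this interpreter
sizes = {
    protocol: len(pickle.dumps(sample, protocol=protocol))
    for protocol in range(pickle.HIGHEST_PROTOCOL + 1)
}
for protocol, size in sizes.items():
    print(f"protocol {protocol}: {size} bytes")
```

Passing `protocol=pickle.HIGHEST_PROTOCOL` is a common choice when the pickle only needs to be read by the same Python version that wrote it.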

### Pickling objects

The module's core interface consists of four functions:

1. `pickle.dump()`
2. `pickle.load()`
3. `pickle.dumps()`
4. `pickle.loads()`

The functions with an `s` at the end work with `bytes` objects, whereas the others work with file objects.
We will now understand these functions using code examples.
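Before going through each function, here is a minimal sketch of how the two pairs relate. It uses an in-memory `io.BytesIO` buffer as a stand-in for a real file, which is an assumption made only to keep the example self-contained.

```python
import io
import pickle

record = {"list": [1, 2], "tuple": (1, 2)}

# dumps()/loads(): the pickled data is handed around as a bytes object
as_bytes = pickle.dumps(record)
from_bytes = pickle.loads(as_bytes)

# dump()/load(): the pickled data is written to and read from a binary file object
buffer = io.BytesIO()
pickle.dump(record, buffer)
buffer.seek(0)  # rewind so load() starts reading at the beginning
from_file = pickle.load(buffer)

print(type(as_bytes))
print(from_bytes == from_file == record)
```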

**1. pickle.dump()**

The arguments accepted are:
1. obj: required, object to serialize
2. file: required, file object opened for writing in binary mode
3. protocol: optional, pickle protocol version to use
4. fix_imports: optional, if True and the protocol is lower than 3, pickle maps the new Python 3 names to the old module names used in Python 2

```python
import pickle
from pandas import DataFrame

obj = {"list": [1, 2], "tuple": (1, 2), "set": {1, 2}, "dict": {1: 3, 2: 4},
       "dataframe": DataFrame({"first name": ["john"], "last name": ["doe"]})}

print(f"Using protocol {pickle.DEFAULT_PROTOCOL} to serialize")
# Pickle data is binary, so the file must be opened in "wb" mode
with open("my_pickle.pkl", "wb") as fp:
    pickle.dump(obj, fp)
print("Done")
```

Output on Python 3.6:

```console
Using protocol 3 to serialize
Done
```

This script runs as is; on execution, a `my_pickle.pkl` file is created in the working directory.
To create a pickle, the file must be opened in write-binary mode because pickle converts data to bytes.
Functions and classes are pickled by reference: pickle stores only their module and qualified name, not their code. The pickle alone therefore cannot reconstruct a function or class; it must be importable in the environment where the data is unpickled.
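The by-reference behaviour can be observed directly. The sketch below pickles the standard-library function `json.dumps` (chosen only because it is importable everywhere) and then tries to pickle a lambda, which has no importable name. The exact exception type for the lambda can differ between Python versions, so both likely types are caught.

```python
import json
import pickle

# Pickling a function stores where to find it, not its code
payload = pickle.dumps(json.dumps)
print(b"json" in payload, b"dumps" in payload)

# Unpickling imports the module and looks the name up again,
# so the very same function object comes back
restored = pickle.loads(payload)
print(restored is json.dumps)

# A lambda cannot be looked up by name, so pickling it fails
lambda_failed = False
try:
    pickle.dumps(lambda x: x)
except (pickle.PicklingError, AttributeError) as exc:
    lambda_failed = True
    print("Cannot pickle lambda:", type(exc).__name__)
```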

**2. pickle.load()**

This function performs deserialization, the opposite of `pickle.dump()`. The pickle protocol version is detected automatically.

The arguments accepted are:
1. file: required, file object opened for reading in binary mode
2. fix_imports: optional, if True pickle maps old Python 2 names to the new names used in Python 3
3. encoding: optional, tells pickle how to decode 8-bit string instances pickled by Python 2

```python
import pickle

# Pickle data is binary, so the file must be opened in "rb" mode
with open("my_pickle.pkl", "rb") as fp:
    obj = pickle.load(fp)

for key, value in obj.items():
    print(f"Key={key} Value={value} Type={type(value)}")
```

Output:

```console
Key=list Value=[1, 2] Type=<class 'list'>
Key=tuple Value=(1, 2) Type=<class 'tuple'>
Key=set Value={1, 2} Type=<class 'set'>
Key=dict Value={1: 3, 2: 4} Type=<class 'dict'>
Key=dataframe Value= first name last name
0 john doe Type=<class 'pandas.core.frame.DataFrame'>
```
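To illustrate the automatic protocol detection, the following sketch writes the same object once per available protocol and reads each file back without telling `pickle.load()` which protocol was used. The temporary directory and the `sample_p{N}.pkl` file names are illustrative choices, not part of the original example.

```python
import os
import pickle
import tempfile

original = {"list": [1, 2], "set": {1, 2}}
restored_by_protocol = {}

with tempfile.TemporaryDirectory() as tmp_dir:
    for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
        path = os.path.join(tmp_dir, f"sample_p{protocol}.pkl")
        with open(path, "wb") as fp:
            pickle.dump(original, fp, protocol=protocol)
        # No protocol argument here: load() works out how to decode the data
        with open(path, "rb") as fp:
            restored_by_protocol[protocol] = pickle.load(fp)

for protocol, restored in restored_by_protocol.items():
    print(f"protocol {protocol}: round trip ok = {restored == original}")
```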

**3. pickle.dumps()**

Whereas `pickle.dump()` writes data to a file, `pickle.dumps()` returns the pickled data as a `bytes` object.

The arguments accepted are:
1. obj: required, object to serialize
2. protocol: optional, pickle protocol version to use
3. fix_imports: optional, if True and the protocol is lower than 3, pickle maps the new Python 3 names to the old module names used in Python 2

```python
import pickle

obj = {"list": [1, 2]}

print(f"Using protocol {pickle.DEFAULT_PROTOCOL} to serialize")
print(pickle.dumps(obj))
```

Output on Python 3.6:

```console
Using protocol 3 to serialize
b'\x80\x03}q\x00X\x04\x00\x00\x00listq\x01]q\x02(K\x01K\x02es.'
```

**4. pickle.loads()**

The `bytes` object returned by `pickle.dumps()` is passed to `pickle.loads()`, which returns a Python object reconstructed from the pickled data.

The arguments accepted are:
1. data: required, pickled data as a bytes-like object
2. fix_imports: optional, if True pickle maps old Python 2 names to the new names used in Python 3
3. encoding: optional, tell pickle how to decode 8-bit string instances pickled by Python 2

```python
import pickle

print(pickle.loads(b'\x80\x03}q\x00X\x04\x00\x00\x00listq\x01]q\x02(K\x01K\x02es.'))
```

Output:

```console
{'list': [1, 2]}
```
119 changes: 119 additions & 0 deletions Content/posts/2022-07-11-python-requests.md
@@ -0,0 +1,119 @@
---
title: A guide to requests module in Python
date: 2022-07-11 00:00
description: A simple way to make REST calls through an easy-to-use library.
tags: Python, Basic
path: python-requests
author: Pratik Choudhari
---

To make REST API calls, Python has the standard [`urllib`](https://docs.python.org/3/library/urllib.html) module, but its interface is relatively difficult to use compared to the third-party [`requests`](https://requests.readthedocs.io/en/latest/) library.
In this article, we will learn how to use `requests` to make different types of REST calls, attach payloads and parse the response.

## Installation

`requests` can be installed from PyPI using the pip package manager. It is recommended to install third-party libraries in virtual environments.

For Linux:

```console
pip3 install requests
```

For Windows:

```console
pip install requests
```

## Making a request

The module provides a helper function for each of seven HTTP request types:
1. GET
2. POST
3. PUT
4. PATCH
5. DELETE
6. HEAD
7. OPTIONS

Let's make a simple GET request.

```python
import requests

response = requests.get("https://hub.dummyapis.com/employee?noofRecords=10&idStarts=1001")
print(response.text)
```

That's it! Similarly, to make other types of requests, replace `get` with the corresponding lowercase HTTP verb, for example `requests.post()`.

## Send data via query parameters, body or headers

Every `requests` HTTP method accepts, among others, three optional parameters:
- data: dict, sent in the request body as form-encoded data; to send a JSON body, pass the dict via the `json` parameter instead
- params: dict, encoded as query parameters and appended to the request URL
- headers: dict, sets custom headers or overrides default ones

Example:

```python
import requests

query_params = {"noofRecords": 10, "idStarts": 1001}
headers = {"user-agent": "my-app/0.0.1"}
body = {}

response = requests.get("https://hub.dummyapis.com/employee", params=query_params, headers=headers, data=body)
print(response.text)
```
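To see how these parameters end up in the outgoing request without contacting any server, one option is to build a `requests.Request` and call `prepare()` on it. This is a minimal sketch: the POST payloads and the idea of posting to this endpoint are hypothetical, and nothing is actually sent.

```python
import json
import requests

url = "https://hub.dummyapis.com/employee"

# data=: the dict is form-encoded into the request body
form_request = requests.Request(
    "POST", url,
    params={"noofRecords": 10, "idStarts": 1001},
    headers={"user-agent": "my-app/0.0.1"},
    data={"name": "john"},  # hypothetical payload
).prepare()

# json=: the dict is serialized to JSON and the content type is set accordingly
json_request = requests.Request("POST", url, json={"name": "john"}).prepare()

print(form_request.url)                      # query parameters appended to the URL
print(form_request.headers["Content-Type"])  # form encoding
print(form_request.body)
print(json_request.headers["Content-Type"])  # JSON encoding
print(json.loads(json_request.body))
```

Inspecting a prepared request this way can help when debugging why an API rejects a payload.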

## Analyzing response

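A `requests.Response` object is returned whenever the server sends back a response, including error responses such as 404 or 500; an exception is raised only when the request itself fails, for example on a connection error or timeout.
The status code of a response can be checked via the `response.status_code` attribute, which is an integer and will be one of the [HTTP response status codes](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status).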

Next, we will see the different ways in which a response can be read. In particular, there are four ways to retrieve response data:
- `response.json()`: parses the body as JSON and returns the resulting Python object
- `response.text`: returns the response body as a string
- `response.content`: returns the response body as bytes
- `response.headers`: returns a dictionary of response headers

Example:

```python
import requests
from pprint import pprint

query_params = {"noofRecords": 1, "idStarts": 1}

response = requests.get("https://hub.dummyapis.com/employee", params=query_params)
print("Response status code:", response.status_code)
print("Response as json:")
pprint(response.json())

print("Response as text:")
print(response.text)

print("Response as bytes:")
print(response.content)

print("Response headers:")
print(response.headers)
```
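Because error responses also arrive as ordinary `Response` objects, it is common to check `response.ok` or call `response.raise_for_status()` before parsing the body. The sketch below builds a `Response` by hand purely as an offline stand-in for a failed call; in real code the object would come from `requests.get()` or a similar function.

```python
import requests

# Hand-built response, used only so the example runs without a network call
response = requests.Response()
response.status_code = 404
response.url = "https://hub.dummyapis.com/employee"

print("ok:", response.ok)  # False for 4xx and 5xx status codes

error_message = None
try:
    # Raises HTTPError for 4xx and 5xx status codes, does nothing otherwise
    response.raise_for_status()
except requests.exceptions.HTTPError as exc:
    error_message = str(exc)
    print("Request failed:", error_message)
```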

## Authentication

Basic HTTP authentication works out of the box with `requests`.
This is done by passing the username and password as a tuple in the `auth` parameter or by passing an instance of `requests.auth.HTTPBasicAuth`.

```python
import requests
from requests.auth import HTTPBasicAuth

basic = HTTPBasicAuth('user', 'pass')
response = requests.get("http://dummy-api.com/records", auth=("user", "pass"))

# OR

response = requests.get("http://dummy-api.com/records", auth=basic)
```
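Both forms result in the same `Authorization` header. As a rough illustration, the sketch below prepares the two requests without sending them and decodes the header; the URL and credentials are the placeholder values from the example above.

```python
import base64
import requests
from requests.auth import HTTPBasicAuth

url = "http://dummy-api.com/records"

# prepare() builds the outgoing request without sending it
with_tuple = requests.Request("GET", url, auth=("user", "pass")).prepare()
with_object = requests.Request("GET", url, auth=HTTPBasicAuth("user", "pass")).prepare()

header = with_tuple.headers["Authorization"]
scheme, token = header.split(" ", 1)
decoded = base64.b64decode(token).decode()

print(scheme)   # authentication scheme
print(decoded)  # credentials are only base64-encoded, not encrypted
print(header == with_object.headers["Authorization"])
```

Since the credentials are merely encoded, basic authentication should be used over HTTPS in practice.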
96 changes: 96 additions & 0 deletions Content/posts/2022-07-20-python-httpx.md
@@ -0,0 +1,96 @@
---
title: Getting started with the HTTPX module in Python
date: 2022-07-20 00:00
description: A beginner's guide to an async-capable request library.
tags: Python, Basic
path: python-httpx
author: Pratik Choudhari
---

[HTTPX](https://www.python-httpx.org/) is a library capable of making async requests in Python, unlike the built-in `urllib` module or the third-party `requests` library, which offer only synchronous interfaces.
In this article, we will explore how async requests are made using the library.

## Installation

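HTTPX supports Python 3 only, and recent releases have dropped the oldest 3.x versions, so check the project documentation for the current minimum. Install it via pip: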

```console
pip install httpx
```

Note: Linux users should use pip3 instead of pip

## Sync requests

The sync interface of HTTPX is largely the same as that of the `requests` library. One exception is streaming responses, which are covered in a section below.
To get started with the requests library, check out this comprehensive guide.

Example:

```python
import httpx

response = httpx.get("https://hub.dummyapis.com/employee?noofRecords=10&idStarts=1001")
print(response.text)
```

Similarly, calls for other request types can be made.

## Async requests

This is where HTTPX shines. In total, three async environments are supported, namely asyncio, trio and anyio. Of these, asyncio is built in and the other two need to be installed explicitly.
Unlike sync requests, async requests need an `AsyncClient` to be initialized before making any calls. This brings added advantages such as HTTP connection pooling, cookie persistence and reduced memory and CPU usage.

```python
import asyncio
import httpx

async def main():
    # The context manager closes the client and its connections on exit
    async with httpx.AsyncClient() as aclient:
        response = await aclient.get("https://hub.dummyapis.com/employee?noofRecords=10&idStarts=1001")
        print(response.text)

asyncio.run(main())
```

As we are using an async client, the request call needs to be awaited.
It is recommended to use `AsyncClient` as a context manager. When it is created without one, the client must be closed manually with `await aclient.aclose()`.

## Streaming requests

To make a request whose response should be consumed as a stream, the HTTP verb functions cannot be used directly; the `stream()` method is used instead.

```python
import httpx

data = b''
# The response body is downloaded chunk by chunk inside the with block
with httpx.stream("GET", "https://speed.hetzner.de/100MB.bin") as response:
    for chunk in response.iter_bytes():
        data += chunk
print(data)
```

The variable `data` will contain response data as bytes.

Streaming response can be iterated in four ways:
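- `response.iter_raw()`: read the raw bytes as received, without content decoding such as gzip decompression
- `response.iter_text()`: read data as chunks of decoded text
- `response.iter_bytes()`: read data as chunks of bytes, with content decoding applied
- `response.iter_lines()`: read data as text, one line at a time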

Async version of streaming response:

```python
import asyncio
import httpx

async def main():
    data = b''
    async with httpx.AsyncClient() as aclient:
        # Async streaming uses the a-prefixed iterators, such as aiter_bytes()
        async with aclient.stream("GET", "https://speed.hetzner.de/100MB.bin") as response:
            async for chunk in response.aiter_bytes():
                data += chunk
    print(data)

asyncio.run(main())
```