From ba0f250c80e3ee53bd452f264ac1b0c18b36be23 Mon Sep 17 00:00:00 2001
From: Pratik Choudhari <40862682+pratik-choudhari@users.noreply.github.com>
Date: Sun, 10 Jul 2022 17:36:43 +0530
Subject: [PATCH 1/3] Create 2022-07-10-python-pickle.md

---
 Content/posts/2022-07-10-python-pickle.md | 158 ++++++++++++++++++++++
 1 file changed, 158 insertions(+)
 create mode 100644 Content/posts/2022-07-10-python-pickle.md

diff --git a/Content/posts/2022-07-10-python-pickle.md b/Content/posts/2022-07-10-python-pickle.md
new file mode 100644
index 0000000..22846ac
--- /dev/null
+++ b/Content/posts/2022-07-10-python-pickle.md
@@ -0,0 +1,158 @@
+---
+title: Serialize Python objects using Pickle
+date: 2022-07-10 00:00
+description: Learn how to save complex Python objects to disk using the built-in pickle module.
+tags: Python, Basic
+path: python-pickle
+author: Pratik Choudhari
+---
+
+Python makes it easy to save complex data structures to disk.
+This process is called serialization, and in this article we take a deep dive into Python's [`pickle`](https://docs.python.org/3/library/pickle.html) module from the standard library.
+
+## What is object serialization?
+
+Object serialization is a technique through which a programming language converts its data structures into a format that can be stored on disk or sent over a network.
+Converting objects into bytes is often called marshalling or pickling; the latter term is used when the `pickle` module performs the task.
+A well-defined serialization protocol not only converts objects into bytes but can also reconstruct the original language-specific objects from those bytes.
+`marshal`, `pickle`, `dill` and `joblib` are a few of the Python modules that can serialize data.
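+
+To make the round trip concrete, here is a minimal sketch using `pickle.dumps()` and `pickle.loads()`, both covered in detail below. The dictionary is arbitrary example data.
+
+```python
+import pickle
+
+# A nested structure mixing several built-in types (example data)
+original = {"name": "john", "scores": (90, 85), "tags": {"admin", "staff"}}
+
+# Serialization: the object becomes a plain bytes object
+payload = pickle.dumps(original)
+print(type(payload))  # <class 'bytes'>
+
+# Deserialization: the bytes are turned back into an equal, but separate, object
+restored = pickle.loads(payload)
+print(restored == original)  # True
+print(restored is original)  # False
+```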
+
+## The `pickle` module
+
+The `pickle` module serializes objects in a Python-specific format. Unlike JSON, which is supported by many languages and technologies, pickled objects can only be read back by Python.
+Apart from common data structures like dictionaries, tuples, lists and sets, it can also serialize third-party data structures such as pandas DataFrames and NumPy arrays.
+
+### Pickling protocols
+
+There are six protocols (0 to 5) that `pickle` can use for the conversion; the latest one is protocol 5, introduced in Python 3.8.
+Each new protocol version added support for more kinds of objects and improved conversion speed.
+
+- Protocol version 0 is the original “human-readable” protocol and is backwards compatible with earlier versions of Python.
+- Protocol version 1 is an old binary format which is also compatible with earlier versions of Python.
+- Protocol version 2 was introduced in Python 2.3. It provides much more efficient pickling of new-style classes.
+- Protocol version 3 was added in Python 3.0. It has explicit support for bytes objects and cannot be unpickled by Python 2.x. This was the default protocol in Python 3.0–3.7.
+- Protocol version 4 was added in Python 3.4. It adds support for very large objects, pickling more kinds of objects, and some data format optimizations. It is the default protocol starting with Python 3.8.
+- Protocol version 5 was added in Python 3.8. It adds support for out-of-band data and speedup for in-band data.
+
+`pickle.DEFAULT_PROTOCOL` holds the protocol that `pickle` uses by default in the running Python version.
+
+### Pickling objects
+
+The module provides four main functions:
+
+1. `pickle.dump()`
+2. `pickle.load()`
+3. `pickle.dumps()`
+4. `pickle.loads()`
+
+The functions ending in "s" work with `bytes` objects in memory, whereas the other two work with file objects.
+Let's look at each of them with a code example.
+
+**1. pickle.dump()**
+
+The arguments accepted are:
+1. obj: required, object to serialize
+2. file: required, file object opened in binary write mode
+3. protocol: optional, pickle protocol version to use
+4. fix_imports: optional, if True and protocol is lower than 3, pickle tries to map the new Python 3 names to the old Python 2 names
+
+```python
+import pickle
+from pandas import DataFrame
+
+obj = {"list": [1, 2], "tuple": (1, 2), "set": {1, 2}, "dict": {1: 3, 2: 4},
+       "dataframe": DataFrame({"first name": ["john"], "last name": ["doe"]})}
+
+print(f"Using protocol {pickle.DEFAULT_PROTOCOL} to serialize")
+with open("my_pickle.pkl", "wb") as fp:
+    pickle.dump(obj, fp)
+print("Done")
+```
+
+Output on Python 3.6:
+
+```console
+Using protocol 3 to serialize
+Done
+```
+
+The script runs as is (pandas must be installed); on execution, a `my_pickle.pkl` file is created in the working directory.
+The file must be opened in write-binary mode (`"wb"`) because pickle produces bytes.
+Functions and classes are pickled by reference: pickle stores only their qualified name, not their code, so the same function or class must be importable when the data is unpickled.
+
+**2. pickle.load()**
+
+This function deserializes data, the reverse of what `pickle.dump()` does. The protocol version is detected automatically.
+
+The arguments accepted are:
+1. file: required, file object opened in binary read mode
+2. fix_imports: optional, if True pickle tries to map the old Python 2 names to the new names used in Python 3
+3. encoding: optional, tells pickle how to decode 8-bit string instances pickled by Python 2
+
+```python
+import pickle
+
+with open("my_pickle.pkl", "rb") as fp:
+    obj = pickle.load(fp)
+
+for key, value in obj.items():
+    print(f"Key={key} Value={value} Type={type(value)}")
+```
+
+Output:
+
+```console
+Key=list Value=[1, 2] Type=<class 'list'>
+Key=tuple Value=(1, 2) Type=<class 'tuple'>
+Key=set Value={1, 2} Type=<class 'set'>
+Key=dict Value={1: 3, 2: 4} Type=<class 'dict'>
+Key=dataframe Value=  first name last name
+0       john       doe Type=<class 'pandas.core.frame.DataFrame'>
+```
+
+**3. pickle.dumps()**
+
+Whereas `pickle.dump()` writes to a file, `pickle.dumps()` returns the pickled data as a `bytes` object.
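+
+As a quick check that the return value is `bytes` rather than `str`, and to see the effect of the optional `protocol` argument, here is a small sketch. It relies on a detail of the pickle format: pickles written with protocol 2 or higher start with the `PROTO` opcode (`0x80`) followed by the protocol number. `pickle.HIGHEST_PROTOCOL` is the newest protocol the running interpreter supports.
+
+```python
+import pickle
+
+obj = {"list": [1, 2]}
+
+# dumps() always returns bytes
+print(type(pickle.dumps(obj)))  # <class 'bytes'>
+
+# For protocols 2 and above, the first two bytes are 0x80 and the protocol number
+for protocol in range(2, pickle.HIGHEST_PROTOCOL + 1):
+    header = pickle.dumps(obj, protocol=protocol)[:2]
+    print(protocol, header)
+```
+
+On Python 3.8 or later this prints one line per protocol from 2 to 5, for example `3 b'\x80\x03'`.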
+
+The arguments accepted are:
+1. obj: required, object to serialize
+2. protocol: optional, pickle protocol version to use
+3. fix_imports: optional, if True and protocol is lower than 3, pickle tries to map the new Python 3 names to the old Python 2 names
+
+```python
+import pickle
+
+obj = {"list": [1, 2]}
+
+print(f"Using protocol {pickle.DEFAULT_PROTOCOL} to serialize")
+print(pickle.dumps(obj))
+```
+
+Output on Python 3.6:
+
+```console
+Using protocol 3 to serialize
+b'\x80\x03}q\x00X\x04\x00\x00\x00listq\x01]q\x02(K\x01K\x02es.'
+```
+
+**4. pickle.loads()**
+
+`pickle.loads()` takes the `bytes` object returned by `pickle.dumps()` and returns the Python object reconstructed from it.
+
+The arguments accepted are:
+1. data: required, pickled data as a bytes-like object
+2. fix_imports: optional, if True pickle tries to map the old Python 2 names to the new names used in Python 3
+3. encoding: optional, tells pickle how to decode 8-bit string instances pickled by Python 2
+
+```python
+import pickle
+
+print(pickle.loads(b'\x80\x03}q\x00X\x04\x00\x00\x00listq\x01]q\x02(K\x01K\x02es.'))
+```
+
+Output:
+
+```console
+{'list': [1, 2]}
+```

From 7a2be56155197eb7f660c7ca83ca5e1c3505c700 Mon Sep 17 00:00:00 2001
From: Pratik Choudhari <40862682+pratik-choudhari@users.noreply.github.com>
Date: Mon, 11 Jul 2022 20:34:19 +0530
Subject: [PATCH 2/3] Create 2022-07-11-python-requests.md

---
 Content/posts/2022-07-11-python-requests.md | 119 ++++++++++++++++++++
 1 file changed, 119 insertions(+)
 create mode 100644 Content/posts/2022-07-11-python-requests.md

diff --git a/Content/posts/2022-07-11-python-requests.md b/Content/posts/2022-07-11-python-requests.md
new file mode 100644
index 0000000..269143f
--- /dev/null
+++ b/Content/posts/2022-07-11-python-requests.md
@@ -0,0 +1,119 @@
+---
+title: A guide to the requests module in Python
+date: 2022-07-11 00:00
+description: An easy way to make REST calls with an easy-to-use library.
+tags: Python, Basic
+path: python-requests
+author: Pratik Choudhari
+---
+
+To make REST API calls, Python has the [`urllib`](https://docs.python.org/3/library/urllib.html) standard module, but its interface is relatively difficult to use compared to the third-party [`requests`](https://requests.readthedocs.io/en/latest/) library.
+In this article, we will learn how to use `requests` to make different types of REST calls, attach payloads and parse the response.
+
+## Installation
+
+`requests` can be installed from PyPI using the pip package manager. It is recommended to install third-party libraries in a virtual environment.
+
+For Linux:
+
+```console
+pip3 install requests
+```
+
+For Windows:
+
+```console
+pip install requests
+```
+
+## Making a request
+
+The module provides a function for each of the common HTTP methods:
+1. GET: `requests.get()`
+2. POST: `requests.post()`
+3. PUT: `requests.put()`
+4. PATCH: `requests.patch()`
+5. DELETE: `requests.delete()`
+6. HEAD: `requests.head()`
+7. OPTIONS: `requests.options()`
+
+Let's make a simple GET request.
+
+```python
+import requests
+
+response = requests.get("https://hub.dummyapis.com/employee?noofRecords=10&idStarts=1001")
+print(response.text)
+```
+
+That's it! Other types of requests work the same way: call the function named after the HTTP verb, such as `requests.post()`.
+
+## Send data via query parameters, body or headers
+
+Each of these functions accepts, among others, the following optional parameters:
+- data: dict, form-encoded and sent in the request body (use the `json` parameter instead to send a JSON body)
+- params: dict, encoded into the query string of the request URL
+- headers: dict, sets custom headers or overrides default ones
+
+Example:
+
+```python
+import requests
+
+query_params = {"noofRecords": 10, "idStarts": 1001}
+headers = {"user-agent": "my-app/0.0.1"}
+body = {}
+
+response = requests.get("https://hub.dummyapis.com/employee", params=query_params, headers=headers, data=body)
+print(response.text)
+```
+
+## Analyzing response
+
+A `requests.Response` object is returned whenever the server replies, including for error status codes such as 404.
+The status code of a response is available as the integer `response.status_code` attribute and is one of the [HTTP response status codes](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status).
+
+Next, we will see the different ways in which the response can be parsed. These attributes and methods retrieve the response data:
+- `response.json()`: parses the JSON body and returns the resulting Python object (for example a dict or a list)
+- `response.text`: the response body as a string
+- `response.content`: the response body as bytes
+- `response.headers`: a case-insensitive dictionary of response headers
+
+Example:
+
+```python
+import requests
+from pprint import pprint
+
+query_params = {"noofRecords": 1, "idStarts": 1}
+
+response = requests.get("https://hub.dummyapis.com/employee", params=query_params)
+print("Response status code:", response.status_code)
+print("Response as json:")
+pprint(response.json())
+
+print("Response as text:")
+print(response.text)
+
+print("Response as bytes:")
+print(response.content)
+
+print("Response headers:")
+print(response.headers)
+```
+
+## Authentication
+
+Basic HTTP authentication works out of the box with requests.
+Pass the username and password as a tuple in the `auth` parameter, or pass an instance of `requests.auth.HTTPBasicAuth`.
+
+```python
+import requests
+from requests.auth import HTTPBasicAuth
+
+# Credentials passed as a (username, password) tuple
+response = requests.get("http://dummy-api.com/records", auth=("user", "pass"))
+
+# OR, equivalently, as an HTTPBasicAuth object
+basic = HTTPBasicAuth("user", "pass")
+response = requests.get("http://dummy-api.com/records", auth=basic)
+```

From 62d880ebbf57e75cc8e2e9f3c6cc309294d299ab Mon Sep 17 00:00:00 2001
From: Pratik Choudhari <40862682+pratik-choudhari@users.noreply.github.com>
Date: Wed, 20 Jul 2022 22:36:21 +0530
Subject: [PATCH 3/3] Create 2022-07-20-python-httpx.md

---
 Content/posts/2022-07-20-python-httpx.md | 96 ++++++++++++++++++++++++
 1 file changed, 96 insertions(+)
 create mode 100644 Content/posts/2022-07-20-python-httpx.md

diff --git a/Content/posts/2022-07-20-python-httpx.md b/Content/posts/2022-07-20-python-httpx.md
new file mode 100644
index 0000000..12818cf
--- /dev/null
+++ b/Content/posts/2022-07-20-python-httpx.md
@@ -0,0 +1,96 @@
+---
+title: Getting started with the HTTPX module in Python
+date: 2022-07-20 00:00
+description: A beginner's guide to an async-capable HTTP client library.
+tags: Python, Basic
+path: python-httpx
+author: Pratik Choudhari
+---
+
+[HTTPX](https://www.python-httpx.org/) is a module that can make async requests in Python, unlike the built-in `urllib` module or the third-party `requests` library, which only offer a synchronous interface.
+In this article, we will explore how async requests are made using the library.
+
+## Installation
+
+HTTPX supports Python 3.6 and above. Install it via pip:
+
+```console
+pip install httpx
+```
+
+Note: Linux users should use `pip3` instead of `pip`.
+
+## Sync requests
+
+The sync interface of HTTPX largely mirrors the `requests` library; the main difference is in how streaming responses are handled, which we look into in a section below.
+To get started with the requests library, check out this comprehensive guide.
+
+Example:
+
+```python
+import httpx
+
+response = httpx.get("https://hub.dummyapis.com/employee?noofRecords=10&idStarts=1001")
+print(response.text)
+```
+
+Calls for the other request types are made the same way, for example with `httpx.post()` or `httpx.put()`.
+
+## Async requests
+
+This is where HTTPX shines. Three async environments are supported, namely asyncio, trio and anyio; asyncio is part of the standard library, while the other two need to be installed separately.
+Unlike sync requests, async requests need an `AsyncClient` to be initialized before making any calls. A client also brings advantages such as HTTP connection pooling, cookie persistence and reduced latency and CPU usage.
+
+```python
+import asyncio
+import httpx
+
+async def main():
+    async with httpx.AsyncClient() as aclient:
+        response = await aclient.get("https://hub.dummyapis.com/employee?noofRecords=10&idStarts=1001")
+        print(response.text)
+
+asyncio.run(main())
+```
+
+Because we are using an async client, each request call must be awaited.
+Using `AsyncClient` as a context manager is recommended; otherwise, close it explicitly with `await aclient.aclose()`.
+
+## Streaming requests
+
+To stream a response instead of loading the whole body into memory at once, use the `stream()` method rather than the HTTP verb functions.
+
+```python
+import httpx
+
+chunks = []
+with httpx.stream("GET", "https://speed.hetzner.de/100MB.bin") as response:
+    # The body is read piece by piece as it arrives
+    for chunk in response.iter_bytes():
+        chunks.append(chunk)
+data = b"".join(chunks)  # joining once avoids repeatedly copying a growing bytes object
+print(len(data))
+```
+
+The variable `data` will contain the response body as bytes.
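+
+For a download as large as this one, keeping every chunk in memory may be undesirable. Below is a minimal sketch that writes each chunk straight to a file instead; the `download()` helper and its parameters are hypothetical names chosen for this example, and passing in an `httpx.Client` lets the connection be reused across downloads.
+
+```python
+import httpx
+
+def download(url, path, client):
+    """Hypothetical helper: stream `url` into the file at `path`, return the number of bytes written."""
+    written = 0
+    with client.stream("GET", url) as response:
+        response.raise_for_status()  # fail early on 4xx/5xx instead of saving an error page
+        with open(path, "wb") as fp:
+            for chunk in response.iter_bytes():
+                fp.write(chunk)  # only one chunk is held in memory at a time
+                written += len(chunk)
+    return written
+```
+
+It can be called as `with httpx.Client() as client: download("https://speed.hetzner.de/100MB.bin", "100MB.bin", client)`.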
+
+A streaming response can be iterated in four ways:
+- `response.iter_raw()`: read the raw bytes, without undoing any content encoding such as gzip
+- `response.iter_text()`: read the data as decoded text
+- `response.iter_bytes()`: read the data as bytes, with content encoding undone
+- `response.iter_lines()`: read the data as lines of text
+
+Async version of the streaming example:
+
+```python
+import asyncio
+import httpx
+
+async def main():
+    chunks = []
+    async with httpx.AsyncClient() as aclient:
+        async with aclient.stream("GET", "https://speed.hetzner.de/100MB.bin") as response:
+            # aiter_bytes() is the async counterpart of iter_bytes()
+            async for chunk in response.aiter_bytes():
+                chunks.append(chunk)
+    data = b"".join(chunks)
+    print(len(data))
+
+asyncio.run(main())
+```
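+
+The other iterators follow the same pattern. One benefit of streaming is that a download can be abandoned early. As a sketch, the hypothetical `first_lines()` helper below uses `aiter_lines()`, the async counterpart of `iter_lines()`, to collect only the first few lines of a text response; leaving the `async with` block closes the response without reading the rest of the body.
+
+```python
+import asyncio
+import httpx
+
+async def first_lines(url, client, limit=5):
+    """Hypothetical helper: return the first `limit` lines of a text response."""
+    lines = []
+    async with client.stream("GET", url) as response:
+        async for line in response.aiter_lines():
+            # Strip any line ending so the result does not depend on the HTTPX version
+            lines.append(line.rstrip("\r\n"))
+            if len(lines) >= limit:
+                break  # stop early; the rest of the body is not read
+    return lines
+```
+
+Inside a coroutine it is used as `async with httpx.AsyncClient() as aclient: print(await first_lines(url, aclient))`, where `url` points at any text resource.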