US Department of Transportation (USDOT) Intelligent Transportation Systems (ITS) Joint Program Office (JPO) Message Deduplicator
The JPO-Deduplicator is a Java Spring Boot application that uses Kafka to reduce the number of messages stored and processed in the ODE system. It reads messages from an input topic (such as topic.ProcessedMap) and writes a subset of those messages to a related output topic (topic.DeduplicatedProcessedMap). Functionally, it removes duplicate messages from the input topic and passes on only unique messages. In addition, each topic passes on at least 1 message per hour even if the message is a duplicate. This behavior helps ensure messages are still flowing through the system.
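The forwarding rule described above (drop repeats, but let at least one message through per hour) can be sketched roughly as follows. This is a minimal illustration, not the application's actual implementation: the class `HourlyDedupGate`, the method `shouldForward`, and the idea of a caller-supplied `key` and `fingerprint` string are all hypothetical names introduced here.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of the JPO-Deduplicator code base. */
class HourlyDedupGate {
    // At least one message per hour is passed on, even if it is a duplicate.
    private static final Duration MAX_SILENCE = Duration.ofHours(1);

    // Last forwarded fingerprint and forwarding time, tracked per key
    // (the choice of key, for example one per message source, is an assumption).
    private final Map<String, String> lastFingerprint = new HashMap<>();
    private final Map<String, Instant> lastForwardedAt = new HashMap<>();

    /** Returns true when the message should be written to the output topic. */
    boolean shouldForward(String key, String fingerprint, Instant now) {
        String previous = lastFingerprint.get(key);
        Instant previousTime = lastForwardedAt.get(key);

        // A message is new if nothing was forwarded yet or its content changed.
        boolean isNew = previous == null || !previous.equals(fingerprint);
        // A duplicate is still forwarded once an hour has passed since the last forward.
        boolean silentTooLong = previousTime != null
                && Duration.between(previousTime, now).compareTo(MAX_SILENCE) >= 0;

        if (isNew || silentTooLong) {
            lastFingerprint.put(key, fingerprint);
            lastForwardedAt.put(key, now);
            return true;
        }
        return false; // duplicate inside the one-hour window: drop it
    }
}
```

In this sketch, repeated calls with the same key and fingerprint return `false` until either the fingerprint changes or an hour has elapsed since the last forwarded message.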
The following topics currently support deduplication.
- topic.OdeMapJson -> topic.DeduplicatedOdeMapJson
- topic.ProcessedMap -> topic.DeduplicatedProcessedMap
- topic.ProcessedMapWKT -> topic.DeduplicatedProcessedMapWKT
- topic.OdeTimJson -> topic.DeduplicatedOdeTimJson
- topic.OdeBsmJson -> topic.DeduplicatedOdeBsmJson
- topic.ProcessedBsm -> topic.DeduplicatedProcessedBsm
- topic.ProcessedSpat -> topic.DeduplicatedProcessedSpat
The processes that determine which messages are duplicates are unique and customized for each message type. The following is a detailed explanation of each message type's criteria for when messages are deduplicated. All of the criteria must be met for a single message type.
- Two messages within a 1 hour time window
- Two messages have the same intersection ID
- Two messages within a 1 hour time window
- Two messages have the same hash values after factoring out the odeReceivedAt and timeStamp fields
- Two messages within a 1 hour time window
- Two messages have the same packet ID
- Two messages have the same msgCnt value
- Two messages within the time threshold defined by the `odeBsmMaximumTimeDelta` value in application.yaml
- The new message's speed is identical to the previous message's speed
- The new message's speed is below the speed threshold defined by the `odeBsmAlwaysIncludeAtSpeed` value in application.yaml
- The position delta between the messages is not suitably large, as defined by the `odeBsmMaximumPositionDelta` value in application.yaml, or position information is null
- Two messages within the time threshold defined by the `odeBsmMaximumTimeDelta` value in application.yaml
- The new speed is below the speed threshold defined by the `odeBsmAlwaysIncludeAtSpeed` value in application.yaml
- The position delta between the messages is not suitably large, as defined by the `odeBsmMaximumPositionDelta` value in application.yaml, or position information is null
- Two messages within 1 minute time window
- Signal states are identical
- Signal light phases are identical
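As an illustration of how the BSM criteria combine, the three checks shared by both BSM lists above (time threshold, speed threshold, position delta) can be sketched as a single predicate. This is a hedged sketch, not the application's code: the class and method names are hypothetical, the units (milliseconds for time, and whatever speed and distance units application.yaml uses) are assumptions, and whether each comparison is strict or inclusive is also assumed.

```java
/** Hypothetical helper for illustration; not taken from the JPO-Deduplicator source. */
class BsmDuplicateCheck {
    /**
     * Returns true when all three criteria are met, meaning the new BSM
     * would be treated as a duplicate in this sketch.
     *
     * @param millisSincePrevious    time between the previous and the new message (assumed milliseconds)
     * @param speed                  speed reported by the new message
     * @param positionDelta          distance between the two positions, or null if position is missing
     * @param maximumTimeDeltaMillis stands in for odeBsmMaximumTimeDelta
     * @param alwaysIncludeAtSpeed   stands in for odeBsmAlwaysIncludeAtSpeed
     * @param maximumPositionDelta   stands in for odeBsmMaximumPositionDelta
     */
    static boolean isDuplicate(long millisSincePrevious, double speed, Double positionDelta,
                               long maximumTimeDeltaMillis, double alwaysIncludeAtSpeed,
                               double maximumPositionDelta) {
        // Criterion 1: both messages fall within the configured time threshold.
        boolean withinTimeWindow = millisSincePrevious <= maximumTimeDeltaMillis;
        // Criterion 2: the new speed is below the "always include" speed threshold.
        boolean belowSpeedThreshold = speed < alwaysIncludeAtSpeed;
        // Criterion 3: the position barely changed, or no position is available.
        boolean positionUnchanged = positionDelta == null || positionDelta < maximumPositionDelta;
        return withinTimeWindow && belowSpeedThreshold && positionUnchanged;
    }
}
```

Because all criteria must hold at once, failing any single check (too much time elapsed, speed at or above the threshold, or a large position change) means the message is passed on rather than dropped.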
The current version and release history of the JPO Deduplicator: Release Notes
Recommended machine specs running Docker to run the JPO-Deduplicator:
- Minimum RAM: 16 GB
- Minimum storage space: 100 GB
- Supported operating systems:
  - Ubuntu 20.04 Linux (Recommended)
  - Windows 10/11 Professional (Professional version required for Docker virtualization)
  - OSX 10 Mojave
The JPO-Deduplicator software can run on most standard Windows, Mac, or Linux-based computers with Pentium core processors. Performance of the software will depend on the computing power and available RAM in the system. Larger data flows can require much more storage space, depending on the amount of data being processed by the software. The JPO-Deduplicator software application was developed using the open source programming language Java. If running the JPO-Deduplicator outside of Docker, the application requires the Java 21 runtime environment.
1. Create a copy of `sample.env` and rename it to `.env`.
2. Create a copy of the `jpo-utils/sample.env` file and rename it to `.env` in the `jpo-utils` directory.
3. Update the variable `MAVEN_GITHUB_TOKEN` in the root `.env` file to a GitHub token used for downloading jar file dependencies. For full instructions on how to generate a token, see the GitHub token steps in this document.
4. Navigate back to the root directory and run the following command: `make build`
   - Make sure that you have docker-compose and make installed.
5. Run the following command: `make start`
6. Produce a sample message to one of the input topics by using `kafka_ui`:
   1. Go to `localhost:8001`
   2. Click local -> Topics
   3. Select `topic.OdeMapJson`
   4. Select `Produce Message`
   5. Copy in sample JSON for a Map Message
   6. Click `Produce Message` multiple times
7. View the deduplicated message in `kafka_ui`:
   1. Go to `localhost:8001`
   2. Click local -> Topics
   3. Select `topic.DeduplicatedOdeMapJson`
   4. You should now see only one copy of the map message sent.
The JPO-Deduplicator is a microservice that runs as an independent application but serves the sole purpose of deduplicating JSON objects created by the JPO-ODE via Apache Kafka. To support these JSON objects, the JPO-Deduplicator application uses some classes from the JPO-ODE, JPO-GeojsonConverter, and the JPO-ConflictMonitor. These classes are referenced in the JPO-Deduplicator by pulling the built .jar artifacts from the GitHub Maven package registry. All other required dependencies will automatically be downloaded and installed as part of the Docker build process.
- Docker: https://docs.docker.com/engine/installation/
- Docker-Compose: https://docs.docker.com/compose/install/
To manually configure deduplication for a topic, the following environment variables can also be used.
| Environment Variable | Description |
|---|---|
| `ENABLE_PROCESSED_MAP_DEDUPLICATION` | `true` / `false` - Enable ProcessedMap message Deduplication |
| `ENABLE_PROCESSED_MAP_WKT_DEDUPLICATION` | `true` / `false` - Enable ProcessedMap WKT message Deduplication |
| `ENABLE_ODE_MAP_DEDUPLICATION` | `true` / `false` - Enable ODE MAP message Deduplication |
| `ENABLE_ODE_TIM_DEDUPLICATION` | `true` / `false` - Enable ODE TIM message Deduplication |
| `ENABLE_PROCESSED_SPAT_DEDUPLICATION` | `true` / `false` - Enable ProcessedSpat Deduplication |
| `ENABLE_ODE_BSM_DEDUPLICATION` | `true` / `false` - Enable ODE BSM Deduplication |
| `ENABLE_PROCESSED_BSM_DEDUPLICATION` | `true` / `false` - Enable Processed BSM Deduplication |
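For readers who want to see how such `true` / `false` flags are typically interpreted, here is a minimal sketch of reading one of these variables from an environment map. It does not show how the JPO-Deduplicator itself binds these variables; the class `DedupFlags`, the method `isEnabled`, and the caller-supplied default for an unset variable are assumptions made for illustration.

```java
import java.util.Map;

/** Hypothetical helper for illustration; not part of the JPO-Deduplicator code base. */
class DedupFlags {
    /**
     * Interprets an ENABLE_*_DEDUPLICATION variable as a boolean.
     * The default used when the variable is unset or empty is supplied by the
     * caller, because the application's real default is not stated here.
     */
    static boolean isEnabled(Map<String, String> env, String name, boolean defaultValue) {
        String raw = env.get(name);
        if (raw == null || raw.trim().isEmpty()) {
            return defaultValue;
        }
        // Boolean.parseBoolean is case-insensitive: only "true" yields true.
        return Boolean.parseBoolean(raw.trim());
    }
}
```

A call such as `DedupFlags.isEnabled(System.getenv(), "ENABLE_ODE_TIM_DEDUPLICATION", true)` would then report whether TIM deduplication is switched on in the current environment, under the assumed default.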
A GitHub token is required to pull artifacts from GitHub repositories. This is required to obtain the jpo-deduplicator jars and must be done before attempting to build this repository.
1. Log into GitHub.
2. Navigate to Settings -> Developer settings -> Personal access tokens.
3. Click "New personal access token (classic)".
   - As of now, GitHub does not support `Fine-grained tokens` for obtaining packages.
4. Provide a name and expiration for the token.
5. Select the `read:packages` scope.
6. Click "Generate token" and copy the token.
7. Copy the token name and token value into your `.env` file.
For local development, the following steps are also required:
8. Create a copy of `settings.xml` and save it to `~/.m2/settings.xml`.
9. Update the variables in your `~/.m2/settings.xml` with the token value and target jpo-ode organization.
Install the IDE of your choice:
- Eclipse: https://eclipse.org/
- STS: https://spring.io/tools/sts/all
- IntelliJ: https://www.jetbrains.com/idea/
- VS Code: https://code.visualstudio.com/
Contact the developers of the JPO-Deduplicator application by submitting a GitHub issue.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License.
You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either expressed or implied. See the License for the specific language governing permissions and limitations under the License.
Please read the ODE contributing guide to learn about our development process, how to propose pull requests and improvements, and how to build and test your changes to this project.