Skip to content

Conversation

AlexandreSinger
Copy link
Contributor

The original Prepacker API created a lot of global variables which were confusing to work with and hard to maintain.

Created a Prepacker class which maintains the state of the Prepacker and provides methods to access prepacking information.

I also tried to remove the accesses to the global scope from within the prepacker to isolate it from the rest of VTR. This is in an attempt to make it easier to work with.

@github-actions github-actions bot added VPR VPR FPGA Placement & Routing Tool lang-cpp C/C++ code labels Aug 24, 2024
@AlexandreSinger AlexandreSinger force-pushed the feature-prepacker-rework branch from 90c3579 to 0f24140 Compare August 24, 2024 00:50
The original Prepacker API created a lot of global variables which were
confusing to work with and hard to maintain.

Created a Prepacker class which maintains the state of the Prepacker and
provides methods to access prepacking information.

I also tried to remove the accesses to the global scope from within the
prepacker to isolate it from the rest of VTR. This is in an attempt to
make it easier to work with.
@AlexandreSinger AlexandreSinger force-pushed the feature-prepacker-rework branch from 0f24140 to 8dd7795 Compare August 24, 2024 04:20
@AlexandreSinger
Copy link
Contributor Author

@vaughnbetz This PR has passed CI. This PR abstracts prepacking into a single class which, when initialized, sets up internal variables for tracking molecules. The variables for tracking molecules used to be stored in the atom context; this has been replaced with an instance of the prepacker object which is initialized in the packing stage. I also removed all of the calls to g_vpr_ctx from the API, isolating it from the rest of the flow.

All of this makes it easier to use prepacking in the analytical placement flow and makes it easier to maintain (changes in the packing flow should not break the AP flow).

@vaughnbetz
Copy link
Contributor

Looks very good; just a few commenting changes.
I suggest getting a QoR comparison for safety too though (VTR and/or Titan).

Updated documentation based on Vaughn's comments.

Also simplified the interface of getting a molecule from a block since
it appears as though each block will only be contained within one
molecule.
@AlexandreSinger
Copy link
Contributor Author

AlexandreSinger commented Aug 24, 2024

@vaughnbetz I have resolved your comments. I have found that, at least for the strong tests, each block will be contained in only one molecule. This matches what I think I see in the prepacker code. I have updated the API interface to reflect this and made it much more efficient by turning the lookups from a multimap (O(log(n))) into a vtr::vector(O(1)). I have added asserts to ensure this assumption. We will see if the CI agrees with this change, I can easily undo it if I am wrong.

I will run the VTR benchmarks soon (I want to ensure a good number of the CI tests complete) to get QoR results, I do not expect any slowdown or QoR change.

For improved performance (and to simplify the interface further),
changed the internal maps to VTR Vectors.

Added asserts to ensure assumptions are maintained.
@AlexandreSinger AlexandreSinger force-pushed the feature-prepacker-rework branch from 7ac6cd9 to 0408e7d Compare August 24, 2024 16:34
@AlexandreSinger
Copy link
Contributor Author

AlexandreSinger commented Aug 24, 2024

@vaughnbetz Here are the QoR results for VTR:

  vtr_parse_results_new_yosys.txt vtr_parse_results_prepack.txt
vtr_flow_elapsed_time 1 0.995878184
odin_synth_time    
parmys_synth_time 1 0.910618505
abc_depth 1 1
abc_synth_time 1 1.012554285
num_clb 1 1
num_memories 1 1
num_mult 1 1
max_vpr_mem 1 0.994725417
num_pre_packed_blocks 1 1
num_post_packed_blocks 1 1
device_grid_tiles 1 1
pack_time 1 0.999367909
placed_wirelength_est 1 1
place_time 1 1.015163644
placed_CPD_est 1 1
min_chan_width 1 1
routed_wirelength 1 1
min_chan_width_route_time 1 1.005690237
crit_path_routed_wirelength 1 1
critical_path_delay 1 1
geomean_nonvirtual_intradomain_critical_path_delay 1 1
crit_path_route_time 1 0.990201513

The change in parmys_synth_time seems to be caused by smaller circuits (such as ch_intrinsics.v, diffeq1.v, and raygentop.v) being around 2x faster and bringing down the average. Everything else seems to be good. This makes sense. Although I made getting the molecule for a block O(1), the run time of the packing algorithm is dominated by the legalization of packing a molecule into a cluster (not getting the molecule).

Here is the raw data:
comparison_output.xlsx

When this passes CI, it should be good to merge unless you have any further comments.

@vaughnbetz
Copy link
Contributor

Looks good to me!

@AlexandreSinger
Copy link
Contributor Author

@vaughnbetz Please merge when you get the chance! Its a bit of a large PR and I want to prevent it from regressing. My next PR will also depend on this one.

@vaughnbetz vaughnbetz merged commit c7b9ce0 into verilog-to-routing:master Aug 27, 2024
52 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lang-cpp C/C++ code VPR VPR FPGA Placement & Routing Tool
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants